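Affiliations: [1] Laboratory of Information and Computing Science, Beijing University of Technology, Beijing 100022 [2] Audio-Visual Department, Xinhua News Agency, Beijing 100803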
Source: Journal of Chinese Information Processing (《中文信息学报》), 2007, No. 4, pp. 97-104 (8 pages)
Funding: National Natural Science Foundation of China (Grant 60572125)
Abstract: Automatic recognition, annotation, and retrieval of broadcast speech is a cross-disciplinary problem that draws on speech technology, natural language processing, and information retrieval. After surveying research on the automatic annotation and retrieval of broadcast speech and analyzing the key techniques involved, this paper proposes a multi-level automatic annotation framework for Mandarin broadcast speech and a speech retrieval scheme based on that multi-level annotation. It examines the annotation attributes at the document, sentence (utterance), and word levels, uses a recursive annotation method to refine the attributes level by level, and discusses the speech recognition engine and audio stream segmentation, both of which are critical to automatic speech annotation. The proposed methods were applied to annotate and retrieve 10 hours of Mandarin broadcast speech, with fairly satisfactory experimental results.
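Keywords: computer applications; Chinese information processing; broadcast speech; automatic annotation; speech retrieval; acoustic model; language model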
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]