Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, No. 4, pp. 82-87 (6 pages)
Funding: Supported by the National 863 Program of China (2001AA114071)
Abstract: To meet the real-time requirements of speech recognition, topic-based language model adaptation should update the language model weighting coefficients as quickly as possible and reduce the number of language model calls. This paper uses clustering to build a quantized representation of consecutive adjacent bigram word pairs, and uses that representation to characterize both the speech recognition prediction history and each text topic center. The language model weighting coefficients are updated according to the similarity between the recognition history vector and each text topic center vector, and the global language model is discarded. Compared with the traditional adaptation method based on the EM algorithm, experiments show that the method clearly improves both speech recognition performance and real-time efficiency, with a relative reduction in recognition error rate of about 5.1%. This indicates that the method can identify the topic of the test content fairly accurately.
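Keywords: computer applications; Chinese information processing; language model; topic adaptation; speech recognition; text classification
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]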
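
The weight update described in the abstract can be illustrated with a rough sketch. The abstract does not say which similarity measure or normalization the authors use, so cosine similarity, the sum-to-one normalization, and every function name below are assumptions made for illustration only. Bigram cluster IDs are assumed to come from a clustering step that is not shown.

```python
import math


def history_vector(cluster_ids, num_clusters):
    """Count how often each bigram cluster appears in the recognized history.

    Assumption: every adjacent word pair has already been mapped to a
    cluster ID in range(num_clusters) by some clustering step.
    """
    vec = [0.0] * num_clusters
    for cid in cluster_ids:
        vec[cid] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_weights(history, centers):
    """Turn history-to-center similarities into weights that sum to 1.

    Assumption: weights are proportional to non-negative cosine similarity.
    Falls back to uniform weights when the history matches no topic.
    """
    sims = [max(cosine(history, c), 0.0) for c in centers]
    total = sum(sims)
    if total == 0.0:
        return [1.0 / len(centers)] * len(centers)
    return [s / total for s in sims]


def mixture_prob(weights, topic_probs):
    """Interpolate per-topic model probabilities; no global model is used."""
    return sum(w * p for w, p in zip(weights, topic_probs))


if __name__ == "__main__":
    # Toy data: 3 bigram clusters, 2 topic centers (all values hypothetical).
    hist = history_vector([0, 0, 1], num_clusters=3)
    centers = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    weights = topic_weights(hist, centers)
    print(weights, mixture_prob(weights, [0.2, 0.9]))
```

Because the weights come from a single similarity computation per topic rather than from iterative EM re-estimation, a scheme of this shape is cheap to update as the history grows, which matches the real-time motivation stated in the abstract.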