Authors: WANG Xuyang, ZHANG Pengyuan, NA Xingyu, PAN Jielin, YAN Yonghong
Affiliations: [1] The Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences; [2] Xinjiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences
Source: Chinese Journal of Electronics (电子学报(英文版)), 2017, No. 6, pp. 1239-1244 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 11461141004, No. 61271426, No. 11504406, No. 11590770, No. 11590771, No. 11590772, No. 11590773, No. 11590774); the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA06030100, No. XDA06030500, No. XDA06040603); the National 863 Program (No. 2015AA016306); the National 973 Program (No. 2013CB329302); and the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No. 201230118-3)
Abstract: In this paper, a hierarchical n-gram language model (LM) combining words and characters is explored to improve the detection of out-of-vocabulary (OOV) words in Mandarin spoken term detection (STD). The hierarchical LM is built on a word-level LM, with a character-level LM estimating the probabilities of OOV words in a class-based way. The regions of the sentence being decoded that contain OOV words are detected with the help of the word-level LM, and the probabilities of those OOV words are derived from the character-level LM. The proposed approach is implemented in a dynamic decoder and evaluated in terms of actual term-weighted value (ATWV) on two Mandarin data sets. Experimental results show a relative improvement of more than 10% in OOV word detection on both sets, while the detection of in-vocabulary (IV) words is barely affected.
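The abstract describes the model only at a high level. The sketch below illustrates the general class-based idea under stated assumptions. It uses a bigram word LM in which every OOV word shares one hypothetical `<OOV>` class token, and a character bigram LM that scores the OOV word's spelling, so that P(w | h) = P(<OOV> | h) · P_char(c_1 … c_n). The toy corpus, the add-one smoothing, the bigram order, and the names `BigramLM` and `hierarchical_prob` are all illustrative assumptions. This is not the paper's implementation, which runs inside a dynamic decoder and also detects the OOV regions.

```python
from collections import defaultdict

OOV = "<OOV>"  # hypothetical class token standing in for every OOV word
BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Add-one-smoothed bigram LM; a toy stand-in for a real n-gram LM."""

    def __init__(self, sequences):
        self.history_counts = defaultdict(int)
        self.pair_counts = defaultdict(int)
        self.vocab = {EOS}  # tokens the model can predict
        for seq in sequences:
            tokens = [BOS] + list(seq) + [EOS]
            self.vocab.update(seq)
            for prev, cur in zip(tokens, tokens[1:]):
                self.history_counts[prev] += 1
                self.pair_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        # P(cur | prev) with add-one smoothing over the predictable tokens.
        return (self.pair_counts[(prev, cur)] + 1) / (
            self.history_counts[prev] + len(self.vocab)
        )


def hierarchical_prob(prev_word, word, word_lm, char_lm, in_vocab):
    """P(word | prev_word): word LM for IV words, class-based back-off for OOV."""
    # An OOV history word is mapped to its class token, as in the word LM's training data.
    history = prev_word if prev_word in in_vocab or prev_word == BOS else OOV
    if word in in_vocab:
        return word_lm.prob(history, word)
    # OOV word: P(<OOV> | history) * P_char(<s> c_1 ... c_n </s>)
    p = word_lm.prob(history, OOV)
    chars = [BOS] + list(word) + [EOS]
    for a, b in zip(chars, chars[1:]):
        p *= char_lm.prob(a, b)
    return p


# Made-up toy data: OOV words are replaced by <OOV> for the word LM,
# and their spellings train the character LM.
in_vocab = {"我们", "研究", "语音", "检索"}
word_corpus = [
    ["我们", "研究", "语音", "检索"],
    ["我们", "研究", OOV],
    ["语音", OOV, "检索"],
]
oov_spellings = ["声学", "关键词", "声纹"]

word_lm = BigramLM(word_corpus)
char_lm = BigramLM(oov_spellings)

p_iv = hierarchical_prob("研究", "语音", word_lm, char_lm, in_vocab)
p_oov = hierarchical_prob("研究", "声学", word_lm, char_lm, in_vocab)
print(f"IV word probability:  {p_iv:.4f}")
print(f"OOV word probability: {p_oov:.6f}")
```

With this factorization, an OOV word always scores below P(<OOV> | h), and IV words keep exactly their word-LM probabilities. That is one plausible reading of why IV detection would be largely unaffected.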
Keywords: Spoken term detection (STD); Language model (LM); Out-of-vocabulary (OOV) words
CLC classification number: TN912.3 [Electronics and Telecommunications / Communication and Information Systems]