Robust Character-Based Chinese Spoken Language Understanding with Domain Information  Cited by: 1

Original title: 利用领域信息的基于字的鲁棒中文口语理解研究

Authors: Bao Changchun (包长春)[1], Xu Weiqun (徐为群)[1], Li Yali (李亚丽)[1], Pan Jielin (潘接林)[1], Yan Yonghong (颜永红)[1]

Affiliation: [1] ThinkIT Speech Lab (中科信利实验室), Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190

Source: Microcomputer Applications (《微计算机应用》), 2010, No. 6, pp. 1-7 (7 pages)

Funding: National Key Technology R&D Program of China (2008BAI50B03); National Natural Science Foundation of China (10925419; 90920302; 10874203; 60875014)

Abstract: Robustness is one of the most challenging key issues in spoken language understanding. This paper investigates two strategies for improving the robustness of spoken language parsing. The first is to adopt a shallow statistical understanding framework in which spoken language understanding is simplified into (named) entity recognition, with the character rather than the word as the basic processing unit. The second is to exploit domain information within that statistical framework from two angles: feature extraction (subword-enriched features) and enlargement of the training corpus. Experimental results show that these strategies effectively improve semantic parsing performance. On a human-computer dialogue test set, the F1 score rises from 75.27 to 90.24 when the input is speech recognition output, and from 80.59 to 97.14 when the input is manual transcripts.
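To make the framing in the abstract concrete, the following is a minimal sketch of character-level entity tagging and entity-level F1 scoring. It is not the authors' implementation: the BIO tagging scheme, the function names, the example utterance, and the `city` label are all assumptions chosen for illustration, and the score is reported on a 0-1 scale rather than the 0-100 scale used in the abstract.

```python
from typing import List, Tuple

# (start, end_exclusive, label) over character positions; illustrative type only.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Assign one BIO tag per character, so no word segmentation is needed."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from a per-character BIO tag sequence."""
    spans: List[Span] = []
    start = None
    label = ""
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = start is not None and tag == f"I-{label}"
        if start is not None and not continues:
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[Span], pred: List[Span]) -> float:
    """Entity-level F1: a prediction counts only if span and label both match."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical utterance ("I want to go to Beijing") with one assumed entity.
utterance = "我要去北京"
gold_spans = [(3, 5, "city")]
gold_tags = spans_to_bio(utterance, gold_spans)

# A hypothetical system output with one correct and one spurious entity.
pred_spans = [(3, 5, "city"), (0, 1, "city")]
score = entity_f1(gold_spans, pred_spans)
```

Tagging at the character level means a recognition error only affects the tags of the characters it touches, whereas a word-level pipeline would also have to segment the erroneous text first; this is one plausible reading of why the abstract prefers characters, not a claim taken from the paper itself.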

Keywords: Chinese spoken language understanding; domain information; robustness

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
