Authors: He Wei (何伟)[1], Li Honglian (李红莲)[1], Yuan Baozong (袁保宗)[1], Lin Biqin (林碧琴)[1]
Source: Journal of Chinese Information Processing (《中文信息学报》), 2003, No. 5, pp. 41-47 (7 pages)
Funding: Supported by the National "973" Program (G19980305011)
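Abstract: Corpora for a specific task domain are sparse and hard to collect, which severely limits the portability of dialogue systems. The aim of this paper is to use a small amount of training data collected online to adapt the language model rapidly, and thereby improve the recognition rate of a dialogue system in a new task domain. We modify the traditional cache model and propose a cache language model based on history-unit decay, which collects data incrementally online for adaptation and is linearly interpolated with a generic language model. In a dialogue system the history unit is the dialogue turn, so the model can also be called a dialogue-turn-decay cache language model. Experiments on two entirely different task domains, a Summer Palace (颐和园) tour guide and train ticket booking, show that with fewer than 1,000 adaptation sentences, compared with the model without adaptation, the recognition error rate falls by 47.8% and 74.0% respectively in supervised mode, and by 30.1% and 51.1% respectively in unsupervised mode.
English abstract: The substantial investment required to develop a spoken language system for each specific task hampers the widespread use of speech technology. To support porting a spoken language system to a new application rapidly and simply, this paper presents an improved cache model, a history-unit-based decaying cache model, for online language model adaptation. To capture changes in dialog state, each user utterance and system response is collected and used for training. When a dialog turn finishes, the cache is updated, and the bigram counts become fractional after decaying. The cache bigram is interpolated with the generic trigram. Experiments are performed on two contrasting tasks: train travel reservation and a park guide. With only several hundred utterances of adaptation data, both tasks show a satisfying reduction in character error rate under both supervised and unsupervised adaptation.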
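Keywords: computer applications; Chinese information processing; spoken dialogue system; language model; cache adaptation
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]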
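
Illustrative sketch: the abstract describes a cache bigram whose counts decay after every dialog turn and which is linearly interpolated with a generic model. The Python below is a minimal sketch of that idea, not the authors' implementation. The class name `DecayingCacheBigram`, the function `interpolate`, the decay factor, the interpolation weight, and the token-list input format are all assumptions made for illustration; the record gives none of these details.

```python
from collections import defaultdict


class DecayingCacheBigram:
    """Hypothetical cache bigram whose counts decay once per dialog turn."""

    def __init__(self, decay=0.9):
        # decay is an assumed value; the abstract does not state one.
        self.decay = decay
        self.bigram = defaultdict(float)   # (prev, cur) -> decayed count
        self.context = defaultdict(float)  # prev -> decayed count

    def end_turn(self, utterances):
        """Decay all existing counts, then add counts from the finished turn.

        utterances: list of token lists (user utterance and system response).
        """
        for key in self.bigram:
            self.bigram[key] *= self.decay
        for key in self.context:
            self.context[key] *= self.decay
        for tokens in utterances:
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigram[(prev, cur)] += 1.0
                self.context[prev] += 1.0

    def prob(self, prev, cur):
        """Relative-frequency estimate from the (fractional) cache counts."""
        total = self.context.get(prev, 0.0)
        if total == 0.0:
            return 0.0
        return self.bigram.get((prev, cur), 0.0) / total


def interpolate(p_generic, p_cache, lam=0.3):
    """Linear interpolation; lam is an assumed cache weight in [0, 1]."""
    return (1.0 - lam) * p_generic + lam * p_cache


cache = DecayingCacheBigram(decay=0.5)
cache.end_turn([["a", "b"]])   # first dialog turn
cache.end_turn([["a", "c"]])   # second turn; older counts are halved
p = interpolate(p_generic=0.2, p_cache=cache.prob("a", "c"), lam=0.5)
```

Because counts are multiplied by the decay factor at each turn boundary, they become fractional, and bigrams from recent turns outweigh older ones. In practice `p_generic` would come from the generic trigram model mentioned in the abstract; here it is just a number supplied by the caller.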