Authors: WANG Xilou (王兮楼); GUO Wu (郭武) [1]; XIE Chuandong (解传栋)
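Affiliation: [1] National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027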
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2018, No. 7, pp. 662-667 (6 pages)
Abstract: For low-resource speech recognition, a data selection strategy for semi-supervised learning on a large amount of unlabeled data is proposed and applied to both acoustic modeling and language modeling. A seed model is first trained on a small amount of data and then used to decode the unlabeled data. From the best candidate results of decoding, high-confidence sentences are selected using a combination of confidence measure and perplexity, and these sentences are used to train the acoustic model and the language model. In addition, the decoded lattices are converted into multiple candidate texts for language model training. On a Japanese recognition task, the proposed method achieves a substantially better recognition rate than the method that selects data based on confidence measure.
Classification No.: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
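The sentence selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how confidence and perplexity are combined, so the rule below (keep a hypothesis only if its confidence is high enough and its perplexity is low enough) is an assumption, as are the `Hypothesis` type, the `select_hypotheses` function, the threshold values, and the sample data; none of these come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One best-candidate decoding result for an unlabeled utterance (hypothetical type)."""
    utt_id: str
    text: str
    confidence: float  # sentence-level confidence from the decoder, assumed to lie in [0, 1]
    perplexity: float  # perplexity of `text` under a language model, assumed precomputed


def select_hypotheses(
    hyps: List[Hypothesis],
    min_confidence: float = 0.8,    # illustrative threshold, not from the paper
    max_perplexity: float = 200.0,  # illustrative threshold, not from the paper
) -> List[Hypothesis]:
    """Keep hypotheses that pass both the confidence and the perplexity check.

    Assumed combination rule: a simple conjunction of two thresholds.
    """
    return [
        h for h in hyps
        if h.confidence >= min_confidence and h.perplexity <= max_perplexity
    ]


# Made-up example data, for illustration only.
hyps = [
    Hypothesis("utt1", "text a", confidence=0.95, perplexity=80.0),
    Hypothesis("utt2", "text b", confidence=0.90, perplexity=450.0),  # fails perplexity
    Hypothesis("utt3", "text c", confidence=0.40, perplexity=60.0),   # fails confidence
]
selected = select_hypotheses(hyps)
selected_ids = [h.utt_id for h in selected]
```

In a pipeline like the one the abstract outlines, the selected transcripts would then be added to the training data for the acoustic and language models; the lattice-to-multiple-candidates step is not covered by this sketch.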