Chinese Prosodic Phrasing Based on Extension Matrix Theory  (Cited by: 2)


Authors: Chen Weijun (谌卫军)[1,2], Lin Fuzong (林福宗)[1,2], Li Jianmin (李建民)[1,2], Zhang Bo (张钹)[1,2]

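Affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084; [2] State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084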

Source: Chinese Journal of Computers (《计算机学报》), 2003, No. 1, pp. 26-31 (6 pages)

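Funding: Supported by the Key Program of the National Natural Science Foundation of China (60135010) and the National "973" Key Basic Research Development Program (G1998030509)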

Abstract: This paper presents a new inductive learning algorithm based on extension matrix theory, the grouping-and-covering algorithm (分组覆盖算法), and applies it to the prosodic phrasing problem in Chinese text-to-speech (TTS) systems. The authors propose a novel definition of the consistency of a rule and of a set of positive examples, and relate the two with a theorem. Building on the extension matrix, the algorithm divides the positive examples of a given class, against the background of the negative examples, into consistent groups, and uses a simple strategy to find for each group a conjunctive rule that covers all of the group's positive examples and none of the negative examples. The result is a set of consistent rules expressed in variable-valued logic. The authors collected 937 sentences of different genres (about 78 minutes of speech) from CCTV news programs to build a large speech corpus for prosodic phrasing, and they propose a group of prosody-related features whose effectiveness is assessed through the interpretability of the resulting rules. Finally, the data is divided into a training set and a test set for a series of experiments. The results show that the new method achieves higher accuracy, fewer rules, and better interpretability than the traditional decision-tree method, and that the generated rules come close to hand-crafted ones, which may help clarify the relationship between Chinese syntax and prosody.
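The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the extension matrix construction is not reproduced, and the greedy grouping strategy, the function names (`rule_for`, `covers`, `is_consistent`, `group_and_cover`), and the toy features (part of speech before the boundary, part of speech after it, a length bucket) are all assumptions made for illustration. The sketch treats a rule as one allowed-value set per attribute, a simple form of variable-valued logic, and assumes that no positive example is identical to a negative one.

```python
from typing import List, Sequence, Set, Tuple

Example = Tuple[str, ...]   # one discrete value per attribute
Rule = List[Set[str]]       # one allowed-value set per attribute (a conjunction)


def rule_for(group: Sequence[Example]) -> Rule:
    """Most specific conjunctive rule that covers every example in the group."""
    return [set(values) for values in zip(*group)]


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example when every attribute value is allowed."""
    return all(value in allowed for allowed, value in zip(rule, example))


def is_consistent(group: Sequence[Example], negatives: Sequence[Example]) -> bool:
    """A group is consistent if its rule covers none of the negative examples."""
    rule = rule_for(group)
    return not any(covers(rule, neg) for neg in negatives)


def group_and_cover(positives: Sequence[Example],
                    negatives: Sequence[Example]) -> List[Rule]:
    """Greedy grouping (an assumed strategy): put each positive example into
    the first group that stays consistent, otherwise open a new group."""
    groups: List[List[Example]] = []
    for pos in positives:
        for group in groups:
            if is_consistent(group + [pos], negatives):
                group.append(pos)
                break
        else:
            groups.append([pos])
    return [rule_for(group) for group in groups]


# Hypothetical toy data: (POS before boundary, POS after boundary, length bucket).
positives = [("n", "v", "long"), ("n", "p", "long"), ("v", "n", "short")]
negatives = [("n", "n", "short"), ("v", "v", "long"), ("v", "p", "long")]

rules = group_and_cover(positives, negatives)
for rule in rules:
    print([sorted(allowed) for allowed in rule])
```

On this toy data the first two positive examples share one rule, while the third needs its own group because merging it would cover a negative example. The number of rules therefore depends on how the positives can be grouped without losing consistency.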

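Keywords: extension matrix theory; Chinese prosodic phrasing; Chinese character information processing; inductive learning algorithm; Chinese text-to-speech system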

Classification: TP391.12 [Automation and Computer Technology: Computer Application Technology]

 
