An Algorithm for Segmenting Ambiguities in Chinese Words Based on Support Vector Machine

Cited by: 2

Author: Li Rong (李蓉) [1]

Affiliation: [1] School of Information, Beijing Wuzi University (北京物资学院), Beijing 101149, China

Source: Computer Simulation (《计算机仿真》), 2009, No. 7, pp. 354-357 (4 pages)

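Funding: Higher Education Talent Development Program (PHR200906210); Beijing Municipal Education Commission Research Base Construction Project (WYJD200902); Beijing Municipal Education Commission Science and Technology Program (KM200810037001); Key Program of the National Natural Science Foundation of China (10673017)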

Abstract: This paper proposes a Chinese ambiguity segmentation method based on support vector machines (SVM), aimed at segmenting overlapping pseudo-ambiguous fields. Ambiguity segmentation can be treated as a pattern classification problem, and an SVM is used to build the classification model in order to improve field-processing capability. Features are first extracted from each ambiguous field, with mutual information used to represent the field. Learning is supervised: a set of high-frequency pseudo-ambiguous fields is selected from the ambiguous fields, segmented correctly by hand, and used as training samples for the SVM, which yields a classification model. In the classification stage, SVM and KNN are combined to build a new classifier, and a field to be resolved is passed to this classifier to obtain its segmentation. Experiments show that the method reaches a correct rate of 91.6% on overlapping ambiguities and also reduces the time spent on ambiguity segmentation.
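The abstract does not give the exact feature definition, so the following is only a minimal sketch of how mutual information might be used to represent an ambiguous field. It assumes pointwise mutual information between adjacent characters, estimated from raw character and bigram counts. The function names, the toy corpus, and the `floor` value for unseen pairs are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def count_corpus(sentences):
    """Count single characters and adjacent character pairs in a corpus."""
    chars = Counter()
    bigrams = Counter()
    for s in sentences:
        chars.update(s)
        bigrams.update(s[i:i + 2] for i in range(len(s) - 1))
    return chars, bigrams


def pmi(x, y, chars, bigrams, floor=-20.0):
    """Pointwise mutual information log2(P(xy) / (P(x) * P(y))).

    `floor` is an arbitrary placeholder returned for unseen pairs
    (an assumption of this sketch, not a value from the paper).
    """
    joint = bigrams[x + y]
    if joint == 0 or chars[x] == 0 or chars[y] == 0:
        return floor
    p_xy = joint / sum(bigrams.values())
    p_x = chars[x] / sum(chars.values())
    p_y = chars[y] / sum(chars.values())
    return math.log2(p_xy / (p_x * p_y))


def field_features(field, chars, bigrams):
    """Represent an ambiguous field by the PMI of each adjacent character pair."""
    return [pmi(field[i], field[i + 1], chars, bigrams)
            for i in range(len(field) - 1)]


# Toy corpus, invented for illustration only.
corpus = ["他从小学习钢琴", "小学生在小学上课", "从小到大"]
chars, bigrams = count_corpus(corpus)

# "从小学" overlaps two candidate words: 从小 and 小学.
features = field_features("从小学", chars, bigrams)
print(features)
```

A vector like `features` could then serve as the input to a classifier trained on hand-segmented fields. How the paper actually builds its vectors and combines SVM with KNN is not stated in the abstract.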

Keywords: support vector machine; kernel function; pseudo-ambiguity; feature extraction

Classification code: O234 [Science: Operations Research and Cybernetics]

 
