Author: Li Rong (李蓉) [1]
Source: Computer Simulation (《计算机仿真》), 2009, Issue 7, pp. 354-357 (4 pages)
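Funding: Funding Project for Academic Human Resources Development in Institutions of Higher Learning (PHR200906210); Research Base Construction Project of the Beijing Municipal Education Commission (WYJD200902); Science and Technology Program of the Beijing Municipal Education Commission (KM200810037001); Key Program of the National Natural Science Foundation of China (10673017)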
Abstract: To handle the segmentation of overlapping (crossing-type) false ambiguities, this paper proposes a Chinese ambiguity-segmentation method based on support vector machines (SVM). Ambiguity segmentation is treated as a pattern-classification problem, and an SVM classification model is built to improve the processing of ambiguous strings. Features are first extracted from each ambiguous string, which is represented by mutual information. Training is supervised: high-frequency false-ambiguous strings are selected, segmented correctly by hand, and used as training samples for the SVM, which yields the classification model. In the classification stage, SVM and KNN are combined into a new classifier, and each ambiguous string to be resolved is passed to this classifier to obtain its segmentation. Experiments show that the method reaches a correct rate of 91.6% on overlapping ambiguities and also reduces the time spent on ambiguity segmentation.
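An overlapping ambiguity is a string ABC in which both AB and BC are dictionary words, e.g. 结合成 (结合/成 vs. 结/合成). The record does not give the paper's exact features, kernel, or rule for switching between SVM and KNN, so the following is only a minimal sketch under stated assumptions: each three-character string ABC is represented by the pointwise mutual information of its two overlapping character pairs, [MI(A,B), MI(B,C)]; the label +1 means "AB/C" and -1 means "A/BC"; a plain linear SVM is trained by subgradient descent rather than a kernel solver; and a hypothetical threshold `eps` sends samples close to the hyperplane (|f(x)| < eps) to a k-NN vote over all training samples. All function names and parameter values are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def build_counts(corpus):
    """Count single characters and adjacent character bigrams in raw sentences."""
    chars, bigrams = Counter(), Counter()
    for sent in corpus:
        chars.update(sent)
        bigrams.update(sent[i:i + 2] for i in range(len(sent) - 1))
    return chars, bigrams


def pmi(a, b, chars, bigrams):
    """Pointwise mutual information log2(P(ab) / (P(a) * P(b)))."""
    if bigrams[a + b] == 0:
        return 0.0  # simplification: an unseen pair is treated as "no evidence"
    n_chars, n_bigrams = sum(chars.values()), sum(bigrams.values())
    p_ab = bigrams[a + b] / n_bigrams
    return math.log2(p_ab / ((chars[a] / n_chars) * (chars[b] / n_chars)))


def features(s, chars, bigrams):
    """Hypothetical feature vector for a 3-character overlap ABC: [MI(A,B), MI(B,C)]."""
    return [pmi(s[0], s[1], chars, bigrams), pmi(s[1], s[2], chars, bigrams)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Linear soft-margin SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, w, b = len(X), [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin: hinge term is active
                gw = [g - yi * xij / n for g, xij in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def knn_label(X, y, x, k=3):
    """Majority vote (+1/-1) among the k training samples nearest to x."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    return 1 if sum(y[i] for i in nearest) > 0 else -1


def svm_knn_predict(model, X, y, x, eps=0.5, k=3):
    """Trust the SVM far from the hyperplane; fall back to k-NN when |f(x)| < eps."""
    w, b = model
    f = dot(w, x) + b
    if abs(f) >= eps:
        return 1 if f > 0 else -1
    return knn_label(X, y, x, k)


if __name__ == "__main__":
    # Toy corpus and toy 2-D vectors (assumed labels, not data from the paper).
    chars, bigrams = build_counts(["ab", "ab", "cd"])
    print(features("abc", chars, bigrams))  # ≈ [2.585, 0.0]
    X = [(3, 3), (4, 3), (3, 4), (-3, -3), (-4, -3), (-3, -4)]
    y = [1, 1, 1, -1, -1, -1]
    model = train_linear_svm(X, y)
    for x in [(5, 5), (-5, -5), (0.5, 0.2)]:
        print(x, svm_knn_predict(model, X, y, x))  # 1, -1, 1 (last one via k-NN)
```

The switching rule reflects the usual motivation for an SVM-KNN hybrid: samples far from the separating hyperplane are classified reliably by the SVM sign alone, while samples near it are the ones most likely to be misclassified, so a local neighbour vote is consulted instead. A fuller version would typically use a kernel SVM and restrict the k-NN step to support vectors; both are left out here to keep the sketch dependency-free.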
Classification number: O234 [Science: Operations Research and Cybernetics]