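Authors: Zhang Li (张利)[1], Zhang Liyong (张立勇)[1], Zhang Xiaomiao (张晓淼)[1], Geng Tiesuo (耿铁锁)[2], Yue Zongge (岳宗阁)[3]
Affiliations: [1] School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China; [2] State-owned Assets Department, Dalian University of Technology, Dalian 116024, Liaoning, China; [3] Affiliated Hospital of Dalian University of Technology, Dalian 116024, Liaoning, China
Source: Journal of Dalian University of Technology (《大连理工大学学报》), 2007, No. 1, pp. 131-135 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (60573172)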
Abstract: Automatic word segmentation of ambiguous Chinese character strings is a difficult problem in text mining. Chinese is written continuously within a sentence, with no spaces between words, which makes ambiguous strings hard to segment. To address this, the grammatical phenomena underlying typical ambiguities are summarized, and a part-of-speech code library is built for use in encoding. On this basis, the characters and words in ambiguous strings that follow particular grammatical rules are assigned codes and converted into input vectors that a neural network can accept. The samples are then used for training, and the grammatical rules are learned through the self-learning of an improved BP neural network. The training results show that the algorithm reaches 93.13% training precision and 92.50% test precision on the segmentation of ambiguous strings.
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]