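Affiliation: [1] School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212000
Source: Information Technology (《信息技术》), 2015, No. 9, pp. 191-195 (5 pages)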
Abstract: A dictionary system is a fairly basic but very important data source in natural language processing. Its quality affects the upper-layer work of word segmentation and semantic annotation, and semantic analysis of the relations between words can make natural language processing more intelligent. This paper therefore proposes a hierarchical management model for the dictionary system. An industry serves as the parent node, called an industry category. Sets of semantically similar words serve as its child nodes, called word category sets, and contain word types such as sub-representative words, abbreviations, and synonyms. These relations are formalized as an inter-word relation model, which also manages words with multiple meanings effectively. Because dictionary systems are mostly entered by hand, which has certain limitations, the paper designs the MS-k-means algorithm on the basis of K-means. It effectively improves the classification of word categories and considerably improves word annotation.
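The hierarchy described in the abstract can be pictured with a minimal sketch. It is an illustration only, not the paper's implementation: the class names `WordCategory`, `IndustryCategory`, and `Dictionary`, the `lookup` method, and the sample words are all assumptions introduced here. It shows an industry category as a parent node, word category sets as children, and how a word with multiple meanings resolves to more than one category.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class WordCategory:
    """Hypothetical child node: a set of semantically similar words."""
    representative: str
    abbreviations: Set[str] = field(default_factory=set)
    synonyms: Set[str] = field(default_factory=set)

    def words(self) -> Set[str]:
        # Every surface form that belongs to this category.
        return {self.representative} | self.abbreviations | self.synonyms


@dataclass
class IndustryCategory:
    """Hypothetical parent node: one industry and its word categories."""
    name: str
    categories: List[WordCategory] = field(default_factory=list)


class Dictionary:
    """Hypothetical container mapping industry names to their categories."""

    def __init__(self) -> None:
        self.industries: Dict[str, IndustryCategory] = {}

    def add(self, industry: str, category: WordCategory) -> None:
        node = self.industries.setdefault(industry, IndustryCategory(industry))
        node.categories.append(category)

    def lookup(self, word: str) -> List[Tuple[str, str]]:
        # Return (industry, representative word) for every category
        # that contains the word, so multi-sense words yield several hits.
        return [
            (industry.name, category.representative)
            for industry in self.industries.values()
            for category in industry.categories
            if word in category.words()
        ]


# Made-up sample data: "apple" belongs to two industries.
dictionary = Dictionary()
dictionary.add("Food", WordCategory("apple", synonyms={"pomme"}))
dictionary.add("IT", WordCategory("Apple Inc.", abbreviations={"apple"}))
print(dictionary.lookup("apple"))
```

Keeping each sense in its own category under its own industry is one simple way to let a single surface form carry several meanings without conflict. The clustering step (MS-k-means) is not sketched here because the abstract does not describe how it departs from K-means.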
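Keywords: dictionary management; inter-word relations; similarity; MS-k-means algorithm; SOA pattern
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]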