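Affiliations: [1] International WIC Institute, Beijing University of Technology, Beijing 100022; [2] College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004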
Source: Computer Applications and Software (《计算机应用与软件》), 2011, No. 6, pp. 57-58, 134 (3 pages)
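Funding: Cultivation Project of the Major Research Plan of the National Natural Science Foundation of China (90718020); Australian ARC project (DP0667060)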
Abstract: The bottleneck of automatic word segmentation is segmentation ambiguity, which falls into two types: crossing (overlapping) ambiguity and combinational ambiguity. Taking the sentence that contains a combinationally ambiguous string as the object of study, the paper examines how strongly the full text supports the words formed when each candidate segmentation of the ambiguous string is paired with its preceding and following context. From this it constructs a support-degree factor that favors either the merged or the split segmentation, and resolves the combinational ambiguity according to that factor. A worked example and experiments show that the method is feasible and outperforms existing techniques.
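The abstract does not give the formula for the support-degree factor, so the following is only a minimal sketch of the general idea, under stated assumptions. The function name `support_factor`, the use of one neighboring character on each side as context, the counting rule, and the 0.5 threshold are all hypothetical and are not taken from the paper. The sketch counts, in the rest of the document, how often the ambiguous string appears whole next to the same neighbors (evidence for merging) versus how often its parts appear next to those neighbors without the other part (evidence for splitting).

```python
def support_factor(rest_of_text, left, field, right, split_at):
    """Illustrative support factor for a combinationally ambiguous string.

    rest_of_text: the document text excluding the sentence under analysis
    left, right:  context immediately before and after the ambiguous string
    field:        the ambiguous string, e.g. "才能"
    split_at:     index at which the split reading divides the string

    Returns (factor, decision). The factor is the share of evidence that
    supports the merged reading. This scoring rule is an assumption made
    for illustration, not the paper's definition.
    """
    head, tail = field[:split_at], field[split_at:]

    # Evidence for the merged reading: the whole string next to the same context.
    merged_left = rest_of_text.count(left + field)
    merged_right = rest_of_text.count(field + right)
    merged = merged_left + merged_right

    # Evidence for the split reading: each part next to the same context,
    # excluding the occurrences already counted as the whole string.
    split = (rest_of_text.count(left + head) - merged_left) + (
        rest_of_text.count(tail + right) - merged_right
    )

    total = merged + split
    if total == 0:
        return None, "undecided"  # no evidence either way

    factor = merged / total
    # Ties are resolved in favor of the split reading (arbitrary choice).
    return factor, ("merge" if factor > 0.5 else "split")


# Hypothetical document text, excluding the sentences being analyzed.
rest = "努力才会有收获。他相信自己能成功。大家都说他的才能出众。"

# "只有努力才能成功": context 力 / 成
split_case = support_factor(rest, "力", "才能", "成", 1)

# "他的才能很高": context 的 / 很
merge_case = support_factor(rest, "的", "才能", "很", 1)
```

With this sample text, the first call finds only split evidence and the second finds only merged evidence. A fuller treatment would use word-level context and a real corpus rather than raw character counts.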
Keywords: Chinese information processing; combinational ambiguity; co-occurrence support; ambiguity resolution; support-degree factor
Classification: TP391.12 [Automation and Computer Technology: Computer Application Technology]