Authors: Lei Chunya (雷春雅)[1], Guo Jianyi (郭剑毅)[1,2], Yu Zhengtao (余正涛)[1,2], Mao Cunli (毛存礼)[1,2], Zhang Shaomin (张少敏)[1], Huang Fu (黄甫)[1]
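Affiliations: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650051, China; [2] Institute of Intelligent Information Processing, Yunnan Key Laboratory of Computer Technology Applications, Kunming, Yunnan 650051, China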
Source: Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》), 2010, No. 5, pp. 141-145 (5 pages)
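Funding: National Natural Science Foundation of China (60863011); Key Project of the Natural Science Foundation of Yunnan Province (2008CC023); Yunnan Province Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2007PY01-11)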
Abstract: Automatic acquisition of entity relations is one of the difficult problems in information extraction. This paper proposes a method that combines a seed self-expansion algorithm with maximum entropy machine learning to extract entity relations automatically, using the tourism domain as the object of study. First, the self-expansion algorithm automatically acquires semantic words that express the broad relation class holding between an entity pair; these words are added as features to the feature set of the maximum entropy learner, and a threshold is set so that the training corpus can be labeled automatically. Then the maximum entropy algorithm is trained on the labeled corpus to build a classifier for entity relation extraction, so that entity relations are acquired automatically. Experiments on a manually collected corpus of 600 tourism-domain documents gave good results for the extraction of four broad classes of entity relations; the geographic location and date/season relations reached F-values of 82.56% and 81.17%, respectively. The results show that, with little manual intervention, adding the semantic words between entity pairs as features effectively improves extraction performance.
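The abstract describes a two-stage pipeline: bootstrap relation words from seeds, use them to auto-label training data above a threshold, then train a maximum entropy classifier whose features include those words. The following is only a minimal sketch of that idea under loudly stated assumptions, not the authors' implementation: toy English tokens stand in for segmented Chinese tourism text; the co-occurrence rule for self-expansion, the overlap threshold, the feature templates, the relation names, and every function name (`expand_seeds`, `auto_label`, `train_maxent`, `classify`, `f_score`) are hypothetical; the maximum entropy model is written as multinomial logistic regression (its conditional form) trained by plain SGD; and "F-value" is assumed to mean F1.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each context is the token list between an entity pair.
SEEDS = {"location": {"located"}, "season": {"bloom"}}
CORPUS = [
    ["located", "in", "north"],
    ["located", "in", "west"],
    ["best", "visit", "spring", "bloom"],
    ["best", "visit", "autumn", "bloom"],
    ["in", "north"],
    ["visit", "spring"],
    ["ticket", "price"],
]


def expand_seeds(seeds, contexts, rounds=2, min_count=2):
    """Hypothetical self-expansion rule: a word joins a class lexicon if it
    co-occurs with that class's known words in at least `min_count` contexts."""
    lexicon = {label: set(words) for label, words in seeds.items()}
    for _ in range(rounds):
        for words in lexicon.values():
            counts = Counter()
            for ctx in contexts:
                if words & set(ctx):
                    counts.update(set(ctx) - words)  # count each word once per context
            words |= {w for w, c in counts.items() if c >= min_count}
    return lexicon


def auto_label(contexts, lexicon, threshold=2):
    """Tag a context with the class whose lexicon it overlaps most, only if
    the overlap reaches `threshold`; weaker contexts stay unlabeled."""
    labeled = []
    for ctx in contexts:
        overlap = {label: len(words & set(ctx)) for label, words in lexicon.items()}
        best = max(overlap, key=overlap.get)
        if overlap[best] >= threshold:
            labeled.append((ctx, best))
    return labeled


def features(ctx, lexicon):
    """Binary features: context words plus one 'lexicon hit' feature per class."""
    feats = {"w=" + w for w in ctx}
    feats |= {"lex=" + label for label, words in lexicon.items() if words & set(ctx)}
    return feats


def predict_proba(weights, feats, labels):
    """Conditional MaxEnt: p(y|x) is proportional to exp(sum of active weights)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(data, labels, epochs=100, lr=0.5, l2=0.01):
    """SGD ascent on conditional log-likelihood, L2 applied to active features."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, feats, labels)
            for y in labels:
                # Gradient per active feature: observed minus expected count.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += lr * (grad - l2 * weights[(f, y)])
    return weights


def classify(weights, ctx, lexicon, labels):
    probs = predict_proba(weights, features(ctx, lexicon), labels)
    return max(probs, key=probs.get)


def f_score(gold, pred, label):
    """F1 for one relation class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


lexicon = expand_seeds(SEEDS, CORPUS)
labeled = auto_label(CORPUS, lexicon)  # the "ticket price" context is left unlabeled
labels = sorted(lexicon)
weights = train_maxent([(features(c, lexicon), y) for c, y in labeled], labels)
print(classify(weights, ["located", "near", "river"], lexicon, labels))  # location
print(classify(weights, ["visit", "summer"], lexicon, labels))  # season
```

In this sketch, raising `threshold` in `auto_label` shrinks the automatically labeled set but makes its labels more reliable, which is one way to trade coverage against the amount of manual checking; the `lex=` features are the sketch's stand-in for adding the expanded semantic words to the MaxEnt feature set.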
Classification number (CLC): TP391 [Automation and Computer Technology: Computer Application Technology]