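Affiliation: [1] Department of Computer Science and Technology, Central China Normal University, Wuhan 430079
Source: Computer Engineering (《计算机工程》), 2006, No. 21, pp. 183-184, 193 (3 pages)
Funding: National Natural Science Foundation of China (60442005); Key Project of the Science and Technology Research Fund, Ministry of Education (105117)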
Abstract: Extracting relations between named entities is an important research problem in information extraction. This paper proposes a method for automatically extracting named entity relations from large text collections. It finds all named entity pairs that occur in the same sentence with a word distance within a given range, and converts their contexts into vectors. A small number of named entity pairs that hold the target relation are selected by hand as the initial relation seed set, which is then expanded continuously through self-learning. The target named entity pairs are obtained by computing the context similarity between candidate pairs and the relation seeds. Expanding the seed set in this bootstrapping fashion improves both recall and precision. Tested on the People's Daily (PFR) corpus, the method achieves a weighted average F-Score of 0.813.
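The seed-expansion loop described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how context vectors are built or compared, so the bag-of-words counts, the cosine similarity, and the names `context_vector`, `cosine`, `bootstrap`, `threshold`, and `max_rounds` are all assumptions made here.

```python
import math
from collections import Counter


def context_vector(tokens):
    """Bag-of-words vector (term -> count) for an entity pair's context.

    Assumption: the paper's actual feature weighting is not given in the abstract.
    """
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bootstrap(candidates, seeds, threshold=0.5, max_rounds=10):
    """Grow the seed set with candidates whose context resembles any seed.

    candidates: dict mapping (entity1, entity2) -> list of context tokens
    seeds: pairs from `candidates` hand-picked as holding the relation
    threshold, max_rounds: hypothetical parameters, not taken from the paper
    """
    vectors = {pair: context_vector(toks) for pair, toks in candidates.items()}
    accepted = set(seeds)
    for _ in range(max_rounds):
        # Accept every remaining pair that is close enough to a current seed.
        new_pairs = {
            pair
            for pair, vec in vectors.items()
            if pair not in accepted
            and any(cosine(vec, vectors[s]) >= threshold for s in accepted)
        }
        if not new_pairs:
            break  # the seed set has stopped growing
        accepted |= new_pairs
    return accepted
```

Because pairs accepted in one round act as seeds in the next, a pair that is not similar enough to any hand-picked seed can still be reached through an intermediate pair, which is one plausible reading of how expanding the seed set raises recall.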
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]