Term Auto-tagging Algorithm Based on Base Glossary (基于底表的多层扫描术语自动标注算法)

Affiliations: [1] College of Humanities, Xiamen University, Xiamen, Fujian 361005; [2] Fujian Health College (福建卫生职业技术学院)

Source: Journal of Xiamen University: Natural Science (《厦门大学学报（自然科学版）》), 2011, No. 3, pp. 546-552 (7 pages)

Funding: National Social Science Fund of China (10BYY041); funded project of the National Language Resources Monitoring and Research Center (E080106-02)

Abstract: With the goal of building a term corpus of subject textbooks, this work implements a multi-layer scanning algorithm for automatic term tagging based on a base glossary. The algorithm first scans the text with predictive rule templates to find and tag out-of-vocabulary terms, that is, terms not listed in the glossary. It then uses maximum matching against the base glossary to identify every possible candidate term, treats each candidate as an anchor point, and scans its context, calling in turn the unit-term rule templates, exception rules, component rules, component exception rules, and exception-correction rules to decide whether the candidate string is a term and to tag it accordingly. With the predictive and restrictive functions of the rules in a supporting role, the method makes full use of the term information in the base glossary and achieves high tagging precision and recall; the F-measure (F-index) reaches about 84% in the open test.
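The two scanning layers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`max_match`, `tag_terms`), the `<term>` markup, the sample glossary, the regex prediction template, and the single exception rule are all assumptions made for illustration, and the component and exception-correction rules are omitted.

```python
import re

def max_match(text, glossary):
    """Forward maximum matching: at each position, take the longest glossary entry."""
    max_len = max((len(t) for t in glossary), default=0)
    spans, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in glossary:
                spans.append((i, j))
                i = j
                break
        else:
            i += 1  # no glossary entry starts here; advance one character
    return spans

def overlaps(span, accepted):
    """True if span shares at least one character with an accepted span."""
    return any(span[0] < e and s < span[1] for s, e in accepted)

def tag_terms(text, glossary, prediction_patterns=(), exception_rules=()):
    accepted = []
    # Layer 1: prediction templates (hypothetical regexes; group 1 is the term).
    for pattern in prediction_patterns:
        for m in re.finditer(pattern, text):
            if not overlaps(m.span(1), accepted):
                accepted.append(m.span(1))
    # Layer 2: glossary candidates act as anchor points; each exception rule
    # inspects the context and returns True to reject the candidate.
    for span in max_match(text, glossary):
        if overlaps(span, accepted):
            continue
        if any(rule(text, *span) for rule in exception_rules):
            continue
        accepted.append(span)
    # Emit the text with accepted spans wrapped in <term> markers.
    out, pos = [], 0
    for s, e in sorted(accepted):
        out.append(text[pos:s] + "<term>" + text[s:e] + "</term>")
        pos = e
    out.append(text[pos:])
    return "".join(out)

# Illustrative data only (not from the paper).
glossary = {"细胞", "细胞膜", "蛋白质", "分子"}
patterns = [r"称为([^，。]+)[，。]"]  # "called X" introduces a new term

def not_after_bu(text, start, end):
    # Reject "分子" when preceded by "部", as in "一部分子女".
    return text[start:end] == "分子" and start > 0 and text[start - 1] == "部"

print(tag_terms("细胞膜由蛋白质构成", glossary))
# <term>细胞膜</term>由<term>蛋白质</term>构成
print(tag_terms("一部分子女", glossary, exception_rules=[not_after_bu]))
# 一部分子女
print(tag_terms("这种结构称为线粒体。", glossary, patterns))
# 这种结构称为<term>线粒体</term>。
```

In this sketch the glossary lookup proposes candidates and the rules only accept or reject them, which mirrors the supporting role the abstract assigns to rules.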

Keywords: term; automatic tagging; rule; term component

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
