Which is More Effective for Chinese Lexical Analysis via Character Tagging: Above-context Versus Below-context  (Cited by: 3)

Original title: 字标注汉语词法分析中上文和下文孰重孰轻

Authors: Yu Jiangde (于江德)[1], Wang Xijie (王希杰)[1], Fan Xiaozhong (樊孝忠)[2]

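Affiliations: [1] School of Computer and Information Engineering, Anyang Normal University, Anyang 455000; [2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081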

Source: Computer Science (《计算机科学》), 2012, No. 11, pp. 201-203, 236 (4 pages)

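Funding: Supported by the Specialized Research Fund for the Doctoral Program of Higher Education (20050007023) and the Henan Province Young Backbone Teachers Program for Higher Education Institutions (2009GGJS-108)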

Abstract: Chinese lexical analysis is the foundation of Chinese information processing. The current mainstream techniques are statistical, and in essence they all treat lexical analysis as a sequence labeling problem. Context is the resource that statistical methods must rely on, both to acquire linguistic knowledge and to solve many practical problems in natural language processing. Chinese lexical analysis needs to draw linguistic knowledge from context, but are the above-context (the preceding text) and the below-context (the following text) equally important? To avoid relying on guesses based only on subjective experience, we studied three subtasks of character-tagging-based Chinese lexical analysis (word segmentation, POS tagging, and named entity recognition) and compared how the above-context and the below-context affect the performance of each. Closed tests were run on several corpora from the international Chinese language processing Bakeoff, with comparative experiments using feature template sets that represent the above-context and the below-context separately. The results show that, within the character-tagging framework, the below-context contributes more than 6 percentage points more to Chinese lexical analysis performance than the above-context.
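The abstract does not list the feature templates that were compared, so the following is only a minimal sketch of what "above-context" and "below-context" template sets could look like in a character-tagging setup. The window size of two characters, the template names (`C-1`, `C1C2`, and so on), the `<PAD>` boundary symbol, and the B/M/E/S tag set are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def char_at(sentence: str, i: int) -> str:
    """Return the character at index i, or PAD when i is out of range."""
    return sentence[i] if 0 <= i < len(sentence) else PAD


def above_features(sentence: str, i: int) -> List[str]:
    """Features built only from the current character and the text before it."""
    c = lambda k: char_at(sentence, i + k)
    return [
        f"C-2={c(-2)}",
        f"C-1={c(-1)}",
        f"C0={c(0)}",
        f"C-2C-1={c(-2)}{c(-1)}",
        f"C-1C0={c(-1)}{c(0)}",
    ]


def below_features(sentence: str, i: int) -> List[str]:
    """Features built only from the current character and the text after it."""
    c = lambda k: char_at(sentence, i + k)
    return [
        f"C0={c(0)}",
        f"C1={c(1)}",
        f"C2={c(2)}",
        f"C0C1={c(0)}{c(1)}",
        f"C1C2={c(1)}{c(2)}",
    ]


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into per-character tags (assumed B/M/E/S scheme)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags
```

Under these assumptions, a comparison like the one the abstract describes would train one sequence labeler on `above_features` only and another on `below_features` only, using the same per-character tags from `words_to_tags`, and then compare their scores on the same test set.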

Keywords: Chinese lexical analysis; character tagging; context; word segmentation; POS tagging; named entity recognition

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
