Preliminary Study of Chinese Word Segmentation and Part-of-Speech Tagging Being Used for Microblog Data (微博语料分词及标注方法初探)

Cited by: 1

Source: Journal of Xinjiang University (Natural Science Edition), 2013, No. 1, pp. 81-86 (6 pages)

Funding: National Natural Science Foundation of China (61163029)

Abstract: This paper applies Tsinghua University's Chinese word segmentation and part-of-speech tagging system to a sample of microblog data in order to test how well the system recognizes new words in microblog text. One finding is that the system fails to identify most of the new words in the data. The identification errors are classified and summarized, and an annotation guideline is defined for the new words with low recognition rates. Manual proofreading then yields a new training data set. The goal is to improve the system's handling of microblog text and to prepare for building a dedicated microblog corpus.

Keywords: word segmentation and tagging system; proper nouns; new words; microblog; corpus

Classification code: TP391.2 [Automation and Computer Technology: Computer Application Technology]
