基于词典词语量化关系的中文文本分割方法 (A Chinese text segmentation method based on quantified word relations in a dictionary). Times cited: 2

Research on Chinese text segmentation based on quantified conceptual relations extracted from a Chinese dictionary

Source: Computer Engineering and Applications (《计算机工程与应用》), 2008, No. 21, pp. 25-29, 88 (6 pages)

Funding: National Natural Science Foundation of China (Grant No. 60496326); Science and Technology Program of the Jiangxi Provincial Department of Education (No. [2006]178)

Abstract: With the rapid growth of Internet information resources, processing massive amounts of unstructured text has become a major challenge. Topic-based text segmentation is an important preprocessing step in text processing, and its performance directly affects the results of downstream tasks such as information retrieval, text summarization, and question-answering (Q-A) systems. Text segmentation must solve two fundamental problems: how to measure topical relevance, and how to design a strategy for identifying segment boundaries from the relevance of the context. To address these problems, this paper proposes a method for measuring the relevance between sentences based on Quantified Conceptual Relations (QCR) extracted from the Modern Chinese Standard Dictionary (MCSD), and builds a mathematical model that computes a Segmentation Value for each gap point between sentences, enabling sentence-level (rather than paragraph-level) Chinese text segmentation. Experiments on three groups of test corpora drawn from the national Chinese corpus (国家汉语语料库) show that both the average error probability Pk for boundary identification and its lowest value are lower than those of existing Chinese text segmentation methods.

Keywords: text segmentation; quantified word relations; sentence relevance measure; segmentation value of gap points

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
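The abstract names the pipeline (sentence relevance from quantified word relations, a Segmentation Value per gap point, boundary identification, evaluation by Pk) but not its formulas. The following is a minimal Python sketch under stated assumptions, not the authors' model. The `QCR` table values, the best-match averaging used for sentence relevance, the "1 minus relevance" gap value, and the fixed `threshold` boundary rule are all hypothetical choices made for illustration. The `pk` function follows the commonly used definition of Pk: the fraction of sentence pairs a distance k apart on which the reference and the hypothesis disagree about whether the two sentences lie in the same segment.

```python
"""Hypothetical sketch: gap-point segmentation values from word-relation scores, plus Pk."""

# Invented word-pair relation strengths in [0, 1]. In the paper these would be
# Quantified Conceptual Relations (QCR) extracted from the MCSD; these numbers
# are illustrative placeholders, not data from the paper.
QCR = {
    frozenset(("网络", "互联网")): 0.9,
    frozenset(("资源", "信息")): 0.6,
    frozenset(("足球", "球队")): 0.8,
}

# Pre-tokenized sentences (Chinese word segmentation is assumed to be done already).
SENTENCES = [
    ["网络", "资源"],
    ["互联网", "信息"],
    ["足球", "比赛"],
    ["球队", "比赛"],
]


def word_relation(w1, w2, qcr):
    """Relation strength between two words; identical words score 1.0 (assumption)."""
    if w1 == w2:
        return 1.0
    return qcr.get(frozenset((w1, w2)), 0.0)


def sentence_relevance(s1, s2, qcr):
    """Symmetric best-match average of word relations between two sentences (assumption)."""
    if not s1 or not s2:
        return 0.0

    def directed(a, b):
        # For each word in a, take its strongest relation to any word in b.
        return sum(max(word_relation(x, y, qcr) for y in b) for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def gap_values(sentences, qcr):
    """Hypothetical value for the gap after sentence i: 1 minus relevance of sentences i and i+1."""
    return [1.0 - sentence_relevance(sentences[i], sentences[i + 1], qcr)
            for i in range(len(sentences) - 1)]


def segment(sentences, qcr, threshold=0.5):
    """Mark a boundary at every gap whose value exceeds a fixed threshold (assumption)."""
    return [v > threshold for v in gap_values(sentences, qcr)]


def boundaries_to_labels(boundaries):
    """Convert per-gap boundary flags into a segment id for each sentence."""
    labels = [0]
    for is_boundary in boundaries:
        labels.append(labels[-1] + (1 if is_boundary else 0))
    return labels


def pk(ref_labels, hyp_labels, k=None):
    """Pk: share of pairs (i, i+k) on which ref and hyp disagree about 'same segment'."""
    n = len(ref_labels)
    if len(hyp_labels) != n:
        raise ValueError("reference and hypothesis must cover the same sentences")
    if k is None:
        # Conventional choice: half the mean reference segment length.
        k = max(1, round(n / len(set(ref_labels)) / 2))
    trials = n - k
    if trials <= 0:
        raise ValueError("text too short for the chosen k")
    errors = sum(
        (ref_labels[i] == ref_labels[i + k]) != (hyp_labels[i] == hyp_labels[i + k])
        for i in range(trials)
    )
    return errors / trials


if __name__ == "__main__":
    hyp = boundaries_to_labels(segment(SENTENCES, QCR))
    ref = [0, 0, 1, 1]
    print(gap_values(SENTENCES, QCR))
    print(hyp, pk(ref, hyp))
```

On this toy input the gap values come out at about 0.25, 1.0 and 0.1, so the only boundary falls between the second and third sentences. That matches the reference segmentation `[0, 0, 1, 1]`, and Pk is 0.0. A fixed threshold is the simplest possible boundary strategy; the paper's model for the Segmentation Value may weigh wider context or compare neighbouring gaps instead.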