A New Word Detection Method for Chinese Based on Local Context Information  Cited by: 1

Authors: 曾华琳 (Zeng Hualin), 周昌乐 (Zhou Changle), 郑旭玲 (Zheng Xuling)

Affiliation: [1] Department of Cognitive Science, Fujian Key Laboratory of the Brain-like Intelligent Systems, Xiamen University

Source: Journal of Donghua University (English Edition) (东华大学学报(英文版)), 2010, No. 2, pp. 189-192 (4 pages)

Funding: National Natural Science Foundation of China (No. 60903129); National High Technology Research and Development Program of China (No. 2006AA010107, No. 2006AA010108); Foundation of Fujian Province of China (No. 2008F3105)

Abstract: Detecting out-of-vocabulary words is an urgent and difficult task in Chinese word segmentation. To avoid the defects caused by offline training in traditional methods, this paper proposes an improved prediction by partial match (PPM) segmentation algorithm for Chinese words based on extracting local context information. The algorithm adds the context information of the test text to the local PPM statistical model to guide the detection of new words. It focuses on online segmentation and new word detection, performs well in both closed and open tests, and outperforms some well-known Chinese segmentation systems to a certain extent.
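The idea of adding test-text statistics to a PPM-style model can be illustrated with a small sketch. This is not the authors' algorithm, and it performs no segmentation. It is a minimal adaptive order-1 character model with PPM-style escape estimation (method C, without exclusion), written under the assumption that "local context" means character counts gathered from the text being processed. The class name `Order1PPM`, the sample strings, and the alphabet size are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class Order1PPM:
    """Hypothetical adaptive order-1 character model with method-C escapes."""

    def __init__(self, alphabet_size=65536):
        self.ctx = defaultdict(Counter)  # previous char -> counts of next char
        self.uni = Counter()             # order-0 character counts
        self.alphabet_size = alphabet_size  # assumed size for the uniform fallback

    def update(self, text):
        """Add the character statistics of `text` to the model."""
        prev = None
        for ch in text:
            self.uni[ch] += 1
            if prev is not None:
                self.ctx[prev][ch] += 1
            prev = ch

    def _order0(self, ch):
        n, d = sum(self.uni.values()), len(self.uni)
        if self.uni[ch]:
            return self.uni[ch] / (n + d)
        escape = d / (n + d) if n else 1.0
        return escape / self.alphabet_size  # uniform order -1 fallback

    def prob(self, prev, ch):
        """P(ch | prev), escaping to order 0 when `ch` is unseen after `prev`."""
        counts = self.ctx.get(prev)
        if not counts:
            return self._order0(ch)
        n, d = sum(counts.values()), len(counts)
        if counts[ch]:
            return counts[ch] / (n + d)
        return d / (n + d) * self._order0(ch)

    def cost(self, text):
        """Code length of `text` in bits under the current model."""
        bits, prev = 0.0, None
        for ch in text:
            p = self._order0(ch) if prev is None else self.prob(prev, ch)
            bits -= math.log2(p)
            prev = ch
        return bits


model = Order1PPM()
model.update("我们在厦门大学学习")            # stand-in for offline training data
candidate = "脑机接口"                        # a string unseen in training
before = model.cost(candidate)
model.update("脑机接口是新技术，脑机接口很重要")  # local context from the test text
after = model.cost(candidate)
print(f"cost before: {before:.1f} bits, after: {after:.1f} bits")
```

After the model absorbs the test text, the candidate string becomes much cheaper to encode, which is the kind of signal a segmenter could use to treat a recurring string as a possible new word.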

Keywords: new word detection; improved PPM model; context information; Chinese word segmentation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
