Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression  Cited by: 8



Authors: Sun Xu (孙栩), Wang Houfeng (王厚峰), Wang Bo (王波)

Affiliation: [1] Institute of Computational Linguistics, School of Electronics Engineering and Computer Science

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2008, Issue 4, pp. 602-611 (10 pages)

Funding: National Natural Science Foundation of China (Grant Nos. 60473138 and 60675035); Beijing Natural Science Foundation (Grant No. 4072012).

Abstract: In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its corresponding definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates, which are generated automatically from the definition. Using Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are then used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction. The SVR method also outperforms the hidden Markov model (HMM) on abbreviation prediction.
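The generate, score, and rank pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the record does not say how candidates are generated, so the sketch enumerates order-preserving character subsequences of the definition, and `toy_score` is a hypothetical hand-written stand-in for a trained SVR model. The names `generate_candidates`, `rank_candidates`, and `toy_score` are likewise invented here.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Enumerate order-preserving character subsequences of the definition.

    Assumption: a candidate keeps characters in their original order and is
    strictly shorter than the definition itself.
    """
    n = len(definition)
    candidates = set()
    for k in range(min_len, n):
        for idx in combinations(range(n), k):
            candidates.add("".join(definition[i] for i in idx))
    return sorted(candidates)


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical stand-in for a trained regressor (NOT a real SVR).

    Rewards candidates that keep the first character and penalizes length.
    """
    keeps_first = 1.0 if candidate[0] == definition[0] else 0.0
    return keeps_first - 0.1 * len(candidate)


def rank_candidates(
    definition: str,
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_k, highest score first."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    # 北京大学 (Peking University) is commonly abbreviated as 北大.
    for candidate, value in rank_candidates("北京大学", toy_score, top_k=5):
        print(candidate, round(value, 2))
```

In a real system, `toy_score` would be replaced by a regressor trained on features of (definition, candidate) pairs, and the ranked list would retain the regression values so that several candidates can be returned rather than a single guess.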

Keywords: statistical natural language processing; abbreviation prediction; support vector regression; word clustering

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
