Encoding Scheme Based on Words (以词为本的编码方案的探讨)


Author: Cheng Yuanbin (程元斌)[1]

Affiliation: [1] School of Mathematics and Computer Science, Jianghan University, Wuhan 430056, Hubei, China

Source: Journal of Jianghan University: Natural Science Edition (《江汉大学学报(自然科学版)》), 2013, No. 2, pp. 47-52 (6 pages)

Abstract: Language is the main tool of human thought, and the word is the basic unit of language processing. In computer information processing, however, encodings are currently designed per character. As information processing technology has developed, the shortcomings of purely character-based encoding have become increasingly apparent. Starting from the basic requirements of information processing and the basic properties of words, this paper proposes a unified, word-oriented encoding scheme that considers both characters and words. The scheme builds on UTF-16, a major current encoding standard: it retains the existing character codes and adds word codes. Word codes are organized logically by a concept space tree that carries some semantic information and semantic relations, and are organized spatially according to the principles of supporting cluster retrieval and code conversion between languages. Finally, the paper points out several difficult problems that need further study.

Keywords: word encoding; UTF-16; cluster retrieval; concept space tree; natural language processing

Classification number: TP391.11 [Automation and Computer Technology: Computer Application Technology]

 
