Author: 傅爱平 (Fu Aiping) [1]
Source: 《中文信息学报》 (Journal of Chinese Information Processing), 1999, No. 5, pp. 7-13 (7 pages)
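Abstract: The first problem encountered in source-language analysis for Chinese-English MT is word recognition. The "word" has no clear definition in Chinese, and there are no sharp boundaries between morphemes and words, between words and phrases, or between phrases and sentences. Under the approach of segmenting first and parsing afterwards, segmentation runs into the difficulty that word-formation problems and syntactic problems are intertwined. The author argues that the character can be taken as the starting point of source-language syntactic analysis, so that words and phrases are recognized at the same time as syntactic analysis proceeds. This paper describes this view and its implementation and, taking the handling of separable words (离合词) as an example, illustrates the basic recognition method.
Abstract (original English): The first problem we have met in source language analysis in a Chinese-English MT system is Chinese sentence tokenization, as in written Chinese there is no explicit word delimiter. Finding token boundaries for a character string is often interlaced with syntactic parsing, or even with semantic relations. This paper presents an approach that combines sentence tokenization with syntactic and semantic analysis. Instead of obtaining a tokenized word string before sentence parsing, the tokenizing component is built into the parser, i.e., syntactic and semantic information can be used for recognizing words when necessary during parsing, which is supported by a dictionary with descriptions of individual usage and a set of common rules.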
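The abstract's separable-word example can be illustrated with a small sketch. This is not the paper's parser: it is a simplified, rule-based stand-in that scans characters directly and uses the intervening context to decide whether two non-adjacent characters form one word. The lexicon `SEPARABLE`, the set `INSERTABLE`, the function `find_separable`, and the `max_gap` parameter are all hypothetical names introduced here for illustration.

```python
# Hypothetical mini-lexicon of separable words: (verb morpheme, object morpheme) -> lemma.
SEPARABLE = {("洗", "澡"): "洗澡", ("睡", "觉"): "睡觉"}

# Hypothetical set of characters allowed between the two halves
# (aspect markers, numerals, classifiers). A real system would use syntactic rules.
INSERTABLE = set("了过着一个次会完")


def find_separable(chars, max_gap=4):
    """Return (start, end, lemma) for each separable word found in a character string.

    Works on characters directly, without a prior segmentation pass.
    At most `max_gap` characters may stand between the two halves.
    """
    hits = []
    for i, head in enumerate(chars):
        for (verb, obj), lemma in SEPARABLE.items():
            if head != verb:
                continue
            # Look ahead for the object morpheme within the allowed gap.
            for j in range(i + 1, min(len(chars), i + 2 + max_gap)):
                if chars[j] == obj:
                    hits.append((i, j, lemma))
                    break
                if chars[j] not in INSERTABLE:
                    # Intervening material is not licensed, so stop looking.
                    break
    return hits


if __name__ == "__main__":
    for sentence in ("他洗澡", "他洗了一个澡", "他洗衣服"):
        print(sentence, find_separable(sentence))
```

A segment-first pipeline would have to commit to a boundary after 洗 before seeing 澡; scanning characters with context available defers that decision, which is the point the abstract makes about performing recognition during parsing.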