《红楼梦》词和N元文法分析  (Cited by: 10)

Words and N-gram Models Analysis for “A Dream of Red Mansions”


Authors: 肖天久 (Xiao Tianjiu), 刘颖 (Liu Ying)[1]

Affiliation: [1] Department of Chinese Language and Literature, Tsinghua University, Beijing 100084

Source: New Technology of Library and Information Service (《现代图书情报技术》), 2015, No. 4, pp. 50-57 (8 pages)

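Funding: One of the research outputs of the National Natural Science Foundation of China project "Modeling of Interactive Behavior and Linguistic Features Based on Pragmatic Information" (Grant No. 61171114) and the Ministry of Education independent research project "Construction of a Social Pragmatic Information Network Based on Large-Scale Corpora" (Grant No. 20111081010).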

Abstract: [Objective] To study the relationship between the first 80 chapters and the last 40 chapters of "A Dream of Red Mansions", and thereby judge whether the novel was written by a single author. [Methods] Quantitative statistics are combined with qualitative analysis. The words unique to the first 40, middle 40, and last 40 chapters are compared; clustering is performed on function words, on N-gram models of words and parts of speech, on content words, and on word length; and the similarity among the three parts is computed from high-frequency words. [Results] The first 80 chapters differ from the last 40 chapters. The first 80 chapters use words more coherently, pay more attention to the description of details, contain fewer long words, and are more readable. The last 40 chapters focus more on the description of actions and scenes, contain more long words, and are slightly less readable. [Limitations] The analysis is restricted to words and N-grams; semantic and discourse-level features are not examined. [Conclusions] Judging from words, parts of speech, phrase strings, and part-of-speech strings, the first 80 chapters and the last 40 chapters were very likely not written by the same author.

Keywords: style analysis; hierarchical clustering; K-means clustering; N-gram

Classification code: H15 [Language and Linguistics: Chinese]
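
The abstract names three kinds of measurement: words unique to each 40-chapter part, N-gram counts, and pairwise similarity between the parts. The sketch below illustrates those ideas on toy data only. It assumes the text has already been segmented into words, it uses cosine similarity because the abstract does not say which similarity measure the authors used, and the tokens `w1` to `w5`, the `parts` dictionary, and all function names are hypothetical placeholders rather than anything taken from the paper.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def unique_words(target, *others):
    """Words that occur in `target` but in none of the other parts."""
    seen_elsewhere = set().union(*others)
    return set(target) - seen_elsewhere


# Toy, pre-segmented stand-ins for the three 40-chapter parts (assumption:
# real input would come from a Chinese word segmenter, not shown here).
parts = {
    "first": "w1 w2 w3 w1 w2".split(),
    "middle": "w1 w2 w3 w2 w1".split(),
    "last": "w4 w2 w4 w2 w5".split(),
}

# Bigram profile for each part.
bigrams = {name: ngram_counts(tokens, 2) for name, tokens in parts.items()}

# Pairwise similarity between the bigram profiles.
names = list(parts)
for i, left in enumerate(names):
    for right in names[i + 1:]:
        score = cosine_similarity(bigrams[left], bigrams[right])
        print(f"{left} vs {right}: {score:.3f}")

# Words found only in the last part.
print(sorted(unique_words(parts["last"], parts["first"], parts["middle"])))
```

The same count vectors could be passed to a hierarchical or K-means clustering routine, the two clustering methods listed in the keywords. That step is left out here because the record does not give the feature weighting or distance settings the authors used.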

 
