A Study on the Application of Data Mining Technology in Text Feature Analysis: Taking Natsume Soseki's Novels as an Example

Times cited: 10


Author: MAO Wen-wei (毛文伟)

Affiliation: [1] Office of Research Affairs, Shanghai International Studies University, Shanghai 200083, China

Source: Technology Enhanced Foreign Language Education (外语电化教学), 2018, No. 6, pp. 8-15 (8 pages)

Abstract: This study applies data mining techniques to Natsume Soseki's medium-length and long novels in order to classify and analyze the textual features of literary works effectively. Cluster analysis shows that, with 1908 as a dividing line, the novels fall into three periods: early, transitional, and late. T-test results show that the works are consistent in noun ratio, verb ratio, modifier ratio, and MVR, but differ significantly in several sentence-level indicators. The early and transitional works differ in the ratio of sentences containing conjunctions and the ratio of non-past-tense sentences; the transitional and late works differ in the ratio of sentences ending in the non-past tense; and the early and late works differ in the ratio of sentences containing conjunctions and the ratio of non-past-tense sentences. Standardizing the indicators with the Rank Cases procedure reveals what the works have in common: the texts lean toward description, and in particular toward describing situations. Sentences in the early works are very short and easy to understand; in later works they grow gradually longer but remain relatively short and easy to understand. The connections between sentences grow steadily stronger, so that successive sentences are more closely linked in meaning and the expression becomes more logical. In narrative mode, the works shift from vivid to objective description and from a first-person to a third-person perspective.
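The abstract names several per-text indicators without defining them. As a rough illustration only, the sketch below computes them from sentences that have already been POS-tagged into coarse categories; the labels, the Sentence class, and the function names are hypothetical, and a real pipeline would first run a Japanese morphological analyzer. The operationalizations are assumptions, not the author's: noun, verb, and modifier ratios are taken as shares of content words; MVR is taken as modifiers (adjectives, adverbs, adnominals) per 100 verbs, a common definition in Japanese quantitative stylistics; and the two sentence-level ratios are the share of sentences containing a conjunction and the share ending in a non-past predicate.

```python
import math
from dataclasses import dataclass

# Hypothetical coarse POS labels; a real pipeline would map morphological
# analyzer output onto categories like these.
NOUN, VERB, ADJ, ADV, ADNOMINAL, CONJ, OTHER = (
    "noun", "verb", "adj", "adv", "adnominal", "conj", "other")

# Assumption: content words = independent words; particles, auxiliaries,
# and punctuation are tagged OTHER and excluded from the word ratios.
CONTENT_WORDS = {NOUN, VERB, ADJ, ADV, ADNOMINAL, CONJ}
MODIFIERS = {ADJ, ADV, ADNOMINAL}


@dataclass
class Sentence:
    pos_tags: list[str]   # one coarse POS label per token
    ends_non_past: bool   # True if the sentence-final predicate is non-past


def _ratio(part: float, whole: float) -> float:
    """Safe division: NaN instead of ZeroDivisionError for an empty denominator."""
    return part / whole if whole else math.nan


def text_indicators(sentences: list[Sentence]) -> dict[str, float]:
    """Compute illustrative (assumed) stylometric indicators for one text."""
    tags = [tag for s in sentences for tag in s.pos_tags]
    content = [tag for tag in tags if tag in CONTENT_WORDS]
    n_modifiers = sum(tag in MODIFIERS for tag in content)
    n_verbs = content.count(VERB)
    n_sentences = len(sentences)
    return {
        "noun_ratio": _ratio(content.count(NOUN), len(content)),
        "verb_ratio": _ratio(n_verbs, len(content)),
        "modifier_ratio": _ratio(n_modifiers, len(content)),
        # Assumed MVR definition: modifiers per 100 verbs.
        "MVR": 100 * _ratio(n_modifiers, n_verbs),
        # Mean sentence length in tokens.
        "mean_sentence_length": _ratio(len(tags), n_sentences),
        "conj_sentence_ratio": _ratio(
            sum(CONJ in s.pos_tags for s in sentences), n_sentences),
        "non_past_ending_ratio": _ratio(
            sum(s.ends_non_past for s in sentences), n_sentences),
    }


# Two toy, hand-tagged sentences (hypothetical input, not taken from the novels).
toy = [
    Sentence([NOUN, OTHER, VERB], ends_non_past=False),
    Sentence([CONJ, ADJ, NOUN, OTHER, VERB], ends_non_past=True),
]
# Here MVR = 100 * 1 modifier / 2 verbs = 50.0
print(text_indicators(toy))
```

Computing every indicator from the same tagged input keeps them comparable across works; the exact word classes counted as modifiers or content words would need to match whatever scheme the study actually used.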
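The analysis steps summarized in the abstract (standardize the indicators, cluster the works, then compare the resulting groups with t-tests) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's procedure: it uses z-scores rather than the Rank Cases procedure the abstract mentions, average-linkage agglomerative clustering on Euclidean distance (the abstract does not name a linkage method), and Welch's unequal-variance t statistic. The function names are hypothetical and the indicator values are invented toy numbers, not data from the paper.

```python
import math
from statistics import mean, stdev, variance


def zscore_columns(rows: list[list[float]]) -> list[list[float]]:
    """Standardize each indicator column to mean 0 and sample SD 1.
    Assumes every column varies (stdev > 0)."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def average_linkage(points: list[list[float]], k: int) -> list[list[int]]:
    """Agglomerative clustering (average linkage, Euclidean distance)
    down to k clusters; returns lists of row indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return mean(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented indicator values for six hypothetical works: [indicator_1, indicator_2].
works = [[0.10, 1.0], [0.12, 1.1], [0.11, 0.9],
         [0.30, 3.0], [0.32, 3.1], [0.31, 2.9]]
groups = average_linkage(zscore_columns(works), k=2)
print(groups)

# Compare the two clusters on the first (unstandardized) indicator.
g1, g2 = (sorted(g) for g in groups)
t, df = welch_t([works[i][0] for i in g1], [works[i][0] for i in g2])
print(f"t = {t:.2f}, df = {df:.1f}")
```

To turn the statistic into a p-value, t and df can be looked up in a t distribution; with SciPy installed, scipy.stats.ttest_ind(a, b, equal_var=False) runs the same Welch test directly. Standardizing before clustering matters because the raw indicators live on very different scales (ratios near 0 to 1 versus MVR values in the tens), and unscaled Euclidean distance would be dominated by the largest-valued indicator.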

Keywords: data mining; cluster analysis; Japanese literature; text features

Classification number: H319.3 [Language and Writing: English]

 
