Authors: Li Xiaochao (李晓超) [1,2,3]; Jia Liguo (贾立国) [4]; Luo Yan (罗燕) [1,2,3]; Chen Min (陈敏) [1,2,3]; Liu Mengmeng (柳萌萌) [1,2,3]; Zhao Shuliang (赵书良) [1,2,3]
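Affiliations: [1] College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024; [2] Hebei Key Laboratory of Computational Mathematics and Applications, Hebei Normal University, Shijiazhuang 050024; [3] Institute of Mobile Internet of Things, Hebei Normal University, Shijiazhuang 050024; [4] Academic Affairs Office, Hebei Normal University, Shijiazhuang 050024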
Source: Journal of Intelligence (《情报杂志》), 2015, No. 6, pp. 62-67 (6 pages)
Abstract: Booth's law describes how words of equal frequency are distributed in English text, but few scholars have studied in depth whether it also applies to Chinese text. To examine the applicability of Booth's law to Chinese text and to reveal the statistical regularities of equal-frequency words in Chinese, the authors carried out a statistical study of equal-frequency words over a large number of Chinese texts, paying particular attention to the scale of the experimental data and to the range of text lengths covered. The experiments show that, as text length increases, the ratio of the number of low-frequency words occurring n times to the number of distinct words is not constant but gradually decreases, and that the number of low-frequency words occurring n times grows as a power function of the number of distinct words. In addition, as text length increases, the ratio of the number of words occurring n times to the number of words occurring once is also not constant but gradually increases. These results are inconsistent with Booth's experimental results on English text, so the authors conclude that Booth's law does not fit Chinese text.
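The two ratios discussed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `booth_ratios`, the parameter `max_n`, and the toy token list are hypothetical, and the input is assumed to be already segmented into words (Chinese text needs a word segmenter first, which this record does not specify). The `booth_prediction` column uses the commonly cited form of Booth's law, I_n / I_1 = 2 / (n(n+1)), which is an assumption added here and is not stated in this record.

```python
from collections import Counter


def booth_ratios(tokens, max_n=5):
    """Compute same-frequency word statistics for a list of word tokens.

    For each frequency n in 1..max_n, reports:
      I_n      number of distinct words occurring exactly n times
      I_n/D    ratio to the number of distinct words D
      I_n/I_1  ratio to the number of words occurring exactly once
    """
    word_freq = Counter(tokens)               # word -> occurrence count
    distinct = len(word_freq)                 # D: number of distinct words
    same_freq = Counter(word_freq.values())   # n -> I_n
    i_1 = same_freq.get(1, 0)

    rows = []
    for n in range(1, max_n + 1):
        i_n = same_freq.get(n, 0)
        rows.append({
            "n": n,
            "I_n": i_n,
            "I_n/D": i_n / distinct if distinct else None,
            "I_n/I_1": i_n / i_1 if i_1 else None,
            # Assumed textbook form of Booth's law, not taken from this record.
            "booth_prediction": 2 / (n * (n + 1)),
        })
    return rows


if __name__ == "__main__":
    # Toy, pre-segmented input; a real study would use long segmented texts.
    sample = ["a", "b", "b", "c", "c", "c", "d"]
    for row in booth_ratios(sample, max_n=3):
        print(row)
```

Running this over texts of increasing length and comparing the `I_n/D` and `I_n/I_1` columns across lengths would be one way to reproduce the kind of trend the abstract reports, under the assumptions stated above.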
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]