Author: 安兴茹 (An Xingru) [1]
Source: 《情报杂志》 (Journal of Intelligence), 2014, No. 10, pp. 129-136 (8 pages)
Abstract: With the rapid growth of information and the development of information analysis, word frequency analysis is increasingly widely used. Identifying its high-frequency keywords or subject terms is an important foundation for that analysis. First, based on a statistical review of prior literature, the paper summarizes the four methods currently used to determine high-frequency words: TOPN, WF>=M, %WF=P, and the T formula. These methods rely heavily on experience, are arbitrary, lack a theoretical basis, and have limited applicability. Next, the paper empirically verifies that keywords and subject terms in a literature database follow a normal distribution and, based on the properties of the normal distribution, proposes the F formula for computing the high-frequency word threshold. Finally, comparing the F formula with the T formula on multiple datasets, the paper concludes that the normal-distribution-based F formula performs well in both theoretical grounding and applicability.
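The four existing thresholding rules named in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes term frequencies have already been counted into a `collections.Counter`. The %WF=P rule is read here as "take terms in descending frequency until they cover a fraction P of all occurrences", which is an assumed interpretation. The T formula is assumed to be Donohue's high/low-frequency boundary, T = (-1 + √(1 + 8·I₁))/2, where I₁ is the number of terms that occur exactly once. The function names and toy counts are hypothetical. The paper's F formula is not sketched because the abstract does not state it.

```python
import math
from collections import Counter


def top_n(freqs: Counter, n: int) -> list[str]:
    """TOPN: keep the n most frequent terms."""
    return [term for term, _ in freqs.most_common(n)]


def wf_at_least(freqs: Counter, m: int) -> list[str]:
    """WF>=M: keep every term whose frequency is at least m."""
    return [term for term, f in freqs.most_common() if f >= m]


def cumulative_share(freqs: Counter, p: float) -> list[str]:
    """%WF=P (assumed reading): walk terms in descending frequency and stop
    once the selected terms cover at least a fraction p of all occurrences."""
    total = sum(freqs.values())
    selected: list[str] = []
    covered = 0
    for term, f in freqs.most_common():
        if total == 0 or covered / total >= p:
            break
        selected.append(term)
        covered += f
    return selected


def donohue_threshold(freqs: Counter) -> float:
    """T formula (assumed to be Donohue's): T = (-1 + sqrt(1 + 8*I1)) / 2,
    where I1 is the number of distinct terms that occur exactly once."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def t_method(freqs: Counter) -> list[str]:
    """Keep the terms whose frequency reaches the threshold T."""
    t = donohue_threshold(freqs)
    return [term for term, f in freqs.most_common() if f >= t]


# Hypothetical toy counts, e.g. keyword occurrences across a set of records.
toy = Counter({
    "word frequency analysis": 10,
    "bibliometrics": 6,
    "co-word analysis": 3,
    "zipf's law": 1,
    "citation analysis": 1,
    "ontology": 1,
    "text mining": 1,
    "h-index": 1,
    "clustering": 1,
})

if __name__ == "__main__":
    print("TOPN (n=2):   ", top_n(toy, 2))
    print("WF>=M (m=6):  ", wf_at_least(toy, 6))
    print("%WF=P (p=0.6):", cumulative_share(toy, 0.6))
    print("T =", donohue_threshold(toy), "->", t_method(toy))
```

In the toy counts, six terms occur once. That gives I₁ = 6 and T = (-1 + 7)/2 = 3, so the T method keeps the three terms with frequency 3 or more. TOPN, WF>=M, and %WF=P each depend on a hand-picked parameter (n, m, or p). That dependence is the arbitrariness the abstract criticizes. The T formula, by contrast, derives its cutoff from the data.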