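Affiliation: [1] East China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences; Key and Open Laboratory of Remote Sensing Information Technology in Fishery Resources, Chinese Academy of Fishery Sciences, Shanghai 200090, China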
Source: Fishery Information & Strategy (《渔业信息与战略》), 2014, No. 1, pp. 54-59 (6 pages)
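Funding: National Key Technology R&D Program of the 12th Five-Year Plan (2013BAD13B01); Science and Technology Commission of Shanghai Municipality project (12511501200).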
Abstract: This article uses 9,894 lapsed aquatic patents with publication dates from January 1, 1992 to December 31, 2011 as the text corpus for data mining. First, 56 patents are selected and their abstracts are split into words with a word segmenter. Second, a chi-square test filters out the words that are only weakly associated with the categories, and the remaining words form a term matrix. Third, a Naive Bayes classifier is trained on this matrix and implemented as a program. Fourth, the trained program is tested on lapsed patents. Finally, once it passes the test, the program classifies the abstracts of all the patents, and the classification results are analyzed and verified. The verification shows that the program classifies the texts with an accuracy of 85%, which is reliable enough for it to be used for text classification. The lapsed aquatic patents can therefore be sorted into predefined categories to show how they are distributed over a given period, providing a reference for future decision-making.
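The pipeline described in the abstract (segmented abstracts, chi-square term filtering, Naive Bayes training, classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' program: the abstract does not name the segmenter, the categories, or the Naive Bayes variant, so the sketch assumes abstracts that are already tokenized, two hypothetical categories (`aquaculture`, `processing`), toy English tokens, and a multinomial model with add-one smoothing. All function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def chi_square(docs, labels, term, category):
    """Chi-square statistic for one (term, category) pair from a 2x2 table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if label == category:
            a += has_term          # in category, contains term
            c += not has_term      # in category, lacks term
        else:
            b += has_term          # other category, contains term
            d += not has_term      # other category, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_terms(docs, labels, top_k):
    """Keep the top_k terms by their best chi-square score over all categories."""
    vocab = {t for tokens in docs for t in tokens}
    cats = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    # Ties are broken alphabetically so the selection is deterministic.
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:top_k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (an assumption)."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for c, n_docs in class_docs.items():
        total = sum(term_counts[c].values())
        log_prior = math.log(n_docs / len(labels))
        log_like = {t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                    for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def classify(model, tokens):
    """Return the category with the highest log posterior; unseen terms are ignored."""
    def log_posterior(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(sorted(model), key=log_posterior)


# Toy, already-segmented training abstracts (hypothetical data).
docs = [
    ["pond", "feed", "fish", "method"],
    ["feed", "shrimp", "pond", "method"],
    ["fillet", "freezing", "fish", "method"],
    ["freezing", "drying", "shrimp", "method"],
]
labels = ["aquaculture", "aquaculture", "processing", "processing"]

selected = select_terms(docs, labels, top_k=3)
model = train_nb(docs, labels, selected)
print(sorted(selected))
print(classify(model, ["pond", "shrimp", "feed"]))
```

In this toy data, words such as "method" and "fish" occur equally in both categories, so their chi-square score is zero and they are filtered out before training, which mirrors the filtering step the abstract describes. A real run would replace the toy lists with segmented patent abstracts and hold out a labeled test set to measure accuracy.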