Author: Zhang Xiaochuan (张晓川)[1]
Affiliation: [1] China Mobile Communications Group Guangdong Co., Ltd., Guangzhou, Guangdong 510000, China
Source: Modern Scientific Instruments (《现代科学仪器》), 2020, No. 3, pp. 193-196 (4 pages)
Abstract: Weibo data search suffers from information asymmetry and from low search accuracy and efficiency. To address these problems, the paper uses artificial intelligence inference engine technology to collect and acquire Weibo web page data, customized to user-side requirements. Artificial intelligence is applied to the web crawler program, which is improved through the API combined with a hybrid strategy; this eliminates the irrelevant links encountered during crawling and raises the precision of information collection. In the data mining stage, the term frequency-inverse document frequency (TF-IDF) function, which reflects semantic similarity, is used to calculate the semantic similarity of words. Feature weights are introduced as an indicator of the importance of feature vectors, and a combined similarity strategy is established to improve algorithm performance. Results from a worked example show that the combined strategy effectively improves the accuracy and recall of data mining, reduces the algorithm's error rate, and allows the artificial intelligence platform to capture Weibo data accurately.
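The abstract does not give the formulas behind the combined similarity strategy, so the following is only a minimal sketch of one plausible reading: TF-IDF vectors compared by cosine similarity, blended with a feature-weighted cosine. The function names, the `weights` dictionary, and the mixing factor `alpha` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v, weights=None):
    """Cosine similarity of two sparse vectors, optionally feature-weighted."""
    weights = weights or {}

    def w(term):
        # Terms without an explicit weight count as 1.0 (assumption).
        return weights.get(term, 1.0)

    dot = sum(w(t) * u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w(t) * x * x for t, x in u.items()))
    norm_v = math.sqrt(sum(w(t) * x * x for t, x in v.items()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(u, v, weights, alpha=0.5):
    """Blend plain and feature-weighted cosine; alpha is a hypothetical knob."""
    return alpha * cosine(u, v) + (1 - alpha) * cosine(u, v, weights)
```

With tokenized posts such as `[["weibo", "data", "mining"], ["weibo", "crawler"], ["data", "mining", "mining"]]`, the first and third posts score higher against each other than the first and second, because they share more terms. Note that a term occurring in every document gets a TF-IDF weight of zero under this unsmoothed formula.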
Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]