Research on a Big-Data-Based Scientific Research Hotspot Analysis System

Hot spot analysis visualization research based on big data


Authors: GUO Runping[1], CHEN Baoguo[1], XIONG Guifang[1] (Xi'an Siyuan University, Xi'an 710038, China)

Affiliation: [1] Xi'an Siyuan University, Xi'an 710038

Source: Automation & Instrumentation (《自动化与仪器仪表》), 2022, No. 5, pp. 136-141 (6 pages)

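Funding: Shaanxi Province Social Science Community Major Theoretical and Practical Issues Research Project, "Research on Developing the Elderly Care Service Industry from a Big Data Perspective" (No. 2020Z400).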

Abstract: Traditional data processing algorithms for research hotspot analysis have three shortcomings: they ignore the semantics of the text, their encoding scheme places heavy load on servers and storage, and the algorithm for choosing the optimal number of topics is overly subjective. This work addresses each in turn. On the model side, Word2Vec is introduced to improve the traditional LDA topic model, yielding a Word2Vec-LDA model. On the encoding side, the One-hot encoding of vector features is replaced with an optimized bag-of-words encoding, which reduces the dimensionality of the vector encoding and thereby eases server and storage load. On the algorithm side, a procedure for solving the optimal number of topics is designed to make the choice highly objective. Finally, data experiments are used to verify the model's performance and to visualize the analysis results. The experimental results show that the designed research hotspot analysis meets the design requirements in topic strength, stability, and similarity; the perplexity distribution of the improved model is far higher than that of the traditional LDA topic model, and the improved model gives better classification results. Based on this analysis, the final research hotspot analysis model basically meets the design requirements.
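The encoding change described in the abstract can be illustrated with a small sketch. The abstract does not say how the bag-of-words encoding was optimized, so the sparse `(token_id, count)` form below is an assumption, and the helper names (`build_vocab`, `one_hot`, `bag_of_words`) and the toy documents are hypothetical. The point is only to show why a bag-of-words vector needs far fewer stored values than a One-hot matrix for the same document.

```python
from collections import Counter

def build_vocab(docs):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab

def one_hot(doc, vocab):
    """One row per token occurrence, one column per vocabulary entry."""
    rows = []
    for tok in doc:
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows

def bag_of_words(doc, vocab):
    """Sparse (token_id, count) pairs sorted by id; zero counts are omitted.

    Assumption: this sparse form stands in for the paper's unspecified
    optimization of the bag-of-words encoding.
    """
    counts = Counter(vocab[tok] for tok in doc)
    return sorted(counts.items())

# Toy corpus (hypothetical data, not from the paper).
docs = [
    ["topic", "model", "topic", "word", "vector"],
    ["word", "vector", "semantic", "analysis"],
]
vocab = build_vocab(docs)

oh = one_hot(docs[0], vocab)        # tokens x vocabulary matrix
bow = bag_of_words(docs[0], vocab)  # one short list per document

oh_cells = len(oh) * len(vocab)     # values stored by One-hot
bow_cells = 2 * len(bow)            # values stored by sparse bag-of-words
print(oh_cells, bow_cells)
```

The One-hot form grows with both document length and vocabulary size, while the sparse bag-of-words form grows only with the number of distinct tokens in the document. The trade-off is that word order is discarded.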

Keywords: research hotspots; topic model; Word2Vec model; word vectors

Classification code: TP392 [Automation and Computer Technology: Computer Application Technology]

 
