Mathematical Model of Short Text Mining of Public Opinion and Its Implementation

Cited by: 2

Authors: WANG Chao [1], PENG Pai, LI Bo [1]

Affiliation: [1] School of Mathematics and Statistics, Central China Normal University, Wuhan, Hubei 430079, China

Source: Mathematical Modeling and Its Applications (《数学建模及其应用》), 2018, No. 3, pp. 29-36, 43 (9 pages in total)

Abstract: Traditional analyses of text data usually represent a text by word frequency or term frequency-inverse document frequency (TF-IDF) statistics. Such features capture only part of the information in a text and ignore its underlying semantics. This paper studies a probabilistic language model of Chinese word cohesion, whose basic idea is to model a text according to the order in which its words occur; in short-text data mining, the model supports quantitative analysis of text semantics. Two problems are addressed: first, how to convert Chinese words into numeric vectors in a reasonable way while ensuring that near-synonyms remain similar in the vector space; second, how to construct a suitable vector space that preserves the semantic and structural features of Chinese text. Finally, using actual messages from the message board of a city's housing management department, the probabilistic language model is solved with two algorithms, a BP neural network and an RNN. A comparison with traditional text-processing methods indicates that the proposed approach has certain advantages for semantic mining of short texts.

Keywords: text mining; probabilistic language model; BP network; RNN network; short text analysis

Classification code: O29 [Science: Applied Mathematics]

 
