Design and Simulation of Incremental Text Soft Clustering Speed Improvement Algorithm  Cited by: 2

Authors: LIU Yan (刘艳) [1]; ZHOU Bin (周斌) [2]

Affiliations: [1] School of Information Engineering, Wuhan University of Engineering Science, Wuhan, Hubei 430200, China; [2] School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430200, China

Source: Computer Simulation (《计算机仿真》), 2022, No. 8, pp. 524-528 (5 pages)

Funding: 2020 Guiding Project of the Scientific Research Program of the Hubei Provincial Department of Education (B2020291).

Abstract: To present text information comprehensively and delineate text topic categories clearly, an incremental text soft clustering algorithm based on swarm intelligence is proposed to increase clustering speed and improve clustering quality. First, the coverage of the sets of similar semantic sequences is computed for texts of different topics in the incremental text; the candidate classes with the minimum entropy overlap value found during this coverage computation are passed to the next clustering step, which reduces the dimensionality of the text vector space used in soft clustering. Next, a swarm-intelligence ant colony algorithm lets each ant select an incremental text at random and computes that text's swarm similarity within its current local region. This yields the probability that the ant picks up or drops the text, which determines whether the ant picks up, drops, or moves it. A program named SCAST, written in Python, then carries out the iterative training of the algorithm, so that incremental texts gather in one place according to their swarm similarity and form the clustering result. Simulation results indicate that the algorithm yields relatively high semantic sequence similarity values, is fairly sensitive to abnormal texts, and requires little clustering time, so it can cluster incremental text quickly.
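
The abstract does not give the formulas behind the pick-up and drop decisions, so the following is only a minimal sketch of one ant decision step under stated assumptions. It substitutes the classic Deneubourg-style pick/drop probabilities and a Lumer-Faieta-style local density for the paper's swarm similarity; it is not the authors' SCAST program. The function names, the cosine distance, the averaging over neighbours, and the constants `alpha`, `k1`, and `k2` are all hypothetical choices made for illustration.

```python
import math
import random


def cosine_distance(u, v):
    """Cosine distance between two term-weight vectors (assumed text representation)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def swarm_similarity(doc, neighbours, alpha=0.5):
    """Lumer-Faieta-style local density: mean of (1 - d/alpha), clipped at 0.

    Assumption: the mean is taken over the neighbours actually present,
    not over the full area of the local region.
    """
    if not neighbours:
        return 0.0
    total = sum(1.0 - cosine_distance(doc, n) / alpha for n in neighbours)
    return max(0.0, total / len(neighbours))


def pick_probability(f, k1=0.1):
    """Deneubourg-style pick-up probability: high when local similarity f is low."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """Deneubourg-style drop probability: high when local similarity f is high."""
    return (f / (k2 + f)) ** 2


def ant_step(carrying, doc, neighbours, rng):
    """Decide whether an ant picks up, drops, or simply moves on."""
    f = swarm_similarity(doc, neighbours)
    if carrying:
        return "drop" if rng.random() < drop_probability(f) else "move"
    return "pick" if rng.random() < pick_probability(f) else "move"
```

In this sketch, an unladen ant almost always picks up a text that has no similar neighbours, and a laden ant tends to drop its text where the neighbours resemble it, which is the mechanism by which similar texts accumulate in one place over many iterations.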

Keywords: swarm intelligence; incremental text; soft clustering; ant colony algorithm; swarm similarity

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
