Sentiment Curve Clustering and Communication Effects of Barrage Videos  (cited by: 9)


Authors: Zhang Teng; Ni Yuan[1,2]; Mo Tong; Lv Xueqiang[4]

Affiliations: [1] School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China; [2] Beijing Knowledge Management Research Base, Beijing 100192, China; [3] School of Software and Microelectronics, Peking University, Beijing 102600, China; [4] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100192, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2022, Issue 6, pp. 32-45 (14 pages)

Funding: One of the research outputs of a Beijing Social Science Foundation planning project (Grant No. 21GLB027).

Abstract: [Objective] This paper constructs a sentiment-curve clustering model for bullet-screen (danmaku) texts, offering a new basis for decisions when predicting video communication effects. [Methods] First, we used Word2Vec word vectors to expand a domain sentiment dictionary and improve the performance of the sentiment classifiers. Then we applied comprehensive weights and related techniques to make the sentiment time series stationary and smooth. Finally, we built a K-shape clustering model based on the SBD measure to analyze sentiment time-series patterns, their characteristics, and their communication effects. [Results] The optimized sentiment dictionary model reached F1 values of 0.89 and 0.79 on the subjective/objective and polarity classification tasks, respectively, and the performance of the subjective/objective classifier improved by 123%. Compared with several other combinations of time-series measures and clustering algorithms, the SBD-based K-shape model performed better on both the Davies-Bouldin index and the silhouette coefficient. [Limitations] The sentiment dictionary algorithm does not fully account for Internet buzzwords or for sentences that lack a central adjective, and the description and interpretation of the sentiment time-series clusters need to be deepened. [Conclusions] The algorithm combining a domain sentiment dictionary, SBD, and K-shape can reduce the influence of irregular noise in bullet-screen texts and of phase shifts in the time series, and the clustering results can serve as a basis for identifying differences in communication effects.
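The abstract names the SBD (shape-based distance) measure but does not spell it out. As a rough illustration only, the sketch below uses the commonly cited definition: one minus the maximum normalized cross-correlation over all shifts, computed on z-normalized series. The function names, the synthetic curves, and the comparison against a zero-lag distance are assumptions made for this example; they are not taken from the paper, and the paper's dictionary expansion, weighting, and K-shape steps are not reproduced here.

```python
import numpy as np


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (all zeros if constant)."""
    series = np.asarray(series, dtype=float)
    std = series.std()
    if std == 0:
        return np.zeros_like(series)
    return (series - series.mean()) / std


def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all lags."""
    x = z_normalize(x)
    y = z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0  # assumption: treat a constant series as uncorrelated
    cross_corr = np.correlate(x, y, mode="full")  # every lag, zero-padded
    return 1.0 - cross_corr.max() / denom


def zero_lag_distance(x, y):
    """Same normalization as sbd, but with no shifting allowed (for comparison)."""
    x = z_normalize(x)
    y = z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0
    return 1.0 - float(np.dot(x, y)) / denom


# Two hypothetical sentiment curves: the same emotional peak, 20 steps apart.
idx = np.arange(100)
early_peak = np.exp(-((idx - 30) ** 2) / (2 * 3.0 ** 2))
late_peak = np.exp(-((idx - 50) ** 2) / (2 * 3.0 ** 2))

print("SBD:", sbd(early_peak, late_peak))
print("zero-lag distance:", zero_lag_distance(early_peak, late_peak))
```

Because SBD searches over shifts, two curves with the same shape but a delayed peak end up close together, which is the phase-shift robustness the conclusion refers to. A full K-shape clustering would additionally need the shape-extraction centroid step; existing implementations such as `tslearn.clustering.KShape` may be a more practical starting point than extending this sketch.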

Keywords: sentiment dictionary; sentiment curve; time series

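Classification codes: TP393 [Automation and Computer Technology: Computer Application Technology]; G250 [Automation and Computer Technology: Computer Science and Technology]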

 
