Film Rating Prediction System Based on Mixed Features

Authors: HUANG Dong-jin [1], GENG Xiao-yun, LI Na [1], DING You-dong [1] (Shanghai University, Shanghai 200072, China)

Affiliation: [1] Shanghai University, Shanghai 200072

Source: Computer Technology and Development, 2020, No. 12, pp. 136-141 (6 pages)

Funding: National Natural Science Foundation of China (61402278); Natural Science Foundation of Shanghai (19ZR1419100).

Abstract: A film's rating is an important measure of its quality and is of great reference value to investors and moviegoers, so film rating prediction has become an active research topic in the film field. However, current rating prediction systems suffer from insufficient feature information, overly simple feature engineering, and reliance on a single machine learning algorithm, which leads to large prediction errors. To address this problem, a prediction model based on mixed features is proposed in combination with natural language processing techniques and applied to a film rating prediction system. The dataset is collected from a commonly used film website, and, to obtain better training data, the film feature information undergoes extensive feature engineering. A pre-trained BERT model vectorizes the text information in the film dataset to obtain text vector features, and a support vector machine (SVM) is trained on these features to produce a preliminary rating prediction. This rating is then added as a new one-dimensional feature alongside the film feature information, and a random forest is trained on the combined features to predict the final rating. Experimental results show that the model is feasible: the error between the predicted and true values is small, and prediction accuracy is significantly improved.
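
The two-stage scheme in the abstract (an SVM trained on text vectors whose predicted rating becomes one extra feature for a random forest) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the text vectors and film features are synthetic stand-ins for BERT sentence vectors and the engineered film features, SVR with an RBF kernel is assumed as the SVM regressor, all hyperparameters are arbitrary, and the helper names `fit_two_stage` and `predict_two_stage` are hypothetical. Training the forest on out-of-fold SVM predictions (via `cross_val_predict`) is an added safeguard against leakage that the abstract does not describe.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_two_stage(text_vecs, meta_feats, ratings, seed=0):
    """Hypothetical helper: stage 1 SVR on text vectors, stage 2 random forest."""
    # Stage 1: SVM regressor mapping text vectors to a "text rating"
    # (RBF kernel and C=1.0 are assumptions, not values from the paper).
    svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
    # Out-of-fold text ratings for the training rows, so the forest never sees
    # an SVR prediction made on a row the SVR was trained on.
    oof_text_rating = cross_val_predict(svr, text_vecs, ratings, cv=5)
    svr.fit(text_vecs, ratings)  # refit on all training rows for inference

    # Stage 2: append the text rating as one extra column to the film features.
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(np.column_stack([meta_feats, oof_text_rating]), ratings)
    return svr, forest


def predict_two_stage(svr, forest, text_vecs, meta_feats):
    """Predict final ratings for new films with the fitted two-stage model."""
    text_rating = svr.predict(text_vecs)
    return forest.predict(np.column_stack([meta_feats, text_rating]))


# Synthetic demo data (stand-ins only; BERT-base vectors would be 768-dim).
rng = np.random.default_rng(0)
n, d_text, d_meta = 300, 32, 5
text_vecs = rng.normal(size=(n, d_text))
meta_feats = rng.normal(size=(n, d_meta))
ratings = np.clip(
    6 + text_vecs[:, 0] + 0.5 * meta_feats[:, 0] + rng.normal(scale=0.3, size=n),
    0, 10,
)

train, test = slice(0, 240), slice(240, n)
svr, forest = fit_two_stage(text_vecs[train], meta_feats[train], ratings[train])
pred = predict_two_stage(svr, forest, text_vecs[test], meta_feats[test])
mae = np.mean(np.abs(pred - ratings[test]))
print(f"test MAE: {mae:.3f}")
```

If the forest were trained on in-sample SVM predictions instead, the text-rating column would look more accurate during training than it does on unseen films, and the forest could over-weight it; out-of-fold predictions keep that column's training-time behavior closer to what it sees at inference.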

Keywords: film rating prediction; machine learning; natural language processing; text vector features; BERT

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]