User Profiling for CSDN: Keyphrase Extraction, User Tagging and User Growth Value Prediction


Authors: Guoliang Xing, Hao Gao, Qi Cao, Xinyu Yue, Bingbing Xu, Keting Cen, Huawei Shen

Affiliations: [1] Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Data Intelligence (Chinese title: 数据智能(英文)), 2019, Issue 2, pp. 137-159 (23 pages)

Funding: The work is supported by the National Natural Science Foundation of China (NSFC) under grant numbers 61472400, 91746301 and 61802371; H. Shen is also funded by the K. C. Wong Education Foundation and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.

Abstract: The Chinese Software Developer Network (CSDN) is one of the largest information technology communities and service platforms in China. This paper describes the user profiling for CSDN, an evaluation track of SMP Cup 2017. It contains three tasks: (1) user document keyphrase extraction, (2) user tagging and (3) user growth value prediction. In the first task, we treat keyphrase extraction as a classification problem and train a Gradient-Boosting-Decision-Tree model with comprehensive features. In the second task, to deal with class imbalance and capture the interdependency between classes, we propose a two-stage framework: (1) for each class, we independently train a binary classifier to model that class against all of the other classes; (2) we feed the output of the trained classifiers into a softmax classifier, tagging each user with multiple labels. In the third task, we propose a comprehensive architecture to predict user growth value. Our contributions in this paper are summarized as follows: (1) we extract various types of features to identify the key factors in user value growth; (2) we use the semi-supervised method and the stacking technique to extend labeled data sets and increase the generality of the trained model, resulting in an impressive performance in our experiments. In the competition, we achieved first place out of 329 teams.

Keywords: User profiling; Keyphrase extraction; User tagging; Growth value prediction; Word embedding

Classification number: TN9 [Electronics and Telecommunications: Information and Communication Engineering]
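
The two-stage tagging framework summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which binary classifier, features, or training procedure were used, so the sketch assumes plain logistic regression for stage 1, a softmax layer trained on uniformly split label targets for stage 2, and a top-k rule for assigning multiple tags. All function names (`train_binary`, `train_softmax`, `tag_user`) and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary(X, y, epochs=300, lr=0.5):
    """Stage 1: one logistic-regression model for one class vs. all others."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(dot(w, x) + b) - t  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def stage1_scores(models, x):
    """Per-class probabilities from the independent binary classifiers."""
    return [sigmoid(dot(w, x) + b) for w, b in models]

def train_softmax(S, Y, n_classes, epochs=300, lr=0.5):
    """Stage 2: softmax over stage-1 scores.
    Assumption: a user's label set is turned into a uniform target distribution."""
    W = [[0.0] * n_classes for _ in range(n_classes)]
    B = [0.0] * n_classes
    for _ in range(epochs):
        for s, labels in zip(S, Y):
            p = softmax([dot(W[c], s) + B[c] for c in range(n_classes)])
            for c in range(n_classes):
                target = 1.0 / len(labels) if c in labels else 0.0
                g = p[c] - target  # gradient of cross-entropy w.r.t. logit c
                W[c] = [wj - lr * g * sj for wj, sj in zip(W[c], s)]
                B[c] -= lr * g
    return W, B

def tag_user(x, models, W, B, k=2):
    """Return the k classes with the highest softmax probability (assumed rule)."""
    s = stage1_scores(models, x)
    p = softmax([dot(W[c], s) + B[c] for c in range(len(B))])
    return sorted(range(len(p)), key=lambda c: -p[c])[:k]

# Toy data invented for illustration: 3 binary interest features, 3 tag classes.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1]]
Y = [{0}, {0, 1}, {1}, {1, 2}, {2}, {0, 2}]
N_CLASSES = 3

# Stage 1: train each class against the rest, independently.
models = [train_binary(X, [1.0 if c in labels else 0.0 for labels in Y])
          for c in range(N_CLASSES)]
# Stage 2: the stage-1 outputs become the softmax classifier's inputs.
S = [stage1_scores(models, x) for x in X]
W, B = train_softmax(S, Y, N_CLASSES)

print(tag_user([1, 1, 0], models, W, B, k=2))
```

Because the softmax layer sees all per-class scores at once, it can reweight one class's score in light of the others, which is one way to capture the interdependency between classes that independent one-vs-rest classifiers ignore.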

 
