Research on Text Clustering Based on Requirements of Big Data Jobs (基于大数据岗位需求的文本聚类研究)  Cited by: 20

Authors: Liu Ruilun (刘睿伦), Ye Wenhao (叶文豪)[1], Gao Ruiqing (高瑞卿), Tang Mengjia (唐梦嘉), Wang Dongbo (王东波)[1]

Affiliation: [1] College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2017, No. 12, pp. 32-40 (9 pages)

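Funding: One of the research outputs of the Jiangsu Social Science Fund project "Research on Chinese-English Phrase-Level Parallel Corpus Annotation and Knowledge Mining in the Big Data Environment" (Project No. 13XWC017)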

Abstract: [Objective] This study mines the requirement texts of big data job postings to help big data companies identify the talent they need more precisely. [Methods] We collected job postings related to "big data" from recruitment websites for the first quarter of 2017, combined TF-IDF with Word2Vec and K-means to cluster the texts semantically, and used the silhouette coefficient to select the best clustering. [Results] Representing the text vectors with the extracted entities gave good clustering results, and the job requirement texts were finally divided into three categories: capability requirements, educational requirements, and work experience requirements. [Limitations] The posting formats differ across websites and the data cleaning was not thorough, which affected the clustering; the volume of job postings collected was also insufficient, so the Word2Vec training set was small and the training results leave room for improvement. [Conclusions] The clustering results show that big data jobs do not demand advanced degrees, that companies prefer experienced candidates but do not rule out inexperienced ones, and that companies place higher requirements on professional qualities than on computer skills.
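The pipeline named in the abstract (TF-IDF, Word2Vec, K-means, silhouette coefficient) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy documents are invented, the hand-written 2-D `word_vecs` table stands in for a trained Word2Vec model, and combining the two by a TF-IDF-weighted average of word vectors is only one plausible reading of "TF-IDF combined with Word2Vec". The entity extraction step mentioned in the results is not modeled.

```python
import math
from collections import Counter

# Toy tokenized job requirements (hypothetical data, not from the paper).
docs = [
    ["python", "hadoop", "spark"], ["spark", "hadoop", "sql"],
    ["bachelor", "degree"], ["master", "degree"],
    ["years", "experience"], ["experience", "preferred"],
]

# Hypothetical 2-D word vectors standing in for a trained Word2Vec model.
word_vecs = {
    "python": (1.0, 0.1), "hadoop": (0.9, 0.0), "spark": (1.1, 0.2),
    "sql": (0.8, 0.1), "bachelor": (0.0, 1.0), "master": (0.1, 1.1),
    "degree": (0.0, 0.9), "years": (-1.0, -1.0),
    "experience": (-0.9, -1.1), "preferred": (-1.1, -0.9),
}

def tfidf_weights(docs):
    """Per-document TF-IDF weights, using idf = ln(N / df) + 1."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [{w: (c / len(d)) * (math.log(n / df[w]) + 1.0)
             for w, c in Counter(d).items()} for d in docs]

def doc_vectors(docs, word_vecs):
    """Assumed combination: TF-IDF-weighted average of word vectors."""
    dim = len(next(iter(word_vecs.values())))
    vecs = []
    for weights in tfidf_weights(docs):
        total = sum(weights.values())
        vecs.append(tuple(
            sum(wt * word_vecs[w][i] for w, wt in weights.items()) / total
            for i in range(dim)))
    return vecs

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i])
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

vectors = doc_vectors(docs, word_vecs)
# Try several cluster counts and keep the one with the highest silhouette.
results = {k: silhouette(vectors, kmeans(vectors, k)) for k in range(2, 5)}
best_k = max(results, key=results.get)
best_labels = kmeans(vectors, best_k)
```

On this toy data the silhouette coefficient selects three clusters, which group the skill, degree, and experience postings separately. With real postings, the word vectors would come from a trained embedding model, and a library implementation of K-means and the silhouette score would normally replace the hand-written versions.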

Keywords: big data jobs; Word2Vec; K-means; silhouette coefficient

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
