Research and Implementation of Terminal Identification Technology of Fixed-line Broadband Based on Hadoop  (Cited by: 2)


Authors: FAN Mengke (范孟可), WANG Pan (王攀)[1]

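Affiliation: [1] College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China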

Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 11, pp. 171-175 (5 pages)

Funding: 2015 Jiangsu Province Industry-University-Research Prospective Joint Research Project (BY2015011-02)

Abstract: With the arrival of the big data era, big data is being applied ever more widely across industries. It is also common in the telecom operator industry, where it has run into many technical problems; building household profiles is one of the core problems of operator big data. How to extract and identify the types of terminals behind a fixed-line broadband connection remains an open problem. Unlike a mobile network, fixed-line broadband has no signaling channel and therefore carries no accurate terminal information, which makes terminal type identification difficult. Traditional methods identify terminals by parsing and matching the UA (User-Agent) field of HTTP GET messages, but because UA strings are not standardized and the number and variety of terminals is large, their identification accuracy is low. This paper adopts the Hadoop framework and uses a Hive UDF, combined with a terminal library obtained by a distributed crawler, to identify the terminals that users go online with more quickly and accurately. Experimental results show that the terminal identification accuracy exceeds 92%, a substantial improvement over the traditional method.
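The abstract describes matching the UA field against a terminal library but gives no implementation details, so the following is only a minimal sketch of that idea, not the authors' method. It assumes the common Android convention of reporting `<model> Build/<id>` inside the UA string. The names `TERMINAL_LIBRARY`, `extract_model` and `identify_terminal` are hypothetical, and the library entries are made-up placeholders standing in for crawled data. In the paper this kind of lookup would run inside a Hive UDF; plain Python is used here only to show the matching logic.

```python
import re
from typing import Optional

# Hypothetical terminal library: model token -> (brand, device type).
# The paper builds its library with a distributed crawler; these entries
# are made-up placeholders for illustration only.
TERMINAL_LIBRARY = {
    "DEMO-X1": ("DemoBrand", "phone"),
    "DEMO-TAB 2": ("DemoBrand", "tablet"),
}

# Assumption: Android browsers conventionally report "...; <model> Build/<id>)".
_ANDROID_MODEL = re.compile(r";\s*([^;()]+?)\s+Build/")


def extract_model(user_agent: Optional[str]) -> Optional[str]:
    """Return the device model token from a UA string, or None if absent."""
    match = _ANDROID_MODEL.search(user_agent or "")
    return match.group(1).strip() if match else None


def identify_terminal(user_agent: Optional[str]) -> Optional[dict]:
    """Look the extracted model up in the terminal library (case-insensitive)."""
    model = extract_model(user_agent)
    if model is None:
        return None
    entry = TERMINAL_LIBRARY.get(model.upper())
    if entry is None:
        return None
    brand, kind = entry
    return {"model": model, "brand": brand, "type": kind}
```

A UA string without a `Build/` token, such as a typical desktop browser UA, yields `None` in this sketch. That illustrates why a fixed pattern alone is fragile against non-standardized UA strings and why the size and quality of the terminal library matter.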

Keywords: terminal identification; Hadoop; User Defined Function (UDF); distributed crawler; fixed-line broadband; big data operation

Classification number: TP31 [Automation and Computer Technology: Computer Software and Theory]

 
