Research on Enterprise Web Log Mining Based on Improved Apriori Algorithm  Cited by: 1


Authors: Wu Hongxing (吴红星)[1], Wang Hao (王浩)[1]

Affiliation: [1] School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China

Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 4, pp. 43-47 (5 pages)

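Funding: National Key Basic Research Development Program of China ("973" Program) (2013CB329604); National High Technology Development Program of China ("863" Program) (2012AA011005)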

Abstract: Enterprise Web logs contain a large amount of valuable information, but the Apriori algorithm has two drawbacks: it generates a large number of candidate sets and it scans the data set repeatedly. This paper studies log information from an enterprise's collaborative portal and website. The notice column of the collaborative portal can publish enterprise notices at any time, and it is what the enterprise wants users to see first. Likewise, the news column of the website presents enterprise news and business activities to users, and serves to promote the enterprise brand and culture. Based on this common feature of the collaborative portal and the website, the paper proposes an improved Apriori algorithm for enterprises: given that the enterprise actively shows notices or business news to visitors, the algorithm mines the standing of the other first-level main columns in visitors' minds, together with visitors' degree of attention to and interest in those columns. The enterprise can then adjust the layout of the other columns to better serve its publicity while keeping access convenient for visitors. The core idea of the improvement is to reduce the candidate set. During the Apriori scanning process, one particular ID does not participate; after the algorithm has mined the maximal frequent itemsets, this ID is added to each of them, and association rule mining is then carried out. This substantially reduces both the number of data set scans and the number of candidates generated. Comparative experiments show that the improved Apriori algorithm is clearly effective and has strong practical value for enterprises.
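The core idea in the abstract (hold one ID out of the Apriori scans, then add it back to each maximal frequent itemset) could be sketched roughly as below. This is an illustrative sketch, not the authors' implementation: the function names `apriori` and `mine_with_pinned_item`, the column names, and the sample sessions are all hypothetical. It also assumes the held-out column (for example the notice column) appears in every session, so that adding it back leaves the support counts unchanged.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Plain Apriori: return {itemset: support count} for all frequent itemsets."""
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


def mine_with_pinned_item(sessions, pinned, min_count):
    """Hypothetical helper: keep `pinned` out of mining, then add it back.

    Assumes `pinned` occurs in every session, so the counts stay valid.
    """
    # Remove the pinned ID so it never enters candidate generation.
    reduced = [frozenset(s) - {pinned} for s in sessions]
    frequent = apriori(reduced, min_count)
    # Keep only maximal itemsets (no frequent proper superset exists).
    maximal = [s for s in frequent
               if not any(s < other for other in frequent)]
    # Add the pinned ID back to each maximal frequent itemset.
    return {s | {pinned}: frequent[s] for s in maximal}


# Made-up sessions: each is the set of first-level columns a visitor opened.
sessions = [
    {"notice", "products", "about"},
    {"notice", "products", "contact"},
    {"notice", "products", "about", "contact"},
    {"notice", "about"},
    {"notice", "products", "about"},
]
result = mine_with_pinned_item(sessions, pinned="notice", min_count=3)
```

With these made-up sessions and a minimum count of 3, the only maximal frequent itemset among the remaining columns is {products, about}, which becomes {notice, products, about} once the held-out ID is restored. Because the pinned ID never enters candidate generation, every level has fewer candidates to join, prune, and count, which is the saving the abstract describes.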

Keywords: Web application; log; association rules; algorithm improvement; Apriori algorithm

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
