Web Mining Technology Research Based on the Mass Redundant Web Filter (基于海量冗余网页过滤的Web挖掘技术研究)  Cited by: 2

Author: Zhao Xi (赵玺) [1]

Affiliation: [1] Teachers' College, Beijing Union University, Beijing 100011

Source: Bulletin of Science and Technology (《科技通报》), 2013, No. 4, pp. 21-22, 25 (3 pages)

Abstract: When an intelligent teaching system acquires teaching resources by searching web pages for keywords, many spam pages that contain the same keywords make it hard to mine those resources quickly from the mass of web information. Traditional keyword search, affected by spam pages, incurs an excessive search volume, so teaching resources are not obtained in a timely manner. To address this, the paper proposes applying Web information extraction to teaching-resource mining. Relevant web pages are fetched in batch according to the resource acquisition requirements; XPath, combined with the search request and the features of each page's topic information blocks, is used to clean the pages; the required teaching resources are then mined according to a Web text feature model. Simulation experiments show that the method effectively overcomes interference from spam pages and completes teaching-resource mining quickly, with satisfactory results.
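The abstract names three stages (batch page acquisition, XPath-based cleaning, and mining with a text feature model) but gives no expressions or formulas. The following is only a minimal sketch of how such a pipeline could be arranged, not the paper's method. Everything in it is an assumption made for illustration: the `BLOCK_XPATH` expression, the link-density rule used to drop spam-like blocks, the raw term-count cosine similarity standing in for the "Web text feature model", both thresholds, and the sample pages. It uses only the Python standard library, so the pages must be well-formed XHTML.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XPath for candidate topic blocks; the paper does not give its expressions.
BLOCK_XPATH = ".//div"


def tokens(text):
    """Lowercase word tokens; a stand-in for real word segmentation."""
    return re.findall(r"\w+", text.lower())


def link_density(block):
    """Share of a block's tokens that sit inside <a> elements."""
    total = len(tokens("".join(block.itertext())))
    linked = sum(len(tokens("".join(a.itertext()))) for a in block.iter("a"))
    return linked / total if total else 1.0


def clean_page(xhtml, query, max_link_density=0.5):
    """Return the text of blocks that mention a query term and are not link-heavy."""
    root = ET.fromstring(xhtml)
    wanted = set(tokens(query))
    kept = []
    for block in root.findall(BLOCK_XPATH):
        text = " ".join("".join(block.itertext()).split())
        if wanted & set(tokens(text)) and link_density(block) <= max_link_density:
            kept.append(text)
    return kept


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def mine(pages, query, min_score=0.2):
    """Clean each page, then rank surviving blocks by similarity to the query."""
    q = tokens(query)
    hits = []
    for url, xhtml in pages.items():
        for text in clean_page(xhtml, query):
            score = cosine(q, tokens(text))
            if score >= min_score:
                hits.append((score, url, text))
    return sorted(hits, reverse=True)


# Made-up sample pages: one real lesson block, plus link-only blocks that reuse the keywords.
PAGES = {
    "http://example.org/lesson": (
        "<html><body>"
        "<div>Linear algebra lesson notes on matrix multiplication</div>"
        "<div><a href='#'>linear algebra</a> <a href='#'>cheap deals</a></div>"
        "</body></html>"
    ),
    "http://example.org/spam": (
        "<html><body>"
        "<div><a href='#'>algebra pills</a> <a href='#'>win prizes</a></div>"
        "</body></html>"
    ),
}

results = mine(PAGES, "linear algebra lesson")
for score, url, text in results:
    print(f"{score:.3f}  {url}  {text}")
```

In this toy input, the two link-only blocks share keywords with the query but are discarded during cleaning, so only the lesson block reaches the ranking step. Real pages would need an HTML-tolerant parser and a proper word segmenter for Chinese text.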

Keywords: intelligent teaching; spam web pages; information extraction

Classification code: TP30 [Automation and Computer Technology: Computer System Architecture]

 
