Research on Web Page Automatic Classification Based on Internet News Corpus  

Authors: 蔡巍, 王永成, 尹中航

Affiliation: [1] Dept. of Computer Science & Eng., Shanghai Jiaotong Univ.

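Source: Journal of Shanghai Jiaotong University (Science), 2007, Issue 6, pp. 731-735 (5 pages). Chinese title: 上海交通大学学报(英文版), the English edition of the Journal of Shanghai Jiaotong University.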

Funding: The National Natural Science Foundation of China (No. 60082003)

Abstract: Web pages contain richer content than plain text, such as hyperlinks, HTML tags, and metadata, so Web page categorization differs from plain-text categorization. For Chinese Internet news pages, a practical algorithm is proposed for extracting subject concepts from Web pages without a thesaurus. Once these category-subject concepts are incorporated into a knowledge base, Web pages are classified by a hybrid algorithm, using an experimental corpus extracted from Xinhua net. The experimental results show that categorization performance improves when Web page features are used.

Keywords: automatic classification; Web pages; subject extraction

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
