Dynamic Segmentation Web Page Topic Information Accurate and Automatic Extraction Simulation

Original title: 动态分块网页主题信息准确自动提取仿真 (cited by: 2)


Author: CUI Yan-qing (崔彦青), Institute of Computer Information, Inner Mongolia Medical University, Hohhot, Inner Mongolia 010110, China

Affiliation: [1] Institute of Computer Information, Inner Mongolia Medical University

Source: Computer Simulation (《计算机仿真》), 2019, No. 10, pp. 349-352, 377 (5 pages in total)

Funding: National Natural Science Foundation of China (51167010)

Abstract: Existing methods for automatically extracting topic information from dynamically segmented web pages suffer from low accuracy, high error rates, and long running times. This paper uses a mixed weighting method to extract that topic information automatically. After the topic information of the dynamically segmented web page is preprocessed, a hierarchical tree model of the preprocessed information is built to determine the internal relationships among the page's topic information. A binary set sequence describes the topic information targeted for extraction, and the contribution of each topic information text to the topic information of the whole page is calculated. A vector space model then describes the features of the topic information, and the mixed enhancement method extracts the topic information from that model. Simulation results show that the running time of the method can be kept within 0.1 s and that its extraction accuracy on the sample data is high, indicating that the method can extract topic information from dynamically segmented web pages accurately and efficiently.
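The abstract names the ingredients of the method but not its formulas, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes each page block is given as a hypothetical `(text, structural_weight)` pair, uses plain term-frequency vectors as the vector space model, measures a block's contribution as its cosine similarity to the whole-page vector, and mixes the two signals with a hypothetical parameter `alpha`. All function names and the weighting formula are illustrative assumptions.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency vector for one block (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_blocks(blocks, alpha=0.7):
    """Score each (text, structural_weight) block.

    Assumption: score = alpha * content contribution
                      + (1 - alpha) * structural weight,
    where the content contribution is the cosine similarity between the
    block vector and the vector of the whole page.
    """
    vectors = [term_vector(text) for text, _ in blocks]
    page = Counter()
    for vec in vectors:
        page.update(vec)  # whole-page vector is the sum of block vectors
    return [
        alpha * cosine(vec, page) + (1 - alpha) * weight
        for vec, (_, weight) in zip(vectors, blocks)
    ]


def extract_topic(blocks, alpha=0.7):
    """Return the text of the highest-scoring block."""
    scores = score_blocks(blocks, alpha)
    best = max(range(len(blocks)), key=scores.__getitem__)
    return blocks[best][0]


if __name__ == "__main__":
    page_blocks = [
        ("home about contact login", 0.1),
        ("web page topic extraction uses a vector space model for topic extraction", 0.9),
        ("copyright 2019", 0.0),
    ]
    print(extract_topic(page_blocks))
```

In this sketch the structural weight stands in for whatever the hierarchical tree model would supply (for example, a block's depth or position), and a real system would replace the whitespace tokenizer with proper word segmentation for Chinese text.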

Keywords: dynamically segmented web page; topic information; automatic extraction

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
