Authors: YANG Juan[1,2]; WU Zhiming; ZHANG Yuanpeng[4,5]
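Affiliations: [1] School of Textiles and Clothing, Nantong University, Nantong, Jiangsu 226019, China; [2] College of Textile and Clothing Engineering, Soochow University, Suzhou, Jiangsu 215123, China; [3] School of Textile and Clothing, Jiangnan University, Wuxi, Jiangsu 214122, China; [4] Department of Medical Informatics, Nantong University, Nantong, Jiangsu 226001, China; [5] School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China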
Source: Journal of Textile Research (《纺织学报》), 2018, No. 10, pp. 156-161 (6 pages)
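Funding: National Natural Science Foundation of China (81701793); Philosophy and Social Science Foundation of Jiangsu Higher Education Institutions (2016SJB760064)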
Abstract: Current methods for collecting home textile resources from the Web are inefficient when handling massive Web resources, particularly resources hidden in the deep Web. To address this problem, an automated method for extracting home textile resources from the Web is proposed. Based on the finiteness and convergence of query interface attributes, the method first builds a domain model to identify deep Web query interfaces. The identified interfaces are then filled in automatically with keywords from the home textile domain, and the deep Web home textile resources are extracted. To filter out noise unrelated to the topic in the returned result pages, each page is segmented into visual blocks, a block importance model is trained on labeled block samples, and the trained model is used to filter the noise. Experimental results show that, compared with rule-based methods, the domain model improves the positive predictive value and accuracy of deep Web query interface identification by 8% and 6%, respectively. The block importance model improves the harmonic mean of precision and recall for noise filtering by an average of 12.90% across three importance levels, again compared with rule-based methods.
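The evaluation measures named in the abstract (positive predictive value, accuracy, and the harmonic mean of precision and recall) have standard definitions. The following is a minimal sketch of those definitions, assuming binary confusion-matrix counts; the function names and the example counts are illustrative assumptions and do not come from the paper.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy: share of all decisions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a query-interface classifier (not from the paper).
tp, tn, fp, fn = 80, 90, 20, 10
p = ppv(tp, fp)
r = recall(tp, fn)
print(f"PPV={p:.3f} accuracy={accuracy(tp, tn, fp, fn):.3f} F={f_measure(p, r):.3f}")
```

Reporting the harmonic mean alongside precision and recall penalizes a filter that scores well on one measure only, which is presumably why the abstract uses it for the noise-filtering comparison.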
Classification No.: TP311.11 [Automation and Computer Technology: Computer Software and Theory]