Research on Web Information Extraction Based on Improved Genetic Annealing and HMM  Cited by: 3


Authors: Li Rong[1], Feng Liping[1], Wang Hongbin[1]

Affiliation: [1] Department of Computer Science and Technology, Xinzhou Teachers University, Xinzhou 034000, Shanxi, China

Source: Computer Applications and Software (《计算机应用与软件》), 2014, No. 4, pp. 40-44 (5 pages)

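Funding: National Natural Science Foundation of China (601003247); Shanxi Province University Science and Technology Development Project (20101120; 2013147); Xinzhou Teachers University Key Discipline Construction Project (ZDXK201204)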

Abstract: To further improve the accuracy of Web information extraction, and to address the weaknesses of the hidden Markov model (HMM) and its hybrid methods in parameter optimisation, we present a Web extraction algorithm based on improved genetic annealing and HMM. First, the algorithm builds an HMM with a backward dependency assumption. Second, it applies an improved genetic annealing algorithm to optimise the HMM parameters: after the genetic operators and the simulated annealing (SA) parameters have been improved, the subpopulations are classified according to the adaptive crossover and mutation probabilities of the genetic algorithm (GA). This enables multi-population parallel search and information exchange, which avoids premature convergence and speeds up convergence. SA is then used as a GA operator to strengthen local search. Finally, a bi-order Viterbi algorithm is used for decoding. Compared with existing HMM optimisation methods, the overall Fβ=1 score (the F-measure with β = 1) in the experiments increases by 6% on average, which shows that the improved algorithm effectively raises extraction accuracy and search performance.
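
For orientation, the decoding step named in the abstract can be illustrated with the standard first-order Viterbi algorithm. This is only a minimal sketch of the textbook baseline, not the bi-order variant or the backward-dependency HMM used in the paper, whose details are not given here. The state names, token classes, and all probabilities below are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_path) for a standard first-order HMM.

    Textbook baseline only; the paper's bi-order Viterbi is not reproduced.
    """
    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # delta[s]: best log-probability of any state path ending in s.
    delta = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this step.
            prev = max(states, key=lambda r: delta[r] + log(trans_p[r][s]))
            new_delta[s] = delta[prev] + log(trans_p[prev][s]) + log(emit_p[s][o])
            ptr[s] = prev
        delta = new_delta
        backptrs.append(ptr)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Hypothetical toy model: label tokens of a citation as title or author.
states = ["title", "author"]
start_p = {"title": 0.8, "author": 0.2}
trans_p = {
    "title": {"title": 0.6, "author": 0.4},
    "author": {"title": 0.1, "author": 0.9},
}
emit_p = {
    "title": {"word": 0.7, "name": 0.3},
    "author": {"word": 0.2, "name": 0.8},
}
obs = ["word", "word", "name", "name"]

log_prob, path = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)  # ['title', 'title', 'author', 'author']
```

In an HMM-based extractor, the transition and emission tables would be the parameters that the optimisation stage (here, genetic annealing) tunes, and the decoded state path assigns each token to an extraction field.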

Keywords: information extraction; genetic annealing; hidden Markov model; Viterbi algorithm

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
