Simulation of Semi-Structured Text Information Extraction Using Simulation of Machine Learning


Authors: ZHU Xiao-long; QIU Lin[3]

Affiliations: [1] School of Computer Engineering, Jingchu University of Technology, Jingmen, Hubei 448000, China [2] Institute of Intelligent Information Technology, Hubei Jingmen Industrial Technology Research Institute, Jingmen, Hubei 448000, China [3] College of Computer Science, Yangtze University, Jingzhou, Hubei 434023, China

Source: Computer Simulation (《计算机仿真》), 2023, No. 2, pp. 540-544 (5 pages)

Abstract: To extract specific information from massive information sources while reducing the difficulty of extraction, an algorithm for extracting semi-structured text information based on machine learning is proposed. First, an autoencoder network reduces the dimensionality of the text information, converting high-dimensional text information into low-dimensional information and lowering the complexity of extraction. Next, the text information is clustered on the basis of word similarity and text similarity. Finally, the hidden Markov model from machine learning is applied to each text information category to extract the semi-structured text information. Simulation results show that the proposed algorithm achieves high extraction precision, high recall, high accuracy, and high extraction efficiency.
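The final stage described in the abstract, applying a hidden Markov model to label and extract fields, can be illustrated with a minimal Viterbi decoding sketch. The abstract gives no model details, so everything here is an assumption made for illustration: the two field states (`NAME`, `AFFIL`), the coarse token features, all probability values, and the helper names `feature`, `viterbi`, and `extract`. The autoencoder and clustering stages are not shown.

```python
import math

# Hypothetical field states for a toy "name, affiliation" record (not from the paper).
STATES = ["NAME", "AFFIL"]

# Hypothetical HMM parameters; a real model would estimate these from labeled text.
START = {"NAME": 0.8, "AFFIL": 0.2}
TRANS = {
    "NAME": {"NAME": 0.5, "AFFIL": 0.5},
    "AFFIL": {"NAME": 0.1, "AFFIL": 0.9},
}
# Emissions are defined over coarse token features instead of raw words.
EMIT = {
    "NAME": {"cap": 0.7, "lower": 0.05, "punct": 0.25},
    "AFFIL": {"cap": 0.4, "lower": 0.5, "punct": 0.1},
}


def feature(token):
    """Map a token to one of the coarse observation symbols."""
    if not token[0].isalnum():
        return "punct"
    return "cap" if token[0].isupper() else "lower"


def viterbi(obs):
    """Return the most probable state sequence for a list of observation symbols."""
    # score[s] holds the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def extract(tokens):
    """Label each token with a field state and group the tokens by field."""
    labels = viterbi([feature(t) for t in tokens])
    fields = {s: [] for s in STATES}
    for tok, lab in zip(tokens, labels):
        fields[lab].append(tok)
    return labels, fields


labels, fields = extract(["Zhu", "Xiaolong", ",", "school", "of", "engineering"])
print(labels)
print(fields)
```

In a pipeline like the one the abstract outlines, one such model would presumably be trained per text cluster, so that each cluster's transition and emission tables reflect the layout of its own documents.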

Keywords: machine learning; autoencoder network; information clustering; hidden Markov model; semi-structured text; information extraction

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
