Automatic Learning Method of Domain Semantic Grammar Based on Fault-tolerant Earley Parsing Algorithm

Authors: MA Yi-fan, MA Tao-tao, FANG Fang, WANG Shi[2], TANG Su-qin, CAO Cun-gen[2]

Affiliations: [1] School of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541000, China; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [3] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100190, China; [4] Department of Educational Technology, Faculty of Education, Guangxi Normal University, Guilin, Guangxi 541000, China

Source: Computer Science (《计算机科学》), 2021, Issue 11, pp. 276-286 (11 pages)

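Funding: Key R&D Program of the Ministry of Science and Technology (2017YFC1700302); Beijing Nova Program Interdisciplinary Cooperation Project (Z191100001119014); Key Special Project of the National Key R&D Program of China (2017YFB1002300); National Natural Science Foundation of China (61967002).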

Abstract: Fine-grained domain text analysis is an important prerequisite for acquiring high-quality domain knowledge. It usually depends on a large number of semantic grammar productions of some form, but writing these grammars by hand is time-consuming and labor-intensive. To reduce this manual cost, this paper proposes an automatic semantic grammar learning method based on a fault-tolerant Earley parsing algorithm. The method generates new semantic grammars, including word classes (lexicons) and grammar production rules, from a seed grammar. It works in three steps. First, an optimized fault-tolerant Earley parser parses the input sentences. Second, candidate semantic grammars are derived from the parse trees produced by this fault-tolerant parsing. Finally, the candidates are filtered or corrected to obtain the final semantic grammars. The method was evaluated on traditional Chinese medicine (TCM) medical records covering five different diseases. It achieved a precision of 63.88% for learning new word classes and 81.78% for learning new grammar production rules.

Keywords: fault-tolerant Earley parsing; semantic grammar; grammar learning; filtering algorithm; semantic correction

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
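The abstract does not specify how the parser tolerates faults or how candidates are filtered. The sketch below is a rough illustration under two loudly stated assumptions. The only kind of fault it models is an out-of-vocabulary token occupying a word-class slot the Earley parser expects. The only filter it applies is a minimum-support threshold across sentences. Both are stand-ins chosen for illustration, not the authors' method. The sketch also covers only word-class (lexicon) learning, not production learning. The function names `tolerant_earley` and `learn_lexicon`, the parameters `max_faults` and `min_support`, and the toy grammar and sentences are all hypothetical. The grammar is assumed to be epsilon-free.

```python
from collections import Counter

# Hypothetical fault model (not from the paper): a token that appears in NO
# word class may fill any word-class slot the parser expects, costing one
# "fault". Assumes an epsilon-free grammar (every production consumes >= 1 token).


def tolerant_earley(tokens, grammar, lexicon, start="S", max_faults=1):
    """Return every tuple of assumptions under which `tokens` parses as `start`.

    Each assumption is (position, token, word_class): the token was unknown,
    and the parse only succeeds if it is treated as a member of word_class.
    An empty tuple means the sentence parsed with no faults at all.
    """
    known = set().union(*lexicon.values())
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]   # Earley item sets
    agenda = [[] for _ in range(n + 1)]     # items still to process

    def add(k, item):
        if item not in chart[k]:
            chart[k].add(item)
            agenda[k].append(item)

    # Item = (lhs, rhs, dot, origin, assumptions)
    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0, ()))

    for k in range(n + 1):
        while agenda[k]:
            lhs, rhs, dot, origin, assumed = agenda[k].pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                       # PREDICT
                    for prod in grammar[sym]:
                        add(k, (sym, prod, 0, k, ()))
                elif k < n:                              # SCAN a word-class slot
                    tok = tokens[k]
                    if tok in lexicon.get(sym, ()):
                        add(k + 1, (lhs, rhs, dot + 1, origin, assumed))
                    elif tok not in known and len(assumed) < max_faults:
                        # Fault-tolerant scan: record the guess instead of failing.
                        guess = assumed + ((k, tok, sym),)
                        add(k + 1, (lhs, rhs, dot + 1, origin, guess))
            else:                                        # COMPLETE
                for plhs, prhs, pdot, porigin, passumed in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        merged = passumed + assumed
                        if len(merged) <= max_faults:
                            add(k, (plhs, prhs, pdot + 1, porigin, merged))

    return {assumed for lhs, rhs, dot, origin, assumed in chart[n]
            if lhs == start and dot == len(rhs) and origin == 0}


def learn_lexicon(sentences, grammar, lexicon, min_support=2):
    """Hypothetical filter: keep a candidate (word, word_class) pair only if
    at least `min_support` different sentences propose it."""
    support = Counter()
    for tokens in sentences:
        proposed = {(tok, wc)
                    for assumed in tolerant_earley(tokens, grammar, lexicon)
                    for _, tok, wc in assumed}
        support.update(proposed)   # each pair counted once per sentence
    return {pair for pair, count in support.items() if count >= min_support}


# Toy seed grammar, lexicon and sentences (invented for illustration).
GRAMMAR = {
    "S": [("SymptomPhrase",), ("SymptomPhrase", "Conj", "SymptomPhrase")],
    "SymptomPhrase": [("BodyPart", "Symptom")],
}
LEXICON = {
    "BodyPart": {"head", "chest"},
    "Symptom": {"ache", "tightness"},
    "Conj": {"and"},
}
SENTENCES = [
    ["head", "ache", "and", "stomach", "ache"],
    ["stomach", "tightness"],
    ["chest", "tightness", "and", "knee", "ache"],
]

if __name__ == "__main__":
    print(tolerant_earley(SENTENCES[0], GRAMMAR, LEXICON))
    # {((3, 'stomach', 'BodyPart'),)}
    print(learn_lexicon(SENTENCES, GRAMMAR, LEXICON))
    # {('stomach', 'BodyPart')}  ("knee" has support 1 and is filtered out)
```

This sketch stores the assumed (word, word class) pairs inside each Earley item, which keeps it short and exact for a small `max_faults`. The number of distinct items grows quickly as that bound rises, however. A weighted, minimum-cost Earley variant would be a more scalable alternative.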

 
