Phrase Mining Technology for TCM Literature

Authors: XIE Yonghong [1,2]; JIANG Yanzhao; JIA Qi; FAN Xinxin

Affiliations: [1] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China

Source: Technology Intelligence Engineering (《情报工程》), 2021, No. 6, pp. 76-87 (12 pages)

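Funding: 2019 Open Fund Project of the Intelligence Engineering Laboratory, Institute of Scientific and Technical Information of China, "Research on Key Technologies for Mining Knowledge Terms from Ancient Traditional Chinese Medicine Literature".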

Abstract: [Objective/Significance] Traditional Chinese medicine (TCM) literature contains a large number of phrases, yet current phrase mining methods perform unsatisfactorily on it. To address this problem, the paper proposes a phrase mining method for TCM literature. [Methods/Process] Building on a word segmenter for TCM literature, the method uses a new language knowledge base for the TCM domain to train a phrase quality scoring model. On that basis, it uses part-of-speech tag information to construct a phrase segmentation model that mines the literature, improving the precision of phrase mining. Experiments are conducted on "Ancient Chinese Medical Cases of Traditional Chinese Medicine" (《中医古代名医医案》). [Results/Conclusions] The top 300 mined phrases were selected for precision evaluation, and the precision was 84.96%. The experiments show that combining the TCM literature word segmenter with the phrase segmentation model outperforms other mining methods on TCM-domain literature.
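The abstract reports precision over the top 300 mined phrases but does not describe how phrases were judged. As a minimal sketch of a precision-at-k calculation, assuming a ranked list of mined phrases and a set of phrases judged to be high quality, the computation could look like the following. The function name `precision_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(ranked_phrases: Sequence[str],
                   judged_good: Iterable[str],
                   k: int) -> float:
    """Fraction of the top-k ranked phrases that were judged high quality.

    Assumption: if fewer than k phrases are available, the denominator is
    the number of phrases actually ranked, not k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    good: Set[str] = set(judged_good)
    top_k = ranked_phrases[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for phrase in top_k if phrase in good)
    return hits / len(top_k)


# Hypothetical toy data for illustration only (not from the paper).
ranked = ["气血两虚", "肝肾阴虚", "的患者", "脾胃虚弱"]
judged = {"气血两虚", "肝肾阴虚", "脾胃虚弱"}
print(precision_at_k(ranked, judged, k=4))
```

With the toy data, three of the four ranked candidates are in the judged set, so the call prints 0.75. The reported evaluation would correspond to `k=300` on the paper's own ranked output.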

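Keywords: TCM literature phrase mining; phrase mining; high-quality phrases; TCM literature word segmenter; phrase quality scoring model; part-of-speech tags; phrase segmentation model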

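Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]; G35 [Automation and Computer Technology: Computer Science and Technology]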

 
