Authors: LIU Xiao (刘逍); GONG Qing-yue (龚庆悦)[1]; LI Tie-jun (李铁军); WANG Hong-yun (王红云)[1]
Affiliations: [1] College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210046, Jiangsu, China; [2] The Second Affiliated Hospital of Nanjing University of Chinese Medicine (Jiangsu Second Hospital of Traditional Chinese Medicine), Nanjing 210017, Jiangsu, China
Source: Software Guide (《软件导刊》), 2022, No. 11, pp. 12-18 (7 pages)
Abstract: In natural language processing, entity and relation extraction is an indispensable step in tasks such as knowledge graph construction, question answering system design, and semantic analysis. Most information in the field of traditional Chinese medicine (TCM) is stored as unstructured text, and extracting key information from TCM texts plays an important role in mining the experience of renowned veteran TCM practitioners. However, TCM texts often suffer from imbalanced samples and from entity relations in which several different expressions share one meaning, for example multiple diagnostic results pointing to the same syndrome. To address these problems, the authors construct a SimBERT-based relation extraction model under a semi-supervised learning framework to extract entity relations from TCM texts. SimBERT's similar-text generation function is used for text augmentation to address the sample imbalance, and its similar-sentence retrieval function handles the problem of multiple expressions with one meaning. The experimental results show that the SimBERT model under the semi-supervised learning framework extracts entity relations from TCM texts more accurately on the TCM medical case dataset constructed in this paper.
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]