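Affiliations: [1] Beijing Laboratory of Food Quality and Safety, Beijing 100083; [2] College of Information and Electrical Engineering, China Agricultural University, Beijing 100083
Source: Transactions of the Chinese Society of Agricultural Engineering (《农业工程学报》), 2021, No. 20, pp. 211-218 (8 pages)
Funding: Beijing Swine Industry Innovation Team Project of the Modern Agricultural Industry Technology System (BAIC02-2021); National Key Research and Development Program of China (2017YFC1601803).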
Abstract: Named entity recognition for human nutrition and health aims to detect nutrition entities in nutrition and health texts, and is a key step toward further mining of nutrition and health information. Although deep learning models are widely used for this task, they do not fully account for the long-distance dependencies caused by the many complex entities in such texts, nor do they make full use of vocabulary information and position information. To address these characteristics, this study proposes a named entity recognition method for nutrition and health texts that combines rules with a BERT-FLAT (Bidirectional Encoder Representations from Transformers-Flat Lattice Transformer) model, and recognizes six entity types in the nutrition and health domain: food, nutrient, population, body part, disease, and efficacy. First, the BERT model embeds character information and vocabulary information to improve recognition of entity categories. Next, a Transformer model that combines position encoding with word boundary information encodes the sequence to improve recognition of entity boundaries. A CRF (Conditional Random Field) then produces the character-level prediction sequence, and finally rules are applied to correct the predicted sequence. Experimental results show that the method combining rules with the BERT-FLAT model achieves a precision of 95.00%, a recall of 88.88%, and an F1 score of 91.81% on human nutrition and health entity recognition. The study indicates that the method is effective for entity recognition in the human nutrition and health domain and can offer a new approach to complex named entity recognition in other domains such as agriculture, medicine, and food safety.
A nutritious and healthy diet is widely expected to reduce the incidence of disease and to improve health after a disease occurs. In recent years, knowledge about nutritional diets has been acquired mostly through the Internet. However, reliable and integrated information is difficult to discern through time-consuming searches of the huge amount of Internet data. There is an urgent need to integrate this complicated data and then construct a knowledge graph of nutrition and health that gives timely and accurate feedback. A key step is to accurately identify entities in nutritional health texts, providing effective location data to support the construction of knowledge graphs. In this study, a BERT+BiLSTM+CRF (Bidirectional Encoder Representations from Transformers + Bi-directional Long Short-Term Memory + Conditional Random Field) model was first used with position information. The precision of this model was 86.56%, the recall was 91.01%, and the F1 score was 88.72%, which are improvements of 1.55, 0.20, and 0.32 percentage points, respectively, over the model without position information. A named entity recognition method combining rules with the BERT-FLAT (Bidirectional Encoder Representations from Transformers-Flat Lattice Transformer) model was also proposed to accurately obtain six types of entities in text in the field of human nutritional health: food, nutrients, population, body part, disease, and efficacy. Firstly, the character and vocabulary information were stitched together and pre-trained in the BERT model to improve the model's ability to recognize entity categories. Then, a position code was created for the head and tail position of each character and word, so that the entity position could be located with the help of a position vector, in order to improve the recognition of entity boundaries. Long-distance dependencies were also captured using the Transformer model. Specifically, the output of the BERT model was embedded into the Transformer as a character-embedding conjunction word, thus for the chara
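Keywords: nutrition; health; food; named entity recognition; self-attention mechanism; BERT model; Transformer model
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]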
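
The abstract describes assigning a head and a tail position to every character and every matched word, so that entity boundaries can be located from position vectors. The following is a minimal sketch of that idea in the style of a flat lattice, not the authors' implementation. The function names, the toy lexicon, and the `max_word_len` parameter are all hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (token, head index, tail index)


def build_flat_lattice(text: str, lexicon: Set[str], max_word_len: int = 4) -> List[Span]:
    """Flatten characters and lexicon matches into one span list.

    Each character gets head == tail == its index. Each lexicon word found
    in the text gets head = index of its first character and tail = index
    of its last character. (Hypothetical helper for illustration.)
    """
    spans: List[Span] = [(ch, i, i) for i, ch in enumerate(text)]
    for start in range(len(text)):
        # Only multi-character words are added; single characters are already present.
        for end in range(start + 1, min(start + max_word_len, len(text))):
            word = text[start:end + 1]
            if word in lexicon:
                spans.append((word, start, end))
    return spans


def relative_distances(a: Span, b: Span) -> Tuple[int, int, int, int]:
    """Four head/tail offsets between two spans: (hh, ht, th, tt)."""
    _, head_a, tail_a = a
    _, head_b, tail_b = b
    return (head_a - head_b, head_a - tail_b, tail_a - head_b, tail_a - tail_b)


# Toy example: the lexicon below is an assumption, not taken from the paper.
lattice = build_flat_lattice("维生素C缺乏", {"维生素", "维生素C", "缺乏"})
```

In this sketch the six characters come first, followed by the matched words `维生素`, `维生素C`, and `缺乏`. An attention layer could turn the four offsets from `relative_distances` into a relative position encoding, which is how head and tail indices can carry word boundary information into a Transformer encoder.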