Patent efficacy phrase extraction based on multi-feature fusion (多特征融合的专利功效短语抽取)

Authors: YOU Xin-dong (游新冬); ZHAO Ying (赵颖); LIU Jia-qi (刘佳琦); LYU Xue-qiang (吕学强) [1]

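Affiliation: [1] Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China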

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 5, pp. 1413-1419 (7 pages)

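Funding: National Natural Science Foundation of China (62171043); Beijing Natural Science Foundation (4212020); State Language Commission fund projects (ZDI145-10, YB145-3); National Defense Science and Technology Key Laboratory Fund (6412006200404); Scientific Research Program of the Beijing Municipal Education Commission (KM202111232001).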

Abstract: To improve the accuracy and recall of patent efficacy phrase extraction, and to support high-quality downstream research such as patent layout analysis, a patent efficacy phrase extraction model that fuses multiple features is proposed. Within an overall BERT-BiLSTM-CRF framework, the BERT model vectorizes the text; radical features, Wubi (五笔) encoding features, and word length plus part-of-speech features are fused with these vectors and fed into a BiLSTM or Transformer encoder; a CRF layer then decodes the tag sequence for the input, from which the patent efficacy phrases are obtained. Patent texts from the new energy vehicle field serve as training data, and experiments are run with different feature combinations. The results show that the proposed model achieves clear improvements in accuracy, recall, and F1 value, which verifies the effectiveness of multi-feature fusion for the efficacy phrase extraction task.
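The pipeline in the abstract (fuse per-character features, encode, decode tags with a CRF, read phrases off the tags) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Fusion by concatenation, the `B-EFF`/`I-EFF`/`O` tag set, and all function names are assumptions made for illustration, since the abstract does not specify them. The BiLSTM or Transformer encoder is omitted: the `emissions` argument stands in for its per-tag output scores, and `viterbi` shows only the decoding step of a linear-chain CRF.

```python
from typing import List, Sequence

# Hypothetical BIO tag set; the abstract does not give the paper's labels.
TAGS = ["O", "B-EFF", "I-EFF"]


def fuse_features(
    bert_vecs: Sequence[Sequence[float]],
    radical_vecs: Sequence[Sequence[float]],
    wubi_vecs: Sequence[Sequence[float]],
    lex_vecs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate the feature vectors of each character (assumed fusion strategy)."""
    lengths = {len(bert_vecs), len(radical_vecs), len(wubi_vecs), len(lex_vecs)}
    if len(lengths) != 1:
        raise ValueError("all feature sequences must cover the same characters")
    return [
        list(b) + list(r) + list(w) + list(x)
        for b, r, w, x in zip(bert_vecs, radical_vecs, wubi_vecs, lex_vecs)
    ]


def viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index path of a linear-chain CRF.

    emissions[t][j] is the score of tag j at position t;
    transitions[i][j] is the score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


def extract_phrases(chars: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Collect each B-EFF followed by zero or more I-EFF as one phrase."""
    phrases: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B-EFF":
            if current:
                phrases.append("".join(current))
            current = [ch]
        elif tag == "I-EFF" and current:
            current.append(ch)
        else:
            if current:
                phrases.append("".join(current))
            current = []
    if current:
        phrases.append("".join(current))
    return phrases
```

In a trained model the emission and transition scores would be learned. In this sketch, a large negative value such as `transitions[0][2] = -1e4` can be set by hand to mimic the constraint that `I-EFF` never follows `O`. The decoded indices map back to labels through `TAGS` before being passed to `extract_phrases`.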

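Keywords: multi-feature fusion; patent efficacy phrase; deep learning; term extraction; bidirectional long short-term memory (BiLSTM) model; conditional random field (CRF) model; word embedding model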

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
