Multi-label Patent Classification Based on Text and Historical Data

Cited by: 1

Authors: XU Xuejie (徐雪洁); WANG Baohui (王宝会) [1] (College of Software, Beihang University, Beijing 100191, China)

Source: Computer Science (《计算机科学》), 2024, No. 5, pp. 172-178 (7 pages)

Abstract: Patent classification, which assigns one or more International Patent Classification (IPC) codes to a given patent document, is an important task in patent data mining. In recent years, much of the work on this task has focused on mining patent text representations for multi-class prediction at the section or class level of the IPC hierarchy. In real scenarios, however, a patent often has several IPC codes, which makes the task a multi-label classification problem. In addition to its text, each patent has a corresponding assignee, and the assignee's historical patent application behavior tends to reflect a certain business focus; a preference representation of this behavior can effectively improve classification precision. Existing patent classification studies nevertheless do not make full use of such historical data. To address multi-label classification at the IPC subclass level, this paper proposes an automatic patent classification model that considers both patent content and the assignee's historical data. First, the patent text representation is initialized with the BERT pre-trained language model, and Text-CNN is then used to capture local features, with its output taken as the final text representation. Second, Bi-LSTM aggregates historical patent texts and historical patent labels through two channels to learn a representation of the assignee's historical application behavior. Finally, the text representation and the historical application behavior representation are fused for prediction. Experiments on a real patent data set, comparing the proposed model with different baselines based on patent text mining, show that the deep learning classification algorithm built on both patent text and historical data achieves a large improvement in precision.
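
The final step described in the abstract (fusing the text representation with the historical behavior representation and predicting several IPC codes at once) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two representations are fused or how labels are decided, so the concatenation fusion, the per-label sigmoid with a 0.5 threshold, the function name `predict_labels`, the toy vectors, the hand-set weights, and the three example subclass codes are all assumptions made for illustration. In the model described in the abstract, the two input vectors would come from BERT + Text-CNN and from the dual-channel Bi-LSTM respectively.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(text_vec, history_vec, weights, biases, labels, threshold=0.5):
    """Score every label independently on the fused representation.

    Assumption: fusion is plain concatenation and each label has its own
    linear scorer followed by a sigmoid (a common multi-label setup).
    """
    fused = list(text_vec) + list(history_vec)
    predicted = []
    for label, w_row, b in zip(labels, weights, biases):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        # Each label is kept or dropped on its own, so several can be returned.
        if sigmoid(logit) >= threshold:
            predicted.append(label)
    return predicted


# Hypothetical toy inputs: 2-dim text vector, 2-dim history vector.
labels = ["G06F", "G06N", "H04L"]
text_vec = [1.0, 0.0]
history_vec = [0.0, 1.0]
weights = [
    [2.0, 0.0, 0.0, 1.0],    # G06F: supported by both text and history
    [0.0, 0.0, 0.0, 3.0],    # G06N: supported only by the assignee's history
    [-2.0, 0.0, 0.0, -1.0],  # H04L: contradicted by both
]
biases = [0.0, -1.0, 0.0]

with_history = predict_labels(text_vec, history_vec, weights, biases, labels)
without_history = predict_labels(text_vec, [0.0, 0.0], weights, biases, labels)
print(with_history)     # ['G06F', 'G06N']
print(without_history)  # ['G06F']
```

In this toy setting the second label is only recovered when the history vector is present, which mirrors the abstract's claim that the assignee's application preferences carry signal beyond the patent text.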

Keywords: deep learning; multi-label; automatic patent classification; IPC code; patent

Classification code: TP312 [Automation and Computer Technology: Computer Software and Theory]
