Patent Classification Method Based on Pre-trained Language Model and TRIZ Inventive Principle

Authors: JIA Li-zhen (贾丽臻), BAI Xiao-lei (白晓磊)

Affiliations: [1] College of Transportation Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; [2] College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China

Source: Science Technology and Engineering (《科学技术与工程》), 2024, No. 30, pp. 13055-13063 (9 pages)

Funding: Fundamental Research Funds for the Central Universities (3122022052).

Abstract: To fully exploit the existing solutions and technical knowledge contained in patent texts, this paper proposes a method based on pre-trained language models for classifying Chinese patents according to the inventive principles of TRIZ (theory of inventive problem solving). Using whole word masking (WWM), the Chinese RoBERTa model was further pre-trained on patent datasets of different sizes (consisting of patent titles and abstracts), yielding two patent-domain models, RoBERTa_patent1.0 and RoBERTa_patent2.0. A fully connected layer was then added on top of each model, giving three patent classification models based on RoBERTa, RoBERTa_patent1.0, and RoBERTa_patent2.0. These three models were trained and tested on a purpose-built patent dataset labeled by TRIZ inventive principle. The experimental results show that RoBERTa_patent2.0_IP, the classifier built on RoBERTa_patent2.0, achieves higher accuracy, macro precision (P_Macro), macro recall (R_Macro), and macro F1 (F1_Macro) than the other two models, reaching 96%, 95.69%, 94%, and 94.84%, respectively. The method thus classifies Chinese patent texts by TRIZ inventive principle automatically, which can help designers understand and apply TRIZ inventive principles and achieve innovative product design.
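The abstract names whole word masking (WWM) as the masking strategy used to continue pre-training Chinese RoBERTa on patent text. Below is a minimal sketch of the idea in plain Python, with no model code. It assumes the input has already been segmented into words (the abstract does not say which segmenter the authors used), adopts BERT's common 15% masking ratio as an assumption, and omits BERT's 80/10/10 replacement rule for brevity. The function name `whole_word_mask` and the example sentence are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"


def whole_word_mask(
    words: Sequence[str], mask_ratio: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Apply whole word masking to a sentence that is already split into words.

    Chinese BERT/RoBERTa tokenizers work at the character level, so plain
    masking may hide only part of a multi-character word. Whole word masking
    picks words instead and masks every character of each picked word.

    Returns (masked_tokens, labels); labels keep the original character at
    masked positions and "" elsewhere (positions the MLM loss would ignore).
    """
    rng = random.Random(seed)

    # Flatten words into character tokens, remembering each token's word index.
    tokens: List[str] = []
    word_ids: List[int] = []
    for wid, word in enumerate(words):
        for ch in word:
            tokens.append(ch)
            word_ids.append(wid)

    # Aim for about mask_ratio of the characters (minimum budget of one).
    budget = max(1, round(len(tokens) * mask_ratio))
    order = list(range(len(words)))
    rng.shuffle(order)
    chosen = set()
    covered = 0
    for wid in order:
        if covered >= budget:
            break
        if not words[wid]:
            continue  # skip empty strings; they contribute no tokens
        chosen.add(wid)
        covered += len(words[wid])

    # Mask every character of each chosen word; keep the rest unchanged.
    masked: List[str] = []
    labels: List[str] = []
    for tok, wid in zip(tokens, word_ids):
        if wid in chosen:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append("")
    return masked, labels


# Hypothetical pre-segmented patent title:
# "a shock-absorbing device for aircraft landing gear".
words = ["一种", "飞机", "起落架", "减震", "装置"]
masked, labels = whole_word_mask(words, seed=1)
print(" ".join(masked))
```

Because every word in the example has at least two characters and the budget is round(11 × 0.15) = 2, exactly one whole word is masked, never a single character split off from a word.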

Keywords: pre-trained language model; RoBERTa; inventive principle; whole word masking; text classification

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
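The abstract reports accuracy plus macro-averaged precision, recall, and F1 but does not define them. The sketch below shows one standard way to compute these metrics from true and predicted labels in plain Python. The reported figures are arithmetically consistent with F1_Macro being the harmonic mean of P_Macro and R_Macro (2 × 95.69 × 94 / (95.69 + 94) ≈ 94.84), so that definition is used here; whether the authors computed it this way is an assumption. The function names `macro_scores` and `harmonic_f1` and the toy labels are hypothetical, not data from the paper.

```python
from typing import Hashable, List, Sequence, Tuple


def harmonic_f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_scores(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float, float, float]:
    """Return (accuracy, P_macro, R_macro, F1_macro) for single-label classification.

    P_macro and R_macro are unweighted means of the per-class values, so each
    class counts equally however many samples it has. F1_macro is taken as the
    harmonic mean of P_macro and R_macro; averaging per-class F1 scores is a
    common alternative that can give a different number.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred), key=str)
    precisions: List[float] = []
    recalls: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    p_macro = sum(precisions) / len(classes)
    r_macro = sum(recalls) / len(classes)
    return accuracy, p_macro, r_macro, harmonic_f1(p_macro, r_macro)


# Toy labels (hypothetical inventive-principle IDs), not data from the paper.
y_true = ["IP1", "IP1", "IP2", "IP2", "IP3"]
y_pred = ["IP1", "IP2", "IP2", "IP2", "IP3"]
print(macro_scores(y_true, y_pred))

# Consistency check against the reported P_Macro = 95.69 and R_Macro = 94 (in %).
print(round(harmonic_f1(95.69, 94.0), 2))  # 94.84
```

Macro averaging is a reasonable choice for inventive-principle labels because some principles are likely to be far rarer than others in a patent corpus, and micro-averaged scores would let the frequent classes dominate.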
