Authors: HONG Qunye (洪群业)[1]; LIU Qi (刘琦); LIU Chunyan (刘春燕); ZHENG Lu (郑路)[1]; LI Yehui (李烨辉); YANG Shenxue (杨申学)
Affiliations: [1] Zhengzhou Tobacco Research Institute of China National Tobacco Corporation (CNTC), Zhengzhou 450001; [2] Intellectual Property Publishing House Co., Ltd., Beijing 100081; [3] School of Intellectual Property, Nanjing University of Science & Technology, Nanjing 210094
Source: China Invention & Patent (《中国发明与专利》), 2024, No. 8, pp. 21-29 (9 pages)
Funding: Major Science and Technology Project of China National Tobacco Corporation, "Research and Application of Comprehensive Intellectual Property Big Data Services for the Tobacco Industry" (No. 110202101082).
Abstract: Taking tobacco-industry patents as a case study, this paper investigates intelligent classification of tobacco-related technology patent documents using a SimBERT+CNN deep learning model. The model is intended either for automatic technology classification of patent data or for assisting manual classification. The method is as follows. Tobacco-related patent documents are manually annotated with two-level technology classification labels, and patents from both the tobacco technology class and the non-tobacco technology class serve as sample data for deep learning. Next, the authors select patents that carry X-category citations, which are search-report citations marked as particularly relevant on their own. From these patents, the claims and the corresponding text passages of the cited patents are extracted as sentence pairs, and these pairs are used to optimize the training of a SimBERT-based semantic model. The optimized SimBERT model is then used to produce textual feature vectors and IPC classification code feature vectors for the tobacco-industry patent classification samples. The two feature vectors are concatenated and fed into a CNN model. The model was trained and tested empirically on more than 150,000 tobacco technology patents and more than 20,000 non-tobacco technology patents. On this data, the SimBERT+CNN model optimized in this way achieves higher test accuracy than BERT+CNN in both first-level tobacco technology classification and second-level technology classification.
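The abstract describes the classifier input as a text feature vector concatenated with an IPC feature vector and fed to a CNN. It does not give the CNN architecture, the IPC encoding, or the vector sizes. The sketch below is a minimal pure-Python illustration of that data flow only. It makes several assumptions that are not the authors' method:

- A random vector stands in for the SimBERT embedding.
- IPC codes are encoded multi-hot over a hypothetical vocabulary of IPC subclasses (`IPC_VOCAB`).
- The CNN is a single 1D convolution layer with ReLU and global max pooling, followed by a linear layer and softmax over three hypothetical classes.
- All weights are random and untrained.

The IPC codes in the usage line are illustrative only.

```python
import math
import random

# Assumption: IPC codes are encoded as a multi-hot vector over a small,
# hypothetical vocabulary of IPC subclasses (first four characters, e.g. "A24F").
IPC_VOCAB = ["A24B", "A24C", "A24D", "A24F", "G06F"]


def ipc_multi_hot(codes, vocab):
    """Return a multi-hot vector marking which vocabulary subclasses appear in codes."""
    vec = [0.0] * len(vocab)
    for code in codes:
        subclass = code[:4]  # "A24F 40/40" -> "A24F"
        if subclass in vocab:
            vec[vocab.index(subclass)] = 1.0
    return vec


def conv1d_relu_maxpool(x, kernels):
    """Valid 1D convolution with each kernel, ReLU, then global max pooling."""
    pooled = []
    for kernel in kernels:
        k = len(kernel)
        best = 0.0  # max over ReLU outputs equals max(0, largest raw response)
        for i in range(len(x) - k + 1):
            response = sum(w * v for w, v in zip(kernel, x[i:i + k]))
            best = max(best, response)
        pooled.append(best)
    return pooled


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text_vec, ipc_codes, kernels, out_weights, out_bias):
    """Concatenate text and IPC features, apply the CNN layer, return class probabilities."""
    fused = list(text_vec) + ipc_multi_hot(ipc_codes, IPC_VOCAB)  # feature concatenation
    pooled = conv1d_relu_maxpool(fused, kernels)
    logits = [
        sum(w * f for w, f in zip(row, pooled)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(logits)


# Toy run with random, untrained weights; a real system would learn these.
rng = random.Random(0)
text_vec = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for a SimBERT embedding
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 filters, width 3
out_weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 hypothetical classes
out_bias = [0.0, 0.0, 0.0]

# Illustrative IPC codes only.
probs = classify(text_vec, ["A24F 40/40", "A24B 15/10"], kernels, out_weights, out_bias)
print([round(p, 3) for p in probs])
```

Global max pooling gives one value per filter regardless of the fused vector's length. This is why the text and IPC parts can simply be concatenated before the convolution. A trained system would learn the weights in a deep learning framework rather than use random values as here.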