Identifying Potential High-Value Patents Based on High-Quality Semantics and BiAttention Mechanism


Authors: Dou Luyao; Zhou Zhigang; Shen Jing; Feng Yu; Miao Junzhong

Affiliations: [1] School of Information, Shanxi University of Finance and Economics, Taiyuan 030006, China [2] School of Economics and Business Administration, Chongqing University, Chongqing 400030, China [3] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2025, Issue 3, pp. 56-68 (13 pages)

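Funding: This work is one of the research outcomes of the National Natural Science Foundation of China (Grant No. 61902226), the Key Project of the China Association for Educational Technology (Grant No. XJJ202205014), and the Shanxi Province Graduate Education and Teaching Reform Project (Grant No. 2023JG110).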

Abstract: [Objective] This study addresses two challenges in identifying potential high-value patents: long-distance dependencies in sequence modeling and the extraction of key information from sequence features. It aims to improve both the accuracy and the interpretability of identification. [Methods] We propose XLBBC, a model for identifying potential high-value patents that combines the pre-trained XLNet model with a bidirectional attention mechanism (BiAttention). XLNet produces the patent text representation and high-quality semantics, and a BiGRU network then captures global text sequence information. A BiAttention layer is embedded next so that the model can focus on different parts of the input sequence, and a CNN layer captures key phrases and specific patterns in the patent text. An empirical study is conducted on a mixed patent dataset covering amorphous alloys, industrial robots, perovskite solar cells, and gene chips. [Results] At a data scale of 40,000 patent records, the XLBBC model combines high accuracy (0.89) with high consistency (0.65). The model's prediction accuracy reaches about 42%, roughly 9% higher than models in existing studies. [Limitations] The study does not consider the relationship and integration mechanisms between standard-essential patents and high-value patents, and the efficiency and scalability of the algorithm still need further optimization. [Conclusions] The XLBBC model outperforms traditional methods in handling complex textual data and shows superior text classification performance compared with CNN-based ensemble models. XLNet excels in global semantic understanding, and placing the attention layer between the XLNet-BiGRU layer and the CNN layer yields better model performance.
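
The abstract places an attention layer between the BiGRU outputs and the CNN, which implies that the attention step returns a sequence of the same length rather than a single pooled vector. The record does not describe the internals of BiAttention, so the sketch below uses generic scaled dot-product self-attention as a stand-in. The function names, the toy hidden states, and the absence of learned projections are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Generic scaled dot-product self-attention (NOT the paper's BiAttention).

    states: list of T hidden vectors of equal length d, standing in for
            BiGRU outputs. No learned query/key/value projections are used.
    Returns (weights, outputs): a T x T weight matrix and T re-weighted
    vectors, so the sequence length is preserved for a later CNN layer.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    weights, outputs = [], []
    for query in states:
        # Similarity of this position to every position in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all hidden states for this position.
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, states)) for j in range(dim)]
        )
    return weights, outputs


# Toy sequence of three 2-dimensional hidden states (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights, outputs = self_attention(states)
```

Each row of `weights` sums to 1 and shows how strongly one position attends to the others. Because `outputs` has one vector per input position, a convolution over the sequence can still be applied afterwards, which matches the layer order the abstract reports.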

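Keywords: high-quality semantics; bidirectional attention mechanism; XLNet; bidirectional gated recurrent unit (BiGRU); convolutional neural network (CNN); high-value patents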

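Classification: TP393 [Automation and Computer Technology: Computer Application Technology]; G250 [Automation and Computer Technology: Computer Science and Technology]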

 
