Research on Automatic Keyword Extraction from Chinese Patents Based on Word and Sentence Importance  Cited by: 5

Automatic Keywords Extraction from Chinese Patents Based on Sentence Importance Ranking


Authors: Wang Zhihong (王志宏) [1], Guo Yi (过弋) [1,2]

Affiliations: [1] East China University of Science and Technology, Shanghai 200237; [2] Shihezi University, Shihezi, Xinjiang 832003

Source: Information Studies: Theory & Application (《情报理论与实践》), 2018, No. 9, pp. 123-129, 160 (8 pages)

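Funding: Research output of the National Natural Science Foundation of China project "Information Intent Detection, Modeling, and Group Intent Reasoning for Event Analysis" (No. 61462073) and the Science and Technology Commission of Shanghai Municipality project "Knowledge-Base-Driven Data Search Engine Technology" (No. 17DZ1101003)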

Abstract: [Purpose/significance] Patent keywords summarize the core content of a patent. Extracting them efficiently and accurately helps readers find patents quickly and also matters for patent classification, clustering, retrieval, and translation. [Method/process] This paper proposes a new approach to keyword extraction based on the idea that "the keywords are in the key sentences". It first builds a ranking model for patent abstract sentences that combines semantic graph features of the sentence network (a sentence-embedding graph) with heuristic rule features. Only the top-Ks% of sentences then take part in the keyword calculation, and a sentence semantic weight is introduced into the keyword weight calculation so that the importance of a sentence is passed on to the words it contains. [Result/conclusion] Experiments on a real Chinese patent dataset show that selecting an appropriate proportion of key sentences for keyword extraction improves the F-score by roughly 6% to 13% over traditional keyword extraction algorithms, effectively reducing noise in the original documents and improving keyword extraction.
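The pipeline outlined in the abstract (rank sentences, keep the top-Ks%, let sentence weights flow into word weights) can be illustrated with a minimal Python sketch. This is not the authors' model: the abstract does not give its features or formulas, so the sketch swaps in a TextRank-style walk over a Jaccard word-overlap graph for the sentence ranking and uses a sentence-score-weighted term count for the keyword weights. The function names `rank_sentences` and `extract_keywords`, the parameters `ks_percent`, `damping`, and `iterations`, and the sample tokens are all hypothetical, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score tokenized sentences with a TextRank-style walk.

    Assumption: Jaccard word overlap stands in for the semantic
    similarity used in the paper, which the abstract does not specify.
    """
    n = len(sentences)
    if n == 0:
        return []
    vocab = [set(s) for s in sentences]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                union = vocab[i] | vocab[j]
                sim[i][j] = len(vocab[i] & vocab[j]) / len(union) if union else 0.0
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def extract_keywords(sentences, ks_percent=50, top_n=5):
    """Keep the top ks_percent of sentences, then weight words by sentence score."""
    scores = rank_sentences(sentences)
    keep = max(1, math.ceil(len(sentences) * ks_percent / 100))
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    weights = Counter()
    for i in order[:keep]:
        for word in sentences[i]:
            # Hypothetical weighting: every occurrence adds its sentence's score.
            weights[word] += scores[i]
    return [word for word, _ in weights.most_common(top_n)]


# Hypothetical, pre-tokenized abstract sentences.
sample = [
    ["battery", "electrode", "coating"],
    ["electrode", "coating", "battery", "method"],
    ["drawing", "figure"],
]
keywords = extract_keywords(sample, ks_percent=60, top_n=10)
```

With `ks_percent=60`, only the two mutually similar sentences are kept, so the words of the unrelated third sentence never enter the keyword weights. Real Chinese patent text would first need word segmentation and stop-word filtering, which the sketch leaves out.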

Keywords: Chinese patents; key sentences; sentence ranking; patent keywords; automatic extraction

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
