Research on Keyword Extraction Combined with Domain Knowledge for Film and Television Text (结合领域知识的影视文本关键词提取算法研究)

Cited by: 1

Authors: WANG Fang; LIU Jiaen; LI Jing (School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China)

Affiliation: [1] School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617

Source: Journal of Beijing Institute of Petrochemical Technology (《北京石油化工学院学报》), 2022, No. 3, pp. 57-63 (7 pages)

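Funding: Excellent Talents Project of the Organization Department of the Beijing Municipal Committee (2018000020124G089); General Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KM202010017011).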

Abstract: Taking film and television text as an example, this paper studies a domain keyword extraction algorithm based on graph ranking. Film and television domain knowledge is mined and incorporated at three stages: candidate keyword generation, candidate word graph construction, and graph node ranking. On this basis, a keyword extraction model for film and television text is designed and implemented. The model introduces a lexicon of film and television entities when generating candidate keywords, adds film and television entity relations when constructing the candidate word graph, and adds weights for semantic relations between entities to the ranking algorithm. Candidates are then ranked with graph ranking, and the top-ranked candidates are returned as the extracted keywords. The model does not require large amounts of manually annotated data and extracts keywords from film and television text efficiently. Experimental results show that importing domain-specific vocabulary into the word segmentation lexicon reduces the errors a general-purpose segmentation model makes on specialized terms and improves candidate recall. Introducing domain knowledge into candidate word graph construction and candidate ranking significantly improves extraction performance with both TextRank and PositionRank. The proposed method has strong application value in domain keyword extraction scenarios.
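The three stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`merge_entities`, `build_graph`, `rank`), the window size, the `relation_weight` value, and the toy lexicon and relation list are all assumptions made for illustration. Entity merging here is a simple stand-in for loading a domain lexicon into a word segmenter, and the ranking step is a plain weighted TextRank-style iteration in which entity relations only add edge weight.

```python
from collections import defaultdict


def merge_entities(tokens, lexicon, max_len=4):
    """Stage 1 (stand-in): greedily merge adjacent tokens that form a lexicon entity."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest span first, down to spans of two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            joined = "".join(tokens[i:i + n])
            if joined in lexicon:
                out.append(joined)
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def build_graph(tokens, window=3, entity_relations=None, relation_weight=2.0):
    """Stage 2: co-occurrence graph, with extra weight on known entity relations."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    present = set(tokens)
    for a, b in (entity_relations or []):
        # Only relations between entities that occur in this text are used.
        if a in present and b in present and a != b:
            graph[a][b] += relation_weight
            graph[b][a] += relation_weight
    return graph


def rank(graph, damping=0.85, iterations=50):
    """Stage 3: weighted TextRank-style iteration over the candidate graph."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(
                score[m] * w / sum(graph[m].values())
                for m, w in graph[n].items()
            )
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new
    return score


if __name__ == "__main__":
    # Hypothetical pre-segmented text, entity lexicon, and entity relation.
    tokens = ["流浪", "地球", "导演", "郭帆", "科幻", "电影", "流浪", "地球"]
    lexicon = {"流浪地球", "郭帆"}
    relations = [("流浪地球", "郭帆")]

    candidates = merge_entities(tokens, lexicon)
    scores = rank(build_graph(candidates, entity_relations=relations))
    for word, s in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(word, round(s, 3))
```

In this sketch the domain knowledge enters in two places: the lexicon prevents an entity such as a film title from being split into generic words, and the relation list raises the edge weight between related entities so that they reinforce each other during ranking. A position-aware variant in the spirit of PositionRank would replace the uniform `(1 - damping) / len(nodes)` term with a bias toward words that appear early in the text.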

Keywords: film and television text; graph ranking; keyword extraction; domain knowledge

Classification number: TP368 [Automation and Computer Technology: Computer Systems Architecture]

 
