Authors: 何梓源 (HE Zi-yuan), 张仰森 (ZHANG Yang-sen)[1], 吴云芳 (WU Yun-fang)[2], 亓文法 (QI Wen-fa)[3]
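Affiliations: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China; [2] Institute of Computational Linguistics, Peking University, Beijing 100871, China; [3] WANGXUAN Institute of Computer Technology, Peking University, Beijing 100080, China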
Source: Computer Engineering and Design (《计算机工程与设计》), 2022, No. 2, pp. 580-586 (7 pages)
Funding: National Natural Science Foundation of China (61772081); National Key Research and Development Program of China (2018YFB1403104).
Abstract: To provide accurate keywords that are closer to everyday language, this paper proposes TI-RANK (TTF-ICDF-DWTextRank), a keyword extraction model for video bullet-screen comments (danmaku) that combines word frequency with word meaning. First, the title is classified to obtain its key information, which is then used in word-frequency extraction to build the TTF algorithm. Second, the ICDF algorithm is built with a piecewise function to account for how word frequency and the number of documents affect extraction. Third, the DWTextRank model introduces semantic information and uses Chinese pinyin as the unit for computing edit distance. Experimental results show that the F1 score of keywords extracted by TI-RANK exceeds 0.8, about 20% higher than the traditional TF-IDF and TextRank algorithms. To evaluate extraction accuracy more reasonably, a three-level gradient evaluation standard is defined over keywords ranked in descending order of importance; this standard better reflects how the correctness of the top-ranked keywords affects accuracy.
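The abstract says only that DWTextRank uses Chinese pinyin as the unit of edit-distance calculation; it does not give the formula. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it computes standard Levenshtein distance over toneless pinyin syllables (one syllable per character). The `PINYIN` table, `to_pinyin`, and `pinyin_similarity` (normalized here as 1 - distance / longer length) are hypothetical names introduced for illustration. The homophone slang pair 蓝瘦香菇 / 难受想哭 is likewise an illustrative choice, picked because danmaku can contain such spellings.

```python
# Hypothetical toneless pinyin table covering only the demo characters;
# a real system would need a full character-to-pinyin dictionary.
PINYIN = {
    "蓝": "lan", "难": "nan",
    "瘦": "shou", "受": "shou",
    "香": "xiang", "想": "xiang",
    "菇": "gu", "哭": "ku",
}


def levenshtein(a, b):
    """Classic edit distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def to_pinyin(word):
    """Map each character to its toneless pinyin syllable (assumed one unit per character)."""
    return [PINYIN[ch] for ch in word]


def pinyin_similarity(w1, w2):
    """Assumed similarity in [0, 1]: 1 - distance / longer length, over pinyin units."""
    s1, s2 = to_pinyin(w1), to_pinyin(w2)
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1 - levenshtein(s1, s2) / longest


print(levenshtein("蓝瘦香菇", "难受想哭"))                        # 4 (character units)
print(levenshtein(to_pinyin("蓝瘦香菇"), to_pinyin("难受想哭")))  # 2 (pinyin units)
print(pinyin_similarity("蓝瘦香菇", "难受想哭"))                  # 0.5
```

Comparing by syllable rather than by character means the two spellings differ in 2 of 4 units instead of 4 of 4. A real pipeline would also need a policy for polyphonic characters such as 难 (nán/nàn), and it might compare at the letter level instead; the abstract does not say which granularity the paper uses.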
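Keywords: term frequency-inverse document frequency (TF-IDF); text keyword extraction; keyword extraction combining word frequency and word meaning; three-level gradient evaluation standard; video bullet-screen comments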
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]