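Affiliation: [1] Research Institute of Civil Aviation Safety, Civil Aviation University of China, Tianjin 300300
Source: Journal of Transport Information and Safety (《交通信息与安全》), 2016, No. 5, pp. 82-86, 101 (6 pages)
Funding: Supported by the National Key Technology R&D Program of China (2014BAG01-B04)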
Abstract: In civil aviation safety information, keywords summarize the content of a text and help safety personnel extract and retrieve information. After reviewing current keyword extraction techniques and analyzing the features of civil aviation domain terms, this paper proposes a keyword extraction model based on the naive Bayes model. The selected features are each word's length, its part of speech, and a term frequency-inverse document frequency (TF-IDF) value that incorporates word position and paragraph span; together these features represent the basic attributes of each candidate word. The model is trained on civil aviation safety information with manually labeled keywords to obtain the probability of each feature. These probabilities are then used to compute a keyword score for each candidate word, and the three highest-scoring words are output as keywords. Experiments on civil aviation safety information show that, compared with the traditional TF-IDF and KEA algorithms, the proposed method improves accuracy by 18% and 11.9%, respectively, and improves the recognition rate of civil aviation terms by 15.3% and 12.3%, respectively. The results indicate that, relative to these traditional algorithms, the proposed method substantially improves both keyword extraction accuracy and the ability to recognize civil aviation terms.
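Keywords: civil aviation safety information; keyword extraction; TF-IDF; naive Bayes model
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]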
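The scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how features are discretized or how feature probabilities are combined into a score, so the sketch assumes categorical features (with the TF-IDF value bucketed into "high" and "low"), Laplace smoothing, and a log-odds score. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Estimate counts from (features, is_keyword) pairs.

    Assumes both classes (keyword / non-keyword) occur in the samples.
    """
    class_counts = Counter()
    feat_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    values = defaultdict(set)           # feature name -> distinct values seen
    for feats, label in samples:
        class_counts[label] += 1
        for name, value in feats.items():
            feat_counts[(label, name)][value] += 1
            values[name].add(value)
    return class_counts, feat_counts, values


def log_posterior(feats, model, label):
    """Unnormalized log P(label | feats) under the naive Bayes assumption."""
    class_counts, feat_counts, values = model
    total = sum(class_counts.values())
    score = math.log(class_counts[label] / total)
    for name, value in feats.items():
        count = feat_counts[(label, name)][value]
        # Laplace (add-one) smoothing: an assumption, not from the abstract.
        score += math.log((count + 1) / (class_counts[label] + len(values[name])))
    return score


def keyword_score(feats, model):
    """Log-odds that a candidate word is a keyword (assumed scoring rule)."""
    return log_posterior(feats, model, True) - log_posterior(feats, model, False)


def top_keywords(candidates, model, k=3):
    """Return the k highest-scoring candidate words, as in the abstract (k=3)."""
    ranked = sorted(candidates, key=lambda w: keyword_score(candidates[w], model),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: word length, part of speech, bucketed TF-IDF value.
training = [
    ({"length": 2, "pos": "n", "tfidf": "high"}, True),
    ({"length": 4, "pos": "n", "tfidf": "high"}, True),
    ({"length": 1, "pos": "v", "tfidf": "low"}, False),
    ({"length": 2, "pos": "v", "tfidf": "low"}, False),
    ({"length": 1, "pos": "n", "tfidf": "low"}, False),
]
candidates = {
    "runway": {"length": 2, "pos": "n", "tfidf": "high"},
    "incursion": {"length": 4, "pos": "n", "tfidf": "high"},
    "report": {"length": 2, "pos": "v", "tfidf": "low"},
    "the": {"length": 1, "pos": "v", "tfidf": "low"},
    "aircraft": {"length": 2, "pos": "n", "tfidf": "low"},
}
model = train(training)
print(top_keywords(candidates, model))
```

In this sketch the position and paragraph-span weighting that the abstract folds into the TF-IDF value is left out; it would enter only through how the `tfidf` bucket is computed for each candidate word.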