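Affiliation: [1] Institute of Systems Engineering, Dalian University of Technology, Dalian, Liaoning 116023
Source: Application Research of Computers (《计算机应用研究》), 2007, No. 7, pp. 168-170 (3 pages)
Funding: Supported by the National Natural Science Foundation of China (70431001; 70271046)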
Abstract: This paper proposes a bridge-connection pattern filtering algorithm (BPFA) that extracts feature words from text without relying on a dictionary. The algorithm first counts the co-occurrence patterns of Chinese characters in the documents together with their frequencies, then removes the bridge-connection frequency from each pattern to obtain its support frequency. Words are identified and extracted according to these support frequencies rather than the raw occurrence frequencies. Experimental results show that BPFA improves both the precision and the recall of the segmentation results. The algorithm suits Chinese information processing applications that are sensitive to word frequency, such as text categorization and automatic summarization.
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
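
The abstract gives only the outline of BPFA (count character patterns, discount the frequency that comes from bridging, keep patterns with enough remaining support) and not its formulas. The following is a minimal sketch of that general idea, not the authors' algorithm: it swaps in a simple substring-reduction rule, where a pattern's frequency is reduced by the frequency of the most common pattern one character longer that contains it. The function names, the `max_len` and `min_support` parameters, and the sample fragments are all assumptions made for illustration, and the input is assumed to be punctuation-free text fragments.

```python
from collections import Counter


def count_patterns(texts, max_len=4):
    """Count every character n-gram of length 2..max_len in the fragments."""
    counts = Counter()
    for text in texts:
        for n in range(2, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def support_frequencies(counts):
    """Approximate support = raw frequency minus the frequency absorbed by
    the most frequent pattern that is one character longer.

    This is a stand-in for the paper's bridge-frequency elimination,
    whose exact definition is not given in the abstract.
    """
    absorbed = {}
    for longer, freq in counts.items():
        if len(longer) < 3:
            continue
        # The only substrings one character shorter are the prefix and suffix.
        for sub in {longer[:-1], longer[1:]}:
            absorbed[sub] = max(absorbed.get(sub, 0), freq)
    return {p: f - absorbed.get(p, 0) for p, f in counts.items()}


def extract_words(texts, max_len=4, min_support=2):
    """Keep the patterns whose approximate support reaches min_support."""
    support = support_frequencies(count_patterns(texts, max_len))
    return {p: s for p, s in support.items() if s >= min_support}


if __name__ == "__main__":
    # Hypothetical sample fragments, chosen only to exercise the sketch.
    fragments = ["图书馆开放", "图书馆关闭", "去图书馆", "买图书", "卖图书"]
    print(extract_words(fragments, max_len=3, min_support=2))
```

On these fragments the pattern 书馆 occurs only inside 图书馆, so its support drops to zero and it is filtered out, while 图书 keeps the occurrences that are not part of 图书馆. Patterns of the maximum length are never reduced in this sketch, which is one of its known simplifications.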