Authors: YU Juan (于娟); WU Xiao-peng (吴晓鹏); LIAO Xiao (廖晓); LIU Jian-guo (刘建国)
Affiliations: [1] School of Economics and Management, Fuzhou University, Fuzhou 350108; [2] School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521; [3] Institute of Finance and Accounting, Shanghai University of Finance and Economics, Yangpu, Shanghai 200433
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2021, No. 1, pp. 84-90 (7 pages)
Funding: National Natural Science Foundation of China (71771054).
Abstract: French is one of the working languages of the United Nations. Its complex grammar and inflection rules make term extraction methods such as N-gram unreliable, which reduces the accuracy of French text mining. This paper proposes an efficient French term extraction method that automatically extracts a term set, comprising both single words and phrases, from the French text being analyzed, and so builds the lexicon that French text mining requires. First, the word co-occurrence information of the corpus is compressed into an FP (Frequent Pattern) sequence tree, from which frequent word sequences are extracted rapidly. Then the termhood of each frequent word sequence is calculated to obtain the term set. The FP sequence tree is a newly designed data structure that reduces the time complexity of word co-occurrence statistics to linear time. Experiments show that the proposed method reaches a precision of approximately 90% and a higher recall than existing French term extraction methods, so it can effectively support French text mining applications.
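The two-step pipeline described in the abstract (compress word sequences into a tree, then score the frequent ones) can be illustrated with a minimal Python sketch. It is not the paper's FP sequence tree: it uses a plain prefix tree over word windows, and every name in it (`Node`, `build_tree`, `frequent_sequences`, `termhood`, `max_len`, `min_count`) is hypothetical. The abstract does not give the termhood formula, so the score below is a placeholder chosen only to show where such a measure would plug in.

```python
from collections import defaultdict


class Node:
    """One node of a word-sequence prefix tree; `count` is how often the path occurs."""

    def __init__(self):
        self.children = defaultdict(Node)
        self.count = 0


def build_tree(sentences, max_len=3):
    """Insert, for every word position, the window of up to max_len following words."""
    root = Node()
    for sentence in sentences:
        words = sentence.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node.children[word]
                node.count += 1
    return root


def frequent_sequences(root, min_count=2):
    """Return {word tuple: count} for every path that occurs at least min_count times."""
    found = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for word, child in node.children.items():
            # A child's count never exceeds its parent's, so infrequent
            # branches can be pruned without missing frequent sequences.
            if child.count >= min_count:
                seq = path + (word,)
                found[seq] = child.count
                stack.append((seq, child))
    return found


def termhood(seq, counts):
    """Placeholder score (an assumption, not the paper's measure):
    the sequence count divided by the count of its rarest word."""
    if len(seq) == 1:
        return 1.0
    return counts[seq] / min(counts[(w,)] for w in seq)


sentences = [
    "la pomme de terre est bonne",
    "une pomme de terre cuite",
    "la terre est ronde",
]
counts = frequent_sequences(build_tree(sentences), min_count=2)
for seq in sorted(counts, key=len, reverse=True):
    print(" ".join(seq), counts[seq], termhood(seq, counts))
```

Building the tree touches each word at most `max_len` times, so the counting step is linear in corpus size for a fixed window. This is only loosely analogous to the linear-time claim in the abstract, which refers to the authors' own data structure.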
Keywords: FP sequence tree; French text mining; term extraction; termhood; text compression
Classification: TP182 [Automation and Computer Technology: Control Theory and Control Engineering]