Authors: Luo Ming[1], Huang Hailiang[1,2] (College of Information Management & Engineering; Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance & Economics, Shanghai 200433, China)
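Affiliations: [1] College of Information Management & Engineering, Shanghai University of Finance & Economics, Shanghai 200433 [2] Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance & Economics, Shanghai 200433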
Source: Application Research of Computers (《计算机应用研究》), 2018, No. 8, pp. 2281-2284, 2288 (5 pages)
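Funding: Shanghai Science and Technology Talent Program (14XD1421000); Shanghai Science and Technology Innovation Action Plan (16511102900); Shanghai University of Finance & Economics 2014 Graduate Innovation Fund (CXJJ-2014-438)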
Abstract: Machine learning text classification based on bag of words (BOW) suffers from high dimensionality, high sparsity, an inability to recognize synonyms, and a loss of semantic information, while text classification based on rule patterns achieves relatively high precision but poor robustness. To address these problems, this paper proposes a new method that uses lexical-semantic rule patterns to extract event semantic annotations from financial news text and then uses those annotations as classification features in machine learning text classification. Experiments show that, with the same feature selection algorithm and the same classification algorithm, the method improves on BOW-based text classification by 8.6% in F1, 7.7% in precision, and 8.8% in recall. The method combines the advantages of knowledge-driven and data-driven approaches to text classification while avoiding their main drawbacks, and has significant practical and research reference value.
Keywords: text classification; financial text; semantic annotation; lexical-semantic patterns; finite-state machine
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]