Authors: WANG Zhijun (王芷筠); CHANG Miao (常杪) [1]; ZHOU Li (周黎); GUO Peikun (郭培坤) [1]; GU Meifeng (谷美枫)
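Affiliations: [1] School of Environment, Tsinghua University; [2] Service Center of Environmental Information and Technology Assessment, Panzhihua Bureau of Ecology and Environment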
Source: Journal of Environmental Engineering Technology (《环境工程技术学报》), 2021, No. 2, pp. 385-392 (8 pages)
Funding: Beijing Municipal Science and Technology Program, Capital Blue Sky Action Cultivation Project (Z191100009119010).
Abstract: As the number of environmental policies and regulations in China keeps growing, collating, summarizing, analyzing and interpreting them purely by hand has become increasingly difficult. Using computer techniques such as text mining to support information extraction, content analysis and intelligent management of environmental policies and regulations is therefore of great significance. Accurate word segmentation (tokenization) is a prerequisite for every text mining function. To improve the segmentation of policy and regulation texts, environmental policies and regulations published on the official websites of China's ecological and environmental departments at all levels were collected as the corpus, and an environmental management professional lexicon was built through new word discovery algorithms combined with manual supplementation and correction. The empirical results showed that adding the lexicon raised the segmentation accuracy on policy and regulation texts from 72.6% to 94.1% and reduced the misjudgment rate of automatic policy text classification based on a support vector machine model by 22.7%. In addition, the word frequency statistics and keyword extraction results obtained after adding the lexicon could provide more comprehensive and more timely statistical information for environmental policy analysis.
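The effect of a domain lexicon on segmentation can be illustrated with a small sketch. The abstract does not say which segmenter the authors used, so the code below swaps in plain forward maximum matching as a stand-in; the function name `segment`, both toy lexicons, and the sample phrase are hypothetical and are not taken from the paper.

```python
def segment(text, lexicon):
    """Forward maximum matching (illustrative stand-in, not the paper's method).

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general-purpose lexicon: lacks the environmental term.
general_lexicon = {"挥发", "有机物", "治理"}
# Hypothetical domain lexicon: adds "挥发性有机物" (volatile organic compounds).
domain_lexicon = general_lexicon | {"挥发性有机物"}

text = "挥发性有机物治理"  # "treatment of volatile organic compounds"
base_tokens = segment(text, general_lexicon)
domain_tokens = segment(text, domain_lexicon)
print(base_tokens)
print(domain_tokens)
```

With only the general lexicon the technical term is split into fragments, while the added domain entry keeps it as one token. This is the kind of difference a professional lexicon is meant to produce, although real segmenters typically combine dictionaries with statistical models.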
Classification number: X11 [Environmental Science and Engineering: Environmental Science]