Authors: CHEN Jingwen (陈婧汶); CHEN Jianguo (陈建国) [1,3]; WANG Chengbin (王成彬) [1,3]; ZHU Yueqin (朱月琴)
Affiliations: [1] State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, Wuhan 430074, Hubei, China; [2] Collaborative Innovation Center for Exploration of Strategic Mineral Resources, China University of Geosciences (Wuhan), Wuhan 430074, Hubei, China; [3] Faculty of Earth Resources, China University of Geosciences (Wuhan), Wuhan 430074, Hubei, China; [4] Key Laboratory of Geological Information Technology, Ministry of Natural Resources, Beijing 100037, China; [5] Development and Research Center, China Geological Survey, Beijing 100037, China
Source: China Mining Magazine (《中国矿业》), 2018, No. 9, pp. 69-74, 101 (7 pages)
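Funding: Supported by the Ministry of Land and Resources special research project for public welfare industries, "Research and Pilot Application of Geological Big Data Technology" (No. 201511079-02), and by the National Key R&D Program of China project "Knowledge Mining for Deep Mineral Prospecting Based on the 'Geological Cloud' Platform" (No. 2016YFC0600510)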
Abstract: Unlike English, Chinese has no natural delimiter such as a space between words, which makes word segmentation a hard problem in Chinese information processing. Geological and mineral texts contain a large number of out-of-vocabulary geological terms, and no segmentation method currently handles them well. This paper explores a segmenter designed specifically for geological and mineral texts, based on a dual-corpus conditional random field (CRF) model that combines dictionary features with the CRF, and compares it experimentally with a general-domain segmentation method and with a CRF model trained on a single corpus. In the open test, the proposed method clearly outperforms the other methods, reaching a precision of 94.80%, a recall of 92.68%, and an F-score of 93.73%. The method identifies out-of-vocabulary geological terms well while preserving the recognition rate for ordinary words, and it lays a foundation for natural language processing in the geological domain.
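The three figures reported in the abstract are related by the balanced F-measure, the harmonic mean of precision and recall. The sketch below is a minimal illustration, assuming the standard span-based evaluation for word segmentation (a predicted word counts as correct only if both of its boundaries match the gold segmentation). The record does not state the exact evaluation protocol, and the function names and the toy sentence are hypothetical, not taken from the paper.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def words_to_spans(words):
    """Map a segmented sentence to a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def evaluate(gold_words, pred_words):
    """Span-based precision, recall, and F-score (assumed protocol)."""
    gold, pred = words_to_spans(gold_words), words_to_spans(pred_words)
    correct = len(gold & pred)  # words whose two boundaries both match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Consistency check on the reported figures (P = 94.80%, R = 92.68%).
reported_f = f_score(0.9480, 0.9268)
print(f"{reported_f:.4f}")  # 0.9373

# Hypothetical toy sentence: a segmenter that splits a geological term.
gold = ["花岗岩", "侵入", "体"]
pred = ["花岗", "岩", "侵入", "体"]
p, r, f = evaluate(gold, pred)
```

In the toy sentence, splitting the term 花岗岩 (granite) into two pieces costs both precision and recall, which is the failure mode the abstract attributes to out-of-vocabulary geological terms.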