Authors: ZHANG Hong-gang (张洪刚)[1], LI Huan (李焕)[1]
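Affiliation: [1] School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China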
Source: Journal of South China University of Technology (Natural Science Edition) (《华南理工大学学报(自然科学版)》), 2017, No. 3, pp. 61-67 (7 pages)
Funding: Youth Science Fund of the National Natural Science Foundation of China (61601042)
Abstract: Chinese word segmentation is one of the fundamental technologies of Chinese natural language processing. Most conventional Chinese word segmentation methods rely on feature engineering, and verifying the effectiveness of candidate features requires intensive manual work. The rise of neural-network-based deep learning makes it possible for a model to learn features automatically. This paper studies Chinese word segmentation with a bidirectional long short-term memory (BLSTM) neural network. Semantic embedding vectors for Chinese characters are first learned from a large-scale corpus, and these character vectors are then fed into a BLSTM model to perform segmentation. Experiments on simplified Chinese datasets (PKU, MSRA and CTB) and a traditional Chinese dataset (HKCityU) show that, without relying on feature engineering, the BLSTM-based method still achieves good segmentation performance.
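The pipeline in the abstract (character vectors fed into a BLSTM that segments the text) is commonly framed as per-character sequence labeling. The sketch below illustrates that framing under stated assumptions; it is not the authors' implementation. It assumes the BMES tag set (Begin, Middle, End of a multi-character word; Single-character word), which the abstract does not specify. It uses random vectors in place of embeddings learned from a corpus and untrained random BLSTM weights, so its predicted tags are arbitrary. All function and class names (`words_to_tags`, `tags_to_words`, `LSTM`, `blstm_tag`) are hypothetical.

```python
import math
import random

TAGS = ["B", "M", "E", "S"]  # assumed BMES tag set


def words_to_tags(words):
    """Map a segmented sentence to one BMES tag per character (training labels)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Decode per-character tags into words; tolerates ill-formed tag sequences."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and buf:  # a new word starts: flush the pending one
            words.append(buf)
            buf = ""
        buf += ch
        if tag in ("E", "S"):  # the current word ends here
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTM:
    """Single-layer LSTM with random (untrained) weights, in pure Python."""

    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        # One matrix per gate (input, forget, output, candidate) over [x; h_prev; 1].
        self.W = [[[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid + 1)]
                   for _ in range(hid)] for _ in range(4)]

    def run(self, xs):
        h, c, outs = [0.0] * self.hid, [0.0] * self.hid, []
        for x in xs:
            z = x + h + [1.0]  # concatenate input, previous hidden state, bias term
            i, f, o, g = (matvec(W, z) for W in self.W)
            c = [sigmoid(f_k) * c_k + sigmoid(i_k) * math.tanh(g_k)
                 for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
            h = [sigmoid(o_k) * math.tanh(c_k) for o_k, c_k in zip(o, c)]
            outs.append(h)
        return outs


def blstm_tag(chars, emb_dim=8, hid=6, seed=0):
    """Tag each character using forward and backward LSTM states (untrained)."""
    rng = random.Random(seed)
    # Random vectors stand in for character embeddings learned from a corpus.
    emb = {ch: [rng.gauss(0, 1) for _ in range(emb_dim)] for ch in dict.fromkeys(chars)}
    xs = [emb[ch] for ch in chars]
    fwd, bwd = LSTM(emb_dim, hid, rng), LSTM(emb_dim, hid, rng)
    h_fwd = fwd.run(xs)
    h_bwd = bwd.run(xs[::-1])[::-1]  # re-align backward states with positions
    out_W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hid)] for _ in TAGS]
    tags = []
    for hf, hb in zip(h_fwd, h_bwd):
        scores = matvec(out_W, hf + hb)  # each position sees left and right context
        tags.append(TAGS[max(range(len(TAGS)), key=scores.__getitem__)])
    return tags


if __name__ == "__main__":
    words = ["中文", "分词", "是", "基础", "技术"]
    print(words_to_tags(words))  # ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E']
    sentence = "".join(words)
    print(tags_to_words(sentence, blstm_tag(sentence)))  # arbitrary: weights are untrained
```

In a trained model, the output layer would be fitted on the BMES labels produced by `words_to_tags`, often followed by a CRF or a constraint that forbids invalid transitions such as `B` followed by `S`. Concatenating forward and backward states lets each character's tag depend on context from both sides, which is the motivation for a bidirectional network.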
Classification: TP391 [Automation and Computer Technology / Computer Application Technology]