Construction of a Railway Engineering Geology Corpus and Word Embedding Based on Word2vec (cited by: 1)

Author: Dai Junhao (戴均豪)

Source: Technology Innovation and Application (《科技创新与应用》), 2022, No. 35, pp. 89-92 (4 pages)

Funding: Major Special Project of China Railway Construction Corporation (中国铁建), grant 2021-A02.

Abstract: As railway engineering geology work continues, large volumes of related text have accumulated. Because this text is unstructured and not intuitive to interpret, it is difficult to use efficiently in informatization. To convert the text into a form that computers can read directly, this paper targets the railway engineering geology domain: it collects literature, reports, specifications, manuals and other types of text, and uses the Jieba Chinese word segmentation library to build a railway engineering geology corpus of 4,192,189 words. Using the Word2vec model, the segmented words of the unstructured text are embedded into a word vector space and converted into numerical values that carry semantic information. Checks by dimensionality-reduction visualization, clustering, and semantic similarity calculation show that the corpus and the word vectors trained on it effectively capture semantic information. This provides a data foundation for semantic analysis, entity recognition, and knowledge graph construction in railway engineering geology.

Keywords: railway engineering geology; NLP; corpus; word vectors

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
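
The abstract names semantic similarity calculation as one of the checks applied to the trained word vectors, but it does not say which measure was used. The sketch below assumes cosine similarity, a common choice for word embeddings. The function names `cosine_similarity` and `most_similar`, the English words, and the 3-dimensional vectors are all hypothetical values made up for illustration; they are not taken from the paper's corpus or model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy vectors (assumption): real Word2vec vectors usually have
# on the order of a hundred or more dimensions and come from a trained model.
toy_vectors = {
    "granite": [0.9, 0.1, 0.0],
    "basalt": [0.8, 0.2, 0.1],
    "landslide": [0.1, 0.9, 0.2],
    "tunnel": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for other, score in most_similar("granite", toy_vectors):
        print(f"{other}: {score:.3f}")
```

In practice the vectors would come from a trained model rather than a hand-written dictionary, for example one trained on Jieba-segmented sentences. A word whose nearest neighbours are domain-related terms is the kind of evidence the abstract describes as the vectors recording semantic information.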
