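Author: Ouyang Jian (欧阳剑)
Affiliations: [1] Institute of Linguistics, Shanghai Normal University, Shanghai 200234 [2] Library of Guangxi University for Nationalities (research librarian)
Source: Journal of Library Science in China (《中国图书馆学报》), 2016, No. 2, pp. 66-80 (15 pages)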
Abstract: Traditional models for developing and using ancient books can no longer meet the needs of humanities research, and humanities scholars look forward to a digital humanities research paradigm that couples technical logic with humanistic logic. Starting from the in-depth development and use of ancient texts, this paper applies new information technology and interdisciplinary digital humanities methods to a large-scale corpus of Chinese ancient texts. Following a big data approach, the texts are collated, annotated, and automatically segmented into words. With word frequency statistics at its core, the study uses data denoising, statistical computation based on time-window units, and sliding-window prediction, together with real-time big data analysis technology, to provide real-time, online, multi-dimensional, visual, and quantitative analysis of the historical frequency distribution of characters and words. The result is a real-time statistical analysis platform for ancient texts that serves humanities research, chiefly linguistics, historical philology, and historical geography. The platform can help researchers discover new patterns, phenomena, and trends in large volumes of ancient texts, and is a preliminary attempt at innovating the way ancient texts are developed and used. 11 figures. 36 references.

Digital humanities, a new research paradigm, brings a new way of working to the traditional humanities and social sciences, because the traditional mode of developing and using ancient literature resources no longer fits the requirements of humanities research. This paper aims at the deep development and utilization of ancient literature resources, applying new information technology and digital humanities methods to ancient Chinese literature in order to construct a new platform for real-time textual statistical analysis in linguistics, historical literature studies, historical geography, etc. This study adopts a big data concept and applies sorting and labelling to ancient Chinese texts to construct a corpus of more than 40,000 ancient titles. It segments the texts into words by combining piecewise dictionary superposition with a bigram model, and applies the Grubbs method to denoise the data and eliminate as much problematic data as possible. With word frequency statistical analysis over the ancient corpus as the research focus, we use time-window-unit analytical computing to analyze word frequency and apply in-memory real-time computing to solve the bottleneck of reading big data. The statistical and analytical results are displayed as a micro-level scatter plot and a macro-level curve graph, with the time axis as the main line.

With the authors of the ancient books as the main line, we use geographic information system (GIS) technology to integrate and display the digitized ancient books, and use retrieval of the ancient literature as a clue to show the geographical distribution of the authors. This study improves the efficiency of real-time queries and visualizes word frequency by year as scatter diagrams and curve graphs. A statistical and analytical platform of ancient literatures and documents in linguistics, history and historical geography will
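
The abstract names time-window word frequency statistics and sliding-window analysis without giving details. The following is only a minimal sketch of what such a computation might look like, not the paper's implementation. The function names `window_frequencies` and `sliding_mean`, the 100-year window, the toy corpus, and the use of a trailing mean as the sliding-window step are all assumptions made for illustration; the sketch also assumes the texts are already segmented into tokens and dated by year.

```python
from collections import defaultdict


def window_frequencies(docs, word, window=100):
    """Relative frequency of `word` per time window (hypothetical helper).

    docs   -- iterable of (year, tokens) pairs; tokens are already segmented
    window -- window length in years (100 is an illustrative choice)
    Returns {window_start_year: occurrences / total tokens in that window}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, tokens in docs:
        start = (year // window) * window  # floor the year to its window start
        totals[start] += len(tokens)
        hits[start] += sum(1 for t in tokens if t == word)
    # Skip empty windows to avoid dividing by zero.
    return {s: hits[s] / totals[s] for s in sorted(totals) if totals[s]}


def sliding_mean(values, width=3):
    """Trailing mean over the last `width` values (a simple smoothing step)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - width + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Toy corpus, invented for illustration only.
docs = [
    (620, ["天", "下", "太", "平"]),
    (650, ["天", "子"]),
    (710, ["天", "下"]),
    (1050, ["江", "山"]),
]
freqs = window_frequencies(docs, "天", window=100)
smoothed = sliding_mean(list(freqs.values()), width=2)
```

Normalizing by the token total of each window keeps periods with many surviving texts from dominating the curve. A real pipeline would also need the denoising step the abstract mentions (the Grubbs method), which is omitted here.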