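Affiliations: [1] Jilin Provincial Institute of Education, Changchun, Jilin 130022; [2] Jilin Provincial Branch of the Industrial and Commercial Bank, Changchun, Jilin 130021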
Source: 《情报科学》 (Information Science), 2014, No. 11, pp. 147-151 (5 pages)
Abstract: Corpus research based on network retrieval invariably begins with the development of a corpus software system. Such a system is the foundation for work in corpus linguistics, machine translation, language teaching, and lexicography, and its quality determines both the scale a corpus can reach and the quality of the research built on it. The key stages in building a large-scale corpus software system are: document extraction; metadata creation; part-of-speech, syntactic, and learner-error annotation; and indexing, retrieval, and statistical analysis. For each of these stages, we collected a large number of corpus development packages from outside China and tested them programmatically, then analyzed and reviewed them in terms of theoretical approach, execution efficiency, accuracy, robustness, practicality, and support for Chinese. We hope the results can serve as a reference for building large-scale corpus software systems in China.
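The last stage named in the abstract (indexing, retrieval, and statistical analysis) can be illustrated with a minimal sketch. This is not taken from the paper or from any package it reviews: the function names, the whitespace tokenizer, and the two sample documents are all assumptions made for illustration. In particular, whitespace tokenization does not work for Chinese text, which would need a word segmenter in its place.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumption; unsuitable for Chinese)."""
    return text.lower().split()


def build_index(docs):
    """Indexing: map each token to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok].append((doc_id, pos))
    return index


def concordance(docs, index, term, width=2):
    """Retrieval: return keyword-in-context tuples (doc_id, left, keyword, right)."""
    lines = []
    for doc_id, pos in index.get(term.lower(), []):
        toks = tokenize(docs[doc_id])
        left = " ".join(toks[max(0, pos - width):pos])
        right = " ".join(toks[pos + 1:pos + 1 + width])
        lines.append((doc_id, left, toks[pos], right))
    return lines


def frequencies(docs):
    """Statistical analysis: count token occurrences across the whole corpus."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


# Hypothetical two-document corpus, used only to exercise the functions.
docs = {
    "d1": "the corpus supports retrieval",
    "d2": "a large corpus needs an index",
}
index = build_index(docs)
```

Calling `concordance(docs, index, "corpus", width=1)` returns one context tuple per occurrence, and `frequencies(docs)` gives the corpus-wide token counts. A positional index is used here so that context windows can be rebuilt without rescanning every document; a production system would also persist the index and handle annotation layers.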