Authors: 朱玉强 (ZHU YuQiang) [1]; 江涛 (JIANG Tao) [2]; 李翼飞 (LI YiFei)
Affiliations: [1] Library of Shandong Normal University, Jinan 250014, P.R. China; [2] Library of Hainan Medical University, Haikou 571199, P.R. China
Source: Digital Library Forum (《数字图书馆论坛》), 2022, No. 2, pp. 33-39 (7 pages)
Funding: Supported by the 2021 Hainan Provincial Philosophy and Social Science Planning Project (No. hnsz2021-19).
Abstract: In foreign-language databases, author names and addresses translated from English into Chinese often produce multiple records that point to the same person, or a single record that points to different people. To address this, the article simulates the manual lookup process and integrates several corpora: multi-source data, academic social networks, knowledge encyclopedias, and online translation websites. Programs built on automated manipulation of web page document objects, regular expressions, and short-text similarity calculation are used to disambiguate English-to-Chinese author names and addresses in practice. The results show that the algorithm architecture is stable, effective, and highly extensible, and that its success rate is recognized by practitioners. The work offers new ideas and methods for data preprocessing and cleaning.
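The abstract names regular expressions and short-text similarity as building blocks but does not give the implementation. The following is only a minimal sketch of how those two pieces could be combined, not the authors' program. The function names, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_name(raw: str) -> str:
    """Reduce a romanized name to a lowercase, order-independent key.

    Hypothetical helper: hyphens and apostrophes inside given names are
    dropped, then the remaining alphabetic tokens are sorted so that
    "ZHU YuQiang" and "Yu-Qiang Zhu" produce the same key.
    """
    cleaned = re.sub(r"[-']", "", raw.lower())
    tokens = re.findall(r"[a-z]+", cleaned)
    return " ".join(sorted(tokens))


def affiliation_similarity(a: str, b: str) -> float:
    """Short-text similarity in [0, 1] (difflib ratio, an assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_author(rec_a, rec_b, threshold: float = 0.8) -> bool:
    """Treat two (name, affiliation) records as one person when the
    normalized names match and the affiliations are similar enough.
    The threshold value is an illustrative assumption.
    """
    name_a, aff_a = rec_a
    name_b, aff_b = rec_b
    if normalize_name(name_a) != normalize_name(name_b):
        return False
    return affiliation_similarity(aff_a, aff_b) >= threshold


if __name__ == "__main__":
    r1 = ("ZHU YuQiang", "Library of Shandong Normal University")
    r2 = ("Zhu, Yu-Qiang", "Library of Shandong Normal Univ.")
    print(same_author(r1, r2))
```

A real pipeline along the lines the abstract describes would also draw evidence from external sources such as academic social networks and encyclopedias. This sketch covers only the local string comparison step.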