Study and Practice on Institution Name Normalization of Chinese Scientific and Technical Literature Based on Multiple Strategies  Cited by: 1

Authors: LIU Yan[1], SUN Yueping[1], HOU Li[1]

Affiliation: [1] Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China

Source: Journal of Medical Informatics (《医学信息学杂志》), 2022, No. 12, pp. 32-38 (7 pages)

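Funding: China Knowledge Centre for Engineering Sciences and Technology construction project "Medical and Health Professional Knowledge Service System" (No. CKCEST-2022-1-6); National Social Science Fund of China, Youth Project "Research on the Innovative Convergence of Medical Academic Publishing Based on Semantic Enhancement" (No. 18CTQ024).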

Abstract: This paper analyzes how institution fields are organized in Chinese scientific and technical literature and how Chinese institution names are formed. It reviews common methods of institution name normalization and describes a normalization workflow for Chinese scientific and technical literature. Normalized institution names are extracted by string matching against a dictionary combined with rule-based filtering. Institution entities are then disambiguated using the co-occurrence relationship between institutions and authors: an author co-occurrence rate is computed and combined with thresholds on both the absolute co-occurrence count and the co-occurrence rate. The approach can effectively match different surface forms of the same institution.
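The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the dictionary contents, the filtering rules, the co-occurrence rate formula, or the threshold values. The `AUTHORITY` entries, the cleaning rules, the choice of dividing shared authors by the smaller author set, and the defaults `min_shared=2` and `min_rate=0.5` are all assumptions made for illustration.

```python
import re

# Hypothetical authority dictionary: name variant -> canonical name.
AUTHORITY = {
    "中国医学科学院医学信息研究所": "中国医学科学院/北京协和医学院医学信息研究所",
    "北京协和医学院医学信息研究所": "中国医学科学院/北京协和医学院医学信息研究所",
}


def clean_affiliation(raw):
    """Rule-based filtering (assumed rules, not taken from the paper)."""
    text = re.sub(r"\[\d+\]", "", raw)        # drop footnote markers such as [1]
    text = re.sub(r"\d{6}", "", text)         # drop six-digit postal codes
    text = text.split(",")[0].split(",")[0]  # keep the part before the first comma
    return text.strip()


def normalize_affiliation(raw):
    """Return the canonical name of the longest dictionary variant found, else None."""
    text = clean_affiliation(raw)
    hits = [variant for variant in AUTHORITY if variant in text]
    if not hits:
        return None
    return AUTHORITY[max(hits, key=len)]


def cooccurrence(authors_a, authors_b):
    """Return (shared author count, co-occurrence rate) for two author sets.

    The rate is assumed here to be shared / size of the smaller set.
    """
    shared = len(authors_a & authors_b)
    smaller = min(len(authors_a), len(authors_b))
    rate = shared / smaller if smaller else 0.0
    return shared, rate


def same_institution(authors_a, authors_b, min_shared=2, min_rate=0.5):
    """Treat two name forms as one institution only if both thresholds are met."""
    shared, rate = cooccurrence(authors_a, authors_b)
    return shared >= min_shared and rate >= min_rate
```

Requiring both thresholds reflects the abstract's combination of absolute count and rate: a single shared author between two small author sets gives a high rate but weak evidence, while a large absolute overlap between two very large sets may still be a small fraction of either.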

Keywords: institution name normalization; scientific and technical literature; author co-occurrence; entity mining

Classification: R-058 [Medicine and Health]

 
