Research on a Cleaning and Annotation Model for Derwent Patent Information (Times cited: 7)

The Research of ETL and Annotation Model Construction of Derwent Patent Information

Authors: Zhai Dongsheng[1], Li Qian[1], Zhang Jie[1], Huang Lucheng[1], Zhao Jing[1]

Source: 《情报杂志》 (Journal of Intelligence), 2013, No. 8, pp. 150-154, 203 (6 pages in total)

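Funding: This paper is one of the research outcomes of the National Key Technology R&D Program project "Construction and Integrated Application Demonstration of a Knowledge Management System for Enterprise Innovation Application Chains" (No. 2012BAH34F00); the Major Project of the National Social Science Fund of China "Theories and Methods of Future-Oriented Analysis of Emerging Technologies and Research on Industrial Innovation" (No. 11&ZD140); and the Beijing Natural Science Foundation project "Theories, Methods and Key Technologies for Chinese Patent Infringement Detection and Analysis" (No. 9132005).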

Abstract: The quality and processing efficiency of patent datasets are the foundation of patent analysis and knowledge discovery. With the aim of building a processing model that produces high-quality patent datasets, this study uses SQL Server BI as its platform to design and implement an information cleaning (ETL) and annotation model for the Derwent Innovations Index (DII). Taking patent information in text form as the data source, the model first extracts the content of each field separately. It then cleans and transforms each field, addressing the problems specific to that field, by combining an expression-based cleaning strategy, a loop (iterative) cleaning strategy, and a script-based cleaning strategy built on regular expressions. Finally, it uses SQL to convert the relational data into semantic XML data. Experiments show that the model can clean, store, and annotate large-scale DII patent information effectively and with reasonably high accuracy.
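The pipeline in the abstract (per-field extraction, then expression, loop, and regex-based cleaning, then conversion to XML) can be illustrated with a minimal sketch. The paper implements these steps on SQL Server BI and uses SQL for the XML conversion; the Python below swaps in the standard library (`re`, `xml.etree.ElementTree`) purely to show the idea. Everything concrete here is an assumption for illustration, not taken from the paper: the input layout (two-letter field tags such as `PN`, `TI`, `AE`, three-space continuation lines, `ER` closing a record), the assignee pattern `NAME (CODE-X)`, the patent-number pattern, the sample record, and all function and element names. Splitting multi-valued fields on `;` and cleaning each item is one plausible reading of "loop cleaning".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample record. The layout (two-letter tag + space + value,
# three-space continuation lines, "ER" closing a record) is an assumption.
SAMPLE = """PN US2005123456-A1; EP1234567-A2
TI Patent title that continues
   onto a second line
AE ACME CORP (ACME-C); BETA INC (BETA-N)
ER
"""

TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]) (.*)$")
# Assumed patterns: "NAME (CODE-X)" for assignees, "CCdigits-KIND" for numbers.
ASSIGNEE_RE = re.compile(r"^(.*?)\s*\(([A-Z0-9]+)-([A-Z])\)$")
PN_RE = re.compile(r"^([A-Z]{2})(\d+)-([A-Z]\d?)$")


def extract_records(text):
    """Field extraction: split raw text into records of {tag: raw value}."""
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if line.strip() == "ER":  # end of one record
            records.append(current)
            current, last_tag = {}, None
        elif line.startswith("   ") and last_tag:
            # Continuation line: join it onto the previous field.
            current[last_tag] += " " + line.strip()
        else:
            m = TAG_LINE.match(line)
            if m:
                last_tag = m.group(1)
                current[last_tag] = m.group(2).strip()
    if current:  # tolerate a missing final "ER"
        records.append(current)
    return records


def clean_text(value):
    """Expression-style cleaning: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", value).strip()


def split_multi(value):
    """Split a multi-valued field on ';' and drop empty items."""
    return [part.strip() for part in value.split(";") if part.strip()]


def to_xml(record):
    """Loop over multi-valued fields, apply regex cleaning, emit XML."""
    patent = ET.Element("patent")
    ET.SubElement(patent, "title").text = clean_text(record.get("TI", ""))
    for raw in split_multi(record.get("PN", "")):
        m = PN_RE.match(raw)
        if m:
            ET.SubElement(patent, "number", country=m.group(1),
                          kind=m.group(3)).text = m.group(2)
        else:  # keep unparsed values rather than silently dropping them
            ET.SubElement(patent, "number").text = raw
    for raw in split_multi(record.get("AE", "")):
        m = ASSIGNEE_RE.match(raw)
        if m:
            ET.SubElement(patent, "assignee", code=m.group(2),
                          type=m.group(3)).text = m.group(1)
        else:
            ET.SubElement(patent, "assignee").text = raw
    return patent


if __name__ == "__main__":
    for rec in extract_records(SAMPLE):
        print(ET.tostring(to_xml(rec), encoding="unicode"))
```

Values that fail a regex are kept as raw text instead of being discarded, so a cleaning rule that does not fit some records never loses data silently; those elements can be reviewed and the rule refined.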

Keywords: Derwent Innovations Index (DII); patent information; data cleaning; extraction strategy

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
