TFC-Reducing: An Approach for Reduction of Textual Formal Context Based on Semantic Distance Between Attributes and Rules  (Cited by: 3)

Authors: Yang Xiaoping (杨小平)[1], He Wei (何伟)[2], Sun Yalin (孙亚琳)[1], Liao Junyu (廖俊宇)[1]

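Affiliations: [1] School of Information, Renmin University of China, Beijing 100872; [2] Department of Mathematics and Applied Mathematics, Huaihua University, Huaihua, Hunan 418008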

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2012, No. 10, pp. 2170-2176 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (70871115)

Abstract: Formal Concept Analysis (FCA), a formal tool for data analysis and knowledge processing, can effectively mine knowledge of interest from large volumes of textual data and is highly regarded by many researchers. FCA presupposes a clean, well-defined formal context. A textual formal context (TFC), built by extracting feature words directly from texts and relating each text to its feature words, is a highly sparse two-dimensional table that carries a great deal of noise, which seriously degrades both the efficiency of concept lattice construction and the structure of the resulting lattice. An effective reduction method for textual formal contexts is therefore needed. Taking the essential characteristics of textual formal contexts into account, this paper proposes TFC-Reducing, a reduction method based on the semantic distance between attributes and on mathematical principles, and gives two measures for evaluating a reduction: information loss entropy (ILE) and semantic coverage (SC).
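To make the abstract's description of a textual formal context concrete, the following is a minimal sketch of a text-by-feature-word binary table together with the two standard FCA derivation operators. It is not the TFC-Reducing method and does not compute ILE or SC, whose definitions are not given in this record; the documents, feature words, and function names (`density`, `extent`, `intent`) are hypothetical and chosen only for illustration.

```python
# Toy textual formal context: texts (objects) x feature words (attributes).
# All documents and words below are invented for illustration.
docs = {
    "d1": {"lattice", "concept", "context"},
    "d2": {"context", "reduction"},
    "d3": {"semantic", "distance"},
    "d4": {"lattice", "entropy"},
}

# Attribute set: every feature word that occurs in at least one text.
attributes = sorted(set().union(*docs.values()))

# Binary incidence table: one row per text, one column per feature word.
table = {
    d: [1 if a in words else 0 for a in attributes]
    for d, words in docs.items()
}


def density(table):
    """Fraction of cells equal to 1; a low value means a sparse context."""
    cells = [v for row in table.values() for v in row]
    return sum(cells) / len(cells)


def extent(words):
    """Texts that contain every word in `words` (FCA derivation operator)."""
    return {d for d, ws in docs.items() if set(words) <= ws}


def intent(doc_ids):
    """Feature words shared by every text in `doc_ids`."""
    sets = [docs[d] for d in doc_ids]
    return set.intersection(*sets) if sets else set(attributes)


if __name__ == "__main__":
    print("attributes:", attributes)
    print("density:", round(density(table), 3))
    print("extent of {'lattice'}:", sorted(extent({"lattice"})))
    print("intent of {'d1', 'd2'}:", sorted(intent({"d1", "d2"})))
```

Even in this four-text toy table fewer than a third of the cells are filled; with thousands of texts and feature words the table becomes far sparser, which is the situation the abstract says motivates reduction before lattice construction.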

Keywords: textual formal context; semantic distance; attribute reduction; domain thesaurus

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
