网络信息审计系统中的文本片断模糊分类算法 (Times cited: 2)

Text-Fragment Fuzzy Classification Algorithm for Network Information Auditing System


Authors: Li Jinku (李金库)[1], Zhang Deyun (张德运)[1], Gao Peng (高鹏)[1], Sun Qindong (孙钦东)[1]

Affiliation: [1] School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China

Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2005, No. 8, pp. 800-803 (4 pages)

Funding: National High Technology Development Program of China (grant 2003AA148010).

Abstract: This paper analyzes how breaking a text document into fragments affects text classification, and defines two rules closely tied to text semantics: the most semantic marking rule (MSMR) and the semantic inspiring rule (SIR) between paragraphs. Building on the fuzzy K-nearest-neighbor (fuzzy KNN) classification algorithm, the authors apply these two rules to design and implement a context-sensitive fuzzy classification algorithm for text fragments. Following SIR, the algorithm takes into account how the classifications of different fragments influence one another, which lowers the fragment classification error rate. Following MSMR, once the class membership of one fragment exceeds a given threshold, all subsequent fragments of the same document are judged to belong to that same class, so their class memberships need not be computed. Experiments show that, compared with the fuzzy KNN algorithm, the proposed algorithm effectively improves the system's precision, recall, and accuracy, with recall improved by more than 16%. Within a session, because subsequent fragments that have already been definitively classified do not require membership computation, the total computing time of the proposed algorithm is clearly lower than that of the fuzzy KNN algorithm, giving it higher classification efficiency.
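The abstract gives no formulas, so the following is only a minimal sketch of the two mechanisms it names, under stated assumptions. It assumes the widely used fuzzy KNN membership rule of Keller et al. (1985), in which the membership of sample $x$ in class $i$ is $u_i(x) = \sum_j u_{ij}\,\lVert x - x_j\rVert^{-2/(m-1)} \big/ \sum_j \lVert x - x_j\rVert^{-2/(m-1)}$ over the $k$ nearest training samples $x_j$, and it assumes fragments have already been turned into numeric feature vectors. The function names `fuzzy_knn_membership` and `classify_session` and the parameters `threshold`, `k`, and `m` are hypothetical. The loop illustrates MSMR-style early termination only; SIR is not modeled because the abstract does not specify how fragment interactions are computed.

```python
import numpy as np


def fuzzy_knn_membership(x, train_X, train_U, k=5, m=2.0, eps=1e-12):
    """Keller-style fuzzy KNN class memberships of one sample (assumed rule).

    train_X: (n, d) training vectors; train_U: (n, c) class memberships of
    the training samples (rows sum to 1). Returns a length-c membership vector.
    """
    dists = np.linalg.norm(train_X - x, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest training samples
    # Inverse-distance weights 1 / ||x - x_j||^(2/(m-1)); eps avoids 1/0
    w = 1.0 / (dists[nn] ** (2.0 / (m - 1.0)) + eps)
    return (w[:, None] * train_U[nn]).sum(axis=0) / w.sum()


def classify_session(fragments, train_X, train_U, threshold=0.8, k=5, m=2.0):
    """Classify one document's fragments in order with an MSMR-style early stop.

    Hypothetical sketch: once a fragment's top membership exceeds `threshold`,
    every later fragment of the same document reuses that class and its
    membership is not computed. Returns (labels, number_of_memberships_computed).
    """
    labels = []
    locked = None  # class fixed by the first sufficiently confident fragment
    computed = 0
    for x in fragments:
        if locked is not None:
            labels.append(locked)  # no membership computation needed
            continue
        u = fuzzy_knn_membership(x, train_X, train_U, k, m)
        computed += 1
        best = int(np.argmax(u))
        labels.append(best)
        if u[best] > threshold:
            locked = best
    return labels, computed
```

With a low threshold, the first confident fragment fixes the class for the rest of the document and only one membership vector is computed; with a threshold that is never exceeded, the sketch degrades to plain per-fragment fuzzy KNN. This mirrors the efficiency argument in the abstract, though the paper's actual threshold and feature representation are not given here.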

Keywords: text-fragment classification; information auditing; K-nearest neighbor; fuzzy classification

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]
