Authors: Li Jin (李劲) [1,2], Zhang Hua (张华) [1], Wu Haoxiong (吴浩雄) [1], Xiang Jun (向军) [1], Gu Xiwu (辜希武) [3]
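Affiliations: [1] School of Information Engineering, Hubei University for Nationalities, Enshi, Hubei 445000; [2] Department of Information Management, Central China Normal University, Wuhan 430079; [3] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074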
Source: Journal of Computer Applications (《计算机应用》), 2012, No. 5, pp. 1335-1339, 5 pages
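Funding: National Natural Science Foundation of China (61040006); Natural Science Foundation of Hubei Province (2010CDZ027); Science and Technology Project of the Hubei Provincial Department of Education (B20101909)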
Abstract: Social annotation is a form of folksonomy in which Web users freely categorize Web resources with text tags, and it usually carries rich semantic information about those resources. Applying social annotation to information retrieval can therefore help improve retrieval quality. This paper investigates an improved text classification algorithm based on social annotation, aimed at better Web page classification. Because social tags are generated arbitrarily, without any control or expert knowledge, their quality varies widely. The paper therefore first proposes a quantitative measure of tag quality based on the semantic similarity between documents and the semantic similarity between tags. Low-quality tags are then filtered out according to this measure, and the remaining higher-quality tags are used to extend the traditional vector space model: each Web page is represented by an extended vector whose components are the words in the page and the tags attached to it. Finally, a support vector machine (SVM) performs the classification. Experimental results show that evaluating tag quality, filtering out low-quality tags, and combining page content with the remaining tags improves classification. Compared with the traditional content-based classification algorithm, the proposed algorithm increases the F1 measure by 6.2% on average.
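The tag-quality filtering and vector-space extension described in the abstract can be sketched as follows. The abstract does not give the paper's quality formula, so the score here is a hypothetical stand-in: a tag's quality on a page is a weighted mix (`alpha`, assumed 0.5) of (a) the mean cosine similarity between that page and the other pages carrying the same tag and (b) the mean similarity between the tag and the page's other tags, where each tag is represented by the set of pages it labels. The helper names, `alpha`, the `threshold` of 0.3, the `tag:` feature prefix, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical sketch: the paper's exact tag-quality formula is not given in the abstract.


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tag_vector(tag, doc_tags):
    """Represent a tag by the set of documents it labels (assumed representation)."""
    return {d: 1.0 for d, tags in doc_tags.items() if tag in tags}


def tag_quality(doc_id, tag, doc_vecs, doc_tags, alpha=0.5):
    """Assumed score: alpha * doc-doc similarity + (1 - alpha) * tag-tag similarity."""
    # (a) How similar is this page to the other pages carrying the same tag?
    peers = [d for d, tags in doc_tags.items() if tag in tags and d != doc_id]
    doc_sim = (sum(cosine(doc_vecs[doc_id], doc_vecs[d]) for d in peers) / len(peers)
               if peers else 0.0)
    # (b) How related is this tag to the other tags on the same page?
    co_tags = [t for t in doc_tags[doc_id] if t != tag]
    tv = tag_vector(tag, doc_tags)
    tag_sim = (sum(cosine(tv, tag_vector(t, doc_tags)) for t in co_tags) / len(co_tags)
               if co_tags else 0.0)
    return alpha * doc_sim + (1 - alpha) * tag_sim


def extended_vector(doc_id, doc_vecs, doc_tags, threshold=0.3, tag_weight=1.0):
    """Word features of the page plus features for tags whose quality passes the threshold."""
    vec = dict(doc_vecs[doc_id])
    for t in doc_tags[doc_id]:
        if tag_quality(doc_id, t, doc_vecs, doc_tags) >= threshold:
            vec["tag:" + t] = vec.get("tag:" + t, 0.0) + tag_weight
    return vec


# Toy data (invented for illustration): term-frequency vectors and user tags.
DOCS = {
    "d1": "svm kernel margin classifier",
    "d2": "svm margin classifier training",
    "d3": "library reading catalog books",
}
DOC_VECS = {d: Counter(text.split()) for d, text in DOCS.items()}
DOC_TAGS = {"d1": {"ml", "svm", "cool"}, "d2": {"ml", "svm"}, "d3": {"library", "cool"}}

# "cool" links d1 to an unrelated page, scores about 0.25, and is filtered out.
print(sorted(extended_vector("d1", DOC_VECS, DOC_TAGS)))
# ['classifier', 'kernel', 'margin', 'svm', 'tag:ml', 'tag:svm']
```

Prefixing tag features keeps a tag such as `svm` separate from the identical word in the page text; whether the paper merges or separates the two is not stated.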
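For the classification and evaluation steps, the abstract says only that a support vector machine is used and that results are reported as the F1 measure. The sketch below swaps in a minimal linear SVM trained with Pegasos-style subgradient descent on the L2-regularized hinge loss (binary labels ±1, no bias term, hypothetical defaults `lam=0.01` and `epochs=50`), operating on sparse feature dicts like the extended word-plus-tag vectors the abstract describes, and computes F1 = 2PR/(P + R). A multi-class page collection would need a scheme such as one-vs-rest on top of this; the abstract does not say how the paper handled it. Function names and toy data are illustrative.

```python
# Hypothetical sketch: the abstract names only "support vector machine"; Pegasos is a
# swapped-in linear SVM solver, not necessarily what the paper used.


def dot(w, x):
    """Inner product of a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss (no bias term).

    samples: list of {feature: weight} dicts; labels: +1 or -1.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            violated = y * dot(w, x) < 1.0   # hinge loss is active for this sample
            for f in w:                      # shrink: gradient of (lam / 2) * ||w||^2
                w[f] *= 1.0 - eta * lam
            if violated:                     # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy extended vectors (invented): page words plus "tag:" features.
X = [
    {"svm": 1, "margin": 1, "tag:ml": 1},
    {"svm": 1, "kernel": 1, "tag:ml": 1},
    {"library": 1, "catalog": 1, "tag:lib": 1},
    {"library": 1, "books": 1, "tag:lib": 1},
]
Y = [1, 1, -1, -1]
w = train_linear_svm(X, Y)
print(f1_score(Y, [predict(w, x) for x in X]))  # 1.0
```

With step size 1/(λt), the weight vector stays a scaled sum of the margin-violating samples; appending a constant feature to every vector is a common way to supply the missing bias term.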