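Affiliation: [1] Beijing Engineering Technology Research Center of Virtual Simulation and Visualization (Peking University), Beijing 100871
Source: Computer Science (《计算机科学》), 2014, No. 10, pp. 31-35 (5 pages)
Funding: Supported by the 863 Program key project (2011AA120301) and the National Natural Science Foundation of China (60925007; 61173080; 61232014)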
Abstract: With the continued development of mobile communication technology, mobile phone adoption keeps rising, and SMS, a traditional mobile communication service, has long held an important place in people's daily lives. To a certain extent, SMS messages record the track of a person's life. However, existing SMS management systems only group messages by contact and display them in chronological order, a simple, unintelligent approach that leaves different kinds of messages mixed together and makes them inefficient to manage. By studying the characteristics of SMS messages and analyzing the strengths and weaknesses of the traditional feature selection methods based on document frequency and on mutual information, the authors propose a feature selection method for SMS messages based on both word frequency and mutual information, and implement an improved Bayes classification algorithm that also takes message length into account. Experiments show that the algorithm achieves good recall and accuracy when classifying SMS messages.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
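
The abstract names its ingredients (word frequency, mutual information, message length, a Bayes classifier) but gives no formulas. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It assumes a term score of log(1 + term frequency) multiplied by pointwise mutual information, and a naive Bayes classifier that treats a coarse length bucket as one extra categorical feature. The names `select_features`, `length_bucket`, and `LengthAwareNaiveBayes`, and the threshold of 5 tokens, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, k):
    """Rank terms by log(1 + term frequency) * pointwise mutual information.

    Assumption: this way of combining the two signals is illustrative only.
    """
    n = len(docs)
    class_docs = Counter(labels)   # number of messages per class
    term_docs = Counter()          # number of messages containing a term
    joint_docs = Counter()         # messages of class c containing term t
    term_freq = Counter()          # occurrences of term t within class c
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            term_docs[t] += 1
            joint_docs[(t, label)] += 1
        for t in tokens:
            term_freq[(t, label)] += 1

    scores = {}
    for (t, c), joint in joint_docs.items():
        # PMI(t, c) = log( P(t, c) / (P(t) * P(c)) ) from message counts.
        pmi = math.log(joint * n / (term_docs[t] * class_docs[c]))
        score = math.log1p(term_freq[(t, c)]) * pmi
        scores[t] = max(scores.get(t, float("-inf")), score)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def length_bucket(tokens, threshold=5):
    """Hypothetical two-way length feature: 'short' or 'long'."""
    return "short" if len(tokens) <= threshold else "long"


class LengthAwareNaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing plus a length feature."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.class_count = Counter(labels)
        self.word_count = defaultdict(Counter)
        self.bucket_count = defaultdict(Counter)
        for tokens, c in zip(docs, labels):
            self.word_count[c].update(t for t in tokens if t in self.vocab)
            self.bucket_count[c][length_bucket(tokens)] += 1
        return self

    def predict(self, tokens):
        total = sum(self.class_count.values())
        bucket = length_bucket(tokens)
        best, best_lp = None, float("-inf")
        for c in self.classes:
            lp = math.log(self.class_count[c] / total)
            denom = sum(self.word_count[c].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:
                    lp += math.log((self.word_count[c][t] + 1) / denom)
            # Length bucket likelihood, smoothed over the two buckets.
            lp += math.log(
                (self.bucket_count[c][bucket] + 1) / (self.class_count[c] + 2)
            )
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

In use, one would tokenize the training messages, call `select_features` to pick a vocabulary, and pass that vocabulary to `LengthAwareNaiveBayes().fit(...)`. Chinese SMS text would first need a word segmenter, which this sketch leaves out.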