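Affiliations: [1] School of Computer Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003, China; [2] Department of Computer Science, Cangzhou Normal University, Cangzhou, Hebei 061000, China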
Source: Journal of Shandong University (Engineering Science), 2013, No. 3, pp. 7-12 (6 pages)
Funding: Science and Technology Plan Project of Hebei Province (10213581); Huai'an Municipal Social Support Project (HASZ2012046)
Abstract: SMS text information streams carry abundant information resources. To mine multiple hot events from them, an online sorting algorithm for SMS text information streams is presented. The method defines the relevance between feature words from their co-occurrence frequency, and defines the similarity between SMS texts by combining the preamble information set with the frequency at which information is generated. In addition, each time clustering reaches the end of a time period, the SMS texts clustered so far are classified periodically. The algorithm retrieves multiple hot events from large-volume short-text information streams efficiently while reducing the likelihood of false and missed detections. Comparative experiments against the Single-Pass algorithm on real data sets show that every evaluation metric improves to varying degrees.
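The abstract names two ingredients, a co-occurrence-based relevance between feature words and incremental clustering benchmarked against Single-Pass, but does not give the paper's formulas. The following is a minimal sketch of a generic Single-Pass clusterer under stated assumptions, not the authors' method: the relevance `rel(a, b) = co(a, b) / (df(a) + df(b) - co(a, b))` is an assumed Jaccard-style ratio, text similarity is the soft cosine measure (swapped in for the paper's preamble-information-based similarity), the threshold `0.5` is hypothetical, messages are assumed to be already word-segmented, and the periodic per-time-period classification step is not modeled. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_relevance(history):
    """Build a term-relevance function from co-occurrence counts.

    Assumed formula (not taken from the paper):
    rel(a, b) = co(a, b) / (df(a) + df(b) - co(a, b)),
    i.e. the fraction of messages containing a or b that contain both.
    """
    df = Counter()  # number of messages containing each term
    co = Counter()  # number of messages containing each unordered term pair
    for tokens in history:
        terms = set(tokens)
        df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] += 1

    def rel(a, b):
        if a == b:
            return 1.0
        key = (a, b) if a < b else (b, a)
        both = co[key]
        either = df[a] + df[b] - both
        return both / either if either else 0.0

    return rel


def soft_cosine(x, y, rel):
    """Soft cosine similarity of two term-count vectors, weighting
    every term pair by its relevance."""
    def inner(u, v):
        return sum(cu * cv * rel(a, b)
                   for a, cu in u.items() for b, cv in v.items())
    norm = sqrt(inner(x, x) * inner(y, y))
    return inner(x, y) / norm if norm else 0.0


def single_pass(stream, rel, threshold=0.5):
    """Assign each message, in arrival order, to the most similar
    existing cluster, or open a new cluster if no similarity reaches
    `threshold`. Returns the message indices of each cluster."""
    centroids = []  # summed term counts per cluster
    members = []    # message indices per cluster
    for idx, tokens in enumerate(stream):
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for c, centroid in enumerate(centroids):
            sim = soft_cosine(vec, centroid, rel)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)
            members[best].append(idx)
        else:
            centroids.append(Counter(vec))
            members.append([idx])
    return members


# Hypothetical, already word-segmented messages.
example = [
    ["flood", "river", "rescue"],
    ["river", "flood", "warning"],
    ["concert", "ticket", "sale"],
    ["ticket", "concert", "tonight"],
]
relevance = cooccurrence_relevance(example)
print(single_pass(example, relevance))  # [[0, 1], [2, 3]]
```

Because "flood" and "river" always co-occur in this toy stream, their relevance is 1.0, so message 1 scores 6/7 against the first cluster and joins it even though it contains the unseen word "warning". That is the motivation for relevance-weighted similarity on short texts, where exact word overlap is sparse. For simplicity the relevance table here is built once from a history corpus; a truly online variant would update the counts as messages arrive.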
Keywords: SMS text; information stream; hot event; Single-Pass; clustering
CLC number: TP311 (Automation and Computer Technology: Computer Software and Theory)