Affiliation: [1] Department of Computing and Information Technology, Fudan University, Shanghai 200433
Source: Computer Applications and Software (《计算机应用与软件》), 2008, No. 11, pp. 3-5, 22 (4 pages in total)
Funding: Supported by the National Natural Science Foundation of China (60473070)
Abstract: Traditional string-processing algorithms usually consider the frequency of a string and its length separately. In practical applications, however, it is meaningful to consider the two together. On this basis we propose the concept of PFL (Product of Frequency and Length), defined as the product of a string's frequency and its length. Building on the Generalized Suffix Tree and Ukkonen's algorithm, we present a search algorithm with O(N) time complexity that finds the string with the largest PFL in a text. An efficiency experiment confirms that the algorithm is efficient. A semantics experiment shows that, compared with the strings of largest frequency or largest length in a text, the strings of largest PFL found by the algorithm carry more explicit meaning. Such strings are of high value in text compression, gene sequence analysis, and other applications that emphasize semantics.
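To make the PFL definition concrete, here is a minimal Python sketch that finds a substring with the largest frequency × length. It is not the paper's method: the abstract describes a Generalized Suffix Tree built with Ukkonen's algorithm, whereas this sketch swaps in a suffix automaton, a different index that can also be built in roughly linear time for a fixed alphabet. The function name `max_pfl` is hypothetical, and two choices are assumptions not stated in the abstract: overlapping occurrences are counted, and the whole text itself is an eligible candidate.

```python
def max_pfl(text):
    """Return (substring, frequency, length) maximizing frequency * length.

    Assumptions (not from the paper): occurrences may overlap, and the
    whole text counts as a candidate. Uses a suffix automaton rather
    than the generalized suffix tree named in the abstract.
    """
    if not text:
        return ("", 0, 0)

    nxt = [{}]        # transitions of each state
    link = [-1]       # suffix links
    length = [0]      # length of the longest string in each state
    cnt = [0]         # number of end positions (occurrence count)
    first_end = [-1]  # end index of the first occurrence
    last = 0

    for i, ch in enumerate(text):
        cur = len(nxt)
        nxt.append({}); link.append(-1)
        length.append(length[last] + 1); cnt.append(1); first_end.append(i)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q: the clone takes over the shorter strings.
                clone = len(nxt)
                nxt.append(dict(nxt[q])); link.append(link[q])
                length.append(length[p] + 1); cnt.append(0)
                first_end.append(first_end[q])
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    # Propagate occurrence counts along suffix links, longest states first
    # (bucket sort by length keeps this step linear).
    buckets = [[] for _ in range(len(text) + 1)]
    for s in range(1, len(nxt)):
        buckets[length[s]].append(s)
    for l in range(len(text), 0, -1):
        for s in buckets[l]:
            cnt[link[s]] += cnt[s]

    # Within a state all strings share one frequency, so the longest
    # string of the state gives that state's best product.
    best = max(range(1, len(nxt)), key=lambda s: cnt[s] * length[s])
    end, n = first_end[best], length[best]
    return text[end - n + 1:end + 1], cnt[best], n


print(max_pfl("abcabcabc"))  # ('abcabc', 2, 6)
```

In the example, "abcabc" occurs twice (at offsets 0 and 3, overlapping), giving a product of 12, which beats "abc" (3 × 3 = 9) and the whole text (1 × 9 = 9). Under a non-overlapping definition of frequency the result would differ, so the counting rule should be checked against the paper before relying on this sketch.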