最大频长积字符串及其高效查找算法  

STRING WITH THE BIGGEST PFL AND EFFICIENT SEARCHING ALGORITHM

在线阅读下载全文

作  者:严铭清[1] 陶晓鹏[1] 胡运发[1] 

机构地区:[1]复旦大学计算机与信息技术系,上海200433

出  处:《计算机应用与软件》2008年第11期3-5,22,共4页Computer Applications and Software

基  金:国家自然科学基金资助(60473070)

摘  要:在传统的字符串处理算法中往往分别考虑字符串的频度和长度。然而,在实际应用中,将字符串的频度和长度结合考虑是有意义的。基于这点我们提出了频长积的概念,规定字符串的频度和长度的乘积为字符串的频长积。并基于广义后缀树和Uk- konen算法,提出了时间复杂度为O(N)的查找算法。效率实验证实了该算法的高效性。语义实验表明,本算法找出的最大频长积字符串相比于最大频度字符串或最大长度字符串,其实际语义更为明确。这样的字符串在文本压缩、基因序列的分析以及其他注重语义的应用中将具有很高的应用价值。In the traditional processing of natural language text, researchers often only consider the frequencies of the strings emerging in a text or only the lengths of them. However, it is meaningful to take the frequency and the length of a string into consideration together in practical applications. PFL (Product of Frequency and Length) is defined as the product of a string' s frequency and its length, and then an algorithm which finds the string that has the biggest PFL in a text is constructed. An efficient algorithm based on the Generalized Suffix Tree and Ukkonen algorithm is provided to search the string with the biggest PFL. The efficiency experiment indicates the high efficiency of the algorithm. According to the results of semantics experiment, compared with the strings which have the biggest frequencies or lengths in a text,the strings found by the proposed algorithm have more explicit meaning. Such strings have high values in the domains of text-compression and gene sequence analysis.

关 键 词:频长积 最大频长积字符串 广义后缀树 Ukkonen算法 

分 类 号:TP312[自动化与计算机技术—计算机软件与理论] O157.5[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象