检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
出 处:《计算机工程与科学》2013年第9期127-131,共5页Computer Engineering & Science
基 金:北大方正集团有限公司数字出版技术国家重点实验室开放课题资助项目(2012072011)
摘 要:基于双数组Trie树的中文分词词典具有较高的查找效率,但其插入时间复杂度较高。为此提出了一种基于双数组Trie树结构的改进算法iDAT,在原始词典初始化时优先处理分支多的节点,并在初始化之后对base数组中的空序列的下标值做Hash,Hash表中存放空序列之前的所有空序列个数之和,而后运用iDAT算法进行插入。本算法借鉴了单模式匹配的Sunday算法中的跳跃思想,在适当增加空间开销的基础上,降低了Trie树在动态插入过程中的平均时间复杂度,在实际操作过程中有着良好的性能。Abstract:Chinese word segmentation dictionary based on the double-array Trie-tree has higher search effi- ciency, but the dynamic insertion consumes a lot of time. Therefore, an improved algorithm (iDAT) based on double-array Trie-tree for Chinese word segmentation dictionary is proposed. The nodes with more branches are handled while the original dictionary is being initialized. After the initialization, a Hash process is performed on the index values of empty sequence in base array. The final Hash table stores the sum of the empty sequences before the current empty sequence. After that, the iDAT is used to carry out the dynamic insertion process. This algorithm adopts Sunday jumps algorithm of single pattern matching. With the reasonable increasement of space, it reduces the the average time complexity of the dynamic insertion process in Trie-tree. Practical results show it has good operation performance.
分 类 号:TP391.3[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:3.129.58.166