%0 Journal Article
%T CuMen: Clustering Sequences Based on Maximal Frequent Sequential Pattern and its Application in Genome Sequence Assembly
CuMen:基于最大频繁序列模式的聚类算法及其在基因拼接中的应用
%A HUANG Dong
%A TANG Jun
%A WANG Wei
%A SHI Bai-Le
%A 黄东
%A 唐俊
%A 汪卫
%A 施伯乐
%J 计算机科学
%D 2005
%X Sequencing genomes is a fundamental aspect of biological research. A variety of assembly programs have been proposed and implemented. Because of their high computational complexity and increasingly large dataset sizes, they incur substantial time and space overhead. In practical applications, the sequencing process can become unacceptably slow owing to insufficient memory, even on a mainframe with a large amount of RAM. This paper offers a clustering algorithm based on maximal frequent sequential patterns, aiming to divide the whole dataset into several parts that can be processed independently and efficiently in limited memory. Several techniques are applied to optimize the mining and clustering procedures. The approach is also introduced into a grid environment, exploiting parallelism and distribution to further improve scalability.
%K Maximal frequent sequential pattern
%K Sequence clustering
%K Sequence assembly
%K Grid
%K Genome sequence
%K Sequential pattern
%K Assembly processing
%K Clustering algorithm
%K Application
%K Biological data
%K Algorithm complexity
%K Grid system
%K Resource management
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=C6971577391CB420&yid=2DD7160C83D0ACED&vid=9971A5E270697F23&iid=F3090AE9B60B7ED1&sid=EB552E4CFC85690B&eid=D59111839E7C8BDF&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=11