Acta Electronica Sinica (电子学报), 2015

A High-Throughput DNA Sequence Data Compression Algorithm Based on Codebook Index Transformation

DOI: 10.3969/j.issn.0372-2112.2015.05.026, PP. 1007-1013

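Keywords: high-throughput DNA sequences, codebook index transformation model, block-sorting compression transform, move-to-front coding, information entropy, data compression algorithm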


Abstract:

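This paper proposes a compression algorithm for high-throughput DNA sequence data. The algorithm first applies a codebook index transformation model, which converts the conventional representation of codebook index values into base-4 numbers written with the four standard base characters, and uses a concise encoding scheme to delimit substituted strings from non-substituted strings. It then uses the information entropy of the data to decide whether to apply the block-sorting compression transform (BWT), and finally applies move-to-front coding followed by Huffman entropy coding. Experiments on a variety of sequencing data sets show that the proposed algorithm, CITD, achieves better compression performance in most cases than the dedicated high-throughput DNA compression methods it is compared against.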
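The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the digit order A=0, C=1, G=2, T=3, the fixed index width, the `$` sentinel, the naive rotation-sort BWT, and the rule "apply BWT when entropy is below a threshold" are all assumptions made for illustration, since the abstract specifies none of them. The function names are hypothetical, and the delimiting scheme for substituted strings and the Huffman stage are omitted.

```python
import math
from collections import Counter
from typing import List, Tuple

BASES = "ACGT"  # assumed digit order: A=0, C=1, G=2, T=3


def index_to_bases(index: int, width: int) -> str:
    """Write a codebook index as a fixed-width base-4 number over A/C/G/T."""
    if index < 0 or index >= 4 ** width:
        raise ValueError("index does not fit in the given width")
    digits = []
    for _ in range(width):
        index, d = divmod(index, 4)
        digits.append(BASES[d])
    return "".join(reversed(digits))  # most significant digit first


def bases_to_index(text: str) -> int:
    """Inverse of index_to_bases: read a base string as a base-4 number."""
    value = 0
    for ch in text:
        value = value * 4 + BASES.index(ch)
    return value


def shannon_entropy(text: str) -> float:
    """Zeroth-order Shannon entropy of the text, in bits per symbol."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations (quadratic memory)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def mtf_encode(text: str, alphabet: str) -> List[int]:
    """Move-to-front: emit each symbol's current rank, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in text:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


def transform(text: str, threshold: float) -> Tuple[bool, List[int]]:
    """Entropy-gated BWT followed by move-to-front.

    The comparison direction and the threshold are placeholders (assumptions);
    the abstract only says that entropy decides whether BWT is applied.
    """
    use_bwt = shannon_entropy(text) < threshold
    stage = bwt(text) if use_bwt else text
    return use_bwt, mtf_encode(stage, "$" + BASES)
```

For example, `index_to_bases(27, 4)` yields `"ACGT"`, because 27 is 0123 in base 4. The move-to-front output consists mostly of small integers when the input has runs of repeated symbols, which is the property an entropy coder such as Huffman coding would then exploit.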

