全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

Efficient Serial and Parallel Algorithms for Selection of Unique Oligos in EST Databases

DOI: 10.1155/2013/793130

Full-Text   Cite this paper   Add to My Lib

Abstract:

Obtaining unique oligos from an EST database is a problem of great importance in bioinformatics, particularly in the discovery of new genes and the mapping of the human genome. Many algorithms have been developed to find unique oligos, many of which are much less time consuming than the traditional brute force approach. An algorithm was presented by Zheng et al. (2004) which finds the solution of the unique oligos search problem efficiently. We implement this algorithm as well as several new algorithms based on some theorems included in this paper. We demonstrate how, with these new algorithms, we can obtain unique oligos much faster than with previous ones. We parallelize these new algorithms to further improve the time of finding unique oligos. All algorithms are run on ESTs obtained from a Barley EST database. 1. Introduction Expressed Sequence Tags (or ESTs) are fragments of DNA that are about 200–800 bases long generated from the sequencing of complementary DNA. ESTs have many applications. They were used in the Human Genome Project in the discovery of new genes and are often used in the mapping of genomic libraries. They can be used to infer functions of newly discovered genes based on comparison to known genes [1]. An oligonucleotide (or oligo) is a subsequence of an EST. Oligos are short, since they are typically no longer than 50 nucleotide bases. Oligos are often referred to in the context of their length by adding the suffix “mer”. For example, an oligo of length 9 would be referred to as a 9-mer. The importance of oligos in relation to EST databases is quite significant. An oligo that is unique in an EST database serves as a representative of its EST sequence. The oligonucleotides (or simply oligos) contained in these EST databases have applications in many areas such as PCR primer design, microarrays, and probing genomic libraries [2–4]. In this paper we will improve on the algorithms presented in [2] to solve the unique oligos search problem. This problem requires us to determine all oligos that appear in one EST sequence but not in any of the others. In addition, we will consider two oligos to be virtually identical if they fall within a certain number of mismatches from each other. In the appendix we include all the algorithms used and developed in this paper. 2. The Unique Oligos Search Problem In this paper we use the notation to denote the Hamming Distance between the strings and . Given an EST database , where is a string over the alphabet , integers and , and -mer , we say that occurs approximately in if there exists a substring of

References

[1]  M. D. Adams, J. M. Kelley, J. D. Gocayne et al., “Complementary DNA sequencing: expressed sequence tags and human genome project,” Science, vol. 252, no. 5013, pp. 1651–1656, 1991.
[2]  J. Zheng, T. J. Close, T. Jiang, and S. Lonardi, “Efficient selection of unique and popular oligos for large EST databases,” Bioinformatics, vol. 20, no. 13, pp. 2101–2112, 2004.
[3]  S. H. Nagaraj, R. B. Gasser, and S. Ranganathan, “A hitchhiker's guide to expressed sequence tag (EST) analysis,” Briefings in Bioinformatics, vol. 8, no. 1, pp. 6–21, 2007.
[4]  W. Klug, M. Cummings, and C. Spencer, Concepts of Genetics, Prentice-Hall, Upper Saddle River, NJ, USA, 8th edition, 2006.
[5]  F. Li and G. D. Stormo, “Selection of optimal DNA oligos for gene expression arrays,” Bioinformatics, vol. 17, no. 11, pp. 1067–1076, 2001.
[6]  S. Rahmann, “Rapid large-scale oligonucleotide selection for microarrays,” in Proceedings of the 1st IEEE Computer Society Bioinformatics Conference (CSB '02), pp. 54–63, IEEE Press, Stanford, Calif, USA, 2002.
[7]  S. Go, Combinatorics and its applications in DNA analysis [M.S. thesis], Department of Mathematics and Statistics, Memorial University of Newfoundland, 2009.
[8]  OpenMP.org, 2012, http://openmp.org/wp/.
[9]  Khronos Group, “OpenCL—The open standard for parallel programming of heterogeneous systems,” 2012, http://www.khronos.org/opencl/.
[10]  Nvidia, “Parallel Programming and Computing Platform—Cuda—Nvidia,” 2012, http://www.nvidia.com/object/cuda_home_new.html.

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133