A classification approach for genotyping viral sequences based on multidimensional scaling and linear discriminant analysis

DOI: 10.1186/1471-2105-11-434

Abstract:

MuLDAS starts by aligning the query sequence to the reference multiple sequence alignment and calculating the resulting pairwise distance matrix among the sequences. The sequences are then mapped to a principal coordinate space by multidimensional scaling, and the coordinates of the reference sequences are used as features in developing linear discriminant models that partition the space by genotype. The genotype of the query is then given as the maximum a posteriori estimate. MuLDAS tests the model confidence by leave-one-out cross-validation and also provides some heuristics for the detection of 'outlier' sequences that fall far outside or in between genotype clusters. We have tested our method by classifying HIV-1 and HCV nucleotide sequences downloaded from NCBI GenBank, achieving overall concordance rates of 99.3% and 96.6%, respectively, with the benchmark test datasets retrieved from the respective databases of Los Alamos National Laboratory.

The highly accurate genotype assignment, coupled with several measures for evaluating the results, makes MuLDAS useful in analyzing the sequences of rapidly evolving viruses such as HIV-1 and HCV. A web-based genotype prediction server is available at http://www.muldas.org/MuLDAS/.

We are observing rapid growth in the number of viral sequences in the public databases [1]: for example, HIV-1 and HCV sequence entries in NCBI GenBank have doubled almost every three years. These viruses also show great genotypic diversity and thus have been classified into groups, so-called genotypes and subtypes [2,3]. Consequently, classifying these virus strains into genotypes or subtypes based on their sequence similarities has become one of the most basic steps in understanding their evolution and epidemiology and in developing antiviral therapies or vaccines. The conventional classification methods include the following: (1) the nearest neighbour methods that look for the best match of the query to the representatives of each genotype, so-called refe
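
The pipeline summarized in the abstract (distance matrix, multidimensional scaling, linear discriminant models, maximum a posteriori assignment, leave-one-out cross-validation) can be illustrated with a minimal Python sketch. This is not the MuLDAS implementation. It assumes the sequences are already aligned, uses a simple p-distance (proportion of differing sites) in place of whatever distance the authors use, keeps two principal coordinates, and fits a plain pooled-covariance discriminant with NumPy. The toy sequences, the genotype labels "A" and "B", and all function names are hypothetical. As a further simplification, the cross-validation loop refits only the discriminant model and reuses the same coordinates.

```python
import numpy as np

def p_distance_matrix(seqs):
    """Pairwise proportion of differing sites between pre-aligned sequences."""
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)

def classical_mds(dist, k):
    """Principal coordinates of a distance matrix (classical MDS)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def lda_posteriors(train_x, train_y, x):
    """Posterior genotype probabilities for point x under a pooled-covariance LDA."""
    train_y = np.array(train_y)
    classes = sorted(set(train_y.tolist()))
    means = [train_x[train_y == c].mean(axis=0) for c in classes]
    priors = [np.mean(train_y == c) for c in classes]
    centered = np.vstack([train_x[train_y == c] - m for c, m in zip(classes, means)])
    pooled = centered.T @ centered / (len(train_x) - len(classes))
    pooled += 1e-6 * np.eye(pooled.shape[0])  # small ridge for numerical stability
    precision = np.linalg.inv(pooled)
    scores = np.array([x @ precision @ m - 0.5 * m @ precision @ m + np.log(p)
                       for m, p in zip(means, priors)])
    post = np.exp(scores - scores.max())
    return classes, post / post.sum()

def classify_query(ref_seqs, ref_labels, query, k=2):
    """Return (MAP genotype, its posterior, leave-one-out accuracy on references)."""
    # Embed references and query together, then train only on the references.
    coords = classical_mds(p_distance_matrix(list(ref_seqs) + [query]), k)
    ref_xy, query_xy = coords[:-1], coords[-1]
    labels = np.array(ref_labels)
    classes, post = lda_posteriors(ref_xy, labels, query_xy)
    best = int(np.argmax(post))
    hits = 0
    for i in range(len(labels)):  # leave-one-out over the reference sequences
        keep = np.arange(len(labels)) != i
        cls, p = lda_posteriors(ref_xy[keep], labels[keep], ref_xy[i])
        hits += int(cls[int(np.argmax(p))] == labels[i])
    return classes[best], float(post[best]), hits / len(labels)

# Hypothetical toy alignment: four references per genotype and one query.
REF_SEQS = ["ACGTACGTACGT", "ACGTACGTACGA", "ACGTACGAACGT", "ACCTACGTACGT",
            "TTGAACCTGGCA", "TTGAACCTGGCT", "TTGATCCTGGCA", "TAGAACCTGGCA"]
REF_LABELS = ["A", "A", "A", "A", "B", "B", "B", "B"]
QUERY = "ACGTACGTTCGT"

genotype, posterior, loo_accuracy = classify_query(REF_SEQS, REF_LABELS, QUERY)
print(genotype, posterior, loo_accuracy)
```

The posterior of the winning genotype and the leave-one-out accuracy play the role of the confidence measures the abstract mentions. A low posterior, or a query that lies far from every genotype cluster in the coordinate space, would be a natural starting point for an outlier heuristic, though the abstract does not spell out the heuristics MuLDAS actually uses.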
