%0 Journal Article
%T Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST
%A E Michael Gertz
%A Yi-Kuo Yu
%A Richa Agarwala
%A Alejandro A Schäffer
%A Stephen F Altschul
%J BMC Biology
%D 2006
%I BioMed Central
%R 10.1186/1741-7007-4-41
%X We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms. BLAST [1,2] is a popular and effective tool for finding significant alignments between a biological query sequence and a database of subject sequences. BLAST has several modes of operation, one of which aligns an amino acid query sequence to a database of nucleotide sequences, where the nucleotide sequences are often either fragments of a genome or cDNAs representing expressed genes. This mode of operation is known by the name TBLASTN.
TBLASTN operates by translating database nucleotide sequences to hypothetical amino acid sequences in all six reading frames and then aligning the hypothetical amino acid sequences to the query. TBLASTN is widely used as associating proteins with chromosomes or with mRNAs is useful in many biological studies. Despite this popularity, a performance evaluation of TBLASTN has never been published. BLASTX, a related variant of BLAST that aligns a DNA sequence to a protein
%U http://www.biomedcentral.com/1741-7007/4/41
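The six-frame translation described in the text above can be illustrated with a minimal Python sketch. It is not the TBLASTN implementation: it assumes the standard genetic code, writes any codon containing a non-ACGT character as `X`, and the names `translate`, `reverse_complement`, and `six_frame_translations` are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Standard genetic code (assumed here), listed in TCAG order with the
# third codon position varying fastest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to hypothetical proteins."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six resulting strings is a hypothetical amino acid sequence that a protein query could then be aligned against; the alignment and E-value computation are outside the scope of this sketch.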