%0 Journal Article
%T FLU, an amino acid substitution model for influenza proteins
%A Cuong Dang
%A Quang Le
%A Olivier Gascuel
%A Vinh Le
%J BMC Evolutionary Biology
%D 2010
%I BioMed Central
%R 10.1186/1471-2148-10-99
%X A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from ~113,000 influenza protein sequences, comprising ~20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains ~42 log likelihood points with an alignment of 300 sites. Moreover, the topologies of trees constructed using FLU and other models frequently differ. FLU therefore affects both the likelihood and the topology of the inferred trees. FLU has been implemented in PhyML; it can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU and is included in the PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/. FLU should be useful for any influenza protein analysis system that requires an accurate description of amino acid substitutions. Most statistical methods used for analyzing protein sequences require an amino acid substitution model to describe the evolutionary process of protein sequences. Amino acid substitution models are frequently used to infer protein phylogenetic trees under maximum likelihood or Bayesian frameworks (see [1,2] and references therein). They are also used to estimate pairwise distances between protein sequences, which then serve as inputs for distance-based phylogenetic analyses [3]. Moreover, these models can be used for aligning protein sequences [4]. These and other applications of amino acid substitution models are reviewed in [5]. Many methods have been proposed to estimate general amino acid substitution models from large and diverse databases (see [1,6] and references therein). These methods follow either a counting or a maximum likelihood approach.
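To make the counting idea concrete, here is a minimal, simplified sketch in Python. It assumes the input is a list of pairs of already aligned, closely related protein sequences; it tallies observed replacements into a symmetric count matrix and turns those counts into a reversible rate matrix normalised to one expected substitution per unit time. The function names `count_substitutions` and `rate_matrix` are hypothetical, and this is not Dayhoff's exact PAM procedure nor the maximum likelihood method used to estimate FLU; it only illustrates the general principle of deriving rates from counted differences.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard residues
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def count_substitutions(pairs):
    """Hypothetical helper: count replacements in aligned pairs of related sequences.

    The count matrix is kept symmetric because the direction of change is unknown.
    Sites containing gaps or non-standard residues are skipped.
    """
    n = len(AMINO_ACIDS)
    counts = [[0.0] * n for _ in range(n)]
    residues = [0.0] * n  # how often each amino acid occurs at usable sites
    for seq1, seq2 in pairs:
        if len(seq1) != len(seq2):
            raise ValueError("sequences in a pair must be aligned to equal length")
        for a, b in zip(seq1, seq2):
            if a not in INDEX or b not in INDEX:
                continue
            i, j = INDEX[a], INDEX[b]
            residues[i] += 1
            residues[j] += 1
            if i != j:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts, residues


def rate_matrix(counts, residues):
    """Hypothetical helper: build a reversible rate matrix Q from symmetric counts.

    With q_ij = counts_ij / residues_i and pi_i = residues_i / total,
    pi_i * q_ij = counts_ij / total, which is symmetric (detailed balance).
    """
    n = len(residues)
    total = sum(residues)
    pi = [r / total for r in residues]  # empirical amino acid frequencies
    q = [[0.0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        if i != j and residues[i] > 0:
            q[i][j] = counts[i][j] / residues[i]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)  # rows sum to zero
    mean_rate = -sum(pi[i] * q[i][i] for i in range(n))
    if mean_rate > 0:
        q = [[x / mean_rate for x in row] for row in q]  # one substitution per unit time
    return q, pi
```

For example, `count_substitutions([("ARND", "ARNE"), ("CQEG", "CQEG")])` records a single D/E replacement, and `rate_matrix` then yields a matrix whose rows sum to zero and whose D-to-E and E-to-D rates satisfy detailed balance. Real estimates use many thousands of sequence pairs and correct for multiple substitutions at the same site, which this sketch ignores.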
The first counting method was proposed by Dayhoff et al. [7] to estimate the PAM model. As more protein sequences accumulated, Jones et al. [8] used the same counting method to estimate the JTT model from a larger protein data
%U http://www.biomedcentral.com/1471-2148/10/99