%0 Journal Article
%T Composition Profiler: a tool for discovery and visualization of amino acid composition differences
%A Vladimir Vacic
%A Vladimir N Uversky
%A A Keith Dunker
%A Stefano Lonardi
%J BMC Bioinformatics
%D 2007
%I BioMed Central
%R 10.1186/1471-2105-8-211
%X The program takes two samples of amino acids as input: a query sample and a reference sample. The latter provides a suitable background amino acid distribution, and should be chosen according to the nature of the query sample, for example, a standard protein database (e.g. SwissProt, PDB), a representative sample of proteins from the organism under study, or a group of proteins with a contrasting functional annotation. The results of the analysis of amino acid composition differences are summarized in textual and graphical form. As an exploratory data mining tool, our software can be used to guide feature selection for protein function or structure predictors. For classes of proteins with significant differences in frequencies of amino acids having particular physico-chemical (e.g. hydrophobicity or charge) or structural (e.g. α helix propensity) properties, Composition Profiler can be used as a rough, light-weight visual classifier. Often the first step in characterizing a group of related non-homologous proteins (that is, for which there is no meaningful multiple sequence alignment) is to identify statistically significant patterns of amino acid enrichment or depletion. Here we introduce Composition Profiler, a web-based software that automates this task and graphically summarizes the results. Composition Profiler is also available as a stand-alone command line application that can be used for task automation or analysis of large samples. The following sections will introduce the methodology and discuss several examples of composition profiles in greater depth. Let P denote the protein sample under study, Q the background sample, and let p_k and q_k denote the probabilities of observing amino acid k in the two samples. Let us assume that the amino acid compositions of the two samples P and Q are independent and identically distributed, each generated by a separate stochastic process according to probability distributions p = (p_Ala, p_Arg, ...) and q = (q_Ala, q_Arg, ...).
%U http://www.biomedcentral.com/1471-2105/8/211