Search Results: 1 - 10 of 1959 matches for "Khalid Sayood"
All listed articles are free for downloading (OA Articles)
The Average Mutual Information Profile as a Genomic Signature
Mark Bauer, Sheldon M Schuster, Khalid Sayood
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-48
Abstract: We have analyzed genomic sequences of eukaryotic and prokaryotic chromosomes as well as various subtypes of viruses using an information-theoretic framework. We confirm the existence of a species-specific average mutual information (AMI) profile. We use these profiles to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin, to separate and cluster subtypes of the HIV-1 virus, and to classify DNA fragments to species of origin. AMI profiles of DNA sequences prove to be species-specific and easy to compute. The structure of AMI profiles is conserved, even in short subsequences of a species' genome, rendering a pervasive signature. This signature can be used to classify relatively short DNA fragments to species of origin. The existence of patterns that can be used as a signature of data is indicative of statistical or deterministic structures in the data. In DNA sequences this structure can be due to biological processes which involve the DNA, or it may appear because of events and processes in the evolutionary history of the DNA. There have been significant efforts in understanding the sequential structure and complexity of DNA using various approaches, information-theoretic measures or other mathematical models. The standard approach to studying statistical relationships in a sequence is the use of correlation profiles or spectral profiles such as periodograms and power spectra. To translate the sequence of letters that form the DNA sequence into a sequence of numbers, which can then be easily analyzed using autocorrelation or spectral techniques, different mappings have been proposed by Gates [1], Voss [2] and Peng et al. [3]. The power spectral densities obtained from these approaches show a power law relationship, which points to the existence of long-range correlations. A number of m
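As an illustration of the profile described in the abstract above, here is a minimal Python sketch. It assumes the usual definition of mutual information between symbols k positions apart, I_k = sum over x, y of p_k(x, y) log2(p_k(x, y) / (p(x) p(y))), with probabilities estimated from pair counts. The function name `ami_profile` and the estimation details are assumptions made for this sketch, not the authors' implementation, and the distance measure built on these profiles is not reproduced here.

```python
from collections import Counter
from math import log2


def ami_profile(seq, max_lag):
    """Return [I_1, ..., I_max_lag] in bits for a string over any alphabet.

    I_k is the mutual information between symbols k positions apart,
    estimated from the pair counts observed at that lag (an assumption
    of this sketch; other estimators are possible).
    """
    profile = []
    for k in range(1, max_lag + 1):
        # Count every pair (seq[i], seq[i + k]).
        pairs = Counter(zip(seq, seq[k:]))
        total = sum(pairs.values())
        if total == 0:
            profile.append(0.0)
            continue
        # Marginal counts of the first and second symbol of each pair.
        left, right = Counter(), Counter()
        for (x, y), c in pairs.items():
            left[x] += c
            right[y] += c
        info = 0.0
        for (x, y), c in pairs.items():
            p_xy = c / total
            # p_xy / (p_x * p_y) written with raw counts.
            info += p_xy * log2(p_xy * total * total / (left[x] * right[y]))
        profile.append(info)
    return profile
```

For a perfectly periodic toy sequence such as `"ACGT" * 10`, the value at lag 4 is 2 bits, because each symbol fully determines the symbol four positions later; a constant sequence gives a profile of zeros.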
Data Compression Concepts and Algorithms and Their Applications to Bioinformatics
Özkan U. Nalbantoglu, David J. Russell, Khalid Sayood
Entropy, 2010, DOI: 10.3390/e12010034
Abstract: Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.
Grammar-based distance in progressive multiple sequence alignment
David J Russell, Hasan H Otu, Khalid Sayood
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-306
Abstract: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance particularly useful in aligning large datasets. Generation of meaningful multiple sequence alignments (MSAs) of biological sequences is a well-studied NP-complete problem, which has significant implications for a wide spectrum of applications [1,2]. In general, the challenge is aligning N sequences of varying lengths by inserting gaps in the sequences so that in the end all sequences have the same length. Of particular interest to computational biology are DNA/RNA sequences and amino acid sequences, which are composed of nucleotide and amino acid residues, respectively. MSAs are generally used in studying phylogeny of organisms, structure prediction, and identifying segments of interest among many other applications in computational biology [3]. Given a scoring scheme to evaluate the fitness of an MSA, calculating the best MSA is an NP-complete problem [1]. Variances in scoring schemes, need for expert-hand analysis in most applications, and many-to-one mapping governing elements-to-functionality (codon mapping and function) make MSA a more challenging problem when considered from a biological context as well [4]. Generally, three approaches are used to automate the generation of MSAs. The first offers a brute-force method of multidimensional dynamic programming [5], which may find a good alignment but is generally computat
RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles
Ozkan U Nalbantoglu, Samuel F Way, Steven H Hinrichs, Khalid Sayood
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-41
Abstract: We propose a robust taxonomic classification method, RAIphy, that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively without these limitations. We have tested RAIphy with synthetic metagenomics data ranging from 100 bp to 50 Kbp. Within a sequence read range of 100 bp-1000 bp, the sensitivity of RAIphy ranges between 38% and 81%, outperforming the currently popular composition-based methods for reads in this range. Comparison with computationally more intensive sequence similarity methods shows that RAIphy performs competitively while being significantly faster. The sensitivity-specificity characteristics for relatively longer contigs were compared with the PhyloPythia and TACOA algorithms. RAIphy performs better than these algorithms at varying clade-levels. For an acid mine drainage (AMD) metagenome, RAIphy was able to taxonomically bin the sequence read set more accurately than the currently available methods, Phymm and MEGAN, and more accurately in two out of three tests than the much more computationally intensive method, PhymmBL. With the introduction of the relative abundance index metric and an iterative classification method, we propose a taxonomic classification algorithm that performs competitively for a large range of DNA contig lengths assembled from metagenome data. Because of its speed, simplicity, and accuracy, RAIphy can be successfully used in the binning process for a broad range of metagenomic data obtained from environmental samples. A principal goal of metagenomics [1] is to sample microbiomes and recover genetic material without isolating single organisms, thereby mitigating the problem of limiting genomic analysis to a small percentage of existing culturable species. Eventually, this will help extend the tree of life [2], enrich sequence libraries, and expand analysis from genomic to metagenomic (e.g., samples from various habitats could be used to study interactions within communities,
A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences
David J Russell, Samuel F Way, Andrew K Benson, Khalid Sayood
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-601
Abstract: The performance of the proposed algorithm is validated via comparison to the popular DNA/RNA sequence clustering approach, CD-HIT-EST, and to the recently developed algorithm, UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. The proposed algorithm maintains a CPU execution time comparable to that of CD-HIT-EST, which is much slower than UCLUST, and has successfully generated clusters with higher statistical accuracy than both CD-HIT-EST and UCLUST. The validation results are especially striking for large datasets. We introduce a fast and accurate clustering algorithm that relies on a grammar-based sequence distance. Its statistical clustering quality is validated by clustering large datasets containing 16S rDNA sequences. The amount of biological information being gathered is growing faster than the rate at which it can be analyzed. Data clustering, which compresses the problem space by reducing redundancy, is one viable tool for managing the explosive growth of data. In general, clustering algorithms are designed to operate on a large set of related values, eventually generating a smaller set of elements that represent groups of similar data points. A central data element may then be used as the sole representative of a group. Significant clustering work relating to bioinformatics may be traced to the late 1990s, when methods for quick generation of nonredundant (NR) protein databases were developed. These combined identical or nearly identical protein sequences into single entries [1-3]. The primary benefits of these methods include faster searches of the NR protein databases and reduced statistical bias in the query results [1]. Similarly, computer programs such as those in ICAtools [4] were developed for compressing DNA databases by removing redundant sequences found via clustering, resulting in faster database queries. Note that the use of the term "clustering" in these applications differs from another use often found in the literature wh
Large Direct Repeats Flank Genomic Rearrangements between a New Clinical Isolate of Francisella tularensis subsp. tularensis A1 and Schu S4
Ufuk Nalbantoglu, Khalid Sayood, Michael P. Dempsey, Peter C. Iwen, Stephen C. Francesconi, Ravi D. Barabote, Gary Xie, Thomas S. Brettin, Steven H. Hinrichs, Paul D. Fey
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0009007
Abstract: Francisella tularensis subspecies tularensis consists of two separate populations A1 and A2. This report describes the complete genome sequence of NE061598, an F. tularensis subspecies tularensis A1 isolated in 1998 from a human with clinical disease in Nebraska, United States of America. The genome sequence was compared to Schu S4, an F. tularensis subspecies tularensis A1a strain originally isolated in Ohio in 1941. It was determined that there were 25 nucleotide polymorphisms (22 SNPs and 3 indels) between Schu S4 and NE061598; two of these polymorphisms were in potential virulence loci. Pulsed-field gel electrophoresis analysis demonstrated that NE061598 was an A1a genotype. Other differences included repeat sequences (n = 11 separate loci), four of which were contained in coding sequences, and an inversion and rearrangement probably mediated by insertion sequences and the previously identified direct repeats I, II, and III. Five new variable-number tandem repeats were identified; three of these five were unique in NE061598 compared to Schu S4. Importantly, there was no gene loss or gain identified between NE061598 and Schu S4. Interpretation of these data suggests there is significant sequence conservation and chromosomal synteny within the A1 population. Further studies are needed to determine the biological properties driving the selective pressure that maintains the chromosomal structure of this monomorphic pathogen.
An Application of Cyclotomic Polynomial to Factorization of Abelian Groups
Khalid Amin
Open Journal of Discrete Mathematics (OJDM), 2011, DOI: 10.4236/ojdm.2011.13017
Abstract: If a finite abelian group G is a direct product of its subsets such that G = A1···Ai···An, G is said to have the Hajos-n-property if it follows that one of these subsets, say Ai, is periodic, meaning that there exists a nonidentity element g in G such that gAi = Ai. Using some properties of cyclotomic polynomials, we will show that the cyclic groups of orders p^α and groups of type (p^2, q^2) and (p^α, p^β), where p and q are distinct primes and α, β are integers ≥ 1, have this property.
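The periodicity condition in this abstract can be made concrete with a small Python sketch. It is restricted to the cyclic group Z_n written additively, so the condition gAi = Ai becomes g + A = A (mod n). The helper names `periods` and `is_factorization` are hypothetical and chosen only for this illustration; the sketch checks definitions by brute force and does not use the cyclotomic-polynomial argument of the paper.

```python
def periods(subset, n):
    """Return all nonzero g in Z_n such that g + subset == subset (mod n)."""
    a = {x % n for x in subset}
    return [g for g in range(1, n) if {(x + g) % n for x in a} == a]


def is_factorization(factors, n):
    """True if every element of Z_n is a sum of one term per factor in exactly one way."""
    sums = [0]
    for f in factors:
        # Extend each partial sum by every element of the next factor.
        sums = [(s + x) % n for s in sums for x in f]
    # A factorization hits each of the n elements exactly once.
    return len(sums) == n and len(set(sums)) == n
```

For example, Z_4 = {0, 1} + {0, 2} is a factorization, and the factor {0, 2} is periodic with period 2, while {0, 1} has no nonzero period.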
Changes in the Shoreline Position Caused by Natural Processes for Coastline of Marsa Alam – Hamata, Red Sea, Egypt
Khalid Dewidar
International Journal of Geosciences (IJG), 2011, DOI: 10.4236/ijg.2011.24055
Abstract: The probability of storms and ice-drift events and their impact on coasts is expected to increase as a result of climate change. Multi-year shoreline mapping is considered a valuable task for coastal monitoring and assessment. This paper presents shoreline maps illustrating the shoreline erosion and accretion pattern in the coastal area between Marsa Alam and Hamata on the Red Sea coastline, using different sources of remote sensing data. In the present study, Landsat MSS (1972), Landsat TM (1990), Landsat ETM+ (1998, 2000) and Terra ASTER (2007) satellite images were used. Two techniques were used to estimate the rate of shoreline retreat. The first generates automated shoreline positions, and the second estimates the rate of shoreline change from remote sensing data using the Digital Shoreline Analysis System (DSAS) software. The End Point Rate (EPR) was calculated by dividing the distance of shoreline movement by the time elapsed between the earliest and latest measurements at each transect. Alongshore rate changes show that the erosion and accretion pattern varies due to coastal processes and climate change.
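The End Point Rate described in this abstract is a single division per transect, which a short Python sketch can show. The function name `end_point_rate`, the units (metres and years), and the sign convention (negative meaning landward movement) are assumptions made for this illustration; the positions in the usage note are invented and are not data from the paper.

```python
def end_point_rate(earliest_pos_m, latest_pos_m, earliest_year, latest_year):
    """Shoreline change rate at one transect, in metres per year.

    Positions are distances along the transect from a fixed baseline.
    Under the sign convention assumed here, a negative rate means the
    shoreline moved toward the baseline between the two dates.
    """
    elapsed = latest_year - earliest_year
    if elapsed <= 0:
        raise ValueError("latest_year must be after earliest_year")
    # Distance of shoreline movement divided by the time elapsed.
    return (latest_pos_m - earliest_pos_m) / elapsed
```

With hypothetical positions of 120.0 m in 1972 and 85.0 m in 2007, `end_point_rate(120.0, 85.0, 1972, 2007)` gives -1.0 m per year. Only the earliest and latest shorelines enter the calculation, so intermediate dates such as 1990 or 2000 do not affect the result.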
Simulation of Average Turbulent Pipe Flow: A Three-Equation Model
Khalid Alammar
Open Journal of Fluid Dynamics (OJFD), 2014, DOI: 10.4236/ojfd.2014.41005
Abstract: The aim of this study is to evaluate a three-equation turbulence model applied to pipe flow. Uncertainty is approximated by comparing with published direct numerical simulation results for fully developed average pipe flow. The model is based on the Reynolds-averaged Navier-Stokes equations. The Boussinesq hypothesis is invoked for determining the Reynolds stresses. Three local length scales are solved for, and the eddy viscosity is calculated from them. There are two parameters in the model; one accounts for surface roughness and the other is possibly attributed to the fluid. Error in the mean axial velocity and Reynolds stress is found to be negligible.
An Updated Review on Chicken Eggs: Production, Consumption, Management Aspects and Nutritional Benefits to Human Health
Khalid Zaheer
Food and Nutrition Sciences (FNS), 2015, DOI: 10.4236/fns.2015.613127
Abstract: Ancestors of the modern chicken were domesticated from members of the Gallus genus probably 7 to 8 thousand years ago in southeastern Asia. Subsequently, they spread globally for meat and egg production. In the chicken egg, there is a balance of numerous high-quality nutrients, many of which are highly bioavailable. The egg confers a multitude of health benefits on consumers, emphasizing its classification as a functional food. Current global per capita egg consumption estimates approach 9 kg annually but vary greatly on a regional basis. This review deals with global production, consumption, and management aspects such as hygiene, feeding, and housing. Management aspects play key roles in the composition, quality, food safety, and visual (consumer) appeal of the egg. The manipulation of egg nutrients and their value for human health is also discussed.
Copyright © 2008-2017 Open Access Library. All rights reserved.