
High-throughput bioinformatics with the Cyrille2 pipeline system

DOI: 10.1186/1471-2105-9-96


Abstract:

We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system, tracks what data enters the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes them on a compute cluster.

The Cyrille2 system is an extensible, modular system that implements the stated requirements. Cyrille2 enables easy creation and execution of high-throughput, flexible bioinformatics pipelines.

Large-scale computational analysis of biomolecular data often involves the execution of multiple, interdependent operations on an input dataset. The software tools, models and databases used in this process need to be arranged in precise computational chains, where the output of one analysis serves as the input of a subsequent analysis. Such chains are often referred to as pipelines or workflows. In formal terms, a pipeline can be defined as a graph that describes the order of, and mutual relationships between, the analyses to be performed on an input dataset. In a pipeline representation, an operation performed by a computational tool on input data is represented by a node. The connection between two nodes is represented by an edge and defines a stream of data between two analyses. An example of a simple computational pipeline representing part of a genome annotation process is depicted in Figure 1.

Even for a small bioinformatics project with a few interdependent analyses, it is cumbersome to perform all operations manually. For larger projects, e.g. the annotation of a complete eukaryotic genome, which may require the use of dozens of interdependent tools, including gene prediction tools, homology searches against different databases, protein domain analyses and repeat
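
The graph definition of a pipeline given above, together with the split between a component that decides what is ready to run and a component that runs it, can be illustrated with a minimal Python sketch. This is not Cyrille2's implementation or API: the node names, the `schedule` and `run` functions, and the placeholder tools are all hypothetical, chosen only to show nodes as analyses and edges as data streams between them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each key is a node (an analysis) and its value is the
# set of upstream nodes whose output it consumes (its incoming edges).
PIPELINE = {
    "load_sequence": set(),
    "gene_prediction": {"load_sequence"},
    "blast_search": {"gene_prediction"},
    "domain_analysis": {"gene_prediction"},
    "report": {"blast_search", "domain_analysis"},
}


def make_tool(name):
    """Build a placeholder tool that only records which inputs it received."""
    def tool(inputs):
        return f"{name}({', '.join(inputs)})"
    return tool


# Assumed stand-ins for real analysis programs.
TOOLS = {name: make_tool(name) for name in PIPELINE}


def schedule(pipeline, finished):
    """Return, sorted by name, the unfinished nodes whose inputs are all available."""
    return sorted(
        node
        for node, upstream in pipeline.items()
        if node not in finished and upstream <= finished
    )


def run(pipeline, tools):
    """Execute every node once its upstream nodes are done; return order and results."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle,
    # which guarantees that the loop below terminates.
    list(TopologicalSorter(pipeline).static_order())

    finished, results, order = set(), {}, []
    while len(finished) < len(pipeline):
        for node in schedule(pipeline, finished):
            inputs = [results[up] for up in sorted(pipeline[node])]
            results[node] = tools[node](inputs)
            order.append(node)
        finished = set(order)
    return order, results


if __name__ == "__main__":
    order, _ = run(PIPELINE, TOOLS)
    print(order)
```

In this sketch `schedule` plays a role loosely analogous to a scheduler and the loop in `run` to an executor; a real system would persist job state and dispatch jobs to a compute cluster rather than call functions in-process.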
