- 2015
Large-Scale Exon Array Data Analysis Based on Parallel Computing
Abstract:
Fast and accurate estimation of transcript expression levels plays an important role in transcriptome research. The previously devised Gamma model for exon array data (GME) is inefficient on large-scale exon array datasets. To address this, a parallel computing method is proposed that takes full advantage of multi-core processors or cluster environments to improve the computational efficiency of GME on large-scale Affymetrix exon array datasets. The principles of the GME model are introduced first, followed by the choice of parallel algorithm for the model; parallel efficiency is then evaluated on real datasets of various scales. The experimental results show that the proposed parallel approach greatly improves the computational efficiency of the GME model and that, compared with the earlier serial computation, it makes GME better suited to the analysis of large-scale exon array datasets.
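The abstract does not say how the GME computation is decomposed, so the following is only a minimal sketch of one common data-parallel pattern. It assumes that each gene can be processed independently. The per-gene task is a placeholder (a method-of-moments Gamma fit), not the GME model, and the names `fit_gamma_moments`, `fit_all`, and `fit_all_multicore` are hypothetical.

```python
import concurrent.futures as cf
import statistics


def fit_gamma_moments(item):
    """Placeholder per-gene task: method-of-moments Gamma fit.

    This is NOT the GME model. It only stands in for an independent
    per-gene computation. Assumes positive, non-constant intensities.
    """
    gene_id, intensities = item
    mean = statistics.fmean(intensities)
    var = statistics.pvariance(intensities, mu=mean)
    shape = mean * mean / var  # k = mean^2 / variance
    scale = var / mean         # theta = variance / mean
    return gene_id, shape, scale


def fit_all(genes, executor=None, chunksize=64):
    """Fit every gene, serially or through any concurrent.futures executor."""
    items = list(genes.items())
    if executor is None:
        # Serial baseline, useful for checking the parallel result.
        return [fit_gamma_moments(it) for it in items]
    # Executor.map preserves input order; chunksize batches the tasks
    # to cut inter-process overhead (thread pools ignore it).
    return list(executor.map(fit_gamma_moments, items, chunksize=chunksize))


def fit_all_multicore(genes, workers=None):
    """Spread the per-gene tasks over worker processes (one per core by default)."""
    with cf.ProcessPoolExecutor(max_workers=workers) as pool:
        return fit_all(genes, pool)
```

When this is run as a script, `fit_all_multicore` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. A cluster setting would need a different backend (for example MPI), which this sketch does not cover.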