OALib Journal

ISSN: 2333-9721

Publication fee: USD 99

Search query: "OpenCL"; about 28 matching results found.
All articles listed are freely accessible.
Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL
Shinichi Yamagiwa
International Journal of Networking and Computing, 2012
Abstract: Multicore/manycore architectures drive demand for a new programming environment that can exploit the massive number of processors integrated on an LSI chip. The GPU (Graphics Processing Unit) is one typical hardware environment of this kind. GPU programming environments have traditionally been vendor- and hardware-specific, which complicates maintaining uniform programs that access the computing resources of a massively parallel platform. The recently released OpenCL is expected to become a standard that provides a uniform programming environment for heterogeneous processors from different vendors. This tutorial paper gives an overview of OpenCL for programmers who are about to program massively parallel hardware or who are migrating to OpenCL from another vendor-specific programming interface. It explains the characteristics of the OpenCL interface, describing in detail the basic structures used in a program. Moreover, it discusses performance aspects in order to evaluate advanced programming techniques that improve the performance of OpenCL applications.
Parallelism and Research on Functions with Continuously Independent Data and Intensive Memory Access Using OpenCL

Jiang Liyuan, Zhang Yunquan, Long Guoping, Jia Haipeng
Computer Science (计算机科学), 2013
Abstract: "Continuously independent data" means that, when consecutive elements of the destination matrix are computed, the source-matrix elements used are also consecutive and have no dependence on one another. A memory-access-intensive function is one that performs little computation but many data-transfer operations. Taking the bitwise function as an example, this paper studies and implements parallelization and optimization methods for continuously independent, memory-access-intensive functions on GPU platforms. Based on the OpenCL framework, it studies and compares various optimization methods, such as vectorization, thread organization and instruction selection, and finally uses these methods to port the bitwise function across different platforms. The function's execution time, excluding data transfer, was measured on both AMD GPU and NVIDIA GPU platforms. Compared with the CPU version in the OpenCV library, the optimized function is 40 times faster on the AMD Radeon HD 5850, 90 times faster on the AMD Radeon HD 7970, and 60 times faster on the NVIDIA Tesla C2050. On the NVIDIA Tesla C2050, the speedup is 1.5 compared with the CUDA version in the OpenCV library.
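The "continuously independent", memory-intensive pattern described above can be illustrated with a minimal Python sketch, assuming an element-wise bitwise AND on byte arrays as the bitwise operation. Each iteration of the outer loop stands in for one OpenCL work-item that handles `VEC` consecutive elements, roughly as a kernel using a vector type such as `uchar4` would; `VEC`, `bitwise_and_item` and `bitwise_and` are hypothetical names, not code from the paper, and the loop runs serially here rather than on a GPU.

```python
VEC = 4  # hypothetical vector width, standing in for a uchar4 load in a kernel


def bitwise_and_item(a, b, dst, gid):
    """Work of one simulated work-item: VEC consecutive output bytes."""
    base = gid * VEC
    for k in range(base, min(base + VEC, len(dst))):
        # Output k depends only on inputs k: no data dependence between items.
        dst[k] = a[k] & b[k]


def bitwise_and(a, b):
    """Element-wise AND of two equal-length byte sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dst = bytearray(len(a))
    global_size = (len(a) + VEC - 1) // VEC  # number of work-items, rounded up
    for gid in range(global_size):  # these iterations would run in parallel on a GPU
        bitwise_and_item(a, b, dst, gid)
    return bytes(dst)


if __name__ == "__main__":
    x = bytes([0xFF, 0x0F, 0xF0, 0xAA, 0x55, 0x3C])
    y = bytes([0x0F, 0xFF, 0x0F, 0xFF, 0xFF, 0xF0])
    print(bitwise_and(x, y).hex())  # prints 0f0f00aa5530
```

Because output element `k` depends only on input element `k`, work-items never need to synchronize, and the cost is dominated by memory traffic rather than arithmetic; this is consistent with the abstract's focus on vectorization and thread organization.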
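Optimization of Reduction Algorithms Based on OpenCL
Yan Shengen, Zhang Yunquan, Long Guoping, Li Yan
Journal of Software (软件学报), 2011
Abstract: Reduction algorithms are widely used in scientific computing, image processing and other fields. This paper systematically studies cross-platform performance optimization of reduction algorithms on GPUs under the OpenCL framework. Whereas existing work generally focuses on a single hardware architecture, this paper uses OpenCL to examine systematically how different optimization methods affect GPU hardware platforms from several angles: vectorization, on-chip memory bank conflicts, thread organization, and instruction selection. Taking the minmax function as a concrete example, it gives a detailed performance analysis of each optimization method and explains why performance improves. Tests on AMD GPU and NVIDIA GPU platforms show that the optimized algorithm achieves good speedups on both. On the AMD ATI Radeon HD 5850, bandwidth utilization for int and float data reaches up to 89% of the measured bandwidth. On the NVIDIA Tesla C2050, performance reaches 1.3 to 1.9 times that of the corresponding function in the CUDA version.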
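A work-group tree reduction is the usual structure behind a GPU minmax. The following minimal Python sketch of that pattern assumes a power-of-two work-group size `WG` and a second pass over the per-group partial results; it simulates serially what the work-items of each group would do in local memory. It is a generic illustration, not the authors' optimized kernel, and `WG`, `group_minmax` and `minmax` are hypothetical names.

```python
WG = 8  # hypothetical work-group size; must be a power of two


def group_minmax(chunk):
    """Tree reduction of one work-group's chunk, as it would run in local memory."""
    # Pad a short final chunk with one of its own elements, which is neutral
    # for both min and max.
    lo = list(chunk) + [chunk[0]] * (WG - len(chunk))
    hi = list(lo)
    stride = WG // 2
    while stride > 0:
        for lid in range(stride):  # only work-items with lid < stride are active
            lo[lid] = min(lo[lid], lo[lid + stride])
            hi[lid] = max(hi[lid], hi[lid + stride])
        # In an OpenCL kernel, barrier(CLK_LOCAL_MEM_FENCE) would separate steps.
        stride //= 2
    return lo[0], hi[0]


def minmax(data):
    """Return (min, max): one partial result per work-group, then a final pass."""
    if not data:
        raise ValueError("minmax of empty sequence")
    partials = [group_minmax(data[i:i + WG]) for i in range(0, len(data), WG)]
    # Second stage, done serially here (e.g. on the host or in a second kernel).
    return min(p[0] for p in partials), max(p[1] for p in partials)


if __name__ == "__main__":
    print(minmax([5, -3, 12, 7, 0, 9, 4, 4, 8, 1, -7]))  # prints (-7, 12)
```

Halving `stride` from `WG // 2` (sequential addressing) keeps the active work-items' accesses contiguous, which is the arrangement commonly used to avoid the local-memory bank conflicts the abstract mentions.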
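Parallelization and Optimization of Continuous Data-Independent, Memory-Access-Intensive Functions Based on OpenCL
Jiang Liyuan, Zhang Yunquan, Long Guoping, Jia Haipeng
Computer Science (计算机科学), 2013
Abstract: "Continuous data independence" means that, when consecutive elements of the destination matrix are computed, the source-matrix elements used are themselves consecutive and unrelated to one another; "memory-access-intensive" means that the function performs little computation but many data-transfer operations. Within the OpenCL framework, taking the bitwise function as an example, this paper studies and implements the parallelization and optimization of continuous data-independent, memory-access-intensive functions on GPU platforms. After examining how vectorization, thread organization, instruction selection and other optimizations affect performance on different GPU hardware platforms, it achieves cross-platform performance portability for this function. Experimental results show that, excluding data transfer, the optimized function achieves average speedups over the CPU version of this function in the OpenCV library of 40x on the AMD HD 5850 GPU, 90x on the AMD HD 7970 GPU, and 60x on the NVIDIA Tesla C2050 GPU. Compared with the CUDA implementation of this function in OpenCV, it also achieves a 1.5x speedup on the NVIDIA Tesla C2050 platform.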
Comparison of OpenMP & OpenCL Parallel Processing Technologies
Krishnahari Thouti, S. R. Sathe
International Journal of Advanced Computer Sciences and Applications, 2012
Abstract: This paper presents a comparison of OpenMP and OpenCL based on parallel implementations of algorithms from various fields of computer applications. The focus of our study is benchmark performance comparing OpenMP and OpenCL. We observed that the OpenCL programming model is a good option for mapping threads onto different processing cores. Balancing all available cores and allocating a sufficient amount of work among all computing units can lead to improved performance. In our simulation we used the Fedora operating system on a system with an Intel Xeon dual-core processor with a thread count of 24, coupled with an NVIDIA Quadro FX 3800 as the graphics processing unit.
Multicore Processing for Clustering Algorithms
Rekhansh Rao, Kapil Kumar Nagwanshi, Sipi Dubey
International Journal of Computer Technology and Applications, 2012
Abstract: Data mining algorithms such as classification and clustering are the future of computation, although they require multidimensional data processing. People are using multicore processors together with GPUs, yet most programming languages do not provide multiprocessing facilities, which wastes processing resources. Clustering and classification algorithms are particularly resource-consuming. In this paper we show strategies to overcome such deficiencies using the multicore processing platform OpenCL.
A Uniform Platform to Support Multigenerational GPUs for High Performance Stream-based Computing
Pablo Lamilla Álvarez, Shinichi Yamagiwa, Masahiro Arai, Koichi Wada
International Journal of Networking and Computing, 2011
Abstract: GPU-based computing has become one of the popular high-performance computing fields, known as GPGPU. This paper focuses on the design and implementation of a uniform GPGPU application that is optimized for both legacy and recent GPU architectures. As a typical example of such a GPGPU application, it discusses the uniform implementation of the Caravela platform, in particular the flow-model execution mechanism with reference to recent GPU architectures. To verify the design and implementation on the CUDA and OpenCL platforms, the paper evaluates compatibility among the architectures and measures performance.
GPGPU COMPUTING
Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu
Challenges of the Knowledge Society, 2012
Abstract: Since the idea of using GPUs for general-purpose computing first emerged, the field has evolved over the years, and there are now several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified Device Architecture) by NVIDIA and Stream by AMD. These are APIs designed by the GPU vendors to be used with the hardware they provide. A new emerging standard, OpenCL (Open Computing Language), tries to unify the different GPU general-computing API implementations and provides a framework for writing programs that execute across heterogeneous platforms consisting of both CPUs and GPUs. OpenCL supports parallel computing using both task-based and data-based parallelism. In this paper we focus on the CUDA parallel computing architecture and programming model introduced by NVIDIA and present the benefits of the CUDA programming model. We also compare the two main approaches, CUDA and AMD APP (Stream), with the new framework, OpenCL, which tries to unify the GPGPU computing models.
Introducing Intelligent Agents Potential into a competent Integral Multi-Agent Sensor Network Simulation Architecture Design
A. Filippou, D. A. Karras
Journal of Software Engineering and Applications (JSEA), 2013, DOI: 10.4236/jsea.2013.67B008
Abstract: During this research we identify several key issues concerning the WSN design process and how to introduce intelligence into the motes. Due to the nature of these networks, debugging after deployment is unrealistic, so an efficient testing method is required. WSN simulators perform this task, but the code implementing mote sensing and RF behaviour still consists of layered and/or interacting protocols that, for the sake of design accuracy, are tested working as a whole on specific hardware. Simulators that provide cross-layer simulation and hardware emulation options may be regarded as the last milestone of the WSN design process. In particular, mechanisms for introducing intelligence into the WSN decision-making process at the simulation level are an important aspect that has not been tackled in the literature at all so far. The multi-agent simulation architecture proposed here aims at a novel WSN simulation system that is independent of specific hardware platforms yet takes into account all hardware entities and events, for testing and analysing the behaviour of a realistic WSN system. Moreover, the design outlined here covers the basic mechanisms, with regard to memory and data management, towards a Prolog interpreter implementation at the simulation level.

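Performance Optimization of the OpenCL-Based Mean Shift Algorithm on Multiple Many-Core Platforms
Pang Xu, Zhang Yunquan, Long Guoping, Jia Haipeng, Yan Shengen
Computer Science (计算机科学), 2013
Abstract: As a general-purpose programming standard targeting multiple platforms, OpenCL has already been used to accelerate many applications. Because platforms differ in their hardware and software environments, general optimization methods do not necessarily deliver good speedups on every platform. By optimizing the mean shift algorithm on GPU and APU platforms, this paper explores how much each optimization method contributes on different platforms: on the one hand it studies the computational characteristics of each platform, and on the other it assesses the strengths and weaknesses of the different optimization methods, seeking the optimal solution by weighing these trade-offs against each other. Experiments show that the optimized parallel algorithm achieves speedups of 9.68, 5.74 and 1.27 over the unoptimized parallel version on the Radeon HD 5850, Tesla C2050 and APU A6-3650, respectively; the parallel program achieves speedups of 79.73, 93.88 and 2.22 over the serial program; and on the first two platforms the OpenCL version achieves speedups of 1.27 and 1.24 relative to the CUDA version of the OpenCV program.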
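The mean shift procedure itself can be sketched in a few lines. The following minimal Python version assumes one-dimensional samples and a flat (uniform) kernel of radius `h`, which may differ from the kernel and data layout used in the paper: each starting point is shifted to the mean of the samples inside its window until the shift becomes negligible. `mean_shift_point`, `mean_shift`, `eps` and `max_iter` are hypothetical names and parameters.

```python
def mean_shift_point(x, samples, h, eps=1e-6, max_iter=100):
    """Shift x toward a local density mode using a flat kernel of radius h."""
    for _ in range(max_iter):
        window = [s for s in samples if abs(s - x) <= h]
        if not window:  # no neighbours: x stays where it is
            return x
        new_x = sum(window) / len(window)  # mean of the samples in the window
        if abs(new_x - x) < eps:  # shift is negligible: converged
            return new_x
        x = new_x
    return x


def mean_shift(samples, h):
    """Run mean shift from every sample; each start point is independent."""
    return [mean_shift_point(x, samples, h) for x in samples]


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
    print([round(v, 3) for v in mean_shift(data, 1.0)])
    # prints [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
```

Because every starting point is processed independently, mapping one point (or pixel) to one work-item is the natural way to parallelize this loop.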


Copyright © 2008-2017 Open Access Library. All rights reserved.