oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Page 1 /100
Progress and status of APEmille
APE collaboration,A. Bartoloni,S. Cabasino,N. Cabibbo,M. Cosimi,P. De Riso,W. Errico,S. Giovannetti,F. Laico,H. Leich,A. Lonardo,G. Magazzu,A. Michelotti,E. Panizzi,P. S. Paolucci,D. Rossetti,U. Schwendicke,H. Simma,K. H. Sulanke,M. Torelli,R. Tripiccione,P. Vicini
Physics , 1997, DOI: 10.1016/S0920-5632(97)00965-1
Abstract: We report on the progress and status of the APEmille project: a SIMD parallel computer with a peak performance in the TeraFlops range which is now in an advanced development phase. We discuss the hardware and software architecture, and present some performance estimates for Lattice Gauge Theory (LGT) applications.
Status of APEmille
APE Collaboration,A. Bartoloni,P. Boucaud,N. Cabibbo,F. Calvayrac,M. Della Morte,R. De Pietri,P. De Riso,F. Di Carlo,F. Di Renzo,W. Errico,R. Frezzotti,T. Giorgino,J. Heitger,A. Lonardo,M. Loukianov,G. Magazzu,J. Micheli,V. Morenas,N. Paschedag,O. Pene,R. Petronzio,D. Pleiter,F. Rapuano,J. Rolf,D. Rossetti,L. Sartori,H. Simma,F. Schifano,M. Torelli,R. Tripiccione,P. Vicini,P. Wegner
Physics , 2001, DOI: 10.1016/S0920-5632(01)01922-3
Abstract: This paper presents the status of the APEmille project, which is essentially complete as far as machine development and construction are concerned. Several large APEmille installations are in use for physics production runs, which have led to many of the new results presented at this conference. This paper briefly summarizes the APEmille architecture, reviews the status of the installations and presents some performance figures for physics codes.
Dynamic Range Input FFT Algorithm for Signal Processing in Parallel Processor Architecture
Md. Mashiur Rahman,Yadagiri Pyaram,S. M. Mohsin Reza,S.M. Khaled Reza
Lecture Notes in Engineering and Computer Science , 2011,
Abstract:
Payload Inspection Using Parallel Bloom Filter in Dual Core Processor
Arulanand Natarajan,S. Subramanian
Computer and Information Science , 2010, DOI: 10.5539/cis.v3n4p215
Abstract: This paper presents payload inspection for identifying spam files using a Bloom filter on a dual-core processor. Spam files flood the Internet in an attempt to dump messages on recipients who do not intend to receive them. Spam costs the sender very little to send, and most of the costs are levied on the recipients or the carriers. The proposed system identifies and filters incoming spam files using a Bloom filter algorithm implemented on a dual-core processor. The results of the Bloom filter algorithm are examined, comparing the performance of the sequential Bloom filter and the parallel Bloom filter on a dual-core processor.
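The abstract does not give the filter size, the number of hash functions or the hash family used. As a minimal sketch only, assuming a bit array of m = 1024 entries, k = 4 positions derived from one SHA-256 digest by double hashing, and a hypothetical list of spam signatures, a Bloom filter membership test looks like this (the authors' sequential and parallel implementations are not reproduced):

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bit slots, k hash positions per item (illustrative parameters)."""

    def __init__(self, m: int = 1024, k: int = 4) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit slot, for clarity rather than compactness

    def _positions(self, item: bytes) -> list:
        # Double hashing: derive k indices from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step: indices are distinct when m is a power of two
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely never added"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical spam signatures, not taken from the paper.
spam_filter = BloomFilter()
for signature in [b"cheap pills", b"win a prize"]:
    spam_filter.add(signature)

print(spam_filter.might_contain(b"cheap pills"))  # True: added items are never missed
```

Because queries only read the bit array, payloads could be checked concurrently (for example one worker per core) without locking; how the paper actually splits work between the two cores is not stated in the abstract.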
Massively Parallel Processor Architectures for Resource-aware Computing
Vahid Lari,Alexandru Tanase,Frank Hannig,Jürgen Teich
Computer Science , 2014,
Abstract: We present a class of massively parallel processor architectures called invasive tightly coupled processor arrays (TCPAs). The presented processor class is a highly parameterizable template, which can be tailored before runtime to fulfill customers' requirements such as performance, area cost, and energy efficiency. These programmable accelerators are well suited for domain-specific computing in the areas of signal, image, and video processing, as well as other streaming applications. To overcome future scaling issues (e.g., power consumption, reliability, resource management, and application parallelization and mapping), TCPAs are inherently designed to support self-adaptivity and resource awareness at the hardware level. Here, we follow a recently introduced resource-aware parallel computing paradigm called invasive computing, in which an application can dynamically claim, execute on, and release resources. Furthermore, we show how invasive computing can be used as an enabler for power management. Finally, we introduce ideas on how to realize fault-tolerant loop execution on such massively parallel architectures by employing on-demand spatial redundancy at the processor array level.
Parallel Processor for 3D Recovery from Optical Flow
Jose Hugo Barron-Zambrano,Fernando Martin del Campo-Ramirez,Miguel Arias-Estrada
International Journal of Reconfigurable Computing , 2009, DOI: 10.1155/2009/973475
Abstract: 3D recovery from motion has received major research effort in computer vision in recent years. The main problem lies in the number of operations and memory accesses that most existing techniques require when translated into hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main features are maximal data reuse and a low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture, as well as those from processor synthesis, are presented.
Parallel Processor Design and Implementation for Molecular Dynamics Simulations on a FPGA-Based Supercomputer
Server Kasap,Khaled Benkrid
Journal of Computers , 2012, DOI: 10.4304/jcp.7.6.1312-1328
Abstract: This paper presents the design and implementation of an FPGA core that parallelises all the operations needed to compute the non-bonded interactions in a molecular dynamics (MD) simulation, with the aim of accelerating the LAMMPS MD software. Our MD processor core consists of four identical pipelines working independently in parallel to evaluate the non-bonded potentials, forces and virials, and was implemented on the nodes of an FPGA-based supercomputer named Maxwell. Implementing our FPGA core on multiple nodes of Maxwell allowed us to build a special-purpose parallel machine for the hardware acceleration of MD simulations. Timing measurements of this machine for the pairwise Lennard-Jones (LJ) and short-range Coulombic (via PPPM) interaction computations, in MD simulations of solvated Rhodopsin protein systems with various numbers of atoms, show speed-ups of up to 13 times over the pure software implementation on two nodes of the Maxwell machine. Furthermore, our MD machine is highly scalable, gaining computational power as Maxwell nodes are added. To our knowledge, this is the first attempt to port existing production-grade MD software to an FPGA-based parallel computer.
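To make the per-pair work concrete, here is a minimal sketch of the pairwise Lennard-Jones energy and force evaluation, U(r) = 4ε[(σ/r)^12 − (σ/r)^6]. It assumes reduced units (ε = σ = 1), a plain cutoff of 2.5σ and no periodic boundaries; the function name and parameters are hypothetical, and the virial and PPPM Coulomb terms handled by the paper's pipelines are omitted:

```python
def lj_energy_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Naive O(N^2) Lennard-Jones energy and forces with a plain cutoff.

    Assumes reduced units and no periodic boundaries (illustrative only).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][a] - positions[j][a] for a in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            if r2 >= cutoff * cutoff:
                continue  # pair outside the cutoff contributes nothing
            sr6 = (sigma * sigma / r2) ** 3  # (sigma/r)^6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Force on i: 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for a in range(3):
                forces[i][a] += scale * d[a]
                forces[j][a] -= scale * d[a]  # Newton's third law: equal and opposite
    return energy, forces


# Two atoms at the potential minimum r = 2^(1/6) * sigma.
e, f = lj_energy_forces([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
print(e)  # approximately -1.0 (the well depth epsilon)
```

Each pair's contribution is independent of every other pair's, which is what makes this inner loop a natural fit for several hardware pipelines working in parallel; the exact pair distribution used on Maxwell is not described in the abstract.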
A New Hardware Architecture for Parallel Shortest Path Searching Processor Based-on FPGA Technology
Jassim M. Abdul-Jabbar,Majid A. Alwan,Mohammed A. Ali Al-Ebadi
International Journal of Electronics and Computer Science Engineering , 2012,
Abstract: In this paper, a new FPGA-based parallel processor for shortest-path searching in OSPF networks is designed and implemented. The processor design is based on a parallel searching algorithm that overcomes the long execution time of the conventional Dijkstra algorithm originally used in the OSPF network protocol. Multiple shortest links can be found simultaneously, and the execution iterations of the processing phase are limited to instead of of Dijkstra algorithm. Depending on the FPGA chip resources, the processor can be expanded to handle an OSPF area with 128 routers. The proposed processor achieves high speed-up factors, in the range 76.77 to 103.45, over sequential Dijkstra execution times.
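The abstract does not describe the parallel search itself, so the sketch below shows only the conventional sequential Dijkstra baseline it is compared against. It assumes a hypothetical adjacency-list representation (router name mapped to a list of (neighbour, link cost) pairs) and uses Python's `heapq` priority queue:

```python
import heapq


def dijkstra(graph, source):
    """Conventional sequential Dijkstra: shortest-path cost from source to each reachable router.

    graph maps each router to a list of (neighbour, link_cost) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry from an earlier, longer path
        visited.add(u)  # exactly one router is settled per iteration
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical four-router topology with OSPF-style link costs.
topology = {
    "R1": [("R2", 10), ("R3", 5)],
    "R2": [("R4", 1)],
    "R3": [("R2", 3), ("R4", 9)],
    "R4": [],
}
print(dijkstra(topology, "R1"))  # {'R1': 0, 'R2': 8, 'R3': 5, 'R4': 9}
```

Because this baseline settles one router per iteration, its iteration count grows with the number of routers in the area; the abstract's approach of finding multiple shortest links simultaneously targets exactly that sequential bottleneck.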
Long-range interactions & parallel scalability in molecular simulations
Michael Patra,Marja T. Hyvonen,Emma Falck,Mohsen Sabouri-Ghomi,Ilpo Vattulainen,Mikko Karttunen
Physics , 2004,
Abstract: Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modelling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium 4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks, namely Infiniband, Gigabit Ethernet, Fast Ethernet, and nearly uniform memory architecture, the last being a setup in which CPUs communicate by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald (PME) method performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512 and 2048 lipid molecules were used as test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, for both serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Parallel Knowledge Embedding with MapReduce on a Multi-core Processor
Miao Fan,Qiang Zhou,Thomas Fang Zheng,Ralph Grishman
Computer Science , 2015,
Abstract: This article is a first attempt to explore parallel algorithms for learning distributed representations of both entities and relations in large-scale knowledge repositories using the MapReduce programming model on a multi-core processor. We accelerate the training process of a canonical knowledge embedding method, the translating embedding (TransE) model, by dividing a whole knowledge repository into several balanced subsets and feeding each subset to an individual core, where local embeddings are updated concurrently during the Map phase. However, this usually yields inconsistent low-dimensional vector representations of the same key collected from different Map workers, which in turn leads to conflicts when the Reduce phase merges the various vectors associated with the same key. We therefore try several strategies for obtaining merged embeddings that not only retain the performance on entity inference, relation prediction, and even triplet classification achieved by single-thread TransE on several well-known knowledge bases such as Freebase and NELL, but also scale up the learning speed with the number of cores in a processor. So far, the empirical studies show that we can achieve results comparable to single-thread TransE trained with the stochastic gradient descent (SGD) algorithm, while increasing the training speed multiple times by adapting the batch gradient descent (BGD) algorithm to the MapReduce paradigm.
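The abstract says several merge strategies were tried but does not name them. As a minimal sketch, assuming the simplest strategy (averaging all vectors emitted for the same key) and hypothetical two-dimensional embeddings, the Reduce step and the TransE dissimilarity d(h, r, t) = ||h + r − t|| (L1 norm here) can be written as:

```python
from collections import defaultdict


def transe_distance(h, r, t):
    """TransE dissimilarity with the L1 norm: lower means the triple (h, r, t) is more plausible."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def reduce_by_average(mapped_pairs):
    """Reduce step: merge all vectors emitted for the same key by element-wise averaging.

    Averaging is an assumed strategy, not necessarily one used in the paper.
    """
    sums = {}
    counts = defaultdict(int)
    for key, vec in mapped_pairs:
        if key not in sums:
            sums[key] = [0.0] * len(vec)
        sums[key] = [s + v for s, v in zip(sums[key], vec)]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


# Hypothetical output of two Map workers that both updated the entity "Paris".
worker_a = [("Paris", [0.2, 0.4]), ("France", [0.5, 0.1])]
worker_b = [("Paris", [0.4, 0.0]), ("capital_of", [0.3, -0.3])]
merged = reduce_by_average(worker_a + worker_b)

print(merged["Paris"])  # roughly [0.3, 0.2]; floating-point rounding may show extra digits
print(transe_distance(merged["Paris"], merged["capital_of"], merged["France"]))
```

Averaging is cheap and order-independent, which suits a Reduce phase, but it can blur vectors that different workers pushed in different directions; that is the inconsistency the abstract describes and the reason alternative merge strategies are worth comparing.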
Copyright © 2008-2017 Open Access Library. All rights reserved.