Search Results: 1 - 10 of 208294 matches for "L. Tosoratto"
All listed articles are free for downloading (OA Articles)
Many-core applications to online track reconstruction in HEP experiments
S. Amerio, D. Bastieri, M. Corvo, A. Gianelle, W. Ketchum, T. Liu, A. Lonardo, D. Lucchesi, S. Poprocki, R. Rivera, L. Tosoratto, P. Vicini, P. Wittich
Computer Science, 2013, DOI: 10.1088/1742-6596/513/1/012002
Abstract: Interest in parallel architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphics Processing Units (GPUs) and the Intel Many Integrated Core (MIC) architecture when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark we use a scaled-up version of the algorithm used at the CDF experiment at the Tevatron for online track reconstruction - the SVT algorithm - as a realistic test case for low-latency trigger systems using new computing architectures for LHC experiments. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the parallel devices.
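To make the porting pattern concrete, here is a minimal CUDA sketch of the data-parallel shape such an online track fit can take: one thread per hit combination, each performing a small linearized fit and a chi-square cut. The data layout, the four-layer straight-line fit, and all names are illustrative assumptions, not the actual SVT code.

// Illustrative CUDA sketch (not the actual SVT code): one thread per
// hit combination, each doing a small linearized fit and a chi2 cut.
#include <cuda_runtime.h>

struct HitCombo { float r[4]; float phi[4]; };  // one hit per silicon layer

__global__ void fitTracks(const HitCombo* combos, int n,
                          float chi2Cut, int* accepted) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Least-squares fit of phi = phi0 + c*r over 4 layers
    // (assumes distinct layer radii, so the determinant is non-zero).
    float sr = 0.f, sp = 0.f, srr = 0.f, srp = 0.f;
    for (int l = 0; l < 4; ++l) {
        sr  += combos[i].r[l];
        sp  += combos[i].phi[l];
        srr += combos[i].r[l] * combos[i].r[l];
        srp += combos[i].r[l] * combos[i].phi[l];
    }
    float det  = 4.f * srr - sr * sr;
    float c    = (4.f * srp - sr * sp) / det;  // slope ~ track curvature
    float phi0 = (sp - c * sr) / 4.f;          // azimuth at the origin
    float chi2 = 0.f;
    for (int l = 0; l < 4; ++l) {
        float res = combos[i].phi[l] - (phi0 + c * combos[i].r[l]);
        chi2 += res * res;
    }
    accepted[i] = (chi2 < chi2Cut) ? 1 : 0;    // trigger decision per combo
}

int main() {
    const int n = 1024;
    HitCombo* combos; int* accepted;
    cudaMalloc((void**)&combos, n * sizeof(HitCombo));
    cudaMalloc((void**)&accepted, n * sizeof(int));
    // ... fill combos from detector data (omitted) ...
    fitTracks<<<(n + 255) / 256, 256>>>(combos, n, 0.01f, accepted);
    cudaDeviceSynchronize();
    cudaFree(combos); cudaFree(accepted);
    return 0;
}

The appeal for online selection is that each combination is fitted independently, so the serial inner loop of the original algorithm maps directly onto one GPU thread.
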
Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments
A. Gianelle, S. Amerio, D. Bastieri, M. Corvo, W. Ketchum, T. Liu, A. Lonardo, D. Lucchesi, S. Poprocki, R. Rivera, L. Tosoratto, P. Vicini, P. Wittich
Computer Science, 2013, DOI: 10.1109/NSSMIC.2013.6829552
Abstract: Interest in many-core architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark we use a scaled-up version of the algorithm used at the CDF experiment at the Tevatron for online track reconstruction - the SVT algorithm - as a realistic test case for low-latency trigger systems using new computing architectures for LHC experiments. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. We measure the performance of different architectures (Intel Xeon Phi and AMD GPUs, in addition to NVIDIA GPUs) and different software environments (OpenCL, in addition to NVIDIA CUDA). Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the many-core devices.
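The transfer-versus-compute breakdown that such measurements require can be reproduced in miniature with CUDA events. The sketch below, with a placeholder kernel and buffer size, times the host-to-device copy, the kernel, and the device-to-host copy separately.

// CUDA-event sketch separating transfer and compute latencies; the
// kernel and buffer size are placeholders, not the paper's benchmark.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void process(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;   // stand-in for the event-selection work
}

int main() {
    const int n = 1 << 20;
    float *h, *d;
    cudaMallocHost((void**)&h, n * sizeof(float));  // pinned => real DMA rates
    cudaMalloc((void**)&d, n * sizeof(float));
    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    process<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(t2);
    cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float h2d, krn, d2h;
    cudaEventElapsedTime(&h2d, t0, t1);  // input transfer (ms)
    cudaEventElapsedTime(&krn, t1, t2);  // processing (ms)
    cudaEventElapsedTime(&d2h, t2, t3);  // output transfer (ms)
    printf("H2D %.3f ms | kernel %.3f ms | D2H %.3f ms\n", h2d, krn, d2h);

    cudaFreeHost(h); cudaFree(d);
    return 0;
}
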
NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs
R. Ammendola, A. Biagioni, O. Frezza, G. Lamanna, A. Lonardo, F. Lo Cicero, P. S. Paolucci, F. Pantaleo, D. Rossetti, F. Simula, M. Sozzi, L. Tosoratto, P. Vicini
Computer Science, 2013, DOI: 10.1088/1748-0221/9/02/C02023
Abstract: NaNet is an FPGA-based PCIe x8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet modular hardware architecture. Latency and bandwidth benchmarks for the GbE and APElink channels are presented, followed by a performance analysis of the case study of the GPU-based low-level trigger for the RICH detector in the NA62 experiment at CERN, using either the NaNet GbE or APElink channel. Finally, we give an outline of the project's future activities.
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
A. Lonardo, F. Ameli, R. Ammendola, A. Biagioni, O. Frezza, G. Lamanna, F. Lo Cicero, M. Martinelli, P. S. Paolucci, E. Pastorelli, L. Pontisso, D. Rossetti, F. Simeone, F. Simula, M. Sozzi, L. Tosoratto, P. Vicini
Computer Science, 2014
Abstract: While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic latency in executing computational kernels once data is available in their internal memories, assessing the real-time features of a standard GPGPU system requires careful characterization of all subsystems along the data stream path. The networking subsystem proves to be the most critical one in terms of both the absolute value and the fluctuations of its response latency. Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network Interface Card (NIC) design featuring a configurable and extensible set of network channels with direct access, through GPUDirect, to NVIDIA Fermi/Kepler GPU memories. The NaNet design currently supports both standard channels - GbE (1000BASE-T) and 10GbE (10GBASE-R) - and custom ones - the 34 Gbps APElink and the 2.5 Gbps deterministic-latency KM3link - but its modularity allows for straightforward inclusion of other link technologies. To avoid host OS intervention on the data stream and to remove a possible source of jitter, the design includes a network/transport layer offload module with cycle-accurate, upper-bounded latency, supporting the UDP, KM3link Time Division Multiplexing and APElink protocols. After a description of the NaNet architecture and its latency/bandwidth characterization for all supported links, two real-world use cases are presented: the GPU-based low-level trigger for the RICH detector in the NA62 experiment at CERN and the on-/off-shore data link for the KM3 underwater neutrino telescope.
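As a sketch of what removing host OS intervention from the data stream buys, the following hypothetical host-side flow registers a GPU-resident buffer with the NIC so that received payloads land directly in device memory and kernels launch with no intermediate copy. The nanet_* functions are invented stubs standing in for a driver interface the abstract does not spell out.

// Hypothetical sketch of a NaNet-style zero-copy receive path: a
// GPU-resident buffer is registered with the NIC, which then DMAs
// payloads straight into device memory (GPUDirect), bypassing host
// memory and the OS network stack. The nanet_* functions below are
// illustrative stubs, not the real driver API.
#include <cuda_runtime.h>
#include <cstddef>

// --- stubs standing in for the (assumed) NaNet driver interface ---
static int nanet_register_gpu_buffer(void*, size_t) { return 0; }
static int nanet_wait_receive(void** data, size_t* len) {
    *data = nullptr; *len = 0; return -1;  // stub: no hardware present
}

__global__ void triggerKernel(const char* payload, size_t len) {
    // low-level trigger decision on the event fragment (omitted)
    (void)payload; (void)len;
}

int main() {
    const size_t kBufBytes = 1 << 22;
    void* devBuf = nullptr;
    cudaMalloc(&devBuf, kBufBytes);               // receive buffer in GPU memory
    nanet_register_gpu_buffer(devBuf, kBufBytes); // expose it to the NIC for DMA

    void* data; size_t len;
    while (nanet_wait_receive(&data, &len) == 0) {
        // Data is already in GPU memory: launch directly, no cudaMemcpy.
        triggerKernel<<<1, 256>>>(static_cast<const char*>(data), len);
    }
    cudaFree(devBuf);
    return 0;
}
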
High-speed data transfer with FPGAs and QSFP+ modules
R. Ammendola, A. Biagioni, G. Chiodi, O. Frezza, F. Lo Cicero, A. Lonardo, R. Lunadei, P. S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto, P. Vicini
Physics, 2011, DOI: 10.1088/1748-0221/5/12/C12019
Abstract: We present test results and the characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ (Quad Small Form-factor Pluggable Plus) module. The QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies with an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.
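For scale: a QSFP+ module aggregates four lanes, so the quoted 40 Gbps ceiling is 4 x 10 Gbps raw. The snippet below derates a lane by an assumed 8b/10b line code (a common choice at these transceiver rates, not stated in the abstract) to estimate payload bandwidth for the testbench configuration.

// Back-of-the-envelope link budget for the testbench: per-lane raw
// rate times lane count, derated by the line code. The 8b/10b factor
// is an assumption (typical at these rates), not taken from the paper.
#include <cstdio>

int main() {
    const double laneGbps = 8.5;       // transceiver rate on the Stratix IV
    const int lanesPerModule = 4;      // QSFP+ aggregates four lanes
    const int modules = 3;             // the mezzanine hosts three modules
    const double coding = 8.0 / 10.0;  // assumed 8b/10b encoding overhead
    double perModule = laneGbps * lanesPerModule * coding;
    std::printf("payload: %.1f Gbps per module, %.1f Gbps aggregate\n",
                perModule, perModule * modules);
    return 0;
}
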
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Andrea Salamon, Gaetano Salina, Francesco Simula, Laura Tosoratto, Piero Vicini
Physics, 2011, DOI: 10.1088/1742-6596/331/5/052029
Abstract: We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology, plus hardware support for an RDMA programming model and experimental acceleration of GPU networking. This design allows us to build a low-latency, high-bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective cluster network architecture, scalable to tens of thousands of nodes. Some test results and a characterization of data transmission on a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.
Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems
Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Pier Stanislao Paolucci, Alessandro Lonardo, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini
Physics, 2013, DOI: 10.1088/1742-6596/513/5/052002
Abstract: Modern Graphics Processing Units (GPUs) are now considered accelerators for general-purpose computation. A tight interaction between the GPU and the interconnection network is the strategy for expressing the full capability-computing potential of a multi-GPU system on large HPC clusters; this is why an efficient and scalable interconnect is a key technology for finally delivering GPUs to scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links offering 34 Gbps of raw bandwidth per direction, and x8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013, focusing on the adoption of latest-generation 28 nm FPGAs and the preliminary tests performed on this new platform.
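A hypothetical sketch of how the RDMA/P2P combination looks from the host: the application asks the NIC to PUT a GPU-resident buffer to a neighbour on the 3D torus, and the NIC fetches the data over PCIe peer-to-peer so it never transits host memory. The apenet_* call, the coordinate type, and the address exchange are illustrative placeholders, not the actual APEnet+ API.

// Hypothetical host-side view of an APEnet+-style RDMA PUT: the NIC
// reads the GPU buffer via PCIe peer-to-peer and writes it into a
// pre-registered buffer on a torus neighbour. apenet_rdma_put and
// TorusCoord are invented stand-ins, not the real API.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstddef>

struct TorusCoord { int x, y, z; };  // destination node on the 3D torus

// Stub standing in for the assumed NIC driver call.
static int apenet_rdma_put(TorusCoord dst, const void* src_dev,
                           uint64_t remote_vaddr, size_t bytes) {
    (void)dst; (void)src_dev; (void)remote_vaddr; (void)bytes;
    return 0;  // stub: no hardware present
}

int main() {
    const size_t kBytes = 1 << 20;
    void* devBuf = nullptr;
    cudaMalloc(&devBuf, kBytes);      // filled by a local GPU kernel (omitted)

    // Zero-copy send: the NIC DMAs devBuf directly over PCIe P2P, so
    // the payload never passes through host memory.
    TorusCoord xPlus = {1, 0, 0};     // +x neighbour on the torus
    uint64_t remoteAddr = 0;          // from a prior buffer-registration handshake
    apenet_rdma_put(xPlus, devBuf, remoteAddr, kBytes);

    cudaFree(devBuf);
    return 0;
}
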
The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture
Andrea Biagioni, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Mersia Perra, Davide Rossetti, Carlo Sidore, Francesco Simula, Laura Tosoratto, Piero Vicini
Computer Science, 2012
Abstract: One of the most demanding challenges for designers of parallel computing architectures is to deliver an efficient network infrastructure providing low-latency, high-bandwidth communications while preserving scalability. Besides off-chip communications between processors, recent multi-tile (i.e. multi-core) architectures face the challenge of providing an efficient on-chip interconnection network between processor tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA-style API, over a multi-dimensional direct network with a (possibly) hybrid topology.
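The uniform RDMA-style API can be pictured as a single put() whose destination coordinate decides whether the DNP routes over the network-on-chip or the off-chip direct network. The toy sketch below illustrates that idea under stated assumptions; the names and types are invented, not the actual DNP library.

// Toy illustration of a uniform put(): the same entry point serves
// on-chip (intra-node) and off-chip traffic, selected by destination.
// dnp_put and TileId are invented stand-ins for the real DNP API.
#include <cstdio>
#include <cstring>
#include <cstddef>

struct TileId { int node; int tile; };   // (node, tile) addresses any endpoint

static const int kLocalNode = 0;         // id of the node we are running on

void dnp_put(TileId dst, const void* src, void* dst_addr, size_t bytes) {
    if (dst.node == kLocalNode) {
        // On-chip: the DNP would move data over the network-on-chip.
        std::memcpy(dst_addr, src, bytes);  // stand-in for the NoC transfer
    } else {
        // Off-chip: identical semantics, routed over the direct network.
        std::printf("PUT %zu bytes -> node %d, tile %d\n",
                    bytes, dst.node, dst.tile);
    }
}

int main() {
    char src[16] = "hello tile", onChipDst[16] = {0};
    dnp_put({kLocalNode, 3}, src, onChipDst, sizeof src);  // intra-node hop
    dnp_put({5, 0}, src, nullptr, sizeof src);             // remote node
    std::printf("on-chip copy got: %s\n", onChipDst);
    return 0;
}
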
'Mutual Watch-dog Networking': Distributed Awareness of Faults and Critical Events in Petascale/Exascale systems
Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini
Computer Science, 2013
Abstract: Many-tile systems require techniques to increase component resilience and control the FIT (Failures In Time) rate. When scaling to peta-/exa-scale systems, the FIT rate may become unacceptable due to the sheer number of components, requiring more systemic countermeasures. Thus, the ability to be fault-aware, i.e. to detect and collect information about faults and critical events, is a necessary feature that large-scale distributed architectures must provide in order to apply systemic fault tolerance techniques. In this context, the LO|FA|MO approach is a way to obtain systemic fault awareness by implementing a mutual watchdog mechanism and guaranteeing fault detection in a no-single-point-of-failure fashion. This document contains specification and implementation details of this approach, in the form of a technical report.
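A minimal sketch of the mutual watchdog idea, assuming the LO|FA|MO pattern of two peers each publishing a heartbeat counter and watching the other's: if a counter stops advancing within a timeout, the watcher raises a fault. Threads, counters and timings here are illustrative stand-ins for the peers of the real design, not its implementation.

// Minimal mutual-watchdog sketch: each peer increments its own
// heartbeat and watches the other's; a stalled counter is flagged as
// a fault. All names and timings are illustrative assumptions.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

std::atomic<uint64_t> heartbeat[2];  // one counter per peer

void peer(int self, int other, bool alive) {
    uint64_t lastSeen = 0;
    auto lastChange = std::chrono::steady_clock::now();
    for (int tick = 0; tick < 50; ++tick) {
        if (alive) heartbeat[self]++;            // publish own liveness
        uint64_t seen = heartbeat[other].load(); // watch the other peer
        auto now = std::chrono::steady_clock::now();
        if (seen != lastSeen) { lastSeen = seen; lastChange = now; }
        else if (now - lastChange > std::chrono::milliseconds(100)) {
            std::printf("peer %d: peer %d missed heartbeats -> fault\n",
                        self, other);
            return;  // raise fault awareness, e.g. notify a supervisor
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::thread a(peer, 0, 1, true);
    std::thread b(peer, 1, 0, false);  // simulate a silent, faulty peer
    a.join(); b.join();
    return 0;
}

Because each peer watches the other, detection survives the failure of either one, which is the no-single-point-of-failure property the abstract names.
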
A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report
Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Werner Geurts, Gert Goossens, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini
Computer Science, 2013
Abstract: This is the second of a planned collection of four yearly volumes describing the deployment of a heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications. This volume covers several topics, among them: (1) a system for awareness of faults and critical events (named LO|FA|MO) on experimental heterogeneous many-core hardware platforms; (2) the integration and testing of the experimental heterogeneous many-core hardware platform QUonG, based on the APEnet+ custom interconnect; (3) the design of a software-programmable Distributed Network Processor (DNP) architecture using ASIP technology; (4) the initial stages of design of a new DNP generation on a 28 nm FPGA. These developments were performed in the framework of the EURETILE European project under Grant Agreement no. 247846.