全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

A Streaming High-Throughput Linear Sorter System with Contention Buffering

DOI: 10.1155/2011/963539

Full-Text   Cite this paper   Add to My Lib

Abstract:

Popular sorting algorithms do not translate well into hardware implementations. Instead, hardware-based solutions like sorting networks, systolic sorters, and linear sorters exploit parallelism to increase sorting efficiency. Linear sorters, built from identical nodes with simple control, have less area and latency than sorting networks, but they are limited in their throughput. We present a system composed of multiple linear sorters acting in parallel to increase overall throughput. Interleaving is used to increase bandwidth and allow sorting of multiple values per clock cycle, and the amount of interleaving and depth of the linear sorters can be adapted to suit specific applications. Contention for available linear sorters in the system is solved through the use of buffers that accumulate conflicting requests, dispatching them in bulk to reduce latency penalties. Implementation of this system into a field programmable gate array (FPGA) results in a speedup of 68 compared to a MicroBlaze processor running quicksort. 1. Introduction Sorting is an essential function for many scientific and data processing applications. Extensive research has optimized multiple software sorting algorithms for general-purpose computing, thereby increasing application performance. The need for higher performance has also motivated the migration of sorting algorithms into specialized hardware to exploit spatial parallelism. Nonetheless, many of the assumptions made to increase performance on a general-purpose processor do not hold for custom hardware implementations. Thus, reconfigurable applications typically do not enjoy the benefits of software-based sorting algorithms. When directly translated into hardware, software algorithms can quickly degrade into a series of data retrievals, comparisons, swaps, and writes; all problems that can be magnified in systems with low processor speeds, limited storage, disabled caches, and high-latency memory access times. A conventional sorting system involves data acquisition and collection, processing, dynamic and long-term storage, and sorting and dispatch. Many hardware approaches use linear sorting to keep a sorted list with in-order insertions but fail to optimize throughput, the rate at which data elements are processed. The system’s sorting throughput is limited by the weakest link, which in many cases is the sorting stage. Even though parallelism can speed up hardware-based sorting, sorted results are generally attained only one at a time. Thus, a hardware-based linear sorter would only achieve a maximum throughput of one sorted

References

[1]  K. Batcher, “Sorting networks and their applications,” in Proceedings of the AFIPS Spring Joint Computer Conference, vol. 32, pp. 307–314, ACM, 1968.
[2]  E. W. Dijkstra, “A heuristic explanation of Batcher's Baffler,” Science of Computer Programming, vol. 9, no. 3, pp. 213–220, 1987.
[3]  J. Ortiz and D. Andrews, “A configurable high-throughput linear sorter system,” in Proceedings of the 7th IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum (IPDPSW '10), Atlanta, Ga, USA, 2010.
[4]  D. E. Knuth, The Art of Computer Programming 3. Sorting and Searching, Addison-Wesley Longman, Amsterdam, The Netherlands, 2nd edition, 1998.
[5]  M. de Prycker, Asynchronous Transfer Mode: Solution for Broadband ISDN, Ellis Horwood, Upper Saddle River, NJ, USA, 1991.
[6]  R. O. Onvural, Asynchronous Transfer Mode Networks: Performance Issues, Artech House, Norwood, Ma, USA, 2nd edition, 1995.
[7]  R. Kannan, “A pipelined single-bit controlled sorting network with O(Nlog2 N) bit complexity,” in Proceedings of the 16th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '97), vol. 1, pp. 253–260, April 1997.
[8]  J. Martínez, R. Cumplido, and C. Feregrino, “An FPGA-based parallel sorting architecture for the Burrows wheeler transform,” in Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig '05), pp. 7–13, September 2005.
[9]  N. Tabrizi and N. Bagherzadeh, “An ASIC design of a novel pipelined and parallel sorting accelerator for a multiprocessor-on-a-chip,” in Proceedings of the 6th International Conference on ASIC (ASICON '05), vol. 1, pp. 46–49, October 2005.
[10]  D. Castells-Rufas, M. Monton, L. Ribas, and J. Carrabina, “High performance parallel linear sorter core design,” GSPx, The International Embedded Solutions Event, September 2004.
[11]  C. Leiserson, “Systolic priority queues,” in Proceedings of the Conference on Very Large Scale Integration: Architecture, Design, Fabrication, pp. 199–214, 1979.
[12]  B. Parhami and D. M. Kwai, “Data-driven control scheme for linear arrays: application to a stable insertion sorter,” IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 1, pp. 23–28, 1999.
[13]  Y. Zhang and S. Q. Zheng, “Design and analysis of a systolic sorting architecture,” in Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing, pp. 652–659, October 1995.
[14]  M. Bednara, O. Beyer, J. Teich, and R. Wanka, “Hardware-supported sorting: design and Tradeoff analysis,” in System Design Automation: Fundamentals, Principles, Methods, Examples, p. 97, 1979.
[15]  C.-S. Lin and B.-D. Liu, “Design of a pipelined and expandable sorting architecture with simple control scheme,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '02), vol. 4, pp. 217–220, 2002.
[16]  R. Perez-Andrade, R. Cumplido, F. M. Del Campo, and C. Feregrino-Uribe, “A versatile linear insertion sorter based on a FIFO scheme,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI: Trends in VLSI Technology and Design (ISVLSI '08), pp. 357–362, April 2008.
[17]  C. Y. Lee and J. M. Tsai, “A shift register architecture for high-speed data sorting,” Journal of VLSI Signal Processing, vol. 11, no. 3, pp. 273–280, 1995.
[18]  A. A. Colavita, A. Cicuttin, F. Fratnik, and G. Capello, “SORTCHIP: a VLSI implementation of a hardware algorithm for continuous data sorting,” IEEE Journal of Solid-State Circuits, vol. 38, no. 6, pp. 1076–1079, 2003.
[19]  R. Marcelino, H. Neto, and J. Cardoso, “Sorting units for FPGA-based embedded systems,” in Proceedings of the IFIP 20th World Computer Congress, TC10 Working Conference on Distributed and Parallel Embedded Systems (DIPES '08), vol. 271, pp. 11–22, Springer, Milano, Italy, September 2008.
[20]  L. Ribas, D. Castells, and J. Carrabina, “A linear sorter core based on a programmable register file,” in Proceedings of the 19th Conference on Design of Circuits and Integrated Systems (DCIS '04), pp. 635–640, 2004.
[21]  J. Ortiz, “A reconfigurable superscalar processor architecture for FPGA-based designs,” in Proceedings of the International Conference on Computer Design (CDES '09), pp. 211–217, July 2009.
[22]  J. Ortiz, Synthesis techniques for semi-custom dynamically reconfigurable superscalar processors, Ph.D. dissertation, University of Kansas, Department of Electrical Engineering and Computer Science, 2009.

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133