This paper describes an embedded FFT processor where the higher radices butterflies maintain one complex multiplier in its critical path. Based on the concept of a radix-r fast Fourier factorization and based on the FFT parallel processing, we introduce a new concept of a radix-r Fast Fourier Transform in which the concept of the radix-r butterfly computation has been formulated as the combination of radix-2α/4β butterflies implemented in parallel. By doing so, the VLSI butterfly implementation for higher radices would be feasible since it maintains approximately the same complexity of the radix-2/4 butterfly which is obtained by block building of the radix-2/4 modules. The block building process is achieved by duplicating the block circuit diagram of the radix-2/4 module that is materialized by means of a feed-back network which will reuse the block circuit diagram of the radix-2/4 module. 1. Introduction For the past decades, the main concern of the researchers was to develop a fast Fourier transform (FFT) algorithm in which the number of operations required is minimized. Since Cooley and Tukey presented their approach showing that the number of multiplications required to compute the discrete Fourier transform (DFT) of a sequence may be considerably reduced by using one of the fast Fourier transform (FFT) algorithms [1], interest has arisen both in finding applications for this powerful transform and for considering various FFT software and hardware implementations. The DFT computational complexity increases according to the square of the transform length and thus becomes expensive for large . Some algorithms used for efficient DFT computation, known as fast DFT computation algorithms, are based on the divide-and-conquer approach. The principle of this method is that a large problem is divided into smaller subproblems that are easier to solve. In the FFT case, dividing the work into subproblems means that the input data can be divided into subsets from which the DFT is computed, and then the DFT of the initial data is reconstructed from these intermediate results. Some of these methods are known as the Cooley-Tukey algorithm [1], split-radix algorithm [2], Winograd Fourier transform algorithm (WFTA) [3], and others, such as the common factor algorithms [4]. The problem with the computation of an FFT with an increasing is associated with the straightforward computational structure, the coefficient multiplier memories’ accesses, and the number of multiplications that should be performed. The overall arithmetic operations deployed in the computation of
References
[1]
W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, pp. 297–301, 1965.
[2]
P. Duhamel and H. Hollmann, “Split radix FFT algorithm,” Electronics Letters, vol. 20, no. 1, pp. 14–16, 1984.
[3]
S. Winograd, “On computing the discrete Fourier transform,” Proceedings of the National Academy of Sciences of the United States of America, vol. 73, no. 4, pp. 1005–1006, 1976.
[4]
T. Widhe, Efficient Implementation of FFT Processing Elements, Linkoping Studies in Science and Technology no. 619, Linkoping University, Link?ping, Sweden, 1997.
[5]
T. Widhe, J. Melander, and L. Wanhammar, “Design of efficient radix-8 butterfly PEs for VLSI,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '97), pp. 2084–2087, June 1997.
[6]
J. Melander, T. Widhe, K. Palmkvist, M. Vesterbacka, and L. Wanhammar, “An FFT processor based on the SIC architecture with asynchronous PE,” in Proceedings of the IEEE 39th Midwest Symposium on Circuits and Systems, vol. 3, pp. 1313–1316, Ames, Iowa, USA, August 1996.
[7]
M. Jaber and D. Massicotte, “The self-sorting JMFFT algorithm eliminating trivial multiplication and suitable for embedded DSP processor,” in Proceedings of the 10th IEEE International NEWAS Conference, Montreal, Canada, June 2012.
[8]
M. Jaber, “Butterfly processing element for efficient fast Fourier transform method and apparatus,” US Patent No. 6, 751, 643, 2004.
[9]
M. A. Jaber and D. Massicotte, “A new FFT concept for efficient VLSI implementation: part I—butterfly processing element,” in 16th International Conference on Digital Signal Processing (DSP '09), pp. 1–6, Santorini, Greece, July 2009.
[10]
Y. Wang, Y. Tang, Y. Jiang, J. Chung, S. Song, and M. Lim, “Novel memory reference reduction methods for FFT implementations on DSP processors,” IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 2338–2349, 2007.
[11]
S. He and M. Torkelson, “Design and implementation of a 1024-point pipeline FFT processor,” in Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 131–134, May 1998.
[12]
S. He and M. Torkelson, “New approach to pipeline FFT processor,” in Proceedings of the 10th International Parallel Processing Symposium (IPPS '96), pp. 766–770, April 1996.
[13]
E. E. Swartzlander, W. K. W. Young, and S. J. Joseph, “A radix 4 delay commutator for fast Fourier transform processor implementation,” IEEE Journal of Solid-State Circuits, vol. 19, no. 5, pp. 702–709, 1984.
[14]
J. H. McClellan and R. J. Purdy, Applications of Digital Signal Processing, Applications of Digital Signal Processing to Radar, chapter 5, Prentice Hall, New York, NY, USA, 1978.
[15]
H. Liu and H. Lee, “A high performance four-parallel 128/64-point radix-24 FFT/IFFT processor for MIMO-OFDM systems,” in Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems (APCCAS '08), pp. 834–837, Macao, China, December 2008.
[16]
S.-I. Cho, K.-M. Kong, and S.-S. Choi, “Implementation of 128-point fast fourier transform processor for UWB systems,” in Proceedings of the International Wireless Communications and Mobile Computing Conference (IWCMC '08), pp. 210–213, Crete Island, Greece, August 2008.
[17]
M. Garrido, J. Grajal, M. A. Sanchez, and O. Gustafsson, “Pipelined radix-2k feedforward FFT architectures,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 1, pp. 23–32, 2013.
[18]
J. A. Johnston, “Parallel pipeline fast Fourier transformer,” IEE Proceedings F: Communications, Radar and Signal Processing, vol. 130, no. 6, pp. 564–572, 1983.
[19]
E. H. Wold and A. M. Despain, “Pipeline and parallel-pipeline FFT processors for VLSI implementations,” IEEE Transactions on Computers, vol. 33, no. 5, pp. 414–426, 1984.
[20]
S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, “A 2.4-GS/s FFT processor for OFDM-based WPAN applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 57, no. 6, pp. 451–455, 2010.
[21]
M. Jaber, “Parallel multiprocessing for the fast Fourier transform with pipeline architecture,” US Patent No. 6, 792, 441.
[22]
M. A. Jaber and D. Massicotte, “A new FFT concept for efficient VLSI implementation: part II—parallel pipelined processing,” in Proceedings of the 16th International Conference on Digital Signal Processing (DSP 2009), pp. 1–5, Santorini, Greece, July 2009.
[23]
M. Jaber, “Fourier transform processor,” US Patent No. 7, 761, 495.
[24]
M. Jaber, “Address generator for the fast Fourier transform processor,” US-6, 993, 547 82 and European Patent Application Serial no: PCT/USOI /07602.
[25]
E. J. Kim and M. H. Sunwoo, “High speed eight-parallel mixed-radix FFT processor for OFDM systems,” in Proceedings of the IEEE International Symposium of Circuits and Systems (ISCAS '11), pp. 1684–1687, Rio de Janeiro, Brazil, May 2011.
[26]
P. Li and W. Dong, “Computation oriented parallel FFT algorithms on distributed computer,” in Proceedings of the 3rd International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '10), pp. 369–373, Dalian, China, December 2010.
[27]
D. Takahashi, A. Uno, and M. Yokokawa, “An implementation of parallel 1-D FFT on the K computer,” in Proceedings of the IEEE International Conference on High Performance Computing and Communication, pp. 344–350, Liverpool, UK, June 2012.
[28]
R. M. Piedra, “Parallel 1-D FFT implementation with TMS320C4x DSPs,” Texas Instruments SPRA108, Digital Signal Processing Semiconductor Group, 1994.
[29]
http://www.fftw.org/.
[30]
V. Petrov, “MKL FFT performance—comparison of local and distributed-memory implementations,” Intel Report, 2012, http://software.intel.com/en-us/node/165305?wapkw=fft.
[31]
V. I. Kelefouras, G. S. Athanasiou, N. Alachiotis, H. E. Michail, A. S. Kritikakou, and C. E. Goutis, “A methodology for speeding up fast fourier transform focusing on memory architecture utilization,” IEEE Transactions on Signal Processing, vol. 59, no. 12, pp. 6217–6226, 2011.