您的当前位置：首页 An Efficient FFT Architecture for OFDM Communication Systems

An Efficient FFT Architecture for OFDM Communication Systems

来源：划驼旅游

2009 International Conference on Advances in Recent Technologies in Communication and Computing

An Efficient FFT Architecture for OFDM

Communication Systems

K.Harikrishna

Dept. of Electronics and Computer Engineering

Narayana Engineering College

Nellore, AP, India kamathamhk@yahoo.com

Abstract—This paper presents a high level implementation of a high performance FFT for Modulator and Demodulator in OFDM. The design has been coded in VHDL and targeted into Xilinx Virtex-E XCV3200E FPGAs. The hardware requirement comparison shows that our implementation outperforms other implementations of FFT on the same series of FPGAs. Radix-22algorithm has the same multiplicative complexity as the radix-4 algorithm, but retains the butterfly structure of radix-2 algorithm.

T. Rama Rao

Dept. of Telecommunication Engineering

SRM University

Kattankulattur, Chennai, TN, India

ramarao@ieee.org

Eqn (2) represents the first two stages of butterflies with only trivial multiplications in the SFG, as BFI and BFII. Full multipliers are required after the two butterflies in order to compute the product of the decomposed twiddle factor

in eqn (1). Note the order of the twiddle factors is

different from that of radix-4 algorithm.

Applying this CFA procedure recursively to the remaining DFTs of length N/4 in eqn (1), the complete radix-22

Keywords—Fast Fourier Transform (FFT), Orthogonal Decimation-in-frequency (DIF FFT) algorithm is obtained.

The corresponding FFT flow graph for N=16 is shown in Fig. Frequency Division Multiplexing (OFDM), VHDL, Radix-22

Algorithm. 1 where small diamonds represent trivial multiplication by

which involves only real-imaginary swapping and

I. INTRODUCTION sign inversion [3]. The FFT algorithm eliminates the redundant calculation

which is needed in computing Discrete Fourier Transform (DFT) and is thus very suitable for efficient hardware implementation [1]. In addition to computing efficient DFT, the FFT also finds applications in linear filtering, digital spectral analysis and correlation analysis, Ultra Wide Band (UWB) applications, etc. A hardware oriented radix-22algorithm [2] is developed by integrating a twiddle factor decomposition technique in divide and conquer approach to form a spatially regular Signal Flow Graph (SFG). Mapping the algorithm to the cascading delay feedback structure leads to the proposed architecture.

II. ARCHITECTURE AND DESIGN METHODOLOGY A. Radix-22 Decimation in Frequency FFT Algorithm He [3] applied a 3-dimensional linear index map, and Common factor algorithm (CFA) to derive a set of 4 DFTs of length N/4 as,

Fig 1. Radix-22 DIF FFT flow graph for N=16

B. Radix-22 FFT Architecure

Mapping radix-22 DIF FFT algorithm derived to the radix-2

SDF architecture, a new architecture of R22SDF approach is obtained [2]. Fig 2 outlines an implementation of the R22SDF architecture for N=1024, note the similarity of the data-path to R2SDF and the reduced number of multipliers. The implementation uses two types of butterflies; one identical to that in R2SDF, the other contains also the logic to implement the trivial twiddle factor multiplication, as shown in Fig 3 (i), (ii) respectively [2].

On first N/2 cycles, the 2-to-1 multiplexers in the first butterfly module switch to position “0”, and the butterfly is

X(k12k24k3)N41n3(k1[H(k1,k2,n3)WN2k2)]WNn3k3

4n30(1)

where H(k1,k2,k3) is expressed in eqn (2).

(2)

183

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on August 17,2010 at 16:52: UTC from IEEE Xplore. Restrictions apply.

FIG 2. R2SDF pipeline FFT architecture for N=1024

idle. The input data from left is directed to the shift registers

until they are filled. On next N/2 cycles, the multiplexers turn to position “1”, the butterfly computes a 2-point DFT with incoming data and the data stored in the shift registers.

The butterfly outputs Z1(n) and Z1(n + N/2) are computed according to the equations given in eqn (5). Z1(n) is sent to apply the twiddle factors, and Z1(n + N/2) is sent back to the shift registers to be “multiplied” in still next N/2 cycles when the first half of the next frame of time sequence is loaded in. The operation of the second butterfly is similar to that of the first one, except the “distance” of butterfly input sequence are just N/4 and the trivial twiddle factor multiplication has been implemented by real-imaginary swapping with a commutator and controlled add/subtract operations, as in Fig 3(i) (ii), which requires two bit control signal from the synchronizing counter.

III. IMPLEMENTATION IN VHDL

The R22SDF presented above has been fully coded in Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL). Once the design is coded in VHDL, the Modelsim XEIII 6.2c compiler [4] and the Xilinx Foundation ISA Environment 9.1i [5] generate a net-list for FPGA configuration. The net-list can then be downloaded into the FPGA using the same Xilinx tools and Texas Instruments prototyping board.

From the architecture of R22SDF in Fig 2, the butterfly blocks BF2I and BF2II are described as building blocks in VHDL code. Booth multiplication algorithm for signed binary numbers is used for complex multipliers. Thus, the overall latency of the real implementation varies as the processing word length changes [2]. Look-up-table (LUT) based Random Access Memories (RAMs) and Flip-Flops are used to implement feedback memory of the very last stages where are the RAM blocks in the FPGA are used for the rest of the stages. Similarly, LUT-based Read Only Memories (ROMs) are used to implement twiddle ROMs of the very last stages whereas Block RAMs are used for the rest of stages [3]. The FFT is heavily pipelined to achieve as highest clock frequency as possible. Twiddle factors are generated by an external program and embedded to the VHDL code.

The implementation results after implementing in Xilinx Virtex-E XCV3200E FPGA [6] are listed in Table 1(a), 1(b). Table 1(a) shows the implementation results where as table 1(b) shows the timing summary.

(i) BF2I (i) BF2I

The resulting figures show that our implementation

outperforms the other implementations of that kind. Its speed nearly matches that of the Xilinx core but its throughput is more than 3 times higher due to its pipeline nature.

TABLE 1(a)

IMPLEMENTATION RESULT

Logic Utilization No. of Slices No. of Slice Flip flops No. of 4 input LUTs No. of bonded IOBs No. of GCLKs

Used 21075 8295 40258 66 1

Available 32448 6 6 804 4

Utilization % 12% 62% 8% 25%

(ii) BF2II

Fig 3. Butterfly structure for R22SDF FFT

(ii) BF2II

Fig 3. Butterfly structure for R22SDF FFT

184

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on August 17,2010 at 16:52: UTC from IEEE Xplore. Restrictions apply.

TABLE 1(b) TIMING SUMMARY

Minimum period Minimum input arrival time

before clock

Maximum output required time after clock

17.110ns (Maximum Frequency: 58.445MHz)

11.517ns 6.744ns

The hardware requirement of proposed architecture as compared with various approaches is shown in Table 2, where not only the number of complex multipliers and adders but also the memory size is listed for comparison. For easy reading, base-4 logarithm is used whenever applicable. It shows that R22SDF has reached the minimum requirement for both multiplier and the storage, and only second to R4SDC for adder. This makes it an ideal architecture for VLSI implementation of pipeline FFT processors.

Further, the proposed architecture consumes the minimum required amount of multipliers, storage and finds extensive application in OFDM based Communication Systems.

ACKNOWLEDGEMENT

The authors are grateful for the support of the SRM University, Chennai to carry out this research work.

REFERENCES

[1] Mian Sijjad Minallah, Gulistan Raja. Real Time FFT Processor

Implementation. IEEE—ICET 2006 2nd International Conference on Emerging Technologies. Peshawar, Pakistan 13-14, Pages: 192-195, November 2006. [2] S Sukhsawas, K Benkrid. A High-level Implementation of a

High Performance Pipeline FFT on Virtex-E FPGAs. Proceedings of the IEEE Comp. Society Annual Symp. on VLSI Emerging Trends in Systems Design (ISVLSI’04). 0-7695-2097-9/2004.

IV. APPLICATIONS IN OFDM COMMUNICATION SYSTEMS In an OFDM system, the transmitter and receiver blocks contain the FFT modules. Our FFT architecture effectively fits into the system since it has a minimum required time period of 17.110ns (table 1(b)).

An OFDM carrier signal is the sum of a number of orthogonal sub-carriers, with baseband data on each sub-carrier being independently modulated commonly using some type of quadrature amplitude modulation (QAM) or phase-shift keying (PSK) [7]. This composite baseband signal is typically used to modulate a main RF carrier. s[n] is a serial stream of binary digits. By inverse multiplexing, these are first demultiplexed into N parallel streams, and each one mapped to a (possibly complex) symbol stream using some modulation constellation (QAM, PSK, etc.). Note that the constellations may be different, so some streams may carry a higher bit-rate than others.

The receiver picks up the signal r(t), which is then

quadrature-mixed down to baseband using cosine and sine [3] S. He and M. Torkelson. A new approach to pipeline FFT waves at the carrier frequency. This also creates signals processor, 10th Int. Parallel Processing Symp. (IPPS’96), p. centered on 2fc, so low-pass filters are used to reject these. The 766–770, 1996. baseband signals are then sampled and digitized using

[4] Modelsim manual. Mentor Graphics Corporation.

analogue-to-digital converters (ADCs), and a forward FFT is

http://support.xilinx.com.

used to convert back to the frequency domain [7].

[5] Xilinx, Inc. http://www.xilinx.com/.

V. CONCLUSIONS

In this paper, a hardware-oriented radix-22 algorithm is derived. Based on this algorithm, new, efficient pipeline FFT architecture, the R22SDF architecture, is put forward.

TABLE 2

HARDWARE REQUIREMENT COMPARISON

R2MDC [8] R2SDF [9] R4SDF [10] R4MDC [8] R4SDC [11] R2SDF [3]

[6] Xilinx, Inc. High-Performance 1024-Point Complex FFT/IFFT

V2.0. http://www.xilinx.com/ipcenter/. [7] http://www.wikipedia.org/.

[8] L.R. Rabiner and B. Gold. Theory and Application of Digital

Signal Processing. Prentice-Hall, Inc., 1975. [9] E.H. Wold and A.M. Despain. Pipeline and parallel-pipeline

FFT processors for VLSI implementation. IEEE Trans. Comput., C-33(5):414–426, May 1984. [10] A.M. Despain. Fourier transform computer using CORDIC

iterations. IEEE Trans. Comput., C-23(10):993–1001, Oct.1974.

[11] G. Bi and E. V. Jones. A pipelined FFT processor for word

Multiplier # 2(log4N-1) 2(log4N-1) log4N-1 3(log4N-1) log4N-1 log4N-1

Adder # 4log4N 4log4N 8log4N 8log4N 3log4N 4log4N

Memory size 3N/2 -2 N-1 N-1 5N/2 -4 2N-2 N-1

sequential data. IEEE Trans. Acoust., Speech, Signal Processing, 37(12):1982–1985, Dec. 19.

185

Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on August 17,2010 at 16:52: UTC from IEEE Xplore. Restrictions apply.

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文