# VLSI Architecture Of MIMO Antenna System Based On Square Root Algorithm

Anuj Shaw, Md. Azeem Hafeeez, Dr. S. Malarvizhi / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 4, July-August 2012, pp. 1900-1903

# Anuj Shaw<sup>1</sup>, Md. Azeem Hafeeez<sup>2</sup>, Dr. S. Malarvizhi<sup>3</sup>

<sup>1</sup> (Department of Electronics and Communication, SRM University, Chennai - 603203)

<sup>2</sup> (Department of Electronics and Communication, SRM University, Chennai - 603203)

<sup>3</sup> (Head of Department of Electronics and Communication, SRM University, Chennai - 603203)

## ABSTRACT

This paper proposes a VLSI architecture for the pseudo inverse of the augmented channel matrix used in MIMO systems and estimates its power consumption. MIMO technology involves highly complex signal processing at the receiver end, which directly increases power consumption. The VLSI architecture presented in the paper is based on the Square Root Decoder algorithm. A simulation-based analysis has been carried out to evaluate the QR decomposition for 2x2 MIMO systems. The pseudo inverse (PINV) module of the Square Root Decoder algorithm has been designed and simulated in the Xilinx System Generator tool. The total power consumed by the PINV module is 239 mW.

### Keywords: MIMO, SIMULINK, SQUARE ROOT DECODER, VBLAST, VLSI.

## 1. INTRODUCTION

Multiple-input multiple-output (MIMO) wireless communications technology has gained considerable attention in recent years due to its potential to significantly increase throughput compared to traditional single-input single-output (SISO) technology. In the modern era of communications, the ability to send large volumes of data is crucial. Weight and size are the bottlenecks of portable electronic systems, and they directly affect power and cost.

In CMOS, sources of power consumption include short-circuit currents, leakage currents and switching. The switching or dynamic power is described as:

$P = k C_L V^2 f \qquad (1)$

where k represents the switching activity factor, $C_L$ the total physical capacitance, V the supply voltage and f the frequency of operation. Algorithmic power optimization includes reduction of both the physical capacitance and the switching activity factor. Physical capacitance can be reduced by reducing the area of the hardware through efficient implementation [5]. Switching activity reduction comes either from area reduction, which reduces the number of nodes, or from reducing the switching frequency of the nodes. One of the algorithmic optimizations is removing redundancy [6] from a design. By removing redundant operations or hardware, unnecessary switching of the clock as well as other signals can be avoided, thereby saving power.
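As a rough numerical illustration of equation (1), the following sketch evaluates the dynamic power for a set of hypothetical parameter values. The activity factor, capacitance, voltage and frequency used here are assumptions chosen only for illustration and are not figures from this design.

```python
def dynamic_power(k, c_load, v_supply, freq):
    """Switching power P = k * C_L * V^2 * f, in watts."""
    return k * c_load * v_supply ** 2 * freq

# Hypothetical values: 20% activity, 1 nF switched capacitance, 1.2 V, 100 MHz.
p_base = dynamic_power(0.2, 1e-9, 1.2, 100e6)

# Halving the switched capacitance (for example through a smaller area)
# halves the dynamic power, since P is linear in C_L.
p_half_cap = dynamic_power(0.2, 0.5e-9, 1.2, 100e6)

print(p_base, p_half_cap)
```

Because P is linear in k and $C_L$ but quadratic in V, the algorithmic optimizations discussed above act on the linear terms only.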

V-BLAST (Vertical Bell Labs Layered Space Time) is an architecture proposed by Bell Labs to use the added degrees of freedom to send separate signals on each transmit antenna and jointly decode all of them with the receive antenna array [1]. This, theoretically, increases the capacity of the channel linearly with the number of transmit antennas as long as the receive array is at least as large as the transmit array. In V-BLAST, the bottleneck is the repeated pseudo inverse calculation required to compute the optimal ordering and the nulling vectors.

The only VLSI architecture for computing a pseudo inverse module through the square root algorithm was devised in [4], in which a 3-CORDIC based supercell proposed in [2] is used. The architecture presented in [4] is a straightforward implementation without regard to area and power optimizations. The architecture proposed in this paper for pseudo inverse computation exploits the parallelism inherent in Jacobi's rotation. It differs from the architecture in [4] in that it uses two independent and generic pipelined CORDIC units instead of three, thereby saving area and power. The scale correction in the CORDIC units of the proposed architecture is carried out through 4 shifters and an adder instead of a 16-bit multiplier, which saves further area and power.

## 2. MIMO SYSTEM MODEL

In MIMO communication systems, more than one antenna is used at the transmitter to transmit symbols and more than one antenna is used at the receiver to receive them. In the diagram of Figure 1, spatial multiplexing is used and M transmit antennas transmit M symbols simultaneously while each symbol is received by the N receive antennas. Each symbol transmitted is received by all the receiving antennas thus making multiple channel paths. These paths, if combined, make a matrix of channel elements.

Each symbol makes N channel paths and is received by N receive antennas. Since there are M symbols transmitted simultaneously, the channel becomes an N×M matrix.




Figure 1: MIMO system model

If $s = (s_1, s_2, \dots, s_M)^T$ denotes the transmitted symbol vector, $r = (r_1, r_2, \dots, r_N)^T$ the received signal vector, H the N×M channel matrix between the transmit and receive antenna arrays, and v the independent and identically distributed AWGN noise vector, then the receive vector r at the input of the MIMO receiver is given by:

 $r = Hs + v \qquad (2)$ 
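The model in equation (2) can be sketched numerically as below. This is a minimal illustration, assuming a Rayleigh-fading channel with unit-variance complex Gaussian entries and QPSK symbols; neither assumption is stated in the text, and the function name `mimo_receive` is hypothetical.

```python
import numpy as np

def mimo_receive(H, s, noise_std, rng):
    """Return r = H s + v for complex AWGN v with standard deviation noise_std."""
    n_rx = H.shape[0]
    v = noise_std / np.sqrt(2) * (
        rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)
    )
    return H @ s + v

rng = np.random.default_rng(0)
M, N = 2, 2  # transmit and receive antennas (2x2 system)

# Assumed channel: i.i.d. complex Gaussian entries (Rayleigh fading).
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)

# Assumed symbols: two QPSK points, one per transmit antenna.
s = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)

r = mimo_receive(H, s, noise_std=0.1, rng=rng)
r_clean = mimo_receive(H, s, noise_std=0.0, rng=rng)  # noise-free reference
```

Each entry of `r` is a mixture of all M transmitted symbols, which is why the receiver has to separate them with the channel matrix.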

#### 2.1 Square Root Algorithm for V-BLAST

In V-BLAST, successive nulling and cancellation is used to detect the transmitted symbols. The channel matrix is first inverted and then reordered so that the symbol with the highest post-detection signal-to-noise ratio (SNR) is detected first. This symbol corresponds to the row of the inverted channel matrix with the minimum Euclidean norm. After detection, the contribution of the symbol is subtracted from the received symbol vector. The corresponding column of the H matrix is zeroed and the process is repeated with the deflated channel matrix until all the symbols are detected. In this research, MMSE is used for channel inversion. The pseudo inverse of a generic matrix H is given by

 $H^{+} = (H^{*}H)^{-1}H^{*} = R^{-1}Q^{*}$  (3)
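The nulling and cancellation loop described above can be sketched as follows. This is an illustrative version only: it recomputes a regularized (MMSE) pseudo inverse in every pass, which is exactly the repeated computation the square root algorithm is meant to avoid. It also removes the detected column rather than zeroing it, which is equivalent for the remaining symbols. The function name, the regularization parameter `alpha` and the QPSK constellation are assumptions.

```python
import numpy as np

def vblast_detect(H, r, alpha, constellation):
    """Ordered nulling and cancellation using an MMSE pseudo inverse."""
    H = np.asarray(H, dtype=complex)
    r_work = np.asarray(r, dtype=complex).copy()
    remaining = list(range(H.shape[1]))  # columns (symbols) not yet detected
    s_hat = np.zeros(H.shape[1], dtype=complex)
    while remaining:
        Hd = H[:, remaining]  # deflated channel matrix
        # Regularized pseudo inverse: (Hd^* Hd + alpha I)^-1 Hd^*
        gram = Hd.conj().T @ Hd + alpha * np.eye(len(remaining))
        G = np.linalg.solve(gram, Hd.conj().T)
        k = int(np.argmin(np.linalg.norm(G, axis=1)))  # minimum-norm row
        z = G[k] @ r_work  # nulling
        col = remaining.pop(k)
        # Slicing: pick the nearest constellation point.
        s_hat[col] = constellation[np.argmin(np.abs(constellation - z))]
        r_work -= H[:, col] * s_hat[col]  # cancellation
    return s_hat

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = np.array([[1.0, 0.5], [0.2j, 1.0]])
s = np.array([qpsk[0], qpsk[3]])
s_hat = vblast_detect(H, H @ s, alpha=1e-6, constellation=qpsk)
```

In this noise-free example the detected vector `s_hat` matches the transmitted vector `s`.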

The pseudo inverse can be computed using either singular value decomposition (SVD) or QR decomposition. The square root algorithm [3] is developed for MMSE-VBLAST and computes the QR decomposition of the augmented channel matrix.

$$\begin{bmatrix} H^{N \times M} \\ \sqrt{\alpha} I^{M \times M} \end{bmatrix} = QR = \begin{bmatrix} Q_a^{N \times M} \\ \times \end{bmatrix} R^{M \times M} \qquad (4)$$

Here × denotes the entries that are not relevant. The algorithm first computes the QR decomposition of the augmented channel matrix and then computes $P^{1/2} = R^{-1}$. Once $Q_a$ and $P^{1/2}$ are computed, the repeated pseudo inverse can be avoided. The algorithm is described below:

1) Compute $Q_a$ and $P^{1/2}$ using equation (5):

$$B = \begin{bmatrix} 1 & H_i P_{|i-1}^{1/2} \\ 0^{M \times 1} & P_{|i-1}^{1/2} \\ -e_i^{N \times 1} & B_{i-1} \end{bmatrix}, \qquad X = \begin{bmatrix} \times & 0^{1 \times M} \\ \times & P_{|i}^{1/2} \\ \times & B_i \end{bmatrix}$$

$$B\theta_i = X \qquad (5)$$

Here i denotes the iteration index, i = 1, ..., N, and $H_i$ is the i-th row of H. B is the pre-array matrix of dimension $(1+M+N) \times (1+M)$, with initial values $P_{|0}^{1/2} = \frac{1}{\sqrt{\alpha}}I$ and $B_0 = 0^{N \times M}$. $e_i^{N \times 1}$ is the i-th unit vector of dimension N and $\theta_i$ is any unitary transformation (Jacobi rotation) that block lower triangularizes the pre-array B. After N iterations,

$$P^{1/2} = P_{|N}^{1/2} \quad \text{and} \quad Q_a = B_N \qquad (6)$$
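The recursion in (5) and (6) can be checked numerically with the sketch below. It is not the hardware method: the unitary transformation is obtained here from a NumPy QR factorization (a Householder-style transform) instead of CORDIC-based Jacobi rotations, which is permitted because any unitary transformation that block lower triangularizes the pre-array is valid. The function name `square_root_pinv` and the test values are assumptions.

```python
import numpy as np

def square_root_pinv(H, alpha):
    """Run N pre-array/post-array steps; return (P^{1/2}, Q_a)."""
    H = np.asarray(H, dtype=complex)
    N, M = H.shape
    P_half = np.eye(M, dtype=complex) / np.sqrt(alpha)  # P_{|0}^{1/2}
    B = np.zeros((N, M), dtype=complex)                 # B_0
    for i in range(N):
        e_i = np.zeros((N, 1), dtype=complex)
        e_i[i, 0] = 1.0
        pre = np.block([
            [np.ones((1, 1), dtype=complex), H[i:i + 1, :] @ P_half],
            [np.zeros((M, 1), dtype=complex), P_half],
            [-e_i, B],
        ])
        # Unitary theta that maps the first row of the pre-array
        # to [x, 0, ..., 0] (block lower triangularization).
        theta, _ = np.linalg.qr(pre[:1, :].conj().T, mode="complete")
        post = pre @ theta
        P_half = post[1:M + 1, 1:]   # P_{|i}^{1/2}
        B = post[M + 1:, 1:]         # B_i
    return P_half, B

rng = np.random.default_rng(1)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
alpha = 0.5
P_half, Q_a = square_root_pinv(H, alpha)

# Reference: P = (alpha I + H^* H)^-1 and the MMSE pseudo inverse P H^*.
P_ref = np.linalg.inv(alpha * np.eye(2) + H.conj().T @ H)
pinv_ref = P_ref @ H.conj().T
```

With these outputs, `P_half @ P_half.conj().T` reproduces `P_ref`, and `P_half @ Q_a.conj().T` reproduces the regularized pseudo inverse without any explicit matrix inversion.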

Equations (5) and (6) are used in pseudo inverse computation. For the rest of the algorithm, the reader is referred to [3].

#### 2.2 CORDIC

In hardware, an efficient way of accomplishing a Givens rotation is using a CORDIC. CORDIC implements the rotation equations:

$$x' = \cos\theta\,(x - y \tan\theta)$$

$$y' = \cos\theta\,(y + x \tan\theta) \qquad (7)$$

when the angles are selected such that:

$$\tan \theta = 2^{-i} \qquad (8)$$

In this case, multiplication by $\tan\theta$ simply becomes a right shift. When several of these CORDIC processing elements are used together, one can rotate by an arbitrary angle by rotating by a combination of the allowed angles:

$$\theta = \tan^{-1} 2^{-i} \qquad (9)$$

For a rotation using a fixed number of iterations, the $\cos\theta$ terms multiply out to a constant. The constant scaling value can be seen in [6] for up to 15 iterations. For our design we need the CORDIC to first rotate a vector to the nulling axis and then remember the angle rotated, so that the following vectors can be rotated by the same angle. These two modes of operation are known as vectoring and rotation, respectively. Our CORDIC implements the rotation equations (7) using the constraint on angles in (9) such that the final result nulls $y'$. We therefore needed a CORDIC that operates in both vectoring and rotation modes.

In order to implement the equations, we used shifters and adders to do the bulk of the work along with simple decision logic. Each processing element receives two input vectors and finds their sign. It must now decide based on their signs whether to rotate up or down.
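The two CORDIC modes can be sketched in software as follows. This is a floating-point illustration under stated assumptions: multiplications by $2^{-i}$ stand in for hardware right shifts, the rotation angle is remembered as a list of direction decisions, the input to the vectoring mode is assumed to have x ≥ 0, and the final division by the gain stands in for the scale correction. All function names are hypothetical.

```python
import math

def cordic_gain(n_iter):
    """Accumulated gain of n_iter micro-rotations: prod sqrt(1 + 2^(-2i))."""
    return math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n_iter))

def cordic_vectoring(x, y, n_iter=16):
    """Rotate (x, y) onto the x-axis, nulling y. Assumes x >= 0.

    Returns (x', y', directions), where directions records each decision.
    """
    directions = []
    for i in range(n_iter):
        d = -1 if y >= 0 else 1  # sign decision: rotate towards y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        directions.append(d)
    g = cordic_gain(n_iter)
    return x / g, y / g, directions

def cordic_rotation(x, y, directions):
    """Rotate (x, y) by the angle recorded during a vectoring pass."""
    for i, d in enumerate(directions):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    g = cordic_gain(len(directions))
    return x / g, y / g

# Vectoring: null the y component of (3, 4); the magnitude is 5.
mag, resid, dirs = cordic_vectoring(3.0, 4.0)

# Rotation: apply the same angle to a following vector.
xr, yr = cordic_rotation(1.0, 0.0, dirs)
```

In this example `mag` is close to 5 and `resid` is close to 0, while the following vector (1, 0) is rotated to approximately (0.6, -0.8), which is the same angle that nulled the first vector.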



Figure 2: VLSI architecture for Pseudo Inverse [4]




Figure 3: Conventional MAC Module

The main design platform used during this project was SIMULINK, a tool flow that enables the creation and use of high-level block diagrams for simulation, emulation and hardware description. The blocks used in this design were from the Xilinx block set. The Pseudo Inverse architecture built from the Xilinx block set is shown below:



Figure 4: VLSI architecture for Pseudo Inverse designed by Xilinx Blocksets.

## 3. RESULTS

Every block was tested extensively in simulation. Testing was done by running the algorithm for each block in MATLAB, Xilinx System Generator and Xilinx ISE to obtain the expected values for given data. The blocks were then given the same inputs as the MATLAB algorithm, the simulation was run, and the block outputs were reviewed. All of the blocks performed as desired for several known test inputs covering a wide range. The total power consumed by the Pseudo Inverse module is 239 mW.

| Supply Summary   |             | Total       | Dynamic     | Quiescent   |
|------------------|-------------|-------------|-------------|-------------|
| Source           | Voltage (V) | Current (A) | Current (A) | Current (A) |
| Vccint           | 1.200       | 0.110       | 0.093       | 0.1         |
| Vccaux           | 2.500       | 0.016       | 0.001       | 0.1         |
| Vcco25           | 2.500       | 0.027       | 0.025       | 0.1         |
|                  |             | Total       | Dynamic     | Quiescent   |
| Supply Power (W) |             | 0.239       | 0.178       | 0.1         |

Figure 5: Power Analysis of PINV Module
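As a consistency check on the reported figure, the total supply power can be recomputed from the voltage and total current of the three supply rails listed in Figure 5. The short sketch below uses only those tabulated values; the variable names are illustrative.

```python
# (voltage in V, total current in A) for each supply rail in Figure 5
rails = {
    "Vccint": (1.200, 0.110),
    "Vccaux": (2.500, 0.016),
    "Vcco25": (2.500, 0.027),
}

# Total power is the sum of V * I over all rails.
total_power = sum(v * i for v, i in rails.values())
print(f"{total_power:.4f} W")
```

The sum is approximately 0.2395 W, which agrees with the reported total of 0.239 W to within the rounding of the tabulated currents.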



Figure 6: Device family and package used for Pseudo inverse Module

It is clear from Fig. 7(b) that, in the Pseudo Inverse module, the CORDIC units use two thirds of the chip area. The area used by flip-flops, look-up tables, etc. is summarized in Fig. 7(a).

| Device Utilization Summary                     |       |           |             |         |
|------------------------------------------------|-------|-----------|-------------|---------|
| Logic Utilization                              | Used  | Available | Utilization | Note(s) |
| Number of Slice Flip Flops                     | 1,703 | 7,168     | 23%         |         |
| Number of 4 input LUTs                         | 1,714 | 7,168     | 23%         |         |
| Number of occupied Slices                      | 1,184 | 3,584     | 33%         |         |
| Number of Slices containing only related logic | 1,184 | 1,184     | 100%        |         |
| Number of Slices containing unrelated logic    | 0     | 1,184     | 0%          |         |
| Total Number of 4 input LUTs                   | 1,911 | 7,168     | 26%         |         |
| Number used as logic                           | 1,639 |           |             |         |
| Number used as a route-thru                    | 197   |           |             |         |
| Number used as Shift registers                 | 75    |           |             |         |
| Number of bonded IOBs                          | 57    | 173       | 32%         |         |
| Number of RAMB16s                              | 1     | 16        | 6%          |         |
| Number of MULT18X18s                           | 4     | 16        | 25%         |         |
| Number of BUFGMUXs                             | 1     | 8         | 12%         |         |
| Average Fanout of Non-Clock Nets               | 1.94  |           |             |         |



Figure 7: (a) Area utilization summary for Pseudo inverse Module (b) layout of Pseudo Inverse module

## 4. CONCLUSION

Instead of a QR triangular array that employs a large number of processors, a single-processor-based VLSI architecture is proposed for V-BLAST detection. The quantization scheme of the square root algorithm for V-BLAST detection is presented considering the tradeoff between hardware complexity and performance. The proposed architecture is implemented in SIMULINK using the Xilinx block set. While the full Square Root algorithm was not designed, the major computationally complex parts were. Finding $P^{1/2}$ enables one to easily perform successive interference cancellation (SIC) and subsequently decode the information streams in the V-BLAST architecture.

Future work will address the design and implementation of the other modules of the square root algorithm, such as SORT and NULL, for power analysis and area utilization.

## ACKNOWLEDGEMENT

The authors would like to thank the faculty of SRM University for their contributions.

## REFERENCES

## **Proceedings Papers:**

- [1] G. J. Foschini: Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas. Bell Labs Technical Journal, pages 41-57, Autumn 1996
- [2] R. Andraka: A Survey of CORDIC algorithms for FPGAs, FPGA'98. Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays, Feb. 22-24, 1998, Monterey, CA. pp. 191-200
- [3] Babak Hassibi: An Efficient Square-Root Algorithm for BLAST, http://mars.belllabs.com/
- [4] Z. Guo and P. Nilsson: A VLSI implementation of MIMO detection for future wireless communications, in Proc. IEEE PIMRC'03, vol. 3, 2003, pp. 2852-2856
- [5] M.Pedram,"Power Minimisation in IC Design: Principles and Applications", ACM Transactions on Design Automation of Electronic Systems, vol. 1, no. 1, pp. 3-56, January 1996.
- [6] X. Wu, M. Pedram, L. Wang, "Multi code state assignment for low power design", Circuits, Devices and Systems, IEEE Proceedings vol. 147, Issue 5, Oct. 2000 Page(s):271 - 275
- [7] Area & Power Efficient VLSI Architecture for Computing Pseudo Inverse of Channel Matrix in a MIMO Wireless System, Zahid Khan, Tughrul Arslan, John S. Thompson, Ahmet T. Erdogan, Proceedings of the 19th International Conference on VLSI Design (VLSID'06).

## **Journal Papers:**

- [8] Nirmalendu Bikas Sinha, R. Bera and M. Mitra: Capacity and V-BLAST techniques for MIMO wireless channel, Journal of Theoretical and Applied Information Technology, 2005-2010 JATIT.
- [9] VLSI Design of the Square-Root Algorithm for a Linear MMSE V-BLAST Detector with SIC. "Akin Olugbade, Prof. Borivoje Nikolic, Vinayak Nagpal", SUPERB-CSIS 2008.