Vol. 2, Issue 3, May-Jun 2012, pp. 198-203

Toggle Rate Estimation Technique for 4-Input LUT based FPGA Circuits

Ms. Anju P.J.

M.Tech Scholar VLSI Design Amrita School of Engg. Coimbatore, India Mr. Ramesh S.R.

Assistant Professor Department. of ECE Amrita School of Engg. Coimbatore, India

Abstract : Power dissipation of a VLSI chip was not a concern in the past. A lot of effort has been put into synthesis for speed and area, power optimization has been explored only recently. Since power estimation is the foremost step in any low power design; this work concentrates on developing a technique that estimates the toggle rates for circuits implemented on a 4-input LUT based FPGA using probabilistic technique. The aim of this work is to develop a toggle rate estimation technique with improved accuracy by incorporating the effects of correlation of logic signals and glitches. This approach is tested on a set of MCNC circuits. The ABC logic synthesis system is used for LUT mapping.

*Keywords-* power estimation, field-programmable gate arrays (FPGAs), toggle rate, vectorless, Look Up Tables(LUTs).

## I. INTRODUCTION

FPGAs consists of large number of logic blocks, that implement the logic part of digital circuits, and a configurable routing network, which implements the interconnections as shown in Figure 1. FPGAs use LUTs to implement logic; the function implemented by each LUTs can be configured using configuration bits. Each logic element contains a flip-flop which can be optionally used to register the lookup-table output. FPGAs contain programmable interconnections which are also controlled by configuration bits. The configurable logic elements and the configurable routing fabric make the FPGA flexible. This flexibility comes at the cost of increased power. FPGAs dissipate static and dynamic power. Static power dissipation increases as the feature size decreases or when the LUT size increases. The configuration bits in an FPGA dissipate static power. An ASIC does not contain any such programming bits, and thus would consume significantly less static power than an FPGA.

Dynamic power dissipation is directly proportional to the toggle rate of the signals and it can be as much as double when glitches are present. Connections between gates are associated with some amount of parasitic capacitance due to the metal wire used for implementation. Also the programmable switches in an FPGA significantly increase the parasitic capacitance. Charging and discharging these parasitic capacitances consume dynamic power. Figure 2. shows the

breakdown of static and dynamic core power in Xilinx Spartan-3 FPGAs. Recent studies show that, for typical applications, an FPGA consumes 12 times more power than the corresponding ASIC.



Figure 1. A generic FPGA

Figure2. Breakdown of dynamic & Static power consumption in Xilinx Spartan-3 FPGAs

## II. BACKGROUND

Low power VLSI design problem has to be addressed in three stages:

- Estimation
- Analysis

## > Optimization

Power estimation can be seen as a process of determining accurately the power consumed by a circuit. Three parameters are to be considered at this stage: Efficiency, Accuracy and Availability. Efficiency can be seen as how fast is the estimation technique capable of determining the power. Accuracy of results depends on the availability of design information. An estimation technique becomes more accurate when lower level details are available. Analysis phase can be viewed as a problem of extracting information on the various categories of power dissipation in the circuit. For example at this stage the tool should be able to distinguish between static power and dynamic power consumed by the circuit. Also it should be possible to separate the glitch power from the rest circuit power dissipations. Estimation and Analysis phases provide foundation for power optimization. Power optimization deals with generating a best optimal design solution in terms of power without violating the design objectives and constraints. Since power estimation is the foremost step in any low power design; this work will be

## Vol. 2, Issue 3, May-Jun 2012, pp. 198-203

concentrating on developing a technique that estimates the toggle rate of signals in a circuit.

Power estimation technique can be broadly classified into two:

- Simulation based
- Probabilistic or Statistical.

## A. Simulation Based Power Analysis

In this method we run logic simulation with a set of input vectors [8]. Then the toggle count of each net is computed. Dynamic power of the circuit is computed from the toggle rate. In figure3. inputs A, B, C and D are assigned signals with toggles 2, 1, 2 and 0 respectively ( The simulation is performed for four iterations). As per the functionality of each gate their output is calculated from the inputs available. Hence from figure4. it is clear that the signals E, F and G have toggles of 1, 2 and 2.

## Advantage

• Accuracy of estimation results is very high

## Disadvantage

- Simulation time is very high
  - Simulation results are input pattern dependent



Figure 3. Propagation of simulation vectors

Figure 4.Propagation of statistical parameters

#### B. Probabilistic Power Analysis

In this method a logic signal is viewed as a random one-zero process with certain statistical parameters [1]. In this method we never know the exact time of occurrence of the signal. Instead of using exact historical time domain signal representation, certain characteristic quantities of signal over a period of time are used for computation of power. This method is also called vectorless activity estimation technique [3]. It involves estimating the switching activity of each node based on the switching activities of the inputs and the logic function of that node. Quantities like signal probabilities[4], transition densities[4], spatial correlation[7], temporal correlation[7], spatiotemporal correlation, probability density function, cumulative probability density function, etc. can be used as statistical parameters.

## Advantage

• Estimates power very quickly

#### Disadvantage

• Results are less accurate

For example consider static probability (Probability that a signal is at logic 1) and transition density (average switching

rate of a node) as the only parameters used in a probabilistic power analysis technique. Figure 4 shows the propagation of these two parameters in a circuit using results in [4]. The static probability and transition density are equivalent to the simulated vectors in figure 3 Here we can see that input vectors (waveforms) to terminals A, B, C and D are different. But when we convert it into equivalent probabilistic patterns we see that static probability of node A, B and C are the same. Still Signals of node A and B can be differentiated since their transition densities are different. We can see that the statistical quantities of node A and C (figure 4) are the same but the equivalent patterns in simulated approach (figure3.) are different. This means that when we consider just the two statistical parameters i.e., static probability and transition density; the signals 0110 and 1001 are one and the same. The considered model could not distinguish between these two signals. This example shows how a probabilistic approach is less accurate than the simulation based activity estimation.

So the aim of this work is to improve the accuracy of the probabilistic power estimation approach so that the accuracy can be improved and the simulation time can be reduced.

## C. Factors to be considered in Toggle Rate

The important factors to be considered in toggle rate estimation include Temporal correlation[3], Spatial correlation[6][3], Spatio-Temporal dependencies[19], Circuit delays[6][14], Glitches[6][11][12][13] and Computational efficiency[10]. Probability-based estimates are less accurate than simulation-based estimates for two reasons. One reason is that they ignore wire and gate delays, which are the cause of glitching activities. Secondly they ignore spatial correlation and temporal correlation between signals. Spatial correlation measures the relationship between signals at the input of a gate. It occurs when the logic value of a wire depends on the value of another wire. They can happen at primary inputs when the input data has correlation or between internal nodes when gates fan-out to multiple gates and later they reconverge, at some other node. This is shown in figure 5. Temporal correlation occurs when the value at a node depends on previous values of the same node. This can also occur at the primary inputs or within sequential circuits which have feedback. This is shown in figure 5. Temporal correlation can be computed from 1 to 0 and 0 to 1 transition probability. Ignoring temporal correlation produces 15% to 50% error, and ignoring spatial correlation produces 8% to 120% error.



Figure 5. Internal spatial correlation and temporal correlations

Vol. 2, Issue 3, May-Jun 2012, pp. 198-203

# III. EXISTING ACTIVITY ESTIMATION TECHNIQUES

As already discussed there are two types of power estimation techniques: simulation based and probabilistic power estimation. The merits and demerits of some of them are discussed below. Today simulation is used for functional verification, performance evaluation, cost estimation, reliability checking and power estimation. The simulation based power estimation can be applied at different levels of abstraction. The main difference between the power estimation at these levels of abstraction is the trade of between computing resources and accuracy of the results[1]. Accuracy is higher at lower levels of abstraction than the higher levels; but the computation is easier at higher levels. In probabilistic power analysis the power dissipation of the circuit is then derived from statistical quantities like transition density, static probability and correlation.

In Najm's work on transition density the toggle rate is represented using the theory of probability. This work can be considered as the first work in the area of probabilistic power analysis. According to this work a module propagates transition from its input to output if the corresponding function is sensitive to the changes in the input. Based on a stochastic model of logic signals, this work introduced an algorithm to propagate density values from the primary inputs to internal and output nodes. It also demonstrates how the density values at internal nodes can be used to study circuit reliability by estimating the average power, ground currents, the average power dissipation, the susceptibility to electro migration failures ,the extent of hot-electron degradation etc. Here the only statistical quantity considered is transition density and hence the results are highly deviating from the actual one.

Work presented in [11] introduces a fast and memory efficient power estimation technique for CMOS circuits which estimates the power consumed due to the glitches. This technique is based on the notion of tagged transition waveforms. In this method regular input patterns are replaced with time waveforms that indicates static and dynamic probability. Similar to propagating input patterns here this transition waveform is propagated through the nodes. Since the approach uses this time waveforms; the model accounts for logic delays. Even if transitions occur very close to each other this method counts them as glitches. But in an actual circuit the glitches of short durations are filtered out by the RC components in the circuits. This indicates that this method leads to over estimation of glitches.

The concept of using correlation coefficients is discussed in [17]. This paper addresses the issue of switching activity estimation in combinational circuits under the zero-delay model from a probabilistic point of view. This work accounts for complex spatio-temporal correlations which occur at the primary inputs when the target circuit receives data from real

applications. Using lag-one Markov chains, conditional independence and signal isotropy, sufficient conditions for exact analysis of complex dependencies are derived. Evaluation of the model and a comparative analysis on benchmark circuits show that node-by-node switching activities are strongly pattern dependent and therefore, accounting for spatio-temporal dependencies is mandatory if accuracy is a major concern. This method introduced a new measure called transition coefficient to incorporate the effects of correlation between logic signals but this method was computationally complex.

The toggle rate estimation technique presented in [16] uses binary decision diagrams (BDDs) to compute correlation. Each node is represented using BDD and it is expanded to incorporate all possible transition for each variable. Their BDDs grows exponentially. Also this method considered equal probability for both 1 to 0 and 0 to 1 transitions. These assumptions led to more error in the estimation compared to a simulation based approach. Another method described in [3] also uses a similar approach of toggle rate estimation. Here the authors adopted techniques like partial collapsing and BDD pruning to reduce the size of BDDs. This method could not account for spatial and temporal correlation occurring within gates of a unit. Here they have introduced a method to reduce runtime but this leads to more error in the activity estimation. In BDD pruning low probability branches of a BDD are removed with minimal impact on accuracy of the activity estimation. This will eventually reduce the size of BDD under consideration and hence the computational complexity will get reduced. They have introduced a tool called ACE-V.2.0 which works well for both combinational and sequential circuits. The tool takes static probability, transition probability and transition density as input to simulate the circuit. The output file describes the same measures of the internal nodes.

The technique described in [6] is the latest work in the field of probabilistic power estimation. The authors claims that they have accounted all the issues related to toggle rate estimation like spatial correlation, temporal correlation, circuit delays, glitches and computational efficiency. The method uses a XOR based decomposition scheme<sup>[10]</sup> for breaking a function into subparts and determining the relation between them. This can be used for computing the spatial correlation of signals. The XOR based decomposition scheme described in [10] requires a truth table generation for computation of correlation. This indicates that for circuits with a large number of input vectors the number of elements in the K-Map is very large . This indicates that this method cannot demand advantage of a probabilistic approach, since they require input patterns for computing correlation. For any circuit with k inputs we require 2<sup>k</sup> input patterns for computing spatial correlation. If the number of primary inputs to a circuit is less than the method may find some advantages. For circuits with large number of input patterns this approach is very slow and computationally complex. This approach did not mention how

## Vol. 2, Issue 3, May-Jun 2012, pp. 198-203

they have considered the effects of temporal correlation. The results shows that there are still significant percentage error in their approach. The results shows a huge deviation for circuits with large logic depth. This method did not project any results regarding their application in a sequential circuit. Consideration of process variations in their approach can account the circuit delays and glitches more appropriately and hence they can further modify the approach to better model the power dissipation.

## IV. METHODOLOGY

This work deals with designing a probabilistic power estimator and comparing the result with a simulation based activity estimator and other existing probabilistic activity estimators. Figure 6. describes the whole methodology adopted for this work. The bench mark circuits used are Microelectronic Center of North Carolina circuits in BLIF (Berkeley Logic Inter Changeable Format) and ISCAS circuits. The results are compared using a simulation based activity estimator and existing Probabilistic activity estimators like ACE v2.0 academic tool produced from the University of British Columbia, Vancouver, BC, Canada and commercial Quartus II 8.0 CAD tool.



In this work we consider 4-input LUT based FPGAs like Altera Stratix II FPGAs as shown in figure 5. By using the logic synthesis tool ABC the bench mark circuit can be mapped into the LUTs.



Figure 5. 4-input Logic Element(LE) block diagram (Altera's Stratix FPGA family).

Each LUT can have a maximum of 4 inputs which are arriving from LUTs. So the resulting arrangement of LUTs will be similar to that in figure 6. Since LUTs are processed in topological order we consider the 4 predecessors of a master LUT for computing the spatial correlation. These predecessors in turn have a maximum of 16 inputs altogether. So in computing the spatial correlation of inputs of a master LUT the first step is to derive the functionality of the predecessor LUTs. There after we compute conditional probability of K1,K2,K3 and K4 (inputs of master LUT) with each other. Conditional probabilities thereby have enough information to describe the relationship between static probabilities of these four logic functions . On having the modified probability values(due to correlation) of master LUT we have to propagate these values to its output.



Figure 6. Master LUT- Predecessor architecture and detailed diagram for the functionality of Predecessors

Consider the circuit given in figure 5. It is a 4 input circuit so it can be implemented in a 4 input LUT. Knowing the modified input probabilities we use results in [4]and[6] to compute the transitions. Since the functionality of the mater LUT is known we can easily compute the transitions at the output. Here we can incorporate gate delays also. Gate delays results in glitches generated by the LUT. These glitches are propagated downstream.



Figure 7. A master LUT and the function implemented by it.

## v. EXPERIMENTAL RESULTS

Experiment is conducted on a set of MCNC bench mark circuits. The results of basic circuits which can be implemented in an LUT is shown in Table I . It is compared with simulation based approach and the relative variation is also tabulated.. 1000 Orandom vectors are used for simulating the circuits. Its equivalent statistical quantities are used for the proposed probabilistic simulator. Table II. Shows the simulation results for the BLIF benchmark circuits in Quartus II 9.0sp2 and Ace\_v2. Table III and IV shows the results of bench marks of original circuit, its optimized form and the LUT mapping using ABC tool.

## Vol. 2, Issue 3, May-Jun 2012, pp. 198-203

VI.

Table I. Comparison of activities from simulation based &

## CONCLUSION

probabilistic estimators (TC: toggle count, S.P.:static

#### probability)

| Circuit     | Simulation based<br>Approach |        | Probabilistic<br>Approach |        | Relative<br>Error |
|-------------|------------------------------|--------|---------------------------|--------|-------------------|
|             | T.C.                         | S.P.   | T.C.                      | S.P.   | %                 |
| Half Adder  | 637                          | 0.4243 | 909                       | 0.3371 | 30                |
| Full Adder  | 746                          | 0.4973 | 2060                      | 0.4333 | 64                |
| 4:1 mux     | 747                          | 0.4980 | 1155                      | 0.4153 | 35                |
| 2:4 decoder | 331                          | 0.2208 | 834                       | 0.6185 | 60                |
| BCD-EX3     | 705                          | 0.4698 | 986                       | 0.5200 | 28                |

Table II. Simulation results for the BLIF benchmark circuits

#### in Quartus II 9.0sp2 and Ace\_v2.

| Circuit | Logic<br>elements | Dynamic<br>power(mw) | Static<br>probability | Switching<br>probability |
|---------|-------------------|----------------------|-----------------------|--------------------------|
| alu4    | 966               | 27.19                | 0.99487               | 0.491414                 |
| apex2   | 917               | 60.34                | 0.175347              | 0.284624                 |
| apex4   | 829               | 64.15                | 0.468863              | 0.864924                 |
| ex5p    | 198               | 13.05                | 0.938600              | 0.472312                 |
| ex1010  | 888               | 69.01                | 0.727133              | 0.448922                 |
| misex3  | 893               | 57.48                | 0.227102              | 0.346351                 |
| pdc     | 1729              | 97.52                | 0.938785              | 0.384609                 |
| seq     | 1033              | 71.77                | 0.708942              | 0.480792                 |
| spla    | 1745              | 108.63               | 0.429261              | 0.472992                 |

#### Table III. Bench mark circuit details : ABC tool

| Circuit | Input/  | OR    | AND   | NOT   | No. of critical |
|---------|---------|-------|-------|-------|-----------------|
|         | output  | Gates | Gates | Gates | path elements   |
| alu4    | 14/8    | 2732  | 3729  | 5464  | 14              |
| apex2   | 39/3    | 3165  | 4116  | 6330  | 17              |
| apex4   | 9/18    | 2195  | 3513  | 4390  | 12              |
| c5315   | 178/123 | 731   | 1579  | 1898  | 19              |
| c7552   | 207/107 | 820   | 1655  | 2025  | 29              |
| cps     | 24/107  | 593   | 1250  | 1709  | 11              |
| dalu    | 75/16   | 306   | 787   | 947   | 13              |
| e64     | 65/65   | 215   | 541   | 699   | 9               |
| ex4p    | 128/28  | 239   | 527   | 762   | 7               |

Table IV. Bench mark circuits details(optimized and LUT

#### mapped) : ABC tool

| OPTIMIZED |              |                                       | LUT MAPPED     |      |      |                                        |
|-----------|--------------|---------------------------------------|----------------|------|------|----------------------------------------|
| Circuit   | AND<br>Gates | No.of<br>critical<br>path<br>elements | No. of<br>LUTs | EDGE | AIG  | No. of<br>critical<br>path<br>elements |
| alu4      | 2370         | 14                                    | 1150           | 3932 | 2845 | 7                                      |
| apex2     | 2589         | 15                                    | 1318           | 4422 | 3124 | 8                                      |
| apex4     | 1919         | 12                                    | 1059           | 3442 | 2402 | 6                                      |
| c5315     | 1287         | 26                                    | 482            | 1562 | 1622 | 8                                      |
| c7552     | 1390         | 30                                    | 478            | 1432 | 1724 | 11                                     |
| cps       | 1024         | 11                                    | 563            | 1796 | 1327 | 5                                      |
| dalu      | 643          | 16                                    | 246            | 871  | 772  | 6                                      |
| e64       | 445          | 9                                     | 244            | 800  | 556  | 4                                      |
| ex4p      | 401          | 9                                     | 185            | 621  | 451  | 4                                      |

This paper examined various activity estimation Techniques. Their merits and demerits are also discussed. The paper also explains how to accurately measure the toggle rates of signals in field-programmable gate array (FPGA)-based logic circuits without the use of simulation vectors. Improved accuracy of results is achieved by considering correlation of logic signals. By incorporating gate delays effects of glitches are also considered. Comprehensive treatment of glitches also improved our results. Statistical static timing analysis is a procedure that is becoming increasingly necessary to handle the complexities of process and environmental variations in integrated circuits. Statistical Static Timing Analysis can be incorporated so as to better model the effects of process variations. In contrast, statistical static timing analysis algorithms offer a more accurate prediction of the timing behavior of circuit designs.

## REFERENCES

- [1] Gary K. Yeap,"Practical low power digital VLSI design",Kluwer Academic Publishers.
- [2] Julien Lamoureux, "Modeling and Reduction of Dynamic Power in Field-Programmable Gate Arrays", Ph.D. dissertation, Dept. of Electrical and Computer Engineering, The University of British Columbia Nov. 2007.
- [3] J. Lamoureux and S. Wilton, "Activity estimation for field programmable gate arrays," in Proc. IEEE Int. Conf. Field-Programmable Logic Appl., Aug. 2006, pp. 1-8.
- [4] F. Najm, "Transition density: A new measure of activity in digital circuits," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 12, no. 2, pp. 310-323, Feb. 1993.
- [5] Farid N. Najm, , "A Survey of Power Estimation Techniques in VLSI Circuits" IEEE transactions on very large scale integration (VLSI) systems, vol. 2, pp 446-455.NO. 4, Dec. 1994.
- [6] Tomasz S. Czajkowski, and Stephen D. Brown, "Decomposition-Based Vectorless Toggle Rate Computation for FPGA Circuits", IEEE transactions on very large scale integration (VLSI) systems, vol29, no. 11,pp1723-1735 Nov.2010.
- [7] Julien Lamoureux and Wayne Luk ," An Overview of Low-Power Techniques for Field-Programmable Gate Arrays", NASA/ESA Conference on Adaptive Hardware and Systems, vol. 27, no. 12, pp. 338-345, Dec. 2008.
- [8] Andrew Yang and Karti Mayaram ,"Simulation and Modeling" IEEE transactions on computer-aided design of integrated circuits and systems, pp 11-19 July1994
- [9] Farid N. Najm, "Low-Power Design Methodology: Power Estimation and Optimization" IEEE transactions on very large scale integration (VLSI) systems, pp1124-1129, July1997
- [10] T. S. Czajkowski and S. D. Brown, "Functionally linear decomposition and synthesis of logic circuits for FPGAs," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 12, pp. 2236-2249, Dec. 2008.
- [11] C. Tsui, M. Pedram, and A. Despain, "Efficient estimation of dynamic power consumption under a real delay model," in Proc. IEEE Int. Conf. Comput.-Aided Design, 1993, pp. 224-228.
- [12] F. Hu and V. Agrawal,"Dual-transition glitch in probabilistic waveform power estimation,"Proc. ACM Great Lakes Symp. VLSI,2005,pp.357-360.
- [13] C. Ding, C. Tsui, and M. Pedram, "Gate-level power estimation using tagged probabilistic simulation," IEEE Trans. Comput.-Aided Design Electron. Circuits Syst., vol. 17, no. 11, pp. 1099-1107, November. 1998.
- [14] Y. Lin, M. Huttom, and L. He, "Placement and timing for FPGAs considering variations," in Proc. FPL, 2006, pp. 1-7.
- [15] www.ece.cmu.edu/~ee760/760docs/blif.pdf

## Vol. 2, Issue 3, May-Jun 2012, pp. 198-203

- [16] P. Schneider, U. Schlichtmann, and B. Wurth, "Fast power estimation of large circuits," *IEEE Design Test Comput.*, vol. 13, no. 1, pp. 70–77, March. 1996.
- [17] R. Marculescu, D. Marculescu, and M. Pedram, "Probabilistic modeling of dependencies during switching activity analysis," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 17, no. 2, pp. 73–83, Feb. 1998.
- [18] Jongyoon Jung and Taewhan Kim "Scheduling and Resource Binding Algorithm Considering Timing Variation" *IEEE Trans. on very large scale integration (VLSI) systems*, vol. 19, no. 2, pp 205-216 Feb. 2011.
- [19] C. Ding, C. Tsui, and M. Pedram, "Gate-level power estimation using tagged probabilistic simulation," *IEEE Trans. Comput.-Aided Design Electron. Circuits Syst.*, vol. 17, no. 11, pp. 1099–1107, Nov.1998.

