# A Delayed Buffered Technique Using The Concept Of Gated Driven Tree For Optimizing The Power

# Rajya Lakshmi G, Radha Krishna T, Lowkya Ch, Basheer Ali Sheik, P Mahesh, L Srikanth

### Abstract

This paper presents the circuit design of a low-power delay buffer. To store data in a memory, the buffers in the memory must be selected sequentially. A ring counter built from double-edge-triggered (DET) flip-flops selects the buffers, which halves the operating frequency. A C-element gated-clock strategy is proposed to reduce the power consumption. The gated-driver-tree idea is also applied to the input and output ports of the memory block to decrease their loading, saving even more power.

#### Introduction

The rapidly increasing transistor count and circuit density of modern very large scale integrated (VLSI) circuits have made them extremely difficult and expensive to test comprehensively. In a digital processing chip for mobile communications, the delay buffer takes up a large portion of the circuit layout. If the power consumption of the delay buffer could be reduced significantly, the overall power consumption of the digital processing chip would be reduced significantly as well. Moreover, because these chips work at ever higher operating frequencies, a new low-power delay buffer must also be operable at high frequencies.

Fig.1 is a schematic diagram showing a conventional delay buffer having a length N and a data width of W bits using shift registers. As illustrated, the delay buffer contains N × W shift registers, arranged between the input and the output in N stages of W shift registers each. All N × W shift registers are triggered by the same clock signal CLK. In every clock period of CLK, W-bit data is shifted from the W shift registers of one stage to those of the next stage. A W-bit data word is therefore output N clock periods after it was input. Because the clock signal CLK is provided to all N × W shift registers, the power consumption is high. Moreover, the N × W shift registers take up a large die area. For these reasons, a delay buffer such as the one in Fig.1 is seldom used in practice.
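The shift-register behaviour described above can be illustrated with a small behavioural model. This is only a sketch: the class name `ShiftRegisterDelayBuffer`, the method `clock`, and the counter `register_clockings` are hypothetical names introduced here, and the model counts clocked registers as a rough stand-in for clock loading, not as a power figure.

```python
from collections import deque


class ShiftRegisterDelayBuffer:
    """Behavioural model of an N-stage, W-bit shift-register delay buffer."""

    def __init__(self, n_stages: int, width: int):
        self.width = width
        self.mask = (1 << width) - 1
        # All stages are assumed to start at zero.
        self.stages = deque([0] * n_stages, maxlen=n_stages)
        # Illustrative counter: how many 1-bit registers have received a clock.
        self.register_clockings = 0

    def clock(self, data_in: int) -> int:
        """One CLK period: read the last stage, then shift every stage by one."""
        data_out = self.stages[-1]
        # appendleft on a full deque discards the word at the far (output) end.
        self.stages.appendleft(data_in & self.mask)
        # CLK reaches all N x W registers in every period.
        self.register_clockings += len(self.stages) * self.width
        return data_out


buf = ShiftRegisterDelayBuffer(n_stages=3, width=8)
outputs = [buf.clock(x) for x in [1, 2, 3, 4, 5]]
print(outputs)
print(buf.register_clockings)
```

Each input word reappears N calls later, and every call clocks all N × W registers, which is the cost the text attributes to this structure.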



Fig.1: Dual-port SRAM memory

A common delay buffer implementation is a dual-port SRAM, whose operation differs from that of the shift-register-based delay buffer. In an N × W SRAM-based delay buffer, there is no data movement between stages. Instead, in every clock period, a W-bit data word is written to one of the N W-bit storage locations of the SRAM-based delay buffer, and the W-bit data word that was written N clock periods earlier is output. The power consumption of an SRAM-based delay buffer comes mainly from the address decoder and the drivers for its input and output ports. Because memory technology is already quite mature, satisfactory results in terms of layout area and speed are achievable. In practice, therefore, a delay buffer is often implemented using SRAM.
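A minimal sketch of the SRAM-style operation follows. It assumes a single address pointer that advances by one location per clock period, with the old word read before the new word is written to the same location. The names `SramDelayBuffer`, `clock` and `addr` are hypothetical and chosen only for illustration.

```python
class SramDelayBuffer:
    """Behavioural model of an N-word, W-bit memory used as a delay buffer."""

    def __init__(self, n_words: int, width: int):
        self.mask = (1 << width) - 1
        self.mem = [0] * n_words   # storage locations, assumed to start at zero
        self.addr = 0              # location selected in the current clock period

    def clock(self, data_in: int) -> int:
        """One clock period: output the word stored N periods ago, store the new one."""
        data_out = self.mem[self.addr]            # written N periods earlier
        self.mem[self.addr] = data_in & self.mask
        self.addr = (self.addr + 1) % len(self.mem)
        return data_out


sram = SramDelayBuffer(n_words=3, width=8)
sram_outputs = [sram.clock(x) for x in [1, 2, 3, 4, 5]]
print(sram_outputs)
```

No stored word moves between locations; only the selected location changes each period, so the remaining cost lies in generating the address and driving the data ports.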

## Characteristics Terms For Various Memory Devices

The following terms are most commonly used for identifying comparative behaviour of various memory devices and technologies.

#### Storage Capacity

Storage capacity represents the size of the memory. The capacity of internal memory and main memory can be expressed as a number of words or bytes. The storage capacity of external memory is normally measured in bytes.

#### Unit of transfer

The unit of transfer is the number of bits read out of or written into the memory in a single read or write operation. For main memory and internal memory, the normal unit of transfer is equal to the word length of the processor. In fact, it depends on the number of data lines into and out of the memory module; in general, the number of these lines is kept equal to the word size of the processor. The unit of transfer of external memory is normally quite large and is referred to as a block of data.

#### Access Modes

Once the unit of transfer has been defined, the next important characteristic is the access mode, that is, the way in which information is accessed from the memory. A memory is considered to consist of various memory locations. The information from memory devices can be accessed in the following ways:

# Vol. 2, Issue 2, Mar-Apr 2012, pp.727-733

#### a) Random Access Memory (RAM)

It is the mode in which any memory location can be accessed in any order in the same amount of time. Ferrite and semiconductor memories, which generally constitute main memory, are of this nature. The storage locations can be accessed independently, and a separate access mechanism exists for each location.

#### b) Sequential access

In contrast, some memories can be accessed only in a pre-defined sequence; for example, the songs stored on a cassette can be accessed only one by one. Magnetic tape is an example of sequential access memory. Here the access mechanism is shared among different locations, so either the location or the read/write head, or both, must be moved to reach the desired location.

#### c) Direct Access

In certain cases the information is accessed neither randomly nor sequentially but in a manner in between. In this kind of access, a separate read/write head exists for each track, and within a track the information is accessed serially. This semi-random mode of operation is found in magnetic disks.

#### d) Access Time

The access time is the time between the request for a read or write operation and the moment the data is made available or written at the requested location. Normally it is measured for the read operation. The access time depends on the physical characteristics of the device and on its access mode.

Permanence of storage: A memory may lose information over a period of time. The causes of such loss include destructive readout, dynamic storage, volatility and hardware failure. If reading a particular memory destroys the stored information, the readout is called destructive. In such memories the information has to be written back to the same location after each read operation. A reading process that does not destroy the data is referred to as non-destructive readout.

#### **Background Work & Proposed System**

A delay buffer works much like a fixed jitter buffer: it delays frame retrieval by some interval so that the caller gets a continuous stream of frames from the buffer. This is useful when the operations are not evenly interleaved, for example when the caller performs a burst of put() operations followed by a burst of get() operations. The delay buffer stores the burst of frames so that get() operations always get a frame from the buffer (assuming that the numbers of get() and put() calls match). The buffer is adaptive, that is, it continuously learns the optimal delay to apply to the audio flow at run time. Once the optimal delay has been learned, the delay buffer applies it to the audio flow, expanding or shrinking the audio samples as necessary when the number of audio samples in the buffer is too low or too high.
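The burst behaviour of put() and get() can be sketched with a plain first-in-first-out queue. This is a simplified illustration only: the class `SimpleDelayBuffer` and its `max_frames` parameter are hypothetical, it is not the API of any particular library, and the adaptive delay learning mentioned above is deliberately not modelled.

```python
from collections import deque


class SimpleDelayBuffer:
    """Holds frames from bursty put() calls so later get() calls find data."""

    def __init__(self, max_frames: int):
        # When the buffer is full, appending drops the oldest frame.
        self.frames = deque(maxlen=max_frames)

    def put(self, frame):
        self.frames.append(frame)

    def get(self):
        # Return the oldest frame, or None when the buffer has run dry.
        return self.frames.popleft() if self.frames else None


jb = SimpleDelayBuffer(max_frames=8)
for frame in ["f0", "f1", "f2"]:      # burst of put() operations
    jb.put(frame)
received = [jb.get() for _ in range(3)]  # burst of get() operations
print(received, jb.get())
```

As long as the numbers of put() and get() calls match and the burst fits in the buffer, every get() in the second burst returns a frame in arrival order.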



The block diagram of delay buffer in existing technique is given below.



Fig.3: Delay Buffer-Existing Technique

#### Input Buffer

The input buffer is also commonly known as the input area or input block. In computer memory, the input buffer is a location that holds all incoming information before it continues to the CPU for processing. The term can also describe various other hardware or software buffers that store information before it is processed. Some scanners (such as those which support "include" files) require reading from several input streams. Because flex scanners do a large amount of buffering, one cannot control where the next input will be read from simply by writing a YY\_INPUT() that is sensitive to the scanning context. YY\_INPUT() is only called when the scanner reaches the end of its buffer, which may be a long time after scanning a statement, such as an include statement, that requires switching the input source.



Fig.4: Input Buffer

#### Memory Block

Random-access memory (RAM) is a form of computer data storage. Today, it takes the form of integrated circuits that allow stored data to be accessed in any order (that is, at random). "Random" refers to the idea that any piece of data can be returned in a constant time, regardless of its physical location and whether it is related to the previous piece of data. The word "RAM" is often associated with volatile types of memory (such as DRAM memory modules), where the information is lost after the power is switched off. Many other types of memory are RAM as well, including most types of ROM and a type of flash memory called NOR-Flash.

*Random Access Scan*: Fig.5 shows the basic architecture of random access scan (RAS). A decoder is used to address every flip-flop (FF). RAS allows reading or writing of any flip-flop using address bits, with the address width determined by n, the number of scanned flip-flops. When an address is applied, the address decoder produces a scan-enable signal for the corresponding flip-flop, which is then loaded with data from the scan-in input. In this technique, the scan function is implemented as a random-access memory, so at any given time only one FF is accessed while the other FFs retain their state. The architectures described in most of the literature consist mainly of a scan-in signal broadcast to every FF, a test control signal broadcast to all FFs, and a unique select signal from the decoder to every FF.
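The addressing behaviour can be sketched as follows. The class `RandomAccessScan` and its methods are hypothetical names, and the address width used here (the smallest number of bits able to select any one of n flip-flops) is an assumption of this sketch rather than a figure taken from the text.

```python
class RandomAccessScan:
    """Behavioural model: one addressed flip-flop is accessed, the rest hold."""

    def __init__(self, n_flipflops: int):
        self.state = [0] * n_flipflops
        # Assumption: narrowest address able to select any of the n flip-flops.
        self.address_bits = max(1, (n_flipflops - 1).bit_length())

    def decode(self, address: int) -> list[int]:
        """Address decoder: one-hot scan-enable, one line per flip-flop."""
        if not 0 <= address < len(self.state):
            raise ValueError("address out of range")
        return [1 if i == address else 0 for i in range(len(self.state))]

    def write(self, address: int, scan_in: int) -> None:
        enables = self.decode(address)
        # scan_in is broadcast to every flip-flop; only the enabled one loads it.
        self.state = [scan_in if en else q for en, q in zip(enables, self.state)]

    def read(self, address: int) -> int:
        return self.state[address]


ras = RandomAccessScan(n_flipflops=8)
ras.write(address=5, scan_in=1)
print(ras.address_bits, ras.state, ras.read(5))
```

Writing to address 5 changes only that flip-flop, which mirrors the statement that all other flip-flops retain their state.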



Consider the two decoder structures for a circuit with N flip-flops. In the first structure, a single decoder drives a separate address wire to each of the N flip-flops; the second structure needs fewer address wires for the same N flip-flops. Although the second structure needs both a row decoder and a column decoder, where the first uses only one decoder, it greatly reduces the hardware overhead.



Fig.6: Decoder design

#### **Ring Counter**

A ring counter is a type of counter composed of a circular shift register. The output of the last shift register is fed to the input of the first register.

There are two types of ring counters:

• A straight ring counter, or Overbeck counter, connects the output of the last shift register to the input of the first shift register and circulates a single one (or zero) bit around the ring. For example, in a 4-register one-hot counter with initial register values of 1000, the repeating pattern is: 1000, 0100, 0010, 0001, 1000, ... Note that one of the registers must be pre-loaded with a 1 (or 0) for the counter to operate properly.

• A twisted ring counter (also called a Johnson counter or Moebius counter) connects the complement of the output of the last shift register to the input of the first register and circulates a stream of ones followed by zeros around the ring. For example, in a 4-register counter with initial register values of 0000, the repeating pattern is: 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001, 0000, ...
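The two feedback rules can be checked with a short simulation. The function names `straight_ring_step`, `johnson_ring_step` and `run` are hypothetical; the register state is modelled as a list of bits with the first register on the left.

```python
def straight_ring_step(state):
    """Shift once; the last register's output feeds the first register."""
    return [state[-1]] + state[:-1]


def johnson_ring_step(state):
    """Shift once; the complement of the last output feeds the first register."""
    return [1 - state[-1]] + state[:-1]


def run(step, state, n_states):
    """Return n_states successive register patterns as bit strings."""
    patterns = []
    for _ in range(n_states):
        patterns.append("".join(str(bit) for bit in state))
        state = step(state)
    return patterns


straight = run(straight_ring_step, [1, 0, 0, 0], 5)
johnson = run(johnson_ring_step, [0, 0, 0, 0], 9)
print(straight)
print(johnson)
```

With four registers the straight ring counter repeats after 4 states, while the twisted ring counter passes through 8 states before repeating.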



Fig.7: Ring counter, shift output fed back to input

Provision must be made for loading data into the parallel-in/serial-out shift register configured as a ring counter, shown below. Any pattern may be loaded; the most generally useful pattern is a single 1.




Fig.8: Parallel-in, serial-out shift register configured as a ring counter

Loading binary 1000 into the ring counter, above, prior to shifting yields a viewable pattern. The data pattern for a single stage repeats every four clock pulses in our 4-stage example. The waveforms for all four stages look the same, except for the one clock time delay from one stage to the next.



Fig.9: Load 1000 into 4-stage ring counter and shift

The structure of the ring counter with SR flip-flops in the existing technique is shown in Fig.10 below.



Fig.10: Ring counter with SR-Flip flops

The block diagram above shows the power-controlled ring counter. The whole counter is divided into two blocks, each with its own SR flip-flop controller.

### **D-Flip flop**

The D flip-flop is the most common flip-flop in use today. It is also known as the data or delay flip-flop, because its output Q looks like a delayed copy of its input D. The Q output takes on the state of the D input at the moment of a positive edge at the clock pin (or a negative edge if the clock input is active low), so the output follows the data input with a delay of one clock cycle. The D flip-flop can be interpreted as a primitive memory cell, a zero-order hold, or a delay line. On each active clock edge, Q<sub>next</sub> takes the value of D; otherwise it keeps the value Q<sub>prev</sub>.

### Truth table

| Clock       | D | Q<sub>prev</sub> | Q<sub>next</sub> |
|-------------|---|------------------|------------------|
| Rising edge | 0 | X                | 0                |
| Rising edge | 1 | X                | 1                |
| Non-rising  | X | Q<sub>prev</sub> | Q<sub>prev</sub> |
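The truth table can be expressed as a small behavioural model of a positive-edge-triggered D flip-flop. The class name `DFlipFlop` and the method `tick` are hypothetical names used only for this sketch, and the initial output of 0 is an assumption.

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop (behavioural model)."""

    def __init__(self):
        self.q = 0           # assumed power-up value
        self._prev_clk = 0   # previous clock level, used to detect a rising edge

    def tick(self, clk: int, d: int) -> int:
        """Apply the current clock level and D input; return Q."""
        if self._prev_clk == 0 and clk == 1:
            self.q = d       # rising edge: Q takes the value of D
        # any other clock condition: Q keeps its previous value
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
trace = [ff.tick(1, 1), ff.tick(1, 0), ff.tick(0, 0), ff.tick(1, 0)]
print(trace)
```

The second and third calls change D without a rising edge, so Q holds; only the first and last calls capture the input.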

# Present Technique:

The block diagram of the delay buffer in the present technique is shown in Fig.11 below.



Fig.11: Delay Buffer Present Technique

# Gated driver tree Technique:

The basic structure of gated driver tree is given in following block diagram.



Fig.12: Structure of gated driver tree

To save area, the memory module of a delay buffer is often in the form of an SRAM array with input/output data bus. Special read/write circuitry, such as a sense amplifier, is needed for fast and low-power operations. However, of all the memory cells, only two words will be activated: one is written by the input data and the other is read to the output. Driving the input signal all the way to all memory cells seems to be a waste of power. The same can be said for the read circuitry of the output port. In light of the previous gated-clock tree technique, we shall apply the same idea to the input driving/output sensing circuitry in the memory module of the delay buffer.
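A rough counting model shows why gating the driver tree helps. Everything in this sketch is an assumption made for illustration: a complete binary tree with one driver per branch, a word count that is a power of two, and the hypothetical function names `enabled_path` and `drivers_switched`. It counts switching drivers only and does not estimate power.

```python
def enabled_path(n_words: int, selected: int) -> list[tuple[int, int]]:
    """Return (level, node) pairs of the drivers on the path to the selected word."""
    if n_words < 2 or n_words & (n_words - 1):
        raise ValueError("n_words must be a power of two and at least 2")
    if not 0 <= selected < n_words:
        raise ValueError("selected word out of range")
    levels = n_words.bit_length() - 1   # tree depth, log2(n_words)
    # At each level, the node index is the selected address with low bits dropped.
    return [(level, selected >> (levels - level)) for level in range(1, levels + 1)]


def drivers_switched(n_words: int, selected: int, gated: bool) -> int:
    """Count the drivers that toggle when one word is accessed."""
    path = enabled_path(n_words, selected)
    if gated:
        return len(path)        # only the root-to-leaf path is enabled
    return 2 * n_words - 2      # every branch of the tree is driven


print(drivers_switched(256, selected=37, gated=False))
print(drivers_switched(256, selected=37, gated=True))
```

Under these assumptions, a 256-word memory drives every branch of the tree when ungated, but only one driver per tree level when gated.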

#### Modified ring counter

The block diagram of the ring counter used in the present technique is shown in Fig.13 below. Compared with the ring counter of the existing technique, the ring counter of the present technique does not need a separate global clock, and a C-gate element replaces the SR flip-flop.



Fig.13: Ring counter with C-gate element
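For readers unfamiliar with the C-element, its standard behaviour is sketched below: the output follows the inputs when they agree and holds its previous value when they differ. The class name `CElement` is hypothetical, a two-input element with an initial output of 0 is assumed, and the way the element is wired into the ring counter of Fig.13 is not modelled.

```python
class CElement:
    """Two-input Muller C-element (behavioural model)."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a    # both inputs agree: output follows them
        # inputs differ: output holds its previous value
        return self.out


c = CElement()
c_trace = [c.update(1, 0), c.update(1, 1), c.update(0, 1), c.update(0, 0)]
print(c_trace)
```

Because the output changes only when both inputs agree, a signal gated this way does not respond to a transition on one input alone.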

# Simulation Work On Existing & Proposed Systems

# A) Ring Counter In Existing Technique

The simulation result for the ring counter used in the existing technique is shown in Fig.14, and the RTL schematic for the RS flip-flop in the existing technique is shown in Fig.15.



Fig.14: Simulation result for ring counter in existing technique



Fig.15: RTL Schematic for RS-Flip flop in existing technique

## B) Delay Buffer in Existing Technique

The simulation result for the delay buffer used in the existing technique is shown in Fig.16, and the RTL schematic for the delay buffer in the existing technique is shown in Fig.17.







Fig.17: RTL Schematic for delay buffer in existing technique


### C) Ring Counter in Proposed Technique

The simulation result for the ring counter used in the proposed technique is shown in Fig.18, and the RTL schematic for the delay buffer in the proposed technique is shown in Fig.19.



Fig.18: Simulation results of ring counter in proposed technique



Fig.19: RTL Schematic for delay buffer in proposed technique

# RESULTS

## A) Power Results Of Delay Buffer In Proposed Technique

The power result for the delay buffer used in the proposed technique is shown in Fig.20 below. The total power consumed is observed to be 214 mW, which is lower than the power consumed by the existing technique.



Fig.20: Power results of delay buffer in proposed technique

## B) Power Comparison graph

The total power consumed by the delay buffer in the existing technique is 290 mW. The total power consumed by the delay buffer in the proposed technique is 214 mW. The comparison between the existing technique and the proposed technique is shown in Fig.21.



Fig.21: Comparison graph between existing technique and proposed technique
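The two reported totals can be turned into an absolute and a relative saving with a short calculation. The variable names are hypothetical; the only inputs are the 290 mW and 214 mW figures stated above.

```python
existing_mw = 290.0   # reported total power, existing technique
proposed_mw = 214.0   # reported total power, proposed technique

saving_mw = existing_mw - proposed_mw
saving_pct = 100.0 * saving_mw / existing_mw

print(f"Saving: {saving_mw:.0f} mW ({saving_pct:.1f}% of the existing design)")
```

On these figures the proposed technique saves 76 mW, roughly 26% of the power consumed by the existing technique.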

### Conclusion

This paper has presented a low-power delay buffer architecture that adopts several novel techniques to reduce power consumption. The ring counter with its clock gated by the C-elements can effectively eliminate excessive data transitions without increasing the loading on the global clock signal. The gated-driver-tree technique used for the clock distribution network can eliminate the power wasted on drivers that need not be activated. A gated-demultiplexer tree and a gated-multiplexer tree are used for the input and output driving circuitry to decrease the loading of the input and output data buses. All gating signals are easily generated by a C-element taking its inputs from DET flip-flop outputs of the ring counter. The power optimization was observed, together with the simulation results, using the Xilinx 12.3 tool. Power consumption is known to be a crucial issue in current IC designs. In this project, the main way to tackle it is to increase the number of blocks in the ring counter. Increasing the number of blocks, however, also increases the number of C-gate elements, so the lower power consumption comes at the cost of greater circuit complexity.


**AUTHOR LIST** 



Mrs. Rajya Lakshmi Garimidi received M.Sc. (Electronics) and M.Tech (Microwave) degrees from ANU, Guntur. Presently she is working as Assistant Professor at Vignan Institute of Management and Technology for Women, Hyderabad, Andhra Pradesh, India. She has 3+ years of teaching experience in the field of Electronics and Communication Engineering. She can be reached at garimidi.nallapati@gmail.com. Cell no: +91 9866973299



Mr. Radha Krishna Thippineni received B.E (ECE) from Andhra University and M.Tech (MWE) from ANU, Guntur. Presently he is working as Associate Professor at SRI VANI SCHOOL OF ENGINEERING, VIJAYAWADA, KRISHNA DIST., Andhra Pradesh, India. He has 5+ years of teaching experience in the field of Electronics & Communication Engineering. He can be reached at neni.rkc@gmail.com. Cell no: +91 9346092345



Lowkya Chandaka obtained her Bachelor's degree from JNTUK. Later she obtained her Master's degree from JNTUK with specialization in VLSI System design. Her areas of interests include Digital Integrated Circuits, VLSI Design and Low power VLSI Circuits. Presently She is working as an Assistant Professor in Raghu Engineering College, Dakamarri, A.P, India. She can be reached at lowkya,vlsi@gmail.com



Basheer Ali Sheik received a Master's degree in Radar and Microwave Engineering and a Bachelor's degree in Electronics and Communication Engineering, both from Andhra University. He is a research scholar in the field of Integrated Electronics. He has 2 years of experience in the field of Electronics and Communication Engineering and is presently working as Assistant Professor in the Department of Electronics and Communication Engineering, Raghu Engineering College, Dakamarri, A.P, India. He can be reached at basheeralis@yahoo.com



Pilla Mahesh received a Master's degree in Radar and Microwave Engineering and a Bachelor's degree in Electronics and Communication Engineering, both from Andhra University. He worked as an Assistant Professor for 3 years at Vignan Engineering College. He is presently working as Assistant Engineer in APTRANSCO, SKota, A.P, India. He can be reached at pillaamahi@yahoo.co.in



Lade Srikanth received a Bachelor's degree in Electronics and Communication Engineering from JNTUH. He is a research scholar in the field of digital integrated circuits. He has 3 years of experience in the field of Electronics and Communication Engineering and is presently working as Assistant Professor in the Department of Electronics and Communication Engineering, Raghu Engineering College, Dakamarri, A.P, India. He can be reached at srikanth451@gmail.com.