Thursday, July 30, 2020

Different Types of Clock Tree Structure

In the last post, we discussed the parameters required for a good CTS. In this post we will discuss the various clock tree structures widely used in the industry, each having its own merits and demerits.
Let's discuss the different clock tree structures one by one.

Conventional CTS/Single point CTS: 
Single point CTS is the default choice for most designers working at lower frequencies with a smaller number of sinks. As the name suggests, it has a single clock source which distributes the clock to every corner of the design.

Figure 1 : Single Point CTS
                                       
In single point CTS the point of divergence lies at the clock source, so sinks share a very large uncommon clock path, making the tree more susceptible to OCV variation. On the other hand, clock gates can be strategically placed near the source, saving a large amount of dynamic power.

Advantages:
Simplicity of Implementation
Better clock gating, reducing Power dissipation

Disadvantages:
Higher Insertion delay
More uncommon clock path, more prone to OCV variation
Difficult to achieve lower skew, due to asymmetric distribution of sinks.
So, conventional CTS is not a good choice for high-frequency clocks with a large number of sinks spread all over the core region.

Clock Mesh Structure
As the name suggests, it creates a dense mesh of shorted wires driven by mesh drivers to distribute the clock to every corner of the design.
In the mesh structure, there is a network of pre-mesh drivers to drive the clock signal from the clock port to the inputs of the mesh drivers. The outputs of all the mesh drivers are shorted using a metal mesh, which carries the clock signal across the block using horizontal and vertical metal stripes.
Figure 2: Clock Mesh
                                 
As we can see from Figure 2, the mesh drivers are connected to the mesh net as a multi-driven net. The benefit of the mesh net is that it smooths out the arrival time differences between the multiple mesh drivers that drive it. If the mesh net is dense enough, only a few stages of clock network (2-3 stages) are required to reach all the sequential elements, so the uncommon clock path is very small (late point of divergence) and the tree is far less prone to OCV variation. In a clock mesh structure, power dissipation is high because clock gating cells can only be inserted after the mesh net, which means the mesh itself is always on and switching continuously (clock gating is done only at the local level).

Advantages:
Lower Skew
Highly tolerant to On-Chip Variation
Possible to achieve lower insertion delay

Disadvantages:
More power dissipation (Dynamic)
More routing resources required for creating mesh
Difficult to implement

Multi-Source CTS:
Multi-source CTS is a hybrid approach between conventional CTS and a clock mesh. It involves a global distribution network in the form of a sparse mesh or an H-tree with tap points strategically inserted at different locations. These tap points are followed by local clock trees that route the clock from the tap points to the sequential cell clock pins.

Advantages:
More common clock path than single point CTS, so less prone to OCV variation than single point CTS
Fewer routing resources required than a mesh-based clock tree
Less power dissipation than a mesh-based clock tree
Less insertion delay compared to a conventional clock tree
Lower skew

Finally, the table below summarizes the design metrics for the different variants of clock tree:

Design Metric          | Single Point CTS | Mesh-based CTS | Multi-Source CTS
Power                  | Low              | High           | Moderate
Performance            | Low              | High           | Moderate
Area/Routing Resources | Low              | High           | Moderate
Impact of OCV          | High             | Low            | Moderate

Let me know in case you have any queries. Stay Blessed!!

Wednesday, July 29, 2020

Parameters required for Good CTS

CTS is one of the most important stages in PnR; CTS QoR decides the timing convergence and power. In most IC designs, the clock tree contributes 30-40% of the power dissipation. With technology advancement, clock tree robustness has become even more critical, affecting overall SoC performance. Before taking a deep dive into CTS, we will first understand the quality parameters required for a good CTS.
  • Minimum Latency: Latency is defined as the total time taken by the clock to propagate from the clock source to the sink pins of the flops/sequential devices (CK pin of a D flip-flop). We target minimum latency because it needs fewer clock cells in the clock path, which means less power dissipation, less area consumed and more routing resources left available.
  • Minimum Skew: Skew is defined as the latency difference between two flops (see the sketch just after this list). Minimum skew helps in timing closure, especially hold timing, but targeting too aggressive a skew can be counterproductive: the overall latency of the design increases, the number of clock cells increases, the uncommon clock path grows and the power dissipation goes up.
  • Minimum Uncommon Clock Path: Registers must have a minimum uncommon clock path, since timing derates (OCV variation) are applied only on the uncommon path; with a large uncommon clock path it becomes difficult to close timing across scenarios.
  • Duty Cycle: Duty cycle is defined as the fraction of one period in which a signal is in its active state. Maintaining a good duty cycle is also an important requirement in CTS, because if the duty cycle gets distorted (DCD), then after a few stages of logic we may fail the minimum pulse width requirement and face MPW violations.
  • Minimum Power Dissipation: To reduce power dissipation we do things like clock gating at the architectural level and applying tighter DRV-related constraints.
  • Signal Integrity: Since clock nets have very high switching activity, they are more prone to noise and EM violations, so we build the clock tree with NDR rules (e.g. double width, double spacing) or whatever routing rules are defined for the clock nets. Increasing the spacing helps reduce parasitic coupling capacitance, and increasing the width helps address EM.
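To make the latency and skew definitions concrete, here is a minimal Python sketch; the sink names and latency values are made up purely for illustration, not taken from any real design:

```python
# Hypothetical per-sink clock latencies in ns (illustrative values only)
sink_latency = {
    "u_core/reg_a/CK": 0.412,
    "u_core/reg_b/CK": 0.395,
    "u_mem/reg_c/CK":  0.448,
    "u_io/reg_d/CK":   0.401,
}

max_latency = max(sink_latency.values())   # insertion delay of the slowest sink
min_latency = min(sink_latency.values())
global_skew = max_latency - min_latency    # latency difference between the extreme sinks

print(f"Max insertion delay : {max_latency:.3f} ns")
print(f"Global skew         : {global_skew * 1e3:.1f} ps")
```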



Wednesday, June 17, 2020

Target Skew

Target skew is the skew value on which the CTS engine will try to build a balanced clock tree. In this post we will discuss the factors on which we choose the target skew of our design and how those factors affect the design QoR.
As a designer, the general tendency is to aim for zero skew and a perfectly balanced clock tree, but zero skew is not good for the design overall. Why? Think about it in terms of latency, buffer count, dynamic power and congestion.
For zero skew, the overall latency of the design increases, as more clock buffers/inverters are needed to balance the flops, which can increase the uncommon clock path (more prone to OCV variation) and raise dynamic power dissipation, since all the flops and buffers toggle at the same time. Each clock net also takes roughly double the routing resources because of the NDR settings applied on clock nets, so congestion increases as a lower skew is targeted.
As technology shrinks, it becomes more critical to close timing across corners. Skew has a direct impact on setup/hold, and the main motivation for attaining zero skew is hold timing across all the corners. So, by an optimal selection of the skew number (target skew), we can decrease clock power consumption and clock buffer/inverter count, and get a significant congestion reduction.

How do we arrive at the target skew value?
For the target skew we have to do multiple experiments: build the clock tree with a defined target skew while keeping the constraints (SDC) constant, and then analyze the different skew numbers based on latency, power and congestion.
We know the hold timing equation of a flop, i.e.,

Figure 1: Timing Path


                               Tck->q + Tcomb > TSkew  + THold

We can re-write this as,
                               TSkew  < Tck->q + Tcomb - THold
For the worst-case hold scenario, let's suppose Tcomb = 0 (flops sitting very close, with no logic path between them); the above equation can be rewritten as
                              TSkew  < Tck->q  - THold
Let's assume the flop delay in the worst case is 100 ps and the hold time is 30 ps, so
                                  TSkew  < 70 ps
which means there is scope for ~70 ps of skew without degrading hold timing in the worst case.
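The same budget arithmetic can be put in a tiny script so it is easy to redo for a different library corner; this is only a sketch using the illustrative numbers above:

```python
# Hold-based skew budget: Tskew < Tck->q + Tcomb - Thold
# Numbers below are the illustrative values from the text (worst case, Tcomb = 0)
t_ck_to_q = 100e-12   # worst-case flop clk-to-q delay (s)
t_comb    = 0.0       # back-to-back flops, no logic in between
t_hold    = 30e-12    # flop hold requirement (s)

skew_budget = t_ck_to_q + t_comb - t_hold
print(f"Usable skew budget ~ {skew_budget * 1e12:.0f} ps")   # prints ~70 ps
```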
So we do multiple iterations, setting the target skew in a range of about ±30 ps, and analyze the above factors as well as the timing (NFE) in all the corners.

NFE -> No. of failing endpoints

Let me know if you have any doubts. Stay Blessed!!


Sunday, June 7, 2020

Temperature Inversion

Temperature inversion is a phenomenon seen at lower technology nodes, in which the delay of a cell decreases as the temperature rises, contrary to the delay behaviour at higher nodes.
Let's unfold this.
If you look at the MOSFET drive current equation,

Saturation current equation


So, ID varies linearly with µ (mobility) and quadratically with (VGS - Vt), the overdrive voltage. We can conclude that the delay of a cell depends on two factors: the mobility and the threshold voltage (Vt) of the transistor.

How do mobility & Vt depend on temperature?
With a rise in temperature, the lattice atoms vibrate more, so the mobility of the charge carriers decreases, which tends to increase the delay of a cell.

The threshold voltage also decreases with a rise in temperature, as the number of minority carriers in the substrate increases, so a lower Vt than usual is required to form the channel.
To summarise, an increase in temperature makes the delay of a cell:
  • decrease, due to the decrease in threshold voltage,
  • increase, due to the decrease in mobility.
So the delay of a cell may increase or decrease depending on which factor, mobility or threshold voltage, dominates the final current.

When VGS - Vt, the overdrive voltage, is large (higher nodes -> high VGS), the decrease in threshold voltage due to temperature has a negligible effect on the overall overdrive voltage, so the mobility factor dominates, and as a result the delay of the cell increases with rising temperature.

When the overdrive voltage (VGS - Vt) is small (lower nodes -> lower VGS), the decrease in threshold voltage with rising temperature dominates the overall overdrive voltage, and as a result the delay of the cell decreases with rising temperature.
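A toy square-law model makes the two competing effects easy to play with; the mobility exponent, the Vt temperature coefficient and the voltages below are rough assumed numbers, not data from any real process:

```python
# Toy model: Id ~ mobility(T) * (Vgs - Vt(T))^2  (square-law, illustrative constants only)
def drive_current(vgs, temp_k, vt0=0.45, t0=300.0, k_vt=1e-3, mu_exp=1.5):
    mobility = (temp_k / t0) ** (-mu_exp)   # mobility falls as temperature rises
    vt = vt0 - k_vt * (temp_k - t0)         # Vt falls with temperature (~1 mV/K assumed)
    return mobility * (vgs - vt) ** 2

for label, vgs in [("high VDD (older node)", 1.8), ("low VDD (lower node)", 0.7)]:
    i_cold = drive_current(vgs, 300.0)      # ~27 C
    i_hot  = drive_current(vgs, 398.0)      # ~125 C
    trend = "faster when hot (temperature inversion)" if i_hot > i_cold else "slower when hot (classical)"
    print(f"{label}: Id(hot)/Id(cold) = {i_hot / i_cold:.2f} -> cell is {trend}")
```

With the high overdrive the mobility loss wins and the cell slows down when hot; with the low overdrive the Vt reduction wins and the cell speeds up, which is exactly the inversion described above.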

One thing to note is that temperature inversion comes into the picture at lower nodes (lower voltages), with a more prominent effect on HVT cells.

Friday, June 5, 2020

ICG Optimization

In the previous post we read about the ICG enable timing problem; to overcome it we use the ICG optimization technique in the pre-CTS stage.
ICG optimization is executed during the place stage and performs:
  • Dummy CTS
  • ICG Splitting
  • Clock aware placement
Dummy CTS
In dummy CTS, the tool builds a dummy clock tree to identify the critical ICGs. It is called a dummy clock tree because in the CTS stage the tool builds the actual clock tree and discards the dummy one.
The benefits of dummy CTS are:
  • Accurate identification of the ICG enable critical timing paths with the help of dummy CTS.
  • Accurate data path optimization of timing-critical ICG enable paths in the place stage.
  • Effective ICG splitting & clock-aware placement (discussed below).
One thing to take care of is that all CTS-related settings, such as clock tree exceptions, NDR rules, layers, etc., should be applied before running the place stage, so that the dummy CTS and the actual CTS clock trees correlate as closely as possible for optimum ICG optimization.

ICG Splitting
After dummy CTS, ICG optimization performs ICG splitting. We know that if an ICG cell drives many registers, the ICG downstream latency increases, leading to more enable-timing-critical paths.
The tool identifies the critical ICGs during dummy CTS and performs ICG splitting, i.e. it clones one ICG into many ICGs and places them near the registers they drive, reducing the ICG downstream latency for better ICG enable timing.
ICG splitting is timing driven, meaning only ICGs with enable timing violations are split.

Figure 1: ICG Splitting

Clock aware Placement
In the last step of ICG optimization the tool performs clock-aware placement, i.e. after the timing-driven splitting of the ICGs, it places the ICGs with critical enable timing close to the register clusters they drive (as shown in Figure 1).

One thing to note is that ICG optimization may increase the dynamic power dissipation of the design.
Let me know if you have any doubts. Stay Blessed!!

Wednesday, June 3, 2020

ICG Enable Timing Problem

As we know, integrated clock gating (ICG) cells are used to reduce dynamic power dissipation in the design and are enabled by control logic. To get a glitch-free output from the ICG cell, the timing requirements (setup/hold) must be met at the enable pin of the ICG cell.

Figure1: ICG cell

In the figure above, the ICG cell drives multiple flops and is enabled by the control logic flop R1. L2 and L3 are the latencies from the clock port to the ICG and to the flops respectively, so the ICG latency (the latency from the ICG output clock to the flops) will be

                                      ICG latency = L3-L2

Ideally one ICG cell could drive any number of flops, but as the number of flops driven by the ICG increases, the tool adds more buffers in the clock path to balance the clock tree, which increases the ICG latency.
As L3 increases, the ICG latency increases; since the clock period is fixed, less of the clock period is now available to meet setup timing at the EN pin.
So we can conclude that the larger the ICG latency, the more critical the ICG enable timing.
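A rough back-of-the-envelope sketch of the EN-pin setup check shows the trend; the slack equation is a simplification (launch flop balanced to the sink latency L3, capture at the ICG clock pin at latency L2) and all numbers are assumed, not from any real design:

```python
# Simplified EN-pin setup check (all values in ns, illustrative only)
t_clk     = 1.0    # clock period
t_ck_to_q = 0.10   # clk-to-q of the enable flop R1
t_comb_en = 0.45   # combinational delay of the enable control logic
t_setup   = 0.05   # setup requirement at the ICG EN pin

def en_setup_slack(icg_latency):
    # data arrives at:   L3 + Tck->q + Tcomb        (launch flop balanced to latency L3)
    # capture edge at:   Tclk + L2, with icg_latency = L3 - L2
    return t_clk - icg_latency - t_ck_to_q - t_comb_en - t_setup

for icg_latency in (0.0, 0.2, 0.4):
    print(f"ICG latency {icg_latency:.1f} ns -> EN setup slack {en_setup_slack(icg_latency):+.2f} ns")
```

Note that the first line (ICG latency = 0) is essentially the ideal-clock picture, which is why the check looks comfortable before CTS.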
It is always advisable to address ICG timing in the place/pre-CTS stage, as after CTS it can be too late to fix ICG timing violations in the design.
We know that pre-CTS timing analysis uses ideal clock latency for all clock pins, which means L2 = L3 and the ICG latency is 0.
With the ICG latency at 0, the ICG enable timing analysis becomes too optimistic, because the enable now appears to get the full clock period to meet setup at the enable pin, instead of the clock period reduced by the ICG latency. So in pre-CTS the actual ICG violations are not seen, and therefore not fixed in the design.
To overcome this design problem, ICG optimization is a technique recommended for designs with critical ICG enable timing. In the next post we will discuss the ICG optimization technique and how it is executed.
Let me know if you have any doubts. Stay Blessed!!

Wednesday, May 20, 2020

How to Connect Power Switches

Power switches help in reducing leakage power. Today we will discuss how to connect power switches, as the way they are connected has a direct impact on rush current and wake-up time. To know more about inrush current and wake-up time, please read about power switches in my previous post.
There are multiple ways to connect the enable pins of power switches, but the best way to mitigate the inrush current is to connect the power switch enable pins in a daisy-chain fashion.
In a daisy chain, the enable pin of one power switch is connected to the enable pin of the next, which allows the power switches to wake up sequentially, one after another, instead of all at once.


Figure 1: Daisy Chain

If the power switches are enabled simultaneously, a huge rush current results, since the ramp-up time is minimised.



Figure 2 : Simultaneous Enable

Referring to Figure 2, we can see that four power switch columns power up at the same time. Suppose one column draws a current I from the grid; in the simultaneous-enable fashion the grid has to supply 4I at once, which greatly increases the rush current.
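To put a number on why staggering helps, here is a rough sketch that models each column's inrush as a decaying exponential and compares the peak grid current for the two enabling styles; the column count, peak current and time constants are assumed values:

```python
import math

# Crude model: each enabled column draws I0 * exp(-t / TAU) from the grid
N_COLUMNS = 4        # number of power-switch columns
I0        = 1.0      # peak inrush of a single column (normalised)
TAU       = 5e-9     # decay time constant of a column's inrush (s)
STAGGER   = 10e-9    # daisy-chain delay between successive columns (s)

def total_current(t, stagger):
    """Sum the inrush of every column that has been enabled by time t."""
    total = 0.0
    for k in range(N_COLUMNS):
        t_on = k * stagger
        if t >= t_on:
            total += I0 * math.exp(-(t - t_on) / TAU)
    return total

times = [i * 1e-9 for i in range(60)]
peak_simultaneous = max(total_current(t, 0.0) for t in times)
peak_daisy_chain  = max(total_current(t, STAGGER) for t in times)

print(f"Peak rush current, simultaneous enable: {peak_simultaneous:.2f} x I0")  # ~4 x I0
print(f"Peak rush current, daisy chain        : {peak_daisy_chain:.2f} x I0")   # ~1.2 x I0
```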


Please refer to the link below to know more about different styles of connecting the enable pins of power switches.

Saturday, May 9, 2020

Header Switch v/s Footer Switch

When it comes to choosing between these two switching strategies (header or footer), area, IR drop, efficiency and design architecture issues are the key metrics.
In principle, both could be used together. However, that would generate a big IR drop, which in turn could cause large standard cell delays.

Consider the header switch, which is implemented using pMOS transistors to control the VDD supply. pMOS transistors are less leaky than nMOS transistors of the same size. The disadvantage is that a pMOS has less drive capability than an nMOS of the same size (the mobility of electrons is higher than that of holes).

Consider the footer switch, which is implemented using nMOS transistors to control the VSS supply. The advantage of a footer is its higher drive strength and lower IR drop. Thus, to achieve the same drive strength and IR drop, a design with footers requires fewer switches than a design with headers. However, nMOS transistors are leakier than pMOS, and the design becomes more sensitive to ground noise coupled onto the virtual ground through the footer switch.

Despite the obvious advantages of the footer switch, header switches are generally used in power-gated designs, because multi-voltage designs demand level shifters on signals between blocks with different supply voltages. These elements have a common ground and two power supplies; in this case, footers should not be used.

Friday, May 8, 2020

What is Power Gating Technique

Power gating is a technique to reduce Leakage power dissipation by switching off the power of blocks that are in standby mode and switching the power back on when their functionality is required.
In order to switch power off, high Vt transistors are used as switches and placed between the block PG (power and ground) pins and the PG rails.

There are two types of power switches:

Header switch: pMOS transistor placed between VDD and the power rails.
Footer switch: nMOS transistor placed between GND and the power rails.

Figure 1: Header & Footer Switch


This way the controlled block is no longer powered by the main power rails (always-on rails), but by a switched power rail (collapsible power).
There are multiple aspects designers need to take care of, including IR drop, ramp-up time, rush current, and the number of power switches added in the design. Let's discuss them one by one: what they are and what their significance is.

Rush Current: The current drawn by the circuit during power-up is known as rush current. When the circuit powers up, all the capacitors charge simultaneously, drawing a large amount of current from the grid and causing a sudden rush of current that can damage the power switch network.
Usually we use multiple power switches in parallel to divide the power domain supply into small blocks; by doing this the load on each power switch is reduced, which helps in managing the rush current.

Ramp-up Time: The time required to power up a shut-down domain is known as the ramp-up time. It should be as small as possible.

IR Drop: To handle rush current, power switches are designed to have a high channel resistance, which leads to a high IR drop across the power switch; this degrades the supply reaching the design and can lead to functionality failure (as only a reduced supply reaches the logic cells).

So, we have to design the power switch network to have a low IR drop while keeping the rush current in mind. This is achieved by using two types of power switches: one used during power-up to handle the rush current, and the other used during normal operation.

Leakage Current: Although the switch cells we use are high-Vt, they still contribute to the leakage current, so we have to take care of the number of power switches used in the design.

The requirement is to minimise the ramp-up time as much as possible while keeping the rush current under control.

To meet this, we have different varieties of power switches. Some power switch cells have one enable control (for gating the power) and some have two enable controls. The ones with two enables provide a way to control the ramp-up time of the power switch network and the rush current at power-up.
This is achieved by carefully designing the cell so that internally one enable pin is connected to a smaller switch and the other enable pin is connected to a bigger switch (two switches in one power switch cell).
To power up the switch network, the smaller switch is turned on first to slowly bring the power supply to the expected voltage level, keeping the rush current under control. Once the rail reaches a certain voltage level, the larger switch is turned on for normal operation of the power domain logic cells while managing IR drop.
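A crude RC sketch of that two-stage turn-on is shown below; the rail capacitance, the two on-resistances and the hand-off threshold are all assumed numbers, only meant to show how the weak switch limits the inrush and the strong switch finishes the ramp:

```python
# Crude Euler simulation of a virtual rail ramped up through a two-stage power switch
VDD      = 0.8      # supply voltage (V)
C_RAIL   = 1e-9     # lumped virtual-rail capacitance (F), assumed
R_WEAK   = 200.0    # on-resistance of the small, first-enabled switch (ohm), assumed
R_STRONG = 2.0      # on-resistance of the large, second-enabled switch (ohm), assumed
HANDOFF  = 0.9      # enable the strong switch once the rail reaches 90% of VDD
DT       = 1e-10    # simulation time step (s)

v = t = peak_i = 0.0
while v < 0.999 * VDD:
    r = R_WEAK if v < HANDOFF * VDD else R_STRONG
    i = (VDD - v) / r                 # current through whichever switch is active
    peak_i = max(peak_i, i)
    v += (i / C_RAIL) * DT            # charge the virtual rail
    t += DT

print(f"Ramp-up time ~ {t * 1e9:.0f} ns, peak inrush ~ {peak_i * 1e3:.0f} mA")
print(f"(enabling only the strong switch from 0 V would peak at ~{VDD / R_STRONG * 1e3:.0f} mA)")
```

The peak here occurs at the hand-off, and it is still an order of magnitude below what a direct strong-switch enable from 0 V would draw, which is the whole point of the two-enable cell.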

In the next post we will discuss some power switch chaining styles to reduce the power switch rush current. Stay Blessed :)

Wednesday, May 6, 2020

What do you mean by Testing & Verification

Testing is a process to determine the presence of faults (not the absence of faults) in a given circuit, whereas verification is the process of determining the correctness of a design, i.e. whether the design works correctly or not.
Some other differences are listed below:

Testing                                                                     | Verification
Verifies the correctness of the manufactured hardware (gate-level circuit) | Verifies the correctness of the design, i.e. whether the design works correctly or not
We have hardware on which we can apply inputs and observe the outputs      | Works on a design that is still on paper
Responsible for the quality of the devices                                  | Responsible for the quality of the design
Performed on every manufactured device                                      | Performed only once, prior to manufacturing

Testing can be done at the following levels:
  • Chip level
  • Board level
  • System level
As we move down this list (from chip level to board level to system level), the cost of testing increases by roughly 10x at each step, so it is better to catch faults at the earliest level possible.
One point to note is that no amount of testing gives us a guarantee that a circuit is fault-free; the more we test, the more confidence we gain that the circuit works properly.