# A Bayesian Based EDA Tool for Accurate VLSI Reliability Evaluations

Walid Ibrahim, Azam Beg
College of Information Technology, UAE University
{walidibr, abeg}@uaeu.ac.ae

### **Abstract**

As (nano)device sizes are aggressively scaled deep into the nanometer regime, the design and manufacturing of future nano-circuits will become extremely complex and will inevitably introduce more defects, while their functioning will be adversely affected by transient faults. Therefore, accurately calculating the reliability of future designs will become very important for nano-circuit designers as they investigate design alternatives to optimize the trade-offs between the conflicting metrics of area-power-energy-delay versus reliability. This paper introduces a novel EDA tool (NANO-CR-EDA<sup>2</sup>) for accurately calculating the reliability of future nano-circuits. Our aim is to provide both educational and research institutions (as well as the semiconductor industry at a later stage) with an accurate and easy-to-use tool for comparing the reliability of different design alternatives, and for selecting the design that best fits a given set of (design) constraints.

# 1. Introduction

Over the last four decades, CMOS scaling has been the means by which the semiconductor industry has achieved its historically unprecedented gains in productivity and performance. CMOS devices have undergone a steady reduction in feature size and are now fast approaching the ten-nanometer mark. At the same time, advances in nanotechnologies such as DNA scaffolding and self-assembly have raised expectations of a new set of nano-scale devices such as carbon nanotubes (CNTs), resonant tunneling diodes (RTDs), single-electron transistors (SETs), and molecular switches [1].

However, designs based on future nano-devices are expected to be highly unreliable and to experience high defect and transient error rates [2]. The massive scaling of CMOS devices deep into the nanometer regime will introduce severe static and dynamic parameter fluctuations at the material, device, and circuit levels [2]. These parameter variations will dramatically increase the complexity of future IC manufacturing and will inevitably introduce more defects. The failure rate of SET devices is expected to be around 10% [3], and as high as 30% for self-assembled DNA [4], while a recently manufactured 160 Kbit molecular memory was reported to have defect rates of 60% [5]!

At the same time, the devices' small sizes (and, consequently, the tiny amounts of energy required for their switching) make them highly susceptible to transient failures [6]. Transient failures may be caused by sources such as thermal noise, electromagnetic interference, and terrestrial radiation. Since capacitances and voltages will decrease substantially in future technologies, only a few electrons will be needed to flip the state of a memory cell, flip-flop, or latch. Although such an event is highly unlikely for any single device, soft errors are becoming a major reliability concern for future systems built with nano-devices because of the huge number of devices these systems are expected to contain.

Accurately calculating (estimating) the reliability of nano-architectures through simulation is going to become essential for future designs. It would not only allow verifying theoretical results, but could also help in designing/selecting the most suitable (nano)architecture, i.e., the one that optimally trades delay, power, and area against reliability. Hence, there will be a growing need to estimate reliability/yield accurately (see [6], [7]). As a result, the reliability community will be forced to investigate thoroughly what exactly determines the reliability margins of such designs, and how reliability assessment methodologies can be changed to gain additional reliability margin for the most advanced technologies.

The objective of this paper is to provide educational and research institutes, as well as (eventually) the semiconductor industry, with an easy-to-use Electronic Design Automation (EDA) tool that enables accurate reliability evaluations of nano-circuits. The Nano-Circuit-Reliability EDA-tool for Evaluation of Design Alternatives (Nano-CR-EDA<sup>2</sup>) allows circuit designers to evaluate and assess the reliability of different (nano-)architectures, and to select the architecture that best trades area, speed, and power against reliability.

The paper is organized as follows. Section 2 reviews the different approaches to reliability/yield estimation. The NANO-CR-EDA<sup>2</sup> tool is explained in detail in Section 3, followed by the experimental results in Section 4. Section 5 concludes the paper and outlines future research directions.

# 2. Reliability/Yield Estimations

Calculating the reliability of a large (i.e., complex) logic circuit can be done analytically, using mathematical evaluations/equations (EQ), and/or with simulation-based methods. The methods used for simulating stochastic systems can be further divided into experimental and numerical methods. The most popular experimental method is Monte Carlo (MC) simulation, which reproduces the (random) behavior of the system. Once the model is built, the computer performs as many sample runs from the model as necessary to draw meaningful conclusions about the model's behavior. The biggest advantages of MC are its intuitiveness and its ability to simulate models for which deterministic solutions are intractable. Because MC is (very) time consuming, its use appears to be limited, but the (precise) reliability results it produces could be collected and used at higher levels (e.g., as parameters in future gate libraries).
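As a rough illustration of the MC approach, the sketch below estimates the probability of failure of a hypothetical two-gate circuit (n1 = NAND(a, b), y = NAND(n1, c)) by repeated sampling. It assumes uniformly random inputs and independent von Neumann faults with the same $pf_{GATE}$ on every gate; the circuit, the function names, and the sample count are illustrative assumptions, not part of the tool.

```python
import random

def nand(x, y):
    return 1 - (x & y)

def flip(value, pf, rng):
    """von Neumann fault: the gate output is inverted with probability pf."""
    return value ^ 1 if rng.random() < pf else value

def toy_circuit(a, b, c, pf=0.0, rng=None):
    """Hypothetical circuit n1 = NAND(a, b), y = NAND(n1, c); pf = 0 gives the fault-free output."""
    n1 = nand(a, b)
    if pf > 0:
        n1 = flip(n1, pf, rng)
    y = nand(n1, c)
    if pf > 0:
        y = flip(y, pf, rng)
    return y

def mc_pf_cir(pf_gate, samples=200_000, seed=1):
    """Monte Carlo estimate of the circuit probability of failure."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(samples):
        a, b, c = (rng.getrandbits(1) for _ in range(3))
        # A sample run counts as a failure when the faulty output differs from the fault-free one.
        if toy_circuit(a, b, c, pf_gate, rng) != toy_circuit(a, b, c):
            failures += 1
    return failures / samples

if __name__ == "__main__":
    print(mc_pf_cir(0.05))  # close to the exact value 0.0725
```

For this toy circuit the exact value is $1.5\,pf_{GATE} - pf_{GATE}^2$ (0.0725 for $pf_{GATE} = 0.05$), so the estimate can be checked against it. The statistical error shrinks only with the square root of the number of samples, which is why MC becomes expensive for large circuits.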

Numerical algorithms (ALG) are designed for analyzing stochastic models without reproducing their random behavior: they deliver the same results for the same model parameters. These methods work by describing the flow of probabilities within the system, usually by means of differential equations and numerical methods for solving them. Markov chains can be used for describing and analyzing models that contain exclusively exponentially distributed state changes. Depending on the nature of the time domain, there are discrete-time Markov chains (DTMCs) and continuous-time Markov chains (CTMCs). The interested reader can find many earlier tools (including REL70, RELCOMP, CARE, CARSRA, CAST, CARE-III, ARIES-82, SAVE, MARK, HARP, SHARPE, GRAMP, SURF, SURE, SUPER, ASSIST, SPADE, METFAC, and ARM) in the excellent review [8]. These were followed by several other reliability tools and methods, including PRISM [9], proxel [10], PTM [11], PGM [12], and Bayesian Networks (BN) [13].
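To contrast this with MC, the following sketch analyzes a small DTMC numerically: it propagates the state-probability vector step by step instead of sampling, so the same parameters always give the same answer. The model (a triple-modular-redundant system with three identical modules, a perfect majority voter, a per-step module failure probability p, and no repair) is a hypothetical example chosen only for illustration.

```python
from math import comb

def tmr_transition_matrix(p):
    """DTMC over k = number of failed modules (0..3).

    Each working module fails independently with probability p per step;
    failed modules are never repaired, so state 3 is absorbing.
    """
    P = [[0.0] * 4 for _ in range(4)]
    for k in range(4):
        working = 3 - k
        for j in range(working + 1):  # j new failures during this step
            P[k][k + j] = comb(working, j) * p**j * (1 - p) ** (working - j)
    return P

def step(dist, P):
    """One step of probability flow: dist_next[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]

def tmr_reliability(p, steps):
    """Probability that the (perfect) majority voter still sees >= 2 good modules."""
    P = tmr_transition_matrix(p)
    dist = [1.0, 0.0, 0.0, 0.0]  # all three modules working at t = 0
    for _ in range(steps):
        dist = step(dist, P)
    return dist[0] + dist[1]

if __name__ == "__main__":
    print(tmr_reliability(0.01, 50))
```

For this model the result can be checked against the closed form $3r^2 - 2r^3$ with $r = (1-p)^t$, the probability that at least two of the three modules are still working after $t$ steps.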

It follows that any approach for estimating reliability has to be based on one of, or a combination of, these *three different alternatives*: EQ, ALG, and MC. Each of these three methods can be applied at different levels, of which the following *four levels* can easily be identified: device, gate (tens of devices), circuit/core (thousands of gates), and network (on chip) (many/multi cores). The three methods and the four levels lead to 30 different possible combinations (see Fig. 1). Unfortunately, very few of these 30 alternatives have been (or are being) used, and more than half of them have never been explored. Using equations alone (alternative #1) is probably the fastest approach (but not necessarily the most accurate one), while using MC up to the circuit/core level (alternatives #29 and #30) is certainly the most time-consuming alternative, but one that leads to quite accurate solutions.

Fig. 1. Possible alternatives for reliability/yield estimations.

# 3. Nano-Circuit Reliability EDA-tool for Evaluation of Design Alternatives

The aim of the NANO-CR-EDA<sup>2</sup> tool is to provide educational and research institutes, as well as (eventually) the semiconductor industry, with an accurate and easy-to-use way to assess and compare the reliability of alternative architectural designs. The current version of the tool is limited to calculating the probability of failure at the circuit/core level ( $pf_{CIR}$ ) from a gate probability of failure ( $pf_{GATE}$ ) provided by the user. As shown in Fig. 1, users can use a combination of the MC, ALG, and EQ approaches to calculate $pf_{GATE}$.



Fig. 2. Two full-adder implementations: (a) NAND based; (b) Minority based.

The tool uses the BN numerical method to calculate  $pf_{CIR}$  using the  $pf_{GATE}$  provided by the user. The BN method has been selected for three main reasons:

1. It is known to be a powerful tool for calculating reliability, especially for problems involving uncertainty.
2. It scales well as the numbers of nodes and of input and output signals increase, whereas other numerical methods such as PTM and PGM suffer as the size of the problem increases.
3. Several open-source libraries and algorithms for solving large BNs accurately and efficiently are available, which makes it very attractive to use them to model and calculate the reliability of nano-architectures.
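The kind of computation a BN performs for $pf_{CIR}$ can be illustrated on a hypothetical two-gate circuit (n1 = NAND(a, b), y = NAND(n1, c)). In the sketch below, each gate is a node whose conditional probability table (CPT) gives the correct output with probability $1 - pf_{GATE}$ and the inverted output otherwise; $pf_{CIR}$ is then obtained exactly by summing the joint distribution over all assignments. This brute-force enumeration is exponential in the number of nodes and is used here only for clarity; it stands in for the exact inference algorithms (e.g., junction-tree or variable elimination) that BN libraries implement and is not the tool's own code.

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def gate_cpt(func, pf):
    """CPT P(out | inputs) for a gate with a von Neumann fault of probability pf."""
    def cpt(out, *inputs):
        return 1 - pf if out == func(*inputs) else pf
    return cpt

def exact_pf_cir(pf_gate):
    """Exact pf_CIR of the hypothetical circuit n1 = NAND(a, b), y = NAND(n1, c)."""
    p_n1 = gate_cpt(nand, pf_gate)  # node n1 with parents a, b
    p_y = gate_cpt(nand, pf_gate)   # node y with parents n1, c
    pf_cir = 0.0
    # Sum the joint distribution over every assignment of all five nodes.
    for a, b, c, n1, y in product((0, 1), repeat=5):
        joint = (1 / 8) * p_n1(n1, a, b) * p_y(y, n1, c)  # inputs equiprobable
        if y != nand(nand(a, b), c):  # output differs from the fault-free value
            pf_cir += joint
    return pf_cir

if __name__ == "__main__":
    print(exact_pf_cir(0.05))
```

Unlike an MC estimate, the result is exact: for $pf_{GATE} = 0.05$ it equals $1.5 \cdot 0.05 - 0.05^2 = 0.0725$.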

The first step toward using the tool is to prepare a comprehensive description of the circuit(s) under test that includes a complete list of the gates used and their interconnections. The tool uses the circuit's netlist file as the most appropriate source for the circuit's description. Netlist files usually convey connectivity information and provide nothing more than the gate instances and the connections linking them together. They are generated by many EDA tools; however, a user can easily generate a netlist file for a small circuit manually by following the netlist syntax and its simple semantic rules.

The tool currently supports three different netlist file formats, namely Bench, Cadence Verilog, and Synopsys Verilog. While other netlist formats are expected to be supported in the near future, we believe that the currently supported formats are the most commonly used ones. To facilitate the comparison between multiple designs, the tool allows the user to select multiple netlist files (currently, however, all the netlist files must be of the same format). This feature is useful when the user selects the graphical mode to illustrate reliability results. In this case, the user will be able to immediately visualize and evaluate the reliability differences among the selected designs.
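To illustrate how little information a netlist needs to carry, the sketch below parses a small netlist in the Bench format (one `INPUT(...)`/`OUTPUT(...)` declaration or one `net = GATE(in1, in2, ...)` assignment per line, with `#` comments) and evaluates it fault-free. The parser, the gate-type table, and the toy netlist are hypothetical and cover only a subset of Bench; they are not the tool's own front end.

```python
import re

IO_RE = re.compile(r"^(INPUT|OUTPUT)\(\s*([^)\s]+)\s*\)$", re.IGNORECASE)
GATE_RE = re.compile(r"^([^\s=]+)\s*=\s*(\w+)\(\s*([^)]*)\)$")

# Fault-free behaviour of a few common Bench gate types (subset only).
OPS = {
    "AND": lambda v: int(all(v)),
    "NAND": lambda v: int(not all(v)),
    "OR": lambda v: int(any(v)),
    "NOR": lambda v: int(not any(v)),
    "NOT": lambda v: 1 - v[0],
    "BUFF": lambda v: v[0],
}

def parse_bench(text):
    """Return (inputs, outputs, gates), where gates maps net -> (type, fan-in nets)."""
    inputs, outputs, gates = [], [], {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if m := IO_RE.match(line):
            (inputs if m.group(1).upper() == "INPUT" else outputs).append(m.group(2))
        elif m := GATE_RE.match(line):
            fanin = [net.strip() for net in m.group(3).split(",")]
            gates[m.group(1)] = (m.group(2).upper(), fanin)
        else:
            raise ValueError(f"unrecognized Bench line: {raw!r}")
    return inputs, outputs, gates

def evaluate(gates, input_values):
    """Compute every net's fault-free value (recursive; assumes no feedback loops)."""
    values = dict(input_values)

    def value_of(net):
        if net not in values:
            kind, fanin = gates[net]
            values[net] = OPS[kind]([value_of(n) for n in fanin])
        return values[net]

    for net in gates:
        value_of(net)
    return values

TOY = """
# hypothetical two-gate circuit
INPUT(a)
INPUT(b)
INPUT(c)
OUTPUT(y)
n1 = NAND(a, b)
y  = NAND(n1, c)
"""

if __name__ == "__main__":
    ins, outs, gates = parse_bench(TOY)
    print(ins, outs, gates)
    print(evaluate(gates, {"a": 0, "b": 0, "c": 1})["y"])  # NAND(1, 1) -> 0
```

A reliability model can then be built on top of the same `gates` dictionary, with one node per net.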

After selecting the netlist file, the user should upload a text file that includes at least one $pf_{GATE}$ value for each type of gate. The tool also allows users to calculate $pf_{CIR}$ for multiple $pf_{GATE}$ values or for a range of them. This is very important, as some designs may perform better than others within a specific range of $pf_{GATE}$ while performing worse outside that range. In future releases, the tool will allow users to specify $pf_{GATE}$ for each individual gate. This will allow studying the effect on $pf_{CIR}$ of improving the reliability of a particular gate (e.g., by adding redundancy or using radiation-hardened gates).
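The layout of the $pf_{GATE}$ file is not specified here, so the following sketch assumes a hypothetical one-line-per-gate-type layout in which each value field is a single probability, a comma-separated list, or a MATLAB-style `start:step:stop` range (the notation used for the ranges in Section 4). The parser expands each entry into the list of $pf_{GATE}$ values over which $pf_{CIR}$ would be swept.

```python
def expand_range(spec):
    """Expand 'start:step:stop' (MATLAB-style, inclusive) into a list of floats."""
    start, step, stop = (float(x) for x in spec.split(":"))
    count = int(round((stop - start) / step)) + 1
    # Rounding removes floating-point drift such as 0.30000000000000004.
    return [round(start + i * step, 12) for i in range(count)]

def parse_pf_file(text):
    """Parse hypothetical lines '<GATE_TYPE> <values>' into {type: [pf, ...]}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        gate_type, values = line.split(None, 1)
        values = values.replace(" ", "")
        if ":" in values:
            pfs = expand_range(values)
        else:
            pfs = [float(v) for v in values.split(",")]
        if any(not 0.0 <= p <= 1.0 for p in pfs):
            raise ValueError(f"probability out of range on line: {raw!r}")
        table[gate_type.upper()] = pfs
    return table

EXAMPLE = """
# gate type   pf_GATE values (hypothetical file layout)
NAND   0:0.005:0.01
MIN    0.001, 0.002
INV    0.0005
"""

if __name__ == "__main__":
    print(parse_pf_file(EXAMPLE))
```

Each expanded list would then drive one $pf_{CIR}$ evaluation per value, which is what makes ranking changes between designs visible.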

Moreover, the Nano-CR-EDA<sup>2</sup> tool allows users to select the fault model to be applied to the faulty gates. The user can select the von Neumann (the output is the complement of the expected value), stuck-at-0, or stuck-at-1 fault model. This allows the user to study the effect of each fault model (independently) on the circuit's reliability. Another valuable feature of the tool is the ability to export the circuit's BN reliability model. The exported model can then be visualized using a variety of Bayesian network tools, including Hugin Developer and Hugin Explorer, Netica, Ergo, and GeNIe. Once the reliability model is visualized, the user can gain several important insights regarding the circuit's reliability. For example, the user may examine the model to identify the subset of gates (or even the single gate) that has the major impact on reliability. The user can also use Bayesian evidence and inference to calculate how improving the reliability of only those gates affects the circuit's overall reliability. The reliability of individual gates can be improved by using radiation-hardened gates, by resizing, or by introducing (variable) redundancy in both space and time (starting from the device level up).
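The sketch below shows, for a hypothetical two-gate circuit (n1 = NAND(a, b), y = NAND(n1, c)) with uniformly random inputs, how the three fault models change $pf_{CIR}$, and how fixing one gate's probability of failure at zero (an idealized stand-in for hardening it) reveals which gate dominates the circuit's reliability. The exhaustive enumeration and all names are illustrative assumptions rather than the tool's BN implementation.

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def output_dist(correct, pf, model):
    """Distribution {value: probability} of one gate's output under a fault model."""
    faulty = {"von_neumann": 1 - correct, "stuck_at_0": 0, "stuck_at_1": 1}[model]
    dist = {0: 0.0, 1: 0.0}
    dist[correct] += 1 - pf
    dist[faulty] += pf  # for stuck-at faults this may coincide with the correct value
    return dist

def pf_cir(pf_gate, model, hardened=()):
    """Exact circuit failure probability; gates named in `hardened` never fail."""
    pf_n1 = 0.0 if "n1" in hardened else pf_gate
    pf_y = 0.0 if "y" in hardened else pf_gate
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        golden = nand(nand(a, b), c)
        for n1, p_n1 in output_dist(nand(a, b), pf_n1, model).items():
            for y, p_y in output_dist(nand(n1, c), pf_y, model).items():
                if y != golden:
                    total += p_n1 * p_y / 8  # inputs are equiprobable
    return total

if __name__ == "__main__":
    for model in ("von_neumann", "stuck_at_0", "stuck_at_1"):
        print(model, pf_cir(0.05, model))
    print("harden n1:", pf_cir(0.05, "von_neumann", hardened=("n1",)))
    print("harden y: ", pf_cir(0.05, "von_neumann", hardened=("y",)))
```

With $pf_{GATE} = 0.05$ this gives about 0.0725, 0.0491, and 0.0247 for the von Neumann, stuck-at-0, and stuck-at-1 models, respectively. Hardening the output gate y lowers the von Neumann value to 0.025, whereas hardening n1 only lowers it to 0.05, because a fault on n1 is masked whenever c = 0.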



Fig. 3. NAND-FA and MIN-FA results when all the gates have the same  $pf_{GATE}$ .

# 4. Experimental Results

To verify the effectiveness and correctness of the tool, this version of Nano-CR-EDA<sup>2</sup> was used to calculate the probability of failure of two different full-adder implementations (see Fig. 2). The first implementation is based on NAND gates (NAND-FA), while the second is based on Minority gates (MIN-FA).

In the first experiment, we compare the two implementations assuming that all the gates have the same $pf_{\text{GATE}}$. The $pf_{\text{GATE}}$ range 0:0.005:0.01 (i.e., from 0 to 0.01 in steps of 0.005) was used for all the NAND, Minority, and inverter gates. The simulation results (see Fig. 3) show that the MIN-FA is significantly more reliable than the standard NAND-FA over the specified range of $pf_{\text{GATE}}$. This result is expected, since all the gates are assumed to have the same $pf_{\text{GATE}}$ and the NAND-FA has significantly more gates (12) than the MIN-FA (5).
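Because the NAND-FA netlist of Fig. 2(a) is not reproduced in the text, the sketch below only illustrates the equal-$pf_{GATE}$ setting on a hypothetical 5-gate minority full adder (three 3-input Minority gates and two inverters, which may differ from Fig. 2(b)), treating the circuit as failed when either the sum or the carry output is wrong under von Neumann faults. It also computes the simple bound $pf_{CIR} \le 1 - (1 - pf_{GATE})^N$, which holds because the outputs are always correct when none of the $N$ gates fails; the bound grows with $N$, consistent with the larger gate count penalizing the NAND-FA.

```python
from itertools import product

def minority(x, y, z):
    return int(x + y + z <= 1)  # 1 when at most one input is 1

GATES = ("m1", "i1", "m2", "i2", "m3")  # hypothetical MIN-FA: 3 MIN + 2 INV

def min_fa(a, b, c, flipped=frozenset()):
    """Return (sum, carry); gates named in `flipped` invert their output (von Neumann)."""
    def out(name, value):
        return value ^ 1 if name in flipped else value
    m1 = out("m1", minority(a, b, c))     # complemented carry
    i1 = out("i1", 1 - m1)                # carry out
    m2 = out("m2", minority(a, b, m1))
    i2 = out("i2", 1 - c)
    m3 = out("m3", minority(i1, i2, m2))  # sum
    return m3, i1

def pf_min_fa(pf_gate):
    """Exact pf_CIR with the same pf_GATE on every gate and equiprobable inputs."""
    pf_cir = 0.0
    for a, b, c in product((0, 1), repeat=3):
        golden = min_fa(a, b, c)
        for pattern in product((False, True), repeat=len(GATES)):
            flipped = {g for g, bad in zip(GATES, pattern) if bad}
            k = len(flipped)
            weight = pf_gate**k * (1 - pf_gate) ** (len(GATES) - k)
            if min_fa(a, b, c, flipped) != golden:  # sum or carry wrong
                pf_cir += weight / 8
    return pf_cir

def gate_count_bound(pf_gate, n_gates):
    """Upper bound: the outputs are correct whenever no gate fails."""
    return 1 - (1 - pf_gate) ** n_gates

if __name__ == "__main__":
    for pf in (0.0, 0.005, 0.01):  # the 0:0.005:0.01 sweep
        print(pf, pf_min_fa(pf), gate_count_bound(pf, 5), gate_count_bound(pf, 12))
```

The exact value never exceeds the bound, since some gate faults are logically masked.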

Although different types of gates are (normally) implemented on the same die using similar devices, they are usually built using different numbers of devices (not to mention different logic styles and maybe even different materials). While a standard CMOS inverter is implemented using two transistors, NAND-2 and Minority gates require 4 and 10 transistors, respectively, in standard CMOS. Therefore, assuming that all the gates have the same $pf_{\rm GATE}$ is not accurate enough.

To improve the accuracy of the reliability evaluations, in the second experiment the probability of failure of each individual gate was estimated using the equation of Forshaw *et al.* [14]:

$$pf_{GATE} = 1 - (1 - \varepsilon)^n, \tag{1}$$

where $\varepsilon$ denotes the probability of failure of a device (e.g., transistor, junction, capacitor, molecule, etc.), and $n$ is the number of devices in the gate. In this experiment, $pf_{\text{GATE}}$ for the Inverter, NAND-2, and Minority-3 gates was calculated (offline) for $\varepsilon$ = 0:0.0001:0.01 (i.e., from 0 to 0.01 in steps of 0.0001). The calculated $pf_{\text{GATE}}$ values for the three gate types were then written to the probability-of-failure file, which the tool read during simulation.
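The gate-level equation above is easy to tabulate offline. The sketch below (function names are illustrative) computes $pf_{GATE}$ for the three gate types using the transistor counts given above (2, 4, and 10) over the $\varepsilon$ grid 0:0.0001:0.01. For small $\varepsilon$, $1 - (1-\varepsilon)^n \approx n\varepsilon$, so the Minority-to-NAND-2 and Minority-to-Inverter ratios approach $10/4 = 2.5$ and $10/2 = 5$.

```python
def pf_gate(eps, n_devices):
    """Gate failure probability when any one of its n devices failing makes the gate fail."""
    return 1 - (1 - eps) ** n_devices

# Transistor counts for standard CMOS gates, as stated in the text.
DEVICES = {"INV": 2, "NAND2": 4, "MIN3": 10}

EPS_GRID = [round(i * 0.0001, 4) for i in range(101)]  # 0:0.0001:0.01

def pf_table(eps_values=EPS_GRID):
    """Rows of the probability-of-failure file: {eps: {gate_type: pf_GATE}}."""
    return {eps: {g: pf_gate(eps, n) for g, n in DEVICES.items()} for eps in eps_values}

if __name__ == "__main__":
    table = pf_table()
    for eps in (0.0001, 0.005, 0.01):
        row = table[eps]
        print(eps, row["MIN3"] / row["NAND2"], row["MIN3"] / row["INV"])
```

The ratios fall slightly below 2.5 and 5 as $\varepsilon$ grows (about 2.43 and 4.8 at $\varepsilon = 0.01$), because the higher-order terms of $(1-\varepsilon)^n$ matter more for larger $n$.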

The simulation results of the second experiment (see Fig. 4) show that the higher number of devices required to implement the Minority gate makes its $pf_{\text{GATE}}$ much higher than that of the NAND-2 ($\cong$ 250% of the NAND-2 value) and that of the Inverter ($\cong$ 500% of the Inverter value). As a result, the probability of failure of the MIN-FA is now only slightly lower ($\cong$ 3%) than that of the NAND-FA. Fig. 4 also shows that this small difference diminishes as $\varepsilon$ increases, eventually making the NAND-FA more reliable than the MIN-FA for large $\varepsilon$. This is why it is very useful to be able to use a range of probabilities of failure when comparing alternative designs.
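Locating where two designs swap ranks is a simple post-processing step once both $pf_{CIR}$ curves are sampled on the same $\varepsilon$ grid. The helper below is a hypothetical sketch, not a feature described for the tool; it returns the first grid point at which design A stops being more reliable than design B, or `None` if that never happens in the swept range.

```python
from typing import Optional, Sequence

def first_crossover(eps: Sequence[float], pf_a: Sequence[float],
                    pf_b: Sequence[float]) -> Optional[float]:
    """First eps where design A's pf_CIR reaches or exceeds design B's,
    given that A was strictly better (lower pf_CIR) at some earlier point."""
    if not (len(eps) == len(pf_a) == len(pf_b)):
        raise ValueError("curves must be sampled on the same grid")
    a_was_better = False
    for e, a, b in zip(eps, pf_a, pf_b):
        if a < b:
            a_was_better = True
        elif a_was_better:
            return e
    return None
```

Points where both curves are equal before A has ever been better (e.g., $\varepsilon = 0$, where both probabilities of failure are zero) are deliberately not reported as crossovers.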

The results from experiments 1 and 2 emphasize the importance of accuracy in reliability calculations. As Fig. 1 shows, 30 different combinations might be used to calculate the reliability of a large network on chip. However, the overall accuracy of each combination depends heavily on the accuracy of the approach used at each level.

The BN numerical method used by the Nano-CR-EDA<sup>2</sup> tool can accurately calculate the probability of failure at the circuit/core level given the $pf_{\text{GATE}}$ of the individual gates. However, the accuracy of the final result ($pf_{\text{CIR}}$) depends heavily on how accurate $pf_{\text{GATE}}$ is. In the second experiment, we used an analytical approach to roughly estimate $pf_{\text{GATE}}$ as a function of the number of devices in the gate. A more accurate, but more computing-intensive and time-consuming, approach is to use MC simulations to calculate $pf_{\text{GATE}}$ for all the different gates. We are well advanced in developing novel algorithms that accurately calculate $pf_{\text{GATE}}$ in terms of the probabilities of failure of the gate's individual devices and of their interconnections. This will extend the capabilities of the proposed tool to accurately calculate the probability of failure at both the gate and the circuit/core levels.

Fig. 4. NAND-FA and MIN-FA results when all the gates have the same $\varepsilon$.

# 5. Conclusion

This paper presented Nano-CR-EDA<sup>2</sup>, a novel, accurate, and easy-to-use EDA tool for reliability calculations (of future nano-circuits). Unlike other reliability tools currently available on the market (e.g., MC-based ones), this tool has an edge in both speed and accuracy when calculating the reliability of large(r) digital circuits. It can be easily integrated with other EDA tools, as it only needs the circuit's netlist file to operate. The Nano-CR-EDA<sup>2</sup> tool can display the reliability results in either text or plot mode. It also allows users to visualize the reliability of different design alternatives over a wide range of gate probabilities of failure.

To improve the tool's accessibility, we are currently working on adding a web interface to the tool. The web version will be made available to the public shortly. This will allow users to access the tool over the Internet and eliminate the need to install either the tool or the MATLAB package (for plot mode) on local machines.

## References

- [1] International Technology Roadmap for Semiconductors (ITRS), 2005 Edition and 2006 Update.
- [2] Q. Chen, and J.D. Meindl, "Nanoscale metal-oxide-semiconductor field-effect transistors: Scaling limits and opportunities," *Nanotechnology*, vol. 15, Jul. 2004, pp. S549–S555.
- [3] K.K. Likharev, "Single-electron devices and their applications," *Proc. IEEE*, vol. 87, Apr. 1999, pp. 606–632.
- [4] U. Feldkamp, and C.M. Niemeyer, "Rational design of DNA nanoarchitectures," *Angew. Chem. Intl. Ed.*, vol. 45, 13 Mar. 2006, pp. 1856–1876.
- [5] J.E. Green et al., "A 160-kilobit molecular electronic memory patterned at 10<sup>11</sup> bits per square centimeter," *Nature*, vol. 445, 25 Jan. 2007, pp. 414–417.
- [6] C. Lau, A. Orailoglu, and K. Roy (Eds.), Special Issue on Nano-electronic Circuits and Nano-architectures, *IEEE Trans. CAS I*, vol. 54, Nov. 2007.
- [7] V. Beiu, and W. Ibrahim, "On computing nanoarchitectures using unreliable nano-devices," Chp. 12 in S. E. Lyshevski (Ed.): *Handbook of Nano and Molecular Electronics*, London, UK: Taylor & Francis, May 2007, pp. 1–49.
- [8] A.M. Johnson, and M. Malek, "Survey of software tools for evaluating reliability, availability, and serviceability," *ACM Comp. Surveys*, vol. 20, no. 4, Dec. 1988, pp. 227–269.
- [9] M. Kwiatkowska, G. Norman, D. Parker, and R. Segala, "Symbolic model checking of concurrent probabilistic systems using MTBDDs and simplex," *Tech. Rep. CSR-99-01*, School of Comp. Sci., Univ. of Birmingham, Birmingham, UK, Jan. 22, 1999.
- [10] S. Lazarova-Molnar, "The proxel-based method: Formalisation, analysis and applications," *PhD dissertation*, Faculty of Informatics, Otto-von-Guericke-Universität, Magdeburg, Germany, Nov. 2005.
- [11] K.N. Patel, I.L. Markov, and J.P. Hayes, "Evaluating circuit reliability under probabilistic gate-level fault models," *Proc. Intl. Workshop Logic Synthesis IWLS'03*, Laguna Beach, USA, May 2003, pp. 59–64.
- [12] J. Han, E.R. Taylor, J.B. Gao, and J.A.B. Fortes, "Faults, error bounds and reliability of nanoelectronic circuits," *Proc. Intl. Conf. Appl.-Specific Sys., Arch. & Processors ASAP'05*, Samos, Greece, Jul. 2005, pp. 247–253.
- [13] T. Rejimon, and S. Bhanja, "An accurate probabilistic model for error detection," *Proc. Intl. Conf. VLSI Design VLSID'05*, Kolkata, India, Jan. 2005, pp. 717–722.
- [14] M. Forshaw, K. Nikolić, and A. S. Sadek, "ANSWERS: Autonomous Nanoelectronic Systems With Extended Replication and Signaling," *MEL-ARI* #28667, 3rd Year Report, University College London, London, UK, 2001.