
Monitoring operating temperature and supply voltage in achieving high system dependability


Academic year: 2021



Muhammad Aamir Khan, Hans G. Kerkhoff

Testable Design and Test of Integrated Systems (TDT) Group,

University of Twente, Centre of Telematics and Information Technology (CTIT), Enschede, the Netherlands

m.a.khan / h.g.kerkhoff@utwente.nl

Abstract- System dependability is a set of attributes, of which reliability is among the most important, and it depends heavily on operating temperature and supply voltage. Any change beyond the designed specifications may alter system performance and can cause reliability, and hence dependability, problems. Such problems can be short-term and disappear once the system returns to its normal operating temperature and supply voltage. They should therefore be differentiated from long-term reliability problems caused by aging mechanisms, which are a function of stress time and are cumulative in nature. This differentiation is essential for managing system dependability during operational life, and it requires regular monitoring of the operating temperature and the supply voltage. The proposed hardware architecture and workflow take this monitoring into account, tackle the two kinds of problems separately, and carry out proper actions to enhance system dependability. Simulation results for a target system modeled in the LabVIEW environment support the proposed idea.

Keywords- reliability; system dependability; calibration; self-diagnosis; redundancy

I. INTRODUCTION

Dependability of electronic systems is becoming an important design concern as technology shrinks towards the nanometer limits. On the one hand, physical degradation mechanisms such as negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), hot carrier injection (HCI), time-dependent dielectric breakdown (TDDB), and electromigration (EM) are becoming prominent [1-4]. On the other hand, fabrication-related process variations together with supply-voltage and operational-temperature variations (PVT variations) also affect the performance of these electronic systems [5-10].

Interestingly, some of these performance-degrading effects share a common source of disturbance. For example, NBTI and PBTI depend heavily on stress temperature and input stress voltage and cause serious performance degradation in electronic systems [1-3, 11]. Operational-temperature and supply-voltage variations, on the other hand, also contribute individually to the performance degradation of these systems [7-10]. The difference lies in the parameter “time”. Temperature and voltage variations beyond specifications can cause serious performance degradations that are instantaneous, temporary or short-term in nature and that diminish if the temperature and voltage return to their nominal values. Depending on how long these variations persist, they also cause long-term aging (NBTI, PBTI) effects. These effects may or may not contribute to the performance degradation at a particular time. The cumulative effect of the short-term and long-term degradation mechanisms over the operating time determines their contribution to the performance degradation of these electronic systems. Therefore, variations in system reliability as a function of operational temperature and voltage are of a short-term as well as a long-term nature. Similarly, dependability, being a set of attributes such as reliability, maintainability, availability, safety, security, and survivability, should be handled separately for these short-term and long-term effects.

The purpose of the current paper is to understand these effects and to investigate:

- how can operational-temperature and supply-voltage variations cause short-term (non-permanent) and long-term (possibly permanent) performance degradations?

- how is it possible to differentiate between these two performance degradation scenarios?

- how can this differentiation help in achieving high system dependability during operational life?

The remainder of this paper is organized as follows. Section II describes the importance of supply-voltage and operational-temperature (VT) variations in affecting the performance parameters of both digital and analog circuits. Sections III, IV and V present the basic idea, the proposed hardware architecture, and the simulation results for enhancing system dependability respectively. The conclusions and some important references are presented at the end of the paper.

II. VT VARIATIONS

Technology scaling is forcing both VDD and VTH down, which is critical for electronic designers, and especially for analog designers, as shown in Fig. 1 [12]. The distance (VDD-VTH), known as the free-voltage space for analog design, is also shrinking. Therefore, any variation in VDD or VTH influences this free-voltage space and hence affects the performance of electronic systems. Similarly, MOSFET device characteristics such as threshold voltage, carrier mobility, and saturation velocity are heavily affected by temperature variations [9], which in turn changes the performance of the associated electronic systems. For example, in automotive applications, electronic systems attached to automobile engines usually operate over a wide temperature range, from -40°C to 150°C [13]. Any variation in temperature will therefore affect the performance of these electronic systems.

Figure 1: Power supply (VDD) and threshold voltage (VTH) vs technology node [12]. [Plot: voltage (V), 0 to 3.5, against technology node from 500nm down to 22nm; curves for Vdd and Vth.]

Figure 2: Percentage change in delay of a five-stage ring oscillator versus time (0 to 2 years) due to (a) the NBTI effect at two different temperatures and two different supply voltages (curves: 1V/25°C, 0.5V/25°C, 1V/100°C, 0.5V/100°C), (b) a change in operating temperature from 25°C to 100°C at two different supply voltages (1V, 0.5V), and (c) a change in supply voltage from 0.5V to 1.0V at two different temperatures (25°C, 100°C) (extracted from [7]).

Figure 3: Change in open-loop gain (dB) of an opamp, stressed at different temperatures (25°C, 50°C, 75°C, 100°C, 125°C), versus stress time (0 to 20 years) due to (a) the NBTI effect, (b) a change in operating temperature from 25°C to 125°C, and (c) a change in supply voltage from 1.1V to 1.3V.

Figure 4: The distribution function of an arbitrary performance parameter ‘P’ of ‘n’ identical electronic systems.

In digital systems the propagation delay is a function of the drain saturation current provided by the active transistors, which in turn depends on the supply voltage (VDD) and the operational temperature [9]. Digital system performance is therefore strongly affected by both operational temperature and supply voltage. This variation in delay is independent of aging effects (NBTI etc.). Figs. 2(a)-(c) show the percentage change in delay of a five-stage ring oscillator over two years due to aging (NBTI), an operational-temperature change from 25°C to 100°C, and a supply-voltage change from 0.5V to 1.0V respectively. Similarly, analog circuit performance is sensitive to the operating temperature and the supply voltage. Figs. 3(a)-(c) show the change in open-loop gain of an amplifier designed in 65nm technology over twenty years due to aging (NBTI), an operational-temperature change from 25°C to 125°C, and a supply-voltage change from 1.1V to 1.3V respectively.

It is clear from these figures that, at any particular time, the change in the delay of the five-stage ring oscillator and in the open-loop gain of the amplifier caused by a change in operating temperature or supply voltage is larger than the change caused by the NBTI (aging) effect. Operational-temperature and supply-voltage variations therefore play a significant role in the performance of digital and analog systems for as long as these variations are present. These changes in performance are independent of the cumulative changes in performance due to the aging (NBTI) effect.

III. ENHANCING SYSTEM DEPENDABILITY

The distribution function of an arbitrary performance parameter ‘P’ of ‘n’ identical systems (e.g. the open-loop gain ‘A’ of ‘n’ identical opamps) can be divided into three regions, as shown in Fig. 4. The allowed region (green) of the performance parameter (e.g. ±3σ of ‘A’) comprises the designed specification region in which the system functions correctly. The not-allowed region is the region where ‘P’ goes beyond the designed specifications and the system starts malfunctioning. Although the system may still be working under this condition, its results will be unsatisfactory. Finally, the permanent failure region is the region where ‘P’ is not only beyond the designed specifications but the system stops working altogether, resulting in a permanent failure (e.g. saturation at the output of an opamp for any input, Vout = Vdd or 0).
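The three regions of Fig. 4 can be illustrated with a minimal sketch. The function name and the outer failure limits `fail_min` and `fail_max` are hypothetical; the paper describes the permanent failure region only qualitatively (for example, a saturated opamp output), so the numeric limits below are assumptions chosen for illustration.

```python
from enum import Enum


class Region(Enum):
    ALLOWED = "allowed"
    NOT_ALLOWED = "not allowed"
    PERMANENT_FAILURE = "permanent failure"


def classify_region(p, spec_min, spec_max, fail_min, fail_max):
    """Map a measured performance parameter 'P' onto the three regions.

    spec_min/spec_max bound the designed specification (allowed region).
    fail_min/fail_max are assumed outer limits at or beyond which the unit
    is treated as permanently failed; the paper gives no numeric values.
    """
    if not (fail_min < spec_min <= spec_max < fail_max):
        raise ValueError("expected fail_min < spec_min <= spec_max < fail_max")
    if p <= fail_min or p >= fail_max:
        return Region.PERMANENT_FAILURE
    if spec_min <= p <= spec_max:
        return Region.ALLOWED
    return Region.NOT_ALLOWED


if __name__ == "__main__":
    # Gain specification of 19 dB to 21 dB; the 10 dB and 30 dB failure
    # limits are made up for this example.
    for gain_db in (20.0, 21.5, 5.0):
        print(gain_db, classify_region(gain_db, 19.0, 21.0, 10.0, 30.0).value)
```

Keeping the classification separate from any repair decision mirrors the text, where the region a parameter falls in determines which counter-action is considered.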

The allowed region represents normal operation, but in order to enhance the dependability or to prolong the operational life of the system, the second and third regions should be counteracted with proper actions. This can be accomplished by:

1) bringing back the operating temperature and supply voltage to normal operating values for short-term (non-permanent) effects.

2) digitally tuning back performance parameters to normal specifications for long-term effects.

3) incorporating some fault-tolerant strategies, like redundant systems, for permanent-failure effects.

For example, cooling fans for adjusting temperature, digitally-tunable power supplies for adjusting supply voltages, and digitally-assisted electronic systems for reconfiguring or digitally adjusting different performance parameters [14] can be used to restore the second region of malfunctioning performance parameters to the normal region. Similarly, the third region of permanent failure can also be restored to the normal region of performance parameters by completely replacing the permanently faulty unit with a spare unit.

Figure 5: Proposed hardware architecture for enhancing the dependability of a system on chip.

Figure 6: Workflow of the proposed dependable hardware architecture for achieving high dependability.

Enhancing system dependability means enhancing its individual attributes [15]. However, in this paper the primary focus will be on three attributes namely the reliability, the maintainability and the availability. The reliability, the probability as a function of time that the system will be functioning correctly, can be enhanced by using options 1, 2, and 3. The maintainability, the probability as a function of time that the system will be repaired if it fails to function correctly, can be enhanced by making the right choice between options 1, 2, and 3. Similarly the availability, the probability as a function of time that the system will be available for its service, can be enhanced by reducing the repair time. One approach is by estimating the reliability of the system in advance and making the proper decisions at the right time for repair.

IV. DEPENDABLE HARDWARE ARCHITECTURE

In order to enhance the dependability of a system on chip, a new hardware architecture is proposed, as shown in Fig. 5. It consists of redundant sub-blocks (SB1-SB4) and a number of monitoring, tuning and decision making circuits. The on-chip temperature and supply-voltage monitoring circuits can be used to monitor the overall temperature and supply voltage of the whole chip, or those of each individual sub-block, as shown by the red and blue lines respectively. A performance parameter monitoring circuit is also present; it monitors the performance parameter(s) most sensitive to short-term and long-term effects in order to estimate the reliability of the whole system or of each sub-block. These three monitoring blocks communicate with the decision making and tuning circuitry, which is responsible for the digital repair strategies, as shown by the dotted black lines.

Figure 7: An example of a system where an arbitrary performance parameter ‘P’ is being monitored for its variations due to VT and NBTI effects. Time points t1-t10 show VT adjustment points, T1-T3 show digital tuning points, and T4 shows the replacement point with a fresh unit.

Fig. 6 shows the workflow of this dependable architecture. It is further explained by the example in Fig. 7, where an arbitrary performance parameter ‘P’, considered most sensitive to short-term and long-term effects, is monitored for variations due to short-term VT and long-term NBTI (aging) effects. The dotted blue line shows the variations in ‘P’ due to VT effects and the green line shows the variations due to the NBTI effect. During regular testing mode (Fig. 6), ‘P’ is first checked against the specified limits, using the system specifications stored in the database. If ‘P’ is within the defined system specifications, its current value is logged in the database along with the operating temperature and supply voltage values. These values are subsequently used to estimate the reliability of the system in its current state, which can in turn be used to predict a possible hazard in advance. If ‘P’ is outside the system specifications stored in the database, the proper actions discussed in the previous section are taken.
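The check-and-log step of the regular testing mode can be sketched as follows. This is only an illustration under stated assumptions: the Python list stands in for the logged database, the function names are hypothetical, and the linear extrapolation used to anticipate a specification violation is an assumed placeholder, since the paper does not specify how reliability is estimated from the logged values.

```python
def check_and_log(p, temperature_c, voltage_v, spec, log):
    """One pass of the regular testing mode.

    spec is (p_min, p_max) as read from the specification database;
    log is a list acting as the logged database (a stand-in).
    Returns True when 'P' is within spec, in which case the sample is logged.
    """
    p_min, p_max = spec
    if p_min <= p <= p_max:
        log.append({"p": p, "temperature_c": temperature_c, "voltage_v": voltage_v})
        return True
    return False  # caller should take one of the repair actions


def samples_until_violation(log, spec):
    """Assumed estimator: extrapolate the last two logged values of 'P'.

    Returns the number of further samples before 'P' would reach the nearer
    specification limit, or None when there is no measurable drift.
    """
    if len(log) < 2:
        return None
    drift = log[-1]["p"] - log[-2]["p"]
    if drift == 0:
        return None
    limit = spec[1] if drift > 0 else spec[0]
    return (limit - log[-1]["p"]) / drift


if __name__ == "__main__":
    spec = (19.0, 21.0)  # gain limits in dB
    log = []
    check_and_log(20.0, 25.0, 1.2, spec, log)
    check_and_log(19.5, 25.0, 1.2, spec, log)
    print(samples_until_violation(log, spec))
```

A real estimator would need to separate the short-term VT contribution from the aging trend before extrapolating, which is exactly why temperature and voltage are logged alongside ‘P’.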

It is clear from Fig. 7 that at time points t1-t5 the VT effects try to move the performance parameter ‘P’ beyond its performance margin. At each of these time points (t1-t5, Fig. 7) the VT values are brought back (option 1) to their normal values and ‘P’ returns to its allowed region of specification. This can be done by cooling techniques, switching off unnecessary parts, regulating or reconfiguring the power supply to each sub-block, etc. On the other hand, at each of these time points (t1-t5, Fig. 7) the variations due to the aging (NBTI) effect keep accumulating, and at time point T1 ‘P’ is moved beyond its performance margin. Therefore, at time point T1 the digital tuning capabilities (option 2) of the system are used to bring ‘P’ back to its normal region of specifications. Similarly, at time points t6-t10 and T2-T3 (Fig. 7) ‘P’ is returned to its normal specification using the above-mentioned techniques. Most importantly, at time point T4 the system can no longer move ‘P’ back to its normal specifications: no digital tuning or VT adjustment options remain. Therefore, at time point T4 the system is replaced (option 3) by a spare unit and starts functioning again.

This approach has a number of benefits. Adjusting temperature and supply voltage brings the system back to its normal operating conditions and also slows down the gradual effect of aging phenomena. Similarly, the digital tuning and replacement options are used solely for long-term aging and permanent-failure effects respectively. This will increase the reliability of the system.
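The escalation through options 1, 2 and 3 described above can be summarized in a short sketch. The function and constant names are hypothetical, and the logic is a simplified reading of the workflow rather than the decision circuitry itself: VT adjustment is tried first, digital tuning second, and replacement only when neither remains.

```python
OPTION_1 = "option 1: restore temperature and supply voltage"
OPTION_2 = "option 2: digital tuning"
OPTION_3 = "option 3: replace with a spare unit"


def choose_action(vt_at_nominal, tuning_steps_left):
    """Pick a repair action for a parameter 'P' that is out of specification.

    vt_at_nominal: True when temperature and supply voltage are already at
        their normal values, so a VT adjustment cannot help any further.
    tuning_steps_left: number of unused digital tuning options.
    """
    if not vt_at_nominal:
        return OPTION_1   # short-term (non-permanent) effect
    if tuning_steps_left > 0:
        return OPTION_2   # long-term (aging) effect
    return OPTION_3       # nothing left to adjust


if __name__ == "__main__":
    print(choose_action(vt_at_nominal=False, tuning_steps_left=8))  # like t1-t10
    print(choose_action(vt_at_nominal=True, tuning_steps_left=2))   # like T1-T3
    print(choose_action(vt_at_nominal=True, tuning_steps_left=0))   # like T4
```

Ordering the checks this way keeps the limited tuning and spare resources in reserve for the effects that a VT adjustment cannot undo.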

Estimating reliability in advance makes it easier to anticipate possible solutions and hence lowers the repair time. A lower repair time increases the maintainability, and it further increases the availability of the system to perform its operation. Hence, by increasing reliability, availability, and maintainability, the dependability of the system is increased.
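The link between repair time and availability can be made concrete with the standard steady-state availability expression, MTTF / (MTTF + MTTR). This formula is a common textbook definition and is not taken from the paper; the function name and the numbers in the example are assumptions for illustration only.

```python
def steady_state_availability(mttf_hours, mttr_hours):
    """Textbook steady-state availability: MTTF / (MTTF + MTTR).

    mttf_hours: mean time to failure; mttr_hours: mean time to repair.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mttf_hours must be positive and mttr_hours non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Made-up numbers: the same failure rate with a slow and a fast repair.
    print(steady_state_availability(1000.0, 10.0))
    print(steady_state_availability(1000.0, 1.0))
```

For a fixed mean time to failure, any reduction in repair time raises the result, which is the effect the paragraph above relies on.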

V. SIMULATIONS AND RESULTS

In order to investigate the feasibility of the proposed architecture and the workflow, a target system has been modeled in the LabVIEW environment. The proposed idea is explained based on the simulation results of a single sub-block (SB1) of the system in Fig. 5 and is compared to a similar sub-block simulated without the proposed idea. The ‘gain’ of the SB1 has been considered as the most sensitive parameter to VT and NBTI variations. Table 1 shows the designed specifications of SB1 at the design stage and the possible VT variations in the system working environment.

Fig. 8 shows the possible ‘gain’ (GPSB1) variations in SB1 due to possible VT variations in the system working environment. It is clear from this figure that GPSB1 can go beyond the designed specifications (GDSB1) as a result of possible VT variations in the system working environment. Similarly, Fig. 9 shows the simulation results of SB1 with and without the proposed architecture and workflow.

Let SB1C (subscript C as an indicator) represent the sub-block that is compensated by means of the proposed idea and has eight possible digital tuning options for adjusting the ‘gain’ performance parameter. Similarly, SB1NC represents the sub-block that is neither compensated by means of the proposed idea nor has any digital tuning capabilities. The four sub-graphs in Fig. 9 represent the four monitors (M1-M4). Monitor M4 shows the change in gain due to the aging (NBTI) effect only and the results of the decision making and tuning circuitry. The other three modeled monitors are the temperature monitor (M1), the supply voltage monitor (M2), and the performance (the ‘gain’ in the current case) monitor (M3). The vertical red lines in M4 show the time points when the temperature and supply voltage of SB1C are adjusted back (option 1) to normal values (i.e. 25°C and 1.2V respectively) because the gain has gone beyond the designed specifications GDSB1 ([19dB-21dB]). These adjustments in temperature, supply voltage and the resulting ‘gain’ of SB1C are shown by the white lines in monitors M1, M2, and M3 respectively. T1-T8 are the time points at which the ‘gain’ of SB1C can no longer be adjusted by returning temperature and supply voltage to normal values. At these time points (T1-T8) the aging (NBTI) effect moves the ‘gain’ of SB1C beyond its designed specifications, and the digital tuning (option 2) capabilities of SB1C are used to move the ‘gain’ back to its designed specification. The ‘gain’ values of SB1NC and the corresponding unadjusted variations in temperature and supply voltage are shown by red lines in monitors M3, M1, and M2 respectively. It is clear from these monitors that, because temperature and supply voltage are not adjusted, at several time points (t1-t15) the ‘gain’ of SB1NC remains out of its specification bounds, for a short or a long duration depending on the VT and aging variations respectively.

Table 1: Designed specifications of SB1 and the possible VT variations in the system working environment

| Name | Symbol | Value |
| --- | --- | --- |
| Designed gain of SB1 | GSB1 | 20dB |
| Designed supply voltage variations of SB1 | VDSB1 | [1.175V - 1.225V] |
| Designed temperature variations of SB1 | TDSB1 | [-25°C - 75°C] |
| Designed gain variations of SB1 | GDSB1 | [19dB - 21dB] |
| Possible working supply voltage variations | VPSB1 | [1.1V - 1.3V] |
| Possible working temperature variations | TPSB1 | [0°C - 125°C] |

Figure 8: Possible gain variations of SB1 due to VT variations in the system working environment.
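The ranges of Table 1 can be encoded directly to show why monitoring is needed: the possible working ranges extend beyond the designed ranges. The sketch below uses only the values from Table 1; the dictionary keys and helper names are hypothetical.

```python
# Designed ranges of SB1 (Table 1).
DESIGNED = {
    "supply_voltage_v": (1.175, 1.225),
    "temperature_c": (-25.0, 75.0),
    "gain_db": (19.0, 21.0),
}

# Possible ranges in the system working environment (Table 1).
POSSIBLE = {
    "supply_voltage_v": (1.1, 1.3),
    "temperature_c": (0.0, 125.0),
}


def in_range(value, bounds):
    """True when value lies inside the closed interval bounds."""
    low, high = bounds
    return low <= value <= high


def exceeds_design(designed, possible):
    """True when the possible range reaches outside the designed range."""
    return possible[0] < designed[0] or possible[1] > designed[1]


if __name__ == "__main__":
    for name, possible in POSSIBLE.items():
        print(name, "can leave its designed range:",
              exceeds_design(DESIGNED[name], possible))
    # A working point of 100 degrees C and 1.2 V: only the temperature is off.
    print(in_range(100.0, DESIGNED["temperature_c"]),
          in_range(1.2, DESIGNED["supply_voltage_v"]))
```

Both checks report that the working environment can push SB1 outside its designed conditions, consistent with the gain excursions the text attributes to Fig. 8.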

The comparison between the ‘gain’ of SB1C (white line) and SB1NC (red line) in M3 makes it clear that, thanks to the proposed architecture and workflow, SB1C can be used for a longer time, until time point R1. It needs a replacement (option 3) with a new SB1C at time point R1 (Fig. 9), when no digital tuning option is left for tuning the ‘gain’ back to its normal specifications. SB1NC, on the other hand, needs a replacement at each of the time points T1-T8. Worse, it operates outside its specification bounds at several time points (t1-t15). The presented idea can similarly be extended to other sub-blocks to enhance the dependability of the whole system.
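The qualitative comparison between a compensated and a non-compensated sub-block can be reproduced with a toy model. This is not the LabVIEW model used in the paper: the linear aging drift, the size of a tuning step, and the periodic VT excursion are all invented constants, and corrections are assumed to take effect within the same time step. Only the 19dB-21dB gain range, the 20dB designed gain and the eight tuning options come from the text. Gains are kept in integer milli-dB so the arithmetic is exact.

```python
NOMINAL_MDB = 20_000         # designed gain, 20 dB (Table 1), in milli-dB
SPEC_MDB = (19_000, 21_000)  # designed gain range, 19 dB to 21 dB (Table 1)
TUNING_OPTIONS = 8           # digital tuning options of SB1C (from the text)
AGING_MDB_PER_STEP = 10      # assumed NBTI gain loss per time step
TUNING_STEP_MDB = 1_000      # assumed gain restored by one tuning option
VT_EXCURSION_MDB = -1_500    # assumed gain shift during a VT excursion
VT_PERIOD = 50               # assumed: one excursion every 50 steps


def in_spec(gain_mdb):
    return SPEC_MDB[0] <= gain_mdb <= SPEC_MDB[1]


def simulate(steps, compensated):
    """Count repair actions and uncorrected out-of-spec samples."""
    aging = 0
    tuned = 0
    counts = {"vt_adjustments": 0, "tunings": 0,
              "replacements": 0, "out_of_spec_samples": 0}
    for step in range(steps):
        aging += AGING_MDB_PER_STEP
        offset = VT_EXCURSION_MDB if step % VT_PERIOD == VT_PERIOD - 1 else 0
        # Gain the block would have at nominal temperature and voltage.
        base = NOMINAL_MDB - aging + tuned * TUNING_STEP_MDB
        if in_spec(base + offset):
            continue
        if compensated and offset != 0:
            counts["vt_adjustments"] += 1      # option 1: restore VT
            if in_spec(base):
                continue
        if not compensated:
            counts["out_of_spec_samples"] += 1  # nothing corrects this sample
        if in_spec(base):
            continue                            # transient VT effect only
        if compensated and tuned < TUNING_OPTIONS:
            tuned += 1                          # option 2: digital tuning
            counts["tunings"] += 1              # one step suffices here
        else:
            aging, tuned = 0, 0                 # option 3: fresh spare unit
            counts["replacements"] += 1
    return counts


if __name__ == "__main__":
    print("compensated:    ", simulate(1000, compensated=True))
    print("non-compensated:", simulate(1000, compensated=False))
```

With these made-up constants, a run of 1000 steps gives one replacement for the compensated block against nine for the non-compensated one, which in addition spends 29 samples out of specification. The numbers carry no physical meaning; the sketch only shows how the three options interact.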

Figure 9: Simulation results of a single sub-block (SB1) of the system in Fig. 5 with and without the proposed architecture (Fig. 5) and workflow (Fig. 6).

VI. CONCLUSIONS

In this paper the dependability enhancement of electronic systems has been discussed. Reliability, one attribute of dependability, is influenced in the short term (non-permanently) by operating temperature and supply voltage, and in the long term (permanently) by a number of aging phenomena. By separating these into short-term and long-term effects, proper actions can be taken to enhance the dependability of the system. This has been achieved in the proposed hardware architecture and workflow, which regularly monitors the operating temperature and supply voltage along with the most sensitive performance parameter(s) of the system and stores them in the logged database. It then estimates the reliability of the system based on these stored values. These estimations are subsequently used to anticipate system performance in advance, or to take proper actions by returning temperature and supply voltage to nominal values, or by digitally tuning and replacing system sub-blocks, in order to enhance the overall dependability of the system. To the best of our knowledge, system-level dependability enhancement techniques are lacking, especially for analog and mixed-signal systems, so the current technique cannot be compared with other techniques. However, the simulation results for a dummy system, modelled in a LabVIEW environment under random supply-voltage and working-temperature variations, show that the proposed technique for enhancing system dependability during operational life is valid. These simulations are based on high-level abstract models that do not include the detailed implementation complexities and fabrication-related area overheads. Fabrication-related process variations are likewise not discussed here; they will be discussed in future publications.

REFERENCES

[1] L.L. Lewyn, T. Ytterdal, C. Wulff, and K. Martin, “Analog Circuit Design in Nanoscale CMOS Technologies,” in Proceedings of the IEEE, Vol. 97, No. 10, pp. 1687-1714, 2009.

[2] D.K. Schroder and J.A. Babcock, “Negative bias temperature instability: Road to cross in deep submicron silicon semiconductor manufacturing,” in Journal of Applied Physics, Vol. 94, No. 1, pp. 1-18, 2003.

[3] N.K. Jha, P.S. Reddy, D.K. Sharma, and V.R. Rao, “NBTI degradation and its impact for analog circuit reliability,” in IEEE Trans. Electron Devices, Vol. 52, No. 12, pp. 2609-2615, 2005.

[4] B.C. Paul, K. Kang, H. Kufluoglu, M.A. Alam, and K. Roy, “Impact of NBTI on the temporal performance degradation of digital circuits,” in IEEE Electron Device Letters, Vol. 26, No. 8, pp. 560-562, 2005.

[5] Y. Lu, et al., “Statistical reliability analysis under process variation and aging effects,” in IEEE Int. Design Automation Conference (DAC), pp. 514-519, 2009.

[6] M.A.A. Latif, N.B.Z. Ali, and F.A. Hussin, “A case study of process-variation effect to SoC analog circuits,” in IEEE Int. Conf. Recent Advances in Intelligent Computational Systems (RAICS), pp. 520-523, 2011.

[7] S.K. Krishnappa, H. Singh, and H. Mahmoodi, “Incorporating Effects of Process, Voltage, and Temperature Variation in BTI Model for Circuit Design,” in IEEE Latin American Symposium on Circuits and Systems, pp. 236-239, 2010.

[8] O.S. Unsal, et al., “Impact of Parameter Variations on Circuits and Microarchitecture,” in IEEE Micro, Vol. 26, No. 6, pp. 30-39, 2006.

[9] R. Kumar and V. Kursun, “Impact of temperature fluctuations on circuit characteristics in 180nm and 65nm CMOS technologies,” in IEEE Int. Symp. on Circuits and Systems (ISCAS), pp. 3858-3861, 2006.

[10] M. Alioto and G. Palumbo, “Impact of Supply Voltage Variations on Full Adder Delay: Analysis and Comparison,” in IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 12, pp. 1322-1335, 2006.

[11] N.K. Jha, P.S. Reddy, D.K. Sharma, and V.R. Rao, “NBTI degradation and its impact for analog circuit reliability,” in IEEE Trans. Electron Devices, Vol. 52, No. 12, pp. 2609-2615, 2005.

[12] A. Baschirotto, et al., “Low Power Analog Design in Scaled Technologies,” in Topical Workshop on Electronics for Particle Physics (TWEPP), pp. 103-110, 2009.

[13] R.W. Johnson, et al., “The Changing Automotive Environment: High Temperature Electronics,” in IEEE Trans. on Electronics Packaging Manufacturing, Vol. 27, No. 3, pp. 164-176, 2004.

[14] S.C. dela Cruz, et al., “Design and implementation of operational amplifiers with programmable characteristics in a 90nm CMOS process,” in IEEE Eur. Conf. Circuit Theory and Design, pp. 209-212, 2009.

[15] A. Avizienis, J-C. Laprie, and B. Randell, “Fundamental concepts of dependability,” in Laboratory for Analysis and Architecture of Systems (LAAS-CNRS) Technical Report no. 01-145, Apr. 2001.
