UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL INSTITUTO DE INFORMÁTICA CURSO DE ENGENHARIA DE COMPUTAÇÃO

RODRIGO NOGUEIRA WUERDIG

# Low-Power Design of CMOS Time-to-Digital Converters

Work presented in partial fulfillment of the requirements for the degree of Bachelor in Computer Engineering

Advisor: Prof. Sergio Bampi

Co-advisor: M.Sc. Bruno Canal

Porto Alegre May 2022

UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL

Rector: Prof. Carlos André Bulhões Mendes

Vice-Rector: Prof. Patricia Helena Lucas Pranke

Pro-Rector for Education (Undergraduate and Graduate): Prof. Cíntia Inês Boll

Director of the Instituto de Informática: Prof. Carla Maria Dal Sasso Freitas

Director of the Escola de Engenharia: Prof. Carla Schwengber Ten Caten

Coordinator of the Computer Engineering Program: Prof. Walter Fetter Lages

Head Librarian of the Instituto de Informática: Beatriz Regina Bastos Haro

Head Librarian of the Escola de Engenharia: Rosane Beatriz Allegretti Borges

*"It is by logic that we prove, but by intuition that we discover. To know how to criticize is good; to know how to create is better."* - JULES-HENRI POINCARÉ

#### ACKNOWLEDGEMENTS

I could not start differently than by deeply thanking my dear family and girlfriend, those who always motivated me to follow my inner aspirations and were always there for me.

"If I have seen further it is by standing on the shoulders of Giants," states a famous quotation attributed to a letter from Sir Isaac Newton to Robert Hooke. If I were to look at that quote from my own perspective, especially during my brief academic career, I would attribute the term "giant" to those who guided me on this wonderful path. First and foremost, I would credit my then-teacher, Matheus Trevisan, who introduced me to the fascinating field of microelectronics, a topic that absolutely caught my attention and kept me curious. Equally, Prof. Ney Laert Vilar Calazans (PUCRS) and Prof. Sergio Bampi (UFRGS), both of whom I deeply admire and respect as my former undergraduate research advisors, are two more magnificent giants. However, I am a lucky person who had several other giants on my path, including Matheus Bohrer: I was looking for someone to help me deal with PDEs, and I found a great and clever friend. The same goes for my comrades from the 215 laboratory, Vitor Lima, Brunno Abreu, Iago Severo, and Guilherme Ferreira, who always kept the lab in a good mood. My professional colleagues and fellow friends, Lucio Franco, Tiago Fróes, Marcos Sartori, and Matheus Ferronato, also supported my trajectory with their friendship and their knowledge of countless topics, which often piqued my interest. I would also like to thank Eduardo Uzejka and Anderson Sant'Ana for their friendship and support over all those years, and, last but not least, my co-advisor, Bruno Canal, for his support and valuable insights on this work, and all HaiLa employees for their kindness and understanding.

#### **ABSTRACT**

The Time-to-Digital Converter (TDC) is an important circuit block for digitally quantifying the time displacement between digital events. Among several applications of the TDC, this work focuses on its application to low-power Successive-approximation Analog-to-Digital Converters (SAR ADC). The TDCs can assist the SAR algorithm to improve the energy efficiency of capacitive DAC switching schemes, which constitute a key block in the SAR ADC. This work has four main goals: i) investigating physical device sizing optimizations for digital cells using methods applicable prior to the actual circuit and cell design; ii) using device-level simulations to determine the possibility of using ZTC operation of MOSFETs, at less-than-nominal  $V_{DD}$ , for the target 28nm CMOS technology; iii) comparing different D Flip-Flop topologies in terms of setup time requirement, power, and energy per operation; and iv) designing a flash architecture TDC for the aforementioned application. The implemented coarse 8-bit deep TDC, in a manufacturable 28 nm Bulk CMOS technology, displayed good coverage of the SAR-ADC input after a calibration step. The TDC had a simulated mean power dissipation of just  $9.25 \mu W$  at 600 mV supply voltage, making it a good option for applications that are not very demanding in terms of precision.

Keywords: Time-to-Digital Converters. Low-Power Design. VLSI CMOS. Mixed-Signal CMOS.

# Design of a Low-Power-Dissipation Time-to-Digital Converter in CMOS Technology

# RESUMO

Time-to-Digital Converters (TDC) are extremely important in electronic systems for quantizing the time between events of a digital nature. Among the several applications of a TDC, this work focuses on its application to low-power successive-approximation analog-to-digital converters (SAR ADC). The use of TDCs can help the SAR search algorithm increase the energy efficiency of the capacitive switching scheme of the digital-to-analog converter (DAC), which is a critical block of SAR ADCs. This work has four main purposes: i) to investigate transistor sizing optimizations for digital cells before the development of the circuit and of the cells that compose it; ii) to perform device-level simulations to investigate the possibility of using the ZTC operating point of MOSFETs, below the nominal $V_{DD}$, for the 28 nm technology targeted by this work; iii) to compare different D-type flip-flop topologies in terms of setup time, power dissipation, and energy per operation; iv) to develop a flash-type TDC for SAR analog-to-digital converters. The implementation of the 8-bit deep TDC, in a manufacturable 28 nm bulk CMOS technology node, showed good coverage of the possible SAR ADC inputs after the calibration step. The simulation results indicate that the mean power dissipated by the TDC reached values such as $9.25\,\mu$W at 600 mV, making it a good option for circuits that are not very demanding in terms of precision.

Keywords: Time-to-Digital Converters, Low-Power, CMOS, Mixed-Signal.

### LIST OF ABBREVIATIONS AND ACRONYMS

- ADC Analog-to-Digital Converter
- ADPLL All-Digital Phase-Locked Loop
- ASIC Application-Specific Integrated Circuit
- BSIM Berkeley Short-Channel IGFET Model
- CDAC Capacitor Digital-to-Analog Converter
- CMOS Complementary metal-oxide-semiconductor
- DAC Digital-to-Analog Converter
- DFF D-type Flip-Flop
- DNL Differential Non-Linearity
- DTC Digital-to-Time Converter
- DUT Device Under Test
- EDA Electronic Design Automation
- EDP Energy-Delay Product
- ENOB Effective Number of Bits
- FF Flip-Flop
- FinFET Fin Field-Effect Transistor
- FO4 Fan-out of 4
- FoM Figure of Merit
- GA Genetic Algorithm
- GRO-TDC Gated-Ring-Oscillator Time-to-Digital Converter
- HVT High-Threshold-Voltage
- IC Integrated Circuit
- IEA International Energy Agency
- INL Integral Non-Linearity
- IoT Internet of Things
- IP Intellectual Property
- LSB Least Significant Bit
- LVT Low-Threshold-Voltage
- MC Monte Carlo
- MEP Minimum Energy Point
- MIM Metal-Insulator-Metal
- MOM Metal-Oxide-Metal
- MOSFET Metal-oxide-semiconductor Field Effect Transistor
- MSB Most Significant Bit
- NMOS N-channel MOSFET
- NTV Near-Threshold Voltage
- PDK Process Design Kit
- PDP Power-Delay Product
- $P_{dynamic}$  Dynamic Power
- PLL Phase-Locked Loop
- PMOS P-channel MOSFET
- PN p-type and n-type
- $P_{SC}$  Short-Circuit Power
- $P_{static}$  Static Power
- PVT Process, Voltage and Temperature
- RFID Radio-frequency identification
- RO Ring-Oscillator
- RSCE Reverse Short-channel Effect
- SAFF Sense-Amplifier Flip-Flop
- SAR Successive-Approximation
- SNM Static Noise Margin
- SNR Signal-to-Noise Ratio
- SoC System-on-Chip
- STDC Stochastic Time-to-Digital Converter
- STEM Science, Technology, Engineering, and Mathematics
- SVT Standard-Threshold-Voltage
- TA Time Amplifier
- $t_{CO}$  Clock-to-Output Delay
- TDC Time-to-Digital Converter
- $t_{DC}$  Setup Time Interval
- TMSP Time-mode Signal Processing
- ToF Trade-off Function
- $t_{ox}$  silicon-dioxide thickness
- TSPC True Single Phase Clock
- TVC Time-to-voltage Converter
- VDL-TDC Vernier Delay-Line Time-to-Digital Converter
- $V_{DS}$  Drain-to-Source Voltage
- $V_{GB}$  Gate-to-Bulk Voltage
- $V_{GS}$  Gate-to-Source Voltage
- $V_t$  FET Threshold Voltage
- VTC Voltage-to-Time Converter
- $V_{tho}$  Threshold Voltage for Zero Source-to-Substrate Bias
- VS Voltage Scaling
- ZTC Zero-Temperature-Coefficient
- ∆Σ TDC Delta-Sigma Time-to-Digital Converter
- $\tau$  Delay


# <span id="page-13-0"></span>1 INTRODUCTION

Mobile devices play a notable role in today's society, from smartphones to credit cards. They shape the way we interact, behave, and carry out monetary transactions. Most mobile devices rely on a scarce energy supply, provided either by a battery or through energy harvesting (e.g., RFID devices). Energy harvesting, or energy scavenging, is the process of passively gathering energy from the environment, e.g., solar power, thermal energy, wind energy, and kinetic energy. Batteries are vital components of those mobile devices; fortunately, recent advances have enabled battery-cell energy densities to almost triple since 2010 [\(SANGUESA et al.,](#page-70-0) [2021\)](#page-70-0). However, until recently, the volumetric energy density of batteries was not improving at the same rate at which integrated circuits demanded more power and a larger energy supply to last longer. Usually, the more significant advances towards energy efficiency are made in integrated circuits, not in the battery. Those advances in energy efficiency can be correlated with Moore's law. Moore's law is an engineering observation of the trend that the transistor density of integrated circuits doubled every two years for more than five decades. This increased density is a result of advances in lithography and the capability of making smaller transistors, which is consistent with how energy efficiency changes with the gate length. In 2010, Koomey et al. described the trend that the number of computations per joule of energy dissipated doubles about every 1.57 years [\(Koomey](#page-69-0) [et al.,](#page-69-0) [2011\)](#page-69-0). Unfortunately, in deep sub-micron devices, the efficiency gain between successive nodes is getting smaller. To increase efficiency in deep sub-micron devices, designers need to adopt other strategies, such as reducing the supply voltage of the circuit, i.e., Voltage Scaling (VS).

With the growing IoT market, disposable electronics (e.g., RFID in clothes), and the obsolescence of consumer electronics, electronic waste, or e-waste, is raising many environmental concerns. Recent studies indicate that in 2021 alone, over 52 million metric tons of post-consumer e-waste were discarded globally. If the same rate is kept, that amount should double at some point between 2030 and 2040 [\(LEPAWSKY,](#page-69-1) [2020\)](#page-69-1). Moreover, in its annual review on energy, the International Energy Agency (IEA) showed that global electricity demand is heading for its fastest growth in more than ten years. Unfortunately, in 2020, the CO<sub>2</sub> levels due to electrical energy production reached their highest-ever average annual concentration in the atmosphere [\(INTERNATIONAL...,](#page-69-2) [2021\)](#page-69-2).

Transistor aging affects system reliability and timing, which is a vital concern for CMOS devices [\(TAGHIPOUR; ASLI,](#page-71-0) [2017\)](#page-71-0). Aging is related to the injection of charge carriers into the insulator at the silicon-SiO<sub>2</sub> interface; it is caused by the very high electric fields present in MOSFETs, especially in short channels, and has to be minimized. Near-threshold voltage (NTV) operation is an ultra-low-power strategy in which the circuit supply voltage is close to, and in most cases slightly below, the transistor threshold. Since the currents decrease exponentially below moderate inversion, the CMOS energy consumption can be significantly reduced, at the price of a substantially increased circuit delay. At low $V_{DD}$, the current density is lower and the carriers in the channel have lower energy, which drastically reduces the harmful hot-carrier effects and oxide injection, improving the circuit reliability [\(KHDR; AMROUCH; HENKEL,](#page-69-3) [2018\)](#page-69-3). Therefore, NTV is a very resilient operating mode with respect to aging effects while providing the best energy efficiency for IoT SoCs.

The field of green computing aims to reduce the environmental impacts of device manufacturing, operation, and disposal. Reducing the supply voltage can benefit two of the main green computing pillars: the operation of electronic devices (due to the reduction of energy consumption) and their disposal (due to the decrease in aging effects in semiconductors).

<span id="page-14-0"></span>

Figure 1.1: Block diagram of TMSP for analog and digital processing.

Source: [\(ROBERTS; ALI-BAKHSHIAN,](#page-70-1) [2010\)](#page-70-1), modified by the author.

Time-mode Signal Processing (TMSP) is a form of signal processing that uses propagation delay as its primary form of data encoding. The usage of TMSP can reduce both the area and the power dissipation of analog circuits [\(ROBERTS; ALI-BAKHSHIAN,](#page-70-1) [2010\)](#page-70-1). The main building blocks for TMSP, shown in Fig. [1.1,](#page-14-0) are: i) Voltage-to-Time Converters (VTC); ii) Time-to-Voltage Converters (TVC); iii) Digital-to-Time Converters (DTC); iv) Time-to-Digital Converters (TDC). The VTC and TVC convert from one continuous domain (time or voltage) to another continuous domain (voltage or time). The TDC and DTC, on the other hand, convert from a continuous domain (time) to a discrete domain (digital word) and vice versa. This work focuses on the implementation of a TDC.

A time-to-digital converter is implemented in order to quantify the time displacement between events. As in Fig. [1.2,](#page-15-0) it takes two inputs (start and stop) and quantifies the time displacement ($\Delta t$) between them. This displacement is then presented in the form of a digital output word that can be encoded in different ways, e.g., binary or thermometer code. TDCs can be found in a variety of circuits, especially mixed-signal circuits such as phase-locked loops, SAR ADCs, and particle detectors.

<span id="page-15-0"></span>Figure 1.2: Waveform of TDC inputs and the analog interval  $\Delta t$  to be quantized or digitized.



Source: The Author
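The quantization described above can be illustrated with a minimal behavioral sketch of an ideal TDC. It is not the circuit implemented in this work: the function names, the floor-type quantization, the saturation at full scale, and the use of integer picoseconds are all assumptions made for illustration.

```python
def ideal_tdc(t_start_ps, t_stop_ps, lsb_ps, n_bits):
    """Quantize the interval between start and stop into an n_bits code.

    ASSUMPTION: ideal behavior (uniform LSB, floor quantization, saturation
    at full scale); times are integers in picoseconds to avoid rounding.
    """
    delta_t = t_stop_ps - t_start_ps
    if delta_t < 0:
        raise ValueError("stop event must not precede start event")
    max_code = (1 << n_bits) - 1
    return min(delta_t // lsb_ps, max_code)


def to_thermometer(code, length):
    """Thermometer encoding: `code` ones followed by zeros."""
    if not 0 <= code <= length:
        raise ValueError("code does not fit in the thermometer word")
    return [1] * code + [0] * (length - code)


code = ideal_tdc(t_start_ps=0, t_stop_ps=35, lsb_ps=10, n_bits=3)
print(code, format(code, "03b"), to_thermometer(code, 7))
```

The same interval thus maps to either a compact binary word or a longer thermometer word, which is the form a chain of sampling flip-flops naturally produces.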

With the increase in popularity of IoT, applications such as mesh wireless sensor networks are already a reality. Those node devices are usually battery-free (e.g., RFID) or powered by tiny batteries. In order to acquire the analog information of sensors, those devices rely on Analog-to-Digital Converters (ADC). Improving the energy efficiency of ADCs is crucial for extending the lifetime and efficiency of IoT devices. Instead of relying on several comparators, Successive-approximation Analog-to-Digital Converters (SAR ADC) use a search scheme to improve energy efficiency while performing the analog-to-digital conversion. A built-in digital-to-analog converter (DAC) and a comparator are used repeatedly during each phase of the binary search, see Fig. [1.3.](#page-16-0) Usually, the internal DAC of a SAR ADC is a capacitive digital-to-analog converter (CDAC). The linearity of the complete SAR ADC is very dependent on the DAC's linearity. Unfortunately, because of mismatch effects, the CDAC linearity depends on the capacitor sizes used, as smaller capacitors tend to present higher variations than larger ones. Furthermore, to improve SAR ADC linearity, designers can rely on post-fabrication calibration strategies.
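The binary search performed by the SAR logic can be sketched behaviorally as follows. This is a minimal illustration assuming an ideal DAC and an ideal comparator; the function name and the unipolar input range from 0 to `v_ref` are assumptions, and capacitor mismatch is not modeled.

```python
def sar_convert(v_in, v_ref, n_bits):
    """Ideal SAR conversion: one comparator decision per bit, MSB first."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)               # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)   # ideal DAC output for the trial code
        if v_in >= v_dac:                       # comparator decision
            code = trial                        # keep the bit
    return code


print(sar_convert(v_in=0.6, v_ref=1.0, n_bits=8))
```

Each loop iteration reuses the same DAC and comparator, which is why the conversion needs only `n_bits` comparisons instead of one comparator per level.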

<span id="page-16-0"></span>

Source: [\(TANI et al.,](#page-71-1) [2017\)](#page-71-1), modified by the author.

The usage of TDCs can assist the SAR algorithm in improving the energy efficiency of capacitive DAC switching schemes. There are several ways of applying a TDC to SAR ADC architectures. In coarse time-domain SAR ADCs, a differential voltage-to-time converter (VTC) can be used as an input to the TDC to quantify different voltage levels from the capacitive digital-to-analog converter (CDAC) [\(WU et al.,](#page-71-2) [2014\)](#page-71-2). This work implements another approach: measuring the time taken by the comparator, since different input voltages lead to different delays. The non-linearity between the input voltage range and time is the fundamental flaw of this strategy. However, this strategy also presents low area and power costs when compared to other approaches.
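The non-linearity between input voltage and comparator delay can be illustrated with a first-order model of a regenerative (latch-type) comparator, in which the decision time grows with the logarithm of the inverse input amplitude. This model and every parameter value below (`tau`, `t_fixed`, `v_logic`) are assumptions for illustration only; they are not taken from this work or from the comparator it uses.

```python
import math


def comparator_delay(v_diff, tau=20e-12, t_fixed=50e-12, v_logic=0.6):
    """First-order regeneration delay of a latch comparator, in seconds.

    ASSUMPTION: t = t_fixed + tau * ln(v_logic / |v_diff|), with hypothetical
    time constant, fixed delay, and logic swing.
    """
    if v_diff == 0:
        raise ValueError("a zero input never resolves in this ideal model")
    return t_fixed + tau * math.log(v_logic / abs(v_diff))


for v in (0.3, 0.15, 0.075):
    print(f"{v:.3f} V -> {comparator_delay(v) * 1e12:.1f} ps")
```

Under this assumption, each halving of the input adds the same delay increment (`tau * ln 2`), so equal time steps of the TDC correspond to unequal voltage steps, which is why a calibration step is useful.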

A Phase-Locked Loop (PLL) is a control system that generates an output signal synchronized with a reference source. In steady state, the output frequency is usually related to the reference source frequency. The PLL can track the source frequency and generate multiple frequencies from the signal source. PLLs can be analog, fully digital, or mixed-signal, and the appropriate topology changes according to the application. Even though this work focuses on the development of a TDC for SAR ADCs, the same methodology can also be applied to the development of TDCs for All-Digital Phase-Locked Loops (ADPLL) [\(STASZEWSKI et](#page-71-3) [al.,](#page-71-3) [2004\)](#page-71-3) [\(STASZEWSKI et al.,](#page-71-4) [2005\)](#page-71-4). The ADPLL relies on the inherent benefits of the semi-custom design flow to reduce power dissipation, reduce area, and shorten the design time, as there are several consolidated EDA tools already adopted by the industry for digital IC design.

Nowadays, several devices rely on wireless communication. This communication can use several different technologies: I) Bluetooth Low Energy; II) Zigbee; III) Z-Wave; IV) LoRa / LoRaWAN; V) SigFox; VI) WiMAX. Those technologies can rely on proprietary hardware built with both ADPLLs and other non-digital PLLs; e.g., the LoRa transceiver IP generates a stable chirp using a fractional-N phase-locked loop (PLL). Other technologies, such as WiMAX, rely on the incorporation of all-digital PLLs. Salvatore Levantino et al. [\(LEVANTINO et al.,](#page-70-2) [2009\)](#page-70-2) described the design of an ADPLL for the WiMAX 3.3-3.8 GHz band.

The motivation for this work arises because time-to-digital converters, due to their high-switching nature, are very power-hungry. Regardless of the target application, designers are constantly putting effort into improving the energy efficiency of TDCs (as in Chapter [5\)](#page-41-1) or into removing them completely. Xing Chen et al. [\(CHEN et al.,](#page-68-1) [2019\)](#page-68-1) proposed a novel architecture for low-power Bluetooth in which the ADPLL power dissipation is mitigated by removing the explicit TDC and the normalization circuit and employing what they call an embedded TDC, thus relying on a divider-less ADPLL design. However, the TDC resolution is limited by the number of RO stages at a high frequency where an ADPLL for aggressive in-band PN suppression is implemented.

This work is organized as follows: after the present introduction, Chapter [2](#page-18-0) reviews some basic low-power design methodologies and concepts that can be used for the design of a low-power TDC. Since flip-flops are key parts in the functioning of a TDC, Chapter [3](#page-27-1) deals with different types of flip-flops and shows the three different D-type topologies that were considered for the TDC design. Chapter [4](#page-31-0) revisits the basic TDC concepts, introduces the main performance metrics and their main characteristics, and also describes seven different TDC architectures. Chapter [5](#page-41-1) presents state-of-the-art works related to the design of TDCs and highlights which aspects and techniques of each work are interesting for the aforementioned application. Chapter [6](#page-45-1) describes the steps taken in the design of the 8-bit deep flash TDC proposed in this work.

#### <span id="page-18-0"></span>2 REVIEW ON LOW-POWER DESIGN

As introduced in the previous chapter, this work focuses on the development of a digital CMOS circuit. For most applications, time-to-digital converters present a high switching nature and high power dissipation. This chapter introduces the metrics and techniques that can be used to reduce the circuit's power dissipation and increase the battery life of mobile devices. In order to understand the energy consumption of a device, some basic concepts about power dissipation must first be addressed. The power dissipated by a circuit can be divided into two main parts: Static Power and Dynamic Power.

• Static Power ($P_{static}$) - the power dissipated while the circuit has no digital switching activity, defined as in Eq. [2.1.](#page-18-1)

<span id="page-18-1"></span>
$$
P_{static} = I_{static} \cdot V_{DD} \tag{2.1}
$$

• Dynamic Power ($P_{dynamic}$) - intrinsically related to the switching activity and composed of two main parts, as in Eq. [2.2.](#page-18-2) The switching power ($P_{switching}$) is the power dissipated when the driver devices change the charge state of the capacitive loads at each gate output, between logic states 0 to 1 (charging the load) and 1 to 0 (discharging the stored charge in the load). The short-circuit power ($P_{SC}$) is the power dissipated by the current drawn directly from the supply when the inputs of the gates are transitioning such that both the pull-up and the pull-down networks are on. The capacitive load $C_{Load}$ comprises several contributions, such as the MOSFET gate capacitance, wire load, and fan-out loads; $\alpha$ is the switching activity factor and $f$ is the operating frequency.

<span id="page-18-2"></span>
$$
P_{dynamic} = P_{switching} + P_{SC} \tag{2.2}
$$

$$
P_{dynamic} = C_{Load} V_{DD}^2 \alpha f + P_{SC} \tag{2.3}
$$

<span id="page-18-3"></span>As part of the digital CMOS dynamic power dissipation, the short-circuit power ($P_{SC}$) appears during the transient switching of a complementary pair of transistors, when the input is near half of the supply voltage $V_{DD}$, such that both the PMOS and NMOS transistors exhibit a strong-inversion channel charge; hence, during this short transient, a short-circuit current ($I_{SC}$) flows from the $V_{DD}$ rail to the $V_{SS}$ rail. The short-circuit power is described in Eq. [2.4](#page-18-3) [\(RABAEY,](#page-70-3) [1996\)](#page-70-3).

$$
P_{SC} \approx V_{DD} I_{SC} \frac{\tau_{in}}{4} 2f \approx V_{DD}^2 f \frac{C_{Load}}{10} \tag{2.4}
$$
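The equations above can be evaluated numerically to see how strongly the supply voltage affects power. The sketch below is a direct transcription of Eqs. 2.1, 2.3, and the second approximation of Eq. 2.4; the function names and all numeric values (load capacitance, activity factor, frequency, leakage current) are hypothetical and chosen only for illustration.

```python
def static_power(i_static, v_dd):
    """Eq. 2.1: static power in watts."""
    return i_static * v_dd


def short_circuit_power(c_load, v_dd, f):
    """Second approximation of Eq. 2.4: P_SC ~ V_DD^2 * f * C_Load / 10."""
    return v_dd ** 2 * f * c_load / 10


def dynamic_power(c_load, v_dd, alpha, f):
    """Eq. 2.3: switching power plus short-circuit power."""
    return c_load * v_dd ** 2 * alpha * f + short_circuit_power(c_load, v_dd, f)


# Hypothetical operating point: 1 fF load, 10% activity, 1 GHz, 1 nA leakage.
C_LOAD, ALPHA, FREQ, I_STATIC = 1e-15, 0.1, 1e9, 1e-9

for v_dd in (0.9, 0.6):
    p_dyn = dynamic_power(C_LOAD, v_dd, ALPHA, FREQ)
    p_stat = static_power(I_STATIC, v_dd)
    print(f"V_DD = {v_dd:.1f} V: P_dynamic = {p_dyn * 1e9:.1f} nW, "
          f"P_static = {p_stat * 1e9:.2f} nW")
```

Because both terms of Eq. 2.3 scale with the square of $V_{DD}$ in this approximation, lowering the supply from 0.9 V to 0.6 V at a fixed frequency reduces the dynamic power to $(0.6/0.9)^2 \approx 44\%$ of its original value.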

A reduction of the supply voltage leads to a decrease in power dissipation; however, the operation delay is also severely affected. The energy per operation relates the total power dissipation ($P_{dynamic} + P_{static}$) to the operation delay ($\tau$) and is defined as:

$$
Energy/op = (P_{dynamic} + P_{static}) \cdot \tau \tag{2.5}
$$

Most devices have a minimum-energy point (MEP), i.e., the supply voltage at which the least energy is used to perform a given task, regardless of the time spent; it usually lies around 300 to 500 mV. In Fig. [2.1,](#page-19-0) De et al. [\(DE; VANGAL; KRISHNAMURTHY,](#page-68-2) [2017\)](#page-68-2) show how the energy per operation changes with the supply voltage $V_{DD}$. The power dissipation of the measured chip can vary from 2 to 737 mW. Due to design limitations, the cache cannot operate below 550 mV, which limits the voltage-scaling range of the SoC. However, by separating the cache and the core logic into different voltage domains, the authors were able to achieve better results. Despite the increased design complexity, a larger number of voltage domains permits exploring voltage scaling more efficiently in each domain. Energy per operation at the NTV operating point of 450 mV is almost five times better than at the nominal voltage. It is highly plausible that the design MEP would be closer to 300 mV without the cache limitation.
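The existence of an MEP can be reproduced with a toy model: the dynamic energy falls with $V_{DD}^2$, while the leakage energy grows at low $V_{DD}$ because the delay increases exponentially below threshold. The sketch below is only an illustration under stated assumptions: an EKV-style current interpolation, a delay proportional to $V_{DD}/I_{on}$, and hypothetical values for the threshold voltage, slope factor, and `leak_factor`. It does not reproduce the measured data of Fig. 2.1.

```python
import math

PHI_T = 0.0259  # thermal voltage at about 300 K, in volts


def drive_current(v_gs, v_t=0.35, n=1.3):
    """Normalized EKV-style current, smooth from weak to strong inversion."""
    x = (v_gs - v_t) / (2 * n * PHI_T)
    return math.log1p(math.exp(x)) ** 2


def energy_per_op(v_dd, leak_factor=1000.0):
    """Energy per operation, normalized to the switched capacitance.

    Dynamic term: V_DD^2. Leakage term: off-current * V_DD * delay, with the
    delay proportional to V_DD / on-current. ASSUMPTION: leak_factor is a
    hypothetical constant lumping logic depth and the share of idle gates.
    """
    i_on = drive_current(v_dd)
    i_off = drive_current(0.0)
    return v_dd ** 2 * (1.0 + leak_factor * i_off / i_on)


def find_mep(v_min_mv=100, v_max_mv=900, step_mv=10):
    """Supply voltage (in volts) with the lowest energy on a millivolt grid."""
    voltages = [mv / 1000 for mv in range(v_min_mv, v_max_mv + 1, step_mv)]
    return min(voltages, key=energy_per_op)


v_mep = find_mep()
gain = energy_per_op(0.9) / energy_per_op(v_mep)
print(f"MEP near {v_mep:.2f} V, {gain:.1f}x less energy than at 0.9 V")
```

In this model, a larger `leak_factor` (more idle, leaking logic per switching event) pushes the MEP towards higher supply voltages, which is qualitatively consistent with a leakage-dominated block limiting the voltage scaling of a whole SoC.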



<span id="page-19-0"></span>

To mitigate power dissipation in electronic devices, several techniques have been developed with the advancements in STEM areas over the last 50 years. Those techniques can cover one or more abstraction layers. Nowadays, an emerging field in energy-efficient circuits is hardware-software co-design. As described in Low Power Design Methodologies by Rabaey et al. [\(RABAEY,](#page-70-3) [1996\)](#page-70-3), those abstraction layers can be classified as in Fig. [2.2.](#page-20-0) A simplified view of the system leads to the perception that every technique applied at any abstraction layer ultimately reduces power dissipation by lowering the supply voltage $V_{DD}$, the voltage swing, the physical capacitance, the switching activity, or a combination of them.

<span id="page-20-0"></span>Figure 2.2: An ultra-low-power solution would ideally require optimization at all design abstraction layers.



Source: [\(RABAEY,](#page-70-3) [1996\)](#page-70-3), Modified by the Author

Acting on one abstraction layer is likely to affect just a few of these basic quantities; e.g., at the algorithm layer, we would mostly reduce the switching activity. Acting on every abstraction layer of the project is therefore highly important in order to obtain a truly low-power solution.

### <span id="page-20-1"></span>2.1 System Level

While designing a system, besides defining the hardware and its power requirements, engineers can often rely on techniques such as partitioning and power states to reduce the circuit's power dissipation. On modern SoCs, a common approach is to introduce sleep or idle modes in which the power dissipation is drastically reduced [\(BENINI; MICHELI,](#page-68-3) [1999\)](#page-68-3). Those system-level approaches can benefit from several different low-power techniques, e.g., frequency scaling, voltage scaling, and power gating. The advances described by Moore's law, along with the increase in the density of transistors per $\mu$m<sup>2</sup>, also led to an increase in the switching activity per area. This increase in switching power density caused a problem popularly called "Dark Silicon," in which partitioning and power gating of blocks (cores) inside multi-core chips is a must to reduce the power/thermal dissipation of the circuits [\(ESMAEILZADEH et al.,](#page-69-4) [2011\)](#page-69-4).

### <span id="page-21-0"></span>2.2 Algorithm Level

Previous works showed the impact of algorithm-level optimizations in terms of the Energy-Delay Product (EDP) for different languages, algorithms, compilers, and implementation choices [\(ABDULSALAM et al.,](#page-68-4) [2014\)](#page-68-4) [\(GEORGIOU et al.,](#page-69-5) [2018\)](#page-69-5). Since energy is defined as power times delay, the EDP, which multiplies energy by delay once more, puts quadratic importance on delay. Software running on an embedded system can be optimized by tweaking several aspects, such as the language, compiler, compiler flags, complexity, concurrency, regularity, and locality.
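The quadratic weight that the EDP places on delay can be seen in a small numeric comparison. The two implementations and their power and runtime figures below are hypothetical, chosen only to show that the lowest-energy option is not necessarily the lowest-EDP option.

```python
def energy(power_w, delay_s):
    """Energy in joules: power times delay."""
    return power_w * delay_s


def edp(power_w, delay_s):
    """Energy-Delay Product in joule-seconds: power times delay squared."""
    return energy(power_w, delay_s) * delay_s


# Hypothetical implementations of the same task: (average power, runtime).
implementations = {"A (slow, low power)": (0.010, 2.0),
                   "B (fast, high power)": (0.025, 1.0)}

for name, (p, t) in implementations.items():
    print(f"{name}: E = {energy(p, t) * 1e3:.0f} mJ, "
          f"EDP = {edp(p, t) * 1e3:.0f} mJ*s")
```

With these numbers, implementation A uses less energy, while implementation B has the lower EDP because its shorter runtime is counted twice.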

### <span id="page-21-1"></span>2.3 Architectural Level

On the architectural level, several techniques for digital systems were developed, mostly during the 1990s. Parallelizing hardware functions to work at lower frequencies, clock gating of functional units, power gating of functional blocks, and similar techniques are powerful enablers of power savings in digital processing. Other schemes for energy efficiency include low-power data encoding and low-power processing architectures. Yang and Gupta [\(YANG; GUPTA,](#page-72-0) [2001\)](#page-72-0) proposed a clever way of encoding data on a bus according to a small number of distinct values, which they called "frequent values." This dramatically reduces the bus switching activity. Chapter [4](#page-31-0) covers most state-of-the-art topologies for low-power TDCs.

#### <span id="page-21-2"></span>2.4 Circuit/Logic Design Level

At this level, the implementation of digital blocks like adders, encoders, comparators, and multipliers can be optimized for power and energy minimization. Techniques like complex CMOS logic, arithmetic unit approximation, and approximate logic synthesis can be used judiciously, as long as they do not compromise the full-system application performance [\(PAIM,](#page-70-4) [2021\)](#page-70-4). Transistor sizing, as further seen in Sec. [6.3,](#page-53-1) can also be labeled as a circuit abstraction technique.

#### <span id="page-22-1"></span>2.5 Physical/Device Level

P and N diffusions have different electrical behaviors, and their asymmetry at low $V_{DD}$ becomes even more delicate, since a few millivolts can decide whether the transistor is in weak, moderate, or strong inversion. To increase the symmetry between PMOS and NMOS transistors, several characteristics can be tweaked: doping concentration, silicon-dioxide thickness, electrical potential, and transistor dimensions (length and width).

<span id="page-22-0"></span>

Source: The Author

The variation in doping concentration (e.g., of boron, arsenic, and phosphorus dopants) in metal-oxide-semiconductor field-effect transistors (MOSFET) severely impacts transistor behavior. This behavior is significantly correlated with the change in the transistor threshold voltage caused by the doping concentration [\(TAKAMIZAWA et al.,](#page-71-5) [2012\)](#page-71-5). Process Design Kits (PDKs) often offer several transistor types; some popular ones are Low-Threshold-Voltage (LVT), Standard-Threshold-Voltage (SVT), and High-Threshold-Voltage (HVT), whose threshold voltage difference is mainly due to the doping concentration or the silicon-dioxide thickness ($t_{ox}$). Unfortunately, fine-tuning the dopant concentration levels and the silicon-dioxide thickness is usually not available to designers, since these are set by the manufacturing process. Furthermore, a thicker $t_{ox}$ leads to a larger $V_t$ variation and results in higher random dopant fluctuation and a higher $V_t$ mismatch between devices, which would be terrible for time-sensitive designs such as a TDC.

### <span id="page-23-1"></span>2.6 Standard-Cell Sizing

Designers of digital cells are able to tweak a few dimensions of the transistor. The geometry of a transistor plays a significant role in its functionality. The increase in gate length, *L* in Fig. [2.3,](#page-22-0) severely impacts the threshold voltage since the threshold voltage is defined as the minimal gate-to-source voltage  $(V_{GS})$  to create a conducting pathway between the drain and source terminals. Increasing the distance between the drain and source terminals requires a higher electrical potential (threshold voltage) between the gate and source to create a conducting path.

<span id="page-23-0"></span>



Source: The Author

Blesken et al. [\(Blesken et al.,](#page-68-5) [2009\)](#page-68-5) proposed a multi-objective sizing optimization method for parameters like static noise margin (SNM), delay, and dynamic energy consumption, which can be arranged into two- and three-dimensional search spaces for optimization and analysis. In [\(Blesken; Lütkemeier; Rückert,](#page-68-6) [2010\)](#page-68-6), their method was tested for sub-threshold operation and the impacts on the noise margin were studied. In [\(Lutkemeier et](#page-70-5) [al.,](#page-70-5) [2013\)](#page-70-5), the methodology was ASIC-proven. Their method, however, exploits effects that are no longer so relevant for more advanced CMOS nodes, such as the reverse short-channel effect (RSCE), which no longer prevails in undoped-channel FinFETs, for instance [\(THEAN et al.,](#page-71-6) [2006\)](#page-71-6).

Previous work by the author demonstrated a systematic evaluation of different transistor sizing methodologies for ultra-low-power operation. The presented methodologies correlate different library design optimization goals, such as area, maximum attainable frequency, energy consumption, and symmetric transition slews. The proposed method can be applied to any CMOS technology, and its results establish the methodology as a simple approach to enhance energy efficiency for NTV [\(WUERDIG et al.,](#page-72-1) [2020\)](#page-72-1). In the aforementioned work, simulations were performed for a 40 nm bulk CMOS technology, which is very similar to the one used in this work, with a nominal supply voltage ($V_{DD}$) of 900 mV. In order to operate in NTV, the $V_{DD}$ is reduced down to 0.3 V, which is very near to the digital MEP, as previously found in [\(Rosa et al.,](#page-70-6) [2015\)](#page-70-6) [\(JAIN; LIN;](#page-69-6) [ALIOTO,](#page-69-6) [2017\)](#page-69-6).

First, the trade-off functions (TOFs) proposed in [\(WUERDIG et al.,](#page-72-1) [2020\)](#page-72-1) for cell-design optimization are presented. The presented TOFs are meaningful Figures-of-Merit (FoMs) aiming to customize the sizing of the FETs in the logic cell for each near- $V_T$  application. Design variables (sizes) and performance figures, like energy loss per switching pair events (L-H-L or H-L-H), frequency, delays, etc., are normalized in the TOF formulation, according to the equation shown below:

$$
std(x) = \frac{x - min(x)}{max(x) - min(x)}, \tag{2.6}
$$

where x is the circuit metric to be normalized,  $std(x)$  is the normalized value of x,  $min(x)$ is the minimum value of x, and  $max(x)$  is the maximum value of x over a given design-space exploration. Despite normalization, some variables or metrics, such as cell area cost, might be more difficult to model directly (in  $\mu$m) from the transistor sizes. To consider a TOF with two design parameters (e.g., area and delay, or delay and power), the approach includes two weight constants,  $K_1$  and  $K_2$, which express the designer's intent in realizing trade-offs and allow fine-tuning of the FET sizing in the standard cells.
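The normalization of Eq. 2.6 can be illustrated with a minimal Python sketch. The function name `std` mirrors the notation of the text, while the sample energy values and the handling of a flat sweep (returning zeros) are assumptions made only for this illustration.

```python
def std(values):
    """Min-max normalisation of Eq. 2.6: maps each value to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a flat sweep has no spread, so every point maps to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical energy-per-pulse values (fJ) from a W_PMOS sweep.
energy_fj = [2.0, 2.5, 3.0, 4.0]
normalised = std(energy_fj)
print(normalised)
```

After normalization, the smallest value in the explored set maps to 0 and the largest to 1, so metrics with different units can be combined in the same TOF.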

The TOF analyses were performed at both the nominal and near-threshold supply  $V_{DD}$, considering multiple drive strengths for the inverter or logic cell under study. These simulations use SPICE transient analysis, in which  $W_{PMOS}$  is swept from the current value of  $W_{NMOS}$  up to 1.44  $\mu$m (a foundry-imposed design rule for maximum W), for multiple values of  $W_{NMOS}$. Transistor sizing for minimum delay uses ring-oscillator (RO) simulations as test benches, since the RO oscillation frequency is inversely proportional to the average of the  $tp_{hl}$  and  $tp_{lh}$  delays. This RO method allows for the evaluation of both cell timing and power.

To model realistic fan-out scenarios, four minimum-size inverters per drive strength are added to both the input and the output (FO4) of the ring oscillator (RO). The RO test setup is therefore composed of five stages: three inverter stages (with four inverters each), one Device Under Test (DUT), and one NAND2 that enables or disables oscillation. At FO4, this RO benchmark attains a nominal frequency of about 15 GHz (in 40 nm CMOS, at the nominal  $V_{DD}$  and temperature).

# <span id="page-25-1"></span>2.6.1 Energy vs. Maximum Frequency Trade-off

The TOF that considers both energy consumption and maximum frequency is given by

<span id="page-25-0"></span>
$$
F_1(\xi) = \frac{K_1 \cdot std(\overline{U}) + K_2 \cdot std(T_{min})}{2}, \tag{2.7}
$$

where  $\xi$  is the ratio of  $W_{PMOS}$  to  $W_{NMOS}$,  $T_{min}$  is the period of the oscillation (i.e., the inverse of the maximum RO frequency),  $K_1$  and  $K_2$  are weight constants, and  $\overline{U}$  is the energy consumption per output pulse (L-H-L).

The first term of  $F_1$  represents the normalized energy consumption, while the second term represents the normalized average delay, which is minimized in order to maximize the oscillation frequency. The constants  $K_1$  and  $K_2$  allow cell designers to give more weight in the TOF to either speed performance or energy consumption.
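As an illustration of how  $F_1$  could be evaluated over a sizing sweep, the sketch below combines the normalization of Eq. 2.6 with the weighted sum of Eq. 2.7. The sweep values are hypothetical, and selecting the sizing that minimizes  $F_1$  is an assumption of this example (both normalized terms are costs, so lower is taken as better).

```python
def std(values):
    """Min-max normalisation (Eq. 2.6)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def f1(energy, period, k1=1.0, k2=1.0):
    """Energy vs. maximum-frequency TOF (Eq. 2.7) for each sweep point."""
    return [
        (k1 * u + k2 * t) / 2
        for u, t in zip(std(energy), std(period))
    ]


# Hypothetical sweep of xi = W_PMOS / W_NMOS with simulated RO results.
xi = [1.0, 1.5, 2.0, 2.5]
energy_fj = [2.0, 2.4, 3.0, 4.0]       # energy per output pulse
period_ps = [100.0, 80.0, 70.0, 68.0]  # RO oscillation period T_min

scores = f1(energy_fj, period_ps)
best_xi = xi[scores.index(min(scores))]  # assumed criterion: lowest F1
print(best_xi)
```

Raising `k1` relative to `k2` would shift the selected point towards lower-energy sizings, while the opposite favors speed.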

#### <span id="page-26-0"></span>2.6.2 Area Cost vs. Maximum Frequency Trade-Off

The area consumed by the ring oscillator is correlated with its oscillation frequency by the following TOF

$$
F_2(\xi) = \frac{K_1 \cdot std(Ac) + K_2 \cdot std(T_{min})}{2}, \tag{2.8}
$$

where  $T_{min}$  is the period of the oscillation,  $K_1$  and  $K_2$  are weight constants, and Ac is the active area of the oscillator. The active area is for now simplified (as a pre-layout estimate) and it is given by  $Ac = W_{PMOS} + W_{NMOS}$ , where W is the width of PMOS and NMOS. While the value of  $W_{NMOS}$  is kept constant, the value of  $W_{PMOS}$  is varied from minimum-Width (120 nm) to 1.44  $\mu$ m, according to foundry design rules.

#### <span id="page-26-1"></span>2.6.3 Diffusion Area Cost vs. Slew-Rate Symmetry Trade-Off

The trade-off between cell area and the slew-rates ratios is given by the following TOF

$$
F_3(\xi) = \frac{K_1 \cdot \xi + K_2 \cdot (\overline{t_{rise}(\xi)}/\overline{t_{fall}(\xi)})}{2}, \tag{2.9}
$$

where  $\xi$  is the ratio of  $W_{PMOS}$  to  $W_{NMOS}$,  $t_{rise}$  and  $t_{fall}$  are, respectively, the rise and fall times of the DUT, and  $K_1$  and  $K_2$  are weight constants.

The first term of  $F_3$  is the area cost, while the second term stands for the slew-rate symmetry. The constants  $K_1$  and  $K_2$  set the focus of the TOF. If  $K_1 > K_2$ , the TOF prioritizes the area cost figure. Meanwhile, if  $K_1 < K_2$ , the TOF prioritizes the slew-rate symmetry. Finally, if  $K_1 = K_2$ , both figures have equal priority. In this work, the latter situation is selected.

#### <span id="page-27-1"></span>3 REVIEW ON FLIP-FLOPS

New solutions are continuously arising in response to the growing interest in sub- and near-threshold digital circuits. Notably, in those conditions, the digital circuit speed is severely affected, being reduced by 3 or 4 orders of magnitude relative to the nominal supply operating condition. As explained in the next section, energy efficiency is a trade-off between delay and dissipated power. To achieve high frequencies, SoCs often rely on increasing the number of temporal barriers, i.e., changing the pipeline granularity. This, in turn, increases the overall number of flip-flops (FFs) present in the architecture. Unfortunately, their performance in terms of setup delay, hold delay, and input-to-output delay ( $t_{CQ}$  or  $t_{DQ}$) is severely hampered under low supply  $V_{DD}$  conditions, significantly affecting the SoC's energy consumption.

Regardless of the TDC architecture, registers are a key part of its construction and, in some cases, are responsible for most of the TDC's power dissipation. In this chapter, three very distinct flip-flops, listed in Tab. [3.1,](#page-27-0) are presented; they are further evaluated in Chapter [6.](#page-45-1) The selected flip-flops are intentionally distinct, as a way of evaluating which characteristics are most beneficial to the design of a low-power TDC. Each selected D-type flip-flop has a different appealing aspect, which does not necessarily bring a great advantage to the design of a low-power TDC. The three selected FFs are: a classic dynamic-logic TSPC; a static-logic PowerPC 603 FF; and a state-of-the-art sense-amplifier-based FF with completion detection (SAFF-TCD). What they have in common is that all are claimed to be low-power solutions, each with its own benefits and drawbacks.



<span id="page-27-0"></span>

# <span id="page-27-2"></span>3.1 TSPC

True Single Phase Clock logic (TSPC) [\(YUAN; SVENSSON,](#page-72-2) [1989\)](#page-72-2) emerged in response to the demand for faster registers, as heavy capacitive loading and long interconnects lead to longer transition times and negatively affect clock skew. Fig. [3.1](#page-28-0) shows a precharged TSPC flip-flop with a total of 11 transistors (nine in dynamic stages, plus two for the static CMOS inverter that provides a positive-unate output). Due to their dynamic-logic nature, the transistors should be carefully sized for proper operation.

<span id="page-28-0"></span>

Figure 3.1: True Single Phase Clock Flip-Flop Schematic

Source: The Author

#### <span id="page-28-1"></span>3.2 PowerPC 603 FF

While the TSPC uses 3 dynamic inverter stages, the PowerPC 603 in Fig. [3.2](#page-29-0) is a static flip-flop. As mentioned by CORNELIUS et al. [\(CORNELIUS et al.,](#page-68-7) [2016\)](#page-68-7), the large total delay  $t_{DC} + t_{CQ}$ , as a consequence of the increased setup time, is one of the main weaknesses of this design. Despite this drawback, static-logic designs have benefits for applications where the input signals are not frequently triggered or deal with dynamic power-states (e.g., the sleep mode of a SoC).

#### <span id="page-28-2"></span>3.3 SAFF-TCD

<span id="page-29-0"></span>

Figure 3.2: PowerPC 603 FF Schematic

Source: [\(CORNELIUS et al.,](#page-68-7) [2016\)](#page-68-7)

Hanwool Jeong et al. proposed a novel Sense-Amplifier Based Flip-Flop that features a completion detection system [\(JEONG et al.,](#page-69-7) [2018\)](#page-69-7). Its topology focuses on low-supply voltage  $(V_{DD})$  operation. Extra circuitry fetches the state of the  $/S$  and  $/R$  signals, indicating whether the sense-amplifier stage has completed. The completion signal gates the pull-down network (PDN) path of the sense-amplifier stage and the slave latch.

The SAFF-TCD can operate at voltages in the near-threshold or subthreshold region for the employed 22 nm FinFET PDK, reaching voltages of about 300–400 mV. The reported results show that the SAFF-TCD is twice as fast as the master-slave-based FF (MSFF). Primarily due to its increased transistor count, the SAFF-TCD may exhibit extra switching activity and fan-in capacitance, leading to an energy consumption overhead of approximately 20% compared to the MSFF.

<span id="page-30-0"></span>

Figure 3.3: SAFF-TCD Schematic



<span id="page-31-0"></span>With the advance of the time-mode signal processing (TMSP) design methodology for analog and mixed-signal circuits, novel circuits are being explored to convert continuous quantities (such as time and voltage) into discrete (digital) data so that they can be processed digitally. The digital manipulation of signals enables designers to exploit several benefits inherent to digital design, such as the short time-to-design of a digital circuit and the low power dissipation of standard cells. Some popular converters are voltage-to-time converters (VTC), time-to-voltage converters (TVC), digital-to-time converters (DTC), and the one explored in this work, the time-to-digital converter (TDC). The TDC is an instrumentation device intended for measuring the time difference between two signals (or events). Its resolution and power-consumption requirements, however, are application-dependent, and the range of possible applications for a TDC is vast. TDCs can be found in time-based receivers, biomedical applications, data converters, and ADPLLs.

# <span id="page-31-1"></span>4.1 FoMs and Metrics

When evaluating a hypothesis, people commonly rely on trade-offs to quantify whether something is worthwhile. These trade-offs are generally captured in a figure-of-merit (FOM), and some well-designed FOMs have become almost a scientific standard for certain circuits. Time-to-digital converters are usually compared using the same state-of-the-art FOMs as DACs and ADCs. TDCs also rely on the same metrics as ADCs and DACs (as they all aim for good conversion linearity), such as Differential Non-Linearity (DNL) and Integral Non-Linearity (INL).

The Differential Non-Linearity (DNL) is a performance metric that describes the deviation of each step width from its expected value. The deviation is measured in least significant bits (LSb).

$$
DNL(n) = \left(\frac{\Delta t(n+1) - \Delta t(n)}{T_{Resolution}}\right) - 1\tag{4.1}
$$

The Integral Non-Linearity (INL) is a performance metric that describes the deviation between the ideal output value and the actual measured output value for a certain input code. Its value is normalized to one LSb.

<span id="page-32-1"></span>
$$
INL(n) = \frac{\Delta t(n) - T_{Resolution}}{T_{Resolution}} \tag{4.2}
$$
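To make Eq. 4.1 concrete, the following minimal sketch computes the DNL from a list of measured code-transition times. The transition times and the 10 ps step are hypothetical values chosen only for illustration; the function name `dnl` is likewise an assumption of this example.

```python
def dnl(transitions, t_res):
    """DNL per Eq. 4.1, in LSb, for consecutive code-transition times."""
    return [
        (transitions[n + 1] - transitions[n]) / t_res - 1
        for n in range(len(transitions) - 1)
    ]


# Hypothetical measured transition times (ps) for an ideal 10 ps step.
transitions_ps = [0.0, 10.0, 21.0, 30.0, 40.0]
dnl_lsb = dnl(transitions_ps, t_res=10.0)

# Worst-case magnitude, the figure usually quoted in comparison tables.
worst_dnl = max(abs(d) for d in dnl_lsb)
print(dnl_lsb, worst_dnl)
```

In this example the second step is 1 ps too wide and the third is 1 ps too narrow, so the DNL is about +0.1 LSb and -0.1 LSb for those codes and zero elsewhere.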

Walden, in his classical paper on analog-to-digital converters [\(WALDEN,](#page-71-7) [1999\)](#page-71-7), proposed a FOM, Eq. [4.3,](#page-32-0) that has become almost a standard metric in academia. The proposed figure of merit relates  $P$  (power dissipation),  $f_s$  (Nyquist sampling rate), and ENOB (effective number of bits, Eq. [4.4\)](#page-32-1).

<span id="page-32-0"></span>
$$
FOM_W = \frac{P}{f_s \cdot 2^{ENOB}} \tag{4.3}
$$

$$
ENOB = \frac{SNDR_{peak} - 1.76\,\text{dB}}{6.02} \tag{4.4}
$$

The Effective Number of Bits (ENOB), Eq. [4.4,](#page-32-1) is defined in terms of the signal-to-noise-and-distortion ratio (SNDR, see Eq. [4.5\)](#page-32-2), where  $SNDR_{peak}$  is the peak SNDR expressed in dB. The SNDR (also called SINAD) is the ratio of the signal power to the total noise and harmonic power at the output.

$$
SNDR = 20\log\left(\frac{Signal}{Noise + Distortion}\right) \tag{4.5}
$$

<span id="page-32-3"></span><span id="page-32-2"></span>Like Walden, Schreier also proposed two figure-of-merit variants that became widely used in the analog and mixed-signal community: the Schreier Dynamic-Range (DR) FOM [\(SCHREIER; TEMES,](#page-70-7) [2005\)](#page-70-7) and the Schreier Signal-to-(Noise + Distortion) (SNDR or SINAD) FOM [\(ALI et al.,](#page-68-8) [2010\)](#page-68-8), both given in Eq. [4.6](#page-32-3).

$$
FOM_{S,DR} = DR + 10\log\left(\frac{BW}{P}\right)
$$

$$
FOM_{S,SNDR} = SNDR + 10\log\left(\frac{f_s/2}{P}\right) \tag{4.6}
$$
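The FOMs above can be evaluated numerically as in the short sketch below. The converter figures (61.96 dB SNDR, 1 mW, 100 MS/s) are hypothetical and do not correspond to any design discussed in this work; the logarithms are taken as base 10, as is usual for dB quantities.

```python
import math


def enob(sndr_db):
    """Effective number of bits (Eq. 4.4)."""
    return (sndr_db - 1.76) / 6.02


def fom_walden(power_w, fs_hz, sndr_db):
    """Walden FOM (Eq. 4.3), in joules per conversion step."""
    return power_w / (fs_hz * 2 ** enob(sndr_db))


def fom_schreier_sndr(sndr_db, fs_hz, power_w):
    """Schreier SNDR FOM (Eq. 4.6), in dB."""
    return sndr_db + 10 * math.log10((fs_hz / 2) / power_w)


# Hypothetical converter: 61.96 dB peak SNDR, 1 mW, 100 MS/s.
bits = enob(61.96)
fom_w = fom_walden(1e-3, 100e6, 61.96)
fom_s = fom_schreier_sndr(61.96, 100e6, 1e-3)
print(f"ENOB = {bits:.2f} bit, FOM_W = {fom_w * 1e15:.2f} fJ/step, FOM_S = {fom_s:.1f} dB")
```

Note the opposite conventions: a lower Walden FOM indicates a more efficient converter, whereas a higher Schreier FOM is better.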

TDCs, as mentioned at the start of this section, can be found in a variety of applications. Several topologies have been developed over the years to meet the needs of each application, each with its own set of benefits and drawbacks. Designers can use the presented FoMs to decide which topology is best for them.

### <span id="page-33-2"></span>4.2 Single Counter Time-to-Digital Converter

While writing hardware descriptions for FPGAs, designers often need to implement counters for several applications. Usually, the most natural implementation uses the stable external crystal clock that development boards provide, along with frequency dividers and conditional counters. The counter step (minimal  $\Delta time$  or unitary delay  $\tau$ ) corresponds to the clock period; higher frequencies therefore provide a finer granularity, i.e., smaller time gaps between digital words.

<span id="page-33-0"></span>

Figure 4.1: Diagram of the Single Counter TDC Architecture.

Source: The Author

The simplest way to implement a Time-to-Digital Converter for ASICs is very similar. The Single Counter Time-to-Digital Converter quantifies the interval between the rising edge of Start ( $Start \rightarrow Rise$ ) and the rising edge of Stop ( $Stop \rightarrow Rise$ ) by the number of clock periods on the Reference Clock pin, see Fig. [4.1.](#page-33-0)

<span id="page-33-1"></span>



Source: The Author

Assuming a hypothetical reference clock signal of 1 GHz, or a period of 1 ns, the circuit would start counting at the rise of the start signal and stop counting at the rise of the stop signal. In Fig. [4.2](#page-33-1) there are four clock cycles between the start and stop signals, so the time displacement is about  $4\times$  the reference clock period (or 4 ns). The single counter therefore cannot resolve intervals shorter than one clock period, which, despite its simplicity, makes it unsuitable for several applications.
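The quantization performed by the single-counter TDC can be mimicked with a behavioral sketch. It assumes ideal rising clock edges at integer multiples of the clock period and uses integer picoseconds to avoid rounding effects; the function name and the sample times are hypothetical.

```python
def single_counter_tdc(t_start_ps, t_stop_ps, t_clk_ps):
    """Count the rising clock edges that fall in (t_start, t_stop].

    Assumption: ideal clock edges occur at k * t_clk_ps (k = 0, 1, 2, ...).
    """
    return t_stop_ps // t_clk_ps - t_start_ps // t_clk_ps


T_CLK_PS = 1000  # 1 GHz reference clock, as in the example of the text

# A 4.2 ns interval is reported as 4 clock periods (4 ns).
count = single_counter_tdc(300, 4500, T_CLK_PS)
estimate_ps = count * T_CLK_PS

# A 0.6 ns interval inside a single clock period is not resolved at all.
sub_clock_count = single_counter_tdc(300, 900, T_CLK_PS)
print(count, estimate_ps, sub_clock_count)
```

The second call shows the limitation stated above: any interval that does not contain a clock edge yields a count of zero.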

#### <span id="page-34-1"></span>4.3 Flash Time-to-Digital Converter

The Flash TDC [\(RAHKONEN; KOSTAMOVAARA,](#page-70-8) [1993\)](#page-70-8) is another simplistic implementation of a TDC. The Flash ADC, the circuit that inspired the Flash TDC name, utilizes a resistor ladder with uniformly distributed resistances to create a uniform step of voltages. The flash TDC utilizes a similar concept; but, instead of having voltage steps, it has delay elements (instead of resistors) and registers (instead of comparators) to create delay steps in the time domain.
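A behavioral sketch of the flash TDC principle is given below. It assumes ideal registers (no setup or hold time), a uniform tap delay, and that the start edge travels along the delay line while the stop edge samples every tap; the names and values are hypothetical.

```python
def flash_tdc(dt_ps, tau_ps, n_stages):
    """Thermometer code sampled by the stop edge.

    Tap k (1-based) reads 1 if the start edge, delayed by k * tau_ps,
    has already arrived when the stop edge samples the registers.
    """
    return [1 if k * tau_ps <= dt_ps else 0 for k in range(1, n_stages + 1)]


# Hypothetical 8-stage line with 10 ps taps measuring a 47 ps interval.
code = flash_tdc(dt_ps=47, tau_ps=10, n_stages=8)
estimate_ps = sum(code) * 10
print(code, estimate_ps)
```

The output is a thermometer code, and the measured interval is rounded down to a multiple of the tap delay, which is why the resolution of this topology is bounded by one delay element.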

<span id="page-34-0"></span>



# <span id="page-35-1"></span>4.4 Stochastic Time-to-Digital Converter

The Stochastic Time-to-Digital Converter (STDC) [\(SAMARAH; CARUSONE,](#page-70-9) [2013\)](#page-70-9) is similar to the Flash TDC, but it exploits the delay mismatch between latches in order to have a resolution that is not defined by the delay line. Unfortunately, in order to benefit from the higher resolution, the STDC has the disadvantage of having its resolution defined by the number of redundant latches (Fig. [4.4\)](#page-35-0), which results in a large area overhead and increased switching and static power dissipation. For those reasons, despite the STDC's accuracy and its way of bypassing device variability, employing the extra flash-TDC lines and extra circuitry may not be worth it, depending on the application constraints.

<span id="page-35-0"></span>



Source: The Author

As mentioned by Ito et al. [\(ITO et al.,](#page-69-8) [2010\)](#page-69-8), the outputs of the DFFs may contain so-called bubble errors [\(RAZAVI,](#page-70-10) [1995\)](#page-70-10), mainly due to setup and hold time mismatches among DFFs and Delay lines. However, applying an encoder circuit that counts the number of ones in the DFFs' outputs can ensure the monotonicity of the TDC.
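The benefit of a ones-counting encoder can be seen in a small sketch. The first-zero encoder used for comparison is a hypothetical naive alternative introduced only for this illustration; it is not taken from the cited works.

```python
def ones_counter(bits):
    """Encode a thermometer code by counting its ones."""
    return sum(bits)


def first_zero_encoder(bits):
    """Hypothetical naive encoder: position of the first 0 in the code."""
    for position, bit in enumerate(bits):
        if bit == 0:
            return position
    return len(bits)


clean = [1, 1, 1, 1, 0, 0, 0, 0]
bubbled = [1, 1, 1, 0, 1, 0, 0, 0]  # two neighbouring DFFs resolved in swapped order

print(ones_counter(clean), ones_counter(bubbled))              # 4 4
print(first_zero_encoder(clean), first_zero_encoder(bubbled))  # 4 3
```

Because a bubble only swaps the position of a one and a zero, the number of ones is unchanged, and the encoded value still increases monotonically with the input interval.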

#### <span id="page-35-2"></span>4.5 Gated Ring-Oscillator Time-to-Digital Converter

The Gated Ring-Oscillator Time-to-Digital Converter (GRO-TDC) is quite similar to the Flash TDC architecture. Instead of relying on an external source for the "start" signal, this signal is generated by an internal ring oscillator, see Fig. [4.5.](#page-36-0) This capability provides some excellent features as a self-timed circuit, e.g., measuring the RO and compensating for PVT variations.



Figure 4.5: Diagram of the Gated Ring-Oscillator TDC Architecture.

Source: The Author

The author has already designed a GRO-TDC at UFRGS, which was fabricated in the TSMC 180 nm CMOS technology during the 2020 calendar year. A micrograph of the chip is shown in Fig. [4.6.](#page-36-0) The GRO-TDC layout view can be found in Fig. [A.1](#page-73-0) in the Appendix of this monograph.

<span id="page-36-0"></span>Figure 4.6: On the left, the GME-AMS 2020 chip micrograph, and on the right, a 3D rendered view of the GRO-TDC.



Source: The Author

The fabricated Gated Ring-Oscillator Time-to-Digital Converter has a 5-bit resolution with 29 inverting gates. Simulation results show proper operation for voltages as low as 300 mV; however, it is important to stress the complexity embedded in this type of TDC, since oscillator design can become quite involved. This design was fully custom and had an area cost of  $108\,\mu\text{m} \times 95\,\mu\text{m}$  with the buffers and  $73\,\mu\text{m} \times 95\,\mu\text{m}$  without the buffers. Simulations show that the 29-stage RO could achieve frequencies close to 450 MHz at 1.8 V, as in Appendix [A.2.](#page-74-0) Simulations also show that it could achieve power dissipation values as low as 6.3  $\mu$W at 300 mV, see Appendix [A.3.](#page-74-1) A test board (Appendix [A.4\)](#page-75-0) has already been designed for the measurements. Due to shipment mishaps, the author was not able to measure the chip in time for this work.

#### 4.6 Vernier Time-to-Digital Converter

Despite its similarity to the Flash TDC, the Vernier Time-to-Digital Converter (VDL TDC) has its resolution, or time granularity, between each output defined by the difference between the delays  $\tau_1$  and  $\tau_2$  (where  $\tau_2 < \tau_1$). Its resolution,  $Res = \tau_1 - \tau_2$, can therefore be smaller than one delay element  $(\tau)$. Unfortunately, since both delay lines and registers utilize CMOS standard cells, they suffer from PN mismatch and need meticulous tweaking of parameters to achieve a high resolution in the near-$V_t$ regime. Nevertheless, it is still a good option for low-power devices.

<span id="page-37-0"></span>

Figure 4.7: Diagram of the Vernier TDC Architecture.

Its functionality is similar to the Flash TDC. The main difference is that after the arrival of start and stop signals, both signals are manipulated to decrease their time difference at each Vernier stage, see Fig. [4.7.](#page-37-0) Another variation of the VDL TDC is the Vernier ring TDC, which merges the benefits from the VDL TDC and the Gated-RO TDC, where each delay line is a closed ring-oscillator loop.
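The Vernier principle can be illustrated with the behavioral sketch below. It assumes ideal registers, that the start edge travels through the slower line ( $\tau_1$ per stage) and the stop edge through the faster one ( $\tau_2$ per stage), and that a stage outputs 1 while the start edge still arrives first; all names and values are hypothetical.

```python
def vernier_tdc(dt_ps, tau1_ps, tau2_ps, n_stages):
    """Thermometer code of a Vernier delay line (ideal registers).

    At stage k the start edge arrives at k * tau1_ps and the stop edge at
    dt_ps + k * tau2_ps, so the lead shrinks by (tau1 - tau2) per stage.
    """
    if tau2_ps >= tau1_ps:
        raise ValueError("the stop line must be faster (tau2 < tau1)")
    return [
        1 if k * tau1_ps < dt_ps + k * tau2_ps else 0
        for k in range(1, n_stages + 1)
    ]


# Hypothetical lines: tau1 = 12 ps, tau2 = 10 ps, so Res = 2 ps.
code = vernier_tdc(dt_ps=7, tau1_ps=12, tau2_ps=10, n_stages=8)
estimate_ps = sum(code) * (12 - 10)
print(code, estimate_ps)
```

A 7 ps interval, shorter than either delay element, is resolved to 6 ps with a 2 ps step, at the cost of needing more stages to cover the same range.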

## 4.7 Cyclic Pulse-Shrinking Time-to-Digital Converter

Räisänen-Ruotsalainen et al. [\(RAISANEN-RUOTSALAINEN; RAHKONEN; KOSTAMOVAARA,](#page-70-0) [1995\)](#page-70-0) proposed a TDC whose resolution can be shorter than one buffer delay without the use of any interpolation. Its functionality is very similar to that of the VDL TDC. Instead of employing regular delay lines like the flash and Vernier TDCs, the Cyclic Pulse-Shrinking TDC utilizes pulse-shrinking delay lines. In a pulse-shrinking delay line, the width of the propagating pulse decreases uniformly across the stages until the pulse disappears, i.e., becomes undetectable.

<span id="page-38-0"></span>

Figure 4.8: Diagram of the Cyclic Pulse-Shrinking TDC Architecture.

Source: The Author

The pulse-shrinking topology, as seen in Fig. [4.8,](#page-38-0) requires asymmetrical delay elements: the propagation delays of the rising and falling edges should be carefully controlled and matched between stages to obtain sufficient linearity. Those challenges significantly increase the complexity of implementing such a TDC. However, employing it in a closed loop (cyclic) brings the same benefits as in the Gated-RO TDC, since the use of an oscillator reduces the matching requirements on the delay elements. This feature reduces the complexity of the design and mitigates PVT-variation-induced problems. The pulse-shrinking delay element can be built in several ways; one of the simplest is to attach a current-starved inverter to a generic static inverter.
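A simplified behavioral model of the cyclic operation is sketched below. It assumes that the pulse loses a fixed width on every pass around the loop and that a counter is incremented for each pass the pulse survives; the detection threshold `min_width_ps` and all numeric values are hypothetical.

```python
def cyclic_pulse_shrinking_tdc(width_ps, shrink_ps, min_width_ps=0):
    """Count the loop passes a pulse survives before it vanishes.

    Assumption: every pass shrinks the pulse by a constant shrink_ps, and a
    pulse is detectable only while its width exceeds min_width_ps.
    """
    if shrink_ps <= 0:
        raise ValueError("the pulse must shrink on every pass")
    count = 0
    while width_ps - shrink_ps > min_width_ps:
        width_ps -= shrink_ps
        count += 1
    return count


# Hypothetical 47 ps input pulse shrinking by 5 ps per cycle.
cycles = cyclic_pulse_shrinking_tdc(width_ps=47, shrink_ps=5)
estimate_ps = cycles * 5
print(cycles, estimate_ps)
```

In this model the resolution equals the width lost per cycle, which can be made smaller than a gate delay because it depends on the difference between rising and falling edge delays rather than on their absolute values.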

## 4.8 ∆Σ Time-to-Digital Converter

One of the main disadvantages of Flash TDCs is that the time resolution is determined by the delay value,  $\tau$  in Fig. [4.3.](#page-34-0) The  $\Delta \Sigma$  Time-to-Digital Converter instead has its resolution determined by the sampled period, the resolution being inversely proportional to the sampled time. Since  $\Delta\Sigma$  TDCs rely on the repetitive sampling of signals for precise estimation of the time discrepancy, they are not suitable for single-event measurements. Also, mismatches among the standard cells composing the delay lines degrade the linearity of the TDC [\(UEMORI et al.,](#page-71-0) [2012\)](#page-71-0).



<span id="page-39-0"></span>

Source: [\(SZPLET,](#page-71-1) [2014\)](#page-71-1), modified by the author.

The  $\Delta\Sigma$  TDC functionality is tied to the charging and discharging behavior of a capacitor, see Fig. [4.9.](#page-39-0) After the arrival of the start and stop signals at the pulse generator (i.e., a fully differential charge pump), a pulse charges the Metal-Insulator-Metal (MIM) capacitor, whose charging curve (modulated time difference  $\Delta$,  $T$  in Fig. [4.10\)](#page-40-0) is integrated ( $\Sigma$  modulator,  $T_R$  in Fig. [4.10\)](#page-40-0) and fed to a counter. Meanwhile, a clock pulses the counter, as seen in Fig. [4.10,](#page-40-0) similarly to the Single Counter TDC introduced earlier. However, since the discharging  $(T_R)$  of the capacitor takes more time (compared to the clock period) than the time discrepancy between the two input signals  $(T)$, a much better resolution than that of the Single Counter TDC can be obtained.

<span id="page-40-0"></span>Figure 4.10: Diagram of the ∆Σ TDC functionality. Start and End markers indicate where the counter would start and stop counting after crossing  $\Sigma$  modulator threshold value.  $U_C$ is the Metal-Insulator-Metal (MIM) capacitor charge.



Source: [\(SZPLET,](#page-71-1) [2014\)](#page-71-1), modified by the author.

# 5 RELATED WORK

In the previous chapters, several different TDC topologies were presented. This chapter reviews some of the state-of-the-art works regarding TDCs. Those implementations are compared in Tab. [5.1](#page-41-0) in order to further evaluate the different topologies and design strategies, and their impact on several aspects such as power dissipation, area, DNL, INL, resolution, and event-rate capabilities. All of the works cited in this section were developed using deep-submicron technologies, which are very similar to the 28 nm bulk CMOS technology used in this work, with the exception of the 14 nm FinFET design in Tab. [5.1.](#page-41-0)

<span id="page-41-0"></span>

| <b>TDC Implementation</b> | (KIM et al., 2015) | (KIM; KIM; PARK, 2014) | (WANG; DAI; WANG, 2018) | (CHUNG; HYUN; KIM, 2021) |
|---------------------------|--------------------|------------------------|-------------------------|--------------------------|
| <b>Topology</b>           | Stochastic         | Cyclic                 | 2-D Spiral Vernier      | Stochastic               |
| <b>Node</b>               | 14 nm              | 28 nm                  | 45 nm                   | 65 nm                    |
| <b>Process</b>            | FinFET             | Bulk CMOS              | CMOS SOI                | Bulk CMOS                |
| <b>Supply $V_{DD}$</b>    | 0.6 V              | 0.9 V                  | 1.0 V                   | 1.1–1.3 V                |
| <b>Power [mW]</b>         | 0.78               | 0.82                   | 0.07–0.69               | 6.2                      |
| <b>Event Rate [ME/s]</b>  | 100                | 10                     | 80                      | 100                      |
| <b>Resolution [ps]</b>    | 1.17               | 0.63                   | 1.25                    | 0.36                     |
| <b>Area [mm$^2$]</b>      | 0.036              | 0.01                   | 0.04                    | 0.068                    |
| <b>DNL [LSb]</b>          | 0.8                | 0.5                    | 0.25/0.31               | 0.77                     |
| <b>INL [LSb]</b>          | 2.3                | 3.8                    | 0.34/0.4                | 0.75                     |

Table 5.1: State-of-the-art implementations of TDC circuits.

Regardless of the TDC architecture, using place-and-route tools is usually not worth it, even considering the time-to-design benefits. Layout-induced timing mismatch is even more prevalent in advanced technology nodes since, with the reduction of gate length, interconnect delay has become more critical than gate delay. Some of the works presented herein are heading towards the femtosecond scale with the intent of measuring jitter variation. At such extreme resolutions, even small routing discrepancies between elements can impact linearity.

Kim et al. [\(KIM et al.,](#page-69-0) [2015\)](#page-69-0) proposed a low-power and PVT-variation-tolerant TDC architecture that does not require any calibration, using stochastic phase interpolation and  $16\times$  spatial redundancy. The Stochastic Phase Interpolation TDC with  $16\times$  Spatial Redundancy, as in Fig. [5.1,](#page-42-0) consists of seven main blocks: i) buffers and input inverters; ii) simple logic inverter delay cells; iii) a 3-input AND gate; iv) a latch; v) a pulse generator; vi) an adder circuit; vii) an LSB truncating block. Since all the blocks are simple and can easily be composed of digital standard cells, the circuit can be synthesized. Synthesis can lead to the mismatch problems mentioned before. However, the  $16\times$  redundancy drastically softens the mismatch impact on the output data, even for modern technologies such as the 14 nm FinFET used in their work. That said, the redundancy comes with a considerable area and power drawback, see Tab. [5.1.](#page-41-0)

<span id="page-42-0"></span>Figure 5.1: Block Diagram of the Stochastic Phase Interpolation TDC with  $16 \times$  Spatial Redundancy

Source: [\(KIM et al.,](#page-69-0) [2015\)](#page-69-0).

A clock signal with a period  $T_{CLK}$  is fed to the delay chain, which is composed of N unitary delays ( $\tau_{UNIT}$ ). The output of each delay cell is sequentially switched as the input clock propagates through the chain. The rise time of the  $N$-th stage can be calculated as the sum of the unit delays and the clock period, i.e.,  $\tau_N = N \cdot \tau_{UNIT} + k \cdot T_{CLK}$  $(k = 0, 1, 2, \ldots)$. When the rising edge of the clock signal propagates to the last delay cell,  $2^n$ interpolated phases can be collected from each unit delay output. START and STOP signals are also fed into the TDC. The START-to-STOP time difference (STOP-START) can be interpolated by an AND operation with the output of each delay cell, i.e., by finding whether the clock rising edge at each delay stage falls within the STOP-START time window. The interpolation results are stored in latches and then added, and the LSBs of the sum are truncated to mitigate the non-linearity of the N-bit linear quantizer. The number of truncated bits (m) determines the effective number of TDC output bits, which is  $(N - m)$. To improve performance, their architecture employs  $2^m$  times more unitary delays, which might explain the area cost. Even with a modern 14 nm FinFET process, the area of their design is almost as large as that of the design by Wang et al. [\(WANG;](#page-71-2) [DAI; WANG,](#page-71-2) [2018\)](#page-71-2), which features similar resolution and power dissipation (even with a higher supply  $V_{DD}$).

Sung-Jin Kim et al. [\(KIM; KIM; PARK,](#page-69-1) [2014\)](#page-69-1) proposed a cyclic TDC that can achieve sub-picosecond resolution. They implemented a novel  $2\times$  time amplifier (TA) whose gain is insensitive to variations and noise, using their proposed synchronous time adder. The TA-based TDC features a multi-step data conversion scheme that can increase the time resolution. TA-based TDCs, however, are usually very power-hungry [\(LEE;](#page-69-2) [ABIDI,](#page-69-2) [2008\)](#page-69-2) [\(KIM et al.,](#page-69-3) [2012\)](#page-69-3) [\(ELKHOLY et al.,](#page-68-1) [2015\)](#page-68-1). The use of a  $2\times$  time amplifier can soften both the area and power drawbacks.

The 2-D Vernier TDC proposed by [\(WANG; DAI; WANG,](#page-71-2) [2018\)](#page-71-2) seems a good choice for ultra-low-power implementation, despite its coarser resolution when compared to the other cited works. This 2-D Vernier topology gathers benefits from different topologies and circuit design techniques, being a reconfigurable VDL TDC with a 2-D spiral comparator array and  $\Delta\Sigma$  modulators for improving both detection range and linearization. The authors used identical unit delay cells to reduce the mismatch between the Vernier fast and slow delay chains. They present a tunable unit delay cell that can vary from 19 to 43 ps according to a seven-bit digital input, which is used for digitally assisted calibration that meets tuning requirements against PVT variations. This tunable unit delay cell served as an inspiration for the one implemented in the next chapter. Prototyped in a 45 nm silicon-on-insulator technology, the TDC showed a measured maximum DNL across its detectable range of 1.35/1.03 ps without the linearization techniques and 0.31/0.4 ps with the proposed linearization techniques.

The stochastic TDC presented by [\(CHUNG; HYUN; KIM,](#page-68-0) [2021\)](#page-68-0) has a very high resolution, featuring dual time offset arbiters that enable calibration for linearity improvements. Each arbiter circuit is composed of a decision circuit, a rail-to-rail latch, a DFF, and a delay cell. The TDC utilizes a complex linearity calibration scheme based on a genetic algorithm (GA). The GA-assisted calibration enables a rapid search within the search space to find the optimal time offset mode selection setting. This setting is used in the arbiters to minimize INL. This work achieves a remarkable resolution with the largest technology node among those compared in Tab. [5.1.](#page-41-0) Unfortunately, these efforts toward high resolution come with a big drawback in area and power dissipation, resulting in a mean power dissipation of 6.2 mW.

Some recent work implements hybrid TDC topologies, which are a combination of different TDC topologies. Merging coarse and fine TDCs according to MSB or LSB can gather the benefits of each type, achieving a good trade-off between power dissipation and resolution [\(ZHANG et al.,](#page-72-0) [2019\)](#page-72-0).

The SAFF-TCD evaluated in the next chapter was specially chosen under the influence of the detailed work presented by Popong et al. [\(EFFENDRIK,](#page-68-2) [2011\)](#page-68-2). Their work demonstrates the development and results of a pseudo-differential TDC architecture featuring sense-amplifier flip-flops in 40 nm CMOS. The implemented TDC once again meets WiMAX ADPLL specifications, with a power dissipation of 2.99 mW and a resolution of 10.84 ps to 12.55 ps.

## 6 DESIGN METHODOLOGY

As previously mentioned, the TDC finds numerous and important applications in IC design. Among those, this work selects and focuses on its application to an alternative low-power SAR ADC topology shown in Fig. [6.2](#page-46-0) [\(CANAL et al.,](#page-68-3) [2022\)](#page-68-3). This SAR ADC implements a new binary search algorithm to find the digital code that best represents the analog input. Generally, this algorithm requires N steps to find an N-bit digital code. In each step, the SAR process compares the input voltage against the voltage provided by the DAC and, according to the comparator output, defines the digital value of the corresponding bit. Then, the SAR process adjusts the DAC voltage to a value that tries to be closer to the input one and compares it against the input value again.
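The N-step search described above can be illustrated with a minimal behavioral sketch. It models a conventional, ideal SAR loop only (not the modified algorithm of the cited work), and the function name `sar_convert`, the single-ended input, and the ideal DAC transfer are assumptions made for illustration.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int) -> int:
    """Ideal N-step successive-approximation search (illustrative only)."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):       # MSB first, one step per bit
        trial = code | (1 << bit)               # tentatively set the current bit
        v_dac = v_ref * trial / (1 << n_bits)   # ideal DAC voltage for the trial code
        if v_in >= v_dac:                       # comparator decision
            code = trial                        # keep the bit, otherwise leave it cleared
    return code


if __name__ == "__main__":
    # Hypothetical 8-bit conversion with a 0.6 V reference
    print(sar_convert(0.31, 0.6, 8))
```

Each loop iteration corresponds to one comparison and one DAC update, which is why an N-bit code normally costs N steps.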

<span id="page-45-0"></span>

Figure 6.1: Simplified Schematic of the CDAC

Source: The Author

In low-power SAR ADCs, the DAC is generally implemented as a Capacitive Digital-to-Analog Converter (CDAC) that also performs the sample-and-hold function and carries out the binary search. The CDAC consists of an input switch that samples the input voltage and a capacitor array built from unit capacitors, usually arranged in a  $2^n$  progression of capacitance values. A binary word controls the bottom plates of the capacitors; switching them selects different capacitance combinations and thereby produces the desired output voltages, as in Fig. [6.1.](#page-45-0) In the topology presented in Fig. [6.2,](#page-46-0) the TDCs assist the SAR algorithm by accelerating the search, improving the energy efficiency of the capacitive DAC switching scheme.
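As a rough illustration of how a binary word selects a voltage in a binary-weighted array, the sketch below computes the ideal charge-redistribution output. It assumes a single-ended array with one extra terminating unit capacitor and no parasitics; these choices and the name `cdac_output` are assumptions, not the switching scheme of the target ADC.

```python
def cdac_output(bits, v_ref, c_unit=1.0):
    """Ideal output of a binary-weighted capacitor array (bits[0] is the MSB)."""
    n = len(bits)
    caps = [c_unit * 2 ** (n - 1 - i) for i in range(n)]   # 2^(n-1) ... 2, 1 unit caps
    c_total = sum(caps) + c_unit                            # assumed terminating unit capacitor
    c_selected = sum(c for b, c in zip(bits, caps) if b)    # bottom plates switched to v_ref
    return v_ref * c_selected / c_total


if __name__ == "__main__":
    # Hypothetical 4-bit word with only the MSB set: half of the reference voltage
    print(cdac_output([1, 0, 0, 0], 0.6))
```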

In the SAR ADC, the differential output voltage from the CDAC is fed into the inputs of a comparator circuit. The comparator's primary function is to transform a voltage difference between its input terminals into a digital signal. Depending on the amplitude of the CDAC output voltage, the hybrid comparator used may take more or less time to conclude the comparison. A NOR operation on the differential comparator output can be used to identify when the comparator concludes its process. The TDC therefore exploits the comparator timing information and translates it into a binary signal that represents the amplitude of the CDAC output voltage. Feeding the SAR logic block with the polarity information provided by the comparator and the signal amplitude information from the TDC allows one to identify when the CDAC voltage is close to the sampled input signal. This, in turn, makes it possible to avoid some unnecessary SAR steps. This process is called the window switching scheme and aims to skip some CDAC capacitor switching to save energy in the CDAC switching process [\(CANAL et al.,](#page-68-3) [2022\)](#page-68-3).

<span id="page-46-0"></span>

Figure 6.2: Block Diagram of the SAR ADC target application.

Source: [\(CANAL et al.,](#page-68-3) [2022\)](#page-68-3), Modified by The Author

### <span id="page-46-1"></span>6.1 28 nm Bulk CMOS Zero-Temperature-Coefficient (ZTC) Analysis

Knowing the operating corner cases of the circuit at design time is critical and must be taken into account in the TDC circuit design. Low temperatures (such as -40  $^{\circ}$C) are typically taken as the best case when the circuit operates at the nominal supply  $V_{DD}$: in strong inversion, i.e., at nominal  $V_{DD}$, the MOSFET current is essentially dominated by the drift current of the inverted-channel carriers, so lower temperatures lead to a higher  $I_{DS}$  current. However, this relationship between temperature and current does not hold for the low- and medium-inversion regimes of the MOSFETs. The device models in the PDK allow the designer to determine, through the so-called Zero-Temperature-Coefficient (ZTC) current level, whether the drain current is dominated by the drift current or by the diffusion current. This way, the designer is able to specify the temperature corners more accurately.

Planar MOSFET devices exhibit a Zero-Temperature-Coefficient (ZTC) condition, which means that at a given gate-to-bulk voltage bias  $V_{GB}$, the different temperature effects on the electron/hole transport in the FET inversion layer compensate each other, so that the source-drain current exhibits a very weak (near zero) temperature dependence. This ZTC behavior of the transistor current  $I_{DS}$  relates to the negative temperature coefficient of the threshold voltage and the mobility reduction with temperature [\(FILANOVSKY;](#page-69-4) [ALLAM,](#page-69-4) [2001\)](#page-69-4), which means that the drift and diffusion current components in the channel exhibit a mutual cancellation of their temperature behavior (namely negative TC and positive TC, respectively, for each component). This condition establishes, for each MOSFET device structure, a ZTC bias point  $(V_{GZ}, I_{DZ})$  at which the drain current  $(I_{DS})$  is temperature insensitive. Knowing the ZTC point is critical for establishing operational temperature corners and accurately simulating the best- and worst-case situations of a circuit, since a higher temperature can be the best or the worst case depending on the transistor type and the biases  $V_{GB}$  and  $V_{SB}$. This effect can be explained by the fact that the drain-to-source current  $(I_{DS})$  is composed of both drift current and diffusion current. In the strong inversion region, i.e.,  $V_{GB} \gg V_t$, the total  $I_{DS}$  current is mainly composed of the drift current. In the weak inversion region, i.e.,  $V_{GB} < V_t$,  $I_{DS}$  is instead dominated by the diffusion current [\(TSIVIDIS,](#page-71-3) [2010\)](#page-71-3) around the channel barrier potential from the MOSFET source to the inversion channel. In the moderate inversion region, both drift and diffusion are relevant. Temperature acts in opposite ways on diffusion and drift currents: a higher temperature benefits diffusion currents and harms drift currents, and vice versa.

<span id="page-47-0"></span>

Figure 6.3: Zero-Temperature-Coefficient (ZTC) Test Bench

Source: The Author

A test bench, similar to the one in Fig. [6.3,](#page-47-0) was created using the Cadence Spectre™ electrical simulator with empirical device models provided by the TSMC foundry. A parametric DC simulation swept the gate voltage  $(V_G)$  for both PMOS (from -400 mV to -900 mV) and NMOS (from 400 mV to 900 mV) transistors. The lower limit for  $V_{GB}$  was set at 400 mV since  $V_{GZ}$  in this bulk technology, as in other bulk CMOS technologies explored in previous works [\(TOLEDO,](#page-71-4) [2015\)](#page-71-4), falls in moderate to strong inversion, hundreds of mV above the FET  $V_t$. The threshold voltages for zero substrate bias  $(V_{tho})$  in this technology are approximately  $-334$ mV for PMOS and  $449$ mV for NMOS, in nominal process and temperature (PT) conditions. The upper limit of 900 mV is the maximum nominal drain-source voltage  $V_{DS}$  for NMOS. Since the width of the transistors can slightly affect the value of  $V_t$, the investigation (by DC simulations) was performed for the single-finger transistor widths commonly used in digital cells in this technology, which are the ones used in this work (100 nm, 200 nm, 300 nm, and 400 nm). The ZTC analysis also considered FETs with minimum gate length, as these are of interest for digital cells. In these simulations, both P- and N-MOSFETs had a fixed nominal channel length of 30 nm.

<span id="page-48-0"></span>



Fig. [6.4](#page-48-0) shows the variation in the NMOS drain current  $(I_D)$  with the gate-to-bulk voltage ( $V_{GB}$ ) for a fixed drain terminal voltage ( $V_D$ ) of 100 mV. Since  $V_B$  is tied to  $V_S$, which is connected to ground,  $V_{GB}$  equals  $V_G$. The results show that the ZTC points for the simulated NMOS transistors fall between approximately 712 mV and 724 mV of  $V_{GB}$.

Fig. [6.5](#page-49-0) shows the variation in the PMOS drain current  $(I_D)$  with the gate-to-bulk voltage ( $V_{GB}$ ) for a fixed source terminal voltage ( $V_S$ ) of 100 mV. Since  $V_B$  is tied to  $V_S$, the magnitude of  $V_{GB}$  is  $|V_G| + 100 \, mV$. Furthermore, as  $V_G$  is swept parametrically from -400 mV to -900 mV, the actual  $V_{GB}$  range is -500 mV to -1 V. The -900 mV to -1 V range was omitted from the graph since the interest was in the ZTC point. Results in Fig. [6.6](#page-49-1) show that the ZTC points for the simulated PMOS transistors fall between approximately 694 mV and 716 mV of  $V_{GB}$.

<span id="page-49-0"></span>Figure 6.5: PMOS Zero-Temperature-Coefficient (ZTC) for the utilized 28 nm bulk CMOS technology and different transistor widths.

Source: The Author

<span id="page-49-1"></span>Figure 6.6:  $V_{GZ}$  for the 28 nm bulk CMOS technology for different transistor widths.



Source: The Author

Since the circuits in this TDC design operate at low voltage, under a specified 600 mV  $V_{DD}$  supply constraint, no transistor in the proposed circuit design will have a  $V_{GB}$  higher than 600 mV. The results in Fig. [6.6](#page-49-1) show that  $V_{GZ}$  for both NMOS and PMOS is greater than the maximum achievable  $V_{GB}$  in this work; thus, higher temperatures (for instance,  $125^{\circ}$C) represent the best-case temperature scenario for timing (or frequency) performance, while  $-40^{\circ}$C is the worst temperature scenario for this performance figure.
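The reasoning of this section, locating the ZTC bias as the crossing of the current curves at two temperatures and then choosing the fast temperature corner, can be sketched as follows. The function names and the sample curves are hypothetical and only stand in for data exported from the DC sweeps; they are not simulation results.

```python
def find_ztc_voltage(v_gb, i_cold, i_hot):
    """Return the V_GB where two I_D(V_GB) curves cross (linear interpolation)."""
    for k in range(len(v_gb) - 1):
        d0 = i_hot[k] - i_cold[k]
        d1 = i_hot[k + 1] - i_cold[k + 1]
        if d0 == 0:
            return v_gb[k]
        if d1 == 0:
            return v_gb[k + 1]
        if d0 * d1 < 0:                          # sign change: crossing inside this segment
            frac = d0 / (d0 - d1)
            return v_gb[k] + frac * (v_gb[k + 1] - v_gb[k])
    return None                                  # no crossing in the swept range


def fastest_temperature(v_gb_max, v_gz, t_cold=-40.0, t_hot=125.0):
    """Below the ZTC bias the current rises with temperature; above it, it falls."""
    return t_hot if v_gb_max < v_gz else t_cold


if __name__ == "__main__":
    # Toy curves in arbitrary units (hypothetical, for illustration only)
    v = [0.4, 0.6, 0.8, 1.0]
    cold = [0.0, 2.0, 4.0, 6.0]
    hot = [1.5, 2.5, 3.5, 4.5]
    v_gz = find_ztc_voltage(v, cold, hot)
    print(v_gz, fastest_temperature(0.6, v_gz))
```

With a 600 mV supply and a ZTC bias above 600 mV, the second function returns the hot corner, which matches the corner assignment adopted in this work.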

An earlier Master's thesis in microelectronics [\(TOLEDO,](#page-71-4) [2015\)](#page-71-4) pointed out the importance of exploring the zero-temperature-coefficient (ZTC) for low-power design. Toledo's work proposes a design methodology for a typical CMOS analog design flow that makes circuits as insensitive as possible to temperature variations. The MOSFET ZTC values found in this section for the utilized 28 nm bulk CMOS technology can be used by designers developing circuits with the same PDK, together with the equations presented by Toledo, to understand in depth how temperature changes transistor operation and hence the analog circuit behavior.

### <span id="page-50-1"></span>6.2 Flip-Flops: Setup, Delay, and Power Analysis

Dozens of types of flip-flops (FFs) have been proposed and designed in the past decades, several of them targeting low-power operation. To narrow the set of candidate FFs and to evaluate their performance under the specified 600 mV supply constraint, a simple test bench was created, as shown in Fig. [6.7.](#page-50-0) Three very different types of FFs were evaluated in order to gain a broad perspective on their behaviors and to show how certain characteristics relate to TDC performance.

<span id="page-50-0"></span>

Figure 6.7: Flip-Flop Electrical Simulation Test Bench

Source: The Author

An important consideration is that the SAFF-TCD uses a differential data input (D and /D) with completion detection. The test bench therefore employs an inverter (marked inverter \* in Fig. [6.7\)](#page-50-0) between the D and /D signals.

This test bench mainly evaluates the setup time and the clock-to-output delay. Two sources, isolated by two inverters (drive strength X1 followed by X2), were set for the data and clock signals. The data signal had a rising edge after 1 ns, and the clock rose after the setup time. This setup time was a global variable that was swept during the parametric simulation, thus enabling the simulation of the clock-to-output  $(t_{CQ})$  delay as a function of the data-to-clock  $(t_{DC})$  time interval, i.e., the setup time interval, leading to the graph in Fig. [6.8.](#page-51-0) The figure also displays results for each corner condition (dashed lines, whereas the typical case is shown as a continuous line). Since in the simulated circuit both NMOS and PMOS transistor bias points are below the zero-temperature-coefficient bias  $V_{GZ}$, the temperature corners were set to 125 $^{\circ}$C for the best case, 27 $^{\circ}$C for typical, and -40 $^{\circ}$C for the worst case. For the voltage corners,  $V_{DD}$  was decreased by 10% for the worst corner and increased by 10% for the best corner. Process-wise, the devices were set to SS in the worst case, TT in the typical case, and FF in the best case.
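The three corner conditions described above can be written down as a small configuration sketch. The `Corner` class and `build_corners` helper are hypothetical names used for illustration; the actual corner setup lives in the simulator and is not reproduced here.

```python
from dataclasses import dataclass

V_DD_NOMINAL = 0.6  # volts, the supply constraint used in this work


@dataclass(frozen=True)
class Corner:
    process: str    # device corner: SS, TT or FF
    vdd: float      # supply voltage in volts
    temp_c: float   # temperature in degrees Celsius


def build_corners(vdd_nominal: float = V_DD_NOMINAL) -> dict:
    """Worst/typical/best PVT corners for circuits biased below the ZTC point."""
    return {
        "worst": Corner("SS", round(vdd_nominal * 0.9, 3), -40.0),    # -10% supply, cold
        "typical": Corner("TT", vdd_nominal, 27.0),
        "best": Corner("FF", round(vdd_nominal * 1.1, 3), 125.0),     # +10% supply, hot
    }


if __name__ == "__main__":
    for name, corner in build_corners().items():
        print(name, corner)
```

Note that the hot temperature belongs to the best corner only because the transistors operate below the ZTC bias; for circuits biased above it, the temperature assignment would be reversed.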

<span id="page-51-0"></span>Figure 6.8:  $t_{CQ}$  delay and setup time interval  $(t_{DC})$  curves of the SAFF-TCD, PowerPC 603 FF, and TSPC FF.



In most topologies, the time granularity, i.e., the resolution, is defined by a composition of delays. For instance, the minimum resolution of a flash TDC should be taken as the sum of the delay of the delay cell and the minimum setup time of the FF. In Fig. [6.8](#page-51-0) the optimal setup time (x-axis) is found where  $t_{CQ}$  starts to stabilize as  $t_{DC}$  increases. The steeper part of each curve indicates a region where the output delay is increased; however, this does not result in a loss of resolution. The maximum attainable resolution is defined by the starting point of the curve on the x-axis. Under the given conditions, the SAFF-TCD can cope with a setup time lower than 80 ps, while the TSPC needs a minimum of 91 ps. Applying data and clock signals with timing close to those boundaries (such as 80 ps for the SAFF-TCD) incurs a higher clock-to-output delay.

The energy per operation takes into account the power and the operation delay. The total delay of the FF is the sum of the setup time and the clock-to-output delay. Designers should decide, based on their application, whether they accept a lower energy efficiency in exchange for a finer resolution.

<span id="page-52-0"></span>

|                           | <b>SAFF-TCD</b> | <b>PowerPC 603</b> | <b>TSPC</b>   |
|---------------------------|-----------------|--------------------|---------------|
| Peak Power @ 1 GHz PVT TT | $14.4 \mu W$    | $12.1 \mu W$       | $13.2 \mu W$  |
| RMS Power @ 1 GHz PVT TT  | $2.4 \mu W$     | $1.61 \mu W$       | $1.46 \mu W$  |
| Mean Power @ 1 GHz PVT TT |                 | $0.281 \mu W$      | $0.274 \mu W$ |
| Min. Setup                | $79$ ps         | $128$ ps           | $91$ ps       |
| Min. $t_{DC} + t_{CQ}$    | $186$ ps        | $211$ ps           | $135$ ps      |

Table 6.1: FF Main Characteristics at 600 mV

Table [6.1](#page-52-0) compares some essential characteristics of a FF for those designing low-power hardware. Although this work focuses on time-to-digital converters, these concepts also apply to other circuits that use a good number of registers, such as Serial Peripheral Interface (SPI) blocks. The power values are the product of the flip-flop's current and the  $V_{DD}$  supply voltage, which is 600 mV. The minimum setup is a key factor in determining the granularity of the TDC. A flip-flop whose curve extends further to the left on the x-axis of Fig. [6.8](#page-51-0) leads to a better attainable granularity.

<span id="page-52-1"></span>Hence, since the operation delay ( $t_{DC} + t_{CQ}$ ) changes according to the setup time, another important aspect is to evaluate the energy per operation according to the setup margin. The energy at any given setup time can be computed as:

$$
\text{Energy/op}(t_{DC}) = P_{MEAN} \cdot \left(t_{CQ}(t_{DC}) + t_{DC}\right) \tag{6.1}
$$
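Eq. 6.1 translates directly into code. The sketch below is a minimal helper, with hypothetical function names, that evaluates the energy per operation for a single operating point or for a list of $(t_{DC}, t_{CQ})$ pairs such as those exported from the parametric sweep.

```python
def energy_per_op(p_mean_w: float, t_dc_s: float, t_cq_s: float) -> float:
    """Eq. 6.1: mean power times the total FF delay (setup interval + clock-to-output)."""
    return p_mean_w * (t_cq_s + t_dc_s)


def energy_curve(p_mean_w, points):
    """Map (t_dc_s, t_cq_s) pairs to (t_dc_s, energy_joules) pairs."""
    return [(t_dc, energy_per_op(p_mean_w, t_dc, t_cq)) for t_dc, t_cq in points]


if __name__ == "__main__":
    # Hypothetical operating point: 1 uW mean power, 100 ps setup, 50 ps clock-to-output
    print(energy_per_op(1e-6, 100e-12, 50e-12))
```

As a check on the orders of magnitude, the TSPC figures of Table 6.1 (0.274 µW mean power and a minimum $t_{DC} + t_{CQ}$ of 135 ps) give roughly 37 aJ per operation.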

Results regarding the energy per operation and setup margin, Fig. [6.9,](#page-53-0) indicate that both the TSPC and the SAFF-TCD have their strengths and weaknesses for a TDC design; the PowerPC 603, however, does not seem a good option for TDC applications that require enhanced granularity. The TSPC's low energy per operation is corroborated by previous work by Oskuii and Alvandpour [\(OSKUII; ALVANDPOUR,](#page-70-1) [2004\)](#page-70-1), in which the TSPC achieved the best energy per operation among eight different topologies. The SAFF-TCD displays good performance, outperforming the PowerPC 603 and the TSPC with a lower setup delay. For applications that need extra robustness to PVT variation and deal with very low-frequency clock signals, the PowerPC 603 may be a good option since it is a static FF and spends far less energy than the SAFF-TCD.

<span id="page-53-0"></span>Figure 6.9: Mean Energy per Operation and setup time interval  $(t_{DC})$  curves of the SAFF-TCD, PowerPC 603, and TSPC FF as given in Eq. [6.1.](#page-52-1)

Source: The Author

### <span id="page-53-1"></span>6.3 Sizing Flip-Flops

Transistor sizing plays a role in both the cell's delay  $(\tau)$  and its power dissipation; to steer TDC performance toward a given objective (e.g., more precision or less energy per operation), in-depth knowledge of the PDK is essential. Due to the difference between electron and hole mobilities in N- and P-channel FETs, and to the transistor sizing adopted for compact digital CMOS standard cells, the cells often present different rising- and falling-edge transition times. Typically, designers rely on increasing the width of the P transistor to reduce this mismatch between rise and fall times. However, as the supply voltage decreases, this discrepancy becomes even more pronounced because the P and N transistors have slightly different absolute threshold voltages  $(V_t)$. Another option is to slightly increase the N transistors' channel length. This technique must be applied with care since it heavily impacts the transistors'  $V_t$. A higher  $V_t$  leads to a reduction in static power, at the cost of a longer device delay.

Even though some TDC topologies are less susceptible to slew-rate asymmetry-related issues than others, such asymmetry can impact short-circuit power in most topologies. Some techniques can be employed to mitigate the timing mismatch, but those design techniques are very application-sensitive, and some target applications benefit more than others. Moreover, most TDC architectures are highly sensitive to layout-induced delay mismatches.

<span id="page-54-0"></span>

Figure 6.10: Transistor sizing test bench with FO4 configuration.

Source: The Author

To find an optimal transistor width and  $W_P/W_N$  ratio that complies with the given 600 mV supply-voltage constraint, a test bench for electrical simulations was created. As shown in Fig. [6.10,](#page-54-0) the test bench features a simple CMOS inverter in the 28 nm PDK, using transistor models with equal FET lengths, in a fanout-of-4 (FO4) configuration. The PMOS and NMOS transistors of the inverter under test have their dimensions swept through a series of widths and finger counts.



<span id="page-54-1"></span>Figure 6.11: Inverter cell delay ( $\tau$ ) at a FO4 configuration and 600 mV of supply voltage.

The mean cell delay  $(\tau)$  of an inverter cell was obtained for a range of transistor width (and finger) values, as shown in Fig. [6.11.](#page-54-1) The width of PMOS transistors is usually defined as the NMOS width times the  $W_P / W_N$  ratio. Instead of relying on width increments to find the optimal ratio, the test bench performed a design-space exploration in which the "PMOS width" is given by the NMOS width (x-axis) multiplied by the number of fingers (different series). Results indicate that for every finger configuration (1, 2, 3, and 4), the best  $\tau$  delay occurs when the NMOS transistor width is 300 nm. Increasing the transistors' width from the minimum transistor size of this PDK, 100 nm, to 300 nm leads to a delay reduction of up to  $18.94\%$.

<span id="page-55-0"></span>Figure 6.12: Inverter cell mean switching energy at a FO4 configuration and 600 mV of supply voltage.



Source: The Author

In Fig. [6.15](#page-57-0) the  $t_{CQ}$  and  $t_{DC}$  were evaluated. The previous section reviewed flip-flops regarding setup time, output delay, power dissipation, and energy per operation. Results show that the TSPC far outperformed the others in terms of energy per operation. However, the SAFF-TCD could trigger with data-to-clock delays that were 11.86 ps shorter. The data-to-clock delay directly impacts the TDC's granularity.

Data from both Figures [\(6.11](#page-54-1) and [6.12\)](#page-55-0) were applied to the  $F_1$  trade-off function presented in Sec. [2.6.1.](#page-25-0) This trade-off function weighs both energy consumption and maximum frequency. Since energy is already related to the delay, the additional dependence on the period of oscillation led to optimal widths larger than the minimum width. For the specified trade-off functions (TOFs), the lower the value, the better. In Fig. [6.13,](#page-56-0) results show that for the given trade-off, the best width value was 300 nm regardless of the number of fingers. Considering that the TDC belongs to the time-mode signal processing (TMSP) design methodology and is time-sensitive, sizing decisions may be deliberately biased towards timing improvements.

<span id="page-56-0"></span>Figure 6.13: Transistor Sizes Applied to the Trade-off Function as in Eq. [2.7.](#page-25-1) The lower the value on the y-axis, the better.

Source: The Author



<span id="page-56-1"></span>Figure 6.14: Size-optimized TSPC Width and Finger Count Values for Each Transistor

Source: The Author

After gathering the trade-off function results, the TSPC was sized according to Fig. [6.14.](#page-56-1) The base width for the NMOS FET was set to 300 nm with a  $W_P/W_N$  ratio of 1.5  $\times$ . The output inverter had a different  $W_P/W_N$  ratio to achieve better slew-rate symmetry between the rising and falling edges at the output.

In this specific case of the TSPC DFF with customized sizing, simulations display a minor increase in energy per operation at the TT corner and 600 mV of supply voltage, Fig. [6.16.](#page-57-1) Despite this increase, the size-optimized TSPC has a lower energy per operation than both the PowerPC 603 and the SAFF-TCD.

<span id="page-57-0"></span>Figure 6.15: Size-Optimized TSPC Comparison with the Logical-Effort-Sized TSPC at X1 Drive Strength. The closer to the lower left, the better.

Source: The Author

<span id="page-57-1"></span>Figure 6.16: Mean Energy per Operation and  $t_{DC}$  curves of the SAFF-TCD, PowerPC 603, TSPC, and Size-Optimized TSPC FF as given on Eq. [6.1.](#page-52-1)



In accordance with Fig. [6.12,](#page-55-0) the results in Tab. [6.2](#page-58-0) show a slight increase in the power dissipated by the TSPC FF after the custom sizing. Nevertheless, the sizing leads to a reduction in both the data-to-clock delay and the clock-to-output delay, as in Fig. [6.15.](#page-57-0) After the sizing improvements, the TSPC can sustain a data-to-clock delay close to the SAFF-TCD's minimum: 83 ps (TSPC) vs. 79 ps (SAFF-TCD).

<span id="page-58-0"></span>

|                           | <b>TSPC</b>   | <b>SIZE-OPTIMIZED TSPC</b> |
|---------------------------|---------------|----------------------------|
| Peak Power @ 1 GHz PVT TT | $13.2 \mu W$  | $13.6 \mu W$               |
| RMS Power @ 1 GHz PVT TT  | $1.46 \mu W$  | $1.72 \mu W$               |
| Mean Power @ 1 GHz PVT TT | $0.274 \mu W$ | $0.305\mu W$               |
| Min. Setup                | $91$ ps       | $83$ ps                    |
| Min. $t_{DC} + t_{CQ}$    | $135$ ps      | $128$ ps                   |

Table 6.2: TSPC Main Characteristics Comparison at 600 mV

### 6.4 Tunable Delay Cell Design

The implemented delay cell provides an adjustable delay through a current-starved architecture. The non-inverting delay cell is composed of two inverting stages. The architecture includes pseudo-differential transistor pairs in the Pull-Down Network (PDN) of the first stage and in the Pull-Up Network (PUN) of the second stage, as illustrated in Fig. [6.17.](#page-59-0) The current capability of the pseudo-differential pairs depends on the amplitude of a differential voltage reference  $(CONTROL_+$ and  $CONTROL_-)$  connected to their gates. A falling edge on the  $IN$  signal induces a sharp and quick transition at the output node  $OUT$, as the current flows through transistors  $MP_1$  and  $MN_2$  to charge and discharge the outputs of the first and second stages, respectively. Conversely, at a rising edge of the input signal, the pseudo-differential pair  $MN_3$-$MN_4$  controls the current that discharges the internal delay cell node, while  $MP_3$-$MP_4$  controls the charging rate of the output node. Since the extra transistors  $(MP_3, MP_4, MN_3,$ and  $MN_4)$  are used to limit the cell's current, the cell sizing does not need to follow the logical effort. The implemented tunable delay cell was sized according to the table inside Fig. [6.17.](#page-59-0) As for the TSPC presented in the previous section, a  $W_P/W_N$  ratio of  $1.5 \times$  was used.

An energy-efficient voltage source is required to fine-tune the cell delay through the differential input,  $CONTROL_+$  and  $CONTROL_-$. Unlike resistor-based voltage dividers, a capacitor-based voltage divider ideally has no static power consumption, which makes it a good option for a low-power design, given that under a low supply voltage the static power is proportionally more pronounced. This work implements a capacitor-based voltage divider, as presented in Fig. [6.18,](#page-59-1) using NMOS transistors with shorted drain, bulk, and source terminals to exploit the gate capacitance of large transistors. Note that the MOSFET gate capacitance changes with the inversion regime and can suffer from process variability. For area optimization, this work uses MOSFET capacitors, but a process/mismatch variability evaluation should be done to validate this implementation. Some strategies can compensate for the variability effects, such as layout techniques, increasing the MOSFET sizes, or implementing the divider capacitors as Metal-Oxide-Metal (MOM) or Metal-Insulator-Metal (MIM) capacitors instead. It is also possible to implement a calibration/trimming process to adjust the voltages after the circuit is manufactured.

<span id="page-59-0"></span>

Figure 6.17: Tunable Delay Cell Schematic and Transistor Sizing Table

Source: The Author

<span id="page-59-1"></span>Figure 6.18: NMOS Gate Capacitor-Based Voltage Divider Setup. Transistors  $MV_1$  and  $MV_2$  had their number of fingers varied from 1 to 52 during the parametric simulation.



Source: The Author

A test bench similar to that presented in Fig. [6.10](#page-54-0) was used to evaluate the cell's delay for different voltage control inputs and to properly size the capacitor-based voltage divider for the TDC presented in Sec. 6.5. The DUT inverter of Fig. [6.10](#page-54-0) was swapped for the tunable delay cell. Its differential control apparatus (the capacitor-based voltage divider) had its capacitance ratio analyzed via parametric simulation. The transistors labeled  $MV_1$  and  $MV_2$  in Fig. [6.18](#page-59-1) had their number of fingers swept from 1 to 52. With 1 finger the capacitance ratio is 1:104, and with 52 fingers it is 1:2, since the  $MV_1$  and  $MV_2$  transistors have half the width of  $MF_1$  and  $MF_2$. The delay and differential control input values extracted from the simulation can be seen in Fig. [6.19.](#page-60-0) Within the simulated control input range, the tunable delay cell can vary its delay from 51.18 ps to 128.8 ps.
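The two end points of the capacitance ratio quoted above can be reproduced with a short calculation. The sketch assumes that gate capacitance scales with total gate width and that the fixed transistors $MF_1$ and $MF_2$ have 52 fingers; the latter is inferred from the 1:104 and 1:2 ratios rather than stated in the text, and the constant and function names are hypothetical.

```python
from fractions import Fraction

MF_FINGERS = 52                     # assumed finger count of MF_1 / MF_2 (inferred, not stated)
MV_WIDTH_FACTOR = Fraction(1, 2)    # MV fingers have half the width of MF fingers


def capacitance_ratio(mv_fingers: int) -> Fraction:
    """C(MV) / C(MF), taking gate capacitance as proportional to total gate width."""
    if not 1 <= mv_fingers <= 52:
        raise ValueError("the sweep covers 1 to 52 fingers")
    return (mv_fingers * MV_WIDTH_FACTOR) / MF_FINGERS


if __name__ == "__main__":
    for n in (1, 26, 52):
        ratio = capacitance_ratio(n)
        print(f"{n} finger(s): {ratio.numerator}:{ratio.denominator}")
```

This first-order view ignores the dependence of the gate capacitance on the inversion regime, which the text above identifies as a source of deviation.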



<span id="page-60-0"></span>Figure 6.19: Capacitor-based Voltage Divider Ratio and Configurable Cell Delay

Source: The Author

Since this work focuses on the design-space exploration of the components composing the TDC (delay cell and FFs), a configurable voltage reference was not fully designed. This section, however, explored the behavior of the delay cell with respect to its inputs. These characteristics could lead to the further development of a CDAC as a voltage reference supplier. Another delay-cell architecture was also tested, using a single control transistor for the PDN and another for the PUN. That architecture, however, placed no limit on the input voltage and could apply any  $V_{GS}$  bias to the control transistor. This allows a  $V_{GS}$  bias below the threshold voltage, which makes the transition very slow and produces erratic behavior for a TDC. On the other hand, the pseudo-differential pair structures employed in the delay cell ensure that one of the transistors is driven with a  $V_{GS}$  higher than the CONTROL common-mode voltage, i.e.,  $V_{GS} > V_{DD}/2$.

### <span id="page-61-0"></span>6.5 TDC Design

Since in the SAR ADC the target measurement time window is quite large, varying from 189.4 ps up to the nanosecond scale, a TDC with accumulated delay is a good option. The Flash TDC is an excellent starting point for applications that do not require a high level of granularity, and it can also be used to build more advanced TDCs such as the Stochastic TDC (see Sec. [4.4\)](#page-35-0). So, even when the target is an STDC, it is important to first build a good Flash TDC, then replicate it to the desired redundancy and add the decoding block, which can be fully digital and rely on commercial synthesis tools.



Figure 6.20: Implemented Time-to-Digital Converter and delay cell.

Applying the techniques presented, an 8-bit deep coarse Flash TDC was implemented at the transistor schematic level, as in Fig. [6.23.](#page-64-0) Instead of standard delay cells, the programmable delay cell was used. The TSPC registers had their FET sizes optimized as explained in Sec. 6.3.
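The operation of a flash TDC of this kind can be summarized with a small behavioral model: the start edge travels through a chain of delay cells, and each flip-flop samples its tap when the stop edge arrives. The sketch below is an idealized model with hypothetical names and a uniform cell delay; it ignores metastability, mismatch, and the variable clock-to-output delay near the setup boundary.

```python
def flash_tdc_thermometer(interval_ps, cell_delay_ps, n_taps=8, setup_ps=0.0):
    """Tap k reads 1 if the start edge reached it at least setup_ps before the stop edge."""
    return [1 if (k + 1) * cell_delay_ps + setup_ps <= interval_ps else 0
            for k in range(n_taps)]


def thermometer_to_count(code):
    """Count the leading ones, stopping at the first zero so later bubbles are ignored."""
    count = 0
    for bit in code:
        if not bit:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical 50 ps cell delay and 175 ps start-stop interval
    code = flash_tdc_thermometer(175.0, 50.0)
    print(code, thermometer_to_count(code))
```

The second function plays the role of the decoding block, turning the thermometer code into the encoded value that is easier to plot and compare across corners.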

### 6.6 Electrical Simulation Results

Results presented in this chapter were obtained through design- and device-level simulations. The Cadence Spectre™ tool was used for electrical simulations, employing Berkeley Short-Channel IGFET Model 4 (BSIM4) device models provided by the foundry. Two sources, isolated by two inverters (drive strength X1 followed by X2), were set for the start (data) and stop (clock) signals. The start signal had a rising edge after 1 ns, and the stop signal rose after the stipulated time. This margin between the start and stop signals was a global variable that was swept during the parametric simulation, thus enabling the simulation of the TDC's output digital word and power values as a function of the margin. Each D-type flip-flop output of the simulated TDC was connected to four minimum-size inverters, forming a fan-out-of-4 setup.

<span id="page-62-0"></span>Figure 6.21: Step Function of the TDC with 1:104 Capacitive Ratio and 600 mV of Supply- $V_{DD}$ 



Source: The Author

Fig. [6.21](#page-62-0) displays the encoded TDC output for the given start-stop margin (not in thermometer code, for easier visualization). The values were obtained by simulation with a 1:104 capacitor ratio on the voltage reference. The figure also displays results for each corner condition (dashed lines, whereas the typical case is shown as a continuous line). Since in the simulated circuit both NMOS and PMOS transistor bias points are below the zero-temperature-coefficient bias, the temperature corners were set to 125 $^{\circ}$C for the best case, 27 $^{\circ}$C for typical, and -40 $^{\circ}$C for the worst case. For the voltage corners,  $V_{DD}$  was decreased by 10% for the worst corner and increased by 10% for the best corner. Process-wise, the devices were set to SS in the worst corner, TT in the typical corner, and FF in the best corner. Results indicate that the last bits of the TDC chain are more susceptible to variability due to the accumulated impact of temperature and voltage variations across all delay cells.

The average and RMS power results in Fig. [6.22](#page-63-0) show that the dissipated power decreases as the start-stop margin increases. To measure the mean power dissipation of the implemented TDC, the test bench applied one thousand randomly generated input values ranging from 0 ps to 600 ps. According to the simulation results, the mean power dissipation is 9.2464 $\mu$W and the RMS power dissipation is 17.554 $\mu$W.

<span id="page-63-0"></span>Figure 6.22: Power dissipated by the TDC according to the Step Function of the TDC with 0.005:1 Capacitive Ratio and 600 mV of Supply- $V_{DD}$ 
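The difference between the two reported figures follows from their definitions. The sketch below shows how a mean and an RMS value could be computed from power samples exported by the simulator; the sample values are toy numbers, the uniform distribution of the stimulus is an assumption, and none of the names come from the original test bench.

```python
import math
import random


def mean_power(samples_w):
    """Arithmetic mean of the power samples (W)."""
    return sum(samples_w) / len(samples_w)


def rms_power(samples_w):
    """Root-mean-square of the power samples (W)."""
    return math.sqrt(sum(p * p for p in samples_w) / len(samples_w))


def random_margins_ps(n=1000, lo=0.0, hi=600.0, seed=0):
    """One thousand start-stop margins in [lo, hi] ps.

    A uniform distribution is ASSUMED; the text only says the values were
    randomly generated between 0 ps and 600 ps.
    """
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for _ in range(n)]


if __name__ == "__main__":
    # Toy samples only: a mostly idle waveform with short switching peaks.
    toy_samples_w = [1e-6] * 9 + [50e-6]
    print("mean:", mean_power(toy_samples_w))
    print("rms :", rms_power(toy_samples_w))
    print("stimuli generated:", len(random_margins_ps()))
```

The RMS value is never smaller than the mean, and the gap widens for waveforms dominated by short peaks, which is consistent with an RMS figure well above the mean for a circuit that dissipates mostly during switching events.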

| <b>TDC Implementation</b> | This Work | (KIM et al., 2015) | (KIM; KIM; PARK, 2014) | (WANG; DAI; WANG, 2018) |
|---------------------------|-----------|--------------------|------------------------|-------------------------|
| <b>Topology</b>           | Flash     | Stochastic         | Cyclic                 | 2-D Spiral Vernier      |
| <b>Node</b>               | 28 nm     | 14 nm              | 28 nm                  | 45 nm                   |
| <b>Process</b>            | Bulk CMOS | FinFET             | Bulk CMOS              | CMOS SOI                |
| <b>Supply $V_{DD}$</b>    | 0.6 V     | 0.6 V              | 0.9 V                  | 1.0 V                   |
| <b>Power [mW]</b>         | 0.009     | 0.78               | 0.82                   | 0.07–0.69               |
| <b>Event Rate [ME/s]</b>  | 500       | 100                | 10                     | 80                      |
| <b>Resolution [ps]</b>    | 73        | 1.17               | 0.63                   | 1.25                    |
| <b>DNL [LSB]</b>          | N/A       | 0.8                | 0.5                    | 0.25/0.31               |
| <b>INL [LSB]</b>          | N/A       | 2.3                | 3.8                    | 0.34/0.4                |

<span id="page-63-1"></span>Table 6.3: Comparison with Low-Power State-of-the-art implementations of TDC circuits.

The implemented coarse TDC is compared to state-of-the-art TDCs in Table [6.3](#page-63-1). Because information on coarse TDCs for similar applications is scarce (e.g., some authors report resolution only as a number of bits rather than in units of time), this work was compared to fine TDCs. The comparison shows an inverse relationship between power dissipation and resolution. The coarse flash TDC developed in this work dissipates close to $10\times$ less power than the lowest-power design in the comparison, with a penalty in resolution (which nevertheless fits the application). Since the cited works were fabricated, their authors report measured DNL and INL values. Although corner-aware simulations were performed here, properly obtaining DNL and INL values would require a fine-grained Monte Carlo (MC) simulation that considers multi-parameter variability. Linearity metrics depend on inter-cell variation, whereas in corner-aware simulations all devices operate under the same PVT conditions (worst, nominal, or best). Those metrics were not simulated due to a lack of time and computational resources.
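To make the missing linearity analysis concrete, the sketch below shows how DNL and INL could be derived from a list of code transition times, together with a toy Monte Carlo loop that perturbs each delay cell independently. It is a behavioral illustration only: the endpoint-fit LSB definition is one common convention, and the Gaussian model, the 5 ps sigma, and all function names are assumptions rather than results or methods from this work.

```python
import random


def dnl_inl(transitions_ps):
    """DNL and INL (in LSB) from code transition times.

    The LSB is taken as the average step between the first and the last
    transition (endpoint fit), one common convention among several.
    """
    steps = [b - a for a, b in zip(transitions_ps, transitions_ps[1:])]
    lsb = (transitions_ps[-1] - transitions_ps[0]) / len(steps)
    dnl = [s / lsb - 1.0 for s in steps]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d  # INL is the running sum of DNL
        inl.append(acc)
    return dnl, inl


def monte_carlo_linearity(runs=200, depth=8, nominal_ps=73.0,
                          sigma_ps=5.0, seed=1):
    """Worst |DNL| and |INL| over `runs` chains with independent cell delays.

    nominal_ps matches the 73 ps resolution in Table 6.3; sigma_ps is a
    HYPOTHETICAL inter-cell spread, not a value extracted from the PDK.
    """
    rng = random.Random(seed)
    worst_dnl = worst_inl = 0.0
    for _ in range(runs):
        t, transitions = 0.0, []
        for _ in range(depth):
            t += max(rng.gauss(nominal_ps, sigma_ps), 0.0)
            transitions.append(t)
        dnl, inl = dnl_inl(transitions)
        worst_dnl = max(worst_dnl, max(abs(d) for d in dnl))
        worst_inl = max(worst_inl, max(abs(i) for i in inl))
    return worst_dnl, worst_inl


if __name__ == "__main__":
    print(monte_carlo_linearity())
```

With every cell sharing the same delay, as in a corner simulation, both metrics collapse to zero, which is why corner analysis alone cannot expose them.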

Figure 6.23: Block Diagram of the SAR ADC with the Calibration Delay Cell

<span id="page-64-0"></span>

Source: The Author

The minimum comparator delay is 190 ps, which is already greater than the trigger times of the first three bits of the proposed TDC (24 ps, 97 ps, and 170 ps, respectively, at the nominal corner). Hence, these three least significant bits would not be meaningful for the SAR logic and do not need to be considered as inputs from the hybrid comparator of the SAR ADC, unless a calibration scheme is performed by the control architecture. To make efficient use of the TDC, delay cells can be added before the TDC start signal to shift the transitions towards a more useful timing window.

An extra tunable delay cell was inserted into the clock signal that connects to the TDC's start port, as in Fig. [6.23](#page-64-0), creating a delayed clock that can be controlled by the SAR logic. The SAR algorithm can switch the capacitive ratio that sets the delay of this start delay cell, time-shifting the TDC's whole step function without changing its granularity or precision, i.e., changing the offset along the x-axis of Fig. [6.24](#page-65-0). For this work, the start delay cell was fed by the voltage generated by a 2:5 capacitive voltage divider, shifting the "0000" to "0001" encoded-word transition from 24 ps to 180 ps (Fig. [6.24](#page-65-0)). This calibrated delay is intentionally slightly below the minimum comparator delay of 190 ps, so that an encoded TDC output word of "0000" can identify circuit variability towards the best-case corner. This feature could help the future development of a PVT-aware capacitive voltage divider switching scheme since, as seen in Fig. [6.21](#page-62-0), the step delay variation is greater between the nominal and best case than between the nominal and worst case.

<span id="page-65-0"></span>Figure 6.24: Step Function of the TDC before and after calibration at 600 mV of Supply- $V_{DD}$ 

Source: The Author
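The effect of the start delay can be checked with simple arithmetic. The sketch below shifts an idealized set of transition times by the offset implied by the text (180 ps − 24 ps = 156 ps) and counts how many transitions fall below the 190 ps minimum comparator delay before and after calibration. The uniform 73 ps spacing across all eight stages is an assumption, and the helper names are hypothetical.

```python
COMPARATOR_MIN_DELAY_PS = 190.0  # minimum comparator delay quoted in the text

# First transition (24 ps) and step (73 ps) are from the text; applying the
# same step to all eight stages is an ASSUMPTION for illustration.
NOMINAL_TRANSITIONS_PS = [24.0 + 73.0 * k for k in range(8)]


def shift(transitions_ps, start_delay_ps):
    """Adding start delay moves every transition by the same offset."""
    return [t + start_delay_ps for t in transitions_ps]


def transitions_below(transitions_ps, limit_ps):
    """Number of transitions that occur before the given time limit."""
    return sum(1 for t in transitions_ps if t < limit_ps)


if __name__ == "__main__":
    offset_ps = 180.0 - 24.0  # first transition moved from 24 ps to 180 ps
    calibrated = shift(NOMINAL_TRANSITIONS_PS, offset_ps)
    print("before:", transitions_below(NOMINAL_TRANSITIONS_PS,
                                       COMPARATOR_MIN_DELAY_PS))
    print("after :", transitions_below(calibrated, COMPARATOR_MIN_DELAY_PS))
    print("step unchanged:",
          calibrated[1] - calibrated[0] == 73.0)
```

Under these assumptions, three transitions lie below the comparator delay before calibration and only the deliberately placed first one remains below it afterwards, while the step between transitions is unchanged.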

| <b>TDC Output</b> | <b>SAR ADC Analog Input</b> |
|-------------------|-----------------------------|
| 0011              | 600 mV $\sim$ 400 mV        |
| 0100              | 400 mV $\sim$ 360 mV        |
| 0101              | 360 mV $\sim$ 260 mV        |
| 0110              | 260 mV $\sim$ 220 mV        |
| 0111              | 220 mV $\sim$ 190 mV        |
| 1000              | 190 mV $\sim$ -190 mV       |
| 0111              | -190 mV $\sim$ -220 mV      |
| 0110              | -220 mV $\sim$ -260 mV      |
| 0101              | -260 mV $\sim$ -360 mV      |
| 0100              | -360 mV $\sim$ -400 mV      |
| 0011              | -400 mV $\sim$ -600 mV      |

Table 6.4: Before (left) and After (right) Delay Calibration



As shown in Tab. [6.24](#page-65-0), calibrating the delay cells can increase the number of effective bits and improve the voltage resolution. This is especially necessary given the larger variability that affects CMOS circuits operating with low-voltage inputs. Since the hybrid comparator's delay does not increase linearly with the input voltage, an asymmetrical tuning of the delay cells, instead of relying on a unitary delay, could better distribute the output words across the voltage range. However, this would increase the SAR logic complexity, with an expected increase in area and power dissipation. That kind of tuning would also benefit input voltages higher than half of the supply-$V_{DD}$ (above 300 mV in this case). Conversions of input voltages greater than half of the supply-$V_{DD}$ are the most energy-consuming because of the time taken to charge the CDAC capacitors. Nevertheless, as previously mentioned, low-voltage inputs are more sensitive to variability, so the extra resolution below half of the supply-$V_{DD}$ is also beneficial.

# 7 CONCLUSIONS

This work covers four fundamental steps that can guide future designers seeking to develop low-power TDCs. In Sec. [6.3](#page-53-1), a physical device sizing investigation led to optimizations of digital standard cells targeting their use in TDCs. The results show that using transistor widths greater than the minimum allowed by the technology ($3 \times$ for this PDK) improved the applied FoM, which weights energy per operation and operation delay. Sec. [6.1](#page-46-1) depicts the importance of device-level simulations for determining whether ZTC operation of MOSFETs is possible at less-than-nominal $V_{DD}$ in the target 28 nm CMOS technology, and for establishing the corner cases for the given supply-$V_{DD}$. In Sec. [6.2](#page-50-1), different D-type flip-flop topologies were compared in terms of setup time requirement, power, and energy per operation, highlighting which aspects (such as minimum setup time) are most useful for designing a low-power TDC. The schematic-level simulations have shown that, at the given specifications, the TSPC FF had the best trade-off of delay ($t_{CO} + t_{DC}$) versus energy per operation. Sec. [6.5](#page-61-0) shows the schematic design of a flash-architecture TDC that profits from all previous steps towards an improved TDC for the aforementioned application.

Future work on the TDC will include the layout design of the presented 8-bit deep flash TDC in the manufacturable TSMC 28 nm technology PDK. The layout design would enable the fast replication of the flash TDC, facilitating the design of a stochastic TDC. In addition, the simulation of the extracted layout and the fine-grained MC simulation of the presented TDC will have to be carried out. This set of transient mismatch-aware simulations will enable the study of the DNL and INL metrics. Another area for improvement is the creation of a customized CDAC switching scheme that could interact with a dynamic configuration of the TDC's delay lines, mimicking the behavior of time-amplifier TDCs.

The results demonstrate a feasible path for designing future ultra-low-power SAR ADCs and other TDC-assisted circuits that may run under low supply voltage conditions. The coarse 8-bit deep time-to-digital converter presented in this work was designed in a commercial TSMC 28 nm Bulk CMOS technology. The results indicate good coverage of the SAR ADC inputs after delay calibration. The TDC had a simulated mean power dissipation of only 9.25 $\mu$W at 600 mV, making it a good option for applications that are not so demanding in terms of precision.

#### **REFERENCES**

ABDULSALAM, S. et al. Program energy efficiency: The impact of language, compiler and implementation choices. In: International Green Computing Conference. [S.l.: s.n.], 2014. p. 1–6.

ALI, A. M. A. et al. A 16-bit 250-MS/s IF Sampling Pipelined ADC With Background Calibration. IEEE Journal of Solid-State Circuits, v. 45, n. 12, p. 2602–2612, 2010.

BENINI, L.; MICHELI, G. de. System-level power optimization: techniques and tools. In: Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477). [S.l.: s.n.], 1999. p. 288–293.

BLESKEN, M.; LÜTKEMEIER, S.; RÜCKERT, U. Multiobjective optimization for transistor sizing sub-threshold CMOS logic standard cells. In: Proceedings of 2010 IEEE International Symposium on Circuits and Systems. [S.l.: s.n.], 2010. p. 1480–1483. ISSN 0271-4302.

BLESKEN, M. et al. Multiobjective optimization for transistor sizing of cmos logic standard cells using set-oriented numerical techniques. In: 2009 IEEE Nordic Circuits and Systems Conference. [S.l.: s.n.], 2009. p. 1–4.

<span id="page-68-3"></span>CANAL, B. et al. Hybrid Comparator and Window Switching Scheme for low-power SAR ADC. In: 2022 IEEE 11th Latin American Symposium on Circuits Systems (LASCAS). [S.l.: s.n.], 2022. p. 1–4.

CHEN, X. et al. Analysis and Design of an Ultra-Low-Power Bluetooth Low-Energy Transmitter With Ring Oscillator-Based ADPLL and  $4 \times$  Frequency Edge Combiner. IEEE Journal of Solid-State Circuits, v. 54, n. 5, p. 1339–1350, 2019.

<span id="page-68-0"></span>CHUNG, H.; HYUN, M.; KIM, J. A 360-fs-Time-Resolution 7-bit Stochastic Time-to-Digital Converter With Linearity Calibration Using Dual Time Offset Arbiters in 65-nm CMOS. IEEE Journal of Solid-State Circuits, v. 56, n. 3, p. 940–949, 2021.

CORNELIUS, C. et al. Performance analysis of powerpc 603 ff at nano-scale cmos technologies for wsns. In: 2016 3rd International Conference on Devices, Circuits and Systems (ICDCS). [S.l.: s.n.], 2016. p. 271–274.

DE, V.; VANGAL, S.; KRISHNAMURTHY, R. Near Threshold Voltage (NTV) Computing: Computing in the Dark Silicon Era. IEEE Design Test, v. 34, n. 2, p. 24–30, 2017.

<span id="page-68-2"></span>EFFENDRIK, P. Time-to-Digital Converter (TDC) for WiMAX ADPLL in State-of-The-Art 40-nm CMOS. Dissertation (Master) — Delft University of Technology, 04 2011. Available from Internet: [<http://resolver.tudelft.nl/uuid:0957ee62-0d58-4a93-b5ad-40cad54b895d>.](http://resolver.tudelft.nl/uuid:0957ee62-0d58-4a93-b5ad-40cad54b895d)

<span id="page-68-1"></span>ELKHOLY, A. et al. A 3.7 mW Low-Noise Wide-Bandwidth 4.5 GHz Digital Fractional-N PLL Using Time Amplifier-Based TDC. IEEE Journal of Solid-State Circuits, v. 50, n. 4, p. 867–881, 2015.

ESMAEILZADEH, H. et al. Dark silicon and the end of multicore scaling. In: 2011 38th Annual International Symposium on Computer Architecture (ISCA). [S.l.: s.n.], 2011. p. 365–376.

<span id="page-69-4"></span>FILANOVSKY, I.; ALLAM, A. Mutual compensation of mobility and threshold voltage temperature effects with applications in cmos circuits. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, v. 48, n. 7, p. 876–884, 2001.

GEORGIOU, S. et al. What are your programming language's energy-delay implications? In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). [S.l.: s.n.], 2018. p. 303–313.

INTERNATIONAL Energy Agency (IEA): Global Energy Review 2021. 2021. Available from Internet: [<https://www.iea.org/reports/global-energy-review-2021>.](https://www.iea.org/reports/global-energy-review-2021)

ITO, S. et al. Stochastic TDC architecture with self-calibration. In: 2010 IEEE Asia Pacific Conference on Circuits and Systems. [S.l.: s.n.], 2010. p. 1027–1030.

JAIN, S.; LIN, L.; ALIOTO, M. Design-oriented energy models for wide voltage scaling down to the minimum energy point. IEEE Transactions on Circuits and Systems I: Regular Papers, IEEE, v. 64, n. 12, p. 3115–3125, 2017.

JEONG, H. et al. Sense-amplifier-based flip-flop with transition completion detection for low-voltage operation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, v. 26, n. 4, p. 609–620, 2018.

KHDR, H.; AMROUCH, H.; HENKEL, J. Aging-aware boosting. IEEE Transactions on Computers, IEEE, v. 67, n. 9, p. 1217–1230, 2018.

<span id="page-69-3"></span>KIM, K. et al. A 7b, 3.75ps resolution two-step time-to-digital converter in 65nm CMOS using pulse-train time amplifier. In: 2012 Symposium on VLSI Circuits (VLSIC). [S.l.: s.n.], 2012. p. 192–193.

<span id="page-69-1"></span>KIM, S.-J.; KIM, T.; PARK, H. A 0.63ps, 12b, synchronous cyclic TDC using a time adder for on-chip jitter measurement of a SoC in 28nm CMOS technology. In: 2014 Symposium on VLSI Circuits Digest of Technical Papers. [S.l.: s.n.], 2014. p. 1–2.

<span id="page-69-0"></span>KIM, S.-J. et al. 15.5 A 0.6V 1.17ps PVT-tolerant and synthesizable time-to-digital converter using stochastic phase interpolation with  $16\times$  spatial redundancy in 14nm FinFET technology. In: 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. [S.l.: s.n.], 2015. p. 1–3.

KOOMEY, J. et al. Implications of historical trends in the electrical efficiency of computing. IEEE Annals of the History of Computing, v. 33, n. 3, p. 46–54, 2011.

<span id="page-69-2"></span>LEE, M.; ABIDI, A. A. A 9 b, 1.25 ps Resolution Coarse–Fine Time-to-Digital Converter in 90 nm CMOS that Amplifies a Time Residue. IEEE Journal of Solid-State Circuits, v. 43, n. 4, p. 769–777, 2008.

LEPAWSKY, J. Sources and streams of electronic waste. One Earth, v. 3, n. 1, p. 13–16, 2020. ISSN 2590-3322. Available from Internet: [<https://www.sciencedirect.com/science/article/pii/S2590332220303079>.](https://www.sciencedirect.com/science/article/pii/S2590332220303079)

LEVANTINO, S. et al. AD-PLL for WiMAX with Digitally-Regulated TDC and Glitch Correction Logic. EURASIP Journal on Embedded Systems, v. 2010, n. 1, p. 175764, Nov 2009. ISSN 1687-3963. Available from Internet: [<https://doi.org/10.1155/2010/175764>.](https://doi.org/10.1155/2010/175764)

LUTKEMEIER, S. et al. A 65 nm 32 b Subthreshold Processor With 9T Multi-Vt SRAM and Adaptive Supply Voltage Control. IEEE Journal of Solid-State Circuits, v. 48, n. 1, p. 8–19, Jan 2013. ISSN 0018-9200.

<span id="page-70-1"></span>OSKUII, S.; ALVANDPOUR, A. Comparative study on low-power high-performance standard-cell flip-flops. Proceedings of SPIE - The International Society for Optical Engineering, 03 2004.

PAIM, G. P. Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing. Thesis (PhD) — Universidade Federal do Rio Grande do Sul, 2021.

RABAEY, M. P. J. M. Low Power Design Methodologies. [S.l.]: Springer, 1996. 14–15 p.

RAHKONEN, T.; KOSTAMOVAARA, J. The use of stabilized cmos delay lines for the digitization of short time intervals. IEEE Journal of Solid-State Circuits, v. 28, n. 8, p. 887–894, 1993.

<span id="page-70-0"></span>RAISANEN-RUOTSALAINEN, E.; RAHKONEN, T.; KOSTAMOVAARA, J. A low-power CMOS time-to-digital converter. IEEE Journal of Solid-State Circuits, v. 30, n. 9, p. 984–990, 1995.

RAZAVI, B. Principles of Data Conversion System Design. Wiley-IEEE Press, 1995. Available from Internet: [<https://ieeexplore.ieee.org/servlet/opac?bknumber=5264233>.](https://ieeexplore.ieee.org/servlet/opac?bknumber=5264233)

ROBERTS, G. W.; ALI-BAKHSHIAN, M. A brief introduction to time-to-digital and digital-to-time converters. IEEE Transactions on Circuits and Systems II: Express Briefs, v. 57, n. 3, p. 153–157, 2010.

ROSA, A. L. R. et al. Designing CMOS for near-threshold minimum-energy operation and extremely wide V-F scaling. In: 2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI). [S.l.: s.n.], 2015. p. 1–6.

SAMARAH, A.; CARUSONE, A. C. A Digital Phase-Locked Loop With Calibrated Coarse and Stochastic Fine TDC. IEEE Journal of Solid-State Circuits, v. 48, n. 8, p. 1829–1841, 2013.

SANGUESA, J. A. et al. A review on electric vehicles: Technologies and challenges. Smart Cities, v. 4, n. 1, p. 372–404, 2021. ISSN 2624-6511. Available from Internet: [<https://www.mdpi.com/2624-6511/4/1/22>.](https://www.mdpi.com/2624-6511/4/1/22)

SCHREIER, R.; TEMES, G. C. Understanding delta-sigma data converters; 1st ed. New York, NY: Wiley, 2005. Available from Internet: [<https://cds.cern.ch/record/733538>.](https://cds.cern.ch/record/733538)

STASZEWSKI, R. et al. All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130-nm CMOS. IEEE Journal of Solid-State Circuits, v. 39, n. 12, p. 2278–2291, 2004.

STASZEWSKI, R. et al. All-digital PLL and transmitter for mobile phones. IEEE Journal of Solid-State Circuits, v. 40, n. 12, p. 2469–2482, 2005.

<span id="page-71-1"></span>SZPLET, R. Time-to-digital converters. In: \_\_\_\_\_. Design, Modeling and Testing of Data Converters. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. p. 211–246. ISBN 978-3-642-39655-7. Available from Internet: [<https://doi.org/10.1007/978-3-642-39655-7_7>.](https://doi.org/10.1007/978-3-642-39655-7_7)

TAGHIPOUR, S.; ASLI, R. N. Aging comparative analysis of high-performance FinFET and CMOS flip-flops. Microelectronics Reliability, Elsevier, v. 69, p. 52–59, 2017.

TAKAMIZAWA, H. et al. Correlation between threshold voltage and channel dopant concentration in negative-type metal-oxide-semiconductor field-effect transistors studied by atom probe tomography. Applied Physics Letters, v. 100, n. 25, p. 253504, 2012. Available from Internet: [<https://doi.org/10.1063/1.4730437>.](https://doi.org/10.1063/1.4730437)

TANI, S. et al. Behavior-Level Analysis of a Successive Stochastic Approximation Analog-to-Digital Conversion System for Multi-Channel Biomedical Data Acquisition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E100.A, n. 10, p. 2073–2085, 2017.

THEAN, A. V.-Y. et al. Performance and Variability Comparisons between Multi-Gate FETs and Planar SOI Transistors. In: 2006 International Electron Devices Meeting. [S.l.: s.n.], 2006. p. 1–4.

<span id="page-71-4"></span>TOLEDO, P. MOSFET Zero-Temperature-Coefficient (ZTC) Effect Modeling and Analysis for Low Thermal Sensitivity Analog Applications. Dissertation (Master) — Universidade Federal do Rio Grande do Sul, 09 2015. Available from Internet: [<http://hdl.handle.net/10183/140814>.](http://hdl.handle.net/10183/140814)

<span id="page-71-3"></span>TSIVIDIS, C. M. Y. Operation and Modeling of the MOS Transistor. [S.l.]: Oxford University Press, 2010.

<span id="page-71-0"></span>UEMORI, S. et al. Multi-bit Sigma-Delta TDC Architecture for Digital Signal Timing Measurement. In: 2012 IEEE 18th International Mixed-Signal, Sensors, and Systems Test Workshop. [S.l.: s.n.], 2012. p. 67–72.

WALDEN, R. Analog-to-digital converter survey and analysis. IEEE Journal on Selected Areas in Communications, v. 17, n. 4, p. 539–550, 1999.

<span id="page-71-2"></span>WANG, H.; DAI, F. F.; WANG, H. A Reconfigurable Vernier Time-to-Digital Converter With 2-D Spiral Comparator Array and Second-Order  $\Delta \Sigma$  Linearization. IEEE Journal of Solid-State Circuits, v. 53, n. 3, p. 738–749, 2018.

WU, S. Y. et al. A 10-bit 100MS/s time domain Flash-SAR ADC. In: 2014 IEEE International Conference on Electron Devices and Solid-State Circuits. [S.l.: s.n.], 2014. p. 1–2.

WUERDIG, R. N. et al. Evaluating Cell Library Sizing Methodologies for Ultra-Low Power Near-Threshold Operation in Bulk CMOS. In: 2020 IEEE 11th Latin American Symposium on Circuits Systems (LASCAS). [S.l.: s.n.], 2020. p. 1–4.

YANG, J.; GUPTA, R. FV encoding for low-power data I/O. In: ISLPED'01: Proceedings of the 2001 International Symposium on Low Power Electronics and Design (IEEE Cat. No.01TH8581). [S.l.: s.n.], 2001. p. 84–87.

YUAN, J.; SVENSSON, C. High-speed CMOS circuit technique. IEEE Journal of Solid-State Circuits, v. 24, p. 62–70, 1989.

ZHANG, M. et al. 3.5 A 0.6V 13b 20MS/s Two-Step TDC-Assisted SAR ADC with PVT Tracking and Speed-Enhanced Techniques. In: 2019 IEEE International Solid-State Circuits Conference - (ISSCC). [S.l.: s.n.], 2019. p. 66–68.

## APPENDIX A — DESIGNED AND FABRICATED GRO-TDC IN TSMC 180 NM CMOS

Figure A.1: Full-custom Layout of the 180 nm Gated-Ring-Oscillator TDC designed by the author. The Ring-Oscillator lies at the top-part, registers in the middle, and buffers at the bottom.



Source: The Author



Figure A.2: Simulated Values for RO-Frequency Vs. Supply Voltage for the 180 nm GRO-TDC

Source: The Author

Figure A.3: Simulated values for Power Dissipation Vs. Supply Voltage for the 180 nm GRO-TDC



Source: The Author



Figure A.4: 3D Rendering of the Developed Test Board

Source: The Author