

# Time-to-Digital Converter for Low-Power Direct Time-of-Flight System

Guo Xu



**Delft University of Technology** 

# Time-to-Digital converter for low power dToF system

**Master's Thesis** 

To fulfill the requirements for the degree of Master of Science in Electrical Engineering, Track: Microelectronics, at Delft University of Technology, under the supervision of Dr. Sijun Du (Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology), Ting Gong (Silicon-Integrated), and Ao Ba (Silicon-Integrated).

Guo Xu (5274834)

September 25, 2022

# Contents

- List of Acronyms
- Abstract
- 1 Introduction
  - 1.1 Problem statement and research questions
  - 1.2 Thesis contributions
  - 1.3 Thesis outline
- 2 Background and related work
  - 2.1 Depth sensing technologies
    - 2.1.1 Indirect Time-of-Flight
    - 2.1.2 Direct Time-of-Flight
    - 2.1.3 dToF and iToF Summary
  - 2.2 Key Parameters of Time-to-Digital Converter
  - 2.3 Introduction to Time-to-Digital Converter
    - 2.3.1 Ring based TDC
    - 2.3.2 Vernier TDC
    - 2.3.3 Time Difference Amplification based TDC
  - 2.4 Application of Time-to-Digital Converter in dToF system
    - 2.4.1 Independent TDC with Duty-Cycled VCOs
    - 2.4.2 Shared TDC with Always-On VCOs
    - 2.4.3 Shared TDCs with Duty-Cycled VCOs
    - 2.4.4 TDC Summary
- 3 System-Level Modeling of dToF system
  - 3.1 Theoretical Analysis
    - 3.1.1 Signal Events Distribution
    - 3.1.2 Noise Events Distribution
    - 3.1.3 Combination of Signal Events and Noise Events
  - 3.2 System-Level Model
    - 3.2.1 Finding-Peak Algorithm
    - 3.2.2 Effect of TDC Parameters
  - 3.3 Simulation Verification
  - 3.4 Architecture Choice of the Proposed TDC
- 4 Circuit Design
  - 4.1 Top-level architecture
  - 4.2 Voltage-Controlled Oscillator
    - 4.2.1 Pseudo Differential Inverter
  - 4.3 Phase Comparator
    - 4.3.1 StrongARM Based Comparator
    - 4.3.2 Chopping Scheme
  - 4.4 Coarse Counter
    - 4.4.1 Gray Counter and Correction Logic
    - 4.4.2 Double Sampling and Selection Logic
  - 4.5 Readout Circuit
    - 4.5.1 Gate Logic
    - 4.5.2 Latch Arrays
    - 4.5.3 Scan Chain
- 5 Post-Layout Simulation Results
  - 5.1 Frequency of the VCO
  - 5.2 Noise Analysis
  - 5.3 DNL Estimation
  - 5.4 Performance Summary
- 6 Conclusions
  - 6.1 Responses to Research Questions
  - 6.2 Future Work
- Bibliography

# List of Acronyms

| Acronym | Meaning                                 |
|---------|-----------------------------------------|
| AR      | Augmented Reality                       |
| ToF     | Time-of-Flight                          |
| dToF    | direct Time-of-Flight                   |
| iToF    | indirect Time-of-Flight                 |
| SPAD    | Single-Photon Avalanche Diode           |
| VCO     | Voltage-Controlled Oscillator           |
| DNL     | Differential Non-Linearity              |
| INL     | Integral Non-Linearity                  |
| LSB     | Least Significant Bit                   |
| SSP     | Single-Shot Precision                   |
| TCSPC   | Time-Correlated Single-Photon Counting  |
| RO      | Ring Oscillator                         |
| DLL     | Delay-Locked Loop                       |
| FLL     | Frequency-Locked Loop                   |
| TDC     | Time-to-Digital Converter               |
| TDA     | Time Difference Amplifier               |
| DFF     | D Flip-Flop                             |
| MUX     | Multiplexer                             |
| Ts      | Timestamp                               |
| DCR     | Dark-Count Rate                         |
| PDE     | Photon Detection Efficiency             |
| AFOV    | Angular Field-of-View                   |
| DAC     | Digital-to-Analog Converter             |
| FPGA    | Field Programmable Gate Array           |
| ASIC    | Application-Specific Integrated Circuit |

# Acknowledgments

As my graduation approaches, I am delighted to enter a new stage of my career, but also sad to leave TU Delft and end my student life. I really appreciate everyone I have met in Delft and Eindhoven. The Netherlands, such a beautiful and multicultural country, has also given me many unforgettable moments.

I give my deepest gratitude to my supervisor, Dr. Sijun Du, for his patient guidance. He gave me many useful and far-sighted suggestions when I was looking for my research direction, guided me through problems in my circuit design, and helped me to improve my thesis. I also sincerely thank my daily supervisors, Gong Ting and Ao Ba, who are experts from the Silicon-Integrated company. They provided me with many inspiring ideas during the proposal of my design and helped me overcome all the barriers I encountered in my work. Finally, I am thankful to Dr. Stoyan Nihtianov and Dr. Chang Gao for serving on my thesis committee.

I also want to say thank you to my dear friends. Much of my confusion about the system model was resolved with the help of Jia Shi and Mingzhen Chen. I received much encouragement from Jixuan Mou and Jinqian Yu. Their words kept me motivated to improve the thesis work further and further. Xiaomeng An and Yige Li accompanied me in my leisure time and always brought me a sunny mood. In addition, I appreciate the Silicon-Integrated company, where I did my internship. It gave me a precious opportunity to see what working life looks like before starting my career.

Lastly, I thank myself for never giving up and for persisting in finishing my master's program. I hope I can always keep such a heart full of enthusiasm for knowledge.

# Abstract

Depth sensing technology has developed rapidly in recent years due to its broad applications. Time-of-Flight (**ToF**) is a popular depth sensing technology because of its high measuring accuracy and the low complexity of its image processing. ToF technologies are divided into two kinds: indirect Time-of-Flight (**iToF**) and direct Time-of-Flight (**dToF**). iToF measures the target distance by detecting the phase shift or frequency shift of the laser. It performs well at short range, but for long-range detection it is limited by wavelength, frequency, or laser peak power. In comparison, dToF can detect targets at a longer distance by directly measuring the flight time of the laser. To obtain the time information, the dToF system requires a Time-to-Digital Converter (**TDC**). This thesis proposes a system-level model to optimize the TDC parameters for a dToF system used in the autofocus system of a mobile phone.

The system modeling shows that resolution and Differential Non-Linearity (**DNL**) are the two most important parameters influencing the performance of a dToF system, while the single-shot precision (**SSP**) of a TDC has only a negligible effect. Based on the simulation results of the system model, a ring-based TDC is chosen and built around a shared and duty-cycled Voltage-Controlled Oscillator (**VCO**). To decrease the DNL of the TDC, chopping comparators are used in this design. In addition, to save power and area, a new method called double sampling is used to align the coarse and fine phases of the ring-based TDC.

The proposed TDC achieves a resolution of 100 ps and a DNL of -0.27/+0.37, which is smaller than that of state-of-the-art TDCs used in dToF systems. In [1], a TDC with always-on VCOs consumes 0.5 mW/TDC. In comparison, this work consumes a smaller average power of 0.179 mW/TDC in a similar fabrication technology. To cover a 2.5 m detection range, the TDC is designed with an 8-bit depth range. SMIC 55BSC technology is used to design and tape out the TDC.

Keywords: dToF, ring-based TDC, low power, small DNL, chopping, double sampling.

# 1 Introduction

Depth sensing technology has become popular recently due to its broad applications, including automotive, face recognition, and Augmented and Virtual Reality (AR/VR). It is defined as a measuring system that obtains distance information from targets. Different techniques provide distinct information and resolution. Time-of-Flight (**ToF**) is one of the most commonly used depth measurement techniques and can be implemented either as a direct measurement, direct Time-of-Flight (**dToF**), or as an indirect measurement, indirect Time-of-Flight (**iToF**). Some depth sensing technologies, such as the dToF system, can measure the distance of targets over a long range with good performance[2], which suits applications such as automotive LiDAR. In short-range applications, structured light and iToF can provide the shape and distance of targets, as used in FaceID on mobile phones[1] or in VR[3].

The dToF system, which has a long detection range and a high Signal-to-Background Noise Ratio (SBNR), can be applied in environments with high interference, such as outdoor applications[4][5]. The Time-to-Digital Converter (TDC) is one of the most important components of the dToF system, responsible for measuring the time difference between the emitted signal and the reflected signal. The performance of the TDC has a vital influence on the performance of a dToF system. This thesis focuses on designing a TDC with optimized specifications for a dToF system applied in the autofocus system of mobile phones. This application requires low power consumption, a moderate detection range, and high rejection of background noise.

# 1.1 Problem statement and research questions

#### **Problem statement**

First, the dToF system is a novel depth-sensing technology. Although many papers present TDCs with sub-gate-delay resolution, low power, or good linearity, few of them consider which specifications matter to a dToF system and how the TDC performance affects its precision and accuracy. As more and more applications use dToF systems to obtain depth information, research is needed on how to optimize TDC performance, including resolution, power, linearity, area, and phase noise, for the proposed dToF system.

Second, since the low-power dToF system is aimed at applications in mobile phones and robotics, a low-power and low-cost TDC design needs to be considered. In addition, the nonlinearity of the TDC, which is mainly determined by local process variation, greatly influences the accuracy and precision of the dToF system. Therefore, it is worth discussing how to calibrate out the effect of local process variation.

#### **Research questions**

This thesis mainly focuses on the following questions:

1. Can the dToF system-level model demonstrate that the TDC specifications, such as resolution, Differential Non-Linearity (DNL), Integral Non-Linearity (INL), and power consumption, are optimized to achieve the required performance of the target dToF system?
2. According to the simulation results of the system model, is there a reasonable and feasible TDC architecture that meets the specification requirements, is resistant to local process variation (e.g. LSB variation and maximum DNL), and is simple to calibrate?

# 1.2 Thesis contributions

The main contributions of this thesis are summarized as follows.

1. A model of the proposed dToF system, implemented in MATLAB. It optimizes the TDC specifications, including resolution, DNL, and single-shot precision (**SSP**), for an application in the dToF system. This model provides reference specifications for choosing and designing the TDC architecture.
2. A chopping scheme for the phase comparators of the TDC, which decreases their offsets. The cause of the residual offset is also explained by mathematical analysis.
3. A new method for aligning the fine and coarse phases of a ring-based TDC. This method saves area and power.

# 1.3 Thesis outline

The rest of the thesis is organized as follows.

**Chapter 2: Background and Related Work.** This chapter introduces commonly used depth-sensing technologies and focuses on comparing iToF and dToF. Prior art on TDCs applied in dToF systems is also discussed.

**Chapter 3: System-Level Modeling of the dToF system.** A system-level model of the proposed dToF system is introduced in this chapter. The optimized specifications are obtained from the simulation results. In addition, the effect of TDC performance on the dToF system is analyzed according to the model.

**Chapter 4: Circuit Design.** This chapter focuses on the implementation of the proposed TDC. Two important contributions, the chopping comparator and double sampling, are discussed. The remaining components of the TDC are also introduced, such as the voltage-controlled oscillator (**VCO**), the half-Gray code counter, and the readout circuit.

**Chapter 5: Post-Layout Simulation Results.** This chapter presents the post-layout simulation results of the proposed TDC, covering the VCO frequency, noise analysis, DNL estimation, and a performance summary.

**Chapter 6: Conclusions.** The conclusions of this thesis are drawn in this chapter, answering the research questions raised in Chap. 1. An outlook on future research directions is also given.

# 2 Background and related work

In this chapter, different depth-sensing technologies are introduced and compared. After analyzing the advantages and disadvantages of each technology, the scope is narrowed to the target of this thesis: a TDC applied in a low-power dToF system. Prior art on TDCs is discussed, providing a reference for the choice of TDC architecture made in Chap. 3.

# 2.1 Depth sensing technologies



The main depth-sensing technologies are classified as shown in Fig. 2.1.

Figure 2.1: Categories of depth sensing technology[1]

Microwave-based radar technology has been widely used in military, aerospace, and other fields, featuring long-distance detection and high tolerance to environmental effects[6]. Although it is a very mature technology, its spatial and depth resolution are poor because of the long wavelength of microwaves. Ultrasonic sensing technology has the advantages of low power, relatively small size, and high depth resolution, but it suffers from large losses in air and thus can only detect targets at short range[7]. Optical sensing technology achieves a long range, the highest spatial and depth resolution, compact size, low power, and a large field of view. These features make it suitable for automotive, AR/VR, and face-recognition applications[8].

Stereoscopic vision uses two or more cameras to mimic the function of human eyes. It provides depth information based on the triangular relationship among the cameras and a target point[1]. However, this technique does not apply active illumination, so it suffers from serious environmental interference, which limits its detection range and precision.

Structured light is a depth-sensing technology with active illumination. It projects a signal pattern onto a target and then measures the depth and shape information by detecting the pattern deformation. This technique is commonly used in computer games and FaceID on smartphones[3].

The Time-of-Flight depth-sensing technique has become popular in recent decades. The depth-sensing technologies mentioned above estimate the depth information with specific algorithms. This requires a high image-processing capability, which is usually slow and power hungry[8]. In addition, these technologies detect depth information based on the intensity of light, which means they are severely affected by environmental conditions, and their accuracy decreases with increasing target distance. ToF technology, however, calculates the target distance from the flight time of the light without complicated image processing, which is called hardware-based measurement. The calculation can be realized by detecting the phase shift or frequency shift of the light (indirect Time-of-Flight) or the time difference between the emitted and received light (direct Time-of-Flight).

Indirect Time-of-Flight (**iToF**) and direct Time-of-Flight (**dToF**) are the two most important and promising technologies in optical sensing systems. Each has advantages in specific areas. Both are introduced in detail in the following two parts. This thesis focuses on the dToF system.

# 2.1.1 Indirect Time-of-Flight

The working principle of the phase-shift-based iToF system is shown in Fig. 2.2. A continuous sinusoidal wave illuminates the target, and the reflected light is accumulated in four consecutive windows shown in Fig. 2.2 ( $C_0$  to  $C_3$ ). With the four accumulation windows, the phase shift  $\Delta \phi$  can be calculated by[10]

$$\Delta \phi = \arctan \frac{C_3 - C_1}{C_0 - C_2} \tag{2.1}$$

Therefore, the target distance can be derived with the corresponding  $\Delta \phi$ , given by[11]

$$d = \frac{c}{2f} \cdot \frac{\Delta \phi}{2\pi} = R_D \cdot \frac{\Delta \phi}{2\pi} \tag{2.2}$$

where *c* is the speed of light and *f* is the modulation frequency of the sinusoidal illumination. The largest unambiguous detection range ( $R_D$ ) is therefore determined by the modulation frequency (*f*). For a longer range, where the light propagates for more than one modulation period, the system cannot distinguish the phase shift of the first period from that of later periods, which leads to an ambiguous situation, named multi-path interference. Eq. 2.2 implies that a smaller modulation frequency increases the unambiguous range. However, it also degrades the distance precision: the distance uncertainty  $\sigma_d$  grows with  $R_D$ , as given by[10]

$$\sigma_d = \frac{R_D}{\sqrt{8\pi}} \frac{B}{A} \tag{2.3}$$



where *B* is the intensity of the background light and *A* is the signal intensity of the illumination light. Besides multi-path interference, iToF also suffers from the limited power of the laser source. Since the illumination is continuous, a relatively low peak laser power has to be used to satisfy eye-safety criteria. Therefore, the SBNR is relatively low. As shown in Eq. 2.3, a lower signal power leads to lower precision, limiting the detection range.

Figure 2.2: Phase-shift based iToF operation diagram. A modulated sinusoidal wave illuminates the target, and the reflected light is divided into four areas to calculate the phase shift [9].
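As a rough numerical illustration of Eqs. 2.1 and 2.2, the sketch below recovers the phase shift from four accumulation windows and converts it to a distance. The function names, the 20 MHz modulation frequency, and the window values are assumptions chosen for illustration only. `atan2` is used instead of a plain arctangent so that the phase covers the full range from 0 to 2π.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def unambiguous_range(f_mod):
    """R_D = c / (2 f), the unambiguous range of Eq. 2.2."""
    return C_LIGHT / (2.0 * f_mod)


def phase_shift(c0, c1, c2, c3):
    """Phase shift of Eq. 2.1, wrapped to [0, 2*pi)."""
    return math.atan2(c3 - c1, c0 - c2) % (2.0 * math.pi)


def itof_distance(c0, c1, c2, c3, f_mod):
    """Distance of Eq. 2.2: d = R_D * delta_phi / (2*pi)."""
    return unambiguous_range(f_mod) * phase_shift(c0, c1, c2, c3) / (2.0 * math.pi)


# Assumed example: 20 MHz modulation and window values that correspond
# to a quarter-period phase shift (delta_phi = pi / 2).
f_mod = 20e6
d = itof_distance(c0=100, c1=50, c2=100, c3=150, f_mod=f_mod)
print(f"R_D = {unambiguous_range(f_mod):.3f} m, d = {d:.3f} m")
```

In this example the target sits at one quarter of the unambiguous range. Halving the assumed modulation frequency would double  $R_D$ , which is the trade-off against precision described by Eq. 2.3.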

# 2.1.2 Direct Time-of-Flight

As shown in Fig. 2.3a, a dToF system is mainly composed of three parts: an emitter (pulsed laser source), detectors, and time-interval digitizers, which are usually Time-to-Digital Converters (TDCs). The working principle of this system is as follows. First, the synchronizer generates a voltage pulse that drives the laser source to emit a laser pulse. At the same time, the voltage pulse starts the TDC counting. The laser pulse is then reflected by the target and detected by the detector, which generates a second voltage pulse (marked as a detected event). The detector is usually a single-photon avalanche diode, which is reverse biased far above its breakdown voltage (known as Geiger mode). Therefore, this diode has a very high gain and a short response time, allowing single-photon detection. Finally, the second voltage pulse stops the TDC, which thereby measures the time interval between the two pulses. The relation between the target distance ( $d_{target}$ ) and the detected Time-of-Flight ( $t_{flight}$ ) is

$$d_{target} = c \cdot t_{flight}/2 \tag{2.4}$$

where *c* is the speed of light in air. Usually, the detectors consist of a single-photon avalanche diode (SPAD) array. A SPAD is a p-n junction that is reverse biased far above its breakdown voltage, which is called Geiger mode [12] [13]. When a photon is absorbed in the depletion region of the SPAD, an avalanche occurs and a large current is generated. The gain of a properly biased SPAD is virtually infinite and is limited only by the number of carriers in the avalanche. Such a high gain allows an electrical signal of a few volts to be generated in a short time when the SPAD is triggered by a photon. If the SPAD is connected to a buffer as shown in Fig. 2.4 [14], a digital signal  $V_{out,SPAD}$  switching from 0 to 1 is generated when a photon is received. This signal can trigger a TDC to measure the photon arrival time directly. In Fig. 2.4,  $V_{BD}$  is the breakdown voltage and  $V_{EB}$  is the excess bias voltage. When the SPAD is triggered,  $V_A$  increases and generates a voltage pulse.  $R_q$  limits the current flowing through the SPAD, quenching the avalanche when  $V_A$  exceeds  $V_{EB}$ .
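To make Eq. 2.4 concrete, the following sketch converts a flight time into a distance and shows how a TDC's LSB and bit depth translate into a depth bin size and a maximum range. The 100 ps LSB and the 8 bits are the values quoted in the abstract. The function names are hypothetical, the vacuum speed of light is used as an approximation for air, and treating the full scale as  $2^8$  LSBs is an assumption about how the 8 bits are used.

```python
C_LIGHT = 299_792_458.0  # m/s, vacuum value used as an approximation for air


def distance_from_flight_time(t_flight):
    """Eq. 2.4: the light travels to the target and back, hence the factor 1/2."""
    return C_LIGHT * t_flight / 2.0


def depth_per_lsb(t_lsb):
    """Depth covered by one histogram bin of width t_lsb."""
    return distance_from_flight_time(t_lsb)


def max_range(t_lsb, n_bits):
    """Largest distance whose round trip fits in 2**n_bits LSBs (assumed full scale)."""
    return distance_from_flight_time(t_lsb * 2 ** n_bits)


t_lsb = 100e-12  # 100 ps resolution, as quoted in the abstract
n_bits = 8       # 8-bit range, as quoted in the abstract
print(f"depth per LSB: {depth_per_lsb(t_lsb) * 100:.2f} cm")
print(f"maximum range: {max_range(t_lsb, n_bits):.2f} m")
```

Under these assumptions one bin corresponds to roughly 1.5 cm of depth, and the full scale of about 3.8 m leaves margin over the 2.5 m detection range.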



Figure 2.3: dToF system:(a) architecture, (b) TCSPC histogram [1].

If the measurement is repeated many times within a given period, a histogram is built by accumulating all detection results, as shown in Fig. 2.3b. This process is called Time-Correlated Single-Photon Counting (**TCSPC**). Theoretically, the peak of the red part corresponds to the average flight time of the laser, which indicates the target distance.
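The TCSPC process can be sketched with a small Monte Carlo simulation. Each recorded event is either a signal photon clustered around the true flight time or a uniformly distributed background event, and the histogram peak estimates the flight time. All numerical values (signal probability, jitter, bin width) and function names below are assumptions for illustration, not parameters taken from this thesis.

```python
import random
from collections import Counter


def simulate_tcspc(t_flight, n_events, t_bin, n_bins,
                   p_signal=0.3, jitter=100e-12, seed=0):
    """Accumulate n_events timestamps into a histogram keyed by bin index."""
    rng = random.Random(seed)
    full_scale = t_bin * n_bins
    histogram = Counter()
    for _ in range(n_events):
        if rng.random() < p_signal:
            # Signal event: Gaussian spread around the true flight time.
            t = rng.gauss(t_flight, jitter)
        else:
            # Background event: uniformly distributed over the TDC range.
            t = rng.uniform(0.0, full_scale)
        if 0.0 <= t < full_scale:
            histogram[int(t // t_bin)] += 1
    return histogram


def peak_time(histogram, t_bin):
    """Return the centre time of the most populated bin."""
    peak_bin = max(histogram, key=histogram.get)
    return (peak_bin + 0.5) * t_bin


hist = simulate_tcspc(t_flight=10.05e-9, n_events=5000, t_bin=100e-12, n_bins=256)
print(f"estimated flight time: {peak_time(hist, 100e-12) * 1e9:.2f} ns")
```

Taking the most populated bin is the simplest possible peak estimate; it is used here only to show how the histogram separates clustered signal events from the flat background.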

Compared to the iToF system, which needs to read the phase or frequency of the laser, dToF utilizes pulse-modulated light. Since the SPAD can be triggered by a single photon, the laser pulse width can be very small. Under the eye-safety criteria, a narrower laser pulse allows a higher peak laser power, which increases the SBNR. This means that the system can have either a longer detection range or a lower laser power. In [2], with a high-power laser source, the detection range of the dToF system even reaches hundreds of meters.



Figure 2.4: A simplified SPAD front-end circuit[14].

# 2.1.3 dToF and iToF Summary

Table 1: Comparison between iToF and dToF

| Type      | iToF, phase-shift based[11] | iToF, frequency-shift based[16] | dToF[15] |
|-----------|-----------------------------|---------------------------------|----------|
| Benefits  | 1. High resolution. 2. High robustness. 3. Low cost compared to dToF due to the lower speed requirement on both the illuminator and the receiver. | 1. High resolution. 2. Besides distance, the velocity of the object can be measured, which is a big advantage in automotive applications. 3. Better tolerance against environmental disturbances compared to phase-shift based iToF. 4. Multi-path interference can be distinguished. | 1. Pulse-modulated light allows a high peak optical power while maintaining the average eye-safe exposure. 2. High SBNR and large detection range. 3. Multi-path reflection can be detected and recognized easily by multi-event measurement. |
| Drawbacks | 1. Short-range detection. 2. Low SBNR due to the low peak optical power needed to satisfy eye-safety criteria. 3. Multi-path interference makes it challenging to sense a complicated scene. | 1. Maximum measuring distance is limited due to phase noise. 2. Low SBNR due to eye-safety criteria. | 1. Higher cost compared to iToF due to higher requirements on illuminators and receivers. 2. Low resolution in short range. 3. Requires extra devices (TDCs). |

Table 1 compares the advantages and disadvantages of the iToF and dToF systems. The iToF system mainly features high resolution in a short range, but suffers from a limited detection range and multi-path interference. The dToF system is advantageous in detection range and SBNR but is not competitive in cost, and it has a low precision in a short range because of the limited TDC resolution. For an autofocus system in a mobile phone, a high SBNR and a long detection range are required. Therefore, the dToF system is more competitive in this application.

This thesis focuses on the TDC used in the dToF system. The TDC is an important device in the system, greatly influencing its performance. In the following parts, prior art on TDCs used in dToF systems is introduced and compared.

# 2.2 Key Parameters of Time-to-Digital Converter

A TDC is a device that measures the time interval between two signals (*START* and *STOP*). It is a necessary part of the dToF system since the flight time of the laser needs to be measured. The important parameters of a TDC include the Least Significant Bit (LSB), DNL, INL, and Single-Shot Precision (SSP). The parameters are explained as follows.

# LSB

The LSB is the minimum distinguishable time interval of a TDC. Its value usually ranges from a few picoseconds to hundreds of picoseconds. In a dToF system, the LSB determines the bin size of the histogram.

# INL

As shown in Fig. 2.5a, INL describes the macroscopic bending of the actual TDC response. Its definition is the deviation of each step from the ideal position, and this deviation is usually normalized to one  $T_{LSB}$ . Usually, the maximum value or root-mean-square (rms) value of INL is used to evaluate non-linearity of TDC.

# DNL

DNL is the derivative (step-to-step difference) of INL. It is defined as the deviation of the length of each step from its ideal value (one LSB). It gives a microscopic view of the TDC non-linearity.
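The relation between the two non-linearity metrics can be illustrated numerically. The sketch below computes DNL from a list of step widths and obtains INL as the running sum of DNL, both normalized to one LSB. The function name and the step widths are made up for illustration, and taking the average step width as the ideal LSB is an assumption.

```python
from itertools import accumulate


def dnl_inl(step_widths):
    """Return (DNL, INL) in LSB for a list of measured step widths."""
    # Assumption: the average step width serves as the ideal LSB.
    lsb = sum(step_widths) / len(step_widths)
    dnl = [w / lsb - 1.0 for w in step_widths]
    # INL after step k is the accumulated deviation of steps 0..k.
    inl = list(accumulate(dnl))
    return dnl, inl


# Hypothetical step widths in picoseconds for a 5-step TDC.
widths_ps = [100, 110, 90, 105, 95]
dnl, inl = dnl_inl(widths_ps)
print("DNL [LSB]:", [round(x, 3) for x in dnl])
print("INL [LSB]:", [round(x, 3) for x in inl])
```

A step that is 10% too wide shows up as a DNL of +0.1 LSB, and the INL drifts by the same amount until a narrower step brings it back.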

# SSP

The classic dynamic measurement of a TDC is called a single-shot experiment. During this experiment, a fixed time interval  $\Delta T$  is injected into the TDC repeatedly. With noise, the result value varies in a certain range, as shown in Fig. 2.5b. The standard deviation of these measurement results is called single-shot precision. It reveals the noise level of a TDC and how reproducible the TDC measurement is with noise.
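A single-shot experiment can be mimicked in a few lines. The sketch below assumes Gaussian timing noise added to a fixed interval before ideal quantization. The noise level, the interval, and the function names are illustrative assumptions.

```python
import random
import statistics


def single_shot_codes(delta_t, t_lsb, jitter_rms, n_shots, seed=0):
    """Quantize the same interval n_shots times with Gaussian timing noise."""
    rng = random.Random(seed)
    return [int((delta_t + rng.gauss(0.0, jitter_rms)) // t_lsb)
            for _ in range(n_shots)]


def single_shot_precision(codes, t_lsb):
    """Standard deviation of the output codes, expressed in seconds."""
    return statistics.pstdev(codes) * t_lsb


# Assumed values: 5.03 ns interval, 100 ps LSB, 50 ps rms noise.
codes = single_shot_codes(delta_t=5.03e-9, t_lsb=100e-12,
                          jitter_rms=50e-12, n_shots=2000)
ssp = single_shot_precision(codes, t_lsb=100e-12)
print(f"SSP = {ssp * 1e12:.1f} ps")
```

Because the codes are quantized, the measured spread combines the injected noise with quantization error, so it does not equal the assumed noise level exactly.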

# 2.3 Introduction to Time-to-Digital Converter

A TDC is an important component in a dToF system, converting the time information generated by the pixels into digital codes for subsequent signal processing. In dToF applications, a start signal usually triggers the TDC to count, and then a stop signal latches the phase of the TDC. By computing the time interval between the start and stop signals, the distance of the target is obtained. Time digitization in a TDC is usually performed at a coarse and a fine level. The coarse level is typically implemented by a counter, which is triggered by a system clock. The number of bits of the counter determines the dynamic range of the TDC, and its resolution (coarse resolution) is in the nanosecond range. The second stage (fine stage) interpolates the system clock, scaling the resolution down to the picosecond range. The fine stage is the most critical part of a TDC design, determining the fine resolution, non-linearity, single-shot precision, and most of the power consumption. Depending on application requirements, technology, cost, and stability, there are many different ways to implement a TDC.

Figure 2.5: (a) Staircase curve of a TDC[17], (b) single-shot precision definition[17]
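The coarse-fine scheme amounts to a simple weighted sum. As a minimal sketch, assume a fine interpolator that divides one coarse clock period into equal steps. The parameter values and names below are illustrative and are not the design values of this thesis.

```python
def timestamp(coarse_count, fine_code, t_clk, fine_steps):
    """Combine a coarse counter value and a fine code into a time in seconds."""
    if not 0 <= fine_code < fine_steps:
        raise ValueError("fine code out of range")
    t_lsb = t_clk / fine_steps  # fine resolution
    return coarse_count * t_clk + fine_code * t_lsb


def dynamic_range(coarse_bits, t_clk):
    """Time span covered by a coarse counter with the given number of bits."""
    return (2 ** coarse_bits) * t_clk


# Assumed example: 1.6 ns coarse clock split into 16 fine steps of 100 ps.
t = timestamp(coarse_count=3, fine_code=5, t_clk=1.6e-9, fine_steps=16)
print(f"timestamp = {t * 1e9:.2f} ns")
print(f"dynamic range = {dynamic_range(4, 1.6e-9) * 1e9:.1f} ns")
```

The example shows why the coarse counter sets the dynamic range while the fine stage sets the resolution: adding one coarse bit doubles the range without changing the 100 ps step.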

Currently, there are mainly two ways to design TDCs: Field-Programmable Gate Array (FPGA) based and Application-Specific Integrated Circuit (ASIC) based. FPGA TDCs utilized resources (usually the fastest delay elements) of programmable devices as a Tapped Delay Line (TDL). FPGA based method provides a convenient way to design TDCs, but it performs badly in many features. Firstly, the delay elements in FPGA vary with PVT process since no any compensation is applied to them. It causes a large LSB variation among different TDCs. Secondly, the layout of the delay elements is not optimized, which leads to a bad linearity. [18] and [19] report FPGA-based TDCs with temperature compensation. The TDC LSB is calibrated by a on-board look-up table (LUT) over a wide range of temperature. However, the delay variation over temperature for TDL is rather complex, LUT

cannot give a precise compensation for the delay elements. In [20], routing resources (1024 paths) are used as delay elements to eliminate the effect of temperature and voltage variation. The TDC achieves a time resolution of 7.4 *ps*, a DNL of 0.74 *LSB*, an INL of 1.57 *LSB*, and a jitter of 0.92 *LSB*. Its power consumption is very high, reaching 23 *mW* for a single channel.

The ASIC-based method is the most popular way to design a TDC, since its performance can be customized for a given purpose. In this thesis, the ASIC method is used because the performance of the TDC needs to be optimized for the dToF system. The TDCs most commonly used in dToF systems can be divided into three types:

**Ring-based TDC**[21][22]: this kind of TDC usually consists of a ring oscillator that provides several different phases and a counter that counts the cycles of the oscillator. The phase residues are read out by flip-flops or phase comparators. This implementation is theoretically deadtime-free, since no device needs to be reset. Therefore, it features a high sampling speed and is suitable for applications demanding a high conversion rate. However, the resolution of this TDC is limited by the fastest delay element, which is mainly determined by the CMOS technology. Therefore, it is hard to achieve a very high resolution with this architecture. In [23], a sub-gate delay is obtained by inserting resistors between two phases, achieving a resolution of 4.7 *ps* in 90 *nm* technology. In [24], the authors design a clock interpolation circuit based on a resistive interpolation mesh. This topology achieves a resolution of 8.6 *ps* in 180 *nm* technology, but at the cost of a high power consumption of 9.1 *mW* per channel (16 channels in total).

**Vernier TDC**[25][26]: Compared with the ring-oscillator-based TDC, this TDC dramatically improves resolution without increasing power consumption. However, the deadtime ($= t_{clk}^2/resolution$) of the vernier topology grows rapidly as the resolution becomes finer and the dynamic range increases, since the fast line needs more time to catch up with the slow line. In [25], the gated vernier ring oscillator topology achieves a 7.3 *ps* *LSB* with only 1.2 *mW* power consumption in 130 *nm* technology, but its dynamic range is only 9 *ns* with a deadtime of 415 *ns*. This long deadtime prevents the vernier topology from being used in a shared-TDC topology in a dToF system.

**Time Difference Amplification (TDA) based TDC**[27][28]: In this technique, the time residue between the edge of the stop signal and the reference clock is amplified by a time amplifier. The amplified time residue is then sent to the next stage (usually with the same structure as the first stage) to be measured again. The fine resolution is given by $\tau/gain$, where $\tau$ is the delay of a delay element and *gain* is the gain of the time amplifier. Because the TDC resolution depends on *gain*, and the gain of the time amplifier varies with PVT, it is hard for this TDC to achieve good linearity. In addition, deadtime is a big challenge in this architecture, so it is also hard to apply in a shared-TDC topology in a dToF system. In [27], the TDA topology achieves a resolution of 62.5 *ps*, a DNL of $\pm 0.52$ *LSB*, and an INL of $\pm 0.8$ *LSB*.

These three types of TDC are described in detail in the following sections.

# 2.3.1 Ring based TDC

Fig. 2.6 shows the typical topology of a ring-based TDC. The delay elements are usually implemented by inverters, whose minimum gate delay determines the finest resolution of the TDC. The inverters are connected as a Ring Oscillator (RO), providing multiple phases. To maintain the oscillation, there must be an odd number of inverters. When the start signal is low, the oscillator stops and the TDC is thus turned off.



Figure 2.6: Typical ring based TDC topology

When the start is high, the oscillator begins to work. Phase 4 is injected into the coarse counter to count the oscillation period. When the stop signal arrives, the rising edge triggers coarse and fine comparators to store the state of the counter (coarse code) and fine phase residues (fine code). Finally, the time interval between start and stop can be described by an equation:

$$\Delta T_{measured} = code_{coarse} \cdot N T_{LSB} + code_{fine} \cdot T_{LSB} \tag{2.5}$$

where  $T_{LSB}$  is the fine resolution of the TDC, and N is the number of stages of delay elements.
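
As a minimal sketch of Eq. 2.5, the coarse and fine codes can be combined as follows. The function name and the numeric values (5 stages, 50 ps fine resolution) are assumptions chosen for illustration only, not parameters of the proposed design.

```python
def tdc_interval(code_coarse: int, code_fine: int, n_stages: int, t_lsb: float) -> float:
    """Combine coarse and fine codes into a time interval following Eq. 2.5."""
    if not 0 <= code_fine < n_stages:
        raise ValueError("fine code must lie in [0, n_stages)")
    # One coarse count spans N fine steps, i.e. N * T_LSB.
    return code_coarse * n_stages * t_lsb + code_fine * t_lsb


# Hypothetical example: 5 delay stages, 50 ps fine resolution.
interval = tdc_interval(code_coarse=12, code_fine=3, n_stages=5, t_lsb=50e-12)
print(f"measured interval: {interval * 1e9:.2f} ns")
```

The range check on the fine code reflects that the fine stage only resolves positions within one coarse period.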

#### **Voltage-Controlled Oscillator**

The oscillator shown in Fig. 2.6 is free-running, which means that its frequency changes with process, supply voltage, and temperature. To achieve a stable TDC resolution, the frequency of the oscillator should be stabilized. Therefore, the free-running oscillator is modified into a voltage-controlled oscillator (VCO), as shown in Fig. 2.7. Transistor M1 is connected as a current source, which controls the current flowing into the oscillator and thus changes its frequency.



Figure 2.7: VCO

The delay of an inverter is described as

$$T_p = 0.52 \cdot \frac{C_L \cdot V_{DD}}{(W/L)_n \, k_n \, V_{DSatn} \, (V_{DD} - V_{thn} - V_{DSatn}/2)} \tag{2.6}$$

This equation shows that the delay of the inverters can be controlled by changing $V_{DD}$ or the current flowing through the inverters. The frequency of the oscillator $f_{osc}$ is equal to $1/(N T_p)$.
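
The dependence of the delay on the supply voltage can be illustrated with a short numerical sketch of Eq. 2.6. All device values below (load capacitance, transistor sizing, transconductance parameter, saturation and threshold voltages) are hypothetical placeholders, not values extracted from the technology used in this work.

```python
def inverter_delay(c_load, v_dd, w_over_l, k_n, v_dsat, v_th):
    """Propagation delay of an inverter following Eq. 2.6 (SI units)."""
    drive = w_over_l * k_n * v_dsat * (v_dd - v_th - v_dsat / 2)
    return 0.52 * c_load * v_dd / drive


def ring_frequency(n_stages, t_p):
    """Oscillation frequency f_osc = 1 / (N * T_p), as stated in the text."""
    return 1.0 / (n_stages * t_p)


# Hypothetical device parameters, for illustration only.
params = dict(c_load=5e-15, w_over_l=2.0, k_n=200e-6, v_dsat=0.4, v_th=0.4)
for v_dd in (1.0, 1.2):
    t_p = inverter_delay(v_dd=v_dd, **params)
    f_osc = ring_frequency(5, t_p)
    print(f"VDD = {v_dd} V: T_p = {t_p * 1e12:.1f} ps, f_osc = {f_osc / 1e9:.2f} GHz")
```

Under these assumptions, lowering the supply voltage increases the delay and therefore lowers the oscillation frequency, which is the mechanism the control loop exploits.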

 $V_{ctrl}$  is provided by a Delay-Locked Loop (DLL) or Frequency-Locked Loop (FLL) with a replica oscillator, as shown in Fig. 2.8.

# **Phase interpolation**

The main drawback of the ring-based TDC is that its resolution is limited by the intrinsic gate delay of an inverter. This means that the resolution can only be improved by moving to a more advanced technology. [23] provides a method to improve resolution by interpolating phases with resistors. As shown in Fig. 2.9, a phase is interpolated into four fine



Figure 2.8:  $V_{ctrl}$  is generated by a DLL with a replica oscillator.

phases by four resistors. Although this method greatly improves resolution, its linearity is poor because integrated resistors are not accurate. In addition, differences between the rising and falling times of the inverters also cause the phases to be interpolated unequally.



Figure 2.9: Phase interpolation by inserting resistors[23].

In [29], the oscillator is composed of differential inverters, producing a group of inverse phases as shown in Fig. 2.10. By choosing more combinations of phases (e.g.  $\overline{\Phi_3}$  and  $\Phi_0$ ), the number and resolution of phases both double, as shown in Fig. 2.11a and Fig. 2.11b. Differential inverters also have further benefits: (a) they are insensitive to common-mode noise due to their large CMRR, and (b) they give better linearity because the *LSB* length is not influenced by different rising and falling transition times. However, this method also degrades the DNL due to the mismatch among the phase comparators.



Figure 2.10: Differential inverters chain[29].



Figure 2.11: (a) interpolated phase comparators[29], (b) the number and resolution of output phases double by interpolation[29].

# 2.3.2 Vernier TDC

Fig. 2.12 shows the block diagram of the Vernier ring oscillator TDC. Two ROs with slightly different delays ( $\tau_1 > \tau_2$ ) are used to greatly improve resolution. The start signal is first injected into the slow delay line, and after a certain time period  $T_{measured}$ , the stop signal is injected into the fast delay line. When the phases of the fast line catch up with those of the slow line, the end-of-conversion detection array stops the oscillators and enables the readout array. Finally, the measurement result is given by:

$$\Delta T_{measure} = N \cdot (\tau_1 - \tau_2) \tag{2.7}$$

$$resolution = \tau_1 - \tau_2 \tag{2.8}$$

Eq. 2.8 shows that the resolution of the vernier TDC is determined by the delay difference between the two ROs. This means that a very high resolution can be obtained even with slow oscillators. Compared with the ring-based TDC, this topology usually has worse DNL and INL due to the mismatch between the two ROs. The deadtime of the vernier TDC is also larger, because after the stop signal arrives, the fast RO needs time to catch up with the slow RO. This deadtime ($= t_{clk}^2/resolution$) grows rapidly as the resolution becomes finer and the dynamic range increases.
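
The catch-up behavior behind Eq. 2.7 can be sketched with a simple behavioral model. This is an idealized illustration, not the circuit of [25]: times are integer picoseconds, the two delay values are hypothetical, and mismatch and jitter are ignored.

```python
def vernier_convert(t_in_ps: int, tau_slow_ps: int, tau_fast_ps: int):
    """Idealized vernier conversion.

    The start edge circulates with period tau_slow, the stop edge (launched
    t_in later) with period tau_fast. Returns the output code N, the measured
    interval N * (tau_slow - tau_fast), and the time spent converting.
    """
    if tau_slow_ps <= tau_fast_ps:
        raise ValueError("tau_slow must be larger than tau_fast")
    n = 0
    # Count cycles until the fast edge is no longer behind the slow edge.
    while t_in_ps + n * tau_fast_ps > n * tau_slow_ps:
        n += 1
    return n, n * (tau_slow_ps - tau_fast_ps), n * tau_slow_ps


# Hypothetical delays: the same 42 ps input measured at 5 ps and 1 ps resolution.
print(vernier_convert(42, 100, 95))
print(vernier_convert(42, 100, 99))
```

In this model, making the resolution five times finer makes the conversion take roughly five times longer for the same input, which illustrates the deadtime penalty discussed above.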



Figure 2.12: Block diagram of the Vernier ring oscillator TDC.[25]

#### 2.3.3 Time Difference Amplification based TDC

Fig. 2.13 shows a block diagram of a Time Difference Amplification (**TDA**) based TDC. The time difference between start and stop is first quantized by the first-stage TDC. Then, the time residue is amplified by a time amplifier and quantized by the second-stage TDC. The final resolution of the TDC is given by

$$T_{LSB,system} = \frac{T_{LSB,second}}{G} \tag{2.9}$$

where  $T_{LSB,system}$  is the timing resolution of the whole TDC, G is the gain of the time amplifier, and  $T_{LSB,second}$  is the timing resolution of the second-stage TDC. The



Figure 2.13: Block diagram of TDA based TDC[17].

resolution of the TDA-based TDC thus is inversely proportional to the gain of time amplifier.
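
A behavioral sketch of the two-stage conversion may help to see where Eq. 2.9 comes from. The model below assumes an ideal time amplifier and hypothetical values (100 ps LSB in both stages, gain of 8); it does not model the synchronizer or the gain error discussed later.

```python
def tda_convert(t_in, t_lsb_first, t_lsb_second, gain):
    """Idealized two-stage TDA conversion (all times in the same unit)."""
    code_first = int(t_in // t_lsb_first)
    residue = t_in - code_first * t_lsb_first       # first-stage quantization error
    code_second = int((gain * residue) // t_lsb_second)
    t_lsb_system = t_lsb_second / gain               # Eq. 2.9
    estimate = code_first * t_lsb_first + code_second * t_lsb_system
    return code_first, code_second, estimate


# Hypothetical values in picoseconds.
code_1, code_2, estimate = tda_convert(437.0, 100.0, 100.0, 8.0)
print(code_1, code_2, estimate)
```

With these numbers the effective step is 100/8 = 12.5 ps, so the reconstructed value lies within 12.5 ps of the input even though each stage only resolves 100 ps.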

The first stage of TDA-based TDC is shown in Fig. 2.14, and its timing diagram is shown in Fig. 2.15. When the *STOP* signal arrives, the phase detectors record the state of the first-stage TDC, and then the synchronizer generates a signal *SYNC* aligned with the next falling edge of VCO after *STOP*. *STOP* is delayed by  $\Delta T$  to *STOP*<sub>OUT</sub> to fit the input range of the second-stage TDC. *STOP*<sub>OUT</sub> and *SYNC* are used as *START* and *STOP* signals for the second-stage TDC.



Figure 2.14: First stage block diagram of TDA based TDC[17].

This TDC achieves a resolution of 9 *ps*, but a large DNL of 1.4 *LSB* and a deadtime of 320 *ns*. DNL degrades greatly when the first-stage code changes due to the gain error between the first stage and the second stage.

#### 2.4 Application of Time-to-Digital Converter in dToF system

The previous section discusses the basic TDC topologies that can be used in a dToF system. However, these basic topologies need to be developed further before they can be applied in a dToF system. This section focuses on such further-developed topologies. They are divided into three groups: independent TDCs with duty-cycled VCOs, shared TDCs with always-on VCOs, and shared TDCs with duty-cycled VCOs.



Figure 2.15: Timing diagram of TDA based TDC[17].

#### 2.4.1 Independent TDC with Duty-Cycled VCOs

Fig. 2.16 shows a block diagram of a commonly used TDC topology: independent TDCs with duty-cycled VCOs. As shown in the figure, each pixel has a corresponding TDC with an independent VCO. When a pixel is triggered, it generates a start signal to turn on the corresponding VCO. The VCO oscillates until the stop signal arrives. The stop signal is a reference clock with a fixed frequency, and it also triggers the sampling circuit to record the state of the VCO. When stop is high, the VCO is reset and disabled to save power. This is why the architecture is called duty-cycled. Finally, the data from all N TDCs is exported by the readout circuit. The stop signal (reference clock) acts as a coarse counter, and the VCO provides the fine phases.

The biggest problem of this architecture is the LSB length variation among different VCOs. Since there are always mismatches among VCOs, their oscillation frequencies differ due to PVT variation. In addition, VCOs need some time to stabilize after they start oscillating, so the frequency is distorted in the first few cycles. This disadvantage limits the number of TDCs and thus the maximum size of the SPAD array. On the other hand, the longest oscillation time of the VCOs is only one period of the reference clock (stop signal), so the accumulated jitter is small in this architecture. In addition, since the VCOs are enabled and disabled independently, TDC topologies that need deadtime for reset, such as the Vernier TDC and the TDA-based TDC, can be applied in this architecture. These topologies can help the TDC achieve a high resolution.



Figure 2.16: Block diagram of an independent TDC with duty-cycled VCO topology.

Fig. 2.18 shows the timing of the TDA-based TDC introduced in [17], which uses independent TDCs with duty-cycled VCOs. When the stop signal arrives, the first TDC samples the time difference between the start signal from a pixel and the stop signal. Then, the time residue caused by the quantization error is sent to a time amplifier. This time residue is amplified and then sampled again by the second TDC. Finally, the sampled data from the first and second TDCs are exported together by the readout circuit. The conversion time of this architecture is 320 *ns*, and it takes 102  $\mu$s to read out all the data. The conversion rate is 9.8 *kS/s*.



Figure 2.17: Timing diagram of the topology shown in Fig. 2.16.



Figure 2.18: Timing diagram of TDA based TDC with independent and duty-cycled VCOs[17].

# 2.4.2 Shared TDC with Always-On VCOs

A typical architecture of a shared TDC with always-on VCOs is given in [1]. In Fig. 2.19, 1024 SPADs (pixels) are divided into 16 subgroups, and every subgroup has a corresponding TDC. Each subgroup has 4 decision trees, each composed of 16 SPADs, so each TDC is shared by 64 SPADs. In a decision tree, when one or more SPADs in a column are triggered, the tree generates a stop signal to trigger the TDC and, after a certain deadtime, is restored and ready for the next trigger. Each TDC has 4 samplers corresponding to the 4 decision trees. Notably, the VCOs in this design are always on, as shown in Fig. 2.21. In this timing diagram, start is a reference clock aligned with the first phase of the VCOs. When stop signals arrive (SPADs are triggered), the samplers record the state of the VCOs. The time differences between the reference clock and the stop signals can then be calculated.



Figure 2.19: Timing diagram of shared TDC with always-on VCOs[1].

The always-on structure minimizes the overall frequency variation among different VCOs, which is the biggest problem of the independent-VCO structure. As shown in Fig. 2.20, since the VCOs are always oscillating, they can be coupled with each other, pushed into a locked state, and made to oscillate synchronously. This technique keeps the skew and accumulated jitter among TDCs to a minimum. Therefore, this architecture allows a larger pixel array to be realized with low LSB variation among TDCs. However, it also consumes a large amount of power, since the VCOs are always running. Although each TDC is shared by 64 SPADs, which decreases the equivalent power consumption per SPAD, the total power consumption is still high.



Figure 2.20: Mutual coupling of TDCs with a high-frequency monitoring structure[1].



Figure 2.21: Timing diagram for the architecture of shared TDC with always-on VCOs.

# 2.4.3 Shared TDCs with Duty-Cycled VCOs

In [17], an architecture of shared TDCs with duty-cycled VCOs is introduced. As shown in Fig. 2.22, a VCO is shared by 192 TDCs with a common stop signal. Its timing diagram is shown on the right of Fig. 2.23. The VCO begins to oscillate after the *EN* signal goes high. When a *start* signal (generated by a SPAD) arrives, the phase detector in a TDC latches the phases of the VCO as the fine code, and a counter then counts the periods of the VCO until *EN* disables the VCO, which gives the coarse code. Finally, the output data is composed of the coarse code from the counter and the fine code from the VCO.



Figure 2.22: Block diagram for the architecture of shared TDC with duty-cycled VCOs[17].



Figure 2.23: Architecture of the proposed duty-cycled TDC with a common oscillator and its timing diagram[17].

Compared with the shared TDC with always-on VCOs, this architecture consumes less power since the VCO is duty-cycled. However, the number of TDCs is limited, because a large mismatch among the VCO phases arises as they propagate along the TDCs. The more TDCs share the VCO, the worse their DNL becomes. Compared with independent TDCs with duty-cycled VCOs, this architecture has no LSB variation caused by VCO mismatch, since only one VCO is used. The instability of the VCO at turn-on can also be solved by asserting the *EN* signal earlier by a certain offset. On the other hand, this architecture cannot apply the sub-gate delay technique, since the VCO can only be reset after all measurements end.

# 2.4.4 TDC Summary

The features of the TDC architectures discussed above are summarized in Table 2. The independent TDC with duty-cycled VCO can achieve high resolution, but its pixel array size is limited by the variation of the LSB length. The shared TDC with always-on VCO offers the best TDC linearity, but its power consumption is very high. The shared TDC with duty-cycled VCO has very low power consumption, but its linearity worsens as the number of TDCs increases.

| Topology | Independent VCO in TDC, VCO is duty-cycled | Shared VCO in TDC, VCO is always-on | Shared VCO in TDC, VCO is duty-cycled |
|----------|--------------------------------------------|-------------------------------------|---------------------------------------|
| Pros     | <ol> <li>Sub-gate delay technique can be used, achieving high resolution.</li> <li>Low power consumption.</li> </ol> | <ol> <li>Small variation of LSB.</li> <li>No stability problem of turning on VCOs.</li> <li>Number of TDCs is not limited by VCO mismatches.</li> </ol> | <ol> <li>Low power consumption.</li> <li>Stability problem of turning on VCOs can be solved by shifting EN earlier.</li> </ol> |
| Cons     | <ol> <li>Variation of LSB length, which limits the number of TDCs.</li> <li>Large area occupation.</li> <li>Stability problem when turning on VCOs.</li> </ol> | <ol> <li>Very high power consumption.</li> </ol> | <ol> <li>Number of TDCs is limited.</li> <li>Moderate variation of LSB length.</li> </ol> |

Table 2: Comparison of different TDC topologies applied in dToF systems.

# **3** System-Level Modeling of dToF system

This chapter introduces a theoretical analysis of the photon distribution of the proposed dToF model. In addition, the system model is discussed, and the optimized parameters of the proposed TDC are given according to the simulation results.

# 3.1 Theoretical Analysis

In the dToF system, the laser source first emits a laser pulse, which is then reflected by targets. When the SPAD array receives a photon, an electrical pulse is generated and sent to a TDC. Each electrical pulse generated by a SPAD is defined as an event. The TDC samples the arrival time of the electrical pulse and outputs a corresponding code, which is called a timestamp. After multiple timestamps have been produced, they are accumulated into a histogram with multiple bins, and the peak bin represents the target distance. Therefore, the target distance  $d_{target}$  is given by

$$d_{target} = \frac{c}{2} \cdot LSB \cdot bin[k_{max}] \tag{3.1}$$

where c is the speed of light, *LSB* is the resolution of the TDC and  $bin[k_{max}]$  is the index of the histogram bin where the peak is located.
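
The histogram-and-peak procedure behind Eq. 3.1 can be sketched as follows. The timestamp list and the 1 ns LSB are made-up illustration values, and the helper name is an assumption of this sketch.

```python
from collections import Counter

C_LIGHT = 299_792_458.0  # speed of light in m/s


def distance_from_timestamps(timestamps, lsb):
    """Accumulate TDC codes into a histogram and apply Eq. 3.1 to the peak bin."""
    histogram = Counter(timestamps)
    k_max = max(histogram, key=histogram.get)  # index of the highest bin
    return 0.5 * C_LIGHT * lsb * k_max


# Hypothetical timestamps: most events fall in bin 40, a few are noise.
codes = [40, 41, 40, 39, 40, 7, 63]
print(f"{distance_from_timestamps(codes, lsb=1e-9):.3f} m")
```

Taking the highest bin directly limits the distance step to $c \cdot LSB / 2$, which is about 15 cm for the 1 ns LSB assumed here.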

#### 3.1.1 Signal Events Distribution

To build a system-level model, we first need to perform a statistical analysis for the emitted and reflected photons. The photon energy from the laser source is given by

$$E_{photon} = \frac{h \cdot c}{\lambda} \tag{3.2}$$

where *h* is the Planck constant, and  $\lambda$  is the wavelength of the laser, equal to 905 *nm*. In a given time  $\Delta T_{res}$, the maximum number of photons emitted by the laser source ( $N_{signal\_total}$ ) is:

$$N_{signal\_total} = P_{laser} \cdot \Delta T_{res} / E_{photon} \tag{3.3}$$

The laser peak power  $P_{laser}$  is 16 *mW*. We first assume that all of these photons are reflected and uniformly distributed over the receiving SPAD array. The number of photons per pixel is then given by

$$N_{signal\_perpixel} = \frac{N_{signal\_total}}{N_{pixel}} = \frac{P_{laser} \cdot \Delta T_{res}}{E_{photon} \cdot N_{pixel}} \tag{3.4}$$

$N_{pixel}$  is the number of pixels. The target system has a 4 × 32 SPAD array, so  $N_{pixel}$  is equal to 128. However, not all of these photons are reflected and reach the SPAD array. On the one hand, the number of photons reflected by the target is affected by its reflectivity  $R_{target}$. On the other hand, the reflected laser light is diffused by the target, while the Angular Field-of-View (**AFOV**) of the lens in front of the SPADs is limited. Therefore, the number of reflected photons is reduced by a distance factor  $d_{filter}$, which satisfies[1]:

$$d_{filter} = \frac{A_{lens}}{4\pi \cdot d_{target}^2 \cdot 0.5} \tag{3.5}$$

where  $A_{lens}$  is the area of the receiving camera lens. Combining Eq. 3.4 and Eq. 3.5, we can obtain the number of photons influenced by target reflectivity and distance factor, given by

$$\begin{aligned} N_{signal\_filter} &= N_{signal\_perpixel} \cdot R_{target} \cdot d_{filter} \\ &= \frac{P_{laser} \cdot \Delta T_{res} \cdot R_{target} \cdot A_{lens}}{E_{photon} \cdot N_{pixel} \cdot 2\pi \cdot d_{target}^2} \end{aligned} \tag{3.6}$$
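
Eqs. 3.2 to 3.6 can be chained into a small photon-budget calculation, sketched below. The laser power (16 mW), wavelength (905 nm), and pixel count (128) follow the text; the time step, target reflectivity, lens area, and distances are hypothetical values chosen only to make the example run.

```python
import math

H_PLANCK = 6.62607015e-34  # Planck constant in J*s
C_LIGHT = 299_792_458.0    # speed of light in m/s


def photon_energy(wavelength):
    """Eq. 3.2: energy of one photon in joules."""
    return H_PLANCK * C_LIGHT / wavelength


def signal_photons_per_pixel(p_laser, dt_res, n_pixel, r_target, a_lens,
                             d_target, wavelength=905e-9):
    """Eq. 3.6: signal photons reaching one pixel within one time step."""
    n_per_pixel = p_laser * dt_res / (photon_energy(wavelength) * n_pixel)  # Eq. 3.4
    d_filter = a_lens / (4 * math.pi * d_target ** 2 * 0.5)                 # Eq. 3.5
    return n_per_pixel * r_target * d_filter


# dt_res, r_target and a_lens are assumed values, not taken from the thesis.
common = dict(p_laser=16e-3, dt_res=100e-12, n_pixel=128, r_target=0.5, a_lens=1e-5)
for distance in (1.0, 2.0, 4.0):
    n = signal_photons_per_pixel(d_target=distance, **common)
    print(f"d = {distance} m: {n:.3e} photons per pixel per time step")
```

The printed values fall by a factor of four each time the distance doubles, reflecting the $1/d_{target}^2$ term in Eq. 3.6.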

According to [30], the reflected photons follow a Gaussian distribution with its mean at the time of flight ( $t_{flying}$ ) and a standard deviation ( $\sigma_{laser}$ ) taken to be the full width at half maximum of the laser pulse. Since the total number of reflected photons in  $\Delta T_{res}$  is  $N_{signal\_filter}$, the distribution of the reflected photons ( $distr_{signal}(k)$ ) is given by

$$distr_{signal}(k) = N_{signal\_filter} \cdot \exp\left[-\frac{(k - t_{flying})^2}{2(\sigma_{laser}/\Delta T_{res})^2}\right] \tag{3.7}$$

$$t_{flying} = 2d_{target} / (c \cdot \Delta T_{res}) \tag{3.8}$$

$$d_{target} = t_{flying} \cdot \Delta T_{res} \cdot c/2 \tag{3.9}$$

where  $\sigma_{laser}$  is 2 *ns*,  $t_{flying}$  is the flight time of the laser to a target at distance  $d_{target}$, expressed in units of  $\Delta T_{res}$, and k is the index of the simulation time step  $\Delta T_{res}$. The above calculation gives the distribution of the reflected photons, but not every photon triggers a SPAD: a photon is detected only with a probability called the Photon Detection Efficiency (*PDE*). Therefore, we can obtain the probability distribution ( $pd_{signal}(k)$ ) of a SPAD being triggered by signal photons (**signal events probability distribution**) in time step  $\Delta T_{res}$.

$$pd_{signal}(k) = 1 - (1 - PDE)^{distr_{signal}(k)} \tag{3.10}$$
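
Eqs. 3.7 and 3.10 can be evaluated per time step as in the following sketch. The photon count, the bin positions, and the PDE value are assumptions for illustration; the Gaussian width is passed already expressed in time steps, i.e. $\sigma_{laser}/\Delta T_{res}$.

```python
import math


def signal_event_probability(k, n_signal_filter, t_flying_bins, sigma_bins, pde):
    """Probability that signal photons trigger a SPAD in time step k.

    distr follows Eq. 3.7 (photons arriving in step k), and the result
    follows Eq. 3.10 (at least one of them is detected).
    """
    distr = n_signal_filter * math.exp(
        -((k - t_flying_bins) ** 2) / (2 * sigma_bins ** 2)
    )
    return 1.0 - (1.0 - pde) ** distr


# Hypothetical numbers: peak at step 100, sigma of 20 steps, PDE of 10 %.
for k in (60, 80, 100, 120, 140):
    p = signal_event_probability(k, n_signal_filter=1.0, t_flying_bins=100,
                                 sigma_bins=20, pde=0.1)
    print(k, f"{p:.4f}")
```

The probability is largest at $k = t_{flying}$ and symmetric around it, as expected from the Gaussian photon distribution.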

#### 3.1.2 Noise Events Distribution

The SPAD array can also be triggered by noise photons from ambient light. Here, we only consider the main source of environmental interference, which is sunlight. Note that we only count the noise photons reflected by the target and not photons arriving directly from the sun, because a SPAD array directly exposed to sunlight saturates and does not work. For this reason, the total noise power is related to the distance to the target. To mimic a realistic situation, we take a large surface with Lambertian reflectance as the target, so that the surface reflects sunlight diffusely. According to [1], the relationship between total noise power and target distance is given by

$$P_{noise\_total} = P_{sunlight} \cdot d_{target}^2 \cdot \tan(AFOV) \tag{3.11}$$

Therefore, the number of noise photons spread per pixel in time resolution  $\Delta T_{res}$  satisfies:

$$N_{noise\_perpixel} = \frac{P_{noise\_total} \cdot \Delta T_{res}}{E_{photon} \cdot N_{pixel}} \tag{3.12}$$

Similar to Eq. 3.6, the number of photons reaching the SPAD array is also reduced by the target reflectivity and the distance factor. Thus, Eq. 3.12 is modified to:

$$\begin{aligned} N_{noise\_filter} &= N_{noise\_perpixel} \cdot R_{target} \cdot d_{filter} \\ &= \frac{P_{sunlight} \cdot \tan(AFOV) \cdot \Delta T_{res} \cdot R_{target} \cdot A_{lens}}{E_{photon} \cdot N_{pixel} \cdot 2\pi} \end{aligned} \tag{3.13}$$

Notably, Eq. 3.13 shows that the number of noise photons reaching the SPAD array is independent of the target distance. Over the detection range, the noise photons are uniformly distributed in time, since the sunlight illuminates the target continuously. Therefore, the distribution of the noise photons is:

$$distr_{noise}(k) = N_{noise\_filter} \tag{3.14}$$

Noise events are caused not only by environmental factors but also by the internal noise of the SPADs. This noise level is characterized by the Dark-Count Rate (**DCR**). Therefore, Eq. 3.14 is modified to:

$$distr_{noise}(k) = N_{noise\_filter} + DCR \cdot \Delta T_{res} \tag{3.15}$$

Therefore, the probability distribution  $(pd_{noise}(k))$  of triggering SPADs by noise photons (**noise events probability distribution**) is given by the following:

$$pd_{noise}(k) = 1 - (1 - PDE)^{distr_{noise}(k)} \tag{3.16}$$

# 3.1.3 Combination of Signal Events and Noise Events

From the signal events probability distribution and the noise events probability distribution derived above, we can obtain the total probability distribution of a SPAD being triggered by any photon  $(pd_{total}(k))$ :

$$pd_{total}(k) = 1 - [1 - pd_{signal}(k)][1 - pd_{noise}(k)] \tag{3.17}$$

The above analysis considers that a SPAD can be triggered by photons without deadtime, but in fact once a SPAD is triggered, it needs a period of time  $T_{dead}$  to reset. Applying the effect of deadtime to total probability distribution, we can obtain **the probability distribution with deadtime**  $(pd_{total,d})$ :

$$pd_{total,d}(k) = \begin{cases} \sum_{m=k-1-N_{int}}^{k-1} [1-pd_{total,d}(m)] \, pd_{total}(k), & k > N_{int} \\ (1-p_{noise})^{N_{int}-(k-1)} \cdot \sum_{m=0}^{k-1} [1-pd_{total,d}(m)] \, pd_{total}(k), & 0 \le k \le N_{int} \end{cases} \tag{3.18}$$

$$N_{int} = \frac{T_{dead}}{\Delta T_{res}} \tag{3.19}$$

With Eq. 3.18, we can calculate two important parameters of the dToF system - accuracy and precision. Since the peak value of  $pd_{total,d}(k)$  indicates the target distance, the accuracy is given by

$$accuracy = \frac{k_{maxbin} \cdot \Delta T_{res} \cdot c/2 - d_{target}}{d_{target}} \times 100\% \tag{3.20}$$

where  $k_{maxbin}$  is the index of the time step  $\Delta T_{res}$  at which the maximum of  $pd_{total,d}(k)$  is located. Accuracy quantifies the measurement error of the system (in the theoretical calculation, mainly due to SPAD deadtime). It is a systematic error, which can be calibrated by a look-up table. Precision is the variation of the peak location when the measurement is repeated. It is given by:

$$precision = stddev(k_{maxbin}) = \frac{\sigma_{measure}}{\sqrt{N_{data}}} \tag{3.21}$$

where  $\sigma_{measure}$  is the FWHM of the signal probability distribution, and  $N_{data}$  is the number of timestamps included in the FWHM. Precision is a random error, which cannot be calibrated by a look-up table. It can be improved by using a laser with a smaller FWHM or by increasing the number of timestamps.

# 3.2 System-Level Model

Based on the analysis in Sec. 3.1.1 and Sec. 3.1.2, we can build a photon path model to describe the process of emitting and receiving photons. As shown in Fig. 3.1, the photon path model outputs an electrical signal to trigger the TDC model when the SPAD array is triggered by photons. The TDC model then generates a digital code (timestamp). The TDC model allows its parameters (resolution, DNL, and SSP) to be changed in order to verify their effects on the accuracy and precision of the system.



Figure 3.1: Block diagram of system model for simulation.

These timestamps are then accumulated into a histogram, and an algorithm finds the location of the peak. Note that the photon path model was built with the help of Silicon-Integrated.

#### 3.2.1 Finding-Peak Algorithm

As mentioned above, after accumulating the timestamps in a histogram, we need to find the location of the peak. If we directly took the location of the highest bin as  $t_{flying}$, the system resolution would be the same as the *LSB* of the TDC. In the proposed system, the peak is instead located by calculating the center of mass of the signal events.



Figure 3.2: Finding-peak algorithm. The events above the noise floor are first clipped into a new histogram. Then, the center of mass of these bins is calculated, indicating the flying time of the laser to the target and thus the target distance.

As shown in Fig. 3.2, to reduce the effect of noise events, the events above the noise floor are first clipped into a new histogram. Then, the center of mass of these bins ( $T_{center}$ ) indicates the flying time of the laser ( $t_{flying}$ ):

$$t_{flying} = T_{center} = \frac{k \cdot LSB \cdot count(k) + \dots + (k+N) \cdot LSB \cdot count(k+N)}{N_{signal}} \tag{3.22}$$

where  $N_{signal}$  is the total count of signal events, equal to  $\sum_{m=k}^{k+N} count(m)$. This equation shows that the system resolution improves to  $LSB/N_{signal}$.
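
A minimal sketch of the finding-peak step of Eq. 3.22 is given below. How exactly the clipping is performed is an assumption of this sketch: only the part of each bin that exceeds the noise floor is kept. The histogram values are made up for illustration.

```python
def center_of_mass(histogram, noise_floor, lsb):
    """Estimate t_flying as the center of mass of the bins above the noise floor.

    histogram[k] is the event count of bin k. Assumption: the clipped
    histogram keeps only the excess of each bin over the noise floor.
    """
    clipped = {k: count - noise_floor
               for k, count in enumerate(histogram) if count > noise_floor}
    n_signal = sum(clipped.values())
    if n_signal == 0:
        raise ValueError("no bin exceeds the noise floor")
    weighted = sum(k * lsb * count for k, count in clipped.items())
    return weighted / n_signal


# Symmetric peak around bin 4, and an asymmetric peak between bins 2 and 3.
print(center_of_mass([2, 3, 2, 10, 20, 10, 3, 2], noise_floor=3, lsb=1.0))
print(center_of_mass([0, 0, 10, 30, 0], noise_floor=0, lsb=1.0))
```

The second call returns a value between two bin indices, which shows how the center of mass resolves the peak position more finely than one *LSB*.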

# 3.2.2 Effect of TDC Parameters

Since the accuracy of the dToF system can be calibrated by a look-up table, this section focuses on the effect of the TDC on precision. The TDC parameters that influence precision are mainly resolution, DNL, and SSP.

# **Effect of Resolution**

Eq. 3.22 shows that the center-of-mass calculation greatly reduces the quantization error. However, in reality, the finding-peak circuit has a limited calculation precision. In the proposed system, the circuit can only keep two fractional bits in binary. This means



Figure 3.3: Effect of the quantization error of the finding-peak circuit (worst case). A small original variation of  $t_{flying}$  ( $\sigma_1$, the system precision in the theoretical calculation) leads to a larger variation  $\sigma_2$  of the  $t_{flying}$  obtained from the finding-peak circuit, due to quantization error.

that the quantization error of  $t_{flying}$  is no longer  $LSB/N_{signal}$  but 0.25 *LSB*. Fig. 3.3 describes the worst case of the effect of quantization error on system precision. If  $t_{flying}$  has a small variation near the border of two quantization steps, this variation will increase to 0.25 *LSB*, leading to a poorer precision.

An excessive quantization error severely degrades the precision at short range and thus limits the shortest measurable distance. Eq. 3.21 shows that the precision of the system depends on  $N_{data}$: at short range, the power of the reflected laser increases and thus  $N_{data}$  also increases, which gives a smaller (better) precision value. However, the quantization error limits the precision to 0.25 *LSB*.

From the analysis above, since the quantization error of the finding-peak circuit is determined by the TDC resolution, **the TDC should have a resolution high enough** to ensure that  $\sigma_2$  is close to or smaller than  $\sigma_1$.
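
The worst case described above can be reproduced with a few lines of code. The sketch assumes that the finding-peak circuit truncates its result to two fractional bits; whether the actual circuit truncates or rounds is not stated here, so this is an assumption.

```python
import math


def quantize_fractional(t_in_lsb, frac_bits=2):
    """Truncate a value expressed in LSB to the given number of fractional bits."""
    step = 2.0 ** -frac_bits          # 0.25 LSB for two fractional bits
    return math.floor(t_in_lsb / step) * step


# Two center-of-mass results only 0.002 LSB apart, straddling a quantization step.
low, high = 4.249, 4.251
print(quantize_fractional(low), quantize_fractional(high))
```

Although the two inputs differ by only 0.002 *LSB*, their quantized values differ by a full 0.25 *LSB* step, which is the spread from $\sigma_1$ to $\sigma_2$ sketched in Fig. 3.3.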

#### **Effect of Differential Non-Linearity**

Fig. 3.4 shows the worst case of the effect of DNL on the precision of the system. If the TDC has no DNL and $t_{flying}$ has a variation $\sigma$ equal to 2 *LSB*, $t_{flying}$ is shifted right by 2 LSBs. However, if the TDC has a DNL of -0.5, $t_{flying}$ can be shifted right by 4 LSBs in the worst case. That is to say, in the worst case, **the precision of the system can be degraded by a factor of** 1/(1 - |DNL|).
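The worst-case factor above can be checked with a short numerical sketch. The function name `worst_case_spread` is hypothetical and only restates the 1/(1 - |DNL|) rule from the text; it is not part of the system model used in this thesis.

```python
def worst_case_spread(sigma_lsb: float, dnl: float) -> float:
    """Worst-case spread in code steps after quantization by a TDC with DNL.

    Assumes the variation falls entirely on bins that are (1 - |DNL|) LSB wide,
    which is the worst case described in the text.
    """
    if not 0.0 <= abs(dnl) < 1.0:
        raise ValueError("|DNL| must be in [0, 1)")
    return sigma_lsb / (1.0 - abs(dnl))


# Example from the text: a 2 LSB variation with DNL = 0 and DNL = -0.5.
for dnl in (0.0, -0.5):
    print(f"DNL = {dnl:+.1f}: spread = {worst_case_spread(2.0, dnl):.1f} steps")
```

With a DNL of -0.5 the 2 LSB variation spans 4 code steps, matching the example in Fig. 3.4.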



$\sigma$ is multiplied by 1/(1-|DNL|) after being quantized by a TDC with DNL

Figure 3.4: Effect of the DNL of the TDC (worst case). Although $t_{flying}$ has the same variation $\sigma$ in the time domain in both situations, $t_{flying}$ is shifted by 2 steps on the left but by 4 steps on the right.

#### **Effect of Single-Shot Precision**

SSP evaluates the noise of the TDC in the time domain, so it adds an extra variation to $t_{flying}$ and thus degrades system precision. Taking the effect of SSP into consideration, Eq. 3.23 is modified to:

$$precision = stddev(t_{target}) = \sqrt{\frac{\sigma_{measure}^2 + SSP^2}{N_{data}}} \tag{3.23}$$

$\sigma_{measure}$ is determined by the FWHM of the laser, which is usually a few nanoseconds (2 *ns* in the proposed system). The SSP of TDCs is usually at the picosecond level. Therefore, **the SSP of the TDC has only a negligible effect on system precision, since the system noise is dominated by the laser pulse width**.
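To give a feeling for how small the SSP contribution is, the modified precision formula can be evaluated numerically. This is only a sketch: it takes $\sigma_{measure}$ directly as 2 ns, and the SSP value of 10 ps and $N_{data} = 100$ are hypothetical numbers chosen for illustration, not measured results.

```python
import math


def system_precision(sigma_measure_s: float, ssp_s: float, n_data: int) -> float:
    """Precision in seconds: sqrt((sigma_measure^2 + SSP^2) / N_data)."""
    return math.sqrt((sigma_measure_s ** 2 + ssp_s ** 2) / n_data)


SIGMA_MEASURE = 2e-9   # 2 ns, laser-limited variation (taken from the text)
SSP = 10e-12           # 10 ps, hypothetical picosecond-level SSP
N_DATA = 100           # hypothetical number of signal events

without_ssp = system_precision(SIGMA_MEASURE, 0.0, N_DATA)
with_ssp = system_precision(SIGMA_MEASURE, SSP, N_DATA)
relative_increase = with_ssp / without_ssp - 1.0

print(f"precision without SSP: {without_ssp * 1e12:.3f} ps")
print(f"precision with SSP:    {with_ssp * 1e12:.3f} ps")
print(f"relative increase:     {relative_increase:.2e}")
```

Under these assumptions the relative increase is on the order of $10^{-5}$, which supports treating the SSP as negligible.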

# 3.3 Simulation Verification

According to the application of the proposed dToF system, the input parameters of the dToF system are provided by Silicon-Integrated. These parameters are listed in Table 3. Feeding these parameters into the system model yields two important figures, shown in Fig. 3.5 and Fig. 3.6.

| Items                  | Target                                                |
|------------------------|-------------------------------------------------------|
| Number of SPADs        | 4 × 32                                                |
| Number of macro pixels | 4                                                     |
| Macro pixel deadtime   | 10 ns                                                 |
| Depth range            | 20 to 2500 mm                                         |
| Precision              | +/- 10 mm (20 mm < distance < 100 mm)                 |
|                        | +/- 15 mm (100 mm < distance < 200 mm)                |
|                        | +/- 5% of measurement (distance > 200 mm)             |
| Frame rate             | 30 fps (100 ns per pulse, 33333 timestamps per frame) |
| Total power            | < 100 mW                                              |
| FWHM                   | 2 ns                                                  |

Table 3: Specifications of the proposed dToF system, provided by Silicon-Integrated



Figure 3.5: Precision of the system versus the target distance for five TDC resolutions. N is the number of data used to calculate the precision at each data point.

Fig. 3.5 verifies the analysis of the effect of the TDC resolution on the precision of the system. Resolutions of 400 *ps* and 200 *ps* greatly degrade precision, especially at relatively short range. For target distances beyond the shortest detection range (0.02 m), a 100 *ps* resolution does not degrade precision significantly. This is because at 0.02 m the ideal precision is about 2.3 *mm*, and the largest quantization error caused by a 100 *ps* resolution is 3.75 mm (0.25 *LSB* = 25 *ps* of round-trip time), which is close to the ideal precision and thus causes only a small degradation.
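The 3.75 mm figure follows from converting the 0.25 *LSB* quantization error of the finding-peak circuit into distance through $d = c \cdot t / 2$. A minimal sketch of this conversion is given below; the function name is hypothetical, and the three resolutions are those named in the paragraph above.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def peak_quantization_error_mm(lsb_s: float, fraction: float = 0.25) -> float:
    """Largest distance error caused by a quantization error of `fraction` LSB.

    The factor 1/2 accounts for the round trip of the laser pulse.
    """
    return fraction * lsb_s * SPEED_OF_LIGHT / 2.0 * 1e3


for resolution_ps in (100, 200, 400):
    error_mm = peak_quantization_error_mm(resolution_ps * 1e-12)
    print(f"{resolution_ps} ps resolution: up to {error_mm:.2f} mm")
```

For 100 *ps* this evaluates to about 3.75 mm, while 200 *ps* and 400 *ps* give roughly twice and four times that value, well above the ideal precision of about 2.3 mm at 0.02 m.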

Although resolutions finer than 100 *ps* give better precision, the improvement is marginal. On the other hand, a higher resolution requires either higher power consumption or a more complicated architecture. Therefore, considering the balance among power, complexity and precision, 100 *ps* is chosen as the target resolution of the proposed TDC.

Fig. 3.6 illustrates the effect of DNL on precision. DNL degrades precision as analyzed in Section 3.2.2. To satisfy the precision requirements of the proposed system, **the maximum DNL should be smaller than 0.6.** 

Since the TDC resolution is set to 100 ps, an 8-bit TDC is needed to cover the detection range of 2.5 m.
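The bit depth can be verified from the round-trip time of the maximum range. The following sketch assumes that the TDC only has to cover the flight time $2 d_{range}/c$; the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def required_tdc_bits(d_range_m: float, lsb_s: float) -> tuple[float, int, int]:
    """Return (round-trip time in s, number of codes, number of bits)."""
    t_round_trip = 2.0 * d_range_m / SPEED_OF_LIGHT
    codes = math.ceil(t_round_trip / lsb_s)
    bits = (codes - 1).bit_length()  # smallest width that holds codes 0..codes-1
    return t_round_trip, codes, bits


t_max, n_codes, n_bits = required_tdc_bits(2.5, 100e-12)
print(f"round-trip time: {t_max * 1e9:.2f} ns, codes: {n_codes}, bits: {n_bits}")
```

A 2.5 m range corresponds to about 16.7 ns, or 167 codes of 100 ps, which needs 8 bits.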



Figure 3.6: Precision of the system versus the target distance for different TDC DNLs from 0 to 0.8. N is the number of data used to calculate the precision at each data point.

# 3.4 Architecture Choice of the Proposed TDC

The analysis in Section 3.3 shows that the proposed dToF system needs a TDC with moderate resolution (100 *ps*), low DNL (< 0.6), and an 8-bit depth range. According to the summary in Section 2.4.4, a ring-based TDC can provide moderate resolution and good rejection of local process variation, which produces smaller DNL. Since a high resolution is not needed, the architecture with an independent TDC for each pixel is also abandoned.

As shown in Fig. 4.1, in the proposed dToF system, 32 SPADs are integrated into a macro pixel. If 9 nearby SPADs are triggered within a certain time, the macro pixel generates an electrical pulse to trigger its corresponding TDC, so 4 TDCs are needed for 4 macro pixels. A single VCO is enough to provide phases for the 4 TDCs since the propagation distance of the phases is relatively short, which means that the mismatch caused by layout is small.

Since there is only one VCO, coupling among VCOs is not needed, meaning that an always-on architecture, which consumes a large amount of power, is not a good choice for the proposed TDC. Without coupling, the VCO can be duty-cycled to decrease power consumption. Finally, considering low power consumption, moderate resolution, and small LSB variation, the proposed architecture is based on TDCs sharing a duty-cycled VCO.

# 4 Circuit Design

Chap. 3 analyzed the specification requirements of the proposed TDC. In Chap. 4, the details of TDC design are discussed, including overall architecture as well as each individual module. To fulfill the function and specification requirements mentioned in Chap. 3, some techniques are used, such as chopping comparator, half-gray-code counter and double sampling.

# 4.1 Top-level architecture

For the application in a single-point dToF system, the architecture with a duty-cycled VCO shared by 4 TDCs is chosen, as shown in Fig. 4.1. To save power, the VCO is duty-cycled under the control of the *start* signal. Since there are only 4 pixels, the propagation distance of the VCO output phases is short, which ensures small mismatches among different phases. Therefore, the VCO can provide clock phases with good DNL and SSP for the 4 TDCs. In addition, the 4 TDCs share the same VCO and thus have no *LSB* variation due to mismatches among different VCOs.



Figure 4.1: Top-level architecture of the proposed TDC.

The working principle of the proposed TDC is as follows. First, the synchronizer triggers the laser source and the TDC simultaneously. The laser source emits a laser pulse when triggered, and the VCO begins to oscillate at the same time. Then, the laser pulse is reflected by targets, and a pixel array detects the reflected photons. To decrease the probability of noise triggering the pixels, only the case in which 9 nearby SPADs (shown in the red square in Fig. 4.1) are triggered at the same time is considered a valid event. When a macro pixel detects a valid event, it generates a *stop* signal and triggers the corresponding TDC to sample the VCO and the coarse counter. This sampling result is called a timestamp and has an 8-bit length.

As shown in Fig. 4.2, the measurement time lasts for 20 *ns* since the maximum detection range is 2.5 *m*. The relation between detection range  $(d_{range})$  and measurement time  $(T_{measure})$  is

$$T_{measure} = \frac{2 d_{range}}{c} \tag{4.1}$$

where *c* is the speed of light. In the next 80 *ns*, the TDC is in an idle state in which the VCO and the coarse counter are disabled. During the idle time, the timestamps are read out serially, clocked by a 120 *MHz* clock *scan\_clk*. Each TDC stores the first two timestamps because the deadtime of a macro pixel is about 10 *ns*, so each macro pixel can trigger its TDC twice. If a TDC could store only one timestamp, it would be hard for it to record a signal event when the flight time of the laser is longer than 10 *ns*, because there is a high probability that a noise event in the first 10 *ns* occupies the TDC. Before entering the next measurement time, the TDC is reset, and all the timestamps stored in the TDCs are cleared. The time interval *T<sub>interval</sub>* between the *start* and *stop* signals is given by

$$T_{interval} = data\_out[7:3] \cdot 8 \cdot LSB + data\_out[2:0] \cdot LSB \tag{4.2}$$
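Eq. 4.2 can be read as a simple bit-field decode of the 8-bit timestamp: the upper five bits come from the coarse counter and the lower three bits from the VCO phase. The sketch below illustrates this; the function name and the default LSB of 100 ps are assumptions for illustration, and the decoded value ignores any later correction of the coarse and fine fields.

```python
def timestamp_to_interval_ps(data_out: int, lsb_ps: float = 100.0) -> float:
    """Convert an 8-bit timestamp to a time interval in picoseconds (Eq. 4.2)."""
    if not 0 <= data_out <= 0xFF:
        raise ValueError("data_out must be an 8-bit value")
    coarse = (data_out >> 3) & 0x1F  # data_out[7:3], one count per VCO period
    fine = data_out & 0x07           # data_out[2:0], one count per VCO phase
    return coarse * 8 * lsb_ps + fine * lsb_ps


# Example: coarse = 2 VCO periods, fine = 3 phases -> (2 * 8 + 3) * 100 ps.
print(timestamp_to_interval_ps(0b00010_011))  # 1900.0
```

Because one coarse count equals eight fine counts, the result is equivalent to `data_out * LSB`; the split form only makes the two sources of the timestamp explicit.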

The reference TDC is used to control the frequency of the VCO, which will be discussed in Section 4.2.

#### 4.2 Voltage-Controlled Oscillator

In Fig. 4.3, the VCO consists of four differential inverters that generate 8 clock phases (*phi*0 to *phi*3*n*). It runs at a frequency of 1.25 *GHz*, which gives an LSB of 100 *ps* (one eighth of the 800 *ps* period). When *start* is low, the differential inverters are disabled. To avoid floating nodes, *phi*0 to *phi*3 are pulled to  $V_{dd\_inv}$, and *phi*0*n* to *phi*3*n* are pulled to ground.



Figure 4.2: Timing diagram of the proposed TDC. *stop*0 to *stop*3 correspond to macro pixel 0 to macro pixel 3. Each of them shows a rising pulse when the corresponding pixel detects an event.



Figure 4.3: Diagram of the VCO.

The frequency of the oscillator is controlled by the current injected into it. Their relationship is given by

$$Power = C_{tot} \cdot V_{DD}^2 \cdot f_{OSC} = V_{DD} \cdot I_{in} \tag{4.3}$$

$$f_{OSC} = \frac{I_{in}}{C_{tot} \cdot V_{DD}} \tag{4.4}$$

where  $C_{tot}$  is the total capacitance of the oscillator and  $I_{in}$  is the total current injected into the oscillator. Therefore, M1 is connected as a current source to limit the injected current and thereby control the frequency of the oscillator.  $V_{ctrl}$  is provided by the reference TDC shown in Fig. 4.1. As shown in Fig. 4.4, the time difference between *stop\_ref* and *start* is always set to the flight time of the maximum measuring range, 16.67 *ns*. Ideally, the output code of the reference TDC is 166 if the *LSB* length is 100 *ps*. If the frequency of the VCO is higher than the target frequency (1.25 *GHz*),  $V_{ctrl}$ will become higher, and vice versa. Since the reference TDC has a maximum quantization error of one *LSB*, the *LSB* length of the VCO is finally given by

$$LSB = 100\,ps \pm LSB/167 = 100 \pm 0.6\,ps \tag{4.5}$$
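The numbers in this paragraph can be reproduced with a few lines. The sketch below assumes that the LSB equals the VCO period divided by the 8 phases and that the reference TDC truncates the reference interval to an integer code; the function names are hypothetical and the control loop itself (DAC and $V_{ctrl}$) is not modelled.

```python
def vco_lsb_ps(f_vco_hz: float, n_phases: int = 8) -> float:
    """LSB length in ps: one VCO period divided by the number of phases."""
    return 1e12 / (f_vco_hz * n_phases)


def reference_code(t_ref_ps: float, lsb_ps: float) -> int:
    """Output code of the reference TDC for a fixed start-to-stop_ref interval."""
    return int(t_ref_ps // lsb_ps)


lsb = vco_lsb_ps(1.25e9)            # target VCO frequency from the text
code = reference_code(16_670, lsb)  # 16.67 ns reference interval
lsb_tolerance = lsb / 167           # one-LSB quantization error, as in Eq. 4.5

print(f"LSB = {lsb:.1f} ps, reference code = {code}, tolerance = +/- {lsb_tolerance:.1f} ps")
```

At 1.25 GHz the LSB is 100 ps, the reference code is 166, and the residual LSB uncertainty is about 0.6 ps.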



Figure 4.4: The frequency of the VCO is controlled by the reference TDC and a Digital-to-Analog Converter (**DAC**).

#### 4.2.1 Pseudo Differential Inverter

The delay element of the oscillator is implemented with pseudo differential inverters, as shown in Fig. 4.5a. Each inverter consists of an input pair  $(M_{n1} \text{ and } M_{n2})$ , an active load  $(M_{p1} \text{ and } M_{p2})$  with negative conductance, a load  $(M_{p3} \text{ and } M_{p4})$  with positive conductance, and a switch  $(M_{n3})$  to enable and disable the inverter.

In the oscillator chain, a disabled inverter is in one of two states. In disabled state 1, in+ and out+ (in- and out-) are pulled to the same polarity, as



Figure 4.5: (a) A schematic of the pseudo differential inverter. (b) Disabled state 1 of the inverter.

shown in Fig. 4.5b, while in disabled state 2, in+ and out+ (in- and out-) are pulled to different polarities, as shown in Fig. 4.6a. In both disabled states, there is no path from the supply voltage to ground, so power consumption is only caused by leakage currents. Fig. 4.6b shows the enabled state of the inverter. When  $V_{in+}$  is higher than  $V_{in-}$ , the current in the left branch is larger than that in the right one. Finally, the active load latches out- to ground and out+ to the supply voltage.

The small signal model of the inverter is illustrated in Fig. 4.7a. The gain of the inverter in small signal analysis is given by

$$A(s) = \frac{g_{mn1}}{-g_{mp1} + G_L + sC_L} \tag{4.6}$$

$$G_L = g_{dn1} + g_{dp1} + g_{dp3} \tag{4.7}$$

For an oscillator, when the oscillation is stable, the magnitude of the gain of each inverter is equal to 1 [31]. Setting $|A(j\omega)| = 1$ in Eq. 4.6 and solving for the frequency gives:

$$f_{osc} = \frac{1}{2\pi} \sqrt{\frac{g_{mn1}^2 - (-g_{mp1} + G_L)^2}{C_L^2}} \tag{4.8}$$

This equation reveals that the highest frequency of the oscillator is achieved when  $g_{mp1} = G_L$ . On the other hand,  $g_{mp1}$  should not be too small; otherwise, the inverter



Figure 4.6: (a) Disabled state 2 of the inverter. (b) Enabled state when  $V_{in+}$  is larger than  $V_{in-}$ .

at each stage cannot provide enough phase shift. The phase shift of an inverter is given by

$$\Phi_{shift} = \arctan\left(\frac{-\omega C_L}{-g_{mp1} + G_L}\right) \ge 45^\circ \tag{4.9}$$

The total output capacitance of the oscillator is given by

$$C_L = \frac{C_{ox}}{L} (W_{Mn1} + W_{Mp1} + W_{Mp3}) \tag{4.10}$$

Obviously,  $M_{p3}$  can provide more current than  $M_{p1}$  if it is the same width because  $M_{p3}$  is always biased in velocity saturation. Therefore, if the total current provided by  $M_{p3}$  and  $M_{p1}$  is kept unchanged, increasing the width ratio of  $M_{p3}/M_{p1}$  helps to increase the oscillation frequency (smaller total load capacitance).

The width ratio between  $M_{p3}$  ( $M_{p4}$ ) and  $M_{n1}$  ( $M_{n2}$ ) is important to ensure that the oscillator works properly. Assuming that  $V_{in+} = V_{in-} = V_{cm} = V_{dd\_inv}/2$ ,  $M_{n1}$  should be strong enough to pull  $V_{out+}$  and  $V_{out-}$  below  $V_{dd\_inv} - V_{thp}$  to activate the latch. Since  $M_{p3}$  and  $M_{n1}$  both work in velocity saturation, the following equation should be satisfied:

$$\begin{aligned}&\frac{1}{2}\beta'_{n}\frac{W_{n}}{L_{n}}(V_{M}-V_{thn})^{2}(1+\lambda_{n}(V_{dd\_inv}-V_{thp})) \\ &= \beta'_{p}\frac{W_{p}}{L_{p}}\left[(V_{dd\_inv}-V_{thp})V_{dsatp}-\frac{1}{2}V_{dsatp}^{2}\right](1+\lambda_{p}(V_{thp}))\end{aligned} \tag{4.11}$$



Figure 4.7: (a) Small signal model of the pseudo differential inverter. (b) Transition of single-end inverter in rising and falling edges.

The SMIC 55BCD process is used in this work. Substituting  $V_{dd\_inv} = 1.2 V$ ,  $V_{thn} = 0.35 V$ ,  $V_{thp} = 0.4 V$ ,  $\beta'_n \approx 1.66\beta'_p$ ,  $\lambda_n = 1.872 V^{-1}$ ,  $\lambda_p = 2.05 V^{-1}$ ,  $V_{dsatp} \approx 0.27 V$ , and  $V_{dsatn} \approx 0.32 V$  into Eq. 4.11, the largest ratio between  $M_{p3}$  and  $M_{n1}$  is given by

$$W_{p3} \le 0.28W_{n1} \tag{4.12}$$

Leaving some margin for the snfp corner,  $W_p/W_n$  is finally set to 0.21. Taking into account the frequency requirement (1.25 *GHz*), stability, and rising and falling edge symmetry, the ratio among  $M_{p3}$,  $M_{p1}$  and  $M_{n1}$  is set to 3 : 11 : 14.

There are multiple benefits to using pseudo differential inverters. Firstly, they give a better CMRR than a single-ended inverter. The CMRR of the pseudo differential inverter is given by [32]

$$CMRR \approx \frac{G_L + g_{mp1}}{G_L - g_{mp1}} \tag{4.13}$$

Secondly, as shown in Fig. 4.7b, the transition time of the single-ended inverter is determined by the PMOS in state  $\Phi$ , while it is determined by the NMOS in state  $\Phi_n$ . Therefore, the two transition times differ considerably because of the poor matching between PMOS and NMOS. As for the differential inverter, as shown in Fig. 4.8, the conducting paths are symmetrical in states  $\Phi$  and  $\Phi_n$ . Therefore, a smaller difference between the two transition times is achieved, decreasing the DNL of the TDC.



Figure 4.8: (a) Transition of the differential inverter in state  $\Phi$ . (b) Transition of the differential inverter in state  $\Phi_n$ .

# 4.3 Phase Comparator

The VCO composed of four differential inverters generates 8 phases, as shown in Fig. 4.10. As shown in Fig. 4.9, the VCO phases are connected to four phase comparators. When the stop signal becomes high, the phase comparators are enabled to read the phase state of the VCO. The timing sequence is shown in Fig. 4.11. The readout results of the phase comparators are combined into a 4-bit thermometer code, which is later converted into a 3-bit binary code.
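The conversion from the 4-bit comparator readout to a 3-bit fine code can be illustrated with a lookup table. The sketch below assumes that the four comparator outputs follow a Johnson-style (cyclic thermometer) sequence as the edge travels around the ring, which yields 8 valid states; the actual bit order and polarity depend on how the phases are wired in Fig. 4.9, so the table is an assumption, not the implemented decoder.

```python
# Assumed readout sequence of the four comparators over one VCO period.
# Each step changes exactly one bit, giving 8 states for 8 phases.
JOHNSON_SEQUENCE = ["0000", "1000", "1100", "1110", "1111", "0111", "0011", "0001"]
FINE_CODE = {bits: index for index, bits in enumerate(JOHNSON_SEQUENCE)}


def decode_fine(bits: str) -> int:
    """Map a 4-bit comparator readout to a 3-bit fine code (0 to 7)."""
    try:
        return FINE_CODE[bits]
    except KeyError:
        raise ValueError(f"invalid comparator readout: {bits!r}") from None


for readout in JOHNSON_SEQUENCE:
    print(readout, "->", format(decode_fine(readout), "03b"))
```

Only 8 of the 16 possible 4-bit patterns are valid in this model; any other pattern indicates a sampling error and is rejected here.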



Figure 4.9: The 8 phases generated by the VCO are read out by 4 phase comparators.



Figure 4.10: VCO generates 8 phases.



Figure 4.11: Timing diagram of phase comparators: the state of VCO is read and latched when stop is high.

#### 4.3.1 StrongARM Based Comparator

The phase comparator used in the TDC is implemented with a StrongARM latch. This kind of comparator is chosen mainly for the following reasons [33]: (1) it ideally consumes zero static power, which satisfies the low-power requirement of the TDC; (2) its input-referred offset mainly comes from a single differential input pair. The latter feature helps to control the offset originating from the comparators and thus reduces the DNL of the TDC.

The circuit design of the phase comparator is shown in Fig. 4.12. It consists of a differential input pair ( $M_1$  and  $M_2$ ), two cross-coupled latches ( $M_3$  to  $M_6$ ) and five switches ( $S_1$  to  $S_5$ ) for reset. The operation of the comparator can be described in four phases [33]. Its transient response is shown in Fig. 4.13.

In the first phase (reset mode), *stop* is low.  $S_5$  is off, so there is no path from  $V_{DD}$  to ground and thus, ideally, no static power consumption. *P*, *Q*, *outn* and *outp* are precharged to  $V_{DD}$, so  $M_3$  to  $M_6$  are off.

In the second phase, *stop* is high.  $S_1$  to  $S_4$  are turned off, and  $S_5$  is turned on. The circuit can be simplified to Fig. 4.14a.  $M_1$  and  $M_2$  generate differential



Figure 4.12: Circuit design of the phase comparator

currents proportional to  $|V_{inp} - V_{inn}|$ . Since  $M_3$  to  $M_6$  are initially off, these currents flow into node capacitors  $C_P$  and  $C_Q$ , and generate a differential voltage  $|V_P - V_Q|$ . The differential gain can be obtained in this phase, which is given by [33]

$$|V_P - V_Q| \approx \frac{g_{m1,2}|V_{inp} - V_{inn}|t}{C_{P,Q}} \tag{4.14}$$

where  $g_{m1,2}$  is the small-signal transconductance of  $M_1$  and  $M_2$ , and  $C_{P,Q} = C_P = C_Q$  is the total parasitic capacitance at node *P* or *Q*. This phase is also called the amplification mode because the differential gain is obtained in this phase. When  $V_P$  and  $V_Q$  fall to  $V_{DD} - V_{thn}$, the cross-coupled NMOS transistors ( $M_3$  and  $M_4$ ) are turned on and the second phase ends. The amplification mode thus lasts approximately [33]

$$T_{amp} = C_{P,Q} \cdot V_{thn} / I_{CM} \tag{4.15}$$

where  $I_{CM}$  is the common-mode current of the input pair. Therefore, the differential voltage gain in this mode can reach [34]

$$A_V \approx \frac{g_{m1,2}V_{thn}}{I_{CM}} \tag{4.16}$$
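Eqs. 4.14 to 4.16 can be tied together numerically: evaluating the gain of Eq. 4.14 at $t = T_{amp}$ must reproduce Eq. 4.16. The sketch below does this check. Apart from $V_{thn} = 0.35$ V, which is quoted earlier for the process, all component values are hypothetical and are chosen only to give plausible orders of magnitude.

```python
import math


def amplification_time(c_pq_f: float, v_thn_v: float, i_cm_a: float) -> float:
    """Duration of the amplification mode (Eq. 4.15), in seconds."""
    return c_pq_f * v_thn_v / i_cm_a


def amplification_gain(gm_s: float, v_thn_v: float, i_cm_a: float) -> float:
    """Differential voltage gain at the end of the amplification mode (Eq. 4.16)."""
    return gm_s * v_thn_v / i_cm_a


C_PQ = 5e-15    # 5 fF node capacitance (hypothetical)
V_THN = 0.35    # NMOS threshold voltage in volts
I_CM = 50e-6    # 50 uA common-mode current per branch (hypothetical)
GM = 0.5e-3     # 0.5 mS input-pair transconductance (hypothetical)

t_amp = amplification_time(C_PQ, V_THN, I_CM)
a_v = amplification_gain(GM, V_THN, I_CM)
gain_from_eq_4_14 = GM * t_amp / C_PQ  # Eq. 4.14 evaluated at t = T_amp

print(f"T_amp = {t_amp * 1e12:.1f} ps, A_V = {a_v:.2f}")
```

The sketch also makes the trade-off visible: a larger $I_{CM}$ shortens the amplification mode but lowers the gain obtained before the latch takes over.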

In the third phase,  $V_P$  and  $V_Q$  are smaller than  $V_{DD} - V_{thn}$ , but  $V_{outp}$  and  $V_{outn}$  are larger than  $V_{DD} - V_{thp}$ . This means that  $M_3$  and  $M_4$  are turned on, but  $M_5$  and  $M_6$  are turned off. The circuit in this phase can be simplified to Fig. 4.14b. The relationship between input and output in this phase is derived in [33] and given by

$$C_{outp,outn} \frac{d(V_X - V_Y)}{dt} - g_{m3,4} \left(1 - \frac{C_{X,Y}}{C_{P,Q}}\right) (V_X - V_Y) = -2g_{m3,4} \frac{\Delta I}{C_{P,Q}} t \tag{4.17}$$

where X and Y denote the output nodes *outp* and *outn*, so  $C_{X,Y} = C_{outp,outn}$ . This equation shows that the response has the form  $\exp(t/\tau_{reg})$ . The regenerative time constant  $\tau_{reg}$  is given by

$$\tau_{reg} = \frac{C_{outn,outp}}{g_{m3,4}(1 - C_{outn,outp}/C_{P,Q})} \tag{4.18}$$

The output of the comparator is connected to the following devices, so the total output capacitance is much larger than  $C_{P,Q}$ . It means  $\tau_{reg}$  is negative, and this phase only provides little regeneration.



Figure 4.13: Transient response of the comparator. It is divided into four phases. The input signals (*inn*, *inp*) are two sinusoidal waves with a phase difference of  $\pi$ . *clk* is the *stop* signal of the comparator. *P*, *Q*, *outn*, *outp* are the node voltages corresponding to the nodes shown in Fig. 4.12.

In the fourth phase,  $V_{outn,outp}$  drops to  $V_{DD} - |V_{thp}|$ . The regenerative latch is completely turned on, so the regenerative time constant is modified to

$$\tau_{reg} = \frac{C_{outn,outp}}{g_{m3,4}} \tag{4.19}$$

The positive feedback of the regenerative latch finally pulls one side to  $V_{DD}$  and the other side to ground. From the analysis above, we can conclude that the power consumption of the comparator mainly comes from charging and discharging the capacitors at nodes *P*, *Q* and *outn*, *outp*, given by

$$Power = 2f_{stop}(C_{P,Q} + C_{outp,outn})V_{DD}^2 \tag{4.20}$$



Figure 4.14: (a)Equivalent circuit of the comparator in phase 2. (b) Equivalent circuit of the comparator in phase 3.

Some transistors serve a special purpose in this design. Firstly,  $M_3$  and  $M_4$  cut off the path from  $V_{DD}$  to ground at the end of phase 4, decreasing the power consumption of the comparator and preventing the latched output from flipping again. Secondly,  $S_1$  and  $S_2$  precharge  $V_{P,Q}$  to  $V_{DD}$  in reset mode. This has two advantages: (1) It decreases the dynamic offset. During the amplification mode,  $M_3$  and  $M_4$  are turned off, which prevents their offset from being referred to the input before the differential gain has reached a large value, since Eq. 4.14 shows that the gain increases with time. (2) At the beginning of phase 2, the initial value of  $V_{P,Q}$  is  $V_{DD}$, which helps  $M_1$  and  $M_2$  to remain in the saturation region for a longer time. Therefore, a larger differential gain is achieved before  $M_1$  and  $M_2$  enter the triode region.

$M_8$  also plays an important role in decreasing the dynamic offset. Without  $M_8$, in phases 3 and 4, the input pair generates a differential voltage with reversed polarity at nodes *P* and *Q*, and this voltage fights with the cross-coupled latch. Therefore, the latch takes longer to settle, contributing a larger dynamic offset. With  $M_8$, the largest differential voltage at nodes *P* and *Q* is limited, so the fight between the input pair and the latch weakens and the dynamic offset decreases.

### 4.3.2 Chopping Scheme



Figure 4.15: (a) Different offset of phase comparators. (b) Chopping comparator.

As analyzed in Section 4.3.1, the offset of the comparator mainly comes from the differential input pair. A typical breakdown of the offset contributions is given in [33]:

$$V_{off,in}^2 = V_{off,1,2}^2 + \left(\frac{V_{off,3,4}}{4}\right)^2 + \left(\frac{V_{off,5,6}}{10}\right)^2 \tag{4.21}$$

Since each comparator has a different offset, as shown in Fig. 4.15a, the LSB length is affected even if the oscillator provides perfect phases. To decrease the effect of comparator offset on DNL, a chopper is connected to the input of the comparator, as shown in Fig. 4.15b.

As shown in Fig. 4.16a, we assume that a perfectly uniformly distributed signal is sampled N times (N is large enough to neglect the statistical error). This is known as a code density test. If the TDC is perfect (DNL is 0), the sampling results in the histogram should also follow a uniform distribution. However, if there is DNL caused by the offset of the comparators, the sampling results in the histogram are distorted, as shown in Fig. 4.16b.

The effect of offset can be eliminated by chopping, as shown in Fig. 4.16c. The chopping clock reverses its state after every measurement. Therefore, of the N samples, half are obtained in the clock state  $\Phi$  and the other half in the clock state  $\Phi_n$. In  $\Phi$, assume that a negative offset is superposed on the positive input *phi*: the corresponding LSB length decreases to  $(1 - \alpha)LSB$, and the counts in the histogram reduce from N/16 to  $(1 - \alpha)N/16$. In  $\Phi_n$, the same negative offset is superposed on the negative input *phi\_n*: the corresponding LSB length increases to  $(1 + \alpha)LSB$, and the counts in the histogram increase to  $(1 + \alpha)N/16$. Finally, as shown in Fig. 4.16d, if the results obtained in the two clock states are accumulated in one histogram, the count in this bin is still equal to N/8, which means that the effective bin length is restored to one *LSB*. With this scheme, the comparator offset can in theory be completely eliminated as long as N is large enough to neglect the statistical error.
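The ideal cancellation can be illustrated with the bin lengths listed for Fig. 4.16b (0.7, 0.8, 1.4, 1.5, 1.3, 1.2, 0.6, 0.5 *LSB* for phases 0 to 7). The sketch below assumes, as the paragraph above does, that chopping turns a bin of length $(1 - \alpha)LSB$ into one of length $(1 + \alpha)LSB$; it computes expected counts rather than running a random code density test, and the value of N is arbitrary.

```python
N = 16_000                 # total number of samples (arbitrary, for illustration)
IDEAL_COUNT = N / 8        # expected count per bin for an ideal 8-phase TDC

# Bin lengths in LSB in chopping state Phi (values listed for Fig. 4.16b).
widths_phi = [0.7, 0.8, 1.4, 1.5, 1.3, 1.2, 0.6, 0.5]
# Assumption: in state Phi_n every deviation from 1 LSB changes sign.
widths_phi_n = [2.0 - w for w in widths_phi]

# Half of the samples are taken in each chopping state.
counts_phi = [w * N / 16 for w in widths_phi]
counts_phi_n = [w * N / 16 for w in widths_phi_n]
combined = [a + b for a, b in zip(counts_phi, counts_phi_n)]

dnl_no_chopping = [w - 1.0 for w in widths_phi]
dnl_chopped = [c / IDEAL_COUNT - 1.0 for c in combined]

print("DNL without chopping:", [round(d, 2) for d in dnl_no_chopping])
print("DNL with chopping:   ", [round(d, 2) for d in dnl_chopped])
```

Under this linear assumption every combined bin holds N/8 counts and the DNL vanishes; the remainder of this section explains why the cancellation is not perfect in a real comparator.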

The chopping scheme shown in Fig. 4.31 seems to indicate that the comparator offset can be completely compensated. In reality, however, some factors prevent the offset from being completely canceled. The main reason is that the relationship between voltage offset and time offset is not linear. As a result, although the same voltage offset is superposed on the positive and negative inputs in the two chopping phases, it generates different time offsets in the time domain. A detailed analysis of this problem follows.

As shown in Fig. 4.17, let  $T_{tran}$  be the sampling time at which the comparator changes its state (output transition from 0 to 1 or from 1 to 0),  $T_{cross}$  the time at which the comparator input reverses its state, and  $\Delta T_{tran}$  the time difference between these two points ( $\Delta T_{tran} = T_{cross} - T_{tran}$ ). Ideally, we want  $T_{cross} = T_{tran}$  or, in other words,  $\Delta T_{tran} = 0$. However,  $\Delta T_{tran}$  is usually larger than 0 since the



Fig. 4.16b, with offset and no chopping: the LSB lengths of phases 0 to 7 are 0.7, 0.8, 1.4, 1.5, 1.3, 1.2, 0.6, and 0.5 *LSB*.

Figure 4.16: (a) Histogram obtained by sampling a uniformly distributed input signal N times (code density test). The comparators have no offset. (b) Histogram obtained by sampling the same input, with comparator offsets and no chopping scheme. (c) The chopping scheme eliminates comparator offsets: half of the samples are obtained in the clock state  $\Phi$, and the other half in the clock state  $\Phi_n$. (d) Superposing the histograms obtained in the two chopping phases eliminates the offsets.



Figure 4.17: Comparator output versus sampling time.

comparator needs some time to build up its output.

With the chopping scheme, when the offset is superposed on the positive input, we assume that  $\Delta T_{tran}$  drifts to  $\Delta T_{tran} - T_{off_1}$. When the offset is superposed on the negative input,  $\Delta T_{tran}$  drifts to  $\Delta T_{tran} + T_{off_2}$. In the previous analysis,  $T_{off_1}$  is assumed to be equal to  $T_{off_2}$, so the offset is canceled. However, the following calculation shows that  $T_{off_1}$  is not equal to  $T_{off_2}$.

As analyzed above,  $T_{tran}$  is the boundary at which the comparator just fails to build up its output, so the output state transitions. With the help of the simplified circuit shown in Fig. 4.14a, we can calculate the charge accumulated at nodes *P* and *Q* during  $\Delta T_{tran}$, defined as  $Q_+$  and  $Q_-$  respectively (neglecting the effect of channel-length modulation):

$$Q_{+} = C_{P,Q} \int_{T_{tran}}^{T_{cross}} I_{+}(t) dt \tag{4.22}$$

$$Q_{-} = C_{P,Q} \int_{T_{tran}}^{T_{cross}} I_{-}(t) dt \tag{4.23}$$

Since  $\Delta T_{tran}$  is a small value, the above equations are simplified to:

$$Q_{+} = C_{P,Q} \int_{T_{tran}}^{T_{cross}} I_{+}(t) dt \approx C_{P,Q} \cdot \Delta T_{tran} \left[\frac{I_{+}(T_{tran}) + I_{+}(T_{cross})}{2}\right] \tag{4.24}$$

$$Q_{-} = C_{P,Q} \int_{T_{tran}}^{T_{cross}} I_{-}(t) dt \approx C_{P,Q} \cdot \Delta T_{tran} \left[\frac{I_{-}(T_{tran}) + I_{-}(T_{cross})}{2}\right] \tag{4.25}$$

The input voltages of the comparator are given by  $V_+ = V_{cm} + \Delta V$  and  $V_- = V_{cm} - \Delta V$  at  $T_{tran}$ , and  $V_+ = V_- = V_{cm}$  at  $T_{cross}$ . Finally, the charge accumulation in  $\Delta T_{tran}$  is given by

$$Q_{+} = \frac{1}{2} C_{P,Q} \Delta T_{tran} \beta_{n} [(V_{cm} + \Delta V - V_{thn})^{2} + (V_{cm} - V_{thn})^{2}] \tag{4.26}$$

$$Q_{-} = \frac{1}{2} C_{P,Q} \Delta T_{tran} \beta_n [(V_{cm} - \Delta V - V_{thn})^2 + (V_{cm} - V_{thn})^2] \tag{4.27}$$

The difference of accumulated charge between these two nodes is as follows:

$$\Delta Q = Q_{+} - Q_{-} = 2C_{P,Q}\Delta T_{tran}\beta_{n}(V_{cm} - V_{th}) \cdot \Delta V \tag{4.28}$$

The input of the comparator can be approximated as a sine wave. Near  $T_{cross}$, we can consider  $\sin\omega t \approx \omega t$  and rewrite  $\Delta V$  as  $\omega \cdot \Delta T_{tran}$, where  $\omega$  is the angular frequency of the input signal. Eq. 4.28 is then rewritten as

$$\Delta Q = Q_+ - Q_- = 2C_{P,Q}\Delta T_{tran}^2\beta_n (V_{cm} - V_{th}) \cdot \omega \tag{4.29}$$

With the effect of offset, in chopping phase  $\Phi$ ,  $\Delta T_{tran}$  drifts to  $\Delta T'_{tran,\Phi} = \Delta T_{tran} + T_{off_1}$ . The input voltages of the comparator at  $\Delta T'_{tran,\Phi}$  change to  $V_+ = V_{cm} + \Delta V'_{\Phi}$  and  $V_- = V_{cm} - \Delta V'_{\Phi}$ . Therefore, the charge accumulation in this situation becomes:

$$Q'_{+,\Phi} = \frac{1}{2} C_{P,Q}(\Delta T_{tran}) \beta_n [(V_{cm} + \Delta V'_{\Phi} - V_{th} - V_{off})^2 + (V_{cm} - V_{th} - V_{off}/2)^2]$$

$$Q'_{-,\Phi} = \frac{1}{2} C_{P,Q} (\Delta T_{tran}) \beta_n [(V_{cm} - \Delta V'_{\Phi} - V_{th})^2 + (V_{cm} - V_{th} - V_{off}/2)^2]$$

Rewrite  $\Delta V'_{\Phi}$  as  $\omega \cdot \Delta T'_{tran,\Phi}$ . The difference of accumulated charge in  $\Phi$  is given by

$$\begin{aligned}\Delta Q'_{\Phi} &= Q'_{+,\Phi} - Q'_{-,\Phi} \\ &= \frac{1}{2}\Delta T_{tran}\beta_n C_{P,Q}[V_{off}^2 - 2V_{off}(\omega\Delta T'_{tran,\Phi} + V_{cm} - V_{th}) + 4(V_{cm} - V_{th})\omega\Delta T'_{tran,\Phi}]\end{aligned}$$

To calculate the transition time in this situation, we set  $\Delta Q'_{\Phi}$  equal to  $\Delta Q$, since the comparator needs to accumulate the same charge to build up its output. Therefore, the relationship between  $\Delta T_{tran}$  and  $\Delta T'_{tran,\Phi}$  is given by

$$\Delta T'_{tran,\Phi} = \frac{4(V_{cm} - V_{th})\omega\Delta T_{tran} - V_{off}^2 + 2V_{off}(V_{cm} - V_{th})}{2(2V_{cm} - 2V_{th} - V_{off})\omega} \tag{4.30}$$

In the chopping phase  $\Phi_n$ , similarly, we can derive the equations as follows:

$$Q'_{+,\Phi_n} = \frac{1}{2} C_{P,Q} (\Delta T_{tran}) \beta_n [(V_{cm} + \Delta V'_{\Phi_n} - V_{th})^2 + (V_{cm} - V_{th} - V_{off}/2)^2]$$

$$Q'_{-,\Phi_n} = \frac{1}{2} C_{P,Q} (\Delta T_{tran}) \beta_n [(V_{cm} - \Delta V'_{\Phi_n} - V_{th} - V_{off})^2 + (V_{cm} - V_{th} - V_{off}/2)^2]$$

$$\begin{aligned}\Delta Q'_{\Phi_n} &= Q'_{+,\Phi_n} - Q'_{-,\Phi_n} \\ &= \frac{1}{2} \Delta T_{tran} \beta_n C_{P,Q} [-V_{off}^2 + 2V_{off} (V_{cm} - V_{th} - \omega \Delta T'_{tran,\Phi_n}) + 4(V_{cm} - V_{th}) \omega \Delta T'_{tran,\Phi_n}]\end{aligned}$$

Similarly, by setting  $\Delta Q'_{\Phi_n}$  equal to  $\Delta Q$, we obtain  $\Delta T'_{tran,\Phi_n}$, given by

$$\Delta T'_{tran,\Phi_n} = \frac{4(V_{cm} - V_{th})\omega\Delta T_{tran} + V^2_{off} - 2V_{off}(V_{cm} - V_{th})}{2(2V_{cm} - 2V_{th} - V_{off})\omega} \tag{4.31}$$

Ideally, the offset is eliminated by chopping, which is to say  $\Delta T_{tran} = (\Delta T'_{tran,\Phi} + \Delta T'_{tran,\Phi_n})/2$. However, **the calculation result is given by** 

$$\Delta T'_{av} = (\Delta T'_{tran,\Phi} + \Delta T'_{tran,\Phi_n})/2 = \frac{4(V_{cm} - V_{th})\Delta T_{tran}}{4(V_{cm} - V_{th}) - 2V_{off}} \tag{4.32}$$

Eq. 4.32 shows that the chopping performance is determined by  $\Delta T_{tran}$  and  $V_{off}$. Obviously, a smaller  $V_{off}$  provides better chopping performance, but  $V_{off}$  is difficult to decrease further since the size of the comparator is limited. Another way to improve chopping is to decrease  $\Delta T_{tran}$, which can be achieved by sizing each transistor in the comparator appropriately.
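The dependence on  $\Delta T_{tran}$  and  $V_{off}$  can be made concrete by evaluating Eq. 4.32 directly. In the sketch below,  $V_{cm} = 0.6$ V and  $V_{th} = 0.35$ V follow values quoted earlier in this chapter, while the offset of 10 mV and the transition delay of 20 ps are hypothetical numbers used only for illustration.

```python
import math


def chopped_average_delay(dt_tran: float, v_off: float, v_cm: float, v_th: float) -> float:
    """Average transition delay over both chopping phases (Eq. 4.32)."""
    overdrive = v_cm - v_th
    return 4.0 * overdrive * dt_tran / (4.0 * overdrive - 2.0 * v_off)


def chopping_residual(dt_tran: float, v_off: float, v_cm: float, v_th: float) -> float:
    """Offset-induced error that remains after chopping (same unit as dt_tran)."""
    return chopped_average_delay(dt_tran, v_off, v_cm, v_th) - dt_tran


V_CM, V_TH = 0.6, 0.35   # volts
V_OFF = 0.010            # 10 mV comparator offset (hypothetical)
DT_TRAN_PS = 20.0        # 20 ps transition delay (hypothetical)

residual_ps = chopping_residual(DT_TRAN_PS, V_OFF, V_CM, V_TH)
print(f"residual after chopping: {residual_ps:.3f} ps")
print(f"with half the delay:     {chopping_residual(DT_TRAN_PS / 2, V_OFF, V_CM, V_TH):.3f} ps")
```

In this model the residual is zero when  $V_{off} = 0$  and scales linearly with  $\Delta T_{tran}$, which is why shortening the amplification time improves the chopping performance.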

$\Delta T_{tran}$  is mainly determined by the duration of the amplification mode (comparator phase 2), and Eq. 4.15 shows that  $\Delta T_{tran}$  is proportional to  $C_{P,Q}$  if the size of the input pair remains unchanged ( $I_{CM}$  does not change). Therefore, decreasing the size of the cross-coupled latch can reduce  $\Delta T_{tran}$. However, a smaller latch also brings some drawbacks. Firstly, the latch in Fig. 4.12 ( $M_3$  and  $M_5$  or  $M_4$  and  $M_6$ ) has a larger offset. Secondly, the time constant of the latch becomes larger, so the latch contributes a larger dynamic offset.

The analysis above implies that there is a balance point at which the best chopping performance is achieved. To test the chopping performance of the comparator, $\Delta T'_{av}$ is obtained from 500 Monte-Carlo runs, and its standard deviation is calculated. Chopping performance can be evaluated by $\sigma(\Delta T'_{av})$, since zero $\sigma(\Delta T'_{av})$ means that $\Delta T'_{av}$ is not affected by offset. Therefore, the width ratio between the latch transistors and the input pair is swept to find the smallest $\sigma(\Delta T'_{av})$. The results are shown in Fig. 4.18: the smallest $\sigma(\Delta T'_{av})$ is obtained when $W_5/W_1$ is equal to 0.5.



Figure 4.18: Standard deviation of $\Delta T'_{av}$ versus the width ratio of $M_5/M_1$. Every data point is obtained from 500 Monte-Carlo runs.

# 4.4 Coarse Counter

As shown in Fig. 4.10, the coarse counter is triggered by phi3n from the VCO. Every time the VCO completes a period, the coarse counter is triggered and increments. The main problem discussed in this part is the alignment between the fine phases (generated by the VCO) and the coarse phases (generated by the coarse counter). As shown in Fig. 4.19, the transitions of the coarse phases and the fine phases cannot be perfectly aligned, since there is always a delay from phi3n to the counter output. In this part, a combination of a gray counter, correction logic, and selection logic is used to calibrate the sampled result.



Figure 4.19: Timing diagram of the coarse and fine phases: their transition time is not aligned.

# 4.4.1 Gray Counter and Correction Logic

Gray code has an advantage over binary code in data transmission, since there is less chance of error. In addition, compared with a binary counter, a gray counter has a smaller transition probability and thus consumes less power. Table 4 shows the bit flips of a 3-bit binary counter and a 3-bit gray counter. In theory, the power consumption of the latter is only 0.57 (8/14) of that of the former.

| Counts    | Binary |       |       |           | Gray  |       |       |           |
|-----------|--------|-------|-------|-----------|-------|-------|-------|-----------|
|           | $Q_2$  | $Q_1$ | $Q_0$ | Bit flips | $Q_2$ | $Q_1$ | $Q_0$ | Bit flips |
| 0         | 0      | 0     | 0     | 3         | 0     | 0     | 0     | 1         |
| 1         | 0      | 0     | 1     | 1         | 0     | 0     | 1     | 1         |
| 2         | 0      | 1     | 0     | 2         | 0     | 1     | 1     | 1         |
| 3         | 0      | 1     | 1     | 1         | 0     | 1     | 0     | 1         |
| 4         | 1      | 0     | 0     | 3         | 1     | 1     | 0     | 1         |
| 5         | 1      | 0     | 1     | 1         | 1     | 1     | 1     | 1         |
| 6         | 1      | 1     | 0     | 2         | 1     | 0     | 1     | 1         |
| 7         | 1      | 1     | 1     | 1         | 1     | 0     | 0     | 1         |
| Total bit flips |        |       |       | 14        |       |       |       | 8         |

Table 4: Comparison of bit flips between the 3-bit binary counter and the 3-bit gray counter.
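The totals in Table 4 can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the design flow: the helper names `gray_encode` and `total_bit_flips` are hypothetical, and the count includes the wrap-around from the last code back to the first, as the table does.

```python
def gray_encode(n: int) -> int:
    """Return the reflected-binary Gray code of n."""
    return n ^ (n >> 1)


def total_bit_flips(codes: list[int]) -> int:
    """Count bit flips over one full cycle, including the wrap-around to the first code."""
    flips = 0
    for i, code in enumerate(codes):
        previous = codes[i - 1]  # for i = 0 this is the last code (wrap-around)
        flips += bin(code ^ previous).count("1")
    return flips


BITS = 3
binary_codes = list(range(2 ** BITS))
gray_codes = [gray_encode(n) for n in binary_codes]

binary_flips = total_bit_flips(binary_codes)
gray_flips = total_bit_flips(gray_codes)
ratio = gray_flips / binary_flips

print(binary_flips, gray_flips, round(ratio, 2))
```

The ratio of the two totals is the source of the 0.57 figure quoted in the text. It only compares switching activity of the counter outputs and says nothing about the power of the surrounding logic.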



Figure 4.20: The critical path of a 5-bit gray counter.



Figure 4.21: High-speed gray counter with 3 + 2 structure.



Figure 4.22: 3-bit gray code counter.



Figure 4.23: 2-bit gray code counter.

In the proposed TDC, a 5-bit counter is required to work at a 1.25 GHz input clock frequency. For robustness, the counter should work over different process corners, a $\pm 10\%$ $V_{DD}$ variation, and a temperature range from $-40^{\circ}$C to $120^{\circ}$C (233 K to 393 K). However, a conventionally designed 5-bit gray counter can hardly fulfill this requirement due to its long logic paths, as shown in Fig. 4.20.

To increase the speed of the gray counter, a half-gray counter with a 3+2 structure is used, as shown in Fig. 4.21. As shown in Fig. 4.22, the 3-bit gray counter has a shorter critical path (composed of a DFF, an inverter, and a two-input MUX), so it can achieve a higher speed. Every time the 3-bit counter completes 8 counts, the connection logic shown in Fig. 4.24 generates an *enable* pulse to trigger the 2-bit counter. The timing diagram of this process is shown in Fig. 4.25.

| Place of errors | Flag | Possible error code | Calibrated code |
|-----------------|------|---------------------|-----------------|
| From 00100      | 1    | 00000               | 00100           |
| to 01000        | 1    | 01100               | 00100           |
| From 01100      | 0    | 01000               | 01100           |
| to 11000        | 0    | 11100               | 01100           |
| From 11100      | 1    | 10100               | 11100           |
| to 10000        | 1    | 11000               | 11100           |

Table 5: Possible ambiguous states of the gray counter and calibrated results.



Figure 4.24: Connection logic between the 3-bit gray counter and the 2-bit gray counter. The flag generator is also shown.

Fig. 4.25 shows that two bit flips occur every 8 counts, one in the 3-bit counter and one in the 2-bit counter. If the counter is sampled during such a transition, it may return an ambiguous state, so the connection logic generates a flag signal to correct this situation. Two bit flips occur at three places in the measurement range, so there are six codes that need to be calibrated. The possible ambiguous states and their calibrated results are shown in Table 5.
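The six ambiguous codes of Table 5 can be enumerated with a small behavioural model of the 3+2 counter. The sketch below assumes the 5-bit code is the 2-bit gray code (upper bits) followed by the 3-bit gray code (lower bits), and that an ambiguous sample sees only one of the two flipping bits updated. The function names are hypothetical, and the flag generation of Fig. 4.24 is not modelled.

```python
GRAY3 = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]
GRAY2 = [0b00, 0b01, 0b11, 0b10]


def half_gray_code(count: int) -> tuple[int, int]:
    """Return (upper 2-bit gray code, lower 3-bit gray code) for a count in 0..31."""
    return GRAY2[(count // 8) % 4], GRAY3[count % 8]


def fmt(upper: int, lower: int) -> str:
    """Format the two fields as one 5-bit string."""
    return f"{upper:02b}{lower:03b}"


def ambiguous_states(last_count: int = 31) -> list[tuple[str, str, list[str]]]:
    """List every transition with two bit flips and its two possible mixed codes."""
    rows = []
    for count in range(last_count):
        up0, lo0 = half_gray_code(count)
        up1, lo1 = half_gray_code(count + 1)
        if up0 != up1 and lo0 != lo1:
            # Either only the lower field or only the upper field has updated.
            mixed = [fmt(up0, lo1), fmt(up1, lo0)]
            rows.append((fmt(up0, lo0), fmt(up1, lo1), mixed))
    return rows


for before, after, mixed in ambiguous_states():
    print(f"from {before} to {after}: possible error codes {mixed}, calibrated to {before}")
```

Some of the mixed codes (for example 01100) are also valid counter states elsewhere in the sequence, which is why the code alone cannot be corrected and a separate flag is needed.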



Figure 4.25: Timing diagram of the proposed gray counter.

#### 4.4.2 Double Sampling and Selection Logic

Fig. 4.19 shows the problem that fine phases and coarse phases cannot be aligned. In [35], triple sampling is introduced to solve this problem. However, that technique occupies a large area and places strict requirements on the delay between the coarse and fine phases. In this thesis, the technique is improved by using double sampling and selection logic. Fig. 4.26 illustrates the double sampling algorithm. $T_{dc}$ is the delay between the fine phase *phi3n* (the eighth phase) and the transition time of the coarse counter; $T_{db}$ is the delay between the two samples; $T_p$ is the period of the fine phases; $C_1$ and $C_2$ are the results of the first and second sampling, respectively. When the stop signal arrives, the counter latches take the first sample, and after a certain time $T_{db}$, the second sample is taken.

Double sampling leads to three situations. In cases 1 and 4, $C_1$ is equal to $C_2$, and $C_1$ is chosen as the output. In case 2, $C_1$ is not equal to $C_2$ and the sampled fine phase is larger than 3, so $C_1$ is chosen. In case 3, $C_2$ is chosen since $C_1$ is not equal to $C_2$ and the fine phase is smaller than 4.

The algorithm above ensures that the correct coarse result is always obtained, provided that some delay limits are respected. Firstly, $T_{db}$ should be larger than $T_{dc}$. Otherwise, in case 3, both $C_1$ and $C_2$ are wrong. Secondly, $T_{dc}$ should be smaller than $T_p/2$ to ensure that the second rule (case 2) gives the correct coarse result. Lastly, $T_{db}$ should be smaller than $T_p/2 + T_{dc}$. If this rule is broken, when $C_1$ is sampled at fine phase code 3, $C_2$ will fall in the next cycle of the coarse phase. In this situation, the third rule (case 3) causes the selection logic to wrongly choose $C_2$.
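The selection rule and the three delay limits can be checked with a simple behavioural sweep. The sketch below is an idealised model, not the implemented circuit: it assumes the coarse counter output at time $t$ equals $\lfloor (t - T_{dc})/T_p \rfloor$, that the fine code is the index of the current eighth of the period, and that all times are integers in picoseconds. The delay values 150 ps and 300 ps and all function names are hypothetical.

```python
def select_coarse(c1: int, c2: int, fine: int) -> int:
    """Selection rule from the text for an 8-phase fine code (0..7)."""
    if c1 == c2:
        return c1  # cases 1 and 4: both samples agree
    if fine > 3:
        return c1  # case 2: C2 already belongs to the next period
    return c2      # case 3: C1 was taken before the counter updated


def delays_valid(t_dc: int, t_db: int, t_p: int) -> bool:
    """Check T_dc < T_db, T_dc < T_p/2 and T_db < T_p/2 + T_dc using integers only."""
    return t_dc < t_db and 2 * t_dc < t_p and 2 * t_db < t_p + 2 * t_dc


def count_errors(t_dc: int, t_db: int, t_p: int = 800, phases: int = 8) -> int:
    """Sweep the stop time in 1 ps steps and count wrong coarse results."""
    errors = 0
    for t in range(t_p, 20 * t_p):
        fine = (t % t_p) * phases // t_p
        c1 = (t - t_dc) // t_p         # first sample of the delayed counter
        c2 = (t + t_db - t_dc) // t_p  # second sample, T_db later
        if select_coarse(c1, c2, fine) != t // t_p:
            errors += 1
    return errors


print(delays_valid(150, 300, 800), count_errors(150, 300))  # limits respected
print(delays_valid(150, 600, 800), count_errors(150, 600))  # T_db too large
print(delays_valid(150, 100, 800), count_errors(150, 100))  # T_db smaller than T_dc
```

In this model the sweep returns no errors when all three limits hold, and it produces wrong coarse codes as soon as $T_{db}$ leaves the allowed window, in line with the failure cases described above.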



Figure 4.26: Possible cases of double sampling, algorithm of selection logic, and conditions of delay time.



Figure 4.27: Block diagram of the half-gray counter with double sampling.

The connection among the modules mentioned above is shown in Fig. 4.27. The output of the half-gray counter is first sampled twice by the double sampling module. Then, two correction logic modules calibrate the data from the two samples, respectively. Lastly, the two groups of calibrated data are sent to the selection logic to choose the correct coarse result.

# 4.5 Readout Circuit

In the proposed dToF system, each pixel is connected to a readout circuit. Once the pixel is triggered, the corresponding readout circuit reads and stores the fine and coarse phases provided by the VCO and the coarse counter, respectively. The readout circuit consists mainly of three parts: a gate logic to filter the stop signals from the SPADs, latch arrays to read and temporarily store the coarse and fine phases, and a scan chain for serially exporting the final digital codes, as shown in Fig. 4.28.



Figure 4.28: Block diagram of the readout circuit.



Figure 4.29: Timing diagram of the readout circuit.

Fig. 4.29 shows the timing sequence of the readout circuit. Firstly, the *reset* signal clears all the latches and DFFs in the TDC before *start* goes high. When a valid event occurs, *trigger\_in* activates the comparators and the DFF array to perform the read and store operations. When a second event is triggered, the previous event (marked as Ts1) is latched by the second DFF array, and the latest event (marked as Ts2) is latched by the first DFF array. In idle time, a scan DFF chain reads out all timestamps one by one.

# 4.5.1 Gate Logic



Figure 4.30: Block diagram of gate logic.

The read operation of the readout circuit is triggered by voltage pulses (*stop* signal) from a SPAD array. However, some pulses should be filtered out, since we only count the first two pulses from the SPAD array after the *start* signal rises. There are mainly two situations that need to be filtered out. (1) *stop* arrives before *start*. (2) More *stop* pulses come after the first two *stop* pulses have arrived.

As shown in Fig. 4.30, the gate logic consists of three parts. The *stop* signal from the SPAD array passes to the output (*trigger\_in*) through the critical path, and *trigger\_in* is used to trigger the comparators and DFF arrays. When *stop* comes before *start*, the permission control logic disables the AND gate in front of the output (the *trigger\_permission* signal is low). This timing sequence is shown in Fig. 4.31a. As shown in Fig. 4.31b, if more than two *stop* pulses arrive, the signal *event\_num* disables the AND gate, thus preventing further *stop* pulses from passing to the output.
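The two filtering rules can be summarised in an event-level model. This is only a behavioural sketch under simplifying assumptions: events are given as (time, name) pairs, pulse widths, gate delays and the *reset* signal are ignored, and the function name `gate_logic` and the parameter `max_events` are hypothetical.

```python
def gate_logic(events: list[tuple[float, str]], max_events: int = 2) -> list[float]:
    """Return the times of the stop pulses that are passed on as trigger_in."""
    trigger_permission = False  # stays low until start has risen
    event_num = 0               # number of stop pulses already passed
    passed = []
    for time, name in sorted(events):
        if name == "start":
            trigger_permission = True
        elif name == "stop":
            # A stop passes only after start and while fewer than max_events were seen.
            if trigger_permission and event_num < max_events:
                passed.append(time)
                event_num += 1
    return passed


# Times in ns: one stop before start, then three stops after start.
events = [(1.0, "stop"), (2.0, "start"), (3.5, "stop"), (7.0, "stop"), (9.2, "stop")]
print(gate_logic(events))
```

In this example the stop at 1.0 ns is rejected because it precedes *start*, and the stop at 9.2 ns is rejected because two events have already been accepted.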

Note that some extra input pins are added in front of the gate logic for chip testing. The SPAD can be manually activated by *auto\_stop*; *erig* is used to directly import an electrical pulse without the SPAD, and *erig\_sel* selects between the input pulse from the SPAD and the electrical pulse. *event\_sel* decides whether the TDC stores only the first *stop* pulse or the first two *stop* pulses.



Figure 4.31: (a) Timing sequence when *stop* comes before *start*. (b) Timing sequence when more than two *stop* signals arrive.

#### 4.5.2 Latch Arrays

Fig. 4.32 shows a detailed block diagram of the latch arrays. There are two paths for data transmission. The coarse data (*counter* < 4 : 0 >) from the coarse counter are first sampled twice by two counter sampling blocks. Each block is composed of 5 D flip-flops, which store the state of the counter when the signal *trigger\_in* arrives. Then, the latched result is corrected and selected as described for the coarse counter (Section 4.4), and marked as *coarse\_latest* < 4 : 0 >. If a second *trigger\_in* comes, *coarse\_latest* < 4 : 0 > is sampled and stored by counter latch array 2, and *coarse\_latest* < 4 : 0 > is refreshed with newly sampled data from the counter. The fine data *comp\_therm* < 3 : 0 > from the comparators are processed in a similar way to the coarse data. A thermometer-to-binary converter is inserted to convert the fine data from thermometer code to binary code.


Figure 4.32: Block diagram of the latch arrays.

### 4.5.3 Scan Chain

As discussed before, each TDC temporarily stores two groups of data (*previous\_event* < 7 : 0 > and *latest\_event* < 7 : 0 >). In idle time, eight groups of data are serially exported by scan DFFs. As shown in Fig. 4.33, at the beginning of idle time, *scan\_en* is low (the D input of the scan DFF is selected), so the first rising edge of *scan\_clk* loads the data stored in the TDCs into the scan chain. Then, *scan\_en* goes high (the SI input of the scan DFF is selected), and the eight groups of data are serially exported.



Figure 4.33: Block diagram of the scan chain: each TDC has two groups of scan DFFs to read the latest and previous events, respectively. Eight groups of scan DFFs are serially connected to export eight timestamps (each TDC stores two timestamps in the measuring time).
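The load-then-shift behaviour of the scan chain can be illustrated with a short behavioural model. This is a sketch under assumptions that are not taken from the design: the bit order inside a timestamp (MSB first), the order of the eight timestamps along the chain, and the names `ScanChain` and `export_timestamps` are all hypothetical.

```python
from typing import Optional


class ScanChain:
    """Behavioural model of a chain of scan DFFs (D = parallel data, SI = previous stage)."""

    def __init__(self, length: int) -> None:
        self.q = [0] * length

    def clock(self, scan_en: int, parallel_in: Optional[list[int]] = None, scan_in: int = 0) -> int:
        """Apply one rising edge of scan_clk and return the serial output afterwards."""
        if scan_en == 0:
            # scan_en low: every DFF captures its parallel D input.
            if parallel_in is None or len(parallel_in) != len(self.q):
                raise ValueError("parallel_in must match the chain length")
            self.q = list(parallel_in)
        else:
            # scan_en high: every DFF captures the output of the previous stage.
            self.q = [scan_in] + self.q[:-1]
        return self.q[-1]


def export_timestamps(timestamps: list[int], width: int = 8) -> list[int]:
    """Load all timestamps in one edge, shift them out, and rebuild the words."""
    bits = [(ts >> (width - 1 - i)) & 1 for ts in timestamps for i in range(width)]
    chain = ScanChain(len(bits))
    stream = [chain.clock(scan_en=0, parallel_in=bits)]               # first edge: load
    stream += [chain.clock(scan_en=1) for _ in range(len(bits) - 1)]  # remaining edges: shift
    stream.reverse()  # the last stage is observed first
    return [int("".join(map(str, stream[i:i + width])), 2) for i in range(0, len(stream), width)]


stored = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
print(export_timestamps(stored) == stored)
```

With eight 8-bit timestamps the model needs 64 edges of *scan\_clk* in total: one load edge followed by 63 shift edges.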

## 5 Post-Layout Simulation Results

In this chapter, the results of the post-layout simulation are introduced and discussed. They are used to estimate the performance of the proposed TDC, which includes frequency of the VCO, noise and SSP, DNL, and power consumption.

### 5.1 Frequency of the VCO

Fig. 5.1 illustrates how the frequency of the VCO changes with the control voltage $V_{ctrl}$. It shows that the tuning range of the VCO covers the target frequency of 1.25 GHz, at which the period of the VCO is 800 ps. Since the VCO has 8 phases, each phase has a length of 100 ps, which sets the resolution of the TDC to 100 ps.
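The arithmetic behind these numbers is short enough to spell out. The sketch below takes the 1.25 GHz target frequency, the 8 phases and the 5-bit coarse counter from the text; the conversion of one LSB to distance uses the usual round-trip relation $d = c\,t/2$, which is added here for illustration only.

```python
F_VCO_HZ = 1.25e9   # target VCO frequency
N_PHASES = 8        # fine phases per VCO period
COARSE_BITS = 5     # width of the coarse counter
C_M_PER_S = 299_792_458.0  # speed of light

period_ps = 1e12 / F_VCO_HZ                          # VCO period
lsb_ps = period_ps / N_PHASES                        # TDC resolution
range_ns = period_ps * 2 ** COARSE_BITS / 1e3        # full-scale time range
lsb_distance_mm = C_M_PER_S * lsb_ps * 1e-12 / 2 * 1e3  # round trip, so divide by 2

print(period_ps, lsb_ps, range_ns, round(lsb_distance_mm, 2))
```

The resulting full-scale range matches the 8 bit / 25.6 ns entry listed for this work in Table 9.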



Figure 5.1: The frequency of the VCO changes with the control voltage $V_{ctrl}$. The target frequency of 1.25 GHz is included in the tuning range of the VCO.

#### 5.2 Noise Analysis

Fig. 5.2 shows the accumulated timing jitter of the VCO for three different measuring times. The accumulated jitter increases with measuring time. When the measuring time is equal to 18.4 ns, the standard deviation of the accumulated jitter is 1.25 ps, which is much smaller than the FWHM of the laser. Note that at the edge of two quantization steps, the SSP of the TDC can increase to one LSB (100 ps) due to the quantization error, even though the accumulated jitter is smaller than one LSB. However, one LSB is much smaller than the FWHM of the laser, so the noise of the VCO has a negligible effect on the system precision.






Figure 5.2: The accumulated jitter of the VCO with a measuring time of (a) 2.33ns, (b) 11.2ns and (c)18.4ns.



Figure 5.3: Jitter of the comparator transition time, obtained from 500 transient noise runs.

In Fig. 5.3, the noise of the comparator is simulated by testing its transition time. Over 500 transient noise runs, the standard deviation of the transition time ($\sigma_{comp}$) is 0.15 ps.

Combining the accumulated jitter of the VCO and the jitter of the comparator, the SSP of the TDC is estimated as follows:

$$SSP = \sqrt{\sigma_{VCO,max}^2 + \sigma_{comp}^2} \approx 1.26 \ ps \tag{5.1}$$

where  $\sigma_{VCO,max}$  is the maximum accumulated jitter of the VCO. Obviously, the SSP is much smaller than the FWHM of the laser.
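Eq. 5.1 is a root-sum-square of two independent jitter sources and can be checked numerically. The snippet below is a minimal sketch that uses the two simulated values quoted above; the variable names are hypothetical.

```python
import math

sigma_vco_max_ps = 1.25  # accumulated VCO jitter at the longest measuring time
sigma_comp_ps = 0.15     # comparator transition-time jitter

# Independent noise sources add in quadrature.
ssp_ps = math.hypot(sigma_vco_max_ps, sigma_comp_ps)
print(round(ssp_ps, 2))
```

The comparator adds less than 0.01 ps to the total, so the SSP is dominated by the accumulated VCO jitter.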

#### 5.3 DNL Estimation

Fig. 5.4 shows the DNL distribution of the VCO. The maximum DNL is -0.14/+0.23 LSB, and the standard deviation is 0.064 LSB. Note that this post-layout simulation result includes not only the mismatch of the VCO itself but also the mismatch from signal propagation and buffers.



Figure 5.4: DNL distribution of the VCO, obtained from 500 Monte-Carlo runs.

Fig. 5.5 shows that the chopping scheme decreases the offset of the comparator. The transition time of the comparator is obtained by connecting two sinusoidal waves with opposite polarity but the same phase to the positive and negative inputs of the comparator. The transition times show that the maximum offset decreases from ±15.5 ps to ±6.8 ps, a decrease of 56%, and that the standard deviation decreases from 5.06 ps to 1.86 ps, a decrease of 63.2%.

Figure 5.5: (a) Transition time variation of the comparators obtained from 500 Monte-Carlo runs, without chopping. (b) Transition time variation of the comparators, with chopping. The two input signals reverse their polarity at 0 ps, and the output of the comparator transitions at -33 ps and -31 ps on average, respectively.

Combining the DNL contributed by the VCO and the comparators, the estimated DNL of the TDC is given in Table 6.

|                 | Maximum DNL (LSB) | DNL standard deviation (LSB) |
|-----------------|-------------------|------------------------------|
| No chopping     | -0.45/+0.55       | 0.096                        |
| Chopping        | -0.27/+0.37       | 0.069                        |
| Decrease of DNL | -40%/-33%         | -28.1%                       |

Table 6: The estimated DNL of the TDC, obtained by combining the Monte-Carlo results of the VCO and the comparator.
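The text does not state how the VCO and comparator contributions are combined. One combination that happens to reproduce the standard deviations in Table 6 is a root-sum-square in which the comparator offset enters twice, on the assumption that each LSB width is bounded by two comparator decisions. The sketch below illustrates that assumption only; it is not confirmed as the method used for the table, and the function names are hypothetical.

```python
import math

LSB_PS = 100.0         # TDC resolution
SIGMA_VCO_LSB = 0.064  # DNL standard deviation of the VCO (Fig. 5.4)


def combined_sigma_lsb(sigma_comp_ps: float) -> float:
    """Root-sum-square of the VCO term and two comparator terms (assumed model)."""
    sigma_comp_lsb = sigma_comp_ps / LSB_PS
    return math.sqrt(SIGMA_VCO_LSB ** 2 + 2 * sigma_comp_lsb ** 2)


def decrease_percent(before: float, after: float) -> float:
    """Relative decrease from before to after, in percent."""
    return 100 * (before - after) / before


no_chop = combined_sigma_lsb(5.06)  # comparator sigma without chopping, in ps
chop = combined_sigma_lsb(1.86)     # comparator sigma with chopping, in ps

print(round(no_chop, 3), round(chop, 3))
print(round(decrease_percent(5.06, 1.86), 1))                        # comparator sigma
print(round(decrease_percent(round(no_chop, 3), round(chop, 3)), 1))  # combined DNL sigma
```

Under this assumed model the VCO term dominates once chopping is enabled, which is why a 63.2% reduction of the comparator offset translates into a much smaller reduction of the combined DNL standard deviation.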

#### 5.4 Performance Summary

| TDC state                                        | Power consumption |
|--------------------------------------------------|-------------------|
| VCO on, readout circuit off                      | 2.68 mW           |
| VCO off, readout circuit on (running at 120 MHz) | 829 µW            |
| VCO off, readout circuit off                     | 0.18 µW           |
| Average power (laser emitting period = 100 ns)   | 0.716 mW          |

Table 7: Power consumption of the proposed TDC in different states.

Table 7 shows that when the VCO of the TDC is turned on (during measuring time), the TDC consumes 2.68 mW. When the measuring time ends, the VCO is turned off and the scan chain begins to export data, consuming 829 µW at a 120 MHz scan clock. When both the VCO and the readout circuit are off, the TDC consumes only 0.18 µW due to leakage current. The laser period is 100 ns, so the average power consumed by the TDC in a period is 0.716 mW. Table 8 shows the contribution of each block to this average power. The VCO consumes the largest share since it runs at a very high frequency of 1.25 GHz.
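The per-TDC figures listed for this work in Table 9 are consistent with dividing the totals of Table 7 by the four TDCs of the array. The snippet below is a small sketch of that bookkeeping; treating the Table 7 totals as covering all four TDCs is an inference from the numbers, not a statement from the text.

```python
N_TDC = 4               # number of TDCs listed for this work in Table 9
PEAK_TOTAL_MW = 2.68    # VCO on, readout circuit off (Table 7)
AVERAGE_TOTAL_MW = 0.716  # average over one 100 ns laser period (Table 7)

peak_per_tdc_mw = PEAK_TOTAL_MW / N_TDC
average_per_tdc_uw = AVERAGE_TOTAL_MW / N_TDC * 1e3

print(round(peak_per_tdc_mw, 2), round(average_per_tdc_uw))
```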

Table 9 compares the proposed TDC with state-of-the-art TDCs applied in dToF systems. Compared with [1], this work uses a similar technology. The DNL of this work is only about 1/5 of that of [1], while its area is only approximately 3 times larger. Considering that DNL is inversely proportional to the square root of the area, this work performs better in DNL. Compared with [9], this work achieves a much better DNL because of the use of chopping comparators.

| Block             | Average power consumption | Contribution |
|-------------------|---------------------------|--------------|
| VCO               | 0.379 mW                  | 52.8%        |
| Coarse counter    | 0.077 mW                  | 10.8%        |
| Readout circuit   | 0.256 mW                  | 35.8%        |
| VCO control logic | 0.014 mW                  | 2.0%         |

Table 8: Average power contribution of each block of the TDC. The laser emitting period is 100 ns, and the VCO as well as the coarse counter are turned on in the first 20 ns and turned off in the remaining 80 ns of a period.

|                    | This work                       | [17]                                                     | [1]                                                    | [9]                                  |
|--------------------|---------------------------------|----------------------------------------------------------|--------------------------------------------------------|--------------------------------------|
| Architecture       | Shared, duty-cycled, ring-based | Shared, duty-cycled, ring-based with phase interpolation | Shared, always-on, ring-based with phase interpolation | Independent, duty-cycled, ring-based |
| Technology         | 55 nm                           | 350 nm                                                   | 45 nm                                                  | 180 nm                               |
| Supply voltage     | 1.2 V                           | 3.3 V                                                    | 1.2 V                                                  | 1.8 V                                |
| Resolution         | 100 ps                          | 49.5 ps                                                  | 60 ps                                                  | 48.8 ps                              |
| Depth range        | 8 bit / 25.6 ns                 | 6.49 µs                                                  | 14 bit / 983 ns                                        | 12 bit / 200 ns                      |
| Num of TDC         | 4                               | 432                                                      | 16                                                     | 1728                                 |
| Sampling rate      | 80 Mcps                         | N/A                                                      | N/A                                                    | 35.5 Mcps                            |
| Peak power / TDC   | 0.67 mW                         | 1.65 mW                                                  | 0.5 mW                                                 | N/A                                  |
| Average power / TDC | 0.179 mW                       | N/A                                                      | 0.5 mW                                                 | 0.3 mW                               |
| Area / TDC         | $1875 \ \mu m^2$                | $13400 \ \mu m^2$                                        | $550 \ \mu m^2$                                        | $4200 \ \mu m^2$                     |
| DNL                | -0.27/+0.37                     | 1.79                                                     | -1/+0.9                                                | -0.48/0.6                            |
| Num of pixels      | 4×32                            | 9×18                                                     | 256×256                                                | 242×144                              |
| Features           | Low power                       | Low power, high resolution                               | Large pixel array                                      | Small DNL, high resolution           |

Table 9: Performance comparison of state-of-the-art TDCs applied in dToF systems.

# 6 Conclusions

This chapter provides conclusions and responses to the research questions raised in Section 1.1. Future research directions are discussed in the second part.

#### 6.1 Responses to Research Questions

Can the dToF system-level model prove that TDC specifications such as resolution, DNL and INL, and power consumption can be optimized to achieve the required performance of the target dToF system?

In this thesis, a system model is built to optimize the TDC performance for application in a dToF system. This model shows that resolution and DNL are two of the most important parameters influencing the precision of the dToF system. The former determines the precision with which the center of mass of the histogram can be calculated, while the latter degrades precision by shifting the center of mass. According to the proposed system model, the optimized parameters of the TDC are a resolution of 100 ps and a DNL smaller than 0.6, while the SSP has only a negligible effect on the system.

According to the simulation results of the dToF system modeling, is there a reasonable and feasible architecture of TDC which can meet the specification requirements, resistant to local process variation (e.g. LSB variation and maximum DNL), and simplicity of calibration?

Based on the specification requirements obtained from the system modeling, this thesis introduces a ring-based TDC built on an architecture with a shared and duty-cycled VCO. This architecture has low power consumption, decreasing the average power consumption of the TDC to 179 $\mu W/TDC$. Compared with state-of-the-art TDCs, this TDC achieves a smaller DNL (-0.27/+0.37) because of the use of chopping comparators. It also has a small LSB variation with a standard deviation of 0.64 ps. In addition, a new technique called double sampling is used to align the fine phases and coarse phases of the ring-based TDC, occupying less area, consuming less power, and relaxing the delay requirements. This TDC is designed in the SMIC 55BCD technology with a 1.2 V supply voltage.

#### 6.2 Future Work

There are some problems that are not solved in this thesis, so further research could focus on them.

Firstly, as mentioned in Section 4.31, the chopping scheme does not completely eliminate the offsets of the comparators. Instead of calibrating the comparator offset while the TDC is working, this could be done in the idle time of the TDC. For example, during idle time, two groups of binary-weighted transistors or capacitors could be connected to the input pair of the comparators to calibrate the offset.

Secondly, as discussed in Section 4.4.2, the double sampling scheme imposes delay requirements between the coarse phases and the fine phases. In this work, coarse phases and fine phases are aligned manually by shifting the input clock of the coarse counter (e.g. changing from $phi3_n$ to $phi2_n$). But if the delay changes due to temperature or supply voltage while the TDC is operating, manual calibration cannot solve the problem. Therefore, a calibration circuit could be designed to adjust the delay automatically and instantly.

Finally, the application of the TDC in a larger pixel array should also be considered. For that application, one VCO cannot serve all TDCs, but too many VCOs without coupling lead to a large variation in the LSB. Therefore, the VCOs could be coupled and turned on before the laser is emitted, so that their oscillation is already stable when the *start* signal arrives. Such a topology combines the advantages of the always-on and duty-cycled architectures.

### Bibliography

- [1] A. Ronchini Ximenes, "Modular time-of-flight image sensor for light detection and ranging: A digital approach to LIDAR," Ph.D. dissertation, Delft University of Technology, 2019.
- [2] M. Perenzoni, D. Perenzoni, and D. Stoppa, "A 64 × 64-pixels digital silicon photomultiplier direct tof sensor with 100-mphotons/s/pixel background rejection and imaging/altimeter mode with 0.14% precision up to 6 km for spacecraft navigation and landing," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 1, pp. 151–160, 2017.
- [3] Y.-J. Chang, S.-F. Chen, and J.-D. Huang, "A kinect-based system for physical rehabilitation: A pilot study for young adults with motor disabilities," *Research in developmental disabilities*, vol. 32, pp. 2566–70, 07 2011.
- [4] M. Perenzoni, D. Perenzoni, and D. Stoppa, "A 64 × 64-pixels digital silicon photomultiplier direct tof sensor with 100-mphotons/s/pixel background rejection and imaging/altimeter mode with 0.14% precision up to 6 km for spacecraft navigation and landing," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 1, pp. 151–160, 2017.
- [5] C. Niclass, M. Soga, H. Matsubara, S. Kato, and M. Kagami, "A 100-m range 10-frame/s 340 × 96-pixel Time-of-Flight depth sensor in 0.18-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 2, pp. 559–572, 2013.
- [6] J. Wu, K. Wang, and Y. Gu, "Research on technology of microwave-photonic-based multifunctional radar," in 2016 CIE International Conference on Radar (RADAR), 2016, pp. 1–4.
- [7] D. Ren, C. Li, J. Shi, and R. Chen, "A review of high-frequency ultrasonic transducers for photoacoustic imaging applications," *IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control*, vol. 69, no. 6, pp. 1848–1858, 2022.
- [8] M.-C. Amann, T. Bosch, M. Lescure, R. Myllylä, and M. Rioux, "Laser ranging: A critical review of unusual techniques for distance measurement," *Opt. Eng.*, vol. 40, pp. 10–19, 01 2001.
- [9] C. Zhang, "CMOS spad sensors for 3D time-of-flight imaging, lidar and ultrahigh speed cameras," 2019.
- [10] R. Lange and P. Seitz, "Solid-state time-of-flight range camera," *IEEE Journal of Quantum Electronics*, vol. 37, no. 3, pp. 390–397, 2001.

- [11] C. Niclass, C. Favi, T. Kluter, F. Monnier, and E. Charbon, "Single-photon synchronous detection," *Solid-State Circuits, IEEE Journal of*, vol. 44, pp. 1977 – 1989, 08 2009.
- [12] E. Charbon, "Single-photon imaging in complementary metal oxide semiconductor processes," *Philosophical transactions. Series A, Mathematical, physical, and engineering sciences*, vol. 372, p. 20130100, 02 2014.
- [13] S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, "Avalanche photodiodes and quenching circuits for single-photon detection," *Applied optics*, vol. 35, pp. 1956–76, 04 1996.
- [14] C. Veerappan and E. Charbon, "A low dark count p-i-n diode based spad in CMOS technology," *IEEE Transactions on Electron Devices*, vol. 63, no. 1, pp. 65–71, 2016.
- [15] H. Cho, C.-H. Kim, and S.-G. Lee, "A high-sensitivity and low-walk error ladar receiver for military application," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 61, pp. 3007–3015, 10 2014.
- [16] B. Behroozpour, P. A. M. Sandborn, N. Quack, T.-J. Seok, Y. Matsui, M. C. Wu, and B. E. Boser, "Electronic-photonic integrated circuit for 3d microimaging," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 1, pp. 161–172, 2017.
- [17] S. Mandai, "Multichannel digital silicon photomultipliers for time-of-flight pet," Ph.D. dissertation, Delft University of Technology, 2014.
- [18] M. Abbas and K. Khalil, "A 23 ps resolution time-to-digital converter implemented on low-cost fpga platform," 07 2015.
- [19] X. Qin, C. Feng, D. Zhang, B. Miao, L. Zhao, X. Hao, S. Liu, and Q. An, "Development of a high resolution tdc for implementation in flash-based and anti-fuse fpgas for aerospace application," *IEEE Transactions on Nuclear Science*, vol. 60, no. 5, pp. 3550–3556, 2013.
- [20] M. Zhang, H. Wang, and Y. Liu, "A 7.4 ps fpga-based tdc with a 1024-unit measurement matrix," *Sensors*, vol. 17, p. 865, 04 2017.
- [21] A. Carimatto, S. Mandai, E. Venialgo, T. Gong, G. Borghi, D. R. Schaart, and E. Charbon, "11.4 A 67,392-spad pvtb-compensated multi-channel digital sipm with 432 column-parallel 48ps 17b tdcs for endoscopic time-of-flight pet," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1–3.

- [22] S. Mandai, V. Jain, and E. Charbon, "A fully-integrated 780×800 μm multi-digital silicon photomultiplier with column-parallel time-to-digital converter," in 2012 Proceedings of the ESSCIRC (ESSCIRC), 2012, pp. 89–92.
- [23] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, and D. Schmitt-Landsiedel, "A local passive time interpolation concept for variation-tolerant high-resolution time-to-digital conversion," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 7, pp. 1666–1676, 2008.
- [24] J. Mauricio, L. Freixas, A. Sanuy, S. Gómez, R. Manera, J. Marin, J. Pérez, E. Picatoste, P. Rato Mendes, D. Sanchez, A. Sanmukh, O. Vela, and D. Gascon, "Matrix16: A 16-channel low-power TDC ASIC with 8 ps time resolution," *Electronics*, vol. 10, p. 1816, 07 2021.
- [25] Z. Cheng, M. J. Deen, and H. Peng, "A low-power gateable vernier ring oscillator time-to-digital converter for biomedical imaging applications," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 10, no. 2, pp. 445–454, 2016.
- [26] J. Yu, F. F. Dai, and R. C. Jaeger, "A 12-bit vernier ring time-to-digital converter in 0.13 μm CMOS technology," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 4, pp. 830–842, 2010.
- [27] L. Tang, X. Jin, H. Yang, J. Yang, and W. Liu, "In-pixel time-to-digital converter for 3d tof cameras with time amplifier," vol. 85, no. 2, 2015.
- [28] M. Lee and A. A. Abidi, "A 9 b, 1.25 ps resolution coarse–fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 4, pp. 769–777, 2008.
- [29] A. Carimatto, S. Mandai, E. Venialgo, T. Gong, G. Borghi, D. R. Schaart, and E. Charbon, "11.4 A 67,392-spad pvtb-compensated multi-channel digital sipm with 432 column-parallel 48ps 17b tdcs for endoscopic time-of-flight pet," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1–3.
- [30] F. Piron, D. Morrison, M. R. Yuce, and J.-M. Redouté, "A review of single-photon avalanche diode time-of-flight imaging sensor arrays," *IEEE Sensors Journal*, vol. 21, no. 11, pp. 12654–12666, 2021.
- [31] W. Yan and H. Luong, "A 900-MHz CMOS low-phase-noise voltage-controlled ring oscillator," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 48, no. 2, pp. 216–221, 2001.

- [32] A. C. Demartinos, A. Tsimpos, S. Vlassis, and G. Souliotis, "Delay elements suitable for CMOS ring oscillators," *Journal of Engineering Science and Technology Review*, vol. 9, pp. 98–101, 2016.
- [33] B. Razavi, "The strongarm latch [a circuit for all seasons]," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 2, pp. 12–17, 2015.
- [34] P. Nuzzo, F. D. Bernardinis, P. Terreni, and G. van der Plas, "Noise analysis of regenerative comparators for reconfigurable adc architectures," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, pp. 1441–1454, 2008.
- [35] K.-C. Choi, S.-W. Lee, B.-C. Lee, and W.-Y. Choi, "A time-to-digital converter based on a multiphase reference clock and a binary counter with a novel sampling error corrector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 59, no. 3, pp. 143–147, 2012.