

Delft University of Technology

## Digitally Intensive Frequency Synthesis and Modulation Exploiting a Time-mode Arithmetic Unit

Gao, Z.

DOI 10.4233/uuid:c31d254c-045c-4f4d-bd82-0d24ef8d48fa

**Publication date** 2023

**Document Version** Final published version

## Citation (APA)

Gao, Z. (2023). *Digitally Intensive Frequency Synthesis and Modulation Exploiting a Time-mode Arithmetic Unit*. [Dissertation (TU Delft), Delft University of Technology]. https://doi.org/10.4233/uuid:c31d254c-045c-4f4d-bd82-0d24ef8d48fa


# Digitally Intensive Frequency Synthesis and Modulation Exploiting a Time-mode Arithmetic Unit

Zhong Gao


Dissertation

for the purpose of obtaining the degree of doctor at Delft University of Technology by the authority of the Rector Magnificus, Prof.dr.ir. T.H.J.J. van der Hagen, chair of the Board for Doctorates, to be defended publicly on

Thursday, 7 December 2023 at 10:00 o'clock

by

Zhong GAO

Master of Natural Science in Microelectronics and Solid State Electronics, University of Chinese Academy of Science, China, born in Binzhou, China.

This dissertation has been approved by the promotors.

| Composition of the doctoral committee: |                                          |
|----------------------------------------|------------------------------------------|
| Rector Magnificus,                     | chairperson                              |
| Dr. M. Babaie,                         | Delft University of Technology, promotor |
| Prof. dr. R. B. Staszewski,            | Delft University of Technology, promotor |
| Independent members:                   |                                          |
| Prof. dr. C. S. Vaucher,               | Delft University of Technology           |
| Prof. dr. ir. B. Nauta,                | University of Twente                     |
| Prof. dr. S. Levantino,                | Politecnico di Milano, Italy             |
| Dr. K. Yamamoto,                       | Sony Semiconductor Solutions, Japan      |
| Dr. YH. Liu,                           | Holst Centre/IMEC Netherlands            |
| Prof. dr. L. C. N. de Vreede,          | Delft University of Technology, reserved |



Zhong Gao,

Digitally Intensive Frequency Synthesis and Modulation Exploiting a Time-mode Arithmetic Unit,

Ph.D. Thesis Delft University of Technology,

Keywords: Time-mode arithmetic unit (TAU), digital-to-time converter (DTC), phase-locked loop (PLL), fractional spur, process voltage and temperature (PVT), spur cancelation, self-interference, synchronous interference, interference mitigation, PLL-based modulator, phase modulator, two-point modulation, non-uniform clock compensation (NUCC), phase-domain digital pre-distortion (DPD), LC-tank nonlinearity

### ISBN 978-94-6366-779-1

An electronic version of this dissertation is available at UUID:c31d254c-045c-4f4d-bd82-0d24ef8d48fa.

All research data and code supporting the findings described in this thesis are available in 4TU.Centre for Research Data at DOI:9014b4b9-be97-4c02-b891-4d62464f586c.

Copyright © 2023 by Zhong Gao

Cover photo was taken from www.shutterstock.com.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the prior written permission of the copyright owner.

Printed in the Netherlands.

"What does not kill me, makes me stronger."

Friedrich Nietzsche, 1844-1900

## Contents


- 1 Introduction
  - 1.1 RF Clock Requirements for Wireless Communications
    - 1.1.1 Spur Issues in Transceivers
    - 1.1.2 Phase Error Issues in Transceivers
  - 1.2 Need for Power-Aware Phase-Locked Loop Design
  - 1.3 Low-Spur PLL Design under Low-Power Constraint
    - 1.3.1 Trade-off between Low Power and Low Spurs
    - 1.3.2 Opportunities in Addressing the Spur Issue
  - 1.4 Challenges and Opportunities of a PLL-based Phase Modulator
  - 1.5 Thesis Objectives
  - 1.6 Thesis Outline
- 2 A Fractional-N ADPLL Exploiting A Time-Mode Arithmetic Unit
  - 2.1 Comparison of Existing Phase-Error-Extraction Strategies
    - 2.1.1 Two Commonly Used Strategies
    - 2.1.2 Strategies Utilizing Scaled 'Golden' Time Base
  - 2.2 Principle of the Proposed PLL
    - 2.2.1 Conceptual Architecture
    - 2.2.2 Evolution from Time Register to TAU
    - 2.2.3 RC tuning in the WTR
    - 2.2.4 TAU Control Flow within the Proposed PLL
  - 2.3 Circuit-Level Implementation of TAU
    - 2.3.1 TAU Sub-System Overview
    - 2.3.2 Implementation of the Global FSM
      - 2.3.2.1 Differential Snapshot Circuit
      - 2.3.2.2 Time Amplification Control and Global Reset
    - 2.3.3 Implementation of the Tri-Mode PFD
    - 2.3.4 Implementation of the local FSM
    - 2.3.5 Implementation of the WTR
    - 2.3.6 Implementation of the RC Encoder
  - 2.4 Implemented PLL
  - 2.5 Noise/Jitter Analysis
    - 2.5.1 Time-Domain Noise
    - 2.5.2 Circuit-Level Contributors of Time-Domain Noise
    - 2.5.3 Voltage Noise
    - 2.5.4 TAU's Input-Referred Noise and its Contribution to PLL's Phase Noise
  - 2.6 Nonlinearity Analysis
    - 2.6.1 INL Characterization and Degradation Mechanism
    - 2.6.2 Simulated INL
    - 2.6.3 INL calibration
  - 2.7 Measurement Results
  - 2.8 Conclusions
- 3 Canceling Fundamental Fractional Spurs Arising from Self-Interference
  - 3.1 Frequency-Dependent Behavior of Spurs
  - 3.2 Theory of Synchronous Self-Interference
    - 3.2.1 Synchronous Interference from FREF to DCO
      - 3.2.1.1 Qualitative Analysis of the Interference Pattern and the Resulting Spurs
      - 3.2.1.2 Quantitative Analysis
    - 3.2.2 Synchronous Interference from CKV to FREF
  - 3.3 Experimental Verification of Spur Cancellation via Synchronous Interference
    - 3.3.1 Details of the PLL used in the Experiment
    - 3.3.2 Identifying Sources of the Fundamental Fractional Spurs
    - 3.3.3 Verifying the Spur Cancellation Mechanism
  - 3.4 Digital Approach Canceling the DCO-Interference-Induced Fractional Spurs
    - 3.4.1 Principle of Designing the In-band Interference Sequence
    - 3.4.2 Calculating $\theta_{\text{DLF}}$
    - 3.4.3 Measuring $\theta_{\rm SC,ff}$
    - 3.4.4 Determining $A_{\rm SC}$
    - 3.4.5 Implementation
  - 3.5 Experimentally Verifying the Digitally Intensive Spur Cancellation
  - 3.6 Conclusion
- 4 A Digital PLL-Based Phase Modulator Achieving Low EVM
  - 4.1 System-Level Constraints Limiting the Phase Modulation Accuracy
  - 4.2 Modeling a PLL-Based Phase Modulator
    - 4.2.1 Ideal Phase Modulator Model in Discrete-Time Domain
    - 4.2.2 DCO Model in Hybrid-Time Domain
    - 4.2.3 Hybrid-Time Model of Phase Modulator
  - 4.3 Non-Uniform Clock Compensation (NUCC)
    - 4.3.1 Foundation for NUCC: $\Delta t_{\rm S}$ Estimation
    - 4.3.2 Tackling $\phi_{E,DM}$ due to CKU Period Variation
    - 4.3.3 Addressing $\phi_{E,PP}$ due to CKU Offset Variation
  - 4.4 DCO Frequency Error Compensation
    - 4.4.1 Characterizing the Error Induced by $1/\sqrt{LC}$
    - 4.4.2 Phase-Domain Digital Pre-Distortion
  - 4.5 System Implementation
    - 4.5.1 System Overview
    - 4.5.2 Implementation of NUCC
    - 4.5.3 DCO with Calibration
    - 4.5.4 Calibrated Parameters in Face of Channel Hopping
  - 4.6 Measurement Results
    - 4.6.1 Measurement of the DCO's FM-INL
    - 4.6.2 PM Signal Generation and Measurement Setup
    - 4.6.3 Modulation Performance at 64-PSK
    - 4.6.4 Performance Comparison
  - 4.7 Conclusions
- 5 Conclusion
  - 5.1 Original Contributions
  - 5.2 Thesis Outcomes
  - 5.3 Recommendations for Future Development
- A Differential Vernier Time-to-Digital Converter
- B Output Jitter of the Slicing Comparator
- Bibliography
- List of Publications
- Summary
- List of Figures
- List of Tables
- Acknowledgement
- Chip Micrograph Gallery
- About the Author



The flourishing of communication technology has revolutionized wireless connectivity, making it more ubiquitous than ever before while empowering a multitude of new applications that shape our modern life. One prime example is the emergence of video meeting platforms, which rely on high-speed internet and wireless networks to allow individuals to connect and collaborate irrespective of physical distance. As another example, by harnessing Internet-of-Things (IoT) connectivity, artificial-intelligence-powered systems can gather and analyze data from a vast network of sensors and devices, enabling data-driven decision-making and bringing the concepts of smart homes, buildings, and even societies to fruition. As a result, the usage of these connectivity-enabled applications is experiencing exponential growth, leading to an ever-increasing volume of data and a corresponding demand for efficient data transmission.

A considerable portion of the rapidly expanding data is transmitted through wireless channels. For instance, [1] reports that mobile devices accounted for approximately 60% of global web-page access in June 2022. However, it is important to note that wireless devices operating in a localized area often share the same frequency band resources. With the ever-growing demand for wireless communication, the limited frequency bandwidth available for data transmission is becoming increasingly congested, particularly in the lower frequency channels below 6 GHz. Consequently, wireless transceivers (i.e., transmitters/receivers) should continuously enhance their performance to thrive in this crowded electromagnetic environment and to make more efficient use of the limited bandwidth in order to achieve higher effective data rates.

## 1.1 RF Clock Requirements for Wireless Communications

To enhance the transceivers' data rate while ensuring immunity from interference, their critical blocks need to satisfy rather stringent specifications. One such critical block is the local oscillator (LO), which provides radio frequency (RF) clocks for the transmitter and receiver and faces requirements in the two aspects discussed below.



Figure 1.1: System diagram and signal spectra illustrating how the RF clock spurs can impact the SNR of the received signal and the out-of-band emission of the transmitted signal.

### 1.1.1 Spur Issues in Transceivers

First, the RF clock should exhibit a low spurious level in order to guarantee a sufficient signal-to-noise ratio (SNR) of the received signals in both the desired and neighboring channels when the electromagnetic environment is crowded. The impact on the desired-channel and neighboring-channel signals can be understood by inspecting the receiver and transmitter behavior, respectively. On the receiver side, a low spurious level minimizes the SNR degradation caused by interfering signals in the neighboring channels. As shown in Fig. 1.1, the receiver aims to demodulate the RF signal at the desired channel of frequency $f_{ch2}$, and therefore mixes it with the RF clock at $f_{LO}$ to down-convert it to the baseband frequency of $f_{ch2} - f_{LO}$. If the RF clock contains spurs at $\pm \Delta f$ offsets, signals in the neighboring channels (with the offsets $\pm \Delta f$ relative to $f_{ch2}$, i.e., at $f_{ch1}$ and $f_{ch3}$) would also be down-converted to the same baseband frequency of $f_{ch2} - f_{LO}$. As a result, the desired baseband signal will be contaminated by these interfering signals down-converted by the clock spurs, which degrades the SNR. Therefore, the lower the spurious level of the RF clock, the lower the SNR degradation due to the down-conversion of the neighboring-channel signals by the clock spurs<sup>1</sup>. This facilitates communication in a crowded electromagnetic environment.

On the transmitter side, a lower spurious level of the RF clock results in a lower leakage of the transmitted RF signal into neighboring channels (the so-called "good neighbor policy"). The weaker the leakage, the less it degrades the SNR of a weak signal in the victim channel. An example is shown in Fig. 1.1, where the transmitter intends to up-convert the baseband signal at $f_{\rm BB}$ to the desired RF channel at $f_{\rm BB} + f_{\rm LO}$. However, because in this case the RF clock contains spurs at the offset frequency of $\pm \Delta f$, the baseband signal will also be up-converted into the neighboring RF bands and leak into the frequencies of $f_{\rm BB} + f_{\rm LO} \pm \Delta f$. For any receiver tuned to these frequencies, this leakage energy will behave like interference or noise, thus degrading its SNR. Therefore, lowering the RF clock spurs in a transmitter helps to suppress the interference into the neighboring channels, thus protecting any weak victim signals there from further SNR degradation.
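The frequency bookkeeping behind both effects can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the thesis: the frequency values and the names `clock_tones`, `channels`, `rx_hits`, and `tx_victims` are hypothetical, the spur offset is assumed to equal the channel spacing, and only the difference product (receiver) and sum product (transmitter) of an ideal mixer are considered.

```python
# Illustrative integer frequencies in Hz; all values are hypothetical.
F_LO = 2_400_000_000       # RF clock (LO) carrier
DELTA_F = 20_000_000       # spur offset, assumed equal to the channel spacing
F_CH2 = 2_402_000_000      # desired RF channel

# Tones present in a spur-contaminated RF clock.
clock_tones = {
    "carrier": F_LO,
    "upper spur": F_LO + DELTA_F,
    "lower spur": F_LO - DELTA_F,
}
# Desired channel (ch2) and its two neighbors (ch1, ch3).
channels = {"ch1": F_CH2 - DELTA_F, "ch2": F_CH2, "ch3": F_CH2 + DELTA_F}

# Receiver: which (channel, clock tone) pairs land on the desired baseband
# frequency f_ch2 - f_LO after down-conversion?
f_bb = F_CH2 - F_LO
rx_hits = [
    (ch, tone)
    for ch, f_rf in channels.items()
    for tone, f_clk in clock_tones.items()
    if f_rf - f_clk == f_bb
]

# Transmitter: where does each clock tone place the up-converted baseband
# signal, and which channel does it fall into?
tx_products = {tone: f_bb + f_clk for tone, f_clk in clock_tones.items()}
tx_victims = {
    tone: ch
    for tone, f_out in tx_products.items()
    for ch, f_rf in channels.items()
    if f_out == f_rf
}

print(rx_hits)     # neighbors paired with the spurs overlap the wanted signal
print(tx_victims)  # each spur leaks the transmitted signal into a neighbor
```

In this toy setup, each spur pairs with exactly one neighboring channel, which is why the spur level directly sets how much neighboring-channel energy reaches the desired baseband (receiver) or leaks out of the desired channel (transmitter).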

### 1.1.2 Phase Error Issues in Transceivers



Figure 1.2: Constellation diagram of (a) 4-QAM and (b) 16-QAM, illustrating the impact of the RF clock's phase error.


<sup>1</sup>Note that far-away interferers (typically called "blockers") can be similarly down-converted to baseband if the RF clock contains far-away spurs. However, such blockers will likely be sufficiently attenuated by the receiver front-end's band-pass filter.

The second crucial aspect is that the RF clock should exhibit a sufficiently low phase error to support higher-rate data communication. Due to the typical bandwidth constraints, wireless communication systems usually increase their data rates by adopting higher-order quadrature amplitude modulation (QAM) schemes, in which data bits are converted into symbol points on the constellation diagram during transmission and vice versa during reception. As a result, higher data rates necessitate higher-order modulation and denser constellations. However, during the data transfer, the actual constellation points tend to deviate tangentially from their ideal positions due to phase errors in the RF clock. Higher-order modulation schemes are less tolerant of such rotation and thus require a lower phase error to guarantee correct demodulation. For instance, in a 4-QAM scheme, a symbol point can be resolved without any ambiguity in the presence of phase errors up to $\pm \pi/4$ (reflected as a rotation on the constellation diagram), as illustrated in Fig. 1.2(a). However, in a 16-QAM scheme, the same phase error can render the symbol point indistinguishable from the neighboring ones, as depicted in Fig. 1.2(b). Therefore, a low phase error is vitally important when higher-order modulation schemes are adopted to increase the data rate.
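The shrinking rotation tolerance can be illustrated numerically. The sketch below is an illustration under simplifying assumptions rather than the thesis's analysis: it rotates an ideal, noise-free square constellation by a static phase error and uses a nearest-point decision, so practical margins with noise are smaller. The helper names `qam_points` and `max_error_free_rotation` are hypothetical.

```python
import numpy as np

def qam_points(m):
    """Ideal square M-QAM constellation on the odd-integer grid."""
    side = int(round(np.sqrt(m)))
    levels = np.arange(-(side - 1), side, 2)   # e.g. [-3, -1, 1, 3] for 16-QAM
    i, q = np.meshgrid(levels, levels)
    return (i + 1j * q).ravel()

def max_error_free_rotation(m, step_deg=0.05):
    """Largest swept rotation (degrees) for which every rotated point is
    still closest to its own ideal position (nearest-point decision)."""
    pts = qam_points(m)
    own_index = np.arange(pts.size)
    for deg in np.arange(0.0, 90.0, step_deg):
        rotated = pts * np.exp(1j * np.deg2rad(deg))
        # Distance from every rotated point to every ideal point.
        dist = np.abs(rotated[:, None] - pts[None, :])
        if np.any(np.argmin(dist, axis=1) != own_index):
            return float(deg - step_deg)
    return 90.0

tol_4 = max_error_free_rotation(4)
tol_16 = max_error_free_rotation(16)
print(f"4-QAM tolerates about {tol_4:.2f} deg, 16-QAM about {tol_16:.2f} deg")
```

Under these assumptions the 4-QAM limit approaches $\pi/4$ (45 degrees), while the 16-QAM limit is set by the corner points and is roughly 17 degrees, consistent with the qualitative picture of Fig. 1.2.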

## 1.2 Need for Power-Aware Phase-Locked Loop Design

Given the exponential growth in the data transfer demand, it is projected that communication technology (by means of wireless and wireline transfer) will consume approximately 21% of global electricity by 2030 [2]. This estimate already assumes a reasonable annual improvement rate in energy efficiency, such as 22% for wireless communications (including mobile and fixed Wi-Fi). However, as the advancement of CMOS technology has been slowing down due to the high cost of fabrication, future improvements in energy efficiency may benefit less from technology scaling. Consequently, this could lead to the worst-case scenario depicted in Fig. 1.3, where the communication industry would consume over half of the world's electricity. Although the estimate may seem overly pessimistic, it highlights the importance of adopting energy-efficient design techniques in future communication systems to ensure sustainability. This realization calls for a global effort to reduce power consumption across all components of communication systems so that a reasonable rate of improvement in the overall energy efficiency can be maintained.

As our scope narrows down to the wireless transceiver's LO, which is commonly implemented using a phase-locked loop (PLL), reducing power consumption not only contributes to the sustainability of the communication industry but also plays a crucial role in extending the lifetime of battery-powered wireless devices. The latter aspect holds even greater significance, as a longer battery lifetime can substantially improve the user experience and enable a brand-new range of applications, such as implantable medical devices. Thus, the objective of reducing power consumption becomes a key factor in shaping the design of PLLs to meet the clock requirements discussed in Section 1.1.

Figure 1.3: Contribution of the communication industry to the global electricity usage [2].

Regarding the low-spur requirement, a state-of-the-art PLL has demonstrated an in-band fractional-spur level as low as $-80\,\text{dBc}$ [3], which is sufficiently low to ensure the resilience of a wireless transceiver even within an intensely congested electromagnetic environment. However, the pursuit of low PLL power consumption might prevent incorporating some existing strategies that effectively mitigate clock spurs, as they tend to incur significant power penalties. Therefore, this thesis will investigate design techniques that help PLLs achieve low spurious levels while maintaining low power consumption.

Regarding the phase error requirement, recent advancements in PLL technology have enabled synthesizing RF clocks with sub-100-fs integrated rms jitter (characterizing the noise-related phase-error component in the time domain). With a clock of such low phase error, a well-designed conventional Cartesian transmitter [see Fig. 1.4(a)] can achieve a sufficiently low error-vector-magnitude (EVM) level to support even the most advanced signal modulation modes so far, e.g., the 4k-QAM in IEEE 802.11be [4]. However, the constant drive toward low power consumption may favor highly efficient non-Cartesian architectures, e.g., polar transmitters [5], as sketched in Fig. 1.4(b), which require phase-modulated RF clocks. The additional phase modulation (PM) function introduces a new phase-error source into the modulated RF clock, which could further degrade the EVM of these non-Cartesian transmitters (compared with their baseline Cartesian counterparts) and limit their application. For example, a state-of-the-art polar TX in [6] has demonstrated sufficiently low EVM to support the 1024-QAM in Wi-Fi 6, but still cannot satisfy the requirement of 4k-QAM in Wi-Fi 7. Consequently, this thesis will also explore a phase modulator design, specifically focusing on generating an accurate phase-modulated RF clock by directly modulating a PLL, i.e., implementing a PLL-based phase modulator of low EVM. This approach can achieve low power consumption by avoiding dedicated power-hungry PM blocks (e.g., delay lines [5] [7] [8] and IQ interpolators [9] [10]), and is therefore commonly adopted in IoT polar transmitters [11] [12].

Figure 1.4: Transmitter architectures: (a) Cartesian transmitter and (b) polar transmitter.

The next two sections will provide a comprehensive overview of the issues associated with low-spur PLLs and low-EVM PLL-based phase modulators, particularly the impact of the low-power target, design challenges, and opportunities for improving the performance.

## 1.3 Low-Spur PLL Design under Low-Power Constraint

### 1.3.1 Trade-off between Low Power and Low Spurs

Under the constraint of continuously reducing the power consumption, the currently established fractional-spur-reduction techniques are becoming less attractive. This can be understood by inspecting the spur issues in the analog PLL sketched in Fig. 1.5. The PLL generates a variable clock (CKV) and adjusts its frequency $f_0$ by tuning the voltage-controlled oscillator (VCO). To track the CKV's phase error, a multi-modulus divider (MMD) divides CKV into a feedback clock CKFB such that CKFB and the frequency reference FREF nominally operate at the same frequency of $f_{\text{REF}}$. Consequently, the CKV phase error information is embedded in the time difference between the significant (here, falling) edges of FREF and CKFB, i.e., $\Delta t_{\text{S}}$. Then, $\Delta t_{\text{S}}$ is captured by the phase/frequency detector (PFD) and charge pump (CP), fed into the loop filter, and finally used to tune the VCO frequency to correct the CKV's phase error.

Figure 1.5: Diagram of a conventional analog PLL.

The spur issue of such analog PLLs can arise because the MMD's quantization error dominates $\Delta t_{\rm S}$: the instantaneous frequency division ratio of the MMD cannot be an arbitrarily fine fractional value but is restricted to integers derived by quantizing the fractional frequency control word (FCW), i.e., $f_0/f_{\rm REF}$. The quantization process introduces a periodic pattern to $\Delta t_{\rm S}$ and ultimately results in high fractional spur content in the CKV spectrum if the pattern is not sufficiently suppressed by the loop filter. So far, one of the most effective ways to eliminate the fractional spur is to sufficiently randomize the quantization pattern in $\Delta t_{\rm S}$, e.g., by implementing an FCW quantizer with a high-order $\Delta \Sigma$ modulator or successive requantizer [13]. However, a more randomized $\Delta t_{\rm S}$ pattern implies a wider $\Delta t_{\rm S}$ range [13], which increases the active time of the charge pump's current sources and the associated phase noise contribution [14]. Consequently, the overall PLL would need to burn more power to achieve the same phase noise performance<sup>1</sup>. Therefore, this entails a trade-off between the spur level and power consumption.

<sup>1</sup>One may doubt whether the charge pump (or more broadly, other types of phase detection block) can significantly influence the overall PLL power consumption, which is dominated by the VCO (e.g., contributing up to 70% in [15]). Actually, the PLL design is quite systematic: with a low-noise phase detection block, the PLL can utilize a very wide loop bandwidth to suppress the VCO noise, e.g., as with an injection locking technique [16]. Consequently, the VCO phase noise requirement can be relaxed and optimized for lower power, thereby reducing the overall PLL power consumption. Therefore, despite
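To make the origin of the periodic pattern concrete, the following minimal sketch quantizes a fractional FCW with a first-order accumulator. The accumulator structure, the function name `first_order_quantizer`, and the example FCW of 10.25 are assumptions chosen purely for illustration; they are not the quantizers of [13].

```python
import math
from fractions import Fraction


def first_order_quantizer(fcw, n_cycles):
    """Quantize a fractional FCW into integer MMD division ratios.

    Illustrative first-order accumulator (an assumption, not the quantizer
    of any cited work). Returns the integer ratios and the accumulated
    quantization error after each cycle, in units of CKV periods.
    """
    acc = Fraction(0)
    ratios, errors = [], []
    for _ in range(n_cycles):
        acc += fcw                 # ideal (fractional) number of CKV periods
        n_div = math.floor(acc)    # integer ratio actually applied by the MMD
        acc -= n_div               # residue carried into the next cycle
        ratios.append(n_div)
        errors.append(acc)
    return ratios, errors


ratios, errors = first_order_quantizer(Fraction(41, 4), 8)  # FCW = 10.25
```

With this FCW, the division ratios repeat as 10, 10, 10, 11 and the accumulated error cycles through 1/4, 1/2, 3/4, 0. Such a short, deterministic cycle is the kind of periodic content in $\Delta t_{\rm S}$ that can surface as fractional spurs unless it is filtered or randomized.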



Figure 1.6: Diagram of an analog PLL with DTC canceling the quantization noise of the MMD.

According to the analysis above, both the spur issue and power penalty are related to $\Delta t_{\rm S}$: the periodic $\Delta t_{\rm S}$ pattern results in fractional spurs in the PLL output spectrum, whereas the large $\Delta t_{\rm S}$ amplitude incurs the power penalty. Considering that the dominant component of $\Delta t_{\rm S}$ is determined by the MMD quantization error, which is highly predictable, one can cancel the deterministic $\Delta t_{\rm S}$ component to tackle the spur and power penalty issues simultaneously, thereby breaking the trade-off constraint. Figure 1.6 illustrates an example of canceling the deterministic $\Delta t_{\rm S}$ with a digital-to-time converter (DTC), as proposed in [17]. Considering that the deterministic component in $\Delta t_{\rm S}[n]$ is proportional to the MMD's quantization error (predicted by the quantizer as $Q_{\rm E}[n]$), properly scaling the predicted quantization error can dictate a value of the DTC control word ($D_{\rm DTC}[n]$). Accordingly, the DTC delays FREF to launch the $\text{FREF}_{\text{dly}}$ falling edge that ideally aligns with the MMD output, i.e., CKFB. Consequently, the dominant deterministic $\Delta t_{\rm S}$ component will be canceled. The residual time difference between the $\text{FREF}_{\text{dly}}$ and CKFB edges reflects the CKV phase error and travels through the loop components (PFD&CP, and loop filter) to control the VCO, thus suppressing the phase error.
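The scaling step can be sketched numerically. The snippet below is a simplified model under stated assumptions: the deterministic offset is taken as $Q_{\rm E}[n] \cdot T_{\rm CKV}$, the DTC is a uniform-step converter, and all numbers (a 400 ps CKV period, a 1 ps DTC step, a 5% step drift) are hypothetical. The function name `dtc_residual` is likewise illustrative.

```python
def dtc_residual(q_e, t_ckv, lsb_nominal, lsb_actual):
    """Residual time offset after DTC cancelation, in seconds.

    q_e         : predicted MMD quantization error, in CKV periods
    t_ckv       : CKV period, in seconds
    lsb_nominal : DTC step assumed when scaling q_e into a control word
    lsb_actual  : DTC step actually produced by the circuit
    """
    d_dtc = round(q_e * t_ckv / lsb_nominal)   # DTC control word D_DTC[n]
    return q_e * t_ckv - d_dtc * lsb_actual    # what the phase detector still sees


T_CKV = 400e-12                    # hypothetical 2.5 GHz CKV
pattern = [0.25, 0.5, 0.75, 0.0]   # hypothetical periodic Q_E[n]

ideal = [dtc_residual(q, T_CKV, 1e-12, 1e-12) for q in pattern]
drifted = [dtc_residual(q, T_CKV, 1e-12, 1.05e-12) for q in pattern]
```

When the assumed and actual steps match, the residual vanishes up to floating-point rounding. With the 5% step drift, the residual is about -5 ps, -10 ps, -15 ps, and 0, so a scaled copy of the periodic pattern survives the cancelation. This is one simple way a non-ideal DTC can leave deterministic content behind.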

This $\Delta t_{\rm S}$-cancelation strategy is more commonly referred to as narrow-range phase detection. It was initially meant to extract the random noise-induced phase error hidden under the relatively large deterministic $\Delta t_{\rm S}$ pattern so that the phase detector, e.g., the PFD and CP in Fig. 1.5, need only handle a narrow-range input and can adopt a larger gain to suppress the phase noise contributions from the subsequent loop blocks [18]. This $\Delta t_{\rm S}$-cancelation strategy (i.e., the narrow-range phase-detection concept) has only been proven successful in improving the PLLs' noise-power efficiency [19] [20].

contributing merely a small portion to the system power breakdown, the phase detection blocks can also significantly impact the overall PLL power consumption.

However, it fails to help PLLs in achieving fractional spur levels as low as those adopting the quantization-error-randomization method. This is because a practical phase-error-extraction circuit, e.g., the DTC in Fig. 1.6, might output a nonlinearity-induced periodic pattern and become a new dominant contributor to the fractional spurs [18]. Therefore, the trade-off between power and spur level still holds.

## 1.3.2 Opportunities in Addressing the Spur Issue



Figure 1.7: Comparing (a) delay-chain DTC and (b) current DAC.

In a PLL adopting the $\Delta t_{\rm S}$-cancelation strategy, the nonlinearity-induced spur issue has plenty of room for improvement, considering that these time-mode phase-error-extraction circuits (e.g., the DTC) are quite new, and their fundamental operational principles are less mature than those of conventional analog circuits. This can be understood by comparing a DTC with its analog counterpart, a digital-to-analog converter (DAC). Figure 1.7(a) sketches a delay-chain-based DTC, whose generated delay is proportional to the number of active delay cells (controlled by the digital code $D$) between the input and output edges, i.e., $D \cdot \Delta t_{\rm dly}$, where $\Delta t_{\rm dly}$ denotes the intrinsic latency of each delay cell. This is a convenient but unreliable way to generate an accurate time signal because the base for time signal generation, i.e., $\Delta t_{\rm dly}$, can easily drift due to variations in process, voltage, and temperature (PVT), resulting in a systematic error. In contrast, a conventional current DAC shown in Fig. 1.7(b) utilizes a more reliable strategy: the DAC adopts an accurate external current reference $I_{\rm ref}$, then scales it according to the number of active mirror units (also controlled by the digital word $D$) to generate the desired output current $I_{\rm out} = D \cdot I_{\rm ref}$. Because $I_{\rm ref}$ can be well protected from external disturbances and optimized across the PVT variations, $I_{\rm out}$ can be quite robust and accurate. Such an operational principle intrinsic to the DAC can be adopted to improve the DTC performance. More broadly, migrating the design strategies from conventional analog circuits to emerging time-mode circuits (e.g., a DTC) can provide great opportunities for linearity enhancement, which can ultimately help to validate the narrow-range phase detection's advantage in suppressing the PLL fractional spurs.

PLL fractional spurs can also be brought about by a broader set of causes than the phase-error-extraction nonlinearity, e.g., strong interference signals [21] [22], for which the analog strategies may be less effective. Digital compensation techniques offer another tool to combat these spur-raising sources. This digital manner of implementation benefits from the concept of the all-digital PLL (ADPLL) proposed in [23]. Figure 1.8 sketches an example in accordance with the counterpart in Fig. 1.6. Here, a time-to-digital converter (TDC) replaces the PFD and CP in Fig. 1.6 to detect and quantize the CKV phase error as $D_{\text{TDC}}$. Then, a digital loop filter processes $D_{\text{TDC}}$ and outputs the oscillator tuning word (OTW) to tune the digitally controlled oscillator (DCO), a counterpart of the VCO in Fig. 1.6. The digitalized forms of the phase error ($D_{\text{TDC}}$) and OTW bring up the possibility of using digital techniques to analyze and finally tackle the PLL's non-ideality effects. For example, by observing $D_{\text{TDC}}$, [21] measures and compensates for the interference signal suffered by an ADPLL, reporting the suppression of the interference-induced spurs by over 20 dB.

## 1.4 Challenges and Opportunities of a PLL-based Phase Modulator

The published PLL-based phase modulators commonly adopt a two-point frequency/phase modulation scheme, which is illustrated in Fig. 1.9 by supplementing the modulation-related details onto the digital PLL explained in Fig. 1.8. As shown, the PLL is modulated through two feed points. At one point, the modulating data, MOD[n], is denormalized and added to OTW to directly tune the DCO to output the modulated carrier frequency. (From this perspective, the system can be treated as a *frequency modulator*.) Meanwhile, the DCO also behaves like a phase accumulator [24], i.e., it integrates the modulating frequency over time to acquire the desired phase and output the phase-modulated clock CKV. (Therefore, the system can also be regarded as a *phase modulator*.) At the other feed point, MOD[n] is added to the quantizer input to reflect the expected CKV modulation behavior onto the DTC and MMD. Consequently, the CKV's modulated phase is ideally eliminated prior to the TDC and will not show up at the loop filter, thus not disturbing the normal PLL operation.

Figure 1.8: Block diagram of a digital PLL, as a counterpart of Fig. 1.6.

Figure 1.9: Diagram of a PLL-based frequency/phase modulator realized by two-point modulation of the digital PLL shown in Fig. 1.8.

According to the description above, the DCO is pivotal to a PLL-based phase modulator and acquires the desired phase by integrating the modulation frequency over time. Therefore, the PM accuracy can be degraded by both the frequency- and time-related errors. The former is mainly attributed to the DCO's frequency-modulation (FM) nonlinearity. If the DCO is implemented with a parallel resonant tank consisting of an inductor and a switched-capacitor bank, the circuit-level nonlinearity sources can comprise the square-root characteristic of LC resonance [25], mismatch and parasitic routing between the capacitor bank units [26], and the transient behavior during the switching of bank units [27]. The time-related error occurs when the DCO modulating frequency (or the OTW value controlling the DCO) is maintained for a duration that deviates slightly from the expected time. For example, the block generating the modulating signal usually expects the DCO OTW to be updated at a sampling clock of a uniform period, just as in a general digital system running on a uniform clock grid. However, a realistic digital PLL-based phase modulator may conveniently employ a CKV-aligned clock to update the DCO's OTW. For instance, [12] and [11] generate the sampling clock by, respectively, frequency-dividing CKV or retiming FREF to CKV. These CKV-aligned clocks inherit the CKV's phase modulation characteristics and exhibit time-varying periods, deviating from the uniform-clock assumption of a general digital system. Consequently, the phase modulators adopting such CKV-aligned clocks will inevitably suffer from the time-related distortions.
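The time-related error can be illustrated with a few lines of arithmetic. The sketch below assumes an idealized DCO that holds each frequency offset exactly for one sampling-clock period and accumulates phase as $2\pi \sum_n \Delta f[n] \cdot T[n]$. The frequency offsets, the 25 ns nominal period, and the ±0.1 ns period deviations are hypothetical values, and `accumulated_phase` is an illustrative helper rather than a model from the cited works.

```python
import math


def accumulated_phase(freq_offsets_hz, periods_s):
    """Phase (rad) accumulated by a DCO that holds each frequency offset
    for the corresponding sampling-clock period."""
    return 2 * math.pi * sum(f * t for f, t in zip(freq_offsets_hz, periods_s))


f_mod = [1e6, -1e6, 1e6, 1e6]                 # hypothetical frequency offsets, Hz
uniform = [25e-9] * 4                         # ideal uniform sampling clock
actual = [25e-9, 25.1e-9, 24.9e-9, 25e-9]     # hypothetical CKV-aligned clock

phase_error = accumulated_phase(f_mod, actual) - accumulated_phase(f_mod, uniform)
```

In this example the second and third samples are held 0.1 ns too long and too short, respectively. Because the frequency offsets during those samples have opposite signs, the two deviations add rather than cancel, leaving a phase error of about $-2\pi \cdot 2 \times 10^{-4}$ rad.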

Most of the aforementioned mechanisms degrading the PM accuracy are difficult to address with pure analog techniques, thereby requiring digitally intensive compensation. Fortunately, a digital PLL-based phase modulator has already digitalized most of the control and error signals (e.g., the OTW and $D_{\text{TDC}}$ in Fig. 1.9). These signals provide abundant information to analyze the circuit nonideality and develop corresponding compensation strategies. For example, [28] correlates OTW with $D_{\text{TDC}}$ to calibrate a look-up table (LUT) used for predistorting an arbitrary FM nonlinearity of the DCO. This has been proven effective in suppressing the frequency-related PM error. As another example, similarly to the CKFB timestep estimation for the $\Delta t_{\rm S}$-cancelation purpose, the timestep information of any CKV-aligned clock can be estimated by properly processing the FCW and MOD signals. The timestep information can be utilized to quantify the clock-period deviation from an ideal uniform clock, evaluate the associated time-related distortion in the generated PM signal, and finally develop the corresponding mitigation techniques.

## 1.5 Thesis Objectives

This thesis addresses the key wireless-communication-related performance issues in an ADPLL with a phase modulation function, i.e., the fractional spurs and phase modulation error. The efficacy of the proposed concepts will be demonstrated with an ADPLL targeting IoT applications that require low power consumption.

Regarding the spur-suppression study, the thesis firstly focuses on the spurs raised by the non-ideal phase-error-extraction circuitry, e.g., the DTC in Fig. 1.8, and investigates a new concept of time-mode circuits that cancels the undesired time-offset by scaling the external reference, in a sense emulating the way that the conventional analog circuits operate. At the system level, the thesis studies the spurs raised by self-interference (e.g., mutual coupling between the reference clock and DCO) and develops a digitally intensive method to cancel these spurs.

To improve the accuracy of a PLL-based phase modulator, some dominant error sources are firstly investigated, e.g., the DCO's FM nonlinearity and the non-uniformity of the re-timed clock which in turn drives the control circuit to update the DCO's modulation frequency. The corresponding errors are finally addressed by digital compensation techniques.

## 1.6 Thesis Outline

The present dissertation is organized as follows:

Chapter 2 focuses on suppressing the fractional spurs contributed by the PLL's phase-error-extraction blocks. First, a fundamental bottleneck limiting the phase-error-extraction accuracy is explored by comparing the commonly used strategies. This survey leads to a conjecture that scaling the PLL output period as a 'golden' time base to cancel the undesired time-offset pattern may improve the accuracy of the phase-error extraction. To realize this new strategy, we propose a universal time-signal processing circuit, the time-mode arithmetic unit (TAU). The TAU can calculate the weighted sum of all the time inputs, which is sufficient to apply the desired scaling and cancelation operations to the input time. After that, a TAU-based fractional-N PLL is presented with the implementation details and analysis for noise and nonlinearity. Finally, measurement results are demonstrated to prove the advantages of the phase-error-extraction strategy adopting the 'golden' time-base scaling concept.

Chapter 3 mainly discusses the fractional spurs attributed to the system-level issues, i.e., mutual coupling between the DCO and reference-clock-related circuits (namely self-interference). Because the disturbing signals injected into the phase detector and DCO (respectively denoted as in-band and DCO interference) behave distinctively and require different compensation strategies, this chapter firstly studies the characteristics of fractional spurs raised by these two mechanisms. These theories not only provide the foundation to distinguish the two spur-raising mechanisms, but also lead to a discovery that synchronized in-band and DCO interference signals can cancel each other if their relative phase and amplitude are properly set. Next, the spur-cancellation mechanism is experimentally verified on a fabricated chip. Finally, a digitally intensive scheme addressing the self-interference-induced spurs is developed and evaluated on the same chip.

Chapter 4 explores the techniques for improving the EVM of a phase modulator adopting a two-point PLL modulation scheme. Because a PLL-based phase modulator acquires the desired phase shift by integrating the modulation frequency over a period of the sampling clock, the PM accuracy can be degraded by both frequency- and time-related errors. We first study the time-related errors induced by the non-uniform sampling clock (adopted in response to the system-level constraints), by which the DCO frequency is modulated and integrated. To analyze the effects of the non-ideal clock, a hybrid-time domain model of the PLL-based phase modulator is developed. Then, based on this new model, a non-uniform clock compensation (NUCC) scheme is proposed to suppress the sampling-clock-induced disturbance on the PLL and to improve the phase modulation accuracy. The frequency-related error mainly arises from two categories of mechanisms related to the DCO: the $1/\sqrt{LC}$-induced nonlinearity and the nonideal behavior related to the switched-capacitor units that the DCO adopts to tune the oscillation frequency. The former is tackled by the proposed phase-domain digital predistortion (DPD), whereas the latter is compensated for by a conventional OTW-domain DPD. Combining these two DPD techniques achieves relatively carrier-frequency-insensitive error-suppressing performance, reducing the effort in calibration. Finally, these proposed techniques, i.e., NUCC and combinational DPD, are implemented on the PLL-based phase modulator. Measurement results are shown at the end to validate the performance enhancements.

Chapter 5 closes this dissertation by summarizing the outcome of this research and providing suggestions for future works.

## Chapter 2: A Fractional-N ADPLL Exploiting a Time-Mode Arithmetic Unit

The sub-sampling technique [29–33] and the narrow-range phase-detection concept [18,34] have significantly improved the phase-locked loop's (PLL) phase noise (PN) and power efficiency. Applying the narrow-range concept to a fractional-N PLL entails a phase-error-extraction block that cancels the deterministic instantaneous time offset before it is presented to the phase detector. As a result, the phase detector will expect a near-zero input, thereby allowing the use of a large phase-detection gain to suppress the noise contribution of subsequent loop blocks. However, this additional block usually exhibits a nonlinear behavior and thus can dominate the fractional spur levels in the PLL output spectrum.

This chapter<sup>1</sup> mainly tackles the nonlinearity issue of the phase-error-extraction block, so as to significantly suppress the fractional spur levels in the PLL's output spectrum. To achieve this goal, Section 2.1 first studies a common weakness of the widely adopted phase-error-extraction circuits. This has inspired a phase-error-extraction strategy that scales a 'golden' time base to cancel the undesired time offset. Section 2.2 conceptually explains a PLL utilizing this time-base-scaling strategy and the key block realizing this idea, i.e., a time-mode arithmetic unit (TAU). Then, Section 2.3 and Section 2.4 provide implementation details of the TAU and TAU-based PLL. Following that, Section 2.5 and Section 2.6 analyze the noise and nonlinearity issues of the TAU. Finally, Section 2.7 demonstrates the measurement results of the prototype PLL, and Section 2.8 concludes this chapter.

<sup>1</sup>The main body of this chapter has been published in IEEE Journal of Solid-State Circuits [35].

## 2.1 Comparison of Existing Phase-Error-Extraction Strategies

## 2.1.1 Two Commonly Used Strategies

A digital-to-time converter (DTC) [17, 20, 36–39] is the most widely used circuitry to exploit the narrow-range phase-detection concept in a fractional-N PLL. Figure 2.1 (a) illustrates a conceptual PLL (similar to that in [40]) relying on a DTC to cancel the instantaneous time offset from the significant (here, falling) edge of the reference clock (FREF) to that of the variable oscillator clock (CKV). This time offset spans from zero to one CKV period  $T_{\rm CKV}$  and is predicted by scaling the CKV period (i.e., "time base") according to  $\phi_{\rm R,frac} \in [0, 1)$ , i.e., the fractional part of the accumulated frequency control word (FCW) [26]. According to this prediction, the DTC launches a delayed FREF, FREF<sub>dly</sub>, that is substantially aligned with the relevant CKV edge in order to narrow down the input range of the phase detector.
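The prediction step can be sketched as follows. The snippet assumes that the reference phase is obtained by accumulating the FCW once per FREF cycle and that the offset to the first subsequent CKV edge is $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$, which is the nominal DTC delay used in this chapter. The example FCW of 10.25 and the function name `predicted_offsets` are illustrative assumptions.

```python
import math
from fractions import Fraction


def predicted_offsets(fcw, n_cycles):
    """Predicted FREF-to-CKV time offset per reference cycle, in CKV periods.

    phi_frac is the fractional part of the accumulated FCW; the offset to
    the first subsequent CKV edge is taken as (1 - phi_frac).
    """
    offsets = []
    phase = Fraction(0)
    for _ in range(n_cycles):
        phase += fcw                              # accumulate the FCW
        phi_frac = phase - math.floor(phase)      # fractional part, in [0, 1)
        offsets.append(1 - phi_frac)
    return offsets


offsets = predicted_offsets(Fraction(41, 4), 4)   # hypothetical FCW = 10.25
```

For this FCW the predicted offsets are 3/4, 1/2, 1/4, and 1 CKV period, after which the sequence repeats. Multiplying each value by $T_{\rm CKV}$ gives the delay that an ideal DTC would need to apply in the corresponding cycle.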



Figure 2.1: Time offset cancellation strategies to narrow the required input range of phase detector (PD) using (a) DTC and (b) voltage-domain cancellation.

This DTC-based solution is highly effective in improving phase noise. However, it potentially introduces high fractional spurs at the PLL output since the DTC delay can easily depart from its nominal expected value of $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$. Such a mismatch stems from the underlying principle of DTCs: delaying input edges based on the circuit's nominal intrinsic latency, e.g., the propagation delay of the elements in a delay-chain-based DTC [41]. This is markedly distinct from conventional digital-to-analog converters (DACs), which generate signals by scaling a stable and accurate base, e.g., a bandgap reference voltage. Given the sensitivity of the circuit's intrinsic latency to process, voltage, and temperature (PVT) variations [15,42,43], an extra effort is required in tracking and protecting the DTC's transfer function (i.e., from $\phi_{\rm R,frac}$ to DTC delay), so as to prevent the associated PLL spurs from arising. For example, [17] [44] [45] track the drift of the DTC transfer gain in the background. Refs. [18] [28] protect the DTC delay from supply variations with dedicated low-dropout regulators (LDOs) so as to alleviate any memory effects in the DTC's transfer function. Ref. [28] further uses a complementary dummy DTC to reduce the time-varying supply perturbations resulting from the main DTC. These countermeasures, however, exhibit only limited capabilities in suppressing the DTC-related spurs. When an extremely low spurious level is desired, the DTC codes might need to be modulated to smear the spurs into the noise floor [28,46,47]. These extra efforts complicate the design of the overall PLL system and degrade its power efficiency.

Instead of relying on the circuit propagation delay, [15] [43] [48] cancel the instantaneous fractional-N time offset in the voltage domain. A conceptual example emulating [15] is presented in Fig. 2.1 (b). The time offset between FREF and its subsequent CKV edge, $\Delta t_{\rm S}$, is converted into voltage $\Delta V_{\rm S}$ by the charging curve. The PLL cancels $\Delta V_{\rm S}$ with its prediction ($\Delta V_{\rm P}$) to extract the phase error information in the voltage domain ($\Delta V_{\rm e}$). Accurate error extraction here requires a charging curve of constant slope since the voltage prediction assumes a linear time-to-voltage conversion. Such a dependency is also imperfect because the slope is generated by (dis)charging a capacitor through a current source, which raises two issues: 1) it requires a stable current reference, which is costly; and 2) it suffers from a high power penalty, for two reasons. First, (dis)charging through a current source is noisy, since the output impedance is so high that noise voltage on the capacitor constantly accumulates without attenuation as long as the (dis)charge persists [14]. This point is also supported by [49], which chooses a resistor instead of a MOS transistor as its main (dis)charge device. Second, the linearity of the (dis)charging slope can be degraded by the finite impedance of the current source (i.e., a MOS transistor). As a result, circuit-level techniques such as cascoding [20] [50] become mandatory. Nevertheless, these techniques consume significant voltage headroom, thereby further exacerbating the noise issue. To address this noise issue, the associated current must be increased, unavoidably degrading the power efficiency.

## 2.1.2 Strategies Utilizing Scaled 'Golden' Time Base

The dilemmas of the two discussed methods are rooted in their dependence on the PVT-sensitive physical parameters, i.e., the intrinsic circuit latency of the DTC and the (dis)charging slope in the voltage-domain cancellation. Mathematically, a 'golden' base for the fractional-N time-offset cancellation is $T_{\rm CKV}$ since the time offset is predicted by $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$. In terms of implementation, $T_{\rm CKV}$ is also accurate and stable since it is intrinsically tracked by the PLL. Therefore, a new time-offset cancellation method adopting $T_{\rm CKV}$ as its base, which can be considered analogous to the aforementioned reference voltage in a DAC, seems promising in overcoming the difficulties.

Interestingly, the phase-interpolator-based time-offset cancellation method in the prior art can be regarded as a member of this new category. For example, [51] phase-interpolates new edges from a quadrature RF clock source to substantially cancel the time offset, just like a DTC does. That method can also be regarded as utilizing a 'golden' time base of $T_{\rm CKV}/4$. However, interpolating a new edge with arbitrary phase is intrinsically a nonlinear process [52], thereby incurring penalties from compensation. For example, [52] digitally pre-distorts the phase-interpolator's nonlinearity, requiring significant calibration efforts and power. Although [51] avoids the nonlinearity problem by cascading fixed phase-interpolation stages that only generate a new edge exactly at the middle phase of two input edges, extending this strategy to achieve fine resolution can be quite bulky and power-hungry.

To realize an intrinsically linear time-offset-cancellation strategy adopting the 'golden' time base of  $T_{\rm CKV}$ , we propose a new time-mode circuit, i.e., a time-mode arithmetic unit (TAU) processor [53] that takes timestamp offsets as inputs and outputs their weighted sum, also in the time domain. Within each PLL cycle, the TAU takes both the timestamps defining  $T_{\rm CKV}$ , as well as the timestamps defining  $\Delta t_{\rm S}$ , i.e. the offset between the oscillator and reference clock edges. Then the weighted sum of their offsets is calculated to extract the desired information (i.e., time error  $\Delta t_{\rm E}$  input to the phase detector). With the 'golden' time base of  $T_{\rm CKV}$ , the TAU-based method can exhibit high linearity and built-in resilience to the supply and temperature variations. This simplifies the overall PLL system design and helps to suppress the generated spurs. As an extra bonus, TAU can advantageously amplify the desired time residue, thereby suppressing the noise contributions from subsequent loop blocks.

## 2.2 Principle of the Proposed PLL

### 2.2.1 Conceptual Architecture

Figure 2.2 shows a conceptual diagram of the proposed fractional-N PLL. To track the reference phase by the DCO, the proposed TAU extracts the time error ( $\Delta t_{\rm E}$ ) between the FREF and CKV timestamps. This  $\Delta t_{\rm E}$  is quantized by the time-to-digital converter (TDC) and input to the digital loop filter for the DCO phase error correction.



Figure 2.2: Conceptual diagram of the proposed TAU-based PLL.

Generally,  $\Delta t_{\rm E}$  'hides' within  $\Delta t_{\rm S}$ , which is the instantaneous 'raw' time offset between FREF and the first subsequent CKV falling edge, with theoretical prediction of  $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$ . Therefore, extracting  $\Delta t_{\rm E}$  requires canceling  $\Delta t_{\rm S}$  with its prediction. In the proposed system, the TAU samples  $\Delta t_{\rm S}$  and  $T_{\rm CKV}$ , then calculates their weighted sum to extract  $\Delta t_{\rm E}$ . To further help with suppressing the TDC quantization noise, the TAU also time-amplifies the extracted error by  $G_{\rm TA}$  before feeding it to the TDC. Thus, the TAU's output can be described as

$$\Delta t_{\rm E} = G_{\rm TA} \cdot \left[ (1 - \phi_{\rm R, frac}) \cdot T_{\rm CKV} - \Delta t_{\rm S} \right]. \tag{2.1}$$

More abstractly, if  $T_{\text{CKV}}$  and  $\Delta t_{\text{S}}$  are viewed as general inputs, and  $G_{\text{TA}}$  and  $\phi_{\text{R,frac}}$  are treated as their weights, the TAU's function can be generalized as producing the weighted sum of its inputs:

$$\Delta t_{\rm out} = \sum_{i=1}^{n} w_i \cdot \Delta t_i, \qquad (2.2)$$

where  $\Delta t_i$  is the *i*<sup>th</sup> input time offset,  $w_i$  is the weight applied to  $\Delta t_i$ , *n* is the total number of inputs, and  $\Delta t_{out}$  is the output time offset. Note that,  $\Delta t_i$  and  $\Delta t_{out}$  are generally defined as the time offsets between arbitrary edges.

To realize this conceptual PLL system, we first realize this generalized TAU, then program it to calculate the result required by (2.1).
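As a quick numerical check of how (2.1) maps onto the generalized form (2.2), the sketch below expands (2.1) into two weights, $G_{\rm TA} \cdot (1 - \phi_{\rm R,frac})$ applied to $T_{\rm CKV}$ and $-G_{\rm TA}$ applied to $\Delta t_{\rm S}$. The function names and all numerical values (a gain of 4, a 400 ps CKV period, a 2 ps excess offset) are assumptions for illustration only.

```python
def weighted_sum(weights, time_offsets):
    """Generalized TAU function (2.2): weighted sum of input time offsets."""
    return sum(w * dt for w, dt in zip(weights, time_offsets))


def tau_output(g_ta, phi_frac, t_ckv, dt_s):
    """Evaluate (2.1) by expanding it into the form of (2.2)."""
    weights = [g_ta * (1 - phi_frac), -g_ta]   # applied to T_CKV and dt_S
    return weighted_sum(weights, [t_ckv, dt_s])


T_CKV = 400e-12            # hypothetical CKV period
err = 2e-12                # hypothetical excess offset hidden in dt_S
dt_e = tau_output(g_ta=4, phi_frac=0.25, t_ckv=T_CKV,
                  dt_s=0.75 * T_CKV + err)
```

Here $\Delta t_{\rm S}$ exceeds its prediction by 2 ps, so the deterministic part cancels and the output is the amplified residue, about -8 ps under the sign convention of (2.1).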

## 2.2.2 Evolution from Time Register to TAU

Figure 2.3: Conceptual and timing diagrams of time register (TR). $\Delta t_i$ is the $i^{\text{th}}$ time-domain input. $\Delta t_{\text{out}}$ is the time-domain output.

The starting point for implementing the TAU is a time register (TR), which takes pulse-widths as inputs, holds them, and then outputs their sum in a complementary form [54]. Fig. 2.3 illustrates how to achieve these functions with a simplified RC model of the TR [55]. Before a new execution cycle, capacitor C is charged to an initial voltage $V_{\text{init}}$ by closing the charging switch SWC. After SWC is disconnected, the TR processes the active-low pulses on the discharge switch SWD by means of storing their pulse-widths as voltage drops on capacitor C. For example, during the first pulse, the switch SWD is closed to discharge capacitor C through resistor R. After $\Delta t_1$, the duration of the first pulse, the voltage on the capacitor $V_{\rm C}$ drops from $V_{\text{init}}$ to $V_1 = V_{\text{init}} \cdot \exp(-\Delta t_1/\tau_0)$, where $\tau_0 = R \cdot C$ is the RC time constant for discharging. Hence, the input time $\Delta t_1$ is recorded in the TR as a voltage drop $V_{\text{init}} - V_1$. Similarly, after the second pulse, $V_{\text{C}}$ drops to $V_2 = V_1 \cdot \exp(-\Delta t_2/\tau_0) = V_{\text{init}} \cdot \exp(-\Delta t_1/\tau_0 - \Delta t_2/\tau_0)$. The new input time $\Delta t_2$ is internally summed with the pre-stored $\Delta t_1$ and recorded as $V_{\text{init}} - V_2$. The TR can continue to process more inputs as long as $V_{\rm C}$ is higher than $V_{\rm th}$, i.e., the threshold voltage of the level-crossing comparator (slicer). Assuming the TR has processed $n$ pulses in total, the final $V_{\rm C}$ becomes

$$V_n = V_{n-1} \cdot \exp\left(-\frac{\Delta t_n}{\tau_0}\right) = V_{\text{init}} \cdot \exp\left(-\sum_{i=1}^n \frac{\Delta t_i}{\tau_0}\right),\tag{2.3}$$

where  $\Delta t_i$  is the width of the *i*<sup>th</sup> pulse. To read the recorded time, SWD is pulled down to discharge the capacitor voltage  $V_{\rm C}$  to below  $V_{\rm th}$ , thereby asserting the comparator's output CMP. The delay between the last SWD and CMP falling edges reflects the processed result, which is an offset (equal to the duration in which  $V_{\rm C}$  is continuously discharged from  $V_{\rm init}$  to  $V_{\rm th}$ ) minus the sum of all time inputs:

$$\Delta t_{\text{out}} = \tau_0 \ln \frac{V_n}{V_{\text{th}}} = \tau_0 \ln \frac{V_{\text{init}}}{V_{\text{th}}} - \sum_{i=1}^n \Delta t_i. \tag{2.4}$$
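The RC model behind (2.3) and (2.4) can be checked with a short pulse-by-pulse simulation. This is a minimal sketch of the idealized model only; the component values ($\tau_0 = 1$ ns, $V_{\text{init}} = 1$ V, $V_{\text{th}} = 0.5$ V), the pulse widths, and the function name `time_register` are hypothetical.

```python
import math


def time_register(pulse_widths, tau0, v_init, v_th):
    """Simulate the RC time-register model pulse by pulse.

    Returns the read-out delay: the time needed to discharge the
    capacitor from its final voltage V_n down to V_th.
    """
    v_c = v_init
    for dt in pulse_widths:
        v_c *= math.exp(-dt / tau0)        # capacitor voltage after each pulse
    if v_c <= v_th:
        raise ValueError("inputs exceed the register's range")
    return tau0 * math.log(v_c / v_th)


tau0, v_init, v_th = 1e-9, 1.0, 0.5        # hypothetical component values
pulses = [100e-12, 150e-12, 50e-12]
dt_out = time_register(pulses, tau0, v_init, v_th)
closed_form = tau0 * math.log(v_init / v_th) - sum(pulses)
```

The simulated read-out delay agrees with the closed form of (2.4): a fixed offset $\tau_0 \ln(V_{\text{init}}/V_{\text{th}})$ minus the sum of the three pulse widths.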



Figure 2.4: Conceptual and timing diagrams of weighted time register (WTR).  $\Delta t_i$  is the  $i^{\text{th}}$  time-domain input.  $\Delta t_{\text{out}}$  is the time domain output.

A quick comparison between (2.4) and (2.2) suggests a crucial limitation of the TR: its weight for each $\Delta t_i$ can only be 1 instead of an arbitrary $w_i$. The weighted time register (WTR) shown in Fig. 2.4 overcomes this limitation by replacing the fixed resistor R and capacitor C with variable ones, $R_V$ and $C_V$. With this change, the WTR acquires a new degree of freedom, i.e., the variable RC time constant $\tau = R_V \cdot C_V$, to influence each pulse's discharge speed and the resulting voltage drop on $V_{\rm C}$. Accordingly, the WTR's final output becomes

$$\Delta t_{\rm out} = \tau_{\rm out} \cdot \ln \frac{V_{\rm init}}{V_{\rm th}} - \sum_{i=1}^{n} \frac{\tau_{\rm out}}{\tau_i} \Delta t_i, \qquad (2.5)$$

where  $\tau_i$  is the *RC* time constant for  $\Delta t_i$ , and  $\tau_{out}$  is the *RC* time constant for the final output discharge. Here, an arbitrary weight,  $w_i = \tau_{out}/\tau_i$ , is effectively applied to  $\Delta t_i$ .
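The weighting mechanism of (2.5) can be reproduced by letting each pulse discharge with its own time constant. The sketch below is an idealized model with hypothetical values: two pulses of 100 ps and 200 ps, time constants of 2 ns and 0.5 ns, and an output time constant of 1 ns, giving weights of 0.5 and 2. The function name `weighted_time_register` is illustrative.

```python
import math


def weighted_time_register(pulse_widths, taus, tau_out, v_init, v_th):
    """Simulate the WTR: each pulse discharges with its own RC constant,
    and the final read-out discharge uses tau_out."""
    v_c = v_init
    for dt, tau in zip(pulse_widths, taus):
        v_c *= math.exp(-dt / tau)
    return tau_out * math.log(v_c / v_th)


pulses = [100e-12, 200e-12]
taus = [2e-9, 0.5e-9]       # hypothetical; weights tau_out/tau_i = 0.5 and 2
tau_out, v_init, v_th = 1e-9, 1.0, 0.4
dt_out = weighted_time_register(pulses, taus, tau_out, v_init, v_th)
expected = tau_out * math.log(v_init / v_th) - (0.5 * 100e-12 + 2 * 200e-12)
```

The simulated output matches (2.5): the offset term minus 0.5 times the first pulse width and 2 times the second.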

Figure 2.5: Conceptual and timing diagrams of differential weighted time registers (DWTR). $\Delta t_i$ is the $i^{\text{th}}$ time-domain input. $\Delta t_{\text{out}}$ is the time-domain output.

Although the WTR achieves the weighted sum $\left[\sum_{i=1}^{n} (\tau_{\text{out}}/\tau_i) \cdot \Delta t_i\right]$, the offset term $\tau_{\text{out}} \cdot \ln (V_{\text{init}}/V_{\text{th}})$ in its output raises undesired issues. This term indicates the WTR's sensitivity to voltages, i.e., $V_{\text{init}}$ and $V_{\text{th}}$, and physical parameters, e.g., $\tau_{\text{out}}$, which can ultimately lead to a severe PVT susceptibility. This term is advantageously canceled in a differential WTR (DWTR) configuration shown in Fig. 2.5. Two identical WTRs operate there in parallel and share the common resistive and capacitive tuning terminals, RT and CT. Hence, the same RC time constant $\tau_i$ is applied to their $i^{\text{th}}$ input pair (i.e., $\Delta t_{i,P}$ and $\Delta t_{i,N}$). Non-shared pins of the two WTRs are distinguished with subscripts P and N. The outputs of the two individual WTRs follow the same rule as (2.5). Combining these outputs differentially, the PVT-sensitive offset terms cancel each other out:

$$\Delta t_{\text{out}} = \Delta t_{\text{out,N}} - \Delta t_{\text{out,P}} = \sum_{i=1}^{n} \frac{\tau_{\text{out}}}{\tau_i} \cdot (\Delta t_{i,\text{P}} - \Delta t_{i,\text{N}}). \tag{2.6}$$
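The cancellation of the offset term can be illustrated with a small sketch that evaluates two ideal WTRs sharing the same time constants and then sweeps $V_{\rm init}$ and $V_{\rm th}$. The model (ideal exponential discharge, perfectly matched paths) and all numeric values are assumptions for illustration only.

```python
import math

def wtr_output(dts, taus, tau_out, v_init, v_th):
    """Single-ended WTR read-out time under ideal RC discharge, as in (2.5)."""
    v_c = v_init
    for dt, tau in zip(dts, taus):
        v_c *= math.exp(-dt / tau)
    return tau_out * math.log(v_c / v_th)

def dwtr_output(dts_p, dts_n, taus, tau_out, v_init, v_th):
    """Differential read-out: time offset between the N and P outputs, as in (2.6)."""
    out_p = wtr_output(dts_p, taus, tau_out, v_init, v_th)
    out_n = wtr_output(dts_n, taus, tau_out, v_init, v_th)
    return out_n - out_p

# Hypothetical pulse widths on the P and N paths and shared time constants.
dts_p, dts_n = [120e-12, 80e-12], [100e-12, 90e-12]
taus, tau_out = [1e-9, 2e-9], 4e-9
for v_init, v_th in [(1.0, 0.4), (0.9, 0.45), (1.1, 0.35)]:
    print(v_init, v_th, dwtr_output(dts_p, dts_n, taus, tau_out, v_init, v_th))
```

Under these assumptions the differential output stays the same for every voltage pair, whereas each single-ended output shifts with $\ln(V_{\rm init}/V_{\rm th})$. Mismatch between the two paths is not modeled here.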

Nevertheless, the differential inputs and output required by the DWTR are too complex to use—they are the pulse-width differences ( $\Delta t_{i,P} - \Delta t_{i,N}$  and  $\Delta t_{out,N} - \Delta t_{out,P}$ ), instead of the time differences defined in (2.2). Therefore, their form is redefined. For the output, we simply impose a constraint that the last falling edges on the SWD<sub>P</sub> and SWD<sub>N</sub> must be launched simultaneously. Then, the differential output  $\Delta t_{out}$  is reinterpreted as the time offset between CMP<sub>P</sub> and CMP<sub>N</sub>, which equals  $\Delta t_{out,N} - \Delta t_{out,P}$  (Fig. 2.5).

For the input form conversion, the proposed TAU employs a phase/frequency detector (PFD). As shown in Fig. 2.6, the PFD bridges the gap between the overall TAU input, i.e. the time difference between  $\text{TIN}_{\text{P}}$  and  $\text{TIN}_{\text{N}}$  falling edges, and the DWTR input, i.e. the width difference of the pulse-pair on SWD<sub>P</sub> and SWD<sub>N</sub>. To do so, the PFD first pulls down SWD<sub>P</sub> and SWD<sub>N</sub> at the TIN<sub>P</sub> and TIN<sub>N</sub> falling edges, respectively. Once both SWDs become low, the PFD resets itself to pull them up simultaneously. By doing so,



Figure 2.6: Conceptual and timing diagrams of time-mode arithmetic unit (TAU).  $\Delta t_i$  is the  $i^{\text{th}}$  time-domain input.  $\Delta t_{\text{out}}$  is the time domain output.

the PFD converts the input time difference to the pulse-width difference. However, during the TAU output processing, the SWDs should stay LOW to keep discharging the WTRs until the falling edges of both CMPs are asserted. During this period, the PFD should not revert the SWDs to HIGH because this would disrupt the output process. Therefore, when  $\overline{\text{READ}} = 0$  triggers the final output, it also blocks the PFD's reset (the second mode of the PFD), and thus the SWD recovery.
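The conversion from an edge-time difference to a pulse-width difference can be sketched as follows. The function name, the fixed reset delay `t_reset`, and the sign convention (P width minus N width) are assumptions made for illustration; only the normal self-resetting mode is modeled, not the blocked-reset mode used for the final output.

```python
def pfd_pulse_pair(t_p, t_n, t_reset=20e-12):
    """Return (width_p, width_n) of the active-low pulses on SWD_P and SWD_N.

    t_p, t_n: falling-edge times of TIN_P and TIN_N (seconds).
    t_reset:  hypothetical PFD reset delay after both SWDs are LOW.
    """
    t_rise = max(t_p, t_n) + t_reset    # both SWDs return HIGH simultaneously
    return t_rise - t_p, t_rise - t_n

# TIN_P leads TIN_N by 30 ps: the width difference reproduces the input offset,
# while the shorter pulse equals the reset delay (a common-mode discharge).
w_p, w_n = pfd_pulse_pair(0.0, 30e-12)
print(w_p - w_n, min(w_p, w_n))
```

Because both pulses end at the same instant, the width difference depends only on the two falling-edge times, and the reset delay appears equally on both paths.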

The output of the proposed TAU is

$$\Delta t_{\rm out} = \sum_{i=1}^{n} \frac{\tau_{\rm out}}{\tau_i} \Delta t_i, \qquad (2.7)$$

where  $\Delta t_i$  is the input time difference between the  $i^{\text{th}}$  pair of the TIN<sub>P/N</sub> falling edges, and  $\Delta t_{\text{out}}$  is the output time offset between CMP<sub>P/N</sub>. The TAU calculates the weighted sum of all inputs, whose weights can be manipulated by tuning the corresponding *RC* time constants ( $\tau_{\text{out}}$  and  $\tau_i$ 's). Therefore, the TAU's definition in Section 2.2.1 can be satisfied. However, one may still question the equivalence between (2.7) and (2.2) since the weights are positive-only in the former ( $\tau_{\text{out}}/\tau_i$ ), but can also be negative in the latter ( $w_i$ ). This limitation can be addressed by transferring the weight's  $\pm$  sign to its associated input  $\Delta t_i$ , whose polarity is determined by the corresponding leading-falling edge on the TINs [see TIN<sub>P/N</sub> in Fig. 2.6]. In our implementation shown later, we achieve the negative weight by deliberately swapping the leading-falling edges in the corresponding active-low SWD pulse-pair.

## 2.2.3 RC tuning in the WTR



Figure 2.7: RC tuning in the weighted time register (WTR).

To further detail the weight control in (2.7) by means of  $\tau_{out}/\tau_i$, Fig. 2.7 reifies the variable resistance and capacitance introduced in the conceptual WTR of Fig. 2.4. The variable resistor is implemented with a switched-resistor (SR) bank, consisting of parallel unit resistors,  $R_{\rm U}$. RT determines the number of actively discharging  $R_{\rm U}$'s (8 in total). Meanwhile, the variable capacitor is realized with a fixed capacitor  $C_0$  and a switched-capacitor (SC) bank, consisting of parallel unit capacitors,  $C_{\rm U}$, whose active count is controlled by CT. Therefore, the RC time constant can be controlled as

$$\tau = \frac{R_{\rm U}}{\mathrm{RT}} \cdot (C_0 + C_{\rm U} \cdot \mathrm{CT}). \tag{2.8}$$
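As a quick illustration of (2.8), the sketch below computes $\tau$ from the RT and CT codes and forms a weight as a ratio of two time constants. The component values (`R_U`, `C_0`, `C_U`) are placeholders chosen only to make the example run; they are not the implemented values.

```python
R_U = 8e3      # unit resistance in ohms (hypothetical)
C_0 = 1e-12    # fixed capacitance in farads (hypothetical)
C_U = 1e-15    # unit capacitance in farads (hypothetical)

def rc_time_constant(rt, ct, r_u=R_U, c_0=C_0, c_u=C_U):
    """Time constant per (2.8); rt is the number of parallel unit resistors in use."""
    if rt < 1:
        raise ValueError("RT = 0 leaves no discharge path, so tau is unbounded")
    return (r_u / rt) * (c_0 + c_u * ct)

# Weight tau_out / tau_i when the output uses RT = 1 and the input uses RT = 8.
weight = rc_time_constant(1, 0) / rc_time_constant(8, 0)
print(weight)
```

The unit values cancel in the ratio, which is why the weights in (2.7) depend only on the codes and on $C_{\rm U}/C_0$.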

Note that during the complete TAU execution cycle (from the reset to output), increasing CT would engage new  $V_{\text{init}}$ -precharged capacitor units, which would lead to charge sharing, thus erroneously increasing the  $V_C$  voltage. Therefore, CT is constrained to stay constant or decrease when processing the TAU inputs (see Fig. 2.8).
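The charge-sharing effect follows from charge conservation and can be illustrated with a one-line model. The function name and numeric values are hypothetical, and switch non-idealities are ignored.

```python
def engage_unit(v_c, c_active, c_u, v_init):
    """V_C after connecting one unit capacitor, still held at V_init, to the active node."""
    total_charge = c_active * v_c + c_u * v_init   # charge is conserved at the shared node
    return total_charge / (c_active + c_u)

# A partly discharged node (0.6 V) is pulled back up when a 1.0 V unit is engaged.
print(engage_unit(v_c=0.6, c_active=1e-12, c_u=1e-13, v_init=1.0))
```

Any unit engaged after the node has started discharging raises $V_{\rm C}$, which would corrupt the stored weighted sum.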

The RC tuning of the WTR is introduced here to pave the way for the TAU control flow design in the next section. Other details are deferred to Section 2.3.5.

## 2.2.4 TAU Control Flow within the Proposed PLL

The basis of the TAU in the proposed PLL system stems from (2.1). It was then abstracted as computing the weighted sum of its time inputs, which also generalizes the TAU functionality, i.e. (2.7). To program the TAU to execute (2.1), we designed a dedicated control flow to ensure that the TAU receives  $T_{\rm CKV}$  and  $\Delta t_{\rm S}$  [i.e. time inputs of (2.1)], assigns proper weights to them, and outputs the weighted sum.



Figure 2.8: Timing diagram of the differential WTRs in a complete TAU execution cycle.

According to Fig. 2.8, the TAU processes four time-domain inputs in a single execution cycle. By tuning the RT and CT control pins, different RC time constants ( $\tau$ 's) can be assigned to each input. According to (2.7), the resulting output is

$$\Delta t_{\rm out} = \frac{\tau_A}{\tau_1} T_{\rm CKV} + \frac{\tau_A}{\tau_2} T_{\rm CKV} - \frac{\tau_A}{\tau_3} T_{\rm CKV} - \frac{\tau_A}{\tau_S} \Delta t_{\rm S}, \qquad (2.9)$$

where  $\tau_1$ ,  $\tau_2$ , and  $\tau_3$  are the *RC* time constants during the 1<sup>st</sup> to 3<sup>rd</sup> discharge, while  $\tau_S$  and  $\tau_A$  are those during the  $\Delta t_S$  sampling and final output, respectively. The minus signs result from the swapped leading-falling edges in the corresponding SWD pulse-pairs, as discussed in Section 2.2.2. By replacing the  $\tau$  symbols with their respective components in (2.8),  $\Delta t_{out}$  becomes

$$\Delta t_{\rm out} = 8 \left[ \left( \frac{1}{1 + N_{\rm C} \cdot \frac{C_{\rm U}}{C_0}} - \frac{N_{\rm R}}{8} \right) T_{\rm CKV} + \frac{3}{8} T_{\rm CKV} - \Delta t_{\rm S} \right], \tag{2.10}$$

where  $N_{\rm C}$  is the CT code during the 1<sup>st</sup> discharge and  $N_{\rm R}$  is the RT code during the 3<sup>rd</sup> discharge. To explain the correlation between this output and the functional requirement in (2.1), the TAU execution cycle is divided into a reset state and three functional states: pre-discharge, snapshot, and time amplification (TA). Each of them realizes one term or coefficient in (2.1).

The execution cycle starts with the reset state, in which the SWC closes the relevant switches in the WTRs to charge all the capacitors (CT = max) to  $V_{\text{init}}$ . Then, the non-critical FREF (i.e. rising) edge disconnects the SWC switches and triggers the pre-discharge state, in which the TAU calculates and stores the  $\Delta t_{\rm S}$  prediction term,  $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$ . The prediction is realized by the weighted sum of three  $T_{\rm CKV}$ 's, which are generated by sampling the CKV period and reflected on the width differences of the active-low SWD pulse-pairs. During the first SWD pulse-pair, the capacitive tuning code  $N_{\rm C}$ (on CT) is applied to finely scale  $T_{\rm CKV}$ . During the third one, the resistive tuning code  $N_{\rm R}$  (on RT) scales  $T_{\rm CKV}$  coarsely. The difference between these two scaled inputs realizes the  $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$  term in (2.1) with

$$\phi_{\rm R,frac} = \frac{N_{\rm R}}{8} + \left(1 - \frac{1}{1 + N_{\rm C} \cdot \frac{C_{\rm U}}{C_0}}\right) \tag{2.11}$$

Here,  $N_{\rm R}$  ranges from 0 to 7, yielding the resolution of 1/8 in  $\phi_{\rm R,frac}$  tuning. Consequently, the  $N_{\rm C}$  term needs only to cover the tuning range of  $0 \sim 1/8$. Within such a narrow range, the nonlinearity of the mapping between  $N_{\rm C}$  and  $\phi_{\rm R,frac}$  is insignificant and simple to compensate. One may notice that (2.1) does not reflect the influence of the second discharge. This discharge introduces an extra offset of  $3/8 \cdot T_{\rm CKV}$  for metastability mitigation, to be discussed in Section 2.3.2.1.

After these three discharges, TAU enters the snapshot state, in which the WTRs directly subtract the sampled  $\Delta t_{\rm S}$  from the pre-stored prediction. This realizes the  $-\Delta t_{\rm S}$  term in (2.1). As a result, only the desired residue (substantially reflecting the DCO phase noise in the phase-locked state) remains in the TAU. Finally, in the TA state, the TAU outputs this residue as the time offset between CMP<sub>P</sub> and CMP<sub>N</sub> ( $\Delta t_{\rm out}$ ). During this process, the residue is also time-amplified by

$$G_{\rm TA} = \frac{\tau_A}{\tau_S} = 8. \tag{2.12}$$

This gain factor corresponds to  $G_{\text{TA}}$  in (2.1), and is realized by manipulating the ratio between  $\tau_A$  and  $\tau_S$ , more specifically, the RT code during the TA and snapshot states. After generating the outputs, the TAU returns to the reset state, awaiting the next cycle.
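The control flow above can be condensed into a short numerical sketch that applies (2.9) discharge by discharge and compares the result with (2.10). The RT/CT codes for the second discharge, the snapshot, and the TA state (RT = 3, 8, and 1 with CT = 0) are not stated explicitly in the text; they are inferred here by matching (2.9) to (2.10) through (2.8). The ratio `CU_OVER_C0` and the timing values are arbitrary illustrative numbers.

```python
CU_OVER_C0 = 1 / 896   # hypothetical C_U / C_0 ratio

def weight(rt_i, ct_i, rt_out=1, ct_out=0, k=CU_OVER_C0):
    """tau_out / tau_i from (2.8); RT = 0 means no discharge, hence zero weight."""
    return (rt_i / rt_out) * (1 + ct_out * k) / (1 + ct_i * k)

def tau_cycle(t_ckv, dt_s, n_r, n_c):
    """Weighted sum of the four inputs in one execution cycle, following (2.9)."""
    steps = [
        (+1, 8, n_c, t_ckv),   # 1st discharge: fine scaling through CT = N_C
        (+1, 3, 0, t_ckv),     # 2nd discharge: 3/8 * T_CKV offset (inferred RT = 3)
        (-1, n_r, 0, t_ckv),   # 3rd discharge: coarse scaling through RT = N_R
        (-1, 8, 0, dt_s),      # snapshot: subtract the sampled dt_S (inferred RT = 8)
    ]
    return sum(sign * weight(rt, ct) * x for sign, rt, ct, x in steps)

def eq_2_10(t_ckv, dt_s, n_r, n_c, k=CU_OVER_C0):
    """Closed form (2.10)."""
    return 8 * ((1 / (1 + n_c * k) - n_r / 8) * t_ckv + 3 / 8 * t_ckv - dt_s)

print(tau_cycle(400e-12, 300e-12, 5, 40), eq_2_10(400e-12, 300e-12, 5, 40))
```

In this model, a sampled $\Delta t_{\rm S}$ that exceeds the stored prediction by a small residue yields an output of minus eight times that residue, which is the time amplification described by (2.12).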

# 2.3 Circuit-Level Implementation of TAU

# 2.3.1 TAU Sub-System Overview



Figure 2.9: (a) Simplified diagram of the TAU-centered sub-system (without calibration circuitry shown); (b) Timing diagram of the state transition (indicated by  $\overline{\text{RST}_{all}}$ ,  $\text{PDIS}_{done}$ , and  $\overline{\text{TA}_{en}}$ ).

Figure 2.9 (a) illustrates the implemented TAU together with the auxiliary circuits that control its behavior in each state defined in Section 2.2.4. The PFD is actually realized in a more complex tri-mode form in order to effectively support the three distinct functional states: pre-discharge, snapshot, and TA. The TAU is alternately controlled by the global and local finite state machines (FSM). Figure 2.9 (b) shows the active FSM in each TAU state, indicated by  $\overline{\text{RST}_{\text{all}}}$, PDIS<sub>done</sub>, and  $\overline{\text{TA}_{\text{en}}}$. In the pre-discharge state, the local FSM is active. It interacts with the tri-mode PFD (through  $\overline{\text{START}}$  and READY) to generate the first three inputs for the WTRs (pulse-pairs on SWD<sub>P</sub> and SWD<sub>N</sub>). Meanwhile, the local FSM adjusts the weight for each input (through RT, CT, and SIGN), whose  $\phi_{R,\text{frac}}$-dependent weight codes, i.e.  $N_{\rm R}$  and  $N_{\rm C}$, are calculated by the RC encoder according to (2.11). Once the TAU processes the first three inputs, the local FSM terminates the pre-discharge state and activates the global FSM through PDIS<sub>done</sub> = 1, which controls the TAU in the remaining states.

In the snapshot state, the global FSM captures  $\Delta t_{\rm S}$  and transfers it to the TAU via CKRG<sub>P</sub> and CKRG<sub>N</sub>. To mitigate the issue of potential metastability in the  $\Delta t_{\rm S}$  sampling (Section 2.3.2.1), an *anti-alignment* delay (between FREF and FREF') is added. In the TA state, the global FSM controls the local FSM to apply proper RT for  $G_{\rm TA}$  and prepares the TAU for final output, both by setting  $\overline{\rm TA}_{\rm en} = 0$. While waiting for the TAU output, the global FSM also launches CKU, a master clock of the overall PLL. After the TAU output is quantized by its subsequent TDC (indicated by  $\overline{\rm TDC}_{\rm done}$  falling), the global FSM resets the overall TAU sub-system with  $\overline{\rm RST}_{\rm all} = 0$. When this global reset is removed ( $\overline{\rm RST}_{\rm all} = 1$, by the FREF rising edge), the local FSM will be activated again, starting the next execution cycle.

### 2.3.2 Implementation of the Global FSM

### 2.3.2.1 Differential Snapshot Circuit

In the snapshot state, the global FSM conveys the  $\Delta t_{\rm S}$  information to the TAU via CKRG<sub>P</sub> and CKRG<sub>N</sub>. Inside the global FSM,  $\Delta t_{\rm S}$  is sampled by the differential snapshot circuit. As shown in Fig. 2.10, it contains two similar single-ended paths, modified from [40]. The P-path captures the first CKV falling edge after FREF'. To achieve this, FREF' first inactivates the reset on the main flip-flop (FREF'=0), and releases CK1, the gated CKV. Once CKV falls, the main flip-flop asserts CKRG<sub>P</sub>. On the N-path, CKRG<sub>N</sub> is asserted at the FREF falling edge (since  $PDIS_{done} = 1$  in the snapshot state). Ideally, the propagation delays on these two paths are canceled, so the time offset between the CKRGs equals that between FREF and CKV, which is  $\Delta t_{\rm S}$ . One may also notice CKR<sub>en</sub>, the gating signal of CKRGs, in the differential snapshot circuit. It is scheduled by the global FSM (Fig. 2.12) for two purposes: First, in the TA state, it launches the concurrent rising edges on the CKRGs, to trigger the TAU output. Second, in the pre-discharge and reset states, it blocks activities on the CKRGs to avoid interfering with the tri-mode PFD.

The differential snapshot circuit can sample  $\Delta t_{\rm S}$  accurately only if its N- & P-path propagation delays are properly canceled. However, in reality, the flip-flop metastability may corrupt this condition, thus distorting the



Figure 2.10: Differential snapshot circuit: (a) schematic, (b) waveforms (for the case of  $\phi_{R,frac} \ge 0.5$ ).

sampled  $\Delta t_{\rm S}$ . For example, in the P-path, the flip-flop's CK-to-Q delay can dramatically increase when the reset removal (FREF' falling) is close to the subsequent critical clock edge (CK1 falling). This occurs with a certain probability (determined by the flip-flop's metastability window) in a fractional-N PLL mode because the time offset between the FREF and CKV edges (also, by extension, the offset between FREF' and CK1, if FREF' aligns with FREF due to the lack of the anti-alignment delay in Fig. 2.10(a)) distributes uniformly between 0 and  $T_{\rm CKV}$ . In contrast, the N-path is free from this issue since its reset, inverse of PDIS<sub>done</sub>, can be guaranteed to settle sufficiently earlier than CK2 (or FREF). Consequently, the P-path delay variation can reflect on the time offset between CKRG<sub>P</sub> and CKRG<sub>N</sub>, thus adding uncertainty to the sampled  $\Delta t_{\rm S}$ .

To avoid this flip-flop metastability issue, we add a conditional *anti-alignment* delay, either 0 or  $T_{\rm CKV}/2$, between FREF' and FREF according to the  $\Delta t_{\rm S}$  prediction [i.e.,  $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$ ]. Consequently, the FREF' falling edge can be sufficiently separated from its neighboring CKV (strictly speaking, CK1) falling edge, and the flip-flop metastability will not occur. The corresponding delay logic is shown in Fig. 2.10(a). To assist the metastability mitigation, we also add to  $\Delta t_{\rm S}$  an additional offset of  $3T_{\rm CKV}/8$, during the 2<sup>nd</sup> discharge in the pre-discharge state (see Fig. 2.8). Since any type-II PLL always keeps a zero-mean input to the loop filter, this offset finally appears in the expected  $\Delta t_{\rm S}$:

$$\Delta t_{\rm S} = (1 - \phi_{\rm R, frac}) \cdot T_{\rm CKV} + \frac{3}{8} T_{\rm CKV}. \tag{2.13}$$



Figure 2.11: Boundary cases of the metastability mitigation mechanism that prevents the insufficient separation between FREF' and the subsequent CKV edge [corresponding to CK1 in Fig. 2.10 (a)].

To explain how this metastability mitigation mechanism works and why the additional  $\Delta t_{\rm S}$  offset is needed, four boundary cases are examined in Fig. 2.11. From subfigures (a) to (d), these cases are arranged with increasing  $\Delta t_{\rm S}$  (hence, decreasing  $\phi_{\rm R,frac}$ ). In (a), FREF' is relatively close to the subsequent CKV. As  $\Delta t_{\rm S}$  increases, FREF' moves away from the subsequent CKV edge, but gets closer to the precedent CKV edge until (b), right before the anti-alignment delay changes (controlled by SelDelay). At the moment SelDelay switches from 0 to 1 [see (c) when  $\phi_{\rm R,frac} = 0.5$ ], FREF' is shifted by  $T_{\rm CKV}/2$, thus closer to the subsequent CKV edge again, just as in (a). Then, as  $\Delta t_{\rm S}$  increases, FREF' gradually moves away from the subsequent CKV edge and closer to the precedent CKV edge until  $\Delta t_{\rm S}$  reaches its maximum in (d), repeating the trend from (a) to (b).

There are two critical timing separations in these boundary cases. The first one is the minimum level of separation between FREF' and the subsequent CKV edge [see the light blue shaded area in subfigures (a) and (c)]. The exact value of this separation is controlled by the intentional  $\Delta t_{\rm S}$  'bias' offset (i.e.,  $3T_{\rm CKV}/8$  in our case) in the pre-discharge state, and so increasing it helps to mitigate the linearity degradation due to metastability. The second is the minimum separation between FREF' and the precedent CKV edge [see the light red shaded area in subfigures (b) and (d)]. This separation equals  $T_{\rm CKV}/2$  minus the intentional  $\Delta t_{\rm S}$  offset and is essential to avoid FREF' being caught up with the precedent CKV edge, which would cause the snapshot circuit to capture the wrong  $\Delta t_{\rm S}$ . The exact value of this separation is not so critical as long as it does not cross zero.

Interestingly, the sum of these two critical separations equals  $T_{\rm CKV}/2$ . It seems optimal to equally allocate  $T_{\rm CKV}/2$  to these two separations, i.e.,  $T_{\rm CKV}/4$  for either. However, because the separation between FREF' and the subsequent CKV edge can cause the linearity issue, we prefer to assign more margin to it, thus finally choosing  $3T_{\rm CKV}/8$  as the intentional  $\Delta t_{\rm S}$  offset.
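The two critical separations can be tabulated with a small sketch in units of $T_{\rm CKV}$. It models FREF' as FREF delayed by 0 or $T_{\rm CKV}/2$ and takes the expected $\Delta t_{\rm S}$ from (2.13). The selection rule used here (delay of $T_{\rm CKV}/2$ when $\phi_{\rm R,frac} \le 0.5$) is an assumption consistent with the boundary cases described above; the exact handling of the boundary value in the implemented delay logic is not specified in the text.

```python
from fractions import Fraction

OFFSET = Fraction(3, 8)   # intentional dt_S offset, in units of T_CKV

def separations(phi):
    """Return (to_next, to_prev): distance from FREF' to the captured CKV edge
    and to the preceding CKV edge, both in units of T_CKV."""
    dt_s = (1 - phi) + OFFSET                        # expected dt_S per (2.13)
    # Assumed SelDelay rule: add T_CKV/2 for the larger dt_S values.
    delay = Fraction(1, 2) if phi <= Fraction(1, 2) else Fraction(0)
    to_next = dt_s - delay                           # FREF' to the subsequent CKV edge
    to_prev = 1 + delay - dt_s                       # precedent CKV edge to FREF'
    return to_next, to_prev

grid = [Fraction(k, 1024) for k in range(1024)]      # phi in [0, 1)
min_next = min(separations(p)[0] for p in grid)
min_prev = min(separations(p)[1] for p in grid)
print(min_next, min_prev, min_next + min_prev)
```

Under this model the smallest separation to the subsequent edge equals the offset, the smallest separation to the precedent edge equals $T_{\rm CKV}/2$ minus the offset, and the two add up to $T_{\rm CKV}/2$, in line with the discussion of Fig. 2.11.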

Although adding the offset of  $3T_{\rm CKV}/8$  alleviates the metastability issue, it shifts the range of  $\Delta t_{\rm S}$  from  $(0, T_{\rm CKV}]$  to  $(3T_{\rm CKV}/8, 11T_{\rm CKV}/8]$ , thereby increasing the maximum  $\Delta t_{\rm S}$  to  $11T_{\rm CKV}/8$ . To handle the increased  $\Delta t_{\rm S}$ , the WTRs should adopt a larger  $R_0C_0$  (see Section 2.3.5), but this slows the discharge slew rate and degrades the noise performance (see Section 2.5.3). This is a trade-off between linearity (which may be degraded due to metastability) and noise. However, more advanced technology nodes will suffer less from this trade-off because the flip-flops are faster with a narrower metastability window [56].

### 2.3.2.2 Time Amplification Control and Global Reset

Fig. 2.12 shows the overall global FSM, emphasizing the TA control logic and the global reset. The core of the TA control logic is a shift-register chain,



Figure 2.12: Schematic and waveform diagrams of the global FSM.

whose outputs (ST<2:0>) serve as a state variable, scheduling the TA-related actions: In the state of ST<2:0> = 3'b001, the global FSM notifies the local FSM to adjust RT for  $G_{\text{TA}}$, switches the tri-mode PFD to the TA mode, and prepares the WTR comparator for the final output. All these actions are performed by pulling down  $\overline{\text{TA}_{\text{en}}}$. When ST<2:0> = 3'b011, the tri-mode PFD is triggered for the final output by the rising CKR<sub>en</sub>, which launches CKRG<sub>P</sub> = 1 and CKRG<sub>N</sub> = 1 in the differential snapshot circuit. The shift-register chain is clocked by a gated CKV, i.e., CKTA. It is activated after sampling  $\Delta t_{\text{S}}$  (indicated by CKR rising), and deactivated after triggering the TAU output (ST<2:0> = 3'b111). The TA logic also launches the master clock for the PLL digital part (CKU) after triggering the TAU output. This helps protect the critical events (e.g., sampling  $\Delta t_{\rm S}$  and launching the final output of the TAU) from potential interference due to digital activity.

Once the output of TAU has been quantized (indicated by  $\text{TDC}_{\text{done}} = 0$  from the TDC), the global FSM asserts a global reset ( $\overline{\text{RST}_{all}} = 0$ ). As a result, the TAU enters the reset state, waiting for the next TAU execution cycle (triggered by FREF rising).

# 2.3.3 Implementation of the Tri-Mode PFD

Figure 2.13 (a) shows details of the tri-mode PFD, whose three modes pair up with the three functional states of TAU. These modes are switched according to the TAU state indicators— $\overline{\text{RST}_{all}}$ , PDIS<sub>done</sub>, and  $\overline{\text{TA}_{en}}$  [see Fig. 2.9 (b)].

PFD Mode 1 is active in the pre-discharge state. The PFD core is driven then by the dedicated clock gating block, which releases the gated CKV clocks on  $CKVG_P$  and  $CKVG_N$  with one CKV cycle delay (when READY = 0). Once the CKVGs are released, the PFD core launches an active-low pulse-pair on SWD<sub>P</sub> and SWD<sub>N</sub>, whose width difference is  $T_{\rm CKV}$ . Fig. 2.13 (b) illustrates a single SWD pulse-pair generation cycle. Once a cycle is triggered (START falling, event marker  $\langle 1 \rangle$ ), the flip-flop Q2 removes the reset on the output flip-flops Q1 and Q3 (RST = 0,  $\langle 2 \rangle$ ), unsets the PFD idle flag (READY = 0,  $\langle 3 \rangle$ ) and enables the CKV gating block to release the CKVGs successively  $(\langle 4.1 \rangle \text{ and } \langle 4.2 \rangle)$ . At the CKVGs' rising edges, the corresponding SWDs fall  $(\langle 5.1 \rangle \text{ and } \langle 5.2 \rangle)$ . Once both the SWDs become LOW, they are reset  $(\langle 6 \rangle)$  to HIGH simultaneously ( $\langle 7 \rangle$ ). Consequently, the PFD outputs an active-low pulse-pair on the SWDs. Meanwhile, the SWD reset ( $\langle 6 \rangle$ ) also raises the PFD idle indicator (READY = 1,  $\langle 7^* \rangle$ ), which is the check signal for the local FSM (Fig. 2.15) to determine whether to start the next pulse-pair generation cycle (through START = 0,  $\langle 8 \rangle$ ). Additionally, as mentioned in Section 2.2.2, the TAU needs to swap the leading-falling edges in the generated SWD pulse-pair when a negative weight is required. The SIGN signal (from the local FSM) controls this polarity by determining the earlier released CKVG. A question may arise whether the output flip-flops Q1 and Q3 can be disturbed by the activities on  $CKRG_P$  and  $CKRG_N$  in PFD Mode 1. According to Fig. 2.10, this cannot happen since the CKRGs are blocked by  $CKR_{en} = 0$  in the pre-discharge state.

After the pre-discharge, the CKVGs are frozen at LOW by  $PDIS_{done} = 1$ . Then, the tri-mode PFD is driven by the CKRGs, and behaves the same as the dual-mode PFD in the conceptual TAU of Fig. 2.6. Detailed waveforms



Figure 2.13: Tri-mode PFD: (a) simplified diagram, and (b) its waveforms. In the reset flip-flop Q2, the R(eset) has higher priority than the S(et).

are illustrated in Fig. 2.13 (b): In PFD Mode 2 (paired with the snapshot state), the PFD converts the time difference between the CKRGs to the width difference of the SWD pulse-pair. In PFD Mode 3 (corresponding to the TA state), the reset of the output flip-flops Q1 and Q3, i.e., RST, is initially disabled [by  $\overline{TA_{en}} = 0$; note that  $\overline{RST_{all}} = 1$  and  $\overline{TA_{done}} = 1$  at this moment, and R(eset) has a higher priority than S(et) in flip-flop Q2]. Consequently, the SWDs can remain LOW ( $\langle 2 \rangle$ ) after being triggered by the CKRGs ( $\langle 1 \rangle$ ). The LOW-level SWDs keep discharging the WTRs. As soon as both WTRs produce their outputs, a feedback signal [ $\langle 3 \rangle$,  $\overline{TA}_{done} = CMP_P + CMP_N = 0$, in Fig. 2.9 (a) upper-right] enables the reset (RST = 1,  $\langle 4 \rangle$ ) so that the SWDs can return to the HIGH level ( $\langle 5 \rangle$ ) in order to stop discharging the WTRs.

### 2.3.4 Implementation of the local FSM



Figure 2.14: Single pulse-pair generation (SPPG) logic.

In the pre-discharge state, the local FSM controls the tri-mode PFD to generate the first three SWD pulse-pairs and applies proper weights to the WTRs. Each pulse-pair is generated through the interaction between the local FSM and the tri-mode PFD in a self-timed style, emulating the asynchronous SAR ADC [57]. Fig. 2.14 shows the detailed single pulse-pair generation (SPPG) logic. Two prerequisites are needed to activate the SPPG logic: the global reset has been removed ( $\overline{RST}_{all} = 1$ ), and the precedent (if existing) SPPG logic has completed ( $\overline{STATE} < n-1 > = 1$ ). Once the tri-mode PFD becomes idle [READY = 1,  $\langle 1 \rangle$ ], the SPPG cycle starts by raising its state indicator (STATE < n > = 1,  $\langle 2 \rangle$ ). Then a trigger pulse is generated (on TRIG < n >,  $\langle 3 \rangle$ ) to notify the tri-mode PFD to launch a SWD pulse-pair [through START,  $\langle 4 \rangle$, which sums the TRIG < n >'s from all the SPPG units in Fig. 2.15]. Once the pulse-pair has been generated, the tri-mode PFD sets the idle flag again (READY = 1,  $\langle 5 \rangle$ ), possibly starting the next SPPG cycle ( $\langle 6 \rangle$ ).

Fig. 2.15 sketches the overall local FSM, which cascades three SPPG units and sums their trigger pulses ( $\overline{\text{START}} = \sum_{i=0}^{3} \text{TRIG} \langle i \rangle$ ) to launch the SWD pulse-pairs sequentially. The corresponding timing diagram in a complete TAU execution cycle is shown in Fig. 2.16. After being activated by the global reset removal ( $\overline{\text{RST}}_{all} = 1$ ), the local FSM disconnects the TAU's charging switch (SWC = 0), and triggers the tri-mode PFD (through the first  $\overline{\text{START}}$  falling edge) to generate the first SWD pulse-pair. After that, the SPPGs



Figure 2.15: Simplified diagram of the local FSM.



Figure 2.16: Waveforms of the local FSM.

interact with the tri-mode PFD (through START and READY) to launch the remaining two SWD pulse-pairs (as in Fig. 2.14). Once 'done' (indicated by the  $3^{\rm rd}$  READY rising), the state of the TAU transitions from the pre-discharge to snapshot (PDIS<sub>done</sub> = 1). Accordingly, the tri-mode PFD changes its mode. Then, at the local FSM's further request for pulse-pair generation (the  $4^{\rm th}$  START falling), the tri-mode PFD merely removes its output reset, i.e., RST falls in Fig. 2.13 (a), readying itself for processing  $\Delta t_{\rm S}$  in the snapshot state.

The weight for each WTR discharge is controlled by the corresponding combinational logic in the local FSM (Fig. 2.15), which translates the outputs of the RC encoder ( $N_{\rm C}$  and  $N_{\rm R}$ ) to the weight-control sequences (on RT, CT, and SIGN) according to the SWD pulse-pair indexes (STATE < 3:1>) and certain TAU state indicators (TA<sub>en</sub>, and the inverted RST<sub>all</sub>, i.e. SWC). Note that the delay lines in the local FSM and SPPGs are realized with replica logic gates and routing of the corresponding weight control paths, in order to emulate the propagation delay. Therefore, these delays guarantee that the corresponding discharges are triggered (by START falling) only after the weight control signals have settled.

#### 2.3.5 Implementation of the WTR



Figure 2.17: Schematic of the implemented WTR.

Figure 2.17 shows the implemented WTR. The switched-resistor (SR) and switched-capacitor (SC) units adopt dummy switches, roughly compensating their main switches' charge injection and clock feed-through in order to minimize the TAU's arithmetic accuracy degradation. Finer compensation is performed by a piece-wise pre-distortion in the RC encoder (see Section 2.6.3). Considering that the overall TAU targets 10-bit accuracy, the WTR uses 8 SR units and 223 SC units to realize the upper 3 bits and lower 7 bits, respectively. The over-designed 223 SC units provide enough redundancy for pre-distortion (or calibration).

Contrasting with the conceptual diagram in Fig. 2.7, the SC units here are connected to the power  $(V_{DD})$  instead of ground. This is to avoid a situation where the bottom plate voltages of those disconnected SC units fall below ground after the discharge. This could occur if the bottom plates were initially connected to ground, and would result in reverse polarization of their switches, causing charge leakage, thus degrading the TAU's arithmetic accuracy.

The slicing comparator is modified from the threshold-crossing detector (TCD) in [58]. As shown in Fig. 2.18, the implemented slicer mainly consists



Figure 2.18: Level-crossing slicer in the WTR: schematic and waveforms.



Figure 2.19: Visualization of the equivalent discharge time which accumulates on the differential WTRs during the four discharge-pulse-pairs in Fig. 2.8. Amounts of the discharge time refer to their equivalents in the snapshot state.

of a gated inverter (PM2 and NM1) and a dynamic inverter (PM3, NM3, and NM4). The slicer is enabled (by  $\overline{\text{TA}_{en}} = 0$ ) right before the final discharge of the WTR to avoid unnecessary power consumption due to the possible crowbar current (since  $V_{\rm C}$  can be close to the threshold of the first-stage inverter before the final discharge). Once the slicer output is asserted (CMP = 0), the first-stage inverter is gated off immediately to save power. Capacitors  $C_1$  and  $C_2$  help to suppress the output jitter [58]. The cross detection threshold of this slicer,  $V_{\rm th}$, is dominated by that of the first-stage inverter, which drifts with PVT variations. Fortunately, the differential arrangement helps to cancel the influence of  $V_{\rm th}$  drift common to both paths.  $V_{\rm th}$  mismatch between the differential paths mainly causes a constant output offset, which is automatically compensated by the loop dynamics of a type-II PLL.

Considering the constraint in Section 2.2.2 stating that  $V_{\rm C}$  should be higher than  $V_{\rm th}$  after the (W)TR processes all the time inputs, one may wonder how to properly choose  $V_{\text{init}}$ ,  $V_{\text{th}}$ , and the R & C values of the WTR to satisfy this constraint. From the circuit perspective, these four physical parameters determine the upper-limit of the discharge duration that a WTR can handle, i.e.,  $\Delta t_{\rm lim}$ . From the system perspective, the time processing details in Fig. 2.8 determine the maximum discharge duration the TAU should handle, i.e.,  $\Delta t_{\rm max}$ . As long as  $\Delta t_{\rm max} < \Delta t_{\rm lim}$ ,  $V_{\rm C}$  would never fall below  $V_{\rm th}$ after all the inputs get processed. In this way, the four physical parameters of the WTR are constrained. Next, we calculate  $\Delta t_{\text{lim}}$  and  $\Delta t_{\text{max}}$  separately. Note that in the analysis below, all the discharge durations are referred to their corresponding equivalents in the snapshot state, i.e., resulting in the same amount of  $V_{\rm C}$  drop if discharging  $C_0$  through  $R_0/8$ . This is because the primary goal of the TAU is to cancel  $\Delta t_{\rm S}$ , which is processed in the snapshot state.  $\Delta t_{\rm lim}$  can be determined by discharging  $C_0$  from  $V_{\rm init}$  to  $V_{\rm th}$  through  $R_0/8$ :

$$\Delta t_{\rm lim} = \frac{R_0 C_0}{8} \ln \left( \frac{V_{\rm init}}{V_{\rm th}} \right). \tag{2.14}$$

To analyze  $\Delta t_{\text{max}}$ , Fig. 2.19 depicts the equivalent discharge time of the differential WTRs. Each SWD pulse-pair contains a differential component  $\Delta t_{\text{diff}}$ , and a common-mode component  $\Delta t_{\text{cm}}$ . The former is the explicit time input to be processed, i.e.,  $T_{\text{CKV}}$  or  $\Delta t_{\text{S}}$ , depending on the state of the TAU; the latter results from the PFD reset delay. The influences of these two components should be considered separately.

Considering that the time signals on the P and N paths will cancel out, the maximum accumulated duration in the differential mode can be estimated by inspecting the P-path as

$$\begin{aligned}\max(\Delta t_{\text{acc,diff}}) &= \left[\max\left(\frac{1}{1+N_{\text{C}} \cdot C_{\text{U}}/C_{0}}\right) + \frac{3}{8}\right] \cdot T_{\text{CKV}} \\ &= \frac{11}{8} \cdot T_{\text{CKV}},\end{aligned} \tag{2.15}$$

which is obtained at $N_{\rm C} = 0$. For the common-mode discharge, the maximum accumulated duration is

$$\max(\Delta t_{\rm acc,cm}) = \left[ \max\left(\frac{1}{1 + N_{\rm C} \cdot C_{\rm U}/C_0}\right) + \frac{3}{8} + \max\left(\frac{N_{\rm R}}{8}\right) + 1 \right] \cdot \Delta t_{\rm cm} = \frac{26}{8} \cdot \Delta t_{\rm cm}, \tag{2.16}$$

which is achieved at  $N_{\rm C} = 0$  and  $N_{\rm R} = 7$ . Summing max( $\Delta t_{\rm acc,diff}$ ) and max( $\Delta t_{\rm acc,cm}$ ) yields  $\Delta t_{\rm max}$ . By substituting (2.14), (2.15), and (2.16) into  $\Delta t_{\rm max} < \Delta t_{\rm lim}$ , the minimum required product of  $R_0 \times C_0$  can be constrained as

$$R_0 C_0 > \frac{11T_{\rm CKV} + 26\Delta t_{\rm cm}}{\ln(V_{\rm init}/V_{\rm th})}. \tag{2.17}$$
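As a quick sanity check of (2.14) to (2.17), the following Python sketch evaluates the discharge limit, the worst-case accumulated discharge, and the resulting lower bound on $R_0 C_0$. The function names and all numeric values ($T_{\rm CKV}$, $\Delta t_{\rm cm}$, $V_{\rm init}$, $V_{\rm th}$) are illustrative assumptions, not design values from this work.

```python
import math


def discharge_limit(r0c0, v_init, v_th):
    """Delta t_lim of (2.14): discharge C0 from V_init to V_th through R0/8."""
    return r0c0 / 8 * math.log(v_init / v_th)


def max_discharge(t_ckv, dt_cm):
    """Delta t_max: sum of the worst cases in (2.15) and (2.16)."""
    return 11 / 8 * t_ckv + 26 / 8 * dt_cm


def min_r0c0(t_ckv, dt_cm, v_init, v_th):
    """Lower bound on R0*C0 from (2.17)."""
    if not v_init > v_th > 0:
        raise ValueError("expected V_init > V_th > 0")
    return (11 * t_ckv + 26 * dt_cm) / math.log(v_init / v_th)


# Hypothetical operating point, chosen only for illustration.
t_ckv = 1 / 3.0e9        # CKV period at an assumed 3 GHz carrier
dt_cm = 100e-12          # assumed PFD reset delay
v_init, v_th = 1.0, 0.5  # assumed initial and threshold voltages

rc_bound = min_r0c0(t_ckv, dt_cm, v_init, v_th)
print(f"R0*C0 must exceed {rc_bound * 1e9:.2f} ns")
```

At the bound itself, `discharge_limit` equals `max_discharge`, so any practical design would presumably add margin on top of this value.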

## 2.3.6 Implementation of the RC Encoder



Figure 2.20: Implementation diagram of the RC encoder.

The RC encoder assists the local FSM with the weight control by mapping  $\phi_{\text{R,frac}}$  to  $N_{\text{C}}$  and  $N_{\text{R}}$ , which are respectively the CT code at the first discharge and the RT code at the third discharge (Fig. 2.8). According to (2.11), the mapping from  $\phi_{\text{R,frac}}$  to  $N_{\text{R}}$  is linear. Considering  $N_{\text{R}}$  is responsible for the coarse tuning, it is simply obtained by truncation,

$$N_{\rm R} = \lfloor 8 \cdot \phi_{\rm R, frac} \rfloor. \tag{2.18}$$

Then,  $N_{\rm C}$  handles the residue phase

$$\phi_{\rm CT} = \phi_{\rm R, frac} - \frac{N_{\rm R}}{8} = 1 - \frac{1}{1 + N_{\rm C} \cdot \frac{C_{\rm U}}{C_0}}. \tag{2.19}$$

Accurate mapping from  $\phi_{\rm CT}$  to  $N_{\rm C}$  is nonlinear and rather complex, but it can be approximated with Taylor series considering that  $\phi_{\rm CT}$  is merely a small residue (< 1/8) after the coarse tuning:

$$N_{\rm C} = \frac{C_0}{C_{\rm U}} \cdot \left(\frac{1}{1 - \phi_{\rm CT}} - 1\right) = \frac{C_0}{C_{\rm U}} \cdot \left[\phi_{\rm CT} + \phi_{\rm CT}^2 + o(\phi_{\rm CT})\right],\tag{2.20}$$

where the dominant nonlinearity is handled by $\phi_{\rm CT}^2$, and higher-order errors are compensated by $o(\phi_{\rm CT})$. Fig. 2.20 illustrates the implemented RC encoder. The path from $\phi_{\rm R,frac}$ to $N_{\rm R}$ reflects (2.18). Eq. (2.20) is realized by the path from $\phi_{\rm CT}$ to $N_{\rm C}$, where a sparse look-up table (LUT) stores the high-order error $o(\phi_{\rm CT})$, and $\mathbf{E}(C_0/C_{\rm U})$ estimates the fabricated capacitance ratio $C_0/C_{\rm U}$.
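The mapping in (2.18) to (2.20) can be sketched behaviorally in Python as below. This is only an illustration under stated assumptions: the capacitance ratio `RATIO`, the rounding of $N_{\rm C}$ to the nearest integer, and the function name are hypothetical, and the sparse LUT is not modeled. Instead, the exact expression is compared with the second-order truncation to show the residual that a LUT term would have to absorb.

```python
import math


def rc_encode(phi_frac, c0_over_cu, order=2):
    """Map phi_R,frac in [0, 1) to (N_R, phi_CT, N_C).

    order=None uses the exact expression in (2.20); an integer keeps
    that many Taylor terms (phi + phi^2 + ...).
    """
    if not 0.0 <= phi_frac < 1.0:
        raise ValueError("phi_frac must lie in [0, 1)")
    n_r = math.floor(8 * phi_frac)      # coarse code, (2.18)
    phi_ct = phi_frac - n_r / 8         # residue below 1/8, (2.19)
    if order is None:
        shape = 1 / (1 - phi_ct) - 1
    else:
        shape = sum(phi_ct ** k for k in range(1, order + 1))
    n_c = round(c0_over_cu * shape)     # nearest-integer rounding is an assumption
    return n_r, phi_ct, n_c


RATIO = 1024  # hypothetical C0/CU, not a value taken from the text
for phi in (0.30, 0.62, 0.99):
    exact = rc_encode(phi, RATIO, order=None)
    approx = rc_encode(phi, RATIO, order=2)
    print(phi, exact, approx)
```

Near the top of a coarse segment, where $\phi_{\rm CT}$ approaches 1/8, the truncated series falls short of the exact code by a few units with this assumed ratio, which is consistent with the need for a high-order correction term.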

# 2.4 Implemented PLL



Figure 2.21: Top-level diagram of the proposed PLL.

The proposed TAU sub-system is incorporated into the PLL shown in Fig. 2.21. The TAU extracts the time error $\Delta t_{\rm E}$, mainly due to the DCO phase noise, by canceling $\Delta t_{\rm S}$ with its prediction. Unlike the DTC-based or voltage-domain methods, which cancel $\Delta t_{\rm S}$ with a fixed time resolution, the TAU has a fixed *phase* resolution of $2\pi/1024$ as it scales the carrier period $T_{\rm CKV}$ with 10-bit accuracy. The output of the TAU is quantized by a 4-bit differential TDC (details shown in Appendix A), whose overall architecture is quite similar to that in [38]. However, the sub-TDC for each differential path was replaced by a vernier counterpart in [58] in order to achieve a fine resolution of $\Delta t_{\rm res,TDC} = 1.9\,\mathrm{ps}$. Considering the TAU's time amplification gain $G_{\rm TA} = 8$, the equivalent TDC quantization resolution is finer than 240 fs, and thus negligible for the PLL in-band phase noise. In parallel with the TAU-based phase error tracking path, there is also a counter path assisting the frequency (re)locking, which can be turned off to save power once the PLL is locked. Similar to [19,34,44], the counter path can be 'instantaneously' woken up when the PLL loses lock, as detected by a range detector in the TDC. To unify the scales of the data from the counter and TDC paths before they are combined, the TDC output ($D_{\rm TDC}$) is normalized by $K_{\rm TDC}/G_{\rm TA}$, where $K_{\rm TDC}$ equals the TDC resolution ($\Delta t_{\rm res,TDC}$) normalized by the CKV period, i.e., $K_{\rm TDC} = \Delta t_{\rm res,TDC}/T_{\rm CKV}$.
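The arithmetic behind the input-referred TDC resolution and the normalization factor $K_{\rm TDC}/G_{\rm TA}$ can be checked with a few lines of Python. The 1.9 ps resolution and $G_{\rm TA} = 8$ are quoted above; the 3 GHz carrier and the function name are assumptions made only for this example.

```python
def tdc_scaling(dt_res_tdc, g_ta, f_ckv):
    """Return (input-referred TDC resolution in s, normalization K_TDC/G_TA)."""
    t_ckv = 1.0 / f_ckv
    k_tdc = dt_res_tdc / t_ckv          # TDC resolution in units of CKV periods
    return dt_res_tdc / g_ta, k_tdc / g_ta


# 1.9 ps and G_TA = 8 come from the text; the 3 GHz carrier is an assumed example.
res_in, norm = tdc_scaling(1.9e-12, 8, 3.0e9)
print(f"input-referred resolution: {res_in * 1e15:.1f} fs, K_TDC/G_TA = {norm:.3e}")
```

The first value is 237.5 fs, which matches the "finer than 240 fs" statement; the second depends on the carrier frequency and therefore has to track the selected channel.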

The DCO is implemented using an LC tank and a complementary cross-coupled pair (circuit details are similar to those explained later in Section 4.5.3). It covers the oscillation frequency range from 2.6 to 4.1 GHz. The frequency tuning is achieved by switched-capacitor banks, with the finest frequency resolution varying from 70 kHz to 290 kHz, depending on the oscillation frequency. To reduce its phase noise contribution, the frequency resolution is dithered by a $\Delta\Sigma$-modulator (DSM), operating at 1/8 of the DCO's resonant frequency.

# 2.5 Noise/Jitter Analysis

As the TAU adopts the differential WTRs to perform time-domain signal processing, all the noise generated within the TAU sub-system will be eventually reflected at the differential output as timing variance. The noise sources are categorized into two types: the time-domain noise, which constitutes the SWD jitter and is added to WTRs in conjunction with the time-domain inputs; and voltage noise, which originates inside the WTRs. Each noise type shows a distinctive transfer function at the TAU output.

## 2.5.1 Time-Domain Noise

Figure 2.22: Time-domain noise injected into the differential WTRs.

Figure 2.22 depicts the time-domain noise presenting as jitter on the SWD edges. During the pre-discharge and snapshot states, the jitter that belongs to the same SWD pulse-pair is clustered as a pulse-width difference variance, $\sigma_{\rm PP}$. The $\sigma_{\rm PP}$'s in the pre-discharge and snapshot states are further distinguished as $\sigma_{\rm PP,P}$ and $\sigma_{\rm PP,S}$, respectively. The $\sigma_{\rm PP}$'s are injected into the differential WTRs 'riding' on top of their time-domain inputs and finally appear at the TAU output along with the corresponding outputs. Therefore, the TAU's signal processing function of (2.10) also applies to $\sigma_{\rm PP}$. Moreover, consider two facts: $\sigma_{\rm PP,P}$ and $\sigma_{\rm PP,S}$ are added to $T_{\rm CKV}$ and $\Delta t_{\rm S}$, respectively; and the factor of 8 in (2.10) results from the time-amplification gain $G_{\text{TA}} = 8$ [see (2.12)]. Consequently, we obtain the code-dependent TAU output variance resulting from the time-domain noise

$$\sigma_{\rm TD,out}^{2}(N_{\rm C}, N_{\rm R}) = G_{\rm TA}^{2} \cdot \left\{ \left[ \left( \frac{1}{1 + N_{\rm C} \cdot C_{\rm U}/C_{0}} \right)^{2} + \left( \frac{N_{\rm R}}{8} \right)^{2} + \left( \frac{3}{8} \right)^{2} \right] \cdot \sigma_{\rm PP,P}^{2} + \sigma_{\rm PP,S}^{2} \right\}. \tag{2.21}$$

The  $N_{\rm C}$  and  $N_{\rm R}$  related coefficients represent  $\phi_{\rm R,frac}$  [see (2.11)], which uniformly distributes between 0 and 1 in fractional-N channels, thus their effects can be averaged accordingly. This yields the average TAU output variance:

$$\overline{\sigma_{\rm TD,out}^2} \approx G_{\rm TA}^2 \cdot (1.3\sigma_{\rm PP,P}^2 + \sigma_{\rm PP,S}^2). \tag{2.22}$$
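The factor of 1.3 in (2.22) can be reproduced numerically. The sketch below averages the bracketed coefficient of (2.21) over a uniformly distributed $\phi_{\rm R,frac}$, using the identity $1/(1 + N_{\rm C} C_{\rm U}/C_0) = 1 - \phi_{\rm CT}$ from (2.19). Ignoring the quantization of $N_{\rm C}$ and sweeping 1024 equally spaced codes are simplifying assumptions, and the function names are hypothetical.

```python
import math


def pre_discharge_weight(phi_frac):
    """Coefficient of sigma_PP,P^2 inside the braces of (2.21)."""
    n_r = math.floor(8 * phi_frac)      # coarse code, (2.18)
    phi_ct = phi_frac - n_r / 8         # fine residue, (2.19)
    return (1 - phi_ct) ** 2 + (n_r / 8) ** 2 + (3 / 8) ** 2


def average_weight(n_codes=1024):
    """Average the coefficient over n_codes equally likely phase codes."""
    return sum(pre_discharge_weight(k / n_codes) for k in range(n_codes)) / n_codes


print(f"average coefficient: {average_weight():.3f}")
```

The result lands close to 1.29, which rounds to the 1.3 used in (2.22).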

#### 2.5.2 Circuit-Level Contributors of Time-Domain Noise

Figure 2.23: Jitter contributors of an SWD pulse-pair. Note: only half of the PFD core is shown here, so $\sigma_{\rm PFD}$ consists of $\sigma_{\rm fall}$ and $\sigma_{\rm rise}$ contributions on both paths, yielding $\sigma_{\rm PFD}^2 = 2\sigma_{\rm fall}^2 + 2\sigma_{\rm rise}^2$.

Up to now, $\sigma_{\rm PP}$ has been treated as top-level composite noise. In this section, we break it down into circuit-level contributors so that we can estimate $\sigma_{\rm TD,out}^2$ by combining the simulated jitter of each sub-circuit. According to Fig. 2.23, three physical mechanisms contribute to $\sigma_{\rm PP}$. The first is the original edge source which triggers the SWD falling edges, i.e., CKV or FREF. Its edges determine the SWD pulse-width difference. Correspondingly, the edge source adds its jitter $\sigma_{\rm src}$ to $\sigma_{\rm PP}$. The second $\sigma_{\rm PP}$ contributor is a conceptual edge sampler, which samples the time information from the edge source and transfers it to the tri-mode PFD core. For example, in the snapshot state, it represents the differential snapshot circuit (Fig. 2.10), which samples $\Delta t_{\rm S}$ from CKV and FREF. To realize the required functions, the edge samplers usually block the unwanted edges and pass the desired ones. Thus, the edge sampler smears out the desired edges during the propagation. Consequently, the edge sampler adds its jitter $\sigma_{\rm samp}$ to the SWD falling edges. The last $\sigma_{\rm PP}$ component is $\sigma_{\rm PFD}$, i.e., the width-difference variance of the SWD pulse-pair due to the tri-mode PFD core, which launches the pulse-pair and contributes noise to both the SWD falling and rising edges. Since the PFD reset logic is common for the differential paths, its noise contribution is canceled in the final pulse-width difference [59]. Therefore, only the output flip-flops degrade $\sigma_{\rm PFD}$. Finally, $\sigma_{\rm PP}$ is broken down to

$$\sigma_{\rm PP}^2 = 2\sigma_{\rm src}^2 + 2\sigma_{\rm samp}^2 + \sigma_{\rm PFD}^2, \tag{2.23}$$

where the factor 2 indicates that the edge jitter adds to both SWD paths.

For  $\sigma_{\rm PP,P}$ , i.e., the  $\sigma_{\rm PP}$  in the pre-discharge state, its edge source is the CKV clock with jitter of  $\sigma_{\rm CKV}$ , and the edge sampler is the CKV gating block in the tri-mode PFD with jitter of  $\sigma_{\rm CKVG}$ . Therefore,  $\sigma_{\rm PP,P}$  is detailed as

$$\sigma_{\rm PP,P}^2 = 2\sigma_{\rm CKV}^2 + 2\sigma_{\rm CKVG}^2 + \sigma_{\rm PFD}^2. \tag{2.24}$$

For  $\sigma_{\text{PP,S}}$ , i.e., the  $\sigma_{\text{PP}}$  in the snapshot state, its edge source contains both the CKV and FREF clocks, and the edge sampler is the differential snapshot circuit with jitter of  $\sigma_{\text{snap}}$  on either path. Therefore, the  $\sigma_{\text{PP,S}}$  breakdown is

$$\sigma_{\rm PP,S}^2 = \sigma_{\rm CKV}^2 + 2\sigma_{\rm snap}^2 + \sigma_{\rm PFD}^2. \tag{2.25}$$

The coefficient of $\sigma_{\rm CKV}^2$ is 1 since the CKV clock only launches one SWD falling edge. Although FREF triggers the other SWD falling edge, its jitter is expediently ignored here since it is usually considered as reference noise in PLL systems. Substituting (2.24) and (2.25) into (2.22), we have

$$\overline{\sigma_{\text{TD,out}}^2} \approx G_{\text{TA}}^2 \cdot (3.6\sigma_{\text{CKV}}^2 + 2.6\sigma_{\text{CKVG}}^2 + 2\sigma_{\text{snap}}^2 + 2.3\sigma_{\text{PFD}}^2). \tag{2.26}$$

Note that  $\sigma_{\text{CKV}}$  here accounts only for the jitter of the DCO buffer (see Fig. 2.21), and does not contain the contribution from the DCO phase noise, although it also shows up in the final output of TAU. This is because the DCO phase noise is the information that the TAU intends to extract for the PLL operation, instead of an additional analog impairment introduced by the TAU.

The detailed reason can be understood by comparing the phase-noise-induced time signal sampled by an ideal phase detector with that from a conceptual TAU. To aid in analyzing the detected time error, Fig. 2.24 sketches the significant FREF and CKV edges as vertical arrows. In this diagram, an ideal CKV period is denoted as $T_0$, and the DCO phase noise is reflected in the period variations, i.e., $\Delta T_i$, where $i = 1, 2, \dots, N + 1$. FCW is mathematically expressed as $N + f$, where $N$ and $f$ are, respectively, the integer and fractional components. The FREF signal represents an ideal reference clock with a period of $(N + f) \cdot T_0$. CKV and FREF edges are assumed to be aligned at the beginning, i.e., $t = 0$. From then on, the DCO period variation gradually accumulates and finally appears on the jittery CKV edges [60]. When the next FREF edge arrives at $t = (N + f) \cdot T_0$, both the ideal phase detector and the TAU can extract the accumulated CKV edge variation relative to FREF, $\Delta t_{\rm e}$.

Regarding the case of an ideal phase detector, the accumulated CKV time error $\Delta t_{\rm e,ideal}$ is the time difference between the next FREF edge (at $t = (N + f) \cdot T_0$) and a virtual CKV edge with fractional index $N.f^{\text{th}}$ (the blue one), similar to the phase-error-extraction strategy adopting a DTC in Section 2.1. Assuming the virtual edge is generated by interpolating the $N^{\text{th}}$ and $(N + 1)^{\text{th}}$ CKV edges according to the fractional FCW (i.e., $f$), $\Delta t_{\rm e,ideal}$ can be expressed as

$$\Delta t_{\rm e,ideal} = \sum_{i=1}^{N+1} \Delta T_{\rm i} - (1-f) \Delta T_{\rm N+1}. \tag{2.27}$$

Regarding the case of the TAU, it samples the m<sup>th</sup> CKV period<sup>1</sup>,  $T_0 + \Delta T_m$ , scales it by (1 - f), then cancels this scaled result with  $\Delta t_S$  to extract the desired time error

$$\Delta t_{\rm e,TAU} = \sum_{i=1}^{N+1} \Delta T_{\rm i} - (1-f) \Delta T_{\rm m}. \tag{2.28}$$

<sup>1</sup>For the sake of simplicity, the TAU is assumed to sample only one CKV period.

The difference between  $\Delta t_{\rm e,ideal}$  and  $\Delta t_{\rm e,TAU}$  just lies in the time variations, i.e.,  $\Delta T_{\rm N+1}$  or  $\Delta T_{\rm m}$ , associated with the CKV periods used to cancel  $\Delta t_{\rm S}$ . This difference does not matter for the PLL system because both  $\Delta T_{\rm N+1}$  and  $\Delta T_{\rm m}$  are the real error accumulated on the DCO and should be detected for correction. In other words, the TAU does not contribute any additional mechanism to convert the DCO phase noise to the PLL phase noise. Hence,  $\sigma_{\rm CKV}$  in (2.26) does not need to consider the contribution from DCO phase noise.
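A small numerical illustration of (2.27) and (2.28) is sketched below in Python. The period deviations are arbitrary made-up numbers in arbitrary units, and the function name is hypothetical. The point is only that the two extracted errors differ by $(1-f)(\Delta T_{\rm m} - \Delta T_{\rm N+1})$, i.e., by genuine DCO period variations rather than by anything the TAU adds.

```python
def time_errors(period_dev, f, m):
    """Evaluate (2.27) and (2.28).

    period_dev: [dT_1, ..., dT_{N+1}], CKV period deviations
    f:          fractional part of FCW, 0 <= f < 1
    m:          1-based index of the CKV period sampled by the TAU
    """
    accumulated = sum(period_dev)
    ideal = accumulated - (1 - f) * period_dev[-1]     # (2.27)
    tau = accumulated - (1 - f) * period_dev[m - 1]    # (2.28)
    return ideal, tau


# Made-up deviations for N = 3 (so N + 1 = 4 periods), arbitrary units.
devs = [1.0, -2.0, 0.5, 3.0]
ideal, tau = time_errors(devs, f=0.25, m=2)
print(ideal, tau, ideal - tau)
```

Choosing `m = N + 1` makes the two results identical, which corresponds to the TAU sampling the very period used by the ideal interpolation.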

### 2.5.3 Voltage Noise

In the TA state, the differential WTRs convert their internal voltages into the time difference at the output. As such, any internal noise voltage will be manifested as time difference variance $\sigma_{\rm VD,out}$. Two types of noise voltages dominate $\sigma_{\rm VD,out}$: the kT/C noise on the fixed capacitor $C_0$, and the noise voltage of the first-stage slicing comparator (see Fig. 2.18). For either WTR, its output jitter due to the kT/C noise is estimated as

$$\sigma_{\rm KT/C}^2 = \frac{kT}{C_0} \cdot \frac{1}{k_{\rm th1}^2}, \tag{2.29}$$

where $k$ is Boltzmann's constant, $T$ is the absolute temperature, and $k_{\text{th1}}$ is the slope of the $C_0$ discharge curve when it crosses $V_{\text{th1}}$, the threshold voltage of the first-stage cross comparator. With the windowed integral theory in [14], the first-stage cross comparator approximately degrades the WTR output jitter by

$$\sigma_{\rm cmp}^2 = \frac{\sqrt{2}kT\gamma}{\sqrt{V_{\rm th2} \cdot k_{\rm th1}^3 \cdot g_{\rm m,eq} \cdot C_1}},\tag{2.30}$$

where $g_{\rm m,eq}$ is the equivalent transconductance combination of PM2 and NM1, $C_1$ is the load capacitance of PM2 and NM1, $\gamma$ is the excess noise factor, and $V_{\rm th2}$ is the threshold voltage of the second stage of the cross comparator. (Details can be found in Appendix B.) Consequently, the TAU's output variance resulting from the voltage-domain noise is roughly

Figure 2.24: Comparison of the time error extracted by an ideal phase detector and that from the conceptual TAU. FCW = N + f; $T_0$ is the ideal CKV period; $\Delta T_i$ $(i = 1, 2, \dots, N + 1)$ is the CKV period variation due to the DCO phase noise.

$$\sigma_{\rm VD,out}^2 = 2 \cdot (\sigma_{\rm KT/C}^2 + \sigma_{\rm cmp}^2), \tag{2.31}$$

where the factor 2 accounts for the differential operation.

#### 2.5.4 TAU's Input-Referred Noise and its Contribution to PLL's Phase Noise

Summing $\overline{\sigma_{\text{TD,out}}^2}$ and $\sigma_{\text{VD,out}}^2$ estimates $\sigma_{\text{TAU,out}}^2$, the overall time difference variance at the TAU output. Yet, we prefer to use the input-referred jitter for the PLL phase noise analysis, especially that at the FREF side, e.g., [26] [61]. According to (2.10) and (2.12), the transfer gain from the FREF-related input, i.e., $\Delta t_{\text{S}}$, to the TAU's output is $G_{\text{TA}} = 8$. Therefore, $\sigma_{\text{TAU,out}}^2$ is divided by $G_{\text{TA}}^2$ to derive the TAU's input-referred jitter:

$$\sigma_{\text{TAU,in}}^2 \approx 3.6\sigma_{\text{CKV}}^2 + 2.6\sigma_{\text{CKVG}}^2 + 2\sigma_{\text{snap}}^2 + 2.3\sigma_{\text{PFD}}^2 + \frac{\sigma_{\text{KT/C}}^2 + \sigma_{\text{cmp}}^2}{32}. \tag{2.32}$$

Since the thermal noise dominates $\sigma_{\text{TAU,in}}^2$, the noise spectrum can be assumed to spread uniformly over the reference frequency range $f_{\text{REF}}$. According to [26], this jitter power spectral density can be normalized to the phase noise spectrum by multiplying by $(2\pi f_{\text{CKV}})^2$, where $f_{\text{CKV}}$ is the PLL output frequency. After being shaped by the closed-loop transfer function of the PLL, i.e., $H_{\text{cl}}(f)$, $\sigma_{\text{TAU,in}}$ contributes to the overall PLL phase noise by

$$\mathscr{L}_{\text{TAU}}(f) = \frac{\sigma_{\text{TAU,in}}^2}{f_{\text{REF}}} \cdot (2\pi f_{\text{CKV}})^2 \cdot |H_{\text{cl}}(f)|^2. \tag{2.33}$$
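To give a feel for the magnitudes involved, (2.32) and (2.33) can be evaluated with a short Python sketch. The 100 fs input-referred jitter is a made-up number, $|H_{\rm cl}(f)| = 1$ is assumed (deep in-band), and the function names are hypothetical. The reference and carrier frequencies are the 40 MHz and 2668.2 MHz used in the measurements of this chapter.

```python
import math


def tau_input_jitter_var(s_ckv, s_ckvg, s_snap, s_pfd, s_ktc, s_cmp):
    """Input-referred jitter variance of the TAU, following (2.32)."""
    return (3.6 * s_ckv ** 2 + 2.6 * s_ckvg ** 2 + 2 * s_snap ** 2
            + 2.3 * s_pfd ** 2 + (s_ktc ** 2 + s_cmp ** 2) / 32)


def tau_phase_noise_dbc(var_in, f_ref, f_ckv, h_cl_mag=1.0):
    """Phase-noise contribution in dBc/Hz, following (2.33) as written."""
    linear = var_in / f_ref * (2 * math.pi * f_ckv) ** 2 * h_cl_mag ** 2
    return 10 * math.log10(linear)


# Hypothetical lumped input-referred jitter of 100 fs rms.
pn = tau_phase_noise_dbc((100e-15) ** 2, f_ref=40e6, f_ckv=2668.2e6)
print(f"in-band contribution: {pn:.1f} dBc/Hz")
```

With these assumed numbers the contribution is about -131.5 dBc/Hz. Because the variance enters linearly, halving the rms jitter lowers the result by roughly 6 dB.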

# 2.6 Nonlinearity Analysis

### 2.6.1 INL Characterization and Degradation Mechanism

Generally, the nonlinearity of a typical mixed-signal circuit (e.g., DAC and DTC) is characterized by an integral nonlinearity (INL) representing the deviation between the practical and ideal outputs across the input range. This definition is not directly applicable to the TAU, which needs to handle multiple time-domain and digital inputs. However, if the scope is narrowed down to the time-offset cancellation case in a type-II PLL system, the TAU's INL can be well-defined. Consider the corresponding behavior of the TAU described in (2.1). $\Delta t_{\rm S}$ is the time offset to be canceled, so it can be regarded as an ideal target, equivalent to the ideal output of a DTC. $(1 - \phi_{\rm R,frac}) \cdot T_{\rm CKV}$ is the term generated to cancel $\Delta t_{\rm S}$, and thus can be treated as the counterpart of the actual DTC output. Therefore, the cancellation residue $\Delta t_{\rm E}$ reflects the TAU's nonidealities.



Figure 2.25: Characterization of the TAU's INL: (a) principle and (b) conceptually expected INL curve.

A conceptual testbench to measure the TAU's INL is illustrated in Fig. 2.25 (a). Two phase-locked clocks, i.e., CKV and FREF, and the digital control target, i.e., $\phi_{\rm R,frac}$, are input to the TAU sub-system (similar to Fig. 2.9 (a)), emulating the inputs to the TAU in the proposed PLL. Under such an arrangement, the TAU gets a stable time base of $T_{\rm CKV}$, a sequence of incremental $\Delta t_{\rm S}$ ramps, and the corresponding $\phi_{\rm R,frac}$, which scales $T_{\rm CKV}$ to accurately cancel $\Delta t_{\rm S}$. In the ideal case with no analog impairments, the cancellation residue $\Delta t_{\rm E}$ would reflect the TAU's quantization error (QE), which can be precisely estimated based on the RC encoder structure in Fig. 2.20. However, if the TAU's nonlinearity is included, $\Delta t_{\rm E}$ will further reflect the INL. Therefore, we can estimate the TAU's INL versus $\phi_{\rm R,frac}$ as

$$\mathrm{INL}(\phi_{\rm R,frac}) = \left[\frac{\Delta t_{\rm E}(\phi_{\rm R,frac})}{G_{\rm TA} \cdot T_{\rm CKV}} - \mathrm{QE}(\phi_{\rm R,frac})\right] \times 2^{10}, \tag{2.34}$$

where  $\text{QE}(\phi_{\text{R,frac}})$  is the quantization error in the same scale as  $\phi_{\text{R,frac}}$ . After being divided by  $G_{\text{TA}}$ ,  $\Delta t_{\text{E}}$  refers to the TAU's time input on the  $\Delta t_{\text{S}}$ side. This excludes the influence of time amplification, making the INL comparable with that of other time-offset cancellation circuits, such as DTCs. Additionally, the multiplication by 2<sup>10</sup> scales the unit of INL to the LSB of a 10-b converter, which is the case of the implemented TAU. (The unit before the scaling by 2<sup>10</sup> is 1, i.e., characterizing the full range of  $T_{\text{CKV}}$  with 0 ~ 1.)
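Expressed as code, the post-processing in (2.34) is a one-line conversion from the measured residue to LSBs. The Python sketch below uses a hypothetical function name and made-up input values; only $G_{\rm TA} = 8$ and the 10-bit scaling come from the text.

```python
def inl_lsb(dt_e, qe, t_ckv, g_ta=8, n_bits=10):
    """INL in LSB following (2.34).

    dt_e:  cancellation residue at the TAU output, in seconds
    qe:    predicted quantization error, in the same scale as phi_R,frac
    t_ckv: CKV period, in seconds
    """
    return (dt_e / (g_ta * t_ckv) - qe) * 2 ** n_bits


# Made-up example: construct a residue that corresponds to 1.7 LSB of INL
# on top of an assumed quantization error of 0.2 LSB.
t_ckv = 1 / 3.0e9
qe = 0.2 / 1024
dt_e = 8 * t_ckv * (qe + 1.7 / 1024)
print(f"INL = {inl_lsb(dt_e, qe, t_ckv):.2f} LSB")
```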

Fig. 2.25 (b) sketches a conceptually expected INL curve of the TAU. It exhibits a piecewise linear shape due to the TAU's coarse-fine tuning strategy. The eight segments coincide with the 3-b coarse resistive tuning. The vertical offset of each segment results from the nonideality of SR bank units, e.g., charge injection, clock feedthrough, and unit mismatch. The characteristic inside each segment is mainly correlated with the fine capacitive tuning. For example, the slope of each segment results from the  $C_0/C_U$  estimation error in (2.20), and the charge injection of SC-bank units. Since the fine-tuning is determined only by  $N_{\rm C}$  during the first discharge (see Fig. 2.8), which is actually irrelevant for the subsequent coarse tuning behavior, the slopes of all the segments are almost identical.



Figure 2.26: INL curve of the TAU shaped by component mismatch.

One may wonder how the INL curve changes in the face of mismatch between the differential WTRs. Actually, the overall piecewise linear feature would remain similar to that in Fig. 2.25 (b), but the offsets and slope values of each segment would change. This can be analyzed by inspecting each term in (2.5) that describes the WTR function. First, consider the offset term $\tau_{\text{out}} \cdot \ln (V_{\text{init}}/V_{\text{th}})$, which is supposed to be canceled out in the differential output. The mismatches in $V_{\text{th}}$ and $\tau_{\text{out}}$, i.e., the threshold voltage of the level-crossing comparator and the RC time constant during the final discharge, would result in a cancellation error which globally offsets the overall INL curve (see Fig. 2.26 left). As for each of the weighted terms, i.e., $\tau_{\rm out}/\tau_i \cdot \Delta t_i$, mismatches in the corresponding discharge RC time constants, i.e., $\tau_{\rm out}$ and $\tau_i$, would introduce error in the $\Delta t_i$ scaling. Here, the mismatch of the SR units dominates that of the RC time constants, since the capacitive mismatch can be addressed by properly sizing the SC units [18]. The detailed effects due to this scaling error are case-dependent. For example, the scaling error would vary the slopes of all segments by the same amount if it occurred in the fine-tuning discharge (see Fig. 2.26 middle), because this discharge adopts a fixed SR configuration (RT = 8) and the corresponding mismatch introduces a fixed gain error to all the target scaling factors. In contrast, the scaling error would randomly offset each segment if it happened in the coarse-tuning discharge (see Fig. 2.26 right), since the error due to mismatch is $N_{\rm R}$-dependent.

## 2.6.2 Simulated INL



Figure 2.27: Simulated INL of TAU at the supply of 1 V and 1.1 V.

Fig. 2.27 shows the INL curve of TAU extracted from post-layout simulations. Under a 1 V supply (the nominal supply of transistors used in the implemented TAU), the INL is 1.7 LSB, corresponding to 0.17% of the full range. This is better than the DTC INL of 0.4% in [4], but worse than that of 0.09% in [62] (both from simulations). The TAU's INL is mainly degraded by the offsets between the coarse-tuning segments, reflecting the contribution from the charge injection of SR units. The INL could be improved to 0.5 LSB if the relative offsets were removed by calibration.

The INL under 1.1 V supply is also shown in Fig. 2.27. The slope of each segment increases significantly, thus degrading the INL to 2 LSB. The increased slope can be attributed to the nonlinear parasitic capacitance, which varies with supply, thus introducing more error to the estimated capacitance ratio in the RC encoder, i.e., $\mathbf{E}(C_0/C_{\rm U})$. After adjusting $\mathbf{E}(C_0/C_{\rm U})$, the slopes are essentially corrected, and so the INL drops to 1.2 LSB, which is 0.12% of the full range and the same as the DTC INL under 1.1 V in [62].

One may question the advantage of TAU given its apparent lack of superiority in the INL characteristics over those in the best-in-class DTCs, such as [62]. Actually, the INLs presented so far were simulated under ideal constant supply conditions and reflect only the 'static' nonlinearity. In practice, the DTC delay is easily disturbed by instantaneous supply fluctuations and thus suffers from certain 'dynamic' nonlinearity. For this reason, [18] [28] [63] report significant efforts on stabilizing the supply.



Figure 2.28: (Equivalent) delay error under sinusoidal supply fluctuating between 1 V and 1.1 V: (a) Estimated from a virtual DTC emulating the resolution drift behavior in [62], and (b) simulation results of TAU.

This supply-related nonlinearity issue is examined with a 10-b virtual DTC example emulating the resolution drift behavior in [62]. The reported DTC resolution changes (becomes finer) by 14% when the supply increases from 1 V to 1.1 V. Therefore, if the estimated DTC gain, $K_{\text{DTC}}$, is not adjusted accordingly, the DTC output delay would exhibit an error that is linearly proportional to the expected value. Figure 2.28 (a) shows the trend lines of the expected delay error of this reference DTC under the supply of 1 V and 1.1 V, with the expected $K_{\text{DTC}}$ (used for converting the expected delay to the DTC control word) frozen at the mean value of these two cases. The two trend lines are characterized under a test bench similar to Fig. 2.25 (a), so they converge to 0 at $\phi_{\rm R,frac} = 1$, corresponding to the expected delay of 0, and reach the maximum amplitude at $\phi_{\rm R,frac} = 0$. One may doubt the validity of freezing the estimated $K_{\rm DTC}$ since a background calibration can constantly track the $K_{\rm DTC}$ drift. However, the calibration might be too slow to respond to fast supply disturbances. Fig. 2.28 (a) shows a case with such a fast supply ripple, which sinusoidally fluctuates between 1 V and 1.1 V, in synchronicity with $\phi_{\rm R,frac}$. The corresponding delay error of the virtual DTC will oscillate between the two aforementioned trend lines, and the peak-to-peak error can be up to 140 LSB.
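The order of magnitude of this peak-to-peak figure can be reproduced with a simple gain-error model, sketched below in Python. As a simplifying assumption, the 14% resolution drift is split symmetrically as ±7% around the frozen $K_{\rm DTC}$; the function name is hypothetical, and the model ignores any static INL of the DTC.

```python
def dtc_error_lsb(phi_frac, gain_dev, n_bits=10):
    """Delay error in LSB when the true DTC resolution deviates from the
    frozen K_DTC estimate by the relative amount gain_dev.

    The expected delay is (1 - phi_frac) of the full range, so the error
    vanishes at phi_frac = 1 and peaks at phi_frac = 0.
    """
    return gain_dev * (1.0 - phi_frac) * 2 ** n_bits


# Assumption: the 14 % drift between 1 V and 1.1 V is split as +/-7 %
# around the frozen mean K_DTC.
upper = dtc_error_lsb(0.0, +0.07)
lower = dtc_error_lsb(0.0, -0.07)
print(f"peak-to-peak error at phi = 0: {upper - lower:.0f} LSB")
```

Under this assumption the two trend lines are about 143 LSB apart at $\phi_{\rm R,frac} = 0$, in line with the roughly 140 LSB quoted above.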

For comparison, the $\Delta t_{\rm S}$ cancellation error of the TAU is simulated under the same supply ripple condition. According to Fig. 2.28 (b), the peak-to-peak error is merely ~8 LSB. This benefits from the operating principle of scaling the 'golden' time base, and indicates that the TAU would show stronger immunity to aggressors and better 'dynamic' linearity compared with the DTC. One may wonder why the cancellation error of the TAU in the face of the supply ripple exceeds the boundaries set by the INL curves under the stable supply cases (i.e., at 1 V and 1.1 V). This comes from our specific WTR implementation, where the bottom plates of the SC units are connected to $V_{\rm DD}$ (see Fig. 2.17). The supply ripple will affect the internal voltage of the WTRs (i.e., $V_{\rm C}$) through the conducting SC units and parasitic switch capacitance, thus ultimately degrading the INL.

## 2.6.3 INL Calibration

According to Fig. 2.25 (b), the INL of the TAU is dominated by the coarse-tuning offsets and fine-tuning slope, correlated with $N_{\rm R}$ and $\phi_{\rm CT}$ in Fig. 2.20, respectively. To combat the INL degradation relevant to these two sources, a piecewise calibration emulating [64] is added to supplement the RC encoder. The calibration operates when the PLL is locked by observing the TDC output, i.e., $D_{\rm TDC}$. As shown in Fig. 2.29, the calibration consists of two parallel paths: one pre-distorts the offset correlated with each possible $N_{\rm R}$ value, and the other combats the slope relevant to $\phi_{\rm CT}$.

Figure 2.29: Foreground piecewise calibration for the INL of TAU: (a) Offset calibration for each coarse-tuning segment, (b) calibration for the fine-tuning slope.

Figure 2.29 (a) details the offset calibration. The offset related to each $N_{\rm R}$ value affects $D_{\rm TDC}$ (read in the subsequent FREF cycle), and thus can be estimated by accumulating the corresponding $D_{\rm TDC}$. This is similar to that in [28]. $\mu_{\rm RT}$ here is a constant controlling the accumulation speed. By subtracting the estimated offsets, i.e., OS0 ~ OS7, from the fine-tuning path, the effects of the coarse-tuning offsets can be compensated. Prior to the subtraction, the estimated offsets are rounded to the same resolution as $\phi_{\rm CT}$ by a $\Delta\Sigma$-modulator to avoid the fine resolution of the offsets being masked by the quantization error of the fine-tuning path. Meanwhile, a constant positive phase $\phi_{\text{const}}$ is also added in conjunction with the rounded OS to prevent underflow of the fine-tuning path due to a potentially negative input. Similar to the $3T_{\rm CKV}/8$ offset for the metastability mitigation, the extra $\phi_{\rm const}$ would also shift the $\Delta t_{\rm S}$ range without causing functional issues. While the calibration is running, the OS codes would constantly update until the average $D_{\rm TDC}$ corresponding to each $N_{\rm R}$ becomes zero. This indicates that the influences of offsets have been well-compensated, thus becoming invisible to the PLL. Note that, since only the relative offsets between the OS codes matter in terms of INL, one specific OS code (arbitrarily chosen) is frozen to 0, thus avoiding a global drift in all the estimated results.

Figure 2.29 (b) depicts the fine-tuning slope calibration, which detects the slope error by correlating $D_{\text{TDC}}$ with the fine-tuning target $\phi_{\text{CT}}$ (i.e., accumulating their product), similar to the LMS calibration for $K_{\text{DTC}}$ in [17]. $\mu_{\text{CT}}$ here is a constant controlling the accumulation speed. The correlation output $N_{\text{URT}}$ is used to correct the capacitance ratio of $C_0/C_{\text{U}}$, which significantly influences the fine-tuning slope. Instead of updating the estimated $\mathbf{E}(C_0/C_{\text{U}})$, which may require a long word-length and increased hardware cost, we directly tune the physical ratio of $C_0/C_{\text{U}}$: the nominal fixed capacitor $C_0$ is split into a 'real' fixed $C'_0$ and an SC-bank with the unit capacitance of $C'_{\text{U}}$. $N_{\text{URT}}$ is dithered by a $\Delta\Sigma$-modulator before adjusting the number of active $C'_{\text{U}}$ to tune the 'real' capacitance ratio $C_0/C_{\text{U}}$ until the slope error vanishes.



Figure 2.30: Illustration of the FCW<sub>frac</sub> impact on the foreground INL calibration:  $N_{\rm R}$  and  $\phi_{\rm CT}$  steps at FCW<sub>frac</sub>  $\approx 5/8 + 3/32$ .

Since both calibration paths rely on the same $D_{\text{TDC}}$, they will likely interfere with each other given that both $N_{\text{R}}$ and $\phi_{\text{CT}}$ change at a very slow rate when the PLL operates in a near-integer channel. The mutual disturbance is attributed to the indistinguishable $D_{\text{TDC}}$ contributions from the offset and slope errors, and can be suppressed by dithering $\Delta t_{\text{S}}$. For example, high-order $\Delta\Sigma$-modulators that are commonly used in PLLs to shape the multi-modulus dividers' quantization noise can dither $\Delta t_{\text{S}}$. However, using a high-order $\Delta\Sigma$-modulator increases the $\Delta t_{\text{S}}$ range and degrades noise performance [4]. To minimize the mutual interference without introducing any noise penalty, the offset and slope calibrations are performed in the foreground at specific large fractional FCWs, e.g.,

$$\mathrm{FCW}_{\rm frac} = \frac{5}{8} + \frac{3}{32} + \epsilon, \tag{2.35}$$

where $\epsilon$ is a tiny fractional number helping $\phi_{\rm R,frac}$ to traverse all the possible codes. As sketched in Fig. 2.30, 5/8 in FCW<sub>frac</sub> allows $N_{\rm R}$ to circulate fast in its full range. Consequently, the $D_{\rm TDC}$ component due to the $N_{\rm R}$-related offsets quickly fluctuates around its mean value, and thus can be easily filtered out by averaging. This avoids an $N_{\rm R}$-dependent disturbance on the settling behavior of the slope calibration. Similarly, 3/32 in FCW<sub>frac</sub> mitigates the disturbance during the offset calibrations due to the slope error. After the calibrations settle, the results are frozen and used for nearby channels. The absence of background calibration would not significantly degrade the TAU's performance since it is insensitive to voltage and temperature variations (due to the utilization of the 'golden' time base).
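The interplay of the two loops at the FCW<sub>frac</sub> of (2.35) can be illustrated with a toy behavioral model in Python. This is a sketch under strong assumptions and not the implemented hardware: $D_{\rm TDC}$ is modeled as the uncorrected segment offset plus a slope error proportional to $\phi_{\rm CT}$, the TDC quantization, $\Delta\Sigma$ rounding, $\phi_{\rm const}$ and the PLL loop are all omitted, `slope_est` stands in for the effect of $N_{\rm URT}$, and $\epsilon = 1/1024$, the step sizes, and the error values are made up.

```python
def run_calibration(offsets, slope_err, n_cycles=100_000,
                    mu_rt=0.01, mu_ct=1.0, fcw_frac=5/8 + 3/32 + 1/1024):
    """Toy model of the offset and slope loops of Fig. 2.29.

    offsets:   assumed true offset of each coarse segment (offsets[0] = 0,
               because OS0 is the frozen reference code)
    slope_err: assumed true fine-tuning slope error
    Returns the estimated offsets OS0..OS7 and the estimated slope error.
    """
    os_est = [0.0] * 8
    slope_est = 0.0
    phi = 0.0
    for _ in range(n_cycles):
        phi = (phi + fcw_frac) % 1.0          # phi_R,frac ramp
        n_r = int(8 * phi)                    # coarse code
        phi_ct = phi - n_r / 8                # fine residue
        # Assumed residue seen by the TDC after the current corrections.
        d_tdc = (offsets[n_r] - os_est[n_r]) + (slope_err - slope_est) * phi_ct
        if n_r != 0:                          # OS0 stays frozen at 0
            os_est[n_r] += mu_rt * d_tdc      # offset accumulation
        slope_est += mu_ct * d_tdc * phi_ct   # LMS-style correlation
    return os_est, slope_est


true_offsets = [0.0, 0.02, -0.01, 0.03, 0.0, -0.02, 0.01, 0.015]
os_est, slope_est = run_calibration(true_offsets, slope_err=0.2)
print([round(x, 4) for x in os_est], round(slope_est, 4))
```

In this simplified setting both estimates settle to the assumed errors, because the fast circulation of $N_{\rm R}$ and $\phi_{\rm CT}$ lets each loop average out the contribution of the other.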

The aforementioned calibrations are simulated using a PLL model (emulating the implemented one) with exaggerated RC mismatch to verify their performance. As shown in Fig. 2.31, the offset compensation codes, i.e., the OS codes in subfigure (a), and the slope compensation code, i.e.,  $N_{\text{URT}}$  in subfigure (b), settle within around 100 µs. After applying the calibration results, the TAU's INL with the quantization error (QE)<sup>1</sup> is suppressed from >20 LSB to around 1 LSB, as shown in Fig. 2.31(c), proving the effectiveness of the proposed calibrations.

<sup>1</sup>In the case with RC mismatch, the practical  $C_0/C_U$  deviates from the expected  $\mathbf{E}(C_0/C_U)$  to compensate for the mismatch-induced slope error. Consequently, the realistic QE differs from the value predicted using  $\mathbf{E}(C_0/C_U)$  and therefore cannot be perfectly eliminated. This is the reason why the QE is preserved here.



Figure 2.31: Simulation results of the TAU's INL calibrations: (a) Convergence curves of the offset codes, i.e., OS1,..., OS7 in Fig. 2.29. (b) Convergence curve of  $N_{\text{URT}}$  tackling the error in the fine-tuning slope. (c) Comparison of the INL with QE before and after calibrations.



Figure 2.32: (a) Chip micrograph and (b) power consumption breakdown.

# 2.7 Measurement Results

The proposed PLL is fabricated in 40-nm CMOS and occupies an active area of  $0.31 \text{ mm}^2$  [excluding output drivers and debugging SRAMs, see Fig. 2.32 (a)]. With a reference clock of 40 MHz, it synthesizes 2.6 GHz to 4.1 GHz. Fig. 2.32 (b) shows its power breakdown at 2668.2 MHz. The overall PLL consumes 3.48 mW, which is dominated by the DCO and its buffer, costing 2.3 mW at a 1.1 V supply. All other blocks are supplied with 1.0 V. The time-mode parts (e.g., TAU, TDC, and the clock divider for DCO dithering) and the digital logic consume 0.65 mW and 0.52 mW, respectively.

Fig. 2.33 shows the measured phase noise (PN) at 2668.2 MHz. The integrated rms jitter (integrated from 10 kHz to 40 MHz, and including all spurs) is 182 fs, almost identical to that in the nearby integer-N channel (177 fs at 2640 MHz). Considering the total power consumption of 3.48 mW, this PLL achieves a jitter-power FoM ([65]) of -249.4 dB. Fig. 2.34 compares



Figure 2.33: Measured PN at 2668.2 MHz.



Figure 2.34: Comparison between the measured PN in Fig. 2.33 and its s-domain model prediction. In the jitter breakdown table,  $\sigma_{\rm KT/C}$  is estimated with  $C_0$  of 1.6 pF and discharge slope of 33.8  $\mu$ V/ps; others are obtained by simulation. These jitter contributions are combined as per (2.32) to estimate the TAU composite noise.

the measured PN with its s-domain prediction, indicating a tight agreement at offset frequencies above 50 kHz. In this s-domain model, the input-referred jitter of the TAU is 402 fs, estimated by simulating the jitter of each sub-circuit and combining the contributors via (2.32). The corresponding contribution to phase noise is obtained by an amended version of (2.33) that combines



Figure 2.35: Measured rms jitter (integrated from 10 kHz to 40 MHz) across carrier frequencies with fractional FCW (FCW<sub>frac</sub>) of 0.7.

the sub-blocks' noise in the spectral domain. The noise contribution of each sub-circuit is also listed in Fig. 2.34. Fig. 2.35 shows the integrated rms jitter across frequencies with the same fractional FCW as 2668.2 MHz, i.e., FCW<sub>frac</sub>  $\approx 0.7$ . The measured jitter degrades as the frequency increases. We suspect that the dramatic degradation between 3300 MHz and 3800 MHz is caused by the nearby inductors in this SoC as well as the unoptimized implementation of the DCO switched-capacitor tuning banks that support the wideband direct phase modulation [66].

To demonstrate the TAU's advantages in suppressing fractional spurs, the PLL output spectrum is measured in a near-integer channel of 2680.04 MHz (FCW  $\approx 67.00025$ ). According to Fig. 2.36 (a), the worst-case fractional spur is -44.67 dBc. Note that these spurs are measured *before* any TAU calibration, e.g., for global gain and integral nonlinearity (INL). This compares favorably with the literature reports of worst-case fractional spurs in DTC-based PLLs that adopt only gain calibration but no further DTC linearity enhancement techniques, e.g., -37 dBc in [44], and -42 dBc in [17]. Our fundamental design choice of adopting  $T_{\rm CKV}$ , the PLL carrier period, as the basis for the time offset cancellation is thus validated. This 'golden' base automatically scales the global gain of the TAU transfer function, thus avoiding any need for the corresponding calibration.

The fractional spurs in Fig. 2.36 (a) are dominated by the TAU's INL, chiefly due to the coarse-tuning non-ideality and the gain error in fine-tuning. After compensating the INL with the piecewise calibration, the worst-case fractional spur becomes  $-60.74 \,\mathrm{dBc}$  at 50 kHz, the 5<sup>th</sup> fractional spur in Fig. 2.36 (b). In this scenario, the integrated rms jitter is 236 fs (shown in Fig. 2.37). The worst-case fractional spur levels and integrated rms jitter are swept across the fractional channels close to 2680 MHz. As shown in Fig. 2.38, all the spur levels are below  $-59 \,\mathrm{dBc}$ .

Since the TAU utilizes the time basis of  $T_{\rm CKV}$ , which is constantly tracked by the PLL, the TAU-based PLL is expected to exhibit inherent resilience to







Figure 2.37: Measured PN at a near integer channel of 2680.04 MHz (FCW  $\approx$  67.00025), corresponding to the condition of Fig. 2.36(b).



Figure 2.38: The worst-case fractional spur level and the corresponding integrated rms jitter versus fractional FCW (FCW<sub>frac</sub>), with integer FCW fixed at 67.

environmental changes, i.e., supply and temperature drifts. To prove this, we froze the TAU's INL calibration setting, then measured the spur levels under certain environmental changes: From Fig. 2.36 (b) to Fig. 2.36 (c), the TAU's supply was increased from 1.0 V to 1.1 V, and the worst spur remains at -54.37 dBc. From Fig. 2.36 (c) to Fig. 2.36 (d), the ambient temperature was increased from  $19 \,^{\circ}\text{C}$  to  $85 \,^{\circ}\text{C}$ , and the worst spur level is still below  $-51.7 \,\mathrm{dBc}$ . These are noteworthy improvements compared with the DTC-based counterparts, which would generate substantial spurs if their transfer function drift could not be compensated. For example, [62] reported a 14% DTC resolution drift when its supply increased from 1.0 V to 1.1 V. As measured in [44], a 10% DTC gain error can cause an in-band fractional spur higher than  $-30 \, \text{dBc}$ .

Table 2.1 summarizes and compares the performance of the proposed PLL with the state-of-the-art fractional-N PLLs. This work achieves a competitive spur level below  $-59 \,\mathrm{dBc}$ , and a state-of-the-art trade-off between jitter and power, i.e., an FoM of -249.4 dB under the low-power constraint.
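The FoM figures quoted above can be reproduced from the measured jitter and power. The following is a minimal sketch, assuming the FoM and FoM<sub>N</sub> definitions given in the footnotes of Table 2.1; the function names are hypothetical and chosen for illustration only.

```python
import math


def jitter_power_fom_db(jitter_s, power_w):
    """FoM = 10*log10[(jitter / 1 s)^2 * (power / 1 mW)]."""
    return 10.0 * math.log10(jitter_s ** 2 * (power_w / 1e-3))


def normalized_fom_db(jitter_s, power_w, f_osc_hz, f_ref_hz):
    """FoM_N additionally divides the argument by the ratio f_osc / f_ref."""
    ratio_db = 10.0 * math.log10(f_osc_hz / f_ref_hz)
    return jitter_power_fom_db(jitter_s, power_w) - ratio_db


# Measured values reported in this section: 182 fs rms jitter, 3.48 mW,
# 2668.2 MHz carrier, 40 MHz reference.
fom = jitter_power_fom_db(182e-15, 3.48e-3)
fom_n = normalized_fom_db(182e-15, 3.48e-3, 2668.2e6, 40e6)
print(f"FoM = {fom:.2f} dB, FoM_N = {fom_n:.2f} dB")
```

With these inputs the sketch lands within about 0.1 dB of the tabulated FoM and FoM<sub>N</sub>; the small residual depends on which rounded jitter, power and frequency values are used.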

# 2.8 Conclusions

This chapter introduces a fractional-N PLL based on the proposed time-mode arithmetic unit (TAU), which extracts the phase error by calculating a weighted sum of its time-domain inputs derived from timestamps of the reference and DCO clocks. The prototype PLL demonstrates low spur levels, which are robust under supply and temperature drift. This spur performance benefits from the phase-error-extraction strategy of scaling the 'golden' time base, i.e., the DCO period, to cancel the phase detector input, which automatically corrects the TAU's transfer function. The methodology-level improvement indicates a potential for exploring this new phase-detection category for low-spur clock generation.

Table 2.1: Comparison with state-of-the-art fractional-N PLLs

|                                               | This work     | ISSCC'16 [19]        | JSSC'18 [67]   | JSSC'20 [63]          | JSSC'21 [15]   | VLSI'21 [68]         | JSSC'21 [4]          | JSSC'22 [69]          | ISSCC'21 [70]  |
|-----------------------------------------------|---------------|----------------------|----------------|-----------------------|----------------|----------------------|----------------------|-----------------------|----------------|
| Process (nm)                                  | 40            | 28                   | 65             | 28                    | 130            | 65                   | 14                   | 28                    | 65             |
| Phase detection strategy                      | **TAU + TDC** | DTC + SPD<sup>1</sup> | DTC + TA + TDC | DTC + BBPD<sup>2</sup> | Voltage domain | DTC + SPD<sup>1</sup> | DTC + SPD<sup>1</sup> | DTC + BBPD<sup>2</sup> | Voltage domain |
| Ref. freq. (MHz)                              | 40            | 40                   | 26 × 2         | 500                   | 80             | 50                   | 76.8 × 2             | 250                   | 150            |
| Osc. freq. (GHz)                              | 2.68          | 3.88                 | 2.44           | 13.5                  | 3.36           | 3.3                  | 6.2                  | 13                    | 15             |
| Int. rms jitter (fs)                          | 182           | 159                  | 535            | 66.2                  | 101            | 263                  | 96.3                 | 99.6                  | 104            |
| Worst frac. spur (dBc)                        | -59           | -57.5<sup>3</sup>    | -56            | -61                   | -56            | -53                  | -68<sup>3</sup>      | -51.1                 | -61            |
| Ref. spur (dBc)                               | -73.5         | -81.5<sup>3</sup>    | -72            | -80.1                 | -79            | -80                  | -63.6<sup>3</sup>    | -73.2                 | NA             |
| Built-in resilience to supply and temperature | **Yes**       | No                   | No             | No                    | Yes            | No                   | No                   | No                    | No             |
| Power (mW)                                    | 3.5           | 8.2                  | 0.98           | 19.8                  | 9.2            | 4.6                  | 8.2                  | 10.8                  | 7.3            |
| FoM<sup>4</sup> (dB)                          | -249.4        | -246.8               | -246           | -250.6                | -250.3         | -246                 | -251.2               | -249.7                | -251           |
| FoM<sub>N</sub><sup>5</sup> (dB)              | -267.7        | -266.7               | -265.7         | -264.9                | -266.5         | -264.2               | -270.3               | -266.9                | -271           |
| Active area (mm<sup>2</sup>)                  | 0.31          | 0.3                  | 0.23           | 0.17                  | 0.27           | 0.48                 | 0.31                 | 0.21                  | 0.21           |

<sup>1</sup>Sampling phase detector. <sup>2</sup>Bang-bang phase detector. <sup>3</sup>Normalized to osc. frequency.

<sup>4</sup> $\text{FoM} = 10 \cdot \log_{10} \left[ (\text{jitter}/1\,\text{s})^2 \cdot \text{power}/1\,\text{mW} \right]$ 

<sup>5</sup> $\text{FoM}_{\rm N} = 10 \cdot \log_{10} \left[ (\text{jitter}/1\,\text{s})^{2} \cdot \text{power}/1\,\text{mW} / (\text{osc. freq.}/\text{ref. freq.}) \right]$ 


# Chapter 3: Canceling Fundamental Fractional Spurs Arising from Self-Interference

While carrying out the measurements of the PLL chip described in Chapter 2, we noticed some unexpected phenomena indicating that the fundamental fractional spurs are no longer dominated by the TAU nonlinearity but rather by mutual interference between the reference clock and DCO circuitry through various parasitic coupling paths. Some countermeasures were taken in the revised chip to eliminate or at least attenuate these coupling paths, but unfortunately they did not prove entirely sufficient. Therefore, we have developed a digitally intensive strategy that reuses the hardware compensating the TAU nonlinearity to cancel the fundamental fractional spurs arising from any potential parasitic self-interference mechanisms.

This chapter<sup>1</sup> summarizes the experimental routine we have developed to distinguish the spurs caused by various interference mechanisms and proposes a method to cancel them. We specifically handle the spurs raised by the self-interference, i.e., interference sources originating from *within* the PLL itself, e.g., parasitic coupling between the reference clock and DCO. The chapter is organized as follows: Section 3.1 discusses the characteristics of fractional spurs raised by different mechanisms, thus providing the foundation to distinguish the mechanisms and to develop the proposed spur cancelation method. Section 3.2 analyzes the features of self-interference, paving the way for the spur-cancelation investigation in the subsequent two sections. Section 3.3 experimentally verifies the principle underlying the spur cancelation. Section 3.4 explains the details of the digitally intensive method to cancel the spurs caused by the self-interference to the DCO. Section 3.5 demonstrates the cancelation performance of the proposed method. Finally, Section 3.6 concludes this chapter.

<sup>1</sup>Main content of this chapter is to be submitted to IEEE Journal of Solid-State Circuits.

#### 3.1 Frequency-Dependent Behavior of Spurs

Figure 3.1 depicts a simplified diagram of a digital type-II PLL, which generates a variable clock (CKV) at frequency  $f_0$  according to a reference clock (FREF) with frequency  $f_{\text{REF}}$ . The frequency multiplication ratio  $f_0/f_{\text{REF}}$  is defined by the frequency control word (FCW). During the PLL operation, the phase detector constantly samples the CKV phase at the FREF timing grid, then compares it with the normalized prediction,  $\phi_{\text{R}}$ , which is obtained by accumulating FCW and consists of a fractional part  $\phi_{\text{R,frac}}$  and an integer part  $\phi_{\text{R,int}}$ , in order to extract the phase error of CKV. The detected error first feeds into the digital loop filter (DLF), consisting of the parallel proportional and integral paths, respectively scaled by coefficients  $\alpha$  and  $\rho$ . Then, the filtered error is denormalized into the oscillator tuning word (OTW) by  $f_{\text{REF}}/\hat{K}_{\text{DCO}}$ , where  $\hat{K}_{\text{DCO}}$  is the estimated gain (i.e., step size) of the digitally controlled oscillator (DCO). Finally, OTW tunes the DCO frequency to correct the phase error on the output clock CKV.
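The loop described above can be illustrated with a small behavioral model. This is a minimal sketch, not the implemented design: it assumes an ideal phase detector without quantization, a perfectly estimated DCO gain (so the denormalization cancels), and phases expressed in CKV cycles. The function name, the `free_run_fcw` parameter (the free-running DCO frequency in units of the reference frequency) and the loop coefficients are hypothetical.

```python
def simulate_type2_pll(fcw, alpha, rho, free_run_fcw, n_cycles):
    """Return the detected phase error (in CKV cycles) at each FREF edge."""
    phi_r = 0.0   # reference phase prediction: accumulated FCW
    phi_v = 0.0   # CKV phase sampled on the FREF grid
    integ = 0.0   # state of the integral path of the loop filter
    errors = []
    for _ in range(n_cycles):
        err = phi_r - phi_v                # phase detector output
        integ += rho * err                 # integral path
        tune = alpha * err + integ         # proportional + integral paths
        phi_r += fcw                       # accumulate FCW once per FREF cycle
        phi_v += free_run_fcw + tune       # DCO integrates its tuned frequency
        errors.append(err)
    return errors


# Hypothetical settings: target FCW of 66.705 (2668.2 MHz from 40 MHz),
# a DCO that free-runs at 66.0 * f_REF, and alpha / (2 * sqrt(rho)) = 1.
errs = simulate_type2_pll(66.705, 2.0 ** -5, 2.0 ** -12, 66.0, 5000)
print(max(abs(e) for e in errs), abs(errs[-1]))
```

After the transient, the integral path holds the initial frequency offset and the detected phase error decays toward zero, which is the type-II behavior that the phase-domain model relies on.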

To assist with analyzing the PLL behavior in the face of disturbances, Fig. 3.1(b) sketches the phase-domain model of subfigure (a). Signals  $\phi_{\text{REF}}$  and  $\phi_{\text{V}}$  are respectively the normalized<sup>1</sup> excess phase of FREF and CKV, which are additional phase departure components from their respective carrier phase<sup>2</sup>. All phase signals in this model refer to the CKV period, except for  $\phi_{\text{REF}}$ , which refers to the FREF period. Consequently,  $\phi_{\text{REF}}$  is rescaled by multiplying it by FCW before subtracting  $\phi_{\text{V}}$ . In addition,  $\hat{K}_{\text{DCO}}$  is assumed to be estimated well enough to cancel out perfectly within the DCO resolution, and is therefore invisible in this phase-domain model.

<sup>1</sup>Generally, in this thesis,  $\phi$  represents a normalized phase and  $\theta$  represents a  $2\pi$ -periodic phase.

<sup>2</sup>Due to this consideration, the  $\phi_{\rm R}$ -related component, which predicts the ideal CKV carrier phase in Fig. 3.1(a), is not visible in Fig. 3.1(b).

Figure 3.1: Block diagram (a) and phase domain model (b) of a type-II PLL.

Generally, a PLL suffers from two types of interference mechanisms, which generate spurs in the DCO output spectrum under the natural condition that the corresponding disturbance signals are periodic. The first type may originate in the circuitry along the reference clock (FREF) path, but ultimately injects disturbance into the loop through the phase detector, as  $\phi_{\rm i,IB}$  in Fig. 3.1(b). The transfer function from  $\phi_{\rm i,IB}$  to  $\phi_{\rm V}$  reads as

$$\frac{\phi_{\rm V}(s)}{\phi_{\rm i,IB}(s)} = \frac{\alpha \cdot s/f_{\rm REF} + \rho}{(s/f_{\rm REF})^2 + \alpha \cdot s/f_{\rm REF} + \rho},\tag{3.1}$$

which is low-pass and indicates that the  $\phi_{i,IB}$ -induced spurs can be attenuated by lowering the PLL bandwidth, more specifically by decreasing  $\alpha$ . Therefore, such interference is named "in-band interference" in this work. An example of this would be an interference signal that superimposes on FREF and disturbs the FREF clock buffer's output delay [71]. Another example would be a supply ripple which modulates the output time of a digital-to-time converter (DTC) [18], a sub-block inside the phase detector. From a behavioral perspective, the nonlinearity of the phase detection blocks (e.g., DTC nonlinearity [28]) disturbs the PLL in the same way as  $\phi_{i,IB}$  would. Thus, this can also be categorized as a source of  $\phi_{i,IB}$  for conceptual convenience.

The second type of interference mechanism is the parasitic coupling to the DCO, denoted as  $\phi_{i,DCO}$  in Fig. 3.1(b). Such interference can directly disturb (as a physical mechanism) either the DCO phase or its frequency. However, both types of influence can be time-averaged to a disturbing frequency for the sake of simplifying analysis [72]. Therefore, Fig. 3.1(b) interprets  $\phi_{i,DCO}$  as disturbing the DCO frequency by  $f_{i,DCO}$  which gradually affects  $\phi_V$  by

means of the DCO's phase integration property (described by 1/s). The resulting phase error exhibits a band-pass frequency characteristic according to the following transfer function from  $\phi_{i,DCO}$  to  $\phi_{V}$ , i.e.,

$$\frac{\phi_{\rm V}(s)}{\phi_{\rm i,DCO}(s)} = \frac{1}{\alpha + (s/f_{\rm REF} + \rho \cdot f_{\rm REF}/s)}.\tag{3.2}$$

The peak value of this function is  $1/\alpha$  [reached at frequency  $f = \sqrt{\rho} \cdot f_{\rm REF}/(2\pi)$ , where the bracketed term in the denominator of (3.2) vanishes], indicating that the  $\phi_{i,\text{DCO}}$ -induced spurs can be suppressed by increasing  $\alpha$  or, in other words, by widening the PLL bandwidth. This is the opposite trend compared with the spurs raised by  $\phi_{i,\text{IB}}$ . Therefore, these two types of interference-induced spurs can be distinguished by observing how the spur levels change with  $\alpha$  (or generally with the PLL bandwidth).
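The opposite trends of (3.1) and (3.2) with respect to  $\alpha$  can be checked numerically. The following is a minimal sketch that evaluates both transfer functions at  $s = j2\pi f$ ; the loop coefficients and the 1 MHz spur offset are hypothetical values chosen only for illustration.

```python
import math


def h_in_band(f_hz, f_ref_hz, alpha, rho):
    """Eq. (3.1): phi_V / phi_i,IB evaluated at s = j*2*pi*f."""
    x = 1j * 2.0 * math.pi * f_hz / f_ref_hz   # x = s / f_REF
    return (alpha * x + rho) / (x * x + alpha * x + rho)


def h_dco(f_hz, f_ref_hz, alpha, rho):
    """Eq. (3.2): phi_V / phi_i,DCO evaluated at s = j*2*pi*f."""
    x = 1j * 2.0 * math.pi * f_hz / f_ref_hz
    return 1.0 / (alpha + x + rho / x)


F_REF = 40e6          # reference frequency used in this work
RHO = 2.0 ** -12      # hypothetical integral-path coefficient
F_SPUR = 1e6          # hypothetical spur offset frequency
ALPHAS = (2.0 ** -6, 2.0 ** -5, 2.0 ** -4)

ib_levels = [abs(h_in_band(F_SPUR, F_REF, a, RHO)) for a in ALPHAS]
dco_levels = [abs(h_dco(F_SPUR, F_REF, a, RHO)) for a in ALPHAS]
for a, ib, dco in zip(ALPHAS, ib_levels, dco_levels):
    print(f"alpha={a:.5f}  |H_IB|={ib:.3f}  |H_DCO|={dco:.3f}")
```

In this sketch the in-band gain at the chosen offset grows with  $\alpha$  while the DCO-path gain shrinks, which mirrors the bandwidth-sweep test for telling the two interference types apart.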

The above discussion considers  $\phi_{i,IB}$  and  $\phi_{i,DCO}$  independently. However, if  $\phi_{i,IB}$  and  $\phi_{i,DCO}$  originate from synchronized sources, i.e., at the same frequency and with a fixed phase offset,  $\phi_{i,IB}$  and  $\phi_{i,DCO}$  will exhibit a fixed phase and amplitude relationship, e.g.,

$$\phi_{i,\text{IB}}(s) = \lambda \times \phi_{i,\text{DCO}}(s), \qquad (3.3)$$

where  $\lambda$  is a *complex* number. Interestingly, the effects of synchronous  $\phi_{i,\text{IB}}$  and  $\phi_{i,\text{DCO}}$  ultimately imposed on  $\phi_{V}$  may cancel each other at a particular frequency according to

$$\phi_{\rm V}(s) = \frac{\phi_{\rm V}(s)}{\phi_{\rm i,IB}(s)} \phi_{\rm i,IB}(s) + \frac{\phi_{\rm V}(s)}{\phi_{\rm i,DCO}(s)} \phi_{\rm i,DCO}(s) = \frac{(\alpha\lambda + 1) \cdot s/f_{\rm REF} + \lambda\rho}{(s/f_{\rm REF})^2 + \alpha \cdot s/f_{\rm REF} + \rho} \phi_{\rm i,DCO}(s),\tag{3.4}$$

which contains a zero at

$$f_{\rm z} = -\frac{\lambda\rho}{\alpha\lambda + 1} \cdot \frac{f_{\rm REF}}{2\pi}.\tag{3.5}$$

Therefore, if the  $\phi_{i,IB}$  and  $\phi_{i,DCO}$  interference signals individually produce fractional spurs, the combined spur level can be significantly suppressed by exploiting this zero. Its position is influenced by the relative phase and amplitude relationship between  $\phi_{i,IB}$  and  $\phi_{i,DCO}$  (reflected by  $\lambda$ ), and the PLL loop dynamic (reflected by  $\alpha$  and  $\rho$ ).
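The cancelation can be illustrated by solving for the  $\lambda$  that nulls the combined response at a chosen spur offset. This is a minimal sketch that simply sums the two transfer functions (3.1) and (3.2) weighted according to (3.3); the helper names and numerical values are hypothetical.

```python
import math


def combined_response(f_hz, f_ref_hz, alpha, rho, lam):
    """phi_V / phi_i,DCO when phi_i,IB = lam * phi_i,DCO, at s = j*2*pi*f."""
    x = 1j * 2.0 * math.pi * f_hz / f_ref_hz   # x = s / f_REF
    den = x * x + alpha * x + rho
    h_ib = (alpha * x + rho) / den             # Eq. (3.1)
    h_dco = x / den                            # Eq. (3.2), rewritten
    return lam * h_ib + h_dco


def cancelling_lambda(f_hz, f_ref_hz, alpha, rho):
    """Complex lam that places the zero of the combined response at f_hz."""
    x = 1j * 2.0 * math.pi * f_hz / f_ref_hz
    return -x / (alpha * x + rho)


F_REF, ALPHA, RHO = 40e6, 2.0 ** -5, 2.0 ** -12   # hypothetical loop settings
F_SPUR = 10e3                                     # hypothetical spur offset

lam = cancelling_lambda(F_SPUR, F_REF, ALPHA, RHO)
residual = abs(combined_response(F_SPUR, F_REF, ALPHA, RHO, lam))
uncancelled = abs(combined_response(F_SPUR, F_REF, ALPHA, RHO, 0.0))
print(abs(lam), math.degrees(math.atan2(lam.imag, lam.real)), residual)
```

The required  $\lambda$  is generally complex, so both the amplitude and the phase of the added in-band signal have to be set; a magnitude-only adjustment would leave a residual spur.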

#### 3.2 Theory of Synchronous Self-Interference

According to Section 3.1, a large class of spurs can be canceled out provided they are caused by *synchronized* sources. In a locked PLL, most of the self-interference signals, which originate from within the PLL, are synchronized, i.e., each showing a fixed phase offset relative to the  $\phi_{\rm R}$  sequence, or more accurately its wrapped version, the  $\phi_{\rm R,frac}$  sequence, as illustrated in Fig. 3.2.<sup>1</sup> The reason for synchronicity can be understood with an example of an in-band phase error pattern raised by the phase-detection nonlinearity (which is also categorized as an in-band interference from the behavioral perspective): In a PLL as in Fig. 3.1(a), the phase detector usually adopts a digital-to-time converter (DTC) front-end to cancel the deterministic component of the CKV's fractional phase according to  $\phi_{\rm R,frac}$  [73], so that the residue can represent the CKV's random phase error. However, due to the DTC's nonlinearity, a  $\phi_{\rm R,frac}$ -correlated error is added on top of the residue, which behaves as an in-band interference from the PLL behavioral perspective. Naturally, this interference pattern is synchronous with  $\phi_{\rm R,frac}$ .

Figure 3.2: Waveform diagram of the key signals in the interference model in Fig. 3.1. The  $\phi_{i,\text{IB}}$  and  $\phi_{i,\text{DCO}}$  signal patterns are synchronous with the  $\phi_{\text{R,frac}}$  sequence. Note,  $\phi_{0,\text{IB}}$  and  $\phi_{0,\text{DCO}}$  are some constant phase offsets.

In a practical PLL, the DCO and FREF circuitry may interfere with each other through various parasitic coupling paths, thereby creating significant fractional spurs. The interference patterns are also synchronous with  $\phi_{\rm R,frac}$ . This section will discuss the synchronicity and pattern shapes of these mutually interfering signals, paving the way for suppressing the associated spurs.

<sup>1</sup>Note that the *synchronized* phase relationship is more general than the narrow case of clock edge *synchronization*. The former requires the aligned/synchronous clock edges to be constrained by a fixed phase offset, while the latter requires the phase offset to be (nearly) zero.

#### 3.2.1 Synchronous Interference from FREF to DCO

#### 3.2.1.1 Qualitative Analysis of the Interference Pattern and the Resulting Spurs

The waveform diagram in Fig. 3.3 illustrates how FREF can disturb the DCO phase that is embedded in the DCO waveform  $v_{\rm DCO}(t)$  (i.e., before being rectified or sliced to CKV by a DCO buffer). The FREF clock is typically input to the chip as a sinusoidal waveform of lower amplitude, but its edges are then sharpened by an on-chip reference buffer [30] [74], which consumes a large transient current. A tiny portion of the current may be injected into the DCO through various parasitic paths, in the end disturbing the  $v_{\rm DCO}(t)$  waveform and consequently its phase. The injected current  $i_{\rm inj}(t)$  is ideally represented as periodic impulses occurring around the FREF's significant (here, falling) edges. This is because the transient current of the reference buffer, the root cause of  $i_{\rm inj}(t)$ , is predominantly consumed by the transistor associated with the significant FREF edge, whose size is deliberately increased to minimize the jitter degradation [75]. Although the magnitude of the  $i_{\rm inj}(t)$  impulses is the same at each FREF cycle, the impact on the DCO phase varies and can be estimated by the DCO's impulse-sensitivity function (ISF), represented by the  $2\pi$ -periodic  $\Gamma[\theta_V(t)]$ , where  $\theta_V(t)$  is the instantaneous DCO phase. Because the PLL constantly tracks the DCO phase, the phase disturbance due to  $i_{\rm inj}(t)$  cannot grow excessively large. Consequently,  $\phi_{\rm R,frac}$  can always reliably represent the DCO phase at the FREF grid, and the phase disturbance pattern resembles the  $\Gamma(2\pi\phi_{\rm R,frac}[n])$  sequence.

Considering  $\phi_{\text{R,frac}}$  is generated by accumulating FCW at the FREF rate [see Fig. 3.1(a)], the fluctuation frequency of  $\phi_{\text{R,frac}}$  can be precisely reconstructed with FCW<sub>frac</sub>, the fractional part of FCW, i.e., FCW<sub>frac</sub>  $\cdot f_{\text{REF}}$  or  $(1 - \text{FCW}_{\text{frac}}) \cdot f_{\text{REF}}$  (if the range of  $0 \sim f_{\text{REF}}$  is considered). Consequently, the DCO phase disturbance pattern resembling  $\Gamma(2\pi\phi_{\text{R,frac}}[n])$  also fluctuates at the same frequencies, resulting in fractional spurs at the offset frequencies equal to or of integer multiples of FCW<sub>frac</sub>  $\cdot f_{\text{REF}}$  and  $(1 - \text{FCW}_{\text{frac}}) \cdot f_{\text{REF}}$ , as shown in Fig. 3.4; the spurs at higher-order harmonics are ignored for simplicity. Interestingly, the solid-line spurs at the offsets of  $-\text{FCW}_{\text{frac}} \cdot f_{\text{REF}}$  and  $(1 - \text{FCW}_{\text{frac}}) \cdot f_{\text{REF}}$  (relative to the carrier at FCW  $\cdot f_{\text{REF}}$ ) are located exactly at the absolute FREF harmonics, i.e., FCW<sub>int</sub>  $\cdot f_{\text{REF}}$ , and at (FCW<sub>int</sub> + 1)  $\cdot f_{\text{REF}}$ . Consequently, these spurs may be intuitively attributed to the disturbance of FREF harmonics, as in [76].
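The fluctuation frequency of the disturbance pattern can be illustrated with a short numerical experiment. This is a minimal sketch under two stated assumptions: the ISF is taken as a pure sinusoid (a stand-in for  $\Gamma$ ), and the channel has a hypothetical FCW of  $67 + 3/16$ . The function names are illustrative only.

```python
import cmath
import math
from fractions import Fraction


def isf_kick_sequence(fcw, n_samples):
    """Return sin(2*pi*phi_R,frac[n]), a sinusoidal stand-in for the ISF pattern."""
    phi_r = Fraction(0)
    seq = []
    for _ in range(n_samples):
        phi_frac = phi_r - math.floor(phi_r)       # exact wrapped phase
        seq.append(math.sin(2.0 * math.pi * float(phi_frac)))
        phi_r += fcw                               # accumulate FCW at the FREF rate
    return seq


def dominant_bin(seq):
    """Index in 0..N/2 of the largest DFT magnitude (plain O(N^2) DFT)."""
    n = len(seq)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(seq))
        mags.append(abs(acc))
    return max(range(len(mags)), key=mags.__getitem__)


F_REF = 40e6
FCW = Fraction(67) + Fraction(3, 16)   # hypothetical channel, FCW_frac = 3/16
N = 64                                 # whole number of pattern periods

k_peak = dominant_bin(isf_kick_sequence(FCW, N))
f_pattern = k_peak / N * F_REF
print(k_peak, f_pattern)
```

The dominant bin maps to FCW<sub>frac</sub>  $\cdot f_{\rm REF}$ ; because the sequence is real and sampled at  $f_{\rm REF}$ , the same tone also appears at  $(1 - \text{FCW}_{\rm frac}) \cdot f_{\rm REF}$  when the full range up to  $f_{\rm REF}$  is considered.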

Since the fractional spurs closer to the carrier tend to be stronger [due to the lower suppression by the PLL dynamics, e.g., the low-pass filtering in (3.1) and the band-pass filtering in (3.2)], this work focuses on the spurs at the lower offset frequency, i.e. either  $FCW_{frac} \cdot f_{REF}$  or  $(1 - FCW_{frac}) \cdot f_{REF}$ .



Figure 3.3: Waveforms illustrating how FREF events can disturb the DCO phase that is embedded in the waveform  $v_{\text{DCO}}(t)$ . This is by means of injecting current  $i_{\text{inj}}(t)$  into a locked PLL.

In other words, we concentrate on the *fundamental* fractional spurs at the offset frequency of  $|FCW_{frac,s}| \cdot f_{REF}$ , where  $FCW_{frac,s}$  is the signed fractional FCW and equals the difference between FCW and its closest integer, i.e.,

$$FCW_{frac,s} = FCW - \lfloor FCW \rceil.\tag{3.6}$$
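As a small worked example of the signed fractional FCW, the sketch below computes the fundamental fractional-spur offset for two channels mentioned in Chapter 2. The function names are hypothetical, and rounding to the nearest integer is implemented with `floor(x + 0.5)` as one possible convention.

```python
import math


def signed_fractional_fcw(fcw):
    """FCW minus its nearest integer; the result lies in [-0.5, 0.5)."""
    return fcw - math.floor(fcw + 0.5)


def fundamental_spur_offset_hz(fcw, f_ref_hz):
    """Offset frequency of the fundamental fractional spur, |FCW_frac,s| * f_REF."""
    return abs(signed_fractional_fcw(fcw)) * f_ref_hz


F_REF = 40e6
offset_mid = fundamental_spur_offset_hz(2668.2e6 / F_REF, F_REF)   # FCW = 66.705
offset_near_int = fundamental_spur_offset_hz(67.00025, F_REF)      # near-integer
print(signed_fractional_fcw(2668.2e6 / F_REF), offset_mid, offset_near_int)
```

For FCW = 66.705 the nearest integer is 67, so the signed fraction is negative and the fundamental spur sits 11.8 MHz from the carrier rather than at 0.705  $\cdot f_{\rm REF}$ ; for FCW  $\approx 67.00025$  the fundamental offset is about 10 kHz.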

#### 3.2.1.2 Quantitative Analysis

To explore the possibility of canceling the fundamental fractional spurs by utilizing the aforementioned zero in (3.4), the waveform of the DCO interference [ $\phi_{\rm i,DCO}$  or  $f_{\rm i,DCO}$  in Fig. 3.1(b)] should first be described mathematically, as a means of assisting with searching for, or even designing, the required in-band anti-interferer [ $\phi_{\rm i,IB}$  in (3.3)]. Hence, the DCO phase perturbation is quantitatively analyzed in the remainder of this subsection.

Figure 3.4: PLL's output spectrum with spurs raised by the interference coupled from FREF.

The  $2\pi$ -periodic *total* phase of the DCO is represented as

$$\theta_{\rm V} = 2\pi f_0 t + \theta_{\rm V,init} + \theta_{\rm R2V}(t), \qquad (3.7)$$

where  $f_0$  is the DCO oscillation frequency,  $\theta_{V,\text{init}}$  is the initial phase at t = 0, and  $\theta_{R2V}$  is the *excess* phase due to the  $i_{\text{inj}}(t)$  disturbance. According to [72], the instantaneous angular frequency of  $\theta_{R2V}$  can be represented by

$$\frac{\mathrm{d}\theta_{\mathrm{R2V}}(t)}{\mathrm{d}t} = \tilde{\Gamma}[\theta_{\mathrm{V}}(t)]i_{\mathrm{inj}}(t), \qquad (3.8)$$

where  $\tilde{\Gamma}(\theta)$  is the  $2\pi$ -periodic  $\Gamma(\theta)$  (DCO's ISF) normalized by the maximum charge displacement across the corresponding node capacitor. Considering that  $\theta_{\rm V}(t)$  is constantly tracked by the PLL,  $\theta_{\rm R2V}(t)$  can be regarded as a tiny perturbation on the ideal DCO phase  $(2\pi f_0 t + \theta_{\rm V,init})$ . Hence,  $\tilde{\Gamma}[\theta_{\rm V}(t)]$  can be approximated as  $\tilde{\Gamma}(2\pi f_0 t + \theta_{\rm V,init})$ . Moreover, the periodicity of  $\tilde{\Gamma}(t)$  and  $i_{\rm inj}(t)$  allows us to expand these two functions with a Fourier series and rewrite (3.8) as

$$\begin{aligned}\frac{\mathrm{d}\theta_{\mathrm{R2V}}(t)}{\mathrm{d}t} = {} & \left[\frac{\tilde{\Gamma}_0}{2} + \sum_{m=1}^{\infty} |\tilde{\Gamma}_{\mathrm{m}}| \cos\left(2\pi m f_0 t + m \theta_{\mathrm{V,init}} + \angle \tilde{\Gamma}_{\mathrm{m}}\right)\right] \\
& \cdot \left[\frac{I_{\mathrm{inj},0}}{2} + \sum_{k=1}^{\infty} |I_{\mathrm{inj},k}| \cos\left(2\pi k f_{\mathrm{REF}} t + \angle I_{\mathrm{inj},k}\right)\right],\end{aligned}\tag{3.9}$$

where  $\tilde{\Gamma}_{\rm m}$  and  $I_{\rm inj,k}$  are, respectively, the complex Fourier coefficients of  $\tilde{\Gamma}(t)$  and  $i_{\rm inj}(t)$ . Abundant inter-modulation terms in this equation result in all the sinusoidal phase-modulation components in  $\theta_{\rm R2V}(t)$ . According to [77], these sinusoidal components can be regarded as baseband signals that mix with the ideal DCO carrier (at the frequency of  $f_0$ ) and finally become spurs at the corresponding offset frequencies. Therefore, only the low-frequency components in  $d\theta_{\rm R2V}(t)/dt$  could constitute the root cause of the fundamental fractional spurs at  $\pm |\text{FCW}_{\text{frac,s}}| \cdot f_{\rm REF}$ , and so this is the focus in this work. In addition, noticing that  $|\tilde{\Gamma}_1|$  is usually the largest among the  $|\tilde{\Gamma}_{\rm m}|$  coefficients (e.g., the ISF of a conventional LC oscillator is almost sinusoidal [78], thus dominated by the fundamental term with coefficient  $|\tilde{\Gamma}_1|$ ), we only search for the root cause of the fundamental fractional spurs among the low-frequency (LF) inter-modulation terms containing  $|\tilde{\Gamma}_1|$ , and find two candidates represented by

$$\frac{\mathrm{d}\theta_{\mathrm{R2V}}(t)}{\mathrm{d}t}|_{\mathrm{LF},k} = |\frac{\tilde{\Gamma}_{1}I_{\mathrm{inj},k}}{2}|\cos[2\pi f_{\mathrm{im}}(k)t + \angle I_{\mathrm{inj},k} - \theta_{\mathrm{V,init}} - \angle\tilde{\Gamma}_{1}], \qquad (3.10)$$


where  $f_{\rm im}(k)$  is the inter-modulation frequency, i.e.,

$$f_{\rm im}(k) = k f_{\rm REF} - f_0, \tag{3.11}$$

and $k = \text{FCW}_{\text{int}}$, $\text{FCW}_{\text{int}} + 1$, with the integer part of FCW denoted as FCW<sub>int</sub>. These two $f_{\text{im}}(k)$'s coincide with the offset frequencies of the solid-line spurs in Fig. 3.4, i.e., $-\text{FCW}_{\text{frac}} \cdot f_{\text{REF}}$ and $(1 - \text{FCW}_{\text{frac}}) \cdot f_{\text{REF}}$. Therefore, the corresponding $d\theta_{\text{R2V}}(t)/dt|_{\text{LF},k}$ term could aptly represent the pattern of DCO interference frequency [proportional to $f_{i,\text{DCO}}$ in Fig. 3.1(b)], which causes fractional spurs at $\pm \text{FCW}_{\text{frac},s} \cdot f_{\text{REF}}$.
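As a quick numerical check of (3.11), the sketch below evaluates the two inter-modulation frequencies for one example channel. The 40 MHz reference matches the chip described later in this chapter; the FCW value of 69.25 and the helper name `intermod_offsets` are illustrative assumptions, not taken from the measurements.

```python
from fractions import Fraction
import math

def intermod_offsets(fcw, f_ref_hz):
    """Return {k: f_im(k)} for k = FCW_int and FCW_int + 1, per (3.11)."""
    f0 = fcw * f_ref_hz                 # FCW = f0 / f_REF
    fcw_int = math.floor(fcw)           # integer part of FCW
    return {k: k * f_ref_hz - f0 for k in (fcw_int, fcw_int + 1)}

# Illustrative channel (assumed): FCW = 69.25 with a 40 MHz reference.
fcw = Fraction(277, 4)
f_ref = 40_000_000
offsets = intermod_offsets(fcw, f_ref)
fcw_frac = fcw - math.floor(fcw)        # unsigned fractional part of FCW
for k, f_im in offsets.items():
    print(k, float(f_im))
```

Exact rational arithmetic is used so that the comparison with $-\text{FCW}_{\text{frac}} \cdot f_{\text{REF}}$ and $(1 - \text{FCW}_{\text{frac}}) \cdot f_{\text{REF}}$ is not blurred by floating-point rounding.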

Considering FCW =  $f_0/f_{\text{REF}}$ , the time-varying phase of  $d\theta_{\text{R2V}}(t)/dt|_{\text{LF},k}$ observed at the FREF grid (e.g., at  $t = n \cdot T_{\text{REF}}$ , where *n* is an arbitrary integer) can be represented by

$$\begin{aligned}
2\pi f_{\rm im}(k)t &= 2\pi \cdot n \cdot (k - {\rm FCW}) \\
&= 2\pi (p - \phi_{\rm R,frac}[n]),
\end{aligned} \tag{3.12}$$

where p is an integer. Therefore,  $d\theta_{R2V}(t)/dt|_{LF,k}$  resembles and is synchronous with the sequence of  $\sin(2\pi\phi_{R,\text{frac}}[n])$ . Hence, it is possible to cancel such  $d\theta_{R2V}(t)/dt|_{LF,k}$ -originated spurs by adding in-band interference of a scaled and phase-shifted  $\sin(2\pi\phi_{R,\text{frac}}[n])$  sequence according to (3.4).
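The synchronicity stated by (3.12) can be checked with exact arithmetic. The sketch below assumes that $\phi_{\rm R,frac}[n]$ is the fractional part of the accumulated FCW, i.e., of $n \cdot \text{FCW}$; the FCW value and the function names are illustrative only.

```python
from fractions import Fraction
import math

def grid_phase_cycles(n, k, fcw):
    """Phase of the f_im(k) tone at t = n * T_REF, in cycles: n * (k - FCW)."""
    return n * (k - fcw)

def phi_r_frac(n, fcw):
    """Fractional part of the accumulated FCW (assumed definition)."""
    acc = n * fcw
    return acc - math.floor(acc)

fcw = Fraction(2215, 32)          # illustrative: FCW = 69.21875
k = math.floor(fcw)               # FCW_int; FCW_int + 1 behaves the same way
for n in range(6):
    # p is the integer in (3.12): phase + phi_R,frac[n] must be a whole number
    p = grid_phase_cycles(n, k, fcw) + phi_r_frac(n, fcw)
    print(n, phi_r_frac(n, fcw), p)
```

Because `p` always comes out as an integer, the tone sampled at the FREF grid differs from $-2\pi\phi_{\rm R,frac}[n]$ only by whole cycles, which is what makes it cancellable by a sequence derived from $\phi_{\rm R,frac}[n]$.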

One might notice that the fractional spurs are always present in pairs, i.e., equally spaced on both sides of the carrier in Fig. 3.4, and wonder whether the pair can be canceled by the single zero in (3.4). In fact, the DCO phase perturbation merely fluctuates at a single frequency  $f_{im}(k)$ , according to  $\theta_{R2V}(t)|_{LF,k} = A_k \sin[2\pi f_{im}(k)t + \theta_k]$ , which is obtained by integrating  $d\theta_{R2V}(t)/dt|_{LF,k}$  over time [72] with  $A_k$  and  $\theta_k$  conceptually representing the amplitude and phase offset, respectively. This single-frequency phase error shows up in the total phase of DCO (see (3.7)) as a tiny perturbation, and results in the DCO waveform proportional to

$$\begin{aligned}
\sin[2\pi f_0 t + \theta_{\rm V,init} + \theta_{\rm R2V}(t)|_{\rm LF,k}] \approx {} & \sin(2\pi f_0 t + \theta_{\rm V,init}) \\
& + \frac{A_{\rm k}}{2} \sin\{2\pi [f_0 + f_{\rm im}(k)] \cdot t + \theta_{\rm V,init}\} \\
& - \frac{A_{\rm k}}{2} \sin\{2\pi [f_0 - f_{\rm im}(k)] \cdot t + \theta_{\rm V,init}\},
\end{aligned} \tag{3.13}$$

where the first term stands for the ideal carrier, and the last two terms represent the double-sided spurs around the carrier. Therefore, the double-sided spurs result from a single-sided phase perturbation, as predicted by the frequency modulation theory [77]. In other words, once we have canceled the interference component at the frequency of $f_{\rm im}$, the spurs on both sides of the carrier (with the offset frequency of $\pm |f_{\rm im}|$) will automatically disappear. In addition, because this work focuses on canceling the fundamental fractional spurs, it cares only about the perturbation at frequency $f_{\rm im} = -\text{FCW}_{\rm frac} \cdot f_{\rm REF}$ or $f_{\rm im} = (1 - \text{FCW}_{\rm frac}) \cdot f_{\rm REF}$ (according to (3.11)), depending on which one exhibits a smaller absolute value. So, these two possible frequencies are finally unified as the spur-cancellation (SC) frequency of

$$f_{\rm SC} = -FCW_{\rm frac,s} \cdot f_{\rm REF}. \tag{3.14}$$
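The unification in (3.14) amounts to picking, of the two candidates from (3.11), the one with the smaller magnitude. A minimal sketch follows; the function name and the example channels are assumptions, and the tie at a fractional part of exactly 0.5 is resolved arbitrarily in favor of the first candidate.

```python
import math

def spur_cancellation_freq(fcw, f_ref_hz):
    """Pick the f_im(k) of smaller magnitude among k = FCW_int, FCW_int + 1."""
    fcw_int = math.floor(fcw)
    candidates = [(k - fcw) * f_ref_hz for k in (fcw_int, fcw_int + 1)]
    return min(candidates, key=abs)

f_ref = 40e6
print(spur_cancellation_freq(69.25, f_ref))   # FCW_frac = 0.25 -> negative f_SC
print(spur_cancellation_freq(69.75, f_ref))   # FCW_frac = 0.75 -> positive f_SC
```

Dividing the returned value by $-f_{\rm REF}$ gives the signed fractional FCW that appears in (3.14).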

#### 3.2.2 Synchronous Interference from CKV to FREF



Figure 3.5: Schematic (a) and waveforms (b) illustrating the supply-ripple-induced FREF delay, i.e.,  $\Delta t_{\rm V2R}$ . While  $v_{\rm REF,ideal}$  is the ideal FREF waveform,  $v_{\rm REF,rip}$  represents the FREF waveform as seen by the FREF buffer (i.e., referred to the buffer's fluctuating ground).

Ref. [71] reported that CKV (i.e., the DCO output) sub-harmonics could couple to and superimpose on the incoming FREF waveform, causing a CKV-dependent delay at the output of the FREF buffer, which ultimately degrades the CKV phase error during the PLL operation. The phase error degradation sinusoidally correlates with the phase offset between CKV and FREF, indicating the synchronicity of this interference mechanism with  $\phi_{\rm R,frac}$ .

In fact, such a disturbance mechanism commonly occurs when the FREF clock propagates across power domains<sup>1</sup>, especially when the chain processing the FREF signal adopts a single-ended structure to save power. Figure 3.5 illustrates a conceptual example: An ideal FREF, having a certain duration of its edge transition, triggers the level-crossing slicer to launch the delayed

<sup>1</sup>A PLL designed for low spurs would generally allocate isolated power domains to its sub-blocks in order to minimize mutual influence between the blocks, e.g., [18,28].

clock signal FREF<sub>dly</sub>. When the ground and supply of the slicer are clean, FREF<sub>dly</sub> is launched exactly at the moment<sup>1</sup> the ideal FREF waveform  $v_{\text{REF,ideal}}(t)$  crosses the slicing threshold voltage  $V_{\text{th}}$ . However, when the slicer's internal ground and supply fluctuate (even by the same amount as guaranteed by sufficient decoupling capacitance) with a ripple voltage of  $v_{\text{rip}}(t)$ , the slicer sees the FREF waveform referred to the local ground with the ripple component, i.e.,  $v_{\text{REF,rip}}(t) = v_{\text{REF,ideal}}(t) - v_{\text{rip}}(t)$ . Naturally, the  $V_{\text{th}}$ -crossing moment, indicated by the falling FREF<sub>dly</sub> edge, exhibits a delay strongly correlating with  $v_{\text{rip}}(t)$ , i.e.,  $\Delta t_{\text{V2R}}$ .

If the buffer's supply ripple is dominated by a sinusoidal wave coupled from the phase-locked DCO, the resulting delay in FREF<sub>dly</sub> also exhibits a sinusoidal pattern synchronous with $\phi_{\text{R,frac}}$, i.e., $\Delta t_{\text{V2R}}[n] \propto \sin[2\pi(\phi_{\text{R,frac}}[n] - \phi_{\text{prop}})]$, where $\phi_{\text{prop}}$ is a normalized phase offset due to ripple propagation. This can be understood with the waveforms in Fig. 3.5(b): In the case without supply ripple, the phase of the DCO waveform ($v_{\text{DCO}}(t)$) can be predicted as $\phi_{\text{R,frac}}[n]$ at the $V_{\text{th}}$-crossing moment of the ideal FREF falling edge (see the $v_{\text{REF,ideal}}(t)$ curve at t = 0). However, when the supply ripple of amplitude $A_{\text{rip}}$ is present, the FREF waveform referred to the local ground (labeled as $v_{\text{REF,rip}}$) deviates from $V_{\text{th}}$ by $A_{\text{rip}} \sin[2\pi(\phi_{\text{R,frac}}[n] - \phi_{\text{prop}})]$ at the critical instant t = 0. Compared with the ideal case, this shifts the $V_{\text{th}}$-crossing moment and the associated FREF<sub>dly</sub> edge roughly by

$$\Delta t_{\rm V2R}[n] = -\frac{A_{\rm rip}\sin(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm prop})}{s_{\rm fall}},\qquad(3.15)$$

where  $s_{\text{fall}}$  is the slope of the ideal FREF falling edge at t = 0.

 $\Delta t_{\text{V2R}}[n]$  thus gets injected into the loop as in-band interference (i.e.,  $\phi_{i,\text{IB}}[n] = f_0 \Delta t_{\text{V2R}}[n]$ ), and results in fractional spurs at the offset frequencies corresponding to the  $\phi_{\text{R,frac}}[n]$  fluctuation, i.e.,  $\pm |f_{\text{SC}}|$ . This in-band interference is synchronous with and resembles  $\sin(2\pi\phi_{\text{R,frac}}[n])$ . This offers a possibility of mutual cancellation with the DCO interference described in Section 3.2.1 provided the relative amplitude and phase offset are properly set. The next section will experimentally verify the feasibility of such cancellation.
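To give a feel for the magnitudes involved, the sketch below evaluates (3.15) and the resulting in-band interference $\phi_{i,\text{IB}}[n] = f_0 \Delta t_{\text{V2R}}[n]$. All numbers (1 mV ripple, a falling slope of 1 V/ns, a 2.77 GHz carrier, zero $\theta_{\rm prop}$) are assumed for illustration and are not measured values; the falling-edge slope is entered as a negative number.

```python
import math

def delta_t_v2r(phi_r_frac, a_rip, s_fall, theta_prop=0.0):
    """Ripple-induced FREF delay per (3.15), in seconds."""
    return -a_rip * math.sin(2 * math.pi * phi_r_frac + theta_prop) / s_fall

def in_band_phase(delta_t, f0):
    """In-band interference in CKV cycles: phi_i,IB = f0 * delta_t."""
    return f0 * delta_t

# Illustrative values (assumed, not measured): 1 mV ripple, -1 V/ns slope.
a_rip, s_fall, f0 = 1e-3, -1e9, 2.77e9
for phi in (0.0, 0.25, 0.5, 0.75):
    dt = delta_t_v2r(phi, a_rip, s_fall)
    print(phi, dt, in_band_phase(dt, f0))
```

With these assumed numbers the peak delay is about 1 ps, which illustrates how a millivolt-level ripple on a slow reference edge already translates into a visible phase disturbance.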

## 3.3 Experimental Verification of Spur Cancellation via Synchronous Interference

This section experimentally verifies the spur cancellation mechanism pointed out by the foundational formula (3.4): The experiments are performed on a PLL chip whose output contains fundamental fractional spurs raised by

 $<sup>^{1}</sup>$ Or with a small *fixed* propagational delay.

both in-band and DCO interference signals that are attributed to the typical mutual coupling between the CKV and FREF related circuits. Considering that these two types of interference signals are synchronous, the resulting spurs can be in the end suppressed by changing their relative amplitude and phase. Before diving into the spur cancellation details, we first give an overview of the chip used for the verification, and then identify the on-chip self-interference mechanisms.

#### 3.3.1 Details of the PLL used in the Experiment



Figure 3.6: PLL diagram emphasizing the details related to spur cancellation.

Figure 3.6 sketches a system diagram of the PLL used for verifying the spur cancellation mechanism<sup>1</sup>. Similar to the simplified PLL in Fig. 3.1(a),

<sup>1</sup>This chip improves the isolation characteristics of critical blocks of the original IC described in Chapter 2 in order to suppress the fundamental fractional spurs induced by the mutual coupling between the reference clock and DCO. The modification details are not shown here because they exhibit insignificant improvement in the spur levels.

the implemented PLL constantly samples the CKV phase at the grid of  $\text{FREF}_{\text{dly}}$  clock, a delayed version of FREF. Then, the sampled CKV phase is compared with the ideal one predicted by accumulating FCW in order to extract the CKV phase error  $\Delta\phi_{\text{E}}$ . The extracted  $\Delta\phi_{\text{E}}$  passes through the digital loop filter and tunes the DCO to correct the CKV phase error. Considering the predicted CKV phase consists of the fractional and integer parts, respectively  $\phi_{\text{R,frac}}$  and  $\phi_{\text{R,int}}$ , the phase error extraction is performed in two parallel paths.

On the  $\phi_{\text{R,int}}$ -related branch, the number of CKV's significant (falling) edges is constantly monitored by the counter. At the rising edge of the update clock CKU, which aligns with the 5th CKV falling edge after FREF<sub>dly</sub>, the counter value is sampled to obtain the *integer* part of the CKV phase at the FREF<sub>dly</sub> grid [26]. The sampled phase cancels with  $\phi_{\text{R,int}}$  to extract the *integer* part of  $\Delta \phi_{\text{E}}$ .

Regarding the  $\phi_{\text{R,frac}}$ -associated path, CKV's fractional phase reflects on  $\Delta t_{\text{S}}$ , which is the instantaneous time offset between the FREF<sub>dly</sub> and the first subsequent CKV falling edge. In an ideal case without any noise and interference,  $\Delta t_{\text{S}} = (1 - \phi_{\text{R,frac}}) \cdot T_{\text{CKV}}$ , in which  $T_{\text{CKV}}$  is the nominal CKV period. Hence, the CKV's fractional phase error reflects on the time error,  $\Delta t_{\text{E}} = (1 - \phi_{\text{R,frac}}) \cdot T_{\text{CKV}} - \Delta t_{\text{S}}$ , which is extracted by the time-mode arithmetic unit (TAU) described in [35]. The TAU samples  $T_{\text{CKV}}$ , conceptually scales it with  $(1 - \phi_{\text{R,frac}})$ , cancels it with the sampled  $\Delta t_{\text{S}}$ , and outputs the residue as the time offset  $\Delta t_{\text{E}}$ . At the implementation level,  $\phi_{\text{R,frac}}$  splits into  $\phi_{\text{crs}}$  and  $\phi_{\text{fine}}$ , used for the coarse and fine  $T_{\text{CKV}}$  scaling, respectively. Accordingly, the realistic  $\Delta t_{\text{E}}$  extraction is realized as

$$\Delta t_{\rm E} = (1 - \phi_{\rm crs} - \phi_{\rm fine}) \cdot T_{\rm CKV} - \Delta t_{\rm S}. \tag{3.16}$$

The extracted  $\Delta t_{\rm E}$  is quantized by a time-to-digital converter (TDC) and then normalized to the fractional phase error by multiplying with the factor of K<sub>TDC</sub>. The fractional phase error finally adds to the integer part (extracted by the  $\phi_{\rm R,int}$ -related branch) to arrive at the overall phase error  $\Delta \phi_{\rm E}$ .

In the implemented PLL, the TAU scales  $T_{\rm CKV}$  with 10-b accuracy, where  $\phi_{\rm crs}$  and  $\phi_{\rm fine}$  respectively tune the highest 3 and lowest 7 bits. Considering the  $\phi_{\rm crs}$ -associated  $T_{\rm CKV}$ -scaling error dominates the TAU's overall integral nonlinearity (INL), a look-up table (LUT) tackles this issue by adding a  $\phi_{\rm crs}$ -dependent compensation signal  $\phi_{\rm LUT}$  to  $\phi_{\rm fine}$ . To prevent the TAU resolution from limiting the compensation accuracy,  $\phi_{\rm LUT}$  is noise-shaped by a first-order  $\Delta\Sigma$ -modulator before adding it to  $\phi_{\rm fine}$ .
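The coarse/fine split and the extraction in (3.16) can be sketched as follows. The plain binary weighting of the two fields (coarse code over 8, fine code over 1024) and the 2.8 GHz example frequency are assumptions made for illustration; the LUT and $\Delta\Sigma$ compensation are left out.

```python
def split_phi_frac(code10):
    """Split a 10-bit phi_R,frac code into (phi_crs, phi_fine) in CKV cycles."""
    if not 0 <= code10 < 1024:
        raise ValueError("expected a 10-bit code")
    crs_code = code10 >> 7          # highest 3 bits
    fine_code = code10 & 0x7F       # lowest 7 bits
    return crs_code / 8, fine_code / 1024

def time_error(code10, t_ckv, dt_s):
    """Delta t_E per (3.16), ignoring LUT compensation."""
    phi_crs, phi_fine = split_phi_frac(code10)
    return (1 - phi_crs - phi_fine) * t_ckv - dt_s

t_ckv = 1 / 2.8e9                        # illustrative CKV period (assumed)
code = 0b101_0011001                     # crs code = 5, fine code = 25
phi = code / 1024
dt_s_ideal = (1 - phi) * t_ckv           # ideal offset, no noise or interference
print(split_phi_frac(code), time_error(code, t_ckv, dt_s_ideal))
```

In the ideal case the two terms of (3.16) cancel, so any non-zero residue seen by the TDC reflects noise, interference, or scaling error in the TAU.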

The content of the LUT is calibrated by an LMS-based algorithm shown in the lower-left of Fig. 3.6: The calibration is performed when the PLL operates at a channel with FCW<sub>frac,s</sub>  $\approx 11/16$ :<sup>1</sup> After the  $\phi_{\rm crs}$  code is used, the resulting TDC output  $D_{\rm TDC}$  is scaled by the step-control factor  $\mu_{\rm crs}$  and then de-multiplexed to the accumulator associated with the  $\phi_{\rm crs}$  code. The scaled  $D_{\rm TDC}$  is accumulated to update the corresponding offset compensation word, i.e., OS. When this  $\phi_{\rm crs}$  code is used next time, the corresponding OS value is multiplexed out to  $\phi_{\rm LUT}$ , finally tuning the TAU for the ultimate purpose of reducing the time error. In the end, the resulting  $D_{\rm TDC}$  reduces in magnitude and updates the OS accumulator less significantly. The OS value finally converges to a point that ensures the average  $D_{\rm TDC}$  to be 0. Since  $\phi_{\rm crs}$ has 3 bits, only 8 accumulators and OS values are needed in the LUT.
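The calibration loop described above can be mimicked with a toy behavioral model. The sketch assumes that $D_{\rm TDC}$ is simply the sum of a $\phi_{\rm crs}$-dependent error, the current OS word, and Gaussian noise, and that the accumulator update carries a negative sign so the loop is stable. These modeling choices, the error values, and the step size are hypothetical and only illustrate the convergence behavior.

```python
import random

def calibrate_lut(inl_per_crs, mu_crs=0.05, steps=20000, seed=0):
    """Toy LMS loop with one offset word (OS) per phi_crs code.

    Assumed model: D_TDC = INL[crs] + OS[crs] + noise, with the update
    OS[crs] -= mu_crs * D_TDC driving the average D_TDC to zero.
    """
    rng = random.Random(seed)
    os_words = [0.0] * 8                        # 8 accumulators for 3-bit phi_crs
    for n in range(steps):
        # Fractional FCW of 11/16: phi_R,frac[n] = ((11 * n) mod 16) / 16,
        # whose top 3 bits form the phi_crs code and visit all 8 values.
        crs = ((11 * n) % 16) >> 1
        d_tdc = inl_per_crs[crs] + os_words[crs] + rng.gauss(0.0, 0.1)
        os_words[crs] -= mu_crs * d_tdc         # de-multiplexed accumulation
    return os_words

# Hypothetical coarse-scaling errors in TDC LSBs (made up for illustration).
inl = [0.0, 1.5, -0.8, 2.1, -1.2, 0.4, -2.0, 0.9]
os_words = calibrate_lut(inl)
residual = [e + o for e, o in zip(inl, os_words)]
print([round(r, 2) for r in residual])
```

A smaller `mu_crs` lowers the noise on the converged OS words at the cost of a longer settling time, which is the usual LMS trade-off.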



Figure 3.7: PLL diagram emphasizing partitioning of the power domains.

This chip was originally designed for low spurious levels, so the blocks were grouped according to their characteristics and allocated dedicated on-chip power (supply and ground) domains to minimize self-interference. Figure 3.7 sketches the power-domain partitioning of the overall PLL system: The DCO and FREF related blocks are grouped in separate power domains since these two groups tend to interfere with each other (as explained in Section 3.2.1 and Section 3.2.2). The digital blocks are also assigned a dedicated power domain to prevent the digital switching activities from disturbing the sensitive mixed-signal blocks (e.g., TDC, TAU, etc.) by perturbing the shared supply and ground. Finally, the remaining

 $<sup>^{1}</sup>$ The reason for using this special number is explained in Section 2.6.3.



mixed-signal blocks share a general mixed-signal power domain.

Figure 3.8: Chip micrograph.

The realized chip utilizes a reference clock of 40 MHz to synthesize frequencies from 2.6 to 4.0 GHz. It is fabricated in 40-nm CMOS and its micrograph is shown in Fig. 3.8.

#### 3.3.2 Identifying Sources of the Fundamental Fractional Spurs



Figure 3.9: CKV spectra (a) before and (b) after utilizing the LUT to suppress the in-band interference.

Figure 3.9(a) shows the measured spectrum at the PLL output before applying the LUT compensation at a near-integer channel with $\text{FCW}_{\text{frac,s}} \approx 0.00025$. The highest fractional spurs lie at the offset frequency of around $\pm 10 \,\mathrm{kHz}$ from the carrier. The magnitude of the offset frequency coincides with FCW<sub>frac,s</sub> $\cdot f_{\text{REF}}$ (i.e., $0.00025 \times 40\,\mathrm{MHz} = 10\,\mathrm{kHz}$), so the spurs are the fundamental fractional spurs in this channel and can be caused by both in-band and DCO interference. To confirm the dominant source, the fundamental fractional spur level is observed while sweeping the FCW range of (69, 69.5). Assuming the strength of the dominant interference is constant (which is reasonable in the narrow FCW range), the curve of the fundamental-spur-versus-FCW<sub>frac,s</sub> reflects the PLL's frequency response to the interference. As shown in Fig. 3.10(a), each fundamental-spur-vs.-FCW<sub>frac,s</sub> curve exhibits a low-pass characteristic, and the bandwidth increases with the digital loop filter's proportional coefficient $\alpha$ (shown in Fig. 3.6), which is swept from 0.5 to 2 times $\alpha_0$, the default $\alpha$ value used when measuring Fig. 3.9. This trend agrees with (3.1), indicating that the in-band interference dominates the fundamental fractional spurs.



Figure 3.10: Measured fundamental fractional-spur levels versus  $FCW_{frac,s}$ : (a) before and (b) after canceling the in-band interference with the LUT in Fig. 3.6.

Afterward, the LUT is calibrated to cancel the effects of in-band interference. Upon applying the LUT compensation, the fundamental fractional spurs in Fig. 3.9(a) are suppressed to below  $-62.5 \, dB$ , as shown in Fig. 3.9(b), indicating that the residual in-band interference gets significantly attenuated in near-integer channels. However, the suppression performance tends to be less effective as the fractional channel frequency increases (but still within the loop bandwidth). As shown in Fig. 3.10(b), the fundamental fractional-spur curve exhibits a bandpass characteristic and peaks at FCW<sub>frac,s</sub> close to  $2^{-7}$ . In addition, the peak value decreases as  $\alpha$  increases. This trend matches (3.2) and indicates that the DCO interference is also a significant contributor to the fundamental fractional spurs on this chip. The DCO interference is coupled from FREF. The evidence can be found in the output spectrum of the free-running DCO shown in Fig. 3.11. The spectrum contains spurs at the  $f_{\text{REF}}$  harmonics (i.e., 69× and 70× of 40 MHz) and their mirrors relative to the main carrier. These spur positions agree with the mechanism of FREF-to-DCO-coupling-induced fractional spurs explained in Fig. 3.4. The spectrum is measured after disabling all the blocks in Fig. 3.6 except for the DCO (with buffer) and FREF buffer chain (till FREF<sub>dly</sub>), so that FREF is the only possible aggressor of DCO.



Figure 3.11: Spectrum of the free-running DCO with spurs caused by FREF.

#### 3.3.3 Verifying the Spur Cancellation Mechanism

Section 3.3.2 has confirmed that the undesirable interference sources, related to both in-band and DCO, are present in the chip, and has demonstrated the cancellation of the in-band spurs by means of an LUT. On top of the LUT compensation, this section will explain how to further cancel the FREF-induced DCO interference by tuning the phase and amplitude of the in-band interference. A notch (or zero) on the measured fundamental-fractional-spur-vs.-FCW<sub>frac,s</sub> curve finally confirms the cancellation effect, validating the cancellation theory predicted by (3.5).

To better understand how the interference signals can cancel each other, Fig. 3.12(a) sketches their phasor diagram. All the interference patterns are assumed to be sinusoidal (a reasonable assumption according to Section 3.2) and represented by vectors generalized as  $\vec{\phi}_x$ 's. For example,  $\vec{\phi}_{DCO}$  represents the FREF-induced DCO interference,  $\vec{\phi}_{V2R}$  denotes the in-band interference due to the CKV-induced FREF delay, and  $\vec{\phi}_{LUT}$  is the LUT-injected pattern compensating  $\vec{\phi}_{V2R}$ . The phasor diagram is observed in a coordinate system with axes parallel/orthogonal with  $\vec{\phi}_R$ , a virtual unit vector representing



Figure 3.12: Diagrams explaining the principle of FREF-delay-based method to cancel the spurs raised by self-interference (i.e.,  $\vec{\phi}_{\rm DCO}$  from FREF to CKV, and  $\vec{\phi}_{\rm V2R}$  from CKV to FREF) synchronous with the  $\sin(2\pi\phi_{\rm R,frac}[n])$  sequence  $(\vec{\phi}_{\rm R})$ : (a) Phasor diagram after the LUT content  $(\vec{\phi}_{\rm LUT})$  is calibrated to cancel  $\vec{\phi}_{\rm V2R}$ , as the case in Fig. 3.10(b). (b) Phasor diagram illustrating the strategy of suppressing  $\vec{\phi}_{\rm DCO}$  by rotating  $\vec{\phi}_{\rm V2R}$  after accomplishing (a). (c) Waveforms illustrating that phasors  $\vec{\phi}_{\rm DCO}$  and  $\vec{\phi}_{\rm V2R}$  are related to the FREF edge, and thereby can be estimated with the CKV phase at the FREF grid, i.e.,  $\phi'_{\rm R,frac} = \phi_{\rm R,frac} - \phi_{\rm dly}$ , where  $\phi_{\rm dly}$ is a normalized phase offset caused by the constant delay between FREF and FREF<sub>dly</sub>. (d) Phasor diagram illustrating that changing the delay between FREF and FREF<sub>dly</sub> can rotate  $\vec{\phi}'_{\rm R}$ (representing  $\sin(2\pi\phi'_{\rm R,frac})$ ) by  $2\pi\Delta\phi_{\rm dly}$ , where  $\Delta\phi_{\rm dly}$  is the additional FREF<sub>dly</sub>-delay-induced  $\phi_{\rm dly}$  change. Accordingly,  $\vec{\phi}_{\rm V2R}$  and  $\vec{\phi}_{\rm DCO}$  rotate by the same amount to remain stationary relative to  $\vec{\phi}'_{\rm R}$ .

the pattern of  $\sin(2\pi\phi_{\rm R,frac}[n])$ . This is because each interference vector here exhibits a fixed phase offset relative to  $\vec{\phi}_{\rm R}$  (due to the synchronicity with the  $\phi_{\rm R,frac}[n]$  sequence, as mentioned in Section 3.2) and thereby can be relatively stationary in such a coordinate. Because Fig. 3.12(a) describes the case after the LUT calibration in Section 3.3.2,  $\vec{\phi}_{\rm LUT}$  could perfectly cancel  $\vec{\phi}_{\rm V2R}$  such that only  $\vec{\phi}_{\rm DCO}$  remains as the sole significant aggressor, as was the case in Fig. 3.10(b).

Now, to cancel $\vec{\phi}_{\text{DCO}}$, $\vec{\phi}_{\text{V2R}}$ can be rotated as per Fig. 3.12(b) so that $\vec{\phi}_{\text{LUT}}$ and $\vec{\phi}_{\text{V2R}}$ would be combined to construct an in-band *anti*-interferer $\vec{\phi}_{\text{IB}}$. If the amplitude and phase are proper, $\vec{\phi}_{\text{IB}}$, after being processed by the loop filter, would greatly attenuate or even cancel out the effect of $\vec{\phi}_{\text{DCO}}$ (the mechanism of propagating this anti-interferer through the loop filter will be explained in Section 3.4).

Rotating $\vec{\phi}_{V2R}$, equivalent to changing the fixed phase offset relative to $\vec{\phi}_{\rm R}$, is achieved by tuning the delay between FREF and FREF<sub>dly</sub> through the delay line shown in Fig. 3.6. To understand the mechanism behind it, the synchronous coupling theory in Section 3.2 is adapted to a more realistic case: the phases of $\vec{\phi}_{V2R}$ and $\vec{\phi}_{\rm DCO}$ in Fig. 3.12(a) are actually determined by an offset version of $\phi_{\rm R,frac}$, i.e.,

$$\phi_{\rm R,frac}' = \phi_{\rm R,frac} - \phi_{\rm dly}, \qquad (3.17)$$

where $\phi_{\rm dly}$ is a normalized phase offset (referred to the CKV period) caused by the constant delay between FREF and FREF<sub>dly</sub> [see Fig. 3.12(c)]. This is because $\phi_{\rm R,frac}$ merely estimates the CKV phase at the FREF<sub>dly</sub> clock grid, which is the direct input of the TAU, i.e., the PLL's intrinsic phase reference. In contrast, the mutual disturbances between CKV (sharpened DCO output) and FREF, i.e., $\vec{\phi}_{V2R}$ and $\vec{\phi}_{DCO}$, actually occur around the FREF grid. The root causes should be understood with the generation of these two interference signals. First, we start with the CKV-launched interference victimizing FREF, $\vec{\phi}_{V2R}$. Although the ripple-induced FREF delay can be easily injected at any cross-domain points between FREF and FREF<sub>dly</sub> (as described in Section 3.2.2), the most severe disturbance likely occurs at the point where FREF is input to the chip. This is because with the same ripple signal, a larger additional delay can result when the FREF slope is slower. At the chip's input, the critical FREF edge has yet to be sharpened by the on-chip buffers, hence the edge there exhibits the slowest slope and likely the highest vulnerability to disturbance. On the other hand, the interference from FREF to CKV, $\vec{\phi}_{\rm DCO}$, also occurs around the FREF's significant (here, falling) edges because the associated transition of the FREF buffer is the root cause of injecting the current disturbance into the DCO (shown earlier in Fig. 3.3).

To incorporate this fact in the phasor diagram in Fig. 3.12(a), a new virtual vector  $\vec{\phi}'_{\rm R}$  reflecting the  $\sin(2\pi\phi'_{\rm R,frac}[n])$  pattern is defined. Its phase offset relative to  $\vec{\phi}_{\rm R}$  is  $2\pi\phi_{\rm dly}$ .<sup>1</sup> In the  $\vec{\phi}'_{\rm R}$ -based coordinate, the mutual

<sup>1</sup>Reminder: $\phi_{\rm dly}$ is a normalized phase so it is multiplied by $2\pi$ for the radian unit.

disturbances between CKV and FREF, i.e., $\vec{\phi}_{\rm DCO}$ and $\vec{\phi}_{\rm V2R}$, keep the fixed angles relative to the $\vec{\phi}'_{\rm R}$-axis, irrespective of its rotation. For example, when the delay between FREF and FREF<sub>dly</sub> changes, an additional normalized phase shift $\Delta \phi_{\rm dly}$ is added to $\phi_{\rm dly}$. As a result, the new $\vec{\phi}'_{\rm R}$-based coordinate rotates by $2\pi\Delta\phi_{\rm dly}$ relative to the original one [see Fig. 3.12(d)]. Accordingly, $\vec{\phi}_{\rm V2R}$ and $\vec{\phi}_{\rm DCO}$ also need to rotate by the same amount to keep themselves stationary relative to $\vec{\phi}'_{\rm R}$. Therefore, by tuning the FREF delay, the $\vec{\phi}_{\rm V2R}$ rotation required in Fig. 3.12(b) could be achieved.

On the other hand, $\vec{\phi}_{LUT}$ is not affected by the FREF delay because the $\vec{\phi}_{LUT}$ pattern is controlled by $\phi_{R,frac}$ and should keep the constant angle (defined by the LUT content) relative to $\vec{\phi}_{R}$<sup>1</sup>. Consequently, a non-zero $\vec{\phi}_{IB}$ arises from the vector sum of $\vec{\phi}_{LUT}$ and $\vec{\phi}_{V2R}$. By tuning the FREF delay, there is a high chance of finding a proper $\vec{\phi}_{IB}$ canceling the effects of $\vec{\phi}_{DCO}$, thereby creating a zero predicted by (3.5).
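The vector sum described above is easy to reproduce with complex phasors. The sketch below assumes the starting point of Fig. 3.12(a), where the LUT pattern exactly cancels $\vec{\phi}_{\rm V2R}$, and then rotates only $\vec{\phi}_{\rm V2R}$ by $2\pi\Delta\phi_{\rm dly}$. The phasor amplitude, its initial angle, the delay value, the carrier frequency, and the direction of rotation are all illustrative assumptions.

```python
import cmath
import math

def in_band_residual(phi_v2r, delta_phi_dly):
    """phi_IB = phi_LUT + rotated phi_V2R, with phi_LUT calibrated to -phi_V2R."""
    phi_lut = -phi_v2r                                    # state of Fig. 3.12(a)
    rotated = phi_v2r * cmath.exp(2j * math.pi * delta_phi_dly)
    return phi_lut + rotated

# Illustrative numbers (assumed): unit-amplitude phi_V2R at 30 degrees,
# extra FREF delay of 18 ps at f0 = 2.77 GHz.
phi_v2r = cmath.rect(1.0, math.radians(30))
delta_phi_dly = 2.77e9 * 18e-12                           # normalized to T_CKV
phi_ib = in_band_residual(phi_v2r, delta_phi_dly)
print(abs(phi_ib), math.degrees(cmath.phase(phi_ib)))
```

The residual has magnitude $2|\vec{\phi}_{\rm V2R}|\sin(\pi\Delta\phi_{\rm dly})$, so a single delay knob changes the amplitude and the angle of $\vec{\phi}_{\rm IB}$ together rather than independently.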



Figure 3.13: Measured fundamental fractional-spur level versus  $FCW_{frac,s}$  after tuning the FREF delay: (a) Comparing the cases with and without the additional FREF delay; (b) Comparing the cases with different integral coefficients of the loop filter  $\rho$ .  $\rho_0$  is the default  $\rho$  value utilized in measuring (a).

To verify the theory above, we configured the PLL with the same LUT content as in Fig. 3.10(b). Then, we searched for the proper FREF delay to yield the best spur cancellation and measured the corresponding fundamental-fractional-spur level versus FCW<sub>frac,s</sub>. The optimum spur cancellation result is found when the FREF delay is increased by about 18 ps. The result is shown in Fig. 3.13(a), where the curve with the extra FREF delay exhibits two notches which suppress the fundamental fractional-spur levels across the FCW<sub>frac,s</sub> range from  $2^{-9}$  to  $2^{-3}$ , compared to the baseline curve without

<sup>1</sup>The same is true for the pattern of TAU nonlinearity. No matter how FREF delay is changed, the cancellation relationship between the nonlinearity pattern and the corresponding LUT compensation component does not change. So they are not mentioned in Fig. 3.12.

the additional FREF delay. The high spur levels at very small FCW<sub>frac,s</sub> naturally result from the deliberately introduced in-band interference that could not be properly canceled at all frequencies<sup>1</sup>. Although the reason for the high-frequency notch ("Notch 2") is at this point unclear, the low-frequency one ("Notch 1") agrees with the proposed spur cancellation theory. Additional evidence is presented in Fig. 3.13(b), where the frequency of the low-frequency notch changes proportionally with the integral coefficient of the digital loop filter, i.e.,  $\rho = \rho_0/2^i$ , where  $\rho_0$  denotes the default  $\rho$  used in Fig. 3.13(a). This trend agrees with that of the predicted zero location in (3.5), thus verifying the proposed spur cancellation mechanism.

# 3.4 Digital Approach Canceling the DCO-Interference-Induced Fractional Spurs

Section 3.3.3 has proved the feasibility of canceling the DCO-interference-induced fractional spurs using the in-band interference acting as an anti-interferer. However, its phase and amplitude require manual tuning. This section proposes a digitally intensive alternative, which deliberately adds a sinusoidal in-band interferer through the existing LUT (see Fig. 3.6) and determines its phase and amplitude according to the DCO-interference-induced pattern in the detected phase error (i.e., the TDC output).

#### 3.4.1 Principle of Designing the In-band Interference Sequence

Phasor diagrams in Fig. 3.14 visualize how to cancel the DCO interference  $\vec{\phi}_{\rm DCO}$  with  $\vec{\phi}_{\rm SC}$ , a signal deliberately added to the phase detector for the purpose of spur cancellation.  $\vec{\phi}_{\rm SC}$  is re-scaled and rotated by the loop filter (due to its frequency-dependent gain and phase shift) and then fed-forward to the DCO as  $\vec{\phi}_{\rm SC,ff}$ . To completely cancel the spurs raised by the DCO interference,  $\vec{\phi}_{\rm SC}$  should be well-constructed to ensure  $\vec{\phi}_{\rm SC,ff}$  exhibits the same amplitude as  $\vec{\phi}_{\rm DCO}$  but with the 180° phase difference.

According to Section 3.2.1, the associated waveform of  $\vec{\phi}_{\rm DCO}$  resembles and is synchronous with  $\sin(2\pi\phi_{\rm R,frac}[n])$ . As the corresponding cancellation signal,  $\vec{\phi}_{\rm SC}$  should thus also be a similar sinusoidal wave, i.e.,  $\vec{\phi}_{\rm SC} = A_{\rm SC} \cdot \sin(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm SC})$ , where  $\theta_{\rm SC}$  is the phase offset relative to  $\vec{\phi}_{\rm R}$ . Logically,  $\theta_{\rm SC}$  consists of two parts, i.e.,  $\theta_{\rm SC} = \theta_{\rm SC,ff} + \theta_{\rm DLF}$ . As shown in Fig. 3.14,  $\theta_{\rm SC,ff}$  is the angle between  $\vec{\phi}_{\rm SC,ff}$  and  $\vec{\phi}_{\rm R}$ , thereby complementary with that between  $\vec{\phi}_{\rm DCO}$  and  $\vec{\phi}_{\rm R}$ , which is determined by the physical coupling

<sup>1</sup>This is because the PLL loop filter rotates and rescales the in-band interference $\vec{\phi}_{\text{IB}}$ in a frequency-dependent manner while propagating it to the DCO, thus changing the performance of cancellation. Details will be further explained in Section 3.4.



Figure 3.14: Phasor diagram illustrating how the in-band interference designed for spur cancellation $(\vec{\phi}_{\rm SC})$ is fed-forward by the loop filter (as $\vec{\phi}_{\rm SC,ff}$) and then cancels with the DCO interference $(\vec{\phi}_{\rm DCO})$. Vectors representing these patterns are observed in the coordinate with axes parallel/perpendicular with $\vec{\phi}_{\rm R}$, a vector representing the $\sin(2\pi\phi_{\rm R,frac}[n])$ sequence.

characteristics;  $\theta_{\text{DLF}}$  reflects the angle by which the digital loop filter rotates  $\vec{\phi}_{\text{SC}}$  to generate  $\vec{\phi}_{\text{SC,ff}}$ , and thereby is a function of the loop parameters and operating frequency. Consequently, the pattern of  $\vec{\phi}_{\text{SC}}$  is finally described as

$$\phi_{\rm SC}[n] = A_{\rm SC} \cdot \sin(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm SC,ff} + \theta_{\rm DLF}). \tag{3.18}$$

The next few subsections will discuss how to calculate  $\theta_{\text{DLF}}$ , to measure  $\theta_{\text{SC,ff}}$ , and to determine  $A_{\text{SC}}$ .
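For concreteness, the sequence in (3.18) can be generated as sketched below. The sketch assumes $\phi_{\rm R,frac}[n]$ is the fractional part of $n \cdot \text{FCW}$, and every parameter value passed in is a placeholder rather than a calibrated setting.

```python
import math

def phi_sc_sequence(fcw, a_sc, theta_sc_ff, theta_dlf, length):
    """phi_SC[n] per (3.18), with phi_R,frac[n] = frac(n * FCW) (assumed)."""
    seq = []
    for n in range(length):
        phi_r_frac = (n * fcw) % 1.0
        angle = 2 * math.pi * phi_r_frac + theta_sc_ff + theta_dlf
        seq.append(a_sc * math.sin(angle))
    return seq

# Placeholder parameter values, chosen only to exercise the function.
seq = phi_sc_sequence(fcw=69.125, a_sc=0.01, theta_sc_ff=0.3,
                      theta_dlf=-0.8, length=16)
print([round(x, 5) for x in seq])
```

With a fractional FCW of 1/8 the pattern repeats every eight reference cycles, which makes the synchronicity with $\phi_{\rm R,frac}[n]$ directly visible in the printed values.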

#### 3.4.2 Calculating $\theta_{\text{DLF}}$

$\theta_{\text{DLF}}$ is incurred while propagating $\vec{\phi}_{\text{SC}}$ to $\vec{\phi}_{\text{SC,ff}}$ through the PLL's loop filter. In a type-II PLL, $\vec{\phi}_{\text{SC,ff}}$ contains two orthogonal components: $\alpha \cdot \vec{\phi}_{\text{SC}}$ and $(\rho \cdot f_{\text{REF}}/s) \cdot \vec{\phi}_{\text{SC}}$ (see Fig. 3.14). Hence, the magnitude of $\theta_{\text{DLF}}$ can be expressed by $\theta_{\text{DLF}} = \arctan[(\rho f_{\text{REF}})/(2\pi\alpha f_{\text{SC}})]$, where $f_{\text{SC}}$ is the frequency at which the $\vec{\phi}_{\text{SC}}$ pattern fluctuates and equals that of $\vec{\phi}_{\text{DCO}}$ and $\vec{\phi}_{\text{R}}$ due to their synchronicity. Substituting the $f_{\text{SC}}$ expression in (3.14) into the above equation suggests $\theta_{\text{DLF}}$ is FCW-dependent, i.e.,

$$\theta_{\rm DLF} = -\arctan\left(\frac{\rho}{\alpha} \cdot \frac{1}{2\pi\,{\rm FCW}_{\rm frac,s}}\right). \tag{3.19}$$

This angle can be readily calculated in a digital PLL since  $FCW_{frac,s}$  is easily derived from the system's FCW and  $\rho/\alpha$  is easily obtained from the parameter settings of the digital loop filter [see Fig. 3.1(a)].
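As a minimal sketch of this calculation, the snippet below evaluates the arctangent expression above with $f_{\rm SC} = -{\rm FCW}_{\rm frac,s} \cdot f_{\rm REF}$, so that $f_{\rm REF}$ cancels. The function name `theta_dlf` and its argument names are illustrative assumptions, not part of the design; the example values are the ones quoted later in Section 3.5.

```python
import math


def theta_dlf(fcw_frac_s: float, rho_over_alpha: float) -> float:
    """Loop-filter rotation angle in radians (hypothetical helper).

    Evaluates arctan(rho * f_REF / (2*pi * alpha * f_SC)) with
    f_SC = -FCW_frac,s * f_REF, so the reference frequency cancels.
    """
    return -math.atan(rho_over_alpha / (2.0 * math.pi * fcw_frac_s))


# Example settings: FCW_frac,s ~ 2^-7 and rho/alpha ~ 2^-6.
angle = theta_dlf(2.0 ** -7, 2.0 ** -6)
print(f"theta_DLF = {angle / (2 * math.pi):+.3f} x 2*pi")  # prints theta_DLF = -0.049 x 2*pi
```

With these settings the sketch reproduces the $-0.049 \times 2\pi$ value reported in the measurement section, and flipping the sign of ${\rm FCW}_{\rm frac,s}$ flips the sign of the angle.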

#### 3.4.3 Measuring $\theta_{\rm SC,ff}$

In a nearly ideal PLL with noise and in-band interference absent, the phase detector output pattern, represented by  $\vec{\phi}_{\rm PD}$ , is entirely determined by the DCO interference,  $\vec{\phi}_{\rm DCO}$ . Denoting the angle between  $\vec{\phi}_{\rm PD}$  and  $\vec{\phi}_{\rm R}$  as  $\theta_{\rm PD}$ ,  $\theta_{\rm SC,ff}$  can be determined by measuring the curve of  $\theta_{\rm PD}$ -vs.-|FCW<sub>frac,s</sub>|, whose positive and negative FCW<sub>frac,s</sub> branches cross at the point where  $\theta_{\rm PD} = \theta_{\rm SC,ff}$  (see Fig. 3.15, lower-right). FCW<sub>frac,s</sub> here represents the frequency of DCO interference, i.e.,  $\omega = -2\pi$ FCW<sub>frac,s</sub> ·  $f_{\rm REF}$  according to (3.14).



Figure 3.15: Diagram explaining how to search for the angular frequency  $(|\omega| = \sqrt{\rho})$ , in which  $\vec{\phi}_{\text{DCO}}$  and the resulting  $\vec{\phi}_{\text{PD}}$  are anti-phase.

The principle of this  $\theta_{\rm SC,ff}$ -measurement method is explained as follows: By definition,  $\theta_{\rm SC,ff}$  is an angle between  $\vec{\phi}_{\rm R}$  and the vector that is set anti-phase with  $\vec{\phi}_{\rm DCO}$  (see Fig. 3.14). So,  $\theta_{\rm PD} = \theta_{\rm SC,ff}$  when  $\langle \vec{\phi}_{\rm DCO}, \vec{\phi}_{\rm PD} \rangle = \pi$ . Here,  $\langle \vec{\phi}_{\rm DCO}, \vec{\phi}_{\rm PD} \rangle$  denotes the angle between  $\vec{\phi}_{\rm DCO}$  and  $\vec{\phi}_{\rm PD}$ , which is a strong function of the DCO interference frequency  $\omega$  and can be expressed as

$$\langle \vec{\phi}_{\rm DCO}, \vec{\phi}_{\rm PD} \rangle = \arctan\left(\frac{\frac{\omega}{f_{\rm REF}} - \frac{\rho \cdot f_{\rm REF}}{\omega}}{\alpha}\right) + \pi, \tag{3.20}$$

according to the PLL's phase-domain model in Fig. 3.1(b). The  $\langle \vec{\phi}_{\text{DCO}}, \vec{\phi}_{\text{PD}} \rangle = \pi$  condition is satisfied at the frequency pair  $\omega = \pm \sqrt{\rho}$ , indicating the crossing point of the positive- and negative- $\omega$  branches of the  $\langle \vec{\phi}_{\text{DCO}}, \vec{\phi}_{\text{PD}} \rangle$ -versus- $|\omega|$  curve (see Fig. 3.15 upper-right). Therefore,  $\theta_{\text{SC,ff}}$  can be determined from this crossing point, equivalent to that on the  $\theta_{\text{PD}}$ -versus- $|\text{FCW}_{\text{frac,s}}|$  curve (see Fig. 3.15 lower-right), considering the angle between  $\vec{\phi}_{\text{DCO}}$  and  $\vec{\phi}_{\text{R}}$  (i.e.,  $\pi - \theta_{\text{SC,ff}}$ ) does not change significantly within a narrow frequency range (e.g.,  $\omega \in [-\sqrt{\rho}, \sqrt{\rho}]$ ).

The remaining question is how to measure  $\theta_{\rm PD}$  at each FCW<sub>frac,s</sub>. Basically,  $\theta_{\rm PD}$  can be measured by correlating the detected phase error with the orthogonal  $\vec{\phi}_{\rm R}$ , i.e., the  $\cos(2\pi\phi_{\rm R,frac}[n])$  sequence. In practice, the phase detector output is quantized to the  $D_{\rm TDC}[n]$  sequence by a time-to-digital converter (TDC). Then,  $\theta_{\rm PD}$  theoretically equals the phase offset  $\theta_{\rm x}$  at which the correlation function, i.e.,

$$R_{\rm corr}(\theta_{\rm x}) = \sum_{n=1}^{N} D_{\rm TDC}[n] \cdot \cos(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm x}), \qquad (3.21)$$

is zero. $N$ here equals the length of a complete $\phi_{\rm R,frac}[n]$ repetition pattern<sup>1</sup>. The reason why $\theta_{\rm PD}$ can be measured in this manner lies in the fact that the $\vec{\phi}_{\rm PD}$ pattern in $D_{\rm TDC}[n]$ is proportional to $\sin(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm PD})$, making $R_{\rm corr}(\theta_{\rm x}) \propto \sin(\theta_{\rm x} - \theta_{\rm PD})$. In addition, considering that $\sin(\theta_{\rm x} - \theta_{\rm PD})$ also crosses zero when $\theta_{\rm x} = \pi + \theta_{\rm PD}$, representing the case $\langle \vec{\phi}_{\rm DCO}, \vec{\phi}_{\rm PD} \rangle = 0$ instead of $\langle \vec{\phi}_{\rm DCO}, \vec{\phi}_{\rm PD} \rangle = \pi$, the following condition must be checked to exclude that improper solution, i.e.,

$$R_{\rm corr}'(\theta_{\rm PD}) = \sum_{n=1}^{N} D_{\rm TDC}[n] \cdot \sin(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm PD}) > 0, \qquad (3.22)$$

where $N$ is the same as that in $R_{\rm corr}(\theta_{\rm x})$. Note that this $\theta_{\rm PD}$-measurement strategy is merely used to demonstrate the concept. An implementation-oriented alternative can be realized with a gradient-descent algorithm [80].
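The correlation in (3.21) and the sign check in (3.22) can be sketched in software as follows. This is a hedged illustration, not the implementation used in this work: the helper names are hypothetical, the $D_{\rm TDC}[n]$ data are synthetic and unquantized, and the zero of $R_{\rm corr}$ is found in closed form (by expanding the cosine) rather than by sweeping $\theta_{\rm x}$.

```python
import math
from fractions import Fraction


def pattern_length(fcw_frac: Fraction) -> int:
    """Samples in one full repetition of phi_R,frac[n] = frac(n * FCW_frac)."""
    return fcw_frac.denominator


def estimate_theta_pd(d_tdc, phi_r_frac):
    """Return the zero of R_corr(theta_x) that also satisfies R'_corr > 0."""
    # Expanding cos(a + x) gives R_corr(x) = c*cos(x) - s*sin(x),
    # where c and s are the two correlation sums below.
    c = sum(d * math.cos(2 * math.pi * p) for d, p in zip(d_tdc, phi_r_frac))
    s = sum(d * math.sin(2 * math.pi * p) for d, p in zip(d_tdc, phi_r_frac))
    # R_corr = 0 where tan(x) = c/s. atan2 selects the root at which
    # R'_corr(x) = s*cos(x) + c*sin(x) = hypot(s, c) > 0, as in (3.22).
    return math.atan2(c, s) % (2 * math.pi)


# Synthetic example (assumed values): FCW_frac = 2^-5 + 2^-7, so N = 2^7.
fcw_frac = Fraction(1, 32) + Fraction(1, 128)
n_total = pattern_length(fcw_frac)
phi = [float((n * fcw_frac) % 1) for n in range(n_total)]
theta_true = 0.627 * 2 * math.pi
d_tdc = [3.0 * math.sin(2 * math.pi * p + theta_true) for p in phi]
theta_est = estimate_theta_pd(d_tdc, phi)
```

Because the sums run over one complete repetition of $\phi_{\rm R,frac}[n]$, the double-frequency terms cancel and the estimate matches the synthetic angle up to floating-point error.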

#### **3.4.4 Determining** $A_{\rm SC}$

Once $\theta_{\text{DLF}}$ and $\theta_{\text{SC,ff}}$ are known, the direction of $\vec{\phi}_{\text{SC}}$ (in the $\vec{\phi}_{\text{R}}$-based coordinate) is fixed. Then, the optimum amplitude $A_{\text{SC}}$ can be determined iteratively as the PLL operates with the FCW<sub>frac,s</sub> satisfying $\theta_{\text{PD}} \approx \theta_{\text{SC,ff}}$, i.e., $|\text{FCW}_{\text{frac,s}}|\approx \sqrt{\rho}/(2\pi f_{\text{REF}})$<sup>2</sup>: A tentative version of $\vec{\phi}_{\text{SC}}$, i.e., $\vec{\phi}_{\text{x}}$, is added as an acting stimulus to the phase detector. Since $\vec{\phi}_{\rm x}$ aligns with $\vec{\phi}_{\rm SC}$, it takes the form of $\vec{\phi}_{\rm x} = A_{\rm x} \cdot \sin(2\pi\phi_{\rm R,frac}[n] + \theta_{\rm SC,ff} + \theta_{\rm DLF})$, where $A_{\rm x}$ is the amplitude to be updated adaptively, and finally converges to the optimum $A_{\rm SC}$. After being rotated by the PLL's digital loop filter, $\vec{\phi}_{\rm x}$ adds to the DCO a vector in exact anti-phase with $\vec{\phi}_{\text{DCO}}$ to cancel the latter's effects. If the amplitude of $\vec{\phi}_{\rm x}$ is not large enough to cancel $\vec{\phi}_{\rm DCO}$, i.e., $A_{\rm x} < A_{\rm SC}$, the under-compensated residual $\vec{\phi}_{\rm DCO}$ results in a feedback vector $\vec{\phi}_{\rm DCO,fb}$ at the phase detector side. Hence, the detected phase error $\vec{\phi}_{\rm PD}$ is dominated by the vector sum of the under-compensated $\vec{\phi}_{\rm DCO,fb}$ and the deliberately added acting stimulus vector $\vec{\phi}_{\rm x}$, assuming other in-band interference sources are negligible. As shown in the case of $A_{\rm x} < A_{\rm SC}$ in Fig. 3.16, the under-compensated $\vec{\phi}_{\rm DCO,fb}$ is almost anti-phase with $\vec{\phi}_{\rm DCO}$, considering $\langle \vec{\phi}_{\rm DCO}, \vec{\phi}_{\rm DCO,fb} \rangle \approx \pi$ when the PLL operates with $|\text{FCW}_{\text{frac,s}}| \approx \sqrt{\rho}/(2\pi f_{\text{REF}})$ (see Section 3.4.3). Consequently, the angle between $\vec{\phi}_{\rm R}$ and $\vec{\phi}_{\rm PD}$ is smaller than that between $\vec{\phi}_{\rm R}$ and $\vec{\phi}_{\rm x}$, i.e., $\theta_{\rm PD} < \theta_{\rm SC,ff} + \theta_{\rm DLF}$. On the contrary, if the amplitude of $\vec{\phi}_{\rm x}$ is larger than the optimum, i.e., $A_{\rm x} > A_{\rm SC}$, the phase detector will get an over-compensated $\vec{\phi}_{\rm DCO,fb}$, which is anti-phase with the under-compensated one and finally results in $\theta_{\rm PD} > \theta_{\rm SC,ff} + \theta_{\rm DLF}$ (see the case of $A_{\rm x} > A_{\rm SC}$ in Fig. 3.16). Consequently, $A_{\rm x}$ can be iteratively updated by accumulating the error between $\theta_{\rm PD}$ and $\theta_{\rm SC,ff} + \theta_{\rm DLF}$, i.e., $\theta_{\rm E,PD} = \theta_{\rm SC,ff} + \theta_{\rm DLF} - \theta_{\rm PD}$. As a result, $A_{\rm x}$ should finally converge to the point $\theta_{\rm PD} = \theta_{\rm SC,ff} + \theta_{\rm DLF}$, indicating that $\vec{\phi}_{\rm x}$ perfectly cancels the effect of $\vec{\phi}_{\rm DCO}$ so that $\vec{\phi}_{\rm DCO,fb} = \vec{0}$. At that moment, $A_{\rm x} = A_{\rm SC}$.

<sup>1</sup>As explained in [79], the complete length of $\phi_{\rm R,frac}[n]$ is determined by the smallest bit of FCW<sub>frac</sub>. For example, if FCW<sub>frac</sub> = $2^{-5} + 2^{-7}$, $\phi_{\rm R,frac}[n]$ starts to repeat after $2^7$ consecutive data samples.

<sup>2</sup>Operating at such a frequency simplifies the convergence analysis, as will be explained later.



Figure 3.16: Phasor diagram showing the sinusoidal component $(\vec{\phi}_{\rm PD})$ at the phase detector output, which combines the acting stimulus vector $\vec{\phi}_{\rm x}$ for spur cancellation and the detected phase error $\vec{\phi}_{\rm DCO,fb}$ due to the under-/over-compensation of $\vec{\phi}_{\rm DCO}$. Here, the case of $f_{\rm SC} > 0$ is shown.

Note that the example in Fig. 3.16 merely demonstrates the case with a positive frequency of DCO interference, i.e.,  $f_{\rm SC} > 0$ . When  $f_{\rm SC} < 0$ , both  $\vec{\phi}_{\rm x}$  and  $\vec{\phi}_{\rm DCO,fb}$  would be mirrored from the  $\vec{\phi}_{\rm DCO}$  vector, since the associated angles are inverted according to (3.19) and (3.20). Consequently, the cases of  $A_{\rm x} < A_{\rm SC}$  and  $A_{\rm x} > A_{\rm SC}$  would respectively result in negative and positive  $\theta_{\rm E,PD}$ . This is opposite to the situation with  $f_{\rm SC} > 0$ . Therefore,  $A_{\rm x}$  needs to be updated by accumulating  $-\theta_{\rm E,PD}$ , and can still converge to  $A_{\rm x} = A_{\rm SC}$ .

#### 3.4.5 Implementation



Figure 3.17: Flow to determine the spur-cancellation content of the LUT (in Fig. 3.6), i.e., the waveform of  $\vec{\phi}_{SC}$  which is logically stored in the SC-LUT.

The $\vec{\phi}_{\rm SC}$ pattern is incorporated into the LUT that was shown earlier in the implementation diagram in Fig. 3.6. The LUT values are selected by $\phi_{\rm crs}$ (the 3 MSBs of $\phi_{\rm R,frac}$ with the values of $i/8$, $i \in \{0, 1, ..., 7\}$), and then added to the phase detector via $\phi_{\rm LUT}$. This way, the reconstructed waveform of $\vec{\phi}_{\rm SC}$ is always synchronized with $\vec{\phi}_{\rm R}$. To distinguish the LUT content that addresses the in-band and DCO interference, the LUT is logically divided into two parallel sub-LUTs: one, the SC-LUT, stores the $\vec{\phi}_{\rm SC}$ pattern, while the other, the AIB-LUT, compensates the analog in-band interference, as shown in the upper-right of Fig. 3.17.
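To make the LUT addressing concrete, the following sketch fills an 8-entry SC-LUT by sampling (3.18) at $\phi_{\rm crs} = i/8$ and looks up an entry from the 3 MSBs of $\phi_{\rm R,frac}$. It is only an illustration under stated assumptions: the function names are hypothetical, the entries are kept as floating-point values rather than fixed-point LSBs, and sampling exactly at $i/8$ (rather than at bin centers) is an assumption.

```python
import math


def build_sc_lut(a_sc, theta_sc_ff, theta_dlf, n_entries=8):
    """SC-LUT entries: phi_SC sampled at phi_crs = i / n_entries (assumed)."""
    return [
        a_sc * math.sin(2 * math.pi * i / n_entries + theta_sc_ff + theta_dlf)
        for i in range(n_entries)
    ]


def lut_lookup(lut, phi_r_frac):
    """Select the entry addressed by the MSBs of phi_R,frac (0 <= phi < 1)."""
    return lut[int(phi_r_frac * len(lut)) % len(lut)]


# Example parameters quoted in Section 3.5 (amplitude in LSBs of phi_R,frac).
sc_lut = build_sc_lut(1.2, 0.627 * 2 * math.pi, -0.049 * 2 * math.pi)
value = lut_lookup(sc_lut, 0.30)  # phi_crs = 2/8 selects entry 2
```

Since the lookup index is derived from $\phi_{\rm R,frac}$ itself, the reconstructed pattern stays aligned with $\vec{\phi}_{\rm R}$ regardless of the FCW.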

The AIB-LUT content should be fixed before performing the SC-LUT estimation because the processes determining the $\vec{\phi}_{\rm SC}$ parameters (i.e., $\theta_{\rm SC,ff}$ and $A_{\rm SC}$) assume that the PLL in-band interference is negligible (e.g., already suppressed by the AIB-LUT). The AIB-LUT is calibrated with the LMS-based algorithm shown in Fig. 3.6 when the PLL is provisioned with $|\text{FCW}_{\text{frac,s}}| \approx 11/16$. The large $|\text{FCW}_{\text{frac,s}}|$ ensures the DCO interference is located at an offset frequency (i.e., $|f_{\rm SC}|$) high enough to be suppressed by the $1/s$ filtering of the DCO.

Regarding the SC-LUT content, the key parameters of  $\phi_{\rm SC}$ , i.e.,  $\theta_{\rm SC,ff}$ ,  $\theta_{\rm DLF}$ , and  $A_{\rm SC}$ , are sequentially determined through the three steps shown in Fig. 3.17. In these steps, measuring  $\theta_{\rm PD}$  is a common procedure because  $\theta_{\rm SC,ff}$ and  $A_{\rm SC}$  are estimated based on observing  $\theta_{\rm PD}$ . To measure  $\theta_{\rm PD}$ , an on-chip SRAM collects the sequences of  $\phi_{\rm R,frac}[n]$  and quantized phase error  $D_{\rm TDC}[n]$ in the background, after the PLL is locked. These two sequences are read out by software and correlated to estimate  $\theta_{\rm PD}$  as discussed in Section 3.4.3.

During the first step of determining  $\phi_{SC}$ , i.e., estimating  $\theta_{SC,ff}$ , the  $\theta_{PD}$ -versus- $|FCW_{frac,s}|$  curve is measured with the AIB-LUT using a well-calibrated content and with all SC-LUT registers remaining at zero. Likewise,  $\theta_{SC,ff}$  equals  $\theta_{PD}$  at the crossing point of this curve's positive and negative FCW<sub>frac,s</sub> branches.

Next,  $\theta_{\text{DLF}}$  is calculated according to (3.19), where the required parameters can be obtained from the PLL settings— $\rho/\alpha$  from the configurations of the digital loop filter, and FCW<sub>frac,s</sub> from the FCW to be used for the  $A_{\text{SC}}$ optimization in the next step. After this step, the angle between  $\vec{\phi}_{\text{SC}}$  and  $\vec{\phi}_{\text{R}}$ (controlled by  $\theta_{\text{SC,ff}} + \theta_{\text{DLF}}$  in (3.18)) is readily calculated.

The last step is to determine the optimum amplitude of $\vec{\phi}_{\rm SC}$, i.e., $A_{\rm SC}$, with the iteration process shown in Step 3 of Fig. 3.17: A $\vec{\phi}_{\rm SC}$-aligned acting stimulus vector $\vec{\phi}_{\rm x}$ with an arbitrary initial amplitude $A_{\rm x}$ is written into the SC-LUT. Then, $\theta_{\rm PD}$ is measured to extract the error $\theta_{\rm E,PD} = \theta_{\rm SC,ff} + \theta_{\rm DLF} - \theta_{\rm PD}$. The extracted error is accumulated to update $A_{\rm x}$, and the acting stimulus vector $\vec{\phi}_{\rm x}$ in the SC-LUT is updated accordingly. With the updated SC-LUT, $\theta_{\rm PD}$ is measured again to correct $A_{\rm x}$. Such an iterative process finally converges at a point where the detected phase error vector $\vec{\phi}_{\rm PD}$ aligns with the acting stimulus vector $\vec{\phi}_{\rm x}$, indicating that $A_{\rm x}$ achieves the optimum value, i.e., $A_{\rm x} = A_{\rm SC}$. During the iterations, the convergence speed is controlled by the $\theta_{\rm E,PD}$-scaling factor $\mu_A$, and the polarity of accumulating $\theta_{\rm E,PD}$ is controlled by the sign of FCW<sub>frac,s</sub>.
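The iteration can be illustrated with a toy phasor model. Everything about the "plant" below is an assumption made for illustration: the residual $\vec{\phi}_{\rm DCO,fb}$ is modeled as a vector at angle $\theta_{\rm SC,ff}$ whose signed length is proportional to $A_{\rm SC} - A_{\rm x}$ (gain `fb_gain`), which follows the qualitative picture of Fig. 3.16 but not the actual loop dynamics. The update polarity is taken from the sign of $\theta_{\rm DLF}$, which is positive for $f_{\rm SC} > 0$ according to the arctangent expression in Section 3.4.2.

```python
import math


def measure_theta_pd(a_x, a_sc, theta_sc_ff, theta_dlf, fb_gain=1.0):
    """Toy model of the phase-detector angle (an assumption, not a PLL model).

    phi_x has amplitude a_x at angle theta_sc_ff + theta_dlf. The residual
    phi_DCO,fb sits at angle theta_sc_ff with signed amplitude
    fb_gain * (a_sc - a_x); a negative value means over-compensation.
    """
    # Work in a frame rotated by theta_sc_ff to avoid angle wrapping.
    x = fb_gain * (a_sc - a_x) + a_x * math.cos(theta_dlf)
    y = a_x * math.sin(theta_dlf)
    return theta_sc_ff + math.atan2(y, x)


def calibrate_a_sc(a_sc_true, theta_sc_ff, theta_dlf, mu_a=1.0, n_iter=200):
    """Accumulate +/- theta_E,PD until A_x settles (hypothetical helper)."""
    polarity = 1.0 if theta_dlf > 0 else -1.0  # flips with the sign of f_SC
    a_x = 0.0
    for _ in range(n_iter):
        theta_pd = measure_theta_pd(a_x, a_sc_true, theta_sc_ff, theta_dlf)
        theta_e = theta_sc_ff + theta_dlf - theta_pd
        a_x += polarity * mu_a * theta_e
    return a_x


a_pos = calibrate_a_sc(1.2, 0.627 * 2 * math.pi, +0.049 * 2 * math.pi)
a_neg = calibrate_a_sc(1.2, 0.627 * 2 * math.pi, -0.049 * 2 * math.pi)
```

In this toy model both polarities settle at the assumed optimum amplitude; the number of iterations it takes depends on `mu_a` and `fb_gain` and should not be read as a prediction of the measured convergence.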

The above process can only determine  $\vec{\phi}_{\rm SC}$  at a single frequency point where  $A_{\rm SC}$  is optimized because  $\vec{\phi}_{\rm SC}$  experiences a frequency-dependent rotation and re-scaling by the loop filter. Therefore, when the PLL operates at a substantially different fractional frequency, both  $\theta_{\rm DLF}$  and  $A_{\rm SC}$  should be adjusted according to FCW<sub>frac,s</sub>.  $\theta_{\rm DLF}$  can be re-calculated with (3.19). Regarding  $A_{\rm SC}$ , it should guarantee that  $\vec{\phi}_{\rm SC}$  perfectly cancels the DCO interference  $\phi_{\text{DCO}}$  after getting rescaled by the loop filter, i.e.,

$$|\vec{\phi}_{\rm DCO}|^2 = \left(\alpha^2 + \frac{\rho^2 f_{\rm REF}^2}{(2\pi f_{\rm SC})^2}\right) \cdot A_{\rm SC}(f_{\rm SC})^2. \tag{3.23}$$

Considering that the causes of  $\vec{\phi}_{\rm DCO}$  (i.e., the aggressor being the FREF clock and the coupling path from FREF to DCO) do not change significantly within a narrow frequency range, we assume  $|\vec{\phi}_{\rm DCO}|^2$  is constant and take into account (3.14) to scale  $A_{\rm SC}$  across the fractional frequencies:

$$A_{\rm SC}({\rm FCW}_{\rm frac,s}|_{\rm op}) = A_{\rm SC}({\rm FCW}_{\rm frac,s}|_{\rm meas}) \cdot \sqrt{\frac{1 + \beta({\rm FCW}_{\rm frac,s}|_{\rm meas})^2}{1 + \beta({\rm FCW}_{\rm frac,s}|_{\rm op})^2}}, \tag{3.24}$$

where

$$\beta(\text{FCW}_{\text{frac,s}}) = \frac{\rho}{\alpha} \cdot \frac{1}{2\pi\, \text{FCW}_{\text{frac,s}}}, \tag{3.25}$$

${\rm FCW}_{\rm frac,s}|_{\rm meas}$ is the ${\rm FCW}_{\rm frac,s}$ with which $A_{\rm SC}$ is calibrated, and ${\rm FCW}_{\rm frac,s}|_{\rm op}$ is the ${\rm FCW}_{\rm frac,s}$ with which the PLL operates at a new frequency.
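The amplitude scaling of (3.24) and (3.25) is simple enough to sketch directly. The function names below are hypothetical, and the example values (a calibrated amplitude of 1.2 LSB at ${\rm FCW}_{\rm frac,s} = 2^{-7}$ with $\rho/\alpha = 2^{-6}$) are taken from the measurement section purely for illustration.

```python
import math


def beta(fcw_frac_s, rho_over_alpha):
    """beta(FCW_frac,s) as in (3.25)."""
    return rho_over_alpha / (2 * math.pi * fcw_frac_s)


def rescale_a_sc(a_sc_meas, fcw_meas, fcw_op, rho_over_alpha):
    """Scale the calibrated amplitude to a new channel as in (3.24)."""
    b_meas = beta(fcw_meas, rho_over_alpha)
    b_op = beta(fcw_op, rho_over_alpha)
    return a_sc_meas * math.sqrt((1 + b_meas ** 2) / (1 + b_op ** 2))


# Calibrated at FCW_frac,s = 2^-7, then moved to FCW_frac,s = 2^-8.
a_new = rescale_a_sc(1.2, 2.0 ** -7, 2.0 ** -8, 2.0 ** -6)
print(f"A_SC at 2^-8: {a_new:.3f} LSB")
```

Moving closer to the integer channel increases $|\beta|$ (the integral path contributes more), so the required amplitude shrinks; this relies on the assumption stated above that $|\vec{\phi}_{\rm DCO}|$ stays constant across the narrow frequency range.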

## 3.5 Experimentally Verifying the Digitally Intensive Spur Cancellation

The digitally intensive spur cancellation approach is verified on the same chip as described in Section 3.3.1. During the verification process, the behavioral AIB-LUT imposes the same LUT content as that in measuring Fig. 3.10(b), where the spur levels in the near-integer channels are below -62 dB, indicating that the uncompensated in-band interference is sufficiently suppressed and would not significantly degrade the accuracy in the  $\theta_{\text{SC,ff}}$  and  $A_{\text{SC}}$  estimation. Next, we determine the parameters of  $\vec{\phi}_{\text{SC}}$ , i.e.,  $\theta_{\text{SC,ff}}$ ,  $\theta_{\text{DLF}}$  and  $A_{\text{SC}}$ .

To search for  $\theta_{SC,ff}$ , the  $\theta_{PD}$ -versus- $|FCW_{frac,s}|$  curve is measured and plotted in Fig. 3.18(a).  $\theta_{SC,ff}$  equals  $\theta_{PD}$  at the crossing point of the positive and negative  $FCW_{frac,s}$  branches, i.e.,  $0.627 \times 2\pi$ . One may notice that these two branches are almost linear in the swept  $FCW_{frac,s}$  range. Consequently, averaging the measured  $\theta_{PD}$  values of each  $\pm |FCW_{frac,s}|$  pair can roughly represent  $\theta_{SC,ff}$ , according to the 'mean' curve in Fig. 3.18(a). This feature can be adopted to accelerate the  $\theta_{SC,ff}$  measurement.

Then, $A_{\rm SC}$ is optimized at the frequency corresponding to ${\rm FCW}_{\rm frac,s} \approx 2^{-7}$, which is close to $\sqrt{\rho}/(2\pi f_{\rm REF})$ [i.e., the $|{\rm FCW}_{\rm frac,s}|$ value at the cross-over of the $\theta_{\rm PD}$-versus-$|{\rm FCW}_{\rm frac,s}|$ curve in Fig. 3.18(a)] and guarantees the convergence for the $A_{\rm SC}$ search. At this frequency, the corresponding $\theta_{\rm DLF}$ is $-0.049 \times 2\pi$ according to (3.19) with $\rho/\alpha \approx 2^{-6}$. The procedure explained in Fig. 3.17 (see Step 3) is employed to search for the optimum amplitude of $\vec{\phi}_{\rm SC}$. Figure 3.18(b) plots the transient of the acting stimulus amplitude $A_{\rm x}$, which starts from 0 and settles at 1.2 after 20 iterations. Since $\vec{\phi}_{\rm SC}$ is injected into the PLL through the LUT related to the $\phi_{\rm R,frac}$ processing (see Fig. 3.6), the unit of $A_{\rm x}$ is the LSB of $\phi_{\rm R,frac}$, i.e., 0.001 of the normalized phase.

Figure 3.18: (a) Measured $\theta_{\rm PD}$-versus-$|{\rm FCW}_{\rm frac,s}|$ curve used for searching $\theta_{\rm SC,ff}$. (b) Convergence curve of $A_{\rm x}$ to determine $A_{\rm SC}$.



Figure 3.19: PLL's output spectra before (a) and after (b) applying the proposed spur cancellation technique at FCW  $\approx 69.01$ , and the corresponding phase noise profiles (c) and (d).

After setting $A_{\rm SC}$ to 1.2, the final $A_{\rm x}$ value in Fig. 3.18(b), $\vec{\phi}_{\rm SC}$ is now fixed for the channel of FCW<sub>frac,s</sub> $\approx 2^{-7}$. According to the PLL output spectra before and after applying $\vec{\phi}_{\rm SC}$, respectively shown in (a) and (b) of Fig. 3.19, the fundamental fractional spur is significantly suppressed from $-47.5\,\mathrm{dB}$ to $-60.6\,\mathrm{dB}$. Although the spur is suppressed by deliberately adding the in-band interference $\vec{\phi}_{\rm SC}$, the phase noise does not degrade. This is supported by the unchanged value of integrated jitter in the cases without and with $\vec{\phi}_{\rm SC}$, respectively shown in Fig. 3.19 (c) and (d).



Figure 3.20: Comparison of the worst fractional spur (a) and integrated jitter (b) versus  $FCW_{frac,s}$  before and after applying the proposed spur cancellation technique.

To verify the spur cancellation performance over the fractional channels, the worst-spur-versus-FCW<sub>frac,s</sub> curve is swept across the channels with FCW $\in$ (69, 69.5). During this process, $A_{\rm SC}$ and $\theta_{\rm DLF}$ are adjusted according to (3.24) and (3.19). Based on the measurement results in Fig. 3.20(a), applying $\vec{\phi}_{\rm SC}$ suppresses the worst spur levels to below $-59\,\rm dB$ in most channels. An exception occurs in the channel with FCW<sub>frac,s</sub> $\approx 2^{-8}$, where the worst fractional spur is $-57.8\,\rm dB$ with the automatically calculated $\vec{\phi}_{\rm SC}$. This exception is caused by the non-optimal amplitude of $\vec{\phi}_{\rm SC}$ (possibly degraded by the non-zero residual in-band interference, which violates the assumption for determining $A_{\rm SC}$ in Section 3.4.5), and the worst spur level can also be reduced to below $-59\,\rm dB$ after manually increasing the amplitude of $\vec{\phi}_{\rm SC}$ by 0.2 LSB. In addition, the corresponding integrated jitter values are almost the same for the cases with and without $\vec{\phi}_{\rm SC}$, indicating no phase noise degradation.

When the PLL hops to a faraway frequency channel (e.g., by changing the integer part of FCW), everything should be re-calibrated to guarantee the optimum spur-cancellation performance because the phase and amplitude of the mutual interference between FREF and CKV may change. This is also true for the cases with supply and temperature variations.

#### 3.6 Conclusion

This chapter analyzed the characteristics of the PLL's self-interference arising from the mutual coupling between the DCO and reference clock buffer, i.e., the in-band and DCO interference injected internally within the PLL, respectively through the phase detector and the DCO. Their impacts on fundamental fractional spurs were also investigated. Based on two features of the self-interference, i.e., sinusoidal pattern and synchronicity with the predicted DCO phase, we developed a digitally intensive strategy that cancels the DCO-interference-induced fundamental fractional spurs utilizing a well-designed in-band interference. The proposed approach reuses the same hardware that was originally designed to predistort the in-band interference (e.g., the nonlinearity of the phase-detection blocks), and thus can be readily applied to a fabricated chip to mitigate unexpected spurs due to self-interference without a chip redesign. More importantly, based on the concept of synchronous-interference cancellation, more methods can be developed to suppress the impacts of mutual coupling between the blocks inside the PLL. This may help to relax the isolation specifications of each block, reduce the system complexity, and improve the power efficiency of the overall system.

# CHAPTER 4



# A Digital PLL-Based Phase Modulator Achieving Low EVM

As mentioned in Chapter 1, a PLL with a two-point modulation capability can serve as a phase modulation (PM) path in a polar transmitter (TX). This strategy can maximize the system energy efficiency, especially at lower output power, and has been widely adopted in wireless system-on-chip (SoC) solutions for Internet-of-Things (IoT) applications. On the other hand, facing the data explosion trend, many wireless communication standards, including those for IoT, are evolving toward high data rates by adopting high-order modulation schemes, thereby requiring lower EVM to correctly resolve the received data. For example, Wi-Fi HaLow (a new IoT standard) has introduced 256-QAM, which requires an EVM below $-32\,\mathrm{dB}$ for the entire TX. From the perspective of a polar-TX system design, the amplitude modulation (AM) path is usually allowed to consume a greater portion of the EVM budget since it handles a large signal amplitude and is more prone to nonlinearity and EVM degradation. As a result, the PM path is allocated a much lower portion of the EVM budget (e.g., $\leq -40\,\mathrm{dB}$). Although recently published PLL-based phase modulators have reported EVM below $-40\,\mathrm{dB}$ [81] [28], maintaining such performance is challenging under some practical system-level constraints.

This chapter<sup>1</sup> presents a digital PLL-based phase modulator that can be potentially utilized in a polar TX for IoT applications: Section 4.1 first explains two practical system-level constraints limiting the PM accuracy: the DCO's frequency modulation (FM) nonlinearity and a non-uniform clock at whose grid the DCO updates the modulation frequency. To address the non-uniform clock issue, Section 4.2 first extends the conventional discrete-time phase modulator model to a hybrid-time domain one in order to analyze the non-uniform clock's variations and their effects. Based on the improved model, Section 4.3 proposes a non-uniform clock compensation (NUCC) scheme to suppress the associated PM accuracy degradation. Regarding the DCO nonlinearity, Section 4.4 proposes a phase-domain digital pre-distortion (DPD) technique to combat the DCO nonlinearity component related to the $1/\sqrt{LC}$-law. Section 4.5 explains the implementation details of the proposed phase modulator, and Section 4.6 demonstrates the measurement results. Finally, Section 4.7 concludes this chapter.

<sup>1</sup>Main content of this chapter has been published in IEEE Journal of Solid-State Circuits [82].

# 4.1 System-Level Constraints Limiting the Phase Modulation Accuracy

For a realistic PLL-based phase modulator, its accuracy commonly faces headwinds from two system-level constraints. One is that the ever-widening signal bandwidth $({\rm BW}_{\rm sig})$ in advanced communication standards tends to become a large fraction of the RF channel frequency $(f_{\rm RF})$, i.e., the ratio ${\rm BW}_{\rm sig}/f_{\rm RF}$ increases, ultimately aggravating the $1/\sqrt{LC}$-induced nonlinearity of the DCO. For example, Wi-Fi HaLow may use a signal bandwidth up to 16 MHz around 800 MHz, resulting in ${\rm BW}_{\rm sig}/f_{\rm RF} \approx 2\%$. If this signal is transmitted by a polar TX, the DCO on the PM path needs to update at a frequency much higher than ${\rm BW}_{\rm sig}$ to suppress the replicas and spectral regrowth due to the FM expansion [83]; e.g., the update frequencies in [84] [85] [12] exceed $16 \times {\rm BW}_{\rm sig}$. The DCO's FM bandwidth $({\rm BW}_{\rm FM})$ is usually a large fraction of the update frequency, even equal to it to guarantee the PM range of $[-\pi,\pi]$ [81] [86]. Consequently, ${\rm BW}_{\rm FM}$ can be many times wider than ${\rm BW}_{\rm sig}$, covering a portion of $f_{\rm RF}$ much higher than 2%. Across such a wide FM range, an LC-tank DCO will exhibit significant nonlinearity due to its $1/\sqrt{LC}$ law conversion [87].
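A rough numerical illustration of why the wider FM range matters is sketched below. It assumes an ideal LC tank, $f = 1/(2\pi\sqrt{LC})$, tuned purely through its capacitance, and compares the frequency actually produced against a first-order (linear) tuning law. The helper names are hypothetical, and the two deviations (8 MHz as half of the 16-MHz signal bandwidth, 128 MHz as half of a $16\times$ wider FM range) are illustrative assumptions.

```python
import math


def linear_dc_over_c(delta_f, f0):
    """Capacitance step a linear tuning law would request for delta_f.

    First-order expansion of f0 / sqrt(1 + dC/C0): f ~ f0 * (1 - dC / (2*C0)).
    """
    return -2.0 * delta_f / f0


def actual_freq(dc_over_c, f0):
    """Frequency an ideal LC tank produces for a relative capacitance step."""
    return f0 / math.sqrt(1.0 + dc_over_c)


def fm_error(delta_f, f0):
    """Deviation of the produced frequency from the linear target (Hz)."""
    return actual_freq(linear_dc_over_c(delta_f, f0), f0) - (f0 + delta_f)


f_rf = 800e6
err_narrow = fm_error(8e6, f_rf)    # deviation comparable to BW_sig / 2
err_wide = fm_error(128e6, f_rf)    # deviation comparable to 16 * BW_sig / 2
```

Under these assumptions the error grows far faster than the deviation itself, which is the qualitative reason the $1/\sqrt{LC}$ law becomes a dominant concern once the FM range spans a sizable fraction of $f_{\rm RF}$.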

So far, the DCO nonlinearity has been tackled by pre-distorting the oscillator tuning word (OTW). Noting that the pre-distortion setting is highly frequency-sensitive, [25] and [88] calibrate the settings in foreground at multiple frequency points. This not only costs extra power but may also fail to maintain the optimum EVM since a foreground calibration cannot track the relevant parameters under temperature and supply drift. Although the background calibration in [28] and [84] addresses the drawbacks of the foreground calibration, the convergence times are long, e.g., up to 100 ms in [84]. Considering that the background calibration there involves not only the nonlinearity but also the DCO gain $(K_{\text{DCO}})$ [28], which is cubically related to the channel frequency [87], the calibration results can easily turn invalid after hopping to some reasonably faraway channel. Therefore, re-calibration may be frequently needed during channel hopping, wasting considerable time and energy.



Figure 4.1: Block diagram of a digital polar transmitter. The DCO-update clock, CKU, is obtained by re-sampling and inverting the reference clock, FREF, by the falling edges of the DCO variable clock, CKV.

Another challenging system-level constraint is that the phase modulator should operate at a non-uniform sampling clock aligned with the channel-dependent and phase-modulated RF clock [11, 12, 52, 74, 76], such as the variable clock (CKV) in Fig. 4.1, which depicts a polar TX adopting parallel PM and amplitude modulation (AM) paths to reconstruct the desired RF signal. As highlighted, the digital polar TX uses multiple clock domains (i.e., CKU, CKV, CKD) to allow sufficiently high clock sampling rates of each block while being aware of their effects on power consumption. Aligning all the clocks with a common reference, i.e., CKV, helps to avoid data misalignment and glitches during cross-clock-domain data synchronization. This prevents the EVM and output spectrum from getting degraded by glitches of AM data [11] and misalignment between AM and PM signals [52].

Two strategies are widely utilized to generate the phase modulator's updating clock (CKU) that is synchronous with CKV. One is to frequency-divide the CKV [12, 25, 52, 76, 89]; the other is to re-time the significant edge of the PLL's reference clock (FREF) by that of CKV [11, 44, 90], as exemplified by the CKU generation timing diagram in Fig. 4.1 (in this design, the significant edges of FREF and CKV are both falling, while those of CKU are rising). Since CKV is phase modulated, any clock synchronous with CKV will exhibit some non-uniformity: the clock periods are time-varying, and the offsets between its significant edges and those corresponding to an ideal uniform clock (e.g., those between CKU and FREF in Fig. 4.1) vary across cycles. Considering that PLL-based phase modulators have overwhelmingly adopted the two-point modulation scheme [91] [92], which directly modulates the DCO phase through one feed point and eliminates the excess phase prior to the phase detector through the other feed point, the non-uniform period and time-varying offset of the generated clock will respectively affect the DCO phase modulation and excess phase elimination (details will come in Section 4.2.2). These two mechanisms will disturb the PLL and finally degrade the EVM. Currently, the prior art [12] [89] merely tackles the effects of period variation, but ignores the impairments related to offset variation. Even for the period variation compensation, the existing methods are only valid for the CKU generated by dividing CKV, whose period is determined by the instantaneous CKV frequency, but cannot be extended to the case of using the reference clock re-timed to CKV, whose period is affected by the accumulative CKV phase.

In summary, due to the system-level constraints mentioned above, a PLL-based phase modulator faces a PM accuracy degradation from both the non-uniform clock and DCO nonlinearity. The impacts of these two error sources will be analyzed and addressed in the remaining sections of this chapter.
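To visualize the clock non-uniformity described above, the sketch below re-times each FREF edge to the next CKV edge of a phase-modulated carrier and records the resulting CKU offsets and periods. All numbers are assumptions chosen for illustration (40-MHz reference, 812-MHz carrier, a 1-MHz sinusoidal phase tone of 0.25 UI, and an arbitrary initial phase), and the model ignores the falling/rising edge polarity and the inversion shown in Fig. 4.1.

```python
import math

F_REF = 40e6    # assumed reference frequency (Hz)
F_0 = 812e6     # assumed carrier frequency (Hz), i.e., FCW = 20.3
M_UI = 0.25     # assumed peak phase deviation (UI)
F_MOD = 1e6     # assumed modulation tone (Hz)
PHI_0 = 0.1     # assumed initial CKV phase (UI)


def ckv_phase(t):
    """Accumulated CKV phase in cycles; an edge occurs at each integer."""
    return PHI_0 + F_0 * t + M_UI * math.sin(2 * math.pi * F_MOD * t)


def next_ckv_edge(t_start):
    """Time of the first CKV edge after t_start, found by bisection."""
    target = math.floor(ckv_phase(t_start)) + 1
    lo, hi = t_start, t_start + 2.0 / F_0  # two nominal periods bracket it
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ckv_phase(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


ref_edges = [n / F_REF for n in range(50)]
cku_edges = [next_ckv_edge(t) for t in ref_edges]
offsets = [c - r for c, r in zip(cku_edges, ref_edges)]      # CKU vs. FREF
periods = [b - a for a, b in zip(cku_edges, cku_edges[1:])]  # CKU periods
```

In this toy setup the offsets wander within roughly one CKV period and the CKU periods differ from cycle to cycle, which corresponds to the two effects (time-varying offset and non-uniform period) that the text attributes to a CKV-synchronous clock.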

## 4.2 Modeling a PLL-Based Phase Modulator

This section will first introduce the conventional discrete-time phase modulator model. It will then be extended to a hybrid-time domain one to assist in analyzing the non-uniform clock's variations and related effects, paving the way for developing useful mitigation strategies.

#### 4.2.1 Ideal Phase Modulator Model in Discrete-Time Domain

Figure 4.2 shows a discrete-time domain model of an ideal PLL-based phase modulator. To produce the CKV clock with the excess phase $\phi'_V$ (i.e., excluding the carrier component), the desired modulation commanding phase $\theta_M$ is first normalized by $1/(2\pi)$ to $\phi_M$.<sup>1</sup> Then, $\phi_M$ is differentiated to $\Delta\phi_M$, which is the target phase shift to be developed by $\phi'_V$ during a single reference cycle. $\Delta\phi_M$ modulates the PLL through two feeding points [93], defined as direct modulation (DM) and phase prediction (PP). Through the DM point, $\Delta\phi_M$ directly modulates the DCO. Due to its phase integration nature [60], the DCO accumulates $\Delta \phi_{\rm M}$ cycle by cycle such that the output phase $\phi'_{\rm V}$ equals the delayed modulation target $\phi_{\rm M}$, i.e., $\phi'_{\rm V}[n] = \phi_{\rm M}[n-1]$. Meanwhile, the PP-related path also emulates the DCO behavior for its elimination purpose, i.e., by accumulating $\Delta \phi_{\rm M}$ and then delaying it to predict the DCO phase with $\phi'_{\rm R}[n-1]$. Any deviation of $\phi'_{\rm V}$ from $\phi'_{\rm R}$, i.e., $\Delta \phi_{\rm E}$, will be detected and gradually corrected by the loop.

<sup>1</sup>In this chapter, the phase symbol $\theta$ is in the conventional unit of radian, but, for practical reasons, $\phi$ is normalized by $1/(2\pi)$, i.e., in unit intervals (UI).

Figure 4.2: Discrete-time domain model of an ideal PLL-based phase modulator with a two-point modulation. The gains of DCO and phase detector, respectively $K_{\text{DCO}}$ and $K_{\text{PD}}$, are implied as normalized, as in [26], hence hidden.

Ideally, $\phi'_{\rm V}[n] = \phi'_{\rm R}[n-1]$, so $\Delta \phi_{\rm E} = 0$, which signifies that the loop is oblivious to the modulation 'perturbations'. In practice, however, errors occur at both feeding points. The DM-induced error, denoted $\phi_{\rm E,DM}$, stems from various impairments of the DCO, such as its phase noise and frequency quantization, as well as the nonlinearity of its FM characteristics. Without the feedback loop, even a tiny but persistent $\phi_{\rm E,DM}$ can accumulate without bound in the DCO as a PM error. Fortunately, a closed-loop PLL gradually corrects it, thus preventing the accumulation in the long run. A wider PLL bandwidth helps to suppress the effects of $\phi_{\rm E,DM}$, but it makes the PM accuracy more vulnerable to the PP-induced error, i.e., $\phi_{\rm E,PP}$, which stems from the phase detector's noise and nonlinearity, as well as the prediction error of $\phi'_{\rm R}$. This implies an optimum PLL bandwidth that balances the PM errors due to $\phi_{\rm E,DM}$ and $\phi_{\rm E,PP}$. However, the optimum bandwidth is merely a trade-off. To achieve a lower EVM, this work focuses on minimizing both $\phi_{\rm E,DM}$ and $\phi_{\rm E,PP}$.
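The cancellation in the ideal model can be checked with a few lines of Python. This is only a sketch of Fig. 4.2 with the loop left open and no impairments; the function name and the convention $\phi_{\rm M}[-1] = 0$ are assumptions made for illustration.

```python
def ideal_two_point_modulator(phi_m):
    """Open-loop sketch of the ideal model in Fig. 4.2 (no impairments).

    phi_m: normalized target phases phi_M[n] in UI, with phi_M[-1] taken as 0.
    Returns (phi_v, dphi_e): the DCO excess phase phi'_V[n] and the
    detected error phi'_R[n-1] - phi'_V[n] for every n.
    """
    phi_v, dphi_e = [], []
    dco_acc = 0.0      # DM side: phase integrated by the DCO so far
    pp_acc = 0.0       # PP side: accumulator output phi'_R
    prev_target = 0.0
    for target in phi_m:
        phi_r_delayed = pp_acc            # phi'_R[n-1]
        phi_v.append(dco_acc)             # phi'_V[n]
        dphi_e.append(phi_r_delayed - dco_acc)
        d_phi_m = target - prev_target    # differentiator output
        prev_target = target
        dco_acc += d_phi_m                # DCO integrates the DM input
        pp_acc += d_phi_m                 # PP path emulates the DCO
    return phi_v, dphi_e


phi_m = [0.10, 0.35, 0.20, -0.05]
phi_v, dphi_e = ideal_two_point_modulator(phi_m)
```

In this idealized setting, `phi_v[n]` reproduces `phi_m[n-1]` and every entry of `dphi_e` is zero, which is the "oblivious loop" condition described above.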

#### 4.2.2 DCO Model in Hybrid-Time Domain

The DCO model in Fig. 4.2 is merely a discrete-time domain approximation assuming that both the modulating input  $\Delta \phi_{\rm M}$  and developed output phase  $\phi'_{\rm V}$  update simultaneously on the same uniform clock-spacing grid, thus



Figure 4.3: Hybrid-time model of the DCO: (a) schematic and (b) waveforms.

incapable of properly handling the effects of clock impairments, i.e., the FM-induced skew and period variations. To include these non-idealities, the DCO model is expanded to a hybrid (i.e., discrete/continuous)-time domain, with the diagram and waveforms shown in Fig. 4.3. The DCO is basically an FM device whose offset frequency $\Delta f_{\rm M}$ from the $f_0$ carrier changes instantaneously in response to the oscillator tuning word (OTW) that is updated by the CKU clock. This FM characteristic is modeled in the discrete-time domain. To be consistent with the discrete-time DCO in Fig. 4.2, we expediently use an ideal CKU aligned with the PLL's reference (FREF), but we will add the timing non-idealities to the CKU later. Considering that OTW is denormalized from $\Delta \phi_{\rm M}$ by $f_{\rm REF}/K_{\rm DCO}$, where $f_{\rm REF}$ is the frequency of FREF and $K_{\rm DCO}$ is the DCO FM transfer gain, $\Delta f_{\rm M}$ during the $n^{\rm th}$ clock cycle is related to $\Delta \phi_{\rm M}$ by

$$\Delta f_{\rm M}[n] = \Delta \phi_{\rm M}[n] \cdot f_{\rm REF} = \frac{\Delta \phi_{\rm M}[n]}{T_{\rm REF}}, \qquad (4.1)$$

where $T_{\text{REF}}$ is the period of FREF. On the other hand, the DCO also exhibits a phase-accumulation characteristic, with which it acquires the excess phase $\phi'_{\rm V}$ by integrating $\Delta f_{\rm M}$ over time [24], i.e., $\phi'_{\rm V}(t) = \int_0^t \Delta f_{\rm M}(\tau) d\tau$. This characteristic is modeled in the continuous-time domain, and a zero-order hold is added to convert the discrete-time $\Delta f_{\rm M}[n]$ to the continuous-time $\Delta f_{\rm M}(t)$ [94]. Thus, the continuous-time $\phi'_{\rm V}(t)$ can be described as

$$\phi_{\rm V}'(t) = \sum_{i=0}^{n-1} \Delta \phi_{\rm M}[i] + \Delta f_{\rm M}[n] \cdot (t - n \cdot T_{\rm REF}), \tag{4.2}$$

where  $n = \lfloor t/T_{\text{REF}} \rfloor$ . Interestingly,  $\phi'_{V}(t)$  sampled by FREF (for phase detection), i.e.,  $\phi'_{V}[n]$ , equals the  $\sum_{i=0}^{n-1} \Delta \phi_{M}[i]$  term, which is exactly the  $\phi'_{V}[n]$  prediction term  $\phi'_{R}[n-1]$  in Fig. 4.2. Consequently, no error will be detected and so the PLL remains unperturbed. Note that two conditions should be satisfied to perfectly cancel the sampled and predicted phases. First, from the phase accumulation aspect, the excess phase shift in the  $n^{\text{th}}$  clock cycle should exactly equal

$$\Delta \phi'_{\rm V}[n] = \Delta f_{\rm M}[n] \cdot T_{\rm REF} = \Delta \phi_{\rm M}[n]. \tag{4.3}$$

Aside from a $\Delta f_{\rm M}$ error caused by the DCO FM nonlinearity, this condition can also be impaired by the DCO phase-accumulation time ($T_{\rm acc}$) deviating from $T_{\rm REF}$ [95]. This occurs if CKU is time-varying, as in Fig. 4.1. Then, the CKU period variation will degrade the PM accuracy through $\phi_{\rm E,DM}$. Second, from the phase-detection perspective, the DCO update clock CKU should ideally align with the sampling clock FREF. If any offset exists (this will be discussed in Section 4.2.3), $\phi'_{\rm R}$ will not precisely predict $\phi'_{\rm V}$. The associated error adds to $\phi_{\rm E,PP}$, thereby disturbing the PLL and affecting the EVM.
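To make (4.1) and (4.2) concrete, the sketch below evaluates the zero-order-hold phase trajectory for an ideal CKU aligned with FREF. The function name is hypothetical, and time is expressed in units of $T_{\rm REF}$ in the usage lines purely to keep the arithmetic exact.

```python
import math


def excess_phase(t, d_phi_m, t_ref):
    """phi'_V(t) per (4.2): zero-order-hold FM with CKU ideally aligned to FREF.

    d_phi_m[n] is the phase-shift target of the n-th reference cycle (in UI);
    t must satisfy 0 <= t < len(d_phi_m) * t_ref.
    """
    n = math.floor(t / t_ref)              # index of the current cycle
    accumulated = sum(d_phi_m[:n])         # phase acquired in completed cycles
    d_f_m = d_phi_m[n] / t_ref             # (4.1): offset frequency in cycle n
    return accumulated + d_f_m * (t - n * t_ref)


d_phi_m = [0.1, -0.2, 0.05]
phase_at_fref = excess_phase(2.0, d_phi_m, 1.0)   # sampled exactly on an FREF edge
phase_mid = excess_phase(1.5, d_phi_m, 1.0)       # halfway through cycle 1
```

Sampling on an FREF edge returns only the accumulated sum, which is the term that $\phi'_{\rm R}[n-1]$ predicts, whereas any other sampling instant adds the linear ramp of the current cycle.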

#### 4.2.3 Hybrid-Time Model of Phase Modulator



Figure 4.4: Phase modulator with delay spread compensation: (a) waveforms and (b) block diagram.

A realistic CKU might not be perfectly aligned with FREF due to various circuit delays on the FM path, e.g., CKU's propagation delay and DCO's settling time. For simplicity, all these delays are included in the nominally constant offset between FREF and CKU, i.e.,  $\Delta t_{\text{cnst}}$  (exaggerated) in Fig. 4.4(a). Then,  $\phi'_{\text{R}}$  predicts  $\phi'_{\text{V}}(t)$  sampled at the CKU grid, instead of that at FREF. Therefore, using  $\phi'_{\text{R}}$  for the phase detection leaks some  $\phi'_{\text{V}}$  information to  $\phi_{\text{E,PP}}$ , resulting in an error of

$$\phi_{\rm R2S}[n] = \Delta t_{\rm cnst} \cdot \Delta f_{\rm M}[n] = \frac{\Delta t_{\rm cnst}}{T_{\rm REF}} \cdot \Delta \phi_{\rm M}[n]. \tag{4.4}$$

Figure 4.4(b) sketches a hybrid-time phase-modulator model, which merges the hybrid-time DCO in Fig. 4.3(a) with the discrete-time phase modulator of Fig. 4.2. To reflect the $\phi'_V$ leakage mechanism due to the $\Delta t_{\text{cnst}}$ skew, the hybrid model emphasizes the clock domains: FREF is used in the $\phi'_V$ sampling, whereas CKU drives all the remaining discrete-time blocks and updates the DCO's $\Delta f_M$. Furthermore, this model also converts $\phi'_R[n]$ to the $\phi'_V(t)$ prediction at the FREF grid, i.e., $\phi'_S[n] = \phi'_R[n] - \phi_{R2S}[n]$. Utilizing $\phi'_S$ for phase detection can completely avoid the $\phi'_V(t)$ leakage.

It should be noted that [86] has also found this  $\phi'_{\rm V}(t)$  leakage mechanism, defined as "delay spread", and compensated for it by recursively predicting  $\phi'_{\rm S}$ . However, [86] considers only the case of constant  $\Delta t_{\rm cnst}$ . In the non-uniform CKU case (to be discussed in Section 4.3), CKU's offset relative to FREF becomes time-varying. Under such a condition, using  $\phi_{\rm R2S}$  to predict  $\phi'_{\rm S}$  can be more convenient, since it only involves the phase accumulation within one CKU cycle and the prediction error would not propagate to or accumulate on subsequent cycles due to the non-recursive form.
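The non-recursive prediction $\phi'_S[n] = \phi'_R[n] - \phi_{R2S}[n]$ can be cross-checked against a brute-force phase trajectory. The sketch below assumes a constant skew, a zero-order-hold DCO whose $k$-th update occurs at $k \cdot T_{\rm REF} + \Delta t_{\rm cnst}$, and that $\phi'_S[n]$ corresponds to the FREF edge at $(n+1) \cdot T_{\rm REF}$; the function names and this indexing are illustrative assumptions, not taken from the original figures.

```python
def phase_at(t, d_phi_m, t_ref, dt_cnst):
    """Brute-force phi'_V(t) when the k-th DCO update is delayed by dt_cnst."""
    phase = 0.0
    for k, d_phi in enumerate(d_phi_m):
        start = k * t_ref + dt_cnst        # CKU edge that applies d_phi_m[k]
        stop = start + t_ref               # next CKU edge
        if t <= start:
            break
        phase += d_phi / t_ref * (min(t, stop) - start)
    return phase


def predict_phi_s(d_phi_m, t_ref, dt_cnst):
    """phi'_S[n] = phi'_R[n] - phi_R2S[n], with phi_R2S taken from (4.4)."""
    phi_s, phi_r = [], 0.0
    for d_phi in d_phi_m:
        phi_r += d_phi                         # phi'_R[n]: phase at the CKU grid
        phi_r2s = dt_cnst / t_ref * d_phi      # (4.4): phase gained between FREF and CKU
        phi_s.append(phi_r - phi_r2s)          # prediction at the FREF grid
    return phi_s


d_phi_m = [0.1, -0.2, 0.05]
predicted = predict_phi_s(d_phi_m, t_ref=1.0, dt_cnst=0.125)
sampled = [phase_at(n + 1.0, d_phi_m, 1.0, 0.125) for n in range(len(d_phi_m))]
```

Under these assumptions `predicted` and `sampled` agree cycle by cycle, and each prediction depends only on the current $\Delta\phi_{\rm M}[n]$, so an error in one cycle does not propagate to the next.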

## 4.3 Non-Uniform Clock Compensation (NUCC)

#### 4.3.1 Foundation for NUCC— $\Delta t_{\rm S}$ Estimation

Due to the system-level constraints discussed in Section 4.1, the proposed phase modulator adopts the *update* clock CKU that is generated by re-timing the FREF falling edge to the 5th subsequent CKV falling edge (for timing reasons), as shown in Fig. 4.5(a). Consequently, CKU shows a time-varying offset (relative to FREF) and period, thus respectively contributing errors to $\phi_{\rm E,PP}$ and $\phi_{\rm E,DM}$. To tackle these errors, the first step is to estimate the variations of the CKU offset and period. This entails knowing $\Delta t_{\rm S}$, i.e., the instantaneous time offset between FREF and its 1<sup>st</sup> subsequent CKV edge, for two reasons: Regarding the CKU's offset from FREF, $\Delta t_{\rm S}$ dominates the variation component because this offset breaks down to two



Figure 4.5: Phase modulator with the proposed non-uniform clock compensation (NUCC): (a) Waveforms showing CKU generation by re-timing FREF by CKV, (b) waveforms illustrating the phases related to  $\Delta t_{\rm S}$  prediction, and (c) the system diagram.

parts: $\Delta t_{\rm S}$ and four CKV periods (i.e., $4T_{\rm CKV}[n]$, where $T_{\rm CKV}[n]$ is the CKV period during the $n^{\rm th}$ CKU cycle). The former varies across CKU cycles; the latter is roughly constant at about four average CKV periods, i.e., $\Delta t_{\rm cnst} \approx 4\overline{T_{\rm CKV}}$, given that BW<sub>FM</sub> is sufficiently smaller than the DCO carrier frequency $(f_0)$. Regarding the CKU period, its variation can be simply derived by differencing the offsets of consecutive cycles, more specifically the $\Delta t_{\rm S}$ values.

Actually, $\Delta t_{\rm S}$ prediction is widely used in recent PLLs to narrow down the phase detectors' input range [17, 18, 20, 36, 43]. Predicting $\Delta t_{\rm S}$ requires the absolute phase of CKV, i.e., $\phi_{\rm V}$, which counts not only the excess phase $\phi'_{\rm V}$ due to modulation, but also the carrier phase $\phi_{\rm C}$ [see Fig. 4.5(b)].<sup>1</sup> Using the predicted $\phi_{\rm V}$ at the FREF grid, i.e., $\phi_{\rm S}$, $\Delta t_{\rm S}$ in the $n^{\rm th}$ CKU cycle can be predicted as

$$\Delta t_{\rm S}[n] \approx (1 - \phi_{\rm S, frac}[n]) \cdot \overline{T_{\rm CKV}}, \qquad (4.5)$$

<sup>&</sup>lt;sup>1</sup>In this chapter, a generic excess phase  $\phi'_x$  represents the absolute phase  $\phi_x$  excluding the ideal carrier phase  $\phi_c$ .

where  $\phi_{S,\text{frac}}$  is the fractional part of  $\phi_S$ .

To facilitate the  $\Delta t_{\rm S}$  prediction, the phase modulator model in Fig. 4.5(c) includes the DCO's carrier phase  $\phi_{\rm C}$ : On the direct-modulation side,  $\phi_{\rm C}$  is modeled by integrating the DCO carrier frequency  $f_0$  over time. Then  $\phi_{\rm C}$  adds to  $\phi'_{\rm V}$  to represent the absolute CKV phase  $\phi_{\rm V}$ . On the phase-prediction side, the frequency control word, i.e.,

$$\mathrm{FCW} = \frac{f_0}{f_{\rm REF}} = \frac{T_{\rm REF}}{\overline{T_{\rm CKV}}}, \tag{4.6}$$

is accumulated to reflect the behavior of  $\phi_{\rm C}$  at the FREF grid:

$$\phi_{\rm C}[n] = \int_0^{n \cdot T_{\rm REF}} f_0 \mathrm{d}\tau = \sum^n \mathrm{FCW}. \tag{4.7}$$

The accumulated FCW adds to $\phi'_{\rm S}$ (the prediction of $\phi'_{\rm V}$ at the FREF grid), yielding $\phi_{\rm S}$. With its fractional part $\phi_{\rm S,frac}$, the NUCC block can predict $\Delta t_{\rm S}$ as well as estimate the CKU's period and offset deviation relative to FREF, and then compensate the associated effects on $\phi_{\rm E,DM}$ and $\phi_{\rm E,PP}$ with $\phi_{\rm DMC}$ and $\phi_{\rm R2S}$, respectively.
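The $\Delta t_{\rm S}$ prediction chain of (4.5) to (4.7) can be sketched as follows. The function name is hypothetical, and the FCW of 79.25 in the usage lines is chosen only because it is exactly representable in binary floating point, not because it appears in the implemented chip.

```python
import math


def predict_dt_s(fcw, phi_s_excess, t_ckv_avg):
    """Predict Delta t_S[n] per (4.5) from the accumulated FCW plus phi'_S[n]."""
    dt_s = []
    carrier = 0.0
    for phi_excess in phi_s_excess:
        carrier += fcw                          # (4.7): phi_C[n] as accumulated FCW
        phi_s = carrier + phi_excess            # predicted absolute phase at FREF
        frac = phi_s - math.floor(phi_s)        # phi_S,frac in [0, 1)
        dt_s.append((1.0 - frac) * t_ckv_avg)   # (4.5)
    return dt_s


# Unmodulated carrier, time expressed in units of the average CKV period.
dt_s = predict_dt_s(fcw=79.25, phi_s_excess=[0.0, 0.0, 0.0], t_ckv_avg=1.0)
```

With a fractional FCW the predicted offset walks through the CKV period from cycle to cycle even without modulation, and the excess phase $\phi'_{\rm S}$ shifts it further.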

#### 4.3.2 Tackling $\phi_{E,DM}$ due to CKU Period Variation



Figure 4.6: Waveforms of the phase modulator, showing  $\phi'_{\rm V}$  error due to the non-uniform CKU period, i.e.,  $\Delta \phi'_{\rm V,E}$ , and the correction through  $\phi_{\rm DMC}$ .

Figure 4.6 illustrates $\phi_{\rm E,DM}$ due to the non-uniform period of CKU. The excess phase $\phi'_{\rm V}$ will accumulate the desired phase shift of $\Delta \phi_{\rm M}$ if the modulating frequency $\Delta f_{\rm M}$ lasts precisely for the duration of $T_{\rm REF}$ [see (4.3)]. However, the realistic phase accumulation time $T_{\rm acc}$ deviates from $T_{\rm REF}$ due to the time-varying CKU. Therefore, an error of $\Delta \phi'_{\rm V,E}$ is added onto $\phi'_{\rm V}$ in each cycle. The error in the $n^{\rm th}$ CKU cycle is

$$\Delta \phi_{\rm V,E}'[n] = \frac{T_{\rm acc}[n] - T_{\rm REF}}{T_{\rm REF}} \cdot \Delta \phi_{\rm M}[n]. \tag{4.8}$$

The  $T_{\rm acc}[n]$  variation relative to  $T_{\rm REF}$  can be estimated by

$$T_{\rm acc}[n] - T_{\rm REF} = \Delta t_{\rm S}[n] - \Delta t_{\rm S}[n-1]. \tag{4.9}$$

Substituting (4.5), (4.6), and (4.9) into (4.8) yields the estimation of  $\Delta \phi'_{V,E}$  based on  $\phi_{S,frac}$ . To address  $\Delta \phi'_{V,E}[n]$ , the NUCC core adds to the direct-modulation-related path a compensation phase equal to  $-\Delta \phi'_{V,E}[n]$  in the next CKU cycle, i.e.,

$$\phi_{\text{DMC}}[n+1] \approx (\phi_{\text{S,frac}}[n] - \phi_{\text{S,frac}}[n-1]) \cdot \frac{\Delta \phi_{\text{M}}[n]}{\text{FCW}}. \tag{4.10}$$

Consequently, the DCO frequency changes slightly by $\phi_{\text{DMC}}[n+1]/T_{\text{REF}}$. If this extra frequency shift were sustained for exactly $T_{\text{REF}}$, the DCO would acquire a compensation phase of $\phi_{\text{DMC}}[n+1]$ that perfectly corrects the excess phase error $\Delta \phi'_{V,\text{E}}[n]$ from the previous cycle. However, this condition is violated due to the time-varying CKU period. Therefore, there is a secondary residual error with a magnitude of around $\Delta \phi_{\text{M}}[n]/\text{FCW}^2$. Fortunately, this error is negligible, especially at large FCWs (e.g., FCW > 60 in the implemented chip).
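As a numerical illustration of how (4.10) follows from (4.5), (4.6), (4.8), and (4.9), the sketch below computes the accumulation error from two $\Delta t_{\rm S}$ values and the compensation phase from the corresponding $\phi_{\rm S,frac}$ values. The function names and the numbers (FCW = 80, $T_{\rm REF}$ normalized to 1) are illustrative assumptions.

```python
def accumulation_error(dt_s_now, dt_s_prev, t_ref, d_phi_m):
    """Delta phi'_V,E[n] from (4.8), with T_acc[n] - T_REF taken from (4.9)."""
    return (dt_s_now - dt_s_prev) / t_ref * d_phi_m


def phi_dmc_next(frac_now, frac_prev, d_phi_m, fcw):
    """phi_DMC[n+1] from (4.10), using phi_S,frac[n] and phi_S,frac[n-1]."""
    return (frac_now - frac_prev) * d_phi_m / fcw


fcw, t_ref = 80.0, 1.0
t_ckv_avg = t_ref / fcw                       # (4.6)
frac_prev, frac_now = 0.25, 0.75
dt_s_prev = (1.0 - frac_prev) * t_ckv_avg     # (4.5)
dt_s_now = (1.0 - frac_now) * t_ckv_avg       # (4.5)

error = accumulation_error(dt_s_now, dt_s_prev, t_ref, d_phi_m=0.1)
compensation = phi_dmc_next(frac_now, frac_prev, d_phi_m=0.1, fcw=fcw)
```

Here `compensation` equals `-error` to within rounding, which is the first-order cancellation described above; the residual on the order of $\Delta\phi_{\rm M}/{\rm FCW}^2$ arises only when the compensation itself is accumulated over a non-uniform cycle, which this sketch does not model.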



Figure 4.7: Comparison of the $\Delta \phi'_{V,E}$ correction strategies in pre-distortion and post-compensation styles that correct the error with a latency of 0 or 1 CKU cycle, respectively.

One may also notice that $\Delta \phi'_{V,E}$ is post-compensated, i.e., corrected with one CKU cycle of latency, and wonder if it would be better to pre-distort $\Delta \phi'_{V,E}$ to prevent this error from occurring. In fact, these two methods would result in the same simulated EVM. The reason is clarified in Fig. 4.7. Due to the phase integration feature of the DCO, compensating $\Delta \phi'_{V,E}$ takes one CKU cycle, instead of being completed immediately. Therefore, the $\Delta \phi'_{V,E}$ compensation error would stay on the $\phi'_{\rm V}(t)$ trajectory for *one* clock cycle, whichever strategy is adopted.

#### 4.3.3 Addressing $\phi_{E,PP}$ due to CKU Offset Variation



Figure 4.8: Predicting  $\phi'_{\rm S}$  by subtracting  $\phi_{\rm R2S}$  from  $\phi'_{\rm R}$ , in face of the non-uniform CKU.

Compared to the delay spread compensation in Fig. 4.4, the  $\phi_{\rm E,PP}$ compensation in NUCC specifically addresses the  $\phi_{\rm R2S}$  prediction error raised by the time-varying component of the offset between FREF and CKU. Similar to the scenario in (4.4), calculating  $\phi_{\rm R2S}[n]$  requires the instantaneous modulation frequency  $\Delta f_{\rm M}[n]$  and time offset  $\Delta t_{\rm R2S}[n]$ , which replaces the constant  $\Delta t_{\rm cnst}$  to characterize the time-varying delay between the two critical moments when the excess-phase trajectory  $\phi'_{\rm V}(t)$  crosses  $\phi'_{\rm R}[n]$  and  $\phi'_{\rm S}[n]$  (see Fig. 4.8). Since the aforementioned compensation phase  $\phi_{\rm DMC}$  from NUCC has shifted the modulation frequency to  $\Delta f_{\rm M}[n] = (\Delta \phi_{\rm M}[n] + \phi_{\rm DMC}[n])/T_{\rm REF}$ ,  $\phi_{\rm R2S}$  can be determined by

$$\phi_{\rm R2S}[n] = \frac{\Delta t_{\rm R2S}[n]}{T_{\rm REF}} \cdot (\Delta \phi_{\rm M}[n] + \phi_{\rm DMC}[n]). \tag{4.11}$$

So far, $\Delta t_{R2S}[n]$ is obscure because the $\phi'_R[n]$-crossing moment of $\phi'_V(t)$ deviates from the CKU grid. However, given that NUCC has compensated the $\Delta \phi'_{V,E}$ errors (due to the CKU period variation) from all the previous CKU cycles, $\phi'_V$ can ideally hit $\phi'_R$ if the relevant CKU cycle virtually lasts for the duration of $T_{REF}$ (see $\phi'_R[n+1]$ and the related $T_{REF}$ in Fig. 4.6). This observation helps to locate $\phi'_R[n]$ on the $\phi'_V(t)$ trajectory in Fig. 4.8, and finally leads to the conclusion that $\Delta t_{R2S}[n]$ equals the time offset between FREF and CKU in the preceding CKU cycle, i.e.,

$$\Delta t_{\rm R2S}[n] = \Delta t_{\rm S}[n-1] + \Delta t_{\rm cnst}, \qquad (4.12)$$

considering that either side of the formula equals $T_{\text{REF}} - \Delta t_{\text{acc,S}}[n]$, where $\Delta t_{\text{acc,S}}[n]$ denotes the duration between the $n^{\text{th}}$ CKU and the subsequent FREF edges. Substituting (4.5), (4.6), and (4.12) into (4.11) yields a $\phi_{\text{S,frac}}$-based $\phi_{\text{R2S}}$ prediction, i.e.,

$$\phi_{\rm R2S}[n] \approx \left(\frac{\Delta t_{\rm cnst}}{T_{\rm REF}} + \frac{1 - \phi_{\rm S, frac}[n-1]}{\rm FCW}\right) \cdot \Delta \phi_{\rm M}[n], \tag{4.13}$$

where the $\phi_{\text{DMC}}$ term is ignored due to its negligible influence (on the order of $\Delta \phi_{\text{M}}/\text{FCW}^2$). $\Delta t_{\text{cnst}}$ in this expression characterizes the constant component of the offset between FREF and CKU and can thus be estimated with the LMS algorithm in [86]. Consequently, $\phi_{\text{R2S}}$, $\phi'_{\text{S}}$ and $\phi_{\text{S}}$ can be accurately predicted [see Fig. 4.5(c)]. This will not only compensate the $\phi_{\text{E,PP}}$ error due to the non-uniform CKU, but will also provide an accurate $\phi_{\text{S,frac}}$ for $\phi_{\text{E,DM}}$-compensation in the next cycle [see (4.10)].
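For reference, the substitution that leads to (4.13) can be written out step by step. This is merely a sketch chaining (4.11), (4.12), (4.5), and (4.6), with the $\phi_{\rm DMC}$ term dropped in the second line:

$$\begin{aligned}
\phi_{\rm R2S}[n] &= \frac{\Delta t_{\rm S}[n-1] + \Delta t_{\rm cnst}}{T_{\rm REF}} \cdot \left(\Delta \phi_{\rm M}[n] + \phi_{\rm DMC}[n]\right) \\
&\approx \left(\frac{\Delta t_{\rm cnst}}{T_{\rm REF}} + \frac{(1 - \phi_{\rm S,frac}[n-1]) \cdot \overline{T_{\rm CKV}}}{T_{\rm REF}}\right) \cdot \Delta \phi_{\rm M}[n] \\
&= \left(\frac{\Delta t_{\rm cnst}}{T_{\rm REF}} + \frac{1 - \phi_{\rm S,frac}[n-1]}{\rm FCW}\right) \cdot \Delta \phi_{\rm M}[n].
\end{aligned}$$

The last step uses only $\overline{T_{\rm CKV}}/T_{\rm REF} = 1/{\rm FCW}$ from (4.6).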

## 4.4 DCO Frequency Error Compensation

#### 4.4.1 Characterizing the Error Induced by $1/\sqrt{LC}$



Figure 4.9: Extracted open-loop representation in the direct-modulation path of the phase modulator, highlighting the influences of the forward frequency division  $(\div K)$ ,  $\Sigma \Delta$  dithering and LC-tuning of the DCO.

Figure 4.9 sketches an open-loop representation of the direct-modulation path in a PLL-based phase modulator. The instantaneous resonant frequency of the LC tank is controlled by a switched-capacitor bank, thereby suffering from errors related to the  $1/\sqrt{LC}$ -induced nonlinearity. As mentioned in Section 4.1, these errors increase dramatically at higher values of the fractional FM bandwidth BW<sub>FM</sub>/ $f_0$ . The quantitative analysis starts with the DCO carrier frequency  $f_0 = 1/(2\pi\sqrt{L_0C_0})$ , where  $L_0$  and  $C_0$  are the tank's inductance and capacitance, respectively. With the capacitance change of  $\Delta C$ , the resonant frequency shifts by

$$\Delta f(\Delta C) = \left(\frac{1}{\sqrt{1 + \Delta C/C_0}} - 1\right) \cdot f_0. \tag{4.14}$$

However, nearly all published frequency modulators utilize just the linear (or first-order) approximation of (4.14) to estimate the frequency shift due to  $\Delta C$ , i.e.,

$$\Delta f_{\rm lin}(\Delta C) \approx -\frac{1}{2} \frac{\Delta C}{C_0} \cdot f_0. \tag{4.15}$$

Consequently, a realistic DCO frequency shift deviates from the expected  $\Delta f_{\rm lin}$  with a relative error of

$$\operatorname{Err}(\Delta f_{\operatorname{lin}}) = \frac{\Delta f - \Delta f_{\operatorname{lin}}}{\Delta f_{\operatorname{lin}}} \approx \frac{3}{2} \frac{\Delta f_{\operatorname{lin}}}{f_0}. \tag{4.16}$$

Considering that the maximum  $\Delta f_{\rm lin}$  during modulation equals half of the FM bandwidth (i.e., BW<sub>FM</sub>/2), BW<sub>FM</sub>/f<sub>0</sub> thus reflects the level of the  $1/\sqrt{LC}$ -induced FM error.
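The approximation in (4.16) is easy to verify numerically. The sketch below compares the exact shift of (4.14) with the linear estimate of (4.15); the function names and the 2 % capacitance step are illustrative assumptions.

```python
import math


def delta_f_exact(dc_rel, f0):
    """(4.14): exact frequency shift for a relative capacitance change dC/C0."""
    return (1.0 / math.sqrt(1.0 + dc_rel) - 1.0) * f0


def delta_f_linear(dc_rel, f0):
    """(4.15): first-order approximation of the frequency shift."""
    return -0.5 * dc_rel * f0


def relative_error(dc_rel, f0):
    """Left-hand side of (4.16): relative deviation from the linear estimate."""
    lin = delta_f_linear(dc_rel, f0)
    return (delta_f_exact(dc_rel, f0) - lin) / lin


f0 = 1.0                               # normalized carrier frequency
dc_rel = -0.02                         # remove 2 % of the tank capacitance
measured = relative_error(dc_rel, f0)
predicted = 1.5 * delta_f_linear(dc_rel, f0) / f0   # right-hand side of (4.16)
```

For this 1 % frequency shift, both `measured` and `predicted` come out at roughly 1.5 %, and the two differ only by higher-order terms.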

According to the discussion above, a polar TX with invariant signal characteristics (e.g., $BW_{sig}$ and $BW_{FM}$) suffers from a higher $1/\sqrt{LC}$-induced PM error when it generates a lower RF channel frequency $f_{RF}$, simply due to the increased $BW_{FM}/f_0$, if the DCO directly oscillates at $f_{RF}$, i.e., $f_0 = f_{RF}$. However, in a practical polar TX, the DCO output may first be scaled down by a programmable frequency divider $\div K$ before being fed to the AM part (see Fig. 4.9) so as to extend the lower operational range of $f_{RF}$ [25]. Since $\div K$ allows the DCO to maintain its resonance at a high frequency, i.e., $f_0 = K \cdot f_{RF}$, one may wonder how this would affect the nonlinearity characterized by $BW_{FM}/f_0$. Actually, $\div K$ also attenuates the DCO phase by $K$. To ensure the divided output maintains the desired phase $\theta_M$, $\theta_M$ should be amplified by $K$ before modulating the DCO (see Fig. 4.9). This forces $BW_{FM}$ to also expand by $K$. In the end, $BW_{FM}/f_0$ and the $1/\sqrt{LC}$-induced nonlinearity remain the same as in the basic case of $f_0 = f_{RF}$.

#### 4.4.2 Phase-Domain Digital Pre-Distortion

Considering the DCO nonlinearity due to the  $1/\sqrt{LC}$  law is well captured in the presented math formulas, it can be compensated by polynomials whose coefficients are determined by pure math. As shown in Fig. 4.10(a), we pre-distort the nonlinearity in the phase domain with a second-order




Figure 4.10: Pre-distortion of DCO nonlinearity in (a) phase domain, (b) OTW domain, and (c) both domains, i.e., the combinational DPD.

polynomial term, i.e., adding it to  $\Delta \phi_{\rm M}$ . Derivation of this coefficient relies on the LC-DCO model in Fig. 4.9. Considering (4.14) and the capacitance change due to OTW, i.e.,  $\Delta C = -\text{OTW} \cdot C_{\rm U}$ , where  $C_{\rm U}$  is the capacitance of the switched-capacitor units, the DCO frequency shift of  $\Delta f$  would require an OTW of

$$\mathrm{OTW} = \frac{C_0}{C_{\rm U}} \cdot \left[ 1 - \frac{1}{(1 + \Delta f/f_0)^2} \right]. \tag{4.17}$$

By applying a Taylor series to (4.17) and exploiting (4.1) and (4.6), OTW can be written as a function of  $\Delta \phi_{\rm M}$ ,

$$\mathrm{OTW} = \frac{C_0}{C_{\rm U}} \cdot \left[ \frac{2\Delta\phi_{\rm M}}{\rm FCW} - \sum_{i=2}^{\infty} (i+1) \cdot \left(-\frac{\Delta\phi_{\rm M}}{\rm FCW}\right)^i \right]. \tag{4.18}$$

The coefficient of the linear  $\Delta \phi_{\rm M}$  term also equals  $f_{\rm REF}/K_{\rm DCO}$ , which is the denormalization factor from  $\Delta \phi_{\rm M}$  to OTW in the linearized DCO models, e.g., Fig. 4.3(a). Therefore, (4.18) can be rewritten as

$$OTW = \frac{f_{REF}}{K_{DCO}} \cdot [\Delta \phi_{M} + \phi_{DPD}], \qquad (4.19)$$

where

$$\phi_{\text{DPD}} = \sum_{i=2}^{\infty} \frac{i+1}{2 \cdot (-\text{FCW})^{i-1}} \cdot \Delta \phi_{\text{M}}^{i}. \tag{4.20}$$

 $\phi_{\text{DPD}}$  can be used for the phase-domain DPD. In the implemented system, the terms with i > 2 are discarded as negligible.
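The benefit of keeping only the $i = 2$ term of (4.20) can be illustrated with an ideal LC tank. The sketch below assumes a mismatch-free DCO that follows (4.14) exactly and a linear denormalization with the coefficient from (4.18); the function names and the deliberately large test value $\Delta\phi_{\rm M} = 4$ UI at FCW = 80 are illustrative assumptions.

```python
import math


def phi_dpd_second_order(d_phi_m, fcw):
    """i = 2 term of (4.20): 3 / (2 * (-FCW)) * d_phi_m**2."""
    return -1.5 * d_phi_m ** 2 / fcw


def achieved_phase_shift(d_phi_cmd, fcw):
    """Phase shift per reference cycle of an ideal LC tank.

    The command is denormalized linearly, so dC/C0 = -2 * d_phi_cmd / FCW
    (linear coefficient of (4.18)); the tank then follows (4.14), and the
    frequency shift is integrated over T_REF, i.e., multiplied by FCW / f0.
    """
    dc_rel = -2.0 * d_phi_cmd / fcw
    return (1.0 / math.sqrt(1.0 + dc_rel) - 1.0) * fcw


fcw, d_phi_m = 80.0, 4.0
err_plain = achieved_phase_shift(d_phi_m, fcw) - d_phi_m
err_dpd = achieved_phase_shift(d_phi_m + phi_dpd_second_order(d_phi_m, fcw), fcw) - d_phi_m
```

With these numbers the uncompensated error is about 0.33 UI, while the second-order pre-distortion leaves a residual of about 0.02 UI that stems from the discarded $i > 2$ terms. Note that the coefficient depends on FCW only, consistent with the calibration-free argument.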

Interestingly, prior art tends to pre-distort the DCO nonlinearity exclusively in the OTW domain [25, 28, 88], i.e., by adding a compensation signal OTW<sub>DPD</sub> into OTW [Fig. 4.10(b)], rather than into $\Delta \phi_{\rm M}$. According to (4.19) and (4.20), OTW<sub>DPD</sub> significantly correlates with $K_{\rm DCO}$, i.e.,

$$OTW_{DPD} = \sum_{i=2}^{\infty} \frac{i+1}{2} \left(-\frac{K_{DCO}}{FCW \cdot f_{REF}}\right)^{i-1} \cdot OTW_{lin}^{i}, \qquad (4.21)$$

where $OTW_{lin}$ is the OTW linearly denormalized without DPD, i.e., $OTW_{lin} = \Delta \phi_{\rm M} \cdot f_{\rm REF}/K_{\rm DCO}$. Considering that $K_{\rm DCO}$ varies dramatically across frequency [87], it comes as no surprise that prior art suffers from a frequency-dependent $OTW_{\rm DPD}$, thus requiring extensive calibration. In contrast, the phase-domain DPD can be calibration-free because the coefficients in (4.20) rely only on the foreknown FCW.

Note that the phase-domain DPD mainly tackles the nonlinearity caused by the $1/\sqrt{LC}$ law. As for that caused by device mismatches, the OTW-domain DPD can address it with relatively fixed settings since the mismatch is expected to be stable after fabrication [87]. Therefore, combining the OTW- and phase-domain DPD ultimately leads to a frequency-insensitive solution to address the DCO nonlinearity, i.e., the combinational DPD in Fig. 4.10(c).

## 4.5 System Implementation

#### 4.5.1 System Overview

Figure 4.11 presents an overview of the implemented phase modulator. The main body is a time-mode-arithmetic-unit (TAU)-based PLL explained in Chapter 2, which natively operates in a fractional-N regime and where the phase error (i.e., normalized timing of CKV relative to FREF),  $\Delta\phi_{\rm E}$ , is extracted by the TAU-based phase detector, then passed through the digital loop filter to be iteratively corrected by tuning the DCO through OTW<sub>TRC</sub> (the OTW for carrier *tracking*). The phase detector extracts  $\Delta\phi_{\rm E}$  according to  $\phi_{\rm S}$ , i.e., the predicted CKV phase  $\phi_{\rm V}$  at the FREF grid, in a coarse-fine style: The coarse path counts the number of CKV edges, representing the integer part of  $\phi_{\rm V}$ , then cancels it with the integer portion of  $\phi_{\rm S}$ , i.e.,  $\phi_{\rm S,int}$ . On the fine path, the TAU samples  $\Delta t_{\rm S}$ , reflecting the fractional  $\phi_{\rm V}$ , cancels it with  $T_{\rm CKV}$  scaled by  $(1 - \phi_{\rm S,frac})$  to extract the time error  $\Delta t_{\rm E}$ . After  $\Delta t_{\rm E}$  is quantized by a time-to-digital converter (TDC) and normalized by the TDC gain (K<sub>TDC</sub>), the resulting phase adds to that of the coarse path, constituting



Figure 4.11: Simplified block diagram of the implemented phase modulator, where the gray signals are used in the LMS calibration.

 $\Delta \phi_{\rm E}$ . The TAU also launches the CKU, which aligns with the fifth CKV falling edge after FREF and clocks the main digital block.

The PM function is realized through the two-point modulation scheme: On the direct modulation (DM) side, the phase-shift target $\Delta \phi_{\rm M}$ is added to $\phi_{\rm V}$ by tuning the DCO's offset frequency through $\Delta \phi_{\rm DM}$; on the phase prediction (PP) side, $\Delta \phi_{\rm M}$ accumulates with FCW so that $\phi_{\rm S}$ reflects the excess phase and ideally cancels with the sampled $\phi_{\rm V}$ prior to the digital loop filter. As discussed in Section 4.3 and Section 4.4, the PM accuracy suffers from two significant error sources. One is the DCO's FM nonlinearity caused by $1/\sqrt{LC}$, which is compensated by the proposed second-order phase-domain DPD. The other is the non-uniformity of CKU, which is tackled by the NUCC introduced in Fig. 4.5(c), whose separate accumulators for FCW and $\Delta \phi_{\rm M}$ are combined here without affecting the functionality.



Figure 4.12: Implementation of NUCC with the calibration for the constant time offset,  $\Delta t_{\text{cnst}}$ .

#### 4.5.2 Implementation of NUCC

Figure 4.12 shows the implemented NUCC. The  $\phi_{\rm E,DM}$  and  $\phi_{\rm E,PP}$  compensation paths share the common term  $\Delta \phi_{\rm M}/\rm{FCW}$ , which characterizes the expected phase accumulation on DCO during the average CKV period, i.e.,

$$\Delta f_{\rm M}[n] \cdot \overline{T_{\rm CKV}} = \Delta \phi_{\rm M}[n] \cdot \frac{\overline{T_{\rm CKV}}}{T_{\rm REF}} = \frac{\Delta \phi_{\rm M}[n]}{\rm FCW}. \tag{4.22}$$

Scaling  $\Delta \phi_{\rm M}/\text{FCW}$  with  $(\phi_{\rm S,frac}[n] - \phi_{\rm S,frac}[n-1])$  yields  $\phi_{\rm DMC}$ , which compensates  $\phi_{\rm E,DM}$  due to the CKU period variation. This matches (4.10). To compensate  $\phi_{\rm E,PP}$  due to the CKU offset variation,  $\Delta \phi_{\rm M}/\text{FCW}$  is scaled to generate  $\phi_{\rm R2S}$ , i.e.,

$$\phi_{\text{R2S}}[n] = (\widehat{NT}_{\text{cnst}} + 1 - \phi_{\text{S,frac}}[n-1]) \cdot \frac{\Delta \phi_{\text{M}}[n]}{\text{FCW}}. \tag{4.23}$$

This equation is a re-arranged version of (4.13).  $\widehat{NT}_{\text{cnst}}$  represents the constant component of CKU offset (relative to FREF) normalized by the average CKV period, i.e.,

$$\widehat{NT}_{\rm cnst} = \frac{\Delta t_{\rm cnst}}{\overline{T_{\rm CKV}}}. \tag{4.24}$$

$\widehat{NT}_{\text{cnst}}$ is estimated by an LMS algorithm that correlates the differentiated $\Delta\phi_{\text{M}}$ with the detected phase error $\Delta\phi_{\text{E}}$, emulating [86]. The diagram is also shown in Fig. 4.12, where the factor $\mu_{\text{NT}}$ adjusts the calibration convergence speed.
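The re-arrangement from (4.13) to (4.23) relies only on (4.24) and (4.6). Written out as a short sketch:

$$\frac{\Delta t_{\rm cnst}}{T_{\rm REF}} = \frac{\widehat{NT}_{\rm cnst} \cdot \overline{T_{\rm CKV}}}{T_{\rm REF}} = \frac{\widehat{NT}_{\rm cnst}}{\rm FCW}
\quad\Longrightarrow\quad
\left(\frac{\Delta t_{\rm cnst}}{T_{\rm REF}} + \frac{1 - \phi_{\rm S,frac}[n-1]}{\rm FCW}\right) \cdot \Delta \phi_{\rm M}[n] = \left(\widehat{NT}_{\rm cnst} + 1 - \phi_{\rm S,frac}[n-1]\right) \cdot \frac{\Delta \phi_{\rm M}[n]}{\rm FCW}.$$

Given the approximation $\Delta t_{\rm cnst} \approx 4\overline{T_{\rm CKV}}$ from Section 4.3.1, one would expect the calibrated $\widehat{NT}_{\rm cnst}$ to settle near 4, plus whatever additional circuit delays the LMS loop absorbs.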

Obviously, larger amplitudes of $\phi_{R2S}$ and $\phi_{DMC}$ indicate that more PM error is compensated by NUCC. Since $\Delta \phi_M/\text{FCW}$ is the base scaling term in both (4.10) and (4.23), NUCC can improve the PM accuracy more conspicuously when a wideband signal (with a higher probability of large $\Delta \phi_M$ amplitudes) modulates the PLL with a small FCW. Besides, the impact of $\phi_{DMC}$ outweighs that of $\phi_{R2S}$: The former scales $\Delta \phi_M/\text{FCW}$ with a factor (i.e., $\phi_{S,frac}[n] - \phi_{S,frac}[n-1]$) ranging from -1 to 1, and reduces $\phi_{E,DM}$, which could directly accumulate on the DCO and interfere with the PM signal across multiple CKU cycles until corrected by the PLL. The latter scales $\Delta \phi_M/\text{FCW}$ with a factor (i.e., $\phi_{S,frac}[n-1]$) distributed within [0,1), and reduces $\phi_{E,PP}$, which can be attenuated by the loop filter before disturbing the DCO.

Since NUCC tackles the $\phi_{E,DM}$ and $\phi_{E,PP}$ errors, whose impacts depend on the PLL bandwidth (see Section 4.2.1), the EVM improvement due to NUCC is also bandwidth-dependent. To demonstrate that, time-domain simulations of a 3188-MHz PLL-based phase modulator shown in Fig. 4.11 have been carried out. The simulation conditions (e.g., using a 64-PSK signal, $f_{\text{REF}}$ of 40 MHz, feedforward frequency division K=8, etc.) and the way EVM is evaluated are identical to those in the measurements later presented in Fig. 4.21(b). The DCO in this simulation has perfect linearity and ultra-fine resolution, thereby contributing negligible distortion and quantization error to EVM. This makes it easier to observe the impacts of the non-uniform CKU and NUCC. The simulated EVM versus the PLL bandwidth is shown in Fig. 4.13. Enabling NUCC (see the "NUCC on" curve) improves EVM by at least 10 dB compared with the case when NUCC is disabled (see the "NUCC off" curve). Hence, the "NUCC off" behavior is dominated by the impact of the non-uniform CKU and thus roughly reflects the EVM degradation it causes. According to the "NUCC off" curve, the non-uniform CKU degrades EVM more severely at narrower PLL bandwidths because the degradation is dominated by the $\phi_{E,DM}$ error, which is less suppressed by the loop. Therefore, especially at low PLL bandwidths, the bulk of the EVM improvement from NUCC is obtained by merely enabling $\phi_{\rm DMC}$ (see the curve of "only $\phi_{\text{DMC}}$ of NUCC on"). The EVM associated with the $\phi_{\text{DMC}}$-only option increases at wider PLL bandwidths because the non-uniform CKU contributes more PM error through $\phi_{E,PP}$ when the PLL bandwidth is wider. This necessitates activating the $\phi_{R2S}$ component of NUCC at wide PLL bandwidths. Finally, simultaneously utilizing both options in NUCC almost entirely removes the effects of the non-uniform CKU and lowers the EVM to the level limited by phase noise across a wide range of PLL bandwidths.



Figure 4.13: Simulated EVM versus PLL bandwidth under different NUCC settings. The simulation conditions (i.e., PM signal, reference frequency  $f_{\text{REF}}$ , carrier frequency  $f_0$ , feedforward division ratio K, etc.) are the same as those in Fig. 4.21(b).

#### 4.5.3 DCO with Calibration

Figure 4.14(a) depicts a schematic of the DCO core, consisting of the LC-tank and complementary cross-coupled transistor pairs. The resonant frequency is tuned by the switched-capacitor (SC) banks. While performing PM, the active banks can be functionally categorized into two types. The first tracks the carrier, i.e., the 32-b unary tracking bank (TB). The second is used for FM and configured in a segmented style, i.e., consisting of an 8-b unary coarse modulation bank (MCB) and a 16-b unary fine modulation bank (MFB). All the encoded OTWs are resampled by CKU before toggling the DCO SC units in order to avoid the data-dependent propagation delay, which may vary the effective phase accumulation time in each CKU cycle and finally degrade the PM accuracy.

All the banks adopt the SC-unit structure sketched in Fig. 4.14(a), whose unit capacitor  $C_U$  is inspired by the layout of a SAR ADC [96]. Here, the ground and output (VP/VN) nets can shield the internal switching node from the surroundings to minimize the systematic capacitance mismatch. This layout style also allows the SC units to abut each other, thereby shortening critical connection lines (i.e., VP and VN) to minimize the FM error related to the parasitic routing inductance.

Figure 4.14(b) illustrates the control logic surrounding the DCO core. Regarding the carrier phase tracking, the integer portion of $OTW_{TRC}$, i.e., $OTW_{TB}$, directly tunes the number of active TB units, and the fractional $OTW_{TRC}$ dithers one TB unit through a high-speed (HS) $\Delta\Sigma$ modulator clocked by CKVD4 at 1/4 CKV frequency to improve resolution [26].

Figure 4.14: (a) Schematic of the DCO core, and (b) control logic surrounding the DCO core, where the digital blocks are implicitly clocked by CKU, except for the CKV clock divider ($\div$4) and the high-speed (HS) $\Delta\Sigma$.

For phase modulation, $\Delta \phi_{\rm DM}$, i.e., the compensated $\Delta \phi_{\rm M}$, is first denormalized to OTW<sub>M</sub> by $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$, where $\widehat{K}_{\rm DCO,M}$ estimates the MFB's frequency resolution. To control MCB and MFB separately, the integer part of OTW<sub>M</sub> after rounding, i.e., OTW<sub>M,I</sub>, splits into OTW<sub>MCB</sub> and OTW<sub>MFB</sub> without extra re-scaling. This is because each MCB unit contains 16 MFB units, resulting in a nominal resolution ratio of 16. To employ TB's fine resolution (around 1/9 of the MFB), the rounding residue of $OTW_M$, i.e., $OTW_{M,F}$, modulates TB after it is scaled by the resolution ratio between MFB and TB, i.e., $\widehat{K}_{DCO,M}/\widehat{K}_{DCO,T}$, where $\widehat{K}_{DCO,T}$ estimates the frequency resolution of TB.

Among the three SC-banks, MCB has the coarsest resolution and affects the DCO FM linearity the most significantly. To address the frequency error associated with each  $OTW_{MCB}$  codeword (9 in total), a look-up table (LUT) adds an  $OTW_{MCB}$ -dependent compensation code,  $OTW_C$ , to the TB-tuning path. However, the control words from the scaled  $OTW_{M,F}$  and LUT contain fractional bits, incompatible with the integer  $OTW_{TB}$ . Therefore, their sum is noise-shaped by a first-order low-speed (LS)  $\Delta\Sigma$  modulator (at the CKU rate) before being added to  $OTW_{TB}$  to prevent the quantization error from accumulating on the DCO. To further suppress the quantization error, one can also add the fractional bits to the high-speed  $\Delta\Sigma$  modulator, as in [81].
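The modulation data path described above can be sketched behaviorally in Python. This is a minimal sketch, not the implemented logic: the function name `split_otw`, the symmetric rounding used to divide OTW<sub>M,I</sub> between MCB and MFB, and the example numbers are assumptions for illustration. Only the nominal 16:1 MCB-to-MFB ratio and the rescaling of the rounding residue by $\widehat{K}_{DCO,M}/\widehat{K}_{DCO,T}$ are taken from the text; the LUT, the $\Delta\Sigma$ modulators, and range clipping are omitted.

```python
from dataclasses import dataclass

MCB_TO_MFB_RATIO = 16  # nominal: each MCB unit contains 16 MFB units


@dataclass
class OtwSplit:
    otw_mcb: int        # coarse modulation bank code
    otw_mfb: int        # fine modulation bank code
    otw_tb_frac: float  # rounding residue, rescaled to TB units (fractional)


def split_otw(dphi_dm: float,
              fref_over_kdco_m: float,
              kdco_m_over_kdco_t: float) -> OtwSplit:
    """Split a compensated phase step into MCB, MFB and TB control words."""
    otw_m = dphi_dm * fref_over_kdco_m   # denormalize to MFB LSBs
    otw_int = round(otw_m)               # OTW_{M,I}
    otw_frac = otw_m - otw_int           # OTW_{M,F}, the rounding residue
    # Assumed split: nearest MCB code, remainder goes to MFB.
    otw_mcb = round(otw_int / MCB_TO_MFB_RATIO)
    otw_mfb = otw_int - MCB_TO_MFB_RATIO * otw_mcb
    return OtwSplit(otw_mcb, otw_mfb, otw_frac * kdco_m_over_kdco_t)


# Hypothetical gains: 28 MFB LSBs per unit phase step, TB 9x finer than MFB.
s = split_otw(dphi_dm=0.5, fref_over_kdco_m=28.0, kdco_m_over_kdco_t=9.0)
print(s)
```

By construction, `16 * otw_mcb + otw_mfb + otw_tb_frac / kdco_m_over_kdco_t` reproduces the denormalized OTW<sub>M</sub>, which is why no extra re-scaling is needed between the two modulation banks.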



Figure 4.15: Behavioral description of the LUT with off-line calibration in Fig. 4.14: (a) Calibrating the LUT content with the piecewise LMS algorithm in [28], and (b) updating the LUT with an LMS algorithm emulating  $K_{\text{DCO}}$  calibration.

Two categories of parameters need to be estimated in Fig. 4.14(b). The first category is related to  $K_{\rm DCO}$ , i.e.,  $f_{\rm REF}/\hat{K}_{\rm DCO,M}$  and  $\hat{K}_{\rm DCO,M}/\hat{K}_{\rm DCO,T}$ . They are calibrated by an LMS-based algorithm, which correlates the detected phase error  $\Delta\phi_{\rm E}$  [input of the digital loop filter, see Fig. 4.11] and the relevant phase tuning target (i.e.,  $\Delta\phi_{\rm DM}$  or  ${\rm OTW}_{\rm M,F}$ ), as in [86]. The second category is the LUT content, which is updated by correlating  $\Delta\phi_{\rm E}$  with  ${\rm OTW}_{\rm MCB}$ . The detailed algorithm depends on the dominant mechanism of non-idealities in MCB. For example, if the mismatch between the MCB units dominates, the piecewise LMS algorithm shown in [28] is preferred. Figure 4.15(a) sketches the calibration principle. The LUT function is represented by the mux which conditionally passes the OTW<sub>MCB</sub>-associated compensation codes, VAL[0...7], to OTW<sub>C</sub>. After the chosen VAL[n] is used, the corresponding  $\Delta \phi_{\rm E}$  difference is scaled by  $\mu_{\rm DCO}$  and added to that VAL[n] (enabled by EN[n]). VAL[n] finally converges at the value that exactly compensates for the error of the associated OTW<sub>MCB</sub> codeword. One may notice only 8 VAL units (VAL[0] to VAL[7]) are adopted to compensate the 9 OTW<sub>MCB</sub> codewords, i.e., integers ranging from -4 to 4 (considering MCB is 8-b unary). In fact, the frequency error associated with the codeword OTW<sub>MCB</sub> = 0 gets implicitly counted in the carrier frequency  $f_0$  and automatically corrected by the PLL since OTW<sub>MCB</sub> = 0 is used when PLL locks the DCO to  $f_0$ .
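The piecewise update of Fig. 4.15(a) can be illustrated with a short Python sketch. Several details here are assumptions rather than the algorithm of [28]: the "$\Delta\phi_{\rm E}$ difference" is taken as the difference between consecutive $\Delta\phi_{\rm E}$ samples, the update sign is positive, and the mapping from the eight nonzero OTW<sub>MCB</sub> codewords to VAL[0] through VAL[7] (helper `val_index`) is hypothetical. In hardware the sign depends on the loop polarity.

```python
def val_index(otw_mcb: int) -> int:
    """Map a nonzero MCB codeword in [-4, 4] to a VAL index in [0, 7]."""
    if otw_mcb == 0 or not -4 <= otw_mcb <= 4:
        raise ValueError("codeword must be a nonzero integer in [-4, 4]")
    return otw_mcb + 4 if otw_mcb < 0 else otw_mcb + 3


def piecewise_lms(val, otw_mcb_seq, dphi_e_seq, mu_dco):
    """Return updated compensation codes after one pass over recorded data.

    val         : 8 compensation codes, VAL[0..7]
    otw_mcb_seq : MCB codeword applied in each cycle
    dphi_e_seq  : detected phase error, one sample longer than otw_mcb_seq
    mu_dco      : LMS step size
    """
    val = list(val)
    for k in range(1, len(dphi_e_seq)):
        code = otw_mcb_seq[k - 1]
        if code == 0:
            continue  # codeword 0 is absorbed by the PLL lock, no VAL entry
        diff = dphi_e_seq[k] - dphi_e_seq[k - 1]  # assumed error measure
        val[val_index(code)] += mu_dco * diff     # only the used entry moves
    return val


# Toy data, not measured: three cycles using codewords 2, 0 and -4.
updated = piecewise_lms([0.0] * 8, [2, 0, -4], [0.0, 0.1, 0.1, 0.4], mu_dco=0.5)
print(updated)
```

Only the entry selected by the active codeword is updated in each cycle, which mirrors the role of the EN[n] enables in the figure.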

On the other hand, if the dominant DCO non-ideality mechanism arises from the gain mismatch between MCB and MFB, i.e., the resolution ratio between MCB and MFB deviates from the nominal 16, all the desired VAL's linearly correlate with OTW<sub>MCB</sub> through the same factor, say $K_{\rm C}$. Consequently, the piecewise calibration in Fig. 4.15(a) simplifies to a $K_{\rm DCO}$-calibration-like algorithm shown in Fig. 4.15(b), where all the OTW<sub>MCB</sub> codewords and their corresponding $\Delta \phi_{\rm E}$ difference data are correlated to estimate the same gain factor $K_{\rm C}$. Then, $K_{\rm C} \cdot {\rm OTW}_{\rm MCB}$ replaces the function of the LUT. One may wonder whether the $K_{\rm C}$ calibration interferes with that for $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$, considering both ultimately correlate $\Delta \phi_{\rm E}$ with $\Delta \phi_{\rm M}$ (OTW<sub>MCB</sub> is proportional to $\Delta \phi_{\rm M}$ if the phase-domain DPD is ignored). In practice, the mutual interference can be suppressed by activating these two calibrations at different moments: $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$ is calibrated only when OTW<sub>MCB</sub> = 0; during this time, $K_{\rm C}$ naturally does not update.

To maintain flexibility in modifying the algorithm, the LUT is updated in an off-line style [see Fig. 4.14(b)]:  $\Delta \phi_{\rm E}$  and  $OTW_{\rm MCB}$  sequences are collected and stored in an SRAM for debugging. The software reads the data, processes it, and updates the LUT. With the new content in the LUT,  $\Delta \phi_{\rm E}$ and  $OTW_{\rm MCB}$  samples are collected again to update the LUT, whose content settles after several iterations.

#### 4.5.4 Calibrated Parameters in Face of Channel Hopping

The implemented system utilizes, in total, four calibration loops related to phase modulation, i.e., those for $\widehat{NT}_{cnst}$, $f_{REF}/\widehat{K}_{DCO,M}$, $\widehat{K}_{DCO,M}/\widehat{K}_{DCO,T}$, and the LUT tackling the OTW<sub>MCB</sub>-associated error. Blindly re-calibrating all these parameters after channel hopping may take a long time before the EVM returns to its optimum. To accelerate this re-calibration process, we first examine the frequency dependence of these parameters and then roughly compensate them according to the change in FCW.



Figure 4.16: Breaking down the  $\widehat{NT}_{cnst}$  components.

Considering (4.24),  $\widehat{NT}_{\text{cnst}}$  is designed to be a constant 4 because  $\Delta t_{\text{cnst}}$ ideally represents an offset between CKU and the first CKV edge after FREF, and roughly equals  $4\overline{T}_{\text{CKV}}$ . However, the DCO modulation frequency  $\Delta f_{\text{M}}$ does not change immediately after the rising edge of CKU. An additional delay, i.e.,  $\Delta t_{\text{prop}}$  in Fig. 4.16, is always present mainly due to the propagation latency of control signals (e.g., OTW's). This delay is substantially constant in the time domain, but turns frequency-dependent after being normalized by  $\overline{T}_{\text{CKV}}$ . Since the estimated  $\widehat{NT}_{\text{cnst}}$  also counts  $\Delta t_{\text{prop}}$ , the  $\Delta t_{\text{prop}}$ -related part of  $\widehat{NT}_{\text{cnst}}$  should be re-normalized according to the FCW (inversely proportional to  $\overline{T}_{\text{CKV}}$ ) after each channel hopping, i.e.,

$$\widehat{NT}_{\text{cnst}}\big|_{\text{new}} = 4 + \left(\widehat{NT}_{\text{cnst}}\big|_{\text{old}} - 4\right) \cdot \frac{\text{FCW}|_{\text{new}}}{\text{FCW}|_{\text{old}}},\tag{4.25}$$

where the subscripts "old" and "new" distinguish the corresponding parameters in the previous and newly hopped channels. After the channel hopping, if $\Delta t_{\rm prop}$ does not significantly change (for example, due to environmental variations, such as supply voltage or temperature), (4.25) can directly set $\widehat{NT}_{\rm cnst}$ to a value accurate enough to achieve optimum EVM in the new frequency channel. Consequently, re-calibration will be unnecessary.
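As a minimal illustration of (4.25), the re-normalization can be written as a one-line Python function. The function name and the starting value of 4.5 are hypothetical; the FCW values correspond to the 2868 MHz and 3948 MHz channels with $f_{\text{REF}}$ = 40 MHz.

```python
def rescale_nt_cnst(nt_old: float, fcw_old: float, fcw_new: float,
                    nominal: float = 4.0) -> float:
    """Apply (4.25): only the propagation-delay part above the nominal
    offset of 4 scales with FCW (i.e., inversely with the CKV period)."""
    return nominal + (nt_old - nominal) * fcw_new / fcw_old


# Hypothetical calibrated value of 4.5 at FCW = 71.7, hopping to FCW = 98.7.
nt_new = rescale_nt_cnst(nt_old=4.5, fcw_old=71.7, fcw_new=98.7)
print(f"NT_cnst after hop: {nt_new:.4f}")
```

The constant part stays at 4 regardless of the channel, so a value that has converged exactly to the nominal offset is left unchanged by the hop.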

Per mathematical derivation in [87],  $K_{\text{DCO}}$  exhibits a cubic relationship with the resonant frequency. Hence, after hopping to a new channel,  $f_{\text{REF}}/\widehat{K}_{\text{DCO,M}}$  should be re-calculated by

$$\frac{f_{\text{REF}}}{\widehat{K}_{\text{DCO,M}}}\Big|_{\text{new}} = \frac{f_{\text{REF}}}{\widehat{K}_{\text{DCO,M}}}\Big|_{\text{old}} \cdot \left(\frac{\text{FCW}|_{\text{old}}}{\text{FCW}|_{\text{new}}}\right)^3.\tag{4.26}$$

This equation is derived under the assumption of an ideal inductor. Considering a real inductor behaves a bit differently due to its parasitic capacitance, the estimated value might not be accurate enough for low EVM. Hence, some further calibration might still be needed. In contrast, $\widehat{K}_{\text{DCO},\text{T}}/\widehat{K}_{\text{DCO},\text{M}}$ is determined by the capacitance ratio of the SC units in MFB and TB, thus independent of frequency and in no need of any further adjustment.
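The cubic scaling of (4.26) can be sketched in the same way. The function name and the starting value of 28.0 are assumptions for illustration; as noted above, the result is only an initial estimate that a real inductor may require refining.

```python
def rescale_fref_over_kdco(value_old: float, fcw_old: float,
                           fcw_new: float) -> float:
    """Apply (4.26): K_DCO scales with the cube of the resonant frequency,
    so f_REF / K_DCO scales with the inverse cube of the FCW ratio."""
    return value_old * (fcw_old / fcw_new) ** 3


# Hypothetical value of 28.0 at 2868 MHz (FCW = 71.7), hopping to 3948 MHz.
up = rescale_fref_over_kdco(28.0, fcw_old=71.7, fcw_new=98.7)
back = rescale_fref_over_kdco(up, fcw_old=98.7, fcw_new=71.7)
print(f"after hop up: {up:.3f}, after hop back: {back:.3f}")
```

Hopping up in frequency reduces the gain, since each DCO unit then produces a larger frequency step.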

Regarding the LUT for MCB, it is utilized in combination with the phase-domain DPD which tackles the  $1/\sqrt{LC}$ -induced parabolic nonlinearity. Hence, the LUT mainly compensates for the nonidealities raised by device mismatches, e.g., the capacitance mismatch between MCB units or the gain mismatch between MCB and MFB. Considering these mismatch ratios are roughly constant after the fabrication, the LUT content does not need a frequency-dependent adjustment unless extremely low EVM is targeted.

In summary, after channel hopping, the values of $\widehat{NT}_{\text{cnst}}$ and $f_{\text{REF}}/\widehat{K}_{\text{DCO,M}}$ need to be modified using (4.25) and (4.26) to compensate for their frequency dependence. Of these, only $f_{\text{REF}}/\widehat{K}_{\text{DCO,M}}$ is likely to need further re-calibration. These observations help to shorten the calibration time.

# 4.6 Measurement Results



Figure 4.17: (a) Chip micrograph and (b) power consumption breakdown.

The proposed phase modulator is fabricated in TSMC 40-nm CMOS and occupies an active area of $0.31 \text{ mm}^2$ [excluding the output drivers and SRAMs, see Fig. 4.17(a)]<sup>1</sup>. With a reference clock of 40 MHz, it generates a phase-modulated clock whose carrier frequency $f_0$ ranges from 2.7 to 3.9 GHz. Fig. 4.17(b) shows the power consumption breakdown. The overall power drain is 4.6 mW, which is dominated by the DCO and its buffer, costing 2.35 mW at a 1.1 V supply. All other blocks are supplied with 1.0 V. The power consumption of the TAU-based phase detector sub-system (including TAU, TDC, counter, etc.) and the digital logic is 0.95 mW and 1.2 mW, respectively. The digital power is measured with the feedforward frequency division K = 8 after engaging all the proposed options (i.e., phase-domain DPD, LUT, NUCC), and the calibrations for $\widehat{NT}_{cnst}$ and $K_{DCO}$'s. Considering the circuit simplicity and low clock rate of the off-line calibration for the LUT, if the calibration shown in Fig. 4.15(a) were to be implemented on-chip, it would add a negligible power penalty to the overall 4.6-mW figure.

<sup>1</sup>This is the same chip as in Chapter 2 but with different functions turned on.



#### 4.6.1 Measurement of the DCO's FM-INL

Figure 4.18: DCO FM-INL: (a) Measurement setup and results (b) with different DCO linearization settings when  $f_0=3188$  MHz, and (c) with the proposed phase-domain DPD and the same  $K_{\rm E} = 0.023$  at multiple  $f_0$ 's.

To measure the integral nonlinearity (INL) of the DCO's FM characteristic ("FM-INL"), we adopt the flow in Fig. 4.18(a). All possible combinations of the FM-related OTW's are input to a free-running DCO to measure the frequency differences relative to the corresponding  $f_0$ , like in [97]. Such measured frequency difference reflects  $\Delta f_{\rm M}$  in a realistic FM operation. Meanwhile, the three OTW's are combined into OTW<sub>M</sub>, then 'restored' to  $\Delta \phi_{\rm M}$  through a reversed data flow relative to Fig. 4.14(b). Afterwards,  $\Delta \phi_{\rm M}$ is converted to the expected  $\Delta f_{\rm M}$  according to (4.1). The difference between the measured and expected  $\Delta f_{\rm M}$ 's reflects the FM-INL of the DCO.

Figure 4.18(b) shows the measured FM-INL at $f_0=3188$ MHz. The 'linear' (blue) case restores $\Delta \phi_{\rm M}$ by assuming that the $\Delta \phi_{\rm M}$-to-OTW function [in Fig. 4.18(a)] contains only the first-order term, thereby reflecting the FM-INL of the DCO under the conventional *linear* assumption, as in Fig. 4.3(a). In reality, the INL curve is parabolic, and the maximum frequency deviation can exceed 7 MHz. After including the second-order term in the $\Delta \phi_{\rm M}$-to-OTW function, which emulates the case of applying the proposed phase-domain DPD, the INL curve (green) becomes a linear staircase. This residual error after the DPD is behaviorally attributed to the resolution ratio between MCB and MFB deviating from the nominal value of 16, since the curve contains 9 stairs, coinciding with the number of MCB codewords. Physically, two factors may explain this deviation. The first is the parasitic routing inductance, due to which identical SC units in the physically (in layout) separated MFB and MCB can tune the DCO's resonant frequency by different amounts. The second is the frequency dependence of the DCO coil's equivalent inductance, defined as $L_{eq} = \mathcal{I}m[Z_{\rm L}(\omega)]/\omega$, where $\mathcal{I}m[Z_{\rm L}(\omega)]$ is the imaginary part of the composite DCO coil's impedance at angular frequency $\omega$. Around the center frequency of 3188 MHz, the *equivalent* inductance changes by up to 0.4% across the expected $\Delta f_{\rm M}$ range [according to the simulated inductance shown in Fig. 4.19(b)] and theoretically results in a peak-to-peak frequency error around 0.2% of $f_0$, almost coinciding with the 5-MHz peak-to-peak frequency error on the DPD-compensated curve (green) in Fig. 4.18(b). To compensate for this error, we introduce a small correction factor $K_{\rm E}$ when combining the OTW's [see Fig. 4.18(a)]. With $K_{\rm E}=0.023$, the maximum INL reduces to 0.5 MHz, below 0.26% of the full FM range (i.e., 197 MHz). $K_{\rm E}$ is merely used to describe the nonlinear behavior here, and the associated effect will be addressed by the LUT for $OTW_{MCB}$ when characterizing the PM accuracy.

Figure 4.19: Simulations of *equivalent* inductance of DCO's coil ($L_{eq}$): (a) $L_{eq}$ versus frequency, and (b) $L_{eq}$ versus offset frequency relative to the center frequency ($f_0$) across the corresponding FM range [shown in Fig. 4.18(c)].
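The link between the 0.4% inductance change and the roughly 0.2% frequency error follows from a first-order expansion of the LC resonance. The short derivation below is a sketch that assumes the tank capacitance $C$ is held fixed while only $L_{eq}$ varies:

$$f = \frac{1}{2\pi\sqrt{L_{eq}C}} \;\Rightarrow\; \frac{\Delta f}{f} \approx -\frac{1}{2}\,\frac{\Delta L_{eq}}{L_{eq}}, \qquad \left|\frac{\Delta f}{f_0}\right| \approx \frac{1}{2}\times 0.4\% = 0.2\%.$$

At $f_0 = 3188$ MHz, 0.2% corresponds to about 6.4 MHz, which is of the same order as the 5-MHz peak-to-peak error observed on the DPD-compensated curve.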

Figure 4.18(c) shows the FM-INL curves at multiple $f_0$'s under the same DCO linearization settings, i.e., using the second-order phase-domain DPD and $K_{\rm E} = 0.023$. From 2708 to 3786 MHz, the frequency error is always below 0.45% of the full range, validating the efficacy of the phase-domain DPD in a wide range of carrier frequencies. The declining trend of the 3948-MHz curve can be attributed to the frequency-dependent $L_{\rm eq}$, which increases faster than at around 3188 MHz [see Fig. 4.19(b)] and thus cannot be perfectly compensated with the same $K_{\rm E}$ value.

#### 4.6.2 PM Signal Generation and Measurement Setup



Figure 4.20: (a) M-PSK signal generation by interpolating the symbol phases  $(\theta_{\text{sys}})$  with a frequency pulse-shaping filter [g(t)], and (b) the setup for measuring the phase modulator's EVM.

Although a GMSK signal is commonly used to evaluate the accuracy of phase modulators, it may fail to reflect the performance across the full PM range because it employs only two possible phase shifts between symbols (i.e.,  $\pm 0.5\pi$ ), exercising limited OTW codewords. Therefore, using M-PSK signals is deemed more reasonable. To avoid amplitude modulation in conventional M-PSK signals [98], we generate the test signal by interpolating the symbols using a frequency pulse-shaping filter from the continuous phase modulation (CPM) [99].

Figure 4.20(a) illustrates how the symbols are interpolated in this work. The frequency pulse-shaping filter g(t) lasts four sampling clock (FREF) cycles, equal to one symbol period $T_{\rm sys}$. The integral of g(t) defines the transition between symbol phases, i.e., $\theta_{\rm sys}$. During the first three $T_{\rm REF}$'s, g(t) follows the shape of a raised-cosine filter to smooth the phase trajectory $\theta_{\rm M}(t)$. In the last $T_{\rm REF}$, g(t) = 0, thus freezing $\theta_{\rm M}(t)$ at the associated $\theta_{\rm sys}$. Consequently, the symbols can be simply restored by sampling the transmitted signal during this period.
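The interpolation can be sketched in Python as follows. The exact pulse used in this work is not given in closed form here, so the sketch assumes a raised-cosine frequency pulse $g(t) \propto 1 - \cos(2\pi t / 3T_{\rm REF})$ over the first three reference cycles; the function names and the convention of expressing phases in cycles (fractions of $2\pi$) are likewise assumptions.

```python
import math


def raised_cosine_weights(n_active: int = 3, n_hold: int = 1) -> list:
    """Fraction of one symbol-to-symbol phase step delivered per FREF cycle.

    Assumes g(t) proportional to 1 - cos(2*pi*t/n_active) for 0 <= t <= n_active
    (t in FREF cycles), followed by n_hold cycles with g(t) = 0.
    """
    def integral(t: float) -> float:
        # Integral of the normalized pulse from 0 to t; equals 1 at t = n_active.
        return t / n_active - math.sin(2 * math.pi * t / n_active) / (2 * math.pi)

    active = [integral(k + 1) - integral(k) for k in range(n_active)]
    return active + [0.0] * n_hold


def phase_increments(symbol_phases_cycles) -> list:
    """Per-FREF-cycle phase increments for a sequence of symbol phases."""
    weights = raised_cosine_weights()
    increments, previous = [], 0.0
    for theta in symbol_phases_cycles:
        # Take the shortest path between symbols: wrap the step to [-0.5, 0.5).
        step = (theta - previous + 0.5) % 1.0 - 0.5
        increments.extend(step * w for w in weights)
        previous = theta
    return increments


w = raised_cosine_weights()
print("weights per FREF cycle:", [round(x, 4) for x in w])
print("largest increment for a half-cycle step:", round(0.5 * max(w), 4))
```

Under this assumed pulse, the middle cycle carries about 61% of the phase step, so a half-cycle step between symbols yields a peak increment of about 0.30 cycles, consistent with the $\Delta\phi_{\rm M}$ range of -0.3 to 0.3 quoted in this section for the 64-PSK signal.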

The measurement setup is shown in Fig. 4.20(b). The desired phase, i.e., the discrete-time $\theta_{\rm M}$, is processed to $\Delta \phi_{\rm M}$, loaded into an on-chip SRAM, and then input to the proposed phase modulator. The modulated output centers at $f_0$ and is further frequency-divided off-chip by K (programmable from 1 to 8). The division extends the carrier to a lower RF channel frequency, emulating a realistic multi-band polar TX, and helps to evaluate the effects of DCO nonlinearity at large BW<sub>FM</sub>/$f_0$. The divided clock is sampled by a high-speed oscilloscope, then processed in Matlab to evaluate the EVM.

#### 4.6.3 Modulation Performance at 64-PSK

A 64-PSK signal with a data rate of 60 Mb/s is finally adopted to evaluate the phase modulation accuracy. Figure 4.21 shows the measured constellation diagram at  $f_0 = 3188$  MHz. According to Fig. 4.21(a), when the feedforward divider ratio K increases from 1 to 8 with all compensation options turned off (i.e., phase-domain DPD, LUT for OTW<sub>MCB</sub>, and NUCC), EVM degrades from  $-35.1 \,\mathrm{dB}$  to  $-24.4 \,\mathrm{dB}$ . This is because the large K requires wider BW<sub>FM</sub> (expanding from 24 to 192 MHz)<sup>1</sup>, which boosts BW<sub>FM</sub>/ $f_0$  (increasing from 0.75% to 6.02%), and finally intensifies the  $1/\sqrt{LC}$ -induced DCO nonlinearity.

Figure 4.21(b) begins with the worst-case K = 8 in Fig. 4.21(a). After enabling the phase-domain DPD, EVM is improved to  $-38.3 \,\mathrm{dB}$ . However, as indicated by the DCO FM-INL curve in Fig. 4.18(b), the DPD performance is masked by the error in the resolution ratio between MCB and MFB, i.e.,  $K_{\rm E}$  in Fig. 4.18(a). To combat this  $K_{\rm E}$  error, the LUT for OTW<sub>MCB</sub> [see Fig. 4.14(b)] is updated by the  $K_{\text{DCO}}$ -calibration-like algorithm shown in Fig. 4.15(b), where the compensation gain  $K_{\rm C}$  is equivalent to  $16K_{\rm E} \cdot \hat{K}_{\rm DCO,M}/\hat{K}_{\rm DCO,T}$ . Then, EVM is improved to  $-44.7 \, \text{dB}$ . This suggests that the LC-DCO can be sufficiently linearized by the proposed phase-domain DPD with a proper  $K_{\rm DCO}$  estimation. On top of that, enabling the NUCC further improves EVM by  $2.9 \,\mathrm{dB}$  to  $-47.6 \,\mathrm{dB}$ . The final EVM is limited by the unexpected DCO nonlinearity [see the FM-INL in Fig. 4.18(c)]. The difference in EVM before and after applying NUCC suggests that NUCC removes a PM error around  $-47.9 \,\mathrm{dB}$ , agreeing with the simulation result (see the "NUCC off" curve in Fig. 4.13) at a large PLL bandwidth (around 3 MHz according to the phase noise profile in Fig. 4.22). In addition, the output spectrum of this case is shown in Fig. 4.23.
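Uncorrelated EVM contributions add and subtract as powers, not in dB. The Python sketch below reproduces that arithmetic; the function names are illustrative, and the inputs are the rounded figures quoted in this section, so the results agree with the reported values only to within rounding. The same helpers apply to the K = 1 breakdown given later in this section (a quantization contribution computed from a 156-kHz resolution and a 25-ns update interval, plus a phase-noise contribution of -44 dB).

```python
import math


def db_to_power(db: float) -> float:
    return 10.0 ** (db / 10.0)


def power_to_db(power: float) -> float:
    return 10.0 * math.log10(power)


def evm_sum_db(*contributions_db: float) -> float:
    """Combine uncorrelated EVM contributions given in dB."""
    return power_to_db(sum(db_to_power(c) for c in contributions_db))


def evm_removed_db(before_db: float, after_db: float) -> float:
    """Error power removed when EVM improves from before_db to after_db."""
    return power_to_db(db_to_power(before_db) - db_to_power(after_db))


def quantization_evm_db(f_res_hz: float, t_ref_s: float) -> float:
    """EVM of a uniform phase quantization step theta = 2*pi*f_res*T_REF."""
    theta_res = 2.0 * math.pi * f_res_hz * t_ref_s
    return power_to_db(theta_res ** 2 / 12.0)


print(f"PM error removed by NUCC: {evm_removed_db(-44.7, -47.6):.1f} dB")
print(f"TB quantization:          {quantization_evm_db(156e3, 25e-9):.1f} dB")
print(f"quantization + IPN:       {evm_sum_db(-43.0, -44.0):.1f} dB")
```

The subtraction is sensitive to rounding when the two EVM values are close, which is why the removed error is only quoted as an approximate figure.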

<sup>1</sup>Because the frequency pulse-shaping filter smooths out the phase transitions between any two subsequent symbols, $\Delta \phi_{\rm M}$ of the 64-PSK signal ranges from -0.3 to 0.3. This results in BW<sub>FM</sub> = $0.6 f_{\rm REF} = 24$ MHz when K = 1. For arbitrary K, BW<sub>FM</sub> = $K \times 24$ MHz.

Figure 4.21: Constellation diagram of a 60 Mb/s 64-PSK signal measured at $f_0=3188$ MHz: (a) Feedforward frequency division K increases from 1 to 8, with all compensation options off (i.e., phase-domain DPD, LUT for OTW<sub>MCB</sub>, and NUCC); (b) K = 8 and all the compensation options are incrementally turned on.

Figure 4.22: Measured phase noise at 3188 MHz under the same loop bandwidth setting as the EVM measurements in Fig. 4.21.

Figure 4.23: Measured spectrum of the RF output clock modulated with a 60 Mb/s (10 MSymbol/s) 64-PSK signal at the RF channel frequency of $3188 \div 8$ MHz.

Figure 4.24(a) shows the measured EVM versus the fractional FCW (FCW<sub>frac</sub>) at different feedforward frequency division ratios (K) when the integer FCW and all compensation options remain the same as in the final state of Fig. 4.21(b). For a constant K, EVM varies within 1 dB across FCW<sub>frac</sub><sup>2</sup>. With K increasing from 1 to 8, EVM shows a 10.6 dB improvement, similar to the trend of quantization noise that decreases with $-20 \log_{10} K$. However, the EVM is actually dominated by the DCO nonlinearity according to the EVM breakdown for the rightmost case on the K = 1 curve: The contribution due to the DCO's finite resolution is $-43 \,\mathrm{dB}$. This is because the TB's frequency resolution $\Delta f_{\rm res} = 156 \,\mathrm{kHz}$ and update interval $T_{\rm REF} = 25 \,\mathrm{ns}$ result in a phase resolution of $\theta_{\rm res} = 2\pi \cdot \Delta f_{\rm res} \cdot T_{\rm REF}$, which adds to the modulated phase as a quantization noise with the power of $\theta_{\rm res}^2/12$, given that the noise transfer function of the low-speed first-order $\Delta\Sigma$ in Fig. 4.14(b), i.e., $N(z) = 1 - z^{-1}$ [100], cancels out the accumulation characteristic of the DCO, i.e., $1/(1-z^{-1})$ in the transfer function (see Fig. 4.9). Additionally, the integrated phase noise (IPN) of the unmodulated carrier contributes $-44 \,\mathrm{dB}$ to the EVM, which is 3 dB higher than the double-sided IPN of $-47 \,\mathrm{dBc}$ shown in Fig. 4.22, since the modulated signal spreads over both positive and negative offset frequencies. The combined EVM contribution from these two sources is $-40.5 \,\mathrm{dB}$, which is 3.5 dB lower than the measured EVM of $-37 \,\mathrm{dB}$. The DCO nonlinearity appears to be the only candidate to explain this gap.

<sup>2</sup>In the realized phase modulator, the FREF signal couples to and periodically disturbs the DCO. The disturbance strength depends on the instantaneous phase difference between the FREF and DCO clocks, thus fluctuating at the frequency of FCW<sub>frac</sub> · $f_{\text{REF}}$. At lower FCW<sub>frac</sub>, the disturbance experiences less filtering by the DCO (described by the DCO's phase-domain transfer function, i.e., 1/s). The unfiltered disturbance not only directly degrades EVM by increasing the PM error, but also results in a larger detected phase error $\Delta\phi_{\text{E}}$. A large $\Delta\phi_{\text{E}}$ can saturate the TDC (detecting time errors ranging from -3.5 ps to 3.5 ps), and slow down the PLL's transient response. Therefore, PM errors stay uncorrected for a longer time, thereby further degrading the EVM. This is a possible explanation as to why the EVM increases at very small FCW<sub>frac</sub>.

Figure 4.24: (a) Measured EVM versus fractional FCW (FCW<sub>frac</sub>) for different feedforward frequency-division ratios (K) when the integer FCW is fixed at 79 (i.e., $f_0$ around 3160 MHz); (b) $\Delta f_M$ distribution correlated with the DCO's FM-INL.

To further explore why the DCO nonlinearity affects EVM in a similar trend as does the quantization noise, Fig. 4.24(b) provides the $\Delta f_{\rm M}$ distribution together with the DCO's FM-INL curve, on which the 9 discrete segments correlate with the 9 MCB codewords, and the V-shape of each segment arises from the mismatch between the MFB units. When K = 1, the exercised $\Delta f_{\rm M}$ range almost overlaps with the central V-shape segment, so only the FM-INL related to MFB degrades the EVM. However, when K increases to 8, the INL grows $2.5\times$, i.e., from 0.2 to 0.5 MHz. Considering that the operational $\Delta f_{\rm M}$ range is also multiplied by 8, the INL relative to the exercised range shrinks by a factor of 0.31, agreeing with the 10 dB improvement in EVM. Therefore, the high EVM at small K is mainly attributed to the MFB exhibiting unexpectedly strong nonlinearity, which is even higher than that due to MCB considering the frequency-tuning range. To further improve the EVM, additional measures are needed to combat the MFB-related INL, e.g., an additional LUT for $OTW_{MFB}$ or the dynamic element matching (DEM) in [27].

Figure 4.25 shows the measured EVM versus the DCO carrier frequency $f_0$ at different feedforward frequency division ratios K. EVM generally decreases at low $f_0$ and large K because these cases exercise a wider portion of the DCO's frequency-tuning range, diluting the effect of the MFB's nonlinearity. To demonstrate that the combinational DPD for the DCO nonlinearity, i.e., combining both phase and OTW domains, can achieve frequency-insensitive performance, the EVM is measured in two scenarios. In the first scenario (solid lines in Fig. 4.25), the compensation settings (i.e., phase-domain DPD, the OTW<sub>MCB</sub> LUT, and NUCC) are kept the same as in the final state in Fig. 4.21(b) irrespective of $f_0$. In the second scenario (the dashed lines), the OTW<sub>MCB</sub> LUT is updated at each frequency point with the piecewise calibration shown in Fig. 4.15(a) to represent the optimum EVM of this design. At most points, the solid lines coincide with the dashed ones. In the case of K = 4 and K = 8, EVM on the solid lines remains below -43 dB across the full tuning range of $f_0$. This validates the frequency-insensitive performance of a combinational DPD solution.

Figure 4.25: Measured EVM versus the DCO carrier frequency ($f_0$) at different feedforward frequency division ratios (K). The corresponding BW<sub>FM</sub> scales with K, i.e., BW<sub>FM</sub> = $K \times 24$ MHz.

One may notice a greater deviation between the solid and dashed lines at relatively high frequencies ( $f_0 > 3.4 \text{ GHz}$ ) and K = 8. This is because the DCO exhibits a larger FM-INL [after compensating by a *fixed* gain factor,  $K_{\rm C}$ , shown in Fig. 4.15(b)] at higher resonant frequencies and across wider exercised  $\Delta f_{\rm M}$  ranges (i.e., BW<sub>FM</sub> which scales with K) according to Fig. 4.18(c).



Figure 4.26: Measured transient trajectories of the calibration coefficients and EVM. Modulation and calibration are turned on at t = 0 after PLL gets locked to the target frequency  $f_0$ . Results are measured at K = 4, when  $f_0$  hops (a) from 2868 MHz to 3948 MHz and (b) vice versa.

Due to its relatively frequency-insensitive performance, the combinational DPD can reduce the efforts required in the DCO nonlinearity calibration and shorten the time to reach optimum EVM after each frequency hop. To prove this, we hopped the PLL's center frequency  $f_0$  between 2868 and 3948 MHz, then measured (recorded by the debugging SRAM in Fig. 4.14) the settling curves of  $f_{\text{REF}}/\hat{K}_{\text{DCO,M}}$  (the only parameter that will likely require a re-calibration according to Section 4.5.4), as shown in Fig. 4.26.

At each new frequency, $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$ starts with an initial value that is calculated from the final value at the previous frequency using (4.26), and then settles within $15 \,\mu s$. Regarding the remaining PM-related parameters, the $\widehat{NT}_{\text{cnst}}$ values were calculated as per (4.25); $\widehat{K}_{\text{DCO,M}}/\widehat{K}_{\text{DCO,T}}$ and the LUT content are frequency-independent, thus staying unchanged. These parameters are not shown in Fig. 4.26 because they are temporarily frozen during the $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$ settling to avoid any mutual influence with the unsettled $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$, thereby accelerating the calibration. The measured transient $f_{\rm REF}/\widehat{K}_{\rm DCO,M}$ was also written back to the phase modulator to measure the corresponding EVM in the K = 4 case (where the calibration process also used the same PM sequence in accordance with K = 4). As shown in Fig. 4.26, EVM settles to the optimum value within $15 \,\mu s$. This time is much shorter than the 100 ms needed by the phase modulator in [84] to calibrate the DCO's nonlinearity with the piecewise LMS algorithm. One might argue that this comparison is unfair since the aforementioned 100 ms is the calibration time during an initialization, which can be shorter if optimized for the channel hopping. However, a shorter calibration time after channel hopping cannot be assumed for the piecewise LMS since its calibration results are not only related to the DCO nonlinearity but also to the estimated $K_{\rm DCO}$'s [28]. After the DCO hops to the frequency associated with a faraway channel, the $K_{\rm DCO}$'s will change significantly. Consequently, the piecewise LMS will need to correct rather large errors, and so the corresponding calibration time will not considerably differ from that in the original initialization.

#### 4.6.4 Performance Comparison

Table 4.1 compares this work with state-of-the-art PLL-based phase modulators. While running the DCO at 3188 MHz, this design produces a transmitted RF carrier at 398.5 MHz after the division by K = 8. When generating the 64-PSK signal, the DCO exercises an FM bandwidth (BW<sub>FM</sub>) of 192 MHz, corresponding to 6.02% fractional BW<sub>FM</sub> (BW<sub>FM</sub>/$f_0$); hence it results in a large FM error due to the $1/\sqrt{LC}$ law. Despite this, the proposed phase modulator achieves the lowest EVM and energy per bit, i.e., -47.6 dB and $0.08 \,\mathrm{nJ/bit}$, respectively.

|                                                   | This Work | JSSC'12 [84] | JSSC'16 [28] | SSCL'20 [79] | RFIC'20 [101] | TMTT'18 [25] | JSSC'19 [82] |
|---------------------------------------------------|-----------|--------------|--------------|--------------|---------------|--------------|--------------|
| Modulation Type                                   | 64-PSK    | QPSK         | GMSK         | 32-PSK       | GFSK          | 64QAM        | 1024QAM      |
| DCO Carrier Freq. (f<sub>0</sub>) Range (GHz)     | 2.7-3.9   | 2.9-4.0      | 10.1-12.4    | 13.0-14.5    | 1.6-1.94      | 2.8-7.6      | 9.9-12.1     |
| Measured f<sub>0</sub> (GHz) / Freq. Division K   | 3.188 / 8 | 3.6 / 1      | 10.24 / 1    | 13.75 / 1    | 1.81 / 2      | 5.14 / 2     | 11 / 2       |
| Integrated RMS Jitter (fs)                        | 317       | 503          | 180          | 95.2         | N/A           | 1091         | 168          |
| Data Rate (Mbit/s)                                | 60        | 20           | 10           | 250          | 1             | 201.6        | 25           |
| Ref. Freq. (MHz)                                  | 40        | 40           | 40           | 200          | 60            | 26           | 40           |
| BW<sub>FM</sub> (MHz)                             | 192       | 40           | 2.5          | 200          | 0.5           | 416          | ≤80          |
| BW<sub>FM</sub> / f<sub>0</sub> (%)*              | 6.02      | 1.11         | 0.024        | 1.45         | 0.028         | 8.09         | ≤0.73        |
| EVM @ f<sub>0</sub> / K (dB)                      | -47.6     | -36          | -37.4        | -42.2        | -30.9         | -28.7        | -41.3        |
| IPN** @ f<sub>0</sub> / K (dBc)                   | -65       | -42          | -41.7        | -44.7        | N/A           | -38.1        | -47.6        |
| EVM<sub>rescale</sub>*** (dB)                     | -47.6     | -55.1        | -65.6        | -73          | -38           | -44.9        | -64.1        |
| EVM excl. IPN @ f<sub>0</sub> / K (dB)            | -47.8     | -39          | -43.3        | N/A          | N/A           | -29.8        | -44          |
| Power (mW)                                        | 4.6       | 5            | 8.1          | 31.5         | 5.3           | 40.7****     | 17.7****     |
| Energy/Bit (nJ/bit)                               | 0.08      | 0.25         | 0.81         | 0.13         | 5.3           | 0.2****      | 0.71****     |
| Active Area (mm<sup>2</sup>)                      | 0.31      | 0.5          | 0.25         | 0.7          | 0.3831        | 2.12****     | 1.31         |
| CMOS Process (nm)                                 | 40        | 65           | 28           | 28           | 65            | 28           | 28           |

Table 4.1: Comparison with state-of-the-art PLL-based phase modulators.

\* Unchanged if the DCO directly operates at f<sub>0</sub>/K

\*\* Only integrated over positive or negative frequencies

\*\*\* Rescaled to 398.5 MHz

\*\*\*\* Including only the phase modulator part

It should be noted that the issue of comparing EVMs across designs is still an open question in the literature. Ref. [81] has chosen to normalize the EVMs to the same output frequency. This is equivalent to measuring the EVM after virtually dividing<sup>1</sup> the PM clock by  $K_{\text{rescale}} = f_{\text{reported}}/f_{\text{chosen}}$ , where  $f_{\text{reported}}$  is the original output frequency reported in a given reference paper, and  $f_{\text{chosen}}$  is our chosen target output frequency for re-scaling (here equal to 398.5 MHz). Under an expedient assumption that the PM error is dominated by random jitter (i.e. thermal phase noise), the rescaled EVM in dB, i.e., EVM<sub>rescale</sub>, equals the original EVM minus 20 log<sub>10</sub>( $K_{\text{rescale}}$ ) because the divided carrier period becomes  $K_{\text{rescale}}$  times larger, but the random jitter remains the same. Table 4.1 also lists the calculated EVM<sub>rescale</sub> values of each work.
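The jitter-dominated rescaling rule can be checked against Table 4.1 with a few lines of Python. This is only a sketch: the function name `evm_rescale_db` and the tuple layout are illustrative choices, while the numbers are copied from the table rows "EVM @ f0/K" and "Measured f0 / Freq. Division K".

```python
import math

F_CHOSEN_MHZ = 398.5  # target output frequency used for rescaling in Table 4.1


def evm_rescale_db(evm_db, f0_ghz, k_div, f_chosen_mhz=F_CHOSEN_MHZ):
    """Rescale an EVM (dB) reported at f0/K to f_chosen.

    Assumes the PM error is dominated by random jitter, so the EVM
    changes by -20*log10(K_rescale) under a virtual frequency division.
    """
    f_reported_mhz = f0_ghz * 1e3 / k_div
    k_rescale = f_reported_mhz / f_chosen_mhz
    return evm_db - 20.0 * math.log10(k_rescale)


# (label, EVM in dB at f0/K, measured f0 in GHz, frequency division K)
ROWS = [
    ("This work", -47.6, 3.188, 8),
    ("[84]", -36.0, 3.6, 1),
    ("[28]", -37.4, 10.24, 1),
    ("[79]", -42.2, 13.75, 1),
    ("[101]", -30.9, 1.81, 2),
    ("[25]", -28.7, 5.14, 2),
    ("[82]", -41.3, 11.0, 2),
]

if __name__ == "__main__":
    for label, evm, f0, k in ROWS:
        print(f"{label:10s} EVM_rescale = {evm_rescale_db(evm, f0, k):6.1f} dB")
```

The results agree with the EVM<sub>rescale</sub> row of Table 4.1 to within rounding.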

<sup>1</sup>In general, a multiplication is also possible but, for the sake of simplicity, here we only describe a division.

However, the above  $-20 \log_{10}(K_{\text{rescale}})$  scaling assumption does not hold in the realistic scenario of a wideband TX, where distortion dominates the PM error, because the distortion increases with  $K_{\text{rescale}}$ . This can be understood by inspecting the distortion induced by the error in the modulation frequency  $(\Delta f_{\text{M}})$ : According to Section 4.4.1, the *relative*  $\Delta f_{\text{M}}$  error due to the  $1/\sqrt{LC}$  nonlinearity is roughly reflected by BW<sub>FM</sub>/ $f_0$ . If the original PM clock at  $f_0$  were to be (virtually) frequency-divided by  $K_{\text{rescale}}$  (for the EVM rescaling), BW<sub>FM</sub> would have to be multiplied by  $K_{\text{rescale}}$  to keep the PM characteristics (e.g., data rate and constellation) unchanged after the division. Hence, a larger  $K_{\text{rescale}}$  increases BW<sub>FM</sub>/ $f_0$ , indicating a stronger *relative*  $\Delta f_{\text{M}}$  error and a higher EVM contribution. This is verified by Fig. 4.21(a) and contradicts the EVM-rescaling trend implied by the jitter-dominant assumption. Although linearizing the DCO can suppress the  $\Delta f_{\text{M}}$  error, the residue increases dramatically with BW<sub>FM</sub>/ $f_0$  due to the high-order nonlinearities indicated in (4.18)<sup>1</sup>. This will ultimately dominate the EVM.

Considering that the EVM contributions due to jitter and distortion change differently in the frequency rescaling, we prefer to separately compare these two contributors, rather than merely considering the overall EVM. In Table 4.1, the former one is already covered by the integrated rms jitter, and the latter is reflected by the EVM *excluding* IPN (integrated phase noise) at their original output frequencies. The "EVM excl. IPN" is calculated by

$$\text{EVM excl. IPN} = 10 \log_{10} \left[10^{\text{EVM}(\text{dB})/10} - 10^{(\text{IPN}(\text{dB})+3)/10}\right], \tag{4.27}$$

where 3 dB is added to IPN because it integrates phase noise over positive or negative offset frequencies and counts merely half of the EVM contribution. The proposed phase modulator exhibits the lowest distortion level compared with other works.
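As a quick numerical check, (4.27) can be evaluated for the entries of Table 4.1. The sketch below uses an illustrative function name, `evm_excl_ipn_db`, and returns `None` when the double-sided phase-noise power is not smaller than the total EVM power. That is an assumed convention for the cases the table marks as N/A.

```python
import math


def evm_excl_ipn_db(evm_db, ipn_db):
    """Evaluate (4.27): EVM power minus double-sided integrated phase noise.

    ipn_db is the single-sided IPN from Table 4.1; 3 dB is added to
    account for both positive and negative offset frequencies.
    Returns None when the difference is not positive (no valid result).
    """
    evm_power = 10.0 ** (evm_db / 10.0)
    pn_power = 10.0 ** ((ipn_db + 3.0) / 10.0)
    residual = evm_power - pn_power
    if residual <= 0.0:
        return None
    return 10.0 * math.log10(residual)


if __name__ == "__main__":
    # (label, EVM in dB, IPN in dBc), copied from Table 4.1
    for label, evm, ipn in [("This work", -47.6, -65.0),
                            ("[84]", -36.0, -42.0),
                            ("[79]", -42.2, -44.7)]:
        print(label, evm_excl_ipn_db(evm, ipn))
```

For [79], the phase-noise term exceeds the reported EVM power, so the logarithm has no real value, in line with the N/A entry in the table.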

# 4.7 Conclusions

This chapter has demonstrated a digital PLL-based phase modulator of high accuracy yet low power consumption. Although the DCO updates at a non-uniform clock and suffers from strong nonlinearity due to the wide FM bandwidth, the phase modulator can still achieve an EVM below -47 dB with a 60-Mbit/s 64-PSK signal. This benefits from the two proposed innovations: 1) the non-uniform clock compensation (NUCC) that addresses PLL disturbances arising from the time-varying period and offset of the updating clock, and 2) the phase-domain digital pre-distortion that compensates the  $1/\sqrt{LC}$ -induced DCO nonlinearity. From the methodology perspective, the NUCC analysis entails the improved phase modulation model in the hybrid-time domain. The new model is effective in analyzing the time-related distortions in general PLL-based phase modulators. Moreover, combining the proposed phase-domain pre-distortion with the conventional OTW-domain counterpart could constitute a frequency-insensitive solution compensating for DCO nonlinearity. These two powerful tools would benefit low-power PLL-based phase modulators in improving the accuracy, thereby paving the way for future polar transmitters supporting high-data-rate applications.

<sup>1</sup>BW<sub>FM</sub>/ $f_0$  correlates with  $\Delta \phi_{\rm M}$ /FCW in (4.18) because the former represents the maximum  $\Delta f_{\rm M}/f_0$ , and the latter equals  $\Delta f_{\rm M}/f_0$  according to (4.1) and (4.6).



# Conclusion

# 5.1 Original Contributions

The original scientific contributions described in this thesis are summarized as follows:

- Proposed a strategy that dynamically scales the 'golden' time base to cancel the deterministic time offset input to the PLL phase detector
- Proposed the time-mode arithmetic unit (TAU) that can beneficially calculate a weighted sum of time-mode inputs
- Obtained an insight into the characteristics of self-interference mechanisms in a PLL, i.e., synchronicity with the DCO phase and sinusoidal pattern
- Developed a strategy canceling the self-interference-induced fundamental fractional spurs
- Developed a hybrid time-domain model to study the PLL's phase modulation error raised by a non-ideal sampling clock
- Proposed a non-uniform clock compensation (NUCC) scheme to address the phase modulation error raised by a re-timed sampling clock
- Proposed a phase-domain digital pre-distortion (DPD) to tackle the  $1/\sqrt{LC}$ -induced DCO nonlinearity

### 5.2 Thesis Outcomes

The quest for ever-lower power consumption has become a crucial factor shaping wireless transceiver design, where each critical block in the system must keep reducing its power consumption without sacrificing performance. Against this background, this thesis focuses on low-power techniques that suppress a PLL's fractional-spur levels when the PLL generates an unmodulated carrier, and the phase modulation (PM) error when the PLL is engaged in two-point modulation.

Chapter 2 mainly addresses fractional spurs raised during the phase-error-extraction process. By comparing different phase-error-extraction strategies, we notice that their accuracy is degraded because they rely on an imperfect time base to cancel the deterministic time offset input to the phase detector; we therefore propose to cancel the time offset by scaling the 'golden' time base, i.e., the period of the DCO output clock. This new phase-error-extraction strategy is validated on a fractional-N digital PLL adopting the proposed time-mode arithmetic unit (TAU). The TAU is a general-purpose timestamp-signal processor that calculates the weighted sum of input time offsets. In the PLL, the TAU extracts the DCO phase error by calculating the weighted sum of two inputs: the 'golden' time base, i.e., the DCO period, and the instantaneous time offset between the significant reference clock edge and the first subsequent DCO clock edge. A prototype PLL, implemented in 40-nm CMOS, achieves 182 fs rms jitter with 3.5 mW power consumption. In a near-integer channel (i.e., the worst case), the PLL shows the worst fractional spur below -59 dBc. Under significant supply and temperature variations, the worst fractional spur still remains below -51.7 dBc without any background calibration tracking. The spurious performance benefits from the phase-error-extraction strategy (scaling the 'golden' time base to cancel the deterministic phase detector input), which automatically corrects the TAU's transfer function. This is a methodology-level improvement and indicates a potential for exploring this new phase-detection category for low-spur clock generation.

Chapter 3 mainly studies the fundamental fractional spurs raised by self-interference, especially the in-band and DCO interference mechanisms originating from mutual coupling between the DCO and reference clock circuitry. We first analyze the characteristics of these two types of interference and the corresponding impacts on fundamental fractional spurs. Based on two features of the self-interference, i.e., a sinusoidal pattern and synchronicity with the DCO phase, we develop a digitally intensive strategy that cancels fundamental fractional spurs caused by the DCO interference by deliberately injecting a well-designed in-band interference. This strategy is verified on a chip where the DCO is significantly disturbed by a coupled reference clock. After applying the proposed strategy, the worst fundamental fractional spurs across the fractional channels are suppressed by over 10 dB and remain below -58 dBc. Compared with the existing adaptive DCO-interference-cancellation methods, the proposed one incurs a lower hardware cost (e.g., by reusing the same hardware tackling the TAU nonlinearity in this design) because it fully exploits the sinusoidal and synchronous characteristics of self-interference. More importantly, the theory underlying the proposed strategy helps to understand and characterize the on-chip coupling mechanisms. This provides a foundation to develop future spur suppression techniques with lower power and hardware penalties.

Chapter 4 concerns itself with a case where a PLL is phase-modulated. Because a PLL-based modulator acquires the desired phase shift by integrating the modulating frequency over one period of the sampling clock, improving the phase modulation (PM) accuracy should tackle the errors related to both the nonlinearity in frequency modulation (FM) and the non-uniformity of the sampling clock grid. The non-uniform clock issue is attributed to the fact that the DCO's modulation frequency updates at a clock generated by re-timing the reference clock to the phase-modulated DCO output (due to the practical system considerations). Consequently, the re-timed clock inherits some PM features and exhibits non-uniform characteristics disturbing the PLL and finally degrading the PM accuracy. To tackle this issue, a hybrid-time domain model is developed to analyze the clock-timing-related distortions, and then a non-uniform clock compensation scheme is proposed based on this model. The FM-related error is dominated by the nonlinear DCO. Compared with the existing DCO linearization techniques, which compensate all the nonlinearity sources by pre-distorting the oscillator tuning word (OTW), we consider the  $1/\sqrt{LC}$ -induced DCO nonlinearity separately and address it by pre-distorting the target modulating phase that has not been denormalized to OTW yet. This phase-domain DPD can improve the PM accuracy without requiring any preliminary knowledge of the physical parameters. Furthermore, combining the phase- and OTW-domain DPD techniques constitutes a frequency-insensitive DCO linearization strategy, which finally reduces the associated calibration efforts and channel-hopping time. The proposed concepts have been verified in a prototype digital-PLL-based phase modulator fabricated in a 40-nm CMOS technology. 
The measured EVM is  $-47 \, \text{dB}$  with a 60-Mb/s 64-PSK modulation when the phase-modulated output is frequency-divided by K=8, even though the DCO exhibits significant nonlinearity due to the large fractional FM bandwidth. This proves that the proposed techniques are effective in suppressing the PM error. When K=8 or 4, the measured EVM remains below  $-43 \, \text{dB}$  across the carrier-frequency tuning range without re-calibrating the DCO nonlinearity. This indicates the combinational DPD strategy is frequency insensitive. Therefore, the proposed techniques, i.e., NUCC and combinational DPD, will benefit the low-power PLL-based phase modulators in improving the accuracy, thereby paving the way for developing future polar transmitters supporting high-data-rate applications.

### 5.3 Recommendations for Future Development

Although the proposed TAU circuit and spur-cancellation technique have already been proven effective in suppressing the PLL fractional spur levels, they can still be improved to optimize the PLL's performance in other aspects. The first is to improve the TAU noise. In the prototype PLL, the TAU is designed with a somewhat relaxed noise budget because of the targeted PLL application, and so the TAU's noise contribution can be suppressed by utilizing a narrow PLL bandwidth. However, if the design targets better PLL phase noise performance without any extra power consumption, a larger PLL bandwidth is usually desired because it can help suppress the phase noise contribution from the power-hungry DCO without a power penalty. As a side effect, the wider PLL bandwidth increases the TAU's contribution to the overall PLL phase noise. This necessitates further research on low-power techniques to suppress the TAU noise. The second aspect is to improve the spur-cancellation technique so that it can measure and determine all the essential parameters in the background of PLL operation. So far, the proposed method determines these parameters in the foreground and is thus incapable of maintaining the spur-cancellation performance across environmental variations, i.e., supply and temperature drifts. In contrast, a future solution running in the background would exhibit robust performance under these environmental drifts. This section will offer some suggestions for improving these two aspects.

Regarding the TAU's noise, it can be suppressed by reducing the maximum amplitude of  $\Delta t_{\rm S}$ , i.e., max  $|\Delta t_{\rm S}|$ : The TAU noise is dominated by kT/C noise and the slicing comparator's input-referred noise, which together contribute nearly 70% of the overall noise [see Fig. 2.34(b)]. Directly combating these two types of noise sources (e.g., by optimizing the size of critical devices) inevitably incurs a power penalty. A low-power alternative starts with an observation that these two types of noise are converted from voltage noise to edge jitter through the final discharge slope. So, with a faster final discharge slope, the same noise voltage converts to lower edge jitter. However, the final discharge slope cannot be arbitrarily fast because it is inversely proportional to the featured  $R \cdot C$  product of the weighted time register (WTR) (i.e.,  $R_0C_0$ , where  $R_0$  is the WTR's unit resistance and  $C_0$  is the fixed capacitance), which should be large enough to handle all supported time inputs (according to Section 2.3.5). Considering that the TAU's task is to cancel the deterministic pattern of  $\Delta t_{\rm S}$ , max  $|\Delta t_{\rm S}|$  actually determines the maximum time that the WTR must handle. Therefore, a reduced max  $|\Delta t_{\rm S}|$  permits a smaller featured RC product and a sharper discharge slope, ultimately suppressing the TAU's edge jitter.



Figure 5.1: System diagram of TAU-based PLL with MMD.

Properly using the TAU's differential input range can reduce max  $|\Delta t_{\rm S}|$ , thus benefitting the TAU noise optimization: Although the implemented TAU only needs to handle a  $\Delta t_{\rm S}$  variation of  $T_{\rm CKV}$ , max  $|\Delta t_{\rm S}|$  is as high as  $11/8\,T_{\rm CKV}$ . This is attributed to the drawbacks of the snapshot circuit: because it captures positive-only  $\Delta t_{\rm S}$  and suffers from the metastability issue, the finally sampled  $\Delta t_{\rm S}$  ranges from  $3/8\,T_{\rm CKV}$  to  $11/8\,T_{\rm CKV}$  (see Fig. 2.11). If a multi-modulus divider (MMD) dithered by a first-order  $\Delta\Sigma$ -modulator can replace the snapshot circuit to sample  $\Delta t_{\rm S}$  (see Fig. 5.1), the captured  $\Delta t_{\rm S}$  can be either positive or negative, i.e., ranging from  $-1/2\,T_{\rm CKV}$  to  $1/2\,T_{\rm CKV}$ . This reduces max  $|\Delta t_{\rm S}|$  from  $11/8\,T_{\rm CKV}$  to  $1/2\,T_{\rm CKV}$ , making a roughly  $3\times$  sharper discharge slope possible. Consequently, the TAU's output jitter power contributed by a given voltage noise, i.e., the kT/C noise and the slicer's input-referred noise, can be suppressed to roughly 1/9 of the original value, which is a significant improvement.
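The scaling argument can be made explicit with exact fractions. The sketch below assumes that the required  $R_0C_0$  scales linearly with max  $|\Delta t_{\rm S}|$ , that the discharge slope is inversely proportional to  $R_0C_0$ , and that the noise voltage itself is unchanged. The function name `jitter_scaling` is illustrative.

```python
from fractions import Fraction


def jitter_scaling(max_dt_old, max_dt_new):
    """Return (slope gain, rms-jitter ratio, jitter-power ratio).

    Inputs are max|dt_S| in units of T_CKV. Assumes slope ~ 1/(R0*C0),
    R0*C0 ~ max|dt_S|, and a fixed noise voltage, so that the rms jitter
    is the noise voltage divided by the slope.
    """
    slope_gain = Fraction(max_dt_old) / Fraction(max_dt_new)
    rms_ratio = 1 / slope_gain
    power_ratio = rms_ratio ** 2
    return slope_gain, rms_ratio, power_ratio


if __name__ == "__main__":
    gain, rms, power = jitter_scaling(Fraction(11, 8), Fraction(1, 2))
    print(f"slope gain        : {gain} (~{float(gain):.2f}x)")
    print(f"rms jitter ratio  : {rms} (~{float(rms):.3f})")
    print(f"jitter power ratio: {power} (~{float(power):.3f})")
```

Under these assumptions the exact slope gain is 11/4, which the text rounds to about 3, and the corresponding jitter-power ratio is 16/121, which rounds to the quoted 1/9.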

Regarding the spur cancellation, the key parameters are calibrated in the foreground because in-band and DCO interferences coexist and the associated calibrations target conflicting convergence conditions: the calibration tackling in-band interference settles when the detected phase error associated with each  $\phi_{\rm R,frac}$  value (fractional part of the predicted DCO phase) exhibits a mean value of 0, whereas that tackling the DCO interference converges when the  $\sin(2\pi\phi_{\rm R,frac})$ -correlated pattern in the detected phase error sequence aligns with the injected cancellation signal, instead of vanishing. The latter calibration settles at this special condition because the signal canceling DCO interference is injected into the phase detector, failing to immediately cancel the interference occurring at the DCO. As a result, the injected pattern never vanishes in the phase detector output.



Figure 5.2: System diagram of the two-point calibration tackling self-interferences.

If the DCO interference calibration directly injects a pattern into the DCO to cancel the interference there, as shown in Fig. 5.2, the calibration will settle when the  $\sin(2\pi\phi_{\rm R,frac})$ -correlated pattern disappears in the phase detector output. The new convergence condition coincides with that of the in-band interference calibration. Consequently, these two calibrations (addressing DCO and in-band interferences) can be simultaneously performed in the background of PLL operation. However, in that case, both calibrations correlate the phase detector output (i.e., TDC output in Fig. 5.2) with  $\phi_{\rm R,frac}$ , and thereby may affect each other. So, some other techniques should be further investigated to distinguish the calibration residues' contributions to the phase detector output.
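The convergence condition described above, i.e., the  $\sin(2\pi\phi_{\rm R,frac})$ -correlated pattern vanishing at the phase detector output, can be illustrated with a toy LMS loop. This is not the calibration circuit of the thesis: it ignores the PLL loop dynamics and noise, assumes the residual interference appears directly at the phase detector, and uses hypothetical names (`calibrate_sin_gain`, `mu`, `fcw_frac`).

```python
import math


def calibrate_sin_gain(true_amp, fcw_frac, mu=0.05, n_steps=4000):
    """Toy LMS estimate of a sin(2*pi*phi_frac)-shaped interference amplitude.

    true_amp : amplitude of the interference (arbitrary units)
    fcw_frac : fractional part of the frequency control word
    Returns the estimated cancellation amplitude after n_steps updates.
    """
    est = 0.0
    phi_frac = 0.0
    for _ in range(n_steps):
        # fractional part of the predicted DCO phase
        phi_frac = (phi_frac + fcw_frac) % 1.0
        basis = math.sin(2.0 * math.pi * phi_frac)
        # residual pattern after cancellation, as seen by the phase detector
        pd_out = (true_amp - est) * basis
        # correlate the detector output with the basis and integrate
        est += mu * pd_out * basis
    return est


if __name__ == "__main__":
    print(calibrate_sin_gain(true_amp=0.3, fcw_frac=0.013))
```

In this simplified model the update stops changing once the correlated pattern has disappeared from the detector output, which is the settling condition discussed above.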



## **Differential Vernier Time-to-Digital Converter**

Fig. A.1 shows a simplified schematic of the 4-bit differential Vernier time-to-digital converter (TDC). It quantizes the input time difference between the CMP<sub>P</sub> and CMP<sub>N</sub> falling edges to a digital code D<sub>TDC</sub>. The overall TDC contains two 3-bit Vernier TDCs (same as [58]), whose thermometric outputs are encoded in a binary fashion. The delay difference between the fast ( $\tau_F$ ) and slow ( $\tau_S$ ) delay cells in the sub-TDCs determines the TDC resolution, which is around 2 ps in this implementation. These two sub-TDCs are configured in a differential style, like in [38], to detect both positive and negative time inputs, corresponding respectively to the cases where CMP<sub>P</sub> leads CMP<sub>N</sub> and where CMP<sub>P</sub> lags CMP<sub>N</sub>. The polarity of the input is measured by an extra arbiter and reflected on the SIGN bit, which also selects the proper sub-TDC output to be encoded and to represent the input-time amplitude.



Figure A.1: Schematic of the differential vernier TDC.

Once the active sub-TDC has successfully quantized the input time (indicated by a rising edge on the DONE signal), the sampler registers the SIGN bit and the encoded sub-TDC result as a combined value of  $D_{TDC}$ . Simultaneously, a  $\overline{TDC}_{done}$  falling edge is launched to notify the system that  $D_{TDC}$  is ready. Right before the next time input arrives (indicated by the falling FREF edge), the low level on  $\overline{TDC}_{done}$  is released so that it can flag the  $D_{TDC}$ -ready event of the next conversion. This coincides with the timing requirements of the TAU-based PLL system.
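A behavioral sketch of the conversion may help to fix ideas. It models only the input-output mapping (a SIGN bit plus a 3-bit magnitude at roughly 2 ps per step), not the delay lines or the DONE handshake. The sign polarity, the threshold placement, and the sign-magnitude packing in `d_tdc` are assumptions made for illustration, as are the function names.

```python
def vernier_tdc(dt_ps, resolution_ps=2.0, bits_per_sub_tdc=3):
    """Behavioral model of a differential Vernier TDC.

    dt_ps > 0 is taken to mean CMP_P leads CMP_N (assumed polarity).
    Returns (sign, magnitude), where magnitude is the binary-encoded
    count of thermometer taps that fired in the active sub-TDC.
    """
    sign = 0 if dt_ps >= 0 else 1           # arbiter decision (assumed encoding)
    n_taps = (1 << bits_per_sub_tdc) - 1    # 7 thermometer taps for 3 bits
    thermometer = [abs(dt_ps) > (i + 1) * resolution_ps for i in range(n_taps)]
    magnitude = sum(thermometer)            # thermometer-to-binary encoding
    return sign, magnitude


def d_tdc(dt_ps):
    """Pack SIGN and magnitude into a 4-bit code (assumed sign-magnitude format)."""
    sign, magnitude = vernier_tdc(dt_ps)
    return (sign << 3) | magnitude


if __name__ == "__main__":
    for dt in (-20.0, -5.0, 0.5, 5.0, 20.0):
        print(f"dt = {dt:6.1f} ps -> SIGN, |code| = {vernier_tdc(dt)}, D_TDC = {d_tdc(dt):04b}")
```

Inputs beyond the range of the active sub-TDC simply saturate at the maximum magnitude in this model.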

# Appendix

# **Output Jitter of the Slicing Comparator**

The output operation of the two-stage slicing comparator in Fig. 2.18 is simplified in Fig. B.1. Subfigure (a) shows the schematic with only the active transistors participating during the output, while (b) shows the voltage waveforms. The output process is triggered the moment the slicer input  $V_{\rm C}(t)$ crosses the threshold voltage of the first-stage comparator, i.e.,  $V_{\rm th1}$ . After that, the driving strength of PM2 becomes stronger than that of NM1, so the capacitor  $C_1$  is gradually charged and the voltage  $V_1$  begins to increase. When  $V_1$  rises higher than the threshold of the second-stage comparator, NM3 will pull down the output voltage  $V_2$  and trigger the final output, i.e., CMP falling edge.

During this output process, both noise voltages on  $C_1$  and  $C_2$  can contribute to the overall output jitter (on the CMP falling edge) through the corresponding (dis)charging slope. However, the noise voltage on  $C_1$  has a dominant contribution because a slower voltage transition slope converts the given voltage noise to larger edge jitter and  $V_1$  exhibits a much slower slope than  $V_2$  does in a well-optimized design. In the jitter analysis below, we consider only the contribution from the noise voltage on  $C_1$ .

According to [14],  $C_1$  acquires the noise voltage by integrating an input noise current over time. The noise current, denoted as  $i_n$  in Fig. B.1(a), is injected by PM2 and NM1 (both assumed to operate in the saturation region during the noise integration), and exhibits a power density of

$$S_{i_{\rm n}}(f) = 4kT\gamma g_{\rm m,eq},\tag{B.1}$$



Figure B.1: Simplified schematic of the TAU's slicing comparator and the waveforms during output.

where  $g_{\rm m,eq}$  and  $\gamma$  are respectively the equivalent transconductance of the combined PM2 and NM1, and their thermal noise coefficient. The noise integration interval equals the propagation time of the first-stage comparator, i.e.,  $t_{\rm d}$  in Fig. B.1(b).  $t_{\rm d}$  starts from the moment  $V_{\rm C}(t)$  crosses the first-stage threshold  $V_{\rm th1}$ , and ends at the moment when  $V_1(t)$  crosses the second-stage threshold  $V_{\rm th2}$ . In the remaining part of this section,  $t_{\rm d}$  will be estimated first. Then, the noise voltage accumulated during  $t_{\rm d}$ , i.e.,  $\overline{v_{\rm n}^2}$ , can be calculated according to the lossless noise integration process [equation (8) in [14]]. Finally,  $\overline{v_{\rm n}^2}$  will be converted to jitter according to the instantaneous slope at which  $V_1$  crosses  $V_{\rm th2}$ .

In estimating  $t_d$ ,  $V_C(t)$  is assumed to decrease linearly versus time with the slope of  $k_{\text{th1}}$  around the  $V_{\text{th1}}$ -crossing moment, i.e.,  $t = 0$  in Fig. B.1(b). Hence, the *small-signal* voltage characterizing  $V_C(t) - V_{\text{th1}}$  can be approximated as

$$v_{\rm C}(t) = -k_{\rm th1} \cdot t. \tag{B.2}$$

After the crossing moment, i.e., when  $t > 0$ , the combination of PM2 and NM1 charges  $C_1$  with a current proportional to  $v_{\rm C}(t)$ <sup>1</sup>. So the voltage on  $C_1$  increases as

$$V_1(t) = -\int_0^t \frac{g_{\rm m,eq} \cdot v_{\rm C}(\tau)}{C_1} d\tau = \frac{k_{\rm th1} \cdot g_{\rm m,eq}}{2C_1} \cdot t^2. \tag{B.3}$$

Considering  $t_d$  ends the moment  $V_1(t)$  crosses  $V_{th2}$ , solving  $V_1(t) = V_{th2}$  yields  $t_d$ , i.e.,

$$t_{\rm d} = \sqrt{\frac{2V_{\rm th2} \cdot C_1}{g_{\rm m,eq} \cdot k_{\rm th1}}}. \tag{B.4}$$

<sup>&</sup>lt;sup>1</sup>Note that we follow here the IEEE convention for using the capital/small letters for voltage and current.

Then, the power of noise voltage  $v_n$  accumulated during  $t_d$  can be calculated by applying equation (8) from [14]:

$$\overline{v_{\mathrm{n}}^2} = \frac{S_{i_{\mathrm{n}}}}{2C_1^2} \cdot t_{\mathrm{d}}.\tag{B.5}$$

On the other hand, the  $C_1$  charging slope at the  $V_{\rm th2}$ -crossing moment, i.e.,  $t = t_{\rm d}$  can be derived as

$$k_{\rm th2} = \left. \frac{\mathrm{d}V_1(t)}{\mathrm{d}t} \right|_{t=t_{\rm d}} = \frac{k_{\rm th1} \cdot g_{\rm m,eq} \cdot t_{\rm d}}{C_1}. \tag{B.6}$$

Through this slope,  $\overline{v_n^2}$  can be converted to time variation of the  $V_2$  falling edge, which is the output jitter of the overall slicing comparator, i.e.,

$$\sigma_{\rm cmp}^2 = \frac{\overline{v_{\rm n}^2}}{k_{\rm th2}^2}. \tag{B.7}$$

Substituting (B.1), (B.4), (B.5) and (B.6) into this equation yields the final expression for the slicer's output jitter

$$\sigma_{\rm cmp}^2 = \frac{\sqrt{2}kT\gamma}{\sqrt{V_{\rm th2} \cdot k_{\rm th1}^3 \cdot g_{\rm m,eq} \cdot C_1}}. \tag{B.8}$$
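The substitution leading to (B.8) can be cross-checked numerically by evaluating (B.1) and (B.4) to (B.7) step by step and comparing the result with the closed form. The component values in the sketch below are arbitrary assumptions chosen only to exercise the formulas. They are not taken from the design.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def slicer_jitter_stepwise(k_th1, g_m_eq, c1, v_th2, gamma=1.0, temp_k=300.0):
    """Jitter variance (s^2) obtained by chaining (B.1) and (B.4)-(B.7)."""
    s_in = 4.0 * K_BOLTZMANN * temp_k * gamma * g_m_eq    # (B.1) noise current PSD
    t_d = math.sqrt(2.0 * v_th2 * c1 / (g_m_eq * k_th1))  # (B.4) integration time
    v_n_sq = s_in / (2.0 * c1 ** 2) * t_d                 # (B.5) integrated noise voltage
    k_th2 = k_th1 * g_m_eq * t_d / c1                     # (B.6) slope at the V_th2 crossing
    return v_n_sq / k_th2 ** 2                            # (B.7) voltage noise to jitter


def slicer_jitter_closed_form(k_th1, g_m_eq, c1, v_th2, gamma=1.0, temp_k=300.0):
    """Jitter variance (s^2) from the closed-form expression (B.8)."""
    numerator = math.sqrt(2.0) * K_BOLTZMANN * temp_k * gamma
    return numerator / math.sqrt(v_th2 * k_th1 ** 3 * g_m_eq * c1)


if __name__ == "__main__":
    # Assumed example values: 1 V/ns input slope, 1 mS, 10 fF, 0.5 V threshold
    params = dict(k_th1=1e9, g_m_eq=1e-3, c1=10e-15, v_th2=0.5)
    step = slicer_jitter_stepwise(**params)
    closed = slicer_jitter_closed_form(**params)
    print(f"stepwise   : {math.sqrt(step) * 1e15:.1f} fs rms")
    print(f"closed form: {math.sqrt(closed) * 1e15:.1f} fs rms")
```

Both routes give the same variance, and the closed form makes the dependence on the input slope explicit: the jitter variance falls as  $k_{\rm th1}^{-3/2}$ .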

# Bibliography

- [1] "Global mobile traffic share by region 2022," https://www.statista.com/statistics/306528/share-of-mobile-internettraffic-in-global-regions/.
- [2] A. S. G. Andrae and T. Edler, "On Global Electricity Usage of Communication Technology: Trends to 2030," *Challenges*, vol. 6, no. 1, pp. 117–157, Jun. 2015.
- [3] M. P. Kennedy, Y. Donnelly, J. Breslin, S. Tulisi, S. Patil, C. Curtin, S. Brookes, B. Shelly, P. Griffin, and M. Keaveney, "16.9 4.48GHz 0.18μm SiGe BiCMOS Exact-Frequency Fractional-N Frequency Synthesizer with Spurious-Tone Suppression Yielding a -80dBc In-Band Fractional Spur," in 2019 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2019, pp. 272–274.
- [4] W. Wu, C.-W. Yao, C. Guo, P.-Y. Chiang, L. Chen, P.-K. Lau, Z. Bai, S. W. Son, and T. B. Cho, "A 14-nm Ultra-Low Jitter Fractional-N PLL Using a DTC Range Reduction Technique and a Reconfigurable Dual-Core VCO," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 12, pp. 3756–3767, Dec. 2021.
- [5] P. Madoglio, H. Xu, K. Chandrashekar, L. Cuellar, M. Faisal, W. Y. Li, H. S. Kim, K. M. Nguyen, Y. Tan, B. Carlton, V. Vaidya, Y. Wang, T. Tetzlaff, S. Suzuki, A. Fahim, P. Seddighrad, J. Xie, Z. Zhang, D. S. Vemparala, A. Ravi, S. Pellerano, and Y. Palaskas, "13.6 A 2.4GHz WLAN digital polar transmitter with synthesized digital-to-time converter in 14nm trigate/FinFET technology for IoT and wearable applications," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2017, pp. 226–227.
- [6] A. Ben-Bassat, S. Gross, A. Nazimov, A. Ravi, B. Khamaisi, E. Banin, E. Borokhovich, N. Kimiagarov, P. Skliar, R. Banin, S. Zur, S. Reinhold, S. Bruker, T. Maimon, U. Parker, and O. Degani, "10.5 A Fully Integrated 27dBm Dual-Band All-Digital Polar Transmitter Supporting 160MHz for WiFi 6 Applications," in 2020 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2020, pp. 180–182.

- [7] A. Ravi, P. Madoglio, H. Xu, K. Chandrashekar, M. Verhelst, S. Pellerano, L. Cuellar, M. Aguirre-Hernandez, M. Sajadieh, J. E. Zarate-Roldan, O. Bochobza-Degani, H. Lakdawala, and Y. Palaskas, "A 2.4-GHz 20–40-MHz Channel WLAN Digital Outphasing Transmitter Utilizing a Delay-Based Wideband Phase Modulator in 32-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 12, pp. 3184–3196, Dec. 2012.
- [8] Y. Palaskas, P. Madoglio, J. Angel, J. Tomasik, S. Hampel, P. Schubert, P. Preyler, T. Mayer, T. Bauernfeind, P. Plechinger, A. Ravi, O. Degani, R. Banin, E. Gordon, and Z. Boos, "A Cellular Multiband DTC-Based Digital Polar Transmitter With -153 dBc/Hz Noise in 14-nm FinFET," *IEEE Solid-State Circuits Letters*, vol. 2, no. 9, pp. 179–182, Sep. 2019.
- [9] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, "Design Considerations for a Direct Digitally Modulated WLAN Transmitter With Integrated Phase Path and Dynamic Impedance Modulation," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 12, pp. 3160–3177, Dec. 2013.
- [10] Y. Shen, M. Mehrpoo, M. Hashemi, M. Polushkin, L. Zhou, M. Acar, R. van Leuken, M. S. Alavi, and L. de Vreede, "A fully-integrated digital-intensive polar Doherty transmitter," in 2017 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Jun. 2017, pp. 196–199.
- [11] A. Ba, Y. Liu, J. van den Heuvel, P. Mateman, B. Busze, J. Gloudemans, P. Vis, J. Dijkhuis, C. Bachmann, G. Dolmans, K. Philips, and H. de Groot, "26.3 A 1.3nJ/b IEEE 802.11ah fully digital polar transmitter for IoE applications," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan. 2016, pp. 440–441.
- [12] A. Ba, J. van den Heuvel, P. Mateman, C. Zhou, B. Busze, M. Song, Y. He, M. Ding, J. Dijkhuis, E. Tiurin, S. Madampu, P. Boer, S. Traferro, Y. Zhang, Y. Liu, C. Bachmann, and K. Philips, "A 0.62nJ/b multistandard WiFi/BLE wideband digital polar TX with dynamic FM correction and AM alias suppression for IoT applications," in 2018 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Jun. 2018, pp. 308–311.
- [13] E. Familier and I. Galton, "Second and third-order successive requantizers for spurious tone reduction in low-noise fractional-N PLLs," in 2017 IEEE Custom Integrated Circuits Conference (CICC), Apr. 2017, pp. 1–4.

- [14] A. A. Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
- [15] L. Wu, T. Burger, P. Schönle, and Q. Huang, "A Power-Efficient Fractional-N DPLL With Phase Error Quantized in Fully Differential-Voltage Domain," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 4, pp. 1254–1264, Apr. 2021.
- [16] W. Deng, D. Yang, A. T. Narayanan, K. Nakata, T. Siriburanon, K. Okada, and A. Matsuzawa, "14.1 A 0.048mm2 3mW synthesizable fractional-N PLL with a soft injection-locking technique," in 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, Feb. 2015, pp. 1–3.
- [17] D. Tasca, M. Zanuso, G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, "A 2.9–4.0-GHz Fractional-N Digital PLL With Bang-Bang Phase Detector and 560-fs<sub>rms</sub> Integrated Jitter at 4.5-mW Power," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 12, pp. 2745–2758, Dec. 2011.
- [18] W. Wu, C. Yao, K. Godbole, R. Ni, P. Chiang, Y. Han, Y. Zuo, A. Verma, I. S. Lu, S. W. Son, and T. B. Cho, "A 28-nm 75-fs<sub>rms</sub> Analog Fractional-N Sampling PLL With a Highly Linear DTC Incorporating Background DTC Gain Calibration and Reference Clock Duty Cycle Correction," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 5, pp. 1254–1265, May 2019.
- [19] X. Gao, O. Burg, H. Wang, W. Wu, C. Tu, K. Manetakis, F. Zhang, L. Tee, M. Yayla, S. Xiang, R. Tsang, and L. Lin, "9.6 A 2.7-to-4.3GHz, 0.16psrms-jitter, -246.8dB-FOM, digital fractional-N sampling PLL in 28nm CMOS," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan. 2016, pp. 174–175.
- [20] H. Liu, D. Tang, Z. Sun, W. Deng, H. C. Ngo, K. Okada, and A. Matsuzawa, "A 0.98mW fractional-N ADPLL using 10b isolated constant-slope DTC with FOM of -246dB for IoT applications in 65nm CMOS," in 2018 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2018, pp. 246–248.
- [21] C. R. Ho and M. S.-W. Chen, "Interference-induced DCO spur mitigation for digital phase locked loop in 65-nm CMOS," in *ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference*, Sep. 2016, pp. 213–216.

- [22] C. R. Ho and M. S. W. Chen, "A digital frequency synthesizer with dither-assisted pulling mitigation for simultaneous DCO and reference path coupling," in 2018 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2018, pp. 254–256.
- [23] R. B. Staszewski and P. T. Balsara, "Phase-domain all-digital phase-locked loop," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 52, no. 3, pp. 159–163, Mar. 2005.
- [24] B. Razavi, *RF Microelectronics: United States Edition*, 2nd ed. Upper Saddle River, NJ: Prentice Hall, Sep. 2011.
- [25] T. Buckel, S. Tertinek, T. Mayer, T. Bauernfeind, C. Wicpalek, A. Springer, R. Weigel, and T. Ussmueller, "A Highly Reconfigurable RF-DPLL Phase Modulator for Polar Transmitters in Cellular RFICs," *IEEE Transactions on Microwave Theory and Techniques*, vol. 66, no. 6, pp. 2618–2627, Jun. 2018.
- [26] R. B. Staszewski and P. T. Balsara, All-Digital Frequency Synthesizer in Deep-Submicron CMOS, 1st ed. Hoboken, N.J: Wiley-Interscience, Sep. 2006.
- [27] H. Shanan, D. Dalton, V. Chillara, and P. Dato, "A 9-to-12GHz Coupled-RTWO FMCW ADPLL with 97fs RMS Jitter, -120dBc/Hz PN at 1MHz Offset, and With Retrace Time of 12.5ns and 2μs Chirp Settling Time," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, Feb. 2022, pp. 146–148.
- [28] N. Markulic, K. Raczkowski, E. Martens, P. E. P. Filho, B. Hershberg, P. Wambacq, and J. Craninckx, "A DTC-Based Subsampling PLL Capable of Self-Calibrated Fractional Synthesis and Two-Point Modulation," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 12, pp. 3078–3092, Dec. 2016.
- [29] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not Multiplied by N<sup>2</sup>," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 12, pp. 3253–3263, Dec. 2009.
- [30] X. Gao, E. A. M. Klumperink, G. Socci, M. Bohsali, and B. Nauta, "Spur Reduction Techniques for Phase-Locked Loops Exploiting A Sub-Sampling Phase Detector," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 9, pp. 1809–1821, Sep. 2010.

- [31] A. Sharkia, S. Mirabbasi, and S. Shekhar, "A 0.01mm<sup>2</sup> 4.6-to-5.6GHz sub-sampling type-I frequency synthesizer with -254dB FOM," in 2018 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2018, pp. 256–258.
- [32] Z. Yang, Y. Chen, S. Yang, P. Mak, and R. P. Martins, "16.8 A 25.4-to-29.5GHz 10.2mW Isolated Sub-Sampling PLL Achieving -252.9dB Jitter-Power FoM and -63dBc Reference Spur," in 2019 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2019, pp. 270–272.
- [33] Z. Yang, Y. Chen, J. Yuan, P.-I. Mak, and R. P. Martins, "A 3.3-GHz Integer N-Type-II Sub-Sampling PLL Using a BFSK-Suppressed Push–Pull SS-PD and a Fast-Locking FLL Achieving -82.2-dBc REF Spur and -255-dB FOM," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 30, no. 2, pp. 238–242, Feb. 2022.
- [34] H. Liu, Z. Sun, H. Huang, W. Deng, T. Siriburanon, J. Pang, Y. Wang, R. Wu, T. Someya, A. Shirane, and K. Okada, "A 265-μW Fractional-N Digital PLL With Seamless Automatic Switching Sub-Sampling/Sampling Feedback Path and Duty-Cycled Frequency-Locked Loop in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 12, pp. 3478–3492, Dec. 2019.
- [35] Z. Gao, J. He, M. Fritz, J. Gong, Y. Shen, Z. Zong, P. Chen, G. Spalink, B. Eitel, M. S. Alavi, R. B. Staszewski, and M. Babaie, "A Low-Spur Fractional-N PLL Based on a Time-Mode Arithmetic Unit," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 6, pp. 1552–1571, Jun. 2023.
- [36] J. Zhuang and R. B. Staszewski, "A low-power all-digital PLL architecture based on phase prediction," in 2012 19th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2012), Dec. 2012, pp. 797–800.
- [37] S. Levantino, G. Marzin, and C. Samori, "An Adaptive Pre-Distortion Technique to Mitigate the DTC Nonlinearity in Digital PLLs," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 8, pp. 1762–1772, Aug. 2014.
- [38] A. Elkholy, T. Anand, W. Choi, A. Elshazly, and P. K. Hanumolu, "A 3.7 mW Low-Noise Wide-Bandwidth 4.5 GHz Digital Fractional-N PLL Using Time Amplifier-Based TDC," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 4, pp. 867–881, Apr. 2015.
- [39] B. Liu, H. C. Ngo, K. Nakata, W. Deng, Y. Zhang, J. Qiu, T. Yoshioka, J. Emmei, H. Zhang, J. Pang, A. T. Narayanan, D. Yang, H. Liu, K. Okada, and A. Matsuzawa, "A 1.2ps-jitter fully-synthesizable fully-calibrated fractional-N injection-locked PLL using true arbitrary nonlinearity calibration technique," in 2018 IEEE Custom Integrated Circuits Conference (CICC), Apr. 2018, pp. 1–4.

- [40] V. K. Chillara, Y. Liu, B. Wang, A. Ba, M. Vidojkovic, K. Philips, H. de Groot, and R. B. Staszewski, "9.8 An 860µW 2.1-to-2.7GHz all-digital PLL-based frequency modulator with a DTC-assisted snapshot TDC for WPAN (Bluetooth Smart and ZigBee) applications," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb. 2014, pp. 172–173.
- [41] S. Levantino, G. Marucci, G. Marzin, A. Fenaroli, C. Samori, and A. L. Lacaita, "A 1.7 GHz Fractional-N Frequency Synthesizer Based on a Multiplying Delay-Locked Loop," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2678–2691, Nov. 2015.
- [42] J. Tao and C.-H. Heng, "A 2.2-GHz 3.2-mW DTC-Free Sampling ΔΣ Fractional-N PLL With -110-dBc/Hz In-Band Phase Noise and -246-dB FoM and -83-dBc Reference Spur," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 9, pp. 3317–3329, Sep. 2019.
- [43] Y. Chen, J. Gong, R. B. Staszewski, and M. Babaie, "A Fractional-N Digitally Intensive PLL Achieving 428-fs Jitter and < -54-dBc Spurs Under 50-mV<sub>pp</sub> Supply Ripple," *IEEE Journal of Solid-State Circuits*, pp. 1–1, 2021.
- [44] Y. Liu, J. V. D. Heuvel, T. Kuramochi, B. Busze, P. Mateman, V. K. Chillara, B. Wang, R. B. Staszewski, and K. Philips, "An Ultra-Low Power 1.7-2.7 GHz Fractional-N Sub-Sampling Digital Frequency Synthesizer and Modulator for IoT Applications in 40 nm CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 5, pp. 1094–1105, May 2017.
- [45] K.-F. Un, G. Qi, J. Yin, S. Yang, S. Yu, C.-I. Ieong, P.-I. Mak, and R. P. Martins, "A 0.12-mm<sup>2</sup> 1.2-to-2.4-mW 1.3-to-2.65-GHz Fractional-N Bang-Bang Digital PLL With 8-μs Settling Time for Multi-ISM-Band ULP Radios," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 9, pp. 3307–3316, Sep. 2019.
- [46] W. Wu, C.-W. Yao, C. Guo, P.-Y. Chiang, P.-K. Lau, L. Chen, S. W. Son, and T. B. Cho, "32.2 A 14nm Analog Sampling Fractional-N PLL with a Digital-to-Time Converter Range-Reduction Technique Achieving 80fs Integrated Jitter and 93fs at Near-Integer Channels," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, Feb. 2021, pp. 444–446.

- [47] T. Seong, Y. Lee, C. Hwang, J. Lee, H. Park, K. J. Lee, and J. Choi, "17.3 A -58dBc-Worst-Fractional-Spur and -234dB-FoM<sub>jitter</sub>, 5.5GHz Ring-DCO-Based Fractional-N DPLL Using a Time-Invariant-Probability Modulator, Generating a Nonlinearity-Robust DTC-Control Word," in 2020 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2020, pp. 270–272.
- [48] D. Liao and F. F. Dai, "A Fractional-N Reference Sampling PLL With Linear Sampler and CDAC Based Fractional Spur Cancellation," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 3, pp. 694–704, Mar. 2021.
- [49] N. Markulic, K. Raczkowski, P. Wambacq, and J. Craninckx, "A 10-bit, 550-fs step Digital-to-Time Converter in 28nm CMOS," in ESSCIRC 2014 - 40th European Solid State Circuits Conference (ESSCIRC), Sep. 2014, pp. 79–82.
- [50] J. Z. Ru, C. Palattella, P. Geraedts, E. Klumperink, and B. Nauta, "A High-Linearity Digital-to-Time Converter Technique: Constant-Slope Charging," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 6, pp. 1412–1423, Jun. 2015.
- [51] A. Tharayil Narayanan, M. Katsuragi, K. Kimura, S. Kondo, K. K. Tokgoz, K. Nakata, W. Deng, K. Okada, and A. Matsuzawa, "A Fractional-N Sub-Sampling PLL using a Pipelined Phase-Interpolator With an FoM of -250 dB," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 7, pp. 1630–1640, Jul. 2016.
- [52] A. Ben-Bassat, S. Gross, A. Lane, A. Nazimov, B. Khamaisi, E. Solomon, E. Banin, E. Borokhovich, N. Kimiagorov, N. Dinur, P. Skliar, R. Cohen, R. Banin, S. Zur, S. Reinhold, S. Breuer-Bruker, T. Abuhazira, T. Livneh, T. Maimon, U. Parker, A. Ravi, and O. Degani, "A Fully Integrated 27-dBm Dual-Band All-Digital Polar Transmitter Supporting 160 MHz for Wi-Fi 6 Applications," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 12, pp. 3414–3425, Dec. 2020.
- [53] Z. Gao, J. He, M. Fritz, J. Gong, Y. Shen, Z. Zong, P. Chen, G. Spalink, B. Eitel, K. Yamamoto, R. B. Staszewski, M. S. Alavi, and M. Babaie, "A 2.6-to-4.1GHz Fractional-N Digital PLL Based on a Time-Mode Arithmetic Unit Achieving -249.4dB FoM and -59dBc Fractional Spurs," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, Feb. 2022, pp. 380–382.

- [54] K. Kim, W. Yu, and S. Cho, "A 9 bit, 1.12 ps Resolution 2.5 b/Stage Pipelined Time-to-Digital Converter in 65 nm CMOS Using Time-Register," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 1007–1016, Apr. 2014.
- [55] Y. Wu, P. Lu, and R. B. Staszewski, "A Time-Domain 147fs<sub>rms</sub> 2.5-MHz Bandwidth Two-Step Flash-MASH 1-1-1 Time-to-Digital Converter With Third-Order Noise-Shaping and Mismatch Correction," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 67, no. 8, pp. 2532–2545, Aug. 2020.
- [56] P. Effendrik, W. Jiang, M. van de Gevel, F. Verwaal, and R. B. Staszewski, "Time-to-digital converter (TDC) for WiMAX ADPLL in 40-nm CMOS," in 2011 20th European Conference on Circuit Theory and Design (ECCTD), Aug. 2011, pp. 365–368.
- [57] P. J. A. Harpe, C. Zhou, Y. Bi, N. P. van der Meijs, X. Wang, K. Philips, G. Dolmans, and H. de Groot, "A 26μW 8 bit 10 MS/s Asynchronous SAR ADC for Low Energy Radios," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 7, pp. 1585–1595, Jul. 2011.
- [58] M. Zhang, C.-H. Chan, Y. Zhu, and R. P. Martins, "A 0.6-V 13-bit 20-MS/s Two-Step TDC-Assisted SAR ADC With PVT Tracking and Speed-Enhanced Techniques," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 12, pp. 3396–3409, Dec. 2019.
- [59] A. Homayoun and B. Razavi, "Analysis of Phase Noise in Phase/Frequency Detectors," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 3, pp. 529–539, Mar. 2013.
- [60] R. B. Staszewski, C. Fernando, and P. T. Balsara, "Event-driven Simulation and modeling of phase noise of an RF oscillator," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 52, no. 4, pp. 723–733, Apr. 2005.
- [61] L. Vercesi, L. Fanori, F. De Bernardinis, A. Liscidini, and R. Castello, "A Dither-Less All Digital PLL for Cellular Transmitters," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 8, pp. 1908–1920, Aug. 2012.
- [62] P. Chen, F. Zhang, Z. Zong, S. Hu, T. Siriburanon, and R. B. Staszewski, "A 31-μW, 148-fs Step, 9-bit Capacitor-DAC-Based Constant-Slope Digital-to-Time Converter in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 11, pp. 3075–3085, Nov. 2019.

- [63] A. Santiccioli, M. Mercandelli, L. Bertulessi, A. Parisi, D. Cherniak, A. L. Lacaita, C. Samori, and S. Levantino, "A 66-fs-rms Jitter 12.8-to-15.2-GHz Fractional-N Bang–Bang PLL With Digital Frequency-Error Recovery for Fast Locking," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 12, pp. 3349–3361, Dec. 2020.
- [64] B. Liu, Y. Zhang, J. Qiu, H. C. Ngo, W. Deng, K. Nakata, T. Yoshioka, J. Emmei, J. Pang, A. T. Narayanan, H. Zhang, T. Someya, A. Shirane, and K. Okada, "A Fully Synthesizable Fractional-N MDLL With Zero-Order Interpolation-Based DTC Nonlinearity Calibration and Two-Step Hybrid Phase Offset Calibration," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 68, no. 2, pp. 603–616, Feb. 2021.
- [65] X. Gao, E. A. M. Klumperink, P. F. J. Geraedts, and B. Nauta, "Jitter Analysis and a Benchmarking Figure-of-Merit for Phase-Locked Loops," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 2, pp. 117–121, Feb. 2009.
- [66] Z. Gao, M. Fritz, J. He, G. Spalink, R. B. Staszewski, M. S. Alavi, and M. Babaie, "A DPLL-Based Phase Modulator Achieving -46dB EVM with A Fast Two-Step DCO Nonlinearity Calibration and Non-Uniform Clock Compensation," in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Jun. 2022, pp. 14–15.
- [67] H. Liu, D. Tang, Z. Sun, W. Deng, H. C. Ngo, and K. Okada, "A Sub-mW Fractional-N ADPLL With FOM of -246 dB for IoT Applications," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 12, pp. 3540–3552, Dec. 2018.
- [68] Z. Xu, M. Osada, and T. Iizuka, "A 3.3-GHz 4.6-mW Fractional-N Type-II Hybrid Switched-Capacitor Sampling PLL Using CDAC-Embedded Digital Integral Path with -80-dBc Reference Spur," in 2021 Symposium on VLSI Circuits, Jun. 2021, pp. 1–2.
- [69] S. M. Dartizio, F. Tesolin, M. Mercandelli, A. Santiccioli, A. Shehata, S. Karman, L. Bertulessi, F. Buccoleri, L. Avallone, A. Parisi, A. L. Lacaita, M. P. Kennedy, C. Samori, and S. Levantino, "A 12.9-to-15.1-GHz Digital PLL Based on a Bang-Bang Phase Detector With Adaptively Optimized Noise Shaping," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 6, pp. 1723–1735, Jun. 2022.
- [70] J. Kim, Y. Jo, Y. Lim, T. Seong, H. Park, S. Yoo, Y. Lee, S. Choi, and J. Choi, "32.4 A 104fsrms-Jitter and -61dBc-Fractional Spur 15GHz Fractional-N Subsampling PLL Using a Voltage-Domain Quantization-Error Cancelation Technique," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, Feb. 2021, pp. 448–450.

- [71] O. E. Eliezer, R. B. Staszewski, I. Bashir, S. Bhatara, and P. T. Balsara, "A Phase Domain Approach for Mitigation of Self-Interference in Wireless Transceivers," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 5, pp. 1436–1453, May 2009.
- [72] B. Hong and A. Hajimiri, "A General Theory of Injection Locking and Pulling in Electrical Oscillators—Part I: Time-Synchronous Modeling and Injection Waveform Design," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 8, pp. 2109–2121, Aug. 2019.
- [73] L. Wu, T. Burger, P. Schönle, and Q. Huang, "A 3.3-GHz 101fsrms-Jitter, -250.3dB FOM Fractional-N DPLL with Phase Error Detection Accomplished in Fully Differential Voltage Domain," in 2020 IEEE Symposium on VLSI Circuits, Jun. 2020, pp. 1–2.
- [74] W. Wu, R. B. Staszewski, and J. R. Long, "A 56.4-to-63.4 GHz Multi-Rate All-Digital Fractional-N PLL for FMCW Radar Applications in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 5, pp. 1081–1096, May 2014.
- [75] A. A. Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," IEEE Journal of Solid-State Circuits, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
- [76] R. B. Staszewski, K. Waheed, F. Dulger, and O. E. Eliezer, "Spur-Free Multirate All-Digital PLL for Mobile Phones in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 12, pp. 2904–2919, Dec. 2011.
- [77] F. M. Gardner, *Phaselock Techniques, 3rd Edition*, 3rd ed. Hoboken, NJ: John Wiley & Sons Inc, Aug. 2005.
- [78] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 179–194, Feb. 1998.
- [79] C.-R. Ho and M. S.-W. Chen, "A Digital PLL With Feedforward Multi-Tone Spur Cancellation Scheme Achieving <-73 dBc Fractional Spur and <-110 dBc Reference Spur in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 12, pp. 3216–3230, Dec. 2016.
- [80] C. R. Ho and M. S. W. Chen, "A fractional-N DPLL with adaptive spur cancellation and calibration-free injection-locked TDC in 65nm CMOS," in 2014 IEEE Radio Frequency Integrated Circuits Symposium, Jun. 2014, pp. 97–100.

- [81] D. Cherniak, M. Mercandelli, L. Bertulessi, F. Padovan, L. Grimaldi, A. Santiccioli, M. Aichner, C. Samori, and S. Levantino, "A 250-Mb/s Direct Phase Modulator With -42.4-dB EVM Based on a 14-GHz Digital PLL," *IEEE Solid-State Circuits Letters*, vol. 3, pp. 126–129, 2020.
- [82] Z. Gao, M. Fritz, G. Spalink, R. B. Staszewski, and M. Babaie, "A Digital PLL-Based Phase Modulator With Non-Uniform Clock Compensation and Non-linearity Predistortion," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 9, pp. 2526–2542, Sep. 2023.
- [83] J. Zhuang, K. Waheed, and R. B. Staszewski, "A Technique to Reduce Phase/Frequency Modulation Bandwidth in a Polar RF Transmitter," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 57, no. 8, pp. 2196–2207, Aug. 2010.
- [84] N. Markulic, P. T. Renukaswamy, E. Martens, B. van Liempd, P. Wambacq, and J. Craninckx, "A 5.5-GHz Background-Calibrated Subsampling Polar Transmitter With -41.3-dB EVM at 1024 QAM in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 4, pp. 1059–1073, Apr. 2019.
- [85] A. Ba, Y. Liu, J. van den Heuvel, P. Mateman, B. Büsze, J. Dijkhuis, C. Bachmann, G. Dolmans, K. Philips, and H. D. Groot, "A 1.3 nJ/b IEEE 802.11ah Fully-Digital Polar Transmitter for IoT Applications," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 12, pp. 3103–3113, Dec. 2016.
- [86] G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, "A 20 Mb/s Phase Modulator Based on a 3.6 GHz Digital PLL With -36 dB EVM at 5 mW Power," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 12, pp. 2974–2988, Dec. 2012.
- [87] C.-C. Li, M.-S. Yuan, C.-C. Liao, Y.-T. Lin, C.-H. Chang, and R. B. Staszewski, "All-Digital PLL for Bluetooth Low Energy Using 32.768kHz Reference Clock and ≤0.45-V Supply," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 12, pp. 3660–3671, Dec. 2018.
- [88] L. Vercesi, L. Fanori, F. De Bernardinis, A. Liscidini, and R. Castello, "A Dither-Less All Digital PLL for Cellular Transmitters," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 8, pp. 1908–1920, Aug. 2012.

- [89] S. Gunturi, J. Tangudu, S. Ramakrishnan, J. Janardhanan, D. Sahu, and S. Mukherjee, "Principal architectural changes in polar transmitter in DRP design for WLAN," in 2013 National Conference on Communications (NCC), Feb. 2013, pp. 1–5.
- [90] R. Staszewski, J. Wallberg, S. Rezeq, C.-M. Hung, O. Eliezer, S. Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise, M. Entezari, K. Muhammad, and D. Leipold, "All-digital PLL and transmitter for mobile phones," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
- [91] M. J. Underhill and R. I. H. Scott, "Wideband frequency modulation of frequency synthesisers," *Electronics Letters*, vol. 13, no. 15, pp. 393–394, 1979.
- [92] C. Durdodt, M. Friedrich, C. Grewing, M. Hammes, A. Hanke, S. Heinen, J. Oehm, D. Pham-Stabner, D. Seippel, D. Theil, S. Van Waasen, and E. Wagner, "A low-IF RX two-point ΣΔ modulation TX CMOS single-chip Bluetooth solution," *IEEE Transactions on Microwave Theory and Techniques*, vol. 49, no. 9, pp. 1531–1537, Sep. 2001.
- [93] R. B. Staszewski, I. Bashir, and O. Eliezer, "RF Built-in Self Test of a Wireless Transmitter," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 54, no. 2, pp. 186–190, Feb. 2007.
- [94] I. L. Syllaios, P. T. Balsara, and R. B. Staszewski, "Recombination of Envelope and Phase Paths in Wideband Polar Transmitters," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 57, no. 8, pp. 1891–1904, Aug. 2010.
- [95] Y.-H. Liu, S. Sheelavant, M. Mercuri, P. Mateman, J. Dijkhuis, W. Zomagboguelou, A. Breeschoten, S. Traferro, Y. Zhan, T. Torf, C. Bachmann, P. Harpe, and M. Babaie, "9.3 A 680 μW Burst-Chirp UWB Radar Transceiver for Vital Signs and Occupancy Sensing up to 15m Distance," in 2019 IEEE International Solid-State Circuits Conference - (ISSCC), Feb. 2019, pp. 166–168.
- [96] P. Harpe, "A Compact 10-b SAR ADC With Unit-Length Capacitors and a Passive FIR Filter," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 3, pp. 636–645, Mar. 2019.
- [97] O. Eliezer, B. Staszewski, J. Mehta, F. Jabbar, and I. Bashir, "Accurate self-characterization of mismatches in a capacitor array of a digitally-controlled oscillator," in 2010 IEEE Dallas Circuits and Systems Workshop, Oct. 2010, pp. 1–4.

- [98] E. McCune, Practical Digital Wireless Signals, illustrated edition ed. Cambridge University Press, Feb. 2010.
- [99] C.-E. Sundberg, "Continuous phase modulation," *IEEE Communications Magazine*, vol. 24, no. 4, pp. 25–38, Apr. 1986.
- [100] R. Schreier and G. C. Temes, Understanding Delta-Sigma Data Converters, 1st ed. Piscataway, NJ: Hoboken, N.J.; Chichester: Wiley-IEEE Press, Nov. 2004.
- [101] Y. Liu, W. Rhee, and Z. Wang, "A 1Mb/s 2.86% EVM GFSK Modulator Based on ΔΣ BB-DPLL without Background Digital Calibration," in 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Aug. 2020, pp. 7–10.

# **List of Publications**

### **Journal Papers**

- Z. Gao, R. B. Staszewski, and M. Babaie, "Canceling Fundamental Fractional Spurs due to Self-Interference in a Digital Phase-Locked Loop," *IEEE Journal of Solid-State Circuits*. (Under Review)
- Z. Gao, M. Fritz, G. Spalink, R. B. Staszewski, and M. Babaie, "A Digital PLL-Based Phase Modulator with Non-Uniform Clock Compensation and Nonlinearity Predistortion," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 9, pp. 2526–2542, Sep. 2023. DOI: 10.1109/JSSC.2023.3270265. [IEEE Xplore link (Open Access)]
- Z. Gao, J. He, M. Fritz, J. Gong, Y. Shen, Z. Zong, P. Chen, G. Spalink, B. Eitel, M. S. Alavi, R. B. Staszewski, and M. Babaie, "A Low-Spur Fractional-N PLL Based on a Time-Mode Arithmetic Unit," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 6, pp. 1552–1571, Jun. 2023. DOI: 10.1109/JSSC.2022.3209338. [IEEE Xplore link (Open Access)]

### **Conference Papers**

- Z. Gao, M. Fritz, J. He, G. Spalink, R. B. Staszewski, M. S. Alavi, and M. Babaie, "A DPLL-Based Phase Modulator Achieving -46dB EVM with A Fast Two-Step DCO Nonlinearity Calibration and Non-Uniform Clock Compensation," 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Jun. 2022, pp. 14–15. DOI: 10.1109/VLSITechnologyandCir46769.2022.9830398. [IEEE Xplore link]
- Z. Gao, J. He, M. Fritz, J. Gong, Y. Shen, Z. Zong, P. Chen, G. Spalink, B. Eitel, K. Yamamoto, R. B. Staszewski, M. S. Alavi, and M. Babaie, "A 2.6-to-4.1GHz Fractional-N Digital PLL Based on a Time-Mode Arithmetic Unit Achieving -249.4dB FoM and -59dBc Fractional Spurs," 2022 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2022, vol. 65, pp. 380–382. DOI: 10.1109/ISSCC42614.2022.9731561. [IEEE Xplore link]

- Y. Hu, X. Chen, T. Siriburanon, J. Du, Z. Gao, V. Govindaraj, A. Zhu, and R. B. Staszewski, "17.6 A 21.7-to-26.5GHz Charge-Sharing Locking Quadrature PLL with Implicit Digital Frequency-Tracking Loop Achieving 75fs Jitter and -250dB FoM," 2020 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2020, pp. 276–278. DOI: 10.1109/ISSCC19947.2020.9063024. [IEEE Xplore link]
- Z. Gao, Y. Hu, T. Siriburanon, and R. B. Staszewski, "28 GHz Quadrature Frequency Generation Exploiting Injection-Locked Harmonic Extractors for 5G Communications," 2019 17th IEEE International New Circuits and Systems Conference (NEWCAS), Jun. 2019, pp. 1–4. DOI: 10.1109/NEWCAS44328.2019.8961293. [IEEE Xplore link]

## Summary

Reducing power consumption is becoming increasingly important for the sustainability of the communication industry, which is expected to consume a significant portion of global electricity as demands on the volume and rate of data transmission grow exponentially. At the level of the individual wireless device, reduced power consumption extends the lifetime of battery-powered devices, thereby improving user experience and enabling the development of innovative applications. The quest for lower power consumption will profoundly shape wireless transceiver design: each critical block in the system must keep reducing its power drain without sacrificing performance. Against this background, the thesis focuses on the phase-locked loops (PLLs) that generate RF clocks for wireless transceivers. It develops low-power techniques that suppress the fractional-spur levels when the PLL generates an unmodulated carrier, and the phase modulation (PM) error when the PLL additionally serves as a two-point modulator.

The PLL's fractional-spur issue is investigated at both the block and system levels. At the block level, we observe that the fractional spurs are dominated by the PLL's phase-error-extraction blocks, whose accuracy is degraded because they rely on imperfect time bases to cancel the deterministic time-offset pattern at the input of the phase detector. We therefore propose to use a 'golden' time base, i.e., the period of the digitally controlled oscillator's (DCO) output clock. To realize this strategy, we propose a universal time-signal processing circuit, the time-mode arithmetic unit (TAU). The TAU calculates the weighted sum of all its timestamp inputs, which makes it sufficient to extract the DCO phase error by processing certain timestamp differences from the PLL, i.e., the DCO period and the time difference between certain reference and DCO clock edges. A prototype TAU-based PLL exhibits a low level of fractional spurs, which is maintained under supply and temperature drift in the measurement. This validates the superiority of using the 'golden' time base to extract the PLL phase error.
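The weighted-sum idea can be illustrated with a small behavioral sketch. This is not the circuit described in the thesis: the function names, the ideal sawtooth offset model, and the numbers (`fcw = 40.25`, a 250 ps DCO period, a 2% time-base error) are assumptions chosen only for illustration. The sketch shows that cancelling the predicted offset with the DCO period itself leaves just the DCO timing error, whereas a mis-scaled time base leaves a residual pattern that repeats with the fractional part of the frequency control word.

```python
import math


def predicted_fraction(k: int, fcw: float) -> float:
    """Ideal offset from reference edge k to the next DCO edge, in DCO periods."""
    phase = k * fcw
    return math.ceil(phase) - phase


def weighted_time_sum(time_diffs, weights):
    """Weighted sum of time differences (a behavioral stand-in for the TAU)."""
    return sum(w * t for w, t in zip(weights, time_diffs))


def extract_phase_error(k, fcw, t_dco, time_base, dco_time_error=0.0):
    """Cancel the predicted offset pattern using `time_base` as the unit of time."""
    frac = predicted_fraction(k, fcw)
    # Offset seen by the phase detector: deterministic pattern plus DCO error.
    measured_offset = frac * t_dco + dco_time_error
    # Two inputs, two weights: +1 on the offset, -frac on the time base.
    return weighted_time_sum([measured_offset, time_base], [1.0, -frac])


if __name__ == "__main__":
    fcw, t_dco = 40.25, 250e-12  # hypothetical values
    for k in range(1, 5):
        golden = extract_phase_error(k, fcw, t_dco, time_base=t_dco)
        skewed = extract_phase_error(k, fcw, t_dco, time_base=1.02 * t_dco)
        print(k, golden, skewed)
```

In this model the residual with the skewed time base equals `-0.02 * frac * t_dco`, i.e., a periodic pattern of the kind that shows up as fractional spurs.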

At the system level, the PLL's fundamental fractional spurs can be raised by various self-interference sources, especially the in-band and DCO interference originating from the mutual coupling between the DCO and the reference clock circuitry. We first analyze the characteristics of these two types of interference and their impact on the fundamental fractional spurs. Based on two features of the self-interference, i.e., its sinusoidal pattern and its synchronicity with the predicted DCO phase, we develop a digitally intensive strategy that cancels the fundamental fractional spurs raised by DCO interference by injecting a well-designed in-band interference. This strategy is verified on-chip in a case where the DCO is significantly disturbed by a coupled reference clock. After applying the proposed strategy, the worst fundamental fractional spurs across fractional channels are suppressed by over 10 dB, demonstrating the effectiveness of this spur-cancellation strategy.
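The principle behind such a cancellation, that a sinusoidal disturbance synchronized with a known phase can be removed by adding a sinusoid of matching amplitude and opposite sign, can be sketched generically as follows. This is a textbook least-squares tone fit, not the calibration loop of the thesis; the function names, the fractional frequency `alpha = 1/64`, and the tone amplitude and phase are hypothetical, and the PLL's transfer functions are ignored.

```python
import math


def tone(amp, theta, alpha, k):
    """Sinusoidal disturbance locked to the predicted phase 2*pi*alpha*k."""
    return amp * math.sin(2 * math.pi * alpha * k + theta)


def fit_tone(samples, alpha):
    """Estimate a, b in a*sin(.) + b*cos(.); exact over an integer number of periods."""
    n = len(samples)
    a = 2.0 / n * sum(s * math.sin(2 * math.pi * alpha * k) for k, s in enumerate(samples))
    b = 2.0 / n * sum(s * math.cos(2 * math.pi * alpha * k) for k, s in enumerate(samples))
    return a, b


def cancel_tone(samples, alpha):
    """Subtract the fitted tone, i.e., inject an equal and opposite sinusoid."""
    a, b = fit_tone(samples, alpha)
    return [
        s - (a * math.sin(2 * math.pi * alpha * k) + b * math.cos(2 * math.pi * alpha * k))
        for k, s in enumerate(samples)
    ]


if __name__ == "__main__":
    alpha, n = 1.0 / 64, 640  # ten full periods of the fractional tone
    disturbed = [tone(1e-3, 0.7, alpha, k) for k in range(n)]
    residual = cancel_tone(disturbed, alpha)
    print(max(abs(x) for x in disturbed), max(abs(x) for x in residual))
```

Because the injected tone only needs an amplitude and a phase, two scalar parameters are enough to describe the cancellation signal in this simplified picture.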

Regarding the phase modulation error, this thesis explores techniques for improving the error vector magnitude (EVM) of a PLL-based phase modulator. Because a PLL-based modulator acquires the desired phase shift by integrating the modulating frequency over a sampling clock period, improving the phase modulation (PM) accuracy requires tackling the errors related to both the nonlinearity in frequency modulation (FM) and the non-uniformity of the sampling clock grid. The non-uniform sampling clock issue arises because the DCO's modulation frequency is updated by a clock generated by re-timing the reference clock to the phase-modulated DCO output. Consequently, the re-timed clock inherits some PM features and exhibits non-uniform characteristics that disturb the PLL, ultimately degrading the PM accuracy. To tackle this issue, a hybrid-time domain model is developed to analyze the clock-timing-related distortions, and a non-uniform clock compensation (NUCC) scheme is then proposed based on this model. The FM-related error is dominated by the nonlinear DCO. Compared with the existing DCO linearization techniques, which attempt to compensate all the nonlinearity sources by pre-distorting the oscillator tuning word (OTW), we consider the $1/\sqrt{LC}$-induced DCO nonlinearity separately and address it by pre-distorting the target modulating phase before it is denormalized to the OTW. This phase-domain digital pre-distortion (DPD) can improve PM accuracy without requiring preliminary knowledge of the physical parameters. Furthermore, combining the phase-domain and OTW-domain DPD techniques constitutes a carrier-frequency-insensitive DCO linearization strategy, reducing the associated calibration effort and channel-hopping time. Finally, a prototype digital-PLL-based phase modulator adopting the proposed strategies exhibits low EVM and low energy per bit in the measurement, proving the effectiveness of these techniques.
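The $1/\sqrt{LC}$ nonlinearity mentioned above can be made concrete with a short numerical sketch. It assumes an ideal LC tank whose capacitance decreases linearly with the tuning word; the inductance, base capacitance, and per-LSB capacitance below are hypothetical values, not parameters of the prototype. Under that assumption the frequency step per LSB is not constant but scales roughly with the cube of the oscillation frequency, since $\mathrm{d}f/\mathrm{d}C = -f/(2C)$ and $C \propto 1/f^2$.

```python
import math


def lc_frequency(l_henry, c_farad):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def dco_frequency(otw, l_henry=1e-9, c0=2e-12, c_lsb=1e-15):
    """Tank frequency when each OTW LSB removes c_lsb of capacitance (assumed model)."""
    return lc_frequency(l_henry, c0 - otw * c_lsb)


def frequency_step(otw):
    """Frequency change produced by one LSB at a given tuning word."""
    return dco_frequency(otw + 1) - dco_frequency(otw)


if __name__ == "__main__":
    for otw in (0, 100, 200):
        print(otw, dco_frequency(otw), frequency_step(otw))
    # The step ratio tracks the cube of the frequency ratio.
    print(frequency_step(200) / frequency_step(0),
          (dco_frequency(200) / dco_frequency(0)) ** 3)
```

In this idealized model the step size depends only on the operating frequency, which is consistent with treating this contribution separately from the other nonlinearity sources.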

# List of Figures

| 1.1 | System diagram and signal spectra illustrating how the RF clock spurs can impact the SNR of the received signal and the out-of-band emission of the transmitted signal. | 2  |
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.2 | Constellation diagram of (a) 4-QAM and (b) 16-QAM, illustrating the impact of the RF clock's phase error.                                                               | 3  |
| 1.3 | Contribution of the communication industry to the global electricity usage [2]                                                                                          | 5  |
| 1.4 | Transmitter architectures: (a) Cartesian transmitter and (b) polar transmitter.                                                                                         | 6  |
| 1.5 | Diagram of a conventional analog PLL                                                                                                                                    | 7  |
| 1.6 | Diagram of an analog PLL with DTC canceling the quantiza-<br>tion noise of the MMD.                                                                                     | 8  |
| 1.7 | Comparing (a) delay-chain DTC and (b) current DAC                                                                                                                       | 9  |
| 1.8 | Block diagram of a digital PLL, as a counterpart of Fig. 1.6.                                                                                                           | 10 |
| 1.9 | Diagram of a PLL-based frequency/phase modulator realized<br>by two-point modulating the digital PLL shown in Fig. 1.6.                                                 | 11 |
| 2.1 | Time offset cancellation strategies to narrow the required input<br>range of phase detector (PD) using (a) DTC and (b) voltage-                                         |    |
|     | domain cancellation.                                                                                                                                                    | 16 |
| 2.2 | Conceptual diagram of the proposed TAU-based PLL                                                                                                                        | 19 |
| 2.3 | Conceptual and timing diagrams of time register $(TR)$                                                                                                                  | 20 |
| 2.4 | Conceptual and timing diagrams of weighted time register (WTR)                                                                                                          | 21 |
| 2.5 | Conceptual and timing diagrams of differential weighted time registers (DWTR)                                                                                           | 22 |
| 2.6 | Conceptual and timing diagrams of time-mode arithmetic unit (TAU)                                                                                                       | 23 |
| 2.7 | RC tuning in the weighted time register (WTR)                                                                                                                           | 24 |

| 2.8  | Timing diagram of the differential WTRs in a complete TAU execution cycle. |
|------|----------------------------------------------------------------------------------------------------|
| 2.9  | Simplified diagram of the TAU-centered sub-system, and timing diagram of the state transition |
| 2.10 | Differential snapshot circuit: schematic and waveforms |
| 2.11 | Boundary cases of the metastability mitigation mechanism in the differential snapshot circuit |
| 2.12 | Schematic and waveform diagrams of the global FSM |
| 2.13 | Tri-mode PFD: the simplified diagram and waveforms |
| 2.14 | Single pulse-pair generation (SPPG) logic.                                                         |
| 2.15 | Simplified diagram of the local FSM                                                                |
| 2.16 | Waveforms of the local FSM                                                                         |
| 2.17 | Schematic of the implemented WTR                                                                   |
| 2.18 | Level-crossing slicer in the WTR: schematic and waveforms.                                         |
| 2.19 | Visualization of the equivalent discharge time accumulated on the differential WTRs |
| 2.20 | Implementation diagram of the RC encoder                                                           |
| 2.21 | Top-level diagram of the proposed PLL                                                              |
| 2.22 | Time-domain noise injected into the differential WTRs                                              |
| 2.23 | Jitter contributors of an SWD pulse-pair                                                           |
| 2.24 | Comparing the time error extracted by an ideal phase detector and that from a conceptual TAU |
| 2.25 | Characterization of the TAU's INL: (a) principle and (b) conceptually expected INL curve |
| 2.26 | INL curve of the TAU shaped by component mismatch |
| 2.27 | Simulated INL of TAU at the supply of 1 V and 1.1 V |
| 2.28 | (Equivalent) delay error under sinusoidal supply fluctuating between 1 V and 1.1 V |
| 2.29 | Foreground piecewise calibration for the INL of TAU |
| 2.30 | Illustration of the $FCW_{frac}$ impact on the foreground INL calibration |
| 2.31 | Simulation results of the TAU's INL calibrations                                                   |
| 2.32 | Micrograph and power breakdown of the fabricated chip                                              |
| 2.33 | Measured PN at 2668.2 MHz.                                                                         |
| 2.34 | Measured PN with the s-domain prediction                                                           |
| 2.35 | Measured rms jitter across the full frequency tuning range                                         |
| 2.36 | Measured PLL spectra under different calibration and environmental variation scenarios |
| 2.37 | Measured PN at a near-integer channel                                                              |

| 2.38 | The worst-case fractional spur level and the corresponding integrated rms jitter versus fractional FCW                                                                                                                                | 60  |
|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 3.1  | Block diagram (a) and phase domain model (b) of a type-II PLL.                                                                                                                                                                        | 65  |
| 3.2  | Waveform diagram of the key signals in the interference model in Fig. 3.1. The $\phi_{i,IB}$ and $\phi_{i,DCO}$ signal patterns are synchronous with the $\phi_{R,frac}$ sequence. Note, $\phi_{0,IB}$ and $\phi_{0,DCO}$ are some constant phase offsets | 67  |
| 3.3  | Waveforms illustrating how FREF can disturb the DCO phase.                                                                                                                                                                            | 69  |
| 3.4  | PLL's output spectrum with spurs raised by the interference coupled from FREF | 69  |
| 3.5  | Schematic and waveforms illustrating the supply-ripple-induced FREF delay. | 72  |
| 3.6  | PLL diagram emphasizing the details related to spur cancellation.                                                                                                                                                                     | 74  |
| 3.7  | PLL diagram emphasizing partitioning of the power domains.                                                                                                                                                                            | 76  |
| 3.8  | Chip micrograph.                                                                                                                                                                                                                      | 77  |
| 3.9  | CKV spectra (a) before and (b) after utilizing the LUT to suppress the in-band interference | 77  |
| 3.10 | Measured fundamental fractional-spur levels versus $FCW_{frac,s}$: (a) before and (b) after canceling the in-band interference with the LUT in Fig. 3.6. | 78  |
| 3.11 | Spectrum of the free-running DCO with spurs caused by FREF.                                                                                                                                                                           | 79  |
| 3.12 | Diagrams explaining the principle of FREF-delay-based method to cancel the spurs raised by self-interference | 80  |
| 3.13 | Measured fundamental fractional-spur level versus $FCW_{frac,s}$ after tuning FREF delay | 82  |
| 3.14 | Phasor diagram illustrating how the in-band interference designed for spur cancellation ($\vec{\phi}_{\rm SC}$) is fed-forward by the loop filter (as $\vec{\phi}_{\rm SC,ff}$) and then cancels with the DCO interference ($\vec{\phi}_{\rm DCO}$) | 84  |
| 3.15 | Diagram explaining the principle of searching for $\theta_{\rm SC,ff}$                                                                                                                                                                | 85  |
| 3.16 | Phasor diagram explaining the principle of searching for $A_{\rm SC}$                                                                                                                                                                 | 87  |
| 3.17 | Flow to determine the spur-cancellation content of the LUT (in Fig. 3.6), i.e., the waveform of $\vec{\phi}_{\rm SC}$ which is logically stored in the SC-LUT. | 88  |
| 3.18 | (a) Measured $\theta_{\rm PD}$-versus-$\lvert {\rm FCW}_{\rm frac,s} \rvert$ curve used for searching $\theta_{\rm SC,ff}$. (b) Convergence curve of $A_{\rm x}$ to determine $A_{\rm SC}$ | 91  |
| 3.19 | PLL's output spectra and phase noise profiles before (a) and after (b) applying the proposed spur cancellation technique. | 91  |

| 3.20                                    | Comparison of the worst fractional spur (a) and integrated jitter (b) versus $FCW_{frac,s}$ before and after applying the proposed spur cancellation technique. | 92   |
|-----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 4.1                                     | Block diagram of a digital polar transmitter                                                                                                                    | 97   |
| 4.2 | Discrete-time domain model of an ideal PLL-based phase modulator with a two-point modulation | 99 |
| 4.3 | Hybrid-time model of the DCO: (a) schematic and (b) waveforms. | 100 |
| 4.4 | Phase modulator with delay spread compensation: (a) waveforms and (b) block diagram | 101 |
| 4.5 | Phase modulator with the proposed non-uniform clock compensation (NUCC) | 103 |
| 4.6 | Waveforms showing the phase modulation error due to the non-uniform CKU period | 104 |
| 4.7                                     | Comparison of different $\Delta \phi'_{\rm V,E}$ correction strategies                                                                                          | 105  |
| 4.8 | Predicting $\phi'_{\rm S}$ by subtracting $\phi_{\rm R2S}$ from $\phi'_{\rm R}$, in the face of the non-uniform CKU. | 106 |
| 4.9 | Extracted open-loop representation in the direct-modulation path of the phase modulator | 107 |
| 4.10                                    | Pre-distorting DCO nonlinearity in different domains                                                                                                            | 109  |
| 4.11                                    | Simplified block diagram of the implemented phase modulator                                                                                                     | 111  |
| 4.12 | Implementation of NUCC with the calibration for the constant | 112 |
| 4.13 | Simulated EVM versus PLL bandwidth under different NUCC settings | 114 |
| 4.14                                    | Schematics of the DCO core and the control logic                                                                                                                | 115  |
| 4.15 | Behavioral description of the LUT with off-line calibration in Fig. 4.14 | 116 |
| 4.16 | Breaking down the $NT_{cnst}$ components | 118 |
| 4.17                                    | Micrograph and power breakdown of the fabricated chip                                                                                                           | 119  |
| 4.18 | Simulations of equivalent inductors of DCO's coil | 120 |
| 4.19 | | 121 |
| 4.20 | M-PSK signal generation and the setup for measuring the | 122 |
| 4.21 | Constellation diagram of a 60 Mb/s 64-PSK signal measured at $f_0 = 3188$ MHz | 124 |
| 4.22 | Measured phase noise at 3188 MHz under the same loop bandwidth setting as the EVM measurements in Fig. 4.21 | 125 |
| 4.23 | Measured spectrum of the RF output clock modulated with a 60 Mb/s (10 MSymbol/s) 64-PSK signal at the RF channel frequency of $3188 \div 8$ MHz. | 125 |

| 4.24         | Measured EVM versus fractional FCW and the $\Delta f_{\rm M}$ distribution correlated with the DCO's FM-INL | 126 |
|--------------|------------------------------------------------------------------------------------------------------------|-----|
| 4.25         | Measured EVM versus the DCO carrier frequency $(f_0)$ at different forward frequency division ratios $(K)$ | 127 |
| 4.26         | Measured transient trajectories of the calibration coefficients and EVM | 127 |
| 5.1          | System diagram of TAU-based PLL with MMD | 137 |
| 5.2          | interferences | 138 |
| A.1          | Schematic of the differential vernier TDC                                                                  | 140 |
| B.1          | Simplified schematic of the TAU's slicing comparator and the waveforms during output.                      | 142 |
## List of Tables

- 2.1 Comparison with state-of-the-art fractional-N PLLs (page 62)
- 4.1 Comparison with state-of-the-art PLL-based phase modulators (page 129)

## Acknowledgement

Time has passed in the blink of an eye. It feels like just yesterday that I arrived in the Netherlands to embark on my PhD journey at TU Delft. This path has been a long and enriching one, filled with adventures and moments of pure joy. Now, on the brink of reaching the destination, I reflect on the many individuals who have supported me along the way. I am deeply grateful for their invaluable contributions to this incredible journey.

First of all, I would like to express my deep gratitude and appreciation to my promotor and advisor, Professor Robert Bogdan Staszewski, for the guidance, support, and encouragement throughout my Ph.D. journey. I started my Ph.D. career in your group in Ireland. Realizing my inclination towards more freedom in research, you recommended me for this new Ph.D. project at TU Delft. I am indebted to you for providing me with this invaluable opportunity to engage in research alongside esteemed colleagues from academia and seasoned experts from the industry. Over the duration of this project, I have appreciated the autonomy I was granted in selecting research topics and forging pathways. However, the meticulous scrutiny of each research step, focusing on both innovation for publication and robustness for mass production, has been a challenging aspect of this experience. Fortunately, during these challenging moments, your visionary outlook and boundless enthusiasm for research have always guided me. I highly regard the depth of insight I have gained under your supervision and cherish the invaluable lessons learned from you.

I would like to extend my sincere gratitude to my promoter and daily supervisor, Dr. Masoud Babaie. Your dedication in thoroughly reviewing my progress and presentations before our monthly update meetings with industry collaborators has significantly enhanced the effectiveness of our communication and helped in averting potential issues. Despite the aggressive project management approach you adopt, which has undoubtedly introduced additional challenges and pressure into my Ph.D. journey, these experiences have played a crucial role in my growth and preparedness for future challenges. Thank you for your unwavering support and guidance. I would also like to thank Dr. Morteza Alavi, another invaluable supervisor in my project. Thank you for your invaluable suggestions over the past five years, which have elevated various aspects including my slides, presentations, and papers. Additionally, I appreciate your consistent encouragement, no matter how small the progress may have seemed.

My Ph.D. project is sponsored by Sony Europe B.V., Stuttgart, Germany and Sony Semiconductor Solutions, Atsugi, Japan. The expertise and assistance I received from the experts there were invaluable. I want to express my deepest gratitude to Martin Fritz. He dedicated countless hours to providing constructive suggestions for enhancing my presentation skills, enabling me to conduct smoother update meetings with Sony. Without his assistance in data processing and his efforts in streamlining the management processes, I would not have been able to complete this project as swiftly. I also extend my thanks to Ken Yamamoto for his in-depth technical inquiries, which have prompted continual self-reflection and refinement of my designs. Furthermore, I appreciate his willingness to forgo co-authorship, facilitating the timely submission of my conference and journal papers. I am grateful to Gerd Spalink for engaging in enriching technical discussions, and I would like to acknowledge Ben Eitel for addressing critical management issues.

I would like to thank the members of my doctoral examination committee: Prof. Vaucher, Prof. Nauta, Prof. Levantino, Dr. Yamamoto, Dr. Liu, and Prof. de Vreede, for their meticulous review of the dissertation, insightful suggestions for enhancing the manuscript, and generous investment of their valuable time. My gratitude also goes to other professors in the Microelectronics Department of TU Delft: Dr. Spirito, Dr. Gao, and Dr. Muratore.

Our supporting staff also deserve praise and applause. Their help and support have made my Ph.D. research work smoother. I sincerely thank Atef Akhnoukh for his exceptional patience and unwavering commitment in assisting with the tapeouts. I am also thankful to Juan Bueno Lopez for his assistance in setting up measurements, to Zu-yao Chang for wire bonding of my chips, and to Lukasz Pakula for taking micrographs of my chips. Additionally, I appreciate Antoon Frehe for providing highly efficient CAD software and IT support. I would also like to express my thanks to Marion de Vlieger, our dedicated group secretary, for her administrative support.

I want to express my heartfelt gratitude to all my colleagues and friends at TU Delft. A special mention goes to Jiang Gong and Yue Chen for their technical discussions that transformed me from a novice in PLL design to an expert. Great thanks to Jingchu He for sharing the project pressures and fighting with toutuos together. Without her collaboration and support, I might have considered stepping away from this project in its early stages. Many thanks to Linghan Zhang and Bolin Chen. Having them as roommates during the lockdown imposed by the Corona pandemic was a stroke of luck. Their unwavering support, both in technical matters and in daily life, along with their companionship, brought light to those otherwise gloomy days. Thanks to Yiyu Shen. His guidance in digital backend, our technical discussions, and the wealth of information he shared have been invaluable to me. Thanks to Zhiri Zong and Mohammad Reza Beikmirza for designing and implementing circuit blocks for my tapeouts. Their assistance was indispensable in meeting crucial deadlines. Thanks to Ying Wu, whose insights into the significance of time registers left a lasting impression on me and marked the inception of a key innovation in my Ph.D. project. Thanks to Mohammad Ali Montazerolghaem and Mohsen Mortazavi for their guidance in setting up my accounts on my very first day at TU Delft, and for their continuous support since then. Thanks to Jordi van der Meulen for sharing intriguing facets of Dutch life, enriching my experience here. I also want to thank Lei Zhou, Gagan Singh, Masoud Pashaeifar, Lunan Gu, Lianbo Liu, Rishabh Gurbaxani, Ehsan Shokrolahzade, Anil Kumar Kumaran, Amir Arsalan Kiavar, Alireza Ghafari, Dimitris Verroiopoulos, Haris Charalampidis, Rob Bootsman, Dieuwert Mul, Richard Coesoij, Simon Verkleij, Visweswaran Karunanithi, Satoshi Malotaux, Huanqiang Duan, Jun Feng, Martijn Hoogelander, Niels Fakkel, Carmine De Martino, Nawaf Almotairi, Mingliang Tan, Dapeng Sun, Wenjun Yang, Jin Yan, Yutong Wang, Yingwen Zhao, Nan Bai, Diyun Yuan and many others that I forgot to mention.

I wish to thank my friends in China whose technical prowess and unwavering support were instrumental in the ultimate success of my Ph.D. project. A special thanks goes to Minglei Zhang for imparting his expertise in TDC design and introducing me to the concept of time amplification. I am deeply appreciative of Shirui Zhao and Hui Zhang for our insightful discussions on digital design, particularly the invaluable insights shared regarding clock-gating techniques, which ultimately proved pivotal in salvaging my chip. Thanks to Yan Zhao and Linchao Han for their help in the digital backend. Thanks to Xiaoping Dong for generously sharing useful scripts that accelerated layout implementation. Thanks to Haitao Meng, Huashuo Zhang, and Renjie Zhou for discussions on measurement debugging. Thanks to Xiang Guan for his invaluable suggestions on pursuing a Ph.D., as well as for his recommendation that helped secure my Ph.D. position.

I wish to thank my friends in Ireland, where I started my Ph.D. life. Special thanks to Peng Chen and Feifei Zhang for technical discussions, mental support during the depressing moments, treating me to meals each time they visited Delft, and sharing useful information when I searched for jobs. Thanks to Suoping Hu and Zhao Chen for showing me around Dublin. Thanks to Dawei Mai for his companionship during our teaching-assistant work. Thanks to Chen Ling for the tour of the UCD campus. I also want to thank Teerachot Siriburanon, Xi Chen, Jianglin Du, Kai Xu, Yizhe Hu, Zhenyu Ren, Hongying Wang, Viet Anh Nguyen, Hieu Minh Nguyen, Amir Bozorg, Ali Esmailiyan, Tian Meng, Zhongzheng Wang, and many others that I forgot to mention.

I embarked on my IC design career at the Chinese Academy of Sciences under the guidance of Professor Yuepeng Yan. I am sincerely grateful to him for introducing me to this captivating field. I extend my thanks to Chang Liu and Guiliang Guo for their invaluable supervision during my time there. Additionally, I want to express my appreciation to my former colleagues at the Chinese Academy of Sciences: Yulin Zhang, Xu Cheng, Hua Chen, Zhengyu Sun, Yuzhe Liu, Jinwang Zheng, Yu Jiang, Jingyu Han, Tao Yang, Chao Luo, Shengyou Liu, Jiang Fei Guo, and Rongjiang Liu.

Lastly, I reserve my deepest gratitude for my beloved parents, Baba and Mama, whose unwavering love and encouragement have been the cornerstone of my journey. They have selflessly stood by every significant decision I've made, even if it meant incurring potential challenges in their own lives. Special thanks to my uncles and aunts for stepping in to care for my parents during times of illness, easing my guilt over being absent in those critical moments. I am also grateful to my cousins for their frequent visits to my mother, providing companionship and alleviating her sense of loneliness, particularly during the festivals when I was abroad. Their presence made a significant difference in her well-being.

> Zhong Gao Oct. 2023 Hangzhou, China

## Chip Micrograph Gallery



A Digital Polar TX Using A PLL-Based Phase Modulator ISSCC 2022, VLSI 2022, JSSC 2022, JSSC 2023



**Revised Version of the Digital Polar TX** 

## About the Author



Zhong Gao received the B.Sc. degree in Physics from Shandong University, Jinan, China, in 2011, and the M.Sc. degree in Microelectronics and Solid State Electronics from University of Chinese Academy of Science, Beijing, China, in 2014. Since January 2019, he has been pursuing the Ph.D. degree in Microelectronics at Delft University of Technology, Delft, The Netherlands. Before joining TU Delft, he worked on wireless transceiver design at Altobeam Inc., Beijing, China. His current research interests include mixed-signal IC and RF transceiver system design.