# A 200 µA Duty-Cycled PLL for Wireless Sensor Nodes in 65 nm CMOS

Salvatore Drago, Student Member, IEEE, Domine M. W. Leenaerts, Fellow, IEEE, Bram Nauta, Fellow, IEEE, Fabio Sebastiano, Student Member, IEEE, Kofi A. A. Makinwa, Senior Member, IEEE, and Lucien J. Breems, Senior Member, IEEE

**Document Version** Accepted author manuscript. **Publication date** 2010. Published in IEEE Journal of Solid State Circuits. DOI 10.1109/JSSC.2010.2049458

Citation (APA): Drago, S., Leenaerts, D. M. W., Nauta, B., Sebastiano, F., Makinwa, K. A. A., & Breems, L. J. (2010). A 200 μA Duty-Cycled PLL for Wireless Sensor Nodes in 65 nm CMOS. *IEEE Journal of Solid State Circuits*, *45*(7), 1305-1315. https://doi.org/10.1109/JSSC.2010.2049458

#### **Abstract**

The design of a Duty-Cycled PLL (DCPLL) capable of burst-mode operation is presented. The proposed DCPLL is a moderately accurate, low-power, high-frequency synthesizer suitable for use in nodes for Wireless Sensor Networks (WSN). Thanks to a dual-loop configuration, the PLL's total frequency error, once in lock, is less than 0.25% from 300 MHz to 1.2 GHz. It employs a fast start-up DCO, which enables operation at duty-cycles as low as 10%.
Fabricated in a baseline 65-nm CMOS technology, the DCPLL circuit occupies $0.19 \times 0.15 \text{ mm}^2$ and draws $200\ \mu\text{A}$ from a 1.3-V supply when generating bursts of a 1 GHz signal with a 10% duty-cycle.

**Keywords:** CMOS, Duty-Cycle, PLL, frequency stability, ultra-low power, wireless sensor networks, WSN, fully integrated, frequency synthesizer.

This work is funded by the European Commission in the Marie Curie project TRANDSSAT - 2005-020461.

- S. Drago, F. Sebastiano, L. J. Breems and D. M. W. Leenaerts are with NXP Semiconductors, Eindhoven, The Netherlands, Email: salvatore.drago@nxp.com.
- K. A. A. Makinwa is with the Electronic Instrumentation Laboratory/DIMES, Delft University of Technology, Delft, The Netherlands.
- B. Nauta is with the IC Design Group, CTIT Research Institute, University of Twente, Enschede, The Netherlands.

#### I. Introduction

Energy autonomy and form factor are two critical concerns for emerging sensor platforms, particularly for applications based on Wireless Sensor Networks (WSN) [1]. The limited energy from power sources such as micro-fabricated batteries or energy scavengers remains one of the biggest challenges for such systems. Reducing the power consumption of WSN nodes will extend their lifetime, lower the battery size and, consequently, reduce their volume.

As for other radio communication systems, high-frequency synthesizers are essential blocks of WSN nodes. The current state of the art of such synthesizers is illustrated in Fig. 1. Conventional PLLs are robust to frequency offset and frequency drift because they are locked to a stable reference. Their inaccuracy is then mainly determined by oscillator phase noise and by other sources of in-band noise. Although PLLs can achieve inaccuracies of a few ppm, this is associated with stringent phase noise and jitter requirements [2], [3] and leads to relatively high power consumption. Such PLLs are not suitable for use in WSN nodes.
To address this problem, various architectures with relaxed phase noise and accuracy specifications have been proposed to reduce power consumption. In [4] and [5], a free-running, but periodically calibrated, digitally controlled oscillator (DCO) is employed. This approach is extremely low power, but its accuracy is limited to a few percent due to the large and unpredictable frequency drift caused by supply voltage and temperature variations.

A node in a WSN typically spends the largest fraction of time in idle mode [6]. The energy wasted while idling can be significantly reduced by switching off unused parts of the system. This suggests the use of Duty-Cycled PLLs (DCPLLs) in WSN nodes, i.e. PLLs which are operated in burst mode [7]. The output of a DCPLL consists of short bursts of high-frequency signals separated by long idle periods, during which energy is saved. The resulting lower power dissipation of DCPLLs makes them much more suitable for WSN nodes.

Since DCPLLs are not active continuously, they are prone to frequency offset and so they are less accurate than conventional PLLs. The inaccuracy of 0.25% targeted in this work is enough to meet the requirements of WSN applications [5], [6]. Although DCPLLs dissipate more power than simple free-running DCOs, they are more accurate and less prone to frequency drift due to their closed-loop nature. However, they require special architectures to ensure loop stability, and fast start-up circuitry to avoid extra power consumption during the transitions from idle to active periods. Fast start-up circuitry enables the use of low duty-cycle ratios, which translates into low average power consumption.

The objective of this work is to design a frequency synthesizer capable of burst operation while keeping the frequency error due to offset and to the DCO noise below 0.25%.
The proposed DCPLL can be operated at low duty-cycle ratios, since it employs a fast start-up DCO, resulting in a highly energy-efficient synthesizer which enables energy-autonomous WSN nodes. The generated frequency ranges from several hundreds of MHz to more than 1 GHz. Theoretical analysis and experimental validation of this approach are provided, demonstrating that a frequency inaccuracy of better than 0.25% can be achieved while maintaining a power consumption of only a few hundred $\mu$W.

The architecture of the DCPLL is presented in section II along with a stability analysis; circuit description and fast start-up strategies are discussed in section III; experimental results are shown in section IV and conclusions are drawn in section V.

### II. DUTY CYCLED PHASE LOCKED LOOP (DCPLL)

#### A. DCPLL architecture

In order to enable burst-mode operation, an All-Digital PLL is preferred over a conventional analog PLL based on a phase frequency detector and a charge pump. This is because the DCO's digital control word (DCW), which represents its frequency, can then be stored in a memory, allowing frequency tracking between two successive bursts.

A simplified block diagram of the proposed DCPLL is shown in Fig. 2. Its main loop consists of a DCO, a counter, an accumulator (ACC1) and one digital subtractor (S1). A second fine tuning loop increases the accuracy of the output frequency, as explained in the next subsection. Both loops are controlled in an efficient manner by a finite state machine (FSM). The DCO consists of a current-controlled ring oscillator and a 16-bit digital-to-analog converter (DAC) segmented in two banks: one 7-bit bank for coarse frequency acquisition and one 9-bit bank for fine tuning. The use of two different banks relaxes the requirements of the DAC, resulting in area saving and reduced complexity [3].

As shown in the timing diagram of Fig. 3, a reference clock with a frequency REF drives the FSM, which generates the control signals for the DCO, the counter and the accumulators. The DCO is periodically turned on and off, while the two loops ensure that its frequency is locked to REF. After a sleep time of N-1 reference clock cycles, the DCO is started up and allowed to run for only one reference clock cycle $T=\frac{1}{REF}$. The DCO drives the counter, which is reset before each burst generation. In doing so, the counter detects the number of DCO rising edges that occur during the reference clock cycle. The resulting integer is stored in the registers of the counter and is compared with the desired frequency control word (FCW) by the digital subtractors. The resulting error signals $\epsilon_{coarse}$ and $\epsilon_{fine}$ update the DCWs stored in the two accumulators.

The DCW update is delayed by one reference cycle T. Since T is large compared to the counter's and subtractor's delays, there is enough time margin for a proper error estimation. This strategy makes it possible to implement the counter and the digital subtractor as a simple asynchronous D-FF-based counter and a full-adder-based subtractor, respectively. This leads to a significant power saving with respect to synchronous counters and charge-pump-based phase frequency detectors. Moreover, thanks to the burst-mode operation, the large delay T in the DCW update does not affect the DCPLL's dynamics. As will be explained in the next section, a short preset period is used to speed up the DCO's start-up.

#### B. Coarse Acquisition Main Loop Dynamics

The dynamics of the coarse acquisition main loop can be analyzed by considering it as a discrete-time system, where the sampling operation is determined by the rising edge of the reference clock that triggers the burst generation, which occurs once every $N$ clock cycles. In the following analysis the delays of each block, including the FSM, are ignored.
This assumption is valid if the reference clock period is larger than the total delay introduced by the digital gates. This condition is well satisfied in the DCPLL implementation, since the reference frequency is several times smaller than the generated high-frequency output signal. The response can be formulated in terms of the output frequency $F_0$ and the input frequency REF.

A block diagram model of the coarse acquisition loop is represented in Fig. 4. The output frequency for the $i^{th}$ burst, $F_0(i)$, is given by:

$$\begin{aligned} F_{0}(i) &= K_{DCO} \cdot DCW(i) + F_{offset} \\ &= K_{DCO} \cdot [DCW(i-1) + \epsilon_{coarse}(i-1)] + F_{offset} \\ &= F_{0}(i-1) + K_{DCO} \cdot \epsilon_{coarse}(i-1) \end{aligned} \tag{1}$$

where $K_{DCO}$ is the DCO gain (MHz/bit), $F_{offset}$ is the DCO offset and $\epsilon_{coarse}(i)$, defined as the $i^{th}$ burst's frequency error, is given by:

$$\epsilon_{coarse}(i) = FCW - C(i) \tag{2}$$

$C(i)$ represents the counter's output, i.e. the integer number of rising DCO edges which fall in one reference clock period T in the $i^{th}$ burst. As shown in Fig. 5, the integer $C(i)$ can be expressed as the sum of the fractional number of DCO periods $T_{DCO}$ contained in one reference clock period T, represented by $\frac{T}{T_{DCO}}$, and the quantization error $\epsilon_q$. Thus, $C(i)$ is equal to:

$$C(i) = \frac{T}{T_{DCO}(i)} + \epsilon_q(i) = \frac{F_0(i)}{REF} + \epsilon_q(i) \tag{3}$$

where $\epsilon_q(i) \in [0,1)$. By combining Eq. (1), Eq. (2) and Eq. (3), the following closed-loop finite difference equation can be derived:

$$F_0(i) = K_{DCO} \cdot FCW + F_0(i-1) \cdot \left[ 1 - \frac{K_{DCO}}{REF} \right] - K_{DCO} \cdot \epsilon_q(i-1) \tag{4}$$

If the coarse DCO gain $K_{DCO}$ is constant and known, it is possible to predict the exact dynamics of the coarse acquisition loop. However, theoretical considerations on Eq. (4) can be drawn easily only under the hypothesis that the quantization error $\epsilon_q$ is very small and negligible.
In this case, the system is stable if the pole, located at $1 - \frac{K_{DCO}}{REF}$, falls inside the unit circle, i.e. if $0 < \frac{K_{DCO}}{REF} < 2$. In the general case, when the term $\epsilon_q$ is large, the stability condition is difficult to predict on a theoretical basis. $\epsilon_q$ is, in fact, an implicit function of $F_0$, and Eq. (4) becomes non-linear.

In order to find a simple condition for stability, a numerical approach has been used. Simulation results based on Eq. (4) are summarized in Fig. 6, which shows the normalized DCPLL's step response for different values of $\frac{K_{DCO}}{REF}$. For $0 < \frac{K_{DCO}}{REF} \le 1$ the system is always stable and its step response is overdamped. The DCPLL settles in one step when $\frac{K_{DCO}}{REF} = 1$. A particular behaviour is observed when $1 < \frac{K_{DCO}}{REF} < 2$. In this case, the system is stable only if the programmed DCPLL's output frequency $F_0$ is close to one of the possible DCO's free-running frequencies:

$$|K_{DCO} \cdot DCW - FCW \cdot REF| \ll REF \tag{5}$$

Under this condition the response is underdamped and it converges asymptotically to the programmed frequency. However, as shown in Fig. 6 (b), if $1 < \frac{K_{DCO}}{REF} < 2$ and, for any DCW, the DCO's frequency differs from the output frequency $F_0$ by more than REF [see Eq. (6)], the output will oscillate around the target frequency with a large quantization error.

$$|K_{DCO} \cdot DCW - FCW \cdot REF| > REF \tag{6}$$

Finally, the DCPLL is always unstable if $\frac{K_{DCO}}{REF} > 2$. In conclusion, the DCPLL is unconditionally stable if the following stability condition is satisfied:

$$K_{DCO} \le REF \tag{7}$$

When locked, $F_0(i) = F_0(i-1) = F_0$ and the integer number of DCO rising edges between two reference edges is equal to the programmable FCW. The DCO has a duty-cycle of 1/N and the DCPLL's output frequency $F_0$ is:

$$F_0 = FCW \cdot REF + \Delta F_{q,coarse} \tag{8}$$

with $\Delta F_{q,coarse} = K_{DCO} \cdot \epsilon_q$ falling in the range [0, REF).
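
The coarse-loop recursion of Eq. (1) to Eq. (3) can be explored numerically with a short script. The following is only a minimal sketch under stated assumptions, not the simulation used for Fig. 6: the counter is modelled as a ceiling operation (which corresponds to $\epsilon_q \in [0,1)$ in Eq. (3)), $K_{DCO}$ is taken as constant, the subtractor is a plain one without any dead-band, and the function names, the 20 MHz reference, FCW = 50 and the 300 MHz starting frequency are illustrative choices.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0 (also valid for negative a)."""
    return -(-a // b)


def simulate_coarse_loop(k_dco_hz, ref_hz, fcw, f_start_hz, n_bursts=40):
    """Return the burst-by-burst output frequency of the coarse loop.

    Hypothetical helper: all frequencies are integers in Hz so that the
    edge count is exact. K_DCO is assumed constant over the tuning range.
    """
    f = f_start_hz
    trace = [f]
    for _ in range(n_bursts):
        count = ceil_div(f, ref_hz)  # Eq. (3): C = F0/REF + eps_q, eps_q in [0, 1)
        error = fcw - count          # Eq. (2): coarse error
        f = f + k_dco_hz * error     # Eq. (1): frequency update for the next burst
        trace.append(f)
    return trace


if __name__ == "__main__":
    REF = 20_000_000  # assumed reference frequency in Hz
    for ratio_percent in (50, 100, 250):  # K_DCO / REF = 0.5, 1.0, 2.5
        k = REF * ratio_percent // 100
        trace = simulate_coarse_loop(k, REF, fcw=50,
                                     f_start_hz=300_000_000, n_bursts=10)
        print(ratio_percent / 100, [f // 1_000_000 for f in trace])
```

Under these assumptions the loop settles in a single step for $K_{DCO} = REF$, approaches the target monotonically for $K_{DCO} = REF/2$ and stops within one REF of $FCW \cdot REF$, and diverges for $K_{DCO} = 2.5 \cdot REF$, in line with the stability regions described above.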
While the reference clock frequency is known, the parameter $K_{DCO}$ is process-technology dependent and it behaves nonlinearly with respect to the digital control word DCW. This will cause the dynamics to vary around the design target. As will be explained in section III, current-controlled delay lines in closed loop can be used to implement a DCO with a fast start-up time. Fig. 7 shows an example of its output frequency as a function of DCW. The frequency can change over a broad range, but it is nonlinear with respect to DCW. As the operating frequency is reduced, $K_{DCO}$ becomes larger, which causes the frequency quantization error to increase. This behavior is undesirable because the stability of the loop can be affected at lower frequencies, which in turn constrains the operating frequency range. Thus, the DCO has to be carefully designed in order to ensure the stability condition of Eq. (7) for each value of DCW, especially for low frequencies where $K_{DCO}$ is larger. For a given tuning range, the stability condition can be ensured by increasing the resolution of the coarse frequency acquisition bank in order to reduce the DCO gain $K_{DCO}$.

#### C. Fine Tuning Secondary Loop

Conceptually, a single loop performing the coarse frequency acquisition is sufficient to reach the steady-state condition. Fig. 8 (a) shows a typical coarse acquisition steady-state condition, where the DCO's output frequency is close to the programmed frequency $F_0 = FCW \cdot REF$. The $(FCW + 1)^{th}$ DCO rising edge may be delayed by $\Delta t_{coarse} = \epsilon_q \cdot T_{DCO}$ with respect to the reference rising edge. This results in an error in the generated frequency which can be as high as REF. Significantly better performance can be achieved if, in conjunction with the main loop, which handles the coarse frequency acquisition, an additional loop is employed for fine frequency tuning.

As depicted in Fig. 8 (b), a small increase $\Delta f_{fine}$ of the DCO's frequency advances all the DCO's rising edges by small time steps. The last DCO edge is advanced by a time interval $\Delta t_{fine}$ given by:

$$\Delta t_{fine} \simeq \frac{\Delta f_{fine}}{REF} T_{DCO} \tag{9}$$

Before each burst generation, the fine tuning loop increases the DCW by one least significant bit (LSB), increasing the DCO frequency by a small step $\Delta f_{fine}$, until the $(FCW+1)^{th}$ DCO edge just leads the reference clock edge. From this point on, the fine tuning loop increases or decreases the DCW by 1 LSB depending on whether the $(FCW+1)^{th}$ DCO edge lags or leads the reference clock edge. Burst by burst, the frequency then varies by $\pm \Delta f_{fine}$ and so the last DCO edge jumps backward and forward around the reference clock edge. While the main loop controls the number of rising edges that occur between two successive reference clock edges, the fine tuning loop decreases the delay between the last DCO rising edge and the reference clock edge. The total error is reduced and the accuracy is improved (Fig. 8 (b)).

Notice that the coarse and the fine tuning loops adjust only the centre frequency of the bursts. However, since each burst is generated synchronously every N reference cycles, the DCO's initial phase is locked to the reference phase. Moreover, the last DCO period is also locked to the reference clock thanks to the bang-bang operation. Thus, the combination of the two loops together with the duty-cycling operation transforms the system into a Phase Locked Loop.

The quantization error in the frequency generated by the proposed dual-loop configuration is reduced to $\Delta f_{fine}$. This error can be minimized by increasing the resolution of the DCO's fine tuning bank. However, in a low-power implementation, the quantization noise is lower than the DCO's phase noise, which is determined by the total power available.
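
The bang-bang behaviour of the fine tuning loop and the edge shift of Eq. (9) can be illustrated with a small behavioural model. This is a sketch under stated assumptions rather than the implemented logic: the DCO is treated as noiseless, the first rising edge is taken to coincide with the start of the reference period, the fine gain is fixed at one LSB per burst, and the 100 kHz step, the 999 MHz starting frequency and the function names are hypothetical.

```python
import math


def fine_time_shift(df_fine_hz, ref_hz, f_dco_hz):
    """Eq. (9): advance of the last DCO edge caused by a step df_fine."""
    t_dco = 1.0 / f_dco_hz
    return (df_fine_hz / ref_hz) * t_dco


def bang_bang_fine_loop(f_start_hz, ref_hz, fcw, df_fine_hz, n_bursts=60):
    """Burst-by-burst frequency under a 1-LSB bang-bang update.

    Frequencies are integers in Hz. With the first edge at t = 0, the
    (FCW+1)-th edge occurs at FCW / f, so it lags the reference edge at
    1 / REF exactly when f < FCW * REF.
    """
    target = fcw * ref_hz
    f = f_start_hz
    trace = []
    for _ in range(n_bursts):
        lags = f < target
        # Lagging edge: speed the DCO up by one LSB; otherwise slow it down.
        f += df_fine_hz if lags else -df_fine_hz
        trace.append(f)
    return trace


if __name__ == "__main__":
    trace = bang_bang_fine_loop(999_000_000, 20_000_000, 50, 100_000)
    print(sorted(set(trace[-10:])))
    print(fine_time_shift(100_000, 20e6, 1e9))
    assert math.isclose(fine_time_shift(100_000, 20e6, 1e9), 5e-12)
```

In this idealized model the frequency ramps up one LSB per burst and then toggles between the target and one step below it, so the residual error is bounded by $\Delta f_{fine}$. In the measured circuit the DCO's phase noise is larger than this step, so the toggling is not visible as a systematic pattern.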
In the current design, $\Delta f_{fine}$ has been chosen low enough to make the quantization noise negligible with respect to the phase noise. When only thermal noise is considered, the DCO's relative period jitter $\frac{\sigma_{noise}}{T_{DCO}}$ can be expressed as a function of the SSB phase noise PSD $L(f)$ at frequency offset $f$ and the DCO frequency $F_0$ [8]:

$$\frac{\sigma_{noise}}{T_{DCO}} = \sqrt{\frac{L(f)}{F_0}} \cdot f \tag{10}$$

The uncertainty of the $(FCW + 1)^{th}$ edge due to the phase noise accumulated after FCW periods is:

$$\frac{\sigma_{noise,FCW+1}}{T_{DCO}} = \sqrt{\frac{L(f) \cdot FCW}{F_0}} \cdot f \tag{11}$$

The quantization noise is negligible with respect to the phase noise if the following condition holds:

$$\frac{\Delta t_{fine}}{T_{DCO}} \simeq \frac{\Delta f_{fine}}{REF} \ll \frac{\sigma_{noise,FCW+1}}{T_{DCO}} \tag{12}$$

By combining Eq. (11) and Eq. (12), and noting that $\frac{FCW}{F_0} \simeq \frac{1}{REF}$, it can be concluded that, for the error due to the quantization noise to be negligible, $\Delta f_{fine}$ should satisfy the following condition:

$$\Delta f_{fine} \ll \sqrt{L(f) \cdot REF} \cdot f \tag{13}$$

As noted in the previous subsection, thanks to the delay introduced in the DCW update, the DCPLL does not require a power-hungry bang-bang phase detector but only simple logic circuits implementing a digital subtractor [9]. A modified subtractor has been used in order to realize the bang-bang operation. Fig. 9 shows the implemented combined transfer characteristic of the counter and the subtractor for the coarse acquisition and fine tuning loops. In the transfer characteristic of the coarse acquisition loop, the horizontal dead-band has been extended from the range [0,1), typical for a conventional subtractor, to the range [-1,1). This is equivalent to saying that the subtractor produces a null error signal $\epsilon_{coarse}$ when the integer number of DCO edges falling into one clock cycle is equal to FCW or to FCW-1.
This avoids changes in the coarse frequency bank when the DCO's frequency is close to the desired frequency. In order to realize the bang-bang operation in the fine tuning loop, a vertical dead-band is implemented in its transfer characteristic. This ensures that the fine tuning bank is continuously modified in order to change the DCO's frequency by small steps around the programmed frequency in a bang-bang fashion.

Finally, the fine tuning dynamics are adjusted based on whether the system is in the acquisition or in the steady-state tracking mode. In doing so, both a faster PLL settling time and an accurate frequency output can be achieved. By means of the bandwidth control block, the gain in the fine loop can be modified to achieve an adaptive bandwidth. Fig. 19 shows the simulated settling of the coarse and fine tuning values during the frequency acquisition. Initially only the coarse tuning is operative. When the coarse acquisition loop produces a null error $\epsilon_{coarse}$, the secondary fine tuning loop is activated and the gain is automatically reduced until the 'bang-bang' steady-state condition is reached. If the fine tuning accumulator overflows, the coarse acquisition bank is modified by one LSB. To ensure proper functionality, the fine tuning range is larger than 2 coarse LSBs, realizing a segmented but overlapping DCO transfer characteristic.

#### III. DCO

The proposed DCPLL can work only with a fast start-up DCO whose output frequency can settle well within a short reference clock period T. Ring oscillators start up faster than LC oscillators, which require approximately Q periods to reach steady state, where Q is the quality factor of the LC tank [10]. Additionally, if phase noise is not the main requirement, ring oscillators require less power than LC oscillators [5]. Finally, since the DCO will be turned off for a significant fraction of time, its static power consumption in idle mode should be very low.
These considerations motivate the use of the ring oscillator shown in Fig. 10. It consists of four delay stages in a closed loop and an R/2R ladder current DAC. Each delay stage uses a pseudo-differential architecture. The frequency is controlled by the complementary voltages $V_p$ and $V_n$ at the gates of PMOS $M_1 - M_4$ and NMOS $M_5 - M_8$, which are stored on the two large gate capacitors $C_p$ and $C_n$. The fast start-up behaviour of the DCO is achieved by adopting a preset phase, implemented by means of the switches $s_5 - s_8$, which precedes the start-up moment controlled by the switches $s_1 - s_4$.

Fig. 10 also illustrates the timing diagram of the switches $s_1 - s_8$. During the idle state, the switches $s_1$ and $s_2$ are connected to $V_{dd}$ and ground, respectively, while the final stage of the delay line is disconnected from the first stage by means of the switches $s_3$ and $s_4$. Therefore, the oscillator's power consumption is only determined by the leakage currents of the inverters. Opening $s_1$ and $s_2$ and closing $s_3$ and $s_4$ synchronously configures the delay line as an oscillator whose output frequency depends on the control voltages $V_p$ and $V_n$. Most of its power dissipation is due to switching events (i.e. it is proportional to $CV^2$). To synthesize the desired frequency, the per-stage delay is tuned to 1/8 of the desired RF cycle period by means of the DAC current source $I_{DAC}$, which sets the two voltages $V_p$ and $V_n$.

The DCO start-up delay must be negligible with respect to the reference period. This requires that $C_p$ and $C_n$ are large capacitors and that the currents through the diodes $M_9$ and $M_{11}$ are large enough to set the voltages in a short time. To achieve this while maintaining a low power consumption, a preset phase precedes the DCO's actual start-up. During the preset phase, which begins one reference clock period before the DCO is started (Fig. 10), the DAC is switched ON to read the information stored in the DCPLL accumulators and, after half a reference period, the switches $s_5 - s_8$ are closed, allowing the generated current $I_{DAC}$ to set the voltages $V_p$ and $V_n$. So when the DCO is started, all voltages are already preset to their correct values, thus mitigating output frequency variations. The DCO is kept running for one reference cycle and then shut down by means of the switches $s_1 - s_4$, which configure the DCO again as an open-loop delay line. After a small delay, the switches $s_5 - s_8$ are opened to preserve the charge in the capacitors $C_p$ and $C_n$, and the DAC is turned off to save power. The different control phases are generated by means of a non-overlapping clock generator.

In order to decrease the $R_{on}$ resistance of the switches $s_3$ and $s_4$ in the signal path, a transmission gate topology has been chosen (Fig. 10). The simulated $R_{on}$ is 270 $\Omega$, which together with the node capacitance introduces a delay of 34 ps, which is negligible with respect to the minimum DCO period.

The simplified circuit schematic of the R/2R current DAC is represented in Fig. 11 (a). It consists of two different R/2R ladders implementing the coarse and the fine banks, connected to the PMOS transistor $M_1$ and an opamp. The opamp, consisting of a differential pair, connects both ladders in feedback in order to improve the linearity of the drain current $I_{DAC}$ of $M_1$. A scaled copy of $I_{DAC}$ is delivered to the ring oscillator by means of transistor $M_2$. In order to save power during the idle state, the enable switches are open and $M_1$ goes to the cut-off region due to the large load resistance. Therefore, the DAC power consumption is only determined by the opamp current. However, thanks to the low output capacitance at node A, the current required to ensure closed-loop stability is also low.
$R_{comp}$ and $C_{comp}$ are used for Miller compensation of the feedback loop comprising $M_1$ and the opamp.

Fig. 11 (b) shows the equivalent circuit of the current DAC. The two R/2R ladders can be represented as the parallel connection of two digitally tunable voltage sources $V_{coarse}$ and $V_{fine}$, each in series with a fixed resistance, $R_c$ for the coarse bank and $R_f$ for the fine bank. The voltage at node A is fixed to the reference voltage $V_{ref}$ by the feedback. By inspection of the equivalent circuit, it is simple to derive $I_{DAC}$ as the sum of the coarse and fine currents $I_{DAC,coarse}$ and $I_{DAC,fine}$:

$$I_{DAC} = I_{DAC,coarse} + I_{DAC,fine} \tag{14}$$

$$I_{DAC} = \frac{V_{ref} - V_{coarse}}{R_c} + \frac{V_{ref} - V_{fine}}{R_f} \tag{15}$$

To ensure proper functionality, the maximum value of $V_{coarse}$ and $V_{fine}$ should be lower than $V_{ref}$. For this purpose, additional R/2R elements, always connected to ground, limit the range of $V_{coarse}$ and $V_{fine}$ to $V_{dd}/2$. The adopted circuit topology makes it possible to increase the resolution of the DAC, while keeping the tuning range fixed, by adding extra R/2R elements. Monte Carlo simulations showed that 7 bits are enough to ensure the stability condition of Eq. (7) for all the coarse DCWs over the full frequency range. $R_f$ is chosen to set the fine tuning range larger than 2 coarse LSBs. Finally, the area of the resistors is chosen large enough to ensure the monotonicity of the DAC.

The proposed DCO architecture allows, in principle, a fractional multiplication of the reference thanks to the availability of multi-phase outputs. Fig. 12 shows the signals $V_a$ and $V_b$ at nodes a) and b) of Fig. 10, in the steady-state condition and in the particular case when node b) is used as output. Since node a) is connected to the switches $s_2$ and $s_4$, $V_a$ switches from ground to $V_{dd}$ with negligible delay with respect to the start-up reference rising edge.
Since node b) is fed back to the counter in the DCPLL loop, its $(FCW+1)^{th}$ rising edge is aligned with the reference rising edge generating the switch-off signal. Since $V_a$ and $V_b$ are nominally delayed by $\frac{T_{DCO}}{4}$, there are (FCW+0.25) DCO periods $T_{DCO}$ in one reference period T and the nominal output frequency is given by:

$$F_0 = (FCW + 0.25) \cdot REF \tag{16}$$

When required, the reference frequency multiplication factor can also be changed in steps of 0.25 by selecting which of the four possible quadrature outputs is fed back to the counter, relative to the position of switches $s_1 - s_4$. In principle, the adoption of a DCO with 4 differential stages allows the generation of 8 different phases and, thus, operation at $\frac{1}{8}$ fractional-N. In this design, however, the multiplication factor is fixed because no additional resolution is required. To test the fractional multiplication of the reference, node b) has been chosen as output, resulting in a multiplication factor of (FCW+0.25).

The proposed DCO can cover frequencies ranging from 300 MHz up to 1.2 GHz. The maximum DCO frequency is limited by the parasitic capacitance of the interconnections, which is comparable with the input capacitance of the delay stages, since the stages are implemented with minimum-size devices to enable low-power operation. The maximum DCO frequency can be increased by burning more power, by scaling up the device sizes, or by employing a DCO with 2 differential stages. However, the last option translates into a lower DCPLL resolution.

#### IV. EXPERIMENTAL RESULTS

The oscillator has been realized in a baseline TSMC 65-nm CMOS process. The circuit measures $0.03 \text{ mm}^2$. Most of the area is occupied by the R/2R network and by the two digital loops (Fig. 13).

As shown in Fig. 14, the DCPLL's output consists of a train of approximately 1 GHz bursts with 50 ns duration and with a 10% duty-cycle (N=10).
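
The burst timing quoted in this section follows directly from Eq. (16) and the duty-cycling scheme. The short sketch below works through the arithmetic using values reported in the measurements of this section (a 20 MHz reference, FCW = 50, N = 10 and settling after 15 bursts); the function name, its parameters and the dictionary keys are hypothetical and serve only as an illustration.

```python
import math


def burst_parameters(fcw, ref_hz, n, tap_fraction=0.25, settle_bursts=15):
    """Nominal burst frequency and timing of the duty-cycled PLL.

    tap_fraction is the fractional part added by the selected output
    phase (0.25 when node b is fed back, as in Eq. (16)).
    """
    burst_period = n / ref_hz  # one burst every N reference cycles
    return {
        "f0_hz": (fcw + tap_fraction) * ref_hz,  # Eq. (16)
        "burst_duration_s": 1.0 / ref_hz,        # DCO runs for one period T
        "burst_period_s": burst_period,
        "duty_cycle": 1.0 / n,
        "settling_time_s": settle_bursts * burst_period,
    }


if __name__ == "__main__":
    p = burst_parameters(fcw=50, ref_hz=20e6, n=10)
    for name, value in p.items():
        print(f"{name}: {value:.6g}")
    assert math.isclose(p["settling_time_s"], 7.5e-6)
```

With these inputs the nominal output frequency is 1.005 GHz, each burst lasts one reference period of 50 ns, bursts repeat every 500 ns, and 15 bursts correspond to 7.5 µs.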
The delay between the reference clock and the generated burst is 1.2 ns, which corresponds to a constant phase error of $8.65^{\circ}$ with respect to the reference. The output frequency can be programmed from 300 MHz to 1.2 GHz according to Eq. (16), while being driven by a 20 MHz reference clock. When generating 1 GHz, the total current consumption at 1.3-V supply voltage is $200\ \mu\text{A}$ ($100\ \mu\text{A}$ for the DCO; $60\ \mu\text{A}$ for the current DAC; $40\ \mu\text{A}$ for the counter and PLL logic).

The PLL's initial settling transient is shown in Fig. 15. Each point represents the average frequency within each burst, measured by using a 20 GHz digital sampling scope. After the acquisition of each burst, the DCO periods have been computed by first interpolating the sampled waveform linearly and then estimating the zero-crossing times. The instantaneous frequency is computed as the reciprocal of the DCO period, while the average frequency within each burst is estimated by averaging the instantaneous frequency. As shown in Fig. 15, after 15 bursts or, equivalently, after 7.5 $\mu$s, the output frequency settles to the programmed frequency of 1.005 GHz with an error of less than 0.25%. In the case shown, the DCO's initial frequency was set to about 300 MHz by loading an estimated DCW into the accumulator, while the programmed FCW was 50. After the PLL's first settling transient, the correct DCW is stored in the two accumulators and only needs to be slightly adjusted to compensate for temperature and voltage variations.

Fig. 16 shows the frequency for 1000 consecutive bursts for the case FCW=50. Each point represents the average frequency within each burst, while the two bold lines represent the standard deviation. The average frequency has an offset with respect to the nominal frequency of about 1.5 MHz or, equivalently, 0.15%.
This is due to a systematic difference between the delay from the reference clock to the start-up signal and the delay from the reference clock to the switch-off signal. The DCO ON time is longer than one reference period and its frequency is therefore lower. In fact, the number of DCO periods which occur during the DCO ON time is fixed to FCW by the dual-loop architecture. Consequently, if the DCO ON time is longer, the average DCO period will also be longer and, thus, the DCO frequency is lower. The measured systematic offset for all the frequencies is less than 0.2% and is reported in Fig. 17. If a systematic error affects the reference period, the relative error on the time the DCO is active is independent of FCW. Consequently, the relative error on the output frequency $F_0$ is also independent of FCW, and this is in fact observed in the measurements in Fig. 17.

Fig. 18 (a) shows the distribution of the generated frequency for 1000 consecutive bursts in the case FCW = 50. The absence of systematic "bang-bang" frequency jumps confirms that the error due to the DCO's phase noise is greater than the quantization error. Fig. 18 (b) shows the zero-crossing point distribution of the $50^{th}$ rising edge. After 49 DCO periods, the accumulated jitter for the edge is 30 ps (rms), giving a time uncertainty of 0.06% with respect to the 50 ns reference period. This translates into the frequency error due to noise of 0.06% observed in Fig. 18 (a). The DCO period jitter is $\frac{30\,\text{ps}}{\sqrt{49}} = 4.28$ ps, which corresponds to a thermal free-running phase noise of -77 dBc/Hz at 1 MHz offset (Eq. (10))<sup>1</sup>. The frequency step of the fine tuning bank $\Delta f_{fine}$ is less than 140 kHz, in accordance with Eq. (13).

The frequency accuracy of a DCPLL is determined by the total contribution of the offset and of the DCO's phase noise.
While the offset can be calibrated, the phase-noise contribution can be reduced only by increasing the power consumption. It can be seen that the fine tuning loop significantly improves the achieved accuracy; an error of 20 MHz (2%) would be obtained with only the main loop. The standard deviation of the frequency error is an important parameter for burst-mode frequency synthesizers, since it replaces the closed-loop PLL phase noise. As shown in the spectrum of Fig. 20, the DCPLL output signal is modulated, and phase noise information cannot be derived from it.

To characterize the DCO's performance, its instantaneous frequency during a burst has been measured and is reported in Fig. 21, together with the interpolated frequency (2-sample averaging) and the average frequency over a burst period. The DCO starts approximately at the correct frequency and takes a few DCO periods to settle. The DCPLL is not sensitive to these systematic variations; instead, it tunes the average frequency, shown as a dashed line. However, the deviation from the fixed frequency is kept within a few percent thanks to the preset strategy.

The table in Fig. 22 summarizes the DCPLL performance and compares it with a few previously published frequency synthesizers. Conventional PLLs achieve better accuracy (limited by DCO phase noise and by other sources of in-band noise) but with higher power consumption. Free-running DCOs consume less power but are prone to large frequency drift. In DCPLLs, power is traded for accuracy.

<sup>1</sup>The phase noise value of -77 dBc/Hz at 1 MHz offset is more reliable than the -73 dBc/Hz at 1 MHz reported in [7], since it is computed from a higher number of bursts (1000).

#### V. CONCLUSIONS

Duty-cycled PLLs can be used as high frequency synthesizers in WSN nodes, thanks to their moderate accuracy and low power demand. A simplified theoretical analysis has been carried out showing the stability conditions for such systems.
By employing a fast start-up DCO, the PLL can operate with a low duty-cycle factor, resulting in a highly energy-efficient synthesizer. Fabricated in a 65-nm CMOS process, the DCPLL shows a total frequency multiplication inaccuracy of less than 0.25%, including frequency offset and noise-induced error ($1\sigma$). After offset calibration, the achieved accuracy is limited by the DCO's jitter and, hence, by the total power budget available. The DCPLL consumes less than 200 $\mu A$ while generating a 1 GHz output frequency with a 10% duty cycle. As shown in Fig. 1, DCPLLs are good candidates to generate a high frequency in nodes for WSN applications.

# REFERENCES

- [1] J. Ammer, F. Burghardt, E. Lin, B. Otis, R. Shah, M. Sheets, and J. M. Rabaey, "Ultra low-power integrated wireless nodes for sensor and actuator networks," in *Ambient Intelligence*, W. Weber, J. M. Rabaey, and E. Aarts, Eds. Springer, 2005.
- [2] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A 2.2GHz 7.6mW Sub-Sampling PLL with -126 dBc/Hz In-Band Phase Noise and 0.15 ps Jitter in 0.18μm CMOS," in *ISSCC Dig. of Tech. Papers*, Feb. 2009, pp. 392–393.
- [4] B. W. Cook, A. D. Berny, A. Molnar, S. Lanzisera, and K. S. J. Pister, "An ultra-low power 2.4GHz RF transceiver for wireless sensor networks in 0.13μm CMOS with 400mV supply and an integrated passive RX front-end," in *ISSCC Digest of Technical Papers*, Aug. 2006, pp. 258–259.
- [5] N. Pletcher, S. Gambini, and J. Rabaey, "A 52 $\mu$W Wake-Up Receiver With -72 dBm Sensitivity Using an Uncertain-IF Architecture," *Solid-State Circuits, IEEE Journal of*, vol. 44, no. 1, pp. 269–280, Jan. 2009.
- [6] F. Sebastiano, S. Drago, L. Breems, D. Leenaerts, K. Makinwa, and B. Nauta, "Impulse based scheme for crystal-less ULP radios," in *Proc. ISCAS*, May 2008, pp. 1508–1511.
- [7] S. Drago, D. Leenaerts, B. Nauta, F. Sebastiano, K. Makinwa, and L. Breems, "A 200 μA Duty-Cycled PLL for Wireless Sensor Nodes," in *Proc. ESSCIRC*, Sep. 2009, pp.
132–135.
- [8] A. Abidi, "Phase Noise and Jitter in CMOS Ring Oscillators," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
- [9] F. R. K. Soliman, S. Yuan, "An overview of design techniques for CMOS phase detectors," in *Proc. ISCAS*, May 2002, pp. 457–460.
- [10] D. Wentzloff and A. Chandrakasan, "A 47pJ/pulse 3.1-to-5 GHz All-Digital UWB transmitter in 90nm CMOS," in *ISSCC Digest of Technical Papers*, Feb. 2007, pp. 118–591.

#### LIST OF FIGURES

Fig. 1. Comparison between high frequency synthesizers in various applications. For PLLs, accuracy is estimated from their rms period jitter. For free-running oscillators, the accuracy is defined as the maximum relative frequency deviation due to PVT variations. The DCPLL point is given as reference.

Fig. 2. Duty-cycled PLL.

Fig. 3. DCPLL waveforms.

Fig. 4. Block diagram model of the coarse acquisition loop.

Fig. 5. Counter quantization noise.

Fig. 6. Simulated normalized step response for different values of $\frac{K_{DCO}}{REF}$; the bottom plot represents condition (6).

Fig. 7. Typical DCO transfer characteristic.

Fig. 8. (a) Coarse acquisition (b) Fine tuning.

Fig. 9. Transfer characteristic of counter and subtractor: (a) Coarse acquisition (b) Fine tuning.
Fig. 10. Schematic of the DCO.

Fig. 11. Schematic of the DAC (a) and its equivalent circuit (b).

Fig. 12. Fractional multiplication: steady state signals on start-up node (a) and the output node (b).

Fig. 13. Die micrograph of the test chip.

Fig. 14. Measured DCPLL output (top) with zoom-in (bottom).

Fig. 15. Measured DCPLL settling time.

Fig. 16. Measured DCPLL frequency deviation from 1.005 GHz vs. time.

Fig. 17. Measured output frequency and offset vs. FCW.

Fig. 18. Measured Probability Density Functions (PDF) of the DCPLL output frequency (a) and of the zero-crossing time of the $50^{th}$ edge (b) in the case FCW = 50.

Fig. 19. Simulated coarse and fine tuning values settling behaviour.

Fig. 20. Measured output spectrum.

Fig. 21. Measured DCO instantaneous frequency during a burst.

| | This work | [3] | [2] | [4] | [5] |
|--------------------------------|-----------|------------------|-----------------|------------------------|------------------------|
| Technique | DCPLL | ADPLL | SSPLL | Free-running DCO | Free-running DCO |
| Frequency | 1.005 GHz | 825 MHz | 2.21 GHz | 2.4 GHz | 1.9 GHz |
| Duty-cycle | 10% | 100% | 100% | 100% | 100% |
| Power consumption | 260 µW | 23.3 mW | 7.6 mW | 200 µW | 20 µW |
| Resolution | 20 MHz | 15 Hz | 55.25 MHz | N.A. | 62 MHz |
| Accuracy | | | | | |
| Cumulated rms jitter @ 50 ns | 30 ps | 0.050 ps (\*) | 0.020 ps (\*) | N.A. | N.A. |
| Max frequency drift | None | None | None | 10% | 2.5% |
| Frequency offset | <0.2% | None | None | N.A. | 1.6% |

<sup>(\*)</sup> Computed by integration of the phase noise spectrum.

Fig. 22. DCPLL performance and comparison with previous work.