

**Delft University of Technology** 

# A 1024-Channel 268-nW/Pixel 36 × 36 µm<sup>2</sup>/Channel Data-Compressive Neural Recording IC for High-Bandwidth Brain–Computer Interfaces

Jang, Moonhyung; Hays, Maddy; Lee, Changuk; Caragiulo, Pietro; Ramkaj, Athanasios T.; Wang, Pingyu; Vitale, Nicholas; Tandon, Pulkit; Muratore, Dante G.; More Authors

**DOI** 10.1109/JSSC.2023.3344798

**Publication date** 2023

**Document Version** Final published version

**Published in** IEEE Journal of Solid-State Circuits

# Citation (APA)

Jang, M., Hays, M., Lee, C., Caragiulo, P., Ramkaj, A. T., Wang, P., Vitale, N., Tandon, P., Muratore, D. G., & More Authors (2023). A 1024-Channel 268-nW/Pixel 36 × 36 µm<sup>2</sup>/ Channel Data-Compressive Neural Recording IC for High-Bandwidth Brain–Computer Interfaces. *IEEE Journal of Solid-State Circuits, 59*(4), 1123-1136. https://doi.org/10.1109/JSSC.2023.3344798


# Green Open Access added to TU Delft Institutional Repository

# 'You share, we take care!' - Taverne project

https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

# A 1024-Channel 268-nW/Pixel $36 \times 36\ \mu\text{m}^2$/Channel Data-Compressive Neural Recording IC for High-Bandwidth Brain–Computer Interfaces

Moonhyung Jang, *Member, IEEE*, Maddy Hays, Wei-Han Yu, *Member, IEEE*, Changuk Lee, *Member, IEEE*, Pietro Caragiulo, *Member, IEEE*, Athanasios T. Ramkaj, *Member, IEEE*, Pingyu Wang, A. J. Phillips, *Graduate Student Member, IEEE*, Nicholas Vitale, *Graduate Student Member, IEEE*, Pulkit Tandon, *Member, IEEE*, Pumiao Yan, *Graduate Student Member, IEEE*, Pui-In Mak, *Fellow, IEEE*, Youngcheol Chae, *Senior Member, IEEE*, E. J. Chichilnisky, Boris Murmann, *Fellow, IEEE*, and Dante G. Muratore, *Senior Member, IEEE*

Abstract—This article presents a data-compressive neural recording IC for single-cell resolution high-bandwidth brain–computer interfaces (BCIs). The IC features wired-OR lossy compression during digitization, thus preventing data deluge and massive data movement. By discarding unwanted baseline samples of the neural signals, the output data rate is reduced by $146\times$ on average while allowing the reconstruction of spike samples. The recording array consists of pulse-position modulation (PPM)-based active digital pixels (ADPs) with a global single-slope (SS) analog-to-digital conversion scheme, which enables a low-power and compact pixel design with significantly simpler routing and low array readout energy. Fabricated in a 28-nm CMOS process, the neural recording IC features 1024 channels (i.e., a $32 \times 32$ array) with a pixel pitch of 36 $\mu$m that can be directly matched to a high-density micro-electrode array (MEA). The pixel achieves 7.4-$\mu$V$_{\rm rms}$ input-referred noise with a -3-dB bandwidth of 300 Hz–5 kHz while consuming only 268 nW from a single 1-V supply. The IC achieves the smallest area per channel ($36 \times 36\ \mu\text{m}^2$) and the highest energy efficiency among the state-of-the-art neural recording ICs published to date.

*Index Terms*—Brain–computer interface (BCI), brain–machine interface (BMI), compression, multi-electrode array, neural interface, neural recording, pulse-position modulation (PPM), single-cell resolution.

Manuscript received 31 August 2023; revised 14 November 2023; accepted 9 December 2023. Date of publication 29 December 2023; date of current version 28 March 2024. This article was approved by Associate Editor Mototsugu Hamada. This work was supported in part by the Wu Tsai Neurosciences Institute, Stanford University; and in part by the Stanford Nanofabrication Facility and the National Institutes of Health (NIH) under Grant EY021271 and Grant EY032900. (*Corresponding author: Moonhyung Jang.*)

Moonhyung Jang, Pietro Caragiulo, Athanasios T. Ramkaj, A. J. Phillips, Nicholas Vitale, Pulkit Tandon, and Pumiao Yan are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: moon90@stanford.edu).

Maddy Hays is with the Department of Bioengineering, Stanford University, Stanford, CA 94305 USA.

Wei-Han Yu and Pui-In Mak are with the Institute of Microelectronics, University of Macau, Macau, China.

Changuk Lee is with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720 USA.

Pingyu Wang is with the Department of Materials Science and Engineering, Stanford University, Stanford, CA 94305 USA.

Youngcheol Chae is with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, South Korea.

E. J. Chichilnisky is with the Hansen Experimental Physics Laboratory, Department of Neurosurgery and Ophthalmology, Stanford University, Stanford, CA 94305 USA.

Boris Murmann was with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA. He is now with the Department of Electrical and Computer Engineering, University of Hawai'i at Mānoa, Honolulu, HI 96822 USA.

Dante G. Muratore is with the Department of Microelectronics, Delft University of Technology, 2628 CD Delft, The Netherlands (e-mail: d.g.muratore@tudelft.nl).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2023.3344798.

Digital Object Identifier 10.1109/JSSC.2023.3344798

# I. INTRODUCTION

Brain–computer interfaces (BCIs) have the potential to revolutionize therapy for neurological diseases, because they target the nervous system with high spatiotemporal resolution as opposed to pharmacological, surgical, or gene therapies [1], [2], [3]. Next-generation BCIs for clinical applications will benefit from an implantable neural recording IC with a dense, high channel count recording array that can be directly matched to a micro-electrode array (MEA) at the pitch of neurons ($\approx$30 $\mu$m) to effectively capture spatiotemporal patterns of neural activity at single-cell resolution. Over the last five decades, the number of simultaneously recorded neurons has doubled approximately every seven years, and the number is still less than a few thousand [4]. Future BCIs must support simultaneous recording from tens of thousands of neurons or more within the form factor and power budget of a fully implanted device. Recently, custom requirements have emerged for clinical BCIs that focus only on action potentials [5], [6], [7]. Hence, there is an opportunity for an architectural paradigm shift that can increase the number of channels while reducing channel area and power consumption.

0018-9200 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 1. Design challenges of neural recording ICs for massive MEA and issues of the prior works.

However, meeting these requirements poses a number of significant design challenges [8] (see Fig. 1). First, as the number of channels ($N^2$) increases, the system data rate becomes unmanageable (e.g., 10000 channels digitized at 10-bit resolution and 20 kS/s generate 2 Gb/s). On-chip spike detection (SD) can compress the raw data by transmitting only a snippet around the spike [9], [10]. However, this solution incurs significant overhead in threshold management, typically per channel, and in the memory buffers needed to compensate for the SD latency. Compressive sensing [11], [12] and compressive autoencoder [13] architectures can be designed with hardware-friendly encoders on the implant site. However, they require all raw data to be digitized and buffered. Notably, the data movement cost can be a limiting factor when compression happens after digitization (e.g., to buffer 1-ms spikes in the above case, caching 2 Mbit into a 0.6-pJ/bit SRAM [14] at 1 kHz consumes 1.2 mW). Analog implementations of compressive sensing reduce the amount of raw data that are digitized [11], but require bulky analog filters that do not scale well to large-scale high-density arrays.

Second, as the channel density increases, the routing of the analog signals from the array to the peripheral recording channels becomes a limiting factor. A common strategy is to perform sub-array digitization using a switch matrix [15], [16]. However, this eliminates the possibility of simultaneous recording over the entire array. Active digital pixels (ADPs) digitize the analog input inside the array and reduce routing congestion [17], but this comes at the cost of large pixels.

Third, most of the prior high-density neural recording ICs consume a chip total power per channel of more than 10 $\mu$W. Practical wirelessly powered biomedical implants have a power budget of less than 10 mW [18]. Hence, the number of channels ($N^2$) is limited to less than 1000. Also, as the channel size shrinks, the power density increases, which raises a safety concern given the maximum allowable heat dissipation of an implantable device in direct contact with tissue (e.g., a power dissipation limit of 1 mW/mm<sup>2</sup> requires about 1 $\mu$W or less per channel for a channel area of $33 \times 33\ \mu\text{m}^2$).

Finally, for all the prior works, the area of the channel is too large ($>0.004\ \text{mm}^2$) to achieve a single-cell resolution neural interface in most regions of the nervous system.
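The example figures quoted in the challenges above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that only restates the numbers given in the text; the function names and the choice of units are illustrative assumptions, not part of the original work.

```python
# Budget arithmetic for a conventional array that digitizes every sample.
# The inputs are the illustrative values quoted in the text.

def raw_data_rate_bps(channels, bits, fs_hz):
    """Raw data rate when every sample of every channel is transmitted."""
    return channels * bits * fs_hz

def buffer_write_power_w(channels, bits, samples_per_window,
                         windows_per_s, energy_per_bit_j):
    """Power needed to cache one spike window per channel in SRAM."""
    bits_per_window = channels * bits * samples_per_window
    return bits_per_window * energy_per_bit_j * windows_per_s

def max_channels(budget_uw, per_channel_uw):
    """Channel count that fits in an implant power budget (integer microwatts)."""
    return budget_uw // per_channel_uw

def per_channel_limit_uw(density_mw_per_mm2, pitch_um):
    """Per-channel power allowed by a limit on power density."""
    area_mm2 = (pitch_um * 1e-3) ** 2
    return density_mw_per_mm2 * area_mm2 * 1e3  # mW -> uW

rate = raw_data_rate_bps(10_000, 10, 20_000)                    # 2 Gb/s
p_sram = buffer_write_power_w(10_000, 10, 20, 1_000, 0.6e-12)   # about 1.2 mW
n_max = max_channels(10_000, 10)                                # 1000 channels
p_limit = per_channel_limit_uw(1.0, 33)                         # about 1.09 uW
```

A 1-ms window at 20 kS/s holds 20 samples per channel, which is where the 2-Mbit buffer size comes from.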

This article presents a data-compressive neural recording IC that addresses the above issues to realize a high-density, high channel count recording array for future single-cell resolution BCIs. The 1024-channel array consists of in-pixel pulse-position modulation (PPM)-based ADPs with a global single-slope (SS) A/D conversion scheme that significantly reduces the design complexity and size of the pixel. Unlike [19], which employs only a global SS A/D conversion to reduce the size of the digital pixel, the PPM-based ADP output is read outside the array with a single routing line, thus significantly reducing the array's routing congestion and readout energy. A wired-OR compression method [20] compresses massive data from the large array during the A/D conversion, which addresses the data deluge problem and significantly reduces data movement. The average compression rate is $146\times$ with pre-recorded neural signals, while the reconstructed signal enables efficient spike sorting, cell type classification, and recovery of cell mosaics [20], [21]. Since the compression occurs without a spike detector, there is no threshold management and memory overhead. Fabricated in a 28-nm CMOS process, the IC achieves the lowest power consumption per channel (268 nW) and the smallest area per channel ($36 \times 36\ \mu\text{m}^2$) among neural recording ICs while having 7.4-$\mu$V$_{\rm rms}$ input-referred noise in a [0.3, 5]-kHz bandwidth.

As an extension of [22], this article is organized as follows. Section II presents a system overview of the neural recording IC, while its architectural benefits are described in Section III. Section IV provides implementation details, and the measurement results are presented in Section V. Finally, the conclusions are drawn in Section VI.

# II. SYSTEM OVERVIEW

#### A. Neural Recording IC Architecture Overview

Fig. 2 shows the block diagram of the data-compressive neural recording IC. At the front end, the  $N \times N$  MEA interfaces with the neural cells, and the pitch-matched ADP directly reads out each electrode in the  $N \times N$  pixel array. Each ADP consists of a front-end amplifier, comparator, and wired-OR logic. Outside the array, the global ramp generator, counter, and collision decoder process the array output data.

Fig. 2. Block diagram of the data-compressive neural recording IC.

Fig. 3. (a) Chip architecture and (b) operation principle: two-channel example and its timing diagram.

First, the input from the electrode is conditioned by the amplifier in the band of interest. The continuous-time (CT) comparator in the pixel applies PPM to the output of the amplifier using a globally distributed ramp signal. Then, the PPM output is connected to the row and column address of each pixel through wired-OR logic. In this way, the $N \times N$ array is simultaneously read out with a reduced number of wires (from $N^2$ to $2N$ when compared with conventional analog and/or PPM-based pixels). Outside the array, the collision decoder reads the wired-OR PPM outputs and assigns the corresponding digital values based on a global counter synchronized with the ramp generator. If multiple pixels access the row/column buses during the same ramp step, decoding is not possible. These events are called collisions and are discarded (i.e., not stored) by the decoder performing data compression. Hence, only data from pixels having a unique digital value within a single ramp period are stored (see [20] and [21] for extensive validation of the wired-OR compression algorithm).
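The readout described above can be illustrated with a small behavioral model. This is a minimal sketch under simplifying assumptions (ideal comparators, one shared bus wire per row and per column, no sub-array splitting); `decode_frame` and its data layout are hypothetical names chosen for illustration and are not taken from the chip or from the behavioral model used by the authors.

```python
def decode_frame(codes, n_steps=256):
    """Behavioral wired-OR decoding of one ramp period.

    codes[r][c] is the ramp step at which pixel (r, c) fires.
    Returns (row, col, code) for every collision-free pixel.
    """
    n = len(codes)
    out = []
    for step in range(n_steps):
        # Wired-OR buses: a wire is active if any pixel on it fires at this step.
        rows = [any(codes[r][c] == step for c in range(n)) for r in range(n)]
        cols = [any(codes[r][c] == step for r in range(n)) for c in range(n)]
        # Exactly one active row wire and one active column wire means the
        # address is unambiguous, so the counter value is stored as the sample.
        if sum(rows) == 1 and sum(cols) == 1:
            out.append((rows.index(True), cols.index(True), step))
    return out

# 4 x 4 toy array: code 10 is shared within a row, code 30 within a column.
toy = [[10, 10, 20, 30],
       [40, 50, 60, 70],
       [80, 90, 100, 110],
       [120, 130, 140, 30]]
decoded = decode_frame(toy)
```

Two pixels firing in the same step always differ in row or column, so at least one of the two buses sees more than one pulse and the step is dropped. In the toy array, the four pixels with codes 10 and 30 are discarded and the other twelve are kept.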

The described architecture has various advantages over the prior works. First, the PPM-based ADP includes only a single comparator for A/D conversion, which reduces its area and power consumption (the global ramp generator is shared among all pixels, making its power and area consumption negligible). Second, routing congestion in the array is significantly mitigated, allowing an increase in the number of channels while reducing the channel pitch (i.e., increasing the size and density of the array). Finally, since the compression occurs during A/D conversion and is realized without spike detection, the readout chain does not have massive data movement and spike detection overhead. Therefore, this analog-to-digital compression architecture enables simultaneous recording in large-scale MEAs while addressing the data deluge problem.

#### B. Neural Recording IC Operation Principle

Fig. 3(a) shows the neural recording IC architecture. The recording pixel array has 1024 PPM-based ADPs, and its pixel pitch is matched to the electrode pitch ($L_E = 36\ \mu$m) of the MEA to directly read out each electrode. The output of each ADP is read out outside the array, according to its row and column address, through the row and column wires ($V_{\text{row0}}$ to $V_{\text{row31}}$ and $V_{\text{col0}}$ to $V_{\text{col31}}$). Then, the row and column readouts process all row and column wires in parallel at each ramp step and output the collision information ($R_{\text{out}}$[5] and $C_{\text{out}}$[5]) and the addresses of the active pixels ($R_{\text{out}}$[4:0] and $C_{\text{out}}$[4:0]). The collision decoder reads $R_{\text{out}}$[5:0] and $C_{\text{out}}$[5:0] at each of the 256 ramp steps and combines them with the output of an 8-bit counter ($G_{\text{out}}$[7:0]) to perform the 8-bit PPM. Finally, it outputs the address ($A_{\text{out}}$) and data ($D_{\text{out}}$) for the collision-free channels with a data valid flag (Valid).

Fig. 3(b) shows a two-channel example and its timing diagram. Each PPM-based ADP consists of a sample and hold ($f_s = 20$ kHz), an amplifier, a filter, and a continuous-time (CT) comparator that drives the local row and column using open-drain outputs. First, the acquired samples of each channel ($V_{30}$ and $V_{31}$) are amplified and filtered ($V_{30C}$ and $V_{31C}$) and then compared with $V_{\text{ramp}}$. The ramp crossing occurs at $T_1$ for channel 30, and it makes the comparator output ($V_{oc30}$) high, which triggers the corresponding wired-OR outputs ($V_{\text{row0}}$ and $V_{\text{col30}}$). In the same way, $V_{\text{row0}}$ and $V_{\text{col31}}$ are triggered at $T_2$ for channel 31. Then, the row and column readouts output the corresponding $R_{\text{out}}$[4:0] (=0 for both) and $C_{\text{out}}$[4:0] (=30 and 31) at $T_1$ and $T_2$ with the collision/no-collision information ($R_{\text{out}}$[5] and $C_{\text{out}}$[5]). The 8-bit PPM outputs at $T_1$ and $T_2$ are obtained from $G_{\text{out}}$[7:0] by the collision decoder along with $R_{\text{out}}$[5:0] and $C_{\text{out}}$[5:0]. In the case that the two channels have different PPM outputs [Fig. 3(b) (bottom left)], the collision decoder outputs $A_{\text{out}}$ and $D_{\text{out}}$ of each channel with Valid = 1 (no collision). However, if the input values are so close that $T_1$ and $T_2$ occur in the same ramp step, the two channels have the same PPM outputs [Fig. 3(b) (bottom right)]. In this case, a collision occurs, and the outputs of the decoder ($A_{\text{out}}$ and $D_{\text{out}}$) are not valid (Valid = 0) and are thus discarded.

Fig. 4. Left: data compression example with primate retina recording (ex vivo). Right: example of pixel activity on the global ramp for a single sample at t = 1.3 ms.

Fig. 4 shows an example of the data compression with eight-channel data recorded from ex vivo primate retina [23], [24]. A MATLAB behavioral model is used to emulate the array digitization, including the wired-OR compression. As can be seen, the wired-OR compression discards a large number of unwanted samples near the baseline of the neural signals where the probability of having the same PPM outputs is high. In contrast, it retains the more important spike samples of neural signals. This is because spike samples are rare, making the probability of collisions very low.
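The selective behavior described above (baseline samples collide, spike samples survive) can be illustrated with a toy frame. The sketch below is not the MATLAB model mentioned in the text; the frame is synthetic, the code values are made up for illustration, and `wired_or_compress` is a hypothetical helper that applies the rule that only a unique digital value within one ramp period is stored.

```python
from collections import Counter

def wired_or_compress(codes):
    """Keep only the channels whose 8-bit code is unique across the array."""
    counts = Counter(codes.values())
    return {ch: code for ch, code in codes.items() if counts[code] == 1}

# Synthetic frame (assumed values): 1021 channels sit near the baseline and
# share the codes 120..135, while 3 channels carry a spike sample far away
# from the baseline.
frame = {ch: 120 + ch % 16 for ch in range(1021)}
frame.update({1021: 40, 1022: 200, 1023: 231})

kept = wired_or_compress(frame)
compression = len(frame) / len(kept)
```

In this frame every baseline code is shared by many channels, so all baseline samples are dropped, and only the three spike samples remain.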

# III. ARCHITECTURAL BENEFITS

To investigate the benefits of the wired-OR architecture, the readout energy and output data rate of the entire array are compared with those of a conventional ADP array.

#### A. Read-Out Energy Reduction

Fig. 5(a) shows the readout energy of a 1024-channel 8-bit ADP array. For simplicity, the required readout energy/bit for all pixels is assumed to be the bitline access cost  $CV^2$  (assuming 1 V and 0.1 pF for 1-mm bitline). Then, the readout energy/pixel is 0.8 pJ for an 8-bit ADP, and the total energy to read out the entire array is 819.2 pJ (Fig. 6).

Fig. 5. Readout energy of (a) 1024-channel 8-bit ADP array and (b) 1024-channel 8-bit PPM-based ADP array.

Fig. 6. Readout energy of 1024-channel 8-bit ADP array and 8-bit PPM-based ADP array according to the number of pulses.

Fig. 7. Total number of pulses in 8-bit PPM-based ADP array during single ramp period. (a) Minimum case. (b) Maximum case.

Fig. 8. (a) 1000 samples (=50-ms recording) of 1024-channel prerecorded retina neural signals. (b) Distribution of the total number of pulses at 1024-channel 8-bit PPM-based ADP array with the input signals of (a).

Fig. 5(b) shows the readout energy of a 1024-channel PPM-based ADP array. With the same assumption, since the PPM ADP has a single-bit output on each wire (row and column), the required readout energy/wire is 0.1 pJ. Therefore, the readout energy for a 1024-channel PPM-based ADP array is equal to the total number of pulses on row and column wires during the entire ramp period multiplied by 0.1 pJ and is plotted in Fig. 6. The total number of pulses is minimum when all 1024 pixels have the same PPM output [Fig. 7(a)]. This results in 64 pulses in a single ramp step and a readout energy equal to 6.4 pJ. In contrast, the number of pulses is maximum when the PPM outputs of all pixels are evenly distributed with a unique row and column address. For example, four pixels having a unique row and column address are fired at each ramp step [Fig. 7(b)]. This results in eight pulses at each ramp step and a readout energy equal to 204.8 pJ. Therefore, the readout energy is input-dependent (ranging from 6.4 to 204.8 pJ) and is $4\times$ to $128\times$ lower than in the conventional case (Fig. 6).

However, both extreme cases are unlikely, considering the statistics of the neural signal. Instead, the average readout energy should be estimated based on real neural signals. Fig. 8(a) shows 1000 samples for 1024 channels of pre-recorded neural signals from ex vivo primate retina [23], [24], which corresponds to 50-ms recording at 20-kHz sampling rate. These neural signals are used as input to the behavioral model described in Section II-B to obtain the total number of pulses per sample [Fig. 8(b)]. The total number of pulses ranges from 325 to 499 with the average number being 420, which corresponds to an average readout energy of 42 pJ. Therefore, the average readout energy reduction is $19.5\times$ when compared with an 8-bit conventional ADP array.
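The readout-energy figures in this subsection follow from the $CV^2$ bitline cost and the pulse counts. The sketch below simply recomputes them; the variable and function names are illustrative, and the only inputs are the values stated in the text (0.1 pF, 1 V, and the reported pulse counts).

```python
E_PULSE_PJ = 0.1 * 1.0 ** 2   # C*V^2 with C = 0.1 pF and V = 1 V, in pJ

def conventional_energy_pj(pixels=1024, bits=8):
    """Every pixel drives all of its output bits once per sample."""
    return pixels * bits * E_PULSE_PJ

def ppm_energy_pj(total_pulses):
    """One bitline access per pulse on a row or column wire."""
    return total_pulses * E_PULSE_PJ

min_pulses = 32 + 32          # all pixels fire in the same ramp step
max_pulses = 256 * (4 + 4)    # 4 pixels per step on distinct rows and columns
avg_pulses = 420              # average reported for the retina recording

e_conv = conventional_energy_pj()
e_min, e_max, e_avg = (ppm_energy_pj(n) for n in (min_pulses, max_pulses, avg_pulses))

best_case_gain = e_conv / e_min    # about 128x
worst_case_gain = e_conv / e_max   # about 4x
average_gain = e_conv / e_avg      # about 19.5x
```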

#### B. Output Data-Rate Reduction

With a typical sampling frequency of 20 kS/s, the output data rate of the 1024-channel 8-bit ADP array ( $D_{ADP}$ ) can be calculated as follows:

$$D_{\text{ADP}} = 1024 \times 8\ \text{bit} \times 20\ \text{kS/s} = 163.84\ \text{Mb/s}. \tag{1}$$

In the case of a 1024-channel PPM-based ADP array with wired-OR compression, 8-bit output data are transmitted at 20 kS/s only when the channels are collision-free. Therefore, the output data rate ($D_{\text{wOR}}$) depends on the total number of collision-free channels ($N_{\text{cf}}$) and can be calculated as follows:

$$D_{\text{wOR}} = N_{\text{cf}} \times 8\ \text{bit} \times 20\ \text{kS/s}. \tag{2}$$

Fig. 9. (a) Data rate of 1024-channel 8-bit ADP array and 8-bit PPM-based ADP array with wired-OR compression. (b) Distribution of the number of collision-free channels with the input signals in Fig. 8(a).

Fig. 9(a) shows the output data rate as a function of the number of collision-free channels. Since the number of collision-free channels is input-dependent, the output data rate also should be estimated based on the statistics of the neural signal as done for the readout energy. The number of collision-free channels ranges from 0 to 15, with an average number of 7 [Fig. 9(b)], corresponding to an average data rate of 1.12 Mb/s. Therefore, a data rate reduction of around $146\times$ can be obtained compared with the 1024-channel 8-bit ADP array [Fig. 9(a)]. Even with this large compression rate, the reconstructed signal still retains the critical samples belonging to spikes and allows for efficient spike sorting, cell type classification, and recovery of cell map features [20], [21].
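As a quick check of (1) and (2), the sketch below evaluates both data rates with the reported average of seven collision-free channels. It counts only the 8 data bits per sample, as the equations do; the names are illustrative.

```python
FS = 20_000   # samples per second per channel
BITS = 8      # data bits per transmitted sample

def data_rate_bps(n_channels, bits=BITS, fs=FS):
    """Output data rate for n_channels transmitting every sample."""
    return n_channels * bits * fs

d_adp = data_rate_bps(1024)   # Eq. (1): conventional 8-bit ADP array
d_wor = data_rate_bps(7)      # Eq. (2) with the reported average N_cf = 7
reduction = d_adp / d_wor     # about 146x
```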

# IV. IMPLEMENTATION DETAILS

#### A. Overall Architecture

Fig. 10 shows the top schematic of the neural recording IC. In the neural recording front end, an ac-coupled low-noise boxcar (LNB) sampler and a low-pass filter (LPF) are implemented for sample and hold, amplification, and filtering, which are followed by a CT comparator to compare the input signal against the global ramp signal ( $V_{\text{RAMP}}$ ) and the wired-OR logic. The local clock generator provides all the phases for the recording front end from the system clock ( $f_{\text{ck}}$ ). The reference electrode is built-in on-chip and implemented with an electrode ring around the 32 × 32 MEA, which is actively driven by the neural recording IC.

The row and column pulse readout comprises a pulse detector for each row/column wire and a decoder. The pulse detector samples the output of the wired-OR logic using a negative-edge triggered flip-flop and uses a tunable pull-up current source ($I_{\text{pull-up}}[3:0]$) to reset the wired-OR line. The pulse decoder uses one-hot detection on the row/column wires to detect a collision ($R_{\text{out}}$[5] and $C_{\text{out}}$[5]) and performs one-hot to binary conversion to output the address of the active row/column wires ($R_{\text{out}}$[4:0] and $C_{\text{out}}$[4:0]).

Fig. 10. Top schematic of the neural recording IC.

Fig. 11. Architecture of neural recording front end and its timing diagram.

The collision decoder outputs the row and column address together with the associated value from the global counter (Addr<sub>Row</sub>[4:0], Addr<sub>Col</sub>[4:0], and $D_{\text{out}}$[7:0]) and a flag signaling whether the output is collision-free or not (Valid). Using up to eight wires ($w_{en}$[7:0]) per row and column, the array can be split into multiple sub-arrays. This generates multiple levels for collision decoding and allows the collision rate (i.e., the degree of compression) to be controlled.

The global ramp generator consists of a current source ($I_{\text{ramp}}$) and a tunable capacitor bank ($C_{\text{ramp}}$ and $C_{\text{unit}}$ with Cal[7:0]), followed by a unity-gain buffer to drive the 1024 channels. Also, a global bias generator provides the current bias ($I_{\text{biasp}}$ and $I_{\text{biasn}}$) to each pixel. The digital control unit is used to configure the chip and transmit the output data (Addr<sub>Row</sub>[4:0], Addr<sub>Col</sub>[4:0], and $D_{\text{out}}$[7:0]) off the chip using serial communication. The system clock $f_{\text{ck}}$ is 6.2 MHz, and the input sampling rate ($f_s$) is 20 kS/s.

#### B. Neural Recording Front End

Fig. 11 shows the architecture of the neural recording front end and its timing diagram. An ac-coupled LNB sampler minimizes the noise penalty from noise folding with its inherent anti-aliasing property due to the notches at the multiples of the sampling frequency (=20 kHz) and provides rail-to-rail electrode dc offset tolerance. To minimize area, the circuit uses a 1.1-pF input MOM capacitance ($C_{\text{IN}}$) on top of ESD and active devices. The LNB uses an inverter-based $G_m$ with a large feedback resistor ($R_{\text{HPF}} \cong 21.3\ \text{G}\Omega$) for dc biasing and setting the high-pass corner ($f_{\text{HP}} = 300$ Hz). The $R_{\text{HPF}}$ is realized with a duty-cycled resistor (DCR), which consists of a 50-M$\Omega$ transistor in the triode region and a switch with 1/43 duty cycle ($\varphi_{\text{DCR}} \approx 7.2T_{\text{ck}}$), resulting in low noise and small area. The duty cycle is globally programmable with a 4-bit binary delay control unit. The output of the inverter-based $G_m$ is integrated on $C_{\text{INT}}$ for $296T_{\text{ck}}$ ($\varphi_{\text{INT}}$) and then sampled on $C_{\text{LPF}}$ for $8T_{\text{ck}}$ ($\varphi_{\text{LPF}}$) to implement a passive switched-capacitor low-pass filter (SC-LPF) without additional power consumption. The SC-LPF pole and the null from the boxcar result in an overall $f_{\text{LP}} = 5$ kHz. The overall front end has a bandpass filter (BPF) response with a gain and bandwidth of 38 dB and 300 Hz–5 kHz. In addition, $6T_{\text{ck}}$ are allocated to reset $C_{\text{INT}}$ between samples ($\varphi_{\text{RST}}$), which leads to $f_{\text{ck}} = 310 f_s$ (=6.2 MHz).

Fig. 12. Schematic of (a) front end $G_m$ and (b) continuous-time comparator with auto-zeroing and in-pixel output offset calibration.

During the reset phase of the LNB, the outputs of the $G_m$ cell are connected to set the common-mode voltage, which is then copied to the input by the DCR. During the integration phase, the previous sample stored in $C_{\text{LPF}}$ is compared with the global ramp for PPM. The 8-bit conversion phase lasts $(256 + 40)T_{\text{ck}}$ to compensate for the comparator latency. The comparator includes auto-zeroing ($\varphi_{\text{AZ}}$) and in-pixel offset calibration to minimize the offset between channels to the level required by the wired-OR compression [20]. The range and slope of the ramp ($V_{\text{ramp,P}}$ and $V_{\text{ramp,N}}$) can be set to change the ADC resolution and input range of the pixel. In the wired-OR logic block, the comparator output trigger edge is converted into a pulse with a width of $T_{\text{ck}}$ and synchronized to $f_{\text{ck}}$ by the feedback synchronizer ($V_{\text{sense}}$), resulting in an 8-bit PPM. Then, it is transmitted outside the array through the row and column OR buses. According to the array configuration for the number of wires ($w_{en}$[7:0]), $w_{sel}$[7:0] determines the channel connection to one of the row bus wires (Bus<sub>Row</sub>[7:0]).
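The clock-phase budget of the front end can be tallied to see where $f_{\text{ck}} = 310 f_s$ comes from. This is a minimal sketch that only adds up the phase lengths stated in the text; the dictionary keys and variable names are illustrative.

```python
FS_HZ = 20_000                       # input sampling rate per pixel

# Phase lengths in system-clock periods (T_ck), as stated in the text.
PHASES_TCK = {"integrate": 296, "lpf_sample": 8, "reset": 6}

cycles_per_sample = sum(PHASES_TCK.values())   # 310
f_ck_hz = cycles_per_sample * FS_HZ            # 6.2 MHz
t_ck_ns = 1e9 / f_ck_hz                        # about 161.3 ns

# The 8-bit conversion of the previous sample runs during integration:
# 256 ramp steps plus a 40-cycle margin for the comparator latency.
ramp_steps = 2 ** 8
latency_margin_tck = 40
conversion_tck = ramp_steps + latency_margin_tck

# Duty-cycled resistor: on for about 7.2 T_ck out of every sample period.
dcr_on_tck = 7.2
dcr_period_to_on_ratio = cycles_per_sample / dcr_on_tck   # about 43
```

The conversion window (296 cycles) matches the integration window, so the ramp comparison of one sample overlaps with the integration of the next.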

The $G_m$ cell is implemented with a current-starved inverter self-biased by a DCR [Fig. 12(a)]. With a bias current of only 100 nA, the resulting $G_m$ is 2.8 $\mu$S, which corresponds to an integrated input-referred thermal noise of 6 $\mu$V$_{\text{rms}}$ over 1 Hz–10 kHz. The input-referred noise contribution from the DCR is designed to be negligible (2.3 $\mu$V$_{\text{rms}}$) compared with that of the $G_m$ cell, while the DCR occupies an area of only $1.6 \times 2.8\ \mu\text{m}^2$. Even with a $G_m$ cell device size of 13.5 $\mu$m<sup>2</sup>, the standard deviation of the recording front end's input-referred offset is 18 $\mu$V$_{\text{rms}}$ based on Monte Carlo simulations, because of the auto-zeroing at the comparator. The CT comparator is implemented with a four-input differential-to-single-ended architecture [Fig. 12(b)]. It consumes 80 nA in the input pairs (40 nA per branch), which are sized such that, when combined with auto-zeroing, the offset of the comparator does not degrade the noise and offset performance of the pixel. In-pixel offset calibration circuits are added at the single-ended output to further reduce the offset variation across pixels. By adjusting the amount of sink and source offset current at the output, a delay in the CT comparator output is introduced, which is equivalent to controlling the ADC digital output value. The maximum calibration range is $\pm 7$ LSB, which corresponds to 17.9-$\mu$V input-referred offset.

Fig. 13. Schematic of row (or column) pulse read-out circuit and its timing diagram.

#### C. Row and Column Pulse Readout

Fig. 13 shows the schematic of the row (or column) pulse readout circuit and its timing diagram. The 4-bit programmable pull-up current source drives the output OR logic of 32 pixels and its bus routing line ($V_{\text{row,N}}$ or $V_{\text{col,N}}$). As soon as one of the comparator output pulses ($V_{\text{out,n}}$) at a row (or column) triggers the output OR logic, $V_{\text{row,N}}$ (or $V_{\text{col,N}}$) goes low, and the buffered output $V_{\text{row,NB}}$ (or $V_{\text{col,NB}}$) goes high. Then, $V_{\text{row,NB}}$ (or $V_{\text{col,NB}}$) is sampled by the flip-flop at the opposite phase of $f_{\text{ck}}$ ($t_2$), and the pulse output $V_{\text{row,N,out}}$ (or $V_{\text{col,N,out}}$) of the *N*th row (or column) is transmitted to the row (or column) pulse decoder. If the parasitic capacitance ($C_{\text{bus}}$) from the output OR logic of the 32 pixels and the $V_{\text{row,N}}$ (or $V_{\text{col,N}}$) bus routing is so large that the pull-up current source cannot charge it within half an $f_{\text{ck}}$ cycle, multiple counts of the pulse occur (e.g., a double count occurs if $V_{\text{row,NB}}$ is still high even after $t_4$). Therefore, the pull-up current source is designed with sufficient capability to drive its row (or column) bus.

Fig. 14. Schematic of the global ramp generation circuit.

#### D. Global Ramp Generator

Fig. 14 shows the schematic of the global ramp generation circuit. The ramp starting points ($V_{\text{top}}$ and $V_{\text{bot}}$) are generated by a regulated resistor divider around the common-mode voltage ($V_{\text{cm}}$) and are programmable with 4-bit resolution (Cal<sub>R</sub>[3:0]). The source and sink current sources ($I_P$ and $I_N$ = 15 nA) and the capacitors, including $C_{\text{ramp}}$ and the tunable capacitor bank ($C_{\text{MSB}}$ and $C_{\text{LSB}}$), determine the ramp slope, which is adjustable with 8-bit resolution. The resulting input range of the pixel is from 0.75 to 2.25 mV<sub>pp</sub>. The reset and ramp timing ($\varphi_{\text{rst}}$ and $\varphi_{\text{ramp}}$) is equal to the reset and integration timing of the pixel. Finally, the unity-gain buffer is designed to have the driving capability of a 1024-channel array, including gate and routing parasitics, to ensure even distribution of the ramp signals ($V_{\text{ramp,P}}$ and $V_{\text{ramp,N}}$) across the array.
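To get a feel for what the programmable input range means for the quantization step, the sketch below divides the stated range limits by the 256 ramp steps. It also estimates the corresponding ramp swing at the comparator under the assumption that the ramp must span the input range multiplied by the 38-dB front-end gain; that assumption and the function names are illustrative and are not stated in the text.

```python
def lsb_uv(input_range_mvpp, bits=8):
    """Input-referred step size for a ramp divided into 2**bits steps."""
    return input_range_mvpp * 1e3 / 2 ** bits

def ramp_swing_mv(input_range_mvpp, gain_db=38.0):
    """Assumed ramp swing at the comparator: input range times in-band gain."""
    return input_range_mvpp * 10 ** (gain_db / 20)

lsb_min = lsb_uv(0.75)            # about 2.9 uV
lsb_max = lsb_uv(2.25)            # about 8.8 uV
swing_min = ramp_swing_mv(0.75)   # about 60 mV
swing_max = ramp_swing_mv(2.25)   # about 179 mV
```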

# V. MEASUREMENT RESULTS

#### A. Electrical and In Vitro Measurements

The prototype IC was fabricated in a 28-nm standard CMOS process with a 1-V supply voltage. It occupies a total active area of 3.27 mm<sup>2</sup>. The size of the 1024-channel array is  $1.2 \times 1.2 \text{ mm}^2$ , and the size of each pixel is only  $36 \times 36 \ \mu\text{m}^2$  ( $\simeq 0.00129 \text{ mm}^2$ ) with a  $15 \times 15 \ \mu\text{m}^2$  electrode deposited directly on top. As shown in Fig. 15, the pixel area is dominated by the ESD protection diode and the input capacitor.
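As a quick consistency check of the quoted dimensions, the short sketch below recomputes the pixel area and array size. The 32 × 32 arrangement is inferred from the 1024 channels and the 32-pixel rows and columns; the variable names are illustrative only.

```python
# Geometry check using the dimensions quoted in the text.
pitch_um = 36.0       # pixel pitch (36 x 36 um^2 pixel)
electrode_um = 15.0   # electrode side (15 x 15 um^2)
rows = cols = 32      # assumed 32 x 32 arrangement of the 1024 channels

pixel_area_mm2 = (pitch_um * 1e-3) ** 2            # area of one pixel in mm^2
array_side_mm = rows * pitch_um * 1e-3             # side length of the pixel array
electrode_fill = (electrode_um / pitch_um) ** 2    # fraction of the pixel covered by the electrode

print(f"pixel area  : {pixel_area_mm2:.6f} mm^2")
print(f"array side  : {array_side_mm:.3f} mm")
print(f"electrode   : {100 * electrode_fill:.1f} % of the pixel area")
```

The pixel area evaluates to about 0.0013 mm<sup>2</sup> and the array side to about 1.15 mm, in line with the rounded values quoted above.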

Fig. 15. Die photograph.

Fig. 16. Power breakdown of full chip and pixel.

Fig. 16 shows a power breakdown of the full chip and the pixel. The measured total power consumption of the neural recording IC is 508.7 $\mu$W, and the corresponding chip total power per channel is 496 nW. The pixel array and the digital part dominate the power consumption, while the row and column readouts account for only 2.6% of the total. The total power consumption of the pixel is only 268.4 nW, and it is dominated by the comparator and the $G_m$ cell. The power in the $G_m$ cell is limited by noise requirements, while the power in the comparator is limited by bandwidth requirements. Note that the pixel power also includes the power consumption of the ramp generation, which is shared among all channels and accounts for around 20% of the total pixel power budget. The digital power could be further reduced by supply voltage scaling, by introducing multiple clock domains, and by implementing more aggressive clock gating, since the activity in the processing pipeline is mostly driven by collision-free events.
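The per-channel figures follow from the measured totals by simple arithmetic, sketched below. Only numbers quoted in the text are used; the split of the remaining power among the other blocks is not computed, because it is not stated here.

```python
# Power bookkeeping from the measured values quoted in the text.
n_channels = 1024
chip_total_uw = 508.7        # measured total chip power [uW]
pixel_nw = 268.4             # measured power per pixel [nW], ramp share included
readout_fraction = 0.026     # row and column readouts, fraction of total
ramp_fraction = 0.20         # approximate ramp-generation share of the pixel power

chip_per_ch_nw = chip_total_uw * 1e3 / n_channels   # chip total power per channel
pixel_array_uw = pixel_nw * n_channels / 1e3        # all pixels together
readout_uw = readout_fraction * chip_total_uw       # row + column readouts
ramp_per_pixel_nw = ramp_fraction * pixel_nw        # approximate, "around 20%"

print(f"chip power per channel : {chip_per_ch_nw:.1f} nW")
print(f"pixel array            : {pixel_array_uw:.1f} uW "
      f"({100 * pixel_array_uw / chip_total_uw:.0f} % of total)")
print(f"row/column readouts    : {readout_uw:.1f} uW")
print(f"ramp share per pixel   : ~{ramp_per_pixel_nw:.0f} nW")
```

The computed chip power per channel is about 496.8 nW, which the text quotes as 496 nW.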

Fig. 17(a) shows the measured pixel frequency response. The bandpass response has high-pass and low-pass poles at 300 Hz and 5 kHz, respectively, and an in-band gain of 38 dB, which matches the simulation results well. Fig. 17(b) shows the measured output spectrum of a single pixel when a 1-kHz, 1-mV<sub>pp</sub> sine wave is applied at the input. Under these conditions, the pixel achieves a peak SNDR and SFDR of 34 and 63 dB, respectively, and the corresponding input-referred noise is 7 $\mu V_{\text{rms}}$.
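The relation between the quoted SNDR and the input-referred noise can be checked with a short calculation. The sketch assumes that the noise-plus-distortion power implied by the SNDR is dominated by noise, which is plausible given the much higher SFDR but is an interpretation rather than a statement from the measurement; the function name is illustrative.

```python
import math


def noise_plus_distortion_rms(v_pp, sndr_db):
    """RMS noise-plus-distortion referred to the input, for a sine of amplitude v_pp and a given SNDR."""
    signal_rms = v_pp / (2.0 * math.sqrt(2.0))   # RMS value of a sine wave with peak-to-peak v_pp
    return signal_rms / (10.0 ** (sndr_db / 20.0))


v_nd = noise_plus_distortion_rms(1e-3, 34.0)     # 1 mVpp input, 34 dB SNDR
gain_lin = 10.0 ** (38.0 / 20.0)                 # 38 dB in-band gain as a linear factor

print(f"input-referred noise + distortion: {v_nd * 1e6:.2f} uVrms")
print(f"in-band gain: {gain_lin:.1f} V/V")
```

The result, roughly 7 $\mu V_{\text{rms}}$, agrees with the reported input-referred noise.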

Fig. 18 shows the in vitro test setup and neural spike recording result with a single channel. The platinum (Pt) electrodes of $15 \times 15\ \mu\text{m}^2$ were deposited on each pixel post-fabrication, and the pre-recorded retina neural signals were injected in saline solution using an arbitrary waveform generator (Keysight 33500B) connected to a platinum wire. The neural recording IC was encapsulated after wire bonding, so that only the MEA was exposed to saline solution. The measured retina neural spike waveform shows that even with small and high-impedance ($\approx 1.15\ \text{M}\Omega$ at 1 kHz) electrodes, the IC can accurately record neural spikes.



Fig. 17. Measured single-channel characteristics. (a) Frequency response. (b) Output spectrum.



Fig. 18. In vitro test setup and measured neural spike waveform.



Fig. 19. Measured 1024-channel characteristics. (a) Noise distribution. (b) Offset distribution.

Fig. 19 shows the measured input-referred noise and offset distribution of all 1024 channels. The saline solution is grounded in the in vitro test setup (Fig. 18), and sputtered iridium oxide film (SIROF) electrodes are additionally deposited on each pixel to reduce the electrode impedance ($\approx 250\ \text{k}\Omega$ at 1 kHz). The mean and standard deviation of the input-referred noise across the 1024-channel array are 7.4 and 1.07 $\mu V_{\text{rms}}$<sup>1</sup> [Fig. 19(a)], respectively, which shows an even pixel noise characteristic over the entire array. The measured 1024-channel input-referred offset distribution is shown in Fig. 19(b). The standard deviation of the input-referred offset for the 1024-channel array is 14.4 $\mu$V, which is within the pixel-to-pixel offset calibration range of 18 $\mu$V.

Fig. 20. Measured sinewaves with wired-OR compression. (a) Single channel recording. (b) Two channels recording.

Fig. 20 shows sine-wave measurements to visualize the wired-OR compression. A sine wave is applied to a single channel, while all other channels are connected to the grounded saline solution [Fig. 20(a)]. As can be seen, all samples outside the baseline are captured, while missing samples near the baseline are reconstructed using an interpolation filter. The interpolation is performed with a three-tap non-causal finite impulse response (FIR) filter with coefficients  $b_{-1} = 0.5$ ,  $b_0 = 0$ , and  $b_{+1} = 0.5$ . Fig. 20(b) shows data-compressive sine-wave measurement with two active channels. All critical samples for reconstructing the two signals are still captured, since the two sinewaves are out of phase and rarely present the same digital value at the same time.
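The interpolation step can be sketched in a few lines. The coefficients $b_{-1} = 0.5$, $b_0 = 0$, and $b_{+1} = 0.5$ amount to replacing a missing sample with the average of its two neighbors. The representation of missing samples as `None`, the function name, and the decision to leave a gap unfilled when a neighbor is also missing are assumptions of this sketch; the handling of longer gaps is not described in the text.

```python
def interpolate_missing(samples):
    """Fill isolated missing samples (None) with the three-tap non-causal FIR b = [0.5, 0, 0.5].

    A missing sample is replaced by the average of its previous and next sample.
    Gaps at the edges, or next to another missing sample, are left as None.
    """
    out = list(samples)
    for n, x in enumerate(samples):
        if x is not None:
            continue
        prev_x = samples[n - 1] if n > 0 else None
        next_x = samples[n + 1] if n + 1 < len(samples) else None
        if prev_x is not None and next_x is not None:
            out[n] = 0.5 * prev_x + 0.5 * next_x
    return out


# Example: two samples near the baseline were discarded by the compression.
received = [10, None, 14, 15, None, 11]
print(interpolate_missing(received))
```

Because the center tap is zero, the filter uses only received samples, so captured samples pass through unchanged.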

Fig. 21 shows data-compressive measurements of a retinal neural spike signal. The pre-recorded neural signal is applied to a test channel, while all other channels are connected to the grounded saline solution. As can be seen, all the spike samples are well captured, while the baseline samples are discarded. The missing samples near the baseline are also reconstructed using a simple interpolation filter. The compression rate is inversely proportional to the spike rate. Here, a $12.5 \times$ compression rate is achieved with an artificially large spike rate of 13.2% ($10 \times$ larger than typical spike rates).

<sup>1</sup>In [22], the numbers were measured with Pt electrodes deposited on each pixel. They are revised here to the values measured with the additional deposition of SIROF electrodes on each pixel.

Fig. 21. Measured pre-recorded injected neural spikes with wired-OR compression.

Fig. 22. (a) Ex vivo test setup. (b) Measured neural signals.

# B. Ex Vivo Validation

The neural recording IC was further validated through an ex vivo experiment with a rat retina. Fig. 22(a) shows the experimental setup used to obtain ex vivo recordings. Dissected rat retina tissue is flattened onto the MEA using a mini-plug covered with a dialysis membrane controlled by a micro-manipulator, so that the retinal ganglion cells are close to the SIROF electrodes. The ex vivo tissue is perfused with perfluoro liquid to keep it healthy during the experiment.

The recording IC is able to recover spikes with high fidelity during single-channel recordings without compression [see Fig. 22(b)]. Spikes can also be recovered when the wired-OR algorithm is enabled and all 1024 channels are active (see Fig. 23). As expected, the number of wires configured in the array controls the compression-accuracy trade-off. In this experiment, the average spike rate is 30.6 spikes/s, and the compression rate ranges from $111.2 \times$ to $38.8 \times$ with one wire and four wires, respectively. Here, compression is defined as the number of collision-free channels over the total 1024 channels for every sample. Waveform distortion can compromise certain BCI tasks where spike sorting is needed. The analysis of this compression-accuracy trade-off is beyond the scope of this article. The reader can refer to our previous work in [20] and [21] for an extensive analysis of this trade-off across multiple experimental datasets.

Fig. 23. Measured neural spikes with wired-OR compression according to the number of wires (ex vivo). Reconstructed spikes during wired-OR recordings are compared against average spikes during single-channel recordings.

#### C. Comparison With State-of-the-Art Works

Table I shows the performance summary and comparison with other state-of-the-art neural recording ICs [9], [10], [17], [25], [26], [27], [28], [29]. This work focuses solely on action potentials, which have been shown to achieve the highest performance in motor BCI tasks when compared with local field potentials [5]. As a result, the AFE bandwidth is limited to high-frequency content, and the required ADC resolution is limited to 8 bits [5]. With the largest number of channels, this work achieves the lowest power consumption per channel with sufficiently low input-referred noise (IRN) for effective neural recording. In particular, *power/Ch* and *chip total power/Ch* are reduced to the hundreds-of-nW level, which are $10.1 \times$ and $16.8 \times$ lower than the best prior works, respectively [10], [25]. This enables a sub-mW chip total power consumption even with a 1024-channel array. As a result, this work achieves the highest power efficiency among neural recording ICs with the best NEF and PEF of 2.84 and 8.07, respectively, advancing the PEF of the best state of the art by $5.2 \times$. It should be noted that NEF and PEF are calculated by using chip total power/Ch to compare the power efficiency of the entire neural recording IC, which includes the signal acquisition chain (front-end amplifier and ADC, or only ADC in the case of direct conversion) and the digital back end. This work also achieves the smallest area per channel ($= 0.00129\ \text{mm}^2$) among all neural recording ICs, advancing the area efficiency of the best state of the art by $3.5 \times$. This enables a single-cell resolution neural interface, while the wired-OR compression method significantly mitigates the data-deluge problem of massive MEAs and the immense data movement in the recording chain without any spike detection overhead.

TABLE I Performance Summary and Comparison With State-of-the-Art Works

| | This work | [25] | [17] | [10] | [26] | [27] | [9] | [28] | [29] |
|---|---|---|---|---|---|---|---|---|---|
| Technology [nm] | 28 | 22 | 180 | 65 | 55 | 130 | 180 | 180 | 130 |
| Supply [V] | 1.0 | 0.8 | 1.8 | 1.2 | 1.2 | 1.2 | 0.5/1/1.8 | 0.5 | 1.2 |
| Input type | AC-coupled | AC-coupled | DC-coupled | AC-coupled | DC-coupled | AC-coupled | AC-coupled | AC-coupled | AC-coupled |
| Topology | Boxcar<br>+ SS ADC | $1^{\text{st}}$ order<br>$\Delta\Delta\Sigma$ | 2-step<br>I$\Delta\Sigma$ | IA<br>+ SAR | $2^{\text{nd}}$ order<br>$\Delta\Delta\Sigma$ | IA<br>+ SAR | IA<br>+ $\Delta\Delta\Sigma$ | LNA<br>+ SAR | LNA<br>+ SAR |
| Type of signal | AP | LFP+AP | LFP+AP | LFP+AP | LFP+AP | LFP+AP | LFP+AP | AP | AP |
| # of Channels | 1024 | 128 | 8-24 | 1024 | 16 | 384 | 1024 | 16 | 64 |
| BW [Hz] | 300-5k | 0.5-10k | 0.5-10k | 0.5-10k | 0.5-10k | 0.5-10k | 0.4-9.2k | 1-6.8k | 192-7.4k |
| ADC [bit] | 8 | - | 11 | 10 | - | 14 | 11/8 | 8 | 8 |
| Power/Ch [µW] | 0.268 | 6.02 | 8.59 | 2.72 | - | 48.7 | - | 0.88 | 3.04/4.54 |
| Chip Total Power/Ch [µW] | 0.496 | 8.34 | 14.94 | 24.08 | 61.2 | 95.1 | 15.35 | - | 5.15 |
| IRN [$\mu V_{\text{rms}}$] | 7.4 (AP) | 7.71 (AP)<br>11.9 (LFP) | 4.37 (AP)<br>2.72 (LFP) | 8.89 (AP)<br>6.8 (LFP) | 5.53 (AP)<br>2.88 (LFP) | 7.43 (AP)<br>7.78 (LFP) | 5.18 (LFP+AP) | 5.4 (AP) | 3.8 (AP) |
| \*NEF / PEF (AP band) | 2.84 / 8.07 | 9.6 / 73.7 | 4.85 / 42.4 | 15.3 / 282.8 | 15.2 / 278.2 | 25.5 / 650.3 | - / 59.4 | - | 3.32 / 13.27 |
| Input range [mV<sub>pp</sub>] | 0.75-2.25 | 43 | 14 | 0.75-4.87 | 148 | 12.5 | - | - | - |
| THD [%] | 0.097<br>@-3dBFS | 0.015<br>@21.5mV<sub>pp</sub> | 0.078<br>@10mV<sub>pp</sub> | 0.57<br>@-0.8dBFS | 0.05<br>@20mV<sub>pp</sub> | 0.17<br>@10mV<sub>pp</sub> | 0.062<br>@3.2mV<sub>pp</sub> | 2.2<br>@92mV<sub>pp</sub> | 0.08<br>@3mV<sub>pp</sub> |
| Area/Ch [mm<sup>2</sup>] | 0.00129 | 0.0045 | 0.0046 | 0.0062 | 0.0077 | 0.035 | 0.098 | 0.16 | 0.16 |
| EDO Tolerance | Rail-to-Rail | Rail-to-Rail | $\pm$ 60 mV | Rail-to-Rail | $\pm$ 70 mV | Rail-to-Rail | Rail-to-Rail | Rail-to-Rail | Rail-to-Rail |

\*NEF / PEF are calculated using Chip Total Power/Ch
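The figures of merit in Table I can be reproduced approximately with the commonly used definitions $\text{NEF} = V_{\text{rms,in}} \sqrt{2 I_{\text{tot}} / (\pi \cdot U_T \cdot 4kT \cdot \text{BW})}$ and $\text{PEF} = \text{NEF}^2 \cdot V_{DD}$. The sketch below is a hedged check, not the authors' calculation: the bandwidth of 5 kHz and the temperature of 300 K are assumptions chosen here, and the total current is taken as the chip total power per channel divided by the 1-V supply.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant [J/K]
Q_E = 1.602176634e-19    # elementary charge [C]


def nef(v_rms_in, i_total, bandwidth_hz, temp_k=300.0):
    """Noise efficiency factor for input-referred noise v_rms_in and total supply current i_total."""
    u_t = K_B * temp_k / Q_E   # thermal voltage kT/q
    return v_rms_in * math.sqrt(
        2.0 * i_total / (math.pi * u_t * 4.0 * K_B * temp_k * bandwidth_hz)
    )


def pef(nef_value, vdd):
    """Power efficiency factor, NEF^2 * VDD."""
    return nef_value ** 2 * vdd


vdd = 1.0                     # supply voltage [V]
p_chip_per_ch = 0.496e-6      # chip total power per channel [W]
irn = 7.4e-6                  # input-referred noise [Vrms]
bw = 5e3                      # ASSUMPTION: 5 kHz noise bandwidth

nef_val = nef(irn, p_chip_per_ch / vdd, bw)
pef_val = pef(nef_val, vdd)
print(f"NEF = {nef_val:.2f}, PEF = {pef_val:.2f}")

# Improvement ratios against the best prior entries in Table I.
print(f"power/Ch vs [10]            : {2.72 / 0.268:.1f}x")
print(f"chip total power/Ch vs [25] : {8.34 / 0.496:.1f}x")
print(f"area/Ch vs [25]             : {0.0045 / 0.00129:.1f}x")
```

With these assumptions, the sketch gives an NEF of about 2.84 and a PEF of about 8.07, in agreement with the table, and the power and area ratios match the quoted improvements to within rounding.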

# VI. CONCLUSION

A 1024-channel data-compressive neural recording IC is realized for future single-cell resolution high-bandwidth BCIs. It achieves a high-density and large-scale recording array by implementing PPM-based ADP, which significantly reduces routing congestion. By using a wired-OR data compression method, the data-deluge problem in large-scale MEAs is mitigated. Also, on-chip massive data movement and spike detection overhead are avoided, thus enabling massively parallel recording arrays. The prototype achieves a power consumption per channel of 268 nW and an area per channel of 36 × 36 $\mu$m<sup>2</sup> with 7.4-$\mu$V<sub>rms</sub> input-referred noise and 0.3–5-kHz bandwidth, which results in the best power and area efficiency among the neural recording ICs published to date. The neural recording IC architecture offers great promise in enabling massively parallel single-cell resolution MEAs for future BCIs.

#### ACKNOWLEDGMENT

The chip fabrication was provided by the TSMC University Shuttle Program.

#### REFERENCES

- [1] F. J. Santos, R. M. Costa, and F. Tecuapetla, "Stimulation on demand: Closing the loop on deep brain stimulation," *Neuron*, vol. 72, no. 2, pp. 197–198, Oct. 2011.
- [2] M. W. Slutzky, "Brain-machine interfaces: Powerful tools for clinical treatment and neuroscientific investigations," *Neuroscientist*, vol. 25, no. 2, pp. 139–154, Apr. 2019.
- [3] M. A. Lebedev and M. A. L. Nicolelis, "Brain-machine interfaces: From basic science to neuroprostheses and neurorehabilitation," *Physiolog. Rev.*, vol. 97, no. 2, pp. 767–837, Apr. 2017.
- [4] *Tracking Advances in Neural Recording*. Accessed: Apr. 27, 2023. [Online]. Available: https://stevenson.lab.uconn.edu/scaling/
- [5] N. Even-Chen et al., "Power-saving design opportunities for wireless intracortical brain–computer interfaces," *Nature Biomed. Eng.*, vol. 4, no. 10, pp. 984–996, Aug. 2020.
- [6] S. R. Nason et al., "A low-power band of neuronal spiking activity dominated by local single units improves the performance of brain-machine interfaces," *Nature Biomed. Eng.*, vol. 4, no. 10, pp. 973–983, Jul. 2020.

- [7] D. G. Muratore and E. J. Chichilnisky, "Artificial retina: A future cellular-resolution brain-machine interface," in NANO-CHIPS On-Chip AI for an Efficient Data-Driven World, B. Murmann and B. Hoefflinger, Eds. Cham, Switzerland: Springer, 2020, pp. 443–465.
- [8] A. H. Marblestone et al., "Physical principles for scalable neural recording," *Frontiers Comput. Neurosci.*, vol. 7, pp. 1–34, Oct. 2013.
- [9] S.-Y. Park, J. Cho, K. Lee, and E. Yoon, "Dynamic power reduction in scalable neural recording interface using spatiotemporal correlation and temporal sparsity of neural signals," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1102–1114, Apr. 2018.
- [10] D.-Y. Yoon, S. Pinto, S. Chung, P. Merolla, T.-W. Koh, and D. Seo, "A 1024-channel simultaneous recording neural SoC with stimulation and real-time spike detection," in *Proc. Symp. VLSI Circuits*, Jun. 2021, pp. 1–2.
- [11] M. Shoaran, M. H. Kamal, C. Pollo, P. Vandergheynst, and A. Schmid, "Compact low-power cortical recording architecture for compressive multichannel data acquisition," *IEEE Trans. Biomed. Circuits Syst.*, vol. 8, no. 6, pp. 857–870, Dec. 2014.
- [12] C. Aprile et al., "Adaptive learning-based compressive sampling for low-power wireless implants," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 11, pp. 3929–3941, Nov. 2018.
- [13] T. Wu, W. Zhao, E. Keefer, and Z. Yang, "Deep compressive autoencoder for action potential compression in large-scale neural recording," *J. Neural Eng.*, vol. 15, no. 6, Dec. 2018, Art. no. 066019.
- [14] K. Prabhu et al., "CHIMERA: A 0.92-TOPS, 2.2-TOPS/W edge AI accelerator with 2-MByte on-chip foundry resistive RAM for efficient training and inference," *IEEE J. Solid-State Circuits*, vol. 57, no. 4, pp. 1013–1026, Apr. 2022.
- [15] J. Dragas et al., "In vitro multi-functional microelectrode array featuring 59 760 electrodes, 2048 electrophysiology channels, stimulation, impedance measurement, and neurotransmitter detection channels," *IEEE J. Solid-State Circuits*, vol. 52, no. 6, pp. 1576–1590, Apr. 2017.
- [16] C. M. Lopez et al., "A multimodal CMOS MEA for high-throughput intracellular action potential measurements and impedance spectroscopy in drug-screening applications," *IEEE J. Solid-State Circuits*, vol. 53, no. 11, pp. 3076–3086, Nov. 2018.
- [17] D. Wendler et al., "A 0.0046-mm<sup>2</sup> two-step incremental delta–sigma analog-to-digital converter neuronal recording front end with 120-mVpp offset compensation," *IEEE J. Solid-State Circuits*, vol. 58, no. 2, pp. 439–450, Feb. 2023.
- [18] M. Haerinia and R. Shadid, "Wireless power transfer approaches for medical implants: A review," *Signals*, vol. 1, no. 2, pp. 209–229, Dec. 2020.
- [19] S. Kleinfelder, S.-I. Lim, X. Liu, and A. El Gamal, "A 10 kframe/s 0.18 μm CMOS digital pixel sensor with pixel-level memory," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2001, pp. 88–89.
- [20] D. G. Muratore, P. Tandon, M. Wootters, E. J. Chichilnisky, S. Mitra, and B. Murmann, "A data-compressive wired-OR readout for massively parallel neural recording," *IEEE Trans. Biomed. Circuits Syst.*, vol. 13, no. 6, pp. 1128–1140, Dec. 2019.
- [21] P. Yan et al., "Data compression versus signal fidelity tradeoff in wired-OR analog-to-digital compressive arrays for neural recording," *IEEE Trans. Biomed. Circuits Syst.*, vol. 17, no. 4, pp. 754–767, Aug. 2023.
- [22] M. Jang et al., "A 1024-channel 268 nW/pixel 36 × 36 μm<sup>2</sup>/ch data-compressive neural recording IC for high-bandwidth brain–computer interfaces," in *Proc. IEEE Symp. VLSI Technol. Circuits*, Jun. 2023, pp. 1–2.
- [23] A. M. Litke et al., "What does the eye tell the brain? Development of a system for the large-scale recording of retinal output activity," *IEEE Trans. Nucl. Sci.*, vol. 51, no. 4, pp. 1434–1440, Aug. 2004.
- [24] E. S. Frechette, A. Sher, M. I. Grivich, D. Petrusca, A. M. Litke, and E. J. Chichilnisky, "Fidelity of the ensemble code for visual motion in primate retina," *J. Neurophysiol.*, vol. 94, no. 1, pp. 119–135, Jul. 2005.
- [25] X. Yang et al., "A 128-channel AC-coupled 1st-order $\Delta\Delta\Sigma$ IC for neural signal acquisition," in *Proc. IEEE Symp. VLSI Technol. Circuits*, Jun. 2022, pp. 60–61.
- [26] S. Wang et al., "A 77-dB DR 16-ch 2nd-order $\Delta\Delta\Sigma$ neural recording chip with 0.0077 mm<sup>2</sup>/Ch," in *Proc. Symp. VLSI Circuits*, Jun. 2021, pp. 1–2.
- [27] S. Wang et al., "A compact quad-shank CMOS neural probe with 5,120 addressable recording sites and 384 fully differential parallel channels," *IEEE Trans. Biomed. Circuits Syst.*, vol. 13, no. 6, pp. 1625–1634, Dec. 2019.

- [28] S.-J. Kim et al., "A sub-μW/Ch analog front-end for Δ-neural recording with spike-driven data compression," *IEEE Trans. Biomed. Circuits Syst.*, vol. 13, no. 1, pp. 1–14, Feb. 2019.
- [29] M. Delgado-Restituto, A. Rodríguez-Pérez, A. Darie, C. Soto-Sánchez, E. Fernández-Jover, and Á. Rodríguez-Vázquez, "System-level design of a 64-channel low power neural spike recording sensor," *IEEE Trans. Biomed. Circuits Syst.*, vol. 11, no. 2, pp. 420–433, Apr. 2017.



**Moonhyung Jang** (Member, IEEE) received the B.Sc. (summa cum laude) and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2014 and 2021, respectively. His Ph.D. research was in the field of high-resolution power-efficient continuous-time delta–sigma A/D conversion.

He is currently a Post-Doctoral Research Fellow with the Murmann Mixed-Signal Group, Stanford University, Stanford, CA, USA. His current research interests include low-power data converters, high-bandwidth single-cell resolution brain-machine interfaces (BMI), in-memory computing-based deep neural network (DNN) accelerators, and various high-performance mixed-signal integrated circuits and systems.

Dr. Jang was a recipient of the 2020–2021 IEEE Solid-State Circuits Society Predoctoral Achievement Award, the 2020 Yonsei-Samsung Semi-Conductor Research Center Best Paper Award, the 2020 Samsung Human-Tech Paper Award Silver Prize in Circuit Design, and the 2018 Samsung Human-Tech Paper Award Bronze Prize in Circuit Design. He has served as a reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS.



**Maddy Hays** received the B.Sc. degree in biomedical engineering and minors in electrical engineering, mathematics, and physics from Virginia Commonwealth University, Richmond, VA, USA. She is currently pursuing the Ph.D. degree in bioengineering with Stanford University, Stanford, CA, USA, with a focus on neuroscience.

Some of her previous work includes dielectric modeling of complex tissues for implantable RF systems and the study of microRNAs as potential therapeutics for osteoarthritis. As part of the collaborative artificial retina (AR) project, she aims to develop the computational pipeline that will allow the AR neural interface to identify single cells by cell type, record electrical activity with single-spike resolution, and selectively stimulate these cells to manipulate cellular population dynamics in awake behaving monkeys. Her research interests revolve around the use of technology to answer questions regarding retinal ganglion cell-type contributions in visual perception and acuity.



**Wei-Han Yu** (Member, IEEE) received the Ph.D. degree from the University of Macau (UM), Macau, China, in 2018.

From 2019 to 2021, he was a Visiting Scholar at the Murmann Mixed-Signal Group, Stanford University, Stanford, CA, USA. He has been an Assistant Professor with the State-Key Laboratory of Analog and Mixed-Signal VLSI (AMSV), UM, since 2021. His research interests include edge AI, in-memory computing, switched capacitor circuits, energy harvested RF transceivers, and neural interfaces.

Dr. Yu received the IEEE ISSCC Student Travel Grant Award, the FDCT Science and Technology Postgraduate Student Award in 2016, and the IEEE SSCS Predoctoral Achievement Award in 2018.



**Changuk Lee** (Member, IEEE) received the B.S. and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2016 and 2022, respectively.

He is currently a Post-Doctoral Researcher with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA. His current research interests include data converters, low-noise sensor interfaces, high-precision analog circuits, and wireless neural interfaces.

Dr. Lee was a recipient of the IEEE SSCS Pre-Doctoral Achievement Award in 2022 and the IEEE SSCS Student Travel Grant Award in 2022. He received the Bronze Prize, the Silver Prize, and the Gold Prize in the Samsung Human-Tech Paper Award in Circuit Design hosted by Samsung Electronics in 2018, 2020, and 2022, respectively. He served as a reviewer for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS.



**Pietro Caragiulo** (Member, IEEE) received the B.S. and M.Sc. degrees in electrical engineering from the Politecnico di Bari, Bari, Italy, in 2007 and 2010, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2022.

From 2010 to 2018, he was with the SLAC National Accelerator Laboratory, Menlo Park, CA, USA, where he was involved in the development of high-frame rate cameras and time-of-flight sensors. From 2022 to 2023, he was with Meta Inc., Cambridge, MA, USA, as a Silicon Research Scientist with the AR/VR Division. He is currently with Apple Inc., Cupertino, CA, USA.

Dr. Caragiulo was a recipient of the Stanford Graduate Fellowships in Science and Engineering (SGF) in 2018 and the ADI Outstanding Student Designer Award in 2020.



**Athanasios T. Ramkaj** (Member, IEEE) received the M.Sc. degree (cum laude) in electrical engineering (microelectronics) from TU Delft, Delft, The Netherlands, in 2014, and the Ph.D. degree (summa cum laude) in electrical engineering from KU Leuven, Leuven, Belgium, in 2021. His Ph.D. research was in the field of multi-GHz bandwidth power-efficient Nyquist A/D converters.

From 2013 to 2014, he was a Research Intern with the Central Research and Development Department, NXP Semiconductors, Eindhoven, The Netherlands, where he worked on GHz-range A/D converters for communication systems. In 2019, he was a Research/Design Intern with the High Speed Data Converters Group, Analog Devices Inc., Wilmington, MA, USA, investigating highly integrated solutions for bandwidth extension of next-generation RF A/D converters. From 2021 to 2022, he was with the Murmann Mixed-Signal Group, Stanford University, Stanford, CA, USA, as a Post-Doctoral Research Fellow, and also a Visiting Researcher at Kilby Labs, Texas Instruments, Santa Clara, CA, USA, investigating multi-GHz ultralow jitter A/D solutions and heterogeneous integration. He is currently a Member of Technical Staff Silicon Design Engineer with the AMD-Xilinx Wired and Wireless Group, San Jose, CA, USA, developing next-generation wireline transceivers. His main research interests include wide-bandwidth data converters, high-speed analog/mixed-signal circuits for wireline/wireless systems, RF sampling receiver front ends, and ultralow-jitter clocking.

Dr. Ramkaj is a member of the Technical Program Committee of the European Solid State Circuits Conference (ESSCIRC). He was a recipient of the 2021 Analog Devices Outstanding Student Designer Award, the 2019–2020 IEEE Solid-State Circuits Society Predoctoral Achievement Award, and the 2015 IEEE PRIME Golden Leaf Best Student Paper Award. He also serves as a reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS.



**Pingyu Wang** received the B.Eng. degree in mechanical engineering and the M.Phil. degree in materials science and engineering from The University of Hong Kong, Hong Kong. He is currently pursuing the Ph.D. degree with the Department of Materials Science and Engineering, Stanford University, Stanford, CA, USA.

Leveraging advanced microfabrication technologies, his current research focuses on developing large-scale and high-resolution neural interfaces for the retina and other parts of the nervous system.



**A. J. Phillips** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering and computer science from New Mexico State University, Las Cruces, NM, USA, in 2020, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2023, where he is currently pursuing the Ph.D. degree in electrical engineering.

His research interests include neuroengineering, digital signal processing, and optimization. Currently, his research mainly focuses on adaptively recording and stimulating neural populations at single-cell, single-spike resolution.

Mr. Phillips was a recipient of the National Science Foundation Graduate Research Fellowship.



**Nicholas Vitale** (Graduate Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Case Western Reserve University, Cleveland, OH, USA, in 2017 and 2019, respectively, and the second M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2021, where he is currently pursuing the Ph.D. degree in electrical engineering under the supervision of Thomas H. Lee.

He is broadly interested in modeling biological signals and systems as well as designing scalable integrated sensors for molecular diagnostics.



**Pulkit Tandon** (Member, IEEE) received the B.Tech. degree in electrical engineering from IIT Bombay, Mumbai, India, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2022, with a focus on multiple interdisciplinary problems at the intersection of compression, large-scale data analysis, neuroengineering, and perceptual engineering.

He is currently a Research Engineer at Granica Inc., Mountain View, CA, USA. He is interested in data optimization for machine learning and compression.



**Pumiao Yan** (Graduate Student Member, IEEE) received the B.Sc. degree in electrical and computer engineering from Cornell University, Ithaca, NY, USA, in 2018, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2020, where she is currently pursuing the Ph.D. degree.

She is a Seth A. Ritch Bio-X Graduate Student Fellow with Stanford University. Her research focuses on algorithm-hardware co-design and signal processing for analog-to-digital compression hardware architectures for neural interfaces.



**Pui-In Mak** (Fellow, IEEE) received the Ph.D. degree from the University of Macau (UM), Macau, China, in 2006.

He is currently a Full Professor at the Faculty of Science and Technology, ECE Department, UM, where he is also the Director of the State Key Laboratory of Analog and Mixed-Signal VLSI and the Deputy Director (Research) of the Institute of Microelectronics. His research interests are on analog and radio frequency (RF) circuits and systems for wireless and multidisciplinary innovations.

Prof. Mak has been a fellow of the U.K. Institution of Engineering and Technology (IET) for contributions to engineering research, education, and services since 2018; the IEEE for contributions to radio frequency and analog circuits since 2019; and the U.K. Royal Society of Chemistry since 2020. He received the Tencent Xplorer Prize 2022 and is recognized as one of the Top ISSCC Paper Contributors for the past 70 years of ISSCC. He has been inducted as an Overseas Expert of the Chinese Academy of Sciences since 2018.



**Youngcheol Chae** (Senior Member, IEEE) received the B.S., M.S., and Ph.D. degrees from Yonsei University, Seoul, South Korea, in 2003, 2005, and 2009, respectively.

During his Ph.D. studies, he advanced oversampling ADCs through innovative design techniques, including inverter-based amplifiers. From 2009 to 2011, he was a Post-Doctoral Researcher at the Delft University of Technology, Delft, The Netherlands, where he developed high-precision sensors and interface circuits for many applications. He joined Yonsei University in 2012, where he is currently a Full Professor with the Department of Electrical and Electronic Engineering and leads the Mixed-Signal IC Group, which focuses on innovative analog and mixed-signal circuits and systems for communications, sensing, and biomedical applications. This work has resulted in over 130 peer-reviewed articles in journals and conferences, and he holds more than 70 patents.

Dr. Chae has been serving as a TPC Member for the International Solid-State Circuits Conference (ISSCC), the Asian Solid-State Circuits Conference (A-SSCC), and the Custom Integrated Circuits Conference (CICC). He received the ISSCC 2021 Takuo Sugano Award for Outstanding Far-East Paper; the Best Young Professor Award in engineering from Yonsei University in 2018; the Hae-Dong Young Engineer Award from IEIE Korea in 2017; the Outstanding Research Award of Yonsei University in 2017, 2019, and 2020; and the Outstanding Teaching Awards of Yonsei University in 2013 and 2014. He was a Distinguished Lecturer (DL) of the IEEE Solid-State Circuits Society (SSCS). He was a Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS.



**E. J. Chichilnisky** received the B.A. degree in mathematics from Princeton University, Princeton, NJ, USA, and the M.S. degree in mathematics and the Ph.D. degree in neuroscience from Stanford University, Stanford, CA, USA.

He worked with the Salk Institute for Biological Studies, San Diego, CA, USA, for 15 years. He is currently a John R. Adler Professor of neurosurgery and a Professor of ophthalmology at Stanford University, where he has been working since 2013. His research has focused on understanding the spatiotemporal patterns of electrical activity in the retina that convey visual information to the brain and their origins in retinal circuitry, using large-scale multi-electrode recordings from primate and human retina. His ongoing work focuses on using basic science knowledge along with electrical stimulation to develop a novel high-fidelity artificial retina for treating incurable blindness.

Dr. Chichilnisky was a recipient of the Alfred P. Sloan Research Fellowship, the McKnight Scholar Award, the McKnight Technological Innovation in Neuroscience Award, and the Research to Prevent Blindness Stein Innovation Award.



**Boris Murmann** (Fellow, IEEE) received the Dipl.-Ing. (FH) degree in communications engineering from Fachhochschule Dieburg, Dieburg, Germany, in 1994, the M.S. degree in electrical engineering from Santa Clara University, Santa Clara, CA, USA, in 1999, and the Ph.D. degree in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA, in 2003.

From 1994 to 1997, he was with Neutron Mikrolektronik GmbH, Hanau, Germany, where he was involved in the development of low-power and smart-power application-specific integrated circuits (ASICs) in automotive CMOS technology. From 2004 to 2023, he was with the Department of Electrical Engineering, Stanford University, Stanford, CA, USA, where he served as an Assistant Professor, an Associate Professor, and a Full Professor. He is currently with the Department of Electrical and Computer Engineering, University of Hawai'i at Mānoa, Honolulu, HI, USA. His research interests include the area of mixed-signal integrated circuit design, with an emphasis on data converters, sensor interfaces, and circuits for embedded machine learning.

Dr. Murmann was a co-recipient of the Best Student Paper Award at the Very Large-Scale Integration Circuits Symposium in 2008 and 2021, the Best Invited Paper Award at the IEEE Custom Integrated Circuits Conference (CICC) in 2008, the Agilent Early Career Professor Award in 2009, the Friedrich Wilhelm Bessel Research Award in 2012, and the SIA-SRC University Researcher Award for lifetime research contributions to the U.S. semiconductor industry in 2021. He was the 2017 Program Chair of the IEEE International Solid-State Circuits Conference (ISSCC) and the 2023 General Co-Chair of the IEEE International Symposium on Circuits and Systems (ISCAS).



**Dante G. Muratore** (Senior Member, IEEE) received the B.Sc. and M.Sc. degrees in electrical engineering from the Politecnico of Torino, Turin, Italy, in 2012 and 2013, respectively, and the Ph.D. degree in microelectronics from the Integrated Microsystems Laboratory, University of Pavia, Pavia, Italy, in 2017.

From 2015 to 2016, he was a Visiting Scholar at the Microsystems Technology Laboratories, Massachusetts Institute of Technology, Cambridge, MA, USA. From 2016 to 2020, he was a Post-Doctoral Fellow at Stanford University, Stanford, CA, USA. Since 2020, he has been an Assistant Professor with the Bioelectronics Section, Delft University of Technology, Delft, The Netherlands, leading the Smart Brain Interfaces Group. His group investigates hardware and system solutions for high-bandwidth brain-machine interfaces that can interact with the nervous system at natural resolution. They contribute solutions for massively parallel bidirectional interfaces, on-chip neural signal processing, and wireless power and data transfer.

Dr. Muratore was a recipient of the Wu Tsai Neurosciences Institute Interdisciplinary Scholar Award.