# A 1024-Channel 268 nW/pixel 36x36 µm²/ch Data-Compressive Neural Recording IC for High-Bandwidth Brain-Computer Interfaces

Jang, Moon Hyung; Yu, Wei-Han; Lee, Changuk; Hays, Maddy; Wang, Pingyu; Vitale, Nick; Tandon, Pulkit; Chae, Youngcheol; Muratore, Dante G.; More Authors

**DOI** 10.23919/VLSITechnologyandCir57934.2023.10185288

**Publication date** 2023

**Document Version** Final published version

**Published in** Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits, VLSI Technology and Circuits 2023

**Citation (APA)** Jang, M. H., Yu, W.-H., Lee, C., Hays, M., Wang, P., Vitale, N., Tandon, P., Chae, Y., Muratore, D. G., & More Authors (2023). A 1024-Channel 268 nW/pixel 36x36 µm²/ch Data-Compressive Neural Recording IC for High-Bandwidth Brain-Computer Interfaces. In Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits, VLSI Technology and Circuits 2023 (Digest of Technical Papers - Symposium on VLSI Technology; Vol. 2023-June). IEEE. https://doi.org/10.23919/VLSITechnologyandCir57934.2023.10185288
# A 1024-Channel 268 nW/pixel 36x36 μm²/ch Data-Compressive Neural Recording IC for High-Bandwidth Brain-Computer Interfaces

MoonHyung Jang<sup>1</sup>, Wei-Han Yu<sup>2</sup>, Changuk Lee<sup>3</sup>, Maddy Hays<sup>1</sup>, Pingyu Wang<sup>1</sup>, Nick Vitale<sup>1</sup>, Pulkit Tandon<sup>1</sup>, Pumiao Yan<sup>1</sup>, Pui-In Mak<sup>2</sup>, Youngcheol Chae<sup>3</sup>, E.J. Chichilnisky<sup>1</sup>, Boris Murmann<sup>1</sup>, Dante G. Muratore<sup>4</sup>

<sup>1</sup>Stanford University, USA; <sup>2</sup>Macau University, Macau; <sup>3</sup>Yonsei University, South Korea; <sup>4</sup>TU Delft, The Netherlands

#### **Abstract**

This paper presents a neural recording IC featuring lossy compression during digitization, thus preventing data deluge and enabling a compact active digital pixel design. The wired-OR-based compression discards unwanted baseline samples while allowing the reconstruction of spike samples. The IC features a 32x32 MEA with 36µm pixel pitch and consumes 268nW per pixel from a single 1V supply. It achieves $9.8\,\mu V_{RMS}$ input-referred noise and 0.3-5kHz bandwidth, resulting in NEF/PEF of 3.7/14.1.

**Keywords:** neural, recording, brain, interface, compression

## **Introduction**

Next-generation brain-computer interfaces will benefit from dense, high channel count (>1k) microelectrode arrays (MEAs) at the pitch of neurons (~30µm) to effectively capture spatiotemporal patterns of neural activity. However, as the number of channels increases, the data rate of recording becomes intractable (10k channels digitized at 8b and 20kSps generate 1.6Gbps). While a spike detector (SD) can be used to compress the raw data and transmit a snippet around the spike [1, 2], this approach incurs significant overhead in threshold management (typically per channel) as well as memory to compensate for SD latency. Also, as the MEA density increases, routing congestion worsens, limiting today's interfaces to sub-array digitization [3].
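The data-rate figure quoted above is the product of channel count, resolution, and sampling rate. The short sketch below reproduces that arithmetic; the function name `raw_data_rate_bps` is a hypothetical helper introduced only for illustration, and the second call simply applies the same 8b, 20kSps assumptions to a 1024-channel array.

```python
def raw_data_rate_bps(num_channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Uncompressed data rate in bits per second for a recording array."""
    return num_channels * bits_per_sample * sample_rate_hz


# Example from the text: 10k channels, 8b, 20 kSps.
rate_10k = raw_data_rate_bps(10_000, 8, 20_000)
print(f"{rate_10k / 1e9:.2f} Gbps")

# Same assumptions applied to a 1024-channel array (illustrative only).
rate_1k = raw_data_rate_bps(1024, 8, 20_000)
print(f"{rate_1k / 1e6:.2f} Mbps")
```

The raw rate scales linearly with channel count, which is why compression at the point of digitization becomes attractive for dense arrays.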
Active digital pixels (ADPs) have been used to simplify routing and improve chip area efficiency, but they come at the cost of larger pixels [4]. This paper presents a recording IC based on a wired-OR compression scheme [5, 6] that addresses both the data deluge and the routing congestion problems.

## **Chip Architecture and Circuit Design**

Fig. 1 shows the chip block diagram. Each ADP contains an amplifier, a sampler (f<sub>s</sub>=20kHz), and a continuous-time comparator that drives the local row and column wires using open-drain outputs (wired-OR). The pixel functions as a ramp ADC, i.e., the acquired samples are compared to a global ramp, which yields an 8b pulse-position-modulated output. The wired-OR array outputs are interpreted by a collision decoder at each of the 256 ramp steps. A collision occurs when two or more pixels in a row (or column) sample the same voltage at the same time, so that they cross the ramp at the same step. In this case, the samples cannot be recovered and are discarded. On the other hand, if a pixel samples a unique voltage, the output pulse can be traced back to the pixel location, and the sample is stored. For neural signals, this compression discards a large number of unwanted samples near the signal's baseline while retaining the more important spike samples [5, 6]. Fig. 2 shows an example with ex vivo primate retina recordings.

Fig. 3 shows the pixel and ramp generator circuits. The pixel occupies $36\times36\,\mu m^2$, with one-third allocated for ESD protection. To minimize area, the circuits use only 2.8pF of MOM capacitance on top of ESD and active devices. An AC-coupled boxcar sampler minimizes the noise penalty from noise folding. It uses an inverter-based $G_m$ with a duty-cycled resistor for DC biasing and for setting the high-pass corner to $f_{HP}$=300Hz. The input is integrated on $C_{INT}$ (296 $T_{ck}$) and then sampled on $C_{LPF}$ (8 $T_{ck}$) to implement a switched-capacitor low-pass filter (SC-LPF).
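The collision rule described above can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the on-chip decoder: it assumes each pixel has already been quantized to a ramp step, and it keeps a sample only when exactly one row wire and one column wire are active at that step. The function name `wired_or_decode` and the example codes are hypothetical.

```python
from collections import defaultdict


def wired_or_decode(codes, num_steps=256):
    """Return {(row, col): code} for samples that survive wired-OR readout.

    codes is a 2D list of ramp-step indices, one per pixel, for a single
    sampling period. Assumption: a step is decodable only when exactly one
    row wire and one column wire are pulled active.
    """
    # Group pixels by the ramp step at which their comparator fires.
    fired = defaultdict(list)
    for r, row in enumerate(codes):
        for c, code in enumerate(row):
            fired[code].append((r, c))

    recovered = {}
    for step in range(num_steps):
        pixels = fired.get(step, [])
        if not pixels:
            continue
        # The decoder only observes which row and column wires are active.
        active_rows = {r for r, _ in pixels}
        active_cols = {c for _, c in pixels}
        if len(active_rows) == 1 and len(active_cols) == 1:
            (r,) = active_rows
            (c,) = active_cols
            recovered[(r, c)] = step
        # Otherwise the step is treated as a collision and discarded.
    return recovered


# 4x4 example: most pixels sit at a shared baseline code and collide,
# while two pixels with unique codes are recovered.
example = [[128] * 4 for _ in range(4)]
example[1][2] = 200
example[3][0] = 60
print(wired_or_decode(example))
```

In this toy case the fourteen baseline pixels share one code and are dropped, while the two outlying samples are recovered with their locations, which mirrors how baseline samples are discarded and spike samples are retained.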
The SC-LPF pole and the null from the boxcar result in an overall $f_{LP}=5$kHz. Resetting $C_{INT}$ between samples requires $6T_{ck}$. Hence, each sample period spans $(296+8+6)T_{ck}=310T_{ck}$, and the main clock frequency is $f_{ck}=310f_s=6.2$MHz. During integration, the previous sample is compared to the global ramp. The 8-bit conversion phase lasts $(256+40)T_{ck}$, where the 40 extra cycles compensate for comparator latency. Comparator auto-zero and offset calibration are used to minimize the offset between channels to the level required by the wired-OR compression. The ramp is generated by integrating a fixed current on a capacitor at each clock cycle. The ramp range $(V_{TOP}-V_{BOT})$ and slope can be set to change the ADC resolution between 6b and 10b. Another option available on chip is to divide the array into subarrays with up to 8 wires per row and column. This allows the collision rate, and thus the degree of compression, to be varied.

## **Measurement Results**

The IC was fabricated in 28nm CMOS. Fig. 4 (top) shows the measured frequency response and output spectrum of a single channel. The -3dB BW is 0.3-5kHz, and SNR/SFDR=34.1/63dB with a $500\,\mu V_{pp}$ input. The measured input-referred noise is $9.8\,\mu V_{RMS}$ (1Hz-10kHz). Fig. 4 (bottom) shows the noise and offset distributions. Pt electrodes (d=17μm) were deposited on each pixel post-fabrication, and pre-recorded neural signals were injected through a Pt wire in saline (Fig. 5, top). The measured spike waveform (Fig. 5, bottom) shows that even with small high-impedance electrodes, the IC can accurately record neural spikes.

Fig. 6 (top) shows sinewave measurements to visualize the wired-OR operation. In the first plot, only a single channel carries a sinewave. All samples outside the baseline are captured, while missing samples near the baseline are reconstructed using an interpolation filter. The second case shows two active channels.
All critical samples for reconstructing the two signals are still captured, since the two channels never produce the same digital value at the same time (i.e., there are no collisions). Fig. 6 (bottom) shows measurements of a neural signal applied to a test channel, where only spike samples are captured at a compression rate of 12.5x (proportional to the spike rate).

The die photo is shown in Fig. 7. The pixel area is 0.0013mm<sup>2</sup>, and the total area is 3.27mm<sup>2</sup>. Each pixel consumes 268nW, and the total power consumption is 508.7µW (Fig. 8) with a 1V supply. Table I shows a performance comparison. This IC achieves the smallest area per channel and the highest power efficiency (NEF/PEF) while providing rail-to-rail electrode DC offset (EDO) tolerance and addressing the data deluge and routing congestion problems through wired-OR compression.

**Acknowledgements:** Chip fabrication was provided by the TSMC University Shuttle Program. Funding support: Wu Tsai Neurosciences Institute, Stanford Nanofabrication Facility, NIH Grants EY021271, EY032900.

## References

[1] S. Park, *IEEE JSSC*, 2018.
[2] D.-Y. Yoon, *VLSI*, 2021.
[3] C.M. Lopez, *IEEE ISSCC*, 2018.
[4] D. Wendler, *IEEE JSSC*, 2022.
[5] D.G. Muratore, *IEEE TBCAS*, 2019.
[6] P. Yan, *IEEE BioCAS*, 2022.
[7] X. Yang, *VLSI*, 2022.
[8] S. Wang, *VLSI*, 2021.
[9] S. Wang, *IEEE TBCAS*, 2019.