# High Voltage and Nanoscale CMOS Integrated Circuits for Particle Physics and Quantum Computing

For the award of the academic degree of

#### **DOKTOR-INGENIEUR**

approved by the KIT Department of Electrical Engineering and Information Technology of the Karlsruhe Institute of Technology (KIT)

#### **DISSERTATION**

by M.Tech. Mridula Prathapan, born in Trivandrum

Date of the oral examination: 28.01.2020

First reviewer: Prof. Dr. rer. nat. Ivan Perić

Second reviewer: Prof. Dr. ir. Paul Leroux

I hereby declare that I have created this work completely on my own and used no other sources or tools than the ones listed, and that I have marked any citations accordingly.

Karlsruhe, December 2019

Mridula Prathapan

### **Abstract**

The original contributions of this dissertation are readout architectures for monolithic pixel detectors for the ATLAS experiment and an 8-bit SAR ADC in a 28 nm bulk-CMOS process. Three generations of high voltage CMOS pixel sensor chips have been designed to meet the requirements of ATLAS inner tracker layer 4.

The first generation ASIC, ATLASpix1, introduces an asynchronous triggered readout architecture that can cope with high particle hit rates. For this purpose, a content addressable buffer with programmable latency was designed to store the hit data and filter it based on the level-1 trigger. A digitally synthesized control unit is responsible for scheduling the readout operation from the pixel matrix and transferring hit data to the serial link. Some of these blocks were reused in the subsequent generations of ATLASpix chips with added features. An RTL verification environment was set up for full-chip functional verification. ATLASpix1 measurement results show that the serial data link works at the required rate of 1.28 Gbps.
X-ray irradiation studies show that the readout electronics are fully functional after a total ionizing dose of 100 Mrad.

The second generation prototype, ATLASpix2, introduces the concept of sorting hits according to the chronology of events. It has smart pixel grouping and hit neighbor logic in the buffer for time-walk correction. It also stores time-over-threshold information, which corresponds to the energy of the particle hit. The content addressable buffer of ATLASpix1 was redesigned to include additional features.

The third generation, ATLASpix3, is a $2\times2$ cm<sup>2</sup> sensor chip that is suitable for the construction of CMOS quad modules. It features a built-in command decoder with clock data recovery and a single channel Aurora 64b/66b encoder.

As the final project of this dissertation, a test chip was designed and fabricated in a 28 nm bulk-CMOS process to evaluate the radiation hardness and cryogenic performance of a nanoscale process node. The test circuits include a 100 MS/s 8-bit SAR ADC, with possible application in a Quantum Computing control system.

# Zusammenfassung

The contributions of this dissertation are the readout architecture for monolithic pixel detectors for the ATLAS experiment and an 8-bit SAR ADC in a 28 nm bulk-CMOS process. Three generations of CMOS pixel sensors were designed with the aim of meeting the requirements of layer 4 of the ATLAS inner tracker.

The first-generation ASIC, ATLASpix1, introduced the asynchronous triggered readout architecture, which can cope with high particle hit rates. For this readout, a content addressable buffer with programmable latency was designed; its task is to store the hit data and filter it based on the level-1 trigger. A digitally synthesized control unit is responsible for scheduling the readout operations from the pixel matrix and for transferring hit data to the serial link. Some of these blocks were carried over from earlier to later generations of ATLASpix chips, with extensions and new circuits added. An RTL verification environment was set up for functional verification of the whole chip. ATLASpix1 measurements show that the serial data link reaches the specified bit rate of 1.28 Gbit/s. X-ray irradiation studies show that the readout electronics remain fully functional after a total ionizing dose of 100 Mrad.

The second-generation ASIC, ATLASpix2, introduces the concept of sorting hit data according to the chronology of events. It implements smart pixel grouping and neighbor logic for correcting the time-walk effect. The ASIC stores time-over-threshold information, which corresponds to the energy of the particle hit. The content addressable buffer of ATLASpix1 was redesigned to enable additional functions.

The third-generation ASIC, ATLASpix3, is a $2 \times 2$ cm$^2$ sensor chip suitable for the construction of CMOS quad modules. It contains a command decoder with clock data recovery and a single-channel Aurora 64b/66b encoder.

As the last project of this dissertation, a test chip was designed and fabricated in a 28 nm bulk-CMOS process. The goal of the project was to investigate the radiation hardness and the cryogenic properties of this nanoscale process. The test circuits include a 100 MS/s 8-bit SAR ADC, with possible application in the control systems of quantum computers.
# **Contents**

- Abstract
- Zusammenfassung
- Conventions
- Preamble
- 1 Introduction
  - 1.1 HL-LHC: Demands and Challenges on ASIC design
  - 1.2 State-of-the-art: Hybrid vs Monolithic approach
    - 1.2.1 Hybrid pixel detector
    - 1.2.2 Monolithic CMOS pixel detector
    - 1.2.3 Readout Architecture
    - 1.2.4 Serial data link
    - 1.2.5 Powering schemes
    - 1.2.6 Radiation tolerance
      - Effect of radiation on MOSFETs
    - 1.2.7 Radiation tolerant IC design
    - 1.2.8 Exploring new technologies for radiation tolerance and fast timing
  - 1.3 … of history of pixel detectors and their readout chips from LHC point
  - 1.4 ATLAS ITk Upgrade Layer 4: Requirements for CMOS detectors
- 2 ATLASpix1: The first large area HVCMOS prototype
  - 2.1 Introduction
  - 2.2 ATLASpix\_M2
    - 2.2.1 HVCMOS sensor
    - 2.2.2 Pixel electronics
  - 2.3 Readout architecture
    - 2.3.1 Parallel Pixel to Buffer (PPtB) transfer
    - 2.3.2 CAB Buffer
    - 2.3.3 The Readout Control Unit (RCU)
      - Task scheduler: The readout state machine
      - 8b/10b pipelined encoder
      - Serializer
  - 2.4 Summary
- 3 ATLASpix1: Measurement
  - 3.1 Introduction to test system
  - 3.2 Laboratory tests
    - 3.2.1 Triggered readout
    - 3.2.2 Threshold tuning
    - 3.2.3 Serial link
    - 3.2.4 Time-walk measurement
  - 3.3 X-ray irradiation tests
    - 3.3.1 Leakage current
    - 3.3.2 Signal to Noise Ratio
    - 3.3.3 Threshold Tuning
    - 3.3.4 Power consumption
    - 3.3.5 Summary
- 4 ATLASpix2: A multi project wafer run in AMS/TSI 180 nm
  - 4.1 Introduction
  - 4.2 Architecture
    - 4.2.1 Pixel grouping and hit neighbor logic
    - 4.2.2 CAB buffer
    - 4.2.3 The Readout Control Unit (RCU)
      - Time Stamp generator
      - Trigger FIFO
      - Clock generation and timing
      - Serializer
      - 8b/10b Aurora encoder
      - Data packaging unit and synchronizer
  - 4.3 Summary
- 5 ATLASpix3: A reticle-size chip for HVCMOS quad-module construction ATLAS ITk
  - 5.1 Introduction
  - 5.2 Architecture of ATLASpix3
  - 5.3 Readout periphery design
    - 5.3.1 The Readout Control Unit (RCU)
      - Readout controller
      - Aurora 64b/66b encoder
      - Command decoder
      - Trigger generator
      - Clock generation and timing
  - 5.4 Summary
- 6 Full-chip verification and timing closure
  - 6.1 Introduction
  - 6.2 Mixed-mode simulation
  - 6.3 RTL Design and Verification Environment
    - 6.3.1 Full chip behavioral model
    - 6.3.2 RTL Test bench
      - Data receiver
    - 6.3.3 Simulations using Readout Modeling Environment (ROME)
  - 6.4 Physical design
  - 6.5 Summary
- 7 Summary of ATLASpix designs
- 8 Design of an 8-bit SAR ADC in a 28 nm bulk-CMOS process
  - 8.1 Motivation
  - 8.2 28 nm high-k metal gate bulk-CMOS process
  - 8.3 Current state-of-the-art ADCs in 28 nm
  - 8.4 Possible applications of the ADC prototype
    - 8.4.1 Requirements and challenges of control system for Quantum computing
    - 8.4.2 Current state-of-the-art ADCs in QC system
  - 8.5 The design of an 8-bit 100 MSa/s SAR ADC
    - 8.5.1 The Sample and hold circuit
    - 8.5.2 Comparator design
      - Working of a regenerative latch
      - Latched comparator: Implementation in 28 nm
    - 8.5.3 D/A Converter
    - 8.5.4 Readout circuitry and top-level simulation
    - 8.5.5 IO interface
    - 8.5.6 Top-level integration
  - 8.6 Test setup
  - 8.7 Measurement results
  - 8.8 Summary
- 9 Conclusion
- A Depletion region depth in a High Voltage CMOS sensor
  - A.1 P-N junction under reverse bias
  - A.2 Relation between substrate resistivity and acceptor concentration
- B ADC architecture proposals and feasibility analysis
  - B.0.1 A VCO based delay line ADC
  - B.0.2 Characterization of a delay chain
  - B.0.3 A delay line based single-shot ADC
  - B.1 Comparison with SAR architecture
  - B.2 SAR ADC Verilog model
- Acknowledgements
- Publications
- Bibliography

# **Conventions**

Throughout this thesis the following conventions are used.

*Text conventions*

Definitions of technical terms or short excursus are set off in colored boxes.

Source code and implementation symbols are written in typewriter-style text, e.g. `myClass`.

The whole thesis is written in American English.

URL links are set off in blue color.

Citations are marked in (author, year) format and are set off in green color.

### **Preamble**

This dissertation seeks answers to the most diverse research questions I studied during my research term at the KIT ASIC and Detector Laboratory (KIT-ADL). It aims to share the experience gained by designing Application Specific Integrated Circuits (ASICs) for scientific applications such as High Energy Physics (HEP) experiments, which address the most fundamental questions of our existence. It is undeniable that electronics play a crucial role in interfacing the "real" and the "big data" world. The LHC upgrades have, by far, the single largest ASIC needs for HEP this decade [28]. The ASIC design effort required for the LHC upgrade was too large for any single institution to handle. As a result, several collaborations have been formed, such as the CMS Collaboration ([25]), the ATLAS Collaboration ([7]), and the RD53 Collaboration ([21]).

The sensors described in this dissertation are monolithic, which means the sensor and readout circuitry are integrated on the same substrate. They are also called high voltage CMOS (HVCMOS) sensors because they are fabricated in a commercial high voltage CMOS process. The application of high voltage increases the thickness of the depletion region, which is the active sensor volume (Appendix A). For many reasons, cited by [86], Monolithic Active Pixel Sensors (MAPS) have the potential to be the imaging lenses of current and future collider experiments. Following this vision, KIT-ADL has leaped forward to develop HVCMOS pixel sensors in collaboration with six different institutions. The sensor chips described in this dissertation are engineered to meet the requirements of ATLAS Inner Tracker (ITk) layer 4. The Mu3e experiment ([16]) will be the first to use HVCMOS sensors for particle physics. HV/HRCMOS sensors are also proposed for CLIC ([69]) and the LHCb Upgrade ([26]).

In an attempt to contribute to this massive ASIC development, I chose "Development of readout electronics of HVCMOS sensors for HEP experiments" as one of my research areas. Over three-quarters of this Ph.D. work was dedicated to designing the readout circuitry of HVCMOS sensor chips. Three generations of HVCMOS chips, named ATLASpix, were designed in the 180 nm AMS ah18 process. The first generation included a design variant named ATLASpix1\_M2 that supports triggered readout (Chapter 2). The second generation, ATLASpix2, contains several novel readout concepts (Chapter 4). The ATLASpix2 tape-out took longer than expected since there was a change of foundry from AMS AG to TSI Semiconductors. It was relatively simple to port the designs since both the AMS and TSI processes descend from the same parent, the GlobalFoundries (formerly IBM) CM7RF process. We used an older design kit named "h18" to port the AMS "ah18" schematics.
ATLASpix2 was taped out in both AMS and TSI processes. Although it was not tested at KIT-ADL, the HVCMOS collaboration came up with post-silicon results proving good correlation of the sensor diode characteristics between TSI and AMS chips [120]. The development cycle of each HVCMOS ASIC ranged from six months to one year.

I spent some time characterizing the first large area HVCMOS prototype and studying the effects of X-ray radiation on its electronics (Chapter 3). Post-silicon characterization of ATLASpix1\_M2 was undertaken as part of this work. A new threshold tuning algorithm was implemented and integrated into the existing test software framework. The test system for ATLASpix was developed at KIT-ADL as part of Felix Ehrler's Ph.D. work [33]. An increase in the number of institutions contributing to the development of monolithic pixel detectors resulted in several collaboration meetings to discuss the design updates and present the ongoing measurements.

ATLASpix3 (Chapter 5) was the result of a strategic decision to design a monolithic HVCMOS chip that can be used for building multi-chip modules compatible with the ITk pixel layers. It was of prime importance to ensure compatibility with the hybrid pixel readout chip, RD53A, that was designed for the ITk inner layers. The readout control unit of ATLASpix3 was redesigned to incorporate 64b/66b Aurora encoding. A command decoder with clock data and trigger recovery was designed for chip configuration and read back. The overall ATLASpix3 readout architecture was optimized for ATLAS ITk layer 4 using a simulation framework developed at KIT-ADL as part of Rudolf Schimassek's Ph.D. work [109].

During the third year of my research term at KIT, I had the idea of piloting a sub-40 nm process for a cryogenic application such as Quantum Computing (QC). I decided to implement an ADC as a test circuit along with transistor test structures.
TSMC 28 nm High Performance Computing (HPC) was the most advanced process node supported by Europractice at that time. As I was deeply intrigued by the complexity of Quantum Computing systems and their circuit design challenges, I set off to design an 8-bit SAR ADC for qubit readout (Chapter 8). In order to arrive at the specifications of the ADC, a case study was conducted on the existing control circuitry for QC [54] at QuTech, TU Delft. A face-to-face discussion with the team at QuTech helped me gain insight into the specifications of an ADC which, if silicon-proven, can be used in a QC system like the one described in [84]. In the early phase of design, there were several new ADC architectural ideas (Appendix B). Some of those were discussed and evaluated with expert help from researchers at ADVISE, KU Leuven. It was a logical consequence to adopt the SAR architecture in order to achieve a figure-of-merit competitive with the current state of the art. This test chip (TC1) might not be the most noteworthy contribution in this work, but it laid the cornerstone of my research journey.

All design efforts have been driven toward meeting the ever-increasing Power, Performance, Area (PPA) demand while thriving on the middle ground. One piece of advice I have received quite often is that, as we progress toward advanced technology nodes, one should pay attention to one's design methodology. On the contrary, it is imperative to stay focused on problem-solving rather than on methodology. The existing methodology, be it experimental or computational, will always be succeeded by superior ones. This dissertation has been an eye-opener that helped me pinpoint my research interests and develop the right approach toward problem-solving. I believe this experience will help me a great deal as I continue my research career in the development of cryogenic CMOS circuits for Quantum Computing control systems at IBM Research Lab, Zurich.

This dissertation is structured as follows:

Chapter 1 provides a historical background of the research activities in monolithic pixel detectors. It explains the concept, implementation, and challenges of HVCMOS sensor design for the ATLAS experiment.

Chapter 2 delves deep into the design of the first large area HVCMOS prototype, ATLASpix1. Various readout architectures are compared using an analogy of a single server queue.

Chapter 3 describes the measurement results of one of the ATLASpix1 design variants and provides a bird's eye view of the test system. It also describes an X-ray irradiation study up to a total ionizing dose of 100 Mrad, which is equivalent to 10 years of operation at HL-LHC.

Chapters 4 and 5 present the design details of the successive HVCMOS sensor chips, namely ATLASpix2 and ATLASpix3, with emphasis on their readout electronics.

Chapter 6 describes an RTL verification environment developed as a part of this work. It served as a full chip verification platform during the early design phase.

Chapter 7 presents an evaluation of the ATLASpix sensor ASICs against the current state of the art.

Chapter 8 aims to transition from pixel sensor design in 180 nm to an 8-bit 100 MS/s SAR ADC design in 28 nm. The scope of this dissertation is confined to the design and preliminary measurement results.

Chapter 9 is an evaluation and conclusion of this work from the author's point of view. It also throws some light on the prospects of this research.

# Chapter 1

## Introduction

### 1.1 HL-LHC: Demands and Challenges on ASIC design

CERN has planned a series of upgrades for the Large Hadron Collider (LHC). The last in this series is termed the High Luminosity LHC (HL-LHC), which will be operational from 2026. The ATLAS detector will have an all-silicon tracker in its Phase II upgrade [7]. This includes changes to the pixel-barrel and end-cap strip detectors.
In addition, the muon detector and the muon and electron triggers will be modified to improve muon resolution. In this way, trigger rates can be brought under control while maintaining constant trigger thresholds [52]. Performance requirements are critical for the innermost layer, which will be at a radius of 3-4 cm from the HL-LHC interaction point. The outer layers demand larger pixels, low-cost assembly, data aggregation, and reduced power at lower performance [92].

Technical challenges arise when there is an increased demand for performance and when the operating conditions lie well outside those of industrial applications. The latter presents a challenge for today's ASIC design, which relies heavily on the accuracy of simulation models. Here are a few challenges that need extensive R&D.

#### 1. High-bandwidth transmission

Future detector systems for HL-LHC require the transmission of large data volumes from the detector. High speed links are used for applications where streaming data out of the detector with off-detector triggering and filtering is preferred. ASICs are needed to serialize the in-detector data where commercial devices cannot be used because of, e.g., radiation requirements. Current research is focused on links between 5 and 10 Gbps [41]. The use of optical modulators to achieve a bandwidth of more than 10 Gbps is the subject of ongoing research.

#### 2. In-detector digitization, data compression, and processing

We need to transmit particle hit data from the detector to a remote location where the data is processed. One technique is to store as much data as possible on the detector until a decision is made on which data needs to be transmitted to the data acquisition system. For example, the integration of moderate-speed (a few MS/s) ADCs in front-end ASICs enables on-chip Digital Signal Processing (DSP).
This means that it is possible to have self-calibration, smart digital triggering, data compression, digital memories, and fully digital communication. On-chip digitization reduces the complexity and bandwidth of data acquisition systems.

#### 3. Radiation tolerance

The operating conditions of High Energy Physics (HEP) experiments lie well outside the coverage of device models supplied by standard IC manufacturers. This problem can be addressed in the following ways: (1) using a special IC manufacturer that supports the requirements, (2) qualifying a standard process and proving that the models provided by the foundry are valid for the desired conditions, or (3) developing custom devices and models to meet the requirements. Radiation tolerance can be divided into two areas: tolerance to cumulative damage, i.e. total ionizing dose (TID) together with Non-Ionizing Energy Loss (NIEL, e.g., from neutrons or other hadrons), and single event upset (SEU) tolerance. The next-generation hybrid pixel readout chips for ATLAS and CMS need the highest radiation tolerance, specified as 1 Grad TID [39]. The pixel detectors developed in this work target ATLAS inner tracker layer 4, where the required radiation tolerance is 100 Mrad TID and $1\times10^{15}$ to $2\times10^{15}~\rm n_{eq}/cm^2$ NIEL. In-time efficiency after full irradiation should be >95% [92].

#### 4. Low-temperature

Almost all commercial CMOS vendors focus their models on the temperature range from -40 °C to 125 °C. The device lifetimes are targeted for at least ten years. The design of cryogenic front-end ASICs for HEP requires models capable of accurately reproducing the static and dynamic response, the noise performance, and the lifetime of CMOS devices and circuits operating down to the -200 °C (about 70 K) range. These models must extend down to the weak-to-moderate inversion region, considering the low-power requirements on analog circuits.

#### 5. Non-standard processing

Standard integrated circuits contain multiple interconnect layers and a single layer of transistors. 3D technologies, however, allow more than one transistor layer. This allows many transistors to be physically closer to one another than in a 2D integrated circuit. It also helps to lower the capacitance of interconnects. Another method is to use bump bonds on both sides connected by through-hole vias. An example of non-standard processing is Monolithic Active Pixel Sensors (MAPS) that employ a quadruple well 180 nm bulk CMOS process. Silicon-On-Insulator (SOI) processes are also considered.

#### 6. High dynamic range

One of the figures of merit for front-end electronics is the dynamic range, which is defined as the ratio of the maximum to the minimum measurable charge. Dynamic range can be limited by circuits that follow the analog front-end, such as discriminators and peak detectors. A major challenge with deep submicron technologies comes from the decreased supply voltage. In order to achieve a high dynamic range of a few thousands of electron volts, low-noise design techniques must be adopted.

#### 7. Fast timing

The measurement of the time of arrival of a sensor signal relative to a reference clock requires a good match between the analog signal processing and the time measurement domain. For signals with fixed shapes, time invariant techniques such as constant fraction or zero-crossing have already been implemented. In analog waveform sampling ASICs, input waveforms are sampled typically via Delay Locked Loops (DLL). The trade-off is between the number of storage cells per channel and the maximum analog input bandwidth. Dynamic range is limited by the maximum supply voltage and the size of the sampling capacitor ($kT/C$ noise), which in turn limits the maximum input bandwidth. Sampling rates increase with smaller feature sizes or faster processes.

#### 8. Reliability

The longevity of semiconductor devices is important for HEP experiments targeted to run reliably for more than ten years. Most of the major failure mechanisms, such as electro-migration, hot carrier injection, time dependent dielectric breakdown, and negative bias temperature instability, need exhaustive modeling for use in HEP. Another area of concern is the operating temperature of 70 K, which is well below the minimum temperature guaranteed by CMOS foundries (233 K). Impact ionization, which causes interface state generation and oxide trapped charge, can substantially affect the lifetime of CMOS devices.

### 1.2 State-of-the-art: Hybrid vs Monolithic approach

High Energy Physics (HEP) experiments such as those at the LHC have generated the single largest demand for ASICs this decade. Keeping in mind the above-mentioned requirements and challenges, we need to develop high performing, radiation hard, and cost-efficient pixel detectors. There are two main approaches: 1) hybrid sensors, where the sensor and the readout chip are separate entities that are connected by bump-bonds, and 2) monolithic pixel sensors, which are systems-on-chip with integrated sensor and readout. Some experiments such as ALICE, BELLE II, and STAR have chosen monolithic pixel sensors for their upgrade. For the ALICE ITS upgrade, a 10 m$^2$ monolithic pixel detector will be installed [2]. It is, by far, the largest pixel detector using monolithic pixel sensors. Monolithic pixel detectors have been proposed for the phase II upgrade of the ATLAS Inner Tracker (ITk) pixel barrel layer 4. This work is about the development of three monolithic CMOS pixel sensors to prove their ability to handle such extreme requirements.

**Table 1.1:** Particle hit-rate and radiation levels of various HEP experiments

| | STAR | Belle II | ALICE | ILC | LHC | HL-LHC Outer | HL-LHC Inner |
|---|---|---|---|---|---|---|---|
| BX-time (ns) | 110 | 2 | 20,000 | 350 | 25 | 25 | 25 |
| Hit rate (kHz/mm$^2$) | 4 | 400 | 10 | 250 | 1000 | 1000 | 10,000 |
| NIEL ($\rm n_{eq}/cm^2$) | $10^{12}$ | $3 \times 10^{12}$ | $> 10^{13}$ | $10^{12}$ | $2 \times 10^{15}$ | $10^{15}$ | $2 \times 10^{16}$ |
| TID (Mrad) | 0.2 | 20 | 0.7 | 0.4 | 80 | 50 | > 1000 |

Let us evaluate the hybrid and monolithic design approaches by considering the following factors: (1) spatial resolution, (2) radiation tolerance, (3) complexity of readout circuitry, and (4) area and cost. Since the sensor and readout chip are separate in hybrid pixel technology, both these parts can be tailored independently to meet the demands, i.e., the sensor can be optimized for radiation tolerance, and the readout chip can be made to process high particle hit rates of the order of MHz/mm<sup>2</sup>. Hybrid pixels have proven their radiation hardness up to $10^{15} \text{ n}_{eq}/\text{cm}^2$ and beyond. Spatial resolutions of the order of 10 $\mu\text{m}$ have been demonstrated [131]. The technology used in hybrid detector ROICs has been trending towards deep submicron commercial CMOS in order to meet the increasing particle hit rate and radiation tolerance demands. Monolithic detectors have also made their way to achieving high detection efficiency and radiation tolerance. The monolithic sensors developed in this work are proven to be radiation hard up to 100 Mrad (Chapter 3) in laboratory tests. Test beam studies show that an ATLASpix chip irradiated up to a fluence of $10^{15}$ $n_{eq}/cm^2$ exhibits an average detection efficiency of 99.4% [58].
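
As a rough cross-check of the hit rates in Table 1.1, the mean number of hits per pixel per bunch crossing can be estimated as hit rate × pixel area × bunch-crossing time. The following Python sketch performs this arithmetic. It is only an illustration under stated assumptions: the pixel size of 50 × 250 µm² is borrowed from the FE-I4 pixel purely as an example, the table values are treated as average rates, and the function and variable names are hypothetical.

```python
# Mean hits per pixel per bunch crossing = hit rate x pixel area x BX time.
# Hit rates and BX times are taken from Table 1.1.
# ASSUMPTION: a 50 um x 250 um pixel (FE-I4 size), used only for illustration.
PIXEL_AREA_MM2 = 0.050 * 0.250

# experiment: (bunch-crossing time in ns, hit rate in kHz/mm^2)
TABLE_1_1 = {
    "STAR": (110, 4),
    "Belle II": (2, 400),
    "ALICE": (20_000, 10),
    "ILC": (350, 250),
    "LHC": (25, 1000),
    "HL-LHC (highest rate in Table 1.1)": (25, 10_000),
}


def hits_per_pixel_per_bx(bx_time_ns, hit_rate_khz_mm2,
                          pixel_area_mm2=PIXEL_AREA_MM2):
    """Return the mean number of hits in one pixel during one bunch crossing."""
    rate_hz_per_pixel = hit_rate_khz_mm2 * 1e3 * pixel_area_mm2
    return rate_hz_per_pixel * bx_time_ns * 1e-9


# Hypothetical helper dictionary: experiment name -> hits per pixel per BX.
occupancy = {
    name: hits_per_pixel_per_bx(bx_ns, rate)
    for name, (bx_ns, rate) in TABLE_1_1.items()
}

if __name__ == "__main__":
    for name, value in occupancy.items():
        print(f"{name:36s} {value:.2e} hits/pixel/BX")
```

With the assumed pixel size, the LHC column corresponds to roughly 3 × 10⁻⁴ hits per pixel per bunch crossing, and the highest HL-LHC rate to ten times that. A different pixel area scales all results proportionally.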
The monolithic approach has to limit its feature set, since the sensor and readout circuitry are integrated. The hybrid approach also has some disadvantages. It has a large material budget due to different components such as sensor, ROIC, flex kapton, passive components, cooling structures, and maintenance. The module production, including bump-bonding and flip-chipping, is complex owing to the large number of production steps. Increased cost efficiency combined with radiation tolerance has been the driving force behind the development of monolithic pixel sensors, especially for the outer layers of the ATLAS ITk, where the particle hit rate is lower than in the inner layers. The following sections elaborate on various design aspects of hybrid and monolithic sensors.

**Figure 1.1:** Hybrid pixel detector

#### 1.2.1 Hybrid pixel detector

Figure 1.1 shows the principle of hybrid pixel technology. A silicon pn-diode acts as the sensor, and a readout chip whose cells map one-to-one onto the sensor pixels is connected to it via bump-bonds using flip-chip technology. When an ionizing particle passes through the sensor, electron-hole pairs are generated. They move under the influence of an applied electric field, thereby inducing a signal on the pixel electrode. The signal is amplified, discriminated, and digitized by the electronics circuitry in the pixel cell and transmitted to the readout chip. Current hybridization technology can afford bump pitches in the order of 25-50 $\mu$m, which will eventually be limited to 5-10 $\mu$m by galvanic or evaporation methods [62]. Since there is a one-to-one correspondence between the sensor's pixel area and the chip's pixel area, the smallest pixel size is determined by the area of the CMOS electronics that needs to amplify, digitize, and store the hit information in the area occupied by the pixel.

**Figure 1.2:** CCPD\_LF chip [49]: Block diagram of the readout circuitry of the CCPD\_LF chip attached to the FE-I4 readout chip. Three pixels of the CCPD\_LF chip are connected to one pixel of the FE-I4 chip. Waveforms of the top pixel (Pix1), middle pixel (Pix2), and bottom pixel (Pix3) at each point are shown in blue, green, and red, respectively.

For a detector thickness of about $150-200~\mu m$, the spread of the signal caused by a Minimum Ionizing Particle (MIP) is calculated to be $4-8~\mu m$. A pixel pitch well below this may lead to excessive charge sharing [41]. Smaller pixel sizes can be achieved by employing the smart sensors proposed by [85]. Such a sensor has pixels that can distinguish particle signals from noise by means of pulse height discrimination. The pixel position is encoded as amplitude. Several pixels can be grouped, and their addresses can be transmitted through a common line. Three smart pixels of size $33 \times 125 \ \mu \text{m}^2$ can be capacitively coupled to one pixel of the FE-I4, the second generation standard readout chip. The corresponding FE-I4 pixel size is $50 \times 250 \ \mu \mathrm{m}^2$. The pixel that was hit can be decoded using its ToT value. This approach has demonstrated good results in a test beam study conducted by [49]. The CCPD sensor chip developed at KIT-ADL [139] can be read out using RD53A, the third generation standard readout IC (ROIC). Typical noise values for the present LHC detectors are 150 e<sup>-</sup> [1], which have been reduced to 80 e<sup>-</sup> using smaller pixel sizes in [39].

**Figure 1.3:** (a) Depleted Monolithic Active Pixel Sensor (DMAPS) in a quadruple well process; the deep n-well acts as the charge collection electrode. (b) Capacitances associated with the sensor.

#### 1.2.2 Monolithic CMOS pixel detector

Modern CMOS processes allow us to integrate numerous functionalities on a chip. A monolithic pixel detector is essentially a System on Chip (SoC). The sensor diode, analog and digital signal processing, storage, power regulation, monitoring, and safety functions are implemented on-chip.
Monolithic pixel detectors are being studied for their potential applications in HEP experiments, which is the major topic of this dissertation. Monolithic detectors have the potential to provide a low-cost, high-precision tracking solution with simplified interconnections compared to hybrid pixel detectors. Different design approaches are pursued within the CMOS design community to achieve high radiation tolerance and to cope with high particle hit rate environments such as the HL-LHC. The monolithic sensor design may require some process tweaks such as 1) high resistivity substrates, 2) high-voltage application, and 3) multi-well technologies. The first fully depleted high resistive silicon sensors with integrated readout were developed for smartphone imaging in commercial CMOS processes. However, radiation tolerance is an extra challenge when it comes to HEP applications.

In the early 1990s, the idea of employing commercial CMOS technology to design a monolithic CMOS detector was proposed by [83]. A few years later, Monolithic Active Pixel Sensors (MAPS) were introduced, in which an epitaxial layer (1-20 $\mu$m) was used for sensing and for containing the CMOS circuitry [123]. The charge collection mechanism was diffusion rather than drift, which results in low amplitude signals and lower radiation tolerance. MAPS saw their first usage in experiments with low radiation levels and low hit rates such as STAR at RHIC [27]. The ALICE ITS upgrade chose a MAPS based on the 180 nm Towerjazz process, called the ALPIDE sensor [73]. An improved solution for MAPS with faster charge collection by drift was proposed by [87]. Initially termed HVMAPS and later HVCMOS, this new sensor type uses a low doped PN junction with a high reverse bias as the sensor. A space charge region is induced by the application of the high reverse bias.

**Figure 1.4:** Capacitive coupling of sensor and amplifier
When a particle traverses this region, electron-hole pairs are generated, which drift toward the charge collection electrode. The fundamental difference between HVCMOS sensors and MAPS is that the charge collection happens by drift under the influence of a strong electric field. A test chip was designed in a high voltage 0.35 $\mu m$ CMOS process. The depth of the depletion region, $d$, is given by equation 1.1 (see the derivation in Appendix A), where $\rho$ is the substrate resistivity and $V$ is the applied reverse bias voltage.

$$d \propto \sqrt{\rho V} \tag{1.1}$$

The basic requirements for a high signal-to-noise ratio (SNR) in HVCMOS sensors are high voltage and high substrate resistivity. High voltage technology is commonly used in the automotive and power management industries. A number of foundries offer CMOS processes with high voltage handling capability, which enables the creation of a depletion layer with a depth in the order of tens of microns around a well. Multiple wells can be used to isolate transistors in order to optimize their radiation hardness. Negotiation with the foundry is required for such processes, and sometimes design rule changes are necessary. A high resistive substrate is another requirement for HVCMOS in order to induce a depletion layer with moderate bias voltages. Backside processing facilitates additional control over the electric field within the device.

Detector capacitance is another important parameter that affects the performance of the sensor. The most significant contribution is the well-to-well capacitance, $C_{pn}$, between the p-well and the deep n-well (figure 1.3b), which can be as large as 100 fF depending on the fill-factor area. This adds to the total detector capacitance, $C_d$. A large capacitance at the Charge Sensitive Amplifier (CSA) input results in increased thermal noise and detector response time.
$$ENC_{thermal}^2 \propto \frac{4kTC_d^2}{3g_m\tau} \tag{1.2}$$

$$\tau_{CSA} \propto \frac{C_d}{g_m C_f} \tag{1.3}$$

There are two design variants, which take the following facts into account:

1. A smaller detector capacitance leads to lower noise and a shorter response time, and consequently to improved time resolution.
2. A large fill factor provides a uniform electric field, a large signal-to-noise ratio, and an increased radiation tolerance at the cost of detector response time.

The first approach is called the "small charge collection electrode" or "small fill-factor" approach, since the deep n-well that acts as the charge collection electrode has a small area. The CMOS circuits are placed outside the deep n-well. Owing to the small fill factor in a modified CMOS process, the node capacitances are reduced to 5-20 fF [115]. This enables the design of sensors with fast timing and improved noise performance. The radiation tolerance, however, is adversely affected by the longer drift distance to the charge collection point. Small pixel sizes are advisable to take advantage of this approach. The first prototype, named "TJ investigator", in TJ 180 nm demonstrated a timing resolution of about 16 ns [93].

The second approach is known as the "large charge collection electrode" or "large fill-factor" approach, since the large deep n-well acts as the charge collection electrode. This approach offers higher radiation tolerance owing to the small charge collection distance, which lowers the possibility of charge trapping after irradiation. However, the large deep n-well to p-well capacitance leads to a longer response time. ATLASpix1 sensor chips (Chapters 2, 4, and 5) follow this approach. HVCMOS prototypes have proven their radiation tolerance up to 100 Mrad and $1\times10^{15}~\rm n_{eq}/cm^2$, yielding an excellent efficiency of 99.7% in test beam. Their in-time efficiency was higher than that of the FE-I4 hybrid pixel modules (95%) used in the same test beam [14].
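The proportionalities (1.1) to (1.3) can be used for a quick numerical comparison of the two design variants. The following is a minimal sketch, assuming that constant prefactors are dropped so that all results are in arbitrary units and only ratios are meaningful. The function names and the example values are illustrative assumptions, not measured parameters of any chip discussed here.

```python
import math


def depletion_depth_au(rho: float, v_bias: float) -> float:
    """Depletion depth in arbitrary units, d ∝ sqrt(rho * V) (eq. 1.1)."""
    return math.sqrt(rho * v_bias)


def enc_thermal_au(c_d: float, g_m: float, tau: float) -> float:
    """Thermal ENC in arbitrary units, from ENC^2 ∝ C_d^2 / (g_m * tau) (eq. 1.2).

    The constant factor 4kT/3 is dropped, so only ratios are meaningful.
    """
    return math.sqrt(c_d ** 2 / (g_m * tau))


def tau_csa_au(c_d: float, g_m: float, c_f: float) -> float:
    """CSA time constant in arbitrary units, tau ∝ C_d / (g_m * C_f) (eq. 1.3)."""
    return c_d / (g_m * c_f)


if __name__ == "__main__":
    # Hypothetical comparison: ten times higher substrate resistivity, same bias.
    depth_gain = depletion_depth_au(10.0, 1.0) / depletion_depth_au(1.0, 1.0)
    print(f"depletion depth ratio: {depth_gain:.2f}")  # about 3.16

    # Hypothetical comparison: 20 fF versus 100 fF detector capacitance,
    # with g_m, tau and C_f held fixed (all set to 1 in arbitrary units).
    enc_ratio = enc_thermal_au(20.0, 1.0, 1.0) / enc_thermal_au(100.0, 1.0, 1.0)
    tau_ratio = tau_csa_au(20.0, 1.0, 1.0) / tau_csa_au(100.0, 1.0, 1.0)
    print(f"thermal ENC ratio: {enc_ratio:.2f}, time constant ratio: {tau_ratio:.2f}")
```

Because both the thermal ENC and the CSA time constant scale linearly with $C_d$ under these assumptions, a fivefold reduction in capacitance reduces both by the same factor, which is the motivation behind the small fill-factor variant.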
The difference in response time for a small signal (2000 e<sup>-</sup>) and a large signal (7200 e<sup>-</sup>) input (referred to as "time walk") is measured to be 20 ns [91]. For both approaches, careful design of the sensor is required to prevent digital signals from coupling to the sensor part and creating fake particle hits. One solution is to keep the digital logic physically separated from the active region. Digital switching can inject parasitic signals into sensitive analog circuitry through substrate coupling. Isolating the analog circuit from the bulk substrate using a triple well process is a well-known technique. Some processes, such as LFoundry 150 nm (LF-150), offer additional wells. Such isolation implants have been explored in the design of monolithic CMOS sensors (e.g., the ATLASpix1 IsoPMOS variant, figure 1.5b).

**Figure 1.5:** DMAPS design variants: (a) Small fill-factor design: Schematic cross-section of a pixel in the modified Towerjazz process. At very low reverse collection electrode bias, the depletion of the low dose n-type implant is only partial around the collection electrode. For higher reverse biases, the depletion reaches the n-well implant for the collection electrode, yielding a low sensor capacitance [115]. (b) Large fill-factor design: HVCMOS sensor with isolated PMOS in AMS 180 nm. The deep n-well acts as the charge collecting electrode [91].

#### 1.2.3 Readout Architecture

Readout architecture refers to the transfer of data from the pixels to storage buffers. The traditional approach is referred to as "Column Drain (CD)". More advanced readout architectures, such as Parallel Pixel to Buffer (PPtB) (Chapter 2), are explored in order to cope with high particle hit rates. In general, readout architectures can be classified as triggered or triggerless. In a triggerless readout, the data of every pixel hit are transferred to and stored in a hit buffer and eventually read out serially.
In a triggered readout, the hits are temporarily stored in the hit buffers and are later filtered based on the presence of an external trigger signal. The triggerless readout demands high output bandwidth, since all the hits are read out. In a triggered readout system, the hits need to be stored in buffers until the trigger latency elapses. Readout logic blocks must be explicitly designed to make decisions based on a trigger signal. In such systems, the output bandwidth requirement is lower than for a triggerless readout.

#### 1.2.4 Serial data link

The ATLAS and CMS upgrades plan an output bandwidth of 5.12 Gbps per ROIC. ATLAS and CMS require long cable runs, which means a high transmission line loss in terms of signal attenuation. DC balance avoids low frequencies by guaranteeing at least one transition in every n bits (e.g., n = 5 for 8b/10b encoding). Equalization attenuates lower frequencies more than higher frequencies (the opposite of what cables do) to achieve a flat response within the band. These well-known textbook facts play a critical role in pixel detector design. The telecommunications and consumer electronics industries optimized their performance in lossy transmission long ago, and HEP ASICs have adopted commercial protocols and solutions. State-of-the-art equalization included in commercial FPGAs can achieve reliable transmission with line losses as high as 28 dB. While FE-I4 used 8b/10b encoding, the RD53A ROIC will use an open-source implementation of a commercial protocol with 64b/66b encoding, including a multi-lane version for balancing data over four 1.28 Gbps outputs. Equalization will be an integral part of these systems, with pre-emphasis capabilities that boost high frequencies at the transmitter (sending a purposely distorted signal to counteract the cable distortion).
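The bandwidth figures above can be related to each other with a small calculation. This is a minimal sketch, assuming that the only overhead is the line code itself (8 payload bits per 10 transmitted bits, or 64 per 66) and ignoring any protocol framing or idle words; the function and dictionary names are illustrative.

```python
# Payload bits and transmitted bits per code word for the two line codes
# mentioned in the text.
LINE_CODES = {
    "8b/10b": (8, 10),
    "64b/66b": (64, 66),
}


def aggregate_line_rate_gbps(lane_rate_gbps: float, lanes: int) -> float:
    """Total raw line rate of a multi-lane link."""
    return lane_rate_gbps * lanes


def payload_rate_gbps(lane_rate_gbps: float, code: str, lanes: int = 1) -> float:
    """Payload rate after subtracting the line-code overhead only."""
    payload_bits, coded_bits = LINE_CODES[code]
    return aggregate_line_rate_gbps(lane_rate_gbps, lanes) * payload_bits / coded_bits


if __name__ == "__main__":
    # Four 1.28 Gbps lanes add up to the planned 5.12 Gbps per ROIC.
    print(f"raw rate, 4 lanes: {aggregate_line_rate_gbps(1.28, 4):.2f} Gbps")
    for code in LINE_CODES:
        rate = payload_rate_gbps(1.28, code, lanes=4)
        print(f"{code}: {rate:.3f} Gbps payload over 4 lanes")
```

Under these assumptions, 8b/10b carries 80% payload while 64b/66b carries about 97%, which is one reason a 64b/66b code is attractive when the line rate per lane is fixed.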
Even with these techniques, because of various constraints including radiation damage, the ATLAS experiment aims to keep transmission line losses below 20 dB.

#### 1.2.5 Powering schemes

The planned high luminosity upgrades will face a major challenge in power distribution owing to the large detector area and increased power density. The IR drop causes heat dissipation in the cables, which is the limiting factor for power.

$$P_{limit} = I_{supply}^{2} R_{min} \tag{1.4}$$

$$I_{supply} = \sqrt{\frac{P_{limit}}{R_{min}}} \tag{1.5}$$

Approximate values for ATLAS are $P_{limit} = 10$ kW and $R_{min} = 0.5 \text{ m}\,\Omega$, yielding an $I_{supply}$ of roughly 5 kA [5]. Given the power per unit area requirement of the detector layers, supplying power directly from a 5 kA current rack is out of the question. Two voltage conversion methods have been proposed: DC-DC conversion and serial powering. Serial powering achieves power conversion by connecting devices in series and operating them with a constant current instead of a constant voltage. Shunt voltage regulators have been implemented in ATLASpix2 as a test block and in ATLASpix3 as an integrated block. Serial powering has been implemented in the FE-I4 and RD53A readout ICs [56].

#### 1.2.6 Radiation tolerance

A total particle fluence of $10^{15} \, n_{\rm eq}/{\rm cm}^2$ is projected over a decade of lifetime at the LHC. The damage created by charged and neutral particles, mostly pions and protons near the interaction point, has been normalized to the equivalent damage of 1 MeV neutrons ($n_{\rm eq}$). This fluence of particles causes lattice damage by collisions with the lattice atoms (non-ionizing energy loss), but also by ionization of atoms, which corresponds to a total dose of 600 kGy in 250 $\mu$m silicon bulk material assuming minimum ionizing particles (MIPs). The ionizing radiation dose at the LHC is due to a combination of minimum ionizing particles such as pions and background X-ray radiation.
The doping concentrations in CMOS transistors are $10^{15}$ cm$^{-3}$ or more, for which the defect density introduced by bulk radiation damage is negligible. However, there are many dielectric structures in a modern CMOS process, which lead to radiation effects due to total ionizing dose. A dose of 1 Grad corresponds to 50 MIPs crossing every silicon lattice cell [132]. The effect of radiation on electronics by different particles can be classified as shown in Table 1.2.

**Table 1.2:** Photon and Neutron radiation effects on silicon [13]

| Radiation type | Energy range | Interaction type | Primary effect on Si and SiO<sub>2</sub> |
|---|---|---|---|
| Photons | Low | Photo-electric effect | Ionization |
| Photons | Medium | Compton effect | Ionization |
| Photons | High | Pair production | Ionization |
| Neutrons/Protons | Low | Capture, nuclear reaction | Displacement |
| Neutrons/Protons | High | Elastic scattering | Displacement |

#### Effect of radiation on MOSFETS

Ionizing radiation damage in MOSFETs is entirely due to charge carriers getting trapped in the dielectric layers and not due to silicon bulk damage. The oxide structures in a bulk-CMOS process are the gate oxide, the shallow trench isolation (STI) oxide, and the gate spacers, as shown in figure 3.9. There are two major hole trapping mechanisms caused by long term ionizing radiation: 1) generation of positive oxide traps and 2) activation of interface traps, which can be positive, negative, or neutral. The former is due to $V_k$ defects arising from oxygen vacancies in $SiO_2$ (figure 1.7), and the latter is due to dangling bonds at $P_b$ centres (figure 1.6). The unoxidized silicon atoms at the $Si-SiO_2$ interface generally contain an unsaturated valence electron, which leads to the formation of dangling bonds [66]. This region is treated with $H_2$ to form Si-H bonds during the passivation process in order to reduce active interface states.
Radiation can cause the release of hydrogen, resulting in the re-creation of dangling bonds.

**Figure 1.6:** $P_b$ defect located at a Si/SiO<sub>2</sub> interface with (111) orientation. The defect is formed by an unpaired valence electron of a silicon atom back-bonded to three other silicon atoms. The defect's trap energy lies in the silicon band-gap. Thus, the charge state of the trap depends on the Fermi-level and it is electrically active [46].

Ionizing radiation generates electron-hole pairs in the oxide, of which some recombine and some drift under the influence of the applied electric field. The fraction of electron-hole pairs that escapes recombination is called the electron-hole yield or charge yield [112]. The electrons drift quickly under the gate bias. The holes continue their slow hopping transport to the $Gate-SiO_2$ interface or the $Si-SiO_2$ interface, depending on the applied gate potential. A hole causes a distortion in the local potential field of the $SiO_2$ lattice. The combination of this strain field and the hole itself is known as a polaron [61]. Polarons increase the effective mass of the holes and hence decrease their mobility. The holes thus trap themselves in deep traps called oxide traps ($E_0$). The number of holes that get trapped is highly dependent on device fabrication.

The charge associated with trapped holes changes the threshold voltage of transistors. The shift in threshold is negative for both n-channel and p-channel MOSFETs. Since a PMOS transistor has a negative threshold, the shift increases the magnitude of its threshold voltage. Since the threshold voltage of an NMOS transistor is positive, the shift decreases the magnitude of its threshold voltage. This results in Radiation Induced Leakage Current (RILC) in n-channel MOSFETs. PMOS transistors are tolerant towards RILC. Neutralization of oxide traps (annealing) can occur primarily by two mechanisms:
1) the tunneling of electrons from the silicon into the oxide traps, and 2) the thermal emission of electrons from the oxide valence band into the oxide traps. For tunneling, the spatial distribution of the oxide traps must be close to the $Si-SiO_2$ interface, and for thermal emission, the energy levels of the oxide traps must be close to the oxide valence band. The rate of neutralization is also temperature and bias dependent [111].

**Figure 1.7:** Schematic representation of different charge states of the oxygen vacancy in amorphous $SiO_2$. A neutral vacancy (a) can transform into a negatively charged vacancy (b) by trapping an electron and into a positively charged vacancy (c) by trapping a hole. Double ionization of a neutral oxygen vacancy or trapping of an extra hole by a positively charged vacancy induces strong distortion of the flexible amorphous network and creation of a $V_{\alpha-}^{2+}$ center (d) or a $V_{k-}^{2+}$ center (e) [60].

**Figure 1.8:** Band diagram of an MOS capacitor with a positive gate bias as illustrated by [110]

In addition to oxide traps, radiation can cause activation of interface traps at the $Si-SiO_2$ interface. Interface traps exist within the silicon bandgap at the $Si-SiO_2$ interface [81]. Interface trap buildup occurs on time frames much slower than oxide trap charge buildup. Interface-trap buildup can take thousands of seconds to saturate after a pulse of ionizing radiation [76]. For polysilicon-gate transistors, the electric field dependence of interface-trap buildup is very similar to the electric field dependence of oxide-trap charge buildup [112]. The interface traps may act like a donor or an acceptor depending on their position in the energy band gap with respect to the Si Fermi level. Figure 1.9 illustrates the determination of the charge state of an amphoteric interface trap in weak inversion. Under these conditions the upper peak is totally empty and therefore electrically neutral.
The lower peak is filled to approximately two thirds and is positively charged. Donor-like energy levels are located in the lower half of the band-gap, around 0.25 eV [100] above the valence band edge. These trap levels are positively charged when empty and electrically neutral when occupied by an electron. P-channel devices are affected by the positively charged interface traps, which predominantly cause a negative threshold shift. Acceptor-like energy levels are located in the upper half of the band-gap, around 0.85 eV [100] above the valence band. These trap levels are electrically neutral when empty and negatively charged when occupied by an electron. N-channel devices are affected by the negatively charged interface traps, which predominantly cause a positive threshold shift. Since the oxide traps are positive for both PMOS and NMOS, while the interface traps are positive for PMOS and negative for NMOS, their combined effects add up for PMOS devices and compensate each other for NMOS devices.

**Figure 1.9:** Energy diagram of a $P_b$ center at the $Si-SiO_2$ interface in weak inversion. The trap consists of donor-like states in the lower half and acceptor-like states in the upper half of the band-gap due to its amphoteric nature. The Fermi level determines the filling and therefore the charge state. In this configuration, the trap is slightly positively charged.

The holes trapped in the gate oxide can cause a shift in threshold voltage for processes like 180 nm. For thinner gate oxides (below 6 nm), quantum mechanical tunneling is significant. Tunneling causes an increase in gate leakage current, which led to process modifications such as high-k metal gates instead of silicon dioxide gates for nodes below 40 nm. In the case of the shallow trench oxide, the net positive interface charge results in parasitic lateral gating, termed Radiation Induced Narrow Channel Effect (RINCE).
In the case of a spacer, the trapped charge modifies the source and drain regions, which is termed Radiation Induced Short Channel Effect (RISCE). RINCE affects PMOS and NMOS transistors in opposite ways. NMOS transistors develop a parasitic standing current, but this does not interfere with the transistor action; it simply adds to it. PMOS transistors are parasitically turned off near the sides, which hinders the transistor action. A critical question for PMOS is how far from the sides the channel is affected by the interface charge. The effect can be visualized as a radiation dependent width change, such that the effective width is $W_{eff} = W_{layout} - 2\Delta W_{RINCE}(TID)$, where $W_{layout}$ is the width as drawn, $\Delta W_{RINCE}(TID)$ is the distance from the STI-channel interface over which the channel is affected, and TID is the Total Ionizing Dose. From the observation that minimum width PMOS devices in 65 nm feature size are highly degraded at 500 Mrad, while those in 130 nm feature size are mildly degraded, we can say that $\Delta W_{RINCE}(500\ \mathrm{Mrad}) \approx 30\ \mathrm{nm}$.

RISCE affects PMOS and NMOS in a similar way, by impeding charge flow between the source or drain and the channel. It can be roughly modeled as adding a certain length to the channel (a longer transistor conducts less current than a shorter one). Some designers may therefore think that making very short transistors would be a good strategy against RISCE, because this would be a way to compensate. But the opposite is true: longer transistors are less affected by RISCE, because the relative change in effective length is small if the original device is long to begin with. Once again we can write $L_{eff} = L_{layout} + 2\Delta L_{RISCE}(TID)$, where we are now adding effective length rather than subtracting effective width. The magnitude of the effect is about twice as large in PMOS as in NMOS. A 60 nm long PMOS (NMOS) will experience a 70% (30%) reduction in full-on current after 500 Mrad.
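The effective-geometry picture of RINCE and RISCE can be turned into a short calculation. The following is a minimal sketch, assuming that the drain current simply scales as $W_{eff}/L_{eff}$; the function names, the example widths and lengths, and the $\Delta L$ of 20 nm used in the comparison are illustrative assumptions, while $\Delta W_{RINCE} \approx 30$ nm is the value quoted above.

```python
def effective_width_nm(w_layout_nm: float, delta_w_nm: float) -> float:
    """W_eff = W_layout - 2 * dW_RINCE, clamped at zero for very narrow devices."""
    return max(w_layout_nm - 2.0 * delta_w_nm, 0.0)


def effective_length_nm(l_layout_nm: float, delta_l_nm: float) -> float:
    """L_eff = L_layout + 2 * dL_RISCE."""
    return l_layout_nm + 2.0 * delta_l_nm


def delta_l_from_length_factor_nm(l_layout_nm: float, length_factor: float) -> float:
    """Invert L_eff = length_factor * L_layout = L_layout + 2 * dL for dL."""
    return l_layout_nm * (length_factor - 1.0) / 2.0


def relative_current(w_nm: float, l_nm: float, delta_w_nm: float, delta_l_nm: float) -> float:
    """Current after irradiation relative to before, assuming I ∝ W_eff / L_eff."""
    width_ratio = effective_width_nm(w_nm, delta_w_nm) / w_nm
    length_ratio = l_nm / effective_length_nm(l_nm, delta_l_nm)
    return width_ratio * length_ratio


if __name__ == "__main__":
    # RINCE with dW = 30 nm: a narrow device loses far more width than a wide one.
    for width in (120.0, 1000.0):
        print(width, effective_width_nm(width, 30.0) / width)

    # RISCE with a hypothetical dL = 20 nm: the long device is affected much less.
    for length in (60.0, 600.0):
        print(length, relative_current(1000.0, length, 0.0, 20.0))

    # An effective-length factor of 2.5 on a 60 nm device corresponds to dL = 45 nm.
    print(delta_l_from_length_factor_nm(60.0, 2.5))
```

With these inputs a 120 nm wide device keeps half of its width while a 1000 nm wide device keeps 94%, and a 600 nm long device keeps about 94% of its current where a 60 nm device keeps 60%. This illustrates why wider and longer transistors are the more robust choice under this simple model.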
Since transconductance scales as 1/L, a 70% (30%) reduction in current is equivalent to a factor of 2.5 (1.4) increase in length. For the original channel length of 60 nm this implies $\Delta L_{RISCE}(500\ \mathrm{Mrad}) \approx 45$ nm (12 nm) for PMOS (NMOS). The RINCE and RISCE effects are modulated by transistor bias and by temperature. In general, both effects occur only when transistors are powered, which means there are electric fields in the STI and spacer oxides. The larger the field (which depends on the transistor bias conditions), the greater the effect, though there are quantitative differences between NMOS and PMOS.

**Figure 1.10:** Conceptual diagram of a linear transistor (left) and an enclosed layout transistor (ELT) (right)

#### 1.2.7 Radiation tolerant IC design

Radiation hard device engineering includes using silicon with increased oxygen content supplied in the silicon growth process [68] and operation at low temperature. Both help to reduce the radiation damage to the detector. Recent reviews of radiation damage in silicon sensors in the HL-LHC environment are described in [77]. Using custom enclosed layout transistors (ELT) improves the radiation tolerance of MOSFETs, as the gate completely surrounds the source or drain, thereby reducing channel edge effects. ATLASpix designs use ELTs to achieve radiation hardness in full custom blocks. Since the larger area of ELTs decreases logic density, such designs lag behind their commercial counterparts that follow Moore's law. An alternative method is to use linear transistors to take advantage of their logic density. After fabrication, the chip can be characterized for radiation tolerance. The resulting values can be used for accurate modeling of radiation damage in transistors. These radiation models can be custom made and used along with the standard process, voltage, and temperature (PVT) corners, enabling their use in synthesized digital logic. This helps in timing closure for setup, hold, and propagation delays after irradiation.
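How a radiation corner might enter a setup timing check can be sketched in a few lines. This is a deliberately simplified illustration, assuming that the radiation model reduces to a single multiplicative derating factor on the path delays; real sign-off uses characterized library corners, and every name and number below is hypothetical.

```python
def setup_slack_ns(
    clock_period_ns: float,
    clk_to_q_ns: float,
    logic_delay_ns: float,
    setup_time_ns: float,
    derate: float = 1.0,
) -> float:
    """Setup slack of a register-to-register path.

    The launch (clock-to-Q) and combinational delays are scaled by `derate`
    to mimic slower transistors after irradiation. The setup time itself is
    left unscaled in this simplified model.
    """
    data_arrival_ns = (clk_to_q_ns + logic_delay_ns) * derate
    return clock_period_ns - data_arrival_ns - setup_time_ns


if __name__ == "__main__":
    # Hypothetical path: 6.25 ns clock period, 4.8 ns total path delay.
    nominal = setup_slack_ns(6.25, 0.3, 4.5, 0.2)
    irradiated = setup_slack_ns(6.25, 0.3, 4.5, 0.2, derate=1.3)
    print(f"nominal slack: {nominal:.2f} ns")
    print(f"slack with 30% delay derating: {irradiated:.2f} ns")
```

In this example a path that meets timing in the nominal corner fails once the delays are derated by 30%, which is the kind of violation a radiation corner is meant to expose during synthesis and place-and-route.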
Radiation models can also be used in analog and mixed-mode simulations during the design phase. For analog design, it is possible to choose transistor geometries that are least affected by radiation.

#### 1.2.8 Exploring new technologies for radiation tolerance and fast timing

The charge collection time of a typical hybrid detector lies in the range of 3-10 ns, depending on the sensor thickness and electric field. Low Gain Avalanche Diode (LGAD) structures have been developed to achieve picosecond timing resolution [116]. However, their radiation tolerance still remains an issue. Fully Depleted Silicon-On-Insulator (FDSOI) technology includes a buried oxide layer that isolates the CMOS circuitry from its substrate. Due to the thick buried oxide layer, the monolithic pixel sensors in this technology suffer from radiation damage. New developments in thin film SOI are being investigated for high radiation tolerance to cope with LHC demands [47].

**Figure 1.11:** ATLAS tracker showing pixel barrel layers [24]

# 1.3 A brief history of pixel detectors and their readout chips from LHC standpoint

This section chronologically lists the research advancements made in pixel detectors for the LHC. A general introduction to pixel detectors and their applications can be found in [130]. Near the interaction point at the LHC, pixel detectors must be capable of coping with the high density and rate of particle hits and must withstand the harsh radiation levels. Pixel detectors record 3D space points, which are necessary for pattern recognition and tracking near the interaction point. The channel density of pixel detectors is more than an order of magnitude higher than that of strip detectors, at about 5000 channels per cm<sup>2</sup> [131]. Fast readout of such a complex system required new technologies and methods, which have been developed during the past decade.
This document takes the detectors in operation or under construction at the Large Hadron Collider at CERN, in the ATLAS, CMS, and LHCb experiments [104], [67], as the current state of the art. The ATLAS experiment has a 3-layer hybrid pixel detector installed in 2007 [1] and an additional layer at a lower radius, called the Insertable B-Layer (IBL), installed in 2014 [48]. The IBL introduced a new readout chip called FE-I4 [38] with several features needed for readout at high particle hit rates. A systematic description of the specification of the front-end ROIC for the ATLAS pixel detector can be found in section 2.3 of [90].

[41] classifies CMOS readout ICs (ROICs) into three generations. First generation pixel chips are those in the original ATLAS and CMS detectors [89], [12] and Medipix [11], which used a 0.25 $\mu$m feature size with custom layout techniques for radiation tolerance. Second generation chips (e.g., FE-I4) in 130 nm are currently used in experiments or devices. Third generation ROICs, such as RD53A or FE-65 [39] and Medipix3 [18], use the 65 nm process node and are currently under development. A tremendous improvement in data storage density (>10 Gb/s/$cm^2$) and output bandwidth (> 20 Gbps) can be observed between these three generations. The detector readout architecture is constantly evolving in its storage capabilities and output bandwidth to cope with high particle hit rates.
**Table 1.3:** Evolution of Front-End (FE) readout chips for pixel detectors in LHC

| | FE-I3 | FE-I4 | FE-65 |
|---|---|---|---|
| Hit rate (MHz/$\mathrm{cm}^2$) | < 100 | < 400 | 2000 to 3000 |
| TID (Mrad) | < 100 | 200 | 1000 |
| NIEL ($\mathrm{n_{eq}/cm^2}$) | $1 \times 10^{14}$ | $5 \times 10^{15}$ | $2 \times 10^{16}$ |
| Technology (nm) | 250 | 130 | 65 |
| Power (mW/mm<sup>2</sup>) | 2.4 | 1.8 | 3.5 |
| Output bandwidth (Mbps) | 40 to 160 | 300 to 1200 | 2000 to 20000 |

# 1.4 ATLAS ITk Upgrade Layer 4: Requirements for CMOS detectors

Table 1.4 is adapted from [92] and lists the technical requirements of CMOS sensors for the ATLAS ITk upgrade. It constitutes the design specifications of the ATLASpix prototypes discussed in the next chapters.

**Table 1.4:** Requirements and acceptance criteria for monolithic CMOS sensors for ATLAS inner tracker layer 4

| Tracking requirements | |
|---|---|
| BX-time | 25 ns |
| Hit rate per mm<sup>2</sup> per BC at $\mu$ = 200 | 0.021 (kHz/mm$^2$) |
| Noise (fake hit) rate per BC × pixel | < 0.1% of physics hit rate |
| TID and NIEL | 100 Mrad and $2 \times 10^{15}$ $\mathrm{n_{eq}/cm^2}$ |
| SEU affecting a chip | < 0.05/hr/chip at 1.5 GHz/cm$^2$ particle flux |
| Maximum lost hits | < 1% |
| Maximum trigger latency | 25 $\mu$s |
| **Pixel requirements** | |
| Minimum pixel size | 50 $\mu$m × 50 $\mu$m |
| Dynamic range | 3 Minimum Ionizing Particle (MIP) equivalent if analog information is required |
| Minimum in-time threshold | 0.2 MIP as a guideline |
| Effective noise (pixel ENC + threshold dispersion) | 0.1% threshold as a guideline |
| Threshold dispersion | Uniformity of 10% across the chip |
| ToT resolution | 4 bits or more |
| Target depletion thickness | < 70 $\mu$m, backside processing is optional |
| **IO requirements** | |
| IO signal type | LVDS fail-safe |
| Output data format | RD53 compatible |
| Serial data link | 1 per chip at 1.28 Gbps |
| **Functional requirements** | |
| Internal and external trigger calibration for charge injection | |
| Power-on-reset | |
| On-chip serial powering | |
| Chip ID e-fuse | |
| Maskable pixels | |
| Built-in command decoder for configuration and read-back | |
| 10-bit ADC for temperature, current, and reference voltage monitoring | |

# Chapter 2

# ATLASpix1: The first large area HVCMOS prototype

#### 2.1 Introduction

The upgrade of the ATLAS inner tracker for the HL-LHC has put forth many challenging requirements on silicon pixel sensors, such as small pixel size, high time resolution, high readout speed, low power consumption, serial powering, and radiation hardness. The high voltage CMOS (HVCMOS) pixel sensors are designed to meet such specifications for the outer pixel layers of the ATLAS ITk. The readout architecture is crucial for achieving high detection efficiencies at high particle hit rates such as 2 MHz/mm<sup>2</sup> in the outer layers of the ITk pixel tracker. The HVCMOS prototypes are large fill factor designs in a 180 nm process on high resistive substrates. There is no clock distribution over the matrix, which helps to avoid digital crosstalk and to reduce the overall power consumption.

ATLASpix is a series of monolithic High Voltage CMOS (HVCMOS) sensor chips that are engineered to meet the requirements of the outer layers of the ATLAS ITk pixel tracker for the HL-LHC upgrade. They are large collection electrode designs on high resistive wafers to ensure high detection efficiency and radiation tolerance.
The readout electronics are placed on the chip periphery. The design of the HVCMOS demonstrator is carried out in stages that involve prototyping of new design ideas. These prototypes are called ATLASpix1, ATLASpix2 and ATLASpix3. ATLASpix sensors are large fill factor designs offering several advantages such as fast charge collection, short drift path, vertical depletion and no low field regions within the active area. ATLASpix prototypes are fabricated on substrates of varying resistivity, such as $80\,\Omega\mathrm{cm}$, $200\,\Omega\mathrm{cm}$ and $1\,\mathrm{k}\Omega\mathrm{cm}$, in the $0.18\,\mu\mathrm{m}$ AMS ah18 process.

Figure 2.1: ATLASpix1 and Mupix8 layout

The ATLASpix1 prototype (figure 2.1) has an active area of 1.6 cm $\times$ 0.33 cm. It is the first large area HVCMOS prototype in 180 nm gate length. It contains three design flavors, namely Simple, M2, and IsoSimple, which differ in the comparator type and readout type. Table 2.1 lists the differences between the ATLASpix1 design variants. ATLASpix1 with triggerless readout has demonstrated more than 99% detection efficiency in a test beam study conducted at CERN using the CLIC telescope. Various test beam and irradiation studies were conducted within the HVCMOS collaboration on the ATLASpix1 sample with triggerless readout, which is referred to as "ATLASpix1\_Simple". In the design flavor named ATLASpix1\_M2, a novel triggered readout scheme that can cope with high particle hit rates is introduced [95]. It groups several pixels into a super pixel, and the data transfer to the buffer takes place in parallel. Hence it is known as the Parallel Pixel to Buffer (PPtB) scheme (section 2.3.1). The Readout Control Unit described in section 2.3.3 is a digital block that schedules the readout operation as well as the data transfer to the gigabit serial link. This block is reused in both the ATLASpix\_Simple and M2 variants.
ATLASpix\_M2 is the focus of this PhD work and is therefore explained in detail in the next sections, with emphasis on its readout architecture and control electronics.

**Figure 2.2:** ATLASpix1\_Simple demonstrated an efficiency of more than 99% using the CLIC telescope in a test beam study at CERN [91]

Table 2.1: Comparison of ATLASpix1\_Simple and ATLASpix1\_M2 design features

| Feature | ATLASpix1_Simple | ATLASpix1_M2 |
|---|---|---|
| Pixel size $(x \times y)$ | $130\,\mu\mathrm{m} \times 40\,\mu\mathrm{m}$ | $60\,\mu\mathrm{m} \times 50\,\mu\mathrm{m}$ |
| Matrix size | two times $25 \times 400$ | $56 \times 320$ |
| Pixel grouping | 1:1 pixel to buffer mapping | 16:4 PPtB transfer |
| Readout scheme | Triggerless column drain | Triggered readout (PPtB) |
| Analog information | 6-bit ToT using DRAM | No amplitude information |
| Data storage | Hit buffer containing DRAM | CAB buffer containing SRAM |

## 2.2 ATLASpix\_M2

Figure 2.3 shows the general block scheme of ATLASpix\_M2 in accordance with its floorplan. The active area of the chip is where the sensing of particles occurs. It consists of the HVCMOS pixel matrix (320 rows and 56 columns), which is surrounded by the global configuration registers and the global bias DAC for the matrix. Each pixel consists of a charge sensitive amplifier, a discriminator, a 4-bit threshold tune-DAC and RAM. 16 pixels are grouped into a super pixel, which is connected to a buffer block of depth 4. There are 40 such super pixels in a double column. When a pixel fires, the local address of the hit pixel is transmitted to the buffer block through an 8-bit line called the hit bus. The particle hit information (pixel address and time stamp) is stored in buffers known as Content Addressable Buffers (CAB), which are explained in section 2.3.2. There are 40 buffers per double column of pixels.
The four buffers in a block receive time stamp signals (10-bit Gray coded) from the end of column (EoC); the time stamps are originally generated by the Readout Control Unit (RCU). A delayed time stamp is the time stamp signal delayed by a certain value known as the on-chip latency (up to 1000 bunch crossings or 25 $\mu$s). This value is externally programmable and is stored in the configuration registers. The EoC also delivers the external trigger signal that triggers the readout. Additionally, control signals for readout such as "Load" and "Read" propagate from the RCU to the CAB through the EoC.

Figure 2.3: ATLASpix\_M2 Block level representation

Four CAB buffers are used to store a pixel address and time stamp. Pixel address, in this case, refers to the local address of a pixel, which is represented as an 8-bit hit pattern. Time stamp refers to a 10-bit counter state that records the time at which a hit signal is received. The group address of a super pixel (6-bit) is stored in the buffer's ROM. A hit data word comprises all of the above information. It is sent to the EoC when a triggered hit is read out. The addresses are binary-coded and increase in the direction of the priority chain. The priority logic is a scan chain that is used to find the first empty buffer to write ("writeScan") and the first buffer that has to be read out ("readScan"). The priority chain also extends to the EoC registers, where it is used by the RCU to schedule the readout operation. The scan circuit finds the first hit and ensures that the corresponding hit data is transmitted to the RCU.

The pixel matrix also contains configuration registers. There are 320 register blocks of 6 bits each. The input of the row configuration register is connected to the serial output of the column register. The output of the row register is connected to the DAC register. The chip is configured externally by shifting configuration bits into these registers at 10 MHz. The readout operation is scheduled by control signals from the Readout Control Unit (RCU).
The RCU is a digital block whose sub-blocks execute a number of tasks, ranging from loading hit data from the matrix to encoding and serializing the data packets. The RCU consists of:

1. Scheduler FSM: Also referred to as the readout state machine, it receives the scan-out signal and hit data from the EoC and generates control signals such as PullDown, LoadPixel, LoadColumn and ReadColumn. Each of these signals has its own control function.
2. Clock generator: The clock generator receives an 800 MHz clock from the PLL and generates 80 MHz for the scheduler and the time stamp generator. It also generates 160 MHz, 400 MHz and 200 MHz for data packaging and serializing.
3. Serializer block: It is a novel serializer design based on multiplexers. Chapter 4 describes the serializer design with timing diagrams.
4. Data encoder: A novel 8b/10b pipelined encoder is introduced in ATLASpix1. It is a pipelined version of an open-source encoder developed by [23].

Since the readout blocks are located at the chip periphery, fewer electronics need to be placed inside the pixel, which helps to achieve a low detector capacitance and almost zero digital crosstalk, apart from that caused by the toggling of the hit bus. In addition, the digital power consumption is lower, since no clocks (in this case time stamps) are propagated inside the pixel matrix. Such a scheme is sometimes referred to as asynchronous readout. Each building block of ATLASpix1\_M2 is explained in detail in the following subsections, with emphasis on the readout circuitry.

#### 2.2.1 HVCMOS sensor

High Voltage CMOS sensors are a major breakthrough in the field of semiconductor sensor development for high energy particle physics. The use of commercial CMOS technology with a high resistive substrate makes them cost-effective compared to hybrid sensors [87].
HVCMOS sensors have a high efficiency in detecting ionizing particles and are radiation tolerant up to $100\,\mathrm{Mrad}$ and $5\times10^{15}\,\mathrm{n_{eq}/cm^2}$ [14]. A general introduction to HVCMOS sensors can be found in section 1.2.2. The cross-section of an HVCMOS sensor is shown in figure 2.4. A depletion region is created by applying a high reverse bias voltage (e.g. -50 V) across the junction between the p-substrate and the deep n-well. The depth of the depletion region is of the order of tens of microns, owing to the low dopant concentration of the deep n-well and the high reverse bias voltage (see Appendix A). The greater the depletion depth, the larger the number of electron-hole pairs collected from the traversal of ionizing radiation. The deep n-well acts as the charge collection electrode. It also acts as the substrate for the PMOS transistors, while the NMOS transistors are placed inside a p-well. The major point to note here is that charge collection happens by drift under the influence of a high electric field, which is almost uniform across the pixel area. A high radiation tolerance is expected due to the short charge collection time. High-energy electrons (secondary electrons generated by photon interactions or electrons present in the LHC environment) and protons can ionize atoms, generating electron-hole pairs. A single high-energy incident photon, electron, or proton can create thousands of electron-hole pairs. The average signal corresponding to a minimum ionizing particle (MIP) is of the order of thousands of electrons.

Figure 2.4: HVCMOS operating principle

#### 2.2.2 Pixel electronics

Figure 2.5 shows the schematic of a pixel. The pixel electronics are placed inside the deep n-well. They comprise a charge sensitive amplifier (CSA), which is capacitively coupled to the sensor, a discriminator, a 4-bit threshold tune DAC and RAM to store the configuration bits. PMOS transistors have their bulk in a shallow n-well, which is ohmically connected to the deep n-well.
Latch-up can be prevented by careful biasing of the p-wells inside the deep n-well and by the use of guard rings [87]. To avoid large crosstalk, the comparator is realized using NMOS transistors. A radiation tolerant layout has been used. The pixel in ATLASpix\_M2 has the same electronics as the normal ATLASpix\_Simple pixel. The only difference is the size of the deep n-well: ATLASpix\_M2 pixels have an area of $60\,\mu\mathrm{m} \times 50\,\mu\mathrm{m}$. The design of the sensor and the pixel electronics is beyond the scope of this dissertation.

**Figure 2.5:** Pixel electronics [91]

#### 2.3 Readout architecture

Readout refers to the operation of transferring the hit data (time, location and energy information) from the pixels to the periphery, including storage and buffering. Figure 2.6 shows a comparison between different readout schemes. Traditionally, all hits are read out column-wise, in a serial manner. This is termed column drain (CD) readout. The waiting time in the queue to transfer hits is represented by "Q" and the transfer delay by "T". An external trigger signal can be used to filter the buffered hits that correspond to events of interest. If there is no filtering of hits based on an external trigger, the scheme is referred to as triggerless column drain readout. The ATLASpix\_Simple chip uses triggerless CD readout. Such a system does not need to store hits for a long period of time, as long as the output data link has enough bandwidth. This scheme can be visualized as a single server queue with random customer arrival time\*, where each pixel is a customer and the data output stage is the single server. As soon as a pixel is hit, it transmits data to the EoC buffer through the hit bus. The hit bus is arbitrated, so each hit pixel must wait for its turn. The hit bus therefore forms a single server queue, just like an M/M/1 queue. The bandwidth of the output data bus must exceed the incoming hit rate for this architecture to be feasible.
The utilization factor $\rho$ is given by

$$\rho = \frac{\lambda}{\mu} \tag{2.1}$$

where $\lambda$ is the incoming data packet rate, which corresponds to the pixel hit rate, and $\mu$ is the output data packet transfer rate. The probability of the waiting time exceeding the transfer time $T$ in a single server queue is given by the following equation [43],

$$P = \rho e^{-(\mu - \lambda)T} \tag{2.2}$$

For a hit rate of "H" per pixel, $\lambda = N \times H$, where N is the number of pixels connected to the hit bus. All hits must be transferred within a short time, as there is no place to store them within the pixel. To avoid hit loss, the output data rate must be considerably greater than the incoming data rate. This condition ($\mu \gg \lambda$) is not feasible for high pixel hit rates.

<sup>\*</sup>This can be compared to an M/M/1 queue [57]

The LHC-b Velopix readout chip [94] implements a readout architecture using a shift register (SR). The SR not only transfers data, but also serves as temporary storage. The output bandwidth $\mu$ of the SR is simply the clocking speed (assuming one pixel hit is read out per clock cycle). Only the first pixel transferring data to the SR sees its full bandwidth. ATLASpix1\_Simple uses a column drain readout with hit buffers that temporarily store the data before it is transferred to the end of column. As soon as a hit arrives, it is transferred to the periphery. After a short duration of storage in the hit buffer at the periphery, all hits are read out sequentially. First generation ROICs such as FE-I3 use a trigger signal to filter out the hits that are relevant for physics (triggered column drain). The main advantage of triggered readout is the reduced requirement on output bandwidth. All hits are stored until a pre-defined trigger latency has elapsed, after which a trigger signal is received for decision making.
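Equations 2.1 and 2.2 can be evaluated numerically to get a feeling for how quickly the waiting probability grows with the load on the hit bus. The following is a minimal sketch, not part of the chip design flow. The function names and the example numbers (400 pixels on one hit bus, 50 kHz per pixel, one transfer per 25 ns) are illustrative assumptions chosen only to exercise the formulas.

```python
import math


def utilization(lam: float, mu: float) -> float:
    """Equation 2.1: utilization factor rho = lambda / mu."""
    return lam / mu


def prob_wait_exceeds(lam: float, mu: float, t: float) -> float:
    """Equation 2.2: P = rho * exp(-(mu - lambda) * T).

    lam and mu are rates in 1/s, t is the time T in seconds.
    The formula only applies to a stable queue (lambda < mu).
    """
    if lam >= mu:
        raise ValueError("queue is unstable: lambda must be smaller than mu")
    return utilization(lam, mu) * math.exp(-(mu - lam) * t)


if __name__ == "__main__":
    # Illustrative (assumed) numbers, not measured values:
    n_pixels = 400        # N: pixels sharing one hit bus
    hit_rate = 50e3       # H: hits per second per pixel
    mu = 40e6             # output transfers per second (one per 25 ns)
    t = 25e-9             # T: one bunch crossing period

    lam = n_pixels * hit_rate   # lambda = N x H
    print("rho =", utilization(lam, mu))
    print("P   =", prob_wait_exceeds(lam, mu, t))
```

Sweeping `hit_rate` upwards in this sketch shows the behaviour described in the text: as $\lambda$ approaches $\mu$, the probability that a hit waits longer than $T$ approaches 1.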
The input rate to the hit bus is reduced to $\lambda' = \lambda \times t_R \times t_{BX}$, where $t_R$ is the trigger rate and $t_{BX}$ is the bunch crossing period. The bandwidth of such a system is directly proportional to the trigger rate ($t_R$) instead of the pixel hit rate (N × H). The limiting factors in this approach are the buffer storage space and the increased digital logic for filtering the hits. A fourth approach, called the Parallel Pixel to Buffer (PPtB) readout scheme, is proposed in this work. A group of pixels forms a super pixel that is connected to a set of four buffers. An example of the PPtB scheme is shown in figure 2.7. An 8-bit hit pattern can be used to map the pixels that are hit. The buffer used in this scheme has a higher logic density than a column drain buffer. It is a Content Addressable Buffer (CAB), whose architecture is explained in section 2.3.2. The filtering of hits based on the trigger signal happens in the CAB buffer. Apart from the above-mentioned schemes, novel readout schemes based on waveform sampling are explored in some ATLASpix1 prototypes [108]. High speed time based ADCs can be used to sample the pixel outputs, which are eventually read out using a shift register.

**Figure 2.6:** Comparison between shift register based full readout (left), triggerless CD readout (middle) and triggered CD readout (right)

#### 2.3.1 Parallel Pixel to Buffer (PPtB) transfer

The signals caused by particle hits are digitized using discriminators and transferred to the digital periphery. Content Addressable Buffers (CAB) located at the periphery store and filter the hit data based on the trigger. CAB buffers are sometimes referred to as trigger buffers in this dissertation, since they respond to an external trigger signal. The CAB buffer acts as temporary storage for hit information until it is sent to the End of Column (EoC) buffer. The CAB buffers store the hit information (hit pattern, time stamp) until the on-chip programmable latency elapses.
Only the hits that belong to the chosen set of events (a certain time stamp) are read out by content addressing. The region of relevant time stamps is determined by the trigger signal and the on-chip latency. Out of 17920 pixels, each group of 16 pixels forms a super pixel, which can be addressed using 8 address lines by projection addressing as shown in figure 2.8. This saves routing space by reducing the number of interconnect lines required to transfer the hit pattern from 16 to 8.

Figure 2.7: Conceptual representation of Parallel Pixel to Buffer (PPtB) scheme.

The address encoding scheme shown in figure 2.8 poses a risk of ghost hits that share the same address pattern as real hits. Due to the small area of a super pixel ($800 \times 60\ \mu\mathrm{m}^2$), the rate of multiple clusters is reduced by several orders of magnitude. Assuming a total hit rate of $108\ \mathrm{MHz/cm^2}$, the probability of having a hit in the super pixel per bunch crossing is calculated as $1.3\times 10^{-3}$. Ghost hits are caused either when a single particle produces a cluster on the group edges (figure 2.8) or when two particles hit the pixels of neighboring groups in a super pixel. The former scenario leads to a real hit pattern, which can be identified as a cluster, and two ghost hits that appear as two separate particle hits. Since the probability of two separate particle hits is much lower than the probability of a single clustered hit, the ghost hits can be neglected during reconstruction. A theoretical discussion of lossless data compression in pixel detector readout can be found in [40], which states that the information content (entropy) is roughly proportional to the number of clusters and that the achievable data compression depends on the cluster size.
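The ghost-hit mechanism and the occupancy estimate above can be illustrated with a short sketch. The exact line assignment of figure 2.8 is not reproduced here. The code assumes, purely for illustration, that the 16 pixels of a super pixel are split into four groups of four and that the 8 address lines consist of 4 "group" lines and 4 "position within group" lines. All function names are hypothetical.

```python
GROUPS = 4      # assumed number of groups per super pixel
PER_GROUP = 4   # assumed number of pixels per group


def encode(hit_pixels):
    """Project a set of hit pixels (0..15) onto an 8-bit address pattern.

    Bits 0..3 are the position lines, bits 4..7 are the group lines.
    """
    pattern = 0
    for pix in hit_pixels:
        if not 0 <= pix < GROUPS * PER_GROUP:
            raise ValueError("pixel index out of range")
        group, pos = divmod(pix, PER_GROUP)
        pattern |= 1 << pos
        pattern |= 1 << (PER_GROUP + group)
    return pattern


def decode(pattern):
    """Return every pixel compatible with the pattern (real and ghost hits)."""
    positions = [p for p in range(PER_GROUP) if (pattern >> p) & 1]
    groups = [g for g in range(GROUPS) if (pattern >> (PER_GROUP + g)) & 1]
    return sorted(g * PER_GROUP + p for g in groups for p in positions)


def ghost_hits(hit_pixels):
    """Pixels reported by decode() that were not actually hit."""
    return sorted(set(decode(encode(hit_pixels))) - set(hit_pixels))


def hit_probability_per_bc(rate_per_cm2, area_um2, t_bx=25e-9):
    """Mean number of hits in an area per bunch crossing (1 um^2 = 1e-8 cm^2)."""
    return rate_per_cm2 * area_um2 * 1e-8 * t_bx


if __name__ == "__main__":
    print(ghost_hits([1, 2]))   # cluster inside one group
    print(ghost_hits([3, 4]))   # cluster across a group edge
    print(hit_probability_per_bc(108e6, 800 * 60))
```

Under these assumptions a two-pixel cluster inside one group decodes without ambiguity, whereas a cluster straddling a group edge decodes to the two real pixels plus two ghosts, which matches the scenario described in the text. The last call evaluates the occupancy for the quoted hit rate and super pixel area.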
Figure 2.8: ATLASpix1\_M2 Top layout showing address encoding and readout logic

#### 2.3.2 CAB Buffer

A super pixel is mapped to a Content Addressable Buffer (CAB) block that can store up to four hits at a time<sup>†</sup>. The CAB buffer compares the stored Time Stamp (TS) (10 bit) with a delayed TS propagated within the chip, hence the name content addressable. The group address (6 bit) of the super pixel is programmed in the address ROM. When there is a particle hit in one or more of the 16 pixels, two or more address lines are set to 1. Therefore, a hit can be detected by calculating the logical OR of the address lines. This is referred to as the "HitOR" signal.

<sup>†</sup>For example, for a 50 kHz single pixel hit rate and 5 $\mu$s trigger latency, the average number of hits per latency period is 0.25. Using Poisson statistics, one would need to store up to 4 hits per pixel in order to keep 99.9% of hits. This number (4 in this example) is referred to as the buffer depth.

Figure 2.9: ATLASpix1\_M2 top readout buffer logic [99]

When the HitOR signal goes high, the hit pattern (8 bit) is recorded and transferred to the CAB buffer. The corresponding time stamp of the HitOR signal is recorded in the buffer RAM. The time stamps are 10-bit Gray coded signals with a period of 25 ns, which corresponds to the bunch crossing (BC) period of the LHC. The hit information is held in the CAB buffer until a retention time has elapsed. The retention time of a CAB is programmable and is referred to as the on-chip latency. The on-chip latency is determined by comparing the stored time stamp with an additional time stamp signal that has the same period as the original time stamp signal but a different phase. Both time stamps are generated on chip. The two time stamps are constantly compared in the CAB buffers. When there is a match, the decision is made based on an external trigger signal.
If the level-1 trigger signal is received within the on-chip latency, the stored hits are marked for readout. The hits that are not marked for readout are deleted from the buffer. Figure 2.11 shows a mixed-mode simulation of the CAB buffer. The first and the last hits are triggered, whereas the middle one is deleted due to the absence of a trigger signal.

Figure 2.10: Content Addressable Buffer (CAB) block full custom layout

#### 2.3.3 The Readout Control Unit (RCU)

The readout control unit (RCU) is a digitally synthesized block that schedules the readout operation from the pixel matrix to the serial data link. This section provides an overview of the RCU and its function in the readout system. It has several layers, each of which is explained in chapter 4. It generates control signals to perform numerous operations. For example, it generates the "load column" signal for loading the hit data from the CAB to the EoC buffers. The "read column" signal is responsible for loading the hit data from the EoC to the RCU. The hit data is then encoded and serialized. The RCU generates time stamps and delayed time stamps (the delay value equals the on-chip latency) of 10 bits length. It also generates a 6-bit TS2 time stamp to store time-over-threshold (ToT) information. All time stamp signals are Gray coded. Storage of ToT is implemented in the ATLASpix\_Simple chip. ATLASpix\_Simple and M2 share the same RCU design, but the TS2 time stamps are not used in ATLASpix\_M2. The RCU has multiple clock domains. Clocks with frequencies of 160 MHz, 200 MHz and 40 MHz are generated from an input clock of 800 MHz using Johnson counters and combinational logic. The data encoder used in ATLASpix1 (figure 2.15) is a pipelined custom 8b/10b encoder with running disparity based on [133]. A finite state machine known as the "task scheduler" is responsible for coordinating the transfer of hit data from the pixel matrix to the CAB, to the EoC and eventually to the serial data link.
Each state sends a control signal to transfer the hit data through the different buffered stages. The scan-out logic implemented in the buffers helps to detect the presence of a stored hit. As a debug feature, the RCU can be configured to send a periodic counter output instead of hit data. The readout operation is initiated when the "sendcounter" configuration is disabled.

**Figure 2.11:** Simulation of CAB buffer in mixed mode environment. Read RAM signal indicates that the first and third hits are marked for readout. The second one is deleted due to the absence of a trigger signal.

**Figure 2.12:** ATLASpix1\_M2 readout periphery: The hit data is transferred from the pixel matrix to the periphery. The time stamps and other control signals are propagated from RCU to the matrix.

#### Task scheduler: The readout state machine

All states with index 1 generate control signals for the pixel matrix. For example, PullDown1 generates the pull-down signal. All states with index 2 are separation states to match timing. The state machine is clocked at 80 MHz. The clock can be divided using the parameter TimerEnd. If TimerEnd is set to "n", the clock is divided by n+1. The hit word is $4 \times 9$ bits long. The data format depends on the state of the scheduler. During the sync state, the data format is "1BC1BC1BC1BC". The 1 at the MSB of each 9-bit group indicates a comma word. The remaining 8 bits are part of the data. A special sequence (1CAA1CAA) is sent two clock cycles after the Loadcol2 state in order to mark the beginning of a hit word. The hit word is formatted as "Column address [7:0], Row address [5:0], TimeStamp [9:0], PixelAddress [7:0]". The row address is the group address stored in ROM. After loadPix2 the data format is "000, BinaryCounter [15:8], BinaryCounter [7:0], TStoDetector [7:0]". During all the other states, the sync word "1BC1BC1BC1BC" is sent out.

Figure 2.13: Readout Control Unit (RCU)
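The hit word layout and the clock division rule described above can be made concrete with a small sketch. It assumes, as an illustration only, that the four fields are concatenated MSB-first in the order listed and that the resulting 32 bits are sent as four 9-bit symbols whose ninth bit (the comma flag) is 0 for data and 1 for the sync symbol "1BC". The bit ordering on the real link and all function names are assumptions.

```python
K_FLAG = 1 << 8                # ninth bit: 1 marks a comma word
SYNC_SYMBOL = K_FLAG | 0xBC    # "1BC"

# (name, width in bits), in the order given in the text
FIELDS = (("column", 8), ("row", 6), ("timestamp", 10), ("pixel", 8))


def pack_hit_word(column, row, timestamp, pixel):
    """Pack the four fields into four 9-bit data symbols (flag bit = 0)."""
    word = 0
    for value, (name, width) in zip((column, row, timestamp, pixel), FIELDS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit into {width} bits")
        word = (word << width) | value
    # 8 + 6 + 10 + 8 = 32 bits, split into bytes, most significant first
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def unpack_hit_word(symbols):
    """Inverse of pack_hit_word; rejects comma symbols."""
    word = 0
    for sym in symbols:
        if sym & K_FLAG:
            raise ValueError("comma symbol inside a hit word")
        word = (word << 8) | (sym & 0xFF)
    return ((word >> 24) & 0xFF, (word >> 18) & 0x3F,
            (word >> 8) & 0x3FF, word & 0xFF)


def sync_word():
    """Four sync symbols, written as "1BC1BC1BC1BC" in the text."""
    return [SYNC_SYMBOL] * 4


def scheduler_clock_hz(timer_end, base_hz=80e6):
    """State machine clock after division by TimerEnd + 1."""
    return base_hz / (timer_end + 1)


if __name__ == "__main__":
    symbols = pack_hit_word(column=0x12, row=0x2A, timestamp=0x155, pixel=0x81)
    print([f"{s:03X}" for s in symbols])
    print(unpack_hit_word(symbols))
    print([f"{s:03X}" for s in sync_word()])
```

The round trip through `unpack_hit_word` is a convenient self-check when writing decoding software for such a data stream, because any field that overflows its width is rejected before it can corrupt a neighbouring field.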
#### 8b/10b pipelined encoder

The 8b/10b encoding scheme was defined in 1983 in the IBM Journal of Research and Development [133]. It maps each 8-bit data character to a 10-bit code in order to achieve DC balance. This type of encoding provides enough state changes for reasonable clock recovery. During the transmission of data characters, the encoder keeps track of a "running disparity (RD)" to ensure that the number of 1's is almost equal to the number of 0's. The transmitter assumes a negative running disparity (RD-) at startup. When an 8-bit data character is encoded, the encoder uses the RD- logic for encoding. If the encoded 10-bit data is disparity neutral, the running disparity is not changed and RD- is still used. Otherwise, the running disparity is changed and RD+ is used instead. Similarly, if the current running disparity is positive (RD+) and a disparity neutral 10-bit code is produced, the running disparity remains RD+. Otherwise, it changes from RD+ to RD-. Figure 2.15 shows the pipelined 8b/10b encoder used in ATLASpix\_M2.

Figure 2.14: ATLASpix1 Scheduler FSM

In order to compare the proposed encoder with the current state of the art, a test system has been designed. A pseudo-random number generator was implemented using an 8-bit Fibonacci linear feedback shift register (LFSR). At reset, the "FFFF" pattern is loaded into the flip-flops. Since it is pseudo-random, the output repeats every 31 clock cycles. A state machine generates the control signals to read and store the data into a FIFO. Each 8-bit data packet is encoded by the encoder block under test. The test system was synthesized in 180 nm and the post-route netlist was simulated under various power, performance and area (PPA) constraints. A comparison between the different IPs is shown in table 2.2.
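The running disparity rule described in this subsection can be sketched in a few lines. This is only a bookkeeping model for checking an encoded stream, not the ATLASpix encoder and not a full 8b/10b code table. The function names are hypothetical. The three example codes are the standard K28.5 (both disparities) and D21.5 code groups, written in transmission order abcdei fghj.

```python
RD_MINUS = -1
RD_PLUS = +1


def disparity(code10: int) -> int:
    """Number of ones minus number of zeros in a 10-bit code group."""
    ones = bin(code10 & 0x3FF).count("1")
    return ones - (10 - ones)


def next_rd(rd: int, code10: int) -> int:
    """Update the running disparity after sending one code group.

    A neutral code group leaves RD unchanged. A code group with
    disparity +2 is only legal at RD- and flips it to RD+. A code
    group with disparity -2 is only legal at RD+ and flips it to RD-.
    """
    d = disparity(code10)
    if d == 0:
        return rd
    if d == +2 and rd == RD_MINUS:
        return RD_PLUS
    if d == -2 and rd == RD_PLUS:
        return RD_MINUS
    raise ValueError("code group violates the running disparity rule")


def check_stream(codes, rd=RD_MINUS):
    """Run a sequence of code groups through next_rd, starting at RD-."""
    for code in codes:
        rd = next_rd(rd, code)
    return rd


if __name__ == "__main__":
    K28_5_RD_MINUS = 0b0011111010   # sent when the running disparity is RD-
    K28_5_RD_PLUS = 0b1100000101    # sent when the running disparity is RD+
    D21_5 = 0b1010101010            # disparity neutral data character
    print(check_stream([K28_5_RD_MINUS, D21_5, K28_5_RD_PLUS]))
```

A checker of this kind is useful on the receiving side of a test bench, since a single bit error on the link typically shows up as a disparity violation within a few code groups.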
The Cadence IP is based on look-up tables and the open-source IP is based on combinational logic. ATLASpix\_M2 applies a pipelining technique to the open-source IP with running disparity. While the pipelining technique helped to increase the performance of the overall system, it is inferred that the sequential logic in 180 nm was the real bottleneck. The pipelined version of the 8b/10b encoder was chosen for ATLASpix\_M2, despite its area and power trade-off, due to its improved performance compared to the Cadence IP. However, when the RCU is considered as a system, it is the performance of the scheduler FSM and the serializer that is critical for achieving the required data rate of 1.28 Gbps.

Figure 2.15: 8b/10b pipelined encoder implementation in ATLASpix1

Figure 2.16: 8b/10b pipelined encoder with running disparity

Figure 2.17: Test system for encoder

**Table 2.2:** Comparative study of various 8b/10b encoder schemes

| Encoder | Power | Performance | Area |
|---|---|---|---|
| Cadence IP | $5.05\,\mu\mathrm{W}$ | 4 Gbps | $0.4\,\mathrm{mm}^2$ |
| Open-source IP [23] | $1.07\,\mu\mathrm{W}$ | 5.8 Gbps | $0.12\,\mathrm{mm}^2$ |
| ATLASpix_M2 pipelined encoder | $6.37\,\mu\mathrm{W}$ | 6.2 Gbps | $0.4\,\mathrm{mm}^2$ |

#### Serializer

The serializer for ATLASpix\_M2 is based on a 3-stage MUX tree with input synchronization (section 4.2.3 in chapter 4). The serializer in the RCU outputs two bits at 800 Mbps. The final stage of serialization is achieved using an analog serializer based on current mode logic. The design of the digital serializer is explained in chapter 4.

#### 2.4 Summary

This chapter describes the design details of the first large area HVCMOS sensor chip in 180 nm. HVCMOS sensors are proposed as a cost-effective replacement for the existing hybrid sensors in the outer layers of the ATLAS ITk pixel barrel. To prove the feasibility of the proposed sensor for the ATLAS experiment, a large area prototype was designed.
All HVCMOS sensors follow a "large charge collection electrode" topology. The readout electronics are located at the bottom periphery to avoid digital crosstalk. ATLASpix1 is a $1\,\mathrm{cm} \times 2\,\mathrm{cm}$ HVCMOS prototype in the AMS ah18 process. It contains three design flavors based on the pixel size and readout scheme. The readout control unit block is the same for both the triggered and the triggerless readout variants. A novel triggered readout scheme with smart pixel grouping is introduced in ATLASpix1\_M2. It features a Content Addressable hit Buffer (CAB), where the triggered hits are filtered for readout. ATLASpix1\_M2 has a small pixel size of $60\,\mu\mathrm{m} \times 50\,\mu\mathrm{m}$. Storage of signal amplitude information was not implemented, in order to meet the area budget. For data encoding, a novel pipelined 8b/10b encoder with running disparity was used. With the readout scheme in ATLASpix1, the hits corresponding to different events are read out in an unsorted manner. This is rectified in the next generations of ATLASpix, together with improved spatial resolution and time-walk correction.

# Chapter 3

# ATLASpix1: Measurement

Extensive laboratory tests and test beam studies are required for qualifying ATLASpix ASICs for the ATLAS ITk. This chapter describes the laboratory tests and X-ray irradiation studies conducted on ATLASpix\_M2. Several test beam studies have already been conducted on ATLASpix\_Simple within the HVCMOS collaboration. It has demonstrated a detection efficiency of 99.7% [91]. A detection efficiency of 99.4% was observed after irradiation [58]. The main goal of this work is to characterize ATLASpix\_M2 for the first time within the HVCMOS collaboration.

### 3.1 Introduction to test system

The test system for ATLASpix, shown in figure 3.2, was developed at KIT-ADL by Felix Ehrler [33]. It was used to characterize the ATLASpix1\_M2 chip.
The front-end assembly includes a chip carrier board and an adapter board, which is connected to a Nexys Video board through an FMC connector. The pixel sensor chip is mounted on and wire bonded to the carrier board. The carrier board includes power supply connections, monitoring capability, bias voltages and chip selectors for the different ATLASpix design flavors. The injection board and voltage board are responsible for generating injection pulses and for setting the baseline and the global threshold. The Nexys Video board features the XC7A200T FPGA from the Xilinx Artix-7 family. It has industry standard communication peripherals such as onboard Ethernet, USB-UART, and high-speed USB. This allows the Nexys Video board to be interfaced with larger systems. The onboard user peripherals comprise switches, buttons, LEDs, and an OLED display that allow users to interface directly with their designs instead of using additional I/O. The FMC connector and four Pmod ports further expand the interfacing ability of the Nexys Video.

Figure 3.1: ATLASpix test setup block description

Figure 3.2: ATLASpix test setup

## 3.2 Laboratory tests

The laboratory tests were conducted to verify the functionality of the pixel and readout electronics. Figure 3.3a shows the response of a pixel to a $^{55}\mathrm{Fe}$ source. The amplifier output signal was also recorded for injection pulses of varying amplitudes. The output response to the $^{55}\mathrm{Fe}$ source can be compared to the response to injection signals, as shown in figure 3.3b. From this comparison we conclude that a 350 mV injection is equivalent to 1660 $e^-$. This calibration is used to convert threshold dispersion and mean noise from millivolts to electrons in the subsequent measurement results described in sections 3.2.2 and 3.3.3.
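The calibration above amounts to a single scale factor. A minimal sketch of the conversion is given below. It assumes that the injection response is linear and passes through the origin, which is a simplification introduced here for illustration. The function name is hypothetical.

```python
# Calibration point taken from the text: 350 mV injection <-> 1660 electrons
CAL_MV = 350.0
CAL_ELECTRONS = 1660.0


def mv_to_electrons(millivolts: float) -> float:
    """Convert an injection amplitude in mV to an equivalent charge in e-.

    Assumes a linear response through the origin (illustrative only).
    """
    return millivolts * CAL_ELECTRONS / CAL_MV


if __name__ == "__main__":
    for mv in (25.0, 100.0, 350.0, 600.0):
        print(f"{mv:6.1f} mV -> {mv_to_electrons(mv):8.1f} e-")
```

The same factor applies to any quantity extracted in millivolts from the S-curve fits, such as the threshold or the noise of a pixel.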
Figure 3.3: ATLASpix1: Amplifier output

#### 3.2.1 Triggered readout

In order to test the CAB readout scheme in ATLASpix1\_M2, the trigger delay was varied from 0 to 150 time stamps, where each time stamp corresponds to 25 ns. During each step, 900 charge injections were made across the pixel matrix. Two sets of readout were done for on-chip latencies of 45 time stamps and 85 time stamps. The width of the trigger pulse was 25 time stamps. Figure 3.4 shows that the hit information is read out when the trigger window falls within the specified on-chip latency.

**Figure 3.4:** First measurement results of triggered readout: the hits with stored time stamps that fall within the trigger window are read out [95].

#### 3.2.2 Threshold tuning

A threshold scan was done over the entire pixel matrix of size $320 \times 56$, at a readout speed of 800 Mbps. The trigger is generated with a fixed delay after the injection. The width of the trigger signal is 400 ns (equivalent to 16 bunch crossings (BC)). The on-chip latency is adjusted so that all the hits generated by injection are triggered. The injection voltage is then varied from 0 V to 0.6 V in steps of 0.025 V, keeping the injection delay, the number of injections (10) and the on-chip latency (43 time stamps, where each time stamp corresponds to 1 BC) constant. The resulting S-curve is fit using an error function. The 50% value of this fit is regarded as the input-referred threshold for each pixel. The sigma value of this fit is regarded as the noise of the corresponding pixel.

It is possible to adjust the threshold of every pixel using a 3-bit D/A converter called the tune-DAC or TDAC. These tune bits are stored in the pixel memory. A tuning algorithm was developed to determine the set of DAC values that gives the least threshold dispersion across the matrix. The tuning algorithm works in the following way: The threshold dispersion of the entire matrix for all DAC values ranging from 0 to 7 is estimated.
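Each threshold entering this estimate comes from an S-curve fit as described above. A minimal Python sketch of such a fit is given here for illustration. It is not the analysis code used for the measurements: the helper names `s_curve` and `fit_s_curve` are hypothetical, and a brute-force grid search is used only to keep the example free of external dependencies.

```python
import math


def s_curve(v, threshold, sigma):
    """Expected hit fraction at injection voltage v for Gaussian noise."""
    return 0.5 * (1.0 + math.erf((v - threshold) / (math.sqrt(2.0) * sigma)))


def fit_s_curve(voltages, fractions, n_steps=120):
    """Least-squares fit by grid search; returns (threshold, sigma).

    Assumption: the threshold lies inside the scanned voltage range and the
    noise sigma is at most a quarter of that range.
    """
    v_min, v_max = min(voltages), max(voltages)
    span = v_max - v_min
    best_cost, best_threshold, best_sigma = float("inf"), None, None
    for i in range(n_steps + 1):
        threshold = v_min + span * i / n_steps
        for j in range(1, n_steps + 1):
            sigma = 0.25 * span * j / n_steps
            cost = sum((s_curve(v, threshold, sigma) - f) ** 2
                       for v, f in zip(voltages, fractions))
            if cost < best_cost:
                best_cost, best_threshold, best_sigma = cost, threshold, sigma
    return best_threshold, best_sigma


if __name__ == "__main__":
    # Same scan range as in the text: 0 V to 0.6 V in steps of 0.025 V.
    volts = [0.025 * k for k in range(25)]
    # Synthetic, noise-free data for a pixel with made-up parameters.
    ideal = [s_curve(v, 0.30, 0.02) for v in volts]
    threshold, sigma = fit_s_curve(volts, ideal)
    print(f"threshold = {threshold:.3f} V, noise = {sigma:.3f} V")
```

In a measurement, `fractions` would hold the number of detected hits divided by the number of injections at each voltage step.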
The mean value of this dispersion is chosen as the target input-referred threshold. For each pixel, a DAC value is chosen so that its local threshold is closest to the target threshold. For these tuned DAC settings, the dispersion of the local thresholds is calculated. The tuning algorithm was implemented in Python and later integrated into the C++ test framework. The threshold dispersion was reduced by a factor of four after tuning, as shown in figure 3.5. The mean threshold is $1055\,\mathrm{e^-}$ with a standard deviation of $35\,\mathrm{e^-}$. The mean value of the noise distribution over the entire pixel matrix after tuning is $78\,\mathrm{e^-}$.

**Figure 3.5:** ATLASpix1: Threshold tuning at 800 MHz, showing an improvement in the threshold dispersion by a factor of four [98].

#### 3.2.3 Serial link

The data transfer characteristics can be studied with the help of an eye diagram. The oscilloscope probe was connected to the data line on the PCB, which is about 10 cm long. An on-chip PLL was used to generate the 800 MHz clock with a reference clock of 200 MHz. The serial data output works at a Double Data Rate (DDR) of 1.28 Gbps. An eye height of $504 \pm 1$ mV, an eye width of $580 \pm 1$ ps and a jitter of $100 \pm 0.2$ ps were measured, as shown in figure 3.6.

Figure 3.6: Eye diagram of serial data link at the required rate of 1.28 Gbps [98].

#### 3.2.4 Time-walk measurement

Simultaneous particle hits with different charge quantities typically produce different responses in the pixel analog front-end. A particle hit with higher energy causes a faster amplifier response than a particle hit with lower energy. The time interval between these responses is referred to as time-walk. The time-walk must be lower than the minimum required time resolution of 25 ns for ATLAS, or it must be compensated either on-chip or off-chip. We can estimate the trigger window characteristics by varying the trigger vs injection time distance.
If the distance is too small, the hits are not triggered and are lost in the CAB buffer. The trigger vs injection distance at which 50% of the hits are detected is measured using a delay scan. In this way, the time resolution and time-walk can be measured. The time-walk for signals from $1500\,e^-$ (20% MIP) to $7000\,e^-$ (100% MIP) using standard amplifier settings<sup>\*</sup> is measured to be 42 ns (figure 3.7a). The time-walk effect can be further reduced by 1) lowering the threshold of the comparator and 2) lowering the rise time of the amplifier response by increasing the transconductance $(G_m)$. The latter can be achieved by increasing the parameter called "VPload", which decreases the output resistance. With VPload = 20, the time-walk for signals from $1500\,\mathrm{e^-}$ (20% MIP) to $7000\,\mathrm{e^-}$ (100% MIP) is measured to be 30 ns (figure 3.7b).

**(b)** Time-walk measurement using VPload = 20

**Figure 3.7:** Investigation of time walk variance

<sup>\*</sup>Standard settings refer to VPload = 5 and VNfoll = 10

A skew of 45 ns is observed between the hit timing (time-walk) of row 0 and row 319 (figure 3.8b). The reason for the skew is the different length and different capacitive load of the long routing lines from the pixels to the periphery. It was recommended to fix this in the ATLASpix3 design. All routing lines of ATLASpix3 have the same length. An extra metal layer in the TSI process was an added advantage during the routing of ATLASpix3. It is also possible to implement a digital correction to mitigate this effect. However, it needs careful verification. The dependency of the timing skew on columns (figure 3.8a) is measured to be about 10 ns. The exact reason for this is unknown; a plausible cause is a voltage drop.

(a) Column dependency of TS

Figure 3.8: ATLASpix1\_M2: time stamp skew across rows and columns

#### 3.3 X-ray irradiation tests

ATLASpix1\_M2 was irradiated up to 100 MRad, which is the required TID for ATLAS ITk layer 4 (Table 1.4).
The dose rate was fixed to 935 kRad/hour. The irradiated ATLASpix1 samples were characterized for leakage current variations, signal-to-noise ratio degradation and power consumption at room temperature. The measurements were performed using 300 mV injections at a readout speed of 200 MHz.

#### 3.3.1 Leakage current

As discussed in section 1.2.6, the oxide-trapped charges cause a negative shift in threshold voltage for both PMOS and NMOS transistors. This effect dominates in the initial phase of irradiation, at doses between 0 and 10 MRad. The passive interface traps are activated by radiation and their effect dominates in the later phase of irradiation, from 10 to 100 MRad. At high dose rates and short intervals, little neutralization of oxide trap charge will occur. This can cause a negative threshold voltage shift for both p- and n-channel devices. In short, it becomes harder to turn OFF an NMOS, resulting in a high leakage current. A PMOS device becomes harder to turn ON, resulting in a low leakage current. The interface trap charge will have had insufficient time to build up; hence, its effect is relatively small. For NMOS devices, a large negative shift in threshold voltage can significantly increase the drain-to-source leakage current, which in turn contributes to static power consumption. At moderate doses, some build-up of interface traps will occur. For PMOS, the interface traps are positively charged, causing negative threshold shifts. The effects due to oxide traps and interface traps add up. Hence, for PMOS transistors, the net threshold shift is large and negative. This helps to reduce the static leakage current as the radiation dose increases. The VSSA leakage current in figure 3.10b is almost steady or follows a downward trend because PMOS transistors dominate the analog part. Almost all NMOS devices in the analog part are enclosed. The digital power pin (VDDD) and the main power pin (VDDA) are shorted on the PCB.
The current consumption shown in figure 3.10a is mostly due to the linear NMOS devices in the digital part. The lowering of the NMOS threshold causes a steep increase in the leakage current up to 5-10 MRad. This is due to the negative threshold shift caused by oxide traps. A slow and steady build-up of interface trap charge dominates beyond 10 MRad. For NMOS, the interface traps are negatively charged, which causes positive threshold voltage shifts; hence the leakage current follows a negative slope after 10 MRad.

Figure 3.9: PMOS and NMOS in AMS 180nm process showing the oxide structures

To explain the increase in high voltage leakage current (figure 3.13) with increasing radiation dose, the following hypothesis is proposed. The trapping of holes in the STI oxide leads to strong inversion of the p-substrate beneath. This induced n-type channel-like region touches the deep nwell and the p+ high voltage contact. It causes a strong electric field at the extended deep nwell - p+ junction (figure 3.11b). The leakage current at this reverse-biased pn junction increases at higher radiation doses because of the increasing number of holes trapped in the STI oxide. The breakdown of this p-n junction can cause a high leakage current. This effect was annealed over a period of time. The formation of a parasitic BJT is possible between the pwell of the protection diodes, the deep nwell and the p+ substrate, as shown in figure 3.12a. This hypothesis has been verified by the following experiment. The hypothetical PNP transistor has its emitter, base and collector terminals at the pwell of the protection diode ($V_{gatepix}$), the deep nwell ($V_{DD}$) and the p-substrate (HV), respectively. Keeping the high voltage constant at -10 V, $V_{DD}$ was varied from 1.8 to 1.2 V. The decrease in collector-base reverse bias resulted in a steady reduction of the HV leakage current from 24 $\mu$A to 22 $\mu$A.
At this point, the emitter-base forward bias was increased from 0.6 V to 0.8 V by increasing $V_{gatepix}$. A sudden increase in the HV leakage current (from 20 $\mu$A to 100 $\mu$A) was observed, implying the conduction of the parasitic PNP transistor. The formation of a parasitic BJT is also possible in the pixel area, as shown in figure 3.14. However, this has not been verified by experiment.

(a) VDDA leakage current (analog and digital)

(b) Analog power supply leakage current

Figure 3.10: ATLASpix1 leakage currents after irradiation

(a) Before irradiation

**(b)** STI oxide charge accumulation after irradiation

Figure 3.11: Effect of radiation on HV leakage current

(a) Formation of a parasitic BJT due to induced extension of deep nwell because of trapping of holes at STI oxide - Si interface

**(b)** Protection diode schematic

**Figure 3.12:** Conceptual schematic of a parasitic BJT formation due to STI oxide - Si interface traps

Figure 3.13: HV leakage current

**Figure 3.14:** Formation of a parasitic BJT in the pixels due to induced extension of deep nwell because of trapping of holes at STI oxide - Si interface

#### 3.3.2 Signal to Noise Ratio

The signal generated in a silicon detector depends on the thickness of the depletion zone and the dE/dx of the traversing particle. The noise depends on various parameters such as the geometry of the detector, the biasing scheme and the readout circuitry. During each irradiation step, for an injection voltage of 500 mV, the output SNR was measured after 45 minutes of annealing. The SNR vs. TID plot in figure 3.15b shows an initial degradation of the SNR up to 5 MRad, which is believed to have improved due to annealing. Further degradation is observed at higher doses. The intensity of the X-ray beam has a non-uniform distribution. A degradation of SNR is observed from the bottom to the top rows in figure 3.15b. The exact reason for this is open to further investigation.
#### 3.3.3 Threshold Tuning

Threshold tuning was performed after 100 MRad TID and 8 hours of annealing. The tuning circuit is based on a current-mode DAC called Tune-DAC or TDAC, as shown in figure 3.16. The TDAC current adds an offset to the first stage of the comparator. The NMOS transistors $M_1$ and $M_2$ suffer from radiation damage and start leaking. This causes the current to flow in the direction indicated by the dotted red arrows in the figure. It causes a negative shift in the threshold of the comparator, which caused hit data to be lost. To mitigate this, $V_{nDAC}$ was increased, restoring the current path shown by the solid green line in figure 3.16. It was observed that increasing $V_{nDAC}$ results in better threshold tuning. This observation remained consistent after a prolonged annealing period. A few more anomalies were observed in the tune DAC (TDAC) behavior, most of which did not persist after annealing. In ATLASpix2, the DAC was implemented using PMOS transistors, which are more tolerant to radiation.

The same tuning procedure as explained in section 3.2.2 was repeated after irradiation. The threshold dispersion was reduced by a factor of two after tuning, as shown in figure 3.17. The mean threshold was $2096\,\mathrm{e^-}$ with a standard deviation of $95\,\mathrm{e^-}$. The mean value of the noise distribution over the entire pixel matrix after tuning was $82\,\mathrm{e^-}$. Due to the increase in noise after irradiation, the global threshold $(V_{thG})$ was set to 1.4 V and the baseline voltage ($V_{BL}$) was set to 0.9 V before performing the threshold scan. The threshold and noise values after tuning are plotted over the entire matrix (320 $\times$ 56) in figure 3.18. 16 noisy pixels were masked in order to acquire the readout data.

(a) Signal and noise plot for pixel (0,0)

**(b)** Signal to noise ratio for different pixels in column 0

Figure 3.15: Signal to noise ratio measurements
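The tuning procedure of section 3.2.2, applied with masked pixels excluded, can be summarised in a short Python sketch. This is not the code used for the measurements. The function name `tune_tdac`, the data layout and the reading of the target as the mean over all pixels and all DAC values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def tune_tdac(thresholds, masked=frozenset()):
    """Pick one TDAC value per pixel so that thresholds cluster around a target.

    thresholds: dict mapping a pixel id to a list of measured thresholds
                (in electrons), one entry per TDAC value 0..7.
    masked:     pixel ids excluded from tuning (for example noisy pixels).
    Returns (target, settings, dispersion).
    """
    active = {p: t for p, t in thresholds.items() if p not in masked}
    # Assumed target: mean of all measured thresholds over pixels and DAC values.
    target = mean(v for t in active.values() for v in t)
    # For each pixel choose the DAC value whose threshold is closest to the target.
    settings = {
        p: min(range(len(t)), key=lambda d: abs(t[d] - target))
        for p, t in active.items()
    }
    tuned = [active[p][d] for p, d in settings.items()]
    return target, settings, pstdev(tuned)


if __name__ == "__main__":
    # Made-up example: three pixels whose threshold rises by 20 e- per DAC step.
    scan = {
        (0, 0): [1000 + 20 * d for d in range(8)],
        (0, 1): [1050 + 20 * d for d in range(8)],
        (0, 2): [960 + 20 * d for d in range(8)],
    }
    target, settings, dispersion = tune_tdac(scan)
    print(f"target = {target:.1f} e-, dispersion = {dispersion:.1f} e-")
    print(settings)
```

The example numbers are synthetic and only show how the per-pixel choice compresses the spread of thresholds.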
16 pixels did not respond to injection pulses. Therefore, a total of 32 points are missing (marked in yellow). The threshold and noise maps show an almost uniform distribution across the matrix.

Figure 3.16: Analog front end: Pixel electronics including TDAC

Figure 3.17: ATLASpix1: Threshold tuning results after 100 MRad TID

**Figure 3.18:** ATLASpix1: Threshold and Noise map at 100 MRad. The masked pixels are marked in yellow color.

#### 3.3.4 Power consumption

The analog current consumption of the matrix includes the VSSA and VDDA currents. After 100 MRad, the VSSA current was measured to be about 95 mA and the VDDA current about 57 mA (figure 3.10). The total power consumption of the pixel matrix is calculated as $216.6\,\mathrm{mW}$, which is equivalent to $316.67\,\mathrm{mW/cm^2}$<sup>†</sup>. To measure the leakage power consumption of the digital blocks (CAB buffer and RCU), the analog blocks and the comparator were turned OFF by setting VNPix, VNComppix and Vminus to zero. VPRamp, the bias for the digital comparator in the CAB buffer, was set to 20 mV. The digital leakage current (after 100 MRad) was measured to be 5 mA, which leads to a static power consumption of 9 mW. This is about 4% of the total power. Before irradiation, the VSSA current was measured to be 120 mA and the VDDA current was 31 mA. This gives a total power consumption of 199.8 mW, which is equivalent to 292 $\rm mW/cm^2$. Thus, the total power consumption increased by 8.5% after irradiation. Given that ATLASpix\_M2 has small pixels (50 $\mu m \times$ 60 $\mu m$), the power consumption can be reduced by increasing the pixel size for successive generations.

<sup>†</sup>Area of ATLASpix\_M2 from figure 2.8 is calculated as $(1.6\,\text{cm} + 0.3\,\text{cm}) \times 0.36\,\text{cm} = 0.684\,\text{cm}^2$

#### 3.3.5 Summary

Various measurements were carried out on ATLASpix1\_M2 in order to evaluate the functionality of its design blocks.
It was shown that the serial data link works at the required rate (1.28 Gbps) for ATLAS ITk layer 4. The readout is fully functional after a total ionization dose of 100 MRad. Threshold tuning was successfully conducted before and after irradiation, ensuring the functionality of the pixel memory and Tune-DAC. Some anomalies in the DAC behavior were observed immediately after irradiation, which disappeared after annealing. The leakage current and SNR measurements follow an expected trend after irradiation. It may be advisable to use radiation-tolerant digital IC design in the future, although the readout circuitry is located at the chip periphery, far from the active area. It was observed during time-walk measurements that the pixels in the bottom row (closer to the digital periphery) had less clock skew (in terms of time stamp) than the pixels in the top row. The measured skew ($\approx$ 45 ns) was greater than the 25 ns bunch crossing time. Since it can cause an error in the hit data, it is recommended to be fixed in the next generations of ATLASpix.

# Chapter 4

# ATLASpix2 : A multi project wafer run in AMS/TSI 180nm

#### 4.1 Introduction

ATLASpix1\_M2 featured a readout scheme with parallel hit transfer from pixels to hit buffers (PPtB) and Content Addressable Buffer readout (CAB). ATLASpix2 is a test chip (3.7 mm x 4.2 mm) in a multi-project wafer (MPW) run in the AMS 180 nm process. It is optimized for better time resolution and faster readout than its predecessor, ATLASpix1\_M2. Three novel design concepts, namely programmable sorted readout, hit neighbor logic and smart pixel grouping, are introduced. The readout control unit features an 8b/10b standard Aurora encoder, a task scheduler and a serializer. A hit word is 32 bits long and was transmitted at a rate of 1.6 Gbps in ATLASpix1. The specification was later changed to 1.28 Gbps to ensure compatibility with the RD53A readout chip.
The design of the readout control unit (RCU) and its scheduling operation will be explained in this chapter. The ATLASpix2 architecture is similar to that of ATLASpix1\_M2, except that it has a different pixel grouping scheme. The number of pixels in a super-pixel group is fixed at 12 instead of 16. The new pixel grouping scheme provides a dedicated bit for each pixel in the hit pattern. This avoids the possibility of ghost hits due to the projection addressing used by ATLASpix1\_M2. ATLASpix2 employs a triggered readout scheme similar to that of ATLASpix1\_M2, using content-addressable buffers. The trigger buffer contains additional memory for ToT storage. It also includes a hit neighbor logic to ensure time-walk correction in the case of clustered hits between different pixel groups.

Figure 4.1: ATLASpix2 top level layout

# 4.2 Architecture of ATLASpix2

#### 4.2.1 Pixel grouping and hit neighbor logic

12 pixels form a super pixel, which is mapped to four CAB buffers. The tiling of the different pixel groups is shown in figure 4.2. Each buffer group contains four CAB buffers. For example, all pixels that belong to super pixel A1 are mapped to buffer group A1, as shown in figure 4.2. When we adopt an architecture that involves pixel grouping, the Time-over-Threshold (ToT) implementation can be challenging because only a limited number of ToTs per pixel group can be stored due to memory constraints. In ATLASpix2, a 6-bit ToT per hit pattern is stored, which means that each buffer can store a 6-bit ToT and a 10-bit time stamp. Let us assume a scenario where there is a clustered hit between two neighboring pixel groups A1 and B1, as shown in figure 4.2. If B1 has a larger share of charge than A1, then without hit neighbor logic, buffer group A1 records the time stamp corresponding to the A1 pixel, and buffer group B1 records the time stamp corresponding to the B1 pixel.
If the time-walk effect is greater than 25 ns (the bunch crossing period), this can be mistaken as two separate hits. Hit neighbor logic in ATLASpix2 ensures that, when there is a cluster, the Time Stamp (TS) corresponding to the pixel that has the major charge share is recorded, in this case TS B1. This helps to mitigate the effect of time-walk. The interleaved pixel grouping ensures two ToTs per cluster (ToT (A1) and ToT (B1)) for enhanced spatial resolution. On the other hand, two sets of hit data are stored for a single cluster, which can lead to increased hit buffer memory usage.

Figure 4.2: Smart pixel grouping in ATLASpix2 with hit neighbor logic

#### 4.2.2 CAB buffer

In ATLASpix2, there is a 1-1 mapping between the pixels in a super pixel and the address RAM (12 bits)<sup>\*</sup>. When a pixel is fired, a 12-bit hit pattern is transferred to the periphery via routing lines. A hit receiver generates a 'HitOR' signal, which is a bitwise OR function of the hit pattern. In the case of a pile-up (hits shortly after each other), the asynchronous hit signal would stay high for so long that the hits cannot be distinguished from one another. Hence, it is advisable to perform edge detection on the individual lines of the hit bus before generating the HitOR signal. This may require 12 hit receivers, one for each line, which is expensive in area. For layout reasons, the area of a CAB block must scale with the pixel area<sup>†</sup>. Therefore, it was not implemented in ATLASpix. The HitOR signal is synchronized with a 40 MHz bunch crossing clock. The basic operation of a CAB is explained in section 2.3.2. The CAB buffer used in ATLASpix1 was re-designed to include novel features such as hit-neighbor logic, ToT storage and sorted readout.

<sup>\*</sup>Unlike in ATLASpix1, where 16 pixels were mapped to a super pixel with an 8-bit address. This address compression is disabled in ATLASpix2. However, the pixels in a super pixel share four CAB buffers as in ATLASpix1.
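The pile-up limitation of the HitOR signal mentioned above can be illustrated with a small Python sketch. It models the hit bus as a sequence of sampled 12-bit patterns, which is a simplification of the asynchronous signals on chip, and all function names are hypothetical.

```python
N_LINES = 12  # one hit line per pixel of a super pixel


def hit_or(pattern):
    """HitOR: logical OR over all bits of the hit pattern."""
    return 1 if pattern != 0 else 0


def rising_edges(samples):
    """Count 0 -> 1 transitions in a sampled one-bit signal (starting from 0)."""
    previous, count = 0, 0
    for value in samples:
        if value and not previous:
            count += 1
        previous = value
    return count


def edges_on_hit_or(patterns):
    """Number of hits visible when only the HitOR signal is edge-detected."""
    return rising_edges([hit_or(p) for p in patterns])


def edges_per_line(patterns):
    """Number of hits visible when every hit line is edge-detected separately."""
    return sum(
        rising_edges([(p >> line) & 1 for p in patterns])
        for line in range(N_LINES)
    )


if __name__ == "__main__":
    # Pixel 0 fires first; pixel 1 fires while the pulse of pixel 0 is still high.
    pile_up = [0b00, 0b01, 0b11, 0b10, 0b00]
    print("HitOR edges:   ", edges_on_hit_or(pile_up))
    print("per-line edges:", edges_per_line(pile_up))
```

In this example the OR of the two overlapping pulses shows a single rising edge, while per-line edge detection sees both hits.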
The content addressable memory (CAM) cells include a new comparator (Comp2 in figure 4.3) to compare the stored time stamp with the level-1 trigger time stamp. The level-1 time stamp is stored in a trigger FIFO. The trigger FIFO is a part of the readout control unit. The second comparator is disabled with a control signal named "load2". The Load2 signal is enabled during a full readout, where the hit data is read out in an unsorted manner. When there is an entry in the trigger FIFO, the readout state machine sends the L1 Trigger Timestamp (L1TS) to the CAB. At first, the stored TS is compared with the delayed TS, with the delay equal to the on-chip latency. If there is a trigger signal, the hit is latched for readout. The next step is to check whether sorting is enabled. In the case of a sorted readout, the stored TS is compared with the L1 trigger TS. If there is a match, an internal signal is generated to fetch the data from memory once a read signal<sup>‡</sup> is received from the RCU. The CAB block has a buffer depth of four with storage and logic area. A layout cell, shown in figure 4.4, consists of the following:

1. 12 hit receivers (analog) that level shift the hit signal to CMOS. The block has a built-in HitOR generator that performs the OR function, edge detection and synchronization with the bunch crossing clock
2. Hit neighbor logic to mitigate the effect of time-walk in the case of clustered hits among neighboring pixel groups
3. Four CAB buffers with memory cells, triggered readout and sorting logic

The hit receiver block is common to all four CAB buffers. The layout of the CAB is full-custom to achieve better area efficiency. The comparator and time stamp RAM are integrated into a single cell in layout (CAM). A ROM array (6-bit) is used to store the group address. An 18-bit SRAM array stores the local address and the ToT (6-bit). The area of the CAB logic is the same in ATLASpix1\_M2 and ATLASpix2.
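The two-step comparison described above can be captured in a small behavioural model. The Python sketch below is only an illustration under stated assumptions: it ignores ToT and address storage, it assumes that a hit whose latency expires without a trigger is discarded, and the class and method names are hypothetical rather than taken from the design.

```python
TS_BITS = 10
TS_MASK = (1 << TS_BITS) - 1


class CabEntry:
    """Behavioural sketch of one content-addressable buffer cell."""

    def __init__(self):
        self.valid = False      # cell holds a hit
        self.stored_ts = 0      # time stamp stored with the hit
        self.latched = False    # hit has been selected by a trigger

    def store_hit(self, ts):
        self.valid = True
        self.stored_ts = ts & TS_MASK
        self.latched = False

    def clock(self, ts_del, trigger):
        """Evaluate the first comparator once per bunch crossing."""
        if self.valid and not self.latched and self.stored_ts == (ts_del & TS_MASK):
            if trigger:
                self.latched = True   # stored TS matches delayed TS during a trigger
            else:
                self.valid = False    # assumption: untriggered hits are dropped

    def selected(self, l1ts, sorting_enabled):
        """True if the cell should answer the next read signal."""
        if not (self.valid and self.latched):
            return False
        if not sorting_enabled:
            return True               # full readout: second comparator bypassed
        return self.stored_ts == (l1ts & TS_MASK)   # second comparator (Comp2)


if __name__ == "__main__":
    cell = CabEntry()
    cell.store_hit(100)
    cell.clock(ts_del=100, trigger=True)
    print(cell.selected(l1ts=100, sorting_enabled=True))
    print(cell.selected(l1ts=101, sorting_enabled=True))
```

The model shows why sorted readout needs the second comparator: several latched hits may wait in the buffer, and only those whose stored time stamp matches the current L1TS are read out.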
<sup>†</sup>The pixels in ATLASpix2 are $128 \times 50\,\mu\mathrm{m}^2$

<sup>‡</sup>The "read" signal here is a buffered "load column (LdCol)" signal from the scheduler FSM (figure 4.8) and the "load" signal is a buffered "load pixel (Ldpix)" signal in figure 4.8

Figure 4.3: ATLASpix2 CAB buffer logic diagram

Pixel address storage in ATLASpix2 required four additional RAM cells. The ToT implementation requires an extra 6-bit storage space per CAB. The maximum height of the CAB block is $50\,\mu\mathrm{m}$, which corresponds to the height of a single pixel. The only possibility to add the RAM cells and the additional logic to the existing layout was to increase its width. 10 extra RAM cells would cause a 12.62<sup>§</sup> micron increase in width. An additional 6 microns is accounted for routing overhead. Hence, the ATLASpix2 buffer area increased by 16.36% compared to ATLASpix1\_M2, due to the added storage space. The pixel width was scaled accordingly<sup>¶</sup>.

#### 4.2.3 The Readout Control Unit (RCU)

The major blocks of the readout control unit are the clock tree generator, scheduler FSM, time stamp generator, encoder, synchronizer and the serializer logic. The blocks are explained in detail in the following sections. The top level representation is shown in figure 4.5 along with the clock distribution over the different stages.

<sup>§</sup>The width of a single RAM cell is 2.62 microns. 10 additional RAM cells are required when compared to ATLASpix1 (4 extra address bits and 6 ToT bits). This results in an increase in the width of a buffer cell by $2.62 \times 10 = 12.62$ microns

<sup>¶</sup>ATLASpix2 pixel dimensions (x $\times$ y) are $128\,\mu\mathrm{m}\times60\,\mu\mathrm{m}$, whereas ATLASpix1 has smaller pixels of $60\,\mu\mathrm{m}\times50\,\mu\mathrm{m}$

Figure 4.4: CAB buffer layout of ATLASpix2

The last stage of serialization is done using an analog serializer located at the chip periphery.
The design of the analog serializer is beyond the scope of this dissertation.

Figure 4.5: ATLASpix2 RCU top

#### Time Stamp generator

The time stamp generator generates three time stamps, namely TS[9:0], TSDel[9:0] and TS2[5:0]. TSDel is a version of TS delayed by a programmable on-chip latency. TS, with a length of 25 ns, is transmitted to the hit buffers. When there is a hit, the hit data, which consists of the pixel address, row address, column address, ToT and the TS, is stored in the Content Addressable Buffer (CAB) memory. If there is a match between the stored and delayed TS in a buffer and the trigger signal is received, the data is selected for readout. TS2 has a length of 12.5 ns and is used for ToT storage. The time stamps are generated using binary counters and later Gray coded to facilitate error correction.

#### **Trigger FIFO**

The trigger Time Stamp (TS) is stored in a FIFO called the "trigger table" (figure 4.6). The required depth of the trigger table is estimated to be 32 for ATLASpix2. The purpose of the trigger table is to store the trigger data and to read out the hits in chronological order. When a trigger signal arrives, the trigger data is stored in the trigger table. The trigger data may include the TS and the trigger ID that is received together with the trigger command. In ATLASpix2, only the trigger TS is stored in the trigger table and the command decoder is not integrated into the RCU.

| Trigger table entry |
|---------------------|
| Trigger TS 31 |
| Trigger TS 30 |
| ... |
| Trigger TS 1 |
| Trigger TS 0 |

**Figure 4.6:** Trigger FIFO with a maximum storage of 32 Level-1 trigger time stamps

Figure 4.7 shows the simulation of read and write operations on the trigger FIFO with partial sorting (section 4.2.3) disabled. The write signal to the FIFO is derived from a trigger signal synchronized with the falling edge of the 40 MHz clock.
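The relation between TS and TSDel, and the Gray coding applied by the time stamp generator, can be illustrated with a short Python sketch. The function names are hypothetical, and the subtraction with wrap-around is an assumed way of expressing "TS delayed by the on-chip latency" for a free-running 10-bit counter.

```python
TS_BITS = 10
BC_PERIOD_NS = 25.0  # one time stamp corresponds to one 25 ns bunch crossing


def binary_to_gray(value):
    """Gray code of a binary counter value (one bit changes per increment)."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Inverse of binary_to_gray."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def delayed_ts(ts, latency):
    """TSDel: the time stamp that was current `latency` bunch crossings ago."""
    return (ts - latency) % (1 << TS_BITS)


if __name__ == "__main__":
    latency = 10
    print(f"latency of {latency} TS = {latency * BC_PERIOD_NS / 1000.0} us")
    for ts in (5, 10, 600):
        ts_del = delayed_ts(ts, latency)
        print(ts, ts_del, format(binary_to_gray(ts_del), "010b"))
```

A hit stored with time stamp `ts` therefore matches TSDel exactly `latency` bunch crossings later, which is when the corresponding trigger is expected.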
The TSDel signal acts as the trigger time stamp. For simulation purposes, the latency is programmed to be 10 TS, which translates to 0.25 $\mu$s since each TS is 25 ns.

Figure 4.7: Trigger table read and write

#### Task scheduler: The readout state machine

The scheduler FSM (figure 4.8) sends control signals to the matrix to load the hit data onto the data bus, from which it is eventually read out. This is done in different stages. The readout state machine is activated when there is an entry in the trigger table or there is an external "Force Read" signal. The trigger TS is read from the trigger table and transmitted to the buffer, which marks all the data that belong to that particular event for readout. At the end of each event, it sends out an End of Event (EoE) word which includes the BCID. PriortyFromDet = 1 implies that there are unread hits in the buffer. The Load2 signal is introduced to enable forced reading even when there are no entries in the FIFO. This should help to prevent data overflow in the buffer memory. The state machine runs on a 160 MHz clock. The data is output every fourth clock cycle, which corresponds to the input frequency of the Aurora 8b/10b encoder. The states in which data is output are color-coded green in figure 4.8. The synchronizer groups the 32-bit data into four 8-bit words, which are fed to the encoder. The encoder output is in turn grouped into four 10-bit words, which are multiplexed and serialized. This is implemented as combinatorial logic, which is explained in section 4.2.3, and no multichannel processing is involved. Apart from data words, the scheduler state machine is also responsible for sending out comma words to the Aurora encoder, which determines the state of the Aurora state machine that sends comma words from time to time. When a data word is sent, the "data valid" signal is set to 1; otherwise it is set to zero.

Figure 4.8: ATLASpix2 readout state machine
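The word grouping and the resulting data rates can be checked with a few lines of Python. This is a sketch for illustration only: the byte order (most significant byte first) and the function name `split_word` are assumptions, and the rates simply follow from the 160 MHz clock, the 32-bit word and the 8b/10b expansion quoted in the text.

```python
def split_word(word32):
    """Split a 32-bit word into four 8-bit words (assumed order: MSB first)."""
    return [(word32 >> shift) & 0xFF for shift in (24, 16, 8, 0)]


FSM_CLOCK_HZ = 160e6   # scheduler state machine clock
WORD_BITS = 32         # width of one data word

word_rate_hz = FSM_CLOCK_HZ / 4                 # one word every fourth clock cycle
payload_rate_bps = word_rate_hz * WORD_BITS     # bits per second before encoding
line_rate_bps = payload_rate_bps * 10 / 8       # after 8b/10b encoding


if __name__ == "__main__":
    print([hex(b) for b in split_word(0x12345678)])
    print(f"word rate    = {word_rate_hz / 1e6:.0f} MHz")
    print(f"payload rate = {payload_rate_bps / 1e9:.2f} Gbps")
    print(f"line rate    = {line_rate_bps / 1e9:.2f} Gbps")
```

Under these assumptions one 32-bit word is produced per 25 ns bunch crossing, and the four 10-bit encoder outputs amount to 40 bits in the same interval.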
The data valid signal from the scheduler state machine is used as the "K control" signal for the encoder (section 4.2.3).

#### **Output Data Pattern**

The data output from the readout FSM is 32 bits wide and can contain hit words, End of Event (EoE) words or debug words, depending on the state of the scheduler. A "data valid" signal is set high when data words are transmitted. When the valid signal is zero, the Aurora encoder sends different comma words depending on the state of the Aurora FSM (section 4.2.3). The EoE word can be made compatible with the RD53A [39] data output scheme, although ATLASpix2 is still an intermediate step towards the final design. ATLASpix2 has two readout modes based on the sorting of hits. In fully sorted readout mode, the hit data output follows the format captioned "hit word (a)" in figure 4.9. In partially sorted mode, the three least significant bits of the trigger time stamps are masked; hence it is important to transmit the 4 LSBs of the time stamp read out from the matrix, as shown under the caption "hit word (b)" in figure 4.9. Partial sorting is enabled by default. The ToT information is contained in 6 bits, referred to as TS2 in the time stamp generation explained in section 4.2.3.

**Figure 4.9:** ATLASpix2 output data format

Figure 4.10: ATLASpix2 output data simulations

#### Sorting of hits

In ATLASpix2, the hits are read out in chronological order of events. The readout modes are configurable. In the case of full readout, the sorting logic can be disabled. In such cases, the 10-bit time stamp data is transmitted along with the hit data. When sorting is enabled, only the 4 LSBs of the stored time stamps are transmitted. The time stamp can be recovered from the L1TS data sent along with the end of event word. There are two modes of sorting, namely full sorting and partial sorting. The hit data is sorted according to the chronology of events. In the case of partial sorting, the 3 LSBs of the trigger TS are masked while sorting, which means that eight trigger time stamps are treated as one.
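The masking can be made concrete with a small Python sketch. It is an illustration under assumptions: the function names are hypothetical, and the time stamp recovery assumes that the hit and the L1TS share their upper six bits, which holds whenever they match under partial or full sorting.

```python
TS_BITS = 10
TS_MASK = (1 << TS_BITS) - 1
MASKED_LSBS = 3        # bits of the trigger TS ignored in partial sorting
TRANSMITTED_LSBS = 4   # bits of the stored TS sent with each hit word


def sort_key(ts, partial):
    """Value compared by the sorting logic."""
    ts &= TS_MASK
    return ts >> MASKED_LSBS if partial else ts


def hit_matches_trigger(stored_ts, l1ts, partial=True):
    """True if a stored hit belongs to the event group of this trigger TS."""
    return sort_key(stored_ts, partial) == sort_key(l1ts, partial)


def transmitted_bits(stored_ts):
    """The 4 LSBs of the stored time stamp that travel with the hit word."""
    return stored_ts & ((1 << TRANSMITTED_LSBS) - 1)


def recover_ts(l1ts, lsbs):
    """Rebuild the full time stamp from the L1TS and the transmitted LSBs."""
    upper = l1ts & TS_MASK & ~((1 << TRANSMITTED_LSBS) - 1)
    return upper | lsbs


if __name__ == "__main__":
    stored, l1ts = 173, 168   # same group of eight time stamps
    print(hit_matches_trigger(stored, l1ts, partial=True))
    print(hit_matches_trigger(stored, l1ts, partial=False))
    print(recover_ts(l1ts, transmitted_bits(stored)))
```

In partial sorting the hit with time stamp 173 is read out together with the trigger at 168, and its exact time stamp is still recoverable offline.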
The concept of partial sorting is shown in figure 4.11. The hit data corresponding to eight consecutive events are indistinguishable during sorting. However, since the LSBs of the trigger TS are transmitted along with the hit data, the trigger TS can be recovered once the L1TS is known. Thus, the hit data corresponding to each event can be distinguished. Partial sorting helps to reduce the number of entries in the trigger FIFO. The hits can be read out at a faster rate, which in turn reduces the risk of buffer overflow. Full sorting can be enabled by setting the pin "EnableFullSorting" to high.

Figure 4.11: Partial sorting concept

#### Clock generation and timing

Digital systems often require the generation of different clock frequencies from an input clock. Clock dividers can be implemented using flip-flops and counters, and can be combined with combinational logic to generate the required frequencies and/or duty cycles. The input clock is most often the fastest clock in the system and is used to generate slower clocks by means of clock dividers. It is easy to achieve a 50% duty cycle when the clock is divided by an integer N, where N is a multiple of 2. In many cases, however, clocks are required where N is an odd number or a non-integer. There are different design techniques that can be used to obtain a 50% duty cycle in such cases, as described in [6]. Johnson ring counters can also be used to divide the frequency of the clock signal by varying their feedback connections. A commonly available standard 5-stage Johnson counter such as the CD4017 is used as a synchronous decade counter or divider circuit. A Johnson ring counter, or "twisted ring counter", is a shift register with feedback. The inverted output of the last flip-flop is connected to the input of the first flip-flop. The main advantage of this type of ring counter is that it only needs half the number of flip-flops compared to the standard ring counter.
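The behaviour of such a twisted ring can be reproduced with a short Python sketch of a shift register with inverted feedback. The function name `johnson_sequence` and the all-zero starting pattern are assumptions made for illustration.

```python
def johnson_sequence(n_stages, n_clocks):
    """Simulate an n-stage Johnson counter and return the visited states.

    The register shifts by one position per clock, and the inverted output of
    the last flip-flop is fed into the first flip-flop.
    """
    state = [0] * n_stages          # assumed initial pattern: all zeros
    states = [tuple(state)]
    for _ in range(n_clocks):
        state = [1 - state[-1]] + state[:-1]
        states.append(tuple(state))
    return states


if __name__ == "__main__":
    for step, state in enumerate(johnson_sequence(5, 10)):
        print(step, "".join(str(bit) for bit in state))
```

For five stages the pattern returns to its starting value after ten clock pulses, so each flip-flop output is the input clock divided by ten.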
An $n$-stage Johnson counter produces a sequence of $2 \times n$ different states and can therefore be considered a mod-$2n$ counter. On each successive clock pulse, the counter circulates the same data pattern between the flip-flops around the ring. In order to cycle the data correctly around the counter, it must first be "loaded" with a suitable data pattern. The circuit and truth table of a Johnson counter are shown in figure 4.12.

**Figure 4.12:** Johnson ring counter

In ATLASpix2 we use Johnson counters with a certain preset condition to derive clocks. To achieve this, the PRESET inputs of the flip-flops are connected to the inverted output of the last flip-flop. Clocks of frequencies 160 MHz<sup>¶</sup> and 200 MHz<sup>§</sup> are generated using mod-5<sup>†</sup> and mod-4<sup>\*</sup> Johnson counters, respectively. The counters are implemented as shown in figure 4.13. Fastcnt5 begins in the all-ones state. The inverted output of fastcnt5 [0] is fed back to the input of fastcnt5 [3] to allow the propagation of zeros. When the last flip-flop toggles after four clock cycles, the PRESET signal sets the outputs of all the flip-flops back to logic 1. Hence this counter operates with a modulus of 5. Fastcnt4 works in the same way as Fastcnt5, with a modulus of 4. There is a slight difference in the implementation of PRESET, as shown in figure 4.13. The 160 MHz clock has a 40% duty cycle since it is obtained by dividing the 800 MHz clock by 5, which is an odd number.
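The preset counter described above can be checked with a short behavioural sketch. It assumes that PRESET takes effect on the clock edge after the last flip-flop has gone low, and that the divided clock is taken from bit [2]; both are assumptions made for illustration, and `fast_counter` is a hypothetical name rather than the RTL module.

```python
def fast_counter(n_bits, n_cycles):
    """Behavioural model of a preset shift counter.

    The state list is ordered from bit [n_bits-1] down to bit [0].
    Returns a list of (state, preset) tuples, one per clock cycle.
    """
    state = [1] * n_bits                      # counter starts with all ones
    rows = []
    for _ in range(n_cycles):
        preset = 1 if state[-1] == 0 else 0   # PRESET follows the inverted bit [0]
        rows.append((tuple(state), preset))
        if preset:
            state = [1] * n_bits              # assumed: applied on the next clock edge
        else:
            # Inverted bit [0] feeds the top bit; the other bits shift down.
            state = [1 - state[-1]] + state[:-1]
    return rows


rows = fast_counter(4, 10)
for cycle, (state, preset) in enumerate(rows, start=1):
    print(cycle, state, preset)

# Duty cycle of bit [2] (index 1) over one period of five input clock cycles.
duty = sum(state[1] for state, _ in rows[:5]) / 5
print("duty cycle of bit [2]:", duty)
```

Under these assumptions the model repeats every five input clock cycles and bit [2] is high for two of them, consistent with the 40% duty cycle quoted for the 160 MHz clock. With `n_bits=3` the same model has a period of four, although the actual Fastcnt4 differs in how PRESET is implemented.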
The phases of the different clocks are shown in the simulation result in figure 4.15.

**Figure 4.15:** Clock phases

<sup>¶</sup>This clock is referred to as clk\_4n in the RTL code and in the ATLASpix1\_M2 design document.

<sup>§</sup>This clock is referred to as clk\_3n2 in the RTL code and in the ATLASpix1\_M2 design document.

<sup>†</sup>This counter is referred to as Fastcnt5 in the RTL code and in this document.

<sup>\*</sup>This counter is referred to as Fastcnt4 in the RTL code and in this document.

**Figure 4.13:** Fast counter implementation (Fastcnt4 dividing clk 800 MHz down to clk 200 MHz, with the Resetcnt4B signal, and Fastcnt5 dividing clk 800 MHz down to clk 160 MHz)

| clk<br>800 MHz | fastcnt5 [3] | fastcnt5 [2] | fastcnt5 [1] | fastcnt5 [0] | PRESET |
|----------------|--------------|--------------|--------------|--------------|--------|
| 1 | 1 | 1 | 1 | 1 | 0 |
| 2 | 0 | 1 | 1 | 1 | 0 |
| 3 | 0 | 0 | 1 | 1 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0 |
| 5 | 0 | 0 | 0 | 0 | 1 |
| 6 | 1 | 1 | 1 | 1 | 0 |
| 7 | 0 | 1 | 1 | 1 | 0 |
| 8 | 0 | 0 | 1 | 1 | 0 |

**Figure 4.14:** Fast counter truth table

Figure 4.16 shows how the various clocks are distributed in the ATLASpix2 RCU. The clock tree is generated from the 800 MHz input clock<sup>‡</sup>. The 400 MHz clock<sup>||</sup> is generated from the 200 MHz clock using the frequency multiplication technique shown in figure 4.17. The logic can be reduced to a delay element and an XNOR gate. This technique was patented in 1997 [74].

<sup>‡</sup>This clock is referred to as clkIn\_800p in the RTL code and in the ATLASpix1\_M2 design document.

<sup>||</sup>This clock is referred to as clk\_1n6 in the RTL code and in the ATLASpix1\_M2 design document.
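The delay-and-XNOR idea can be illustrated with an idealized, discrete-time sketch. It assumes a 50% duty cycle input clock and a delay of exactly one quarter of the input period; these values, the time resolution, and the function names are assumptions for illustration and are not taken from the ATLASpix2 design.

```python
def clock(t, period):
    """Square wave with 50% duty cycle, sampled at integer time steps."""
    return 1 if (t % period) < period // 2 else 0


def doubled(t, period):
    """XNOR of the clock and a copy delayed by a quarter period."""
    delayed = clock(t - period // 4, period)
    return 1 - (clock(t, period) ^ delayed)


def rising_edges(wave):
    """Count 0 -> 1 transitions, treating the waveform as periodic."""
    shifted = wave[1:] + wave[:1]
    return sum(1 for a, b in zip(wave, shifted) if a == 0 and b == 1)


PERIOD = 8  # time steps per input clock period (arbitrary resolution)
wave_in = [clock(t, PERIOD) for t in range(2 * PERIOD)]
wave_out = [doubled(t, PERIOD) for t in range(2 * PERIOD)]
print(wave_in)
print(wave_out)
print(rising_edges(wave_in), rising_edges(wave_out))
```

In this idealized model the output has twice as many rising edges as the input over the same interval. In silicon the duty cycle of the doubled clock depends on how closely the delay matches a quarter period.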
In ATLASpix2, the frequency multiplication is described in RTL using explicit NAND gates, although CAD algorithms can automatically synthesize nearly minimal NAND-gate logic networks.

**Figure 4.16:** Clock tree distribution in ATLASpix2 RCU

**Figure 4.17:** Generation of clk 400 MHz

**Figure 4.18:** Generation of periodic reset for slow counter

There are two slow counters in the ATLASpix2 RCU, namely "cnt4", which runs at 160 MHz, and "cnt5", which runs at 200 MHz. The role of these counters in synchronizing the data transfer is explained in section 4.2.3. The clock generation block is responsible for generating a signal known as "syncres", which resets the slow counter "cnt4" to state 2'd1. The generation of this synchronous reset signal is shown in figure 4.18. The waveforms of this logic are shown in simulation (figure 4.19). The phases of the mod-5 and mod-4 fast counters align automatically every 20 clock cycles, 20 being the least common multiple of 5 and 4. This is used to generate the syncres signal periodically. Figure 4.19 shows a timing diagram of the slow counters and how they are reset. Cnt4 is responsible for resetting cnt5 and ensures synchronous operation every four clock cycles of the 160 MHz clock.

**Figure 4.19:** Timing diagram for periodic reset generation of slow counters

#### Serializer

A serializer combines N data streams, each at a bit rate R, into a single data stream of bit rate $N \times R$. Figure 4.20 shows a possible implementation that serializes two data streams into one. The data changes at the falling edge of the clock. A multiplexer is used to transmit the even and odd data channels at the falling and rising edges of the clock, respectively, thereby achieving twice the input bit rate at the output.

**Figure 4.20:** Basic MUX based serializer

One of the major issues with this implementation is the possibility of a glitch in the output data stream towards the falling edge of the clock, caused by clock skew, the rise and fall times of the clock, and synchronization issues with the input data stream.
Hence, timing needs to be strictly constrained. The effect of clock skew is shown in figure 4.21. Here, the clock has a skew of one-fourth of its period with respect to the falling edge. The resulting data output is not only non-periodic but also has a glitch. If the output bit width is not equal to T/2, either bits are lost or more bits are transmitted than the input sequence contains, and the extra bits are not valid. This results in incorrect data transfer.

To make this design more robust, consider synchronizing the input data using flip-flops, as shown in figure 4.22. The even data stream is registered at the positive edge of the clock and the odd data stream is registered at the negative edge of the clock. The clock-to-output delay of the flip-flops (T<sub>cq</sub>) ensures that the even and odd data streams are stable before their respective selection phases of the clock during the multiplexing operation, as shown in figure 4.23.

In order to serialize multiple data streams, this implementation needs to be scaled up. Multiple stages of multiplexing can be added, with clocks operating at different frequencies. For a 4-channel input, two MUX stages are needed, as shown in figure 4.24. The incoming data must be preceded by a synchronous design stage such as a state machine. The highest frequency clock is obtained from Clock Data Recovery (CDR) and is divided to generate the lower frequencies. A MUX tree can be used to serialize N data streams, where N is a power of 2.

**Figure 4.21:** Issues with clock skew

**Figure 4.22:** Input synchronization

**Figure 4.23:** Timing after input synchronization

**Figure 4.24:** MUX based serializer tree

Now let us look at how this logic is implemented in ATLASpix2. The encoder output is 10-bit data. Four sets of 10-bit data output from the encoder are repacked into five 8-bit data sets. The serializer tree shown in figure 4.25 serializes an 8-bit data set at a 200 MHz clock rate. It has two output bits.
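The two-stage MUX tree for a 4-channel input can be sketched functionally as follows. This is an idealized model without skew or setup and hold effects; the channel ordering (d0 sent first) and the names `mux2` and `serialize4` are assumptions made for illustration, not the ATLASpix2 serializer tree.

```python
def mux2(sel, a, b):
    """2:1 multiplexer: output a when sel is 0, b when sel is 1."""
    return b if sel else a


def serialize4(words):
    """Serialize 4-bit parallel words (d0, d1, d2, d3) with a two-stage MUX tree.

    The first stage is selected by a slow clock, the second stage by a clock
    running at twice that frequency. One output bit is produced per time slot.
    """
    out = []
    for d0, d1, d2, d3 in words:
        for slot in range(4):
            slow_sel = (slot // 2) % 2     # slow clock: toggles every two slots
            fast_sel = slot % 2            # fast clock: toggles every slot
            a = mux2(slow_sel, d0, d2)     # first-stage MUX, even channels
            b = mux2(slow_sel, d1, d3)     # first-stage MUX, odd channels
            out.append(mux2(fast_sel, a, b))
    return out


bits = serialize4([(1, 0, 1, 1), (0, 0, 1, 0)])
print(bits)
```

Each stage doubles the bit rate, so two stages serialize four channels; a third stage driven by a clock at twice the frequency again would extend the same structure to eight channels.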
The final stage of serialization is done by an analog serializer [98]. The 800 MHz input clock is used to generate the rest of the clocks for the serializer tree, as explained in section 4.2.3.

**Figure 4.25:** ATLASpix2 serializer tree

Synchronizing logic is needed between the Aurora 8b/10b encoder (section 4.2.3) and the serializer tree in order to ensure that the data is serialized without loss. This is explained in section 4.2.3.

#### 8b/10b Aurora encoder

The Aurora protocol is a link-layer communications protocol for use on point-to-point serial links. Developed by Xilinx, it is intended for use in high-speed (gigabits per second and more) connections internally in a computer or in an embedded system. It uses either 8b/10b encoding (implemented in ATLASpix2) or 64b/66b encoding (implemented in ATLASpix3). The encoder receives the "data word" and "K control" from the scheduler FSM described in section 4.2.3. The purpose of sending comma words at regular intervals is to enable synchronous reception of the incoming bitstream. In ATLASpix2, the Aurora encoder takes in 8-bit data and the corresponding K control every fourth clock cycle of the state machine, which is equivalent to 40 MHz since the state machine runs at 160 MHz. Comma words are set to "1111" during the transmission of "BCBCBCC" words and to "0000" during the transmission of data and debug words. The 8-bit data is encoded to 10 bits. The Aurora output depends on the state of the Aurora FSM. For details on the Aurora state machines, please refer to [53]. Clock compensation is disabled in the current implementation since the receiver and transmitter use the same clock frequencies.

**Figure 4.26:** Implementation of Cadence Aurora 8b/10b in ATLASpix2

**Figure 4.27:** Aurora 8b/10b block-level representation. The state diagrams of the control FSMs can be found in the Xilinx documentation of the standard 8b/10b Aurora protocol [53]

#### Data packaging unit and synchronizer

Recall the top-level RCU architecture in figure 4.5.
It is clear that synchronizing logic is needed between the state machine, the Aurora 8b/10b encoder and the serializer. This is one of the more interesting parts of this design. The readout state machine generates a 32-bit data output that needs to be encoded. The 32-bit packet is divided into four 8-bit packets since the encoder can process only 8 bits at a time. A demultiplexing logic (figure 4.28) distributes the 32-bit data packet into chunks of 8 bits, which are fed into the input register of the Aurora encoder at different times. The timing is synchronized using a counter<sup>†</sup> operating on the 160 MHz clock. The states of this counter play a crucial role in synchronizing the 8-bit data sets written into the input register of the encoder (ToAurora) and the 10-bit data sets read out from the output register of the encoder (FromAurora). The state machine and the Aurora encoder run at the same clock rate (160 MHz). Since the state machine outputs 32-bit data every fourth clock cycle, which is equivalent to 40 MHz, an encoding operation on 8 bits needs to take place at a rate of 160 MHz. This means that if we start a counter at 160 MHz, at every state of the counter we send one set of data to the encoder and receive one set of data from it.

<sup>†</sup>We refer to this counter as "cnt4" in this document as well as in the RTL code of ATLASpix2.

**Figure 4.28:** ATLASpix2 data synchronizer logic

It is important to register all the incoming data words without the risk of their being overwritten by subsequent data sets. For this purpose, we use a register<sup>\*</sup> that can hold 40 bits, which is equivalent to four sets of encoded data or one encoded 32-bit data frame from the readout state machine. A DEMUX logic based on cnt4 is used to ensure synchronization. The TenToEight register follows the ring buffer concept shown in figure 4.29a. The reading and writing operations of this buffer are synchronized using cnt5 and cnt4, respectively (figure 4.29b).
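The repacking of four 10-bit encoded symbols into five 8-bit slices can be sketched as follows. The bit ordering inside the 40-bit register (first symbol in the low bits) is an assumption made for illustration, and the function names are hypothetical; only the 4 × 10 bit to 5 × 8 bit relation and the 160 MHz and 200 MHz clock rates come from the text.

```python
def pack_ten_to_eight(symbols):
    """Pack four 10-bit symbols into 40 bits and return five 8-bit slices."""
    assert len(symbols) == 4
    reg = 0
    for i, sym in enumerate(symbols):
        assert 0 <= sym < (1 << 10)
        reg |= sym << (10 * i)              # assumed: first symbol in the low bits
    return [(reg >> (8 * j)) & 0xFF for j in range(5)]


def unpack_eight_to_ten(octets):
    """Inverse operation, as a receiver-side model would perform it."""
    assert len(octets) == 5
    reg = 0
    for j, byte in enumerate(octets):
        reg |= byte << (8 * j)
    return [(reg >> (10 * i)) & 0x3FF for i in range(4)]


# Writing 10 bits at 160 MHz and reading 8 bits at 200 MHz move the same
# number of bits per second, so the register neither fills up nor runs empty.
assert 10 * 160 == 8 * 200

symbols = [0x2AA, 0x155, 0x3FF, 0x001]      # arbitrary example values
octets = pack_ten_to_eight(symbols)
print([hex(b) for b in octets])
print(unpack_eight_to_ten(octets) == symbols)
```

The rate check shows why the write side counts four states and the read side counts five: four writes of 10 bits and five reads of 8 bits both take 25 ns.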
The counters cnt5 and cnt4 run at 200 MHz and 160 MHz, respectively, and are synchronized at the initial state.

<sup>\*</sup>The register is called "TenToEight" in this document as well as in the RTL code of ATLASpix2.

**Figure 4.29:** Cyclic registers for data synchronization

# 4.3 Summary

ATLASpix2 aims to prototype the new CAB buffer with hit neighbor logic, ToT implementation and partial sorting. The readout control unit features an 8b/10b Aurora encoder. Various aspects of the RCU design, such as time stamp generation, clocking, trigger table, serializer design, data packaging and synchronization, were discussed in this chapter. These blocks were reused in ATLASpix3 with some modifications.

Both ATLASpix1 and ATLASpix2 were fabricated in the AMS ah18 process. The ATLASpix2 tape-in was followed by a change of foundry. The AMS AG foundry recommended TSI Semiconductors to its academic customers. Hence, the ATLASpix2 design was re-submitted in the TSI process. Both the TSI and AMS 180 nm processes were compatible with the h18 process (derived from the former IBM CM7RF process). ATLASpix2 characterization was undertaken by the HVCMOS collaboration. The results in [120] show a nearly 100% correlation of the sensor diode characteristics between the AMS and TSI processes. TSI offered a high-resistivity substrate for engineering runs at a lower cost. The TSI 180 nm process has seven routing layers, which became an added advantage during the ATLASpix3 design. The extra routing layer was used to equalize the routing delays from the different pixels to the periphery. The timing skew observed in ATLASpix1 is expected to be resolved in ATLASpix3 with innovative routing techniques.

# **Chapter 5**

# ATLASpix3: A reticle-size chip for HVCMOS quad-module construction for ATLAS ITk

#### 5.1 Introduction

ATLASpix3 is a $2\,\mathrm{cm} \times 2\,\mathrm{cm}$ HVCMOS chip with all the features needed for the construction of CMOS quad modules for ATLAS ITk. It is termed the "CMOS demonstrator chip" for ATLAS ITk.
ATLASpix3 adopts a triggered column drain readout architecture. The novel features of the ATLASpix3 design include event sorting according to trigger ID, a command decoder with clock data recovery and an Aurora 64b/66b data encoder. The command decoder and readout data format follow the same protocol as the hybrid readout chip, RD53A, to ensure compatibility with the ITk inner layers. The CAB buffer from ATLASpix1\_M2 and the hit buffer from ATLASpix1\_Simple are reused in ATLASpix3 with some added features. This chapter describes the overall architecture of ATLASpix3 with emphasis on its novel features.

### 5.2 Architecture of ATLASpix3

ATLASpix3 is a reticle-size ($2\times2\text{ cm}^2$) chip suitable for HVCMOS quad module construction. The readout periphery, located at the chip bottom, occupies 10% of the total area. The inactive area of the periphery is within the acceptable range for ATLAS. ATLASpix3 supports triggered readout. As a test feature, triggerless readout is also implemented. Figure 5.1 shows the overall chip architecture with its layout.

**Figure 5.1:** ATLASpix3 top-layout showing triggered and triggerless readout data paths

ATLASpix3 has the following major blocks: the pixel matrix and the periphery. The matrix is composed of 132 columns. Every column has 372 pixels and a digital front end that contains hit buffers (one per pixel), $2 \times 40$ Content Addressable Buffers (CAB) and data multiplexers called End of Columns (EoCs). The pixel size is $50~\mu\mathrm{m} \times 150~\mu\mathrm{m}$ (height $\times$ width). A long array of routing lines connects the pixel matrix to the periphery. Special care has been taken to equalize the load capacitances of these lines irrespective of the position of the pixels.

The readout scheme of ATLASpix3 is "Triggered Column Drain" (FE-I3 like). This scheme can be regarded as the combination of the column drain readout in ATLASpix1\_Simple and the triggered readout in ATLASpix1\_M2.
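As a quick consistency check of the matrix dimensions quoted above (132 columns, 372 pixels per column, and a pixel of $50~\mu\mathrm{m}$ height by $150~\mu\mathrm{m}$ width), the pixel count and the approximate matrix size follow directly. The arithmetic ignores any spacing or guard structures, which are not described here.

$$
\begin{aligned}
N_\text{pixels} &= 132 \times 372 = 49\,104,\\
\text{matrix width} &= 132 \times 150~\mu\mathrm{m} = 19.8~\mathrm{mm},\\
\text{matrix height} &= 372 \times 50~\mu\mathrm{m} = 18.6~\mathrm{mm}.
\end{aligned}
$$

Both dimensions fit within the $2\,\mathrm{cm} \times 2\,\mathrm{cm}$ chip outline, with the remaining height available for the periphery at the chip bottom.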
The chip periphery contains the main RCU (for triggered readout), the auxiliary RCU (for untriggered readout), the Clock Data Recovery (CDR) block (customized digital part), configuration shift registers, a bias block with the DAC shift register, a voltage DAC block with the VDAC shift register, voltage and power regulators for serial powering, and I/O pads.

# 5.3 Readout periphery design

The buffers and the digital and analog blocks at the chip bottom occupy an area $10\times$ smaller than the active area of the pixel matrix. The readout periphery is placed at the bottom to avoid digital cross-talk. A pixel contains a sensor diode, a Charge Sensitive Amplifier (CSA), a comparator, a threshold tune DAC, a 4-bit RAM and an output driver. The outputs of the pixels are transmitted to the hit buffers via long routing lines. When a pixel fires, the corresponding hit buffer stores the global time stamp (TS) from the Readout Control Unit (RCU) and generates a hit word. The hit data contains the row address (9 bit), column address (8 bit), leading edge time stamp (10 bit) and time over threshold (7 bit). The time stamps are stored in the dynamic RAM of the hit buffers. The data path from here onwards depends on the activated readout scheme. In case of triggered readout, hit data is transferred to the Content Addressable Buffers (CAB), where hits are filtered based on a trigger signal. The architecture of the CAB is explained in Chapter 2. In case of triggerless readout, hit data is transferred from the hit buffers to the EoC for untriggered readout. ATLASpix3 has two Readout Control Units (RCUs): the main RCU for triggered readout and the auxiliary RCU for triggerless readout. The main RCU includes a command decoder with clock data recovery, an Aurora 64b/66b encoder and a serial data link at 1.28 Gbps. The auxiliary RCU, which supports triggerless readout, is adapted from ATLASpix1.

#### 5.3.1 The Readout Control Unit (RCU)

The readout periphery contains synthesized digital logic called the Readout Control Unit (RCU).
The RCU communicates with the digital front end as shown in figure 5.3. The RCU has the following blocks: readout controller, clock generator, command decoder, 64b/66b Aurora encoder, data packaging unit and serializer.

#### Readout controller

The readout controller has two state machines: a pixel state machine that transfers the hit words from the hit buffers to the CAB buffers, and a readout scheduler state machine that is responsible for reading out the triggered hits from the CAB buffers. It also contains a FIFO where the L1 trigger entries (TS and ID) are stored. The triggered hits are sent out sorted by L1 event TS using 32-bit words. These words are divided into data words and control words. The former contain pixel information; the latter contain some debug information and the L1 TS. The hit words are 64-bit long and are sorted in the order of events. The readout scheduler sends 64-bit frames to the data packaging unit as shown in figure 5.4a. The 64-bit word contains two 32-bit data words, which are packaged into a register along with a valid bit. The register always contains an even number of data words. There are different data words, as shown in figure 5.4b. Every triggered event is read out starting with the BoD word, followed by one or many hit words, until the EoC registers are empty. The new set of data is loaded into the EoC buffers. After that, the next column is loaded, which causes the sending of another BoD sequence or, if there are no more hits, an EoE word. If the total number of data words is odd, a space word is attached. An empty event contains an EoE word and a space word. The right half of the register is filled first, which becomes the LSB part of the Aurora FIFO content. As mentioned, half a byte called DataBytes, which indicates the size of the word, is attached to every 64-bit data word. The value of DataBytes depends on the number of data words.

**Figure 5.2:** ATLASpix3: Readout control unit
It is set to "4'h8" if the number of data words is even and "4'h7" if it is odd. If DataBytes is "4'h7", a space word is attached to the data word before sending it to the Aurora FIFO.

#### Aurora 64b/66b encoder

The Aurora FIFO stores the 64-bit package along with the 4-bit DataBytes. The value of DataBytes indicates a "full data" or a "half data with space word". This is used by the Aurora multiplexer to decide between encoding full data (64 bits) and including a separation block (8 bits) along with the data (56 bits). The multiplexer works based on a pre-defined priority, as listed in table 5.1. The priority multiplexer adds two header bits, which are "01" during a full data transmission and "10" in all other cases, including the transmission of idle frames. The 64-bit data is scrambled and transmitted along with its header as 66 bits. The header bits are not scrambled as they are DC balanced by themselves. The 66-bit encoded output is serialized MSB first.

**Figure 5.3:** ATLASpix3 readout periphery. Hit/CAB buffers are laid out in double columns. Data transfer during triggered readout is denoted by arrows

**Table 5.1:** Aurora MUX priority list in descending order of priority

| Priority | Task | Data Format |
|----------|------|-------------|
| 0 | Clock compensation | |
| 1 | Not ready | |
| 2 | Channel bonding | |
| 3 | Native flow control | |
| 4 | User flow control | |
| 5 | User K blocks | 10, UserK[63:0]<br>10, 56'hFFFFFF, monitor_userk[7:0] |
| 6 | User data | 01, Data [63:0]<br>10, IDLE_BLOCK, 8'h10, 48'hADA000<br>10, SEP7_BLOCK, Data [55:0] |
| 7 | Idle | 2'b10, IDLE_BLOCK, 8'h10, 48'hADA |

Sending of the control words (priority 0 to 4) is not implemented. UserK blocks (priority 5) can carry the state of the configuration register or of a MonitorConfig register.
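The pairing rule of the data packaging unit described above (two 32-bit words per 64-bit frame, right half filled first, DataBytes of 4'h8 for a full frame and 4'h7 for a half frame padded with a space word) can be sketched as follows. The value of `SPACE_WORD` and the function name `package` are hypothetical placeholders; the actual space word pattern is not given in this section.

```python
SPACE_WORD = 0x00000000  # hypothetical placeholder for the space word pattern


def package(words32):
    """Pair 32-bit data words into (frame64, data_bytes) tuples."""
    frames = []
    for i in range(0, len(words32), 2):
        low = words32[i]                         # right half (LSBs) is filled first
        if i + 1 < len(words32):
            high, data_bytes = words32[i + 1], 0x8
        else:
            high, data_bytes = SPACE_WORD, 0x7   # odd count: pad with a space word
        frames.append(((high << 32) | low, data_bytes))
    return frames


# Three example words: the last frame is only half filled.
frames = package([0x11111111, 0x22222222, 0x33333333])
for frame, data_bytes in frames:
    print(f"{frame:016x} DataBytes={data_bytes:x}")
```

In this model the DataBytes value tells the downstream multiplexer whether a frame carries a full 64 bits of data or needs a separator block.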
These words are sent periodically<sup>\*</sup> or whenever there is a "readback" command. The UserK words take priority over the hit data. When no data or UserK words are sent, an idle word is sent out (priority 7).

<sup>\*</sup>A periodic state machine handles this operation. Its period can be programmed.

**Figure 5.4:** ATLASpix3: (a) data packaging unit and (b) data output format

#### Command decoder

The main purpose of the Command Decoder (CD) block (figure 5.6) is to receive and process commands. The input of the command decoder is a command bitstream at 160 Mbps, from which the 160 MHz clock, the L1 trigger with its trigger tag, the configuration bits and the readback commands are recovered. The commands are sent in 8-bit frames that are encoded in a special encoding pattern, as in RD53A [39]. Each command frame contains a command or a 5-bit word, which can be a chip ID, a register address or data. Unless in broadcast mode, the chip ID should match the hard-coded ID (burned in EFuse). If it does not match, the command is ignored. The command processor generates a 40 MHz bunch crossing clock from the data stream. This clock can be phase shifted by an external configuration called BX phase. A command word is 16 bits long and should be sent "LSB first". A "SYNC\_WORD" (16'b1000\_0001\_0111\_1110) must be sent at the beginning of communication, since it is used by the receiver to align with the bitstream. It is also sent at regular intervals to ensure receiver synchronization.

**Figure 5.5:** Aurora 64b/66b implementation in ATLASpix3

The command words can be of three types: 1) trigger command, 2) Read/Write register command, 3) SET bit command. The Read/Write register command is used to generate data ($S_{in}$) and control signals ($Ck_1$ and $Ck_2$) for reading and writing the configuration registers. There are 10 bits of data associated with the data part of this command. There is a state machine that works at 10 MHz<sup>†</sup> that controls the shifting operation.
The shifting of a single bit requires 4 clock cycles. The SET bit command writes into a triple redundant 10-bit command register. These bits are used to load output latches, to write and read back the pixel RAM, to inject by command and to reset the time stamp counters. The SET\_BIT command is associated with 2 bytes of data and the chip ID.

<sup>†</sup>Known as "interface\_FSM" in the RTL code. The speed of shifting is made externally configurable.

**Figure 5.6:** ATLASpix3 Command decoder with clock data recovery

### **Trigger generator**

Figure 5.7 shows how the trigger signal is recovered from the command words. The clock generator block generates a 40 MHz clock and a trigger signal, which are in phase with the command words. The trigger signal can be phase shifted in steps of one-fourth of the bunch crossing period to compensate for analog delays. Alternatively, external triggering is also possible. The auto trigger settings (stored in the configuration register) define the trigger delay and trigger width. The BX phase and injection phase (also stored in the configuration register) change the phase of the bunch crossing clock, trigger, trigger tag and injection. The trigger source (stored in the configuration register) defines how the trigger is generated: 0 and 3: from the trigger command, 1: using the auto trigger generator initiated by the hit bus, and 2: using the auto trigger generator initiated by injection.

The trigger command is used for generating a trigger signal and a trigger tag. The command consists of an encoded trigger pattern (8 bits) and an 8-bit code for the trigger tag. The trigger is encoded in a pattern that is adopted from RD53A. Since the trigger command has 16 bits, its transmission takes 100 ns<sup>‡</sup>. This is equivalent to four bunch crossings, which explains why an encoding of 16 possible trigger patterns is required. The 5 LSBs of the trigger tag are decoded from the 8-bit code, and a 2-bit header is recovered from the trigger pattern (figure 5.7).
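The timing argument above can be written out explicitly, using the 160 Mbps command rate and the 40 MHz bunch crossing clock (25 ns period) stated in the text:

$$
t_\text{cmd} = \frac{16~\text{bit}}{160~\text{Mbit/s}} = 100~\text{ns} = 4 \times 25~\text{ns},
\qquad
2^{4} = 16 .
$$

One trigger command therefore spans four bunch crossings, and each of them may or may not carry a trigger, which gives the 16 combinations that the trigger pattern has to encode.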
<sup>‡</sup>The transmission rate is 160 Mbps.

**Figure 5.7:** ATLASpix3 Trigger generator

**Figure 5.8:** ATLASpix3 encoder and serializer with clock tree

### Clock generation and timing

The clock tree generation is similar to that of ATLASpix2 (section 4.2.3), except that ATLASpix3 uses different frequencies to adapt to the new design specifications. The clock generator receives a 640 MHz clock and generates a 160 MHz command clock and a 58.18 MHz<sup>§</sup> clock, which is used by the Aurora encoder.

<sup>§</sup>Equal to $\frac{640}{11}$ MHz, generated using a "divide by 11" counter.

### 5.4 Summary

ATLASpix3 is the first reticle-size ($2~\mathrm{cm} \times 2~\mathrm{cm}$) monolithic sensor chip with all the required features for the construction of CMOS quad modules for the ATLAS ITk layer 4. It has a configurable readout scheme that supports triggered readout. A debug readout mode without trigger is also implemented. The hit data format and command protocol are compatible with the RD53A readout IC. The design was optimized for ATLAS ITk layer 4 using the simulation results from a readout modeling environment. The testing of ATLASpix3 was ongoing at the time of writing this dissertation. The triggerless readout has been proven to be functional. The clock generation has been tested and works. Test beam studies are being conducted within the HVCMOS collaboration.

# Chapter 6

# Full-chip verification and timing closure

### 6.1 Introduction

With gate count and system complexity growing exponentially, deep submicron technologies pose many challenges to both the design and verification domains. Most high-performance system-on-chip designs integrate digital cores with mixed-signal IPs in order to meet their specifications. Since various domains are involved, the full-chip verification of a complex system-on-chip is not a trivial task.
Most full-chip verification platforms have focused on the gate-level environment, due to the complexity of the transistor-level setup. However, the most reliable path to accurate and rapid system verification should be based on innovations in both tools and mixed-signal methodologies. Designers need to develop a quality assurance (QA) flow that facilitates the detection of a wide range of bugs at an early design stage. This chapter aims to provide an overview of the design verification methodologies used in the ATLASpix designs. Apart from regular mixed-mode simulations, a full chip RTL simulation environment has been developed for testing the corner cases.

The ATLASpix pixel matrix is an entirely analog design. The buffers used to store hit information are full custom in order to meet the area constraints. The readout control unit is digitally synthesized. Since ATLASpix is fabricated in 180 nm, it follows the traditional "analog-on-top" integration methodology, contrary to new generation readout chips such as RD53A [39], which follow the "digital-on-top" methodology. The "digital-on-top" methodology is gaining interest in the CERN design community, due to the increasing logic density of pixel readout chips. The digital-on-top methodology is explored in nanoscale designs (chapter 8). Once block level functionality is ensured, the full chip timing is verified using mixed-mode as well as digital simulations. Since the former technique is well known in the analog design community, the latter is given emphasis in this chapter.

### 6.2 Mixed-mode simulation

Pixels have been simulated using an analog simulator. Owing to the large size of the pixel matrix and its interconnect lines, full chip timing closure has always been a challenge. Hence, a simplified matrix (analog) with the RCU (digital) has been verified through mixed-mode simulations. The resistance and load capacitance of the interconnects can be obtained through RC extraction and simple calculations.
The effect of the RC delay of the routing lines that transmit the hit signals from the pixel driver to the hit buffer is investigated. Simulations done on a simplified RC-extracted netlist show a difference of 36 ns between the RC delays of the interconnect lines corresponding to the topmost pixel and the bottommost pixel in a column. In order to minimize this effect, the next generation ATLASpix chips are planned to be laid out in such a way that the length of the routing lines from pixel to hit buffer remains uniform across the matrix. This can be achieved with the help of an additional routing layer. The simulation of ATLASpix1\_M2 took about 12-18 hours using a simplified matrix. The runtime bottleneck of the mixed-mode environment can be overcome by using full chip digital models for corner-case testing in an early design phase.

## 6.3 RTL Design and Verification Environment

An RTL verification environment named "ATLASpix Verification Environment" (AVE), shown in figure 6.1, has been developed to simulate the dynamic behavior of ATLASpix chips. It is intended to drive the complete digital implementation flow (both front-end and back-end) in a systematic way. A full-chip RTL model emulates the dynamic behavior of the pixel matrix. The readout control unit is rigorously tested for several corner case scenarios. ATLASpix2 was the first chip to make use of this verification methodology, although its development started during the ATLASpix1 timeline. ATLASpix1 used a simplified pixel matrix emulator as part of its RTL test bench. This section is intended to provide an overview of full chip verification using the RTL model. Since some parts of the chip, such as the buffers and EoC multiplexers, are full custom, their behavioral models had to be generated and integrated into the verification environment. A top-level digital simulation is then done using the post-routed netlist of the RCU to verify timing.
**Figure 6.1:** ATLASpix digital design and verification environment

### 6.3.1 Full chip behavioral model

The RTL model of ATLASpix1\_M2 was used to verify ATLASpix2, since the design is very similar except for the downscaling of the matrix, the addition of ToT RAMs and a new trigger time stamp comparator. The RTL model of ATLASpix1\_M2 has been adapted to these modifications, except for the downscaling of the matrix size, which is not significant in the RTL simulation environment. Several modifications have been made to the ATLASpix3 RTL model to include the additional features. This model is used for functional verification of the RCU, which is the most important control circuitry. The full chip RTL model includes the following:

1. Gate level model of the hit buffer and of the trigger buffer, which includes the hit receiver, CAB logic, time stamp comparators, time stamp and pixel address RAM, and group address ROM. There are two ways to implement the RAM: either use the macro model provided by the foundry or use a standard Verilog model, which is not synthesizable. The latter approach was chosen for two reasons: 1) the full chip Verilog model is only used for simulation, and 2) the RAM and ROM used in the CAB buffer were not standard IPs but custom designed.
2. Gate level model of the End Of Column (EOC) logic. This includes several latches and multiplexers.
3. Integration of the readout blocks connected to a single EOC.
4. Such matrix columns form a block, which is eventually connected to the digital periphery.
5. Digital periphery, which contains the FIFO, scheduler state machines, data synchronizer, clock generator, encoder and serializer.

All the above blocks are synchronous and run on a clock that is 10 times faster than the clock used in the RCU design. Although the ATLASpix readout is asynchronous, synchronous models were used for ease of integration into a verification environment. Some modifications have been implemented for ATLASpix3 to make the model partially asynchronous.
The following blocks are required in order to make this model complete.

1. RTL modeling of the analog circuits such as pixels, PLL, analog serializer.
2. Row and column configuration registers and pixel memory

### 6.3.2 RTL Test bench

The RTL test bench includes all the blocks under AVE, except the matrix behavioral model. It has two clock generators: a fast clock for the full chip RTL model and a slow clock to emulate the VCO output. Since the pixels are not part of this environment, binary hit patterns and commands are generated for each corner test case. In case of a triggered readout, a trigger signal is generated after a predefined latency. The output control signals from the RCU are connected to the matrix model and the hit data from the matrix model is fed back to the RCU. A data receiver is implemented as a sub-block in the test bench to receive and decode the serial data output from the RCU. Owing to the different encoding schemes used in ATLASpix 1, 2 and 3, the receiver implementation is different. As of now, there is no Universal Verification Methodology (UVM) developed for ATLASpix designs.

Figure 6.2 shows a full chip digital simulation where a data packet corresponding to an event is transmitted. A data set corresponds to an event. In this simulation, three hits that belong to two adjacent columns are transmitted. In the first column, two rows are hit, which is implied by the generation of two read column (RdCol) signals for the first column and one read column (RdCol) signal for the second column. The signal "KAurora" indicates the presence of valid data and it toggles during the transmission of each data set.

Figure 6.2: Full chip digital simulation of ATLASpix2

### Data receiver

ATLASpix1 and 2 use the 8b/10b encoding scheme for high-speed serial data transmission. The encoder on the transmitter side maps the 8-bit parallel data input to a 10-bit output. This 10-bit output is then shifted out through a high-speed serializer.
The serial data stream is transmitted through the transmission medium to the receiver. The high-speed deserializer on the receiver side converts the received serial data stream to parallel. The decoder then remaps the 10-bit data back to the original 8-bit data. Hence the receiver has two major blocks: 1) deserializer 2) 10b/8b decoder. They are reused in the firmware to receive and decode data from the chip.

Data receivers are different for each ATLASpix prototype due to their different encoding schemes. For ATLASpix 1 and 2, the data stream is pushed to a 20-bit shift register (deserializer). Each 10-bit data frame is compared to a RD+ or RD- code [135] that corresponds to the comma word "BC" within the control symbol K28.5. It is used to synchronize the incoming data frame. The beginning of a data word is marked by "1C" followed by "AA". The decoder maps the 10-bit data to 8-bit according to the decoding lookup table. The decoded data is used to verify the hit information as per the binary hit pattern generated by the test bench. The design is verified to be functional if all the hit words are recognized along with the debug and idle words.

Figure 6.3: ATLASpix1 data receiver includes a deserializer, phase detector logic and a 10b/8b decoder. The phase of the incoming bit stream is aligned upon the detection of comma words.

**Figure 6.4:** 10b/8b Decoder with two separate blocks for 5b/6b decoding and 3b/4b decoding. Disparity error bit indicates an error in the decoded data

The ATLASpix3 receiver has been designed to descramble and decode the incoming data stream that follows the Aurora 64b/66b protocol. The overhead\* of 64b/66b encoding is 2 bits (header) for every 64 bits, which is 3.125% of the number of payload bits. The 8b/10b encoding scheme adds 2 overhead bits for every 8 bits, which is 25% by the same definition, so the overhead is reduced by a factor of eight. The 66-bit word is made by prefixing one of two possible 2-bit headers to the 64-bit data or control word. If the header is 01, the 64 bits are data. If the header is 10, the 64 bits hold an 8-bit type field and 56 bits of control information, or they can be 64 bits of data (state of the configuration registers). Any other header indicates an error in the received code. The scrambler uses an LFSR with the feedback polynomial $1+x^{39}+x^{58}$. The same polynomial can be used for descrambling.

<sup>\*</sup>The overhead of a coding scheme is defined as the ratio of the number of added coding bits to the number of raw payload bits.

**Figure 6.5:** ATLASpix3 data receiver conceptual schematic: The receiver includes a descrambler, digital phase detector, combinatorial logic to verify the correctness of data alignment and serial-in parallel-out shift registers to output the decoded data packets

The receiver has a phase detector that detects the unscrambled header bits, which are self DC balanced (either 01 or 10). The phase detector performs an XOR of these adjacent bits, which is always 1 if the receiver is in phase. The phase detector has a circular ring counter that counts from 0 to 66 and back. In addition, there are two other counters, called the match counter and the word counter, that count the number of matches and the number of 66-bit words respectively. After 100 cycles, the states of the counters are compared. In case of a wrong phase, the match counter value will be less than that of the word counter. The receiver then phase-shifts itself by delaying the phase counter and resets the other counters. When the receiver is in perfect sync, the match counter and the word counter increment equally. After 100 such increments, a lock is established and the data from the shift register is loaded into a parallel register, which holds the decoded 64-bit output.
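The scrambling and header check described above can be illustrated with a short behavioral sketch. It is not the ATLASpix3 receiver RTL: the function names, the bit ordering and the 58-bit state convention are assumptions made for illustration. It only shows that a self-synchronous scrambler built on $1+x^{39}+x^{58}$ is undone by a descrambler using the same polynomial, and that a valid 2-bit header always has an XOR of 1.

```python
# Behavioral sketch (hypothetical helper names) of a self-synchronous
# scrambler/descrambler for the polynomial 1 + x^39 + x^58.
MASK = (1 << 58) - 1  # 58-bit shift register


def scramble(bits, state=0):
    """Scramble payload bits; `state` holds the last 58 scrambled bits (bit 0 = newest)."""
    out = []
    for b in bits:
        # s[n] = b[n] xor s[n-39] xor s[n-58]
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK
    return out, state


def descramble(bits, state=0):
    """Descramble received bits; `state` holds the last 58 received bits (bit 0 = newest)."""
    out = []
    for s in bits:
        # b[n] = s[n] xor s[n-39] xor s[n-58]
        b = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(b)
        state = ((state << 1) | s) & MASK
    return out, state


def is_valid_header(header):
    """A sync header is valid when its two bits differ (01 or 10)."""
    return (header[0] ^ header[1]) == 1


# Arbitrary deterministic test payload (illustrative only).
payload = [(i * 7 + 3) % 5 % 2 for i in range(256)]
seed = 0x155555555555555 & MASK  # arbitrary transmitter start state
tx_bits, _ = scramble(payload, state=seed)
rx_same_seed, _ = descramble(tx_bits, state=seed)
rx_zero_seed, _ = descramble(tx_bits, state=0)
```

Because the descrambler state is filled with received bits only, a receiver that starts from an unknown state produces correct output once 58 bits have been shifted in, which is why no seed has to be exchanged between transmitter and receiver in this scheme.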
### 6.3.3 Simulations using Readout Modeling Environment (ROME)

Digital simulators are not suitable for optimizing the readout architecture, for example the number of buffers needed. For this purpose, the ReadOut Modelling Environment (ROME) [109] has been developed by Rudolf Schimassek at KIT-ADL to assess and optimize the overall architecture. Its simulation framework was used to optimize the readout architecture of ATLASpix3 during the design process to meet the requirements of ATLAS ITk layer 4.

The ROME simulations shown in figure 6.6 have been used to optimize the ATLASpix3 readout system. At first, the readout structure was mapped to ROME syntax (XML). The readout architecture was then tested using ATLAS ITk physics simulation data in a Monte-Carlo process. The ATLASpix3 design was then optimized and the ROME simulations were repeated. The respective fill states of the buffers were tracked to find the efficiency bottlenecks in the scheduler state machine and FIFO designs. In order to keep the trigger FIFO from losing data, the data word size was reduced from 64 bits to 32 bits. To enable this, the level-1 time stamp was included in the control word of ATLASpix3. With regard to the Aurora FIFO fill state, the readout scheduler speed was doubled, as the maximum fill state was far below the feasible size. 32-bit data words are beneficial compared to 64-bit words because two data words can be processed per Aurora encoder cycle, doubling the output rate. In combination with other state machine optimizations, the readout efficiency for high occupancies could be increased well beyond the planned trigger rate of 2 MHz. For the unoptimized system with 64-bit data words, the particle detection efficiency<sup>†</sup> drops after a trigger rate of 2 MHz, whereas it stays high for the optimized system with 32-bit data words (figure 6.6c).

<sup>†</sup>The particle detection efficiency denotes the fraction of charge clusters of which at least one pixel hit was read out.
It is chosen to generate a measure comparable to efficiencies from beam test measurements.

**Figure 6.6:** FIFO depth and readout efficiency simulations using ROME: For a 3 MHz trigger rate and 32-bit data words, the trigger FIFO and Aurora FIFO fill states are plotted over the simulated time (figure 6.6a and figure 6.6b). The comparison of the readout efficiency of 64-bit data words (unoptimized system) with 32-bit data words (optimized system) is shown in figure 6.6c. The readout efficiency stays high for 32-bit data words at trigger rates beyond 2 MHz.

### 6.4 Physical design

RTL synthesis was carried out using Cadence Genus Compiler. For designs that include multiple clock domains, it is difficult to ensure that the clock signals are generated in phase. Manually specified clock constraints were useful in ensuring the correct generation of clock signals. All synthesizable RTL code was written in Verilog and SystemVerilog. Among the most useful features that SystemVerilog added to the Verilog standard were the ability to use multi-dimensional arrays as module ports and a more visible representation of states during FSM coding. All complex Verilog descriptions were properly handled by the synthesis tool, and functional verification ensured that the post-synthesis netlist was correct. Low power techniques such as clock gating were not used in ATLASpix designs. This could be an area of improvement for dynamic power reduction.

Place and route of the RCU was done using Cadence Innovus. The automated place and route flow was implemented using Tcl scripts. The wrapper script served as a push-button for the entire back-end flow after synthesis. The height and width of the RCU block are the biggest constraints, since it is the last block to be integrated into the top-level layout. The power consumption of the ATLASpix RCUs falls in the range of a few milliwatts, with the data link working at 1.28 Gbps and a timing skew as low as 100 ps.
The post-routed netlist was simulated and verified using the full chip Verilog model and the same test bench that was used in the RTL simulation.

## 6.5 Summary

This chapter discusses the design verification methodology used for the ATLASpix pixel sensor chips. Full-chip digital simulations were instrumental in verifying RCU functionality and timing before the integration stage. Since the ATLASpix designs follow an analog-on-top approach, it is crucial for the digital blocks to be free of timing violations after integration. Table 7.1 shows that the gate count and, consequently, the power consumption of the readout control unit have increased over the three generations of ATLASpix designs. This trend will continue as the design complexity grows with added features. It may be advisable to take advantage of the low power and high performance offered by advanced process nodes.

Physical design is a crucial step, since the performance of the RCU determines the output data rate. Timing closure was done for the best, typical and worst case process-voltage-temperature (PVT) corners. ATLASpix designs favor high performance over low power. Achieving the required output data rate of 1.28 Gbps was challenging for a 180 nm process technology.

## Chapter 7

# Summary of ATLASpix designs

The HVCMOS sensor developments described in the last five chapters are compared with the current state-of-the-art. The digital front end and readout architecture are the main focus of this work. Let us compare the readout architectures of ATLASpix1 (chapter 2), ATLASpix2 (chapter 4) and ATLASpix3 (chapter 5).

The readout architecture of pixel detectors can be broadly classified into hit driven and trigger driven, and into synchronous and asynchronous, based on output bandwidth and clock propagation into the matrix.
For example, triggerless column drain and shift register based readout belong to the hit driven category, since all the detected hits will eventually make their way to the serial data link. PPtB and triggered column drain readout are trigger driven, which means that the first stage of filtering irrelevant hits happens in the readout buffers. The readout architecture is referred to as synchronous when clock signals or time stamps are propagated to the pixel matrix. All ATLASpix chips follow an asynchronous readout architecture for two reasons: 1) Asynchronous readout architecture avoids cross-talk between the digital lines and the pixel matrix, which is analog. 2) Dynamic power is saved due to fewer toggling lines, since no clock signals are propagated to the matrix.

ATLASpix1\_Simple uses triggerless column drain readout, which has been tested with 99.7% efficiency in test beam. ATLASpix1\_M2 and ATLASpix2 use a triggered readout scheme called parallel pixel to buffer readout (PPtB). The PPtB readout scheme is comparable to the readout scheme used in FEI4. However, their implementation differs. In ATLASpix1\_M2, a novel content addressable memory named CAB buffer is used to store and filter the triggered hits. The content addressable memory (CAM) allows the storage and comparison of time stamps. It is scalable based on the number of pixels in a group. ATLASpix3 has a triggered CD readout, which is similar to the FEI3 readout IC.

The hit rate that can be handled is limited by the amount of memory that fits into a buffer column. All the hits must be stored for the duration of the trigger latency. Hence, the memory needed is directly proportional to the pixel $\text{hit rate} \times \text{trigger latency}$. The longer the trigger latency, the smaller the hit rate that can be handled by a limited buffer size. The particle hits in a pixel detector are correlated among neighboring pixels due to clustered hits.
The correlation between pixels implies that the hit rate for a super pixel is less than that of a single pixel times the number of pixels per region. The amount of charge sharing between neighboring pixels depends on sensor and detector geometry. Smart grouping of pixels into regions with shared buffer storage significantly reduces the memory requirement. Analog information, on the other hand, must be stored for every hit pixel regardless of how the pixels are grouped. ATLASpix1\_M2 stores no analog information, since it was area expensive (due to the small pixel size, $60 \, \mu \mathrm{m} \times 50 \, \mu \mathrm{m}$) to implement the additional set of RAM cells in the buffer to store ToT. The CAB buffer for ATLASpix2 was re-designed to include additional SRAM cells to store ToT information. In case of cluster hits between the pixels in a group, hit neighbor logic ensures that the time stamp corresponding to the pixel hit that has the larger charge share is recorded. It helps to mitigate the time-walk effect.

Hybrid pixel ROICs like RD53A have grouped four pixels into a region with a shared buffer of depth 4. The third generation ROIC has taken one more step forward to using a digital core, which is a collection of such regions, stepped and repeated. A large core makes it possible to take full advantage of digital synthesis tools to implement complex functionality in the pixel matrix and to share resources among many pixels. Please note that the hybrid ROIC RD53A is implemented in 65 nm with a higher logic density than the ATLASpix ICs. ATLASpix chips, being fully monolithic in 180 nm, do not find it compelling to move to a digital integration domain, although it would save the manual routing effort.

The readout control unit (RCU) acts as the CPU of ATLASpix ASICs. It schedules the entire readout operation from the pixels to the chip periphery. It synchronizes the serial data transfer at 1.28 Gbps. It also generates the time stamps, which are used to record the time of occurrence of hits.
RCU design has evolved throughout the ATLASpix generations. The first generation ATLASpix had a simple triggered readout, where the hits were read out in an unsorted manner. It required further processing to figure out the event correlation of the data. The second generation had a sorted readout combined with partial sorting, where the 3 LSBs of the trigger time stamps are masked. Masking causes eight consecutive events to be processed as one. It also introduced a trigger FIFO to store the trigger time stamp. Implementing ToT storage came with an added toll on the ATLASpix2 output bandwidth. ATLASpix2 finds its trade-off in transferring analog hit information by means of a configurable data transfer scheme. ATLASpix2 can be configured to send either fully sorted hit data or a partially sorted hit with 4-bit ToT and the 3 LSBs of the time stamp. The MSBs of the time stamp can be recovered from the level-1 TS (10 bit), sent along with the end of event word.

| Chip | Gate count | Power | Area | Serial link |
|--------------|------------|----------|-----------------------|-------------|
| ATLASpix1_M2 | 4,271 | 8.46 mW | $0.04~\mathrm{mm}^2$ | 1.6 Gbps |
| ATLASpix2 | 8,358 | 13.72 mW | $0.078~\mathrm{mm}^2$ | 1.28 Gbps |
| ATLASpix3 | 37,976 | 22.9 mW | $0.32~\mathrm{mm}^2$ | 1.28 Gbps |

Table 7.1: RCU Power Performance Area (PPA) for ATLASpix designs

ATLASpix3 has a more complex RCU that includes an on-chip command decoder that receives a command bitstream at 160 Mbps. The clock and trigger signals are recovered from the command stream, which also contains readback and configuration commands.

Various data encoding schemes were used in the ATLASpix RCUs. A novel pipelined 8b/10b encoder was used in ATLASpix1. ATLASpix2 had an 8b/10b Aurora encoder, which sends a special sequence that marks consecutive hits, along with idle words. In order to facilitate higher bandwidth and send more hit data in a single frame, ATLASpix3 uses an Aurora 64b/66b encoder.
It also ensures compatibility with the readout ICs used in the ATLAS ITk innermost layers. The data receivers designed for these encoders during the RTL verification stage were re-used in the test firmware.

A full chip Verilog model was used to emulate the behavior of the pixel matrix (including buffer block and EoC) as a single functional unit to verify the RCU design. Additionally, a readout modeling environment was used to optimize the readout architecture of ATLASpix2 and ATLASpix3 to meet the requirements of the ATLAS ITk outer layers. The RCU has multiple clock domains, which require careful synthesis techniques to achieve timing closure. RTL synthesis and physical design methodology have been optimized as a part of this work. Table 7.1 shows a nearly nine-fold increase in the gate count of the RCU from ATLASpix1\_M2 (4,271 gates) to ATLASpix3 (37,976 gates). It is in alignment with the added features and hit data per frame.

Out of the three ATLASpix sensor chips, ATLASpix1\_M2 was characterized as part of this work. ATLASpix1\_Simple was characterized at KIT as part of another PhD work. The HVCMOS collaboration took responsibility for testing ATLASpix2, which was fabricated in the 180 nm AMS/TSI process. It was crucial to prove the correlation of MOSFET I-V characteristics between the AMS and TSI processes in order to continue HVCMOS developments in the TSI process. [120] shows a nearly 100% correlation between the MOSFET I-V characteristics of the AMS and TSI processes.

ATLASpix1\_M2 was the first large area prototype with triggered readout. Measurement results (Chapter 3) show that the triggered readout works as expected. Threshold tuning resulted in a 4× improvement in threshold dispersion before irradiation and a 2× improvement after X-ray irradiation. The serial data link of ATLASpix1 works at the required rate of 1.28 Gbps. Table 7.2 summarizes the evolution of the ATLASpix sensor series.
Table 7.2: Summary of ATLASpix designs

| Feature | ATLASpix1_Simple | ATLASpix1_M2 | ATLASpix2 | ATLASpix3 |
|---------|------------------|--------------|-----------|-----------|
| Tape-in | January 2017 | January 2017 | September 2017/2018 | March 2019 |
| Process node | AMS 180 nm | AMS 180 nm | AMS/TSI 180 nm | TSI 180 nm |
| Die area | $1.8 \times 0.34~\mathrm{cm}^2$ | $1.96 \times 0.36~\mathrm{cm}^2$ | $4.2 \times 3.7~\mathrm{mm}^2$ | $2 \times 2.1~\mathrm{cm}^2$ |
| Pixel size (x × y) | $130 \times 40$ | $60 \times 50$ | $128 \times 50$ | $150 \times 50$ |
| Matrix (col × row) | $2 \times (25 \times 400)$ | $56 \times 320$ | $24 \times 36$ | $132 \times 372$ |
| Pixel grouping | 1:1 pixel to buffer | 16:4 PPtB | 12:4 | 1:1 pixel to buffer |
| Readout scheme | triggerless CD | triggered CAB | triggered CAB | triggered CD |
| ToT storage | 6-bit ToT | No ToT | 4-bit ToT | 7-bit ToT |
| Sorting of hits | Unsorted | Unsorted | Chronological sorting with Trigger TS | Chronological sorting with Trigger ID |
| SEU tolerance | None | None | SEU tolerant pixel memory | SEU tolerant pixel and global memory |
| Data encoding | 8b/10b pipelined encoder | 8b/10b pipelined encoder | Aurora 8b/10b | Aurora 64b/66b |
| Slow control | Externally configurable shift register with readback | Externally configurable shift register with readback | CMD decoder as separate block | CMD decoder as separate block |

# **Chapter 8**

# Design of an 8-bit SAR ADC in a 28 nm bulk-CMOS process

### 8.1 Motivation

Analog to Digital Converters (ADCs) find applications in almost every system-on-chip (SoC). ADCs are an integral part of the mixed signal front-end in the digital signal processing chain.
The number of publications in the last decade indicates an increase in research interest in Successive Approximation (SAR) ADCs due to their power efficiency and technology scaling. This work investigates the design of an 8-bit 100 MS/s SAR ADC to identify circuit techniques that improve conversion speed while maintaining low power dissipation. A test chip named "TC1" was designed and fabricated in a 28 nm bulk-CMOS process to evaluate the performance of this ADC under normal conditions as well as at cryogenic temperatures. The specifications of the ADC are such that, if proven successful in silicon, it can be used in cryogenic applications such as quantum computing.

## 8.2 28 nm high-k metal gate bulk-CMOS process

Silicon dioxide has been the primary gate insulator since MOS ICs were first developed. To achieve the drive currents required by advances in IC technology, gate dielectrics have become extremely thin. They are reaching a point where electron tunneling can cause a large increase in power consumption. To circumvent this problem, alternative gate dielectrics with high dielectric constants (also referred to as "high-k" dielectrics) are being explored. High-k Metal Gate (HKMG) is one of the most significant innovations in CMOS fabrication. It allowed transistor scaling to continue where poly-Si gate technology had stalled and threatened the continuation of Moore's Law. The HKMG solution is by far one of the most cost-effective solutions in terms of performance, power, die size and manufacturability.

This work aims to evaluate the radiation hardness and cryogenic performance of the TSMC 28 nm High Performance Computing (HPC) process node, which is optimized for next generation mobile devices. It enables smaller feature size, high performance and low power. The TSMC 28 nm HPC process offers four $V_t$ options (high, typical, low, and ultra low) for design flexibility. Low $V_t$ transistors are used in this work due to an expected increase in MOSFET threshold voltage at cryogenic temperatures.
Research groups at CERN have characterized the 28 nm bulk-CMOS process up to 1 GRad TID, recommending it as a suitable candidate for radiation-hard applications ([138], [35]). Pixel readout ICs can take advantage of the high switching frequency of sub-$\mu$m nodes in order to cope with the high data rates of the inner pixel layers. By using a high-dielectric-constant gate material, a much thicker dielectric can be used to obtain the equivalent capacitance of thinner gates. For thick high-k insulators, electron tunneling is reduced, and oxide-trap charge (explained in section 1.2.6) may predominantly contribute to radiation effects.

### 8.3 Current state-of-the-art ADCs in 28 nm

ADC performance metrics have been focused on resolution vs. power dissipation and/or conversion rate vs. power dissipation. We use mainly three types of figure of merit (FOM) to evaluate ADCs. The Walden FOM is relevant for low resolution designs, which sets a limit line of 5 fJ per conversion step according to [78]. The Schreier FOM is relevant for medium to high resolution designs that are limited by noise. Currently, SAR ADCs exhibit the best energy efficiency for medium-resolution applications. The power vs. conversion rate eventually translates into FOM vs. sampling frequency. Figure 8.1 shows the distribution of different ADC architectures in the 28 nm process. The data is collected from various publications in ISSCC conferences over the last few years.

$$FOM_W = \frac{P}{f_s \times 2^{ENOB}} \tag{8.1}$$

where $P$ is the total power consumption and $f_s$ is the sampling frequency. The effective number of bits (ENOB) is given by

$$ENOB = \frac{SNDR - 1.76}{6.02} \tag{8.2}$$

$$FOM_S = SNDR + 10\log_{10}\left(\frac{f_s/2}{P}\right) \tag{8.3}$$

Improvement in the conversion rate over the past years is owed to the process technology, which scaled up the switching frequency of transistors by a factor of 10.
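As a worked illustration of equations 8.1 to 8.3, the following minimal sketch evaluates both figures of merit. The function names and the operating point (1 mW, 100 MS/s, 48 dB SNDR) are hypothetical values chosen for illustration and are not measurement results of this work. SNDR is assumed to be given in dB, power in watts and sampling frequency in hertz.

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR in dB (equation 8.2)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, fs_hz, sndr_db):
    """Walden FOM in joules per conversion step (equation 8.1)."""
    return power_w / (fs_hz * 2 ** enob(sndr_db))


def schreier_fom(power_w, fs_hz, sndr_db):
    """Schreier FOM in dB, using a Nyquist bandwidth of fs/2 (equation 8.3)."""
    return sndr_db + 10 * math.log10((fs_hz / 2) / power_w)


# Hypothetical operating point, for illustration only.
example_power_w = 1e-3
example_fs_hz = 100e6
example_sndr_db = 48.0

fom_w = walden_fom(example_power_w, example_fs_hz, example_sndr_db)
fom_s = schreier_fom(example_power_w, example_fs_hz, example_sndr_db)
print(f"ENOB  = {enob(example_sndr_db):.2f} bit")
print(f"FOM_W = {fom_w * 1e15:.1f} fJ/conversion-step")
print(f"FOM_S = {fom_s:.1f} dB")
```

A lower Walden FOM and a higher Schreier FOM both indicate a more efficient converter, which is why the two metrics trend in opposite directions in figures 8.1 and 8.2.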
For circuits that are limited by matching, a prominent factor in sub-$\mu$m nodes, an 8× increase in power is observed for each added bit. A large fraction of modern ADC designs are limited by noise and not by process technology.

Figure 8.1: Walden FOM vs. sampling frequency [78]

Figure 8.2: Schreier FOM vs. sampling frequency: 28 nm designs filtered from [78]

SAR ADCs have gained dominance over the pipelined architecture in the 8-10 bit resolution domain and at sampling speeds up to tens of MHz. The digital dominance of a nanoscale technology favors SAR ADCs over other architectures. Time interleaved SAR ADCs achieve high sampling speeds and good energy efficiency. Figure 8.2 shows a plot of the Schreier figure-of-merit vs. conversion speed for ADCs in 28 nm. Recent designs have achieved conversion speeds up to tens of GS/s. At low to moderate conversion speeds ($\leq 100$ MS/s), they attain low power ($\leq 2$ mW). This can be attributed to the following design advancements in SAR ADCs.

1. Advanced CMOS processes have 8-10 metal layers, such as the one used in this work. The width and spacing of these metal layers can be accurately controlled for matching. This resulted in the realization of metal-oxide-metal (MoM) capacitors. A capacitive DAC can be designed using MoM capacitors\* as unit elements to scale down the total capacitance, improving both speed and power dissipation. This is a result of advanced photo-lithography techniques, in contrast to earlier design techniques (e.g. [45]) which used custom made inter-digitated metal-metal capacitors.
2. Research has been done on reducing the switching energy of the capacitive DAC. The average energy required for charging and discharging the SAR capacitor array determines the efficiency of the switching scheme. In [70], the average switching energy is reduced by 81% by eliminating MSB capacitance switching and obtaining a 50% reduction in the number of capacitors.
A conventional SAR switching scheme consumes five times more energy for a 'DOWN' transition as compared to the corresponding 'UP' transition, as shown in [137]. This leads to increased power consumption and dynamic settling errors in the references and, in turn, limits the speed of the converter. [44] uses a common-mode voltage halfway between the DAC references, achieving 87.5% power savings. A separate coarse ADC can be used to calculate the MSBs. The use of energy-efficient switching, together with a small unit DAC capacitance, significantly reduces the DAC switching energy, which in turn reduces the overall ADC power dissipation and improves its power efficiency.
3. ADCs with more than 6 bit precision running at several GHz are almost impossible to build as a single-channel ADC. It is desirable to use time-interleaving techniques to extend the speed of the SAR architecture so that it can compete with traditional multi-stage pipelined ADCs in the application space requiring a sampling frequency greater than 1 GHz and similar SNDR.
4. The use of asynchronous SAR logic has gained more popularity in recent years. For example, in [63], asynchronous clocking shortens the overall conversion time by removing waiting times during SAR operation. The rising external clock resets the SAR logic completely, independently of the state of the ADC. An asynchronous clock has been used in several designs such as [122], which achieves less than 1 mW power consumption in 28 nm.

<sup>\*</sup>Metal-oxide-metal (MoM) capacitor is an inter-digitated multi-finger capacitor formed by multiple metal layers (optionally connected by vias) in the vertical BEOL (back-end-of-line) stack separated by inter-metal dielectrics

5. Converting more than one bit per cycle is an effective way to increase the performance of an ADC. A number of designs with 2 bits per conversion cycle were presented ([136], [19], [129]) with higher sampling rates than 1-bit per cycle SAR ADCs.
The disadvantage of multi-bit per cycle SAR ADCs is the complex DAC structure needed to perform a large number of comparisons. In addition, this approach reduces the possibility of taking advantage of time-interleaving and asynchronous logic.
6. Technology scaling with lower supply voltages in digital CMOS processes favors ADC topologies with only a few truly analog elements. This makes SAR ADCs perfectly suitable, as they require no gain stages. [22] achieves gigahertz sampling speed from a single SAR ADC and reduces area by more than half by leveraging a charge-injection DAC. It employs a unique uni-directional transfer of charge as opposed to the bi-directional charge sharing in a conventional capacitor array DAC.

Considering a typical 10-bit 100 MS/s synchronous SAR ADC, if the sampling time, active comparator delay and SAR logic delay are subtracted from each period, the DAC settling time needs to be less than 0.4 ns per bit cycle. Such a short time interval may not be sufficient for a capacitive DAC to stabilize, because the interconnect line impedance in advanced process nodes slows down the charge transfer. This effect is predominant in the longest routing path of the capacitive DAC array. The reference voltage noise and crosstalk can also affect the settling time of the DAC. Many designs have adopted non-binary SAR, which can tolerate the DAC settling error at the cost of increased design complexity and hardware overhead. [71] reports a 10-bit 100 MS/s SAR ADC that uses binary scaled DAC networks for settling error compensation with a power consumption of just 1.13 mW. It uses error correction techniques for overcoming the DAC settling error. [31] utilizes a redundancy facilitated error-detection scheme and an analog correction scheme, achieving an ENOB of 10.4 bits and a state-of-the-art power efficiency of 5.5 fJ/conversion-step.

Besides DAC mismatch, noise tends to limit the performance of high-resolution SAR ADCs in modern CMOS with reduced supply voltage ($\leq 1$ V).
The power, speed and area of a SAR ADC are closely related to its DAC, whose total capacitance hinges upon either the (kT/C) noise or the matching requirement for linearity. To achieve the required SNR level while saving power, several techniques have been proposed. A two-stage pipelined SAR can relax the comparator noise by introducing a low-noise amplifier between the two stages. Several publications have followed this idea. A time-interleaved SAR-assisted pipelined ADC in 28 nm CMOS reported by [124] uses a fully dynamic residue amplifier with two cascaded integration steps. This combines excellent noise-filtering properties with high gain. Scaling up the device dimensions can improve matching, but it will deteriorate power efficiency and speed.

To summarize, design optimizations have been done on the capacitor switching strategy of the DAC, asynchronous control logic and low-power comparator design.

### 8.4 Possible applications of the ADC prototype

Medium-resolution Nyquist rate SAR ADCs are suitable for a wide range of applications in many different areas. This includes high-speed wireless communication systems, due to their low power consumption and excellent area efficiency. However, for applications that require high sampling rates, the performance of a single-channel SAR ADC is limited by the number of clock cycles required for conversion. To increase the conversion rate, SAR ADC designs have evolved to using mixed architectures such as pipelined-SAR and flash-SAR ADCs.

In this work, a 100 MS/s 8-bit single channel SAR ADC is prototyped to evaluate its suitability for cryogenic applications. The cryogenic CMOS control circuitry of a quantum computing system is identified as the application of interest for the proposed SAR ADC. A case study has been conducted at QuTech, TU Delft to understand the specifications and requirements of such a system.
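The statement that a single-channel SAR ADC is limited by the number of clock cycles per conversion can be made concrete with a minimal behavioral sketch. It models an ideal binary-search SAR conversion with a noiseless comparator and a perfectly settled DAC, which is an idealization and not the circuit implemented on TC1. The function names, the reference voltage and the sampling time passed to the timing helper are illustrative assumptions.

```python
def sar_convert(vin, vref=1.0, n_bits=8):
    """Ideal SAR conversion: one comparison per bit, MSB first.

    Assumes 0 <= vin < vref, an ideal comparator and a settled DAC.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)              # tentatively set this bit
        vdac = vref * trial / (1 << n_bits)    # ideal DAC output for the trial code
        if vin >= vdac:                        # comparator decision
            code = trial                       # keep the bit
    return code


def max_bit_cycle_time(fs_hz, n_bits, t_sample_s=0.0):
    """Upper bound on the time per bit cycle of a synchronous SAR ADC.

    t_sample_s is the (assumed) time reserved for input sampling.
    """
    return (1.0 / fs_hz - t_sample_s) / n_bits


# An 8-bit conversion needs eight sequential comparator decisions.
mid_scale_code = sar_convert(0.5)
# At 100 MS/s the whole conversion must fit into a 10 ns period.
budget_8bit = max_bit_cycle_time(100e6, 8)
print(mid_scale_code, budget_8bit)
```

Even before any time is reserved for sampling, an 8-bit converter at 100 MS/s has at most 1.25 ns per bit cycle for comparator decision, SAR logic and DAC settling, which is the budget that the asynchronous and switching-scheme techniques listed above try to relax.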
#### 8.4.1 Requirements and challenges of control system for Quantum computing

Quantum computers promise an exponential increase in computing power compared with classical CMOS-based systems. Proponents of quantum computing predict a whole new era of computing. Conversely, there are also skeptics who think it will never happen. Their main argument is that quantum computing will require control over an exponentially large number of quantum states, and this amount of control is technically challenging to achieve. Hence quantum computing control has opened up new horizons for physicists and engineers, especially for ASIC designers.

The main technical challenges in quantum computing fall into three categories: 1) qubit quality, 2) error correction and 3) qubit control. Qubits can be implemented in a number of ways, such as trapped ions, electron or hole spins in semiconductors, nitrogen-vacancy centers in diamond lattices etc. Poor qubit quality can deliver a result that is indistinguishable from noise. The first quantum error-correcting codes were discovered independently by [114] and [118]. In order to implement complex error correction algorithms, we must be able to control multiple qubits. The control circuitry must have a low latency, of the order of tens of nanoseconds. CMOS-based adaptive feedback control circuits are used to achieve this. Additionally, we need to address the fan-out, which hinders the scaling up of the number of qubits within a quantum chip. Multiple control wires are required for each qubit. It is difficult to build a million-qubit chip with millions of wires connecting to the circuit board, routed out of a cryostat. The semiconductor industry already recognized this problem in the 1960s through the work of E. F. Rent at IBM<sup>†</sup>. [37] reviews extensibility limits faced by different qubit implementations on the way towards a truly large-scale qubit system. CMOS technology offers high speed at low power and a compact form factor.
MOS transistors are known to operate at 4 K and below [128]. [51] demonstrates a commercial FPGA operating at 4 K and controlling a microwave signal switching matrix at 20 mK, which interfaces with a quantum dot device. As expected, the drain current increases at cryogenic temperatures due to an increase in carrier mobility in silicon. However, the increase in carrier mobility is mitigated by partial substrate freeze-out and other effects [20]. Although an enhancement of the BSIM4 model was attempted [4], no MOSFET models exist for nanometer CMOS at deep-cryogenic temperatures. [55] proposes a SPICE MOSFET model based on MOS11/PSP ([64]) that is valid at 4 K and 100 mK in addition to the common operating range ($-55\,^{\circ}\mathrm{C}$ to $125\,^{\circ}\mathrm{C}$). The model is validated with measurement results for 160 nm and 40 nm MOSFETs.

MOS transistors can be used to implement quantum control circuits such as multiplexers, LNAs, DACs, ADCs etc. Figure 8.3 shows a single-channel quantum computing interface. The readout interface consists of a current-to-voltage converter, a low-pass filter, and data converters. A digital-to-analog converter (DAC) and an analog-to-digital converter (ADC) are used for qubit control and readout. Demultiplexers and multiplexers can be used to minimize the number of connections between the higher-temperature stages and the cryostat, making the system compact and scalable. The ADC can be implemented in an ASIC or an FPGA. The currently proposed solution is to place the electronics at 4 K with connections to 20 mK [20].

<sup>†</sup>Rent's rule predicts that the number of terminals required by a group of gates for communication with the rest of the circuit is a consequence of statistically homogeneous circuit topology and gate placement [65].

**Figure 8.3:** A single-channel representation of a classical interface to a quantum computer.
It includes temperature ranges of operation for each section, along with optical and wireless/wired interfaces to and from the quantum processor. The digital controller (ASIC and/or an FPGA) is used for local error correction and readout of the state of the qubit, but also to drive algorithm execution. (De)multiplexers are used to reduce the number of interconnections to and from the cold areas of the circuit, so as to ensure a large number of channels while minimizing thermal flux; these components operate in TDMA, FDMA and SDMA mode. Optical sensing and light guides may be implemented in CMOS and CMOS-compatible substrates. [20]

#### 8.4.2 Current state-of-the-art ADCs in QC system

Although qubits can be implemented in different ways, this case study considers superconducting qubits. The readout of qubits requires fast ADCs with medium to high resolution. [105] describes a digital feedback control system for superconducting qubits. The core component of this digital controller is called "ADwin-Gold". It is a processor that has a set of analog inputs with configurable outputs that can be analog or digital. It reads out the signal with a trigger delay. ADwin-Gold marked the first generation of digital controllers for quantum computing. Researchers at ETH Zurich have proposed FPGA-based controllers for digital feedback [119]. The second generation of digital feedback controllers used a complex programmable logic device (CPLD) to process the signal sampled by an 8-bit ADC. Compared with ADwin, this offered a faster response time, shorter than the decay time. FPGAs, however, offer design flexibility and scalability over CPLDs, making them the instrument of choice. [50] proposes a cryogenic ADC for qubit control implemented in an FPGA. The conversion rate of ADCs used in quantum computing systems has scaled up over recent years. The first-generation digital controllers that employ ADwin used a 14-bit ADC at 2 MS/s.
[106] used an 8-bit 20 MS/s ADC during the second generation of digital controllers. It was succeeded by commercial ADCs operating at a 100 MHz sampling rate with an ENOB of 7 bits or more, such as the ADC08200 from Texas Instruments. [17] uses an on-board ADC to digitize the input signal at 500 MS/s and send it to an FPGA for processing. [34] marked the use of GS/s ADCs in a quantum computing digital feedback and control system with multiplexed channels. The above-mentioned publications show that the ADCs used in quantum computing are trending toward higher sampling rates in the medium resolution range (8 bits or more).

Current ADC designs face the challenge of sustaining these high sampling rates at cryogenic temperature. [50] proposes a time-based ADC architecture with phase interpolation techniques to reach a sampling rate higher than the clock frequency, achieving a conversion rate of 1.2 GS/s down to 15 K (liquid helium). The advantage of this FPGA-implemented prototype is its reconfigurability and scalability, as it uses low-voltage differential signaling (LVDS) buffers as comparators and fast time-to-digital converters to perform the conversion. Cryogenic ADCs reported to date are limited to sampling rates of a few hundred kHz and resolutions of up to 11 bits. Researchers at IMEC presented a cryogenic ADC implemented in a standard 0.35 micron CMOS process, which is functional down to 4.4 K with an ENOB of 8.53 bits [80]. The SNDR of the ADC is observed to decrease with decreasing temperature. The charge injection of the switches increases with the increasing mobility at low temperatures. In addition, the deterioration of the device matching enhances the effect of charge injection on the comparator latch offset.

The above literature study implies that an 8-bit 100 MS/s ADC is suitable for quantum computing control circuitry. An important aspect is the power consumption.
The figure of merit (FOM) of the state-of-the-art ADCs in 28 nm from the ADC survey [78] is 68.32 fJ/conversion-step. This results in a power budget of 17 mW for GS/s ADCs of medium resolution and $\approx 5$ mW for 100 MS/s ADCs of medium resolution. Table 8.1 lists the specifications of the ADC described in this work.

| Feature         | Specification |
|-----------------|---------------|
| Technology      | 28 nm         |
| Resolution      | 8 bits        |
| Conversion rate | 100 MS/s      |
| Bandwidth       | 500 MHz       |
| Dynamic range   | 48 dB         |
| Power budget    | 5 mW          |

Table 8.1: ADC specifications

### 8.5 The design of an 8-bit 100 MSa/s SAR ADC

SAR ADCs convert an analog input to its digital equivalent by a series of successive approximation steps, usually using a binary search algorithm. The DAC output voltage is compared to the input signal and the result of this comparison is fed to the control logic. The control logic performs a feedback addition/subtraction that brings the DAC voltage closer to the analog input with every comparison. The resolution of SAR ADCs depends on the number of successive approximation cycles. Their linearity is primarily limited by the sensitivity of the comparator and the linearity of the D/A converter. The proposed SAR ADC shown in figure 8.4 consists of the following blocks:

1. A bootstrapped sample and hold circuit to sample the input. This block has been simulated with rail-to-rail input signals of up to 1 GHz.
2. A DAC based on a binary-scaled capacitive array, which can charge positively and negatively depending on the output of the comparator.
3. A latch-based comparator that compares the voltage on the sampling capacitor to a pre-defined threshold, which is $0.5 V_{DD}$.
4. A digitally synthesized block that implements the SAR algorithm, generates the clock for the comparator, schedules the sampling operation, and encodes and serializes the data output.

The start of conversion signal and clock are supplied externally.
The external clock can be up to 1 GHz. The data output is encoded using the 8b/10b protocol described in section 2.3.3. The proposed architecture is verified using a full-chip Verilog simulation that accounts for the delays of the analog blocks. Ten clock cycles are required for each conversion cycle, as shown in figure 8.5. The DAC described in section 8.5.3 saves one clock cycle during the conversion process by first comparing the input signal before counting up or down. The upcoming sections describe the design details of each block.

Figure 8.4: SAR ADC block-level representation

### 8.5.1 The Sample and hold circuit

Transistors have been used as switches since the 1950s. The simplest way of sampling an analog signal is to use a MOSFET as a switch, as shown in figure 8.6. In this topology, the ON resistance of the switch depends on the input signal.

$$R_{ON} = \frac{1}{\mu_n C_{ox} \frac{W}{L} (V_{DD} - V_{in} - V_{th})} \tag{8.4}$$

It is necessary to maintain a relatively constant ON resistance to minimize distortion. In addition, when the MOSFET turns OFF, the charge stored in the inversion layer gets injected into the load capacitance, adding error to the sampled signal. The injected charge depends on the gate-source voltage, which is a function of the input signal.

$$Q_{ch} = WLC_{ox}(V_{GS} - V_{th}) \tag{8.5}$$

$$Q_{ch} = WLC_{ox}(V_{DD} - V_{in} - V_{th}) \tag{8.6}$$

Figure 8.5: TC1: SAR ADC timing diagram

Therefore it is necessary to keep the gate-source voltage of the sampling switch constant, independent of the input signal. The circuit technique that achieves this is known as "bootstrapping" and was patented in 1966 [107]. Bootstrapping suppresses the variation in $V_{GS}$, but the threshold voltage is still a function of the input due to the body effect (8.7).
$$\Delta V_{th} = \frac{\sqrt{2\epsilon_s q N_a}}{C_{ox}} \left(\sqrt{2\Phi_f + V_{SB}} - \sqrt{2\Phi_f}\right) \tag{8.7}$$

A passive level-shift arrangement using a pre-charged capacitor, as shown in figure 8.7, consumes no static power [79]. A number of implementations, such as [29, 3, 30], followed this topology, which is widely used in modern integrated circuits.

Figure 8.6: A simple sampling switch using NMOS transistor

Figure 8.7: Bootstrapping concept

During the sampling phase, the bootstrap capacitor ($C_{BS}$) keeps the sampling switch turned ON, and during the hold phase it is recharged to $V_{DD}$. Therefore, the simplest implementation of a bootstrapped sample and hold circuit requires at least five switches, as shown in figure 8.8. During the hold phase, $S_1$ and $S_2$ charge $C_{BS}$ to $V_{DD}$ while $S_5$ turns OFF the NMOS sampling switch. $S_3$ and $S_4$ disconnect $C_{BS}$ while on hold.

Figure 8.8: Bootstrapping concept using switches

The switches $S_2$ and $S_3$ are realized using PMOS, since they are connected to the higher potential. $S_1$ and $S_5$ are realized using NMOS, as they are tied to ground. During sampling mode, $S_4$ senses the input signal. We assume that it is of the same type as the sampling switch. During the hold phase ($\overline{CK}$), $S_1$ and $S_2$ are ON in order to precharge $C_{BS}$. $S_5$ must be ON to ensure that the sampling switch is in the OFF state. The connection of the gates of $S_2$ and $S_4$ needs special consideration. The top plate of $C_{BS}$ may rise above $V_{DD}$ due to the pre-charge voltage and the input swing. In such a condition, $S_2$ cannot be turned off by connecting its gate to CK. Therefore we must bootstrap the gate of $S_2$ to $V_{in}$. If the gate of $S_4$ is connected to CK, it exhibits a large $R_{ON}$ for higher values of $V_{in}$ during sampling mode. Therefore, we also bootstrap the gate of $S_4$. The resulting circuit is shown in figure 8.9.
Figure 8.9: Bootstrapped sampling circuit

For reliability reasons, we need to add a cascode device $SR_5$ to limit the $V_{DS}$ and $V_{GD}$ of $S_5$, since its drain may rise above $V_{DD}$ (figure 8.10).

Figure 8.10: Bootstrapped sampling circuit with reliability

Figure 8.11: The complete bootstrapped sample and hold circuit based on [3]

The circuit in figure 8.10 has a limitation: $S_3$ needs to remain in the ON state as $V_{in}$ approaches $V_{DD}$. To facilitate this, we add three additional switches, resulting in the topology shown in figure 8.11. During the sampling phase (CK), the voltage at node X rises to $V_{in} + V_{DD}$. $S_{31}$ turns ON $S_3$ during the initial phase of sampling and $S_{32}$ ensures that $S_3$ remains in the ON state as $V_{in}$ approaches $V_{DD}$. $S_{33}$ ensures that $S_3$ is turned OFF during the hold phase. The bulk of $S_2$ and $S_3$ must be connected to node P instead of $V_{DD}$.

**Figure 8.12:** The complete bootstrapped sample and hold circuit implementation in 28nm

### 8.5.2 Comparator design

Comparators play an important role in ADC design. In most ADCs, the comparator limits the overall speed and resolution. The function of a comparator is to compare the instantaneous values of two analog signals and generate a digital output voltage based on the polarity of their difference. Latched comparators have become popular, since they consume zero static power, produce rail-to-rail outputs, and their input-referred offset arises primarily from one differential pair [102]. They are suitable for SAR ADCs since they respond to a strobe such as a clock edge and the output is stored until the next strobe. Dynamic comparators have become popular in recent publications ([75], [42]). In this work, a two-stage latched comparator with preamplifier based on [82] is used.
### Working of a regenerative latch

An ideal comparator output will be $V_{out} = V_{DD}$ if $V_{in} > V_{ref}$ and $V_{out} = 0$ if $V_{in} < V_{ref}$. The comparator performance can be evaluated based on its ability to resolve input voltages close to the threshold. One idea is to use cascaded amplifiers to maximize the gain. For example, the minimum resolvable input of three cascaded amplifier stages, each of gain "A", is

$$V_{in} = \frac{V_{DD}}{A^3} \tag{8.8}$$

With this approach, the only way to improve the resolution is to increase the gain. Another idea is to use a feedback element in such a way that when $V_{in} > 0$ we pump charge onto a capacitor, and when $V_{in} < 0$ we remove charge from the capacitor. In other words, we can make use of a Voltage Controlled Current Source (VCCS) as shown in figure 8.13. At $t\to\infty$, $V_c\to\infty$ if $v_i>0$ and $V_c\to-\infty$ if $v_i<0$. In practice, G is finite, hence $V_c$ will be limited between ground and supply voltage.

Figure 8.13: Voltage Controlled Current Source of gain "G" with positive feedback

The output current of the transconductor is given by

$$C\frac{\mathrm{d}v_c}{\mathrm{d}t} = Gv_c \tag{8.9}$$

where $v_c$ is the instantaneous voltage across the capacitor. Integrating both sides,

$$C\int_{V_i}^{V_c} \frac{\mathrm{d}v_c}{v_c} = G\int_0^T \mathrm{d}t \tag{8.10}$$

$$\ln\left(\frac{V_c}{V_i}\right) = \frac{GT}{C} \tag{8.11}$$

$$V_c = V_i e^{\frac{GT}{C}} \tag{8.12}$$

We do not want the latch to regenerate indefinitely. In practice, $V_c$ is finite and can reach up to $V_{DD}$ after a finite delay T, which in the ADC corresponds to a clock period. As seen from equation 8.12, $\frac{C}{G}$ has the dimension of time, and we call it the regenerative time constant. The major advantage of such a system over cascaded amplifiers is that the gain is a function of time. If we wait longer, we get a larger gain. It is hardware efficient and is therefore the de facto solution for ADCs.
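The time-dependent gain of equation 8.12 can be compared numerically with the fixed gain of a cascade (equation 8.8). The following is a minimal Python sketch and is not taken from the design itself: the regenerative time constant of 10 ps and the stage gain of 10 are assumed values chosen only for illustration, and the function names are hypothetical.

```python
import math


def cascade_gain(stage_gain: float, stages: int = 3) -> float:
    """Total gain of identical cascaded stages (denominator of equation 8.8)."""
    return stage_gain ** stages


def latch_gain(t: float, tau: float) -> float:
    """Regenerative gain V_c / V_i after time t, with tau = C / G (equation 8.12)."""
    return math.exp(t / tau)


def time_to_reach_gain(gain: float, tau: float) -> float:
    """Regeneration time the latch needs to reach a given gain."""
    return tau * math.log(gain)


if __name__ == "__main__":
    tau = 10e-12   # assumed regenerative time constant C/G (10 ps)
    a = 10.0       # assumed gain of each amplifier stage
    target = cascade_gain(a)
    t_needed = time_to_reach_gain(target, tau)
    print(f"cascade gain: {target:.0f}")
    print(f"latch reaches the same gain after {t_needed * 1e12:.1f} ps")
    print(f"latch gain after 200 ps: {latch_gain(200e-12, tau):.3e}")
```

Under these assumptions the latch matches the gain of the three-stage cascade after roughly 69 ps, and every additional time constant multiplies the gain by a further factor of e.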
The minimum resolvable input voltage is given by

$$V_{in} = \frac{V_{DD}}{e^{\frac{GT}{C}}} \tag{8.13}$$

If we want to resolve the difference between two differential input signals, we need a differential VCCS as shown in figure 8.14. In other words, we need a circuit that senses $V_{CM} + \Delta v$ and $V_{CM} - \Delta v$ and is capable of sourcing a current of $G\Delta v$ if $\Delta v > 0$ and sinking a current of $G\Delta v$ if $\Delta v < 0$. Such a circuit is called a regenerative latch (figure 8.15).

Figure 8.14: Differential Voltage Controlled Current Source

The simplest VCCS we can think of is a CMOS inverter properly biased at its trip point<sup>‡</sup>. Assuming the common-mode voltage ($V_{CM}$) is around the trip point, the inverter sinks a current equal to $G_m\Delta v$ when $V_{in} = V_{CM} + \Delta v$ and sources a current equal to $G_m\Delta v$ when $V_{in} = V_{CM} - \Delta v$, where $G_m$ is the total transconductance of the PMOS and NMOS transistors.

<sup>‡</sup>The trip point is a region in the transfer characteristic of a CMOS inverter where the gain is high. It can be found by connecting the input to the output, in which case the inverter settles to a voltage referred to as the trip point.

Figure 8.15: A regenerative latch using CMOS inverters

In such an arrangement, the node voltage that was higher tends to increase and the node voltage that was lower tends to decrease until the output voltages reach either $V_{DD}$ or ground, depending on which input was larger in magnitude. For regeneration to happen faster, we must reduce the time constant of the latch, $\frac{C}{G}$, as much as possible. Increasing the size of the transistors has an adverse effect, since it results in an increase in the parasitic capacitance. Hence, two cross-coupled inverters without load capacitance will give the best regenerative time constant.
The absence of load capacitance is generally not a problem as long as the $\frac{kT}{C}$ noise is less than the input voltage we want to resolve. In order to momentarily charge the parasitic capacitors $C_{p1}$<sup>§</sup> and $C_{p2}$<sup>¶</sup> with $V_{CM} + \Delta v$ and $V_{CM} - \Delta v$, we add two more switches that operate at a clock phase named $\phi_1$ (figure 8.16). When the input voltage is between $V_{DD}$ and GND, both PMOS and NMOS are turned ON, resulting in static power consumption. One solution is to tri-state the inverters during $\phi_1$, i.e. disable the inverters during $\phi_1$ and enable them during the regeneration phase. Hence we need a non-overlapping clock $\phi_{1B}$ as shown in figure 8.17. During $\phi_1$, the gate-source capacitances of the MOSFETs are reduced since the sources are floating. The input impedance is due to the parasitic capacitances at nodes X and Y and the drain resistance of the driving MOSFET. The node potentials at X and Y, $V_x$ and $V_y$, attempt to track the input, and the parasitic capacitances at nodes X and Y get charged. During $\phi_{1B}$, the inverters regenerate the difference between the node voltages at X and Y. The operation of a regenerative latch is shown in figure 8.18. If the time constant of the regenerative latch is comparable to the time period of the clock $\phi_1$, the discharge of the node voltages at X and Y will be incomplete due to the finite impedance of the switch. This effect is prominent, and likely to cause an error in the output, when the difference between the input signals is very small and of the opposite polarity to the previous comparison cycle.
The output of the comparator tends to hold on to the previous decision. Latch-based comparators suffer from this phenomenon, called hysteresis<sup>||</sup>. Hysteresis can cause a serious error in the ADC output code. A solution to this issue is to reset the latch outputs to $\frac{V_{DD}}{2}$ at the end of each regeneration phase and before the sampling phase. During the sampling phase, we must disable the inverters and charge the two parasitic capacitances to the differential input voltage. Towards the end of regeneration, we capture the digital data from the two output nodes X and Y. Once the digital output is recorded, we make sure that the output nodes are reset to $\frac{V_{DD}}{2}$ or GND in the interest of low static power dissipation.

<sup>§</sup>Assuming the inverters are identical, the input parasitic capacitance of a CMOS inverter, which is equal to the sum of the gate capacitances of the PMOS and NMOS.

<sup>¶</sup>Output parasitic capacitance of the CMOS inverter between the drain and the bulk of the MOSFETs.

Figure 8.16: Clocked regenerative latch using CMOS inverters

Figure 8.17: Tri-stated regenerative latch using CMOS inverters

A regenerative latch as shown in figure 8.17 has an offset due to device mismatch and due to mismatch of the capacitive load on either side of the latch circuit. The former is a static mismatch caused by variations in device parameters during fabrication, and the latter is a dynamic mismatch caused by the layout design. The effect of device mismatch is smaller for larger MOSFETs, since the random variations are averaged over a larger area. The use of large MOSFETs comes with a trade-off between performance, power and area. The dynamic mismatch due to the difference in the capacitive loads on either end of the latch circuit causes a big change in the input common-mode voltage of the latch.

<sup>||</sup>Derived from the Greek word "hysteresis", meaning "lagging behind".

Figure 8.18: Timing diagram of a regenerative latch showing hysteresis
These offset values are often large enough to cause INL/DNL errors in ADCs. Therefore we need to include offset correction. One way to mitigate offset error is to scale the input voltage by a constant A, thereby dividing the offset value by A (figure 8.19). In other words, we use a preamplifier of gain A. Offset due to device mismatch can be an issue in the preamplifier design as well. Assuming that $\sigma_{offset}$ due to device mismatch is Gaussian distributed, the input-referred offset of the preamplifier and latch system is given by

$$\sigma_{offset}^2 = \sigma_{preamp}^2 + \frac{\sigma_{latch}^2}{A^2} \tag{8.14}$$

Some design techniques to reduce the offsets of the latch and the preamplifier are discussed in [103]. Digital calibration is applied most commonly to comparator offsets [125], amplifier gains [126] or capacitor values [113], and in all cases involves some digital programmability for the above-mentioned non-idealities [127]. While the technology scaling of MOS transistors enables high-speed and low-power operation, the offset voltage of the comparator is increased due to transistor mismatch. Preamplifiers are conventionally used to reduce the offset voltage [101]. They can also be used to avoid kickback noise<sup>\*\*</sup>. The use of a preamplifier before a latch has the advantage of reducing the input offset voltage of the latch by the gain of the amplifier (figure 8.19). The simplest design of a preamplifier is an actively loaded differential MOSFET pair.

Figure 8.19: Use of a preamplifier to reduce input referred offset

#### Latched comparator: Implementation in 28 nm

A pseudo-differential dynamic comparator topology described in [75] is used in this work, as shown in figure 8.20. The dynamic implementation implies no static power consumption, which makes the total ADC power scale linearly with the sampling frequency. The comparator has two stages: the first stage is a preamplifier and the second stage is a regenerative latch.
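The offset suppression provided by the first stage can be estimated with equation 8.14. The sketch below is illustrative only: the preamplifier offset of 3 mV, the latch offset of 20 mV and the gain of 5 are assumed numbers, not measured or simulated values from this design, and the function name is hypothetical.

```python
import math


def input_referred_offset(sigma_preamp: float, sigma_latch: float, gain: float) -> float:
    """Input-referred offset (1 sigma) of a preamplifier followed by a latch.

    Implements equation 8.14: the latch offset is divided by the
    preamplifier gain and added in quadrature to the preamplifier offset.
    """
    return math.sqrt(sigma_preamp ** 2 + (sigma_latch / gain) ** 2)


if __name__ == "__main__":
    sigma_preamp = 3e-3   # assumed preamplifier offset, 3 mV
    sigma_latch = 20e-3   # assumed latch offset, 20 mV
    for gain in (1.0, 5.0, 20.0):   # assumed preamplifier gains
        sigma = input_referred_offset(sigma_preamp, sigma_latch, gain)
        print(f"A = {gain:4.0f}: input-referred offset = {sigma * 1e3:.2f} mV")
```

With these assumed numbers a gain of 5 reduces the combined offset to 5 mV. For larger gains the result approaches the preamplifier offset, which is why the input pair then dominates the mismatch.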
Minimum-sized transistors are used, targeting high-speed operation. The mismatch is dominated by the input transistor pair, whose sizes are scaled up by a factor of 2. The mismatch of the second stage is suppressed by the gain of the preamplifier. Load capacitance calibration is a commonly used technique to compensate the mismatch [42].

Let us analyze the circuit shown in figure 8.20. The first stage is a preamplifier that integrates the differential input signals over time. The second stage is a regenerative latch, as shown in figure 8.21. The transient simulation is shown in figure 8.22. When CLK is low, the transistors $M_5$ and $M_6$ are ON, whereas $M_3$ and $M_4$ are in cutoff. $V_{intP}$ and $V_{intN}$ rise to the supply voltage as the parasitic capacitances of these nodes get charged. During this time, the regenerative latch is turned off since $M_{15}$ and $M_{16}$ are off. When CLK goes high, $M_3$ and $M_4$ are turned ON while $M_5$ and $M_6$ are off. The voltages at the internal nodes ($V_{intP}$ and $V_{intN}$) start decreasing as the parasitic capacitors discharge. The difference in discharge currents between the two internal nodes is determined by the input signals, InP and InN. When the voltages at intP and intN drop sufficiently, the second stage is activated. It regenerates the voltage difference over time, delivering the rail-to-rail output ($V_{outP}$ and $V_{outN}$). The rise time of CLK is 1 ps in figure 8.22.

<sup>\*\*</sup>Large voltage variations in the internal nodes that are coupled to the input, disturbing the input voltage, are known as kickback noise. [36] describes some design techniques to reduce kickback noise in comparators used in ADCs.

Figure 8.20: Comparator implementation 28nm

When CLK is high and $M_3$ (or $M_4$) is in saturation, $V_{intP}$ (or $V_{intN}$) can be approximated as the drain voltage of $M_1$ (or $M_2$), which in turn depends on the drain current and the parasitic capacitance at the internal node (C).
$$V_{intP/N} = V_{DD} - \frac{I_{DS}}{C}t \tag{8.15}$$

The rate of change of the internal node voltage is given by

$$\frac{\mathrm{d}V_{intP/N}}{\mathrm{d}t} = -\frac{I_{DS}}{C} \tag{8.16}$$

Figure 8.21: Equivalent circuit of figure 8.20

Taking into account the channel length modulation, the drain current is given by

$$I_{DS} = \frac{1}{2}\mu C_{ox} \frac{W}{L} (V_{GS} - V_{th})^2 \left(1 + \lambda (V_{DS} - V_{DSsat})\right) \tag{8.17}$$

where $\lambda$ is the channel length modulation coefficient and $V_{DSsat}$ is the saturation drain-source voltage. The integration time (t) of the first stage can be deduced from equation 8.15 by substituting the average drain current $\bar{I}_{DS}$ while $V_{intP/N}$ decreases from $V_{DD}$ to $V_{DSsat}$.

$$t = \frac{\left(V_{DD} - V_{DSsat}\right)C}{\bar{I}_{DS}} \tag{8.18}$$

Simulations show that the comparator can resolve a $V_{th} \pm 100\,\mathrm{mV}$ input up to a clock frequency of 10 GHz. Although it is not obvious from the SAR architecture, input signals below the comparator noise level can limit the ADC resolution. Hence, the comparator noise plays an important role in the design of high-resolution ADCs. A pseudo-differential preamplifier may deliver high speed at the cost of increased noise when compared to the fully differential preamplifier used in the ATLASpix designs (figure 8.23).

Figure 8.22: Comparator: transient simulation

Figure 8.23: Fully differential preamplifier followed by a SR-latch ([86])

#### 8.5.3 D/A Converter

The linearity (INL and DNL) of a SAR ADC is determined by the linearity of the feedback DAC. The matching of the unit capacitors is the deciding factor. Since the binary-weighted capacitors can be positively or negatively connected to the sampling capacitor, any mismatch between the units or the parasitics may deteriorate the INL/DNL of the ADC. The parasitics on both sides of the array need to be equalized. Otherwise positive and negative charge addition may yield different results.
Symmetric MoM capacitors with closely spaced metal fingers are used. A bigger unit capacitance has less mismatch. Apart from matching, the sampling noise also determines the minimum input capacitance of an ADC, and hence that of the DAC array, as they are proportional. During each sampling phase, an integrated noise power of kT/C remains on the capacitor. However, matching requirements dictate the use of large unit capacitors, which often makes the kT/C noise negligible. Additionally, shielding has been used to decouple noise. Commonly used layout techniques such as common-centroid matching were not considered necessary for this test chip. The ADC performance and the effect of the DAC on its non-linearity will be investigated.

A successive approximation ADC works by using a digital-to-analog converter (DAC) and a comparator to perform a binary search for the input voltage. A sample and hold circuit is used to sample the analog input voltage and hold the sampled value while the binary search is performed. The binary search starts with the most significant bit (MSB) and works towards the least significant bit (LSB). For 8-bit output resolution, eight comparisons are needed, which takes at least eight clock cycles. The sample and hold circuit samples the analog input on a rising edge of the clock. The comparator output is a logic one if the sampled analog voltage is greater than the output of the DAC.

The DAC consists of binary-weighted capacitor arrays. When a bit is ON, a capacitive voltage divider is created by the corresponding capacitor and the total array capacitance, which adds a voltage to node A equal to its weight. When a weighted capacitor is switched OFF, the same voltage is subtracted from node A. The amount of voltage added or subtracted is given by equation 8.19, where $V_t - V_0$ represents the change in applied voltage and C is the value of the capacitor that switches.
$$\Delta V = \frac{(V_t - V_0) C}{\Sigma C} \tag{8.19}$$

The DAC array shown in figure 8.24 works as follows:

Step 1: The MSB capacitor is switched ON, causing the voltage at node A to settle at $V_{in} + 0.5V_{DD}$.

Step 2: $V_A$ is compared to $V_{DD}$.

Step 3: If $V_A > V_{DD}$, the comparator output will be zero, which means $V_{in} > 0.5V_{DD}$. The capacitor array counts up in binary-weighted steps, increasing the voltage at node "A". If $V_A < V_{DD}$, the MSB is set to 0 and the next bit is set to 1, decreasing $V_A$ by a factor of 2.

Step 2 and Step 3 are repeated over eight clock cycles until $V_A$ converges to $V_{DD}$.

Figure 8.24: Capacitor array DAC schematic

Figure 8.25: Capacitor array DAC schematic with count up and down

The above scheme has the disadvantage that it is difficult for the comparator to resolve the input as $V_A$ approaches $V_{DD}$. The variation of the comparator delay with input amplitude may add to the non-linearity. The proposed solution in figure 8.25 uses $0.5V_{DD}$ as the reference and can count both up and down. The DAC array works as follows:

Step 1: The sampled input signal is compared with $V_{ref}$, which is set to $0.5V_{DD}$.

Step 2: Based on the result of the comparison, the capacitor array counts up or down, thereby increasing or decreasing the voltage on the positive input of the comparator.

Step 1 and Step 2 are repeated over eight clock cycles until node "A" converges close to $0.5V_{DD}$.

Figure 8.26: DAC layout (single side) using 128 unit capacitors of 1.27 fF

The above scheme saves one clock cycle per conversion, owing to the fact that a comparison is performed immediately after the input sampling phase. A rail-to-rail input can be easily resolved by the comparator. The disadvantage of this scheme is the large amount of capacitance added to node "A" and hence to the hold capacitor. This is mitigated by using the smallest possible unit capacitance, at the cost of process-induced mismatch, and by relying on careful layout techniques to equalize the parasitics.
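The up/down search described in the two steps above can be sketched as a behavioral model. This is an idealized illustration, not the implemented circuit: it assumes that each DAC step is an exact binary fraction of $V_{DD}$ (starting at $V_{DD}/4$), ignores parasitics, noise and mismatch, and uses a supply of 0.9 V as the default. The function name `sar_convert` is hypothetical.

```python
def sar_convert(v_in: float, v_dd: float = 0.9, n_bits: int = 8) -> int:
    """Idealized up/down SAR search around a fixed reference of v_dd / 2.

    Assumption: DAC steps are ideal binary fractions of v_dd
    (v_dd/4, v_dd/8, ...); parasitics, noise and mismatch are ignored.
    """
    v_ref = 0.5 * v_dd
    v_a = v_in              # node A starts at the sampled input voltage
    step = v_dd / 4.0       # first DAC step, applied after the MSB decision
    code = 0
    for bit_index in range(n_bits):
        bit = 1 if v_a > v_ref else 0        # Step 1: compare with v_ref
        code = (code << 1) | bit
        if bit_index < n_bits - 1:           # no DAC step after the last bit
            v_a += -step if bit else step    # Step 2: count down or up
            step /= 2.0
    return code


if __name__ == "__main__":
    for v in (0.0, 0.225, 0.54, 0.9):
        print(f"v_in = {v:.3f} V -> code {sar_convert(v)}")
```

In this model eight comparisons are interleaved with only seven DAC steps, because the first comparison acts directly on the sampled input; this is where the saved clock cycle comes from. For example, `sar_convert(0.6, v_dd=1.0)` returns 153, matching an ideal 8-bit quantization of 0.6.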
Shielding has been used to prevent cross-talk between the sampled input and the DAC switching signals, as shown in figure 8.26. A small unit capacitance is useful to reduce the switching energy and settling time of the DAC, thereby improving the ADC performance.

Figure 8.27: Layout of analog blocks

#### 8.5.4 Readout circuitry and top-level simulation

The readout logic includes a finite state machine implementing the SAR logic, a data encoder and a serializer. The sampling, readout and output data transmission are scheduled by means of a controller state machine whose states are shown in figure 8.5. The data is serialized using a Parallel In Serial Out (PISO) register. The data output follows an 8b/10b encoding scheme. It takes 10 clock cycles from the start of conversion to the loading of the PISO register. Hence, a clock frequency of 1 GHz will result in 100 MSa/s for an 8-bit A/D converter. The RTL of the readout block is verified using a full-chip Verilog model that emulates the rest of the ADC blocks. A full chip mixed-mode simulation with an RC extracted netlist ensured timing closure (figure 8.28).

**Figure 8.28:** SAR ADC top level mixed-mode simulation using RC extracted netlist. Input is a ramp signal with amplitude ranging from 0 to 0.9 V. Threshold voltage is set to 0.45 V (equal to $\frac{V_{\rm DD}}{2}$). It can be seen in the simulation that the node Vdac settles at $\frac{V_{\rm DD}}{2}$. The clock frequency is 1 GHz, resulting in a sampling rate of 100 MS/s. The SAR logic is imported as a Verilog netlist. Only the analog blocks were extracted.

Figure 8.29: TC1: Micrograph showing ADC and SEU tolerant register arrays

#### 8.5.5 IO interface

Clock input and data output can be configured to use either CMOS or Low Voltage Differential Signaling (LVDS). High speed level shifters are designed for LVDS to CMOS conversion and vice versa. The IO interface is designed to handle data rates exceeding 100 Mbps using LVDS signaling.
Analog I/O cells were used for all input and output signals.

#### 8.5.6 Top-level integration

TC1 is a Multi-Project-Wafer (MPW) micro-block with a die area of $1 \times 1 \text{ mm}^2$. A "digital-on-top" integration methodology was piloted during the initial design phase. Since the back-end views of standard cells and I/O cells in 28 nm were not provided to academic customers, it was challenging to verify the digital-on-top methodology within the design cycle. Moreover, it was challenging to run LVS on the top level after digital integration. Hence, the "analog-on-top" methodology was chosen for design sign off to avoid mistakes during top-level routing. I/O LEF files were used for the full-chip integration. The dimensions of the ADC are $3.8 \times 340 \ \mu\text{m}^2$ (y × x), an aspect ratio intended to allow time interleaving in the future.

#### 8.6 Test setup

A test system was developed for the characterization of the ADC. The chip was bonded to a PCIe carrier as shown in figure 8.30b. The test setup consists of the PCIe carrier board, an adapter board and a Nexys Video board with a Xilinx Artix-7 FPGA (figure 8.30a). The adapter board serves as an interface between the carrier board and the Nexys Video board. Pull-up resistors were used for the LVDS output on the adapter board. The test software is adapted from ATLASpix3 with minimal modifications. It has a Qt interface that enables the chip configuration and displays the decoded output data. The data is received "MSB first". It is important to align the phase of the incoming data in order to receive the digital codes. For this purpose, the "comma\_enable" control was set to align the phase of the data receiver. Once the comma words were detected, the ADC was configured to send digital codes.
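The comma-based phase alignment can be sketched as follows. This is a software illustration only, not the FPGA firmware used in the test setup. It assumes that the comma word is the standard 8b/10b K28.5 symbol (the text above does not name the symbol), and that the received bits are available as a string of '0' and '1' characters. The helper names `find_comma_offset` and `split_symbols` are hypothetical.

```python
# K28.5 in both running disparities (assumed comma word, standard 8b/10b).
K28_5 = ("0011111010", "1100000101")


def find_comma_offset(bits):
    """Return the smallest bit offset at which a K28.5 symbol starts, or None."""
    for offset in range(len(bits) - 9):
        if bits[offset:offset + 10] in K28_5:
            return offset
    return None


def split_symbols(bits, offset):
    """Cut the stream into complete 10-bit symbols, starting at the aligned offset."""
    usable = bits[offset:]
    return [usable[i:i + 10] for i in range(0, len(usable) - 9, 10)]


if __name__ == "__main__":
    # Three stray bits, two commas, then one data symbol.
    stream = "101" + K28_5[0] + K28_5[1] + "0101010101"
    start = find_comma_offset(stream)
    print(start, split_symbols(stream, start))
```

Once the offset is known, every following group of ten bits is a complete symbol, which is why the ADC is switched to sending digital codes only after the commas have been detected.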
Figure 8.30: TC1 measurement setup. (a) Test setup. (b) ADC chip bonded to PCIe carrier.

Figure 8.31: Reconstructed sine wave using digital codes at 40 MHz sampling rate

#### 8.7 Measurement results

The output data was recorded for a set of input DC voltages. The ADC was able to digitize input voltages ranging from 50 mV to 850 mV. The measurements were continued using sinusoidal input signals of varying frequencies. The output codes were used to reconstruct the original sinusoidal input. Figure 8.31 shows a signal reconstructed from the digital codes corresponding to an input sine wave of frequency 1 MHz and 800 mV peak to peak. The full rail is 0-900 mV. The conversion rate was 40 MS/s. The total power consumption is 5.4 mW. This is much higher than the estimated power of 1.12 mW from post-layout simulations. In those simulations, the analog blocks consume very low power of about 61 $\mu$W, and the digital blocks consume 0.9 mW, constituting a major share of the power budget. The digital leakage power is estimated to be $24\,\mu$W. The ADC was tested successfully using input signals of frequencies ranging from 100 kHz to 4 MHz.

#### 8.8 Summary

A $1\times 1~\mathrm{mm}^2$ test chip has been designed and fabricated in the TSMC 28 nm high performance computing (HPC) process to assess its suitability for radiation hard or cryogenic applications. As a test circuit, an 8-bit 100 MSa/s ADC has been designed, targeting the requirements of the control circuitry of a quantum computing system. The design aspects of the ADC and its building blocks are explained in detail. The power consumption of the ADC prototype is measured to be 5.4 mW at a sampling rate of 40 MHz. The initial test results using a 1 MHz sinusoidal input signal at a 40 MHz conversion rate look promising.

### Chapter 9

### Conclusion

The main contributions of this dissertation are: 1) the design of readout electronics for high voltage CMOS (HVCMOS) sensors in 180 nm, and 2) the design of an 8-bit 100 MS/s SAR ADC in 28 nm.
The former is a high voltage CMOS process and the latter is a nanoscale bulk-CMOS process optimized for high performance computing.

ATLASpix is a series of three monolithic sensor chips, engineered to meet the requirements of ATLAS inner tracker layer 4. Monolithic sensors are systems-on-chip (SoC), with integrated sensors and readout. The use of a commercial CMOS process ensures cost-efficiency when compared to existing hybrid sensors. The HVCMOS sensors described in this dissertation are proposed for the ATLAS inner tracker phase II upgrade.

The first generation ASIC, namely ATLASpix1\_M2, contains a triggered readout scheme that can cope with high particle hit rates. It employs a smart pixel grouping technique called "Parallel Pixel to Buffer (PPtB)". A novel readout buffer topology is implemented based on content-addressable memory. The content addressable buffer (CAB) can store particle hit information until an on-chip latency has elapsed. The on-chip latency is programmable up to a maximum of 25 $\mu$s, which corresponds to 1000 bunch crossings. The buffer filters hit data based on an external trigger signal. Triggered readout has a relaxed output bandwidth requirement compared to the traditional column drain readout. ATLASpix1\_M2 has the smallest pixel size (50 $\mu$m × 60 $\mu$m) among the three design variants of ATLASpix1.

The second generation prototype, ATLASpix2, follows the triggered readout scheme introduced in ATLASpix1\_M2. It is a proof-of-concept design for the following novel features: 1) readout with sorting of hits according to the chronology of events; 2) data encoding using the standard 8b/10b Aurora protocol; 3) a content addressable buffer with storage of amplitude information; 4) hit neighbor logic for time walk correction.

The third generation HVCMOS sensor chip, namely ATLASpix3, is the first reticle size ($2 \times 2~{\rm cm}^2$) sensor chip that is suitable for the construction of HVCMOS quad modules.
To reduce the fanout, a command-based configuration and read back has been implemented. The clock and trigger signals are recovered from an incoming command bitstream at 160 Mbps. The data encoder uses a single channel 64b/66b Aurora protocol for high-bandwidth serial data transmission. The data word length was set to 32 bits to facilitate the transfer of two hit words per encoding cycle. Simulations show that it helps to maintain nearly 100% readout efficiency for trigger rates exceeding 2 MHz (figure 6.6c in section 6.3.3).

The following tasks were undertaken in this dissertation:

#### 1. Implementation of triggered readout:

A content addressable trigger buffer was designed for triggered readout. It includes 8T-SRAM cells and CAM cells (SRAM + comparator), a hit receiver, and triggered readout logic. These are full custom logic blocks using customized standard cells.

#### 2. Design of a readout control unit:

The readout control unit (RCU) can be described as the brain of the ATLASpix sensors. It coordinates the entire readout operation and the data transfer from the pixel matrix to the serial output link. It is responsible for command-based configuration and read back. The design blocks of the RCU include finite state machines, multi-clock domain logic, a data synchronizer, an encoder and a serializer. The serial data link works at 1.28 Gbps.

#### 3. Development of a full chip RTL front-end verification environment:

An RTL model that emulates the behavior of the ATLASpix chip was developed. It was used for the functional verification of synthesized digital blocks such as the readout control unit. The full chip digital simulation helps to verify the functionality of the system during its early design phase. Sign off simulations were done in mixed-mode.

#### 4. Post-silicon functional testing of ATLASpix1\_M2:

The proposed readout architecture in ATLASpix1\_M2 has been tested and found to be working. The serial data link works at the required rate of 1.28 Gbps.
A new threshold tuning algorithm was developed and integrated into the existing test system. The tuning process demonstrated a $4\times$ improvement in threshold dispersion. The mean threshold was $1055\,\mathrm{e^-}$ with a standard deviation of $35\,\mathrm{e^-}$. The mean value of the noise distribution over the entire pixel matrix after tuning was $78\,\mathrm{e^-}$.

#### 5. X-ray irradiation studies on ATLASpix1\_M2:

The readout blocks are fully functional after a total ionization dose (TID) of 100 Mrad, the estimated TID at ATLAS ITk layer 4 over 10 years of operation. ATLASpix1\_M2 was characterized for leakage current variations and signal-to-noise ratio degradation. The tune-DACs have been tested and found to be functional. A $2\times$ improvement in threshold dispersion was achieved after tuning. The mean threshold was $2096\,\mathrm{e^-}$ with a standard deviation of $95\,\mathrm{e^-}$. The mean value of the noise distribution over the entire pixel matrix after tuning was $82\,\mathrm{e^-}$. The total power consumption after 100 Mrad TID was measured to be $216.6\,\mathrm{mW}$, which is equivalent to $316.67\,\mathrm{mW/cm^2}$.

The ATLASpix sensors meet the requirements of ATLAS ITk layer 4. The high voltage CMOS sensors developed in this work are promising candidates for particle physics experiments.

The final project of this dissertation was aimed at cryogenic applications such as Quantum Computing (QC). An 8-bit 100 MS/s SAR ADC was designed, targeting the requirements of a QC control system. The ADC was designed using low threshold devices to account for the reduction of the overdrive voltage ($V_{DD} - V_{T}$) at cryogenic temperatures due to an increase in device threshold voltage. A novel rail-to-rail, low power capacitive DAC scheme was proposed. A test system was developed for the characterization of the 28 nm chip. The ADC is functional at a 40 MHz sampling rate with a 1 MHz sinusoidal input signal. The total power consumption was measured to be 5.4 mW.
Further characterization of the ADC test chip was ongoing at the time of writing this dissertation.

#### **Future Work**

Radiation hard digital design has taken a leap forward due to an increased interest in the radiation immunity of electronic circuits in recent years. ATLASpix uses radiation hard design approaches such as guard rings and enclosed transistors. However, digitally synthesized blocks still use linear transistors. One recommendation is to perform digital synthesis using customized standard cells with enclosed transistors [72]. This is an area-expensive solution, which is a concern since future detectors will demand an increase in the logic density of readout blocks. The second approach is to use standard libraries with additional PVT corners for radiation hardness. The additional PVT corners can be used for worst-case timing and power closure at the required radiation levels. Such libraries can be obtained from device characterization data. This will require test chips to include several standard cell test structures. The pre-characterized standard cell libraries can be re-used in subsequent generations of ASICs. Some research groups [72] have already taken steps in this direction for 65 nm space electronics.

Due to an ever-increasing demand for added functionality, an increase in the logic density of readout blocks is inevitable for future CMOS pixel sensor chips. Standard cells from the foundry can be customized for high density by reducing their heights and then used in the physical design flow. Some design communities at CERN follow this approach. In ATLASpix, we followed a similar approach, but in a full-custom manner. It can be recommended for ATLASpix designs to follow the digital-on-top integration methodology for readout buffer blocks by treating memory cells as black boxes. The advantage of this approach is the ease of top-level timing closure. The digital-on-top methodology can even be adapted for the whole chip.
Some effort has been made in this direction during this work. The idea was to generate LEF and liberty files using the Cadence Liberate tool and use them for top-level integration. However, it was challenging to implement and validate this methodology during the design cycle of the ATLASpix ASICs. Given the 180 nm feature size, full-custom design seems to be area efficient. However, a digital-on-top approach will be advantageous for readout simulations, which are currently run as mixed-mode simulations on a simplified pixel matrix. It also helps to automate the routing of a large area design like ATLASpix during its top-level integration.

Given the current trend in high speed ADC designs, there is considerable scope for time interleaving the 28 nm SAR ADC discussed in this dissertation. The ADC layout has been designed to facilitate a time-interleaved architecture in the future. The 8-bit, 100 MS/s SAR can serve as a single stage. A time-interleaved arrangement of 10 stages can deliver a conversion rate of 1 GS/s with an added power budget. Alternatively, the current state of the art shows an added advantage in combining a pipelined architecture with a SAR ADC as a sub-block. A SAR-assisted pipeline ADC architecture such as the one described in [121] is an energy-efficient hybrid architecture for moderately high-resolution analog-to-digital conversion. The SAR ADCs can work as high-resolution sub-ADCs in the pipelined stages.

Another scope for improvement lies in the use of asynchronous readout logic, combined with a timing block to generate the sample and latch clocks for the S/H and the comparator, respectively. Asynchronous logic has historically been used to shorten the SAR conversion time by removing the waiting period. A major challenge in nanoscale designs is device matching. Poor device matching can result in comparator offset, which can add to non-linearity in ADCs. Auto zeroing and load capacitor calibration can be employed to mitigate such effects.
The DAC design can be further improved by adopting a fully differential scheme.

#### **Concluding remarks**

This dissertation deals with the design of integrated circuits whose applications lie well outside of industry standards, such as particle physics and quantum computing. An 8-bit SAR ADC was designed to meet the requirements of a quantum computing control system. The high voltage CMOS (HVCMOS) sensor chip proposed in this work offers a cost-effective solution for the outer pixel layers of ATLAS ITk, compared to the current state-of-the-art hybrid sensors. Several novel readout topologies for pixel sensors were introduced in this work. Functional tests and irradiation studies were successfully conducted on the first large area HVCMOS sensor with triggered readout. The HVCMOS pixel sensor chips are proven to meet the layer 4 specifications of ATLAS ITk. The third generation chip designed as a part of this dissertation is used for the construction of an HVCMOS demonstrator quad-module for ATLAS ITk.

## Appendix A

# Depletion region depth in a High Voltage CMOS sensor

A High Voltage CMOS (HVCMOS) sensor is a reverse biased PN junction with the p-region as the substrate and the n-region as the deep nwell. It is important to note that the PN junction, in this case, has a different doping profile than a regular PN junction. This causes the depletion region to extend further into the P side (lightly doped) than into the N side (heavily doped). When an ionizing particle passes through an HVCMOS sensor, electron-hole pairs are created. The holes move toward the negative high voltage terminal, whereas the electrons are collected by the deep nwell. The signal from the sensor diode is in turn amplified and digitized. There are several factors that affect the sensor's performance. This section aims to show that the depth of the depletion region in an HVCMOS sensor depends on the resistivity of the p-substrate and the applied reverse bias voltage.
Electron-hole pairs are the information carriers in semiconductor detectors. The excitation of an electron to the conduction band creates a vacancy in the valence band. The probability of thermal generation of an electron-hole pair per unit time is given by

$$P(T) = CT^{3/2}e^{-E_g/2kT} \tag{A.1}$$

where T is the absolute temperature, $E_g$ is the band gap energy, k is the Boltzmann constant and C is a constant that is characteristic of the material. A pure semiconductor is an insulator at absolute zero. Thermal excitation in semiconductor materials depends on the temperature. As the temperature increases, electrons can be thermally excited from the valence band to the conduction band.

**Figure A.1:** HVCMOS operating principle: The traversing particle causes the generation of electron-hole pairs. The electrons drift toward the deep nwell under the influence of the applied electric field.

An ideal semiconductor detector must be charge neutral in the absence of radiation. Since cooling reduces the number of thermally generated electron-hole pairs in the crystal, most semiconductor detectors are cooled to liquid nitrogen temperature ($\approx 77~\rm K$). The fundamental interaction between a semiconductor detector and radiation is associated with the generation of electron-hole pairs. The number of electron-hole pairs generated is directly proportional to the average energy dissipated by the traversing charged particle. The energy required to generate an electron-hole pair is much larger than the band gap, as shown in table A.1. The energy gap is the smallest energy separation between the valence band and the conduction band. This does not imply that the generation of an electron-hole pair requires 1.1 eV in silicon; it requires more energy, since photon absorption happens through an indirect band gap [117]. A share of the energy of the incident photon is taken up by a phonon, which represents a crystal lattice vibration.
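The benefit of cooling can be made concrete by evaluating equation A.1 at two temperatures. The sketch below is illustrative only: the material constant C cancels in the ratio, and a single silicon band gap of 1.11 eV is assumed at both temperatures, which ignores the slight widening of the gap on cooling. The function names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_generation(temp_k, band_gap_ev, material_const=1.0):
    """Equation A.1: P(T) = C * T^(3/2) * exp(-Eg / (2 k T))."""
    exponent = -band_gap_ev / (2.0 * K_BOLTZMANN_EV * temp_k)
    return material_const * temp_k ** 1.5 * math.exp(exponent)


def relative_generation(t_cold_k, t_warm_k, band_gap_ev):
    """Ratio P(t_cold) / P(t_warm); the constant C cancels."""
    return (thermal_generation(t_cold_k, band_gap_ev)
            / thermal_generation(t_warm_k, band_gap_ev))


if __name__ == "__main__":
    # Assumed band gap for silicon: 1.11 eV, used for both temperatures.
    ratio = relative_generation(77.0, 300.0, 1.11)
    print(f"P(77 K) / P(300 K) = {ratio:.3e}")
```

Because the exponential term dominates, the ratio is many orders of magnitude below one, which is why cooling suppresses thermally generated pairs so effectively.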
**Table A.1:** Band gap vs. energy required for electron-hole pair creation

| | Band gap (300 K) | Band gap (0 K) | Electron-hole pair creation (300 K) | Electron-hole pair creation (0 K) |
|----|---------|---------|---------|---------|
| Si | 1.11 eV | 1.16 eV | 3.62 eV | 3.76 eV |
| Ge | 0.66 eV | 0.74 eV | 2.91 eV | 2.96 eV |

#### A.1 P-N junction under reverse bias

A semiconductor detector is essentially a reverse biased P-N junction. The presence of a uniformly distributed electric field is a necessity for efficient charge collection. Charge collection by an electric field is an age-old concept from the time of gas-filled detectors. The applied high voltage in an HVCMOS sensor depends on the detector size and is typically tens of volts. The standard approach is to modify the impurity concentration on one side of the material (either p or n-type) so that the two sides of the same material have opposite doping types. For example, if the initial material is p-type, the radiation interactions occur mainly in the p-region and the device is called a p-type detector. The original acceptor concentration is represented by $N_A$. The donor impurity concentration is represented by $N_D$. Near the interface, $N_D$ can be made to exceed $N_A$. This means that in the n-type region there is a higher density of mobile electrons, and in the p-type region there is a lower density of mobile electrons. This results in a net diffusion of electrons and holes from the high density side to the low density side. A charge build-up occurs on either side of the junction, which diminishes the tendency for further diffusion of electrons and holes. At equilibrium, the built-in electric field is just adequate to prevent diffusion across the junction. A steady state of charge distribution is established, forming a space charge region called the depletion region.
The value of the electric potential V at any point in the depletion region is given by the solution of Poisson's equation\*:

$$\nabla^2 V = -\frac{\rho}{\epsilon} \tag{A.2}$$

where $\rho$ is the net charge density and $\epsilon$ is the dielectric constant. In one dimension, it can be written as

$$\frac{d^2V(x)}{dx^2} = -\frac{\rho(x)}{\epsilon} \tag{A.3}$$

The corresponding electric field is given by

$$\vec{E} = -\nabla V \tag{A.4}$$

<sup>\*</sup>The Poisson equation is not a basic equation, but follows directly from the Maxwell equations if all time derivatives are zero, i.e. for electrostatic conditions. The first Maxwell equation for the electrical field E under these conditions is $\nabla \cdot \vec{E} = \frac{\rho}{\epsilon \epsilon_0}$. We have used the definition of the electrical field E as the negative gradient of the potential V. Since the second derivative of the electrical potential times $\epsilon \epsilon_0$ is just the (negative) charge density, as asserted by Poisson's equation, integrating the charge density once essentially yields the electrical field strength and integrating it twice yields the potential.

The depletion region in an unbiased p-n junction will function as a radiation detector with poor efficiency. An electric signal can be formed by the electrons swept toward the n-type material and the holes swept toward the p-type material. Charge collection is not practical, however, due to the absence of a sufficiently large potential across the junction, which plays an important role in moving the charge carriers quickly across it. The generated electron-hole pairs can get trapped or recombine in such a condition. This will also result in poor noise characteristics of the unbiased sensor diode. If the thickness of the depletion region is small, the active detector volume is reduced, which brings forth the requirement of reverse biasing the sensor diode. Under reverse bias, the semiconductor detector diode operates with higher efficiency.
A high voltage CMOS process enables the application of tens of volts across the p-substrate and the deep n-well region, as shown in figure A.1. As a result, the generated electrons drift across the junction towards the deep n-well, which acts as the charge collection electrode. Since the concentration of minority carriers is low, very little leakage current is expected across the junction.

Let us analyze the effect of reverse bias on a PN junction. We can use the simplified charge distribution shown in figure A.2 to analyze the properties of the reverse biased pn junction. In an idealized distribution, $\rho(x)=eN_D$ for $-a\leq x<0$ and $\rho(x)=-eN_A$ for $0\leq x\leq b$. Electron diffusion results in a uniform positive space charge between -a and 0. Hole diffusion results in a uniform negative space charge between 0 and b. The net charge is zero, which means $N_D\times a=N_A\times b$. The electric potential can be found from the solution of equation A.3, where in this case for $-a\leq x<0$,

$$\frac{d^2V(x)}{dx^2} = -\frac{\rho(x)}{\epsilon} = -\frac{eN_D}{\epsilon} \tag{A.5}$$

and for $0 \le x \le b$,

$$\frac{d^2V(x)}{dx^2} = -\frac{\rho(x)}{\epsilon} = +\frac{eN_A}{\epsilon} \tag{A.6}$$

The electrical field, $\vec{E} = -\nabla V$, must be 0 at the edges of the charge distribution, i.e.

$$\frac{dV(x)}{dx}\bigg|_{x=-a} = 0 \tag{A.7}$$

and

$$\frac{dV(x)}{dx}\bigg|_{x=b} = 0 \tag{A.8}$$

**Figure A.2:** Assumed concentration profiles for p-n junction and the corresponding profiles for space charge, electric field and potential

Integrating equations A.5 and A.6 over their respective ranges and applying the boundary conditions A.7 and A.8, we obtain the following:

$$\frac{dV(x)}{dx} = -\frac{eN_D}{\epsilon}(x+a) \quad (-a \le x < 0) \tag{A.9}$$

$$\frac{dV(x)}{dx} = \frac{eN_A}{\epsilon}(x-b) \quad (0 \le x \le b) \tag{A.10}$$

The difference in potential across the junction is $V_b$, the applied bias voltage<sup>†</sup>. Therefore $V(-a) = V_b$ and $V(b) = 0$.
Applying these values and integrating equations A.9 and A.10 again,

$$V(x) = -\frac{eN_D}{2\epsilon}(x+a)^2 + V_b \quad (-a \le x < 0) \tag{A.11}$$

$$V(x) = \frac{eN_A}{2\epsilon}(x-b)^2 + 0 \quad (0 \le x \le b) \tag{A.12}$$

At x = 0, the two solutions must match, which implies

$$-\frac{eN_D}{2\epsilon}a^2 + V_b = \frac{eN_A}{2\epsilon}b^2 \tag{A.13}$$

which in turn yields the following result:

$$N_D a^2 + N_A b^2 = \frac{2\epsilon V_b}{e} \tag{A.14}$$

We define d as the total width of the depletion region, d = a + b. Since the doping level on the N side is much higher than that on the P side, $N_D \gg N_A$. Using the relation $N_D \times a = N_A \times b$, it follows that $a \ll b$, and the total width of the depletion region is calculated as

$$d = a + b \cong b = \sqrt{\frac{2\epsilon V_b}{eN_A}} \tag{A.15}$$

<sup>†</sup>Assuming that the built-in potential is much less than the applied bias voltage, i.e. $V_{bi} \ll V_b$, so that it can be ignored. $V_{bi}$ is the zero bias junction voltage, given by $V_T \ln \left( \frac{N_D N_A}{n_i^2} \right)$, where $V_T$ is the thermal voltage of 26 mV (kT/e) at room temperature, $N_D$ and $N_A$ are the impurity concentrations and $n_i$ is the intrinsic concentration. The significance of the built-in potential across the junction is that it opposes the flow of both holes and electrons across the junction.

#### A.2 Relation between substrate resistivity and acceptor concentration

Let us consider the p-substrate of an HVCMOS sensor that forms the p-side of the PN junction diode. When an electric field is applied across a semiconductor device, the electrons and holes undergo drift. They move in opposite directions, which causes a net current in the same direction as that of the electric field. It can be shown that the drift velocity of electrons in a semiconductor, $v_e$, is directly proportional to the applied electric field, i.e. $v_e \propto \vec{E}$. The proportionality constant is called the mobility ($\mu_n$).
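As a brief numerical aside, equation A.15 from section A.1 can be evaluated for representative values. The numbers below are illustrative assumptions, not measured parameters of the ATLASpix process: a relative permittivity of 11.7 for silicon, an acceptor concentration of $10^{14}\,\mathrm{cm^{-3}}$, and a bias of 60 V. The symbol $\epsilon$ of equation A.15 is taken as $\epsilon_r \epsilon_0$, and the function name is hypothetical.

```python
import math

Q_E = 1.602176634e-19      # elementary charge in C
EPS_0 = 8.8541878128e-12   # vacuum permittivity in F/m
EPS_R_SI = 11.7            # relative permittivity of silicon (assumed)


def depletion_width_m(v_bias, n_a_cm3, eps_r=EPS_R_SI):
    """Equation A.15: d = sqrt(2 * eps * V_b / (e * N_A)), result in metres.

    Valid under the assumptions of the derivation: N_D >> N_A and a built-in
    potential that is negligible compared to the applied bias.
    """
    n_a_m3 = n_a_cm3 * 1e6  # convert cm^-3 to m^-3
    return math.sqrt(2.0 * eps_r * EPS_0 * v_bias / (Q_E * n_a_m3))


if __name__ == "__main__":
    # Hypothetical substrate doping and bias voltage.
    d = depletion_width_m(v_bias=60.0, n_a_cm3=1e14)
    print(f"depletion width: {d * 1e6:.1f} um")
```

With these assumed values the depletion width comes out at a few tens of microns, and quadrupling the bias doubles the width, as expected from the square-root dependence.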
The total drift current, i, is given by

$$i = neAv_e \tag{A.16}$$

where n is the concentration of charge carriers, $e = 1.6 \times 10^{-19}$ C is the elementary charge, A is the cross-sectional area and $v_e$ is the drift velocity of electrons. The current density, J, is defined as the current per unit cross-sectional area. Therefore,

$$J = i/A = nev_e = ne\mu_n E \tag{A.17}$$

Ohm's law states that $J = \sigma E$, where $\sigma$ is the conductivity, which is a property of the material. Therefore, from equation A.17<sup>‡</sup>,

$$\sigma = ne\mu_n \tag{A.18}$$

In a p-type material the charge carriers are holes, which gives the conductivity of a p-type material as

$$\sigma = e\mu_p N_A \tag{A.19}$$

$$\sigma \propto N_A \implies \rho \propto \frac{1}{N_A} \tag{A.20}$$

From equations A.20 and A.15, we can infer that

$$d \propto \sqrt{\rho V_b} \tag{A.21}$$

where $\rho$ is the resistivity of the p-substrate and $V_b$ is the applied reverse bias across the pn junction. Hence, the size of the depletion region in an HVCMOS sensor depends on both the bias voltage and the resistivity of the p-substrate. By applying a high reverse bias (of the order of tens of volts), a depletion depth of tens of microns can be obtained. The active volume of the detector can be estimated from the width of the depletion region. The depletion region behaves like a capacitor ($\epsilon A/d$), since charges are built up on either side of the pn junction. Thus, as the reverse bias voltage increases, the depletion region grows and the capacitance decreases. A small detector capacitance is preferred for good energy resolution.

<sup>‡</sup>The mobility of holes ($\mu_p$) is less than that of electrons ($\mu_n$) due to their higher effective mass. Including both carrier types, the current density equation can be rewritten as $J = (n\mu_n + p\mu_p)eE$.

## Appendix B

# ADC architecture proposals and feasibility analysis

Several new ADC architectures were proposed before arriving at the 8-bit SAR ADC, which was taped out in the TC1 test chip.
Some of the new architectural ideas are described below.

#### B.0.1 A VCO based delay line ADC

Figure B.1: VCO based delay line ADC

The time based ADC shown in figure B.1 works in a time interleaved manner, sampling different points of the analog input and converting them into digital codes. The architecture takes advantage of the fast switching frequency of the 28 nm MOSFET. The digital code is a free-running thermometer code through the delay line.

Figure B.2: Waveform sampling

In this architecture, the current source adds to non-linearity as well as noise, resulting in a lower signal-to-noise-and-distortion ratio (SNDR).

#### B.0.2 Characterization of a delay chain

A minimum sized inverter is designed to analyze the switching speed of a delay line. The NMOS device is chosen to have the minimum W/L (100 nm/30 nm). The width of the PMOS transistor is chosen to match the drain current of the NMOS, which is obtained from the I-V characteristics. For a $V_{DS}$ of 0.9 V and a $V_{GS}$ of 0.9 V, the NMOS yields a drain current of 95.732 $\mu$A. For a $V_{DS}$ of 0.9 V and a $V_{GS}$ of -0.9 V, the PMOS yields a drain current of 72.98 $\mu$A. This implies that the ratio of PMOS to NMOS W/L needed to carry the same drain current is 1.3:1. The width of the PMOS device is therefore 130 nm for a minimum sized inverter.

Figure B.3: Inverter schematic

Figure B.4 shows a rise time of 5 ps and a fall time of 4.88 ps.

Figure B.4: Inverter characteristics

A delay chain is designed using five minimum sized inverters with a buffered output at each stage, as shown in figure B.5.

Figure B.5: Delay line of five inverters with buffered output

From figure B.7a, the high-to-low propagation delay ($t_{pHL}$) is 5.5 ps and the low-to-high propagation delay ($t_{pLH}$) is 5.6 ps. The propagation delay, $t_p$, is given by the equation

$$t_p = \frac{t_{pHL} + t_{pLH}}{2} \tag{B.1}$$

The propagation delay, $t_p$, is calculated to be 5.55 ps.
The frequency of oscillation, f, is 18 GHz as given by the following equation, where N = 5 is the number of stages:

$$f = \frac{1}{2 \times N \times t_p} \tag{B.2}$$

Figure B.6: Delay definitions

**Figure B.7:** 5-stage delay line simulations: In figure B.7a, the propagation delay of an inverter is obtained from the simulations to estimate the frequency of a ring oscillator. Figure B.7b shows the phase delay of a single stage, which corresponds to the maximum obtainable sampling rate in the proposed architecture.

The phase delay between consecutive stages is 9 ps, as shown in the simulation (figure B.7b). This means that if we sample the input analog signal at each stage of the delay line, we can theoretically obtain a sampling rate as high as 111.11 GHz. This may be further reduced by parasitics.

#### B.0.3 A delay line based single-shot ADC

Figure B.8: Delay line based single-shot ADC (block labels: Trigger\_In, Analog\_In, Vth, 64-bit latch, 64-bit thermometer to 6-bit binary encoder, 6-bit parallel in serial out register, load[0] to load[99], scan out)

A delay line based single-shot ADC architecture is shown in figure B.8. A trigger pulse turns on the sampling switch and also initiates the delay line. The vertical delay line provides time interleaving capability. The analog signal is sampled using a capacitive voltage divider. A load signal is generated when the input amplitude crosses a given threshold. At this point, a thermometer code is loaded into the registers. The thermometer code depends on the state of the delay chain. The threshold crossing time varies with the amplitude of the input signal.
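The thermometer to binary encoding step of figure B.8 can be illustrated with a small behavioural sketch. It assumes that the latch holds a run of 1's starting at index 0 (the propagated edge), that the encoder simply counts this run, and that the result saturates at the largest 6-bit value; none of these details are specified above, and the function name is hypothetical.

```python
def thermometer_to_binary(latch_bits, out_bits=6):
    """Count the leading 1's of a thermometer code and return a binary value.

    Assumption: the count saturates at 2**out_bits - 1, since a 64-bit
    thermometer code has 65 possible states but 6 bits encode only 64.
    """
    count = 0
    for bit in latch_bits:
        if bit != 1:
            break          # first 0 marks the position of the propagating edge
        count += 1
    return min(count, 2 ** out_bits - 1)


if __name__ == "__main__":
    # The edge has propagated through 5 of the 64 delay elements.
    latch = [1] * 5 + [0] * 59
    print(thermometer_to_binary(latch))
```

Because only the position of the edge is counted, the result does not depend on the exact delay of each element as long as the edge propagates in order.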
The advantages of this architecture are:

1. Low noise, since, unlike the previous architecture, there is no current source in the sampling circuit.
2. The readout can be implemented using a shift register based scheme.
3. Matching of the delay elements is not important, since the code is based on the propagation of 1's and not on the delay time.

Another advantage is that the comparator can be made slower than the delay line switching frequency. The architecture can be easily scaled up in resolution, since extra bits are added by increasing the number of delay elements. This comes at an area and switching power cost, because an N-bit converter needs $2^N$ delay elements. A further concern is charge injection in the sampling switch.

**Figure B.9:** Simulation of sample and hold circuit: showing the non-linearity in sampled values at different points of the input sine wave

**Figure B.10:** Sample and hold circuit simulation: The input sine wave is sampled at different points using the delay line. The threshold crossing point is recorded by the comparator. The inverting node is pulled up to prevent the threshold crossing while discharging.

The idea of the delay line can be extended to the digitally dominated architecture in figure B.11. This architecture requires a reference clock of 125 MHz, which is used by a PLL to generate the 800 MHz clock.

Figure B.11: Ring oscillator based continuously running synchronous ADC

## **B.1** Comparison with SAR architecture

### 1. Sample and hold circuit:

For a time-based ADC, the discharge current source introduces non-linearity and additional power consumption, whereas in a SAR ADC the sample and hold circuit can be implemented without a current source.

### 2. Comparator:

A continuous-time comparator consumes more static power than a latch based comparator. In the SAR architecture, a clock-triggered comparator can be used for synchronous timing.

### 3. Conversion Block:

In the time-based architecture, a large number of delay stages is required to cover a large dynamic range, which may add to the power consumption.
In a SAR ADC, the power consumption of the digital logic and the DAC is mostly due to switching elements.

## **B.2** SAR ADC Verilog model

The following Verilog model was used to verify the functionality and timing of the digital logic within the ADC system. The test bench emulates the comparator, the sampling, and the binary weighted capacitive DAC in Verilog. The DAC used in this system can count up and down based on the SAR logic output.

```
module SARDigitaltb();

reg clk, go, rst, simck;
wire valid, sample;
wire cmp;
wire [7:0] downb, up;
wire doutcmos, dout;
reg commaen, cmosen;

// outputs of the ideal (RTL) reference instance
wire validideal, sampleideal;
wire [7:0] downbideal, upideal;
wire doutcmosideal, doutideal;

// edge detection
reg [7:0] downbdel, updel, downbedge, upedge;
reg [7:0] vin;
reg [7:0] signal;

initial begin
    // sdf annotate for post route simulation
    $sdf_annotate("./SARDigitalTop.slow.sdf", SARtopI,, "annotate.log", "maximum");
end

// instance controller circuit
SARDigitalTop SARtopI(
    .clk(clk),
    .rst(rst),
    .go(go),
    .cmp(cmp),
    .commaen(commaen),
    .cmosen(cmosen),
    .sample(sample),
    .valid(valid),
    .dout(dout),
    .doutcmos(doutcmos),
    .downb(downb),
    .up(up)
);

SARDigitalTop SARtopIdeal(
    .clk(clk),
    .rst(rst),
    .go(go),
    .cmp(cmp),
    .commaen(commaen),
    .cmosen(cmosen),
    .sample(sampleideal),
    .valid(validideal),
    .dout(doutideal),
    .doutcmos(doutcmosideal),
    .downb(downbideal),
    .up(upideal)
);

// input stimulus: a ramp that increments on every clock edge
// on which sample is high while go is asserted
always @(posedge clk or posedge rst) begin
    if (rst)
        signal <= 0;
    else if (go)
        if (sample)
            signal <= signal + 1;
end

// comparator emulation: high while vin is below mid-scale
assign cmp = vin < 128;

// sampling and DAC emulation on the fast simulation clock:
// a falling edge on downb[i] subtracts a binary weight from vin,
// a rising edge on up[i] adds it
always @(posedge simck) begin
    downbdel  <= downb;
    updel     <= up;
    downbedge <= downbdel & ~downb;
    upedge    <= up & ~updel;
    if (sample)
        vin <= signal;
    else begin
        if (downbedge[7]) vin <= vin - 64;
        if (downbedge[6]) vin <= vin - 32;
        if (downbedge[5]) vin <= vin - 16;
        if (downbedge[4]) vin <= vin - 8;
        if (downbedge[3]) vin <= vin - 4;
        if (downbedge[2]) vin <= vin - 2;
        if (downbedge[1]) vin <= vin - 1;
        if (downbedge[0]) vin <= vin - 0;
        if (upedge[7]) vin <= vin + 64;
        if (upedge[6]) vin <= vin + 32;
        if (upedge[5]) vin <= vin + 16;
        if (upedge[4]) vin <= vin + 8;
        if (upedge[3]) vin <= vin + 4;
        if (upedge[2]) vin <= vin + 2;
        if (upedge[1]) vin <= vin + 1;
        if (upedge[0]) vin <= vin + 0;
    end
end

// receiver to decode 8b/10b data
// ideal block uses RTL netlist for verification of
// post routed simulation results
FMMuPixAuroraRXFPGA receiverI(
    .clkser(clk),
    .resn(!rst),
    .dout(dout),
    .slowck(valid)
);

FMMuPixAuroraRXFPGA receiverideal(
    .clkser(clk),
    .resn(!rst),
    .dout(doutideal),
    .slowck(validideal)
);

// Test bench monitor some signals and provide input stimuli
initial begin
    #0 clk = 0;
    #0 simck = 0;
    #0 commaen = 0;
    #0 cmosen = 0;
    #0 rst = 1;
    #0 go = 0;
    #10 rst = 0;
    #100 go = 1;
    #0 commaen = 1;
    #400 commaen = 0;
    #5000 go = 0;
    #5000 go = 1;
    #40 go = 0;
    #5000 go = 1;
    #40 go = 0;
    #5000;
    $stop;
end

// generate a clock with period of 1 time unit
// a 10x fast clock is used for simulation purposes
// (the fractional delays need a time precision finer than the time unit)
always #0.5 clk = ~clk;
always #0.05 simck = ~simck;

endmodule
```

# Acknowledgements

This dissertation did not take long to write, yet it took almost four years in the making. It would have been impossible without the following people and the odds of this universe that brought them together.

I am extremely grateful to my supervisor, Prof. Dr. Ivan Perić, Head of the KIT ASIC and Detector Laboratory (KIT-ADL), for walking me through the bizarre world of circuit design. I dedicate every bit of my knowledge in ASIC design to him.

I would like to express my gratitude to Prof. Dr. Paul Leroux, Head of the Advanced Integrated Sensing Lab (ADVISE), KU Leuven, for agreeing to be my second supervisor and for his willingness to collaborate internationally. I am thankful to Prof. Dr. Marc Weber, Head of the Institute for Data Processing and Electronics (IPE), for helping me gain a broader vision of the organization as well as for his support during my research term.
I would like to thank the Karlsruhe School of Elementary Particle and Astroparticle Physics (KSETA) for offering me a doctoral fellowship that gave me this opportunity.

Thanks to all the present and former ADL group members, especially Mr. Felix Ehrler, Dr. Richard Leys, and Mr. Rudolf Schimassek, without whom this dissertation would not have been what it is. At this juncture, I thank all the students and staff members at IPE for a pleasant and close-knit work environment.

My extended thanks go to all the members of the HVCMOS collaboration, especially Dr. Winnie Wong and Dr. Mathieu Benoit, for their expert guidance, support, and feedback during our ATLASpix team meetings.

I would like to express my special thanks to the following people for giving me an insight into the requirements and specifications of ADCs: Dr. Jeffrey Prinzie and Mr. Bjorn Van Bockel at the ADVISE lab, KU Leuven, for their help in analyzing ADC architectures, and Prof. Dr. Edoardo Charbon and his team at the QuTech research center, TU Delft, for explaining the requirements of a quantum computing system. Their inputs have been instrumental during the initial design phase of the ADC chip.

Setting up the 28 nm design environment was one of the most challenging tasks I encountered during this Ph.D. I owe that credit to the technology support crew at IMEC and the microelectronics support center at Rutherford Appleton Laboratory for their guidance, from the time of installing the design kit until tape-out.

I am lucky to have friends and family members who are enthusiastic about my work. My heartfelt thanks to Behdad for being with me through thick and thin during these years. This is one of the few moments in which I have tried to express my gratitude toward my parents; their vision and unconditional love have made my words fall short every single time. Thanks to my sister and brother-in-law for cheering me up during my international research adventure. I can't thank my little niece enough for being a bundle of joy.
I am thankful to my former colleagues at Intel Corporation for being a source of inspiration. Their meaningful conversations and career advice have helped me shape this Ph.D. journey.

This dissertation would not have been a reality if it hadn't been for these wonderful people who have stepped in and opened up new dimensions within me. The memories of these well-spent years will continue to energize me as I move forward in life.

## **Publications**

### Published papers

- [1] M. Prathapan, M. Benoit, R. Casanova, D. Dannheim, F. Ehrler, M. Kiehn, A. Nürnberg, P. Pangaud, R. Schimassek, E. Vilella, A. Weber, W. Wong, H. Zhang, and I. Perić. Towards the large area HVCMOS demonstrator for ATLAS ITk. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 936:389–391, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors
- [2] Ivan Peric, Mridula Prathapan, Heiko Augustin, Mathieu Benoit, Raimon Casanova Mohr, Dominik Dannheim, Felix Ehrler, Fadoua Guezzi Messaoud, Moritz Kiehn, Andreas Nürnberg, Rudolf Schimassek, Mateus Vicente Barreto, Eva Vilella Figueras, Alena Weber, Winnie Wong, and Hui Zhang. A high-voltage pixel sensor for the ATLAS upgrade. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 924:99–103, 2019. 11th International Hiroshima Symposium on Development and Application of Semiconductor Tracking Detectors
- [3] Mridula Prathapan et al. Design of a HVCMOS pixel sensor ASIC with on-chip readout electronics for ATLAS ITk Upgrade. *PoS*, TWEPP2018:074, 2019
- [4] DMS Sultan, S. Gonzalez Sevilla, D. Ferrere, G. Iacobucci, E. Zaffaroni, W. Wong, M.V. Barrero Pinto, M. Kiehn, M. Prathapan, F. Ehrler, et al. Electrical characterization of ams aH18 HV-CMOS after neutrons and protons irradiation. *Journal of Instrumentation*, 14(05):C05003–C05003, May 2019
- [5] F. Ehrler, M. Benoit, D. Dannheim, M. Kiehn, A. Nürnberg, I. Perić, M. Prathapan, R. Schimassek, T. Vanat, M. Vicente, A. Weber, and H. Zhang. Characterization results of a HVCMOS sensor for ATLAS. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 936:654–656, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors
- [6] I. Perić, R. Blanco, R. Casanova Mohr, F. Ehrler, F. Guezzi Messaoud, C. Krämer, R. Leys, M. Prathapan, R. Schimassek, A. Schöning, E. Vilella Figueras, A. Weber, and H. Zhang. Status of HVCMOS developments for ATLAS. *Journal of Instrumentation*, 12(02):C02030–C02030, Feb 2017
- [7] Dirk Wiedner et al. Readout Electronics for the First Large HV-MAPS Chip for Mu3e. *PoS*, TWEPP-17:099, 2018
- [8] H. Augustin et al. Performance of the large scale HV-CMOS pixel sensor MuPix8. *JINST*, 14(10):C10011, 2019
- [9] H. C. Augustin, N. Berger, S. Dittmeier, F. Ehrler, J. Hammerich, A. Herkert, L. Huth, D. Immig, J. Kröger, I. Peric, M. Prathapan, R. Schimassek, A. Schöning, A. L. Weber, D. Wiedner, and H. Zhang. MuPix8 - A large-area HV-MAPS chip. In *26th International Workshop on Vertex Detectors*, page 57, Sep 2017
- [10] Moritz Kiehn, Francesco Armando Di Bello, Mathieu Benoit, Raimon Casanova Mohr, Hucheng Chen, Kai Chen, Sultan D.M.S., Felix Ehrler, Didier Ferrere, Dylan Frizell, Sergio Gonzalez Sevilla, Giuseppe Iacobucci, Francesco Lanni, Hongbin Liu, Claudia Merlassino, Jessica Metcalfe, Antonio Miucci, Ivan Peric, Mridula Prathapan, Rudolf Schimassek, Mateus Vicente Barreto, Thomas Weston, Eva Vilella Figueras, Michele Weber, Alena Weber, Winnie Wong, Weihao Wu, Ettore Zaffaroni, Hui Zhang, and Matt Zhang. Performance of CMOS pixel sensor prototypes in ams H35 and aH18 technology for the ATLAS ITk upgrade. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 924:104–107, 2019. 11th International Hiroshima Symposium on Development and Application of Semiconductor Tracking Detectors
- [11] M. Kiehn, A. Herkert, A. Weber, A. Schöning, A. Miucci, A. Fehr, C. Blattgerste, C. Grzesik, D.M.S. Sultan, D. Immig, D. Forshaw, D. Ferrere, D. Wiedner, E. Zaffaroni, E. Vilella Figueras, F. Ehrler, F. Lanni, G. Iacobucci, H. Augustin, H. Liu, H. Chen, I. Peric, J. Anders, J. Hammerich, J. Kröger, J. Vossebeld, K. Chen, L. Xu, L. Noehte, L. Huth, M. Vicente, M. Benoit, M. Weber, M. Prathapan, N. Berger, S. Dittmeier, S. Sevilla, S. Tang, T. Golling, W. Wu, and W. Wong. Performance of the ATLASPix1 pixel sensor prototype in ams aH18 CMOS technology for the ATLAS ITk upgrade. *Journal of Instrumentation*, 14(08):C08013–C08013, Aug 2019
- [12] H. Augustin, N. Berger, S. Dittmeier, F. Ehrler, C. Grzesik, J. Hammerich, A. Herkert, L. Huth, J. Kröger, F. Meier Aeschbacher, I. Perić, M. Prathapan, R. Schimassek, A. Schöning, I. Sorokin, A. Weber, D. Wiedner, H. Zhang, and M. Zimmermann. MuPix8 large area monolithic HVCMOS pixel detector for the Mu3e experiment. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 936:681–683, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors
- [13] Roberto Blanco et al. HVCMOS Monolithic Sensors for the High Luminosity Upgrade of ATLAS Experiment. *JINST*, 12(04):C04001, 2017

### **Under review**

- [1] Mridula Prathapan et al. ATLASpix3: A high voltage CMOS sensor chip designed for ATLAS Inner Tracker. https://indi.to/W8Z45, 2019. Accessed: February 12, 2020

### **Talks**

- [1] Mridula Prathapan et al. Design of HVCMOS pixel sensor chips for ATLAS ITk upgrade. https://publikationen.bibliothek.kit.edu/1000091693, 2019.
Accessed: February 12, 2020 - [2] Mridula Prathapan et al. Large area HVCMOS pixel sensor prototype for ATLAS detector upgrade. https://www.dpg-verhandlungen.de/year/2018/conference/wuerzburg/part/t/session/5/contribution/2, 2018. Accessed: February 12, 2020 - [1] G Aad et al. ATLAS pixel detector electronics and sensors. *Journal of Instrumentation*, 3(07):P07007–P07007, jul 2008. - [2] Abelev et al. Technical Design Report for the Upgrade of the ALICE Inner Tracking System. Technical Report CERN-LHCC-2013-024. ALICE-TDR-017, Nov 2013. - [3] A. M. Abo and P. R. Gray. A 1.5-v, 10-bit, 14.3-ms/s cmos pipeline analog-to-digital converter. *IEEE Journal of Solid-State Circuits*, 34(5):599–606, May 1999. - [4] A. Akturk, M. Peckerar, M. Dornajafi, N. Goldsman, K. Eng, T. Gurrieri, and M. S. Carroll. Impact ionization and freeze-out model for simulation of low gate bias kink effect in soi-mosfets operating at liquid he temperature. In 2009 *International Conference on Simulation of Semiconductor Processes and Devices*, pages 1–4, Sep. 2009. - [5] G Apollinari, I Béjar Alonso, O Brüning, M Lamont, and L Rossi. High-Luminosity Large Hadron Collider (HL-LHC): Preliminary Design Report. CERN Yellow Reports: Monographs. CERN, Geneva, 2015. - [6] Mohit Arora. *Clock Dividers*, pages 87–93. Springer New York, New York, NY, 2012. - [7] Collaboration ATLAS. Letter of Intent for the Phase-II Upgrade of the ATLAS Experiment. Technical Report CERN-LHCC-2012-022. LHCC-I-023, CERN, Geneva, Dec 2012. Draft version for comments. - [8] H. Augustin, N. Berger, S. Dittmeier, F. Ehrler, C. Grzesik, J. Hammerich, A. Herkert, L. Huth, J. Kröger, F. Meier Aeschbacher, I. Perić, M. Prathapan, R. Schimassek, A. Schöning, I. Sorokin, A. Weber, D. Wiedner, H. Zhang, and M. Zimmermann. Mupix8 large area monolithic hycmos pixel detector for the mu3e experiment. 
Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated - *Equipment*, 936:681 683, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors. - [9] H. Augustin et al. Performance of the large scale HV-CMOS pixel sensor MuPix8. *JINST*, 14(10):C10011, 2019. - [10] H. C. Augustin, N. Berger, S. Dittmeier, F. Ehrler, J. Hammerich, A. Herkert, L. Huth, D. Immig, J. Kröger, I. Peric, M. Prathapan, R. Schimassek, A. Schöning, A. L. Weber, D. Wiedner, and H. Zhang. MuPix8 - A large-area HV-MAPS chip. In 26th International Workshop on Vertex Detectors, page 57, Sep 2017. - [11] R Ballabriga, J Alozy, G Blaj, M Campbell, M Fiederle, E Frojdh, E H M Heijne, X Llopart, M Pichotka, S Procz, L Tlustos, and W Wong. The medipix3rx: a high resolution, zero dead-time pixel detector readout chip allowing spectroscopic imaging. *Journal of Instrumentation*, 8(02):C02016–C02016, feb 2013. - [12] M. Barbero et al. Design and test of the CMS pixel readout chip. *Nucl. Instrum. Meth.*, A517:349–359, 2004. - [13] G. Barbottin and A. Vapaille. *Instabilities in Silicon Devices: New insulators devices and radiation effects.* Number v. 3. Elsevier, 1999. - [14] M. Benoit, S. Braccini, G. Casse, H. Chen, K. Chen, F.A. Di Bello, D. Ferrere, T. Golling, S. Gonzalez-Sevilla, G. Iacobucci, M. Kiehn, F. Lanni, H. Liu, L. Meng, C. Merlassino, A. Miucci, D. Muenstermann, M. Nessi, H. Okawa, I. Perić, M. Rimoldi, B. Ristić, M. Vicente Barrero Pinto, J. Vossebeld, M. Weber, T. Weston, W. Wu, L. Xu, and E. Zaffaroni. Testbeam results of irradiated ams h18 HV-CMOS pixel sensor prototypes. *Journal of Instrumentation*, 13(02):P02011–P02011, feb 2018. - [15] Roberto Blanco et al. HVCMOS Monolithic Sensors for the High Luminosity Upgrade of ATLAS Experiment. *JINST*, 12(04):C04001, 2017. - [16] A. Blondel et al. Research Proposal for an Experiment to Search for the Decay $\mu \to eee$ . 2013. 
- [17] Philippe Campagne-Ibarcq. *Measurement back action and feedback in superconducting circuits*. Theses, Ecole Normale Supérieure (ENS), June 2015. - [18] M. Campbell, J. Alozy, R. Ballabriga, E. Frojdh, E. Heijne, X. Llopart, T. Poikela, L. Tlustos, P. Valerio, and W. Wong. Towards a new generation of pixel detector readout chips. *Journal of Instrumentation*, 11(01):C01007–C01007, jan 2016. - [19] Z. Cao, S. Yan, and Y. Li. A 32 mw 1.25 gs/s 6b 2b/step sar adc in 0.13 $\mu$ m cmos. *IEEE Journal of Solid-State Circuits*, 44(3):862–873, March 2009. [20] E. Charbon, F. Sebastiano, A. Vladimirescu, H. Homulle, S. Visser, L. Song, and R. M. Incandela. Cryo-cmos for quantum computing. In 2016 IEEE International Electron Devices Meeting (IEDM), pages 13.5.1–13.5.4, Dec 2016. - [21] J (CERN) Chistiansen and M (LBNL) Garcia-Sciveres. RD Collaboration Proposal: Development of pixel readout integrated circuits for extreme rate and radiation. Technical Report CERN-LHCC-2013-008. LHCC-P-006, CERN, Geneva, Jun 2013. The authors are editors on behalf of the participating institutes. the participating institutes are listed in the proposal. - [22] K. D. Choo, J. Bell, and M. P. Flynn. 27.3 area-efficient 1gs/s 6b sar adc with charge-injection-cell-based dac. In 2016 IEEE International Solid-State Circuits Conference (ISSCC), pages 460–461, Jan 2016. - [23] Chuck Benz. 8b/10b encoder. http://asics.chuckbenz.com, 2019. [Online; accessed 22-October-2019]. - [24] ATLAS Collaboration. Technical Design Report for the ATLAS Inner Tracker Strip Detector. Technical Report CERN-LHCC-2017-005. ATLAS-TDR-025, CERN, Geneva, Apr 2017. - [25] CMS Collaboration. Technical proposal for the upgrade of the CMS detector through 2020. Technical Report CERN-LHCC-2011-006. LHCC-P-004, Jun 2011. - [26] LHCb Collaboration. LHCb VELO Upgrade Technical Design Report. Technical Report CERN-LHCC-2013-021. LHCB-TDR-013, Nov 2013. - [27] Giacomo Contin. 
The maps-based vertex detector for the star experiment: Lessons learned and performance. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment,* 831:7 11, 2016. Proceedings of the 10th International "Hiroshima" Symposium on the Development and Application of Semiconductor Tracking Detectors. - [28] G. De Geronimo, D. Christian, C. Bebek, M. Garcia-Sciveres, H. Von der Lippe, G. Haller, A. A. Grillo, and M. Newcomer. Integrated Circuit Design in US High-Energy Physics. In *Proceedings*, 2013 Community Summer Study on the Future of U.S. Particle Physics: Snowmass on the Mississippi (CSS2013): Minneapolis, MN, USA, July 29-August 6, 2013, 2013. - [29] M. Dessouky and A. Kaiser. Input switch configuration suitable for rail-to-rail operation of switched op amp circuits. *Electronics Letters*, 35(1):8–10, Jan 1999. - [30] M. Dessouky and A. Kaiser. Very low-voltage digital-audio /spl delta//spl sigma/ modulator with 88-db dynamic range using local switch bootstrapping. *IEEE Journal of Solid-State Circuits*, 36(3):349–355, March 2001. [31] M. Ding, P. Harpe, Y. Liu, B. Busze, K. Philips, and H. de Groot. A 46 $\mu$ W 13 b 6.4 ms/s sar adc with background mismatch and offset calibration. *IEEE Journal of Solid-State Circuits*, 52(2):423–432, Feb 2017. - [32] F. Ehrler, M. Benoit, D. Dannheim, M. Kiehn, A. Nürnberg, I. Perić, M. Prathapan, R. Schimassek, T. Vanat, M. Vicente, A. Weber, and H. Zhang. Characterization results of a hvcmos sensor for atlas. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 936:654 656, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors. - [33] Felix Ehrler. Test system for ATLASpix. https://git.scc.kit.edu/jm2998/H18b\_Vivado/ https://git.scc.kit.edu/jm2998/H18\_AllSensors/, 2018. Accessed: February 12, 2020. - [34] Jeffrey Evan and Sank Daniel. 
Fast accurate state measurement with superconducting qubits. *Physical Review Letters*, 112(19), May 2014. - [35] F. Faccio and G. Cervelli. Radiation-induced edge effects in deep submicron cmos transistors. *IEEE Transactions on Nuclear Science*, 52(6):2413–2420, Dec 2005. - [36] P. M. Figueiredo and J. C. Vital. Kickback noise reduction techniques for cmos latched comparators. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 53(7):541–545, July 2006. - [37] D.P. Franke, J.S. Clarke, L.M.K. Vandersypen, and M. Veldhorst. Rent's rule and extensibility in quantum computing. *Microprocessors and Microsystems*, 67:1–7, 2019. - [38] M. Garcia-Sciveres et al. The FE-I4 pixel readout integrated circuit. *Nucl. Instrum. Meth.*, A636:S155–S159, 2011. - [39] Maurice Garcia-Sciveres. The RD53A Integrated Circuit. Technical Report CERN-RD53-PUB-17-001, CERN, Geneva, Oct 2017. - [40] Maurice Garcia-Sciveres and Xinkang Wang. Data encoding efficiency in pixel detector readout with charge information. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 815:18–22, Apr 2016. - [41] Maurice Garcia-Sciveres and Norbert Wermes. A review of advances in pixel detectors for experiments with high rate and radiation. *Reports on Progress in Physics*, 81(6):066101, may 2018. - [42] Vito Giannini, Pierluigi Nuzzo, Vincenzo Chironi, Andrea Baschirotto, Geert Van der Plas, and Jan Craninckx. An 820 $\mu$ w 9b 40ms/s noise-tolerant - dynamic-sar ADC in 90nm digital CMOS. In 2008 IEEE International Solid-State Circuits Conference, ISSCC 2008, Digest of Technical Papers, San Francisco, CA, USA, February 3-7, 2008, pages 238–239, 2008. - [43] Donald Gross, John F. Shortle, James M. Thompson, and Carl M. Harris. *Fundamentals of Queueing Theory*. Wiley-Interscience, New York, NY, USA, 4th edition, 2008. - [44] V. Hariprasath, J. Guerber, S. . Lee, and U. . Moon. 
Merged capacitor switching based sar adc with highest switching energy-efficiency. *Electronics Letters*, 46(9):620–621, April 2010. - [45] P. Harpe, C. Zhou, X. Wang, G. Dolmans, and H. de Groot. A 12fj/conversion-step 8bit 10ms/s asynchronous sar adc for low energy radios. In 2010 Proceedings of ESSCIRC, pages 214–217, Sep. 2010. - [46] C R Helms and E H Poindexter. The silicon-silicon dioxide system: Its microstructure and imperfections. *Reports on Progress in Physics*, 57(8):791–852, aug 1994. - [47] Tomasz Hemperek, Tetsuichi Kishishita, Hans Krüger, and Norbert Wermes. A monolithic active pixel sensor for ionizing radiation using a 180nm hysoi process. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 796:8–12, Oct 2015. - [48] Fabian Hügging. The ATLAS Pixel Insertable B-Layer (IBL). *Nucl. Instrum. Meth.*, A650:45–49, 2011. - [49] T. Hirono, M. Barbero, P. Breugnon, S. Godiot, T. Hemperek, F. Hügging, J. Janssen, H. Krüger, J. Liu, P. Pangaud, I. Peric, D. Pohl, A. Rozanov, P. Rymaszewski, and N. Wermes. Characterization of fully depleted cmos active pixel sensors on high resistivity substrates for use in a high radiation environment. In 2016 IEEE Nuclear Science Symposium, Medical Imaging Conference and Room-Temperature Semiconductor Detector Workshop (NSS/MIC/RTSD), pages 1–4, Oct 2016. - [50] H. Homulle, S. Visser, and E. Charbon. A cryogenic 1 gsa/s, soft-core fpga adc for quantum computing applications. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 63(11):1854–1865, Nov 2016. - [51] J. M. Hornibrook et al. Cryogenic Control Architecture for Large-Scale Quantum Computing. *Phys. Rev. Applied*, 3(2):024010, 2015. - [52] B T Huffman. Plans for the phase II upgrade to the ATLAS detector. *Journal of Instrumentation*, 9(02):C02033–C02033, feb 2014. [53] Xilinx Inc. Aurora 8B/10B Protocol Specification. 
https://www.xilinx.com/products/intellectual-property/aurora64b66b.html, 2019. Accessed: February 12, 2020. - [54] R. M. Incandela, L. Song, H. Homulle, E. Charbon, A. Vladimirescu, and F. Sebastiano. Characterization and compact modeling of nanometer cmos transistors at deep-cryogenic temperatures. *IEEE Journal of the Electron Devices Society*, 6:996–1006, 2018. - [55] R. M. Incandela, L. Song, H. A. R. Homulle, F. Sebastiano, E. Charbon, and A. Vladimirescu. Nanometer cmos characterization and compact modeling at deep-cryogenic temperatures. In 2017 47th European Solid-State Device Research Conference (ESSDERC), pages 58–61, Sep. 2017. - [56] M. Karagounis, D. Arutinov, M. Barbero, F. Huegging, H. Krueger, and N. Wermes. An integrated shunt-ldo regulator for serial powered systems. In 2009 Proceedings of ESSCIRC, pages 276–279, Sep. 2009. - [57] David G. Kendall. Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded markov chain. *Ann. Math. Statist.*, 24(3):338–354, 09 1953. - [58] M. Kiehn, A. Herkert, A. Weber, A. Schöning, A. Miucci, A. Fehr, C. Blattgerste, C. Grzesik, D.M.S. Sultan, D. Immig, D. Forshaw, D. Ferrere, D. Wiedner, E. Zaffaroni, E. Vilella Figueras, F. Ehrler, F. Lanni, G. Iacobucci, H. Augustin, H. Liu, H. Chen, I. Peric, J. Anders, J. Hammerich, J. Kröger, J. Vossebeld, K. Chen, L. Xu, L. Noehte, L. Huth, M. Vicente, M. Benoit, M. Weber, M. Prathapan, N. Berger, S. Dittmeier, S. Sevilla, S. Tang, T. Golling, W. Wu, and W. Wong. Performance of the ATLASPix1 pixel sensor prototype in ams aH18 CMOS technology for the ATLAS ITk upgrade. *Journal of Instrumentation*, 14(08):C08013–C08013, aug 2019. 
- Francesco Armando Di Bello, Mathieu [59] Moritz Kiehn, Benoit, Raimon Casanova Mohr, Hucheng Chen, Kai Chen, Sultan D.M.S., Felix Ehrler, Didier Ferrere, Dylan Frizell, Sergio Gonzalez Sevilla, Giuseppe Iacobucci, Francesco Lanni, Hongbin Liu, Claudia Merlassino, Jessica Metcalfe, Antonio Miucci, Ivan Peric, Mridula Prathapan, Rudolf Schimassek, Mateus Vicente Barreto, Thomas Weston, Eva Vilella Figueras, Michele Weber, Alena Weber, Winnie Wong, Weihao Wu, Ettore Zaffaroni, Hui Zhang, and Matt Zhang. Performance of cmos pixel sensor prototypes in ams h35 and ah18 technology for the atlas itk upgrade. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 924:104 – 107, 2019. 11th International Hiroshima Symposium on Development and Application of Semiconductor Tracking Detectors. [60] Anna V. Kimmel, Peter V. Sushko, Alexander L. Shluger, and Gennadi Bersuker. Positive and negative oxygen vacancies in amorphous silica. 2009. - [61] Charles Kittel. Introduction to Solid State Physics. Wiley, 8 edition, 2004. - [62] M. Klein, M. Hutter, H. Oppermann, T. Fritzsch, G. Engelmann, L. Dietrich, J. Wolf, B. Bramer, R. Dudek, and H. Reichl. Development and evaluation of lead free reflow soldering techniques for the flip chip bonding of large gaas pixel detectors on si readout chip. In 2008 58th Electronic Components and Technology Conference, pages 1893–1899, May 2008. - [63] L. Kull, T. Toifl, M. Schmatz, P. A. Francese, C. Menolfi, M. Brändli, M. Kossel, T. Morf, T. M. Andersen, and Y. Leblebici. A 3.1 mw 8b 1.2 gs/s single-channel asynchronous sar adc with alternate comparators for enhanced speed in 32 nm digital soi cmos. *IEEE Journal of Solid-State Circuits*, 48(12):3049–3058, Dec 2013. - [64] R. Van Langevelde, A. J. Scholten, D. B. M. Klaassen, A. J. Scholten Way, and D. B. M. Klaassen Way. Physical background of mos model 11 level 1101. - [65] M. Y. Lanzerotti, G. Fiorenza, and R. A. Rand. 
Microminiature packaging and integrated circuitry: The work of e. f. rent, with an application to onchip interconnection requirements. *IBM Journal of Research and Development*, 49(4.5):777–803, July 2005. - [66] P. M. Lenahan, K. L. Brower, P. V. Dressendorfer, and W. C. Johnson. Radiation-induced trivalent silicon defect buildup at the si-sio2 interface in mos structures. *IEEE Transactions on Nuclear Science*, 28(6):4105–4106, Dec 1981. - [67] P. Linczuk, R. Krawczyk, A. Wojenski, W. Zabolotny, M. Chernyshova, K. Pozniak, T. Czarski, M. Gaska, G. Kasprowicz, P. Kolasinski, E. Kowalska-Strzeciwilk, and K. Malinowski. Latency and throughput of online processing in soft x-ray gem-based measurement system. *Journal of Instrumentation*, 14(05):C05001, 2019. - [68] G Lindström et al. Radiation hard silicon detectors—developments by the rd48 (rose) collaboration. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment,* 466(2):308 326, 2001. 4th Int. Symp. on Development and Application of Semiconductor Tracking Detectors. - [69] Lucie Linssen, Akiya Miyamoto, Marcel Stanitzki, and Harry Weerts. Physics and Detectors at CLIC: CLIC Conceptual Design Report. 2012. - [70] C. Liu, S. Chang, G. Huang, and Y. Lin. A 10-bit 50-ms/s sar adc with a monotonic capacitor switching procedure. *IEEE Journal of Solid-State Circuits*, 45(4):731–740, April 2010. [71] C. Liu, S. Chang, G. Huang, Y. Lin, C. Huang, C. Huang, L. Bu, and C. Tsai. A 10b 100ms/s 1.13mw sar adc with binary-scaled error compensation. In 2010 *IEEE International Solid-State Circuits Conference - (ISSCC)*, pages 386–387, Feb 2010. - [72] J. Liu, Y. Li, R. Zhang, W. Yang, Y. Wang, D. Fu, G. Chen, and R. Li. Development of a radiation-hardened standard cell library for 65nm cmos technology. In 2016 China Semiconductor Technology International Conference (CSTIC), pages 1–3, March 2016. - [73] M. Mager. 
Alpide, the monolithic active pixel sensor for the alice its upgrade. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 824:434 – 438, 2016. Frontier Detectors for Frontier Physics: Proceedings of the 13th Pisa Meeting on Advanced Detectors. - [74] Roland Marbot et al. Field effect switching circuit. U.S Patent US5614841A, Bull S.A., Puteaux, France, 1997. - [75] Masaya Miyahara, Yusuke Asada, Daehwa Paik, and Akira Matsuzawa. A low-noise self-calibrating dynamic comparator for high-speed adcs. In 2008 IEEE Asian Solid-State Circuits Conference, pages 269–272, Nov 2008. - [76] J.M. McGarrity, P.S. Winokur, H.E. Boesch, and F.B. McLean. Interface states resulting from a hole flux incident on on the sio2/si interface. In SOKRATES T. PANTELIDES, editor, *The Physics of SiO2 and its Interfaces*, pages 428 432. Pergamon, 1978. - [77] M. Moll. Displacement damage in silicon detectors for high energy physics. *IEEE Transactions on Nuclear Science*, 65(8):1561–1582, Aug 2018. - [78] Boris Murmann. ADC Survey. <a href="https://web.stanford.edu/~murmann/adcsurvey.html">https://web.stanford.edu/~murmann/adcsurvey.html</a>, subtitle = Excel sheet online, note = Accessed: February 12, 2020, 2018. - [79] S. R. Norsworthy, R. Schreier, and G. C. Temes. *Analog Circuit Design for ADCs*. IEEE, 1997. - [80] Burak Okcan, Patrick Merken, Georges Gielen, and Chris Van Hoof. A cryogenic analog to digital converter operating from 300 k down to 4.4 k. *The Review of scientific instruments*, 81 2:024702, 2010. - [81] T R Oldham, F B McLean, H E Boesch, and J M McGarrity. An overview of radiation-induced interface traps in MOS structures. *Semiconductor Science and Technology*, 4(12):986–999, dec 1989. [82] D. Paik, M. Miyahara, and A. Matsuzawa. An analysis on a pseudo-differential dynamic comparator with load capacitance calibration. In 2011 9th IEEE International Conference on ASIC, pages 461–464, Oct 2011. 
- [83] Sherwood Parker. A proposed VLSI pixel device for particle detection. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 275(3):494–516, 1989.
- [84] B. Patra, R. M. Incandela, J. P. G. van Dijk, H. A. R. Homulle, L. Song, M. Shahmohammadi, R. B. Staszewski, A. Vladimirescu, M. Babaie, F. Sebastiano, and E. Charbon. Cryo-CMOS circuits and systems for quantum computing applications. *IEEE Journal of Solid-State Circuits*, 53(1):309–321, Jan 2018.
- [85] I. Perić et al. High-voltage pixel sensors for ATLAS upgrade. *Nucl. Instrum. Meth.*, A765:172–176, 2014.
- [86] I. Peric. A novel monolithic pixel detector implemented in high-voltage CMOS technology. In 2007 IEEE Nuclear Science Symposium Conference Record, volume 2, pages 1033–1039, Oct 2007.
- [87] I. Peric. A novel monolithic pixel detector implemented in high-voltage CMOS technology. In 2007 IEEE Nuclear Science Symposium Conference Record, volume 2, pages 1033–1039, Oct 2007.
- [88] I. Perić, R. Blanco, R. Casanova Mohr, F. Ehrler, F. Guezzi Messaoud, C. Krämer, R. Leys, M. Prathapan, R. Schimassek, A. Schöning, E. Vilella Figueras, A. Weber, and H. Zhang. Status of HVCMOS developments for ATLAS. *Journal of Instrumentation*, 12(02):C02030–C02030, Feb 2017.
- [89] I. Peric, L. Blanquart, G. Comes, P. Denes, K. Einsweiler, P. Fischer, E. Mandelli, and Gerrit Jan Meddeler. The FEI3 readout chip for the ATLAS pixel detector. *Nucl. Instrum. Meth.*, A565:178–187, 2006.
- [90] Ivan Peric. Design and Realisation of Integrated Circuits for the Readout of Pixel Sensors in High Energy Physics and Biomedical Imaging, 2004. Presented 2004.
- [91] Ivan Peric, Mridula Prathapan, Heiko Augustin, Mathieu Benoit, Raimon Casanova Mohr, Dominik Dannheim, Felix Ehrler, Fadoua Guezzi Messaoud, Moritz Kiehn, Andreas Nürnberg, Rudolf Schimassek, Mateus Vicente Barreto, Eva Vilella Figueras, Alena Weber, Winnie Wong, and Hui Zhang.
A high-voltage pixel sensor for the ATLAS upgrade. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 924:99–103, 2019. 11th International Hiroshima Symposium on Development and Application of Semiconductor Tracking Detectors.
- [92] H. Pernegger. Specification for ATLAS CMOS monolithic sensor. CERN internal document, 2016.
- [93] H. Pernegger, R. Bates, C. Buttar, M. Dalla, J.W. van Hoorne, T. Kugathasan, D. Maneuski, L. Musa, P. Riedler, C. Riegel, C. Sbarra, D. Schaefer, E.J. Schioppa, and W. Snoeys. First tests of a novel radiation hard CMOS sensor process for depleted monolithic active pixel sensors. *Journal of Instrumentation*, 12(06):P06008–P06008, Jun 2017.
- [94] T. Poikela, M. De Gaspari, J. Plosila, T. Westerlund, R. Ballabriga, J. Buytaert, M. Campbell, X. Llopart, K. Wyllie, V. Gromov, M. van Beuzekom, and V. Zivkovic. VeloPix: the pixel ASIC for the LHCb upgrade. *Journal of Instrumentation*, 10(01):C01057–C01057, Jan 2015.
- [95] M. Prathapan, M. Benoit, R. Casanova, D. Dannheim, F. Ehrler, M. Kiehn, A. Nürnberg, P. Pangaud, R. Schimassek, E. Vilella, A. Weber, W. Wong, H. Zhang, and I. Perić. Towards the large area HVCMOS demonstrator for ATLAS ITk. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 936:389–391, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors.
- [96] Mridula Prathapan et al. Large area HVCMOS pixel sensor prototype for ATLAS detector upgrade. https://www.dpg-verhandlungen.de/year/2018/conference/wuerzburg/part/t/session/5/contribution/2, 2018. Accessed: February 12, 2020.
- [97] Mridula Prathapan et al. ATLASpix3: A high voltage CMOS sensor chip designed for ATLAS Inner Tracker. https://indi.to/W8Z45, 2019. Accessed: February 12, 2020.
- [98] Mridula Prathapan et al.
Design of a HVCMOS pixel sensor ASIC with on-chip readout electronics for ATLAS ITk Upgrade. *PoS*, TWEPP2018:074, 2019.
- [99] Mridula Prathapan et al. Design of HVCMOS pixel sensor chips for ATLAS ITk upgrade. https://publikationen.bibliothek.kit.edu/1000091693, 2019. Accessed: February 12, 2020.
- [100] L.-Å. Ragnarsson and P. Lundgren. Electrical characterization of $P_b$ centers in (100)Si-SiO<sub>2</sub> structures: The influence of surface potential on passivation during post metallization anneal. *Journal of Applied Physics*, 88:938–942, July 2000.
- [101] B. Razavi. Building Blocks of Data Conversion Systems. IEEE, 1995.
- [102] B. Razavi. The StrongARM latch. *IEEE Solid-State Circuits Magazine*, 7:12–17, 2015.
- [103] B. Razavi and B. A. Wooley. Design techniques for high-speed, high-resolution comparators. *IEEE Journal of Solid-State Circuits*, 27(12):1916–1926, Dec 1992.
- [104] G. Aglieri Rinella, D. Alvarez Feito, R. Arcidiacono, C. Biino, S. Bonacini, A. Ceccucci, S. Chiozzi, E. Cortina Gil, A. Cotta Ramusino, J. Degrange, M. Fiorini, E. Gamberini, A. Gianoli, J. Kaplon, A. Kluge, A. Mapelli, F. Marchetto, E. Minucci, M. Morel, J. Noël, M. Noy, L. Perktold, M. Perrin-Terrin, P. Petagna, F. Petrucci, K. Poltorak, G. Romagnoli, G. Ruggiero, B. Velghe, and H. Wahl. The NA62 GigaTracker. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 845:147–149, 2017. Proceedings of the Vienna Conference on Instrumentation 2016.
- [105] D. Ristè, C. C. Bultink, K. W. Lehnert, and L. DiCarlo. Feedback control of a solid-state qubit using high-fidelity projective measurement. *Physical Review Letters*, 109(24), Dec 2012.
- [106] D. Ristè and L. DiCarlo. Digital feedback in superconducting quantum circuits, 2015.
- [107] D. D. Russel. Field effect switching circuit. U.S. Patent 3,448,293, Schneider Electric Systems USA Inc, 1966.
- [108] R. Schimassek, R. Blanco, R. Casanova, F. Ehrler, I.
Perić, E. Vilella, and H. Zhang. Monolithic sensors in LFoundry technology: Concepts and measurements. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 936:679–680, 2019. Frontier Detectors for Frontier Physics: 14th Pisa Meeting on Advanced Detectors.
- [109] Rudolf Schimassek. ReadOut Modelling Environment (ROME). https://git.scc.kit.edu/jl1038/Readout\_Simulation, 2019. Accessed: February 12, 2020.
- [110] J. R. Schwank, M. R. Shaneyfelt, D. M. Fleetwood, J. A. Felix, P. E. Dodd, P. Paillet, and V. Ferlet-Cavrois. Radiation effects in MOS oxides. *IEEE Transactions on Nuclear Science*, 55(4):1833–1853, Aug 2008.
- [111] J. R. Schwank, P. S. Winokur, P. J. McWhorter, F. W. Sexton, P. V. Dressendorfer, and D. C. Turpin. Physical mechanisms contributing to device "rebound". *IEEE Transactions on Nuclear Science*, 31(6):1434–1438, Dec 1984.
- [112] M. R. Shaneyfelt, J. R. Schwank, D. M. Fleetwood, P. S. Winokur, K. L. Hughes, and F. W. Sexton. Field dependence of interface-trap buildup in polysilicon and metal gate MOS devices. *IEEE Transactions on Nuclear Science*, 37(6):1632–1640, Dec 1990.
- [113] A. Shikata, R. Sekimoto, T. Kuroda, and H. Ishikuro. A 0.5V 1.1MS/sec 6.3fJ/conversion-step SAR-ADC with tri-level comparator in 40nm CMOS. In 2011 Symposium on VLSI Circuits - Digest of Technical Papers, pages 262–263, June 2011.
- [114] Peter W. Shor. Scheme for reducing decoherence in quantum computer memory. *Phys. Rev. A*, 52:R2493–R2496, Oct 1995.
- [115] W. Snoeys et al. A process modification for CMOS monolithic active pixel sensors for enhanced depletion, timing performance and radiation tolerance. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 871:90–96, 2017.
- [116] V. Sola, R. Arcidiacono, A. Bellora, N. Cartiglia, F. Cenna, R. Cirio, S. Durando, M. Ferrero, Z. Galloway, B. Gruey, P. Freeman, M.
Mashayekhi, M. Mandurrino, V. Monaco, R. Mulargia, M.M. Obertino, F. Ravera, R. Sacchi, H. F-W. Sadrozinski, A. Seiden, N. Spencer, A. Staiano, M. Wilder, N. Woods, and A. Zatserklyaniy. Ultra-fast silicon detectors for 4D tracking. *Journal of Instrumentation*, 12(02):C02072–C02072, Feb 2017.
- [117] Helmuth Spieler. *Semiconductor detector systems*. Semiconductor Science and Technology. Oxford Univ. Press, Oxford, 2005.
- [118] A. M. Steane. Error correcting codes in quantum theory. *Phys. Rev. Lett.*, 77:793–797, Jul 1996.
- [119] L. Steffen, Y. Salathe, M. Oppliger, P. Kurpiers, M. Baur, C. Lang, C. Eichler, G. Puebla-Hellmann, A. Fedorov, and A. Wallraff. Deterministic quantum teleportation with feed-forward in a solid state system. *Nature*, 500(7462):319–322, 2013.
- [120] DMS Sultan, S. Gonzalez Sevilla, D. Ferrere, G. Iacobucci, E. Zaffaroni, W. Wong, M.V. Barrero Pinto, M. Kiehn, M. Prathapan, F. Ehrler, et al. Electrical characterization of AMS aH18 HV-CMOS after neutrons and protons irradiation. *Journal of Instrumentation*, 14(05):C05003–C05003, May 2019.
- [121] V. Tripathi and B. Murmann. A 160 MS/s, 11.1 mW, single-channel pipelined SAR ADC with 68.3 dB SNDR. In *Proceedings of the IEEE 2014 Custom Integrated Circuits Conference*, pages 1–4, Sep. 2014.
- [122] J. Tsai, H. Wang, Y. Yen, C. Lai, Y. Chen, P. Huang, P. Hsieh, H. Chen, and C. Lee. A 0.003 mm<sup>2</sup> 10 b 240 MS/s 0.7 mW SAR ADC in 28 nm CMOS with digital error correction and correlated-reversed switching. *IEEE Journal of Solid-State Circuits*, 50(6):1382–1398, June 2015.
- [123] R Turchetta, J.D Berst, B Casadei, G Claus, C Colledani, W Dulinski, Y Hu, D Husson, J.P Le Normand, J.L Riester, G Deptuch, U Goerlach, S Higueret, and M Winter. A monolithic active pixel sensor for charged particle tracking and imaging using standard VLSI CMOS technology.
*Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, 458(3):677–689, 2001.
- [124] F. van der Goes, C. Ward, S. Astgimath, H. Yan, J. Riley, J. Mulder, S. Wang, and K. Bult. 11.4 A 1.5mW 68dB SNDR 80MS/s 2x interleaved SAR-assisted pipelined ADC in 28nm CMOS. In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pages 200–201, Feb 2014.
- [125] G. Van der Plas, S. Decoutere, and S. Donnay. A 0.16pJ/conversion-step 2.5mW 1.25GS/s 4b ADC in a 90nm digital CMOS process. In 2006 IEEE International Solid State Circuits Conference Digest of Technical Papers, pages 2310–, Feb 2006.
- [126] B. Verbruggen, M. Iriguchi, and J. Craninckx. A 1.7mW 11b 250MS/s 2x interleaved fully dynamic pipelined SAR ADC in 40nm digital CMOS. In *2012 IEEE International Solid-State Circuits Conference*, pages 466–468, Feb 2012.
- [127] Bob Verbruggen. *Digitally Assisted Analog to Digital Converters*, pages 25–44. Springer International Publishing, Cham, 2015.
- [128] D. R. Ward, D. E. Savage, M. G. Lagally, S. N. Coppersmith, and M. A. Eriksson. Integration of on-chip field-effect transistor switches with dopantless Si/SiGe quantum dots for high-throughput testing. *Applied Physics Letters*, 102(21):213107, May 2013.
- [129] H. Wei, C. Chan, U. Chio, S. Sin, S. U, R. P. Martins, and F. Maloberti. An 8-b 400-MS/s 2-b-per-cycle SAR ADC with resistive DAC. *IEEE Journal of Solid-State Circuits*, 47(11):2763–2772, Nov 2012.
- [130] N. Wermes, L. Rossi, P. Fischer, and T. Rohe. *Pixel Detectors, From Fundamentals to Applications*. Springer-Verlag, 2006.
- [131] Norbert Wermes. Pixel vertex detectors, 2006.
- [132] Norbert Wermes. Pixel detectors ... where do we stand? *Nucl. Instrum. Meth.*, A924:44–50, 2019.
- [133] A. X. Widmer and P. A. Franaszek. A DC-balanced, partitioned-block, 8b/10b transmission code. *IBM Journal of Research and Development*, 27(5):440–451, Sep. 1983.
- [134] Dirk Wiedner et al. Readout Electronics for the First Large HV-MAPS Chip for Mu3e. *PoS*, TWEPP-17:099, 2018.
- [135] Wikipedia contributors. 8b/10b encoding. Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=8b/10b\_encoding&oldid=916157423, 2019. [Online; accessed 22-October-2019].
- [136] B. Wu, S. Zhu, B. Xu, and Y. Chiu. A 24.7 mW 65 nm CMOS SAR-assisted CT $\Delta\Sigma$ modulator with second-order noise coupling achieving 45 MHz bandwidth and 75.3 dB SNDR. *IEEE Journal of Solid-State Circuits*, 51(12):2893–2905, Dec 2016.
- [137] You-Kuang Chang, Chao-Shiun Wang, and Chorng-Kuang Wang. A 8-bit 500-kS/s low power SAR ADC for bio-medical applications. In 2007 IEEE Asian Solid-State Circuits Conference, pages 228–231, Nov 2007.
- [138] C. Zhang, F. Jazaeri, A. Pezzotta, C. Bruschini, G. Borghello, F. Faccio, S. Mattiazzo, A. Baschirotto, and C. Enz. Characterization of gigarad total ionizing dose and annealing effects on 28-nm bulk MOSFETs. *IEEE Transactions on Nuclear Science*, 64(10):2639–2647, Oct 2017.
- [139] H. Zhang, M. Caselle, B. Leyrer, U. Bauer, P. Pfistner, and I. Perić. Radiation hard active pixel sensor with 25µm × 50µm pixel size designed for capacitive readout with RD53 ASIC. *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, page 162760, 2019.