## Broadband Circuits for High-Speed Short Reach Optical Transceivers

Zur Erlangung des akademischen Grades eines

# DOKTORS DER INGENIEURWISSENSCHAFTEN (Dr.-Ing.)

von der KIT-Fakultät für Elektrotechnik und Informationstechnik des Karlsruher Instituts für Technologie (KIT)

angenommene

#### DISSERTATION

von

#### M.Sc. Christian Bohn

geb. in Balingen

Tag der mündlichen Prüfung:

23.04.2024

Hauptreferent: Prof. Dr.-Ing. Ahmet Çağrı Ulusoy

Korreferent: Prof. Dr. Ivan Perić

### Zusammenfassung

Mit der Zunahme der weltweiten Internetnutzer und des Datenverkehrs ist der Bedarf an Fortschritten beim Datendurchsatz in Rechenzentren dringender denn je. Ethernet-Standards wie IEEE802.3bs und IEEE802.3dj nutzen PAM-4-Signale um die Datenrate im Vergleich zu NRZ Verbindungen mit gleicher Bandbreite zu erhöhen und definieren die Rahmenbedingungen für intensitätsmodulierte Übertragung mit 200 Gb/s pro Kanal. Diese Arbeit soll Erkenntnisse und Fortschritte für Schaltungen liefern, die für diese Geschwindigkeitsstufen geeignet sind. Das Ziel ist die Untersuchung kompakter und effizienter Transceiver-Frontends für 100 GBd PAM-4-Signalisierung.

Für die Untersuchung von 100 GBd PAM-4-Sendern werden Treiberschaltungen mit integrierter PAM-4-Leistungskombination betrachtet. Sie arbeiten rein analog ohne einen synchronen Taktgeber und sind für die Ansteuerung von silizium-organisch hybriden Mach-Zehnder-Modulatoren vorgesehen. Zunächst werden zwei Versionen von verteilten Treibern entwickelt. Zwei Wanderwellenverstärker werden auf einer gemeinsamen Ausgangsleitung kombiniert und als PAM-4-Kombinierer verwendet. Es werden zwei Methoden zur Verringerung der Eingangskapazität und damit zur Erweiterung der Bandbreite verglichen, wobei eine Bandbreite von 81 GHz für eine Kaskode mit kapazitivem Teiler und 102 GHz für eine Kaskodenverstärkerzelle mit Stromgegenkopplung erreicht wird. Sie verbrauchen nur 118 mW und 122 mW für die jeweiligen Varianten. 40 GBd mit etwa 600 mV<sub>pp</sub> verifizieren das Prinzip für beide Varianten, wobei die Variante mit kapazitivem Teiler auch bis 70 GBd verifiziert wurde.

Eine zweite Treiber-Topologie wurde implementiert, da die verteilten Treiber das Ziel von 100 GBd nicht erreicht haben. Dieser Treiber basiert auf einer differenziellen Ausgangsstufe mit Stromaddition, wodurch er kompakter als die verteilte Variante ist. Außerdem ist diese Art der Schaltung mit differenziellen Entzerrungs- oder Serialisierungsschaltungen kompatibel. Um den Treiber mit zwei differenziellen Datenströmen aus externen nicht-differenziellen Quellen zu versorgen, sind Breitband-Baluns mit einer Bandbreite von mehr als 67 GHz auf dem Chip enthalten. Mit 23 mW verbrauchen sie nur einen Bruchteil der gesamten 315 mW des PAM-4-Kombinators. Die Ausgangsstufe zeigte 100 GBd mit 2 V<sub>pp,diff</sub> bei einer 100  $\Omega_{diff}$  Last, was sie zu einem geeigneten Treiber für SOH MZMs macht. Alle Treiber unterstützen einen einstellbaren PAM-4 Levelabstand zur Vorverzerrung der Übertragungsfunktion des MZM.

Eine Fotodiode wird in der Regel als Detektor in der optischen Kommunikation eingesetzt. Der von der Diode erzeugte Fotostrom wird durch einen Transimpedanzverstärker (TIA) in eine Spannung gewandelt und verstärkt. Im Rahmen dieser Arbeit wird ein linearer mehrstufiger TIA für die direkte Detektion von 100 GBd PAM-4 mit einer einzelnen Photodiode entwickelt. Es wurde eine Transimpedanz von  $72 \, dB \,\Omega$  mit einer Bandbreite von mehr als 67 GHz und einer geringen Gruppenlaufzeitsvariation erreicht. Besonderes Augenmerk liegt in dieser Schaltung auf einer neuen und verbesserten Eingangsgleichstromsenke. Die modifizierte Senke ermöglicht einen um 2 dB erweiterten Dynamikbereich mit nur minimalen Auswirkungen auf die Hochfrequenzeigenschaften. Zwei Verstärker mit variabler Verstärkung, die der Eingangsstufe nachgeschaltet sind, ermöglichen eine konstante Ausgangsamplitude und halten die harmonischen Verzerrungen bei einer Eingangsstromdynamik von 20 dB unter 3 %. Automatische und manuelle Verstärkungseinstellung, DC-Offset-Kompensation, ein abstimmbarer Frequenzgang und ein Ausgangstreiber für 400 mV<sub>pp,diff</sub> vervollständigen die Funktionalität für ein Empfängersystem bei einer Gesamtleistungsaufnahme von 193 mW.

Die Kombination von Silizium-Photonik und Silizium-Elektronik auf einem einzigen Chip verspricht ultrakompakte Transceiver-Systeme. Daher wurde eine Implementierung für einen monolithischen, mehrkanaligen, kohärenten optoelektronischen Empfänger untersucht. Der kohärente Empfänger verwendet zwei Fotodioden, die eine differenzielle Eingangsstufe speisen. Zwei Regelkreise dienen zur Arbeitspunkteinstellung und steuern herkömmliche NMOS-Stromsenken zur Eingangsgleichstromkompensation. Mit 39 GHz Bandbreite, 10 dB Einstellbereich der Verstärkung, nur 1.7  $\mu$ A<sub>rms</sub> eingangsbezogenem Rauschstrom und 400 mV<sub>pp,diff</sub> Ausgangsamplitude hat er das Potenzial, in Systemen mit hoher Datenrate eingesetzt zu werden. Der TIA ist für den Betrieb mit einer einzigen Versorgungsspannung vorbereitet und verfügt über Spannungs- und Stromreferenzen. Ein Kanal verbraucht 174 mW, wodurch die Verlustleistung für Mehrkanalempfänger überschaubar bleibt.

Ein Serialisierer und ein breitbandiger Frequenzverdoppler wurden als unterstützende Schaltungen für Hochgeschwindigkeits-Frontends entwickelt. Der 8:1-Serialisierer mit einer Ausgangsdatenrate von 10.4 Gb/s basiert auf einer optimierten Kombination aus CMOS- und BiCMOS-CML-Logikzellen. Die Leistungs- und Flächenanforderungen werden durch diese optimierte Designstrategie minimiert. Der Serialisierer, inklusive Balun für den Takteingang, benötigt nur eine Fläche von 0.025 mm<sup>2</sup> für den Schaltungskern und hat eine Leistungsaufnahme von 54 mW, wobei 68 % davon im Ausgangstreiber verbraucht werden.

Ein Push-Push-Frequenzverdoppler wurde für eine große Betriebsbandbreite und differenzielle Ausgänge modifiziert. Der Chip enthält einen integrierten Balun, um die Grundschwingung 18.5 dB unterhalb der zweiten Harmonischen über die Ausgangsbandbreite von 5 GHz bis 80 GHz zu halten. Mit seiner differenziellen Ausgangsmodifikation kann der Verdoppler direkt als Taktmultiplikator oder flexible lokale Oszillatorquelle in Multiband-HF-Transceivern verwendet werden.

### Abstract

With the growing number of internet users and the increasing data traffic worldwide, the need for advances in data center throughput is more pressing than ever. Ethernet standards like IEEE 802.3bs and IEEE 802.3dj move to PAM-4 signaling to increase the data rate compared to NRZ transmission of the same bandwidth and prepare the ground for 200 Gb/s in intensity-modulated single-fiber links. This work aims to provide insights and advancements for circuits usable in these speed grades. The goal is to investigate compact and efficient transceiver front ends for 100 GBd PAM-4 signaling.

For the 100 GBd PAM-4 transmitters, driver circuits with integrated PAM-4 power combination are investigated. They operate purely analog without a synchronous clock and are intended to drive silicon-organic hybrid (SOH) Mach-Zehnder modulators (MZMs). First, two versions of distributed drivers are developed. Two traveling-wave amplifiers are combined on a common output line and used as PAM-4 combiners. Two methods of input capacitance reduction and, therefore, bandwidth extension are compared, achieving a bandwidth of 81 GHz for a capacitive-divider cascode and 102 GHz for an emitter-degenerated cascode gain cell. The two variants consume only 118 mW and 122 mW, respectively. Measurements at 40 GBd with approximately 600 mV<sub>pp</sub> verify the principle for both variants, and the capacitive-divider variant was also verified up to 70 GBd.

A second driver topology is implemented since the distributed drivers did not reach the 100 GBd target. This driver is based on a differential current combining output stage, making it more compact than a distributed variant. Additionally, this stage is compatible with differential equalization or serialization circuitry. To supply two differential data streams to the driver from external single-ended sources, broadband baluns with a bandwidth of more than 67 GHz are included on the die. With 23 mW, they consume only a fraction of the PAM-4 combiner's total 315 mW. The output stage showed 100 GBd with 2 V<sub>pp,diff</sub> into standard 100  $\Omega_{diff}$ , making it a suitable driver for SOH MZMs. All drivers support an adjustable level spacing for pre-distortion of the MZM's transfer function.

A photodiode is commonly used as a detector in optical communications. The diode-generated photocurrent is converted to a voltage and amplified by a transimpedance amplifier (TIA). In the scope of this thesis, a linear multi-stage TIA for the direct detection of 100 GBd PAM-4 with a single photodiode is developed. A 72 dBΩ transimpedance with more than 67 GHz bandwidth and low group delay variation was implemented. Particular focus is on a new and improved input DC current sink. The modified sink enables a 2 dB enhanced dynamic range with minimal impact on the high-frequency performance. Two variable gain amplifiers following the input stage enable a constant output amplitude and keep the total harmonic distortion below 3 % for 20 dB input current variation. Automatic and manual gain control, DC offset compensation, a tuneable frequency response, and an output driver for 400 mV<sub>pp,diff</sub> complete the functionality for a receiver system while consuming a total of 193 mW.

Combining silicon photonics with silicon electronics on a single die promises ultra-compact transceiver systems. Therefore, an implementation for a multichannel coherent monolithic optoelectronic receiver was investigated. The coherent receiver uses two photodiodes feeding a differential input stage. Dual control loops sink the DC currents with a conventional NMOS current sink. With 39 GHz bandwidth, 10 dB controllable gain, only 1.7  $\mu$ A<sub>rms</sub> input-referred noise current, and 400 mV<sub>pp,diff</sub> output amplitude, it has the potential to be used in high data rate systems. The TIA is prepared for single-supply operation and features voltage and current references. One channel consumes 174 mW, keeping power dissipation manageable for multi-channel receivers.

A serializer and a broadband frequency doubler are designed as supporting circuits for high-speed front ends. The 8:1 serializer with 10.4 Gb/s output data rate uses an optimized combination of CMOS and BiCMOS CML logic cells. Its power and area requirements are minimized using this design strategy. The serializer, including the clock balun, occupies a core area of 0.025 mm<sup>2</sup> and consumes 54 mW, of which 68 % is used in the output driver.

A push-push frequency doubler is modified for an extensive operational bandwidth and differential outputs. The die includes an integrated balun to keep the fundamental 18.5 dB below the second harmonic over the conversion gain output bandwidth from 5 GHz to 80 GHz. With its differential output modification, the doubler can be directly used as a clock multiplier or flexible local oscillator source in multi-band RF transceivers.

## Acknowledgment

First, I would like to express my sincere gratitude to Prof. Dr.-Ing. Ahmet Çağrı Ulusoy for his supervision during this work and for giving me the opportunity to work on my ideas. I would also like to thank Prof. Dr. Ivan Perić for acting as co-referee for this thesis. Furthermore, I would like to thank Prof. Dr.-Ing. Dr. h.c. Thomas Zwick, who initially gave me the chance to begin my work at IHE.

I am also grateful to the technical staff at IHE, Andreas Lipp, Thorsten Fux, Ronald Vester, Mirko Nonnenmacher, and Andreas Gallego for their help with preparing measurement samples. Thanks should also go to Simone Gorre, Marion Jentzsch, and Anglea Ziemba for helping to manage everything related to the university bureaucracy. My special thanks go to Jonathan, Jerzy, Joachim, Matthias, Kateryna, Kaan, Tsung-Ching, and Lucas, with whom I shared the office and numerous fruitful discussions during the time of my work. Moreover, I would like to extend my sincere thanks to all my colleagues at IHE for the wonderful time we had working together.

A very special and personal thanks goes to Anja for her never-ending support.

Weinstadt, February 2024

Christian Bohn

## Contents

- Zusammenfassung
- Abstract
- Acknowledgment
- 1 Introduction
  - 1.1 Electro-Optical Communication Systems
  - 1.2 Modulation Formats for IM/DD Optical Communications
  - 1.3 Design Concepts for the Broadband Circuits in This Work
  - 1.4 Technologies for High-Speed Optical Communication Systems
  - 1.5 Organization and Goals of This Work
- 2 Driver Circuits with Integrated PAM-4 Combination
  - 2.1 Distributed Driver with Analog Power Combining
    - 2.1.1 Unit Cell Design for a Distributed Power Combiner
    - 2.1.2 Line Termination for Distributed Amplifiers
    - 2.1.3 Experimental Results
    - 2.1.4 Conclusion
  - 2.2 Fully-Differential Driver
    - 2.2.1 Broadband Single-Ended to Differential Converter
    - 2.2.2 PAM-4 Driver and Combiner Stage
    - 2.2.3 Experimental Results
  - 2.3 Conclusion
- 3 Linear Transimpedance Amplifier for Short Reach PAM-4 Receivers
  - 3.1 High-Speed Transimpedance Input Stage
    - 3.1.1 Design of the Common-Emitter Shunt Feedback Stage
  - 3.2 Linearity Considerations and Post-Amplifiers
    - 3.2.1 Variable Gain Amplifiers
    - 3.2.2 Diode Based Shunt At The Input
  - 3.3 Design of Control Loops and Control Circuits
    - 3.3.1 First Stage Biasing
    - 3.3.2 Automatic Transimpedance Control
  - 3.4 Experimental Results
    - 3.4.1 Control Loops
    - 3.4.2 RF Performance
    - 3.4.3 Noise
  - 3.5 Conclusion
- 4 Differential EPIC Receiver for Coherent Communications
  - 4.1 Differe…
  - 4.2 Post-Amplifier Section
  - 4.3 DC-Current Sink and Offset Cancellation Loops
  - 4.4 Final Receiver Channel
  - 4.5 Conclusion
- 5 Broadband Data and Clock Generation
  - 5.1 Design of a 10.4 Gb/s Serializer in IHP's SG13G2 technology
    - 5.1.1 CMOS 4:1 Serializer
    - 5.1.2 The BiCMOS CML-Latch
    - 5.1.3 CMOS - CML Integration
    - 5.1.4 Implementation and Test of the 10.4 Gb/s Serializer Test Chip
    - 5.1.5 Conclusion
  - 5.2 Investigation of a Broadband Differential Frequency Doubler
    - 5.2.1 Input Balun
    - 5.2.2 Broadband Doubler Implementation
    - 5.2.3 Experimental Results
    - 5.2.4 Conclusion
- 6 Packaging for Broadband Applications
  - 6.1 Wire Bonds
  - 6.2, 6.3, 6.4 DC Blocking
- 7 Conclusions and Outlook
- A Second Order Low-Pass Transfer Functions
- B Temperature and Supply insensitive Biasing
  - B.1 Design of Bandgap References in SiGe
    - B.1.1 Design of a Voltage Reference in IHP's SG13G2 technology
    - B.1.2 Design of a Voltage Reference for the IHP EPIC Process
  - B.2 Current Mirrors
  - B.3 Reference Current Generator
- C Control Circuits used in This Work
  - C.1 Operational Amplifiers
  - C.2 Switches
- Bibliography
- Own Publications

## **Acronyms and Symbols**

### Acronyms

| Acronym | Meaning                                                  |
|---------|----------------------------------------------------------|
| IHE     | Institute of Radio Frequency Engineering and Electronics |
| IHP     | Leibniz Institute for High Performance Microelectronics  |
| MOS     | metal-oxide-semiconductor                                |
| MOSFET  | MOS field-effect transistor                              |
| NMOS    | n-type MOSFET                                            |
| PMOS    | p-type MOSFET                                            |
| SiGe    | silicon germanium                                        |
| HBT     | heterojunction bipolar transistor                        |
| CMOS    | complementary metal-oxide-semiconductor                  |
| BiCMOS  | bipolar CMOS                                             |
| CML     | current mode logic                                       |
| IC      | integrated circuit                                       |
| RFIC    | radio-frequency integrated circuit                       |
| PIC     | photonic integrated circuit                              |
| EPIC    | electronic-photonic integrated circuit                   |
| BEOL    | back end of line (the IC's metal stack)                  |
| FEOL    | front end of line (the IC's active layers)               |
| IM/DD   | intensity modulation direct detection                    |
| PAM     | pulse amplitude modulation                               |
| NRZ     | non-return-to-zero                                       |
| OOK     | on-off keying                                            |
| MZM     | Mach-Zehnder modulator                                   |
| MMI     | multimode interference                                   |
| PD      | photodiode                                               |
| VCSEL   | vertical-cavity surface-emitting laser                   |
| TIA     | transimpedance amplifier                                 |
| VGA     | variable gain amplifier                                  |
| BALUN   | balanced-unbalanced or balancing unit                    |
| AGC     | automatic gain control                                   |
| DC      | direct current                                           |
| AC      | alternating current                                      |
| EM      | electromagnetic                                          |
| SE      | single-ended                                             |
| THD     | total harmonic distortion                                |
| RLM     | relative level mismatch                                  |
| SNR     | signal to noise ratio                                    |
| TWA     | traveling wave amplifier                                 |
| DA      | distributed amplifier                                    |
| opamp   | operational amplifier                                    |
| CPW     | coplanar waveguide                                       |
| GSG     | ground-signal-ground                                     |
| CE      | common-emitter                                           |
| CC      | common-collector                                         |
| CB      | common-base                                              |
| PTAT    | proportional to absolute temperature                     |
| CMRR    | common-mode rejection ratio                              |

### Constants

| Constant                                           | Description              |
|----------------------------------------------------|--------------------------|
| $\pi = 3.14159$                                    | pi                       |
| $c = 299\,792\,458\,\mathrm{m/s}$                  | speed of light in vacuum |
| $e = 1.602\,18 \cdot 10^{-19}\,\mathrm{C}$         | elementary charge        |
| $k_{\rm B} = 1.3807 \cdot 10^{-23}\,\mathrm{J/K}$  | Boltzmann constant       |

### Latin Symbols

| Symbol | Meaning   |
|--------|-----------|
| $f$    | Frequency |
| $BW$   | Bandwidth |

### Greek Symbols

| Symbol    | Meaning    |
|-----------|------------|
| $\lambda$ | Wavelength |
| $\varphi$ | Phase      |

## 1 Introduction

Ever-increasing worldwide internet traffic and user numbers demand steady growth and advancement in data center capacity [Cis23]. The communication within and between data centers relies heavily on fiber optics. Due to low losses compared to copper cables and very high available modulation bandwidths, this is the preferred method for high data rate communication systems. Complex modulation schemes are employed for long-haul or inter-data center links, allowing for wavelength division multiplexing, which further increases the data throughput of a single fiber link. Recently, 273.6 Tb/s over 1001 km was reported using multiple modes and wavelengths [VDHDSR<sup>+</sup>23]. In short reach links, e.g., within the data center or a car (IEEE 802.3cz), the necessary connections require more compact transmitter and receiver designs, while the total data rate is not required to be this large. Therefore, intensity modulation is used to transmit and receive data, and Ethernet standards like IEEE 802.3bs and IEEE 802.3dj move to PAM-4 signaling to increase the data rate and prepare the ground for 200 Gb/s per fiber and wavelength in intensity-modulated links [IEE22, IEE23].

This work aims to investigate and implement integrated circuits to be used in high-speed electro-optical communication systems pushing towards 100 GBd PAM-4 signaling. Components for a front-end chipset, including drivers for Mach-Zehnder modulators and transimpedance amplifiers, are investigated. Additional support circuitry, namely a serializer and a frequency doubler, is also developed. This work uses SiGe BiCMOS technologies from the Leibniz Institute for High Performance Microelectronics (IHP).

#### 1.1 Electro-Optical Communication Systems

Most circuits in this work are designed as building blocks in communication systems as sketched in Figure 1.1. The modulation format used is intensity modulation direct detection (IM/DD). A transmitter uses the input data to modulate the intensity of its output laser light, hence the name intensity modulation. For example, the most straightforward modulation format is on-off keying (OOK), where light on represents a one and light off a zero. A laser diode or a modulator can control the laser light intensity. Since the physical principles differ for these two options, both require specific driving circuitry. In a laser diode, for example a vertical-cavity surface-emitting laser (VCSEL), the current defines the output laser power, so the driver output stage has to be designed to provide a current. When using modulators, an external laser source is used, and its light intensity is controlled by absorption in an electro-absorption modulator (EAM) or interference in a Mach-Zehnder modulator (MZM). The EAM changes its absorption properties depending on the electrical field in the electro-optical active region. The light path in an MZM is split into two branches. Each branch has a variable phase, again controlled by the electrical field. After recombining both branches, constructive or destructive interference occurs depending on the electrical fields. Both modulator types therefore need a voltage at the output of the driver amplifier.

Figure 1.1: Typical setup of an IM/DD electro-optical link using a MZM on the transmitter side.

On the receiver side, a photodiode (PD) converts the signal from the optical domain back into the electrical domain. The photodiode is a reverse-biased PN or PIN diode. When it is illuminated, a current proportional to the incident optical power flows through the junction. The first stage of an electro-optical receiver is an amplifier that accepts a current as input and outputs a voltage. Its gain is therefore expressed as  $Z_{\rm T} = \frac{v_{\rm out}}{i_{\rm in}}$ , an impedance. Since this impedance transfers an input current at one port to an output voltage at a different port, this amplifier is called a transimpedance amplifier (TIA). After the TIA, post-amplification, clock and data recovery, a deserializer, and analog-to-digital converters follow, depending on the actual physical protocol.

For longer-distance connections, coherent transmission is typically of great interest. It works similarly to modern RF communication systems with complex modulation schemes and, on the receiving side, a local oscillator used for conversion to an intermediate frequency (IF) or baseband. In the case of an optical system, this local oscillator is a laser. The IF signal is then amplified by a TIA, and digital signal processing follows to recover the data.
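Transimpedance is usually quoted logarithmically in dBΩ. The following minimal Python sketch converts between the logarithmic and the linear form of $Z_{\rm T}$ and applies $v_{\rm out} = Z_{\rm T} \cdot i_{\rm in}$. The helper names and the 100 µA photocurrent swing are illustrative assumptions; only the 72 dBΩ figure is taken from the abstract.

```python
import math


def dbohm_to_ohm(z_db: float) -> float:
    """Convert a transimpedance in dBΩ (20*log10(|Z_T| / 1 Ω)) to ohms."""
    return 10.0 ** (z_db / 20.0)


def ohm_to_dbohm(z_ohm: float) -> float:
    """Convert a transimpedance in ohms to dBΩ."""
    return 20.0 * math.log10(z_ohm)


z_t = dbohm_to_ohm(72.0)   # transimpedance quoted in the abstract, about 3.98 kΩ
i_in = 100e-6              # hypothetical photocurrent swing in A (assumption)
v_out = z_t * i_in         # resulting output voltage swing, about 0.4 V

print(f"Z_T = {z_t:.0f} Ω, v_out = {v_out * 1e3:.0f} mV")
```

The example treats the amplifier as perfectly linear; gain control and compression in a real receiver would change the output swing.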

#### 1.2 Modulation Formats for IM/DD Optical Communications

Pulse amplitude modulation (PAM) is the general name for a modulation of the amplitude of pulses. It is sometimes also referred to as amplitude shift keying (ASK) in the electrical domain. The simplest form is non-return-to-zero (NRZ), ASK-2, or PAM-2, which is also referred to as OOK. Since bandwidth is a limited resource and losses and challenges due to parasitics increase with frequency, an efficient use of the available bandwidth is necessary. Increasing the modulation order to two bits per symbol doubles the data rate and spectral efficiency. The resulting four amplitude levels define a PAM-4 signal. Figure 1.2 shows two examples of eye diagrams. Histograms for the ideal sampling point and jitter are plotted adjacent to the eyes. These histograms are used to judge the signal quality. From the Nyquist-Shannon sampling theorem ( $f_{sample} = 2BW$ ), the Nyquist rate can be derived. The Nyquist rate specifies the highest symbol rate for a band-limited channel as  $f_{sym} = 2BW$ . For PAM-4 signals, the required bandwidth is therefore only a quarter of the bit rate, compared to half the bit rate with NRZ signaling [VKDKP<sup>+</sup>19]. This improvement is paid for by a reduced signal to noise ratio (SNR).

Figure 1.2: Examples of measured eye diagrams including histograms to measure the eye quality for different modulation formats: a) NRZ and b) PAM-4.
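The bandwidth relation can be checked with a short calculation. The sketch below assumes an ideal band-limited channel with $f_{sym} = 2BW$ and a 200 Gb/s target bit rate; the function names are illustrative and not taken from the thesis.

```python
import math


def symbol_rate(bit_rate: float, levels: int) -> float:
    """Symbol rate in Bd of a PAM signal with the given number of amplitude levels."""
    return bit_rate / math.log2(levels)


def nyquist_bandwidth(bit_rate: float, levels: int) -> float:
    """Minimum channel bandwidth in Hz following from f_sym = 2 * BW."""
    return symbol_rate(bit_rate, levels) / 2.0


bit_rate = 200e9  # 200 Gb/s per lane
for name, levels in (("NRZ", 2), ("PAM-4", 4)):
    f_sym = symbol_rate(bit_rate, levels)
    bw = nyquist_bandwidth(bit_rate, levels)
    print(f"{name}: {f_sym / 1e9:.0f} GBd, minimum bandwidth {bw / 1e9:.0f} GHz")
```

For 200 Gb/s this yields 200 GBd and 100 GHz for NRZ, but only 100 GBd and 50 GHz for PAM-4.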

The eye height of a PAM-4 signal is only a third of that of an NRZ signal with the same amplitude. Therefore, the SNR penalty is:

$$\mathrm{SNR}_{\text{PAM-4}} - \mathrm{SNR}_{\text{PAM-2}} = 20 \log_{10}\left(\frac{1}{3}\right) = -9.54 \, \mathrm{dB}. \tag{1.1}$$

This means the SNR per detected level is 9.5 dB lower for the same received amplitude and receiver noise [VKDKP<sup>+</sup>19, Int19]. However, when factoring in the PAM-4 symbol coding, the SNR penalty for the same bit error rate drops to about 7 dB [Säc17].
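The 9.54 dB figure follows directly from the eye-height ratio. As a minimal sketch, the calculation can be generalized to PAM-N under the assumption that N equally spaced levels split the same peak-to-peak amplitude into N - 1 eyes; this generalization and the function name are illustrative additions, not part of the thesis.

```python
import math


def pam_eye_penalty_db(levels: int) -> float:
    """Eye-height penalty of PAM-N relative to NRZ at equal peak-to-peak amplitude.

    Assumes equally spaced levels, so each of the (levels - 1) eyes has
    1 / (levels - 1) of the NRZ eye height.
    """
    return 20.0 * math.log10(1.0 / (levels - 1))


penalty_pam4 = pam_eye_penalty_db(4)  # about -9.54 dB, matching Eq. (1.1)
print(f"PAM-4 penalty: {penalty_pam4:.2f} dB")
```

The sketch only covers the amplitude-based penalty; the roughly 7 dB figure for equal bit error rate additionally depends on the symbol coding and is not modeled here.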

The quality of baseband data streams can be judged by eye diagrams and their quality factor $Q$:

$$Q_n = \frac{\mu_{n+1} - \mu_n}{\sigma_{n+1} + \sigma_n}, \qquad n \in \{0, 1, 2\} \tag{1.2}$$

where  $\mu_n$  is the mean value and  $\sigma_n^2$  the variance of amplitude level $n$ [FSN<sup>+</sup>12]. During testing, the signal-to-noise ratio SNR,  $Q^2$, and eye diagrams can be used for an error probability estimation.
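To make the definition concrete, the following sketch evaluates the quality factor for each of the three sub-eyes of a PAM-4 signal from samples taken at the sampling instant. The sample values are hypothetical, and the use of the population standard deviation is an assumption of this example.

```python
from statistics import mean, pstdev


def eye_q_factors(level_samples):
    """Return Q_n for every pair of adjacent amplitude levels.

    level_samples: one list of samples per level, ordered from the lowest
    to the highest level.
    """
    mu = [mean(samples) for samples in level_samples]
    sigma = [pstdev(samples) for samples in level_samples]
    return [
        (mu[n + 1] - mu[n]) / (sigma[n + 1] + sigma[n])
        for n in range(len(mu) - 1)
    ]


# Hypothetical samples in volts for the four PAM-4 levels (assumption).
levels = [
    [-0.31, -0.30, -0.29],
    [-0.11, -0.10, -0.09],
    [0.09, 0.10, 0.11],
    [0.29, 0.30, 0.31],
]
q = eye_q_factors(levels)  # three values, each about 12.2
print([round(value, 1) for value in q])
```

With measured data, the sample lists would be taken from the vertical histograms at the ideal sampling point of the eye diagram.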

An additional important quality metric of a PAM-4 signal is the relative level mismatch (RLM), a measure for the spacing of the individual signal levels and, therefore, usually an indicator for compression and non-linear distortion:

$$\mathrm{RLM} = 3 \, \frac{\min_n(\mu_{n+1} - \mu_n)}{\mu_3 - \mu_0}, \qquad n \in \{0, 1, 2\} \tag{1.3}$$

with the mean levels  $\mu_n$  bounding the individual sub-eyes [Int19].
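As a brief illustration of how compression shows up in this metric, the sketch below evaluates the RLM expression for an ideal and for a compressed set of levels. The level values and the function name are hypothetical.

```python
def relative_level_mismatch(mu):
    """RLM for four mean levels mu[0] < mu[1] < mu[2] < mu[3]."""
    if len(mu) != 4:
        raise ValueError("PAM-4 requires exactly four mean levels")
    spacings = [mu[n + 1] - mu[n] for n in range(3)]
    return 3.0 * min(spacings) / (mu[3] - mu[0])


# Equally spaced levels give an RLM of (numerically almost exactly) 1.
ideal = relative_level_mismatch([-0.3, -0.1, 0.1, 0.3])

# Hypothetical case with a compressed top eye, e.g. caused by gain compression.
compressed = relative_level_mismatch([-0.3, -0.1, 0.1, 0.26])

print(f"ideal: {ideal:.3f}, compressed: {compressed:.3f}")
```

In the compressed case the top eye shrinks from 0.20 V to 0.16 V, and the RLM drops to about 0.86.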

When increasing the modulation order and working with traditional multi-bit digital-to-analog converters as signal sources, the line drivers and receivers must operate linearly to avoid degraded RLM values, which increases the complexity of a communication system using PAM-4 modulation. Nevertheless, PAM-4 is used in many current and upcoming communication standards like 400GBASE-SR4 in IEEE 802.3bs [IEE22] or the 200 Gb/s per lane in IEEE 802.3dj [IEE23].

#### 1.3 Design Concepts for the Broadband Circuits in This Work

When talking about broadband circuits in this scope, we think of circuits for wireline or optical communications. These applications require operation with baseband data streams. Therefore, bandwidth is always regarded as spanning from almost DC to  $BW_{3dB}$. The restriction *almost DC* is due to DC blocks included in the packaged components to avoid DC loading or to protect connected devices from incompatible DC levels. These blocking capacitors need to be >10 nF to ensure a high-pass corner frequency in the kHz or low MHz range. The exact specification depends on the application, speed, and line code [Opt17a]. In any case, these blocking capacitors are too large to fit on the active IC's die.

The high-speed data inputs and outputs use ground-signal-ground (GSG) pads or variants like GSGSG and GSSG. These signal pads are optimized for a very large operational bandwidth, meaning the signal pad is designed to have a minimal pad capacitance and, therefore, is very small. In typical RF applications, resonant matching is used to avoid reflections and ensure maximum power transfer. Due to their inherent bandwidth limitation, resonant matching networks are not suitable for broadband circuits. Instead, matching is sometimes enforced by using appropriate load resistors in output driver stages or matching resistors at the inputs. However, the input of a transimpedance amplifier in an electro-optical receiver is deliberately designed to have a low impedance and is not matched to the photodiode.

When designing baseband circuits capable of a symbol rate of 100 GBd, one big challenge is that bandwidth and operating frequencies extend well into the millimeter-wave region. The parasitic resistance and capacitance extraction used in analog design for larger structures is not accurate enough for the applications in this work. More precise complete electromagnetic (EM) simulations suffer from very long run times or excessive computational effort due to the relatively large number of devices (e.g., resistors and transistors) in a single stage. All circuits in this work are therefore designed by combining an RC-extracted, very compact circuit core with EM-simulated interconnects.
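The requirement on the DC-blocking capacitors mentioned above can be estimated with a first-order high-pass model. The sketch below assumes a series capacitor between a resistive 50 Ω source and a resistive 50 Ω load; these impedances and the function name are assumptions for illustration, not values from the thesis.

```python
import math


def highpass_corner_hz(c_block: float, r_source: float = 50.0, r_load: float = 50.0) -> float:
    """-3 dB corner of a series DC block between a resistive source and load.

    The capacitor sees the series connection of both resistances, so the
    corner frequency is 1 / (2 * pi * (R_source + R_load) * C).
    """
    return 1.0 / (2.0 * math.pi * (r_source + r_load) * c_block)


f_c_10n = highpass_corner_hz(10e-9)    # 10 nF in a 50 Ω system, about 159 kHz
f_c_100n = highpass_corner_hz(100e-9)  # 100 nF lowers the corner to about 16 kHz

print(f"10 nF: {f_c_10n / 1e3:.0f} kHz, 100 nF: {f_c_100n / 1e3:.1f} kHz")
```

Under these assumptions, 10 nF already places the corner in the kHz range, consistent with the order of magnitude given in the text.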

When working with wide bandwidths, the group delay becomes an essential measure of phase linearity. It is the negative derivative of the phase $\phi(\omega)$ of a transfer function $H(\omega)$ with respect to the angular frequency:

$$\tau_{\rm g}(\omega) = -\frac{d\phi(\omega)}{d\omega} \tag{1.4}$$

In practice, $\tau_{\rm g}(\omega)$ can be regarded as the delay a specific frequency component experiences through the transfer function. A substantial group delay variation is a concern for inter-symbol interference and jitter. Usually, the goal is to keep the group delay variation below 10% of a symbol period. With increasing bandwidths and bandwidth extension measures like inductive peaking, this becomes one of the main challenges during design. Amplitude linearity is measured using total harmonic distortion (THD). THD is defined as the ratio of the power of all harmonics $P_{\rm h}$ to the power of the fundamental $P_{\rm f}$:

$$\text{THD}_{\%} = \frac{P_{\rm h}}{P_{\rm f}} \cdot 100 \,\%. \tag{1.5}$$

These harmonics are, for example, generated by gain compression in amplifying stages. The use of this metric is possible since there can be multiple harmonics within the band of operation when selecting a sufficiently low test signal frequency.
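The THD definition in Equation (1.5) can be illustrated with a small numerical sketch. It assumes a hypothetical amplifier whose gain compression is modeled by a `tanh` characteristic and evaluates the harmonic powers with a direct DFT over one period; the function names and the drive levels are illustrative choices, not values from this work.

```python
import math

def harmonic_powers(samples, n_harmonics):
    """Power of the fundamental and of harmonics 2..n_harmonics.

    The samples must cover exactly one period of the fundamental.
    """
    n = len(samples)
    powers = []
    for h in range(1, n_harmonics + 1):
        re = sum(s * math.cos(2 * math.pi * h * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * h * i / n) for i, s in enumerate(samples))
        amplitude = 2 * math.hypot(re, im) / n
        powers.append(amplitude ** 2 / 2)
    return powers

def thd_percent(samples, n_harmonics=9):
    """THD in percent: power of all harmonics divided by power of the fundamental."""
    p = harmonic_powers(samples, n_harmonics)
    return 100.0 * sum(p[1:]) / p[0]

N = 256  # samples per period

def compressed_sine(drive):
    """Sine wave passed through an assumed tanh compression characteristic."""
    return [math.tanh(drive * math.sin(2 * math.pi * i / N)) for i in range(N)]

thd_small = thd_percent(compressed_sine(0.1))  # nearly linear operation
thd_large = thd_percent(compressed_sine(1.0))  # noticeable gain compression

print(f"THD at small drive: {thd_small:.2e} %")
print(f"THD at large drive: {thd_large:.2e} %")
```

The larger drive level pushes the assumed characteristic into compression, so the harmonic content and therefore the THD increase.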

#### Receiver Noise

Transimpedance amplifiers, as the main building block of an optical receiver, are usually characterized by an input-referred noise current. This input-referred noise gives a measure of minimum detectable signal (sensitivity) and can be viewed as a noise current in parallel to the input signal. In that case, the amplifiers are regarded as noiseless. It is derived by measuring the output noise and gain. When assuming a linear transfer function  $Z_T(f)$ , the output voltage noise power spectral density is related to the input-referred noise current power density by:

$$V_{n,o}^{2}(f) = |Z_{\rm T}(f)|^{2} I_{n,i}^{2}(f). \tag{1.6}$$

Sometimes, the root-mean-square (RMS) noise is given. It is related to the spectral density by integration up to the noise bandwidth $BW_n$:

$$\overline{v_{n,o}^2} = \int_0^{BW_n} V_{n,o}^2(f) \, df. \tag{1.7}$$

The RMS noise is then:

$$v_{n,\text{RMS},o} = \sqrt{\overline{v_{n,o}^2}}. \tag{1.8}$$

An RMS value is relatively easy to measure since it is equal to the standard deviation of the output voltage. Referring the total noise and RMS noise back to the input is done analogously to Equation (1.6):

$$\overline{i_{n,i}^2} = \frac{1}{Z_{\text{T},0}^2} \overline{v_{n,o}^2}. \tag{1.9}$$

Here, $Z_{\text{T},0}$ is the mid-band transimpedance, which is chosen since the RMS and total noise voltages lack frequency information.
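The chain from Equation (1.6) to Equation (1.9) can be followed numerically. The sketch below assumes a hypothetical TIA with a single-pole transimpedance response and a white input-referred noise current density; all numerical values and names are illustrative assumptions and not measured data from this work.

```python
import math

# Assumed example values (hypothetical, for illustration only)
Z_T0 = 1_000.0        # mid-band transimpedance in ohm
F_3DB = 50e9          # single-pole 3 dB bandwidth in Hz
I_N_DENSITY = 20e-12  # white input-referred noise current density in A/sqrt(Hz)
BW_N = 500e9          # upper integration limit (noise bandwidth) in Hz

def z_t_mag_sq(f):
    """|Z_T(f)|^2 of the assumed single-pole response."""
    return Z_T0 ** 2 / (1.0 + (f / F_3DB) ** 2)

def v_no_psd(f):
    """Output noise voltage power spectral density, Equation (1.6)."""
    return z_t_mag_sq(f) * I_N_DENSITY ** 2

def integrate_trapezoid(func, lower, upper, steps=10_000):
    """Plain trapezoidal integration of func from lower to upper."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h

v_n_sq = integrate_trapezoid(v_no_psd, 0.0, BW_N)  # Equation (1.7)
v_n_rms = math.sqrt(v_n_sq)                        # Equation (1.8)
i_n_rms = v_n_rms / Z_T0                           # Equation (1.9)

print(f"Output RMS noise voltage:         {v_n_rms * 1e3:.3f} mV")
print(f"Input-referred RMS noise current: {i_n_rms * 1e6:.3f} uA")
```

For this assumed response the integral also has a closed form, $I_{n,i}^2 \, f_{3\mathrm{dB}} \arctan(BW_n / f_{3\mathrm{dB}})$, which is a convenient cross-check for the numerical result.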

### 1.4 Technologies for High-Speed Optical Communication Systems

An electro-optical transceiver consists of multiple devices, especially since the conversion from electrical to optical and vice versa has to be performed. This section gives a brief overview of the materials and technologies involved.

Photodiodes (PDs) rely on the photocurrent in a PN or PIN junction. Different properties (e.g., responsivity, operational wavelength, and electrical bandwidth) are based on the used materials. Therefore, the targeted wavelength often determines the selection of PD technologies. Commonly selected semiconductors are GaAs, InP, Ge, and Si [RBT<sup>+</sup>21].

Silicon-based photonic integrated circuits (PICs) are used for compact coherent receivers [AVI<sup>+</sup>23]. The PICs feature Si or SiN waveguides, couplers, phase shifters, and photodiodes, allowing for compact receiver or transmitter assemblies. At the same time, their manufacturing processes are often compatible with typical CMOS production lines.

New generations of silicon-organic hybrid (SOH) Mach-Zehnder modulators promise a large bandwidth [KFZ<sup>+</sup>20, EMF<sup>+</sup>22]. They use an organic material as the electro-optically active compound. At the same time, their $V_{\pi}l$ product is much lower than that of LiNbO<sub>3</sub> modulators [VMW<sup>+</sup>22], and thus the needed modulation voltage is lower for similar-sized devices. This enables the use of technologies with lower breakdown voltages for the driver circuits. Since their structure is based on silicon technology, they are compatible with silicon PICs. If the modulator is too long to be treated as a lumped component, it is often integrated into a waveguide such as a 50 $\Omega$ coplanar waveguide (CPW) transmission line [Zwi20]. The line needs to be terminated at the end, and, therefore, a modulator can be seen as a 50 $\Omega$ load to the circuit during the driver design phase.

On the electrical side of an electro-optical communication system, the radio-frequency integrated circuit (RFIC), two primary technology families are used: silicon germanium (SiGe) bipolar CMOS (BiCMOS) and complementary metal-oxide-semiconductor (CMOS) processes. Due to the higher intrinsic $g_{\rm m}$ of heterojunction bipolar transistors (HBTs), SiGe BiCMOS technologies are attractive for high-speed and low-noise front ends. These technologies are available from multiple vendors (e.g., Leibniz Institute for High Performance Microelectronics (IHP), Infineon, Globalfoundries, Tower Semiconductor) and also feature a CMOS device portfolio for control and bias circuitry. In recent years, deeply scaled CMOS processes have been gaining importance due to the increased RF performance of these technologies [PSC23]. Also, the link speeds often require signal processing for error correction and calibration of channel impairments. In small-node CMOS technologies, this digital signal processor (DSP) can be integrated directly with the front end. In contrast, BiCMOS processes usually lack digital performance because legacy nodes are used as their CMOS basis.

Since photonic components, especially laser diodes, have requirements on the technology used that make them hard to integrate with RFIC technologies, it is typical to build hybrid electronic-photonic modules. These modules then feature the best of both worlds but introduce packaging difficulties into the design process. The high-speed I/O connections especially have to be modeled precisely. In these modules, a photodiode might be bonded to receiver ICs [PSC23, LSH<sup>+</sup>21]. Current setups often include SiGe drivers and TIAs co-packaged with photodiodes, modulators or VCSELs [AHW<sup>+</sup>19, AVI<sup>+</sup>23, VMA<sup>+</sup>21, CYS<sup>+</sup>20, BSH<sup>+</sup>15]. Figure 1.3 shows this hybrid packaging concept used as a framework in this work for designing drivers with integrated PAM-4 power combination.

Figure 1.3: Hybrid packaging concept: PAM-4 driver + modulator used for the designs in Chapter 2.

#### **Electronic-Photonic Integrated Circuits**

To avoid the need for high-bandwidth interconnects like bond wires or flip-chip connections between a photodiode and the TIAs, the use of an electronic-photonic integrated circuit (EPIC) is an option [ANW<sup>+</sup>15, EAW<sup>+</sup>16]. The EPIC attribute describes that photonic and electronic circuits coexist on the same die. This poses additional challenges for the technology because photonic components like waveguides, couplers, phase shifters, and photodiodes usually have different constraints than electronic or RF circuits. The biggest advantage of these technologies is the reduction of packaging complexity. Alternatively, when constructing an MZM-based transmitter, the output stages can be connected directly to the phase shifter branches [GLRP<sup>+</sup>17, RLP<sup>+</sup>16].

In IHP's implementation of an EPIC technology based on a $0.25 \,\mu m$ BiCMOS process node [KLB<sup>+</sup>15], the photonic components sit on top of a thick SiO<sub>2</sub> layer, whereas the conventional BiCMOS devices are manufactured on a bulk Si substrate. Because of this, a significant distance between photonic and electronic devices is necessary. The metal back end can be manufactured above both regions and interfaces the photonic and electronic domains. This technology is used in a coherent receiver design discussed in Chapter 4.

Another available EPIC technology is Globalfoundries' Fotonix<sup>™</sup> 45SPCLO platform [RMN<sup>+</sup>20], a 45 nm CMOS process with included photonic components in which 112 Gb/s were demonstrated [BAC<sup>+</sup>23, MAL<sup>+</sup>23]. With increasing channel density, these technologies can provide a cost-effective solution for volume production.

#### 1.5 Organization and Goals of This Work

This work presents and discusses different aspects and circuits for high-speed optical communication.

Chapter 2 starts with the design of driver circuits for MZMs with low $V_{\pi}l$ products down to 1 V mm. The primary focus is set on an integrated PAM-4 signal combination from two NRZ input streams, which allows these drivers to be fed directly from serialized data sources. Their operation as a 2-bit digital-to-analog converter simplifies the targeted 100 GBd (200 Gb/s) single-channel transmitter operation. Two variants of distributed power combiners and a differential non-distributed implementation are investigated.

A linear transimpedance amplifier for IM/DD detection and its associated design challenges are presented in Chapter 3. The input stage is designed to interface a single photodiode. Particular focus is on a modified input current sink topology, enabling an enhanced dynamic range. At the same time, the bandwidth requirements are set to allow for 100 GBd PAM-4 signals while keeping the power consumption low. Control and supply circuits are added for DC offset compensation and automatic gain control.

Since coherent optical communication is essential in long-haul networks, a fully-differential TIA for coherent data transmission is shown in Chapter 4. It is designed in an electronic-photonic IC process, enabling a single-chip receiver and a minimized packaging effort. The advantages of a differential implementation over a single-ended input stage are investigated and discussed.

Data serialization is a significant factor in generating data streams. In Section 5.1, a serializer implementation using CMOS and BiCMOS logic is presented. The goal for this serializer is a power and area-efficient implementation leveraging and combining the advantages of different logic implementations.

A broadband doubler in Section 5.2 enables flexible clock or LO generation for wide-band or multi-band systems. Since RF systems usually utilize differential circuits, this section investigates the possibility of modifying a classical push-push transistor pair to generate a differential output.

Chapter 6 summarizes prototype packaging methods with in-house technology. This chapter focuses on the goal of creating a packaged version of the driver in Section 2.2 to be connected to an MZM according to Figure 1.3.

### 2 Driver Circuits with Integrated PAM-4 Combination

A PAM-4 signal can be constructed by adding two NRZ data streams where one has twice the amplitude of the other. Since passive power combination is inherently wasteful, it is beneficial to generate the PAM-4 signal from the digital data streams with an active circuit. Multi-bit high-speed digital-to-analog converters have been reported recently [WCL<sup>+</sup>22, KKB<sup>+</sup>22]. However, they come at the cost of complex designs in advanced and expensive technologies with very low voltage limits, and they may additionally need a linear driver amplifier. To overcome these limitations, two analog PAM-4 power combining circuits are presented and compared in Section 2.1 and Section 2.2. They both work on the principle of adding the outputs of two amplifiers as seen in Figure 2.1a. The resulting symbol mapping and eye diagram are depicted in Figure 2.1b. A hybrid-packaged module consisting of a PAM-4 driver integrated circuit (IC) and a modulator PIC, as schematically shown in Figure 1.3, serves as the design outline and target. A target output voltage swing of 1 V<sub>pp</sub> was chosen for the drivers. This is sufficiently large to drive short SOH modulators with $V_{\pi}l \leq 1$ V mm. The desired data rate is 100 GBd or 200 Gb/s.
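The summation principle can be illustrated with a short sketch. It assumes ideal bipolar NRZ levels of ±1 and a hypothetical helper `pam4_combine`; the bit-to-level mapping shown is one possible choice and not necessarily identical to the mapping in Figure 2.1b.

```python
def pam4_combine(msb_bits, lsb_bits, lsb_weight=0.5):
    """Add two NRZ streams, with the LSB path scaled by lsb_weight.

    A weight of 0.5 corresponds to a 6 dB gain difference between the two
    paths and yields four equally spaced output levels.
    """
    return [(2 * m - 1) + lsb_weight * (2 * l - 1)
            for m, l in zip(msb_bits, lsb_bits)]

# All four bit combinations (MSB, LSB): 00, 01, 10, 11
msb = [0, 0, 1, 1]
lsb = [0, 1, 0, 1]

equal_spacing = pam4_combine(msb, lsb)
print(equal_spacing)  # [-1.5, -0.5, 0.5, 1.5]

# A different LSB weight changes the ratio of outer to inner level spacing
adjusted_spacing = pam4_combine(msb, lsb, lsb_weight=0.4)
print(adjusted_spacing)
```

Changing `lsb_weight` away from 0.5 compresses or widens the inner eye relative to the outer eyes, which is the degree of freedom needed when unequal electrical levels are desired.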



Figure 2.1: a) PAM-4 signal creation by summing two data streams. b) resulting signal levels and bit mapping.



Figure 2.2: Normalized intensity transfer function of an ideal MZM. The marked point is the quadrature point used for biasing in IM/DD systems.

A lossless MZM in push-pull configuration has the intensity transfer function given by [Zwi20]:

$$\frac{P_{\text{out}}}{P_{\text{in}}} = \frac{1}{2} \left( 1 + \cos\left(\frac{\pi V_{\text{in}}}{V_{\pi}}\right) \right). \tag{2.1}$$

Figure 2.2 shows this normalized transfer function and the bias point for intensity modulation. Typically, a DC voltage is superposed on the RF signal to set the bias point, or the already present DC potential of the amplifier output is adjusted to match this bias point. A quadrature bias point at an odd multiple of $V_{\pi}/2$, i.e., at $\frac{(2n-1)}{2}V_{\pi}$, can be chosen. An almost linear relationship between drive voltage and optical output intensity is present when not using the full modulation depth. However, for the full depth, the modulator response is non-linear. Since the modulation function is known, pre-distortion of the amplitude levels can create equally spaced intensity levels for the full modulation depth. For four equally spaced intensity levels, the driving levels should be 0, $0.39V_{\pi}$, $0.61V_{\pi}$, and $V_{\pi}$. Two PAM-4 streams could also be used for an IQ modulation (QAM16) for coherent transmission. In this case, two modulators are biased at the zero intensity point ($V_{\pi}$ in Figure 2.2), and each is fed by a PAM-4 signal with the desired electrical driving levels at 0, $0.78V_{\pi}$, $1.22V_{\pi}$, and $2V_{\pi}$. The optical carriers in the two modulators are offset by 90°, and the outputs are combined into a complex modulated QAM16 signal. Since the electrical levels depend on the modulation depth, the drivers are designed with an adjustable level spacing. In a first step, the design procedure focuses on small-signal simulations to analyze and optimize the gain and phase responses. However, due to the current combining principle used, the circuits will operate in a non-linear regime when used with NRZ input data streams. During design, this behavior is tested and the operation is verified by transient simulations with random data inputs.
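The pre-distorted drive levels quoted above follow from inverting the modulator characteristic. The sketch below reproduces them under two assumptions: the intensity transfer function of Equation (2.1) for the IM/DD case, and a field transfer function proportional to $\cos(\pi V / (2 V_{\pi}))$ for the coherent case. The function names are hypothetical.

```python
import math

def intensity(v_norm):
    """Normalized MZM intensity transfer, Equation (2.1), with v_norm = V / V_pi."""
    return 0.5 * (1.0 + math.cos(math.pi * v_norm))

def imdd_drive_levels(n_levels=4):
    """Drive levels (in units of V_pi) for equally spaced intensity levels."""
    levels = []
    for k in range(n_levels):
        target = 1.0 - k / (n_levels - 1)  # intensities 1, 2/3, 1/3, 0
        levels.append(math.acos(2.0 * target - 1.0) / math.pi)
    return levels

def field_drive_levels(n_levels=4):
    """Drive levels (in units of V_pi) for equally spaced optical field levels.

    Assumes a field transfer proportional to cos(pi * V / (2 * V_pi)),
    which is zero at V = V_pi (the null bias point).
    """
    levels = []
    for k in range(n_levels):
        target = 1.0 - 2.0 * k / (n_levels - 1)  # fields +1, +1/3, -1/3, -1
        levels.append(2.0 * math.acos(target) / math.pi)
    return levels

imdd = imdd_drive_levels()
coherent = field_drive_levels()

print([round(v, 2) for v in imdd])      # [0.0, 0.39, 0.61, 1.0]
print([round(v, 2) for v in coherent])  # [0.0, 0.78, 1.22, 2.0]
```

In both cases the inner levels move closer together than in an equally spaced electrical PAM-4 signal, which is why an adjustable level spacing in the driver is useful.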

#### 2.1 Distributed Driver with Analog Power Combining

#### Parts of the following section have been published in [3].

For broadband amplifiers, the input and output capacitances of the gain stages together with the load and source resistances limit the realizable bandwidth to a basic low-pass response. The concept of distributed amplification aims to mitigate this problem by using parallel amplification stages connected by inductances and absorbing the capacitances into so-called artificial transmission lines. The resulting line mimics the lumped model of a transmission line segment. Figure 2.3 presents the general concept of a distributed amplifier (DA), which was already proposed in the age of vacuum tube electronics [GHJN48]. Incoming and outgoing waves travel along the lines and are amplified by constructive interference, hence the alternative name traveling wave amplifier (TWA). However, for the signals to add up constructively, the phases of all wave components have to match:

$$\beta_{\rm in} l_{\rm in} = \beta_{\rm out} l_{\rm out}, \tag{2.2}$$

where $\beta_{\rm in}$ and $\beta_{\rm out}$ are the propagation constants and $l_{\rm in}$ and $l_{\rm out}$ the lengths of the input and output line sections, respectively. This formula assumes an equal phase response for all gain stages. If this condition holds and line losses are neglected, the gain of a DA is

$$A_{v} = \frac{1}{2} n G_m Z_0, \tag{2.3}$$



Figure 2.3: General DA structure. Inductors represent the lines.


Figure 2.4: Four stage distributed power combiner [3] ©IEEE.

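Here, $n$ represents the number of stages, $G_m$ the transconductance of a single stage, and $Z_0$ the line impedance [GHJN48]. Broadband matching to the source and load can be created by adjusting the line impedance

$$Z_{\text{Line}} = \sqrt{\frac{L}{C}} \tag{2.4}$$

$$\phantom{Z_{\text{Line}}} = \sqrt{\frac{L'l}{C'l+C_{\rm in,out}}}, \tag{2.5}$$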

where *L* and *C* are the artificial transmission lines' total inductive and capacitive components. They are calculated by the inductance L' and capacitance C' per length and the length *l*. The line capacitance adds to the input or output capacitance of the amplifiers  $C_{in,out}$ . The cutoff in the artificial transmission line then limits the bandwidth:

$$\omega_{\rm c} = \frac{2}{\sqrt{LC}}.\tag{2.6}$$

Nevertheless, DAs reach large bandwidths. This comes at the price of a relatively low gain and efficiency, since the stages operate in parallel in contrast to the more commonly used cascaded gain stages. Since waves can travel in both directions on the transmission lines, a termination $R_{\text{Term}}$ needs to be used to prevent reflections. This further decreases the efficiency because part of the signal is dissipated in that termination.

Following the idea in Figure 2.1a, the distributed amplifier can now serve as an analog power combiner by adding a second input line and a second array of gain cells [TCE15, 10]. The power combination is performed on a shared collector line. Figure 2.4 illustrates this approach for a four-stage power combiner. If the amplifiers in the respective paths $A_1$, $A_2$ have the aforementioned gain difference of 6 dB, the inputs are combined into a PAM-4 signal at the output port. Port 1 and Port 2 are the inputs for two NRZ data streams, and Port 3 is the PAM-4 signal output. In the scope of this work, a driver was implemented in IHP's SG13S technology, a 130 nm SiGe:C BiCMOS process with five thin and two thick aluminum metal layers featuring HBTs with $f_T = 250$ GHz and $f_{max} = 300$ GHz. In the design phase, an output voltage swing of 1 V<sub>pp</sub> and a bandwidth of about 80 GHz to 100 GHz were targeted. These metrics are chosen so that recent generations of SOH modulators could be driven even above 100 GBd [EMF<sup>+</sup>22].

## 2.1.1 Unit Cell Design for a Distributed Power Combiner

The four-stage power combiner is constructed out of identical unit cells. This choice has the advantage of fast extension to more cells if needed. Efficiency was not the first concern in this design, so techniques like non-uniform cells were not used. Since two signals are added with different gain, two gain stages per unit cell must be designed. Both must have similar input and output capacitances to fulfill the phase relation in (2.2). Each amplifier has a core consisting of an HBT cascode. It is a single-ended amplifier with six transistor fingers with $A_e = 120 \text{ nm} \times 480 \text{ nm}$ each. At a bias of $V_{BE} = 0.9 \text{ V}$, which is slightly below the bias point for peak $f_T$, the amplifier shows $C_{in} = 63.1$ fF and $C_{out} = 13.8$ fF. With $Z_{\text{Line}} = 50 \Omega$ and Equations (2.4) and (2.6), this input capacitance would lead to a line cutoff of $f_c = 100$ GHz. When factoring in the additional capacitance per unit length of the input and output lines, this drops below 63 GHz, still neglecting additional wiring parasitics. Additionally, $C_{\text{in}}$ is much greater than $C_{\text{out}}$. Even when considering double the output capacitance at the collector node due to two amplifiers being connected, a matching capacitance would need to be placed to ensure identical capacitances at the nodes of the lines. In order to mitigate these drawbacks, two amplifier cell variants featuring methods to reduce the input capacitance are investigated. One variant uses emitter degeneration (ED), the other a capacitive divider (CDIV) at the input. Each cell type is implemented in a high-gain and a low-gain version.
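The 100 GHz figure can be checked with a few lines of arithmetic. The sketch below assumes that the total line capacitance consists only of the amplifier input capacitance and that the section inductance is chosen for a 50 Ω line according to Equation (2.4); the function name is hypothetical.

```python
import math

def line_cutoff_hz(l_section, c_section):
    """Cutoff frequency of an artificial line, f_c = omega_c / (2*pi), Equation (2.6)."""
    return 1.0 / (math.pi * math.sqrt(l_section * c_section))

Z0 = 50.0        # target line impedance in ohm
C_IN = 63.1e-15  # input capacitance of the unloaded cascode cell in F

# Inductance per section needed for Z0 = sqrt(L / C), Equation (2.4)
l_required = Z0 ** 2 * C_IN

f_c = line_cutoff_hz(l_required, C_IN)
print(f"Required section inductance: {l_required * 1e12:.1f} pH")
print(f"Ideal line cutoff:           {f_c / 1e9:.1f} GHz")
```

This ideal value ignores the capacitance contributed by the line itself, which is why the practical cutoff is considerably lower.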



Figure 2.5: Simplified schematic of the emitter degenerated amplifier cell and component values.

#### **Amplifier Gain Cells**

The emitter degenerated amplifier cell and its sizing is shown in Figure 2.5. This degeneration enables a larger bandwidth and reduced input capacitance by sacrificing some gain.

The schematic of the capacitive divider variant is shown in Figure 2.6. The main method of reducing the input capacitance in this variant is placing a capacitance in series with the actual input capacitance of the transistor. The dividing capacitance $C_{\text{div}}$ is selected to be approximately equal to the expected $C_{\pi}$. Since a series capacitor alone would remove the DC bias path and create a high-pass characteristic, a resistor $R_{\text{div}}$ is placed in parallel to $C_{\text{div}}$. This $R_{\text{div}}$ is optimized for a flat frequency response. For this to happen, it needs to be scaled appropriately to the expected bias point of the amplifier cell. The capacitive divider variant also features a small emitter degeneration resistance with a large bypass capacitor to enhance the bandwidth.

#### **Transmission lines**

| Component | High gain | Low gain |
|-----------|-----------|----------|
| $R_{\text{E}}$ | 5 Ω | 7.5 Ω |
| $C_{\text{E}}$ | 200 fF | 200 fF |
| $R_{\text{div}}$ | 3.7 kΩ | 17 kΩ |
| $C_{\text{div}}$ | 60 fF | 60 fF |
| $N_{\text{f}}$ <sup>a</sup> | 6 | 6 |
| $I_{\text{C}}$ | 9.8 mA | 1.5 mA |

<sup>a</sup> Number of fingers with $A_E = 0.12 \,\mu\text{m} \times 0.48 \,\mu\text{m}$ each.

Figure 2.6: Simplified schematic of the capacitive divider amplifier cell and component values.

In the aluminum back end of line (BEOL), i.e., the IC's metal stack, of IHP's SG13S technology, there are five thin (M1 to M5) and two thick (TM1 and TM2) metal layers. The inductances connecting the individual stages are designed as high-impedance transmission lines. They use the top thick metal layer TM2 as signal and M3 as ground. M3 is chosen as ground so that it can be stacked with M2 for an improved ground while the bias voltage for the common-base stage can still be routed underneath the ground. A line width of 3 µm was chosen to stay within the maximum current density specification of TM2 while carrying the total collector current of about 40 mA. Although the total current only flows in the last section of the collector line, all lines are designed with equal dimensions to ensure matching signal propagation. At the beginning of the design phase, the PDK-provided transmission line models are used to extract starting points for the line lengths. With the mentioned dimensions, the following parameters can be extracted: $Z_{\text{Line}} = 81 \Omega$, $L' = 0.52$ pH/µm and $C' = 79$ aF/µm. The needed line length can then be calculated by

$$l = \frac{C_{\rm in} Z_0^2}{L' - Z_0^2 C'}. \tag{2.7}$$

With these starting points, the lengths are tuned in a schematic design using parasitic extracted core cells. From this point, custom folded transmission lines are designed and optimized using EM simulations. The lines are folded to save space and prevent the chip from being narrow and long. The amplifier's input is placed at the center of the input line segments. The collector line connects to both outputs and features a slight additional peaking for the low gain path.
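As a plausibility check of Equation (2.7), the extracted line parameters can be combined with the input capacitance of the unloaded cascode cell from Section 2.1.1. The sketch assumes $Z_0 = 50\,\Omega$ and neglects wiring parasitics; the variable names are illustrative.

```python
import math

Z0 = 50.0            # target impedance of the loaded line in ohm
C_IN = 63.1e-15      # amplifier input capacitance in F
L_PER_UM = 0.52e-12  # line inductance per length in H/um
C_PER_UM = 79e-18    # line capacitance per length in F/um

# Impedance of the unloaded line from its per-length parameters
z_unloaded = math.sqrt(L_PER_UM / C_PER_UM)

# Line length per section, Equation (2.7)
length_um = C_IN * Z0 ** 2 / (L_PER_UM - Z0 ** 2 * C_PER_UM)

# Resulting section inductance and total section capacitance
l_section = L_PER_UM * length_um
c_section = C_PER_UM * length_um + C_IN

z_loaded = math.sqrt(l_section / c_section)                # Equation (2.5)
f_cutoff = 1.0 / (math.pi * math.sqrt(l_section * c_section))  # Equation (2.6)

print(f"Unloaded line impedance: {z_unloaded:.1f} ohm")
print(f"Section length:          {length_um:.0f} um")
print(f"Loaded line impedance:   {z_loaded:.1f} ohm")
print(f"Line cutoff frequency:   {f_cutoff / 1e9:.1f} GHz")
```

With these numbers the loaded line returns to 50 Ω by construction, and the cutoff lands just below 63 GHz, consistent with the estimate given for the unloaded cascode cell.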



Figure 2.7: a) Schematic of the used split load structure. b) Return loss presented to the line with and without external capacitance. c) Resonance suppression with a split load.

## 2.1.2 Line Termination for Distributed Amplifiers

Until now, we considered the line termination as an ideal 50 $\Omega$ load. With base and, more importantly, collector bias voltages present on the lines, these loads lead to significant wasted power. A DC-blocked termination is necessary to avoid this. However, the usable metal-insulator-metal (MIM) capacitor in this process has a specific capacitance of only around 1.5 fF/µm<sup>2</sup>. For a reasonably good termination, the blocking capacitor would be too large for a practical IC size. Therefore, a termination following the structure in Figure 2.7a combines on-chip and off-chip decoupling capacitances. Since $C_{\text{on-chip}}$, $C_{\text{off-chip}}$ and $L_{\text{bond}}$ form a resonant circuit, the resistance was divided into a 40 $\Omega$ and two 10 $\Omega$ pieces. The 10 $\Omega$ resistances efficiently load the C-L-C circuit and flatten any resonances. By using all available area without significantly increasing the die size, a total of $C_{\text{on-chip}} = 45$ pF could be fitted, forming the on-chip decoupling needed for high frequencies. $L_{\text{bond}}$ is the wire-bond inductance and is assumed to be 1 nH. $C_{\text{off-chip}}$ is a 150 pF high-frequency die capacitor, where the bottom electrode is glued onto the package ground plane and the top electrode is connected via wire bonds. Figure 2.7b presents $S_{11}$ seen by the artificial transmission lines and demonstrates the need for the additional off-chip decoupling capacitance. Otherwise, the termination could cause reflections and, therefore, signal degradation within the frequency range of interest. Additionally, Figure 2.7c presents the improvement in the termination when the resonance is damped by the split resistance. The example shown here uses only one external decoupling capacitor. When using more capacitors to improve the low-frequency cutoff, multiple resonance peaks can occur, all of which are suppressed by the series loading resistances. This technique is used on the input and output lines of the distributed power combiners in this section.

Figure 2.8: Chip photograph of the CDIV variant. West: input pads; East: output pad; North and South: DC biasing. The ED variant has the same overall circuit dimensions.
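Some first-order numbers help to put the termination values into perspective. The sketch below is only a rough estimate: it assumes that the corner frequency of the DC-blocked termination is set by the 50 Ω resistance and the blocking capacitance, and that the undamped resonance is formed by $L_{\text{bond}}$ with $C_{\text{on-chip}}$ and $C_{\text{off-chip}}$ in a series loop. The exact topology in Figure 2.7a may differ, and the helper names are hypothetical.

```python
import math

C_ON_CHIP = 45e-12     # on-chip MIM decoupling capacitance in F
C_OFF_CHIP = 150e-12   # off-chip die capacitor in F
L_BOND = 1e-9          # assumed bond-wire inductance in H
R_TERM = 50.0          # total termination resistance in ohm
MIM_DENSITY = 1.5e-15  # specific MIM capacitance in F/um^2

def corner_frequency_hz(r, c):
    """First-order RC corner frequency."""
    return 1.0 / (2.0 * math.pi * r * c)

def series_capacitance(c1, c2):
    """Equivalent capacitance of two capacitors in series."""
    return c1 * c2 / (c1 + c2)

def resonance_hz(l, c):
    """Undamped LC resonance frequency."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))

mim_area_um2 = C_ON_CHIP / MIM_DENSITY
f_corner_on_chip = corner_frequency_hz(R_TERM, C_ON_CHIP)
f_corner_combined = corner_frequency_hz(R_TERM, C_ON_CHIP + C_OFF_CHIP)
f_resonance = resonance_hz(L_BOND, series_capacitance(C_ON_CHIP, C_OFF_CHIP))

print(f"MIM area for 45 pF:            {mim_area_um2:.0f} um^2")
print(f"Corner with on-chip cap only:  {f_corner_on_chip / 1e6:.1f} MHz")
print(f"Corner with both capacitors:   {f_corner_combined / 1e6:.1f} MHz")
print(f"Undamped C-L-C loop resonance: {f_resonance / 1e6:.0f} MHz")
```

Under these assumptions the loop resonance falls well inside the operating band, which illustrates why damping by the split resistance matters.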

## 2.1.3 Experimental Results

Figure 2.8 shows the photograph of the CDIV variant. The ED variant uses the same pad frame and a very similar layout. For measurements, the DUTs are mounted on a printed circuit board, and external capacitors for the line terminations are connected. S-parameter measurements are performed with a broadband measurement system from 0.01 to 110 GHz. The 3-port response is calculated from two 2-port measurements. Figure 2.9a shows the small-signal response for the capacitive divider variant, and Figure 2.10a for the emitter degenerated variant. Both show a good input and output match up to 100 GHz. In addition, the low-frequency input and output return loss is better than 20 dB, proving the functionality and performance of the DC-blocked termination. The capacitive divider variant has a 3 dB bandwidth of 81 GHz. However, the high gain path shows a drop below 5 GHz, indicating a mismatch between the capacitive input divider and its bypass resistor ($C_{div}$ and $R_{div}$ in Figure 2.6). This mismatch could be caused by process variation in the comparatively large resistance $R_{div}$. The deviation from the simulated gain at lower frequencies ($\leq$ 10 GHz) also indicates a resistor deviation.

Figure 2.9: Small signal results for CDIV variant. a) S-parameters, b) filtered group delay response, and c) gain difference tuning capability [3] ©IEEE.

The emitter degenerated driver variant exhibits a bandwidth of 102 GHz. The variation in the gain difference between the forward paths is, in this version, slightly higher (between 5.8 and 6.5 dB). However, this variant performs considerably better, considering the higher gain and bandwidth. Compared to the simulated response, both circuits show a slight gain decrease towards higher frequencies and worse matching. This effect could not be fully captured in post-measurement simulations, even by simulating an entire unit cell as one EM structure. In Figures 2.9b and 2.10b, the measured group delay for both circuits is presented. To account for noisy measurements, the phase of the forward gain paths is smoothed with a spline before the group delay calculation. The group delays for both variants show almost the same frequency response and a variation of less than 5 ps in a frequency range from 2 to 80 GHz, promising low distortion for high data rates. Both circuits can adjust the amplification in the low gain path by changing its bias point and thus the gain difference between both paths. Figures 2.9c and 2.10c show this tuning capability for both variants. With this adjustment, the relative sizes of the outer eye openings to the inner eye can be tuned.

Figure 2.10: Small signal results for ED variant. a) S-parameters, b) filtered group delay response, and c) gain difference tuning capability [3] ©IEEE.
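The group delay evaluation described above can be sketched as follows. This is not the processing script used for the measurements: instead of a spline, a simple moving average is used for smoothing, and the input is a synthetic ideal delay line with an assumed delay of 20 ps rather than measured S-parameters.

```python
import cmath
import math

def unwrap(phases):
    """Remove 2*pi jumps from a list of wrapped phase values in rad."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2.0 * math.pi * round(delta / (2.0 * math.pi))
        out.append(out[-1] + delta)
    return out

def smooth(values, half_window=2):
    """Centered moving average; the window shrinks symmetrically at the edges."""
    n = len(values)
    result = []
    for k in range(n):
        h = min(half_window, k, n - 1 - k)
        window = values[k - h:k + h + 1]
        result.append(sum(window) / len(window))
    return result

def group_delay(freqs_hz, unwrapped_phase):
    """Central-difference estimate of -d(phi)/d(omega), Equation (1.4)."""
    tau = []
    for k in range(1, len(freqs_hz) - 1):
        d_phi = unwrapped_phase[k + 1] - unwrapped_phase[k - 1]
        d_omega = 2.0 * math.pi * (freqs_hz[k + 1] - freqs_hz[k - 1])
        tau.append(-d_phi / d_omega)
    return tau

# Synthetic test case: ideal delay line, H(f) = exp(-j * 2*pi*f * TAU)
TAU = 20e-12                                 # assumed delay in s
freqs = [k * 1e9 for k in range(1, 101)]     # 1 GHz to 100 GHz
wrapped = [cmath.phase(cmath.exp(-1j * 2.0 * math.pi * f * TAU)) for f in freqs]

tau_g = group_delay(freqs, smooth(unwrap(wrapped)))
print(f"Group delay range: {min(tau_g) * 1e12:.3f} ps to {max(tau_g) * 1e12:.3f} ps")
```

For measured data, the smoothing width trades noise suppression against frequency resolution, so it should stay well below the width of the features of interest.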

Time domain measurements are performed with an Agilent DCA 86100C mainframe with 70 GHz sampling heads. The inputs and outputs are probed with GSGSG and GSG probes, respectively. Figure 2.11a is recorded with a Keysight M8194A arbitrary waveform generator as the source for two PRBS-31 data streams. It shows 40 GBd (80 Gb/s) operation for the circuit version with emitter degeneration. Figure 2.11b presents the same measurement with the capacitive divider circuit variant. The voltage swings are $\approx 700 \text{ mV}$ for the ED and $\approx 600 \text{ mV}$ for the CDIV variant. This is less than the targeted 1 V. The losses of the added bias-tees and probes at the output are not de-embedded. However, the bias-tee and probe introduce a loss of less than 1.7 dB below 67 GHz, which does not explain this difference. For Figure 2.12, an SHF 12105 A with two C603 B multiplexers was used to generate the two NRZ input streams, resulting in a 70 GBd (140 Gb/s) PAM-4 signal. At 150 mV, the amplitude in this case is even lower than in the lower-speed measurements. With higher input power, the distortions lead to fully closed eyes; therefore, the output amplitude could not be increased. As with the roll-off in the S-parameter response, this behavior could not be reproduced in simulations. Nevertheless, both circuits demonstrate the successful generation of PAM-4 signals, with the capacitive divider variant even beyond 100 Gb/s. Although, in principle, both circuits could operate at higher speeds, higher-speed measurements could not be taken due to the limitations of the challenging time domain measurement setup.

## 2.1.4 Conclusion

Two different driver circuits for optical modulators with analog PAM-4 generation based on distributed power combining were investigated. Variants with a capacitive divider at the input of the gain cells and with emitter degeneration exhibit high bandwidths of up to 102 GHz. Both variants have low group delay variation and are therefore promising circuits for high data rates while consuming only 118 mW and 122 mW for the CDIV and ED variants, respectively. However, the capacitive divider at the input requires the usage of a high resistance in the signal path, leading to an increased uncertainty with process variation. Time domain measurements with eye diagrams at data rates of 80 Gb/s and 140 Gb/s verified operation at high data rates. In Table 2.2, these variants are compared to some state-of-the-art designs as well as a different design developed in the scope of this work and presented in the next section.

Figure 2.11: 40 GBd eye diagrams: a) ED and b) CDIV [3] ©IEEE.

Figure 2.12: 70 GBd eye diagram for CDIV [3] ©IEEE.

# 2.2 Fully-Differential Driver

#### Parts of the following section have been published in [1].

The distributed power combining structure in Section 2.1 suffers from a few drawbacks. First, the distributed nature requires a relatively large chip area for the given task. Second, proper line termination is needed to ensure a traveling wave structure without reflections. Most importantly, a differential implementation of the distributed structure would include many crossing RF transmission lines, potentially creating coupling issues. A differential implementation, however, is beneficial in terms of biasing and supply noise rejection and is more compatible with high-speed CML-based serialization or equalization circuitry. Therefore, a lumped approach creating a PAM-4 signal in a fully differential architecture was investigated. Lumped in this context means one fully differential amplifying stage acting as PAM-4 combiner and output driver. This requires two differential high-speed inputs to drive the output stage. These inputs are again of the same amplitude and represent two synchronous data streams. The input signal quality in terms of phase relation and time of arrival of the MSB and LSB strongly influences the output signal. Since this imposes many challenges for the measurement setup, a broadband single-ended (SE) to differential converter (balanced-unbalanced or balancing unit (BALUN)) is added to the test chip for verification, together with a pre-driver connecting the baluns to the output stage. The resulting block diagram of the differential PAM-4 combiner is shown in Figure 2.13. The design was implemented in IHP's SG13G2 technology, a 130 nm SiGe:C BiCMOS process with five thin and two thick aluminum metal layers featuring HBTs with $f_{\rm T} = 300 \,\text{GHz}$ and $f_{\rm max} = 500 \,\text{GHz}$ [HBB<sup>+</sup>10].



Figure 2.13: Block diagram of the differential PAM-4 combiner including baluns.

## 2.2.1 Broadband Single-Ended to Differential Converter

The design goal behind this balun is to provide a well-balanced differential signal for the main combiner stage. It also serves as a broadband input match. Passive wavelength-dependent implementations are unsuitable due to their inherent band limitation. Therefore, an active circuit was chosen for this task. Active baluns usually follow one of two main topologies: a common-emitter/common-collector (CE/CC) stage with identical $R_{\rm E}$ and $R_{\rm C}$, where the output signals at the emitter and the collector terminals are used as outputs with 180° offset, or a differential amplifier using only one input. The single-transistor CE/CC stage, however, has different signal propagation times from base to collector versus base to emitter, creating phase errors that are hard to tune out over a wide bandwidth. Different parasitic capacitances at the two outputs lead to diverging frequency responses. This stage also leads to different DC potentials at the two outputs. This is a major problem since DC blocking capacitors between the balun and the following stages would create a high-pass response within the band of interest, because on-chip capacitances are limited in size. Considering all these disadvantages, a CE differential amplifier was chosen as balun due to its symmetric outputs. To be able to use a differential amplifier as a balun, an excellent common-mode rejection is essential. Then the circuit can be regarded as amplifying $v_{in,d} = v_{in,SE} - 0$. As can be seen in the simplified schematic in Figure 2.14a, the balun uses two stages. The second stage provides additional common-mode rejection and, thus, a better amplitude and phase balance. This is necessary since the balance of just one asymmetrically driven differential amplifier is not enough due to the limited output impedance of the current source. The individual stages and the inputs and outputs are DC coupled to ensure operation down to DC.
Since the stage is designed to have only a SE-SE gain of $-3 \, dB$, using a cascode stage does not provide significant benefit. The CE implementation can therefore run from a 1.8 V supply to save some power and consumes only 23 mW, including the reference currents needed for the biasing with current mirrors. To avoid a cluttered schematic in Figure 2.14a, the current mirrors are simplified to only the mirroring transistors. The implementation of the current mirror features emitter degeneration to protect against thermal runaway and to increase the impedance looking into the current source (see Appendix B.2). A breakout circuit of this balun was fabricated and measured separately. Its chip photo and dimensions are shown in Figure 2.14b. In Figure 2.15a, the S-parameters of this breakout are shown. They show



Figure 2.14: a) Schematic of the single-ended to differential converter [1] ©IEEE. b) Photo of Balun breakout circuit.

an excellent agreement between measurement and simulation. However, the measured gain is significantly higher than simulated with nominal device models. The plot also shows the expected behavior for the best-case corner of the HBTs using the foundry-provided HICUM models. The best-case gain is higher than measured, so the measurement results are within the specified tolerances. For both device corners, the shape of the frequency response closely matches in simulation and measurement. The resonance around 2 GHz is created by on-chip supply decoupling capacitors and off-chip components, like the DC needle inductance and decoupling in the DC probe. Connecting the chip with DC contacts or wire bonds forms a CLC resonant circuit. Unfortunately, the external components were not considered during the original design phase, and therefore, measures to dampen this resonance were not taken. Layout errors like asymmetric supply lines to save pads make this problem worse. The peaking inductors are connected to supply rails. Those, however, are not equally connected to the pad. This is especially a problem for frequencies where on-chip decoupling is not enough. However, the matching S-parameter results show that its effect can be modeled and the behavior predicted. Accurate modeling, however, necessitates an EM model



Figure 2.15: a) Measured and simulated S-parameters for the balun breakout circuit and b) resulting phase and amplitude balance [1] ©IEEE.

|                        | This work   | [FEM<sup>+</sup>20] | [SRCE19]    | [HN16]      |
|------------------------|-------------|---------------------|-------------|-------------|
| Technology             | 130 nm SiGe | 130 nm SiGe         | 130 nm SiGe | 180 nm SiGe |
| Amplitude imb. (dB)    | 0.7         | 0.65                | 0.2         | 1           |
| Phase imb. (°)         | 2           | 3.2                 | 5           | 10          |
| Bandwidth (GHz)        | 5 to >67    | 3 to 60             | 0 to >70    | 0 to 50     |
| P<sub>DC</sub> (mW)    | 23          | 37                  | 144         | 14.4        |
| Area (mm<sup>2</sup>)  | 0.23        | 0.42                | 0.42        | 0.319       |

Table 2.1: Comparison of broadband active baluns.

of the supply lines to capture the line impedance adequately. The impact of this resonance is also negligible for the high data rates targeted in this design. Figure 2.15b presents this active balun's amplitude and phase balance. Again, the low-frequency resonance can be seen. Additionally, the design has a slight deviation in the amplitude balance at 35 GHz. An asymmetric supply creates additional inductance in one branch of the balun, effectively providing peaking in $S_{21}$ but not in $S_{31}$. Thus, the amplitude imbalance peaks at around 0.7 dB at 35 GHz. Despite these issues, the circuit provides a very wide bandwidth of >67 GHz while still maintaining an amplitude imbalance $\leq 0.7$ dB and a phase error below 2° in the range of 5 GHz to 67 GHz. With its low power consumption,



Nf: Number of emitter fingers with  $A_e = 70 \text{ nm} \times 900 \text{ nm}$  each.

Figure 2.16: Schematic of the PAM-4 combiner.

wide bandwidth, and good balance, this balun is comparable to other works as shown in Table 2.1.
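The balance figures quoted for the balun can be derived from the two transmission parameters. The following is a minimal sketch, assuming the common definitions of amplitude imbalance as the magnitude ratio of $S_{21}$ and $S_{31}$ in dB and of phase error as the deviation of their phase difference from 180°. The function name and the sample values are hypothetical and are not measured data from this work.

```python
import cmath
import math


def balun_imbalance(s21, s31):
    """Return (amplitude imbalance in dB, phase error in degrees).

    Assumption: an ideal balun has |S21| = |S31| and a 180 degree
    phase difference between its two outputs.
    """
    amp_db = 20.0 * math.log10(abs(s21) / abs(s31))
    phase_diff_deg = math.degrees(cmath.phase(s21 / s31))
    # Deviation from the ideal 180 degrees, wrapped into [-180, 180)
    phase_err_deg = (phase_diff_deg % 360.0) - 180.0
    return amp_db, phase_err_deg


# Hypothetical single-frequency sample (illustrative only)
s21 = cmath.rect(0.70, math.radians(30.0))
s31 = cmath.rect(0.70 * 10 ** (-0.7 / 20), math.radians(30.0 + 182.0))

amp_db, phase_err_deg = balun_imbalance(s21, s31)
print(f"amplitude imbalance: {amp_db:.2f} dB")
print(f"phase error: {phase_err_deg:.2f} deg")
```

Applied per frequency point to measured S-parameters, the same function would yield curves of the kind shown in Figure 2.15b.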

## 2.2.2 PAM-4 Driver and Combiner Stage

Figure 2.17: Cascode pre-driver, as DC-level shifter and line driver connecting the output stage.

The main driver and output stage is shown in Figure 2.16. The output stage operates as a current-steering current combiner. It consists of two differential cascode amplifiers, where one is scaled by a factor of two with respect to device sizing and bias current. The currents are combined in the load, creating a signal with four amplitude levels for input signals of equal amplitude. By adjusting the ratio of the bias currents and thus the switching currents in the respective branches, the amplitude of the inner eye relative to the outer eyes can be controlled. This is useful to pre-compensate the non-linear modulation response of MZMs biased at the quadrature operating point [LZH13]. The load features inductive peaking. The inductors are placed after the load resistance to separate their parasitic capacitance from the output node. Since a large resistance precedes them, the achieved quality factor is not of significant concern as long as the self-resonance frequency is larger than the operating frequency. Between the output stage and preceding circuits, common-collector (CC) buffers are placed. These buffers have a voltage gain of approximately one but are used to decouple the output stage's input capacitance from the previous stage. Resistors are used to bias these buffers, since current sources would add to the large loading capacitance, increasing the danger of ringing in the buffer. Since the balun's $V_{cc}$ is 1.8 V, the signal's DC potential needs to be shifted upward to match the supply of the emitter followers. Therefore, a cascode pre-driver is placed between the balun and the output stage. It has a gain of only $A_{v,diff} = 1.4$. This is chosen to counteract losses in the wiring but at the same time not waste excessive power. Its schematic is presented in Figure 2.17. In Figure 2.18, the complete chip photograph is shown. The long lines connecting the output stage can be seen between the pre-driver and output stage. The lines are length-matched, and as can be seen, they cross each other. This crossing is due to the folded layout of the output stage. The high-gain path is in the center and is flanked by the low-gain branch. This is done so the parasitic resistance in the emitter connections is lower for the stage with the higher $g_m$, where this resistance has a stronger effect. Therefore, the lines have to split up and cross each other. To minimize coupling effects, the lines cross at a $90^{\circ}$ angle and are only 4 $\mu$m wide. The



Figure 2.18: Chip photograph of the PAM-4 Combiner including balun (blue), pre-driver (orange), and combiner output stage (red) [1] ©IEEE.

resulting coupling is always less than -35 dB in the frequency range of interest. The driver is designed to reach $V_{out} = 2 V_{pp,diff}$, while consuming a total of 315 mW. The two output differential pairs alone account for 54 % of that power. Multiple supply voltages are used for the blocks to save some power on the chip: 1.8 V for the baluns, 2.7 V for the pre-drivers and CC buffers, and 4 V for the output stage.
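The current-combining principle of the output stage can be illustrated numerically. The following is a minimal sketch, assuming ideal, fully switching differential pairs and a purely resistive effective load. The function names, the tail current, and the load value are hypothetical and do not represent the actual bias point of the fabricated driver.

```python
def pam4_levels(i_lsb_ma, ratio=2.0, r_load_ohm=25.0):
    """Differential output voltage in V for each (MSB, LSB) bit pair.

    Assumptions: each differential pair steers its full tail current
    to one side, and r_load_ohm is a hypothetical effective load.
    """
    i_msb_ma = ratio * i_lsb_ma
    levels = {}
    for msb in (0, 1):
        for lsb in (0, 1):
            # A bit value of 1 steers the current to the positive side
            i_diff_ma = (2 * msb - 1) * i_msb_ma + (2 * lsb - 1) * i_lsb_ma
            levels[(msb, lsb)] = i_diff_ma * 1e-3 * r_load_ohm
    return levels


def eye_spacings(levels):
    """Voltage spacing between adjacent PAM-4 levels (lower, inner, upper)."""
    ordered = sorted(levels.values())
    return [hi - lo for lo, hi in zip(ordered, ordered[1:])]


# Nominal 2:1 scaling gives three equal eyes
print(eye_spacings(pam4_levels(10.0)))
# A larger MSB/LSB current ratio widens the inner eye relative to the outer eyes
print(eye_spacings(pam4_levels(10.0, ratio=2.4)))
```

Changing `ratio` corresponds to adjusting the bias current ratio of the two branches, which is the mechanism used for the level pre-distortion described above.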

## 2.2.3 Experimental Results

Figure 2.19a shows the S-parameters for the entire PAM-4 combiner. Ports 1 and 2 are the inputs. Port 3 is the differential output port. Due to similar design errors as in the balun, the small-signal gain has a resonance around 2 GHz. Since it is the same balun design, the same low-frequency behavior is expected. The LSB path starts to roll off above 35 GHz. This originates in an underestimation of the inductive degeneration formed by the emitter connections of the differential pair. The initial design was conducted using only parasitic extraction, which did not adequately capture this branch's inductance. Since the output stage is not operating linearly when fully switching, this bandwidth limitation is, to first order, not critical for high-speed performance. Nevertheless, this sets a limit for the eye quality and maximally achievable data rate. The



Figure 2.19: a) S-parameters of the PAM-4 combiner. Ports 1 and 2 are the inputs; Port 3 is the differential output. b) Group delay for the gain paths [1] ©IEEE.

inputs and outputs are very well matched up to 60 GHz. A worse output match is expected since the design uses  $60 \Omega$  load resistances to slightly boost the output voltage swing.  $S_{31}$  and  $S_{32}$  are again higher than the fully EM-simulated results. However, they fall well between the nominal and best-case transistor corners. In Figure 2.19b, the group delay responses for the gain paths are plotted. They show the resonance effects at low frequencies and some measurement artifacts around 40 GHz. The variation is only ±3 ps from 10 GHz to 67 GHz. However, above the LSB roll-off frequency, the group delay starts to deviate, leading to a difference of up to 3 ps.

Eye diagrams are measured using an SHF 12105 A bit pattern generator with two C603 B multiplexers. The generator produces four NRZ bit streams, which are multiplexed into two streams used as input signals. The input amplitude is set to 245 mV<sub>pp</sub>, and no pre-emphasis is used. The output is measured single-ended with an Agilent DCA 86100C mainframe with 70 GHz sampling heads. Probe connections and DC blocks are not de-embedded. Figure 2.20a shows the measured eye diagram for 80 GBd. An amplitude of more than 1 V<sub>pp,SE</sub> or 2 V<sub>pp,diff</sub> is reached. Figures 2.20b and 2.20c show the ability to pre-distort the PAM-4 level spacing by adjusting the bias in the current combiner stage. This is demonstrated at 80 GBd because the effect is easier to see there than at 100 GBd. Figure 2.21a presents the targeted 100 GBd or 200 Gb/s operation. The eyes for 80 and 100 GBd are clearly open. An asymmetry in the eye openings can be observed.



Figure 2.20: a) Nominal single-ended eye diagram at 80 GBd. b) and c) Eye diagrams showing level adjustment capabilities at 80 GBd [1] ©IEEE.



Figure 2.21: Output eye diagram at a) 100 GBd [1] ©IEEE and b) 120 GBd.

This has a twofold origin. First, the relative time of arrival of the input streams is hard to control in the measurement setup used here. This relative time is mainly determined by the multiplexer clock. However, slight differences in the cables from clock generation to the multiplexers can lead to noticeable timing delays. Even phase-matched cables can have a few picoseconds of difference in the propagation delay after some length. In our case, phase shifters or, more precisely, a variable time delay would be needed for the multiplexer clocks to align them properly. At the time of the measurements, this was not available in the lab. Second, the LSB roll-off causes a group delay offset between the two paths, eventually resulting in the same timing problem. Operation at 120 GBd is shown in Figure 2.21b. The eye is now very noisy with degraded eye openings due to the roll-off of the LSB, the group delay difference in the circuit, and imperfectly matched delays in the measurement setup. Eye openings can nevertheless still be observed.
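To put the timing mismatch into perspective, it helps to relate a skew of a few picoseconds to the symbol duration. The following is a minimal sketch. The 3 ps value is taken from the group delay difference reported above and is used here only as an illustrative skew; the function names are hypothetical.

```python
def unit_interval_ps(symbol_rate_gbd):
    """Symbol duration (unit interval) in ps for a symbol rate in GBd."""
    return 1e3 / symbol_rate_gbd


def skew_fraction(skew_ps, symbol_rate_gbd):
    """MSB/LSB timing skew expressed as a fraction of the unit interval."""
    return skew_ps / unit_interval_ps(symbol_rate_gbd)


for rate_gbd in (80, 100, 120):
    ui = unit_interval_ps(rate_gbd)
    frac = skew_fraction(3.0, rate_gbd)
    # PAM-4 carries two bits per symbol
    print(f"{rate_gbd} GBd ({2 * rate_gbd} Gb/s): UI = {ui:.2f} ps, "
          f"3 ps skew = {100 * frac:.0f} % of UI")
```

The same absolute skew consumes a growing share of the symbol period as the symbol rate rises, which is consistent with the stronger eye degradation observed at 120 GBd.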

# 2.3 Conclusion

In this chapter, driver circuits with integrated PAM-4 power combination were designed and tested. They work on the principle of current combining in the load, adding two NRZ streams into one PAM-4 data stream. Two variants of a distributed combiner in IHP's SG13S technology are presented. The focus was on a large bandwidth, which requires methods of reducing the amplifier cell input capacitance. Emitter degeneration (ED) and a capacitive divider (CDIV) are tested for this purpose. Their chip area is 0.51 mm<sup>2</sup> each, while consuming only 118 mW and 122 mW for the CDIV and ED variants, respectively. Both variants exhibit bandwidths greater than 80 GHz and low group delay variations, while the variant with emitter degeneration exhibits a superior S-parameter response. Eye diagrams up to 70 GBd were measured, but the output amplitude and achievable data rate fell short of the expected results. With their single-ended design and need for bias-tees, the distributed drivers are relatively complex to integrate into a system. Therefore, a different driver version based on a differential output stage was also evaluated. This driver was fabricated in IHP's SG13G2 technology. Since two differential data streams drive the output stage, a broadband balun was integrated to reduce the test and measurement effort. Including on-chip baluns and pre-drivers, the total chip area is only 0.6 mm<sup>2</sup>. It consumes a total of 315 mW from several supply voltages. The circuit showed some issues due to violations of symmetry, leading to a degraded frequency response as well as low-frequency resonances in the biasing. The high data rate helps to minimize the impact of the low-frequency resonances. For future designs, a stricter symmetry will be used, minimizing these biasing problems. The supply resonance cannot be avoided since damping this resonance will always impact the low-frequency response. Single-ended signals are always present in the balun, resulting in AC currents in the supply. One solution, employed in the receiver in Chapter 3, is a voltage regulator stabilizing the supply of single-ended stages at the cost of circuit area and power consumption. Nevertheless, 2 V<sub>pp,diff</sub> at 100 GBd (200 Gb/s) was measured, proving the capability of this type of driver circuit.

Table 2.2 compares the drivers designed in the scope of this work with recent publications for circuits featuring a similar functionality. All drivers in this work are very compact, especially when compared to the design in 0.25 µm InP DHBT [NWJ<sup>+</sup>18]. However, the design in [NWJ<sup>+</sup>18] was measured to an

|                         | DPC<br>ED   | DPC<br>CDIV | Diff        | [NWJ<sup>+</sup>18] | [RLA<sup>+</sup>17] | [IPGM22]    |
|-------------------------|-------------|-------------|-------------|---------------------|---------------------|-------------|
| Technology              | 130 nm SiGe | 130 nm SiGe | 130 nm SiGe | 0.25 µm InP DHBT    | 130 nm SiGe         | 130 nm SiGe |
| f<sub>T</sub> (GHz)     | 250         | 250         | 300         | 460                 | 300                 | 300         |
| Data rate (Gb/s)        | 80          | 140         | 200         | 256                 | 100                 | 96          |
| Input                   | SE          | SE          | SE          | SE                  | diff                | diff        |
| Output                  | SE          | SE          | diff        | diff                | diff                | diff        |
| V<sub>out,pp</sub> (mV) | 600         | 150         | 2000        | 900                 | 4000                | 2100        |
| P<sub>DC</sub> (mW)     | 122         | 118         | 315         | 2000                | 390                 | 197         |
| Eff. (pJ/b)             | 1.53        | 0.84        | 1.58        | 7.81                | 3.90                | 2.05        |
| Area (mm<sup>2</sup>)   | 0.51        | 0.51        | 0.6         | 4                   | 0.9                 | 0.945       |

Table 2.2: Comparison of analog PAM-4 combiner and driver circuits.

even higher data rate. For SiGe-based circuits, the differential driver from this chapter has the highest data rate. With an efficiency of 1.58 pJ/b, it is also very competitive.
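The efficiency row of Table 2.2 follows directly from the power consumption and the data rate. The following short sketch recomputes it, assuming the efficiency is defined as DC power divided by data rate. The dictionary and function names are introduced here for illustration only; the numbers are copied from the table.

```python
# (P_DC in mW, data rate in Gb/s), copied from Table 2.2
DRIVERS = {
    "DPC ED": (122.0, 80.0),
    "DPC CDIV": (118.0, 140.0),
    "Diff": (315.0, 200.0),
    "[NWJ+18]": (2000.0, 256.0),
    "[RLA+17]": (390.0, 100.0),
    "[IPGM22]": (197.0, 96.0),
}


def energy_per_bit_pj(p_dc_mw, rate_gbps):
    """Energy per bit: mW / (Gb/s) = 1e-3 W / (1e9 b/s) = 1e-12 J/b = pJ/b."""
    return p_dc_mw / rate_gbps


for name, (p_dc_mw, rate_gbps) in DRIVERS.items():
    print(f"{name:10s} {energy_per_bit_pj(p_dc_mw, rate_gbps):.3f} pJ/b")
```

Note that this metric does not account for the output swing, which differs considerably between the compared designs.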

Despite the distributed drivers' reduced speed, all drivers designed in this work's scope can drive SOH modulators. The differential driver achieved the goal of 100 GBd PAM-4, making it the better choice for integration in a transmitter prototype.

# 3 Linear Transimpedance Amplifier for Short Reach PAM-4 Receivers

Parts of the following section have been published in [2].

As discussed in Section 1.1, the central purpose of a transimpedance amplifier is to convert a current into an output voltage. In our case, this is a photocurrent generated by a photodiode. Ideally, the current-to-voltage conversion is done with the least added distortion and noise. In conjunction with an ever-increasing demand for higher data rates and, thus, bandwidths, this leads to several design trade-offs and choices. Modern TIAs additionally include a multitude of design features like supply voltage and temperature insensitive biasing, bandwidth or peaking control, manual or automatic gain adjustment, and differential outputs with offset cancellation [ANK<sup>+</sup>18]. These features make it possible to use them under different operating conditions but require a multi-stage approach including a multitude of supporting control circuitry. Figure 3.1 shows an example of such a system. The current created by a single photodiode is converted to a voltage via a single-ended transimpedance stage. A scaled version of this input stage is used as a replica for biasing and as a constant voltage for



Figure 3.1: Multi-stage TIA featuring a TI stage and differential voltage amplifiers for further amplification. [2] ©IEEE

the single-ended to differential conversion. A variable gain amplifier acts as a balun, and the differential signal is further amplified by a second variable gain amplifier (VGA). An output driver is necessary for the conversion to a standard 50 $\Omega_{\rm SE}$ or 100 $\Omega_{\rm diff}$ environment. Feedback loops to fix the operating point of the input stage by sinking any $I_{\rm DC}$ and for automatic gain control (AGC) using a power detector at the output ensure autonomous operation for a varying input power.

Since most performance parameters require making compromises and tradeoffs, the discussed designs in this chapter follow a specific project and application. The target in this project was a 100 GBd TIA with a maximum transimpedance of more than  $4 k\Omega$ . Approximately 20 dB gain tuning range is necessary to accept 10 dB optical input power variation while keeping the output swing constant. Power consumption was targeted to stay below 200 mW. An additional design goal was to circumvent the use of area-consuming inductors. The design was implemented using IHP's SG13G2 technology [HBB<sup>+</sup>10].
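The factor of two between the optical input power range and the required gain tuning range follows from the linear relation between optical power and photocurrent. As a short sketch, assume that the output swing is $v_{\rm out} = Z_{\rm T}\, i$ with a photocurrent proportional to the optical power; the symbols $Z_{\rm T,max}$, $Z_{\rm T,min}$, $P_{\max}$, and $P_{\min}$ are introduced here only for illustration. Keeping $v_{\rm out}$ constant then requires

$$20\log_{10}\frac{Z_{\rm T,max}}{Z_{\rm T,min}} = 20\log_{10}\frac{P_{\max}}{P_{\min}} = 2 \cdot 10\log_{10}\frac{P_{\max}}{P_{\min}} = 2 \cdot 10\,\text{dB} = 20\,\text{dB}.$$

A change of 10 dB in optical power is a factor of ten in photocurrent, which corresponds to 20 dB when expressed as a voltage or current ratio.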

The input or transimpedance stage interfaces the photodiode and dominates the overall performance. The noise behavior is primarily determined by the first stage of an amplifying chain, as stated by the Friis formula for noise [Fri44]. Therefore, the design of this input stage is discussed in more detail in Section 3.1. In higher-order modulation formats like PAM-4, amplitude distortions or, more generally, non-linearity provide another source of error in the receiver chain. Section 3.2 focuses on linearity constraints in the entire amplification chain. For the operation, some control loops and circuits are needed; these are discussed in Section 3.3.

# 3.1 High-Speed Transimpedance Input Stage

The photodiode generates a current proportional to the optical input power:

$$i_{\rm PIN} = RP. \tag{3.1}$$

*R* is the responsivity depending on the wavelength and quantum efficiency. This current has a direct current (DC) portion representing the average input power:

$$I_{\rm DC} = RP_{\rm avg}. \tag{3.2}$$

This $I_{DC}$ should not flow into the input stage. Therefore, current sinks are used to remove this signal content. It is common to use NMOS transistors for this purpose if available. HBTs are not ideal for the $I_{DC}$ sink due to their current shot noise and amplified base resistance thermal noise. The alternating current (AC) amplitude is defined by the extinction ratio ER:

$${\rm ER} = \frac{P_{\max}}{P_{\min}} \tag{3.3}$$

$$=\frac{i_{\max}}{i_{\min}},\tag{3.4}$$

a measure of how well the light can be turned off. This results in the modulation amplitude

$$i_{\rm AC,pp} = 2RP_{\rm avg} \frac{\rm ER - 1}{\rm ER + 1}. \tag{3.5}$$
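Equations (3.2) and (3.5) can be evaluated for a given optical operating point. The following is a minimal sketch; the responsivity, average power, and extinction ratio used in the example are hypothetical values chosen for illustration and are not specifications of the photodiode used in this work.

```python
def photocurrent(responsivity_a_per_w, p_avg_dbm, er_db):
    """Return (I_DC, i_AC,pp) in A according to Equations (3.2) and (3.5)."""
    p_avg_w = 1e-3 * 10 ** (p_avg_dbm / 10)  # dBm to W
    er = 10 ** (er_db / 10)                  # dB to linear power ratio
    i_dc = responsivity_a_per_w * p_avg_w
    i_ac_pp = 2 * i_dc * (er - 1) / (er + 1)
    return i_dc, i_ac_pp


# Hypothetical operating point: R = 0.8 A/W, P_avg = 0 dBm, ER = 6 dB
i_dc, i_ac_pp = photocurrent(0.8, 0.0, 6.0)
print(f"I_DC = {i_dc * 1e3:.3f} mA, i_AC,pp = {i_ac_pp * 1e3:.3f} mA")
```

For a very large extinction ratio, the peak-to-peak current approaches $2 I_{\rm DC}$, while a low extinction ratio leaves most of the photocurrent as a DC component that the current sink has to absorb.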

When designing the input stage, an exact photodiode model is needed since the PD is usually not matched to a typical 50 $\Omega$ environment but is instead directly connected and thus part of the frequency response of the TIA. The most simplified photodiode model is a current source for the photocurrent with a parallel capacitance due to the diode's PN-junction. This model, however, does not capture the behavior accurately enough for high-speed applications. A more complete model is shown in Figure 3.2a. This example, in particular, is designed to represent the behavior of bare die diodes, which are co-packaged with the TIA die. The loss $R_p$, inductance $L_p$, and parasitic capacitance $C_{pad}$ of the wiring, including the pads of the diode's die, are represented in this model. With this, key performance metrics like transimpedance, input-referred noise, and linearity can be simulated. The parasitics add a low-pass characteristic to the diode. However, the frequency response of the optical-to-electrical conversion is not fully modeled in this approach, and, therefore, a complete OE channel simulation is impossible. A structure presented in Figure 3.2b is used to include the transit-time delay of the carriers in the PD [GTH<sup>+</sup>03, WYW<sup>+</sup>22].



Figure 3.2: Two flavors of more complete photodiode models. a) PD model for AC and noise simulations. b) PD model to be used for OE S-parameter simulations.

This characteristic is introduced by replacing the ideal current source with a low-pass formed by  $R_1, R_2, C_1$ , and a voltage-controlled current source with the transconductance G.

There are multiple options for creating a transimpedance stage. Figure 3.3 shows some examples using shunt-shunt feedback topologies. This topology is widely used for high-speed and low-noise transimpedance front ends, mainly because of its low input and output impedances and low-noise performance. Figure 3.3a is the ideal representation based on an idealized main amplifier used for the derivation of these advantages. Due to the feedback, the input impedance, neglecting the amplifier's $Z_{in,A}$, is given by:

$$Z_{\rm in} = \frac{R_{\rm F}}{A_0 + 1}. \tag{3.6}$$

Figure 3.3: Shunt-feedback transimpedance (TI) front-ends. a) Generalized shunt-shunt feedback stage. b) Common-emitter shunt-shunt feedback stage. c) Cascode amplifier within TI stage.

To first order, the shunt feedback stage has two poles due to two reactive elements: the input and output capacitances of the stage. The input capacitance includes the photodiode capacitance, amplifier input, and parasitics. The load, amplifier output, and parasitics determine the output capacitance. With the load resistance $R_{\rm C}$, the main amplifier will have a gain of $A_0$ and a time constant $\tau_A$. The transimpedance of this second-order system is then given by [Abr82]:

$$Z_{\rm T}(s) = -Z_{\rm T,0} \frac{1}{\frac{1}{\omega_n^2} s^2 + \frac{1}{\omega_n Q} s + 1}, \tag{3.7}$$

with

$$Z_{\rm T,0} = \frac{A_0}{A_0 + 1} R_{\rm F},\tag{3.8}$$

$$\omega_n = \sqrt{\frac{A_0 + 1}{\tau_A R_F C_{\rm in}}},\tag{3.9}$$

$$Q = \frac{\sqrt{\tau_{\rm A} R_{\rm F} C_{\rm in} (A_0 + 1)}}{\tau_{\rm A} + R_{\rm F} C_{\rm in}}. \tag{3.10}$$

This two-pole model makes the analysis independent of the type of amplifier circuit. The simplification is valid as long as a dominant pole (e.g. created by $C_{out}$ and $R_{out}$) defines the time constant of the amplifier core. Through the choice of amplifier and component values, $\omega_n$ and Q and thus the response $Z_T(s)$ can be adjusted. In order to reduce inter-symbol interference and jitter, one aims for a Bessel-Thomson response offering a linear phase or constant group delay. For details see Appendix A. In order to get the desired closed-loop Bessel response with a damping ratio of $\zeta = \sqrt{3}/2$, the open-loop poles must be related approximately by

$$\omega_{p2} = (3A_0+1)\omega_{p1}, \tag{3.11}$$

where $A_0$ is the DC gain, $\omega_{p1}$ the pole frequency associated with the input and $\omega_{p2}$ the pole at the output of the main amplifier [Säc17]. In bipolar technologies the most used solutions replacing the idealized main amplifier are presented in Figures 3.3b and 3.3c, a common-emitter amplifier or a cascode structure.
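The two-pole model of Equations (3.7) to (3.11) can be checked numerically. The following is a minimal sketch, assuming that the open-loop poles are $\omega_{p1} = 1/(R_{\rm F} C_{\rm in})$ and $\omega_{p2} = 1/\tau_{\rm A}$. Only $R_{\rm F} = 260\,\Omega$ corresponds to a value used in this chapter; the gain and the input capacitance are hypothetical, and the function names are introduced for illustration.

```python
import math


def tia_two_pole(a0, tau_a, r_f, c_in):
    """Return (Z_T0, omega_n, Q) following Equations (3.8) to (3.10)."""
    z_t0 = a0 / (a0 + 1) * r_f
    omega_n = math.sqrt((a0 + 1) / (tau_a * r_f * c_in))
    q = math.sqrt(tau_a * r_f * c_in * (a0 + 1)) / (tau_a + r_f * c_in)
    return z_t0, omega_n, q


def bessel_tau_a(a0, r_f, c_in):
    """Amplifier time constant that satisfies Equation (3.11).

    Assumption: omega_p1 = 1 / (R_F * C_in) and omega_p2 = 1 / tau_A.
    """
    return r_f * c_in / (3 * a0 + 1)


# Hypothetical example values: A_0 = 10, C_in = 60 fF
a0, r_f, c_in = 10.0, 260.0, 60e-15
tau_a = bessel_tau_a(a0, r_f, c_in)
z_t0, omega_n, q = tia_two_pole(a0, tau_a, r_f, c_in)

print(f"Z_T0 = {z_t0:.1f} Ohm")
print(f"f_n  = {omega_n / (2 * math.pi) / 1e9:.1f} GHz")
print(f"Q    = {q:.4f}, damping ratio = {1 / (2 * q):.4f}")
```

With the pole spacing of Equation (3.11), the resulting quality factor lies close to $1/\sqrt{3}$, which corresponds to the damping ratio $\zeta = 1/(2Q) = \sqrt{3}/2$ of the Bessel response.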

# 3.1.1 Design of the Common-Emitter Shunt Feedback Stage

For this project, a common-emitter shunt-shunt feedback stage was selected. Figure 3.4 shows a simplified schematic of the input stage. The $I_{DC}$ current sink is simplified as a current source. Its detailed design is discussed further in Section 3.2.2. The main amplifier ($Q_1$) is a 7-finger common-emitter (CE) amplifier. In order to raise $V_{CE,Q1}$ and thus $V_{out,DC}$, a common-collector (CC) buffer ($Q_2$) is placed in the feedback path. This raised voltage is necessary to directly



Nf: Number of emitter fingers with  $A_e = 70 \text{ nm} \times 900 \text{ nm}$  each.

Figure 3.4: Simplified schematic of the CESF transimpedance stage.

connect to the differential post-amplifiers where some headroom is needed for the current sources. It also helps to avoid saturation in the input stage's transistor. However, with $V_{CE,Q1} \approx 2V_{BE} \approx 1.8$ V this is above the open-base collector-emitter breakdown voltage ($BV_{\rm CEO} = 1.6$ V). At this operating point, a weak avalanche current indeed starts to flow from the collector to the base. This is indicated by a reduced but still positive external base current, compared to the expected base current at lower $V_{CE}$. However, since the base is never open and the current generated by this onset of breakdown is not creating runaway effects, this setup was still considered safe to use. An emitter degeneration resistor was deliberately not used since it would reduce gain and increase the input impedance given in Equation (3.6). It would also add to the total noise generated. With degeneration, however, the transfer function could potentially be optimized and, thus, the group delay response flattened [KHSE10]. The noise generated by this stage must be considered to understand some tradeoffs. A simplified analysis of the input-referred noise of a common-emitter shunt feedback transimpedance stage is given by [Säc17]:

$$\begin{aligned}
I_n^2(f) = {} & \underbrace{\frac{4k_{\rm B}T}{R_{\rm F}}}_{R_{\rm F}\text{ thermal noise}} + \underbrace{2qI_{\rm B} + \frac{4k_{\rm B}TR_{\rm bb}}{R_{\rm F}^2} + 4k_{\rm B}TR_{\rm bb} (2\pi C_{\rm PD})^2 f^2}_{\text{Base current shot and parasitic base resistance thermal noise}} \\
& + \underbrace{\frac{2qI_{\rm C}}{(g_{\rm m}R_{\rm F})^2} + 2qI_{\rm C}\left(\frac{2\pi(C_{\rm D}+C_{\pi}+C_{\mu})}{g_{\rm m}}\right)^2 f^2}_{\text{Collector current shot noise}}.
\end{aligned} \tag{3.12}$$

The individual contributors are marked in the equation. It is important to note that both the base resistance and the collector current contribute components that scale with $f^2$. This simplified equation neglects the noise generated by the emitter resistance and the onset of avalanche collector-base current due to the high $V_{\text{CE}}$.
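The relative weight of the contributors in Equation (3.12) can be explored with a short script. The following is only a sketch under stated assumptions: the transconductance is approximated by the ideal value $g_{\rm m} = I_{\rm C}/V_{\rm T}$, and all device values except $R_{\rm F} = 260\,\Omega$ are hypothetical placeholders rather than extracted parameters of the actual design. The argument `c_sum` stands for $C_{\rm D} + C_{\pi} + C_{\mu}$.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # elementary charge in C


def input_noise_terms(f, r_f, i_b, i_c, r_bb, c_pd, c_sum, temp=300.0):
    """Return the three groups of Equation (3.12) in A^2/Hz.

    Assumption: g_m = I_C / V_T (ideal bipolar transconductance).
    """
    g_m = i_c * Q_E / (K_B * temp)
    r_f_thermal = 4 * K_B * temp / r_f
    base = (2 * Q_E * i_b
            + 4 * K_B * temp * r_bb / r_f ** 2
            + 4 * K_B * temp * r_bb * (2 * math.pi * c_pd) ** 2 * f ** 2)
    collector = (2 * Q_E * i_c / (g_m * r_f) ** 2
                 + 2 * Q_E * i_c * (2 * math.pi * c_sum / g_m) ** 2 * f ** 2)
    return r_f_thermal, base, collector


# Hypothetical device values for illustration only
params = dict(r_f=260.0, i_b=50e-6, i_c=5e-3, r_bb=10.0,
              c_pd=50e-15, c_sum=100e-15)

for f_ghz in (1, 25, 50, 75):
    terms = input_noise_terms(f_ghz * 1e9, **params)
    density_pa = math.sqrt(sum(terms)) * 1e12
    print(f"{f_ghz:3d} GHz: {density_pa:.2f} pA/sqrt(Hz)")
```

The feedback resistor term is independent of frequency, while the base resistance and collector shot noise groups grow with $f^2$ and therefore dominate towards the upper band edge.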

With Equations (3.7), (3.11) and (3.12) the design constraints and the procedure can be summarized:

#### • Input transistor operating point.

A higher bias current will lead to more noise. The base current shot noise is of particular concern. The shot noise of $I_{\rm C}$ is scaled by $\frac{1}{g_{\rm m}^2}$, and the transconductance increases with current. Therefore, an increasing collector current has less effect. In this design, $V_{\rm BE} = 0.91$ V was selected to ensure linearity even at high input amplitudes. This value is on the higher side, but the transistor is still not at peak $f_{\rm T}$.

#### • Input transistor size.

Increasing the transistor size similarly affects the noise currents as raising  $V_{\text{BE}}$ . However, the important difference is that a larger emitter area will result in smaller parasitic resistances. Additionally, the device capacitances rise with the transistor size. This has to be traded off



Figure 3.5: Simulated transimpedance and input-referred noise current density (irnd) of the final input stage design.

against the bandwidth and frequency response. A transistor with seven fingers of $A_{\rm E} = 70 \,\mathrm{nm} \times 900 \,\mathrm{nm}$ was chosen, mainly due to the reduction in base resistance.

#### • Choice of R<sub>C</sub> and R<sub>F</sub>.

Both resistances are then used to adjust the poles of the system. Simulating the open-loop response and poles is helpful for behavioral understanding, but finally, the closed-loop response has to be optimized. After optimization the values  $R_{\rm F} = 260 \,\Omega$  and  $R_{\rm C} = 70 \,\Omega$  were selected.

#### • Common-collector buffer size and current.

This buffer should be sized as small as possible to reduce parasitics. However, two things have to be considered: Collector current and size of this transistor need to be sufficient to not create distortions at high signal levels. The parasitic base resistance plays an essential role in this transistor's noise contribution.

Most design steps affect each other. Therefore, an iterative process cycling through schematic, layout, parasitic extraction or EM simulation, and adjustments is necessary.

This design adds an input inductance $L_{in}$. It provides noise matching and is implemented simply by the connection from the pad to the input transistor. In Figure 3.5, the transimpedance and input-referred noise current density are shown. The effect of $L_{in}$ together with the PD-TIA connection is seen as the dip in the noise around 75 GHz. A transimpedance of 250 $\Omega$ with a



Figure 3.6: a) Simulation setup to compare main amplifier topologies. b) Pole comparison: Cascode vs. common-emitter configuration in the main amplifier.

bandwidth of 79 GHz is realized. The average input-referred noise current density is  $13.3 \text{ pA}/\sqrt{\text{Hz}}$ .

Using a cascode topology as the main amplifier would reduce the input capacitance by reducing the Miller capacitance. However, in this case, a cascode was found not to provide any benefit since the output-associated $\tau_{RC}$ and the resulting second open-loop pole are not scaled accordingly. This is because interconnect parasitics and the following stages are independent of the main amplifier topology. As a result, the damping ratio in the closed-loop transfer function is reduced, creating excessive peaking and group delay variation towards $\omega_n$. The simulation setup shown in Figure 3.6a is used to simulate the poles of the transimpedance transfer function. Both topologies use HBTs with $A_e = 4 \times 70 \text{ nm} \times 900 \text{ nm}$, $V_{BE} = 0.9 \text{ V}$, and $V_{CE,common-emitter} = 2V_{BE}$; $V_{\text{CE,cascode}} = V_{\text{BE}}$ for each transistor. Figure 3.6b shows the main poles of these transimpedance amplifier designs to visualize this issue. In [KBE21], this problem is mitigated by adding a parallel capacitor to $R_{\rm F}$, effectively negating the advantage of a cascode in broadband amplifiers and adding a further point where device variation could create a problem. Reducing the output $\tau_{\rm RC}$ could be realized only by reducing the load resistance, which sacrifices open-loop gain and increases the input impedance. When $R_{\rm C}$ drops too far, the entire stage's output voltage is limited by the linear voltage swing at the output. Therefore, a common-emitter shunt-feedback stage with a common-collector buffer to boost the output DC level was selected for this design. In designs where a CC buffer can be used to mask the load capacitance, or where very large photodiode capacitances need to be interfaced, a cascode might, however, be beneficial to reduce $C_{in,total}$.
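The damping argument above can be illustrated with the textbook second-order model of a shunt-feedback TIA, in which the voltage amplifier has DC gain $A$ and a single pole with time constant $T_A$. The following is a minimal sketch under that assumption, not the simulation setup of Figure 3.6a. Only $R_{\rm F} = 260\,\Omega$ is taken from this design; the gain, the amplifier time constant, and both capacitance values are hypothetical numbers chosen for illustration.

```python
import math

def shunt_feedback_tia(gain, r_f, c_t, t_a):
    """Second-order model of a shunt-feedback TIA.

    gain : DC voltage gain A of the main amplifier (assumed single-pole)
    r_f  : feedback resistance in ohm
    c_t  : total input capacitance in farad
    t_a  : time constant of the amplifier's output pole in seconds
    Returns (low-frequency transimpedance, natural frequency in Hz, damping ratio).
    """
    r_t = gain / (gain + 1.0) * r_f
    w_n = math.sqrt((gain + 1.0) / (r_f * c_t * t_a))
    zeta = (r_f * c_t + t_a) / (2.0 * math.sqrt((gain + 1.0) * r_f * c_t * t_a))
    return r_t, w_n / (2.0 * math.pi), zeta

# Hypothetical values: the "cascode-like" case halves the input capacitance
# while the output time constant t_a stays the same.
A, R_F, T_A = 4.0, 260.0, 2.5e-12
common_emitter = shunt_feedback_tia(A, R_F, 120e-15, T_A)
cascode_like = shunt_feedback_tia(A, R_F, 60e-15, T_A)

for name, (r_t, f_n, zeta) in (("larger C_in", common_emitter),
                               ("smaller C_in", cascode_like)):
    print(f"{name}: R_T = {r_t:.0f} ohm, f_n = {f_n / 1e9:.0f} GHz, zeta = {zeta:.2f}")
```

With these numbers the smaller input capacitance raises the natural frequency but lowers the damping ratio, which is the mechanism the paragraph describes. The real circuit has additional poles from the buffer and interconnects, so the sketch only shows the trend.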

# 3.2 Linearity Considerations and Post-Amplifiers

As stated before, excellent low-noise performance requires a high transimpedance gain in the first stage. However, this can already compress the stages following the TI stage at high input powers. Since an optical power range of approximately 10 dB results in a 20 dB input current range, achieving both high linearity and low noise becomes a major challenge in designing a receiver. A fixed transimpedance input stage has the drawback that its output amplitude variation can be substantial with varying input power. In lower bandwidth applications, variable transimpedance stages that modify the feedback resistance and load resistance are employed [VMD<sup>+</sup>22, AVI<sup>+</sup>23]. When the input-referred noise performance is not as critical for higher input powers, the transimpedance is lowered to lessen the linearity demands on the post-amplifiers. However, these tuning options are challenging due to the additional parasitic capacitances and the increased control circuit complexity. In this single-ended design, changing the load resistance would also modulate $V_{\rm CE}$ if no additional measures are implemented. This modification would also need to be implemented in the replica. Differential designs are beneficial here since they could modify the differential load resistance connected between the two outputs. This load is then free of a DC bias current and thus removes the $V_{\rm CE}$ dependence on this setting.
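The doubling of the range in dB follows from the photodiode converting optical power linearly into current, while the electrical range is expressed on an amplitude (20 log) scale. A short sketch of this conversion, assuming an ideal linear photodiode; the function name is illustrative only:

```python
import math

def optical_db_to_current_db(optical_range_db):
    """Convert an optical power range in dB to the photocurrent range in dB."""
    power_ratio = 10.0 ** (optical_range_db / 10.0)  # optical power uses 10*log10
    current_ratio = power_ratio                      # i_ph = R * P_opt for a linear photodiode
    return 20.0 * math.log10(current_ratio)          # current amplitude uses 20*log10

print(optical_db_to_current_db(10.0))
```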

### 3.2.1 Variable Gain Amplifiers

Broadband variable gain amplifiers are typically based on current-steering type structures, sometimes in a Gilbert-cell type fashion. The first VGA in this design was selected as a current steering type. Its schematic is shown in Figure 3.7. The current steering type VGA has the significant advantage that the capacitance at the output node is smaller than when using a Gilbert-Cell with the cross-connected common-base transistors doubling the collector to



Nf: Number of emitter fingers with  $A_e = 70 \text{ nm} \times 900 \text{ nm}$  each.

Figure 3.7: Schematic of the first current steering VGA used as a balun.

substrate capacitance at the output node (see Figure 3.9). The gain is adjusted by  $V_{a+} - V_{a-}$  where the bias current is directed between the common-base transistor in the signal path or the dummy pair.

Since the voltage levels of the transimpedance stage do not allow for a buffer between the TIA and this VGA, the input impedance, especially the capacitance, of this balun directly loads the TI-stage. This impedance has to be considered during the design of the poles in  $Z_{\rm T}$ .

This VGA acts as a balun since $v_{out} = A_v (v_{TIA} - v_{Replica})$, where the AC amplitude $v_{Replica}$ is ideally zero. The replica is a scaled version of the TI stage with no input signal. It is used for biasing, and since its output impedance is low, it can also be used as a broadband low-impedance input for this balun. To filter any high-frequency noise generated by the replica, a shunt capacitance of $\approx 1 \text{ pF}$ is placed at its output. Otherwise, this replica would generate uncorrelated high-frequency noise, which adds to the total input-referred noise. A two-finger device was selected since it offered reasonable linearity and gain with acceptable loading of the TI stage and acceptable power consumption. Additionally, emitter degeneration is necessary to improve the bandwidth and prevent the common-emitter transistors from generating too much distortion at high input



Figure 3.8: Gain and THD of the balun vs. single-ended input signal swing. The gain setting is controlled to keep the output amplitude of the full chain, including the second VGA, constant.

levels. Unfortunately, this emitter degeneration increases the amplitude and phase imbalance at the output. This is why only a moderate value of $24 \Omega$ was chosen. At the output, common-collector transistors with resistive biasing serve as level-shifting buffers to the next stage. One disadvantage of a VGA based on the structure in Figure 3.7 is the modulation of the output DC potential with the gain setting. The output potential starts to rise when reducing the gain and thus the current through the in-path common-base transistors and load resistances. The following stages need to be able to handle this shift. This is fine for the current-source-biased differential stages used here as long as the base-collector diode does not become forward-biased. Figure 3.8 shows the selected gain and THD with respect to the input voltage swing. The gain setting is controlled to keep the output amplitude of the full chain, including the second VGA, constant. It can be seen that the tuning range of the first VGA is limited to about 5 dB to avoid excessive distortion. The main source of the non-linearity is the decreasing $I_{\rm C}$ in the in-path common-base transistors, resulting in an operating point that is too low for the signal swings present.

A second VGA is implemented to enhance the receiver system's total dynamic range. This time, a Gilbert-Cell-type circuit is used. It is shown on the left side of Figure 3.9. This Gilbert-Cell has a higher output capacitance. Therefore, a lower load resistance is used to maintain the overall bandwidth. This stage also features a variable capacitor bypassing a stronger emitter degeneration. The larger $R_E$ is needed since the balun's limited tuning range creates a large voltage swing at the input of this second VGA. This makes it necessary to operate this VGA with a gain <0 dB for most of the gain settings. With the variable bypass



Nf: Number of emitter fingers with  $A_e = 70 \text{ nm} \times 900 \text{ nm}$  each.

Figure 3.9: Schematic of the second VGA based on a Gilbert cell, including frequency peaking and output driver.

capacitor at $R_E$, the frequency response can be slightly adjusted for more or less gain peaking. The transistor sizing and operating currents are similar to the balun. The second VGA is also decoupled with common-collector buffers. The linearity in both variable gain amplifiers could be improved by using a higher bias current, thus spending more power. The achieved performance was deemed sufficient, so the power consumption was kept as low as possible.

The gain setting voltages are derived for both VGAs from a resistive voltage divider. An example is shown in Figure 3.10 for the balun. One resistive divider provides a constant $V_{BE}$ for the in-path common-base (CB) transistors. A second divider generates a similar voltage for the bypass transistors. By loading this second divider via an NMOS transistor, $V_{a-}$ can be lowered. $V_{a+} - V_{a-}$ then determines the ratio at which the total collector current is steered between in-path and bypass transistors, controlling the gain. This type of control voltage generation is compact and largely immune to process variation of the sheet resistance since it only depends on the ratio of values. The exact control voltage value is not too critical since a current source biases the common-base transistors. By adjusting the resistance ratios, the


Figure 3.10: Gain control voltage generation for the first VGA. The second VGA follows the same structure with adjusted resistance values.

gain tuning range can be manipulated, and thus, all VGAs use the same $V_{\text{gain}}$ but feature a different range and $A(V_{\text{gain}})$ behavior. The bypass path via the NMOS transistor is sized so that the gain cannot be reduced below a certain point, preventing a sub-optimal operating point in terms of non-linear distortions in the VGAs.
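How the control voltage difference translates into gain can be sketched with the ideal exponential law of a bipolar pair. This is a simplified illustration that assumes ideal, matched common-base transistors, a fixed tail current, and a gain proportional to the current in the in-path transistor; emitter degeneration, bias-dependent bandwidth, and the NMOS limit described above are ignored. The thermal voltage value and function names are assumptions for this sketch.

```python
import math

V_T = 0.02585  # thermal voltage near 300 K in volts (assumed temperature)

def in_path_fraction(delta_v, v_t=V_T):
    """Fraction of the tail current steered into the in-path transistor
    for a control voltage difference delta_v = V_a+ - V_a- (ideal pair)."""
    return 1.0 / (1.0 + math.exp(-delta_v / v_t))

def relative_gain_db(delta_v, v_t=V_T):
    """Gain relative to the fully steered case, assuming gain ~ in-path current."""
    return 20.0 * math.log10(in_path_fraction(delta_v, v_t))

for dv_mv in (100, 50, 0, -25):
    print(f"dV = {dv_mv:4d} mV -> {relative_gain_db(dv_mv * 1e-3):6.2f} dB")
```

In this model an equal split ($V_{a+} = V_{a-}$) corresponds to about 6 dB of gain reduction, and a few tens of millivolts cover the tuning range, which is why only the ratio set by the dividers matters.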

Following the second VGA, a differential cascode serves as a differential driver for standard 100  $\Omega_{\text{diff}}$  environments. Again, this driver features emitter degeneration. The bias current is selected so only little distortion is generated at the nominal output swing of 400 mV<sub>pp,diff</sub> while keeping the power consumption low. The schematic of this driver is also shown in Figure 3.9.

### 3.2.2 Diode Based Shunt At The Input

A single-ended transimpedance stage for direct detection needs a current sink path to ground for DC photocurrents (Figure 3.1). The goal is to keep the output resistance of this current sink high in order to avoid a negative impact on the transfer function. This current sink can be implemented by a transistor and scaled so that the output resistance and capacitance have negligible effects on the input impedance. In BiCMOS, a metal-oxide-semiconductor (MOS) transistor is beneficial since it provides only a capacitive load for the controlling circuitry. Due to its more flexible scaling options, for example in IHP's



Figure 3.11: TIA input stage with current source to ground for  $I_{ph,DC}$ .

SG13G2 technology, the impact on the input capacitance can be smaller than with an HBT. Also, the noise this current sink adds is smaller when using NMOS transistors, where mainly the channel noise is relevant. An HBT exhibits shot noise and amplified thermal noise due to the parasitic base resistance. The more substantial flicker noise in MOS transistors is of little concern in this application.

However, a variable transimpedance in the first stage is beneficial to ease the constraints on the VGAs. In this design, a combination of a controllable NMOS transistor and a diode-connected HBT is used in the current sink path. Figure 3.11 shows this addition in the input stage's schematic. The idea behind this setup is not to have an ideal current sink but to provide a shunt path to ground for the AC signal. This is possible since DC and AC are linked for a constant extinction ratio. High AC swings coincide with a high $I_{DC}$ (see Equations (3.2) and (3.5)). Therefore, the added shot noise of the HBT can also be accepted. At low optical input powers, where noise matters the most, only a little additional shot noise is generated. If an NMOS transistor is sized to enter the triode region for increasing $I_{DC}$, its $r_{DS}$ drops and it can be used as a shunt device. In this design, the diode improves on this approach by pushing $V_{\rm DS}$ down even further and additionally masking the MOS capacitance from the input. Figure 3.12a shows $C_{\text{shunt}}$ and $R_{\text{shunt}}$ for the sink with and without the diode-connected HBT in series with the controlled MOS transistor. $C_{\text{shunt}}$ and $R_{\text{shunt}}$ represent an equivalent RC parallel connection. It can be seen that the



Figure 3.12: a) Shunt capacitance and resistance of the modified current sink. b) Total harmonic distortion after the first VGA (balun) for a constant input AC swing and varying $I_{\text{DC}}$. $V_{\text{gain}}$ is set to have a constant $V_{\text{out}}$. $i_{n,\text{RMS}}$ of the first stage with and without the diode is also shown.



Figure 3.13: Equivalent circuit to derive admittance  $Y_{\rm D}$  looking into the diode.

capacitance is at the worst point  $\approx 2.5$  times smaller, and  $R_{\text{shunt}}$  is approximately half when using the diode.

This shunt resistance helps to lower the distortion in the entire signal chain. Figure 3.12b illustrates this for the THD after the first VGA (balun). For this simulation, the AC swing is kept constant, and $I_{DC}$ is increased while $V_{gain}$ is set to get a constant $V_{out}$. The improvement with $I_{DC}$ is visible, and it is further investigated in Section 3.4. In the same plot, the first stage noise is also shown. As expected, it rises with $I_{DC}$, but the proposed setup outperforms the reference design without a diode.

Figure 3.14: a) Exemplary values, extracted for a single SG13G2 npn HBT. b) Comparison of derived model with and without $r_{ee}$ to VBIC transistor model.

The capacitance of this diode even becomes negative for $I_{DC} > 1 \text{ mA}$. To explain this, the equivalent circuit in Figure 3.13 is used. A parasitic base resistance $r_{bb}$ is added to the most simplified hybrid-$\pi$ equivalent circuit models. The parasitic collector resistance will be ignored going forward since it does not affect the results. $r_{ee}$ can be ignored for a general investigation of the behavior. $i_x$ and $v_x$ are the test current and voltage used to arrive at

$$Y_x = \frac{i_x}{v_x} \tag{3.13}$$

$$Y_{\rm D} = \frac{\frac{Y_x}{r_{\rm ee}}}{Y_x + \frac{1}{r_{\rm ee}}} \tag{3.14}$$

$$\frac{i_x}{v_x} = \frac{g_{\rm m} + \frac{1}{r_{\pi}} + j\omega C_{\pi}}{\frac{r_{\rm bb}}{r_{\pi}} + 1 + j\omega C_{\pi} r_{\rm bb}}. \tag{3.15}$$

Further simplification is done by assuming  $r_{bb} \ll r_{\pi}$  and  $g_{m} \gg \frac{1}{r_{\pi}}$ . Then, separating into real and imaginary parts:

$$Y_x = \frac{g_{\rm m} + \omega^2 C_\pi^2 r_{\rm bb} + j\omega C_\pi (1 - g_{\rm m} r_{\rm bb})}{1 + (\omega C_\pi r_{\rm bb})^2}. \tag{3.16}$$

Equation (3.16) shows that the imaginary part of the admittance becomes negative if $g_{\rm m}r_{\rm bb} > 1$. Figure 3.14 presents some exemplary extracted values for a single HBT with $I_{\rm c} = 1$ mA and the match between this simple behavioral model and the full VBIC device. Solving for $Y_{\rm D}$ improves the match to the exact model but does not give more insight into the cause of the negative capacitance.
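The sign change of the equivalent capacitance can be checked numerically with the simplified admittance $Y_x = (g_{\rm m} + j\omega C_\pi)/(1 + j\omega C_\pi r_{\rm bb})$, which follows from (3.15) for $r_{bb} \ll r_{\pi}$ and $g_{m} \gg 1/r_{\pi}$. The component values below are hypothetical and are not the extracted values of Figure 3.14a; they are only chosen so that $g_{\rm m} r_{\rm bb}$ lies once below and once above one.

```python
import math

def diode_admittance(freq, g_m, r_bb, c_pi):
    """Simplified admittance of a diode-connected HBT (r_pi and r_ee neglected)."""
    w = 2.0 * math.pi * freq
    return (g_m + 1j * w * c_pi) / (1.0 + 1j * w * c_pi * r_bb)

def equivalent_parallel_rc(y, freq):
    """Return (R, C) of the parallel RC network with admittance y at freq."""
    return 1.0 / y.real, y.imag / (2.0 * math.pi * freq)

F = 10e9          # evaluation frequency in Hz (assumption)
G_M = 35e-3       # transconductance in S, order of magnitude for I_C near 1 mA (assumption)
C_PI = 20e-15     # base-emitter capacitance in F (assumption)

for r_bb in (10.0, 60.0):  # ohm, giving g_m * r_bb = 0.35 and 2.1
    r_eq, c_eq = equivalent_parallel_rc(diode_admittance(F, G_M, r_bb, C_PI), F)
    print(f"g_m*r_bb = {G_M * r_bb:.2f}: R = {r_eq:.1f} ohm, C = {c_eq * 1e15:+.1f} fF")
```

The equivalent capacitance is positive for the small base resistance and negative for the large one, in line with the condition $g_{\rm m} r_{\rm bb} > 1$.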



Figure 3.15: Simulated bandwidth and transimpedance for the complete chain, when the  $V_{\text{gain}}$  is set to have a constant  $V_{\text{out}}$  and  $I_{\text{in,DC}} = 1.05i_{\text{in,AC,pp}}$ .

This negative capacitance may be beneficial. However, as this shunt decreases the input resistance and the input capacitance simultaneously, the time constants of the transfer function no longer match. Altogether, this leads to additional frequency peaking with increasing input power. However, as can be seen in the measurements later (Figure 3.26b), the bandwidth of the VGAs drops when their gain is reduced. This bandwidth drop and the additional peaking in the input stage counteract each other and provide the more constant overall bandwidth shown in Figure 3.15.

# 3.3 Design of Control Loops and Control Circuits

Since TIAs for optical receivers are more complex systems than, e.g., a PAM-4 driver and combiner circuit from the previous chapters, they almost always need control circuitry or feedback loops to stabilize the operating point. In this section, the loops and their constraints are discussed for the TIA designed in the scope of this work.

### 3.3.1 First Stage Biasing

As mentioned in Section 3.1, the single-ended input stage is self-biased. After power-up, the circuit eventually reaches a stable bias point if sized correctly. However, any incoming DC photocurrent may disturb this bias point by flowing through the feedback resistor and creating additional voltage drops, thus



Figure 3.16: Distortion created by the output stage with DC imbalance.



Figure 3.17: Offset compensation loop. a) Block diagram and b) loop gain and phase simulations.

changing the input transistor's $V_{\text{BE}}$. As a result, the bias point depends on the incident optical power. The DC potential after the TI stage changes along with the bias point, and the following single-ended to differential converter will exhibit a DC offset in the differential output. This offset, in turn, will be amplified throughout the chain and leads to additional distortion in the differential stages, up to the point where the outputs are stuck at the supply rail. As an example, the effect of a DC offset in a differential amplifier is shown for the output driver alone in Figure 3.16. The simulated THD doubles as the offset rises from 30 mV to 80 mV, severely degrading the output signal quality. The effect is not as pronounced below 30 mV offset. In this design, the output DC potentials of the TIA and replica outputs are used for $I_{\text{DC}}$ compensation. This $I_{\text{DC}}$ compensation loop doubles as offset cancellation since it enforces equal DC potentials going into the differential post-amplification chain. Figure 3.17a shows the relevant receiver block diagram section, including the HBT-diode/MOS type current sink. If this loop were too fast, that is, if it had too much gain at higher frequencies, it would also cancel the low-frequency content of the wanted signal. Therefore, an operational amplifier (opamp) with a dominant first pole is used. The opamp uses MOS field-effect transistors (MOSFETs); its detailed schematic is given in Appendix C.1. Loop gain and loop phase simulation results are plotted in Figure 3.17b. The 0 dB gain mark is crossed in the kHz region, providing a high-pass corner frequency in that range. Due to the dominant first pole, this loop is relatively slow but safely stable with a phase margin of 90°.
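The link between a dominant-pole loop, its 0 dB crossing, and the roughly 90° phase margin can be made explicit with a single-pole loop gain $L(s) = L_0/(1 + s/\omega_p)$. This is a minimal sketch under that idealization. The DC loop gain and pole frequency are hypothetical values picked only so that the crossover lands in the kHz region; they are not taken from the simulated loop in Figure 3.17b.

```python
import math

def single_pole_loop(l0, f_p):
    """Return (crossover frequency, phase margin in degrees, high-pass corner)
    for a loop gain L(s) = l0 / (1 + s / (2*pi*f_p))."""
    f_c = f_p * math.sqrt(l0 ** 2 - 1.0)              # |L| = 1 at f_c
    phase_margin = 180.0 - math.degrees(math.atan(f_c / f_p))
    f_hp = (1.0 + l0) * f_p                           # pole of the signal path 1 / (1 + L)
    return f_c, phase_margin, f_hp

L0 = 1000.0   # 60 dB DC loop gain (assumption)
F_P = 5.0     # dominant pole in Hz (assumption)

f_c, pm, f_hp = single_pole_loop(L0, F_P)
print(f"crossover = {f_c:.0f} Hz, phase margin = {pm:.1f} deg, high-pass corner = {f_hp:.0f} Hz")
```

For a large DC loop gain, the crossover and the high-pass corner of the signal path practically coincide at $L_0 f_p$, and the phase margin approaches 90° as long as all further poles lie far above the crossover.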

Voltage regulators supply these first self-biased stages to enable a known operating point for a range of $V_{cc}$ voltages. The regulator uses a feedback loop around a large p-type MOSFET (PMOS) pass transistor. The regulator schematic is presented in Figure 3.18a. The resistors $R_1$ and $R_2$ are used to adjust $V_{out}$ with respect to $V_{ref}$. The main challenges for these regulators are line regulation and supply rejection over this broad frequency range. Because the large pass transistor limits the bandwidth of the control loop and the space for bypass capacitors is limited, the PSRR and line regulation are degraded between 1 MHz and 1000 MHz. Figure 3.18b visualizes this problem in this design, where the output impedance rises to about 50 $\Omega$. The pass transistor, load capacitance, the operational amplifier, and expected load current are carefully balanced to ensure stability in this frequency range, as indicated by a PSRR below 0 dB. For efficient area usage, it is beneficial to employ a combination of metal-insulator-metal capacitors, metal-oxide-metal capacitors, and MOS transistors used as capacitors. For wiring simplicity, a second regulator is used for the self-biased replica.

### 3.3.2 Automatic Transimpedance Control

For use in more complex receiver systems, automatic gain control is preferred to provide a constant amplitude for subsequent sampling and digitization. Figure 3.19 shows the general working principle. The output signal is measured by a power detector and fed back via a gain control loop to set the VGAs' gain. A switch can be used to set the gain manually. For a detailed schematic of the switch and operational amplifier, see Appendices C.1 and C.2. This switch was implemented as a safety feature in case the AGC did not work correctly and, more importantly, to enable a defined S-parameter characterization. Without the manual override, the AGC could, for stronger input test powers, reduce the



MOS transistors are high-voltage variants and sizes are given in µm.



Figure 3.18: a) Schematic of the voltage regulator used for the TI stage. b) Regulator source impedance and PSRR.



Figure 3.19: Block diagram of the automatic gain control loop, including the switch for manual control.



Figure 3.20: Diode-based power detector with CS follow-up amplifier. a) schematic diagram and b) output DC voltage versus input RMS voltage.

gain and enhance it again if the frequency response begins to drop, creating an overestimated bandwidth. Measuring the bandwidth versus gain settings is not possible with the AGC active. The switch is implemented using two connected CMOS transmission gates with inverted control logic disconnecting either the internal or the external gain setting voltage.

#### **Output Power Detector**

Diode-based power detectors can be used in this application. The diodes act as rectifiers, providing a DC voltage that is, to first order, proportional to the input power. To save power and area, this detector type was chosen over a self-mixing Gilbert-Cell used in some literature examples, which also produces DC content proportional to the input power [ANM<sup>+</sup>16]. The output of the rectifying diodes also has a high harmonic content, which has to be removed by a low-pass filter. The sensing diodes are designed as diode-connected HBTs and are placed directly next to the output transistors. They are biased at a very low current of $I_{\text{bias}} = 27.5 \,\mu\text{A}$ per diode. To boost the output signal, a common-source buffer amplifier is used. By setting its gain, the responsivity is adjusted. This responsivity is part of the loop and influences the loop gain. In this application, the linearity of the detector is not critical. It is not used as a tool to measure different power levels but to control the power to be always at the same level.



Figure 3.21: Envelope simulation of the AGC loop.  $I_{DC}$  is kept constant at 100 µA for this simulation.

However, this detector has some drawbacks. Its reading is very dependent on the output's DC potential. To circumvent this issue, a dummy detector fed only by the output DC voltage could be used. The actual power reading is then determined as the difference between those two readings. This structure also reduces the dependence on process variations. Since the base-emitter diode is strongly temperature-dependent, this detector output also has a temperature-dependent responsivity. This can only be solved by temperature-adjusted biasing.

#### **Control Loop Design**

The control loop cannot be analyzed by a simple small-signal stability analysis as done for the DC offset loop. Such an analysis would require an accurate linearized model of the VGAs' gain with respect to the tuning voltage and of the power detector. Since those relationships are not linear, an envelope or transient simulation is used instead to judge loop stability. The result of this envelope simulation is shown in Figure 3.21. The top strip shows the input steps, with a rise or fall time of 1 ns. The second strip shows the output voltage of the first harmonic, and the internal $V_{level}$ and $V_{again}$ are shown in the bottom strip. At

the beginning of this simulation, the circuit needs to stabilize from startup. In $V_{\text{again}}$, the low slew rate of the operational amplifier in the control loop can be observed. The time needed for output level regulation and the overshoot behavior depend on the step direction and height. Nevertheless, this simulation demonstrates the stability of the AGC and a reaction time below 1 µs. $I_{\text{DC}}$ is kept constant in this simulation. Otherwise, the much slower input control loop would dominate the response, and the gain and output level would drop at each step due to the substantial DC offset and compression in the differential stages.
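The dependence of the settling behavior on step direction can be reproduced with a very small behavioral model. The sketch below is not the envelope simulation of Figure 3.21. It assumes a VGA whose gain is exponential in the control voltage, an ideal square-law detector, and an ideal integrator as loop filter, and it ignores the slew-rate limit of the operational amplifier. All numerical values and names are hypothetical.

```python
def simulate_agc(input_amplitudes, steps_per_segment=3000, dt=1e-9,
                 k_int=50e6, v_set=0.04, v_ctrl_start=1.0):
    """Behavioral AGC loop: returns the output amplitude at every time step.

    input_amplitudes : input envelope amplitude for each segment (arbitrary units)
    k_int            : integrator gain in 1/s (assumption)
    v_set            : detector set point, equal to the squared target amplitude
    """
    v_ctrl = v_ctrl_start
    trace = []
    for a_in in input_amplitudes:
        for _ in range(steps_per_segment):
            a_out = a_in * 10.0 ** v_ctrl            # gain exponential in the control voltage
            v_det = a_out ** 2                       # square-law power detector
            v_ctrl += k_int * dt * (v_set - v_det)   # integrating loop filter
            trace.append(a_out)
    return trace

def settling_steps(segment, target, tolerance=0.02):
    """Index of the last sample in the segment outside target*(1 +/- tolerance), plus one."""
    outside = [i for i, a in enumerate(segment) if abs(a - target) > tolerance * target]
    return outside[-1] + 1 if outside else 0

N = 3000
trace = simulate_agc([0.02, 0.2, 0.02], steps_per_segment=N)
up = settling_steps(trace[N:2 * N], 0.2)
down = settling_steps(trace[2 * N:], 0.2)
print(f"settling after +20 dB step: {up} ns, after -20 dB step: {down} ns")
```

Even this idealized loop settles faster after an upward step than after a downward step because the detector output is nonlinear in the amplitude, which is consistent with the direction dependence observed above.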

# 3.4 Experimental Results

The final circuit's die is shown in Figure 3.22. The input pad is placed on the left, and the differential output RF pads are on the right. The top DC pad row is for gain control, e.g., manual gain setting, switch to manual gain, desired output amplitude set point, and one  $V_{cc}$  pad. The bottom row has reference voltage and current, frequency response peaking, and  $V_{cc}$  pads. The RF signal path can be seen in the center of the chip. Above and below the RF chain, the control circuitry is placed. The total area, including pads, is only 0.41 mm<sup>2</sup>.



Figure 3.22: Chip photograph of the transimpedance amplifier. [2] ©IEEE



Figure 3.23: Single-ended output power and distortion for variable input power and different AGC set points. [2] ©IEEE

### 3.4.1 Control Loops

The power of a 1 GHz input tone was swept to verify the linearity and the AGC. The output spectrum is then used to calculate the total harmonic distortion and output power. Such a low frequency allows for a reliable power calibration, and all harmonics generated are within the amplifier's bandwidth. Figure 3.23 shows the result of these measurements. The control loop stabilizes the output power according to the set point. With $V_{set} = 2.05 \text{ V}$, a higher distortion level can be observed. This is because the output swing exceeds the designed specification, and the driver generates considerable distortion. A set point of $V_{set} = 2.1 \text{ V}$ corresponds to an output amplitude of approximately 200 mV<sub>pp</sub> single-ended. The THD increases rapidly for all settings at input powers above -27 dBm. Here, the variable gain amplifiers start to hit their linearity limit. The output amplitude is constant over a power range of almost 20 dB while maintaining a THD below 5 % and ensuring linear operation of the TIA. The differential THD is expected to be slightly lower than these measured values since 2<sup>nd</sup> order harmonics from compressing variable gain stages are canceled even further. These measurements were conducted with $I_{DC,in} = 100 \,\mu A$ to stabilize the input stage operating point.

In Figure 3.24a, the input  $I_{\rm DC}$  sinking and offset cancellation is shown. For input currents  $I_{\rm DC} \le 1.3$  mA the output DC offset stays at a constant low value of around 15 mV. The circuit cannot sink a higher current and does not operate appropriately beyond that value, as indicated by the sharp change in that curve. The residual DC offset is due to non-infinite gain and other non-idealities like



Figure 3.24: Measured response to an input  $I_{DC}$  a) Output DC offset versus input  $I_{DC}$ . [2] ©IEEE b) THD and output power with an input signal of 1 GHz. The amplitude setpoint is 2.1 V.

op-amp offset voltages in the control loop. Additionally, any mismatch after the loop is not corrected. This could be improved by tapping the output DC value instead of the values before the single-ended to differential conversion. However, the achieved offset is low enough to justify this approach's minimized on-chip wiring effort. Figure 3.24b presents the harmonic distortion for two different input currents and an automatically adjusted gain so the output amplitude is  $200 \text{ mV}_{pp,SE}$ . We can see an improvement of the THD when the input power is at the limit due to the input stage gain modification with increasing input current. This reduced gain leads to a lower signal amplitude entering the variable gain stages, reducing distortions generated by these VGAs. This feature improves the usable dynamic range in an application scenario since a high input power also correlates with a high DC current. This improvement makes the amplifier usable for a 2 dB higher input power, and the THD stays below 3 % for the desired 20 dB tuning range. The mismatch at the interface



Figure 3.25: Raw S-Parameter measurement with the gain manually set to maximum.

must be considered to convert the input power back to a current. From  $S_{11}$  an input impedance of  $Z_{in} = 27 \Omega$  can be extracted. With

$$P_{\text{delivered}} = P\left(1 - \left|\frac{Z_{\text{in}} - Z_0}{Z_{\text{in}} + Z_0}\right|^2\right), \tag{3.17}$$

$$i_{\rm RMS} = \sqrt{\frac{P_{\rm delivered}}{Z_{\rm in}}},\tag{3.18}$$

$$i_{\rm pp} = 2 \sqrt{2}\, i_{\rm RMS} \tag{3.19}$$

and -25 dBm input power this results in  $i_{\text{in}} = 0.92 \text{ mA}_{\text{pp}}$ .
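The conversion from source power to input current can be reproduced numerically. The sketch assumes a sinusoidal tone, a purely real input impedance so that $P_{\text{delivered}} = i_{\rm RMS}^2 Z_{\rm in}$, and a 50 $\Omega$ reference impedance; the function name is illustrative.

```python
import math

def input_current_pp(p_dbm, z_in=27.0, z0=50.0):
    """Peak-to-peak input current in ampere for an available source power in dBm."""
    p_watt = 1e-3 * 10.0 ** (p_dbm / 10.0)
    gamma = (z_in - z0) / (z_in + z0)            # reflection coefficient at the input
    p_delivered = p_watt * (1.0 - abs(gamma) ** 2)
    i_rms = math.sqrt(p_delivered / z_in)        # real Z_in assumed
    return 2.0 * math.sqrt(2.0) * i_rms          # sine wave: pp = 2 * sqrt(2) * RMS

print(f"{input_current_pp(-25.0) * 1e3:.2f} mA_pp")  # about 0.92 mA_pp
```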

### 3.4.2 RF Performance

The frequency response is measured with an S-parameter setup. Figure 3.25 compares measurement and simulation of the raw S-parameters with maximum gain. The HICUM models are used in simulation since they provide a better fit. The HICUM model reports a higher $g_m$ due to a lower $R_{ee}$ for the same operating point. Relative to a simulation with VBIC models, the measured gain is about 2 dB higher. With the HICUM models, simulation and measurement fit very well, supporting the design methodology of using parasitic-extracted compact cores with the active devices and EM-simulated interconnects between cells. The transimpedance shown in Figure 3.26b is calculated using these S-parameter results and the



Figure 3.26: a) Setup to calculate  $Z_T$  from measured S-parameters. b) Calculated transimpedance with measured TIA response.

circuit shown in Figure 3.26a. As expected from the S-parameters, it also shows an excellent fit to simulation. However, the relationship  $Z_T(V_{gain})$  has a slight deviation in the mid-gain range. This comparison is visualized in Figure 3.27a. A deviation in  $Z_T(V_{gain})$  is not a big issue since the gain will be controlled mainly by the AGC. Figure 3.27b shows the ability to peak the gain at 40 GHz for about 2 dB by an external voltage. The measured group delay (GD) is also shown in this plot. Due to the noise in GD measurements, these results are filtered. The group delay shows a maximum variation of  $\pm 2.5$  ps from its mid-band value.

Purely electrical eye diagrams were recorded using a Keysight M8194A arbitrary waveform generator (AWG) and an Agilent DCA 86100C mainframe with 70 GHz sampling heads. A bias tee is used to inject $I_{DC}$ into the TIA. One AWG channel served as a reference for a timebase unit in the oscilloscope (Agilent 86107A). Due to only $\approx 60$ dB channel isolation in the AWG, a 20 dB attenuator was placed between the signal channel and the chip. Otherwise, the reference channel for the timebase created a sine with 30 mV amplitude at the



Figure 3.27: a) Transimpedance vs. manual gain tuning voltage. b) Peaking control and group delay measurements. [2] ©IEEE

TIA's output. During waveform generation, the frequency-dependent losses of the connections were removed using pre-emphasis in Keysight's IQtools suite. PRBS15 sequences were used, which is the longest sequence that fits into the available waveform memory. Figure 3.28 shows the output eye diagram at 80 GBd. The eye diagrams were analyzed using a histogram at the center of the eye for 20 % of one period. This output eye shows $R_{\rm LM} = 0.95$ and Q = 2.4 to 2.7. The input signal is shown in the inset of the figure with 5 mV/div and 5 ps/div. Due to the low input impedance of the DUT, the input voltage during measurement is close to the scope noise level. Even when increasing the amplitude to the maximum achievable by the AWG, a measured $R_{\rm LM} = 0.96$ and $Q_n \approx 2.2$ show the input signal generation to be at a limit. Higher data rates could not be reliably generated with this setup.

Further proof of the linearity improvement of the refined input stage is given by Figures 3.29a and 3.29b, where 60 GBd eye diagrams with different input DC currents are shown. With higher  $I_{DC}$ ,  $R_{LM}$  improves from 0.92 to 0.94 and the Q-factor of the visibly less compressed upper eye is enhanced from 3.1 to 3.4. The input eye for this measurement has an  $R_{LM} = 0.99$ , a Q-factor per sub-eye of 3, and an amplitude of 22.5 mV<sub>pp</sub>, corresponding to about 0.83 mA<sub>pp</sub>. These results show a clear improvement in the performance of a direct detection receiver by using a diode-connected HBT in the DC sink path.



Figure 3.28: Single-ended output eye diagram for 80 GBd. The histogram on the side is used for signal quality evaluation. The inset shows the input signal with 5 mV/div and 5 ps/div.



Figure 3.29: 60 GBd eye diagrams with a) 0.2 mA and b) 1.2 mA  $I_{DC}$ . The AC amplitude is kept constant.

#### 3.4.3 Noise

Since practically all noise is due to the single-ended input stage and is therefore correlated in the differential output, it is not necessary to measure a true differential output noise. Instead, the single-ended RMS noise voltage can be doubled to get the differential output noise. The equipment noise and loss between DUT and oscilloscope ( $S_{21,conn}$ ) have to be de-embedded [Säc17,KBE21]:

$$v_{n,\text{CKT,RMS,out}} = \frac{2\sqrt{v_{n,\text{RMS,SE}}^2 - v_{n,\text{RMS,Scope}}^2}}{\frac{1}{BW} \int_0^{BW} |S_{21,\text{conn}}(f)|^2 df}, \tag{3.20}$$

$$i_{n,\text{CKT,RMS}} = \frac{v_{n,\text{CKT,RMS,out}}}{Z_{\text{T}}}, \tag{3.21}$$

resulting in an average input-referred noise current density of:

$$i_{n,i} = \sqrt{\frac{i_{n,\text{CKT,RMS}}^2}{BW}}. \tag{3.22}$$

The output noise was measured with the Agilent DCA 86100C mainframe with 70 GHz sampling heads within the 70 GHz bandwidth. The equipment noise was evaluated to be 1.14 mV<sub>RMS</sub> by measuring the unpowered DUT. A measurement without input signal at maximum gain showed 8 mV<sub>RMS,SE</sub>. Using Equation (3.20) to remove the equipment noise and the power loss in the connection from DUT to the equipment and Equation (3.21) for input referral, the input-referred RMS noise current was evaluated to be 4.7  $\mu$ A<sub>rms</sub>. This is equivalent to an average input-referred noise current density of 17.7 pA/ $\sqrt{\text{Hz}}$ . The simulated values are 3.7  $\mu$ A<sub>rms</sub> and an average of 15.6 pA/ $\sqrt{\text{Hz}}$ , which is less than the measured values. Assuming NRZ modulation, the electrical sensitivity is  $i_{pp,sens} = 2Qi_{n,RMS,i} = 29.7 \,\mu$ A for an SNR of 10. The optical sensitivity for the same SNR using a PD with  $R = 0.6 \,\text{A/W}$  would then be  $-16 \,\text{dBm}$ . Accounting for the PAM-4 SNR penalty, this results in a sensitivity of  $-7.5 \,\text{dBm}$ .
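The chain from measured noise to sensitivity can be retraced numerically. The following Python sketch is only an illustration: the function names are hypothetical, the mapping $Q = \sqrt{\text{SNR}}$ and the conversion $P_{avg} = i_{pp}/(2R)$ (ideal extinction ratio) are assumptions inferred from the quoted numbers, and the connection loss term of Equation (3.20) is left as a parameter because its value is not stated in this section.

```python
import math


def deembed_output_noise(v_se_rms, v_scope_rms, mean_s21_sq):
    """Differential output noise following Eq. (3.20).

    mean_s21_sq is the band-averaged |S21,conn|^2 of the connection.
    Its value is not given in the text and must be supplied by the user.
    """
    return 2.0 * math.sqrt(v_se_rms**2 - v_scope_rms**2) / mean_s21_sq


def average_noise_density(i_n_rms, bandwidth_hz):
    """Average input-referred noise current density, Eq. (3.22), in A/sqrt(Hz)."""
    return math.sqrt(i_n_rms**2 / bandwidth_hz)


def nrz_sensitivity(i_n_rms, snr, responsivity):
    """Return (electrical sensitivity in A_pp, optical sensitivity in dBm).

    Assumptions: Q = sqrt(SNR) and an ideal extinction ratio, so that the
    average optical power equals i_pp / (2 * R).
    """
    q = math.sqrt(snr)
    i_pp = 2.0 * q * i_n_rms
    p_avg_w = i_pp / (2.0 * responsivity)
    return i_pp, 10.0 * math.log10(p_avg_w / 1e-3)


# Values quoted in the text: 4.7 uA_rms within 70 GHz, SNR = 10, R = 0.6 A/W.
density = average_noise_density(4.7e-6, 70e9)
i_pp, p_dbm = nrz_sensitivity(4.7e-6, snr=10.0, responsivity=0.6)
print(f"{density * 1e12:.2f} pA/sqrt(Hz), {i_pp * 1e6:.1f} uA_pp, {p_dbm:.1f} dBm")
```

Under these assumptions the sketch lands close to the quoted 17.7 pA/$\sqrt{\text{Hz}}$, 29.7 µA, and $-16$ dBm.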

Since this measurement does not give any insight into the noise spectrum, a noise figure and spectrum were also measured. Figure 3.30a shows the measured and simulated input noise. At the input, a termination was necessary



Figure 3.30: Noise measurements for the TIA circuit without photodiode. a) Input referred noise with termination at the input. b) Noise figure measurement.

to prevent the pickup of additional noise from the environment. Since the DC control loop did not operate properly at  $I_{DC} = 0$ , a current of 100 µA had to be injected. The probe needed for this most likely acted as an antenna, producing a strong ripple on the noise measurement. With termination, this ripple could be avoided. At  $18.4 \text{ pA}/\sqrt{\text{Hz}}$ , the measured noise is again slightly higher than expected. With termination, the simulated average  $i_{n,i}$  is  $16.0 \text{ pA}/\sqrt{\text{Hz}}$ . A noise figure measurement in Figure 3.30b gives a 1 dB higher value than the simulated result. This is also in line with the other noise measurement findings. The noise measurement with the sampling scope underestimates the input-referred noise since it assumes a constant transimpedance over the entire bandwidth, which is not the case. The gain roll-off leads to less measured output noise power and, thus, to an underestimated input-referred noise if it is not accounted for. The 18.4 pA/ $\sqrt{\text{Hz}}$  averaged  $i_{n,i}$  derived from the spectrum is, therefore, a more realistic value for this setup. One reason the measured noise is higher than simulated could be the uncertain modeling of the noise generated by the avalanche collector-base current at the operating point. All model data is derived from and compared to a safer operating point.

# 3.5 Conclusion

A transimpedance amplifier consisting of multiple stages was investigated in this study. The amplifier was optimized for a single photodiode with IM/DD modulation targeting 100 GBd PAM-4. It comprises a single-ended transimpedance stage and differential voltage amplifiers with variable gain. Particular focus was put on saving power while keeping the amplifier linear, which is necessary for the higher-order modulation format. An unconventional  $I_{\rm DC}$  sink at the input helps to improve the maximum acceptable input power by 2 dB. This current sink acts as an RF shunt to ground, providing some input-power-dependent variable transimpedance. The necessary DC offset compensation loop controls the shunt and turns the unwanted  $I_{DC}$  into an advantage for this circuit. The circuit also features an automatic gain control loop for autonomous operation. Compared to some recent works listed in Table 3.1, this design is very power- and area-efficient while offering an extensive bandwidth. The noise performance could be better, especially since it measured worse than designed. However, designs featuring the lowest noise are usually based on differential architectures and are particularly optimized for low-noise operation [KBE21, GLAR<sup>+</sup>18]. Nevertheless, this design shows the usability of the improved  $I_{DC}$  sink and the design strategy of not using area-consuming inductors.

|                                            | This<br>work     | [ANM <sup>+</sup> 16] | [BCT <sup>+</sup> 17] | [KBE21]          | [LBJC18]         | [VMD <sup>+</sup> 22] |
|--------------------------------------------|------------------|-----------------------|-----------------------|------------------|------------------|-----------------------|
| Tech.                                      | 130 nm<br>SiGe   | 130 nm<br>SiGe        | 130 nm<br>SiGe        | 130 nm<br>SiGe   | 28 nm<br>CMOS    | 90 nm SiGe            |
| max $Z_T$<br>(dB$\Omega$)                 | 72.5             | 80                    | 75                    | 71               | 60               | 71.5                  |
| BW<br>(GHz)                                | 67               | 53                    | 38                    | 65               | 60               | 37.5                  |
| Noise $\left(pA/\sqrt{Hz}\right)$          | 18.4             | 23.6                  | 14.9                  | 7.2              | 19.3             | 9.7                   |
| max I <sub>in</sub><br>(mA <sub>pp</sub> ) | 920 <sup>c</sup> | 3000                  | 1050                  | 800              | 1000             |                       |
| THD<br>(%)                                 | 3                | 5                     | 2                     | 1.5              | 5                |                       |
| Vout<br>(mV <sub>pp</sub> )                | 400 <sup>a</sup> | 900 <sup>a</sup>      | 500 <sup>a</sup>      | 800 <sup>b</sup> | 300 <sup>a</sup> | 500 <sup>b</sup>      |
| P <sub>DC</sub><br>(mW)                    | 193              | 277                   | 340                   | 345              | 107              | 137                   |
| Area<br>(mm <sup>2</sup> )                 | 0.39             | 0.975                 | 1.6                   | 1                | 1.7              | 1.43                  |

Table 3.1: Wideband linear transimpedance amplifiers for optical communications.

<sup>a</sup>with AGC, <sup>b</sup>without AGC, <sup>c</sup>at  $I_{DC} = 1.1 \text{ mA}$ ; ER = 4.7 dB

# 4 Differential EPIC Receiver for Coherent Communications

The previous chapters focused on IM/DD transmission systems. In this chapter, a receiver design for coherent communications is investigated. The design uses IHP's SG25H5 EPIC technology. The electronic-photonic integrated circuit (EPIC) process integrates a BiCMOS technology with high-performance HBTs featuring  $f_T/f_{max} = 210 \text{ GHz}/290 \text{ GHz}$  and a full set of photonic components [KLB<sup>+</sup>15]. A single-channel receiver is discussed with the intention of future implementation in a single-die multi-channel IQ receiver. Its signal path follows the block diagram in Figure 4.1. RF and LO signals are coupled through a multimode interference (MMI) coupler and fed to two photodiodes. In heterodyne or intradyne reception with two PDs, the actual  $i_{AC}$  into the TIA is given by:

$$i_{\rm AC,diff} = 2R\sqrt{P_{\rm LO}P_{\rm Sig}}\sin\left(\Delta\omega t + \Delta\varphi\right),$$
 (4.1)

and can then be controlled by the LO power [Säc17]. Equation (4.1) has some important implications. The upper linearity limit with high RF input powers is not as constrained as in IM/DD systems since  $i_{AC,diff}$  can be reduced by reducing the LO power. Similarly, at the lower end, raising the LO power increases  $i_{AC,diff}$  for a weak RF signal, so low RF signals can effectively be *amplified*. However,



Figure 4.1: Coherent opto-electrical receiver.

this property is limited by the maximum allowable DC photocurrent per diode given by:

$$I_{\rm DC} = R(P_{\rm LO} + P_{\rm RF}), \tag{4.2}$$

and the generated noise in the diode due to this current. Additional advantages of coherent systems are the retained phase information allowing higher modulation schemes (e.g., quadrature amplitude modulation), which results in a higher data rate for a given bandwidth. Also, the LO can be used to select a channel frequency or wavelength so multiple channels can use the same fiber. Polarization information can also be used for channel multiplexing [Kik16].
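The trade-off between Equations (4.1) and (4.2) can be made concrete with a few numbers. The Python sketch below is an illustration only: the function names, the responsivity of 0.7 A/W, and the power levels are hypothetical values, and the powers are taken as those incident on one diode.

```python
import math


def dbm_to_w(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def beat_current_amplitude(responsivity, p_lo_w, p_sig_w):
    """Amplitude of the differential AC current in Eq. (4.1)."""
    return 2.0 * responsivity * math.sqrt(p_lo_w * p_sig_w)


def dc_photocurrent(responsivity, p_lo_w, p_sig_w):
    """DC photocurrent per diode, Eq. (4.2)."""
    return responsivity * (p_lo_w + p_sig_w)


R = 0.7                    # A/W, assumed responsivity (not from the text)
p_sig = dbm_to_w(-20.0)    # weak received signal, assumed level
for p_lo_dbm in (0.0, 3.0, 6.0):
    p_lo = dbm_to_w(p_lo_dbm)
    i_ac = beat_current_amplitude(R, p_lo, p_sig)
    i_dc = dc_photocurrent(R, p_lo, p_sig)
    print(f"P_LO = {p_lo_dbm:4.1f} dBm: i_AC = {i_ac * 1e6:6.1f} uA, I_DC = {i_dc * 1e3:5.2f} mA")
```

Because the AC amplitude grows with $\sqrt{P_{\rm LO}}$ while the DC current grows almost linearly with $P_{\rm LO}$, each doubling of the signal amplitude costs roughly a fourfold increase in DC photocurrent.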

Design requirements and trade-offs for the TIA are similar to IM/DD receivers with a few minor differences apart from the already mentioned potentially reduced demand on dynamic range. Depending on the actual transmission scheme, the lower cutoff does not need to be as low as in IM/DD cases [Opt17b]. Since RF and LO frequencies will rarely be the same, the received signal is located at some intermediate frequency. The exception is homodyne systems, where RF and LO are phase-locked and the PDs produce a pure baseband signal. For this design, a high-pass cutoff of 1 MHz was selected. External gain control with a range greater than 10 dB is added to the post-amplification to ensure optimal ADC levels for unequal channel responses. The target output swing, i.e., the input swing for the ADC, is 400 mV<sub>pp,diff</sub>. The design goal is to achieve a reasonably high transimpedance with a bandwidth of >35 GHz.

# 4.1 Differential Input Stage

A receiver with a balanced detector can be designed like the one for direct detection presented in Chapter 3 if both diodes are connected in series. The TIA's input then connects to both diodes as shown in Figure 4.2a. This architecture has the advantage of no  $I_{\rm DC}$  into the TIA. Since, ideally, both diodes have the same average incident power  $P_{\rm RF} + P_{\rm LO}$ , the  $I_{\rm DC}$  flows only from  $V_{\rm b,PD+}$  to  $V_{\rm b,PD-}$ . In practice, device mismatch and an unbalanced splitting ratio in the MMI coupler can lead to a small  $I_{\rm DC}$  into the TIA. Therefore, a structure for  $I_{\rm DC}$  sinking and, in this case, even sourcing has to be implemented. Fortunately, this current is not expected to be very large. So, in the case of a shunt-feedback TIA with a common-collector buffer, this current could be controlled via the



Figure 4.2: Realizations of transimpedance frontends with balanced photodiodes: a) single-ended, b) quasi-differential, and c) fully-differential input architecture.

feedback resistor, minimizing the capacitive load at the input. However, other disadvantages make this single-ended input stage not ideal for the given task. The input stage sees twice the PD capacitance, and since it is an SE design, a broadband voltage regulator with low  $r_{out}$  is needed for the supply voltage of the main amplifier. A better approach is a quasi-differential input stage as shown in Figure 4.2b [ANM<sup>+</sup>16, ANK<sup>+</sup>18]. In this case, two SE TIA stages each interface one photodiode. This reduces the PD capacitance seen per input stage, and only one PD bias voltage is needed. However, a more capable DC current sink needs to be implemented. As long as this sink has a lower parasitic capacitance than a photodiode, one can expect a larger bandwidth compared to the realization in Figure 4.2a due to a lower input capacitance. A voltage regulator is still needed for the supply of the two single-ended main amplifiers. However, ideally, this regulator does not see an AC signal, due to a virtual ground at the common supply node, when connecting the main

amplifier supplies. The regulator-generated noise appears as common-mode noise and is canceled by any common-mode rejection in the following stages. Moreover, considering the input-referred noise, an improvement of  $\frac{1}{\sqrt{2}}$  can be observed. This originates in the fact that the noise generated by the two stages is uncorrelated, but the gain is doubled:

$$Z_{\rm T,diff} = 2Z_{\rm T,SE} \tag{4.3}$$

$$\sqrt{\overline{v_{n,\text{o,diff}}^2}} = \sqrt{2\,\overline{v_{n,\text{o,SE}}^2}} \tag{4.4}$$

$$\sqrt{\overline{i_{n,i,\text{diff}}^2}} = \frac{\sqrt{\overline{v_{n,\text{o,diff}}^2}}}{Z_{\rm T,diff}} \tag{4.5}$$

$$= \frac{1}{\sqrt{2}}\underbrace{\frac{\sqrt{\overline{v_{n,\text{o,SE}}^2}}}{Z_{\rm T,SE}}}_{\sqrt{\overline{i_{n,i,\text{SE}}^2}}}. \tag{4.6}$$
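The $\frac{1}{\sqrt{2}}$ improvement of the quasi-differential input can be checked with a short Monte Carlo experiment. The sketch below is a simplified illustration with hypothetical names: both stages are modeled as independent Gaussian noise sources of equal RMS value, and the transimpedance is normalized to 1 Ω.

```python
import math
import random


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def quasi_differential_noise_ratio(n_samples=200_000, seed=1):
    """Ratio of differential to single-ended input-referred RMS noise.

    Assumptions: two uncorrelated output noise voltages of equal RMS value,
    Z_T,SE normalized to 1 ohm, and Z_T,diff = 2 * Z_T,SE.
    """
    rng = random.Random(seed)
    z_se = 1.0
    v_p = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    v_n = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    v_diff = [a - b for a, b in zip(v_p, v_n)]   # differential output noise
    i_se = rms(v_p) / z_se                       # single-ended input referral
    i_diff = rms(v_diff) / (2.0 * z_se)          # differential input referral
    return i_diff / i_se


print(f"i_n,diff / i_n,SE = {quasi_differential_noise_ratio():.3f}")
```

The ratio settles near 0.707 because the uncorrelated noise voltages add in power while the signal gain adds in amplitude.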

There is also no uncorrelated noise from a replica coupled into the differential post-amplification chain as there would be in the SE case. The higher gain and the noise reduction are paid for by power consumption. However, the quasi-differential input also vastly simplifies the first post-amplification stage since it does not need to operate as a broadband balun. A third option is a fully differential stage as presented in Figure 4.2c [DFMK21]. This provides additional common-mode rejection while having all other advantages of a quasi-differential input. There is no need for a voltage regulator since biasing can be realized by a tail current source when using, for example, a differential pair as the main amplifier. Additionally,  $V_{out}$  can be buffered through a common-collector stage, minimizing the load due to the following amplifiers. This is possible since all DC potentials are higher to ensure enough headroom for the current source. Therefore, the buffered output potential is still high enough for a fully differential stage. One disadvantage is that the photodiode bias voltage could rise above  $V_{cc}$ , requiring an additional higher voltage supply. Nevertheless, a fully differential input stage was chosen for this design. The schematic of the selected architecture using a common-emitter differential pair with common-collector buffers in the feedback path is shown in Figure 4.3.

From an input noise perspective, HBTs are not ideal for the  $I_{DC}$  sink due to their current shot noise and amplified base resistance thermal noise. The improved



Nf: Number of emitter fingers with  $A_e = 180 \text{ nm} \times 840 \text{ nm}$  each.

Figure 4.3: Fully-differential shunt-feedback input stage.

DC-current sink for IM/DD receivers with a diode-connected HBT presented in Section 3.2 is also not a good solution here due to the separation of  $i_{in,AC}$  and  $I_{in,DC}$  in a coherent receiver. For example, at low  $P_{RF}$  levels, the  $P_{LO}$  could be increased to boost the received AC amplitude. A shunting DC sink is, in this case, counterproductive. Therefore, this design uses an n-type MOSFET (NMOS) transistor with  $w/l = 6 \,\mu\text{m}/0.5 \,\mu\text{m}$ . The ratio w/l is chosen to support a DC photocurrent of 3 mA. This corresponds to  $P_{LO} = 6.3 \text{ dBm}$  at 1550 nm wavelength and -2 V diode bias. Values of  $I_{in,DC}$  above 2 mA are not desired because photodiode performance starts to degrade. The sizing therefore provides some margin. This transistor's load must be included in the design of the input stage. Fortunately, the capacitive load of such small devices does not change much with the  $I_{in,DC}$  flowing through the device. The channel length is chosen to be larger than the minimum size to reduce its noise impact. A larger input capacitance of the TIA is the drawback. The transimpedance and input-referred noise current density are shown in Figure 4.4. With  $R_{\rm C} = 130 \,\Omega$  and  $R_{\rm F} = 350 \,\Omega$ , a transimpedance bandwidth of about 40 GHz with  $Z_{T,mid-band} = 616 \Omega$  is achieved. The input-referred noise current is  $i_{n,RMS} = 1.5 \,\mu\text{A}$  in this bandwidth, which corresponds to an average noise density of 7.6 pA/ $\sqrt{\text{Hz}}$ .



Figure 4.4: Transimpedance and input-referred noise current density of the differential input stage.

## 4.2 Post-Amplifier Section

Following the transimpedance stage, two variable gain amplifiers provide additional gain. Both VGAs use the same topology as shown in Figure 4.5, a differential cascode where the current through the common-base transistors can be controlled. This topology is also used in the design described in Section 3.2. A Gilbert-cell type VGA could not provide sufficient bandwidth due to the additional capacitive load at the outputs. Again, peaking inductors were not used to keep the circuit compact. Common-collector buffers follow each VGA to shift the DC potential and decouple the following stage's input capacitance. Resistors in series with current-mirror-based current sources are used to bias these buffers. The resistors  $R_{S1}$  and  $R_{S2}$  are needed to reduce  $V_{CE}$  of the source. This is necessary to ensure a correct mirror ratio, which is affected by the relatively low Early voltage of the transistors in this technology. For details on the current source design process and considerations, see Appendix B.2. Both

|                | VGA1            | VGA2            |
|----------------|-----------------|-----------------|
| R<sub>C</sub>  | 150 Ω           | 130 Ω           |
| R<sub>E</sub>  | 60 Ω            | 55 Ω            |
| C<sub>E</sub>  | 10 fF           | 20 fF           |
| Nf Buffer      | 1               | 2               |
| I <sub>2</sub> | 1 mA            | 2 mA            |

Table 4.1: VGA device sizes for EPIC receiver.



Nf: Number of emitter fingers with  $A_e = 180 \text{ nm} \times 840 \text{ nm}$  each.

Figure 4.5: Variable gain amplifiers used in the post-amplification section.

VGA stages use slightly different device sizes optimized for their position in the chain. The sizes are summed up in Table 4.1. The desired gain is set by an external voltage  $V_{gain}$ , which is then used to create the bias voltages for the common-base transistors. This is done via resistive voltage dividers as shown in Figure 4.6a. The control voltage for the signal transistors is set to be constant, and the bypass transistors' voltage is changed via an additional branch loading the divider. The device sizes are optimized to keep the VGAs in a region where they do not create too much distortion (see Section 3.2). Both VGAs use the same control voltage. Figure 4.6b shows the gain versus control voltage. A tuning range from 1.5 dB to 14.6 dB is reached, including the gain of the output driver.  $V_{gain}$  is designed to span from 0 V to  $V_{cc} = 3.3$  V.

The output driver in Figure 4.7 is constructed as a differential cascode amplifier. At 55  $\Omega$ , its load resistance is slightly larger than the ideal 50  $\Omega$ . This yields more gain while still keeping a good output match. Emitter degeneration is used for linearity. The gain is set to unity, and the driver adds 0.6 % of THD at the targeted output amplitude of 400 mV<sub>pp</sub>. *V*<sub>b2,drv</sub> is also created via a resistive divider. Again, this is a valid construction due to the virtual ground and a linear operating regime. No decoupling capacitance is used on this node, slightly



Figure 4.6: a) Resistive voltage divider circuit to create the VGA's control voltages. b) Gain of post-amplification stages (VGA1, VGA2, and Driver) at 10 GHz.

improving the common-mode rejection ratio (CMRR) by 1 dB at 50 GHz. The common-mode gain is kept below -10 dB. The complete post-amplification chain has a bandwidth greater than 43 GHz. It consumes 118 mW, with the major contributor being the output stage, which consumes 50 mW alone.
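The effect of choosing a 55 Ω instead of a 50 Ω load in the output driver can be estimated with an idealized calculation. The following Python sketch is a rough illustration under stated assumptions: the output impedance is treated as purely resistive and equal to the load resistor (parasitic capacitance and the transistor output resistance are ignored), and the function names are hypothetical.

```python
import math


def return_loss_db(z_out, z0=50.0):
    """Return loss of a purely resistive output impedance against z0."""
    gamma = (z_out - z0) / (z_out + z0)
    return -20.0 * math.log10(abs(gamma))


def gain_change_db(r_load, z0=50.0):
    """Gain change relative to a matched load resistor.

    The driver current sees r_load in parallel with the external z0, so the
    voltage gain scales with (r_load || z0) compared to (z0 || z0).
    """
    parallel = r_load * z0 / (r_load + z0)
    return 20.0 * math.log10(parallel / (z0 / 2.0))


print(f"return loss: {return_loss_db(55.0):.1f} dB")
print(f"gain change: {gain_change_db(55.0):+.2f} dB")
```

In this idealization the slightly larger load buys a few tenths of a decibel of gain while the low-frequency return loss remains well above 20 dB.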



 $A_{\rm e} = 180 \,\rm nm \times 840 \,\rm nm.$ 

Figure 4.7: Schematic of the output line driver.

# 4.3 DC-Current Sink and Offset Cancellation Loops

To control both input DC sinks, two individual loops are used. A single loop could not eliminate device mismatches or input power mismatches in the photodiodes. The positions and connections of both loops are sketched in Figure 4.8. The blue components show the  $I_{DC,P}$  control loop using a replica. Orange shows the  $I_{DC,N}$  control loop eliminating the output DC-offset. Both loops use the same op-amp design based on a two-stage amplifier with PMOS input stages and feedback capacitances to create a dominant, low first pole. This is done to enforce the loop's stability and push the receiver system's low-frequency cutoff below 1 MHz. The schematic of this amplifier is documented in Appendix C.1.

The  $I_{DC,P}$  sink control at the non-inverting TIA input is shown in more detail in Figure 4.9a. This loop acts on the first stage based on a replica. The current through the common-collector buffer in one branch of the input stage is monitored at the supply side using a resistor  $R_1$ . The desired voltage drop across this resistor is determined using an identical reference  $R'_1$  with the same



Figure 4.8: Receiver channel with both control loops. Blue is the  $I_{DC,P}$  control loop based on a replica and orange shows the  $I_{DC,N}$  control loop eliminating the output DC-offset.



Figure 4.9: *I*<sub>DC</sub> control loop working on the non-inverting input of the TIA: a) Schematic showing half of the TIA input stage, the replica, and a simplified loop. b) Loop gain and phase simulations.



Figure 4.10: Offset cancellation loop working on the inverting input of the TIA: a) Block diagram of the loop wrapped around the entire chain. b) Loop gain and phase simulations.

supply and current source. The currents  $I_1$  and  $I'_1$  are identical. If a current  $I_{error} = I_{PH,DC} - I_{D,NMOS}$  now flows into or out of the TIA through  $R_F$  and Q2, the current in  $R_1$  and thus its voltage drop differ from the reference. The loop acts upon this error. The base currents are neglected, but since the current gain *B* of the transistors is more than 250, this small error is accepted in the loop design. Phase and loop gain of this loop are shown in Figure 4.9b. The low-frequency loop gain is only 43 dB. This is due to selecting only 60  $\Omega$  as the measurement resistor and a resistive divider in the additional loop filter in front of the op-amp. On the other hand, this low gain helps to achieve a phase margin of 90.8°. The first dominant pole can be seen in the phase of this loop.

The second loop acting on the inverting input of the TIA is shown in Figure 4.10a. It measures the output DC offset since this is similarly affected by  $I_{\text{DC,N}}$  flowing into the TIA. This loop has a higher gain of up to 75 dB since the entire gain of the TIA chain is included. Additionally, its loop gain and phase presented in Figure 4.10b change with the gain setting, but the phase margin stays above 70°.

Since these two loops also cancel the desired AC signals, their 0 dB gain frequency is an important parameter. The control circuits generate a high-pass response since they attenuate all signal components below their unity-gain frequency. The DC-offset loop dominates that response since it has a higher unity-gain frequency and, therefore, determines the high-pass cutoff for AC signals. For the highest TIA gain, the 3 dB high-pass corner frequency is at 850 kHz, and it is even lower for lower gain settings. In direct detection schemes, this frequency is aimed to be at least a factor of 10 smaller, but since the output of this coherent TIA is not at baseband due to different LO frequencies, this lower cutoff is sufficient.
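To get a feeling for how quickly the loop-induced high-pass stops affecting the signal, the response can be approximated by a first-order high-pass. This is a simplifying assumption; the real loop response depends on the op-amp and filter poles. The Python sketch below, with a hypothetical function name, evaluates the attenuation at multiples of the 850 kHz corner.

```python
import math


def highpass_gain_db(f_hz, f_c_hz):
    """Magnitude of a first-order high-pass with corner f_c_hz, in dB."""
    x = f_hz / f_c_hz
    return 20.0 * math.log10(x / math.sqrt(1.0 + x * x))


F_C = 850e3  # corner frequency at the highest TIA gain, from the text
for multiple in (1, 10, 100):
    f = multiple * F_C
    print(f"{f / 1e6:7.2f} MHz: {highpass_gain_db(f, F_C):7.4f} dB")
```

One decade above the corner the attenuation is already below 0.05 dB in this model, which supports the argument that signal content at an intermediate frequency is not affected.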

### 4.4 Final Receiver Channel

The entire receiver is designed to run from a single supply for the electrical part, with the addition of an external supply for the photodiode bias. This additional supply is necessary because the voltage needed for  $V_{in,DC} + V_{bias,PD}$  is above  $V_{cc}$  and because  $V_{\text{bias,PD}}$  should be adjustable without affecting the TIA. To facilitate the single electrical supply operation, a bandgap-based voltage reference is placed on the chip. This reference produces a temperature-stabilized  $V_{ref} = 1.085 \text{ V}$ (see Appendix B.1.2). Constant current references for the individual current mirrors in every stage are derived from this voltage via an operational amplifier feedback circuit. The current reference works by comparing  $V_{ref}$  to a voltage across a resistor and controlling this voltage and, therefore, the current (see Appendix B.3). This current is then mirrored to all differential stages and scaled for the desired bias currents. If the bias needs to be adjusted during characterization, an SPDT switch disconnects the internal reference and connects the biasing circuitry to an external pad if the voltage on that pad exceeds  $\frac{2}{3}$  of V<sub>ref</sub>. See Appendix C.2 for details on the switching circuitry. The cathodes of the photodiodes are connected to form a virtual ground. They are biased from the same pad.

The receiver layout is optimized so that it can be replicated on one chip for a complete multichannel IQ receiver architecture. Figure 4.11 shows a top view of this layout. The top metal is shown in orange, and red is the ground plane on metal3. Optical waveguides are routed below the ground and arrive from the left, going to the photodiodes. Supply and ground are also distributed from the left. When connecting the single receiver channel to a solid supply bar, there is again a virtual ground at the center between the two contacts. This helps to provide isolation if there are multiple independent channels on a die. The RF chain is centered in the layout. Control loops and gain control circuits are above and below the RF portion.  $V_{ref}$  and  $V_{gain}$  are routed in shielded boxes and can cross the channels below the ground plane of the output transmission lines. A ground-signal-signal-ground configuration was chosen for the output pads for higher



Figure 4.11: Layout view of a single receiver channel. Supply and photodiode connections are marked. The loop and gain control circuits are above and below the RF chain. Orange is the top metal layer, and red is the ground plane on metal3.



Figure 4.12: Full chain simulation results: a) transimpedance and group delay and b) input-referred noise current density.

integration density of multiple channels. These pads define the total height of one receiver channel. The full chain transimpedance and group delay are shown in Figure 4.12a for the maximum and minimum gain. The input-referred noise current density is plotted in Figure 4.12b. The average noise current density is 9.8 pA/ $\sqrt{\text{Hz}}$  for the 40 GHz operational OE bandwidth. This results in an  $i_{n,\text{RMS},i} = 1.7 \,\mu\text{A}$ , whereas the first stage alone has  $1.5 \,\mu\text{A}$ . A noise-matching inductor at the input was not used because of the already quite significant group delay



Figure 4.13: Differential S-parameters a) OE gain including photodiode frequency response for different V<sub>gain</sub> settings and b) TIA only S-parameters for the highest gain.

peaking. Figure 4.13a shows the differential opto-electrical gain of the entire TIA. The foundry provides photodiode S-parameter models. Therefore, the frequency response of the diodes is included in this simulation. The bandwidth varies from 39 GHz to 42 GHz, with the highest bandwidth corresponding to the highest gain. The S-parameter data for the photodiode has a 100 MHz value of -21.3 dB. Therefore, all gain curves have an offset towards lower values. For comparison, the pure S-parameter frequency response for the highest gain with  $V_{gain} = 3.3$  V is shown in Figure 4.13b. The output is well matched in the complete band, and the common-mode rejection is higher than 50 dB. The CMRR drops with frequency, as expected, since the biasing current sources lose output impedance with increasing frequency. Measures like a high impedance at the base bias nodes of the common-base stages, achieved by resistive biasing, help to keep the CMRR up. The total power consumption per channel is 176 mW from the 3.3 V supply.

### 4.5 Conclusion

A fully differential TIA design for a coherent optical receiver in IHP's SG25H5 EPIC technology is investigated. The differential architecture has the advantage of an improved CMRR, and no wide-band voltage regulators are needed. Moreover, a differential TIA has a better input-referred noise current than a design with a subsequent SE-to-differential conversion. The transimpedance input stage is optimized for the integrated PDs at a bias voltage of -2 V. Much detail went into the  $I_{\rm DC}$  sink and offset control loops. A potential use in multi-channel IQ receivers guided some constraints regarding the floor plan, supply routing, and power consumption. With 176 mW per channel, the total consumption for, e.g., a four-lane IQ setup is 1.4 W and should be manageable with a sufficiently large heat sink. A single supply and integrated voltage and current references enable convenient operation. The simulated design achieves more than 39 GHz OE bandwidth with only  $i_{n,RMS,i} = 1.7 \,\mu$ A integrated over that bandwidth. Assuming NRZ modulation, this corresponds to an electrical sensitivity of  $i_{\rm pp,sens} = 2Qi_{n,RMS,i} = 10.7 \,\mu$ A for an SNR of 10. The optical sensitivity for the same SNR would be  $-20.5$  dBm. Measurement verification of this design is still pending since manufacturing of the prototypes was not completed at the time of this writing.
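The quoted 1.4 W total follows from simple bookkeeping. The Python sketch below reproduces it under the assumption, inferred from the numbers rather than stated explicitly, that each lane of an IQ receiver needs one channel for I and one for Q. The function name is hypothetical.

```python
def receiver_power_w(lanes, channels_per_lane, p_channel_w):
    """Total electrical power of a multi-channel receiver in watts."""
    return lanes * channels_per_lane * p_channel_w


P_CHANNEL_W = 0.176  # per-channel consumption quoted in the text
P_POSTAMP_W = 0.118  # post-amplification chain, Section 4.2

# Assumption: four lanes, each with an I and a Q channel.
total_w = receiver_power_w(lanes=4, channels_per_lane=2, p_channel_w=P_CHANNEL_W)

# Everything in a channel that is not the post-amplification chain.
remainder_w = P_CHANNEL_W - P_POSTAMP_W

print(f"total: {total_w:.3f} W, per-channel remainder: {remainder_w * 1e3:.0f} mW")
```

The per-channel remainder covers the transimpedance input stage, the control loops, and the reference circuits.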

### 5 Broadband Data and Clock Generation

High data rates per link help to minimize the number of I/Os and, thus, the wiring or fibers needed to achieve a total data throughput. The link speeds usually exceed the processing or reading capabilities of single data sources. Therefore, data is serialized before being sent over a serial connection. The serializer serves as the data source, combining multiple parallel data streams. In Section 5.1, a low-power design for a 10 Gb/s serializer in IHP's SG13G2 technology is discussed. It features a cell-based design combining CMOS, BiCMOS-CML, and pure bipolar cells, ensuring an efficient design.

An implementation of a wideband frequency doubler is investigated in Section 5.2. The design goal for this doubler was an extensive operational frequency range. In addition, a modification for differential outputs in a push-push type doubler was developed and implemented. This doubler can be used for a synthesizer-based source in millimeter wave software-defined radios or as a flexible clock multiplier in circuits with variable clocking.

# 5.1 Design of a 10.4 Gb/s Serializer in IHP's SG13G2 technology

There are two main methods of generating a serial data stream. The first utilizes a shift register with parallel load lines. This technique is widely used in relatively low-speed applications since its main drawback is that the shift register has to be able to shift data at the final speed. On the other hand, the shift register can be sized to fit the amount of parallel data and is, therefore, flexible concerning the serialization ratio. This is not possible in the second primary approach, the binary tree. The binary tree always combines two streams into one by using a selector. In very high-speed applications, the binary tree has significant advantages regarding power consumption. The clock frequency doubles from stage to stage until the final stage. This way, only one stage has to be clocked at the highest frequency, allowing for a more power-efficient design. Combinations of both are also possible. Selectors with more than two inputs can be used to generate flexible serialization ratios, as shown in [JCC<sup>+</sup>19]. This section presents a design study for creating an area- and power-efficient 8:1 serializer with more than 10.24 Gb/s output data rate. The motivation for the specific data rate originates in a CMOS pixel detector project. State-of-the-art pixel sensors in a 180 nm technology [PAA<sup>+</sup>21] are intended to be transferred to IHP's SG13G2 technology. The pixels will have a digital readout circuit with a first serialization scheme. Eight 1.28 Gb/s streams are then to be serialized to reduce the number of I/Os. The output driver is targeted to generate an output compliant with the CEI-11G-SR short-reach interface specification [Opt17a]. This would also be compatible with high-speed serial transceivers found in FPGAs [Xil19].

The relevant efficiency targeted in this section is power and area efficiency since both ultimately define the cost of building and running a communications system. Apart from the choice of general architecture, namely a binary tree, a shift register, or something in between, the type and construction of the actual logic gates are essential. A combination of CMOS and current mode logic (CML) is often used for high-speed serial transmitters since it offers a favorable trade-off between speed, area, and power consumption [TL09]. Classic CMOS logic has the advantages of a compact layout and no static power consumption. The dynamic power consumption depends on transistor sizes and clock speeds and, thus, can be significant. The most significant disadvantage is the limited speed. Due to the parasitic and transistor capacitances and the low $g_m$ and consequently low switching current of small transistors, the clock frequency cannot be increased above a certain point. If the speed of CMOS logic cells is insufficient, using CML is an option to increase the circuit performance. A differential pair biased by a current source is the basis for CML logic. The current through this pair is switched from one side to the other. Load resistors create the differential voltage signal for the following cells. CML offers a lower gate delay since the CML voltage swing is lower than CMOS levels, and the transistor widths are usually larger than in standard CMOS cells. CML latches need to switch completely to work correctly. The voltage swing needed is given for a particular technology, and thus, the load resistance and bias current are directly coupled. CML's most significant disadvantage is static power consumption. However, since the gate delay, load resistance, and bias current depend on each other, the designer can choose to balance speed and power consumption. From these perspectives, a guide to designing these high-speed logic circuits can be derived: Use CMOS where possible and minimize the bias current of CML logic for the given task. The actual clock speed at which one would switch from CMOS to CML is technology and process-node dependent.

Figure 5.1: Overview of an 8:1 CMOS and BiCMOS-CML serializer.

Figure 5.1 shows a general overview of the implemented serializer architecture. It uses a three-stage binary tree structure to serialize eight incoming data streams. A 2:1 BiCMOS-CML multiplexer follows two identical 4:1 stages designed with CMOS standard cells. The clock at half the output data rate with  $f_{clk} = 5.2 \text{ GHz}$  is supplied from an external source and converted to a differential clock used for the CML part. This clock is then divided and converted to a CMOS clock at 2.6 GHz. Using a switchable delay, the phases of these two clocks can be adjusted for correct timing, allowing for a PVT and speed calibration. The 2.6 GHz clock is fed to the 4:1 CMOS part, where it is divided again to be used in the first stage.
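As a behavioral illustration of the binary-tree principle described above, the following Python sketch interleaves eight parallel lanes with three stages of 2:1 selectors and lists the half-rate clock of each stage. The lane pairing (lane i with lane i + N/2) and all function names are assumptions chosen for illustration; the lane-to-input assignment of the implemented chip is not specified here.

```python
def mux2(a, b):
    """Interleave two equal-length bit streams: a[0], b[0], a[1], b[1], ..."""
    assert len(a) == len(b)
    out = []
    for bit_a, bit_b in zip(a, b):
        out.append(bit_a)  # selected during one clock phase
        out.append(bit_b)  # selected during the other clock phase
    return out


def serialize_tree(lanes):
    """Serialize 2**n parallel lanes with a binary tree of 2:1 selectors."""
    stage = [list(lane) for lane in lanes]
    while len(stage) > 1:
        half = len(stage) // 2
        # Assumed pairing: lane i with lane i + half, so that the serial
        # output carries the lanes in natural order 0, 1, 2, ...
        stage = [mux2(stage[i], stage[i + half]) for i in range(half)]
    return stage[0]


def clock_plan(output_rate_gbps, stages):
    """Half-rate clock per stage in GHz, fastest (final) stage first."""
    return [output_rate_gbps / 2 ** (k + 1) for k in range(stages)]


# Eight lanes with two symbols each; the values are labels, not real data.
lanes = [[i, i + 8] for i in range(8)]
serial = serialize_tree(lanes)
clocks = clock_plan(10.4, 3)
```

With this pairing, `serial` contains the labels 0 to 15 in ascending order, and `clocks` evaluates to 5.2 GHz, 2.6 GHz, and 1.3 GHz, matching the clock frequencies named in the text.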

#### 5.1.1 CMOS 4:1 Serializer

The first and second serialization stages are built with 130 nm CMOS standard cells and re-timing and delaying latches. Figure 5.2 shows a block diagram of one of these CMOS multiplexers. In the first section at 1.3 Gb/s, the process is fast enough to be operational at the nominal $V_{\text{DD,core}} = 1.2$ V. However, to be able to push the performance for the second stage of multiplexing at 2.6 Gb/s, the core voltage has to be raised to at least $V_{\text{DD,core}} = 1.5$ V. In particular, the clock divider, constructed as a D-flip-flop with inverter feedback and included in this block, suffers from tight timing tolerances at lower core voltages. This is especially critical as jitter in the clock source can cause the clock to be divided by three instead of two for some events. To minimize the number of supply voltages, the core voltage is set to 1.5 V for both stages.

Figure 5.2: CMOS 2:1 multiplexer.

#### 5.1.2 The BiCMOS CML-Latch

Using the high-performance heterojunction bipolar transistors (HBTs) in this process for the relatively low speed in this application has several advantages. First, bipolar transistors have a higher $g_m$ than MOS transistors. Since a latch needs a small-signal gain of at least $|A| \approx 1.5$ to overcome losses and device variations, this gives HBTs an advantage towards lower current consumption. Second, the minimal voltage swing needed for entirely switching a CML inverter is $v_{in,hbt} \ge 4V_T + I_{Tail}r_E$ with $V_T$ being the thermal voltage and $r_E$ the parasitic emitter resistance [Voi13]. With a safety margin, this results in a voltage swing of about 200 mV to 300 mV for an HBT latch, compared to a $v_{in,mos}$ of about 450 mV for this process node. As a result, using HBTs, we can scale down the load resistance for the same tail current, increasing the small signal bandwidth and switching speed. Towards smaller CMOS nodes, $v_{in,mos}$ reduces, so using HBTs in the data path of latches is advantageous in BiCMOS process nodes $\geq 90 \text{ nm}$ [DYC<sup>+</sup>06]. These advantages are paid for by an increased area consumption of the HBTs. Figure 5.3 shows the simplified schematic of the BiCMOS latches in this circuit. The HBTs use the smallest available number of fingers with a fixed dimension. The latch speed is further improved by using a split load [WHO<sup>+</sup>00]. This reduces the capacitance seen at the output node of the cell by separating the input capacitance of the *hold*-pair from the output nodes. Figure 5.4 shows the resulting bandwidths for a varying split load ratio. The total load resistance $R_1 + R_2$ is kept constant while the splitting ratio varies. The downside to the split load technique is the drop of the output voltage during the *hold*-phase due to a reduced $R_{\text{load}}$ seen by the *hold*-pair.

All bipolar transistors with $A_e = 70 \text{ nm} \times 900 \text{ nm}$.

Figure 5.3: Simplified schematic of the BiCMOS latch used.

Figure 5.4: Small signal bandwidth of the BiCMOS latch for different load splitting ratios.

Since the transistors in the clock path only need to switch the currents, their gain is of minor importance. Therefore, we can use NMOS clock transistors for improved area efficiency as well as a reduced input capacitance [DV05, DBV05]. The input capacitance of an HBT, even at $I_c = 160 \,\mu\text{A}$, is about 6 fF. The NMOS used here, with $\frac{w}{l} = \frac{1}{0.13}$, has only 1.7 fF at the same device current. Common-drain stages serve as DC-level shifters from the CML clock, additionally masking that input capacitance. A BiCMOS CML latch occupies $14.42 \,\mu\text{m} \times 25.42 \,\mu\text{m}$, with the current source also implemented as a MOS current mirror. A latch using only HBTs would require around twice the area, which further emphasizes the benefit of using a BiCMOS latch. The latch is sized by selecting the load resistance as large as possible while still fulfilling the bandwidth requirement for the specific application. $I_{\text{Tail}} = \frac{v_{\text{in,hbt}}}{R_{\text{Load}}}$ then sets the required bias current. Based on these latches, the CML 2:1 multiplexer is constructed in the same architecture as shown in Figure 5.2. The CML selector cell follows the exact device sizing of the latch. The only modification is converting the hold-pair to a second input, which is selected during the low clock phase.
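The sizing rule above can be turned into a small numerical sketch. It evaluates the minimum swing $4V_T + I_{Tail}r_E$ and the tail current that follows from a chosen swing and load resistance. The load resistance of 1.5 kΩ and the emitter resistance of 20 Ω are hypothetical values used only for illustration; they are not design values from this work.

```python
BOLTZMANN = 1.380649e-23        # J/K
ELEM_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * temp_k / ELEM_CHARGE


def min_swing_hbt(i_tail, r_e, temp_k=300.0):
    """Lower bound 4*V_T + I_tail*r_E for fully switching an HBT CML pair."""
    return 4.0 * thermal_voltage(temp_k) + i_tail * r_e


def tail_current(v_swing, r_load):
    """Tail current needed to develop v_swing across r_load."""
    return v_swing / r_load


R_LOAD = 1.5e3   # hypothetical load resistance in ohms
R_E = 20.0       # hypothetical parasitic emitter resistance in ohms

i_hbt = tail_current(0.25, R_LOAD)   # 250 mV swing, middle of the HBT range
i_mos = tail_current(0.45, R_LOAD)   # 450 mV swing quoted for the MOS case
v_min = min_swing_hbt(i_hbt, R_E)

print(f"HBT tail current: {i_hbt * 1e6:.0f} uA")
print(f"MOS tail current: {i_mos * 1e6:.0f} uA")
print(f"minimum HBT swing: {v_min * 1e3:.0f} mV")
```

For the same load resistance, the lower swing of the HBT latch translates directly into a lower tail current. The computed lower bound of the swing lies well below the 200 mV to 300 mV range, which is consistent with the safety margin mentioned in the text.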

If higher switching speeds are required, the total load resistance and tail current can be scaled to the desired value. However, scaling is limited by a drop in gain with increasing currents and junction temperatures moving towards  $I_c(\max f_T)$ .

#### 5.1.3 CMOS - CML Integration

The transition for the data from CMOS to CML is done by a sequence of two differential amplifiers, as shown in Figure 5.5. At first, a feed-forward NMOS stage with active load is used to get a differential signal from single-ended CMOS signal levels. The second stage acts as a level shifter up to the DC level of the CML latches. This level shifter is designed as a cascode with an NMOS common-source and an HBT common-base stage. Since the feed-forward mechanism in the first stage does not provide a perfect differential signal, the common mode rejection of the second stage current source helps to clean up the output signal. The load resistance and tail current are sized slightly differently to save some power. The maximum expected data rate at this interface is about 5.2 Gb/s.



Nf: Number of emitter fingers with  $A_e = 70 \text{ nm} \times 900 \text{ nm}$ .

Figure 5.5: Two-stage CMOS to CML transition circuit.



Figure 5.6: CML to CMOS transition circuit.

The more critical transition is from CML back to CMOS logic levels for the clock signals. A DC-shifted version of the CML clock signal is fed to standard CMOS inverters as a conversion from CML to CMOS. If we assume a sinusoidal clock and the DC level of this clock signal is not ideally aligned with the CMOS decision threshold, a substantial and systematic duty-cycle distortion will reduce the clock quality. Since the clocking scheme in the multiplexers uses both phases of the clock, a reduced timing accuracy and jitter tolerance would result. The duty-cycle distortion directly translates to different symbol times and thus reduces the time interval where the following stages can sample the CMOS output symbols. To solve this problem, a feedback loop controls the DC potential of the shifted CML signal. An operational amplifier compares the DC potential tapped via resistors from the shifted CML signal to a reference representing the mid-transition of CMOS inverters. A short circuit between the output and input of an inverter creates this reference. The operational amplifier then adjusts the current and, therefore, the $V_{GS}$ of the transistors in the shifting path, modulating the total shifted DC potential. The transistors are sized carefully to ensure a proper tuning range and minimize clock distortions. Figure 5.6 shows the schematic for this conversion circuit. Although this converter is only used for clock signals, it can also be used to transfer a data stream from the CML to the CMOS domain. This operation is made possible by the use of the DC potentials of $v_{in,shifted,p}$ and $v_{in,shifted,n}$.
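The sensitivity of the duty cycle to a DC misalignment can be estimated with a simple model. Assuming an ideal sinusoidal clock of amplitude $A$ around the DC level $V_{DC}$ and an ideal comparator with threshold $V_{th}$, the clock is above the threshold for a fraction $D = \tfrac{1}{2} - \arcsin\left((V_{th} - V_{DC})/A\right)/\pi$ of the period. The sketch below evaluates this relation. The amplitude and offset are hypothetical example values, and the ideal-comparator assumption ignores the finite gain and delay of a real inverter.

```python
import math


def duty_cycle(v_dc, v_th, amplitude):
    """Fraction of the period in which v_dc + A*sin(wt) exceeds v_th."""
    x = (v_th - v_dc) / amplitude
    if x >= 1.0:
        return 0.0   # the clock never crosses the threshold
    if x <= -1.0:
        return 1.0   # the clock always stays above the threshold
    return 0.5 - math.asin(x) / math.pi


F_CLK = 2.6e9            # CMOS clock frequency from the text
AMPLITUDE = 0.4          # hypothetical single-ended amplitude in volts
OFFSET = 0.05            # hypothetical DC misalignment in volts

d = duty_cycle(v_dc=0.6 - OFFSET, v_th=0.6, amplitude=AMPLITUDE)
period_ps = 1e12 / F_CLK
high_ps = d * period_ps
low_ps = (1.0 - d) * period_ps

print(f"duty cycle: {d * 100:.1f} %")
print(f"high phase: {high_ps:.1f} ps, low phase: {low_ps:.1f} ps")
```

In this idealized model, a misalignment of one eighth of the clock amplitude already shifts the duty cycle by about four percentage points, which illustrates why the DC level is regulated by a feedback loop.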

#### 5.1.4 Implementation and Test of the 10.4 Gb/s Serializer Test Chip

The clock is fed through an on-chip DC blocking capacitor to a differential cascode amplifier used as a balun. This cascode is biased slightly asymmetrically to force a stable CLK = 0 when there is no external clock signal, providing a known startup condition. The balun is followed by a clock driver for the fast CML clock, for which a high-gain CML inverter is used. The output swing of this clock driver is 800 mV<sub>pp,diff</sub>, driving the complete 2:1 BiCMOS multiplexer and a CML clock divider. This divider is constructed of two latches similar to the ones used in the MUX, but the loads are slightly asymmetric for a defined initial condition, as in the clock balun. Following this divider, the clock is converted to CMOS logic levels using the circuit in Figure 5.6. From the fast-clock driver to the first sampling latch in the CML multiplexer, the signal delay is about 20 ps, but from the clock driver to the inputs of the CMOS 4:1 blocks, the delay is 220 ps and heavily dependent on process, voltage, and temperature. A switchable delay aligns the two clock branches by delaying the CMOS clock by an additional 87 ps, 180 ps, or 260 ps. This delay can adjust the CML MUX sampling time and CMOS 4:1 output hold times.

A differential output driver ensures matching to standard 100  $\Omega_{\text{diff}}$  environments. This driver uses common-collector buffers and then a differential cascode with six transistor fingers of  $A_e = 70 \text{ nm} \times 900 \text{ nm}$  each. The load resistance is 60  $\Omega$  for a marginally larger gain and still acceptable matching. The driver has a voltage gain of 2.4. With this gain and the expected signal strength from the CML section, this driver is operating non-linearly, acting as a limiting amplifier. In simulation, this gives a 4 dB improvement on the output eye's SNR.

Table 5.1 lists power and area consumption for the individual blocks. The 8:1 serializer, including the clock amplifier and distribution, uses only 13.74 mW. 68% of the total power is used in the output driver. The power reduction measures implemented in the BiCMOS latches led to a very efficient CML 2:1 multiplexer while its area is still larger than the 8:2 CMOS MUX. Due to a DC blocking capacitor, the clock balun uses a significant portion of the entire active area. All latches and interfaces between the signal domains are designed in a cell-based structure, allowing a flexible placement and reuse of similar layout elements.
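The power figures quoted above can be cross-checked against the entries of Table 5.1 with a few lines of Python. The block values are copied from the table; the energy per bit is a derived quantity that is not stated in the text and is included only as an illustrative metric.

```python
# Simulated power per block in mW, copied from Table 5.1.
blocks_mw = {
    "CMOS 8:2": 4.16,
    "CML 2:1 incl. CMOS to CML conv.": 4.98,
    "clock dividers, distribution and delay": 4.60,
    "clock balun incl. DC block": 2.9,
    "output driver": 36.95,
}
TOTAL_MW = 53.8          # total from Table 5.1, incl. supply and bias circuitry
DATA_RATE_GBPS = 10.4

# Serializer core: multiplexers plus clock dividers and distribution.
core_mw = sum(
    blocks_mw[name]
    for name in (
        "CMOS 8:2",
        "CML 2:1 incl. CMOS to CML conv.",
        "clock dividers, distribution and delay",
    )
)
listed_mw = sum(blocks_mw.values())
bias_mw = TOTAL_MW - listed_mw                  # remainder not listed per block
driver_share = blocks_mw["output driver"] / TOTAL_MW
energy_pj_per_bit = TOTAL_MW / DATA_RATE_GBPS   # mW / (Gb/s) equals pJ/bit

print(f"serializer core: {core_mw:.2f} mW")
print(f"supply and bias remainder: {bias_mw:.2f} mW")
print(f"output driver share: {driver_share * 100:.1f} %")
print(f"energy per bit (derived): {energy_pj_per_bit:.2f} pJ/bit")
```

The three serializer blocks add up to the 13.74 mW stated in the text, and the output driver share comes out close to the quoted 68 %.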

|                                        | Area<br>(µm <sup>2</sup> ) | Power <sup>a</sup><br>(mW) |
|----------------------------------------|----------------------------|----------------------------|
| CMOS 8:2                               | 2400                       | 4.16                       |
| CML 2:1 incl. CMOS to CML conv.        | 3600                       | 4.98                       |
| clock dividers, distribution and delay | 3200                       | 4.60                       |
| clock balun incl. DC block             | 6500                       | 2.9                        |
| output driver                          | 2300                       | 36.95                      |
| Total <sup>b</sup>                     | 25 400                     | 53.8                       |

Table 5.1: Area and simulated power consumption of the building blocks.

<sup>a</sup>  $V_{\text{DD,core}} = 1.5 \text{ V}$  and 10.4 Gb/s output data-rate.

<sup>b</sup> More than the sum due to included supply and bias circuitry.

Figure 5.7: a) 8:1 serializer chip photograph. b) Test board. The die is under the black cover to protect the wire bonds.

Figure 5.8: Measured eye diagram for error-free 10.4 Gb/s operation.

Figure 5.7a shows the chip photo. On the left side, there are eight digital input cells. $V_{DD,IO}$ for the digital input cells is combined with the $V_{CC}$ of the BiCMOS latches. A current reference and the delay selection inputs are placed on the top side supply. To the bottom is the clock input, and to the right is the differential output. The circuit was tested using the PCB in Figure 5.7b with the wire-bonded die. Verification is done using constant inputs for six channels and pseudo-random binary sequences (PRBS15) for two channels. An Agilent M8190A arbitrary waveform generator supplied these sequences. A pattern or waveform generator with a higher channel count was unavailable, but the data integrity can still be tested with two pseudo-random input streams. The output is recorded with an Agilent DSOX93204A real-time oscilloscope. An external source supplies the clock and is also fed to the oscilloscope via a power divider. Clock and data are recorded simultaneously for simple data recovery during post-processing by correcting for cable delays and simply sampling the measured data with the clock. The expected output data stream is calculated using the constants and two PRBS inputs. Since the start of a recording is not synchronized to the beginning of the data stream, the measured data is aligned to the expected stream by computing a correlation and applying the shift that maximizes it. After that shift, a bit-wise comparison of the expected and measured bit stream ensures error-free operation. Figure 5.8 shows the recorded eye diagram for error-free 10.4 Gb/s operation. The connection from the chip to the oscilloscope through the PCB, cables, and DC blocks is not de-embedded for the measurement.
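The alignment and bit-wise comparison used for verification can be sketched in a few lines of Python. The example builds an expected stream from two PRBS15 lanes and six constant lanes, cuts out a window with an unknown start, finds the start by maximizing the correlation, and counts bit errors. The lane order, the seeds, the constants, and the window position are assumptions made for illustration; the actual post-processing scripts are not part of this text.

```python
def prbs15(n_bits, seed):
    """PRBS15 (x^15 + x^14 + 1) from a 15-bit LFSR; seed must be non-zero."""
    state = seed & 0x7FFF
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new_bit) & 0x7FFF
        bits.append(new_bit)
    return bits


def expected_stream(n_frames):
    """Serial stream of eight lanes: two PRBS lanes, six constant lanes."""
    lanes = [prbs15(n_frames, 0x7FFF), prbs15(n_frames, 0x1234)]
    lanes += [[c] * n_frames for c in (1, 0, 1, 1, 0, 0)]
    return [lanes[lane][t] for t in range(n_frames) for lane in range(8)]


def best_shift(measured, expected):
    """Offset into `expected` with maximum correlation to `measured`."""
    m = [2 * b - 1 for b in measured]   # map {0, 1} to {-1, +1}
    e = [2 * b - 1 for b in expected]
    scores = [
        sum(x * y for x, y in zip(m, e[s:s + len(m)]))
        for s in range(len(e) - len(m) + 1)
    ]
    return scores.index(max(scores))


def bit_errors(measured, expected, shift):
    """Bit-wise comparison after alignment."""
    ref = expected[shift:shift + len(measured)]
    return sum(a != b for a, b in zip(measured, ref))


expected = expected_stream(256)        # 2048 serial bits
measured = expected[777:777 + 256]     # recording with an unknown start
shift = best_shift(measured, expected)
errors = bit_errors(measured, expected, shift)
print(f"recovered shift: {shift}, bit errors: {errors}")
```

Because only two lanes carry pseudo-random data, every shift by a multiple of eight bits still matches the six constant lanes. The PRBS lanes are what make the correlation peak unique, which is why two random inputs are sufficient for this check.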

The eye diagram shows asymmetric eye levels due to an offset in the output driver. A revised version could solve this with a DC-offset compensation control loop. The eye has a quality factor of Q = 10.8, and the crossing has $4.5 \text{ ps}_{\text{RMS}}$ jitter when correcting for the DC offset and $5.8 \text{ ps}_{\text{RMS}}$ without the correction. Due to the eye-amplifying effect of the output driver and a relatively poor signal after the CML 2:1 MUX, some jitter is expected. Additionally, any duty cycle distortion in the clock directly translates to jitter.
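To put the measured quality factor into perspective, the following sketch uses the common eye-diagram definition $Q = (\mu_1 - \mu_0)/(\sigma_1 + \sigma_0)$ and the Gaussian-noise estimate $\mathrm{BER} = \tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$. Both the definition and the Gaussian assumption are standard textbook relations assumed here; the text does not state how Q was extracted, and the level statistics in the example are hypothetical.

```python
import math


def q_factor(mu1, mu0, sigma1, sigma0):
    """Eye quality factor from level means and standard deviations."""
    return (mu1 - mu0) / (sigma1 + sigma0)


def ber_from_q(q):
    """Bit error ratio for NRZ with Gaussian noise and optimum threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


# Hypothetical level statistics in volts, chosen to give Q = 10.8.
q_example = q_factor(mu1=0.27, mu0=-0.27, sigma1=0.025, sigma0=0.025)

ber_measured_q = ber_from_q(10.8)   # Q reported for the measured eye
ber_q6 = ber_from_q(6.0)            # reference point, roughly 1e-9

print(f"example Q: {q_example:.1f}")
print(f"BER estimate for Q = 10.8: {ber_measured_q:.1e}")
print(f"BER estimate for Q = 6: {ber_q6:.1e}")
```

Under these assumptions, a Q of 10.8 corresponds to an error ratio far below anything observable in a finite recording, which is consistent with the error-free bit-wise comparison.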

#### 5.1.5 Conclusion

This design study investigated a procedure for designing area- and power-efficient high-speed serializers using the full extent of the BiCMOS technology's device portfolio. A 10.4 Gb/s 8:1 serializer was implemented using IHP's SG13G2 technology and verified. The serializer, including clock balun, uses a core area of $0.025 \text{ mm}^2$ and 54 mW, where 68 % of the power is used in the output driver. The verification showed that a re-timing latch at the output and DC offset compensation would be a valuable addition to the final circuit implemented in the pixel detector. The additional latch, however, would require another clock divider stage and a higher input clock, increasing the area and power consumption. The latch itself would not contribute significantly to the area and power budget since it could be designed in the presented design scheme with a minor adjustment for an increased speed, selecting a lower load resistance and higher tail current. Actively controlling the clock duty cycle might be an alternative, more efficient solution, especially when moving towards 100 Gb/s, where a 100 GHz re-timing clock would be necessary. Offset compensation helps to have the eye centered around 0 V, potentially improving the driver's performance. However, since the output is just NRZ, the additional distortion generated by this offset was not critical for circuit verification. The results prove the usability of the design methodology, which can be followed for an extension towards higher output data rates.

#### 5.2 Investigation of a Broadband Differential Frequency Doubler

Parts of the following section have been published in [4].

For test and measurement applications or emerging millimeter wave systems like software-defined radio, frequency sources and multipliers covering a wide frequency range are required [RYL16]. VCOs, as frequency sources for millimeter wave applications, suffer from high phase noise and low tunability. A commonly used solution is generating signals at lower frequencies and multiplying their outputs, for example, by one or multiple frequency multipliers [LUTS10, TLK<sup>+</sup>13, BPAB12]. Multiple multiplier topologies exist, such as a single transistor, self-mixing with Gilbert cell-based multipliers [CBYJ09], or a push-push transistor pair [RFC21]. A Gilbert cell uses stacked transistors, so a higher supply and a larger circuit area are needed. Push-push doublers are conveniently built from just two transistors and are, therefore, more efficient regarding supply voltage and area. However, their output is typically single-ended. Differential signal sources are essential for millimeter wave systems since many building blocks operate as differential circuits. In the scope of this work, during data serialization, a high-speed clock source is needed to clock the multiplexing stages and latches. The previous study of a serializer in Section 5.1 uses an external source due to its relatively low frequency. However, a fast clock is needed when serializing data towards 100 Gb/s. From a system perspective, it is ideally created on-chip via a phase-locked loop and multipliers. A clock source for CML logic also needs to be differential. Therefore, a modified push-push frequency doubler that generates differential outputs without the use of a post-doubler balun is investigated in this section. The design of this doubler focuses on a wide operation bandwidth. This provides much flexibility for serializer data rates or as an LO source for multi-band millimeter wave systems.

#### 5.2.1 Input Balun

An active balun was integrated into the doubler test chip to drive the frequency doubler and provide a well-balanced wideband differential signal. Passive baluns require a large area and offer only limited bandwidth [TLK<sup>+</sup>13]. Figure 5.9a shows a simplified schematic of the used active balun topology. The circuit is a classical single transistor balun with $R_{E,B} = R_{C,B}$. Special care has to be taken to keep amplitude and phase balance with increasing frequency since any error in the differential signal will translate into fundamental leakage at the doubler's output. Single transistor baluns introduce large errors at frequencies higher than 10 GHz. Therefore, $C_E$ is used to tune the phase and amplitude response in the first step, as seen in Figure 5.9b. Increasing this capacitance helps to keep the phase difference between the collector and emitter tap close to 180°. However, this acts as a peaking capacitor for the signal tapped at the collector while being a low pass filter for the signal at the emitter. As a result, the amplitude balance degrades drastically. As a trade-off with the focus on a better phase response, $C_E$ was chosen to be 15 fF. However, the response still shows a deviation from the desired differential signal. A differential amplifier following the balun stage introduces a second correction step. This stage uses a degenerated current mirror for common mode suppression.

Figure 5.9: a) Simplified circuit diagram of the input balun. b) Phase difference at collector and emitter of the balun's input transistor for different values for  $C_{\text{E}}$ . [4] ©IEEE

#### 5.2.2 Broadband Doubler Implementation

Nf: Number of emitter fingers with  $A_e = 70 \text{ nm} \times 900 \text{ nm}$ .

Figure 5.10: a) Circuit diagram of the doubling stage with post amplifier. b) The doubler's conversion gain and amplifier's gain versus output frequency.

Figure 5.11: Photograph of the chip. Total area:  $0.278 \text{ mm}^2$ . Core area (balun and doubler):  $280 \,\mu\text{m} \times 125 \,\mu\text{m} = 0.035 \,\text{mm}^2$ . [4] ©IEEE

As a frequency doubler, a push-push configuration of two npn HBTs is used. In Figure 5.10a, the simplified circuit diagram of the push-push doubler stage is shown. The transistors are biased close to Class-B, so the circuit operates as a rectifier. A differential signal at its input generates a total current through the transistor pair with the main frequency component at $2f_{in}$. Usually, a single-ended signal at $2f_{in}$ is then taken from the collector [RFC21, RTCE18, CMZ<sup>+</sup>16]. The fundamental frequency cancels in the summed current if the inputs are perfectly differential. However, depending on the balance of the input signal, its power, circuit parasitics, and non-idealities, a certain amount of the fundamental frequency is still present at the output. Filtering of the fundamental frequency can be achieved by optimizing the output matching network for $2f_{in}$ or by high-pass filters. Since this design targets a broadband operation, this is not an option. Hence, the integrated balun is optimized for the best possible differential symmetry. The main difference to state-of-the-art push-push doublers is the usage of load resistors on both sides of the coupled npn pair. As in the balun, a differential voltage can be tapped between the collector and emitter nodes. Since 2<sup>nd</sup> harmonic reflectors are omitted because of their limited bandwidth, and no other techniques to increase conversion gain (e.g., cascode [JHR05]) are used, the doubling stage itself has a low conversion gain. Additionally, the conversion gain of the doubling stage starts to drop when the output frequency exceeds 30 GHz. Therefore, a two-stage fully differential amplification chain boosts the output signal. The amplifiers are optimized for wide bandwidth and only boost the total conversion gain to around 1 dB. However, they include a frequency peaking to counteract the drop in the doubler's response. Figure 5.10b plots the two frequency responses balancing each other for a flat response. Figure 5.11 shows a photograph of the chip with balun and doubler. The total area is 0.278 mm<sup>2</sup>, while the active circuit core only accounts for 0.035 mm<sup>2</sup>. It was fabricated in IHP's SG13G2 technology.
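The rectifying behavior of the push-push pair and the effect of input imbalance on fundamental leakage can be illustrated with an idealized model. The sketch below treats each transistor as an ideal half-wave rectifier (Class-B), sums both currents, and evaluates the harmonic amplitudes with a discrete Fourier sum. This is a strong simplification and an assumption of this example: device nonlinearity, parasitics, and the post amplifier are ignored, so the numbers only indicate the trend and are not simulation results of the presented circuit.

```python
import math


def push_push_current(n, gain_error_db=0.0, phase_error_deg=0.0):
    """Summed current of two ideal half-wave rectifiers over one input period."""
    a2 = 10.0 ** (gain_error_db / 20.0)            # amplitude imbalance
    phi = math.radians(180.0 + phase_error_deg)    # ideally 180 degrees
    samples = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        i1 = max(0.0, math.sin(theta))
        i2 = max(0.0, a2 * math.sin(theta + phi))
        samples.append(i1 + i2)
    return samples


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic from a discrete Fourier sum."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


ideal = push_push_current(2048)
h1_ideal = harmonic_amplitude(ideal, 1)
h2_ideal = harmonic_amplitude(ideal, 2)

# Imbalance at the limits reported for the balun: 0.5 dB and 2 degrees.
real = push_push_current(2048, gain_error_db=0.5, phase_error_deg=2.0)
rejection_db = 20.0 * math.log10(
    harmonic_amplitude(real, 2) / harmonic_amplitude(real, 1)
)

print(f"ideal: fundamental {h1_ideal:.1e}, second harmonic {h2_ideal:.3f}")
print(f"fundamental rejection with imbalance: {rejection_db:.1f} dB")
```

With perfectly balanced inputs, the fundamental vanishes and the second harmonic approaches the value $4/(3\pi)$ of a full-wave rectified sinusoid. Even a small amplitude and phase error at the input leaves a fundamental component only about 20 dB below the second harmonic in this model, which illustrates why the balun symmetry is the main design concern when no output filtering is available.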



Figure 5.12: Balun small signal measurement response a) S-parameters and b) Amplitude and phase balance. [4] ©IEEE

#### 5.2.3 Experimental Results

A breakout circuit of the balun is measured independently. It is tested with an on-wafer measurement system and a 3-port S-parameter setup. Figure 5.12 shows the results for this balun. Since the input is DC-blocked, $S_{11}$ is greater than $-10$ dB below 8 GHz, but for higher frequencies, $S_{11}$ shows a good match. The amplitude and phase balance are excellent, with an amplitude mismatch below 0.5 dB and a phase mismatch below $\pm 2^{\circ}$ in the frequency range from 2.4 to 67 GHz. The balun itself consumes 11.3 mW DC power. The gain of the balun is low since it is not designed for a $50 \Omega$ load at the output, as was the case during this measurement. Apart from a slightly higher measured gain, the measurement and simulation fit very well and show the capability of this simple balun structure.

The doubler with integrated balun is also verified with an on-wafer measurement setup. For measurements up to 50 GHz output frequency, a Keysight N5247B VNA is used to measure the harmonics up to the fourth harmonic. It also generates the input signal. Two ports are used as a differential receiver with calibration up to the measurement probe tips. Therefore, there is no need for external baluns to measure a differential response, which would introduce measurement uncertainties. For output frequencies above 50 GHz, a broadband frequency extender was used to measure the output power single-ended. 3 dB was added to the measured power to account for the differential result.



Figure 5.13: a) Output harmonics at  $P_{in} = -13$  dBm and b) Conversion gain and fundamental leakage at  $P_{in} = -13$  dBm. [4] ©IEEE

Figure 5.13a shows the measured and simulated harmonic power versus the input frequency. There is a strong ripple on the measured output power above 50 GHz, which is due to calibration difficulties for absolute power measurements in our system. Figure 5.13b shows conversion gain and fundamental rejection. The results agree with simulated values. The simulated 3 dB-bandwidth of the conversion gain extends from 2.4 GHz to 56.4 GHz with respect to the input frequency, while in measurement, the upper limit is at 40 GHz input frequency. However, due to probe losses needing to be accounted for in the power calibration, this value is slightly underestimated. Figure 5.14b presents conversion gain and fundamental rejection at $f_{in} = 25$ GHz as a function of input power. These results also match the simulated response. The total DC power consumption is 39.6 mW from a 1.8 V supply, including 11.3 mW for the balun and 25.1 mW for the differential post amplifiers.

#### 5.2.4 Conclusion

Figure 5.14: a)  $2^{nd}$  harmonic output power versus input power at  $f_{in} = 25$  GHz and b) Conversion gain and fundamental power with respect to the  $2^{nd}$  harmonic versus input power. [4] ©IEEE

|                          | This<br>Work   | [PKN <sup>+</sup> 05] | [TLK<sup>+</sup>13] | [CBYJ09]        | [EHB <sup>+</sup> 17] | [RTCE18]       |
|--------------------------|----------------|-----------------------|-----------------|-----------------|-----------------------|----------------|
| Tech.                    | 130 nm<br>SiGe | InP DHBT              | 0.18 μm<br>SiGe | 0.18 μm<br>SiGe | 0.18 μm<br>SiGe       | 130 nm<br>SiGe |
| Topology                 | PP             | GC                    | PP              | GC              | GC                    | PP             |
| Output                   | diff           | SE                    | SE              | diff            | diff                  | diff           |
| BW <sup>a</sup><br>(GHz) | 5 - 80         | DC - 86               | 15 – 36         | 36 - 80         | 37<br>@120 GHz        | 56 - 67        |
| peak CG<br>(dB)          | 1.7            | -0.25                 | -10             | 10.2            | 3                     | -15            |
| Fund.<br>(dBc)           | 18.5           |                       | 33              | 20              | 21                    | 42             |
| P <sub>DC</sub><br>(mW)  | 39.6           | 730                   | 4 - 11          | 137             | 69                    | 23.5           |

Table 5.2: Comparison of wideband frequency doubler circuits.

<sup>a</sup> Conversion Gain 3 dB bandwidth, <sup>b</sup> two doublers with polyphase filter

The design of a broadband frequency doubler based on the push-push topology with an unusual method of generating a differential output using emitter and collector load resistances was investigated. A classic single transistor active balun is integrated to drive the doubler and ensure a well-defined phase relation between the doubler's input signals. The circuit has a measured peak conversion gain of 1.7 dB with an input power of -13 dBm at $f_{in} = 25$ GHz while providing over 20 dB of fundamental rejection. The balun and doubler use a compact design without large passive structures and hence require only 0.035 mm<sup>2</sup> for the active circuit area. The doubler has a measured 3 dB-bandwidth of at least 75 GHz, with a good match between simulation and measurement. Table 5.2 presents an overview of similar publications for broadband doublers and doublers with differential outputs. The frequency doubler in this work shows the highest conversion gain 3 dB bandwidth for differential output circuits. When using this doubler as a clock multiplier in data serializers, the low harmonic suppression, primarily the fundamental suppression, limits the achievable performance concerning clock jitter. At $f_{out} = 50 \text{ GHz}$, the simulated jitter is 422 fs<sub>RMS</sub> and 1.5 ps<sub>pp</sub>. These results were extracted using a transient simulation, including noise. If we now assume this 50 GHz signal is the clock for a multiplexer to 100 Gb/s, the peak-to-peak jitter reaches 15 % of a symbol period. Since this is one of many sources of jitter, a detailed analysis of the complete system would be necessary to conclude if this is acceptable in a data serialization system. However, with wideband synthesizers, the doubler enables flexible and ultra-wideband local oscillator generation.
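The jitter budget stated above follows from a short calculation, sketched here in Python. It assumes that a 50 GHz half-rate clock defines a 100 Gb/s symbol period of 10 ps, as described in the text; the function names are illustrative.

```python
def unit_interval_ps(data_rate_gbps):
    """Symbol period in picoseconds for a given NRZ data rate in Gb/s."""
    return 1e3 / data_rate_gbps


def jitter_fraction(jitter_ps, data_rate_gbps):
    """Jitter expressed as a fraction of the symbol period."""
    return jitter_ps / unit_interval_ps(data_rate_gbps)


DATA_RATE_GBPS = 100.0    # output of a 2:1 multiplexer clocked at 50 GHz
JITTER_PP_PS = 1.5        # simulated peak-to-peak jitter
JITTER_RMS_PS = 0.422     # simulated RMS jitter

ui_ps = unit_interval_ps(DATA_RATE_GBPS)
pp_fraction = jitter_fraction(JITTER_PP_PS, DATA_RATE_GBPS)
rms_fraction = jitter_fraction(JITTER_RMS_PS, DATA_RATE_GBPS)

print(f"symbol period: {ui_ps:.1f} ps")
print(f"peak-to-peak jitter: {pp_fraction * 100:.1f} % of a symbol period")
print(f"RMS jitter: {rms_fraction * 100:.1f} % of a symbol period")
```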

### 6 Packaging for Broadband Applications

In the scope of this work, some packaging solutions with in-house manufacturing have been researched. The capabilities at the Institute of Radio Frequency Engineering and Electronics (IHE) include an automatic wire bonding machine, die-attach pick-and-place tools, and a prototyping laser for structuring and cutting. $630 \,\mu\text{m}$ thick Alumina (Al<sub>2</sub>O<sub>3</sub>) substrates with a 3 $\mu\text{m}$ gold plating are used as the basis for broadband assemblies when trying to cover bandwidths $\geq$ 40 GHz. This substrate has a comparably high dielectric constant of $\epsilon_r = 9.4$ and a low loss tangent of $\tan \delta = 0.0004$. The dielectric constant leads to comfortable dimensions for a 50 $\Omega$ CPW transmission line: $w_{\text{line}}/w_{\text{space}} = 2$ with $w_{\text{line}} = 50 \,\mu\text{m}$ and $w_{\text{space}} = 25 \,\mu\text{m}$. With proper optimization of the laser parameters, this is manufacturable. This line can also be contacted with the same RF probes used for on-wafer characterization, allowing easy verification of the manufacturing process. However, no small vias are available in this stack, and bends or corners tend to radiate when operation frequencies exceed 75 GHz. These radiation effects get more severe as the line and space increase. Therefore, a thin line was selected. In the same laser process, the substrate can be cut, and cavities for ICs can be milled. This chapter investigates and tests the steps and relevant techniques to package the differential PAM-4 combiner and driver presented in Section 2.2. Section 6.3 then shows the results for a sub-assembly-based packaging solution.
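The quoted CPW dimensions can be checked against the classical quasi-static closed-form expression for a coplanar waveguide, $Z_0 = \frac{30\pi}{\sqrt{\epsilon_{\text{eff}}}} \frac{K(k')}{K(k)}$ with $k = w/(w + 2s)$, $k' = \sqrt{1 - k^2}$, and $\epsilon_{\text{eff}} \approx (\epsilon_r + 1)/2$. The sketch below assumes an infinitely thick substrate, zero metal thickness, and no backside ground, none of which is stated in the text. It is therefore only a first-order estimate and not the field simulation used for the design.

```python
import math


def ellipk(k):
    """Complete elliptic integral of the first kind K(k) via the AGM."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(40):                 # quadratic convergence, 40 is ample
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def cpw_impedance(w, s, eps_r):
    """Quasi-static impedance of an ideal CPW (line width w, gap s)."""
    k = w / (w + 2.0 * s)
    k_prime = math.sqrt(1.0 - k * k)
    eps_eff = 0.5 * (eps_r + 1.0)       # half air, half dielectric
    z0 = 30.0 * math.pi / math.sqrt(eps_eff) * ellipk(k_prime) / ellipk(k)
    return z0, eps_eff


z0, eps_eff = cpw_impedance(w=50e-6, s=25e-6, eps_r=9.4)
print(f"effective permittivity: {eps_eff:.2f}")
print(f"characteristic impedance: {z0:.1f} ohm")
```

Under these idealizations the estimate is about 53 Ω, close to the 50 Ω target. The finite thickness of the gold plating, which the formula neglects, typically lowers the impedance slightly.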

#### 6.1 Wire Bonds

A bond wire transition is the first interface from an integrated circuit to the outside world. Ideally, this connection does not create reflections in the transitioning signals. However, this is not possible if we consider standard $50 \Omega$ environments. Figure 6.1 shows the impedance of a TEM mode in ground-signal-ground (GSG) bond wires versus the wire pitch. It can be seen that even with infeasibly small pitches, there will still be a mismatch in the transition. By covering the bonds in a molding compound with a dielectric constant larger than one, the mismatch in the transition could be reduced [HSZ22]. However, this comes at the cost of an additional process step with potential issues like non-uniformity due to trapped air bubbles, molding compound spillover interfering with tuned structures on the chip, and dielectric loss in the compound. Therefore, planar GSG wire bonds without a molding compound were selected for a packaging investigation of the PAM-4 driver.

Figure 6.1: Impedance of a GSG bond wire transition for different wire diameters in air.

A 56i series semi-automatic wire bonder from F&S BONDTEC Semiconductor GmbH with different process-specific bond heads is available at the institute for bonding. Wedge-wedge bonds, which allow short and compact wires, can be made with wire diameters down to 12.5 µm. A tool with an especially small bond flat of 25 µm is used for RF bonds. This is necessary so the bond flat fits on the pads, which were initially designed for RF probing and low capacitance. Ribbon bonding is also possible; however, the smallest ribbon dimensions of $w = 38 \,\mu\text{m}$ and $h = 12.7 \,\mu\text{m}$ are too large for reliably bonding on the 40 µm wide pads with only 35.8 µm passivation opening. The bond adhesion might be severely reduced due to passivation debris being stuck under the bond wire. Moreover, the wire spreads while being pressed down, so shorts between the signal and nearby ground pads might occur.

Figure 6.2 shows measurements for 150 µm long gold bonds with 12.5 µm diameter from a CPW on alumina to a CPW on the same substrate and a reference *through* CPW line. This is a somewhat idealized test, but it shows a few key takeaways. As expected, the 100 µm pitch GSG transition exceeds $-10 \, dB$ return loss early, at 75 GHz. Reducing the pitch to 70 µm helps but is insufficient. Adding a second wire to each ground connection (GGSGG) only marginally improves the return loss, but has a strong effect on the insertion loss. When additionally placing a second wire in the signal connection (GGSSGG), the performance is similar to a ribbon connection, and the return loss (RL) stays below $-10 \, dB$. This second signal wire touches the first one, as seen in Figure 6.3. The chip in this example is embedded into the substrate to reduce height differences and thus bond wire length. This process is very reliable as the wedge-wedge bond head offers a clamp-operated wire tearing mechanism for the second bond. Furthermore, the RF performance is comparable to that of ribbon bonds, so this technique is used for further packaging steps.

Figure 6.2: Comparison of different 150 µm wire bond connections on an alumina substrate. a) Return loss and b) Insertion loss.

Figure 6.3: Doubled ground and signal bonds transition from a chip to a CPW on an alumina package.



Figure 6.4: a) Layout drawing of the DC block capacitor footprint on gold-plated alumina. b) Return loss and c) Insertion loss for a through as reference, a through with taper, and the final DC block assembly.

#### 6.2 DC Blocking

Since the intended operating frequency of broadband amplifiers should go as low as possible to prevent baseline wander, a relatively large capacitor is needed for DC blocking. At the same time, DC blocks are necessary to avoid loading the outputs or inputs due to other connected circuitry. Especially since device or supply variations shift the common-mode level of the signals, a perfect match to other building blocks is rarely possible. The capacitance value needed is next to impossible to realize in the IHP SiGe processes, whose metal-insulator-metal capacitors have a specific capacitance of $1.5 \text{ fF}/\mu\text{m}^2$. Therefore, off-chip DC blocks need to be designed. A Murata XBSC series 10 nF silicon capacitor was selected. These capacitors are rated up to 110 GHz, which means they support a resonance-free response in that range. The high-pass corner frequency $f_c$ of the 10 nF capacitor within a 50 $\Omega$ transmission line is at about 160 kHz. The pads are too large to fit into the $50\,\Omega$ CPW. Therefore, a taper was designed to widen the line and gaps while keeping reflections low. The width of the taper and, thus, the landing pad dimension for the capacitor is chosen as small as possible. Larger dimensions show increasing radiation and, thus, loss. The final footprint layout is shown in Figure 6.4a. The capacitor already has solder bumps for attachment to a PCB. However, the gold-plated alumina substrate used does not have a solder mask, making the alignment and soldering process very hard. With a solder mask, the surface tension of the solder would pull the capacitor into place. Without one, this self-alignment does not occur. That problem was solved by holding the capacitor with a pick-and-place tool during the soldering process. Figure 6.4 shows the measured S-parameters for test structures. A straight 50 $\Omega$ through is used as the baseline. The effect of the taper is tested by tapering the line up and down again without inserting the capacitor. The straight and tapered through show very good matching and low loss. However, the taper shows radiation effects above 75 GHz. This is also visible in the measurement, including the capacitor. Up to 75 GHz, the DC block has only 1.5 dB of loss and a very good match. At 100 GHz the loss increases to 3 dB. The capacitor is not intended to be used on substrates with such a high $\epsilon_r$ and such thin lines. Nevertheless, these results show that it can be used for 100 GBd applications with in-house manufacturing at IHE.
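
The two numbers behind this choice can be reproduced with a few lines. The sketch below assumes that the series capacitor sees a 50 Ω source and a 50 Ω load, so 100 Ω in total, and that an on-chip alternative would be a plain MIM plate at the stated density; the variable names are illustrative.

```python
import math

Z0 = 50.0          # ohm, source and load impedance (from the text)
C_BLOCK = 10e-9    # farad, series DC-block capacitor (from the text)
C_MIM = 1.5e-15    # farad per square micrometre, MIM density (from the text)

# A series capacitor between a Z0 source and a Z0 load sees 2 * Z0 in total
# (assumption: both sides are ideally matched).
f_c = 1.0 / (2.0 * math.pi * 2.0 * Z0 * C_BLOCK)

# Plate area an on-chip MIM capacitor of the same value would need.
area_mm2 = C_BLOCK / C_MIM * 1e-6   # um^2 -> mm^2

print(f"high-pass corner: {f_c / 1e3:.0f} kHz")
print(f"equivalent MIM area: {area_mm2:.2f} mm^2")
```

The corner agrees with the value of about 160 kHz quoted above. The MIM area of several square millimetres illustrates why an on-chip DC block is not practical here.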

#### 6.3 Sub-Assemblies

A sub-assembly-based packaging approach was used for testing the previously investigated technologies. It combines a conventional low-cost PCB for DC connections and decoupling with an RFIC package glued onto that PCB, as shown in Figure 6.5a. The RFIC sub-assembly is based on a gold-plated alumina substrate. It can be tested individually before integration with the DC PCB or other modules. Ball-wedge wire bonds make electrical connections from the DC to the RF board. These bonds have their ball on the PCB, and the wire exits vertically up to the gold layer on the RFIC module. A zoomed-in view of the RFIC sub-assembly is shown in Figure 6.5b. The driver is embedded in the alumina substrate to minimize the step for the RF wire bonds. A thermally conductive glue (Polytec TC 430 from Polytec PT GmbH) is necessary to



Figure 6.5: Packaged driver from Section 2.2 with external decoupling capacitors and DC-block at one output. a) PCB as carrier board with additional decoupling and DC connections.b) Photograph of the RFIC sub-assembly.

connect the driver via the alumina to the ground plane on the PCB. This ground plane serves as the heat sink. Even though the power dissipation of 300 mW is relatively small, there is a noticeable heating effect of the die during power-up. Assuming a 20 µm thick glue layer with a thermal conductivity of $\lambda_{glue} = 0.7 \text{ W/mK}$ and a diced die size of 0.8 mm<sup>2</sup>, a temperature difference of 10.7 K is formed across the glue layer. The temperature difference from the active area to the bottom of the die is ignored in this estimation because the very large thermal conductivity of silicon, $\lambda_{Si} = 160 \text{ W/mK}$, results in a temperature difference below 1 K. The alumina with $\lambda_{Al2O3} = 27 \text{ W/mK}$ and a thickness of $\approx 310 \,\mu\text{m}$ results in another 4.3 K to the interface at the bottom of the alumina sub-assembly. This is already a very conservative estimation since it only includes vertical heat flow. The significantly larger alumina sub-assembly also spreads the heat laterally. At this interface, the glue area is large enough to restrict the temperature difference to below 1 K. Considering these estimations and a temperature rise on the DC board, a chip temperature $\approx 15$ to 20 K above ambient is to be expected. During power-up, a short period of thermal stabilization could be observed.
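
The layer-by-layer estimate above follows from one-dimensional heat conduction, $\Delta T = P\,t/(\lambda A)$. A minimal sketch, assuming purely vertical heat flow through the die footprint and using only the values quoted in the text (the helper name `delta_t` is illustrative):

```python
def delta_t(power_w: float, thickness_m: float,
            conductivity_w_mk: float, area_m2: float) -> float:
    """Temperature drop across a slab for 1-D conduction: P * t / (lambda * A)."""
    return power_w * thickness_m / (conductivity_w_mk * area_m2)


P_DISS = 0.300    # W, driver power dissipation (from the text)
A_DIE = 0.8e-6    # m^2, diced die size of 0.8 mm^2 (from the text)

# Glue layer: 20 um thick, 0.7 W/mK
dt_glue = delta_t(P_DISS, 20e-6, 0.7, A_DIE)
# Alumina below the cavity: about 310 um thick, 27 W/mK
dt_alumina = delta_t(P_DISS, 310e-6, 27.0, A_DIE)

print(f"glue:    {dt_glue:.1f} K")
print(f"alumina: {dt_alumina:.1f} K")
```

Because lateral heat spreading in the alumina is neglected, the alumina contribution is an upper bound under these assumptions.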

For proof of this concept, connectors were not used in this module. RF inputs and outputs could be wire bonded or, as in the case of this characterization,



Figure 6.6: Measurements of the packaged driver. a) S-parameters and comparison to on-wafer measurements. b) 80 GBd eye diagram of the packaged driver.

probed. The output is probed at the sub-assembly edge. The input is probed at the transmission lines to the right of the open area. Wedge-wedge (WW) bonds with doubled signal wires interface the die. DC ground bonds are also formed in the WW process. 1 nF bypass and decoupling capacitors are glued onto the RFIC assembly and connected with ball-wedge wires. This is done to form a straight wire from the chip upwards. The DC block design from Section 6.2 is connected to the output. Only one output drives a single-ended Mach-Zehnder modulator (MZM).

The measured S-parameters are presented in Figure 6.6a. The on-wafer measured response is also plotted. Apart from a dip at 35 GHz, the packaged driver compares very well to the unpackaged version. The additional loss of DC blocks, transmission lines, and wire bonds creates a greater difference with increasing frequency. 80 GBd operation with 1 $V_{pp}$ is presented in Figure 6.6b.

#### 6.4 Conclusion

Packaging solutions for prototyping have been investigated. This investigation focused on a broadband package including a DC block for a hybrid module of PAM-4 driver and MZM according to Figure 1.3. Bond wires are still the most significant bottleneck due to their lack of matching. Short bonds

and multiple bond wires in parallel, especially for the larger ground pads, improve matching and reduce insertion loss. Doubled signal bonds provide comparable performance to ribbon bonds, which were too wide for the pads used in this design. The footprint of a commercial DC blocking capacitor was optimized to fit the material combination of gold-plated alumina used here. A sub-assembly-based package of the PAM-4 driver and combiner using the investigated techniques showed 80 GBd operation with 1 V<sub>pp</sub>, proving this concept for fast prototyping packaging techniques. The package can be verified during all individual steps.

### 7 Conclusions and Outlook

This work investigates circuits for compact and efficient transceiver front ends supporting 100 GBd PAM-4 signaling, as well as supporting circuits focusing on wide bandwidths and compact designs.

Mach-Zehnder modulator drivers with integrated PAM-4 power combination are presented. They work on the principle of current combining in the load, adding two NRZ streams into one PAM-4 data stream. Two variants of a distributed combiner in IHP's SG13S technology are developed. The focus was on a wide operational bandwidth, for which methods of reducing the amplifier cell input capacitance are necessary. Emitter degeneration and a capacitive divider variant are tested for this purpose. Both variants exhibit bandwidths greater than 80 GHz and low group delay variations, while the variant with emitter degeneration exhibits a superior S-parameter response. Eye diagrams up to 70 GBd were measured, but the output amplitude and achievable data rate fell short of the expected results. With their single-ended design and need for bias-tees, the distributed drivers are complex to integrate into a system. Therefore, a different driver version based on a differential output stage was also evaluated. This driver was fabricated in IHP's SG13G2 technology. Since two differential data streams drive the output stage, a broadband balun was integrated to reduce the test and measurement effort. 100 GBd eye diagrams with $V_{\text{out,pp}} = 2 \text{ V}$ were measured to verify the driver's capability. This is the fastest PAM-4 combiner circuit reported in a silicon technology so far. With an efficiency of 1.58 pJ/b (315 mW total consumption), it is also very efficient in comparison to the state of the art. Despite the distributed drivers' reduced speed, all drivers designed in this work's scope can drive SOH modulators, additionally featuring an adjustable level spacing for pre-distortion of MZM transfer functions. The differential driver achieved the goal of 100 GBd PAM-4, making it the better choice for integration in a transmitter prototype.
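
For reference, the quoted efficiency follows from the total consumption and the bit rate, assuming two bits per PAM-4 symbol at 100 GBd (the symbols $E_{\text{bit}}$, $P_{\text{DC}}$, and $R_{\text{b}}$ are introduced here only for this check):

$$E_{\text{bit}} = \frac{P_{\text{DC}}}{R_{\text{b}}} = \frac{315\,\text{mW}}{100\,\text{GBd} \times 2\,\text{bit/symbol}} = \frac{315\,\text{mW}}{200\,\text{Gb/s}} \approx 1.58\,\text{pJ/b}.$$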

Packaging solutions for prototyping have been developed to complement the differential driver. This investigation focused on a broadband package including a DC block for a hybrid module of PAM-4 driver and MZM. Bond wires are still the most significant bottleneck due to their lack of impedance matching. Short bonds and multiple bond wires in parallel, especially for the larger ground pads, improve matching and reduce insertion loss. Doubled signal bonds provide comparable performance to ribbon bonds, which were too wide for the pads used in this design. The footprint of a commercial DC blocking capacitor was optimized to fit the material combination of gold-plated alumina used here. A sub-assembly-based package of the PAM-4 driver and combiner using the investigated techniques was developed. It is verified with 80 GBd and $1 V_{pp}$ output swing, proving this concept for fast prototyping packaging techniques. For further improvement of the package, molding of the RF bonds to reduce the impedance as well as revised dicing and cutting techniques are necessary. The latter reduce the gaps that need to be bridged by RF bond wires, further reducing frequency-dependent losses in the package.

As a receiver front end, a linear TIA is developed for the direct detection of 100 GBd PAM-4 signals with a single photodiode. It includes a single-ended transimpedance stage and differential voltage amplifiers with variable gain. A $72 \text{ dB} \Omega$ transimpedance with more than 67 GHz bandwidth and low group delay variation is reached. Area-consuming inductors for frequency response peaking are deliberately avoided to create a compact layout. Particular focus was put on saving power while keeping the amplifier linear, which is necessary for the targeted modulation format. An unconventional $I_{DC,ph}$ sink at the input helps to improve the maximum acceptable input power by 2 dB. This current sink acts as an RF shunt to ground, providing some input-power-dependent variable transimpedance. The necessary DC offset compensation loop controls the shunt and leverages the unwanted $I_{DC,ph}$ of the photodiode to the advantage of this circuit. Automatic and manual gain control, a tunable frequency response, and an output driver for $400 \text{ mV}_{pp,diff}$ complete the functionality for a receiver system while consuming a total of 193 mW. The noise performance leaves room for optimization compared to the state of the art, especially since it measured worse than designed. Nevertheless, this design shows the usability of the improved $I_{\rm DC}$ sink and the design strategy of not using area-consuming inductors.

Since coherent optical communications and monolithically integrated electronic-photonic circuits promise compact high-speed transceivers also usable for inter-data center communications, a fully differential TIA design for a coherent optical receiver in IHP's SG25H5 EPIC is investigated. It follows the same compact and inductorless design strategy as the 100 GBd TIA. The differential architecture has the advantage of an improved CMRR, and no wideband voltage regulators are needed. Moreover, a differential TIA has a better input-referred noise current than a design with a following single-ended (SE) to differential conversion. The transimpedance input stage is optimized for the integrated photodiodes at a bias voltage of -2 V and features dual $I_{DC}$ sink and offset control loops. A potential use in multi-channel IQ receivers guided some constraints regarding the floor plan, supply routing, and power consumption. With 176 mW per channel, the total consumption for, e.g., a four-lane IQ setup is 1.4 W and should be manageable with a sufficiently large heat sink. A single supply and integrated temperature-stabilized voltage and current references enable convenient operation. The design reaches more than 39 GHz OE bandwidth and an input-referred noise current of only $i_{n,RMS,i} = 1.7$ µA integrated over that bandwidth. Measurement verification of this design is pending since manufacturing of the prototypes still needs to be completed at the time of this writing.

A serializer and a broadband frequency doubler are designed as supporting circuits for the high-speed front ends. A procedure for designing area- and power-efficient high-speed serializers using the full extent of the BiCMOS technology's device portfolio is developed and verified during the design of a 10.4 Gb/s 8:1 serializer using IHP's SG13G2 technology. The serializer, including clock balun, uses a core area of $0.025 \text{ mm}^2$ and 54 mW, of which 68 % is used in the output driver for 50 $\Omega$ environments. The verification showed that a re-timing latch at the output and DC offset compensation would be a valuable addition for future projects. The additional latch, however, would require another clock divider stage and a higher input clock, increasing the area and power consumption. Offset compensation helps to center the eye around 0 V, potentially improving the driver's performance. The results prove the usability of the design methodology, which can be followed for an extension towards higher output data rates.

The design of a broadband frequency doubler based on the push-push topology with a method of generating a differential output using emitter and collector load resistances was investigated. A classic single-transistor active balun is integrated to drive the doubler and ensure a well-defined phase relation between the doubler's input signals. The doubler has a 3 dB-bandwidth of at least 75 GHz and a peak conversion gain of 1.7 dB while providing over 18.5 dB of fundamental rejection in the entire band. The balun and doubler again use a compact design without large passive structures, requiring only $0.035 \text{ mm}^2$ for the active circuit area. When using this doubler as a clock multiplier in data serializers, the low harmonic suppression, primarily the fundamental suppression, limits the achievable performance concerning clock jitter. Since this is one of many sources for jitter, a detailed analysis of the complete system would be necessary to conclude if this is acceptable in a data serialization system. However, with wideband synthesizers, the doubler enables flexible and ultra-wideband local oscillator generation.

### A Second Order Low-Pass Transfer Functions

The transfer function of commonly used shunt feedback TIAs follows a second-order low-pass response. Shaping this response is an essential part of the design. This chapter summarizes a few general properties of such transfer functions. The normalized transfer function follows the form:

$$H(s) = \frac{1}{\frac{1}{\omega_n^2} s^2 + \frac{1}{\omega_n Q} s + 1} \tag{A.1}$$

$$=\frac{1}{\frac{1}{\omega_n^2}s^2+\frac{2\zeta}{\omega_n}s+1},\tag{A.2}$$

where  $\omega_n$  denotes the natural frequency of the system. Q is the quality factor and  $\zeta$  the damping ratio. Their relationship is defined as

$$Q = \frac{1}{2\zeta} \quad \text{or} \quad \zeta = \frac{1}{2Q}. \tag{A.3}$$

The two poles are located at

$$s_{1,2} = -\zeta \omega_n \pm \omega_n \sqrt{\zeta^2 - 1}. \tag{A.4}$$

Looking at the damping ratio, we can observe four different behaviors:

- **undamped** $\zeta = 0$: The step response would oscillate with $\omega_n$. Complex pole pair on the imaginary axis.
- **underdamped** $0 < \zeta < 1$: The step response has an overshoot but decays exponentially. The poles are a complex pair with a negative real part.
- **critically damped** $\zeta = 1$: No overshoot in the step response. Two identical poles on the real axis.
- **overdamped** $1 < \zeta$: No overshoot in the step response. Two poles on the real axis move apart with an increasing damping ratio.

Figure A.1: Pole diagram for underdamped transfer functions. The poles move on the dashed half-circle with varying $\zeta \leq 1$.

Figure A.2: Normalized response for different damping ratios $\zeta$: a) magnitude and b) group delay.

An exemplary pole diagram for an underdamped $H(\omega)$ is given in Figure A.1. There are several noteworthy damping ratios. For example, the *Bessel* filter is designed to have a flat group delay response, which is desirable for broadband amplifiers to minimize jitter [Tho49]. This requires $\zeta = \sqrt{3}/2$. The *Butterworth* response yields a maximally flat $|H(\omega)|$ with $\zeta = 1/\sqrt{2}$. Figure A.2 shows magnitude and group delay for varying $\zeta$.
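
The pole locations of Equation (A.4) and the resulting 3 dB bandwidth can be explored numerically. The following is a minimal sketch with illustrative function names; the bandwidth expression is obtained by setting $|H(j\omega)|^2 = 1/2$ in Equation (A.2) and is not taken from this work.

```python
import cmath
import math


def poles(zeta: float, omega_n: float = 1.0):
    """Pole pair from Eq. (A.4); the complex square root covers zeta < 1."""
    root = cmath.sqrt(zeta * zeta - 1.0)
    return (-zeta * omega_n + omega_n * root,
            -zeta * omega_n - omega_n * root)


def bandwidth_3db(zeta: float, omega_n: float = 1.0) -> float:
    """Angular frequency at which |H(j*omega)|^2 drops to 1/2.

    With x = omega / omega_n: (1 - x^2)^2 + (2*zeta*x)^2 = 2.
    """
    a = 1.0 - 2.0 * zeta * zeta
    return omega_n * math.sqrt(a + math.sqrt(a * a + 1.0))


for name, zeta in (("Butterworth", 1.0 / math.sqrt(2.0)),
                   ("Bessel", math.sqrt(3.0) / 2.0),
                   ("critically damped", 1.0)):
    p1, _ = poles(zeta)
    print(f"{name:18s} zeta={zeta:.3f}  pole={p1:.3f}  "
          f"BW/omega_n={bandwidth_3db(zeta):.3f}")
```

For underdamped cases the poles have magnitude $\omega_n$, which corresponds to the half-circle in Figure A.1. For a fixed $\omega_n$, a larger damping ratio reduces the 3 dB bandwidth.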

### B Temperature and Supply Insensitive Biasing

#### B.1 Design of Bandgap References in SiGe

Since integrated circuits have to be functional in a wide range of temperatures and every device has a temperature-dependent behavior, a way of creating a temperature-insensitive reference is necessary. Using this circuit, the operating point of individual parts can be kept constant or even adjusted for temperature variation if the exact temperature is known. These so-called *bandgap references* work on the principle of combining two inversely temperature-dependent effects to cancel them [Wid71]. The forward voltage of a diode, or  $V_{BE}$  of a transistor, has a negative temperature coefficient, and the  $V_{BE}$  difference between two diodes biased at different current densities has a positive one. Combining and scaling these two can yield a voltage with only minimal temperature variation. This voltage is sometimes called the bandgap voltage because when extrapolated down to 0 K its value equals the bandgap energy divided by the carrier charge *q* [Raz16].

##### B.1.1 Design of a Voltage Reference in IHP's SG13G2 Technology

Since in this technology, we are not limited to grounded collector transistors as in standard CMOS technologies, a Brokaw type reference was selected [Bro74]. The final circuit is shown in Figure B.1. The reference voltage is set by



Figure B.1: Circuit of the voltage reference in IHP's SG13G2 technology. The bipolar transistor multipliers denote the number of single finger transistors with  $A_{\rm E} = 70$  nm × 900 nm.

$$V_{\text{ptat}} = 2V_{\text{T}} \frac{R_2}{R_1} \ln \left( \frac{A_{\text{E},\text{Q2}}}{A_{\text{E},\text{Q1}}} \right) \tag{B.1}$$

$$V_{\rm ref} = V_{\rm BE,Q1} + V_{\rm temp} \tag{B.2}$$

$$= V_{\rm BE,Q1} + 2V_{\rm T} \frac{R_2}{R_1} \ln\left(\frac{A_{\rm E,Q2}}{A_{\rm E,Q1}}\right),\tag{B.3}$$

while the currents through both transistors are the same: $I_1 = I_2$. When matching the resistors to the transistor areas, a nearly constant output voltage can be achieved. A current mirror (dashed blue) keeps the currents identical [GP01]. Since $V_{BE,Q1}$ drops with temperature, the current through both transistors has to rise to keep the output constant. This current is proportional to absolute temperature (PTAT) and is also mirrored to an $I_{ptat}$ output. At $R_1$ a PTAT voltage appears. Both $I_{ptat}$ and $V_{ptat}$ can be used for temperature measurement or compensation in other circuitry. $V_{ref}$ and $V_{ptat}$ must not be resistively loaded. Otherwise, the operating point would be disturbed. From a DC perspective, the circuit has two stable operating points: when all currents are zero



Figure B.2: a) reference voltage versus temperature and b) PTAT current vs. temperature for two measured samples, VBIC and HICUM simulation models.

and in the desired operating regime. To enforce the wanted second bias point, a startup circuit (orange box) is used. The diode Q3 is forward biased and conducting as long as $V_{\text{ref}}$ is below $\approx 0.86$ V, forcing $V_{\text{ref}}$, $I_1$ and $I_2$ up. During normal operation, the diode voltage is below $V_{\text{BE,on}}$ and does not affect the circuit. The reference voltage is filtered by $C_1$. This helps to improve the supply rejection and to short noise above $\approx 500$ MHz. The supply rejection is better than 34 dB and the integrated output noise from 1 kHz to 10 GHz is $130.5 \,\mu V_{RMS}$ at 50 °C. The reference uses an area of $50 \,\mu\text{m} \times 50 \,\mu\text{m}$ and consumes $90 \,\mu\text{A}$ from $3.3 \,\text{V}$. It was separately fabricated and tested. The die was glued on a PCB, and DC measurements were performed on a hotplate while monitoring the temperature with a fiber optic temperature sensor. This sensor measured the temperature on the PCB's surface, which closely matches the die's ambient temperature. As seen in Figure B.2a, the reference voltage is not constant with temperature and varies between the two samples. This behavior could not be reproduced by device process corners and is most likely a result of statistical mismatches. Nevertheless, it is close to the expected 1.032 V, and this mismatch most likely does not affect the performance of a receiver system too much. I<sub>ptat</sub> in Figure B.2b fits the simulation with VBIC models very closely. HICUM models do not provide a good match for the measured behavior. This difference in simulation results indicates the necessity for tuning structures when a higher precision reference is required. Tuning could be implemented by creating a switch bank to adjust $R_1$ and $R_2$.



Figure B.3: SG25H5 BGR: a) reference voltage versus temperature and b) supply rejection at 50  $^{\circ}\mathrm{C}.$ 

##### B.1.2 Design of a Voltage Reference for the IHP EPIC Process

Another reference was created in the EPIC process for the circuit in Chapter 4. The structure is similar to the reference in Figure B.1. Since it is a different technology, the device sizes change. Most notably, the transistor ratio is chosen to be 13:1 here, and the resistors are sized to $R_1 = 1074 \Omega$ and $R_2 = 2148 \Omega$. The reference voltage in IHP's SG25H5 process is then $V_{ref} = 1.085$ V. The maximum of the curvature was optimized to be at 50 °C, as can be seen in Figure B.3a. $I_{\text{ptat}}$ and $V_{\text{temp}}$ were not needed, and thus, those outputs were not implemented. This reference draws 176 µA from 3.3 V. The simulated RMS noise from 1 kHz to 10 GHz is 139 $\mu$V at 50 °C. A filter capacitor similar to $C_1$ filters noise above 1 GHz and helps with supply rejection. Between 10 MHz and 1 GHz, the supply rejection has a low spot (Figure B.3b) where the filter capacitance is not large enough and the performance of the current mirrors has already dropped. This reference occupies a layout area of 95 $\mu$m × 60 $\mu$m. Due to the experience with the reference in IHP's SG13G2 technology, an automatic SPDT switch was designed to have the SG25H5 EPIC TIA switch to an external reference if the voltage on a reference pin exceeds approximately two-thirds of the expected $V_{\rm ref}$ (see Appendix C.2).
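
To get a feeling for the sizing, the temperature-proportional term of Equation (B.1) can be evaluated for the stated ratio and resistor values. This is only a sketch under ideal-device assumptions: the implied $V_{\rm BE}$ is inferred by subtracting the term from the quoted reference voltage and is not a value reported in this work.

```python
import math

K_B = 1.380649e-23      # J/K, Boltzmann constant
Q_E = 1.602176634e-19   # C, elementary charge


def v_ptat(temp_k: float, r1: float, r2: float, area_ratio: float) -> float:
    """PTAT term of Eq. (B.1): 2 * V_T * (R2 / R1) * ln(A_E,Q2 / A_E,Q1)."""
    v_t = K_B * temp_k / Q_E   # thermal voltage
    return 2.0 * v_t * (r2 / r1) * math.log(area_ratio)


T_50C = 273.15 + 50.0
# Values from the text: 13:1 transistor ratio, R1 = 1074 ohm, R2 = 2148 ohm
vp = v_ptat(T_50C, 1074.0, 2148.0, 13.0)
# Inferred, assuming Eq. (B.3) holds exactly with V_ref = 1.085 V
v_be_implied = 1.085 - vp

print(f"PTAT term at 50 degC: {vp * 1e3:.1f} mV")
print(f"implied V_BE:         {v_be_implied * 1e3:.1f} mV")
```

The PTAT term scales linearly with absolute temperature, which is what compensates the falling $V_{\rm BE}$ around the 50 °C optimum.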



Figure B.4: Simple current mirror circuit. The source current can be adjusted by the reference and the transistor ratio *x*.

#### B.2 Current Mirrors

When moving towards an IC with a few or even just one supply voltage, many bias voltages are created on the chip from that supply. However, this supply can change during system operation, and the IC should keep its performance for a specified supply variation. This can only be realized by supply-independent biasing. The voltage reference in Appendix B.1 is one pillar of creating this independence. The second is biasing differential stages via current sources. During the first drafts of the design, an ideal current source is used for biasing. This ideal current source sets the tail current, and thus $V_{\rm BE}$ for the desired $I_{\rm c}$ is reached automatically. An ideal source has infinite $r_{\text{source}}$ and no $C_{\text{source}}$, resulting in a perfect common-mode rejection and symmetric operation with no variation with a supply voltage change. This, however, cannot be reached with real components. The closest we can get to a source with a high $r_{source}$ and low $C_{\text{source}}$ is a current mirror. Here, a current in a reference branch is used to create the correct $V_{\text{BE,source}}$ for a transistor used as a current source. The current mirror primarily used throughout this work is constructed as shown in Figure B.4. It is a two-transistor mirror with emitter degeneration and a filter capacitor on the base connection. A third transistor as a so-called $\beta$-helper is not used since the transistors in IHP's SG13G2 technology have a reasonably large current gain $\beta$ of around 650. Therefore, the base currents' influence on the mirror ratio is negligible. Cascode versions of this mirror would require a larger voltage headroom and are not considered, in order to save DC power. The voltage headroom for this mirror is $V_{\text{source}} = I_{\text{source}} R_{\text{E}} / x + V_{\text{CE,sat}}$, where the



Figure B.5:  $r_{\text{source}}$  and  $C_{\text{source}}$  when sweeping a)  $R_{\text{E}}$  with  $C_{\text{B}} = 300$  fF and b)  $C_{\text{B}}$  with  $R_{\text{E}} = 60 \Omega$ .

$I_{\rm c}(V_{\rm CE})$ curve has its knee at $V_{\rm CE} = 250$ mV. To make sure to stay in the safe operating region, $V_{\rm source}$ was kept between 0.7 V and 0.9 V. $R_{\rm ref}$ provides a small but simple form of protection against ESD or wrong bias settings. When also scaling $R_{\rm E}$ properly, the source current is defined by the transistor finger ratio *x*:

$$I_{\text{source}} = xI_{\text{ref}}. \tag{B.4}$$

Due to a limited Early voltage, the mirror ratio is also affected by $V_{CE,Q2}$. Therefore, the designs in this work try to keep $V_{CE,Q2}$ as close as possible to $V_{BE,Q1} = V_{BE,Q2}$. In addition to providing temperature stability, $R_E$ also increases the source impedance:

$$Z_{\text{source}} = r_{\text{source}} \parallel \frac{1}{j\omega C_{\text{source}}} \tag{B.5}$$

$$=\frac{r_{\text{source}}}{1+j\omega C_{\text{source}}r_{\text{source}}} \tag{B.6}$$

Figure B.5a shows the effect of $R_{\rm E}$ on $r_{\rm source}$ and $C_{\rm source}$ for $C_{\rm B} = 300$ fF. It can be seen that a larger $R_{\rm E}$ reduces the parasitic capacitance. Additionally, the output resistance increases at higher frequencies. In Figure B.5b the benefit of $C_{\rm B}$ with $R_{\rm E} = 60 \,\Omega$ is visualized. $C_{\rm B}$ increases the high-frequency $r_{\rm source}$. An AC voltage on the output terminal is coupled via $C_{\mu}$ and the wiring to the base, where it is in turn amplified by the source transistor. Similar to the Miller effect, this coupling capacitance will appear amplified at the output node. $C_{\rm B}$ shorts this feedback voltage on the base node, effectively reducing $C_{\rm source}$ to the collector-to-substrate capacitance. Nevertheless, due to this capacitance, the output impedance of these current sources drops with frequency (see Equation (B.6)). Unfortunately, when using these mirrors, the gain of a differential amplifier drops with temperature due to $g_{\rm m} \approx \frac{I_{\rm c}}{V_{\rm T}}$. This could be solved by using a proportional to absolute temperature (PTAT) reference current with the PTAT slope scaled correctly. This is why a PTAT current output is tested in the reference circuit in Appendix B.1. The mirrors used in IHP's SG25H5 EPIC process for the circuit in Chapter 4 follow the same design principles, but the SG25H5 HBTs have a reduced Early voltage and thus a worse output impedance.
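
The frequency dependence described by Equation (B.6) can be illustrated with a short calculation. The element values below are hypothetical placeholders chosen only for illustration; they are not extracted from Figure B.5 or from any design in this work.

```python
import math


def z_source(freq_hz: float, r_source: float, c_source: float) -> complex:
    """Eq. (B.6): r_source in parallel with C_source."""
    return r_source / (1.0 + 2j * math.pi * freq_hz * c_source * r_source)


R_SRC = 50e3     # ohm, hypothetical output resistance
C_SRC = 10e-15   # farad, hypothetical output capacitance

# Corner frequency above which the capacitance dominates |Z_source|
f_corner = 1.0 / (2.0 * math.pi * R_SRC * C_SRC)

for f in (1e6, f_corner, 10e9, 100e9):
    z = z_source(f, R_SRC, C_SRC)
    print(f"{f:10.3e} Hz   |Z_source| = {abs(z):10.1f} ohm")
```

Above the corner frequency the magnitude falls inversely with frequency, so at very high frequencies mainly $C_{\rm source}$ determines the source impedance.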

#### B.3 Reference Current Generator

The reference current for the current mirrors can be generated from a stable reference voltage. The receiver in Chapter 4 uses the current generator shown in Figure B.6. An operational amplifier controls  $V_{\rm G}$  and thus the current of the PMOS source transistor  $P_1$ . The loop settles when the voltage drop across  $R_2$  equals  $V_{\text{ref}}$ , so the value of  $R_2$  sets the reference current. The branches  $P_{2,n}$  mirror this current five times with a factor of two.  $R_1$  lowers  $V_{\text{DS},\text{P1}}$  to match the expected  $V_{\text{DS},\text{P2}}$ , improving the mirror ratio of  $P_1$  and  $P_{2,n}$ . In this implementation, the circuit generates five 1 mA output currents for use in current mirrors from  $V_{\text{ref}} = 1.085$  V.

Figure B.6: Reference current generator. For  $V_{ref} = 1.085$  V this circuit can generate five times 1 mA to be used in current mirrors.
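The sizing relation described for this generator can be sketched numerically. The example below assumes that the full current of  $P_1$  flows through  $R_2$  and that the mirror ratio is exactly two. The resulting value of  $R_2$  is an inference from the numbers quoted in the text, not a value stated in this work, and the function names are hypothetical.

```python
def reference_current(v_ref, r2):
    """Current through P1 when the loop forces V_ref across R2 (amperes)."""
    return v_ref / r2


def r2_for_output(v_ref, i_out, mirror_ratio):
    """R2 needed so that each mirrored branch delivers i_out (ohms)."""
    i_ref = i_out / mirror_ratio  # current in the reference branch P1
    return v_ref / i_ref


V_REF = 1.085      # volts, from the text
I_OUT = 1e-3       # amperes per output branch, from the text
MIRROR_RATIO = 2   # "factor of two", from the text

if __name__ == "__main__":
    r2 = r2_for_output(V_REF, I_OUT, MIRROR_RATIO)  # inferred, roughly 2.17 kOhm
    i_ref = reference_current(V_REF, r2)
    print(f"R2 = {r2:.0f} ohm, I_P1 = {i_ref * 1e3:.2f} mA")
```

The sketch ignores mismatch and the finite output resistance of the PMOS devices, which the matching of  $V_{\text{DS}}$  via  $R_1$  is meant to mitigate.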

### C Control Circuits used in This Work

### C.1 Operational Amplifiers

Figure C.1 shows the operational amplifiers used in Chapter 3. Figure C.1a presents the op-amp in the  $I_{\rm DC}$  control loop. The loop filter is integrated into this two-stage design, which occupies an area of 90  $\mu$m × 65  $\mu$m.

The AGC uses the op-amp depicted in Figure C.1b. Its loop filter is just an *RC* filter at the output. At  $100 \,\mu\text{m} \times 65 \,\mu\text{m}$ , its area is similar to that of the previous one.



MOS transistors are high-voltage variants and sizes are given in µm.

Figure C.1: Operational amplifiers used in Chapter 3: a)  $I_{\rm DC}$  control loop and b) AGC.



Figure C.2: Operational amplifier used in the coherent receiver (Chapter 4).

The control loops in the coherent receiver (Chapter 4) use the operational amplifier with a PMOS input stage shown in Figure C.2. Large feedback capacitors help to form a low-frequency pole in the loop gain. The 0.88 pF capacitor from the output to  $V_{in,n}$  also forms an integrator. Each op-amp occupies 55  $\mu$m × 90  $\mu$m, with the capacitors taking up most of that area.

### C.2 Switches

The switch that selects between manual and automatic gain modes in Chapter 3 is based on two transmission gates. Inverters buffer the control signal and generate its inverted counterpart, and parallel connections of NMOS and PMOS transistors act as the switch. Figure C.3 shows the schematic of this switch, which occupies  $50 \,\mu\text{m} \times 50 \,\mu\text{m}$ .

MOS transistors are high-voltage variants and sizes are given in µm.

Figure C.3: Gain control mode switch used in Chapter 3.

The receiver in Chapter 4 features an internal reference as shown in Appendix B.1.2. If that reference does not perform as expected or different bias settings are to be tested, an external voltage supplied through a dedicated pad can be used as a reference instead. The design uses a PMOS differential pair as a comparator, as shown in Figure C.4a. As soon as  $V_{\text{ref,ext}}$  rises above the threshold  $V_{\text{thresh}}$ , the control voltage  $V_{\text{switch}}$  rises to  $V_{\text{dd}}$ .  $V_{\text{thresh}}$  is set by the divider formed by  $R_1$  and  $R_2$ ; with the given values,  $V_{\text{thresh}}$  is 0.67 V. This value is selected to leave some margin for mismatch-induced offset voltages in the comparator. If the pad for  $V_{\text{ref,ext}}$  is not connected,  $R_3$  pulls  $V_{\text{ref,ext}}$  down.  $V_{\text{switch}}$  controls the transmission gate that selects which reference is supplied to the biasing circuit. This selection gate is shown in Figure C.4b. It uses the same structure as the switch in Figure C.3, but with slightly modified device sizes.
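The selection behavior described above can be summarized in a small behavioral model. This is only a sketch under stated assumptions: the comparator is treated as ideal, an unconnected pad is modeled as 0 V because of the pull-down  $R_3$ , a high  $V_{\text{switch}}$  is read as selecting the external reference, and the supply value of 3.3 V is a placeholder. All function names are hypothetical.

```python
V_THRESH = 0.67  # volts, threshold quoted in the text


def comparator(v_ref_ext, v_dd, v_thresh=V_THRESH):
    """Ideal comparator: V_switch is V_dd above the threshold, else 0 V."""
    return v_dd if v_ref_ext > v_thresh else 0.0


def select_reference(v_ref_int, v_ref_ext=None, v_dd=3.3):
    """Return the reference voltage passed on to the biasing circuit.

    v_ref_ext=None models an unconnected pad, pulled to 0 V by R3.
    v_dd=3.3 is an assumed supply value, not taken from the text.
    """
    pad = 0.0 if v_ref_ext is None else v_ref_ext
    v_switch = comparator(pad, v_dd)
    # Assumed polarity: V_switch high selects the external reference.
    return pad if v_switch > 0.0 else v_ref_int


if __name__ == "__main__":
    print(select_reference(1.085))        # pad open: internal reference
    print(select_reference(1.085, 1.0))   # external above threshold
    print(select_reference(1.085, 0.5))   # external below threshold
```

The model makes the role of the threshold visible: external voltages below 0.67 V cannot be selected, which is the price of the offset margin.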



Figure C.4: Selection switch logic for the reference voltage used in the coherent receiver presented in Chapter 4: a) comparator that switches the reference voltage depending on the value of  $V_{\text{ref,ext}}$  and b) transmission-gate-based switch acting on the output of the comparator.


### **Own Publications**

- [1] C. Bohn, M. Kaynak, T. Zwick, and A. Ç. Ulusoy, "A 100 GBd PAM-4 Combiner and Driver in SiGe BiCMOS," *IEEE Microwave and Wireless Technology Letters*, vol. 33, pp. 1337–1340, Sept. 2023.
- [2] C. Bohn and A. C. Ulusoy, "A High Bandwidth Energy Efficient Linear Transimpedance Amplifier for Short-Range 100 GBd PAM-4 Applications," in 2022 IEEE/MTT-S International Microwave Symposium - IMS 2022, (Denver, CO, USA), pp. 634–637, IEEE, June 2022.
- [3] C. Bohn, J. Hebeler, C. Koos, T. Zwick, and A. C. Ulusoy, "PAM-4 Driver Amplifier using Distributed Power Combining," in 2021 IEEE MTT-S International Microwave Symposium (IMS), (Atlanta, GA, USA), pp. 390–392, IEEE, June 2021.
- [4] C. Bohn, M. Kaynak, T. Zwick, and A. C. Ulusoy, "Ultra-Wideband Frequency Doubler with Differential Outputs in SiGe BiCMOS," in 2022 IEEE 22nd Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF), (Las Vegas, NV, USA), pp. 58–61, IEEE, Jan. 2022.
- [5] G. Gramlich, J. Hebeler, C. Bohn, U. Lemmer, and T. Zwick, "Aerosol Jet Printed Microstrip Lines on Polyimide for D-Band," in 2021 51st European Microwave Conference (EuMC), (London, United Kingdom), pp. 551–554, IEEE, Apr. 2022.
- [6] M. Kretschmann, C. Bohn, B. Nuss, A. Bhutani, A. Tessmann, A. Leuther, and T. Zwick, "THz Broadband Antenna on GaAs using Laser-structured Fused Silica Matching Layer," in 2022 52nd European Microwave Conference (EuMC), pp. 147–150, Sept. 2022.
- [7] K. Smirnova, C. Bohn, M. Kaynak, and A. Ç. Ulusoy, "Ultralow-Power W-Band Low-Noise Amplifier Design in 130-nm SiGe BiCMOS," *IEEE Microwave and Wireless Technology Letters*, vol. 33, pp. 1171–1174, Aug. 2023.
- [8] T.-C. Tsai, C. Bohn, and A. Ç. Ulusoy, "100-GBd Linear Optical Modulator Driver for Short-Reach Links in 130-nm SiGe:C BiCMOS," in 2023 18th European Microwave Integrated Circuits Conference (EuMIC), pp. 113–116, Sept. 2023.

- [9] T.-C. Tsai, C. Bohn, J. Hebeler, M. Kaynak, and A. C. Ulusoy, "A Linear and Efficient Power Amplifier Supporting Wideband 64-QAM for 5G Applications from 26 to 30 GHz in SiGe:C BiCMOS," in 2021 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), (Atlanta, GA, USA), pp. 127–130, IEEE, June 2021.
- [10] C. v. Vangerow, C. Bohn, H. Zwickel, C. Koos, and T. Zwick, "50GBit/s PAM-4 Driver Circuit Based on Variable Gain Distributed Power Combiner," in 2019 IEEE 19th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF), pp. 1–3, Jan. 2019.