# Non-Data-Aided Digital Feedforward Clock Recovery for Optical Communication Systems

#### DISSERTATION

accepted by the KIT Department of Electrical Engineering and Information Technology of the Karlsruhe Institute of Technology (KIT) for the academic degree of

# DOKTOR DER INGENIEURWISSENSCHAFTEN (Dr.-Ing.)

by

Patrick André Matalla, M.Sc.

Date of oral examination: 4 July 2025

First reviewer: Prof. Dr.-Ing. Sebastian Randel

Second reviewer: Prof. Dr.-Ing. Georg Rademacher

# **Table of Contents**

- Abstract (Kurzfassung)
- Preface
- 1 Introduction
- 2 Fundamentals of Digital Non-Data-Aided Clock Recovery
  - 2.1 Imperfection of Electrical Oscillators
    - 2.1.1 Jitter Specifications
  - 2.2 Clock Recovery Architectures
    - 2.2.1 All-Analog Clock Recovery
    - 2.2.2 Hybrid Analog-and-Digital Clock Recovery
    - 2.2.3 All-Digital Clock Recovery
  - 2.3 Digital Clock Recovery Components
    - 2.3.1 Timing Error Acquisition
    - 2.3.2 Interpolation
    - 2.3.3 Elastic Buffer
    - 2.3.4 Phase-Locked Loop
  - 2.4 Digital Clock Recovery Performance Benchmark
    - 2.4.1 Jitter Metric for Clock Recovery Algorithms
    - 2.4.2 Clock Recovery Performance Evaluation
    - 2.4.3 Impact of Clock Frequency Offset on Clock Recovery
- 3 Elastic Buffer Design for All-Digital Clock Recovery
  - 3.1 Introduction
  - 3.2 Elastic Buffer Concept
  - 3.3 Implementation
  - 3.4 Real-Time Experiment
- 4 Chromatic Dispersion Tolerant Clock Recovery for IM/DD …
  - 4.1 …
  - 4.2 …
  - 4.3 Chromatic Dispersion Tolerant Clock Recovery for …
  - 4.4 Chromatic Dispersion Tolerant Clock Recovery for Low-Roll-Off and Faster-Than-Nyquist Signals
  - 4.5 Experimental Validation
- 5 Nanosecond Clock Synchronization for Passive Optical Networks
  - 5.1 Introduction
  - 5.2 PONs Featuring Free-Running Oscillators with All-Digital Clock Recovery
  - 5.3 Experimental Performance Evaluation for PON Upstream
- 6 Non-Data-Aided Clock Recovery for Continuous-Variable Quantum Key Distribution
  - 6.1 Introduction
  - 6.2 Pilot-Free Digital Timing Synchronization
  - 6.3 Experimental Validation
- 7 Joint NDA Clock Recovery for SDM Optical Transmission Systems
  - 7.1 Introduction
  - 7.2 Non-Data-Aided Joint Clock Recovery
    - 7.2.1 MIMO Channel Model
    - 7.2.2 Joint Timing Estimation Algorithm
    - 7.2.3 Effect of Frequency-Dependent Group Delays
    - 7.2.4 Hardware Complexity Analysis
  - 7.3 Performance Simulation
  - 7.4 Experimental Validation
- 8 Summary and Outlook
- Appendices
  - A Discrete-Time Signals and Systems
    - A.1 Discrete-Time Signals
    - A.2 Discrete-Time Systems
      - A.2.1 Linear Time-Invariant Systems
      - A.2.2 Z-Transform
      - A.2.3 FIR Filters
  - B Multi-Rate System
    - B.1 Sampling Theorem
    - B.2 Sampling Rate Down- & Upconversion
  - C Joint NDA Clock Recovery for SDM Optical Transmission Systems
    - … Factorization
- Glossary
- Bibliography
- Acknowledgments (Danksagung)
- List of Publications
  - Journal Publications
  - Conference Publications
  - Preprint Publications

# Abstract (Kurzfassung)

We live in a connected society. We arrange to meet via messaging services, meetings take place across sites in the virtual world, dinner is ordered at the push of a button on a mobile phone, and the evening blockbuster is streamed, to name just a few everyday examples. The Internet connects people worldwide, and new technologies such as smart cities, the Internet of Things, communicating vehicle fleets, and the training of artificial intelligence (AI) models in the cloud have long been demanding ever higher data rates with exponential growth. This enormous exchange of data is made technically possible by optical fiber networks, which form the backbone of the telecommunications infrastructure. In such networks, (digital) data on a processor is first converted into an electrical signal and then modulated onto an optical carrier. The optical signal propagates through the fiber and benefits from low losses over a wide frequency range. At the receiver, a photodetector converts the optical signal back into an electrical signal before it is digitized and can be processed in a processor.

Synchronizing the receiver's processor clock with the transmitter clock is an essential prerequisite for reliable and error-free decoding of the received signal. While the clock can be transmitted over a separate data line for very short distances of less than about one kilometer, this is not practical for longer distances. In such cases, the clock information is extracted from the received signal and the receiver clock is synchronized accordingly. A common way to synchronize oscillators is a feedback control loop, in which an error signal is derived from the received signal and then steers the oscillator to the correct operating point. Such control loops enable frequency- and phase-stable synchronization of the receiver clock and are therefore used in optical point-to-point long-haul networks. However, loop-based synchronization methods have a limited bandwidth, so that high-frequency clock fluctuations (jitter) are insufficiently compensated and fast synchronization of data blocks in burst transmission becomes difficult. Feedforward clock synchronization is an alternative architecture. Here, a timing estimator directly estimates the timing offset between the transmitter and receiver clocks, so that the received signal can be corrected without a control loop. Because no feedback loop is required, this method is significantly better at compensating high-frequency jitter and at synchronizing burst signals. In addition, feedforward clock synchronization can be implemented fully digitally in modern CMOS circuits, which removes the need for analog oscillator-control circuitry and thus reduces transceiver complexity. In a fully digital synchronization, the oscillator phase is no longer corrected physically. Instead, the sampling points of the signal are delayed in a digital delay element consisting of a temporary buffer and a digital delay line. The challenge of such a digital realization is the loss of symbols when the buffer overflows, which currently still limits its use in practical systems.

This dissertation investigates the fully digital implementation of feedforward clock synchronization and its application in modern optical communication systems. To this end, a novel timing estimation algorithm based on the algorithm of Barton and Al-Jalili was first developed. In addition, a method was developed that avoids the loss of information on buffer overflow by overclocking the processor, thereby enabling the practical implementation of fully digital clock synchronization in modern transceivers. While digital clock synchronization in the signal processors of coherent systems has matured over the past two decades, modern optical communication systems give rise to new requirements and challenges. The subject of this dissertation is the investigation of feedforward clock synchronization in modern optical communication systems.

As data rates rise in short-reach links between and within data centers, driven by AI applications, chromatic dispersion becomes a limiting factor for the transmission performance of direct-detection systems. This thesis investigates the effect of chromatic dispersion on clock synchronization for the first time and presents two novel algorithms that are highly robust against chromatic dispersion.

The thesis then demonstrates nanosecond-scale synchronization of data blocks in burst transmission in passive optical networks. Fully digital feedforward clock synchronization can thus replace conventional analog clock and data recovery circuits and enables a more compact and more energy-efficient implementation.

In the field of tap-proof data transmission, continuous-variable quantum key distribution (CV-QKD) is a promising alternative to conventional discrete-variable quantum key distribution. In such systems, the power of the optical signals is on the order of the quantum noise, which makes clock synchronization difficult. To address this challenge, additional auxiliary signals such as pilot tones are usually employed to ensure clock synchronization. In this work, clock synchronization without pilot tones was investigated and demonstrated, which can reduce the complexity of future CV-QKD systems.

The continuously growing demand for ever higher data rates calls for the investigation of optical transmission systems with space-division multiplexing (SDM) for the parallel transmission of large data streams. In this context, systems with coupled signal paths, for example fibers with multiple waveguide modes or densely packed fiber cores, are currently being researched. For SDM systems with coupled channels, an algorithm is presented that is, for the first time, robust against spatial dispersion. This development is of considerable importance for the commercialization of SDM systems with coupled channels.

# **Preface**

We live in a connected world. We arrange to meet via messaging services, meetings take place across offices in the virtual world, dinner is ordered with a single tap on the smartphone, and the evening movie is streamed, to name just a few examples from everyday life. The Internet connects people across the globe, and new technologies such as smart cities, the Internet of Things, communicating vehicle fleets, and the training of artificial intelligence (AI) models in the cloud are driving exponentially growing demand for ever higher data rates. This enormous data traffic is made possible by optical fiber networks, which form the backbone of the telecommunications infrastructure. In such networks, (digital) data on a processor is first converted into an electrical signal and then modulated onto an optical carrier. The optical signal propagates through the optical fiber and benefits from low losses over a broad frequency range. At the receiver, a photodetector converts the optical signal back into an electrical signal before it is digitized and further processed in a processor.

Synchronizing the processor clock of the receiver with the transmitter clock is an essential prerequisite for reliable and error-free decoding of the received signal. While it is possible to transmit the clock over a dedicated communication channel for very short distances of less than about one kilometer, this is not practical for longer distances. In such cases, the clock information is extracted from the received signal and the receiver clock is synchronized accordingly. A common method for synchronizing oscillators is a feedback control loop. Here, an error signal is derived from the received signal, which then steers the oscillator to the correct operating point. Such control loops enable frequency- and phase-stable synchronization of the receiver clock and are thus widely used in optical point-to-point long-haul links. However, control-loop-based synchronization methods have a limited bandwidth, which results in insufficient compensation of high-frequency clock fluctuations (jitter) and makes rapid synchronization of data blocks in burst-mode transmission difficult. Feedforward clock synchronization is an alternative architecture. Here, a timing estimator directly estimates the sampling offset between the transmitter and receiver clocks, so that the received signal can be corrected without a control loop. Since no feedback control loop is required, this method is significantly better at compensating high-frequency jitter and at synchronizing burst-mode signals. In addition, feedforward clock synchronization can be implemented fully digitally in modern complementary metal-oxide semiconductor (CMOS) circuits, eliminating the need for analog circuits for oscillator control and thus reducing the transceiver complexity. In a fully digital synchronization, the oscillator phase is no longer corrected physically. Instead, the sampling points are delayed in a digital delay element, which consists of a buffer and a digital delay line. The challenge of such a digital realization is the loss of symbols when the buffer overflows, which currently still limits its application in practical systems.

This doctoral thesis investigates the all-digital implementation of feedforward clock synchronization and its application in modern optical communication systems. In this context, a novel timing estimation algorithm based on the algorithm of Barton and Al-Jalili was developed. In addition, a method was developed that avoids the loss of information when the buffer overflows by overclocking the processor and thus enables the practical implementation of fully digital clock synchronization in modern transceivers. While digital clock synchronization in the signal processors of coherent systems has become a mature technology over the past two decades, modern optical communication systems pose new requirements and challenges. The subject of this dissertation is the investigation of feedforward clock synchronization in modern optical communication systems.

As a result of increasing data rates in short-reach transmission between and within data centers, driven by AI applications, chromatic dispersion is a limiting factor in the transmission performance of direct-detection systems. This thesis studies the effect of chromatic dispersion on clock synchronization for the first time and presents two novel algorithms that are particularly robust against chromatic dispersion.

Furthermore, this thesis demonstrates the nanosecond-fast synchronization of data blocks in burst-mode transmission in passive optical networks. The fully-digital feedforward clock synchronization thus allows the substitution of conventional analog circuits for clock and data recovery and enables a compact and energy-efficient realization.

In the field of secure data transmission, continuous-variable quantum key distribution (CV-QKD) is a promising alternative to conventional discrete-variable quantum key distribution. In such systems, the power of the optical signals is on the order of the shot noise, which complicates clock synchronization. To overcome this challenge, additional auxiliary signals, e.g., pilot tones, are usually used to ensure proper clock synchronization. In this work, clock synchronization was investigated and demonstrated without the use of pilot tones, which can reduce the complexity of future CV-QKD systems.

The continuously increasing demand for ever higher data rates requires the investigation of optical transmission systems with space-division multiplexing (SDM) for the parallel transmission of large data streams. In this context, systems with coupled channels, such as optical fibers with multiple waveguide modes or densely packed fiber cores, are currently being explored. For SDM systems with coupled channels, an algorithm is presented that is, for the first time, robust to spatial dispersion. This development is of major importance for the commercialization of SDM systems with coupled channels.

# **Achievements of the Present Work**

In this thesis, the non-data-aided (NDA), digital feedforward (FF) clock recovery is investigated for its application in modern optical communication systems. For this purpose, the algorithms and synchronization architecture are first investigated through simulation and hardware implementation on a field-programmable gate array (FPGA). Due to the advantages of FF clock recovery in terms of high-frequency jitter compensation and fast timing acquisition, its application in different types of systems is investigated. A concise overview of the major achievements is given in the following list:

- **Comprehensive performance benchmark of feedback (FB) and FF clock recovery algorithms and architectures:** The performance of various timing estimators (TEs) (used for FF) and timing error detectors (TEDs) (used for FB) is studied in detail for different algorithm and system parameters. The results confirm the finding from [1] that the various algorithms perform identically due to mathematical equivalence. Furthermore, the bandwidth of FB and FF clock recovery architectures is discussed, which demonstrates the suitability of FF architectures for compensating high-frequency jitter and large clock frequency offsets (CFOs).

- **First demonstration of an all-digital clock recovery with a free-running receiver oscillator that allows both lower and higher frequencies than the transmitter clock frequency:** A major limitation of fully digital clock recovery is the elastic buffer (EB) overflow that occurs when the receiver clock frequency is lower than the transmitter clock frequency. Current literature simply avoids this problem by setting the receiver clock frequency slightly higher than the transmitter clock frequency [2]. In this thesis, an EB method is reported for the first time that allows the receiver clock frequency to be either lower or higher than the transmitter clock frequency. The novel EB method is implemented together with the remaining clock recovery components on an FPGA and demonstrated in an optical transmission experiment. This enables the shared use of a free-running clock without phase-locked loop (PLL) synchronization for the transmitter and receiver in modern transceivers.

- **First analysis of the impact of chromatic dispersion (CD) on digital clock recovery in intensity modulation and direct detection (IM/DD) systems and development of two novel CD-tolerant algorithms:** CD is increasingly becoming a limiting factor in high-baud-rate systems that use direct detection. In this thesis, the effect of CD on clock recovery in direct-detection systems is investigated analytically and in simulation for the first time. Afterwards, two novel CD-tolerant TEs are developed. The CD penalty is confirmed in experiments and the TEs are validated. The progress made in this area is an important contribution to future high-baud-rate direct-detection systems, which are strongly distorted by CD.

- **Comprehensive analysis of digital FF clock recovery in passive optical networks (PONs) enabling nanosecond-fast synchronization:** With the standardization of the 50-Gbit/s PON, bandwidth limitations, device nonlinearities, and CD make error-free decoding of the received signals increasingly difficult. To compensate for such channel effects, 50G-PONs combine an analog-to-digital converter (ADC) with receiver digital signal processing (DSP). This allows analog clock and data recovery (CDR) to be replaced by digital clock recovery and adaptive equalization. For this purpose, digital FF clock recovery is investigated in this thesis, as it allows fast synchronization of burst-mode signals and the use of low-cost oscillators in the optical network unit (ONU). Successful synchronization within 36.57 ns of two ONUs at a data rate of 112 Gbit/s in the upstream is demonstrated. This paves the way for fast-synchronizing FF clock recovery in future high-speed PONs with data rates beyond 100 Gbit/s.

- **Comprehensive analysis of NDA clock recovery in CV-QKD systems:** The challenge in coherent DSP for CV-QKD systems lies in the extremely low signal-to-noise ratio (SNR) of the received signal. This complicates the synchronization of the sampling clock phase and the local oscillator phase. For this reason, auxiliary signals, e.g., pilot tones, are often transmitted alongside the actual quantum signal. The generation, transmission, reception, and processing of such pilot tones increase the overall system complexity. In the course of this thesis, NDA clock recovery using the modified Barton & Al-Jalili TE is investigated. The presented results demonstrate successful clock recovery for long averaging lengths of the TE, which in turn reduce the clock recovery bandwidth. Provided that highly stable oscillators are used, this work shows that NDA clock recovery can replace conventional pilot-tone-based clock recovery, thereby reducing the system complexity.

- **First demonstration of a digital NDA joint clock recovery tolerant to polarization-and-spatial-mode dispersion:** Clock recovery failure in SDM systems with coupled channels is a well-known issue that has not yet been solved. For the first time, a joint clock recovery that is tolerant to polarization-and-spatial-mode dispersion is proposed. Successful clock synchronization for a 90-GBd 16-level quadrature amplitude modulation (16-QAM) signal, resulting in a total data rate of 2.92 Tbit/s over a 150-km randomly-coupled 4-core fiber (RC-4CF), has been experimentally demonstrated.
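To put the PON synchronization time quoted above into perspective, it can be expressed in bit periods. The following sketch only multiplies the two figures given in the list. Treating 112 Gbit/s as the line rate is an assumption, and the helper name `bit_periods_within` is purely illustrative.

```python
def bit_periods_within(duration_s, bit_rate_bps):
    """Number of bit periods that elapse during duration_s at bit_rate_bps."""
    return duration_s * bit_rate_bps


# Figures quoted in the list above: 36.57 ns synchronization time, 112 Gbit/s.
# Assumption: 112 Gbit/s is taken as the line rate.
sync_bits = bit_periods_within(36.57e-9, 112e9)
print(f"synchronization completes within about {sync_bits:.0f} bit periods")
```

Under this assumption the synchronization time corresponds to roughly 4096 bit periods.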

# 1 Introduction

Clock recovery is a fundamental building block in communication systems and refers to the process of synchronizing the sampling clock at the receiver, in both phase and frequency, with the clock used to generate the data at the transmitter. This compensates for the sampling offset between the transmitter and receiver clocks, so that the received waveform is sampled at the ideal sampling points that minimize intersymbol interference (ISI) and consequently bit errors. The clock recovery is typically located at the beginning of the receiver DSP chain. In coherent systems, the clock recovery is usually performed after CD compensation and before the adaptive equalizer and carrier recovery [3]. In IM/DD systems, the clock recovery is located directly at the beginning of the DSP chain (or after the digital resampling if needed) and before the adaptive equalizer. Because of this position at the front of the chain, a loss of synchronization affects most DSP modules and leads to a total failure of the transceiver, which requires a reacquisition of all control loops in the system including the NDA equalizer [4]. Therefore, the synchronization of the receiver clock must operate in a reliable and stable manner. Depending on the communication system, this results in different clock recovery requirements, e.g., with respect to the choice of the modulation format, SNR (see clock recovery for CV-QKD), jitter tracking and synchronization speed (see clock recovery for PON), as well as signal distortions caused by, e.g., CD (see CD-tolerant clock recovery), polarization-and-spatial-mode dispersion (see joint clock recovery for SDM), etc. Fig. 1.1 shows a simplified system overview of an optical communication system that incorporates a receiver DSP.
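To illustrate why the clocks must be synchronized in frequency and not only in phase, the sketch below converts a relative clock frequency offset into the number of symbols after which the sampling instant has drifted by one full symbol period. The 40 ppm value is an assumed worst case for two free-running oscillators that each stay within the $\pm 20$ ppm tolerance quoted for ZR/ZR+ transceivers in this chapter. The function names are illustrative and the relative offset uses a first-order approximation.

```python
def relative_offset_ppm(tx_ppm, rx_ppm):
    """Relative clock frequency offset in ppm (first-order approximation)."""
    return rx_ppm - tx_ppm


def symbols_per_slip(offset_ppm):
    """Symbols after which the sampling instant has drifted by one symbol."""
    if offset_ppm == 0:
        return float("inf")  # perfectly matched clocks never slip
    return 1.0 / (abs(offset_ppm) * 1e-6)


# Assumed worst case: transmitter at +20 ppm, receiver at -20 ppm.
worst_case = relative_offset_ppm(tx_ppm=20.0, rx_ppm=-20.0)
print(f"relative offset: {worst_case:+.0f} ppm")
print(f"one symbol of drift every {symbols_per_slip(worst_case):.0f} symbols")
```

Without frequency correction, the sampling phase therefore walks through the entire symbol period again and again, which is why a one-time phase alignment is not sufficient.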

Fig. 1.1: System overview of an optical communication system including the transmitter and receiver voltage-controlled oscillators (VCOs). The sampling clock is downmixed by a factor P for the signal processor, which processes the samples P-fold in parallel.

The clock recovery can be implemented either in a FB architecture based on a PLL, in a FF architecture, or in a combination of both approaches. Provided that relatively stable oscillators are used (e.g., $\pm 20$ parts-per-million (ppm) for ZR/ZR+ standardized transceivers [5, 6]) and data transmission is continuous, FB structures result in stable phase tracking and are therefore commonly used in optical communications, e.g., in long-haul point-to-point systems [7]. They are also frequently deployed in short-reach systems because of their low complexity [8, 9]. However, due to their lower clock recovery bandwidth compared to FF synchronization, FB loops suffer from a relatively long acquisition time and therefore might not meet the stringent requirements for fast synchronization in burst-switched systems, such as PONs [10, 11] and data centers [12], or in systems that are affected by link outages, e.g., free-space optical communications under atmospheric turbulence [13, 14] or optical camera communications [C6]. Furthermore, the bandwidth limitation results in worse compensation of high-frequency jitter. In scenarios where a fast and robust synchronization is required, FF schemes can be beneficial due to their instantaneous timing estimation and their improved high-frequency jitter performance, especially when using low-cost oscillators that feature wider linewidths and lower frequency adjustment accuracies [C2, 15]. On the basis of these advantages, the application of FF clock recovery in various areas of modern optical communication systems is studied in this dissertation. The different fields of application are briefly explained below. Fig. 1.2 gives an overview of the thesis outline with the clock recovery building blocks (*Chapter 2*) and the hardware implementation of all-digital FF clock recovery (*Chapter 3*) depicted as the foundation and the areas of application visualized as pillars.
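The idea of an instantaneous, loop-free timing estimate can be sketched with a classical square-law estimator of the Oerder and Meyr type. This is a stand-in for illustration and not the modified Barton & Al-Jalili estimator developed in this thesis. The pulse shape, the BPSK symbols, the four samples per symbol, and all function names are assumptions made only for this example.

```python
import cmath
import math
import random


def nyquist_pulse(t):
    """Time-limited pulse with g(0) = 1 and g(+-1) = 0; t in symbol periods.

    Illustrative choice for this sketch, not taken from the thesis.
    """
    return math.cos(0.5 * math.pi * t) ** 2 if abs(t) < 1.0 else 0.0


def received_samples(symbols, tau, sps=4):
    """Sample r(t) = sum_n a_n * g(t - n - tau) at t = k / sps.

    tau is the unknown timing offset in symbol periods, |tau| < 0.5.
    Two guard symbols on each side keep the indexing inside the list.
    """
    samples = []
    for k in range((len(symbols) - 4) * sps):
        t = 2.0 + k / sps
        n0 = math.floor(t - tau)
        # Only the pulses of symbols n0 and n0 + 1 overlap at time t.
        samples.append(sum(symbols[n] * nyquist_pulse(t - n - tau)
                           for n in (n0, n0 + 1)))
    return samples


def square_law_timing_estimate(samples, sps=4):
    """Feedforward estimate of tau from the symbol-rate tone in |r|^2."""
    tone = sum(abs(r) ** 2 * cmath.exp(-2j * math.pi * k / sps)
               for k, r in enumerate(samples))
    return -cmath.phase(tone) / (2.0 * math.pi)


rng = random.Random(1)  # fixed seed so the example is repeatable
symbols = [rng.choice((-1.0, 1.0)) for _ in range(516)]
for true_tau in (0.2, -0.3):
    estimate = square_law_timing_estimate(received_samples(symbols, true_tau))
    print(f"true offset {true_tau:+.2f} T, estimate {estimate:+.3f} T")
```

The estimate is obtained from a single block of samples without any loop state, which is the property that makes feedforward schemes attractive for burst-mode reception. The example is noise-free; with noise, the block length trades estimation accuracy against tracking bandwidth.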

### Short-reach optical links

Optical links with distances of less than a few tens of kilometers preferably use IM/DD, as this type of transmission requires less energy and is more cost-effective than coherent transmission. However, IM/DD transceivers also come with some disadvantages. For example, CD is a nonlinear channel effect with regard to the received optical power, and its impact scales quadratically with the symbol rate for a constant fiber length. With currently targeted symbol rates of 112 GBd or even 224 GBd in Ethernet links, CD thus becomes a limiting factor in signal quality. Current research primarily investigates the compensation of this nonlinear effect using sophisticated nonlinear equalizers and machine learning (ML) methods. However, the impact of CD on clock recovery in IM/DD systems has not yet been investigated. *Chapter 4* analyzes the effect on clock recovery for the first time and demonstrates two novel CD-tolerant clock recovery algorithms for IM/DD systems.
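The quadratic scaling with the symbol rate can be made plausible with a back-of-the-envelope estimate: the CD-induced temporal spread grows linearly with the signal bandwidth, while the symbol period shrinks linearly with it. The sketch below is a rough illustration under stated assumptions, not the analysis of Chapter 4. It assumes an optical bandwidth equal to the symbol rate, a dispersion parameter of 17 ps/(nm km), a wavelength of 1550 nm, and a 10-km fiber, none of which are taken from the thesis.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def cd_spread_in_symbols(symbol_rate_baud, length_km,
                         dispersion_ps_nm_km=17.0, wavelength_nm=1550.0):
    """Rough CD-induced temporal spread, normalized to the symbol period.

    Assumption: the optical signal bandwidth equals the symbol rate.
    """
    wavelength_m = wavelength_nm * 1e-9
    # Spectral width in nm that corresponds to the signal bandwidth.
    delta_lambda_nm = wavelength_m ** 2 * symbol_rate_baud / SPEED_OF_LIGHT * 1e9
    spread_ps = abs(dispersion_ps_nm_km) * length_km * delta_lambda_nm
    symbol_period_ps = 1e12 / symbol_rate_baud
    return spread_ps / symbol_period_ps


for rate in (112e9, 224e9):
    spread = cd_spread_in_symbols(rate, length_km=10.0)
    print(f"{rate / 1e9:.0f} GBd: spread of about {spread:.1f} symbol periods")
```

Doubling the symbol rate quadruples the normalized spread, which is the quadratic dependence stated above. The absolute numbers depend strongly on the assumed wavelength band and fiber type.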

### Passive optical networks (PONs)

PONs are point-to-multipoint optical networks and the preferred network architecture in optical access due to their low cost. They connect an optical line terminal (OLT) located in the central office via a single fiber to a purely passive splitter, from which the respective fibers lead to the individual ONUs. To reduce costs (especially on the part of the ONU), such networks also utilize IM/DD. Since the feeder fiber from the OLT to the passive splitter is shared by all network users, such a network requires time-allocated transmission slots for each ONU. For this reason, PONs employ time-division multiplexing (TDM) in the downstream and time-division multiple access (TDMA) in the upstream. This makes fast synchronization to the signal bursts in the upstream essential. Conventional PON systems use an analog PLL-based synchronization, also referred to as CDR (see section 2.2.1), which derives the timing information from a preamble. As a result of modern applications and services in a smart city, the number of network users will increase from currently 64 users to up to 256 users. In order to guarantee the same latency for 256 users as for 64 users, the burst duration must be shortened. Therefore, efficient preamble design is mandatory to ensure a high net data throughput at the same time [J1]. Chapter 5 analyzes the use of digital NDA FF clock recovery to ensure fast clock synchronization for future PONs without the need for a preamble.

### Continuous-variable quantum key distribution (CV-QKD)

Quantum key distribution (QKD) is a type of secure communication that exploits quantum mechanics to implement a secure cryptographic protocol. Here, conventional communication via the Internet is encrypted using a tap-proof quantum channel to exchange an encryption key between two parties. The concept of secure quantum communication is based on the no-cloning theorem from quantum physics, which states that it is impossible to copy an independent and identical quantum state. As a consequence, an eavesdropper cannot intercept, copy, and resend the transmitted quantum states to the intended recipient without changing their quantum states and thus revealing its presence. The first protocol of this kind, the BB84 protocol introduced by Bennett and Brassard in 1984, encodes the key information on discrete physical quantities, for example the polarization of a photon [16]. For this reason, this method is also classified as discrete-variable quantum key distribution (DV-QKD). A disadvantage of this method is the necessity of complex single-photon detectors, which often have to be cooled to cryogenic temperatures to minimize noise. This poses difficulties for the integration and scaling of the technology in existing telecommunication systems. An alternative method is CV-QKD, proposed by Grosshans and Grangier in 2002 [17]. Here, the key information is modulated on the amplitude and phase of a coherent light source, which are continuous physical quantities. These quantum states are generated with the optical power of the signal equal to or less than the quantum noise. This method is similar in many ways to modern coherent optical communications and allows the reuse of the technological advances through highly integrated, commercially available transceivers with sophisticated DSP. A major problem in the coherent DSP of CV-QKD systems is the clock and carrier phase synchronization at extremely low SNR of down to -20 dB and below.
Auxiliary signals, so-called pilot tones, are therefore often used to enable synchronization. These pilot tones increase the system complexity, which is why pilot tone-free CV-QKD systems are currently an active field of research [18]. Chapter 6 analyzes the requirements for NDA digital clock recovery at extremely low SNR.

### Space-division multiplexing (SDM) optical systems

To cope with the increasing demand for ever higher data rates in telecommunication networks, various multiplexing techniques are used. In conventional coherent fiber-optic systems the amplitude and phase of a coherent light source are modulated as well as both polarizations (polarization multiplexing). Furthermore, different data streams are modulated at a number of different wavelengths (wavelength-division multiplexing (WDM)) in order to exploit the low losses of a broad frequency range of the optical fiber efficiently. The spatial dimension is another physical dimension that can be used to further scale the total data rate. In recent years, SDM has been a field of research that has attracted particular interest. Here, several independent data streams are modulated onto several spatial paths of special optical fibers such as multi-mode fibers (MMFs) or multi-core fibers (MCFs). The coupling of spatially multiplexed signals and their propagation through the fiber with different group velocities is referred to as spatial-mode dispersion. Spatial-mode dispersion is a common issue in coupled SDM systems which leads to a failure of the clock recovery. In Chapter 7, a novel joint clock recovery algorithm that is tolerant to polarization-and-spatial-mode dispersion is presented.



Fig. 1.2: Overview of the thesis structure.

# 2 Fundamentals of Digital Non-Data-Aided Clock Recovery

This chapter summarizes the theoretical and technical background of digital clock recovery with a particular focus on the FF implementation. The necessity of clock recovery is the consequence of the frequency and phase instability of electrical oscillators. For this reason, the voltage-controlled oscillator (VCO) jitter and the jitter specifications in communication systems are explained in the first section. Afterwards, the different clock recovery architectures are discussed in the second section, and the necessary building blocks of digital FF and FB clock recovery are explained in the third section. In the last section, a performance benchmark between FB and FF clock recovery is performed to demonstrate the advantages of the FF implementation, which serves as a foundation for the following chapters.

## 2.1 Imperfection of Electrical Oscillators

In an asynchronous communication system, the transmitter and receiver feature their own clocks, which are not synchronized to each other via a physical connection. Even if identical oscillator types are used at the transmitter and receiver, these do not oscillate at the same frequency and phase after start-up and therefore lead to jitter in the communication system. While a stable on-board crystal quartz oscillator at a few GHz and a PLL can be used to stabilize the frequency of the VCO that is fed to the high-frequency signal converters and signal processor, a certain amount of phase noise remains, which is the main source of jitter in the communication system and is commonly referred to as *VCO jitter* [4].

The VCO can be described as a sinusoidal oscillation around the center frequency  $f_0$  and normalized amplitude, which is affected by amplitude noise  $a_n(t)$  and phase noise  $\varphi_n(t)$  [19, 20, J4, C7]. The VCO can hence be modeled as

$$V(t) = (1 + a_{\rm n}(t)) \sin(2\pi f_0 t + \varphi_{\rm n}(t)). \tag{2.1}$$

The amplitude noise  $a_{\rm n}$  can usually be neglected, since electrical oscillators utilize a control circuit for precise amplitude stabilization and therefore  $a_{\rm n}(t)\ll 1$  [21]. Phase noise mainly originates from thermal and flicker noise of the oscillator device [22, 23]. To understand the effect of phase noise on the VCO oscillation in a simplified example [4, 19, 20, 22, 23], the VCO's phase noise  $\varphi_{\rm n}(t)$  can be assumed to be a single sinusoidal phase modulation with amplitude  $a_{\rm mod}$  normalized to the VCO amplitude and modulation frequency  $f_{\rm mod}$  as

$$\varphi_{\rm n}(t) = a_{\rm mod} \sin\left(2\pi f_{\rm mod}t\right). \tag{2.2}$$

Hence, the VCO voltage can be described as

$$\begin{aligned}
V(t) &= \sin(2\pi f_0 t + \varphi_n(t)) \\
&= \sin(2\pi f_0 t)\cos(\varphi_n(t)) + \cos(2\pi f_0 t)\sin(\varphi_n(t)),
\end{aligned} \tag{2.3}$$

since  $\sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y)$ . Considering that the phase modulation amplitude is small, the cosine and sine involving the phase modulation can be simplified using  $\cos(\varphi_n(t)) \approx 1$  and  $\sin(\varphi_n(t)) \approx \varphi_n(t)$ , respectively. Afterwards, applying the trigonometric identities, the VCO voltage results in

$$\begin{aligned}
V(t) &\approx \sin(2\pi f_0 t) + a_{\text{mod}} \cos(2\pi f_0 t) \sin(2\pi f_{\text{mod}} t) \\
&\approx \sin(2\pi f_0 t) + \frac{a_{\text{mod}}}{2} \left[ \sin(2\pi (f_{\text{mod}} - f_0) t) + \sin(2\pi (f_{\text{mod}} + f_0) t) \right].
\end{aligned} \tag{2.4}$$

It can be seen that the phase noise modulation generates two new tones at a distance of  $\pm f_{\rm mod}$  from the oscillator frequency  $f_0$ . The carrier-to-modulation ratio  $\mathcal{L}(\Delta f)$ , i.e., the ratio of the side-tone power and carrier power at a certain frequency offset  $\Delta f$  from the carrier and within a bandwidth of 1 Hz, is specified in dBc/Hz [4]. In this example, the carrier-to-modulation ratio  $\mathcal{L}_{\mathrm{dB}}$  in decibels is

$$\mathcal{L}_{dB}(\Delta f = |f_{mod}|) = 10 \log_{10} \left( \frac{2 \left| \int_{t=0}^{1s} \sin^{2} \left( 2\pi (f_{0} - \Delta f) t \right) dt \right|^{2} + 2 \left| \int_{t=0}^{1s} \sin^{2} \left( 2\pi (f_{0} + \Delta f) t \right) dt \right|^{2}}{2 \left| \int_{t=0}^{1s} \sin^{2} \left( 2\pi f_{0} t \right) dt \right|^{2}} \right) = 10 \log_{10} \left( \frac{2 \left| \frac{a_{mod}}{2} \frac{1}{2} \right|^{2} + 2 \left| \frac{a_{mod}}{2} \frac{1}{2} \right|^{2}}{2 \left| \frac{1}{2} \right|^{2}} \right) = 10 \log_{10} \left( \frac{a_{mod}^{2}}{2} \right). \tag{2.5}$$
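The side tones of eq. (2.4) and the resulting ratio of eq. (2.5) can be checked numerically. The following is a minimal sketch under stated assumptions: the sampling rate, the tone frequencies, the modulation amplitude, and the helper `tone_amplitude` are illustrative choices (picked so that every tone falls on an exact DFT bin) and are not taken from this thesis.

```python
import cmath
import math


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of the sinusoid at freq_hz via a single-bin DFT (hypothetical helper)."""
    n = len(samples)
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, v in enumerate(samples))
    return 2.0 * abs(acc) / n


# Assumed toy parameters: all tones complete an integer number of cycles in N samples.
FS, N = 1000.0, 1000                  # sampling rate in Hz, number of samples
F0, F_MOD, A_MOD = 100.0, 10.0, 0.01  # carrier, modulation frequency, modulation amplitude

# Phase-modulated oscillator according to eqs. (2.1) and (2.2) with a_n(t) = 0.
v = [math.sin(2 * math.pi * F0 * k / FS
              + A_MOD * math.sin(2 * math.pi * F_MOD * k / FS))
     for k in range(N)]

carrier = tone_amplitude(v, F0, FS)
lower = tone_amplitude(v, F0 - F_MOD, FS)
upper = tone_amplitude(v, F0 + F_MOD, FS)

# Power of both side tones relative to the carrier power, compared with a_mod^2 / 2.
ratio_db = 10.0 * math.log10((lower ** 2 + upper ** 2) / carrier ** 2)
expected_db = 10.0 * math.log10(A_MOD ** 2 / 2.0)
print(f"side tones: {lower:.6f}, {upper:.6f} (a_mod/2 = {A_MOD / 2:.6f})")
print(f"ratio: {ratio_db:.3f} dB, eq. (2.5): {expected_db:.3f} dB")
```

For a small modulation amplitude, both side tones come out close to $a_{\rm mod}/2$, and the power ratio agrees with $10\log_{10}(a_{\rm mod}^2/2)$ to within the accuracy of the small-angle approximation.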

Rearranging eq. (2.5), the root-mean-square (RMS) phase modulation amplitude (or VCO jitter J) in unit radians RMS can be expressed as

$$J(\Delta f) = \sqrt{10^{\frac{\mathcal{L}_{dB}(\Delta f)}{10}}} \tag{2.6}$$

and the absolute jitter in picoseconds RMS follows by normalizing to  $2\pi f_0$  as

$$J(\Delta f) = \frac{1}{2\pi f_0} \sqrt{10^{\frac{\mathcal{L}_{dB}(\Delta f)}{10}}} \times 10^{12}. \tag{2.7}$$
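As a short illustration of eqs. (2.6) and (2.7), the conversion from a carrier-to-modulation ratio to RMS jitter can be written as two small functions. The function names and the example values ($-90$ dBc/Hz at a 50-GHz carrier) are assumptions made for this sketch only.

```python
import math


def jitter_rad_rms(l_db):
    """RMS phase jitter in radians from a ratio in dBc/Hz, following eq. (2.6)."""
    return math.sqrt(10.0 ** (l_db / 10.0))


def jitter_ps_rms(l_db, f0_hz):
    """Absolute RMS jitter in picoseconds, following eq. (2.7)."""
    return jitter_rad_rms(l_db) / (2.0 * math.pi * f0_hz) * 1e12


# Assumed example values, not a measurement from the thesis.
L_DB = -90.0   # carrier-to-modulation ratio in dBc/Hz
F0 = 50e9      # oscillator frequency in Hz

j_rad = jitter_rad_rms(L_DB)
j_ps = jitter_ps_rms(L_DB, F0)
print(f"{j_rad:.3e} rad RMS, {j_ps:.3e} ps RMS")
```

Note that this evaluates a single 1-Hz slice of the spectrum; the jitter over a frequency range requires integrating the ratio over that range.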

In practical systems, phase noise is not a single-frequency modulation but rather involves a modulation of many frequencies around the carrier, leading to a general broadening of the oscillator's spectral line. Within the linewidth of the oscillator carrier  $f_{\rm 3dB}$ , which is defined as the full-width at half-maximum power, the spectrum of the real-world oscillator decreases by  $\sim 1/f^3$  ( $-30~{\rm dBc/Hz}$  per decade). Beyond the linewidth, the carrier spectrum decreases by  $\sim 1/f^2$  ( $-20~{\rm dBc/Hz}$  per decade) until the spectrum finally disappears below the noise floor [19, 20, J4, C7, 22, 23]. The  $1/f^2$  region is typically modeled by a Lorentzian function, which is defined as

$$\mathcal{L}_{\text{Lorentzian}}(\Delta f) = \frac{\left(\frac{f_{\text{3dB}}}{2}\right)^2}{\Delta f^2 + \left(\frac{f_{\text{3dB}}}{2}\right)^2}. \tag{2.8}$$
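Two properties of eq. (2.8) are easy to verify numerically: the function drops to half its maximum at an offset of $f_{\rm 3dB}/2$ (so the full width at half maximum equals $f_{\rm 3dB}$), and it falls by about 20 dB per decade far beyond the linewidth. The sketch below assumes a 200-MHz linewidth; the function name `lorentzian` is a hypothetical helper.

```python
import math


def lorentzian(delta_f_hz, f_3db_hz):
    """Normalized Lorentzian line shape of eq. (2.8)."""
    half = f_3db_hz / 2.0
    return half ** 2 / (delta_f_hz ** 2 + half ** 2)


F_3DB = 200e6  # assumed linewidth in Hz

half_max = lorentzian(F_3DB / 2.0, F_3DB)   # value at half the full width
slope_db_per_decade = 10.0 * math.log10(
    lorentzian(1000 * F_3DB, F_3DB) / lorentzian(100 * F_3DB, F_3DB))
print(f"value at f_3dB/2: {half_max:.3f}, far-out slope: {slope_db_per_decade:.2f} dB/decade")
```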

Fig. 2.1 visualizes the single-sideband carrier-to-modulation ratio of an oscillator containing the  $1/f^3$  (red curve) and  $1/f^2$  (orange curve) regions as well as the Lorentzian function (dashed blue curve), which overlaps well with the  $1/f^2$  curve. The highpass behavior of a clock recovery with a 3-dB bandwidth of about 1 MHz is sketched in green. As a result of the highpass behavior, high-frequency jitter of the oscillator is not compensated. For this reason, a high clock recovery bandwidth is preferable, which, however, entails increased self-noise (also referred to as detector jitter) of the clock recovery (see section 2.4) [4].



Fig. 2.1: VCO phase noise spectrum which decreases with  $1/f^3$  and  $1/f^2$  for offset frequencies smaller and larger than the linewidth, respectively. The Lorentzian function is simulated for a 200-MHz linewidth and approaches the  $1/f^2$  decay for frequencies larger than the linewidth. The clock recovery highpass response is transparent for high-frequency jitter.

To calculate the total jitter within a frequency range between  $f_1$  and  $f_2$ , eq. (2.7) is used with the carrier-to-modulation ratio integrated over the frequency range [4, 6], i.e.,

$$J_{f_1, f_2}(\Delta f) = \frac{1}{2\pi f_0} \sqrt{\int_{\Delta f = f_1}^{f_2} 10^{\frac{\mathcal{L}_{dB}(\Delta f)}{10}} d\Delta f} \times 10^{12}. \tag{2.9}$$

As an example using Fig. 2.1, the total jitter from  $10\,\text{MHz}$  to  $100\,\text{MHz}$  of a 50-GHz oscillator with noise floor at  $-90\,\text{dBc/Hz}$  will result in  $0.96\,\text{ps}$  jitter. In relation to the oscillator period of  $20\,\text{ps}$ , the jitter already accounts for 5% of the oscillator period.
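The order of magnitude of this example can be reproduced with a simple numerical integration of eq. (2.9). The sketch assumes that the carrier-to-modulation ratio is flat at the $-90$ dBc/Hz noise floor over the whole range from 10 MHz to 100 MHz; this flat profile is an assumption made here, since the exact curve of Fig. 2.1 is not available in numeric form. The function name `integrated_jitter_ps` is hypothetical.

```python
import math


def integrated_jitter_ps(l_db_of_f, f1_hz, f2_hz, f0_hz, steps=10_000):
    """RMS jitter in ps over [f1, f2] following eq. (2.9), using the trapezoidal rule."""
    df = (f2_hz - f1_hz) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 10.0 ** (l_db_of_f(f1_hz + i * df) / 10.0)
    return math.sqrt(total * df) / (2.0 * math.pi * f0_hz) * 1e12


F0 = 50e9  # 50-GHz oscillator
# Assumption: flat -90 dBc/Hz between 10 MHz and 100 MHz.
jitter_ps = integrated_jitter_ps(lambda f: -90.0, 10e6, 100e6, F0)
period_ps = 1e12 / F0
print(f"jitter: {jitter_ps:.3f} ps RMS, {100 * jitter_ps / period_ps:.1f} % of the period")
```

Under this flat-spectrum assumption the result is roughly 0.95 ps, i.e., slightly below 5% of the 20-ps period and close to the value quoted above.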

## 2.1.1 Jitter Specifications

As shown in Fig. 2.1, clock recovery compensates only for a part of the VCO jitter. Furthermore, depending on the clock recovery design, it also adds self-noise jitter. If the recovered clock is used to transmit another signal, jitter can build up in the network. For this reason, the recovered receiver clock must meet stringent specifications after clock recovery, which typically include low jitter, in order to be used in a commercial system. These specifications are defined in telecommunication standards, e.g., the Synchronous Digital Hierarchy (SDH) [24, 25], the Optical Transport Network (OTN) [26], or Ethernet (IEEE 802.3). In such standards, a distinction is made between different sources and system levels of jitter generation and transfer, which are explained below.

### Jitter generation

Jitter generation denotes the jitter that is generated at a system output while no jitter is present at the system input.

### Jitter transfer

The jitter transfer quantifies the extent to which input jitter is propagated through a system to its output. Here, the input jitter represents the cumulative jitter passed on from all preceding stages. If the jitter transfer of a system is too high, the output jitter can accumulate across multiple systems, potentially exceeding the jitter tolerance and resulting in transmission errors.

### Output jitter

Output jitter comprises both the internally generated jitter (jitter generation) and the jitter propagated through the system (jitter transfer). Accordingly, the resulting jitter spectrum must comply with the requirements defined by the relevant communication standard. For instance, Fig. 2.2 illustrates the phase noise mask specified in the Ethernet-based OpenZR+ Multi-Source Agreement, commonly applied in point-to-point coherent transmission systems spanning distances greater than 500 km [6]. The output jitter must remain below this defined phase noise mask.

### Jitter tolerance

Jitter tolerance refers to a system's ability to withstand jitter present at its input without compromising performance. It is typically evaluated by superimposing a sinusoidal jitter onto the incoming communication signal. Although real-world jitter is not purely sinusoidal, this approach enables a systematic sweep across different jitter frequencies to verify compliance with system specifications. In the context of this thesis, a CFO between transmitter and receiver is introduced to assess system performance across various jitter frequencies, under the assumption that the measurement equipment's intrinsic jitter is negligible.



Fig. 2.2: Transmitter clock phase noise mask for an oscillator center-frequency of 469.83 MHz as specified in the OpenZR+ multi-source agreement [6].

## 2.2 Clock Recovery Architectures

Clock recovery schemes can be classified according to their circuit architecture, each of which has its own advantages and disadvantages. The synchronization of the receive clock can be accomplished in an *analog*, *digital*, or *hybrid analog-and-digital* circuit architecture, whereby each of these implementation types can in turn have a feedback (FB) structure, a feedforward (FF) structure, or a combination of both structures. This section gives an overview of the different architectures and explains their advantages and disadvantages. This leads from the analog and hybrid architectures to an all-digital implementation, which was examined for modern optical communication systems in the course of this thesis.

## 2.2.1 All-Analog Clock Recovery

Fully analog clock recovery architectures, also referred to as CDR, are the preferred synchronization circuits in short-distance optical transmission links, where low transceiver costs and low power consumption require simple receiver circuits without a receiver DSP. In general, this is the case for IM/DD systems [27], e.g., in intra-datacenter (DC) connections or PONs. CDR circuits can be realized in various ways (an overview is given in [28–30]), of which analog PLL-based techniques with a charge pump (CP), as depicted in Fig. 2.3, are commonly used in multi-gigabit fiber-optic links. Here, the optical signal is first converted into an electrical signal using a photodiode and, if necessary, amplified using a radio frequency (RF) amplifier, usually a transimpedance amplifier (TIA). Optionally, an analog equalizer circuit can follow. Next is the CDR circuit. First, an error signal is generated in a phase detector (PD), which is proportional to the sampling error between the received analog signal and the recovered clock. This error signal then tunes the electrical current in a CP, which is lowpass filtered in a loop filter (LF), and finally adjusts the voltage at the VCO such that the sampling error is reduced. The recovered and phase-locked clock is then used to accurately sample the received signal in the PD [28–30]. The sampling is often accomplished utilizing a D-flip-flop, which also serves as a decision circuit [29]. The retimed signal is thus available in a digital format and can be forwarded to a deserializer for further processing. Note that with a PLL-based analog CDR, no low-GHz reference clock is needed to stabilize the VCO in the start phase, since the PLL can inherently lock the clock to the receive signal in frequency and phase [29, 30].

Drawbacks of analog PLL-based CDR are the relatively large footprint, as large LF capacitors are utilized, and the low power efficiency in contrast to a digital CMOS-based implementation [29, 30]. Furthermore, analog CDR is difficult to implement in a deep-submicron technology while still providing the necessary performance required for high-speed communications [4, 30]. For these reasons, digital CDR for short-reach optical links currently represents an active field of research [11, 30].



Fig. 2.3: Clock recovery architecture of a charge pump PLL-based CDR circuit.

## 2.2.2 Hybrid Analog-and-Digital Clock Recovery

With increasing data rates, transmission distances, and higher-order modulation formats, distortions of the signal also become more pronounced, e.g., caused by CD, which makes reliable convergence of CDR circuits impossible. Modern receivers designed for high data rates and longer reach, for example for datacenter interconnects (DCIs), ultra-high-speed PONs, or coherent long-reach transmission, employ one or multiple ADCs in conjunction with a receiver DSP to compensate for the channel effects. The use of a DSP hence also allows parts of the PLL-based clock recovery to be implemented in the digital domain and thus to benefit from a space- and energy-efficient implementation [4].



Fig. 2.4: Analog-and-digital clock recovery architecture.

Fig. 2.4 shows the receiver clock architecture for an IM/DD system with a single ADC and an analog-and-digital clock recovery scheme. In the case of coherent detection, multiple ADCs are driven by the clock and the digitized signals all enter the receiver DSP [4]. The VCO is typically operated at a very high frequency that is on the order of the sampling rate of the signal converters. For a sampling rate of 64 GSa/s, the VCO oscillates for example at 16 GHz [4]. Since, due to temperature and CMOS process variations, the high-GHz VCO starts with a large frequency offset in comparison to the desired oscillation frequency, which can be up to  $\pm 10\%$ , a very stable external low-GHz (e.g., 2 GHz) reference crystal oscillator is used [4]. The high-GHz clock is down-mixed by a factor  $P_{\rm ref}$  to the approximate frequency of the stable crystal oscillator. Afterwards, the frequency and phase of the VCO are stabilized to the 2 GHz reference in a PLL consisting of a PD and LF. The VCO is then stable to within tens of ppm and facilitates the subsequent fine-tuning to the transmitter clock by means of an analog-and-digital FB clock recovery.

The received signal is sampled and quantized after the analog front-end in an ADC, which is timed by the VCO. As the digital circuits in the receiver DSP can only operate at a processing clock of 500 MHz to 1 GHz, the ADC parallelizes the output samples. With a sampling rate of 64 GSa/s and a DSP clock of 500 MHz, this corresponds to a parallelization of 128 samples per clock cycle. The digitized and parallelized sequence of samples then enters the receiver DSP, in which the samples can be used to determine a timing error proportional to the sampling offset in a TED. Afterwards, the error signal is filtered in a digital LF and converted into a voltage utilizing a slow digital-to-analog converter (DAC), which in turn drives the VCO to converge to the transmitter clock. Since the temporal dynamics of the clock drift are slow compared with the sampling rate of the signal converter, the low-speed DAC can be operated at a lower sampling rate, roughly on the order of the DSP clock, which simplifies the application-specific integrated circuit (ASIC) design and reduces costs. Note that other algorithms can also precede the clock recovery in the DSP chain or can be implemented in a nested clock recovery and equalization structure in order to increase the tolerance of the TED to certain channel effects [15, 31–35].

## 2.2.3 All-Digital Clock Recovery

The use of a receiver DSP raises the question of whether the entire clock recovery can be implemented digitally, possibly with the VCO operated in a free-running mode apart from the reference clock locking. Such an implementation eliminates the need for one of the two VCO tuning ports and a low-speed DAC including its power supply, which is expected to result in further savings in energy consumption and reduced space on the chip. In addition to a fully digital PLL-based FB architecture, the all-digital implementation also offers the option of an FF clock recovery architecture. Both architectures are illustrated in Fig. 2.5.

Fig. 2.5: All-digital clock recovery architectures: (a) digital feedback (FB) clock recovery and (b) digital feedforward (FF) clock recovery.

An FB clock recovery utilizing a digital PLL is shown in Fig. 2.5(a). Here, the timing correction is accomplished by an EB and an interpolator, which correct the integer sampling delay $m$ and the fractional sampling delay  $\mu$ , respectively. The EB is a buffer that compensates for a CFO between the transmitter and receiver clock or absorbs large ranges of clock phase drift, since the receiver VCO is not physically adapted to the transmitter VCO. The fractional and integer sampling offsets are provided by a numerically controlled oscillator (NCO) instead of a VCO. The retimed signal then continues to the following DSP blocks and also enters the FB loop for the TED and LF, which are identical to those of the analog-and-digital clock recovery. The LF, usually a proportional-integral (PI) filter, and the NCO form a second-order PLL, which recursively derives a timing estimate.
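The recursion formed by a PI loop filter and an NCO can be illustrated with a generic textbook-style sketch. Everything below is an assumption made for illustration: the TED is idealized as a noise-free, linear error $\theta_{\rm in} - \theta_{\rm NCO}$, the loop has no processing latency, and the gains `kp` and `ki` are arbitrary stable values rather than coefficients used in this thesis.

```python
import math


def run_pi_nco_loop(delta_per_step, kp=0.1, ki=0.005, steps=2000):
    """Track a linearly drifting timing offset with a PI filter and an NCO.

    Returns the last tracking error and the (m, mu) split of the final NCO phase.
    """
    theta_in = 0.0    # true timing offset in samples, drifting due to a clock frequency offset
    theta_nco = 0.0   # NCO phase, i.e., the current timing estimate in samples
    integ = 0.0       # integrator state of the PI filter
    err = 0.0
    for _ in range(steps):
        err = theta_in - theta_nco       # idealized TED output (assumption)
        integ += ki * err                # integral path accumulates the frequency offset
        theta_nco += kp * err + integ    # NCO accumulates the filtered error
        theta_in += delta_per_step       # offset drifts by a constant amount per step
    m = math.floor(theta_nco)            # integer part, handled by the elastic buffer
    mu = theta_nco - m                   # fractional part, handled by the interpolator
    return err, m, mu


err, m, mu = run_pi_nco_loop(delta_per_step=1e-3)
print(f"residual error: {err:.2e}, m = {m}, mu = {mu:.4f}")
```

Because the integral path stores the frequency offset, the second-order loop follows a constant drift with vanishing steady-state error, but only after a transient whose duration depends on the loop gains. This transient is the acquisition time referred to in the comparison with FF schemes.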

In FF schemes, the signal is split into two paths, as depicted in Fig. 2.5(b). In the first path, a TE directly estimates the sampling phase offset from the signal. Afterwards, the estimated phase is unwrapped at phase jumps of  $2\pi$ . To apply this sampling phase to the associated samples, the signal is delayed in the second path in a buffer and is then corrected in the EB and interpolator. The advantage of FF architectures lies in the immediate timing estimation, which makes a PLL obsolete, thus saving the acquisition time of the feedback control and enabling a higher clock recovery bandwidth. Furthermore, there is no need for a complex PLL design. Due to the higher bandwidth of the FF clock recovery, high-frequency jitter can be compensated more efficiently, which increases the jitter tolerance of the transceiver and reduces the jitter transfer in a concatenated network [15]. As a drawback of FF architectures, the hardware complexity of the TEs is often mentioned [31]. This statement usually refers to the most popular TE proposed by Oerder and Meyr [36], which requires an oversampling ratio greater than two and for practical reasons is usually implemented at fourfold oversampling, while typical TEDs are often implemented at only twofold oversampling. However, as shown in section 2.3.1, there also exist TEs that require the same oversampling as TEDs. A combination of a slow analog-and-digital FB adjustment to physically align the VCO and a digital FF clock recovery to compensate for residual high-frequency jitter is demonstrated in [15].
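The unwrapping step of the FF path can be sketched as follows. This is a minimal illustration, assuming that the estimator delivers phases wrapped to $(-\pi, \pi]$ and that the true phase changes by less than $\pi$ between consecutive estimates; the function name `unwrap` and the test signal are hypothetical.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps so that consecutive values differ by at most pi in magnitude."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        if step > math.pi:        # wrapped downwards: subtract one full turn
            offset -= 2.0 * math.pi
        elif step < -math.pi:     # wrapped upwards: add one full turn
            offset += 2.0 * math.pi
        out.append(cur + offset)
    return out


# Assumed toy input: a steadily drifting sampling phase, as caused by a clock frequency offset.
true_phase = [0.2 * k for k in range(100)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap(wrapped)
max_dev = max(abs(r - t) for r, t in zip(recovered, true_phase))
print(f"maximum deviation after unwrapping: {max_dev:.2e}")
```

The unwrapped phase grows without bound under a clock frequency offset, which is why the subsequent correction is split into an integer part for the EB and a fractional part for the interpolator.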

## 2.3 Digital Clock Recovery Components

This section explains the respective functional blocks of the digital clock recovery in more detail. To this end, section 2.3.1 first explains the key elements of clock recovery, namely the TED and the TE. The elements for timing correction, i.e., the interpolator and the EB, are thereafter explained in sections 2.3.2 and 2.3.3. Finally, a brief discussion of the LF design and latencies in FB paths as well as the resulting choice of PI coefficients follows in section 2.3.4.

## 2.3.1 Timing Error Acquisition

This subsection explains the underlying mathematical concepts of digital and NDA timing error acquisition using a TED and a TE. Afterwards, the modified version of the algorithm by Barton and Al-Jalili is explained – an algorithm which was developed as part of the thesis. Finally, the last subchapter gives an overview of commonly used TEDs and TEs.

#### 2.3.1.1 Discrete-Time Random Sequence

At the transmitter, a sequence  $a_m$  with  $m \in \mathbb{Z}$ , comprising data symbols from a predefined alphabet  $a_m \in \mathcal{A}$ , is generated. (Note the analogy to the time-continuous case, where $x(t)$ describes both the value of the function at a single point in time $t$ and the whole function of time $t$. Since discrete-time systems always map sequences to sequences, no misunderstandings are to be expected.) For the sake of simplicity and without loss of generality, a real-valued sequence is considered in the following. The elements of the sequence are i.i.d. random realizations every symbol period  $T_{\mathrm{sym}}$  of a zero-mean, cyclostationary random process $\mathsf{a}$ with realizations

$$a(t) = \sum_{m=-\infty}^{\infty} a_m \delta(t - mT_{\text{sym}}). \tag{2.10}$$

The autocorrelation function  $R_{\mathsf{a}}(k_0)$  for a symbol delay  $k_0$ , with  $\mathrm{E}\{\cdot\}$  denoting the expected value, is defined as

$$R_{\mathsf{a}}(k_0) = \mathrm{E}\left\{a_m^* a_{m+k_0}\right\} = \sigma_{\mathsf{a}}^2 \delta(k_0), \tag{2.11}$$

with the variance  $\sigma_{\mathsf{a}}^2$  and the Dirac function  $\delta(k_0)$ . The discrete-time Fourier transform (DTFT) of the autocorrelation function yields the power spectral density (PSD)  $S_{\mathsf{a}}(f)$  as

$$S_{\mathsf{a}}(f) = \sum_{k_0 = -\infty}^{\infty} R_{\mathsf{a}}(k_0) \,\mathrm{e}^{\mathrm{j}\,2\pi f k_0 T_{\mathrm{sym}}} = \sigma_{\mathsf{a}}^2 \,. \tag{2.12}$$

The result shows that a random symbol sequence $a(t)$ has, on average, a flat power spectrum. It follows that the amplitude of the Fourier transform of the random sequence is also constant over frequency, while the phase has a uniform distribution [37]. Finally, the Dirac comb in eq. (2.10) is periodic in the frequency domain (FD) with the symbol rate  $f_{\rm sym} = 1/T_{\rm sym}$  (see appendix B), i.e.,

$$\mathcal{F}\left\{\sum_{m=-\infty}^{\infty} \delta(t - mT_{\text{sym}})\right\} = f_{\text{sym}} \sum_{m=-\infty}^{\infty} \delta\left(f - mf_{\text{sym}}\right), \qquad (2.13)$$

and therefore, the DTFT of the symbol sequence is also periodic with the symbol rate. Fig. 2.6(a) visualizes the DTFT of the symbol sequence, where the information content repeats at integer multiples of the symbol rate. In the following, the signal from eq. (2.10) is sampled with sampling frequency  $f_{\rm sa} > f_{\rm sym}$ , i.e., an oversampling ratio of  $\eta_{\rm os} = f_{\rm sa}/f_{\rm sym}$ , and is denoted as  $s_k$ . For convenience, an integer oversampling ratio greater than or equal to two is assumed. The periodicity of the $N$-point discrete Fourier transform (DFT)  $\underline{\tilde{s}}_n$  of the samples  $s_k$  with frequency bin index $n$ can then be expressed as

$$\tilde{\underline{s}}_n = \tilde{\underline{s}}_{n + \frac{m}{\eta_{\rm os}}N} \tag{2.14}$$

for  $m \in \{1, \dots, \eta_{\rm os} - 1\}$  and  $n \in \{0, \dots, N/\eta_{\rm os} - 1\}$ . For simplicity, clock recovery algorithms are often implemented at twofold oversampling. Implementations at fractional oversampling were not considered in this thesis; hence, integer oversampling is assumed throughout. For twofold oversampling, i.e.,  $\eta_{\rm os} = 2$ , the spectral periodicity is

$$\tilde{\underline{s}}_n = \tilde{\underline{s}}_{n+\frac{N}{2}}. \tag{2.15}$$

Consequently, the product of a signal component  $\underline{\tilde{s}}_n$  and the complex conjugate of its periodic repetition  $\underline{\tilde{s}}_{n+N/2}$  is a real value, whose expected value is proportional to the variance  $\sigma_{\bf a}^2$ 

$$\begin{aligned}
\tilde{\underline{s}}_{n}\tilde{\underline{s}}_{n+\frac{N}{2}}^{*} &= \tilde{\underline{s}}_{n}\tilde{\underline{s}}_{n}^{*} = \left|\tilde{\underline{s}}_{n}\right|^{2}, \\
\mathrm{E}\left\{\tilde{\underline{s}}_{n}\tilde{\underline{s}}_{n+\frac{N}{2}}^{*}\right\} &= \mathrm{E}\left\{\tilde{\underline{s}}_{n}\tilde{\underline{s}}_{n}^{*}\right\} = \frac{\sigma_{\mathsf{a}}^{2}}{2}.
\end{aligned} \tag{2.16}$$
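The spectral periodicity at twofold oversampling and the real-valued product of a bin with its conjugated repetition can be observed on a short random sequence. The sketch below is illustrative only: the binary alphabet, the block length, the random seed, and the plain `dft` helper are assumptions, and the sampled Dirac sequence of eq. (2.10) is represented by inserting one zero between consecutive symbols.

```python
import cmath
import math
import random


def dft(x):
    """Plain N-point DFT (quadratic complexity, sufficient for a short example)."""
    n_pts = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_pts) for k, v in enumerate(x))
            for n in range(n_pts)]


random.seed(0)
N = 32                                                           # number of samples (assumed)
symbols = [random.choice((-1.0, 1.0)) for _ in range(N // 2)]    # assumed binary alphabet

# Twofold oversampling of the Dirac sequence: symbol, zero, symbol, zero, ...
s = []
for a in symbols:
    s.extend((a, 0.0))

spec = dft(s)
half = N // 2
# Periodicity of the spectrum with N/2 bins.
max_dev = max(abs(spec[n] - spec[n + half]) for n in range(half))
# Imaginary part of the product of a bin and its conjugated repetition.
max_imag = max(abs((spec[n] * spec[n + half].conjugate()).imag) for n in range(half))
print(f"periodicity error: {max_dev:.2e}, imaginary part of product: {max_imag:.2e}")
```

Both quantities vanish up to numerical precision for the unfiltered sequence. A timing offset or a pulse-shaping filter changes the phase relation between the two spectral copies.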

If the analog or digital pulse-shaped sequence still contains parts of the second (or higher) Nyquist zone, see Fig. 2.6(b), the periodicity of the information content is still preserved. This property is utilized by NDA clock recovery algorithms and is explained in the following subsections.



Fig. 2.6: (a) Discrete-time Fourier transform (DTFT) of a random sequence repeating at integer multiples of the symbol rate  $f_{\rm sym}=1/T_{\rm sym}$ . (b) The lowpass-filtered signal contains spectral components of the first and second Nyquist zone. Due to sampling at a higher rate, the frequency components repeat at multiples of the sampling rate  $f_{\rm sa}=1/T_{\rm sa}$ .

#### 2.3.1.2 Group Delay Estimation

A digital signal  $x_{k,\tau=0}=x(kT_{\rm sa})$  at sampling instance $k$ and without any time delay  $\tau$  is considered. A version of the signal shifted in time by  $\tau$  is expressed as  $x_{k,\tau}=x(T_{\rm sa}(k+\eta_{\rm os}\tau))$ , where  $\tau$  is normalized to the symbol period  $T_{\rm sym}$ . Negative and positive time delays correspond to sampling too early and too late, respectively. The time delay is composed of the group delays of the various analog components in the communication channel, e.g., analog filters, amplifiers, cables, optical fibers, etc., as well as the asynchronous sampling between the transmitter and receiver clock, and can generally vary in time, e.g., due to the clock phase walk of the oscillators. For simplicity, a time delay  $\tau$  that is constant over time is assumed in the analytical discussion. A time delay  $\tau$  in the time domain (TD) corresponds to a linear phase shift in the FD [38]. Consequently, the $N$-point DFT of the sequence  $x_{k,\tau}$  can be decomposed into the DFT of the signal at the ideal sampling point  $\tilde{\underline{x}}_{n,\tau=0}$  and a linear phase as

$$\sum_{k=-\infty}^{\infty} x_{k,\tau} e^{-j 2\pi k \frac{n}{N}} = \tilde{\underline{x}}_{n,\tau} = \tilde{\underline{x}}_{n,\tau=0} e^{j 2\pi \frac{n}{N} \eta_{os} \tau} . \tag{2.17}$$
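The relation in eq. (2.17) can be checked for the simplest case of a delay that equals an integer number of samples. The sketch assumes a circular shift by $D = \eta_{\rm os}\tau$ samples of a periodic test signal; the signal, the block length, and the `dft` helper are illustrative choices. Fractional delays would additionally require an interpolator and are not covered here.

```python
import cmath
import math


def dft(x):
    """Plain N-point DFT (quadratic complexity, sufficient for a short example)."""
    n_pts = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_pts) for k, v in enumerate(x))
            for n in range(n_pts)]


N = 16
D = 3   # assumed shift in samples, corresponding to eta_os * tau in eq. (2.17)

# Periodic test signal sampled without delay (tau = 0).
x0 = [math.sin(2 * math.pi * 2 * k / N) + 0.5 * math.cos(2 * math.pi * 5 * k / N)
      for k in range(N)]
# Shifted version x_{k,tau} = x_{k+D,0}, taken circularly.
x_shift = [x0[(k + D) % N] for k in range(N)]

lhs = dft(x_shift)
rhs = [x_n * cmath.exp(2j * math.pi * n * D / N) for n, x_n in enumerate(dft(x0))]
max_dev = max(abs(a - b) for a, b in zip(lhs, rhs))
print(f"maximum deviation between both sides of eq. (2.17): {max_dev:.2e}")
```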

The purpose of the digital clock recovery circuit is to estimate the sampling phase offset  $\tau$  and delay the signal in time to compensate for this delay. In the following, the total time delay  $\tau$  is attributed to a noise-free linear time-invariant (LTI) channel  $\underline{h}(t)$ , i.e., the group delay of the channel has to be estimated. As the timing phase estimation is implemented in digital domain, the digital impulse response  $\underline{h}_k$  of the channel is considered. The frequency response of the channel using an N-point DFT is defined as  $\underline{\tilde{h}}_n$  and can be decomposed in magnitude and unwrapped phase as

$$\underline{\tilde{h}}_{n} = \left| \underline{\tilde{h}}_{n} \right| e^{j \varphi_{n}}, \tag{2.18}$$

for $n \in \{-N/2, \dots, N/2-1\}$, where n=0 refers to the zero-frequency. According to the definition of the group delay of a system in eq. (A.28) in appendix A.2.3.1, the group delay at the n-th frequency bin can be obtained for a given frequency interval $1 < \Delta f < N$ as

$$\tau_{\mathrm{g},n}(\Delta f) = \begin{cases} \frac{1}{2\pi} \frac{N}{\Delta f} \left( \varphi_{n-\frac{\Delta f}{2}} - \varphi_{n+\frac{\Delta f}{2}} \right) & \Delta f \text{ even, } -\frac{N-\Delta f}{2} \leq n < \frac{N-\Delta f}{2} \\ \frac{1}{2\pi} \frac{N}{\Delta f} \left( \varphi_{n-\frac{\Delta f-1}{2}} - \varphi_{n+\frac{\Delta f-1}{2}} \right) & \Delta f \text{ odd, } -\frac{N-\Delta f+1}{2} \leq n \leq \frac{N-\Delta f-1}{2} \\ \text{undefined} & \text{otherwise.} \end{cases} \tag{2.19}$$

$^1$ Mathematically, a delay is usually defined as $\delta(t-\tau)$ and finally the group delay $\tau$ corresponds to the negative derivative of the spectral phase as in eq. (A.28). In the field of clock recovery, the notation $\delta(t+\tau)$ is commonly used and, thus, the negative sign when estimating the group delay can be omitted.

In the following, a frequency interval with even  $\Delta f$  is considered. By averaging over the frequency-dependent group delay, the frequency-averaged group delay  $\overline{\tau}_{\rm g}(\Delta f)$  is obtained as

$$\begin{aligned}
\overline{\tau}_{g}(\Delta f) &= \frac{1}{N - \Delta f} \sum_{n = -(N - \Delta f)/2}^{(N - \Delta f)/2 - 1} \tau_{g,n}(\Delta f) \\
&= \frac{1}{2\pi} \frac{N}{(N - \Delta f)\Delta f} \sum_{n = -(N - \Delta f)/2}^{(N - \Delta f)/2 - 1} \left( \varphi_{n - \frac{\Delta f}{2}} - \varphi_{n + \frac{\Delta f}{2}} \right).
\end{aligned} \tag{2.20}$$

Assuming that the frequency-dependent group delay is only slightly nonlinear over the frequency range under consideration, a large interval $\Delta f$ provides a sufficient estimation of the group delay. For the special case of $\Delta f = N/2$, the frequency-averaged group delay results in

$$\begin{aligned}
\overline{\tau}_{g}\left(\Delta f = \frac{N}{2}\right) &= \frac{1}{2\pi} \frac{4}{N} \sum_{n=-N/4}^{N/4-1} \left( \varphi_{n-\frac{N}{4}} - \varphi_{n+\frac{N}{4}} \right) \\
&= \frac{1}{2\pi} \frac{4}{N} \sum_{n=0}^{N/2-1} \left( \varphi_{n} - \varphi_{n+\frac{N}{2}} \right).
\end{aligned} \tag{2.21}$$

Note the re-indexing of $n \in \{0, ..., N-1\}$ in eq. (2.21). Since averaging the absolute phase can lead to incorrect mean values when phase jumps at $\pm \pi$ occur, averaging in the complex plane is preferred, i.e.,

$$\begin{aligned}
\frac{1}{N} \sum_{n=0}^{N-1} \varphi_n &= \frac{1}{N} \sum_{n=0}^{N-1} \arg \left\{ e^{j \varphi_n} \right\} \\
&= \arg \left\{ \prod_{n=0}^{N-1} \left( e^{j \varphi_n} \right)^{\frac{1}{N}} \right\}.
\end{aligned} \tag{2.22}$$

Using eq. (2.22) and the definition of the channel from eq. (2.18), the frequency-averaged group delay from eq. (2.21) can also be expressed as

$$\begin{aligned}
\overline{\tau}_{g} &= \frac{1}{\pi} \arg \left\{ \prod_{n=0}^{N/2-1} \left( e^{j\left(\varphi_{n} - \varphi_{n+\frac{N}{2}}\right)} \right)^{\frac{2}{N}} \right\} \\
&= \frac{1}{\pi} \arg \left\{ \prod_{n=0}^{N/2-1} \left( \underline{\tilde{h}}_{n} \underline{\tilde{h}}_{n+\frac{N}{2}}^{*} \right)^{\frac{2}{N}} \right\}.
\end{aligned} \tag{2.23}$$

This equation provides an accurate estimate of the linear phase portion of an arbitrary nonlinear phase over the frequency. However, it comes with two disadvantages. First, the geometric mean determines the exact mean phase, but does not take into account the amplitudes of the complex values, i.e., the phases of signal components with low signal power in the stopband are weighted equally to the phases of signal components in the passband. This can lead to estimation errors, particularly for signal components below the noise floor. Second, the product expansion consists of a large number of complex-valued multiplications, which results in a high computational effort.

A sufficient approximation of the geometric mean for weak phase nonlinearities in the passband is the arithmetic mean, i.e.,

$$\arg\left\{\prod_{n=0}^{N-1} \left(e^{j\varphi_n}\right)^{\frac{1}{N}}\right\} \approx \arg\left\{\frac{1}{N} \sum_{n=0}^{N-1} e^{j\varphi_n}\right\}. \tag{2.24}$$

This has the advantage that the phases of weak signal components are weighted less strongly and complex-valued multiplications are replaced by complex-valued additions. Using this approximation, eq. (2.23) yields

$$\overline{\tau}_{g} \approx \frac{1}{\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \underline{\tilde{h}}_{n} \underline{\tilde{h}}_{n+\frac{N}{2}}^{*} \right\}. \tag{2.25}$$

Note that here the group delay is related to the sampling period, while the timing estimate for the clock recovery is normalized to the symbol period. Furthermore, it is emphasized that the sum in eq. (2.25) corresponds to the spectral correlation of the filter, which is commonly referred to as clock tone.

### **Example: 3rd-order Bessel lowpass filter**

In the following example, a channel with lowpass characteristic is considered, which can be modeled as a 3rd-order Bessel lowpass filter that is defined as

$$\tilde{\underline{h}}(f) = \frac{15}{15 - 24\pi^2 f^2 + j(30\pi f - 8\pi^3 f^3)},$$
(2.26)

where the 3-dB bandwidth is defined at the angular frequency $\omega=2\pi f=1.756$ [39]. Fig. 2.7 shows the simulated filter characteristics of a 3rd-order Bessel lowpass filter with a 3-dB bandwidth of 50 GHz and a sampling rate of 500 GHz. In Fig. 2.7(a), the PSD is shown with the 3-dB limit. In Fig. 2.7(b), the unwrapped phase is obtained as $\arg\{\tilde{h}(f)\}$ (see eq. (A.27)). From this, the group delay in Fig. 2.7(c) is computed according to eq. (2.19) for $\Delta f=2$ (blue curve) and the mean group delay according to eq. (2.20) using the geometric mean from eq. (2.22) (red curve) and the arithmetic mean from eq. (2.24) (orange curve). The geometric mean averages the group delay over all frequencies and therefore corresponds to the true mean group delay. The arithmetic mean additionally weights the spectral phase differences according to the amplitudes of the frequency response. For this reason, the arithmetic mean approximates the group delay in the passband, which is relevant for the sampling offset of the signal. Based on the two mean group delays obtained, the linear phase portion of the Bessel filter is plotted in Fig. 2.7(b). Using the geometric mean gives the linear phase portion of the system over all frequencies. This also includes weak signal components, which in practice are strongly affected by noise and distort the phase estimation. The arithmetic mean, on the other hand, approximates the linear phase in the passband and thus reduces distortions caused by out-of-band frequency components. Note that the arithmetic mean only provides reliable phase estimates as long as the frequency-dependent spectral phase in the passband is weakly nonlinear, which is true for a Bessel filter. Furthermore, if a large frequency interval $\Delta f = N/2$ is used, the determined geometric and arithmetic group delay is incorrect for strong nonlinear phases. In this case, only a slight nonlinear phase can be tolerated in the observation window.
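The comparison of the two averaging methods can be reproduced numerically. The following minimal sketch evaluates eq. (2.26) on a DFT grid and computes the mean group delay over $\Delta f = 2$, once as the mean of the phase differences (geometric mean, eq. (2.22)) and once as the phase of the summed phasors (arithmetic mean, eq. (2.24)). The number of bins `N = 256` and all identifiers are assumptions made for illustration and are not taken from the simulation behind Fig. 2.7.

```python
import cmath
import math

F_3DB = 50e9    # 3-dB bandwidth in Hz (value from the example)
F_SA = 500e9    # sampling rate in Hz (value from the example)
N = 256         # number of DFT bins (assumed for this sketch)


def bessel3(f):
    """Eq. (2.26), with f scaled so that the 3-dB point falls on F_3DB."""
    w = 1.756 * f / F_3DB          # normalized angular frequency
    return 15.0 / complex(15.0 - 6.0 * w**2, 15.0 * w - w**3)


# Frequency response on the bins n = -N/2 ... N/2-1 (list index i = n + N/2)
h = [bessel3(n * F_SA / N) for n in range(-N // 2, N // 2)]

# Products h[n-1] * conj(h[n+1]) carry the phase difference over delta_f = 2
prods = [h[i - 1] * h[i + 1].conjugate() for i in range(1, N - 1)]
scale = N / (2 * 2 * math.pi)      # (1 / (2 pi)) * (N / delta_f)

# Mean of the phases (left-hand side of eq. (2.22)): every bin counts equally
gd_geometric = scale * sum(cmath.phase(p) for p in prods) / len(prods)
# Phase of the summed phasors (eq. (2.24)): bins are weighted by magnitude
gd_arithmetic = scale * cmath.phase(sum(prods))

print(f"geometric mean:  {gd_geometric:.2f} sampling periods")
print(f"arithmetic mean: {gd_arithmetic:.2f} sampling periods")
```

In this sketch, the arithmetic mean stays close to the passband group delay of the filter, whereas the geometric mean is pulled down by the stopband bins, where the group delay of the Bessel filter decreases.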

#### 2.3.1.3 Modified Barton & Al-Jalili Algorithm

**Fig. 2.7:** 3rd-order Bessel lowpass filter for a sampling rate of 500 GHz and a 3-dB bandwidth of 50 GHz. (a) Spectrum in decibel. (b) Nonlinear spectral phase of the Bessel filter with the linear spectral phase portion obtained from the frequency-averaged group delay (GD) shown in (c).

To obtain an understanding of the TE, the received signal is considered as a function of the transmitted signal and the communication channel. A real-valued upsampled random symbol sequence s(t) (see section 2.3.1.1) is digitally and/or analog pulse-shaped at the transmitter with impulse response p(t). For simplicity, a real-valued signal is considered; however, the considerations are also valid for a complex-valued upsampled symbol sequence $\underline{s}(t)$. The signal then passes through a linear channel with real-valued impulse response c(t). At the receiver side, zero-mean additive white Gaussian noise (AWGN) n(t) is added before the signal is filtered by a receive filter g(t), which emulates the bandwidth limitations of the receiver. For convenience, the transmitter lowpass p(t) and receiver lowpass g(t) have symmetric and real-valued impulse responses and therefore also feature real-valued Fourier transforms with zero spectral phase. Any time delay caused by the transmitter and receiver architecture (circuit group delays or DAC and ADC sampling offset) and the channel is modeled by an overall group delay $\tau$ normalized to the symbol period as $\delta(t + \tau T_{\text{sym}})$. The received signal x(t) is then described as

$$x(t) = s(t) * p(t) * c(t) * g(t) * \delta(t + \tau T_{\text{sym}}) + n(t) * g(t), \tag{2.27}$$

where \* is the convolution operator. To simplify, h(t) = p(t) \* c(t) \* g(t) is abbreviated. After sampling with sampling interval $T_{\rm sa}$, the received samples are obtained as $x_k = x(kT_{\rm sa})$. In the following, twofold oversampling $\eta_{\rm os} = 2$ is assumed, i.e., the spectrum of the sampled receive signal contains two copies of $S_{\rm a}(f)$, as shown in Fig. 2.6(b). Taking the DFT over a block of N received samples yields the FD representation as

$$\tilde{\underline{x}}_n = \tilde{\underline{s}}_n \tilde{h}_n e^{j 4\pi \frac{n}{N}\tau} + \tilde{\underline{n}}_n \tilde{g}_n \tag{2.28}$$

with the ensemble average denoted as $\langle \cdot \rangle$ given by

$$\langle \tilde{x}_n \rangle = \langle \tilde{s}_n \rangle \tilde{h}_n e^{j 4\pi \frac{n}{N} \tau}. \tag{2.29}$$

The TE according to Barton and Al-Jalili [40] exploits the periodicity of the cyclostationary random sequence from eq. (2.15) to cancel the random phase of the sequence  $\underline{\tilde{s}}$  by subtracting the phase of the ensemble average of the two frequency components for a frequency separation of  $\Delta f = N/2$  as

$$\begin{aligned}
&\frac{1}{2\pi} \left( \arg\left\{ \left\langle \tilde{x}_{n} \right\rangle \right\} - \arg\left\{ \left\langle \tilde{x}_{n+\frac{N}{2}} \right\rangle \right\} \right) \\
&= \frac{1}{2\pi} \left( \arg\left\{ \left\langle \tilde{s}_{n} \right\rangle \tilde{h}_{n} e^{j 4\pi \frac{n}{N}\tau} \right\} - \arg\left\{ \left\langle \tilde{s}_{n} \right\rangle \tilde{h}_{n+\frac{N}{2}} e^{j 4\pi \frac{n-\frac{N}{2}}{N}\tau} \right\} \right) \\
&= \frac{1}{2\pi} \left( \arg\left\{ \left\langle \tilde{s}_{n} \right\rangle \right\} + 4\pi \frac{n}{N}\tau - \arg\left\{ \left\langle \tilde{s}_{n} \right\rangle \right\} - 4\pi \frac{n}{N}\tau + 2\pi\tau \right) \\
&= \frac{1}{2\pi} (2\pi\tau) \\
&= \tau,
\end{aligned} \tag{2.30}$$

where the identity $\arg\{c_1c_2\} = \arg\{c_1\} + \arg\{c_2\}$ with a phase ambiguity of $2\pi$ is used. If the frequency components obtained from a single DFT are considered instead of the ensemble averages, the estimated sampling offset $\hat{\tau}$ is obtained as

$$\hat{\tau}_{\text{BAJ}} = \frac{1}{2\pi} \left( \arg \left\{ \underline{\tilde{x}}_n \right\} - \arg \left\{ \underline{\tilde{x}}_{n+\frac{N}{2}} \right\} \right). \tag{2.31}$$

In the course of this thesis, the modified BAJ algorithm (mod-BAJ) was applied and analyzed for the first time in the literature [C1, C2, 41]. Here, the phase difference in the complex plane is averaged over the frequency for  $\Delta f = N/2$  to effectively suppress noise, i.e., the frequency components of the left sideband  $\tilde{x}_n$  are multiplied by the complex conjugate frequency components of the right sideband  $\tilde{x}_{n+N/2}$  (see eq. (2.25)). The TE therefore results in

$$\hat{\tau}_{\text{mod-BAJ}} = \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \tilde{x}_n \tilde{x}_{n+\frac{N}{2}}^* \right\}. \tag{2.32}$$

Again, by examining the ensemble average from eq. (2.29) and using eq. (2.16), it can be shown that this algorithm provides the sampling offset as

$$\begin{aligned}
&\frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \left\langle \tilde{\underline{x}}_n \tilde{\underline{x}}_{n+\frac{N}{2}}^* \right\rangle \right\} \\
&= \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \left\langle \tilde{\underline{s}}_n \right\rangle \tilde{h}_n e^{j 4\pi \frac{n}{N} \tau} \left\langle \tilde{\underline{s}}_n^* \right\rangle \tilde{h}_{n+\frac{N}{2}} e^{-j 4\pi \frac{n-\frac{N}{2}}{N} \tau} \right\} \\
&= \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \left\langle \tilde{\underline{s}}_n \tilde{\underline{s}}_n^* \right\rangle \tilde{h}_n \tilde{h}_{n+\frac{N}{2}} e^{j 2\pi \tau} \right\} \\
&= \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \frac{\sigma_{\mathbf{a}}^2}{2} \tilde{h}_n \tilde{h}_{n+\frac{N}{2}} e^{j 2\pi \tau} \right\} \\
&= \tau.
\end{aligned} \tag{2.33}$$

Fig. 2.8 illustrates the magnitude and phase of the multiplication of the left sideband with the complex conjugate right sideband and how a constant spectral phase proportional to the sampling offset can be obtained. Since the BAJ and mod-BAJ algorithms directly provide a timing estimate  $\hat{\tau}$ , they can be implemented in a FF clock recovery architecture.

**Fig. 2.8:** Visualization of the magnitude (left column) and the spectral phase (right column) of the left sideband (top row), right sideband (middle row), and the product of the left sideband with the complex conjugate right sideband (bottom row). The red and blue areas mark identical spectral components of the first and second Nyquist zone of the random symbol sequence. The linear phase caused by a time delay is shown in green.

The computational complexity of the mod-BAJ algorithm can be reduced by omitting the calculation of the argument and considering only the imaginary part of the autocorrelation. This yields a sine as a function of the sampling offset, since $\exp(j 2\pi\tau) = \cos(2\pi\tau) + j\sin(2\pi\tau)$, which can be used as an error signal $\hat{\varepsilon}$ in a PLL to derive the timing phase. The resulting algorithm is proposed by Godard [42] with the estimated error signal $\hat{\varepsilon}$ as

$$\hat{\varepsilon}_{Godard} = \frac{1}{2\pi} \Im \left\{ \sum_{n=0}^{N/2-1} \tilde{\underline{x}}_n \tilde{\underline{x}}_{n+\frac{N}{2}}^* \right\}. \tag{2.34}$$
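The relation between the mod-BAJ estimate in eq. (2.32) and the Godard error signal in eq. (2.34) can be checked with a short numerical sketch. It assumes a noise-free, periodic test signal at 2 Sa/Sym that is synthesized directly in FD with a raised-cosine spectrum and the linear phase from eq. (2.28). The symbol pattern, the filter shape, and all function names are illustrative assumptions and do not originate from the referenced works.

```python
import cmath
import math

# Arbitrary binary test pattern (assumption for this sketch)
SYMBOLS = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1]


def dft(x):
    """Plain O(N^2) DFT, sufficient for a short illustration."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * n / N) for k in range(N))
            for n in range(N)]


def synth_signal(tau, symbols=SYMBOLS):
    """Periodic 2 Sa/Sym signal x(t + tau*T_sym) with a raised-cosine spectrum."""
    M = len(symbols)
    N = 2 * M
    spec = {}
    for n in range(-N // 2, N // 2):
        # Symbol spectrum (periodic with N/2) times a real, symmetric filter
        a_n = sum(a * cmath.exp(-2j * math.pi * n * m / M)
                  for m, a in enumerate(symbols))
        spec[n] = a_n * math.cos(math.pi * n / N) ** 2
    # Inverse DFT including the linear phase of the time shift, cf. eq. (2.28)
    return [sum(s * cmath.exp(2j * math.pi * n * (k + 2 * tau) / N)
                for n, s in spec.items()).real / N for k in range(N)]


def clock_tone(x):
    """Spectral correlation of the two sidebands, i.e., the sum in eq. (2.32)."""
    X = dft(x)
    N = len(x)
    return sum(X[n] * X[n + N // 2].conjugate() for n in range(N // 2))


def mod_baj(x):
    return cmath.phase(clock_tone(x)) / (2 * math.pi)   # eq. (2.32)


def godard(x):
    return clock_tone(x).imag / (2 * math.pi)           # eq. (2.34)


for tau in (-0.3, 0.0, 0.2):
    x = synth_signal(tau)
    print(f"tau = {tau:+.2f}: mod-BAJ = {mod_baj(x):+.4f}, Godard = {godard(x):+.4f}")
```

Under these idealized assumptions, the mod-BAJ output reproduces the applied offset directly, while the Godard output only follows the sign and the sine-shaped characteristic of the offset and therefore requires a feedback loop.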

#### 2.3.1.4 Overview of Common Clock Recovery Algorithms

In the course of the thesis, further TE and TED algorithms were investigated, which will be briefly outlined in this subchapter. In general, most clock recovery algorithms exploit the cyclostationarity of the signal to obtain the sampling phase offset and hence are mathematically equivalent [1].

The FD algorithms presented in the previous subsection can also be implemented in TD, hence, saving the complexity to implement a fast Fourier transform (FFT). The well-studied Gardner algorithm [43] evaluates the signal  $x_k$  by comparing adjacent samples around a center sample  $x_{2k+1}$  at twofold oversampling. The averaged difference of the adjacent samples is proportional to the deviation from the ideal sampling point to the center sampling point. Therefore, the difference is weighted by the center sample and then averaged over a block of N samples to derive an estimated error signal  $\hat{\varepsilon}$  as

$$\hat{\varepsilon}_{\text{Gardner}} = \sum_{k=0}^{N/2-1} x_{2k+1} \left( x_{2k} - x_{2k+2} \right) . \tag{2.35}$$
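A minimal sketch of eq. (2.35) is given below. The test signal is an assumption made for illustration: a periodic binary sequence with time-limited raised-cosine (Hann-shaped) pulses, evaluated at $x(t + \tau T_{\rm sym})$, so that the indices can wrap around at the block boundary. For this particular signal, the error signal vanishes at $\tau = 0$ and is negative for late sampling ($\tau > 0$).

```python
import math

# Arbitrary binary test pattern (assumption for this sketch)
SYMBOLS = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1]


def pulse_shaped(tau, sps, symbols=SYMBOLS):
    """Periodic signal x(t + tau*T_sym) built from Hann-shaped pulses."""
    M = len(symbols)
    x = []
    for k in range(sps * M):
        t = k / sps + tau              # sampling time in symbol periods
        m = math.floor(t)              # current symbol interval [m, m+1)
        w = math.cos(0.5 * math.pi * (t - m)) ** 2
        x.append(symbols[m % M] * w + symbols[(m + 1) % M] * (1.0 - w))
    return x


def gardner(x):
    """Eq. (2.35); indices wrap around because the test signal is periodic."""
    N = len(x)
    return sum(x[2 * k + 1] * (x[2 * k] - x[(2 * k + 2) % N])
               for k in range(N // 2))


for tau in (-0.2, 0.0, 0.2):
    print(f"tau = {tau:+.1f}: error signal = {gardner(pulse_shaped(tau, 2)):+.3f}")
```

Since only the sign and the approximate magnitude of the offset are obtained, the output serves as an error signal for a feedback loop rather than as a direct timing estimate.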

The square-timing-recovery algorithm by Oerder and Meyr [36] (OEM) calculates a timing estimate $\hat{\tau}$ from N samples at an oversampling ratio larger than two. For practical reasons, four samples per symbol are often chosen, as in this work. In a way similar to the mod-BAJ algorithm, the autocorrelation of the spectrum is attained by computing the squared signal in TD. The squaring leads to the full spectral correlation for all frequency separations and therefore results in a broadening of the spectrum, which necessitates the higher oversampling to avoid aliasing. The frequency component of the spectral correlation $\Delta f = N/2$ then appears at the symbol rate $f_{\rm sym}$ in FD. The phase of the resulting tone at $f_{\rm sym}$, i.e., at the frequency bin with the index n = N/4 for fourfold oversampling, can be determined from the argument of the corresponding Fourier coefficient as

$$\begin{aligned}
\hat{\tau}_{\text{OEM}} &= \frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{N-1} x_k^2 e^{-j 2\pi k \frac{n}{N}} \right\} \Big|_{n=\frac{N}{4}} \\
&= \frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{N-1} x_k^2 e^{-j \pi \frac{k}{2}} \right\}.
\end{aligned} \tag{2.36}$$
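Eq. (2.36) translates into a few lines of code. The sketch below applies it at 4 Sa/Sym to an assumed noise-free test signal, namely a periodic binary sequence with Hann-shaped pulses evaluated at $x(t + \tau T_{\rm sym})$. The signal model and the function names are illustrative assumptions; for this model, the squared signal contains a clock tone whose phase equals $2\pi\tau$.

```python
import cmath
import math

# Arbitrary binary test pattern (assumption for this sketch)
SYMBOLS = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1]


def pulse_shaped(tau, sps, symbols=SYMBOLS):
    """Periodic signal x(t + tau*T_sym) built from Hann-shaped pulses."""
    M = len(symbols)
    x = []
    for k in range(sps * M):
        t = k / sps + tau              # sampling time in symbol periods
        m = math.floor(t)              # current symbol interval [m, m+1)
        w = math.cos(0.5 * math.pi * (t - m)) ** 2
        x.append(symbols[m % M] * w + symbols[(m + 1) % M] * (1.0 - w))
    return x


def oerder_meyr(x):
    """Eq. (2.36): phase of the symbol-rate tone of the squared signal."""
    tone = sum(xk * xk * cmath.exp(-0.5j * math.pi * k) for k, xk in enumerate(x))
    return cmath.phase(tone) / (2 * math.pi)


for tau in (-0.3, 0.1, 0.25):
    print(f"tau = {tau:+.2f}: estimate = {oerder_meyr(pulse_shaped(tau, 4)):+.4f}")
```

In contrast to the FD estimators, no DFT of the received block is needed, since only a single Fourier coefficient of the squared signal is evaluated.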

To apply the OEM scheme at twofold oversampling, Zhu et al. [44] extended the algorithm by shifting the symbol-rate frequency component to baseband and limiting the bandwidth by a lowpass filter h of length $N_{\rm tap}$ prior to the squaring operation as

$$\hat{\tau}_{\text{Zhu}} = \frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{N-1} \left( \sum_{m=0}^{N_{\text{tap}}-1} x_{k-m} e^{-j\pi \frac{k-m}{2}} h_m \right)^2 \right\}. \tag{2.37}$$

In the course of this work, h is a simple 4-tap moving average (MA) lowpass filter. It should be emphasized that all algorithms are also applicable to higher-order modulation formats and to complex-valued signals in coherent optical transmission systems. Furthermore, the algorithms are not restricted to non-return-to-zero (NRZ) signals, but can also be used for digitally pulse-shaped waveforms as long as the signal spectrum extends into the second Nyquist zone. Moreover, the algorithms are not limited to  $2 \, \text{Sa/Sym}$ , or  $4 \, \text{Sa/Sym}$  for the OEM algorithm, but can also be adapted to fractional oversampling ratios, provided that the position of the clock tone is taken into account and aliasing is avoided [35].

Table 2.1 provides an overview of the various NDA clock recovery algorithms presented and, for the sake of completeness, also lists other popular decision-directed (DD) algorithms, i.e., algorithms that include symbols after decisioning. The symbol decided from a sample $x_k$ is denoted by $\hat{x}_k$. Note that if the center sampling point $x_{2k+1}$ of the Gardner algorithm is obtained after the hard-decision circuit, providing a symbol $\hat{x}_{2k+1}$, the TED corresponds to the DD early-late detector [45, 46].

**Table 2.1:** Overview of timing error detector (TED) and timing estimator (TE) algorithms. In the column "features", the oversampling ratio required for the algorithms is mentioned as well as other properties of the algorithms. FB: feedback, FF: feedforward, DD: decision-directed, NDA: non-data aided, TD: time domain, FD: frequency domain.

| Algorithm                    | Features                | Equation                                                                                                                                                                   |
|------------------------------|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Early-late Detector [45, 46] | TED, DD, TD, 2 Sa/Sym   | $\hat{\varepsilon} = \sum_{k=0}^{N/2-1} \hat{x}_{2k+1} \left( x_{2k} - x_{2k+2} \right)$                                                                                   |
| Müller & Müller [47]         | TED, DD, TD, 1 Sa/Sym   | $\hat{\varepsilon} = \sum_{k=0}^{N-1} \hat{x}_k x_{k+1} - \hat{x}_{k+1} x_k$                                                                                               |
| Zero-crossing Detector [48]  | TED, DD, TD, 2 Sa/Sym   | $\hat{\varepsilon} = \sum_{k=0}^{N/2-1} (\hat{x}_{2k} - \hat{x}_{2k+2}) x_{2k+1}$                                                                                          |
| Gardner [43]                 | TED, NDA, TD, 2 Sa/Sym  | $\hat{\varepsilon} = \sum_{k=0}^{N/2-1} x_{2k+1} \left( x_{2k} - x_{2k+2} \right)$                                                                                         |
| Oerder & Meyr [36]           | TE, NDA, TD, >2 Sa/Sym  | $\hat{\tau} = \frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{N-1} x_k^2 e^{-j\pi \frac{k}{2}} \right\}$                                                                           |
| Zhu [44]                     | TE, NDA, TD, >1 Sa/Sym  | $\hat{\tau} = \frac{1}{2\pi} \arg \left\{ \sum_{k=0}^{N-1} \left( \sum_{m=0}^{N_{\rm tap}-1} x_{k-m}  \mathrm{e}^{-\mathrm{j}  \pi \frac{k-m}{2}}  h_m \right)^2 \right\}$ |
| Godard [42]                  | TED, NDA, FD, >1 Sa/Sym | $\hat{\varepsilon} = \frac{1}{2\pi} \Im \left\{ \sum_{n=0}^{N/2-1} \tilde{x}_n \tilde{x}_{n+\frac{N}{2}}^* \right\}$                                                       |
| BAJ [40]                     | TE, NDA, FD, >1 Sa/Sym  | $\hat{\tau} = \frac{1}{2\pi} \left( \arg \left\{ \tilde{x}_n \right\} - \arg \left\{ \tilde{x}_{n+\frac{N}{2}} \right\} \right)$                                           |
| Modified BAJ [C1, C2, 41]    | TE, NDA, FD, >1 Sa/Sym  | $\hat{\tau} = \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \tilde{\underline{x}}_n \tilde{\underline{x}}_{n+\frac{N}{2}}^* \right\}$                                     |

#### 2.3.1.5 Timing Estimation Post-Processing

The timing estimation provides a sampling offset $\hat{\tau}$ defined in the interval [-0.5, 0.5). To compensate for the estimated group delay, the signal has to be delayed by the inverted timing estimate. Afterwards, since the sampling offset is normalized to the symbol period, the inverted timing estimate is normalized to the sampling period. For twofold oversampling, this requires a multiplication by a factor of two. Finally, in order to track a clock phase walk beyond a sampling period (also referred to as unit interval), the estimated timing phase is unwrapped. The result serves as the delay normalized to the sampling interval to be corrected in the EB and interpolator. The EB is a type of first-in first-out (FIFO) register to delay the signal by an integer sampling period delay m, while the interpolator interpolates the signal for a fractional delay $\mu$ defined in [0,1). Hence, the integer delay $m \in \mathbb{Z}$ and fractional delay $\mu$ are obtained as

$$\begin{aligned}
m &= \lfloor \operatorname{unwrap} \{-2\hat{\tau}\} \rfloor \\
\mu &= \operatorname{mod}_{1} \{\operatorname{unwrap} \{-2\hat{\tau}\} \},
\end{aligned} \tag{2.38}$$

where  $\operatorname{mod}_1\{\cdot\}$  is the modulo-1 operation. In the following two subsections, the interpolator and EB modules are explained.
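The post-processing in eq. (2.38) can be sketched as follows. The unwrapping rule is an assumption of this sketch: a jump of more than half a symbol period between consecutive estimates is interpreted as a wrap of $\hat{\tau}$ and corrected by one symbol period, i.e., by `sps` sampling periods. The function name `split_delay` and the example values are hypothetical.

```python
import math


def split_delay(tau_hats, sps=2):
    """Return (m, mu) pairs per eq. (2.38) for a sequence of estimates."""
    result = []
    offset = 0.0      # accumulated wrap correction in sampling periods
    prev = None
    for tau in tau_hats:
        d = -sps * tau                      # inverted estimate in sampling periods
        if prev is not None:
            if d - prev > sps / 2:          # estimate wrapped: continue downwards
                offset -= sps
            elif d - prev < -sps / 2:       # estimate wrapped: continue upwards
                offset += sps
        prev = d
        u = d + offset                      # unwrapped delay
        m = math.floor(u)                   # integer part for the elastic buffer
        result.append((m, u - m))           # fractional part for the interpolator
    return result


# Estimates drifting through the wrap at +0.5 symbol periods
for m, mu in split_delay([0.40, 0.45, -0.50, -0.45]):
    print(m, round(mu, 3))
```

In this example, the integer delay decreases by one as soon as the unwrapped delay passes a sampling instant, while the fractional delay remains within [0,1).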

### 2.3.2 Interpolation

The term interpolation in the context of clock recovery is not to be mistaken with the classic terminology of interpolation and decimation in DSP. Interpolation in sampling adjustment, more precisely also named "digital phase shifting" [49], "sampling-rate conversion" [49], or "digital delay element" [50], includes, in addition to the classic interpolation, also the upsampling before the filtering and subsequent downsampling back to the processing rate. Furthermore, it also addresses the additional noise or interpolation error. This chapter introduces the objective of digital interpolation. Afterwards, a practical implementation method of an interpolation filter such as the Lagrange interpolator, which is suited for hardware-efficient implementation in high-speed transceivers, is presented.

#### 2.3.2.1 Ideal Interpolation

We consider a sampled signal sequence with samples $x_k = x(kT_{sa})$ at sampling instance $k \in \mathbb{Z}$. Provided that the Nyquist sampling theorem is fulfilled, i.e., the single-sided signal bandwidth B is less than or equal to half the sampling rate $f_{\rm sa}$, the sampled signal $x_k$ contains the same information as the analog signal x(t) before the ADC and any arbitrary sampling point of the analog signal can be digitally obtained using an ideal interpolator [51, 52].

By upsampling the signal and subsequent lowpass-filtering using an ideal interpolator, the sampling rate can be increased arbitrarily. The ideal interpolator is a lowpass with rectangular-shaped amplitude in FD and cut-off frequency at the Nyquist frequency. This removes the spectral images of the upsampled signal, as explained in appendix B. The sampling phase offset can then be selected according to the temporal phase difference between the transmitter and receiver clock and the sampling rate of the signal can be reduced back to the initial sampling rate without any penalty caused by aliasing [49, 53]. Assuming identical upsampling and downsampling factors for a digital delay element, the above steps can be combined into a single $\mu$-dependent filter [49, 51, 53] with the frequency response $\tilde{h}(f,\mu)$

$$\tilde{\underline{h}}(f,\mu) = \begin{cases}
T_{\text{sa}} e^{j 2\pi f \mu T_{\text{sa}}} & |f| < B \\
0 & \text{else}.
\end{cases} \tag{2.39}$$

The frequency response is a brick-wall lowpass filter with constant gain and a linear phase term due to the time delay  $\mu$ . From this, the filter's impulse response  $h_k(\mu)$  results as a sinc function

$$h_k(\mu) = \operatorname{sinc}\left(\frac{\pi}{T_{\operatorname{sa}}}\left(kT_{\operatorname{sa}} + \mu T_{\operatorname{sa}}\right)\right), \qquad k \in \mathbb{Z}, \tag{2.40}$$

where k is the filter coefficient index and $\mu \in [0,1)$ is the fractional timing delay normalized to the sampling period, i.e., the inverse deviation of the estimated ideal from the actual sampling time. Fig. 2.9 shows an excerpt of the infinitely long ideal interpolator impulse response for $\mu = 0$ and $\mu = 0.3$.
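The coefficients from eq. (2.40) can be evaluated directly, as in the following minimal sketch. It assumes the convention $\operatorname{sinc}(x) = \sin(x)/x$, the convolution $y_n = \sum_k h_k(\mu) x_{n-k}$, and a truncation to 101 taps, which is only an approximation of the ideal interpolator. The test tone and all identifiers are chosen for illustration.

```python
import math


def sinc_tap(k, mu):
    """Coefficient h_k(mu) from eq. (2.40), with sinc(x) = sin(x) / x."""
    arg = math.pi * (k + mu)
    return 1.0 if arg == 0.0 else math.sin(arg) / arg


def interpolate(x, n, mu, half_len=50):
    """y_n = sum_k h_k(mu) * x[n - k], truncated to |k| <= half_len."""
    return sum(sinc_tap(k, mu) * x[n - k] for k in range(-half_len, half_len + 1))


# Slowly varying test tone, well below the Nyquist frequency
freq = 0.05                                   # cycles per sample (assumed)
x = [math.sin(2 * math.pi * freq * i) for i in range(200)]

n, mu = 100, 0.3
print("interpolated:", interpolate(x, n, mu))
print("reference:   ", math.sin(2 * math.pi * freq * (n + mu)))
```

For $\mu = 0$, the coefficients reduce to a single unit tap at $k = 0$, so that the input samples are reproduced. The small residual deviation for $\mu \neq 0$ in this sketch stems from truncating the infinitely long impulse response.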

The ideal interpolator is a non-causal filter with infinite length and, hence, it cannot be realized by a practical filter [54, 55]. The objective of practical filter design is therefore to approximate these properties in the best possible way in order to avoid an approximation error that can be considered as additional noise, which impairs the interpolation quality [55]. Such an approximation error occurs when not all spectral images are suppressed, which results in aliasing, when a non-constant gain in the passband distorts the amplitude of the signal, or when a nonlinear spectral phase distorts the signal.

**Fig. 2.9:** Excerpt of the infinitely long impulse response of a sinc interpolator. The blue curve represents a continuous-time sinc as reference for the discrete filter coefficients shown in red.

Although the amplitude error for an infinite impulse response (IIR) filter can be chosen to be arbitrarily small due to its infinitely long memory, such filters have a nonlinear spectral phase, so that a phase error cannot be avoided [51]. While a single IIR filter allows a hardware-efficient implementation due to its recursive structure [51], for a present fractional sampling offset $\mu$, all past samples for all possible delays $\mu$ must be computed and stored, which represents an exceptionally high computational effort [54]. For this reason, finite impulse response (FIR) filters are the preferred choice for interpolation [54].

An FIR filter can be designed with a linear phase and the amplitude error can be chosen as small as necessary at the expense of the filter's length. Approximating the ideal interpolator by truncating the sinc function with $N_{\rm tap}$ filter taps generates large ripples near the transition band due to the Gibbs phenomenon [55]. A more suitable approach is the minimum-mean-squared error (MMSE) FIR filter, which approximates the ideal interpolator by minimizing the mean-squared error of the frequency response. An important constraint in the optimization and design of interpolator filters is to ensure that the interpolated curve matches the input samples, i.e.,

$$h_k(\mu = 0) = \begin{cases} 1 & k = 0 \\ 0 & \text{else} \end{cases}, \qquad h_k(\mu = 1) = \begin{cases} 1 & k = -1 \\ 0 & \text{else} \end{cases}, \tag{2.41}$$

where $k \in \{-(N_{\rm tap}-1)/2, \ldots, (N_{\rm tap}-1)/2\}$ and $N_{\rm tap}$ is an odd number to guarantee a linear-phase FIR filter [51, 53] (see Fig. 2.9). More details about the MMSE FIR filter design are provided in [53, 56]. Since the filter taps differ for various $\mu$, all $N_{\rm tap}$ coefficients need to be pre-computed and stored as a look-up table (LUT) for a number of $N_{\rm L}$ distinct time delays, i.e., for various $\mu_l = l/N_{\rm L}$ with $l \in \{0,1,\ldots,N_{\rm L}-1\}$. The $N_{\rm L}$ filters are then arranged in a parallel filterbank. According to the sampling offset $\mu$, only the corresponding subfilter needs to be computed. Doing so, a filterbank with $N_{\rm tap} \times N_{\rm L}$ coefficients needs to be implemented. This leads to an increased memory consumption for storing the coefficients and a timing discretization error. Since $N_{\rm tap}$ and $N_{\rm L}$ tend to become large to achieve a low approximation error, the total filter structure quickly becomes impractical [53].

#### 2.3.2.2 Polynomial Interpolation

The question arises whether there is an alternative to the MMSE FIR filter that lowers the hardware requirements. Polynomial interpolation approximates the ideal FIR coefficients by polynomials in $\mu$ and enables a hardware-efficient implementation suitable for high-speed applications. Although the constraint of using polynomials results in a worse filter approximation compared to an MMSE filter, the polynomial approach performs better in terms of minimizing the timing discretization error. Furthermore, polynomial-based filters are easy to describe, are extensively studied in the literature, and nevertheless show good filter characteristics [57]. The MMSE FIR filter coefficients are approximated by a polynomial in $\mu$ as

$$h_k(\mu) = \sum_{m=0}^{M(k)} c_{k,m} \mu^m, \qquad (2.42)$$

where $h_k(\mu)$ are the filter coefficients as a function of the timing delay $\mu$, M(k) is the polynomial degree, which can vary with k, and $c_{k,m}$ are the polynomial coefficients. For the sake of simplicity, the polynomial degree is assumed to be the same for all filter coefficients, i.e., M(k) = M. The polynomial coefficients are again obtained by minimizing the quadratic error with respect to the ideal interpolator under the constraints stated in eq. (2.41). Describing the filter by its z-transform (see appendix A.2.2) and inserting the polynomial ansatz yields

$$\begin{aligned}
\tilde{h}(z,\mu) &= \sum_{k=-(N_{\text{tap}}-1)/2}^{(N_{\text{tap}}-1)/2} h_k(\mu) z^{-k} \\
&= \sum_{k=-(N_{\text{tap}}-1)/2}^{(N_{\text{tap}}-1)/2} \left[ \sum_{m=0}^{M} c_{k,m} \mu^m \right] z^{-k} .
\end{aligned} \tag{2.43}$$

By interchanging the summations, the z-transform can be expressed as

$$\tilde{h}(z,\mu) = \sum_{m=0}^{M} \mu^m \underbrace{\left[ \sum_{k=-(N_{\text{tap}}-1)/2}^{(N_{\text{tap}}-1)/2} c_{k,m} z^{-k} \right]}_{:=\tilde{h}_m(z)}. \tag{2.44}$$

As shown in appendix A.2.3.1, $\tilde{h}_m(z)$ represents an FIR filter with coefficients $c_{k,m}$, which are independent of $\mu$. The $M+1$ filters, weighted by powers of $\mu$ in a pipeline structure, form the so-called Farrow filter, named after C. W. Farrow, who first introduced the architecture in 1988 [50] (see Fig. 2.10). By choosing M to be small, much more hardware-efficient filters can be realized compared to the MMSE filter. Even for small values $M \le 3$, very good performance and low signal degradation can be observed [53, 57]. Another advantage compared to FIR filters in a filterbank is that the coefficients of the polynomial-based filters are computed "online" and thus no additional memory to store the coefficient values for distinct delays $\mu_l$ is required [54]. Due to this, $N_{\rm L}$ $\mu$-dependent filters in a filterbank can be replaced by $M+1 \ll N_{\rm L}$ filters in a Farrow structure.



Fig. 2.10: Farrow filter structure of the polynomial-based interpolator.

### 2.3.2.3 Lagrange Interpolation

Any classical polynomial interpolation can be described in terms of its Lagrange coefficients [57]. Since the Lagrange interpolation uses a more complex basis consisting of real polynomials $l_k$ instead of the monomials $\mu^m$ used in eq. (2.42), the polynomials can be weighted directly with the signal $x$ [58] (note that the monomials are used to approximate the MMSE filter, which is then applied to the signal $x$). The general continuous-time expression for a Lagrange interpolation [53, 58] of a continuous-time function $y(t)$ from its base points $x_k$ is

$$y(t) = \sum_{k=\lfloor -N_{\text{tap}}/2 \rfloor+1}^{\lfloor N_{\text{tap}}/2 \rfloor} l_k(t) x_k, \qquad (2.45)$$

where  $\lfloor \cdot \rfloor$  rounds to the next lower integer and the Lagrange coefficients are defined as

$$l_k(t) = \prod_{\substack{i=\lfloor -N_{\text{tap}}/2 \rfloor + 1 \\ i \neq k}}^{\lfloor N_{\text{tap}}/2 \rfloor} \frac{t-i}{k-i}. \qquad (2.46)$$

The base points $x_k$ do not have to be equidistantly spaced, nor must the number of points $N_{\mathrm{tap}}$ be even. It is obvious that $l_k(i) = \delta_{k-i}$ applies [58] and, therefore, the interpolated points $y(t=i)$ coincide with the base points $x_i$, which fulfills the condition in eq. (2.41). As motivated in the following section, $N_{\mathrm{tap}}$ is assumed to be even and only a single interpolant in the center interval $0 \le t < 1$ is computed [51]. With only one interpolated value in the center interval (see subsection 2.3.2.4), the time $t$ corresponds to the fractional time offset $\mu$, i.e., $t = \mu$ [53]. Eq. (2.45) can then be rewritten, with $y_n$ being the interpolated value at the filter output, as

$$y_n(\mu) = \sum_{k=\lfloor -N_{\text{tap}}/2 \rfloor+1}^{\lfloor N_{\text{tap}}/2 \rfloor} l_k(\mu) x_{n+k}. \qquad (2.47)$$

In addition, we describe a non-causal interpolation, in which a set of samples $x_{n+k}$ with $k \in \{\lfloor -N_{\text{tap}}/2 \rfloor + 1, \dots, \lfloor N_{\text{tap}}/2 \rfloor\}$ is used to compute a centered $n$-th sample $y_n$ with delay $\mu$, as is commonly done in the field of interpolation. This way, it is simpler to illustrate the symmetry property of the filter and more intuitive to describe an interpolant in the center with index $n$. Next, the Lagrange coefficients are expressed as polynomials in $\mu$ [53] as

$$l_k(\mu) = \sum_{m=0}^{N_{\text{tap}}-1} c_{k,m} \mu^m,$$
 (2.48)

with coefficients  $c_{k,m}$ . In [51], a formula to directly calculate the Lagrange coefficients  $l_k$  and hence also the factors  $c_{k,m}$  is provided as

$$l_{k}(\mu) = \frac{(-1)^{k+N_{\text{tap}}/2}}{\left(\frac{N_{\text{tap}}}{2} - 1 + k\right)! \left(\frac{N_{\text{tap}}}{2} - k\right)! (\mu - k)} \times \prod_{i=1}^{N_{\text{tap}}} \left(\mu + \frac{N_{\text{tap}}}{2} - i\right), \qquad N_{\text{tap}} \text{ even}$$

$$l_{k}(\mu) = \frac{(-1)^{k+(N_{\text{tap}}-1)/2}}{\left(\frac{N_{\text{tap}}-1}{2} + k\right)! \left(\frac{N_{\text{tap}}-1}{2} - k\right)! (\mu - k)} \times \prod_{i=0}^{N_{\text{tap}}-1} \left(\mu + \frac{N_{\text{tap}}-1}{2} - i\right), \qquad N_{\text{tap}} \text{ odd}.$$
(2.49)
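As a plausibility check, the closed form in eq. (2.49) can be compared numerically against the product definition in eq. (2.46). The following is a minimal sketch in Python using exact rational arithmetic; the function names `base_indices`, `lagrange_product`, and `lagrange_closed_form` are illustrative choices and do not stem from [51].

```python
from fractions import Fraction
from math import factorial


def base_indices(n_tap):
    """Base-point indices k = floor(-n_tap/2)+1, ..., floor(n_tap/2)."""
    return list(range((-n_tap) // 2 + 1, n_tap // 2 + 1))


def lagrange_product(k, mu, n_tap):
    """l_k(mu) evaluated from the product definition, eq. (2.46)."""
    result = Fraction(1)
    for i in base_indices(n_tap):
        if i != k:
            result *= (mu - i) / Fraction(k - i)
    return result


def lagrange_closed_form(k, mu, n_tap):
    """l_k(mu) evaluated from eq. (2.49); mu must not coincide with k."""
    half = n_tap // 2  # equals N_tap/2 (even) or (N_tap-1)/2 (odd)
    sign = (-1) ** (k + half)
    if n_tap % 2 == 0:
        denom = factorial(half - 1 + k) * factorial(half - k)
        offsets = range(1, n_tap + 1)
    else:
        denom = factorial(half + k) * factorial(half - k)
        offsets = range(0, n_tap)
    prod = Fraction(1)
    for i in offsets:
        prod *= mu + half - i
    return sign * prod / (denom * (mu - k))


mu = Fraction(3, 10)
for n_tap in (3, 4):
    for k in base_indices(n_tap):
        assert lagrange_product(k, mu, n_tap) == lagrange_closed_form(k, mu, n_tap)
```

For a non-integer $\mu$, both forms agree for every base-point index, since the full product in eq. (2.49) divided by $(\mu - k)$ leaves exactly the factors of eq. (2.46).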

Analogous to the polynomial interpolation, eq. (2.48) can be inserted in eq. (2.47) and the summations interchanged, which results in

$$\begin{aligned} y_{n}(\mu) &= \sum_{k=\lfloor -N_{\text{tap}}/2 \rfloor+1}^{\lfloor N_{\text{tap}}/2 \rfloor} \left[ \sum_{m=0}^{N_{\text{tap}}-1} c_{k,m} \mu^{m} \right] x_{n+k} \\ &= \sum_{m=0}^{N_{\text{tap}}-1} \mu^{m} \left[ \sum_{k=\lfloor -N_{\text{tap}}/2 \rfloor+1}^{\lfloor N_{\text{tap}}/2 \rfloor} c_{k,m} x_{n+k} \right]. \end{aligned} \tag{2.50}$$

Again, the expression in the square brackets represents an FIR filter, and the whole equation can be computed as the output of a Farrow structure. Although the Lagrange interpolation seems very similar to the classical polynomial interpolation, it has some important differences. For the classical polynomial-based filters, a polynomial represents a single filter tap $h_k$ (see eq. (2.42)), whereas for the Lagrange interpolator, a polynomial describes the entire range of $\mu$ covered by the $N_{\rm tap}$ filter taps. The Lagrange coefficients follow in closed form from the positions of the base points (see eq. (2.49)) and can therefore be easily calculated, whereas the coefficients of the classical polynomial interpolators are a result of optimization techniques. Furthermore, the degree of the Lagrange interpolator is always $N_{\rm tap}-1$, whereas $M$ can be freely chosen for the polynomial-based filters [53]. Due to the simple computation of the Lagrange coefficients and the good filter characteristics, the Lagrange interpolator is commonly applied in digital communications.

#### 2.3.2.4 Lagrange Interpolator - Filter Design

The interpolator is part of the overall communication system and thus its design depends on a variety of factors. The approximation error of the interpolator and the quantization error caused by a limited bit resolution of the coefficients can be interpreted as an additional noise source which affects the SNR of the signal and therefore the bit error ratio (BER) as well. Furthermore, the signal bandwidth at a given processing rate is decisive for the amount of aliasing [55]. In the following, the criteria for the design of a Lagrange interpolator are discussed. For this purpose, the impulse and frequency responses of the Lagrange interpolator are analyzed and simulated, which provide adequate insights into the characteristics of the filter.

In contrast to a conventional FIR filter, which has $N_{\rm tap}$ base points and provides a single output value, the Lagrange interpolator in its general form in eq. (2.45) uses $N_{\rm tap}$ base points to provide $N_{\rm tap}-1$ interpolants for the intervals between the input samples. Since the Lagrange coefficients $l_k(t)$ are different polynomials in $t$ (or here $\mu$), the associated impulse response of the interpolator is piecewise polynomial and can be interpreted as $N_{\rm tap}-1$ separate impulse responses, each corresponding to one of the $N_{\rm tap}-1$ interpolation intervals between the $N_{\rm tap}$ base points [51, 57]. Essentially, an interpolator upsamples and subsequently filters its input and can be represented by its subfilters. To compare the Lagrange interpolator with such a filtering process and to visualize the impulse responses within the interpolation intervals, an upsampling factor $U_{\uparrow}$ is introduced. This allows the time delay in eq. (2.45) to be formulated as $\mu = n/U_{\uparrow}$. As an example, a Lagrange interpolator with degree two and $N_{\rm tap} = 3$ is considered. Thus, three base points span two interpolation intervals. Using eq. (2.45), the filter output for each interval with index $n$ can be computed by

$$y_n = \sum_{k=-1}^{1} l_k \left(\frac{n}{U_{\uparrow}}\right) x_k , \qquad (2.51)$$

where n can lie within the different intervals

$$-U_{\uparrow} < n \le 0$$

$$0 < n \le U_{\uparrow}.$$
(2.52)

Comparing eq. (2.51) with the convolution operation of a conventional FIR filter h, eq. (2.51) can be expressed as

$$y_n = \sum_{k=-1}^{1} h_{n-kU_{\uparrow}} x_k \,. \tag{2.53}$$

Comparing eq. (2.51) and eq. (2.53) and using eq. (2.49) for odd  $N_{\rm tap}$  to obtain the Lagrange coefficient, the different impulse responses per interval are [51]

$$\begin{aligned} h_{n+U_{\uparrow}} &= l_{-1} \left( \frac{n}{U_{\uparrow}} \right) = \frac{1}{2} \frac{n^{2}}{U_{\uparrow}^{2}} - \frac{1}{2} \frac{n}{U_{\uparrow}} \\ h_{n} &= l_{0} \left( \frac{n}{U_{\uparrow}} \right) = -\frac{n^{2}}{U_{\uparrow}^{2}} + 1 \\ h_{n-U_{\uparrow}} &= l_{1} \left( \frac{n}{U_{\uparrow}} \right) = \frac{1}{2} \frac{n^{2}}{U_{\uparrow}^{2}} + \frac{1}{2} \frac{n}{U_{\uparrow}}. \end{aligned} \tag{2.54}$$

Fig. 2.11 shows the concatenated piecewise impulse responses for the two interpolation intervals of the Lagrange interpolator with an upsampling factor  $U_{\uparrow} = 5$ . Choosing a higher upsampling factor will yield a smoother impulse response.



Fig. 2.11: Impulse responses h for the interpolation intervals  $-U_{\uparrow} < n \le 0$  (left) and  $0 < n \le U_{\uparrow}$  (right) of the degree-two Lagrange interpolator and an upsampling factor of  $U_{\uparrow} = 5$ .

It is apparent that both impulse responses are not symmetrical to the y-axis, i.e., $h_k \neq h_{-k}$, because the interpolation intervals do not lie symmetrically in the middle of all base points. Therefore, the filter does not have a linear spectral phase. This is true for any Lagrange interpolator with an odd number of base points [51]. It remains to be investigated what the impulse responses for even $N_{\rm tap}$ look like. As before, eq. (2.51) to eq. (2.54) can be formulated for a Lagrange filter with degree three and $N_{\rm tap} = 4$ as

$$y_n = \sum_{k=-1}^{2} l_k \left(\frac{n}{U_{\uparrow}}\right) x_k$$

$$= \sum_{k=-1}^{2} h_{n-kU_{\uparrow}} x_k$$
(2.55)

with the  $x_k$  samples spanning the interpolation intervals

$$-U_{\uparrow} < n \le 0$$

$$0 < n \le U_{\uparrow}$$

$$U_{\uparrow} < n \le 2U_{\uparrow}.$$
(2.56)

Since now four base points span three intervals, four piecewise polynomial impulse responses result:

$$h_{n+U_{\uparrow}} = l_{-1} \left( \frac{n}{U_{\uparrow}} \right) = -\frac{1}{6} \frac{n^{3}}{U_{\uparrow}^{3}} + \frac{1}{2} \frac{n^{2}}{U_{\uparrow}^{2}} - \frac{1}{3} \frac{n}{U_{\uparrow}}$$

$$h_{n} = l_{0} \left( \frac{n}{U_{\uparrow}} \right) = \frac{1}{2} \frac{n^{3}}{U_{\uparrow}^{3}} - \frac{n^{2}}{U_{\uparrow}^{2}} - \frac{1}{2} \frac{n}{U_{\uparrow}} + 1$$

$$h_{n-U_{\uparrow}} = l_{1} \left( \frac{n}{U_{\uparrow}} \right) = -\frac{1}{2} \frac{n^{3}}{U_{\uparrow}^{3}} + \frac{1}{2} \frac{n^{2}}{U_{\uparrow}^{2}} + \frac{n}{U_{\uparrow}}$$

$$h_{n-2U_{\uparrow}} = l_{2} \left( \frac{n}{U_{\uparrow}} \right) = \frac{1}{6} \frac{n^{3}}{U_{\uparrow}^{3}} - \frac{1}{6} \frac{n}{U_{\uparrow}}.$$
(2.57)

Fig. 2.12 depicts the three impulse responses for the interpolant within the three intervals for a degree-three Lagrange interpolator. Again, it can be seen that the impulse responses of the intervals $-U_{\uparrow} < n \le 0$ and $U_{\uparrow} < n \le 2U_{\uparrow}$ exhibit a nonlinear spectral phase. However, the impulse response of the center interval $0 < n \le U_{\uparrow}$ is symmetrical to the y-axis and thus has a linear phase. It can be concluded that a linear-phase Lagrange interpolator can be realized by calculating only the interpolant in the center interval. This is analogous to the conventional FIR filter, where only one output value is computed with $N_{\rm tap}$ filter taps, and agrees with the statement for $0 \le \mu = n/U_{\uparrow} < 1$ in eq. (2.47).
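The symmetry argument can be reproduced numerically. The sketch below builds the concatenated impulse response $h_{n-kU_{\uparrow}} = l_k(n/U_{\uparrow})$ for one interpolation interval and tests whether $h_j = h_{-j}$ holds. It is only an illustration: the helper names and the dictionary representation of $h$ are assumptions made here, and exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def lagrange(k, t, ks):
    """Lagrange coefficient l_k(t) for integer base points ks, eq. (2.46)."""
    out = Fraction(1)
    for i in ks:
        if i != k:
            out *= (t - i) / Fraction(k - i)
    return out


def impulse_response(n_tap, interval, up):
    """Map j -> h_j for n in (interval*up, (interval+1)*up], with j = n - k*up."""
    ks = range((-n_tap) // 2 + 1, n_tap // 2 + 1)
    h = {}
    for n in range(interval * up + 1, (interval + 1) * up + 1):
        t = Fraction(n, up)
        for k in ks:
            h[n - k * up] = lagrange(k, t, ks)
    return h


def is_symmetric(h):
    """True if h_j == h_{-j} for all j (missing entries count as zero)."""
    return all(h.get(-j, 0) == value for j, value in h.items())


U = 5
print(is_symmetric(impulse_response(3, -1, U)))  # degree two, left interval
print(is_symmetric(impulse_response(3, 0, U)))   # degree two, right interval
print(is_symmetric(impulse_response(4, 0, U)))   # degree three, center interval
```

Under these assumptions, only the center interval of the degree-three interpolator yields a symmetric response, in line with the observations from Fig. 2.11 and Fig. 2.12.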

Next, the properties related to the order of the Lagrange filter are investigated. A more accurate approximation of the ideal lowpass and, hence, better filter properties come at the price of an increased hardware complexity, since more base points need to be processed. In the following, the frequency response of the linear-phase Lagrange interpolator is analyzed with respect to the signal oversampling. Given the impulse responses by eq. (2.49), the frequency response is obtained as the Fourier transform of the impulse response. Fig. 2.13 shows the impulse response and corresponding frequency response of the interpolators with degrees $N_{\rm tap}-1\in\{3,5,7,9\}$.

Fig. 2.12: Impulse responses h for the three interpolation intervals of the degree-three Lagrange interpolator with an upsampling factor of $U_{\uparrow} = 5$.

Fig. 2.13: Impulse response (left) and frequency response (right) for odd-degree Lagrange interpolators with $N_{\rm tap}-1 \in \{3,5,7,9\}$. Smooth curves are obtained for tenfold oversampling and a $2^{11}$-point FFT.

The impulse response for higher-degree polynomials is broader and more closely approaches a sinc function. The broader impulse response contains more zeros, which increases the attenuation in the spectrum around multiples of the sampling frequency. These are the positions coinciding with the spectral images of the upsampled signal (see appendix B). A broader attenuation band thus reduces aliasing. Furthermore, a wider passband around zero frequency can be recognized for higher degrees. While for a degree-three Lagrange interpolator the $-40$-dB bandwidth at the first multiple of the sampling rate is about $0.41 \times f_{\mathrm{sa}}$, for the interpolators of degrees five, seven, and nine it is already $0.52 \times f_{\rm sa}$, $0.59 \times f_{\rm sa}$, and $0.64 \times f_{\rm sa}$, respectively. The attenuation of the first side lobe at the sampling rate is $-29$, $-31$, $-32$, and $-33$ dB for a degree of three, five, seven, and nine, respectively. It can be clearly seen that the filter quality improves for larger $N_{\rm tap}$, but at the price of an increased computational complexity. Besides the interpolator frequency response, the amount of aliasing when downsampling depends on the signal bandwidth and oversampling ratio. A symbol-rate DSP is preferred in low-power transceivers for short-distance transmission links, while in longer-distance coherent links an oversampled DSP sampling rate may be used. For this reason, the aliasing for different polynomial degrees and oversampling ratios of 1 Sa/Sym and 2 Sa/Sym is investigated. The latter is chosen since many clock recovery algorithms are investigated at twofold oversampling for convenience. Fig. 2.14 and Fig. 2.15 illustrate the signal spectrum of an oversampled receive signal without noise and with a root-raised cosine (RRC) pulse shape with 0.1 roll-off<sup>1</sup>, the interpolator frequency response magnitude, and the amount of aliasing obtained by multiplying the signal images with the filter's frequency response.

For 1 Sa/Sym, the baseband signal spectrum and its spectral images overlap. While for a raised cosine (RC) pulse shape the neighboring spectra result in a flat spectrum, for the RRC there is a bulge in the spectrum in the overlap region due to constructive interference of the frequency components, which results in ISI. As the signal spectrum is (almost) flat over all frequencies, the amount of aliasing is large. For the left edge of the first spectral image at the sample rate, even for higher polynomial degrees, the signal portion leading to aliasing does not fall below $-10\,\mathrm{dB}$, resulting in strong additional noise on the interpolated signal.

<sup>1</sup> For convenience, a low roll-off RRC pulse shape is assumed, as it would be used in coherent transmission. Although no digital pulse shaping is applied in IM/DD systems, the NRZ signal's sinc-shaped spectrum will result in a more rectangular spectrum due to bandwidth limitations and, hence, the signal spectra with images will look similar to the ones shown here.

**Fig. 2.14:** Visualization of aliasing for degree three, five, seven, and nine Lagrange interpolators at 1 Sa/Sym and for an RRC pulse shape with 0.1 roll-off. The amount of aliasing when downsampling is indicated by the yellow-shaded area which arises due to the multiplication of the signal with its images and the interpolator filter frequency response.

**Fig. 2.15:** Visualization of aliasing for degree three, five, seven, and nine Lagrange interpolators at 2 Sa/Sym and for an RRC pulse shape with 0.1 roll-off. The amount of aliasing when downsampling is indicated by the yellow-shaded area which arises due to the multiplication of the signal with its images and the interpolator filter frequency response.

On the other hand, the twofold oversampled signal in Fig. 2.15 shows a reduced normalized bandwidth and hence less aliasing. Whereas the amount of aliasing decreases rapidly towards higher-degree Lagrange polynomials, almost falling below $-60\,\mathrm{dB}$ for a degree of nine, a simple Lagrange interpolator of degree three already features aliasing lobes below $-30\,\mathrm{dB}$. Since this is sufficiently low and only requires $4\times4=16$ filter coefficients, it is commonly used in practical communication systems. Since some coefficients are trivial operations like 0, +1, or -1, only 10 non-trivial coefficients have to be implemented in hardware. The filter architecture for a degree-three Lagrange interpolator with the filter coefficients shown in eq. (2.57) in a Farrow filter structure (see eq. (2.50)) is depicted in Fig. 2.16.
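The degree-three Farrow structure can be sketched in a few lines of Python. The coefficient matrix below is read off from eq. (2.57), with one row per power of $\mu$ and one column per tap $k \in \{-1, 0, 1, 2\}$. The function name `farrow_cubic` and the list-based signal representation are assumptions for illustration, not the hardware architecture of Fig. 2.16.

```python
# Coefficients c_{k,m} from eq. (2.57): rows are powers mu^0..mu^3,
# columns are the taps k = -1, 0, 1, 2.
C = [
    [0.0,      1.0,  0.0,  0.0],      # mu^0
    [-1 / 3,  -0.5,  1.0, -1 / 6],    # mu^1
    [0.5,     -1.0,  0.5,  0.0],      # mu^2
    [-1 / 6,   0.5, -0.5,  1 / 6],    # mu^3
]


def farrow_cubic(x, n, mu):
    """Interpolate between x[n] and x[n+1] at fractional offset mu in [0, 1)."""
    window = [x[n - 1], x[n], x[n + 1], x[n + 2]]
    # One FIR filter per power of mu (bracketed term in eq. (2.50)).
    branches = [sum(c * s for c, s in zip(row, window)) for row in C]
    # Horner evaluation of sum_m mu^m * branches[m].
    y = 0.0
    for value in reversed(branches):
        y = y * mu + value
    return y


# Samples of the cubic p(i) = i^3 - 2i + 1; a degree-three interpolator
# reproduces any cubic exactly (up to rounding).
x = [i**3 - 2 * i + 1 for i in range(6)]
print(farrow_cubic(x, 2, 0.5))  # compare with p(2.5) = 11.625
```

The Horner form makes the pipeline character of the Farrow structure visible: the FIR branch outputs are independent of $\mu$, and only the final multiply-accumulate chain depends on the fractional delay.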

Fig. 2.16: Degree-three Lagrange interpolator in a Farrow filter structure.

## 2.3.3 Elastic Buffer

For oscillators that have a clock phase drift comprising several unit intervals or for a CFO between transmitter and receiver oscillators, an EB is required that can adjust integer unit interval delays. Fig. 2.17 shows the samples before and after clock recovery for a constant CFO where the receiver clock runs either slower or faster than the transmitter clock. In this case, the sampling phase of the too fast and too slow receiver clock increases ($\mu = +0.3$ per time instance) and decreases ($\mu = -0.3$ per time instance) linearly with time, respectively. For a too fast clock, there are more samples available than necessary. Therefore, if $m$ increases, one sample has to be skipped. The analogous case applies to a too slow receiver oscillator, where fewer samples are available than required. When $m$ decreases, one sample has to be used twice for sampling offset correction.
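The division of labor between EB and interpolator can be illustrated by splitting an accumulated sampling offset into an integer part $m$ and a fractional part $\mu$. The sketch below assumes the convention $\mu \in [0, 1)$, so that $m$ is obtained by rounding down; this convention and the helper name `split_offset` are assumptions made here and may differ from the implementation used in this thesis.

```python
from fractions import Fraction
from math import floor


def split_offset(total_offset):
    """Split an accumulated offset into integer delay m and fraction mu in [0, 1)."""
    m = floor(total_offset)
    mu = total_offset - m
    return m, mu


# Receiver clock too fast: the offset grows by +0.3 per time instance.
step = Fraction(3, 10)
for k in range(6):
    m, mu = split_offset(k * step)
    print(k, m, mu)
```

Whenever $m$ changes from one time instance to the next, the EB read address has to be adjusted by one sample, while the interpolator only ever sees the fractional part $\mu$.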

An EB is used to select the correct samples for the subsequent interpolator. The EB is a FIFO register in which the number of parallel read and write samples (parallelization factor $P$) as well as the read and write clock can differ. Fig. 2.18 shows an EB for a serial input and output data stream and for a parallel input and output data stream with parallelization $P=3$. In this example, the read and write clocks are identical. The correct samples for interpolation can thus be controlled simply by the read address, which is incremented/decremented by $m_k$. Since the sampling phase difference of the transmitter and receiver clock changes only slowly compared to the DSP clock rate, it is sufficient to correct a sampling offset for all $P$ parallel samples per clock cycle, which significantly simplifies the EB implementation. As an example, a large clock instability of $\pm 20$ ppm for a 140-GBd transceiver [5, 6], which samples at 150 GSa/s and features a 1 GHz ASIC-DSP clock rate, corresponds to a rate of $\pm 6$ MHz at which $m$ changes, which is still very low in relation to the DSP clock rate.
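The order of magnitude of this example can be reproduced with a short calculation. The sketch assumes that the $\pm 6$ MHz figure refers to the worst case in which transmitter and receiver oscillators each deviate by 20 ppm in opposite directions, i.e., a relative offset of 40 ppm; this interpretation is an assumption and is not stated explicitly in the text.

```python
def integer_delay_update_rate(ppm_per_oscillator, sample_rate_hz):
    """Rate (in Hz) at which the integer delay m changes.

    Assumption: both oscillators deviate by the given ppm value in opposite
    directions, so the relative frequency offset is twice that value.
    """
    relative_offset = 2 * ppm_per_oscillator * 1e-6
    return relative_offset * sample_rate_hz


rate_hz = integer_delay_update_rate(20, 150e9)  # about 6 MHz
dsp_clock_hz = 1e9
cycles_per_update = dsp_clock_hz / rate_hz      # DSP clock cycles between changes of m
print(rate_hz, cycles_per_update)
```

Under this assumption, $m$ changes roughly once every 170 DSP clock cycles, which supports applying one common integer correction to all $P$ parallel samples of a clock cycle.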

Fig. 2.17: Illustration of the time offset correction for a too fast (a) and too slow (b) receiver clock. The analog waveform (gray) is sampled with sampling period $T_{\rm sa}$ and results in $x_{k,\tau}$ shown in blue. $\mu$ is the fractional sampling offset corrected by the interpolator and $m$ is the integer delay corrected by the EB. The corrected samples $x_{k,\tau=0}$ are shown in red.

Fig. 2.18: General overview of an EB with (a) serial sample in- and output as well as (b) parallel sample in- and output.

A hardware design constraint is the finite memory depth of the EB. This is no concern for systems with FB clock recovery, where the receiver oscillator frequency locks to the transmitter oscillator frequency and thus the CFO is zero. In this case, the EB memory depth only has to cover a clock phase walk of a few unit intervals caused by the remaining phase noise. However, for free-running oscillators, a CFO leads to a constantly accumulating sampling phase offset. At some point, this leads to the read address reaching the end of the EB. A receiver clock that is too fast will lead to a buffer underflow. In this case, the EB and the entire DSP must pause for one clock cycle and reset the EB read address to the initial value. A receiver clock that is too slow will lead to a buffer overflow. In this case, the EB has to drop some samples and reset the EB read address to the initial value in order to "catch up" with the transmitter clock. As this leads to information being lost, it is important to prevent this scenario. One option would be to profit from the advantages of both FB clock control and FF clock recovery by implementing both architectures [15]. Another option for an all-digital implementation is to operate the receiver clock slightly faster than the transmitter clock [2]. However, this solution requires two separate clock circuits in a transceiver design since the transmitter and receiver clock frequencies are different, which in turn leads to increased analog complexity. Chapter 3 of this thesis presents a novel EB control method that allows the use of free-running oscillators while assuring synchronization when the receiver clock is too slow or too fast without any loss of samples.

## 2.3.4 Phase-Locked Loop

In order to emulate the clock recovery's capability to track a CFO, the entire clock recovery architecture has to be implemented. That includes the digital PLL design for the case of an FB synchronization. Fig. 2.19(a) shows the basic structure of a digital PLL. Here, the phase detector comprises the timing corrector and TED (see Fig. 2.5 for comparison). The error value is forwarded to the LF, which is often a proportional-integral control, as shown in the lower part of Fig. 2.19(a). The proportional arm multiplies the error signal by $c_{\rm p}$, while the integral arm multiplies by $c_{\rm i}$ and additionally accumulates past values. The integral arm ensures a constant control loop output for $\hat{\varepsilon} \to 0$, while the proportional arm accelerates the convergence of the loop for large error signals in the initial phase $t = 0$. The choice of $c_{\rm p}$ and $c_{\rm i}$ determines the control loop bandwidth and hence the convergence time as well as the stability. For an ideal control loop without an FB loop delay, i.e., $D_{\rm L} = 0$, the loop coefficients can be derived from the loop bandwidth $B_{\rm L}$ that is normalized to the sample rate, the damping factor $\zeta_{\rm L}$, the TED sensitivity $c_{\rm d}$, which is the derivative of the s-curve at $\tau - \hat{\tau} = 0$ (see Fig. 2.19(b)), and the natural frequency $\omega_{\rm n}$ [59]

$$\omega_{\rm n} = \frac{8B_{\rm L}\zeta_{\rm L}}{1 + 4\zeta_{\rm L}^2} \tag{2.58}$$

as

$$c_{\rm i} = \frac{\omega_n^2}{c_{\rm d}}$$

$$c_{\rm p} = \frac{2\zeta_{\rm L}}{\sqrt{\frac{c_{\rm d}}{c_{\rm i}}}}.$$
(2.59)
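For the delay-free case, eq. (2.58) and eq. (2.59) translate directly into a few lines of code. The sketch below also includes a simple PI loop filter with a proportional and an accumulating integral arm as described above. The names `pi_loop_coefficients` and `PILoopFilter` are illustrative assumptions, and the TED sensitivity $c_{\rm d} = 1$ in the example is an arbitrary placeholder value.

```python
from math import sqrt


def pi_loop_coefficients(b_l, zeta, c_d):
    """Loop coefficients for D_L = 0 according to eq. (2.58) and eq. (2.59)."""
    omega_n = 8 * b_l * zeta / (1 + 4 * zeta**2)
    c_i = omega_n**2 / c_d
    c_p = 2 * zeta / sqrt(c_d / c_i)
    return omega_n, c_p, c_i


class PILoopFilter:
    """Proportional-integral loop filter: c_p * err + accumulated c_i * err."""

    def __init__(self, c_p, c_i):
        self.c_p = c_p
        self.c_i = c_i
        self.acc = 0.0  # state of the integral arm

    def step(self, err):
        self.acc += self.c_i * err
        return self.c_p * err + self.acc


# Example with the loop parameters used later in this section;
# c_d = 1.0 is a placeholder for the TED sensitivity.
omega_n, c_p, c_i = pi_loop_coefficients(b_l=0.005, zeta=0.707, c_d=1.0)
loop_filter = PILoopFilter(c_p, c_i)
print(omega_n, c_p, c_i, loop_filter.step(0.1))
```

Note that eq. (2.59) is equivalent to $c_{\rm p} = 2\zeta_{\rm L}\omega_{\rm n}/c_{\rm d}$, which follows from inserting $c_{\rm i} = \omega_{\rm n}^2/c_{\rm d}$ into the square root.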

Fig. 2.19: (a) Basic PLL architecture with detailed view of the loop filter. (b) Exemplary s-curve with its derivative at the zero-crossing, which yields $c_{\rm d}$.

However, in a practical hardware implementation, the loop stability in an FB scheme suffers from an inner-loop delay $D_{\rm L}$, introduced by pipelining, filtering, or other inner-loop mechanisms, which can become very large in optical systems [60, 61]. This makes the PLL a higher-order control loop. To remain stable, $B_{\rm L}$ must be reduced or $\zeta_{\rm L}$ increased, which in turn increases the PLL acquisition time, an important factor in burst-switched systems. The optimum parameters can be found using numerical optimization. In this thesis, a closed-form description of the PLL parameters based on the dominant-pole method is used instead, in which only the dominant poles in the Laplace domain are retained to approximate the higher-order system by a 2nd-order loop [62]. This provides the advantage that the PLL parameters can be calculated analytically instead of performing extensive optimizations. Here, the auxiliary variables $R$ and $\theta$

$$R = e^{-\omega_n \zeta_L}$$

$$\theta = \omega_n \sqrt{1 - \zeta_L^2}$$
(2.60)

are introduced to derive B and C

$$B = R^{D_{L}+1} \cos(\theta(D_{L}+1)) - 2R^{D_{L}} \cos(\theta D_{L}) + R^{D_{L}-1} \cos(\theta(D_{L}-1))$$

$$C = R^{D_{L}+1} \sin(\theta(D_{L}+1)) - 2R^{D_{L}} \sin(\theta D_{L}) + R^{D_{L}-1} \sin(\theta(D_{L}-1)).$$
(2.61)

Using this, the loop coefficients  $c_i$  and  $c_p$  normalized to  $c_d$  can be computed as

$$c_{\rm p} = -\frac{\frac{C}{R\sin(\theta)}}{c_{\rm d}}$$

$$c_{\rm i} = \frac{\frac{1 - R\cos(\theta)}{R\sin(\theta)}C - B}{c_{\rm d}}.$$
(2.62)

The PLL implemented in this thesis features a PI LF with a bandwidth of $B_{\rm L}=0.005$ and a damping factor of $\zeta_{\rm L}=0.707$ [59]. Fig. 2.20 shows the step responses of the FF and FB clock recovery schemes for a constant sampling phase offset of 0.5 and -0.5 of an NRZ signal at twofold oversampling using the simulation setup shown in Fig. 2.22. The instantaneous timing estimation is directly evident for the FF architecture. The sampling offsets of 0.5 and -0.5 are estimated with 128 symbols and 2048 symbols, respectively. Due to the stronger averaging at 2048 symbols, the estimation is less noisy over time. For the FB case, only a block length of 128 symbols is chosen, since the temporal response for a block length of 2048 symbols is similar but requires more time to converge. The sampling offsets of 0.5 and -0.5 are simulated for a loop delay of $D_{\rm L}=70$ and $D_{\rm L}=20$ clock cycles (256 samples per clock cycle), respectively. For both sampling offsets, the timing estimation is shown for the case without additional delay, for the case with delay while keeping the 2nd-order PLL parameters of the delay-free loop, and for the loop parameters optimized according to the dominant-pole method. In this example, the additional lowpass characteristic of the LF becomes apparent, which is particularly suitable for continuous-transmission communication systems with low high-frequency jitter. In burst-switched systems, however, the PLL must repeatedly converge to new bursts of signals. In addition, the control loop instability caused by FB delays must be taken into account. While this is not yet noticeable for a delay of 20 clock cycles, the loop becomes less stable for a delay of 70 cycles when using the conventional PLL parameters. Optimizing the PLL parameters improves the stability, but at the expense of an increased acquisition time. Here, it takes about 800 timing estimates, i.e., $800 \times 256 = 204,800$ samples or 102,400 symbols at twofold oversampling, until the loop converges, corresponding to an acquisition time of $1.024~\mu s$ assuming a 100-GBd system.



Fig. 2.20: Step responses for FF and FB schemes for a simulated 50-GBd signal with an SNR of 15 dB and  $f_{\rm 3dB}$  =35 GHz. For FF, the mod-BAJ algorithm is compared for a block length of 128 symbols and 2048 symbols. For FB, the Gardner algorithm is compared for an inner-loop delay of  $D_{\rm L}$  = 70 and  $D_{\rm L}$  = 20 and PLL design parameters  $B_{\rm L}$  = 0.005 and  $\zeta_{\rm L}$  = 0.707.

# 2.4 Digital Clock Recovery Performance Benchmark

In the following section, the advantages and disadvantages of the digital FF and FB clock recovery architectures will be discussed on the basis of simulated performance evaluations. To this end, the performance metric for determining the self-noise jitter of TEDs and TEs is first introduced and explained. Afterwards, the jitter for various system parameters of a 50-GBd IM/DD system is shown. A final subsection is dedicated to the performance for a transmitter and receiver CFO.

#### 2.4.1 Jitter Metric for Clock Recovery Algorithms

In addition to the VCO jitter discussed in section 2.1, the TED and TE algorithms also exhibit a degree of uncertainty in the timing estimation, known as self-noise jitter [63]. When investigating novel clock recovery algorithms, the literature often only refers to a (self-noise) jitter and assumes ideal physical properties of the oscillators, i.e., no jitter generation and transfer of the preceding systems. In this thesis, the self-noise jitter is also used to evaluate the algorithms. When comparing the performance of TEDs and TEs, it is important to consider the different algorithm architectures. While the TE in FF schemes has a linear relationship between the clock phase estimate $\hat{\tau}$ and the actual clock phase offset $\tau$, the TED in FB schemes provides an error estimate $\hat{\varepsilon}$ related to the remaining clock phase offset $\tau - \hat{\tau}$, usually as a sine function. In the latter case, the timing error over the sampling phase offset is therefore also referred to as the s-curve. Fig. 2.21 shows a selection of simulated linear curves and s-curves for various SNRs for the mod-BAJ (top row) and the Godard (bottom row) scheme, respectively.

**Fig. 2.21:** Simulated timing estimates $\hat{\tau}$ (mod-BAJ algorithm, upper row) and normalized timing error values $\hat{\varepsilon}$ (Godard algorithm, lower row) for an actual clock phase $\tau$ and clock phase offset $\tau - \hat{\tau}$. Each column corresponds to an electrical SNR of 30 dB, 14 dB, and 6 dB normalized to the symbol rate. For each computation of the timing error $\hat{\varepsilon}$ or timing estimate $\hat{\tau}$, a random two-level pulse amplitude modulation (PAM2) waveform with a constant sampling offset $\tau$ and with N samples is simulated. Simulation parameters: PAM2 modulation, N=256 samples, 3-dB component bandwidth $f_{\rm 3dB}=35$ GHz, 25,000 constant timing offset realizations.

A frequently applied metric when benchmarking clock recovery algorithms is the tracking jitter $J$ [1, 35], which is described as the variance $var(\tau - \hat{\tau}) = std^2(\tau - \hat{\tau})$ (or mean-squared error) in decibel. It quantifies the statistical spread with which a certain clock offset can be compensated. For a PLL, this describes the steady-state error, i.e., the difference between the desired value $\tau$ and the actual value $\hat{\tau}$ of a converged control system at $t \to \infty$ and $\hat{\varepsilon} \to 0$. Hence, the jitter can be understood as the horizontal width of the s-curve at the zero-crossing

$$J_{FB} = 20 \log_{10} \left( \operatorname{std}(\tau - \hat{\tau}) \right) \Big|_{\hat{\varepsilon}=0}. \tag{2.63}$$

However, this approach cannot be applied to FF architectures, as these do not employ a control loop. In this case, a clock phase $\hat{\tau}$ is directly estimated from the input signal with sampling offset $\tau$. Since the jitter is defined as the difference of both values, it corresponds to the vertical deviation of the clock phase estimate from the actual clock phase in the linear curve. To avoid unwrapping at $\tau = \pm 0.5$ and considering that the deviation is the same for all $\tau$, the jitter for TEs is defined at the vertical zero-crossing as

$$J_{FF} = 20 \log_{10} \left( \operatorname{std}(\tau - \hat{\tau}) \right) \Big|_{\tau=0}$$
 (2.64)

In order to later examine the bandwidth limitations of the respective architectures for a CFO, the entire PLL is implemented for the FB architecture. The control loop then converges to the estimated sampling phase  $\hat{\tau}$ . In this case, the simulated clock phase offset  $\tau$  of the signal is simply compared to the derived/estimated clock phase  $\hat{\tau}$  for the FB/FF system and the jitter can be computed as  $J=20\log_{10}\left(\operatorname{std}(\hat{\tau}-\tau)\right)$  for any  $\tau$ . To avoid an uncertainty of the jitter when the estimated clock phase  $\hat{\tau}$  is wrapped at  $\tau=\pm\,0.5$ , the clock phase is unwrapped before calculation.
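A minimal sketch of this jitter computation is given below. Instead of unwrapping the estimated clock phase as described above, the sketch wraps the estimation error $\hat{\tau} - \tau$ into the interval $[-0.5, 0.5)$, which is a simplification chosen here for brevity. The function names are illustrative assumptions, and the population standard deviation is used.

```python
from math import log10
from statistics import pstdev


def wrap(error):
    """Wrap a timing error (in unit intervals) into [-0.5, 0.5)."""
    return (error + 0.5) % 1.0 - 0.5


def jitter_db(tau_true, tau_est):
    """Jitter J = 20*log10(std(tau_est - tau_true)) with wrapped errors."""
    errors = [wrap(est - true) for true, est in zip(tau_true, tau_est)]
    return 20 * log10(pstdev(errors))


# Toy example: estimates scatter by +/-0.01 UI around the true clock phase.
tau_true = [0.0, 0.0, 0.0, 0.0]
tau_est = [0.01, -0.01, 0.01, -0.01]
print(jitter_db(tau_true, tau_est))  # close to -40 dB
```

Wrapping the error avoids an artificially large jitter when an estimate close to $+0.5$ is reported as a value close to $-0.5$ or vice versa.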

#### 2.4.2 Clock Recovery Performance Evaluation

The presented FB and FF clock recovery algorithms are investigated for different system and algorithm parameters. For this purpose, a 50-GBd system with IM/DD is simulated, which is impaired by CD and AWGN, as displayed in Fig. 2.22. At the transmitter, a sequence of PAM2 or four-level pulse amplitude modulation (PAM4) symbols is generated and each symbol is repeated eight times to model NRZ pulses in the pulse shape block. Bandwidth limitations of a DAC and other analog components at the transmitter are taken into account by applying a 5th-order Bessel lowpass filter with a 3-dB bandwidth of $f_{3dB}$. This signal is offset by the minimum bias necessary to obtain a positive optical power, i.e., modeling the ideal case of a modulator with infinite extinction ratio (ER). The optical field amplitude is obtained by computing the square root. To simulate the fiber-optic channel of length $L$, the resulting signal is impaired by CD for a certain CD coefficient $D_{\rm CD}$. At the receiver side, the signal is detected by a photodiode, which is modeled as a square-law detector, and AWGN is added to set a certain electrical SNR that is normalized to the symbol rate. The resulting signal is once again filtered by a Bessel lowpass and downsampled to either 2 Sa/Sym or 4 Sa/Sym for the OEM method. Finally, a sampling offset $\tau$ is added before the clock recovery is applied to the signal.



**Fig. 2.22:** Simulated IM/DD link for clock recovery algorithm evaluation. At the transmitter, a PAM2/PAM4 NRZ signal is generated and impaired by CD in the fiber. At the receiver, after square-law detection, AWGN is added and a lowpass filter is applied to emulate bandwidth limitations. After downsampling to 2 Sa/Sym or 4 Sa/Sym, a sampling offset is added to the sequence, which is then fed to the clock recovery module.
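A strongly reduced version of this simulation chain can be sketched as follows. The sketch assumes NumPy only and covers the default case $L = 0\,\mathrm{km}$; the Bessel lowpass filters, the CD block, and the sampling offset are deliberately omitted. The function name `simulate_imdd` and the noise scaling (noise power referred to a bandwidth equal to the symbol rate) are assumptions made for illustration.

```python
import numpy as np

def simulate_imdd(n_sym=4096, levels=2, sps=8, snr_db=15.0, out_sps=2, seed=0):
    """Toy IM/DD link: PAM -> NRZ -> bias -> sqrt -> square law -> AWGN -> downsample."""
    rng = np.random.default_rng(seed)
    sym = rng.integers(0, levels, n_sym).astype(float)   # PAM levels 0..levels-1
    s = np.repeat(sym, sps)                # NRZ: repeat each symbol sps times
    power = s - s.min()                    # minimum bias, i.e., infinite ER
    field = np.sqrt(power)                 # optical field amplitude
    # Bessel lowpass filters and CD are omitted here (L = 0 km).
    current = np.abs(field) ** 2           # square-law photodiode
    snr = 10.0 ** (snr_db / 10.0)
    # Assumption: SNR is defined in a noise bandwidth equal to the symbol rate,
    # so the per-sample noise variance scales with the oversampling factor.
    noise_var = np.var(current) / snr * sps
    rx = current + rng.normal(0.0, np.sqrt(noise_var), current.size)
    return rx[:: sps // out_sps], sym      # downsample to out_sps Sa/Sym

rx, sym = simulate_imdd()
print(rx.size, "samples for", sym.size, "symbols")
```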

For the parameter sweeps shown in Fig. 2.23, a constant fractional sampling offset $\tau \in (-0.5, 0.5]$ normalized to the symbol period is added for 14,000 different values of $\tau$ in order to obtain the linear curve/s-curve as depicted in Fig. 2.21 and to compute the jitter. Unless stated otherwise, $L = 0\,\mathrm{km}$ (no CD), $f_{\mathrm{3dB}} = 35\,\mathrm{GHz}$, a block length of $M_{\rm B} = N/2 = 128$ symbols, and an SNR of 15 dB were chosen as default parameters. First, the jitter for PAM2 and PAM4 modulation is investigated. As can be seen in Fig. 2.23(a), the performance degrades with increasing noise but does not differ between the modulation formats, although higher-order modulation formats require a higher SNR to achieve the same BER. This is because PAM2 and PAM4 have the same total signal power and hence the choice of the modulation format does not affect the generated clock tone. For this reason, the following simulations continue with PAM2 modulation. Furthermore, note that all algorithms perform the same, since they all rely on the cyclostationarity of the signal [1]. Next, it is investigated to what extent the block length $M_{\rm B}$ can improve the SNR sensitivity. For this purpose, block lengths of 128 symbols (dashed line) and 2048 symbols (solid line) are compared in Fig. 2.23(b). It can be noticed that the averaging effect due to the correlation in the FD improves the performance with increasing block length. For a block length of 2048 symbols, clock recovery can be guaranteed down to an SNR of 0 dB. Note that in case of narrow-bandwidth pulse shaping, e.g., for a RRC filter with a small roll-off factor, the overlap of the left and right spectral sidebands is limited to only a few frequency bins around half the symbol rate. This reduces the averaging effect of the linear phase in FD and hence results in noisier clock phase estimates. In this case, clock recovery suitable for low roll-off signals or faster-than-Nyquist (FTN) signals will help, as presented in chapter 4. Furthermore, it should be emphasized that additional averaging of the timing estimate, e.g., by averaging the complex-valued correlation terms before computing the angle (mod-BAJ, Zhu, OEM) or imaginary part (Godard) or by lowpass filtering due to the PLL LF, can further decrease the noise impairment. This allows retaining low jitter at extremely low SNRs of down to −20 dB. However, large block lengths and additional lowpass filtering come at the cost of increased computational effort, e.g., larger FFT sizes, and reduced tracking speed of the random clock phase walk (see chapter 6) [C3]. To investigate the influence of the block length $M_{\rm B}$ further, the SNR is now kept constant at 15 dB, while $M_{\rm B}$ is swept in Fig. 2.23(c). Although the jitter decreases for increasing block lengths, the trade-off between performance and complexity must be considered in practical algorithm design. Finally, the influence of bandwidth limitation is examined in Fig. 2.23(d), since this is an important system parameter in, e.g., high-speed PONs, in which 25G-components are commonly reused. In general, it can be seen that for bandwidths above half the symbol rate, the clock recovery works well, while below half the symbol rate the jitter increases rapidly.
The reason for this behavior is that the cyclostationary spectrum no longer extends into the second Nyquist zone. Hence, 25G-components do not affect the timing synchronization performance in 50-GBd PONs. In case of extremely bandwidth-limited systems, algorithms designed for FTN systems can be used (see chapter 4) [32, 64].



Fig. 2.23: Comparison of the simulated algorithm jitter performance for various parameter sweeps. Default parameters are PAM2 modulation, fiber length  $L=0\,\mathrm{km}$ , Bessel lowpass 3-dB bandwidth  $f_\mathrm{3dB}=35\,\mathrm{GHz}$ , block length  $M_\mathrm{B}=128\,\mathrm{symbols}$ , SNR = 15 dB. (a) SNR sweep for PAM2 (left figure) and PAM4 (right figure). (b) SNR sweep for 128 symbols and 2048 symbols for FB algorithms (left figure) and FF algorithms (right figure). (c) Block length sweep for a constant SNR of 15 dB (see eye diagram as inset). (d) Jitter over component bandwidth normalized to the symbol rate.

#### 2.4.3 Impact of Clock Frequency Offset on Clock Recovery

To emulate a CFO, a linearly increasing clock phase  $\tau$  is added to a PAM2 sequence with a length of 2<sup>21</sup> samples at twofold oversampling. For the sake of clarity and because of the identical performance for FF and FB algorithms, only the mod-BAJ algorithm for FF schemes and the Gardner algorithm for FB schemes are simulated. To investigate the bandwidth limitation of the entire clock recovery structure, the jitter metric as defined in subsection 2.4.1 is used. Fig. 2.24 depicts the jitter over CFO in ppm for the mod-BAJ algorithm and the Gardner algorithm implemented using the dominant-pole variant. Fig. 2.24(a) shows that FF algorithms can track high-frequency clock jitter up to thousands of ppm and are only limited in the temporal resolution given by the block processing. For a block length of 2048 symbols, the phase estimation update rate is  $50 \, \text{GHz} / 2048 \approx$ 24 MHz, which in turn corresponds to  $10^6 \times 24 \,\mathrm{MHz}/100 \,\mathrm{GHz} = 240 \,\mathrm{ppm}$ . This would correspond to a single timing estimate for each clock phase drift from zero to one. Due to this, the CFO has to be less than 240 ppm to still be sufficiently resolved in time. This explains the abrupt increase of the jitter above 200 ppm, while for shorter block lengths, frequency offsets larger than 1000 ppm can still be tracked. In addition to this limitation, the FB schemes are limited by the lowpass characteristic of the control loop, which is essentially determined by the inner-loop delay. Though for large block lengths the timing estimate has less jitter, the temporal resolution decreases and the time needed to converge to the optimum clock phase increases, thus limiting the tolerance for high-frequency jitter. Fig. 2.24(b) illustrates some examples for the clock phase tracking. In (1) and (2), the limited temporal resolution of the FF algorithms can be seen. 
Note that for simplicity, no interpolation between the timing estimates is implemented, i.e., the estimate is repeated for all samples of a block. An overlap of the processing blocks could increase the temporal resolution, allowing higher CFOs to be resolved. This can be easily realized for FF schemes in real-time systems by buffering the signal before the timing estimation, while a parallel FB is difficult to realize in FB schemes. In (3), the PLL converges to the optimum sampling point within 1 $\mu$s for $D_L = 20$. However, if the inner-loop delay is increased to $D_L = 70$ in (4), the PLL reacts too slowly and can no longer follow the phase drift.
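The temporal-resolution limit of the block-wise FF estimate can be reproduced with a short calculation. The following sketch assumes the 50-GBd symbol rate and twofold oversampling used above; the function name `max_cfo_ppm` is hypothetical. Without rounding the update rate to 24 MHz, the limit for 2048 symbols evaluates to about 244 ppm.

```python
SYMBOL_RATE = 50e9    # 50 GBd
SAMPLE_RATE = 100e9   # twofold oversampling

def max_cfo_ppm(block_len_sym):
    """CFO at which the clock phase drifts by one sample per timing estimate."""
    update_rate = SYMBOL_RATE / block_len_sym      # timing estimates per second
    return 1e6 * update_rate / SAMPLE_RATE

for m_b in (128, 512, 2048):
    print(f"M_B = {m_b:4d} symbols -> CFO limit of about {max_cfo_ppm(m_b):.0f} ppm")
```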



**Fig. 2.24:** Jitter performance for a signal impaired by CFO. (a) Jitter for a CFO sweep using the mod-BAJ algorithm (FF) and Gardner algorithm (FB). (b) Exemplary clock phase tracking for FF and FB schemes.

# 3 Elastic Buffer Design for All-Digital Clock Recovery

A linearly increasing sampling phase offset, resulting from free-running receiver oscillators, poses a significant challenge to the practical realization of all-digital clock recovery architectures. This chapter introduces a novel EB concept designed to prevent buffer overflows, thereby enabling robust all-digital clock recovery. The content of this chapter has been published as a preprint on arXiv in 2025 [P1]. The material has been adapted to align with the format and structure of this thesis.

[Beginning of paper [P1]]

#### Elastic Buffer Design for Real-Time All-Digital Clock Recovery Enabling Free-Running Receiver Clock with Negative and Positive Clock Frequency Offsets

P. Matalla, J. Dittmer, M. S. Mahmud, C. Koos, and S. Randel

Published on arXiv (2025)

Eprint: arXiv: 2507.13748 (eess.SP)

#### 3.1 Introduction

Clock recovery is a fundamental component in communication systems, enabling synchronization of the receiver's sampling clock with the transmitter's in both frequency and phase. In systems without DSP, this is achieved using an analog PLL that tunes the receiver VCO. Modern high-speed optical transceivers that employ a DSP, however, often generate the VCO control signal digitally. A TED and LF derive a digital control signal from the sampled received signal, which is then converted into a voltage using a low-speed DAC to control the VCO [4, 15]. Replacing the analog control path with a fully digital implementation eliminates the need for analog control circuits and the DAC and thus fully exploits the advantages of modern CMOS technology, e.g., enhanced power efficiency and reduced chip area [30]. An all-digital implementation is a prerequisite for a FF clock recovery architecture [J2, C1, C2]. In such architectures, the estimated sampling phase $\hat{\tau}$ is digitally corrected via a digital delay element. To do so, $\hat{\tau}$ is decomposed into an integer sampling period delay m and a fractional sampling period delay $\mu$. Afterwards, an EB, which is a type of FIFO register with variable read address as well as read/write clocks and bus widths that may differ, compensates for the integer delay by selecting the appropriate samples. These samples are then interpolated according to $\mu$, typically using a Lagrange interpolator. The overall FF clock recovery using the Zhu algorithm [44] for timing phase estimation is illustrated in Fig. 3.1.
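The decomposition of $\hat{\tau}$ into m and $\mu$ followed by interpolation can be sketched as below. This is an illustrative floating-point model, assuming NumPy and a four-tap (third-order) Lagrange interpolator; the function names, the sign convention (output sample k is taken at position $k + \hat{\tau}$), and the handling of the block edges are assumptions and do not describe the hardware implementation.

```python
import numpy as np

def cubic_lagrange_taps(mu):
    """Taps for samples at positions -1, 0, 1, 2 to evaluate at position mu in [0, 1)."""
    return np.array([
        -mu * (mu - 1.0) * (mu - 2.0) / 6.0,
        (mu + 1.0) * (mu - 1.0) * (mu - 2.0) / 2.0,
        -(mu + 1.0) * mu * (mu - 2.0) / 2.0,
        (mu + 1.0) * mu * (mu - 1.0) / 6.0,
    ])

def correct_timing(x, tau_hat):
    """Evaluate x at positions k + tau_hat (tau_hat in sampling periods)."""
    x = np.asarray(x, dtype=float)
    m = int(np.floor(tau_hat))     # integer delay, handled by sample selection
    mu = tau_hat - m               # fractional delay in [0, 1), handled by interpolation
    c = cubic_lagrange_taps(mu)
    # Only output indices for which all four input samples exist
    k = np.arange(max(0, 1 - m), min(len(x), len(x) - 2 - m))
    base = k + m
    y = c[0] * x[base - 1] + c[1] * x[base] + c[2] * x[base + 1] + c[3] * x[base + 2]
    return k, y

# A cubic polynomial is reproduced exactly by a third-order interpolator
x = np.arange(20.0) ** 3
k, y = correct_timing(x, 2.3)
print(np.allclose(y, (k + 2.3) ** 3))
```

In this model, the index arithmetic `base = k + m` plays the role of the EB sample selection, while the four taps correspond to the interpolator.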

A key limitation of all-digital clock recovery architectures arises in the presence of CFOs. In such cases, the EB must continuously compensate for the accumulating phase drift between the transmitter and receiver clocks. If the receiver clock is faster than the transmitter clock, the EB will be empty at some point (buffer underflow). This can be mitigated by temporarily pausing the DSP to allow the buffer to refill. Conversely, if the receiver clock is slower, the EB eventually overflows, requiring samples to be dropped, which results in irreversible data loss. To prevent such scenarios, hybrid clock recovery architectures are commonly employed. These combine a FF path for high-frequency jitter compensation with a FB loop that physically tunes the VCO [15]. One all-digital approach proposed in [2] suggests operating the receiver clock marginally faster than the transmitter clock to avoid an EB overflow. However, this method presents practical challenges. It requires either separate VCOs for the transmitter and receiver paths of a transceiver or a shared clock that is slightly up-converted to generate the receiver clock using a PLL. Achieving the necessary small frequency offset (within a standardized tolerance of about $\pm 20$ ppm, e.g., for a 400ZR standard [5]) requires PLLs with near-unity multiplication factors, which in turn require impractically high divider and multiplier values.
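The size of the required divider and multiplier values can be illustrated with exact rational arithmetic. The sketch below uses Python's `fractions` module; the 20-ppm offset is taken from the tolerance quoted above, while the divider limit of 1000 is an arbitrary value chosen for illustration.

```python
from fractions import Fraction

def pll_ratio(offset_ppm):
    """Exact multiplier/divider ratio for a clock that is offset_ppm faster."""
    return Fraction(1_000_000 + offset_ppm, 1_000_000)

ratio = pll_ratio(20)
print(ratio)                            # smallest integer multiplier/divider pair

# With a divider limited to 1000, the closest realizable ratio is exactly one,
# i.e., no frequency offset at all.
print(ratio.limit_denominator(1000))
```

Even in lowest terms, a 20-ppm offset needs a multiplier of 50001 and a divider of 50000.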

This paper presents an EB design that is capable of handling receiver CFOs lower and higher than the transmitter clock frequency, which facilitates the use of free-running oscillators in transceivers with all-digital clock recovery. The proposed EB is implemented on an FPGA for a real-time 30-Gbit/s on-off-keying (OOK) optical transmission, demonstrating robust, error-free operation across CFOs ranging from -400 to +400 ppm.



Fig. 3.1: FF clock recovery building blocks for DSP hardware implementation.

#### 3.2 Elastic Buffer Concept

A buffer overflow occurs when the receiver samples too slowly. While increasing the receiver sampling rate can prevent this, it is typically avoided due to the reasons mentioned earlier. Digital resampling faces similar limitations: despite efficient polyphase filterbank implementations, the high up- and downsampling ratio quickly leads to large hardware designs in parallel implementation. In this work, we propose to overclock the EB without altering the sampling rate, i.e., reading the samples faster from the EB than new samples are written. Again, a PLL to generate a faster read clock is not suitable due to the conversion ratio constraints and leads to crossing clock domains in hardware. Instead, in FPGA or ASIC implementations with parallel processing, overclocking is achieved by choosing a read bus width larger than the write bus width. This allows more samples to be read per cycle, with the read address incremented accordingly to prevent duplicate reads. As a result, buffer underflows occur regularly even under negative CFOs, but buffer overflows are effectively eliminated.

Fig. 3.2 illustrates the EB concept with an input of 3 parallel samples and an output of 4 parallel samples after a Lagrange interpolator. In each clock cycle, input samples are written into the register from the left, while the interpolator reads the required samples from the right. The gray-shaded output samples indicate the additional samples required by the interpolator memory. Since one additional sample is generated per cycle, the read address increments by one each cycle (blue arrows). In case of a phase wrap of the fractional delay, the integer delay has to adjust the address by $\pm 1$. For negative CFOs, the integer delay decreases, requiring one sample to be repeated (indicated by a red m-shift). Hence, the read address remains unchanged during that cycle (third register). If the read address reaches the buffer end (fourth register), the EB is reset in the next cycle and the subsequent DSP is paused to avoid double processing. This is signaled by setting the active-low enable signal en\_o high. A key question is how much negative CFO this method can tolerate. Assuming a fractional sampling offset is sampled in the interval [0,1) with only two estimates over time, a phase wrap takes place every second cycle, i.e., a sample must be repeated. However, since the read address still advances by one each cycle, even this worst-case scenario avoids buffer overflow. The green-shaded area indicates the samples used for timing estimation. Consequently, the timing offset compensation may incur a maximum latency of one clock cycle, depending on the data read address. However, this offset corresponds to extreme clock jitter or CFO and is negligible in practical scenarios.
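The behavior described above can be mimicked with a toy fill-level model. This is not the register and address logic of Fig. 3.2: the sketch only counts how many unread samples are stored, assumes that a repeated sample reduces the number of newly consumed samples by one, and pauses the read side for one cycle when too few samples are available. All names and the pause rule are assumptions for illustration.

```python
def simulate_eb(n_cycles, write_width=3, read_width=4, wrap_every=0, step=-1):
    """Toy elastic buffer: returns (maximum fill level, number of paused cycles).

    wrap_every: a phase wrap occurs every wrap_every cycles (0 disables wraps).
    step: -1 repeats one sample (negative CFO), +1 skips one sample.
    """
    fill = 0          # unread samples currently stored
    pending = 0       # integer-delay change that has not been applied yet
    max_fill = 0
    pauses = 0
    for cycle in range(n_cycles):
        fill += write_width                      # new samples are written
        max_fill = max(max_fill, fill)
        if wrap_every and cycle % wrap_every == 0:
            pending += step
        need = read_width + pending              # new samples consumed this cycle
        if fill >= need:
            fill -= need
            pending = 0
        else:
            pauses += 1                          # underflow: subsequent DSP is paused
    return max_fill, pauses

print(simulate_eb(1000))                   # no phase wraps
print(simulate_eb(1000, wrap_every=2))     # worst case: repeat every second cycle
```

In both cases the fill level stays bounded while paused cycles still occur, which is the qualitative behavior the concept relies on.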



Fig. 3.2: EB concept demonstrating the necessity of repeating a sample and the case of a buffer underflow.

#### 3.3 FPGA Implementation

To validate the proposed EB concept, we implement the entire FF clock recovery architecture on an AMD Virtex UltraScale+ XCVU9P FPGA. The hardware modules, including bus widths and port names, are depicted in Fig. 3.1, with bus widths formatted as "number of samples × bit width per sample". The system processes 256 parallel input samples per cycle with nominally 2 samples per symbol oversampling. Unless noted otherwise, all data is represented as signed integers. First, the input samples are split into two paths. In the lower path, the sampling phase normalized to the sampling period is estimated using the Zhu algorithm [C1, C2, 44] with a block size of 256 samples (one clock cycle), followed by an MA over 16 cycles for smoothing the timing estimate (as described in [J2]). This produces one sampling offset in the interval [-1,1) for a block of 256 samples every cycle, which is then unwrapped in the subsequent module. The phase unwrapping module computes the phase steps, applies a phase unwrap using an edge detector, and accumulates the result over time. The resulting accumulated phase dly\_o has a 16-bit resolution, with the 6 least-significant bits (LSBs) (unsigned) representing the fractional component. The integer delay m and fractional delay $\mu$ can be easily obtained by splitting the phase into the most-significant bits (MSBs) and LSBs, respectively. To align the data path with the estimated sampling offset, the upper path used for sample correction is delayed to match the latency of the estimation path. The EB is configured for 256 parallel input samples and 258 output samples after the Lagrange interpolator (to maintain an even output width). Accounting for the Lagrange interpolator memory, a total of 256 + 2 + 3 = 261 samples are read per cycle from the EB. The EB outputs, along with the appropriately delayed fractional sampling offset, are then processed by a third-order Lagrange interpolator to generate 258 timing-corrected samples.
An external reset signal is used to initialize the MA filter, phase accumulator, and EB address in case of undefined states during FPGA startup. Note that the entire clock recovery system operates within only a single FPGA clock domain.
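The split of the accumulated phase into m and $\mu$ can be mimicked in software with integer arithmetic. The sketch below assumes that the accumulated phase is available as a signed integer with 6 fractional bits, as stated above; the function name `split_delay` is hypothetical and the code is not derived from the FPGA design.

```python
FRAC_BITS = 6   # the 6 LSBs carry the unsigned fractional delay

def split_delay(dly):
    """Split a signed fixed-point phase into integer delay m and fraction mu."""
    m = dly >> FRAC_BITS                                     # arithmetic shift: MSBs
    mu = (dly & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)   # LSBs scaled to [0, 1)
    return m, mu

for dly in (150, -1):
    m, mu = split_delay(dly)
    print(dly, "->", m, mu, "check:", m + mu == dly / 2**FRAC_BITS)
```

Because the shift floors toward minus infinity, the fractional part stays in $[0, 1)$ for negative phases as well.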

#### 3.4 Real-Time Experiment

The real-time clock recovery is validated in an IM/DD optical back-to-back system with 30-Gbit/s OOK as depicted in Fig. 3.3. The transmitter and receiver processing units each consist of a Keysight USPA platform equipped with an AMD FPGA. The transmitter generates a 30-Gbit/s NRZ signal, which modulates a 1540-nm distributed-feedback (DFB) laser using a 34-GHz 3-dB-bandwidth electro-absorption modulator (EAM) (Optilab LT-40-E-M). The optical power is attenuated to -1 dBm to fully utilize the ADC dynamic range after detection by a 27-GHz 3-dB-bandwidth photodiode (Optilab PR-40G-M). The ADC samples the signal at 60 GSa/s, resulting in nominally twofold oversampling as required by the clock recovery. The clock recovery is performed in real time on the FPGA and afterwards 2<sup>18</sup> received symbols, along with the fractional sampling offset, EB address, and en\_o signal, are written into a memory for analysis.

To assess performance under various CFOs, the receiver clock is detuned by manually changing the external oscillator frequency. Fig. 3.4 illustrates the timing behavior for CFOs of +20 ppm and -200 ppm. As expected, a higher CFO results in more +1 address corrections and thus less closely spaced en\_o signals.

Fig. 3.5 presents the signal-to-noise-and-distortion ratio (SNDR) (without any equalizer) and BER derived from the recorded data as a function of CFO. As the timing estimate update rate is 60 GHz/256 and the 16-tap MA filter has a 3-dB bandwidth of about 1/32 and the first spectral null at about 1/16 of the update rate, the 3-dB clock recovery bandwidth is 7.32 MHz and a failure to track the CFO is expected at 14.65 MHz. These values align well with the observed clock recovery failure in Fig. 3.5. The SNDR at 0 ppm is 1.5 dB higher than at adjacent CFOs. This CFO-related effect arises from a dynamic common-mode voltage control of the ADC, which alters the signal amplitude for different sampling offsets due to the changing sample statistics. Advanced ADC designs in transceivers can mitigate this behavior.
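The quoted frequencies can be checked numerically from the frequency response of a 16-tap moving average. The sketch assumes NumPy and an ideal, unquantized MA filter; the function name `ma_gain` is hypothetical. At 1/32 of the update rate, the gain of the ideal filter evaluates to roughly −3.9 dB, consistent with the approximate 3-dB bandwidth stated above.

```python
import numpy as np

UPDATE_RATE = 60e9 / 256      # one timing estimate per block of 256 samples
TAPS = 16                     # moving average over 16 estimates

def ma_gain(f, fs=UPDATE_RATE, taps=TAPS):
    """Magnitude response of an ideal moving-average filter at frequency f."""
    n = np.arange(taps)
    return abs(np.mean(np.exp(-2j * np.pi * f * n / fs)))

f_3db = UPDATE_RATE / 32
f_null = UPDATE_RATE / 16
print(f"update rate : {UPDATE_RATE / 1e6:.2f} MHz")
print(f"about 3 dB  : {f_3db / 1e6:.2f} MHz, gain {20 * np.log10(ma_gain(f_3db)):.1f} dB")
print(f"first null  : {f_null / 1e6:.2f} MHz, gain {ma_gain(f_null):.1e}")
```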

[End of paper [P1]. The paper's conclusion is added to chapter 8.]





**Fig. 3.3:** Optical back-to-back transmission setup. (a) Schematic overview of the experimental setup. (b) Picture of the real-time setup.



Fig. 3.4: Fractional sampling offset, enable signal en\_o, and EB address for CFOs of $+20\,\text{ppm}$ and $-200\,\text{ppm}$.



Fig. 3.5: Experimental results of the BER and SNDR over CFO in units MHz and ppm obtained from  $2^{18}$  received symbols.

# 4 Chromatic Dispersion Tolerant Clock Recovery for IM/DD Systems

This chapter reports on the impact of CD on the oversampled digital clock recovery in IM/DD systems and proposes two novel TEs that are tolerant to CD. The chapter is taken from the Early Access Preprint [J3] and is accepted for publication at the *Journal of Lightwave Technology*. The material from the publication has been adapted to comply with the layout and the structure of this thesis.

[Beginning of paper [J3]]

#### Impact of Chromatic Dispersion on Oversampled Digital Clock Recovery in Direct-Detection Systems: Analysis and Solutions

P. Matalla, C. Koos, and S. Randel

Journal of Lightwave Technology, Early Access (2025) DOI: https://doi.org/10.1109/JLT.2025.3600353

Cloud-based services such as the training of AI/ML models in large-scale data centers as well as smart city applications such as the Internet of things (IoT), connected driverless vehicles, etc., are key drivers for ever higher data rates, especially in short-range fiber-optic links. Such optical transceivers employ IM/DD to reduce the costs and power consumption and, therefore, CD is a nonlinear channel effect with respect to the received signal. With net data rates currently going up to 100 Gbit/s for PONs and 400 Gbit/s for Ethernet connections, CD is a main limitation in transmission performance at such high data rates. To reconstruct the signal impaired by CD, research has mainly discussed nonlinear equalizers and ML models, while, to the best of the authors' knowledge, no research has been conducted on the influence of CD on NDA digital clock recovery in direct-detection systems. In this work, we investigate analytically and in simulation the effect of CD on oversampled digital clock recovery and find that for some dispersion values, clock recovery fails entirely. Based on our findings, we present two dispersion-tolerant clock recovery algorithms that work for NRZ and digital pulse-shaped signals as well as FTN signals. Finally, we validate our findings and algorithms in a 34-GBd PAM4 transmission experiment for NRZ and RRC pulse-shaped signals and different accumulated dispersion values.

#### 4.1 Introduction

Modern applications such as AI models trained and processed in the cloud, virtual reality, smart city applications, e.g., IoT, automated and connected driving, and many more, require dense and widespread communication networks that transmit high data rates with high reliability and low latency [J1, 66–68]. Such applications are a key driver for ever higher data rates, especially over short distances of less than a few tens of kilometers. For such distances, optical fiber links based on IM/DD are the preferred choice due to their lower energy consumption and lower cost compared to coherent transceivers. The fields of application of IM/DD transceivers range from PONs to DCIs and intra-DC optical links. All of these systems share a common challenge: as the data rate increases, CD becomes an increasingly limiting factor of the transmission performance, since the quadratic phase distortion of the optical field spectrum leads to a nonlinear channel effect when detecting the optical power.

In intra-DC and DCI, current research is considering 800 GE (Gigabit Ethernet) and 1.6 TE (Terabit Ethernet) as the next-generation standard, while discussion on 3.2 TE has already started [66–68]. Such high data rates are achieved by WDM, whereby it is more lucrative to achieve the highest possible data rate per wavelength, e.g., 16×200G or 8×400G for 3.2 TE. To provide an aggregated net data rate of 200 Gbit/s per wavelength up to fiber distances of 10 to 20 km in O-band, higher-order modulation formats, e.g., PAM4 up to eight-level pulse amplitude modulation (PAM8), are under consideration [69–71]. Despite the low dispersion in O-band, dispersion is a main limitation in transmission performance at such high symbol rates. The situation is similar for PONs, which are point-to-multipoint optical networks with passive optical splitters, used, for example, for broadband connectivity in cities [J1] and 6G fronthaul [72], and which leverage the technological advances in the Ethernet market. For the recently standardized 50G-PON, PAM2 modulation at 50 GBd is considered. With fiber lengths of up to 20 or even 40 km, CD is already a limiting factor here, which is why DSP equipped with adaptive equalizers is employed. The introduction of a DSP also allows conventional analog CDR circuits to be migrated to the digital domain. For the 50G-PON standard, a dispersion range from -127 to 77 ps/nm is defined in O-band [73]. Current research is investigating the next potential IM/DD PON with a line rate of 100 to 125 Gbit/s using either 50 to 62.5 GBd PAM4 or 100 to 125 GBd PAM2. Depending on the choice of wavelength, a dispersion tolerance from -120 to 80 ps/nm is required [74].

Previous research has mainly focused on system aspects and feasibility studies for IM/DD systems at such high data rates. A major focus here was the signal attenuation in the spectral dips caused by CD in direct-detection links and on counteracting this with sophisticated nonlinear equalizers and ML methods [75, C8]. The effect of CD on the digital clock recovery is investigated in several works for coherent transmission [33, 76]. However, to the best of the authors' knowledge, no research has been done on the effect of CD on the clock recovery in direct-detection systems.

This work is structured in two parts. In the first part, we analyze and simulate the effect of CD on oversampled digital clock recovery in direct-detection systems. We holistically investigate the effect for signals with NRZ pulses as well as signals with RRC pulse shaping. Building on the findings from the first part, we then propose two CD-tolerant clock recovery algorithms that work for NRZ or high roll-off signals as well as for low roll-off or extremely bandwidth-limited signals such as FTN signals. Finally, we confirm our analysis and the viability of the proposed clock recovery algorithms in a 34-GBd PAM4 transmission experiment over accumulated dispersion values of up to 600 ps/nm.

#### 4.2 Impact of Chromatic Dispersion on Digital Clock Recovery

### 4.2.1 Signal Propagation through Chromatic Dispersive Channel

To understand the effect of CD on the clock recovery for direct reception at the receiver end, we first consider the influence of CD on the received intensity-modulated signal. In [77, 78], the received optical power impaired by CD is analytically derived. We consider a zero-mean, real-valued transmit signal  $s_{\rm tx}(t)$  in TD, which is normalized to the mean optical power  $P_0$  and  $\min\{s_{\rm tx}(t)\} \ge -1$ . The transmitted optical power can then be described as  $P_{\rm tx}(t) = P_0(1 + s_{\rm tx}(t))$ . Considering chirp-free modulation and neglecting any noise, we can assume a constant optical phase and express the transmitted complex amplitude with respect to optical center frequency as the square-root of the optical power, i.e.,

$$E_{\rm tx}(t) = \sqrt{P_0 \left(1 + s_{\rm tx}(t)\right)} \tag{4.1}$$

with the ER in decibels given as

$$ER_{dB} = 10 \log_{10} \left( \frac{1 + \max(s_{tx}(t))}{1 + \min(s_{tx}(t))} \right). \tag{4.2}$$

The square-root can be expressed as a power series resulting in

$$E_{\rm tx}(t) = \sqrt{P_0} \left( 1 + \sum_{n=1}^{\infty} \alpha_n s_{\rm tx}^n(t) \right), \tag{4.3}$$

with coefficients

$$\alpha_n = \frac{(-1)^{n-1}(2n)!}{4^n(n!)^2(2n-1)} \quad \text{for} \quad n \in \mathbb{N}^+. \tag{4.4}$$
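The coefficients of eq. (4.4) can be verified numerically against the square root in eq. (4.1). The following check uses only the Python standard library; the function names and the truncation order of 30 terms are arbitrary choices for illustration.

```python
from math import factorial, sqrt

def alpha(n):
    """Power series coefficient of sqrt(1 + s) according to eq. (4.4)."""
    return (-1) ** (n - 1) * factorial(2 * n) / (4 ** n * factorial(n) ** 2 * (2 * n - 1))

def sqrt_series(s, order=30):
    """Truncated power series 1 + sum_n alpha_n * s**n."""
    return 1.0 + sum(alpha(n) * s ** n for n in range(1, order + 1))

print([alpha(n) for n in range(1, 5)])     # starts with 1/2, -1/8, 1/16, ...
print(sqrt_series(0.3), sqrt(1.3))         # both values agree for |s| < 1
```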

As the optical signal propagates through a dispersive medium, it experiences CD. The optical channel  $\underline{\tilde{h}}_{cd}(f)$  can be described by an allpass filter with quadratic spectral phase in FD as

$$\tilde{\underline{h}}_{cd}(f) = \exp\left(j\frac{\pi}{\vartheta}\lambda^2 f^2 L D_{cd}\right) = \exp\left(j\zeta(f)\right), \tag{4.5}$$

where  $D_{\rm cd}$  is the CD coefficient, f the equivalent baseband frequency of the signal,  $\lambda$  the center wavelength, L the fiber length,  $\vartheta$  the speed of light, the tilde indicates being in the Fourier domain, and the underline denotes a complex value/signal. For notational convenience, we abbreviate the argument by  $\zeta(f) = \pi \lambda^2 f^2 L D_{\rm cd}/\vartheta$ . Since the real and imaginary parts of the channel frequency response are both even real functions, the real and imaginary parts of the impulse response  $\underline{h}_{\rm cd}(t) = \mathcal{F}^{-1}\left\{\underline{\tilde{h}}_{\rm cd}(f)\right\}$  can be expressed as

$$\begin{aligned}
\Re\{\underline{h}_{\mathrm{cd}}(t)\} &= \mathcal{F}^{-1}\left\{\Re\left\{\underline{\tilde{h}}_{\mathrm{cd}}(f)\right\}\right\} = \mathcal{F}^{-1}\left\{\cos\left(\zeta(f)\right)\right\} \\
\Im\{\underline{h}_{\mathrm{cd}}(t)\} &= \mathcal{F}^{-1}\left\{\Im\left\{\underline{\tilde{h}}_{\mathrm{cd}}(f)\right\}\right\} = \mathcal{F}^{-1}\left\{\sin\left(\zeta(f)\right)\right\}.
\end{aligned} \tag{4.6}$$

The received optical power then results as the absolute square of the received optical field as

$$P_{\rm rx}(t) = \left(E_{\rm tx}(t) * \Re\{\underline{h}_{\rm cd}(t)\}\right)^2 + \left(E_{\rm tx}(t) * \Im\{\underline{h}_{\rm cd}(t)\}\right)^2, \tag{4.7}$$

where \* denotes the convolution operation. Since the Fourier transform of the constant term corresponds to the Dirac impulse $\delta(f)$ and using eq. (4.6), the convolution of the direct-current term with the real and imaginary part of the CD impulse response results in one and zero, respectively, because $\zeta(0) = 0$, i.e.,

$$\begin{aligned}
1 * \Re\{\underline{h}_{\mathrm{cd}}(t)\} &= \mathcal{F}^{-1}\left\{\delta(f)\cos\left(\zeta(f)\right)\right\} = 1 \\
1 * \Im\{\underline{h}_{\mathrm{cd}}(t)\} &= \mathcal{F}^{-1}\left\{\delta(f)\sin\left(\zeta(f)\right)\right\} = 0.
\end{aligned} \tag{4.8}$$

The received electrical signal  $x(t) \propto P_{\rm rx}(t)$  results in [78]

$$x(t) \propto 1 + s_{\text{tx}}(t) * \Re\{\underline{h}_{\text{cd}}(t)\} + \epsilon(t) \tag{4.9}$$

with

$$\begin{aligned}
\epsilon(t) = {} & 2 \sum_{n=2}^{\infty} \alpha_n s_{\text{tx}}^n(t) * \Re\{\underline{h}_{\text{cd}}(t)\} \\
& + \left[ \left( \sum_{n=1}^{\infty} \alpha_n s_{\text{tx}}^n(t) * \Re\{\underline{h}_{\text{cd}}(t)\} \right)^2 + \left( \sum_{m=1}^{\infty} \alpha_m s_{\text{tx}}^m(t) * \Im\{\underline{h}_{\text{cd}}(t)\} \right)^2 \right].
\end{aligned} \tag{4.10}$$

The first term in eq. (4.9) is a constant direct-current offset, while the second term is referred to as power fading (PF). Considering the Fourier transform of the received signal  $\tilde{x}(f)$ 

$$\underline{\tilde{x}}(f) \propto \delta(f) + \underline{\tilde{s}}_{tx}(f)\tilde{h}_{pf}(f) + \underline{\tilde{\epsilon}}(f) \tag{4.11}$$

with the frequency response

$$\tilde{h}_{\rm pf}(f) = \Re\left\{\tilde{\underline{h}}_{\rm cd}(f)\right\} = \cos\left(\zeta(f)\right), \tag{4.12}$$

we find that the power fading effect shapes the spectrum of the received signal and imposes nulls at frequencies  $f_{m}$ 

$$f_m = \pm \sqrt{\frac{(1+2m)\vartheta}{2\lambda^2 L D_{\rm cd}}} \quad \text{for} \quad m \in \mathbb{N}^0. \tag{4.13}$$

Note that the impact of the power fading depends on the spectral extent of the signal $s_{\rm tx}(t)$ and the accumulated dispersion $LD_{\rm cd}$. The term $\epsilon(t)$ comprises the nonlinear interference terms, which include the power fading of the signal's power series and signal-signal beat interferences.
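The power fading response of eq. (4.12) and the null frequencies of eq. (4.13) can be checked with a small-signal simulation. The sketch below assumes NumPy, a center wavelength of 1550 nm (an assumption, since the wavelength is not specified here), and a weakly modulated single tone, for which the nonlinear term $\epsilon(t)$ is negligible at the tone frequency. All function names are hypothetical.

```python
import numpy as np

C = 299_792_458.0          # speed of light in m/s
WAVELENGTH = 1550e-9       # assumed center wavelength in m

def zeta(f, acc_disp_ps_nm, wavelength=WAVELENGTH):
    """Spectral phase zeta(f) of eq. (4.5); accumulated dispersion L*D_cd in ps/nm."""
    ld = acc_disp_ps_nm * 1e-3                 # ps/nm -> s/m
    return np.pi * wavelength ** 2 * f ** 2 * ld / C

def fading_nulls(acc_disp_ps_nm, wavelength=WAVELENGTH, count=3):
    """Positive null frequencies f_m of eq. (4.13) for m = 0, ..., count-1."""
    m = np.arange(count)
    return np.sqrt((1 + 2 * m) * C / (2 * wavelength ** 2 * acc_disp_ps_nm * 1e-3))

def detected_tone_gain(tone_bin, n, fs, acc_disp_ps_nm, mod_index=1e-3):
    """Tone amplitude after CD and square-law detection, relative to the input."""
    t = np.arange(n) / fs
    s_tx = mod_index * np.cos(2 * np.pi * tone_bin * fs / n * t)
    field = np.sqrt(1.0 + s_tx)                             # eq. (4.1) with P_0 = 1
    f = np.fft.fftfreq(n, d=1.0 / fs)
    rx_field = np.fft.ifft(np.fft.fft(field) * np.exp(1j * zeta(f, acc_disp_ps_nm)))
    x = np.abs(rx_field) ** 2                               # square-law detection
    return 2.0 * np.abs(np.fft.fft(x)[tone_bin]) / n / mod_index

fs, n, tone_bin = 896e9, 4096, 100            # tone at 21.875 GHz
print(fading_nulls(24.0) / 1e9)               # null frequencies in GHz
print(detected_tone_gain(tone_bin, n, fs, 24.0),
      abs(np.cos(zeta(tone_bin * fs / n, 24.0))))
```

For the assumed wavelength and 24 ps/nm of accumulated dispersion, the first null lies near 51 GHz, and the simulated tone gain agrees with $|\cos(\zeta(f))|$.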

The simulation setup used to investigate the effect of CD is shown in Fig. 2.22. Here, we simulate a 112-GBd PAM4 signal with NRZ pulse shape at 8-fold oversampling. Afterwards, we apply a 5th-order Bessel lowpass filter with a 3-dB bandwidth of 78 GHz (70% of symbol rate) to account for bandwidth limitations. For practical transceivers, the component bandwidth can be significantly lower which is examined in section 4.4. CD is added to the waveform and the signal detection is modeled as square-law operation.

Fig. 4.1(a) shows the transmit and receive spectrum for $2^{12}$ symbols and 24 ps/nm accumulated CD, averaged over 100 signal realizations to obtain a smooth spectrum. For a small ER of the modulator of 0.1 dB, the power fading effect dominates and the spectral dips in accordance with eq. (4.13) are clearly visible. For a more realistic ER of 10 dB, the nonlinear interference is emphasized in the power fading dips and tones at multiples of the symbol rate are generated. Fig. 4.1(b) shows the amplitude ratio function in decibels between the transmitted $\tilde{s}_{tx}(f)$ and received $\tilde{x}(f)$ signal together with the power fading function, where $\tilde{s}_{tx}(f)$ and $\tilde{x}(f)$ are the transmit signal after the Bessel lowpass and the receive signal after photodetection, respectively. The amplitude ratio function agrees well with the power fading function. For higher ER, the nonlinear interference increasingly distorts the signal amplitude. In the spectral dips arising from the NRZ pulse at 112 GHz and 224 GHz, the distortion due to nonlinear interference dominates as less signal power is present. Similarly, Fig. 4.1(c) compares the phase difference between the transmit and receive signals $\arg\{\tilde{\underline{s}}_{tx}(f)\tilde{\underline{x}}^*(f)\}$ with the phase of the power fading function. In case of dominating power fading (left plot), the negative sign of the power fading frequency response $\tilde{h}_{\rm pf}(f)$ causes a $\pi$-phase shift. In addition, the nonlinear ISI causes a phase distortion around the spectral nulls, as indicated for the case with a 10 dB ER (right plot).



Fig. 4.1: Simulated effect of CD on the received 112-GBd PAM4 NRZ signal in a direct-detection link with zero sampling offset  $\tau=0$ . (a) Transmit (blue) and receive (orange) signal spectrum for an accumulated dispersion of  $LD_{\rm cd}=24\,{\rm ps/nm}$ . Amplitude ratio function in decibel (b) and phase difference (c) between the transmit and receive signal (blue) and power fading function (red).



Fig. 4.2: (a) Phase of cosine product  $\arg\{\cos(\zeta_n)\cos(\zeta_{n-N/2})\}$  (red) and signal phase  $\arg\{\tilde{x}_n\tilde{x}_{n+N/2}^*\}$  relevant for the timing estimation (blue) over the frequency bins n for an ER of 0.1 and 10 dB. The received signal is simulated without noise and is distorted by 24 ps/nm accumulated CD and a 256-point FFT is computed. (b) Left and right sideband of a 10% RRC-shaped spectrum indicated by the blue- and red-shaded areas and power fading function in decibel. (c) Clock tone power penalty over accumulated dispersion and RRC roll-off factors.

# 4.2.2 Effect of Chromatic Dispersion on Clock Recovery

To investigate the effect of CD on the clock recovery analytically, we consider a mathematical channel model in FD, where  $n = \{1, ..., N\}$  is the frequency bin index of an N-point FFT. The zero-frequency is therefore assigned to n = 1.

For convenience, we consider a clock recovery, which is implemented at twofold oversampling. A twofold upsampled symbol sequence  $\tilde{\underline{s}}_n$  (sample rate upconversion by inserting zeros in between the symbols) is interpolated by a multiplication of the frequency response  $\tilde{p}_n$  of a pulse shaping filter. The optical channel is modeled as dispersive medium with a linear response  $\tilde{h}_{pf,n}$  and nonlinear interference term  $\underline{\tilde{\epsilon}}_n$  (see eq. (4.9)). Finally, the channel adds AWGN  $\underline{\tilde{n}}_n$ . At the receiver side, another lowpass filter  $\tilde{g}_n$  is emulating bandwidth limitations and hence suppresses out-of-band noise. The lowpass filters  $\tilde{p}_n, \tilde{g}_n \geq 0$  are assumed to be real-valued and non-negative in FD since any nonlinear spectral phase can be pre-compensated in the DSP. Lastly, we consider a sampling offset  $\tau$  between the transmit and receive clocks which is normalized to the symbol period and can be assumed to be constant within a block of N samples. The n-th element  $\underline{\tilde{x}}_n$  of the N-point FFT of the received signal can then be expressed as

$$\underline{\tilde{x}}_n = \left(\underline{\tilde{s}}_n \tilde{p}_n \tilde{h}_{\text{pf},n} + \underline{\tilde{\epsilon}}_n + \underline{\tilde{n}}_n\right) \tilde{g}_n e^{j 4\pi \frac{n}{N} \tau} . \tag{4.14}$$

Most common NDA clock recoveries compute the spectral autocorrelation to generate a so-called clock tone at symbol rate, whose phase is proportional to the sampling offset [1], e.g., the algorithms proposed in [J2, 40, 42–44, 79, 80]. The spectral autocorrelation at symbol rate  $R_{\tilde{x}\tilde{x}} = 2/N \sum_{n=0}^{N/2-1} \tilde{x}_n \tilde{x}_{n+\frac{N}{2}}^*$  can be expressed as

$$\begin{aligned}
R_{\tilde{x}\tilde{x}} = \frac{2}{N} \sum_{n=0}^{N/2-1} &\left( \tilde{\underline{s}}_n \tilde{p}_n \tilde{h}_{\mathrm{pf},n} + \tilde{\underline{\epsilon}}_n + \tilde{\underline{n}}_n \right) \\
&\times \left( \tilde{\underline{s}}_{n+\frac{N}{2}}^* \tilde{p}_{n+\frac{N}{2}} \tilde{h}_{\mathrm{pf},n+\frac{N}{2}} + \tilde{\underline{\epsilon}}_{n+\frac{N}{2}}^* + \tilde{\underline{n}}_{n+\frac{N}{2}}^* \right) \\
&\times \tilde{g}_n \tilde{g}_{n+\frac{N}{2}} e^{j 2\pi\tau} .
\end{aligned} \tag{4.15}$$

Note that the linear spectral phase of $\tilde{x}_{n+N/2}$, which is caused by the sampling offset, is $\exp(\mathrm{j}\,4\pi(n-N/2)/N\tau)$, since the frequency range from N/2 to N-1 represents the left sideband at negative baseband frequencies. For convenience, we substitute the pulse shape product and receiver-side lowpass product as a real-valued variable $\tilde{\rho}_n = \tilde{p}_n \tilde{p}_{n+N/2}$ and $\tilde{\gamma}_n = \tilde{g}_n \tilde{g}_{n+N/2}$, respectively. Furthermore, we summarize all noise and nonlinear terms including the nonlinear interference $\underline{\tilde{\epsilon}}_n$ by $\tilde{\chi}_n$. The spectral correlation can then be written as

$$R_{\tilde{x}\tilde{x}} = \frac{2}{N} e^{j 2\pi \tau} \sum_{n=0}^{N/2-1} |\underline{\tilde{s}}_n|^2 \tilde{\gamma}_n \tilde{\rho}_n \, \tilde{h}_{\mathrm{pf},n} \tilde{h}_{\mathrm{pf},n+\frac{N}{2}} + \underline{\tilde{\chi}}_n \,. \tag{4.16}$$

Due to the cyclostationarity of the transmit symbol sequence, the Fourier transform of the random sequence  $\underline{\tilde{s}}_n$  is periodic with symbol rate N/2, and thus the product  $\underline{\tilde{s}}_n\underline{\tilde{s}}_{n+N/2}^* = \underline{\tilde{s}}_n\underline{\tilde{s}}_n^* = |\underline{\tilde{s}}_n|^2$  results in a real value. The phase of the clock tone then gives an estimation of the sampling offset  $\hat{\tau}$  as

$$\begin{aligned}
\hat{\tau} &= \frac{1}{2\pi} \arg\left\{ R_{\tilde{x}\tilde{x}} \right\} \\
&= \frac{1}{2\pi} \arg\left\{ \frac{2}{N} e^{j 2\pi\tau} \sum_{n=0}^{N/2-1} \left| \underline{\tilde{s}}_n \right|^2 \tilde{\gamma}_n \tilde{\rho}_n \cos\left(\zeta_n\right) \cos\left(\zeta_{n-\frac{N}{2}}\right) + \underline{\tilde{\chi}}_n \right\},
\end{aligned} \tag{4.17}$$

where  $\zeta_n = \zeta(f = (n-1)f_{\rm sa}/N)$  and  $f_{\rm sa}$  is the sampling rate. For the case of no CD, i.e.,  $\tilde{h}_{\rm pf,n} = 1$  and  $\tilde{\chi}_n = 0$ , the phase of the clock tone is only proportional to the sampling offset  $\tau$ . However, in the presence of CD, the timing estimation is impaired by the power fading and the nonlinear interference contained in  $\tilde{\chi}_n$ .
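As a numerical illustration of eqs. (4.14) to (4.17) for the CD-free case, the following Python sketch builds one received block directly in FD and reads the sampling offset from the phase of the clock tone. The raised-cosine-like response `P`, the PAM4 alphabet, and the function names are illustrative assumptions and not part of the derivation; noise, power fading, and nonlinear interference are omitted.

```python
import numpy as np

def synth_block(num_sym, tau, rng):
    """Noise-free block at 2 Sa/Sym following eq. (4.14) with h_pf = 1 and g = 1."""
    N = 2 * num_sym
    s = np.zeros(N)
    s[::2] = rng.choice([-3.0, -1.0, 1.0, 3.0], size=num_sym)  # zero-stuffed PAM4 symbols
    f = np.fft.fftfreq(N)                 # frequency in cycles per sample (negative half wrapped)
    P = np.cos(np.pi * f) ** 2            # assumed real-valued, non-negative pulse response
    X = np.fft.fft(s) * P * np.exp(1j * 4 * np.pi * f * tau)    # linear phase of the offset
    return np.fft.ifft(X).real

def clock_tone_estimate(x):
    """Spectral autocorrelation at symbol rate and its phase, cf. eq. (4.17)."""
    N = len(x)
    X = np.fft.fft(x)
    R = 2.0 / N * np.sum(X[: N // 2] * np.conj(X[N // 2 :]))
    return np.angle(R) / (2 * np.pi)

rng = np.random.default_rng(0)
for tau in (-0.3, 0.0, 0.2):
    print(tau, clock_tone_estimate(synth_block(128, tau, rng)))
```

In this idealized model, every summand carries the same phase $2\pi\tau$, so the printed estimates coincide with the applied offsets up to numerical precision.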

To investigate the effect of CD in simulation, we simulate a 112-GBd IM/DD system, which is impaired by CD and AWGN, as displayed in Fig. 2.22. At the transmitter, a sequence of PAM4 symbols is generated. Afterwards, either each symbol is repeated eight times to model NRZ pulses with a rise time of zero, or the symbol sequence is upsampled by inserting zeros between the symbols and interpolated using an RRC pulse-shaping filter. Bandwidth limitations of a DAC and other analog components at the transmitter are taken into account by applying a 5th-order Bessel lowpass filter with a 3-dB bandwidth of 78 GHz. To model a non-negative optical power, a bias is added and the optical field amplitude is obtained as the square root of the optical power. To simulate the fiber-optic channel of length L, the resulting signal is impaired by CD for a certain CD coefficient $D_{\rm cd}$. At the receiver side, the signal is detected by a photodiode, which is modeled as a square-law detector, and AWGN is added to set a certain electrical SNR that is normalized to the symbol rate. The resulting signal is once again filtered by a Bessel lowpass and downsampled to 2 Sa/Sym. Finally, a sampling offset $\tau$ is added before the clock recovery.
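The core of this simulation chain (NRZ generation, bias, square root, CD, and square-law detection) can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions: the Bessel filters, the AWGN, and the downsampling are left out, a wavelength of 1550 nm is assumed, the ER is mapped to the ratio of maximum to minimum optical power, and the CD phase $\pi\lambda^2 LD_{\rm cd} f^2/c$ is the standard all-pass model rather than a formula quoted from this section. The function name and its parameters are hypothetical.

```python
import numpy as np

C0 = 299_792_458.0  # speed of light in vacuum in m/s

def simulate_imdd(num_sym, acc_disp_ps_nm, er_db, rng,
                  f_sym=112e9, sps=8, wavelength=1550e-9):
    """PAM4 NRZ -> optical field -> CD -> square-law detection (no filters, no noise)."""
    levels = rng.integers(0, 4, size=num_sym) / 3.0            # PAM4 levels scaled to [0, 1]
    er = 10 ** (er_db / 10)                                    # extinction ratio, linear
    p_tx = 1.0 / er + (1.0 - 1.0 / er) * np.repeat(levels, sps)  # biased NRZ optical power
    field = np.sqrt(p_tx)                                      # real-valued field amplitude
    f = np.fft.fftfreq(field.size, d=1.0 / (sps * f_sym))      # frequency axis in Hz
    ld = acc_disp_ps_nm * 1e-3                                 # ps/nm converted to s/m
    cd_phase = np.pi * wavelength ** 2 * ld * f ** 2 / C0      # assumed quadratic CD phase
    field_rx = np.fft.ifft(np.fft.fft(field) * np.exp(1j * cd_phase))
    return p_tx, np.abs(field_rx) ** 2                         # square-law detector output

rng = np.random.default_rng(0)
p_tx, p_rx = simulate_imdd(4096, 24.0, 10.0, rng)
print(p_tx.mean(), p_rx.mean())
```

Since CD acts as an all-pass filter on the optical field, the mean detected power equals the mean transmitted power; only the waveform and hence the electrical spectrum change.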

The influence of CD on the clock recovery performance can be observed through three effects. The first effect is depicted in Fig. 4.2(a), which shows the phase of the cosine product $\arg\{\cos(\zeta_n)\cos(\zeta_{n-N/2})\}$, which can be either 0 or $\pi$, in orange and the phase $\arg\{\tilde{x}_n\tilde{x}_{n+N/2}^*\}$ for each frequency bin n in blue. To isolate the effect of CD, we add no AWGN and neglect the influence of the sampling offset on the FD phase for the moment by setting $\tau = 0$. The FFT size is N = 256 (128 symbols) and the accumulated dispersion is 18 ps/nm at 112 GBd. Since $\tau = 0$, the phase difference should be zero for each frequency bin. However, there is a phase shift of $\pi$ for the frequency bins where $\cos(\zeta_n)$ and $\cos(\zeta_{n+N/2})$ have opposite signs. Furthermore, the nonlinear interference in $\underline{\tilde{\chi}}_n$ distorts the phase, which becomes especially relevant in the dips of the power fading function $\tilde{h}_{\rm pf}(f)$. The weighted sum over the cosine product in eq. (4.17) then has either positive or negative sign. A negative sign leads to a $\pi$-phase shift, which results in a false estimation of the sampling offset by half a symbol period, so that the closed eye is interpreted as the ideal sampling point. However, this constant sampling offset can be compensated manually or by an adaptive equalizer.

The second effect occurs for certain dispersions, where the weighted sum of the cosine product approaches zero. In this case, the left term of the sum in eq. (4.17) vanishes and the nonlinear interference $\tilde{\chi}_n$ dominates the phase estimation, which leads to an inconsistent clock phase estimation. This effect can be seen for the modified version of the algorithm by Barton and Al-Jalili [40] (see eq. (4.21)) at 11 ps/nm in Fig. 4.3(b).

The third effect occurs for narrow-band lowpass responses $\tilde{p}$ and $\tilde{g}$, i.e., for strong bandwidth limitations (FTN signals) or low RRC roll-off signals, and is visualized in Fig. 4.2(b) by showing the overlapping left and right sidebands of a 10% roll-off RRC frequency response (blue and red area) and the power fading function (blue and red curve) in decibel at 112 GBd and 18 ps/nm accumulated dispersion. The term proportional to the cosine product in eq. (4.17) disappears if the nulls of the cosines fall within the narrow region of the overlapping sidebands. In Fig. 4.2(c), we compute the clock tone power penalty (CTPP) in decibel by comparing the clock tone resulting from the RRC frequency response with the one impaired by CD as

$$\mathrm{CTPP} = 10 \log_{10} \left\{ \frac{\sum_{n=0}^{N/2-1} \tilde{\rho}_n \cos(\zeta_n) \cos(\zeta_{n-\frac{N}{2}})}{\sum_{n=0}^{N/2-1} \tilde{\rho}_n} \right\}. \tag{4.18}$$

For a roll-off of 1, the clock tone power penalty is about 3.8 dB for an accumulated dispersion larger than 5 ps/nm, while for low roll-offs the periodic extinction of the clock tone can be observed for dispersion values, where the power fading dips fall around half the symbol rate, i.e., where  $|f_m| = f_{\rm sym}/2$  is valid for eq. (4.13). Hence, eq. (4.13) can be solved for  $f_{\rm sym}/2$  and the affected accumulated dispersion values can be found as

$$LD_{\rm cd} = \frac{2(1+2m)\vartheta}{f_{\rm sym}^2\lambda^2} . \tag{4.19}$$

The three effects repeat periodically as the accumulated CD increases, since the function  $\tilde{h}_{\rm pf}(f)$  is periodic as well. It shall be noted that the effects occur for TEDs used in FB control loops and TEs used in FF clock recovery architectures. Furthermore, this effect can be observed regardless of whether the timing is estimated in time or frequency domain.
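As a worked example of eq. (4.19), the affected accumulated dispersion values can be evaluated numerically. The sketch below assumes that $\vartheta$ denotes the speed of light in vacuum and that the wavelength is 1550 nm; both are assumptions made for illustration, and the function name is hypothetical.

```python
C0 = 299_792_458.0  # speed of light in m/s (assumed meaning of the constant in eq. (4.19))

def extinction_dispersions(f_sym, wavelength, orders=(0, 1, 2)):
    """Accumulated dispersion values of eq. (4.19) in ps/nm for the given orders m."""
    # The factor 1e3 converts s/m to ps/nm.
    return [2 * (1 + 2 * m) * C0 / (f_sym ** 2 * wavelength ** 2) * 1e3 for m in orders]

print([round(v, 1) for v in extinction_dispersions(112e9, 1550e-9)])
print([round(v, 1) for v in extinction_dispersions(34e9, 1550e-9, orders=(0,))])
```

Under these assumptions, the first three values for 112 GBd are about 19.9, 59.7, and 99.5 ps/nm, and the first value for 34 GBd is about 215.9 ps/nm.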

# 4.3 Chromatic Dispersion Tolerant Clock Recovery for Non-Return-to-Zero Signals

As a consequence of CD on the clock recovery for data communication, an offset of one sample period at twofold oversampling now results in sampling at the signal transitions, leading to strong additional ISI. This constant half-symbol-period sampling offset can be compensated either by an adaptive equalizer after the clock recovery or by manual addition of a $\pi$-phase shift for the affected fiber lengths. However, both approaches still suffer from the clock recovery failure due to a vanishing clock tone if the summed cosine product is close to zero (second effect). For this reason, we propose a modification that makes the FD clock recovery algorithm robust to CD in direct-detection systems.

The starting point of our consideration is the algorithm of Barton and Al-Jalili (BAJ) [40], where a timing estimate is computed from two frequency components at bin n and n + N/2 as

$$\hat{\tau}_{\text{BAJ}} = \frac{1}{2\pi} \arg\left\{ \tilde{x}_n \tilde{x}_{n+\frac{N}{2}}^* \right\}. \tag{4.20}$$

In [J2, C1, 41], the BAJ algorithm is modified by averaging the product  $\tilde{x}_n \tilde{x}_{n+N/2}^*$  over the frequency axis to reduce the phase estimation inaccuracy caused by random noise, resulting in the autocorrelation (see eq. (4.17))

$$\hat{\tau}_{\text{mod-BAJ}} = \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \tilde{x}_n \tilde{x}_{n+\frac{N}{2}}^* \right\}. \tag{4.21}$$

Clock recovery in FD offers the major advantage that we can easily compensate for the cosine terms. In order to mitigate the effect of CD, we first correct the Fourier transform  $\tilde{x}$  for the phase rotation of the cosine product

$$\underline{\tilde{y}}_n = \underline{\tilde{x}}_n \operatorname{sgn} \left\{ \cos \left( \zeta_n \right) \right\} \tag{4.22}$$

and then apply the TE as

$$\hat{\tau}_{\text{CD-BAJ}} = \frac{1}{2\pi} \arg \left\{ \sum_{n=0}^{N/2-1} \tilde{\underline{y}}_n \tilde{\underline{y}}_{n+\frac{N}{2}}^* \right\}. \tag{4.23}$$

Doing so, the power fading function is always positive and the spectral correlation adds up constructively, which avoids the $\pi$-phase shift (first effect) and prevents the weighted cosine sum from approaching zero (second effect). This modification requires knowledge of the total accumulated dispersion of the fiber link, for example from the CD estimation performed in the link configuration. From this, $\tilde{h}_{{\rm pf},n}$ can be pre-computed as a reference model. Note that small CD measurement inaccuracies may lead to frequency components in $\underline{\tilde{y}}_n$ which have an incorrect sign correction. However, this effect is mitigated by the correlation averaging over N/2 frequency bins. Furthermore, it should be emphasized that this modification only marginally increases the computational effort compared to the mod-BAJ, as only the sign of the Fourier transform has to be adjusted.

Fig. 4.3 shows the simulation results for a 112-GBd PAM4 signal with NRZ pulse shape and 78 GHz 6-dB bandwidth (one Bessel filter on both the transmitter and the receiver side) for various accumulated CD values. A common performance metric to evaluate clock recovery algorithms is the jitter, defined as the variance of the deviation of the estimated clock phase from the actual clock phase in decibels as $20 \log_{10}(\operatorname{std}(\hat{\tau} - \tau))$. However, if the accumulated CD leads to a $\pi$-phase shift, a timing estimate with low uncertainty may be obtained, which nevertheless leads to a high jitter since $\hat{\tau} - \tau$ is large. For this reason, we visualize the performance in Fig. 4.3(a) by plotting the timing estimate error $\hat{\tau} - \tau$ over the accumulated CD and keep $\tau = 0$ in simulation. Since the $\pi$-phase shift leads to phase jumps due to phase wrapping, we plot the absolute value of the estimation error for a simpler overview. To isolate the effect of the CD, we add no AWGN to the signal for now. For each CD value, we simulate 100 waveforms and calculate the timing estimate for a block length of N = 256 samples. We plot the mean estimation error (dark blue curve) and the $\pm \sigma$ confidence interval of the estimation error as the light blue area around it. Furthermore, we indicate in gray the areas in which the cosine product including the pulse shape leads to a change of the sign.
For the mod-BAJ algorithm with 0.1 dB ER, the phase jumps are clearly noticeable. Around those phase jumps, the cosine product is close to zero and the timing estimate is distorted by the nonlinear interference, which leads to an increase of the standard deviation. This effect is emphasized for an ER of 10 dB. With our proposed CD-BAJ algorithm, this can be counteracted so that no phase jumps occur and the standard deviation is low. Only a constant estimation error that depends linearly on the accumulated CD cannot be corrected, but it can easily be compensated for by a subsequent adaptive equalizer. Fig. 4.3(b) shows the timing estimate over 50,000 simulated sampling offsets for different CD values and an SNR of $20\,\mathrm{dB}$ for the mod-BAJ and the CD-tolerant algorithm. The colored symbols mark the position in Fig. 4.3(a).
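The sign correction of eq. (4.22) followed by the estimator of eq. (4.23) can be illustrated with a purely linear channel model. The following Python sketch is not the simulation behind Fig. 4.3: it keeps only the power fading term of eq. (4.14), omits nonlinear interference and noise, and assumes the power fading phase $\zeta(f) = \pi\lambda^2 LD_{\rm cd} f^2/c$ at 1550 nm together with a raised-cosine-like pulse response. All function names are hypothetical.

```python
import numpy as np

C0 = 299_792_458.0  # speed of light in vacuum in m/s

def zeta(f, acc_disp_ps_nm, wavelength=1550e-9):
    """Assumed power fading phase for the frequency f in Hz."""
    return np.pi * wavelength ** 2 * (acc_disp_ps_nm * 1e-3) * f ** 2 / C0

def mod_baj(X):
    """Eq. (4.21): phase of the spectral correlation summed over N/2 bins."""
    N = len(X)
    return np.angle(np.sum(X[: N // 2] * np.conj(X[N // 2 :]))) / (2 * np.pi)

def cd_baj(X, f, acc_disp_ps_nm):
    """Eqs. (4.22) and (4.23): sign correction from the known dispersion, then mod-BAJ."""
    Y = X * np.sign(np.cos(zeta(f, acc_disp_ps_nm)))
    return mod_baj(Y)

def faded_spectrum(num_sym, tau, acc_disp_ps_nm, f_sym, rng):
    """Linear model only: eq. (4.14) without nonlinear interference and noise."""
    N = 2 * num_sym
    s = np.zeros(N)
    s[::2] = rng.choice([-3.0, -1.0, 1.0, 3.0], size=num_sym)   # zero-stuffed PAM4
    fn = np.fft.fftfreq(N)                                      # cycles per sample
    f = fn * 2 * f_sym                                          # Hz at 2 Sa/Sym
    X = (np.fft.fft(s) * np.cos(np.pi * fn) ** 2                # assumed pulse response
         * np.cos(zeta(f, acc_disp_ps_nm))                      # power fading
         * np.exp(1j * 4 * np.pi * fn * tau))                   # sampling offset
    return X, f

rng = np.random.default_rng(0)
tau = 0.1
for ld in (0.0, 10.0, 20.0, 30.0, 40.0):
    X, f = faded_spectrum(128, tau, ld, 112e9, rng)
    print(ld, mod_baj(X), cd_baj(X, f, ld))
```

In this linear model, the CD-BAJ estimate equals $\tau$ for every dispersion value, whereas the mod-BAJ estimate is either $\tau$ or shifted by half a symbol period, depending on the sign of the weighted cosine sum. The residual distortions discussed above stem from the nonlinear interference, which this sketch does not contain.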



**Fig. 4.3:** Clock recovery performance for a 112-GBd PAM4 signal with NRZ pulse shape and 78-GHz 6-dB bandwidth limitation. (a) Absolute mean error of the timing estimate (dark blue line) with its  $\pm \sigma$  confidence interval indicated by the light blue area around it computed from 100 waveforms over accumulated CD and no AWGN. (b) Exemplary linear curves of the timing estimate over 50,000 simulated sampling offsets using the mod-BAJ and the CD-tolerant BAJ algorithm for 20 dB SNR. The inset symbols mark the position in subfigure (a).

### 4.4 Chromatic Dispersion Tolerant Clock Recovery for Low-Roll-Off and Faster-Than-Nyquist Signals

The presented CD-tolerant BAJ algorithm only compensates for the phase rotation caused by the cosine and the clock tone extinction caused by counter-phase cosine products. However, in case of low-roll-off signals or extreme bandwidth limitations, this modification does not compensate for the power fading of the overlapping areas of the signal sidebands. In such a case, clock recovery initially designed for FTN signals, i.e., for signals whose 3-dB bandwidth is significantly lower than half the symbol rate, may be beneficial. In order to broaden the spectrum of such signals so that a clock tone can be recovered, a nonlinear operation is commonly used, for example the magnitude squared in TD, which is also referred to as 4-th power method [32, 81, 82]. Similar to the algorithm in [32], the squaring in TD $z_k = |x_k|^2 = x_k x_k^*$ can also be described as a convolution in FD as

$$\tilde{\underline{z}}_n = \sum_{m=0}^{N-1} \tilde{\underline{x}}_m \tilde{\underline{x}}_{m-n}^* \,. \tag{4.24}$$

Note that the complex conjugate TD signal is considered and thus the algorithm can also be applied to complex-valued modulation formats. Although PAM signals are real-valued, the complex conjugation in the FD must be considered in order to obtain the power correlation. At this point, we would like to highlight that under severe bandwidth limitations (e.g., FTN signaling), the ADC is capable of capturing the full signal spectrum even with baud-rate sampling, hence, enabling significant power savings. Following the initial sampling, digital resampling can be employed to facilitate oversampled digital clock recovery. Accordingly, the 4-th power version of the mod-BAJ algorithm results in

$$\begin{aligned}
\hat{\tau}_{\text{mod-4P-BAJ}} &= \frac{1}{2\pi} \arg \left\{ -\sum_{n=0}^{N/2-1} \tilde{z}_n \tilde{z}_{n+\frac{N}{2}}^* \right\} \\
&= \frac{1}{2\pi} \arg \left\{ -\sum_{n=0}^{N/2-1} \left[ \sum_{m=0}^{N-1} \tilde{x}_m \tilde{x}_{m-n}^* \right] \left[ \sum_{m=0}^{N-1} \tilde{x}_m \tilde{x}_{m+\frac{N}{2}-n}^* \right]^* \right\}.
\end{aligned} \tag{4.25}$$

Finally, we include the power fading correction from eq. (4.22) to obtain the CD-tolerant version of the 4th-power BAJ TE as

$$\hat{\tau}_{\text{CD-4P-BAJ}} = \frac{1}{2\pi} \arg \left\{ -\sum_{n=0}^{N/2-1} \left[ \sum_{m=0}^{N-1} \tilde{\underline{y}}_{m} \tilde{\underline{y}}_{m-n}^{*} \right] \left[ \sum_{m=0}^{N-1} \tilde{\underline{y}}_{m} \tilde{\underline{y}}_{m+\frac{N}{2}-n}^{*} \right]^{*} \right\}. \quad (4.26)$$
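The equivalence between the squaring in TD and the correlation in FD of eq. (4.24), on which eqs. (4.25) and (4.26) rest, can be checked numerically. The Python sketch below uses NumPy's unnormalized FFT convention, which introduces a factor 1/N that eq. (4.24) omits; this scaling does not affect the phase of the clock tone. The function names and the random test vector are illustrative assumptions.

```python
import numpy as np

def squared_spectrum_fd(X):
    """Spectrum of |x|^2 computed in FD as the circular correlation of eq. (4.24)."""
    N = len(X)
    idx = np.arange(N)
    # The index (m - n) is taken modulo N, i.e., the correlation is circular.
    return np.array([np.sum(X * np.conj(X[(idx - n) % N])) for n in range(N)]) / N

def mod_4p_baj(x):
    """Timing estimate of eq. (4.25) with the squaring carried out in TD."""
    N = len(x)
    Z = np.fft.fft(np.abs(x) ** 2)
    return np.angle(-np.sum(Z[: N // 2] * np.conj(Z[N // 2 :]))) / (2 * np.pi)

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
Z_td = np.fft.fft(np.abs(x) ** 2)          # squaring in TD, then FFT
Z_fd = squared_spectrum_fd(np.fft.fft(x))  # FFT, then correlation in FD
print(np.allclose(Z_td, Z_fd))             # prints True
```

The direct FD evaluation requires on the order of $N^2$ multiplications per block, whereas squaring in TD requires only N, which is one reason for preferring the TD squaring in this sketch. For the CD-tolerant version of eq. (4.26), the sign correction of eq. (4.22) has to be applied in FD before the nonlinear operation.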

In analogy to the CD-BAJ in Fig. 4.3(a), Fig. 4.4(a) shows the absolute mean estimation error including the error's $\pm \sigma$ confidence interval over the accumulated CD for the CD-BAJ, mod-4P-BAJ, and the CD-tolerant version CD-4P-BAJ for a 112-GBd PAM4 signal with a 2% RRC roll-off. For the CD-BAJ, we calculate the spectral correlation only for the non-zero frequency components, whose number is given by the RRC roll-off. For a roll-off of 0.02, the spectral correlation is calculated for only 3 frequency components [J2]. Due to the narrower signal bandwidth, the sign changes of the cosine product are correspondingly less frequent and are again indicated by the gray area. Around these positions, the narrow overlap of the signal spectrum is canceled out by the power fading, which is why the CD-BAJ produces incorrect timing estimates with high standard deviation. While the mod-4P-BAJ works for extremely bandwidth-limited signals, the power fading leads to an incorrect and highly uncertain timing estimate over the entire range of accumulated CD values. By first compensating the sign of the power fading again, the CD-4P-BAJ algorithm becomes much more robust against CD.

The improved robustness of the CD-4P-BAJ comes at the price of increased computational effort compared to the CD-BAJ. The question therefore arises when it performs better (apart from the cases in which clock tone extinction occurs at around 20, 60, and 100 ps/nm). For this purpose, we simulated the jitter and swept the RRC roll-off and the Bessel filter bandwidth for the CD-BAJ and CD-4P-BAJ in Fig. 4.4(b) for a constant accumulated CD of 0 ps/nm and 30 ps/nm and an SNR of 20 dB. When comparing the jitter between 0 and 30 ps/nm, a penalty due to the nonlinear interference terms can be observed. The CD-4P-BAJ outperforms the CD-BAJ for roll-offs smaller than 0.06, while the CD-4P-BAJ performance rapidly degrades for roll-offs larger than about 0.08 due to aliasing caused by the squaring in TD. Sweeping the 6-dB bandwidth at a 0.02 roll-off reveals the superior performance of the 4-th power method compared to the mod-BAJ algorithm. The penalty caused by nonlinear interference can be reduced by additional averaging over consecutive timing estimate realizations, i.e., by buffering and averaging the complex values before calculating the angle [C3]. With an MA over 8 clock tones, the jitter can be significantly reduced, here shown by the dashed lines. The CD-4P-BAJ then performs better for roll-offs smaller than 0.05.
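The averaging of consecutive clock tones before the angle is computed can be sketched as follows. This is a minimal illustration with synthetic, noisy clock tones; the noise level, the offset of 0.1 symbol periods, and the function name are assumptions and do not reproduce the simulation of Fig. 4.4(b).

```python
import numpy as np

def averaged_timing_estimates(clock_tones, taps=8):
    """Moving average over complex clock tones, then one angle per averaged tone."""
    kernel = np.ones(taps) / taps
    smoothed = np.convolve(clock_tones, kernel, mode="valid")
    return np.angle(smoothed) / (2 * np.pi)

rng = np.random.default_rng(0)
tau = 0.1
noise = 0.3 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
tones = np.exp(1j * 2 * np.pi * tau) + noise      # one noisy clock tone per block
raw = np.angle(tones) / (2 * np.pi)               # estimates without averaging
avg = averaged_timing_estimates(tones)            # estimates after the 8-tap MA
print(raw.std(), avg.std())
```

Averaging the complex values instead of the angles keeps the operation robust against phase wrapping: estimates close to $+0.5$ and $-0.5$ symbol periods represent nearly the same clock phase, yet their arithmetic mean would be close to zero.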



Fig. 4.4: Clock recovery performance for a 112-GBd PAM4 signal with 2% roll-off RRC pulse shape and 78-GHz 6-dB bandwidth limitation. (a) Absolute mean timing estimate error (dark blue line) with its  $\pm \sigma$  confidence interval indicated by the light blue area around it computed from 100 waveforms over accumulated CD and without AWGN. (b) Jitter over RRC roll-off and component bandwidth normalized to the symbol rate computed from 50,000 waveforms with 20 dB SNR as well as 0 ps/nm and 30 ps/nm accumulated CD. The dashed lines show the jitter when an 8-tap MA filter is applied to the clock tone.

### 4.5 Experimental Validation

In this section, we experimentally validate our model for the effect of CD on clock recovery and our proposed algorithms. For this purpose, we set up an IM/DD system as depicted in Fig. 4.5. At the transmitter and receiver sides, we use the Keysight USPA FPGA platform, equipped with a DAC and an ADC (formerly Micram DAC3 and ADC3), respectively. We generate a 34-GBd PAM4 signal (limited by available hardware) using the FPGA with either an NRZ or an RRC pulse shape with 0.1 roll-off. The DAC has a nominal 6-dB bandwidth of 28 GHz and samples at 68 GSa/s; thus, NRZ pulses are generated by simply repeating each symbol twice. The amplified analog signal is then used to modulate the light emitted by an external-cavity laser (ECL) at 1550 nm utilizing a Mach-Zehnder modulator (MZM) with a 3-dB bandwidth of 25 GHz and an ER of 22 dB according to the datasheet. To realize various values of accumulated dispersion, we use 10.56-km-long fiber spools combined with a wavelength-selective switch (WSS), which allows fine-tuning within $\pm 100$ ps/nm. The fiber spools have an estimated CD coefficient of 16.3 ps/nm/km, i.e., a single spool, a concatenation of two spools, and a concatenation of three spools feature a total accumulated dispersion of 172 ps/nm, 344 ps/nm, and 516 ps/nm, respectively. Therefore, the dispersion can be set in a range from -100 to 616 ps/nm. The optical power at the output of the WSS is 3.6 dBm before the fiber spools. Finally, a variable optical attenuator (VOA) allows setting a constant received optical power of -2 dBm for 10.56 km and 21.12 km fiber distance and -4.5 dBm for 31.68 km. At the receiver side, a 27-GHz 3-dB bandwidth optical detector (Optilab PR-40G-M) consisting of a photodiode and a TIA is used to convert the optical signal into a voltage. Finally, the ADC with a nominal 3-dB bandwidth of about 37 GHz samples the received signal at 68 GSa/s, which is stored in the receiver FPGA for offline processing.
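The adjustable dispersion range follows from simple arithmetic on the spool length, the CD coefficient, and the tuning range of the WSS. A short Python sketch with the values stated above (the constant and function names are hypothetical):

```python
SPOOL_LENGTH_KM = 10.56      # length of one fiber spool
CD_COEFF_PS_NM_KM = 16.3     # estimated CD coefficient of the spools
WSS_RANGE_PS_NM = 100.0      # fine-tuning range of the WSS (plus/minus)

def dispersion_range(num_spools):
    """Minimum and maximum accumulated dispersion in ps/nm for a number of spools."""
    fiber = num_spools * SPOOL_LENGTH_KM * CD_COEFF_PS_NM_KM
    return fiber - WSS_RANGE_PS_NM, fiber + WSS_RANGE_PS_NM

for k in range(4):
    lo, hi = dispersion_range(k)
    print(k, round(lo), round(hi))
```

With zero to three spools, the ranges are -100 to 100, 72 to 272, 244 to 444, and 416 to 616 ps/nm, so that adjacent ranges overlap and the full span from -100 to 616 ps/nm is covered.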

Fig. 4.5: Experimental IM/DD setup to investigate the impact of accumulated dispersion on the clock recovery. The amount of accumulated dispersion is set by using different fiber spools together with a WSS.

Since the group delay of the channel varies with the dispersion setting applied at the WSS during the experiment, the corresponding constant sampling phase estimate also differs for each dispersion value. Therefore, we evaluate the timing estimates separately for selected characteristic dispersion levels. To this end, the simulated timing estimates as a function of accumulated CD for the mod-BAJ and the CD-BAJ for 34 GBd NRZ signals considering a 25-GHz 3-dB bandwidth Bessel lowpass filter are shown in Fig. 4.6(a). Similarly, we compare the CD-BAJ and CD-4P-BAJ for RRC-shaped signals with 10% roll-off in Fig. 4.7(a). We then select certain characteristic accumulated dispersion values (indicated by the orange symbols in Fig. 4.6(a) and Fig. 4.7(a)) and show the timing estimates over time obtained from the experimental waveforms in Fig. 4.6(b) and Fig. 4.7(b). In offline processing, we split the received waveforms from the experiment into blocks of N = 256 samples. Afterwards, the timing estimate is computed for each of these blocks using the clock recovery algorithms described above. Due to the strong bandwidth limitation, it is sufficient to consider only a part of the frequency components in the summation of the correlation (eq. (4.21) and eq. (4.23)) [35] for the NRZ signal. For this reason, we use only the center N/4 frequency components in the summation over n in order to reduce the number of frequency components outside the signal spectrum, which primarily contain noise and nonlinear interference terms, and to reduce the computational complexity. We compare the timing estimation for four dispersion values by means of conventional clock recovery using the mod-BAJ and using the proposed CD-tolerant BAJ algorithm in Fig. 4.6(b) for the NRZ signal. The experimental results are in very good agreement with our theoretical considerations and the simulation in Fig. 4.6(a). For 195 ps/nm, indicated by a circle, the $\pi$-phase shift can be recognized at low estimation deviation. For about 308 ps/nm, indicated by a diamond, the sum over the cosine products approaches zero, leading to a cancellation of the clock tone, so that nonlinear interference terms become more pronounced. The conventional clock recovery fails, while the proposed CD-tolerant algorithm still provides a timing estimate with low deviation. At 403 ps/nm, there is no $\pi$-phase shift and both algorithms provide identical estimates. Finally, at 500 ps/nm the clock tone extinction starts again, but the CD-tolerant BAJ continues to perform well.
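The block-wise offline processing with a restricted set of frequency bins can be sketched as follows. The bin range from N/8 to 3N/8-1 is one possible reading of "center N/4 frequency components" and is an assumption, as are the function name and the synthetic waveform, which is built from independent noise-free blocks instead of a recorded waveform.

```python
import numpy as np

def blockwise_estimates(x, N=256, bins=None):
    """Timing estimate per block of N samples using eq. (4.21) on selected bins."""
    if bins is None:
        bins = np.arange(N // 8, 3 * N // 8)         # assumed "center N/4" bins
    blocks = x[: len(x) // N * N].reshape(-1, N)     # drop the incomplete last block
    X = np.fft.fft(blocks, axis=1)
    R = np.sum(X[:, bins] * np.conj(X[:, bins + N // 2]), axis=1)
    return np.angle(R) / (2 * np.pi)

# Synthetic stand-in for a recorded waveform: four independent noise-free blocks.
rng = np.random.default_rng(0)
N, tau = 256, 0.15
fn = np.fft.fftfreq(N)
parts = []
for _ in range(4):
    s = np.zeros(N)
    s[::2] = rng.choice([-3.0, -1.0, 1.0, 3.0], size=N // 2)
    X = np.fft.fft(s) * np.cos(np.pi * fn) ** 2 * np.exp(1j * 4 * np.pi * fn * tau)
    parts.append(np.fft.ifft(X).real)
est = blockwise_estimates(np.concatenate(parts))
print(est)
```

For the CD-tolerant variant, the sign correction of eq. (4.22) would be applied to `X` before the correlation. For a real recorded waveform, the blocks are not periodic, so the estimates scatter around the true offset instead of matching it exactly as in this synthetic case.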

Next, we consider the experimental clock recovery performance for an RRC-shaped signal with 0.1 roll-off to observe the third effect mentioned in section 4.2.2. Since the signal spectrum is now narrower, it is sufficient to sum only over the non-zero frequency components, i.e., where $\tilde{\gamma}_n \tilde{\rho}_n > 0$, for the CD-BAJ algorithm. In this case, we sum over 12 out of 128 frequency components per sideband. As aliasing starts to decrease the performance of the 4-th power method, only the middle N/4 bins in the outer sum over index n in eq. (4.26) are evaluated. Since fewer bins are used, the influence of noise on the estimate is more significant. For this reason, an additional 8-tap MA of the spectral correlation is used. As the frequency range of the overlapping RRC-shaped sidebands is small for such a signal, a $\pi$-phase shift is less frequent, but a clock tone extinction due to spectral dips caused by power fading is possible. For a 34-GBd signal, this can be observed in simulation in Fig. 4.7(a) at around 200 ps/nm by an increased uncertainty of the timing estimate. This observation is also confirmed in the experiment when considering the timing estimate for 195 ps/nm using the CD-BAJ and the CD-4P-BAJ. The proposed CD-BAJ shows a high standard deviation due to the few or no frequency bins over which it can average, while the CD-4P-BAJ exhibits a significantly lower standard deviation and is therefore tolerant to both the $\pi$-phase shift and the clock tone extinction due to the spectral nulls falling in the overlap region. At 322 ps/nm, there are enough frequency bins available for the CD-BAJ to estimate the timing with low uncertainty. However, the CD-4P-BAJ shows a higher uncertainty caused by aliasing, which is the price to pay when using this algorithm. This effect is also reflected in Fig. 4.7(b), where the CD-4P-BAJ shows more than 10 dB worse jitter compared to the CD-BAJ at a roll-off of 0.1. Accordingly, the CD-4P-BAJ is only advantageous in the case of strong bandwidth limitation, either caused by a narrowing of the signal spectrum when the spectral dips fall directly on the edges of the signal spectrum, or in the case of extremely low roll-off factors or FTN signals.

**Fig. 4.6:** Timing estimation results for 34-GBd PAM4 NRZ signals. (a) Simulated timing estimate as a function of accumulated CD obtained using the mod-BAJ (blue) and CD-BAJ (red) algorithms. Characteristic dispersion values are indicated by yellow markers. (b) Timing estimates obtained by offline processing of recorded waveforms from the experiment for the four characteristic dispersion values indicated in (a). For a block size of N = 256 samples, N/4 frequency bins per sideband are evaluated for the mod-BAJ (red line) and CD-BAJ algorithm (blue line), respectively.



Fig. 4.7: Timing estimation results for 34-GBd PAM4 signals exhibiting an RRC pulse shape with 0.1 roll-off. (a) Simulated timing estimate as a function of accumulated CD obtained using the CD-BAJ (blue) and CD-4P-BAJ (red) algorithms. Characteristic dispersion values are indicated by yellow markers. (b) Timing estimates obtained by offline processing of recorded waveforms from the experiment for the two characteristic dispersion values indicated in (a). For a block size of N = 256 samples and an 8-tap MA of the spectral correlation, 12 frequency bins per sideband are evaluated for the CD-BAJ (red line) and N/4 frequency bins per sideband for the CD-4P-BAJ algorithm (blue line), respectively.

[End of paper [J3]. The paper's conclusion is added to chapter 8.]

### 5 Nanosecond Clock Synchronization for Passive Optical Networks

This chapter discusses the advantages of a fully digital FF clock recovery in future high-speed PONs equipped with a DSP. Parts of the results of this chapter were presented at the *Optical Fiber Communication Conference (OFC)* in 2024 [C4]. The material from the publication has been adapted to comply with the layout and the structure of this thesis.

#### 5.1 Introduction

As discussed in chapter 2, the clock recovery can be implemented either in an FB architecture based on a PLL, in an FF architecture, or in a combination of both approaches. Under the prerequisite that relatively stable oscillators are used (±20 ppm for 400G-ZR standardized transceivers [5, 6]) and data transmission is continuous, FB structures result in stable phase tracking and are therefore commonly used in optical communications, e.g., in long-haul point-to-point systems [7]. They are also frequently deployed in short-reach systems because of their low complexity [8, 9]. However, due to their relatively long acquisition time, such control loops might not meet the stringent requirements for fast synchronization in burst-switched systems, such as PONs [10, 11] and data centers [12], or in systems that are affected by link outages, e.g., free-space optical communications under atmospheric turbulence [13, 83] or optical camera communications [C6]. In this case, FF schemes can be beneficial due to their instantaneous timing estimation and their improved high-frequency jitter performance, especially when using low-cost oscillators that feature wider linewidths and lower frequency adjustment accuracies [C2, 15].

In PONs, the timing estimate is typically derived from a preamble [73, 84]. In [85], the authors showed that a NDA FF algorithm makes it possible to shorten or even avoid the preamble. A prominent example of such an algorithm is the square-timing-recovery algorithm proposed by Oerder and Meyr [36]. However, since this algorithm requires an oversampling factor larger than two, it is considered too computationally complex in comparison to FB algorithms, which are often implemented at twofold oversampling. As shown in chapter 2, there also exist FF algorithms that require only an oversampling ratio sufficient to resolve the signal's bandwidth. Due to their fast synchronization (see section 2.3.4) and their ability to compensate high-frequency jitter, i.e., large CFOs (see section 2.4.3), digital FF clock recovery can potentially replace analog CDR and offer fast synchronization in burst-switched systems.

In this work, a nanosecond-scale synchronization in high-speed IM/DD PONs is presented. The synchronization within 36.57 ns at the OLT side of two ONUs in burst-mode upstream is demonstrated. To do so, 56-GBd NRZ, PAM2 and PAM4 signals are modulated and transmitted in C-band at about 1540 nm over 2.2 km single-mode fiber (SMF) with an estimated CD coefficient of  $D_{\rm CD} \approx 10.5~{\rm ps/nm/km}$ . Furthermore, we experimentally evaluate the synchronization performance for various CFOs.

## 5.2 PONs Featuring Free-Running Oscillators with All-Digital Clock Recovery

With the standardization of the high-speed 50G-PON, an ADC and DSP are introduced to PONs [86]. This allows the replacement of analog CDR by digital clock recovery and adaptive equalization. The clock recovery can be implemented either in a FB architecture, in a FF architecture, or in a combination of both approaches. In this thesis, a future high-speed PON which employs digital and NDA clock recovery with free-running oscillators at the OLT and ONU is envisioned. This allows bulky LF capacitors of analog CDR circuits to be avoided and enables a higher tolerance to channel effects (see chapter 4), noise, temperature, and voltage variations through a purely-digital implementation. Furthermore, a deep-submicron CMOS technology facilitates scalability [30].

The envisioned PON system is depicted in Fig. 5.1. At the OLT, a highly-stable, free-running clock with low jitter generation is used. In order to synchronize all ONU clocks to the same OLT clock, the ONUs utilize the downstream signal addressed to all ONUs for digital, NDA clock synchronization. The ability of a FF architecture to compensate for high-frequency jitter allows the use of low-cost oscillators at the ONU (see chapter 2). This leads to improved compliance with the specified jitter requirements of the PON standards and thus to a low jitter transfer to the ONU. The all-digital clock recovery at the ONU side compensates for any frequency and phase offset between the transmitter and receiver clock in the digital domain (see chapter 3). This allows clock synchronization to be implemented fully digitally and benefits from the tremendous progress in integration and power reduction of CMOS circuits and thus from potential size, power, and cost savings in ONUs. The digitally-synchronized clock can then also be used to process the upstream signal. In burst-mode upstream, a FF clock recovery at the OLT again ensures low jitter transfer, and the high clock recovery bandwidth enables a nanosecond-scale synchronization of the constant time delay for each received burst, as the fiber distance from each ONU to the OLT may differ. Using an NDA algorithm allows the clock phase to be tracked during the whole duration of a burst and not only during the preamble. Note that the algorithms discussed in this work can be used for any higher-order modulation format and can also be applied to complex-valued signals in coherent PONs. This flexibility is a further advantage of digital clock recovery implementations over conventional analog CDR circuits, which are designed specifically for OOK. In coherent PONs, the preamble length increases as it is required for polarization and frequency estimation as well [85]. At the same time, the total burst duration may shorten, as future PONs may serve up to 256 network users, e.g., for new types of applications in smart cities [J1]. Therefore, efficient preamble design is crucial, and NDA clock recovery may help to shorten or even avoid the preamble for the clock synchronization [85].



Fig. 5.1: Envisioned PON employing fully digital NDA clock recovery, which allows the use of free-running oscillators. FE: front-end.

# 5.3 Experimental Performance Evaluation for PON Upstream

Fig. 5.2(a) illustrates the setup for a 56-GBd burst-mode transmission of a loud and a silent burst. In the ONU, a DFB laser at 1540 nm is modulated by an EAM, which is driven by an arbitrary waveform generator (AWG) with a 28 GHz 6-dB bandwidth running at 56 GSa/s (Keysight USPA real-time prototyping platform). The digital signal is a 56-GBd NRZ sequence with PAM2 modulation at ONU-1 and PAM4 modulation at ONU-2. A symbol rate of 56 GBd is chosen due to limitations in the available hardware. The available memory of the AWG allows generating bursts with a length of 4.5812 µs (128,272 symbols), followed by a pause of 4.7812 µs.

The signal of ONU-1 is transmitted over a 2.2-km SMF and is combined with the signal from ONU-2 in an optical coupler. A VOA then sets the received optical power in front of the OLT. The receiver first amplifies the signal using a semiconductor optical amplifier (SOA) followed by a 3-nm-wide bandpass filter. Afterwards, the optical signal is detected by a 40-GHz positive-intrinsic-negative (PIN) photodiode followed by an electrical amplifier and captured by a 33-GHz real-time oscilloscope sampling at 80 GSa/s. The received signal spectrum is shown as an inset in Fig. 5.2(a), where the strong attenuation beyond 33 GHz is caused by the bandwidth limitation of the oscilloscope. Finally, offline DSP is applied to resample the signal to twofold or fourfold oversampling, as required for the clock recovery algorithms presented in section 2.3.1. Unless otherwise stated, all clock recovery algorithms have a block size of 128 symbols. To reduce the impact of noise in the FF methods, an additional 16-tap MA filter is added prior to computing the angle. For the PLL, the parameters  $B_{\rm L}$  = 0.005,  $\zeta_{\rm L}$  = 0.707, and  $D_{\rm L} = 20$  are used. After clock recovery, a 21-tap linear FF equalizer [87] is applied, and hard decision and BER testing are performed. Fig. 5.2(b) shows the receiver sensitivity after DSP using the Zhu algorithm (see eq. (2.37)) for clock recovery and for continuous transmission of a single ONU in optical back-to-back (Btb) as well as over 2.2 km ( $LD_{\rm CD} \approx 23.1 \, \mathrm{ps/nm}$ ) and 10.56 km ( $LD_{\rm CD} \approx 110.88 \, \mathrm{ps/nm}$ ) fiber in C-band ( $D_{\rm CD} \approx 10.5\,{\rm ps/nm/km}$ ). For 10.56 km, a 64-tap MA is used to improve the timing estimation, since this fiber length appears to be close to a tipping point where a  $\pi$ -phase shift occurs, as mentioned in chapter 4. Moreover, a 51-tap FF equalizer is utilized to compensate the increased channel memory caused by CD.
For PAM2, in Btb and over  $2.2 \, \text{km}$ , the  $10^{-2}$  forward error correction (FEC) limit [88] is obtained at -22 dBm and for 10.56 km at around -16 dBm received optical power. For PAM4, the  $2 \times 10^{-2}$  FEC limit [89] is obtained at around  $-13 \, \mathrm{dBm}$ received optical power. Here, the nonlinear gain saturation caused by the SOA is the main limiting factor towards high optical powers and may be reduced by using sophisticated nonlinear equalizers or neural networks [C8]. Further improvement of the dynamic range can be achieved by more sensitive optical receivers, e.g., using avalanche photodiodes instead of PIN photodiodes as well as adding a TIA after photodetection.



Fig. 5.2: (a) Experimental setup for PON burst-mode upstream with received signal spectrum for -15 dBm received optical power and 2.2 km and 10.56 km fiber lengths. (b) Receiver sensitivity for PAM2 and PAM4 modulation using the Zhu TE.

In order to analyze the potential for compensating high-frequency jitter, the transmitter clock frequency for a single ONU in continuous transmission mode over 2.2 km fiber and PAM2 modulation is detuned. The received optical power is set to -15 dBm. To calculate the BER without the convergence time of the PLL-based clock recovery and adaptive equalizer, only the last  $5 \times 10^5$  symbols are evaluated, which limits the minimum observable BER to  $2 \times 10^{-6}$ . The results are depicted in Fig. 5.3 using box plots generated from 30 measured waveforms, to also account for PLL instability. Given the clock recovery parameters, the timing estimate and the error signal are updated at a rate of 56 GHz/128 = 437.5 MHz. For the FF algorithms, the approximated 3-dB bandwidth of the MA filter is  $437.5 \,\mathrm{MHz}/(2\times16) \approx 13.7 \,\mathrm{MHz}$ , which corresponds to 244 ppm, and the first spectral dip lies at around  $437.5 \, \text{MHz} / 16 \approx 27.3 \, \text{MHz}$ , which corresponds to 488 ppm. This becomes apparent in Fig. 5.3, where the BER gradually degrades for a CFO between 244 ppm and 488 ppm. For the FB algorithms, the approximated inherent 3-dB bandwidth of the PLL amounts to  $437.5 \, \text{MHz} \times 0.005 \approx 2.2 \, \text{MHz}$ , which corresponds to around 39 ppm. However, this bandwidth can only be achieved for an inner-loop delay of  $D_{\rm L}$  = 1, as explained in section 2.3.4. Since  $D_{\rm L}$  = 20, the total loop bandwidth is reduced, which also matches the observation. For the Godard algorithm, the PLL tends to become unstable more frequently, even though both the Godard and Gardner algorithms are implemented similarly and behave identically in simulations. This highlights the necessity of precise PLL design for FB clock recovery. It is evident that FF schemes can accommodate significantly larger CFOs, resulting in a substantially higher jitter tolerance, in contrast to FB schemes.
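The bandwidth figures quoted above follow from a few lines of arithmetic. The following Python sketch reproduces them; the variable names are illustrative, and the conversion to ppm assumes that the bandwidths are referred to the 56-GHz symbol clock, which is consistent with the numbers in the text.

```python
# Illustrative recomputation of the numbers quoted in the text.
symbol_rate_hz = 56e9      # 56-GBd symbol clock
block_symbols = 128        # block size of the clock recovery
ma_taps = 16               # MA filter length of the FF schemes
pll_bw_norm = 0.005        # normalized loop bandwidth B_L of the FB schemes
evaluated_symbols = 5e5    # symbols used for BER counting

# One timing estimate (FF) or error sample (FB) per block.
update_rate_hz = symbol_rate_hz / block_symbols

# Approximations used in the text for the MA filter and the PLL.
ma_bw_hz = update_rate_hz / (2 * ma_taps)   # approximate 3-dB bandwidth
ma_null_hz = update_rate_hz / ma_taps       # first spectral dip
pll_bw_hz = update_rate_hz * pll_bw_norm    # inherent PLL bandwidth (D_L = 1)


def to_ppm(freq_hz):
    """Express a frequency relative to the symbol clock in ppm (assumption)."""
    return freq_hz / symbol_rate_hz * 1e6


# Smallest BER that can be resolved with the evaluated number of symbols.
min_ber = 1 / evaluated_symbols

print(f"update rate : {update_rate_hz / 1e6:.1f} MHz")
print(f"MA 3-dB BW  : {ma_bw_hz / 1e6:.1f} MHz ({to_ppm(ma_bw_hz):.0f} ppm)")
print(f"MA 1st null : {ma_null_hz / 1e6:.1f} MHz ({to_ppm(ma_null_hz):.0f} ppm)")
print(f"PLL BW      : {pll_bw_hz / 1e6:.1f} MHz ({to_ppm(pll_bw_hz):.0f} ppm)")
print(f"min. BER    : {min_ber:.0e}")
```

The sketch only restates the approximations used in the text; it does not model the additional bandwidth reduction caused by an inner-loop delay larger than one.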

Fig. 5.3: BER performance for an algorithm block length of 128 symbols for various CFOs, measured from 30 waveforms after transmission over 2.2 km fiber and at a received optical power of -15 dBm.

In Fig. 5.4, the clock recovery performance for the time-domain Zhu and Gardner algorithms using two ONUs in burst-mode transmission at a received optical power of -5 dBm is evaluated. The FB control loop is implemented with  $B_{\rm L}$  = 0.05 to allow a faster convergence and a FB delay of  $D_{\rm L}$  = 1. In the first row of Fig. 5.4, the block length is set to  $M_{\rm B}$  = 128 symbols and the CFO is set to 1 ppm. It can clearly be seen that the Zhu algorithm delivers an instantaneous timing estimation, while the Gardner algorithm oscillates towards the correct clock phase at the beginning of the bursts. Considering the settling time of the 16-tap MA filter, the total synchronization time of the Zhu algorithm amounts to  $8.93 \text{ ps} \times 256 \times 16 = 36.57 \text{ ns}$ . An even further reduction of the synchronization time can be achieved by using overlapping blocks of samples. When increasing to  $M_{\rm B}$  = 2048 (second row of Fig. 5.4), the synchronization speed is reduced by a factor of eight, which can be seen from the smoother transitions of the estimated clock phase for the FF algorithm. For the Gardner algorithm, the PLL becomes unstable and cannot follow the clock phase anymore. Finally, the CFO is set to 20 ppm. The algorithm block length is changed back to  $M_{\rm B}$  = 128 symbols and the FB delay is set to a more realistic value of  $D_{\rm L}$  = 50, requiring a reduced loop bandwidth of  $B_{\rm L}$  = 0.005 to achieve stable operation (third row of Fig. 5.4). While the FF synchronization can still track the fast clock drift, the FB scheme cannot follow the phase drift anymore.
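The synchronization time stated for the Zhu algorithm with a block length of 128 symbols can be reproduced with the short Python sketch below. It assumes non-overlapping blocks at twofold oversampling and that the MA filter has to be filled completely before the estimate is considered settled; the function and variable names are illustrative.

```python
# Illustrative recomputation of the FF synchronization time.
symbol_rate_hz = 56e9   # 56 GBd
oversampling = 2        # samples per symbol used by the timing estimator
ma_taps = 16            # MA filter length


def sync_time_s(block_symbols):
    """Time needed to fill the MA filter with non-overlapping blocks."""
    sample_period_s = 1 / (oversampling * symbol_rate_hz)   # about 8.93 ps
    samples_per_block = oversampling * block_symbols        # 256 for 128 symbols
    return sample_period_s * samples_per_block * ma_taps


t_sync_128 = sync_time_s(128)
print(f"synchronization time: {t_sync_128 * 1e9:.2f} ns")
```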



Fig. 5.4: Received burst-mode signal (blue) at twofold oversampling and estimated sampling offset (red) over time for the Zhu and Gardner algorithms for various CFOs and inner-loop delays  $D_{\rm L}$ .

# 6 Non-Data-Aided Clock Recovery for Continuous-Variable Quantum Key Distribution

This chapter investigates the use of digital NDA clock recovery at extremely low SNR, e.g., for CV-QKD systems. Since NDA clock recovery algorithms often fail under such difficult conditions, auxiliary pilot tones are usually sent alongside the actual QKD signal, which increases the system complexity. The content of this chapter was presented at the *European Conference on Optical Communications (ECOC)* in 2023 [C3]. The material from the publication has been adapted to comply with the layout and the structure of this thesis.

[Beginning of paper [C3]]

### Pilot-Free Digital Clock Synchronization for Continuous-Variable Quantum Key Distribution

P. Matalla, M. S. Mahmud, C. Koos, and S. Randel

European Conference on Optical Communications (2023) DOI: https://doi.org/10.1049/icp.2023.2552

#### 6.1 Introduction

Fundamentally secure communication networks are essential for protecting our economy and society from cyber threats. In recent years, a number of QKD systems have been successfully demonstrated, and first commercial products have been deployed by government agencies and authorities. In the longer term, a quantum communication infrastructure could enable additional functionalities alongside QKD, such as digital signatures, authentication, and secret sharing schemes like e-voting [90]. QKD was first demonstrated with single photons, where information is encoded, e.g., into their polarization or phase, and the secret key is established upon detection of individual photons. This so-called DV-QKD requires, however, dedicated hardware components such as single-photon detectors. Recently, an alternative approach, referred to as CV-QKD, has attracted significant attention in the research community, since it allows the reuse of components like inphase and quadrature modulators (IQMs) and balanced photodiodes (BPDs) originally developed for the telecommunications market [91].

In contrast to classical telecommunication links, CV-QKD links are operated in the vacuum noise limit at an SNR of  $-10\,\mathrm{dB}$  or below, with a noise bandwidth matching the symbol rate. This makes it necessary to revisit the DSP algorithms for coherent optical receivers. In systems adding the local oscillator (LO) at the receiver, the carrier frequency and phase recovery as well as the polarization de-rotation can be solved by adding pilot tones or symbols [92]. Moreover, the symbol clock frequency and phase need to be recovered at the receiver side, which can be achieved by adding additional pilots [93].

In this paper, we demonstrate through numerical simulations and by evaluating measured waveforms that it is possible to recover the symbol timing from a 1-GBd quadrature phase-shift keying (QPSK) signal detected with a heterodyne coherent receiver even close to the receiver noise floor while the CFO is as large as 10 ppm. We obtain this in an optimized FF clock recovery structure with a TE based on the modified BAJ algorithm [C1, C2, 40].

### 6.2 Pilot-Free Digital Timing Synchronization

As discussed in section 2.3.1.2, a time delay  $\tau$  of a received signal x(t) corresponds to a linear phase shift  $\exp(\mathrm{j}\,2\pi f\tau)$  in FD. The algorithm investigated in this work exploits the spectral redundancy of a QPSK signal with RRC spectral shape with roll-off factor  $\rho$  in order to estimate the timing phase. To do so, the N-point FFT  $\tilde{x}_n$  of the sampled receive signal is computed. Afterwards, a clock tone is generated by calculating the spectral correlation of the left and right sideband. The phase of the clock tone is then proportional to the clock phase offset, and the timing estimate  $\hat{\tau}$  is obtained according to eq. (2.32). To reduce the computational complexity, it is sufficient to compute only the nonnegative frequency components that contribute to the correlation. Hence, the summation limits in eq. (2.32) can be adapted, resulting in

$$\hat{\tau}_{\text{mod-BAJ}} = \frac{1}{2\pi} \arg \left\{ \sum_{n=N/4-\lfloor \rho N/4 \rfloor}^{N/4+\lfloor \rho N/4 \rfloor} \tilde{x}_n \tilde{x}_{n+\frac{N}{2}}^* \right\}. \tag{6.1}$$

Due to the correlation of the two sidebands, random noise is effectively suppressed while the clock tone is preserved. This effect is particularly useful in systems with extremely-low SNR, such as in CV-QKD or spread spectrum signals [C9]. To investigate the performance of the synchronization for low SNR, a simulation of the entire FF clock recovery architecture is implemented. Besides the TE according to eq. (6.1), this also includes the buffering of the signal, the interpolation as well as additional averaging. Fig. 6.1 shows the complete processing chain used for the balanced heterodyne detection in this work.
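As a minimal sketch of eq. (6.1), the following Python example evaluates the spectral correlation for a single noise-free block at twofold oversampling. The test signal, the helper `rrc_spectrum`, the function name `mod_baj_estimate`, and all parameter values are assumptions made for illustration; the sign of the estimate follows the phase-shift convention $\exp(\mathrm{j}\,2\pi f\tau)$ stated above. Buffering, averaging, and interpolation are not part of this sketch.

```python
import numpy as np


def rrc_spectrum(f, rho):
    """RRC magnitude response for a symbol rate of 1 (real, non-negative)."""
    af = np.abs(f)
    g = np.zeros_like(af)
    g[af <= (1 - rho) / 2] = 1.0
    roll = (af > (1 - rho) / 2) & (af <= (1 + rho) / 2)
    g[roll] = np.sqrt(0.5 * (1 + np.cos(np.pi / rho * (af[roll] - (1 - rho) / 2))))
    return g


def mod_baj_estimate(x, rho):
    """Timing estimate per eq. (6.1) for one block x at 2 samples per symbol."""
    N = len(x)
    X = np.fft.fft(x)
    half = int(np.floor(rho * N / 4))
    n = np.arange(N // 4 - half, N // 4 + half + 1)
    # The index n + N/2 is taken modulo N so that rho = 1 stays inside the FFT.
    clock_tone = np.sum(X[n] * np.conj(X[(n + N // 2) % N]))
    return np.angle(clock_tone) / (2 * np.pi)


# Noise-free toy signal (all values below are illustrative assumptions).
rng = np.random.default_rng(1)
N, rho, tau_true = 1024, 0.5, 0.2
symbols = (rng.choice([-1, 1], N // 2) + 1j * rng.choice([-1, 1], N // 2)) / np.sqrt(2)
upsampled = np.zeros(N, dtype=complex)
upsampled[::2] = symbols                 # zero insertion, 2 samples per symbol
f = np.fft.fftfreq(N, d=0.5)             # frequency in cycles per symbol
X = np.fft.fft(upsampled) * rrc_spectrum(f, rho) * np.exp(1j * 2 * np.pi * f * tau_true)
x = np.fft.ifft(X)                       # circularly shaped and shifted block

tau_hat = mod_baj_estimate(x, rho)
print(f"tau_true = {tau_true}, tau_hat = {tau_hat:.6f}")
```

In this noise-free case, every product in the sum carries the same phase $2\pi\tau$, so the estimate matches the applied offset. With noise, the products scatter around this phase, which is why averaging the complex correlation before taking the angle is beneficial.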

Fig. 6.1: Full DSP chain with FF clock synchronization.

For the FF architecture, the signal is split into two paths. The first is used to estimate the clock phase, while in the second the signal is buffered for the duration of the estimation and then delayed by an integer multiple m of the sampling period in an EB and additionally by a fractional sampling period  $\mu$  by a 5th-order Lagrange interpolator. The TE path starts with a buffer as well. This allows an overlapping of the N-sample-long blocks, such that the temporal resolution of the timing estimation is increased. In this work, an overlap of 50% is used. Next, the TE algorithm is applied. The oversampling ratio of the signal must be chosen such that no aliasing occurs. Here, an oversampling of two is chosen. In addition to optimizing the FFT size, the autocorrelation is smoothed by an MA filter of length  $N_{\rm tap}$  in the complex plane to make the timing estimation less susceptible to noise. The timing phase is unwrapped over several unit intervals and linearly interpolated according to the resolution of the signal sequence. Finally, a position calculator determines the integer and fractional delay as mentioned earlier. More detailed explanations of the hardware implementation on FPGAs and comparisons between different FF architectures are provided in [C1, C2].

For the simulation, a random QPSK symbol sequence is generated, which is twofold oversampled and pulse-shaped to an RRC spectrum. After adding AWGN, a constant sampling frequency offset is applied, followed by a receive filter matched to the RRC transmit filter. The performance metric is the mean-squared error (MSE) of the timing estimate  $\hat{\tau}$  with respect to the set sampling offset  $\tau$ , expressed in decibels.
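The fractional delay applied by the 5th-order Lagrange interpolator mentioned above can be illustrated with the textbook Lagrange fractional-delay design. The Python sketch below is a generic example and not necessarily the implementation used in this work; in particular, centering the fractional delay $\mu$ between the third and fourth tap and the function name `lagrange_fd_coeffs` are assumptions.

```python
import numpy as np


def lagrange_fd_coeffs(mu, order=5):
    """Coefficients h[k] of a Lagrange fractional-delay FIR filter.

    The filter output y[n] = sum_k h[k] * x[n - k] approximates x(n - delay)
    with delay = mu + (order - 1) // 2, i.e. mu + 2 for the 5th-order case.
    """
    delay = mu + (order - 1) // 2
    k = np.arange(order + 1)
    h = np.ones(order + 1)
    for i in range(order + 1):
        others = k != i
        # h[k] = prod_{i != k} (delay - i) / (k - i)
        h[others] *= (delay - i) / (k[others] - i)
    return h


# Check on a cubic polynomial, which a 5th-order interpolator reproduces exactly.
mu = 0.3
h = lagrange_fd_coeffs(mu)
t = np.arange(6, dtype=float)
x = t**3 - 2 * t
y = np.dot(h, x[::-1])                      # h[k] weights x[5 - k]
y_exact = (5 - (mu + 2)) ** 3 - 2 * (5 - (mu + 2))
print(f"interpolated: {y:.6f}, exact: {y_exact:.6f}")
```

For $\mu = 0$, the coefficients reduce to a pure delay of two samples, so the interpolator leaves the signal unchanged apart from its constant latency.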

Fig. 6.2(a) shows the simulation results for an SNR of -20 dB and a CFO of 0.5 ppm. In such a case, effective noise suppression is mandatory. To ensure that the clock tone extends over several frequency bins over which the correlation is formed, a high roll-off factor of  $\rho = 1$  is chosen. The FFT size as well as the MA filter length determine the noise reduction, but limit the synchronization bandwidth [C2]. It can be seen that the timing synchronization improves with larger FFT size and averaging. If the averaging is too large, the synchronization can no longer follow the CFO and the performance degrades. Fig. 6.2(a), right, shows the best performance for N = 8192 and  $N_{\rm tap}$  = 128 and demonstrates how the timing estimate follows the actual timing phase. Fig. 6.2(b) shows a scenario with a roll-off of 0.7, as it is used in the experiment. A higher SNR of  $-15 \, dB$  is simulated with a higher CFO of 1 ppm. The higher SNR allows the use of smaller FFT sizes and less averaging. At the same time, this enables synchronization at higher CFOs, and the timing estimate can successfully follow an offset of 1 ppm using N = 4096 and  $N_{\mathrm{tap}}$  = 256. Regarding the implementation complexity, the MA filter is simply an accumulating FIFO register and thus can be implemented without significant resources. The latency caused by the large FFT size and the averaging can be easily adjusted by the buffer in FF architectures, a feature which would not easily be possible in a FB architecture.
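The remark that the MA filter is an accumulating FIFO register can be made concrete with a small behavioral model. The Python sketch below is illustrative only (the class name `MovingAverageFIFO` is hypothetical, and this is not the FPGA implementation from [C1, C2]): each update adds the newest complex correlation value to an accumulator and subtracts the oldest one, so the cost per update is independent of the filter length.

```python
from collections import deque


class MovingAverageFIFO:
    """Running sum over the last n_tap complex values (behavioral model)."""

    def __init__(self, n_tap):
        # FIFO pre-filled with zeros; maxlen drops the oldest entry on append.
        self.fifo = deque([0j] * n_tap, maxlen=n_tap)
        self.acc = 0j

    def update(self, value):
        oldest = self.fifo[0]
        self.fifo.append(value)
        # One addition and one subtraction per update, regardless of n_tap.
        self.acc += value - oldest
        return self.acc


# Usage example with illustrative values: 4-tap smoothing of a complex sequence.
ma = MovingAverageFIFO(4)
values = [1 + 1j, 2 - 1j, 3 + 0j, -1 + 2j, 5 + 5j, 0.5 - 3j]
outputs = [ma.update(v) for v in values]
print(outputs[-1])
```

The running sum is returned without dividing by the filter length, since only the angle of the smoothed correlation enters the timing estimate and a positive scaling does not change it.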



**Fig. 6.2:** Simulated MSE between the estimated and actual sampling offset for an FFT size N and averaging length  $N_{\rm tap}$ . In (a), an SNR of -20 dB,  $\rho$  = 1, and a CFO of 0.5 ppm are used. In (b), an SNR of -15 dB,  $\rho$  = 0.7, and a CFO of 1 ppm are used.

### 6.3 Experimental Validation

Fig. 6.3 shows the experimental setup to validate the timing synchronization. Two pseudo-random binary sequences (PRBSs) of order 15 and with a bit rate of 1 Gbit/s are generated in real-time in the transmitter-side AMD ZCU208 RF-System-on-a-Chip (SoC) running at a sampling clock of 3932.16 MHz. Afterwards, the two bit streams are mapped to QPSK symbols and are pulse-shaped. The signal is amplified and the spectral images are suppressed by a lowpass filter (LPF) before they are fed into an IQM, which modulates the optical carrier generated by an ECL at a frequency of 193.489 THz with optical power of 12 dBm.

The optical signal is then attenuated using a VOA and transmitted over 19.4 km of standard SMF. At the receiver side, a free-running LO at 193.488 THz with 13 dBm optical power is mixed with the signal for heterodyne coherent reception in a BPD. To optimally drive the ADC of a receiver-side RF-SoC, an RF amplifier with 22 dB gain is used. The ADC is clocked by an external synthesizer to set a defined CFO of either  $10\,\mathrm{kHz}$ ,  $20\,\mathrm{kHz}$ , or  $40\,\mathrm{kHz}$ . After the ADC, the sampled signal is written into a block random-access memory (RAM) and read out for offline processing following the DSP chain shown in Fig. 6.1. Fig. 6.4(a) reveals that the CV-QKD signal operates close to the receiver noise floor. The receiver noise floor is determined by switching off the modulation at the transmitter. Accordingly, the noise floor includes the electrical noise together with the photodiode noise for a continuous-wave laser. As shown in Fig. 6.4(b), it is possible to detect and follow different CFOs despite the low SNR. For all CFOs, N=1024 and 64-fold averaging are applied.

[End of paper [C3]. The paper's conclusion is added to chapter 8.]



Fig. 6.3: Experimental setup for the CV-QKD quantum channel.



**Fig. 6.4:** (a) Spectra obtained for the CV-QKD system over a 19.4 km long quantum channel. (b) Timing estimates following various CFOs. (c) Exemplary QPSK constellation of the CV-QKD signal.

### 7 Joint Non-Data-Aided Clock Recovery for Space-Division Multiplexed Optical Transmission Systems

This chapter reports on a novel clock recovery algorithm that is robust against spatial-and-polarization-mode dispersion by exploiting the spatial diversity of the received signals in SDM systems. The chapter has been published in the *Journal of Lightwave Technology* [J2]. The material from the publication has been adapted to comply with the layout and the structure of this thesis. Associated supplementary information can be found in appendix C.1.

[Beginning of paper [J2]]

#### Joint Non-Data-Aided Clock Recovery for Space-Division Multiplexed Optical Transmission Systems

P. Matalla, J. Krimmer, L. Schmitz, D. Fang, C. Koos, and S. Randel

Journal of Lightwave Technology, Volume 43, Issue 13, pages 6128-6138 (2025) DOI: https://doi.org/10.1109/JLT.2025.3546721

In recent years, SDM has been proposed as a technique to cope with the increasing demand for higher per-fiber capacity in optical networks by modulating multiple independent signals onto multiple spatial paths. This can be accomplished by using specialized fibers that carry multiple signals in a number of fiber cores, fiber modes, or a combination of both types. In such fibers, strong spatial coupling of the signals requires joint DSP at the receiver. While research has mainly focused on system performance and multiple-input multiple-output (MIMO) equalizers, a reliable joint clock recovery tolerant to spatial-and-polarization-mode dispersion is an active field of research with recent progress. In this paper, we present a novel digital NDA joint clock recovery that is tolerant to polarization-and-spatial-mode dispersion. The joint clock recovery is implemented in a FF architecture, which allows simple implementation. We provide a detailed analysis of the algorithm complexity for hardware implementation. In simulations, we show low clock phase jitter for fiber lengths up to 10,000 km. Finally, we demonstrate clock recovery for a 90-GBd 16-QAM signal over a 150-km randomly-coupled 4-core fiber (RC-4CF) resulting in a total data rate of 2.88 Tbit/s per wavelength and analyze equalizer convergence using a dedicated joint clock recovery.

#### 7.1 Introduction

SDM has become a thriving area of research as it allows the per-fiber information capacity to be increased by transmitting independent data signals on multiple spatial paths [94]. In addition, spatial diversity might offer the potential to lower the energy consumption per bit by using shared hardware. Furthermore, coupled SDM systems benefit from greater tolerance to nonlinearities, the reuse of existing manufacturing, cabling, and installation technologies, and fewer spatial channel outages caused by faulty connectors [95, 96]. On the other hand, coupled SDM systems require a MIMO-DSP, which leads to an increased computational complexity [97]. Spatial diversity can be obtained by utilizing MCFs, MMFs, or a combination of both [95, 96]. In MCFs, multiple fiber cores are arranged inside a single fiber cladding. Down to a certain limit of the core pitch where supermodes start to occur, dense packing of the fiber cores leads to strong coupling of the signals [98]. In MMFs, independent signals can be modulated onto the individual spatial modes. Here, modal crosstalk caused, e.g., by fiber imperfections and bends, leads to a coupling of the signals as well. In both cases, MIMO-DSP at the receiver is indispensable to compensate for the coupled channel impulse response matrix. To leverage the advantages of SDM transceivers, the parallel signals can use a shared oscillator as a clock. In this case, the number of oscillators required is reduced and a joint clock recovery can leverage spatial diversity.

So far, research has mainly focused on system performance and MIMO equalization to mitigate the multidimensional channel impulse response [94, 99]. However, in these experiments, the transmitter and receiver often shared a common clock for best performance, avoiding the need for clock recovery at the receiver [100–102]. The clock recovery is necessary to either physically synchronize the receiver clock to the transmitter clock in a control loop or to generate a control signal in case of a free-running receiver clock [2]. This can be achieved by using a joint implementation of an adaptive equalizer with a clock recovery [103, 104]. However, this approach has not yet been studied in the context of MIMO equalizers with large dimensions. Furthermore, we show in this work that a dedicated clock recovery improves the convergence of the equalizer, which features a significantly increased number of coefficients in SDM systems. While the differential group delay (DGD) in single-core single-mode fibers is still manageably small, it can span multiple tens or hundreds of samples due to spatial-and-polarization-mode dispersion in SDM fibers and, so far, a viable joint clock recovery for coupled SDM channels is still lacking [76, 105].

In our previous work, we demonstrated a joint clock recovery suitable for SDM optical transmission systems with coupled channels [C5]. Independently, a similar algorithm has been demonstrated in [106]. Both algorithms rely on the spectral correlation of all received signals and hence exploit the spatial diversity of the signals. In this extended paper, we expand on our research and explain the underlying mathematical concept, provide more detailed simulation results, and present a complexity analysis. In addition, we investigate the MIMO-equalizer convergence for three cases: a transmitter and receiver synchronized over a side channel, a non-synchronized transmitter and receiver using only an equalizer-based synchronization, and a non-synchronized transmitter and receiver using a dedicated joint clock recovery. Finally, we have optimized our SDM transmission and demonstrate successful clock synchronization in a 90-GBd 16-QAM transmission experiment over a 150 km randomly-coupled 4-core fiber (RC-4CF) [107] resulting in a total data rate of 2.88 Tbit/s on a single wavelength.

# 7.2 Non-Data-Aided Joint Clock Recovery

Fig. 7.1: MIMO DSP building blocks as used in this work for heterodyne detection comprising a joint FF clock recovery.

The essential DSP building blocks of our envisioned SDM-receiver are depicted in Fig. 7.1. First, the D complex-valued, received signals  $\underline{r}^{(d)}(t)$  are digitized. Here, D = 8 is the number of all coupled spatial degrees of freedom including the polarization and  $d \in \{1, 2, ..., D\}$ . In addition, we define the number of uncoupled channels as K, in order to describe uncoupled MCFs, where only the polarizations are coupled, i.e., D = 2. For instance, a coupled 4-core fiber has D=8 and K=1, while an uncoupled 4-core fiber has D=2 and K=4. After resampling, a coarse frequency offset compensation is applied as described in [100]. The coarse frequency recovery is applied before the CD compensation, as we use heterodyne detection in our experiment and hence downconvert the signal to the baseband. For intradyne detection, the frequency recovery can also be performed after NDA clock recovery and MIMO equalization as done in common coherent receivers [108]. Afterwards, the quadratic spectral phase caused by CD needs to be removed before the clock recovery. Each signal  $\underline{x}_k^{(d)}$  at sampling instance k is then fed into a digital FF joint TE to compute a timing estimate  $\hat{\tau} \in [-0.5, 0.5)$  of the sampling offset  $\tau$ , normalized to the symbol period. The phase is then unwrapped and divided into an integer sampling offset m and a fractional sampling offset  $\mu \in [0,1)$  to correct the timing of the delayed signal  $\underline{x}_{k}^{(d)}$  in a FIFO register used as EB and a Lagrange interpolator, respectively [C1, 15]. In case of a free-running receiver clock, the EB can under- or overflow and hence a control signal to pause or skip samples is required, as explained in [2] and in section 2.3.3. This underlines the necessity of a dedicated clock recovery in asynchronous communication systems. Finally, the signal is fed into a MIMO equalizer. The presented approach does not require a nested architecture of clock recovery and equalization in a FB loop, which simplifies the implementation.
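The unwrapping of the timing estimate and its division into an integer offset m and a fractional offset $\mu \in [0,1)$ can be sketched as follows. The Python example is a simplified behavioral model under the assumption of twofold oversampling; the function name `position_calculator` and the test ramp are illustrative, and the handling of EB under- or overflow is omitted.

```python
import numpy as np


def position_calculator(tau_hat, samples_per_symbol=2):
    """Split wrapped timing estimates into integer and fractional sample delays.

    tau_hat: timing estimates in symbol periods, wrapped to [-0.5, 0.5).
    Returns (m, mu) with the integer delay m and fractional delay mu in [0, 1).
    """
    # Unwrap jumps of one symbol period (np.unwrap operates on radians).
    tau_unwrapped = np.unwrap(2 * np.pi * np.asarray(tau_hat)) / (2 * np.pi)
    delay = tau_unwrapped * samples_per_symbol   # delay in samples
    m = np.floor(delay).astype(int)              # integer part for the EB
    mu = delay - m                               # fractional part for the interpolator
    return m, mu


# Linearly drifting sampling offset, as caused by a constant CFO (illustrative).
tau_true = 0.01 * np.arange(200)                 # in symbol periods
tau_wrapped = (tau_true + 0.5) % 1.0 - 0.5       # estimator output range
m, mu = position_calculator(tau_wrapped)
print(m[:5], mu[:5])
```

Whenever the drifting fractional offset crosses a sample boundary, the integer offset changes by one, which corresponds to the read pointer of the EB being advanced or held back by one sample.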

#### 7.2.1 MIMO Channel Model

The following considerations are carried out in the Fourier domain, and the DFT of a block of N samples is denoted by a tilde. We assume, without loss of generality, a signal that is twofold oversampled, i.e., the signal is sampled at a rate of 2 samples per symbol. The n-th frequency component of the received-signal vector  $\underline{\tilde{\mathbf{x}}}_n(\tau) \in \mathbb{C}^D$  after propagation through a coupled optical channel can be expressed in terms of the upsampled (sample rate upconversion by inserting zeros in between the symbols) transmit-symbol-sequence vector  $\underline{\tilde{\mathbf{s}}}_n \in \mathbb{C}^D$  and a lowpass filter  $\tilde{g}_n \geq 0$  applied to the real and imaginary parts of all D signals. This filter comprises the digital pulse shaping as well as bandwidth limitations of the transmitter and is assumed to be real-valued and non-negative in FD, since any nonlinear spectral phase can be compensated in the transmitter DSP. Furthermore, the received signal depends on the baseband MIMO-channel frequency response  $\underline{\tilde{\mathbf{H}}}_n(\tau) \in \mathbb{C}^{D \times D}$  and an additive circularly-symmetric complex-valued Gaussian noise vector  $\underline{\tilde{\mathbf{n}}}_n$  with distribution  $\mathcal{CN}(0, \sigma_{\mathrm{noise}}^2 \mathbf{I}^{D \times D})$  as

$$\tilde{\underline{\mathbf{x}}}_{n}(\tau) = \tilde{g}_{n}\tilde{\underline{\mathbf{H}}}_{n}(\tau)\tilde{\underline{\mathbf{s}}}_{n} + \tilde{\underline{\mathbf{n}}}_{n}. \tag{7.1}$$

The overall sampling offset  $\tau$  at the receiver is normalized to the symbol period and comprises a mode-averaged group delay of the channel  $\overline{\tau}_g$ , which we assume to be constant over time, and the time-varying sampling offset  $\tau_{\rm tx/rx}$  between the clocks at the transmitter and receiver side

$$\tau(t) = \tau_{\rm tx/rx}(t) + \overline{\tau}_{\rm g} \,. \tag{7.2}$$

As we assume a slow drift of the clock phases relative to the DFT size, $\tau$ can be considered constant within one DFT block. The objective of the clock recovery is to estimate the transmitter-receiver sampling offset in order to synchronize the sampling phase of the receiver clock to the phase of the transmitter clock. Since $\underline{\tilde{\mathbf{s}}}$ is the DFT of a twofold oversampled (by inserting zeros) symbol sequence that consists of i.i.d. random realizations every symbol period from a zero-mean, cyclostationary random process, $\underline{\tilde{\mathbf{s}}}$ is periodic with the symbol rate, i.e., with N/2 frequency bins, such that $\underline{\tilde{\mathbf{s}}}_n = \underline{\tilde{\mathbf{s}}}_{n+N/2}$ for any $n \in \{0, 1, \dots, N/2-1\}$. Hence, the ensemble average $\langle \cdot \rangle$ of the product of a frequency component at frequency bin n and the conjugate transpose of the component separated by N/2 bins is equal to the variance $\sigma_s^2$ of the random process

$$\left\langle \underline{\tilde{\mathbf{s}}}_{n}\underline{\tilde{\mathbf{s}}}_{n+N/2}^{\dagger} \right\rangle = \left\langle \underline{\tilde{\mathbf{s}}}_{n}\underline{\tilde{\mathbf{s}}}_{n}^{\dagger} \right\rangle = \sigma_{\mathbf{s}}^{2} \mathbf{I}^{D \times D} . \tag{7.3}$$
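The N/2-periodicity of the spectrum of a zero-inserted symbol sequence can be checked numerically. The following sketch uses a single signal (D = 1), a block size of N = 64, and QPSK symbols, all of which are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64  # DFT block size, chosen arbitrarily for this illustration

# i.i.d. QPSK symbols with unit power, one per symbol period (N/2 per block)
symbols = (rng.choice([-1, 1], N // 2) + 1j * rng.choice([-1, 1], N // 2)) / np.sqrt(2)

# Twofold oversampling by inserting a zero after every symbol
upsampled = np.zeros(N, dtype=complex)
upsampled[::2] = symbols

S = np.fft.fft(upsampled)

# The spectrum repeats after N/2 bins: S[n] == S[n + N/2]
is_periodic = np.allclose(S[: N // 2], S[N // 2:])
```

The periodicity holds for every single block, whereas the identity-matrix structure in eq. (7.3) only emerges after ensemble averaging.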

For the sake of simplicity, we neglect any mode-dependent loss (MDL) for now. Using a principal-mode decomposition [109, 110], we factorize the channel matrix as

$$\underline{\tilde{\mathbf{H}}}_{n}(\tau) = \underline{\tilde{\mathbf{U}}}\underline{\tilde{\mathbf{D}}}_{n}(\tau)\underline{\tilde{\mathbf{V}}}^{\dagger}. \tag{7.4}$$

In general, the matrices at channel output  $\tilde{\mathbf{U}}$  and channel input  $\tilde{\mathbf{V}}$  express unitary coordinate transformation matrices independent of frequency and  $\tilde{\mathbf{D}}$  is a diagonal matrix that describes the propagation delays  $\tau_d$  of each signal

$$\tilde{\mathbf{D}}_{n}(\tau) = \mathrm{e}^{-\mathrm{j} \, 4\pi \, \frac{n}{N} \mathbf{\Lambda}} \tag{7.5}$$

with

$$\mathbf{\Lambda} = \operatorname{diag}\left[\tau_1, \dots, \tau_D\right]. \tag{7.6}$$

The overall sampling offset can be obtained by calculating the trace of the matrix  $\Lambda$  as

$$\frac{1}{D}\operatorname{tr}(\boldsymbol{\Lambda}) = \tau_{\mathrm{tx/rx}} + \overline{\tau}_{\mathrm{g}}. \tag{7.7}$$

Since we assume a frequency-independent group delay for now,  $\Lambda$  is independent of frequency. When the spatial channels are uncoupled, the transformation matrices are diagonal matrices and the clock recovery can be applied to all signals individually. In case of coupled signal propagation, modal dispersion (MD) prevents the correct estimation of the clock phase [76, 105]. Using the matrix decomposition, the transformation matrices map the transmit and receive signals to the principal modes of the channel which propagate with characteristic group delays [109–111]. Since the principal modes at the fiber input and output can be different, the unitary matrices  $\tilde{\mathbf{U}}$  and  $\tilde{\mathbf{V}}$  are not identical, see [109]. By considering the delays in the diagonal matrix  $\tilde{\mathbf{D}}_n$ , the sampling offset can be precisely determined.

# 7.2.2 Joint Timing Estimation Algorithm

In order to apply our channel model from the previous subsection, we assume that there is no residual frequency offset and that CD is fully compensated in the preceding DSP modules (see Fig. 7.1). Our proposed novel joint timing estimation scheme is based on the algorithm by Barton & Al-Jalili [C3, 40], where the phases of two frequency components separated by the symbol rate are compared to estimate the linear spectral phase that is caused by a time delay in TD. To do so, a total of D N-point FFTs $\tilde{\mathbf{x}}(\tau)$ of the received signals are computed. The FFTs are split along the frequency axis into a left and right sideband $\tilde{\mathbf{x}}_n(\tau)$ and $\tilde{\mathbf{x}}_{n+N/2}(\tau)$ with $n \in \{0,\dots,N/2-1\}$, respectively. Afterwards, we consider a single frequency pair with N/2 frequency separation, which results in a matrix $\tilde{\mathbf{M}}_n \in \mathbb{C}^{D \times D}$ as

$$\underline{\tilde{\mathbf{M}}}_{n}(\tau) = \underline{\tilde{\mathbf{x}}}_{n}(\tau)\underline{\tilde{\mathbf{x}}}_{n+\frac{N}{2}}^{\dagger}(\tau). \tag{7.8}$$

Due to the frequency separation equal to the symbol rate, the matrix  $\underline{\tilde{\mathbf{M}}}$  corresponds to a clock tone at the symbol rate. Considering the ensemble average of  $\underline{\tilde{\mathbf{M}}}_n$  and inserting the channel model from eq. (7.1) yields

$$\begin{split}
\left\langle \tilde{\mathbf{M}}_{n}(\tau) \right\rangle &= \left\langle \left( \tilde{g}_{n} \tilde{\mathbf{H}}_{n}(\tau) \tilde{\mathbf{s}}_{n} + \tilde{\mathbf{n}}_{n} \right) \left( \tilde{g}_{n+\frac{N}{2}} \tilde{\mathbf{s}}_{n+\frac{N}{2}}^{\dagger} \tilde{\mathbf{H}}_{n+\frac{N}{2}}^{\dagger}(\tau) + \tilde{\mathbf{n}}_{n+\frac{N}{2}}^{\dagger} \right) \right\rangle \\
&= \sigma_{\mathsf{s}}^{2} \tilde{g}_{n} \tilde{g}_{n+\frac{N}{2}} \tilde{\mathbf{H}}_{n}(\tau) \tilde{\mathbf{H}}_{n+\frac{N}{2}}^{\dagger}(\tau) ,
\end{split} \tag{7.9}$$

where the averaged product of the zero-inserted transmit symbol sequence $\langle \tilde{\mathbf{s}}_n \tilde{\mathbf{s}}_{n+\frac{N}{2}}^{\dagger} \rangle$ simplifies to $\sigma_{\mathsf{s}}^2 \mathbf{I}^{D \times D}$ according to eq. (7.3), and the noise-times-noise and noise-times-signal terms vanish since they are uncorrelated. Inserting the channel's decomposition from eq. (7.4) and eq. (7.5) and defining $\tilde{\gamma}_n = \tilde{g}_n \tilde{g}_{n+N/2}$ results in

$$\begin{split}
\left\langle \tilde{\mathbf{M}}_{n}(\tau) \right\rangle &= \sigma_{\mathsf{s}}^{2} \tilde{\gamma}_{n} \tilde{\mathbf{U}} \tilde{\mathbf{D}}_{n}(\tau) \tilde{\mathbf{V}}^{\dagger} \tilde{\mathbf{V}} \tilde{\mathbf{D}}_{n+\frac{N}{2}}^{\dagger}(\tau) \tilde{\mathbf{U}}^{\dagger} \\
&= \sigma_{\mathsf{s}}^{2} \tilde{\gamma}_{n} \tilde{\mathbf{U}} e^{-\mathrm{j} 4\pi \frac{n}{N} \mathbf{\Lambda}} e^{\mathrm{j} 4\pi \frac{n+\frac{N}{2}}{N} \mathbf{\Lambda}} \tilde{\mathbf{U}}^{\dagger} \\
&= \sigma_{\mathsf{s}}^{2} \tilde{\gamma}_{n} \tilde{\mathbf{U}} e^{\mathrm{j} 2\pi \mathbf{\Lambda}} \tilde{\mathbf{U}}^{\dagger}.
\end{split} \tag{7.10}$$

Note that the phase of this expression is frequency-independent. As in the case of polarization mode dispersion (PMD) [41], the determinant removes the influence of the unitary matrix $\tilde{\mathbf{U}}$, since $\det(\tilde{\mathbf{U}}\tilde{\mathbf{U}}^{\dagger}) = 1$. The determinant of a diagonal matrix is the product of its diagonal elements. Using eq. (7.7), this results in

$$\begin{split}
\det\left(\left\langle \tilde{\mathbf{M}}_{n}(\tau)\right\rangle \right) &= \sigma_{\mathsf{s}}^{2D} \tilde{\gamma}_{n}^{D} \, \mathrm{e}^{\mathrm{j} \, 2\pi \, \mathrm{tr}(\mathbf{\Lambda})} \\
&= \sigma_{\mathsf{s}}^{2D} \tilde{\gamma}_{n}^{D} \, \mathrm{e}^{\mathrm{j} \, 2\pi D\tau} \, .
\end{split} \tag{7.11}$$

The phase of the determinant is now only proportional to the overall mode-averaged group delay including a constant sampling offset between the transmitter and receiver clock. By computing the argument, i.e., the phase from the interval  $-\pi$  to  $\pi$  of the complex-valued clock tone, the full equation to obtain  $\tau$  is

$$\tau = \frac{1}{2\pi D} \arg \left\{ \det \left( \left\langle \tilde{\mathbf{M}}_n(\tau) \right\rangle \right) \right\}. \tag{7.12}$$
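That the determinant removes the unitary mixing can be verified with a small numerical experiment. The sketch below builds the noise-free matrix of eq. (7.10) with $\sigma_{\mathsf{s}}^2 \tilde{\gamma}_n = 1$ and evaluates eq. (7.12); the delay values and the random unitary matrix are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8

# Hypothetical principal-mode delays in symbol periods with a mean of 0.04,
# so that the mode-averaged offset stays within the range of +-1/(2D)
tau_d = 0.3 * rng.standard_normal(D)
tau_d += 0.04 - tau_d.mean()

# Random unitary matrix U from the QR factorization of a complex Gaussian matrix
A = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
U, _ = np.linalg.qr(A)

# Noise-free <M_n> according to eq. (7.10) with sigma_s^2 * gamma_n = 1
M = U @ np.diag(np.exp(2j * np.pi * tau_d)) @ U.conj().T

# Eq. (7.12): the phase of the determinant yields the mode-averaged offset
tau_est = np.angle(np.linalg.det(M)) / (2 * np.pi * D)
```

Although the individual delays spread over a large fraction of a symbol period, the estimate only depends on their mean.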

In the practical implementation, the ensemble average is approximated as an average over the frequency. Since we are interested in the phase of  $\tilde{\underline{\mathbf{M}}}_n$ , an averaging over the phase would correspond to the geometric mean

$$\left\langle \underline{\tilde{\mathbf{M}}}_{n}(\tau) \right\rangle = \left( \prod_{n=0}^{N/2-1} \underline{\tilde{\mathbf{x}}}_{n}(\tau) \underline{\tilde{\mathbf{x}}}_{n+\frac{N}{2}}^{\dagger}(\tau) \right)^{\frac{2}{N}}. \tag{7.13}$$

However, the geometric mean involves many multiplications, resulting in a high computational load. To reduce the computational complexity, we approximate the geometric mean by the arithmetic mean. The arithmetic mean is a good approximation of the geometric mean as long as the group delays $\mathbf{\Lambda}$ vary only slightly over frequency. Averaging over all N/2 frequency bins, we can approximate eq. (7.13) using eq. (7.10) as

$$\begin{split}
\left\langle \tilde{\mathbf{M}}_{n}(\tau) \right\rangle &\approx \frac{2}{N} \sum_{n=0}^{N/2-1} \tilde{\mathbf{x}}_{n}(\tau) \tilde{\mathbf{x}}_{n+\frac{N}{2}}^{\dagger}(\tau) \\
&= \frac{2}{N} \sum_{n=0}^{N/2-1} \tilde{\mathbf{M}}_{n}(\tau) \\
&\approx \sigma_{\mathsf{s}}^{2} \tilde{\mathbf{U}} \mathrm{e}^{\mathrm{j} 2\pi \mathbf{\Lambda}} \tilde{\mathbf{U}}^{\dagger} \frac{2}{N} \sum_{n=0}^{N/2-1} \tilde{\gamma}_{n}.
\end{split} \tag{7.14}$$

We see that the amplitude of the clock tone is affected by the overlapping pulse shape $\tilde{\gamma}_n$. As indicated in Fig. 7.2, only the non-zero overlapping areas of the sidebands, which depend on the spectral roll-off of the RRC, contribute to the spectral correlation. Hence, only these areas are computed to reduce the computational complexity. The estimate of the sampling offset $\tau$ is then

$$\hat{\tau} = \frac{1}{2\pi D} \arg \left\{ \det \left( \sum_{\substack{n=0\\\tilde{\gamma}_n > 0}}^{N/2 - 1} \tilde{\underline{\mathbf{x}}}_n(\tau) \tilde{\underline{\mathbf{x}}}_{n + \frac{N}{2}}^{\dagger}(\tau) \right) \right\}. \tag{7.15}$$

Note that due to the computation of the argument, the factor 2/N can be omitted. In the case of extreme bandwidth limitations or Nyquist signals (roll-off of approximately 0), modifications similar to those applied to FTN signals are necessary [32, 81, 82]. The modification of the proposed algorithm for such signals is a subject of future research.

To further reduce the effect of noise on the timing estimate, the spectral correlation  $\tilde{\mathbf{M}} = \sum_n \tilde{\mathbf{M}}_n$  can be averaged over multiple FFT blocks before calculating the determinant. Taking into account the positions of the overlapping areas of the sidebands, the algorithm can also be adapted for fractional oversampling, provided that the signal spectrum can still be resolved [35]. Furthermore, the algorithm is independent of the modulation format. Lastly, note that for one-dimensional signals with D=1, e.g., for pulse amplitude modulation, the determinant has the trivial solution  $\det(x)=x$  and hence the proposed algorithm simplifies to the algorithms presented in [C1, C3].
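The estimator of eq. (7.15) can be sketched in a few lines of NumPy. The helper `synth_block` is a hypothetical test-signal generator that synthesizes a noise-free block directly in the frequency domain from eqs. (7.1), (7.4), and (7.5), with $\tilde{g}_n = 1$ on the overlapping bins and 0 elsewhere. The block size, the bin selection, and the delay values are assumptions for illustration.

```python
import numpy as np

def joint_timing_estimate(X, bins):
    """Eq. (7.15): X holds the N-point DFTs of the D signals, shape (D, N);
    bins are the indices n in [0, N/2) where the sidebands overlap."""
    D, N = X.shape
    bins = np.asarray(bins)
    left = X[:, bins]               # x_n
    right = X[:, bins + N // 2]     # x_{n+N/2}
    M = left @ right.conj().T       # sum over n of x_n x_{n+N/2}^dagger, shape (D, D)
    return np.angle(np.linalg.det(M)) / (2 * np.pi * D)

def synth_block(tau_d, N, bins, rng):
    """Hypothetical noise-free test block following eqs. (7.1), (7.4), (7.5)."""
    D = len(tau_d)

    def random_unitary():
        A = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
        return np.linalg.qr(A)[0]

    U, V = random_unitary(), random_unitary()
    s_half = rng.standard_normal((D, N // 2)) + 1j * rng.standard_normal((D, N // 2))
    s = np.hstack([s_half, s_half])          # spectrum periodic with N/2 bins
    X = np.zeros((D, N), dtype=complex)
    for n in np.concatenate([bins, bins + N // 2]):
        D_n = np.diag(np.exp(-4j * np.pi * n / N * tau_d))   # eq. (7.5)
        X[:, n] = U @ D_n @ V.conj().T @ s[:, n]
    return X

rng = np.random.default_rng(2)
N, D = 1024, 4
bins = np.arange(N // 4 - 8, N // 4 + 8)            # 16 bins around N/4
tau_d = 0.05 + np.array([-0.6, -0.1, 0.2, 0.5])     # mode-averaged offset of 0.05
tau_hat = joint_timing_estimate(synth_block(tau_d, N, bins, rng), bins)
```

In this noise-free toy model, the estimate equals the mode-averaged offset up to numerical precision as long as at least D bins are summed. With noise and frequency-dependent group delays, the estimate is only approximate, as discussed in the following subsection.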

# 7.2.3 Effect of Frequency-Dependent Group Delays

Fig. 7.2: Spectral magnitude of the d-th spatial signal, highlighting the overlapping spectral components in the pulse-shaping roll-off regions that contribute to the FD correlation. Below, the multiplication of all left and right sideband components in a butterfly structure at frequency bin n, resulting in the matrix $\tilde{\mathbf{M}}_n$, is visualized.

In order to explain the TE approach, we initially neglected any residual CD, MDL, and a frequency dependence of the group delays in $\mathbf{\Lambda}$ for the sake of simplicity. In general, however, for a modulated signal with a bandwidth that is not small compared to the coherence bandwidth $1/\sigma_g$ of the channel, the group delays become frequency-dependent (in [109] also referred to as higher-order MD). This leads to a frequency-dependent group delay $\tau_{g,n}^{(d)}$ of the d-th principal mode and consequently to a frequency-dependent matrix $\mathbf{\Lambda}_n$ [109, 112]. Fig. 7.3(a) shows the simulated group delays $\Delta \mathbf{\Lambda}_n = \mathbf{\Lambda}_n - \mathbf{\Lambda}_{n+N/2}$ for all frequency bins from 1 to N/2 and each principal mode of a 4-core fiber with a fiber length of 100 km. We simulated the waveplate model without MDL using 500 segments, a DGD of 1 ps/$\sqrt{\rm km}$ for PMD, and a spatial group delay standard deviation $\sigma_{\rm g,km}$ of $10\,{\rm ps}/\sqrt{\rm km}$ [107]. The sampling offset between transmitter and receiver clock is set to zero, i.e., $\tau_{\rm tx/rx} = 0$, and no mode-averaged group delay is considered, i.e., $\overline{\tau}_{g,n} = 1/D \operatorname{tr}(\mathbf{\Lambda}_n) = 0$. Fig. 7.3(a) shows how the respective group delays vary around their mean value with frequency, while the mode-averaged group delay (shown as black dashed line) remains frequency-independent. Due to this property, the geometric mean in eq. (7.13) can be used for averaging over the frequencies. However, the more hardware-efficient arithmetic mean in eq. (7.14) produces a good approximation only for a minor frequency dependency. To ensure that the arithmetic mean delivers a low estimation error, we can adjust the summation limits in eq. (7.15) such that we stay within the coherence bandwidth. This condition is fulfilled for signals with a very low spectral roll-off. For signals with a larger spectral roll-off, the frequency range for averaging must be reduced as the fiber length increases. As a result, the noise suppression deteriorates due to fewer available frequency bins given a fixed FFT size. However, in this case, an averaging of the correlation matrix $\tilde{\mathbf{M}}$ over several FFT blocks can be applied.

Fig. 7.3: Simulated group delay $\Delta \mathbf{\Lambda}_n = \mathbf{\Lambda}_n - \mathbf{\Lambda}_{n+N/2}$ for a 100 km RC-4CF. The waveplate model is emulated by simulating 500 segments with a DGD of 1 ps/$\sqrt{\rm km}$ for PMD and a spatial group delay standard deviation $\sigma_{\rm g,km}$ of 10 ps/$\sqrt{\rm km}$. Using a $2^{13}$-point FFT, subfigure (a) shows the respective group delays (lines in color) and the mode-averaged group delay (black dashed line) for each frequency bin within 100 GHz, while (b) shows a zoom-in of 1 GHz around the frequency at n=N/4, which corresponds to 50 GHz for a 100-GBd signal.

For a 100-km-long MCF, the total group delay standard deviation amounts to $\sigma_{\mathrm{g}} = \sqrt{L}\sigma_{\mathrm{g,km}} = 0.1\,\mathrm{ns}$. The group delays of the principal modes, i.e., the elements of $\Delta \mathbf{\Lambda}_n$, have an average variance of approximately $\sigma_{\tau_{\mathrm{g}}^{(d)}}^2 \approx \sigma_{\mathrm{g}}^2/D$ [109]. This leads to a coherence bandwidth of the group delays of $\sqrt{D}/\sigma_{\mathrm{g}} = 28.3\,\mathrm{GHz}$. We consider a bandwidth around N/4 (half the symbol rate, where the roll-off regions overlap, i.e., $\tilde{\gamma}_n > 0$) that is much smaller than the coherence bandwidth. We find that about 3.5% of the coherence bandwidth (1 GHz) gives a good trade-off between the estimation error of the arithmetic mean and the available number of bins to average over. For a $2^{13}$-point FFT of the twofold oversampled 100-GBd signal, 41 frequency bins are available within this bandwidth. The hardware complexity can be reduced by not implementing the full FFT, but only the relevant frequency components, as explained in section 7.2.4. Fig. 7.3(b) shows a zoom-in of the group delays $\Delta \mathbf{\Lambda}_n$ within this bandwidth. It can be seen that $\Delta \mathbf{\Lambda}_n$ is only slightly frequency-dependent, so that the arithmetic mean provides a good approximation of the geometric mean.
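The numbers quoted above follow from simple arithmetic, which is reproduced in the sketch below. The sample rate of 200 GSa/s is an assumption derived from the twofold oversampled 100-GBd signal; the variable names are chosen for illustration.

```python
import math

L_km = 100.0            # fiber length in km
sigma_g_km = 10e-12     # spatial group delay standard deviation in s/sqrt(km)
D = 8                   # coupled spatial and polarization modes of a 4-core fiber
f_sa = 200e9            # assumed sample rate in Sa/s (100 GBd, twofold oversampled)
N_fft = 2 ** 13

sigma_g = math.sqrt(L_km) * sigma_g_km        # total group delay std. dev., 0.1 ns
coherence_bw = math.sqrt(D) / sigma_g         # about 28.3 GHz
avg_bw = 1e9                                  # averaging bandwidth of 1 GHz
fraction = avg_bw / coherence_bw              # about 3.5 % of the coherence bandwidth
bin_spacing = f_sa / N_fft                    # about 24.4 MHz per frequency bin
num_bins = round(avg_bw / bin_spacing)        # 41 frequency bins
```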

### 7.2.4 Hardware Complexity Analysis

We analyze the complexity of the proposed TE for an implementation in hardware, e.g., using FPGAs or ASICs. We count three real-valued multiplications and three real-valued additions for a complex multiplication and two real-valued additions for a complex addition. Furthermore, we distinguish for a real-valued multiplication between the product of two variable values and the product of a variable and a constant factor, as the latter can be implemented by a binary shift-add algorithm without the need for dedicated multiplication cells [113]. For the sake of simplicity, we consider signed values (MSB indicates the sign) with a constant bit width of $N_{\rm bit}$ throughout all computations. Therefore, for a multiplication with a constant factor, $(N_{\rm bit}-1)/2$ binary shifts and $\alpha_{\rm c}=(N_{\rm bit}-1)/2-1$ additions have to be implemented on average for $N_{\rm bit}\geq 3$. For a multiplication with an unknown factor, all possible $N_{\rm bit}-1$ binary shifts must be implemented as well as the entire adder tree, which results in $\beta_{\rm c}=N_{\rm bit}-2$ additions.

The complexity C of the TE is mainly attributed to the KD FFTs, the complex correlation of the sidebands, the averaging of the correlation, as well as the computation of the determinant. The block size N of the FFT determines both the frequency resolution and the noise power per frequency component. For a single N-point split-radix FFT [114, 115], the number of real-valued operations scales with $\mathcal{O}(N\log_2(N))$. In our case, we use a large FFT size because a very fine frequency resolution is needed, e.g., when using small pulse shaping roll-off factors (eq. (7.15)) or to maintain approximate frequency independence (eq. (7.14)). However, due to the RRC roll-off and the condition of weak frequency dependence, only a very small number of frequency components are used from this large FFT. It is therefore not necessary to implement the entire FFT. Furthermore, a large FFT size of, e.g., $2^{13}$, is not necessary for the averaging of noise given a sufficient SNR. For a hardware-efficient implementation, we therefore suggest computing only the necessary frequency components in the range of the pulse shape overlap $\tilde{\gamma}_n > 0$. This allows a sufficiently fine choice of frequency resolution, e.g., $f_{\rm sa}/2^{13}$, but with a smaller averaging length that is in the range of the number of parallel samples per clock cycle, e.g., N=256. Taking into account the DFT symmetry (see Radix-2 FFT [116]), a frequency pair $\tilde{x}_n$ and $\tilde{x}_{n+N/2}$ can be computed from a block of N TD samples $x_k$ with $k \in \{0,1,\ldots,N-1\}$ as

$$\begin{split}
\tilde{\underline{x}}_{n} &= \sum_{k=0}^{N-1} \underline{x}_{k} \, \mathrm{e}^{-\mathrm{j} 2\pi k \frac{n}{N}} = \tilde{\underline{x}}_{n,\mathrm{e}} + \tilde{\underline{x}}_{n,\mathrm{o}} \\
\tilde{\underline{x}}_{n+\frac{N}{2}} &= \sum_{k=0}^{N-1} \underline{x}_{k} \, \mathrm{e}^{-\mathrm{j} 2\pi k \left(\frac{n}{N} + \frac{1}{2}\right)} = \tilde{\underline{x}}_{n,\mathrm{e}} - \tilde{\underline{x}}_{n,\mathrm{o}},
\end{split} \tag{7.16}$$

where $\tilde{x}_{n,\mathrm{e}}$ and $\tilde{x}_{n,\mathrm{o}}$ are the contributions of the even- and odd-indexed TD samples to $\tilde{x}_n$, respectively. Dividing the incoming TD samples $x_k$ into even samples $x_{k,\mathrm{e}} = x_{2k}$ and odd samples $x_{k,\mathrm{o}} = x_{2k+1}$ in a decimation in time (DIT) unit, the even and odd frequency components can be calculated by

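$$\begin{split}
\tilde{\underline{x}}_{n,\mathrm{e}} &= \sum_{k=0}^{N/2-1} \underline{x}_{k,\mathrm{e}} \, \mathrm{e}^{-\mathrm{j} 4\pi k \frac{n}{N}} \\
\tilde{\underline{x}}_{n,\mathrm{o}} &= \mathrm{e}^{-\mathrm{j} 2\pi \frac{n}{N}} \sum_{k=0}^{N/2-1} \underline{x}_{k,\mathrm{o}} \, \mathrm{e}^{-\mathrm{j} 4\pi k \frac{n}{N}} .
\end{split} \tag{7.17}$$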
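The decomposition in eqs. (7.16) and (7.17) can be checked against a full FFT. The following sketch is a floating-point reference model, not a hardware description; the function name `frequency_pair` and the chosen bin index are assumptions for illustration.

```python
import numpy as np

def frequency_pair(x, n):
    """Compute the DFT bins n and n + N/2 of the N samples in x from the
    even/odd decomposition of eqs. (7.16) and (7.17), without a full FFT."""
    N = len(x)
    k = np.arange(N // 2)
    kernel = np.exp(-4j * np.pi * k * n / N)     # shared by both half-length sums
    x_e = np.sum(x[0::2] * kernel)               # contribution of the even samples
    x_o = np.exp(-2j * np.pi * n / N) * np.sum(x[1::2] * kernel)  # odd samples with twiddle factor
    return x_e + x_o, x_e - x_o

rng = np.random.default_rng(3)
N, n = 256, 60
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X_ref = np.fft.fft(x)
pair = frequency_pair(x, n)
```

Both bins share the two half-length sums, so that the second bin of each pair only costs one additional complex subtraction.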

The overall block diagram for computing the frequency components as well as the required number of complex operations is shown in Fig. 7.4(a). As known from the Radix-2 FFT, we can divide the calculation of the frequency components from N samples into two summations over N/2 samples. These summations each contain four trivial multiplications  $\{-1,1,-\mathrm{j},\mathrm{j}\}$ , so that the total number of complex multiplications is only N-8. The required real-valued multiplications  $C_{\mathrm{fc},\times}$  for the so-called twiddle factors and real-valued additions  $C_{\mathrm{fc},+}$  for only one frequency component pair  $\tilde{x}_n$  and  $\tilde{x}_{n+N/2}$  account for

$$\begin{split}
C_{\rm fc, \times} &= 3N - 21 \\
C_{\rm fc, +} &= 5N - 21 \, .
\end{split} \tag{7.18}$$

However, the twiddle factors can be pre-computed and stored in LUTs, and hence, a real-valued multiplication can be realized by using $\alpha_c$ additions only. The complexity of $\nu$ frequency pairs for a total of KD signals can be expressed only in terms of real-valued additions as

$$C_{\rm fc} = KD\nu \left( (3\alpha_{\rm c} + 5)N - 21(\alpha_{\rm c} + 1) \right). \tag{7.19}$$

The complexity to compute the frequency components is irrespective of whether the channels are coupled or not and scales linearly with the signal count KD. The total complexity scales as  $\mathcal{O}(KDN\nu)$ .



Fig. 7.4: Hardware architecture for computing the frequency pairs  $\tilde{x}_n$  and  $\tilde{x}_{n+N/2}$  (a) and MA filter (b). The required complex-valued operations are depicted in blue.

The numbers of complex multiplications and additions for the correlation of the sidebands depend on the number of frequency pairs $\nu$ and amount to $D^2\nu$ and $D^2(\nu-1)$ operations, respectively. If the spatial channels are not coupled, each pair of polarizations can be processed separately in K 2×2 matrices $\tilde{\mathbf{M}}$. Hence, the total numbers of real-valued multiplications and additions are

$$\begin{split}
C_{\text{corr},\times} &= 3KD^2 \nu \\
C_{\text{corr},+} &= KD^2 (5\nu - 2) \, .
\end{split} \tag{7.20}$$

As this involves a multiplication of two variable values, the total number of real-valued additions for the correlation results in

$$C_{\text{corr}} = KD^2(3\beta_{\rm c} \nu + 5\nu - 2) \, . \tag{7.21}$$

The complexity of the correlation scales quadratically with the number of coupled channels and linearly with the number of frequency pairs as  $\mathcal{O}(KD^2\nu)$ .

The spectral correlation matrix  $\underline{\tilde{\mathbf{M}}}_t$  at time instance t is written into a FIFO register and the matrix elements are averaged over consecutive matrices. Using a MA with simplified architecture as shown in Fig. 7.4(b), only two complex additions are required for each of the  $D^2$  matrix elements. If the register length is a power of two, the division by the number of filter taps  $N_{\mathrm{tap}}$  is a simple bit truncation. Hence, the total amount of real-valued additions is

$$C_{\text{avg}} = KD^2 4 \tag{7.22}$$

and the complexity scales with  $\mathcal{O}(KD^2)$  only.
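A moving average with two additions per matrix element can be realized with a running sum. The sketch below assumes that the simplified architecture corresponds to such a recursive structure, in which the newest matrix is added and the oldest one is subtracted; the class name `MovingAverageMatrix` is hypothetical.

```python
import numpy as np
from collections import deque

class MovingAverageMatrix:
    """Running-sum moving average over the last n_tap correlation matrices."""

    def __init__(self, D, n_tap):
        self.n_tap = n_tap
        # FIFO register initialized with zero matrices
        self.fifo = deque([np.zeros((D, D), dtype=complex)] * n_tap, maxlen=n_tap)
        self.acc = np.zeros((D, D), dtype=complex)

    def update(self, M):
        oldest = self.fifo[0]
        self.acc = self.acc + M - oldest   # one addition and one subtraction per element
        self.fifo.append(M)                # the oldest matrix drops out of the FIFO
        return self.acc / self.n_tap       # a bit shift in hardware if n_tap is a power of two

rng = np.random.default_rng(4)
D, n_tap = 4, 8
blocks = [rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D)) for _ in range(20)]
ma = MovingAverageMatrix(D, n_tap)
for M_t in blocks:
    M_avg = ma.update(M_t)
```

After the register has filled up, the output equals the mean of the last `n_tap` matrices, independent of the filter length.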

Finally, the complexity of computing the determinant of a complex matrix is analyzed. The Laplace expansion provides a straightforward implementation for calculating the determinant. However, its complexity scales with $\mathcal{O}(D!)$, which makes a hardware implementation infeasible even for small matrix dimensions [117]. A more common and efficient way to compute the determinant for D > 3 is to use the QR factorization. Here, the complex matrix $\tilde{\mathbf{M}}$ is decomposed into the product of an orthonormal matrix $\mathbf{Q}$ and an upper triangular matrix $\mathbf{R}$ as $\tilde{\mathbf{M}} = \mathbf{Q}\mathbf{R}$. Both matrices can be obtained using the Gram-Schmidt procedure. Because the classical Gram-Schmidt method often produces a matrix $\mathbf{Q}$ that is not orthonormal in finite-precision arithmetic, the modified Gram-Schmidt algorithm is used to counteract this issue [118, 119]. Since $\det(\tilde{\mathbf{M}}) = \det(\mathbf{Q}) \det(\mathbf{R}) = \pm 1 \prod_{i=1}^{D} r_{i,i}$, the computation of the determinant mainly consists of the QR factorization and the multiplication of the diagonal complex elements of the matrix $\underline{\mathbf{R}}$. As derived in Appendix C.1, the computation of the determinant for KD signals involves

$$C_{\text{det}} = K\left[(5+3\beta_{\rm c})(D^{3}+D^{2}) + (2+3\beta_{\rm c})D - 3\beta_{\rm c} - 3\right] \tag{7.23}$$

real-valued additions. Compared to the Laplace expansion, the complexity is now reduced and scales with  $\mathcal{O}(KD^3)$ . Finally, the total timing estimation requires  $C = C_{\rm fc} + C_{\rm avg} + C_{\rm corr} + C_{\rm det}$  real-valued additions. For only a few coupled channels, the complexity is mainly determined by the frequency components. As the number of coupled channels increases, efficient implementation of the determinant becomes more crucial.
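The individual contributions can be evaluated with a short script that implements eqs. (7.19) and (7.21) to (7.23) as printed. The function name `te_complexity` and its default parameters are assumptions for illustration. Since the bookkeeping of Appendix C.1 is not reproduced here, the totals are not guaranteed to match published figures exactly.

```python
def te_complexity(K, D, N=256, nu=10, n_bit=6):
    """Real-valued additions of the timing estimator, eqs. (7.19), (7.21)-(7.23)."""
    alpha_c = (n_bit - 1) / 2 - 1     # average additions per multiplication with a constant
    beta_c = n_bit - 2                # additions per multiplication of two variables
    c_fc = K * D * nu * ((3 * alpha_c + 5) * N - 21 * (alpha_c + 1))
    c_corr = K * D**2 * (3 * beta_c * nu + 5 * nu - 2)
    c_avg = 4 * K * D**2
    c_det = K * ((5 + 3 * beta_c) * (D**3 + D**2) + (2 + 3 * beta_c) * D - 3 * beta_c - 3)
    return {"fc": c_fc, "corr": c_corr, "avg": c_avg, "det": c_det,
            "total": c_fc + c_corr + c_avg + c_det}

coupled = te_complexity(K=1, D=8)      # coupled 4-core fiber
uncoupled = te_complexity(K=4, D=2)    # four independent dual-polarization channels
```

For these parameters, the frequency components dominate both totals, while the determinant and correlation terms grow noticeably only in the coupled case.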

Fig. 7.5 shows the number of real-valued additions C per signal required for the clock recovery of  $KD \in \{2, 8, 14, 64\}$  signals, e.g., for coupled and uncoupled single-core, 4-core, 7-core, and 32-core fibers. We assume an overall bit width of 6 bit for all calculation steps and neglect bit growth, the computation of  $\nu = 10$  frequency pairs using a block size of N = 256 samples, and an averaging of 8 consecutive correlation matrices. For a coupled 4-core and 32-core fiber, the joint timing estimation requires 9.3% and 336.7% more operations compared to K clock recovery implementations for individual dual-polarization channels, respectively. However, the algorithm complexity of a joint clock recovery is still low compared to MIMO-equalizers [120]. Finally, we want to emphasize that the use of appropriate FFT sizes also allows joint implementation with FD CD compensation and equalization. Further reduction in FFT complexity could be achieved by splitting a large FFT over several clock cycles such that only a fraction of the total FFT needs to be implemented for each clock cycle [C1].



Fig. 7.5: Number of real-valued additions per signal for $KD \in \{2, 8, 14, 64\}$ signals. The blue/green bars show the complexity for uncoupled spatial channels, i.e., K clock recoveries for each dual-polarization signal (D=2), and the red/yellow bars show the complexity for a joint clock recovery employing coupled spatial channels (K=1). The number of operations is computed for a clock recovery with a bit width of 6 bit, 10 frequency pairs, a block size of 256 samples, and averaging of 8 consecutive matrices.

#### 7.3 Performance Simulation

To verify our concept in simulations, we generate a 100-GBd QPSK sequence that is twofold oversampled and interpolated using an RRC filter with 0.01 roll-off. Afterwards, we numerically approximate the MCF channel using the waveplate model [105, 121, 122] with 500 segments, a PMD DGD of 1 ps/$\sqrt{\text{km}}$, and an overall spatial channel group delay variance normalized to the square-root fiber length of $10 \text{ ps}/\sqrt{\text{km}}$ [107]. At the receiver side, AWGN is added to set a certain SNR with a noise bandwidth matching the symbol rate. The resulting signal is filtered using a receive filter matched to the pulse-shaping filter. Finally, the clock recovery algorithm [40] is applied either to each received channel individually or jointly to all received signals using the procedure described in the section before. Fig. 7.6 shows the simulation results for a received SNR of 10 dB and a $2^{13}$-point FFT of which we compute 11 frequency components per sideband for the timing estimation. Fig. 7.6(a) shows the timing estimate $\hat{\tau}$ as a function of the simulated clock phase $\tau$ obtained for 30,000 signal realizations with constant timing offset $\tau$ and a single realization of a 3,000 km long 4-core fiber (D = 8, K = 1). Here, the left and right plots show the timing estimates without averaging of the correlation matrix $\tilde{\mathbf{M}}$ and with a MA of eight consecutive matrices, respectively. It is obvious that the averaging improves the suppression of the random noise, leading to an improvement of jitter from $-32\,\mathrm{dB}$ to $-49\,\mathrm{dB}$. The jitter is defined as the variance of the deviation of the estimated clock phase from the actual clock phase in decibels as $20\log_{10}\left(\mathrm{std}(\hat{\tau}-\tau)\right)$, where $\mathrm{std}(\cdot)$ computes the standard deviation. Furthermore, we notice a D-fold phase ambiguity as expected from eq. (7.12). Together with the noise, this phase ambiguity can result in errors when unwrapping the phase. However, due to a sufficiently large block size N, the averaging over frequency bins, and the optional moving averaging of the correlation matrix $\tilde{\mathbf{M}}$, the effect of noise can be effectively reduced.

Each block of N samples at sample rate $f_{\rm sa}$ yields one timing estimate. To further increase the temporal resolution, the sample blocks can overlap by $N_{\rm o}$ samples. Finally, the MA filter limits the 3-dB bandwidth of the timing estimation to about $1/N_{\rm tap}$ of half the estimation rate. The 3-dB bandwidth of the TE $f_{\rm 3dB}$, i.e., the maximum frequency of the jitter that can be tracked by the TE, can be expressed as

$$f_{\rm 3dB} \approx \frac{f_{\rm sa}}{N - N_{\rm o}} \frac{1}{2N_{\rm tap}} \,. \tag{7.24}$$

Even with a large $2^{13}$-point FFT, a CFO of 5 ppm at 200 GSa/s can be resolved by a sufficient number of $10^6/2^{13}/5 \approx 24$ timing estimations. Considering a practical implementation of single frequency pairs from a block of 256 samples, an overlap of 128 samples, and an averaging of 8 consecutive correlation matrices, the maximum traceable jitter frequency at twofold oversampling amounts to about 100 MHz, which corresponds to 500 ppm at 200 GSa/s. Note that a FF implementation of clock recovery does not require a loop filter in a FB configuration, which would additionally exhibit a lowpass characteristic. This makes the proposed clock recovery architecture applicable for long-haul transmission, where jitter caused by equalization enhanced phase noise may be a limiting factor [123–125].
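The quoted values follow directly from eq. (7.24), as the short calculation below illustrates. The function name `te_bandwidth` is an assumption for illustration.

```python
def te_bandwidth(f_sa, N, N_o, N_tap):
    """3-dB tracking bandwidth of the timing estimator according to eq. (7.24)."""
    return f_sa / (N - N_o) / (2 * N_tap)

f_sa = 200e9                                              # sample rate in Sa/s
f_3dB = te_bandwidth(f_sa, N=256, N_o=128, N_tap=8)       # 97.65625 MHz, i.e., about 100 MHz
f_3dB_ppm = f_3dB / f_sa * 1e6                            # about 488 ppm, i.e., roughly 500 ppm
estimates_per_period = 1e6 / 2**13 / 5                    # about 24.4 estimates for 5 ppm
```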

Next, we set a constant zero sampling offset $\tau = 0$ and evaluate the timing jitter from 0 km to 10,000 km. We simulate 1,000 signal realizations for a single MCF channel realization per fiber length, from which we compute the jitter. Again, we compute 11 sideband components of a $2^{13}$-point FFT and apply an 8-tap MA of the correlation matrix. In Fig. 7.6(b), the jitter using a joint clock recovery and a clock recovery applied to each core individually is shown after a simulated MCF transmission with four cores. In addition, we plot for comparison the joint clock recovery performance for a higher core count of 19 randomly-coupled cores, which we also simulated with a spatial channel group delay variance of $10 \text{ ps}/\sqrt{\text{km}}$ for a fair comparison to the 4-core fiber and because this value was experimentally confirmed in [126]. Since the clock recovery for each core after propagating through a 4-core fiber is strongly impaired by MD, the timing estimates are heavily distorted and lie within the gray-shaded area in Fig. 7.6(b). In contrast to the per-core clock recovery, the joint timing estimation offers low-jitter performance up to a transmission distance of 10,000 km. For the joint clock recovery, we observe a 4-dB and 10-dB degradation for the 4-core and 19-core fiber, respectively. The reason for this is a worse approximation of eq. (7.14) with increasing group delay variance. The 19-core fiber exhibits a lower jitter for short fiber distances, since the sampling offset is averaged over a larger matrix. The dashed lines represent a minimum requirement for the jitter in order to prevent distorted timing estimates due to the D-fold phase ambiguity. They are calculated by assuming a uniform distribution of the timing estimates within the estimation range of 1/D, resulting in

$$J_{\text{max}} = 10 \log_{10} \left( \frac{1}{12D^2} \right). \tag{7.25}$$
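The jitter limit of eq. (7.25) and the jitter metric used above can be evaluated as follows. The function names are hypothetical, and D = 38 for the 19-core fiber is an assumption based on two polarizations per core.

```python
import numpy as np

def jitter_db(tau_hat, tau):
    """Jitter metric: 20*log10 of the standard deviation of the estimation error."""
    return 20 * np.log10(np.std(np.asarray(tau_hat) - np.asarray(tau)))

def j_max_db(D):
    """Eq. (7.25): jitter of estimates uniformly distributed over a range of 1/D."""
    return 10 * np.log10(1 / (12 * D**2))

j_max_4core = j_max_db(8)      # about -28.9 dB
j_max_19core = j_max_db(38)    # about -42.4 dB, assuming D = 2 * 19

# Evenly spaced estimates over one ambiguity interval approximate the uniform case
grid = np.linspace(-0.5 / 8, 0.5 / 8, 100001)
j_uniform = jitter_db(grid, 0.0)
```

The limit becomes more stringent with increasing D because the unambiguous estimation range shrinks to 1/D.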



Fig. 7.6: Clock recovery simulation results for a 1% roll-off QPSK signal with an electrical SNR of 10 dB after propagation through a randomly-coupled multi-core fiber (MCF). (a) Simulated timing estimates $\hat{\tau}$ over set sampling offset $\tau$. The timing estimates are obtained for 30,000 constant timing offset realizations $\tau$ in a 3,000 km long 4-core fiber (D=8). The left and right plots show the timing estimates without averaging of the correlation matrix and with an averaging of eight consecutive matrices, respectively. (b) Timing jitter evaluated for 1,000 realizations over fiber length. Using a joint clock recovery with averaging over eight correlation matrices, the jitter over all 1,000 realizations for a 4-core and 19-core fiber is shown. The jitter for the per-core clock recovery after a 4-core fiber lies within the gray area.

# 7.4 Experimental Validation

We validate the joint clock recovery algorithm in an SDM transmission experiment with balanced heterodyne detection, as depicted in Fig. 7.7. A dual-polarization RRC-shaped 90-GBd 16-QAM signal with 0.01 roll-off factor and sequence length of 230,400 i.i.d. symbols, corresponding to 2.56 µs, is generated with a 45-GHz, 120-GSa/s arbitrary waveform generator (AWG, Keysight M8194A) followed by 50-GHz driver amplifiers (SHF807). Using a 35-GHz dual-polarization inphase and quadrature modulator (DP-IQM), the signal is then modulated onto an optical carrier at 1550 nm generated by an ECL with a nominal linewidth of 100 kHz. The optical signal is amplified by an erbium-doped fiber amplifier (EDFA) and out-of-band amplified spontaneous emission (ASE) noise is suppressed by a 1 nm optical bandpass. Spatial multiplexing is emulated by decorrelating four copies of the signal by delays of 0 ns, 50 ns, 0.5 µs, and 1 µs (delays determined by fibers available in the lab), respectively, before launching them into the MCF fan-in with an optical signal power of 10 dBm per core. The MCF link consists of three concatenated 50-km-long, single-mode RC-4CF [107] spools, i.e., spanning a total transmission distance of 150 km. The overall group delay spread caused by MD is specified as 10 to  $12 \text{ ps}/\sqrt{\text{km}}$. Including the fan-in and fan-out losses, the total loss per spool is around 9 dB. Therefore, to ensure sufficient optical receive signal power at the photodiodes, the optical signal is amplified to 10 dBm per core after 100 km using four EDFAs. This results in an optical signal power of about 1 dBm per core before the dual-polarization optical hybrids (DP-OHs). At the receiver, a second ECL is used as LO, which is detuned by 48 GHz for balanced heterodyne detection. The LO is amplified before it is split and launched into the four DP-OHs with a power of 17 dBm per receiver. Since we use heterodyne detection, only one quadrature of each signal is detected, and hence, only eight BPDs and synchronized oscilloscope channels are required. Here, we utilize four 100 GHz BPDs (Fraunhofer HHI) and four 90 GHz BPDs (Finisar BPDV4120R). Finally, 737,280 symbols of each received signal are captured by two synchronized 4-channel 100-GHz oscilloscopes (Keysight UXR1004A) and DSP is carried out offline.
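The power levels and durations quoted above follow from simple arithmetic. The sketch below retraces them under the assumption that every spool contributes the quoted loss of about 9 dB and that the inline EDFAs restore exactly 10 dBm per core; all variable names are illustrative.

```python
# Per-core optical power budget (values taken from the text, in dBm and dB).
launch_power_dbm = 10.0   # power per core at the MCF fan-in
spool_loss_db = 9.0       # approximate loss per 50-km spool incl. fan-in/fan-out
edfa_output_dbm = 10.0    # power per core after the inline EDFAs at 100 km

power_before_edfa_dbm = launch_power_dbm - 2 * spool_loss_db  # after two spools
power_at_receiver_dbm = edfa_output_dbm - spool_loss_db       # after the third spool

# Duration of the transmitted sequence and length of the captured record.
symbol_rate_bd = 90e9
sequence_length = 230_400
captured_symbols = 737_280

sequence_duration_s = sequence_length / symbol_rate_bd
captured_repetitions = captured_symbols / sequence_length

print(f"Power before inline EDFA: {power_before_edfa_dbm:.1f} dBm")
print(f"Power before DP-OH:       {power_at_receiver_dbm:.1f} dBm")
print(f"Sequence duration:        {sequence_duration_s * 1e6:.2f} us")
print(f"Captured repetitions:     {captured_repetitions:.1f}")
```

Under these assumptions the receive power of about 1 dBm per core and the sequence duration of 2.56 µs are reproduced, and each capture spans slightly more than three repetitions of the transmitted sequence.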

The receiver DSP chain is depicted in Fig. 7.1(a). First, all signals are sampled by ADCs. Afterwards, the signals are downconverted to baseband and resampled to two samples per symbol. Next, the carrier-frequency offset is estimated from a single polarization and applied to all received signals [100]. After FD CD compensation using the overlap-and-save algorithm [127], our proposed clock recovery follows. The joint clock recovery selects 41 frequency components from each sideband of a  $2^{13}$ -point FFT. In the experiment, we further employ an overlap of N/2 samples for each FFT and average the matrix  $\tilde{\mathbf{M}}$  over 16 consecutive estimates [C3]. Afterwards, we use an  $8\times8$  MIMO-equalizer with 150 filter taps whose coefficients are obtained using the least mean square (LMS) algorithm with integrated phase recovery switching from data-aided to decision-directed mode [99] after 40,000 computed output symbols. To achieve fast convergence in the initialization phase and improved convergence later, the stochastic gradient descent step size is switched from  $\mu_{\rm eq,1}$  =  $5\times10^{-5}$  to a step size  $\mu_{\rm eq,2}$  after 20,000 computed output symbols. After the MIMO-equalizer, we use a real-valued  $2\times2$  post-equalizer for each received polarization to remove residual transmitter impairments. Lastly, we evaluate the BER and the SNDR over the final  $5\times10^5$  symbols.

Fig. 7.7: 150 km RC-4CF transmission setup employing balanced heterodyne detection. Synchronization between the transmitter and receiver can be obtained using a side channel with a switch (green).

To prove the proper operation of our approach, we use the AWG and oscilloscopes with and without external synchronization through a side channel. Fig. 7.8(a) shows successful timing estimation for both cases over 8 µs. We can observe a frequency offset of about 880 kHz between the transmitter and receiver, which corresponds to around 4.9 ppm at twofold oversampling. Without dedicated clock recovery, the MIMO-equalizer would have to continuously track and compensate this sampling phase walk. This becomes apparent in Fig. 7.8(b), where the equalizer convergence is examined. The figure shows the temporal evolution of the LMS error, which is smoothed over time and averaged for all D = 8 signals. Here, the equalizer is operated in data-aided mode only. For the case of a synchronized transmitter and receiver, the equalizer must mainly compensate for a constant sampling offset with weak phase fluctuations (see orange line in Fig. 7.8(a)), which is why it still converges reliably for a low step size of  $\mu_{eq,2} = 5 \times 10^{-6}$ . In the scenario where transmitter and receiver are not synchronized and no clock recovery is used (see blue curves in Fig. 7.8(a)), the equalizer must also track and compensate for a CFO. In this case, the FB control loop of the equalizer must have the necessary bandwidth and stability. Note that the step size  $\mu_{\rm eq}$  corresponds to the integral coefficient in a PLL design. For a low step size of  $\mu_{eq,2} = 5 \times 10^{-6}$ , the LMS error increases in the second training phase since the equalizer cannot follow the clock phase drift fast enough. For this reason,  $\mu_{eq,2}$  has to be set to a larger value to follow the 4.9 ppm clock phase drift. In this case, the equalizer performs comparably to the case of a synchronized transmitter and receiver. In the realistic scenario, with no synchronization but with dedicated clock recovery, the LMS error is about 1 dB lower. In this case, the equalizer only has to compensate for a constant sampling offset, but no clock phase fluctuations, i.e., it has to track reduced channel dynamics. This allows for a small step size parameter  $\mu_{\rm eq,2}$  and enables a better convergence as well as an improved stability of the equalizer. It should be emphasized that the equalizer convergence significantly affects the performance when switching to decision-directed mode, since the LMS error degrades with every additional wrong symbol decision.
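The quoted clock frequency offset can be retraced with a short calculation. The sketch assumes that the 880 kHz offset is referred to the sampling rate at two samples per symbol (180 GSa/s), which reproduces the stated 4.9 ppm; the accumulated drift over the 8 µs observation window is derived from that assumption and is not a value reported in the text.

```python
symbol_rate_bd = 90e9
oversampling = 2
sampling_rate_hz = oversampling * symbol_rate_bd  # 180 GSa/s (assumed reference rate)

frequency_offset_hz = 880e3
cfo_ppm = frequency_offset_hz / sampling_rate_hz * 1e6

# Sampling phase drift in symbol intervals accumulated over the observation window.
observation_time_s = 8e-6
drift_symbols = cfo_ppm * 1e-6 * observation_time_s * symbol_rate_bd

print(f"CFO: {cfo_ppm:.2f} ppm")
print(f"Sampling phase drift over 8 us: {drift_symbols:.2f} symbol intervals")
```

A drift of several symbol intervals within a few microseconds illustrates why an equalizer with a small step size cannot track the sampling phase without dedicated clock recovery.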

The transmission performance in terms of the SNDR and BER for all 8 signals and no synchronization between transmitter and receiver is shown in Fig. 7.8(c). The results shown in gray ( $\mu_{\rm eq,2} = 5 \times 10^{-5}$ ) and blue ( $\mu_{\rm eq,2} = 5 \times 10^{-6}$ ) are for the case without and with dedicated clock recovery, respectively. The results with joint clock recovery are for the decision-directed mode, while for the equalizer-based clock recovery the data-aided mode was applied, as too many errors in the hard decision caused the equalizer to diverge. For the SNDR, we observe an improvement of 1 dB if our proposed joint clock recovery is used. We demonstrate successful transmission assuming a pre-FEC BER limit of  $2.41 \times 10^{-2}$ , which can in principle be achieved, e.g., using a soft-decision low-density parity-check (LDPC) code with 20% overhead [128, 129].

[End of paper [J2]. The paper's conclusion is added to chapter 8.]



Fig. 7.8: (a) Timing-error evolution (modulo one symbol interval) with and without external synchronization of the transmitter and receiver. (b) LMS error smoothed over time and averaged over all receive signals over the number of computed output symbols. (c) SNDR and BER for all eight receive signals.

# 8 Summary and Outlook

In the following, the results achieved are briefly summarized and an outlook on pending research questions is given. The conclusion and outlook of the respective chapters are listed below and are mostly obtained from the publications cited.

#### Hardware implementation of all-digital clock recovery

[obtained from [P1]]

An EB design that enables all-digital clock recovery with a free-running receiver oscillator, supporting both positive and negative CFOs, is reported for the first time. The clock recovery is implemented on an FPGA, demonstrating error-free data transmission for CFOs up to  $\pm 400\,\mathrm{ppm}$ . This method eliminates the need for analog VCO control and a low-speed DAC, offering an important step toward fully-digital, power-efficient clock recovery in modern DSP-based optical transceivers.

#### **Short-reach optical links**

[obtained from [J3]]

This work analytically examines the influence of CD on clock recovery in direct-detection systems. Power fading is identified as the dominant impairment, causing  $\pi$ -phase errors at specific dispersion values. Additionally, the clock tone may cancel out when the weighted cosine product approaches zero or when spectral narrowing due to power fading occurs. These effects are consistently observed in simulation and experiment. To mitigate them, CD-tolerant clock recovery algorithms for NRZ and RRC/FTN signals are proposed, which compensate phase distortions in the FD. Reliable clock recovery under severe CD is critical for the convergence of adaptive equalizers. This work therefore makes an important contribution to improving the robustness of future high-speed direct-detection systems equipped with DSP. Future research may consider low-power baud-rate clock recovery and the investigation of the additional influence of modulator chirp. The presented 4th-power method can be implemented in a more robust way for high roll-offs as described in [32]. In future work, the findings and algorithms of this work can be used to perform high-speed transmission experiments with FTN signals beyond 200 GBd.

#### Passive optical networks (PONs)

The introduction of DSP starting with 50G-PON enables the implementation of a digital clock recovery, which might replace analog CDR. The restriction on high-frequency jitter and CFO tolerance depends primarily on the algorithm architecture rather than on the individual algorithms themselves. Using an FF clock recovery enables efficient compensation of high-frequency jitter and potentially allows the use of the fully digitally recovered clock of a free-running, low-cost oscillator in the ONU for upstream transmission. It is demonstrated that the synchronization speed of FB structures is inherently constrained to the order of microseconds due to FB delays. FF structures, on the other hand, allow for ultra-fast synchronization within tens of nanoseconds. In the course of the thesis, synchronization for a burst-mode signal with a symbol rate of 56 GBd within 36.57 ns using FF clock recovery is demonstrated. This makes FF structures a potential candidate for digital clock recovery in future high-speed PONs, where low-cost oscillators with high-frequency jitter are utilized and fast synchronization in upstream is required. It remains to demonstrate the envisioned OLT-ONU-OLT transmission with a free-running oscillator and fully digital clock recovery. In that context, the jitter generation, transfer, and tolerance can be studied in more detail.

#### **Continuous-variable quantum key distribution (CV-QKD)**

[obtained from [C3]] and modified

A pilot-free digital FF clock synchronization scheme based on the modified timing estimation algorithm of Barton and Al-Jalili is demonstrated in simulation and in experiment for application in CV-QKD. In this context, operation near the receiver noise floor is possible even at practical CFOs as high as 10 ppm. This paves the way towards simplified CV-QKD systems, which can operate without auxiliary signals and pilot tones and reuse mature optical transceivers and DSP from the telecommunications market. In the presented proof-of-concept experiment of this thesis, the optical signal was manually attenuated to a level close to the total noise floor. Future work may involve verifying the FF clock recovery in a practical CV-QKD system with proper noise calibration to shot noise units and by specifying the achieved key rates. Furthermore, the large block size of the clock recovery will result in high hardware requirements for a real-time implementation, e.g., on an FPGA [C1, C2]. Further research has to be undertaken on low-complexity NDA clock recovery and its implementation feasibility on FPGAs.

#### Space-division multiplexing (SDM) optical systems

[obtained from [J2]]

We have presented a novel joint clock recovery algorithm that is tolerant to spatial- and polarization-mode dispersion by computing the joint group delays of the principal modes in FD from the spectral correlation matrix. In simulations, the algorithm performs well for randomly-coupled fibers with lengths of up to 10,000 km. Furthermore, we analyzed the hardware complexity for FPGA or ASIC implementation. Compared to uncoupled-channel dual-polarization signals, computing the determinant of large correlation matrices becomes more crucial. Finally, we experimentally demonstrated joint clock recovery in a 90-GBd 16-QAM transmission over 150 km RC-4CF, resulting in a total data rate of 2.88 Tbit/s. Using a dedicated clock recovery relieves the equalizer and improves equalizer convergence and stability, which results in an SNDR improvement by more than 1 dB in decision-directed mode. More in-depth research regarding the effect of a frequency-dependent mode-averaged group delay and the influence of MDL as well as optimizations for improved inphase and quadrature imbalance tolerance is proposed for future research. The dispersion-tolerant clock recovery algorithm presented in this work enables robust and hardware-efficient DSP in future SDM systems and is an important step towards the practical use of SDM systems.
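The total data rate quoted above can be retraced from the experimental parameters. The following sketch assumes that the figure is the gross line rate before FEC overhead, i.e., symbol rate times bits per 16-QAM symbol times the number of polarizations and cores; the variable names are illustrative.

```python
symbol_rate_bd = 90e9
bits_per_symbol = 4      # 16-QAM carries log2(16) = 4 bits per symbol
polarizations = 2
cores = 4

gross_rate_bit_s = symbol_rate_bd * bits_per_symbol * polarizations * cores
print(f"Gross data rate: {gross_rate_bit_s / 1e12:.2f} Tbit/s")
```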

# **Appendices**

# A Discrete-Time Signals and Systems

DSP considers signals that are quantized in time and amplitude, so-called digital signals, which are processed in digital circuits or systems. In this appendix, the mathematical principles and basic hardware structures of discrete-time signals and systems that are relevant for this thesis are provided. For the sake of simplicity, the TD signals are considered to be real-valued.

# A.1 Discrete-Time Signals

A signal is the variation of an observed quantity over one or many independent variables that contains information relevant to the observer [38]. The signal can be of physical or non-physical nature. In communication engineering, information is conveyed from a transmitter to a receiver by transmitting signals through a system. The physical transmission medium is a continuous-time system, through which a continuous-time signal x(t) is propagated. A DAC converts a digital signal into a time-continuous waveform, whereas an ADC in turn digitizes an analog signal. Signal values in between the equidistant sampling points are lost.

There exist two mathematical representations of discrete-time signals [38]. The first describes the discrete signal as a series

$$x_k = x(kT_{\rm sa}), \qquad k \in \mathbb{Z}, \tag{A.1}$$

where  $kT_{\rm sa}$  are the sampling instances obtained after each sampling period  $T_{\rm sa}$ . The reciprocal  $f_{\rm sa}$  =  $1/T_{\rm sa}$  defines the sampling frequency. In the second representation, the discrete-time signal is expressed by a multiplication of the time-continuous signal with a pulse series as

$$x_* = x(t) \sum_{k=-\infty}^{\infty} \delta(t - kT_{\rm sa}), \tag{A.2}$$

where the asterisk depicts the time-discretized signal. The choice between the respective representations depends on the specific application context. The first form is commonly employed in summation-based operations, such as FIR filtering. In contrast, the second representation, which utilizes a pulse train (or Dirac comb), is particularly well suited for Fourier analysis due to its advantageous mathematical properties, especially its behavior under the Fourier transform, where it leads to a periodic spectrum that facilitates analytical treatment.

In many signal processing applications, examining the spectrum of a sampled signal is essential. For a sampled signal  $x_k$ , the DTFT and inverse DTFT are defined by

$$\begin{aligned}
\tilde{x}(f) &= \sum_{k=-\infty}^{\infty} x_k e^{-j 2\pi f k T_{\text{sa}}} \\
x_k &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{x}(f) e^{j 2\pi f k T_{\text{sa}}} \,\mathrm{d}f,
\end{aligned} \tag{A.3}$$

with the frequency f normalized to the sampling frequency, commonly also referred to as the normalized angular frequency  $\Omega = 2\pi f/f_{\rm sa}$  [46]. The resulting spectrum is continuous in f and periodic with  $2\pi$ . In practical digital systems, however, the spectrum can only be evaluated at discrete frequency points, known as frequency bins. By sampling the spectrum within the interval  $[-\pi,\pi)$  at N equidistant points, specifically at the frequencies  $\Omega_n = 2\pi n/N$  for  $n \in \{0,\ldots,N-1\}$ , a discrete formulation, the DFT, is obtained. The DFT is defined for a sampled signal of finite length  $x_k$  with  $k \in \{0, ..., N-1\}$  as

$$\begin{aligned}
\tilde{x}_{n} &= \sum_{k=0}^{N-1} x_{k} e^{-j 2\pi k \frac{n}{N}} \\
x_{k} &= \frac{1}{N} \sum_{n=0}^{N-1} \tilde{x}_{n} e^{j 2\pi k \frac{n}{N}} .
\end{aligned} \tag{A.4}$$

Sampling the DTFT in FD corresponds to a periodic extension of the finite-length signal  $x_k$  in TD. Because the TD signal of the inverse DFT is periodic, it can also be represented as a discrete-time Fourier series [46].
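The DFT pair of eq. (A.4) and the periodic extension in TD can be checked numerically. The following is a minimal sketch using a direct evaluation of the sums (not an FFT); the function names `dft` and `idft_sample` and the test signal are illustrative choices.

```python
import cmath


def dft(signal):
    """Forward DFT of eq. (A.4), evaluated directly as a sum."""
    n_points = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for k in range(n_points))
        for n in range(n_points)
    ]


def idft_sample(spectrum, k):
    """Inverse DFT of eq. (A.4) for one time index k (k may lie outside 0..N-1)."""
    n_points = len(spectrum)
    return sum(spectrum[n] * cmath.exp(2j * cmath.pi * k * n / n_points)
               for n in range(n_points)) / n_points


signal = [1.0, 2.0, 0.0, -1.0, 0.5]
spectrum = dft(signal)
num = len(signal)

# Round trip: the inverse DFT recovers the original samples.
reconstructed = [idft_sample(spectrum, k) for k in range(num)]
# Periodic extension: indices shifted by N yield the same samples again.
extended = [idft_sample(spectrum, k + num) for k in range(num)]

round_trip_error = max(abs(r - s) for r, s in zip(reconstructed, signal))
periodicity_error = max(abs(e - s) for e, s in zip(extended, signal))
print(round_trip_error, periodicity_error)
```

Both error values are at the level of floating-point rounding, which illustrates that the inverse DFT implicitly describes a signal that repeats every N samples.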

# A.2 Discrete-Time Systems

In signal processing, the term system or filter refers to a unit which responds to a stimulus at the input with a reaction at the output. Since in the scope of this thesis only discrete-time signals are considered, the total input sequence can be referred to as  $x_k$ , whereas the output sequence is  $y_k$ .¹ Thus, the behavior of the system can be mathematically described by the discrete-time operator equation [38]

$$y_k = \mathcal{S}\{x_k\}. \tag{A.5}$$

The operator $\mathcal{S}$ maps an input series to an output series and must not be confused with the case where a single input sample is mapped to a single output sample. On the basis of the operator equation, many properties, such as linearity, time-invariance, causality, dynamics, and stability, can be analyzed. In the following, the LTI system is explained. Afterwards, the z-transform is introduced to describe a system with difference equations. This provides the necessary mathematical framework to analyze the phase behavior of discrete filters with finite impulse response.

¹ Note the analogy to the time-continuous case, where x(t) describes both the value of the function at a single point in time t and the whole function of time t. Since discrete-time systems always map series to series, no misunderstandings are to be expected.

# A.2.1 Linear Time-Invariant Systems

LTI systems are of special interest in DSP since they have a simple mathematical representation that facilitates their analysis. First, the properties of the system are defined according to [38].

#### **Definition 1. Linearity**

A discrete-time system S is called linear if for two arbitrary input signals  $x_k^{(1)}$  and  $x_k^{(2)}$  and two arbitrary constants  $c_1, c_2 \in \mathbb{R}$  or  $\mathbb{C}$ 

$$S\left\{c_{1}x_{k}^{(1)}+c_{2}x_{k}^{(2)}\right\}=c_{1}S\left\{x_{k}^{(1)}\right\}+c_{2}S\left\{x_{k}^{(2)}\right\} \tag{A.6}$$

is valid.

#### **Definition 2. Time-invariance**

A discrete-time system S is called time-invariant if a delay of the input signal  $x_{k-k_0}$  results in a delayed output  $y_{k-k_0}$ .

The impulse response is now introduced as the response of a system to a discrete-time Dirac impulse stimulus

$$h_k = \mathcal{S}\{\delta_k\}. \tag{A.7}$$

Based on the properties of the Dirac impulse, a single sample at time instance  $k_0$  of the input series can be represented as a convolution with the Dirac impulse as

$$x_{k_0} = \sum_{k=-\infty}^{\infty} x_k \delta_{k-k_0} \tag{A.8}$$

and the whole input sequence can be formulated as convolution sum

$$x_k = \sum_{m=-\infty}^{\infty} x_m \delta_{k-m} . \tag{A.9}$$

Inserting this form of the input series into the operational equation of the LTI system from eq. (A.5) and using the definition of linearity and the definition of the impulse response results in

$$\begin{aligned}
y_{k} &= \mathcal{S}\{x_{k}\} \\
&= \mathcal{S}\left\{\sum_{m=-\infty}^{\infty} x_{m} \delta_{k-m}\right\} \\
&\stackrel{\text{eq. (A.6)}}{=} \sum_{m=-\infty}^{\infty} x_{m} \mathcal{S}\{\delta_{k-m}\} \\
&\stackrel{\text{eq. (A.7)}}{=} \sum_{m=-\infty}^{\infty} x_{m} h_{k-m} \\
&= x_{k} * h_{k},
\end{aligned} \tag{A.10}$$

where \* is the convolution operator. Due to the commutativity of the convolution the above expression can be rewritten as

$$x_k * h_k = \sum_{m=-\infty}^{\infty} x_{k-m} h_m. \tag{A.11}$$

Eq. (A.11) shows that a discrete-time LTI system can be completely characterized by its impulse response. Systems are distinguished by the length of their impulse response: a system with finite impulse response (FIR) is referred to as FIR filter, whereas a system with infinite impulse response (IIR) is referred to as IIR filter.
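The convolution sum of eq. (A.10) and its commutativity from eq. (A.11) can be illustrated for finite-length sequences. The sketch below assumes causal sequences that are zero outside the listed samples; the function name `convolve` and the example values are illustrative.

```python
def convolve(x, h):
    """Convolution sum y_k = sum_m x_m h_{k-m} for finite causal sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k in range(len(y)):
        for m in range(len(x)):
            if 0 <= k - m < len(h):
                y[k] += x[m] * h[k - m]
    return y


x = [1.0, 2.0, 3.0]   # input sequence
h = [0.5, 0.5]        # impulse response of a two-tap averaging filter

y_xh = convolve(x, h)
y_hx = convolve(h, x)               # commutativity, eq. (A.11)
impulse_out = convolve([1.0], h)    # a unit impulse reproduces h, eq. (A.7)

print(y_xh)          # [0.5, 1.5, 2.5, 1.5]
print(y_hx)          # [0.5, 1.5, 2.5, 1.5]
print(impulse_out)   # [0.5, 0.5]
```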

# A.2.2 Z-Transform

The relationship of time-continuous signals in a technical system is usually described by linear differential equations with constant coefficients. A similar approach is also used in the z-transform to describe discrete-time systems. This section is dedicated to a definition and qualitative explanation of the z-transform as it will be required in the course of the thesis. A detailed mathematical analysis of the Laplace and z-transform goes beyond the scope of this work and the interested reader is referred to literature, e.g., [38, 46].

An LTI system is fully characterized by its impulse response  $h_k$ . Because  $h_k$  can be interpreted as a signal, the Fourier transform can be applied to obtain further information about the system behavior. Since for a limited class of functions the Fourier integral does not exist, the Fourier transform is extended by a term  $e^{-\kappa t}$  with  $\kappa \in \mathbb{R}$  to ensure the convergence of the Fourier integral for large  $\kappa$ . This transformation is referred to as Laplace transform and is applied to time-continuous systems. If this approach is used on the sampled signal of eq. (A.2), the discrete-time equivalent of the Laplace transform, the so-called z-transform, is obtained. The z-transform  $\tilde{\underline{x}}(\underline{z}) = \mathcal{Z}\{x_k\}$  of a signal sequence  $x_k$  and its counterpart, the inverse z-transform, are defined as

$$\begin{aligned}
\underline{\tilde{x}}(\underline{z}) &= \sum_{k=-\infty}^{\infty} x_k \underline{z}^{-k} \\
x_k &= \frac{1}{j 2\pi} \oint_C \underline{\tilde{x}}(\underline{z}) \underline{z}^{k-1} \,\mathrm{d}\underline{z}.
\end{aligned} \tag{A.12}$$

The inverse z-transform is defined as a contour integral over a closed circular and counterclockwise-oriented path C, which encircles the origin and lies entirely within the region of convergence of  $\tilde{x}(z)$  [38, 46].

For a causal time shift of the series by  $k_0 \in \mathbb{N}^+$  samples, i.e.,  $y_k = x_{k-k_0}$ , the z-transform of  $y_k$  results in

$$\begin{aligned}
\underline{\tilde{y}}(\underline{z}) &= \sum_{k=-\infty}^{\infty} x_{k-k_0} \underline{z}^{-k} \\
&= \underline{z}^{-k_0} \sum_{k=-\infty}^{\infty} x_{k-k_0} \underline{z}^{-(k-k_0)} \\
&= \underline{z}^{-k_0} \underline{\tilde{x}}(\underline{z}).
\end{aligned} \tag{A.13}$$

Hence, a time shift of the sequence by  $k_0$  samples corresponds to a multiplication by  $\underline{z}^{-k_0}$  in z-domain. Lastly, a convolution in TD corresponds to a multiplication in z-domain, i.e.,

$$\begin{aligned}
\mathcal{Z}\{x_k * y_k\} &= \sum_{k=-\infty}^{\infty} (x_k * y_k) \underline{z}^{-k} \\
&= \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} x_m y_{k-m} \underline{z}^{-k} \\
&= \sum_{m=-\infty}^{\infty} x_m \underline{z}^{-m} \sum_{k=-\infty}^{\infty} y_{k-m} \underline{z}^{-(k-m)} \\
&= \underline{\tilde{x}}(\underline{z}) \underline{\tilde{y}}(\underline{z}).
\end{aligned} \tag{A.14}$$
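The time-shift property of eq. (A.13) and the convolution property of eq. (A.14) can be verified numerically for finite causal sequences, for which the z-transform reduces to a finite sum. This is a minimal sketch; the function names, the sequences, and the test point z are arbitrary illustrative choices.

```python
import cmath


def z_transform(sequence, z):
    """Evaluate sum_k x_k z^{-k} for a finite causal sequence x_0, ..., x_{N-1}."""
    return sum(x_k * z ** (-k) for k, x_k in enumerate(sequence))


def convolve(x, y):
    """Convolution of two finite causal sequences."""
    result = [0.0] * (len(x) + len(y) - 1)
    for m, x_m in enumerate(x):
        for n, y_n in enumerate(y):
            result[m + n] += x_m * y_n
    return result


x = [1.0, -2.0, 0.5]
y = [0.25, 0.5, 0.25, 1.0]
k0 = 3
z = 1.2 * cmath.exp(0.7j)   # arbitrary non-zero test point in the z-plane

# Eq. (A.13): delaying x by k0 samples multiplies its z-transform by z^{-k0}.
shifted = [0.0] * k0 + x
shift_error = abs(z_transform(shifted, z) - z ** (-k0) * z_transform(x, z))

# Eq. (A.14): convolution in TD corresponds to multiplication in z-domain.
product_error = abs(z_transform(convolve(x, y), z)
                    - z_transform(x, z) * z_transform(y, z))

print(shift_error, product_error)
```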

#### A.2.3 FIR Filters

As already mentioned some interesting properties of discrete-time systems can be shown with the help of the z-transform. The following considerations are restricted to the class of the FIR filters, since these are commonly used in DSP and in this thesis exclusively. When considering discrete-time LTI systems, the system can generally be represented by a linear difference equation [38, 46] using time-independent coefficients a and b as

$$\sum_{\nu=n_1}^{n_2} a_{\nu} y_{k-\nu} = \sum_{\mu=m_1}^{m_2} b_{\mu} x_{k-\mu} . \tag{A.15}$$

With eq. (A.13) the z-transform of the difference equation is

$$\sum_{\nu=n_1}^{n_2} a_{\nu} \underline{z}^{-\nu} \underline{\tilde{y}}(\underline{z}) = \sum_{\mu=m_1}^{m_2} b_{\mu} \underline{z}^{-\mu} \underline{\tilde{x}}(\underline{z}). \tag{A.16}$$

According to eq. (A.14), the z-transform of a system's output can be expressed as

$$\underline{\tilde{y}}(\underline{z}) = \underline{\tilde{x}}(\underline{z})\,\underline{\tilde{h}}(\underline{z}). \tag{A.17}$$

Comparing eq. (A.17) with eq. (A.16), the system function of the filter can be described as

$$\underline{\tilde{h}}(\underline{z}) = \frac{\underline{\tilde{y}}(\underline{z})}{\underline{\tilde{x}}(\underline{z})} = \frac{\sum_{\mu=m_1}^{m_2} b_{\mu} \underline{z}^{-\mu}}{\sum_{\nu=n_1}^{n_2} a_{\nu} \underline{z}^{-\nu}}. \tag{A.18}$$

Following some assumptions explained in more detail in [38], the summation limits can be simplified to

$$\underline{\tilde{h}}(\underline{z}) = \frac{\sum_{\mu=0}^{M} b_{\mu} \underline{z}^{M-\mu}}{\sum_{\nu=0}^{M} a_{\nu} \underline{z}^{M-\nu}}. \tag{A.19}$$

Hence, the system function  $\underline{\tilde{h}}(\underline{z})$  is described as a rational function  $\underline{\tilde{h}}(\underline{z}) = \underline{P}(\underline{z})/\underline{Q}(\underline{z})$  with the polynomial functions  $\underline{P}(\underline{z})$  and  $\underline{Q}(\underline{z})$ .

#### **Definition 3. Stability of LTI systems**

A discrete-time, causal LTI system S is stable if all poles of the system function (the zeros of the denominator polynomial  $\underline{Q}(\underline{z})$ ) lie within the unit circle, i.e.,  $|\underline{z}_{\infty,\nu}| < 1$  applies for each pole  $\underline{z}_{\infty,\nu}$  with index  $\nu$ .
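To illustrate Definition 3, the following sketch considers a first-order recursive system $y_k = p\,y_{k-1} + x_k$, whose system function $\underline{z}/(\underline{z}-p)$ has a single pole at $\underline{z} = p$. This example system and the function names are assumptions chosen for illustration and are not taken from the thesis.

```python
def is_stable(poles):
    """Definition 3: a causal LTI system is stable if all poles satisfy |z| < 1."""
    return all(abs(pole) < 1.0 for pole in poles)


def impulse_response_first_order(pole, num_samples):
    """Impulse response of y_k = pole * y_{k-1} + x_k, i.e., h_k = pole**k."""
    response = []
    previous_output = 0.0
    for k in range(num_samples):
        x_k = 1.0 if k == 0 else 0.0          # discrete-time Dirac impulse
        y_k = pole * previous_output + x_k
        response.append(y_k)
        previous_output = y_k
    return response


h_stable = impulse_response_first_order(0.9, 200)     # pole inside the unit circle
h_unstable = impulse_response_first_order(1.1, 200)   # pole outside the unit circle

print(is_stable([0.9]), abs(h_stable[-1]))     # response decays towards zero
print(is_stable([1.1]), abs(h_unstable[-1]))   # response grows without bound
```

An FIR filter has no recursive part, so its system function has poles only at the origin and it is always stable in this sense.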

For an FIR filter, the system output at a point in time is independent of past and future system outputs, i.e.,  $n_1 = n_2$ . Hence, eq. (A.15) simplifies to

$$a_{n_1} y_{k-n_1} = \sum_{\mu=m_1}^{m_2} b_{\mu} x_{k-\mu} \tag{A.20}$$

with z-transform

$$a_{n_1} \underline{z}^{-n_1} \underline{\tilde{y}}(\underline{z}) = \sum_{\mu=m_1}^{m_2} b_{\mu} \underline{z}^{-\mu} \underline{\tilde{x}}(\underline{z}). \tag{A.21}$$

Therefore, the system function from eq. (A.18) simplifies to

$$\underline{\tilde{h}}(\underline{z}) = \frac{\underline{\tilde{y}}(\underline{z})}{\underline{\tilde{x}}(\underline{z})} = \frac{\sum_{\mu=m_1}^{m_2} b_{\mu} \underline{z}^{-\mu}}{a_{n_1} \underline{z}^{-n_1}}. \tag{A.22}$$

This expression can be further simplified by the fact that the current output sample is always computed, i.e.,  $n_1 = n_2 = 0$ . Furthermore, the coefficient  $b_{\mu}$  can be divided by  $a_0$ , resulting in a new coefficient  $h_{\mu}$ . Since  $\underline{\tilde{h}}(\underline{z})$  is a time-invariant system, the summation limits can be shifted in time by  $m_1$ , such that  $M=m_2-m_1$ , and hence

$$\underline{\tilde{h}}(\underline{z}) = \sum_{\mu=0}^{M} h_{\mu} \underline{z}^{-\mu} . \tag{A.23}$$

As shown in eq. (A.13), the multiplication by  $\underline{z}^{-1}$  corresponds to a time delay of one sampling period, and  $h_{\mu}$  are the filter coefficients. Alternatively, eq. (A.23) would be obtained by directly z-transforming the impulse response of the filter. In addition, the filter structure can be constructed by convolving the input signal with the system impulse response, also referred to as multiply-accumulate (MAC) operation

$$y_k = \sum_{\mu=0}^{M} x_{k-\mu} h_{\mu} . \tag{A.24}$$

Fig. A.1: The FIR filter of order M comprises M unit delays as well as M+1 filter coefficients. The weighted input samples x are accumulated and form the output signal y.

The graphical structure of the filter is shown in Fig. A.1. In the graphical representation, the z-variable is shown without underline, as this is the usual presentation. The unit delay elements of the FIR filter structure can be realized by D-flip-flops, which hold the signal value for the duration of one clock cycle. A serial arrangement of unit delays is called a register, which is suitable for implementing a time delay or temporary memory. The values stored in the register are multiplied by the filter coefficients  $h_{\mu}$  and are then accumulated.
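The register-based structure described above and eq. (A.24) can be sketched in software as follows. The class `FirFilter` is a hypothetical illustration in which a fixed-length double-ended queue plays the role of the register; it is not an implementation used in the thesis.

```python
from collections import deque


class FirFilter:
    """Direct-form FIR filter of order M: y_k = sum_mu h_mu * x_{k-mu}, eq. (A.24)."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        # The register holds x_k, x_{k-1}, ..., x_{k-M}, initialized with zeros.
        taps = len(self.coefficients)
        self.register = deque([0.0] * taps, maxlen=taps)

    def step(self, sample):
        """Shift one new sample into the register and return one output sample."""
        self.register.appendleft(sample)   # the oldest sample drops out on the right
        return sum(h * x for h, x in zip(self.coefficients, self.register))


fir = FirFilter([0.25, 0.5, 0.25])   # order M = 2, i.e., M + 1 = 3 coefficients
impulse = [1.0, 0.0, 0.0, 0.0]
output = [fir.step(sample) for sample in impulse]
print(output)   # [0.25, 0.5, 0.25, 0.0]
```

Feeding a unit impulse reproduces the filter coefficients at the output, in agreement with the statement that eq. (A.23) is the z-transform of the impulse response.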

#### A.2.3.1 Linear-Phase FIR Filters

It can easily be shown that the system function  $\underline{\tilde{h}}(\underline{z})$  for  $\underline{z}=\exp(\mathrm{j}\,2\pi f T_\mathrm{sa})$  corresponds to the DTFT

$$\begin{aligned}
\underline{\tilde{h}}(f) &= \sum_{k=-\infty}^{\infty} h_k e^{-j 2\pi f k T_{\text{sa}}} \\
&= \sum_{k=-\infty}^{\infty} h_k \left( e^{j 2\pi f T_{\text{sa}}} \right)^{-k} \\
&= \underline{\tilde{h}} \left( \underline{z} = e^{j 2\pi f T_{\text{sa}}} \right).
\end{aligned} \tag{A.25}$$

The frequency response of a filter can therefore be determined through the DTFT/DFT or the z-transform. In general, the frequency response is complex-valued, i.e., it consists of a magnitude and phase

$$\underline{\tilde{h}}\left(\underline{z} = e^{j \, 2\pi f T_{\text{sa}}}\right) = \left|\underline{\tilde{h}}\left(\underline{z} = e^{j \, 2\pi f T_{\text{sa}}}\right)\right| e^{j \varphi(f)} \tag{A.26}$$

with

$$\varphi(f) = \arg\left\{\underline{\tilde{h}}\left(\underline{z} = e^{j 2\pi f T_{\text{sa}}}\right)\right\}. \tag{A.27}$$

The amplitude response  $|\underline{\tilde{h}}|$  describes the amplitude ratio of the output and input signal at a given frequency, whereas the phase response quantifies the phase shift introduced between the input and output signals at that frequency. The negative derivative of the spectral phase of a system with respect to the frequency denotes the group delay  $\tau_g(f)$  and describes the delay of a signal component in a differentially small frequency range  $\mathrm{d}f$  [38]:

$$\tau_g(f) = -\frac{1}{2\pi} \frac{\mathrm{d}}{\mathrm{d}f} \varphi(f) \tag{A.28}$$

A spectral phase rotation causes a distortion of the output signal. Accordingly, a frequency-independent phase response of zero is desirable. However, such a phase response cannot be realized with a causal system [38]. A compromise is given by a linear phase response, which does not produce any phase distortion, but only a time shift due to the constant group delay. Given a real-valued  $h_{\mu}$  with  $\mu \in \{0,...,M\}$  and even number M, the filter has linear phase if the impulse response is symmetric around the central (M/2+1)-th coefficient, i.e., it has an odd number M+1 of filter coefficients [38].
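The linear-phase property can be checked numerically: for a symmetric impulse response of even order M, removing the linear phase term $e^{-j\Omega M/2}$ from the frequency response must leave a purely real function, which by eq. (A.28) corresponds to a constant group delay of $M T_{\rm sa}/2$. The sketch below uses the normalized angular frequency $\Omega$; the coefficient values and function names are illustrative assumptions.

```python
import cmath


def frequency_response(coefficients, omega):
    """Evaluate eq. (A.23) on the unit circle, z = exp(j * omega)."""
    return sum(h_mu * cmath.exp(-1j * omega * mu)
               for mu, h_mu in enumerate(coefficients))


def linear_phase_residual(coefficients, frequencies):
    """Largest imaginary part that remains after removing the phase -omega * M / 2."""
    order = len(coefficients) - 1
    return max(
        abs((frequency_response(coefficients, w) * cmath.exp(1j * w * order / 2)).imag)
        for w in frequencies
    )


frequencies = [0.1, 0.5, 1.0, 2.0, 3.0]      # normalized angular frequencies in rad
symmetric = [1.0, 2.0, 3.0, 2.0, 1.0]        # M = 4, symmetric around the 3rd tap
asymmetric = [1.0, 2.0, 3.0, 4.0, 5.0]       # M = 4, not symmetric

residual_symmetric = linear_phase_residual(symmetric, frequencies)
residual_asymmetric = linear_phase_residual(asymmetric, frequencies)
print(residual_symmetric, residual_asymmetric)
```

The residual vanishes up to rounding errors for the symmetric filter, whereas the asymmetric filter retains a frequency-dependent phase contribution.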

# **B** Multi-Rate System

The field of multi-rate signal processing addresses concepts, algorithms, and system architectures that involve a change of the sampling rate at one or multiple points of the signal flow path. Reducing the sampling rate eases the demands on the computational effort, as fewer sample points have to be processed. Increasing the sampling rate can be beneficial to avoid aliasing if nonlinear functions broaden the spectrum of the signal. The following chapter first analyzes the periodicity of a discrete-time signal in FD and derives the sampling theorem from it. Based on this, the basic operations of a multi-rate system, namely downsampling and upsampling, are explained.

## **B.1** Sampling Theorem

A time-continuous signal x(t) is bandlimited within a bandwidth 2B if

$$\tilde{x}(f) = 0 \quad \text{for } |f| \ge B. \tag{B.1}$$

The Fourier transform of the bandlimited, equidistantly sampled signal from eq. (A.2) is then

$$\underline{\tilde{x}}_{*}(f) = \underline{\tilde{x}}(f) * \mathcal{F} \left\{ \sum_{k=-\infty}^{\infty} \delta(t - kT_{\mathrm{sa}}) \right\}. \tag{B.2}$$

The multiplication of the time-continuous signal x(t) with the pulse train corresponds to a convolution in FD. The Fourier transform of the pulse train can be obtained by rearranging the expression with the help of the Poisson summation formula (a)

$$\begin{aligned}
\mathcal{F}\left\{\sum_{k=-\infty}^{\infty} \delta(t - kT_{\mathrm{sa}})\right\} &= \mathcal{F}\left\{f_{\mathrm{sa}} \sum_{k=-\infty}^{\infty} e^{\mathrm{j} \, 2\pi k f_{\mathrm{sa}} t}\right\} \\
&\stackrel{\text{(a)}}{=} f_{\mathrm{sa}} \sum_{k=-\infty}^{\infty} \mathcal{F}\left\{e^{\mathrm{j} \, 2\pi k f_{\mathrm{sa}} t}\right\} \\
&= f_{\mathrm{sa}} \sum_{k=-\infty}^{\infty} \delta\left(f - k f_{\mathrm{sa}}\right).
\end{aligned} \tag{B.3}$$

Inserting this expression into eq. (B.2) gives

$$\begin{aligned}
\tilde{\underline{x}}_{*}(f) &= \tilde{\underline{x}}(f) * f_{\mathrm{sa}} \sum_{k=-\infty}^{\infty} \delta(f - kf_{\mathrm{sa}}) \\
&= f_{\mathrm{sa}} \int_{-\infty}^{\infty} \tilde{\underline{x}}(\nu) \sum_{k=-\infty}^{\infty} \delta(f - \nu - kf_{\mathrm{sa}}) \, \mathrm{d}\nu \\
&= f_{\mathrm{sa}} \sum_{k=-\infty}^{\infty} \tilde{\underline{x}}(f - kf_{\mathrm{sa}}).
\end{aligned} \tag{B.4}$$

The Fourier transform of a pulse series in the TD corresponds to a pulse series in the FD. The convolution of the bandlimited, time-continuous signal spectrum  $\tilde{x}(f)$  with the pulse series results in a periodic repetition of the spectrum at intervals of  $f_{\rm sa}$ . The DFT would correspond to a periodic, sampled spectrum. A spectral overlap of adjacent replicas of  $\tilde{x}(f)$  introduces a signal distortion, which is referred to as aliasing. To avoid aliasing, the highest frequency of the signal  $f_{\rm max}$  has to be less than half the sampling rate [38]

$$f_{\text{max}} < B \le \frac{f_{\text{sa}}}{2} \,. \tag{B.5}$$

This condition is also referred to as the sampling theorem and has to be fulfilled if an unambiguous reconstruction of the information contained in the sampled analog signal is desired. The following sections show basic operations to ensure this requirement for a sample rate conversion.
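The effect of violating eq. (B.5) can be illustrated with a single tone. The sketch below uses assumed example values ($f_{\rm sa} = 10$ Hz and a tone at 7 Hz) and a hypothetical helper `alias_frequency`; it shows that the samples of the 7 Hz tone cannot be distinguished from those of a 3 Hz tone, so the original frequency cannot be recovered from the samples.

```python
import math

f_sa = 10.0        # assumed sampling rate in Hz
T_sa = 1.0 / f_sa  # sampling period


def sample_cosine(f, num_samples):
    """Samples x_k = cos(2*pi*f*k*T_sa) of a real-valued tone at frequency f."""
    return [math.cos(2 * math.pi * f * k * T_sa) for k in range(num_samples)]


def alias_frequency(f, f_sa):
    """Frequency in [0, f_sa/2] onto which a real-valued tone at f is mapped by sampling."""
    f_mod = f % f_sa
    return min(f_mod, f_sa - f_mod)


f_tone = 7.0                              # violates f_max < f_sa / 2
f_alias = alias_frequency(f_tone, f_sa)   # 3.0 Hz
x_high = sample_cosine(f_tone, 50)
x_low = sample_cosine(f_alias, 50)
max_difference = max(abs(a - b) for a, b in zip(x_high, x_low))
print(f_alias, max_difference)
```

Both sample sequences agree up to the floating-point precision, which is the time-domain counterpart of the overlapping spectral replicas in eq. (B.4).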

## **B.2** Sampling Rate Down- & Upconversion

Sample rate decimation is the reduction of the sampling rate by an integer factor  $U_{\downarrow}$  by retaining every  $U_{\downarrow}$ -th sample while discarding the remaining samples. The process consists of two elements, the anti-aliasing lowpass filter followed by the sample rate downconversion. Downsampling of the discrete-time sequence  $x_k$  by a factor  $U_{\downarrow}$  decreases the sampling rate and corresponds to a convolution of eq. (B.4) with a second Dirac comb with a comb line spacing of  $f'_{\rm sa} = f_{\rm sa}/U_{\downarrow}$ . Considering eq. (B.4), the Fourier transform of the downsampled sequence  $y_m$  results in [53, 130]

$$\begin{aligned}
\underline{\tilde{y}}_{*}(f) &= \underline{\tilde{x}}(f) * f_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - k f_{\text{sa}}) * f'_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - k f'_{\text{sa}}) \\
&= \underline{\tilde{x}}(f) * f_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - k f_{\text{sa}}) * \frac{f_{\text{sa}}}{U_{\downarrow}} \sum_{k=-\infty}^{\infty} \delta\left(f - k \frac{f_{\text{sa}}}{U_{\downarrow}}\right) \\
&= \frac{f_{\text{sa}}}{U_{\downarrow}} \sum_{m=-\infty}^{\infty} \underline{\tilde{x}} \left(f - m \frac{f_{\text{sa}}}{U_{\downarrow}}\right).
\end{aligned} \tag{B.6}$$

The signal spectra  $\tilde{x}(f)$  are now centered around multiples of the resulting sampling frequency  $f'_{\rm sa} = f_{\rm sa}/U_{\downarrow}$ . To avoid an overlap of the spectra, an anti-aliasing lowpass filter with a cut-off frequency  $f_{\rm c} \leq f_{\rm sa}/(2U_{\downarrow})$  has to be applied before the downsampling.
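The two-step decimation process can be sketched as follows. The filter design (a Hamming-windowed sinc lowpass with 101 taps), the cut-off margin, and all function names are assumptions chosen for illustration; they are not the filters used elsewhere in this thesis. The example compares a tone above the new Nyquist frequency with and without the anti-aliasing filter.

```python
import math


def lowpass_taps(num_taps, f_c):
    """Hamming-windowed sinc lowpass; f_c is the cut-off normalized to the sampling rate."""
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - center
        ideal = 2 * f_c if t == 0 else math.sin(2 * math.pi * f_c * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [c / gain for c in taps]  # unit gain at f = 0


def fir_filter(x, h):
    """Valid part of the convolution of x with h (edge transients are dropped)."""
    n_h = len(h)
    return [sum(h[n] * x[k - n] for n in range(n_h)) for k in range(n_h - 1, len(x))]


def decimate(x, u_down, num_taps=101):
    """Anti-aliasing lowpass followed by retaining every u_down-th sample."""
    # Cut-off slightly below f_sa / (2 U) to leave room for the transition band.
    f_c = 0.8 / (2 * u_down)
    return fir_filter(x, lowpass_taps(num_taps, f_c))[::u_down]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


u_down = 4
# Tone at 0.3 f_sa, which lies above the new Nyquist frequency f_sa / (2 U) = 0.125 f_sa.
x = [math.cos(2 * math.pi * 0.3 * k) for k in range(2000)]
rms_without_filter = rms(x[::u_down])       # tone aliases with full amplitude
rms_with_filter = rms(decimate(x, u_down))  # tone is suppressed before downsampling
print(rms_without_filter, rms_with_filter)
```

Without the lowpass filter, the tone reappears at $0.2\,f'_{\rm sa}$ with its full amplitude; with the filter, it is attenuated by the stopband of the assumed design before the sample rate is reduced.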

Increasing the sampling rate by an integer factor  $U_{\uparrow}$  is similar to the case of a downconversion. The discrete signal is upsampled by inserting  $U_{\uparrow} - 1$  zeros between the samples, and an interpolation lowpass filter is applied afterwards. The zero insertion corresponds to a convolution of eq. (B.4) with a second Dirac comb with a comb line spacing of  $f'_{sa} = U_{\uparrow} f_{sa}$ . Eq. (B.4) can be formulated for the upsampled sequence  $y_m$  as [53, 130]

$$\begin{aligned}
\underline{\tilde{y}}_{*}(f) &= \underline{\tilde{x}}(f) * f_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - kf_{\text{sa}}) * f'_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - kf'_{\text{sa}}) \\
&= \underline{\tilde{x}}(f) * f_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - kf_{\text{sa}}) * U_{\uparrow} f_{\text{sa}} \sum_{k=-\infty}^{\infty} \delta(f - kU_{\uparrow} f_{\text{sa}}) \\
&= f_{\text{sa}} \sum_{m=-\infty}^{\infty} \underline{\tilde{x}}(f - mf_{\text{sa}}).
\end{aligned} \tag{B.7}$$

Despite increasing the sampling rate to  $f_{\rm sa}'$ , the spectral replicas still repeat periodically with  $f_{\rm sa}$ . As a result, the spectral images that are not located at multiples of the new sampling rate are redundant and appear within the first Nyquist zone of the new sampling rate. A lowpass filter with a cut-off frequency greater than the signal bandwidth $B$ but less than  $f_{\rm sa} - B$  and an amplification of  $U_{\uparrow}$  in the passband has to be applied. This filter is commonly referred to as an interpolation filter, since it fills in the amplitudes of the zero-inserted samples.
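Zero insertion followed by the interpolation filter can be sketched in the same way. The Hamming-windowed sinc design with 121 taps, the cut-off at $f_{\rm sa}/2$, and the function names are assumptions for illustration. The example upsamples a slowly varying tone by $U_{\uparrow} = 4$ and compares the result with the same tone sampled directly at the higher rate.

```python
import math


def interpolation_taps(num_taps, u_up):
    """Hamming-windowed sinc lowpass with cut-off f_sa/2 and passband gain u_up."""
    f_c = 1.0 / (2 * u_up)  # cut-off normalized to the new sampling rate f'_sa
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - center
        ideal = 2 * f_c if t == 0 else math.sin(2 * math.pi * f_c * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    scale = u_up / sum(taps)  # amplification of u_up in the passband
    return [c * scale for c in taps]


def upsample(x, u_up, num_taps=121):
    """Insert u_up - 1 zeros between the samples and apply the interpolation filter."""
    zero_stuffed = []
    for value in x:
        zero_stuffed.append(value)
        zero_stuffed.extend([0.0] * (u_up - 1))
    h = interpolation_taps(num_taps, u_up)
    # Valid part of the convolution (edge transients are dropped).
    return [
        sum(h[n] * zero_stuffed[k - n] for n in range(num_taps))
        for k in range(num_taps - 1, len(zero_stuffed))
    ]


u_up = 4
num_taps = 121
x = [math.cos(2 * math.pi * 0.05 * k) for k in range(200)]  # tone at 0.05 f_sa
y = upsample(x, u_up, num_taps)

# The first valid output sample corresponds to the high-rate time index (num_taps - 1) / 2.
delay = (num_taps - 1) // 2
reference = [math.cos(2 * math.pi * 0.05 * (j + delay) / u_up) for j in range(len(y))]
max_error = max(abs(a - b) for a, b in zip(y, reference))
print(max_error)
```

The remaining error is determined by the passband ripple and the finite stopband attenuation of the assumed filter; without the passband gain of $U_{\uparrow}$, the interpolated signal would be attenuated by the same factor.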

# **C** Joint Non-Data-Aided Clock Recovery for Space-Division Multiplexed Optical Transmission Systems

This chapter has been published as Appendix of [J2]. The material from the publication has been adapted to comply with the layout and the structure of this thesis.

## **C.1** Determinant Complexity Using the Modified Gram-Schmidt Factorization

The modified Gram-Schmidt factorization is a common way to implement the QR factorization on hardware [118, 119]. By QR factorizing the matrix  $\tilde{\mathbf{M}} = \mathbf{Q}\mathbf{R}$  into an orthonormal matrix  $\mathbf{Q}$  and an upper triangular matrix  $\mathbf{R}$ , the calculation of the determinant can be simplified, since  $\det(\tilde{\mathbf{M}}) = \det(\mathbf{Q}) \det(\mathbf{R}) = \pm 1 \prod_{i=1}^{D} r_{i,i}$ . Algorithm 1 shows the procedure to find  $\mathbf{Q}$  and  $\mathbf{R}$  with the required number of complex operations given as comments.

#### Algorithm 1: Gram-Schmidt Factorization

```
Input: M (D x D spectral correlation matrix)
Q = M
R = I
for i = 1 to D do                          // D times
    r[i,i] = sqrt(q[:,i]^H q[:,i])         // D multiplications, D-1 additions
    q[:,i] = q[:,i] / r[i,i]               // D divisions
    for j = i+1 to D do                    // D-i times
        r[i,j] = q[:,i]^H q[:,j]           // D multiplications, D-1 additions
        q[:,j] = q[:,j] - r[i,j] q[:,i]    // D multiplications, D additions
    end
end
```
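The procedure of Algorithm 1 can be sketched in plain Python as follows. The function names are hypothetical, the inner loop over the remaining columns follows the standard modified Gram-Schmidt scheme, and a full-rank input matrix is assumed so that no division by zero occurs. The sketch returns the product of the diagonal elements of $\mathbf{R}$, which equals the magnitude of the determinant.

```python
import math


def modified_gram_schmidt(matrix):
    """QR factorization of a full-rank square matrix given as a list of rows.

    Returns the columns of Q and the upper triangular matrix R (list of rows).
    """
    d = len(matrix)
    # q[i] holds the i-th column; it is initialized with the columns of the input matrix.
    q = [[complex(matrix[row][col]) for row in range(d)] for col in range(d)]
    r = [[0j] * d for _ in range(d)]
    for i in range(d):
        r_ii = math.sqrt(sum(abs(v) ** 2 for v in q[i]))
        r[i][i] = complex(r_ii)
        q[i] = [v / r_ii for v in q[i]]
        for j in range(i + 1, d):
            # Projection of column j onto the normalized column i.
            r_ij = sum(a.conjugate() * b for a, b in zip(q[i], q[j]))
            r[i][j] = r_ij
            q[j] = [b - r_ij * a for a, b in zip(q[i], q[j])]
    return q, r


def determinant_magnitude(matrix):
    """Product of the diagonal elements of R, which equals |det(matrix)|."""
    _, r = modified_gram_schmidt(matrix)
    return math.prod(r[i][i].real for i in range(len(matrix)))


example = [[1j, 2], [3, 4]]  # hypothetical test matrix with det = -6 + 4j
print(determinant_magnitude(example))
```

Since the diagonal elements of $\mathbf{R}$ are real and positive by construction, only the magnitude of the determinant is obtained from their product.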

With  $\sum_{k=1}^{D} k = D(D+1)/2$  we find the number of complex multiplications and additions required for the QR factorization to be

$$\begin{aligned}
C_{\text{det,comp.}\times} &= \sum_{k=1}^{D} \left[ 2D + 2D(D - k) \right] \\
&= D^{3} + D^{2}, \\
C_{\text{det,comp.}+} &= \sum_{k=1}^{D} \left[ D - 1 + (D - k)(2D - 1) \right] \\
&= D^{3} - \frac{1}{2}D^{2} - \frac{1}{2}D.
\end{aligned} \tag{C.1}$$

After QR factorization, the determinant is mainly given by the product of the diagonal elements in  $\mathbf{R}$ , which involves D-1 additional complex multiplications. This gives

$$\begin{aligned}
C_{\text{det},\times} &= 3D^3 + 3D^2 + 3D - 3, \\
C_{\text{det},+} &= 5D^3 + 5D^2 + 2D - 3
\end{aligned} \tag{C.2}$$

real-valued multiplications and additions and finally leads to eq. (7.23) considering the multiplications realized using the shift-add algorithm.
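The closed-form expressions for the complex operations in eq. (C.1) can be cross-checked by counting the operations loop by loop. The sketch below assumes the per-step counts implied by the summands of eq. (C.1): an inner product of two vectors of length $D$ takes $D$ multiplications and $D-1$ additions, the $D$ divisions are counted as multiplications, and the column update takes $D$ multiplications and $D$ additions. The function names are hypothetical.

```python
def count_complex_operations(d):
    """Count complex multiplications and additions of the factorization step by step."""
    mults = 0
    adds = 0
    for i in range(1, d + 1):
        mults += d       # inner product for r_ii
        adds += d - 1
        mults += d       # D divisions, counted as multiplications (assumption)
        for _ in range(i + 1, d + 1):
            mults += d   # inner product for r_ij
            adds += d - 1
            mults += d   # scaling in the update of column j
            adds += d    # subtraction in the update of column j
    return mults, adds


def closed_form(d):
    """Closed-form expressions of eq. (C.1)."""
    return d**3 + d**2, (2 * d**3 - d**2 - d) // 2


for d in (2, 4, 8):
    print(d, count_complex_operations(d), closed_form(d))
```

Adding the $D-1$ complex multiplications for the product of the diagonal elements gives $D^3 + D^2 + D - 1$ complex multiplications in total for the determinant.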

# Glossary

#### List of Abbreviations

ADC Analog-to-digital converter

AI Artificial intelligence

ASE Amplified spontaneous emission

ASIC Application-specific integrated circuit

AWG Arbitrary waveform generator

AWGN Additive white Gaussian noise

Btb Back-to-back
BER Bit-error ratio

BPD Balanced photodetector
CD Chromatic dispersion
CDR Clock and data recovery
CFO Clock frequency offset

CMOS Complementary metal-oxide semiconductor

CP Charge pump

CSNR Constellation signal-to-noise ratio

CSNDR Constellation signal-to-noise-and-distortion ratio
CV-QKD Continuous-variable quantum key distribution

DAC Digital-to-analog converter

DC Datacenter

DCI Datacenter interconnect

DD Decision-directed

DFB Distributed-feedback

DFT Discrete Fourier transform

DGD Differential group delay

DIT Decimation in time

DP-IQM Dual-polarization inphase and quadrature modulator

DP-OH Dual-polarization optical hybrid

DSP Digital signal processing

DTFT Discrete-time Fourier transform

DV-QKD Discrete-variable quantum key distribution

EAM Electro-absorption modulator

EB Elastic buffer

ECL External-cavity laser

EDFA Erbium-doped fiber amplifier

ER Extinction ratio

FB Feedback

FD Frequency domain

FEC Forward error correction

FF Feedforward

FFT Fast Fourier Transform

FIFO First-in first-out

FIR Finite impulse response

FPGA Field-programmable gate array

FTN Faster-than-Nyquist

IIR Infinite impulse response

IM/DD Intensity modulation and direct detection

IoT Internet of things

IQM Inphase and quadrature modulator

ISI Intersymbol interference

LDPC Low-density parity-check code

LF Loop filter

LMS Least-mean-square
LO Local oscillator
LPF Lowpass filter

LSB Least-significant bit
LTI Linear time-invariant

LUT Look-up table

MA Moving average

MAC Multiply-and-add

MCF Multi-core fiber

MD Modal dispersion

MDL Mode-dependent loss

MIMO Multiple-input multiple-output

ML Machine learning

MMF Multi-mode fiber

MMSE Minimum-mean-squared error

MSB Most-significant bit

MSE Mean-squared error

MZM Mach-Zehnder modulator

NDA Non-data-aided

NCO Numerical-controlled oscillator

NRZ Non-return-to-zero
OOK On-off keying

ONU Optical network unit
OLT Optical line terminal

PAM2 Two-level pulse amplitude modulation
PAM4 Four-level pulse amplitude modulation
PAM8 Eight-level pulse amplitude modulation

PD Phase detector

PF Power fading

PI Proportional-integral

PIN Positive intrinsic negative

PLL Phase-locked loop

PMD Polarization mode dispersion

PON Passive optical network

ppm Parts-per-million

PRBS Pseudo-random binary sequence

PSD Power spectral density
QKD Quantum key distribution

16-QAM 16-level quadrature amplitude modulation

RAM Random-access memory

RC Raised-cosine

RC-4CF Randomly-coupled 4-core fiber

RC-MCF Randomly-coupled multi-core fiber

RF Radio frequency
RMS Root-mean-square
RRC Root-raised-cosine

Rx Receiver

SDM Spatial-division multiplexing

SMF Single-mode fiber

SNDR Signal-to-noise-and-distortion ratio

SNR Signal-to-noise ratio

SOA Semiconductor optical amplifier

SoC System-on-a-chip
TD Time domain

TDM Time-division multiplexing

TDMA Time-division multiple access

TE Timing estimator

TED Timing error detector

TIA Transimpedance amplifier

VCO Voltage-controlled oscillator

VOA Variable optical attenuator

WDM Wavelength-division multiplexing

WSS Wavelength-selective switch

## **List of Mathematical Symbols**

#### **Uppercase Latin Symbols**

J Jitter (VCO or detector)

B Single-sided signal bandwidth

 $B_{\rm L}$  Control loop bandwidth

 ${\cal C}$  Complexity specified in number of real-valued additions

D, K Number of coupled/uncoupled channels

Propagation matrix

$D_{\rm cd}$  Chromatic dispersion coefficient

$D_{\rm L}$  Feedback delay in a control loop

$E_{\mathrm{tx}}$  Transmitted optical field

$\mathbf{I}$  Identity matrix

L Fiber length

$\tilde{\mathbf{M}}$  Spectral correlation matrix

M Polynomial degree

$M_{\rm B}$  Number of symbols

N Number of elements (FFT or algorithm)

 $N_{\rm bit}$  Bit width

 $N_{\rm L}$  Number of discrete time delays

 $N_{\rm o}$  Number of samples that overlap for timing acquisition

 $N_{\mathrm{tap}}$  Number of filter taps

P Number of parallel samples processed in DSP / downconversion factor from VCO frequency to lower frequency

 $P_0$  Mean optical power

 $P_{\rm tx}, P_{\rm rx}$  Transmitted/received optical power

 $R_a$  Autocorrelation function of a random process a

$S_a$  PSD of the autocorrelation function of a random process a

$T_{\rm sa}$  Sampling period

$T_{\rm sym}$  Symbol period

 $U_{\uparrow}, U_{\downarrow}$  Integer up-/downsampling factor

 $\tilde{\mathbf{U}}, \tilde{\mathbf{V}}$  Unitary coordinate transformation matrices

V(t) VCO voltage

#### **Lowercase Latin Symbols**

a Zero-mean, cyclostationary random process

 $a(t), a_m$  Continuous-/discrete-time symbol

 $a_{\mathrm{mod}}$  Sinusoidal phase modulation amplitude

$a_n$  Noise amplitude

a,b In appendix: coefficients

c Constant coefficients

$c_{\rm d}$  Loop filter timing error detector sensitivity

$c_{\rm i}, c_{\rm p}$  Loop filter integral/proportional coefficient

f Frequency

$f_0$  Center frequency

$f_{3dB}$  3-dB bandwidth

$f_c$  Cutoff frequency

 $f_{lo}$  Local oscillator frequency

 $f_m$  Frequency nulls caused by power fading

 $f_{\rm max}$  Maximum frequency

$f_{\rm mod}$  Sinusoidal phase modulation frequency

$f_{\rm sa}$  Sampling rate

$f_{\rm sym}$  Symbol rate

$\Delta f$  Frequency difference

h, c, p, g System responses

 $h_{\rm cd}$  Chromatic dispersion response

 $h_{\rm pf}$  Power fading function

k Discrete-time element index

 $k_0$  Discrete-time delay

l Lagrange interpolator coefficients

m Integer sampling delay

n Additive white Gaussian noise

n In subscript: frequency bin index

$r^{(d)}(t)$  d-th time-continuous, received signal

s Upsampled symbol sequence

$s_{\rm tx}$  Electrical transmit signal

x, y, z Signals

## **Greek Symbols**

 $\alpha$  Power series coefficient

 $\alpha_{\rm c}$  Number of additions per variable-fixed-value multiplication

 $\beta_{\rm c}$  Number of additions per variable-variable-value multiplication

 $\varepsilon$  Timing error

$\epsilon$  Nonlinear interference terms

$\zeta_{\rm L}$  Control loop damping factor

 $\eta_{\rm os}$  Oversampling ratio

 $\vartheta$  Speed of light

 $\lambda$  Wavelength

$\Lambda$  Channel group delay matrix

$\mu$  Fractional sampling delay

 $\mu_{\rm eq}$  Equalizer step size

 $\nu$  Number of frequency pairs

 $\rho$  Roll-off factor

 $\sigma$  Standard deviation

$\tau$  Overall sampling offset

$\hat{\tau}$  Estimated overall sampling offset

$\tau_{\rm g}$  Group delay

$\tau_{\rm tx,rx}(t)$  Sampling phase offset between transmitter and receiver clock

$\varphi$  Spectral phase

$\varphi_n$  Phase noise

 $\rho, \gamma, \xi, \zeta$  Auxiliary variables/signals

 $\omega,\Omega$  Angular frequency/Normalized angular frequency

 $\omega_{\rm n}$  Control loop natural frequency

### **Signal Annotation and Accents**

 $x_k$  Real value in time domain

 $\underline{x}_k$  Complex value in time domain

 $\tilde{x}_n$  Complex value in frequency domain

 $\overline{x}$  Average of x

$\langle x \rangle$  Ensemble average of x

$\hat{x}$  Symbol decision of x

$\mathbf{x}$  Vector

$\mathbf{X}$  Matrix

#### **Mathematical Symbols**

 $\mathcal{CN}$  Complex-valued normal distribution

e Euler's number, 2.71828 ···

 $E\{\cdot\}$  Expected value

 $\mathcal{F}\{\cdot\}$  Fourier transform

$\mathcal{I}\{\cdot\}$  Imaginary part

j Imaginary unit

$\mathcal{L}, \mathcal{L}_{\mathrm{dB}}$  Carrier-to-modulation ratio within a bandwidth of 1 Hz

 $\mathcal{N}$  Normal distribution

 $\mathcal{O}(\cdot)$  Big-O notation

 $\mathcal{R}\{\cdot\}$  Real part

 $\mathcal{S}\{\cdot\}$  Time-discrete system operator

#### **Mathematical Functions**

| Function                         | Description                                                             |
|----------------------------------|-------------------------------------------------------------------------|
| $\delta(\cdot)$                  | Dirac delta function                                                    |
| $\arg\{\cdot\}$                  | Phase from the interval $-\pi$ to $\pi$ of a complex value              |
| $\cos(\cdot)$                    | Cosine function                                                         |
| $\det(\cdot)$                    | Determinant of a matrix                                                 |
| $\exp(\cdot)$                    | Natural exponential function                                            |
| $\log_x(\cdot)$                  | Logarithm with base $x$                                                 |
| $\max(\cdot)$                    | Maximum element of a vector                                             |
| $\min(\cdot)$                    | Minimum element of a vector                                             |
| $\operatorname{mod}_1\{\cdot\}$  | Modulo-1 operation                                                      |
| $\mathrm{sgn}(\cdot)$            | Sign function                                                           |
| $\sin(\cdot)$                    | Sine function                                                           |
| $\mathrm{sinc}(\cdot)$           | Normalized sinc function $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$ |
| $\mathrm{std}(\cdot)$            | Standard deviation                                                      |
| $\mathrm{tr}(\cdot)$             | Trace of a matrix                                                       |
| $\operatorname{unwrap}\{\cdot\}$ | Function to unwrap jumps between consecutive samples                    |
| $\mathrm{var}(\cdot)$            | Variance                                                                |
| $\lfloor \cdot \rfloor$          | Round to next lower integer                                             |

# **Bibliography**

- [1] L. Huang, D. Wang, A. P. T. Lau, C. Lu, and S. He, "Performance analysis of blind timing phase estimators for digital coherent receivers", *Opt. Express* **22**(6), pp. 6749–6763 (Mar. 2014). DOI: 10.1364/OE.22.006749.
- [2] D. Schmidt and B. Lankl, "Parallel architecture of an all digital timing recovery scheme for high speed receivers", in *Int. Symp. on Commun. Syst., Netw. & Digital Signal Proc.*, (Jul. 2010), pp. 31–34. DOI: 10.1109/CSNDSP16145.2010.5580466.
- [3] D. Mello and F. Barbosa, *Digital Coherent Optical Systems: Architecture and Algorithms*. Springer International Publishing, Jan. 2021, ISBN: 978-3-030-66540-1. DOI: 10.1007/978-3-030-66541-8.
- [4] H. Sun and K.-T. Wu, "Timing synchronization in coherent optical transmission systems", in *Enabling Technologies for High Spectral-Efficiency Coherent Optical Communication Networks*. John Wiley & Sons, Ltd, 2016, ch. 10, pp. 355–394, ISBN: 9781119078289. DOI: 10.1002/9781119078289.ch10.
- [5] Optical Internetworking Forum (OIF), "OIF-400ZR-02.0: Implementation Agreement 400ZR", Tech. Rep., Nov. 2022.
- [6] Multi-Source Agreement, "Open ZR+ MSA", Tech. Rep., Sep. 2023.
- [7] M. Kuschnerov, F. N. Hauske, K. Piyawanno, B. Spinnler, M. S. Alfiad, A. Napoli, and B. Lankl, "DSP for coherent single-carrier receivers", *J. Lightw. Technol.* **27**(16), pp. 3614–3622 (Aug. 2009). DOI: 10.1109/JLT.2009.2024963.

- [8] N. Kaneda, D. van Veen, A. Mahadevan, and V. Houtsma, "DSP for 50G/100G hybrid modulated TDM-PON", in *Proc. Eur. Conf. Opt. Commun.*, (Dec. 2020), pp. 1–4. DOI: 10.1109/ECOC48923.2020.9333248.
- [9] D. van Veen and V. Houtsma, "Real-time validation of downstream 50G/25G and 50G/100G flexible rate PON based on Miller encoding, NRZ, and PAM4 modulation", *J. Opt. Commun. Netw.* **15**(8), pp. C147–C154 (Aug. 2023). DOI: 10.1364/JOCN.483159.
- [10] N. Iiyama, M. Fujiwara, T. Kanai, H. Suzuki, J.-i. Kani, and J. Terada, "Clock conversion for burst-mode digital coherent QPSK receivers in a PON upstream transmission with a 100-ppm clock mismatch", *Opt. Express* **29**(2), pp. 1265–1274 (Jan. 2021). DOI: 10.1364/OE.410522.
- [11] M. Verbeke, P. Rombouts, H. Ramon, J. Verbist, J. Bauwelinck, X. Yin, and G. Torfs, "A 25 Gb/s all-digital clock and data recovery circuit for burst-mode applications in PONs", *J. Lightw. Technol.* **36**(8), pp. 1503–1509 (Apr. 2018). DOI: 10.1109/JLT.2017.2784848.
- [12] K. Clark, D. Cletheroe, T. Gerard, I. Haller, K. Jozwik, K. Shi, B. Thomsen, H. Williams, G. Zervas, H. Ballani, P. Bayvel, P. Costa, and Z. Liu, "Synchronous subnanosecond clock and data recovery for optically switched data centres using clock phase caching", *Nat. Electron.* **3** (Jul. 2020). DOI: 10.1038/s41928-020-0423-y.
- [13] C. Valjus and R. Wolf, "Comparison of timing recovery algorithms for optical feeder links", in *Photonic Networks, ITG Symposium*, (May 2023), pp. 1–5.
- [14] C. Valjus, R. Wolf, and J. Poliak, "Review and analysis of digital signal processing algorithms for coherent optical satellite links", *Int. J. Satell. Commun. Netw.* (Feb. 2025). DOI: 10.1002/sat.1553.
- [15] C. R. S. Fludger, T. Duthel, P. Hermann, and T. Kupfer, "Jitter tolerant clock recovery for coherent optical receivers", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2013), pp. 1–3. DOI: 10.1364/OFC.2013.OTh1F.3.

- [16] C. H. Bennett and G. Brassard, "Quantum cryptography: Public key distribution and coin tossing", *Theor. Comput. Sci.* **560**, pp. 7–11 (Dec. 2014), ISSN: 0304-3975. DOI: 10.1016/j.tcs.2014.05.025.
- [17] F. Grosshans and P. Grangier, "Continuous variable quantum cryptography using coherent states", *Phys. Rev. Lett.* **88**(5), p. 057902 (Jan. 2002). DOI: 10.1103/PhysRevLett.88.057902.
- [18] S. Kleis, M. Rueckmann, and C. G. Schaeffer, *Continuous-variable quantum key distribution with a real local oscillator and without auxiliary signals*, 2019. arXiv: 1908.03625 [quant-ph].
- [19] K. Kouznetsov and R. Meyer, "Phase noise in LC oscillators", *IEEE J. Solid-State Circuits* **35**(8), pp. 1244–1248 (Aug. 2000). DOI: 10.1109/4.859518.
- [20] T. Lee and A. Hajimiri, "Oscillator phase noise: A tutorial", *IEEE J. Solid-State Circuits* **35**(3), pp. 326–336 (Mar. 2000). DOI: 10.1109/4.826814.
- [21] C. McNeilage, E. Ivanov, P. Stockwell, and J. Searls, "Review of feedback and feedforward noise reduction techniques", in *Proc. IEEE Int. Freq. Control Symp.*, (May 1998), pp. 146–155. DOI: 10.1109/FREQ.1998.717897.
- [22] M. Aqamolaei and M. M. Ahmadi, "Modelling an oscillator phase noise whose spectral density contains both $1/f^2$ and $1/f^3$ regions", *Electron. Lett.* **58**(4), pp. 139–141 (Dec. 2021). DOI: 10.1049/ell2.12393.
- [23] A. K. Poddar, U. L. Rohde, and A. M. Apte, "How low can they go?: Oscillator phase noise model, theoretical, experimental validation, and phase noise measurements", *IEEE Microw. Mag.* **14**(6), pp. 50–72 (Aug. 2013). DOI: 10.1109/MMM.2013.2269859.
- [24] International Telecommunication Union Telecommunication Standardization Sector (ITU-T), "ITU-T G.783: Characteristics of synchronous digital hierarchy (SDH) equipment functional blocks", Tech. Rep., Mar. 2006.

- [25] International Telecommunication Union Telecommunication Standardization Sector (ITU-T), "ITU-T G.825: The control of jitter and wander within digital networks which are based on the synchronous digital hierarchy (SDH)", Tech. Rep., Mar. 1999.
- [26] International Telecommunication Union Telecommunication Standardization Sector (ITU-T), "ITU-T G.8251: The control of jitter and wander within the optical transport network (OTN)", Tech. Rep., Nov. 2022.
- [27] X. Zhou and C. Xie, *Enabling Technologies for High Spectral-Efficiency Coherent Optical Communication Networks*, English. Hoboken, NJ, USA: John Wiley & Sons, 2016, ISBN: 9781118714768.
- [28] S. Ahmed and T. Kwasniewski, "Overview of oversampling clock and data recovery circuits", in *Canadian Conf. on Electrical and Computer Engineering*, (May 2005), pp. 1876–1881. DOI: 10.1109/CCECE.2005.1557348.
- [29] M.-T. Hsieh and G. E. Sobelman, "Architectures for multi-gigabit wire-linked clock and data recovery", *IEEE Circuits Syst. Mag.* **8**(4), pp. 45–57 (Dec. 2008). DOI: 10.1109/MCAS.2008.930152.
- [30] M. Verbeke, "Low-power subsampling all-digital clock and data recovery techniques for multi-gigabit passive optical networks", Ph.D. dissertation, Ghent University, Jan. 2018, ISBN: 9789463550888.
- [31] X. Zhou, "Efficient clock and carrier recovery algorithms for single-carrier coherent optical systems: A systematic review on challenges and recent progress", *IEEE Signal Process. Mag.* **31**(2), pp. 35–45 (Feb. 2014). DOI: 10.1109/MSP.2013.2281071.
- [32] K.-T. Wu and H. Sun, "Frequency-domain clock phase detector for Nyquist WDM systems", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2014), pp. 1–3. DOI: 10.1364/OFC.2014.Th3E.2.

- [33] F. N. Hauske, N. Stojanovic, C. Xie, and M. Chen, "Impact of optical channel distortions to digital timing recovery in digital coherent transmission systems", in *Int. Conf. Transparent Opt. Netw.*, (Jun. 2010), pp. 1–4. DOI: 10.1109/ICTON.2010.5549082.
- [34] N. Stojanovic, F. N. Hauske, C. Xie, and M. Chen, "Clock recovery in coherent optical receivers", in *Photonic Networks, ITG Symposium*, (May 2011), pp. 1–4.
- [35] A. Josten, B. Baeuerle, E. Dornbierer, J. Boesser, D. Hillerkuss, and J. Leuthold, "Modified Godard timing recovery for non integer oversampling receivers", *Appl. Sci.* **7**(7) (Jun. 2017), ISSN: 2076-3417.
- [36] M. Oerder and H. Meyr, "Digital filter and square timing recovery", *IEEE Trans. Commun.* **36**(5), pp. 605–612 (May 1988). DOI: 10.1109/26.1476.
- [37] C. Lanczos and B. Gellai, "Fourier analysis of random sequences", *Computers & Mathematics with Applications* **1**(3), pp. 269–276 (1975), ISSN: 0898-1221. DOI: 10.1016/0898-1221(75)90025-5.
- [38] F. Puente León, H. Jäkel, and U. Kiencke, *Signale und Systeme* (De Gruyter Studium), ger, 6. edition. Berlin and Boston: De Gruyter Oldenbourg, 2015, 542 pp., ISBN: 978-3-11-040385-5.
- [39] W. Thomson, "Delay networks having maximally flat frequency characteristics", *Proc. Inst. Electr. Eng., Part 3* **96**(44), pp. 487–490 (Nov. 1949). DOI: 10.1049/pi-3.1949.0101.
- [40] S. Barton and Y. Al-Jalili, "A symbol timing recovery scheme based on spectral redundancy", in *IEE Colloquium on Advanced Modulation and Coding Techniques for Satellite Communications*, (Jan. 1992), pp. 3/1–3/6.
- [41] N. Kaneda, A. B. Leven, and S. Weisser, "Symbol timing recovery in polarization division multiplexed coherent optical transmission system", US 8,655,191 B2, Feb. 2014.

- [42] D. Godard, "Passband timing recovery in an all-digital modem receiver", *IEEE Trans. Commun.* **26**(5), pp. 517–523 (May 1978). DOI: 10.1109/TCOM.1978.1094107.
- [43] F. Gardner, "A BPSK/QPSK timing-error detector for sampled receivers", *IEEE Trans. Commun.* **34**(5), pp. 423–429 (May 1986). DOI: 10.1109/TCOM.1986.1096561.
- [44] W.-P. Zhu, Y. Yan, M. Ahmad, and M. Swamy, "Feedforward symbol timing recovery technique using two samples per symbol", *IEEE Trans. Circuits Syst. I* **52**(11), pp. 2490–2500 (Nov. 2005). DOI: 10.1109/TCSI.2005.853902.
- [45] W. Lindsey and M. Simon, *Telecommunication Systems Engineering* (Dover Books on Electrical Engineering). Dover Publications, 1991, ISBN: 9780486668383.
- [46] M. Rice, *Digital communications: A discrete-time approach* (Pearson international edition), eng. Upper Saddle River, New Jersey: Pearson/Prentice Hall Pearson Education International, 2009, 778 pp., ISBN: 0138138222.
- [47] K. Müller and M. Müller, "Timing recovery in digital synchronous data receivers", *IEEE Trans. Commun.* **24**(5), pp. 516–531 (May 1976), ISSN: 0090-6778. DOI: 10.1109/TCOM.1976.1093326.
- [48] F. M. Gardner, *Demodulator reference recovery techniques suited for digital implementation*, Final Report, European Space Agency, Aug. 1988.
- [49] R. E. Crochiere and L. R. Rabiner, *Multirate Processing of Digital Signals*, 4. edition. Prentice-Hall, 1983, ISBN: 0-13-605162-6.
- [50] C. W. Farrow, "A continuously variable digital delay element", in *Int. Symposium on Circuits and Syst.*, (Jun. 1988), pp. 2641–2645. DOI: 10.1109/ISCAS.1988.15483.
- [51] R. W. Schafer and L. R. Rabiner, "A digital signal processing approach to interpolation", *Proc. IEEE* **61**(6), pp. 692–702 (Jun. 1973), ISSN: 0018-9219. DOI: 10.1109/PROC.1973.9150.

- [52] U. Mengali and A. N. D'Andrea, *Synchronization techniques for digital receivers* (Applications of communications theory), eng. New York: Plenum Press, 1997, 520 pp., ISBN: 0-306-45725-3.
- [53] H. Meyr, M. Moeneclaey, and S. A. Fechtel, *Digital communication receivers: Synchronization, channel estimation, and signal processing* (Wiley series in telecommunications and signal processing), eng. Hoboken, NJ: Wiley-Interscience, 2001, ISBN: 0-471-50275-8. DOI: 10.1002/0471200573.
- [54] F. M. Gardner, "Interpolation in digital modems. I. fundamentals", *IEEE Trans. Commun.* **41**(3), pp. 501–507 (Mar. 1993), ISSN: 0090-6778. DOI: 10.1109/26.221081.
- [55] J. Vesma, M. Renfors, and J. Rinne, "Comparison of efficient interpolation techniques for symbol timing recovery", in *Proc. Global Telecommun. Conf.*, (Nov. 1996), pp. 953–957, ISBN: 0-7803-3336-5. DOI: 10.1109/GLOCOM.1996.586008.
- [56] G. Oetken, "A new approach for the design of digital interpolating filters", *IEEE Trans. Acoust., Speech, Signal Process.* **27**(6), pp. 637–643 (Dec. 1979). DOI: 10.1109/TASSP.1979.1163316.
- [57] L. Erup, F. M. Gardner, and R. A. Harris, "Interpolation in digital modems. II. implementation and performance", *IEEE Trans. Commun.* **41**(6), pp. 998–1008 (Jun. 1993), ISSN: 0090-6778. DOI: 10.1109/26.231921.
- [58] F. Puente León, *Messtechnik: Systemtheorie für Ingenieure und Informatiker*, ger, 10. edition. Berlin and Heidelberg: Springer Vieweg, 2015, ISBN: 978-3-662-44820-5. DOI: 10.1007/978-3-662-44821-2.
- [59] F. Gardner, *Phaselock techniques*, 3rd ed. John Wiley & Sons, Inc., 2005.

- [60] D. Zibar, A. Bianciotto, Z. Wang, A. Napoli, and B. Spinnler, "Analysis and dimensioning of fully digital clock recovery for 112 Gb/s coherent polmux QPSK systems", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2009), pp. 1–2.
- [61] L. Barletta, M. Magarini, F. Scardoni, and A. Spalvieri, "Impact of loop delay on the performance of Gardner timing recovery", *IEEE Photon. Technol. Lett.* **25**(18), pp. 1797–1800 (Sep. 2013). DOI: 10.1109/LPT.2013.2276412.
- [62] J. Wilson, A. Nelson, and B. Farhang-Boroujeny, "Parameter derivation of type-2 discrete-time phase-locked loops containing feedback delays", *IEEE Trans. Circuits Syst. II* **56**(12), pp. 886–890 (Dec. 2009). DOI: 10.1109/TCSII.2009.2034197.
- [63] F. Scardoni, M. Magarini, and A. Spalvieri, "Impact of self noise on tracking performance of non-data-aided digital timing recovery", *J. Lightw. Technol.* **33**(18), pp. 3755–3762 (Jul. 2015). DOI: 10.1109/JLT.2015.2452313.
- [64] N. Stojanovic, Y. Zhao, and C. Xie, "Feed-forward and feedback timing recovery for Nyquist and faster than Nyquist systems", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2014), pp. 1–3. DOI: 10.1364/OFC.2014.Th3E.3.
- [65] **P. Matalla**, J. Dittmer, M. S. Mahmud, C. Koos, and S. Randel, *Elastic buffer design for real-time all-digital clock recovery enabling free-running receiver clock with negative and positive clock frequency offsets*, 2025. arXiv: 2507.13748 [eess.SP]. [Online]. Available: https://arxiv.org/abs/2507.13748.
- [66] P. Maniotis and D. M. Kuchta, "Exploring the benefits of using co-packaged optics in data center and AI supercomputer networks: A simulation-based analysis [invited]", *J. Opt. Commun. Netw.* **16**(2), pp. A143–A156 (Feb. 2024). DOI: 10.1364/JOCN.501427.

- [67] S. J. B. Yoo, "New trends in photonic switching and optical networking architectures for data centers and computing systems [invited]", *J. Opt. Commun. Netw.* **15**(8), pp. C288–C298 (Aug. 2023). DOI: 10.1364/JOCN.484577.
- [68] S. T. Le, G. De Valicourt, P. Pupalaikis, R. Giles, M. Lamponi, L. Elsinger, S. Liu, B. Sawyer, J. Proesel, E. Ho, K. Liu, G. Homsey, J. Lopez, Z. Zhu, S. Corteselli, L. Alloin, C. Daunt, M. Ferriss, B. Rahmani, F. Warning, A. Bruno, S. Abbaslou, M. Zaman, Z. Pan, G. Fischer, P. Haigh, G. Reichert, A. Gazman, F. Fesharaki, and P. Winzer, "1.6-Tbps low-power linear-drive high-density optical interface (HDI/O) for ML/AI", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2024), pp. 1–3.
- [69] M. Jacques, Z. Xing, A. Samani, X. Li, E. El-Fiky, S. Alam, O. Carpentier, P.-C. Koh, and D. V. Plant, "Net 212.5 Gbit/s transmission in o-band with a SiP MZM, one driver and linear equalization", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2020), pp. 1–3.
- [70] X. Pang, T. Salgals, H. Louchet, D. Che, M. Gruen, Y. Matsui, T. Dippon, R. Schatz, M. Joharifar, B. Krüger, F. Pittala, Y. Fan, A. Udalcovs, L. Zhang, X. Yu, S. Spolitis, V. Bobrovs, S. Popov, and O. Ozolins, "200 Gb/s optical-amplifier-free IM/DD transmissions using a directly modulated Oband DFB+R laser targeting LR applications", *J. Lightw. Technol.* 41(11), pp. 3635–3641 (Jun. 2023). DOI: 10.1109/JLT.2023.3261421.
- [71] X. Zhou, C. F. Lam, R. Urata, and H. Liu, "State-of-the-art 800G/1.6T datacom interconnects and outlook for 3.2T", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2023), pp. 1–3. DOI: 10.1364/OFC.2023.W3D.1.
- [72] P. Zhu, Y. Yoshida, K. Akahane, and K.-i. Kitayama, "High-speed reach-extended IM-DD system with low-complexity DSP for 6G fronthaul", *J. Opt. Commun. Netw.* **16**(1), A24–A32 (Jan. 2024).
- [73] International Telecommunication Union Telecommunication Standardization Sector (ITU-T), "ITU-T G.9804.3: 50-Gigabit-capable passive optical networks (50G-PON): Physical media dependent (PMD) layer specification", Tech. Rep., Mar. 2024.
- [74] R. Bonk and E. Harstead, "The road towards 100G and 200G-passive optical networks", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2024).
- [75] N. Stojanovic, F. Karinou, Z. Qiang, and C. Prodaniuc, "Volterra and Wiener equalizers for short-reach 100G PAM-4 applications", *J. Lightw. Technol.* **35**(21), pp. 4583–4594 (Nov. 2017).
- [76] D. Wang, M. Qiao, K. Lian, and Z. Li, "CD and PMD effect on cyclostationarity-based timing recovery for optical coherent receivers", *J. Lightw. Technol.* 41(8), pp. 2405–2412 (Apr. 2023), ISSN: 1558-2213. DOI: 10.1109/jlt.2023.3235048.
- [77] J. Wang and K. Petermann, "Small signal analysis for dispersive optical fiber communication systems", *J. Lightw. Technol.* **10**(1), pp. 96–100 (Jan. 1992). DOI: 10.1109/50.108743.
- [78] M. Chagnon, "Optical communications for short reach", *J. Lightw. Technol.* 37(8), pp. 1779–1797 (Apr. 2019). DOI: 10.1109/JLT.2019.2901201.
- [79] H. Meyr, M. Moeneclaey, and S. A. Fechtel, *Digital Communication Receivers: Synchronization, Channel Estimation and Signal Processing*, English. New York, NY, USA: John Wiley & Sons, 1998, ISBN: 0-471-20057-3.
- [80] S. J. Lee, "A new non-data-aided feedforward symbol timing estimator using two samples per symbol", *IEEE Commun. Lett.* **6**(5), pp. 205–207 (May 2002). DOI: 10.1109/4234.1001665.
- [81] M. Yan, Z. Tao, L. Dou, L. Li, Y. Zhao, T. Hoshida, and J. C. Rasmussen, "Digital clock recovery algorithm for Nyquist signal", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2013), pp. 1–3.

- [82] N. Stojanovic, B. Mao, and Y. Zhao, "Digital phase detector for Nyquist and faster than Nyquist systems", *IEEE Commun. Lett.* **18**(3), pp. 511–514 (Mar. 2014). DOI: 10.1109/LCOMM.2014.012314.132364.
- [83] Y. Gu, S. Cui, C. Ke, K. Zhou, and D. Liu, "All-digital timing recovery for free space optical communication signals with a large dynamic range and low OSNR", *IEEE Photonics J.* **11**(6), pp. 1–11 (2019). DOI: 10.1109/JPHOT.2019.2956086.
- [84] J. Li, N. Wang, J. Zhu, N. Zhang, L. Hu, M. Yu, H. Yuan, and B. Li, "First real-time symmetric 50G TDM-PON prototype with high bandwidth and low latency", in *Opto-Electron. and Commun. Conf.*, (Jul. 2023), pp. 1–4. DOI: 10.1109/OECC56963.2023.10209815.
- [85] J. Zhang, Z. Jia, M. Xu, H. Zhang, and L. Alberto Campos, "Efficient preamble design and digital signal processing in upstream burst-mode detection of 100G TDM coherent-PON", *J. Opt. Commun. Netw.* 13(2), A135–A143 (Feb. 2021). DOI: 10.1364/JOCN.402591.
- [86] R. Bonk, D. Geng, D. Khotimsky, D. Liu, X. Liu, Y. Luo, D. Nesset, V. Oksman, R. Strobel, W. Van Hoof, and J. S. Wey, "50G-PON: The first ITU-T higher-speed PON system", *IEEE Commun. Mag.* 60(3), pp. 48–54 (Mar. 2022). DOI: 10.1109/MCOM.001.2100441.
- [87] R. Bonk, E. Harstead, R. Borkowski, V. Houtsma, Y. Lefevre, A. Mahadevan, D. van Veen, M. Verplaetse, and S. Walklin, "Perspectives on and the road towards 100 Gb/s TDM PON with intensity-modulation and direct-detection", *J. Opt. Commun. Netw.* 15(8), pp. 518–526 (Aug. 2023). DOI: 10.1364/JOCN.489228.
- [88] International Telecommunication Union Telecommunication Standardization Sector (ITU-T), "ITU-T G.9804.2: Higher Speed Passive Optical Networks: Common Transmission Convergence Layer Specification", Tech. Rep., Feb. 2023.

- [89] R. Borkowski, Y. Lefevre, A. Mahadevan, D. van Veen, M. Straub, R. Kaptur, B. Czerwinski, B. Cornaglia, V. Houtsma, W. Coomans, R. Bonk, and J. Maes, "FLCS-PON—an opportunistic 100Gbit/s flexible PON prototype with probabilistic shaping and soft-input FEC: Operator trial and ODN case studies", *J. Opt. Commun. Netw.* 14(6), pp. C82–C91 (Jun. 2022). DOI: 10.1364/JOCN.452036.
- [90] European Commission. "Shaping Europe's digital future". (2023), [Online]. Available: https://digital-strategy.ec.europa.eu/en/policies/european-quantum-communication-infrastructure-euroqci (visited on 05/05/2023).
- [91] F. Laudenbach, C. Pacher, C.-H. F. Fung, A. Poppe, M. Peev, B. Schrenk, M. Hentschel, P. Walther, and H. Hubel, "Continuous-variable quantum key distribution with Gaussian modulation: The theory of practical implementations", *Adv. Quantum Technol.* 1(1), p. 1800011 (Jun. 2018). DOI: 10.1002/qute.201800011.
- [92] T. A. Eriksson, R. S. Luís, K. Gümüs, G. Rademacher, B. J. Puttnam, H. Furukawa, N. Wada, Y. Awaji, A. Alvarado, M. Sasaki, and M. Takeoka, "Digital self-coherent continuous variable quantum key distribution system", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2020).
- [93] H.-M. Chin, N. Jain, U. L. Andersen, D. Zibar, and T. Gehring, "Digital synchronization for continuous-variable quantum key distribution", *Quantum Sci. and Technol.* 7(4), p. 045 006 (Jul. 2022). DOI: 10.1088/2058-9565/ac7ba2.
- [94] B. J. Puttnam, G. Rademacher, and R. S. Luís, "Space-division multiplexing for optical fiber communications", *Optica* **8**(9), pp. 1186–1203 (Sep. 2021). DOI: 10.1364/OPTICA.427631.
- [95] G. Rademacher, B. Puttnam, R. Luís, T. Eriksson, N. Fontaine, M. Mazur, H. Chen, R. Ryf, D. Neilson, P. Sillard, F. Achten, Y. Awaji, and H. Furukawa, "Peta-bit-per-second optical communications system using a standard cladding diameter 15-mode fiber", *Nat. Commun.* **12**, p. 4238 (Jul. 2021). DOI: 10.1038/s41467-021-24409-w.
- [96] T. Matsui, P. L. Pondillo, and K. Nakajima, "Weakly coupled multicore fiber technology, deployment, and systems", *Proc. IEEE* **110**(11), pp. 1772–1785 (Nov. 2022). DOI: 10.1109/JPROC.2022.3202812.
- [97] P. J. Winzer and D. T. Neilson, "From scaling disparities to integrated parallelism: A decathlon for a decade", *J. Lightw. Technol.* 35(5), pp. 1099–1115 (Mar. 2017). DOI: 10.1109/JLT.2017.2662082.
- [98] T. Hayashi, T. Sakamoto, Y. Yamada, R. Ryf, R.-J. Essiambre, N. Fontaine, M. Mazur, H. Chen, and T. Hasegawa, "Randomly-coupled multi-core fiber technology", *Proc. IEEE* 110(11), pp. 1786–1803 (Nov. 2022). DOI: 10.1109/JPROC.2022.3182049.
- [99] S. Randel, R. Ryf, A. Sierra, P. J. Winzer, A. H. Gnauck, C. A. Bolle, R.-J. Essiambre, D. W. Peckham, A. McCurdy, and R. Lingle, "6×56-Gb/s mode-division multiplexed transmission over 33-km few-mode fiber enabled by 6×6 MIMO equalization", *Opt. Express* **19**(17), pp. 16 697–16 707 (Aug. 2011). DOI: 10.1364/OE.19.016697.
- [100] M. van den Hout, "Ultra-wideband and space-division multiplexed optical transmission systems", Ph.D. dissertation, Eindhoven University of Technology, Feb. 2024, ISBN: 978-90-386-5909-1. DOI: 10.6100/jnxx-6t19.
- [101] M. Mazur, R. Ryf, N. K. Fontaine, A. Marotta, E. Börjeson, L. Dallachiesa, H. Chen, T. Hayashi, T. Nagashima, T. Nakanishi, T. Morishima, F. Graziosi, L. Palmieri, D. T. Neilson, P. Larsson-Edefors, A. Mecozzi, and C. Antonelli, "Real-time MIMO transmission over field-deployed coupled-core multi-core fibers", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2022).
- [102] M. Mazur, L. Dallachiesa, N. K. Fontaine, R. Ryf, E. Börjeson, H. Chen, H. Sakuma, T. Ohtsuka, T. Hayashi, T. Hasegawa, H. Tazawa, D. T. Neilson, and P. Larsson-Edefors, "Real-time transmission over 2x55 km all 7-core coupled-core multi-core fiber link", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2022).
- [103] K. Kikuchi, "Clock recovering characteristics of adaptive finite-impulse-response filters in digital coherent optical receivers", *Opt. Express* **19**(6), pp. 5611–5619 (Mar. 2011). DOI: 10.1364/OE.19.005611.
- [104] M. Kuschnerov, F. Hauske, K. Piyawanno, B. Spinnler, E.-D. Schmidt, and B. Lankl, "Joint equalization and timing recovery for coherent fiber optic receivers", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2008), pp. 1–2.
- [105] J. C. M. Diniz, F. Da Ros, and D. Zibar, "Clock recovery challenges in DSP-based coherent single-mode and multi-mode optical systems", *Future Internet* 10(7) (Jun. 2018), ISSN: 1999-5903. DOI: 10.3390/fi10070059.
- [106] M. Qiao, Y. Wang, Z. Li, Y. Xiao, Y. Chen, Z. Li, and D. Wang, "Novel dispersion and timing estimation for weakly-coupled OAM fiber transmission systems", *IEEE Photon. Technol. Lett.* 36(14), pp. 913–916 (Jun. 2024). DOI: 10.1109/LPT.2024.3410348.
- [107] T. Hayashi, Y. Tamura, T. Hasegawa, and T. Taru, "Record-low spatial mode dispersion and ultra-low loss coupled multi-core fiber for ultra-long-haul transmission", *J. Lightw. Technol.* **35**(3), pp. 450–457 (Feb. 2017). DOI: 10.1109/JLT.2016.2614000.
- [108] S. J. Savory, "Digital coherent optical receivers: Algorithms and subsystems", *IEEE J. Sel. Topics Quantum Electron.* 16(5), pp. 1164–1179 (Nov. 2010). DOI: 10.1109/JSTQE.2010.2044751.
- [109] K.-P. Ho and J. M. Kahn, "Chapter 11 mode coupling and its impact on spatially multiplexed systems", in *Optical Fiber Telecommunications* (*Sixth Edition*), ser. Optics and Photonics, I. P. Kaminow, T. Li, and A. E. Willner, Eds., Sixth Edition, Boston: Academic Press, 2013, pp. 491–568. DOI: 10.1016/B978-0-12-396960-6.00011-0.

- [110] E. Deriushkina, J. Schröder, and M. Karlsson, "Dynamic model for coupled-core fibers", *J. Lightw. Technol.*, pp. 1–9 (Jul. 2024). DOI: 10.1109/JLT.2024.3430374.
- [111] J. Carpenter, B. J. Eggleton, and J. Schröder, "Comparison of principal modes and spatial eigenmodes in multimode optical fibre", *Laser Photonics Rev.* **11**(1), p. 1 600 259 (Dec. 2016).
- [112] M. Cappelletti, M. Mazur, N. K. Fontaine, R. Ryf, T. Hayashi, A. Mecozzi, M. Santagiustina, A. Galtarossa, C. Antonelli, and L. Palmieri, "Statistical analysis of modal dispersion in field-installed coupled-core fiber link", *J. Lightw. Technol.* 42(11), pp. 4103–4109 (Jun. 2024).
- [113] U. Meyer-Baese, *Digital Signal Processing with Field Programmable Gate Arrays*, 4th. Springer Publishing Company, Incorporated, 2014, ISBN: 9783642453083. DOI: 10.1007/978-3-642-45309-0.
- [114] R. Yavne, "An economical method for calculating the discrete Fourier transform", in *Fall Joint Computer Conference*, ser. AFIPS '68 (Fall, part I), San Francisco, California: Association for Computing Machinery, (Dec. 1968), pp. 115–125, ISBN: 9781450378994. DOI: 10.1145/1476589.1476610.
- [115] P. Duhamel and H. Hollmann, "'Split radix' FFT algorithm", *Electron. Lett.* **20**, pp. 14–16 (Jan. 1984). DOI: 10.1049/el:19840012.
- [116] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series", *Math. Comput.* **19**, pp. 297–301 (May 1965).
- [117] H. E. Rose, *Linear Algebra: A Pure Mathematical Approach*. Birkhäuser Verlag, 2002, ISBN: 3 7643 6905 1.
- [118] Å. Björck, "Numerics of Gram-Schmidt orthogonalization", *Linear Algebra and its Applications* 197-198, pp. 297–316 (Apr. 1994), ISSN: 0024-3795. DOI: 10.1016/0024-3795(94)90493-6.

- [119] G. Rünger and M. Schwind, "Comparison of different parallel modified Gram-Schmidt algorithms", in *Int. Eur. Conf. on Parallel and Distributed Computing*, Springer Berlin Heidelberg, (2005), pp. 826–836, ISBN: 978-3-540-31925-2.
- [120] S. Randel, P. J. Winzer, M. Montoliu, and R. Ryf, "Complexity analysis of adaptive frequency-domain equalization for MIMO-SDM transmission", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2013), pp. 1–3. DOI: 10.1049/cp.2013.1540.
- [121] J.-P. Elbers, C. Glingener, M. Duser, and E. Voges, "Modelling of polarisation mode dispersion in singlemode fibres", *Electron. Lett.* **33**, pp. 1894–1895 (Nov. 1997). DOI: 10.1049/el:19971297.
- [122] J. P. Gordon and H. Kogelnik, "PMD fundamentals: Polarization mode dispersion in optical fibers", *Proc. Natl. Acad. Sci.* 97(9), pp. 4541–4550 (Apr. 2000). DOI: 10.1073/pnas.97.9.4541.
- [123] H. Sun and K.-T. Wu, "Clock recovery and jitter sources in coherent transmission systems", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2012), pp. 1–3.
- [124] C. S. Martins, A. Lorences-Riesgo, S. Mumtaz, T. H. Nguyen, A. Hraghi, Z. Wu, Y. Frignac, G. Charlet, and Y. Zhao, "Frequency-band analysis of equalization enhanced phase noise jointly with DSP impact", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2024), pp. 1–3.
- [125] M. Qui, X. Tang, Y. Chen, J. He, and C. Li, "Mitigation of equalization enhanced phase noise using feedforward timing error correction", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2024).
- [126] G. Rademacher, M. van den Hout, R. S. Luís, B. J. Puttnam, G. Di Sciullo, T. Hayashi, A. Inoue, T. Nagashima, S. Gross, A. Ross-Adams, M. J. Withford, J. Sakaguchi, C. Antonelli, C. Okonkwo, and H. Furukawa, "Randomly coupled 19-core multi-core fiber with standard cladding diameter", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2023), pp. 1–3. DOI: 10.1364/OFC.2023.Th4A.4.

- [127] T. Xu, G. Jacobsen, S. Popov, M. Forzati, J. Mårtensson, M. Mussolin, J. Li, K. Wang, Y. Zhang, and A. T. Friberg, "Frequency-domain chromatic dispersion equalization using overlap-add methods in coherent optical system", *J. Opt. Commun.* **32**(2), pp. 131–135 (Jun. 2011). DOI: 10.1515/joc.2011.022.
- [128] D. A. Morero, M. A. Castrillon, F. A. Ramos, T. A. Goette, O. E. Agazzi, and M. R. Hueda, "Non-concatenated FEC codes for ultra-high speed optical transport networks", in *Global Telecommun. Conf.*, (Dec. 2011), pp. 1–5. DOI: 10.1109/GLOCOM.2011.6133616.
- [129] H. Li and L. Schmalen, "A spatially coupled LDPC coding scheme with scalable decoders for space division multiplexing", in *Proc. Eur. Conf. Opt. Commun.*, (Oct. 2023), pp. 1186–1189.
- [130] F. J. Harris, *Multirate Signal Processing for Communication Systems*. USA: Prentice Hall PTR, 2004, ISBN: 0131465112.

# **Acknowledgments**

This dissertation was written during my time at the Institute of Photonics and Quantum Electronics (IPQ) of the Karlsruhe Institute of Technology (KIT). The underlying scientific work was funded by several research projects of the German Federal Ministry of Education and Research (BMBF), including KIGLIS, AI-NET ANTILLAS, STARFALL, and Open6GHub. I would like to thank my doctoral supervisor Sebastian Randel for the opportunity to contribute to these projects and for the trust he placed in me.

During my time at IPQ, many people accompanied and supported me. My special thanks go to my current and former colleagues for the great collaboration and the pleasant working atmosphere. The many joint activities outside of day-to-day research, from countless coffee breaks and sociable evenings on the institute roof to the unforgettable trip to New York, were marked by numerous humorous moments and also by many a stimulating scientific discussion. Thank you for the wonderful time.

For numerous constructive brainstorming sessions, I especially thank the systems group, consisting of Adib Md Hossain, Salek Md Mahmud, Jonas Krimmer, and Joel Dittmer. Special thanks go to our former group leader and mentor Christoph Füllner (alias Obi-Wan), who, as a Jedi Master, introduced us to the world of research. Even after his time at the institute, "the Force" remained with us.

Finally, I would like to thank my friends and my family for their support and for the necessary balance alongside work. Special thanks go to my fiancée Chiara Windsor for her patience and her advice in challenging times. Without her, this work would hardly have been possible in its present form.

## **List of Publications**

### **Journal Publications**

- [J1] M. M. H. Adib<sup>†</sup>, P. Matalla<sup>†</sup>, C. Füllner, S. Li, E. Giacoumidis, C. Raack, U. Menne, M. Straub, T. Saier, C. Schweikert, S. Orf, M. Gontscharow, T. Käfer, M. Färber, A. Richter, R. Bonk, and S. Randel, "Optical-access networks for smart sustainable cities: From network architecture to fiber deployment", *J. Opt. Commun. Netw.* 17(3), pp. 221–232 (Mar. 2025).
  <sup>†</sup> authors contributed equally to the work.
  DOI: 10.1364/JOCN.542368.
- [J2] P. Matalla, J. Krimmer, L. Schmitz, D. Fang, C. Koos, and S. Randel, "Joint non-data-aided clock recovery for space-division multiplexed optical transmission systems", *J. Lightw. Technol.* 43(13), pp. 6128–6138 (Jul. 2025). DOI: 10.1109/JLT.2025.3546721.
- [J3] P. Matalla, C. Koos, and S. Randel, "Impact of chromatic dispersion on oversampled digital clock recovery in direct-detection systems: Analysis and solutions", *J. Lightw. Technol.*, pp. 1–10 (2025). DOI: 10.1109/JLT.2025.3600353.
- [J4] J. Dittmer, J. Tebart, P. Matalla, S. Wagner, A. Tessmann, A. Bhutani, C. Koos, A. Stöhr, and S. Randel, "Comparison of electronic and optoelectronic signal generation for (sub-)THz communications", *Int. J. Microw. Wirel. Technol.*, pp. 1–11 (Nov. 2024). DOI: 10.1017/S1759078724000667.

- [J5] J. Tebart, J. Dittmer, T. Haddad, P. Matalla, P. Lu, S. Randel, and A. Stöhr, "Point-to-multipoint beam-steering terahertz communications using a photonics-based leaky-wave transmit antenna", *Int. J. Microw. Wirel. Technol.*, pp. 1–9 (Nov. 2024). DOI: 10.1017/S1759078724000679.
- [J6] M. S. Mahmud, P. Matalla, J. Dittmer, C. Koos, and S. Randel, "Optic-electronic-optic (OEO) interferometer enabling coherent optical add-drop multiplexing", *Opt. Express* 33(4), pp. 6885–6893 (Feb. 2025). DOI: 10.1364/OE.532854.
- [J7] D. Fang, D. Drayss, H. Peng, G. Lihachev, C. Füllner, A. Kuzmin, P. Marin-Palomo, P. Matalla, P. Kharel, R. Wang, J. Riemensberger, M. Zhang, J. Witzens, J. C. Scheytt, W. Freude, S. Randel, T. J. Kippenberg, and C. Koos, "320 GHz photonic-electronic analogue-to-digital converter (ADC) exploiting Kerr soliton microcombs", *Light: Sci. Appl.* 14(241) (Jul. 2025). DOI: 10.1038/s41377-025-01778-1.

### **Conference Publications**

- [C1] **P. Matalla**, M. S. Mahmud, C. Füllner, C. Koos, W. Freude, and S. Randel, "Hardware comparison of feed-forward clock recovery algorithms for optical communications", in *Proc. Opt. Fiber Commun. Conf.*, (Jun. 2021), pp. 1–3.
- [C2] P. Matalla, M. S. Mahmud, C. Füllner, W. Freude, C. Koos, and S. Randel, "Real-time feedforward clock recovery for optical burst-mode transmission", in *Proc. Opt. Fiber Commun. Conf.*, (2022). DOI: 10.1364/OFC.2022.M2H.2.
- [C3] P. Matalla, M. S. Mahmud, C. Koos, and S. Randel, "Pilot-free digital clock synchronization for continuous-variable quantum key distribution", in *Proc. Eur. Conf. Opt. Commun.*, (Oct. 2023), pp. 1386–1389. DOI: 10.1049/icp.2023.2552.

- [C4] P. Matalla, C. Koos, and S. Randel, "Comparison of feedback and feedforward clock recoveries for ultra-fast synchronization in passive optical networks", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2024). DOI: 10.1364/OFC.2024.W2A.36.
- [C5] P. Matalla, L. Schmitz, J. Krimmer, D. Fang, C. Koos, and S. Randel, "Demonstration of joint blind clock recovery in a 1.92 Tbit/s transmission over 50 km randomly-coupled 4-core fiber", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2024).
- [C6] F. von Schoettler, E. Lyczkowski, Z. Hua, P. Matalla, and S. Randel, "Timing synchronization for smartphone-based optical camera communication", in *Eur. Conf. on Netw. and Commun. & 6G Summit*, (Jun. 2023), pp. 311–316. DOI: 10.1109/EuCNC/6GSummit58263.2023.10188265.
- [C7] J. Dittmer, P. Matalla, C. Fuellner, S. Wagner, A. Tessmann, C. Koos, and S. Randel, "Comparison of electronic and optoelectronic signal generation for wireless THz communications", in *Int. ITG Workshop on Smart Antennas and Conf. on Syst., Commun., and Coding*, (2023), pp. 1–6.
- [C8] L. Schmalen, V. Lauinger, J. Ney, N. Wehn, P. Matalla, S. Randel, A. von Bank, and E.-M. Edelmann, "Recent advances on machine learning-aided DSP for short-reach and long-haul optical communications", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2025).
- [C9] D. Fang, H. Peng, Y. Chen, J. Dittmer, A. Tessmann, S. Wagner, P. Matalla, D. Drayss, G. Lihachev, A. Voloshin, S. T. Skacel, M. Lauermann, I. Kallfass, T. Zwick, W. Freude, T. J. Kippenberg, S. Randel, and C. Koos, "Wireless THz communications at 250 Gbit/s using self-injection-locked Kerr soliton microcombs as photonic-electronic oscillators at the transmitter and receiver", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2024).
- [C10] **P. Matalla**, C. Koos, and S. Randel, "Chromatic-dispersion-tolerant digital clock recovery for IM/DD systems", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2025), *accepted at ECOC*.

- [C11] V. Lauinger, L. Schmitz, P. Matalla, A. Rode, S. Randel, and L. Schmalen, "Novel phase-noise-tolerant variational-autoencoder-based equalization suitable for space-division-multiplexed transmission", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2025), submitted to ECOC.
- [C12] R. Fischer, **P. Matalla**, S. Randel, and L. Schmalen, "Non-linear equalization in 112 Gb/s PONs using Kolmogorov-Arnold networks", in *Proc. Opt. Fiber Commun. Conf.*, (Mar. 2024).
- [C13] D. Bogdoll, P. Matalla, C. Füllner, C. Raack, S. Li, T. Käfer, S. Orf, M. R. Zofka, F. Sartoris, C. Schweikert, T. Pfeiffer, A. Richter, S. Randel, and R. Bonk, "KIGLIS: Smart networks for smart cities", in *Int. Smart Cities Conf.*, (Sep. 2021), pp. 1–4. DOI: 10.1109/ISC253183.2021.9562826.
- [C14] C. Shao, E. Giacoumidis, P. Matalla, J. Li, S. Li, S. Randel, A. Richter, M. Färber, and T. Käfer, "Advanced equalization in 112 Gb/s upstream PON using a novel Fourier convolution-based network", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2024).
- [C15] J. Ney, P. Matalla, V. Lauinger, L. Schmalen, S. Randel, and N. Wehn, "Real-time FPGA demonstrator of ANN-based equalization for optical communications", in *Int. Conf. on Machine Learning for Commun. and Netw.*, (May 2024).
- [C16] C. Füllner, D. Fang, P. Matalla, W. Freude, C. Koos, and S. Randel, "Ultra-broadband electrical signal generation and IM/DD transmission of QAM signals at symbol rates up to 90 GBd", in *Proc. Eur. Conf. Opt. Commun.*, (Sep. 2021), pp. 1–4. DOI: 10.1109/ECOC52684.2021.9605833.
- [C17] M. S. Mahmud, P. Matalla, M. M. H. Adib, C. Koos, and S. Randel, "Coherent add/drop multiplexing using an optic-electronic-optic interferometer", in *Conf. Lasers Electro-Opt.*, (May 2023), SM2I.6. DOI: 10.1364/CLEO\_SI.2023.SM2I.6.

- [C18] M. S. Mahmud, P. Matalla, J. Dittmer, A. Schindler, P. Runge, C. Koos, and S. Randel, "Optic-electronic-optic interferometer on an Indium Phosphide platform", in *Conf. Lasers Electro-Opt.*, (2024), pp. 1–2.

### **Preprint Publications**

- [P1] **P. Matalla**, J. Dittmer, M. S. Mahmud, C. Koos, and S. Randel, *Elastic buffer design for real-time all-digital clock recovery enabling free-running receiver clock with negative and positive clock frequency offsets*, 2025. arXiv: 2507.13748 [eess.SP]. [Online]. Available: https://arxiv.org/abs/2507.13748.