# Terabit sampling system with photonic time-stretch analog-to-digital converter

M. Caselle<sup>a</sup>, S. Bielawski<sup>b</sup>, O. Manzhura<sup>a</sup>, S. Chilingaryan<sup>a</sup>, T. Dritschler<sup>a</sup>, A. Ebersoldt<sup>a</sup>, A. Kopmann<sup>a</sup>, M. J. Nasse<sup>a</sup>, M. M. Patil<sup>a</sup>, E. Bründermann<sup>a</sup>, E. Roussel<sup>b</sup>, C. Szwaj<sup>b</sup>, and A.-S. Müller<sup>a</sup>

<sup>a</sup>Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen, DE

<sup>b</sup>Univ. Lille, CNRS, UMR 8523 - PhLAM - Physique des Lasers, Atomes et Molécules, Centre d'Étude Recherches et Applications (CERLA), F-59000, Lille, France

## ABSTRACT

The detection of rapid dynamics in diverse physical systems is traditionally very difficult and strongly dominated by several noise contributions. Laser mode-locking, electron bunches in accelerators and optically triggered phases in materials are events that carry important information about the system from which they emerge. To understand the underlying dynamics of complex systems, large numbers of single-shot measurements must often be acquired continuously over a long time with extremely high temporal resolution. Ultrafast real-time instruments allow the acquisition of large data sets, even for rare events, in a relatively short time period. The real-time measurement of fast single-shot events with large record lengths is one of the most challenging problems in the fields of instrumentation and measurement. In this contribution, the novel ultra-fast and continuous data sampling system THERESA using photonic time-stretch is presented and its performance is discussed. The proposed data acquisition system is based on the latest ZYNQ Radio Frequency System on Chip (ZYNQ-RFSoC) family from Xilinx, which combines an array of fast (GS/s) multi-channel Analog-to-Digital Converters (ADCs) with a Field Programmable Gate Array (FPGA) and a multi-core ARM processor in a single heterogeneous programmable device. The stretched pulse is sampled in parallel by 16 wideband sampling channels operating in time-interleaving mode. The sampled data is transferred by a 100 Gb Ethernet data link to the Data Acquisition (DAQ) compute node for further analysis. The combination of the photonic time-stretch and the fast sampling system is capable of sampling short pulses with femtosecond time resolution. Applications of the new system, its hardware implementation and the commissioning of the first system for electron bunch diagnostics are presented.

**Keywords:** Photonic time-stretch, femtosecond time resolution, mode-locked lasers characterization, electron bunches in accelerators, single shot measurement, ZYNQ-*RF*SoC, time-interleaving ADC, real-time data processing

# 1. INTRODUCTION

The acquisition of non-repetitive and statistically rare signals that occur on short timescales requires fast real-time measurements that often exceed the speed, precision and record length of conventional digitizers. Photonic time-stretch (PTS) aims at recording such information in single-shot, by converting the ultrafast information to be recorded into a relatively slow time-evolution, typically in the GHz range, that is eventually recorded by electronic means. Pioneering work<sup>1</sup> on PTS aimed at digitizing fast electronic signals using slower ADCs by literally "slowing down" the information. In the following years, novel PTS schemes have been developed, enabling continuous ultra-fast single-shot spectroscopy,<sup>2</sup> imaging,<sup>3-6</sup> digitization of electric field evolution with THz bandwidth<sup>7</sup> and other measurements at refresh rates of trillions of consecutive frames per second (see Ref.<sup>8</sup> for a review). The technology has opened up new frontiers in measurement science: in nonlinear dynamics of optical rogue waves,<sup>9</sup> acoustic shock waves,<sup>10</sup> mode-locked lasers,<sup>11–18</sup> parametric oscillators,<sup>19</sup> relativistic electron bunching,<sup>7,20</sup> as well as in applications to cancer cell identification,<sup>21,22</sup> optical coherence tomography (OCT),<sup>23,24</sup> material pump-probe spectroscopy,<sup>25</sup> and LIDAR.<sup>26</sup>

Further author information: (Send correspondence to M. Caselle)

michele.caselle@kit.edu; Phone: +49 721 608-29170; Fax: +49 721 608-2559; https://www.ipe.kit.edu/

Although these current trends in PTS real-time measurements are extremely diverse, they share two fundamental challenges that are typical of real-time measurements. The first is the limitation of the data converters, that is, the trade-off between the dynamic range (measured in number of bits) and the speed of the Analog-to-Digital Converter (ADC). The second is the trade-off between the speed and the sensitivity of the optoelectronic front-end. The most successful non-electronic method to alleviate the ADC speed limitations has been PTS.<sup>1</sup> In this method, the analog signal to be sampled is slowed down prior to digitization. The general principle of PTS (applied to recording electric field evolutions) is shown in Figure 1. A broadband, chirped optical carrier laser pulse is fed through an electro-optical device which encodes the ultra-fast (THz or electrical) pulses under investigation. The modulated laser pulses are then stretched in time by means of a natural or synthetic dispersive element in a long fiber until their duration is on the order of nanoseconds (see Figure 1). The factor $S$ by which the pulse is slowed down can be calculated using Equation 1, where $L_1$ and $L_2$ are the lengths of the fibers before and after the modulator, respectively.<sup>1</sup>

$$S = 1 + \frac{L_2}{L_1} \qquad (1)$$

The stretched optical signal is converted into an electrical signal by a photodetector. The electrical signal can then be recorded by a conventional real-time digitizer, e.g. an oscilloscope, and processed off-line by classical signal processing or Artificial Intelligence (AI) deployed on a Central Processing Unit (CPU) or on an accelerator unit such as a Graphics Processing Unit (GPU).



Figure 1: Photonic time-stretch and its main components. $L_1$ represents the length of the optical fiber before the electro-optical modulator, while $L_2$ is the optical fiber length after the modulator. Assuming fiber lengths of $L_1 = 10 \text{ m}$ and $L_2 = 2 \text{ km}$ and an input laser pulse with a duration of $T_1 = 1 \text{ ps}$, the stretching factor according to Equation 1 is $S \approx 200$. The high-frequency THz signal under investigation will be stretched to $T_2 = S \cdot T_1 \approx 200 \text{ ps}$, which corresponds to a frequency of about 5 GHz. Note that the strategy applies to the analysis of electric signals of different origins, such as free-propagating or guided THz signals (typically in Time-Domain Spectroscopy) or Coulomb fields created by charged bunches in particle accelerators. Variants of PTS also apply to other measurements, such as ultrafast imaging.
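The numbers quoted in the Figure 1 caption can be checked with a short calculation. The sketch below is illustrative only: it takes the stretch factor as one plus the ratio of the fiber length after the modulator to the length before it, which is the form consistent with the caption's values, and it assumes both fibers have the same dispersion per unit length. The function and variable names are hypothetical.

```python
def stretch_factor(l1_m: float, l2_m: float) -> float:
    """Stretch factor S = 1 + L2/L1 (assumes equal dispersion per unit length)."""
    if l1_m <= 0 or l2_m < 0:
        raise ValueError("L1 must be positive and L2 non-negative")
    return 1.0 + l2_m / l1_m


def stretched_duration_s(t1_s: float, s: float) -> float:
    """Duration after stretching, T2 = S * T1."""
    return s * t1_s


# Values quoted in the Figure 1 caption: L1 = 10 m, L2 = 2 km, T1 = 1 ps.
S = stretch_factor(l1_m=10.0, l2_m=2000.0)
T2 = stretched_duration_s(1e-12, S)
F_EQUIV_HZ = 1.0 / T2  # frequency corresponding to the stretched duration

print(f"S = {S:.0f}, T2 = {T2 * 1e12:.0f} ps, f = {F_EQUIV_HZ / 1e9:.2f} GHz")
```

With these inputs the exact stretch factor is 201, which the caption rounds to 200; the stretched duration is then about 201 ps, corresponding to just under 5 GHz.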

Photonic time-stretch is nowadays used in many applications. Beyond its most apparent uses, the real potential of this technique, which enables continuous ultra-fast acquisition of recordings spanning trillions of consecutive frames, is still not fully exploited. High-bandwidth oscilloscopes are expensive and, due to limited internal memory and missing fast readout interfaces, are not suitable for the long-term and continuous acquisition of analog input signals. To overcome these limitations, a high-speed digitizer architecture for continuous sampling of ultra-fast analog signals has been developed. This paper presents a high-performance terasampling system based on photonic time-stretch, together with its fundamental and practical design considerations. The paper is divided into eight sections. After the introduction, the PTS setup and its components are presented. The third section focuses on the development of the wideband front-end electronics. The fourth section presents the fast digitizer THERESA (TeraHErtz REadout SAmpling) and its data acquisition card based on the ZYNQ-RFSoC family from Xilinx.<sup>27</sup> In sections five and six, the Field Programmable Gate Array (FPGA) firmware architecture and the System-on-Chip (SoC) implementation are shown. The seventh section describes the hardware integration of the optics section with the proposed fast digitizer. Finally, a summary and conclusions are provided.

# 2. DIGITIZER ARCHITECTURE FOR PHOTONIC TIME-STRETCH

The Karlsruhe Pulse Taking Ultra-Fast Readout Electronics (KAPTURE)<sup>28</sup> is a high-performance wideband sampling system operating at hundreds of GS/s. It is a crucial device for the study of complex beam dynamics in synchrotron and plasma accelerators.<sup>29</sup> Recent years have seen an ever-increasing demand for high-performance advanced diagnostic instrumentation coupled to real-time data processing based on artificial intelligence deployed on programmable devices.<sup>30</sup> KAPTURE is very efficient in the direct sampling of ultra-short pulse signals with a Full Width at Half Maximum (FWHM) of less than 100 ps. However, it is unable to acquire PTS signals, whose duration extends up to several nanoseconds. Thus, the THERESA system has been developed to increase the sampling rate and acquire time-stretched signals in continuous mode. The whole system, including the PTS setup for the sampling of THz radiation emitted by relativistic electron bunches in modern accelerator machines, the THERESA digitizer system and its components, is shown in Figure 2. The encoded time-stretched signal coming from the optical path is fed into a photodetector that converts the optical pulse into an electrical signal. The photodetector signal is then split into sixteen identical copies by wideband active power dividers. All output signals are then sampled by the THERESA sampling card. THERESA integrates up to sixteen parallel sampling channels. Each sampling channel contains a wideband Track-and-Hold Amplifier (THA) controlled by a programmable delay chip that sets the sampling time with a step of 11 ps.

Figure 2: PTS system and readout architecture. It consists of an optical time-stretching path, a fast photodetector and the THERESA sampling system. The THERESA system contains the wideband front-end electronics, the THERESA sampling card and the data acquisition system based on the recent ZYNQ-*RF*SoC.

The sampling card is connected to the ZCU216 evaluation board from Xilinx, which is equipped with the latest generation of the ZYNQ UltraScale+ Radio Frequency System-on-Chip (RFSoC). The Xilinx *RF*SoC is a relatively new product line that integrates several multi-gigasample data converters with processors, UltraScale+ programmable logic (16 nm FinFET FPGA), double-data-rate memory, and various peripherals in a System-on-Chip (SoC) architecture. The RFSoC includes a quad-core ARM Cortex-A53 application processor and a dual-core ARM Cortex-R5 processor for real-time processing. This makes the ZCU216 a well-suited platform for analog, digital and embedded system design, and also simplifies the calibration and synchronization along the signal chains. The mismatches between the sampling channels resulting from the time-interleaving are digitally corrected, and the samples are sent to the computational node by a high-performance 100 Gb/s Ethernet connection implemented in the programmable logic.

The THERESA sampling card is the key component of the proposed system. Its working principle is shown in Figure 3. An external reference clock can be applied to the board in order to synchronize the acquired samples to the timing system of the experimental setup. An ultra-low-noise clock jitter cleaner with dual-loop Phase-Locked Loops (PLLs) is employed to generate a clock signal with high temporal accuracy. The clock signal is distributed to the delay chips by a low-skew, low-jitter clock fanout buffer. In this way, the THAs receive a time-controlled sampling signal synchronized with the reference clock. Each delay chip is individually programmable by the FPGA in steps of 11 ps, with a delay ranging from 3.2 ns to 17 ns.<sup>31</sup> The parallel analog samples of the THAs are digitized by an array of 16 ADCs integrated into the ZYNQ-*RF*SoC. The digital samples are merged, formatted and optionally processed in real time by the FPGA on the ZYNQ-*RF*SoC. The samples are then stored locally in a Double Data Rate Gen 4 (DDR4) memory device before being sent to the high-performance computing node via a high-throughput 100 Gb/s Ethernet readout interface. The use of standard "off-the-shelf" Ethernet devices reduces cost and effort, improves the scalability of the system and ensures seamless incremental upgrades with technological progress. The multi-core ARM Cortex-A53 processor integrated in the *RF*SoC allows a simple integration into the experimental setup.



Figure 3: Working principle of the THERESA sampling card and its components. The sampling card consists of sixteen Track-and-Hold-Amplifiers (THAs), picosecond delay chips, a jitter cleaning PLL and a low-jitter, low-skew fanout buffer for clock distribution.

To correct mismatches between the sampling channels, which would degrade the Spurious-Free Dynamic Range (SFDR) and the Effective Number Of Bits (ENOB), the calibration system described in section 6 has been integrated in THERESA. One key feature of the THERESA architecture is its high flexibility in the sampling operation. The system can operate either in continuous or in single-shot sampling mode. In continuous mode, the phases of the sixteen parallel sampling channels are equidistant and distributed equally over the sampling interval, which can be a multiple of the sync clock, as shown in Figure 3. In single-shot sampling mode, the phases of the sixteen sampling channels are set with a minimum spacing of 11 ps. In that mode, the system samples the input signal at a frame rate of about 90 GS/s (1/11 ps). In the proposed system, the parallel array of 16 time-interleaved ADCs is capable of sampling the input signal at a rate $f_s$ (sampling period $\Delta T = 1/f_s$), even though each individual ADC samples at the lower rate of $f_s/16$. Considering that the sampling rate of each individual ADC reaches up to 2.5 GS/s, the maximum frame rate achievable by the proposed system is up to 40 GS/s in continuous acquisition mode. When combined with the PTS system shown in Figure 1 and assuming a realistic time-stretch factor $S$ of 200, the proposed sampling system operates at an effective frame rate of $40 \text{ GS/s} \cdot 200 = 8 \text{ TS/s}$.
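The rate arithmetic of the two sampling modes can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sampling interval in continuous mode is taken to be one ADC period, and the quantisation of the ideal phases to the 11 ps delay grid is a naive rounding that does not model how the real delay chips or PLL fine delays are actually programmed.

```python
N_CHANNELS = 16          # parallel sampling channels (from the text)
ADC_RATE_GSPS = 2.5      # per-ADC sampling rate (from the text)
DELAY_STEP_PS = 11.0     # delay chip step (from the text)
STRETCH_FACTOR = 200     # time-stretch factor S assumed in the text


def aggregate_rate_gsps(n_channels: int, adc_rate_gsps: float) -> float:
    """Interleaved sampling rate of n_channels ADCs."""
    return n_channels * adc_rate_gsps


def continuous_phases_ps(n_channels: int, adc_rate_gsps: float) -> list:
    """Equidistant sampling phases spread over one ADC period (assumption)."""
    adc_period_ps = 1000.0 / adc_rate_gsps
    return [k * adc_period_ps / n_channels for k in range(n_channels)]


def quantise_to_delay_grid(phases_ps: list, step_ps: float):
    """Round each phase to the nearest delay step; return steps and residuals."""
    steps = [round(p / step_ps) for p in phases_ps]
    residuals = [n * step_ps - p for n, p in zip(steps, phases_ps)]
    return steps, residuals


continuous_rate = aggregate_rate_gsps(N_CHANNELS, ADC_RATE_GSPS)
effective_rate_tsps = continuous_rate * STRETCH_FACTOR / 1000.0
single_shot_rate_gsps = 1000.0 / DELAY_STEP_PS

ideal = continuous_phases_ps(N_CHANNELS, ADC_RATE_GSPS)
steps, residuals = quantise_to_delay_grid(ideal, DELAY_STEP_PS)

print(f"continuous: {continuous_rate} GS/s, with PTS: {effective_rate_tsps} TS/s")
print(f"single-shot: {single_shot_rate_gsps:.1f} GS/s")
print(f"ideal spacing: {ideal[1]} ps, worst residual: {max(abs(r) for r in residuals)} ps")
```

Under these assumptions the ideal channel spacing of 25 ps is not a multiple of the 11 ps step, so a pure delay-chip setting would leave residual timing errors of a few picoseconds per channel. This is one reason a timing calibration of the interleaved channels is useful.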

# 3. BROADBAND ACTIVE POWER DIVIDER

A key component is the ultra-wide bandwidth active power divider. It must have very low noise and an ultra-wide bandwidth, because any additional noise contribution or bandwidth limitation will affect the final frequency performance of the entire system. Therefore, an ultra-wide bandwidth active power divider with one input and four outputs has been designed. To split the analog photodetector signal into sixteen identical copies, five power dividers are employed and connected in a tree configuration. The one-to-four power divider is shown in Figure 4 (left). It consists of an ultra-wide bandwidth Low-Noise Amplifier (LNA) and a power divider based on tapered transmission lines. The LNA, operating from 0.5 GHz to 80 GHz,<sup>32</sup> has been integrated to compensate for the insertion loss due to the power divider. The amplified RF output of the LNA is then divided into two parallel RF outputs by the T-junction with an impedance of $100 \Omega$ (Figure 4, middle). Each branch is then re-adapted from $100 \Omega$ to $50 \Omega$ by the wideband tapered line transformers.

Figure 4: High analog bandwidth active power divider with housing (left). The internal view and its components (middle). Preliminary time characterization with a pulse is shown in the right picture. Channel 1 (top) shows the input RF signal. Channel 2 (bottom) shows one of the four outputs of the device.

The tapered line is an RF component that is employed as an impedance transformer. It matches an impedance $Z_1$ to an impedance $Z_2$ using a gradually varying characteristic impedance $Z(z)$ along the line. The tapered line is the main component of the divider and its frequency behaviour has been investigated thoroughly. The length has been optimized to balance minimum wave reflection against minimum insertion loss. The final geometry has been optimized to minimize the time skew between output channels. To ensure an ultra-high bandwidth, the power divider has been produced using the Rogers RT/duroid 5880 RF/microwave substrate.<sup>33</sup> The frequency behaviour of the described power divider architecture has been reported in a previous publication.<sup>28</sup> All output channels show a uniform and flat transmission (|S21|, |S31|, |S41|, |S51|) up to 100 GHz. The insertion loss is completely compensated by the gain of the LNA, and the return loss is in the range of 20 dB at 50 GHz.
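The divider count and the loss that the LNA has to recover follow from simple arithmetic. The sketch below is an idealised illustration: it counts the one-to-four dividers in a full tree with sixteen outputs and computes the splitting loss of a lossless N-way divider, $10 \log_{10} N$ dB. Conductor, dielectric and mismatch losses of the real device are not modelled, and the function names are hypothetical.

```python
import math


def dividers_in_tree(outputs: int, fan_out: int) -> int:
    """Number of 1-to-fan_out dividers in a full tree with `outputs` leaves."""
    count = 0
    signals = 1  # signals available at the current tree level
    while signals < outputs:
        count += signals      # every signal at this level feeds one divider
        signals *= fan_out
    if signals != outputs:
        raise ValueError("outputs must be a power of fan_out")
    return count


def ideal_split_loss_db(ways: int) -> float:
    """Power loss per output of an ideal lossless `ways`-way divider."""
    return 10.0 * math.log10(ways)


n_dividers = dividers_in_tree(outputs=16, fan_out=4)
loss_per_stage_db = ideal_split_loss_db(4)
loss_total_db = ideal_split_loss_db(16)

print(f"dividers: {n_dividers}")
print(f"ideal loss per 1-to-4 stage: {loss_per_stage_db:.2f} dB")
print(f"ideal loss over the 16-way tree: {loss_total_db:.2f} dB")
```

One divider in the first level and four in the second give the five devices mentioned above. In the ideal case each stage costs about 6 dB, which sets a lower bound on the gain each LNA stage must supply to keep the sixteen copies at the input level.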

# 4. THERESA SAMPLING CARD

The THERESA sampling card is a complex mixed-signal Printed Circuit Board (PCB) where the digital and analog sections have been designed carefully to prevent signal interference. The most important guideline is to separate the system into different sections to help ensure isolation between critical signals. The PCB consists of sixteen metal layers on Megtron6 substrate from Panasonic, which is specifically designed for high-speed/high-frequency applications.<sup>34</sup> Both analog and digital signals require wideband transmission lines and well-controlled time skew, therefore all lines have been individually optimized and routed with "accordion" trace techniques to match the signal propagation time with picosecond tolerance. Ad-hoc RF filters located close to the RF components have been adopted to reduce noise. Via fences and guard ring techniques have been employed in the layout to reduce the crosstalk between adjacent transmission lines and the electromagnetic interference, and to improve the performance at high frequency. A well-defined return path through the system ensures that crosstalk does not occur between different board regions. Therefore, the upper eight layers are dedicated to the digital circuits, while the bottom eight are dedicated to the RF-analog circuits. The top and bottom sides of the THERESA sampling card are shown in Figure 5. The bottom side of the card is dedicated to the placement of the analog-RF components and the wideband grounded coplanar waveguide transmission lines. The analog signals from the power dividers are fed into the THAs via wideband RFC-2.4MM series connectors.<sup>35</sup> The THA employed on the board is the wideband HMC760LC4B, which features an analog bandwidth of 18 GHz and a low random aperture jitter of less than one hundred femtoseconds.<sup>36</sup> The output from the THAs is routed through RFMC Low Profile Array, Male (LPAM) connectors<sup>37</sup> to the RF data converters integrated on the ZYNQ-RFSoC. Additional reference clock signals with very precise phase are generated on THERESA by dedicated PLLs<sup>38</sup> and propagated to the data converters by a small (6x20) LPAM connector.

Figure 5: THERESA card with analog RF circuits on the bottom side (left) and the digital design on the top side (right).

The top side of the card is dedicated to the placement and routing of the timing-controlled sampling clocks and the slow-control for the PLL and delay chip configurations. As shown in Figure 5, an FPGA Mezzanine Card (FMC+)<sup>39</sup> connector has been integrated to provide all digital signals from and to the ZYNQ-*RF*SoC. The NB6L295 delay chip, which provides the sampling clock to the THA, can be programmed by the ZYNQ-*RF*SoC via Serial Peripheral Interface (SPI) over a range from 3.2 ns to 17 ns with a step size of 11 ps.<sup>31</sup>

Figure 6: Clock distribution and its components implemented on the THERESA sampling card. The analog inputs to the THAs are not shown.

THERESA makes use of the highest-performance components currently available on the market. The timing distribution, both the group delay in the RF analog region and the clock skew in the sampling region, is very critical for the system. In fact, any timing skew in either the analog path or the sampling clock will result in a spurious harmonic component in the output spectrum. Therefore, particular precautions have been taken in the design of the clock distribution of THERESA. The clock distribution architecture is shown in Figure 6. The main clock device is the dual-loop PLL LMK04808B, which provides the low-noise jitter cleaner functionality.<sup>40</sup> The PLL has two individually configurable delays: a coarse delay from 2.25 ns to 261 ns with a time step of 500 ps and an analog fine delay ranging from 0 ps to 475 ps in steps of 25 ps. The flexible and accurate clock distribution allows easy synchronization between the THA outputs and the ADC timing. LMX2594 PLLs are employed as clock generators for the data converters on the ZYNQ-RFSoC, as their output frequency can reach up to 15 GHz.<sup>38</sup> The cleaned clock reference generated by the main PLL is then propagated to the delay chips via low-jitter and low-skew fanout buffers (HMC987LP5E). A multiple-input clock selector is provided for asynchronous switching between clock sources with different frequencies. Three clock sources have been implemented. The first source is provided by the FPGA, which is useful for the continuous free-running sampling mode. The second source is an external clock reference provided by the experimental setup, which is particularly important for sampling the input signal synchronously to an external time reference. The third source is a local oscillator for debugging.
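As an illustration of how the coarse and fine PLL delays quoted above combine, the following sketch splits a requested delay into a coarse part (500 ps steps starting at 2.25 ns) and a fine part (25 ps steps up to 475 ps). It is a hypothetical helper working only with the nominal figures from the text; the actual register encoding of the PLL is not modelled, and the coarse upper limit is truncated to the last full 500 ps step below 261 ns.

```python
from typing import Tuple

COARSE_MIN_PS = 2250      # nominal coarse delay range and step (from the text)
COARSE_MAX_PS = 261000
COARSE_STEP_PS = 500
FINE_STEP_PS = 25         # nominal fine delay step and range (from the text)
FINE_MAX_PS = 475

MAX_COARSE_STEPS = (COARSE_MAX_PS - COARSE_MIN_PS) // COARSE_STEP_PS
MAX_FINE_STEPS = FINE_MAX_PS // FINE_STEP_PS


def split_delay_ps(target_ps: int) -> Tuple[int, int]:
    """Return (coarse_ps, fine_ps) whose sum is the nearest reachable delay."""
    lowest = COARSE_MIN_PS
    highest = COARSE_MIN_PS + MAX_COARSE_STEPS * COARSE_STEP_PS + FINE_MAX_PS
    target = min(max(target_ps, lowest), highest)  # clamp to reachable range

    offset = target - COARSE_MIN_PS
    coarse_steps = min(offset // COARSE_STEP_PS, MAX_COARSE_STEPS)
    remainder = offset - coarse_steps * COARSE_STEP_PS

    # Round the remainder to the nearest fine step.
    fine_steps = (remainder + FINE_STEP_PS // 2) // FINE_STEP_PS
    if fine_steps > MAX_FINE_STEPS:
        if coarse_steps < MAX_COARSE_STEPS:
            coarse_steps += 1   # carry into the next coarse step
            fine_steps = 0
        else:
            fine_steps = MAX_FINE_STEPS

    coarse_ps = COARSE_MIN_PS + coarse_steps * COARSE_STEP_PS
    return coarse_ps, fine_steps * FINE_STEP_PS


for request in (3000, 2740, 1000):
    coarse, fine = split_delay_ps(request)
    print(f"request {request} ps -> coarse {coarse} ps + fine {fine} ps")
```

Because the fine range (475 ps) almost spans one coarse step (500 ps), the two settings together give a nominal 25 ps grid over the whole coarse range.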

# 5. FPGA FIRMWARE ARCHITECTURE

The THERESA system generates a large data volume that needs to be sustained for a long observation time and, therefore, a high data throughput FPGA firmware architecture is necessary. Furthermore, the FPGA must provide the additional logic for the control and configuration of the whole system. The RF-direct data converters integrated into the ZYNQ-*RF*SoC feature up to sixteen 14-bit ADCs operating at a sampling frequency of up to 2.5 GS/s and sixteen 14-bit DACs operating at 10 GS/s. To handle this high data throughput, four high-speed Small Form-Factor Pluggable (SFP28) 28 Gb/s optical connectors are employed for data transfer to the DAQ PC over 100 Gb/s Ethernet. Moreover, for temporary data storage and buffering, two 4 GB DDR4 memory modules (one for the FPGA and one for the processor) are present on the board.<sup>41</sup>

Figure 7: Firmware architecture deployed on the programmable logic of ZYNQ. The architecture implements three main data paths: data streaming from the THERESA sampling card to standard 100 GbE (thick arrows), the calibration path (thin arrows) and the command-configuration path received from the control PC over Ethernet to the ARM processor (dashed arrows).

The FPGA firmware architecture shown in Figure 7 has three main paths: the data path, the DAC calibration path, and the slow-control and configuration path. The analog samples from the THERESA sampling card (left side of Figure 7) are digitized by the fast ADCs, which are configured and controlled by the ZYNQ UltraScale+ RF Data Converter IP-core<sup>42</sup> ("RF-ADC/DAC and configuration"). The samples coming from the data converters are fed through a First In, First Out (FIFO) stage into a data formatting block ("ADC Data Formatting") where the data is merged and formatted. The data stream from the "ADC Data Formatting" block is then moved to the 100 Gb/s Ethernet Subsystem IP-core<sup>43</sup> for the final transmission to the DAQ PC over the Ethernet link. The DAC calibration path plays an important role in the calibration of the sixteen sampling channels. The calibration system and procedure are described in section 6. It relies on a fast DAC channel to generate a user-defined reference signal, which is applied to the input of the time-interleaved sampling channels. The reference signal is generated in Python by the ARM processor and then moved to the DAC via an Advanced eXtensible Interface (AXI) bus. For clarity, the slow-control interfaces of the delay chips and PLLs are not shown in the block diagram. The configuration of programmable devices, like the delay chips, is realized via SPI and dedicated AXI registers. The configuration software writes the desired parameter values to the AXI registers. The register values are then sent via SPI to the hardware devices.
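A back-of-the-envelope estimate shows why local buffering and a high-throughput link both matter. The sketch below uses only the figures quoted in this section and makes several simplifying assumptions that are not taken from the paper: all sixteen ADCs stream at the full 2.5 GS/s, samples are packed as 14 bits with no padding, Ethernet protocol overhead is ignored, and 4 GB is read as $4 \cdot 10^9$ bytes. How the actual firmware packs, reduces or gates the data is not specified here.

```python
import math

N_ADC = 16               # ADC channels (from the text)
ADC_BITS = 14            # resolution per sample (from the text)
ADC_RATE_SPS = 2.5e9     # samples per second per ADC (from the text)
LINK_GBPS = 100.0        # Ethernet link rate (from the text)
DDR4_BYTES = 4e9         # FPGA-side DDR4 size, read as 4e9 bytes (assumption)


def raw_rate_gbps(n_adc: int, bits: int, rate_sps: float) -> float:
    """Raw ADC output rate in Gb/s, assuming unpadded samples."""
    return n_adc * bits * rate_sps / 1e9


def buffer_fill_time_s(buffer_bytes: float, in_gbps: float, out_gbps: float) -> float:
    """Time until the buffer overflows when input exceeds output."""
    net_gbps = in_gbps - out_gbps
    if net_gbps <= 0:
        return math.inf  # the link keeps up, the buffer never fills
    return buffer_bytes * 8 / (net_gbps * 1e9)


raw = raw_rate_gbps(N_ADC, ADC_BITS, ADC_RATE_SPS)
ratio = raw / LINK_GBPS
fill_time = buffer_fill_time_s(DDR4_BYTES, raw, LINK_GBPS)

print(f"raw ADC rate: {raw:.0f} Gb/s ({ratio:.1f}x the link rate)")
print(f"4 GB buffer absorbs a full-rate burst for about {fill_time * 1e3:.0f} ms")
```

Under these assumptions the raw converter output exceeds a single 100 Gb/s link several times over, so sustained operation depends on the acquisition duty cycle, on-chip data reduction, or lower per-channel rates, with the DDR4 memory absorbing bursts.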

# 6. SYSTEM ON CHIP AND CALIBRATION ARCHITECTURE

The THERESA system is designed to support the Python Productivity for ZYNQ (PYNQ) framework.<sup>44</sup> PYNQ is a development environment based on Jupyter notebooks<sup>45</sup> that allows developers with little FPGA experience to rapidly implement designs and take full advantage of FPGA performance for high-speed, compute-intensive applications. As the name suggests, PYNQ takes advantage of the development productivity gains associated with the Python programming language. At the same time, developers can implement time-critical functions in C. Although experienced developers can extend PYNQ with specialized hardware overlays and C software libraries, PYNQ's strength lies in its ability to provide a high-productivity development environment for any developer able to write a Python program and deploy it on the FPGA. The PYNQ framework can be employed to develop specific data processing or data correction routines that reduce the intrinsic mismatches between the sixteen sampling channels of THERESA caused by time interleaving.

The most important benefit of time interleaving is the increased bandwidth made possible by the wider Nyquist zone of the interleaved ADCs. However, time interleaving also creates some challenges. Spurious harmonic components appear in the output spectrum as a result of the mismatches between the sampling channels. There are four basic types of mismatch: offset, gain, timing, and bandwidth. To minimize these mismatches between the ADCs, and thereby improve the effective number of bits (ENOB), a calibration system has been integrated in the THERESA system. It is based on a fast 14-bit 10 GS/s DAC generating an analog signal, which is fed into the analog sampling channels for the offset, gain, timing and bandwidth test and calibration. The calibration system and its components are shown in Figure 8. With the proposed system, an analog waveform can be programmed in Python or C on the multi-core ARM processor and then applied, by glue logic on the FPGA, to the DAC for the generation of the reference signals.
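A reference waveform of this kind could be prepared on the processor along the following lines. This is a minimal sketch in plain Python, not the THERESA software: it assumes signed 14-bit DAC codes, a 10 GS/s DAC rate, and a sample buffer that is replayed cyclically, and it deliberately stops before the hardware transfer because the AXI and DAC interface of the actual design is not described here. All names are hypothetical.

```python
import math

DAC_BITS = 14          # DAC resolution (from the text)
DAC_RATE_HZ = 10e9     # DAC sampling rate (from the text)


def coherent_frequency(target_hz: float, n_samples: int) -> float:
    """Nearest frequency with a whole number of periods in the buffer.

    A whole number of periods avoids a phase jump when the buffer is
    replayed cyclically (cyclic replay is an assumption of this sketch).
    """
    cycles = max(1, round(target_hz * n_samples / DAC_RATE_HZ))
    return cycles * DAC_RATE_HZ / n_samples


def sine_reference(freq_hz: float, n_samples: int, amplitude: float = 0.9) -> list:
    """Signed integer DAC codes for a sine wave at `amplitude` of full scale."""
    if not 0.0 < amplitude <= 1.0:
        raise ValueError("amplitude must be in (0, 1]")
    full_scale = 2 ** (DAC_BITS - 1) - 1
    return [
        int(round(amplitude * full_scale
                  * math.sin(2 * math.pi * freq_hz * n / DAC_RATE_HZ)))
        for n in range(n_samples)
    ]


N_SAMPLES = 1024
f_ref = coherent_frequency(1e9, N_SAMPLES)   # close to 1 GHz
codes = sine_reference(f_ref, N_SAMPLES)

print(f"reference tone: {f_ref / 1e6:.3f} MHz, peak code: {max(codes)}")
```

The resulting list of codes would then be handed to whatever driver moves data to the DAC. Choosing a coherent frequency is a common practice for spectral measurements, since it keeps the test tone in a single FFT bin.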



Figure 8: Calibration circuit on the THERESA card. A dedicated high-speed Digital-to-Analog Converter (DAC) operating at up to 10 GS/s is instantiated on the ZYNQ-*RF*SoC to generate a programmable reference waveform for the calibration of the sampling channels. The fast DAC is driven by a Python routine executed on the ARM processors.

The calibration system takes advantage of the heterogeneous architecture of the ZYNQ-*RF*SoC, which combines the possibility to program automated calibration routines in Python on the ARM processor cores with DACs available on the same die. Each of the four mismatches is characterized by specific spurious components in the output spectrum. In the following, $f_S$ denotes the interleaved sampling rate, $N$ the number of ADCs and $f_{IN}$ the input frequency. The offset mismatch spur can be easily identified, since it alone resides at $f_S/N$, and it can be readily compensated. The gain, timing, and bandwidth mismatches all produce spurs at $f_S/N \pm f_{IN}$ in the output spectrum, therefore an appropriate calibration routine is necessary to measure and compensate these spurs. The calibration routine performed on the ARM processors consists of four steps:

- Step 1: Offset correction. All analog signals from the THA outputs are AC-coupled and therefore the offset between different interleaved channels is dramatically reduced by design. However, a residual offset could still be present due to DC offset differences between the ADCs. The most appropriate technique to reduce this offset is to match the offset of each ADC to that of a reference ADC. The offset of one ADC is chosen as reference, and the offsets of the other ADCs are set to match that value as closely as possible. For the offset mismatch, no input signal is necessary to observe the inherent DC offsets of the N ADCs.
- Step 2: Gain correction. In order to minimize the spur caused by gain mismatch, a strategy similar to the offset mismatch compensation is employed. The gain of one of the ADCs is chosen as reference and the gain of the other ADCs is set to match that value as closely as possible. The gain component of the bandwidth mismatch can be separated from the gain mismatch by performing a gain measurement at low frequency, near DC. Unlike the gain component of the bandwidth mismatch, the gain mismatch is not a function of frequency.
- Step 3: Phase correction. The two components of the phase mismatch, the group delay in the analog region and the clock skew of the THA, are strongly reduced by design with a proper routing of all analog and clock lines. All analog lines are routed with precise timing/length matching in order to achieve a time skew down to 1.2 ps. Moreover, the aperture time uncertainty of the THA is expected to be <70 fs.<sup>36</sup> The sampling clock distribution of the THA is carefully routed: the total skew is 1.2 ps and the Root Mean Square (RMS) time jitter is 111 fs.<sup>40</sup> To measure and reduce the timing mismatch, a low-frequency (near DC) reference signal is generated, and subsequent measurements are then performed at higher frequencies to separate the timing component of the bandwidth mismatch from the timing mismatch.
- Step 4: Bandwidth correction. Because the bandwidth mismatch results in different gain values of the sampling channels at different frequencies, the mismatch is characterized by applying a high-frequency reference signal and measuring the gain of the sampling channels. The best way to minimize the bandwidth mismatch is good circuit design and layout practice that minimizes bandwidth differences between the ADCs. The RF-analog transmission lines have been designed to match a bandwidth of up to 50 GHz.
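Steps 1 and 2 above, together with the spur locations quoted earlier, can be sketched in a few lines of Python. This is a simplified illustration rather than the THERESA calibration routine: it estimates offset and gain from the same record of a low-frequency test tone (the text measures the offset without any input signal), it assumes every channel record covers a whole number of tone periods, and it applies the correction numerically instead of programming the ADC hardware. All function names are hypothetical.

```python
import math


def spur_frequencies(fs_hz: float, n_adc: int, f_in_hz: float):
    """Spur locations quoted in the text for N interleaved ADCs."""
    offset_spur = fs_hz / n_adc
    image_spurs = (offset_spur - f_in_hz, offset_spur + f_in_hz)
    return offset_spur, image_spurs


def mean_and_rms(samples: list):
    """Mean (offset estimate) and RMS about the mean (gain estimate)."""
    mean = sum(samples) / len(samples)
    rms = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    return mean, rms


def match_to_reference(channels: list, ref: int = 0) -> list:
    """Match offset and gain of every channel to the reference channel."""
    stats = [mean_and_rms(ch) for ch in channels]
    ref_mean, ref_rms = stats[ref]
    corrected = []
    for ch, (mean, rms) in zip(channels, stats):
        if rms == 0:
            raise ValueError("channel carries no test tone")
        gain = ref_rms / rms
        corrected.append([(s - mean) * gain + ref_mean for s in ch])
    return corrected


def synthetic_channel(offset: float, gain: float, phase: float,
                      samples_per_period: int = 8, periods: int = 4) -> list:
    """Test tone as seen by one channel with its own offset, gain and phase."""
    n = samples_per_period * periods
    return [offset + gain * math.sin(2 * math.pi * m / samples_per_period + phase)
            for m in range(n)]


channels = [
    synthetic_channel(0.00, 1.0, 0.0),   # reference channel
    synthetic_channel(0.05, 1.1, 0.3),
    synthetic_channel(-0.02, 0.9, 0.6),
]
fixed = match_to_reference(channels)
offset_spur, image_spurs = spur_frequencies(40e9, 16, 1e9)

for k, ch in enumerate(fixed):
    mean, rms = mean_and_rms(ch)
    print(f"channel {k}: offset {mean:+.4f}, rms {rms:.4f}")
```

After the correction all channels show the same offset and RMS amplitude as the reference, which is the condition that suppresses the offset and gain spurs. Timing and bandwidth mismatches (steps 3 and 4) require measurements at several frequencies and are not covered by this sketch.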

# 7. INTEGRATION WITH PHOTONIC FRONT-ENDS

THERESA is intended as a versatile tool that can be used in various implementations of photonic time-stretch, depending on the application. In the short term, the devices are being developed for diagnostics in accelerator physics and high-throughput single-shot THz time-domain spectroscopy. In both cases, the front-end principle is similar and aims at recording electric fields. However, although the basic principle displayed in Figure 1 is viable without modification, investigations in real cases revealed two main challenges on the photonic side: (1) ensuring enough sensitivity and (2) ensuring high bandwidth even for long recordings, which had been a problem for single-shot electro-optic (EO) sampling until recently.

## 7.1 Increasing the effective number of bits using balanced detection at the analog level

When using a single photodetector (Figure 1), the electro-optic signal at each shot is mainly composed of the same signal (corresponding to the stretched laser pulse shape $S_L(t)$), and the information on the electric field is usually a small modulation $\epsilon(t)$ of $S_L(t)$. The signal at the ADC system's input thus has the form:

$$X(t) = (1 + \epsilon(t)) S_L(t), \qquad (2)$$

which means that, using a single photodetector, most of the ENOB of the ADC board would be used for recording the laser shape, and only a small fraction of the ENOB would be effectively used for recording the information $\epsilon(t)$. An effective solution consists of using a balanced photodetector in order to subtract the signal $S_L(t)$ before digitization, thus using the full ENOB of THERESA for recording the electro-optic signal $\epsilon(t)$. Two alternative strategies have been successfully tested, so far using classical oscilloscopes:

- 1. One solution consists of subtracting the laser pulse shape  $S_L(t)$  at each shot. This technique has been tested at KARA,<sup>20</sup> for recording the Coulomb field of electron bunches, and at SOLEIL<sup>46</sup> for recording freely propagating THz pulses.
- 2. A second solution is based on the fact that the photonic front-end can provide two complementary signals. Subtracting these two signals, using a balanced detector, is thus also a natural solution, which has been implemented in several experiments.<sup>7, 47, 48</sup> One example of front-end with balanced detection is represented in Figure 9.



Figure 9: Detail of one of the photonic front-ends compatible with THERESA. A chirped laser pulse is modulated by the electric field of interest, using the Pockels effect in an electro-optic crystal. Several layout options are then possible. Here we display the version where the two outputs can be subtracted at the analog level.<sup>47,48</sup> THERESA is also expected to be compatible with the so-called DEOS scheme for achieving even larger bandwidth (see text and Reference<sup>49</sup>). Note that the Brewster plates are optional and have the effect of increasing the sensitivity, typically by one order of magnitude.<sup>47</sup> Options that avoid the use of polarization-maintaining (PM) fibers are also possible (see, e.g., Reference<sup>7</sup>). YDFA: Ytterbium-doped fiber amplifier, GaP Xtal: Gallium Phosphide crystal.

In addition, when the subtraction is performed between data coming from the same laser shot, the laser noise is naturally canceled out, which further increases the detectivity. Last but not least, specific developments have been performed in order to increase the electro-optic signal at the optical level. The interested reader will find the corresponding information in Reference.<sup>47</sup>
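
The benefit of subtracting  $S_L(t)$  before digitization can be illustrated numerically. The following is a minimal sketch, not a model of the actual detector chain: it assumes an ideal uniform quantizer, a Gaussian laser envelope, a 1% sinusoidal modulation, a hypothetical 6-bit effective resolution, and an analog gain that scales the balanced signal to the ADC full scale. All names and values are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits, full_scale):
    """Ideal uniform quantizer over [-full_scale, +full_scale]."""
    lsb = 2.0 * full_scale / 2**bits
    return np.clip(np.round(x / lsb) * lsb, -full_scale, full_scale)

BITS = 6                                        # hypothetical effective resolution
t = np.linspace(-1.0, 1.0, 2000)                # arbitrary time units
s_l = np.exp(-t**2 / (2.0 * 0.5**2))            # assumed stretched laser pulse shape S_L(t)
eps = 0.01 * np.sin(2.0 * np.pi * 5.0 * t)      # assumed small modulation epsilon(t)
x = (1.0 + eps) * s_l                           # Eq. (2): signal on a single photodetector

# Single photodetector: digitize X(t), then recover epsilon numerically
x_q = quantize(x, BITS, x.max())
eps_single = x_q / s_l - 1.0

# Balanced detection: subtract S_L(t) before digitization, scale to full scale
d = x - s_l                                     # equals epsilon(t) * S_L(t)
d_q = quantize(d, BITS, np.abs(d).max())
eps_balanced = d_q / s_l

mask = s_l > 0.5                                # compare where the laser pulse is strong
err_single = np.sqrt(np.mean((eps_single[mask] - eps[mask])**2))
err_balanced = np.sqrt(np.mean((eps_balanced[mask] - eps[mask])**2))
print(f"RMS error, single detector:   {err_single:.2e}")
print(f"RMS error, balanced detector: {err_balanced:.2e}")
```

Under these assumptions the quantization step of the balanced channel is smaller by roughly the ratio between the laser envelope and the modulation amplitude, which is why the recovered  $\epsilon(t)$  is much cleaner.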

#### 7.2 Foreseen integration of the Diversity Electro-Optic Sampling (DEOS) design, for recording THz signals over long time windows and with high bandwidth

It is important to note that classical chirped-pulse EO sampling suffers from a fundamental bandwidth limitation when one output (or the difference between the two outputs) is recorded. It was shown in the 1990s<sup>50</sup> that the achievable temporal resolution is limited to:

$$\tau_R \approx \sqrt{\tau_L \tau_w},\tag{3}$$

where  $\tau_L$  is the compressed laser pulse duration, and  $\tau_w$  is the duration of the stretched laser pulse at the electro-optic crystal (or equivalently the duration of the recording). However, it has also been shown that it is possible to retrieve the input THz signal numerically with high temporal resolution (i) from the two outputs of the EO sampling system, and (ii) for special arrangements of the crystal and waveplate orientations. The method, called Diversity Electro-Optic Sampling (DEOS),<sup>49</sup> has so far been demonstrated in designs using a different type of readout (based on a single-shot grating optical spectrum analyzer). However, we think that DEOS should also be implementable in the case of photonic time-stretch. In this case, we expect the time resolution (for the THz signal) to be limited by the laser pulse duration:

$$\tau_R = O(\tau_L),\tag{4}$$

or to be limited by the crystal or the electronics bandwidths, whichever comes first. This represents an important improvement with respect to the classical limitation of Eq. (3). As an illustration, using a 100 fs laser and a 10 ps recording window would limit the resolution  $\tau_R$  to values of the order of 1 ps using the classical (single output) arrangement, and the issue becomes worse as the required acquisition window increases. In contrast, using DEOS, the resolution is limited only by the laser pulse duration (or the bandwidths of the crystal or electronics). THERESA should be able to run the DEOS algorithm in real time, using the on-board FPGA. At each shot, for a recording of N points, the computing time is mainly spent on three Fast Fourier Transforms (over N points); see Ref.<sup>49</sup> for details.
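
The scaling of Eq. (3) can be checked with a short calculation. This sketch reproduces the 100 fs / 10 ps illustration above and extends it to longer windows; the longer window values and the function name are arbitrary choices for illustration, and the DEOS column simply takes  $\tau_L$  as the order of magnitude from Eq. (4).

```python
import math

def classical_resolution(tau_l, tau_w):
    """Temporal resolution of classical chirped-pulse EO sampling, Eq. (3)."""
    return math.sqrt(tau_l * tau_w)

TAU_L = 100e-15  # compressed laser pulse duration: 100 fs (value from the text)

# The first window is the 10 ps case from the text; the others are illustrative.
for tau_w in (10e-12, 100e-12, 1e-9):
    tau_r = classical_resolution(TAU_L, tau_w)
    print(f"window {tau_w * 1e12:7.1f} ps -> classical limit {tau_r * 1e12:5.2f} ps, "
          f"DEOS order of magnitude {TAU_L * 1e12:.2f} ps")
```

Because the classical limit grows as the square root of the window duration, a window 100 times longer degrades the resolution by a factor of 10, whereas the expected DEOS limit does not depend on the window.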

#### 8. CONCLUSION AND OUTLOOK

Modern photon science detectors require cutting-edge technologies: high-frequency, low-noise analog sampling circuits and high-throughput readout electronics, combined with advanced real-time data processing. The latest generation of the ZYNQ UltraScale+ Radio Frequency System-on-Chip (RFSoC) device has proven to be an efficient platform that simplifies the complex design. It also supports artificial intelligence, which is planned to be used for data processing in the future. The THERESA sampling system is the result of close collaboration between engineers and beamline scientists. This collaboration has produced one of the fastest digitizers available to the scientific community, with an analog bandwidth up to 20 GHz and a sampling rate up to 90 GS/s. When combined with the photonic time-stretch setup developed at Lille University, the system will be able to sample an incoming signal with an unprecedented frame rate of 8 TS/s. Continuous acquisition over long observation times, enabled by high-throughput readout electronics combined with an artificial intelligence framework, will open up new possibilities for real-time data processing and fast detection of rare events that are currently considered very difficult or almost impossible to observe.

#### ACKNOWLEDGMENTS

This work is supported by the Helmholtz Program-Oriented Funding (PoF), research program Matter and Technologies (Detector Technology and System) and the Franco-German ULTRASYNC ANR-DFG project, the French CEMPI LABEX, the Wavetech CPER, and the DYDICO project from Lille University.

#### REFERENCES

- [1] Bhushan, A. S., Coppinger, F., and Jalali, B., "Time-stretched analogue-to-digital conversion," *Electronics Letters* 34(9), 839–841 (1998).
- [2] Chou, J., Han, Y., and Jalali, B., "Time-wavelength spectroscopy for chemical sensing," *IEEE Photonics Technology Letters* 16(4), 1140–1142 (2004).
- [3] Goda, K., Tsia, K., and Jalali, B., "Serial time-encoded amplified imaging for real-time observation of fast dynamic phenomena," *Nature* 458(7242), 1145–1149 (2009).
- [4] Mahjoubfar, A., Goda, K., Ayazi, A., Fard, A., Kim, S. H., and Jalali, B., "High-speed nanometer-resolved imaging vibrometer and velocimeter," *Applied Physics Letters* 98(10), 101107 (2011).
- [5] Wong, T. T., Lau, A. K., Wong, K. K., and Tsia, K. K., "Optical time-stretch confocal microscopy at 1 μm," *Optics letters* 37(16), 3330–3332 (2012).
- [6] Bosworth, B. T., Stroud, J. R., Tran, D. N., Tran, T. D., Chin, S., and Foster, M. A., "High-speed flow microscopy using compressed sensing with ultrafast laser pulses," *Optics express* 23(8), 10521–10532 (2015).

- [7] Roussel, E., Evain, C., Le Parquier, M., Szwaj, C., Bielawski, S., Manceron, L., Brubach, J.-B., Tordeux, M.-A., Ricaud, J.-P., Cassinari, L., et al., "Observing microscopic structures of a relativistic object using a time-stretch strategy," *Scientific reports* 5(1), 1–8 (2015).
- [8] Mahjoubfar, A., Churkin, D. V., Barland, S., Broderick, N., Turitsyn, S. K., and Jalali, B., "Time stretch and its applications," *Nature Photonics* 11(6), 341–351 (2017).
- [9] Solli, D. R., Ropers, C., Koonath, P., and Jalali, B., "Optical rogue waves," *Nature* 450(7172), 1054–1057 (2007).
- [10] Hanzard, P.-H., Godin, T., Idlahcen, S., Rozé, C., and Hideur, A., "Real-time tracking of single shockwaves via amplified time-stretch imaging," *Applied Physics Letters* 112(16), 161106 (2018).
- [11] Herink, G., Jalali, B., Ropers, C., and Solli, D. R., "Resolving the build-up of femtosecond mode-locking with single-shot spectroscopy at 90 MHz frame rate," *Nature Photonics* 10(5), 321–326 (2016).
- [12] Herink, G., Kurtz, F., Jalali, B., Solli, D. R., and Ropers, C., "Real-time spectral interferometry probes the internal dynamics of femtosecond soliton molecules," *Science* 356(6333), 50–54 (2017).
- [13] Sun, S., Lin, Z., Li, W., Zhu, N., and Li, M., "Time-stretch probing of ultra-fast soliton dynamics related to Q-switched instabilities in mode-locked fiber laser," *Optics express* 26(16), 20888–20901 (2018).
- [14] Ryczkowski, P., Närhi, M., Billet, C., Merolla, J.-M., Genty, G., and Dudley, J. M., "Real-time full-field characterization of transient dissipative soliton dynamics in a mode-locked laser," *Nature Photonics* 12(4), 221–227 (2018).
- [15] Peng, J., Sorokina, M., Sugavanam, S., Tarasov, N., Churkin, D. V., Turitsyn, S. K., and Zeng, H., "Realtime observation of dissipative soliton formation in nonlinear polarization rotation mode-locked fibre lasers," *Communications Physics* 1(1), 1–8 (2018).
- [16] Liu, M., Li, H., Luo, A.-P., Cui, H., Xu, W.-C., and Luo, Z.-C., "Real-time visualization of soliton molecules with evolving behavior in an ultrafast fiber laser," *Journal of Optics* 20(3), 034010 (2018).
- [17] Suzuki, M., Boyraz, O., Asghari, H., Trinh, P., Kuroda, H., and Jalali, B., "Spectral periodicity in soliton explosions on a broadband mode-locked Yb fiber laser using time-stretch spectroscopy," *Optics letters* 43(8), 1862–1865 (2018).
- [18] Wang, Z., Nithyanandan, K., Coillet, A., Tchofo-Dinda, P., and Grelu, P., "Optical soliton molecular complexes in a passively mode-locked fibre laser," *Nature communications* 10(1), 1–11 (2019).
- [19] Touil, M., Becheker, R., Godin, T., and Hideur, A., "Spectral correlations in a fiber-optical parametric oscillator," *Physical Review A* 103(4), 043503 (2021).
- [20] Bielawski, S., Blomley, E., Brosi, M., Bründermann, E., Burkard, E., Evain, C., Funkner, S., Hiller, N., Nasse, M. J., Niehues, G., et al., "From self-organization in relativistic electron bunches to coherent synchrotron light: observation using a photonic time-stretch digitizer," *Scientific reports* 9(1), 1–9 (2019).
- [21] Chen, C. L., Mahjoubfar, A., Tai, L.-C., Blaby, I. K., Huang, A., Niazi, K. R., and Jalali, B., "Deep learning in label-free cell classification," *Scientific reports* 6(1), 1–16 (2016).
- [22] Mahjoubfar, A., Chen, C., Niazi, K. R., Rabizadeh, S., and Jalali, B., "Label-free high-throughput cell screening in flow," *Biomedical optics express* 4(9), 1618–1625 (2013).
- [23] Moon, S. and Kim, D. Y., "Ultra-high-speed optical coherence tomography with a stretched pulse supercontinuum source," *Optics Express* 14(24), 11575–11584 (2006).
- [24] Xu, J., Zhang, C., Xu, J., Wong, K., and Tsia, K., "Megahertz all-optical swept-source optical coherence tomography based on broadband amplified optical time-stretch," *Optics letters* 39(3), 622–625 (2014).
- [25] Kobayashi, M., Arashida, Y., Yamashita, G., Matsubara, E., Ashida, M., Johnson, J. A., and Katayama, I., "Fast-frame single-shot pump-probe spectroscopy with chirped-fiber Bragg gratings," *Optics letters* 44(1), 163–166 (2019).
- [26] Jiang, Y., Karpf, S., and Jalali, B., "Time-stretch lidar as a spectrally scanned time-of-flight ranging camera," *Nature photonics* 14(1), 14–18 (2020).
- [27] Xilinx Inc., Zynq UltraScale+ RFSoC Data Sheet: Overview (April 2021). v. 1.12.
- [28] Caselle, M., "KAPTURE-2. A picosecond sampling system for individual THz pulses with high repetition rate," *Journal of Instrumentation* 12, C01040 (January 2017).

- [29] Brosi, M., Steinmann, J. L., Blomley, E., Boltz, T., Bründermann, E., Gethmann, J., Kehrer, B., Mathis, Y.-L., Papash, A., Schedler, M., Schönfeldt, P., Schreiber, P., Schuh, M., Schwarz, M., Müller, A.-S., Caselle, M., Rota, L., Weber, M., and Kuske, P., "Systematic studies of the microbunching instability at very low bunch charges," *Physical review accelerators and beams* 22(2) (2019).
- [30] Wang, W., Caselle, M., Boltz, T., Blomley, E., Brosi, M., Dritschler, T., Ebersoldt, A., Kopmann, A., Santamaria Garcia, A., Schreiber, P., Bründermann, E., Weber, M., Müller, A.-S., and Fang, Y., "Accelerated Deep Reinforcement Learning for Fast Feedback of Beam Dynamics at KARA," *IEEE Transactions on Nuclear Science* 68(8), 1794–1800 (2021).
- [31] ON Semiconductor, NB6L295 2.5V / 3.3V Dual Channel Programmable Clock/Data Delay with Differential LVPECL Outputs (March 2012).
- [32] Analog Devices, HMC-AUH312-DIE (2019).
- [33] Rogers Corporation, RT/duroid 5880LZ High Frequency Laminates (2021). Publication # 92-137.
- [34] Panasonic, High Speed, Low Loss Multi-layer Materials (April 2021). Data Sheet.
- [35] Inc, P. L., "RFC-2.4MM SERIES 2.4MM 50 GHZ PCB EDGE MOUNT CONNECTOR." Online http://www.phxlogistics.com/parts-catalog?page=shop.product\_details&flypage=flypage-ask.tpl&product\_id=561&category\_id=4 (2020). (Datasheet).
- [36] Analog-Devices, "Ultra-wideband 4 GS/s Track-and-Hold Amplifier DC 18 GHz." Online https://www.analog.com/media/en/technical-documentation/data-sheets/hmc661.pdf (2015). (Version: v03.0615).
- [37] Samtec, ".050" LP Array<sup>™</sup> High-Speed High-Density Low Profile Open-Pin-Field Array, Terminal." Online https://www.samtec.com/products/lpam (2008). (Datasheet).
- [38] Texas Instruments, LMX2594 15-GHz Wideband PLLATINUM<sup>™</sup> RF Synthesizer With Phase Synchronization and JESD204B Support (April 2019).
- [39] VITA 57.1-2008 standard, "VITA 57.1-2008 standard." Online https://www.vita.com/FMC-News-Portal/6653462 (2008). (ANSI AND VITA RATIFY ANSI/VITA 57.1-2008 FPGA MEZZANINE CARD (FMC) STANDARD).
- [40] Texas-Instruments, "Low-noise clock jitter cleaner with dual loop PLLs and integrated 2.9-GHz VCO." Online https://www.ti.com/product/LMK04808 (2014). (Revision: K).
- [41] Xilinx, "Zynq UltraScale+ RFSoC ZCU216 Evaluation Kit." Online https://www.xilinx.com/products/boards-and-kits/zcu216.html (2022). (Accessed: 16 January 2022).
- [42] Xilinx Inc., Zynq UltraScale+ RFSoC RF Data Converter v2.4 Gen 1/2/3 (November 2022). v. 2.4.
- [43] Xilinx Inc., UltraScale+ Devices Integrated 100G Ethernet Subsystem v3.1 (February 2021). v. 2.1.
- [44] Xilinx, "PYNQ: PYTHON PRODUCTIVITY." Online http://www.pynq.io/home.html (2018). (On-line version: 2.6.1).
- [45] jupyter, "Jupyter Lab: A Next-Generation Notebook Interface." Online https://jupyter.org/ (2022). (On-line version).
- [46] Szwaj, C., Evain, C., Le Parquier, M., Bielawski, S., Brubach, J., Manceron, L., Tordeux, M., Labat, M., and Roy, P., "Improving the sensitivity of existing electro-optic sampling setups by adding Brewster plates: Tests of the strategy at SOLEIL," 6th International Beam Instrumentation Conference, JACoW proceedings (online) https://doi.org/10.18429/JACoW-IBIC2017-TUPCC08.
- [47] Szwaj, C., Evain, C., Le Parquier, M., Roy, P., Manceron, L., Brubach, J.-B., Tordeux, M.-A., and Bielawski, S., "High sensitivity photonic time-stretch electro-optic sampling of terahertz pulses," *Review of Scientific Instruments* 87(10), 103111 (2016).
- [48] Evain, C., Roussel, E., Le Parquier, M., Szwaj, C., Tordeux, M.-A., Brubach, J.-B., Manceron, L., Roy, P., and Bielawski, S., "Direct observation of spatiotemporal dynamics of short electron bunches in storage rings," *Physical review letters* 118(5), 054801 (2017).
- [49] Roussel, E., Szwaj, C., Evain, C., Steffen, B., Gerth, C., Jalali, B., and Bielawski, S., "Phase diversity electro-optic sampling: A new approach to single-shot terahertz waveform recording," *Light: Science & Applications* 11(1), 1–14 (2022).
- [50] Sun, F., Jiang, Z., and Zhang, X.-C., "Analysis of terahertz pulse measurement with a chirped probe beam," *Applied Physics Letters* 73(16), 2233–2235 (1998).