

## A High-Performance Data Acquisition System for Smart Cameras in Science

for obtaining the academic degree of

Doctor of Engineering (Doktor der Ingenieurwissenschaften)

approved by the Faculty of Electrical Engineering and Information Technology of the Karlsruhe Institute of Technology (KIT)

**DISSERTATION** 

by

**UROS STEVANOVIC** 

Referee: Prof. Dr. Marc Weber

Co-referee: Prof. Dr.-Ing. Jürgen Becker

Date of the oral examination: 12.12.2017

#### Summary

This dissertation introduces a novel smart camera platform that serves as a flexible data acquisition system for scientific applications. Recent technological progress has increased performance in the areas relevant to us: high data throughput, data processing, and detector performance. Current data acquisition solutions usually concentrate on one of these aspects. However, driven by research, the demands of scientific experiments on data throughput, speed, and flexibility are steadily increasing.

In this dissertation, we present a system that, in addition to high-speed data transfer, is also able to interpret incoming data at an early stage.

To demonstrate the full potential of the camera platform, we focus on X-ray imaging with synchrotron light sources. X-ray imaging applications can investigate the characteristics of technological and biological processes over microseconds in radiography and over milliseconds in tomography applications. These applications may require different sensors and complex experiments.

The new smart camera platform is part of a larger project, named UFO, which introduces a new concept for X-ray imaging. Online data assessment is used to enable data-driven feedback and active management of both the process and the data acquisition. This is achieved with a GPU platform for fast reconstruction, integrated data processing on the camera, and integration of the smart camera into a high-throughput data acquisition system.

The final design of the smart camera platform consists of a custom-made high-performance FPGA board that provides continuous data transfer, embedded image processing, and a flexible input stage. At the IMAGE beamline of ANKA, the camera is integrated into the new control system and is used in real applications. A maximum data throughput of up to 8 GB/s is achieved. An adapted image-based algorithm with strict real-time requirements is implemented in the FPGA and increases the native sensor speed by up to a factor of five, while at the same time reducing the amount of transferred data. Several image sensors are used, each with a resolution of up to 20 megapixels and frame rates of up to 5 kfps. The smart camera platform was also used in non-imaging applications, which results from the flexible input stage. The proposed camera architecture allows the user to adapt the system to any high-throughput application and to adapt or specifically implement the processing algorithms.

### **Abstract**

This dissertation proposes a novel smart camera platform serving as a flexible data acquisition system for scientific applications. Current technological progress offers increasing performance in the areas we consider, namely high data-throughput, data processing, and detector performance. Prevalent data acquisition solutions typically focus on one of these aspects. However, driven by science, experiments experience increasing demands in terms of data throughput, speed and flexibility. In this dissertation, we introduce a system which, in addition to being able to provide high-speed data transfer, is also capable of interpreting the incoming information at an early stage.

In order to demonstrate the full potential of the smart camera platform, we focus on X-ray imaging with synchrotron light sources. X-ray imaging applications can investigate the traits of technological and biological processes over microseconds for radiography, and milliseconds for tomography applications. These applications may require different sensors, and include complex experiment operations.

The new smart camera platform is part of a larger project, UFO, which introduces a new concept for X-ray imaging. On-line data assessment is used to provide data-driven feedback and active management of both the process and the data acquisition procedure. This is accomplished using a GPU platform for fast reconstruction, embedded on-camera data processing, and integration of the smart camera in a high-throughput data acquisition system.

The final design of the smart camera platform consists of a custom high-performance FPGA board, providing continuous data transfer, embedded image processing, and a flexible input stage. At the IMAGE beamline of ANKA, the camera is integrated in the new control system and used in real-life applications. A maximum data throughput of up to 8 GB/s is achieved. A custom image-based algorithm with stringent real-time requirements is implemented in the FPGA; it is able to increase the native sensor speed up to five times while reducing the amount of transferred data. Several image sensors are used, with resolutions of up to 20 megapixels and frame rates of up to 5 kfps. The smart camera platform was also used in non-imaging applications, made possible by the flexible input stage. The proposed camera architecture enables the user to modify the current system for any kind of high data-throughput application, and to modify and implement custom processing algorithms.

#### Acknowledgement

I have many kind memories over the years it took me to finish my dissertation. There were certainly more than a few low moments, but I am confident that in the future I will remember them fondly. I am very grateful to Prof. Marc Weber, for giving me this opportunity, and for the patience and guidance during my work on the dissertation. I would also like to thank Dr. Andreas Kopmann and Dr. Michele Caselle, for their help in forming the dissertation and finding answers to relevant questions. Naturally, I would like to mention my colleagues Dr. Matthias Vogelgesang, Dr. Suren Chilingaryan, Dr. Lorenzo Rota and Timo Dritschler with whom I've spent many hours. For the numerous discussions we had, and for their friendship, I will always be thankful.

Last, but not least, my biggest gratitude and unending love to Reiko, for everything.

For Reiko

# **Contents**

| Chapter | Title                                         |
|---------|-----------------------------------------------|
| 1       | Introduction                                  |
| 2       | Smart camera platform concept                 |
| 2.1     | History of the digital cameras                |
| 2.2     | Smart cameras — definition and classification |
| 2.3     | Smart camera platform for UFO project         |
| 2.4     | Custom smart camera platform concept          |
| 3       | Smart camera platform implementation          |
| 3.1     | Data processing using FPGAs                   |
| 3.2     | Smart camera architecture design              |
| 4       | Detector module — pixel detectors             |
| 4.1     | Principle of operation                        |
| 4.2     | Charge-Coupled Devices (CCD) image sensors    |
| 4.3     | Monolithic CMOS image sensors                 |
| 4.4     | Acquisition process and noise sources         |
| 4.5     | Sensor characterization and comparison        |
| 4.6     | Image sensors characterization results        |
| 5       | Applications                                  |
| 5.1     | X-ray applications                            |
| 5.2     | The UFO project                               |
| 5.3     | Fast reject algorithm                         |
| 5.4     | Expansion of the streaming platform           |
| 6       | Conclusion                                    |

## 1 Introduction

For the last several decades, Moore's law [1] has accurately predicted the advancement in the development of integrated circuits. In short, Moore's law states that every one to two years, the number of transistors in the same floor area will double. Even though Moore's law is effectively ending [2], new design techniques still keep the technological progress active. With the decrease in the size of transistors, the performance and speed of integrated circuits have increased, while the power consumption, size, and cost of computing devices have decreased. This in turn has caused substantial performance improvements and the development of new concepts in computing, data transfer, and data acquisition.

For data-intensive scientific imaging applications, there has been continuous progress in both data processing and digital sensors. For data processing, the initial setback caused by the power consumption [3, 4] and higher clock frequency was overcome by further scaling down of the technology (currently at 7 nm) and by the development of multithreaded and multi-core architectures. Besides multi-core processors, technological progress has led to increased performance in large-scale parallel data tasks using massively parallel architectures such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs).

For visible photon detectors, the dominant technology up to the 2000s was the Charge Coupled Device (CCD) [5] due to its sensitivity and low noise performance. However, in the past decade, the maturing Complementary Metal Oxide Semiconductor (CMOS) technology has provided attractive features, especially in terms of on-chip functionality, low power consumption, and high-speed imaging [6]. For high-speed imaging especially, modern CMOS sensors operate at tens of thousands of frames per second (fps), with data rates of several gigabytes per second (GB/s). This has led to new possibilities in scientific imaging applications, and to new challenges in obtaining and processing such high volumes of data.

The ongoing technological development of high-speed sensors and data processing capabilities has not always been fully exploited so far. Up to now, there is no general technological solution that allows the user to easily couple sensors and data processing while keeping the processing throughput that modern GPUs or FPGAs are capable of achieving. Furthermore, it is commonly not possible to use the same data acquisition chain in applications with differing, sometimes even conflicting, requirements. Present scientific imaging applications and experiments use a single device, a camera, to obtain visual data and transfer it to the processing stage. Usually, the device emphasizes one particular property, i.e. the image sensor properties, the data throughput, or the internal data processing, but not all of these features are attained in a single device or camera.

In this dissertation, we will present a smart camera platform for scientific vision applications capable of providing both high data throughput and performant data processing. As an example of scientific applications, X-ray imaging was selected, where synchrotron radiation serves as a valuable tool for scientific and technological research and development. Synchrotron-based X-ray imaging is a powerful method for non-destructive, high-resolution investigations of a broad range of samples for life and material sciences. At the ANKA synchrotron radiation source, located at the Karlsruhe Institute of Technology [7], a new beamline is under construction for high-speed and high-precision X-ray imaging. The focus of the new beamline is on the non-invasive research of materials and biological samples. The high-brilliance and high-coherence synchrotron radiation facility allows sub-micrometer, quantitative 2D and 3D imaging at micro- or nanosecond time scales. Unlike conventional imaging applications, X-ray imaging experiments have challenging requirements for high data rates and streamed data processing.

Currently available high-speed commercial cameras have several drawbacks in terms of data transfer, recording time, and data processing. For example, at ANKA, a camera [8] is used for high-speed X-ray imaging applications in which most of the operating time is spent on the data transfer rather than the data acquisition. This is due to the large discrepancy between the image sensor speed and the data transfer rate of the camera. Images are stored in the camera's internal buffer before being sent further. A new data acquisition is only possible when all data is transferred. This type of operation is typical for most commercial cameras. Consequently, commercial cameras are usually not able to provide online data processing, due to their limited or non-existent streaming capabilities.
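The effect of this buffer-then-transfer operation can be estimated with a simple model. The following Python sketch is only an illustration under stated assumptions: acquisition and transfer never overlap, both run at constant rates, and the function name and the example rates are hypothetical rather than measured values of the camera in [8].

```python
def acquisition_duty_cycle(sensor_rate_gb_s: float, link_rate_gb_s: float) -> float:
    """Fraction of operating time spent acquiring for a buffer-then-transfer camera.

    Assumption: filling a buffer of size B takes B / sensor_rate and draining it
    takes B / link_rate, one after the other. B cancels out of the ratio.
    """
    if sensor_rate_gb_s <= 0 or link_rate_gb_s <= 0:
        raise ValueError("rates must be positive")
    t_acquire = 1.0 / sensor_rate_gb_s   # time to fill one unit of buffer
    t_transfer = 1.0 / link_rate_gb_s    # time to drain the same unit
    return t_acquire / (t_acquire + t_transfer)


# Hypothetical example: the sensor writes at 4 GB/s, the link drains at 1 GB/s.
duty = acquisition_duty_cycle(4.0, 1.0)
print(f"time spent acquiring: {duty:.0%}")
```

With these assumed rates the camera acquires for only about 20% of the time. A streaming camera whose link matches the sensor rate would not need this alternation at all.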

Another common drawback of commercial cameras is their construction as closed systems, where the firmware is not open to the users. Therefore, adaptation, modification, and optimization of the camera's functionality to the application requirements is generally not possible. Additionally, image sensors within the camera are not interchangeable, and sensor parameters are commonly hidden from the user, with very few parameters available for optimization. This in turn limits the usage of a camera, and typically more than one camera with different sensor parameters is needed for experiments with divergent requirements.

This dissertation will present the architectural reasoning of the smart camera platform in order to answer the presented demands. The platform takes advantage of the current technological progress, leading to significant increases in the performance and speed. The platform is able to work with different sensors, or data inputs, and at the same time to provide high throughput data processing. We also provide a general data acquisition chain that can be used in other non-vision high throughput applications. The flexible design of the smart camera platform expands the possibilities of current scientific applications and at the same time serves as a tool for the development of new applications.

#### Research points and goals

High-performance and high-speed applications are not only needed for image applications. The goal of the smart camera platform is to provide a flexible approach for high-performance and high-throughput data acquisition.

The ANKA accelerator consists of a 53 MeV microtron as a preaccelerator, a 500 MeV booster synchrotron and a 2.5 GeV storage ring [9]. For X-ray imaging applications, KIT's ANKA IMAGE beamline [10] provides high-energy and high-coherence X-ray beams, which are suitable for data acquisition with a high sampling rate and high spatial resolution. The IMAGE beamline is developed as a part of a new project called UFO ("Ultra-Fast X-ray Imaging of Scientific Processes with On-line Assessment and Data-driven Process Control") [11]. The main project goals are to provide fast feedback based on data, and fast online reconstruction, in order to optimize the beam and actively control the experiments. This enables, for example, the investigation of processes at a micrometer level, and the study of changes that occur on a microsecond scale. Additionally, it provides the ability to scan more samples and items within the same time period. High spatial resolution was initially achieved using "indirect detection" and cameras with CCD sensors. However, in order to achieve the highest frame rates, on the order of a thousand fps, the most promising idea is to use CMOS sensors instead [12]. With this, commercial components can also be employed.

For high-speed imaging, the approaches so far were either to use custom-made devices [13] or, more commonly, to use commercial cameras [14, 15]. However, commercial systems impose a hard limit on the users, i.e. there is no currently available detector that is capable of continuously streaming data at a millisecond or sub-millisecond time range to the processing stage. Current devices usually possess an internal memory that is used to store the data at the highest rate, and the data is subsequently transferred further. The streaming capability is necessary in order to optimize the performance, such as increasing the duration of the experiment, improving beam time usage, obtaining all the desired results, or implementing novel data-based feedback loops.

Increasing the data throughput, spatial and temporal resolution, requires a flexible detector that is able to cope with the large amount of data and different experiment requirements. Besides, the lack of embedded data processing prevents novel experiments based on event-triggered feedback, or the ability to control the experiment based on the data processing results.

Advancing reactive, fast experiments necessitates a detector capable of high-throughput data acquisition and data processing, as well as versatility in operating image sensors. So far, none of the experiment setups provide the necessary solutions to such requirements. Instead, the usual approach is to use commercial systems or solutions tailored to one particular experiment, which may limit the achievable results. Since the detector, or camera, is intended to be used by scientists who do not necessarily have an engineering background, it is important to provide an interface to the camera that "feels" familiar and is not complicated to use. Therefore, the camera must work like commercial ones, and must provide simple access to its more sophisticated features such as embedded processing.

Focusing on these identified current limitations, the main questions addressed in this dissertation are:

- **Flexible and exchangeable input stage.** X-ray imaging experiments have differing requirements in terms of speed, data throughput, and spatial and temporal resolution. The requirement for the new camera platform is *modularity*, where sensors can be interchanged as necessary, or replaced with a custom input stage for non-vision applications.
- **High-throughput data streaming.** High-speed imaging applications, with frame rates in the hundreds of fps and sensor resolutions in megapixels (MPix), have demanding requirements for data transfer. Typically, this necessitates data transfers that exceed several GB/s. The smart camera platform has a hard requirement to handle this throughput.
- **Embedded processing.** In order to achieve data-driven fast feedback, real-time data processing near the sensor is needed. The data processing is required to extract the relevant scientific information and to reduce the amount of raw data. Furthermore, depending on the application, different data processing algorithms can be implemented.
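To give a feeling for the streaming requirement, the raw data rate of a sensor can be estimated from its resolution, frame rate, and bit depth. The following Python sketch is a back-of-the-envelope illustration only: the function name, the example sensor parameters, and the bit depth are assumptions, and protocol overhead is ignored.

```python
def required_throughput_gb_per_s(megapixels: float,
                                 frames_per_second: float,
                                 bits_per_pixel: int) -> float:
    """Raw sensor data rate in GB/s (1 GB = 1e9 bytes), without any overhead."""
    bytes_per_frame = megapixels * 1e6 * bits_per_pixel / 8
    return bytes_per_frame * frames_per_second / 1e9


# Hypothetical sensor: 4 MPix, 500 fps, 10 bits per pixel.
rate = required_throughput_gb_per_s(4.0, 500.0, 10)
print(f"required raw throughput: {rate:.2f} GB/s")
```

For this assumed sensor the raw rate is 2.5 GB/s, which already lies in the multi-GB/s range named in the list above.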

These questions led to the problems and design considerations further delineated in the remaining chapters. The focus of the novel smart camera framework is to overcome the limitations of existing solutions, and to enable easy customization for experiments. The main design goals of the smart camera platform are modularity, streaming capability, and embedded data processing. With modularity, the sensor or detecting device can be chosen according to the experiment demands, and access to all of its parameters is available to the user. Streaming removes the bottleneck of acquiring the data at full image sensor speed, with the further benefits of optimized beam usage and the elimination of hard experiment time limits. Using the embedded processing capabilities, it is possible to have preprocessing close to the sensor, to implement image processing algorithms, to have a feedback-based event capture, or any combination of the three.

The smart camera platform is part of the general UFO DAQ framework [11], and it is accompanied by the readout PC and GPU servers for enhanced processing. This setup allows the partitioning of functions between programmable logic, the readout PC, and accelerators (like GPUs or FPGAs). As mentioned, the camera will be used for X-ray imaging, and the benefits of such a camera are further highlighted by its integration in the IMAGE beamline control setup [16]. The following chapter provides an overview of the existing smart camera solutions, explains in more detail the concept behind the smart camera platform, and makes the case for the architectural solutions used in its final design.

# 2 Smart camera platform concept

A smart camera is a vision system that can perform tasks far beyond simply taking photos and recording videos. Smart cameras are not simple vision devices; they are embedded systems that can process and extract information from the incoming data. Smart cameras combine video sensing, processing, and communication. Their area of application has vastly expanded in recent years, and includes, among others, industrial vision, robotics, traffic control, security systems, and high-performance imaging. They can detect motion, measure objects, and interpret human behavior. The ever increasing demands in speed, amount of information, and real-time data processing have caused a growing interest in the research communities in the development and expansion of smart cameras. The advantage of smart cameras over conventional cameras is that they provide image analysis together with the image acquisition, often in one compact system. This chapter will present the history and development of smart and, naturally, digital cameras, as well as the goals and rationale for the implemented smart camera platform.

#### 2.1 History of the digital cameras

Digital cameras capture and store images in digital form on digital memory cards, internal storage memory, or transfer data directly to the receiving device. The concept of digitizing video signals was initially used in scanners [17]. The history of the digital cameras starts with the development of the first practical videotape recorder (VTR). Charles Ginsburg led the development of the first practical implementation at Ampex Corporation [18, 19]. It converted the incoming electrical impulses and stored the information onto magnetic tape.

The advancement of digital photography was influenced by the US space program and military spy applications. Early spy satellites used airborne retrieval of photographs on film, which was complicated. This created a need for an electronic image capturing device that could replace film. NASA switched to digital technology in the 1960s, during the development of the Apollo Lunar Exploration Program. The first mention of "digital photography" was in 1961 by Eugene F. Lally of the Jet Propulsion Laboratory, when he published the first description of how to produce digital still photos using a mosaic photosensor [20]. The purpose was to provide onboard navigation information to astronauts on missions to planets.

Solid state imaging started in the 1960s, when different research groups were working on NMOS, PMOS, and bipolar processes with varying degrees of success. In 1963, a structure that allowed the determination of the spatial position of a light spot was mentioned [21]. In 1964, IBM presented the *Scanistor* [22], which used an array of npn junctions. The term "pixel" was mentioned for the first time in 1964 [23]. A monolithic mosaic of photon sensors was presented in 1966, with a monolithic 50 × 50 pixel array of phototransistors [24].

All of these sensors operated without integration, and their sensitivity was very low. In 1967, Weimer et al. [25] introduced a 180 × 180 pixel sensor array using CdS/CdSe thin-film transistors. In 1968, Noble [26] described several configurations of self-scanning silicon image detector arrays, and also documented the issue of fixed pattern noise [27]. The first charge-coupled device (CCD) chip was proposed by Boyle and Smith in 1969 [28]. The first confirmed attempt at building an actual digital camera was in 1975 by Kodak [29]. The camera used the solid-state CCD image sensor chips developed by Fairchild Semiconductor in 1973 [30]. It recorded black and white images to a cassette tape, with a resolution of 0.01 MPix, and took 23 seconds to record an image. In 1981, Sony launched the *Mavica* electronic still camera. In 1985, Fairchild introduced the first line scan camera, using a sensor array with pixels in one row only. The first megapixel image sensor was introduced by Kodak in 1986 [31].

The development of the digital cameras, and consequently smart cameras, was closely related to the progress of digital electronics. More advanced features were only made possible with the recent advancement in processing capabilities of integrated circuits.

#### 2.2 Smart cameras — definition and classification

The earliest commercial smart cameras can be traced back to the 1980s, with the invention of the optical mouse [32]. The sensing and processing capabilities of early smart cameras were very limited. Their applications were very limited as well, mostly restricted to simple machine vision tasks.

There is no clear definition of the term "smart camera". According to *Belbachir et al.* [31], the first publication of the term was in 1975 [33]. There are many established descriptions of smart cameras, but they differ between manufacturers, areas of application, and academic groups. In most definitions, the image processing feature is stressed as the crucial one; however, many cameras have some kind of processing capabilities. Most modern consumer cameras provide processing such as auto-focus, automatic white-balance, or image compression, among others. This does not, however, make them "smart". These processing functions are provided with the purpose of improving image quality, or for data reduction and more efficient data transfer. The main purpose of image processing in smart cameras is to generate events, react, or provide decisions for other devices, all in an automated fashion, depending on the information extracted from the incoming images.

A simplified representation of the smart camera structure, found in one of the seminal books on the subject [31], is shown in Figure 2.1.

**Figure 2.1.** Simplified functional structure of a smart camera, from [31].

The *optics* provide the proper illumination of the image sensor. The *image capture* block consists of a solid-state image sensor (CMOS or CCD) and associated circuits or components that ensure conversion from light to a digitized image array. The application-specific information processing (ASIP) block is what makes the camera smart. The goal of ASIP is not only to provide better image quality, for example by removing noise, but to process and extract the information from the incoming images and react or propagate the information to the user or for further processing. The *communication interface* receives commands or instructions from a user or a host, and sends out data or decisions to a user or an intelligent system.

Belbachir et al. [31] define three important aspects for a smart or "intelligent" camera:

- *Frame acquisition* defines the ability of the camera to acquire images. The images do not necessarily have to be in visible light (e.g. infrared cameras).
- *Embedded processing* in cameras is achieved using an FPGA, a Digital Signal Processor (DSP), or a CPU/GPU combination. Embedded memory and a communication interface are also needed for the camera to operate in an autonomous and automatic way.
- *Event generation* means that the camera utilizes the built-in processing capabilities not only for improving the image quality, but also to detect a predefined event, and to be able to properly react to it.

This can be compared to the concept of *Trigger* in High Energy Physics (HEP), where *Frame acquisition* corresponds to data acquisition, *Embedded processing* to processing of the initial data, and *Event generation* to event building, used to define when and which data should be stored for offline analysis [34].
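The interplay of the three aspects can be illustrated with a small software model. The following Python sketch is purely illustrative and not the implementation used in this work: frames are plain nested lists, the embedded processing is reduced to a mean intensity, and the names `Event`, `mean_intensity`, and `smart_camera` as well as the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

Frame = List[List[int]]  # a grayscale image as rows of pixel values


@dataclass
class Event:
    frame_index: int
    mean_intensity: float


def mean_intensity(frame: Frame) -> float:
    """Stand-in for embedded processing: reduce a frame to a single number."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def smart_camera(frames: Iterable[Frame], threshold: float) -> Iterator[Event]:
    """Yield an event for every frame whose mean intensity exceeds the threshold."""
    for index, frame in enumerate(frames):   # frame acquisition
        value = mean_intensity(frame)        # embedded processing
        if value > threshold:                # event generation
            yield Event(index, value)


# Three tiny 2 x 2 test frames; only the second one is bright enough.
frames = [[[0, 0], [0, 0]], [[10, 20], [30, 40]], [[1, 1], [1, 1]]]
events = list(smart_camera(frames, threshold=5.0))
print(events)
```

Only frames that trigger an event leave the loop, which mirrors the trigger idea of deciding early which data is worth keeping.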

Just as there is no common definition of smart cameras, there is also no general classification of them. One way of classifying smart cameras is based on their functionality. This has the advantage of clearly indicating where the smart cameras are used, e.g. in machine vision, robotics, surveillance, etc., and they can be categorized accordingly. The classification used in [35] distinguishes cameras according to whether the processing occurs in the sensor itself ("artificial retinas"), in the PC which is connected to the camera ("PC-based systems"), or in the camera itself, which possesses internal processing capability ("stand-alone cameras"). This, however, does not include distributed smart cameras.

In [36], cameras are organized in three general groups: single smart cameras, distributed smart cameras, and pervasive smart cameras. For single smart cameras, processing could be realized in the image sensor [37], a DSP [38, 39], an FPGA [40], or using processors [41]. Distributed smart cameras are formed when multiple smart cameras are integrated in a single common network. One of the benefits is the extended sensor coverage. Here, too, several classifications exist, depending on whether the smart cameras are organized in a centralized or decentralized manner, with various architectures and applications [42, 43, 44, 45]. Pervasive smart cameras add flexibility and autonomy to distributed smart cameras [46, 47, 48].

The classification of Belbachir et al. [31] is shown in Figure 2.2. Similar classifications can be found in other publications, e.g. [49]. It is similar to the already mentioned classifications, but provides more detailed information. The three main categories are: integrated smart cameras, compact-system smart cameras, and distributed smart cameras. Integrated smart cameras are further classified in three types: single chip smart cameras, embedded smart cameras, and stand-alone smart cameras. Their structure and common fields of usage are shown in Table 2.1.

#### 2.3 Smart camera platform for UFO project

The project Ultra Fast X-ray imaging of scientific processes with On-line assessment and data-driven process control (UFO) aims to develop the next generation of an X-ray computer tomography experimental station optimized for 3D and 4D imaging. The improved time resolution will give insight into the temporal evolution, allow scientists to better understand functional units of devices and organisms, and help to optimize technical processes. The whole setup consists of three sections: the beamline providing X-rays, the UFO experimental station, and a high-performance data storage system (more details in Section 5.2).

**Figure 2.2.** Smart camera classification. From top to bottom, cameras have a decreasing level of integration (from [31]).

In order to establish high-speed volumetric imaging, some of the identified bottlenecks and limitations are embedded data processing, high-speed readout, and data transfer. Most modern cameras are a "black box", meaning that modifications of the existing functionality are not supported. Furthermore, it is not possible to implement any data processing on-camera. The other mentioned bottleneck in commercial cameras is their relatively slow data transfer (usually less than 1 GB/s) [50]. This is true even for high-speed cameras, which use internal memory to store the data at their highest speed, e.g. PCO [8]. These limitations exclude most commercial high-speed cameras, and the smart camera platform was designed to overcome them.

| Type                         | Structure                                                                  | Applications                                                                                                     |
|------------------------------|----------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| Single chip smart camera     | ASIP on the same chip as the image sensor, extremely low power, small size | Toys, sensors                                                                                                    |
| Embedded smart cameras       | Camera is embedded in another device such as a mobile phone                | Optical mice, fingerprint readers, smart camera phones                                                           |
| Stand-alone smart cameras    | "Normal" smart cameras, single camera casing                               | Industrial machine vision, human computer interfaces                                                             |
| Compact-system smart cameras | ASIP in a separate embedded system (e.g. PC)                               | Security, traffic surveillance, machine vision                                                                   |
| Distributed smart cameras    | Part of the system ASIP rendered by the network topology                   | Intelligent and pervasive video surveillance, industrial machine vision, pervasive information gathering systems |

**Table 2.1.** Structure and example applications of smart cameras (from [31])

The smart camera platform extends the original definition of smart cameras (see Section 2.2) by adding modularity, with exchangeable sensors for front-end modification. Additionally, it removes the identified bottlenecks with high-throughput streaming to eliminate the latency between the frames coming from the sensor and the processing stage, and provides embedded processing. This is illustrated in Figure 2.3.

FPGAs are selected for the smart camera design due to their flexibility in interfacing with peripheral hardware, their modifiability, and their parallel processing capabilities [51, 35, 40]. CMOS image sensors were selected as imaging components due to their advantages over CCDs in terms of speed and ease of operation. The structure and the functionality of the basic building blocks are explained in more detail in subsequent chapters.



**Figure 2.3.** Smart camera platform concept. The application-specific areas are easily modifiable to fit the application requirements. The underlying basic structure and interface do not have to be modified.

#### 2.4 Custom smart camera platform concept

Many definitions of smart cameras emphasize the built-in image processing ability [31]. Smart cameras are capable of extracting application-specific information from the captured images. Technological progress in integrated circuit design benefits smart camera design in particular, providing speed, flexibility, and parallel processing capabilities. Flexible camera architectures can be adapted to a large number of applications. FPGAs provide dedicated functional blocks and task parallelism that can perform complex processing operations with real-time constraints [40] and are successfully used to accelerate computing applications [51]. This dissertation proposes a modular FPGA-based high-throughput platform intended for scientific high-speed imaging applications [52, 53]. The architecture of the camera is presented in Figure 2.4. CMOS image sensors are selected as the image sensing devices and are located on interchangeable daughter carrier cards. The main readout board contains a Xilinx Virtex-based FPGA [54] and Double Data Rate (DDR) Random Access Memory (RAM). The FPGA architecture contains the interface to the CMOS sensor, the data readout, and the custom logic.

#### Concept of modularity

For the selected field of usage of the smart camera platform, X-ray synchrotron imaging, experiments have different and sometimes mutually exclusive requirements. Third generation synchrotron sources are almost ideal instruments for a large number of applications, like structural biology and material science. The possibilities for the improvement and the development of new studies have only expanded in recent years with the technological development [55]. Depending on which techniques are employed (e.g. crystallography and tomography), and on the processes to be observed, the requirements in terms of speed, resolution, and amount of data differ. The usual approach to resolving these conflicting demands was to have several different cameras with image sensors attuned to one specific condition. This may complicate the operation of the beamline, since each camera needs to be integrated into the beamline.

**Figure 2.4.** Smart camera architecture. We can identify three main areas: a) modularity, with the CMOS sensor and input stage, b) embedded processing stage, and c) fast streaming readout.

The modular design of the smart camera offers the opportunity to select the CMOS image sensors according to the application requirements and the respective sensor characteristics, while the means of operating the camera remain unchanged. Furthermore, as we can see in Figure 2.4, the camera platform can be used as a data acquisition platform, where the input stage can be decoupled from the rest of the architecture. More information will be provided in the subsequent chapters.

#### Embedded processing

FPGAs were already used for image acquisition and preprocessing in the 1990s [56, 57, 58]. With technological advancements, FPGAs were applied to more complex operations [58, 59]. Field programmable devices provide flexibility, reconfigurability, and parallel processing, which are especially suitable for smart camera design. FPGAs provide dedicated functional blocks and task parallelism that can perform or accelerate complex processing operations with real-time constraints [40]. Our smart camera platform was used to implement a custom image processing algorithm, fast reject, which is described in more detail in Chapter 5.3.

#### High data throughput

Modern high-speed image sensors provide a large amount of data, e.g. the AM1X5 image sensor [60] has a speed of 5 kfps at 1K x 1K pixels, which corresponds to about 50 Gb/s of data. As shown in Figure 2.4, a standard PCI Express (PCIe) connection is provided to transfer the data from the camera directly to the computer. The benefits of using PCIe are low overhead and a high data throughput. For example, PCIe GEN3 x8 (8-lane connection of generation 3) provides a bandwidth of 64 Gb/s [61]. A custom PCIe-DMA (Direct Memory Access) module, operating as a Bus Master, i.e. it can request control of the PCIe bus, is used for a continuous data transfer to the system memory. The principle of operation of the custom DMA module will be explained later.
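The figures quoted above can be checked with a short calculation. The following Python sketch is only an illustration: the pixel depth of 10 bits is an assumption (it is not stated above, but it makes the quoted 50 Gb/s plausible), the function names are hypothetical, and the 128b/130b line encoding is the one used by PCIe generation 3.

```python
def sensor_data_rate_gbps(frames_per_s, width, height, bits_per_pixel):
    """Raw sensor output in Gb/s (1 Gb = 1e9 bit)."""
    return frames_per_s * width * height * bits_per_pixel / 1e9


def pcie_raw_rate_gbps(gigatransfers_per_s, lanes):
    """Raw link rate: one bit per transfer and lane."""
    return gigatransfers_per_s * lanes


def pcie_gen3_encoded_rate_gbps(lanes):
    """Link rate after 128b/130b encoding (8 GT/s per lane for Gen3)."""
    return pcie_raw_rate_gbps(8.0, lanes) * 128 / 130


# Assumption: 1K x 1K is taken as 1024 x 1024 pixels at 10 bits per pixel.
sensor = sensor_data_rate_gbps(5000, 1024, 1024, 10)
link_raw = pcie_raw_rate_gbps(8.0, 8)
link_encoded = pcie_gen3_encoded_rate_gbps(8)

print(f"sensor output      : {sensor:.1f} Gb/s")
print(f"PCIe Gen3 x8 raw   : {link_raw:.1f} Gb/s")
print(f"PCIe Gen3 x8 coded : {link_encoded:.1f} Gb/s")
```

Under these assumptions the sensor output stays below the link rate even after the encoding overhead is taken into account; protocol overhead of the transaction layer is not modeled here.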

# 3 Smart camera platform implementation

The smart camera platform was designed with flexibility and versatility as its main features. In the first part of the chapter, we present the history and features of FPGAs and their suitability for embedded processing. We also review current smart camera implementations. In the second part, we present the final implementation of the smart camera platform, in the form of a custom FPGA board, designed and produced at IPE, KIT. We explain how the functionalities of the platform are partitioned, and the reasoning behind it.

#### 3.1 Data processing using FPGAs

Reconfigurable hardware (RH), which is what FPGAs essentially are, provides flexibility for implementing features and functionalities in hardware. The hardware consists of a set of logic and routing resources controlled by configuration memory. Originally, reconfigurable hardware was used as glue logic, connecting functionalities implemented in Application Specific Integrated Circuits (ASICs). As the price per unit and the functionalities of FPGAs improve, fixed-design ASICs are becoming less appealing [62]. Many applications benefit from the ability to tailor the functionalities to the application in a fine-grained manner, or, e.g., from the ability to reconfigure the hardware at runtime [63]. The applications of interest to us are scientific data acquisition and analysis [64, 65], and image and video applications. As mentioned in [63], for image and video applications, the design can be easily changed to fit the application [66, 67].

#### FPGAs — History and structure

A Field-Programmable Gate Array (FPGA) is a large-scale integrated circuit that can be programmed after it is manufactured rather than being limited to a predetermined, unchangeable hardware function. The term "field-programmable" refers to the ability to change the operation of the device "in the field", while "gate array" is a somewhat dated reference to the basic internal architecture that makes this after-the-fact reprogramming possible [68]. FPGAs have become one of the key digital circuit implementation media over the last decade. A crucial part of operating FPGAs lies in their architecture, which controls their programmable logic functionality and their programmable interconnect ability [69]. One main advantage of FPGAs over conventional computers and their Arithmetic Logic Units (ALUs) is that the functionalities are implemented and executed as a parallel process, rather than a sequential one. More information about this is presented later in the dissertation.

The origin of FPGAs is related to the development of integrated circuits in the 1960s. The ability to change the logic function of a chip after the fabrication process was achieved with the introduction of cellular arrays [70]. The functionality of each logic cell in the array could be determined by setting programmable fuses, with the use of programming currents or photo-conduction exposure. In the 1970s, devices based on read-only memory (ROM) were produced. These first circuits, programmable logic arrays (PLA), and later, programmable array logic (PAL), relied on AND-OR logic planes to achieve the desired functionality [71], as shown in Figure 3.1.

**Figure 3.1.** PLA (a) and PAL (b) structure [69]

The first modern FPGA was manufactured by Xilinx in 1984 [72]. It contained 64 logic blocks and 58 inputs and outputs (I/O). Current FPGAs contain around 2 million logic cells and have around 1200 I/O pins [73, 74]. They have sufficient logic resources (and associated interconnect routing) to implement many complex applications on a single chip [75]. More recent trends have led to the integration of specific functional blocks within the FPGA, including multipliers, memories, high speed input-output interfaces, and even serial processor cores [76].

FPGAs are based around a matrix of Configurable Logic Blocks (CLBs) connected through programmable interconnects. As opposed to Application Specific Integrated Circuits (ASICs), where the device is custom built for the particular design, FPGAs can be programmed to the desired application or functionality requirements. Although One-Time Programmable (OTP) FPGAs are available, the dominant type is SRAM-based and can be reprogrammed as the design evolves. While the structure of the FPGA shown in Figure 3.2 uses Xilinx terminology, i.e. CLB, the principle is the same for other FPGA manufacturers.



**Figure 3.2.** FPGA block structure, from Xilinx [77]

Another, more general representation of FPGAs can be found in Figure 3.3. The basic logic blocks are usually based on a look-up table (LUT) structure, which enables them to implement any logic function. The logic blocks are usually organized in a grid structure and interconnected via a programmable routing matrix that enables the blocks to be connected in arbitrary configurations. This also means that any signal can be connected to virtually any I/O pin of the device, with some limits in the performance. FPGAs also have to provide clock synchronization to control the timing of the clock signal relative to an external source. A clock distribution network provides clock signals to all parts of the FPGA while limiting the clock skew between different sections of a design. There is also some dedicated logic for loading the configuration into the FPGA. This logic does not directly form part of the user's design, but is required for FPGAs to be programmable [75].

**Figure 3.3.** FPGA basic architecture, from [75]

The flexibility of FPGAs comes at the cost of increased area, delay, and power consumption compared to an ASIC: an FPGA requires approximately 20 to 35 times more area than a standard cell ASIC, is roughly 3 to 4 times slower than an ASIC, and consumes roughly 10 times as much dynamic power [78]. These disadvantages arise largely from an FPGA's programmable routing fabric, which trades area, speed, and power for ease of use and reconfigurability. However, despite these disadvantages, FPGAs are increasingly used in digital design. With the progress of Moore's law and the current state of the art of deep submicron ASIC technology, designing an ASIC is becoming increasingly demanding in terms of time and money. The Computer Aided Design (CAD) tools necessary for synthesis, routing, simulation, etc. are becoming increasingly expensive, as are the production costs. This makes FPGA design a common choice when a small volume and a fast turnaround are required.
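Returning to the look-up table principle described above, the following Python sketch illustrates why a LUT can implement any logic function of its inputs: the function is stored as a truth table, and the inputs merely form the address of the table entry to read. This is a software analogy only; the names `make_lut` and `lut_eval` are hypothetical and do not correspond to any vendor tool or primitive.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute the truth table of func; these are the LUT 'configuration bits'.

    Entries are ordered so that the first input is the most significant
    address bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_eval(table, *inputs):
    """Evaluate the LUT: the inputs form the address into the stored table."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]


# Two different functions realized by the same mechanism, differing only
# in the stored table contents.
xor2 = make_lut(lambda a, b: a ^ b, 2)
majority3 = make_lut(lambda a, b, c: (a + b + c) >= 2, 3)

print(xor2)                        # truth table of a 2-input XOR
print(lut_eval(majority3, 1, 1, 0))
```

Changing the implemented function only requires loading a different table, which mirrors how reconfiguration changes the behavior of a logic block without changing the hardware.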

#### Programming principles of the FPGAs

Each FPGA uses some programming technology to control the switches that provide its programmability. There are several programming technologies, and their differences have a significant effect on the programmable logic architecture. The approaches that have been used historically include EPROM [79], EEPROM [80], flash [81], static memory [72] and anti-fuses [82]. Of these approaches, only flash, static memory, and anti-fuse are widely used in modern FPGAs [69].

FPGAs are programmed by loading the configuration file into the internal memory. The configuration file is generated beforehand by the user and streamed onto the FPGA. When using an external flash memory, as shown in Figure 3.4, FPGA boards can be set up in such a way that they are automatically configured on power-on. Naturally, FPGAs can be reconfigured at any time, either serially (usually via JTAG) or by loading a new configuration file into the flash memory [75].



**Figure 3.4.** FPGA configuration [75]

The two main FPGA manufacturers in terms of market share are Xilinx and Altera, although there are several others that provide FPGAs. The basic structure of the FPGAs from these providers always corresponds to what was already described. FPGA vendors also provide custom functionalities, in the form of Intellectual Property (IP) designs. These can be in the form of hard IP, where the functionality is already present as an on-die structure, or soft IP, where the core is delivered in a Hardware Description Language (HDL) or as a synthesizable netlist. With recent technological developments, the functionalities, and therefore the internal structure, have become more diverse. In the case of Xilinx and Altera, for example, modern FPGA chips provide multiple on-die designs, such as hard-IP processors or PCIe IP designs.

A typical FPGA design flow is presented in Figure 3.5. It consists of several key steps. First, the design must be captured in an easy to read form. While hardware was originally designed using schematics, the complexity of modern systems makes that impractical, and design is now typically done using HDLs. The two most common languages employed in logic circuit design are the Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) [84] and Verilog [85]. Using these languages, the user can describe the hardware architecture at different levels of abstraction, using a hierarchical and modular approach [86]. It is important to note the difference between behavioral HDL and synthesized HDL: only the latter will actually produce an exact representation of the final circuit.

**Figure 3.5.** FPGA design flow [83]

Register Transfer Level (RTL) design and simulation describe circuits at a higher level of abstraction. The lowest level is the transistor-level design, which involves connecting transistors into circuits to build gates and components. The next higher level is logic-level design, whose building blocks are logic gates. RTL design is the next abstraction level, which describes the data transfer between registers through other logical components [87]. Simulation results are bit and cycle accurate, which means that the simulation results should match the hardware representation [86].

Synthesis is the next step in the flow. It creates a netlist representing the actual hardware from the logic described using HDLs. Usually this representation is in the Electronic Design Interchange Format (EDIF) [88], which is vendor neutral. Synthesis constraints control aspects of the synthesis process and help optimize the design for speed, area, and power consumption [75].

Mapping, placing, and routing the design form the next step. First, the logic is mapped to the components that are actually available. Afterwards, these components are placed in particular logic blocks of the FPGA, and the routing that connects them is determined. The previously mentioned constraints are also used here to define the critical timing, as well as the actual Input/Output (I/O) pins. This is an iterative phase of the design: it can be run several times, also with changes in the RTL design if needed, to achieve the desired results. In the final step, the configuration file needed to program the FPGA is generated [75].

#### Overview of the image processing algorithms for FPGA

Image processing and accelerating image processing using FPGAs is not a new concept. Naturally, FPGAs are not the only devices applied for such purpose. There is a large body of research comparing FPGAs with not just CPUs, but also GPUs [89]. For many applications, GPUs provide better performance [90, 91]. GPUs are a good fit for applications where the processing algorithm can be efficiently mapped in high-level languages, and when there is no inter-dependency in the data flow. FPGAs are a good fit for applications that may involve a lot of detailed low-level hardware control operations and when taking advantage of data streaming and pipelining is especially important [92]. For embedded applications, and for this work, FPGAs are a more natural fit.

Embedded systems are usually dedicated to a specific task or purpose [93]. However, FPGAs with programmability and reconfigurability provide flexibility in developing the desired applications [94]. As mentioned, FPGAs also have the ability for parallel execution and data processing. Embedded image processing on FPGAs is, naturally, a subset of more general digital signal processing that is possible with FPGAs. Digital signal processing has many applications, in telecommunication, speech, audio, robotics, and, as mentioned, image and video processing [95].

When the light reaches the image sensor, it is converted to an electrical signal. Digital image sensors transfer the signal as an already digitized information. The methods of operation of CCDs and CMOS active pixel sensors are explained in Chapter 4. This chapter will present the image processing operations conducted when the digitized image reaches the FPGA.

In "Design for Embedded Image Processing on FPGAs" by Donald Bailey, an image processing algorithm is defined as "the sequence of image processing operations used to process an image from one state to another" [75]. Digital image processing, or image vision techniques, can be classified into three classes: low-level, intermediate-level, and high-level image processing algorithms [96]. For the low-level algorithms, both the input and the output are digital images. With the intermediate-level algorithms, the input is a digital image, but the output can be a low-level symbolic representation of the image features, e.g. the contour of an object, or a label associated with a region. High-level algorithms utilize symbolic representation as both input and output [96].

The classification of image processing algorithms is depicted in Figure 3.6. At the low level, the image is represented as a group of pixels. On their own, individual pixels carry little information, although the volume of pixel data can be high. At the higher, intermediate level, pixels are grouped into meaningful regions, which provide more information: a part of an image, or a group of pixels, can carry more information than just the values of its pixels. Features may be extracted, e.g. the location of an object, or different objects may be separated from one another. At the high level, the extracted data usually concerns distinguished regions and may be used to classify an object into a category or to describe an object [75].

The classification of image processing operations can also be represented as an image processing pyramid, where the algorithms are classified into three levels: low, intermediate, and high [97, 98, 99], shown in Figure 3.7.

**Figure 3.6.** Image transformation example, from low level to high level of information (from [75])

**Figure 3.7.** Image processing pyramid (from [99] and [75])

Low-level operations are at the bottom of the pyramid and are usually preprocessing operations. They are typically pixel-based operations, such as filtering and edge detection [100]. The purpose of these operations is to enhance the relevant information and remove irrelevant data, e.g. noise [75]. These operations typically work on large volumes of data and involve simple computations, like multiplication and addition.

At the intermediate level, algorithms process a digital image and produce a set of lines or regions [99]. These can be segmentation and classification algorithms. Segmentation operations can be thresholding or color detection, and these algorithms are at the boundary between the low and intermediate levels. The purpose of segmentation is to detect objects or regions of interest which have a desired quality.

After segmentation comes classification, where features of regions may determine or classify objects. Classification transforms the data from regions to features, and then to labels. The data is no longer image based, but position information can be contained within the features, or be associated with the labels. At the highest level, processing operations produce objects from symbolic inputs such as image features. One such operation is recognition, which derives a description or some other interpretation of the scene [75].

Depending on the type of data they operate on, image processing operations can be grouped into histogram operations, point operations, and filters [101]. Histograms are a statistical representation of the image. In the case of gray scale images, histograms represent the frequency of occurrence of gray levels [102]. For example, in the case of 8-bit gray scale images, a histogram may contain a maximum of  $2^8 = 256$  entries. Operations on histograms may include image enhancements, such as those involving contrast and dynamic range. Since a histogram contains no information about where each pixel is located in the image, it does not present any visual information about the appearance of the objects inside the image. Therefore, reconstruction of the image from a histogram is in general not possible [101]. Considering that in this dissertation we place emphasis on identifying and extracting information about objects, histogram operations are not of interest for this work.
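The loss of spatial information in a histogram can be illustrated with a minimal Python sketch. Images are represented here as plain nested lists of 8-bit gray values, which is an assumption made for illustration only, and the function name `gray_histogram` is hypothetical.

```python
def gray_histogram(image, bit_depth=8):
    """Count how often each gray level occurs in a 2D list of pixel values."""
    hist = [0] * (1 << bit_depth)   # 2**8 = 256 bins for 8-bit images
    for row in image:
        for pixel in row:
            hist[pixel] += 1
    return hist


# Two different images built from the same set of pixel values.
image_a = [[0, 0, 255],
           [0, 128, 255]]
image_b = [[255, 128, 0],
           [255, 0, 0]]

hist_a = gray_histogram(image_a)
hist_b = gray_histogram(image_b)

print(len(hist_a))        # number of histogram entries
print(hist_a == hist_b)   # identical histograms despite different images
```

Both images produce the same histogram, so the histogram alone cannot tell where the bright pixels are located, which is why an image cannot in general be reconstructed from it.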

Filter operations need more than one pixel from the source image for computing a new value of the pixel [101]. Usually, these multiple pixel values, or area or a window, are in the vicinity of the desired output pixel. The window size is arbitrary, and depends on the chosen operation and the intended result. Typically, filters are divided into linear and nonlinear filters. Some filter operations include smoothing of the picture, or detection of simple local structures (like edges). Furthermore, filter operations can detect motion, or be used for reconstruction and restoration of images, and noise reduction [103]. Since multiple pixels are needed for filters, for stream processing, parts of the images may be partly or wholly buffered [104]. This may impose constraints on the performance.

Unlike with filter operations, point operations are performed on a single pixel [101]. The value of the pixel depends only on the corresponding pixel value of the input image or input images, since point operations may be applied between multiple images [75]. In case of a single image, typical operations are modifying brightness and contrast, thresholding, and intensity transformations.

Of particular interest for this work are operations on multiple images, especially image subtraction and image comparison, since high performance point operations can be implemented in hardware [75]. Image subtraction is used to match one image against another. One commonly used operation to determine the similarity is the sum of absolute differences (SAD):

$$SAD(x, y, i, j) = \sum_{l=0}^{M-1} \sum_{k=0}^{N-1} \left| A_{(x+l, y+k)} - B_{(x+i+l, y+j+k)} \right| \tag{3.1}$$

where, at the current block location (x,y),  $A_{(x+l,y+k)}$  and  $B_{(x+i+l,y+j+k)}$  represent the pixels of the blocks in the current and a certain reference frame, respectively [105]. The values (x,y) and (i,j) are spatial coordinates: (x,y) is the position of the block within the image, and (i,j) is the displacement of the block in the reference frame. The summation indices (l,k) run over the pixels of the $M \times N$ block.

A second technique to ascertain the similarity between images is the sum of squared differences (SSD):

$$SSD(x, y, i, j) = \sum_{l=0}^{M-1} \sum_{k=0}^{N-1} \left( A_{(x+l, y+k)} - B_{(x+i+l, y+j+k)} \right)^2 \tag{3.2}$$
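Equations (3.1) and (3.2) can be written down directly as a minimal Python sketch. The frames are assumed to be plain nested lists indexed as `frame[first][second]` in the same order as the subscripts in the equations; the function names and the example data are hypothetical and serve only as an illustration.

```python
def sad(a, b, x, y, i, j, m, n):
    """Sum of absolute differences between an m x n block of a at (x, y)
    and the block of b displaced by (i, j), following Equation (3.1)."""
    return sum(abs(a[x + l][y + k] - b[x + i + l][y + j + k])
               for l in range(m) for k in range(n))


def ssd(a, b, x, y, i, j, m, n):
    """Sum of squared differences, following Equation (3.2)."""
    return sum((a[x + l][y + k] - b[x + i + l][y + j + k]) ** 2
               for l in range(m) for k in range(n))


# Current frame with a 2 x 2 object at (1, 1).
current = [[0, 0, 0, 0],
           [0, 9, 8, 0],
           [0, 7, 6, 0],
           [0, 0, 0, 0]]
# Reference frame with the same object displaced by (1, 1).
reference = [[0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 9, 8],
             [0, 0, 7, 6]]

print(sad(current, reference, 1, 1, 0, 0, 2, 2))  # no displacement: mismatch
print(sad(current, reference, 1, 1, 1, 1, 2, 2))  # correct displacement: 0
print(ssd(current, reference, 1, 1, 0, 0, 2, 2))
```

The displacement for which the measure is smallest indicates the best match; compared with SAD, SSD weights large pixel differences more strongly.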

SAD or SSD techniques can be used to detect changes in images. SAD is commonly used in fast motion estimation algorithms [106]. This is portrayed in Figure 3.8.

**Figure 3.8.** Illustration of image difference for change detection (from [75]). Left image is the original; in centre is the new image, with highlighted changes; right image represents the difference (with the difference highlighted). Offset is used to represent zero difference as mid-gray.

Using SAD, a zero difference between pixels signifies no change, while a non-zero result indicates a difference in the image [107]. The absence of objects can be detected, but also shifts in the position of the objects. One important caveat when using SAD is that the presence of noise, camera vibrations, or variable illumination between images will also result in a difference. There are several possible approaches to these problems. One approach to illumination differences is to establish stable, possibly artificial, lighting. Furthermore, if the images are acquired with a short exposure time, the lighting difference between images will be low. A short exposure time may also reduce the influence of some sources of noise. An additional approach is to provide a dynamic estimation of the background, which can adapt if conditions change, and may serve to remove the influence of variable illumination and noise [75]. As for camera vibrations, the experimental setup of the applications considered in this work precludes their occurrence.

For an adaptive background, which may include the presence of noise, the approach might be to start from the initial static background and provide a dynamic estimate of the successive backgrounds [108]. Simple models determine the mean value [109] or the median value of previous images [110]. A limitation of the mean approach is its sensitivity to parts of images that are variable by default, such as outdoor scenes. More robust models use bimodal and multimodal backgrounds, with complex modeling for each pixel value [111].
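The two simple background models mentioned above, the per-pixel mean and the per-pixel median of previous images, can be compared in a short Python sketch. The frames are assumed to be nested lists of equal size, and the function names are hypothetical; this is an illustration of the principle, not the method used in the cited works.

```python
from statistics import median


def mean_background(frames):
    """Per-pixel mean over a list of equally sized 2D frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / len(frames) for c in range(cols)]
            for r in range(rows)]


def median_background(frames):
    """Per-pixel median over a list of equally sized 2D frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


# Five 1 x 3 frames of a static background (value 10). In the last frame
# an object (value 110) briefly covers the first pixel.
frames = [[[10, 10, 10]] for _ in range(4)] + [[[110, 10, 10]]]

print(mean_background(frames))    # the mean is pulled towards the object
print(median_background(frames))  # the median keeps the background value
```

In this example the mean estimate of the first pixel is shifted by the transient object, whereas the median is not, which illustrates the sensitivity of the mean model to variable image content.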

An additional approach might combine SAD methods with thresholding, where values below certain thresholds might be considered as noise or background, or where the lower values might simply indicate movement that can be disregarded. This is especially the case with complex scenes and backgrounds. Double differencing can also be used to detect changes between images. It uses data from three successive images. This type of algorithm has been implemented to detect human motion [112], and also for vehicle detection using an FPGA [113].
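A minimal Python sketch of double differencing is given below. It assumes one common formulation, in which the thresholded absolute differences between the previous and current image and between the current and next image are combined with a logical AND; the cited implementations may differ in detail. The function name, threshold, and data are hypothetical.

```python
def double_difference(prev, curr, nxt, threshold):
    """Mark pixels that differ from both the previous and the next image.

    Each difference is thresholded first, so that small values are treated
    as noise or background.
    """
    return [[int(abs(c - p) > threshold and abs(n - c) > threshold)
             for p, c, n in zip(row_p, row_c, row_n)]
            for row_p, row_c, row_n in zip(prev, curr, nxt)]


# A bright object (value 100) moving one pixel to the right per frame
# over a background of value 10.
prev = [[10, 100, 10, 10, 10]]
curr = [[10, 10, 100, 10, 10]]
nxt = [[10, 10, 10, 100, 10]]

mask = double_difference(prev, curr, nxt, threshold=20)
print(mask)
```

With a single difference, both the old and the new object position would be marked; combining two differences marks only the object position in the current image.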

For stream processing, considering that each pixel can be processed independently, point operations may be implemented in parallel. This is particularly well suited for hardware implementations. With stream processing, we can convert spatial parallelism to temporal parallelism. The image is read and stored row by row, as explained in Chapter 4, but pixels can be processed while being read out. When more complex processing is needed, operations can be pipelined in order to maintain the required throughput. This is especially important for high-throughput applications. With point operations, a small neighborhood or none at all is used, therefore caching requirements are low. With stream operations, latency requirements can be fine-tuned, and specialized hardware, like FIFOs, can be used to synchronize outputs from multiple different operations. In this case, when the operation latency is much smaller than the time needed to load the whole image, the algorithm execution time is determined by the frame rate [75].
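As a software analogy of the streaming principle, the following Python sketch chains two point operations with generators, so that every pixel passes through both stages as soon as it is read and the image is never buffered as a whole. The stage functions, the offset, and the threshold are hypothetical choices for illustration; a hardware pipeline would process the stages concurrently rather than lazily.

```python
def stream_point_op(pixels, op):
    """Apply a point operation to each pixel as it arrives."""
    for pixel in pixels:
        yield op(pixel)


def process_stream(pixels, offset, threshold):
    """Two chained stages: offset subtraction followed by thresholding."""
    corrected = stream_point_op(pixels, lambda p: max(p - offset, 0))
    return stream_point_op(corrected, lambda p: 1 if p > threshold else 0)


# The pixel source is an iterator, standing in for a sensor readout.
readout = iter([12, 40, 15, 90])
result = list(process_stream(readout, offset=10, threshold=20))
print(result)
```

Because each output depends only on the corresponding input pixel, no neighborhood has to be stored, which corresponds to the low caching requirements of point operations noted above.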

Taking these details into account, as well as the initial requirements of stream data processing, high data throughput, and fast feedback, in this dissertation we opt for implementing a stream point data processing algorithm in the smart camera platform. In Chapter 5.3, an implementation of the custom stream processing algorithm, named fast reject, is presented. This algorithm relies on point operations using thresholding and SAD to detect changes in multiple parts of images.

#### 3.2 Smart camera architecture design

X-ray imaging is a key technology for providing advances in medical diagnostics, homeland security, materials research and biology. Non-destructive X-ray imaging at synchrotron light sources enables micrometer and sub-micrometer resolutions. Typical applications are micro-tomography, fast radiography (radioscopy) and holotomography.

Charge-Coupled Device (CCD) and hybrid pixel detectors, which are commonly used in direct X-ray imaging, are limited in terms of pixel size, radiation resistance and X-ray stopping power [114, 115]. In order to improve the X-ray absorption, indirect detection methods, coupling CCD or Complementary Metal-Oxide-Semiconductor (CMOS) image sensors to a scintillator, are used [116]. Fast X-ray detectors and a short exposure time are necessary for investigating fast processes in real-time or reducing the samples' radiation dose during in vivo experiments [12]. Also, in order to conduct experiments with a large number of samples, there is a need to reduce the time required for a micro-tomography sequence [117, 118]. Within the framework of the UFO project [11], an experimental station is being built at the IMAGE beamline of KIT's ANKA synchrotron radiation facility. The station will cover a large variety of scan geometries and contrast mechanisms. The UFO project aims to push the present limits of high-speed X-ray imaging and introduces a novel concept with on-line data assessment, data-driven feedback, and active control of both the sample and the measuring procedure.

#### Architecture description

The UFO Data Acquisition (DAQ) chain consists of three main elements: a smart high-speed camera platform, a real-time high-performance Graphics Processing Unit (GPU) computing stage, and feedback control loops. Fast volumetric X-ray imaging has already been achieved in the range of a few tens of ms but is presently limited to a few seconds of total acquisition time. The short total acquisition time is caused by the limited internal memory storage and the high recording frame rate of commercial cameras. Subsequent data transfer from the internal camera memory to the external computer is performed at a much lower speed [119].

By removing on-camera memory buffers on our smart camera, the duration of the acquisition is prolonged and limited only by the size of the main memory (RAM) of the recording computer. The smart camera also serves as a platform for embedded image processing which can be used to increase the effective frame rate or as an internal feedback loop. With frames being streamed directly to system RAM, a high-performance GPU-based computing stage can process the data in real-time. On-line data evaluation is used as an image-based control loop for the sample manipulator and for controlling the experiment conditions.

The detector setup consists of a transparent single crystal scintillator coupled to a CCD or a CMOS image sensor readout through diffraction-limited magnifying optics [120, 121]. Indirect detection methods in X-ray imaging have been extensively explored and applied in synchrotron light sources since the early 1990s [122].

The new DAQ framework enables continuous data acquisition at the highest speed and offers real-time data assessment. This novel concept of data-based feedback provides opportunities for new experiments.

A key feature of the smart camera is its built-in embedded processing capability, which can extract application-specific information from the captured images [53]. FPGAs are used in smart camera designs because of their flexibility in interfacing with other hardware peripherals and their parallel processing capabilities. The camera uses a daughter carrier board to host the image sensor, which is a Monolithic Active Pixel CMOS sensor. In order to satisfy the main goals mentioned in Chapter 2.4, namely *Modularity*, *Embedded processing*, and *High-data throughput*, the camera also needs to be fully configurable. The user is given full access to the properties of the image sensor or other detector. This is crucial in order to adapt the pixel response to any experimental condition. Parameters such as gain, image exposure time and noise threshold can be adjusted by the user.

The development was split into two versions: the first iteration was developed using commercially available components, while the second was realized on a custom FPGA Printed Circuit Board (PCB). This approach sped up the development, kept the focus on the new concepts, and provided a proof of concept relatively soon. The initial architecture is shown in Figure 3.9. It is divided into three main parts: a CMOS image sensor, a Xilinx Virtex-6 FPGA board [123], and a PC used for controlling the camera, storing the data and further data processing. The modular design of the camera makes it possible to select the CMOS image sensor. Image sensors are connected to the board using an FPGA Mezzanine Card (FMC) connector [124]. A custom image-based self-event trigger algorithm is also implemented in the FPGA. The initial smart camera prototype is shown in Figure 3.10.

### **Initial Data Acquisition Architecture**

The initial CMOS image sensor selected was the CMOSIS CMV2000 [125], with its architecture shown in Figure 3.11. The CMV2000 has a pixel size of 5.5 $\mu$m x 5.5 $\mu$m and a nominal frame rate of 330 frames per second (fps) at 2.2 MPixels. It is provided as a monochromatic or color sensor with 10 to 12 bits resolution. More information about the sensor is provided in the chapter on sensor characterization.

*Figure 3.9. Smart camera platform* — *Initial architecture*

Figure 3.10. Initial smart camera prototype

Communication with the sensor is done using the Serial Peripheral Interface (SPI). Clock and control signals are provided by the FPGA. All sensor parameters, such as gain, offset and exposure time, are stored in control registers and programmed using SPI. Once the exposure (integration) time has been provided to the sensor and the FPGA issues a frame request command, the image is stored in the pixel matrix (with a global electronic shutter). The pixel values are passed through the Analog Front End (AFE) and digitized. These values are transferred using Low-Voltage Differential Signaling (LVDS) channels. Each LVDS channel is responsible for a group of adjacent columns of the pixel matrix.

Figure 3.11. CMV2000 architecture, from [125]

In order to read out the camera system as fast as possible, a standard PCI Express (PCIe) cable connection is used to transfer the data from the camera directly to the main computer memory. In the first camera design, a four-lane connection is utilized. Both passive copper cables and active optical links are available for this interface. A PCIe generation 2 x4 connection has a theoretical bandwidth of 16 Gbit/s. Direct Memory Access (DMA) is used to transfer the data from the camera to the main computer memory and vice versa. Using PC memory has three benefits. First, we avoid the bottleneck of "dead time" between data acquisitions. Second, since we are not limited by an in-camera memory, we can acquire much more data, because the memory available in PCs is much larger. Third, data processing can start immediately after the data are acquired.

Addressable 32-bit user bank registers are implemented in the dedicated Base Address Register (BAR) space. Bank registers are used to read and write the status and configuration of the DMA engines, the CMOS sensor and the FPGA logic. Further bank locations can be used for additional user applications. The DDR memory device is used both for temporary frame data storage and for image processing algorithms.

The three main blocks shown in Figure 3.9 form the foundation of the smart camera platform. Embedded processing and flexibility are built on top of them. The next chapters explain the initial implementation of these hardware building blocks.

### First DMA-PCIe Architecture

The first module handles the communication via the PCIe interface. The term "Bus Master", used in the context of PCIe, indicates the ability of a PCIe port to initiate PCIe transactions, typically memory read and write transactions. The most common application for "Bus Mastering Endpoints" is for DMA. DMA is a technique for efficient data transfer to and from host CPU system memory. This implementation has many advantages over standard Programmed Input/Output (PIO) data transfers. In addition, the DMA engine offloads the CPU from directly transferring the data, resulting in better overall system performance through lower CPU utilization. The initial PCIe-DMA architecture is presented in Figure 3.12. Two IP cores are employed in combination with the logic blocks developed at KIT. An integrated Endpoint Xilinx-IP core for PCIe [126] and two Northwest Logic DMA [127] engines are used to move the data from the FPGA board to the PC central memory and vice versa. A custom PCIe-DMA interface logic has been developed to adapt the Xilinx PCIe interface to the DMA engines.



Figure 3.12. PCIe-DMA Architecture using Northwest IP

The interface with the internal FPGA logic is provided by two I/O logic blocks. Each logic block includes a data First-In First-Out (FIFO) memory block and a Finite State Machine (FSM) that controls the received and transmitted data packets. The FSMs generate the start-of-packet and end-of-packet signals for the DMA and handle the situation in which the PC is not able to receive the data packets. This is important, since no data packets should be lost.

The FIFOs are used as temporary data storage during streaming and, when needed, for transferring data between different clock domains. In this way the defined clocks, $Clock\_in$ and $Clock\_out$, can be used to send and receive data even though they operate at different frequencies. The $Data\_out$ and $Data\_valid$ signals are synchronized with the user-defined $Clock\_out$ domain. The $Data\_valid$ signal informs the rest of the logic when valid data are present on the $Data\_out$ bus. A busy signal is used to temporarily stop the data flow received from the FPGA internal logic. The $WR\_EN$ signal allows writing a data word present at the $Data\_in$ bus at the $Clock\_in$ frequency. The $Back\_pressure$ signal informs the FPGA logic that the PC is busy and presently not able to receive the data.
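The interplay of the busy and back-pressure signals can be illustrated with a small behavioral model. The following Python sketch is not the FPGA implementation; the class `StreamFifo`, the function `simulate`, the FIFO depth and the stall pattern are all hypothetical and only show how a bounded FIFO stalls the producer instead of dropping words while the receiver is unavailable.

```python
from collections import deque


class StreamFifo:
    """Toy model of a bounded FIFO between user logic and a DMA engine."""

    def __init__(self, depth):
        self.depth = depth
        self.words = deque()

    @property
    def busy(self):
        # Asserted when the FIFO cannot accept another word.
        return len(self.words) >= self.depth

    def write(self, word):
        """Model of WR_EN: returns False (word not accepted) while busy."""
        if self.busy:
            return False
        self.words.append(word)
        return True

    def read(self, back_pressure):
        """Return the next word, or None if empty or the receiver is stalled."""
        if back_pressure or not self.words:
            return None
        return self.words.popleft()


def simulate(n_words, depth, stalled_cycles):
    """Stream n_words through the FIFO; the receiver stalls on the given cycles."""
    fifo = StreamFifo(depth)
    received, next_word, cycle = [], 0, 0
    while len(received) < n_words:
        # Producer side: retry the same word until the FIFO accepts it.
        if next_word < n_words and fifo.write(next_word):
            next_word += 1
        # Consumer side: one word per cycle unless back-pressure is asserted.
        word = fifo.read(back_pressure=cycle in stalled_cycles)
        if word is not None:
            received.append(word)
        cycle += 1
    return received, cycle


received, cycles = simulate(n_words=8, depth=2, stalled_cycles=set(range(2, 6)))
print(received, cycles)
```

In this model the stall only costs additional cycles; every word still arrives, and in the original order.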

### Input stage logic

The SerDes (serializer/deserializer) module is used for communication between the image sensor and the FPGA. A SerDes takes a parallel, single-ended signal bus and transforms it into a few, typically one, differential signals that operate at a much higher frequency than the wide single-ended data bus. A SerDes provides a point-to-point transfer of data. The clock division, parallel data width, and the training pattern are configurable according to the CMOS image sensor specifications. The image sensor used in the camera prototype has 16 parallel high-frequency LVDS serial lines to move data from the pixel matrix to the FPGA. Each line provides data at double data rate with 480 Mbit/s. Furthermore, an ADC resolution of 10 bits or 12 bits per pixel can be selected. This has an impact on the SerDes logic, which must translate the serial input to a 10-bit or 12-bit parallel output. To overcome the SerDes logic limitations in the Virtex-6 FPGA [123], a new SerDes input stage module has been developed. The basic architecture of a single SerDes channel is shown in Figure 3.13.
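As a rough plausibility check, the aggregate rate of the 16 LVDS lines can be compared with the 16 Gbit/s quoted for the PCIe generation 2 x4 link. The sketch below assumes the standard generation 2 figures of 5 GT/s per lane with 8b/10b line coding and ignores packet and protocol overhead; the function names are illustrative only.

```python
def lvds_aggregate_gbps(channels, mbit_per_channel):
    """Aggregate input-stage rate in Gbit/s for identical LVDS channels."""
    return channels * mbit_per_channel / 1000.0


def pcie_gen2_gbps(lanes, gtransfers_per_s=5.0, coding_efficiency=8 / 10):
    """Theoretical PCIe generation 2 data rate in Gbit/s.

    Assumes 5 GT/s per lane and 8b/10b line coding; protocol overhead
    (packet headers, flow control) is deliberately ignored.
    """
    return lanes * gtransfers_per_s * coding_efficiency


sensor_rate = lvds_aggregate_gbps(channels=16, mbit_per_channel=480)
link_rate = pcie_gen2_gbps(lanes=4)
print(f"sensor: {sensor_rate:.2f} Gbit/s, link: {link_rate:.2f} Gbit/s")
```

Under these assumptions the sensor output occupies slightly less than half of the theoretical link bandwidth, which leaves headroom for protocol overhead.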

To cover all image sensor outputs, 16 parallel SerDes input stages are utilized. A common FPGA regional clock for all 16 input stages has been defined as a division of the LVDS data clock according to the parallel data width. An individual programmable absolute delay primitive block, IODELAY [128], is used for precise time synchronization between data and clock in steps of 80 ps. The LVDS input data line is converted from double data rate to two single-ended data lines by an input double data rate (IDDR) register [128]. The lines are then combined for parallel data output by the custom SerDes logic. The dedicated word alignment FSM checks the position of the Most Significant Bit (MSB) in the parallel data output by comparing it with the training pattern. A bit-slip signal is generated by the alignment FSM and received by the custom SerDes logic, where it is used to shift a misplaced MSB to the correct position. A data lock signal informs the rest of the logic about the correct alignment of the parallel data.

**Figure 3.13.** SerDes architecture
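The word alignment procedure can be sketched in software. The following Python model is only an illustration of the bit-slip idea, not the FSM used in the camera: the training word `0x055`, the 10-bit width and the helper names are assumptions chosen for the example.

```python
def to_bits(word, width):
    """MSB-first list of bits of an unsigned word."""
    return [(word >> (width - 1 - i)) & 1 for i in range(width)]


def deserialize(bits, width, offset):
    """Group a serial bit stream into parallel words, starting at a bit offset."""
    words = []
    for start in range(offset, len(bits) - width + 1, width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        words.append(value)
    return words


def align(bits, width, training_word):
    """Apply bit-slips until every parallel word equals the training word.

    Returns the number of bit-slips needed (the 'data lock' condition),
    or None if no word boundary reproduces the training word.
    """
    for slips in range(width):
        words = deserialize(bits, width, slips)
        if words and all(w == training_word for w in words):
            return slips
    return None


TRAINING_WORD = 0x055  # hypothetical 10-bit training pattern
stream = to_bits(TRAINING_WORD, 10) * 6
skewed = stream[3:]  # receiver starts sampling three bits into a word
print(align(skewed, 10, TRAINING_WORD))
```

The training word is chosen so that no rotation of it equals the word itself; otherwise the alignment could lock onto a wrong boundary.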

### **DDR3 RAM Memory Interface**

The new memory interface logic combines a Xilinx physical layer (PHY) for DDR3 devices with additional custom logic. The new development, shown in Figure 3.14, extends the features of the Xilinx memory interface IP core [129] and overcomes the limitations present in the DDR3 IP core provided by Xilinx. The $WR\_EN$ signal enables writing a data word with a user-defined data width present at the $Data\_in$ bus. A WR-FIFO block is used as temporary data storage and for the clock domain crossing between the user clock frequency, $Clock\_in$, and the internal logic clock domain. The Arbiter FSM continuously checks whether the WR-FIFO is empty. If it is not, the enable signal for the write operation is propagated to the WR-DDR FSM. The WR-DDR FSM receives a write command and generates the address and all necessary control signals for the PHY logic. The PHY logic receives the command and writes the incoming data to the address specified by the WR-DDR FSM.



Figure 3.14. RAM memory interface

The Arbiter FSM can also receive a read request for the DDR3 device. In this case, the Arbiter checks whether the WR-DDR FSM is in the "idle" state. If so, a read command is transmitted to the RD-DDR FSM. The RD-DDR FSM receives the read command and, as in the write cycle, generates the necessary signals for the PHY logic. The output data from the PHY are stored in the RD-FIFO. A data word with the user-defined data width is then presented at the $Data\_out$ port. A $Data\_valid$ signal informs the user logic that the data are ready. All signals are synchronized to the user-defined $Clock\_out$.

The internal read and write paths work with a data width of 256 bits at an internal clock of 200 MHz, which corresponds to a bandwidth of 51 Gbit/s in half-duplex mode. This bandwidth limitation is imposed by the PHY logic in the speed grade of the Virtex-6 FPGA used. Nevertheless, a quasi full-duplex data flow is achieved with the proposed architecture. The Arbiter FSM balances the amount of data in both FIFOs, so that burst write and read commands can be issued alternately to the PHY. In effect, this provides a full-duplex DDR3 interface with a mean bandwidth of 25 Gbit/s for both read and write.
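The quoted bandwidth figures follow directly from the bus width and the clock frequency. The short Python sketch below reproduces the arithmetic under the simplifying assumptions that read and write bursts share the bus time equally and that turnaround and refresh overhead are neglected.

```python
def bus_bandwidth_gbps(width_bits, clock_mhz):
    """Peak bandwidth in Gbit/s of a bus moving width_bits per clock cycle."""
    return width_bits * clock_mhz / 1000.0


half_duplex = bus_bandwidth_gbps(width_bits=256, clock_mhz=200)

# Assumption: alternating read and write bursts share the bus time equally.
per_direction = half_duplex / 2

print(f"half-duplex: {half_duplex:.1f} Gbit/s")
print(f"read or write, quasi full-duplex: {per_direction:.1f} Gbit/s")
```

The results, 51.2 Gbit/s and 25.6 Gbit/s, match the rounded values given in the text.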

### Evolution of the hardware architecture

The initial smart camera platform was successfully implemented. It was used in real X-ray radiography and tomography experiments and was successfully integrated into the control system. As a proof of concept, an image-based self-trigger algorithm was implemented, which demonstrated the applicability of the smart camera platform for embedded processing. The second phase of the development aimed to increase the performance and capabilities of the initial design and to emphasize the main qualities of the smart camera platform.

### Modular approach — multiple CMOS image sensors

The modular design of the smart camera offers the opportunity to select the CMOS image sensors according to the experimental requirements and the respective sensor characteristics such as spatial and temporal resolution, dynamic range, noise influence, etc. Figure 3.15 shows the physical camera architecture for a particular sensor daughter board and the fixed FPGA readout board.



**Figure 3.15.** The smart camera system with exchangeable image sensors and the FPGA readout board

Three types of CMOS image sensors are provided. The first type, as previously mentioned, has a pixel size of 5.5 $\mu$m x 5.5 $\mu$m and a nominal frame rate of 330 frames per second (fps) at 2.2 MPixels [125]. It is provided as a monochromatic or color sensor with a maximum frame size of 4 MPixels and 10 to 12 bits resolution. The 4 MPixel sensor is shown in Figure 3.16(a). The second type has a high frame rate of 2000 fps at 1 MPixel. Its pixel size is 7 $\mu$m x 7 $\mu$m, with 10 bits per pixel. Finally, for applications that require an even higher frame rate, a CMOS image sensor with 5000 fps at 1 MPixel is provided, shown in Figure 3.16(b). It has the largest pixel size of the selected sensors, 13.7 $\mu$m x 13.7 $\mu$m, with 10 bits per pixel [60].
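To put the three sensor types into perspective, their raw output data rates can be estimated from frame rate, frame size and bit depth. The sketch below assumes that 1 MPixel equals 10^6 pixels, that the first sensor is read out at 10 bits per pixel, and that there is no readout overhead; the dictionary keys are labels chosen for this example.

```python
def raw_rate_gbps(fps, megapixels, bits_per_pixel):
    """Raw sensor output in Gbit/s, assuming 1 MPixel = 1e6 pixels."""
    return fps * megapixels * 1e6 * bits_per_pixel / 1e9


# (frame rate in fps, frame size in MPixels, bits per pixel)
sensors = {
    "5.5 um pixels, 2.2 MPixel": (330, 2.2, 10),
    "7 um pixels, 1 MPixel": (2000, 1.0, 10),
    "13.7 um pixels, 1 MPixel": (5000, 1.0, 10),
}

rates = {name: raw_rate_gbps(*spec) for name, spec in sensors.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} Gbit/s")
```

Under these assumptions the estimates are roughly 7.3, 20 and 50 Gbit/s, which illustrates why the faster sensors place higher demands on the readout link.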





(a) CMV4000 — 150 fps @ 4MP

(b) Polaris — 5000 fps @ 1MP

**Figure 3.16.** Two additional CMOS image sensor boards

The board containing the CMOS sensor is connected with an FPGA Mezzanine Card (FMC) cable, which allows for easy extension and modification to best suit the given application. The image sensors are cooled with a Peltier cell and the whole board is enclosed in a vacuum environment. The cooling minimizes the noise due to leakage current. As before, in operation the CMOS image sensor receives a request for a new frame. After the predefined integration time, the pixel matrix is read out row by row. The pixel values are passed to a column ADC cell and digitized. These digital signals are then transferred over multiple parallel lines to the FPGA readout board.

### Modular approach — New DMA architecture

For the later design iteration, a new commercial readout board from HiTech Global [130] is used, with a larger, more powerful FPGA and more DDR3 memory. The concept of the internal FPGA architecture is similar to that of the first design, but a new DMA architecture is used. The new DMA was not realized as part of the work in this dissertation; it was developed by other members of the Institute's FPGA group and not by the author. However, this development is used to illustrate once more the benefits of the modular approach of the new data acquisition platform. Technological progress, as well as new functional developments, can be used to improve the capabilities of the original platform. The new DMA architecture is shown in Figure 3.17.



Figure 3.17. New DMA architecture (from [131]).

The DMA uses two different engines to transmit and receive data (TX and RX) through the PCIe core. The RX engine is implemented using the same architecture as the TX engine. The architecture also includes the address table and a Base Address Register (BAR) space, which is used to configure the DMA engine. The interface between the TX/RX engines and the user logic is a FIFO-like interface with a width of 128 bits, matching the input/output data width of the PCIe core, and operates at 250 MHz [131]. The architecture has been optimized to work in streaming mode for long transmission times, where the expected data size greatly exceeds the maximum memory allocable by the OS [131].

In addition, the new FPGA board has a faster PCIe interface, now with eight generation 2 lanes (x8). To take full advantage of the technological advancements in the PCIe connection, a new custom PCIe-DMA module was developed in the IPE FPGA group, and not by the author [132]. As before, it is able to continuously transfer data to the system memory, but now at a speed of up to 3500 MB/s for the x8 generation 2 PCIe version. Control of the DMA is again achieved through 32-bit user bank registers, implemented in the dedicated BAR space. In addition to controlling the CMOS sensor and the FPGA logic, bank registers are also used for the configuration of the DMA. The smart camera described in this dissertation relies on the achievements made possible by the new DMA engine, while not directly contributing to its design.

### Smart camera platform embedded processing

The DDR3 memory interface was modified and improved, and is now capable of providing even faster data throughput, if the FPGA version and PHY can support it. The initial performance limits of the available DDR3 IP are overcome, and the interface operates at the maximum possible speed, limited only by the physical properties of the PHY. For the FPGA device presented in Figure 3.15, i.e. xc6vlx365t-2ff1759, the overall memory access bandwidth is 50 Gbit/s. The interface is now based on the Advanced eXtensible Interface (AXI), a standard provided by ARM [133]. Using the AXI interface standard provides an easy extension to other IPs.

The embedded processing architecture is shown in Figure 3.18. The modular approach is also evident here. The interfaces to both the input stage (the image sensor) and the external DDR3 memory are kept as generic as possible. This design choice was intentional, so that the current embedded processing functionality can easily be modified or extended. Incoming data from the input stage are synchronized with the rest of the logic through a large FIFO, as previously explained. A multiplexer is added to enable both data streaming to the DDR3 module and data delivery to the processing stage. Access to the external memory is now provided using AXI modules and is again synchronized with the rest of the logic using a FIFO.

As before, control of all functionalities is achieved using the *Bank register*. The user has full access to all functionalities of the design. The CMOS control and Interleaving FSM (Finite State Machine) blocks are used to control the image sensor and to ensure the proper flow of data. The Interleaving FSM, for example, is used to set the corresponding region of interest (ROI) when acquiring or processing data. Multi-channel access to memory is provided by the custom KIT\_IPCore. It is based on the module explained in the "DDR3 RAM Memory Interface" section of Chapter 3.2. The difference is that it now uses the AXI interface instead of the native interface. An important additional benefit is that the user is now able to directly target specific memory locations. With this, the whole memory can be used as a set of distinct memories, if the need arises. Memory partitioning can be used to keep different data in separate locations, which can serve different stages of data processing. Naturally, the overall memory access throughput remains the same, and care must be taken when designing and implementing data processing algorithms to avoid a memory access bottleneck. Furthermore, the "Memory Wall" is a well-known term for the discrepancy between the increasing processing performance and memory access, especially for Dynamic Random-Access Memories (DRAMs) [134, 135, 136]. While the performance of DRAMs [137] is increasing, the latency between issuing the RD/WR (Read/Write) request for a specific memory address and subsequently receiving the data can still be prohibitively long for certain algorithms. If these requests can be foreseen, however, then two features of programmable hardware, namely pipelining and parallelism, can be used to mitigate these effects.

**Figure 3.18.** Embedded processing architecture. Dashed arrows represent data flows. F1 is the classic camera data streaming flow. F2 is the image processing flow, in conjunction with flow F1, when needed.
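The benefit of pipelining predictable memory requests can be illustrated with a simple cycle count. The model below is a deliberately idealized sketch: it assumes one cycle to issue a request, a fixed latency until the data return, and no bank conflicts or refresh. The latency of 20 cycles and the request count are hypothetical values.

```python
def total_cycles(n_requests, latency, pipelined):
    """Cycles until the last data word arrives.

    Each request takes one cycle to issue; its data arrive `latency`
    cycles later. Without pipelining, the next request is issued only
    after the previous data have arrived.
    """
    if n_requests == 0:
        return 0
    if pipelined:
        # Requests are issued back to back; the latency is paid only once.
        return n_requests + latency
    return n_requests * (1 + latency)


blocking = total_cycles(n_requests=1000, latency=20, pipelined=False)
overlapped = total_cycles(n_requests=1000, latency=20, pipelined=True)
print(blocking, overlapped, round(blocking / overlapped, 1))
```

In this idealized model the pipelined access pattern is about 20 times faster, because the access latency is hidden behind the stream of outstanding requests.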

The two main data paths, shown in Figure 3.18 as *F1* and *F2*, represent the two main modes of operation, data streaming and data processing, respectively. Data processing may include data streaming as well. In mode *F1* (pink dashed arrow), data are transferred from the input stage through the custom *KIT\_IPCore* and the external memory to the external device using the PCIe-DMA engine. As always, the user has full access to all functions.

In the data processing mode, *F2* (black dashed arrow), data are sent from the input stage to the processing module. The data flow *F1* may be retained as well. In order to process incoming data, earlier data usually need to have been acquired and stored in the external memory already, which is flow *F1*. As explained, for point processing operations, data are compared between consecutive frames. This is shown in Figure 3.18. After the initial data acquisition, data are simultaneously read out from the memory and from the input stage.

Both incoming data streams are evaluated in the processing module. The result of the processing operation may trigger further actions, such as storing and then transferring the data to the external stage. With the *KIT\_IPCore*, more than one read or write operation can take place at the same time. For example, data can be simultaneously read out for processing and for external transfer from different parts of the memory. The processing stage and the interface it provides are used to implement a custom algorithm, described in later chapters. However, the interface to the processing module is generic and can be used for implementing other data processing algorithms. Care was taken to keep the logic design "lean", with the intention of keeping the total FPGA occupancy reasonably low. For example, the total slice occupancy is 21%, and the total internal FPGA RAM occupancy is 19%.
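A point processing operation that compares consecutive frames can be sketched generically as follows. This is not the custom algorithm described in later chapters; the function names, the threshold and the minimum pixel count are hypothetical and serve only to show how a stored frame and an incoming frame can be combined into a trigger decision.

```python
def changed_pixels(previous, current, threshold):
    """Count pixels whose absolute difference between frames exceeds threshold."""
    return sum(
        1
        for prev_row, cur_row in zip(previous, current)
        for p, c in zip(prev_row, cur_row)
        if abs(c - p) > threshold
    )


def should_trigger(previous, current, threshold, min_pixels):
    """Decide whether the change between two frames justifies an action."""
    return changed_pixels(previous, current, threshold) >= min_pixels


stored_frame = [[10, 10, 10], [10, 10, 10]]    # read back from memory
incoming_frame = [[10, 50, 10], [10, 60, 11]]  # arriving from the input stage

print(changed_pixels(stored_frame, incoming_frame, threshold=5))
print(should_trigger(stored_frame, incoming_frame, threshold=5, min_pixels=2))
```

Because each output depends only on the two pixel values at the same position, such an operation can be evaluated pixel by pixel as the two data streams arrive.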

### Final smart camera platform design

We have described the design of the smart camera platform and demonstrated that it can be implemented. We have also presented the evolution of the camera and shown that its capabilities scale well with technological progress. However, in order to achieve the desired performance goals, a custom design was needed. For this purpose, a new FPGA board was designed and produced. Designing the schematic and producing the board was a team effort in the Institute's FPGA group, in which the author was mainly involved in designing the power management of the board.

The schematic of the board is shown in Figure 3.19. The initial idea of the general data acquisition flow is clearly visible: the input stage is realized through the versatile FMC connection, the FPGA is used for control and data processing, and high throughput is accomplished using a generation 3 PCIe connection. The design of the whole board was a team effort, with several members of the team involved in designing the schematic and the final PCB layout. The work carried out within this dissertation involved designing the power supply of the board and the schematic for the 204-pin small outline dual in-line memory module (SODIMM) DDR3 [138] component. Additional work included the PCB design of the power supply modules. The following chapter provides more information about the design of the power supply.

#### Power management

Creating an FPGA PCB requires careful design and management of the power supply. FPGAs require anywhere from 3 to 15 or more voltage rails. The logic fabric is usually built in the latest process technology node, which determines the core supply voltage. Configuration, "glue" circuitry, various I/Os (inputs/outputs), SerDes (serializer/deserializer) transceivers, clock managers, and other functions have differing requirements for voltage rails, sequencing/tracking, and voltage ripple limits [139].

**Figure 3.19.** Smart camera board — schematic

Depending on the process technology and the type of I/Os, the voltage ranges from 1 to 1.8 V, or higher, up to 5 V. For the smart camera board, the FPGA requires voltage rails of 1 to 1.8 V to function; however, additional voltage levels are supplied, namely 2.5 V and 3.3 V. The higher voltage levels were needed to supply power for the DDR3 chip and to interface with the logic connected via the FMC connectors. An overview of the main voltage levels implemented on the board is presented in Table 3.1.

When designing the power supply of the FPGA board, one important point that needs to be considered is the startup sequencing/tracking when powering the voltage rails up and down. If it is misconfigured or ignored, it can lead to damage or latch-up, i.e. malfunction [140]. The power-on sequence for the board was taken from the Xilinx recommendation, shown in Figure 3.20.

The main components used for the power supplies are switching-mode power regulators that work as step-down, or buck, converters. The schematic of the implemented power supply for VCCINT (with a voltage of 1 V) is shown in Figure 3.21.

| Voltage domains   | Value (V) |
|-------------------|-----------|
| VCCINT, VCCBRAM   | 1         |
| MGTAVCC           | 1         |
| VCCAUX, MGTVCCAUX | 1         |
| MGTAVTT           | 1.2       |
| VCC_1_5           | 1.5       |
| VADJ              | 1.8       |
| VCCAUX_IO         | 2         |
| VCC3V3            | 3.3       |
| Input voltage     | 12        |

**Table 3.1.** Overview of main voltage levels implemented in the UFO smart camera board

Figure 3.20. Power-on sequencing order (from [139])

In this example, the power regulator used is the MAX8686. It is a synchronous PWM regulator that operates from a 4.5 V to 20 V input supply, generates an adjustable output voltage from 0.7 V to 5.5 V, and is able to deliver up to 25 A [141]. The MAX8686 operates with an adjustable switching frequency from 300 kHz to 1 MHz, and also includes an enable input and a power-OK indicator that may be used for power sequencing.

Figure 3.21. MAX8686, Step-Down DC-DC Converter [141]

For our use case, we use a 12 V input voltage to provide a 1 V output voltage for the internal supply of the FPGA, i.e. the VCCINT voltage. Furthermore, it is important to achieve a stable output, as the internal supply voltage must stay between 0.97 V and 1.03 V, a window of 60 mV in total. The power requirements of the FPGA internals are usually the most demanding ones when designing a PCB. Accordingly, the implemented design is able to provide more than 10 A of current.
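The conversion from 12 V to 1 V implies a very small duty cycle for the step-down converter. The sketch below uses the textbook relation for an ideal buck converter in continuous conduction, D = V_out / V_in, which neglects all losses. The switching frequency of 0.5 MHz is an assumed value inside the regulator's adjustable range, not the value used on the board.

```python
def ideal_buck_duty_cycle(v_in, v_out):
    """Ideal (lossless, continuous-conduction) buck converter: D = V_out / V_in."""
    if not 0 < v_out <= v_in:
        raise ValueError("a buck converter needs 0 < v_out <= v_in")
    return v_out / v_in


def tolerance_window_mv(v_min, v_max):
    """Width of the allowed supply voltage window in millivolts."""
    return (v_max - v_min) * 1000.0


duty = ideal_buck_duty_cycle(v_in=12.0, v_out=1.0)
window = tolerance_window_mv(0.97, 1.03)

SWITCHING_FREQ_MHZ = 0.5  # assumed value, for illustration only
on_time_us = duty / SWITCHING_FREQ_MHZ  # on-time = D * period, period = 1 / f

print(f"duty cycle: {duty:.3f}, on-time: {on_time_us:.3f} us, window: {window:.0f} mV")
```

With a duty cycle of roughly 8%, small timing errors in the switch on-time translate into comparatively large output errors, which is one reason why the 60 mV window is demanding.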

The startup ramp time must lie between a minimum of 0.2 ms and a maximum of 50 ms. For the smart camera board, a ramp-up time of 8 ms was implemented using the EN/SLOPE pin. In order to achieve the power-on sequence from Figure 3.20, we use the power-OK indicator, so that each subsequent voltage group starts powering on after the previous one has finished. Maxim Integrated provides a simulation tool, called *EE-Sim* [142], that can be used to verify by simulation whether these values are achieved. The result of the ramp-up simulation, which reaches the correct voltage level of 1 V, is shown in Figure 3.22.
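The chaining of regulators through the power-OK indicator can be illustrated with a simple timeline. In the sketch below, the grouping and order of the rails are illustrative only (the actual order follows Figure 3.20), every group is assumed to use the same 8 ms ramp, and power-OK is assumed to be asserted exactly at the end of each ramp.

```python
def sequence_schedule(rail_groups, ramp_ms):
    """Return (name, start_ms, end_ms) for rails chained via power-OK.

    Each group is enabled by the power-OK signal of the previous one,
    so it starts ramping when its predecessor has finished.
    """
    schedule, t = [], 0.0
    for name in rail_groups:
        schedule.append((name, t, t + ramp_ms))
        t += ramp_ms
    return schedule


# Illustrative grouping only; rail names are taken from Table 3.1.
groups = ["VCCINT, VCCBRAM", "VCCAUX", "VCCAUX_IO", "VCC3V3"]

for name, start, end in sequence_schedule(groups, ramp_ms=8.0):
    print(f"{name:16s} ramps from {start:4.1f} ms to {end:4.1f} ms")
```

Under these assumptions, four chained groups need 32 ms in total before all rails are up.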

The simulation was conducted for an output current rising from almost zero to 15 A (at the 20 ms time point). The short rise time simulates operation of the FPGA under high occupancy. We see that even in this upper boundary condition, the ramp-up of the voltage is almost fully monotonic. We also see that the final voltage level is the desired 1 V.



*Figure 3.22.* Ramp-up simulation for the internal supply voltage VCCINT

Another important characteristic of VCCINT is its tight voltage tolerance, with a total allowed variation of 60 mV. In Figure 3.23 we see that the achieved result is much better than required: the total ripple is only 3 mV.



**Figure 3.23.** Voltage variation, or ripple, of the internal supply VCCINT

The final produced PCB with all components mounted can be seen in Figure 3.24. We can identify the main items on the board. Two high-pin-count FMC connectors are located at the top of the board. Below the FMC connectors is the Virtex-7 FPGA, with a DDR3 SO-DIMM 204-pin connector to the left of the FPGA. The power design occupies a large area of the board, to the right of the FPGA. The PCIe x16 Gen3 connector is located at the bottom of the board.

**Figure 3.24.** Smart camera board — final design. Main board components are outlined.

# 4 Detector module — pixel detectors

The development of image sensors is crucial for improving the performance of cameras. The two main technologies used in indirect X-ray imaging are charge-coupled devices (CCD) and monolithic active pixel sensor (APS) CMOS image sensors. The next chapters discuss the principles of operation and the photo-generation of charge in silicon, as well as the structure of image sensors, in more detail.

# 4.1 Principle of operation

In semiconductor radiation detectors, the signal arises from the charge generated by an incoming particle. The collected charge is further processed by the front-end electronics. The read-out electronics may either be a separate chip connected to the detector (strip or hybrid pixel detector), or be integrated on the same substrate as the detector (e.g. monolithic pixel detectors or charge-coupled devices). This work deals mostly with the generation of charge carriers by electromagnetic radiation.

### Photoelectric effect

The mode of operation of radiation detectors can be explained by the so-called photoelectric effect. The closely related photovoltaic effect was first observed by the French physicist Alexandre-Edmond Becquerel in 1839. The classical wave theory of light of the time was unable to explain the observed properties, and the experimental evidence contradicted the accepted theories. Albert Einstein finally solved the problem in 1905, when he combined Planck's quantum theory with the corpuscular theory of light, according to which light is a beam of particles, called photons, which carry a discrete energy. For this discovery, he was subsequently awarded the Nobel Prize in Physics in 1921 [143]. The corpuscular-wave dualism, not only of photons but also of elementary particles, was confirmed by Arthur Compton in 1923. The electrons of an atom not only occupy discrete energy levels, but also absorb the energy of an impinging photon in a discrete manner. If the energy of an impinging photon is sufficiently high, an electron can be ejected from the atom, leaving behind a vacant position in the atomic orbital, as shown in Figure 4.1. Once created, the carriers are free to move in the semiconductor lattice, where they are collected by pixel elements (pixels).

**Figure 4.1.** Representation of the photoelectric effect using Rutherford-Bohr atom model (from [144]).
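Whether a photon carries enough energy to free a charge carrier can be estimated from its wavelength using E = hc / λ. The following sketch uses the CODATA values of the physical constants; the wavelength of 550 nm (visible light, as might be emitted by a scintillator) and the silicon band gap of about 1.12 eV at room temperature are textbook values assumed for illustration and are not taken from this chapter.

```python
PLANCK_EV_S = 4.135667696e-15    # Planck constant in eV*s
LIGHT_SPEED_M_S = 299_792_458.0  # speed of light in m/s

SILICON_BAND_GAP_EV = 1.12       # assumed textbook value at room temperature


def photon_energy_ev(wavelength_nm):
    """Photon energy E = h * c / wavelength, in electronvolts."""
    return PLANCK_EV_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)


energy = photon_energy_ev(550.0)  # hypothetical green scintillation light
print(f"{energy:.3f} eV, above the band gap: {energy > SILICON_BAND_GAP_EV}")
```

A 550 nm photon carries about 2.25 eV, which under the stated assumption is enough to create an electron-hole pair in silicon.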

### Semiconductor physics

The most commonly used semiconductors are single crystals with a diamond (*Si* and *Ge*) or zincblende (e.g. *GaAs*) lattice structure. All atoms in the diamond lattice are identical. Each atom is surrounded by four nearest neighbors. They are arranged in a tetrahedron, and each atom forms covalent bonds with its neighboring atoms. The two-dimensional representation of the covalent bonds is shown in Figure 4.2(a). At low temperatures, the electrons are bound in their respective tetrahedron lattice. At higher temperatures, thermal vibrations may break a covalent bond, and a valence electron may become a free electron, leaving behind a vacancy, or hole. The electrical parameters of solid materials can be described using the energy band model. Band theory assumes that in condensed materials, such as crystals, electrons can occupy energy levels grouped in bands. In a solid material composed of N closely spaced atoms, the electrons of adjacent atoms influence each other. The discrete energy levels of each individual atom do not remain, but become grouped in bands (formed by many closely spaced energy levels of single atoms). Allowed energy bands are separated by a forbidden band, which consists of energy levels that are not available to electrons.

- (a) Two-dimensional representation of a tetrahedron bond
- (b) Energy levels of silicon atoms, as a function of lattice spacing

**Figure 4.2.** Representation of the main structural parameters for charge carrier generation (from [145])

Energy levels as a function of lattice spacing for silicon are shown in Figure 4.2(b). At large distances, atoms have the same energy levels. As the lattice spacing decreases, these levels start to form energy bands. Within a given material, two distinct energy bands determine its electrical properties. The highest completely filled energy band at a temperature of 0 K is called the valence band. The band above it, partially filled or empty, is called the conduction band. In order to bring the material into a conducting state, electrons need to move by changing their quantum state. This movement is therefore possible only into the unfilled states of the conduction band.

In Figure 4.3, three types of materials are distinguished by their electrical properties. In the case of a wide forbidden gap, electrons from the valence band cannot acquire enough energy to jump to the conduction band and contribute to conduction. Such a material is an insulator. In the case of a conductor, the valence and conduction bands overlap, or the conduction band is partially filled. In both cases, many vacant states are available for electrons. The third group of materials in Figure 4.3, the semiconductors, have a relatively small forbidden gap with an empty conduction band and a filled valence band at low temperature. However, thermal excitation at room temperature is sufficient to transfer a few electrons to the conduction band, thus leading to a weak conductivity.

Two types of semiconductors are distinguished by their band gap structure: direct-band and indirect-band semiconductors. When an electron is promoted from the valence to the conduction band by excitation, momentum must be conserved. In the case of a direct-band semiconductor, the highest state of the valence band and the lowest state of the conduction band have the same momentum (Figure 4.4). For indirect-band semiconductors, an electron needs to change not only its energy but also its momentum in order to pass through the band gap. In most cases, this is done with phonon assistance.

**Figure 4.3.** Energy band structure of insulators, semiconductors, and conductors, left to right, respectively. (from [145])

**Figure 4.4.** Energy band structures for direct-band and indirect-band gap semiconductors. (from [146])

### Intrinsic semiconductors

Intrinsic semiconductors contain no (in practice, very few) impurities compared with the number of thermally generated electrons and holes. Each electron, thermally elevated from the valence band to the conduction band, leaves a hole behind it. In an intrinsic semiconductor, the numbers of generated holes and electrons are approximately the same. The density of free electrons, n, and of holes, p is given by

$$n = N_c e^{-\frac{E_c - E_F}{kT}} \tag{4.1}$$

$$p = N_v e^{-\frac{E_F - E_v}{kT}} \tag{4.2}$$

where $N_c$ and $E_c$ are the effective density of states and the energy level of the conduction band, respectively, and $N_v$ and $E_v$ are the effective density of states and the energy level of the valence band, respectively. k is the Boltzmann constant and T is the absolute temperature. The Fermi energy $E_F$, for which the probability of occupation of an electronic state is 0.5, lies midway between the two bands. The product of electron and hole concentrations is given by

$$np = n_i^2 \tag{4.3}$$

where $n_i$ is the intrinsic carrier density. Equation 4.3 is called the mass action law.
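The mass action law can be checked numerically. The following is a minimal sketch, assuming the electron density depends on the distance of the Fermi level from the conduction band edge and the hole density on its distance from the valence band edge. The silicon parameters (`N_C`, `N_V`, a 1.12 eV gap) are illustrative assumptions and are not taken from this work. The product n·p comes out the same for every position of the Fermi level.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def carrier_densities(e_f, e_c, e_v, n_c, n_v, temp_k):
    """Return (n, p) in cm^-3 for a Fermi level e_f; all energies in eV."""
    kt = K_B_EV * temp_k
    n = n_c * math.exp(-(e_c - e_f) / kt)  # electrons in the conduction band
    p = n_v * math.exp(-(e_f - e_v) / kt)  # holes in the valence band
    return n, p


# Assumed, illustrative silicon values at 300 K
N_C, N_V = 2.8e19, 1.04e19  # effective densities of states in cm^-3
E_V, E_C = 0.0, 1.12        # band edges in eV, valence band edge as reference
T = 300.0

products = []
for e_f in (0.30, 0.56, 0.90):  # three arbitrary Fermi level positions in eV
    n, p = carrier_densities(e_f, E_C, E_V, N_C, N_V, T)
    products.append(n * p)
    print(f"E_F = {e_f:.2f} eV: n = {n:.3e}, p = {p:.3e}, n*p = {n * p:.3e}")

# The intrinsic density follows from n*p = n_i^2
n_i = math.sqrt(products[0])
print(f"n_i = {n_i:.3e} cm^-3")
```

Shifting the Fermi level trades electrons for holes, but the product stays fixed by the band gap and the temperature alone.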

For intrinsic semiconductors, the Fermi level can be obtained from the requirement that the numbers of electrons and holes are equal, $n=p=n_i$, and is expressed by

$$E_i = \frac{E_c + E_v}{2} + \frac{3kT}{4} \ln\left(\frac{m_p}{m_n}\right) \tag{4.4}$$

where $m_n$ and $m_p$ are the effective masses of electrons and holes, respectively. The intrinsic Fermi level $E_i$ is located approximately in the middle of the forbidden gap, since the deviation due to the second term of the sum is only of the order of 0.01 eV. Intrinsic semiconductors are rather weak conductors, and their conductivity depends strongly on the purity of the material and on its temperature.
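The size of the second term in equation 4.4 can be estimated with a short calculation. This is a sketch only: the effective masses below (in units of the free electron mass) are assumed, commonly quoted density-of-states values for silicon and do not come from this work.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def intrinsic_level_offset_ev(m_p_over_m_n, temp_k):
    """Second term of eq. 4.4: (3kT/4) * ln(m_p / m_n), in eV."""
    return 0.75 * K_B_EV * temp_k * math.log(m_p_over_m_n)


# Assumed effective masses for silicon, in units of the free electron mass
M_N, M_P = 1.08, 0.56

offset = intrinsic_level_offset_ev(M_P / M_N, 300.0)
print(f"E_i - (E_c + E_v)/2 = {offset * 1e3:.1f} meV")
```

With these assumed masses the offset is negative and its magnitude is on the order of 0.01 eV, i.e. small compared with the band gap.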

### Extrinsic semiconductors

The conductivity of a semiconductor can be improved by doping, which consists of adding small amounts of impurities to the material. Impurities replace crystal lattice atoms, thus introducing new energy levels in the forbidden gap of the material (Figure 4.5). Such doped semiconductors are called extrinsic semiconductors. The doping material is chosen so that it has a different number of valence electrons than a semiconductor atom, thus adding new electrons or holes. Dopants that bring extra electrons are called donors, and those that bring extra holes are called acceptors.

In the case of silicon, donor atoms have five valence electrons (e.g. phosphorus or arsenic) and provide one additional electron. Conversely, acceptor atoms have three valence electrons (e.g. boron) and provide one additional hole. It should be noted that a moving hole is more than a missing electron whose place is filled by a neighboring electron. This follows from quantum mechanical considerations and is experimentally verified in the Hall experiment [145].



**Figure 4.5.** Bond representation of n-type (left) and p-type (right) semiconductors. (from [147])

### Carrier generation and recombination

Free electrons and holes may be generated by lifting electrons from the valence band into the conduction band. This can occur through thermal generation of charge carriers, optical excitation (electromagnetic radiation), or ionization by penetrating charged particles.

Thermal generation has a negative effect on the operation of radiation detectors, since it increases the noise and degrades the signal. In some semiconductors, the band gap is sufficiently small that, at room temperature, electrons may be thermally excited directly from the valence to the conduction band. Detectors made of such materials (e.g. Ge) need to be cooled to low temperature during operation.

Generation of charge carriers by electromagnetic radiation is the basis of photodetectors and solar cells. A photon is absorbed and its energy is used to lift an electron from the valence band into the conduction band if the photon energy is greater than or equal to the band gap $E_G$ (Figure 4.6). Absorption of photons with energies below $E_G$ can occur if there are local states in the band gap due to lattice imperfections.

In radiation detectors, the signal is generated by ionization due to penetrating charged particles. When radiation interacts in a semiconductor, the energy deposition always leads to the creation of equal numbers of holes and electrons, regardless of whether the semiconductor is doped or not. The shape of the ionization path depends on the nature of the incident radiation. For visible and ultraviolet light, a photon generally produces a single electron-hole pair and is absorbed close to the surface. In the case of X-rays, most electron-hole pairs are generated in a small spatial region around the interaction point.



**Figure 4.6.** Generation of charge carriers due to the photon interaction. (from [145])

If the electric field in a semiconductor is increased above a certain value, a charge carrier may be accelerated enough to generate electron-hole pairs. This is called the avalanche process. For example, an electron may gain sufficient kinetic energy before colliding with an electron in the valence band that, upon transfer of part of this energy, the impacted electron makes an upward transition to the conduction band. The generated carrier pair is in turn accelerated and collides with other valence electrons, and so on. This is also referred to as the impact ionization process, and may result in breakdown of the p-n junction [147].

When an excess of minority carriers is introduced (e.g. electrons in a *p*-type device), the system works to return to thermal equilibrium. The transition back to equilibrium is due to the recombination of the excess minority carriers with the majority carriers: when an electron makes a downward transition from the conduction band to the valence band, an electron-hole pair is annihilated.

The recombination process differs significantly in direct-band and indirect-band semiconductors. For the direct-band semiconductors, the probability is high that electrons and holes will recombine directly, because the bottom of the conduction band and the top of the valence band have the same momentum. In this instance, no additional momentum is needed for the transition across the band gap.

For indirect-band semiconductors, like silicon, direct recombination is highly unlikely, because the electrons at the bottom of the conduction band have nonzero momentum with respect to the holes at the top of the valence band. A direct transition that conserves both energy and momentum is not possible without a simultaneous lattice interaction. Therefore recombination occurs as a two-step process through intermediate-level states in the forbidden gap between the conduction band and the valence band [148].

The replacement of an atom in the lattice by a different atom creates localized energy levels in the band gap (Figure 4.7). In the case of donors, the introduced energy level is very close to the conduction band. At room temperature, all donor states are ionized and all donor electrons are transferred to the conduction band. Therefore the concentration of electrons, n, is equal to the concentration of donor atoms, $N_D$.

**Figure 4.7.** Energy band model of extrinsic *n*-type and *p*-type semiconductors. (from [145])

Donor-doped materials are called n-type semiconductors. Similar considerations can be made for acceptor-type dopants. In that case, an additional energy level is placed close to the valence band. In order to form a valence bond with the crystal atoms, an acceptor traps an electron from the valence band, thus leaving a hole behind. The concentration of created holes, p, is equal to the concentration of acceptor atoms, $N_A$. Acceptor-doped materials are called p-type semiconductors. In both cases, the Fermi level is shifted away from the intrinsic level $E_i$, towards the conduction band for donors and towards the valence band for acceptors. The new Fermi levels are expressed by

$$E_{F} = \begin{cases} E_{i} + kT \ln\left(\frac{N_{D}}{n_{i}}\right), & \text{for donor dopants} \\ E_{i} - kT \ln\left(\frac{N_{A}}{n_{i}}\right), & \text{for acceptor dopants} \end{cases} \tag{4.5}$$

According to the mass action law (eq. 4.3), the increase of majority carriers (electrons in n-type materials and holes in p-type materials) is accompanied by a decrease of minority carriers. The electrical conductivity is almost exclusively determined by the flow of majority carriers, and the minority carriers play only a small role.
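As a worked illustration of equations 4.3 and 4.5, the sketch below evaluates the Fermi level shift and the minority carrier density for an n-type sample. The intrinsic density `N_I` and the donor concentration `N_D` are assumed round numbers chosen for illustration only.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def fermi_shift_ev(dopant_density, n_i, temp_k):
    """Magnitude of E_F - E_i from eq. 4.5, in eV."""
    return K_B_EV * temp_k * math.log(dopant_density / n_i)


def minority_density(majority_density, n_i):
    """Minority carrier density from the mass action law n*p = n_i^2."""
    return n_i ** 2 / majority_density


N_I = 1.0e10  # assumed intrinsic carrier density in cm^-3
N_D = 1.0e15  # assumed donor concentration in cm^-3

shift = fermi_shift_ev(N_D, N_I, 300.0)
p_minority = minority_density(N_D, N_I)

print(f"E_F - E_i = {shift:.3f} eV (towards the conduction band)")
print(f"n = {N_D:.1e} cm^-3, p = {p_minority:.1e} cm^-3")
```

Under these assumptions the electrons outnumber the holes by ten orders of magnitude, which is why the conductivity is carried almost entirely by the majority carriers.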

### *p-n* junction

The electron-hole pairs in semiconductor detectors are created by ionization by an impinging particle, or by electromagnetic radiation. These electron-hole pairs can be separated by an electric field. Different sensor structures exist to limit the leakage current in the device. For silicon, a reverse-biased p-n junction is used, where the depleted region serves as the active volume for signal detection. For some other detector materials with inherently high resistivity, e.g. diamond, ohmic contacts are used.

The p-n junction is formed when two extrinsic semiconductors of opposite doping are joined, as illustrated in Figure 4.8. Once the bodies are joined, the electrons diffuse into the p region and the holes into the n region. This leaves a net negative charge in the p region and a net positive charge in the n region. The resulting electric field counteracts the diffusion and prevents electron-hole pairs from recombining. As a result, the region around the junction becomes free of mobile carriers and is called the depleted region. The potential difference across the depletion zone is called the junction potential, $V_j$.

**Figure 4.8.** A *p-n* junction in thermal equilibrium with no external electric field (from [149]).

The condition of zero net electron and hole currents requires that the Fermi level must be constant through the region. Therefore, the built-in potential  $V_{bi}$  ( $\Delta V$ ), or diffusion potential, is given by

$$V_{bi} = \frac{kT}{q} \ln \left( \frac{N_D N_A}{n_i^2} \right) \tag{4.6}$$

where $N_D$ and $N_A$ are the dopant concentrations of donors and acceptors. In thermal equilibrium, the total positive charge on the n-type side of the depletion region is equal to the total negative charge on the p-type side, so electrical neutrality is preserved.

The width of the depleted region at thermal equilibrium is expressed by

$$W = \sqrt{\frac{2\epsilon}{q} \left(\frac{N_D + N_A}{N_D N_A}\right) V_{bi}} \tag{4.7}$$
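Equations 4.6 and 4.7 can be evaluated for a concrete junction. The sketch below assumes a lightly doped n-type bulk with a heavily doped p-type implant. The doping levels, the intrinsic density and the relative permittivity of silicon (11.9) are illustrative assumptions, not parameters of a device described in this work.

```python
import math

Q = 1.602176634e-19               # elementary charge in C
K_B = 1.380649e-23                # Boltzmann constant in J/K
EPS_SI = 11.9 * 8.8541878128e-14  # assumed permittivity of silicon in F/cm


def built_in_potential_v(n_d, n_a, n_i, temp_k):
    """Built-in potential from eq. 4.6, in volts."""
    return K_B * temp_k / Q * math.log(n_d * n_a / n_i ** 2)


def depletion_width_cm(n_d, n_a, v_bi):
    """Depletion width at thermal equilibrium from eq. 4.7, in cm."""
    return math.sqrt(2.0 * EPS_SI / Q * (n_d + n_a) / (n_d * n_a) * v_bi)


# Assumed doping concentrations and intrinsic density, all in cm^-3
N_D, N_A, N_I = 1.0e12, 1.0e18, 1.0e10

v_bi = built_in_potential_v(N_D, N_A, N_I, 300.0)
width = depletion_width_cm(N_D, N_A, v_bi)

print(f"V_bi = {v_bi:.3f} V")
print(f"W    = {width * 1e4:.1f} um")
```

Since the doping densities are given in cm^-3 and the permittivity in F/cm, the width is obtained directly in centimeters.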

When an external potential is applied, thermal equilibrium no longer holds. If the external voltage is applied with the same polarity as the built-in potential, i.e. the junction is reverse-biased, the depleted region widens.

In the case of a reverse-biased p-n junction, the depletion widths on the n- and the p-side of the junction are

$$x_n = \sqrt{\frac{2\epsilon V_b}{qN_D (1 + N_D/N_A)}}, \qquad x_p = \sqrt{\frac{2\epsilon V_b}{qN_A (1 + N_A/N_D)}} \tag{4.8}$$

where $V_b$ is the applied reverse bias voltage. In this case, the junction potential, i.e. the voltage dropped across the p-side of the depletion region, $qN_A x_p^2/(2\epsilon)$, is

$$V_j = \frac{V_b}{1 + N_A/N_D} \tag{4.9}$$

For an asymmetrical junction with $N_D \ll N_A$ the junction potential is

$$V_j \approx \frac{N_D}{N_A} V_b \tag{4.10}$$

and the junction potential is approximately equal to the potential of the p contact, so nearly all of the bias voltage is dropped across the lightly doped n-region of the depletion width (Figure 4.9). Increasing the bias voltage increases the depletion width, which in turn increases the signal charge and reduces the electronic noise. There is a limit, however, since electric fields above $10^5\ V/cm$ could ultimately lead to a destructive avalanche process. This can be offset by reducing the dopant concentration, which widens the depletion region at a given bias voltage.
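The behavior of the asymmetrical junction can be made concrete with a small numerical sketch. It evaluates the depletion widths of equation 4.8, computes the voltage dropped across the p-side directly as $qN_A x_p^2/(2\epsilon)$, and estimates the peak field at the junction as $qN_D x_n/\epsilon$ (the standard abrupt-junction result, used here as an assumption). Doping levels, bias voltage and the silicon permittivity are illustrative values only.

```python
import math

Q = 1.602176634e-19               # elementary charge in C
EPS_SI = 11.9 * 8.8541878128e-14  # assumed permittivity of silicon in F/cm


def depletion_widths_cm(n_d, n_a, v_b):
    """Depletion widths (x_n, x_p) on the n- and p-side from eq. 4.8."""
    x_n = math.sqrt(2.0 * EPS_SI * v_b / (Q * n_d * (1.0 + n_d / n_a)))
    x_p = math.sqrt(2.0 * EPS_SI * v_b / (Q * n_a * (1.0 + n_a / n_d)))
    return x_n, x_p


def p_side_voltage_drop(n_a, x_p):
    """Voltage dropped across the depleted p-side, q*N_A*x_p^2 / (2*eps)."""
    return Q * n_a * x_p ** 2 / (2.0 * EPS_SI)


def peak_field_v_per_cm(n_d, x_n):
    """Peak electric field at the junction of an abrupt junction."""
    return Q * n_d * x_n / EPS_SI


# Assumed values: lightly doped n bulk, heavily doped p implant, 100 V bias
N_D, N_A, V_B = 1.0e12, 1.0e18, 100.0

x_n, x_p = depletion_widths_cm(N_D, N_A, V_B)
v_j = p_side_voltage_drop(N_A, x_p)
e_max = peak_field_v_per_cm(N_D, x_n)

print(f"x_n = {x_n * 1e4:.1f} um, x_p = {x_p * 1e4:.2e} um")
print(f"V_j = {v_j:.3e} V, (N_D/N_A)*V_b = {N_D / N_A * V_B:.3e} V")
print(f"E_max = {e_max:.3e} V/cm")
```

The depleted charge on both sides balances ($N_D x_n = N_A x_p$), almost the entire depletion width lies in the lightly doped bulk, and for these assumed values the peak field stays well below the $10^5$ V/cm limit mentioned above.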



**Figure 4.9.** A diode is formed by introducing a highly doped surface layer into a lightly doped bulk. The depletion zone then extends into the bulk. Metallization layers provide electrical contacts to the doped $p^+$ and $n^+$ layers that form the junction and the back electrode (from [150]).

### Photodiodes

A photodiode contains a depleted semiconductor region with a high electric field that serves to separate photo-generated electron-hole pairs [147]. For high-speed operation, the depleted region should be thin to reduce the carrier transit time. However, in order to increase the number of generated electron-hole pairs per photon, the depletion region must be sufficiently thick to absorb more of the incident light. Therefore, the trade-off between response speed and quantum efficiency must be considered.

Photodiodes used in the visible or near-infrared range are reverse-biased with a moderately high bias voltage, since this reduces the carrier transit time and the diode capacitance. The reverse voltage is kept sufficiently low to avoid the avalanche process. The main parameters of a photodiode are its quantum efficiency, response speed, and device noise.

The quantum efficiency is the number of electron-hole pairs generated per incident photon, and it is given by

$$\eta = \frac{I_{ph}}{q\Phi} = \frac{I_{ph}}{q} \left(\frac{h\nu}{P_{opt}}\right) \tag{4.11}$$

where $I_{ph}$ is the photocurrent, $\Phi$ is the photon flux ($= P_{opt}/h\nu$), q is the carrier charge, and $P_{opt}$ is the optical power. The maximum value of the quantum efficiency is one. Any reduction is due to current loss by recombination, incomplete absorption, reflection, etc. A related parameter is the responsivity, given by

$$\Re = \frac{I_{ph}}{P_{opt}} = \frac{\eta q}{h\nu} = \frac{\eta \lambda(\mu m)}{1.24} \tag{4.12}$$

which increases linearly with wavelength. For a given semiconductor, the wavelength range over which an appreciable photocurrent can be generated is limited. The long-wavelength cutoff $\lambda_c$ is established by the energy gap of the semiconductor, for example 0.65 eV for Ge and 1.12 eV for Si at 300 K. The short-wavelength cutoff arises because the absorption coefficient $\alpha$ becomes very high, so that the radiation is absorbed close to the surface, where the photocarriers are more likely to recombine before they can be collected in the p-n junction.
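The relations for quantum efficiency, responsivity and long-wavelength cutoff can be tied together in a few lines. The sketch below uses $hc \approx 1.23984$ eV·µm (rounded to 1.24 in equation 4.12). The photocurrent, optical power and wavelength are hypothetical example values.

```python
H_C_EV_UM = 1.23984  # h*c in eV*um (rounded to 1.24 in eq. 4.12)


def quantum_efficiency(i_ph_a, p_opt_w, wavelength_um):
    """Eq. 4.11: electrons collected per incident photon.

    With the photon energy expressed in eV, the carrier charge q cancels:
    eta = I_ph * E_photon[eV] / P_opt.
    """
    photon_energy_ev = H_C_EV_UM / wavelength_um
    return i_ph_a * photon_energy_ev / p_opt_w


def responsivity_a_per_w(eta, wavelength_um):
    """Eq. 4.12: photocurrent per unit optical power, in A/W."""
    return eta * wavelength_um / H_C_EV_UM


def cutoff_wavelength_um(e_gap_ev):
    """Long-wavelength cutoff set by the energy gap, in um."""
    return H_C_EV_UM / e_gap_ev


# Hypothetical measurement: 0.4 uA photocurrent for 1 uW at 620 nm
I_PH, P_OPT, WAVELENGTH = 0.4e-6, 1.0e-6, 0.62

eta = quantum_efficiency(I_PH, P_OPT, WAVELENGTH)
resp = responsivity_a_per_w(eta, WAVELENGTH)

print(f"eta = {eta:.3f}, responsivity = {resp:.3f} A/W")
print(f"cutoff for 1.12 eV gap: {cutoff_wavelength_um(1.12):.2f} um")
print(f"cutoff for 0.65 eV gap: {cutoff_wavelength_um(0.65):.2f} um")
```

The responsivity recovered from the quantum efficiency equals the ratio of photocurrent to optical power, which is a quick consistency check between equations 4.11 and 4.12.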

The response speed is determined by the drift time in the depletion region, diffusion of carriers and capacitance of the depletion region. Carriers generated outside the depletion region must diffuse to the junction resulting in considerable time delay. The junction should be formed very close to the surface to minimize the diffusion effect. Most photons will be absorbed when the depletion region is sufficiently thick. If the depletion layer is too wide, transit-time effects will limit the frequency response. It also should not be too thin, or excessive capacitance C will result in a large  $R_LC$  time constant, where  $R_L$  is the load resistance, i.e. the resistance "seen" by the diode [147].
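The trade-off between transit time and the $R_LC$ time constant can be sketched with a simple parallel-plate model. The assumptions are loud ones: carriers drift at a constant saturation velocity `V_SAT`, the depletion capacitance is $\epsilon A / W$, and the diode area and load resistance are hypothetical. Diffusion outside the depletion region is ignored.

```python
EPS_SI = 11.9 * 8.8541878128e-14  # assumed permittivity of silicon in F/cm
V_SAT = 1.0e7                     # assumed saturation drift velocity in cm/s


def transit_time_s(width_cm):
    """Time for a carrier to cross the depletion region at V_SAT."""
    return width_cm / V_SAT


def rc_time_s(width_cm, area_cm2, r_load_ohm):
    """R_L*C time constant with a parallel-plate depletion capacitance."""
    capacitance_f = EPS_SI * area_cm2 / width_cm
    return r_load_ohm * capacitance_f


def balanced_width_cm(area_cm2, r_load_ohm):
    """Width at which the transit time equals the R_L*C time constant."""
    return (r_load_ohm * EPS_SI * area_cm2 * V_SAT) ** 0.5


AREA = 1.0e-4   # hypothetical diode area in cm^2 (100 um x 100 um)
R_LOAD = 50.0   # hypothetical load resistance in ohm

for width_um in (0.5, 2.0, 20.0):
    w = width_um * 1e-4  # convert um to cm
    print(f"W = {width_um:5.1f} um: transit = {transit_time_s(w):.2e} s, "
          f"R_L*C = {rc_time_s(w, AREA, R_LOAD):.2e} s")

w_bal = balanced_width_cm(AREA, R_LOAD)
print(f"balanced width = {w_bal * 1e4:.2f} um")
```

In this simplified model a thin depletion layer is limited by capacitance and a thick one by transit time, so the fastest response is obtained near the width at which the two contributions are equal.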

# 4.2 Charge-Coupled Devices (CCD) image sensors

A charge-coupled device (CCD) can be used either as an image sensor or as a shift register. In fact, when used in imaging array systems such as a camera or video recorder, it functions as both. As a photodetector, it has also been called a charge-coupled image sensor or charge-transfer image sensor. The concept of the CCD was introduced by Boyle and Smith in 1970, in a paper that already mentioned the possibility of using it as an imaging device [28]. The importance of this invention was recognized by awarding them the Nobel Prize in Physics in 2009 [151].

During the light exposure of the CCD, the photogenerated carriers are integrated, and the signal is stored in the form of a charge packet. Since each CCD pixel is basically a capacitor, it has to be operated in a nonequilibrium condition under a large gate pulse. The structure of the surface-channel CCD image sensor is similar to that of the CCD shift register, with the exception that the gates are semitransparent to let light pass through (Figure 4.10). Common materials for the gates are metal, polysilicon, and silicide. Alternatively, the CCD can be illuminated from the back of the substrate to avoid light absorption by the gate. In this configuration, the semiconductor has to be thinned down so that most of the light can be absorbed within the depletion region at the top surface [147].

- (a) Charge storage
- (b) Charge transfer

**Figure 4.10.** Charge-coupled device: principle of operation (from [146])

The photodetector used in CCDs, the photogate, is a MOS capacitor which converts impinging photons into stored charge rather than into photocurrent or voltage signals. By applying a synchronized sequence of voltages to the gates, the charge is transferred to the neighboring element (the one with the deeper potential well), so that the device operates as a shift register. By repeating this process, the accumulated photo charge of each pixel eventually reaches the end of the row, where it is processed by a charge-sensitive amplifier and turned into a voltage or current signal [147].

An example of how the CCD functions is given in Figure 4.11. In this example the pixel contains three electrodes. Voltages are applied to these electrodes in a specific sequence in order to transfer the charge. Six clock cycles are needed to transfer the charge from one pixel to another. To transfer the charge across the pixel array, the charge is shifted down the column and then horizontally. The charge is subsequently deposited on a storage capacitor and transferred to the readout line by the output amplifier [150].
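The readout order described above can be illustrated with a toy model that treats the pixel array as a set of shift registers. This is a sketch under simplifying assumptions: each shift moves a whole charge packet by one pixel without loss, the individual electrode phases are not modeled, and the output node is assumed to sit at the end of a serial register below the last row. The function name `ccd_readout` is hypothetical.

```python
def ccd_readout(frame):
    """Return the charge packets of a 2-D frame in CCD readout order.

    Rows are shifted down one at a time into a serial register, which is
    then shifted horizontally, pixel by pixel, towards the output node.
    """
    rows = [list(row) for row in frame]  # work on a copy of the frame
    n_rows, n_cols = len(rows), len(rows[0])
    output = []
    for _ in range(n_rows):
        # Vertical shift: the bottom row enters the serial register, every
        # other row moves down by one, and an empty row enters at the top.
        serial = list(rows[-1])
        rows = [[0] * n_cols] + rows[:-1]
        # Horizontal shift: the packet next to the output node is read,
        # then the remaining packets move one pixel towards the output.
        for _ in range(n_cols):
            output.append(serial[-1])
            serial = [0] + serial[:-1]
    return output


frame = [[1, 2, 3],
         [4, 5, 6]]
readout = ccd_readout(frame)
print(readout)
```

The packet farthest from the output node undergoes the largest number of shifts, which is why the per-transfer efficiency becomes important for large arrays.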

A big advantage of CCDs is the very high granularity and the possibility to design devices with large areas, while avoiding dead zones. An additional advantage of CCDs is their low noise. A drawback of CCDs is the relatively slow readout, especially for large-area devices. Since charges need to be moved through a large number of elements, the transfer efficiency is a very important parameter of a CCD, and it is desirable for it to be as close to 100 % as possible [150].
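The importance of the transfer efficiency follows from simple compounding: if a fraction CTE of the charge survives each transfer, the fraction remaining after N transfers is CTE to the power N. The sketch below uses a hypothetical 2048 × 2048 array, for which the farthest pixel needs roughly 4096 pixel shifts, and two assumed per-shift efficiencies.

```python
def remaining_fraction(cte_per_transfer, n_transfers):
    """Fraction of a charge packet left after n_transfers transfers."""
    return cte_per_transfer ** n_transfers


# Hypothetical 2048 x 2048 array: the farthest pixel needs about
# 2048 vertical plus 2048 horizontal pixel shifts.
N_TRANSFERS = 2048 + 2048

good = remaining_fraction(0.99999, N_TRANSFERS)
poor = remaining_fraction(0.9999, N_TRANSFERS)

print(f"CTE 0.99999: {good:.3f} of the charge reaches the output")
print(f"CTE 0.9999 : {poor:.3f} of the charge reaches the output")
```

Even a per-transfer loss of only one part in ten thousand removes about a third of the signal from the farthest pixel in this example.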

Another aspect is their relatively modest radiation hardness, due to the displacement damage caused by high-energy particles and photons [152]. This affects the performance of CCDs by introducing defects and trapping centers [153]. Different CCD architectures have been studied in order to improve the key parameters and to make them practical devices in various fields of use. Buried-channel and two-phase CCDs were designed in order to improve the charge transfer efficiency and speed [154, 147].

**Figure 4.11.** Upper right: schematic cross-sectional view of a CCD array. Upper left: Timing diagram of transfer voltage. Lower right: Transfer of charge between two adjoining pixels. Lower left: Sequential readout of pixels. Image from [150]

# 4.3 Monolithic CMOS image sensors

CMOS image sensors have been used more extensively since the 1990s. Their novelty lies in integrating more functionality within the pixels, and in taking advantage of conventional CMOS scaling and inexpensive technology. CMOS image sensors are mixed-signal circuits containing pixels, analog signal processors, analog-to-digital converters, bias generators, timing generators, digital logic and memory [6]. The basic architecture of a CMOS image sensor is shown in Figure 4.12.



*Figure 4.12.* Architecture of the CMOS image sensor. Image from [155]

The first CMOS pixel arrays were passive pixel sensors (PPS) [156]. A passive pixel consists only of a photodiode and an access transistor and operates similarly to a DRAM cell, as shown in Figure 4.13(a). When the access transistor is activated, the photodiode is connected via a vertical column bus to a charge-integrating amplifier. With the access transistor off, the photodiode discharges at a rate proportional to the incident light intensity. When the photodiode is accessed, the voltage on the photodiode is reset to the column bus voltage, and the charge, proportional to the optical signal, is converted to a voltage by the amplifier [157, 31].

The single-transistor photodiode passive pixel allows the highest fill factor (up to 90%). On the other hand, PPS sensors suffer from low sensitivity and high readout noise, because the large bus capacitances are directly connected to the pixel circuit during readout. This in turn limits scalability to larger array sizes or higher frame rates. PPS arrays are also prone to column fixed pattern noise (FPN) due to column amplifier mismatch [31, 147].

First experiences with the CMOS PPS led to the finding that a buffer/amplifier could potentially improve the performance of the pixel. A sensor with an active amplifier within each pixel is referred to as an active pixel sensor, or APS. A CMOS APS has a smaller pixel fill factor than a PPS, due to the pixel-level amplifier. However, the loss in optical signal is compensated by a reduction in readout noise, leading to an increase in the signal-to-noise ratio (SNR) and dynamic range [157]. The active circuit is usually a simple source follower that acts as both an amplifier and a buffer, isolating the photodetector charge from the large capacitances of the readout buses. Power dissipation is kept low because each amplifier is only activated during readout.

APS operate in integration mode. While a PPS directly transfers the accumulated signal charges to the outside, an APS converts the charges to a voltage at the pixel level. Conventional APSs suffer from a high level of fixed pattern noise (FPN), caused by differences in transistor thresholds and gain due to wafer process variations. A typical photodiode APS structure is shown in Figure 4.13(b).



**Figure 4.13.** Passive and active CMOS pixel sensors. Notice the added source follower transistor in APS (SF). Image from [31]

Other architectures were developed to improve the characteristics of the first APS. The photogate (PG) type APS (Figure 4.14(a)) was introduced in 1993 [157, 158]. It employs the operating principle of CCDs for integration, transport and readout inside each pixel, while keeping the ability for random-access readout. Its charge transfer and correlated double sampling permit low-noise operation, making it suitable for high-performance and low-light applications. The pinned photodiode APS (Figure 4.14(b)) has lower pixel noise and lower dark current, and offers higher sensitivity than the PG. Its drawback compared to photodiodes is that it cannot be fabricated in a standard CMOS process [159].

**Figure 4.14.** APS improved architectures

APS sensors were expanded past the photodiodes or photogates. Sometimes a non-linear output is desirable to increase the dynamic range. Such sensors are called logarithmic APS. However, logarithmic APS also exhibit large FPN, and are currently not used as extensively as before.

### Comparison of CCD and CMOS image sensors

CMOS and CCD image sensors are both made from silicon and convert incident light into electrical charge based on similar physical processes. Both technologies support photogates and photodiodes as sensing elements and have similar fundamental sensitivity in the visible and NIR ranges of the electromagnetic spectrum [160].

CCD technology was dominant after its invention in 1970, because it provided better solutions to typical problems such as FPN, and it offered a higher fill factor, smaller pixel size, larger format, etc. than CMOS, which could not compete with CCD performance. CCDs provide excellent image quality and noise performance. However, it is not practical to integrate other camera functions, such as clock drivers, logic circuits for timing control, or signal processing blocks, on the CCD chip. Most CCD camera systems therefore contain several chips. The amplitude and shape of the various clock signals controlling the charge transfer in CCDs are critical. Generating correctly sized and shaped clocks is usually done in a specialized clock driver chip [157].

The principle of light detection is the same for both CMOS sensors and CCDs, but the further stages are quite different. The accumulated charge packets in CMOS sensors are not transferred as in CCDs, but converted as early as possible by charge-sensitive circuits. The main drawback of CMOS sensors is the mismatch of the individual amplifiers in the pixels, which leads to significant fixed pattern noise (FPN). However, the issue has been mostly resolved by employing on-chip correction techniques.

One method employed to reduce the noise is correlated double sampling (CDS). Higher dark currents contribute to the larger noise of CMOS sensors compared to CCDs. A major benefit of CMOS cameras over CCDs lies in the possibility of integrating all of the electronic sensor functions on-chip, e.g. A/D conversion, signal processing, timing logic, exposure control, etc. [31].
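The effect of correlated double sampling on offset-type fixed pattern noise can be illustrated with a small simulation. The sketch assumes the common form of CDS, in which each pixel is sampled once after reset and once after integration and the two samples are subtracted. The pixel model is deliberately minimal: it contains only a static per-pixel offset, with hypothetical voltage levels, and no gain mismatch or temporal noise.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

N_PIXELS = 1000
V_RESET = 1.0   # hypothetical nominal reset level in volts
V_SIGNAL = 0.2  # hypothetical voltage swing under uniform illumination

# Static per-pixel offset, e.g. from threshold mismatch of the in-pixel
# source follower (hypothetical 10 mV spread).
offsets = [rng.gauss(0.0, 0.010) for _ in range(N_PIXELS)]

# Two samples per pixel: right after reset and after integration.
reset_samples = [V_RESET + off for off in offsets]
signal_samples = [V_RESET + off - V_SIGNAL for off in offsets]

# Without CDS only the signal sample is used; with CDS the two samples
# are subtracted, so the common offset cancels.
raw = signal_samples
cds = [r - s for r, s in zip(reset_samples, signal_samples)]

raw_spread = statistics.pstdev(raw)
cds_spread = statistics.pstdev(cds)

print(f"pixel-to-pixel spread without CDS: {raw_spread * 1e3:.2f} mV")
print(f"pixel-to-pixel spread with CDS   : {cds_spread * 1e3:.2e} mV")
```

In this model the subtraction removes the offset completely. Gain differences between pixels are not corrected by the subtraction and would need a separate correction.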

The main advantages of CMOS image sensors compared to CCDs are:

- On-chip functionality and processing. CMOS image sensors are made using standard CMOS technology, enabling monolithic integration of readout and signal processing electronics. ADCs, amplifiers, etc. are integrated on the same chip [6]. This is not the case for CCDs, where additional chips are needed to perform the required operations.
- Random pixel access. CMOS sensors allow windowing and region-of-interest (ROI) access, which generally is not available for CCD sensors [161].
- Removal of blooming and smearing effects. If the readout of a CCD is slow, or the light intensity is too high, smearing or blooming may occur during readout. This is caused by electrons spilling over from full wells into adjacent channels [146]. This effect is not found in CMOS image sensors.
- Low power consumption. The power consumption of CMOS image sensors is significantly lower than that of CCDs, for example due to the lower voltage requirement [147, 6]. Additionally, operating a CCD requires more logic, which increases the power consumption.
- Lower cost. CMOS image sensor technology scales in line with the standard CMOS technology process, which helps to reduce costs [155]. However, the lower cost of CMOS systems stems mainly from integrating the additional functions that are otherwise needed, such as timing and digitization [160], because the processes required for good imaging performance are not always those used in standard CMOS production.

However, CMOS sensors do possess several disadvantages, mainly:

- *Sensitivity*. CMOS Monolithic Active Pixel Sensors (MAPS) have a limited fill factor, and therefore have a reduced sensitivity to incident light [6].
- *Noise*. CCDs have better noise performance than CMOS sensors, due to fewer on-chip circuits and common output amplifiers, among other reasons [160]. However, noise in CCDs increases drastically with temperature [6].
- *Dynamic range*. Dynamic range is limited by the photosensitive area size, integration time and noise. CCDs still have better performance in this area [6].
- *Uniformity*. The pixel response of CMOS image sensors is not as consistent as that of CCDs. The amplifiers of the individual pixels in CMOS sensors may have different gain and offset values, which introduces further nonuniformities. While recent CMOS image sensors are improving their performance, their response is still not as good as that of CCDs, especially with regard to dark nonuniformities [160].

Development of CMOS image sensors has focused on improving properties such as noise, dynamic range and fill factor. One area where CMOS image sensors are increasingly used is high-speed imaging, where the achieved frame rates are much higher than with CCDs [152]. Owing to these developments, new applications have appeared, and automotive applications, mobile phone imaging, medical applications, etc. have been improved.

CMOS image sensors offer superior integration, power dissipation and system size at the expense of image quality (particularly in low light); however, with recent developments the gap is closing. CCDs offer superior image quality and flexibility at the expense of system size. They remain the most suitable technology for high-end imaging applications, such as digital photography [160]. A summary of the advantages of CMOS and CCD image sensors is given in Table 4.1.

### 4.4 Acquisition process and noise sources

Although the operating principles of CCD and CMOS image sensors differ, a very similar acquisition model can be proposed for both of them, as illustrated by the simplified diagram in Figure 4.15.

In short, CCDs and CMOS both transform incoming light photons into voltage output values. More precisely, these sensors are silicon-based integrated circuits including a dense matrix of photo-diodes that first convert light photons into electronic charge [162, 163]. Light photons interact with the silicon atoms generating electrons that are stored in a potential well.

| CCD                                  | CMOS                        |
|--------------------------------------|-----------------------------|
| Lower noise                          | Low power consumption       |
| Smaller pixel size                   | Single power supply         |
| Lower dark current                   | High integration capability |
| 100 % fill factor                    | Lower cost                  |
| Higher sensitivity                   | Single master clock         |
| Electronic shutter without artifacts | Random access               |

**Table 4.1.** CCD and CMOS advantages, respectively (from [6])

When the potential well is full, the pixel saturates and no further electrons are stored (with CCDs, this can lead to a blooming effect).

In the case of CCDs, the accumulated charge may then be efficiently transferred from one potential well to another across the chip, until it reaches an output amplifier where the charge is converted to a voltage output value. This voltage is then quantized to give the corresponding pixel value.

In CMOS technology, the charge generated by the impinging photons is likewise accumulated in the photo-diodes. However, unlike CCDs, CMOS pixels contain conversion electronics that perform the charge-to-voltage conversion at each pixel location. This increases noise and introduces additional fixed pattern noise sources compared to CCDs [163].



**Figure 4.15.** Diagram of the acquisition process and the noise sources in the pixel sensor (from [164])

#### **Noise sources**

Two physical phenomena are responsible for the random noise generation during the camera acquisition process: the discrete nature of light, which is behind the photon shot noise [165], and thermal agitation, which explains the random generation of electrons inside the sensor when the temperature increases. Noise determines the SNR (Signal-to-Noise Ratio) of the detectors, and ultimately, the resolution. Therefore, it is desirable to keep the noise to a minimum.

Photon shot noise. As previously mentioned, photon shot noise is caused by the discrete nature of light. Most light sources emit the photons independently of each other, leading to a Poisson noise distribution in the observed photon numbers. This implies that detection of an average number of N photons is associated with an r.m.s. noise figure  $\Delta N_{Poisson}$ , given by

$$\Delta N_{Poisson} = \sqrt{N} \tag{4.13}$$

where N is the number of photons. Poisson noise limits the SNR that a photodetector can achieve: even for an otherwise noiseless detector, eq. 4.13 gives a theoretical maximum of $SNR = N/\Delta N_{Poisson} = \sqrt{N}$.
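As a small numerical illustration of eq. 4.13, the following sketch computes the shot noise and the resulting shot-noise-limited SNR for a few photon counts. The function names and the example photon counts are chosen here for illustration only.

```python
import math


def shot_noise_rms(mean_photons: float) -> float:
    """r.m.s. photon shot noise for a mean photon count (eq. 4.13)."""
    return math.sqrt(mean_photons)


def shot_limited_snr(mean_photons: float) -> float:
    """SNR of an otherwise noiseless detector: N / sqrt(N) = sqrt(N)."""
    return mean_photons / shot_noise_rms(mean_photons)


if __name__ == "__main__":
    # Illustrative photon counts, not measured values.
    for n in (100, 10_000, 1_000_000):
        print(f"N = {n:>9}: noise = {shot_noise_rms(n):7.1f}, "
              f"SNR = {shot_limited_snr(n):7.1f}")
```

The SNR grows only with the square root of the collected signal, so a hundredfold increase in photons improves the SNR by a factor of ten.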

Dark shot noise. Some of the electrons accumulated on the potential well do not come from the photodiode but result from thermal generation. These electrons are known as dark current, since they are present and will be sensed even in the absence of light. Dark currents can be generated at different locations in the sensor and they are related to irregularities in the fundamental crystal structure of the silicon and crystal defects [162].

For an electron to contribute to the dark current, it must not only be thermally generated but must also reach the potential well. The latter happens independently for each electron. As a consequence, it can be shown that the number of thermally generated electrons reaching the potential well p is well modeled by a Poisson distribution that depends on the temperature and the exposure time. This noise is generally referred to as dark current shot noise or dark shot noise.

Thermal noise. Known also as Johnson or Nyquist noise, thermal noise is caused by the random thermal motion of the current carriers [166]. In every conductor or resistor at a temperature above absolute zero, the electrons are in random motion, and this vibration is dependent on temperature. The thermal noise voltage is

$$\overline{v} = \sqrt{4kTR\Delta f} \tag{4.14}$$

where  $\overline{v}$  is the *rms* noise signal in volts, k is the Boltzmann constant, T is the temperature in kelvin, R is the resistance in ohms and  $\Delta f$  is the frequency bandwidth in hertz [167].
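To give a feeling for the magnitudes involved, the sketch below evaluates the *rms* thermal noise voltage, taken as the square root of $4kTR\Delta f$ so that the result is in volts. The resistor value, temperature and bandwidth are assumed example values and do not describe the camera electronics.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def thermal_noise_vrms(resistance_ohm: float, temperature_k: float,
                       bandwidth_hz: float) -> float:
    """rms Johnson-Nyquist noise voltage of a resistor in volts."""
    return math.sqrt(4.0 * K_BOLTZMANN * temperature_k
                     * resistance_ohm * bandwidth_hz)


if __name__ == "__main__":
    # Assumed example: 1 kOhm resistor at 300 K observed over 1 MHz.
    v = thermal_noise_vrms(1e3, 300.0, 1e6)
    print(f"rms thermal noise: {v * 1e6:.2f} uV")
```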

In CMOS sensors, reset operation causes noise, called kTC noise, which is a special case of thermal noise. Translated into effective charge noise  $\Delta Q_{reset}$  on the detection capacitance C, reset noise at the temperature T is

$$\Delta Q_{reset} = \sqrt{kTC} \tag{4.15}$$

The schematic used to derive the equation is presented in Figure 4.16.

The kTC noise does not depend on the resistance of the transistor (marked as  $R_{ON}$  in the picture). As the resistance increases, the thermal noise increases, but the noise bandwidth decreases by the same factor, so that the kTC noise is independent of the resistance [167, 150].



**Figure 4.16.** Equivalent circuits of kTC noise.  $R_{on}$  is the ON-resistance of the reset transistor and  $C_{PD}$  is the accumulation capacitance. Image from [167]
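The cancellation of the resistance can be checked numerically. The sketch below multiplies the thermal noise density $4kTR$ by the equivalent noise bandwidth of a first-order RC low-pass, $1/(4RC)$, and converts the result into electrons. The RC low-pass model of the reset switch and the capacitance of 10 fF are assumptions made for this example, not values from the sensor documentation.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
Q_ELECTRON = 1.602176634e-19  # C


def reset_noise_electrons(capacitance_f: float, temperature_k: float,
                          r_on_ohm: float) -> float:
    """kTC (reset) noise in electrons, computed via the RC noise bandwidth."""
    density = 4.0 * K_BOLTZMANN * temperature_k * r_on_ohm  # V^2/Hz
    bandwidth = 1.0 / (4.0 * r_on_ohm * capacitance_f)      # Hz, first-order RC
    v_rms = math.sqrt(density * bandwidth)                  # equals sqrt(kT/C)
    return capacitance_f * v_rms / Q_ELECTRON               # Q = C * V


if __name__ == "__main__":
    # Assumed example: 10 fF accumulation capacitance at 300 K.
    for r_on in (1e3, 1e6):
        n_e = reset_noise_electrons(10e-15, 300.0, r_on)
        print(f"R_on = {r_on:9.0f} Ohm -> reset noise = {n_e:.1f} e-")
```

Both resistance values give the same result of about 40 electrons, which agrees with eq. 4.15 evaluated directly.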

Flicker (1/f) noise. The major cause of 1/f noise (sometimes called pink noise) in semiconductor devices is traceable to properties of the surface of the material. The generation and recombination of carriers in surface energy states are important factors [166].

Fixed pattern noise (FPN). Fixed pattern noise consists of two components, photo response non-uniformity (PRNU) and dark current non-uniformity (DCNU). The PRNU describes the differences in pixel responses to uniform light sources. Different pixels will not produce the same number of electrons from the same number of impacting photons, since not all impinging photons are absorbed in the photodiode. This is caused by variations in pixel geometry, substrate or micro-lenses [168]. The effect of PRNU is proportional to illumination and is not present in the absence of signal. DCNU represents the variations in the dark current generation rates from pixel to pixel. This variation is related to the material characteristics of the pixels.

Other noise sources in the detectors could be *column noise*, due to the differences in the column amplifiers, and also *quantization noise*.

### 4.5 Sensor characterization and comparison

In order to provide sensor characteristics, it is important to use a standardized test method which provides consistent, quantitative, and verifiable performance data such as read noise, dark current, full-well capacity, dynamic range, linearity, gain and sensitivity. A test method commonly used to obtain these parameters is the Photon Transfer Curve (PTC) [169].

#### **Photon Transfer Curve**

An ideal PTC response from a camera system exposed to a uniform light source is illustrated in Figure 4.17. For a sub-array of pixels, the root-mean-square (*rms*) noise is plotted as a function of the average signal at different light levels (or exposure times). Four distinct noise regimes exist in a PTC. The first regime, read noise, represents the random noise measured in the absence of light, which often includes several different noise contributors. As the illumination is increased, read noise gives way to photon shot noise, which is shown in the middle region of the curve. Since the plot in Figure 4.17 uses log-log coordinates, the shot noise is characterized by a line with a slope of 1/2.

**Figure 4.17.** An ideal PTC curve showing four typical noise areas of operation (from [169])

The third regime is associated with pixel FPN, which produces a characteristic slope of unity because signal and FPN scale together. The fourth regime occurs when the sub-array of pixels enters the full-well area of operation. In this region, the noise modulation typically decreases as saturation is approached. Although shot noise always decreases, for some arrays the FPN may actually increase (CMOS detectors often exhibit this characteristic). This happens because some columns of the array may reach full well before others, generating a fixed-pattern, column-to-column noise. In either case, a rapid deviation of the noise from the slope-1/2 or slope-1 lines indicates that full-well operation has been reached.
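The first three regimes can be reproduced with a simple noise model in which read noise, shot noise and FPN add in quadrature. This is a common textbook model and only a sketch: saturation is not modeled, and the read noise of 10 electrons and the PRNU factor of 1 % are assumed example values.

```python
import math


def total_noise_e(signal_e: float, read_noise_e: float, prnu: float) -> float:
    """Quadrature sum of read noise, shot noise and FPN, all in electrons."""
    shot = math.sqrt(signal_e)   # photon shot noise
    fpn = prnu * signal_e        # fixed pattern noise scales with the signal
    return math.sqrt(read_noise_e ** 2 + shot ** 2 + fpn ** 2)


def loglog_slope(signal_e: float, read_noise_e: float, prnu: float,
                 step: float = 1.01) -> float:
    """Local slope of log(noise) versus log(signal) around signal_e."""
    n1 = total_noise_e(signal_e, read_noise_e, prnu)
    n2 = total_noise_e(signal_e * step, read_noise_e, prnu)
    return math.log(n2 / n1) / math.log(step)


if __name__ == "__main__":
    # Assumed example parameters: 10 e- read noise, 1 % PRNU.
    for s in (1.0, 1e3, 1e6):
        print(f"signal = {s:9.0f} e-: slope = {loglog_slope(s, 10.0, 0.01):.3f}")
```

At very low signal the slope is close to 0 (read noise), at intermediate signal close to 1/2 (shot noise), and at very high signal close to 1 (FPN).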

#### EMVA characterization standard

An additional method of sensor characterization is given by the European Machine Vision Association (EMVA), which is also based on the PTC, with small modifications [170]. The EMVA standard presumes an ideal image sensor for characterization, i.e. the following conditions need to be met:

- The number of photons collected by a pixel depends on the product of irradiance E (units  $W/m^2$ ) and exposure time  $t_{\rm exp}$  (units s), i.e., the radiative energy density  $Et_{\rm exp}$  at the sensor plane.
- The sensor is linear, i.e., the digital signal *y* increases linearly with the number of photons received.
- All noise sources are considered to be white with respect to time and space. The parameters describing the noise are invariant with respect to time and space.
- Only the total quantum efficiency is wavelength dependent. The effects caused by light of different wavelengths can be independently considered.
- Only the dark current is temperature dependent.

A real sensor will be more or less different from the ideal sensor and the given assumptions. As long as the difference is not very large, the EMVA standard can be used.

#### EMVA characterization method

The setup for measuring sensitivity, linearity and non-uniformity according to the EMVA standard is shown in Figure 4.18. During the measurement, all sensor parameters, as well as the temperature, need to be kept constant. Certain parameters, like offset and gain, must be set properly in order to measure the dark current correctly. The offset is set as low as possible, but large enough to ensure that the dark signal, including the temporal noise and spatial non-uniformity, does not cause significant underflow, i.e. in the absence of light, less than 0.5% of the pixels may have the value zero.
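The underflow criterion can be checked on a dark frame with a few lines of code. The sketch below is illustrative only: the function names and the representation of a frame as a list of rows are assumptions, and the 0.5 % limit is the one quoted above.

```python
def underflow_fraction(dark_frame):
    """Fraction of pixels in a dark frame whose gray value is zero."""
    pixels = [value for row in dark_frame for value in row]
    zeros = sum(1 for value in pixels if value == 0)
    return zeros / len(pixels)


def offset_is_acceptable(dark_frame, limit=0.005):
    """True if less than `limit` (0.5 %) of the dark pixels underflow to zero."""
    return underflow_fraction(dark_frame) < limit


if __name__ == "__main__":
    # Hypothetical 10 x 100 dark frame with four pixels clipped to zero.
    frame = [[3] * 100 for _ in range(10)]
    for col in range(4):
        frame[0][col] = 0
    print(underflow_fraction(frame), offset_is_acceptable(frame))
```

If the check fails, the offset would have to be raised before the dark current is measured.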

The measurement procedure requires taking at least 50 equally spaced exposure times or irradiation values, resulting in digital gray values between the dark gray value and the maximum digital gray value, with two images for each irradiation level. It also requires taking the frames at each exposure time with and without light. For the measurement, a light source is required that irradiates the image sensor homogeneously without a mounted lens. Thus the sensor is illuminated by a diffuse disk-shaped light source with a diameter D placed in front of the camera, at a distance d from the sensor plane (Figure 4.19(a)). Each pixel must receive light from the whole disk under an angle.

**Figure 4.18.** Characterization setup for indirect method. 1-Light source, 2-Monochromator, 3-Integrating sphere, 4-Photodiode with 5-Amplifier

It is important that the light incident on the sensor is homogeneous. To achieve that, the visible light setup contains an integrating sphere. The homogeneity of the irradiation depends on the diameter of the sensor, D', as shown in Figure 4.19(b). For a distance $d = 8D$ and a sensor diameter D' equal to the diameter of the light source, the irradiance at the edge of the sensor decreases by only around 0.5%. Thus the diameter of the sensor area should not be larger than the diameter of the opening of the light source.



**Figure 4.19.** *a*): Optical setup for the irradiation of the image sensor by a disk-shaped light source, b): Relative irradiance at the edge of an image sensor with a diameter D', illuminated by a perfect integrating sphere with an opening D at a distance d = 8D, according to the EMVA standard (from [170]).

### 4.6 Image sensors characterization results

This section presents the characterization results for the image sensors used. As mentioned, the camera was characterized using three methods: the EMVA standard, the PTC, and direct X-ray exposure with different targets.

#### Direct X-ray exposure

This was the first characterization method conducted, and it was used to better understand the parameters of the CMOS sensor. The characterized sensor was the CMOSIS CMV2000 [125], with a maximum resolution of 2.2 MPix. Direct X-ray exposure is suitable because the parameters of the various targets are well known. The setup is shown in Figure 4.20, and the characterization was conducted at the ETP Institute, KIT [171].

The X-ray transition energies [172] of the targets used are listed in Table 4.2.

The principle of the setup relies on the interaction of X-rays with different materials. High-energy X-ray photons are created in the X-ray tube and directed onto the target. Due to the bombardment of the target with photons,  $K_{\alpha_1}$  emission (photons) occurs. Since the materials used as targets have a known  $K_{\alpha}$  energy, we can use this information to map the acquired gray value (ADC) from the sensor to the collected number of electrons.  $K_{\alpha}$  is the strongest X-ray spectral line for an element irradiated with energy sufficient to cause X-ray emission [173]. Once all the values are mapped to the corresponding ADC values, the conversion ratio (in  $e^-/ADC$ ) can be determined from the resulting curve.

**Figure 4.20.** Direct X-ray irradiation setup for pixel sensor calibration at ETP. The main setup elements are marked in the picture.

| Element | Energy (eV) | Electrons ($e^-$) |
|---------|-------------|-------------------|
| Fe      | 6403.13     | 1779              |
| Cu      | 8048.11     | 2236              |
| Zn      | 8639.10     | 2400              |
| Mo      | 17479.10    | 4855              |
| Ag      | 22162.99    | 6156              |
| In      | 24209.78    | 6725              |
| Sn      | 25271.34    | 7020              |

**Table 4.2.** X-ray transition table for used elements
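The electron counts in Table 4.2 are consistent with dividing the transition energy by a mean energy of about 3.6 eV per electron-hole pair in silicon. This value is not stated in the text and is an assumption inferred from the table entries. The sketch below reproduces the electron column under that assumption.

```python
# Assumed mean energy needed to create one electron-hole pair in silicon.
EV_PER_PAIR = 3.6

# K-alpha transition energies in eV, copied from Table 4.2.
K_ALPHA_EV = {
    "Fe": 6403.13,
    "Cu": 8048.11,
    "Zn": 8639.10,
    "Mo": 17479.10,
    "Ag": 22162.99,
    "In": 24209.78,
    "Sn": 25271.34,
}


def electrons_from_energy(energy_ev: float,
                          ev_per_pair: float = EV_PER_PAIR) -> int:
    """Expected number of electrons generated by one absorbed photon."""
    return round(energy_ev / ev_per_pair)


if __name__ == "__main__":
    for element, energy in K_ALPHA_EV.items():
        print(f"{element}: {energy:9.2f} eV -> "
              f"{electrons_from_energy(energy)} e-")
```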

Due to the small pixel size of the sensor and the relatively high energy of the incident photons, the ionization was not localized in a single pixel; instead, clusters of pixels were formed. Therefore, the total energy of an incident photon was derived by accumulating the values within each cluster. The process of identifying the ADC value of a material is shown in Figure 4.21. The accumulated cluster values are plotted in a histogram, from which the ADC value of the material can be determined (for molybdenum it is 565 *ADC* counts).

**Figure 4.21.** Histogram plot for molybdenum (Mo).
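The accumulation of cluster values can be sketched as a flood fill over a dark-subtracted frame. The text does not specify the clustering algorithm that was used, so the 4-connected neighborhood, the threshold and the small example frame below are all assumptions for illustration.

```python
def cluster_sums(frame, threshold):
    """Sum the values of every 4-connected cluster of pixels above threshold.

    frame is a list of rows of dark-subtracted gray values. The sums are
    returned in the order in which the clusters are found (row by row).
    """
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    sums = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or frame[r0][c0] <= threshold:
                continue
            total = 0
            stack = [(r0, c0)]
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                total += frame[r][c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and frame[nr][nc] > threshold):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            sums.append(total)
    return sums


if __name__ == "__main__":
    # Hypothetical frame with two photon hits spread over several pixels.
    frame = [
        [0, 0, 0, 0, 0],
        [0, 200, 150, 0, 0],
        [0, 120, 0, 0, 90],
        [0, 0, 0, 0, 95],
    ]
    print(cluster_sums(frame, threshold=50))  # [470, 185]
```

The summed values of many such clusters would then be filled into a histogram, whose peak gives the ADC value of the target material.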

From the linear fit (Figure 4.22) we can determine the conversion factor, which in this case is  $7.39\ e^-/ADC$ . The conversion factor depends on the configured parameters of the image sensor and can therefore change. This will be discussed further in the next chapters.



**Figure 4.22.** Linear fit for all obtained values ( $e^-/ADC$  is electrons per ADC value).

#### EMVA characterization results

When characterizing the sensor using visible light, the irradiation of the sensor needs to be varied. The irradiation ranges from the dark, unilluminated state up to saturation, i.e. the point where a further increase of the irradiation no longer increases the ADC value. There are three ways of changing the irradiation of the sensor:

- Constant illumination with variable exposure time. With this method, the light source is operated with constant radiance and the irradiation is changed by the variation of the exposure time. Because the dark signal generally may depend on the exposure time, it is required to measure the dark image at every exposure time used. The absolute calibration depends on the true exposure time being equal to the exposure time set in the camera.
- Variable continuous illumination with constant exposure time. With this method, the radiance of the light source is varied by any technically possible way that is sufficiently reproducible. With LEDs this is simply achieved by changing the current. Therefore the absolute calibration depends on the true exposure time being equal to the exposure time set in the camera.
- Pulsed illumination with constant exposure time. With this method, the irradiation of the sensor is varied by the pulse length of the LED. When switched on, a constant current is applied to the LEDs. The irradiation H is given as the LED irradiance E times the pulse length t. The sensor exposure time is set to a constant value, which is larger than the maximum pulse length for the LEDs. The LED pulses are triggered by the "integrate enable" or "strobe out" signal from the camera, which marks the start of the sensor integration time (exposure time). The LED pulse must have a short delay relative to the start of the integration time, and it must be ensured that the pulse fits into the exposure interval. The pulsed illumination technique must not be used with rolling shutter mode. Alternatively, it is possible to use an external trigger source in order to trigger the sensor exposure and the LED flashes synchronously.

Constant illumination with variable exposure time was selected as the characterization method. As previously mentioned, frames were taken at 50 equally spaced exposure times, with digital gray values ranging from the dark gray value to the maximum digital gray value (point of saturation), with two images for each irradiation level. The frames were taken at each exposure time with and without light. The EMVA characterization was conducted in the ANKA Detector Lab [174], which is shown in Figure 4.23.



**Figure 4.23.** 3D sketch of the ANKA Detector Lab. 1: X-ray setup. 2: Visible light setup. 3: Control electronics. 4. X-ray control interface. 5: Visible-light control interface. Taken from [174].

The mathematical model of the sensor is presented in Figure 4.24.

**Figure 4.24.** Model of the pixel sensor (from [170]). Image *a* shows a physical model of the sensor, while *b* represents a mathematical model of a pixel sensor

A digital image sensor converts incoming photons into a digital value. On average,  $\mu_p$  photons hit the area of a single pixel. The quantum efficiency  $\eta$  indicates what fraction of them is absorbed and contributes to the mean number of accumulated electrons  $\mu_e$ :

$$\eta(\lambda) = \frac{\mu_e}{\mu_p} \tag{4.16}$$

The mean number of photons that hit a pixel with the area A during the exposure time  $t_{\text{exp}}$  can be computed from the irradiance E on the sensor surface in  $W/m^2$  using the following equation:

$$\mu_p = \frac{AEt_{\rm exp}}{hc/\lambda} \tag{4.17}$$

where h is Planck's constant, c is the speed of light and  $\lambda$  is the wavelength of the light. In the camera electronics, the charge accumulated through the irradiation is converted into a voltage, amplified, and finally converted into a digital signal y by an analog-to-digital converter (ADC). The whole process is assumed to be linear and can be described by a single quantity, the overall system gain K with units  $ADC/e^-$ , i.e., digits per electron. The mean digital signal  $\mu_y$  is then:

$$\begin{aligned}
\mu_{y} &= K(\mu_{e} + \mu_{d}) = \mu_{y.dark} + K\mu_{e}, \\
\mu_{y} &= \mu_{y.dark} + K\eta\mu_{p} = \mu_{y.dark} + K\eta\frac{\lambda A}{hc}Et_{\rm exp}
\end{aligned} \tag{4.18}$$

where  $\mu_d$  is the mean number of electrons present without light, which results in the mean dark signal  $\mu_{y.dark} = K\mu_d$ . Eq. 4.18 is used to determine the responsivity  $K\eta$  and the overall system gain K. From eq. 4.13, we can determine the variance of the charge units (electrons):

$$\sigma_e^2 = \mu_e \tag{4.19}$$
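Equations 4.16 to 4.18 can be evaluated numerically as in the following sketch. The pixel size, quantum efficiency and conversion factor are taken from Tables 4.3 and 4.4, while the wavelength, irradiance, exposure time and dark signal are assumed example values that do not correspond to a specific measurement.

```python
H_PLANCK = 6.62607015e-34  # J*s
C_LIGHT = 299_792_458.0    # m/s


def mean_photons(area_m2: float, irradiance_w_m2: float,
                 t_exp_s: float, wavelength_m: float) -> float:
    """Mean number of photons hitting one pixel (eq. 4.17)."""
    photon_energy_j = H_PLANCK * C_LIGHT / wavelength_m
    return area_m2 * irradiance_w_m2 * t_exp_s / photon_energy_j


def mean_gray_value(mu_p: float, eta: float, gain_adc_per_e: float,
                    mu_y_dark: float) -> float:
    """Mean digital signal for a linear sensor (eq. 4.18)."""
    return mu_y_dark + gain_adc_per_e * eta * mu_p


if __name__ == "__main__":
    area = (5.5e-6) ** 2   # 5.5 um x 5.5 um pixel (Table 4.4)
    eta = 0.5741           # quantum efficiency (Table 4.3)
    gain = 1.0 / 7.82      # K in ADC/e-, inverse of 7.82 e-/ADC (Table 4.3)
    # Assumed: 550 nm light, 1 mW/m^2 irradiance, 10 ms exposure, 20 ADC dark.
    mu_p = mean_photons(area, 1e-3, 10e-3, 550e-9)
    mu_y = mean_gray_value(mu_p, eta, gain, mu_y_dark=20.0)
    print(f"mu_p = {mu_p:.1f} photons, mu_y = {mu_y:.1f} ADC")
```

Note that the overall system gain K is the inverse of the conversion factor quoted in $e^-/ADC$.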

Since the variances of all noise sources add up linearly, the total temporal variance of the digital signal y,  $\sigma_y^2$ , is given by:

$$\sigma_y^2 = \underbrace{K^2 \sigma_d^2 + \sigma_q^2}_{\text{offset}} + \underbrace{K}_{\text{slope}} (\mu_y - \mu_{y.dark}) \tag{4.20}$$

where  $\sigma_d^2$  is the variance of the signal-independent noise (with normal distribution) caused by the sensor readout and amplifier circuits, and  $\sigma_q^2 = 1/12 \, [ADC^2]$  is the variance introduced by the quantization [103]. Eq. 4.20 is used for the characterization of the sensor, a procedure also known as the photon transfer method [175].
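A minimal sketch of the photon transfer method is given below: a straight line is fitted to the variance as a function of the dark-corrected mean signal, the slope gives K, and the offset gives the dark noise. The data are synthetic values generated from eq. 4.20 itself rather than measurements, the saturation behavior is only a crude stand-in, and the default fit fraction of 0.7 is an assumption of this example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def photon_transfer_fit(signal, variance, fit_fraction=0.7):
    """Fit eq. 4.20 to the first fit_fraction of the points.

    signal   : mu_y - mu_y.dark for each exposure step (ADC)
    variance : temporal variance sigma_y^2 for each step (ADC^2)
    Returns (K in ADC/e-, dark noise sigma_d in e-).
    """
    n_fit = max(2, round(len(signal) * fit_fraction))
    gain, offset = fit_line(signal[:n_fit], variance[:n_fit])
    sigma_q2 = 1.0 / 12.0  # quantization variance in ADC^2
    sigma_d = math.sqrt(max(offset - sigma_q2, 0.0)) / gain
    return gain, sigma_d


# Synthetic data generated from eq. 4.20 (not measured values).
K_TRUE = 1.0 / 7.82   # ADC/e-
SIGMA_D_TRUE = 10.0   # e-
signal = [20.0 * i for i in range(50)]
variance = [K_TRUE ** 2 * SIGMA_D_TRUE ** 2 + 1.0 / 12.0 + K_TRUE * s
            for s in signal]
# Crude stand-in for saturation: the variance collapses for the last points.
for i in range(40, 50):
    variance[i] = variance[39] * (50 - i) / 10.0

if __name__ == "__main__":
    gain, sigma_d = photon_transfer_fit(signal, variance)
    print(f"conversion factor = {1.0 / gain:.2f} e-/ADC, "
          f"dark noise = {sigma_d:.1f} e-")
```

Restricting the fit to the lower part of the curve keeps the saturated points, where the variance drops, from biasing the slope.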

The characterization setup with the smart camera platform is shown in Figure 4.25.

**Figure 4.25.** Characterization setup

The results are shown in Table 4.3, and the resulting plot is shown in Figure 4.26. The conversion factor in Table 4.3 is only slightly different from the one obtained using direct X-ray exposure,  $7.82~e^-/ADC$  vs.  $7.39~e^-/ADC$ . The small difference can be explained by the temperature difference between the two measurements, since all other sensor parameters were the same. In Figure 4.26, the green line represents the linear fit of the data. Only the first 70 % of all points are taken into account when the fit is calculated.

| Parameter           | Value          |
|---------------------|----------------|
| Dark noise          | $10~e^-$       |
| Saturation capacity | $7859~e^-$     |
| Conversion factor   | $7.82~e^-/ADC$ |
| Quantum efficiency  | 57.41 %        |
| Dynamic range       | 57.22 dB       |

**Table 4.3.** Sensor parameters obtained using EMVA compliant method

The obtained results will naturally vary if we change certain sensor parameters. By changing sensor parameters such as the analog gain, main sensor clock, or ADC gain value, as per the sensor documentation [125], the saturation capacity and conversion factor may change:  $9203~e^-$  and  $6255~e^-$  as the saturation capacity, and  $9.6~e^-/ADC$  and  $6.4~e^-/ADC$  as the conversion factor, for minimum and maximum analog gain, respectively. Also, if we use 12-bit pixel values instead of 10-bit values, the conversion factor will be different:  $6.4~e^-/ADC$  for 10-bit and  $1.92~e^-/ADC$  for 12-bit, which is expected.

**Figure 4.26.** Photon transfer method

The sensor was compared with industrial cameras which are used in X-ray experiments conducted at ANKA. The comparison results are shown in Table 4.4.

#### Photon transfer curve results: HZG setup

Additional characterization was conducted in cooperation with the Helmholtz-Zentrum Geesthacht (HZG) [176, 177]. As mentioned in Section 4.5, a PTC is generated by plotting the standard deviation of the camera's output noise vs. the average signal level on a log-log scale. The readout noise is then directly available from the PTC as the noise level at zero illumination. The saturation capacity is the signal level at the point where the PTC drops.

| Sensor properties               | CMOSIS 10-bit, min. gain | CMOSIS 10-bit, max. gain | PCO 1400 CCD, 12-bit |
|---------------------------------|--------------------------|--------------------------|----------------------|
| Pixel size ( $\mu m$ )          | 5.5 x 5.5                | 5.5 x 5.5                | 6.45 x 6.45          |
| Dark noise ( $e^-$ )            | 13                       | 12                       | 5                    |
| Saturation capacity ( $e^-$ )   | 9203                     | 6455                     | 14985                |
| Conversion factor ( $e^-/ADC$ ) | 9.6                      | 6.4                      | 0.91                 |
| Quantum efficiency (%)          | 48                       | 47                       | 54.7                 |
| Dynamic range (dB)              | 56.34                    | 53.98                    | 68.99                |

**Table 4.4.** Sensor comparison

The conversion gain G of a camera is typically expressed in  $ADC/e^-$ , which represents the number of A/D units per signal electron. To obtain this gain, we note that for an increase in illumination of X, the camera's average signal level will change by GX. In contrast, the same increase in illumination level will cause the noise variance to change by  $G^2X$ , so that  $\log(G^2X) = \log(GX) + \log G$ . That means that on the logarithmic plot, the intersection between the x-axis and the line interpolating the part of the PTC with a slope of one half gives  $\log G$ . Thus, by plotting the photon transfer curve on a logarithmic scale we can easily find the camera's key performance parameters by graphical means.

For the "dark current" measurement, four image frames were taken at each exposure time, starting from zero and increasing in equidistant steps of 100 ms up to 6800 ms. The result for a single pixel is presented in Figure 4.27, which plots the mean value of each group of four images versus exposure time. The linear behavior of the camera sensor, with a slope slightly above zero, can be seen.

At an exposure time of 4500 ms, a slight jump down to the original value at zero exposure time can be detected, which is caused by the sensor being cooled down to the originally set temperature of  $-10\,^{\circ}$ C. It is also clear that the overall dispersion is in a range of 20/300, i.e. less than  $6.7\,\%$ , and the overall mean value is about 290 counts. Since our camera is operated in 12-bit mode (maximum value 4095), we assume that the dark noise level is low enough for the subsequent measurements.

The next step is the PTC measurement, with results shown in Figure 4.28. Single images were acquired with and without illumination at each exposure time, 100 for each state. Starting from zero with a step of 100 ms, the exposure time goes up to 12 seconds in order to cover the sensor's full-well capacity. The mean values and variances are calculated for every group of 100 pictures.

**Figure 4.27.** Dark current test results, from [176].

In Figure 4.28, the X-axis shows the difference between the illuminated and the dark value (mean value minus dark current) of a given pixel, while the Y-axis shows the variance for the same pixel. By fitting the first "dark current" points with a horizontal line and intersecting this line with the Y-axis, the value of the readout noise is obtained. The saturation capacity is the X-value at which the Y-value reaches its maximum. By fitting a line with a slope of 1/2 to the points between the first weakly exposed points and the saturation capacity, and intersecting this line with the X-axis, the corresponding conversion gain is calculated.



**Figure 4.28.** PTC characterization results. Values on both axes are natural logarithms (alog is  $log_e$ ), from [176].
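Two of the graphical steps described above can be sketched in code: locating the saturation point at the maximum of the variance, and estimating the conversion gain in the logarithmic domain from the relation between signal GX and variance $G^2X$. This is an illustration on synthetic data, not the HZG analysis itself. It assumes that read noise is negligible for the points passed to the gain estimate, and the gain of 0.5 ADC per electron is a made-up value.

```python
import math


def saturation_signal(signal, variance):
    """Signal level at which the variance is largest (where the PTC drops)."""
    i_max = max(range(len(variance)), key=lambda i: variance[i])
    return signal[i_max]


def gain_from_log_ptc(signal, variance):
    """Estimate G (ADC/e-) from shot-noise-limited points.

    Uses log(variance) = log(signal) + log(G), so log(G) is the mean
    vertical offset between the two logarithms.
    """
    offsets = [math.log(v) - math.log(s) for s, v in zip(signal, variance)]
    return math.exp(sum(offsets) / len(offsets))


# Synthetic shot-noise-limited points for an assumed gain of 0.5 ADC/e-.
G_TRUE = 0.5
electrons = [500.0, 1000.0, 2000.0, 4000.0, 8000.0]
signal = [G_TRUE * x for x in electrons]
variance = [G_TRUE ** 2 * x for x in electrons]
# Two hypothetical saturated points where the variance falls off again.
signal_full = signal + [4200.0, 4300.0]
variance_full = variance + [900.0, 200.0]

if __name__ == "__main__":
    print("saturation at", saturation_signal(signal_full, variance_full), "ADC")
    print("gain =", round(gain_from_log_ptc(signal, variance), 3), "ADC/e-")
```

In practice the points used for the gain estimate would have to be selected between the read-noise-dominated region and the saturation point.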

## 5 Applications

Since the 1990s, smart cameras have attracted significant interest from research groups, universities, and industry. This is due to the advantage of smart cameras over normal (or standard) cameras: they perform not just image capture but also image analysis and event/pattern recognition, all in one compact system [31]. The increasing performance of smart cameras is related to the progress in semiconductor process technology and computer vision techniques. Currently, smart cameras are employed in many scientific and industrial applications, including but not limited to video surveillance, machine vision, biology, and medicine.

One field where high-performance digital detectors are increasingly applied is X-ray imaging, where they have been used since the 1980s [178]. Since the discovery of X-rays by Wilhelm Conrad Röntgen [179], non-destructive imaging of objects has proved to be a powerful tool in diverse fields such as medicine, materials research, archeology, quality control and homeland security. X-ray imaging techniques have progressed continuously since the end of the 19<sup>th</sup> century. For example, the use of X-rays in determining the atomic structure of crystals initiated a new field of science, called X-ray crystallography [180]. From that time on, advances in X-ray medical imaging, X-ray crystallography, and X-ray imaging in general have depended strongly on the improvement of X-ray sources and X-ray detectors.

### 5.1 X-ray applications

X-rays are high-energy electromagnetic radiation. Depending on its energy, and thus on its ability to ionize matter, radiation can be classified either as non-ionizing or ionizing [181]. The ionization potential of atoms, or the minimum energy required for ionizing an atom, ranges from a few eV for alkali elements to 24.6 eV for helium.

Non-ionizing radiation cannot ionize matter because its energy is lower than the ionization potential of matter. *Ionizing radiation* can ionize matter either directly, with charged particles, or indirectly, with neutral particles, since its energy exceeds the ionization potential of matter. Directly ionizing radiation consists of charged particles such as electrons, protons, alpha particles and heavy ions. The deposition of energy is accomplished through direct Coulomb interactions between the ionizing charged particle and orbital electrons of atoms in the medium. Indirect ionization is caused by photons (like X-rays or gamma rays) or neutrons, and the deposition of energy is a two-step process: first a charged particle is released in the medium (photons release electrons or positrons, neutrons release protons or heavier ions), and then the released charged particle deposits its energy through direct Coulomb interactions with orbital electrons.

X-rays have frequencies between  $3 \times 10^{16}$  Hz and  $3 \times 10^{19}$  Hz. Thus, with  $\lambda = c/f$  and according to the formula:

$$E = \frac{hc}{\lambda} \tag{5.1}$$

the energy E of X-ray photons varies from roughly 120 eV to 120 keV, which is more than enough to ionize atoms and molecules.
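The quoted energy range can be verified with a short calculation based on eq. 5.1. The sketch below uses the Planck constant expressed in eV·s; the function names are chosen here for illustration.

```python
H_EV_S = 4.135667696e-15  # Planck constant in eV*s
C_LIGHT = 299_792_458.0   # speed of light in m/s


def photon_energy_from_frequency(frequency_hz: float) -> float:
    """Photon energy in eV from the frequency, E = h * f."""
    return H_EV_S * frequency_hz


def photon_energy_from_wavelength(wavelength_m: float) -> float:
    """Photon energy in eV from the wavelength, E = h * c / lambda (eq. 5.1)."""
    return H_EV_S * C_LIGHT / wavelength_m


if __name__ == "__main__":
    for f in (3e16, 3e19):
        wavelength = C_LIGHT / f
        print(f"f = {f:.0e} Hz, lambda = {wavelength:.2e} m, "
              f"E = {photon_energy_from_wavelength(wavelength):.1f} eV")
```

The two frequency limits correspond to about 124 eV and 124 keV, in line with the rounded values given above.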

#### Synchrotron radiation

Short-wavelength synchrotron radiation generated by relativistic electrons in man-made circular accelerators dates back to shortly after the Second World War [182]. However, the theoretical basis for synchrotron radiation traces back to the time of Thomson's discovery of the electron in 1897. In the same year, Joseph Larmor derived an expression from classical electrodynamics for the instantaneous total power radiated by an accelerated charged particle, and the following year, the French physicist Alfred Liénard showed the radiated power emitted by electrons moving on a circular path to be proportional to  $(E/mc^2)^4/R^2$ , where E is the electrons' kinetic energy, m is the electron rest mass and R is the radius of the trajectory. The first observation of synchrotron radiation came on 24 April 1947, at the General Electric Research Laboratory in Schenectady, New York [183].

Initially, synchrotron radiation was seen as an unwanted and unavoidable loss of energy in accelerators designed (ironically) to produce intense beams of X-rays by directing accelerated electrons onto a suitable target. The potential advantages of synchrotron radiation were first presented in 1956, at Cornell [184].

The first generation of synchrotron radiation sources were sometimes referred to as parasitic facilities (reflecting, perhaps, the perception of the particle physicists who were the primary users of these facilities), as the synchrotrons were primarily designed for high-energy or nuclear physics experiments. Most of these facilities had storage ring energies around 1 GeV, and the experiments operated in the soft X-ray regime.

Synchrotron radiation has a continuous spectrum from the infrared to hard X-rays, including visible light. In 1964, synchrotron radiation was extended to the hard X-ray region for the first time, using the 6 GeV Deutsches Elektronen-Synchrotron (DESY) in Hamburg. After the development of efficient electron storage rings for long-term operation, the first dedicated facilities designed specifically for synchrotron radiation were built. The 2 GeV Synchrotron Radiation Source (SRS) at Daresbury, England, was the first of these so-called second-generation synchrotron sources.

Synchrotron radiation is emitted from areas in the storage ring where the particle orbit is bent by electromagnetic fields [185]. An increase in the X-ray beam brilliance can be achieved by optimizing the properties of the electron beam in the storage ring, by employing so-called insertion devices and by careful design of the magnet lattice used to manipulate the electrons.

Third generation synchrotron radiation sources include *wigglers* and *undulators* (shown in Figure 5.1), beside dipole magnets which are necessary for a storage ring. These magnetic devices are placed in straight sections in between the curved arcs of large storage rings. They operate by perturbing the path of the electrons in an oscillatory manner, so that even though their average direction remains unchanged, synchrotron radiation is produced. The produced X-ray beam is tangent to the curved trajectory of the electrons, therefore the beamlines are located around the storage ring. The first third-generation facility is the European Synchrotron Radiation Facility (ESRF, 6 *GeV* storage ring) in Grenoble, France, which began experiments in 1994 [185, 182].

The original features of third generation synchrotron radiation facilities, when considering the microtomographic applications, are:

- the very high intensity of the X-ray beam
- the high energy of the electrons producing the radiation, which implies the availability of high energy photons (beyond 100 keV)
- the small size of the electron beam cross-section ( $< 100 \ \mu m$ ). This leads to high brilliance, but also to a very small angular extension of the source as seen from a point in the specimen, hence to a sizable lateral coherence of the X-ray beam.

Additional important characteristics exist which are not directly related to the source, but are nevertheless crucial for microtomography ( $\mu$ CT). A suitable detector is important, one that combines a large dynamic range, low noise, and a transfer time shorter than the typical exposure time. The improvement of the reconstruction and image processing procedures is equally essential, leading to software designed for the problem and supported by the required memory and computing power.



**Figure 5.1.** Schematic of a third-generation synchrotron. "Electrons moving at highly relativistic velocities in an evacuated storage ring emit electromagnetic (synchrotron) radiation as their direction is changed by bending magnets, used to guide them in a closed path, or by wigglers or undulators placed in straight sections of the storage ring, which oscillate the electrons left and right but keep an average straight trajectory. At the beamline, tangential to the storage ring, the radiation is often (but not always) made monochromatic and focused using x-ray optics onto a sample" (Text and image from [182]).

These features combined make it possible to perform microtomographic experiments that are improved or completely new.

ANKA is the Synchrotron Radiation Facility at the Karlsruhe Institute of Technology (KIT). It started its operation in March 2003. The ANKA accelerator complex consists of a 53 MeV microtron as a preaccelerator, a 500 MeV booster synchrotron and a 2.5 GeV storage ring. The injector has a repetition rate of 1 Hz and the booster current is about 5 mA. Injection into the storage ring takes place twice a day. A nominal current of up to 200 mA is accumulated in the storage ring at 500 MeV and then ramped to 2.5 GeV. The lifetime of the stored beam at 2.5 GeV is 16 hours for 150 mA [186].

ANKA currently contains 13 beamlines, plus 4 additional ones which are under construction. Among them, the IMAGE beamline [10], currently under commissioning, is devoted to X-ray imaging applications in material and life sciences. The main applications of the beamline will be radiographic and tomographic imaging, both in absorption mode and in phase-contrast mode, with a resolution range of approximately $40~\mu m$ (projection radiography and tomography) down to $40 \, nm$ (full-field microscopy). For materials research, ANKA conducts in-situ and in-operando characterization of electronic materials, functional materials, microsystems devices, batteries and multi-phase fluidics. Life sciences applications will include 3D and 4D in-vivo imaging of organisms, and hierarchical, correlated imaging, for example of tissues, cells and scaffolds, for the study of morphodynamics, phenotypes and biocompatibility. The beamline will host two separate optical and experimental sections: the so-called IMAGE main branch and the side-branch, shown in Figure 5.2. Various modes of imaging will be implemented. For contrast modes, absorption, refraction, diffraction, and fluorescence will be utilized, and for imaging modes, 2D microscopy, radiography, topography, microdiffraction imaging, 3D tomography and laminography.

**Figure 5.2.** Layout of the IMAGE main and side-branches

### X-ray imaging applications

Compared with optical imaging, X-ray imaging has two obvious advantages: X-rays are capable of penetrating matter, and they have a much shorter wavelength and hence the potential to produce images with higher spatial resolution. This provides the ability for non-destructive visualization of the internal structure of samples. The wave properties of X-rays enable a number of advanced imaging techniques which allow the visualization of structures via their ability to scatter (or equivalently refract) an X-ray beam. The contrast in the ( $\mu$ CT) image is caused by variations of the linear attenuation coefficient. Commonly, attenuation is understood as the sum of two effects, namely, photoelectric absorption and scatter. Pair production does not occur below 1 MeV photon energy [187].

One of the X-ray imaging applications which will be conducted at the IMAGE beamline is X-ray microtomography. X-ray tomography is widely used in industry and science for non-destructive investigation of samples. Tomography allows for 3D representation and investigation of the object. Tomographic methods do not generate a 3D image of an object directly but allow the 3D shape of the object to be reconstructed using suitable methods [103]. An example setup for synchrotron X-ray tomography is shown in Figure 5.3.



**Figure 5.3.** The principle of x-ray computed tomography

Tomography overcomes the limitation of conventional radiography, where spatial information is lost when the 3D object is projected onto a 2D plane. The process of tomographic scanning is conducted in three steps. The first step is acquiring the data in the form of radiographs [188, 189]. The necessary number of 2D images is approximately equal to the number of detector pixel columns used. In the second step, a Fourier transformation is applied to the data. Using the Fourier slice theorem [103], the Fourier transform of the object is constructed. In the last step, an inverse Fourier transformation is applied in order to obtain the reconstructed image of the object.

Various 3D reconstruction algorithms can be applied. A common one is filtered back projection (FBP) [190]. The theory behind FBP is outside the scope of this work; however, for a better understanding of the algorithm, its main points are summarized here. The first step is acquiring projections for a desired number of steps, or angles, between 0 and $180^{\circ}$. Afterwards, a Fast Fourier Transform (FFT) is applied to each projection, and the result is multiplied by the weighting function $2\pi |w|/K$, where $w$ is the frequency and $K$ is the number of projections over $180^{\circ}$. The final step is summing, over the image plane, the inverse Fourier transforms of the filtered projections (which is the backprojection process) [190]. The advantage of algorithms like FBP is that they are suitable for fast computing, since the reconstruction procedure can start as soon as the first projection is acquired.
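The filtering step of FBP can be illustrated with a minimal sketch. It is not the implementation used in this work: it uses a naive discrete Fourier transform instead of an FFT, expresses the frequency $w$ in cycles per sample (an assumption), and omits the backprojection. The function names `dft` and `ramp_filter` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m, value in enumerate(values):
            acc += value * cmath.exp(sign * 2j * math.pi * m * k / n)
        out.append(acc / n if inverse else acc)
    return out


def ramp_filter(projection, num_projections):
    """Weight the spectrum of one projection by 2*pi*|w|/K and transform back."""
    n = len(projection)
    spectrum = dft(projection)
    weighted = []
    for k, coefficient in enumerate(spectrum):
        # Signed frequency of bin k, in cycles per sample (assumed unit).
        w = k / n if k <= n // 2 else (k - n) / n
        weighted.append(coefficient * 2.0 * math.pi * abs(w) / num_projections)
    return [c.real for c in dft(weighted, inverse=True)]


# A single bright detector pixel: the filter keeps the peak positive and
# produces negative side lobes, which cancel the blur of plain backprojection.
filtered = ramp_filter([0, 0, 0, 0, 1, 0, 0, 0], num_projections=4)
```

Because each projection is filtered independently of the others, this step can run as soon as a projection arrives, which is the property that makes FBP attractive for on-line reconstruction.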

### 5.2 The UFO project

Based on available photon flux densities at modern synchrotron sources, ultra-fast X-ray imaging could enable investigation of the dynamics of technological and biological processes with a time scale down to the microsecond range in 2D and milliseconds in 3D, respectively, especially if broad band radiation is employed. Using CCD detectors, the integration time could be reduced to below 50 ms, as reported in [191]. To achieve very high frame rates, CCD detectors are not ideal, since the serial read-out architecture causes dead times whenever the integration time is shorter than the readout time. To overcome these limitations and therefore allow the investigation of fast processes on the microsecond timescale, CMOS based detectors are employed. In [192], frame rates of 5000 images per second were achieved, using a filtered white beam from ESRF's ID19 wiggler source with flux densities in excess of $10^{15} \ ph/s/mm^2$. Using a larger effective pixel size, frame rates of 40000 images per second were reported [193].

A new project was envisioned to further increase data acquisition speed, and to develop and integrate the tools and instrumentation to provide intelligent X-ray imaging of processes with high spatio-temporal resolution. The name of the project is "Ultra fast X-ray imaging of scientific processes with on-line assessment and data-driven process control", or UFO [11] for short. Within the framework of the UFO project, an experimental station is realized at the IMAGE beamline of KIT's ANKA synchrotron radiation facility. The UFO project aims to push the present limits of high-speed X-ray imaging and introduces a novel concept with on-line data assessment, data-driven feedback, and active control of both the sample and the measuring procedure. It also intends to remove the bottlenecks which arise due to slow sample manipulation, data transfer, image reconstruction and interpretation.

The rationale and architecture of UFO are given schematically in Figure 5.4. The UFO setup consists of three sections: a beamline bringing high flux density at the sample position, the UFO experimental station, and a high-performance data storage system. The central component, the experimental station, consists of four main functional units: the dedicated sample setup, the dedicated high-speed detector system, the online monitoring and evaluation module, and the control components.

The sample set-up and the detector system constitute the main hardware of the experimental station. The sample set-up is split into special X-ray optics (grating set-up optimized for white beam and high-speed), fast sample and detector manipulation units and a special sample environment. The smart high-speed camera contains the detector itself (sensor, detector optics and readout chip), pre-processing and a fast data link. The software part consists of the monitoring, evaluation and visualization components. A fast inner feedback loop and an outer control loop based on classification of reconstructed images form the control components.

**Figure 5.4.** System architecture of the UFO project. Cascaded feedback loops are used for real-time processing and experiment control. The three main parts are clearly outlined.

The benefits of the UFO project are the following:

- On-line reconstruction and fast feedback make it possible to control and optimize the beam parameters and the sample alignment, leading to more efficient use of beam time and increased data quality.
- Combined with fast sample loading robotics, it substantially increases the sustained sample throughput.
- On-line reconstruction enables data-driven feedback to actively control the process and, moreover, to drive the measuring procedure in direct dependence on the image dynamics of the process, which can be assessed via the optical flow.

### Smart camera platform: integration in the UFO control system

Within the IPE PDV group [194], a Python-based control system was designed and implemented, mostly by M. Vogelgesang. This control system, named *Concert*, is described in depth in [16]. *Concert* is designed to meet the requirements of high-speed applications at beamlines. It integrates device control, experiment processes and data analysis. A generalized hierarchy which shows the device as well as the system integration is shown in Figure 5.5.



**Figure 5.5.** Integration of the DAQ framework in Concert — Users access point to the system is from Concert session. Further software layers abstract underlying hardware. Libuca abstracts cameras (detectors). ALPS is used to access the custom camera platform.

A custom Advanced Linux PCIe Driver (ALPS) [52] was also developed in the IPE PDV group by S. Chilingaryan to support the high throughput of the UFO DAQ platform and to provide flexibility and customization for other applications. A low-level C library called *libuca* [195] was developed in the same group by M. Vogelgesang to provide general low-latency access to 2D pixel detectors from multiple vendors, including the custom smart camera based on the ALPS driver. Because there is already a libuca-based detector implementation within *Concert*, we could seamlessly implement device-specific setup and calibration procedures.

In *Concert*, every device *type* (e.g. motor, detector, etc.) implements a base Device interface which provides basic functionality shared by all devices: parameters, locks, and state.

By deriving device type classes from the Device class, we obtain a class hierarchy that *guarantees* re-usability of top-level interfaces. Hence, higher-level processes using such an interface will work with all implementations of that interface. For instance, the base Camera class provides a grab method to acquire one frame. This interface has many hardware-specific implementations which a higher-level process can use without knowing the hardware-specific details. For example, slowly controlled devices are accessed through TANGO [196], whereas fast data transfers are handled natively, e.g. the FPGA platform through the ALPS driver and the libuca C library.

Key features of *Concert* for the integration with the smart cameras are handling of the device parameters, synchronization and process control, and data processing capabilities. More details are provided in the next subsections.

#### **Concert Device Parameters**

Each device exposes an arbitrary list of uniquely named parameters that map to Parameter objects. The user reads and, depending on the access rights, writes values into that Parameter object. Before the value is passed to the hardware device, it is validated against the Parameter's soft limit and checked for unit compatibility.

Beyond parameters, device classes implement device-manipulating methods that can change the state internally (for example, starting a continuous motion). To provide a valid interface for all devices of the same type, each base class of that device type either implements its methods directly or dispatches to device-specific methods.

A quantifiable parameter value is always associated with a physical unit which is used for validation and calculation of quantities (numerical values bound to a unit) as well as for converting the result to a final target base unit. We enforce the use of units throughout the system, from the base parameter class to the user interface, to reduce the chance of invalid input. Whereas in most systems the user can only input what the device accepts, any of the following equivalent statements is possible in *Concert*:

```
from concert.quantities import q

# cam is a Concert camera device; all three assignments set 2 ms
cam.exposure_time = 0.002 * q.s
cam.exposure_time = 2 * q.milliseconds
cam.exposure_time = q.minute / 30000.0
```

#### **Asynchronous Operation**

To reduce the total amount of time needed to control several independent devices, device accesses are parallelized wherever possible. *Concert* provides asynchronous device access by encapsulating parameter access and device methods in *future* objects. A *future* represents a value produced by an asynchronous operation that is ready at some point in the future. It provides methods to query its state (running(), cancelled(), done()), to get the final result (result()) and to attach callbacks that are called when the *future* has finished. To produce a future for a task, the task is internally submitted to a ThreadPoolExecutor. Although Python's global interpreter lock (GIL) prevents threads from executing Python code truly in parallel, device operations still overlap because blocking I/O operations release the lock while they wait.
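The future mechanism can be tried out with the Python standard library alone. The following is a minimal sketch, not *Concert* code: `set_position` is a hypothetical stand-in for a blocking device call, and the sleep imitates waiting for hardware.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def set_position(name, target, delay=0.05):
    """Hypothetical blocking device call; sleeping releases the GIL like I/O."""
    time.sleep(delay)
    return (name, target)


executor = ThreadPoolExecutor(max_workers=4)
finished = []

# Submitting returns immediately with one future per device access.
futures = [executor.submit(set_position, "motor%d" % i, i) for i in range(4)]

# A callback is invoked once the corresponding future has finished.
futures[0].add_done_callback(lambda f: finished.append(f.result()[0]))

# result() blocks until the value is available.
results = [f.result() for f in futures]

# Wait for the worker threads (and pending callbacks) before continuing.
executor.shutdown(wait=True)
```

The four calls overlap in time, so the total waiting time is close to that of a single call rather than the sum of all four.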

By decoupling parameter access from de-referencing the value, we can access several devices at the same time. The *future* objects themselves can then be used to synchronize with other asynchronous operations by chaining callbacks or waiting explicitly for a future to finish. In contrast, attribute-like parameter access is always executed synchronously because setting the parameter cannot return a *future*:

```
# Synchronous access
motor.position = 1 * q.mm

# Asynchronous access returns a future
future = motor.set_position(1 * q.mm)
future.wait()
```

Because we cannot anticipate the methods that are attached to derived device classes, we provide a Python decorator @async that wraps device methods to provide a similar asynchronous interface. Hence, device developers do not need to care about *how* parallelism is implemented but only apply the decorator like this:

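```
# Note: `async` is a reserved word from Python 3.7 on; this listing
# predates that change.
class Motor(Device):
    # Wrap the method so that calling it returns a future.
    @async
    def move(self, delta):
        self.position += delta


# The call returns immediately; wait() blocks until the motion is done.
motor.move(-2 * q.mm).wait()
```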
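How such a decorator can be built is sketched below with the standard library. This is an illustration under assumptions, not the *Concert* implementation: the decorator is called `run_async` here (a hypothetical name), the `Motor` class is a plain Python object without units, and standard futures are waited on with result().

```python
import functools
from concurrent.futures import ThreadPoolExecutor

# One shared pool for all decorated methods (an assumption of this sketch).
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def run_async(func):
    """Return a wrapper that submits func to the pool and returns a future."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return _EXECUTOR.submit(func, *args, **kwargs)
    return wrapper


class Motor:
    """Hypothetical device with a position in millimetres (plain float)."""

    def __init__(self):
        self.position = 0.0

    @run_async
    def move(self, delta):
        self.position += delta
        return self.position


motor = Motor()
future = motor.move(-2.0)       # returns immediately
new_position = future.result()  # blocks until the move has finished
```

The device developer only writes the blocking method body; submission to the pool and creation of the future are hidden in the decorator.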

#### **Process Control**

The basic device and parameter abstractions can be used to control devices manually. One could use these mechanisms to write simple scripts that perform certain tasks. However, these tasks often have very similar functionality and differ only in a few parameters. *Concert* avoids re-inventing the same procedures by providing a high-level process control module, which is the result of separating the recurrent logic from hardware-specific operations.

A very common procedure is a scan of a parameter and evaluation of a measure at each scan point. The Scanner object from the process module provides this in an abstract way. By passing Parameter objects instead of Devices, we can model every type of scan, for example a tomographic scan.

In a scan of the exposure time, for example, the scanner manipulates the detector's exposure time and acquires one frame at each set time. When the process is finished, the scan positions and the actual frame data are returned. Although scanning can already process the measured data, it is not suitable for *feedback-based control* due to the missing feedback loop.
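The idea of scanning a parameter and evaluating a measure at each point can be sketched in a few lines. The code below is not the *Concert* Scanner; `scan` and `FakeCamera` are hypothetical names, the exposure time is a plain float in seconds, and the "frame" is reduced to a single simulated intensity value.

```python
def scan(set_value, measure, minimum, maximum, intervals):
    """Set a parameter to equidistant values and record a measure at each."""
    step = (maximum - minimum) / intervals
    positions, measures = [], []
    for i in range(intervals + 1):
        value = minimum + i * step
        set_value(value)
        positions.append(value)
        measures.append(measure())
    return positions, measures


class FakeCamera:
    """Hypothetical camera whose mean intensity grows with exposure time."""

    def __init__(self):
        self.exposure_time = 0.001  # seconds

    def grab(self):
        return 1000.0 * self.exposure_time


cam = FakeCamera()
exposures, intensities = scan(
    lambda t: setattr(cam, "exposure_time", t),  # the scanned parameter
    cam.grab,                                    # the measure
    minimum=0.001, maximum=0.005, intervals=4,
)
```

Because the scan only receives a setter and a measure, the same function would serve for a rotation angle and a detector frame, which is the point of passing parameters instead of devices.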

Feedback-based control is necessary for beamline tasks that need to evaluate a measure and act upon the result. In tomographic environments, the control algorithms often require an image-based feedback, e.g. focusing, sample alignment, etc. This logic can be decoupled into image-based metrics, control algorithms and feedback mechanisms.

We can find different metrics for diverse problems, or even for a single one like focusing, where one can use gradient information, variance or some other metric. Data assessment based on such a metric is then used by a control algorithm which optimizes the parameters in order to achieve better results. Computed parameters are then projected onto the hardware by a high-level device API, which closes the feedback loop.

Because of the asynchronous approach employed in *Concert*, we are able to use the feedback loops in a *continuous* mode, i.e. we do not have to wait for the above-mentioned steps to finish one after another, but can let them run in parallel.
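The three ingredients named above (metric, control algorithm, feedback to the hardware) can be illustrated with a small focusing loop. This is a sequential sketch under assumptions, not the continuous mode of *Concert*: the variance metric is one of the options mentioned in the text, the hill-climbing search is a swapped-in control algorithm chosen for brevity, and `SimulatedSetup` is a hypothetical stand-in for a focus motor and camera.

```python
def variance(image):
    """Focus metric: grey-value variance, larger for sharper images."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


class SimulatedSetup:
    """Hypothetical focus motor plus camera."""

    def __init__(self, focal_position):
        self.focal_position = focal_position
        self.position = 0.0

    def move_to(self, position):
        self.position = position

    def grab(self):
        # Contrast falls off with the distance from the focal position.
        contrast = 1.0 / (1.0 + (self.position - self.focal_position) ** 2)
        return [[0.0, contrast], [contrast, 0.0]]


def focus(grab, move_to, start, step, metric, min_step=0.01):
    """Hill climbing: keep improving moves, else reverse and halve the step."""
    position = start
    move_to(position)
    best = metric(grab())
    while abs(step) >= min_step:
        candidate = position + step
        move_to(candidate)          # feedback: act on the hardware
        score = metric(grab())      # assessment: evaluate the new frame
        if score > best:
            position, best = candidate, score
        else:
            step = -step / 2.0
    move_to(position)
    return position


setup = SimulatedSetup(focal_position=3.0)
best_position = focus(setup.grab, setup.move_to, 0.0, 1.0, variance)
```

Swapping `variance` for a gradient-based metric, or the search for another optimizer, leaves the loop structure untouched, which reflects the decoupling described in the text.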

#### **Data Processing**

Process abstractions can employ basic data processing to enhance the control outcome (e.g. with NumPy). Up to now, control and data processing are two independent entities in control systems, with data processing commonly moved to a later offline stage. However, to improve the user's beam time, it is necessary to analyze the results right after or during the acquisition.

For this, we integrated our GPU-based data processing framework UFO within *Concert* [195]. With this framework, a user describes their processing workflow as a graph of processing tasks. The result is the transformation of data going from the roots to the leaves of the graph. The graph itself is transformed before execution to utilize all processing units (CPU cores, GPUs and remote nodes) as effectively as possible.

To integrate this framework in *Concert*, we used the process abstraction described above and exported scalar properties in the same way as device parameters, but flagged with read-only status. Thus, a user can scan along a node's property "axis" to see the effects of a parameter change.

### 5.3 Fast reject algorithm

The new UFO experimental station generates an enormous amount of data. In order to manage the huge data stream, analysis and careful reduction help to identify the desired information. For this purpose, an intelligent image-based self-trigger for fast spontaneous processes has been developed and implemented in the high-throughput camera platform.

In conventional setups unpredictable physical events could be lost or only partially acquired due to limited observation time given by the camera memory and/or readout bandwidth limitations [197, 52]. To circumvent this an adaptive frame rate and adaptive selection of the Region-Of-Interest (ROI) is used in order to record the temporal evolution of physical events at the fastest possible speed. An example of fast spontaneous events is shown in Figure 5.6, with two bubbles merging in gelatinous agar.



**Figure 5.6.** Bubbles merging in a gelatinous agar. (courtesy of the IPS institute, KIT)

The bubbles remain unchanged for the first $48.9 \ ms$ of the data acquisition, corresponding to 489 unchanged frames at 10000 fps. After this time, the merging takes place in less than a millisecond. The desired logic should be able to detect fast physical events and, consequently, reject redundant data frames. Detection of multiple events located in different frame regions is intended.

The image-based self-trigger is able to increase the effective frame rate and reduce the amount of received data. An example of using the fast reject to record the unpredictable physical events is shown in Figure 5.7, where a small object falls into a glass of water. Incoming frames are compared with a pre-recorded reference frame and the regions with events are identified and marked for acquisition. In the sequence the highlighted part of the image shows the selected physical events. The darker part is not read out but taken from the reference frame. In the first part, between frames 7 and 8, the evolution of the events is recorded with a frame rate of 1000 fps. In the second part the larger turbulent area is recorded with a frame rate of 500 fps which is still more than the native frame rate of the sensor.



**Figure 5.7.** Image sequence taken with the fast reject — Darkened parts of the images are reproduced from the reference frame. During the fast reject operation, readout is reduced to the interleaved lines. Region of interest is detected (highlighted part) when the difference exceeds the thresholds and read-out on demand (see frame numbers). Time is shown relative to the reference frame (frame 0).

### Description of the Fast reject algorithm

Recent CMOS sensors provide direct pixel access allowing the readout of individual rows of the pixel-matrix. The fast reject logic exploits this feature for its interleaved, row-based sub-sampling readout mechanism. The principle behind the self triggering algorithm is shown in Figure 5.8.

Interleaved readout skips a programmable number of rows between the rows that are read out. A complete reference frame is stored in the DDR3 memory at the beginning of the data acquisition. Then the event-trigger Finite State Machine (FSM) enables the interleaved readout. In each interleaved frame, the first sent row is shifted down by one row position. Corresponding rows of the incoming images and the reference frame are subtracted row by row on a frame-by-frame basis. Several parameters are used to adjust for noise influence and the desired event size detection.

**Figure 5.8.** Basic principle behind the image-based self-trigger algorithm. Corresponding rows of the incoming frame and the reference frame are subtracted. If the resulting difference exceeds the predefined thresholds, the area is marked for acquisition.
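The row selection of the interleaved readout can be sketched as follows. The function name `interleaved_rows` is hypothetical, and the wrap-around of the start row after one full cycle is an assumption of this sketch; the text only states that the first sent row moves down by one position per frame.

```python
def interleaved_rows(total_rows, skipped_rows, frame_index):
    """Rows read out in one interleaved frame.

    skipped_rows rows are skipped between two read-out rows; the start row
    moves down by one per frame and wraps around (assumed) after
    skipped_rows + 1 frames.
    """
    stride = skipped_rows + 1
    offset = frame_index % stride
    return list(range(offset, total_rows, stride))


# With 3 skipped rows, four consecutive frames together visit every row once.
schedule = [interleaved_rows(12, 3, frame) for frame in range(4)]
# schedule[0] is [0, 4, 8]; schedule[1] is [1, 5, 9]
```

Shifting the start row means that an event smaller than the gap is still sampled within a few frames, at the cost of a longer reaction time.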

In case the difference exceeds the predefined threshold, the row is marked as a triggered row. The event-trigger FSMs receive the triggered rows and use this information to program the dynamic size of the region-of-interest (ROI) that will be acquired. The size of the ROI is determined by the position of the triggered rows. The user can choose whether to acquire everything between the first and the last triggered row, or just the area around the triggered rows. In the latter case, up to eight ROIs can be acquired in a single readout. By reading fewer rows, a drastic reduction of the readout time is achieved without losing the full field of view. The user can select whether the reference frame will be updated with the acquired data or not. The self-triggering architecture and the data flow are shown in Figure 5.9.

**Figure 5.9.** Self-triggering architecture and the ROI strategy. Only highlighted rows are read out in the interleaved readout. The board memory is divided into reference frame and data storage. Readout for row comparison and readout to the PCIe-DMA engine run in parallel.

In order to balance the efficiency of the event detection and the influence of noise, several programmable threshold parameters are used:

- *Pixel value threshold.* Indicates the minimum difference for which corresponding pixels are considered sufficiently different.
- *Row threshold.* Indicates the minimum number of differing pixels per row for the row to be considered "triggered", i.e. marked for acquisition.
- *Global threshold.* Indicates the minimum number of triggered rows for acquisition.
- *Row area.* Number of rows around the triggered row to be acquired.
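How the four parameters interact can be shown with a software sketch of the trigger decision. It is not the FPGA logic: frames are plain lists of rows, the function names are hypothetical, and two details are assumptions of this sketch, namely that overlapping areas are merged into one ROI and that ROIs beyond the eighth are simply dropped.

```python
def triggered_rows(frame, reference, rows, pixel_threshold, row_threshold):
    """Return the read-out rows that differ enough from the reference."""
    hits = []
    for r in rows:
        differing = sum(
            1 for a, b in zip(frame[r], reference[r])
            if abs(a - b) >= pixel_threshold
        )
        if differing >= row_threshold:
            hits.append(r)
    return hits


def select_rois(hits, total_rows, global_threshold, row_area, max_rois=8):
    """Turn triggered rows into (first_row, last_row) regions of interest."""
    if len(hits) < global_threshold:
        return []  # too few triggered rows: treated as noise, nothing acquired
    rois = []
    for r in sorted(hits):
        first = max(0, r - row_area)
        last = min(total_rows - 1, r + row_area)
        if rois and first <= rois[-1][1] + 1:
            # Overlapping or adjacent areas are merged (assumption).
            rois[-1] = (rois[-1][0], max(rois[-1][1], last))
        else:
            rois.append((first, last))
    return rois[:max_rois]


reference = [[0, 0, 0, 0] for _ in range(12)]
frame = [row[:] for row in reference]
frame[4] = [50, 50, 0, 0]    # a small event
frame[8] = [50, 50, 50, 0]   # a second event in another region

hits = triggered_rows(frame, reference, rows=range(0, 12, 4),
                      pixel_threshold=10, row_threshold=2)
rois = select_rois(hits, total_rows=12, global_threshold=1, row_area=1)
```

Raising the pixel or row threshold suppresses noise at the risk of missing faint events, while a larger row area trades readout time for context around each event.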

All trigger parameters are accessible by the control system. In addition, the control system supports the user, and translates these parameters to more intuitive entities.

The automatic reduction and the acquisition of only the changed image regions reduces the effective data size and at the same time increases the frame rate. Detection of multiple events in different regions of the same frame is possible. Benefits of the algorithm are simplification and acceleration of the data analysis, optimization of the effective bandwidth and significant increase of the time resolution with the higher frame rates.

#### **Evaluation and performance**

Measurements were taken in order to evaluate the performance of the fast reject algorithm. They were conducted in the lab, but also in a real-world environment, using the IMAGE beamline at ANKA, KIT. Experiments with repeatable behaviour and fast event occurrence were performed to test and characterize the efficiency and precision of the fast reject algorithm for radiography applications.

The FPGA DAQ platform is integrated and used as a 2D pixel detector. High-performance streaming capabilities and all sensor parameters are accessible through a custom control system, *Concert* [16]. In addition, *Concert* provides access to the data processing features of the DAQ platform. In order to simplify the usage of the fast reject algorithm, *Concert* automatically sets the initial parameter values so as to minimize the influence of the noise according to the desired maximum speed and object size detection.

The set\_fast\_reject\_parameters function is called to set the parameters and activate the fast reject before the events start. It accepts as input parameters the object size defined by the user, the maximum desired frame rate, and the number of frames. Object size detection is determined by the parameters num\_of\_diff\_pixels (the row threshold) and num\_of\_diff\_lines (the global threshold). The maximum frame rate determines the maximum speed of the camera when the smallest detection occurs. This in turn defines the area, or ROI, of the frame which will be acquired, since the exposure time and the readout time of a single row are known.

The pixel threshold is calculated in order to minimize the noise influence and, therefore, false positives of the trigger. The parameter num\_of\_frames sets how many frames are acquired to determine the noise influence. All parameters are checked against the physical limits. The user still has the ability to set each parameter freely and tune the algorithm as needed. These values are used to set the parameters previously explained in Section 5.3.
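The kind of calculation involved can be sketched as follows. This is not the code behind set\_fast\_reject\_parameters: both function names are hypothetical, the timing model (one exposure followed by a row-by-row readout) is a simplification, and the rule "mean plus five standard deviations" for the pixel threshold is an assumption chosen for illustration.

```python
import statistics


def roi_rows_for_frame_rate(max_fps, exposure_time, row_readout_time):
    """Largest number of rows that fits into one frame period (assumed model)."""
    budget = 1.0 / max_fps - exposure_time
    return max(0, int(budget / row_readout_time))


def pixel_threshold_from_noise(frames, reference, sigmas=5.0):
    """Estimate a threshold from event-free frames (flattened pixel lists)."""
    diffs = [abs(a - b) for frame in frames for a, b in zip(frame, reference)]
    return statistics.mean(diffs) + sigmas * statistics.pstdev(diffs)


# Illustrative numbers: a full frame of 1088 rows read in 3 ms gives the
# row readout time; 4 microseconds exposure; 1000 fps requested.
row_time = 3e-3 / 1088
rows = roi_rows_for_frame_rate(1000, 4e-6, row_time)

# Two short noise-only "frames" compared against a reference.
threshold = pixel_threshold_from_noise(
    frames=[[10, 12, 11, 9], [11, 10, 10, 12]],
    reference=[10, 11, 10, 10],
)
```

With these illustrative inputs the ROI may span a few hundred rows at 1000 fps, which shows why the requested frame rate directly bounds the size of the acquired area.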

By using *Concert*, adding new features or modifying the existing image processing algorithm is straightforward. As shown in Figure 5.5, *Concert* uses several layers of abstraction to access hardware devices. Adding a new camera requires adding it to libuca, but basic usage within *Concert* remains the same. Implementing new metrics or features in *Concert* can be done without the need to modify the libraries. If needed, users can define their own procedure to automate the setting of the fast reject parameters.

Several measurements were conducted at the IMAGE beamline with the UFO smart camera to prove the potential of the UFO DAQ framework (experiment setup shown in Figure 5.10). In the first measurement, *Concert* was used to automate X-ray tomography tasks.



**Figure 5.10.** Beamline experimental setup — The object is located on the rotating stage

Before acquiring tomography scans, the axis of rotation on the rotating stage needs to be aligned, and the camera must be focused on the sample. These steps were executed automatically, without any user intervention. Functions implemented in *Concert* automatically calculated all the necessary parameters from the user input, opened and closed shutters within the beamline, acquired dark and flat fields, scanned the sample, and initiated the flat field correction. The volume was then immediately reconstructed using the UFO GPU framework. With the immediate feedback, the experimental procedure is simplified and easier to reproduce, and the beam time is utilized more efficiently.
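The flat field correction mentioned above is commonly computed per pixel as (projection − dark) / (flat − dark). The sketch below shows this standard formula on plain nested lists; it is not the UFO framework implementation, and the function name and the guard value `eps` are assumptions of the sketch.

```python
def flat_field_correct(projection, dark, flat, eps=1e-6):
    """Normalize a projection with dark and flat fields, pixel by pixel."""
    corrected = []
    for p_row, d_row, f_row in zip(projection, dark, flat):
        row = []
        for p, d, f in zip(p_row, d_row, f_row):
            # Guard against dead pixels where flat and dark coincide.
            denominator = max(f - d, eps)
            row.append((p - d) / denominator)
        corrected.append(row)
    return corrected


# Two pixels with different gain and the same 50 % sample transmission.
corrected = flat_field_correct(
    projection=[[60.0, 110.0]],
    dark=[[10.0, 10.0]],
    flat=[[110.0, 210.0]],
)
```

After the correction both pixels report the same transmission, although their raw values differ, which is what makes the projections usable for reconstruction.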

In order to test and characterize the efficiency and time response of the fast reject for radiography applications, a synthetic setup with repeatable behavior and fast event occurrence was developed. It consists of a rotating aluminium plate with a 300 $\mu$m diameter hole, shown in Figure 5.11. For the optical setup, 3x and 1.5x magnification was used with a 50 $\mu$m thick LuAG scintillator. The aluminium plate was rotating at a speed of 500 rotations per minute (rpm). The resolution of the sensor was set to 2048 x 1088 pixels, and the exposure time was set to 4 $\mu$s to avoid blurring effects.

**Figure 5.11.** Schematic of the fast reject testbench setup. Round aluminium plate (2 mm thickness). Rotating speed is 500 rpm.

In Figure 5.12 an example of acquired data is shown. One data set is taken for more than 2 seconds at 240 fps, in streaming mode and without the fast reject. It contains 768 frames with a total size of 3.3 GB.

**Figure 5.12.** Fast reject data with captured successive events. The full frame shows the appearance of the object in relation to the full field of view. Dark parts of the images are not read out. The acquisition starts with the reference frame (frame 0) and then continues with the interleaved readout and ROI acquisition. Time is shown relative to each previous frame, e.g. $\Delta t = 1.08$ ms between frame 17 and 18.

The rotating disk is larger than the field of view. Therefore, more than 95% of the acquired frames do not contain the object, either partially or fully. There is no guarantee that the object will be observed each time it is in the field of view of the camera, since the time required to read 1088 rows (full field of view), regardless of the exposure time, is 3 ms, while the object takes less than 1.9 ms to cross the whole height of the frame.

The same measurement was done with the image-based self-trigger algorithm. In Figure 5.12 the data set taken with the fast reject is also presented. The data set contains 195 frames, with a total size of 234 MB, for a recording time of 12 seconds. The frames now contain only the captured events. Whenever the object is in the field of view of the sensor, it is observed and acquired. Frame 17 represents one observed event in time. The time distance between frame 17 and the next observed event in frame 18 is 1.08 ms, with an effective frame rate of 931 fps. For the next event in frame 19 the time distance is 125 ms, which approximately corresponds to the set speed of 500 rpm (a rotation period of 120 ms). We can see that the event is always observed at least twice during its observable path, and that no events are lost. Compared with the initial experiment, the data size is reduced by an order of magnitude. Moreover, the fast reject algorithm enables the camera to reconstruct the full field of view with a much higher frame rate without loss of information.
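The reduction can be put into numbers with a short calculation based on the figures quoted above. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and derives the duration of the streaming run from its frame count and frame rate; the variable names are illustrative only.

```python
GB = 1e9  # decimal gigabyte (assumption)
MB = 1e6  # decimal megabyte (assumption)

# Streaming mode without fast reject: 768 frames at 240 fps, 3.3 GB.
streaming_duration = 768 / 240.0                # seconds
streaming_rate = 3.3 * GB / streaming_duration  # bytes per second

# Fast reject: 195 frames, 234 MB, recorded over 12 seconds.
fast_reject_rate = 234 * MB / 12.0              # bytes per second

size_reduction = (3.3 * GB) / (234 * MB)        # ratio of total data sizes
rate_reduction = streaming_rate / fast_reject_rate

print("streaming:   %.0f MB/s over %.1f s" % (streaming_rate / MB, streaming_duration))
print("fast reject: %.1f MB/s over 12.0 s" % (fast_reject_rate / MB))
print("size reduced %.1fx, sustained data rate reduced %.1fx"
      % (size_reduction, rate_reduction))
```

Since the fast reject run also covers a longer recording time, the reduction of the sustained data rate is larger than the reduction of the total data size.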

The dependency of the frame rate on the number of skipped rows and the object size is shown in Figure 5.13. The frame rate has been evaluated considering both the ROI readout time and the reaction time needed to detect the physical event. The readout time depends only on the size of the ROI and not on the gap between rows. The reaction time depends on both the number of skipped rows and the physical event area.

**Figure 5.13.** Frame rate as a function of the number of skipped rows. If the number of skipped rows is larger than the object size, the result can be a suboptimal frame rate, even lower than the nominal frame rate.

The minimum reaction time, which corresponds to the maximum frame rate, is achieved when the gap equals the event area, in this case 90 rows. If the physical event area is smaller than the gap, the probability of detecting the event is reduced and the frame rate decreases rapidly. Figure 5.13 presents the measured data together with the simulated data and shows that the fast reject algorithm works as expected.

The performance of the fast reject depends strongly on its parameters, the event properties, and the experimental conditions. For example, significant noise and a poorly chosen pixel threshold could lead to false positives and reduce the effectiveness of the algorithm. For very large continuous events the fast reject could be less effective than normal radiography [52]. The parameters of the fast reject should be set to correspond to the expected event size. With appropriate parameter values, the concept of the fast reject has the potential to increase the frame rate of virtually any image sensor, especially one operating with addressable row readout.
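The interplay between the row gap, the pixel threshold, and the event size can be illustrated with a small software model of the trigger decision. This is only a sketch under stated assumptions, not the FPGA implementation: every `gap`-th row is sampled starting at row 0, each sampled row is compared with a reference frame, and the trigger fires when at least `min_pixels` pixels differ by more than `pixel_threshold`. All names (`make_frame`, `fast_reject_trigger`, `gap`, `pixel_threshold`, `min_pixels`) are hypothetical.

```python
def make_frame(rows, cols, event_rows=(), level=100):
    """Build a synthetic frame: zeros everywhere, `level` on the event rows."""
    return [[level if r in event_rows else 0 for _ in range(cols)]
            for r in range(rows)]


def fast_reject_trigger(frame, reference, gap, pixel_threshold, min_pixels):
    """Return the first sampled row that fires the trigger, or None.

    Assumption: `gap` is the row stride of the sparse readout, so rows
    0, gap, 2*gap, ... are the only rows that are read and compared.
    """
    for row in range(0, len(frame), gap):
        changed = sum(
            1
            for value, ref in zip(frame[row], reference[row])
            if abs(value - ref) > pixel_threshold
        )
        if changed >= min_pixels:
            return row
    return None


if __name__ == "__main__":
    reference = make_frame(12, 8)
    frame = make_frame(12, 8, event_rows=(5, 6))  # event covering two rows
    # A stride of 2 samples row 6 and detects the event.
    print(fast_reject_trigger(frame, reference, gap=2,
                              pixel_threshold=50, min_pixels=4))
    # A stride of 4 samples rows 0, 4, 8 and misses the event entirely.
    print(fast_reject_trigger(frame, reference, gap=4,
                              pixel_threshold=50, min_pixels=4))
```

In this toy model an event smaller than the stride can fall between two sampled rows, which mirrors the reduced detection probability described above. A threshold set below the noise level would make the same function fire on an empty frame.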

## 5.4 Expansion of the streaming platform

The new smart camera platform has been integrated as a DAQ in several detector systems developed at IPE and Helmholtz-Zentrum Geesthacht (HZG). This section describes these systems and their requirements in order to demonstrate the modular approach of the smart camera system and its suitability for a number of applications.

#### Phase contrast camera setup

Imaging based on analyzing the amplitude and phase of X-rays scattered from materials may provide advantages over conventional radiography. Phase-sensitive X-ray imaging methods can provide substantially increased contrast over conventional absorption-based imaging. They are two to three orders of magnitude more sensitive and better suited for the investigation of samples consisting mainly of light elements (i.e. elements with a low atomic number Z, e.g. organic matter) [187]. For example, 17.5 keV X-rays that pass through a 50 $\mu$m thick sheet of biological tissue are attenuated by only a fraction of a percent, while the phase shift is close to $\pi$.

In order to record X-ray phase changes caused by the specimen structure, the beam behind the specimen has to be superimposed with a reference beam and the pattern of interference fringes measured [198, 199]. Multiple methods have been developed for phase-contrast imaging; however, their stringent requirements on temporal or spatial coherence have hindered their wider adoption as a standard tool in industrial and biomedical applications [200].

The use of gratings as optical elements in hard X-ray phase imaging overcomes some of the problems that have impaired the wider use of phase contrast in X-ray radiography and tomography. A phase-contrast imaging approach based on grating interferometry can be performed efficiently with a conventional, low-brilliance X-ray source [201]. The main advantages are not only a significantly reduced delivered dose, without degradation of the image quality, but also a much higher efficiency. The technique sets the prerequisites for future fast and low-dose phase-contrast imaging methods, fundamental for imaging biological specimens and for in vivo studies.

For tomography applications using grating-based X-ray Differential Phase-Contrast (DPC) imaging, a single projection is calculated from a set of images to resolve small refraction angles. An absorbing grating is used to scan an interference pattern, which is produced behind a periodic phase grating by means of the Talbot effect [202]. The first measurements using a grating interferometer were conducted in the early 2000s [203, 204].

The Helmholtz-Zentrum Geesthacht is operating microtomography stations using synchrotron radiation at the high brilliance third-generation synchrotron light source PETRA III at DESY in Hamburg. Grating interferometers were installed at the Imaging Beam Line (IBL, P05) and at the High Energy Material Science (HEMS, P07). They were designed to be operable at a wide range of energies with a fixed geometry [205]. The principle behind a grating-based phase contrast experimental setup is illustrated in Figure 5.14.



**Figure 5.14.** Principle behind the grating based phase contrast tomography. Gratings are controlled with a piezo drive, directly connected with the smart camera platform.

To provide an optimal setup for DPC tomography, the smart camera setup was expanded in cooperation with the HZG. The modifications can be seen in Figure 5.15. The HZG team modified the camera provided by the IPE EPS team, including the author. New development was needed in order to control the piezo drive used for the movement of the phase grating. The control of the piezo drive was accomplished through the Control register, explained in detail in Chapter 3. The HZG team also built a housing for the camera, upgraded the power supply, and introduced liquid cooling for the FPGA and image sensor board.

**Figure 5.15.** Modified camera with the new housing. Besides the extension for the piezo drive, new cooling and an enhanced power supply were also added by the HZG. Courtesy of HZG.

The grating interferometer consists of a $\pi$-shifting phase grating (G1), made of nickel with a period of 4.8 $\mu$m and a bar height of 16 $\mu$m, and an absorption gold grating (G2) with a period of 2.4 $\mu$m and a bar height of 120 $\mu$m. The distance between G1 and G2 is approximately 21 cm. A piezo drive was used for the stepping process, and a mouse bone served as the test object. The setup, experimental procedure, and reconstruction process are described in more detail in [205]. DPC projection measurements were performed using different scan parameters, with various numbers of phase steps, scanned periods, and exposure times. The experimental setup with the smart camera is shown in Figure 5.16.



**Figure 5.16.** Phase contrast setup at the HEMS beamline. The camera tower contains the smart camera with a CMOS sensor and a CCD camera. Courtesy of HZG.

The measurement was conducted using an exposure time of 550 ms for the CCD camera and 100 ms for the CMOS camera. In both cases 10 phase steps were made by shifting the G1 grating using a piezo drive. One period was used for the reconstruction. For both cameras the phase projections were measured and calculated using the same optical magnification of the detector system. The resulting images are shown in Figure 5.17. The image quality and the resulting contrast are comparable for both pictures. The field of view (FOV) is smaller for the CMOS camera, due to its smaller matrix size. The pictures show that the CMOS camera system achieves an image quality comparable to that of the CCD camera. The results, experimental setup description, and images were taken from [176].



**Figure 5.17.** Results showing a single projection of the mouse bone sample. 1a: reconstructed phase projection obtained with the CCD camera. 1b: part of the same projection used to compare with the smart camera. 2: reconstructed phase projection acquired with the smart camera. For both pictures: white is -2.5, and black is 2.5.
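The text defers the reconstruction details to [205]. As a hedged illustration only, a common way to evaluate a phase-stepping scan in grating interferometry is to take, per pixel, the phase of the first Fourier component of the intensity recorded over one grating period, and to subtract the phase obtained from a reference scan without the sample. The sketch below assumes equidistant steps covering exactly one period (as with the 10 steps over one period mentioned above). The function names and the synthetic stepping curves are hypothetical and are not taken from the HZG reconstruction code.

```python
import cmath
import math


def stepping_phase(intensities):
    """Phase of the first Fourier component of one phase-stepping curve.

    Assumes the samples are equidistant and span exactly one grating period.
    """
    n = len(intensities)
    first = sum(value * cmath.exp(-2j * math.pi * k / n)
                for k, value in enumerate(intensities))
    return cmath.phase(first)


def dpc_signal(sample_curve, reference_curve):
    """Differential phase: sample phase minus reference phase, wrapped to (-pi, pi]."""
    diff = stepping_phase(sample_curve) - stepping_phase(reference_curve)
    return math.atan2(math.sin(diff), math.cos(diff))


def synthetic_curve(mean, visibility, phase, steps=10):
    """Ideal sinusoidal stepping curve for one pixel (illustrative data only)."""
    return [mean * (1.0 + visibility * math.cos(2.0 * math.pi * k / steps + phase))
            for k in range(steps)]


if __name__ == "__main__":
    reference = synthetic_curve(mean=1000.0, visibility=0.3, phase=0.2)
    sample = synthetic_curve(mean=800.0, visibility=0.25, phase=0.7)
    # The recovered shift equals the difference of the two curve phases.
    print(f"recovered phase shift: {dpc_signal(sample, reference):.3f} rad")
```

Applied to every pixel of the image stack, such a procedure yields one differential phase projection per stepping scan. The mean value and the visibility of the curve drop out of the phase estimate, which is why the sample and reference curves may differ in both.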

#### KAPTURE: DAQ for recording ultra-short pulses

The recording of coherent synchrotron radiation requires data acquisition systems with a temporal resolution of tens of picoseconds (ps). The KAPTURE (Karlsruhe Pulse Taking Ultra-fast Readout Electronics) system is a DAQ system which enables continuous sampling of ultra-short pulses generated by THz detectors [206]. The system was developed in the IPE EPS group [207]; the principal idea and work are due to M. Caselle.

The system consists of a fast sampling board combined with a high data throughput readout board. The sampling board is designed for sampling fast pulse signals with a full width at half maximum (FWHM) between a few tens and one hundred picoseconds, with a minimum sampling time of 3 ps. The high data throughput board is based on a PCIe Bus Master DMA architecture used for fast data transfer of up to 3 GB/s. The full readout chain with the fast THz detectors and the data acquisition system was tested at ANKA.

At the synchrotron light source ANKA up to 184 RF buckets can be filled with electrons, with a distance of 2 ns between two adjacent buckets, corresponding to the 500 MHz frequency of the RF system [208]. Coherent synchrotron radiation (CSR) can be created from the short electron bunches. To detect and study the emission characteristics of CSR in the THz range over multiple revolutions, several detector systems based on superconductor film layers have been developed.

The new generation of detectors is based on a thin yttrium barium copper oxide (YBCO) superconductor film with an intrinsic response time down to 1 ps [209]. The data acquisition system (DAQ) for CSR analysis requires two signal processing chains: one analog-to-digital converter (ADC) for energy measurement and one time-to-digital converter (TDC) with a discriminator (e.g. a constant fraction discriminator) for accurate measurement of the time jitter between two consecutive bunches. The KAPTURE system is shown in Figure 5.18.



**Figure 5.18.** Image of the KAPTURE system. Courtesy of M. Caselle.

The ultra-fast THz thin film detector provides the incoming pulse signal, via a low-noise amplifier, to a wideband power splitter, which divides the signal into four identical analog signals. This allows the pulse to be sampled at four different points with picosecond precision. After being digitized, the four samples are provided to the FPGA readout board. In order to sustain continuous data acquisition, the necessary bandwidth is 24 Gb/s (four samples of 12 bits each, every 2 ns).
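The bandwidth requirement follows directly from the numbers given in this section. The short sketch below reproduces it and, as a side calculation, derives the revolution frequency implied by 184 buckets spaced 2 ns apart. The function names are illustrative; the inputs are the values quoted in the text.

```python
def required_bandwidth_bps(samples_per_pulse, bits_per_sample, bucket_spacing_s):
    """Raw bit rate needed to digitize every bucket without gaps."""
    return samples_per_pulse * bits_per_sample / bucket_spacing_s


def revolution_frequency_hz(buckets, bucket_spacing_s):
    """Revolution frequency of a ring with the given number of RF buckets."""
    return 1.0 / (buckets * bucket_spacing_s)


if __name__ == "__main__":
    rate = required_bandwidth_bps(samples_per_pulse=4, bits_per_sample=12,
                                  bucket_spacing_s=2e-9)
    print(f"required bandwidth: {rate / 1e9:.0f} Gb/s = {rate / 8e9:.0f} GB/s")
    f_rev = revolution_frequency_hz(buckets=184, bucket_spacing_s=2e-9)
    print(f"revolution frequency: {f_rev / 1e6:.2f} MHz")
```

The result of 24 Gb/s corresponds to 3 GB/s, which matches the transfer rate quoted for the Bus Master DMA readout board. The revolution frequency comes out at about 2.7 MHz.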

The readout board, with the bus master DMA, is able to stream the data to a high-end GPU server via PCIe [210]. The readout board architecture is the same as that used for the smart camera. The only difference, as expected, is in the input stage, which accepts the four incoming samples. This example serves as further evidence of the versatility and reusability of the smart camera platform. The readout board also provides the ability to perform on-line pulse reconstruction and measurement of the arrival time and the signal amplitude, which is planned for a future implementation.

#### KALYPSO — DAQ for real-time beam diagnostics

KALYPSO (KArlsruhe Linear arraY detector for MHz rePetition-rate SpectrOscopy) is a line scan detector whose goal is to provide scientists at ANKA with a complete detector system enabling real-time, turn-by-turn measurements of the bunch profile with sub-ps temporal resolution [211].

The acquisition rate of commercially available line array detectors is a bottleneck for beam diagnostics at high-repetition-rate machines such as synchrotron light sources or FELs with quasi-continuous or macro-pulse operation. KALYPSO removes this bottleneck: it is an ultra-fast linear array detector operating at a frame rate of up to 2.7 Mfps.

The KALYPSO detector mounts InGaAs or Si linear array sensors to measure radiation in the near-infrared or visible spectrum. The FPGA-based read-out card can be connected to an external data acquisition system with a high-performance PCI-Express 3.0 data-link, allowing continuous data taking and real-time data analysis. The detector is fully synchronized with the timing system of the accelerator and other diagnostic instruments. The detector is currently installed at several accelerators: ANKA, the European XFEL and TELBE [211].

KALYPSO consists of a detector board and an FPGA-based readout card. The detector board mounts the sensor, the front-end amplifier, and the analog-to-digital converter (ADC). The sensor is a Si or InGaAs linear array with 256 pixels and a pitch of 50 $\mu$m, used to detect radiation in the visible and near-infrared spectrum up to wavelengths of 1.7 $\mu$m. The sensor is connected to the readout ASIC with high-density gold ball-to-wedge wire bonds. The detector board is shown in Figure 5.19.

The readout card used at ANKA is based on a Xilinx Virtex-7 FPGA and controls the detector's operation. The DAQ system is based on a custom PCI-Express 3.0 Direct Memory Access engine with a maximum throughput of 6.4 GB/s, which allows streaming operation at the maximum repetition rate of the detector. Real-time data analysis can be performed inside the FPGA or on a custom processing framework based on GPUs. FPGAs offer real-time data processing without the need for additional hardware, whereas GPUs allow more flexibility in the data evaluation, with the ability to store the raw data and perform offline data analysis.

**Figure 5.19.** Picture of the KALYPSO detector board with Si sensor. DAC: Digital to Analog Converter, ADC: Analog to Digital Converter, ROIC: Readout Chip. Courtesy of L. Rota.
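The claim that the DMA engine sustains streaming at the maximum repetition rate can be made plausible with a rough estimate. The sketch below uses the 256 pixels, 2.7 Mfps, and 6.4 GB/s quoted above. The sample width is not stated in this section, so the value of 2 bytes per pixel is purely an assumption for illustration, and the function names are hypothetical.

```python
def raw_data_rate_bytes_per_s(pixels, frames_per_s, bytes_per_pixel):
    """Raw detector output rate for a line sensor read out at a fixed frame rate."""
    return pixels * frames_per_s * bytes_per_pixel


def bytes_per_pixel_budget(link_bytes_per_s, pixels, frames_per_s):
    """Largest sample size per pixel that the link could still carry."""
    return link_bytes_per_s / (pixels * frames_per_s)


if __name__ == "__main__":
    # ASSUMPTION: 2 bytes per pixel; the actual sample width is not given here.
    rate = raw_data_rate_bytes_per_s(pixels=256, frames_per_s=2.7e6,
                                     bytes_per_pixel=2)
    print(f"raw rate at 2 B/pixel: {rate / 1e9:.2f} GB/s")
    budget = bytes_per_pixel_budget(6.4e9, pixels=256, frames_per_s=2.7e6)
    print(f"link budget: {budget:.1f} bytes per pixel")
```

Under this assumption the raw rate is about 1.4 GB/s, well below the 6.4 GB/s link limit. The link would carry up to roughly 9 bytes per pixel at the full frame rate, so the conclusion does not depend strongly on the assumed sample width.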

## 6 Conclusion

In the last few decades there has been a massive advancement in integrated circuit design and technology. This development permits more complex functionalities and makes new data acquisition paradigms possible. High-performance and high-throughput applications are the primary goal of the work done in this dissertation. X-ray tomography and radiography, among others, were selected as target applications due to their demanding requirements. The custom FPGA board provides a flexible smart platform for experiments with real-time and high-throughput demands.

#### High-throughput data acquisition

X-ray experiments create large amounts of data and require systems which are able to acquire large volumes of image data. For this task a smart camera platform was designed, into which a custom Bus Master DMA engine was subsequently incorporated. The Bus Master DMA architecture provides a data throughput of up to 8 GB/s. We are able to retain the native image sensor speed and provide real-time control of the system. Furthermore, the smart camera platform is seamlessly integrated in Concert, a custom UFO control system, and for basic functionalities behaves just like any other high-performance camera.

#### Embedded image processing

Upon the start of the acquisition, embedded image processing is available to the user. A custom image-based trigger, the "fast reject", is implemented. The processing algorithm relies on the properties of modern CMOS image sensors, namely the ability to perform row-based readout, with a manifold increase in speed compared to CCD sensors.

We exploited the parallelism of the FPGA and, with a pipelined implementation, have shown that the overall performance now depends on the image sensor and not, as with commercial solutions, on the data acquisition hardware. The fast reject algorithm is able to detect events of the desired size, however small or large, and at the same time increase the native image sensor speed while reducing the amount of transferred data.

The experimental results have shown an increase of up to five times the native sensor frame rate. Additionally, the implemented internal FPGA architecture provides a robust platform for further data processing. The division of functionalities into their corresponding domains, namely acquisition, memory access, and subsequent data transfer to the PC, with a generic interface between them, provides a useful platform for future embedded processing. This is also supported by the small logic occupancy, which, with further technological development, will only decrease in relative terms.

#### Modularity

The smart camera platform is able to support several image sensors: the CMOSIS CMV2000 and CMV4000, and Polaris. These sensors cover use cases from high-resolution, medium-speed to high-speed applications. The main contribution is the seamless operation of the camera, regardless of the sensor used. Depending on the experimental demands, the sensor that best fits the requirements can be selected, while the same functionality in operation and data acquisition is kept with the smart camera platform.

All the custom features and data processing capabilities are available to the user. In order to use different sensors, expansion daughter carrier boards were made and used to connect the sensors to the smart camera platform. The user has full access to all sensor features (e.g. gain, offset).

#### Versatility with multiple applications

The smart camera platform has shown its versatility by being used in multiple applications. For high-volume X-ray applications, it removed the bottleneck between the image sensor capabilities and the GPU processing stage, with a data transfer speed of up to 8 GB/s. Furthermore, it was employed in an additional field of X-ray imaging, phase contrast experiments, conducted in cooperation with Helmholtz-Zentrum Geesthacht.

Embedded processing capabilities were demonstrated by implementing an image-based self-trigger algorithm, with a potential fivefold increase in frame rate, while reducing the actual data rate. The camera was successfully integrated in the custom UFO framework and is operated by the user in a similar fashion to a commercial device.

#### Agile prototyping

The smart camera platform was designed with adaptivity in mind, with the main functionalities grouped in logical units. As such, each function can be modified and expanded as needed. To facilitate embedded processing, we provide an interface to the incoming data and memory. Various input sources are supported with high-pin-count and high-speed connectors, for applications other than image acquisition. For high-throughput applications, a custom DMA module with a PCIe connection is present, which can expand the smart camera platform to additional applications, such as DirectGPU.

# **Publications**

- [1] U. Stevanovic, M. Caselle, A. Cecilia, S. Chilingaryan, T. Farago, S. Gasilov, A. Herth, A. Kopmann, M. Vogelgesang, M. Balzer, and M. Weber, "A Control System and Streaming DAQ Platform with Image-Based Trigger for X-ray Imaging," *IEEE Transactions on Nuclear Science*, vol. 62, no. 3, pp. 911–918, 2015.
- [2] U. Stevanovic, M. Caselle, M. Balzer, A. Cecilia, S. Chilingaryan, T. Farago, S. Gasilov, A. Herth, A. Kopmann, M. Vogelgesang, and M. Weber, "Control System and Smart Camera with Image Based Trigger for Fast Synchrotron Applications," *Real Time Conference (RT)*, 2014 19th IEEE-NPSS, pp. 1–4, 2014.
- [3] U. Stevanovic, M. Caselle, S. Chilingaryan, A. Herth, A. Kopmann, M. Vogelgesang, M. Balzer, and M. Weber, "High-speed Camera with Embedded FPGA Processing," *Design and Architectures for Signal and Image Processing (DASIP)*, 2012 Conference on, pp. 1–2, 2012.
- [4] M. Caselle, S. Chilingaryan, A. Herth, A. Kopmann, U. Stevanovic, M. Vogelgesang, M. Balzer, and M. Weber, "Ultrafast Streaming Camera Platform for Scientific Applications," *Nuclear Science, IEEE Transactions on*, vol. 60, no. 5, pp. 3669–3677, 2013.
- [5] S. Chilingaryan, M. Caselle, T. Dritschler, T. Farago, A. Kopmann, U. Stevanovic, and M. Vogelgesang, "Computing Infrastructure for Online Monitoring and Control of High-Throughput DAQ Electronics," 10th International Workshop on Personal Computers and Particle Accelerator Controls, PCaPAC2014, pp. 10–12, 2014.

# **Bibliography**

- [1] G. E. Moore, "Cramming more components onto integrated circuits," *Electronics*, vol. 38, no. 8, 1965.
- [2] A. A. Chien and V. Karamcheti, "Moore's law: The first ending and a new beginning," *Computer*, vol. 46, no. 12, pp. 48–53, 2013.
- [3] N. S. Kim, T. Austin, D. Baauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan, "Leakage current: Moore's law meets static power," *Computer*, vol. 36, no. 12, pp. 68–75, 2003.
- [4] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez, "Reducing power in high-performance microprocessors," in *Proceedings of the 35th annual Design Automation Conference*. ACM, 1998, pp. 732–737.
- [5] P. Magnan, "Detection of visible photons in CCD and CMOS: A comparative view," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 504, no. 1, pp. 199–212, 2003.
- [6] M. Bigas, E. Cabruja, J. Forest, and J. Salvi, "Review of CMOS image sensors," *Microelectronics journal*, vol. 37, no. 5, pp. 433–451, 2006.
- [7] "ANKA Synchrotron Radiation Facility, KIT," http://www.anka.kit.edu.
- [8] "pco.dimax S1 High-Speed Camera," http://www.pco.de/highspeed-cameras/pcodimax-s1/.
- [9] ANKA, "ANKA Facility," http://www.anka.kit.edu/964.php.
- [10] "IMAGE beamline, ANKA, KIT," http://www.anka.kit.edu/IMAGE.php.
- [11] "Ultra fast X-ray imaging of scientific processes with on-line assessment and data-driven process control," http://www.ufo.kit.edu.

- [12] A. Rack, F. Garcia-Moreno, C. Schmitt, O. Betz, A. Cecilia, A. Ershov, T. Rack, J. Banhart, and S. Zabler, "On the possibilities of hard X-ray imaging with high spatio-temporal resolution using polychromatic synchrotron radiation," *Journal of X-ray Science and Technology*, vol. 18, no. 4, pp. 429–441, 2010.

- [13] M. Renzi *et al.*, "Pixel array detectors for time resolved radiography," *Review of Scientific Instruments*, vol. 73, no. 3, pp. 1621–1624, 2002.
- [14] F. García-Moreno, A. Rack, L. Helfen, T. Baumbach, S. Zabler, N. Babcsán, J. Banhart, T. Martin, C. Ponchut, and M. Di Michiel, "Fast processes in liquid metal foams investigated by high-speed synchrotron X-ray microradioscopy," *Applied Physics Letters*, vol. 92, no. 13, pp. 134 104–134 104, 2008.
- [15] A. Rack, F. García-Moreno, T. Baumbach, and J. Banhart, "Synchrotron-based radioscopy employing spatio-temporal microresolution for studying fast phenomena in liquid metal foams," *Journal of synchrotron radiation*, vol. 16, no. 3, pp. 432–434, 2009.
- [16] M. Vogelgesang, T. Farago, T. dos Santos Rolo, A. Kopmann, and T. Baumbach, "When hardware and software work in concert," in will be published in Proceedings of the 14th International Conference on Accelerator and Large Experiment Physics Control Systems, 2013.
- [17] R. A. Kirsch, L. Cahn, C. Ray, and G. H. Urban, "Experiments in processing pictorial information with a digital computer," in *Papers and Discussions Presented at the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines to Meet*, ser. IRE-ACM-AIEE '57 (Eastern). New York, NY, USA: ACM, 1958, pp. 221–229. [Online]. Available: http://doi.acm.org/10.1145/1457720.1457763
- [18] C. P. Ginsburg, "Comprehensive description of the Ampex video taperecorder," *Journal of the Society of Motion Picture and Television Engineers*, vol. 66, no. 4, pp. 177–182, 1957.
- [19] Y. Shiraishi, "History of home videotape recorder development," *SMPTE Journal*, vol. 94, no. 12, pp. 1257–1263, 1985.
- [20] E. F. Lally, Space Flight Report to the Nation, American Rocket Society. Mosaic Guidance for Interplanetary Travel, 1961.

- [21] S. Morrison, "A new type of photosensitive junction device," *Solid-State Electronics*, vol. 6, no. 5, pp. 485–494, 1963. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0038110163900339

- [22] J. Horton, R. Mazza, and H. Dym, "The Scanistor a solid-state image scanner," *Proceedings of the IEEE*, vol. 52, no. 12, pp. 1513–1528, 1964.
- [23] R. F. Lyon, "A brief history of Pixel," in *Electronic Imaging 2006*. International Society for Optics and Photonics, 2006, pp. 606 901–606 901.
- [24] M. Schuster and G. Strull, "A monolithic mosaic of photon sensors for solid-state imaging applications," *Electron Devices, IEEE Transactions on*, vol. ED-13, no. 12, pp. 907–912, 1966.
- [25] P. K. Weimer, G. Sadasiv, J. Meyer, J.E., L. Meray-Horvath, and W. Pike, "A self-scanned solid-state image sensor," *Proceedings of the IEEE*, vol. 55, no. 9, pp. 1591–1602, 1967.
- [26] P. Noble, "Self-scanned silicon image detector arrays," *Electron Devices, IEEE Transactions on*, vol. 15, no. 4, pp. 202–209, 1968.
- [27] P. Fry, P. Noble, and R. Rycroft, "Fixed-pattern noise in photomatrices," *Solid-State Circuits*, *IEEE Journal of*, vol. 5, no. 5, pp. 250–254, 1970.
- [28] W. S. Boyle and G. E. Smith, "Charge Coupled Semiconductor Devices," *Bell System Technical Journal*, vol. 49, no. 4, pp. 587–593, 1970. [Online]. Available: http://dx.doi.org/10.1002/j.1538-7305.1970.tb01790.x
- [29] G. Lloyd and S. Sasson, "Electronic still camera," 1978, US Patent 4,131,919. [Online]. Available: http://www.google.com/patents/US4131919
- [30] M. R. Peres, Ed., Focal Encyclopedia of Photography. Focal Press, 2007.
- [31] A. N. Belbachir, Ed., *Smart cameras*. New York: Springer Science+Business Media, 2010.
- [32] R. F. Lyon, "The optical mouse, and an architectural methodology for smart digital sensors," in *VLSI Systems and Computations*. Springer, 1981, pp. 1–19.
- [33] R. Schneiderman, "Smart cameras clicking with electronic functions," *Electronics*, vol. 48, no. 17, pp. 74–81, 1975.

- [34] R. Ammendola *et al.*, "GPUs for real-time processing in HEP trigger systems," in *Journal of Physics: Conference Series*, vol. 523, no. 1. IOP Publishing, 2014, p. 012007.

- [35] R. Mosqueron, J. Dubois, M. Mattavelli, and D. Mauvilet, "Smart camera based on embedded HW/SW coprocessor," *EURASIP Journal on Embedded Systems*, vol. 2008, p. 3, 2008.
- [36] B. Rinner, T. Winkler, W. Schriebl, M. Quaritsch, and W. Wolf, "The evolution from single to pervasive smart cameras," in *Distributed Smart Cameras*, 2008. ICDSC 2008. Second ACM/IEEE International Conference on. IEEE, 2008, pp. 1–10.
- [37] T. Moorhead and T. Binnie, "Smart CMOS camera for machine vision applications," in *IEE conference publication*. Institution of Electrical Engineers, 1999, pp. 865–869.
- [38] M. Bramberger, J. Brunner, B. Rinner, and H. Schwabach, "Real-time video analysis on an embedded smart camera for traffic surveillance," in *Real-Time and Embedded Technology and Applications Symposium*, 2004. *Proceedings. RTAS* 2004. 10th IEEE. IEEE, 2004, pp. 174–181.
- [39] D. Bauer, A. N. Belbachir, N. Donath, G. Gritsch, B. Kohn, M. Litzenberger, C. Posch, P. Schön, and S. Schraml, "Embedded vehicle speed estimation system using an asynchronous temporal contrast vision sensor," *EURASIP Journal on Embedded Systems*, vol. 2007, no. 1, pp. 34–34, 2007.
- [40] F. Dias, F. Berry, J. Serot, and F. Marmoiton, "Hardware, design and implementation issues on a FPGA-based smart camera," in *Distributed Smart Cameras*, 2007. ICDSC '07. First ACM/IEEE International Conference on, Sept 2007, pp. 20–26.
- [41] R. P. Kleihorst, A. A. Abbo, A. van der Avoird, M. Op de Beeck, L. Sevat, P. Wielage, R. van Veen, and H. van Herten, "Xetal: a low-power high-performance smart camera processor," in *Circuits and Systems*, 2001. ISCAS 2001. The 2001 IEEE International Symposium on, vol. 5. IEEE, 2001, pp. 215–218.
- [42] C. H. Lin, W. Wolf, A. Dixon, X. Koutsoukos, and J. Sztipanovits, "Design and implementation of ubiquitous smart cameras," in *Sensor Networks, Ubiquitous, and Trustworthy Computing*, 2006. IEEE International Conference on, vol. 1. IEEE, 2006, pp. 8–pp.

- [43] M. Quaritsch, B. Rinner, and B. Strobl, "Improved agent-oriented middleware for distributed smart cameras," in *Distributed Smart Cameras*, 2007. ICDSC'07. First ACM/IEEE International Conference on. IEEE, 2007, pp. 297–304.

- [44] M. A. Patricio, J. Carbó, O. Pé, J. Garcí, J. M. Molina *et al.*, "Multi-agent framework in visual sensor networks," *EURASIP Journal on Advances in Signal Processing*, vol. 2007, 2006.
- [45] S. Fleck, F. Busch, and W. Straßer, "Adaptive probabilistic tracking embedded in smart cameras for distributed surveillance in a 3D model," *EURASIP Journal on Embedded Systems*, vol. 2007, no. 1, pp. 24–24, 2007.
- [46] S. Hengstler, D. Prashanth, S. Fong, and H. Aghajan, "MeshEye: a hybrid-resolution smart camera mote for applications in distributed intelligent surveillance," in *Proceedings of the 6th international conference on Information processing in sensor networks.* ACM, 2007, pp. 360–369.
- [47] R. Kleihorst, A. Abbo, B. Schueler, and A. Danilin, "Camera mote with a high-performance parallel processor for real-time frame-based video processing," in *Distributed Smart Cameras*, 2007. ICDSC'07. First ACM/IEEE International Conference on. IEEE, 2007, pp. 109–116.
- [48] A. Rowe, A. Goode, D. Goel, and I. Nourbakhsh, "CMUcam3: An open programmable embedded vision sensor, robotics institute," 2007.
- [49] Y. Shi and S. Lichman, "Smart cameras, a review," in *Proceedings of*. Citeseer, 2005, pp. 95–100.
- [50] "AIA machine vision trade association," http://www.visiononline.org/vision-standards.cfm.
- [51] A. DeHon, "The density advantage of configurable computing," *Computer*, vol. 33, no. 4, pp. 41–49, 2000.
- [52] M. Caselle, S. Chilingaryan, A. Herth, A. Kopmann, U. Stevanovic, M. Vogelgesang, M. Balzer, and M. Weber, "Ultrafast streaming camera platform for scientific applications," *Nuclear Science, IEEE Transactions on*, vol. 60, no. 5, pp. 3669–3677, 2013.
- [53] U. Stevanovic *et al.*, "A control system and streaming DAQ platform with image-based trigger for X-ray imaging," *IEEE Transactions on Nuclear Science*, vol. 62, no. 3, pp. 911–918, 2015.

- [54] "Xilinx web site," http://www.xilinx.com.
- [55] D. H. Bilderback, P. Elleaume, and E. Weckert, "Review of third and next generation synchrotron light sources," *Journal of Physics B: Atomic, molecular and optical physics*, vol. 38, no. 9, p. S773, 2005.
- [56] A. Sartori, "A smart camera," FPGAs, pp. 353–362.
- [57] A. Simoni, M. Gottardi, A. Sartori, and A. Zorat, "A digital camera for machine vision," in *Industrial Electronics*, Control and Instrumentation, 1994. IECON'94., 20th International Conference on, vol. 2. IEEE, 1994, pp. 879–883.
- [58] S.-C. Chan, H. Ngai, and K.-L. Ho, "A programmable image processing system using FPGAs," *International Journal of Electronics Theoretical and Experimental*, vol. 75, no. 4, pp. 725–730, 1993.
- [59] S. P. Monacos, R. K. Lam, A. A. Portillo, and G. G. Ortiz, "Design of an event-driven random-access-windowing CCD-based camera," in *High-Power Lasers and Applications*. International Society for Optics and Photonics, 2003, pp. 115–125.
- [60] "AM41 and AM1X5 CMOS image sensors," http://www.alexima.com/products.htm.
- [61] "PCIe specifications," http://pcisig.com/specifications/pciexpress/.
- [62] A. Corporation, "Flash FPGAs in the value-based market white paper," http://www.actel.com, 2005.
- [63] P. Garcia, K. Compton, M. Schulte, E. Blem, and W. Fu, "An overview of reconfigurable hardware in embedded systems," *EURASIP Journal on Embedded Systems*, vol. 2006, no. 1, pp. 13–13, 2006.
- [64] T. R. Rimmele, "Recent advances in solar adaptive optics," in *SPIE Astronomical Telescopes+ Instrumentation*. International Society for Optics and Photonics, 2004, pp. 34–46.
- [65] T. W. Fry and S. A. Hauck, "SPIHT image compression on FPGAs," *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 15, no. 9, pp. 1138–1147, 2005.
- [66] D. Desmet, P. Avasare, P. Coene, S. Decneut, F. Hendrickx, T. Marescaux, J.-Y. Mignolet, R. Pasko, P. Schaumont, and D. Verkest, "Design of Cam-E-leon, a run-time reconfigurable Web camera," in *Embedded processor design challenges*. Springer, 2002, pp. 274–290.

- [67] M. Leeser, S. Miller, and H. Yu, "Smart camera based on reconfigurable hardware enables diverse real-time applications," in *Field-Programmable Custom Computing Machines*, 2004. FCCM 2004. 12th Annual IEEE Symposium on. IEEE, 2004, pp. 147–155.

- [68] D. Pellerin and S. Thibault, *Practical FPGA programming in C.* Prentice Hall Press, 2005.
- [69] I. Kuon, R. Tessier, and J. Rose, "FPGA architecture: Survey and challenges," *Foundations and Trends in Electronic Design Automation*, vol. 2, no. 2, pp. 135–253, 2008.
- [70] R. C. Minnick, "A survey of microcellular research," *Journal of the ACM (JACM)*, vol. 14, no. 2, pp. 203–241, 1967.
- [71] J. M. Birkner and H.-T. Chua, "Programmable array logic circuit," Nov. 7 1978, US Patent 4,124,899.
- [72] W. Carter, K. Duong, R. H. Freeman, H. Hsieh, J. Y. Ja, J. E. Mahoney, L. T. Ngo, and S. L. Sze, "A user programmable reconfigurable gate array," in *Proceedings of the IEEE Custom Integrated Circuits Conference*, 1986.
- [73] "Altera Stratix V product info," https://www.altera.com/content/dam/altera-www/global/en\_US/pdfs/literature/sg/product-catalog.pdf.
- [74] Xilinx, "Xilinx 7 product guide," http://www.xilinx.com/support/documentation/data\_sheets/ds180\_7Series\_Overview.pdf.
- [75] D. G. Bailey, *Design for embedded image processing on FPGAs*. John Wiley & Sons, 2011.
- [76] P. H. Leong, "Recent trends in FPGA architectures and applications," in *Electronic Design, Test and Applications, 2008. DELTA 2008. 4th IEEE International Symposium on*. IEEE, 2008, pp. 137–141.
- [77] Xilinx, "Xilinx FPGA structure," http://www.xilinx.com/fpga/.
- [78] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," *Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on*, vol. 26, no. 2, pp. 203–215, 2007.

- [79] D. Frohman-Bentchkowsky, "A fully decoded 2048-bit electrically programmable FAMOS read-only memory," *Solid-State Circuits, IEEE Journal of*, vol. 6, no. 5, pp. 301–306, 1971.
- [80] R. Cuppens, C. D. Hartgring, J. F. Verwey, H. L. Peek, F. Vollebragt, E. G. Devens, I. Sens *et al.*, "An EEPROM for microprocessors and custom logic," *Solid-State Circuits, IEEE Journal of*, vol. 20, no. 2, pp. 603–608, 1985.
- [81] D. C. Guterman, I. H. Rimawi, T. Chiu, R. D. Halvorson, and D. J. Mcelroy, "An electrically alterable nonvolatile memory cell using a floating-gate structure," *Electron Devices, IEEE Transactions on*, vol. 26, no. 4, pp. 576–586, 1979.
- [82] J. a. Birkner, A. Chan, H. Chua, A. Chao, K. Gordon, B. Kleinman, P. Kolze, and R. Wong, "A very-high-speed field-programmable gate array using metal-to-metal antifuse programmable elements," *Microelectronics Journal*, vol. 23, no. 7, pp. 561–568, 1992.
- [83] S. Kilts, *Advanced FPGA design: architecture, implementation, and optimization*. John Wiley & Sons, 2007.
- [84] VHDL, "VHDL standard," http://accellera.org/downloads/ieee.
- [85] Verilog, "Verilog standard," http://www.verilog.com/.
- [86] E. Monmasson and M. N. Cirstea, "FPGA design methodology for industrial control systems - A review," *IEEE transactions on industrial electronics*, vol. 54, no. 4, pp. 1824–1842, 2007.
- [87] F. Vahid, *Digital Design with RTL Design, Verilog and VHDL*. John Wiley & Sons, 2010.
- [88] International Electrotechnical Commission, "EDIF standard," https://webstore.iec.ch/publication/5724.
- [89] S. Asano, T. Maruyama, and Y. Yamaguchi, "Performance comparison of FPGA, GPU and CPU in image processing," in *Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on*. IEEE, 2009, pp. 126–131.
- [90] M. Birk, A. Guth, M. Zapf, M. Balzer, N. Ruiter, M. Hübner, and J. Becker, "Acceleration of image reconstruction in 3D ultrasound computer tomography: an evaluation of CPU, GPU and FPGA computing," in *Design and Architectures for Signal and Image Processing (DASIP), 2011 Conference on*. IEEE, 2011, pp. 1–8.
- [91] M. Birk, M. Zapf, M. Balzer, N. Ruiter, and J. Becker, "A comprehensive comparison of GPU-and FPGA-based acceleration of reflection image reconstruction for 3D ultrasound computer tomography," *Journal of real-time image processing*, vol. 9, no. 1, pp. 159–170, 2014.
- [92] S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach, "Accelerating compute-intensive applications with GPUs and FPGAs," in *Application Specific Processors, 2008. SASP 2008. Symposium on*. IEEE, 2008, pp. 101–107.
- [93] J. Catsoulis, *Designing Embedded Hardware: Create New Computers and Devices*. O'Reilly Media, Inc., 2005.
- [94] R. Woods, J. McAllister, Y. Yi, and G. Lightbody, *FPGA-based implementation of signal processing systems*. John Wiley & Sons, 2008.
- [95] U. Meyer-Baese, *Digital signal processing with field programmable gate arrays*. Springer, 2007, vol. 65.
- [96] I. Pitas, *Digital image processing algorithms and applications*. John Wiley & Sons, 2000.
- [97] C. C. Weems, S. P. Levitan, A. R. Hanson, E. M. Riseman, D. B. Shu, and J. G. Nash, "The image understanding architecture," *International Journal of computer vision*, vol. 2, no. 3, pp. 251–282, 1989.
- [98] C. C. Weems, "Architectural requirements of image understanding with respect to parallel processing," *Proceedings of the IEEE*, vol. 79, no. 4, pp. 537–547, 1991.
- [99] A. Downton and D. Crookes, "Parallel architectures for image processing," *Electronics & Communication Engineering Journal*, vol. 10, no. 3, pp. 139–151, 1998.
- [100] N. K. Ratha and A. K. Jain, "Computer vision algorithms on reconfigurable logic arrays," *IEEE Transactions on Parallel and Distributed Systems*, vol. 10, no. 1, pp. 29–43, 1999.
- [101] W. Burger and M. J. Burge, *Principles of digital image processing: fundamental techniques*. Springer, 2009.

- [102] A. K. Jain, *Fundamentals of digital image processing*. Prentice-Hall, Inc., 1989.
- [103] B. Jähne, *Digital image processing*. Springer, 2007.
- [104] C. Johnston, K. Gribbon, and D. Bailey, "Implementing image processing algorithms on FPGAs," in *Proceedings of the Eleventh Electronics New Zealand Conference*, ENZCon04, 2004, pp. 118–123.
- [105] J. Vanne, E. Aho, T. D. Hamalainen, and K. Kuusilinna, "A high-performance sum of absolute difference implementation for motion estimation," *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 16, no. 7, pp. 876–883, 2006.
- [106] P. M. Kuhn, *Algorithms, complexity analysis and VLSI architectures for MPEG-4 motion estimation*. Springer, 1999.
- [107] S. Vassiliadis, E. A. Hakkennes, J. Wong, and G. G. Pechanek, "The sum-absolute-difference motion estimation accelerator," in *Euromicro Conference, 1998. Proceedings. 24th*, vol. 2. IEEE, 1998, pp. 559–566.
- [108] A. McIvor, Q. Zang, and R. Klette, "The background subtraction problem for video surveillance systems," in *International Workshop on Robot Vision*. Springer, 2001, pp. 176–183.
- [109] J. Heikkilä and O. Silvén, "A real-time system for monitoring of cyclists and pedestrians," *Image and Vision Computing*, vol. 22, no. 7, pp. 563–570, 2004.
- [110] R. Cutler and L. Davis, "View-based detection and analysis of periodic motion," in *Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on*, vol. 1. IEEE, 1998, pp. 495–500.
- [111] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in *Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on*, vol. 2. IEEE, 1999, pp. 246–252.
- [112] Y. Kameda and M. Minoh, "A human motion estimation method using 3-successive video frames," in *International conference on virtual systems and multimedia*, 1996, pp. 135–140.
- [113] R. Cucchiara, P. Onfiani, A. Prati, and N. Scarabottolo, "Segmentation of moving objects at frame rate: a dedicated hardware solution," 1999.

- [114] S. M. Gruner *et al.*, "Charge-coupled device area X-ray detectors," *Rev. Sci. Instrum*, vol. 73, no. 8, pp. 2815–2842, 2002.
- [115] G. Mettivier *et al.*, "High frame rate X-ray imaging with a 256 × 256 pixel single photon counting Medipix2 detector," in *IEEE Nucl. Sci. Conf. R.*, Fajardo, Puerto Rico, 2005.
- [116] T. Martin *et al.*, "LSO-based single crystal film scintillator for synchrotron-based hard X-ray micro-imaging," *IEEE Trans. Nucl. Sci.*, vol. 56, no. 3, pp. 1412–1418, 2009.
- [117] Y. Wang *et al.*, "A high-throughput X-ray microtomography system at the Advanced Photon Source," *Rev. Sci. Instrum*, vol. 72, no. 4, pp. 2062–2068, 2001.
- [118] M. Di Michiel *et al.*, "Fast microtomography using high energy synchrotron radiation," *Rev. Sci. Instrum*, vol. 76, no. 4, 2005.
- [119] M. Vollmer and K.-P. Möllmann, "High speed and slow motion: the technology of modern high speed cameras," *Phys. Educ.*, vol. 46, no. 2, pp. 191–202, 2011.
- [120] A. Koch *et al.*, "X-ray imaging with submicrometer resolution employing transparent luminescent screens," *J. Opt. Soc. Am. A*, vol. 15, no. 7, pp. 1940–1951, 1998.
- [121] M. Born and E. Wolf, *Principles of optics: electromagnetic theory of propagation, interference and diffraction of light*. United Kingdom: Cambridge University Press, 1999.
- [122] U. Bonse and F. Busch, "X-ray computed microtomography (μCT) using synchrotron radiation (SR)," *Prog. Biophys. Mol. Biol.*, vol. 65, no. 1-2, pp. 133–169, 1996.
- [123] Xilinx, "Xilinx Virtex 6 series," http://www.xilinx.com/products/silicon-devices/fpga/virtex-6.html.
- [124] SAMTEC, "FMC standard," https://www.samtec.com/standards/fmc.
- [125] "CMOSIS CMV2000," http://www.cmosis.com/products/product\_detail/cmv2000.
- [126] Xilinx, "Xilinx PCIe Virtex 6," https://www.xilinx.com/products/intellectual-property/v6\_pci\_express\_block.html.

- [127] Northwest Logic, "DMA Back-End Core," http://nwlogic.com/packetdma/.
- [128] Xilinx, "Xilinx Virtex 6 IO guide," https://www.xilinx.com/support/documentation/user\_guides/ug361.pdf.
- [129] Xilinx, "Xilinx Virtex 6 DDR interface guide," https://www.xilinx.com/support/documentation/ip\_documentation/ug406.pdf.
- [130] HiTech Global, "Xilinx Virtex 6 LX365 board," http://www.hitechglobal.com/Boards/Virtex6\_PCIExpress\_Board.htm.
- [131] L. Rota, M. Caselle, S. Chilingaryan, A. Kopmann, and M. Weber, "A PCIe DMA architecture for multi-gigabyte per second data transmission," *IEEE Transactions on Nuclear Science*, vol. 62, no. 3, pp. 972–976, 2015.
- [132] L. Rota, M. Caselle, S. Chilingaryan, A. Kopmann, and M. Weber, "A new DMA PCIe architecture for gigabyte data transmission," in *Real Time Conference (RT), 2014 19th IEEE-NPSS*. IEEE, 2014, pp. 1–2.
- [133] ARM, "AMBA AXI specification," http://www.arm.com/products/system-ip/amba-specifications.
- [134] W. A. Wulf and S. A. McKee, "Hitting the memory wall: implications of the obvious," *ACM SIGARCH computer architecture news*, vol. 23, no. 1, pp. 20–24, 1995.
- [135] N. R. Mahapatra and B. Venkatrao, "The processor-memory bottleneck: problems and solutions," *Crossroads*, vol. 5, no. 3es, p. 2, 1999.
- [136] J. L. Hennessy and D. A. Patterson, *Computer architecture: a quantitative approach*. Elsevier, 2011.
- [137] JEDEC, "Dynamic Random-Access Memory (DRAM)," https://www.jedec.org/standards-documents/dictionary/terms/dynamic-random-access-memory-dram.
- [138] JEDEC, "DDR3 SO-DIMM Reference Design Specification," https://www.jedec.org/standards-documents/docs/module-42018.
- [139] Xilinx, "Analog Solutions for Xilinx FPGAs Product Guide," https://www.maximintegrated.com/content/dam/files/design/technical-documents/product-guides/FPGA-Xilinx-Product-Guide.pdf.

- [140] Maxim Integrated, "Power-Supply Solutions for Xilinx FPGAs," https://www.maximintegrated.com/en/app-notes/index.mvp/id/5132.
- [141] Maxim Integrated, "Single/Multiphase, Step-Down, DC-DC Converter Delivers Up to 25A Per Phase," https://www.maximintegrated.com/en/products/power/switching-regulators/MAX8686.html/tb\_tab2.
- [142] Maxim Integrated, "EE-Sim Design Tools," http://maxim.transim.com/Loader/New.aspx.
- [143] "Nobel prize in physics 1921." [Online]. Available: http://www.nobelprize.org/nobel\_prizes/physics/laureates/1921/
- [144] D. Durini, *High Performance Silicon Imaging: Fundamentals and Applications of CMOS and CCD sensors*. Elsevier, 2014.
- [145] G. Lutz, *Semiconductor Radiation Detectors*. Springer, 2007.
- [146] H. Zimmermann, *Integrated Silicon Optoelectronics*. Springer, 2009.
- [147] S. M. Sze and K. K. Ng, *Physics of Semiconductor Devices*. Wiley, 2007.
- [148] S. M. Sze, *Semiconductor Devices: Physics and Technology*. Wiley, 2002.
- [149] Wikimedia Commons, "A PN junction in thermal equilibrium with voltage applied," 2007. [Online]. Available: https://upload.wikimedia.org/wikipedia/commons/f/fa/Pn-junction-equilibrium-graphs.png
- [150] H. Spieler, *Semiconductor Detector Systems*. Oxford University Press, 2005.
- [151] "Nobel prize in physics 2009." [Online]. Available: http://www.nobelprize.org/nobel\_prizes/physics/laureates/2009/
- [152] E. R. Fossum, "Active pixel sensors: Are CCDs dinosaurs?" in *IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology*. International Society for Optics and Photonics, 1993, pp. 2–14.
- [153] A. Owens, K. J. McCarthy, A. Wells, W. Hajdas, F. Mattenberger, A. Zehnder, and O. Terekhov, "Measured radiation damage in charge coupled devices exposed to simulated deep orbit proton fluxes," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 361, no. 3, pp. 602–610, 1995.

- [154] P. Falus, M. A. Borthwick, and S. G. Mochrie, "Fast CCD camera for X-ray photon correlation spectroscopy and time-resolved X-ray scattering and imaging," *Review of Scientific instruments*, vol. 75, no. 11, pp. 4383–4400, 2004.
- [155] A. J. Theuwissen, "CMOS image sensors: State-of-the-art," *Solid-State Electronics*, vol. 52, no. 9, pp. 1401–1406, 2008.
- [156] G. P. Weckler, "Operation of pn junction photodetectors in a photon flux integrating mode," *IEEE Journal of Solid-State Circuits*, vol. 2, no. 3, pp. 65–73, 1967.
- [157] E. R. Fossum *et al.*, "CMOS image sensors: electronic camera-on-a-chip," *IEEE transactions on electron devices*, vol. 44, no. 10, pp. 1689–1698, 1997.
- [158] T. Lulé, S. Benthien, H. Keller, F. Mutze, P. Rieve, K. Seibel, M. Sommer, and M. Bohm, "Sensitivity of CMOS based imagers and scaling perspectives," *Electron Devices, IEEE Transactions on*, vol. 47, no. 11, pp. 2110–2122, 2000.
- [159] J.-H. Park, S. Kawahito, and Y. Wakamori, "A new active pixel structure with a pinned photodiode for wide dynamic range image sensors," *IEICE Electronics Express*, vol. 2, no. 18, pp. 482–487, 2005.
- [160] D. Litwiller, "CCD vs. CMOS," *Photonics Spectra*, vol. 35, no. 1, pp. 154–158, 2001.
- [161] H.-S. P. Wong, "CMOS image sensors-recent advances and device scaling considerations," in *Electron Devices Meeting, 1997. IEDM'97. Technical Digest., International*. IEEE, 1997, pp. 201–204.
- [162] A. J. Theuwissen, *Solid-state imaging with charge-coupled devices*. Springer Science & Business Media, 1995, vol. 1.
- [163] I. Brouk, A. Nemirovsky, and Y. Nemirovsky, "Analysis of noise in CMOS image sensor," in *Microwaves, Communications, Antennas and Electronic Systems, 2008. COMCAS 2008. IEEE International Conference on*. IEEE, 2008, pp. 1–8.
- [164] C. Aguerrebere, J. Delon, Y. Gousseau, and P. Musé, "Study of the digital camera acquisition process and statistical modeling of the sensor raw data," 2013.

- [165] M. C. Teich and B. Saleh, "Fundamentals of photonics," *Canada, Wiley Interscience*, p. 3, 1991.
- [166] C. D. Motchenbacher and J. A. Connelly, *Low-noise electronic system design*. Wiley New York, 1993.
- [167] J. Ohta, *Smart CMOS image sensors and applications*. CRC press, 2007.
- [168] K. Irie, A. McKinnon, K. Unsworth, and I. Woodhead, "A model for measurement of noise in CCD digital-video cameras," *Measurement Science and Technology*, vol. 19, no. 4, p. 045207, 2008.
- [169] J. R. Janesick, *Photon transfer*. SPIE press San Jose, 2007.
- [170] "Standard for measurement and presentation of specifications for machine vision sensors and cameras," http://www.emva.org/standards-technology/emva-1288/.
- [171] ETP, "ETP X-Ray Irradiation setup," http://www.etp.kit.edu/english/265.php.
- [172] R. D. Deslattes, E. G. Kessler Jr, P. Indelicato, L. De Billy, E. Lindroth, and J. Anton, "X-ray transition energies: new approach to a comprehensive evaluation," *Reviews of Modern Physics*, vol. 75, no. 1, p. 35, 2003.
- [173] G. H. Zschornack, *Handbook of X-ray Data*. Springer Science & Business Media, 2007.
- [174] ANKA, "ANKA Detector Lab," http://www.anka.kit.edu/4481.php.
- [175] J. R. Janesick, K. P. Klaasen, and T. Elliott, "Charge-coupled-device charge-collection efficiency and the photon-transfer technique," *Optical engineering*, vol. 26, no. 10, p. 260972, 1987.
- [176] P. Lytaev *et al.*, "Characterization of the CCD and CMOS cameras for grating-based phase-contrast tomography," in *SPIE Optical Engineering+ Applications*. International Society for Optics and Photonics, 2014, p. 921218.
- [177] HZG, "Helmholtz-Zentrum Geesthacht," https://www.hzg.de/index.php.de.
- [178] W. Meyer-Ilse, "Soft X-ray imaging using CCD sensors," in *Soft X-Rays Optics and Technology*. International Society for Optics and Photonics, 1986, pp. 515–518.

- [179] W. C. Röntgen, "Über eine neue Art von Strahlen," *Annalen der Physik*, vol. 300, no. 1, pp. 1–11, 1898.
- [180] W. L. Bragg, "The specular reflection of X-rays," *Nature*, vol. 90, p. 410, 1912.
- [181] E. B. Podgorsak, *Radiation physics for medical physicists*. Springer Science & Business Media, 2010.
- [182] P. Willmott, *An introduction to synchrotron radiation: Techniques and applications*. John Wiley & Sons, 2011.
- [183] F. Elder, A. Gurewitsch, R. Langmuir, and H. Pollock, "Radiation from electrons in a synchrotron," *Physical Review*, vol. 71, no. 11, p. 829, 1947.
- [184] D. H. Tomboulian and P. Hartman, "Spectral and angular distribution of ultraviolet radiation from the 300-Mev Cornell synchrotron," *Physical Review*, vol. 102, no. 6, p. 1423, 1956.
- [185] G. V. Marr, *Handbook on Synchrotron Radiation: Vacuum Ultraviolet and Soft X-ray Processes*. Elsevier, 2013.
- [186] "ANKA Accelerator," http://www.anka.kit.edu/964.php.
- [187] F. Beckmann, U. Bonse, F. Busch, and O. Günnewig, "X-ray microtomography (μCT) using phase contrast for the investigation of organic matter," *Journal of computer assisted tomography*, vol. 21, no. 4, pp. 539–553, 1997.
- [188] J. Als-Nielsen and D. McMorrow, *Elements of modern X-ray physics*. John Wiley & Sons, 2011.
- [189] J. Baruchel, J.-Y. Buffiere, and E. Maire, *X-ray tomography in material science*. HERMES Science Publications, 2000.
- [190] A. C. Kak and M. Slaney, *Principles of computerized tomographic imaging*. Siam, 1988, vol. 33.
- [191] C. Rau *et al.*, "Tomography with high resolution," in *International Symposium on Optical Science and Technology*. International Society for Optics and Photonics, 2002, pp. 14–22.

- [192] F. Garcia-Moreno, A. Rack, L. Helfen, T. Baumbach, S. Zabler, N. Babcsan, J. Banhart, T. Martin, C. Ponchut, and M. Di Michiel, "Fast processes in liquid metal foams investigated by high-speed synchrotron X-ray microradioscopy," *Applied Physics Letters*, vol. 92, no. 13, p. 134104, 2008.
- [193] A. Rack *et al.*, "The micro-imaging station of the TopoTomo beamline at the ANKA synchrotron light source," *Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms*, vol. 267, no. 11, pp. 1978–1988, 2009.
- [194] IPE, "Prozessdatenverarbeitung (PDV) Fachgruppe," https://www.ipe.kit.edu/96.php.
- [195] M. Vogelgesang, S. Chilingaryan, T. dos Santos Rolo, and A. Kopmann, "UFO: A scalable GPU-based image processing framework for on-line monitoring," in *High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on*. IEEE, 2012, pp. 824–829.
- [196] A. Götz *et al.*, "TANGO a CORBA based control system," *ICALEPCS2003, Gyeongju, October*, 2003.
- [197] U. Stevanovic, M. Caselle, S. Chilingaryan, A. Herth, A. Kopmann, M. Vogelgesang, M. Balzer, and M. Weber, "High-speed camera with embedded FPGA processing," in *Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on*. IEEE, 2012, pp. 1–2.
- [198] U. Bonse and M. Hart, "An X-ray interferometer," *Applied Physics Letters*, vol. 6, no. 8, pp. 155–156, 1965.
- [199] A. Momose, "Demonstration of phase-contrast X-ray computed tomography using an X-ray interferometer," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 352, no. 3, pp. 622–628, 1995.
- [200] F. Pfeiffer, T. Weitkamp, O. Bunk, and C. David, "Phase retrieval and differential phase-contrast imaging with low-brilliance X-ray sources," *Nature physics*, vol. 2, no. 4, pp. 258–261, 2006.
- [201] P. Zhu, K. Zhang, Z. Wang, Y. Liu, X. Liu, Z. Wu, S. A. McDonald, F. Marone, and M. Stampanoni, "Low-dose, simple, and fast grating-based X-ray phase-contrast imaging," *Proceedings of the National Academy of Sciences*, vol. 107, no. 31, pp. 13576–13581, 2010.
- [202] T. Weitkamp, A. Diaz, C. David, F. Pfeiffer, M. Stampanoni, P. Cloetens, and E. Ziegler, "X-ray phase imaging with a grating interferometer," *Optics express*, vol. 13, no. 16, pp. 6296–6304, 2005.
- [203] C. David, B. Nöhammer, H. Solak, and E. Ziegler, "Differential X-ray phase contrast imaging using a shearing interferometer," *Applied physics letters*, vol. 81, no. 17, pp. 3287–3289, 2002.
- [204] A. Momose, S. Kawamoto, I. Koyama, Y. Hamaishi, K. Takai, and Y. Suzuki, "Demonstration of X-ray Talbot interferometry," *Japanese journal of applied physics*, vol. 42, no. 7B, p. L866, 2003.
- [205] A. Hipp, F. Beckmann, P. Lytaev, I. Greving, L. Lottermoser, T. Dose, R. Kirchhof, H. Burmester, A. Schreyer, and J. Herzen, "Grating-based X-ray phase-contrast imaging at PETRA III," in *SPIE Optical Engineering+ Applications*. International Society for Optics and Photonics, 2014, p. 921206.
- [206] M. Caselle *et al.*, "An ultra-fast data acquisition system for coherent synchrotron radiation with terahertz detectors," *Journal of Instrumentation*, vol. 9, no. 01, p. C01024, 2014.
- [207] IPE, "Fachgruppe Eingebettete Parallele Systeme (EPS)," https://www.ipe.kit.edu/91.php.
- [208] A. Müller *et al.*, "Observation of coherent THz radiation from the ANKA and MLS storage rings with a Hot Electron bolometer," *TU5RFP027*, 2009.
- [209] P. Thoma *et al.*, "High-speed Y–Ba–Cu–O direct detection system for monitoring picosecond THz pulses," *IEEE Transactions on terahertz science and technology*, vol. 3, pp. 81–86, 2013.
- [210] M. Caselle, M. Brosi, S. Chilingaryan, T. Dritschler, E. Hertle, V. Judin, A. Kopmann *et al.*, "Commissioning of an ultra-fast data acquisition system for coherent synchrotron radiation detection," in *Proceedings of the 5th IPAC*, 2014.
- [211] L. Rota, M. Balzer, M. Caselle, M. Weber, N. Hiller, A. Mozzanica, C. Gerth, B. Steffen, D. Makowski, and A. Mielczarek, "KALYPSO: A linear array detector for visible to NIR radiation."