Array for High-Speed Imaging
Cecille Etheredge
Computer Architecture for Embedded Systems (CAES)
Faculty of Electrical Engineering, Mathematics and Computer Science University of Twente
July 2016
This thesis investigates the viability of using multiple low-cost components to create something much bigger: an embedded system capable of capturing high-speed video using a scalable and configurable array of sensors. Various hardware and software components are designed, implemented and verified as part of a new embedded system platform that not only achieves high-speed imaging capabilities by using an array of imaging sensors, but also provides a scalable and reusable design for future camera array systems.
First of all, I would like to thank Prof. Marco Bekooij, whose belief in hands-on research and real-world practicality has enabled this research to go forward and develop into this thesis.
Secondly, I would like to thank my family for being an endless source of optimism to explore technology and to venture on research such as this.
A thanks to all colleagues and folks in the CAES group. In particular, I would like to thank Oğuz Meteer, whose countless hours of insights and experience provided the fundamentals without which this thesis would not have been possible, and whose borrowed electronics, at the time of this writing, mostly still have to be returned.
Thanks go out to Stephen Ecob from Silicon On Inspiration in Australia, who provided the much needed hardware building blocks on more than one occasion and whose continued support was essential during the production of the prototype. Thanks also to Alice for sourcing the much needed BGA components in Hong Kong, and to the folks at Proto-Service for providing their field expertise in the final BGA assembly.
Contents

1 Introduction
  1.1 Problem definition
  1.2 Contributions
  1.3 Thesis outline
2 Background
  2.1 Digital image sensors
  2.2 High-speed imaging
  2.3 Image sensor arrays
3 High-level system design
4 Hardware domain implementation
  4.1 Sensor receiver interface
    4.1.1 Physical signaling
    4.1.2 Timing characteristics
    4.1.3 FPGA implementation
  4.2 Streaming DRAM controller
    4.2.1 Dynamic Random-Access Memory (DRAM)
    4.2.2 Access pattern predictability
    4.2.3 DRAM protocol
    4.2.4 Peak transfer rate
    4.2.5 Protocol state machine
    4.2.6 FPGA implementation
    4.2.7 Command arbitration
  4.3 Stream interleaver
    4.3.1 FPGA implementation
  4.4 Embedded control processor
  4.5 Sensor control interface
    4.5.1 I²C protocol overview
    4.5.2 Camera control interface (CCI)
    4.5.3 FPGA implementation
    4.5.4 Phased start
  4.6 Readout interface
    4.6.1 FPGA implementation
  4.7 Clock domain crossing
5 Software domain implementation
  5.1 Embedded control
  5.2 Stream decoder
    5.2.1 Deinterleaving
    5.2.2 Bit slip correction
    5.2.3 Protocol layer decoder
  5.3 Image rectification
    5.3.1 Mathematical models
    5.3.2 Camera intrinsics and extrinsics
    5.3.3 Rectification
    5.3.4 Camera calibration
6 Dataflow analysis
  6.1 Throughput analysis
    6.1.1 Scenario-aware dataflow graph
    6.1.2 Throughput analysis
    6.1.3 Effects on array size
  6.2 Buffer size analysis
    6.2.1 Latency-rate SRDF graph
    6.2.2 Real-world case analysis
7 Realization and results
  7.1 Sensor array configuration
  7.2 Choice of hardware
  7.3 Synthesis results
  7.4 Measurement setup
  7.5 Experimental results
8 Conclusion
  8.1 Future work
I Appendix
A SADF graph listing
B Stroboscope program listing
Bibliography
List of Figures

Figure 1: Block diagram showing a simplified overview of the processes involved in a modern color image sensor, from incoming scene light to final output image.
Figure 2: Example of a Bayer pattern encoded image (left) and the resulting full color image after the debayering process (right).
Figure 3: Readout architectures for conventional CCD and CMOS sensors. Photodiodes and capacitive elements are respectively colored dark green and yellow.
Figure 4: Rolling shutter mechanism in action. Left shows the shutter's direction of movement in terms of pixel rows. Right contains a plot of row exposure over time, clearly showing the sliding window of time.
Figure 5: Simplified view of two active pixels with different shutters in a CMOS sensor at the silicon level. The addition of a memory element causes occlusion of an area of the photodiode.
Figure 6: Talbot's high speed photography experiment. A paper disc is captured at standstill (left), spinning and captured with an exposure time of 10 ms (middle), and with an exposure time of 1 ms (right). Images courtesy of [VM11].
Figure 7: The Stanford Multi-Camera array setup as described in [Wil+05].
Figure 8: Abstract system diagram illustrating the hardware and software domain boundaries, their high-level components and the corresponding dataflows in the system.
Figure 9: Hardware domain diagram showing its various subsystems and components. Orange indicates a component external to the hardware domain, grey is a custom subsystem implemented by FPGA logic, yellow is a logic element and green represents software. Arrows indicate dataflows, with red representing the capture stage, and blue representing the readout stage.
Figure 10: Software domain diagram showing its various subsystems and components, including the embedded control. Orange indicates a component external to the software domain, grey is a custom subsystem implemented in software and yellow is a specific software component. Arrows indicate dataflows, with red representing the primary video dataflow.
Figure 11: Standard parallel CMOS sensor interface. Data lines D0 through DN represent the N-bit encoded pixels of the output image.
Figure 12: LVDS output circuitry: a current-mode driver drives a differential pair line, with current flowing in opposite directions. Green and red denote the current flow for respectively high ("one") and low ("zero") output values.
Figure 13: Simplified fragment of a video data transmission using MIPI CSI-2 (top) and HiSPi (bottom). LP indicates logic high and low using single-ended signaling, while all other signals are LVDS. Green represents image data, all other colors represent control words as defined by the respective protocols.
Figure 14: Typical CMOS sensor layout in terms of physical pixels (left) and corresponding structure of the image frame readout data (right), as found in the Aptina MT9M021 CMOS sensor [Apt12b].
Figure 15: Topology within a rank of DRAM devices. A rank consists of multiple DRAMs, each of which supplies an N-bit data word using row and column addressing.
Figure 16: Common layout of a DDR3 DIMM, populated on both sides with multiple DRAMs. The DIMM is typically structured in such a way that each side represents a single DRAM rank.
Figure 17: Simplified illustration of the pipelined DRAM command timing in case of two BL8 write requests. Green and blue indicate any commands, addresses and data bits belonging to the same request.
Figure 18: Original simplified DRAM protocol state diagram as in [JED08]. Power and calibration states have been omitted.
Figure 19: Redesigned DRAM state diagram optimized for sequential burst writes. Burst-related states are highlighted in blue.
Figure 20: I²C data transmission using 7-bit slave addressing with two data words as described in [Phi03]. Grey blocks are transmissions from master to slave, white are from slave to master and green depends on the type of operation. Bit widths are denoted below.
Figure 21: Example CCI data transmission addressing an 8-bit register using a 16-bit address. Grey blocks are transmissions from master to slave, white are from slave to master and green depends on the type of operation. Bit widths are denoted below.
Figure 22: Phased start with 8 sensors plotted against time. Dark regions represent exposure duration, and light regions represent readout duration for all sensors.
Figure 23: The two different types of synchronizers used in our system. Green and blue represent the two different clock domains, and red acts as stabilizing logic in between.
Figure 24: Simplified fragment of a video data transmission using MIPI CSI-2 (top) and HiSPi (bottom). LP indicates logic high and low using single-ended signaling, while all other signals are LVDS. Green represents image data, all other colors represent control words as defined by the respective protocols.
Figure 25: OpenCV feature detection using a planar checkerboard pattern as mounted to a wall. The visual marker in the center is not actually used in the algorithm.
Figure 26: SADF graph representing the video streaming behaviour in our system and used as a basis for dataflow analysis.
Figure 27: Corresponding FSM for the Dref node in our dataflow model, representing write and refresh states with write as initial state.
Figure 28: Imaginary FSM for the Dref node in our dataflow model, using counters to switch between states and scenarios.
Figure 29: Simplified latency-rate model of our system, with the SRDF graph at the top and the corresponding task graph at the bottom. Dotted lines represent the imaginary flow of data in the system, and are not part of the actual graph.
Figure 30: Stages of development for the camera module PCB, from CAD design to production to pick and placing of SMD and BGA components.
Figure 31: Final prototype hardware, showing the five individual sensor modules (left) and the final setup as evaluated with the modules and FPGA hardware connected (right).
Figure 32: Stroboscope hardware, showing an array of 10 area LED lights driven by 10 MOSFETs and connected to a real-time embedded system.
Figure 33: Timing characteristics of the stroboscope, strobing at a 1000 Hz rate, showing two LED-driving MOSFETs being subsequently triggered at 1 ms with an error of only 0.02%. Measured using a 4 GHz logic analyzer.
Figure 34: Final frames as captured by four sensors of the prototype at 240 Hz after being processed in the software domain, ordered from top-left to bottom-right and showing a seamlessly overlapping capture of the timed stroboscope lights from sensor to sensor. Each subsequent frame is captured by a different sensor, at different subsequent points in time. Stroboscope lights run from bottom left to top right in time, as indicated by the white arrow in the first image.
Figure 35: Sensor timing diagram corresponding to our four sensor prototype at 240 Hz and the frames in Figure 34. Our stroboscope LED timing is shown on top, with each number and color representing one of the 10 LEDs that is turned on at that time for 1 ms. Dark green represents a single frame, or the light integrated by a single sensor in the time domain for a single frame exposure, i.e. the first frame captures light from 4 different strobe LEDs. Light green represents the unavoidable time in which a sensor's shutter is closed and image data is read out to our system. When all frames are combined, they in fact form a continuous integration of physical light with an exposure time and capture rate equal to 4.2 ms or 240 Hz, as witnessed in Figure 34.
Figure 36: Timing characteristics of the I²C exposure start commands as sent to different sensors in case of a hypothetical 1200 Hz (833 µs) capture rate. Measured using a 4 GHz logic analyzer.
1 Introduction
Over the past century, the world has seen a steady increase in technological advancement in the field of digital photography. Now more than ever, consumers rely on the availability of advanced digital cameras in personal devices such as mobile phones, tablets and personal computers to snap high resolution pictures and videos, all the while cinematographers in the field find themselves with an ever increasing variety of very high-end cameras. And in between these two markets, we have witnessed the rise of an entirely new "prosumer" segment, wielding action cameras and 4K camcorders in the lower high-end spectrum to capture semi-professional video.
Some of the most obvious advancements in cameras can be found in the higher resolution imaging capabilities of sensors, as well as in semiconductor production techniques that have seen vast improvements over the years. As a result, sensors have become smaller, more capable, and cheaper to produce, and the cost of including such a sensor of acceptable quality in a new embedded consumer product is relatively low, especially in the lower segment of products. Furthermore, some parts of these sensors have become standardized in the industry, practically turning many imaging sensors and related technology into commercial off-the-shelf components.
However, one area of videography that has seen increasing demand but quite conservative technological innovation is the field of high-speed imaging. With the advent of television and on-line documentary series revolving around the capture of slow motion footage, consumers and prosumers have been voicing their interest in creating slow motion video at home. While professional high-speed cameras have long been available, their prices are far outside the reach of these markets, often costing up to three orders of magnitude more than conventional cameras. The reason for this is quite simple: high-speed imaging puts extreme demands on an imaging sensor and its surrounding platform in terms of bandwidth, noise and timing, as we will see in this thesis, raising the cost price of a single high-speed imaging sensor into the hundreds or thousands of dollars, let alone the required effort and expertise to design the surrounding hardware.
Of course, this does not mean that there is no other way to provide technological innovation in this field. Taking into account that, at the lower end, imaging sensors are becoming more standardized and cheaper, the question arises as to whether it is now possible to use multiple low-end sensors to achieve the same as a single high-end sensor. This trend of combining commodity-class or low-end components is already being applied by big players, such as those in the online and cloud computing industries, and can prove to be cost-effective if the components can be properly combined by means of a surrounding platform of hardware and software that scales in numbers.
This thesis focuses on this very idea of using multiple low-cost components to create something much bigger: an embedded system platform that not only achieves high-speed imaging capabilities by using an array of imaging sensors, but also provides a scalable and reusable design for future systems with increasingly larger configurations. Finally, a small-scale prototype based on this platform is produced and evaluated to assess the real-world viability of such a product.
1.1 Problem definition
In order to investigate, design, implement and ultimately realize such an embedded system platform, we set out the following objectives for this thesis:
1. Investigate the viability of using a sensor array for high-speed imaging applications:
   a) Determine the trade-offs of using multiple sensors versus a single sensor;
   b) Identify the negative side-effects, and how to mitigate their effect;
2. Research and design an embedded system platform that can be used to realize a scalable array of image sensors:
   a) Determine relevant hardware and software domains and design subsystems that fit within these domains;
   b) Identify any respective bottlenecks in these subsystems and how to mitigate their effect on an implementation;
3. Realize an embedded system using this platform, capable of high-speed image capture using a sensor array:
   a) Develop a capable hardware design implementing the embedded system platform;
   b) Implement the software domain solutions and integrate these with the hardware design;
   c) Design and implement a hardware setup that can be used to verify the high-speed imaging capabilities of the system;
   d) Verify the high-speed imaging capabilities of the system in a real-world setup.
Based on these objectives, we define our research question to be the following:
Is it viable to use an array of image sensors for high-speed imaging in an embedded form factor, and if so, which hardware and software domain components would be required to implement such an embedded system?
A solution is considered to be viable if the following theoretical evaluation criteria are met:
• Capable of interfacing with at least 16 image sensors at a capture rate of 60 Hz, leading to an effective total capture rate of 960 Hz.
• Capable of issuing phased start commands to image sensors with a timing accuracy of at least 99%, as further explained in Chapter 4.
These criteria are especially important as they allow the production of a high-speed video using the system, as we will see in this thesis.
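The two criteria above can be illustrated with a short calculation. The sketch below (a hypothetical helper, not part of the thesis platform) derives the per-sensor start offsets for a phased array: with N sensors each running at a base rate f, the effective capture rate is N × f, so each sensor must start 1/(N × f) seconds after the previous one.

```python
def phased_start_offsets(n_sensors: int, base_rate_hz: float) -> list:
    """Return per-sensor start offsets in seconds.

    The effective capture rate is n_sensors * base_rate_hz, so each
    sensor starts 1 / effective_rate seconds after the previous one.
    """
    effective_rate = n_sensors * base_rate_hz
    period = 1.0 / effective_rate
    return [i * period for i in range(n_sensors)]

# The evaluation criterion above: 16 sensors at 60 Hz each.
offsets = phased_start_offsets(16, 60.0)
print(len(offsets))                  # 16 start commands to issue
print(round(offsets[1] * 1e3, 4))    # ~1.0417 ms between starts (1/960 s)
```

At this spacing, a 99% timing accuracy corresponds to roughly 10 µs of tolerable error per start command, which motivates the timing analysis of the sensor control interface in Chapter 4.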
1.2 Contributions
This thesis makes a major contribution to the field of sensor array research by designing a novel embedded system platform that is configurable, scalable and complete. This platform covers all aspects in hardware and software necessary to enable high-speed image capture using an array of image sensors. Currently, no other documented solutions are known that describe such an embedded system.
As part of this design, this thesis introduces a number of hardware subsystems that allow for interfacing with a varying number of image sensors and capturing their corresponding data. This includes a novel and specialized streaming DRAM controller specifically targeted at writing streamed data, as well as a custom interleaver that combines multiple streams of sensor image data into single coherent DRAM write commands.
Building upon these hardware subsystems, this thesis also presents corresponding software subsystems that interface with this hardware in order to properly transform the captured image data into high-speed video. This includes a novel HiSPi sensor protocol decoder, an embedded control system used to configure the sensor array, and a video streaming pipeline that includes a custom deinterleaver as well as options for image rectification and camera calibration.
This platform effectively establishes a significant scale reduction relative to all prior documented efforts, into an embedded and potentially handheld form factor. This kind of hands-on description has not been seen before in this field of research.
The platform is further solidified by the actual production of a prototype, showing real-world viability and presenting future opportunities for an even finer product. This prototype exemplifies the actual practical use of the sensor array as an end product, which is often overlooked in existing research.
This thesis does not contribute any new ideas or algorithms in the field of image rectification or of distortion caused by sensor arrays, although efforts are made to include current state-of-the-art and proven imaging techniques in the software domain.
1.3 Thesis outline
This introductory chapter has so far introduced the reader to the scope and outline of this thesis, as well as the problem definition and research question.
Chapter 2 provides background information on the various topics that are fundamental to this thesis. These topics include the use and technical breakdown of digital imaging sensors, high-speed imaging and camera arrays. Insight is also provided into prior efforts in the field of high-speed imaging using camera arrays, to give perspective on how this thesis contributes to this field.
Chapter 3 describes the top-level design of the system as presented in this thesis. It provides an overview of the system in terms of the various subsystems that are to be implemented in one of the two associated domains of hardware and software.
Chapter 4 deals with the relevant subsystems in the hardware domain, making it possible to interface with a configurable number of sensors. This includes solutions for clock domain crossing, as well as an elaborate description of the design of the sensor receiver interface, memory controller, interleaver, embedded processor, sensor control interface and readout interface.
Chapter 5 presents the subsystems in the software domain that deal with transforming the raw data from the hardware domain into useful video data. This chapter describes the sensor protocol decoder, as well as solutions for image rectification and camera calibration.
Chapter 6 is concerned with the theoretical dataflow analysis, in which dataflow models are presented that model the primary bottlenecks in the system; analysis is performed to determine the throughput of the system and the sizes of any involved buffers, while discussing the viability of the system.
Chapter 7 describes the experimental results of this thesis. This
chapter provides insight into the real-world hardware prototype that
was produced, implementing the presented platform. A setup is used
to verify the produced video of this prototype, and timing analysis
is done to verify the timing accuracy of the hardware, and thus the
real-world viability of the platform as a whole.
Finally, Chapter 8 concludes this thesis by providing a brief summary and final thoughts on future work.
2 Background
Today, our world is filled with embedded systems with wildly varying applications, often relying on the use of techniques for digital image capture and processing. The decreasing cost of digital imaging allows for more and more imaging systems to appear in different markets, from consumer photography (e.g. digital cameras and camcorders), to industrial or military (e.g. product verification, surveillance and biometrics) and far beyond.
2.1 Digital image sensors
These digital imaging systems, such as the one presented in this thesis, rely on the use of image sensors. And although the basic concept of an image sensor seems simple, i.e. to convert incident light into a digital representation, the actual implementation of an image sensor varies considerably depending on the technology being used.
Figure 1 gives a simple overview of a modern color image sensor. First of all, incident light from the scene is focused by optics, typically one or more lenses. This focusing, like in most imaging systems or cameras, is necessary to ensure that a certain view of the scene is focused onto a capturing plane behind it. The light then passes through a color filter array (CFA) that contains a predefined pattern of colors, and is then captured and converted into the analog signal domain by photodetectors directly behind the filter. The analog signals are ultimately processed by an on-chip analog signal processor and converted into the digital signal domain, after which a variety of processing techniques can be used to produce a final image.
Figure 1: Block diagram showing a simplified overview of the processes involved in a modern color image sensor, from incoming scene light to final output image. (Pipeline stages: scene, imaging optics, filter and pixel array, analog-to-digital conversion, post-processing.)

Full color imaging. The photodetectors or photodiodes themselves are semiconductors sensitive to charged particles and photons.
Figure 2: Example of a Bayer pattern encoded image (left) and the resulting full color image after the debayering process (right).
These work by absorbing particles and photons and emitting a voltage proportional to the incident power, as described in [Big+06], and are thus oblivious to color, as they only describe a relation between the amount of light and an analog signal. By placing a filter in front that only passes light in a certain color spectrum range, they can be utilized to only detect the amount of light of a certain color.
In practice, the color filter array that is placed in front of the photodetectors often consists of a so-called Bayer pattern that only allows a single color to pass through, making each single photodetector pixel sensitive to a predefined color. Each of the pixels in the sensor's output image only encodes the intensity of a single specific color such as red, green or blue. To produce a final full color image, a spatial interpolation process known as demosaicing is used, which interpolates multiple single-color pixel values to produce single full color pixels containing respective values for red, green and blue. Demosaicing is an active topic of research, and different interpolation techniques currently exist, as shown in [LGZ08]. An example of a Bayer pattern image and the resulting full color image can be seen in Figure 2.
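As a minimal illustration of demosaicing, the sketch below implements naive bilinear interpolation over an RGGB Bayer mosaic using NumPy. This is a toy example, not one of the state-of-the-art techniques surveyed in [LGZ08], and the function names are assumptions made for illustration.

```python
import numpy as np

def _box3(a: np.ndarray) -> np.ndarray:
    """Sum of each 3x3 neighbourhood (zero-padded), same shape as `a`."""
    p = np.pad(a, 1)  # pads with zeros by default
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def demosaic_bilinear(bayer: np.ndarray) -> np.ndarray:
    """Naive bilinear demosaic of an RGGB Bayer mosaic into H x W x 3 RGB."""
    h, w = bayer.shape
    masks = np.zeros((h, w, 3))
    masks[0::2, 0::2, 0] = 1    # red samples
    masks[0::2, 1::2, 1] = 1    # green samples on red rows
    masks[1::2, 0::2, 1] = 1    # green samples on blue rows
    masks[1::2, 1::2, 2] = 1    # blue samples
    rgb = np.zeros((h, w, 3))
    for c in range(3):
        known = bayer * masks[..., c]   # keep only this channel's samples
        # Average the known samples in each 3x3 neighbourhood.
        rgb[..., c] = _box3(known) / np.maximum(_box3(masks[..., c]), 1)
    return rgb
```

Each output pixel averages the nearby mosaic samples of each color channel; real demosaicing algorithms additionally exploit inter-channel correlation and edge direction to avoid the color fringing this naive approach produces.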
CMOS and CCD sensor architecture. As described in [EGE05], the array of photodetectors introduced before varies significantly between different types of image sensors. Many of the differences arise from the way the array is structured and read out into the analog signal domain, also referred to as the readout architecture; these differences can be seen in Figure 3. Modern image sensors generally come in two types, complementary metal-oxide semiconductor (CMOS) and charge-coupled device (CCD), and although CCD-type sensors have traditionally been associated with high quality but high cost, recent advancements in CMOS technology have allowed for the introduction of lower cost CMOS-type sensors approaching the quality of their CCD-type counterparts.
Figure 3: Readout architectures for conventional CCD and CMOS sensors. (a) CCD (interline transfer), with horizontal and vertical CCDs. (b) CMOS (active pixel sensor), with row select logic, analog signal processing, column analog-to-digital conversion, a column multiplexer, and timing and control. Photodiodes and capacitive elements are respectively colored dark green and yellow.

In a modern CCD-type sensor, the array is typically built out of photosensitive capacitors that passively collect and store a charge for
each pixel during exposure. During readout, charge is shifted out of the array, step by step, into horizontal and vertical CCDs and converted to the analog signal domain by means of an amplifier circuit, after which it is sampled and converted by the analog signal processor.
In a modern CMOS-type sensor, the array is typically built out of active pixel circuits containing a photodetector and amplifier that produce an analog signal for each pixel while being exposed. During readout, the analog signals of each row of the array are selected, sampled and integrated, one by one, before being converted by the analog signal processor.
The cost-effectiveness of CMOS sensors arises from differences in the manufacturing process, also described in [Fos97], where integration of a significant amount of on-chip VLSI electronics is possible at lower cost. This makes it possible to create single-chip CMOS sensor solutions that contain very elaborate analog signal processors, as opposed to the expensive off-chip analog signal processing typically required for CCD sensors.
On the other hand, CMOS sensors suffer from a high level of fixed pattern noise due to variations in the manufacturing process, as well as from other forms of noise, such as dark current noise, that reduce the effective signal-to-noise ratio (SNR) of the sensor.
In practice, CCD sensors are often seen in highly specialized markets such as astronomy and microscopy, while CMOS sensors can be found in a much wider range of applications, including mobile and consumer electronics, due to their low cost and good quality.
Figure 4: Rolling shutter mechanism in action. Left shows the shutter's direction of movement in terms of pixel rows. Right contains a plot of row exposure over time, clearly showing the sliding window of time.
Rolling and global shutters. Like conventional film cameras, image sensors also require the use of a shutter that controls the exposure of the photodetectors. As mentioned before, each sensor generally has two steps of action: exposure and readout. Both before and after exposure, this shutter prevents any incident light from having an effect on the photodetectors, such that the photodetectors are only exposed to light for a predetermined amount of time.
Differences in the sensor's shutter mechanism can, in some cases, have a profound effect on the final output image of the sensor. As opposed to film cameras, sensor shutters are often electronic and operate by resetting or discharging the individual pixel circuits of the array, but the shutter timing changes significantly depending on the type of sensor used.
Note that the CCD sensor stores a charge for each individual pixel during and after exposure, essentially capturing and storing a complete image inside the sensor until it is read out of the array. This type of shutter mechanism is referred to as a global shutter and in a sense resembles that of a film camera. This mechanism differs greatly from that of the conventional CMOS sensor. Due to the way CMOS sensors are manufactured, only a few rows can be selected, exposed, sampled and integrated in the entire sensor at a time. This principle is known as a rolling shutter mechanism, and is shown in Figure 4.
The direct result of the rolling shutter is that different rows of sensor pixels are exposed at different points in time. In other words, not all pixels in a final output image will have been captured in the same window of time. While this shift in time simply does not matter for scenes in which there is little movement, significant distortion artifacts will manifest themselves in the image as soon as fast moving objects are captured.
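The row-by-row timing can be made concrete with a small sketch. The numbers below (line time, exposure) are hypothetical and not tied to any particular sensor; the point is that the exposure window of the last row is skewed relative to the first by (rows − 1) × line time, which is exactly the sliding window of Figure 4.

```python
# Under a rolling shutter, row r starts its exposure one line time
# after row r-1, so each row captures a slightly different time window.

def row_exposure_window(row: int, t_row_us: float, t_exp_us: float):
    """Return (start, end) of a row's exposure window in microseconds."""
    start = row * t_row_us          # shutter reset reaches this row
    return (start, start + t_exp_us)

# Example: 720 rows, 20 us line time, 5 ms exposure (hypothetical values).
first = row_exposure_window(0, 20.0, 5000.0)
last = row_exposure_window(719, 20.0, 5000.0)
skew_ms = (last[0] - first[0]) / 1000.0
print(skew_ms)   # 14.38 ms of top-to-bottom skew: the "sliding window"
```

Any object that moves appreciably within this skew appears sheared in the output image, which is the distortion artifact described above.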
The introduction of an effective global shutter to CMOS sensors is one of the most sought-after features in the industry, and is an active research topic in digital imaging. Examples of global shutter implementations are [Apt12a] and [Fur+07], in which the active circuitry of each pixel is augmented with a sample-and-hold or memory element that stores the analog signal until all rows of the entire array have been read out. Although the CMOS readout architecture still requires a row-by-row readout, the individual pixel memory elements now allow the entire array to be exposed at the same time.

Figure 5: Simplified view of two active pixels with different shutters in a CMOS sensor at the silicon level. (a) Rolling shutter pixel: photodiode between metal layers. (b) Global shutter pixel: photodiode, metal layers and a memory element. The addition of a memory element causes occlusion of an area of the photodiode.
The difficulty of adding memory elements to the array lies in the fact that the memory element itself introduces varying degrees of signal contamination. In Figure 5, it can be seen that the element itself takes up physical space in the pixel array. As such, incident light can never be fully focused on the photodiode due to optical imperfections, and some may fall on the element as well. This stray light contaminates the analog signal stored in the element, and must be avoided by covering up the element with shielding. Furthermore, as the analog signal is stored in the element, unwanted dark current may accumulate during storage, requiring careful design of the underlying silicon to decrease the negative effect on the signal-to-noise ratio.
2.2 high-speed imaging
Paramount to this thesis, high-speed imaging is best described as the still-frame capture of fast moving objects, which, in the context of video, also implies an equivalently high capture frame rate. It is used to analyze real-world events that are difficult to capture with conventional digital cameras, such as automotive crashes, ballistics, golf swings, and explosions.
High-speed photography itself has a long history, which started well before the practical invention of video. The first documented experiments were done in the 19th century, at a time when film camera shutters were still crude and slow mechanisms, so freezing the motion of fast moving objects was difficult. As described in [Ram08], these early experiments featured a setup with a paper disc spinning so fast that the text on it was unreadable to the human eye. However, by using an arc flash with a duration of 1/2000 of a second, still images with sharp, readable text could be reliably produced.
Figure 6: Talbot's high-speed photography experiment. A paper disc is captured at standstill (left), spinning and captured with an exposure time of 10 ms (middle), and with an exposure time of 1 ms (right). Images courtesy of [VM11].
Figure 6 shows a modern version of the spinning disc. In order to capture sharp still images, the exposure of the scene's light to the camera must be minimized: either by opening and closing the shutter very quickly, or by using a single light source that flashes for a fraction of a second. If the exposure is too long, fast moving objects in the image will appear fuzzy and blurred due to the accumulation of light at different positions along the object's path of movement.
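The relation between exposure time and blur can be quantified: the smear extent is roughly the object's image-plane speed multiplied by the exposure time. A sketch with assumed, illustrative numbers:

```python
# Motion blur sketch: an object moving across the image plane smears
# over (pixels per second) * (exposure time) pixels. All values here
# are assumptions for illustration.

def blur_pixels(object_speed_px_s: float, exposure_s: float) -> float:
    """Approximate blur extent in pixels for a given exposure."""
    return object_speed_px_s * exposure_s

# An edge on a fast spinning disc sweeping 10,000 px/s:
print(blur_pixels(10_000, 10e-3))   # 100.0 px: heavily smeared
print(blur_pixels(10_000, 1 / 2000))  # 5.0 px: nearly sharp
```

This is why shortening either the shutter time or the flash duration to a fraction of a millisecond is the essential ingredient of every high-speed setup.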
A multitude of high-speed imaging cameras are available today, including not only high-end cameras for industrial and scientific use, but also low-end consumer cameras, spurred by recent consumer and professional interest in high-speed photography. Examples include the fps1000 and Edgertronic projects, both of which have been successfully funded and delivered through crowdfunding platforms such as Kickstarter, and which feature maximum high-speed capture rates of 550 Hz and 701 Hz respectively at 720p resolution, as in [Per15]. In comparison, high-end cameras such as the Photron Fastcam SA series are capable of capture rates up to 7000 Hz at a 1024x1024 resolution with shutter times in the microseconds, and are often the first choice in scientific research dealing with physical and chemical phenomena, as in [Gül+12] and [BS14]. Nevertheless, these high-end cameras come at a high price and are often only available on a daily rental basis, placing them out of reach of the consumer and "prosumer" markets. Despite the relatively low-end specifications of products such as the fps1000 and Edgertronic, price tags of $1500 and $5495 respectively, together with their successful crowdfunding campaigns, show that there may be an untapped market demand beneath the existing high-end products.
Figure 7: The Stanford Multi-Camera array setup as described in [Wil+05].
2.3 image sensor arrays
One aspect that virtually all high-speed imaging products have in common today is that they are designed around a single high-speed CMOS imaging sensor. Depending on the target market and cost price of the final product, a design choice is made from a variety of high-speed imaging sensors with different low- or high-end performance characteristics and matching cost prices, such as the CMOSIS series ([WM15]) and the Sony IMX or Exmor series ([Son11]). These sensors are often at the forefront of CMOS technology, incorporating cutting-edge features such as back-illuminated sensors, novel global shutters and capacitive storage elements to mitigate the sensitivity and bandwidth issues that come into play when imaging at high speed with short shutter times. This demands a high sensor unit cost price, which in turn dictates the minimum cost price of these camera products.
Keeping in mind that conventional high-speed camera products are thus still dependent on one or more relatively high-cost components, we investigate the possibility of an alternative design that removes this dependency: a design that does not use a single high-cost sensor, but rather multiple low-cost sensors, configured in an array, to achieve the same high-speed imaging capabilities.
One of the earliest projects involving the use of multiple cameras in a similar field of research is [Wil+01] at Stanford University. In this work, light field data is captured using an array of up to 64 custom camera boards connected by an IEEE1394 bus to one or more video processing host PCs. The camera boards are custom-designed single-board computers containing a microprocessor, IEEE1394 chipset, MPEG2 video encoder, Xilinx Spartan FPGA and Omnivision CMOS image sensor. A continuation of this work is described in [Wil+04], in which an identical hardware setup, also seen in Figure 7, is used for high-speed imaging; it is practically the first documented research involving a camera array for high-speed imaging. The camera array consists of 52 Omnivision sensors, each capturing at a rate of 30 Hz, providing a video stream that is further processed by two or more host PCs. The research itself is mostly concerned with the side effects of using a camera array for high-speed imaging, and therefore provides fundamental insights into correcting geometric distortion, rolling shutter distortion and camera timing, which we will also cover in this thesis.
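The core idea behind such an array is to stagger the sensors' trigger times so that their individual 30 Hz streams interleave into one much faster virtual stream. The sketch below schedules uniformly spaced trigger offsets; the sensor count and base rate follow the Stanford setup described above, but the uniform-offset scheme itself is an assumption of this illustration:

```python
# Sketch of the staggered-trigger idea behind a high-speed camera
# array: n sensors, each capturing at base_rate Hz with triggers
# offset by 1/(n * base_rate) seconds, interleave into an
# n-times-faster virtual stream.

def trigger_offsets(n_sensors: int, base_rate_hz: float):
    """Uniformly spaced trigger start times for each sensor, in seconds."""
    period = 1.0 / (n_sensors * base_rate_hz)
    return [i * period for i in range(n_sensors)]

offsets = trigger_offsets(n_sensors=52, base_rate_hz=30.0)
effective_rate = 52 * 30.0
print(f"effective rate: {effective_rate:.0f} Hz")     # 1560 Hz
print(f"trigger spacing: {offsets[1] * 1e6:.1f} us")  # 641.0 us
```

The interleaved output must still be corrected for the geometric and rolling-shutter differences between the sensors, which is exactly the problem the Stanford research addresses.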
Further in-depth research on this particular project at Stanford is described in [Wil05]. Here, a number of possible applications for camera arrays are researched, including synthetic aperture photography (SAP), spatiotemporal view interpolation (SVI) and high-speed imaging. While SAP and SVI are useful in the sense that they provide image aperture manipulation and improved image quality, we focus on the high-speed imaging research, as it closely fits the scope of this thesis. Finally, the Stanford setup is further refined in terms of hardware and system architecture in [Wil+05], and its image quality is evaluated by comparison with a conventional digital camera, showing encouraging results.
3 high-level system design
The purpose of the embedded system platform presented in this thesis is to interface with an array of image sensors in order to allow high-speed imaging. The platform, or system, is composed of a number of novel custom hardware and software components that implement the functionality required to ultimately produce a video stream containing high-speed images.
In this chapter, an overview of the system as a whole is presented at increasing levels of detail. We begin by establishing the different domains that make up the platform:
• Hardware domain. This domain contains all components implemented in hardware, or more specifically, on a reconfigurable FPGA platform. The use of an FPGA allows for rapid design and prototyping, and is thus fundamental to the platform.
• Software domain. This domain contains all components implemented in software. These components represent the programs that run on CPU-based architectures either embedded in or connected to the hardware domain.
In Figure 8, a system diagram describes these domains and the components involved at a high, abstract level. The diagram clearly illustrates the primary dataflow: it begins in one or more image sensors, flows through the FPGA and RAM in the hardware domain, and ultimately ends up at the host in the software domain. A number of secondary dataflows also exist to ensure that the primary dataflow is controlled. Furthermore, while the separation between the hardware and software domains is generally clear, the software domain also covers a small amount of software used within the hardware domain, namely that of the embedded control processor, as we will see later on in this thesis.
Figure 8: Abstract system diagram illustrating the hardware and software domain boundaries, their high-level components and the corresponding dataflows in the system.
Figure 9: Hardware domain diagram showing its various subsystems and components. Orange indicates a component external to the hardware domain, grey is a custom subsystem implemented in FPGA logic, yellow is a logic element and green represents software. Arrows indicate dataflows, with red representing the capture stage and blue representing the readout stage.
hardware domain    The hardware domain represents the bulk of the components in the system, and for good reason. At the highest level, the hardware performs two dataflow stages: first, it captures as much raw data as possible from all image sensors; second, it reads out and transmits all of this captured data to the software domain for further processing. Note that these two stages do not have to occur at the same time, due to the nature of video capturing, where data is generally first captured to a storage medium and then offloaded to another system for further use.
The capture stage is assumed to be a unique and critical stage that cannot be interrupted or paused, as this may lead to data corruption, making it hard real-time by definition as per [But11]. The hardware domain must therefore guarantee that no overflows ever occur in the system, in order to avoid fatal data corruption. In contrast, the readout stage occurs after the capture stage has completed and can proceed at any speed. Since it is fully off-line and non-critical, this thesis imposes no timing constraints on this stage.
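Whether the no-overflow guarantee can hold ultimately reduces to a bandwidth inequality: the aggregate sensor data rate must stay below the sustained memory write bandwidth. A back-of-envelope sketch (all figures are assumptions for illustration, not specifications of this platform):

```python
# Back-of-envelope overflow check for the hard real-time capture
# stage: the aggregate sensor data rate must not exceed the
# sustained DRAM write bandwidth. All numbers below are assumed.

def capture_feasible(n_sensors, width, height, bits_per_px,
                     frame_rate_hz, dram_bw_bytes_s):
    """Return (aggregate byte rate, whether it fits in the DRAM bandwidth)."""
    rate = n_sensors * width * height * bits_per_px / 8 * frame_rate_hz
    return rate, rate <= dram_bw_bytes_s

rate, ok = capture_feasible(n_sensors=8, width=640, height=480,
                            bits_per_px=10, frame_rate_hz=60,
                            dram_bw_bytes_s=1.6e9)
print(f"{rate / 1e6:.1f} MB/s, fits: {ok}")  # 184.3 MB/s, fits: True
```

In practice, the check must use worst-case rates and account for DRAM refresh and arbitration overhead, since a single overflow during capture is fatal by the definition above.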
All components involved in the hardware domain can be seen in Figure 9. As mentioned before, the components or subsystems in this domain are implemented as FPGA logic using VHDL or Verilog together with vendor-specific primitives. All components are custom implementations especially designed to provide critical functionality for this platform. In Chapter 4, each of these subsystems is described at a functional level. Note that implementation-specific details such as VHDL or Verilog code fall outside the scope of this thesis.
Next to the FPGA logic (grey components), Figure 9 also illustrates a number of components (in orange) that are external to the domain but nevertheless critical to the functionality of the system. Interfaces to these components are provided by the FPGA logic inside the hardware domain, enabling the primary data to flow from the image sensors to the software domain.
Figure 10: Software domain diagram showing its various subsystems and components, including the embedded control. Orange indicates a component external to the software domain, grey is a custom subsystem implemented in software and yellow is a specific software component. Arrows indicate dataflows, with red representing the primary video dataflow.
software domain    The software domain largely starts where the hardware domain ends, implementing the components needed to produce a video stream or file containing the captured high-speed images. It is also concerned with command and control, setting up the dataflow at various points in the system, as well as providing auxiliary functionality such as DRAM self-testing.
Figure 10 shows the components and subsystems that make up the software domain. Here, all subsystems except those within the dotted lines represent new custom pieces of software that have been explicitly designed for this platform. Note that the diagram also shows a number of components inside the hardware domain. As mentioned before, the software domain also covers the software specially written for the control processor that is embedded inside the hardware domain. All subsystems except those in this embedded part run on a host computer and are connected to each other through a video stream pipeline software architecture known as GStreamer. Further details are provided in Chapter 5.
4 hardware domain implementation
In the previous chapter, an overview of the system was provided that explained the various domains involved in the design. This chapter focuses on the details of the hardware domain, as illustrated in Figure 9. Recall that each of the subsystems is a custom design specifically built to provide functionality for our system. Each section in this chapter describes the background, design choices and functional implementation of one of these subsystems.
4.1 sensor receiver interface
The very first input of the entire system, and thus of the hardware domain, is the sensor array, consisting of a varying number of image sensors. Each of these sensors is physically connected to the FPGA that implements our hardware domain. We therefore require a subsystem in the hardware domain that implements an interface capable of capturing all raw data as physically received by the FPGA.
This section introduces a sensor receiver subsystem that interfaces with a single image sensor. The subsystem is designed such that it can be instantiated multiple times, once for each individual sensor, so that all raw data coming in from the entire sensor array can be captured.
We first provide some background on the physical signaling standards relevant to the receiver, after which the relevant timing parameters are introduced. Finally, the FPGA logic elements necessary for the functionality of the receiver are described.
4.1.1 Physical signaling
Conventional CMOS image sensors have traditionally been equipped with a parallel interface that uses simple single-ended signaling to encode all the necessary video data. This parallel interface, shown in Figure 11, contains a number of parallel data lines with a corresponding clock and synchronization signals. Typical configurations are 8 or 12 data lines, matching the sensor's pixel bit resolution, where each individual line represents a single bit of the image pixel; full pixels are thus transmitted on every rising edge of the corresponding pixel clock. The advantage of the parallel interface is obvious: it allows for very simple interfacing with low-complexity receivers.
Figure 11: Standard parallel CMOS sensor interface. Data lines D0 through DN represent the N-bit encoded pixels of the output image.
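The receiver logic for such a parallel interface reduces to sampling the data lines on each rising pixel-clock edge while the synchronization signals indicate active video. A behavioral sketch, with signal encodings simplified to illustrative assumptions (active-high VSYNC/HSYNC, one sample per list entry):

```python
# Sketch of how a receiver samples the parallel interface of
# Figure 11: on every rising PCLK edge while VSYNC and HSYNC mark
# an active line, the N data bits together form one pixel.

def sample_parallel(pclk, vsync, hsync, data):
    """Collect pixel values on rising clock edges during active video."""
    pixels, prev = [], 0
    for clk, v, h, d in zip(pclk, vsync, hsync, data):
        if clk == 1 and prev == 0 and v == 1 and h == 1:
            pixels.append(d)
        prev = clk
    return pixels

pclk  = [0, 1, 0, 1, 0, 1, 0, 1]
vsync = [1] * 8
hsync = [1, 1, 1, 1, 0, 0, 1, 1]  # blanking in the middle
data  = [7, 7, 9, 9, 0, 0, 4, 4]
print(sample_parallel(pclk, vsync, hsync, data))  # [7, 9, 4]
```

The equivalent FPGA logic is a handful of flip-flops and an edge detector, which is precisely why this interface requires such low-complexity receivers.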
Increasing sensor resolution, bandwidth and signal integrity requirements have, however, led to the adoption of serial interfaces such as Low Voltage Differential Signaling (LVDS). A typical differential interface circuit can be seen in Figure 12: a current-mode driver, two equal transmission lines with opposed polarity and equal impedance, and a comparator at the receiving end.
Although serial transmitters require a higher clock frequency, they allow for a significant improvement in the signal integrity of the overall system due to a number of critical factors:
• Common-mode noise. With differential signaling, common-mode noise is injected on both transmission lines and is rejected at the receiving end, as only the differential value is sampled.
• Switching noise. The design of differential current-mode drivers reduces (simultaneous) switching noise and ringing in the transmission lines as well as in the overall system.
• Crosstalk and electromagnetic interference. Differential signaling radiates less noise into the system than single-ended signaling, since the magnetic fields of the two transmission lines cancel out.
Because differential signaling reduces these noise concerns, lower voltage swings can be used on the transmission lines, typically 350 mV for LVDS and 150 mV for SubLVDS. Lowering the voltage in turn reduces overall power consumption and allows higher data rates, as the signal can be switched more quickly. Furthermore, the use of current-mode drivers results in an almost flat power consumption across frequency, so the frequency can be increased without an otherwise exponential increase in power consumption [Gra04].
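The common-mode rejection described above can be illustrated directly: the receiving comparator only sees the difference between the two line voltages, so noise added equally to both lines cancels. A sketch using typical LVDS levels (the exact common-mode and noise values are assumptions for illustration):

```python
# Differential signaling sketch: the receiver compares the two line
# voltages, so noise added equally to both lines (common-mode noise)
# does not affect the recovered bit.

def lvds_receive(v_p: float, v_n: float) -> int:
    """Comparator at the receiving end: 1 if V+ > V-, else 0."""
    return 1 if v_p > v_n else 0

v_cm, v_swing = 1.2, 0.350  # assumed ~1.2 V common mode, 350 mV swing
noise = 0.4                 # common-mode noise injected on both lines

# Transmitting a "one": V+ above and V- below the common-mode voltage
v_p = v_cm + v_swing / 2 + noise
v_n = v_cm - v_swing / 2 + noise
print(lvds_receive(v_p, v_n))  # 1: the noise is rejected
```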
The resulting differential serial interface as implemented in an image sensor typically replaces all single-ended signals of the parallel interface with just a differential clock line and one or more differential data "lanes" onto which all relevant data is serialized. The number of required I/O pins per sensor is thus minimized, allowing less complex board designs. The control and data signals of the parallel interface shown in Figure 11 are instead encoded as a continuous serial stream of bits, in which specific leading and trailing bit patterns distinguish between different parts of the data, as will be explained in Section 4.1.1.
Figure 12: LVDS output circuitry: a current-mode driver drives a differential pair line, with current flowing in opposite directions. Green and red denote the current flow for high ("one") and low ("zero") output values respectively.
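The serialization onto data lanes can be sketched behaviorally: bytes are distributed across the lanes, and each lane shifts its share out bit by bit. The lane count and LSB-first bit order below are assumptions for illustration; real sensors fix these details in their datasheets:

```python
# Sketch of serializing pixel bytes onto multiple differential data
# lanes: bytes are distributed round-robin across the lanes, each of
# which shifts its bytes out one bit at a time (LSB first, assumed).

def serialize(pixels: list, n_lanes: int) -> list:
    """Return per-lane bit streams for a sequence of pixel bytes."""
    lanes = [[] for _ in range(n_lanes)]
    for i, byte in enumerate(pixels):
        lane = lanes[i % n_lanes]
        lane.extend((byte >> bit) & 1 for bit in range(8))
    return lanes

streams = serialize([0x01, 0x80], n_lanes=2)
print(streams[0])  # [1, 0, 0, 0, 0, 0, 0, 0]  -> 0x01
print(streams[1])  # [0, 0, 0, 0, 0, 0, 0, 1]  -> 0x80
```

The receiver performs the inverse operation, which is why it needs the leading and trailing bit patterns mentioned above to find byte and frame boundaries in the continuous stream.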
There are many varieties of LVDS on the market today, each with characteristics suited to certain application- or vendor-specific solutions. LVDS as defined in the original TIA/EIA-644 standard is generally considered the most common and original specification from which many of these implementations are derived. Varieties include BLVDS (Bus LVDS), M-LVDS (Multipoint LVDS), SCI (Scalable Coherent Interface) and SLVS (Scalable Low-Voltage Signaling); they mainly differ in electrical characteristics such as voltage swing, common-mode voltage, ground reference and output current [Gol11].
Note that this thesis focuses solely on the use of LVDS-capable sensors in its broadest sense, where the use of the term LVDS refers to LVDS and any of its many derivative transmission standards.
industry-standard interfaces    Increased market demand for sensors with higher image resolution, greater color depth and faster frame rates means that current processor-to-camera sensor interfaces are being pushed to their limits. Bandwidth is not the only important design factor for these interfaces: robustness, cost-effectiveness and scalability also carry significant weight, especially for mobile devices [MIP16a].
As such, the MIPI (Mobile Industry Processor Interface) Alliance was established to develop a new industry-standard interface, and has been rapidly replacing conventional parallel interfaces with its CSI, CSI-2 and CSI-3 (Camera Serial Interface) specifications. These CSI specifications define a physical communication layer and a protocol layer capable of transporting arbitrary sensor data to the receiver [Lim+10].
Figure 13: Simplified fragment of a video data transmission using MIPI CSI-2 (top) and HiSPi (bottom). LP indicates logic high and low using single-ended signaling, while all other signals are LVDS. Green represents image data; all other colors represent control words as defined by the respective protocols.
The MIPI Alliance promotes its CSI specifications as open standards, though one should note that all CSI specifications are confidential. The actual degree of openness and the risk of infringement for these specifications is therefore unclear.1 It is possible that this policy has indirectly resulted in the introduction of HiSPi, an alternative interface standard that is otherwise nearly identical to MIPI CSI-2, designed by Aptina Imaging for use in its own line of LVDS-capable sensors [Sem15a]. As CSI-2 (and thus HiSPi) is currently the most commonly used standard for the class of sensors most relevant to this thesis, we will not further elaborate on CSI, CSI-3 or any other recently introduced standards. Instead, we briefly describe the physical layers of CSI-2 and HiSPi as far as the sensor receiver interface is concerned. In Section 5.2, we further describe the protocol layer of these standards.
The CSI-2 and HiSPi specifications define a physical layer that specifies the transmission medium, the electrical circuitry and the clocking logic used to accommodate a serial bit stream. In the case of CSI-2, this physical layer is referred to as the D-PHY and is composed of between one and four (generally) unidirectional data lanes and one clock lane, capable of transmission in one of two modes known as Low Power (LP) and High-Speed (HS) mode.
In HS mode, each lane is terminated on both the transmitter and receiver side and is driven by an SLVS transmitter, allowing for high-speed bursts of protocol packets. In LP mode, all lanes are instead
1 MIPI is an independent, not-for-profit corporate entity that requires an active membership for the disclosure of any of the confidential specification documents. Memberships require payment of dues ranging from $4,000 to $40,000, yet any company can apply to join [MIP16b].
(a) Physical CMOS pixel layout: active pixel rows and columns surrounded by dark pixels. (b) Active pixels (valid image data) surrounded by horizontal and vertical blanking regions.