Array for High-Speed Imaging
Cecille Etheredge
Computer Architecture for Embedded Systems (CAES)
Faculty of Electrical Engineering, Mathematics and Computer Science University of Twente
July 2016
This thesis investigates the viability of using multiple low-cost components to create something much bigger: an embedded system capable of capturing high-speed video using a scalable and configurable array of sensors. Various hardware and software components are designed, implemented and verified as part of a new embedded system platform that not only achieves high-speed imaging capabilities by using an array of imaging sensors, but also provides a scalable and reusable design for future camera array systems.
First of all, I would like to thank Prof. Marco Bekooij, whose belief in hands-on research and real-world practicality has enabled this research to go forward and develop into this thesis.
Secondly, I would like to thank my family for being an endless source of optimism to explore technology and to venture on research such as this.
A thanks to all colleagues and folks in the CAES group. In particular, I would like to thank Oğuz Meteer, whose countless hours of insights and experience provided the fundamentals without which this thesis would not have been possible, and whose borrowed electronics, at the time of this writing, mostly still have to be returned.
Thanks go out to Stephen Ecob from Silicon On Inspiration in Australia, who provided the much needed hardware building blocks on more than one occasion and whose continued support was essential during the production of the prototype. Thanks also to Alice for sourcing the much needed BGA components in Hong Kong, and to the folks at Proto-Service for providing their field expertise in the final BGA assembly.
Contents

1 Introduction
  1.1 Problem definition
  1.2 Contributions
  1.3 Thesis outline
2 Background
  2.1 Digital image sensors
  2.2 High-speed imaging
  2.3 Image sensor arrays
3 High-level system design
4 Hardware domain implementation
  4.1 Sensor receiver interface
    4.1.1 Physical signaling
    4.1.2 Timing characteristics
    4.1.3 FPGA implementation
  4.2 Streaming DRAM controller
    4.2.1 Dynamic Random-Access Memory (DRAM)
    4.2.2 Access pattern predictability
    4.2.3 DRAM protocol
    4.2.4 Peak transfer rate
    4.2.5 Protocol state machine
    4.2.6 FPGA implementation
    4.2.7 Command arbitration
  4.3 Stream interleaver
    4.3.1 FPGA implementation
  4.4 Embedded control processor
  4.5 Sensor control interface
    4.5.1 I²C protocol overview
    4.5.2 Camera control interface (CCI)
    4.5.3 FPGA implementation
    4.5.4 Phased start
  4.6 Readout interface
    4.6.1 FPGA implementation
  4.7 Clock domain crossing
5 Software domain implementation
  5.1 Embedded control
  5.2 Stream decoder
    5.2.1 Deinterleaving
    5.2.2 Bit slip correction
    5.2.3 Protocol layer decoder
  5.3 Image rectification
    5.3.1 Mathematical models
    5.3.2 Camera intrinsics and extrinsics
    5.3.3 Rectification
    5.3.4 Camera calibration
6 Dataflow analysis
  6.1 Throughput analysis
    6.1.1 Scenario-aware dataflow graph
    6.1.2 Throughput analysis
    6.1.3 Effects on array size
  6.2 Buffer size analysis
    6.2.1 Latency-rate SRDF graph
    6.2.2 Real-world case analysis
7 Realization and results
  7.1 Sensor array configuration
  7.2 Choice of hardware
  7.3 Synthesis results
  7.4 Measurement setup
  7.5 Experimental results
8 Conclusion
  8.1 Future work
I Appendix
A SADF graph listing
B Stroboscope program listing
Bibliography
List of Figures

Figure 1: Block diagram showing a simplified overview of the processes involved in a modern color image sensor, from incoming scene light to final output image.
Figure 2: Example of a Bayer pattern encoded image (left) and the resulting full color image after the debayering process (right).
Figure 3: Readout architectures for conventional CCD and CMOS sensors. Photodiodes and capacitive elements are respectively colored dark green and yellow.
Figure 4: Rolling shutter mechanism in action. Left shows the shutter's direction of movement in terms of pixel rows. Right contains a plot of row exposure over time, clearly showing the sliding window of time.
Figure 5: Simplified view of two active pixels with different shutters in a CMOS sensor at the silicon level. The addition of a memory element causes occlusion of an area of the photodiode.
Figure 6: Talbot's high speed photography experiment. A paper disc is captured at standstill (left), spinning and captured with an exposure time of 10 ms (middle), and with an exposure time of 1 ms (right). Images courtesy of [VM11].
Figure 7: The Stanford Multi-Camera array setup as described in [Wil+05].
Figure 8: Abstract system diagram illustrating the hardware and software domain boundaries, their high-level components and the corresponding dataflows in the system.
Figure 9: Hardware domain diagram showing its various subsystems and components. Orange indicates a component external to the hardware domain, grey is a custom subsystem implemented by FPGA logic, yellow is a logic element and green represents software. Arrows indicate dataflows, with red representing the capture stage, and blue representing the readout stage.
Figure 10: Software domain diagram showing its various subsystems and components, including the embedded control. Orange indicates a component external to the software domain, grey is a custom subsystem implemented in software and yellow is a specific software component. Arrows indicate dataflows, with red representing the primary video dataflow.
Figure 11: Standard parallel CMOS sensor interface. Data lines D0 through DN represent the N-bit encoded pixels of the output image.
Figure 12: LVDS output circuitry: a current-mode driver drives a differential pair line, with current flowing in opposite directions. Green and red denote the current flow for respectively high ("one") and low ("zero") output values.
Figure 13: Simplified fragment of a video data transmission using MIPI CSI-2 (top) and HiSPi (bottom). LP indicates logic high and low using single-ended signaling, while all other signals are LVDS. Green represents image data, all other colors represent control words as defined by the respective protocols.
Figure 14: Typical CMOS sensor layout in terms of physical pixels (left) and corresponding structure of the image frame readout data (right), as found in the Aptina MT9M021 CMOS sensor [Apt12b].
Figure 15: Topology within a rank of DRAM devices. A rank consists of multiple DRAMs, each of which supplies an N-bit data word using row and column addressing.
Figure 16: Common layout of a DDR3 DIMM, populated on both sides with multiple DRAMs. The DIMM is typically structured in such a way that each side represents a single DRAM rank.
Figure 17: Simplified illustration of the pipelined DRAM command timing in case of two BL8 write requests. Green and blue indicate any commands, addresses and data bits belonging to the same request.
Figure 18: Original simplified DRAM protocol state diagram as in [JED08]. Power and calibration states have been omitted.
Figure 19: Redesigned DRAM state diagram optimized for sequential burst writes. Burst-related states are highlighted in blue.
Figure 20: I²C data transmission using 7-bit slave addressing with two data words as described in [Phi03]. Grey blocks are transmissions from master to slave, white are from slave to master and green depends on the type of operation. Bit widths are denoted below.
Figure 21: Example CCI data transmission addressing an 8-bit register using a 16-bit address. Grey blocks are transmissions from master to slave, white are from slave to master and green depends on the type of operation. Bit widths are denoted below.
Figure 22: Phased start with 8 sensors plotted against time. Dark regions represent exposure duration, and light regions represent readout duration for all sensors.
Figure 23: The two different types of synchronizers used in our system. Green and blue represent the two different clock domains, and red acts as stabilizing logic in between.
Figure 24: Simplified fragment of a video data transmission using MIPI CSI-2 (top) and HiSPi (bottom). LP indicates logic high and low using single-ended signaling, while all other signals are LVDS. Green represents image data, all other colors represent control words as defined by the respective protocols.
Figure 25: OpenCV feature detection using a planar checkerboard pattern as mounted to a wall. The visual marker in the center is not actually used in the algorithm.
Figure 26: SADF graph representing the video streaming behaviour in our system and used as a basis for dataflow analysis.
Figure 27: Corresponding FSM for the Dref node in our dataflow model, representing write and refresh states with write as initial state.
Figure 28: Imaginary FSM for the Dref node in our dataflow model, using counters to switch between states and scenarios.
Figure 29: Simplified latency-rate model of our system, with the SRDF graph at the top and the corresponding task graph at the bottom. Dotted lines represent the imaginary flow of data in the system, and are not part of the actual graph.
Figure 30: Stages of development for the camera module PCB, from CAD design to production to pick and placing of SMD and BGA components.
Figure 31: Final prototype hardware, showing the five individual sensor modules (left) and the final setup as evaluated with the modules and FPGA hardware connected (right).
Figure 32: Stroboscope hardware, showing an array of 10 area LED lights driven by 10 MOSFETs and connected to a real-time embedded system.
Figure 33: Timing characteristics of the stroboscope, strobing at a 1000 Hz rate, showing two LED-driving MOSFETs being subsequently triggered at 1 ms with an error of only 0.02%. Measured using a 4 GHz logic analyzer.
Figure 34: Final frames as captured by four sensors of the prototype at 240 Hz after being processed in the software domain, ordered from top-left to bottom-right and showing a seamlessly overlapping capture of the timed stroboscope lights from sensor to sensor. Each subsequent frame is captured by a different sensor, at different subsequent points in time. Stroboscope lights run from bottom left to top right in time, as indicated by the white arrow in the first image.
Figure 35: Sensor timing diagram corresponding to our four sensor prototype at 240 Hz and the frames in Figure 34. Our stroboscope LED timing is shown on top, with each number and color representing one of the 10 LEDs that is turned on at that time for 1 ms. Dark green represents a single frame, or the light integrated by a single sensor in the time domain for a single frame exposure, i.e. the first frame captures light from 4 different strobe LEDs. Light green represents the unavoidable time in which a sensor's shutter is closed and image data is read out to our system. When all frames are combined, they in fact form a continuous integration of physical light with an exposure time and capture rate equal to 4.2 ms or 240 Hz, as witnessed in Figure 34.
Figure 36: Timing characteristics of the I²C exposure start commands as sent to different sensors in case of a hypothetical 1200 Hz (833 µs) capture rate. Measured using a 4 GHz logic analyzer.
1 Introduction
Over the past century, the world has seen a steady increase in technological advancement in the field of digital photography. Now more than ever, consumers rely on the availability of advanced digital cameras in personal devices such as mobile phones, tablets and personal computers to snap high resolution pictures and videos, all the while cinematographers in the field find themselves with an ever increasing variety of very high-end cameras. And in between these two markets, we have witnessed the rise of an entirely new "prosumer" segment, wielding action cameras and 4K camcorders in the lower high-end spectrum to capture semi-professional video.
Some of the most obvious advancements in cameras can be found in the higher resolution imaging capabilities of sensors, as well as in semiconductor production techniques that have seen vast improvements over the years. As a result, sensors have become smaller, more capable, and cheaper to produce, and the cost of including such a sensor of acceptable quality in a new embedded consumer product is relatively low, especially in the lower segment of products. Furthermore, some parts of these sensors have become standardized in the industry, practically turning many imaging sensors and related technology into commercial off-the-shelf components.
However, one area of videography that has seen increasing demand but quite conservative technological innovation is the field of high-speed imaging. With the advent of television and on-line documentary series revolving around the capture of slow motion footage, consumers and prosumers have been voicing their interest in creating slow motion video at home. While professional high-speed cameras have long been available, their prices are far outside the reach of these markets, often costing up to three orders of magnitude more than conventional cameras. The reason for this is quite simple: high-speed imaging puts extreme demands on an imaging sensor and its surrounding platform in terms of bandwidth, noise and timing, as we will see in this thesis, raising the cost price of a single high-speed imaging sensor into the hundreds or thousands of dollars, let alone the required effort and expertise to design the surrounding hardware.
Of course, this does not mean that there is no other way to provide technological innovation in this field. Taking into account that, at the lower end, imaging sensors are becoming more standardized and cheaper, the question arises as to whether it is now possible to use multiple low-end sensors to achieve the same as a single high-end sensor. This trend of combining commodity-class or low-end components is already being applied by big players, such as those in the online and cloud computing industries, and can prove to be cost-effective if the components can be properly combined by means of a surrounding platform of hardware and software that scales in numbers.
This thesis focuses on this very idea of using multiple low-cost components to create something much bigger: an embedded system platform that not only achieves high-speed imaging capabilities by using an array of imaging sensors, but also provides a scalable and reusable design for future systems with increasingly larger configurations. Finally, a small-scale prototype based on this platform is produced and evaluated to assess the real-world viability of such a product.
1.1 Problem definition
In order to investigate, design, implement and ultimately realize such an embedded system platform, we set out the following objectives for this thesis:
1. Investigate the viability of using a sensor array for high-speed imaging applications:
   a) Determine the trade-offs of using multiple sensors versus a single sensor;
   b) Identify the negative side-effects, and how to mitigate their effect;
2. Research and design an embedded system platform that can be used to realize a scalable array of image sensors:
   a) Determine relevant hardware and software domains and design subsystems that fit within these domains;
   b) Identify any respective bottlenecks in these subsystems and how to mitigate their effect on an implementation;
3. Realize an embedded system using this platform, capable of high-speed image capture using a sensor array:
   a) Develop a capable hardware design implementing the embedded system platform;
   b) Implement the software domain solutions and integrate these with the hardware design;
   c) Design and implement a hardware setup that can be used to verify the high-speed imaging capabilities of the system;
   d) Verify the high-speed imaging capabilities of the system in a real-world setup.
Based on these objectives, we define our research question to be the following:
Is it viable to use an array of image sensors for high-speed imaging in an embedded form factor, and if so, which hardware and software domain components would be required to implement such an embedded system?
A solution is considered to be viable if the following theoretical evaluation criteria are met:
• Capable of interfacing with at least 16 image sensors at a capture rate of 60 Hz, leading to an effective total capture rate of 960 Hz.
• Capable of issuing phased start commands to image sensors with a timing accuracy of at least 99%, as further explained in Chapter 4.
These criteria are especially important as they allow the production of a high-speed video using the system, as we will see in this thesis.
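The two criteria above can be illustrated with a short calculation. The sketch below (a hypothetical helper, not part of the thesis platform) derives the per-sensor start offsets for a phased array: with N sensors each running at a base rate f, the effective capture rate is N × f, so each sensor must start 1/(N × f) seconds after the previous one.

```python
def phased_start_offsets(n_sensors: int, base_rate_hz: float) -> list:
    """Return per-sensor start offsets in seconds.

    The effective capture rate is n_sensors * base_rate_hz, so each
    sensor starts 1 / effective_rate seconds after the previous one.
    """
    effective_rate = n_sensors * base_rate_hz
    period = 1.0 / effective_rate
    return [i * period for i in range(n_sensors)]

# The evaluation criterion above: 16 sensors at 60 Hz each.
offsets = phased_start_offsets(16, 60.0)
print(len(offsets))                  # 16 start commands to issue
print(round(offsets[1] * 1e3, 4))    # ~1.0417 ms between starts (1/960 s)
```

At this spacing, a 99% timing accuracy corresponds to roughly 10 µs of tolerable error per start command, which motivates the timing analysis of the sensor control interface in Chapter 4.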
1.2 Contributions
This thesis makes a major contribution to the field of sensor array research by designing a novel embedded system platform that is configurable, scalable and complete. This platform covers all aspects in hardware and software necessary to enable high-speed image capture using an array of image sensors. Currently, no other documented solutions are known that describe such an embedded system.
As part of this design, this thesis introduces a number of hardware subsystems that allow for interfacing with a varying number of image sensors and capturing their corresponding data. This includes a novel and specialized streaming DRAM controller specifically targeted at writing streamed data, as well as a custom interleaver that combines multiple streams of sensor image data into single coherent DRAM write commands.
Building upon these hardware subsystems, this thesis also presents corresponding software subsystems that interface with this hardware in order to properly transform the captured image data into high-speed video. This includes a novel HiSPi sensor protocol decoder, an embedded control system used to configure the sensor array, and a video streaming pipeline that includes a custom deinterleaver as well as options for image rectification and camera calibration.
This platform effectively establishes a significant scale reduction relative to all prior documented efforts, into an embedded and potentially handheld form factor. This kind of hands-on description has not been seen before in this field of research.
The platform is further solidified by the actual production of a prototype, showing real-world viability and presenting future opportunities for an even finer product. This prototype exemplifies the actual practical use of the sensor array as an end product, which is often overlooked in existing research.
This thesis does not contribute any new ideas or algorithms in the field of image rectification or of distortion caused by sensor arrays, although efforts are made to include current state-of-the-art and proven imaging techniques in the software domain.
1.3 Thesis outline
This introductory chapter has so far introduced the reader to the scope and outline of this thesis, as well as the problem definition and research question.
Chapter 2 provides background information on the various topics that are fundamental to this thesis. These topics include the use and technical breakdown of digital imaging sensors, high-speed imaging and camera arrays. Insight is also provided into prior efforts in the field of high-speed imaging using camera arrays, to give perspective on how this thesis contributes to this field.
Chapter 3 describes the top-level design of the system as presented in this thesis. It provides an overview of the system in terms of the various subsystems that are to be implemented in one of the two associated domains of hardware and software.
Chapter 4 deals with the relevant subsystems in the hardware domain, making it possible to interface with a configurable number of sensors. This includes solutions for clock domain crossing, as well as an elaborate description of the design of the sensor receiver interface, memory controller, interleaver, embedded processor, sensor control interface and readout interface.
Chapter 5 presents the subsystems in the software domain that deal with transforming the raw data from the hardware domain into useful video data. This chapter describes the sensor protocol decoder, as well as solutions for image rectification and camera calibration.
Chapter 6 is concerned with the theoretical dataflow analysis, in which dataflow models are presented that model the primary bottlenecks in the system; analysis is performed to determine the throughput of the system and the sizes of any involved buffers, while discussing the viability of the system.
Chapter 7 describes the experimental results of this thesis. This
chapter provides insight into the real-world hardware prototype that
was produced, implementing the presented platform. A setup is used
to verify the produced video of this prototype, and timing analysis
is done to verify the timing accuracy of the hardware, and thus the
real-world viability of the platform as a whole.
Finally, Chapter 8 concludes this thesis by providing a brief summary and final thoughts on future work.
2 Background
Today, our world is filled with embedded systems with wildly varying applications, often relying on the use of techniques for digital image capture and processing. The decreasing cost of digital imaging allows for more and more imaging systems to appear in different markets, from consumer photography (e.g. digital cameras and camcorders), to industrial or military (e.g. product verification, surveillance and biometrics) and far beyond.
2.1 Digital image sensors
These digital imaging systems, such as the one presented in this thesis, rely on the use of image sensors. And although the basic concept of an image sensor seems simple, i.e. to convert incident light into a digital representation, the actual implementation of an image sensor varies considerably depending on the technology being used.
Figure 1 gives a simple overview of a modern color image sensor. First of all, incident light from the scene is focused by optics, typically one or more lenses. This focusing, like in most imaging systems or cameras, is necessary to ensure that a certain view of the scene is focused onto a capturing plane behind it. The light then passes through a color filter array (CFA) that contains a predefined pattern of colors, and is then captured and converted into the analog signal domain by photodetectors directly behind the filter. The analog signals are ultimately processed by an on-chip analog signal processor and converted into the digital signal domain, after which a variety of processing techniques can be used to produce a final image.
Figure 1: Block diagram showing a simplified overview of the processes involved in a modern color image sensor, from incoming scene light to final output image. (Pipeline stages: scene, imaging optics, filter and pixel array, analog-to-digital conversion, post-processing.)

Full color imaging. The photodetectors or photodiodes themselves are semiconductors sensitive to charged particles and photons.
Figure 2: Example of a Bayer pattern encoded image (left) and the resulting full color image after the debayering process (right).
These work by absorbing particles and photons and emitting a voltage proportional to the incident power, as described in [Big+06], and are thus oblivious to color, as they only describe a relation between the amount of light and an analog signal. By placing a filter in front that only passes light in a certain color spectrum range, they can be utilized to only detect the amount of light of a certain color.
In practice, the color filter array that is placed in front of the photodetectors often consists of a so-called Bayer pattern that only allows a single color to pass through, making each single photodetector pixel sensitive to a predefined color. Each of the pixels in the sensor's output image only encodes the intensity of a single specific color such as red, green or blue. To produce a final full color image, a spatial interpolation process known as demosaicing is used, which interpolates multiple single-color pixel values to produce single full color pixels containing respective values for red, green and blue. Demosaicing is an active topic of research, and different interpolation techniques currently exist, as shown in [LGZ08]. An example of a Bayer pattern image and the resulting full color image can be seen in Figure 2.
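As a minimal illustration of demosaicing, the sketch below implements naive bilinear interpolation over an RGGB Bayer mosaic using NumPy. This is a toy example, not one of the state-of-the-art techniques surveyed in [LGZ08], and the function names are assumptions made for illustration.

```python
import numpy as np

def _box3(a: np.ndarray) -> np.ndarray:
    """Sum of each 3x3 neighbourhood (zero-padded), same shape as `a`."""
    p = np.pad(a, 1)  # pads with zeros by default
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def demosaic_bilinear(bayer: np.ndarray) -> np.ndarray:
    """Naive bilinear demosaic of an RGGB Bayer mosaic into H x W x 3 RGB."""
    h, w = bayer.shape
    masks = np.zeros((h, w, 3))
    masks[0::2, 0::2, 0] = 1    # red samples
    masks[0::2, 1::2, 1] = 1    # green samples on red rows
    masks[1::2, 0::2, 1] = 1    # green samples on blue rows
    masks[1::2, 1::2, 2] = 1    # blue samples
    rgb = np.zeros((h, w, 3))
    for c in range(3):
        known = bayer * masks[..., c]   # keep only this channel's samples
        # Average the known samples in each 3x3 neighbourhood.
        rgb[..., c] = _box3(known) / np.maximum(_box3(masks[..., c]), 1)
    return rgb
```

Each output pixel averages the nearby mosaic samples of each color channel; real demosaicing algorithms additionally exploit inter-channel correlation and edge direction to avoid the color fringing this naive approach produces.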
CMOS and CCD sensor architecture. As described in [EGE05], the array of photodetectors introduced before varies significantly between different types of image sensors. Many of the differences arise from the way the array is structured and read out into the analog signal domain, also referred to as the readout architecture; these differences can be seen in Figure 3. Modern image sensors generally come in two types, complementary metal-oxide semiconductor (CMOS) and charge-coupled device (CCD), and although CCD-type sensors have traditionally been associated with high quality but high cost, recent advancements in CMOS technology have allowed for the introduction of lower cost CMOS-type sensors approaching the quality of their CCD-type counterparts.
Figure 3: Readout architectures for conventional CCD and CMOS sensors. (a) CCD (interline transfer), with horizontal and vertical CCDs. (b) CMOS (active pixel sensor), with row select logic, analog signal processing, column analog-to-digital conversion, a column multiplexer, and timing and control. Photodiodes and capacitive elements are respectively colored dark green and yellow.

In a modern CCD-type sensor, the array is typically built out of photosensitive capacitors that passively collect and store a charge for
each pixel during exposure. During readout, charge is shifted out of the array, step by step, into horizontal and vertical CCDs and converted to the analog signal domain by means of an amplifier circuit, after which it is sampled and converted by the analog signal processor.
In a modern CMOS-type sensor, the array is typically built out of active pixel circuits containing a photodetector and amplifier that produce an analog signal for each pixel while being exposed. During readout, the analog signals of each row of the array are selected, sampled and integrated, one by one, before being converted by the analog signal processor.
The cost-effectiveness of CMOS sensors arises from differences in the manufacturing process, also described in [Fos97], where integration of a significant amount of on-chip VLSI electronics is possible at lower cost. This makes it possible to create single-chip CMOS sensor solutions that contain very elaborate analog signal processors, as opposed to the expensive off-chip analog signal processing typically required for CCD sensors.
On the other hand, CMOS sensors suffer from a high level of fixed pattern noise due to variations in the manufacturing process, as well as from other forms of noise, such as dark current noise, that reduce the effective signal-to-noise ratio (SNR) of the sensor.
In practice, CCD sensors are often seen in highly specialized markets such as astronomy and microscopy, while CMOS sensors can be found in a much wider range of applications, including mobile and consumer electronics, due to their low cost and good quality.
Figure 4: Rolling shutter mechanism in action. Left shows the shutter's direction of movement in terms of pixel rows. Right contains a plot of row exposure over time, clearly showing the sliding window of time.
Rolling and global shutters. Like conventional film cameras, image sensors also require the use of a shutter that controls the exposure of the photodetectors. As mentioned before, each sensor generally has two steps of action: exposure and readout. Both before and after exposure, this shutter prevents any incident light from having an effect on the photodetectors, such that the photodetectors are only exposed to light for a predetermined amount of time.
Differences in the sensor's shutter mechanism can, in some cases, have a profound effect on the final output image of the sensor. As opposed to film cameras, sensor shutters are often electronic and operate by resetting or discharging the individual pixel circuits of the array, but the shutter timing changes significantly depending on the type of sensor used.
Note that the CCD sensor stores a charge for each individual pixel during and after exposure, essentially capturing and storing a complete image inside the sensor until it is read out of the array. This type of shutter mechanism is referred to as a global shutter and in a sense resembles that of a film camera. This mechanism differs greatly from that of the conventional CMOS sensor. Due to the way CMOS sensors are manufactured, only a few rows can be selected, exposed, sampled and integrated in the entire sensor at a time. This principle is known as a rolling shutter mechanism, and is shown in Figure 4.
The direct result of the rolling shutter is that different rows of sensor pixels are exposed at different points in time. In other words, not all pixels in a final output image will have been captured in the same window of time. While this shift in time simply does not matter for scenes in which there is little movement, significant distortion artifacts will manifest themselves in the image as soon as fast moving objects are captured.
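The row-by-row timing can be made concrete with a small sketch. The numbers below (line time, exposure) are hypothetical and not tied to any particular sensor; the point is that the exposure window of the last row is skewed relative to the first by (rows − 1) × line time, which is exactly the sliding window of Figure 4.

```python
# Under a rolling shutter, row r starts its exposure one line time
# after row r-1, so each row captures a slightly different time window.

def row_exposure_window(row: int, t_row_us: float, t_exp_us: float):
    """Return (start, end) of a row's exposure window in microseconds."""
    start = row * t_row_us          # shutter reset reaches this row
    return (start, start + t_exp_us)

# Example: 720 rows, 20 us line time, 5 ms exposure (hypothetical values).
first = row_exposure_window(0, 20.0, 5000.0)
last = row_exposure_window(719, 20.0, 5000.0)
skew_ms = (last[0] - first[0]) / 1000.0
print(skew_ms)   # 14.38 ms of top-to-bottom skew: the "sliding window"
```

Any object that moves appreciably within this skew appears sheared in the output image, which is the distortion artifact described above.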
The introduction of an effective global shutter to CMOS sensors is one of the most sought-after features in the industry, and is an active research topic in digital imaging. Examples of global shutter implementations are [Apt12a] and [Fur+07], in which the active circuitry of each pixel is augmented with a sample-and-hold or memory element that stores the analog signal until all rows of the entire array have been read out. Although the CMOS readout architecture still requires a row-by-row readout, the individual pixel memory elements now allow the entire array to be exposed at the same time.

Figure 5: Simplified view of two active pixels with different shutters in a CMOS sensor at the silicon level. (a) Rolling shutter pixel: photodiode between metal layers. (b) Global shutter pixel: photodiode, metal layers and a memory element. The addition of a memory element causes occlusion of an area of the photodiode.
The difficulty of adding memory elements to the array lies in the fact that the memory element itself introduces varying degrees of signal contamination. In Figure 5, it can be seen that the element itself takes up physical space in the pixel array. As such, incident light can never be fully focused on the photodiode due to optical imperfections, and some may fall on the element as well. This stray light contaminates the analog signal stored in the element, and must be avoided by covering up the element with shielding. Furthermore, as the analog signal is stored in the element, unwanted dark current may accumulate during storage, requiring careful design of the underlying silicon to decrease the negative effect on the signal-to-noise ratio.
2.2 high-speed imaging
Paramount to this thesis, high-speed imaging is best described as the still-frame capture of fast moving objects, which, in the context of video, also implies an equivalently high capture frame rate. It is used to analyze real-world events that are difficult to capture with conventional digital cameras, such as automotive crashes, ballistics, golf swings, and explosions.
High-speed photography itself has a long history, which started well before the practical invention of video. The first documented experiments were done in the 19th century, at a time when film camera shutters were still crude and slow mechanisms, so freezing the motion of fast moving objects was difficult. As described in [Ram08], these early experiments featured a setup with a paper disc spinning so fast that the text on it was unreadable to the human eye. However, by using an arc flash with a duration of 1/2000 of a second, still images with sharp, readable text could be reliably produced.
Figure 6: Talbot's high-speed photography experiment. A paper disc is captured at standstill (left), spinning and captured with an exposure time of 10 ms (middle), and with an exposure time of 1 ms (right). Images courtesy of [VM11].
Figure 6 shows a modern version of the spinning disc. In order to capture sharp still images, the exposure of the scene's light to the camera must be minimized: either by opening and closing the shutter very quickly, or by using a single light source that flashes for a fraction of a second. If the exposure is too long, fast moving objects in the image will appear fuzzy and blurred due to the accumulation of light at different positions along the object's path of movement.
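The relation between exposure time and blur can be quantified: the smear extent is roughly the object's image-plane speed multiplied by the exposure time. A sketch with assumed, illustrative numbers:

```python
# Motion blur sketch: an object moving across the image plane smears
# over (pixels per second) * (exposure time) pixels. All values here
# are assumptions for illustration.

def blur_pixels(object_speed_px_s: float, exposure_s: float) -> float:
    """Approximate blur extent in pixels for a given exposure."""
    return object_speed_px_s * exposure_s

# An edge on a fast spinning disc sweeping 10,000 px/s:
print(blur_pixels(10_000, 10e-3))   # 100.0 px: heavily smeared
print(blur_pixels(10_000, 1 / 2000))  # 5.0 px: nearly sharp
```

This is why shortening either the shutter time or the flash duration to a fraction of a millisecond is the essential ingredient of every high-speed setup.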
A multitude of high-speed imaging cameras are available today, including not only high-end cameras for industrial and scientific use, but also low-end consumer cameras, spurred by recent consumer and professional interest in high-speed photography. Examples include the fps1000 and Edgertronic projects, both of which have been successfully funded and delivered through crowdfunding platforms such as Kickstarter, and which feature maximum high-speed capture rates of 550 Hz and 701 Hz respectively at 720p resolution, as in [Per15]. In comparison, high-end cameras such as the Photron Fastcam SA series are capable of capture rates up to 7000 Hz at a 1024x1024 resolution with shutter times in the microseconds, and are often the first choice in scientific research dealing with physical and chemical phenomena, as in [Gül+12] and [BS14]. Nevertheless, these high-end cameras come at a high price and are often only available on a daily rental basis, placing them out of reach of the consumer and "prosumer" markets. Despite the relatively low-end specifications of products such as the fps1000 and Edgertronic, price tags of $1500 and $5495 respectively, together with their successful crowdfunding campaigns, show that there may be an untapped market demand beneath the existing high-end products.
Figure 7: The Stanford Multi-Camera array setup as described in [Wil+05].
2.3 image sensor arrays
One aspect that virtually all high-speed imaging products have in common today is that they are designed around a single high-speed CMOS imaging sensor. Depending on the target market and cost price of the final product, a design choice is made from a variety of high-speed imaging sensors with different low- or high-end performance characteristics and matching cost prices, such as the CMOSIS series ([WM15]) and the Sony IMX or Exmor series ([Son11]). These sensors are often at the forefront of CMOS technology, incorporating cutting-edge features such as back-illuminated sensors, novel global shutters and capacitive storage elements to mitigate the sensitivity and bandwidth issues that come into play when imaging at high speed with short shutter times. This demands a high sensor unit cost price, which in turn dictates the minimum cost price of these camera products.
Keeping in mind that conventional high-speed camera products are thus still dependent on one or more relatively high-cost components, we investigate the possibility of an alternative design that removes this dependency: a design that does not use a single high-cost sensor, but rather multiple low-cost sensors, configured in an array, to achieve the same high-speed imaging capabilities.
One of the earliest projects involving the use of multiple cameras in a similar field of research is [Wil+01] at Stanford University. In this work, light field data is captured using an array of up to 64 custom camera boards connected by an IEEE1394 bus to one or more video processing host PCs. The camera boards are custom-designed single-board computers containing a microprocessor, IEEE1394 chipset, MPEG2 video encoder, Xilinx Spartan FPGA and Omnivision CMOS image sensor. A continuation of this work is described in [Wil+04], in which an identical hardware setup, also seen in Figure 7, is used for high-speed imaging; it is practically the first documented research involving a camera array for high-speed imaging. The camera array consists of 52 Omnivision sensors, each capturing at a rate of 30 Hz, providing a video stream that is further processed by two or more host PCs. The research itself is mostly concerned with the side effects of using a camera array for high-speed imaging, and therefore provides fundamental insights into correcting geometric distortion, rolling shutter distortion and camera timing, which we will also cover in this thesis.
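The core idea behind such an array is to stagger the sensors' trigger times so that their individual 30 Hz streams interleave into one much faster virtual stream. The sketch below schedules uniformly spaced trigger offsets; the sensor count and base rate follow the Stanford setup described above, but the uniform-offset scheme itself is an assumption of this illustration:

```python
# Sketch of the staggered-trigger idea behind a high-speed camera
# array: n sensors, each capturing at base_rate Hz with triggers
# offset by 1/(n * base_rate) seconds, interleave into an
# n-times-faster virtual stream.

def trigger_offsets(n_sensors: int, base_rate_hz: float):
    """Uniformly spaced trigger start times for each sensor, in seconds."""
    period = 1.0 / (n_sensors * base_rate_hz)
    return [i * period for i in range(n_sensors)]

offsets = trigger_offsets(n_sensors=52, base_rate_hz=30.0)
effective_rate = 52 * 30.0
print(f"effective rate: {effective_rate:.0f} Hz")     # 1560 Hz
print(f"trigger spacing: {offsets[1] * 1e6:.1f} us")  # 641.0 us
```

The interleaved output must still be corrected for the geometric and rolling-shutter differences between the sensors, which is exactly the problem the Stanford research addresses.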
Further in-depth research on this particular project at Stanford is described in [Wil05]. Here, a number of possible applications for camera arrays are researched, including synthetic aperture photography (SAP), spatiotemporal view interpolation (SVI) and high-speed imaging. While SAP and SVI are useful in the sense that they provide image aperture manipulation and improved image quality, we focus on the high-speed imaging research, as it closely fits the scope of this thesis. Finally, the Stanford setup is further refined in terms of hardware and system architecture in [Wil+05], and its image quality is evaluated by comparison with a conventional digital camera, showing encouraging results.
3 high-level system design
The purpose of the embedded system platform presented in this thesis is to interface with an array of image sensors in order to allow high-speed imaging. The platform, or system, is composed of a number of novel custom hardware and software components that implement the functionality required to ultimately produce a video stream containing high-speed images.
In this chapter, an overview of the system as a whole is presented at increasing levels of detail. We begin by establishing the different domains that make up the platform:
• Hardware domain. This domain contains all components implemented in hardware, or more specifically, on a reconfigurable FPGA platform. The use of an FPGA allows for rapid design and prototyping, and is thus fundamental to the platform.
• Software domain. This domain contains all components implemented in software. These components represent the programs that run on CPU-based architectures either embedded in or connected to the hardware domain.
In Figure 8, a system diagram describes these domains and the components involved at a high, abstract level. The diagram clearly illustrates the primary dataflow: it begins in one or more image sensors, flows through the FPGA and RAM in the hardware domain, and ultimately ends up at the host in the software domain. A number of secondary dataflows also exist to ensure that the primary dataflow is controlled. Furthermore, while the separation between the hardware and software domains is generally clear, the software domain also covers a small amount of software used within the hardware domain, namely that of the embedded control processor, as we will see later on in this thesis.
Figure 8: Abstract system diagram illustrating the hardware and software domain boundaries, their high-level components and the corresponding dataflows in the system.
Figure 9: Hardware domain diagram showing its various subsystems and components. Orange indicates a component external to the hardware domain, grey is a custom subsystem implemented in FPGA logic, yellow is a logic element and green represents software. Arrows indicate dataflows, with red representing the capture stage and blue representing the readout stage.
hardware domain    The hardware domain represents the bulk of the components in the system, and for good reason. At the highest level, the hardware performs two dataflow stages: first, it captures as much raw data as possible from all image sensors; second, it reads out and transmits all of this captured data to the software domain for further processing. Note that these two stages do not have to occur at the same time, due to the nature of video capturing, where data is generally first captured to a storage medium and then offloaded to another system for further use.
The capture stage is assumed to be a unique and critical stage that cannot be interrupted or paused, as this may lead to data corruption, making it hard real-time by definition as per [But11]. The hardware domain must therefore guarantee that no overflows ever occur in the system, in order to avoid fatal data corruption. In contrast, the readout stage occurs after the capture stage has completed and can proceed at any speed. Since it is fully off-line and non-critical, this thesis imposes no timing constraints on this stage.
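Whether the no-overflow guarantee can hold ultimately reduces to a bandwidth inequality: the aggregate sensor data rate must stay below the sustained memory write bandwidth. A back-of-envelope sketch (all figures are assumptions for illustration, not specifications of this platform):

```python
# Back-of-envelope overflow check for the hard real-time capture
# stage: the aggregate sensor data rate must not exceed the
# sustained DRAM write bandwidth. All numbers below are assumed.

def capture_feasible(n_sensors, width, height, bits_per_px,
                     frame_rate_hz, dram_bw_bytes_s):
    """Return (aggregate byte rate, whether it fits in the DRAM bandwidth)."""
    rate = n_sensors * width * height * bits_per_px / 8 * frame_rate_hz
    return rate, rate <= dram_bw_bytes_s

rate, ok = capture_feasible(n_sensors=8, width=640, height=480,
                            bits_per_px=10, frame_rate_hz=60,
                            dram_bw_bytes_s=1.6e9)
print(f"{rate / 1e6:.1f} MB/s, fits: {ok}")  # 184.3 MB/s, fits: True
```

In practice, the check must use worst-case rates and account for DRAM refresh and arbitration overhead, since a single overflow during capture is fatal by the definition above.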
All components involved in the hardware domain can be seen in Figure 9. As mentioned before, the components or subsystems in this domain are implemented as FPGA logic using VHDL or Verilog together with vendor-specific primitives. All components are custom implementations especially designed to provide critical functionality for this platform. In Chapter 4, each of these subsystems is described at a functional level. Note that implementation-specific details such as VHDL or Verilog code fall outside the scope of this thesis.
Next to the FPGA logic (grey components), Figure 9 also illustrates a number of components (in orange) that are external to the domain but nevertheless critical to the functionality of the system. Interfaces to these components are provided by the FPGA logic inside the hardware domain, enabling the primary data to flow from the image sensors to the software domain.
Figure 10: Software domain diagram showing its various subsystems and components, including the embedded control. Orange indicates a component external to the software domain, grey is a custom subsystem implemented in software and yellow is a specific software component. Arrows indicate dataflows, with red representing the primary video dataflow.
software domain    The software domain largely starts where the hardware domain ends, implementing the components needed to produce a video stream or file containing the captured high-speed images. It is also concerned with command and control, setting up the dataflow at various points in the system, as well as providing auxiliary functionality such as DRAM self-testing.
Figure 10 shows the components and subsystems that make up the software domain. Here, all subsystems except those within the dotted lines represent new custom pieces of software that have been explicitly designed for this platform. Note that the diagram also shows a number of components inside the hardware domain. As mentioned before, the software domain also covers the software specially written for the control processor that is embedded inside the hardware domain. All subsystems except those in this embedded part run on a host computer and are connected to each other through a video stream pipeline software architecture known as GStreamer. Further details are provided in Chapter 5.
4 hardware domain implementation
In the previous chapter, an overview of the system was provided that explained the various domains involved in the design. This chapter focuses on the details of the hardware domain, as illustrated in Figure 9. Recall that each of the subsystems is a custom design specifically built to provide functionality for our system. Each section in this chapter describes the background, design choices and functional implementation of one of these subsystems.
4.1 sensor receiver interface
The very first input of the entire system, and thus of the hardware domain, is the sensor array, consisting of a varying number of image sensors. Each of these sensors is physically connected to the FPGA that implements our hardware domain. We therefore require a subsystem in the hardware domain that implements an interface capable of capturing all raw data as physically received by the FPGA.
This section introduces a sensor receiver subsystem that interfaces with a single image sensor. The subsystem is designed such that it can be instantiated multiple times, once for each individual sensor, so that all raw data coming in from the entire sensor array can be captured.
We first provide some background on the physical signaling standards relevant to the receiver, after which the relevant timing parameters are introduced. Finally, the FPGA logic elements necessary for the functionality of the receiver are described.
4.1.1 Physical signaling
Conventional CMOS image sensors have traditionally been equipped with a parallel interface that uses simple single-ended signaling to encode all the necessary video data. This parallel interface, shown in Figure 11, contains a number of parallel data lines with a corresponding clock and synchronization signals. Typical configurations are 8 or 12 data lines, matching the sensor's pixel bit resolution, where each individual line represents a single bit of the image pixel; full pixels are thus transmitted on every rising edge of the corresponding pixel clock. The advantage of the parallel interface is obvious: it allows for very simple interfacing with low-complexity receivers.
Figure 11: Standard parallel CMOS sensor interface. Data lines D0 through DN represent the N-bit encoded pixels of the output image.
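The receiver logic for such a parallel interface reduces to sampling the data lines on each rising pixel-clock edge while the synchronization signals indicate active video. A behavioral sketch, with signal encodings simplified to illustrative assumptions (active-high VSYNC/HSYNC, one sample per list entry):

```python
# Sketch of how a receiver samples the parallel interface of
# Figure 11: on every rising PCLK edge while VSYNC and HSYNC mark
# an active line, the N data bits together form one pixel.

def sample_parallel(pclk, vsync, hsync, data):
    """Collect pixel values on rising clock edges during active video."""
    pixels, prev = [], 0
    for clk, v, h, d in zip(pclk, vsync, hsync, data):
        if clk == 1 and prev == 0 and v == 1 and h == 1:
            pixels.append(d)
        prev = clk
    return pixels

pclk  = [0, 1, 0, 1, 0, 1, 0, 1]
vsync = [1] * 8
hsync = [1, 1, 1, 1, 0, 0, 1, 1]  # blanking in the middle
data  = [7, 7, 9, 9, 0, 0, 4, 4]
print(sample_parallel(pclk, vsync, hsync, data))  # [7, 9, 4]
```

The equivalent FPGA logic is a handful of flip-flops and an edge detector, which is precisely why this interface requires such low-complexity receivers.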
Increasing sensor resolution, bandwidth and signal integrity requirements have, however, led to the adoption of serial interfaces such as Low Voltage Differential Signaling (LVDS). A typical differential interface circuit can be seen in Figure 12: a current-mode driver, two equal transmission lines with opposed polarity and equal impedance, and a comparator at the receiving end.
Although serial transmitters require a higher clock frequency, they allow for a significant improvement in the signal integrity of the overall system due to a number of critical factors:
• Common-mode noise. With differential signaling, common-mode noise is injected on both transmission lines and is rejected at the receiving end, as only the differential value is sampled.
• Switching noise. The design of differential current-mode drivers reduces (simultaneous) switching noise and ringing in the transmission lines as well as in the overall system.
• Crosstalk and electromagnetic interference. Differential signaling radiates less noise into the system than single-ended signaling, since the magnetic fields of the two transmission lines cancel out.
Because differential signaling reduces these noise concerns, lower voltage swings can be used on the transmission lines, typically 350 mV for LVDS and 150 mV for SubLVDS. Lowering the voltage in turn reduces overall power consumption and allows higher data rates, as the signal can be switched more quickly. Furthermore, the use of current-mode drivers results in an almost flat power consumption across frequency, so the frequency can be increased without an otherwise exponential increase in power consumption [Gra04].
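The common-mode rejection described above can be illustrated directly: the receiving comparator only sees the difference between the two line voltages, so noise added equally to both lines cancels. A sketch using typical LVDS levels (the exact common-mode and noise values are assumptions for illustration):

```python
# Differential signaling sketch: the receiver compares the two line
# voltages, so noise added equally to both lines (common-mode noise)
# does not affect the recovered bit.

def lvds_receive(v_p: float, v_n: float) -> int:
    """Comparator at the receiving end: 1 if V+ > V-, else 0."""
    return 1 if v_p > v_n else 0

v_cm, v_swing = 1.2, 0.350  # assumed ~1.2 V common mode, 350 mV swing
noise = 0.4                 # common-mode noise injected on both lines

# Transmitting a "one": V+ above and V- below the common-mode voltage
v_p = v_cm + v_swing / 2 + noise
v_n = v_cm - v_swing / 2 + noise
print(lvds_receive(v_p, v_n))  # 1: the noise is rejected
```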
The resulting differential serial interface as implemented in an image sensor typically replaces all single-ended signals of the parallel interface with just a differential clock line and one or more differential data "lanes" onto which all relevant data is serialized. The number of required I/O pins per sensor is thus minimized, allowing less complex board designs. The control and data signals of the parallel interface shown in Figure 11 are instead encoded as a continuous serial stream of bits, in which specific leading and trailing bit patterns distinguish between different parts of the data, as will be explained in Section 4.1.1.
Figure 12: LVDS output circuitry: a current-mode driver drives a differential pair line, with current flowing in opposite directions. Green and red denote the current flow for high ("one") and low ("zero") output values respectively.
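The serialization onto data lanes can be sketched behaviorally: bytes are distributed across the lanes, and each lane shifts its share out bit by bit. The lane count and LSB-first bit order below are assumptions for illustration; real sensors fix these details in their datasheets:

```python
# Sketch of serializing pixel bytes onto multiple differential data
# lanes: bytes are distributed round-robin across the lanes, each of
# which shifts its bytes out one bit at a time (LSB first, assumed).

def serialize(pixels: list, n_lanes: int) -> list:
    """Return per-lane bit streams for a sequence of pixel bytes."""
    lanes = [[] for _ in range(n_lanes)]
    for i, byte in enumerate(pixels):
        lane = lanes[i % n_lanes]
        lane.extend((byte >> bit) & 1 for bit in range(8))
    return lanes

streams = serialize([0x01, 0x80], n_lanes=2)
print(streams[0])  # [1, 0, 0, 0, 0, 0, 0, 0]  -> 0x01
print(streams[1])  # [0, 0, 0, 0, 0, 0, 0, 1]  -> 0x80
```

The receiver performs the inverse operation, which is why it needs the leading and trailing bit patterns mentioned above to find byte and frame boundaries in the continuous stream.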
There are many varieties of LVDS on the market today, each with characteristics suited to certain application- or vendor-specific solutions. LVDS as defined in the original TIA/EIA-644 standard is generally considered the most common and original specification from which many of these implementations are derived. Varieties include BLVDS (Bus LVDS), M-LVDS (Multipoint LVDS), SCI (Scalable Coherent Interface) and SLVS (Scalable Low-Voltage Signaling); they mainly differ in electrical characteristics such as voltage swing, common-mode voltage, ground reference and output current [Gol11].
Note that this thesis focuses solely on the use of LVDS-capable sensors in its broadest sense, where the use of the term LVDS refers to LVDS and any of its many derivative transmission standards.
industry-standard interfaces    Increased market demand for sensors with higher image resolution, greater color depth and faster frame rates means that current processor-to-camera sensor interfaces are being pushed to their limits. Bandwidth is not the only important design factor for these interfaces: robustness, cost-effectiveness and scalability also carry significant weight, especially for mobile devices [MIP16a].
As such, the MIPI (Mobile Industry Processor Interface) Alliance was established to develop a new industry-standard interface, and has been rapidly replacing conventional parallel interfaces with its CSI, CSI-2 and CSI-3 (Camera Serial Interface) specifications. These CSI specifications define a physical communication layer and a protocol layer capable of transporting arbitrary sensor data to the receiver [Lim+10].
Figure 13: Simplified fragment of a video data transmission using MIPI CSI-2 (top) and HiSPi (bottom). LP indicates logic high and low using single-ended signaling, while all other signals are LVDS. Green represents image data; all other colors represent control words as defined by the respective protocols.
The MIPI Alliance promotes its CSI specifications as open standards, though one should note that all CSI specifications are confidential. The actual degree of openness and the risk of infringement for these specifications is therefore unclear.1 It is possible that this policy has indirectly resulted in the introduction of HiSPi, an alternative interface standard that is otherwise nearly identical to MIPI CSI-2, designed by Aptina Imaging for use in its own line of LVDS-capable sensors [Sem15a]. As CSI-2 (and thus HiSPi) is currently the most commonly used standard for the class of sensors most relevant to this thesis, we will not further elaborate on CSI, CSI-3 or any other recently introduced standards. Instead, we briefly describe the physical layers of CSI-2 and HiSPi as far as the sensor receiver interface is concerned. In Section 5.2, we further describe the protocol layer of these standards.
The CSI-2 and HiSPi specifications define a physical layer that specifies the transmission medium, the electrical circuitry and the clocking logic used to accommodate a serial bit stream. In the case of CSI-2, this physical layer is referred to as the D-PHY and is composed of between one and four (generally) unidirectional data lanes and one clock lane, capable of transmission in one of two modes known as Low Power (LP) and High-Speed (HS) mode.
In HS mode, each lane is terminated on both the transmitter and receiver side and is driven by an SLVS transmitter, allowing for high-speed bursts of protocol packets. In LP mode, all lanes are instead
1 MIPI is an independent, not-for-profit corporate entity that requires an active membership for the disclosure of any of the confidential specification documents. Memberships require payment of dues ranging from $4,000 to $40,000, yet any company can apply to join [MIP16b].
(a) Physical CMOS pixel layout: active pixel rows and columns surrounded by dark pixels. (b) Active pixels (valid image data) surrounded by horizontal and vertical blanking regions.