

Bachelor Informatica

Object Tracking with

Stochastic Computing

Dennis Vermeulen

June 8, 2018

Supervisor(s): drs. A. van Inge

Informatica
Universiteit van Amsterdam


Abstract

Stochastic Computing has been on the rise in the academic community for its ability to perform specific arithmetic computations at a lower power cost, at the cost of accuracy, compared to Binary Computing. Most recent applications of Stochastic Computing are found in image processing. This thesis researches whether it is possible to create a Stochastic object tracking algorithm that compares consecutive frames. A simple object tracking algorithm is designed using a hybrid of Stochastic and Binary Computing, and it successfully detects a moving object in a video stream with a fixed background. The algorithm also scales linearly in image size, which would enable it to maintain equal performance on larger images.


Contents

1 Introduction
1.1 Research Question
2 Theoretical background
2.1 Stochastic computing
2.1.1 Mathematical Basics
2.1.2 Randomness and Correlation
2.1.3 Basic Components
2.1.4 Applications
2.2 Stochastic Simulation in SystemC
3 Design
3.1 Object Tracking
3.2 Noise Reduction
3.3 Object Prediction
4 Experiments
4.1 Results
4.2 Power usage
5 Conclusions
5.1 Discussion
5.2 Future Research


CHAPTER 1

Introduction

Modern computing has relied almost solely on one system to do calculations: the digital system. This system has evolved to be the most effective system in the current environment, but depending on digital computation also has some drawbacks. Bit flipping is a huge issue in digital systems, where bits differ in significance based on their position in a binary number. A single bit changed by electrical interference could lead to a difference of 2^64 in most modern systems. The further miniaturization of transistors increases the urgency of this issue, as transistors are no longer guaranteed to be reliable[15]. The processing power gained through smaller transistors is enough to offset the performance loss of error-correcting algorithms; this, however, reveals another issue in the current environment: the increased complexity of these chips also increases their power consumption. This is mainly a problem in embedded systems, where resources are far more limited.

Stochastic Computing (SC) is an alternative to Digital Computing (DC). Instead of representing numbers as groups of bits where each bit set to 1 contributes 2^n (with n the position of the bit), it uses streams of bits where the probability of a bit being 1 is the number being represented. For example, the streams (0001), (0100) and (00100100) are all ways to represent the probability 1/4. This representation allows certain calculations, such as multiplication, to be done with less complex architectures than their digital counterparts[2]. Stochastic Computing is also inherently more error tolerant, as each bit has the same significance. For example, if a bit were flipped in the stream (01010101), turning it into (00010101), it would change the number from 1/2 to 3/8, an error of 25%. In the digital system, the same flip would have changed the number from 85 to 21, an error of 76%. Another interesting aspect of SC is how close its data is to analog data[1]. For example, the human neural system works in a similar way: the output of neurons can be seen as a pulse train[4]. These pulses can be directly represented as a Stochastic Number, which could allow calculations on analog data without having to digitize it first (as DC requires).
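The encoding just described is easy to simulate in software. The sketch below is plain Python rather than SC hardware, and the helper names are my own; it decodes a bitstream, multiplies two streams with a bitwise AND, and reproduces the single-bit-flip example from the text:

```python
def sn_value(bits):
    """Decode a stochastic number (SN): the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def sn_multiply(x, y):
    """Multiply two uncorrelated SNs with a bitwise AND gate."""
    return [a & b for a, b in zip(x, y)]

# (0100) is one way to represent the probability 1/4.
assert sn_value([0, 1, 0, 0]) == 0.25

# AND-gate multiplication: 1/2 x 1/2 = 1/4 (streams chosen uncorrelated).
assert sn_value(sn_multiply([1, 1, 0, 0], [1, 0, 1, 0])) == 0.25

# A single flipped bit moves an SN by only 1/n: (01010101) -> (00010101)
# changes 1/2 into 3/8, while the same flip in binary turns 85 into 21.
assert sn_value([0, 1, 0, 1, 0, 1, 0, 1]) - sn_value([0, 0, 0, 1, 0, 1, 0, 1]) == 1 / 8
```

Note that the AND-gate product is only correct when the two streams are uncorrelated, which is exactly the correlation issue discussed in chapter 2.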

SC was originally proposed in the 1960s as an alternative to DC [12], but it had its own share of problems that kept it from performing on a similar level as DC. Technology has since improved, and current requirements are shifting from mainly needing more computing power to needing computing power under tight constraints on power consumption or space. This is a niche where SC could excel.

SC has recently been on the rise in the academic community, in areas where the problems of the DC system are pressing. Currently, one of the biggest advances in this field is on the image processing front. For example, retinal implants made with SC circuits can be made a lot smaller than their binary counterparts[2], while also being more tolerant to the noise commonly found in the analog world. This research explores whether more image- or video-related algorithms can be implemented with SC.


1.1 Research Question

The main question of this research thesis is whether more image-related algorithms can be implemented with Stochastic Computing circuits, and whether this increases performance in any of the following areas: speed, cycles or power consumption. The research will mainly focus on algorithms used to track objects across multiple consecutive images, and on how such an algorithm scales in comparison to its digital counterpart.


CHAPTER 2

Theoretical background

2.1 Stochastic computing

2.1.1 Mathematical Basics

As introduced before, Stochastic Computing (SC) is a system where numbers are defined as the probability of a 1 in a bitstring of length n. More precisely, a stochastic number (SN) X of length n represents the probability

p_X = n1 / n ∈ [0, 1]

with n1 the number of 1s found in X.

With this definition in place it can also be observed that SNs are not unique. Rather, there are (n choose n1) possible representations for each SN with the same accuracy. For example, (0110), (1010) and (10) are all acceptable representations of 0.5. However, in the same way that in the base-10 system there is a difference between 0.1 and 0.10, there is a difference in accuracy between (1100) and (10). In general an SN gets more accurate the longer the string is. This differs from the conventional binary format, where every number represented with the same number of bits has the same accuracy.

The accuracy of an SN also ties directly into its tolerance towards errors. Each bit in an SN has an equal amount of influence on the final outcome; that is, each bit has significance 1/n. As explained before, if n increases, the accuracy goes up, while the significance of each individual bit goes down. In short, increasing the length of an SN yields both a more accurate and a more error-tolerant number. That accuracy and error tolerance are defined by length also underlies the problem that has limited the practical use of SC: for an SN to double its precision, it has to double its total length. This means that a linear increase in precision (in bits) requires an exponential increase in bit-stream length, which increases computing time by a corresponding exponential amount.
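This trade-off can be made concrete with a small Python sketch (the helper names are my own):

```python
def resolution(n):
    """Smallest step an SN of length n can represent (also the single-bit error)."""
    return 1 / n

def stream_length_for_bits(b):
    """Stream length needed so the SN resolution matches b-bit binary resolution 2**-b."""
    return 2 ** b

# Each additional bit of precision doubles the required stream length ...
assert stream_length_for_bits(8) == 2 * stream_length_for_bits(7)
# ... so matching 8-bit binary precision already needs 256 stream bits,
assert stream_length_for_bits(8) == 256
# while doubling the length halves both the step size and the per-bit error.
assert resolution(256) == resolution(128) / 2
```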

2.1.2 Randomness and Correlation

One problem of SNs is their lack of defined randomness. As discussed before, an SN is not unique and has many different representations for the same probability. While this is not a problem for representing a number itself, it can be a problem when doing basic arithmetic. For example, if we take the SNs 4/8 and 6/8 exactly and multiply them using an AND gate, then in an ideal world this calculation would always yield the SN 3/8, as seen in figure 2.1a. However, depending on the placement of the 1s in both SNs, it could also produce 2/8 or 4/8 (figure 2.1b). The result of the above function depends on the correlation between the two SNs. A pair of SNs can have a correlation coefficient in [-1, +1], where +1 means fully correlated with maximum similarity (e.g. two identical SNs) and -1 means fully correlated with minimum similarity (e.g. (1010) and (0101)). A correlation coefficient of 0 is required to keep the SC calculation accurate.

Figure 2.1: AND gate used as a SC multiplier: (a) exact and (b) approximate computation of 4/8 x 6/8 [1]

The Stochastic Computing Correlation (SCC) can be calculated with the following formula[2]:

SCC(X, Y) = (ad - bc) / (n·min(a+b, a+c) - (a+b)(a+c))   if ad > bc
SCC(X, Y) = (ad - bc) / ((a+b)(a+c) - n·max(a - d, 0))   otherwise

where a denotes the number of overlapping 1s, b the number of 1s of X overlapping 0s of Y, c the number of 0s of X overlapping 1s of Y, and d the number of overlapping 0s. As an example, the SCC values for figures 2.1a and 2.1b are:

SCC(S1, S2)_a = (3 - 3) / (32 - 36) = 0
SCC(S1, S2)_b = -6 / (24 - 16) = -0.75

However, the notion that all SNs should be kept at an SCC of 0 is not always correct. A Stochastic circuit with correlated SNs behaves differently from its non-correlated counterpart, but its behaviour is not random[2]. For example, the AND gate, which performs multiplication when the SCC is 0, becomes a function that finds the maximum of both numbers when they are positively correlated, and the minimum when they are negatively correlated. This way circuits can be reused for different purposes, and can compute functions that could not easily be realised without correlation.
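The SCC formula translates directly into code. A Python sketch (function and variable names are my own, following the a, b, c, d counts defined in the text):

```python
def scc(x, y):
    """Stochastic Computing Correlation of two equal-length bitstreams."""
    n = len(x)
    a = sum(1 for i, j in zip(x, y) if (i, j) == (1, 1))  # overlapping 1s
    b = sum(1 for i, j in zip(x, y) if (i, j) == (1, 0))  # 1s of X on 0s of Y
    c = sum(1 for i, j in zip(x, y) if (i, j) == (0, 1))  # 0s of X on 1s of Y
    d = sum(1 for i, j in zip(x, y) if (i, j) == (0, 0))  # overlapping 0s
    num = a * d - b * c
    if num > 0:  # the ad > bc branch
        den = n * min(a + b, a + c) - (a + b) * (a + c)
    else:
        den = (a + b) * (a + c) - n * max(a - d, 0)
    return num / den if den else 0.0

# Identical streams: maximum similarity, SCC = +1.
assert scc([1, 0, 1, 0], [1, 0, 1, 0]) == 1.0
# (1010) against (0101): minimum similarity, SCC = -1.
assert scc([1, 0, 1, 0], [0, 1, 0, 1]) == -1.0
```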

2.1.3 Basic Components

For Stochastic Computing, at least five basic components are required to perform all the necessary computation. The component most commonly associated with SC is the AND gate, which is used to implement multiplication (figure 2.2a).

Using correlation, this gate can also be used to calculate the maximum or minimum of two SNs. Scaled addition is likewise implemented by a small circuit: a multiplexer (MUX), seen in figure 2.2b, together with a third SN computes the function

p_Y = p_S · p_X1 + (1 - p_S) · p_X2

The MUX is unique in that it is not affected by correlation at all. Furthermore, correlated inputs can be used with an XOR gate to create absolute-valued subtraction when positively correlated, and saturating addition when negatively correlated.
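These component behaviours can be checked in a few lines of Python (a software sketch, not a circuit; the helper names are my own). Sorted streams stand in for maximally correlated inputs:

```python
def sorted_sn(ones, length):
    """A 'sorted' SN with all 1s up front: two of these are maximally correlated."""
    return [1] * ones + [0] * (length - ones)

def xor_gate(x, y):
    """With positively correlated inputs, XOR yields |p_x - p_y|."""
    return [a ^ b for a, b in zip(x, y)]

def mux(x, y, select):
    """Scaled addition: p_out = p_s * p_x + (1 - p_s) * p_y."""
    return [a if s else b for a, b, s in zip(x, y, select)]

x, y = sorted_sn(6, 8), sorted_sn(4, 8)            # 6/8 and 4/8
assert sum(xor_gate(x, y)) / 8 == abs(6 / 8 - 4 / 8)

# A select stream of value 1/2 averages the two inputs: (6/8 + 4/8) / 2 = 5/8.
assert sum(mux(x, y, [1, 0] * 4)) / 8 == 5 / 8
```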

The last two components are not always required; if both ends of the calculation stay in the Stochastic format they can be omitted, but it is often necessary to translate from or to a binary format. To go from an SN to a Binary Number (BN), it is only required to count the 1s over a specified bitstring length.


Figure 2.2: implementation of multiplication and scaled addition [13]

Figure 2.3: SC conversion unit: (a) binary-to-stochastic converter, and (b) stochastic-to-binary converter [2]

To go from a BN to an SN, a random number generator must be used over the specified number of cycles. For each bit in the length of the SN, a new random number is generated and compared to the BN. This process is often regarded as the most expensive component of SC, and its use should be kept to a minimum.
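Both converters can be sketched in Python (the names are my own; the B-to-S side uses a software RNG where hardware would typically use an LFSR or similar generator):

```python
import random

def binary_to_stochastic(bn, max_value, length, rng):
    """B-to-S converter: each cycle, emit 1 if a fresh random number is below the BN."""
    return [1 if rng.randrange(max_value) < bn else 0 for _ in range(length)]

def stochastic_to_binary(sn):
    """S-to-B converter: simply count the 1s over the bitstring length."""
    return sum(sn)

bits = binary_to_stochastic(64, 256, 4096, random.Random(0))
# The round trip is approximate, not exact: 64/256 = 0.25 up to stochastic noise.
assert abs(stochastic_to_binary(bits) / 4096 - 0.25) < 0.05
```

The cost asymmetry mentioned in the text is visible here: decoding is a single counter, while encoding needs a fresh random number per output bit.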

2.1.4 Applications

While Stochastic Computing was originally proposed in the 1950s[16], it was not until recently that it slowly progressed into relevancy. A few practical Stochastic Computers were developed at the end of the 1960s [12], but it was also then that the main drawback of SC was recognized: having to choose between either low bandwidth or low precision held SC back. While no practical uses were developed until close to the millennium, a good number of important theoretical findings were made in that time.

Close to the millennium, interest in SC started to come from the neural side of science[9]. SNs have a lot in common with the way neurons communicate: neurons communicate by firing pulses over their axons. An increase in the fire rate of these axons means a stronger signal, i.e. a stronger pain signal or a brighter light being seen. An SN likewise increases in value by having more 1s in its bit stream. With the AI boom of the last decade, Stochastic Neural Networks are becoming an even hotter topic, with multiple new neural network implementations of Stochastic Computing released in the last year[3][11][10].

Another newly discovered practical application of SC is the decoding of low-density parity check (LDPC) and other related error-correcting codes [5]. LDPC-related codes are linear codes mainly used for communicating over busy and error-prone channels, e.g. Wi-Fi. These codes often use long code-words that require lots of resources to decode on conventional hardware. Moreover, the most effective LDPC decoding algorithms are not exact but probabilistic. This is exactly the mix of properties that SC shines at handling.

The newest use of SC is found in the analog world, mainly in image processing. As with neural networks, SNs are closer to an analog signal than to a digital one. With the relatively cheap power cost and small size of Stochastic circuits, SC is able to cheaply parallelize the many small and simple pixel operations, saving both space and power compared to its binary counterpart.


2.2 Stochastic Simulation in SystemC

SystemC[6] is a C++ library that provides an event-driven simulation interface. It is often described as a system-level modeling language, and can be seen as a higher-level language compared to hardware description languages like VHDL[8] and Verilog[7]. SystemC's main building blocks are modules. A module can be seen as a small circuit or gate which executes a function. Modules have ports to communicate with other modules through signals. A signal can be seen as a wire carrying either 1 or 0, or, in advanced modules, even a whole bus. Modules execute their functions based on their sensitivity to a signal: a module can be sensitive to a change of the signal on a wire, but also to a clock. For example, to implement a simple AND gate, two incoming (I) and one outgoing (O) port are required, each bound to one signal. The function for an AND gate would simply be

O1 = I1 & I2

By making the module sensitive to either input signal, it executes the function whenever one of those signals changes and pushes a signal to the outgoing port. In SystemC, such an AND gate could be programmed as follows.

#include "systemc.h"

SC_MODULE(and_gate)              // 'and' itself is a reserved word in C++
{
    sc_in<bool> A, B;            // input signals
    sc_out<bool> F;              // output signal

    void do_and()
    {
        F.write(A.read() && B.read());
    }

    SC_CTOR(and_gate)            // constructor
    {
        SC_METHOD(do_and);       // call this function when activated
        sensitive << A << B;     // set sensitivity to input signals
    }
};

In SystemC a module is a C++ class with some extra default functionality. The function registered with SC_METHOD is called whenever the module gets activated. While it is possible to activate the module manually, activation is mainly triggered by a change in the signals it is sensitive to. In this module the sensitivity is set to A and B, meaning a change in either signal activates the module and calls the function do_and. As explained before, it is also possible to make a module sensitive to a clock. A clock can be initialized by the following code.

sc_clock Clk("Clock", 10, SC_NS, 0.5);

This creates a clock signal that switches every 10 nanoseconds, starting at 0.5 nanoseconds. It is not recommended to start the clock at 0 nanoseconds, as the simulator will still be in its initialization phase at that time. An example of a module that uses a clock is a multiplexer:

#include "systemc.h"

SC_MODULE(mux)
{
    sc_in<bool> A, B;            // input signals
    sc_out<bool> F;              // output signal
    sc_in<bool> Clk;             // clock signal
    bool C;

    void do_mux()
    {
        while (true) {
            if (C) {
                F.write(A.read());
            } else {
                F.write(B.read());
            }
            C = !C;
            wait();
        }
    }

    SC_CTOR(mux)                 // constructor
    {
        C = 0;
        SC_THREAD(do_mux);       // create thread on construction
        sensitive << Clk.pos();  // set sensitivity to the clock signal
    }
};

In the above code, instead of an SC_METHOD, an SC_THREAD is used. This thread runs continuously, but the wait() command halts it until the module is activated by a new clock signal.

When using a clock, SystemC makes sure that each module sensitive to that clock executes at exactly the same time and finishes execution before a new clock tick. This guarantees that a parallel system developed in SystemC behaves like an actual parallel system that executes each module's thread at the same time, even though the simulation hardware might not support that many concurrent threads. The decision to use SystemC as the implementation language for this thesis has multiple reasons. First of all, SystemC modules allow creating simple circuits without requiring knowledge of the actual hardware components or electronic systems. Second, as image processing is done on multiple pixels concurrently, a language is needed that can handle multiple concurrent threads while keeping them synchronized; SystemC's promise to synchronize all modules on a clock cycle fits this exactly. Finally, SystemC is close to actual system design, which would make it easier for future research to port the implementation to an actual circuit or emulate it on a Field Programmable Gate Array (FPGA).


CHAPTER 3

Design

Designing algorithms for a Stochastic system comes with its own set of challenges, the biggest one being that it is not binary. While porting binary algorithms is possible, it is clear that without changes they will not exploit the niches where SC is strong. Because of this, the algorithm will be built from scratch. During development the following requirements were set for the algorithm:

• The algorithm should have Stochastic input and output

• The algorithm should be able to track a moving object in a video during runtime

• The algorithm should be as basic as possible

• The algorithm should be able to scale up to higher image sizes, preferably linearly

The reasoning for these requirements is as follows: the original interest in this research was sparked by issues found in programmable robots used at the University of Amsterdam. These robots could use object tracking to move around, but this consumed so much power and so many resources that the robots were often overheating. If this algorithm were to be used by these robots, it would be of utmost importance that it uses as few resources as possible. A basic algorithm will most likely use fewer resources than a more advanced and precise one. The reason for Stochastic input and output is also to lower the amount of resources and power used, as translating to and from binary is one of the more expensive operations in SC. And as a basic, fast algorithm might not be as accurate as its binary counterpart, it would, instead of returning precise coordinates to the device, return probabilities of where it expects the object to be.

While focusing on keeping resource usage low, one of the first decisions made was to make the algorithm a hybrid between Stochastic and Binary Computing. To register movement, a video processing algorithm must make comparisons between multiple frames. To make this possible it is necessary to at least save a frame in a memory circuit and compare it at the correct time with a new frame. These are all control actions in which SC is not specialized, but Binary Computing is; even incrementing an SN by 1 requires the whole number to be recreated. All image and video processing will therefore be done using SC, with a binary control layer above it. This is comparable to the use of Graphics Processing Units (GPUs) in modern systems: the CPU is able to do all general-purpose operations with a limited amount of concurrency, while the GPU can perform thousands of identical operations on parallel data concurrently.

3.1 Object Tracking

The most basic algorithm for object tracking, or rather movement detection, works by comparing the difference between two frames. When subtracting 2 frames from each other, with a static background, all that is left are the pixels that moved. As this action has to be done for each

pixel individually, it can be parallelized. As SC is based on small efficient components, it is a good candidate for parallel computations, and this will be the basis for the algorithm.

Three parts are required to create this part of the algorithm. First of all, a subtraction unit is needed; it was discussed in chapter 2 that an XOR gate with two maximum-similarity correlated signals acts as an absolute subtraction unit. This means that, second, it is required to obtain two correlated signals, or a circuit that turns SNs into correlated signals. Lastly, a memory unit is required to save at least the previous frame, to subtract from the current frame. The memory unit can be combined with the correlation unit by only pushing the 1s of the incoming SN towards the memory until the SN frame has been fully received. By pushing only the 1s, the SN is sorted so that all the 1s are at the front and the 0s at the back. For example, the SN (0101111001) would be saved in the memory unit as (1111110000). Two of these sorted SNs are maximum-similarity correlated.
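This memory/correlator combination is easy to express in software. A Python sketch (my own names) of the 1s-first storage and the resulting XOR frame difference:

```python
def store_sorted(sn):
    """Memory unit fed only the 1s of an incoming SN: the stored copy has all
    1s at the front, so any two stored frames are maximally correlated."""
    ones = sum(sn)
    return [1] * ones + [0] * (len(sn) - ones)

def pixel_difference(prev_frame, cur_frame):
    """XOR of two sorted pixel streams computes |p_prev - p_cur|."""
    return [a ^ b for a, b in zip(store_sorted(prev_frame), store_sorted(cur_frame))]

# Example from the text: (0101111001) is stored as (1111110000).
assert store_sorted([0, 1, 0, 1, 1, 1, 1, 0, 0, 1]) == [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# A pixel dropping from 6/10 to 3/10 gives a difference stream of value 3/10.
assert sum(pixel_difference([0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
                            [1, 0, 0, 1, 0, 0, 0, 0, 1, 0])) == 3
```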

One issue found with this method is that a new bit in an SN can only be identified when it switches from 1 to 0 or vice versa. The SN (111110000) would appear to have a length of 2, as the circuit cannot perceive bits that do not change. The solution is to synchronize the circuit by means of a clock. By applying the same clock to all circuits it is possible to identify when a new bit is received, and to count the length of the SN in order to cut it into the required frames.

Finally, it was decided to use 3 memory units in rotating order. This way, 2 memory units can be used to calculate the difference while the third saves the currently incoming frame, after which the oldest frame is replaced by a new one.

3.2 Noise Reduction

While the algorithm is at this point able to track moving objects on a static background, it has trouble distinguishing noise from actual objects. Also, the subtraction circuit uses multiple binary control functions (for counting and for deciding which memory module is in use), which increase the resources and power used by the circuit.

Both these issues can be partially solved by creating super-pixels. A super-pixel is a pixel created from a cluster of pixels[17], representing the same characteristics as the pixels underneath it. Super-pixels reduce redundancy in images, and by lowering the total number of pixels they also lower the complexity of concurrent image processing tasks. To reduce the influence of noise, an averaging filter can be used to create the super-pixels. This filter averages the values of a cluster of pixels, which flattens the noise peaks that can be found in an image.

Creating an averaging filter in SC can be done with a multiplexer. In chapter 2 it was already shown that a multiplexer can be used as a weighted adder between 2 SNs. This filter works in the same way, except that the multiplexer has as many inputs as the cluster of pixels that will become a super-pixel[14]. The multiplexer forwards only one input per clock tick and switches input every tick, which gives every input an equal influence on the created super-pixel.

The super-pixels can have overlapping sources so as not to lower the accuracy too much. The size of the super-pixels is also variable.
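The rotating multiplexer can be sketched as follows (plain Python, my own names); each pixel of the cluster is forwarded in turn, so the super-pixel's value is the cluster average:

```python
def superpixel(cluster):
    """Averaging MUX: forward one input stream per clock tick, rotating through
    the cluster so that every pixel gets equal weight."""
    k = len(cluster)
    return [cluster[t % k][t] for t in range(len(cluster[0]))]

# One fully-on pixel among four: the super-pixel averages down to 1/4,
# flattening what would otherwise be a full-strength noise peak.
cluster = [[1] * 8, [0] * 8, [0] * 8, [0] * 8]
assert sum(superpixel(cluster)) / 8 == 0.25
```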

3.3 Object Prediction

The algorithm was designed to use as few resources as possible. However, calculations are still being done on pixels where the object will most likely not be. For example, if the object was found in the upper-left corner, the chance is small that it will show up in the bottom-right corner in the next frame. Implementing simple object prediction could therefore reduce the number of calculations done.

A simple prediction algorithm is implemented in the control unit of the algorithm. It classifies super-pixels as part of an object when the result of the frame subtraction is higher than a set threshold. It then enables only the super-pixels that are classified as an object, plus those directly adjacent to them. All other super-pixels and subtraction units stay inactive until they get


activated. This classifier only activates after a few full frame cycles, to ensure that an object has actually been found.

To make sure that no new objects are missed, the algorithm periodically re-enables the whole range of super-pixel and subtraction units to scan the entire image for objects.
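The control-plane side of this prediction can be sketched in Python (a software model of the enable logic; the grid layout and names are my own assumptions):

```python
def active_superpixels(detections):
    """Enable only super-pixels classified as object, plus their direct neighbours."""
    h, w = len(detections), len(detections[0])
    active = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if detections[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if 0 <= y + dy < h and 0 <= x + dx < w:
                            active[y + dy][x + dx] = True
    return active

# One detection in a 4x4 grid wakes up a 3x3 neighbourhood; the rest stays off.
grid = [[0] * 4 for _ in range(4)]
grid[1][1] = 1
mask = active_superpixels(grid)
assert sum(row.count(True) for row in mask) == 9
assert not mask[3][3]
```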


CHAPTER 4

Experiments

To measure the performance of the designed algorithm, multiple experiments have been performed.

• To determine the accuracy of the algorithm, combinations of SN bit string length, super-pixel size, super-pixel overlap, object threshold and different noise levels are tested to determine what parameters are required for a minimum level of effectiveness

• To determine the scalability of the algorithm, the power usage will be measured at different image resolutions. It is important to note that the simulation software used does not give us any real information about power usage; however, an educated guess can be made based on which and how many circuits are activated.

As no Stochastic sensors exist to date, a video generator was implemented in the simulation software. The software generates a square white image with a black blob that moves in a circle around the center of the image. The blob is an ellipse generated by multiple sine and cosine functions, which also turn the blob on its own axis by facing it towards the center of the image. Figure 4.1 shows 2 frames, 20 frames apart, generated by this video generator. The blob's intensity is at its maximum at the center and tapers off to 50% at the edges. The generated videos can be varied in size, blob size and noise level; different noise levels can be seen in figure 4.2. The noise function changes each Stochastic pixel by randomly setting a specified percentage of the bits in the bit string to 1.
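The noise model is simple enough to state precisely. A Python sketch (my own names; whether already-set bits may be chosen again is my assumption, here avoided by sampling distinct positions):

```python
import random

def add_noise(sn, fraction, rng):
    """Randomly set the given fraction of the stream's bits to 1."""
    noisy = list(sn)
    for i in rng.sample(range(len(sn)), int(len(sn) * fraction)):
        noisy[i] = 1
    return noisy

# 10% noise on a 64-bit all-zero pixel stream sets exactly 6 bits.
assert sum(add_noise([0] * 64, 0.10, random.Random(7))) == 6
```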


Figure 4.2: Noise levels from left to right 5%, 10%, 15%, 20%

In the first experiment the following parameters of the algorithm will be tested using different levels of noise:

• Super-pixel size, meaning the width of the cluster of pixels that the super-pixel represents. Increasing this size lowers the amount of super-pixels

• Super-pixel overlap, meaning the amount of pixels in 1 axis of the super pixels that overlap (i.e. an overlap of 1 with a super-pixel size of 3 would place the center of each super pixel 2 apart and each side would overlap with another super-pixel). Increasing this increases the amount of super-pixels

• Prediction Threshold, meaning the minimum value of the returned SN that is accepted as a moving object.

• Bit string length, meaning the length of the SN that is taken as a single frame. Increasing this should increase accuracy.


4.1 Results

Figure 4.3: Noise levels from left to right 5%, 10%, 15%, 20%, default configuration

Figure 4.4: Noise levels from left to right 5%, 10%, 15%, 20%, super-pixel size 4

Figure 4.5: Noise levels from left to right 5%, 10%, 15%, 20%, super-pixel size 4, overlap 2

Figure 4.3 shows how the algorithm performs with the default configuration: super-pixel size 3, super-pixel overlap 1, prediction threshold 15%, SN bit string length 64. The default configuration successfully deals with the noise, showing only the tracked object.

When the super-pixel size is increased with a low amount of overlap (figure 4.4), it is observed that high levels of noise do not get fully cancelled out. Most likely this is because the super-pixels now average out around the noise percentage, which is equal to the threshold of the prediction algorithm.

Even with increased overlap (and thus double the number of super-pixels), the algorithm misses the object at a noise level of 15% or more. This suggests that increasing the number of super-pixels does not make the algorithm more accurate at high noise levels. At low noise levels it looks slightly more accurate.


Figure 4.6: Noise levels from left to right 5%, 10%, 15%, 20%, super-pixel size 2

Figure 4.7: Noise levels from left to right 5%, 10%, 15%, 20%, prediction threshold 5%

Figure 4.8: Noise levels from left to right 5%, 10%, 15%, 20%, prediction threshold 10%

Smaller super-pixels are more accurate at lower noise levels, but are more susceptible to noise peaks at higher noise levels. This makes sense, as they have less than half the area of even the size-3 super-pixel to smooth over the peak.

A low prediction threshold (figure 4.7) does not appear to filter out any noise. It is of note that it might still be possible to detect the object, as the values of the tracked blob should still be higher than the noise.

As expected, increasing the prediction threshold (figure 4.8) clears out a lot of noise at the lower levels, but it still does not help at higher noise levels.


Figure 4.9: Noise levels from left to right 5%, 10%, 15%, 20%, prediction threshold 20%

Figure 4.10: Noise levels from left to right 5%, 10%, 15%, 20%, SN bit string 32

Figure 4.11: Noise levels from left to right 5%, 10%, 15%, 20%, SN bit string 128

A prediction threshold of 20% is slightly too high (figure 4.9). This is probably caused by the fact that the blob object being tracked tapers off in intensity at its edge, meaning that the change in values between 2 frames will barely be higher than 30; add a noise difference to that and it drops below the threshold.

Lowering the bit string length lowers the accuracy of the SN (figure 4.10) and makes it less resilient to noise. This is in line with the principles of SC.

Increasing the bit string length increases the accuracy of the SN (figure 4.11). The bit string length, however, is also the biggest factor in the time between frames: doubling it halves the frame rate of the algorithm.


4.2 Power usage

To calculate the power usage of the algorithm, an energy value has to be assigned to each component, as the simulator can track each activated component. If a simple component like a clock tick or an AND gate is 1 energy value, then a binary counter can be seen as 4, because it uses 4 simple components. The super-pixel can be seen as 5 energy values, with 1 gate and 1 counter. The subtraction unit has a counter, 1 input, 2 memory outputs, an XOR gate and 3 AND gates for the memory banks, which adds up to a value of 11. The control unit also has a counter for each super-pixel. This is certainly not fully accurate, but it gives an idea of how the power usage scales at higher resolutions.
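This cost model can be written down directly. A Python sketch using the energy values assumed above (the per-super-pixel grouping is my own reading of the text):

```python
# Energy values assumed in the text: simple component = 1, counter = 4.
AND_GATE, COUNTER = 1, 4
SUPER_PIXEL = AND_GATE + COUNTER       # 1 gate + 1 counter = 5
SUBTRACTION_UNIT = 11                  # counter, input, 2 memory outputs, XOR, 3 ANDs
CONTROL = COUNTER                      # control unit: one counter per super-pixel

def energy(width, superpixel_size):
    """Per-frame energy estimate for a square image of the given width."""
    n_super = (width // superpixel_size) ** 2
    return n_super * (SUPER_PIXEL + SUBTRACTION_UNIT + CONTROL)

# Quadrupling the pixel count quadruples the energy: constant energy per pixel.
assert energy(128, 4) == 4 * energy(64, 4)
```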

Figure 4.12: Energy values over image resolution

Figure 4.12 shows that the energy used per pixel does indeed scale linearly, meaning that the algorithm will not lose performance as long as enough hardware can be used.


CHAPTER 5

Conclusions

Interest in Stochastic Computing has been on the rise in the academic community, as it is believed that it can perform specific arithmetic computations at a lower power cost, at the cost of accuracy. The most recent applications of Stochastic Computing have been in the image processing field. This paper discusses the possibility of using this technology to create an object tracking algorithm that processes multiple images in a stream to track a moving object. The original research question was whether such an algorithm is possible, how it performs and how it scales with image size. The question of performance is not answered directly, as it was not possible to compare the simulation results to real-world algorithms due to time constraints. First of all, the results show that an object tracking algorithm can be created with Stochastic Computing: the algorithm is able to detect a moving object in a video stream with a fixed background at different noise levels. The algorithm can also easily be made more or less noise tolerant when needed, at the cost of performance. The results further show that the algorithm's power consumption scales linearly with the image size, meaning that, as long as it is feasible to add an equal number of circuits per pixel, it will perform equally on small and large images.

5.1

Discussion

While this thesis concluded that it is possible to create an object tracking algorithm with Stochastic Computing, it is important to consider critically whether this algorithm is one that really fits Stochastic Computing. As discussed in the theory, SC mainly excels in 2 specific operations: the AND gate, which multiplies 2 SNs, and the multiplexer, which creates weighted adders. Comparing 2 SNs, however, is not an operation in which SC excels. Comparing or subtracting 2 numbers requires either converting the numbers back to binary or making them correlate positively, and these 2 operations are nearly identical in cost. This thesis tried to make up for the cost of those operations by using a control plane that controls which circuits are activated, lowering the number of these expensive operations. However, a control plane that has to compare each number to a certain threshold impacts the performance in the same manner. Furthermore, the circuits that may be deactivated are still required at the start of the algorithm and take up space on the chip. Another issue with this algorithm is that it is static: in this Stochastic implementation it is not possible to change any of the performance-related variables, such as the number of super-pixels, after the chip has been created. This reduces the flexibility of the algorithm, which a Stochastic implementation would need to be successful.


5.2

Future Research

For further research, the most important step would be to take the algorithm out of the simulator. To obtain real-world comparisons with other implementations, it is necessary to have the algorithm running on actual chips. Field Programmable Gate Arrays (FPGAs) can help bring the algorithm into the real world, but as discussed by [2], FPGAs still do not deliver real-world performance compared to creating the chip in silicon. Another prerequisite for taking this algorithm out of the simulator would be the creation of Stochastic sensors that deliver images in a stochastic format directly, without first converting them to a digital format and back.

For the future I think it is also important to look further into the Binary-Stochastic hybrid and see how much image processing can be done stochastically on an image to reduce the load on the binary object tracking algorithm. I think Stochastic Computing has a promising future as long as it is used for the niches it excels in, and perhaps one day there will be a Stochastic Processing Unit, just as we have a Graphical Processing Unit now.


Bibliography

[1] A. Alaghi and J. P. Hayes. “Survey of stochastic computing”. In: ACM Transactions on Embedded Computing Systems (TECS) 12.2s (2013), p. 92.

[2] Armin Alaghi. “The Logic of Random Pulses: Stochastic Computing”. PhD thesis. University of Michigan, 2015.

[3] A. Ardakani et al. “VLSI implementation of deep neural network using integral stochastic computing”. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25.10 (2017), pp. 2688–2699.

[4] T. H. Bullock. “Neuron Doctrine and Electrophysiology”. In: Science 129.3355 (1959), pp. 997–1002. issn: 00368075, 10959203. url: http://www.jstor.org/stable/1757040.

[5] V. C. Gaudet and A. C. Rapley. “Iterative decoding using stochastic computation”. In: Electronics Letters 39.3 (2003), pp. 299–301.

[6] “IEEE Standard for Standard SystemC Language Reference Manual”. In: IEEE Std 1666-2011 (Revision of IEEE Std 1666-2005) (2012), pp. 1–638. doi: 10.1109/IEEESTD.2012.6134619.

[7] “IEEE Standard for Verilog Hardware Description Language”. In: IEEE Std 1364-2005 (Revision of IEEE Std 1364-2001) (2006), pp. 1–560. doi: 10.1109/IEEESTD.2006.99495.

[8] “IEEE Standard VHDL Language Reference Manual”. In: IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002) (2009), pp. c1–626. doi: 10.1109/IEEESTD.2009.4772740.

[9] Y. Kim and M. A. Shanblatt. “Architecture and statistical model of a pulse-mode digital multilayer neural network”. In: IEEE Transactions on Neural Networks 6.5 (1995), pp. 1109–1118.

[10] V. T. Lee et al. “Energy-efficient hybrid stochastic-binary neural networks for near-sensor computing”. In: Proceedings of the Conference on Design, Automation & Test in Europe. European Design and Automation Association, 2017, pp. 13–18.

[11] A. Morro et al. “A stochastic spiking neural network for virtual screening”. In: IEEE Transactions on Neural Networks and Learning Systems (2017).

[12] W. J. Poppelbaum, C. Afuso, and J. W. Esch. “Stochastic Computing Elements and Systems”. In: Proceedings of the November 14–16, 1967, Fall Joint Computer Conference. AFIPS ’67 (Fall). Anaheim, California: ACM, 1967, pp. 635–644. doi: 10.1145/1465611.1465696. url: http://doi.acm.org/10.1145/1465611.1465696.

[13] W. Qian and M. D. Riedel. “The synthesis of robust polynomial arithmetic with stochastic logic”. In: Proceedings of the 45th Annual Design Automation Conference. ACM, 2008, pp. 648–653.

[14] O. Rompelman and H. H. Ros. “Coherent averaging technique: A tutorial review Part 1: Noise reduction and the equivalent filter”. In: Journal of Biomedical Engineering 8.1 (1986), pp. 24–29.

[15] J. Sartori, J. Sloan, and R. Kumar. “Stochastic computing: embracing errors in architecture and design of processors and applications”. In: Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, 2011, pp. 135–144.

[16] J. von Neumann. “Probabilistic logics and the synthesis of reliable organisms from unreliable components”. In: Automata Studies 34 (1956), pp. 43–98.
