Bachelor Informatica

Stochastic processing for computer vision applications

Roene Moolhuizen

June 29, 2018

Informatica
Universiteit van Amsterdam


Abstract

In recent years the design challenges for integrated circuits have shifted towards the resource consumption of circuits. The stochastic processing paradigm is able to address the three major bottlenecks that digital processing faces. The low gate-level cost of stochastic circuits, combined with their native error tolerance, makes them viable alternatives to conventional digital circuitry. This research proposes stochastic hardware for the Sobel operator, a stochastic comparator, and a stochastic median filter. In addition, we demonstrate the usage of stochastic bundle processing for image processing applications. Lastly, we propose hybrid circuitry that uses both stochastic and digital logic for the purpose of object tracking.


Contents

List of Figures 7
List of Tables 9
List of Abbreviations 11

1 Introduction 13

2 Theoretical background 17
2.1 Logical Circuits . . . 17
2.2 Digital processing . . . 18
2.2.1 Binary . . . 19
2.2.2 Digital Arithmetic . . . 19
2.3 Analog processing . . . 19
2.4 Stochastic processing . . . 20

2.4.1 Theory and definition . . . 20

2.4.2 Formats . . . 20

2.4.3 Correlation . . . 21

2.4.4 Stochastic Operations . . . 22

2.4.5 Stochastic Hardware . . . 24

2.4.6 Variations on Stochastic Computing and Their Applications . . . 25

2.4.7 Stochastic Circuitry for Roberts Cross Operator . . . 26

3 My work 27
3.1 Stochastic Numbers . . . 27

3.1.1 Stochastic Precision . . . 27

3.1.2 Stochastic Correlation . . . 27

3.2 Stochastic Image Processing . . . 28

3.2.1 Stochastic Circuitry for the Sobel Operator . . . 28

3.2.2 Stochastic Number Sorting . . . 29

3.3 Stochastic Video Processing . . . 29

3.3.1 Background Subtraction . . . 30

3.3.2 Computing Super-pixels . . . 30

3.3.3 Blob Analysis . . . 31

3.4 Stochastic and Digital Hardware Comparison . . . 32

3.4.1 Propagation Delays Expressed in Amounts of Standard Units . . . 32

3.4.2 Video Processing Circuit Overview . . . 33

4 Experiments 35
4.1 Stochastic Numbers . . . 35

4.1.1 Stochastic Precision . . . 35

4.1.2 Stochastic Autocorrelation . . . 36


4.2.2 Stochastic Roberts cross . . . 38

4.2.3 Stochastic Sobel operator . . . 38

4.2.4 Stochastic Median Filter . . . 39

4.2.5 Stochastic Frame Difference . . . 40

4.3 Stochastic Video Processing . . . 40

5 Final remarks 41
5.1 Conclusion . . . 41

5.2 Future Research . . . 41

Appendices 47


List of Figures

1.1 The end of Dennard scaling . . . 13

1.2 An analog signal compared to a digital signal . . . 14

2.1 Logic gates . . . 17

2.2 Analog comparator . . . 18

2.3 Discrete representation of a 2D function . . . 18

2.4 Full-adder . . . 19

2.5 Circuit for generating SNs with a specified SCC. . . 22

2.6 Stochastic multiplication . . . 22

2.7 Multiplexer . . . 23

2.8 Stochastic division circuit . . . 23

2.9 Stochastic square root circuit . . . 24

2.10 Stochastic circuit for Roberts cross operator . . . 26

3.1 Proposed stochastic Sobel circuit . . . 29

3.2 The stochastic implementation of the 3x3 median filter . . . 29

3.3 Proposed stochastic comparator . . . 30

3.4 Stochastic frame difference . . . 30

3.5 Calculating super-pixel values . . . 31

3.6 von Neumann’s neighbourhood for P with Manhattan distance r = 1 . . . 31

3.7 Full-adder . . . 32

3.8 Stochastic video processing circuitry overview . . . 33

4.1 Index of dispersion for length of stream . . . 35

4.2 Original image . . . 36

4.3 Mean absolute errors for image representation . . . 37

4.4 Binary vs. stochastic accuracy for different error rates . . . 37

4.5 Mean absolute errors for different error rates . . . 37

4.6 Mean absolute errors for stochastic Roberts cross . . . 38

4.7 Stochastic Roberts cross for varying bit-stream lengths . . . 38

4.8 Mean absolute errors for stochastic Sobel operator . . . 39

4.9 Stochastic Sobel for varying bit-stream lengths . . . 39

4.10 Stochastic median filter . . . 39

4.11 Multiplexing frame difference super-pixels . . . 40


List of Tables

2.1 Effects of correlation between SNs in unipolar form . . . 21

2.2 Advantages and Disadvantages of Stochastic Computing . . . 25

3.1 Hardware costs for stochastic and digital processing . . . 33

4.1 Approximate measure of autocorrelation for different inputs . . . 36

A.1 Mean absolute errors for image representations . . . 49

A.2 Mean absolute errors for different error rates . . . 49

A.3 Mean absolute errors for stochastic Roberts cross . . . 49


Abbreviations

BB Bounding box.
ILP Instruction level parallelism.
MAE Mean absolute error.
PP Progressive precision.
ROI Regions of interest.
SC Stochastic computing.
SCC Stochastic correlation coefficient.
SN Stochastic number.


CHAPTER 1

Introduction

Motivation

Over the past few decades computer components have become ever smaller, and with that, digital computers have become ever faster. However, digital computers still face challenges today [33]. The three primary obstacles that digital processing faces are known as the memory wall, the instruction level parallelism (ILP) wall, and the power wall. The memory wall is the increasing gap between the speed of computer memory and that of processing units; a faster processor does not do much good if the memory cannot keep up and provide it with the required information to process (the I/O bottleneck). The second wall, the ILP wall, represents the difficulty of finding enough work for a single thread; higher parallelism is only beneficial if it can be actively utilized. Finally, there is the power wall; decreasing the size of circuitry does not make a difference if it cannot function due to heat dissipation constraints. Both the power wall and the ILP wall can be seen at work in figure 1.1: the single-thread performance and the typical power of circuits seemingly stabilize after 2010. The stochastic processing paradigm, which will be explored in this paper, is able to address the three walls that processing faces today.


Background

Back in 1965 the co-founder of Intel, Gordon Earl Moore, published a paper predicting that the number of components per integrated circuit would double every year [29]. The prediction that Moore made, based on observing the growth in the number of transistors, resistors, and capacitors on computer chips from 1959 to 1965, has become a widely accepted indication of the growth rate of processing units. Figure 1.1 shows the performance indicators of processing units over the past few decades. The top line in figure 1.1 shows the exponential increase in the number of transistors on a computer chip over the past 40 years. Though the implications of this reduction in component size are very much observable in everyday life, some other aspects of computing have difficulty keeping the same pace. Over the years researchers have looked at different processing paradigms in order to address the shortcomings observed in digital processing [11][17][19][38]. In this paper we will focus on the stochastic processing paradigm [13] and how it may be used for efficient video and image analysis.

Digital Processing

Digital processing is the processing of information in discrete, or finite, form. It operates on finite data that, in the case of most computers, is represented in binary, using 1's and 0's (bits). One commonly accepted manner of modelling real-world phenomena is to represent the information in a discrete manner. For example, we can describe the amount of time a process takes, the intensity of a color, or the distance between two objects as a discrete number. From this discrete representation of data it is possible to simulate a system that emulates real-world behaviour. In spite of the quantization errors that come paired with sampling data, digital processing is able to guarantee stable results through the usage of enough bits and carefully constructed algorithms. In addition, the uniformity of the binary format allows for stable processing across different computational systems. These observations historically resulted in digital systems becoming the industry standard for processing information. Other methods, such as analog and stochastic processing, got left behind, since they were relatively difficult to design, build, operate, and maintain when compared to digital computers.

Figure 1.2: An analog signal compared to a digital signal

Analog Processing

The analog, or continuous, computer is a processing paradigm that takes the opposite approach to the digital one. Instead of operating on sampled discrete data, the analog computer operates on non-sampled natural signals, which can be infinitely fine-grained. Figure 1.2 illustrates the difference between an analog and a digital signal and shows how an analog signal may be sampled at different rates to increase the accuracy of the digital representation. A critique of digital processing is that it requires data to be sampled in the first place, and that this causes overhead in the processing of information. The large strides that digital computers made in the 1960s in the areas of programmability, algorithmic operation, ease of storage, and precision of storage, in combination with the improvements made in the field of semiconductors, led to the decline of analog computers [6].

Research

Though advancements have been made in addressing the obstacles that digital processing faces, some barriers cannot easily be addressed using conventional design approaches. Energy and power constraints are a major challenge for the design of integrated circuits for mobile embedded systems, e.g. digital eye implants. These types of devices are not able to provide circuits with all the power they may need, due to battery capacities and the physical limitations of their circuits. If a program does not find a way to deal with the limited resources available to it, it will not be able to serve its intended purpose. In this paper the stochastic processing paradigm, which existed before the takeoff of digital processing, will be examined. We start by performing a survey of the existing literature surrounding the stochastic processing paradigm. In the next chapter we discuss logical circuits, digital processing, analog processing, and stochastic variables. After illustrating the theoretical context, we present our findings on stochastic systems and their applications, and present numerical data on their performance in comparison to digital counterparts.


CHAPTER 2

Theoretical background

2.1

Logical Circuits

Both analog and digital computers have in common that they function on the basis of electronic systems, which in turn are built from logical circuits. There are several symbolic notations for logic gates; the one used for the purposes of this paper is IEEE Std 91-1991 [18]. Logic gates can be seen as a physical representation of Boolean algebra. Figure 2.1 shows an overview of the different basic logic gates [21]. The time sequence diagram is especially of interest, since it shows the behaviour of continuous signals under a specific logic operator. This will be used extensively later in this paper, when stochastic numbers are introduced.


Analog comparator

For the purposes of understanding the circuits discussed in this paper, one additional component needs to be introduced: the comparator. The comparator is commonly used in the conversion of analog signals to stochastic signals [15]. It is a device that, as the name implies, compares two analog input voltages V+ and V− and outputs a binary digital output Vo:

Vo = 1, if V+ > V−
Vo = 0, if V+ < V−

Figure 2.2 shows the symbolic notation of a comparator; this notation has some variations, in which a < or COMP is written instead of +/−. Within stochastic computing the comparator is often used in the generation of stochastic numbers.

Figure 2.2: Analog comparator
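The comparator-based generation scheme above can be mimicked in software. The following sketch (Python is our choice here; the thesis itself specifies no code) compares a target probability against fresh uniform random samples, just as the hardware compares an input voltage against a random voltage:

```python
import random

def generate_sn(p, n, rng=None):
    """Generate a stochastic number of length n encoding probability p.

    Each clock cycle, the comparator outputs 1 when the target value p
    exceeds a fresh uniform random sample -- the software analogue of
    comparing V+ (the input) against V- (a random voltage).
    """
    rng = rng or random.Random(0)
    return [1 if p > rng.random() else 0 for _ in range(n)]

def sn_value(bits):
    """Decode a unipolar SN: the fraction of 1's in the stream."""
    return sum(bits) / len(bits)

bits = generate_sn(0.5, 10_000)
print(round(sn_value(bits), 2))  # close to 0.5
```

The function names and stream length are illustrative choices, not part of any referenced design.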

2.2

Digital processing

Digital systems manipulate data in discrete chunks. However, most data is not discrete by nature. In order to convert continuous analog signals from the real world to digital signals, the analog signals need to be made discrete. Continuous signals can be converted to a discrete representation by sampling the signal at different intervals. Figure 2.3 illustrates how a continuous image in nature can be seen as a 2D function f(x, y). In digital processing these continuous signals are sampled at specific x and y points in order to obtain a discrete intensity value at every pixel.


2.2.1

Binary

Within digital electronics the discrete samples are most commonly stored in the binary format. This is because the binary format closely resembles the underlying logical circuitry of digital hardware. The binary format is similar to the decimal way of representing numbers, except that it uses base 2 instead of 10. Base 2 can be described by 1's and 0's in hardware. After a signal is sampled and represented in binary, the sampled data can be processed within digital hardware.

2.2.2

Digital Arithmetic

Once a signal is sampled and converted to binary, it can be processed in digital hardware. The number of bits required to represent a decimal number n is roughly log2(n). A digital value is built up of multiple individual bits that need to be processed in combination in order to compute a derivative of the original value. This manner of representing data requires multiple logic gates for every n-bit number, since bits of the same magnitude need to be compared to one another and arithmetical circuits need to be able to carry any overflow of bits. An example of such a circuit is the full-adder.

Full adder

Figure 2.4 shows the circuitry required to add binary numbers, called a full-adder. Since the binary representation of numbers consists of a combination of bits with varying significance, digital circuitry needs to account for these varying degrees of significance. For the addition of binary numbers each bit-adder requires information from adjacent adders, which is communicated through the Cin and Cout channels. The propagation delay caused by full adders may be avoided through the usage of carry-lookahead adders, at the expense of additional hardware [39].

Figure 2.4: Full-adder
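The gate-level behaviour of the full-adder, and the ripple-carry chain built from it, can be sketched in a few lines of Python (an illustrative software model, not a hardware description):

```python
def full_adder(a, b, cin):
    """One-bit full adder from the gate-level Boolean expressions:
    sum = a XOR b XOR cin, carry-out = majority(a, b, cin)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add(x_bits, y_bits):
    """Ripple-carry addition: each full adder passes its Cout to the
    next adder's Cin, least-significant bit first."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

# 6 (110) + 3 (011), given LSB-first: [0,1,1] + [1,1,0]
bits = ripple_add([0, 1, 1], [1, 1, 0])
print(sum(b << i for i, b in enumerate(bits)))  # 9
```

The serial carry dependency visible in the loop is exactly the source of the propagation delay mentioned above.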

2.3

Analog processing

Early analog computers used continuously changing physical quantities to model the problem being solved. The term analog comes from the analogy between the computational and primary processes. This historical definition no longer applies, since digital processing also has this characteristic. The defining characteristic of analog processing has become the usage of continuous representations, as opposed to digital processing, which uses discrete representations. It has been argued [25] that post-Moore's-law processing will require computational processes to represent the physical processes that realize them, since this would reduce the overhead involved in sampling analog signals and synchronizing their binary representations. This prediction would require leaving the status quo of binary electronics and exploring the area of analog circuit design. For the purposes of this paper we will look at a specific form of analog processing called stochastic processing.


2.4

Stochastic processing

Stochastic computing (SC) is the collective name for techniques that represent continuous signals as streams of bits. Complex computations on these streams can be made in an effective fashion through the use of "digital" logical circuitry. Stochastic computing is sometimes called semi-digital processing, since it transforms deterministic computations in the Boolean domain into probabilistic computations in the real domain {x ∈ IR | 0 ≤ x ≤ 1}. The logic behind stochastic computing was first described in 1953 [31] in a paper by John von Neumann on the fundamental concepts of probabilistic logic design. A decade after von Neumann's paper the technology of computing caught up to the theoretical concept and several engineers began looking into SC [36]. However, with the many advancements that were made in semiconductor technology and digital processing, it eventually got left behind together with other analog computers. SC is unique in that information is represented and processed in the form of probabilities. In recent years stochastic computing has gained interest, since it has been successfully applied to the areas of machine learning and control [12], the decoding of error-correcting codes [14], and image processing tasks such as edge detection [4] and image thresholding [30]. The intuition behind SC is that it provides an energy-efficient alternative to conventional binary processing, since probabilistic arithmetic can be done through minimal circuitry and power consumption [39].

2.4.1

Theory and definition

Stochastic numbers (SNs) represent probabilities as streams of bits, in which the number of 1's divided by the length of the stream is equal to the probability being represented. A SN X of length N represents a probability px = N1/N ∈ [0, 1], in which N1 is equal to the number of 1's in the bit-stream. For example: px = 1/2 can be described as X = 011001 = 3/6, in which N = 6 and N1 = 3. From this example it follows intuitively that there are multiple ways of representing the number X by changing the order of the bit-stream. The numbering system for SC is redundant: there are (N choose N1) representations for every number N1/N. Another characteristic is that the probabilities that SNs are able to represent are limited by the value of N; e.g. a bit-stream of length 10 is not able to accurately represent the probability 1/11. A bit-stream of length N is only able to describe the set {0/N, 1/N, ..., (N−1)/N, N/N}. We can define a SN as follows:

Definition 1. A binary sequence of length N is a SN X of value px = N1/N, where N1 is the number of 1's in X.

A more intuitive way of understanding this might be that the probability represented by a SN is the chance of finding a 1 at a randomly chosen position in the stream. The probability that each bit in the stream X is equal to 1 is defined as P(X = 1) = px. An additional property inherent to SNs is their progressive precision (PP): the precision of a SN increases the longer it is computed on. In this context the most significant bits arrive before the least significant bits, unlike digital arithmetic, where the most significant bits usually arrive last.
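Both the redundancy and the progressive precision of SNs can be checked directly in software. A small sketch (illustrative Python, not from the thesis):

```python
from itertools import combinations
from math import comb

# Redundancy: every placement of N1 ones among N positions encodes the
# same value, so a SN with N = 6, N1 = 3 has C(6,3) = 20 representations.
N, N1 = 6, 3
reps = {tuple(1 if i in ones else 0 for i in range(N))
        for ones in combinations(range(N), N1)}
assert len(reps) == comb(N, N1) == 20

# Progressive precision: any prefix of the stream is already a
# (coarser) estimate of the same value.
stream = [0, 1, 1, 0, 0, 1]
for k in (2, 4, 6):
    print(k, sum(stream[:k]) / k)
```

For this particular stream every prefix happens to estimate 0.5; in general the prefix estimates converge towards px as the prefix grows.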

2.4.2

Formats

There are two formats for SNs, i.e. manners in which stochastic bit-streams may be interpreted, called the unipolar and bipolar format.

Unipolar

The standard form of a stochastic variable is the unipolar form. In this form a stochastic variable represents a value px ∈ [0, 1]. This manner of denoting stochastic variables offers the highest amount of precision, but it has the limitation of only being able to describe positive values. The value px may be interpreted as a pulse rate, signal intensity, or frequency, depending on the context of the signal. The unipolar form is able to describe neurological activity and may be interesting for the purposes of medical applications [9].


Bipolar

In his work Gaines [13] proposed a different format for representing SNs, called the bipolar format. In the unipolar coding format a SN is capable of describing the range 0 ≤ px ≤ 1. The range of SNs may be extended to −1 ≤ py ≤ 1 by using the bipolar format. We can transform a signal to the range −1 to 1 by using the function Y = 2X − 1. In the bipolar format the probability that each bit in the stream is one is P(X = 1) = (x + 1)/2. The advantage of the bipolar format is that it allows for negative SNs. The cost, however, is that it only encodes half the precision of an ordinary SN, since half of its range is reserved to represent negative values. For example: the SN 010001 in unipolar format would be 2/6 = 1/3. In bipolar format this would get transformed to Y = 2 × 1/3 − 1 = −1/3.
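The worked example above can be reproduced with a two-line decoder (an illustrative Python sketch):

```python
def unipolar_value(bits):
    """Unipolar reading: the fraction of 1's in the stream."""
    return sum(bits) / len(bits)

def bipolar_value(bits):
    """Bipolar reading of the same bit-stream: y = 2x - 1 maps the
    unipolar range [0, 1] onto [-1, 1]."""
    return 2 * unipolar_value(bits) - 1

bits = [0, 1, 0, 0, 0, 1]     # the example stream from the text
print(unipolar_value(bits))   # 2/6 = 1/3 in unipolar form
print(bipolar_value(bits))    # 2*(1/3) - 1 = -1/3 in bipolar form
```

The same physical stream thus carries a different value depending on the chosen interpretation, which is why the format must be fixed per circuit.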

2.4.3

Correlation

The function that a logic gate implements on SNs changes depending on the correlation coefficient of its inputs. The effects of autocorrelation may cause a stochastic system to exhibit unintended behaviour. In recent studies [2] it has been found that the correlation of SNs does not always have a harmful effect on the circuit. Some operations that would otherwise require sequential hardware can be implemented through single logic gates when two SNs are correlated.

Correlation coefficient

The correlation coefficient between two n-bit SNs X and Y is given by definition 2, in which a is the number of overlapping 1's of X and Y, b the number of overlapping 1's of X with 0's of Y, c the number of overlapping 0's of X with 1's of Y, and d the number of overlapping 0's of both SNs. From these definitions it also follows that a + b + c + d = n. The proposed function for the stochastic correlation coefficient (SCC) is as follows [2]:

Definition 2.

SCC(X, Y) = (ad − bc) / (n · min(a + b, a + c) − (a + b)(a + c))   if ad > bc
SCC(X, Y) = (ad − bc) / ((a + b)(a + c) − n · max(a − d, 0))       otherwise
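Definition 2 translates directly into code. The following sketch (illustrative Python; the zero-numerator guard is our addition, since the formula's denominator is not needed when ad = bc) counts the four overlap cases and applies the two branches:

```python
def scc(x, y):
    """Stochastic correlation coefficient of two equal-length
    bit-streams, following Definition 2: a = overlapping 1's,
    b = 1's of X over 0's of Y, c = 0's of X over 1's of Y,
    d = overlapping 0's."""
    n = len(x)
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = n - a - b - c
    if a * d == b * c:
        return 0.0  # uncorrelated case; avoids a 0/0 division
    if a * d > b * c:
        return (a * d - b * c) / (n * min(a + b, a + c) - (a + b) * (a + c))
    return (a * d - b * c) / ((a + b) * (a + c) - n * max(a - d, 0))

print(scc([1, 1, 0, 0], [1, 1, 0, 0]))  # identical streams: 1.0
print(scc([1, 1, 0, 0], [0, 0, 1, 1]))  # complementary streams: -1.0
```

Identical streams yield SCC = +1 and complementary streams SCC = −1, matching the extremes used in table 2.1.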

Correlation Table

Table 2.1 shows how the function of logic gates changes depending on the SCC between their inputs. The min, max, and absolute-valued subtraction are of special interest, since these operations require sequential logic to implement for uncorrelated SNs [2].

Gate       | SCC = -1.0            | SCC = 0.0        | SCC = 1.0
AND(A,B)   | max(A + B - 1, 0)     | AB               | min(A, B)
OR(A,B)    | min(A + B, 1)         | A + B - AB       | max(A, B)
NAND(A,B)  | 1 - max(A + B - 1, 0) | 1 - AB           | 1 - min(A, B)
NOR(A,B)   | 1 - min(A + B, 1)     | 1 - A - B + AB   | 1 - max(A, B)
XOR(A,B)   | min(A + B, 1)         | A + B - 2AB      | |A - B|
XNOR(A,B)  | 1 - min(A + B, 1)     | 1 - A - B + 2AB  | 1 - |A - B|

Table 2.1: Effects of correlation between SNs in unipolar form

The paper [2] also found that the function of a circuit at an SCC between 0 and +1, or between 0 and −1, can be described through the function:

pz(px, py) = (1 − SCC(X, Y)) · F0(px, py) + SCC(X, Y) · F+1(px, py)   if SCC(X, Y) > 0
pz(px, py) = (1 + SCC(X, Y)) · F0(px, py) − SCC(X, Y) · F−1(px, py)   otherwise

in which F0(px, py), F+1(px, py) and F−1(px, py) are the functions of the circuit at SCC(X, Y) = 0, +1 and −1 respectively. However, this observation has not been applied to stochastic circuitry
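One row of table 2.1 can be verified empirically: sharing a single random source forces SCC = +1, and an AND gate on such streams computes min rather than a product. A software sketch (illustrative Python; the probabilities are arbitrary example values):

```python
import random

rng = random.Random(1)
n = 20_000

# Correlated generation: both streams compare their target probability
# against the SAME random samples, which forces SCC = +1.
r = [rng.random() for _ in range(n)]
px, py = 0.3, 0.7
x = [1 if px > ri else 0 for ri in r]
y = [1 if py > ri else 0 for ri in r]

anded = [xi & yi for xi, yi in zip(x, y)]
print(round(sum(anded) / n, 2))  # approx min(0.3, 0.7) = 0.3, per table 2.1
```

With the shared source, a bit of X is 1 exactly when the sample falls below 0.3, which also makes the corresponding Y bit 1, so the AND output rate is min(px, py).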


Stochastic correlation circuitry

Figure 2.5 shows circuitry that may be used to generate two SNs with a specific correlation coefficient. The circuitry shown is programmable, and components may be added or removed depending on the circuitry that is to be implemented. The circuitry generates X in the standard stochastic way, by comparing the output of a random number generator to a probability px. Y is then generated by multiplexing the output of a second random number generator with the output of the random number generator used for X. The circuitry is based on the intuition that sharing a random source causes signals to become correlated. The same source may be used for any number of SNs to cause them all to have an SCC of +1.

Figure 2.5: Circuit for generating SNs with a specified SCC [1].

2.4.4

Stochastic Operations

One of the theorized advantages of SC is that it allows for efficient arithmetic using the different types of logic gates. Where digital systems require a relatively large amount of circuitry in order to perform arithmetical operations, stochastic systems often require only a single logic gate to perform the same operation. In this section we look at how arithmetic may be performed through stochastic logic. The stochastic arithmetic discussed in this section applies to unipolar SNs. This section also serves, in part, to give a sense of how stochastic circuitry functions.

Multiplication

One of the main reasons why SNs are interesting is that the multiplication of SNs only requires a single AND gate. It should be noted that for correct functionality the two input streams of the AND gate should be uncorrelated (SCC = 0). Figure 2.6 shows how two SNs with p = 0.5 can be multiplied using an AND gate, resulting in a SN with p = 0.25.

01011001
00111100
00011000

Figure 2.6: Stochastic multiplication
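The multiplication property is easy to confirm in simulation. A sketch (illustrative Python) using two independent random sources so that the streams stay uncorrelated:

```python
import random

rng = random.Random(42)
n = 20_000

# Independent random samples for each stream keep SCC near 0, which
# is the condition for the AND gate to act as a multiplier.
x = [1 if 0.5 > rng.random() else 0 for _ in range(n)]
y = [1 if 0.5 > rng.random() else 0 for _ in range(n)]

product = [xi & yi for xi, yi in zip(x, y)]
print(round(sum(product) / n, 2))  # approx 0.5 * 0.5 = 0.25
```

The output bit is 1 only when both input bits are 1, which for independent streams happens with probability px · py.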

Scaled Addition

Stochastic signals are only able to use scaled addition. This is because, by definition, they represent a value in the range [0, 1]. Scaled addition is done through a logic structure called a multiplexer (MUX).
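A MUX-based scaled addition can be sketched bit by bit (illustrative Python; we assume a selector stream R with pr = 0.5 that passes A when its bit is 0 and B when it is 1, giving 0.5A + 0.5B):

```python
import random

rng = random.Random(7)
n = 20_000

pa, pb, pr = 0.2, 0.8, 0.5
a = [1 if pa > rng.random() else 0 for _ in range(n)]
b = [1 if pb > rng.random() else 0 for _ in range(n)]
r = [1 if pr > rng.random() else 0 for _ in range(n)]

# The selector bit picks stream B when r = 1 and stream A when r = 0,
# yielding (1 - pr)*A + pr*B = 0.5*0.2 + 0.5*0.8 = 0.5.
z = [bi if ri else ai for ai, bi, ri in zip(a, b, r)]
print(round(sum(z) / n, 2))  # approx 0.5
```

Note that the sum is scaled by the selector probability; an unscaled sum could exceed 1 and is therefore not representable as a unipolar SN.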


Figure 2.7 shows how a multiplexer is constructed from the basic logic gates. A multiplexer takes three inputs: an input A, an input B, and a selector input r. The selector input decides whether the multiplexer passes the value from A or from B, as shown in the truth table below. In SC, if we take a SN R with pr = 0.5 as the selector, the multiplexer implements scaled addition, i.e. 0.5A + 0.5B. By changing pr, any scaled addition of the form (1 − pr)A + pr·B can be implemented.

r A B | z
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 1

Figure 2.7: Multiplexer

Division

While there are combinational circuits for the multiplication and scaled addition of SNs, other operations require sequential approaches. For the division of two SNs the negative feedback circuit shown in figure 2.8 was proposed [41]. This circuit estimates the unknown value p1/p2 by finding the value that, multiplied by p2, is closest to p1. It does this by adjusting its estimate upwards or downwards depending on the difference between the estimate and the expected result. Stochastic sequential circuits require some startup time before providing correct values, since they need time to adjust the estimate. Hence, for the purposes of video processing, in which the values of SNs constantly change, combinational approaches are more suited.

Figure 2.8: Stochastic division circuit [41]
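The feedback idea can be imitated in software. The sketch below is a loose Python analogue of the negative-feedback principle, not the exact hardware of [41]: the gain constant and update rule are our illustrative choices.

```python
import random

def stochastic_divide(p1, p2, n=100_000, gain=1e-3, rng=None):
    """Feedback sketch of stochastic division: keep an estimate q of
    p1/p2, form q * p2 as a stochastic AND with an independent source,
    and nudge q up when that product undershoots the p1 stream and
    down when it overshoots."""
    rng = rng or random.Random(3)
    q = 0.5
    for _ in range(n):
        b1 = rng.random() < p1
        b2 = rng.random() < p2
        est = b2 and (rng.random() < q)   # q * p2 as a stochastic AND
        q += gain * (b1 - est)            # feedback steers q*p2 toward p1
        q = min(max(q, 0.0), 1.0)
    return q

print(round(stochastic_divide(0.3, 0.6), 1))  # approx 0.3/0.6 = 0.5
```

The loop settles where the expected correction p1 − q·p2 is zero, i.e. at q = p1/p2, which also illustrates why such sequential circuits need startup time before their output is usable.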

Square root

A circuit for finding the square root [41] was also proposed, following the same logic as the division circuit. The circuit looks for a SN √p1 that, when multiplied with itself, most closely resembles the input p1. The circuitry for a stochastic square root, shown in figure 2.9, is similar to the one for division, except that it compares the estimate to itself rather than to a second input p2.

Figure 2.9: Stochastic square root circuit [41]

Absolute valued subtraction

Previous research [8] has provided a sequential approach to calculating the absolute-valued subtraction between two SNs. In this approach a state transition diagram is constructed in hardware. This transition table can then be configured in several ways to implement arithmetical functions, such as the calculation of absolute-valued subtraction. A sequential circuit implementing the logic of such a diagram would require a considerable amount of memory, depending on the size N of the transition table. Recent research on the usage of correlation between SNs [2] has found that absolute-valued subtraction may be implemented without relying on sequential circuitry: two SNs in unipolar format with SCC(X, Y) = 1 can be subtracted in this manner using a single XOR gate. This reduces the dependency of SC on state-based hardware and therefore the overall chip size. For the purposes of video processing, the efficient implementation of absolute-valued operations is of fundamental importance.
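The single-gate subtraction is easy to demonstrate: generate two streams from a shared random source (SCC = +1) and XOR them. An illustrative Python sketch:

```python
import random

rng = random.Random(5)
n = 20_000

# Sharing the random source forces SCC(X, Y) = +1; the XOR of such
# streams then encodes |px - py| with a single gate (table 2.1).
r = [rng.random() for _ in range(n)]
px, py = 0.8, 0.3
x = [1 if px > ri else 0 for ri in r]
y = [1 if py > ri else 0 for ri in r]

diff = [xi ^ yi for xi, yi in zip(x, y)]
print(round(sum(diff) / n, 2))  # approx |0.8 - 0.3| = 0.5
```

With the shared source, the XOR output is 1 exactly when the sample falls between py and px, which happens with probability |px − py|.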

2.4.5

Stochastic Hardware

One of the weaknesses of stochastic circuitry is that it requires a method of generating random bit-streams, at least when the source input is in a digital number representation. In practice these streams are often generated using pseudo-random number generators [16]. Continuously generating random numbers to operate on is fairly costly, which often causes the savings of the gate-level arithmetic of stochastic computers to be lost. However, if one were to have a stochastic sensor as a source, these costs would be avoided. As mentioned earlier, the usage of correlation in SNs can extend the number of operations that stochastic circuitry is able to perform. On the other hand, this has the downside that the functionality of uncorrelated operations becomes lost. Because of this, additional resources must be spent correlating and decorrelating stochastic signals [24][35][34][42]. Another limiting factor is that SNs by definition represent a value between 0 and 1; this limits the mathematical operations that can be performed, since the usage of constants such as π and e is not viable, as they are not probabilities. An overview of the disadvantages and advantages of SC compared to digital processing can be found in table 2.2. Though SC has many disadvantages, which may be tackled in due time, it is able to address many of the challenges that modern processor design faces today. Through the reduced amount of circuitry required, SC is able to address the power wall, since less circuitry generates less heat. SC computes directly on the information received from sensors, meaning that no memory element comes into play, thus confronting the memory wall. However, when memory units are required, it is currently not possible or efficient to use non-digital memory.
Lastly, for the purposes of image processing, the small circuitry size and low heat generation of SC circuits allow every pixel in an image to have its own stochastic processor, thus achieving a high amount of parallelism. This observation is one of the main reasons why SC is interesting for the purposes of computer vision.


Feature | Advantages | Disadvantages
Circuit size and power consumption | Small arithmetic components make for low power consumption at the gate level | Random number generators, conversion circuits, and decorrelation circuits are required for the circuitry to serve its intended purpose
Operating speed | High parallelism | Long bit-streams may be required for accurate results
Accuracy and precision | High tolerance to errors and progressive precision | Low precision, fluctuations due to randomness, and inaccuracies due to correlation
Design | Several arithmetic components have a direct translation to logic gates | There is no design methodology for sequential circuits, causing some stochastic operations to be implemented in an ad hoc fashion

Table 2.2: Advantages and Disadvantages of Stochastic Computing

2.4.6

Variations on Stochastic Computing and Their Applications

There have been several proposed variants [26] of SC. These variants seek to address some of the shortcomings that are intrinsic to ordinary SC. The variants are called burst processing, bundle processing, and ergodic processing. Most research up until now has focused on the usage of the standard form of SNs, but the variants might offer solutions to shortcomings of the original stochastic approach. Especially bundle processing and ergodic processing may find their use in stochastic hardware, since they are able to address the inherent variance of ordinary SNs. Both bundle processing and ergodic processing will be used further on in the paper. Burst processing, as of yet, does not seem to have a practical application, but is treated for the sake of thoroughness. For the sake of clarity we will denote the classical manner of SC as ordinary SC; the variants will be called by the names discussed in this section.

Bundle Processing

Since the number of 1's in a SN is randomly generated, SNs are not exactly precise: if we generate a SN X with px = 0.5 and N = 2, it is possible to get the bit-streams 00, 10, 01, and 11. As a result, a bit-stream of length N has a precision of roughly 1/√N due to the variance of the estimate. In order to accurately approach an 8-bit colour value using the conventional approach, we would need a stream length of 256² = 65536 to adjust for the variance. Bundle processing works around this by fixing the number of 1's for a given signal length N. This way it is possible to represent a precision of 1/N, because the variance caused by random generation is removed. This variant might be used in computer vision, since SNs do not need to be of varying length for the purposes of image processing.
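The contrast between ordinary SC and bundle processing can be illustrated with a short simulation. This is an illustrative sketch, not the proposed hardware: the names `ordinary_sn`, `bundle_sn`, and the chosen parameters are assumptions made for demonstration.

```python
import random

def ordinary_sn(p, n, rng):
    # Ordinary SC: each bit is an independent Bernoulli(p) draw,
    # so the number of 1s in the stream varies from run to run.
    return [1 if rng.random() < p else 0 for _ in range(n)]

def bundle_sn(p, n):
    # Bundle processing: a fixed-length signal carrying exactly
    # round(p * n) ones, removing the sampling variance entirely.
    ones = round(p * n)
    return [1] * ones + [0] * (n - ones)

rng = random.Random(42)
p, n, trials = 0.5, 64, 2000
estimates = [sum(ordinary_sn(p, n, rng)) / n for _ in range(trials)]
mean = sum(estimates) / trials
var = sum((e - mean) ** 2 for e in estimates) / trials
print(var ** 0.5)                # roughly sqrt(p * (1 - p) / n), i.e. about 0.0625
print(sum(bundle_sn(p, n)) / n)  # exactly 0.5
```

The simulated standard deviation of the ordinary estimate shrinks only with √N, while the bundle value is exact for any probability expressible as a multiple of 1/N.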

Ergodic Processing

Ergodic processing combines ordinary SC with bundle processing. Instead of looking at individual bits, one looks at a set of n bits at a time. The stochastic variable thus becomes a stream of bundles. This method of approaching SC has the advantage of being able to give stochastic variables a property that is equivalent to a signal strength. Ergodic processing may also be used for the purposes of structuring continuous memory units; if a stochastic video feed is stored as a single continuous signal in memory, it is not possible to find a median value. Ergodic processing may be used to divide a continuous signal into multiple bundles, which would allow us to find a median of those bundles.
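A hypothetical sketch of this ergodic bookkeeping: a continuous bit-stream is split into fixed-size bundles, whose values can then be fed to a median. The helper `to_bundles` and the example stream are illustrative assumptions, not circuitry from this thesis.

```python
import statistics

def to_bundles(stream, bundle_size):
    # Ergodic view: interpret one continuous bit-stream as a sequence of
    # fixed-size bundles, each carrying its own value (fraction of 1s).
    return [sum(stream[i:i + bundle_size]) / bundle_size
            for i in range(0, len(stream), bundle_size)]

# Three frames of the same pixel stored back to back in continuous memory.
stream = [1, 1, 0, 0] + [1, 1, 1, 0] + [1, 0, 0, 0]
values = to_bundles(stream, 4)
print(values)                     # [0.5, 0.75, 0.25]
print(statistics.median(values))  # 0.5
```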


Burst Processing

Burst processing steps away from the bit-stream and instead encodes a decimal fraction as a higher-base increasing stream. The numbers in this stream are mathematically represented in decimal format, but for the purposes of hardware, a binary representation may be used. The value of a stream X is obtained by adding all the numbers in the stream and dividing by the length of the stream. For example, the increasing stream 1 1 2 2 3 represents the fraction (1 + 1 + 2 + 2 + 3)/5 = 9/5 = 1.8. This representation has two notable characteristics: first, there are no effects from randomization, since the numbers always appear in increasing order; second, it has the same PP that bundle and ergodic processing have. This way of processing numbers may be less feasible in terms of a hardware implementation, since base-increasing streams require a sequential approach to number generation and can therefore be very costly.
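The burst encoding above can be checked with a few lines of Python; `burst_value` is an illustrative name, not part of any proposed hardware.

```python
def burst_value(stream):
    # Burst processing: a non-decreasing stream of small integers encodes
    # the value sum(stream) / len(stream).
    assert all(a <= b for a, b in zip(stream, stream[1:])), \
        "burst streams must be non-decreasing"
    return sum(stream) / len(stream)

print(burst_value([1, 1, 2, 2, 3]))  # (1+1+2+2+3)/5 = 1.8
```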

2.4.7 Stochastic Circuitry for Roberts Cross Operator

In previous research [1] a hardware approach to Roberts cross operator was proposed. Roberts cross is the fastest edge detection kernel within image processing, since it is the only 2x2 edge detection kernel. It is no longer commonly used, because 2x2 kernels are sensitive to noise and most modern processors are fast enough to compute larger kernel sizes. The horizontal and vertical gradient components of Roberts cross are given by the following masks:

Gx = [ +1   0 ] * P        Gy = [  0  +1 ] * P
     [  0  -1 ]                 [ -1   0 ]

Roberts cross operator computes the moving average of intensity values on a window of size 2x2 for each pixel Pi,j at row i and column j of the image, and generates an output Gi,j according to the formula:

Gi,j = (1/2) (|Pi,j − Pi+1,j+1| + |Pi,j+1 − Pi+1,j|)

The stochastic implementation of Roberts cross algorithm is shown in figure 2.10; the XOR gates perform the absolute-valued subtraction, while the multiplexer computes the scaled addition of the two gradient components. The auxiliary input r is set at pr = 0.5.
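The stochastic Roberts cross described above can be simulated in software. The sketch below assumes fully correlated unipolar inputs obtained from a shared random source, XOR for the absolute-valued subtraction, and a pr = 0.5 multiplexer for the scaled addition; the function names and parameters are illustrative.

```python
import random

def correlated_sns(ps, n, rng):
    # Sharing one random source: comparing every probability against the
    # same uniform draws yields fully correlated (SCC = +1) streams.
    rs = [rng.random() for _ in range(n)]
    return [[1 if r < p else 0 for r in rs] for p in ps]

def roberts_cross(p00, p01, p10, p11, n, rng):
    x00, x01, x10, x11 = correlated_sns([p00, p01, p10, p11], n, rng)
    sel = [1 if rng.random() < 0.5 else 0 for _ in range(n)]  # auxiliary r, pr = 0.5
    out = 0
    for a, b, c, d, s in zip(x00, x01, x10, x11, sel):
        g1 = a ^ d              # XOR of SCC = +1 streams: |p00 - p11|
        g2 = b ^ c              # |p01 - p10|
        out += g1 if s else g2  # multiplexer: scaled addition (g1 + g2) / 2
    return out / n

rng = random.Random(0)
# Expected value: (|0.9 - 0.1| + |0.6 - 0.4|) / 2 = 0.5
print(roberts_cross(0.9, 0.6, 0.4, 0.1, 4096, rng))
```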


CHAPTER 3

My work

3.1 Stochastic Numbers

One of the questions in our research is what the mathematical properties of SNs are, and what techniques and variants of SNs may be used for the purposes of implementing more accessible hardware. In this section we will discuss the applications of bundle processing and of stochastic correlation.

3.1.1 Stochastic Precision

In section 4.1.1 we look at the way in which the length of bit-streams affects the accuracy of the number being encoded, and in section 4.2 we look at how bundle processing can increase the precision of stochastic computer vision circuits. The usage of bundle processing allows for an exponential increase in the precision of SNs within image processing, since the variance caused by the randomness of the numbers is removed. In bundle processing the number of 1's, N1, of a SN X is given by N1 = px × N.

3.1.2 Stochastic Correlation

[2] shows that a circuit's functionality changes as a linear combination of its functions at SCC(X, Y) = 0 and SCC(X, Y) = +1 or −1. For any SCC, pz may be written as [2]:

pz(px, py) = (1 + SCC(X, Y)) · F0(px, py) − SCC(X, Y) · F−1(px, py)   if SCC(X, Y) < 0
             (1 − SCC(X, Y)) · F0(px, py) + SCC(X, Y) · F+1(px, py)   otherwise

We have found that this same behaviour may be implemented with a multiplexer between F0(px, py) and F+1(px, py) or F−1(px, py), in which the probability of the selector bit determines the SCC. Multiplexing fully correlated and uncorrelated signals may be an efficient alternative to a randomization circuit for achieving the desired SCC, and may make it easier to exploit SCCs in SC.
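A small Monte Carlo experiment, assuming an AND gate as the circuit under test, illustrates this interpolation between the fully correlated and uncorrelated behaviours; `and_with_scc` and its parameters are hypothetical names used for demonstration.

```python
import random

def and_with_scc(px, py, s, n, rng):
    # Per bit, select with probability s a fully correlated pair (shared
    # random source) and otherwise an independent pair, then AND them.
    # The mean output interpolates linearly between min(px, py) (SCC = +1)
    # and px * py (SCC = 0).
    out = 0
    for _ in range(n):
        if rng.random() < s:
            r = rng.random()
            x, y = r < px, r < py                        # SCC = +1 pair
        else:
            x, y = rng.random() < px, rng.random() < py  # SCC = 0 pair
        out += x & y
    return out / n

rng = random.Random(1)
# Expected: 0.6 * min(0.8, 0.5) + 0.4 * (0.8 * 0.5) = 0.46
print(and_with_scc(0.8, 0.5, 0.6, 20000, rng))
```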

Stochastic correlation circuitry

In our research the SN generator illustrated in figure 2.5 is used for the purpose of generating correlated ordinary SNs. For the purpose of generating correlated stochastic bundles, a memory cell is used in which the required number of 1's (px × N) for each SN is stored. This memory cell is then used to create SNs with leading 1's, causing them to be correlated automatically.
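A minimal sketch of such leading-1's bundle generation (the names are illustrative): bundles built this way are automatically fully correlated, so an XOR between them computes the absolute difference exactly.

```python
def correlated_bundle(p, n):
    # Store the required number of 1s (round(p * n)) and emit them first:
    # bundles built this way share their 1-positions, so SCC = +1.
    ones = round(p * n)
    return [1] * ones + [0] * (n - ones)

a = correlated_bundle(0.75, 16)  # 12 leading 1s
b = correlated_bundle(0.25, 16)  # 4 leading 1s
# XOR of SCC = +1 unipolar bundles computes the absolute difference exactly.
diff = sum(x ^ y for x, y in zip(a, b)) / 16
print(diff)  # |0.75 - 0.25| = 0.5
```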


Improvement of the SCC formula

We found that the definition 2 formula [2] for the SCC between two SNs contains a shortcoming:

SCC(X, Y) = (ad − bc) / (n·min(a + b, a + c) − (a + b)(a + c))   if ad > bc
            (ad − bc) / ((a + b)(a + c) − n·max(a − d, 0))       otherwise

in which a is the number of overlapping 1's, b the number of 1's in X overlapping 0's in Y, c the number of 0's in X overlapping 1's in Y, d the number of overlapping 0's, and n the length of the SNs X and Y. The problem with definition 2 is that it can divide by 0. Take for example the two bit-streams X = 1111 and Y = 1100, which give the values a = 2, b = 2, c = 0 and d = 0. In this instance ad is not larger than bc, so the bottom formula is used. Its denominator, (a + b)(a + c) − n·max(a − d, 0), evaluates to 4 · 2 − 4 · 2 = 0, resulting in a division by 0. From a mathematical perspective this result can be said to make sense, since the signals can be said to have an SCC of +1 as well as an SCC of −1. This is not the expected answer, however: if we refer to table 2.1, the two SNs implement the functionality found under an SCC of +1 and do not resemble the arithmetic for uncorrelated or negatively correlated signals. For this reason we propose the addition of a third case to the SCC formula, resulting in:

Definition 3.

SCC(X, Y) = (ad − bc) / (n·min(a + b, a + c) − (a + b)(a + c))   if ad > bc
            (ad − bc) / ((a + b)(a + c) − n·max(a − d, 0))       if (a + b)(a + c) − n·max(a − d, 0) > 0
            1                                                    otherwise
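Definition 3 translates directly into code. The sketch below is an illustrative implementation of the proposed formula, with the problematic pair X = 1111, Y = 1100 as a test case.

```python
def scc(x, y):
    # Definition 3: the middle guard avoids the division by zero that
    # definition 2 allows, returning +1 for pairs such as X=1111, Y=1100.
    n = len(x)
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    if a * d > b * c:
        return (a * d - b * c) / (n * min(a + b, a + c) - (a + b) * (a + c))
    denom = (a + b) * (a + c) - n * max(a - d, 0)
    if denom > 0:
        return (a * d - b * c) / denom
    return 1.0

print(scc([1, 1, 1, 1], [1, 1, 0, 0]))  # 1.0 (definition 2 divides by zero here)
print(scc([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0, fully correlated
print(scc([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0, fully anti-correlated
```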

3.2 Stochastic Image Processing

In this section we demonstrate the application of SC in the area of image processing. We propose an efficient approach to the Sobel operator and median filter using SC. In our experiments we also test the accuracy of these circuits and confirm the results of the unipolar implementation of Roberts cross operator.

3.2.1 Stochastic Circuitry for the Sobel Operator

The Sobel operator is an improvement on Roberts cross that uses a larger kernel size to adjust for noise and is a combination of an average and a differentiation kernel. The horizontal and vertical components are given by the following masks:

Gx = [ +1   0  -1 ] * P        Gy = [ +1  +2  +1 ] * P
     [ +2   0  -2 ]                 [  0   0   0 ]
     [ +1   0  -1 ]                 [ -1  -2  -1 ]

The moving average of intensity values can be computed using the following formula:

Gi,j = (1/2) (|(Pi−1,j−1 + 2Pi,j−1 + Pi+1,j−1) − (Pi−1,j+1 + 2Pi,j+1 + Pi+1,j+1)|
            + |(Pi−1,j−1 + 2Pi−1,j + Pi−1,j+1) − (Pi+1,j−1 + 2Pi+1,j + Pi+1,j+1)|)

By making use of stochastic correlation we are able to implement a relatively small edge detection circuit. The required hardware to compute gradients in an image P is shown in figure 3.1, in which the auxiliary inputs for the multiplexers all have the value pr = 0.5. Our proposed circuitry requires an additional 12 multiplexers compared to the stochastic implementation of Roberts cross, but it has the benefit of more precise edge detection. The implementation of the Sobel operator is similar to that of Roberts cross, except that it requires intermediate results to be added to one another. Since there is added difficulty in storing SNs as intermediate results, some SNs are reused across different multiplexers to keep the circuitry combinational.
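As a reference for the formula above, a minimal digital computation of the scaled |Gx| + |Gy| gradient (the same scaled form the stochastic circuit approximates, rather than the exact magnitude); the 0/1 test image and the function name are illustrative.

```python
def sobel(img, i, j):
    # Digital reference for the scaled Sobel gradient magnitude
    # G = (|Gx| + |Gy|) / 2 at pixel (i, j) of a 2-D intensity array.
    p = lambda r, c: img[r][c]
    gx = (p(i-1, j-1) + 2*p(i, j-1) + p(i+1, j-1)) \
       - (p(i-1, j+1) + 2*p(i, j+1) + p(i+1, j+1))
    gy = (p(i-1, j-1) + 2*p(i-1, j) + p(i-1, j+1)) \
       - (p(i+1, j-1) + 2*p(i+1, j) + p(i+1, j+1))
    return (abs(gx) + abs(gy)) / 2

# A vertical step edge: strong horizontal gradient, no vertical gradient.
img = [[0, 0, 1],
       [0, 0, 1],
       [0, 0, 1]]
print(sobel(img, 1, 1))  # |Gx| = 4, |Gy| = 0, so the output is 2.0
```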


Figure 3.1: Proposed stochastic Sobel circuit

3.2.2 Stochastic Number Sorting

Paper [23] proposes the circuit shown in figure 3.2 to implement a stochastic 3x3 median filter, in which the tanh component in the stochastic comparator represents a state transition table. The configuration of the state transition table that implements the tanh circuit was proposed in paper [8].

Figure 3.2: The stochastic implementation of the 3x3 median filter [23]

By using correlated unipolar SNs instead of uncorrelated bipolar SNs, the size of this circuitry can be reduced significantly compared to the proposed sequential approach. In this paper we propose a new stochastic comparator that uses the recently found effects of stochastic correlation to efficiently compare SNs. Figure 3.3 shows our proposed stochastic comparator for two correlated SNs. This component may also be used to implement a stochastic bubble sort in hardware, which would require the circuitry of figure 3.2 to be extended to contain n² comparators, where n is the number of SNs to be sorted. Previous usage of the median filter within stochastic computation has been to remove noise in an image. However, it can also be used to calculate the background of a series of images or video. This can be done by, instead of comparing pixels spatially at different points i, j within the same image, comparing pixels temporally at the same point i, j across different frames.
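For SCC = +1 unipolar SNs the proposed comparator reduces to bitwise OR for the maximum and bitwise AND for the minimum, which can be sketched in software together with a three-input sorting network; `compare_swap`, `bundle`, and `median3` are illustrative names, not the thesis circuit itself.

```python
def bundle(p, n=16):
    # Leading 1s keep bundles fully correlated (SCC = +1) with each other.
    ones = round(p * n)
    return [1] * ones + [0] * (n - ones)

def compare_swap(a, b):
    # Comparator for SCC = +1 unipolar SNs: bitwise AND yields min(A, B)
    # and bitwise OR yields max(A, B).
    mn = [x & y for x, y in zip(a, b)]
    mx = [x | y for x, y in zip(a, b)]
    return mn, mx

def median3(a, b, c):
    # Three compare-and-swaps form a sorting network whose middle wire
    # carries the median of the three inputs.
    a, b = compare_swap(a, b)
    b, c = compare_swap(b, c)
    a, b = compare_swap(a, b)
    return b

m = median3(bundle(0.75), bundle(0.125), bundle(0.5))
print(sum(m) / len(m))  # median of {0.75, 0.125, 0.5} = 0.5
```

Extending the same compare-and-swap network to nine inputs gives a software model of the 3x3 median filter.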

3.3 Stochastic Video Processing

For the purposes of demonstrating the applications of SC, we wanted to implement stochastic circuitry that could process video footage. One of the main problems that we faced in implementing stochastic video processing is the need for memory. In a digital processor, intermediate results can be stored efficiently in registers or memory; storing SNs is far more costly, since their


Figure 3.3: Proposed stochastic comparator

length scales exponentially with their precision, due to the variance inherent in random stream generation. This is demonstrated in section 4.1.1. Another problem that SC faces is that many video processing algorithms, such as optical flow, cannot be implemented in probabilistic terms. However, the efficient gate-level arithmetic of SC may be able to provide efficient preprocessing circuitry for further digital operations. For the purposes of implementing a stochastic video processing architecture we use the proposed stochastic median filter and edge detection circuit in combination with a stochastic background subtraction circuit as preprocessing. After preprocessing stochastically, we convert the stochastic signal into a digital signal using a binary counter. We then use a digital blob detection algorithm to define a ROI.

3.3.1 Background Subtraction

To find a background in a video, it is required to store the values of previous frames. Since accessible circuitry for storing SNs has yet to be discovered [1], we currently use a digital image buffer to store the data of previous frames. The individual pixel values in the buffer are then used as the inputs for the random number generator shown in figure 2.5 with SCCmagn = 1 and SCCsign = 0. The SNs representing the pixel values can then be processed through the circuitry of section 3.2.2 to define an image background B using stochastic circuitry. If stochastic storage existed, we might compute the background image as either the value of the entirety of the continuous memory, or by using ergodic processing to divide the memory into bundles for which we can then find the mean. The pixels P in a frame I(t) that differ from the background B form the foreground F(t), found by the equation:

P[F(t)] = |P[I(t)] − P[B]|

When using correlated unipolar signals, this absolute-valued subtraction can be computed using a single XOR gate between the background SN generated from the bitmap and the SN generated from the stochastic camera. This is shown in figure 3.4.

Figure 3.4: Stochastic frame difference

3.3.2 Computing Super-pixels

The frame difference between a frame and a background image may be used to define regions of interest (ROI) in an image. The ROI of a stochastic frame difference may be defined as the values in the frame difference that exceed a certain intensity. A larger ROI may be found by looking not at pixels individually, but at clusters of pixels called super-pixels. The computation of super-pixels can be done easily at a stochastic level by multiplexing several SNs together. Figure 3.5 shows how a multiplexer may be used to compute the value of a 4x4 super-pixel. In practice a configurable multiplexer may be used to dynamically define ROI. Figure 4.11 shows how ROIs of different sizes are defined through the use of a multiplexer, and how these ROIs offer an indication of where the movement is highest in an image. For the generation of figure


4.11 we compute 2x2 super-pixels, which are then used as inputs for another multiplexer to generate 4x4 super-pixels, and so on. Using this method it becomes simple to generate images with different ROI sizes. By computing super-pixels it is possible to get an idea of which areas in an image contain the most movement, without the need for further computation. Further analysis, such as object detection and tracking, is not yet possible at the stochastic level and requires part of the processing to be done in digital circuitry, resulting in hybrid circuitry. The value of the super-pixels may be used as an indication for digital hardware as to where further processing is required.

Figure 3.5: Calculating super-pixel values [1]
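The multiplexing step can be sketched in software: selecting uniformly among the k input streams yields a stream whose value approximates the mean of the block, i.e. the super-pixel value. The names and parameters below are illustrative assumptions.

```python
import random

def superpixel(streams, rng):
    # One multiplexer: each output bit is taken from a uniformly selected
    # input stream, so the output value approximates the mean of the inputs.
    n = len(streams[0])
    return [streams[rng.randrange(len(streams))][i] for i in range(n)]

rng = random.Random(3)
n = 8192
pixels = [0.1, 0.3, 0.5, 0.9]  # a 2x2 block of pixel intensities in [0, 1]
streams = [[1 if rng.random() < p else 0 for _ in range(n)] for p in pixels]
sp = superpixel(streams, rng)
print(sum(sp) / n)  # approximately mean(pixels) = 0.45
```

Feeding four such super-pixel streams into another `superpixel` call models the 2x2-to-4x4 cascade described above.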

3.3.3 Blob Analysis

We can further process the image by performing blob detection in order to define objects. Since blob detection algorithms require mathematical properties that SC has yet to implement, blob detection will need to be implemented at a digital level. We convert the output SNs of the super-pixel circuit to digital by using a simple binary counter.

For demonstration purposes we used a simple blob analysis algorithm based on a von Neumann neighbourhood with Manhattan distance r = 1 [40]. Figure 3.6 illustrates a pixel P and its neighbourhood D for Manhattan distance 1. We include pixels in a neighbourhood based on a double threshold with hysteresis. For our blob detection we search in the region of super-pixels with an activity above the given threshold. We take all the original values within the super-pixel and check whether any pixel values are above our high threshold. If we find a pixel above the high threshold, we look at adjacent pixels in the vertical and horizontal directions to see if they are above the low threshold. This process is continued for every pixel above the low threshold until no new pixels are added to the neighbourhood. We can then derive a bounding box (BB) for every blob detected in this manner. These BBs can be used for object tracking purposes, such as a least squares fit of BBs, or as ROI for a mean/CAMshift histogram [7]. Additional information such as the size (number of pixels) of the neighbourhood or its median/mean value may be used to further select the relevant ROI. The blob detection is an optional extension of the stochastic circuitry and can be omitted.
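A software sketch of this double-threshold growth, assuming a plain 2-D intensity array rather than the super-pixel pipeline output; the thresholds, test image, and function name are illustrative.

```python
from collections import deque

def blobs(img, high, low):
    # Double threshold with hysteresis: seed at pixels above `high`, then
    # grow through the von Neumann neighbourhood (Manhattan distance 1)
    # as long as neighbours stay above `low`.
    rows, cols = len(img), len(img[0])
    seen, found = set(), []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] < high or (r, c) in seen:
                continue
            blob, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols \
                            and (ny, nx) not in seen and img[ny][nx] >= low:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            found.append(blob)
    return found

img = [[ 0, 20, 70, 20],
       [ 0,  0, 30,  0],
       [10,  0,  0,  0]]
result = blobs(img, high=60, low=15)
print(result)  # one blob of four pixels around the bright seed at (0, 2)
```

The bounding box of each blob follows from the minimum and maximum row/column indices in its pixel list.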


3.4 Stochastic and Digital Hardware Comparison

3.4.1 Propagation Delays Expressed in Standard Delay Units

In order to add 8-bit numbers using digital circuitry, 8 full-adders are required. We estimate the propagation delay of circuits by comparing the gate delay between stochastic and digital circuitry: we count the transistors along the critical path and express this delay in standard delay units. XOR gates have a delay of 2 standard delay units [10], because they are made out of a combination of multiple transistors. The standard delay unit represents a typical delay of the logic gates; the real number depends on the physical implementation of the specific components/transistors. All other delays are multiples of this one, because the critical path of each circuit, in the general case, is formed by the same type of transistors. For current high-end chip implementations, 1 standard delay unit may be taken as 1 ns [5]. A full-adder block, shown in figure 3.7, has the following worst-case propagation delays:

Figure 3.7: Full-adder

• From A or B to Cout: 4 standard delays (XOR → AND → OR).
• From A or B to S: 4 standard delays (XOR → XOR).
• From Cin to Cout: 2 standard delays (AND → OR).
• From Cin to S: 2 standard delays (XOR).

By applying these observations over an entire ripple carry adder circuit, it can be said that an n-bit adder has a total propagation delay tp = 4 + 2(n − 2) + 2 = 2n + 2. Adding 8-bit numbers digitally accordingly has a propagation delay of 18 units. The addition of SNs is done through a multiplexer, with a worst-case propagation delay of 3 units. This approach can be applied to the proposed circuitry and their digital counterparts to estimate their propagation delays.
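The delay bookkeeping above can be captured in a few lines; the constants merely restate the counts derived in the text and are not a circuit simulation.

```python
def ripple_carry_delay(n):
    # Worst-case delay of an n-bit ripple-carry adder in standard delay
    # units: 4 units from A/B to the first carry-out, 2 units through each
    # of the n - 2 middle stages, and 2 units for the final sum bit,
    # giving tp = 4 + 2*(n - 2) + 2 = 2n + 2.
    return 4 + 2 * (n - 2) + 2

MUX_DELAY = 3  # stochastic scaled addition: a single multiplexer

for bits in (4, 8, 16):
    print(bits, ripple_carry_delay(bits), MUX_DELAY)
```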

Table 3.1 shows a rough estimate of the propagation delays of the proposed stochastic circuits and their digital counterparts. The numbers in the table are based on manually counting the components in both the stochastic and digital implementations of the algorithms. It should be kept in mind that SNs require multiple iterations to represent, since they are continuous and generated one bit at a time; longer streams require multiple iterations of the circuitry. However, since SNs are continuously generated, changes in data are directly reflected in the stream being processed. Another aspect of SC is that it is flexible in its precision: if we want more or less accurate results, we can simply lengthen or shorten the bit-stream. Digital processing would require us to implement additional hardware, since the logic gate requirements for processing higher or lower precision binary numbers are physically different.

The table also shows the number of logic gates used to implement the stochastic circuitry compared to the number used in digital circuitry. It follows from the table that stochastic circuitry is more efficient than digital circuitry in its usage of logic gates. The results do not take into account the additional costs of the pseudo-random number generator shown in figure 2.5. The hardware estimates shown in the table support earlier claims made about the costs and size of stochastic circuitry.


Propagation delay units

             Roberts cross   Sobel operator   Frame difference   3x3 median filter
Digital      58              94               40                 36
Stochastic   5               11               2                  6

Logic gate requirements

             Roberts cross   Sobel operator   Frame difference   3x3 median filter
Digital      202             602              85                 1092
Stochastic   6               54               1                  84

Table 3.1: Hardware costs for stochastic and digital processing

3.4.2 Video Processing Circuit Overview

The proposed circuitry for object detection can be seen in figure 3.8. We convert a digital video feed and frame buffer into stochastic signals using the circuitry suggested in figure 2.5; all output signals of the stochastic number generators should be correlated, so the two random number generators need to share random sources. The SNs generated from the frame buffer are then processed by the stochastic median filter proposed in section 3.2.2 to give a background image. The found background image is then subtracted from the SNs representing the current frame. To get better results it is possible to run the frame difference output through one of the proposed edge detection circuits (Sobel or Roberts cross). After the background has been subtracted, we compute the super-pixel values using the circuitry proposed in section 3.3.2. As discussed earlier, the super-pixel values are on their own enough to define ROI; it is possible to stop here if we are just interested in a general idea of where movement takes place. It is also possible to process the video further by converting the SNs back to binary using a binary counter and then applying blob detection to define objects. This approach would result in more accurate ROI, but would abandon a purely stochastic implementation.

Figure 3.8 (block diagram): video frames/sensor → stochastic number generator; image buffer → stochastic number generator → stochastic median filter; both feed the stochastic background subtraction → stochastic edge detection circuitry → stochastic super-pixel calculation → stochastic-to-digital converter → blob detection


CHAPTER 4

Experiments

4.1 Stochastic Numbers

In the following section we will examine the relationship between the length of SNs and their precision, and the way in which stochastic autocorrelation behaves. It is important to know these properties of SNs, since correlating signals and increasing the signal length both consume resources. Knowing how these properties behave makes it possible to trade off accuracy against resource consumption for the purposes of SC.

4.1.1 Stochastic Precision

Figure 4.1 shows the index of dispersion: the relative variance of a value, given by the ratio of the variance σ² to the mean µ, i.e. D = σ²/µ. We use the index of dispersion instead of the variance as an indication of the precision of SNs because smaller SNs naturally have less variance; to determine the precision of SNs we need to know the variance in relation to the value we are trying to represent. Figure 4.1 shows the index of dispersion for different probabilities px across stream lengths for ordinary SC. From this graph it can be seen that the lower the value of px, the higher the index of dispersion. This shows that, for ordinary SC, lower probabilities require an exponential increase in the length of the bit-stream to adjust for the variance involved in random number generation. The figure also shows that most stochastic processes can be stopped early, since there is no linear relationship between the length of the bit-stream and the accuracy of the estimate. The exact point at which a stochastic process may be stopped is arbitrary, since this is a trade-off between resource consumption and the accuracy required in the specific circumstance.
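The definition can be reproduced with a small Monte Carlo estimate; for the count of 1's in a Bernoulli stream, D approaches 1 − px, which matches the observation that lower probabilities show higher dispersion. The names and parameters below are illustrative.

```python
import random

def dispersion(p, n, trials, rng):
    # Index of dispersion D = variance / mean, here of the count of 1s in
    # a Bernoulli(p) stream of length n; analytically D = 1 - p, so lower
    # probabilities show a higher relative dispersion.
    counts = [sum(1 for _ in range(n) if rng.random() < p)
              for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials
    return var / mean

rng = random.Random(5)
print(dispersion(0.1, 256, 3000, rng))  # near 1 - 0.1 = 0.9
print(dispersion(0.9, 256, 3000, rng))  # near 1 - 0.9 = 0.1
```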


4.1.2 Stochastic Autocorrelation

Table 4.1 shows the SCC between inputs A and B and the outputs of several logic gates, regardless of SCC(A, B). From this table it follows that most logic gate operations in stochastic computing autocorrelate inputs and outputs. We also found that when two SNs have an SCC of +1 or −1, multiplexing the two signals will never reduce the correlation of the two SNs. This observation is useful for the purposes of image processing, since most circuitry is dependent on the usage of SNs with SCC +1. The properties discussed in this section may be utilized to determine when to correlate or randomize SNs. The SCC at the output of the XOR, XNOR, and MUX components depends on the probabilities of their inputs relative to one another; further research may be done into their exact relationship.

SCC with output   AND(A,B)   OR(A,B)   NAND(A,B)   NOR(A,B)
A                 1.0        1.0       -1.0        -1.0
B                 1.0        1.0       -1.0        -1.0

Table 4.1: Approximate measure of autocorrelation for different inputs

4.2 Stochastic Image Processing

For the purposes of testing the stochastic image processing algorithms we used the image shown in figure 4.2. We then calculate the mean absolute error (MAE) between the stochastic implementations and the digital implementations of the same algorithm. The tables corresponding to the MAE graphs shown in this section can be found in appendix A.

Figure 4.2: Original image

4.2.1 Stochastic Image Accuracy

Figure 4.3 shows the MAEs between a stochastic representation and the binary representation of an image, for bundle processing and ordinary SC. The blue line shows how the length of the bit-stream relates to the error for the ordinary stochastic implementation, and the green line shows the relationship for stochastic bundle processing. It follows from the figure that bundle processing provides an accurate alternative to ordinary SC. However, it must be kept in mind that bundle processing sacrifices the properties that continuous signals have, such as environmental noise being mitigated over time. The figure also shows that the accuracy gained from increasing the length of a stream diminishes the longer the stream becomes.

Sensitivity to noise between stochastic and digital circuitry

Figure 4.4 shows how an ordinary binary representation reacts to bit errors compared to how SNs handle such errors. It can be seen that a binary representation becomes unrecognizable at an error rate of 0.4, whereas a stochastic representation is still usable for further processing. This demonstrates the possibility of undervolting [27] as a method to further reduce the resource consumption of stochastic circuitry at the cost of accuracy. Figure 4.5 shows how the MAE scales with error rate for a digital implementation compared to a stochastic bundle and ordinary stochastic implementation. The figure suggests that for both SC and digital


Figure 4.3: Mean absolute errors for image representation (Table A.1)

Figure 4.4: Binary vs. stochastic accuracy for different error rates

computation there is linear scaling between the error rate and the size of the error. However, the figure also indicates that the linear increase in the MAE is steeper for digital processes than for stochastic ones. Another interesting point is that the MAEs of the ordinary stochastic implementation and bundle processing converge for higher bit error rates. This implies that, if one wants to apply undervolting to reduce resource consumption, it does not matter whether bundle processing or ordinary stochastic processing is used, since the increased accuracy offered by bundle processing is lost when bit errors are introduced.


4.2.2 Stochastic Roberts cross

To test the accuracy of a stochastic Roberts cross implementation, we compared the results of an ordinary stochastic implementation and a bundle processing implementation to the results of the algorithm performed by digital processing. We consider the difference in pixel values between the stochastic implementations and the digital implementation to be the absolute error. Figure 4.6 shows the MAEs of running Roberts cross algorithm multiple times for different bit-stream lengths on figure 4.2. This graph follows the intuition of figure 4.1, in that the added accuracy of increasing the run-time of a stochastic circuit becomes relatively small after a certain point. For bundle processing this point lies around a stream length of approximately 50, but for ordinary SC it may be longer, depending on the accuracy required. From the graph it follows that

Figure 4.6: Mean absolute errors for stochastic Roberts cross (Table A.3)

a stochastic implementation does not offer the exact same results as a digital implementation. The MAE does not go to 0, even for longer stream lengths. This is in part due to the inherent variance of SNs themselves, but also in part due to the variance of stochastic hardware, which relies on multiplexers for addition. In the stochastic implementation of Roberts cross the gradient is computed through the scaled addition of the derivatives Gx and Gy, which is not an exact calculation of the gradient, given by √(Gx² + Gy²). Figure 4.7 shows the manner in which the PP of SNs differs between ordinary SC and bundle processing.

Figure 4.7: Stochastic Roberts cross for varying bit-stream lengths (4, 16, 64, 256)

4.2.3 Stochastic Sobel operator

Since Roberts cross algorithm is known to be sensitive to noise, due to its small kernel size, it is often omitted in favour of more robust edge detection algorithms. Figure 4.8 shows the MAEs for the circuitry proposed in figure 3.1. The figure indicates that the Sobel operator is less sensitive to the randomness that is inherent to SC: the MAE between the digital and stochastic implementations is significantly lower than for the stochastic implementation of Roberts


cross algorithm. Figure 4.9 shows the properties of PP between bundle and ordinary stochastic

Figure 4.8: Mean absolute errors for stochastic Sobel operator (Table A.4)

processing. The PP is comparable to that of Roberts cross operator shown in figure 4.7. However, the Sobel kernel is better suited for detecting diagonal edges, which appear more nuanced than in the Roberts cross results.

Figure 4.9: Stochastic Sobel for varying bit-stream lengths (4, 16, 64, 256)

4.2.4 Stochastic Median Filter

In the previous chapter we proposed a new type of stochastic comparator, shown in figure 3.3, for SNs in unipolar format with SCC +1. Figure 4.10 shows the result of the circuitry from figure 3.2 with the new comparator; the original image is shown on the left and the median-filtered image on the right. The circuitry used for median filtering may be extended with additional inputs to allow for a stochastic bubble sort. The median filter can not only be used as a kernel within an image, but also to find the median between a series of pixels across different images. This behaviour may be used to find the background image in video footage.


4.2.5 Stochastic Frame Difference

In order to define a ROI for digital circuitry, we use the stochastic frame difference algorithm described in section 3.3.1. By multiplexing the pixels of a frame difference together we can define super-pixels, which can be used to determine a ROI. Figure 4.11 shows how the pixels in a frame difference can be multiplexed together to define a ROI. The size that super-pixels need to be to define a ROI will vary between applications, since this is partly dependent on the resolution of the video footage. Another advantage of defining a ROI in this manner is that multiple-object tracking becomes highly parallelizable, since it is possible to have a dedicated thread for every super-pixel. In addition, one may also save costs by directly performing blob detection on the super-pixel values; this may prove useful in real-time environments where trade-offs need to be made between processing costs and accuracy.

Figure 4.11: Multiplexing frame difference super-pixels

4.3 Stochastic Video Processing

Figure 4.12 shows the results of the proposed blob detection algorithm between the top two images. For this figure ordinary SC was used, with a high threshold of 60 and a low threshold of 15 in combination with a minimum neighbourhood size of 50. The figure shows how the precision of object tracking varies for different stream lengths. The sensitivity in the first few images may be reduced by defining algorithms for dynamic thresholding. The accuracy may be increased further through the usage of an edge detection circuit on the frame difference image for higher contrast, as described in figure 3.8. The figure demonstrates that we find more BBs when using shorter stream lengths, thus acquiring less accurate ROI. For shorter stream lengths it may be useful to use super-pixels as an indicator of the ROI rather than the BBs. The figure also shows that longer stream lengths, even for ordinary SC, are able to produce precise ROI using BBs.


CHAPTER 5

Final remarks

5.1 Conclusion

In recent years the design challenges for integrated circuits have shifted to the resource consumption of circuits. Classical processing paradigms, such as stochastic processing, historically provide energy-efficient alternatives to digital computers and offer solutions to modern processing challenges. We have demonstrated the low gate-level cost of stochastic circuitry and the relatively small propagation delays of these circuits. Combined with the native error tolerance inherent to SNs, this makes them an efficient alternative to digital logic circuitry. For the purposes of implementing stochastic video processing, we have looked at the different variations of stochastic processing and at how SNs may be interpreted in different ways for such circuitry. We also performed numerical tests on the accuracy of stochastic algorithms compared to conventional binary implementations and examined the numerical properties of SNs. We also looked at how the probabilities of SNs relate to their measure of autocorrelation and variance. A particular look was taken at bundle processing and how its PP may be used for image processing purposes. Circuitry was proposed for a new Sobel edge detection chip, a stochastic comparator, an image difference, and a median filter, which make use of the effects of stochastic correlation. Lastly, we proposed hybrid circuitry that uses both stochastic and digital logic in order to efficiently track objects. From our research it follows that SC is able to provide answers to all three bottlenecks that digital systems face: the power wall is addressed through the efficient gate-level arithmetic that SC provides; the ILP wall is answered by the high level of parallelism that stochastic circuitry offers, due to its small size; and the memory wall may be tackled through the usage of continuous memory.
With more research into the design of stochastic circuits, stochastic computing may provide a suitable alternative to digital processing.

5.2 Future Research

Although stochastic computing is an old paradigm, many of its problems are yet to be addressed. In the following section we point out some of the open problems that remain in the field of SC.

One of the problems that came to light when researching stochastic video processing is the need for stochastic memory circuitry. Currently, SNs are stored by converting them to binary [35]; for the purposes of this paper, a digital image buffer is used to store the values of previous frames. However, since the circuitry itself is made for stochastic operations, the buffer in turn needs to be converted back to SNs for every frame. Being able to store SNs directly in memory would significantly reduce the cost of this kind of circuitry. One of the problems with storing SNs is that the cost of storing an SN grows exponentially with its precision. Recent research into memristors [22] and magnetic-tunnel junction devices [32] suggests that these may be used to implement efficient stochastic memory. Efficient stochastic memory would, for example, open up the field of linear algebra using SC. Currently, the kinds of computations required for linear algebra are not viable, since the continuous inputs on which SC computes are subject to change during computation.
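The conversion round trip described here — counting the ones of an SN to store it in binary, then regenerating a stream from the stored value — can be sketched as follows. The helper names are hypothetical, and a simple software PRNG stands in for the hardware randomization circuitry.

```python
import random

def sn_to_binary(stream):
    """Store a unipolar SN as the count of ones in the stream."""
    return sum(stream)

def binary_to_sn(count, length, rng=random):
    """Regenerate a unipolar SN with probability count/length by
    comparing the stored binary value against a random number,
    as in a standard binary-to-stochastic converter."""
    return [1 if rng.random() < count / length else 0 for _ in range(length)]

stream = [1, 0, 1, 1, 0, 1, 0, 1]                    # encodes p = 5/8
stored = sn_to_binary(stream)                        # 5, i.e. p = 5/8
restored = binary_to_sn(stored, len(stream), random.Random(0))
print(stored, restored)
```

Note the cost asymmetry: the stored binary value needs only a few bits, but each additional bit of SN precision doubles the stream length, which is the exponential storage cost mentioned above.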

One of the weaknesses of SC is the difficulty of designing stochastic circuitry. Recent research [3] proposed the use of spectral transforms to design combinational stochastic circuits. However, many stochastic circuits rely on sequential logic for their functionality. Steps have been made in the automatic synthesis of sequential stochastic circuits [37], but this approach is still quite limited. Further research may also be done into the application of machine learning to stochastic circuit design.
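Until such synthesis tools mature, candidate combinational designs are typically checked by simulation. A minimal sketch, assuming independent unipolar streams: a 2-to-1 multiplexer with a p = 0.5 select stream realizes the standard stochastic scaled adder, and its simulated output probability should converge to the scaled sum of its inputs. The function names and stream length are illustrative choices.

```python
import random

def unipolar_sn(p, length, rng):
    """Generate a unipolar SN: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def simulate_scaled_adder(pa, pb, length=8192, seed=0):
    """Feed two independent unipolar SNs through a MUX with a p = 0.5
    select stream; the output probability should approach (pa + pb) / 2."""
    rng = random.Random(seed)
    a = unipolar_sn(pa, length, rng)
    b = unipolar_sn(pb, length, rng)
    s = unipolar_sn(0.5, length, rng)
    out = [ai if si else bi for ai, bi, si in zip(a, b, s)]
    return sum(out) / length

print(simulate_scaled_adder(0.5, 0.3))  # close to (0.5 + 0.3) / 2 = 0.4
```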

Another aspect of SNs is that they share similarities with neurological signals. The potential of stochastic circuitry for medical applications remains relatively unexplored. It may be possible to create SC circuits that are directly able to interact with neural signals. This property may prove invaluable in medical applications and research, and has yet to be studied extensively.

Further research may be done into the trade-offs between accuracy and power consumption. For the circuitry proposed in this paper, trade-offs can be made between the size of super-pixels, the length of bit-streams, and the amount of randomization circuitry used. For real-world stochastic circuitry it is important that informed decisions can be made regarding these trade-offs, and that a model is developed to support such decisions. Finally, most SC designs, including the ones in this paper, are verified through digital simulation or FPGA emulation. Designing circuitry in this fashion relies heavily on the assumptions made by whatever tool is used to emulate the behaviour of the stochastic circuitry. There is a pressing need for stochastic chips to be fabricated in order to provide real-world data regarding their performance and shortcomings.
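As a toy illustration of one such trade-off, the experiment below measures the mean absolute error of stochastic multiplication (an AND gate on independent unipolar SNs) as the stream length grows. Since the output is a Bernoulli estimate, the error shrinks roughly as 1/sqrt(n), so each extra bit of accuracy costs a doubling or more of stream length. The setup is purely illustrative and does not model any specific circuit from this paper.

```python
import random

def multiply_error(length, trials=200, seed=0):
    """Mean absolute error of AND-gate stochastic multiplication over
    `trials` random operand pairs at a given stream length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pa, pb = rng.random(), rng.random()
        a = [rng.random() < pa for _ in range(length)]
        b = [rng.random() < pb for _ in range(length)]
        est = sum(x and y for x, y in zip(a, b)) / length
        total += abs(est - pa * pb)
    return total / trials

for n in (64, 256, 1024, 4096):
    print(n, round(multiply_error(n), 4))  # error roughly halves per 4x length
```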
