
Non-parametric k-NN background model

weight πn, and the region where a sample has influence is controlled by D. Too large a D leads to an oversmoothed solution in which close structures are merged together; too small a D leads to a very noisy and spiky estimate of the density function. The parameter D is called the smoothing factor or the bandwidth. While in practice the form of the kernel K has little influence, the choice of D is critical [1, 28]. However, this problem is easy to avoid by using another standard non-parametric approach: the k-nearest-neighbors (k-NN) density estimate. The logic of the k-NN estimate is in a sense opposite to that of the kernel estimates: the k-NN estimate is obtained by increasing D until a certain number of samples is covered by the kernel. In this way we get larger kernels in areas with few samples and smaller kernels in densely populated areas, which is a desirable property. Usually the uniform kernel is used. The required kernel size, defined by its diameter D, is then used as the density estimate:

\[
p(\tilde{x}) \sim \frac{1}{D^d}
\]

where d is the dimensionality of the data. Using the k-NN estimates as density estimates leads to some practical problems, but using them for classification purposes, as needed here, is straightforward and often done [2]. The only parameter we need to set is the number k of nearest samples.

One nearest neighbor is often used. To be more robust to outliers, we use k = [0.1K].
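To make the construction concrete, a minimal sketch in Python with NumPy is given below; the function name knn_density and the unnormalized form of the result are our own choices, since the text only specifies the proportionality p(~x) ∼ 1/D^d:

```python
import numpy as np

def knn_density(x, samples, k):
    """k-NN density estimate: grow the uniform kernel's diameter D until
    it covers the k nearest samples; the density is proportional to 1/D^d."""
    K, d = samples.shape
    # Euclidean distances from x to all stored samples
    dists = np.linalg.norm(samples - x, axis=1)
    # diameter of the uniform kernel that just covers the k nearest samples
    D = 2.0 * np.sort(dists)[k - 1]
    # unnormalized estimate p(x) ~ 1 / D^d
    return 1.0 / D**d
```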

4.4.2 Modifications for the background subtraction task

Here we propose a practical background subtraction algorithm based on the non-parametric k-NN density estimate.

Time and memory efficient

The number of samples K in the model will usually be much larger than the number of clusters used in the Gaussian mixture. For example, for a moderate α = 0.001 we need K = 2301 samples, which is far too many, since it means keeping all these samples for each image pixel. However, if we do not include samples from every incoming frame, we can still cover a wide time span with a reasonable total number of samples. Following this idea, we also note that instead of changing the weights πn it is possible to approximate the exponential envelope by appropriately changing the frequency with which new samples are added to the model. In figure 4.1 we present an approximation with three steps for α = 0.001. The samples from the past are divided into

Figure 4.1: Approximation of the exponential envelope (x-axis: number of frames before the current frame; y-axis: value of the exponential envelope and its three-step approximation, with steps corresponding to Kshort, Kmid and Klong).

three groups, where each group accounts for 30% of the total area under the envelope (the last 10%, i.e. the oldest samples, are not considered, as mentioned before).

The total number of frames per group can be calculated as:

\[
\begin{aligned}
K_{\text{short}} &= \left[\log(0.7)/\log(1-\alpha)\right] \\
K_{\text{mid}} &= \left[\log(0.4)/\log(1-\alpha) - K_{\text{short}}\right] \\
K_{\text{long}} &= \left[\log(0.1)/\log(1-\alpha) - K_{\text{mid}} - K_{\text{short}}\right]
\end{aligned}
\]

If we decide to use, for example, N = 10 samples per group, we have a total of K = 3N = 30 samples. A new sample is added to the short-term group every Kshort/N frames. The mid-term group gets the oldest sample from the short-term group every Kmid/N frames, and the long-term group gets the oldest sample from the mid-term group every Klong/N frames. In this way the recent history is most densely sampled, while some older samples are still kept in the mid-term and long-term groups.
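For concreteness, the group sizes for α = 0.001 follow directly from the formulas above (a short Python check; the ceiling rounding is our assumption for the [·] brackets):

```python
import math

alpha = 0.001
N = 10  # samples per group, so K = 3N = 30

# step lengths of the three-step approximation of the exponential envelope
K_short = math.ceil(math.log(0.7) / math.log(1 - alpha))                    # ~357
K_mid   = math.ceil(math.log(0.4) / math.log(1 - alpha) - K_short)          # ~559
K_long  = math.ceil(math.log(0.1) / math.log(1 - alpha) - K_mid - K_short)  # ~1386

# a new sample enters each group roughly after this many frames
print(K_short // N, K_mid // N, K_long // N)   # -> 35 55 138
```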

The background subtraction is now implemented in a simple way. For a specified threshold TbNP, the new sample ~x(t+1) belongs to the background if:

\[
\left( \sum_{n=1}^{K} \left( D_n < T_{bNP} \right) \right) > k
\]

where the squared distance of the new sample ~x(t+1) from the model sample ~x(n) is calculated as Dn^2 = (~x(n) − ~x(t+1))^T (~x(n) − ~x(t+1)). The notation assumes that the result of the operation (Dn < TbNP) is equal to 1 if it is true and 0 if it is false. It is important to note that as soon as the sum exceeds k we can stop the calculation. Usually many pixels belong to the background, so we rarely need to go through all the samples.
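As a minimal sketch (assuming NumPy, and with our own names is_background and tau_b for TbNP), the test with early termination could be implemented per pixel as follows; the comparison is done on squared distances to avoid the square root:

```python
import numpy as np

def is_background(x_new, samples, tau_b, k):
    """Count model samples closer than T_bNP to the new value and
    stop as soon as more than k of them are found."""
    count = 0
    for x_n in samples:                # samples: (K, d) array
        diff = x_n - x_new
        if diff @ diff < tau_b**2:     # D_n < T_bNP, in squared form
            count += 1
            if count > k:              # early termination
                return True
    return False
```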

Adaptive

Again, the influence of the old data is controlled by the constant α. The approximation with three groups of samples was described previously. Parametric approaches are usually sensitive to the initial conditions; the non-parametric approach does not have these problems.

Automatic learning

The heuristic described in section 4.2, used to suppress the influence of the intruding objects on the background model, can be implemented in the following way. Besides the samples ~x(1), ..., ~x(K) from the model, we also keep a set of corresponding indicators b(1), ..., b(K). The indicator b(n) indicates whether the saved sample ~x(n) belongs to the background model. All the samples give the pb+f estimate, and only the samples with b(n) = 1 give the background model:

\[
\left( \sum_{n=1}^{K} b^{(n)} \left( D_n < T_{bNP} \right) \right) > k
\]

When we add a new sample ~x(t+1) to the model the corresponding indicator is set to:

\[
b =
\begin{cases}
1, & \text{if } \left( \sum_{n=1}^{K} b^{(n)} (D_n < T_{bNP}) \right) > k \ \text{ or } \ \left( \sum_{n=1}^{K} (D_n < T_{bNP}) \right) > T_{fNP} \\
0, & \text{otherwise}
\end{cases}
\]
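A corresponding sketch of the indicator update, under the same assumptions and naming as above (in practice the distances Dn are computed once and reused for both sums):

```python
import numpy as np

def new_indicator(x_new, samples, b, tau_b, k, tau_f):
    """b = 1 for the newly added sample if it is already explained by the
    background samples (k-NN test over the b = 1 samples only) or if it has
    become a frequent value overall (more than T_fNP close samples)."""
    d2 = np.sum((samples - x_new)**2, axis=1)   # squared distances D_n^2
    close = d2 < tau_b**2                       # (D_n < T_bNP) as booleans
    k_b  = int(np.sum(close & b))               # close background samples, b is a bool array
    k_bf = int(np.sum(close))                   # all close samples
    return 1 if (k_b > k or k_bf > tau_f) else 0
```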

If a new object remains static, it will take some time before it can become part of the background. Since the background is occluded, the number of samples in the model corresponding to the object will be increasing, and we can expect these new samples to be close to each other. If the neighborhood defined by TbNP is large enough, we can use TfNP = [0.1K] as the value corresponding to TfGM = 0.1 discussed for the Gaussian mixture. In a similar way, after approximately log(1 − TfNP/K)/log(1 − α) frames the threshold TfNP will be exceeded and the new samples start to be included in the background model (the b-s are set to 1). In the Gaussian mixture case the new object is usually represented by an additional cluster that is immediately included into the background model. However, since the data is not clustered for the non-parametric method, it will take an additional log(1 − k/K)/log(1 − α) frames before the pixels corresponding to the object start to be recognized as background.

For α = 0.001, TfNP = [0.1K] and k = [0.1K] it will take approximately 2 · 105 = 210 frames.
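The arithmetic can be checked directly; both stages take the same number of frames because TfNP = [0.1K] and k = [0.1K]:

```python
import math

alpha = 0.001
ratio = 0.1   # T_fNP / K = k / K

# frames per stage: log(1 - ratio) / log(1 - alpha)
t_stage = math.log(1 - ratio) / math.log(1 - alpha)
print(round(t_stage), 2 * round(t_stage))   # -> 105 210
```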

4.4.3 Practical algorithm

The whole practical algorithm is given by:

Input: the new data sample ~x(t+1), the samples from the model ~x(1), ..., ~x(K), and the corresponding indicators b(1), ..., b(K). For initialization we put the values from the first frames into the model and set all b-s to 1.

• background subtraction:

— kb = 0, kb+f = 0

— for n = 1 to K (= 3N) (go through all the samples)

∗ kb+f = kb+f + (Dn < TbNP), with Dn^2 = (~x(n) − ~x(t+1))^T (~x(n) − ~x(t+1)) (we suggest TbNP = 4σ0)

∗ if (b(n)) kb = kb + (Dn < TbNP)

∗ if kb > k it is a background pixel (we can stop further calculation) (we use k = [0.1K])

• background update:

— after every Klong/N frames remove the oldest sample from the long-term group and add the oldest sample from the mid-term group

— after every Kmid/N frames remove the oldest sample from the mid-term group and add the oldest sample from the short-term group

— after every Kshort/N frames remove the oldest sample from the short-term group and add the new sample ~x(t+1). The corresponding indicator b is set to 1 if ((kb > k) or (kb+f > TfNP)), otherwise b = 0 (we use TfNP = [0.1K])

Output: the updated set of samples ~x(1), ..., ~x(K) and the corresponding indicators b(1), ..., b(K).

We suggest TbNP = 4σ0 as a reasonable threshold value, where σ0 is the mean variation of the pixel values mentioned previously, equation (4.4).
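Putting the pieces together, one possible per-pixel implementation of the whole algorithm could look like the sketch below. This is our own illustration, not the thesis code: the class name, the ring-buffer bookkeeping for the three groups, and the exact moment at which samples cascade between groups are assumptions on top of the description above.

```python
import math
import numpy as np

class KNNBackgroundModel:
    """Per-pixel k-NN background model with three sample groups
    (short-, mid- and long-term), K = 3N samples in total."""

    def __init__(self, x0, sigma0, n_per_group=10, alpha=0.001):
        self.N = n_per_group
        self.K = 3 * self.N
        self.tau_b = 4.0 * sigma0          # T_bNP = 4*sigma_0, equation (4.4)
        self.k = round(0.1 * self.K)       # k = [0.1K]
        self.tau_f = round(0.1 * self.K)   # T_fNP = [0.1K]
        # step lengths of the exponential-envelope approximation
        self.K_short = math.ceil(math.log(0.7) / math.log(1 - alpha))
        self.K_mid = math.ceil(math.log(0.4) / math.log(1 - alpha) - self.K_short)
        self.K_long = math.ceil(math.log(0.1) / math.log(1 - alpha)
                                - self.K_mid - self.K_short)
        # initialization: fill the model with the first value, all b = 1
        self.samples = np.tile(np.atleast_1d(np.asarray(x0, float)), (self.K, 1))
        self.b = np.ones(self.K, dtype=bool)
        self.t = 0

    def process(self, x_new):
        """Classify the new value and update the model; returns True
        for a background pixel."""
        x_new = np.atleast_1d(np.asarray(x_new, float))
        d2 = np.sum((self.samples - x_new) ** 2, axis=1)   # D_n^2
        close = d2 < self.tau_b ** 2                       # D_n < T_bNP
        k_bf = int(np.sum(close))                          # k_{b+f}
        k_b = int(np.sum(close & self.b))                  # k_b
        is_bg = k_b > self.k

        # groups live in one array as [short | mid | long], newest sample first
        self.t += 1
        N = self.N
        if self.t % max(1, self.K_long // N) == 0:
            # oldest mid-term sample cascades into the long-term group
            self._push(2 * N, 3 * N, self.samples[2 * N - 1].copy(), self.b[2 * N - 1])
        if self.t % max(1, self.K_mid // N) == 0:
            # oldest short-term sample cascades into the mid-term group
            self._push(N, 2 * N, self.samples[N - 1].copy(), self.b[N - 1])
        if self.t % max(1, self.K_short // N) == 0:
            # the new sample enters the short-term group; set its indicator
            b_new = (k_b > self.k) or (k_bf > self.tau_f)
            self._push(0, N, x_new, b_new)
        return is_bg

    def _push(self, lo, hi, sample, flag):
        # drop the oldest sample of group [lo:hi) and insert `sample` as newest
        self.samples[lo + 1:hi] = self.samples[lo:hi - 1].copy()
        self.b[lo + 1:hi] = self.b[lo:hi - 1].copy()
        self.samples[lo] = sample
        self.b[lo] = flag
```

For a grayscale pixel, constructing KNNBackgroundModel(127.0, sigma0) once and calling process(value) for every frame would follow the steps listed above; the vectorized counting omits the early termination of the loop version for brevity.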

Initially we need to select the number of samples per group N and reserve memory for the K = 3N samples. For the Gaussian mixture we needed to set an upper limit on the number of components, but it had no important influence on the results. Here, however, the number of samples K is an important parameter: more samples lead to more accurate results, but also to slower performance, and a large amount of memory is needed for the model.