
Bachelor Informatica

Quadtree based image compression

Bas Weelinck

June 17, 2015

Supervisor(s): Dick van Albada

Informatica
Universiteit van Amsterdam


Contents

1 Introduction
1.1 The basics of compression
1.1.1 Communication channels
1.2 Types of compression
1.3 Existing technologies
1.4 Research question
1.5 Text structure

2 The abstract compression system
2.1 Introduction
2.2 Data structure
2.3 Algorithm components
2.4 Requirements for compression
2.5 Alternative correction methods
2.5.1 Colour models
2.5.2 4th pixel correction

3 Algorithm implementation
3.1 Interpolation methods
3.2 Entropy coding
3.2.1 Prefix codes
3.2.2 Arithmetic coding

4 Experimentation and results
4.1 Obtaining results
4.2 Compression system parameters
4.3 Best and worst results
4.4 Entropy coder performance

5 Discussion
5.1 Conclusion
5.2 Related work
5.2.1 JPEG-LS
5.2.2 PNG
5.3 Future work
5.3.1 Potential as very large image format
5.3.2 Partial web fetching and resizing in clients
5.3.3 Storing rasterised vector graphics


Abstract

The compression of pictures is an active research topic. We present an approach to compressing pictures using quadtree data structures and pyramid image processing techniques. Furthermore, an alternative interpolation method is presented that borrows from the theory of processes in physics simulation. We describe a framework called Quadtree Imaging, or QI for short, that can compress and decompress images using the techniques presented. The framework currently achieves compression ratios that are sometimes close to the state of the art, while leaving room for further improvement.


CHAPTER 1

Introduction

With the rise of personal computing and the internet, the exchange of images has become one of the major uses of the communication systems that exist today. Since bandwidth is finite, this also means the compression of images is important. Compression of images means decreasing file size while leaving the user experience mostly unaffected.

1.1 The basics of compression

Compression bases itself on the prediction of information. Put differently, a file containing a lot of redundant information can be reduced in size by reorganising the information in a way that is less redundant.

Predictability of information can be illustrated by thinking of a random process. For instance, flipping a coin multiple times will give you either heads or tails, or, thought of as binary output, a 1 or a 0. Given that this is a fair coin, it should not be possible to reliably predict the next outcome, and so we are left with no other way than to simply record which side fell on top.

When the coin is biased, however, things become different. If I have a coin that produces heads, a 1, 90 percent of the time, then instead of writing down every single 1 I can record the runs of 1's I encounter, and what I will have written down is shorter than recording every single result separately. When this example is taken to the extreme, we can imagine a coin that always lands on heads, no matter what. Keeping track of every result becomes pointless as we already know the result is heads. In other words, every coin flip produces no new information.
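To make the biased-coin intuition concrete, the short sketch below run-length encodes a sequence of flips; it is only an illustration of the idea, and the function name is a hypothetical stand-in rather than part of the QI system.

    from itertools import groupby

    def run_length_encode(flips):
        # Collapse consecutive identical outcomes into (value, count) pairs.
        return [(value, sum(1 for _ in group)) for value, group in groupby(flips)]

    # A heavily biased coin: long runs of 1 collapse into a handful of pairs.
    flips = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
    print(run_length_encode(flips))   # [(1, 5), (0, 1), (1, 8), (0, 1), (1, 3)]

The closer the coin is to always producing heads, the fewer pairs are needed, mirroring the argument above.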

1.1.1 Communication channels

A file or any piece of information can be regarded as a series of symbols. Those symbols can be sent over a communication channel. Furthermore, depending on the language that we speak we might interpret the symbols in different ways. Depending on our interpretation, some symbols will be more meaningful than others. Just as when you are reading this text right now: you will probably be able to predict at various points which letters or even whole words will likely show up next.

This ability to predict information makes the information that is expected less important since it becomes essentially redundant. As stated before, without interpretation a message carries no meaning. Since interpretation constructs the information contained in a message it is also not meaningful to try to quantify the information contained in the message without “understanding” it first.

In his 1948 paper, Shannon introduces a formula that quantifies the amount of information contained in a message, given a model that tries to understand the information [8]. To restate this in mathematical terms: given a model that interprets a sequence of symbols, we are able to provide the probability mass function of the following symbols. By obtaining this function we know the likelihood of the different symbols and therefore their information contents.


Equation (1.1) is the simplified form, which yields for a given discrete variable X with probability mass function p(x) the average symbol entropy in bits. Note that the logarithm base can be changed to obtain other units of entropy. In this paper we will stick to bits.

H(X) = −∑_i p(x_i) log2(p(x_i))    (1.1)

Given a message Y and a next symbol obtained from our information source, the entropy becomes sensitive to the context of the message Y as demonstrated in formula (1.2). The entropy contained in the various symbols produced by our information source is dependent upon the message we have received thus far. In other words, the expected probability mass function and thereby our expectation or prediction of the next chunk of information changes based on the past data.

H(Y, X) = −log2(p(X | Y))    (1.2)
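As a concrete illustration of equation (1.1), the sketch below computes the average symbol entropy of the coins described earlier; it is an added example, not code from the QI implementation.

    from math import log2

    def entropy(probabilities):
        # H(X) = -sum over x of p(x) * log2(p(x)), in bits per symbol.
        return -sum(p * log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit per flip
    print(entropy([0.9, 0.1]))   # 90/10 biased coin: about 0.469 bits per flip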

1.2 Types of compression

Generally speaking there are two different types of compression: compression that maintains all original information and can therefore perfectly reconstruct it, and compression that removes some of the information based on an assumption of its usefulness to the end-user. The former type is called lossless compression, as there is no loss of information. The latter is referred to as lossy compression.

In the area of image compression a lot of technologies have been developed. The next section will take a look at some of them.

1.3 Existing technologies

One of the best known compression systems and formats is JPEG. JPEG is an acronym for “Joint Photographic Experts Group”. JPEG’s main use is the storage of photos and other true-to-life imagery. The system allows for different levels of compression. At a low level of compression JPEG first disposes of information that should not be noticeable to the human eye. As the compression level increases, artifacts become ever more noticeable.

JPEG is useful for storing images that do not contain sharp edges, but rather blurry borders and object surfaces containing gradients. As JPEG achieves compression by discarding information it is a lossy compression system.

A well known lossless compression system is PNG. PNG is an acronym for “Portable Network Graphics”. While PNG can, of course, also be used for storing photos, due to its lossless properties it is particularly useful for storing graphics containing sharp edges and small details. This is exploited, for instance, by websites that use rasterised vector graphics or otherwise make use of computer generated imagery containing small details.

1.4 Research question

In this paper we examine the performance of a pyramid based image system that only stores prediction errors. The system is compared to conventional image compression methods. Furthermore, an alternative interpolation method is introduced to improve upon the bilinear interpolation used in this image compression system.

Finally, the influences of various interpolation techniques, alternative colour models, methods for constructing prediction errors and different entropy coding systems are evaluated. A complete list of the methods applied is provided in section 4.2.


1.5 Text structure

This paper is divided into chapters, each dedicated to its own subject. They will now be briefly introduced.

Chapter 2 – The abstract compression system

This chapter introduces the abstract pyramidal compression system. The system is subdivided into separate parts. This chapter purely describes the theory of such a system.

Chapter 3 – Algorithm implementation

The implementation of the system discussed in chapter 2 is explained, including the challenges that were met.

Chapter 4 – Experimentation and results

Chapter 4 presents a comparison of the compression performance of the reference implementation with various conventional image compression systems. The effects caused by changing the system parameters in relation to the type of image compressed are also examined.

Chapter 5 – Discussion

In this chapter we wrap up. We look at what has been achieved and what the results tell us. Do they meet our expectations? What can be learned from them? Furthermore we take a look at other compression systems and how this system relates to them. Although the system presented in this paper is capable of compressing images, a lot of ideas and improvements have not been examined or implemented. Therefore we also list possible uses for this particular compression system and various improvements that can be made to it.


CHAPTER 2

The abstract compression system

2.1 Introduction

This chapter outlines a compression system for images called QI, an acronym for “Quadtree Imaging”. First we describe a framework that can be used as a template for an image compression system into which various components can be incorporated. In order to construct this template we first outline the various steps required to compress and later decompress an image. Later on we present various components that can be used with this template to achieve a complete compression system. Finally we examine the effects these components will likely have on the information processed.

The design of the template is based on a pyramid representation where multiple levels of detail of the same image are contained on the various levels of the pyramid. In our implementation valid images can be constructed at each level of the pyramid using classic and non-standard interpolation methods. Furthermore, the system constructs and stores difference pyramids. These difference pyramids are needed due to the imperfect reconstruction by the interpolation algorithms used. Once constructed, however, these difference pyramids are the only information required to reconstruct the original image.

2.2 Data structure

The structure of image data in the application is somewhat reminiscent of texture MIP-maps. Let's start with the basic premise that all images are power-of-two sized [2].

The picture tree starts with the root pixel. This is a single pixel of uncompressed colour data. A next level of 2 by 2 pixels is then extracted from this root pixel by employing an interpolation scheme and correcting this with colour deltas. The process is then repeated resulting in a 4x4 image. We continue until we reach the final resolution.

Without any further additions this algorithm is not capable of representing images of resolutions other than X by X where X is a power of 2. Of course one could simply add a black border and crop the image during decompression, but this is a rather crude solution. A much more computationally intensive solution employs sub-pixel zoom in order to allow arbitrary resolutions. To relax the square constraint the root pixel is changed into a root line, which approximates the aspect ratio of the original image.

To illustrate the method used to achieve any width or height, consider a full sized image of size X by Y. A dimension X or Y is either odd or even. If the dimension is odd we will have to perform sub-pixel minification, since the original image contains a single line too many when mapping two lines to a single line. If the dimension is even we can simply map every two lines to a single line of the smaller image. We are essentially bit shifting the dimension variables, where the LSB tells us whether a sub-pixel algorithm is needed or whether we can use an optimised version of the minifier.
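The dimension handling just described can be sketched as follows. This is an illustrative reading of the scheme rather than the QI source: the helper name is hypothetical, it halves by rounding up, and it runs all the way down to a single pixel, whereas the real system stops at a root line approximating the aspect ratio.

    def plane_dimensions(width, height):
        # List (width, height) for every plane, from the full image downwards.
        # An odd dimension (LSB set) marks a step that needs sub-pixel minification;
        # an even one can simply map every two lines to a single line.
        planes = [(width, height)]
        while width > 1 or height > 1:
            width = max(1, (width + 1) // 2)
            height = max(1, (height + 1) // 2)
            planes.append((width, height))
        return planes

    print(plane_dimensions(640, 480))
    # [(640, 480), (320, 240), (160, 120), (80, 60), (40, 30), (20, 15),
    #  (10, 8), (5, 4), (3, 2), (2, 1), (1, 1)]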


(a) This version of the pixeltree illustrates how pixels are mapped from higher level pixels. In the case of images with dimensions a multiple of 2, every group of four pixels is mapped to a single pixel in the lower plane.

(b) The pixel tree represented in pixels. Every lower level plane has a quarter of the pixels of the higher level plane and is essentially a thumbnail of the higher level plane.

Figure 2.1: This figure demonstrates the processing of pixels within the QI system. In this case the highest planes represent the original image and the lower planes are bilinear interpolated smaller versions. Together a set of planes form what is called the pixeltree of an image. The single pixel at the bottom is called a root pixel.

2.3 Algorithm components

The compression algorithm can be subdivided into four distinct components: deconstructors, reconstructors, correctors and entropy coders.

• Deconstructors

These are functions that make the layers smaller until the root pixel is found. These interpolation functions not only affect the compression achieved but also the quality of smaller images in the tree which might be viewed as thumbnails.

• Reconstructors

Reconstruction is in essence the interpolation used to magnify a smaller image. The quality of higher level reconstruction by these functions has the biggest impact on the achievable compression ratio; e.g. if the deconstructor creates a poor quality image but the reconstructor manages to construct the higher level better because of it, the compression will be higher for that layer. This will become clear in the next section.

• Correctors

A correction function is responsible for extracting entropy from the predicted image (reconstruction) and the original layer (deconstruction); or, in case of the highest layer, the original unmodified image. Correctors are not completely separable from entropy coders as the format of the entropy limits our choice of coder. A corrector can for instance correlate individual colour channels, which changes the resulting entropy format.

• Entropy coders

Entropy coders encompass classic compression algorithms and perform the actual compression; in fact no data is compressed until this stage.

In order to successfully achieve compression, the prediction models just presented should cause the resulting data to exhibit a lot of repetitions and/or a small number of difference values that make up the majority of the data. The more prevalent those properties are, the greater the compression achieved.



Figure 2.2: QI encoding flow of data: The original image data enters the top level of the pixeltree, and gets sent to the deconstructor which downscales the image data. We call every level of resolution an image plane. These are stored in the pixeltree and then processed again and again until we reach a single pixel called the root pixel or plane. Every deconstructed plane is also sent to a reconstructor which scales it up again. All different resolution pairs are finally sent to the corrector which produces correction maps containing a description of the imperfections of the reconstructions. Lastly, these maps are encoded and stored in a file together with the colour value of the root pixel.


Decompressing an image processed using the machinery just described requires the entropy coder and corrector to reverse their jobs, and applies the same reconstructor as was used during compression.

2.4 Requirements for compression

This compression system transforms an image into a pixel quadtree, which reduces size fourfold at every level. Thus we can begin drawing certain conclusions about the impact that the compression of difference images has within the compression system.

Every level of the tree contains a number of difference pixels, or deltas, equal to the size of that level’s pixel plane. The highest level, which has the size of the original image, therefore contains deltas at a 1.0 delta-to-pixel ratio: an equal number, a difference stored for every pixel. The second highest level only has a quarter of that number of deltas, a ratio of 0.25.



Figure 2.3: QI decoding flow of data: Largely a reverse of the encoding process. Stored correction maps are decoded and sent to the corrector. The pixeltree is initialised with the root pixel. The reconstructor is then run on the root pixel. The resulting upscaled image plane is processed by the corrector using the correction maps. The correction maps inform the corrector of the imperfections in the reconstruction allowing it to restore the image plane to an exact replica of the original. This perfect reconstruction is added to the pixeltree after which the reconstructor starts again. This process continues until an exact copy of the original image at full resolution is obtained.

It becomes apparent that for theoretical images of infinite size the ratio of deltas needed is the result of the following infinite series:

∑_{n=0}^{∞} 1/4^n = 1.0101…(base 2) = 4/3    (2.1)

Deltas do not necessarily require more storage than the 8-bit values they correct. If the 8-bit intensity data is regarded as modular numbers, the numbers become pivots that the deltas push down or up. Put differently, by adding two 8-bit numbers and exploiting overflow, I can manipulate the second 8-bit number to obtain any 8-bit number as a result, regardless of the value of the first number.
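A minimal sketch of the modular argument, assuming 8-bit unsigned channel values; the function names are illustrative only.

    def wrap_delta(original, predicted):
        # 8-bit modular delta: whatever the prediction was, adding the delta
        # back (mod 256) restores the original value exactly.
        return (original - predicted) % 256

    def apply_delta(predicted, delta):
        return (predicted + delta) % 256

    predicted, original = 250, 3                 # the prediction overshoots past the wrap-around
    delta = wrap_delta(original, predicted)      # 9
    assert apply_delta(predicted, delta) == original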

On the other hand, uncompressed deltas for a non-modular intensity space are the difference between the predicted value and the actual value. Since these deltas have to be able to represent values from −255 to 255 or 511 different values in total, they require 9 bits of storage.

8-bit deltas obviously offer the advantage of containing fewer bits. This however comes at the price of representing more entropy per bit and essentially giving the entropy coder less information to work with. 9-bit deltas contain less entropy per bit, but require more base storage. However, the entropy coders might use the extra information to their advantage, offsetting the larger alphabet size. 9-bit deltas provide higher detail histograms.

We can compute the raw increase in file size this compression system has when storing tightly packed deltas and the compression required for deltas in order to achieve overall size reduction.

For 9-bit deltas:

Delta to pixel ratio · Delta storage ratio ≈ 4/3 · 9/8 = 3/2    (2.2)

Or in the case of 8-bit deltas:

Delta to pixel ratio · Delta storage ratio ≈ 4/3 · 8/8 = 4/3    (2.3)

The uncompressed delta size does not actually affect the compressed number of bits per delta required to achieve compression. This number is entirely dependent upon the delta-to-pixel ratio. Both 8-bit and 9-bit deltas must be compressed to a size of 8 · 3/4 = 6 bits per delta to break even. However, since the delta-to-pixel ratio approaches 4/3 from below, this can already be regarded as a slight reduction in size.

This compression system focuses exclusively on 9-bit deltas.

2.5 Alternative correction methods

The most basic method of correction is simply taking the differences between the reconstructed and original image planes. In case of 8-bit colour components we are left with 9-bit deltas. There are however various improvements that can be implemented at this level.

2.5.1 Colour models

One of the simple changes to the way the corrector computes deltas is the introduction of colour models. While observing raw delta maps it was noticeable that a lot of them contained little colour. This means that whenever colours do change, the components change together, with similar magnitude and direction.

In a compression system one tries to exploit any promising relations in the behaviour of data. One could of course implement a PCA-style decorrelation system for colour. But a simpler method takes just the average change of the components and then corrects for the miscalculation in individual components.

This converts RGB delta space to IRGB delta space with I the average change.
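The colour-model idea can be sketched as follows. This is a plausible reading of the IRGB description (I being the average change, followed by per-component corrections), not the exact QI implementation.

    def rgb_to_irgb(dr, dg, db):
        # Split an RGB delta into an average change I plus per-component corrections.
        i = (dr + dg + db) // 3
        return i, dr - i, dg - i, db - i

    def irgb_to_rgb(i, cr, cg, cb):
        return i + cr, i + cg, i + cb

    # A near-grey change: the corrections stay small, which suits the entropy coder.
    print(rgb_to_irgb(12, 14, 13))   # (13, -1, 1, 0)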

2.5.2 4th pixel correction

Another, colour-model-compatible change is the introduction of 4th pixel correction. Since bilinear deconstruction is the deconstructor most likely to be deployed, for images that use this deconstructor the 4 pixels in a higher plane have an average approximately equal to their respective root pixel one plane lower.

This knowledge can be used to make the prediction of every 4th pixel much more accurate by re-correcting the pixels after the other three are known to be correct.

In other words, 4th pixel correction corrects a grid of pixels in the old fashioned way but then treats every last pixel in a group of four as special, correcting it based on a value obtained by satisfying the average. Figure 2.4 illustrates the pairing of groups of pixels.

(a0 + b0 + c0 + d0) / 4 ≈ r0    (2.4)

d0 ≈ 4r0 − (a0 + b0 + c0)    (2.5)

The inexact properties come from the limited binary precision and from the average being slightly weighted in the case of images with at least one odd dimension.
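Equations (2.4) and (2.5) translate directly into a prediction step. A minimal sketch, assuming integer pixel intensities; the small delta that remains after integer rounding is what gets stored.

    def predict_fourth_pixel(a, b, c, root):
        # Predict d from the root-pixel average once a, b and c are known (equation 2.5).
        return 4 * root - (a + b + c)

    a, b, c, d = 100, 104, 98, 103
    root = (a + b + c + d) // 4                   # roughly what bilinear deconstruction produces
    predicted = predict_fourth_pixel(a, b, c, root)
    print(predicted, "vs actual", d)              # 102 vs actual 103, so a correction of 1 is stored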


Figure 2.4: A demonstration of the 4th pixel correction. The blue pixels represent underlying pixels that the pixels in the higher plane are interpolated from. The red pixels are then predicted by taking the average of the grey pixels and comparing it with the average of the blue root pixels.


CHAPTER 3

Algorithm implementation

3.1 Interpolation methods

The deconstructors and reconstructors require various methods of interpolation in order to allow the reconstruction of higher and lower level images. Since an image is always a sample of reality we cannot be sure the data can accurately describe the original full set of data. One way of understanding interpolation is as a model of an ideal system of which sampled data is available. For example, a series of numbers behaves differently when described as a waveform rather than as a set of interconnected points. Needless to say, depending on the expected behaviour of the data, there are many different methods of interpolation in use. An important implication of this is the absence of a “best” method: more computational firepower does not necessarily imply better results.

• Nearest-neighbour

NN is a classic, fast way of interpolating that is usually not useful in real world systems; it can however yield interesting results in theoretical situations or as a base reference. Interpolation probably does not get much simpler than NN.

The name says it all, nearest neighbour predicts unknown points by selecting the closest known sample and simply assigning that sample’s value to the unknown point.

In the case of our algorithm this simply means the pixels are sharp and zooming in yields a mosaic of the smaller image.

• (Bi)linear interpolation

This kind of interpolation can also be summarised in little more than the name itself. This method of interpolation works by connecting known points by drawing straight lines, thereby assigning all the intermediate unknown points a value.

Bilinear interpolation then applies this concept in 2D space by interpolating twice, hence the bi- in bilinear. First lines parallel to one axis are drawn and then all points on those lines are connected along the other axis; it does not matter which axis you start with, the end result remains the same. If you were to view the interpolation result as a height map in 3D it would look like a crystalline landscape.

Bilinear interpolation already yields much better results handling real world image data but does not handle edges well; everything is smoothed, even sharp edges.

• TVD flux-limited interpolation

Total variation diminishing, or simply TVD, is a method of handling numerical simulations dealing mostly with differential equations, such as the heat equation.

TVD has for instance been used in the simulation of gas and fluid dynamics. TVD treats samples, pixels or cells as containers of quantities of substance. This substance does not increase or decrease but can move around within the container to take various shapes.


Flux limiters then control how the substance behaves with respect to the amount of substance in the surrounding containers.

Once more, in the case of our algorithm the amount of substance is controlled by the various pixel intensities. The flux limiter then controls the shape of the substance in the pixel, thereby handling all interpolation for points within the boundaries of this pixel. By using open intervals for our pixels, all points are assigned a single unique pixel containing them. In our case the flux limiter positions a plane through the center of the pixel roof, thereby keeping the amount of substance the same but creating a linear change of intensity throughout the pixel [9]. In fact, using the right parameters, an identical result would be obtained if the flux limiter were to set 4 values at the corners of the pixel and then apply bilinear interpolation throughout the rest of the pixel. A short one-dimensional sketch of this idea follows below.
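The idea of a slope-limited, mass-preserving magnification can be sketched in one dimension as follows. This uses the classic van Albada limiter φ(r) = (r² + r)/(r² + 1) and is only an illustration of the principle, not the 2D limiter actually implemented in QI.

    def van_albada(r):
        # Classic van Albada flux limiter.
        return (r * r + r) / (r * r + 1.0)

    def limited_upscale_1d(cells):
        # Double the resolution of a 1D row of cells using a limited linear slope per cell.
        # The two children of each cell average back to the parent, so the "substance"
        # in every cell is conserved.
        out = []
        for i in range(len(cells)):
            left = cells[max(i - 1, 0)]
            right = cells[min(i + 1, len(cells) - 1)]
            forward = right - cells[i]
            backward = cells[i] - left
            r = backward / forward if forward != 0 else 0.0
            phi = max(0.0, van_albada(r))        # TVD: no slope at extrema (r <= 0)
            slope = phi * forward
            out += [cells[i] - slope / 4, cells[i] + slope / 4]
        return out

    print(limited_upscale_1d([10, 10, 40, 100, 100]))
    # [10.0, 10.0, 10.0, 10.0, 31.0, 49.0, 100.0, 100.0, 100.0, 100.0]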

3.2 Entropy coding

All systems attempting to achieve compression face the problem of entropy. Entropy in computer science and information theory is related to, and was named after, entropy in physics. It is, however, not quite the same. Entropy can be summarised as model noise. Ideally there is nothing that can be done to predict it, which means the model predicts reality as well as reality is predictable; ideally speaking there would be no noise at all, which would then require no storage.

As strange as this may sound, this means that a compression algorithm specialises in recording noise and only noise itself. The rest can be inferred based on predictions of how data is supposed to behave. This is also why heavily compressed files tend to resemble independent and uniformly distributed noise.

This algorithm also tries to eliminate all predictable aspects of the image in various passes. Entropy coding is the final pass, which assumes a stream of noise in an alphabet with a certain distribution. This distribution may or may not be allowed to change over time (the prediction of this distribution is in essence not part of the entropy coder but a parameter of it).

The entropy coder then finally outputs this stream in a binary form which should be pure noise. There exist various entropy coders, and classes of entropy coders, specialised in certain distributions, mostly trading speed for efficiency.

A fairly well known class of entropy coders are the prefix codes; in our system we examine two of them: Huffman coding and Golomb coding.

3.2.1 Prefix codes

The aptly named prefix codes are a class of entropy coding systems. Prefix codes translate every object in the alphabet being coded to a fixed binary output sequence. Prefix codes get their name from the fact that no binary output sequence coding a valid object is a prefix of any other valid code in the coding system. This means that when decoding a prefix code, encountering a valid binary sequence means we have found the object to be decoded.

Since a prefix code outputs the same sequence every time an object is encoded they are relatively fast to execute. Furthermore most prefix codes approach their distribution’s optimal information contents quite well.

They do, however, have a tendency to be easily corrupted in a harmful way. Since prefix codes output variable length sequences for the various items, even a single decode error can cause the entirety of the rest of the stream to be decoded wrongly. This means the flipping of a single bit in the stream can render whatever follows unusable.

• Huffman coding

Every prefix coding scheme can be described by a binary tree. You simply follow the branches labelled 1 or 0 using the data you want to code and when you find a leaf you have either successfully en- or decoded the information.

Huffman coding uses this concept for the construction of the codebook itself. Huffman coding is an entropy coding scheme that is optimal for alphabets where the probability mass of individual symbols approaches negative powers of 2 [7]. For its construction, Huffman coding repeatedly combines the two unlikeliest items and binds them into a binary tree node; this process also takes into account nodes created previously. This means that if the two unlikeliest items together remain one of the two unlikeliest items, they are combined again with either a node or an alphabet leaf.

The process continues until only a single node remains, which is the root of the tree. Now the codebook is complete. Huffman codes yield good results compressing various distributions, but do round all probabilities within those distributions to negative powers of two. This means that while good, the results will most likely not be optimal. Furthermore it is not always possible to use Huffman coding, for instance when the entire alphabet is not known in advance.

• Golomb-Rice coding

Golomb(-Rice) coding is a pair of prefix codes designed for handling geometric distributions. Golomb codes have been proven optimal within the class of prefix codes for handling these distributions. [6]

Geometric distributions are the discrete relative of the negative-exponential distribution. Both distributions tend to infinity. This also means Golomb coding has the nice property of being able to handle an infinitely large alphabet.

Golomb coding combines two other prefix coding schemes to achieve its result: unary coding and truncated binary coding. Unary coding is a coding scheme for an infinitely large alphabet giving Golomb its property of infinity. Truncated binary coding is a prefix code for use with uniform distributions of non-power-of-two alphabet size.

Unary coding is simple: to encode a number N in the range 0 to infinity, output N 1’s followed by a 0. To decode a number N we simply count the number of 1’s we encounter before the terminating 0. Of course the 1’s may be substituted with 0’s and vice versa; however, the idea of 0 termination is somewhat intuitive in computer science.

Unary coding is the optimal prefix code for a geometric distribution of negative powers of 2, i.e. 0 has probability 0.5, 1 has 0.25, 2 has 0.125, and so on and so forth.

Golomb codes

Truncated binary coding approximates the uniform distribution for prefix codes. It allocates approximately the same number of bits to every symbol. Since the assumed distribution is uniform (and, as for all prefix codes, assumed not to change) it does not matter which symbols get assigned which code length; the penalty should remain the same. Obviously, in reality a distribution does not exactly follow a certain mathematically exact density, nor is the noise truly independent and identically distributed in a lot of real-world cases.

Therefore one could tweak the truncated binary code with a simple mapping of symbols based on which symbols are marginally less common than others. Within Golomb coding this already happens by the design of truncated binary coding. By default truncated binary coding gives the last x characters the longer prefix codes and within a geometric distribution these are indeed generally the less common symbols.

Truncated binary codes split the range of symbols into two subsets. For X different symbols these ranges are sized as follows. Ideally X symbols require log2(X) bits. However, in a discrete system of prefix codes there must always be an integer number of bits per symbol, and we can at best have the average approach the theoretical limit. The outcome of this logarithm does, however, give us a lower bound.

In truncated binary coding the first K symbols are assigned ⌊log2(X)⌋ bits, whereas the remaining L (= X − K) symbols are assigned ⌈log2(X)⌉ bits.

Unary codes, or thermometer codes, are a simple prefix code for encoding symbols distributed according to negative powers of two. On their own their use is limited, since unary codes do not take a parameter and are therefore not flexible: they always optimally encode the same frequency distribution.


Table 3.1: With these tables it can be confusing how to read the prefix code correctly. In this case you read the bits from right to left to follow the order in which the computer encounters the bits within a stream of bits.

Symbol  Nibble (4 bits)  Truncated binary code
0       0000             00
1       0001             01
2       0010             010
3       0011             011
4       0100             110
5       0101             111

Figure 3.1: Optimal frequency distribution for a truncated binary code with an alphabet of 6 symbols as presented in the table.

Table 3.2: With these tables it can be confusing how to read the prefix code correctly. Differing from the truncated binary table, this table needs to be read left to right, like a string, to follow the order in which the computer encounters the bits within a stream of bits.

Symbol  1-terminated unary code  0-terminated unary code
0       1                        0
1       01                       10
2       001                      110
3       0001                     1110
4       00001                    11110
5       000001                   111110
N       (0)^N 1                  (1)^N 0



Figure 3.2: Frequency distribution for the unary codes. The alphabet stretches into infinity but only six symbols are presented here for convenience and consistency.

Table 3.3: A demonstration of how Golomb codes are built from a division-modulo operation that encodes the quotient and the remainder using different prefix codes and concatenates the result. The divisor in this case is 3. The divisor is also referred to as the M parameter. tb3(x) is a pseudo function outputting the truncated binary code for an alphabet of 3 symbols.

Symbol  Quotient  Remainder  Unary        Trunc. binary  Resulting Golomb
0       0         0          0            0              00
1       0         1          0            10             010
2       0         2          0            11             011
3       1         0          10           0              100
4       1         1          10           10             1010
5       1         2          10           11             1011
N       N / 3     N % 3      (1)^(N/3) 0  tb3(N%3)       (1)^(N/3) 0 : tb3(N%3)

Unless the symbol distribution is naturally a negative-power-of-two exponential, or preprocessed to be so, the codes will not be efficient at encoding data. Summarising: the first symbol must make up approximately 50% of the data, the next symbol 25%, the following 12.5%, and so on. Unary codes can encode an alphabet containing an infinite number of symbols.

Golomb coding harnesses the power of both coding methods by using the exponential property of unary codes and the flat but slightly slanted property of truncated binary.
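A minimal sketch of the construction in Table 3.3, assuming M = 3. The helper names are illustrative; the bit strings are written in the conventional most-significant-bit-first order, which for this M reproduces the Golomb column of the table.

    def unary(q):
        # q ones followed by a terminating zero (the 0-terminated variant of Table 3.2).
        return "1" * q + "0"

    def truncated_binary(r, m):
        # Truncated binary code for remainder r with an alphabet of m symbols (m >= 2).
        k = m.bit_length() - 1          # floor(log2(m))
        u = (1 << (k + 1)) - m          # number of short, k-bit codewords
        if r < u:
            return format(r, "0{}b".format(k))
        return format(r + u, "0{}b".format(k + 1))

    def golomb(n, m=3):
        return unary(n // m) + truncated_binary(n % m, m)

    for n in range(6):
        print(n, golomb(n))   # 00, 010, 011, 100, 1010, 1011, as in Table 3.3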

3.2.2 Arithmetic coding

While not exactly a class of coders itself, we will examine one non-prefix entropy coding scheme: arithmetic coding. Arithmetic coding can be summarised as a coding system that transforms a certain message over some alphabet into a single fraction [11]. This fraction can usually be stored using fewer bits than the original message and, needless to say, can be transformed back into the original message.

To understand this process, consider the real number line, and more specifically the part from 0 to 1.0: the half-open interval [0, 1), or 0 to 0.999... In base 2 this becomes 0 to 0.111... Furthermore, it is simple to transform any binary data into a fraction; the only thing that changes is the interpretation.

Let’s say there is a message consisting of only As, Bs and Cs, for instance ABCAABBCC. If we were to write this as a fraction in base 3 the result would be the number 0.012001122.



Figure 3.3: Optimal frequency distribution of the first 15 symbols in the infinite alphabet of a Golomb prefix code with M parameter 3

This base 3 fraction can then be converted to base ten (≈ 0.187420616775898) or binary (≈ 0.001011111111101011001100).

Of course, in conversion from something such as base 3 the representation is likely infinitely long. It cannot be accurately represented by a finite binary decimal, at best by a finite ratio. However, if we know how many symbols we are about to encode, we obviously only need to encode the binary decimals required to make the first x base 3 decimals correct, where x is the number of symbols to encode.

While this concept is simple, it does not deal with non-uniform distributions. So let’s re-examine the decimals. We’ll still use an alphabet of three symbols, only this time we assume A has a 50% chance of occurring while B and C both have a 25% probability. We could encode this by stating that a decimal in the range [0, .50) means A, a decimal in the range [.50, .75) means B, etc. So let’s say we encounter the number .333... = 1/3. Since 0 ≤ .333... < .5, the first symbol must be A. Similarly, any binary fraction can be used to code A just as long as the resulting fraction is smaller than 1/2. In binary this means the number has to start with .0. Any following bits do not matter in determining whether the number is smaller than 1/2, since .0 can never exceed it and can only asymptotically approach it with a near-infinite string of ones.

To encode two symbols we place the fraction range of the second symbol within the fraction range of the first symbol; in a sense we are constructing a fractal. The symbols AB can be encoded as follows. By itself B requires a fraction 1/2 ≤ x < 3/4. To put B after A in a single fraction we drop B's range inside the space created by A; any fraction within this range can represent AB:

B_low · A_range + A_low ≤ x_AB < B_high · A_range + A_low    (3.1)

0.5 · 0.5 + 0 ≤ x_AB < 0.75 · 0.5 + 0

0.25 ≤ x_AB < 0.375

Or in binary: .010 ≤ x_AB < .011, with which we can conclude that any value starting with .010 successfully represents AB. Another fun observation is that constructing a Huffman tree would in this case yield the same sequence of three bits. Arithmetic coding generates the same results as Huffman coding given symbols distributed according to negative powers of 2 and ranges sorted from largest to smallest probability. However, Huffman coding is limited to generating sequences of whole bits for every symbol, whereas arithmetic coding can closely approximate any distribution. The trade-off is, as usual, a heavier computational load.
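The interval narrowing of equation (3.1) can be written out directly. A toy sketch, assuming the A/B/C alphabet with probabilities 0.5/0.25/0.25 used above; it only produces the final interval, not the bit output of a real coder.

    # Cumulative ranges for the toy alphabet: A = [0, .5), B = [.5, .75), C = [.75, 1).
    RANGES = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

    def encode_interval(message):
        # Narrow [low, high) symbol by symbol; any fraction inside the final
        # interval codes the whole message.
        low, high = 0.0, 1.0
        for symbol in message:
            s_low, s_high = RANGES[symbol]
            span = high - low
            low, high = low + s_low * span, low + s_high * span
        return low, high

    print(encode_interval("AB"))   # (0.25, 0.375): any binary fraction starting with .010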


Practical arithmetic coding

Of course a computer cannot work with infinite precision. A computer can approach it by using arbitrary precision software and lazy evaluation. It is perfectly possible to compute π to millions of digits, and similarly it is possible to perform arithmetic coding with enough precision to encode any string of numbers.

However, this method would be slow and tedious. Luckily there exists a simple solution that employs a fixed precision scheme to implement a fast arithmetic coding system for alphabets of sizes that fit most needs.

In the case of this compression system the numbers encoded are 16 bits in size. The correction maps are stored in 16-bit form; however, deep down they actually only use 9 bits, encoding signed numbers that range from −256 to 255.

In other words the basic correction method isn’t any more difficult than simply computing the per pixel component difference between the original and the reconstruction.

Needless to say, the higher the precision with which the arithmetic coding is implemented, the better the compression result will be. Contrary to Golomb coding, arithmetic coding requires a much larger input parameter, as it does not by default make any assumptions about the underlying frequency distribution.

This is both a strength and a weakness. Since arithmetic coding only needs a frequency distribution, or histogram, of the data to be compressed as input, it can be tuned to near-optimally compress any type of i.i.d. data. Arithmetic coding also allows the distribution to change at any time, which removes the i.i.d. constraint. In fact, given any model, arithmetic coding will yield results near the Shannon entropy limit for the given model.

This however requires the model to be stored, either within the compression application or the compressed file. Obviously storing the model in the file reduces compression achieved, but given an alphabet of limited size or a clever method of describing the model this penalty need not be heavy.

To summarise, the strength lies in the flexibility, but with this flexibility comes a weakness, which is the need to capture the model. You can choose to trade flexibility for a smaller description size, which may or may not decrease the achieved compression depending on the behaviour of the data. If deciding on this trade-off becomes a big problem you might be better off using Huffman trees or some other prefix code.


CHAPTER 4

Experimentation and results

Within the compression system we are obviously interested in the compression obtained. We want to know how the transformations we apply influence the resulting entropy. Preferably we want the transformations to be analysed independently from each other.

This chapter will provide the compression ratios the QI system currently achieves by iterating through its parameters while applying it to a standard set of test images. These results will then allow us to answer our questions posed earlier.

4.1 Obtaining results

Initially we will be looking at a small set of standard test images also contained within the SIPI database [1]. Furthermore, a creative-commons licensed photo from Flickr [5] has been used, as well as screen captures created by the author.

For PNG results we will use the size of the output IDAT block resulting from applying OptiPNG with default settings. This provides an adequate representation of PNG's best performance.

For JPEG-LS we will simply use the encoder's default settings. As the size used for computing compression ratios we will use the encoder's outputted symbol size.

Finally for our compression system we will use the file output size as the metadata contained therein is negligible. To compute our compression ratios we will use the space required for raw 24-bit RGB data for colour images and 8-bit intensity data for grayscale images.

The compression system presented in this paper has various parameters which can be tuned to better fit specific image profiles. In our results we present both the best result currently obtainable and the results obtained using the settings with the best overall performance. In this case best overall means the setting with the best average compression ratio when run over the test image set. The best results per image have been obtained by an exhaustive search through the system's parameters. The next section will introduce those parameters.

4.2 Compression system parameters

As it stands the compression system has no ”continuous” parameters. The only parameter coming close to this qualification is the Golomb code M parameter.

To summarise the system currently has the following tunable components:

• Deconstructor
1. Nearest neighbour
2. Bilinear

• Reconstructor
1. Nearest neighbour
2. Bilinear
3. 2D Albada flux limiter

• Corrector
1. Simple differencing
2. Differencing with 4th pixel average correction

• Colour space
1. 8-bit grayscale
2. 24-bit RGB true colour
3. 32-bit IRGB correlated true colour

• Channel mode
1. Single pass encoded interleaved channels
2. X pass encoded consecutive channels

• Entropy coder
1. Golomb codes
2. Arithmetic coding
3. DEFLATE (using Zlib)

This allows for an exhaustive search through a selection of these parameters based on the image being grayscale or not to find a size optimal compression. Alternatively this can be divided by time taken to (de)compress to achieve an acceptable trade-off between computational complexity and size performance.

For colour images the grayscale colour space is not of interest leaving us with 2·3·2·2·2·3 = 144 different configurations.

For grayscale images the only colour space of interest is the grayscale colour space. Furthermore, since only a single colour channel remains there can also be no correlation and no interleaving; thus the interleaved and consecutive channel modes become the same, reducing to 2 · 3 · 2 · 1 · 1 · 3 = 36 possible configurations.

Technically speaking, every time the deconstructor is switched to nearest neighbour we can skip the 4th pixel corrector as well, since it builds on the assumption that every node pixel approximates the average of the 4 pixels it derives from. This further reduces the configurations. However, at the present time the testing software does not implement this skip and iterates over all possible combinations with regard to image colour space.

If you look at table 4.1 and table 4.4, containing the search parameters, you will notice the system cycles the lowest level back-end fastest and carries over to the highest level front-end. At the lowest level are the entropy coders, the last part to be run. At the highest level is the deconstructor, the first thing to process incoming pixel data. The reason for this order of searching is that it allows for a much more efficient handling of data and as a result is much faster.

Since the QI system has separable stages of operation that do not affect each other's operation it is possible to cache the intermediate results of each stage. This allows lower level stages to be rerun with different settings on higher level data without ever recomputing the higher level data. This is a time/memory trade-off, but the penalty is acceptable as the intermediate data never exceeds more than a few multiples of the original image data.
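The parameter space and the back-end-first search order can be made concrete with a short enumeration sketch. The names below are illustrative stand-ins, not the QI testing software; only the counting and the cycling order are taken from the text.

    from itertools import product

    DECONSTRUCTORS = ["NN", "BI"]
    RECONSTRUCTORS = ["NN", "BI", "AFL"]
    CORRECTORS = ["standard", "4th"]
    COLOUR_SPACES = ["RGB", "IRGB"]            # grayscale images use the single "I" space instead
    CHANNEL_MODES = ["interleaved", "consecutive"]
    CODERS = ["Golomb", "Arithmetic", "Zlib"]

    def colour_configurations():
        # The rightmost factor (the entropy coder) cycles fastest, matching the
        # back-end-to-front-end order visible in tables 4.1 and 4.4, so the expensive
        # front-end stages can be cached and reused.
        return list(product(DECONSTRUCTORS, RECONSTRUCTORS, CORRECTORS,
                            COLOUR_SPACES, CHANNEL_MODES, CODERS))

    configs = colour_configurations()
    print(len(configs))    # 144, as computed above
    print(configs[0])      # ('NN', 'NN', 'standard', 'RGB', 'interleaved', 'Golomb')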


Table 4.1: An excerpt from the exhaustive search through the system parameters for a monochrome image. The channel modes column is not shown as it is irrelevant for single channel images such as grayscale. The colour model column however has been left in for clarity. This allows the result lines to uniquely identify a specific system configuration. The image used is “Elaine”.

Dec  Rec  Corrector  Col  Coder       Size (KB)  Ratio (%)
NN   NN   standard   I    Golomb      216.8      84.7
NN   NN   standard   I    Arithmetic  232.7      90.9
NN   NN   standard   I    Zlib        210.2      82.1
NN   NN   4th        I    Golomb      247.2      96.5
NN   NN   4th        I    Arithmetic  259.2      101.3
NN   NN   4th        I    Zlib        225.1      87.9
NN   BI   standard   I    Golomb      215.1      84.0
...  (22 results omitted)
BI   BI   4th        I    Zlib        204.4      79.9
BI   AFL  standard   I    Golomb      194.0      75.8
BI   AFL  standard   I    Arithmetic  213.7      83.5
BI   AFL  standard   I    Zlib        211.2      82.5
BI   AFL  4th        I    Golomb      183.2      71.6
BI   AFL  4th        I    Arithmetic  201.6      78.8
BI   AFL  4th        I    Zlib        201.0      78.5

Table 4.2: Best five results for the Elaine image

Dec  Rec  Corrector  Col  Coder   Size (KB)  Ratio (%)
BI   AFL  4th        I    Golomb  183.2      71.6
BI   BI   4th        I    Golomb  187.3      73.2
BI   NN   4th        I    Golomb  189.9      74.2
BI   AFL  standard   I    Golomb  194.0      75.8
BI   BI   standard   I    Golomb  198.4      77.5

4.3 Best and worst results

Looking at the top 5 worst results produced for the Lenna and Elaine images, contained in tables 4.6 and 4.3 respectively, we unsurprisingly find some configurations that make little sense. Combining the 4th pixel correction algorithm with a nearest neighbour deconstructor tends to enlarge the error rather than reduce it, due to the fact that 4th pixel correction expects the deconstructed pixel to approximate the average of the higher resolution pixels.

The penalty taken seems not so bad for the Elaine image, as only the three worst configurations fail to achieve compression and slightly exceed the size of the uncompressed bitmap data. The reason the arithmetic coder shows up in all three worst results is likely this encoder's large histogram storage requirement, causing it to underperform in distributions handled equally well by either Golomb or Zlib.

Table 4.3: Worst five results for the Elaine image

Dec  Rec  Corrector  Col  Coder       Size (KB)  Ratio (%)
NN   BI   4th        I    Arithmetic  277.7      108.5
NN   AFL  4th        I    Arithmetic  275.5      107.6
NN   NN   4th        I    Arithmetic  259.2      101.3
NN   BI   4th        I    Zlib        252.6      98.7


Figure 4.1: Test image: Elaine. SIPI code: elaine.512

Table 4.4: A shortened version of the exhaustive search through the relevant parameters of a coloured image. The table has been shortened as it would span several pages, but hopefully gives enough of an impression of how the system cycles through the various possibilities available. The image used is “Lenna”

Dec  Rec  Corrector  Col   Chan         Coder       Size (KB)  Ratio (%)
NN   NN   standard   RGB   interleaved  Golomb      661.5      86.1
NN   NN   standard   RGB   interleaved  Arithmetic  644.3      83.9
NN   NN   standard   RGB   interleaved  Zlib        617.1      80.4
NN   NN   standard   RGB   consecutive  Golomb      655.1      85.3
NN   NN   standard   RGB   consecutive  Arithmetic  684.7      89.2
NN   NN   standard   RGB   consecutive  Zlib        616.5      80.3
NN   NN   standard   IRGB  interleaved  Golomb      685.7      89.3
...  (130 results omitted)
BI   AFL  4th        RGB   consecutive  Zlib        580.0      75.5
BI   AFL  4th        IRGB  interleaved  Golomb      564.0      73.4
BI   AFL  4th        IRGB  interleaved  Arithmetic  589.5      76.8
BI   AFL  4th        IRGB  interleaved  Zlib        616.2      80.2
BI   AFL  4th        IRGB  consecutive  Golomb      551.6      71.8
BI   AFL  4th        IRGB  consecutive  Arithmetic  601.7      78.3
BI   AFL  4th        IRGB  consecutive  Zlib        599.5      78.1

Table 4.5: Best five results for the Lenna image

Dec  Rec  Corrector  Col   Chan         Coder       Size (KB)  Ratio (%)
BI   AFL  4th        RGB   consecutive  Golomb      539.1      70.2
BI   AFL  4th        RGB   interleaved  Golomb      543.3      70.7
BI   AFL  4th        IRGB  consecutive  Golomb      551.6      71.8
BI   AFL  4th        RGB   interleaved  Arithmetic  554.2      72.2
BI   BI   4th        RGB   consecutive  Golomb      557.0      72.5


Figure 4.2: Test image: Lenna. SIPI code: 4.2.04

Table 4.6: Worst five results for the Lenna image

Dec  Rec  Corrector  Col   Chan         Coder       Size (KB)  Ratio (%)
NN   BI   4th        IRGB  interleaved  Arithmetic  847.6      110.4
NN   AFL  4th        IRGB  interleaved  Arithmetic  838.6      109.2
NN   NN   4th        IRGB  interleaved  Golomb      817.2      106.4
NN   BI   4th        RGB   consecutive  Arithmetic  813.0      105.9
NN   BI   4th        IRGB  interleaved  Zlib        812.8      105.8

Table 4.7: Compression ratios for various test images. Best and amortised result parameters were obtained through exhaustive search. PNG results use the IDAT size as output size. JPEG-LS results use the output symbol size as specified by the encoder.

Name      SIPI code   Best (%)  Amortised (%)  OptiPNG (%)  JPEG-LS (%)
airplane  7.2.01      66.5      66.5           58.1         57.6
bridge    5.2.10      73.0      88.1           59.8         68.8
elaine    elaine.512  71.6      71.6           64.6         61.2
tank      7.1.09      76.8      76.8           64.0         62.9
trui      unknown     64.1      64.1           51.3         45.6
lenna     4.2.04      70.7      70.7           60.2         56.7

Table 4.8: Monochrome images

Name                SIPI code   Best (%)  Amortised (%)  OptiPNG (%)  JPEG-LS (%)
1.1.10.tiff         1.1.10      86.8      86.80          66.32        63.43
boat.512.tiff       boat.512    77.5      77.50          63.45        59.94
elaine.512.tiff     elaine.512  71.6      71.60          64.58        61.23
medieval.tiff       5.3.01      73.7      73.70          62.09        58.68
crippling-gaze.jpg  N/A         55.4      68.80          39.47        23.03
5.2.10.tiff         5.2.10      73.0      88.10          59.81        68.76


Table 4.9: Coloured images

Name           SIPI code  Best (%)  Amortised (%)  OptiPNG (%)  JPEG-LS (%)
4.2.02.tiff    4.2.02     66.5      69.60          55.62        48.23
ncurses.png    N/A        5.9       12.00          3.68         11.98
doen.png       N/A        8.6       19.60          7.73         16.80
Lenna.tiff     4.2.04     70.2      74.50          60.30        56.69
4.2.07.tiff    4.2.07     73.6      78.40          64.28        59.44
mandrill.tiff  4.2.03     95.1      96.40          79.56        77.15

Figure 4.3: Test image: Mandrill. SIPI code: 4.2.03

Table 4.10: Configurations achieving the best result on a particular image

Name                Dec  Rec  Corrector  Col   Chan         Coder   Size (KB)  Ratio (%)
ncurses.png         NN   NN   standard   RGB   interleaved  Zlib    66.6       5.9
doen.png            NN   NN   standard   IRGB  consecutive  Zlib    134.9      8.6
crippling-gaze.jpg  BI   BI   standard   I     N/A          Golomb  432.3      55.4
4.2.02.tiff         BI   AFL  4th        RGB   consecutive  Zlib    510.7      66.5
4.2.04.tiff         BI   AFL  4th        RGB   consecutive  Golomb  539.1      70.2
elaine.512.tiff     BI   AFL  4th        I     N/A          Golomb  183.2      71.6
5.2.10.tiff         NN   NN   standard   I     N/A          Zlib    186.8      73.0
4.2.07.tiff         BI   AFL  4th        RGB   interleaved  Golomb  565.2      73.6
medieval.tiff       BI   AFL  4th        I     N/A          Golomb  754.9      73.7
boat.512.tiff       BI   AFL  4th        I     N/A          Golomb  198.3      77.5
1.1.10.tiff         BI   AFL  4th        I     N/A          Golomb  222.1      86.8
4.2.03.tiff         BI   NN   4th        IRGB  consecutive  Golomb  730.5      95.1


Table 4.11: Average best compression ratios over monochrome image set per configuration

Dec  Rec  Corrector  Col  Chan  Coder   Ratio (%)
BI   AFL  4th        I    N/A   Golomb  77.75
NN   NN   standard   I    N/A   Zlib    79.18
BI   AFL  standard   I    N/A   Golomb  79.50
BI   BI   4th        I    N/A   Golomb  79.82
BI   NN   4th        I    N/A   Golomb  80.38

Table 4.12: Average best compression ratios over coloured image set per configuration

Dec  Rec  Corrector  Col  Chan         Coder  Ratio (%)
BI   AFL  4th        RGB  interleaved  Zlib   58.42
BI   NN   4th        RGB  interleaved  Zlib   58.98
NN   NN   standard   RGB  interleaved  Zlib   59.05
BI   BI   4th        RGB  interleaved  Zlib   60.30
BI   AFL  4th        RGB  consecutive  Zlib   60.32

Table 4.13: Average worst compression ratios over monochrome image set per configuration

Dec  Rec  Corrector  Col  Chan  Coder       Ratio (%)
NN   BI   4th        I    N/A   Arithmetic  108.58
NN   AFL  4th        I    N/A   Arithmetic  106.63
NN   NN   4th        I    N/A   Golomb      102.15
NN   AFL  4th        I    N/A   Golomb      100.70
NN   BI   4th        I    N/A   Golomb      99.42

Table 4.14: Average worst compression ratios over coloured image set per configuration

Dec  Rec  Corrector  Col  Chan         Coder   Ratio (%)
NN   BI   4th        RGB  consecutive  Golomb  211.82
NN   BI   4th        RGB  interleaved  Golomb  210.72
NN   AFL  4th        RGB  consecutive  Golomb  208.17
NN   AFL  4th        RGB  interleaved  Golomb  204.57
NN   BI   standard   RGB  consecutive  Golomb  190.52


One of the first rather obvious results: complex methods do not necessarily make for more effective compression systems. PNG and JPEG-LS both use simple algorithms but achieve impressive compression ratios, usually near 50% of the raw data size of 24 bits per pixel.

Unfortunately a lot of the test images used in this system were not of a known source. That is, I know little about how the original test images were obtained. Some computer generated images are obviously lossless originals; however, for none of the photographs do I know how they were digitised. Furthermore I do not know whether any lossy processing algorithms were applied to any of them.

As a rule of thumb, images that have been through a JPEG encoder and are re-encoded in the QI format surprisingly seem to perform best with nearest-neighbour interpolation. This might have to do with the characteristics of macro blocks.

Consecutive and interleaved channel modes also do not seem to obviously outclass each other. The IRGB model effectively decreases the entropy of the latter components in a lot of cases by applying simple averaging over the channels and performing the average trick on the blue channel. The average trick is the same correction used for 4th pixel correcting.

4.4 Entropy coder performance

Entropy coders are the part of QI that perform the actual compressing. QI employs three different encoders, Golomb codes, Arithmetic coding and the DEFLATE algorithm deployed by Zlib, which is a combination of the LZ77 run encoder and Huffman coding. [4]

Since the only parts of QI that do the actual compressing are the entropy coders, it's important to assess their performance. Golomb coding has a small storage requirement as it only requires an M parameter. This parameter is stored in QI as a single byte at the start of the entropy stream, thus providing little overhead. The downside of Golomb is the requirement that the data frequency distribution approximate a geometric distribution in order to achieve compression. The arithmetic coder allows for a great variety of frequency distributions but has to store this distribution, requiring a bigger block at the start of the stream. Currently this requires 4K in QI, causing a big compression penalty at the lower levels.

Finally, Zlib has the added benefit of being able to recognise repeating patterns and can compress runs of zeroes more efficiently. Both the Golomb and arithmetic coders lack this capability and map symbols with only the currently observed frequency distribution as their argument. While LZ77's sliding window is constructed on the fly, the Huffman tree does require storage. This gives Zlib a similar problem to arithmetic coding for small sets of data.

Since Zlib observes an 8-bit stream and the deltas are 9 bits wide there is also an additional conversion required.
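One straightforward way to feed 9-bit deltas to an 8-bit stream coder is a plain bit-packing pass; a minimal sketch, not necessarily the conversion QI actually performs. Signed deltas in the range −256 to 255 are assumed to have been offset into 0..511 beforehand.

    def pack_9bit(values):
        # Pack 9-bit values (0..511) into bytes, most significant bit first.
        bits = "".join(format(v & 0x1FF, "09b") for v in values)
        bits += "0" * (-len(bits) % 8)                       # pad the final byte
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    def unpack_9bit(data, count):
        bits = "".join(format(b, "08b") for b in data)
        return [int(bits[i * 9:(i + 1) * 9], 2) for i in range(count)]

    deltas = [0, 511, 256, 3]
    packed = pack_9bit(deltas)                               # 5 bytes for 4 deltas
    assert unpack_9bit(packed, len(deltas)) == deltas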


[Histogram of symbol occurrence (log scale) against symbol code; EV = 33.9, M = 1, green bars make up 80.1% of the data]

Figure 4.4: This figure demonstrates an example of a worst-case scenario for the Golomb coder. The image this histogram is derived from is “doen.png”, a screenshot of a text page. This image in fact never compresses well with Golomb, regardless of the settings used. The red line is the theoretically optimal distribution for the expected value. The green coloured bars indicate that the first few symbols alone already make up more than 80% of all data in this image. Finally, the yellow crosses represent the resulting Golomb frequencies based on the median. Since the image contains so many 0’s, the median from which the M parameter is derived dictates an M of 1. This causes Golomb to effectively become unary coding. The undershoot caused by the initial spike, and the resulting failure to compress, cause the data to increase fivefold.

[Histogram of symbol occurrence (log scale) against symbol code; EV = 5.3, M = 3, green bars make up 87.5% of the data]

Figure 4.5: This figure shows the Golomb coder performing well. While the distribution is still far from optimal, it represents a much better distributed data set, which achieves nearly 50% compression using Golomb. The red line is the optimal distribution of symbols according to the expected value. The yellow crosses represent the optimal frequencies for a Golomb code with an M parameter of 3. The green bars represent approximately 88% of the symbol data processed.


CHAPTER 5

Discussion

We will now present the conclusions that can be drawn from the results and how they relate to other systems. Finally some suggestions will be made on possible improvements to the current system.

5.1 Conclusion

The Golomb coder is, while one of the most efficient and simple entropy codes, not universally applicable. Screenshots, for instance, do not yield a geometric distribution and as a result do not compress well using Golomb codes. Similarly, interpolation methods that yield large errors, such as nearest-neighbour, can cause Golomb to perform badly.

The flux limiter overall seems to perform better than the other interpolation methods in the same configuration. The improvement is a few percent on the original compression ratio. It therefore seems safe to conclude that the flux-limited interpolation method models real-world light behaviour in images better.

Nearest-neighbour performs well for images containing noise.

The arithmetic coder currently uses too much data for its histograms to be competitive with the other entropy coders. It is likely that arithmetic coding will perform much better if the number of planes used is limited.

The system generally seems to suffer from going down to the lowest level of detail. It would therefore be advisable to keep the level of detail above a certain minimum.

The repetitive nature of this system is currently not exploited enough, and the interpolation methods do not yet decorrelate the images well enough. Multiple planes could be included in the prediction of higher-level planes, allowing the colour behaviour in the image to be modelled during reconstruction. Similarly, within a plane one has access to much more than just the direct neighbours of a pixel. Since the system builds full thumbnails, this could also be used to a larger extent.

IRGB maps to a plane, allowing it to be stored in three channels instead of four. However, the current implementation of IRGB also applies a trick similar to 4th pixel correction to the fourth channel, which essentially gives this channel the same noise-correction status as the plane transformation would have. This leads me to conclude that IRGB is as yet ineffective, since it has not become the best performing colour model by default.

At the lower levels, errors are of such a large magnitude that we are overcommitting precision. In other words, the system does not require the deltas to be stored with such precision, as the higher-level planes will correct the image to perfect detail anyway.

Nearest neighbour performs well when presented with computer generated imagery. Images with hard edges and patches containing little or no change in colour are well predicted using nearest neighbour.

The system is as yet too inflexible in its processing of image data and has no sense of context outside of the settings used to compress an image in its entirety and the parameters used per entropy channel. As a result it does not account for the various characteristics present within different parts of an image. This is probably also the biggest difference separating this system from other image coding systems, and likely an area where there is a lot of improvement to be made.

4th pixel correction does not function well when combined with the nearest-neighbour deconstructor. This comes as no surprise, because the premise of 4th pixel correction does not hold for nearest-neighbour; in the test results most of the worst compression ratios are caused by the 4th pixel predictor mispredicting the behaviour of nearest-neighbour deconstructed images.

The arithmetic coder currently requires a relatively large histogram to function. Since the system currently requires the tree to be deconstructed all the way down to a single root pixel, and every level stores its own entropy settings, the arithmetic coder takes a large hit, causing it to have trouble competing with the other coders, especially for images whose distributions resemble the geometric distribution.

The system currently used for interpolating the larger versions of the images is not yet effective at decorrelating the pixels. The delta maps produced by the system still contain data that is easily identifiable as the underlying image by a human viewer. The system does not exploit the information it gains about edges by learning from corrections done at a lower level; it merely repeats the same interpolation process at ever higher resolutions.

The current set of test images is somewhat small; furthermore, it is not yet representative of the various characteristics real-world images can have. Increasing the image set would provide stronger support for the various hypotheses and improve the detail of the conclusions we can draw.

Different levels of detail have very different entropy characteristics. This leads me to conclude that this system requires the different layers to be able to switch to different entropy codecs. A similar statement can be made about different colour channels.

5.2 Related work

Various lossless image algorithms have been created in the past. Two of the better known are PNG and JPEG-LS. LOCO-I, the underlying algorithm, was eventually adopted by the Joint Photographic Experts Group to become the JPEG-LS standard. The name JPEG is strongly associated with the lossy image format of that name; JPEG-LS, however, has little to do with that standard.

Then there is PNG, a standard originally intended as a patent-unencumbered alternative to GIF, which used the then-patented LZW algorithm. PNG is an image format that is, like JPEG-LS, always lossless, although there are programs that apply preprocessing filters (and are therefore lossy) before storing in PNG.

5.2.1 JPEG-LS

JPEG is commonly associated with the lossy format JPG. That format received an addition based on a simple differential predictor that allowed for a lossless mode. In 1999, however, the LOCO-I algorithm was standardised as JPEG-LS, a standard intended to supersede the Lossless JPEG mode. [10]

LOCO-I stands for Low Complexity Lossless Compression for Images. The algorithm applies a pixel predictor based on a small neighbourhood, called the median edge detector or MED. This simple predictor is placed within a system that tries to construct a context model for the current region, for instance to determine a bias within the region.
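For reference, the median edge detector can be written as follows, with a the reconstructed pixel to the left, b the pixel above and c the pixel to the upper left of the current position:

    def med_predict(a, b, c):
        # Median edge detector of LOCO-I / JPEG-LS.
        # a = pixel to the left, b = pixel above, c = pixel to the upper left.
        if c >= max(a, b):
            return min(a, b)       # edge suspected: pick the smaller neighbour
        if c <= min(a, b):
            return max(a, b)       # edge in the other direction: pick the larger one
        return a + b - c           # smooth area: planar (gradient) prediction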

The resulting differential numbers should resemble a two-sided geometric distribution, or TSGD, in order to be efficiently encoded by the Golomb-Rice codes used. However, being prefix codes, Golomb-Rice codes can never encode a symbol at a rate of less than one bit per symbol. This is why LOCO-I can also switch to a run encoder when it encounters what is referred to as a low-entropy part of the image.

Finally, in an addendum to the main standard, LOCO-I can optionally be implemented as LOCO-A, using an arithmetic coder instead of the Golomb-Rice codes. This yields greater flexibility in the distributions encoded. JPEG-LS merely processes pixel data and does not do colour decorrelation or analysis.

5.2.2 PNG

PNG, or Portable Network Graphics, is a lossless format that allows for the storage of grayscale and coloured images. [3] It applies a very simple but effective compression algorithm based on single-pixel prediction and difference coding. Although the format technically allows for multiple filter systems, the current state of the PNG specification defines only a single filtering and prediction system.

Filter types

While the selected filtering system is applied to the image as a whole, every image line can be encoded using its own filter type, which is stored per line in the PNG file. The filter system provided in the PNG specification defines 5 different filter types that predict a pixel based on the pixels above, to the left and to the upper left of it. The filter itself traverses the image from top to bottom and left to right until all pixels have been predicted, differenced and encoded.
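The most elaborate of these filter types is the Paeth filter, whose predictor picks whichever of the three neighbours lies closest to the initial estimate a + b - c. A direct transcription of that predictor is sketched below (a = left, b = above, c = upper left):

    def paeth(a, b, c):
        # PNG Paeth predictor: a = left, b = above, c = upper-left byte value.
        p = a + b - c                      # initial estimate
        pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
        if pa <= pb and pa <= pc:          # choose the neighbour closest to the estimate,
            return a                       # breaking ties in the order a, b, c
        if pb <= pc:
            return b
        return c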

Encoding

After the filtering system has predicted a pixel, the predicted value is subtracted from the actual pixel value. Similar to the QI system, the values obtained this way are distributed around 0 and therefore allow for greater compression than values without such a predictable distribution. The difference values are then encoded using the DEFLATE algorithm, the same algorithm optionally employed by QI through the use of Zlib.

5.3 Future work

5.3.1 Potential as very large image format

Due to the QI system's quadtree structure, it can naturally be extended into a system for storing very large images, in the order of gigapixels. This can be achieved by splitting planes into tiles of independently decompressible deltas. Due to the structure of the QI system, decompressing any tile at any level of detail would only require decompressing the corresponding tiles at the lower levels and the root tile.
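A minimal sketch of that dependency chain, assuming each plane is split into a grid of tiles and that tile (x, y) covers the same image area as tile (x // 2, y // 2) one level lower; the tile addressing is hypothetical and not part of the current QI format:

    def tiles_needed(level, x, y):
        # Walk from tile (x, y) at the requested level of detail down to the root:
        # each tile depends only on the tile covering the same area one level lower.
        chain = [(level, x, y)]
        while level > 0:
            level, x, y = level - 1, x // 2, y // 2
            chain.append((level, x, y))
        return list(reversed(chain))

    # Decoding tile (5, 3) at level 3 touches only four tiles in total:
    print(tiles_needed(3, 5, 3))           # [(0, 0, 0), (1, 1, 0), (2, 2, 1), (3, 5, 3)]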

Entropy tiling

As explained previously, a trade-off can be made between reliability and usability on the one hand and efficiency on the other. Normally every plane yields a correction map that is encoded in one go. The result is that entropy information only has to be stored once per plane. This saves size on the information describing the entropy for the entropy codec, but the downside is that the entropy codec cannot account for spatial differences in entropy behaviour. JPEG, for instance, exploits this aspect of images by applying macro blocking.

Another big downside of storing a correction map in a single entropy compression block is that, depending on the type of entropy code used, a corruption in the block can ruin the decompression of most of the image. Obviously most compression methods suffer from this problem. Huffman coding, a commonly used prefix code, can decode completely wrongly after the flipping of only a single bit.

By increasing the number of entropy blocks through tiling, reliability is increased. Corruption of an entropy block will now only damage the specific tile, affecting only the colour components within that tile and any tiles interpolated from the corrupted tile. While corruption of the entropy data still has the potential to make most of the image unviewable, the better part of the entropy information will no longer be able to affect the image severely.


MIP-mapping and fast zooming

The QI system is fractal-like in nature. The recursive interpolation system places a repeating grid pattern over the image that is near-identical within every quadrant for non-power-of-two images, and identical for images with dimensions that are a power of two. Furthermore, the factor-of-two increase in size along each dimension is reminiscent of, and usable for, storing MIP-maps.

MIP-mapping is a technique used in CGI for lowering render times while enhancing texture quality, by providing a texture at a level of detail based on the distance of the textured object. In browsers it is usually not hard to notice images that are far larger in dimensions than they appear on the page, even though it is considered bad practice in web design to have web pages contain images at a larger resolution than necessary.

Due to its stacked-plane nature, the QI system can provide lossless-quality images at the resolution required, thereby removing the need for the viewer to do any further interpolation. Any resolution at a lower level of detail than the stored resolution can easily be obtained by interpolating between the planes surrounding that level of detail.

5.3.2 Partial web fetching and resizing in clients

The QI algorithm's order of decoding easily allows the decoding process to be stopped at any level of detail. This is similar to JPEG 2000, which also allows the decoding process to be interrupted if a higher-resolution image is not required. Since the algorithm starts at a low resolution and does not require any information describing higher levels of resolution than the one currently decoded, image files can be constructed in a single-pass, linear way. This allows QI files to be processed in a pipeline without storing to disk and to be partially fetched from a webserver.

This has the potential to save bandwidth in a lot of web communication. Mobile phones, with their (currently and relatively) small screens, can stop fetching image data when the appropriate resolution has been reached. When the user zooms in, they can continue to decode any extra resolution as needed.

5.3.3 Storing rasterised vector graphics

Since the QI format is also effectively a storage system for MIP-maps, it can be exploited for rasterising vector graphics and storing slides or pages of text that were originally described by vectors.

Interpolating vector graphics either down or up usually results in artefacts: unexpected hardness or blur, respectively. However, the QI format can store the vector graphics at a large range of resolutions by rendering the vector graphics to all image planes and then encoding the image. This replaces QI's internal deconstructor by the rasterisation engine of the vector graphics library used, thereby ensuring the deconstruction is completely lossless with regard to the vector graphics.

While this might result in a slightly larger file size, the quality increase should be evident.

5.4 The QI system

QI certainly shows potential as an image compression format. It has various desirable properties; its zooming decompression capabilities make it especially suited for use in online viewing. However, the system still has a long way to go when it comes to being applied in real-world scenarios. The system is not yet capable of reliably competing with alternative formats.

It is certainly interesting that a function generally used in the computation of gas dynamics proves to be a better predictor of light properties than, for instance, bilinear interpolation.

Compression is not merely finding a shorter description of data. It is modelling understanding of data and then formalising this understanding. The QI system is based on the simple assumption that the miniature of an image already tells you a lot about the image. We have shown that this simple assumption can already lead to significant compression.

In my own attempt to predict the future on the basis of the information I have now: perhaps one day this system will become the basis of a very robust image format. The future will tell.
