Academic year: 2021

WP1 Digital Surveying

Deliverable D1.1.1. Report on quality analysis and selection of compression tools to optimize

URBAN

Uitmeten, Reconstrueren, Bekijken, Animeren en Navigeren van stedelijke omgevingen
(Surveying, Reconstructing, Viewing, Animating and Navigating urban environments)

P. Rondao Alface

Maarten Vergauwen, GeoAutomation

Project Leader

Luc Van Gool, K.U.Leuven-ESAT/PSI-VISICS

Research Leader

Carolien Maertens, IMEC-NES

Work Package Leader WP1

Patrice Rondao Alface, IMEC-NES

Task Leader Responsible WP1.1

Klaas Tack, IMEC-NES


Revision History:

Version     Reviewer     Date


Contents

Abstract
Glossary
1. Introduction
2. System Requirements
3. Compression algorithms
3.1. DCT-based algorithms
3.2. Wavelet-based algorithms
4. Quality Assessment Metrics
4.1. Objective quality
4.2. Perceived quality
4.3. Functional quality
4.4. Conclusion
5. Benchmark
5.1. Data
5.2. Related Work
5.3. Experiments
5.3.1. Testing conditions
5.3.2. Results
6. Conclusion
7. Bibliography

List of Figures

Figure 1 H.264 encoder/decoder scheme
Figure 2 JPEG 2000 building blocks
Figure 3 De-bayered output data from the eight cameras on the van. Stereo pairs of cameras are placed on the four sides of the van. A(t) and B(t) stand for the left and right cameras of a stereo pair at time t in seconds
Figure 4 Rate-distortion curves, Frontal View, Camera A
Figure 5 Rate-distortion curves, Frontal View, Camera B
Figure 6 Rate-distortion curves, Rear Views, Camera A on the left and Camera B on the right
Figure 7 Rate-distortion curves, View on the Right, Camera A
Figure 8 Rate-distortion curves, View on the Right, Camera B
Figure 9 Rate-distortion curves, View on the Left, Camera A
Figure 10 Rate-distortion curves, View on the Left, Camera B
Figure 11 Speed measurements, Front View, Camera A
Figure 12 Speed measurements, View on the Right, Camera A


Abstract

This report presents a benchmark of different video encoding algorithms and tools applied to the multi-camera recording system of GeoAutomation. A comparison of state-of-the-art implementations of DCT-based and DWT-based algorithms is conducted, taking into account the specificities of the studied system as well as the requirements of the whole application targeted in this Work Package. The results are discussed and compared with other comparative studies published in the literature.

Glossary

AVC: H.264 Advanced Video Coding

CUDA: NVIDIA Compute Unified Device Architecture

DCT: Discrete Cosine Transform

DWT: Discrete Wavelet Transform

GPU: Graphics Processing Unit

MVC: H.264 extension to Multi-View Coding

PSNR: Peak Signal-to-Noise Ratio

1. Introduction

In WP1, 3D data is extracted for the masses of buildings in cities, with an emphasis on precision. The goal is to allow cartographers at SPC to determine the 3D positions of the different objects needed by FGIA, within the prescribed precision. GeoAutomation has developed a solution through which these measurements can largely be brought into the office, behind a desktop: a van with eight cameras, and the necessary processing pipeline to extract 3D coordinates for points clicked in an image. The traditional method would be to measure the points with GPS and photogrammetry, as field work. Now, only sparsely distributed reference points are measured in the field, to geo-reference the 3D data obtained from the imagery. Field work is therefore reduced to a minimum. This said, the actual data capture as well as the processing can be made more efficient, which is the main goal of this WP. IMEC's know-how in speeding up algorithms and systems will be applied to the existing GeoAutomation pipeline. This collaboration will focus on two different areas: on board the GeoAutomation recording van, and later during the processing on the computing servers.

In the current configuration, every camera is connected to its own computer. The images are dumped in raw format (bayered images) on disk, at a rate of about 20 MByte/s, resulting in data sizes on the order of one terabyte or more for one recording session. This is too much to store on the GeoAutomation servers, which is why the images are compressed after they are debayered; at the moment, the JPEG2000 format is used. However, the on-board computers have low-performance CPUs, unable to perform this debayering and compression in a reasonable time. As a result, all the data must be transferred to the servers where the processing is done. A substantial gain could be achieved by performing the compression on board, during or after the recording.

GeoAutomation has already tested PGF, another wavelet-based encoding tool, for the encoding on the van. This solution is not real-time but already improves performance and can guide us in our study.
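The storage figures above can be checked with a back-of-the-envelope computation. The sketch below is our own illustration; it assumes 8-bit bayered samples (one byte per pixel) and uses the per-camera resolution and frame rate given in Section 5.1, with a two-hour session length picked purely for illustration:

```python
# Back-of-the-envelope check of the recording data rate quoted in the text.
# Resolution and frame rate are taken from Section 5.1 (1628x1236 at 12 fps,
# eight cameras); 8-bit bayered samples and the session length are assumptions.
WIDTH, HEIGHT = 1628, 1236
FPS = 12
CAMERAS = 8
BYTES_PER_PIXEL = 1  # raw bayered data: one 8-bit sample per pixel

per_camera = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS   # bytes per second
total = per_camera * CAMERAS                          # all eight cameras
session = total * 2 * 3600                            # a 2-hour session

print(f"per camera : {per_camera / 1e6:.1f} MB/s")    # ~24 MB/s
print(f"all cameras: {total / 1e6:.1f} MB/s")
print(f"2 h session: {session / 1e12:.2f} TB")
```

The per-camera figure lands close to the quoted ~20 MByte/s, and a session of a couple of hours indeed reaches the terabyte range mentioned above.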

In this context, deliverable D1.1.1 focuses on the comparison of the compression performance and algorithmic complexity of state-of-the-art video encoding tools: DCT-based algorithms (AVC/MVC) and DWT-based algorithms (JPEG2000/PGF). The selected algorithm should combine very good rate-distortion performance with parallelization opportunities for real-time processing.

This quality analysis is intended to enable the selection of the compression algorithmic features (e.g. quarter-pel motion estimation) and functionalities (e.g. group-of-GOP prediction, sequential view prediction) that are the most efficient for WP1.

2. System Requirements

Since the de-bayering and compression should be performed on the van in order to improve its storage capacity, their implementation should be real-time. The reconstructed images after decoding should also be of sufficient quality to be re-used in the processing on the servers of GeoAutomation. While there exist "lossless" compression algorithms for which the quality remains unaltered, they do not achieve sufficient bitrate reduction. Conversely, "lossy" compression significantly reduces the size of the input data but often introduces distortions in the reconstructed frames. These distortions are related to the nature of the compression algorithm: blocking effects for block-based encoders, blurring effects near sharp edges for wavelets. They usually become more pronounced as the compression ratio increases. Rate-distortion curves are often used to estimate the performance of an encoder and to find a good trade-off between these two axes. In the case of this deliverable, a third axis has to be added to this search for the best trade-off: processing time complexity.

Indeed, encoding high-resolution images at a high quality level is a challenging task that is usually achieved through algorithmic simplifications enabling parallelization strategies. These simplifications can also reduce the quality of the reconstructed frame, since accelerating the encoder at the desired code rate can limit its rate-distortion performance.

The question then becomes finding, among existing compression algorithms, the best candidate for real-time acceleration while ensuring that the distortions remain imperceptible and functionally transparent (i.e. they do not degrade the correctness of the detection of feature points, which is of crucial importance for the quality of the subsequent 3D reconstruction process). It must be noted that these two conditions (imperceptibility and functional transparency) are not necessarily equivalent.

Furthermore, the requirements of the processing on the servers imply many random accesses to regions of coded frames. This means the decoder speed requirements are even stronger than those of the encoder.


3. Compression algorithms

State-of-the-art compression algorithms for video streams are classified into two main families according to the transform they exploit: the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). On the one hand, the wavelet transform, applied on a whole-frame basis, is well known for its high energy compaction. On the other hand, the cosine transform is applied block-wise, which allows for more flexibility and reduced complexity in a parallelization context. The latter is generally more suitable for real-time applications, while wavelet-based algorithms are rather used when high-quality signal processing is needed. However, many extensions and improvements of both classes of algorithms have produced very comparable compression results. A short presentation of the main algorithms tested in this deliverable follows.
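The energy-compaction argument behind both transform families can be illustrated with a small self-contained sketch. This is our own illustration, not project code: an orthonormal 8x8 DCT and a full Haar DWT (the Haar filter standing in for the longer wavelet filters real codecs use) both pack almost all of a smooth block's energy into a handful of coefficients, which is what the subsequent quantization and entropy coding exploit.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal 1-D DCT-II basis; C @ x computes the DCT of x."""
    u = np.arange(N)[:, None]
    x = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * x + 1) * u / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

def haar_step(img):
    """One level of the separable orthonormal 2-D Haar DWT (LL in top-left)."""
    s = np.sqrt(0.5)
    rows = np.vstack([(img[0::2] + img[1::2]) * s,
                      (img[0::2] - img[1::2]) * s])
    return np.hstack([(rows[:, 0::2] + rows[:, 1::2]) * s,
                      (rows[:, 0::2] - rows[:, 1::2]) * s])

def haar_full(img):
    """Full dyadic Haar decomposition, recursing on the LL band."""
    out = img.astype(np.float64).copy()
    n = out.shape[0]
    while n > 1:
        out[:n, :n] = haar_step(out[:n, :n])
        n //= 2
    return out

# A smooth 8x8 patch (a plain gradient), typical of sky or road areas.
yy, xx = np.mgrid[0:8, 0:8]
block = 100.0 + 2.0 * xx + yy

C = dct_matrix()
coeffs_dct = C @ block @ C.T   # 2-D DCT via separability
coeffs_dwt = haar_full(block)

def top_energy(coeffs, k=3):
    """Fraction of total energy held by the k largest coefficients."""
    e = np.sort((coeffs ** 2).ravel())[::-1]
    return e[:k].sum() / e.sum()

print(f"DCT: {top_energy(coeffs_dct):.4f} of the energy in 3 coefficients")
print(f"DWT: {top_energy(coeffs_dwt):.4f} of the energy in 3 coefficients")
```

Both transforms are orthonormal, so the total energy is preserved and the fraction captured by a few coefficients directly measures compaction; on smooth content like this patch, both exceed 99%.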

3.1. DCT-based algorithms

AVC (H.264, MPEG-4 Part 10) [WSBL03], [WS07], for Advanced Video Coding, is a digital video codec standard noted for achieving very high compression performance. It was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). AVC [IIIJ07] provides good video quality at substantially low bit rates. It is based on a block-based integer DCT-like transform. In addition, it performs spatial prediction for Intra frame coding and temporal motion estimation for Inter frame coding to further improve its compression efficiency. A de-blocking filter is also included to improve the quality of the reconstructed frames. Entropy coding is achieved by context-adaptive variable-length coding (CAVLC) or by context-adaptive binary arithmetic coding (CABAC).

An overview of the scheme is given in Figure 1. Different profiles and levels have been defined for AVC, ranging from the simple Baseline profile to the High profile and the fidelity range extensions (FRExt).

For parallelization purposes, the frames can be decomposed into slices (sets of macroblocks) that are coded independently from other slices. A slice is called an I-slice if all its macroblocks are Intra coded, a P-slice if motion estimation from preceding reference frames is enabled, and a B-slice if weighted prediction from previous and next reference frames is enabled. The integer DCT transform is either a 4x4 or a 16x16 block transform for the Baseline profile. For higher profiles, an 8x8 transform is also enabled for a finer granularity.
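As a hypothetical illustration of how slice-level parallelism could be organized (the slice geometry below is our assumption, not the project's encoder configuration), a frame can be split into horizontal bands of whole macroblock rows, each of which could be encoded by its own thread:

```python
MB = 16  # H.264 macroblock size in pixels

def slice_partition(height, n_slices):
    """Split a frame into horizontal slices made of whole macroblock rows.

    Returns (row_start, row_end) pixel ranges, one per slice; each range
    could be handed to a separate encoder thread, since H.264 slices are
    coded independently of one another. The frame height must be a
    multiple of the macroblock size.
    """
    assert height % MB == 0
    mb_rows = height // MB
    bounds = [round(i * mb_rows / n_slices) * MB for i in range(n_slices + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# Example: the 1236-pixel-high camera images padded to 1248 (a multiple of
# 16 -- the padding is our assumption), split into 4 slices.
for start, end in slice_partition(1248, 4):
    print(f"slice covers pixel rows {start}..{end - 1}")
```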


Figure 1 H.264 encoder/decoder scheme

Table 1 summarizes the notations used to describe the different AVC profiles and implementations in this deliverable. The Baseline profile is the least complex one and can be divided into Intra-only coding with no motion compensation (AVC Intra BP) and Inter coding, where Intra and Inter predictions compete to give the best approximation (AVC Inter BP). The Main profile (AVC MP) enables B slices, 8x8 transforms and arithmetic coding (CABAC). Finally, AVC Intra HP (FRExt) stands for the High profile fidelity range extensions, where Intra prediction is performed together with 8x8 transforms and, optionally, CABAC.

Table 1: AVC profiles tested and their features

                       I slices  P slices  B slices  8x8 transform  CABAC  codec
AVC Intra BP           yes       no        no        no             no     IMEC, JM
AVC Inter BP           yes       yes       no        no             no     IMEC, JM
AVC Intra HP (FRExt)   yes       no        no        yes            yes    JM
AVC MP                 yes       yes       yes       yes            yes    IPP, JM


There exist many implementations of the H.264 AVC codec. In this deliverable we focus on:

• the reference software JM 14.0 [HHI08],

• the Intel IPP-based (Intel Performance Primitives) AVC encoder [I08], and

• the IMEC AVC encoder (developed by the NES/DDT/Multimedia group).

The AVC standard is being extended towards Multi-View Coding (MVC) [IOS04] in order to reduce the redundancy between overlapping frames. It has been shown in [IOS05] that MVC can significantly improve rate-distortion performance but is characterized by a higher complexity. The available implementation of MVC is JMVM 8.0.

Parallelization strategies have been exploited by Intel through their performance primitives (IPP), which can be seen as multi-threading on multi-core Intel general-purpose platforms. The JM and IMEC encoders have not been parallelized yet, and their speed performance should therefore be considered far from its actual potential. Possible parallelization strategies for AVC encoding on the van would be either the use of IPP blocks in the IMEC encoder or the mapping of the same encoder onto a GPU, possibly following and extending pioneering work in [PRNW07] (decoder), [KAWL07] (intra prediction) and [LLWCTC07] (motion estimation).

3.2. Wavelet-based algorithms

Another compression standard, JPEG2000 [SCE01], is a wavelet-based compression algorithm for still images, created by the Joint Photographic Experts Group (JPEG) committee. Besides offering a number of new functionalities, it outperforms the original DCT-based JPEG standard in terms of compression efficiency in many situations. However, it cannot exploit motion compensation, as predictions are restricted to intra coding. The basic functional blocks of JPEG 2000 are represented in Figure 2. In the pre-processing stage, an inter-component transformation is used to decorrelate the color data. Then the DWT is applied to the processed samples, providing a multi-resolution image representation and achieving better compression through its good energy compaction. The resulting wavelet coefficients are quantized using a uniform quantizer with a central deadzone, and the quantized coefficients are coded by an adaptive binary arithmetic encoder. Finally, the output of the arithmetic encoder is organized as a compressed bit-stream which offers a significant degree of flexibility, enabling features such as random access, region-of-interest coding and scalability. For more details on the JPEG2000 standard, refer to [SCE01]. Among the available implementations of JPEG 2000 codecs, KAKADU 6.0 has proven to achieve very good performance [T08].

Figure 2 JPEG 2000 building blocks: pre-processing, discrete wavelet transform (DWT), uniform quantizer with deadzone, adaptive binary arithmetic coder, bitstream organization


Finally, Progressive Graphics File (PGF) is another wavelet-based encoder, designed to trade compression efficiency for speed [S02]. PGF is built on a simplification of the JPEG 2000 functional blocks: a simplified color conversion is used, as well as a fixed integer wavelet filter (whereas JPEG 2000 allows choosing other wavelet filter coefficients). Bitstream reordering and entropy coding also differ from JPEG 2000 and are based on [M99]. It has been shown in [S02] that PGF has a rate-distortion performance intermediate between the JPEG and JPEG 2000 coders, but with a significant speed-up compared to the latter. Unlike JPEG2000, where a desired bitrate can be specified for the encoding, PGF only offers a quality parameter with nine possible values, ranging from lossless compression to very high compression ratios with very low quality.

With respect to parallelization strategies, it must be noted that both PGF and KAKADU are already parallelized through techniques similar to those used by Intel for their IPP-based AVC encoder. It would be difficult to further accelerate these algorithms using the same strategy. A possible alternative is then mapping onto a GPU, as explored in [WLHW07].

4. Quality Assessment Metrics

The meaning of "quality" or "fidelity" may differ significantly from one application scenario to another. Generally, compression tools make use of simple metrics giving a rough but quick estimate of the distortion between the original and reconstructed versions of the input sequence. This is the case for the MSE, SAD and PSNR defined hereafter. Some applications, however, need more sophisticated or more specific quality or fidelity measures. In the sequel, frequently used metrics are classified into three main sets: objective, perceived and functional quality metrics.

4.1. Objective quality

Objective metrics refer to computed measures of the difference between the original and de-compressed frames or sequences. These measures are usually based on some function of the differences between the values of corresponding pixels in the original and de-compressed frames. Some metrics also include more sophisticated measures, such as the differences of features associated with the neighborhood of the compared pixels. These metrics are trade-offs between conflicting constraints: maximizing the agreement of the measure with the perceived difference while minimizing its computational cost.

Among these metrics the most widely used for their computational simplicity are the Mean Square Error (MSE) which is the mean of the squared differences between corresponding pixels in the original and reconstructed frames, and the Peak Signal-to-Noise Ratio (PSNR) which is given by:

PSNR = 20 log10 (255 / √MSE)

A high PSNR corresponds to high fidelity while a low PSNR relates to large differences between the original and reconstructed sequence. This measure is widely used in video compression, image de-noising, etc.
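The MSE and PSNR defined above are a few lines of code. The following sketch is our own illustration, with a toy check on a frame perturbed by ±1 quantization-like noise:

```python
import numpy as np

def mse(orig, recon):
    """Mean squared error between two 8-bit frames."""
    diff = orig.astype(np.float64) - recon.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(orig, recon, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical frames."""
    m = mse(orig, recon)
    return np.inf if m == 0 else 20.0 * np.log10(peak / np.sqrt(m))

# Toy check: a random 8-bit frame distorted by +/-1 noise.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(frame.astype(int) + rng.integers(-1, 2, size=frame.shape),
                0, 255)

print(f"PSNR = {psnr(frame, noisy):.1f} dB")  # ~50 dB for +/-1 noise
```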

(10)

Other simple metrics are the Delta metric (the mean of the differences between corresponding pixels) and the Sum of Absolute Differences (SAD). Feature-based metrics are based on comparisons of corresponding average measures on neighborhoods (corresponding 4x4 or 8x8 blocks), such as the blurring and blocking metrics [MSU08]. These metrics are intended to quantify the usual artifacts caused by compression algorithms: the blurring metric compares the power of high frequencies in the block neighboring the studied pixel, while the blocking metric is based on the detection of flat and bright areas delimited by sharp edges.

Other approaches tend to better capture the perceived quality at the cost of a significant increase of the computational complexity. This is the case of the Structural Similarity Index (SSIM) and the Video Quality Metric (VQM).

The SSIM, proposed in [WBSS04], analyzes three features: the luminance similarity, the contrast similarity and the structural similarity. Luminance is computed as the mean luminance of the frames or pixel neighborhood to be compared. Contrast is computed by the square root of the variance of the frame or neighborhood pixels. The structure is computed as the difference between neighborhoods or frames after a luminance- and contrast-based normalization. These three similarity measures are then combined into one function to yield an overall similarity measure.
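A single-window version of the SSIM combination can be sketched as follows. This is our own illustration in the spirit of [WBSS04]; the full metric averages this index over local windows, and the stabilizing constants C1 and C2 follow the paper's usual choices:

```python
import numpy as np

def ssim_global(x, y, peak=255.0):
    """Single-window SSIM: luminance, contrast and structure terms
    compared over the whole frame (the full metric of [WBSS04] averages
    this over local windows)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1, C2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()          # luminance terms
    vx, vy = x.var(), y.var()            # contrast terms
    cov = ((x - mx) * (y - my)).mean()   # structure term
    # Combined luminance/contrast/structure form of the SSIM index.
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
print(ssim_global(frame, frame))        # identical frames give 1.0
print(ssim_global(frame, 255 - frame))  # a strong distortion scores far lower
```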

In VQM presented in [X00], the DCT is used to match the perceived differences. After a YUV conversion, a DCT is performed and coefficients are then quantized by frequency-dependent weighting functions which are given by the human spatial contrast sensitivity function (SCSF) matrix. This allows giving more emphasis to frequencies which are better perceived by the human eye. The final measure is given by a weighted pooling of mean and maximum distortions.

Finally, it is worth mentioning the perceptual masking approaches surveyed in [DVDM02]. These techniques tend to detect areas of the video frames where distortions would be more perceptible. This can be used to modulate the strength of the encoding quantization in block-based compression tools.

4.2. Perceived quality

Perceived quality metrics commonly refer to subjective (a.k.a. psycho-visual) quality assessment, or to metrics that have been validated through such experiments. ITU-R Rec. BT.500-11 [ITUR02] defines a set of subjective test methods adapted to the assessment of perceived video quality (see Table 1). In categorical judgment, observers assign an image or image sequence to one of a set of categories that are typically defined in semantic terms. The categories may reflect judgments of whether or not an attribute is detected.

Method   Reference    Stimulus   Assessment time
DSIS     Needed       Double     Short (54-60 s)
SDSCE    Needed       Double     Very long (30-60 min)
DSCQS    Needed       Double     Short (54-60 s)
SS       Not needed   Single     Short (41-47 s)
SC       Not needed   Single     Short (54-60 s)
SSCQE    Not needed   Single     Very long (30-60 min)


ITU-R Rec. BT.500-11 specifies 5-point quality (excellent - good - fair - poor - bad) and impairment (imperceptible - perceptible but not annoying - slightly annoying - annoying - very annoying) rating scales, as well as a 7-point comparison scale (much better - better - slightly better - the same - slightly worse - worse - much worse). [CBMLH07] provides more details on how to set up perceptual experiments for video quality assessment, and more particularly on Just Noticeable Difference (JND)-based techniques.

4.3. Functional quality

This term is less often used in the case of video sequences [OM00]. It refers to assessing whether the distortions present in the de-compressed frames significantly interfere with the function of the original sequence. If the function is only rendering, then functional quality is the same as perceived quality. For applications making further use of the sequences (e.g. 3D reconstruction, face recognition, surveillance, etc.), functional quality measures the impact of the distortions on the performance or reliability of the end application.

4.4. Conclusion

Many quality assessment tools and experiments have been proposed in the literature. Objective metrics such as the PSNR are best suited for rate-distortion performance analysis. Perceptual experiments are rather warranted for applications where rendering is paramount (e.g. digital cinema), as they imply a significant cost in time and resources to be conducted correctly. Functional analysis seems more relevant in this context, as the primary goal of GeoAutomation is to provide an accurate 3D reconstruction. Functional tests should then be run on significant samples of data (several minutes of recording, in order to generate reasonable 3D reconstruction scene sizes).

A simple, yet far from optimal, way to link objective metrics to the functional analysis is to start from the PGF quality level used by GeoAutomation, which is known to give the best trade-off between compression ratio and good quality of the 3D reconstructed scenes. However, equivalent PSNRs for DCT- and DWT-based algorithms can result from very different distortion effects. Assessing whether these effects degrade the 3D reconstruction accuracy more or less is still necessary in order to correctly choose the compression tool family.

5. Benchmark

In this section, we present the benchmark conducted on the data received from GeoAutomation. First, we give a general analysis of the kind of content that has to be encoded. We then give an overview of the previous, incomplete benchmarks that have been published so far. Finally, we describe the experiments we have run, as well as their results. It must be noted that de-bayering effects have not been taken into account; it has, however, been shown in [PTH04] that modifying the de-bayering algorithm can slightly modify the compression performance in the case of JPEG2000.


5.1. Data

The system on the van is composed of four pairs of stereo cameras outputting bayered 1628x1236 images at 12 frames per second. Some samples are shown in Figure 3. The content of these images consists of urban views with smooth natural areas (more or less uniform textures for sky, streets, roads, ...), objects delimited by sharp edges (cars, buildings, traffic signals, windows on buildings, ...) and possibly textures with high-frequency behavior (vegetation, tree leaves, ...). These observations, together with the high spatial resolution, point to the presence of large spatial redundancy that could be efficiently reduced by Intra prediction (DWT or DCT). Besides, the motion in the sequences is mainly due to the motion of the van itself, e.g. a zooming effect for front and rear views and a uniform translation for lateral views when driving along a straight line. The remainder of the motion is related to moving objects such as cars, pedestrians, etc. Inter prediction can thus be considered too.

The fact that the system is based on pairs of stereo cameras also implies large overlapping regions between cameras belonging to the same stereo pair. This kind of redundancy can be exploited by MVC.

Figure 3 De-bayered output data from the eight cameras on the van (Front, Rear, Right and Left views, columns A(0), B(0), A(20), B(20)). Stereo pairs of cameras are placed on the four sides of the van. A(t) and B(t) stand for the left and right cameras of a stereo pair at time t in seconds.


5.2. Related Work

Some comparative studies of JPEG 2000 and AVC have already pointed out that the performances of both algorithms are strongly dependent first on the spatial and temporal resolutions and second on the kind of content. However, these studies are always based on a limited number of different video sequences rather than on a database. Furthermore, the kinds of content and motion as well as the resolution of the data recorded by GeoAutomation have not been taken into account in these publications. In [MGCB04], a performance evaluation of AVC Intra MP and JPEG2000 was conducted. It is reported that AVC Intra performs better than JPEG2000 in terms of rate-distortion behavior for low and intermediate resolution sequences. The gain of AVC Intra over JPEG2000 in PSNR has been reported as around 0.5 ~ 2.0 dB. On the other hand, JPEG2000 performed better for higher resolution sequences with a gain around 0.5 ~ 1.0 dB in PSNR.

Furthermore, [MGW05] compared AVC HP Intra and JPEG2000 for monochromatic still image coding, showing that their performance is identical. Nevertheless, it concluded that JPEG2000 gains 1 dB in PSNR over AVC HP Intra if the 8x8 transform is disabled for the encoder. However, the evaluation was performed on a small set of images, which limits the generality of its conclusions. Both [T05] and [JVT04] performed the same comparison as [MGW05], but [T05] used video sequences at high resolutions instead of still images. The experimental results in [T05] and [JVT04] show that AVC HP Intra offers a rate-distortion gain of around 0.2 ~ 1.0 dB in PSNR over JPEG2000. Finally, [ODE06] compared JPEG2000 to both AVC profiles. It showed that JPEG2000 is very competitive with AVC HP Intra, with around 0.1 dB difference in PSNR in favor of AVC HP for high spatial resolution sequences. On the other hand, JPEG2000 outperforms the Main Profile with gains of around 0.1 ~ 1.0 dB in PSNR. For intermediate and low spatial resolution sequences, both profiles of AVC Intra outperform JPEG2000. Nevertheless, [ODE06] did not consider High Definition (HD) sequences among its test material.

Finally, [CBMLH07] proposed to confront objective and subjective quality assessment tools for the comparison between AVC Intra and JPEG 2000. The objective methods used are PSNR, delta, blurring, blocking, the structural similarity index and the video quality metric. The subjective quality experiments surprisingly showed that at high bitrates (high quality), AVC distortions are less perceptible than those of JPEG 2000. At very low bitrates, the blocking effects of AVC are considered more annoying than the distortions caused by wavelets.

In conclusion, it can be assumed that, at the given resolution, the performance of AVC/MVC with respect to JPEG 2000 or PGF will strongly depend on the available spatial redundancy and on the efficiency of the chosen motion estimation model. If motion estimation does not improve compression performance, it is not clear from the literature whether AVC Intra BP or Intra HP is better or worse than wavelet-based encoders.

5.3. Experiments

This section describes the experiments that have been run to benchmark the AVC, JPEG 2000 and PGF encoding algorithms. In order to be able to compare our results with the existing literature, we mainly use an objective metric (PSNR) for the distortion level assessment in a classical rate-distortion analysis. Perceptual experiments have not been conducted to assess the imperceptibility of the reconstructed sequences, because of their cost and low relevance to the application. A functional transparency assessment (i.e. an assessment of the 3D reconstruction quality) on a larger set of data has still to be done.

5.3.1. Testing conditions

Here follow the best testing parameters with the lowest algorithmic complexity found during our experiments; they result from a search for the best trade-off between rate-distortion and speed performance for each algorithm type. The results and plots shown in the remainder of the deliverable have been generated with these settings. It must also be mentioned that MVC has been discarded so far because of its excessive complexity (30 minutes for ten frames with the JMVM encoder), even though its compression performance outperforms AVC MP, which otherwise has the best rate-distortion performance. Opportunities for real-time acceleration exist but should be tackled only after a careful study of AVC BP or MP.

5.3.1.1. AVC Intra BP (IMEC and JM)

• Baseline Profile 66, level 42

• CAVLC

• All Intra prediction modes are enabled

• Disable transform coefficients thresholding

• De-blocking enabled

• Error metric: SAD

5.3.1.2. AVC Inter BP (IMEC and JM)

• Baseline Profile 66, level 42

• CAVLC

• All Intra prediction modes are enabled

• Disable transform coefficients thresholding

• De-blocking enabled

• Enable the use of explicit lambda parameters and set the weight of the I slice to 0.5.

• All partition modes are enabled

• Quarter-pel interpolation

• Error metric: SAD

• IntraPeriod 100

• SearchRange 16

• NumberOfReferenceFrames 2

• SearchMode EPZS

5.3.1.3. AVC Intra HP (FRExt) (JM)

• High Profile encoding 100, level 42


• The 8x8 transform enabled.

• IPCM mode enabled.

• Disable transform coefficients thresholding.

• Enable the use of explicit lambda parameters and set the weight of the I slice to 0.5.

• AdaptiveRounding is enabled. This parameter is used in the quantization process to adjust the rounding offset so as to maintain an equal expected value for the input and output of the quantization of the absolute value of the quantized data. It is recommended to use AdaptiveRounding when encoding at high quality.

• AdaptRndPeriod is set to 1, AdaptRndWFactorIRef is set to 8 and AdaptRndWFactorINRef is set to 8. These parameters are associated with AdaptiveRounding.

• OffsetMatrixPresentFlag is disabled.

• Enable rate-distortion optimization.

5.3.1.4. AVC MP (JM)

• Main Profile encoding 77, level 42

• CABAC.

• The 8x8 transform enabled.

• IPCM mode enabled.

• Disable transform coefficients thresholding.

• Enable the use of explicit lambda parameters and set the weight of the I slice to 0.5.

• AdaptiveRounding is enabled. This parameter is used in the quantization process to adjust the rounding offset so as to maintain an equal expected value for the input and output of the quantization of the absolute value of the quantized data. It is recommended to use AdaptiveRounding when encoding at high quality.

• AdaptRndPeriod is set to 1, AdaptRndWFactorIRef is set to 8 and AdaptRndWFactorINRef is set to 8. These parameters are associated with AdaptiveRounding.

• OffsetMatrixPresentFlag is disabled.

• Enable rate-distortion optimization.

• Quarter-pel interpolation

• IntraPeriod 4

• All partition modes are enabled

• B slices are disabled

• NumberOfReferenceFrames 2

• SearchMode EPZS

5.3.1.5. AVC MP (IPP)

Options have been selected to be as comparable as possible with those of JM:

• Main Profile encoding 77, level 42

• CABAC.

• The 8x8 transform enabled.

• OffsetMatrixPresentFlag is disabled.

• Enable rate-distortion optimization.

• Number of frames between two Intra frames 3

(16)

• Num_Ref_Frames 2

• ME Method EPZS

5.3.1.6.KAKADU

• Codeblock size of 64x64.

• One tile per frame.

• 5 decomposition levels.

• Visual Frequency Weighting switched off. This option improves visual appearance but reduces the measured rate-distortion performance.

• Base step parameter (QStep) adapted per sequence; rate control switched off.

5.3.1.7.PGF

• Number of hierarchic levels: 1

5.3.2. Results

The comparison results are organized into three subsections. First, we perform a standard rate-distortion analysis in order to compare our results with the aforementioned comparison studies. Second, encoding speed is measured, taking into account only the encoding time and excluding I/O accesses. Finally, we analyze the stability of the encoding performance over time.

5.3.2.1.Rate-distortion performance

Quality is measured here by the PSNR, as is usual in the related literature. Bitrates are given as the total number of bits in the stream after encoding 200 frames. Results are plotted for the eight cameras of the van. As expected, compression performance varies with the content and motion, which differ from one camera position to another. The quality range desired by GeoAutomation is given by the fixed level used with PGF (level 4); this quality level is indicated on the plots by a dark arrow. From one camera to another, this level leads to very different PSNRs, although the PSNR is generally close to 40 dB. At this quality level, we could not perceive any difference between the reconstructed frames and their original versions.
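For reference, the PSNR used throughout this analysis can be computed per frame as follows. This is a minimal sketch assuming 8-bit samples; the toy frame data below is illustrative:

```python
import math

def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equally sized
    8-bit frames, given as flat lists of samples."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 2x2 "frames" with a uniform error of 5 on every sample:
orig = [100, 120, 130, 140]
dec = [105, 125, 135, 145]
print(round(psnr(orig, dec), 2))  # MSE = 25 -> about 34.15 dB
```

In practice the same computation is run on the full luma plane of each decoded frame and averaged over the sequence.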

As expected, the results can be classified according to the content and position of the camera.


Figure 4 Rate-distortion curves, Frontal View, Camera A

Figure 5 Rate-distortion curves, Frontal View, Camera B


1. Frontal views

For both frontal views (Figure 4 and Figure 5), AVC MP has by far the best performance at any quality level. At the desired quality, the possible gain in compression ratio is about 40% to 50% with respect to KAKADU or PGF. At lower bitrates, wavelet-based algorithms perform better than the AVC compression tools (except AVC MP); at higher quality the opposite holds. Interestingly, the desired quality is very close to the crossing point of the AVC and wavelet curves. As for all other views, AVC Inter BP does not outperform AVC Intra BP for the reference and IMEC encoders. This can be explained by the fact that a majority of inter blocks are intra-coded, in which case rate control yields lower performance. In our experiments, the IMEC encoder always outperformed the JM software by 0.2 to 0.7 dB, with a speed-up of 200 to 300%. AVC Intra HP appears to be a good intermediate between AVC MP and PGF.

2. Rear views

The same observations can be made for the very similar rear views (Figure 6). In this case, the PSNR at the desired level is also close to 41 to 42 dB. AVC MP still has the best rate-distortion performance. Here, AVC Intra HP does not perform much better than PGF or KAKADU.


Figure 6 Rate-distortion curves, Rear Views, Camera A on the left and Camera B on the right

3. Lateral views (on the right of the van)

In these views (Figure 7 and Figure 8), at low bitrates the wavelet encoders perform better than AVC, even AVC MP. At the desired quality, however, PGF has a very low compression ratio compared to KAKADU and the AVC encoders. Among the AVC algorithms, AVC MP still enables a gain of about 50% in bitrate reduction compared to PGF. AVC Intra HP, AVC Intra BP, AVC Inter BP and KAKADU then have almost the same performance. The quality level here corresponds to a much lower PSNR of 38 dB, due to the very detailed textures in these views. A functional transparency assessment experiment is therefore mandatory to determine whether interesting features or objects (e.g. a bus stop sign) are decoded with sufficient quality to be detected during the 3D reconstruction process.

4. Lateral views (on the left of the van)

In these views (Figure 9 and Figure 10), it can be observed that, as for the views on the right of the van, the wavelet encoders perform well at lower bitrates but are outperformed at the desired quality. The gain of AVC MP is, however, much smaller than for the right views, and is more comparable with the results obtained on the front and rear views. The PSNR values at the desired quality, close to 41 dB, are also closer to those of the front and rear views than to those of the other lateral views. This can be explained by the fact that, in this sequence, the views on the left do not contain as many detailed textures as the views on the right.

From these experiments it can be concluded that AVC MP achieves better rate-distortion performance than the other encoders. AVC Intra HP, and also AVC Intra BP, are always comparable to or slightly better than KAKADU and PGF when PGF's level 4 is selected. As already mentioned, these observations are restricted to PSNR measurements.
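The bitrate gains quoted above can be estimated from the RD curves by interpolating each curve at a common PSNR. A minimal sketch follows; the sample RD points are illustrative, not measured values:

```python
def rate_at_psnr(curve, target_psnr):
    """Linearly interpolate the bitrate of an RD curve, given as a
    list of (bits, psnr) points sorted by increasing bitrate."""
    for (b0, p0), (b1, p1) in zip(curve, curve[1:]):
        if p0 <= target_psnr <= p1:
            t = (target_psnr - p0) / (p1 - p0)
            return b0 + t * (b1 - b0)
    raise ValueError("target PSNR outside curve range")

def bitrate_saving(ref_curve, test_curve, target_psnr):
    """Relative bitrate reduction (%) of test_curve vs ref_curve
    at the same PSNR."""
    r_ref = rate_at_psnr(ref_curve, target_psnr)
    r_test = rate_at_psnr(test_curve, target_psnr)
    return 100.0 * (r_ref - r_test) / r_ref

# Hypothetical RD points: (total bits, PSNR in dB)
pgf = [(4e8, 36.0), (8e8, 40.0), (16e8, 44.0)]
avcmp = [(2e8, 36.0), (4e8, 40.0), (8e8, 44.0)]
print(bitrate_saving(pgf, avcmp, 40.0))  # -> 50.0 (% fewer bits at 40 dB)
```

Comparing rates at equal quality, rather than qualities at equal rate, matches how the 40-50% gains above are expressed.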

Figure 7 Rate-distortion curves, View on the Right, Camera A


Figure 8 Rate-distortion curves, View on the Right, Camera B

Figure 9 Rate-distortion curves, View on the Left, Camera A


Figure 10 Rate-distortion curves, View on the Left, Camera B

5.3.2.2.Speed measurements

This subsection presents the third axis of our evaluation: algorithmic complexity. Since this term is defined in very different ways depending on the research domain, we report processing-time measurements instead. Such measurements are highly dependent on the implementation and the selected platform; they are, however, the most accessible measure. In the following plots, the reported speeds are encoding times that exclude hard-disk accesses. It must be noted that KAKADU and PGF are optimized codecs that already exploit CPU parallelism. AVC MP IPP is also optimized and parallelized for Intel CPUs, and offers the possibility to select the number of processing threads. The JM and IMEC encoders do not have such optimizations yet, so their speed measurements are difficult to compare, but we report them as well.
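The timing methodology can be sketched as follows: frames are preloaded into memory so that only the encode call is timed, keeping disk I/O out of the measurement. The `encode_frame` callable is a placeholder for the codec under test, not any of the encoders above:

```python
import time

def measure_fps(frames, encode_frame):
    """Encode preloaded frames and return the achieved frames per
    second, excluding any disk I/O from the measurement."""
    start = time.perf_counter()
    for frame in frames:  # frames are already in memory
        encode_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Dummy stand-in "encoder": just sums the samples of each frame.
frames = [list(range(1000)) for _ in range(50)]
fps = measure_fps(frames, lambda f: sum(f))
print(fps > 0)
```

Using a monotonic timer (`time.perf_counter`) rather than wall-clock time avoids distortions from system clock adjustments during long runs.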

As for the rate-distortion analysis, speed measurements are very similar for the front, rear and left views (Figure 11). The views on the right of the van show interesting differences (Figure 12).

For the first class of views, PGF and AVC MP IPP are very competitive but only output 6 to 7 FPS at the desired quality. A real-time scenario requires 12 FPS, which AVC MP IPP can reach, but only at a lower quality. AVC Intra BP from IMEC already shows speeds of 2.5 FPS without parallelization. It is hard to predict whether a speed-up by a factor of 4 to 5 would be possible. The optimizations in the IPP encoder show, however, that using 4 CPUs yields speed-ups by a factor of 2 to 3. Moreover, although the algorithmic complexity of AVC MP is known to be much higher than that of AVC Intra BP, the reported single-CPU performance of the parallelized encoder already shows comparable speeds. There is therefore room for considerable speed-ups in the IMEC encoder.


Figure 11 Speed measurements, Front View, Camera A

Figure 12 Speed measurements, View on the Right (speed in FPS versus mean PSNR in dB)


For the views on the right, it can be seen that KAKADU outperforms PGF and, to a lesser extent, AVC MP IPP at the desired quality level. Surprisingly, the IMEC encoder performs almost as well as PGF at the desired quality, even though it has not been parallelized yet.

It can be concluded that even the optimized encoders (KAKADU, PGF, AVC MP IPP) are not fast enough for real-time encoding. The IMEC encoder, however, offers good opportunities for acceleration in this context. Another possibility is to map one of these encoders onto a GPU in order to obtain further acceleration.

Combining these results with the rate-distortion analysis, the best trade-offs could be obtained either by mapping AVC MP onto a GPU, or by extending the functionality of the IMEC encoder to AVC Intra HP and parallelizing it on CPUs or on a GPU. The same could be said of KAKADU; however, its bottleneck is the entropy coder, which offers almost no opportunity for parallelization. The bottlenecks of AVC encoders, by contrast, lie in motion estimation and intra coding. This gives a certain advantage to the AVC compression tools.

Another point concerns decoder speed. For the KAKADU and PGF implementations, encoders and decoders have similar speeds (the decoder being 10 to 20% faster), which means decoding hardly reaches 12 frames per second. The Intel IPP-based AVC decoder, on the other hand, achieves between 100 and 250 FPS at the desired encoding quality. This is significant for the processing on the servers, where frame decoding is one of the bottlenecks.

5.3.2.3.Temporal Analysis

In this subsection, we extend our analysis to the study of quality variations between frames. Wavelet-based encoders perform no motion estimation, so their processing time and encoding quality are roughly constant across consecutive frames. The same cannot be said of the AVC algorithms, whose rate-distortion performance differs from frame to frame. This is why AVC also offers rate-control functionality to stabilize the variations in output bitrate.

Figure 13 shows such an analysis. As expected, KAKADU and PGF produce an almost constant quality for each frame. A small variation can be seen around frame 65, where two cars appear in the sequence. The AVC IMEC and JM encoders show very similar variations because they share the same rate-control algorithm. This is not the case for AVC MP IPP, which behaves very differently.

Another point concerns frame encoding speed. Here again, there is little variation between frames for PGF and KAKADU. For the AVC encoders, Intra-coded frames are usually encoded about 5 times faster than Inter-coded frames. With AVC Inter BP IMEC and AVC MP IPP it is possible to specify the interval between Intra-coded frames, which limits possible drift from Inter-coded frames. This also makes it possible to predict the frequency at which the frame encoding time will vary.
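The temporal stability discussed here can be quantified by the spread of the per-frame PSNR series, and the intra-frame schedule follows directly from the configured IntraPeriod. A minimal sketch, where the PSNR series are illustrative rather than measured:

```python
import statistics

def quality_stability(psnr_series):
    """Mean and (population) standard deviation of a per-frame PSNR
    series; a low deviation indicates stable quality over time."""
    return statistics.mean(psnr_series), statistics.pstdev(psnr_series)

def intra_frame_indices(num_frames, intra_period):
    """Frame numbers coded as Intra for a fixed IntraPeriod,
    e.g. IntraPeriod 4 -> every 4th frame starting at frame 0."""
    return [i for i in range(num_frames) if i % intra_period == 0]

wavelet = [40.1, 40.0, 40.2, 40.1, 40.0]  # near-constant quality
avc = [41.5, 39.8, 39.5, 39.9, 41.4]      # varies with frame type
print(quality_stability(wavelet)[1] < quality_stability(avc)[1])  # -> True
print(intra_frame_indices(10, 4))  # -> [0, 4, 8]
```

The same standard-deviation measure can be applied to per-frame encoding times to compare the codecs' speed stability.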


Figure 13 Temporal analysis

5.3.2.4.Functional transparency assessment

We still have to perform a test on significant samples of data to ensure that the distortions present in AVC-encoded frames at equivalent PSNR values lead to the same correct 3D reconstructions as wavelet-encoded frames. This test should also be run periodically during the acceleration process, to make sure that possible distortions due to algorithmic simplifications made for parallelization do not prevent the data from being used for its purpose.

6. Conclusion

In this deliverable, we have shown that the different state-of-the-art coding tools present different trade-offs between speed and rate-distortion performance. Based on our observations, and provided that the still-to-be-run functional quality assessment validates their use, AVC encoders are very good candidates for real-time acceleration. DWT-based compression tools could be further optimized on massively parallel platforms such as GPUs or multi-core CPUs; however, since their bottleneck is the least parallelizable functional block, it is unclear whether significant speed-ups can be achieved. This is not the case for the AVC encoding tools, whose Intra and Inter coding bottlenecks can be significantly accelerated through parallelization. Furthermore, AVC decoding is faster by an order of magnitude, which can be important for the processing on the servers.


7. Bibliography

[WSBL03] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.

[SCE01] A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG 2000 still image compression standard", IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36-58, September 2001.

[MGCB04] D. Marpe, V. George, H. L. Cycon, and K. U. Barthel, "Performance evaluation of Motion-JPEG2000 in comparison with H.264/AVC operated in pure intracoding mode", in Wavelet Applications in Industrial Processing, F. Truchetet, Ed., Proceedings of the SPIE, vol. 5266, pp. 129-137, 2004.

[MGW05] D. Marpe, S. Gordon, and T. Wiegand, "H.264/MPEG4-AVC Fidelity Range Extensions: Tools, Profiles, Performance, and Application Areas", IEEE International Conference on Image Processing, Genova, Italy, September 2005.

[CBMLH07] S.-G. Cho, Z. Bojkovic, D. Milovanovic, J. Lee, and J.-J. Hwang, "Image Quality Evaluation: JPEG 2000 Versus Intra-only H.264/AVC High Profile", ELEC. ENERG., vol. 20, no. 1, pp. 71-83, April 2007.

[ODE06] M. Ouaret, F. Dufaux, and T. Ebrahimi, "On comparing JPEG2000 and Intraframe AVC", in Applications of Digital Image Processing XXIX, A. G. Tescher, Ed., Proceedings of the SPIE, vol. 6312, 2006.

[SV04] M. Smith and J. Villasenor, "Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences", SMPTE Technical Conference and Exhibition, Pasadena, California, October 20-23, 2004.

[T05] P. Topiwala, "Comparative study of JPEG2000 and H.264/AVC FRExt I-frame coding on high-definition video sequences", Optical Information Systems III, Proceedings of the SPIE, vol. 5909, pp. 284-292, 2005.

[JVT04] Joint Video Team of ITU-T and ISO/IEC, "Performance comparison of intra-only H.264/AVC HP and JPEG2000 for a set of monochrome ISO/IEC test images", Doc. JVT-M014, October 2004.

[MSU08] MSU Video Quality Measurement Tool. [Online]. Available: http://www.compression.ru/video/quality_measure/info_en.html

[WBSS04] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Trans. on Image Processing, vol. 13, pp. 600-612, 2004.

[X00] F. Xiao, "DCT-based video quality evaluation", student final project, Digital Video Processing (EE392J), 2000.

[ITUR02] "Methodology for the subjective assessment of the quality of television pictures", ITU-R Std., 2002.

[HHI08] H.264/AVC JM reference software. [Online]. Available: http://iphome.hhi.de/suehring/tml/download/

[IIIJ07] ITU-T and ISO/IEC JTC 1, "Advanced Video Coding for Generic Audiovisual Services", ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 AVC), Version 8, consented July 2007.

[DVDM02] C. De Vleeschouwer, J.-F. Delaigle, and B. Macq, "Invisibility and application functionalities in perceptual watermarking: an overview", Proc. of the IEEE, vol. 90, no. 1, pp. 64-77, January 2002.

[OM00] R. Ohbuchi and H. Masuda, "Managing CAD Data as a Multimedia Data Type Using Digital Watermarking", IFIP WG 5.2, Fourth International Workshop on Knowledge Intensive CAD (KIC-4), Parma, Italy, May 22-24, 2000.

[I08] Intel IPP H.264 codec (Intel Corporation). [Online]. Available: http://www.intel.com/cd/software/products/asmo-na/eng/perflib/ipp/index.htm

[IOS04] International Organization for Standardization, ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, "Call for evidence on Multi-View Video Coding", Palma de Mallorca, Spain, October 2004.

[IOS05] International Organization for Standardization, ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, "Survey of algorithms used for Multi-View Video Coding (MVC)", Hong Kong, China, January 2005.

[WS07] T. Wiegand and G. Sullivan, "The H.264/AVC Video Coding Standard", IEEE Signal Processing Magazine, March 2007.

[S02] C. Stamm, "PGF: A New Progressive File Format for Lossy and Lossless Image Compression", Proceedings of WSCG'02, pp. 421-428, 2002.

[M99] H. S. Malvar, "Fast Progressive Wavelet Coding", Proceedings of IEEE DCC'99, 1999.

[WLHW07] T.-T. Wong, C.-S. Leung, P.-A. Heng, and J. Wang, "Discrete Wavelet Transform on Consumer-Level Graphics Hardware", IEEE Trans. on Multimedia, vol. 9, no. 3, pp. 668-673, April 2007.

[KAWL07] M. C. Kung, O. C. Au, P. Wong, and C.-H. Liu, "Intra Frame Encoding Using Programmable Graphics Hardware", PCM 2007, pp. 609-618, 2007.

[PRNW07] B. Pieters, D. Van Rijsselbergen, W. De Neve, and R. Van de Walle, "Motion Compensation and Reconstruction of H.264/AVC Video Bitstreams using the GPU", Eighth International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS'07), 2007.

[LLWCTC07] C.-Y. Lee, Y.-C. Lin, C.-L. Wu, C.-H. Chang, Y.-M. Tsao, and S.-Y. Chien, "Multipass and Frame Parallel Algorithms of Motion Estimation in H.264/AVC for Generic GPU", ICME'07, pp. 1603-1606, 2007.

[PTH04] B. Parrein, M. Tarin, and P. Horain, "Demosaicking and JPEG2000 compression of microscopy images", ICIP'04, vol. 1, pp. 521-524, 2004.
