What is Compression Techniques in multimedia

Lecture Notes Compression Technologies and Multimedia Data Formats and what is compression techniques for images
Dr.HenryEvans Profile Pic
Published Date:10-07-2017
Your Website URL(Optional)
Lecture Notes Compression Technologies and Multimedia Data Formats Ao.Univ.-Prof. Dr. Andreas Uhl Department of Computer Sciences University of Salzburg Adress: Andreas Uhl Department of Computer Sciences J.-Haringerstr.2 A-5020 Salzburg ¨ Osterreich Tel.: ++43/(0)662/8044/6303 Fax: ++43/(0)662/8044/172 E-mail: uhlcosy.sbg.ac.atKapitel 1 Basics 1.1 What is Compression Technology ? Compression denotes compact representation of data. In this lecture we exclusively cover compression of digital data. Examples for the kind of data you typically want to compress are e.g. • text • source-code • arbitrary files • images • video • audio data • speech Obviously these data are fairly different in terms of data volume, data structure, intended usage etc. For that reason it is plausible to develop specific compression technologies for different data and application ty- pes, respectively. The idea of a general purpose compression technique for all thinkable application scenarios has to be entirely abandoned. Instead, we find a large number of very different techniques with respect to target data types and target application environments (e.g. data transmission in networks like streaming, storage and long term archiving applications, interactive multimedia applications like gaming etc.). Given this vast amount of different techniques, there are different ways how to classify compression techniques: • with respect to the type of data to be compressed • with respect to the target application area • with respect the the fundamental building blocks of the algorithms used 1KAPITEL 1. BASICS 2 Terminology: when talking abount compression, we often mean “lossy compression” while “lossless com- pression” is often termed as coding. However, not all coding algorithm do actually lead to lossless compres- sion, e.g. error correction codes. Like in every other field in computer science or engineering, the dominating language in compression technologies is English of course. There are hardly any comprehensive and up-to- date German books available, and there do NOT exist any German journals in the field. Codec denotes a complete system capable of encoding and decoding data which consists of an Encoder and a Decoder, transcoding is a conversion from one encoded digital representation into another one. A fundamental term in the area is compression rate (or compression ratio) which denotes the relation between the size of the original data before compression and the size of the compressed data. Compression ratio therefore rates the effectivity of a compression system in terms of data reduction capability. This must not be confused with other measures of compressed data size like bit per pixel (bpp) or bit per sample (bps). 1.2 Why do we still need compression ? Compression Technology is employed to efficiently use storage space, to save on transmission capacity and transmission time, respectively. Basically, its all about saving resources and money. Despite of the overwhel- ming advances in the areas of storage media and transmission networks it is actually quite a surprise that still compression technology is required. One important reason is that also the resolution and amount of digital data has increased (e.g. HD-TV resolution, ever-increasing sensor sizes in consumer cameras), and that there are still application areas where resources are limited, e.g. wireless networks. Apart from the aim of simply reducing the amount of data, standards like MPEG-4, MPEG-7, and MPEG-21 offer additional functionalities. During the last years three important trends have contributed to the fact that nowadays compression technology is as important as it has never been before – this development has already changed the way we work with multimedial data like text, speech, audio, images, and video which will lead to new products and applications: • The availability of highly effective methods for compressing various types of data. • Theavailabilityoffastandcheaphardwarecomponentstoconductcompressiononsingle-chipsystems, microprocessors, DSPs and VLSI systems. • Convergence of computer, communication, consumer electronics, publishing, and entertainment indu- stries. 1.3 Why is it possible to compress data ? Compression is enabled by statistical and other properties of most data types, however, data types exist which cannot be compressed, e.g. various kinds of noise or encrypted data. Compression-enabling properties are: • Statistical redundancy: in non-compressed data, all symbols are represented with the same number of bits independent of their relative frequency (fixed length representation).KAPITEL 1. BASICS 3 • Correlation: adjacent data samples tend to be equal or similar (e.g. think of images or video data). There are different types of correlation: – Spatial correlation – Spectral correlation – Temporal correlation In addition, in many data types there is a significant amount of irrelevancy since the human brain is not able to process and/or perceive the entire amount of data. As a consequence, such data can be omitted without degrading perception. Furthermore, some data contain more abstract properties which are independent of time, location, and resolution and can be described very efficiently (e.g. fractal properties). 1.4 History of compression technologies • 1st century B.C.: StenographyKAPITEL 1. BASICS 4 • 19th century: Morse- and Braille alphabeths • 50ies of the 20th century: compression technologies exploiting statistical redundancy are developed – bit-patterns with varying length are used to represent individual symbols according to their relative frequency. • 70ies: dictionary algorithms are developed – symbol sequences are mapped to shorter indices using dictionaries. • 70ies: with the ongiong digitization of telephone lines telecommunication companies got interested in procedures how to get more channels on a single wire. • early 80ies: fax transmission over analog telephone lines. • 80ies: rst fi applications involving digital images appear on the market, the “digital revolution” starts with compressing audio data • 90ies: video broadcasting, video on demand, etc. 1.5 Selection criteria for chosing a compression scheme If it is evident that in an application compression technology is required it has to be decided which type of technology should be employed. Even if it is not evident at rs fi t sight, compression may lead to certain surprising benets fi and can offer additional functionalities due to saved resources. When selecting a specific system, quite often we are not entirely free to chose due to standards or de-facto standards – due to the increasing develeopment of open systems and the eventual declining importance of standards (example: MPEG-4standardization)thesecriteriamightgainevenmoreimportanceinthefuture.Importantselection criteria are for example: • data dimensionality: 1-D, 2-D, 3-D, ......... • lossy or lossless compression: dependent of data type, required quality, compression rate • quality: with target quality is required for my target application ? • algorithm complexity, speed, delay: on- or off-line application, real-time requirements • hardware or software solution: speed vs. price (video encoder boards are still costly) • encoder / decoder symmetry: repeated encoding (e.g. video conferencing) vs. encoding only once but decoding often (image databases, DVD, VoD, ....) • error robustness and error resilience: do we expect storage or transmission errors • scalability: is it possible to decode in different qualities / resolutions without hassle ? • progressive transmission: the more data we transmit, the better the quality gets.KAPITEL 1. BASICS 5 • standard required: do we build a closed system which is not intended to interact with other systems (which can be desired due to market position, e.g. medical imaging) or do we want to exchange data across many systems • multiple encoding / decoding: repeated application of lossy compression, e.g. in video editing • adaptivity:doweexpectdatawithhighlyvaryingpropertiesorcanwepre-adapttospecificproperties (fingerprint compression) • synchronisation issues – audio and video data should both be frame-based preferably • transcoding: can the data be converted into other datatypes easisly (e.g. MJPEG −− MPEG)Kapitel 2 Fundamental Building Blocks 2.1 Lossless Compression In lossless compression (as the name suggests) data are reconstructed after compression without errors, i.e. no information is lost. Typical application domains where you do not want to loose information is compression of text, les fi , fax. In case of image data, for medical imaging or the compression of maps in the context of land registry no information loss can be tolerated. A further reason to stick to lossless coding schemes instead of lossy ones is their lower computational demand. For all lossless compression techniques there is a well known trade-off: Compression Ratio – Coder Complexity – Coder Delay. Lossless Compression typically is a process with three stages: • The model: the data to be compressed is analysed with respect to its structure and the relative frequency of the occuring symbols. • The encoder: produces a compressed bitstream / file using the information provided by the model. • The adaptor: uses informations extracted from the data (usually during encoding) in order to adapt the model (more or less) contineously to the data. The most fundamental idea in lossless compression is to employ codewords which are shorter (in terms of their binary representation) than their corresponing symbols in case the symbols do occur frequently. On the other hand, codewords are longer than the corresponding symbols in case the latter do not occur frequently (there’s no free lunch ). Each process which generates information can be thought of a source emitting a sequence of symbols chosenfromanit fi ealphabeth(incaseofcomputer-basedprocessingthisboilsdowntothebinaryalphabeth 0,1). We denote this “source of information” by S with the corresponding alphabeth s ,s ,...,s and 1 2 n their occurance probabilities p(s ),p(s ),..., p(s ). To make things simpler (and more realistic as well), 1 2 n we consider these probabilities as being relative frequencies. We want to determine the average information a source emits. Following intuition, the occurence of a less frequent symbol should provide more information compared to the occurence of highly frequent symbols. We assume symbols to be independent and consider theinformationprovidedbysuchsymbolsasbeingthesumofinformationasgivenbythesingleindependent 6KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 7 symbols. We dene fi 1 I(s )=log i p(s ) i as being the information delivered by the occurence of the symbol s related to its probability p(s ). The i i basis of the logarithm determines the measure of the amount of information, e.g. in case of basis 2 we talk about bits. By averaging over all symbols emitted by the source we obtain the average information per symbol, the entropy: n n X X H(S)= p(s )I(s )=− p(s )log p(s ) bits/symbol i i i i 2 i=1 i=1 The most significant interpretation of entropy in the context of lossless compression is that entropy measures how many bits are required on average per symbol in order to represent the source, i.e. to conduct lossless compression. Entropy therefore gives a lower bound on the number of bits required to represent the source information. In the next section we will discuss several techniques approaching more or less closely this theoretical bound. 2.1.1 Influence of the Sampling Rate on Compression If we compare data originating from an identical analog source where y is sampled ten times as densly as i x , we notice that neighbouring values of y tend to be much more similar as compared to neighbouring i i values in x . If we assume that the analog signal exhibits continuity to some degree, the samples of y will i i be more contineous (i.e. more similar) as compared to the samples ofx since these are taken from locations i in the signal more dislocated as the samples of y . The higher degree of similarity in y has a positive effect i i on entropy and therefore on the achievable compression ratio. 2.1.2 Predictive Coding - Differential Coding This technique is not really a compression scheme but a procedure to preprocess data in order to make them more suited for subsequent compression. Differential coding changes the statistics of the signal – based on the fact that neighbouring samples in digital data are often similar or highly correlated, differential coding does not compress the sequence of original data points x ,x ,...,x but the sequence of differences 1 2 N y =x − x . While x usually follow a uniform or normal distribution, y exhibit significant peaks around i i i− 1 i i 0 (always assuming that neighbouring samples tend to be similar as it is the case in audio, image, and video data). To give an example, we consider an image with 999 pixels and 10 different symbols. In case a) we have p(s ) = 1/10, i = 1,...,10, in case b) we have p(s ) = 91/100 and p(s ) = 1/100 for i = 2,...,10. i 1 i Case a) corresponds to the “conventional” image with unifomly distributed symbols, while case b) is the differential image consisting of y , where s is the zero symbol. In case a) we result in H(S)=3.32, while in i 1 case b) entropy is only H(S) = 0.72. This small example immediately makes clear that differential coding is sensible. Actually, it is the basis of many compression schemes employing prediction.KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 8 2.1.3 Runlength Coding Thebasicprincipleistoreplacesequencesofsuccessiveidenticalsymbolsbythreeelements:asinglesymbol, a counter, and an indicator which gives the interpretation of the other two elements. As an example we discuss the tape drive of the IBM AS/400 (where obviously lots of blanks are en- countered): strings of consecutive blanks (2-63 byte) are compressed into a codeword which contains the information “blank” and a counter. Strings of consecutive identical characters unequal blank (3-63 byte) are compressed into 2 byte: Control byte (“connsecutive char” and counter) and character byte. Strings of non-consecutive identical characters are expanded by a control byte (“non- consecutive identical”). Example: b Blank, control byte; bbbbbbABCDEFbb33GHJKbMN333333 is compressed into ABC- DEF33GHJKbMN3 ATTENTION: if the data is not suited for runlength encoding we result in data expansion (caused by the expansion of non-consecuritve identical chars). Runlengh coding is extremely fast on the other hand and can compress quite well in case of suited data. 2.1.4 Variable length Coding and Shannon-Fano Coding We denote the length of the codeword for the symbols asl(s ) (in bits). The average code length is dened fi i i P n as:L(S)= p(s )l(s ). In order to minimize this expression (which obviously is the aim of compression) i i i=1 we need to assign short codewords to symbols with high probability of occurence. It is intersting to notice P n the close relation to entropy: H(S) = p(s )I(s ). Consequently it is evendent that l(s ) = I(s ) = i i i i i=1 − log p(s ) needs to be fulfilled in order to attain the actual entropy value. However, this is possible only i 2 in case log p(s ) is a natural number. Usually, this is not the case of course, so the value is rounded to the i 2 next integer (which is a hint to the sub-optimality of the entire procedure ). Shannon-Fano Coding (see below) exactly provides the corresponding solution using the idea described before. The term code efficiency H(S) is dened fi as – obviously values close to 1 are desirable L(S)KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 9 Example: S = s ,s ,s ,s with p(s ) = 0.6, p(s ) = 0.3, p(s ) = 0.05 and p(s ) = 0.05. H(S) = 1.4 1 2 3 4 1 2 3 4 bits/symbol. When chosing a xe fi d length encoding like e.g. 00, 01, 10, 11 as codewords with average code length of 2 bits/symbol. When computing l(s ) we obtain the values 1,2, and 5 and result in the following i variable length code: 0, 10, 110 und 111. s and s would require a 5 bit codeword but 3 bits are sufficient 3 4 to generate a unique code. The corresponding average code length is 1.5 bits/symbol. A prefix code (or prex-free fi code) is a code system, typically a variable-length code, with the “prexfi property”: there is no valid code word in the system that is a prexfi (start) of any other valid code word in the set. E.g., a code with code words 9, 59, 55 has the prexfi property; a code consisting of 9, 5, 59, 55 does not, because 5 is a prefix of both 59 and 55. With a prefix code, a receiver can identify each word without requiring a special marker between words. Awaytogenerateaprexfi codeistouseafullbinarytreetorepresentthecode.Theleavesrepresentthe symbols to be coded while path traversing the tree from the root to the leave determines the bit-sequence of the resulting codeword. To actually determine a bitsequence, we need to dene fi how to encode a branching: for example, a branching to the left can be encoded by 0 and a branching to the right by 1. The tree given in the gure fi leads to the following codewords: A – 00, B – 01, C – 100, D – 101, E – 1. The tree (and code) generationoftheShannon-Fanoschemeworksasfollows(intheexample,weassumethefollowingfrequency counts: A – 15, B – 7, C – 6, D – 6, E – 5): 1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol’s relative frequency of occurrence is known. 2. Sort the lists of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right. 3. Divide the list into two parts, with the total frequency counts of the left half being as close to the total of the right as possible. 4. The left half of the list is assigned the binary digit 0, and the right half is assigned the digit 1. This means that the codes for the symbols in the first half will all start with 0, and the codes in the second half will all start with 1.KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 10 5. Recursively apply the steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree. A procedure often employed is to replace the alphabeth consisting of the s by an extension using tuples i s s with their corresponding relative frequencies. Using this strategy, a code length of 1.43 bits/symbol can i j be achieved, however, at a higher cost in terms of computation and memory requirements. 2.1.5 Huffman Coding HuffmanCodingisapopulartechniqueusingtheideaofvariablelengthcodingcombinedwithaconstructive algorithm how to build the corresponding unique codewords. Suppose we have given a source S with an alphabeth of size n. The two least frequent symbols are combined into a new virtual symbol which is assigned the sum of occurence probabilites of the two original symbols. Subsequently, the new alphabeth of size n− 1 is treated the same way, which goes on until only two symbols are left. If we know the codewords of the new symbols, the codewords for the two original symbols are obtained by adding a 0 and 1 bit to the right side of the new symbol. This procedure is applied recursively, i.e. starting with the two most frequent symbols which are assigned codewords 0 and 1, we successively add corresponding bits to the right until codewords for all original symbols have been generated. The example has entropy H(S) = 2.15, the gererated Huffman code have average code length of 2.2 bits/symbol, an ad-hoc generated code like shown before, like e.g. 0,10,110,1110,1111 has average code length of 2.25 bits/symbol. Modification : in case many symols have small probabilities of occurence, a large number of long codewords is the result which is not efficient. Therefore, all such symbols are assinged into a dedicated class which is termed “ELSE” in the Huffman code and which is encoded separately. This idea is calles modified Huffman code. Problems: 1. In case a source has a symbol withp(s) close to 1 and many others with small probability of occurence we result in an average code length of 1 bit/symbol since the smallest length for a codeword is of course 1 bit. The entropy is of course much smaller (recall the coding of differences). 2. In case of changing statistics of the source one can easily obtain a data expansion. 3. Since a fixed codebook is of poor quality in many cases we have a two stage algorithm: building statistics (the model), generate the code.KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 11 4. Adaptivity is difficult to implement, since changes in the statistics affect the entire tree and not just a local part. We can either store corresponding Huffman tables (trees) in the data which is inefficient in terms of compression or compute them on the yfl from decoded data which is inefficient in terms of compression speed. Huffman encoding is used in most older standards like JPEG, MPEG-1 to MPEG-4, however, in a non-adaptive manner which is possible due to the nature of the DCT transform. 2.1.6 Arithmetic Coding In Huffman coding we have a correspondence between a single symbol and its codeword – which is the main reason for its suboptimality. Arithmetic coding uses a single codeword for an entire sequence of source symbols of length m. In this manner the restriction to integer-valued bits per symbol values can be avoided in an elegant way. A drawback however is that similar sequences of source symbols can lead to entirely different codewords. The principal idea is as follows. Suppose we have a sequence of binary source symbols s with length m P m where p(0) = p and p(1) = 1 − p. Let I = 0,1) be the semi-open unit interval and p(s ) = 1 m m when summing over all 2 possible sequences of length m. Based on this initial construction we can build m subintervals I with l = 1,2,...,2 which are all in I and which can be assinged in unique manner to a l sequence s where I = p(s ) and all I are disjoint. The following procedure constructs such intervals m l m j and is called “Elias Code” (this approach is meant for illustrative purposes only since it is inpractical for implementations). The interval I is partitioned into two subintervals 0,p) (in case 0 is the symbol to be coded) and p,1) (in case 1 is to be coded). Based on the occurance of the next symbol (0 or 1) these intervals are further 2 2 2 2 partitioned recursively into 0,p ) and p ,p) and into p,2p− p ) and 2p− p ,1), respectively. The interval 2 j− 1 j− 1 p ,p) for example is assigned to the sequence 01. After j− 1 source symbols the interval L ,R ) has been assigned to the sequence s . j− 1 j j− 1 j j− 1 j− 1 j− 1 • 0 is next: L =L and R =L +p(R − L ). j j− 1 j− 1 j− 1 j j− 1 • 1 is next: L =L +p(R − L ) und R =R . j j Based on this construction one can show that the length of each interval L ,R ) has the same length as p(s ) of the corresponding sequence. As soon as the interval has been constructed, the codeword for s is m m j found easily: we represent L in binary manner and keep − log p(s ) bits (rounded to next integer) after m 2 thedecimalpoint.Thisissufficienttouniquelyidentifytheintervalandthecorrespondingsymbolsequence. In actual implementations it is sufficient to encode an arbitrary point in the constructed interval (of course a point with minimal length in the binary representation is selected) and the number of encoded symbols. An important fact is that in small intervals, more binary positions are required to represent a point (since no points with small “binary length” are contained). Actualimplementationsuseare-normalisationto0,1)ineachintervalpartitioningstageduetoproblems with rounding errors. Example: Encoding the sequence wuvw leads to the interval 0.5822,0.5850 the length of which is 0.0028 =p(w)p(u)p(v)p(w).Inbinaryrepresentationtheintervalbordersare0.1001010111000 and0.1001010100001 , 2 2KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 12 repsectively. The shortest binary number in this interval is 0.100101011 which is the codeword for this se- 2 quence. When computing the entropy of the sequence, we get − log 0.0028=8.48bit which is close to 9. 2 Advantages: 1. Adaptivity can be easily realised – encoder and decoder need to carry the statistical information in identical (i.e. synchronized) manner and can adapt the interval boundaries correspondingly. Either for each symbol, or for a xed fi amount of symbols, or when a certain extent of statistical change occurs. In any case there is no need to store the statistical information. 2. Already if only a small amount of encoded data is available (i.e. the MSB of the position) we can start decoding, there is no need to wait for the receipt of the entire data to start decoding. With the first symbols we can determine in which of the two initial intervals the final interval will be located, upon receipt of more information also more symbols can be decoded. 3. The restriction to integer bits/symbol values is overcome.KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 13 Arithmetic coding is used in JBIG-1 as the only coding engine and in the more recent lossy standards like JPEG2000 and H.264 as lossless compression stage. 2.1.7 Dictionary Compression Basic idea: the redundancy of repeated phrases in a text is exploited (recall: in Runlenth encoding, repeated identical symbols can be efficiently coded). Here we encode repeatedly occuring symbol strings in arbitrary files in an efficient manner. A frontend selects strings and information for compression and dekompression is stored in a dictionary. The encoder and the decoder need to have access to a copy of this dictionary of course.Inthismanner,entirestringsarereplacedbytokens(i.e.codewords).Thebasicalgorithmicelements are lookup tables, and the corresponding operations are very fast. LZ77 Dictionary Compression The name originates from Lempel and Ziv (1977), the algorithm actually described here is a variant, LZSS (Storer/Szymanski) as of 1982. A typical feature is the so-called “sliding window” consisting of two parts: the history buffer (the dictionary) contains a block of data recently encoded (some 1000 symbols) – and the look-ahead buffer which contains strings to be encoded (10-20 symbols). In important issue is the coice of the buffer sizes: • History buffer small − match is unlikely. • History buffer large − high computational requirements and poor compression rate caused by the large address space for the pointers into the buffer.KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 14 • Look-ahead buffer small − no match for long strings is possible (waste of potential redundancy). • Look-ahead buffer large − time is wasted for the unlikely case of finding matches for long strings. Example: Text out − ——DATA COMPRESSION—— COMPRESSES DATA—— − Text in. “DATA COMPRESSION” is the content of the history buffer, “COMPRESSES DATA” is the look-ahead buffer. The actual coding used subsequently is the QIC-122 standard for tape drives using a history buffer of 2048 bytes. To start with, we encode each symbol in “DATA COMPRESSION” as 0ASCII Char. Wert. “COM- PRESS”isastringoflength9whichhasbeenfoundinthehistorybuffer12symbolsbackwards:1 offset=12 length=9. Subsequently we encode “E” and “S” again as 0ASCII Char. Wert and “DATA” as string of length 4 which has been found in the history buffer 28 positions backwards: 1 offset=28 length=4. We notice that • the algorithm requires a start-up phase where the data in the dictionary has to be encoded explicitely, which means that it is hardly suited for compressing lots of small le fi s. • The algorithm is assymetric – searching the history buffer in an efficient manner is crucial. Intelligent data structures are required. • Since data leaves the history buffer after some time, long term redundancy cannot be exploited. LZ78 Dictionary Compression The algorithm described here the LZW(elch) Coding variant as of 1984. This algorithm solves the problems arising with the explicit definition of the two buffers. Stings in the dictionary are not automatically reased and the lookahead buffer does not have a xed fi size. The algorithm does not employ pointers to earlier complete strings but a procedure called “Phrasing”. A text is divided into phrases, where a phrase is the longest string found so far plus an additional character. All symbols are replaced by topkens (i.e. codewords) which point to entries in the dictionary. The rs fi t 256 disctionary entries are the common 8 bit ASCII code symbols. The dictionary is successively extended by new phrases generated from the text (i.e. data) to be coded. Critical issues: • How to efficiently encode a variable number of strings with variable length ? • How long can we grow the dictionary ? What are the strategies when the dictionary gets too large (start from scratch, keep the most common entries) ? 2.1.8 Which lossless algorithm is most suited for a given application ? Testing with exemplary data is important apart from the facts we know theoretically with respect to speed, assymetry, compression capability etc. KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 15 2.2 Lossy Compression In addition to the tradeoff between coding efficiency – coder complexity – coding delay the additional aspect of compression quality arises with the use of lossy methods. Quality is hard to assess since for many application human perception is the only relevant criterion. However, there are some scenarios where other factors need to be considered, e.g. when compressed data is used in matching procedures like in biometrics. Although several quality measures have been proposed over the last years, PSNR is still the most used measure to determine image and video quality after compression has been applied: v u N u X t 0 2 RMSE = 1/N (f(i)− f (i)) i=1 0 where f (i) is the reconstructed signal after decompression. Peak-signal-to-noise ratio (measured in de- ciBel dB) is dened fi as follows (where MAX is the maximal possible value in the original signal f(i)): MAX PSNR =20log ( ). 10 RMSEKAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 16 The higher the value in terms of PSNR, the better is the quality of the reconstructed signal. With increasing compression ratio, PSNR decreases. While PSNR is not well correlated with moderate quality changes, it serves its purpose as a rough quality guideline. Many other numerical measures have been developed with better prediction of human perception (e.g. structured similarity index SSIM) but these have not been widely adopted One reason is that image compression can be easily optimized with respect to RMSE (rate / distortion optimisation), but not with respect to the “better” measures. In order to really take human perception into account, subjective testing involving a significant number of human subjects is necessary – the resulting subjective measure is called MOS (“mean opinion score”). When developing quality measures, a high correlation with MOS computed over large databased is desired. PSNR does not correlate well with MOS. 2.2.1 Quantisation When changing the description of a signal from using a large alphabeth to using a smaller alphabeth of symbols we apply quantisation. Obviously, when reversing the process, a perfect reconstuction is no longer possible since two or more symbols of the larger alphabeth are mapped to a single symbol of the small alphabeth. Often, quantisation errors are the only actual errors caused by lossy compression (apart from rounding errors in transforms). Example: suppose we have stored a grayscale image with 8bit/pixel (bpp) precision, which means that for each pixel, we may choose among 256 different grayscales. If we apply a quantisation to 4 bpp, only 16 grayscale can be used for a pixel. Of course, it is no longer possible to re-transform the 4 bpp into the original bit-depth without errors. The information is lost. Ifsinglesignalvaluesaresubjecttoquantisationone-by-one(asinglevalueisquantisedintoitslessaccu- rate version), the procedure is denoted “scalar quantisation”. If we subject n-tuples of signal values together to quantisation (a vector is replaced by a less accurate version), we call the scheme “vector quantisation”. In any case, quantisation may be interpreted as a specific kind of approximation to the data. Ifwedevidetherangeoftheoriginalsignalintonequallysizedanddisjointintervalsandmapallsymbols contained in an interval to one of the n symbols of the new alphabeth, the quantisation is called “uniform”. Especiallyinthecaseofuniformlydistributedprobabilitiesofsymboloccurrenceuniformquantisationmakes sense.Quantisationismostoftenappliedtotransformcoefficients(e.g.afterDCTorwavelettransform)–in thiscontext,coefficientsquantisedtozeroaredesiredtoappearfrequently.Therefore,“uniformquantisation with deadzone around 0” is introduced, we uniform quantisation is applied, except for the quantisation bin around zero, which has twice the size of the other bins (symetrically around zero). In case the occurrence probabilities of the symbols in the original signal are not uniformly distributed andthedistributionisknown,itmakessensetoapplyfinerquantisation(i.e.smallerbins)intheareasofthe histogram where more data values are accumulated. In this case we talk about “non- uniform quantisation”. Note that following this strategy we minimise PSNR (the error is minimized in areas where many values are found) but we do not care at all about perceptual issues. Ifthedistributionoftheoccurrenceprobabilitiesofthesymbolsisnotknown,thesignalcanbeanalysed and based on the results, the quantisation bins can be set adaptively – termed “adaptive quantisation”. Since especially in the case of transform coding in many cases the probabilities of symbol occurrence is quite stable for typical signals, quatisation can be modeled in advance and does not have to be adapted to signalKAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 17 characteristics. 2.2.2 DPCM Differential pulse code modulation (DPCM) is the general principle behind differential / predictive coding. Instead of considering the simple difference between adjacent signal values, also the difference between a signal value to be predicted and a linear-combination of surrounding values (i.e. the prediction) can be considered. 0 0 Example: y = x − x , where x = αx + βx + γx . The values α,β,γ are denoted “predictor i i i− 1 i− 2 i− 3 i i 0 coefficients”, x is the prediction for x . The example is a linear predictor of order 3. Instead of storing i i or processing the sequence x , the sequence of differences y is considered, which has a significantly lower i i entropy (and can therefore be better compressed). The predictor coefficients can be fixed for classes of known signals (e.g. exploiting specific patterns like ridges in fingerprint images) or can be adapted for each single signal (in the latter case, these need to be stored). The optimal values for α,β,γ e.g. minimising entropy for y can be determined of course. i Furthermore, for each distinct part of a signal dedicated coefficients can be used and optimsied (“local adaptive prediction”). By analogy, these coefficients need to be stored for each signal part impacting on compression ratio (obviously, there is a trade-off ). The decision if DPCM is lossy or lossless is taken by the way how y is compressed. In case e.g. Huffman i coding is applied to y we result in an overall lossless procedure. Is y subject to quantisation, the overall i i scheme is a lossy one. A variant of DPCM frequently employed is to use a fixed set of predictor and quantiser combinations which are applied tp the data depending on local signal statistics. Lossless JPEG is such a scheme (without applying quantisation of course). 2.2.3 Transform Coding Most standardised lossy media compression schemes are based on transform coding techniques. Typically, tranform based coding consists of three stages. 1. Transformation 2. Quantisation: a variety of techniques (as discussed before) can be applied to map the transform coef- ficients to a small alphabeth, the size of the latter being selected corresponding to the target bitrate. Most rate-control procedures are applied in this stage. 3. Lossless encoding of coefficients: a variety of techniques (as discussed before) is applied in order to encode the quantised transform coefficients close to the entropy bound. Typically, a combination of runlength (zero-runs ) and Huffman coding (for the older standards like JPEG, MPEG-1,2,4) or arithmetic coding (for the more recent standards like JPEG2000 H.264) is used.KAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 18 The Transform The aim of transformation is to change the data of a signal in a way that it is better suited for subsequent compression. In most cases, so-called “integral-transforms” are used, which are based (in their contineous form) on applying integral operators. When applying transforms, the concept of projecting the data onto orthogonal basis functions is used. Motivation: Vectors in two-dimensional space can be represented by a set of othogonal basis-vectors (“orthogonal basis”, orthogonal requires the inner product between vectors to be 0). (x,y)=α (1,0)+β (0,1). (1,0),(0,1)areorthogonalbasis-vectors,α undβ arecalled“coefficients”whichindicateaweightforeach basis-vector in order to be able to represent the vector (x,y). The property of orthogonality guarantees that a minimal number of basis-vectors suffices (obviously, this will be important for compression). This concept can be transferred by analogy to represent functions or signals by means of basis elements. X f(x)= f(x),ψ (x)ψ (x) n n n ψ (x) are orthogonal basis-functions,f(x),ψ (x) are transform coefficients which determine the weight n n to be applied to each basis-function, to represent a given signal. When performing the actual compression, the f(x),ψ (x) are computed, quantised, and stored in encoded form. Since ψ (x) are chosen to be an n n orthogonal function system, the corresonding number of coefficients to be stored is minimal. − πinx Example: in the case of the Fourier transform ψ (x)=e =cos(nx)− isin(nx). In this special case we n observe the frequency of periodic signals. A Fourier coefficient f(x),ψ (x) reports the strength of the n frequencycomponentninthesignalwhichhasbeensubjectedtothetransform.Obviously,notallfunctions can be represented well by stationary (i.e. frequency does not change over time) periodic basis functions. This is where the wavelet transform and other types enter the scene. P To conduct transform coding, we represent the signal f(x) by f(x) = f(x),ψ (x) ψ (x) n n n (i.e. the transform coefficients are computed). The transform coefficients f(x),ψ (x) are quantised n subsequently and finally compressed losslessly. In case ψ (x) are well chosen, the transform coefficients are 0 n afterquantizationformostbasisfunctionsanddonotneedtobestored.Consequently,inthereconstruction the error is small, since most coefficients were 0 anyway. Most of the signal information is concentrated in a small number of coefficients – we speak of energy compactation here. Therefore, the art of transform coding is to select an appropriate set of suited basis functions that works well for a wide class of signals. A further way to interpret integral transforms is to view them as a rotation of the coordinate axes. The data is transformed into a coordinate system, which allows for a more efficient compression. As a motivation for this interpretation, we consider pairs X = (x ,x ) which are gernerated from neighboring pixel values 1 2 in an image. Plotting all pairs into a coordinate system most of the pairs are positioned around the main diagonal (since most neighboring pixels tend to be similar). The variance of x is dened fi as j M X 2 2 σ =1/M (x − x ) ji x j j i=1 where M is the number of pairs and x is the mean of x computed over all pairs. A simple technique to j j apply compression is to replace one of the two components in each pair by the corresponding meanx . The jKAPITEL 2. FUNDAMENTAL BUILDING BLOCKS 19 2 MSE of this compression approach is exactly σ , the variance. No matter which coordinate we replace be x j the mean, the variance of both cases is large and therefore, the error of this compression approach is large. Let us consider the transformation of X to Y =(y ,y ) by applying a rotation of the coordinate axes by 1 2 45 degrees (Y = AX where A is a rotation matrix). A fundamental property of these types of transforms is that they preserve the variance, i.e. 2 2 2 2 σ +σ =σ +σ x x y y 1 2 1 2 While variances in the original coordinate system have been almost equally for x and x , in the new 1 2 2 2 coordinate system variance is distribute quite unevenly, i.e. σ σ . Therefore, in this representation it y y 1 2 is obvious to apply our compression scheme to y (by replacing its components by the mean y ) since the 2 2 variance and consequently the error will be small. Examples of popular transforms (desired properties are: decorrelation of data, energy compactation, ba- sis functions not dependent on data, fast algorithms): 1. Karhunen-Loeve transform: optimal transform since basis functions are computed depending on the 3 data. A disadvantage is the high complexity O(N ) and the fact that the basis functions need to be stored and cannot be used universally. This transform is used to decorrelate “multiple-spectral band imagery” with a larger number of bands. Closely related to PCA

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.