
Re: Re:compression



[computer types: this is an intentionally simplified explanation of data
compression.  please don't castigate me for bitwise imprecision.  otherwise
feel free to hate me for the mostly non-looping content!]

Malhomme Olivier writes:

> There is not that much difference between these two different events
> that share the same name. In audio compression you clearly lose some
> information.

Hate to further beat a dead horse here, but there is indeed that much
difference between data compression and audio compression: _not all data
compression is lossy_ (though it is on some things, like mini-disc
recorders).  Think of it this way: the digital file (whether it's a CD or a
loop in the RAM of yer stylin' 'Plex Pro) is merely strings of zeroes and
ones.  Each of those zeroes and ones takes up file space.  Let's say
there's a particular file that looks like this:

101010111101101101101101111101101101010101110110110101101101

There are 60 characters in that file.  Let's say storage space is at a
premium so we want to shrink it down.  If you noticed, there are
lots of repeated characters--pairs of "1"s for example.  With a compression
scheme, we can just tell the storage device that when it sees two "1"s in
succession, it should write an "M" character.  (A smarter compression
scheme would write "M" when it sees "101" but let's say we're not the best
software engineers.)  So now we have:

101010MM0M0M0M0M0MM10M0M0101010M10M0M010M0M01

That file contains exactly the same information as the first one, but it's
only 45 characters long, which makes it 25% smaller.  When we want
to read the file (to play it back, in the case of an audio file) the
playback device just puts "11" back wherever it sees "M", before anything
reaches the D/A converter, and hey presto!

101010111101101101101101111101101101010101110110110101101101

It's EXACTLY THE SAME as the original program content.  
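
For anyone who wants to try this at home, the round trip above can be
sketched in a few lines of Python.  This is only a toy, under the same
simplification as the rest of this post: the "file" is a text string, and
the names pack and unpack are made up for the illustration, not something
a real codec provides.

```python
# Toy version of the scheme above: the "file" is a string of characters.
# pack and unpack are hypothetical names used only for this illustration.

ORIGINAL = "101010111101101101101101111101101101010101110110110101101101"

def pack(data: str) -> str:
    # Scan left to right and write "M" for every pair of "1"s.
    return data.replace("11", "M")

def unpack(packed: str) -> str:
    # Put "11" back wherever there is an "M".
    return packed.replace("M", "11")

packed = pack(ORIGINAL)
restored = unpack(packed)

print(len(ORIGINAL), len(packed))  # character counts before and after
print(restored == ORIGINAL)        # True when nothing was lost
```

Note that the trick only works because "M" never shows up in the original
data; otherwise unpack could not tell a real "M" from a stand-in.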

We could even add another stage of compression where "M0" was replaced with
the character "F" and "10" was replaced with the character "L":

LLLMFFFFFMMLFFLLLMLFFLFF1

Yielding a file that is 25 characters long, etc.
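
The two stages can be chained the same way.  Again, this is just a sketch:
the STAGES table and the function names are my own assumptions for the
example, and it leans on the fact that "M", "F" and "L" never occur in the
original data.

```python
# Toy two-stage version of the scheme. The STAGES layout and the function
# names are assumptions for this sketch, not part of any real tool.

ORIGINAL = "101010111101101101101101111101101101010101110110110101101101"

# Each stage is a list of (pattern, code) pairs, applied in order.
STAGES = [
    [("11", "M")],               # first stage: pairs of "1"s become "M"
    [("M0", "F"), ("10", "L")],  # second stage: "M0" -> "F", "10" -> "L"
]

def pack(data: str) -> str:
    for stage in STAGES:
        for pattern, code in stage:
            data = data.replace(pattern, code)
    return data

def unpack(data: str) -> str:
    # Undo the stages in reverse order, last stage first.
    for stage in reversed(STAGES):
        for pattern, code in reversed(stage):
            data = data.replace(code, pattern)
    return data

packed = pack(ORIGINAL)
print(packed, len(packed))
print(unpack(packed) == ORIGINAL)  # True when the round trip is lossless
```

Unpacking has to run the stages backwards: "F" and "L" are expanded first,
and only then does "M" go back to "11".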

With data compression you are trading CPU time for storage space: every
read and write costs extra processing to pack and unpack the data.  Since
mass storage is pretty cheap, while CPU time is increasingly at a premium,
it's generally not a wise trade-off.  However, there are specific
applications (diskette-style storage media with significant size
limits--small diskette, thus small files) where it's appropriate.

Cheers,

Scott
tanelorn@dimensional.com