Surround Professional

An unedited manuscript for an article, published in Issue Four of Surround Professional, discussing the background, implementation and use of Meridian Lossless Packing (MLP), the lossless codec adopted for the DVD-Audio specification.

MLP What?

By now, the fact that the DVD-A specification has passed most major hurdles allows us to begin thinking about the technical aspects more carefully. One feature of the audio format that’s new to most practitioners is Meridian Lossless Packing, or MLP. There are several lossless data reduction schemes available, including Philips’ Direct Stream Transfer approach employed in the SACD format and Merging Technologies’ LRC (Lossless Realtime Coding) system. MLP, however, won the war and is the mandated lossless data compression system used by DVD-A. As a DVD disc can only deliver a maximum data rate of 9.6 Mbits/sec., there simply isn’t enough data available from the disc to deliver 24 bit words at 88.2 or 96 kHz sample rates all ‘round. MLP compresses the number of bits required to transmit a given amount of information without altering that information. Let’s clarify a key point. Lossless compression means it does nothing to the dynamic range or, for that matter, any other aspect of the audio. It operates solely on the digital data stream, eliminating signal redundancies and thus reducing either transmission rate or storage space, two sides of the same bit budget coin. Let’s look more deeply into this space saving science…
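
To see why the packing is needed at all, run the numbers yourself. A back–of–the–envelope check (a few lines of Python, assuming the usual six–channel, 24 bit, 96 kHz program) makes the problem plain:

    # Six channels of 24 bit PCM at 96 kHz, versus the 9.6 Mbit/sec
    # maximum audio data rate a DVD disc can deliver.
    channels, bits, rate = 6, 24, 96_000
    raw = channels * bits * rate
    print(raw / 1e6, "Mbit/sec of raw PCM vs. the 9.6 Mbit/sec ceiling")
    # -> 13.824 Mbit/sec, well over what the disc can sustain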

From a technical standpoint, MLP provides five important features:

• At least 4 bits/sample of compression for both average and peak data rates

• Easy transformation between fixed–rate and variable–rate data streams

• Careful and economical handling of mixed input sample rates

• For decoders, modest number crunching requirements and special stereo playback considerations

• Automatic data rate savings both on the LFE (low frequency effects) channel, without special decoding modes, and on audio sampled at 88.2/96 or 176.4/192 kHz that doesn’t use all of the available bandwidth

Let’s start with data rate savings. A signal that uses less than the full 24 bit essence or payload of the AES input, such as no–amplitude digital black or a reduced resolution playback from 16 bit DATs, will automatically be reduced in both rate and storage space requirements. Bandwidth–challenged signals, such as a transfer from an optical sound track or a voiceover that doesn’t come close to using the bandwidth available from sampling at 88.2, would also be shrunk. In addition, channels that are correlated or similar, such as a mono surround channel encoded as dual mono “stereo” surrounds, would take up no more room than the mono track alone. The encoder spots all of these situations where lossless data rate compression is possible. Significantly, no flags need be set in the data stream for the decoder. This prevents possible mistakes such as those seen with bogus emphasis flag settings in AES streams.

On average, MLP provides 5 to 11 bits of reduction in data rate or storage needs for 44.1/48 kHz data. At 88.2/96 k, that increases to a 9 to 13 bit savings. At 192 kHz, it can sometimes squeeze out an extra bit for up to 14 bits of compression. Peak reduction rates are more modest as the audio is filling the data pipe: 4 bits of savings at 44.1, 8 bits at 96 and 9 bits at 192 kHz. The ability to throttle peak data rates by a substantial, virtually guaranteed amount is where MLP excels and is one of the chief technical reasons for its adoption in the DVD-A spec.
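
Plug that guaranteed peak figure back into the earlier arithmetic and the appeal is obvious. A quick sketch, again assuming a six–channel, 24 bit program at 96 kHz:

    # With roughly 8 bits/sample of guaranteed peak savings at 96 kHz,
    # the same six-channel program squeaks back under the disc's ceiling.
    channels, rate, peak_saving = 6, 96_000, 8
    packed = channels * (24 - peak_saving) * rate
    print(packed / 1e6, "Mbit/sec")   # -> 9.216 Mbit/sec, under 9.6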

For many engineers and producers, the idea of throwing 88.2 data at the rear channels only to reproduce acoustic ambience seems like a waste. To reduce overhead, it’s possible to mix sample rates, front to back, in a 2:1 ratio. The encoder takes in those 88.2 LCR channels and the 44.1 surround channels and upsamples the 44.1 data, allowing the entire process to run with one single sample clock. Decoders have the option of reproducing either the original low rate surround data or the upsampled 88.2 synthetic data. But, most player/processors will have only one internal clock, so they will most likely use the upsampled stream.
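
For the curious, the upsampling half of that trick is garden–variety interpolation. Here’s a rough Python/NumPy sketch of doubling a 44.1 stream to 88.2; the windowed–sinc filter below is purely illustrative and is not the interpolator MLP actually uses:

    import numpy as np

    def upsample_2x(x, taps=63):
        # Insert a zero between every original sample...
        y = np.zeros(2 * len(x))
        y[::2] = x
        # ...then low-pass at the original Nyquist frequency and
        # restore the gain lost to the zero stuffing.
        n = np.arange(taps) - (taps - 1) / 2
        h = 0.5 * np.sinc(0.5 * n) * np.hamming(taps)
        return 2.0 * np.convolve(y, h, mode="same")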

Well, how the heck does MLP accomplish these machinations? Though the details are obscured in very clever math, the basic concepts are simple. First, the audio is preprocessed in a “lossless matrix” which will be discussed a bit later. Then, the resulting data is passed through a “de–correlator” to remove any redundant information. Finally, the smaller amount of novel data is entropy coded to pack it into as small a space as possible.

The de–correlator works on the assumption that the input signal is continuous over time, with no abrupt changes to the smoothly flowing audio. This implies that each successive sample of data is fairly similar to the prior sample. Given that, you can predict what the next, future sample will be and encode only the difference between the original signal and the prediction. This encoding scheme is somewhat akin to M–S encoding, where stereo is encoded as a mono signal and a difference–between–channels signal.
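
If the M–S comparison helps, here’s that idea boiled down to a few lines of Python: two similar channels are carried as one channel plus a (hopefully tiny) difference, and nothing is lost in the round trip. The sample values are made up for illustration:

    def ms_encode(left, right):
        side = [l - r for l, r in zip(left, right)]   # usually small numbers
        return right, side

    def ms_decode(right, side):
        left = [r + s for r, s in zip(right, side)]
        return left, right

    L = [1000, 1002, 1001, 999]
    R = [ 998, 1001, 1000, 998]
    assert ms_decode(*ms_encode(L, R)) == (L, R)      # perfectly invertible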

Well, how do you predict the future sample? MLP uses linear prediction, whereby a carefully chosen filter is applied to the data in order to remove the correlated or timbral portion of the signal, leaving only a low amplitude residual signal not removed by the filter. A correlated signal is composed of a collection of quasi–periodic components. The residual is de–correlated, composed of aperiodic or noise–like stuff such as transients, phase relationships between partials, rosin on string, etc. If this transformation is performed correctly, the incoming audio can be completely described by the prediction filter and the residual left at the filter’s output. And, equally important, the process is perfectly invertible, whereby the input can be recreated from the filter’s description (its coefficients) and the residual data.
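
In code, a toy version of that predictor–and–residual dance might look like the following. The real MLP filters and their coefficient format are considerably more sophisticated; this just shows that the round trip can be exact:

    def predict(prev1, prev2):
        return 2 * prev1 - prev2                 # crude second-order predictor

    def encode(samples):
        residual = list(samples[:2])             # first two samples sent as-is
        for n in range(2, len(samples)):
            residual.append(samples[n] - predict(samples[n - 1], samples[n - 2]))
        return residual

    def decode(residual):
        samples = list(residual[:2])
        for n in range(2, len(residual)):
            samples.append(residual[n] + predict(samples[n - 1], samples[n - 2]))
        return samples

    x = [0, 3, 7, 12, 18, 25, 31, 36]            # smoothly varying input
    assert decode(encode(x)) == x                # lossless round trip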

The filter description is much more compact than the data it represents, so much of the data reduction is accomplished by the de–correlator. But more compaction can also be had. In the remaining residual PCM data, some sample values statistically occur more often than others. This allows yet another compression scheme, entropy coding, to be used. The idea is to build a lookup table of data words or “symbols” where every possible PCM sample value has a corresponding symbol, kind of like a zip code standing in for a lengthy street address. If the PCM sample values that occur most often are assigned the shortest symbols and the rarer sample values are mapped to the longer symbols, then an average reduction in data storage is achieved. This data shorthand is used by many computer file formats, including GIF, SIT and ZIP files.
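
A bare–bones Python sketch of that symbol table idea follows; the residual values are invented and this is not the entropy code MLP actually specifies, but it shows frequent values earning short codes:

    import heapq
    from collections import Counter

    def huffman_table(values):
        # Build a variable-length code: common values get short bit strings.
        heap = [(count, i, {v: ""}) for i, (v, count) in enumerate(Counter(values).items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            n1, _, t1 = heapq.heappop(heap)
            n2, i2, t2 = heapq.heappop(heap)
            merged = {v: "0" + code for v, code in t1.items()}
            merged.update({v: "1" + code for v, code in t2.items()})
            heapq.heappush(heap, (n1 + n2, i2, merged))
        return heap[0][2]

    residuals = [0, 0, 1, 0, -1, 0, 1, 2, 0, -1, 0, 0]
    table = huffman_table(residuals)
    coded_bits = sum(len(table[v]) for v in residuals)
    print(table, coded_bits, "bits vs.", len(residuals) * 24, "for raw 24 bit words")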

Dressler Interview

I asked Dolby’s Roger Dressler, Director of Tech Strategy, some specific questions regarding the technology. Here, along with my comments, is what he said…

[Surround Professional] In previous generation transmission methods, CRC was used for error detection. Dolby has said that the encoded MLP stream is “protected to an unusual degree against transmission errors.”

[Roger Dressler] “The CRC system used for PCM is the wrapping used to protect the data delivered on the disc itself. There is no error correction global to the process from the point of the encoder input to the decoder output. For example, PCM data is often altered, usually without (the engineer’s) knowledge, as it passes from one place to another. Low level bits may be truncated or the data may be resampled, even if the sample rate is not being changed…MLP data is self–checked such that it can verify that the final PCM being decoded from the MLP unpacker is identical with the data that was input to the MLP encoder. If the data has undergone any subtle changes along the way, the output stops. In other words, if the MLP decoder is outputting PCM, then it can be assumed to be the same you gave it. This assumption cannot be guaranteed for other PCM pathways.

“The encoder can also be programmed to provide different levels of robustness depending on the nature of the transmission medium. With DVD-Audio, the error rates are already vanishingly low, so there is not much need to use this option there.”

Speaking of protection, watermarking and encryption are not part of MLP. That would need to be taken care of elsewhere, but the MLP bitstream can carry watermarking and content provider information. Future high end consumer components should have a FireWire or 13W3/I2S path carrying encrypted 24 bit audio from the DVD physical transport mechanism to a separate D/A converter.

[SP] That method of confirming the signal chain is lossless, the check data: how is that implemented? Is it a simple scheme, such as the inclusion of parity bits?

[RD] “Multiple–level error checking is supported by including check words both at the “access unit” level and at the “substream” level. A low cost decoder may elect to omit intensive CRC checks on audio data in favor of simpler parity checks. In case an undetected error remains, the decoder is still prevented from making disturbing ‘bangs.’”

[SP] MLP has been “optimized for modern carriers.” Concerning modern carriers: does this euphemistically mean that MLP is optimized to reduce the bandwidth required to transport high sample rate, long word length PCM audio? Or are there specific features that address modern transmission methods, such as a more advanced bit stream structure or channel coding?

[RD] “It is referring to the delivery medium which will carry the MLP data. Modern carriers are usually block oriented transmission paths as opposed to continuous data as was analog audio. So, for example, MLP can be chopped or restarted with imperceptible delay before audio restarts. In another example, many carriers are “file based” and thus a key issue is file size. MLP allows for optimizing file size by using a VBR (variable bit rate) mode that saves the maximum data space. Whereas in electrical interfaces, MLP can switch to a CBR (constant bit rate) mode which is preferred for various reasons. MLP data can be transformed back and forth infinitely between VBR and CBR modes without re–coding.”

To perform the transformation from VBR to CBR and back, the data stream is simply padded with null data. The MLP process is inherently VBR. If the encoder is set to output in CBR mode, then the peak data rate becomes the CBR rate on output.
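
Conceptually, that transformation is as dull as it sounds. A hedged sketch, with made–up frame sizes and null bytes standing in for the real padding mechanism:

    def to_cbr(frames):
        # Pad every variable-size frame out to the peak size with null data.
        peak = max(len(f) for f in frames)
        sizes = [len(f) for f in frames]
        return [f + b"\x00" * (peak - len(f)) for f in frames], sizes

    def to_vbr(padded, sizes):
        # Strip the padding again; in a real stream each frame carries its
        # own length, so no side information would be needed.
        return [f[:n] for f, n in zip(padded, sizes)]

    vbr = [b"abc", b"de", b"fghi"]
    cbr, sizes = to_cbr(vbr)
    assert to_vbr(cbr, sizes) == vbr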

[SP] I would assume the encoders are always two pass…

[RD] “The CBR mode needs knowledge of the overall program in order to set the fixed rate, so yes, that would take more time. I do not see any reason why anyone would actually encode to a CBR stream unless they were trying to feed a particular electrical interface as the end product. In DVD authoring, that would not be the typical practice, as the data will be stored as a file on a hard disk. In any case, there will be a quick profile mode that estimates the encoder performance before the real encode pass is made. This can report the CBR rate, or the overall compression afforded by the VBR mode.”

Though encoder implementations have not escaped from the lab as this goes to press, it’s assumed that the operator will have a few options if the target data rate in lossless mode isn’t acceptably low. This should include high quality word length reduction and redithering of selected channels at specific moments in time during a performance.

[SP] I have been told that MLP’s downmix options are not the same as the SMART options described in the final draft of the DVD WG4…

[RD] “MLP can indeed follow all the same commands that SMART content affords the producer, so the end results would be identical whether MLP is used or not. But MLP also affords some additional flexibility in its downmixing tools. For example, in SMART content the downmix settings cannot vary more than once per song. With MLP the coefficients can vary continuously, up to 100 times per second, which could allow some “moves” during a song.”

I mentioned early on that MLP includes a provision for handling the audio data in a way that eases some of the problems associated with multichannel audio being decoded for two channel playback. Both encoder and decoder feature a lossless matrix which can recover a properly dithered, 2 channel mix without having to process all 6 channels. This matrix divides the audio input into 2 interleaved groups or substreams, taking into consideration the two classes of decoders: multichannel or stereo–only. Handling these substreams separately keeps stereo playback devices low in cost without degrading the engineer’s hard work. And, being lossless, it eliminates the uncertainties of AC-3’s lossy downmixing. For many conservative mixes, a parallel stereo track need not be included on a title just for the sake of stereo compatibility.
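
For the curious, the trick that lets an integer matrix be lossless can be sketched in a few lines: each step folds a quantized mix of the other channels into one channel, and the decoder undoes it by subtracting the identical quantized mix. The coefficients and channel assignments below are invented for illustration; MLP’s actual matrixing is richer than this toy:

    def matrix_step(frame, coeffs, target):
        # Add a quantized mix of other channels into the target channel.
        mix = sum(int(c * frame[ch]) for ch, c in coeffs.items())
        out = list(frame)
        out[target] += mix
        return out

    def unmatrix_step(frame, coeffs, target):
        # The mixed-in channels are untouched, so the same mix can be
        # recomputed and subtracted exactly.
        mix = sum(int(c * frame[ch]) for ch, c in coeffs.items())
        out = list(frame)
        out[target] -= mix
        return out

    frame = [120, -45, 300, 17, -8, 60]          # one sample across 6 channels
    coeffs = {2: 0.7, 3: 0.7}                    # fold channels 2 and 3 into 0
    assert unmatrix_step(matrix_step(frame, coeffs, 0), coeffs, 0) == frame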

Why was MLP chosen as the mandated compression scheme for DVD-A? As Dressler succinctly stated, “…because it worked as required on all tested materials and offered the highest compression.” Works for me and I, for one, am certainly looking forward to hearing this science at work in my production and listening environments.

My thanks to Roger Dressler and Jim Arnold at Dolby Laboratories, James Moorer at Sonic Solutions and Claude Cellier at Merging Technologies for their cooperation and indulgence.

Bio

Oliver Masciarotte is a new media craftsman living within a fog horn’s blast of the Golden Gate. Stop by http://seneschal.net and maybe pick up on something new…