Mix Magazine

This first installment of The Bitstream column appeared in the March 2000 issue of Mix Magazine.

The Bitstream

This column discusses lossy codecs for low data rate delivery…

Small Things Come In Lossy Packages

[Editor’s Note: Welcome to “The Bitstream,” Mix’s new monthly column focused on computer-based audio technologies. Penned by noted media consultant Oliver Masciarotte, “The Bitstream” will cover a wide variety of topics ranging from DVD, the Internet, new computer platforms, storage and distribution issues, to networking, software and more, all with a futuristic, yet practical slant. If you have ideas, insights or input for future topics, contact us at mixeditorial@intertec.com. We’ll be listening.]

It's no surprise to most of us that, by adopting digital acquisition, processing and distribution, we've settled for audio that is audibly degraded relative to its analog antecedents. Emerging acquisition standards, such as high-resolution PCM and DSD, have brought back the easy listening of analog to digital audio. While discriminating engineers sweat the details during production, distributing the finished product is another matter entirely.

Distribution, until recently, dictated degraded quality. Take Justin the Freshman. He's quite happy listening to MP3's "near-CD quality" just as his forefathers were quite satisfied with their hideous 8-track tapes. Since today's distribution hot button is the net, something's gotta go when Justin’s lucky to have 56 kbps. Whether it's telecom, broadcast or optical delivery, there's usually not much bandwidth available for our audio data. We can't all afford xDSL just yet, but we can use some discrimination when the client asks us to “make it fit.”
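To put a number on "something's gotta go," here is a back-of-the-envelope sketch of how far CD-quality stereo PCM has to shrink to squeeze through Justin's modem. It assumes the modem actually delivers its nominal 56,000 bits per second, which real phone lines rarely manage, and the function name is purely illustrative.

```python
# CD-quality PCM: 44,100 samples/s x 16 bits x 2 channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2  # 1,411,200 bits per second


def reduction_ratio(channel_bps, source_bps=CD_BITS_PER_SECOND):
    """Return how many times smaller the stream must get to fit the channel."""
    return source_bps / channel_bps


# Assumption: an ideal line that sustains the modem's nominal rate.
MODEM_BPS = 56_000

ratio = reduction_ratio(MODEM_BPS)
print(f"CD stereo PCM: {CD_BITS_PER_SECOND / 1000:.1f} kbps")
print(f"Reduction needed for a 56k modem: about {ratio:.1f}:1")
```

Roughly 25:1, in other words, before any protocol overhead is counted. That is the gap a lossy codec has to close for real-time delivery.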

The search for ways to stuff audio through pitifully puny pipes started with short-word-length PCM, ADPCM and IMA, and advanced to the current quagmire of “standards,” MPEG-1 Layer III (MP3), RealSystem G2 and QuickTime 3 (QT3) — all of which share the ability to take PCM source files and squeeze out the inherently redundant data, resulting in a smaller file that usually sounds acceptable. “Sounds acceptable?” Now that’s being polite. But, all is not lost. Of late, corporate brain trusts have been coming up with new, widely deployed codecs that actually sound good!

Restricted carrying capacity certainly isn’t new. Several bandwidth-challenged standards you may recognize were born in the Analog Era: NTSC and PAL television, AM/FM radio and pre-digital Plain Ol’ Telephone Service (POTS). Those standards relied on peculiarities of the human perceptual system to deliver just enough information to convey a message over a channel with less than full bandwidth.

Current distribution channels for digital audio, from satellite TV to CD-ROM, are largely supporting lossy codecs for the same reason. Some examples:

DVD-V, along with LPCM, has optional support for DTS, MPEG and AC-3. DVD-V was the first distribution format with support for 96/24. An audiophile somewhere was persistent enough to improve upon the “perfect sound forever” of 44.1/16. Certainly a “professional” audio engineer wouldn't have suggested such a thing. Most are perfectly happy with the crappy audio quality that we hear every day from gear with “pro” labels. Thank the Gods that someone in the pro world listens to acoustic music once in a while, otherwise DVD-A and SACD wouldn’t have been proposed.

Cinema sound is brought to you via Dolby Digital (AC-3), Digital Theater Systems, Inc. (DTS) or Sony’s 8-channel Sony Dynamic Digital Sound (SDDS), which uses a professional version of the same Audio Transform Acoustic Coding (ATRAC) algorithm used by MiniDisc.

Digital radio services, known as Digital Audio Broadcast (DAB), use various schemes, including Lucent's Perceptual Audio Coder (PAC), ISO/MPEG Layer II, MPEG-4 or Musicam (Masking pattern adapted Universal Sub-band Integrated Coding And Multiplexing), the codec of choice for good ol’ ISDN phone patches.

North American digital television uses — you guessed it — AC-3. No wonder Dolby Labs has been busy hiring in its licensing division.

The hairball of standards that is the Web has its own collection. Microsoft's Windows Media format (WMA), the standard for that company's extensive Active Streaming Format (ASF), and Version 2 of the QDesign codec (QDMC) used in the fabulous, open source QuickTime 4 are examples of audio mechanisms for the most popular computer operating systems. The SDMI distribution system for secure music purchases supports WMA, MP3, MPEG-2 AAC and Lucent's Enhanced PAC (ePAC). Liquid Audio’s open, multiformat approach doesn’t play OS favorites. It supports AAC, AC-3 and MP3.

All of the codecs mentioned above are classed as perceptual coders. Though invented in the late 1980s at Bell Laboratories, these algorithms have continued to morph just as Bell Labs has morphed into Lucent after the breakup. Engineers at Dolby, QDesign, Lucent and the Fraunhofer Institute for Integrated Circuits (FhG), among others, have thrown every trick in their multidisciplinary book at the development of modern versions. The algorithms these companies developed, as in the analog days of yore, provide bit rate reduction without compromising perceived fidelity. The sidebar shows some features of the low-data-rate king, QDMC v2, and the high-rate winner, ePAC, along with AAC. ePAC is a highly refined dark horse with the power of Lucent behind it. MP3 is included for reference only as its old-school performance is considerably poorer than the other three.

Codec     Stereo or Multichan.   Sample & Bit Rate           Comments*
AAC       multichannel           96 k SR, 8-128 kbps/ch      royalties
ePAC      multichannel           44.1 k SR, 8-256 kbps/ch    HTTP & Real G2 streaming, IIPP
QDMC v2   multichannel           48 k SR, 8-128 kbps/ch      HTTP & RTSP streaming, IIPP
WMA       stereo                 48 k SR, 5-160 kbps/ch      HTTP & MMS streaming, IIPP
MP3       stereo                 ? SR, 32-320 kbps/ch        royalties


* - IIPP stands for integrated intellectual property protection mechanisms such as watermarking and encryption. Also, WMA is not a codec but is a framework for several specialized codecs.
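For a feel of what those per-channel figures mean on disk, the following rough sketch converts a kbps/ch rate into megabytes per minute of audio. It takes the top rate listed for each codec in the sidebar, treats every stream as two-channel stereo for the sake of comparison (an assumption, since three of the codecs also do multichannel), ignores container overhead, and uses decimal megabytes. The function name is illustrative only.

```python
# Upper per-channel bit rates as listed in the sidebar, in kbps.
SIDEBAR_MAX_KBPS_PER_CH = {
    "AAC": 128,
    "ePAC": 256,
    "QDMC v2": 128,
    "WMA": 160,
    "MP3": 320,
}

# One channel of 44.1 kHz / 16-bit PCM, for comparison.
CD_KBPS_PER_CH = 44.1 * 16  # 705.6 kbps


def megabytes_per_minute(kbps_per_channel, channels=2):
    """Size of one minute of audio in decimal megabytes (1 MB = 1,000,000 bytes)."""
    bits = kbps_per_channel * 1000 * channels * 60
    return bits / 8 / 1_000_000


print(f"{'CD PCM':8s} {megabytes_per_minute(CD_KBPS_PER_CH):6.2f} MB/min")
for codec, rate in SIDEBAR_MAX_KBPS_PER_CH.items():
    print(f"{codec:8s} {megabytes_per_minute(rate):6.2f} MB/min")
```

At 128 kbps/ch, a minute of stereo comes to about 1.9 MB against roughly 10.6 MB for the CD original, and the low end of each range is smaller still.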

How do perceptual encoders work? Let's look at AAC, as it's not cloaked in secrecy like QDMC and ePAC. Both ePAC and MPEG-2 Advanced Audio Coding incorporate algorithms from Lucent's PAC, dating from 1992. AAC is one of the codecs of the MPEG-2 standard and is a subset of MPEG-4, the new unified family of ISO standards for delivery of rich media. Major contributors included FhG, AT&T, Dolby and Sony.

As with other offerings, AAC employs perceptual subband/transform coding, whereby the signal is first transformed from the time domain into the frequency domain using a variable window or block length. The encoder then applies a psychoacoustic model to estimate whether, in any particular band of frequencies, the signal strength is above or below the perceptual threshold relative to the adjacent bands. If the signal is above the masking threshold, a spectral coefficient or value is generated to represent the signal in that band. Masking threshold means the amplitude threshold below which a spectral component will be "hidden" or masked by louder components at frequencies nearby. It's a brain thing, just go with it. Once all the valid coefficients are determined, AAC applies additional mechanisms to enhance coding efficiency, including: joint coding, which removes monaural redundancy in a stereo signal; temporal noise shaping prediction, which distributes quantization noise over time; and lossless Huffman coding.
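The transform-then-mask idea can be sketched in a few lines. This is a toy, not AAC: a plain DFT stands in for the codec's real filterbank, the block length is fixed rather than variable, and the masking numbers (OFFSET_DB, SLOPE_DB_PER_BIN, QUIET_FLOOR) are invented for illustration rather than taken from any psychoacoustic model. The test signal is a loud tone with a faint neighbor one bin away, plus a modest tone far up the spectrum.

```python
import cmath
import math

N = 64                    # block length in samples (toy value)
OFFSET_DB = 20.0          # hypothetical: a masker hides anything this far below it...
SLOPE_DB_PER_BIN = 10.0   # ...and its reach drops this much per bin of distance
QUIET_FLOOR = 1e-6        # hypothetical absolute threshold in quiet


def make_block():
    """Loud tone at bin 8, faint neighbor at bin 9, modest tone at bin 25."""
    tones = [(8, 1.0), (9, 0.001), (25, 0.05)]
    return [
        sum(amp * math.cos(2 * math.pi * k * n / N) for k, amp in tones)
        for n in range(N)
    ]


def spectrum(block):
    """Magnitudes of a plain DFT for bins 0..N/2 (stand-in for a real filterbank)."""
    size = len(block)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                for n, x in enumerate(block)))
        for k in range(size // 2 + 1)
    ]


def masking_threshold(mags, k):
    """Level below which bin k is hidden by the strongest nearby masker."""
    spread = max(
        m * 10 ** (-(OFFSET_DB + SLOPE_DB_PER_BIN * abs(k - j)) / 20)
        for j, m in enumerate(mags)
        if j != k
    )
    return max(spread, QUIET_FLOOR)


def audible_bins(mags):
    """Bins whose magnitude clears the masking threshold and so get coded."""
    return [k for k, m in enumerate(mags) if m > masking_threshold(mags, k)]


mags = spectrum(make_block())
kept = audible_bins(mags)
print("bins kept:", kept)
```

Running it prints `bins kept: [8, 25]`. The faint tone at bin 9 falls under the threshold cast by its loud neighbor and is simply never coded, while the distant tone at bin 25, though quieter than the masker, survives because nothing nearby covers it. A real encoder would go on to quantize and entropy-code the surviving coefficients.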

You can be sure that lossy codecs won’t be going away anytime soon, based on the proliferation of terrestrial (and satellite) distribution of rich media, the emergence of solid-state personal stereos and record labels, large and small, betting on a hybrid revenue model of on-site advertising with the enticement of free track downloads. So, when a project comes knocking that allows your input, do what you're paid for. Use your ears first, then make an educated choice from the new codec menu. To help you get a feel for which one is appropriate, I've posted sample files on my site for your evaluation. [In moving this and other Bitstream articles into the archive, I have taken the liberty of not including the audio files. If you’re interested in listening to the result of encoding with legacy codecs, contact me.]

Though I've written for Mix as far back as 1987, “The Bitstream” is my first column for this magazine. I’ll try to provoke some thought about the technological foundation for our industry and answer questions you may or should have. The conflux of audio and computers forms the basis for discussion and there's plenty out there to cover. I'll also, on occasion, wander into other subject areas while eluding my editors…

Bio

Oliver Masciarotte is an audio craftsmensch and new-media consultant. He recently returned from Japan where he worked to bring up a new, integrated DVD-A/V installation for Prime Mix/Tokyo and the HMC in Fukuoka.