Mix Magazine

This installment of The Bitstream column appeared in the September 2002 issue of Mix Magazine.

The Bitstream

This column discusses the MPEG-4 standard…

Alright Mr. DeMille, I’m Ready For My Close-up

This month’s issue is devoted to sound tagging along with picture and, glory hallelujah, we audio folks finally come out ahead! Pardon my exuberance and permit me to explain... Yesterday, I was strolling the aisles of the MPEG-4 Industry Forum’s Workshop and Exhibition and was amused by all the crawly, noisy video that was being demo’d. Ah, but the audio… the audio, though only mono or stereo, had exceptional quality considering the data rate. Welcome to the wonderful world of MP4, where audio finally steps to the front of the line.

Established back in 1988, the Moving Picture Experts Group (MPEG) was formed to specify digital coding schemes for audio and video at low data rates. Their most widely known creations are MPEG-2 video, sanctioned by the DVD Forum, and MPEG-1 Layer III, also known as mp3, the current fave of the download crowd. Over the next few years, mp3 audio will be dethroned by MPEG-4 AAC (Advanced Audio Coding), a codec originally developed to improve quality without the backward compatibility restrictions imposed on prior codecs. AAC was designed to provide quality that the majority of listeners find indistinguishable from a lossless parent file. The interesting thing is that the target data rate specified in the mandate was 320 kbits/second... for five full-bandwidth channels!

Though MPEG-2 AAC was developed a while ago, it was chosen as the basis for sampled or “natural” audio in MPEG-4. If you’ve done any listening tests on lossy codecs, you know that, at high rates, they all do an OK job and some actually sound quite good. However, at dial-up data rates, most codecs fall flat on their faces and that’s where some help is needed. Being a recovering mp3 basher, I’ve learned that the equation isn’t mp3 = the death of quality. Rather, as in all things in life, it’s a trade-off. The real questions are which codec you use, which sample and data rates you encode at and whether or not preprocessing is warranted. With the exception of preprocessing, MP4 provides a wide range of solutions to the low data rate dilemma.

MPEG-4 comprehensively describes methods for representing content, and the audio tools available to us are as varied as the range of distribution methods at our disposal. Building on the work of previous efforts, the MPEG-4 group has given us geeks more of everything. MP4 is more efficient, more scalable, more modular, more extensible and more cooperative.

• More Efficient - The music codec in MPEG-4 is designed to operate at around 64 kbit/second (kbps) per channel. To give you some real-world perspective on that number, mp3 stereo at 192k VBR (96k times 2) sounds pretty darn good to me and AAC at Main profile does about the same job at 128k…’nuff said. Save space when storing or save bandwidth when streaming, it’s your choice.

By the way, MP4 AAC can handle from one to 48 channels and includes both downmix capabilities and default channel configurations including 5.1 multichannel in Dolby’s Mode 6 (C|L|R|LS|RS|LFE) assignment. Having all those channels available means you can, for instance, do spiffy multimedia presentations with separate mix-minus, VO and effects tracks.
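For the spreadsheet-inclined, here’s a quick Python sketch of the arithmetic behind those data rates. The four-minute track length is my own assumption, and treating a 192k VBR file as a constant 192 kbps stream is a simplification, so read the results as ballpark figures rather than measurements.

```python
# Back-of-the-envelope storage math for the bit rates quoted in this column.
# The four-minute track length is an assumption chosen for illustration.

def per_channel_kbps(total_kbps, channels):
    """Average data rate available to each channel."""
    return total_kbps / channels

def file_size_megabytes(kbps, seconds):
    """Size of a constant-rate stream (1 kbit = 1000 bits, 1 MB = 1,000,000 bytes)."""
    bits = kbps * 1000 * seconds
    return bits / 8 / 1_000_000

TRACK_SECONDS = 4 * 60  # hypothetical four-minute song

aac_target = per_channel_kbps(320, 5)                # the original AAC mandate
mp3_size = file_size_megabytes(192, TRACK_SECONDS)   # mp3 stereo, treated as CBR
aac_size = file_size_megabytes(128, TRACK_SECONDS)   # AAC stereo
saving = 1 - aac_size / mp3_size

print(f"AAC mandate: {aac_target:.0f} kbps per channel")
print(f"mp3 at 192k: {mp3_size:.2f} MB")
print(f"AAC at 128k: {aac_size:.2f} MB")
print(f"Saving:      {saving:.0%}")
```

Under those assumptions the AAC file comes out about a third smaller, and the 320 kbps mandate works out to the same 64 kbps per channel mentioned above.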

• More Scalable - Because MPEG-4 audio allows for a wider range of bit rates, quality can be matched to a wider range of applications. In conjunction with the increased efficiency, applications such as transmission over wireless data networks, internet streaming, digital audio broadcasting and advanced portable players are made more practical.

Transmission over best-effort protocols like IP won’t cause buffer underflows, since the decoder adapts by simply scaling back on the quality, usually by reducing the audio passband, when it’s starved for data. Another available tool, usually used for speech transmission, provides the same kind of scalability. The TwinVQ coder is a good example: it encodes several partial bitstreams, which can be decoded alone or, if the sustained data throughput is high enough, in concert for higher fidelity.

• More Modular - MPEG-4’s “object oriented” approach to content delivery means optimal encoding for each data type. Content creators have a broad range of methods for coding audio though software vendors have yet to bring mature production tools to market. One of the new additions to the MPEG-4 audio toolbox, along with long-term prediction and bit-rate scalability tools, is Perceptual Noise Substitution or PNS, a feature designed to further optimize bitrate efficiency.

PNS is based on the observation that, perceptually, all noise sounds about the same. This means that the actual fine structure of a noise signal isn’t too important. Rather than coding that structure, the bitstream just signals that some region of frequencies is noise-like and supplies additional information defining the total power in that band. In the decoder, randomly generated noise is inserted into the appropriate spectral region and scaled to that power level.
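Here’s a toy Python sketch of that PNS idea, assuming a band is just a list of spectral coefficients. The function names are mine, and a real AAC codec works on MDCT scalefactor bands and has to decide which bands are noise-like in the first place, none of which is modeled here.

```python
import math
import random

# Toy sketch of the PNS idea. A band is a plain list of spectral
# coefficients; real AAC band handling and noise detection are not modeled.

def encode_noise_band(coefficients):
    """Encoder side: keep only the total power of a noise-like band."""
    return sum(c * c for c in coefficients)

def decode_noise_band(power, width, rng):
    """Decoder side: generate random noise, then scale it to the sent power."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(width)]
    noise_power = sum(n * n for n in noise)
    gain = math.sqrt(power / noise_power)
    return [n * gain for n in noise]

rng = random.Random(2002)  # fixed seed so the run is repeatable
original = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in noisy band

power = encode_noise_band(original)  # one number instead of 16 coefficients
rebuilt = decode_noise_band(power, len(original), rng)

print(f"sent power:    {power:.4f}")
print(f"rebuilt power: {encode_noise_band(rebuilt):.4f}")
```

The rebuilt band has different coefficients from the original but the same total power, which is the whole trick: one number crosses the wire instead of sixteen.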

• More Extensible - Not locked in to the limits of current technology, MPEG-4 can grow as new developments emerge. As the President of the MPEG-4IF says, “The object-based MPEG-4 standard is both state-of-the-art and future-proof; it can easily incorporate improvements in technology if and when they materialize.”

• More Cooperative - Sorry to break the news but MPEG-4, though a worldwide standard for audio, is more importantly a standard for video and multimedia. Though preliminary testing and my experience indicate that MPEG-4 won’t improve on existing proprietary video codecs like Sorenson, it produces a much better quality image than MPEG-1 and has the ISO stamp of approval to boot. That, in turn, will go a long way toward widespread market acceptance, as has been the case with MPEG-2.

Designed with interoperability in mind, MPEG-4 is meant to be wedded to MPEG-7, an emerging deep metadata standard for describing content. Together, they will work more graciously with DRM (Digital Rights Management) and interactive presentation infrastructures (DTV anyone?) as all this stuff matures. This will, in turn, reduce the FUD factor and confusion for consumers. As an example, MPEG-4 includes a set of standard interfaces to proprietary rights management systems. If you access protected content, the MPEG-4 bitstream should contain the information needed to obtain the correct unlocking software.

If you’re still awake about now, you may have noticed I snuck in the “sampled” qualifier back in paragraph three. The reason is that, along with so-called “t/f (time/frequency) coders” for music and speech, MPEG-4 audio also includes tools for synthesized audio among its data objects or types. MP4-SA or “Structured Audio” relies on the decoding infrastructure to algorithmically create synthetic programming from very compact instructions. If this sounds like MIDI, you’re not far off. MIDI and wavetable synthesis are also supported.

Well, there’s lots more to cover but that’s all for this overview. For those wanting to test the video capabilities of MPEG-4, DivX 5 has been available for a while. Though audio tools are a bit rarer, AudioCoding.com has some Win source code and a Winamp plugin. For cross-platform fun, QuickTime 6 should be out of beta so everyone can begin to hear the benefits of MPEG-4 audio. RealNetworks has also adopted a strategy of interoperability in an attempt to combat the balkanization of the web that Microsoft envisions. In a future Bitstream, I’ll dig into MPEG-7, the future of metadata, and MPEG-21, so stay tuned.

Bio

OMas provides tech help to a wide variety of media mavens. In his quieter moments, this column was decoded while under the influence of Nusrat Fateh Ali Khan’s Shahen-Shah along with the classic strains of (Who’s Afraid of) The Art Of Noise. Links and other useful arcana relating to Bitstream September are lurking at <www.seneschal.net>.

Pedant In A Box

This month’s buzzword is:

FUD

FUD, a TLA standing for Fear, Uncertainty & Doubt, is often attached to discussions of Microsoft’s penchant for triple-threat marketing. First, they diss competing technologies, such as open standards, in an effort to promote their own proprietary offerings. Then, they create uncertainty by hyping premature products so consumers postpone or question pending investments in alternative technologies. Doubt’s easy. By making their stuff magically run better on their platform and conveniently leaving out 3rd party software that a typical consumer might need, they reinforce the “safe” decision to buy from Microsoft.

A good example of FUD is Microsoft’s announcement of Corona, their answer to MPEG-4. Rather than playing with the other children, they continue to make up new rules when it suits them, assuming that the FUD factor will help fill their coffers. These days, corporate America is starting to wake up to the manipulation and is increasingly moving to less expensive, more stable Open Source software for file and print services, long a cash cow for the folks in Redmond. In response, Microsoft is now targeting the Third World in the hope of keeping their revenue engine turning over.