# Mix Magazine

This installment of The Bitstream column appeared in the February 2003 issue of Mix Magazine.

## The Bitstream

This column discusses fixed versus floating point arithmetic in digital signal processing…

### Randall, Them’s Fightin’ Words!

This month, I’m delving into a topic that generates its share of enmity in the two audio camps known as the Fixed and the Floats. Like the prodigal McCoys and Hatfields, these fellers been mixin’ it up since Real Engineers wore ’coon skin caps, and nobody’s yet come out the winner. I’ve got friends, and you may be one of them, that are adamant about what flavor of arithmetic is used in their DAW. While there are circumstances where it’s more appropriate to employ fixed or floating point, it’s a bit like arguing whether bidirectional mics are better than cardioid: they’re two different beasts. In and of itself, one is neither more accurate nor prettier than the other, just different, but that doesn’t seem to matter to those folk who are feudin’.

Let’s start with one aspect inherent in this discussion: the word length war or 24 vs 32. In audio circles, “24 bits” refers to 24 bit fixed point or integer arithmetic while “32 bit” generally refers to 32 bit, floating point arithmetic. Before I lay out the clear, crystalline lines of my argument, we must, perforce, digress unless you remember your high school math from daze gone by…Check “The Fixed & The Floats” below for a brief rehash if need be.

All digital audio systems are, at heart, little silicon math majors. AES/EBU PCM data usually starts life as a sampled representation of some acoustical or electrical event and is stored as a 24 bit data word. Thus, AES/EBU audio is 24 bit, fixed point data by definition. Yet many hardware DAWs use 32 bit, floating point arithmetic to process what was once your AES/EBU data. Questions arise here: First, since 32 is bigger than 24, are 32 bits better? Also, what happens when you convert a fixed point sample to its floating point equivalent? Well, my opinion is no and not much, but read on, then decide for yourself…
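For the curious, here’s a minimal Python sketch of that “not much” answer. It assumes the 32 bit float in question is an IEEE 754 single precision value, which the column doesn’t actually specify, and the helper names `to_float32` and `round_trips` are made up for illustration. It scales a 24 bit integer sample into the range -1.0 to +1.0, squeezes it into a 32 bit float, then converts it back to see whether anything changed.

```python
import struct

FULL_SCALE = 1 << 23  # 24-bit two's complement spans -2**23 .. 2**23 - 1

def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_trips(sample):
    """True if a 24-bit integer sample survives fixed -> float32 -> fixed."""
    as_float = to_float32(sample / FULL_SCALE)   # scale into -1.0 .. +1.0
    return int(as_float * FULL_SCALE) == sample

# Spot-check the extremes plus a few values near zero and mid-scale.
samples = [-FULL_SCALE, -FULL_SCALE + 1, -1, 0, 1, 12345, 4194303, FULL_SCALE - 1]
print(all(round_trips(s) for s in samples))  # True
```

Under that assumption the conversion by itself is lossless; any trouble comes later, when the processing produces results that need more bits than either format can hold.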

Strictly speaking, there is no difference between expressing a number, in our case an audio sample, as either fixed or floating point. Given sufficient precision, they are equivalent but therein lies the rub. I’ll dig into the subject of sufficient precision in a future column but, for now, let’s stick with the 24 vs 32 bit discussion… In audio circles, 24 bits are the AES mandated word length, so some products use 24 bit, fixed point number crunching. On the other hand, floating point arithmetic lends itself to simple digital signal processes like gain change and mixing, so 32 bit floating point processing is commonly used throughout the audio industry. In these cases, the 24 bit fixed standard is equivalent to the 24 bit mantissa plus 8 bit exponent used in the 32 bit, floating point version.
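To make the “24 plus 8” bookkeeping concrete, here’s a small sketch that pulls a 32 bit float apart into its fields. It assumes the IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 stored fraction bits with a leading 1 implied), which is my assumption rather than something the column spells out, and `float32_fields` is a hypothetical helper name.

```python
import struct

def float32_fields(x):
    """Split x, stored as a 32-bit float, into (sign, exponent, fraction) bit fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 stored bits; a leading 1 is implied
    return sign, exponent, fraction

for value in (1.0, 0.5, -0.75):
    sign, exponent, fraction = float32_fields(value)
    print(f"{value:6}  sign={sign}  exponent={exponent - 127:+d}  fraction={fraction:023b}")
```

Count the sign bit along with the 23 stored fraction bits and you get the 24 bits that line up with a 24 bit fixed point word; the remaining 8 bits only say where the point sits.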

There’s a saying that the devil’s in the details and low level detail is what many engineers work very hard to preserve. A long time proponent of fixed point arithmetic is James Moorer, former tech chieftain at Sonic Solutions, now with Adobe Systems. In an AES paper discussing the advantages of double precision fixed point versus single precision floating point DSP (Digital Signal Processing), he states that “…there is an advantage to using integer arithmetic in general, in that most integer (24 bit fixed point) arithmetic units have very wide accumulators, such as 48 to 56 bits, whereas 32-bit floating point arithmetic units generally have only 24 or 32 bits of mantissa precision in the accumulator. This can lead to serious signal degradation, especially with low frequency filters.” The signal degradation mentioned translates into 3 or 4 bits of precision, by the way. Not much in the grand scheme but when multiple operations are performed on a signal, these small errors can quickly add up. The accumulator mentioned above is a temporary memory location or register that stores the result of an addition or multiplication operation. Any bits that don’t fit in the accumulator must be thrown out, usually via a rounding operation.
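Here’s a deliberately exaggerated toy, not a model of any real chip, that shows how an accumulator’s width matters. It assumes IEEE 754 single precision for the float side and a made-up fixed point format with 47 fractional bits for the wide side; Python’s unbounded integers stand in for the wide accumulator, and the function names are hypothetical. The same tiny value is added 1,024 times to a running total of 1.0, once with rounding to 32 bit float after every add and once in the wide fixed point format.

```python
import struct

def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate_float32(start, increment, count):
    """Add increment to start count times, rounding to 32-bit float after each add."""
    acc = to_float32(start)
    for _ in range(count):
        acc = to_float32(acc + increment)
    return acc

def accumulate_wide(start, increment, count, frac_bits=47):
    """Same sum in fixed point with frac_bits fractional bits; nothing is rounded here."""
    acc = int(start * (1 << frac_bits))
    step = int(increment * (1 << frac_bits))
    for _ in range(count):
        acc += step
    return acc / (1 << frac_bits)

tiny = 2.0 ** -25  # smaller than half a float32 step near 1.0
print(accumulate_float32(1.0, tiny, 1024))  # 1.0: every add rounds straight back
print(accumulate_wide(1.0, tiny, 1024))     # 1.0 + 2**-15: the small stuff survives
```

Real filters don’t behave this badly on every sample, but the mechanism is the same one Moorer describes: bits that don’t fit in the accumulator get rounded away, and the loss repeats on every operation.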

Moorer is talking specifically about DSP implementations, that is, the manner in which integrated circuit designers choose to build their chip–level products. In this case, he’s referring to the Motorola 56k family of DSPs, 24 bit fixed point machines with 56 bit accumulators. A common choice for floating point DSP is Analog Devices’ SHARC family, which is a 32 or 40 bit floating point device with a 32 bit mantissa accumulator.

The 56k and SHARC are two common hardware examples but host–based, software–only DAWs largely use the CPU’s built–in, fixed or floating point processing. Since personal computers are general purpose devices, they can perform most any arithmetic operation they are called upon to do, though it may not happen as quickly as a purpose–built, hardware device. By the way, SHARCs have some interesting register features but, for simplicity, I’m gonna skip their trick stuff and stick with the basic concept.

So, the bottom line here is first, carry “enough” significant digits from one DSP operation to the next. Second, when you have to throw out extra “low order” bits, do so sensibly so residual low amplitude information will not be lost. Finally, when it comes time to down–rez that 24 or 32 bit master to a 16 bit consumer format, redither it carefully. If done properly, the conversion from a long word length file to a shorter word length distribution master will carry most of that quiet information even though the “extra” bits are gone, but that too is a subject for a future column.
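As a taste of what “sensibly” can mean, here’s a rough sketch comparing plain truncation with dithered requantization when going from 24 to 16 bits. The dither used is TPDF (two uniform random values summed, peaking at plus or minus one 16 bit LSB), a common textbook choice that I’m assuming for illustration; the column doesn’t prescribe a method, and the function names and the seed are made up.

```python
import math
import random

STEP = 1 << 8  # one 16-bit LSB measured in 24-bit LSBs

def truncate_to_16(sample24):
    """Requantize by simply dropping the low 8 bits."""
    return sample24 >> 8

def dither_to_16(sample24, rng):
    """Requantize with TPDF dither: add triangular noise, then round to nearest."""
    dither = rng.random() - rng.random()   # triangular, between -1 and +1 LSB
    return math.floor(sample24 / STEP + dither + 0.5)

rng = random.Random(2003)   # fixed seed so the run is repeatable
quiet = 64                  # a steady level of 0.25 of a 16-bit LSB
n = 20000
print(sum(truncate_to_16(quiet) for _ in range(n)) / n)     # 0.0: the detail is gone
print(sum(dither_to_16(quiet, rng) for _ in range(n)) / n)  # hovers near 0.25
```

The dithered output is noisier sample by sample, but its average still tracks the quiet signal that truncation wipes out completely, which is the sense in which the information outlives the “extra” bits.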

A third question you may have considered is why designers choose one processor architecture over another. I’m not sure I can answer that adequately but methinks it has to do mostly with parts cost and programming complexity. An example is that SHARC family, which has less than stellar “development tools,” as programming aids are called, but is inexpensive and easy to hook together when an application calls for many DSPs. Hence their seeming ubiquity in low cost digital audio gear or where gazillions are needed, as in a digital desk or mixing console. Also, once a DSP choice has been made, the corporate culture tends to discount other architectures due to familiarity and a wealth of in–house wisdom about the chosen part.

Through all of this, realize that microphone choice and placement, which preamp and converter you use, gain staging, signal path and circuit topologies along with redithering choices usually have far more effect on the final sound than the arithmetic used in any professional DSP product. Also, I feel that all this fussing is moot if you’re working on pop music with no dynamic range and way too much processing. However, once an analog signal is sampled, then quality issues are dictated by, among other things, subtle product design tradeoffs, including how “excess” data is handled. So, the 24 versus 32 argument really comes down to implementation, either in hardware or software. If your gear “does the math” carefully, that is, performs the DSP in a conservative way, then it will produce a higher quality result and that, my friend, is the crux of the biscuit.

#### Bio

This column was written while under the influence of reruns of Buffy, the Musical and the cool jazz grooves of Stan Getz’s Focus. For links to DAW manufacturers, both fixed and floats, along with an archive of Bitstreams past, head on over to www.seneschal.net and top off that brain pan.

#### Sidebar

##### The Fixed & The Floats

Computers can perform their computations in one of two ways: Either the math is fixed point, as you or I would do in long hand arithmetic, or it’s floating point, what in high school is called scientific notation. Fixed point notation is a method of expressing a value by having an arbitrary number of digits before and/or after a decimal point. 0.0079, 3.1415 and 8,654.63 are all fixed point expressions. Floating point takes another tack by using a “mantissa” and “exponent.” The mantissa provides the significant information or digits and the exponent provides a scaling factor, to say how big is the number. Take a look at the table below for some examples…

Notice the floating point versions have a single digit, then a decimal point, then the rest of the significant digits. Also, grok that any nonzero number raised to the 0th power equals 1, so multiplying anything by 10⁰ is the same as multiplying by 1, which means the value doesn’t change. So, 75 times 10⁰ equals itself, 75. Finally, notice that the exponent, or “power” to which the number 10 is raised, is equal to the number of places that the decimal point has to move to get back to the fixed point version: positive values move the decimal point to the right and negative values move the decimal point to the left. By the way, scientific notation is a geekspeak way of writing a floating point number in a compact way, with “EE” standing in for “times ten to the power of.”
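If you’d rather let the computer do the decimal point shuffling, here’s a short sketch using Python’s standard `decimal` module. The function name `to_scientific` is made up for illustration; it splits a fixed point value into a mantissa with a single leading digit and a power of ten.

```python
from decimal import Decimal

def to_scientific(text):
    """Return (mantissa, exponent) so that mantissa * 10**exponent equals the input."""
    value = Decimal(text)
    exponent = value.adjusted()          # power of ten of the leading digit
    mantissa = value.scaleb(-exponent)   # slide the decimal point next to that digit
    return mantissa, exponent

for text in ("0.0079", "3.1415", "8654.63", "75"):
    mantissa, exponent = to_scientific(text)
    print(f"{text:>8} = {mantissa} x 10^{exponent}")
```

Note that this normalized form writes 75 as 7.5 x 10^1, which is the same value as 75 x 10^0 with the decimal point parked one place to the left.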

#### Additional Reading

##### 48-Bit Integer Processing Beats 32-Bit Floating-Point for Professional Audio Applications

This paper was presented at the 107th AES Convention, September 24-27, 1999 (AES Preprint Number 5038) and is downloadable from Sonic Studio.