Mix Magazine

This installment of The Bitstream column appeared in the October 2003 issue of Mix Magazine.

The Bitstream

This column continues our discussion of the introduction of 64 bit CPUs for desktop use…

Longer Is Better!

Last month, we started our look at the subject of chips, clocks, word length and DAW power. This month, I’m continuing that meander by examining an impending tipping point in desktop computers. Those of you who use host–based DAWs are subject to the limits of your CPU(s), so the introduction of several new consumer processors will change the way you do your work.

Let’s digress a bit so we can get our bearings… If you remember, last month I looked at the inevitable progression from 4 to 8, then 16 bits, and on up to the current crop of 32 bit CPUs or Central Processing Units, the heart of any computer–based product. During this same evolution, Intel has increasingly hyped what the folks at Tom’s Hardware refer to as “its self-perpetuated myth that processor performance is based on clock-speed alone.” Now, with clock speeds reaching the limits of current technology, a feature that mainframes and scientific workstations have long enjoyed has started to make an impact on the Mac and Windows desktop as well. That spiff, processors that crunch true 64 bit data words, is breaking out of the chip foundry and onto your desktop.

To keep the heavy duty, “enterprise–class” IT customers happy and to provide something for mere mortals to lust after, the x86 chip vendors began the migration from 32 bit to 64 bit processors several years ago. At present, high end Windows users are in the middle of a marketing tug of war between Intel/HP and AMD, and I’d place my money on AMD. Here’s why: Intel and Hewlett-Packard’s new 64 bit Itanium processor family, the Merced and McKinley chips, and their IA-64 architecture are designed as a clean break from the past. Legacy “x86” code written for the 8, 16 and 32 bit range of past processors runs in emulation on an Itanium, making overall performance for legacy software relatively poor. “Relative” translates into slower than the current range of 32 bit CPUs. According to eWeek’s Technology Editor, Peter Coffee, “Intel is betting that on-chip instruction scheduling hardware, which emerged on x86 chips in the late 1990s to inject new life into 1980s-style code, is nearing its limit. With the Itanium, Intel proposes to examine programs when they are compiled into their executable form and encode concurrent operations ahead of time. Intel calls this approach EPIC, for Explicitly Parallel Instruction Computing, and it is the genuine difference between the Itanium and AMD’s x86-64.” Trouble is, EPIC is hobbled by weak backward compatibility for 32 bit code, making the Itanium a slowpoke in that regard.
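To get a feel for what “explicitly parallel” means in practice, here’s a minimal, hypothetical C sketch (it illustrates the idea behind EPIC, not Intel’s actual instruction encoding): the second version of this dot product breaks one long chain of dependent additions into four independent accumulators, handing the scheduler, whether Itanium’s compile–time variety or an x86 chip’s on–the–fly hardware, operations it can actually overlap.

    /* Hypothetical sketch of exposing parallel work to the scheduler;
       the idea behind EPIC, not the EPIC encoding itself. */

    /* Version 1: every addition depends on the previous one, so deep
       pipelines mostly sit around waiting on that single chain. */
    double dot_serial(const double *a, const double *b, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    /* Version 2: four independent accumulators state, right in the
       source, that these operations don't depend on one another, so
       the hardware (or an EPIC-style compiler) can overlap them. */
    double dot_unrolled(const double *a, const double *b, int n)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < n; i++)          /* mop up any leftover samples */
            s0 += a[i] * b[i];
        return (s0 + s1) + (s2 + s3);
    }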

Meanwhile, Advanced Micro Devices has seen fit to build legacy support, or backward compatibility, into their AMD64 technology, extending the Intel x86 instruction set to handle 64 bit memory addresses and integer data while providing a continuous upgrade path as applications are rewritten or recompiled for the new capabilities of their 64 bit chips. Dirk Meyer, Senior VP at AMD, stated that AMD “designed its AMD Opteron and upcoming (it was due last month) AMD Athlon 64 processors to deliver quick and measurable returns on investment with low total costs of development and ownership; protect investments in the existing 32 bit computing infrastructure and limit the costs of transition disruption by transparently mixing 32 bit and 64 bit applications on the same platform; and simplify migration paths and strategies, allowing customers to choose when and how to transition to 64 bit computing.” In a word: value. Right on, sez I!
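As a hypothetical, audio–flavored illustration of what native 64 bit integers buy you (my numbers, not AMD’s): a running sample–position counter at 96 kHz wraps a 32 bit integer after roughly 12.4 hours, while the 64 bit math, a single native operation on an AMD64 or G5 but a multi–instruction emulation on a 32 bit x86, has headroom to spare.

    /* Hypothetical sketch: a running sample counter in an audio app.
       At 96 kHz, a 32 bit counter wraps after about 12.4 hours
       (2^32 / 96000 seconds); a 64 bit counter effectively never does. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int       pos32 = 0;   /* 32 bit sample position */
        unsigned long long pos64 = 0;   /* 64 bit sample position */

        const unsigned long long rate    = 96000;            /* samples/sec */
        const unsigned long long elapsed = 13ULL * 60 * 60;  /* 13 hours */

        pos32 += (unsigned int)(rate * elapsed);  /* silently wraps past 2^32 */
        pos64 += rate * elapsed;                  /* plenty of headroom left  */

        printf("32 bit counter: %u samples\n", pos32);
        printf("64 bit counter: %llu samples\n", pos64);
        return 0;
    }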

This nod to the real world needs of customers is something that I, for one, appreciate. I like the fact that, when possible, new and improved doesn’t necessitate heaving out your existing stuff. As I mentioned last time, the PowerPC Alliance made the same sensible choice when they built 64 bit compatibility into their family of processors. Alas, the PPC Alliance dissolved in 1998 when Motorola assumed control of the PowerPC chip design center in Austin, Texas. IBM continued to develop PPC chips for its own uses and that effort has resulted in the latest member of the POWER family, the 970. Announced at last October’s Microprocessor Forum, the fifth generation “G5” is, like the Itanium and Opteron, a true 64 bit machine, with support for 64 bit integer arithmetic versus 32 bit for the G4, two double precision floating point units (FPUs) versus one for the G4, and a 128 bit AltiVec vector processor. The G5 has, according to Apple’s developer web site, a “massive out-of-order execution engine, able to keep more than 200 instructions in flight versus 16 for the G4.”

A much longer execution pipeline, up to 23 stages versus 7 for the G4, means that bogus branch predictions are more costly: the deeper the pipeline, the more in-flight work a wrong guess throws away. Branch prediction, address prediction…the whole PPC vs Intel debate, in a way, boils down to prediction and how designers go about auguring upcoming processing requests. Here’s why: CPUs are designed to execute or process instructions in a predictable order. Think of a modern factory, with parallel production lines all building subassemblies that are merged into a finished commodity. The output of one assembly line feeds the input to another. Once all the subassembly lines are filled, you have an efficient manufacturing engine. In the world of CPU design, the assembly lines are called pipelines and, once all parallel pipelines are filled, an efficient data processing engine chugs along. By the way, Intel’s EPIC is their answer to efficient parallel execution: keep the pipelines full with nary a bubble in sight.

In a factory, if a part is missing from one assembly, it holds up all other lines dependent on the output of the suspended line. In a CPU, if the correct datum isn’t available for processing in any pipeline, it causes a discontinuity in the efficient use of the pipeline’s program execution capabilities. That discontinuity, or “bubble,” happens whenever the task scheduling manager screws up in predicting what “part” or piece of data is needed at the input to the pipeline. It’s as if the purchasing manager in a factory didn’t order a crucial widget to build a subassembly. The lack of that widget shuts down the whole factory. So, predicting the future with that out-of-order execution engine mentioned above is crucial to the smooth running of any modern processor.
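To make the prediction problem concrete, here’s a small, hypothetical C sketch: on random audio samples, the “is this value above the threshold?” branch is essentially a coin flip, so the predictor keeps guessing wrong and the pipeline keeps draining; the second version does the same bookkeeping with no branch at all, leaving nothing to mispredict.

    /* Hypothetical sketch of a branch the predictor hates: on random
       samples, "above the threshold?" flips unpredictably, and every
       wrong guess flushes the pipeline (a "bubble"). */
    long count_loud_samples(const short *samples, long n, short threshold)
    {
        long loud = 0;
        for (long i = 0; i < n; i++) {
            if (samples[i] > threshold)   /* unpredictable on random data */
                loud++;
        }
        return loud;
    }

    /* The same work with no branch at all: the comparison result
       (0 or 1) is simply added, so there's nothing to mispredict. */
    long count_loud_branchless(const short *samples, long n, short threshold)
    {
        long loud = 0;
        for (long i = 0; i < n; i++)
            loud += (samples[i] > threshold);
        return loud;
    }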

64 bits…now what, you may ask, does that buy you? Well, how about the ability to address more than the 4 GB of RAM that a 32 bit processor tops out at, and to practically manage more than 2 GB? Actually, a 64 bit address space works out to millions of Terabytes. Far-fetched, you say? Not really, when you consider that, nowadays, 1 GB of DDR PC3200 RAM will cost you only $190 and many applications will happily use as much RAM as they can steal. With virtual memory, more RAM means less disk swapping, which results in significantly better overall performance.
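For the curious, here’s a minimal, hypothetical C sketch of where that 4 GB wall comes from: on a 32 bit build, pointers and size_t are only 32 bits wide, so a 6 GB buffer can’t even be requested; recompile the very same source for a 64 bit processor and operating system and the request at least becomes possible, given enough RAM and a willing OS.

    /* Hypothetical sketch: the same source, built 32 bit versus 64 bit.
       Assumes a C99 compiler; whether the malloc succeeds also depends
       on the OS and on how much memory is actually around. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    int main(void)
    {
        /* 4 bytes on a 32 bit build, 8 bytes on a 64 bit build */
        printf("pointer width: %u bytes\n", (unsigned)sizeof(void *));

        /* 6 GB of sample data: impossible to even request in a 32 bit
           address space, merely ambitious on a 64 bit machine. */
        unsigned long long want = 6ULL * 1024 * 1024 * 1024;
        void *buf = NULL;
        if (want <= SIZE_MAX)            /* can size_t even express it? */
            buf = malloc((size_t)want);

        printf("6 GB allocation: %s\n", buf ? "got it" : "no dice");
        free(buf);
        return 0;
    }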

Another benefit is that these new 64 bit puppies are designed explicitly for SMP configurations. SMP, or Symmetrical Multiprocessing, is one of several design approaches that allow more than one CPU to share the computing load, divvying up responsibilities among the processors. “Two–way,” or two CPU, configurations are typical for high end desktops, which means that one CPU can be handling all the UI, networking and other mundane tasks while the second CPU concentrates solely on your media application’s needs.
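Here’s a bare–bones, hypothetical POSIX threads sketch of the idea (the DSP and housekeeping work is represented by placeholder comments, not any vendor’s actual API): on a two way box, the operating system is free to park the number–crunching thread on one CPU while the main thread putters along on the other.

    /* Hypothetical two thread sketch using POSIX threads; on a two way
       SMP machine the OS can schedule these on separate CPUs. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile int running = 1;

    static void *dsp_worker(void *arg)
    {
        (void)arg;
        while (running) {
            /* placeholder for the heavy lifting: EQ, reverb, mixing... */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t dsp_thread;
        pthread_create(&dsp_thread, NULL, dsp_worker, NULL);

        for (int i = 0; i < 5; i++) {
            /* placeholder for UI redraws, metering, disk and network chores */
            sleep(1);
        }

        running = 0;                      /* ask the worker to wind down */
        pthread_join(dsp_thread, NULL);   /* and wait until it does */
        puts("session closed");
        return 0;
    }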

A third, though indirect, advantage is that 64 bits facilitate more widespread double precision data handling which, in turn, means better quality for your data “product.” For many media moguls, quality appears to be one of the last things on their minds but, for myself and one or two other engineers out there, delivering a realistic acoustic performance to the consumer is an important consideration, and double precision processing really helps.
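A tiny, hypothetical illustration of why those extra mantissa bits matter: sum a long run of small values in a single precision accumulator and the rounding error piles up; a double precision accumulator barely notices.

    /* Hypothetical sketch: accumulating many small values, as a mix bus
       does. The float accumulator runs out of mantissa; the double doesn't. */
    #include <stdio.h>

    int main(void)
    {
        const long n = 10 * 1000 * 1000;   /* ten million additions */
        float  sum_single = 0.0f;
        double sum_double = 0.0;

        for (long i = 0; i < n; i++) {
            sum_single += 0.1f;            /* 24 bit mantissa: error piles up */
            sum_double += 0.1f;            /* 53 bit mantissa: error stays tiny */
        }

        /* the true total is 1,000,000; the double stays right on it,
           while the single precision sum drifts visibly off the mark */
        printf("single precision: %f\n", sum_single);
        printf("double precision: %f\n", sum_double);
        return 0;
    }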

To be realistic, though, 64 bit processors won’t buy us squat until software vendors drink the 64 bit Kool-Aid as well. Unless your favorite application is rewritten or, at the very least, recompiled to take advantage of these next-gen processors, you won’t see any improvements. Even worse, under some circumstances, you may actually experience crappier overall performance due to your 32 bit application running in “compatibility” mode, essentially emulation, on a 64 bit Itanium. Since both the Opteron and PowerPC families were designed with transparent, low level compatibility for “legacy” or 32 bit applications, they’ll run your old school stuff just fine, thank you very much.

Another way of looking at all this 64 bit hoohah is that RISC and CISC architectures, once clear and polar opposites, are now both moving toward common ground. The distinctions are increasingly blurry and, while science and industry march forward, it may take a few years before the likes of Digi get around to rewriting their stuff for G5s, Opterons and Itaniums. In the meantime, more agile and customer oriented concerns will get right on the stick, providing 64 bit–optimized versions of your favorite software. So, save your Euros for that inevitable upgrade since longer really is better!

Bio

OMas has recently taken many an audio geek across the Divide of Confusion to the blissful Land of OS X Understanding. This column was brewed while under the influence of Madredeus’ Electronico and, in keeping with the electronica slant, Björk’s Greatest Hits.

Pedant In A Box

This month’s timely technobabble includes the phrase “Virtual Memory”…

Virtual Memory

Virtual Memory (VM) is a standard method of using slow hard disk space as a substitute for fast solid state memory, typically Random Access Memory or RAM. Both Mac OS and Windows use virtual memory to optimize RAM usage. In Ye Olde Days, hard disks were far less expensive than RAM, so VM was a viable option for cash–poor, time–rich folks who couldn’t afford a boat load of RAM. You’d have to be time–rich since reading and writing data on rotating media like a hard disk is orders of magnitude slower than accessing RAM.

When an operating system decides that memory is getting tight, it takes the oldest data in RAM and “pages” it out to disk until it’s needed again. If the data is later required to complete some operation, it’s read from disk back into RAM and then used. This “swapping” of data between RAM and disk takes, to a CPU operating at several GHz, what appears to be an inordinately large amount of time. Hence, the slowdown associated with the use of VM. Moral of this story: the more RAM you have, the less swapping happens and the faster your computer will be.
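For the terminally curious, here’s a hypothetical sketch for a Unix–ish system, where getrusage() is available, that watches paging from a program’s point of view: “minor” faults are cheap RAM bookkeeping, while “major” faults are the expensive trips out to disk that pile up once the OS starts swapping.

    /* Hypothetical sketch for a Unix-ish system: getrusage() reports how
       many page faults this process has taken. The counts you see depend
       entirely on how much free RAM the machine has at the time. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>

    int main(void)
    {
        const size_t big = 512UL * 1024 * 1024;   /* half a gigabyte */
        char *buf = malloc(big);
        if (!buf) return 1;

        struct rusage before, after;
        getrusage(RUSAGE_SELF, &before);
        memset(buf, 0, big);                      /* touch every page */
        getrusage(RUSAGE_SELF, &after);

        printf("minor faults (RAM bookkeeping): %ld\n",
               after.ru_minflt - before.ru_minflt);
        printf("major faults (trips to disk):   %ld\n",
               after.ru_majflt - before.ru_majflt);
        /* with plenty of free RAM the major count stays at or near zero;
           on a memory-starved box it climbs, and so does your wait */
        free(buf);
        return 0;
    }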