Transmeta's Crusoe, HotRod or Performance Hog?
Dec 25, 2000, 17:08
By Sander Sassen, HardwareCentral
Upon introduction Transmeta's Crusoe processor has generated much interest with its promises of strong performance and long battery life. This, in combination with announcements from well-known manufacturers that they will incorporate Transmeta's processors in their notebooks, has led many people to believe that we'll see substantial improvements in price/performance levels for these devices.
In actuality, Transmeta has developed a whole new approach to microprocessor design, not just another processor. Conventionally, a processor's entire instruction set is implemented in hardware (an x86 processor such as the Intel Pentium III, for example), and software is then written specifically for that instruction set. Transmeta chose to do it differently: rather than implementing the entire x86 instruction set in hardware, the Crusoe processor consists of a compact hardware engine surrounded by a software layer.
The hardware component is a very simple, high-performance, low-power VLIW (Very Long Instruction Word) engine with an instruction set that bears no resemblance to that of x86 processors. Instead, it is the surrounding software layer that gives programs the impression that they are running on x86 hardware. This innovative software layer is called the Code Morphing software because it dynamically translates or rather 'morphs' x86 instructions into the hardware engine's native instruction set.
This unique approach to executing x86 code eliminates millions of transistors, replacing them with software. For example, the current implementation of the Crusoe processor uses roughly one-quarter of the logic transistors required for an all-hardware design of similar complexity. Apart from the obvious reduction in cost, this approach has the following benefits:
The hardware component is considerably smaller, faster, and more power efficient than conventional processors. The hardware is fully decoupled from the x86 instruction set architecture, enabling Transmeta's engineers to take advantage of the latest and best in hardware design trends without affecting legacy software.
The Code Morphing software can evolve separately from hardware. This means that upgrades to the software portion of the microprocessor can be rolled out independently of hardware chip revisions. Transmeta's Code Morphing technology is obviously not limited to x86 implementations.
However, software routines usually aren't as fast as a hardware solution, even when coded in assembly, the processor's native programming language. In the following pages we hope to give you some insight into the Crusoe's workings and an analysis of its prospective performance.
Code Morphing
The Code Morphing software is fundamentally a dynamic translation system: a program that compiles instructions for one instruction-set architecture (here, x86) into instructions for another (the hardware engine's native VLIW instruction set).
The entire Code Morphing software is implemented in flash ROM and is the first program to start on boot. Therefore the Crusoe processor, running its Code Morphing software, is indistinguishable from a 'normal' x86 processor, as the only thing any x86 code sees is the Code Morphing software--acting and operating just like any other x86 processor. The only program written especially for the Crusoe's VLIW engine is the Code Morphing software itself.
Because of the Code Morphing software, all x86 software, including a PC's BIOS and operating system, is insulated from the hardware engine's native instruction set. Therefore the native instruction set can be changed arbitrarily without affecting any x86 software at all. The only program that needs to be changed is the Code Morphing software.
One other big advantage is that it solves a problem that has hampered acceptance of VLIW processors. A traditional VLIW processor exposes details of the processor pipeline to the compiler; as a result, any change to that pipeline would require all existing binaries to be re-compiled to make them compatible with the changed pipeline. In other words, any modification to the processor's hardware would require all programs that run on it to be re-compiled.
We've all seen the difficulties of changing a processor's hardware; just think of MMX, SSE and 3DNow! instructions, all hardware implementations for which the software has to be re-compiled. This, however, is not a problem with the Crusoe processor, since, in effect, the Code Morphing software always transparently 're-compiles' the x86 code it is running.
This translation of one instruction set's instructions into another's comes at a price, though: the processor has to dedicate some of its processing power to running Code Morphing, power that a conventional x86 processor could have used to execute application code. And although Transmeta has undoubtedly designed the Code Morphing software for maximum efficiency and low overhead, it will never perform on par with an x86 CPU of the same configuration and clockspeed, simply because it has to execute the Code Morphing software before it can actually process an x86 instruction.
Execution, Decoding, Scheduling
With the Code Morphing software handling x86 compatibility, Transmeta's designers created a very simple, high-performance VLIW hardware engine with two integer units, a floating-point unit, a memory control unit and a branch unit. Each of the Crusoe's VLIW instructions, called a 'molecule', is 64 or 128 bits long and contains up to four RISC-like instructions, called 'atoms'. All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units within the processor. This format-dependent routing greatly simplifies the decode and dispatch hardware, as it does away with complex out-of-order hardware.
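To make the molecule/atom idea concrete, here is a minimal Python sketch of how a translator might pack atoms into molecules. It is an illustration only: the unit names, the `Atom` fields and the greedy in-order packing rule are assumptions made for this example and say nothing about Crusoe's actual encoding or scheduler. Only the limit of four atoms per molecule and the mix of functional units come from the description above.

```python
from dataclasses import dataclass

# Hypothetical unit capacities, loosely following the article:
# two integer units, one floating-point, one memory and one branch unit.
UNIT_CAPACITY = {"alu": 2, "fpu": 1, "mem": 1, "br": 1}
MAX_ATOMS = 4  # the article says a molecule holds up to four atoms


@dataclass(frozen=True)
class Atom:
    unit: str          # kind of functional unit that executes this atom
    dest: str          # register written ("" if none)
    srcs: tuple = ()   # registers read


def pack_molecules(atoms):
    """Greedily pack atoms, in program order, into molecules.

    An atom starts a new molecule when the current one is full, when no
    unit of the kind it needs is free, or when it reads or overwrites a
    register written by an atom already in the current molecule (atoms in
    one molecule run in parallel, so they cannot depend on each other).
    """
    molecules, current = [], []
    for atom in atoms:
        used = sum(1 for a in current if a.unit == atom.unit)
        written = {a.dest for a in current if a.dest}
        conflict = (
            len(current) >= MAX_ATOMS
            or used >= UNIT_CAPACITY[atom.unit]
            or any(src in written for src in atom.srcs)
            or (atom.dest != "" and atom.dest in written)
        )
        if conflict and current:
            molecules.append(current)
            current = []
        current.append(atom)
    if current:
        molecules.append(current)
    return molecules


program = [
    Atom("mem", "r1", ("r0",)),       # load r1 from the address in r0
    Atom("alu", "r2", ("r3", "r4")),  # independent add: shares the molecule
    Atom("alu", "r5", ("r1",)),       # needs r1, so it starts a new molecule
    Atom("br", "", ("r5",)),          # needs r5, so it starts another one
]
molecule_sizes = [len(m) for m in pack_molecules(program)]
```

In this toy program the load and the independent add share one molecule, while the two dependent atoms each end up in a molecule of their own. All the dependency checking happens in software at translation time, which is the point of the design: the hardware simply runs whatever it finds in each slot.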
Current superscalar x86 processors, such as the Intel Pentium III, also have multiple functional units that can execute RISC-like operations in parallel; however due to the out-of-order execution a separate piece of hardware is needed to re-construct the sequence of the original x86 instructions and make sure that they execute in order. As a result, this type of processor is much more complex than the Crusoe processor's VLIW engine.
As mentioned above, a conventional x86 processor's out-of-order execution requires additional hardware to make sure that x86 instructions are executed in the correct order, and it translates each instruction separately. Code Morphing, in contrast, can translate an entire group of x86 instructions at once. It also translates instructions only once and stores the result in a 'translation cache'; the next time the same code is executed, the processor can immediately run the existing translation from the cache.
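The translate-once, reuse-many-times behaviour of a translation cache can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name, the choice to key blocks by their x86 start address and the stand-in `fake_translate` function are all invented for illustration and are not Transmeta's implementation.

```python
class TranslationCache:
    """Toy model: translate a block once, then reuse the stored result."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical x86-block -> native translator
        self._cache = {}             # keyed by the block's x86 start address
        self.translations = 0        # how often the translator actually ran

    def lookup(self, address, x86_block):
        if address not in self._cache:
            self._cache[address] = self._translate(x86_block)
            self.translations += 1
        return self._cache[address]


def fake_translate(x86_block):
    # Stand-in for real translation: just tag each instruction as "native".
    return tuple(f"native({insn})" for insn in x86_block)


cache = TranslationCache(fake_translate)
loop_body = ("add eax, 1", "cmp eax, ecx", "jne loop")
for _ in range(1000):
    native = cache.lookup(0x4000, loop_body)
```

After the loop, `cache.translations` is 1: the block was translated on its first execution and the other 999 executions reused the cached result. A real system would also have to cope with a full cache and with x86 code that modifies itself, both of which this sketch ignores.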
This software translation opens up new possibilities. A conventional out-of-order processor has to re-translate and schedule an instruction each time it executes, and has to do so very quickly. With Code Morphing, the translation process can be optimized by looking at the generated code and minimizing the number of instructions executed. In other words, Code Morphing can speed up execution and reduce power consumption. However, performance might not be at its peak after the first pass, because the code optimization is done in steps. All code is first translated quickly, but not necessarily efficiently, to keep the program and the processor running. Then, when a section of code is run again, it moves up in the hierarchy and is scheduled for optimization; sections that run only once usually don't get optimized. As a result, a section might take several passes to get fully optimized.
In essence it is no different from programming in a higher-level language, for example C++. After you've compiled and executed your program, one of the things you're keen on determining is where the performance bottlenecks are. You then use assembly to speed up just those sections, and thus optimize the program's overall execution. Writing in C++ gets your program up and running sooner than writing it all in assembly, but the extra performance gained by using assembly where needed can really speed up processing.
Caching and Optimization
Although the Code Morphing software is implemented in ROM, it first gets copied to DRAM at boot up to increase performance, residing in a separate memory space along with the translation cache.
The optimization of the translation process by the Code Morphing software allows the translation cache to be used much more efficiently, and the hardware can then execute the optimized translation at full speed. Furthermore, as an application executes, Code Morphing 'learns' more about the program and keeps optimizing and speeding up its execution.
The Code Morphing software has many ways to gather feedback about a running program, one of which is to 'instrument' translations: during translation, code is added whose sole purpose is to collect information about the block to be executed. This data is used later to decide when and what to optimize and translate.
A very good example, already mentioned, is knowing how often a piece of x86 code is executed: code that runs often is worth optimizing, whereas optimizing code used only once or twice is a waste of time.
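The idea of counting executions and spending optimization effort only on frequently run code can be sketched as follows. This is a minimal Python illustration under assumptions: the threshold value, the two tier names and the class itself are hypothetical, and the real Code Morphing software is not described in this level of detail.

```python
from collections import Counter

HOT_THRESHOLD = 50  # hypothetical: executions before a block is re-optimized


class TieredTranslator:
    def __init__(self):
        self.exec_counts = Counter()  # filled in by the "instrumented" code
        self.tier = {}                # block address -> "quick" or "optimized"

    def execute(self, address):
        if address not in self.tier:
            # First sight: fast but unoptimized translation.
            self.tier[address] = "quick"
        # The instrumentation: count every execution of the block.
        self.exec_counts[address] += 1
        if (self.tier[address] == "quick"
                and self.exec_counts[address] >= HOT_THRESHOLD):
            # Hot block: now worth the time to optimize.
            self.tier[address] = "optimized"
        return self.tier[address]


translator = TieredTranslator()
translator.execute(0x1000)        # start-up code, runs once
for _ in range(200):
    translator.execute(0x2000)    # inner loop, runs many times
```

Afterwards the start-up block at `0x1000` is still in the "quick" tier, while the loop at `0x2000` has been promoted to "optimized". The trade-off sits in the threshold: set it too low and time is wasted optimizing code that barely runs; set it too high and hot code runs unoptimized for longer than necessary.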
However, the Code Morphing software runs off of main memory and part of the processor's performance will be determined by the bandwidth of the memory interface and the memory technology used. Because the data to be processed and Code Morphing software both reside in memory, this has more performance impact than with a regular x86 processor, which only stores data in main memory.
Power Management
Most conventional x86 processors regulate power consumption by rapidly alternating between running the processor at full speed and turning it off. Different performance levels can be obtained by varying the on/off ratio, or 'duty cycle'. This approach has quite a few drawbacks: for example, the processor may just have been switched off when an application needs it, or it may be running at full speed when not being used at all.
The Transmeta Crusoe adjusts power consumption on the fly by dynamically increasing or decreasing clockspeed. To do this, software continuously monitors processor demand and dynamically selects the clockspeed needed to run the application, no more and no less, thereby also managing the processor's power consumption.
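A demand-driven clock selection policy of this kind can be sketched roughly as below. Everything specific here is an assumption made for the example: the frequency steps, the 80% utilization target and the function name are invented, and the article does not say how Transmeta's software actually makes the decision.

```python
# Hypothetical frequency steps in MHz; real parts define their own tables.
CLOCK_STEPS_MHZ = (300, 400, 500, 600, 700)
TARGET_UTILIZATION = 0.8  # hypothetical headroom so a saturated CPU steps up


def select_clock(utilization, current_mhz, steps=CLOCK_STEPS_MHZ):
    """Return the lowest clock step able to carry the measured load.

    utilization is the busy fraction (0.0 to 1.0) observed at current_mhz.
    The load is scaled so that it would occupy about TARGET_UTILIZATION
    of the clock step that gets selected.
    """
    demand_mhz = utilization * current_mhz / TARGET_UTILIZATION
    for step in steps:
        if step >= demand_mhz:
            return step
    return steps[-1]  # demand exceeds every step: run flat out
```

For instance, `select_clock(0.1, 700)` drops a nearly idle processor to the 300 MHz step, while `select_clock(1.0, 300)` moves a saturated one up to 400 MHz. Called at regular intervals, such a policy lets the clock follow the workload in both directions.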
Finally, the Code Morphing software adjusts the processor's core voltage on the fly. Because power varies linearly with clock speed and with the square of the voltage, adjusting both can produce huge reductions in power consumption, while a conventional processor can adjust power only linearly. For example, assume an application program only requires 90% of the processor's speed. On a conventional processor, throttling back the processor speed by 10% cuts power by 10%. Under the same conditions, Crusoe's 'LongRun' power management can reduce power by almost 30% (27.1% = 100% x (1 - (0.9 x 0.9^2))), assuming the core voltage is lowered by 10% along with the clock.
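The arithmetic is easy to check. The short Python sketch below takes power as proportional to clock speed times voltage squared and assumes, as the example implies, that the voltage is lowered by the same 10% as the clock. The function name is made up for illustration.

```python
def relative_power(clock_scale, voltage_scale=1.0):
    """Power relative to full speed, taking P proportional to f * V^2."""
    return clock_scale * voltage_scale ** 2


# Conventional throttling: clock down 10%, voltage unchanged.
clock_only_saving = 1 - relative_power(0.9)

# Clock and core voltage both down 10% (the voltage figure is an assumption).
clock_and_voltage_saving = 1 - relative_power(0.9, 0.9)
```

Here `clock_only_saving` comes to 0.10 and `clock_and_voltage_saving` to 0.271, so lowering the voltage along with the clock saves roughly 27% rather than 10%, which is the "almost 30%" quoted above.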
Conclusion
Transmeta's Crusoe processor sets new standards for processor design and for software translation of instruction sets. We have yet to see the implications of this approach for the computer industry, but they are likely to become apparent over the next several years. The technology is scalable and not limited to low-power designs or to x86-compatible processors. However, due to its architecture it may not offer the same level of performance as a conventional processor of similar clockspeed and configuration.
Performance can vary greatly between programs, depending on whether a program's speed of execution relies on rerunning a section of code a large number of times. The Code Morphing software is designed to optimize instructions that recur, allowing execution to be optimized further with each iteration. The approach does offer a great basis for running multiple processor platforms on a single processor core.
When notebooks incorporating Transmeta's Crusoe processor become available, don't expect all of your programs to work faster over time; some code is not too well suited for optimization, after all, and other code is already heavily optimized. Just keep in mind that Transmeta's Crusoe processor emulates another processor's instruction set, and we've yet to come across an emulator that outperforms the original. Although marketing slogans and product presentations might have got you thinking otherwise, in reality you will always lose performance if something isn't programmed natively.