Itanium

From TechPubs Wiki

Itanium Inside!

Itanium, also known as IA-64 and IPF (Itanium Processor Family), is an ISA and family of high-performance CPUs created by a joint Hewlett-Packard and Intel project launched in 1994. It is notable for SGI's later use of the Itanium 2 in the Altix 350 and later systems.

History of Itanium

The Itanium project began in the late 1980s at Hewlett-Packard’s Fort Collins Design Center in Colorado. At the time, HP was exploring the limits of its in-house RISC architecture (PA-RISC) and recognized that the next generation of high-performance computing would demand new approaches to instruction-level parallelism. Rather than scaling PA-RISC indefinitely, HP engineers began work on a new architectural model, one that exposed parallelism explicitly to the compiler rather than relying entirely on hardware scheduling.

By the early 1990s, Intel had been working on its own advanced RISC project, code-named P7, but progress had stalled. In 1994, Intel scrapped P7 and entered into a joint development agreement with HP, betting that HP's approach, Explicitly Parallel Instruction Computing (EPIC), could provide a leap ahead in performance. The partnership combined HP’s architectural research with Intel’s process technology and manufacturing scale. The new processor family would become known as Itanium.

Itanium drew on ideas from the Berkeley RISC project, which also influenced the SPARC and Intel i960 CPU families, as well as on the EPIC design ideas explored by HP.

The first generation, Merced, was released in 2001 after significant delays. Originally intended to ship by 1997, Merced slipped repeatedly due to the complexity of the design and the compiler challenges inherent to EPIC. When it finally arrived, Merced was manufactured on a 180 nm process, with clock speeds of 733 MHz and 800 MHz. Although it demonstrated the potential of EPIC, its reception was underwhelming for three reasons: misunderstandings in the press, delays that exposed it to competition from newer Alpha and MIPS designs, and a focus on its x86 hardware emulation, which implied it was a successor to x86.

Intel and HP quickly followed with the Itanium 2 in 2002, a major redesign incorporating a reworked pipeline, larger caches, and more robust compiler support. By the mid-2000s, Itanium’s SPEC benchmark scores demonstrated that, on well-optimized workloads, it could compete effectively with contemporary RISC architectures. However, industry support dwindled due to a reliance on Intel's and HP's proprietary toolchains. GCC had support for Itanium, but it was modest at best, and the design of Itanium required a well-tailored compiler for proper performance.

Gradually, support consolidated around HP, which built its Integrity server line on Itanium, and a small number of Japanese partners such as NEC and Fujitsu. Other vendors, including Dell, IBM, and SGI, withdrew support in the 2000s.

The last major microarchitecture, Poulson, shipped in 2012 with eight cores and significant RAS (reliability, availability, serviceability) enhancements. Its successor, Kittson, was originally intended to be a die shrink and architectural advance (sometimes referred to as Kittson-22 for its 22 nm target). In practice, Intel canceled the ambitious redesign. Instead, “Kittson” was delivered in 2017 as a higher-binned Poulson on the same 32 nm process, offering modest performance gains but no new features. This effectively marked the end of Itanium’s development roadmap.

HP (later HPE) remained the last significant customer, sustaining Itanium through its Integrity server line into the late 2010s. Intel formally discontinued the architecture in 2021.

Specifications of Itanium

Itanium’s design combined elements of classical RISC with its novel EPIC approach. It has roots in Berkeley RISC designs, particularly in the use of register windows. Instead of a flat set of registers, Itanium employed a large register file (128 general-purpose integer registers and 128 floating-point registers) with the ability to allocate “windows” of general registers for procedures. This reduced memory traffic during function calls and allowed compilers to expose more instruction-level parallelism.
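The window mechanism can be illustrated with a toy model. The sketch below is a simplified illustration rather than a description of the real hardware: the class `RegisterStack` and its method names are hypothetical, it ignores wrap-around and spilling to memory, and it models only the renaming that lets a caller's output registers appear as the callee's first stacked registers, starting at r32.

```python
class RegisterStack:
    """Toy model of windowed general registers (hypothetical names throughout)."""

    FIRST_STACKED = 32   # r0-r31 are static; stacked registers start at r32
    NUM_STACKED = 96     # r32-r127

    def __init__(self):
        self.physical = [0] * self.NUM_STACKED
        self.base = 0        # physical index that the current frame's r32 maps to
        self.locals = 0      # inputs + locals in the current frame
        self.outputs = 0     # registers handed to the next callee
        self.saved = []      # caller frames, restored on return

    def alloc(self, n_locals, n_outputs):
        """Resize the current frame (assumption: no spilling, so overflow is an error)."""
        if self.base + n_locals + n_outputs > self.NUM_STACKED:
            raise OverflowError("toy model does not spill registers to memory")
        self.locals, self.outputs = n_locals, n_outputs

    def _phys(self, reg):
        idx = reg - self.FIRST_STACKED
        if not 0 <= idx < self.locals + self.outputs:
            raise IndexError(f"r{reg} is outside the current frame")
        return self.base + idx

    def read(self, reg):
        return self.physical[self._phys(reg)]

    def write(self, reg, value):
        self.physical[self._phys(reg)] = value

    def call(self):
        """Enter a callee: its frame starts at the caller's first output register."""
        self.saved.append((self.base, self.locals, self.outputs))
        self.base += self.locals
        self.locals = 0      # the callee initially sees only the caller's outputs

    def ret(self):
        """Return to the caller and restore its frame."""
        self.base, self.locals, self.outputs = self.saved.pop()
```

In this model a caller that allocates two locals and one output writes its argument to r34; after `call()`, the callee reads the same value from r32 without any memory traffic, and `ret()` restores the caller's view unchanged.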

At its core, however, Itanium processors were in-order architectures. Unlike contemporary superscalar RISC and x86 processors, which relied on out-of-order execution to dynamically schedule instructions at runtime, Itanium delegated this responsibility to the compiler. The compiler was expected to identify independent instructions and bundle them into instruction groups, which the hardware could then issue in parallel with minimal additional scheduling logic.
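As a rough illustration of the compiler's job, the sketch below walks a straight-line instruction sequence in order and starts a new instruction group whenever an instruction reads or writes a register already written in the current group. The function name `insert_stops` and the `(dest, sources)` tuple format are assumptions made for this example; a real EPIC compiler also reorders instructions and handles memory, predicates, and branch dependencies, none of which is modeled here.

```python
def insert_stops(instructions):
    """Split an in-order instruction list into groups of independent instructions.

    Each instruction is a (dest, sources) tuple of register names.
    A new group begins when an instruction depends on a register
    written earlier in the current group (read-after-write or
    write-after-write).
    """
    groups = []
    current = []
    written = set()
    for dest, sources in instructions:
        depends = dest in written or any(s in written for s in sources)
        if depends:
            groups.append(current)       # close the group: this is where a stop goes
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


# Hypothetical four-instruction sequence: the third needs results from the first two.
program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),
    ("r7", ("r1", "r4")),
    ("r8", ("r9",)),
]
groups = insert_stops(program)
```

Here the first two instructions share a group, and the third forces a stop because it consumes `r1` and `r4`; the fourth is independent of the third and joins its group.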

This model was known as EPIC. While sometimes confused with Very Long Instruction Word (VLIW) architecture, EPIC was distinct. VLIW processors pack multiple operations into a single, wide instruction word, with rigid issue slots. EPIC, by contrast, retained fixed-length 41-bit instructions, grouped into bundles of three alongside an explicit template field. The template identified the type of execution unit each instruction slot required and marked the stops separating groups of mutually independent instructions, so the hardware could issue everything between two stops in parallel while still performing certain speculative or predicated operations. This made EPIC less brittle than classical VLIW in the face of compiler scheduling errors, and better able to tolerate cache and branch unpredictability.
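The bundle arithmetic is easy to check: three 41-bit instruction slots plus a 5-bit template fill exactly 128 bits, with the template in the lowest bits. The following sketch packs and unpacks such a bundle as a Python integer. The function names are hypothetical, and the template and slot values in the usage lines are arbitrary placeholders, not real instruction encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
BUNDLE_BITS = TEMPLATE_BITS + SLOTS_PER_BUNDLE * SLOT_BITS  # 5 + 3 * 41 = 128


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template                      # template occupies bits 0-4
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


# Placeholder values only; these are not meaningful instruction encodings.
example = pack_bundle(0x10, [0x123, 0x1FFFFFFFFFF, 0x0ABCDEF])
```

Round-tripping `example` through `unpack_bundle` returns the same template and slot values, and the packed integer never needs more than 128 bits.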

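Itanium also included SIMD instructions to exploit data-level parallelism. Multimedia instructions treated a 64-bit general register as packed 8-, 16-, or 32-bit integers, and parallel floating-point instructions operated on a pair of single-precision values held in one floating-point register, which suited scientific and multimedia workloads. Coupled with predication (conditional execution without branches), this gave compilers the tools to produce vectorized, parallel code streams with fewer stalls.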
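Predication can be sketched as follows. Instead of branching around one of two subtractions, a compare sets a pair of complementary predicates, both subtractions are issued, and only the one whose predicate is true commits its result. This is a minimal simulation in Python, not real Itanium code: the function name, the register names, and the tuple-based "program" are all hypothetical, chosen only to show the idea of if-conversion.

```python
def abs_diff_predicated(a, b):
    """Compute |a - b| with no branch in the 'program', only predicated commits."""
    regs = {"a": a, "b": b, "r": 0}

    # One compare writes two complementary predicates.
    preds = {"p1": regs["a"] > regs["b"]}
    preds["p2"] = not preds["p1"]

    # Both instructions are always issued; each is guarded by a predicate.
    program = [
        ("p1", "r", lambda r: r["a"] - r["b"]),   # (p1) r = a - b
        ("p2", "r", lambda r: r["b"] - r["a"]),   # (p2) r = b - a
    ]
    for pred, dest, operation in program:
        value = operation(regs)        # the operation executes either way
        if preds[pred]:                # only a true predicate lets the result commit
            regs[dest] = value
    return regs["r"]
```

The `if` inside the loop stands in for the hardware's commit gate, not for a program branch: the instruction stream itself is straight-line, which is what removes the branch misprediction risk.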

Despite these architectural strengths, performance in practice was highly dependent on compiler quality. General-purpose compilers such as GCC or the open-source Open64 often produced disappointing results, as they struggled to expose sufficient instruction-level parallelism to keep the wide Itanium pipelines busy. By contrast, the Intel C++ Compiler (ICC) and HP's vendor compiler, aC++ (invoked as aCC), were tuned specifically for Itanium’s execution model. These compilers could aggressively schedule instructions, optimize register window usage, and apply advanced techniques like software pipelining to fully exploit EPIC. As a result, SPEC benchmark results published with ICC or aCC demonstrated competitive performance against contemporary RISC processors such as POWER, while the same hardware running code compiled with GCC frequently lagged far behind.
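Software pipelining, mentioned above, overlaps successive loop iterations so that in any one cycle the loop is loading for one iteration, computing for the previous one, and storing for the one before that. The sketch below simulates a three-stage pipelined loop that scales a list. It is an illustration under stated assumptions: the function name `pipelined_scale` is hypothetical, each stage is assumed to take one cycle, and the guards on each stage play the role that stage predicates play when a compiler fills and drains such a loop.

```python
def pipelined_scale(src, factor):
    """Scale each element of src, with load/compute/store overlapped across iterations."""
    n = len(src)
    dst = [None] * n
    loaded = None      # value produced by the load stage in the previous cycle
    computed = None    # value produced by the compute stage in the previous cycle

    # n iterations need n + 2 cycles: two extra cycles drain the pipeline.
    for cycle in range(n + 2):
        # Stage 3: store the result of iteration (cycle - 2).
        if cycle >= 2:
            dst[cycle - 2] = computed
        # Stage 2: compute for iteration (cycle - 1).
        if 1 <= cycle <= n:
            computed = loaded * factor
        # Stage 1: load for iteration (cycle).
        if cycle < n:
            loaded = src[cycle]
    return dst
```

Stages run in store, compute, load order within a cycle so that each one consumes the value its predecessor produced in the previous cycle. In steady state all three guards are true and the three stages work on three different iterations at once, which is the overlap a software-pipelining compiler aims for.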