scotsmist
SUPPORT

THE INTEL PENTIUM 4

Processor performance is no longer determined by MHz alone. It is a function of frequency multiplied by IPC, or Instructions Per Clock cycle. To overcome the frequency limitations of the Pentium II and III designs, Intel developed a design that slightly reduces IPC but allows significantly higher clock frequencies.
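
As a minimal sketch of the relationship above, assume throughput is simply clock frequency times IPC (real IPC varies from program to program). The Python below compares two hypothetical chips. The IPC figures are illustrative only, not measured Pentium III or Pentium 4 values.

```python
def instructions_per_second(clock_mhz: float, ipc: float) -> float:
    """Throughput = clock frequency (Hz) x average instructions per clock (IPC)."""
    return clock_mhz * 1_000_000 * ipc


def clock_to_break_even(old_mhz: float, old_ipc: float, new_ipc: float) -> float:
    """Clock (MHz) a lower-IPC design needs to match an older design's throughput."""
    return old_mhz * old_ipc / new_ipc


# Hypothetical figures: a 1000 MHz design at IPC 1.0 versus a 1500 MHz design at IPC 0.75.
older = instructions_per_second(1000, 1.0)
newer = instructions_per_second(1500, 0.75)
print(older, newer, clock_to_break_even(1000, 1.0, 0.75))
```

With a 25% lower IPC, the newer design must clock about a third higher just to break even. Any frequency beyond that point is pure gain.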

Data Pipeline
The original Intel Pentium used a 5-stage pipeline. With the Pentium Pro, the branch prediction/recovery pipeline was doubled to 10 stages. The Pentium 4 doubles the pipeline depth once again, to 20 stages, as part of what Intel calls the NetBurst micro-architecture.
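
A deliberately simplified model shows why a deeper pipeline costs IPC. This is an assumption, not Intel's published timing. Once full, a pipeline completes one instruction per cycle. A mispredicted branch flushes it, and refilling takes roughly one cycle per stage.

```python
def pipeline_cycles(instructions: int, depth: int, mispredictions: int = 0) -> int:
    """Approximate cycles to run `instructions` through a `depth`-stage pipeline.

    Simplified model: the first instruction takes `depth` cycles to emerge,
    each later one completes one cycle after the previous, and every
    mispredicted branch flushes the pipeline at a cost of about `depth` cycles.
    """
    if instructions <= 0:
        return 0
    return depth + (instructions - 1) + mispredictions * depth


for depth in (5, 10, 20):  # Pentium, Pentium Pro and Pentium 4 depths quoted above
    print(depth, pipeline_cycles(100, depth, mispredictions=2))
```

In this model the 20-stage pipeline needs more cycles for the same work, which is the IPC loss. It only comes out ahead if its shorter stages let the clock run proportionally faster.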

Integer & FP Units

Intel's Dynamic Execution Engine keeps the Arithmetic Logic Units busy with instructions to execute. The Pentium III can hold only 42 instructions in flight for the execution units to choose from, while the Pentium 4 can hold 126. A larger pool increases the probability that, when one instruction is waiting on a cache miss, other independent instructions are available to execute. The processor then does useful work instead of stalling while the data is fetched from memory. This becomes more important as processor speed increases, because each memory access costs more clock cycles. The simple integer ALUs are also clocked at twice the processor speed.
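
Here is a deliberately simplified model of why a larger pool helps. It assumes a hypothetical instruction stream and ignores issue width, retirement order and register renaming. Given the earlier results each instruction needs, it counts how many instructions in the window can still execute while a load waits on a cache miss. The window sizes 42 and 126 are the figures quoted above.

```python
def ready_while_stalled(deps, window_size):
    """Count instructions in the window that can run while instruction 0 waits.

    deps[i] lists the indices (all < i) of earlier instructions whose results
    instruction i needs. Instruction 0 is a load that missed the cache. An
    instruction is blocked if it needs the load directly or through a chain of
    other blocked instructions; everything else in the window can execute.
    """
    blocked = {0}
    ready = 0
    for i in range(1, min(window_size, len(deps))):
        if any(src in blocked for src in deps[i]):
            blocked.add(i)
        else:
            ready += 1
    return ready


# Hypothetical instruction stream: every fourth instruction uses the missed load.
stream = [[]] + [[0] if i % 4 == 0 else [] for i in range(1, 200)]
print(ready_while_stalled(stream, 42), ready_while_stalled(stream, 126))
```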

Level 1 and 2 Cache

The Level 1 data cache has decreased to 8 KB, and Intel has replaced the conventional instruction cache with an Execution Trace Cache. This stores already-decoded micro-operations in the order the program executes them, so the results of program branches are integrated into the same cache line. Decode latency is largely removed because the execution engine can retrieve decoded operations directly from the cache, rather than fetching and decoding commonly used instructions over and over. Instructions that are never executed do not get stored in the cache.
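
The benefit of caching decoded operations can be sketched as follows. This is a simplified model under stated assumptions. It keys cached micro-operations by instruction address only, whereas the real trace cache stores whole traces that follow predicted branches. `toy_decode` is a hypothetical decoder, not Intel's.

```python
class DecodedOpCache:
    """Caches decoded micro-operations so repeated code is decoded only once."""

    def __init__(self, decoder):
        self.decoder = decoder  # function: instruction text -> list of micro-ops
        self.lines = {}         # instruction address -> decoded micro-ops
        self.decodes = 0        # how many times the (slow) decoder actually ran

    def fetch(self, address, instruction):
        uops = self.lines.get(address)
        if uops is None:        # miss: decode once and keep the result
            self.decodes += 1
            uops = self.decoder(instruction)
            self.lines[address] = uops
        return uops             # either way, hand back the decoded micro-ops


def toy_decode(instruction):
    # Hypothetical decoder: a read-modify-write instruction splits into three micro-ops.
    if instruction == "add [mem], eax":
        return ["load", "add", "store"]
    return [instruction]


loop_body = [(0x10, "add [mem], eax"), (0x13, "dec ecx"), (0x14, "jnz 0x10")]
cache = DecodedOpCache(toy_decode)
executed = [uop for _ in range(1000) for addr, ins in loop_body for uop in cache.fetch(addr, ins)]
print(cache.decodes, len(executed))
```

Running the loop 1000 times issues 5000 micro-operations, but the decoder runs only three times.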

SSE2
The 64-bit integer SIMD instructions introduced with MMX paved the way for SSE in the Pentium III, which doubled the register width to 128 bits and added single-precision floating-point instructions. SSE2 in the P4 features 144 new instructions that add 128-bit SIMD integer arithmetic and 128-bit double-precision floating-point operations. This further reduces the number of instructions required to execute particular tasks.
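
Packed 128-bit operations can be modelled in Python with the standard `struct` module. The same 16 bytes are treated either as four single-precision lanes (as in SSE's ADDPS) or as two double-precision lanes (as in SSE2's ADDPD), and one operation adds every lane. This is a behavioural sketch only. The function name `packed_add` is hypothetical, and real code would use the processor instructions or compiler intrinsics.

```python
import struct


def packed_add(a: bytes, b: bytes, lane_format: str) -> bytes:
    """Add two 128-bit (16-byte) values lane by lane.

    lane_format is a struct code: "f" gives 4 x 32-bit single-precision lanes,
    "d" gives 2 x 64-bit double-precision lanes.
    """
    lanes = 16 // struct.calcsize("<" + lane_format)
    fmt = "<%d%s" % (lanes, lane_format)
    xs, ys = struct.unpack(fmt, a), struct.unpack(fmt, b)
    # Python adds in double precision before repacking; the example values
    # below are exact in single precision too.
    return struct.pack(fmt, *(x + y for x, y in zip(xs, ys)))


singles = packed_add(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0),
                     struct.pack("<4f", 0.5, 0.5, 0.5, 0.5), "f")
doubles = packed_add(struct.pack("<2d", 1.5, 2.0),
                     struct.pack("<2d", 0.25, 4.0), "d")
print(struct.unpack("<4f", singles), struct.unpack("<2d", doubles))
```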

Motherboard Chipset
The Pentium 4 requires a motherboard based on Intel's new i850 chipset. Its memory controller hub supports dual RDRAM channels and the 400 MHz NetBurst system bus. It is paired with the ICH2 I/O Controller Hub (also used with the i815E chipset), which provides ATA-100 and four USB ports with a combined 24 Mbps of bandwidth.

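Rambus RIMMs
System RAM holds the data the processor works on and is linked to it through the chipset's memory controller hub. The P4 system bus transfers data at an effective 400 MHz (a 100 MHz clock, quad-pumped), so the i850 pairs it with fast PC800 Rambus memory. Since the i850 uses dual channels, RIMMs have to be installed in pairs.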
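
A quick peak-bandwidth calculation shows why the i850 uses two channels. The sketch assumes the commonly quoted figures: a 64-bit system bus at 400 million transfers per second, and a 16-bit PC800 RDRAM channel at 800 million transfers per second. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_mb_s(width_bits: int, mega_transfers_per_s: int, channels: int = 1) -> int:
    """Theoretical peak bandwidth in MB/s: bytes per transfer x transfers/s x channels."""
    return (width_bits // 8) * mega_transfers_per_s * channels


system_bus = peak_bandwidth_mb_s(64, 400)              # P4 system bus
one_rdram = peak_bandwidth_mb_s(16, 800)               # a single PC800 channel
dual_rdram = peak_bandwidth_mb_s(16, 800, channels=2)  # i850 dual channel
print(system_bus, one_rdram, dual_rdram)
```

A single channel would supply only half of what the bus can accept, while two channels match it. This is why RIMMs go in pairs.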

A New Case Design
The ATX specification has been adapted for the Pentium 4 as version 2.03. This adds a 12 V connector to the power supply that delivers a dedicated power feed for the processor, so motherboard manufacturers do not have to route power across the board to deliver the ~52 W required by the P4.
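
A rough calculation shows why a dedicated 12 V feed helps. For the same power, current falls as voltage rises, and lower current is easier to carry through connectors and board traces. This sketch uses the ~52 W figure above and ignores voltage-regulator losses (an assumption).

```python
def current_amps(power_w: float, supply_v: float) -> float:
    """Current drawn for a given power at a given supply voltage (I = P / V)."""
    return power_w / supply_v


for volts in (3.3, 5.0, 12.0):
    print(f"{volts:>4} V: {current_amps(52, volts):.1f} A")
```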


The Microprocessor Behind The Personal Computer

Very basically, a microprocessor combines the functions of a CPU (Central Processing Unit) within one chip. It includes an ALU (Arithmetic Logic Unit), internal registers and a CU (Control Unit) for sequencing the system.

The processor has three buses: a bi-directional data bus, a unidirectional address bus and a control bus. The data bus carries data between the components of the system, typically between memory and the processor or an input/output controller. The address bus carries an address generated by the processor. This address selects a memory location or a register within one of the chips attached to the system, and so specifies the source or destination of the data carried on the data bus. The control bus carries various synchronisation signals. The processor also needs a clock to provide the precise timing references that keep the system synchronised.

The 8086 programming model is still intact in the latest Pentium processors. The 8086 design includes a Bus Interface Unit, an ALU, an Execution Unit (EU) and an instruction queue. Later Pentium designs add cache, a paging unit, a Floating Point Unit, a branch target buffer and RISC (Reduced Instruction Set Computer) concepts in the execution unit.
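
The roles of the buses, registers, ALU and control unit can be illustrated with a toy accumulator machine. Everything here is a hypothetical teaching model, not the 8086 or any real processor. The LOAD/ADD/STORE/HALT instruction set and the `Bus` class are invented for illustration. The program counter drives the address bus, instructions and data come back on the data bus, and the control logic decides what the ALU does with them.

```python
class Bus:
    """Models the address and data buses between the processor and memory."""

    def __init__(self, memory):
        self.memory = memory
        self.reads = 0
        self.writes = 0

    def read(self, address):
        # The processor drives the address bus; memory answers on the data bus.
        self.reads += 1
        return self.memory[address]

    def write(self, address, data):
        # The address bus selects the destination; the data bus carries the value out.
        self.writes += 1
        self.memory[address] = data


def run(bus, pc=0):
    acc = 0  # internal register (accumulator)
    while True:
        opcode, operand = bus.read(pc)  # fetch the instruction at the program counter
        pc += 1
        # Decode and execute: the control unit's job.
        if opcode == "LOAD":
            acc = bus.read(operand)
        elif opcode == "ADD":
            acc += bus.read(operand)    # the ALU operation
        elif opcode == "STORE":
            bus.write(operand, acc)
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
          10: 2, 11: 3}
bus = Bus(memory)
print(run(bus), memory[12], bus.reads, bus.writes)
```

The bus counters show that every instruction fetch is itself a memory read, which is one reason bus speed matters so much in the history that follows.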

To understand the PC's capabilities and performance, a brief history of the microprocessor follows.

Intel Microprocessors

Intel had introduced the 8086 processor three years before the announcement of the IBM PC. However, because of the cost of designing the personal computer around this microprocessor, IBM chose the 8088, also released by Intel. The 8088 has a 16-bit internal architecture but only an 8-bit external data bus. This made it easier to use the standard 8-bit peripheral chips that were already available, and allowed a smaller entry level of system memory. Because the 8088 runs the same software as the 8086, it also paved the way for easy migration to the 286 and later microprocessors. The 8088 accomplished 1 MB addressing using a technique known as segmentation, a two-step process. First, a segment register was loaded with a value pointing to a 64 KB block of memory (the segment value multiplied by 16 gives the block's starting address). Then normal 16-bit registers supplied an offset to manipulate data within that 64 KB segment. To access memory outside the 64 KB block, the segment register had to be loaded with a new value. (It was not until the 80386 processor that a memory addressing mode allowed full linear mapping.)
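
The segment arithmetic described above can be written out directly. In real mode, the 16-bit segment value is shifted left four bits (multiplied by 16) and the 16-bit offset is added, giving a 20-bit physical address. On the 8088, with only 20 address lines, any carry beyond 20 bits wraps around to zero.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086/8088 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF  # 20 address lines -> 1 MB


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(physical_address(0x1000, 0x0020) == physical_address(0x1002, 0x0000))  # True
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wrapped past 1 MB)
```

The second line shows a quirk of the scheme: many different segment:offset pairs name the same physical byte.
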
IBM announced the PC AT in 1984. It used the Intel 286, a 16-bit processor which supported 16-bit bus transfers, 24-bit memory addressing and protected-mode memory management. Protected mode lets software prevent one program from corrupting another's memory, which is one of the requirements of multitasking. Increasing the expansion bus to 16 bits while remaining backward compatible allowed existing expansion cards to work in the new architecture. All PC designs still support the 16-bit (AT) bus, so cards that were designed for the original IBM PC should still work properly in a modern motherboard (although some manufacturers will stop supporting this bus soon).
The 80386 microprocessor was announced by Intel in 1985. The processor could process 32-bit data and access memory on a 32-bit bus. Chips were added to the motherboard to allow the AT bus to run asynchronously to the processor's clock, so that their speeds could be set independently. Memory was also moved from the external bus to the microprocessor's local bus, so it was no longer limited by the speed of the external bus. Adding cache memory (much faster, smaller and more expensive) to the local bus sped up the whole system and relieved the external bus, a bottleneck on the PC, of some of its load. Windows applications also pushed the PC to its limits, and soon graphics card performance became the bottleneck in PC system performance.

The 386 introduced linear addressing along with demand paging, the basis of virtual memory. With demand paging, the processor detects when a page of memory is not present in system memory so that it can be retrieved from the hard disk. The 386 also introduced Virtual 8086 mode, in which each user or task could operate as though it had an entire 8086 system to itself. 32-bit operating systems can use the full features of the 386 protected mode and offer 32-bit support. Bank interleaving increased memory access speed by partitioning memory into multiple blocks that could be accessed in an overlapping fashion. The 386 was later shipped with a 16-bit external data bus, which lowered system costs in a competitive market, and was named the 386SX. The original 386 was renamed the 386DX.
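
The 386 splits a 32-bit linear address into three parts: a 10-bit page directory index, a 10-bit page table index and a 12-bit offset within a 4 KB page. The sketch below models that two-level lookup with Python dictionaries standing in for the tables. This is an assumption: real entries are 32-bit words with a present bit and other flags. A hypothetical `access` helper shows the demand-paging idea of handling a missing page and retrying.

```python
PAGE_SIZE = 4096  # 386 pages are 4 KB


class PageFault(Exception):
    """Raised when the addressed page is not present in memory."""


def translate(linear: int, page_directory: dict) -> int:
    """Two-level i386-style translation of a 32-bit linear address."""
    dir_index = (linear >> 22) & 0x3FF    # top 10 bits select a page table
    table_index = (linear >> 12) & 0x3FF  # next 10 bits select a page
    offset = linear & 0xFFF               # low 12 bits: byte within the page
    page_table = page_directory.get(dir_index)
    if page_table is None or table_index not in page_table:
        raise PageFault(hex(linear))
    return page_table[table_index] * PAGE_SIZE + offset


def access(linear: int, page_directory: dict, free_frames: list) -> int:
    """Demand paging, simplified: on a fault, map a free frame and retry."""
    try:
        return translate(linear, page_directory)
    except PageFault:
        # A real OS would read the page's contents from disk into this frame here.
        frame = free_frames.pop()
        page_directory.setdefault((linear >> 22) & 0x3FF, {})[(linear >> 12) & 0x3FF] = frame
        return translate(linear, page_directory)


directory = {0: {1: 5}}                          # linear page 1 lives in physical frame 5
print(hex(translate(0x1234, directory)))         # 0x5234
print(hex(access(0x00400010, directory, [7])))   # faults, then maps frame 7: 0x7010
```
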
In April 1989 Intel announced the 486 microprocessor. Apart from performance gains, the architecture did not change much. However, the new chip took advantage of smaller transistors by adding a math coprocessor and a small amount of cache on the chip. The processor bus changed somewhat from the 386 to allow burst transfers: only one pointer, the start address, needed to be loaded into a register to transfer a block of memory.

When the graphics card became the major bottleneck in the PC system, VESA (Video Electronics Standards Association) took the 486 local bus and extended it into VESA Local Bus slots, placed in line with the AT bus slots. Combining both buses on one system board provided high-speed peripheral performance without replacing the function of the AT bus. Every time Intel introduced a new microprocessor they changed the processor's bus interface, so the chipset had to change. To solve this problem Intel introduced a new bus called PCI (Peripheral Component Interconnect), which attaches to the microprocessor through a local-bus-to-PCI bridge chipset. Only the bridge chip needs to change when the microprocessor and local bus design change, which Intel does to improve functionality and speed and to take advantage of new technology.

External buses were reaching a limit. Incorporating on-chip cache made it possible to run the processor core at much higher clock rates while the external bus runs at lower speeds. A PLL (Phase-Locked Loop) accepts an input from a reference clock and multiplies it, so the processor's internal clock can run faster than the external bus. The 486 processor also gained a System Management Mode. This mode is hidden from the other modes but can be entered from any of them. It was developed for notebook computers, allowing power management functions to run transparently to the operating system and applications.
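
Two of the ideas above lend themselves to simple arithmetic. The sketch assumes the commonly quoted 486 burst timing of 2 bus clocks for the first transfer and 1 clock for each of the next three (2-1-1-1). Without bursting, every transfer takes 2 clocks. It also assumes a nominal 33 MHz bus with a 2x PLL multiplier, as on a 486DX2-66. The function names are illustrative.

```python
def core_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Internal clock produced by a PLL that multiplies the external bus clock."""
    return bus_mhz * multiplier


def bus_cycles(transfers: int, first: int = 2, subsequent: int = 1) -> int:
    """Bus clocks to move `transfers` items: a full cycle for the first, then faster follow-ons."""
    return first + (transfers - 1) * subsequent


line_fill_burst = bus_cycles(4)                          # 2-1-1-1 = 5 clocks
line_fill_single = bus_cycles(4, first=2, subsequent=2)  # 2-2-2-2 = 8 clocks
print(core_clock_mhz(33, 2), line_fill_burst, line_fill_single)
```

Filling a 16-byte cache line as four 4-byte transfers therefore takes 5 bus clocks instead of 8, because only the start address has to be issued.
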
The 486 was eventually also released in an SX version without a working math coprocessor, and the original 486 was renamed the 486DX.
The Intel Pentium processor (P5) was introduced in March 1993. The design greatly improved math coprocessor performance and increased the size of the on-chip cache. The Pentium local bus is 64 bits wide and, under certain conditions, the processor can execute two instructions in a single clock cycle. The Pentium also includes advanced system integrity features. Parity is generated and checked for each byte of data transferred on the external data bus, and parity is also generated on the address bus. Internal parity checking is done on the instruction and data caches, on nearly all internal registers, and on internal ROM instructions and data. The Pentium will also shut down if internal errors are detected. You may have noticed that Intel stopped using x86-style numbers to describe the processor model, in favour of a name.
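
Per-byte parity can be sketched as below. One extra bit per byte makes the count of 1 bits even, so a single flipped bit in that byte is detected (though not corrected, and two flipped bits go unnoticed). Even parity is chosen here for illustration. The eight parity bits for a 64-bit transfer follow from using one bit per byte.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def bus_parity(data: bytes) -> list:
    """One parity bit per byte, e.g. 8 bits for a 64-bit (8-byte) transfer."""
    return [parity_bit(b) for b in data]


def check(data: bytes, parity: list) -> bool:
    """True if every byte still matches its parity bit."""
    return bus_parity(data) == parity


transfer = bytes([0xB0, 0xFF, 0x00, 0x01, 0x7E, 0x80, 0x33, 0x0F])
sent = bus_parity(transfer)
corrupted = bytes([transfer[0] ^ 0x01]) + transfer[1:]  # flip one bit in byte 0
print(sent, check(transfer, sent), check(corrupted, sent))
```
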
The Intel Pentium Pro processor was introduced in November 1995. It is a superpipelined, superscalar processor supporting ECC (Error Correcting Code), Fault Analysis & Recovery, Functional Redundancy Checking, multi-branch prediction, data flow analysis and multiple processors. It is supplied with 16 KB of L1 cache, plus 256 KB, 512 KB or 1 MB of L2 cache on a separate die in the same package, running at the full processor core speed. The processor can address 64 GB of main memory through the addition of four more address lines. Internally it is a RISC-like core that translates x86 instructions into simpler micro-operations. Several techniques are used by this chip to produce more performance. Processing is divided into more pipeline stages, and three instructions can be decoded per clock cycle, as opposed to two for the Pentium. In addition, instruction decoding and execution are decoupled, so instructions can still be executed when one of them stalls. The Pentium Pro was first aimed at the server market and optimised to run 32-bit code.
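
The memory limits quoted in this history follow directly from the number of address lines: n lines can select 2^n distinct byte addresses. The line counts below match the figures given above: 20 for the 8088's 1 MB, 24 for the 286, 32 for the 386, and 32 plus four more for the Pentium Pro. The helper names are illustrative.

```python
ADDRESS_LINES = {"8088": 20, "80286": 24, "80386DX": 32, "Pentium Pro": 36}


def addressable_bytes(lines: int) -> int:
    """n address lines select 2**n distinct byte addresses."""
    return 1 << lines


def human_readable(n_bytes: int) -> str:
    """Express a power-of-two byte count in binary KB/MB/GB."""
    units = ["bytes", "KB", "MB", "GB"]
    i = 0
    while n_bytes >= 1024 and i < len(units) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes} {units[i]}"


for cpu, lines in ADDRESS_LINES.items():
    print(f"{cpu:12} {lines} lines -> {human_readable(addressable_bytes(lines))}")
```
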
The Intel Pentium MMX (P55C) processor was announced (quietly) in January 1997, followed by an uproar from consumers who had bought PCs before Christmas without the new processor. The MMX (Matrix Math Extensions or Multimedia Extensions) chip incorporates a lot of RISC (Reduced Instruction Set Computer) architecture as opposed to CISC (Complex Instruction Set Computer), and will be the subject of another guide. The multimedia extensions enhance audio, video playback and graphics performance, and all later Intel processors support them. The Pentium MMX was also the last Intel desktop processor in the line to mount in a Socket 7 ZIF socket on the motherboard. Intel then moved to a slot design, presumably so that they could patent it and stop AMD from taking over the market as the most popular desktop processor.
The Pentium II processor from Intel includes the MMX instructions (which enhance multimedia performance) and has 32 KB of on-chip L1 cache. The L2 cache is mounted on a small circuit board alongside the CPU inside a Single Edge Contact (SEC) cartridge. The two are connected by the DIB (Dual Independent Bus), and the cartridge fits into a slot on the motherboard. The Pentium II includes SMP (Symmetric Multi-Processing) support for two CPUs through the GTL+ bus and uses two MMX execution units, and the secondary cache can be supplied with ECC. Pentium Pro and Pentium II processors contain a bug in the FPU (Floating Point Unit): the conversion of certain large negative floating-point numbers into integers sometimes fails to detect an overflow. Software workarounds are available.
In February 1999, Intel unveiled its latest processor, the Pentium III. In addition to being faster than the Pentium Pro and Pentium II, the Pentium III has several new features, including a unique processor ID and new processor instructions, the Streaming SIMD Extensions (SSE). SIMD stands for Single Instruction Multiple Data: the capability to process more than one data element with one instruction. Though SSE adds new features, existing applications are not affected. These new instructions do for the Pentium III what MMX did for the Pentium.
Now the Intel Pentium 4 is here. Despite its 42 million transistors, the P4 as a whole is not much faster than a Pentium III for general purposes, but in time it will reach clock speeds that the Pentium III could never match. For the first time since the Pentium Pro, Intel has redesigned its microprocessor architecture, adding features that it says will allow it to deliver leading performance for several years.

The Competition

To further complicate processor options, several manufacturers introduced clone processors. AMD (Advanced Micro Devices) produced 286 processors under licence from Intel and then claimed the licence covered the 386 and 486 designs. Until the introduction of the K5 (a Pentium equivalent), there were no real performance or functionality gains over Intel's processors. The K5 is not a clone of the Intel Pentium and claims performance gains from its superscalar design, such as dual pipelines, branch prediction and speculative execution ahead of a branch.

AMD introduced the K6 in 1997. Like the Pentium MMX it supports the MMX instructions, and it has 64 KB of L1 cache. The chip fits into existing Socket 7 motherboards, unlike Intel's Pentium II, which needed a new motherboard. The AMD K6-2 is similar to the K6 except that it offers 3DNow! technology, AMD's counterpart to MMX that adds SIMD floating-point instructions, making it much more powerful. The K6-2 has been shown to outperform a Pentium II machine of equivalent clock speed. The K6-2 also introduced the 100 MHz FSB (front side bus). The K6-2 should work in any Socket 7 system that supports the K6, although the K6-2 requires a lower voltage. The K6-2 is the best performing member of the Pentium-compatible family of Socket 7 processors. The K6-III is a higher-performance version of the K6-2, due to its tri-level cache design and improved manufacturing process, and is roughly comparable in performance to the Pentium II. I do not have a lot of details about the AMD processors at the time of writing.
Cyrix designed their processors from the ground up, using none of Intel's technology. Their initial 386 and 486 designs are not actual clones of Intel's processors, but hybrids. All the designs use a 486-like processor core with a five-stage pipeline, which allows most instructions to complete in a single clock cycle. However, the on-chip data and instruction cache is smaller. Cyrix also licensed its processor designs to IBM, SGS-Thomson and Texas Instruments. Texas Instruments also developed its own version of the 486 processor with larger caches and a PCI bus interface built in.

The Maths Co-Processor

This processor goes by several names: the coprocessor, the math coprocessor, the floating point processor and the NPX (Numeric Processor Extension). The integer processor can only work directly with whole numbers. Many mathematical functions perform calculations on numbers in non-integer format. Intel therefore introduced the math coprocessor, capable of performing numeric operations 20 to 100 times faster than equivalent software routines using integer arithmetic. The trend is to have the math coprocessor integrated on the same chip as the integer processor. In the past, Intel-based computers were slow at floating point compared to RISC workstations. With the Pentium, however, Intel redesigned the unit's structure and functions, so its performance is 5 to 10 times that of the 486 and competitive with RISC workstations. The math coprocessor is also capable of handling integers and packed decimal (BCD) data. It can read and write data in several formats, but internally all data is represented as temporary real numbers, a standard 80-bit format. To software, the coprocessor appears as additional registers, data types and instructions. The coprocessor has built-in constants such as pi, and instructions for sine, cosine, tangent and other arithmetic functions in addition to add and subtract.
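
The 80-bit "temporary real" layout can be illustrated by converting an ordinary 64-bit double into it. The sketch assumes the standard x87 extended-precision layout, stored little-endian: a 64-bit significand with an explicit leading integer bit, then a sign bit and a 15-bit exponent biased by 16383. For brevity, it handles only zero and normal numbers.

```python
import struct


def to_x87_extended(x: float) -> bytes:
    """Encode a Python float (an IEEE double) in the x87 80-bit extended format."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF          # 11-bit double exponent, bias 1023
    frac = bits & ((1 << 52) - 1)       # 52 stored fraction bits
    if exp == 0 and frac == 0:          # signed zero
        ext_exp, significand = 0, 0
    elif 0 < exp < 0x7FF:               # normal number
        ext_exp = exp - 1023 + 16383    # rebias to 15 bits
        significand = (1 << 63) | (frac << 11)  # explicit integer bit + fraction
    else:
        raise ValueError("subnormals, infinities and NaNs are not handled in this sketch")
    # 8-byte significand, then 2 bytes holding the sign bit and the exponent.
    return struct.pack("<QH", significand, (sign << 15) | ext_exp)


print(to_x87_extended(1.0).hex())   # 0000000000000080ff3f
```

The significand is 64 bits rather than the double's 53, so intermediate results held in this format keep 11 extra bits of precision.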

The Future

VLIW (Very Long Instruction Word) processors receive several instructions packed into a single instruction word by the compiler, and execute them simultaneously across a set of parallel execution units. Most programs process blocks of instructions between branches, typically small blocks. If software were compiled so that instructions and tasks were arranged with no branching inside each very long instruction word, the pipelines would not stall. The same principle of keeping the execution units supplied without stalls underlies caching. To be continued...
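
The compiler's side of VLIW can be sketched as packing independent instructions from one branch-free block into fixed-width words. This is a simplified greedy scheduler under stated assumptions: each word issues in one cycle, every instruction takes one cycle, and only true data dependencies are tracked. The instruction names are hypothetical.

```python
def bundle(instructions, width):
    """Pack (name, deps) pairs from one branch-free block into VLIW words.

    `deps` names earlier instructions whose results are needed. Each word
    issues in one cycle, so an instruction goes in the first word after all
    of its dependencies that still has a free slot.
    """
    words = []   # each word is a list of up to `width` instruction names
    placed = {}  # instruction name -> index of the word it went into
    for name, deps in instructions:
        slot = max((placed[d] + 1 for d in deps), default=0)
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1  # that word is full, try the next one
        while len(words) <= slot:
            words.append([])
        words[slot].append(name)
        placed[name] = slot
    return words


block = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", []), ("e", ["c"])]
print(bundle(block, 2))
```

Because the block contains no branches, every word can be issued back to back with no prediction needed, which is the point made above.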

Copyright © 1994-2002 scotsmist.co.uk