Technical
The Internal Components
The Processor
The Microprocessor Behind The Personal Computer
Very basically, a microprocessor combines the functions of a CPU (Central
Processing Unit) within one chip. It includes an ALU (Arithmetic Logic
Unit), internal registers and a CU (Control Unit) for sequencing the system.
The processor has three buses: a bi-directional data bus, a unidirectional
address bus and a control bus. The data bus carries data between the various
components of the system, typically between memory and the processor or an
input/output controller. The address bus carries an address generated by the
processor, which selects one register or memory location within one of the
chips attached to the system, and so specifies the source or destination of
the data travelling along the data bus. The control bus carries various
synchronization signals. The processor also needs a clock to provide the
precise timing references for the system. The basic 8086 processor model
is still intact in the latest Pentium processors. The processor design
includes a Bus Unit, an ALU, an Execution Unit (EU) and an instruction queue.
Later Pentium designs add cache, a paging unit, a Floating Point Unit,
a branch target buffer and RISC (Reduced Instruction Set Computer) concepts
in the execution unit.
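As a rough illustration of the address bus selecting a register or location within one chip, the Python sketch below decodes an address against a small memory map. The map, the chip names and the `decode` function are hypothetical, invented for this example; in a real PC the decoding is done in hardware by the chipset, typically by comparing the high address bits.

```python
# Hypothetical address map: (base address, size in bytes, chip name).
# These ranges are invented for illustration; they are not a real PC memory map.
ADDRESS_MAP = [
    (0x00000, 0x10000, "RAM"),
    (0x10000, 0x00010, "timer"),
    (0x10010, 0x00010, "serial port"),
]


def decode(address):
    """Return (chip, offset) for the chip selected by an address on the address bus."""
    for base, size, chip in ADDRESS_MAP:
        if base <= address < base + size:
            # The range the address falls in selects the chip; the offset from
            # the base selects the register or location inside that chip.
            return chip, address - base
    raise ValueError(f"no chip responds to address {address:#x}")


print(decode(0x10013))  # ('serial port', 3)
print(decode(0x00042))  # ('RAM', 66)
```

The same address bus lines reach every chip, so only the one whose range matches responds and drives (or reads) the data bus.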
To understand the PC's capabilities and performance, a brief history of
the microprocessor follows.
Intel Microprocessors
Intel had introduced the 8086 processor three years before the announcement
of the IBM PC. However, because of the cost of designing the personal computer
around this microprocessor, IBM chose the 8088 microprocessor, also released
by Intel. The 8088 microprocessor has a 16 bit internal bus but only supports
an 8 bit external bus, making it easier to use the standard 8 bit peripheral
chips that were already available, and allowing a smaller entry level of system
memory. Using the 8088 processor also paved the way for easy migration
to the 8086 and 286 microprocessors that were to follow. The 8088 processor
accomplished 1 MB addressing by using a technique known as segmentation,
a two step process for addressing memory. First a segment register was
loaded with a pointer to the start of a 64 KB block of memory, then normal
16 bit registers could manipulate data within that 64 KB segment. The segment
register had to be loaded with a new value to access memory outside the 64 KB.
(It was not until the 80386 processor that a memory addressing mode allowed
full linear mapping.)
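The segment arithmetic can be made concrete. On the 8086/8088 the 16 bit segment value is multiplied by 16 and added to a 16 bit offset, giving a 20 bit physical address (2^20 bytes = 1 MB). The Python function below is a minimal sketch of that calculation; the function name is illustrative, and the wrap-around at 1 MB models the chip's 20 address lines.

```python
def physical_address(segment, offset):
    """Translate an 8086/8088 segment:offset pair into a 20 bit physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are both 16 bit values")
    # Shift the segment left 4 bits (multiply by 16) and add the offset.
    # With only 20 address lines the result wraps around at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0xB800, 0x0000)))  # 0xb8000
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wrapped past 1 MB)
```

Because a segment can start on any 16 byte boundary, many different segment:offset pairs name the same byte; for example 0x1234:0x0010 and 0x1235:0x0000 both reach 0x12350.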
The PC AT was announced in 1984 by IBM and used the Intel 286, a 16 bit
processor which supported 16 bit bus transfers, 24 bit memory addressing
(16 MB) and protected mode memory management. Protected mode allows programs
to be written so that one program cannot corrupt another, one requirement
of multitasking. Increasing the expansion bus to 16 bits while remaining
backward compatible allowed existing expansion cards to work in the new
architecture. All PC designs still support the 16 bit (AT, or ISA) bus, so
cards that were designed for the original IBM PC should still work properly
in a modern motherboard (some manufacturers will stop supporting
this bus soon).
The 80386 microprocessor was announced by Intel in 1985. The processor
could process 32 bit data and access memory on a 32 bit bus. Chips were
added to the motherboard to allow the AT bus to run asynchronously to
the processor's clock, so that their speeds could be set independently.
The memory was also moved from the external bus to the microprocessor's
local bus, so it was no longer dependent on the speed of the external bus.
Adding cache memory (much faster, smaller and more expensive) to the local
bus sped up the whole system and relieved the external bus of one of its
constraints, a bottleneck on the PC. Windows applications also pushed the
PC to its limits, and soon graphics card performance became the bottleneck
in PC system performance. The 386 introduced linear addressing along with
demand paging, the basis of virtual addressing: the processor detects when
a page of memory is not present in system memory, so that it can be
retrieved from the hard disk. The 386 processor also introduced Virtual
8086 mode, in which each user or task could operate as though it had an
entire 8086 system to itself. 32 bit operating systems can use the full
features of the 386 protected mode and offer 32 bit support. Bank
interleaving improved memory access by dividing memory into multiple banks
that could be accessed simultaneously. The 386 was later shipped with a
16 bit external data bus, which lowered system costs in a competitive market;
this version was named the 386SX, and the original 386 was named the 386DX.
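Demand paging can be sketched in a few lines. The 386 uses 4 KB pages and splits a 32 bit linear address into a 10 bit page directory index, a 10 bit page table index and a 12 bit offset. The Python sketch below shows that split, plus a toy `DemandPager` class (hypothetical, for illustration only) that "loads" a page the first time it is touched. In a real system the page tables live in memory and the operating system handles the page fault by reading the page from disk.

```python
PAGE_SIZE = 4096  # the 386 uses 4 KB pages


def split_linear_address(linear):
    """Split a 32 bit 386 linear address into (directory index, table index, offset)."""
    directory = (linear >> 22) & 0x3FF  # bits 31-22: entry in the page directory
    table = (linear >> 12) & 0x3FF      # bits 21-12: entry in the page table
    offset = linear & 0xFFF             # bits 11-0: byte within the 4 KB page
    return directory, table, offset


class DemandPager:
    """Toy demand-paging model: a page gets a physical frame only when first used."""

    def __init__(self):
        self.page_frames = {}  # virtual page number -> physical frame number
        self.next_free_frame = 0
        self.page_faults = 0

    def translate(self, linear):
        page, offset = divmod(linear, PAGE_SIZE)
        if page not in self.page_frames:
            # Page fault: a real OS would read the page from disk into a free
            # frame here; this sketch simply hands out the next frame number.
            self.page_faults += 1
            self.page_frames[page] = self.next_free_frame
            self.next_free_frame += 1
        return self.page_frames[page] * PAGE_SIZE + offset


print(split_linear_address(0x00403025))  # (1, 3, 37)
pager = DemandPager()
print(hex(pager.translate(0x00403025)), pager.page_faults)  # 0x25 1
print(hex(pager.translate(0x00403FFF)), pager.page_faults)  # 0xfff 1
```

Only the first access to a page faults; later accesses to the same page are translated directly, which is why programs can run with only part of their memory actually loaded.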
In April 1989 Intel announced the 486 microprocessor. Apart from performance
gains, not much changed in the architecture; however, the new chip took
advantage of smaller transistors by adding a math coprocessor and a small
amount of cache on the chip. The processor bus was changed somewhat from
the 386 to allow burst transfers: only one pointer, the start address,
needed to be loaded into a register to transfer a block of memory. When the
graphics card became the major bottleneck in the PC system, VESA (Video
Electronics Standards Association) took the 486 local bus and extended it
to VESA local bus slots, placed in line with the AT bus slots so that both
buses were combined on one system board, to provide high speed peripheral
performance without replacing the function of the AT bus. Every time Intel
introduced a new microprocessor it changed the processor bus, so the
chipset had to change too. To solve this problem Intel introduced a new bus
called PCI (Peripheral Component Interconnect), which attaches to the
microprocessor bus through a local bus to PCI bridge chipset. Only the
bridge chip needs to be changed when the microprocessor and local bus design
change, which Intel does to improve functionality and speed and to take
advantage of new technology. External buses were also reaching a speed
limit. Incorporating on-chip cache meant it was possible to run the
processor at much higher clock rates internally while the external bus runs
at lower speeds. A PLL (Phase-Locked Loop) accepts an input from a reference
clock and multiplies it, so that the processor's internal clock runs faster
than the external bus. The 486 processor family also adopted System
Management Mode (first introduced on the low-power 386SL). This mode is
hidden from the other modes but can be entered from any of them, and was
developed for notebook computers so that power management functions could
run transparently to the operating system and applications. The 486
processor was eventually released in an SX version without a working math
coprocessor, and the original 486 was renamed the 486DX.
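The clock multiplication is simple arithmetic. In the 486DX2 the PLL doubles the external bus clock, and in the 486DX4 it triples it. The sketch below assumes the nominal "33 MHz" bus clock of those chips, which is 100/3 MHz; the function name is illustrative.

```python
def core_clock_mhz(bus_clock_mhz, multiplier):
    """Internal (core) clock produced by a PLL that multiplies the external bus clock."""
    return bus_clock_mhz * multiplier


BUS_MHZ = 100 / 3  # the nominal "33 MHz" 486 bus clock

for name, multiplier in [("486DX2-66", 2), ("486DX4-100", 3)]:
    core = core_clock_mhz(BUS_MHZ, multiplier)
    print(f"{name}: {BUS_MHZ:.1f} MHz bus x {multiplier} = {core:.1f} MHz core")
```

The bus, and everything attached to it, still runs at 33 MHz; only work done inside the chip, including hits in the on-chip cache, benefits from the multiplier.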
The Intel Pentium processor (P5) was introduced in March 1993. The design
greatly enhanced math coprocessor performance and increased the size of the
on-chip cache. The Pentium's external data bus is 64 bits wide and, under
certain conditions, the processor can execute two instructions in a single
clock cycle. The Pentium also includes advanced system integrity features
such as parity checking on each byte of data transferred on the external
bus and parity generated on the address bus. Internal parity checking is done
on the instruction and data caches, nearly all internal registers and internal
ROM instructions and data. The Pentium will also shut down if internal
errors are detected. Notice that Intel stopped using x86 numbers to name
the processor and switched to a brand name.
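Byte parity is easy to model. The Python sketch below generates one parity bit per byte of a 64 bit transfer and checks it on the receiving side. It assumes even parity (each byte plus its parity bit holds an even number of 1 bits); the function names are illustrative, not Intel's.

```python
def parity_bit(byte):
    """Even-parity bit for one byte: 1 if the byte holds an odd number of 1 bits,
    so that the byte plus its parity bit always contains an even number of 1s."""
    return bin(byte & 0xFF).count("1") % 2


def bus_parity(data, lanes=8):
    """One parity bit per byte lane of a 64 bit data transfer (lane 0 = lowest byte)."""
    return [parity_bit((data >> (8 * lane)) & 0xFF) for lane in range(lanes)]


def failing_lanes(data, parity_bits):
    """Receiver side: recompute parity and list the byte lanes that disagree."""
    return [lane for lane, bit in enumerate(bus_parity(data)) if bit != parity_bits[lane]]


word = 0x0123456789ABCDEF
sent = bus_parity(word)
print(failing_lanes(word, sent))                # []
print(failing_lanes(word ^ (1 << 13), sent))    # [1]  one flipped bit in byte lane 1
print(failing_lanes(word ^ (0b11 << 8), sent))  # []   two flips in one lane go unnoticed
```

A single flipped bit is caught, but two flipped bits in the same byte leave the parity unchanged. Parity detects errors without being able to correct them; correcting them is what ECC adds.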
The Intel Pentium Pro processor was introduced in November 1995. It is
a superpipelined superscalar processor supporting ECC (Error Correcting
Code), Fault Analysis & Recovery, Functional Redundancy Checking,
multi-branch prediction and data flow analysis. It supports multiple
processors and is supplied with 16 KB of L1 cache and 256 KB or more of
L2 cache, mounted in the same package on a separate die, that runs at the
full processor core speed. The processor can address 64 GB of main memory
through the addition of four more address lines (36 instead of 32).
Internally it is a RISC-style design: x86 instructions are translated into
simpler RISC-like operations before execution. Several techniques are used
by this chip to improve performance. Processing is divided into more
pipeline stages, and three instructions can be decoded per clock cycle, as
opposed to two for the Pentium. In addition, instruction decoding and
execution are decoupled, so instructions can still be executed if one
pipeline stalls. The Pentium Pro was first aimed at the server market
and optimised to run 32 bit code.
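Each extra address line doubles the memory a processor can address, so the limits quoted in this history follow from 2 raised to the number of address lines. The sketch below (function name illustrative) tabulates the figures for the processors discussed above using binary units (1 KB = 1024 bytes); it prints 1 MB, 16 MB, 4 GB and 64 GB respectively.

```python
UNITS = ["bytes", "KB", "MB", "GB"]


def human_size(n):
    """Format a power-of-two byte count using binary units (1 KB = 1024 bytes)."""
    unit = 0
    while n >= 1024 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n} {UNITS[unit]}"


for cpu, lines in [("8088", 20), ("286", 24), ("386", 32), ("Pentium Pro", 36)]:
    print(f"{cpu}: {lines} address lines -> {human_size(2 ** lines)}")
```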
The Intel Pentium MMX (P55C) processor was announced (quietly) in January
1997, followed by an uproar from consumers who had not been given the new
processor in PCs purchased before Christmas. MMX (usually expanded as
Multimedia Extensions, sometimes as Matrix Math Extensions) adds instructions
that apply one operation to several small data items packed into a 64 bit
register, and will be the subject of another guide. Multimedia extensions
enhance audio, video playback and graphics performance. All later Intel
processors support the MMX extensions. The Pentium MMX was also the last
Intel desktop processor to be mounted in the Socket 7 ZIF socket on the
motherboard (presumably so that Intel could patent the design of its
replacement and stop AMD from taking over the market as the most popular
desktop processor).
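The packed-data idea behind MMX can be imitated in ordinary code with a trick known as SWAR (SIMD within a register). The Python sketch below adds eight 8 bit values held in one 64 bit integer, wrapping each lane independently as the MMX PADDB instruction does. It is an emulation for illustration, not the MMX instruction itself, and the helper names are hypothetical. MMX also has saturating variants (such as PADDUSB) that clamp at 255 instead of wrapping, which this sketch does not model.

```python
HIGH_BITS = 0x8080808080808080  # top bit of each 8 bit lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F   # lower 7 bits of each lane


def packed_add_bytes(a, b):
    """Add the eight 8 bit lanes of two 64 bit values independently, wrapping
    each lane modulo 256 without letting a carry spill into the next lane."""
    # Adding only the low 7 bits of each lane can never carry out of the lane.
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    # The top bit of each lane is the XOR of both top bits and the carry into it;
    # the carry out of the lane is simply discarded.
    return low_sum ^ ((a ^ b) & HIGH_BITS)


def pack(lanes):
    """Pack eight byte values (lane 0 first) into one 64 bit integer."""
    return int.from_bytes(bytes(lanes), "little")


def unpack(value):
    """Split a 64 bit integer back into its eight byte lanes (lane 0 first)."""
    return list(value.to_bytes(8, "little"))


a = pack([250, 1, 2, 3, 4, 5, 6, 7])
b = pack([10, 1, 1, 1, 1, 1, 1, 1])
print(unpack(packed_add_bytes(a, b)))  # [4, 2, 3, 4, 5, 6, 7, 8]
```

Lane 0 wraps (250 + 10 = 260, which is 4 modulo 256) while the other lanes are unaffected: eight additions done by one 64 bit operation.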
The Pentium II processor from Intel includes MMX instructions (which enhance
multimedia performance) and has 32 KB of on-chip L1 cache. The L2 cache is
mounted on a small circuit board along with the CPU inside a cartridge,
connected to the CPU by the DIB (Dual Independent Bus), and the cartridge
fits into a slot on the motherboard. The Pentium II processor includes SMP
(Symmetric Multi-Processor) support for two CPUs through the GTL+ bus and
uses two MMX execution units; the secondary cache is supplied with ECC.
Pentium Pro and Pentium II processors contain a bug in the FPU
(Floating Point Unit): the conversion of certain large negative numbers
into integers sometimes fails to detect an overflow. Software workarounds
are available.
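A software workaround of this kind generally means not relying on the FPU to flag the overflow, and instead checking the value's range before converting. The Python sketch below shows the idea for a 16 bit integer target. It is a generic range check under that assumption, not Intel's published workaround, and the function name is illustrative.

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def float_to_int16(value):
    """Convert a float to a 16 bit integer, raising OverflowError when it does not
    fit, instead of trusting the hardware to signal the overflow."""
    rounded = round(value)  # rounds half to even, like the FPU's default mode
    if not INT16_MIN <= rounded <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16 bit integer")
    return rounded


print(float_to_int16(-32768.4))  # -32768
print(float_to_int16(2.5))       # 2
```

Large negative values just past the limit, such as -32768.6, are the kind of input the range check rejects explicitly.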
In February 1999, Intel unveiled its latest processor, the Pentium III.
In addition to being faster than the Pentium Pro and Pentium II processors,
the Pentium III has many new features, including a unique processor
ID and new processor instructions, the Streaming SIMD Extensions, or SSE.
SIMD stands for Single Instruction Multiple Data, the capability to process
more than one data element with one instruction. Although SSE adds new
features, existing applications are not affected. These new instructions do
for the Pentium II design what MMX did for the Pentium.
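SSE works on 128 bit registers holding four single-precision floating point numbers, so one packed add (the ADDPS instruction) performs four additions. In C this is available as the `_mm_add_ps` intrinsic from `xmmintrin.h`. The Python sketch below only models the behaviour, using the standard `struct` module to round values to single precision; the function names are illustrative.

```python
import struct


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def addps(a, b):
    """Model of one SSE packed add: four single-precision lanes added by one operation."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("an SSE register holds exactly four single-precision values")
    return [to_f32(to_f32(x) + to_f32(y)) for x, y in zip(a, b)]


print(addps([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))  # [1.5, 2.5, 3.5, 4.5]
```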
Now the Intel Pentium 4 is here. Despite its 42 million transistors, the
P4 is not much faster than a Pentium III for general purposes, but in time
it will reach clock speeds that the Pentium III could never match.
For the first time since the Pentium Pro processor, Intel has redesigned
its microprocessor architecture, adding features that it says will
allow it to deliver leading performance for several years.
The competition
To further complicate processor options, several manufacturers introduced
clone processors. AMD (Advanced Micro Devices) produced 286 processors
under license from Intel and then claimed the license covered 386 and
486 designs. Up until the introduction of the K5 (a Pentium equivalent),
there were no real performance or functionality gains over Intel's processors.
The K5 is not a clone of the Intel Pentium and claims performance gains
from its superscalar design, such as dual pipelines, branch prediction and
execution in anticipation of a branch. AMD introduced the K6 in 1997; it
supports MMX instructions like the Intel Pentium MMX but has 64 KB of L1
cache, twice as much. The chip fits into existing processor sockets
(Socket 7) on the motherboard, unlike Intel's Pentium II, which needed
a new motherboard. The AMD K6-2 is similar to the K6 except that it offers
3DNow! technology, AMD's extension of the MMX idea to floating point data.
The K6-2 was reported to outperform a Pentium II of equivalent clock speed
in some tests, and it brought the 100 MHz FSB (front side bus) to Socket 7
motherboards. The K6-2 should work in any Socket 7 system that supports the
K6, provided the motherboard can supply the lower voltage the K6-2 requires.
The K6-2 is the best performing member of the Pentium-compatible family of
Socket 7 processors. The K6-III is a higher performance version of the K6-2,
thanks to its tri-level cache design and improved manufacturing process.
The K6-III is roughly comparable in performance to the Pentium II. I do
not have a lot of details about the AMD processors at the time of writing.
Cyrix designed their processors from the ground up, using none of Intel's
technology. Its initial 386 and 486 designs are not actual clones
of Intel's processors, but hybrids. All the designs use a 486-like processor
core with a five stage pipeline, which overlaps the execution of several
instructions so that, ideally, one completes every clock cycle. However, a
smaller data and instruction cache was added to the chip. Cyrix also
licensed its processor design to IBM, SGS-Thomson and Texas Instruments.
Texas Instruments also developed its own version of the 486 processor with
larger caches and PCI bus interfaces built in.
The Maths Co-Processor
This processor goes by several names: the coprocessor, the math coprocessor,
the floating point processor and the NPX (Numeric Processor Extension).
The main processor can only work directly with whole (integer) numbers.
Mathematical functions perform calculations on numbers in non-integer
(floating point) format, so Intel introduced the maths coprocessor, capable
of performing numeric operations 20 to 100 times faster than equivalent
software routines using integer arithmetic. The trend is to integrate the
math coprocessor on the same chip as the integer processor. In the past
Intel based computers were slow at floating point compared to RISC
workstations, but with the Pentium processor Intel redesigned the floating
point unit, so its performance is 5 to 10 times that of the 486 and
competitive with RISC workstations. The math coprocessor is also capable of
handling integer and packed decimal (BCD) data. It can read and write data
in several formats, but internally all data is represented as temporary real
numbers, a standard 80 bit format. To software, the coprocessor appears as
additional registers, data types and instructions. The coprocessor has a
number of built-in constants, such as pi, instructions for functions such as
sine, cosine and tangent, and arithmetic functions in addition to add and
subtract.
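The 80 bit temporary real format consists of a sign bit, a 15 bit exponent biased by 16383, and a 64 bit significand whose top bit is an explicit integer bit. The Python sketch below decodes a 10 byte little-endian value in that layout, for normal numbers and zero only. The function name is illustrative, and converting to a Python float (a 64 bit double) rounds away the extra precision of the 80 bit format.

```python
import math


def decode_temp_real(raw):
    """Decode a 10 byte little-endian 80 bit extended ("temporary real") value.
    Only zero and normal numbers are handled; denormals, infinities and NaNs are not."""
    if len(raw) != 10:
        raise ValueError("a temporary real is exactly 10 bytes")
    mantissa = int.from_bytes(raw[:8], "little")  # 64 bits, integer bit at bit 63
    sign_exp = int.from_bytes(raw[8:], "little")  # 1 sign bit + 15 exponent bits
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF                  # biased by 16383
    if exponent == 0 and mantissa == 0:
        return sign * 0.0
    if exponent in (0, 0x7FFF):
        raise ValueError("denormals, infinities and NaNs are not handled in this sketch")
    # value = mantissa * 2^(exponent - 16383 - 63); the result is rounded to a double.
    return sign * math.ldexp(mantissa, exponent - 16383 - 63)


one = (0x8000000000000000).to_bytes(8, "little") + (0x3FFF).to_bytes(2, "little")
minus_three = (0xC000000000000000).to_bytes(8, "little") + (0xC000).to_bytes(2, "little")
print(decode_temp_real(one), decode_temp_real(minus_three))  # 1.0 -3.0
```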
The Future
VLIW (Very Long Instruction Word) processors receive several instructions
packed into a single instruction word by the compiler, and execute them on
a set of parallel execution units simultaneously. Most programs process
blocks of instructions between branches, typically small blocks. If the
software is compiled so that instructions and tasks are arranged such that
each very long instruction word contains no branching, the pipelines will
not stall. This moves the work of finding instructions that can run in
parallel from the processor hardware to the compiler.
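The key property of a VLIW bundle is that its operations read their inputs and write their results as if they all ran in the same clock cycle. The toy Python model below illustrates that; the bundle format, register names and `execute_bundle` function are hypothetical, and real VLIW machines have a fixed number of slots tied to specific kinds of execution unit.

```python
from typing import Callable, Dict, List, Tuple

# One operation in a bundle: (destination register, function, source registers).
Op = Tuple[str, Callable[..., int], Tuple[str, ...]]


def execute_bundle(registers: Dict[str, int], bundle: List[Op]) -> None:
    """Execute every operation of one very long instruction word "in the same cycle":
    all source registers are read first, then all results are written, as if the
    operations ran side by side on parallel execution units."""
    destinations = [dest for dest, _, _ in bundle]
    if len(set(destinations)) != len(destinations):
        # A real VLIW compiler never packs conflicting writes into one bundle.
        raise ValueError("two operations in one bundle write the same register")
    results = [(dest, fn(*(registers[s] for s in srcs))) for dest, fn, srcs in bundle]
    for dest, value in results:
        registers[dest] = value


regs = {"r0": 5, "r1": 7, "r2": 0}
bundle = [
    ("r0", lambda x: x, ("r1",)),
    ("r1", lambda x: x, ("r0",)),
    ("r2", lambda x, y: x + y, ("r0", "r1")),
]
execute_bundle(regs, bundle)
print(regs)  # {'r0': 7, 'r1': 5, 'r2': 12}
```

Because all sources are read before any result is written, the first two operations swap r0 and r1 without a temporary register, and the add sees the old values.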
To be continued...