The Red Hill CPU Guide: transition to the 386

Now and then you will see one of these ancient chips still in service. In theory, you could run Windows 98 on any of them, though in practice they were really only capable of running DOS. Even Windows 3.1 was often a bit too much for them, though that's often what people bought them for.

386DX-25

Believe it or not, the illustration at right is just a small portion of an early 386 main board. The entire board has well over 100 separate major components — i.e., IC chips and clock crystals: that's not counting resistors, capacitors, connectors, jumpers, expansion slots or RAM. Current main boards use just a half-dozen at most, and the core logic uses only one or two.

It seems hard to believe now, but even leaving that huge main board aside, the 386DX-25 CPU was very, very expensive when it first came out. This is almost always the case with the fastest parts, and these were the fastest chips on the planet in their day. As so often with leading-edge parts, they sold in tiny numbers. We sold a couple of ALR DX-25 machines back in 1989 for about $5000 each, without monitor. As a trade-in now, they'd not be worth five dollars; indeed, we would have to pay someone to take them away.

The reason that their leading-edge equivalent today, an Athlon XP or a Pentium 4, costs less than a quarter as much is not technical progress (though the progress of the hardware side of the industry truly has been massive), nor is it manufacturing efficiency (though this too has improved a great deal): it is competition. In 1989 if you wanted a powerful X86 computer CPU there was only one supplier, and the price was whatever they cared to make it.

It is very different now, of course, and it was with the now-humble but then leading-edge 386DX-25 that AMD made their entry into the X86 business as a competitor in their own right (rather than as a licensed second source for Intel-designed parts). AMD's 386 was partly built under license, partly their own design, and wholly loathed by the former monopolist.

You often used to hear people dismiss the so-called clones as "mere imitation". Imitation maybe — "mere", certainly not! In many ways, designing a pin-compatible part with equal performance was more difficult than designing the original, not less. If designing "clone" CPUs was as easy as manufacturing them, dozens of very capable Asian silicon foundries would have long since swamped the market. Think about it: when was the last time you bought a RAM chip made in the USA or Europe?

Form	Design	Manufacture	Introduction	NPU
132-pin PGA	Intel	Intel	April 1988	387
132-pin PGA	Intel, AMD	AMD	March 1991	387
Internal clock	External clock	L1 cache	Width	Transistor count
25MHz	25MHz	none	32-bit	275 thousand

386SX-33

Probably the most common 386SX and, the rare SX-40 aside, certainly the best of them. SX-33s were always vastly faster than SX-25s. Although the raw processing power was not terribly different, there was the 33MHz bus speed to take into account also.

To get a grasp on just how much faster different speed grades were, consider what you would have to do to a fairly recent machine to achieve the same relative improvement.

Just jumping the 386SX from 25 to 33MHz increased three things: CPU clock, mainboard bus, and RAM speed too. As it happens, if you multiply the differences out you'll discover that you get almost exactly the same proportional improvement by upgrading an Athlon Thunderbird 1000 on a 200MHz board with old PC-100 RAM to a Thunderbird 1333 on a 266MHz board with PC-133 RAM.

That step from SX-25 to SX-33, in other words, provided the same amount of extra performance that you needed a mainboard upgrade, new RAM, and no less than five CPU speed grades to get ten years later. In reality, it was even more of a boost than this, because the machines of the 21st Century are so fast that we human beings are oblivious to a good many of the differences between them.
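The comparison above is easy to check with a few lines of arithmetic. This is only a sketch using the nominal figures quoted in the text; the variable names are ours, and it relies on the point made above that the 386SX's bus and RAM ran at the same speed as the CPU.

```python
# Speed ratios for the two upgrades described above (nominal figures only).
# On the 386SX the CPU, mainboard bus and RAM all ran at the same clock,
# so a single ratio covers all three.
sx_step = 33 / 25  # SX-25 -> SX-33

athlon_step = {
    "cpu": 1333 / 1000,  # Thunderbird 1000 -> Thunderbird 1333
    "bus": 266 / 200,    # 200MHz board -> 266MHz board
    "ram": 133 / 100,    # PC-100 -> PC-133
}

print(f"386SX, all three: {sx_step:.2f}x")
for part, ratio in athlon_step.items():
    print(f"Athlon {part}: {ratio:.2f}x")
```

Each of the three Athlon-era ratios comes out at roughly one third faster, which is the same step the 386SX made in a single jump.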

Oddly enough, although our illustration is an AMD part, most of the SX-33s you'd see around were Intel manufactured. In contrast, a great many 386SX-25s and all 386DX-40s were AMD made. We assume that the reason is that by the time the SX-33 came into fashion AMD were concentrating on DX-40 production as their main priority, while Cyrix were still just getting started with the SLC and DLC, which left Intel as the only serious supplier of the part.

Form	Design	Manufacture	Introduction	NPU
100-pin QFP	Intel	Intel	January 1989	387SX
100-pin QFP	Intel, AMD	AMD	March 1991	387SX
Internal clock	External clock	L1 cache	Width	Transistor count
33MHz	33MHz	none	16/32-bit hybrid	275 thousand

Cyrix 486SLC-33

A 486 in name only. These were designed to be a plug-in replacement for 386SX chips for people unlucky enough to own non-standard, non-upgradable computers with a proprietary motherboard.

Unfortunately, they sometimes found their way into new systems too, mostly cut-price models from the most unscrupulous of the mass-market makers, notably Amstrad and Commodore. With their never-ending need to attract ill-educated buyers, the supermarket box builders found the SLC ideal: it gave them the opportunity to make an overpriced but still fairly cheap and very slow system that could wear that magic 486 badge on its shelf talker.

(Intel's Celeron 266, by the way, was a more recent equivalent. It too was a huge success in the vomit box market, and it too was rightly scorned by more discerning buyers and went nowhere in the mainstream.)

In fact, you were vastly better off with a good 386 than with a Cyrix 486SLC, especially if the 386 was a DX-40. In this instance it's not really fair to blame Cyrix for the poor performance — these 16/32-bit hybrid parts were always intended to be an upgrade chip, not a stand-alone CPU.

But despite their poor performance, they did introduce an important new technology to X86 CPUs: write-back cache. (More on cache shortly.) Cache design was to become a Cyrix specialty in later years. Note that there were two completely different 486SLC chips: the enormously faster IBM one came along later and is described below.

Form	Design	Manufacture	Introduction	NPU
100-pin QFP	Cyrix	Texas Instruments	April 1992	387SX
Internal clock	External clock	L1 cache	Width	Transistor count
33MHz	33MHz	1k unified	16/32-bit hybrid	600 thousand

Intel 486SX-20

Thankfully, the SX-20 was a very rare chip.

SX-25s and 33s accounted for the vast majority of 486SX sales, and the only SX-20 we ever saw was glacially slow. Any half-good 386 would blow it into the weeds, even an SX.

The introduction of the 486SX-20 was driven by marketing rather than technical considerations. AMD's 386DX-40 had been released a month earlier and was vastly faster. Intel's 486SX-20 was simply a cynical attempt to hoodwink buyers into spending quite a lot more on a chip that did quite a lot less. Like the Cyrix 486SLC-33 above, it was designed to cash in on the marketing appeal of the 486 name, and hopelessly incapable of providing 486-class performance, but unlike the SLC (which could at least claim that it was useful as an upgrade part) there was no possible excuse for the SX-20, and only one reason for it to exist at all: naked greed.

Form	Design	Manufacture	Introduction	NPU
168-pin PGA	Intel	Intel	April 1991	none
Internal clock	External clock	L1 cache	Width	Transistor count
20MHz	20MHz	8k	32-bit	900 thousand

Intel 486-20

The 486 didn't introduce whole new programming instructions like the 386, but it was a big step just the same. Essentially, a 486 is a very fast 32-bit 386DX CPU, and a 387 NPU, and 8k of cache RAM all on a single chip. By putting the cache and the NPU on the CPU chip, Intel were able to greatly speed communication between the three. Also, the 486 introduced mainframe techniques like pipelining into single chip microprocessors for the first time. With all these different components to fit onto one chip, it is no wonder that the 486 was physically much bigger than a 386. (See the illustration a few entries below.) It was very difficult to manufacture at first, and probably only Intel had the resources to achieve it. As for the original 486-20 (since renamed the 486DX-20), it was very rare indeed and we never saw one in the flesh.

As with almost all new CPU designs, early models of the 486 were very little faster than the best of the previous generation, and vastly more expensive. It is usually only when a design is in its second or third revision (e.g. 486DX/2, Pentium-75 and up, 386DX-40) that it becomes powerful, reliable, and cost-effective.

Form	Design	Manufacture	Introduction	NPU
168-pin PGA	Intel	Intel	April 1989	Internal
Internal clock	External clock	L1 cache	Width	Transistor count
20MHz	20MHz	8k unified	32-bit	1.2 million

Pipelining

The 486 brought a number of mainframe techniques into the X86 world for the first time: internal cache, rudimentary branch prediction, integrated NPU, and a five-stage pipeline.

To understand pipelining, imagine putting out a fire with buckets of water. If you are alone, you have to walk to the well, fill the bucket, walk to the fire, empty the bucket, walk back to the well, and so on. This is how a 386 works, going to the RAM, getting the next instruction, decoding it, running it, going back to the RAM for the next one.

Now imagine you have four or five people to help put the fire out. One fills the buckets, one empties them onto the fire, the others pass the buckets along. You are still only pouring one bucket at a time, but it's much faster. CPU pipelining works the same way. Instructions are loaded one after another into the RAM end of the pipeline and the CPU just takes them from the other end as needed.
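The bucket brigade can be put into numbers with a very simple model. The sketch below assumes every stage takes exactly one clock cycle and that nothing ever stalls or branches, so it is an idealised illustration rather than a cycle-accurate description of any real chip. The function names are ours.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """One worker carries each instruction through every stage in turn."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int) -> int:
    """Each stage works on a different instruction at the same time.

    The first instruction needs `stages` cycles to reach the far end;
    after that, one instruction completes on every cycle.
    """
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


n, depth = 1000, 5  # five stages as on the 486; 1000 instructions is arbitrary
print(cycles_unpipelined(n, depth))  # 5000
print(cycles_pipelined(n, depth))    # 1004
```

Under these assumptions the pipelined version approaches a five-fold speed-up as the run of instructions gets longer, even though each individual instruction still takes five cycles from start to finish.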

All modern CPUs are heavily pipelined. As a rough guide, the longer the pipeline, the easier it is to ramp up to higher clockspeeds, but the more severe the penalty each time the program branches and the CPU has to empty out the pipeline and start again with fresh data. This was one of the most significant differences between the Athlon XP and the equivalent Pentium 4: the Athlon had a long pipeline and operated comfortably in the 2000MHz range. The Pentium 4 had a very long pipeline and easily reached 3000MHz, but took longer to recover each time the pipeline had to be emptied and refilled; so much longer, in fact, that it was slower than an Athlon in general use. But where the P4 was presented with a long sequence of tasks that could be executed one after the other without interruption, it could be astonishingly fast.

Think of the difference, if you like, as being akin to the difference between a high-performance sports car (the Athlon) and a drag racer (the Pentium 4). In general, the sports car is more able to cope with the twists and turns of everyday motoring. But on a long, straight road, the drag racer rules supreme.
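The trade-off between clock speed and pipeline length can be sketched with a toy model. Everything here is an assumption made for illustration: each instruction costs one cycle, a flush costs one full pipeline refill, and the pipeline depths and flush rates are invented round numbers, not measured figures for the Athlon or the Pentium 4.

```python
def time_per_instruction_ns(clock_mhz: float, pipeline_depth: int,
                            flush_rate: float) -> float:
    """Average nanoseconds per instruction under a toy model.

    Every instruction costs one cycle, and a fraction `flush_rate` of them
    also costs a complete refill of `pipeline_depth` cycles.
    """
    cycles = 1.0 + flush_rate * pipeline_depth
    return cycles * 1000.0 / clock_mhz


# Hypothetical chips: the depths below are illustrative only.
sports_car = dict(clock_mhz=2000, pipeline_depth=10)
drag_racer = dict(clock_mhz=3000, pipeline_depth=20)

for flush_rate in (0.0, 0.05, 0.20):
    a = time_per_instruction_ns(flush_rate=flush_rate, **sports_car)
    b = time_per_instruction_ns(flush_rate=flush_rate, **drag_racer)
    print(f"flush rate {flush_rate:.2f}: "
          f"shorter pipeline {a:.3f} ns, longer pipeline {b:.3f} ns")
```

With these made-up numbers the deeper, faster-clocked design wins while flushes are rare and falls behind once more than about one instruction in ten forces a refill. The real chips differed in many other ways, so treat this as a picture of the principle only.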

AMD 386SX-40

These came out just as the 386SX market was dying, so you're unlikely to have seen one. We sold maybe four or five of them in total. By the time they arrived, AMD was mainly interested in building its 386DX-40 market and didn't seem to get around to the SX-40 until it was too late to matter much. As you'd expect from the clock speed, they were very fast for an SX.

Form	Design	Manufacture	Introduction	NPU
100-pin QFP	AMD	AMD	1992	387SX
Internal clock	External clock	L1 cache	Width	Transistor count
40MHz	40MHz	none	16/32-bit hybrid	275 thousand

386DX-33

This was the high-performance chip of choice for a long, long time. As always with flagship chips, it was expensive and sold in very small numbers for the first half of its market life, but unlike many flagship chips the DX-33 never really became a volume seller, even after it had been eclipsed by bigger, faster, newer chips. In fact, it's probably more helpful to think of the DX-33 as two quite different chips: the original high-priced high-performer which was rare but much coveted, and the all but forgotten mid-priced part it became after about 1992.

The reason for the DX-33's unusual career was simple: in late 1991 AMD finally escaped from the long and bitter legal battle it had had with Intel and announced the AM386DX-33. This was a mixture of AMD's own design work and the Intel technologies covered under the existing ten year cross-license agreement. Essentially, Intel claimed that the technology sharing contracts they had signed with AMD in 1976 and 1982 allowed AMD to use Intel intellectual property to make 286 parts but nothing else. AMD, on the other hand, said something like: "a contract is a contract, and if you won't give us the designs for the 386, we will use those parts we already have and design the rest ourselves". After years in the courtrooms, AMD won the case, and this was the start of the massive change that swept across the entire industry in the early 1990s, to the enormous benefit of buyers both then and in years to come.

AMD made and sold a moderate number of its own 386DX-33s, but soon moved on to that all-time great chip, the 386DX-40. And this was why the Intel 386DX-33 sold so poorly from that time on: very few people were willing to pay more for a slower chip. For the first time since Z-80 days Intel found itself on the back foot, and resorted instead to the popular but lack-lustre 486SX. (It was no accident that the 486SX came out at the time it did.) Intel's DX-33 languished in the AMD chip's shadow in later life, and mainly appeared in low performance proprietary machines: Compaqs, Dells, and similar.

Form	Design	Manufacture	Introduction	NPU
132-pin PGA	Intel	Intel	April 1989	387
132-pin PGA	Intel, AMD	AMD	October 1991	387
Internal clock	External clock	L1 cache	Width	Transistor count
33MHz	33MHz	none	32-bit	275 thousand

486SX-25

The 486SX was a strange beast. Essentially, it was a 486DX with the maths co-processor disabled. This involved one extra step in the manufacturing process, so the 'cheap' 486SX actually cost more to make than the 'expensive' 486DX! The SX was developed mostly for marketing reasons — to give Intel a relatively cheap chip to equal or better the AMD 386DX-40 but not cut the heart out of their massively profitable flagship, the 486DX-33.

How could it be 'cheap' when it cost more to make than the DX? Because the actual manufacturing cost was so low anyway that they could still sell it at a good margin. An SX-33 used to retail for around A$200, the DX about twice that, but they were both understood to cost less than US$10 to make. So an incremental increase in manufacturing cost just wasn't significant. (Later on, they got around to making SX chips without co-pro, which saved them a few thousand transistors.)

It is not hard to see why Intel were always the single most profitable company in the computer hardware industry! Bear in mind, though, that they had to make massive allowances for non-production costs: research, new plant every few years, and marketing. Remember too that for every successful product, there are unsuccessful projects as well, and big hits like the 486 have to pay for the big misses too — yes, Intel have had their share of them.

486SX-25s were very often over-clocked and run at 33MHz. They were about $50 cheaper than an SX-33, and not supposed to be able to run at more than 25MHz, but they did, and still do. In fact, this is one of the very few chips we ever recommended overclocking as routine. (Along with the Celeron 300A and the K6-III/450+.) 486SX-25s always went well at 33MHz with a heat-sink and cooling fan, very nearly always with just a heat-sink, and usually proved trouble-free without any cooling at all. If you owned one, there was nothing to stop you cranking it up a bit.

If you think you owned an SX-33, you more than likely really owned one of these, and you couldn't easily tell by looking at it, because many thousands of them were re-marked in shady back street factories somewhere in South-East Asia. Sometimes you could see the difference: the re-marked ones tended to have a smoother, slightly shiny finish, and the etching was less deep than the real thing. In practice, it didn't matter. Again, this only applies to the SX-25/SX-33. Many other chips have been re-marked but they tend to be troublesome. Usually, chip manufacturers mark CPUs at the highest safe speed so that they can sell them for the maximum price. We understand that Intel marked a lot of 33MHz chips as 25MHz simply because they needed a way to compete with the AMD 386DX-40. Their own 386 only ran at 33MHz, and they wanted to keep the price of the 486-33 as high as they could. Hence the 486SX-25. Intel denied this, of course, but no-one believed them.

In later days, the market became better educated and the manufacturers were more relaxed about this sort of thing — late production Pentium MMX 166 chips, for example, were exactly the same as the 200 and 233 parts, but with the clock multiplier pin disabled so you couldn't run them at their best speed.

Form	Design	Manufacture	Introduction	NPU
168-pin PGA	Intel	Intel	September 1991	none
Internal clock	External clock	L1 cache	Width	Transistor count
25MHz	25MHz	8k unified	32-bit	0.9 or 1.2 million
