Athlon


Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices. The original Athlon was the first seventh-generation x86 processor and was the first desktop processor to reach speeds of one gigahertz. It made its debut on June 23, 1999. Over the years AMD has used the Athlon name with the 64-bit Athlon 64 architecture, the Athlon II, and Accelerated Processing Unit chips targeting the Socket AM1 desktop SoC architecture and the Socket AM4 Zen microarchitecture. The modern Zen-based Athlon with Radeon graphics was introduced in 2018, and the Athlon 3000G followed in 2019 as AMD’s highest-performance entry-level processor.
Athlon comes from the Ancient Greek ἆθλον, meaning "contest", "prize of a contest", or "place of a contest; arena". The Athlon name was originally used for AMD's mid-range processors; AMD currently uses it for budget APUs.

Brand history

K7 design and release

AMD founder Jerry Sanders aggressively pursued strategic partnerships and engineering talent in the late 1990s to build on earlier successes in the PC market with the AMD K6 line of processors. One major partnership announced in 1998 paired AMD with semiconductor giant Motorola to co-develop copper-based semiconductor technology, which resulted in the K7 project being the first commercial processor to utilize copper fabrication technology. In the announcement, Sanders referred to the partnership as creating a "virtual gorilla" that would enable AMD to compete with Intel on fabrication capacity while limiting AMD's financial outlay for new facilities.
The K7 design team was led by Dirk Meyer, who had worked as a lead engineer at DEC on multiple Alpha microprocessors. When DEC was sold to Compaq in 1998, the company discontinued Alpha processor development. Sanders approached many of the Alpha engineering staff as Compaq/DEC wound down their semiconductor business, and was able to bring in nearly all of the Alpha design team. The K7 engineering design team thus consisted of both the previously acquired NexGen K6 team and the nearly complete Alpha design team.
In August 1999, AMD released the Athlon processor.
By working with Motorola, AMD was able to refine copper interconnect manufacturing to the production stage about one year before Intel. The revised process permitted 180-nanometer processor production. The accompanying die-shrink resulted in lower power consumption, permitting AMD to increase Athlon clock speeds to the 1 GHz range. Yields on the new process exceeded expectations, permitting AMD to deliver high speed chips in volume in March 2000.
The Athlon architecture also used the EV6 bus licensed from DEC as its main system bus. Intel required licensing to use the GTL+ bus used by its Slot 1 Pentium II and later processors. By licensing the EV6 bus used by the Alpha line of processors from DEC, AMD was able to develop its own chipsets and motherboards, and avoid being dependent on licensing from its direct competitor.

Later Athlon name usage

After the 2007 launch of the Phenom processors, the Athlon name was also used for mid-range processors positioned above Sempron. The name was eventually repurposed for combined CPU/GPU processors with the GPU disabled.
A US$55 low-power Athlon 200GE with a Radeon graphics processor was introduced in September 2018, sitting under the Ryzen 3 2200G. With the release, AMD began using the Athlon brand name to refer to "low cost, high volume products," in a situation similar to both Intel's Celeron and Pentium Gold. The modern Athlon 3000G was introduced in 2019, and was positioned as AMD’s highest-performance entry-level processor.

Features of Athlon

General architecture

Like the AMD K5 and K6, the Athlon dynamically buffers internal micro-instructions at runtime resulting from parallel x86 instruction decoding. The CPU is an out-of-order design, again like previous post-5x86 AMD CPUs. The Athlon utilizes the Alpha 21264's EV6 bus architecture with double data rate technology. This means that at 100 MHz, the Athlon front side bus actually transfers at a rate similar to a 200 MHz single data rate bus, which was superior to the method used on Intel's Pentium III.
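The double data rate arithmetic described above can be sketched as follows. This is a minimal illustration rather than a model of the actual bus protocol; the function names are made up for this example, and the 64-bit bus width is an assumption that the passage does not state.

```python
def effective_rate_mt_s(bus_clock_mhz, transfers_per_clock):
    """Effective transfer rate in megatransfers per second (MT/s)."""
    return bus_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(bus_clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Theoretical peak bandwidth in MB/s; the 64-bit width is an assumption."""
    return effective_rate_mt_s(bus_clock_mhz, transfers_per_clock) * bus_width_bits // 8


# A 100 MHz double data rate bus moves data twice per clock cycle.
print(effective_rate_mt_s(100, 2))   # transfers per second, in millions
print(peak_bandwidth_mb_s(100, 2))   # theoretical peak, in MB/s
```

Under these assumptions a 100 MHz double data rate bus matches the transfer rate of a 200 MHz single data rate bus of the same width.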
AMD designed the CPU with more robust x86 instruction decoding capabilities than those of the K6, to enhance its ability to keep more data in-flight at once. The Athlon's three decoders could potentially decode three x86 instructions to six microinstructions per clock, although this was somewhat unlikely in real-world use. The critical branch predictor unit, essential to keeping the pipeline busy, was enhanced compared to what was on board the K6. Deeper pipelining with more stages allowed higher clock speeds to be attained. Whereas the AMD K6-III+ topped out at 570 MHz due to its short pipeline, even when built on the 180 nm process, the Athlon was capable of clocking much higher.
AMD ended its long-time handicap with floating point x87 performance by designing a super-pipelined, out-of-order, triple-issue floating point unit. Each of its three units was tailored to handle a particular type of instruction, with some redundancy between them. By having separate units, it was possible to operate on more than one floating point instruction at once. This FPU was a huge step forward for AMD. While the K6 FPU had looked anemic compared to the Intel P6 FPU, with Athlon this was no longer the case.
The 3DNow! floating point SIMD technology, again present, received some revisions and a name change to "Enhanced 3DNow!". Additions included DSP instructions and an implementation of the extended MMX subset of Intel SSE.
The Athlon's CPU cache consisted of the typical two levels. Athlon was the first x86 processor with a 128 KB split level 1 cache; a 2-way associative cache separated into 2×64 KB for data and instructions. This cache was double the size of K6's already large 2×32 KB cache, and quadruple the size of Pentium II and III's 2×16 KB L1 cache. The initial Athlon used 512 KB of level 2 cache separate from the CPU, on the processor cartridge board, running at 50% to 33% of core speed. This was done because the 250 nm manufacturing process was too large to allow for on-die cache while maintaining cost-effective die size. Later Athlon CPUs, afforded greater transistor budgets by smaller 180 nm and 130 nm process nodes, moved to on-die L2 cache at full CPU clock speed.

Athlon "Classic"

The AMD Athlon processor launched on June 23, 1999, with general availability by August 1999. It launched at 500 MHz and was, on average, 10% faster than the Pentium III at the same clock for business applications, and even faster for gaming workloads.
The Athlon Classic is a cartridge-based processor that plugs into a connector named Slot A, similar to Intel's cartridge Slot 1 used for Pentium II and Pentium III. It used the same commonly available physical 242-pin connector used by Intel Slot 1 processors, but rotated by 180 degrees to connect the processor to the motherboard. The reversal served to make the slot keyed to prevent installation of the wrong CPU, as the Athlon and Intel processors used fundamentally different signaling standards for their front-side bus. The cartridge assembly allowed the use of higher speed cache memory modules than could be put on motherboards at the time. Similar to the Pentium II and the Katmai-based Pentium III, the Athlon Classic contained 512 KB of L2 cache. This high-speed SRAM cache was run at a divisor of the processor clock and was accessed via its own 64-bit bus, known as a "back-side bus", allowing the processor to service system front-side bus requests and cache accesses simultaneously, versus the traditional approach of pushing everything through the front-side bus.
One limitation is that SRAM cache designs at the time were incapable of keeping up with the Athlon's clock scalability, due both to manufacturing limitations of the cache chips and the difficulty of routing electrical connections to the cache chips themselves. It became increasingly difficult, and eventually impossible, to reliably run an external processor cache at the processor speeds being released. Thus the Level 2 cache initially ran at half of the CPU clock speed, up to 700 MHz. Faster Slot A processors had to compromise further and run the cache at 2/5 or 1/3 of the CPU clock. The later race to 1 GHz by AMD and Intel further exacerbated this bottleneck, as ever higher speed processors demonstrated decreasing gains in overall performance because stagnant SRAM cache memory speeds choked further improvements in overall speed. This directly led to the integration of L2 cache onto the processor itself, removing the dependence on external cache chips. AMD's integration of the cache onto the Athlon processor itself would later result in the Athlon Thunderbird.
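To make the cache bottleneck concrete, the sketch below computes the external L2 clock for a few Slot A core speeds. The 1/2 divisor up to 700 MHz comes from the text; the exact clock at which the divisor changed from 2/5 to 1/3 is not given there, so the 850 MHz boundary used here is an assumption, as are the function names.

```python
from fractions import Fraction


def l2_divisor(core_mhz):
    """Return the L2 cache divisor for a Slot A Athlon at the given core clock."""
    if core_mhz <= 700:
        return Fraction(1, 2)   # stated in the text
    if core_mhz <= 850:         # assumed boundary between 2/5 and 1/3
        return Fraction(2, 5)
    return Fraction(1, 3)


def l2_clock_mhz(core_mhz):
    """External L2 cache clock in MHz for the given core clock."""
    return float(core_mhz * l2_divisor(core_mhz))


for core in (500, 700, 800, 1000):
    print(core, "MHz core ->", round(l2_clock_mhz(core)), "MHz L2")
```

Under these assumptions the cache clock stays in roughly the 320 to 350 MHz range from 700 MHz upward, even as the core clock rises by more than 40%.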
The Slot-A Athlons were the first multiplier-locked CPUs from AMD. This was partly done to hinder CPU remarking being done by questionable resellers around the globe. AMD's older CPUs could simply be set to run at whatever clock speed the user chose on the motherboard, making it trivial to relabel a CPU and sell it as a faster grade than it was originally intended. These relabeled CPUs were not always stable, being overclocked and not tested properly, and this was damaging to AMD's reputation. Although the Athlon was multiplier locked, crafty enthusiasts eventually discovered that a connector on the PCB of the cartridge could control the multiplier. Eventually a product called the "Goldfingers device" was created that could unlock the CPU, named after the gold connector pads on the processor board that it attached to.
In commercial terms, the Athlon "Classic" was an enormous success—not just because of its own merits, but also because Intel endured a series of major production, design, and quality control issues at this time. In particular, Intel's transition to the 180 nm production process, starting in late 1999 and running through to mid-2000, suffered delays. There was a shortage of Pentium III parts. In contrast, AMD enjoyed a remarkably smooth process transition and had ample supplies available, causing Athlon sales to become quite strong.
The Argon-based Athlon contained 22 million transistors and measured 184 mm². It was fabricated by AMD in a slightly modified version of their CS44E process, a 0.25 μm complementary metal–oxide–semiconductor process with six levels of aluminium interconnect. "Pluto" and "Orion" Athlons were fabricated in a 0.18 μm process.

Thunderbird

The second generation Athlon, the Thunderbird, debuted on June 5, 2000. This version of the Athlon shipped in a more traditional pin-grid array format that plugged into a socket on the motherboard. It was sold at speeds ranging from 600 MHz to 1.4 GHz. The major difference, however, was cache design. Just as Intel had done when they replaced the old Katmai-based Pentium III with the much faster Coppermine-based Pentium III, AMD replaced the 512 KB external reduced-speed cache of the Athlon Classic with 256 KB of on-chip, full-speed exclusive cache. As a general rule, more cache improves performance, but faster cache improves it further still.
AMD changed cache design significantly with the Thunderbird core. With the older Athlon CPUs, the CPU caching was of an inclusive design where data from the L1 is duplicated in the L2 cache. Thunderbird moved to an exclusive design where the L1 cache's contents are not duplicated in the L2. This increases total cache size of the processor and effectively makes caching behave as if there is a very large L1 cache with a slower region and a very fast region. Because of Athlon's very large L1 cache and the exclusive design, which turns the L2 cache into basically a "victim cache", the need for high L2 performance and size was lessened. AMD kept the 64-bit L2 cache data bus from the older Athlons, as a result, and allowed it to have a relatively high latency. A simpler L2 cache reduced the possibility of the L2 cache causing clock scaling and yield issues. Still, instead of the 2-way associative scheme used in older Athlons, Thunderbird did move to a more efficient 16-way associative layout.
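The capacity effect of the exclusive design can be illustrated with a short calculation. This is a simplified sketch that counts only distinct cached data; it assumes a strictly inclusive L2 holds a copy of everything in L1 and is at least as large as L1. The function name is illustrative.

```python
def unique_capacity_kb(l1_kb, l2_kb, exclusive):
    """Approximate amount of distinct data the L1 and L2 can hold together."""
    if exclusive:
        # No duplication: the two levels add up.
        return l1_kb + l2_kb
    # Inclusive: L1 contents are duplicated in L2, so L2 bounds the total.
    return max(l1_kb, l2_kb)


# Thunderbird: 128 KB L1 plus 256 KB exclusive L2.
print(unique_capacity_kb(128, 256, exclusive=True))
# The same sizes under a hypothetical inclusive policy.
print(unique_capacity_kb(128, 256, exclusive=False))
# Athlon Classic: 128 KB L1 plus 512 KB inclusive, reduced-speed L2.
print(unique_capacity_kb(128, 512, exclusive=False))
```

With the same cache sizes, the exclusive policy yields 384 KB of distinct data instead of 256 KB, which is why a smaller on-die L2 was an acceptable trade.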
The Thunderbird was AMD's most successful product since the Am386DX-40 ten years earlier. Mainboard designs had improved considerably by this time, and the initial trickle of Athlon mainboard makers had swollen to include every major manufacturer. AMD's new fab in Dresden came online, allowing further production increases, and the process technology was improved by a switch to copper interconnects. In October 2000, the Athlon "C" was introduced, raising the mainboard front-side bus speed from 100 MHz to 133 MHz and providing roughly 10% extra performance per clock over the "B" model Thunderbird.

Palomino

AMD released the third-generation Athlon, code-named "Palomino", on October 9, 2001 as the Athlon XP. The "XP" suffix is interpreted to mean extended performance and also as an unofficial reference to Microsoft Windows XP. The Athlon XP was marketed using a performance rating (PR) system, which compared its relative performance to an Athlon utilizing the earlier "Thunderbird" core. Athlon XP launched at speeds between 1.33 GHz and 1.53 GHz, giving AMD the x86 performance lead with the 1800+ model. Less than a month later, AMD extended that lead with the release of the 1.6 GHz 1900+, followed by the 1.67 GHz Athlon XP 2000+ in January 2002.
Palomino was the first K7 core to include the full SSE instruction set from the Intel Pentium III, as well as AMD's 3DNow! Professional. It is roughly 10% faster than Thunderbird at the same clock speed, thanks in part to the new SIMD functionality and to several additional improvements. The core has enhancements to the K7's TLB architecture and added a hardware data prefetch mechanism to take better advantage of available memory bandwidth. Palomino was also the first socketed Athlon officially supporting dual processing, with chips certified for that purpose branded as the Athlon MP. According to articles posted on HardwareZone, it was possible to mod the Athlon XP to function as an MP by connecting some fuses on the OPGA, although results varied with the motherboard used.
Changes in core layout also resulted in Palomino being more frugal with its electrical demands, consuming approximately 20% less power than its predecessor, and thus reducing heat output comparatively as well. While the preceding Athlon "Thunderbird" was capable of clock speeds exceeding 1400 MHz, the power and thermal considerations required to reach those speeds would have made it increasingly impractical as a marketable product. Thus, Palomino's goals of lowered power consumption allowed AMD to increase performance within a reasonable power envelope. Palomino's design also allowed AMD to continue using the same 180 nm manufacturing process node and core voltages as Thunderbird.
The Palomino core debuted earlier in the mobile market—branded as Mobile Athlon 4 with the codename "Corvette". It distinctively used a ceramic interposer much like the Thunderbird instead of the organic pin grid array package used on all later Palomino processors.

Thoroughbred

The fourth generation of Athlon was introduced with the Thoroughbred core, released on June 10, 2002 at 1.8 GHz. The "Thoroughbred" core marked AMD's first production 130 nm silicon, and gave a significant reduction in die size compared to its 180 nm predecessor.
There came to be two steppings of this core commonly referred to as Tbred-A and Tbred-B. The initial version was mostly a direct die shrink of the preceding Palomino core with minimal design changes, and demonstrated that AMD had successfully transitioned to a 130 nm process with production ready yields. However, while successful in reducing the production cost per processor, the unmodified Palomino design did not demonstrate the expected reduction in heat and clock scalability usually seen when a processor design is moved to a smaller process. As a result, AMD was not able to increase Thoroughbred-A clock speeds much above those of the Palomino it was meant to replace. Tbred-A was only sold in versions from 1333 MHz to 1800 MHz, and mostly only to displace existing speeds of the more production-costly Palomino from AMD's lineup.

Thoroughbred B

AMD thus reworked the Thoroughbred's design to better match the process node on which it was produced, creating a revised core that then became known as Thoroughbred-B. A significant aspect of this redesign was the addition of a ninth "metal layer" to the already quite complex eight-layered Thoroughbred-A. For comparison, the competing Pentium 4 Northwood only utilized six, and its successor Prescott seven layers. While the addition of more layers itself does not improve performance, it gives more flexibility for chip designers routing electrical pathways within a chip, and importantly for the Thoroughbred core, more flexibility in working around logic and power bottlenecks preventing the processor from attaining higher clock speeds. The resulting Tbred-B offered a startling improvement in headroom over the Tbred-A, which made it very popular for overclocking. The Tbred-A often struggled to reach clock speeds above 1.9 GHz, while the Tbred-B often could easily reach 2.3 GHz and above.
The Thoroughbred line received an increased front side bus clock during its lifetime, from 133 MHz to 166 MHz, which improved the processor's memory access and I/O efficiency and resulted in improved per-clock performance. AMD shifted their PR rating scheme accordingly, making lower clock speeds equate to higher PR ratings.
The Thoroughbred-B was the direct basis for its successor—the Tbred-B with an additional 256 KB of L2 cache became the Barton core.

Barton and Thorton

Fifth-generation Athlon Barton-core processors were released in early 2003 with PR ratings of 2500+, 2600+, 2800+, 3000+, and 3200+. While not operating at higher clock rates than Thoroughbred-core processors, they were given higher PR ratings on account of their increased 512 KB L2 cache; later models additionally supported an increased 200 MHz front side bus. The Thorton core was a later variant of the Barton with half of the L2 cache disabled, and thus was functionally identical to the Thoroughbred-B core. The name Thorton is a portmanteau of Thoroughbred and Barton.
By the time of Barton's release, the Northwood-based Pentium 4 had become more than competitive with AMD's processors. Unfortunately for AMD, a simple increase in size of the L2 cache to 512 KB did not have nearly the same impact as it did for Intel's Pentium 4 line, as the Athlon architecture was not nearly as cache-constrained as the Pentium 4. The Athlon's exclusive-cache architecture and shorter pipeline made it less sensitive to L2 cache size, and the Barton only saw an increase of several percent in per-clock performance over the Thoroughbred-B it was derived from. While the increased performance was welcome, it was not sufficient to overtake the Pentium 4 line in overall performance. The PR rating also became somewhat inaccurate because some Barton models with lower clock rates were given higher ratings than higher-clocked Thoroughbred processors. In tasks that did not benefit enough from the additional cache to make up for the loss in raw clock speed, a lower-rated Thoroughbred could outperform a higher-rated Barton.
The Barton was also used to officially introduce a higher 400 MT/s bus speed for the Socket A platform, which gave some Barton models additional efficiency. However, it was clear by this time that Intel's quad-pumped bus was scaling well above AMD's double-pumped EV6 bus. The 800 MT/s bus used by many later Pentium 4 processors was well out of the Athlon XP's reach. In order to reach the same bandwidth levels, the Athlon XP's bus would have had to be clocked at levels that were simply unreachable.
By this point, the four-year-old Athlon EV6 bus architecture had scaled to its limit. To maintain or exceed the performance of Intel's newer processors would require a significant redesign. The K7-derived Athlons were replaced in September 2003 by the Athlon 64 family, which featured an on-chip memory controller and a completely new HyperTransport bus to replace EV6.
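The gap between the two bus designs can be shown by computing the base clock each would need for a given transfer rate. This is a rough sketch; "double-pumped" and "quad-pumped" are taken to mean two and four transfers per clock, and the function name is illustrative.

```python
def required_bus_clock_mhz(target_mt_s, transfers_per_clock):
    """Base bus clock in MHz needed to reach a target transfer rate in MT/s."""
    return target_mt_s / transfers_per_clock


# Barton's 400 MT/s on the double-pumped EV6 bus.
print(required_bus_clock_mhz(400, 2))
# Intel's 800 MT/s on a quad-pumped bus.
print(required_bus_clock_mhz(800, 4))
# What the double-pumped EV6 bus would need to match 800 MT/s.
print(required_bus_clock_mhz(800, 2))
```

Both shipping buses run from a 200 MHz base clock, but matching 800 MT/s on a double-pumped bus would require doubling that base clock to 400 MHz.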

Mobile Athlon XP

A Mobile Athlon XP using a given core is physically identical to the equivalent desktop Athlon XP, differing only in the configuration used to achieve a given performance level. Processors are usually binned and selected to become mobile processors by their ability to run at a given processor speed while supplied with a lower voltage. This results in lower power consumption, longer battery life, and reduced heat compared with a normal desktop part. Additionally, Mobile XPs are not multiplier-locked and generally have higher-rated maximum operating temperatures, features intended for better operation within the tight thermal constraints of a notebook PC, but which also make them attractive for overclocking.
The Athlon XP-M replaced the older Mobile Athlon 4 based on the Palomino core, with the Athlon XP-M using the newer Thoroughbred and Barton cores. The Athlon XP-M was also offered in a compact microPGA Socket 563 version for space-constrained applications as an alternative to the larger Socket A.
Like their mobile K6-2+/III+ predecessors, the CPUs were capable of dynamic clock adjustment for power optimization, which was also the reason for the unlocked multiplier. When the system is idle, the CPU clocks itself down via a lower bus multiplier and selects a lower voltage. When a program demands more computational resources, the CPU quickly returns to an intermediate or maximum speed with appropriate voltage to meet the demand. This technology was marketed as "PowerNow!" and was similar to Intel's SpeedStep power saving technique. The feature was controlled by the CPU, motherboard BIOS, and operating system. AMD later renamed the technology to Cool'n'Quiet on their K8-based CPUs, and introduced it for use on desktop PCs as well.
Athlon XP-Ms were popular with desktop overclockers, as well as underclockers. The lower voltage requirement and higher heat rating meant that these CPUs were essentially "cherry picked" from the manufacturing line. Being some of the best cores "off the line", these CPUs typically overclocked more reliably than their desktop-headed counterparts. Also, the fact that they were not locked to a single multiplier was a significant simplification in the overclocking process. Some Barton core Athlon XP-Ms have been successfully overclocked as high as 3.1 GHz.
The chips were also liked for their undervolting ability. Undervolting is a process of determining the lowest voltage at which a CPU can remain stable at a given clock speed. As Athlon XP-M CPUs were already rated to run at lower voltages than their desktop siblings, they were a better starting point for lowering voltage even further. A popular application was use in home theater PC systems, due to the high performance and low heat output resulting from low Vcore settings.
Besides not being multiplier locked, XP-Ms curiously were not disabled from multiprocessor operation. Thus they could be used in place of the more expensive Athlon MP in dual Socket A motherboards. Since those boards generally lacked multiplier and voltage adjustment, and normally only supported a 133 MHz FSB, adjustments would still be needed for full speed operation. One method of modification, known as wire-modding, involves connecting the appropriate CPU pins on the CPU socket with small lengths of wire to select the appropriate multiplier. A typical overclock of a mobile 2500+ CPU to 2.26 GHz with a 17x multiplier would make it faster than the highest official 2800+ MP CPU running at 2.13 GHz.
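The idle and load behaviour described above can be sketched as a simple selection among operating points. This is only an illustration of the idea, not AMD's actual PowerNow! logic: the multiplier and voltage pairs below are invented for the example, and the function name and load metric are hypothetical.

```python
# Hypothetical operating points as (multiplier, core voltage in volts).
# These values are made up for illustration and are not real PowerNow! tables.
OPERATING_POINTS = [(5.0, 1.05), (10.0, 1.25), (14.0, 1.45)]


def select_operating_point(load):
    """Pick the slowest operating point that covers the demanded load.

    `load` is the fraction (0.0 to 1.0) of maximum clock speed demanded.
    """
    max_multiplier = OPERATING_POINTS[-1][0]
    for multiplier, volts in OPERATING_POINTS:
        if multiplier / max_multiplier >= load:
            return multiplier, volts
    return OPERATING_POINTS[-1]


print(select_operating_point(0.0))  # idle: lowest multiplier and voltage
print(select_operating_point(0.5))  # moderate demand: intermediate point
print(select_operating_point(1.0))  # full demand: maximum point
```

Because each step changes the multiplier, a scheme like this only works on a CPU whose multiplier is not locked to a single value.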
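The undervolting procedure described above amounts to a stepwise search for the lowest stable voltage. The sketch below assumes a caller-supplied `is_stable` check, which in practice would be a stress test run at a fixed clock speed; that callback, the step size, and the function name are all hypothetical.

```python
def lowest_stable_mv(start_mv, floor_mv, step_mv, is_stable):
    """Step the core voltage down until the stability check fails.

    Voltages are in millivolts. Returns the lowest voltage that passed,
    or None if even the starting voltage was unstable.
    """
    best = None
    mv = start_mv
    while mv >= floor_mv and is_stable(mv):
        best = mv
        mv -= step_mv
    return best


# Stand-in for a real stress test: pretend the chip is stable at 1275 mV and above.
def fake_stress_test(mv):
    return mv >= 1275


print(lowest_stable_mv(1450, 1000, 25, fake_stress_test))
```

Starting from a part that is already rated for a lower voltage simply moves `start_mv` closer to the eventual result, so fewer test steps are needed.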
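The clock speeds in this example follow from multiplying the bus clock by the multiplier. The sketch below uses the nominal 133 MHz FSB from the text; the 16x multiplier for the 2.13 GHz Athlon MP is inferred from that clock speed rather than stated in the text, and the function name is illustrative.

```python
def core_clock_mhz(multiplier, fsb_mhz=133):
    """Core clock in MHz as bus clock times multiplier."""
    return multiplier * fsb_mhz


# Wire-modded mobile 2500+ at a 17x multiplier.
print(core_clock_mhz(17) / 1000)  # in GHz
# Athlon MP 2800+, assuming a 16x multiplier.
print(core_clock_mhz(16) / 1000)  # in GHz
```

Rounded to two decimal places these give 2.26 GHz and 2.13 GHz, matching the figures quoted above.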

Zen-based Athlon

The Zen-based Athlon with Radeon graphics processors was launched in September 2018 with the Athlon 200GE.
On November 19, 2019, AMD released the Athlon 3000G, with a higher 3.5 GHz core clock and 1100 MHz graphics clock compared to the Athlon 200GE, which also has two cores. The main functional difference from the 200GE was the Athlon 3000G’s unlocked multiplier.