The Therac-25 was a computer-controlled radiation therapy machine produced by Atomic Energy of Canada Limited in 1982 after the Therac-6 and Therac-20 units. It was involved in at least six accidents between 1985 and 1987, in which patients were given massive overdoses of radiation. Because of concurrent programming errors, it sometimes gave its patients radiation doses that were hundreds of times greater than normal, resulting in death or serious injury. These accidents highlighted the dangers of software control of safety-critical systems, and they have become a standard case study in health informatics and software engineering. Additionally the overconfidence of the engineers and lack of proper due diligence to resolve reported software bugs are highlighted as an extreme case where the engineers' overconfidence in their initial work and failure to believe the end users' claims caused drastic repercussions.
Design
The machine offered two modes of radiation therapy:
Direct electron-beam therapy, in which a narrow low-current beam of high-energy electrons was scanned over the treatment area by magnets;
Megavolt X-ray therapy, which delivered a fixed width beam of X-rays, produced by colliding a narrow 100-times higher current beam of 25 MeV electrons with a target, then passing the emitted X-rays through both a flattening filter and a collimator.
It also included a "field light" mode, which allowed the patient to be correctly positioned by illuminating the treatment area with visible light.
Problem description
The six documented accidents occurred when the high-current electron beam generated in X-ray mode was delivered directly to patients. Two software faults were to blame. One, when the operator incorrectly selected X-ray mode before quickly changing to electron mode, which allowed the electron beam to be set for X-ray mode without the X-ray target being in place. A second fault allowed the electron beam to activate during field-light mode, during which no beam scanner was active or target was in place. Previous models had hardware interlocks to prevent such faults, but the Therac-25 had removed them, depending instead on software checks for safety. The high-current electron beam struck the patients with approximately 100 times the intended dose of radiation, and over a narrower area, delivering a potentially lethal dose of beta radiation. The feeling was described by patient Ray Cox as "an intense electric shock", causing him to scream and run out of the treatment room. Several days later, radiation burns appeared, and the patients showed the symptoms of radiation poisoning; in three cases, the injured patients later died as a result of the overdose.
Root causes
A commission attributed the primary cause to general poor software design and development practices rather than single-out specific coding errors. In particular, the software was designed so that it was realistically impossible to test it in a clean automated way. Researchers who investigated the accidents found several contributing causes. These included the following institutional causes:
AECL did not have the software code independently reviewed and chose to rely on in-house code, including the operating system.
AECL did not consider the design of the software during its assessment of how the machine might produce the desired results and what failure modes existed, focusing purely on hardware and asserting that the software was free of bugs.
Machine operators were reassured by AECL personnel that overdoses were impossible, leading them to dismiss the Therac-25 as the potential cause of many incidents.
AECL had never tested the Therac-25 with the combination of software and hardware until it was assembled at the hospital.
The researchers also found several engineering issues:
Several error messages merely displayed the word "MALFUNCTION" followed by a number from 1 to 64. The user manual did not explain or even address the error codes, nor give any indication that these errors could pose a threat to patient safety.
The system distinguished between errors that halted the machine, requiring a restart, and errors which merely paused the machine select 25 MeV photon mode, then use "cursor up" to edit the input to "E" to select 25 MeV Electron mode, then "Enter", all within eight seconds of the first keypress, well within the capability of an experienced user of the machine.
The design did not have any hardware interlocks to prevent the electron-beam from operating in its high-energy mode without the target in place.
The engineer had reused software from the Therac-6 and Therac-20, which used hardware interlocks that masked their software defects. Those hardware safeties had no way of reporting that they had been triggered, so preexisting errors were overlooked.
The hardware provided no way for the software to verify that sensors were working correctly. The table-position system was the first implicated in Therac-25's failures; the manufacturer revised it with redundant switches to cross-check their operation.
The software set a flag variable by incrementing it, rather than by setting it to a fixed non-zero value. Occasionally an arithmetic overflow occurred, causing the flag to return to zero and the software to bypass safety checks.
Leveson notes that a lesson to be drawn from the incident is to not assume that reused software is safe: "A naive assumption is often made that reusing software or using commercial off-the-shelf software will increase safety because the software will have been exercised extensively. Reusing software modules does not guarantee safety in the new system to which they are transferred..." This blind faith in poorly understood software coded paradigms is known as cargo cult programming. In response to incidents like those associated with Therac-25, the IEC 62304 standard was created, which introduces development life cycle standards for medical device software and specific guidance on using software of unknown pedigree.