1992: Fault-Tolerance Software

Traditionally, large-scale, complex telephone switching operations relied on mainframes to keep telecommunications systems reliable. In 1992, an AT&T team developed a software alternative to large mainframe backups. This fault tolerance software detects faults that threaten to shut down a switching system, which allots calls to individual phones, and takes corrective action to keep the system running.

Much like a dose of cough suppressant for an ailing person, fault tolerance software allows a telecommunications system to "tolerate" hardware faults, as well as some of the design and coding faults built into the system. The cough, or bug, may not be completely cured, but the system can still function effectively. Easily embedded in any application, this vigilant software detects failures, backs up and recovers data, eases communication among processes, copies key files for backup, and automatically restarts and restores crashed programs in seconds. Beyond telecommunications, fault tolerance software is found in systems where avoiding service outages is crucial, such as those on Wall Street and in automatic teller machines.
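
To make the "detect a failure, then restart the program" idea concrete, here is a minimal sketch in Python. It is not AT&T's implementation; the function name `supervise` and its parameters `max_restarts` and `restart_delay` are hypothetical, chosen only for illustration. The sketch treats any nonzero exit status as a detected fault and restarts the program a limited number of times.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, restart_delay=0.1):
    """Run `command` and restart it whenever it exits with a nonzero status.

    Hypothetical helper for illustration only. Returns the number of
    restarts that were needed. Raises RuntimeError if the program is
    still failing after `max_restarts` restarts.
    """
    restarts = 0
    while True:
        # Fault detection: a nonzero exit status is treated as a crash.
        result = subprocess.run(command)
        if result.returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"{command!r} still failing after {restarts} restarts"
            )
        # Corrective action: wait briefly, then restart the program.
        restarts += 1
        time.sleep(restart_delay)


if __name__ == "__main__":
    # Supervise a trivial child process that exits cleanly.
    count = supervise([sys.executable, "-c", "print('service running')"])
    print(f"restarts needed: {count}")
```

A fuller system would also checkpoint the program's data before a crash and restore it on restart, as the description above mentions; this sketch covers only detection and restart.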