Voder


The Bell Telephone Laboratory's Voder was the first attempt to electronically synthesize human speech by breaking it down into its acoustic components. It was invented by Homer Dudley in 1937–1938 and developed on his earlier work on the vocoder. The quality of the speech was limited; however, it demonstrated the synthesis of the human voice, which became one component of the vocoder used in voice communications for security and to save bandwidth.
The Voder synthesized human speech by imitating the effects of the human vocal tract. The operator could select one of two basic sounds by using a wrist bar. A buzz tone generated by a relaxation oscillator produced the voiced vowels and nasal sounds, with the pitch controlled by a foot pedal. A hissing noise produced by a white noise tube created the sibilants. These initial sounds were passed through a bank of 10 band pass filters that were selected by keys; their outputs were combined, amplified and fed to a loudspeaker. The filters were controlled by a set of keys and a foot pedal to convert the hisses and tones into vowels, consonants, and inflections. Additional special keys were provided to make the plosive sounds such as "p" or "d", and the affrictive sounds of the "j" in "jaw" and the "ch" in "cheese". This was a complex machine to operate. After months of practice, a trained operator could produce recognizable speech.
Performances on the Voder were featured at the 1939 New York World's Fair and in San Francisco. Twenty operators were trained by Helen Harper, particularly noted for her skill with the machine. The machine said the words "Good afternoon, radio audience."
The Voder was developed from research into compression schemes for transmission of voice on copper wires and for voice encryption. In 1948, Werner Meyer-Eppler recognized the capability of the Voder machine to generate electronic music, as described in Dudley's patent.
Whereas the vocoder analyzes speech, transforms it into electronically transmitted information, and recreates it, the voder generates synthesized speech by means of a console with fifteen touch-sensitive keys and a pedal. It basically consists of the "second half" of the vocoder, but with manual filter controls, and requires a highly trained operator.