English orthography


English orthography is the system of writing conventions used to represent spoken English in written form that allows readers to connect spelling to sound to meaning.
Like the orthography of most world languages, English orthography has a broad degree of standardisation. However, unlike with most languages, there are multiple ways to spell nearly every phoneme, and most letters also have multiple pronunciations depending on their position in a word and the context.
Several orthographic mistakes are common even among native speakers. This is mainly because English has borrowed many words from many other languages throughout its history without any successful attempt at complete spelling reform.
Most of the spelling conventions in Modern English were derived from the phonetic spelling of a variety of Middle English, and generally do not reflect the sound changes that have occurred since the late 15th century.
Despite the various English dialects spoken from country to country and within different regions of the same country, there are only slight regional variations in English orthography, the two most recognised being British and American spelling. This overall uniformity helps facilitate international communication. On the other hand, it also adds to the discrepancy between the way English is written and spoken in any given location.

Function of the letters

Phonemic representation

Letters in English orthography usually represent a particular sound. For example, the word cat consists of three letters, c, a, and t, in which c represents the sound /k/, a the sound /æ/, and t the sound /t/.
Sequences of letters may perform this role as well as single letters. Thus, in the word ship, the digraph sh represents the sound /ʃ/. In the word ditch, the trigraph tch represents the sound /tʃ/.
Less commonly, a single letter can represent multiple successive sounds. The most common example is the letter x, which normally represents the consonant cluster /ks/.
The same letter may be pronounced in different ways when it occurs in different positions within a word. For instance, the digraph gh represents the sound /f/ at the end of some words, such as rough, though not in others. At the beginning of syllables, the digraph gh is pronounced /ɡ/, as in the word ghost. Conversely, the digraph gh is never pronounced /f/ in syllable onsets other than in inflected forms, and is almost never pronounced /ɡ/ in syllable codas.
Some words contain silent letters, which do not represent any sound in modern English pronunciation. Examples include the b in doubt, debt, dumb, etc., the p in psychology and pneumatic, the gh mentioned above in numerous words such as though, daughter, night, brought, and the commonly encountered silent e.

Word origin

Another type of spelling characteristic is related to word origin. For example, when representing a vowel, the letter y represents the sound /ɪ/ in some words borrowed from Greek, whereas the letter usually representing this sound in non-Greek words is the letter i. Thus, the word myth is of Greek origin, while pith is a Germanic word.
Other examples include ph pronounced /f/ and ch pronounced /k/; the use of these spellings for these sounds often marks words that have been borrowed from Greek.
Some researchers, such as Brengelman, have suggested that, in addition to this marking of word origin, these spellings indicate a more formal level of style or register in a given text, although Rollings finds this point to be exaggerated, as there would be many exceptions where a word with one of these spellings, such as ph for /f/, could occur in an informal text.

Homophone differentiation

Spelling may also be useful to distinguish between homophones, although in most cases the reason for the difference is historical and was not introduced for the purpose of making a distinction.
For example, the words heir and air are pronounced identically in most dialects, but in writing they are distinguished from each other by their different spellings.
Another example is the pair of homophones pain and pane, where both are pronounced /peɪn/ but have two different spellings of the vowel. Often this is because of the historical pronunciation of each word: over time, two separate sounds became the same but the different spellings remained. Pain used to be pronounced with a diphthong and pane with a long monophthong, but the diphthong merged with the long vowel of pane, making pain and pane homophones. Later the merged vowel became the diphthong /eɪ/.
In written language, this may help to resolve potential ambiguities that would arise otherwise.
Nevertheless, many homophones remain that are unresolved by spelling.

Marking sound changes in other letters

Some letters in English provide information about the pronunciation of other letters in the word. Rollings uses the term "markers" for such letters. Letters may mark different types of information.
For instance, the letter e in the word cottage indicates that the preceding g is pronounced /dʒ/, rather than the more common value of g in word-final position as the sound /ɡ/, such as in tag.
The letter e also often marks an altered pronunciation of a preceding vowel. In the pair ban and bane, the a of ban has the value /æ/, whereas the a of bane is marked by the e as having the value /eɪ/. In this context, the e is not pronounced, and is referred to as "silent e".
A single letter may even fill multiple pronunciation-marking roles simultaneously. For example, in the word wage, the e marks not only the change of the a from /æ/ to /eɪ/, but also of the g from /ɡ/ to /dʒ/. In the word vague, the e marks the long a sound, but the u keeps the g hard rather than soft.
Doubled consonants usually indicate that the preceding vowel is pronounced short. For example, the doubled tt in latter indicates that the a is pronounced /æ/, while the single t of later gives /eɪ/. Doubled consonants only indicate any lengthening or gemination of the consonant sound itself when they come from different morphemes, as with the nn in unnatural = un+natural.
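The marker conventions described above can be sketched as a pair of toy heuristics. The Python sketch below is illustrative only: the function names are hypothetical, it inspects only the first vowel letter and a word-final g, and it ignores the many exceptions found in real English spelling.

```python
import re


def first_vowel_hint(word: str) -> str:
    """Guess whether the first vowel letter is marked as short or long."""
    w = word.lower()
    m = re.search(r"[aeiou]", w)
    if m is None:
        return "unknown"
    rest = w[m.end():]
    # A single consonant followed by a vowel letter (often silent e)
    # marks the preceding vowel as long: bane, later, wage.
    if re.match(r"[^aeiou][aeiou]", rest):
        return "long"
    # A doubled consonant, or only consonants up to the end of the word,
    # marks the preceding vowel as short: latter, ban, tag.
    if re.match(r"([^aeiou])\1", rest) or re.fullmatch(r"[^aeiou]+", rest):
        return "short"
    return "unknown"


def final_g_hint(word: str) -> str:
    """Guess whether a word-final g is hard (/ɡ/) or soft (/dʒ/)."""
    w = word.lower()
    if w.endswith("gue"):  # the u shields the g from the e: vague
        return "hard"
    if w.endswith("ge"):   # the e marks the g as soft: wage, cottage
        return "soft"
    if w.endswith("g"):    # unmarked final g: tag
        return "hard"
    return "no final g"
```

Under these assumptions, ban and latter come out as "short", bane and later as "long", and wage has a soft g while vague and tag have a hard one.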

Multiple functionality

A given letter may have dual functions. For example, the letter i in the word cinema has a sound-representing function (it represents the sound /ɪ/) and a pronunciation-marking function (it marks the preceding c as having the value /s/ rather than /k/).

Underlying representation

Like many other alphabetic orthographies, English spelling does not represent non-contrastive phonetic sounds.
Although a letter such as t is pronounced by some speakers with aspiration at the beginning of words, this is never indicated in the spelling, and, indeed, this phonetic detail is probably not noticeable to the average native speaker not trained in phonetics.
However, unlike some orthographies, English orthography often represents a very abstract underlying representation of English words.
In these cases, a given morpheme has a fixed spelling even though it is pronounced differently in different words. An example is the past tense suffix -ed, which may be pronounced variously as /t/, /d/, or /ɪd/. As it happens, these different pronunciations of -ed can be predicted by a few phonological rules, but that is not the reason why its spelling is fixed.
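The predictability of the past tense suffix can be illustrated with a minimal Python sketch. It assumes the final sound of the stem is supplied as an IPA string; the function name and the small set of voiceless consonants are choices made for this illustration, not part of any standard library.

```python
# Voiceless consonants (other than /t/) that can end an English stem.
VOICELESS = {"p", "k", "f", "θ", "s", "ʃ", "tʃ"}


def past_suffix_sound(final_sound: str) -> str:
    """Return the pronunciation of -ed after a stem ending in final_sound."""
    if final_sound in {"t", "d"}:
        return "ɪd"  # a vowel is inserted: waited, loaded
    if final_sound in VOICELESS:
        return "t"   # the suffix is devoiced: piped, snaked
    return "d"       # after voiced consonants and vowels: limbed
```

The spelling -ed stays the same in all three branches, which is the point made above: the orthography records the morpheme rather than its surface pronunciation.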
Another example involves the vowel differences in several related words. For instance, the word photographer is derived from the word photograph by adding the derivational suffix -er. When this suffix is added, the vowel pronunciations change, largely owing to the moveable stress: photograph is stressed on its first syllable, photographer on its second.
Other examples of this type involve the -ity suffix. See also: Trisyllabic laxing.
Another such class of words includes sign and bomb with "silent" letters g and b, respectively. However, in the related words signature and bombard these letters are pronounced /ɡ/ and /b/, respectively. Here it could be argued that the underlying representations of sign and bomb contain |ɡ| and |b|, which are only pronounced in the surface forms when followed by certain suffixes. Otherwise, the |ɡ| and |b| are not realised in the surface pronunciation. In these cases, the orthography indicates the underlying consonants that are present in certain words but are absent in other related words.
Other examples include the t in fast and fasten, and the h in heir and inherit.
Another example includes words like mean and meant. Here the vowel spelling ea is pronounced differently in the two related words. Thus, again the orthography uses only a single spelling that corresponds to the single morphemic form rather than to the surface phonological form.
English orthography does not always provide an underlying representation; sometimes it provides an intermediate representation between the underlying form and the surface pronunciation. This is the case with the spelling of the regular plural morpheme, which is written as either -s or -es. Here the spelling -s is pronounced either /s/ or /z/ while -es is usually pronounced /ɪz/. Thus, there are two different spellings that correspond to the single underlying representation |z| of the plural suffix and the three surface forms. The spelling indicates the insertion of /ɪ/ before the /z/ in the spelling -es, but does not indicate the devoiced /s/ distinctly from the unaffected /z/ in the spelling -s.
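The intermediate status of the plural spelling can be made concrete with a short Python sketch. It is a simplification under stated assumptions: the final sound of the stem is passed in as an IPA string, the function name and sound sets are hypothetical, and irregular plurals are ignored.

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural(stem: str, final_sound: str) -> tuple[str, str]:
    """Return (spelling, suffix pronunciation) for a regular plural."""
    if final_sound in SIBILANTS:
        # Vowel insertion IS shown in the spelling (-es), unless the
        # stem already ends in a written e (maze -> mazes).
        spelling = stem + ("s" if stem.endswith("e") else "es")
        return spelling, "ɪz"
    # Devoicing of the underlying |z| is NOT shown: both are spelled -s.
    sound = "s" if final_sound in VOICELESS else "z"
    return stem + "s", sound
```

For instance, plural("hat", "t") and plural("dog", "ɡ") both add a written -s even though the suffix sounds differ, while plural("glass", "s") adds -es.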
The abstract representation of words as indicated by the orthography can be considered advantageous since it makes etymological relationships more apparent to English readers. This makes writing English more complex, but arguably makes reading English more efficient. However, very abstract underlying representations, such as that of Chomsky & Halle or of underspecification theories, are sometimes considered too abstract to accurately reflect the communicative competence of native speakers. Followers of these arguments believe the less abstract surface forms are more "psychologically real" and thus more useful in terms of pedagogy.

Diacritics

English has some words that can be written with accent marks. These words have mostly been imported from other languages, usually French. As imported words become increasingly naturalised, there is an increasing tendency to omit the accent marks, even in formal writing. For example, words such as rôle and hôtel were first seen with accents when they were borrowed into English, but now the accent is almost never used. The words were originally considered foreign – and some people considered that English alternatives were preferable – but today their foreign origin is largely forgotten. Words most likely to retain the accent are those atypical of English morphology and therefore still perceived as slightly foreign. For example, café and pâté both have a pronounced final e, which would otherwise be silent under the normal English pronunciation rules. However café is now sometimes facetiously pronounced "caff", while in pâté, the acute accent is helpful to distinguish it from pate.
Further examples of words sometimes retaining diacritics when used in English are: Ångström, appliqué, attaché, blasé, bric-à-brac, Brötchen, cliché, crème, crêpe, façade, fiancé, flambé, naïve, naïveté, papier-mâché, passé, piñata, protégé, résumé, risqué, über-, voilà. Italics, with appropriate accents, are generally applied to foreign terms that are uncommonly used in or have not been assimilated into English: for example, crème brûlée, pièce de résistance, and über.
It was formerly common in American English to use a diaeresis mark to indicate a hiatus: for example, coöperate, daïs, reëlect. The New Yorker and Technology Review magazines still use it for this purpose, even though it is increasingly rare in modern English. Nowadays the diaeresis is normally left out, or a hyphen is used if the hiatus is between two morphemes in a compound word. It is, however, still common in monomorphemic loanwords such as naïve and Noël.
Written accents are also used occasionally in poetry and scripts for dramatic performances to indicate that a certain normally unstressed syllable in a word should be stressed for dramatic effect, or to keep with the metre of the poetry. This use is frequently seen in archaic and pseudoarchaic writings with the -ed suffix, to indicate that the e should be fully pronounced, as with cursèd.
The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous or nonstandard for metrical reasons; the grave to indicate that an ordinarily silent or elided syllable is pronounced.

Ligatures

In certain older texts, the ligatures æ and œ are common in words such as archæology, diarrhœa, and encyclopædia. Such words are of Latin or Greek origin. Nowadays, the ligatures have generally been replaced in British English by the separate digraphs ae and oe (though economy and ecology are usually spelled with e), and in American English by e. In some cases, usage may vary; for instance, both encyclopedia and encyclopaedia are current in the UK.

Phonic irregularities

Partly because English has never had any official regulating authority for spelling, such as the Spanish Real Academia Española, the French Académie française, and the German Rat für deutsche Rechtschreibung, English spelling is quite irregular and complex compared to that of many other languages. Although French, among other languages, presents a similar degree of difficulty when encoding (writing), English is more difficult when decoding (reading), as a given group of letters has many more possible pronunciations. For example, in French, the sound /u/ can be spelled ou, ous, out, or oux, but the pronunciation of each of those sequences is always the same. In English, the sound /uː/ can be spelled in up to 18 different ways, including oo, u, ui, ue, o, oe, ou, ough, and ew, but all of these have other pronunciations as well. The Spelling-to-sound correspondences section below presents a summary of pronunciation variations. Thus, in unfamiliar words and proper nouns the pronunciation of some sequences, ough being the prime example, is unpredictable even to educated native English speakers.

Spelling irregularities

Attempts to regularise or reform the spelling of English have usually failed. However, Noah Webster popularised more phonetic spellings in the United States, such as flavor for British flavour, fiber for fibre, defense for defence, analyze for analyse, catalog for catalogue and so forth. These spellings already existed as alternatives, but Webster's dictionaries helped make them standard in the US. See American and British English spelling differences for details.
Besides the quirks the English spelling system has inherited from its past, there are other idiosyncrasies in spelling that make it tricky to learn. English contains, depending on dialect, 24–27 separate consonant phonemes and 13–20 vowels. However, there are only 26 letters in the modern English alphabet, so there is not a one-to-one correspondence between letters and sounds. Many sounds are spelled using different letters or multiple letters, and for those words whose pronunciation is predictable from the spelling, the sounds denoted by the letters depend on the surrounding letters. For example, the digraph th represents two different sounds, /θ/ and /ð/, and the voiceless alveolar sibilant /s/ can be represented by the letters s and c.
It is, however, not the shortage of letters which makes English spelling irregular. Its irregularities are caused mainly by the use of many different spellings for some of its sounds, such as the sounds /uː/, /iː/ and /oʊ/, and the use of identical sequences for spelling different sounds.
Furthermore, English no longer makes any attempt to anglicise the spellings of loanwords, but preserves the foreign spellings, even when they employ exotic conventions like the Polish cz in Czech or the Norwegian fj in fjord. In early Middle English, until roughly 1400, most imports from French were respelled according to English rules. Today, instead of loans being respelled to conform to English spelling standards, sometimes the pronunciation changes as a result of pressure from the spelling. One example of this is the word ski, which was adopted from Norwegian in the mid-18th century, although it did not become common until 1900. It used to be pronounced /ʃiː/, which is similar to the Norwegian pronunciation, but the increasing popularity of the sport after the middle of the 20th century helped the pronunciation /skiː/ replace it.
There was also a period when the spelling of a small number of words was altered in what is now regarded as a misguided attempt to make them conform to what were perceived to be the etymological origins of the words. For example, the letter b was added to debt in an attempt to link it to the Latin debitum, and the letter s in island is a misplaced attempt to link it to Latin insula instead of the Old English word īġland, which is the true origin of the English word. The letter p in ptarmigan has no etymological justification whatsoever, only seeking to invoke Greek despite being a Gaelic word.
The spelling of English continues to evolve. Many loanwords come from languages where the pronunciation of vowels corresponds to the way they were pronounced in Old English, which is similar to the Italian or Spanish pronunciation of the vowels, and is the value the vowel symbols a, e, i, o and u have in the International Phonetic Alphabet. As a result, there is a somewhat regular system of pronouncing "foreign" words in English, and some borrowed words have had their spelling changed to conform to this system. For example, Hindu used to be spelled Hindoo, and the name Maria used to be pronounced like the name Mariah, but was changed to conform to this system.
Commercial advertisers have also had an effect on English spelling. They introduced new or simplified spellings like lite instead of light, thru instead of through, smokey instead of smoky, and rucsac instead of rucksack. The spellings of personal names have also been a source of spelling innovations: diminutive versions of women's names that sound the same as men's names have been spelled differently, as in Nikki and Nicky, Toni and Tony, Jo and Joe. The differentiation between names that are spelled differently but sound the same may come from modernization or different countries of origin. For example, Isabelle and Isabel sound the same but are spelled differently; these versions are from France and Spain respectively.
As examples of the idiosyncratic nature of English spelling, the combination ou can be pronounced in at least nine different ways: /aʊ/ in out, /oʊ/ in soul, /uː/ in soup, /ʌ/ in touch, /ʊ/ in could, /ɔː/ in four, /ɜː/ in journal, /ɒ/ in cough, and /ə/ in famous. See the section Spelling-to-sound correspondences for a comprehensive treatment. In the other direction, the vowel sound /iː/ in me can be spelled in at least 18 different ways: be, ski, bologna, algae, quay, beach, bee, deceit, people, key, volleyed, field, amoeba, chamois, dengue, beguine, guyot, and city. See the section Sound-to-spelling correspondences below.
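The two directions of ambiguity described here, one spelling with several sounds and one sound with several spellings, can be sketched as a pair of inverted lookup tables. The Python below is only an illustration: it covers a handful of the example words, and the IPA values are conventional broad transcriptions supplied for the sketch.

```python
from collections import defaultdict

# (spelling, sound, example word) triples for a few of the examples.
PAIRS = [
    ("ou", "aʊ", "out"),
    ("ou", "oʊ", "soul"),
    ("ou", "uː", "soup"),
    ("ou", "ʌ", "touch"),
    ("e", "iː", "be"),
    ("i", "iː", "ski"),
    ("ea", "iː", "beach"),
    ("ee", "iː", "bee"),
    ("ie", "iː", "field"),
]

sounds_of = defaultdict(set)     # spelling -> sounds (reading direction)
spellings_of = defaultdict(set)  # sound -> spellings (writing direction)
for spelling, sound, _example in PAIRS:
    sounds_of[spelling].add(sound)
    spellings_of[sound].add(spelling)


def is_ambiguous(mapping, key) -> bool:
    """True when a key corresponds to more than one value."""
    return len(mapping[key]) > 1
```

With this data, is_ambiguous(sounds_of, "ou") and is_ambiguous(spellings_of, "iː") both hold, showing that the correspondence fails to be one-to-one in either direction.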
Sometimes everyday speakers of English change a counterintuitive spelling simply because it is counterintuitive. Changes like this are not usually seen as "standard", but can become standard if used enough. An example is the spelling miniscule, which still competes with the original spelling minuscule, though this might also be because of analogy with the word mini.

History

Inconsistencies and irregularities in English pronunciation and spelling have gradually increased in number throughout the history of the English language. There are a number of contributing factors. First, gradual changes in pronunciation, such as the Great Vowel Shift, account for a tremendous number of irregularities. Second, relatively recent loan words from other languages generally carry their original spellings, which are often not phonetic in English. The Romanization of languages using alphabets derived from the Latin alphabet has further complicated this problem, for example when pronouncing Chinese proper names.
The regular spelling system of Old English was swept away by the Norman Conquest, and English itself was supplanted in some spheres by Norman French for three centuries, eventually emerging with its spelling much influenced by French. English had also borrowed large numbers of words from French, which naturally kept their French spellings as there was no reason or mechanism to change them. The spelling of Middle English, such as in the writings of Geoffrey Chaucer, is very irregular and inconsistent, with the same word being spelled in different ways, sometimes even in the same sentence. However, these were generally much better guides to the then pronunciation than modern English spelling is.
For example, the sound /ʌ/, normally written u, is spelled with an o in son, love, come, etc., because Norman spelling conventions prohibited writing u before v, m, or n, given the graphical confusion that would result. Similarly, spelling conventions also prohibited final v. Hence the identical spellings of the three different vowel sounds in love, grove and prove are due to ambiguity in the Middle English spelling system, not sound change.
In 1417 Henry V began using English for official correspondence, which had no standardised spelling, instead of Latin or French which had standardised spelling. For example, for the word right, Latin had one spelling, rectus; Old French as used in English law had six spellings; Middle English had 77 spellings. English, now used as the official replacement language for Latin and French, motivated writers to standardise spellings, an effort which lasted about 500 years.
There was also a series of linguistic sound changes towards the end of this period, including the Great Vowel Shift, which resulted in the i in mine, for example, changing from a pure vowel to a diphthong. These changes for the most part did not detract from the rule-governed nature of the spelling system; but in some cases they introduced confusing inconsistencies, like the well-known example of the many pronunciations of ough. Most of these changes happened before the arrival of printing in England. However, the arrival of the printing press froze the current system, rather than providing the impetus for a realignment of spelling with pronunciation. Furthermore, it introduced further inconsistencies, partly because of the use of typesetters trained abroad, particularly in the Low Countries. For example, the h in ghost was influenced by Dutch. The addition and deletion of a silent e at the ends of words was also sometimes used to make the right-hand margin line up more neatly.
By the time dictionaries were introduced in the mid-17th century, the spelling system of English had started to stabilise. By the 19th century, most words had set spellings, though it took some time before they diffused throughout the English-speaking world. In The Mill on the Floss, English novelist George Eliot satirised the attitude of the English rural gentry of the 1820s towards orthography.
The modern English spelling system, with its national variants, spread together with the expansion of public education later in the 19th century.

"Ough" words

The most notorious group of letters in the English language, the ough tetragraph, can be pronounced in at least ten different ways, six of which are illustrated in the construct "Though the tough cough and hiccough plough him through", which is quoted by Robert A. Heinlein in The Door into Summer to illustrate the difficulties facing automated speech transcription and reading. Ough is in fact a word in its own right, though rarely known or used: an exclamation of disgust similar to ugh. The following pronunciations are recorded across the varieties of English around the world:
The following pronunciations are found in uncommon single words:
The place name Loughborough uses two different pronunciations of ough: the first ough has the sound as in cuff and the second rhymes with thorough.
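Because the value of ough cannot be derived by rule, a program has to list it word by word. The following Python sketch does this for the six words of the sentence quoted above; the dictionary and function names are hypothetical, and the IPA strings are conventional broad transcriptions supplied for the illustration.

```python
# Pronunciation of the ough sequence in each word of the example sentence.
OUGH = {
    "though": "oʊ",
    "tough": "ʌf",
    "cough": "ɒf",
    "hiccough": "ʌp",
    "plough": "aʊ",
    "through": "uː",
}


def ough_value(word: str):
    """Return the listed value of ough in a word, or None if unlisted."""
    return OUGH.get(word.lower())
```

Any word missing from the table returns None, which reflects the point that an unfamiliar ough word gives the reader no reliable clue to its pronunciation.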

Spelling patterns

Spelling-to-sound correspondences

Vowels

In a generative approach to English spelling, Rollings identifies twenty main orthographic vowels of stressed syllables that are grouped into four main categories: "Lax", "Tense", "Heavy", "Tense-R".
For instance, the letter a can represent the lax vowel /æ/, tense /eɪ/, heavy /ɑː/, or tense-r /ɛə/ before |r|. Heavy and tense-r vowels are the respective lax and tense counterparts followed by the letter r.
Tense vowels are distinguished from lax vowels with a "silent" e letter that is added at the end of words. Thus, the letter a in hat is lax, but when the letter e is added in the word hate the letter a is tense. Similarly, heavy and tense-r vowels pattern together: the letters ar in car are heavy, while the letters ar followed by silent e in the word care are tense-r. The letter u represents two different vowel patterns, one with lax /ʌ/ and tense /juː/, the other with lax /ʊ/ and tense /uː/. There is no distinction between heavy and tense-r vowels with the letter o, and the letter u in the second pattern does not have a heavy vowel member.
Besides silent e, another strategy for indicating tense and tense-r vowels is the addition of another orthographic vowel, forming a digraph. In this case, the first vowel is usually the main vowel while the second vowel is the "marking" vowel. For example, the word man has a lax a pronounced /æ/, but with the addition of i in the word main the a is marked as tense and pronounced /eɪ/. These two strategies produce words that are spelled differently but pronounced identically, as in mane, main and Maine. The use of two different strategies relates to the function of distinguishing between words that would otherwise be homonyms.
Besides the 20 basic vowel spellings, Rollings has a reduced vowel category and a miscellaneous category.

Combinations of vowel letters

To reduce dialectal difficulties, the sound values given here follow a single set of transcription conventions. This table includes H, W and Y when they represent vowel sounds. If no information is given, it is assumed that the vowel is in a stressed syllable.
Deriving the pronunciation of an English word from its spelling requires not only a careful knowledge of the rules given below and their many exceptions, but also:
Notes:
† Nearly 80% of Americans pronounce "luxurious" with, while two thirds of British people use. Half the American speakers pronounce "luxury" as, the rest says
†† About half of both British and American speakers say, the other half says.

Combinations of vowel letters and "r"

Combinations of other consonant and vowel letters

Spelling | Examples of major value | Examples of minor values | Exceptions
ah | blah | bar mitzvah |
al | pal, talcum, algae, alp | bald, falcon |
alf | calf, half | alfalfa, malfeasance | palfrey; /eɪ/ halfpenny
alk | walk, chalking, talkative | alkaline, grimalkin | balkanise
all | call, fallout, smaller | shall, callus, fallow; wallet, swallow; allow, dialled; marshmallow, pall-mall |
alm | alms, balmy, calm, palmistry | palmate, salmonella, talmud; almanac, almost, instalment; salmon, halm; almond *; signalment |
alt | alter, malt, salty, basalt | alto, shalt, saltation; altar, asphalt; gestalt; royalty, penalty |
final -ange | arrange, change, mange, strange | flange, phalange | melange; blancmange; orange
final -aste | chaste, lambaste, paste, taste | cineaste, caste, pleonaste | caste; namaste
unstressed ci- before a vowel | special, gracious | species |
-cqu | acquaint, acquire | lacquer, racquet |
final -ed after /t/ or /d/ | loaded, waited | |
final -ed after a voiceless sound | piped, enserfed, snaked | biped, underfed | naked
final -ed after a lenis sound | limbed, enisled, unfeared | imbed, misled, infrared |
eh | eh, prehnite, tempeh | yeh | feh, keffiyeh
final -es after a fricative | mazes, washes, axes, bases, pieces | axes, bases, feces, oases |
unstressed ex- before vowel or h | exist, examine, exhaust | exhale |
gu- before a | bilingual, guano, language | guard, guarantee |
final -le after a non-l consonant | little, table | orle, isle | boucle
final -isle | aisle, isle, enisle, lisle | |
final -ngue | tongue, harangue, meringue | merengue, gué | dengue
oh, final or before a consonant | oh, kohlrabi, ohm, pharaoh | demijohn, johnny | bohrium; matzoh
old | blindfold, older, bold | scaffold, kobold, kolkhoz | polka
oll | dollhouse, pollen, trolley, holly | tollhouse, swollen, troller, wholly | atoll, cholla; caroller, collide
olm | olm, dolmen | enrolment, holmium | holm
ong | songstress, along, strong, wronger | congress, jongleur, bongo, conger; congeries, longevity, pongee; tonger, bong, dugong, tongs; longer, strongest, elongate; monger, humongous, mongrel; sponger, longe, spongy; among, tongue; ongoing, nongraded; congratulate, lemongrass; congeal, congestion; allonge | congé
qu- | queen, quick | liquor, mosquito |
final -que | mosque, bisque | manque, risqué | barbeque; pulque
final -re after a non-r consonant | timbre, acre, ogre, centre | cadre, compadre, emigre; genre, oeuvre, fiacre |
final -ron after a vowel | neuron, moron, interferon, aileron | baron, heron, environ | iron; chaperon
unstressed sci- before a vowel | conscience, luscious, prosciutto | sciatica, sciamachy, sciential | conscientious, fasciated; omniscient, prescience
-scle | corpuscle, muscle | mascle |
final -se after a vowel (noun) | house, excuse, moose, anise, geese | prose, nose, tease, guise, compromise | marchese
final -se after a vowel (verb) | house, excuse, choose, arise, please | grouse, dose, lease, chase, promise |
unstressed -si before a vowel | vision, occasion, explosion, illusion | pension, controversial, compulsion; easier, enthusiasm, physiological; tarsier, Celsius |
unstressed -ssi before a vowel | mission, passion, Russia, session | potassium, dossier, messier |
unstressed -sure | leisure, treasure | tonsure, censure |
unstressed -ti before a vowel | cautious, patient, inertia, initial, ration | question, Christian, suggestion; patios, consortia, fiftieth, courtier; ratios, minutia, initiate, negotiate; cation, cationic; equation; rentier |
unstressed -ture | nature, picture | |

* According to the Longman Pronunciation Dictionary, 75% of Americans pronounce "almond" as.
† Where GA distinguishes between /ɒ/ and /ɔː/ in the letter combination ong, RP only has the vowel /ɒ/.

Sound-to-spelling correspondences

The following table shows for each sound the various spelling patterns used to denote it, starting with the prototypical pattern followed by others in alphabetical order. Some of these patterns are very rare or unique. An ellipsis stands for an intervening consonant.

Consonants

Arranged in the order of the IPA consonant tables.
* In 2008, 61% of British people pronounced "diphthong" with /pθ/, though phoneticians prefer /fθ/.
** In 2008, 20% of Americans pronounced "thespian" with /zb/.
*** The majority of British people, and the great majority of younger ones, pronounce "crescent" with /z/.
† In 2008, 64% of Americans and 39% of British people pronounced "February" with /j/ in place of the first /r/.
†† The majority of Americans, and the great majority of younger ones, pronounce "congratulate" with /dʒ/.

Vowels

Sorted more or less from close to open sounds in the vowel diagram.
† Identical to previous vowel in non-rhotic dialects like RP.

Orthographies of English-related languages

Germanic languages
Romance languages
Celtic languages
Historical languages
Artificial languages