Natural-language generation

Natural-language generation is a software process that transforms structured data into natural language. It can be used to produce long-form content, letting organizations automate custom reports, as well as custom content for a web or mobile application. It can also be used to generate short blurbs of text in interactive conversations, which might even be read out by a text-to-speech system.
Automated NLG can be compared to the process humans use when they turn ideas into writing or speech. Psycholinguists prefer the term language production for this process, which can also be described in mathematical terms, or modeled in a computer for psychological research. NLG systems can also be compared to translators of artificial computer languages, such as decompilers or transpilers, which also produce human-readable code generated from an intermediate representation. Human languages tend to be considerably more complex and allow for much more ambiguity and variety of expression than programming languages, which makes NLG more challenging.
NLG may be viewed as the opposite of natural-language understanding: whereas in natural-language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. The practical considerations in building NLU vs. NLG systems are not symmetrical. NLU needs to deal with ambiguous or erroneous user input, whereas the ideas the system wants to express through NLG are generally known precisely. NLG needs to choose a specific, self-consistent textual representation from many potential representations, whereas NLU generally tries to produce a single, normalized representation of the idea expressed.
NLG has existed since ELIZA was developed in the mid-1960s, but commercial NLG technology has only recently become widely available. NLG techniques range from simple template-based systems like a mail merge that generates form letters, to systems that have a complex understanding of human grammar. NLG can also be accomplished by training a statistical model using machine learning, typically on a large corpus of human-written texts.
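As a rough illustration of the simplest end of that range, the following Python sketch fills a form letter once per record, in the spirit of a mail merge. The letter text, field names and customer records are hypothetical; the slot filling is done by the standard library's `string.Template`.

```python
from string import Template

# A form letter with named slots ($name, $order_id, ...), as in a mail merge.
# The wording and field names are hypothetical, for illustration only.
LETTER = Template(
    "Dear $name,\n"
    "Your order #$order_id for $item has shipped and should arrive by $date.\n"
    "Thank you for shopping with us."
)


def mail_merge(records):
    """Fill the template once per record; a missing field raises KeyError."""
    return [LETTER.substitute(record) for record in records]


if __name__ == "__main__":
    customers = [
        {"name": "Ada", "order_id": 1001, "item": "a kettle", "date": "Friday"},
        {"name": "Alan", "order_id": 1002, "item": "two mugs", "date": "Monday"},
    ]
    for letter in mail_merge(customers):
        print(letter, end="\n\n")
```

Every letter has exactly the same structure; only the slot values change, which is why such systems need no linguistic knowledge but also cannot vary their wording.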

Example

The Pollen Forecast for Scotland system is a simple example of an NLG system that could essentially be a template. This system takes as input six numbers, which give predicted pollen levels in different parts of Scotland. From these numbers, the system generates a short textual summary of pollen levels as its output.
For example, using the historical data for July 1, 2005, the software produces:

Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.

In contrast, the actual forecast from this data was:

Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.

Comparing these two illustrates some of the choices that NLG systems must make; these are further discussed below.
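A template-style generator of the kind described above can be sketched in a few lines of Python. The six region names and the numeric thresholds for "low", "moderate" and "high" are assumptions made for illustration; they are not taken from the actual Pollen Forecast for Scotland system, and the sketch does not compare against the previous day's levels.

```python
from collections import Counter

# Hypothetical names for the six input regions.
REGIONS = ["north", "north east", "west", "central", "east", "south east"]


def band(level: int) -> str:
    """Map a numeric pollen level to a word (assumed thresholds)."""
    if level <= 3:
        return "low"
    if level <= 5:
        return "moderate"
    return "high"


def describe(levels: list[int]) -> str:
    """Fill a fixed sentence template from six pollen levels."""
    if len(levels) != len(REGIONS):
        raise ValueError(f"expected {len(REGIONS)} levels, got {len(levels)}")
    values = dict(zip(REGIONS, levels))
    bands = {region: band(v) for region, v in values.items()}
    # The band shared by most regions becomes the headline statement.
    majority, _ = Counter(bands.values()).most_common(1)[0]
    typical = [v for region, v in values.items() if bands[region] == majority]
    lo, hi = min(typical), max(typical)
    span = f"{lo}" if lo == hi else f"{lo} to {hi}"
    text = (f"Grass pollen levels will be {majority} with values of around "
            f"{span} across most parts of the country.")
    # Every region outside the majority band gets its own exception sentence.
    for region, v in values.items():
        if bands[region] != majority:
            text += (f" However, in the {region}, pollen levels will be "
                     f"{bands[region]} with values of {v}.")
    return text
```

For example, `describe([4, 6, 7, 6, 6, 7])` returns "Grass pollen levels will be high with values of around 6 to 7 across most parts of the country. However, in the north, pollen levels will be moderate with values of 4." The template hard-codes every choice the stages below make explicit, which is why its output reads more mechanically than the human forecast.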

Stages

The process to generate text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalised business letters. However, a sophisticated NLG system needs to include stages of planning and merging of information to enable the generation of text that looks natural and does not become repetitive. The typical stages of natural-language generation, as proposed by Dale and Reiter, are:
Content determination: Deciding what information to mention in the text. For instance, in the pollen example above, deciding whether to explicitly mention that the pollen level is 7 in the south east.
Document structuring: Overall organisation of the information to convey. For example, deciding to describe the areas with high pollen levels first, instead of the areas with low pollen levels.
Aggregation: Merging of similar sentences to improve readability and naturalness. For instance, merging the two following sentences:

Grass pollen levels for Friday have increased from the moderate to high levels of yesterday.
Grass pollen levels will be around 6 to 7 across most parts of the country.

into the following single sentence:

Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country.

Lexical choice: Putting words to the concepts. For example, deciding whether "medium" or "moderate" should be used when describing a pollen level of 4.
Referring expression generation: Creating referring expressions that identify objects and regions. For example, deciding to use "in the Northern Isles and far northeast of mainland Scotland" to refer to a certain region in Scotland. This task also includes making decisions about pronouns and other types of anaphora.
Realization: Creating the actual text, which should be correct according to the rules of syntax, morphology, and orthography. For example, using "will be" for the future tense of "to be".
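The six stages above can be chained into a toy pipeline. The Python sketch below gives each stage its own function and runs them in order on pollen-style data; the function names, region names, level thresholds and the fixed "Grass"/"However" wording are all hypothetical simplifications, and real systems use far richer intermediate representations than lists of (region, level) pairs.

```python
from itertools import groupby


def band(level):
    """Assumed thresholds: 1-3 low, 4-5 moderate, 6+ high."""
    return "low" if level <= 3 else "moderate" if level <= 5 else "high"


def content_determination(data):
    """Decide what to say: here, every region that has a reading."""
    return [(region, level) for region, level in data.items() if level is not None]


def document_structuring(messages):
    """Order the messages: highest pollen levels first."""
    return sorted(messages, key=lambda m: m[1], reverse=True)


def aggregation(messages):
    """Merge adjacent messages in the same band (adjacent because they are sorted)."""
    return [list(group) for _, group in groupby(messages, key=lambda m: band(m[1]))]


def lexical_choice(group):
    """Pick a word for the group, e.g. 'moderate' rather than 'medium'."""
    return band(group[0][1])


def referring_expression(group, total):
    """Refer to the regions a group covers."""
    if len(group) > total / 2:
        return "across most parts of the country"
    return "in the " + " and ".join(region for region, _ in group)


def realization(word, where, levels, first):
    """Build a grammatical sentence, using 'will be' for the future tense."""
    lo, hi = min(levels), max(levels)
    values = str(lo) if lo == hi else f"{lo} to {hi}"
    clause = f"pollen levels will be {word} {where}, with values of {values}."
    return ("Grass " + clause) if first else ("However, " + clause)


def generate(data):
    messages = document_structuring(content_determination(data))
    sentences = []
    for i, group in enumerate(aggregation(messages)):
        word = lexical_choice(group)
        where = referring_expression(group, len(messages))
        levels = [level for _, level in group]
        sentences.append(realization(word, where, levels, first=(i == 0)))
    return " ".join(sentences)


if __name__ == "__main__":
    print(generate({"north": 4, "north east": 6, "west": 6,
                    "central": 7, "east": 6, "south east": 7}))
```

With the sample input, this prints "Grass pollen levels will be high across most parts of the country, with values of 6 to 7. However, pollen levels will be moderate in the north, with values of 4." Keeping the stages separate means each decision, such as ordering or word choice, can be changed without touching the others.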
An alternative approach to NLG is to use "end-to-end" machine learning to build a system, without separate stages such as those above. In other words, the NLG system is built by training a machine learning algorithm on a large data set of input data paired with corresponding output texts. The end-to-end approach has perhaps been most successful in image captioning, that is, automatically generating a textual caption for an image.

Applications

The popular media has paid the most attention to NLG systems which generate jokes, but from a commercial perspective, the most successful NLG applications have been data-to-text systems which generate textual summaries of databases and data sets; these systems usually perform data analysis as well as text generation. Research has shown that textual summaries can be more effective than graphs and other visuals for decision support, and that computer-generated texts can be superior to human-written texts.
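The combination of data analysis and text generation in a data-to-text system can be illustrated with a small Python sketch that first analyses a numeric time series (overall change, peak, average) and then verbalises the findings. The function name, the 1% "roughly flat" cut-off and the sentence wording are hypothetical choices, not those of any particular commercial system.

```python
from statistics import mean


def summarise_series(name: str, unit: str, values: list[float]) -> str:
    """Analyse a time series, then describe the findings in text (hypothetical)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    first, last = values[0], values[-1]
    if first == 0:
        raise ValueError("first value must be non-zero to compute a percentage change")

    # Data analysis step: percentage change, position of the peak, average.
    change = (last - first) / first * 100
    peak_index = max(range(len(values)), key=values.__getitem__)

    # Text generation step: choose the verb phrase from the direction of change.
    if abs(change) < 1:
        direction = "stayed roughly flat"
    elif change > 0:
        direction = f"rose by {change:.0f}%"
    else:
        direction = f"fell by {abs(change):.0f}%"

    return (f"{name} {direction} over the period, from {first:g} to {last:g} {unit}. "
            f"It peaked at {values[peak_index]:g} {unit} in period {peak_index + 1}, "
            f"averaging {mean(values):.1f} {unit}.")
```

Calling `summarise_series("Sales", "units", [100, 120, 150, 130])` yields a two-sentence summary that reports a 30% rise and a peak in period 3. Most of the work in a real data-to-text system lies in the analysis step, deciding which patterns are worth reporting.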
The first commercial data-to-text systems produced weather forecasts from weather data. The earliest such system to be deployed was FoG, which was used by Environment Canada to generate weather forecasts in French and English in the early 1990s. The success of FoG triggered other work, both research and commercial.
Recent applications include the UK Met Office's text-enhanced forecast.
Currently there is considerable commercial interest in using NLG to summarise financial and business data. Indeed, Gartner has said that NLG will become a standard feature of 90% of modern BI and analytics platforms. NLG is also being used commercially in automated journalism, chatbots, generating product descriptions for e-commerce sites, summarising medical records, and enhancing accessibility.
An example of an interactive use of NLG is the WYSIWYM framework. The name stands for "What you see is what you meant"; the framework allows users to see and manipulate a continuously rendered view of an underlying formal-language document, thereby editing the formal language without learning it.
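The core WYSIWYM idea, in which the user edits a formal object and only ever sees a regenerated text view of it, can be sketched in Python. The `Prescription` record, its slots and the bracketed placeholders for unfilled slots are hypothetical illustrations, not the framework's actual representation or interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prescription:
    """Hypothetical formal document: one statement with three slots."""
    drug: Optional[str] = None
    dose_mg: Optional[int] = None
    times_per_day: Optional[int] = None


def render(p: Prescription) -> str:
    """Regenerate the text view; unfilled slots appear as bracketed placeholders."""
    drug = p.drug if p.drug is not None else "[some medicine]"
    dose = f"{p.dose_mg} mg" if p.dose_mg is not None else "[some amount]"
    if p.times_per_day is None:
        freq = "[some number of times]"
    else:
        freq = {1: "once", 2: "twice"}.get(p.times_per_day, f"{p.times_per_day} times")
    return f"Take {dose} of {drug} {freq} a day."


def edit(p: Prescription, slot: str, value) -> str:
    """The user changes the formal object, never the text, then sees it re-rendered."""
    if not hasattr(p, slot):
        raise AttributeError(f"unknown slot: {slot}")
    setattr(p, slot, value)
    return render(p)
```

Starting from an empty `Prescription()`, the view reads "Take [some amount] of [some medicine] [some number of times] a day."; after filling the three slots with `edit`, it reads as an ordinary sentence. Because the text is always generated from the formal object, it can never drift out of sync with it.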
Content generation systems assist human writers and make the writing process more efficient and effective. A content generation tool based on web mining using search engine APIs has been built. The tool imitates the cut-and-paste writing scenario, in which a writer forms their content from various search results. Relevance verification is essential to filter out irrelevant search results; it is based on matching the parse tree of a query with the parse trees of candidate answers. In an alternative approach, a high-level structure of human-authored text is used to automatically build a template for a new topic for an automatically written Wikipedia article.
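The tool's exact tree-matching algorithm is not described here, so the following Python sketch substitutes a simple stand-in: parse trees are written by hand as nested tuples (no parser is used), each tree is reduced to the set of its subtrees, and a candidate is kept if the Jaccard overlap of its subtrees with the query's reaches a threshold. The 0.3 threshold and the function names are assumptions.

```python
# A parse tree is a nested tuple: (label, child, child, ...); leaves are words.


def subtrees(tree) -> set:
    """Collect every subtree of a nested-tuple parse tree (words are not subtrees)."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            found.add(node)
            for child in node[1:]:
                walk(child)

    walk(tree)
    return found


def tree_similarity(a, b) -> float:
    """Jaccard overlap of the two trees' subtree sets (0.0 to 1.0)."""
    sa, sb = subtrees(a), subtrees(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def filter_relevant(query, candidates, threshold=0.3):
    """Keep candidate answers whose parse trees overlap enough with the query's."""
    return [c for c in candidates if tree_similarity(query, c) >= threshold]
```

For a query tree `("NP", ("JJ", "grass"), ("NN", "pollen"))`, a candidate sentence whose parse contains that same noun phrase scores well above zero, while a candidate built from unrelated words scores 0.0 and is filtered out. Exact subtree matching is strict; a practical system would also match partially overlapping or generalised trees.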
Several companies have been started since 2009 which build systems that transform data into narrative using NLG and AI techniques. These include Arria NLG, Automated Insights, Retresco and Yseop, among others. Open-source NLG solutions exist as well.

Evaluation

As in other scientific fields, NLG researchers need to test how well their systems, modules, and algorithms work. This is called evaluation. There are three basic techniques for evaluating NLG systems:

Task-based evaluation: give the generated text to a person, and assess how well it helps them perform a task. For example, a system which generates summaries of medical data can be evaluated by giving these summaries to doctors, and assessing whether the summaries help doctors make better decisions.
Human ratings: give the generated text to a person, and ask them to rate the quality and usefulness of the text.
Metrics: compare generated texts to texts written by people from the same input data, using an automatic metric such as BLEU.
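The BLEU metric mentioned above compares n-grams in the generated text with those in one or more human-written references. The Python sketch below computes an unsmoothed, sentence-level BLEU with clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty; it uses whitespace tokenisation and is meant to show how the score is built, not to replace a standard corpus-level implementation such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, references: list[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights (whitespace tokens)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngrams(ref, n)  # element-wise maximum of counts
        clipped = sum(min(count, max_ref[g]) for g, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to a reference scores 1.0, one sharing no words with any reference scores 0.0, and a paraphrase that drops a few words lands in between, penalised both for missing n-grams and for being shorter than the reference.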

The ultimate goal is to determine how useful NLG systems are at helping people, which is what the first of the above techniques measures. However, task-based evaluations are time-consuming and expensive, and can be difficult to carry out. Hence task-based evaluations are the exception, not the norm.
Recently, researchers have been assessing how well human ratings and metrics correlate with task-based evaluations. This work is being conducted in the context of the Generation Challenges shared-task events. Initial results suggest that human ratings are much better than metrics in this regard: human ratings usually do predict task effectiveness at least to some degree, while scores produced by metrics often do not predict task effectiveness well. These results are preliminary. In any case, human ratings are the most popular evaluation technique in NLG, in contrast to machine translation, where metrics are widely used.
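Checking whether an evaluation technique predicts task effectiveness usually comes down to correlating two score lists over the same set of systems or texts, for example mean human ratings (or metric scores) against task-success rates. The Python sketch below implements Pearson correlation and Spearman rank correlation with average ranks for ties; the function names are hypothetical, and Spearman is included because human ratings are often treated as ordinal.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Given, say, one list of per-system human ratings and one of per-system task-success rates, `spearman(ratings, success)` near 1 would indicate that the ratings order systems the same way the task-based evaluation does; computing the same value for a metric allows the two techniques to be compared.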
