Natural Language Generation

🎯 Goal: generate natural language from a semantic representation (or other data)

Pollen Forecast

Pollen Forecast for Scotland

  • Taking six numbers as input, a simple NLG system generates a short textual summary of pollen levels

    “Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.”

  • The actual forecast (written by a human meteorologist) from the data

    “Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.”

Weather Forecast

  • Function: Produces textual weather reports in English and French

  • Input: Numerical weather simulation data annotated by human forecaster


Making choices

  • Content to be included/omitted

  • Organization of content into coherent structure

  • Style (formality, opinion, genre, personality…)

  • Packaging into sentences

  • Syntactic constructions

  • How to refer to entities (referring expression generation)

  • What words to use (lexical choice)

Rule-based methods

Six basic activities in NLG:

  1. Content determination

    Deciding what information to mention in the text

  2. Discourse planning

    Imposing ordering and structure over the information to convey

  3. Sentence aggregation

    Merging of similar sentences to improve readability and naturalness

  4. Lexicalization

    Deciding the specific words and phrases to express the concepts and relations

  5. Referring expression generation

    Selecting words or phrases to identify domain entities

  6. Linguistic realization

    Creating the actual text, which is correct according to the grammar rules of syntax, morphology and orthography

Three-stage pipeline architecture:

  • Text planning (activities 1 and 2)
  • Sentence planning (activities 3, 4, and 5)
  • Linguistic realization (activity 6)

Intermediate representations: Text plans

  • Represented as trees whose leaf nodes specify individual messages and internal nodes show how messages are conceptually grouped
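
A text plan of this kind can be sketched as a small tree type. This is only an illustration: the class names `Message` and `Group` and the grouping labels are assumptions, and the message names are borrowed from the weather example in these notes.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Message:
    """Leaf node: one unit of content to be expressed."""
    name: str


@dataclass
class Group:
    """Internal node: a conceptual grouping of messages or sub-groups."""
    label: str
    children: List[Union["Group", Message]] = field(default_factory=list)


def leaves(node: Union[Group, Message]) -> List[str]:
    """Collect the message names of a plan in left-to-right order."""
    if isinstance(node, Message):
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaves(child))
    return names


# Hypothetical plan for a monthly weather summary.
plan = Group("MonthlySummary", [
    Group("Rainfall", [Message("MonthlyRainFallMsg"), Message("RainSoFarMsg")]),
    Group("Temperature", [Message("MonthlyTemperatureMsg")]),
])
print(leaves(plan))
```

Reading the leaves from left to right gives the order in which the messages would be handed to sentence planning.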

Sentence plans

  • Template representation, possibly with some linguistic processing → represents sentences as boilerplate text plus the parameters to be inserted into it
  • Abstract sentential representation → specifies the content words (nouns, verbs, adjectives and adverbs) of a sentence and how they are related
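
The two kinds of sentence plan can be contrasted in a few lines. This is a minimal sketch: the template text, the `ordinal` helper and the dictionary layout of the abstract representation are all assumptions made for illustration.

```python
from string import Template

# Template representation: boilerplate text with one parameter (hypothetical).
RAIN_TEMPLATE = Template("Heavy rain fell on the $day.")


def ordinal(n: int) -> str:
    """A little linguistic processing: turn 27 into '27th', 1 into '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def fill_template(day: int) -> str:
    """Insert the parameter into the boilerplate text."""
    return RAIN_TEMPLATE.substitute(day=ordinal(day))


# Abstract sentential representation: content words and their relations only.
# The realizer, not this structure, decides word order and inflection.
abstract_plan = {
    "verb": "fall",
    "tense": "past",
    "subject": {"noun": "rain", "modifier": "heavy"},
    "time": {"preposition": "on", "day": 27},
}

print(fill_template(27))  # Heavy rain fell on the 27th.
```

The template is cheap to write but fixed in form, whereas the abstract plan leaves the surface form to the linguistic realizer.
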

Text/Document planner

  • Determine

    • what information to communicate
    • how to structure information into a coherent text
  • Common Approaches:

    • methods based on observations about common text structures (Schemas)

    • methods based on reasoning about the purpose of the text and discourse coherence (Rhetorical Structure Theory, planning)

Content Selection

Rhetorical predicates

  • Attribute

    E.g. Mary has a pink coat.

  • Equivalence

    E.g. Wines described as ‘great’ are fine wines from an especially good village.

  • Specification

    E.g. [The machine is heavy.] It weighs 2 tons.

  • Constituency

    E.g. [This is an octopus.] There is his eye, these are his legs, and he has these suction cups.

  • Evidence

    E.g. [The audience recognized the difference.] They started laughing right from the very first frames of that film.

Corpus-based content selection

(Take weather forecast as example)

  • Routine messages: always included
    • E.g.
      • MonthlyRainFallMsg
      • MonthlyTemperatureMsg
      • RainSoFarMsg
      • MonthlyRainyDaysMsg
  • Significant Event messages: Only constructed if the data warrants it
    • E.g. if rain occurs on more than a specified number of days in a row
      • RainEventMsg
      • RainSpellMsg
      • TemperatureEventMsg
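
The split between routine and significant-event messages can be sketched as follows. The input format (a list of daily rainfall values), the message fields and the threshold parameter `threshold_days` are assumptions for illustration, not the format of any particular system.

```python
from typing import Dict, List


def select_messages(daily_rain_mm: List[float], threshold_days: int = 2) -> List[Dict]:
    """Build routine messages always, and a RainSpellMsg only if the data warrants it."""
    # Routine messages: always included.
    messages: List[Dict] = [
        {"type": "MonthlyRainFallMsg", "total_mm": sum(daily_rain_mm)},
        {"type": "MonthlyRainyDaysMsg", "days": sum(1 for r in daily_rain_mm if r > 0)},
    ]
    # Significant events: rain on more than `threshold_days` days in a row.
    run_start = None
    # A trailing dry "sentinel" day closes a spell that runs to the end of the data.
    for day, rain in enumerate(daily_rain_mm + [0.0], start=1):
        if rain > 0:
            if run_start is None:
                run_start = day
        else:
            if run_start is not None and day - run_start > threshold_days:
                messages.append(
                    {"type": "RainSpellMsg", "start": run_start, "end": day - 1}
                )
            run_start = None
    return messages


for msg in select_messages([0, 5, 3, 1, 0, 2, 0]):
    print(msg)
```

With this input the three-day spell on days 2 to 4 produces a `RainSpellMsg`, while the single rainy day 6 does not.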


Define Schemas

Produces a text/document plan

  • a tree structure populated by messages at its leaf nodes


Sentence Aggregation

  • Deciding how messages should be composed together to produce specifications for sentences or other linguistic units

  • On the basis of

    • Information content
    • Possible forms of realization
    • Semantics
  • Some possibilities:

    • Simple conjunction
    • Ellipsis
    • Embedding
    • Set introduction
  • Example

    • Without aggregation:

      Heavy rain fell on the 27th. 
      Heavy rain fell on the 28th.
    • Aggregation via simple conjunction:

      Heavy rain fell on the 27th and heavy rain fell on the 28th.
    • Aggregation via ellipsis:

      Heavy rain fell on the 27th and [] on the 28th.
    • Aggregation via set introduction:

      Heavy rain fell on the 27th and 28th.
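
The aggregation variants from the example can be reproduced with simple string operations. This is a toy sketch that hard-codes the clause "heavy rain fell"; a real sentence planner would operate on message structures rather than strings.

```python
from typing import List

CLAUSE = "heavy rain fell"  # shared subject and verb of all messages (assumed)


def _sentence(text: str) -> str:
    """Capitalize the first letter and add a full stop."""
    return text[0].upper() + text[1:] + "."


def no_aggregation(days: List[str]) -> List[str]:
    return [_sentence(f"{CLAUSE} on the {d}") for d in days]


def simple_conjunction(days: List[str]) -> str:
    return _sentence(" and ".join(f"{CLAUSE} on the {d}" for d in days))


def ellipsis(days: List[str]) -> str:
    # The repeated subject and verb are dropped from every clause after the first.
    rest = "".join(f" and on the {d}" for d in days[1:])
    return _sentence(f"{CLAUSE} on the {days[0]}{rest}")


def set_introduction(days: List[str]) -> str:
    # The days are merged into one set-denoting phrase.
    if len(days) == 1:
        joined = days[0]
    else:
        joined = ", ".join(days[:-1]) + " and " + days[-1]
    return _sentence(f"{CLAUSE} on the {joined}")


days = ["27th", "28th"]
print(no_aggregation(days))
print(simple_conjunction(days))
print(ellipsis(days))
print(set_introduction(days))
```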


Lexicalization

  • Choose words and syntactic structures to express content selected
  • If several lexicalizations are possible, consider:
    • user knowledge and preferences

    • consistency with previous usage

    • pragmatics: emphasis, level of formality, personality, …

    • interaction with other aspects of microplanning

  • Example
    • S: rainfall was very poor

    • NP: a much worse than average rainfall

    • ADJP: much drier than average

Generating Referring Expressions (GRE)

  • Identify specific domain objects and entities

  • GRE produces a description of an object or event that allows the hearer to distinguish it from distractors

  • Issues

    • Initial introduction of an object

    • Subsequent references to an already salient object

  • Example

    • Referring to months:

      • June 1999

      • June

      • the month

      • next June

    • Referring to temporal intervals

      • 8 days starting from the 11th
      • From the 11th to the 18th

    (Relatively simple, so can be hardcoded in document planning)
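
A hard-coded rule set for referring to months might look like the sketch below. The discourse context structure (a `mentioned` set and a `focus` entry) and the order of the rules are assumptions made for illustration.

```python
from typing import Dict


def refer_to_month(month: str, year: int, context: Dict) -> str:
    """Choose a referring expression for a month given what was said before."""
    target = (month, year)
    # Distractors: already mentioned months with the same name but another year.
    distractors = {m for m in context["mentioned"] if m[0] == month and m[1] != year}
    if context["focus"] == target:
        expr = "the month"          # currently salient: a short form is enough
    elif target in context["mentioned"] and not distractors:
        expr = month                # known and unambiguous: drop the year
    else:
        expr = f"{month} {year}"    # initial introduction: full description
    context["mentioned"].add(target)
    context["focus"] = target
    return expr


ctx = {"mentioned": set(), "focus": None}
print(refer_to_month("June", 1999, ctx))  # June 1999
print(refer_to_month("June", 1999, ctx))  # the month
print(refer_to_month("July", 1999, ctx))  # July 1999
print(refer_to_month("June", 1999, ctx))  # June
```

The first mention uses the full description, and later mentions shorten it as long as no distractor makes the short form ambiguous.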


Linguistic Realization

  • 🎯 Goal: to convert text specifications into actual text

  • Purpose: hide the peculiarities of the target language from the rest of the NLG system



Evaluation

  • Task-based (extrinsic) evaluation

    • how well the generated text helps to perform a task
  • Human ratings

    • quality and usefulness of the text
  • Metrics

    • e.g. BLEU (Bilingual Evaluation Understudy)

    • Quality is taken to be the correspondence between a machine's output and that of a human
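
To make the idea behind BLEU concrete, here is a simplified sentence-level version with a single reference and no smoothing. It is a sketch for illustration only; the function names and the default of `max_n=2` are assumptions, and established toolkits should be used for real evaluations.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Geometric mean of modified n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision gives score 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "pollen levels are high today"
print(simple_bleu("pollen levels are high today", reference))
print(simple_bleu("pollen levels remain high today", reference))
print(simple_bleu("heavy rain is expected", reference))
```

An exact match scores 1.0, a partial match scores between 0 and 1, and a candidate sharing no words with the reference scores 0.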

Statistical methods

Problems of conventional NLG components

  • expensive to build
    • need lots of handcrafting or a well-labeled dataset to be trained on
  • kind and amount of available data severely limits the development 😢
  • makes cross-domain, multi-lingual SDSs (Spoken Dialogue Systems) intractable 😢


  • human languages are context-aware
  • a natural response should be learned directly from data rather than depend on predefined syntax or rules

Deep Learning NLG

  • Significant progress in applying statistical methods to SLU (spoken language understanding) and DM (dialogue management) over the past decade

    • including making them more easily extensible to other application/domains
  • Data-driven NLG for SDSs is relatively unexplored because of the difficulty, mentioned above, of collecting semantically annotated corpora

    • rule-based NLG remains the norm for most systems
  • Goal of the NLG component of an SDS:

    map an abstract dialog act consisting of an act type and a set of attribute(slot)-value pairs into an appropriate surface text
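
The input and output of this mapping can be illustrated with a rule-based baseline. The `DialogueAct` class, the template table and the restaurant values are hypothetical; the upper-case placeholders follow the slot-name convention that these notes later call delexicalization.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DialogueAct:
    """An act type plus attribute(slot)-value pairs."""
    act_type: str
    slots: Dict[str, str]


# Hypothetical hand-written templates, keyed by act type and sorted slot names.
TEMPLATES = {
    ("inform", ("food", "name")): "NAME serves FOOD food.",
    ("inform", ("area", "name")): "NAME is in the AREA part of town.",
}


def generate(act: DialogueAct) -> str:
    """Look up a template for the act and fill in the slot values."""
    key = (act.act_type, tuple(sorted(act.slots)))
    text = TEMPLATES[key]  # raises KeyError if no template covers this act
    for slot, value in act.slots.items():
        text = text.replace(slot.upper(), value)
    return text


act = DialogueAct("inform", {"name": "Seven Days", "food": "Chinese"})
print(generate(act))  # Seven Days serves Chinese food.
```

A data-driven generator replaces the template table with a learned model but keeps the same interface: dialogue act in, surface text out.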

(RNN-based) Generation

  • Conditional text generation

    • Output texts vary in length
  • Use RNN-based neural network

  • Decoding

    • Initialize RNN with input

      • Hidden state or first input
    • Generate output probability for first word

    • Sample first word/Select most probable word

    • Insert selected word into RNN

    • Continue till <eos>
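
The decoding loop can be sketched without a neural network. In the code below a hand-written table `NEXT_WORD` stands in for the RNN's output distribution; this is an assumption made to keep the example runnable, and unlike a real RNN the table conditions only on the previous word rather than on a hidden state.

```python
import random
from typing import Dict, List, Optional

BOS, EOS = "<bos>", "<eos>"

# Hypothetical next-word probabilities (a stand-in for the RNN softmax output).
NEXT_WORD: Dict[str, Dict[str, float]] = {
    BOS: {"NAME": 0.8, "FOOD": 0.2},
    "NAME": {"serves": 0.7, "offers": 0.3},
    "serves": {"FOOD": 1.0},
    "offers": {"FOOD": 1.0},
    "FOOD": {"food": 0.9, EOS: 0.1},
    "food": {EOS: 1.0},
}


def decode(sample: bool = False, max_len: int = 20,
           rng: Optional[random.Random] = None) -> List[str]:
    """Generate words one at a time until <eos> (or max_len) is reached."""
    rng = rng or random.Random(0)
    word, output = BOS, []          # initialize with the start symbol
    for _ in range(max_len):
        probs = NEXT_WORD[word]     # output probabilities for the next word
        if sample:
            word = rng.choices(list(probs), weights=list(probs.values()))[0]
        else:
            word = max(probs, key=probs.get)  # select the most probable word
        if word == EOS:
            break
        output.append(word)         # the selected word is fed back as next input
    return output


print(" ".join(decode()))             # NAME serves FOOD food
print(" ".join(decode(sample=True)))
```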


🔴 Challenges

  • Large vocabulary

    • Names of all restaurants

    • Delexicalization: Replace slot values by slot names

  • Vanishing gradient

    • Repeated input

    • Gating of input vector

      • Problem: NAME is output several times

      • Remove NAME from S once it has been output

  • Only backward dependencies

    • Rerank output with different models

      • N-Best list reranking
        • Cannot look at all possible outputs
        • But: generate several good outputs (e.g. top 10; top 100)
        • Then we can also use other models to evaluate them
        • Possible to select a different one
          • But if a good output is not in the N-best list, we cannot find it 🤪
      • N-Best generation
        • Beam search
          • Select top $k$ words at timestep 1
          • Independently insert all of them at timestep 2
            • Select top $k$ words
          • $k \times k$ possible outputs at timestep 2
          • Filter top $k$
          • Continue with top $k$ at timestep 3
    • Right to left

      • Rescoring


    • Inverse direction
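
The beam search procedure described above can be sketched as follows. As an assumption for illustration, a small hand-written table `NEXT_WORD` replaces the RNN and conditions only on the previous word; the probabilities are chosen so that the greedy first choice does not lead to the best complete output.

```python
import math
from typing import Dict, List, Tuple

BOS, EOS = "<bos>", "<eos>"

# Hypothetical next-word probabilities (a stand-in for the RNN softmax output).
NEXT_WORD: Dict[str, Dict[str, float]] = {
    BOS: {"NAME": 0.6, "enjoy": 0.4},
    "NAME": {"serves": 0.5, "offers": 0.5},
    "serves": {"FOOD": 1.0},
    "offers": {"FOOD": 1.0},
    "enjoy": {"FOOD": 1.0},
    "FOOD": {"food": 1.0},
    "food": {EOS: 1.0},
}


def beam_search(k: int = 2, max_len: int = 10) -> List[Tuple[float, List[str]]]:
    """Return finished hypotheses as (probability, words), best first."""
    beams = [(0.0, [BOS])]              # (log probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            probs = NEXT_WORD[words[-1]]
            # Expand each hypothesis with its top k words: up to k * k candidates.
            top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
            for word, p in top:
                candidates.append((score + math.log(p), words + [word]))
        # Keep only the k best candidates overall.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:k]:
            (finished if words[-1] == EOS else beams).append((score, words))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    # Strip <bos>/<eos> and convert log probabilities back to probabilities.
    return [(math.exp(score), words[1:-1]) for score, words in finished]


for prob, words in beam_search(k=2):
    print(round(prob, 2), " ".join(words))
```

With `k=1` the search is greedy and commits to "NAME" at the first step, ending with probability 0.3. With `k=2` the hypothesis starting with "enjoy" survives and wins with probability 0.4. The returned list is also an N-best list that other models could rerank.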

Left-to-right decoding

  • RNN allows generation from left-to-right
    • 👍 Advantages
      • Do not need to generate all possible output and then evaluate
      • Possible for most tasks
    • 👎 Disadvantages
      • No global view

      • Word probability depends only on previous words

      • Non-optimal modeling if all slots have been filled

Generating long sequence

  • RNNs prefer short sequences → hard to train on long sequences 😢

    • Incoherent E.g. The sun is the center of the sun
    • Redundant E.g. I like cake and cake
    • Contradictory E.g. I don’t own a gun, but I do own a gun
  • 💡 Idea:

    • Generate only fixed-length segments

    • Condition on input and previous target sequence

Generating by editing

  • A similar sentence should already be in the training data

    • Edit this sentence instead of generating new sentence
  • 💡Idea

    • Find similar sentence

    • Combine edit vector and input sentence

    • Generate output sentence

  • Use sequence to sequence model

    • Again RNN

    • But easier to copy than to generate
