Language Models

Summary (TL;DR)

Language models offer a way to assign a probability to a sentence or other sequence of words, and to predict a word from preceding words.

- $P(w \mid h)$: probability of word $w$ given history $h$.
- An n-gram model estimates a word from a fixed window of previous words:
  $$ P\left(w_{n} \mid w_{1}^{n-1}\right) \approx P\left(w_{n} \mid w_{n-N+1}^{n-1}\right) $$
- n-gram probabilities can be estimated by counting in a corpus and normalizing (maximum likelihood estimation, MLE):
  $$ P\left(w_{n} \mid w_{n-N+1}^{n-1}\right)=\frac{C\left(w_{n-N+1}^{n-1} w_{n}\right)}{C\left(w_{n-N+1}^{n-1}\right)} $$

Evaluation
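The MLE estimate above can be sketched directly as counting and normalizing. A minimal sketch for bigrams ($N=2$), using a made-up toy corpus with `<s>`/`</s>` as illustrative sentence-boundary markers:

```python
from collections import Counter

# Toy corpus (illustrative); <s> and </s> mark sentence boundaries.
corpus = [
    ["<s>", "i", "am", "sam", "</s>"],
    ["<s>", "sam", "i", "am", "</s>"],
    ["<s>", "i", "do", "not", "like", "green", "eggs", "and", "ham", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)

def mle_bigram(w_prev, w):
    """MLE estimate: P(w | w_prev) = C(w_prev w) / C(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(mle_bigram("<s>", "i"))  # C(<s> i) = 2, C(<s>) = 3 -> 2/3
print(mle_bigram("i", "am"))   # C(i am) = 2, C(i) = 3 -> 2/3
```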

2020-08-03

Perplexity’s Relation to Entropy

Recall: a better n-gram model is one that assigns a higher probability to the test data, and perplexity is the inverse probability of the test set, normalized by the number of words.
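That normalized inverse probability, $PP(W)=P(w_1 \ldots w_N)^{-1/N}$, can be sketched as follows; the per-word probabilities here are made up for illustration, and computing in log space is a standard choice to avoid underflow:

```python
import math

# Hypothetical per-word probabilities some model assigns to a test sequence.
word_probs = [0.2, 0.1, 0.5, 0.25, 0.1]

def perplexity(probs):
    """PP(W) = P(w_1 .. w_N)^(-1/N), computed in log space for stability."""
    n = len(probs)
    log_prob = sum(math.log(p) for p in probs)
    return math.exp(-log_prob / n)

print(perplexity(word_probs))
# Sanity check: a model assigning uniform probability 1/V to every word
# has perplexity V (up to float rounding):
print(perplexity([0.25] * 4))
```

Note how a higher probability on the test data gives a lower perplexity, matching the "lower is better" reading of the metric.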

2020-08-03

Smoothing

To keep a language model from assigning zero probability to these unseen events, we’ll have to shave off a bit of probability mass from some more frequent events and give it to the events we’ve never seen.
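The simplest way to shave off that probability mass is add-one (Laplace) smoothing, which pretends every n-gram was seen one extra time. A minimal sketch on a made-up training text, with $V$ the vocabulary size:

```python
from collections import Counter

# Tiny illustrative training text; V is the vocabulary seen in training.
tokens = "the cat sat on the mat the cat ran".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)

def laplace_bigram(w_prev, w):
    """Add-one (Laplace) smoothing:
    P(w | w_prev) = (C(w_prev w) + 1) / (C(w_prev) + V)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

# A seen bigram keeps most of its mass; an unseen one now gets a small,
# nonzero share instead of probability 0.
print(laplace_bigram("the", "cat"))  # seen:   (2 + 1) / (3 + V)
print(laplace_bigram("cat", "mat"))  # unseen: (0 + 1) / (2 + V)
```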

2020-08-03

Generalization and Zeros

The n-gram model, like many statistical models, is dependent on the training corpus. One implication is that the probabilities often encode specific facts about a given training corpus; another is that n-grams do a better and better job of modeling the training corpus as we increase the value of $N$.
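The "zeros" problem follows directly: any test n-gram absent from the training corpus gets MLE probability 0, which zeroes out the whole test sequence. A small sketch on made-up data:

```python
from collections import Counter

# Train on one tiny "corpus", then query a bigram it never saw: the MLE
# estimate assigns it probability 0 -- the problem smoothing addresses.
train = "denied the allegations denied the reports".split()
unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))

def mle(w_prev, w):
    """Unsmoothed MLE estimate: C(w_prev w) / C(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(mle("denied", "the"))  # seen in training -> 1.0
print(mle("the", "offer"))   # never seen -> 0.0, so any test sentence
                             # containing it gets probability 0
```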

2020-08-03

Evaluating Language Models

Extrinsic evaluation is the best way to evaluate the performance of a language model: embed the LM in an application and measure how much the application improves. For speech recognition, we can compare the performance of two language models by running the speech recognizer twice, once with each language model, and seeing which gives the more accurate transcription.

2020-08-03

N Gram

Language models (LMs): models that assign probabilities to sequences of words.

N-gram: a sequence of N words. E.g., for "Please turn your homework ...":

- bigram (2-gram): a two-word sequence such as "please turn", "turn your", or "your homework"
- trigram (3-gram): a three-word sequence such as "please turn your" or "turn your homework"

Motivation: $P(w \mid h)$, the probability of a word $w$ given some history $h$.

2020-08-02

Language Modeling (N-Gram)

2020-08-02