Interrogating a National Narrative with GPT-2

Chantal Brousseau

This lesson teaches you how to apply Generative Pre-trained Transformer 2 (GPT-2), a large open-source language model, to a large-scale text corpus in order to produce automatically written responses to prompts based on the contents of that corpus. These responses help you locate the broader themes and trends that emerge from your sources. This method of analysis is useful for historical inquiry because it allows a narrative crafted over years and across thousands of texts to be aggregated and condensed, then analyzed through direct questioning. In essence, it allows you to “talk” to your sources.

To do this, we will use an implementation of GPT-2 wrapped in a Python package that simplifies fine-tuning an existing machine learning model. Although the code in this tutorial is not complex, learning this method of exploratory data analysis will give you insight into common machine learning terminology and concepts that apply to other branches of machine learning. Beyond interrogating history, we will also interrogate the ethics of producing this form of research, from its broader environmental impact to the way even a single passage of generated text can be misinterpreted and recontextualized.

Learning outcomes

After completing this lesson, you will be able to:

Apply GPT-2 to a large-scale text corpus in order to produce automatically written responses to prompts based on the contents of that corpus
Recognize common machine learning terminology and concepts that can be applied to other branches of machine learning
Understand the ethical complications of producing this form of research

Interested in learning more?

Check out this lesson on Programming Historian's website
