Interrogating a National Narrative with GPT-2
This lesson teaches you how to apply Generative Pre-trained Transformer 2 (GPT-2), a large open-source language model, to a large-scale text corpus in order to produce automatically written responses to prompts based on the contents of that corpus. These responses help you locate the broader themes and trends that emerge from your body of sources. This method of analysis is useful for historical inquiry because it allows a narrative crafted over years and across thousands of texts to be aggregated and condensed, then analyzed through direct inquiry. In essence, it allows you to “talk” to your sources.
To do this, we will use an implementation of GPT-2 wrapped in a Python package that simplifies fine-tuning an existing machine learning model. Although the code in this tutorial is not complex, learning this method of exploratory data analysis will give you insight into common machine learning terminology and concepts that apply to other branches of machine learning. Beyond interrogating history, we will also interrogate the ethics of producing this form of research, from its broader environmental impact to how even a single passage of generated text can be misinterpreted and recontextualized.
Learning outcomes
After completing this lesson, you will be able to:
- Apply GPT-2 to a large-scale text corpus in order to produce automatically written responses to prompts based on the contents of that corpus
- Understand common machine learning terminology and concepts that apply to other branches of machine learning
- Understand the ethical complications of producing this form of research
Check out this lesson on Programming Historian's website