
Talkie: A 13B Language Model Trained Exclusively on Pre-1931 Text

Stanley Ulili
Updated on May 10, 2026

Talkie is a 13-billion parameter language model trained on 260 billion tokens of English text published before 1931. Its training data predates digital computers, the internet, World War II, and modern slang. The model was created in part by Alec Radford, lead author of the 2018 OpenAI paper on generative pre-trained transformers that laid the architectural foundation for the GPT series.

Why train on pre-1931 text

Screenshot of the blog post announcing Talkie titled "Introducing talkie: a 13B vintage language model from 1930"

The 1931 cutoff is a copyright decision. Works published before 1931 have generally entered the public domain in the United States, making large-scale digitization and use legally straightforward.

The more significant motivation is research: Talkie serves as a contamination-free baseline for studying AI capabilities.

Modern LLMs are trained on massive web datasets that increasingly contain AI-generated content. When a current model answers a question correctly, it is difficult to determine whether it reasoned its way to the answer or retrieved a similar question-answer pair from its training data, potentially one written by another model.

Diagram illustrating "The Contamination Problem" showing modern LLMs training on web content populated by AI-generated output creating a feedback loop

Talkie's training data was created before digital computers existed. Any emergent behavior it displays is far more likely to be a genuine property of the model's learning process than a retrieval of memorized content.

Screenshot of Alec Radford's Wikipedia page highlighting his role as lead author on OpenAI's 2018 paper on generative pre-trained transformers

Knowledge boundaries

Talkie's responses reflect its pre-1931 knowledge consistently. Asked what the internet is, it appears to confuse the term with "internal revenue" and explains a 17th-century English tax on goods. Asked about popular slang, it produces terms like "bosh," "rot," "fudge," "gammon," "humbug," and "ribaldry."

Talkie interface showing its response to "What is the internet" explaining internal revenue tax instead

Talkie's list of popular slang words including "bosh," "rot," and "gammon"

In-context learning of Python

One experiment tested whether a model whose training data predates computers can learn to write Python code from a few in-context examples. When asked what a computer is, Talkie defines it as "a person whose business is to compute or calculate," which was the original meaning of the word.

Researchers provided several complete Python function examples in the prompt, then gave the beginning of a new function and asked Talkie to complete it. In one test, after being shown an encode_shift function that shifts characters forward by 5 positions in the alphabet, Talkie correctly generated a corresponding decode_shift function by changing + 5 to - 5.

Python code for encode_shift and Talkie's correct generated code for decode_shift demonstrating its understanding of inverse functions

This is not pattern matching on the training data, since Python did not exist in 1931. It is an emergent capability acquired entirely from context, demonstrating that sufficiently large models can learn novel concepts well outside their training distribution.
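The exchange can be reconstructed as ordinary Python. The function names and the shift-by-5 behavior come from the article; the exact function bodies are an assumption, using the usual wrap-around convention for a shift cipher over lowercase letters:

```python
def encode_shift(s: str) -> str:
    """Shift each lowercase letter forward by 5 positions, wrapping z around to e."""
    return "".join(chr(((ord(ch) - ord("a") + 5) % 26) + ord("a")) for ch in s)


def decode_shift(s: str) -> str:
    """Inverse of encode_shift: shift each letter back by 5 positions."""
    return "".join(chr(((ord(ch) - ord("a") - 5) % 26) + ord("a")) for ch in s)
```

Talkie's reported completion amounts to producing the second function given the first: recognizing that undoing a forward shift of 5 requires a backward shift of 5, so that decode_shift(encode_shift(s)) returns the original string.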

Historical forecasting

Researchers measured how surprising historical events were to Talkie by feeding it short descriptions of events from the New York Times "On This Day" feature and recording its prediction error over time. Events become progressively less predictable after the 1931 cutoff, with a marked jump in surprisal through the 1950s and 1960s.
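At its core, this kind of surprisal metric is the model's average negative log-probability over the tokens of the event description. A minimal sketch in plain Python, assuming per-token probabilities have already been extracted from the model (the article does not specify the exact metric used):

```python
import math


def mean_surprisal_bits(token_probs: list[float]) -> float:
    """Average surprisal in bits: -log2 of each token's probability,
    averaged over the sequence. Higher values mean the text was more
    surprising to the model."""
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)


# A sequence where every token was a 50/50 guess costs exactly 1 bit per token.
print(mean_surprisal_bits([0.5, 0.5, 0.5]))  # → 1.0
```

Plotting this quantity against event date is what produces the trend described above: descriptions of post-cutoff events draw lower probabilities from the model, so their average surprisal rises.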

Asking Talkie to forecast events it cannot have seen reveals the assumptions embedded in pre-1931 thinking. On the prospect of another war in Europe, it suggests one is unlikely because "the great Powers are exhausted, and the peoples are yearning for peace," echoing the post-World War I sentiment that a second world war was inconceivable. On Adolf Hitler, drawing only on information available in the early 1930s, before the full extent of his regime was known, Talkie describes him as an "extraordinary personality" whose nationalism might bring "far more efficient administration." These answers show how faithfully a model absorbs the biases and blind spots of its training era.

Building challenges

Temporal leakage. The primary technical challenge is preventing post-1931 content from entering the training data. Archive metadata can be incorrect: a 1925 newspaper scan might be timestamped as 2005. Old texts are also frequently republished with modern additions: a 19th-century novel in a 1950 edition may include a new editor's introduction. Talkie shows minor signs of leakage, such as knowing who the US president was in 1936.
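A first-pass leakage filter can combine metadata checks with an anachronism lexicon, flagging documents whose catalog year or vocabulary postdates the cutoff. The sketch below is illustrative only: the term list is a tiny stand-in, and the project's actual filtering pipeline is not documented.

```python
# Illustrative stand-in lexicon; a real pipeline would use a far larger
# list plus edition-level metadata to catch modern prefaces and reprints.
ANACHRONISMS = ("world war ii", "television broadcast", "nylon", "internet")


def flag_leakage(text: str, catalog_year: int, cutoff: int = 1931) -> list[str]:
    """Return human-readable reasons a document might contain post-cutoff content.
    An empty list means no red flags were found (not a guarantee of cleanliness)."""
    reasons = []
    if catalog_year >= cutoff:
        reasons.append(f"catalog year {catalog_year} is at or after the {cutoff} cutoff")
    lowered = text.lower()
    for term in ANACHRONISMS:
        if term in lowered:
            reasons.append(f"anachronistic term: {term!r}")
    return reasons
```

A filter like this catches the easy cases (a 2005 timestamp on a 1925 scan, a modern introduction mentioning World War II) but not subtler leaks, which is consistent with Talkie's residual knowledge of mid-1930s facts.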

OCR quality. Most pre-1931 text exists only in physical form and must be digitized using optical character recognition. OCR struggles with old newsprint. Training on uncleaned OCR text produced only 30% of the learning efficiency of models trained on clean text. Applying regex-based cleaning to fix common OCR errors improved this to nearly 70%. The team is working toward a purpose-built model for historical document OCR.
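A regex cleaning pass of the kind described might look like the following sketch. These particular substitutions (the archaic long s, b/h confusion in "the", and rejoining words hyphenated across line breaks) are common OCR fixes for old newsprint, not the team's actual rule set:

```python
import re

# Each entry is (compiled pattern, replacement), applied in order.
OCR_FIXES = [
    (re.compile(r"ſ"), "s"),                 # archaic long s misread as its own glyph
    (re.compile(r"\btbe\b"), "the"),         # classic b/h confusion in worn type
    (re.compile(r"\bvv"), "w"),              # double-v printed or read as two letters
    (re.compile(r"(\w)-\n(\w)"), r"\1\2"),   # rejoin words hyphenated at line breaks
]


def clean_ocr(text: str) -> str:
    """Apply a fixed list of regex substitutions to raw OCR output."""
    for pattern, replacement in OCR_FIXES:
        text = pattern.sub(replacement, text)
    return text
```

Rule lists like this are cheap and transparent, which fits the reported result: even simple substitutions recovered much of the learning efficiency lost to noisy scans, while the long tail of errors motivates the purpose-built OCR model.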

Post-training without a period-appropriate judge. After pre-training, LLMs are typically fine-tuned for instruction-following using a judge model that rates responses. No model from the 1930s exists to serve this role. The team used Claude Sonnet 4.6 as the judge, which introduced a "modern style leak": the judge favors concise, bullet-pointed answers that were not characteristic of 1930s writing. To generate training data for this phase, the researchers built a custom dataset from period-appropriate sources including etiquette manuals, letter-writing guides, cookbooks, and fables.
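The selection step in this phase amounts to best-of-n sampling scored by a judge. A minimal sketch with a stub judge standing in for Claude Sonnet 4.6; the real scoring rubric is not described in the source, and the bullet-point penalty below is purely illustrative of one way to push back against the modern style leak:

```python
def select_for_sft(prompt: str, candidates: list[str], judge) -> str:
    """Keep the judge's highest-rated candidate as instruction-tuning data.
    `judge` is any callable (prompt, response) -> numeric score."""
    return max(candidates, key=lambda c: judge(prompt, c))


def toy_judge(prompt: str, response: str) -> float:
    """Hypothetical judge that rewards length and penalizes bullet lists,
    nudging selection toward 1930s-style flowing prose."""
    bullet_penalty = response.count("\n- ") + response.count("\n* ")
    return len(response.split()) - 10 * bullet_penalty
```

The structural point is that whatever model fills the `judge` slot imprints its preferences on the selected data, which is exactly how a modern judge leaks modern style into a vintage model.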

Future work

The team is working toward a GPT-3-level vintage model. They estimate that a corpus of over one trillion tokens of historical text is achievable, enough to train a model with GPT-3.5-level capabilities rooted entirely in pre-1931 knowledge.

Final thoughts

Talkie is primarily a research instrument. It provides a baseline that is free from modern data contamination, making it useful for studying emergent capabilities, the mechanics of in-context learning, and how training data shapes the assumptions and biases of an AI system.

The in-context Python learning result is the most technically interesting finding: a model with no exposure to programming concepts can learn a new formal language from a handful of examples, and reason correctly about the inverse relationship between two functions. This suggests that certain forms of logical reasoning emerge from scale and language modeling rather than requiring direct exposure to the relevant domain.

The project documentation and model details are at talkie.ai.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.