I would like to see what their process for safety alignment and guardrails is with that model. They give some spicy examples on github, but the responses are tepid and a lot more diplomatic than I would expect.
Moreover, the prose sounds too modern. It seems the base model was trained partly on a contemporary corpus: something like 30% modern, 70% Victorian content.
Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.
“Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.”
“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”
This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.
When you put it that way it reminds me of the Severn/Keats character in the Hyperion Cantos. Far-future AIs reconstruct historical figures from their writings in an attempt to gain philosophical insights.
This is definitely fascinating - being able to do AI brain surgery, and selectively tuning its knowledge and priors, you'd be able to create awesome and terrifying simulations.
"...what do you mean, 'World War One?'"
I remember reading a children's book when I was young and the fact that people used the phrase "World War One" rather than "The Great War" was a clue to the reader that events were taking place in a certain time period. Never forgot that for some reason.
I failed to catch the clue, btw.
> "...what do you mean, 'World War One?'"
Oh sorry, spoilers.
(Hell, I miss Capaldi)
… what do you mean, an internet where everything wasn't hidden behind anti-bot captchas?
This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.
I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems
The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated text detection on output from models trained with a recent cutoff date. Some of the sample responses seem so unlike anything an LLM would say, obviously due to its apparent beliefs on certain concepts, though also perhaps less obviously due to its word choice and sentence structure making the responses feel slightly 'old-fashioned'.
I used to teach 19th-century history, and the responses definitely sound like a Victorian-era writer. And they of course sound like writing (books and periodicals etc) rather than "chat": as other responders allude to, the fine-tuning or RL process for making them good at conversation was presumably quite different from what is used for most chatbots, and they're leaning very heavily into the pre-training texts. We don't have any living Victorians to RLHF on: we just have what they wrote.
To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.
Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!
I wonder if the historical format you might want to look at for "chat" is letters? Definitely wordier, but it at least has the back-and-forth feel, and we often have complete correspondence over long stretches from certain figures.
This would probably get easier towards the start of the 20th century ofc
Good point, informal letters might actually be a better source - AI chat is (usually) a written rather than spoken interaction after all! And we do have a lot of transcribed collections of letters to train on, although they’re mostly from people who were famous or became famous, which certainly introduces some bias.
While not specifically Victorian, couldn't we learn much from what daily conversations were like by looking at surviving oral cultures, or other relatively secluded communal pockets? I'd also say time and progress are not always equally distributed, and even within geographical regions (as the U.K.) there are likely large differences in the rate of language shifts since then, some possibly surviving well into the 20th century.
The time cutoff probably matters, but maybe not as much as the lack of human finetuning from places like Nigeria with somewhat foreign styles of English. I'm not really sure there is as much of an 'obvious LLM text style' in other languages; it hasn't seemed that way in my limited attempts to speak to LLMs in languages I'm studying.
The model is fine-tuned for chat behavior, so the style might be due to the fine-tuning, or to more stylised text in the corpus; English has evolved a lot in the last century.
There is. I have observed it in both Chinese and Japanese.
I can imagine the political and judicial battles already, like with textualists who feel the constitution should be understood as the text and only the text, with its specific words and legal formulations given their known meaning at the time.
“The model clearly shows that Alexander Hamilton and Monroe were much more in agreement on topic X, rendering the common textualist interpretation of it, and the Supreme Court rulings resting on that now specious interpretation, null and void!”
On what data is it trained?
On one hand it says it's trained on,
> 80B tokens of historical data up to knowledge-cutoffs ∈ {1913, 1929, 1933, 1939, 1946}, using a curated dataset of 600B tokens of time-stamped text.
Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all the examples are about Europe), it would include the ancient Greeks, Romans, the Scholastics, Charlemagne, and so on, all the way up to 1913.
On the other hand, they seem to say it represents the perspective of 1913; for example,
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.
People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?
They apparently pre-train with all data up to 1900 and then fine-tune with 1900-1913 data. Anyway, the amount of available content tends to increase quickly over time, since mass literature, periodicals, newspapers, etc. only really became a thing over the 19th and early 20th centuries.
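To make concrete what a two-stage recipe like that could look like, here is a minimal sketch; the file names, base checkpoint, and hyperparameters are purely illustrative assumptions, not the authors' actual setup:

```python
# Illustrative sketch only: "pretrain on pre-1900 text, then continue
# training on 1900-1913 text". Checkpoint, data files and hyperparameters
# are stand-ins, not the project's real configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small stand-in architecture
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

def tok(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Hypothetical local text files split by publication date.
pre_1900 = load_dataset("text", data_files="pre_1900.txt")["train"].map(
    tok, batched=True, remove_columns=["text"])
late = load_dataset("text", data_files="1900_1913.txt")["train"].map(
    tok, batched=True, remove_columns=["text"])

# Stage 1: pretrain on everything up to 1900.
Trainer(model=model, data_collator=collator,
        args=TrainingArguments("stage1", num_train_epochs=1),
        train_dataset=pre_1900).train()

# Stage 2: continue on 1900-1913 only, so recent material dominates
# roughly as it would for a newspaper reader in 1913.
Trainer(model=model, data_collator=collator,
        args=TrainingArguments("stage2", num_train_epochs=1),
        train_dataset=late).train()
```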
They pre-train with all data up to 1900 and then fine-tune with 1900-1913 data.
Where does it say that? I tried to find more detail. Thanks.
See pretraining section of the prerelease_notes.md:
https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
I’d like to know how they chat-tuned it. Getting the base model is one thing, did they also make a bunch of conversations for SFT and if so how was it done?
> We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).
So they are chat tuning, I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is. They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
Basically using GPT-5 and being careful
Thank you, that helps to inject a lot of skepticism. I was wondering how it so easily worked out what "Q:" and "A:" stood for, when that formatting only took off in the 1940s.
I wonder if they know about this: basically, training on LLM output can transmit information or characteristics not explicitly included. https://alignment.anthropic.com/2025/subliminal-learning/
I’m curious: they have the example of raw base model output, and when LLMs were first identified as zero-shot chatbots there was usually a prompt like “A conversation between a person and a helpful assistant” that preceded the chat to get the model to simulate one.
Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?
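Something like this might work against the base model; the checkpoint name and the exact wording of the prefix here are just guesses for illustration, not the project's actual interface:

```python
# Illustration only: prime a base (non-chat) model with period-appropriate
# framing and let it complete the "reply". The model id is a hypothetical
# stand-in, not the project's published checkpoint name.
from transformers import pipeline

generator = pipeline("text-generation", model="ranke-4b-1913-base")

prompt = (
    "Correspondence between a gentleman and a knowledgeable historian.\n\n"
    "The gentleman writes: What, in your estimation, are the gravest dangers "
    "to the peace of Europe?\n\n"
    "The historian replies:"
)

print(generator(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"])
```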
I also wonder about whether the whole concept of “chat” makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs, so they are naturally primed for it. It might make less sense as a communication style here, and some kind of correspondence could be a better framing.
You could extract quoted speech from the data (especially in Q&A format) and treat that as "chat" that the model should learn from.
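As a rough sketch of what that extraction could look like (the regex and the file name are invented for the example; real transcripts such as court testimony would need much more careful parsing):

```python
# Rough illustration: pull question/answer pairs out of transcript-style text
# (e.g. testimony printed as "Q. ..." / "A. ...") to use as chat-like
# training examples. Pattern and file name are invented for this sketch.
import json
import re

qa_pattern = re.compile(r"Q\.\s*(.+?)\s*A\.\s*(.+?)(?=Q\.|$)", re.DOTALL)

with open("testimony_1905.txt", encoding="utf-8") as f:
    text = f.read()

pairs = [{"question": q.strip(), "answer": a.strip()}
         for q, a in qa_pattern.findall(text)]

with open("chat_pairs.jsonl", "w", encoding="utf-8") as out:
    for pair in pairs:
        out.write(json.dumps(pair, ensure_ascii=False) + "\n")
```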
Interesting ... I'd love to find one that had a cutoff date around 1980.
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
I don't mind the experimentation. I'm curious about where someone has found an application of it.
What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.
It doesn't have to be generic. You can assign genders, ideals, even modern ones, and it should do its best to oblige.
So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it unbiased, but because we don’t recognize or care about the biases present.
Was there ever such a time or place?
There is a modern trope, among a certain political group, that bias is a modern invention of another political group - an attempt to politicize anti-bias.
Preventing bias is fundamental to scientific research and law, for example. That same political group is strongly anti-science and anti-rule-of-law, maybe for the same reason.
Keep at it Zurich!
A question for those who think LLMs are the path to artificial intelligence: if a large language model trained on pre-1913 data is a window into the past, how is a large language model trained on pre-2025 data not effectively the same thing?
Counter question: how does a training set, representing a window into the past, differ from your own experience as an intelligent entity? Are you able to see into the future? How?
A human brain is a window to the person's past?
I assume this is a collaboration between the History Channel and Pornhub.
“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”
The knowledge machine question is fascinating ("Imagine you had access to a machine embodying all the collective knowledge of your ancestors. What would you ask it?") – it truly does not know about computers, has no concept of its own substrate. But a knowledge machine is still comprehensible to it.
It makes me think of the Books of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.
Jonathan Swift wrote about something we might consider a computer in the early 18th century, in Gulliver's Travels - https://en.wikipedia.org/wiki/The_Engine
The idea of knowledge machines was not necessarily common, but it was by no means unheard of by the mid 18th century: there were adding machines and other mechanical computation, even leaving aside our field's direct antecedents in Babbage and Lovelace.
This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.
I'd love to see the output from different models trained on pre-1905 data about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it by devising experiments and then giving them simulated data from the experiments to lead them along the correct sequence of steps to come to a novel (to them) conclusion.
smbc did a comic about this: http://smbc-comics.com/comic/copyright
The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.
That's the point of this project, to have an LLM that reflects the moral and ethical norms of pre-1913 texts.