No one can decipher what the hell this 15th century manuscript means. Not even artificial intelligence.
It contains drawings of 113 unidentified plant species, astrological drawings of Zodiac symbols, images of what appear to be pregnant women wading in fluids, and sketches of over 100 species of medicinal herbs and roots, along with watercolors and continuous pages of text that might be recipes, with flowers marking the margins.
The delightfully bizarre book, known as the “Voynich manuscript,” dates back to the 15th century. And no one knows what the hell any of it means.
In 1912, Polish-American bookseller Wilfrid M. Voynich acquired the puzzling 240-page collection of strange drawings and writings, which would come to be named after him. (You can see photocopied images of the entire text here).
Since then, researchers have tried, with varying results, to use artificial intelligence to decipher the weird collection, though many of those efforts end up discredited. So what’s going on: Is the Voynich manuscript a bunch of gibberish, and that’s why AI can’t solve it, or is this a sign that AI isn’t as adept as we thought when it comes to understanding languages?
Over the years, a number of researchers thought they had solved the mysterious case of the Voynich manuscript, but each instance of artificial intelligence applied to the text has been heavily questioned and, in some cases, outright discredited. That’s apparent in the scattering of news articles that claim the text has been decoded, only to issue updates that, actually, no it hasn’t.
In September 2017, history researcher and television writer Nicholas Gibbs published research in the Times Literary Supplement, claiming he had figured out the text included “tell-tale signs of an abbreviated Latin format.” Gibbs said the collection was a guide to women’s health and it had been plagiarized from other similar writings during the same period. In essence, he said the Voynich manuscript wasn’t translated from a language into a code (which is the prevailing thought, to this day), but rather, it was shorthand.
“From the herbarium incorporated into the Voynich manuscript, a standard pattern of abbreviations and ligatures emerged from each plant entry … The abbreviations correspond to the standard pattern of words used in the Herbarium Apuleius Platonicus – aq = aqua (water), dq = decoque / decoctio (decoction), con = confundo (mix), ris = radacis / radix (root), s aiij = seminis ana iij (3 grains each), etc.”
The only problem? Everyone thought he was nuts.
A medieval scholar even told Ars Technica that a librarian would have “rebutted [the research] in a heartbeat.”
The whole ordeal leaves two possibilities: The text was either written in an undiscovered language, or written in a known language and then transformed through a code. Most scholars believe the manuscript was written in a substitution cipher, which swaps the letters of one language for made-up ones.
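To make the idea of a substitution cipher concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not a claim about how the Voynich manuscript was actually produced: the function names, the seed, and the sample Latin words (borrowed from the abbreviations quoted above) are all assumptions chosen for the example, and the "made-up" alphabet is just the ordinary alphabet shuffled.

```python
import random
import string

def make_key(seed):
    """Build a random one-to-one mapping from plain letters to cipher letters."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def substitute(text, key):
    """Swap each letter using the key; spaces and other characters pass through."""
    return "".join(key.get(ch, ch) for ch in text)

def invert(key):
    """Reverse the mapping so a ciphertext can be turned back into plaintext."""
    return {cipher: plain for plain, cipher in key.items()}

key = make_key(seed=1912)              # hypothetical key; the seed is arbitrary
plaintext = "aqua decoctio radix"      # illustrative words only
ciphertext = substitute(plaintext, key)
recovered = substitute(ciphertext, invert(key))

print(ciphertext)
print(recovered)
```

Anyone holding the key can reverse the process in one step. Without the key, a codebreaker has to guess both the mapping and the underlying language, which is where the trouble starts.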
In another instance, computer science professor Greg Kondrak and graduate student Bradley Hauer, both from the University of Alberta, claimed to have discovered the source language for the text, which they said was written in Hebrew and then encoded into its current format. They published a paper, “Decoding Anagrammed Texts Written in an Unknown Language and Script,” in 2016.
Kondrak and Hauer thought that by computing certain aspects of the text—like how often letters appear and which combinations are used most often—they could then compare the manuscript to existing languages to find a match.
They used the Universal Declaration of Human Rights, which has been adopted by the United Nations, in 380 different languages to train their algorithms. Those algorithms weren’t particularly sophisticated by AI standards: No neural networks or deep learning were used. Instead, they relied on statistical analysis.
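For a rough sense of what this kind of statistical comparison can look like, here is a small Python sketch. It is not the researchers' code and makes no claim to reproduce their method. It assumes one simple statistic, the list of letter frequencies sorted from most to least common, which stays the same when letters are swapped by a substitution cipher. The function names, the three one-sentence samples (stand-ins for a real training corpus), and the distance measure are all assumptions made for illustration.

```python
from collections import Counter
from itertools import zip_longest
import string

def sorted_profile(text):
    """Relative letter frequencies, sorted high to low, with letter identities dropped."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)

def distance(p, q):
    """Sum of absolute differences between two profiles (the shorter one is zero-padded)."""
    return sum(abs(a - b) for a, b in zip_longest(p, q, fillvalue=0.0))

def rank_languages(unknown, samples):
    """Return (language, distance) pairs, closest match first."""
    target = sorted_profile(unknown)
    scores = {lang: distance(target, sorted_profile(text)) for lang, text in samples.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Toy samples: one sentence per language, far too little for real work.
samples = {
    "english": "all human beings are born free and equal in dignity and rights",
    "spanish": "todos los seres humanos nacen libres e iguales en dignidad y derechos",
    "dutch": "alle mensen worden vrij en gelijk in waardigheid en rechten geboren",
}

# The "unknown" text is the English sample with every letter shifted by three,
# so its sorted profile matches English exactly by construction.
shift = str.maketrans(string.ascii_lowercase,
                      string.ascii_lowercase[3:] + string.ascii_lowercase[:3])
unknown = samples["english"].translate(shift)

for lang, score in rank_languages(unknown, samples):
    print(lang, round(score, 3))
```

The output is a ranking of candidates by distance, nothing more. A real manuscript would never line up this neatly, and a list of nearest matches does not say how much to trust the top one.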
Unfortunately, Kondrak and Hauer made way too many assumptions going into this thing.
First off, they trained their algorithms on modern-day languages, meaning they used 21st-century Hebrew, not the 15th-century version that the Voynich manuscript’s authors would have used. If you ever studied “The Canterbury Tales” in high school, toiling over the Middle English version of the text, you know how drastically different languages can become with the passage of time.
Another issue: While the algorithms could produce suggestions for the source language of the text, they couldn’t evaluate the likelihood of each match. That means the researchers don’t really know whether the text was in Hebrew.
Finally, Kondrak and Hauer assumed the manuscript’s words were anagrams, meaning the letters in each word had been scrambled. That’s convenient, given that it allows for more freedom in interpreting the text.
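A quick Python sketch shows why the anagram assumption buys so much wiggle room. If letter order is ignored, every word collapses to its letters in sorted order, and several different words can share the same sorted form. The tiny English word list and the function names below are hypothetical, picked only to illustrate the point; nothing here comes from the researchers' paper.

```python
from collections import defaultdict

def alphagram(word):
    """A word's letters in alphabetical order; all anagrams share the same alphagram."""
    return "".join(sorted(word))

def build_index(lexicon):
    """Group dictionary words by alphagram."""
    index = defaultdict(list)
    for word in lexicon:
        index[alphagram(word)].append(word)
    return index

# Hypothetical mini-dictionary for illustration.
lexicon = ["stone", "notes", "onset", "listen", "silent", "enlist", "water"]
index = build_index(lexicon)

def candidates(scrambled):
    """Every dictionary word that the scrambled letters could be rearranged into."""
    return index.get(alphagram(scrambled), [])

print(candidates("etnos"))   # ['stone', 'notes', 'onset']
print(candidates("retaw"))   # ['water']
```

One scrambled word can legitimately decode three ways even in a seven-word dictionary. With a full vocabulary, the decoder gets to pick whichever reading makes the sentence sound most plausible.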
AI and Language: It’s Complicated
It isn’t exactly fair to assume that AI isn’t valuable just because it can’t crack the Voynich manuscript. If anything, the failure is instructive: It shows us that artificial intelligence has difficulty in fully understanding the complexities of human language, which is actually sort of refreshing.
John Doe (his real name), chief scientist at Mind AI, an artificial intelligence company in South Korea, wrote in a Medium post that there’s a vast difference between natural language processing, a subset of artificial intelligence, and what he calls “natural language reasoning.”
Natural language processing focuses on training computers to manipulate human language and relies on machine learning algorithms and neural networks. It’s used in loads of things that you interact with every day, including Amazon’s Alexa. In other cases, it’s been used to beat reading comprehension tests and finish your sentences (looking at you, Gmail).
Sure, natural language processing is quite good at those things, but, as Doe puts it, “it will never achieve a mastery of language” because “the algorithms have no concept of what they are doing.” That is, the applications are narrow and AI would have to have genuine human intelligence to understand the context, tone, and meaning.
Let’s not forget Google Duplex—the AI system that can call a business for you and book an appointment, introduced by Google CEO Sundar Pichai at the company’s 2018 Developers Conference—and the fact that, actually, many of its calls originated with humans, not the software.
So until the Singularity happens, it looks like AI will never be the codebreaker that finally uncovers the mysteries of the Voynich manuscript. But maybe you can give it a shot.
This article was written by Courtney Linder and was published by Popular Mechanics on 12/09/2019