Researchers have analyzed the text of four percent of all books ever published to learn how human culture has changed (Photo: Tom Murphy VII)
You may have Facebook friends who have done the “Here are the top words from my Facebook status messages!” thing, which lists the words they’ve most commonly used in telling the world about their lives. Interesting as that may or may not be, imagine something similar being done with four percent of all books ever published. That’s what a team of researchers from Harvard University, Google, Encyclopaedia Britannica, and the American Heritage Dictionary has done. The resulting dataset consists of the full text of about 5.2 million books; 72 percent of that text is in English, with French, German, Chinese, Russian, and Hebrew making up the rest. Analyzing that dataset, a practice the researchers call “culturomics” (a play on genomics), has revealed some fascinating things about the history of our species.