In mid-September, Forbes published an article written by Kalev Leetaru, Senior Fellow at Georgetown University (and former UIUC student), about the possibilities for big data embedded in the colossal online archive of digitized information from the past. “History as Big Data: 500 Years of Book Images and Mapping Millions of Books” gives several examples of Leetaru’s exciting projects–such as his GDELT project, which processed over 3.5 million books from the Internet Archive and HathiTrust to render them mappable by multiple dimensions, such as title, author, library subject tags, and emotion, to name but a few–but it is his work with the Internet Archive’s book images that is of particular interest.
This initiative takes a different approach to mining digitized books; while text mining digitized, historic books is common in digital humanities practice, this effort has focused instead on mining the illustrations, photographs, charts, graphs, and artistic flourishes from more than 600 million digitized pages from the Internet Archive’s 19-petabyte book collection. Leetaru’s algorithm extracted over 14 million images from books spanning more than five centuries, along with detailed descriptions including the text from the book that surrounds the image, the book’s subject tags, and relevant metadata. These high-resolution images with their data are being slowly added to the Flickr photostream “Internet Archive Book Images” with metadata intact. Looking at specific images even reveals the link to the digital edition of the book, where you can see the image in its original context.
When searching the collection, you need to make sure to click the magnifying glass icon below the toolbar and the banner; this will ensure you’re searching within the collection and not the entirety of Flickr.
And sure, you *could* look up any number of keywords, and you’ll probably get results for almost anything you can think of–Leetaru suggests looking at the elegant/creepy/highly decorative images from emblem books, which are characterized by detailed illustrations with moralizing themes–but this is the internet, so obviously what you really want is ye olde cat memes.
Maybe 1911 is too recent to qualify as “ye olde,” but given the average age of most cat memes, these are prehistoric. The images (and those that follow) are from the 1911 book Kittens and Cats; A Book of Tales by Eulalie Osgood Grover, who, if the accompanying text in the image description tells us anything, was quite the writer:
“Don’t tell anybody where I am. I am hiding away from mother. She wants me to go to the Queen’s party and I don’t want to go. I don’t like the Queen, she is so grand and dignified. She frightens me. I would rather hide in this pitcher all day than go to the Queen’s palace.”
–Eulalie Osgood Grover, Kittens and Cats; A Book of Tales. Boston: Houghton Mifflin, 1911. p. 14
I think we can all relate to that.
Anyone with a passing interest in pictures of cats and kittens wearing clothing accompanied by absurd text should definitely check out all the results from this book. Here are a few more, for good measure:
But in all seriousness, the Book Images Collection project is staggering in its breadth and depth. As I write this, there are over 50,000 pages of images in this collection on Flickr, from books as early as 1500. Flickr’s advanced search options can narrow your search by date, copyright, color, and even composition. General searches default to searching all of the metadata, which, because it is exhaustive, can return unrelated results (for example, a general search for ‘cat’). Exclusively searching by tags may limit your results too much (again, as with ‘cat’), but it’s definitely worth playing around with and seeing what you can find.
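For anyone who would rather script these searches than click through them, here is a minimal sketch of how the difference between a general search and a tag-only search might look against Flickr's public REST API. It assumes the `flickr.photos.search` method, whose `text` parameter matches titles, descriptions, and tags, while `tags` matches tags alone. The function name `build_search_url` is made up for illustration, and both the API key and the account ID below are placeholders you would need to supply yourself. The code only builds the request URL; it does not contact Flickr.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint for API calls.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, user_id, term, tags_only=False):
    """Build a flickr.photos.search URL limited to one account's photos.

    api_key and user_id are supplied by the caller; the values used in
    the example below are placeholders, not real credentials or IDs.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,      # restricts results to a single photostream
        "format": "json",
        "nojsoncallback": "1",   # ask for plain JSON rather than JSONP
    }
    if tags_only:
        # Narrow search: match the term against tags only.
        params["tags"] = term
    else:
        # Broad search: match titles, descriptions, and tags.
        params["text"] = term
    return FLICKR_REST + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own API key and
    # the ID of the photostream you want to search.
    print(build_search_url("YOUR_API_KEY", "COLLECTION_USER_ID", "cat"))
    print(build_search_url("YOUR_API_KEY", "COLLECTION_USER_ID", "cat", tags_only=True))
```

The first URL corresponds to the broad search that can return unrelated results. The second corresponds to the tag-only search that may return too few.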
This collection could be a powerful search tool for those interested in visual culture studies, art history, media studies, communications, and the history of books, among other fields. Further, these images are all in the public domain and, if you have a Flickr account, can be downloaded in beautiful high resolution. And if you’re really interested in what you’re looking at, click the link to the full text and read the book!