How Can You Survey 5 Million Books in Less than 5 Seconds?

If in 5 seconds you could glean some relevant data and insights from a text database (known as a corpus), searching 500 billion words in the process, would you be impressed?

Books are wonderful things. With e-books, e-readers and tablets, whatever the type or platform, it has never been easier to read on the go and make the most of your downtime on the train or while waiting for a meeting. You know how it goes. But there are only so many books you can read and so many hours in the day, and finding time to visit a library is difficult in a busy week. So have you seen this little tool?

Example use of the Google Ngram Viewer with some key Hampshire place names, including some intriguing variations to consider

Text search over 5 million books for a series of matching phrases using Google’s Ngram Viewer

It is a neat demonstration of how digital tools can help us learn things that were not possible with pen and quill. It is more than that: in 5 seconds it gives you a quick test of how often your terms appear in a sample of more than 5 million books (and counting) that have been scanned and digitised in Google Books. Yes, you can also buy books there, but that is not what this is about. It is a mammoth index of phrases that you can search, survey and graph over 200 years in literally under 5 seconds. In the real world I am never going to look at, let alone read, 5 million books, but this is a useful test of how frequently your key phrases appear in texts in the Google Books repository. Many of the texts come from the Harvard, Stanford and Princeton university libraries and others (sadly the UK is not so enlightened), but what a great way to share rare books and make them more accessible and, to some degree, open.

Our example here is just an embedded graph of a few key town and place names. Not all of them register if they are quite obscure, but it is definitely worth a look. On the full version (see the links below) you can interpret the frequency and timeline and consider how they correlate with the historical events that may or may not have driven the peaks and troughs. But there is more: you can use the timeline bands to jump by hyperlink to a related selection, with links to the texts and to specific instances within them. You can then bookmark them and more, so watch out for news of further tools that show how you can capture those links.
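For the curious, here is roughly what the height of each line on the graph means. As we understand Google's own description of the viewer, each point is the share of all n-grams of the same length in that year's books that match your phrase, so a year with more books published does not automatically look more "popular". The Python sketch below is a toy illustration of that idea, not Google's implementation: the function names `ngrams` and `relative_frequency` and the three-line sample "corpus" are invented for illustration, and splitting on spaces is a simplifying assumption (the real dataset handles punctuation and tokenisation far more carefully).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(books, phrase):
    """Per year: matches of `phrase` divided by all n-grams of the same length.

    books: iterable of (year, text) pairs, a made-up stand-in for a real
    corpus of scanned books. Text is lower-cased and split on whitespace,
    which is a simplifying assumption.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # how often the phrase occurs in each year
    totals = Counter()  # how many n-grams of length n exist in each year
    for year, text in books:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    # Normalising by the yearly total is what lets different years be compared.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Invented sample text, for illustration only (not real data).
toy_books = [
    (1850, "the road to winchester"),
    (1850, "winchester and southampton"),
    (1900, "southampton docks at southampton"),
]
print(relative_frequency(toy_books, "southampton"))
```

Here "southampton" is one of seven words in the 1850 texts (a share of 1/7) but two of four words in 1900 (a share of 0.5), so on a graph its line would rise even though the toy corpus only grew by one "book".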

Takeaway time: try the tool, and you are already starting to benefit from Digital Humanities (a fusion of the humanities and computer science) to help us all understand and research our subjects as we progress. Think of it as just a digital toolkit. Don’t get bogged down in the mind-boggling technology behind it; just think about the art of the possible for you and your projects…

Resources You Can Use Right Now

  1. Using the Viewer (we have a post and PDF in production; it will be free to download, and if you are interested you can register then for news on other tools as well. If not, just get the download and give a little honest feedback.)
  2. Google’s Ngram dataset explained
  3. What is an n-gram? As Wikipedia explains, “In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.” In this case, that text is a digitised book.
  4. Jean-Baptiste Michel and his co-presenter explain a bit about this project in the video below, which gives us all a feel for what was involved. It is worth watching.
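To make the n-gram definition concrete, here is a minimal Python sketch that treats words as the "items" and simply splits on spaces. That is a simplifying assumption (the real Google dataset's tokenisation is more involved), and the function name `ngrams` is our own, not part of any Google tool.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 windows of size n (none if n > L).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = "the road to winchester".split()
print(ngrams(words, 1))  # [('the',), ('road',), ('to',), ('winchester',)]
print(ngrams(words, 2))  # [('the', 'road'), ('road', 'to'), ('to', 'winchester')]
print(ngrams(words, 3))  # [('the', 'road', 'to'), ('road', 'to', 'winchester')]
```

So a "1-gram" is a single word, a "2-gram" (bigram) is a pair of neighbouring words, and so on; a phrase like "road to" can only be counted if it appears as one of these contiguous windows.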

There is more, but hopefully this has given you a little idea. Whatever your interests, it is a simple introduction to what text mining is about. When we read, dipping in and out of various books, that is just what we are doing; the difference here is that you can skip some of the boring scanning and go straight to where you want, rather than reading a load of material only to find no references. If only all of life were that simple. But here the computer is doing what it should be doing: ‘the tasks that can be safely relegated to the machine.’ It is no substitute for interpretation and deep/close reading of the texts and their context, but for us it was a start and made us want to know more. We hope you enjoy it, and please feel free to comment on and share this post on Google+ or Facebook.

Want to Know a Bit More?

Here is the video. It is also quite a light-hearted TED Talk, made for mere mortals like us rather than tech-heads.

Thanks for reading, and if you would like to contact us or connect your project, here is just the place to do so.