Millions of pictures from The Internet Archive

Image from page 905 of “Canadian grocer July-December 1896” (1889). Source: The Internet Archive

The BBC has published a very interesting article about an Internet Archive initiative. Using technology developed by Kalev H. Leetaru of Georgetown University, a Flickr page has been created that offers a pool of millions of images (2.6 million so far) taken from the Internet Archive’s scanned books.

Here’s an extract from the BBC article explaining how it works:

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text. As part of the process, the software recognised which parts of a page were pictures in order to discard them. 

Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
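To make the idea concrete, here is a minimal sketch of the cropping step in Python. It is not Mr Leetaru’s actual code, which the article does not show. It assumes the scan is a grid of pixel values and that the OCR step has left behind a bounding box for each region it skipped; the `Page` type, the `Box` class and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: a grayscale page as rows of pixel values.
Page = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Bounding box of a region the OCR step flagged as a picture (assumed format)."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop(page: Page, box: Box) -> Page:
    """Return the pixels of `page` that fall inside `box`."""
    return [row[box.left:box.right] for row in page[box.top:box.bottom]]


def extract_pictures(page: Page, picture_boxes: List[Box]) -> List[Page]:
    """Cut every flagged region out of the original scan."""
    return [crop(page, box) for box in picture_boxes]


# Tiny 4x6 "scan": 0 = blank paper, 9 = picture pixels.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0, 0],
    [0, 9, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
pictures = extract_pictures(page, [Box(left=1, top=1, right=4, bottom=3)])
```

In a real pipeline each cropped region would then be written to disk as its own Jpeg file, for example with an imaging library such as Pillow (`Image.crop` followed by `Image.save`). That step is left out here so the sketch runs with the standard library alone.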

And here is the Flickr page:

Please tell us what you think about it!
