Google Books: A Metadata Train Wreck

Geoff Nunberg’s “Google Books: A Metadata Train Wreck” made me think deeply about the serious ramifications of Google’s role as a digital archivist and its motivations for scanning quality books and literature onto the web. It made me wonder whether Google’s archivists were taking this very new problem to heart.

This “train wreck” that Google has made for itself, especially as the largest and most popular go-to search engine, is both amazing and disturbing. It’s amazing because Google has taken on a major undertaking for the future of digital archiving, which is commendable; yet is it doing so for the right reasons? I immediately thought of the next generation of researchers, historians, and book collectors. Would these fresh minds be misled, misinformed, or left forever digging for the factual records of these works? Would these new researchers have to verify and double-check the dates and metadata of books stored by Google?

If Google is purposely manipulating or altering the metadata of books searched on its site, it is a sign that the future of digital archiving is in trouble. According to Geoff Nunberg, a very large portion of Google’s collection is systematically misdated, and there are hundreds of thousands of classification errors. For example, Google lists a catalog of copyright entries from the Library of Congress under “Drama”. Nunberg argues that these mistakes are prevalent throughout the system, but who is to blame? Should we blame the publishers? Should we blame librarians? Or should we blame the mislabeling on Google’s attempt to compete with Amazon?

Google must look more deeply at its responsibility for future digital archiving and the importance of this practice. Storing and scanning literature cannot be taken lightly, and Google should understand its vital role in preserving literature and history. Future scholars will rely on Google’s files and collections, and if Google does not correct this issue, I am afraid we will drift further into a world of “fake information” and unreliable, hit-and-miss “googling”.

It makes me nervous when Google executives, confronted with these concerns, pass the responsibility on to librarians, users, and providers of the materials. It makes me concerned about the metadata of books Google Books will scan in the future. Will Google act with the best intentions for the people, or will it become a private data-mining company that harbors information? Will it become a metadata monopoly that provides books to a certain social class?

One Reply

  • Thanks, Shonda, for reminding us about the (increasing?) responsibility of *corporations* in ensuring the accuracy and reliability of archives — particularly as we rely more often on corporate platforms, paywalled systems, and private infrastructures to access our knowledge resources.

    As you know, Google Books was discontinued. You might be interested in some of the project’s more “commons”-oriented successors, like HathiTrust: https://www.hathitrust.org/
