Will Googling books be page in history or footnote?

[14 May 2007]

By Mark Johnson

Milwaukee Journal Sentinel (MCT)

MILWAUKEE—Google me Ishmael.

Someone somewhere in the world is searching for the opening lines of Melville’s “Moby Dick,” or maybe a phrase from Toni Morrison’s “Beloved,” an anecdote from a Bart Starr biography, a speech by Churchill or by the evil genius in a James Bond novel. Our searcher is in a hurry, and not inclined to rise from the computer.

To avoid such an inconvenience, there is the promise (or the hype, depending on your point of view) of Google Book Search, the Internet search giant’s quest to digitize all of the world’s books and make them available online in snippets, pages, chapters, even entire works.

“There is a lot of hyperbole about how unbinding the books and digitizing every page is going to radically increase access to information. I want to deflate that notion,” says Siva Vaidhyanathan, an associate professor of culture and communications at New York University who formerly taught at the University of Wisconsin-Madison.

Without permission from the publisher, Google can legally provide only a snippet of text from a book under copyright, far too little to be of much use, Vaidhyanathan says. Books published before 1923 are considered to be in the public domain, meaning Google can display the entire text.

Despite posting such small portions of many books, the project has already come under fire from members of the Authors Guild and the Association of American Publishers, who have sued Google, alleging copyright infringement. Google’s defenders argue that the company is allowing copyright owners to opt out of the project and is, in any case, protected under the fair use doctrine, which allows the display of small amounts of copyrighted material.

In addition to such legal obstacles, Vaidhyanathan says, the Google project suffers from a “lack of quality control” and a search engine he calls “a disaster.”

“I’m arguing essentially for a national conversation about building a really useful and accessible digital library,” he says. “I think what we need is a human genome project of the mind.”

More upbeat about the project is Edward Van Gemert, acting director of libraries at the University of Wisconsin-Madison, one of the 13 institutions sharing a large portion of their collections with Google.

“This is about discovery,” Van Gemert says. “This is about people finding out information and then being able to go to the printed material. The snippet will point me toward the printed materials. ...

“It’s about giving people the widest possible access to the great collections at the University of Wisconsin and the Wisconsin Historical Society. I believe this effort really is transformative.”

The scope of the Google project is unprecedented, but various North American research libraries have been involved in smaller-scale projects to digitize their own collections, Van Gemert says.

The University of Wisconsin, which began collecting books in 1848, digitized a large portion of its collection in 2000, along with donated works and volumes from partners such as the Aldo Leopold Foundation. In all, about 2 million books were part of the university’s digitization project, many of them covering Wisconsin and regional topics such as the Great Lakes.

In the last fiscal year, the university recorded 5.3 million searches of its digital collection, a number that might seem small in a few years.

“The number of eyeballs that look at this stuff on Google is inarguable,” Van Gemert says.

In 2004, Google unveiled the book search project, targeting both publishers and libraries. So far, more than 10,000 publishers have signed on; the libraries include Harvard, Stanford, Oxford, Michigan and the New York Public Library.

The University of Wisconsin signed its agreement with Google in October. In April, a large truck left Madison, carrying the first shipment of books bound for one of Google’s scanning facilities. Van Gemert says that under the agreement with Google he cannot discuss details, such as how many books were in the truck or precisely where in the United States they were taken.

Google is by no means alone in marrying books and the Internet. Microsoft has been digitizing 100,000 volumes from the British Library.

“Our scope is very large, over a million books currently,” says Adam Smith, who manages Google Book Search.

Smith would not provide exact book numbers, nor would he say how many searches the project has recorded or how much of an investment it represents for Google. In a recent paper on the project, Jonathan Band, a Washington, D.C., copyright lawyer, estimated that Google will spend about $750 million just to scan 30 million volumes from five participating libraries.

Smith expects the project to keep rolling along, converting each new crop of books into packages of information, as searchable as the universe of Web pages.

“We think it’s important to the world that books are discoverable,” he says. “Allowing users to search every word in every book makes books vastly more discoverable.”

How discoverable, of course, varies greatly from book to book, depending on copyright status and other factors. The quality of the search appears to vary, too.

In an article on the blog of the American Historical Association, Robert B. Townsend writes that despite having useful features, “my experience suggests the project is falling far short of its central promise of exposing the literature of the world and is instead piling mistake upon mistake with little evidence of basic quality control.” He criticizes the quality of the scanning and inaccuracies in the descriptions that accompany some books.

Smith says Google has been listening to users and publishers and is “actively working on all of those issues.”

On one point at least, many of those involved in the debate over the book search project appear united. Technology, despite its speed and convenience, is not poised to bury the book. Not yet, anyway.

“The book itself is still a remarkable technology,” Vaidhyanathan says. “People like the look and feel of books. They don’t run out of batteries. You can read them in the sunlight. You can read them during takeoffs and landings and in the bathtub.”

Published at: http://www.popmatters.com/pm/article/will-googling-books-be-page-in-history-or-footnote/