This is a reblog from the Weekly Standard of one of the best articles I've read on the inherent flaws in Google Books. Yes, it is a very long essay---but it is worth reading for its excellent summary of a range of interlocking issues. There are many other points of view, of course, and comments are welcome!
Google and Its Enemies
The much-hyped project to digitize 32 million books sounds like a good idea. Why are so many people taking shots at it?
by Jonathan V. Last
12/10/2007, Volume 013, Issue 13
In 1998 Larry Page and Sergey Brin founded a company called Google, about which you likely know quite a bit. The outgrowth of work Page and Brin began in 1996 on hypertextual search engines, Google has moved from darling little high-concept innovator to Microsoft-like behemoth in record time. Google employs over 15,000 people, has a stock price hovering near $700 a share, and is the all-powerful advertising and search force on the Internet. It is gradually pushing and purchasing its way into entertainment, business software, and even the cellular telephone market.
Before Page and Brin started Google, however, they were graduate students working on Stanford's Digital Library Technologies project, which sought to digitally store and catalogue books, newspapers, and scholarly journals. Page, in particular, seems to carry a torch for this endeavor. In 2002 he approached his alma mater, the University of Michigan, about digitizing the library. It was the birth of the Google Library Project, one of the most ambitious undertakings in the history of the written word. It was also a move that would create for Google--a company obsessed with its own beneficence--a crowd of enemies.
In July 2004, Google began quietly scanning and digitizing Michigan's library. Five months later, in December 2004, the company officially announced the "Google Print for Libraries" project. (After the effort hit snags and received some bad press, it was rebranded "Google Book Search.") Google partnered with five major libraries--Michigan, Stanford, Harvard, Oxford's Bodleian, and the New York Public Library--in an attempt to scan the pages of 15 million volumes. These digital books would be kept and indexed in a Google database, which would be made available, for free, to the public.
The scope has changed in the intervening years. Initially Google planned to scan the 15 million books in six years. That projection was revised upwards to more than 20 million books, and the New Yorker recently reported that Google is now aiming to scan at least 32 million books, besting the number of titles in the largest bibliographic database, WorldCat. It hopes to finish within ten years. As one Googlehead told the New Yorker's Jeffrey Toobin, "I think of Google Books as our moon shot."
It remains to be seen how realistic this goal is. Google will not divulge how many books it is scanning currently, or how many titles are already in its database, which went live to the public in May 2005 at books.google.com. To get a rough sense of things, the University of Michigan library has 7 million volumes and Google estimates it will have annexed them all by 2013, noting that it is scanning tens of thousands of books each week. Google will not reveal how it scans the books. As for the cost, this too is closely guarded by Google. In a similar venture, Microsoft is spending $2.5 million to scan 100,000 books; if that scale were to hold, Google might spend as much as $800 million.
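The $800 million figure follows from a straight-line extrapolation of Microsoft's reported per-book cost. A minimal sketch of that arithmetic, assuming the cost per book stays constant at scale and taking the 32 million-book target reported above (the variable names are illustrative only):

```python
# Back-of-the-envelope extrapolation.
# Assumption: the cost per book stays constant at Google's scale.
microsoft_cost_usd = 2_500_000     # reported Microsoft spend
microsoft_books = 100_000          # books covered by that spend
google_target_books = 32_000_000   # Google's reported scanning target

cost_per_book = microsoft_cost_usd / microsoft_books
google_estimate_usd = cost_per_book * google_target_books

print(f"${cost_per_book:.2f} per book")
print(f"${google_estimate_usd / 1e6:,.0f} million for {google_target_books:,} books")
```

At $25 a book, the 32 million-book target works out to the $800 million cited; a lower per-book cost at Google's volume would shrink the total proportionally.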
Google has also expanded its list of library partners to include 13 additional libraries, ranging from the Bavarian State Library to the University of Virginia. Most of the agreements are private, so it is unclear what the participating institutions get from the deal, other than a digital copy of books they already own. For Google, the potential upside must seem enormous: The ebook movement of a few years ago failed, but the Holy Grail of the digital library movement remains a massive archive of books, all searchable, which can be accessed from anywhere on the planet. Already a company called OnDemandBooks has created a machine called "Espresso" which can take the digital text of a book, print it, and bind it into soft cover in about four minutes. The commercial promise--and downright coolness--of Google's undertaking staggers the mind. Which is why many recent accounts of the project, from Toobin's to Jason Epstein's in the New York Review of Books to Michael Hirschorn's in the Atlantic, vibrate with fidgety, egg-headed excitement.
Not everyone is thrilled, though. As a class, users seem underwhelmed by the product itself, poking fun on blogs at the page-scans, the titles included, and the odd results that appear in response to search queries. Google's book-reader interface is unwieldy: It is difficult to navigate through the books; what may be read is full of poorly explained limits; and "page unavailable" messages often appear in the middle of books. Some books are presented without advertisements. Others have ads embedded in the browser window, which appear to run on a keyword algorithm similar to Google's AdWords service. The entry for Mark Twain's Life on the Mississippi, for instance, carries ads for sightseeing tours on the Mississippi River and a volume from Twain's collected works.
Nor is everyone pleased by the idea of Google's online library. Just three days after Google announced the project, the president of the American Library Association took to the pages of the Los Angeles Times to proclaim the superior value of bricks-and-mortar libraries and caution against irrational Google exuberance: "This latest version of Google hype will no doubt join taking personal commuter helicopters to work and carrying the Library of Congress in a briefcase on microfilm as 'back to the future' failures, for the simple reason that they were solutions in search of a problem."
Competitors have also appeared. Amazon.com has scanned hundreds of thousands of books which can be accessed on the website and last month introduced its version of the ebook, called the "Kindle." As of now, it makes available 90,000 books for purchase and download. In 2005, Microsoft and the Alfred P. Sloan Foundation formed the Open Content Alliance, in conjunction with such institutions as the Boston Public Library and Johns Hopkins University. Google's chief competitor in the search engine business, Yahoo!, provides web hosting for the OCA. The publisher HarperCollins announced that it would scan 20,000 of its titles and provide the texts to all search engines, gratis.
On a much grander scale, the governments of China and India joined with the Library of Alexandria and eight U.S. universities on a "Million Book Project." They are moving aggressively: China has 18 digitization centers up and running, India has 22. Part of this consortium, Carnegie Mellon's "Universal Library," already has about 500,000 books digitized.
In Europe, the reaction to Google was striking. Jean-Noël Jeanneney, president of the Bibliothèque Nationale de France, wrote an op-ed that became a book, Google and the Myth of Universal Knowledge. It principally attacked Google's library project as a piece of Anglo-Saxon cultural imperialism. Jeanneney's book, which has been translated into several languages and sold briskly, is full of irritatingly French clichés. He laments the Monica Lewinsky affair and shakes his head in bewilderment at George W. Bush's reelection. At one point he worries that "English . . . if not contained, will become ever more dominant," because of projects such as Google Book Search. He did, however, prod some Europeans into taking Google seriously. The French Ministry of Culture has signed up some 30 libraries to its own digital library project. European governments are even contemplating the creation of a state-owned search engine--the embryonic project is called "Quaero"--with an eye toward competing with Google. The model Jeanneney cites for this endeavor is Airbus.
And then there are the lawsuits. The Google Library is composed of two different tracks, the "Partner Program" (originally called the "Publisher Program") and the "Library Project." Under the Partner Program, authors and publishers can volunteer their works for inclusion in the Google database. In return, they're given a portion of the revenue Google generates from ads that appear on pages featuring their books. A number of authors and major publishers have joined up, including Simon & Schuster, Penguin, and McGraw-Hill. Books scanned under the Partner Program will not give viewers access to the full text, but rather to a few pages on either side of the search result.
The legal problems lie with the Library Project. Copyright has its foundations in English law and the Licensing Act of 1662. The falling costs of printing had created rampant book piracy in England. Concerned that such behavior would blunt creativity and harm the book business, Charles II established a register of licensed books to protect authors and publishers. A hundred years later, the copyright was the only right the Founding Fathers gauged important enough to recognize explicitly in the Constitution itself. In the intervening years, it has evolved somewhat. Today, works published before 1923 are generally in the public domain. There are exceptions and complexities, but works published after 1978 are protected by copyright for 70 years from the author's death. As for works published between 1923 and 1978, they were given an original copyright protection of 28 years from first publication and another 67 years of protection upon renewal of the copyright. Got that?
And here lies Google's dilemma: Out-of-copyright books account for about one-sixth of all titles. Most books--75 percent of them--are in copyright, but out of print. Only about 10 percent of all books are both copyrighted and in print. Google has decided to get around this problem of copyright protection by simply ignoring it: forging ahead and scanning books, regardless of their copyright status. If a book is in the public domain, its full text is displayed to users, but if the book is protected, then Google shows users only a "snippet" of the text surrounding the search result. It is relevant to note that "snippet" is Google's word and is intentionally not a legal term; how much text is displayed is entirely at Google's discretion.
Concerned by this imposition on the copyright, authors and publishers began complaining to Google in mid-2005. That August, Google announced that it would suspend the scanning of copyrighted works for three months so as to allow copyright holders to "opt out" of the program and keep their works out of the database. A month later, the Authors Guild filed suit in New York's Second Circuit on the grounds of copyright infringement; a month after that, a group of publishers filed a separate suit on similar grounds.
Many of the publishers party to this suit were also, coincidentally, working with Google under the Partner Program. The publishers are seeking only to stop Google from scanning books without explicit permission; the Authors Guild seeks damages as well. As the Guild's Paul Aiken told the New Yorker, "Google is doing something that is likely to be very profitable for them, and they should pay for it. It's not enough to say that it will help the sales of some books. If you make a movie of a book, that may spur sales, but that doesn't mean you don't license the books." Both cases are winding their way slowly through the courts.
Google has, as they say, all the right enemies. Anytime the ALA, Microsoft, France, a trade guild, and a bunch of trial lawyers are lined up on one side of an argument, the other side is going to look extremely attractive. And there is a seductive appeal to the idea of Google Book Search, to the dream of having millions of books at your fingertips. Yet there are aspects of the project that should give us pause.
Google's Wal-Mart-like obsession with secrecy does not engender trust in either its practices or arguments. As silly as most of Jean-Noël Jeanneney's broadside against Google is, it's easy to see why a book search without transparency of either its data set or its search algorithm would be suspicious and not obviously objective. Page and Brin admitted as much in the research paper that became the foundation of Google, "Anatomy of a Large-Scale Hypertextual Web Search Engine." They wrote:
The goals of the advertising business model do not always correspond to providing quality search to users. . . . For this type of reason and historical experience with other media, we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers.
Free-market competition should lessen this concern, of course. And, as previously mentioned, a number of competitors to Google have materialized. But Google's principal advantage is that its competitors have abided by the letter of intellectual property law and not scanned copyrighted materials without the express permission of the owners. Google's willingness to flout the law is the actual source of its competitive advantage.
To defend this advantage, Google has adopted a legal defense aimed straight at copyright law. The defense is multipronged, but the two most startling aspects relate to the establishment of the "opt out" option for copyright owners and Google's claim of a transformative nature to the Book Search. Each challenges the current understanding of the copyright in a fundamental way.
Google maintains that by giving copyright owners the chance to opt out of the program, it has performed due diligence with respect to the copyright. This turns traditional law--which stipulates that someone wanting to use copyrighted material must seek and receive affirmative permission--on its head. Yet Google has found a slim precedent in the 2006 case Field v. Google.
Blake Field sued Google for copying and caching 51 works from his website. The court ruled in Google's favor, citing in particular the ease of Google's "opt out" feature, but the decision was based in part on dubious grounds. The court said that Field had "invited" Google's spiders--web robots which crawl through the Internet cataloguing and indexing pages for a search engine--by not including code on his website which discouraged them. In other words, by not telling Google to stay away, Field was asking to have his copyright violated. It's the intellectual property version of "She wore a red dress to the bar on Saturday night."
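One widely used way for a site to tell crawlers to stay away is a robots.txt file. The sketch below uses Python's standard-library parser to show how a well-behaved crawler would read such a file. The rules, the crawler name, and the URL are all hypothetical, and this is not a claim about the specific mechanism at issue in Field.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, agent, url):
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules supplied directly; no network access
    return parser.can_fetch(agent, url)

# Hypothetical page and crawler name, for illustration only.
url = "http://example.com/stories/one.html"
no_rules = []                                # site owner says nothing
opt_out = ["User-agent: *", "Disallow: /"]   # site owner tells all crawlers to stay away

print(allowed(no_rules, "ExampleBot", url))
print(allowed(opt_out, "ExampleBot", url))
```

With no rules present, the parser reports that crawling is permitted; with the opt-out rules, it reports a refusal. The default, in other words, is permission unless the owner objects.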
In another part of the decision, the court ruled that Field's works were only a thimbleful of the "billions" Google had copied, and, presumably, Google had cached many of those without permission, too. The sheer volume of the copying provides them cover, since no one entry stands out in the sea. The violation of one copyright is a crime; the violation of 20 million is a statistic. There's an evident weakness in Google's citing this legal argument: In the relatively closed system of Google Book Search, most of the entries will likely be from protected works used without permission. In the Field decision, moreover, the court made much of the fact that works were copied by automated spiders and that there was "no evidence of any market for Field's works." Neither is true in the case of the book-scanning project.
The Internet has become, like the 17th-century printing press, incapable of observing copyrights. In the same way the printing press encouraged the mass production of books and magazines and newspapers, the Internet cries out for the distribution of all information--everything from blog entries to pictures to books. And as it distributes all of this information, it exerts a leveling force that diminishes the value of everything it touches. There is no reason that the Internet, unlike the printing press before it, should be exempt from the same protections of creative value. Yet, this is what Google's defense would achieve.
If the copyright protection is shifted so that it must be invoked--precisely what Google's "opt out" policy establishes--it will become the burden of holders. They will have to find and petition all those using their works to cease and desist. Georgetown Law professor Jonathan Band dismisses this concern in the course of a measured, intriguing defense of Google in the journal Plagiary. Band writes, "As a practical matter . . . only a small number of search engine firms have the resources to engage in digitization programs on the scale of Google's Library Project." But this is an odd argument: So long as only Google infringes on the copyright, then it should be allowed to do so, because opting out will only be a burden if everyone else is allowed to infringe on the copyright, too.
The second, larger, aspect of Google's defense is that Google Book Search is a "transformative work," which would provide for the fair use of previously copyrighted material. It might seem obvious that creating an index of protected works--whose primary value and advantage lies in the number of works in the set--and simply allowing users to search it, is not "transformative." Google Book Search is in important ways similar to LexisNexis, the search database which catalogues newspaper, wire service, and magazine articles. LexisNexis pays content providers for the right to include their material, even though all it does is aggregate that material and render it searchable. The copyright protection of this material was solid enough that the Supreme Court decided in favor of freelance writers who sought compensation for this electronic reuse of their materials in the 2001 case New York Times Co. v. Tasini.
Tasini is not perfectly on-point because LexisNexis gives the full text of written works to paying customers where Google is proposing to give only snippets to its users. Here Google finds redoubt in the 2003 case Kelly v. Arriba Soft. Photographer Leslie Kelly sued Arriba Soft because its search engine copied photographs posted on his website, created thumbnail-sized versions of them, and placed them in its search index. The Ninth Circuit found that Arriba's copying and usage met fair-use standards because the searchable thumbnails constituted a transformed work. (They also voiced the red dress and thimble arguments that would be later brought to bear in Field.)
This ruling would seem to offer comfort to Google because there is some similarity between Kelly's thumbnail images and the snippets of copyrighted books Google is giving away--both are abstractions of larger works and neither eliminates the need for the original. It assumes, however, that the violation of the copyright occurs when Google gives material to the user. In reality, the infringement occurs when Google scans and archives an entire book without permission. It is the presence of millions of these whole, copyrighted books inside Google's database that creates commercial opportunities, albeit indirect ones, for the company. If Google Book Search included only works in the public domain, it would be almost indistinguishable from its competitors.
Google has tried to sidestep this problem by promising not to run advertisements on the snippet-delivering pages of copyrighted books. But the presence of the protected works in the database is what renders the ad space on the public domain book pages so valuable. And Google's promise of access to millions and millions of protected works is what creates the commercial opportunity for the rest of the project. If the courts do not recognize this principle, Google will have changed the landscape of intellectual property law.
So where does Google go from here? The lawsuits fall in the Second Circuit. If the court finds against Google, it may produce a conflict with the Ninth Circuit, a conflict the Supreme Court may decide to resolve. It's also possible that Google will buy its way out of the problem and make a deal with the publishers and the Authors Guild. There is additional incentive because such a settlement could function as a high barrier to entry and keep the competing enterprises from beginning to use protected works.
If the courts were to find against Google, however, the Book Search would likely die on the vine. As Georgetown's Band notes, it would be extremely difficult to construct a licensing regime for books modeled on the ASCAP/BMI models for musical compositions. And if Google were to try to go legit, the transaction costs of identifying, locating, and contacting copyright holders to seek permission could easily stretch to tens of billions of dollars. Band puts the best guess in the neighborhood of $25 billion.
Yet even if Google finds a way to realize its dreams, it's unclear exactly how useful the Book Search would ever be for the average user. Is there value in seeing "snippets" of this or that text? The only way the project could really achieve its goal of disseminating knowledge to the masses would be by ignoring copyrights and putting all texts into the public domain. Which is, of course, what the logic of the Internet ultimately wants. "Information wants to be free," according to one of the web's founding mantras.
If Google were a different company, with a different set of motivating principles, it might well have constructed its Library project along the lines of Apple's iTunes model--that is, it would have spent time and money not on perfecting a mass scanning operation designed to gobble up as many pages as possible per hour, but on securing the rights to a large catalogue of books which it could then sell as downloads. After all, it's not as though the current delivery mechanism for books is in any way optimal.
But this concept is beyond its ken. Google's corporate philosophy is based on the model which brought it success: organizing and giving away other people's content, creating space for advertisements in the process. The enormous success Google found with that model in the search engine business spurred it to try to impose it in every arena. In the Google worldview, content is individually valueless. No one page is more important than the next; the value lies in the page view. And a page view is a page view, regardless of whether the page in question has a picture of a cat, a single link to another site, or the full text of Freakonomics. When all you're selling is ad space, the value shifts from the content to the viewer. And ultimately the content is valued at nothing. And here, finally, is the larger problem posed by Google's actions. Books are not in any important sense user-centric. Whether or not a book has readers matters little. Books stand on their own, over time, as ideas and creations. In the world of books, it is the ideas and the authors that matter most, not the readers. That is why the copyright exists in the first place, to protect the value of these created works, a value which Google is trying mightily to deny.
As much as any other American business, Google is the corporate embodiment of the Internet's first principles. And as with so much else on the Internet, the promise of Google Book Search lies somewhere off on the horizon, while the dangers it poses today are very real.
Jonathan V. Last is a staff writer at THE WEEKLY STANDARD.