Google's book empire

The story of books has entered its most tumultuous chapter, writes Adam Jasper


I'm sitting opposite Dr Dan Clancy, chief engineer of the Google Books project, at Google's headquarters in Mountain View, California. The legendary Googleplex, which constantly astonishes journalists by providing free meals to its staff, is a place where pyjamas pass muster as Friday casual. Small plastic toys encrust the reception desk and a half-deflated plastic flamingo hangs, neck apparently broken, above an otherwise unremarkable office cubicle.

Dan Clancy is a man under a lot of pressure. Six months ago, Google looked set to achieve a de facto monopoly on digital access to the complete body of 20th-century literature. The information behemoth had scanned well over ten million books and had achieved a groundbreaking settlement in a copyright dispute with the Association of American Publishers and the Authors Guild.

All Google needed was the acquiescence of a single US court to become the sole purveyor of millions of titles from the largest and most complete library in the world, an enterprise that would make Amazon's Kindle look like a child's toy. By getting into bed with its natural enemies, the copyright holders, Google had managed to broaden what had been a relatively banal case regarding the limits of fair-use copying into a kind of de facto legislation regarding access to 20th-century literature.

Dr Pamela Samuelson, professor of law at Berkeley, has claimed, "The settlement will transform the future of the book industry and of public access to the cultural heritage of mankind". Now it looks as if the giant might have over-reached. The entire deal is balancing on a knife-edge.

Judge Denny Chin (who sentenced disgraced investment banker Bernie Madoff to 150 years in jail) is presiding over a fairness hearing on the settlement. The Antitrust Division of the Department of Justice is running a distinctly hostile investigation. And Sergey Brin, Google's billionaire co-founder, finds himself writing New York Times op-ed pieces that end with a plaintive plea: "Let's not miss this opportunity". What on earth is going on?

Back in 2004, Google quietly expanded the Google Print project to include scanning the contents of complete libraries. Until then, Google Print had focused on offering samples of books actively submitted by publishers as part of a partner program.

Among the early participants in the new Google Books initiative were the University of Michigan, Stanford, Harvard, the New York Public Library, the British Library, and the University of California. Google Books rapidly outgrew all of them, to the point that today it is, in effect, the largest library on earth.

"Search is really about democratisation of information," said Clancy, who has spoken of creating a "universal digital library". How many books that means is debatable but the figure could be as high as 50 million and Google wants to scan all of them.

Even the Department of Justice dropped its customary restraint to note that the project "has the potential to breathe life into millions of works that are now effectively off limits to the public". The project was so ambitious that Marissa Mayer, a Google vice-president, described it as "Google's moon shot".

Since then, the case has morphed into something significantly more ambitious than a mere moon shot. It is certainly true that using Google Books for the first time is a revelation. It is a complete index of extant literature that not only allows searching by metadata (title, author, subject and keywords) but also allows users to mine the entire body of 20th-century literature, bringing up material of a quality and relevance superior to any other online library catalogue.

Should the project be successful, Google will be able to offer institutions (like Sydney University) affordable digital access to just about any book from anywhere in the world. It is a project that understandably excites many writers, academics and researchers. It is also one that would doubtless help increase the accessibility and ultimately the sale of many unusual and half-remembered texts.

So when, on 20 September 2005, the Authors Guild initiated a class action with three named plaintiffs–Betty Miles, Daniel Hoffman and Herbert Mitgang–maintaining that authors strenuously objected to the scanning of their work, one couldn't help wondering: which authors, and why?

The complaint lodged with the US District Court for the Southern District of New York claimed that the defendant, Google, which the plaintiffs saw fit to remind the judge "owns and operates a major internet search engine", was engaging in massive copyright infringement. Less than a month later, the Association of American Publishers launched an equivalent suit.

When a work was in copyright, but Google had no prior understanding with the publisher, the search engine was cautious, displaying only "snippets": short strings of text showing the sought terms in context, comparable to the amount of information it shows for website searches. Seeing that Google was so conservative in what its searches revealed, one couldn't help wondering where the alleged breach of copyright lay. In its substance, the argument seemed specious: regardless of whether or not the distribution and presentation of copyrighted material abided by the law, the Authors Guild maintained that the violation lay in the very act of scanning and indexing the books.

Outside the court, the guild's argument was primarily that it just wasn't fair of Google to go and scan all those books. It argued that Google needed to first ask for a writer's permission. It also said Google was profiteering. The guild said it was irrelevant that there were millions of copyright holders, and their identities were often uncertain, making obtaining all the permissions utterly impossible.

A quick search on the three named plaintiffs in the case produced a small but striking discovery. They are all elderly. Betty Miles is a children's book author in her seventies. Daniel Hoffman is an octogenarian poet laureate, and Herbert Mitgang was 85 when the documents were submitted to court. Not, prima facie, the people most likely to understand or be sympathetic to the nature of Google's project.

As Peter Brantley, director of the Bookserver Project at the Internet Archive, a San Francisco-based not-for-profit library, said, "They were not, I think, philosophically in the same orientation as Google in regards to the availability of text for online discovery".

The Authors Guild invited the three to stand as representative individuals, but as pensioners, they individually had little to lose. Such figures, common in class actions, have a wonderful name in US legal parlance. They are referred to as "judgment proof". Using three pensioners, the Authors Guild, which represents a paltry 8000 writers, had explicitly laid claim to the right to represent the millions of authors whose works Google had begun the long, arduous process of scanning.

"At the core, our disagreement was on principle," Dan Clancy said of the ensuing legal battle between Google and the guild.

For a time, hopes were raised that Google would win a landmark fair-use case that would allow other groups, such as the Internet Archive and the Open Content Alliance, to expand the scope of their own projects in making books more available on the web.

In the meantime, the parties frantically negotiated behind closed doors, until, quite suddenly, on 28 October 2008, they reached a settlement of $US125 million.

As Richard Sarnoff, chairman of the Association of American Publishers, phrased it: "This historic settlement is a win for everyone." Both plaintiffs and defendant used his words repeatedly. The homogeneity of these celebratory announcements made them impossible to believe.

What really motivates such cases? The Authors Guild and the Association of American Publishers waited nearly two years after the start of Google's scanning project before launching their suits. By that point, Google had clearly invested significant resources, and would be reluctant to change course. One could assume that such a suit was merely rent-seeking.

Of the $US125 million, roughly $US45 million has been set aside as damages for authors whose works were scanned without permission before 5 May 2009. Each rights holder should eventually receive a cheque of at least $US60 in the post. Another $US34.5 million has been set aside for the creation of a Book Rights Registry (BRR), a body that will administer the money to be distributed, as well as taking 63 per cent of all Google's ongoing earnings from the project.

About a third of the lump sum in turn goes to compensate publishers, and finally a separate payment, not named in the original settlement figure, is to be handed to the plaintiffs' lawyers. This final sum is about $US45.5 million, or more than the amount awarded to all the rights holders combined. Perhaps most importantly, the BRR will be 50 per cent controlled by the Authors Guild, and 50 per cent by the Association of American Publishers, capturing the litigants in a permanent power-sharing arrangement. And with that, order was restored, authors' rights assured, and all was well again in the land of copyright.

Or perhaps not. During the past year, a storm of protest has begun to brew. In the immediate aftermath of the settlement, the internet lit up with claims of Luddite authors, reactionary publishers, and scheming litigators. The world's largest single provider of information had been brought low by a trio of technophobes, the pawns of corporations bent on enforcing a parasitic imposition that reaches from beyond the grave: copyright.

As the dust of the predictable intergenerational conflict settled, another narrative began to emerge. It appeared that Google, in losing, might have won a much bigger victory, achieving an end run around conventional copyright legislation and acquiring a monopoly over the ability to present a complete archive of 20th-century literature. As the Department of Justice filing from 18 September noted, "The proposed settlement would establish a marketplace in which only one competitor would have authority to use a vast array of works".

As Brantley, of the Internet Archive, explained: "This case is unique for a wide variety of reasons, the first of which is that it is not merely a retrospective agreement whereby Google simply reimburses the rights holders for the claimed infringement of digitisation, although that is in there. It is also a prospective agreement, in that it governs future uses of the materials that have been digitised."

"The other fundamentally important distinction is that this covers an area of law, copyright, that is formally the domain of the United States legislative process. So the court, in this case, winds up with the ability to shift the burden of potential liability in an area governed by copyright law in a court system, and this is frightening."

Much of the controversy hinges upon the status of orphan works. "Orphan" is the industry term for an out-of-print book that could quite possibly still be in copyright but whose rights holder cannot be identified or found. The term misleadingly implies a special case: of the more than ten million books that Google has scanned, up to 75 per cent are orphans. According to the terms of the settlement, Google would be the only organisation able to offer access to orphan works without risking further litigation.

The discussion is particularly concerned with the question of whether the dismissively named orphan works represent a large proportion of the literature being scanned. The Authors Guild has argued repeatedly that, in reality, they represent much less than a majority. To make matters worse, nobody knows which books are under discussion, because there is currently no central registry of copyrighted works.

A few minutes with a pocket calculator reveals that, according to the lawyers' own estimates, the rights holders of around 750,000 works are expected to come forward. Seeing as Google scanned 7.6 million books whose status is doubtful, the great majority of those works, roughly nine in ten, are still unaccounted for.

And in the final analysis, contrary to the arguments of the Authors Guild, it doesn't matter whether 20 per cent or 70 per cent of 20th-century literature is orphaned. The completeness of the index is all-important from the point of view of researchers and only Google will be in a position to present it. The Authors Guild's position is that not only are the numbers not that high, but also that the content simply isn't that important.

As the guild's Paul Aiken said: "We also know that for those older books, 1923-1963, there just weren't as many, period, as there are more recently. Because there was a Great Depression and there was World War II, so book production was relatively low in the first 40 or 50 years that we are talking about. Book production took off in the 1980s as expenses came down and the barriers to entry came down and the production of books per year just kept climbing."

An increasing number of scholars have argued that the new Book Rights Registry hands Google a monopoly over the entire class of orphan books. According to the terms of the settlement, orphan works, unlike works whose rights holders have come forward, cannot be licensed to any other party. The Authors Guild and the Association of American Publishers have rejected this claim as unfounded.

John Sargent of Macmillan Publishers has acknowledged that, "In plain fact they [Google] have a lot of power over those works". The Department of Justice is equally curt in its assessment that Google has been granted "de facto exclusive rights for the digital distribution of orphan works".

In a New York Times article, Sergey Brin noted, "Nothing in this agreement precludes any other company or organisation from pursuing their own similar effort". This is not the view the Department of Justice has taken, noting that it is not reasonable "to think that a competitor could enter the market by copying books en masse without permission in the hope of prompting a class action suit that could then be settled on terms comparable to the proposed settlement. Even if there were reason to think history could repeat itself in this unlikely fashion, [the agreement] discourages potential competitors".

The agreement does not act to protect any third party foolish enough to attempt to repeat Google's experiment. Quite the contrary: having established a new marketplace, the litigants can only lose by allowing free competition in it. Having extracted $US125 million from Google, there is no plausible reason why the legal fraternity would restrain itself from pursuing smaller fry.

It is now almost inevitable that any other archival project that attempts to index its entire collection will also be sued, leading to the patent absurdity that Oxford University Press may not be able to legally create a digital index of a book that it published in 1950, but Google will be able to do so with impunity. Any other organisation would have to do better than Google in a contest of law, or be wiped out in the attempt.

It could well be argued that Google had no intention of setting up any kind of cartel. Indeed, it is noteworthy that we come from a generation that holds Google to a higher standard of moral behaviour than we expect of our grandmothers. Even the Department of Justice has expressed some enthusiasm at Google's idealism: "Google has made clear in the past that it started this project on the premise that anyone, anywhere, anytime should have the tools to explore the great works of history and culture. However the proposed settlement is modified by the parties, this approach should continue to be at its heart".

When Google concentrates so much power in so few hands, it is asking for a phenomenal amount of trust, and all of a sudden it becomes reasonable to question the company it keeps. What happens if there is a cultural change at the top of Google? What happens if Google becomes Zoogle (a zombie variant of itself)?

Dan Clancy, the project's chief engineer, said, "In writing the settlement, we actually thought a lot about Zoogle. As our library partners said, 'Look, Dan, I trust you, I trust Eric, I trust Larry and Sergey, but 15 years from now, I don't know who's going to be there'."