The Good Life: the utopian “no place” of email archives – Data Archive Infrastructure Fall 2018

Susan Breakell makes the point that “to archive” was not originally used as a verb; rather, the word became one around the same time as the PC entered our homes and lives. We use the word to mean both storing records and storing electronic information that we no longer regularly use. Zielinski highlights that “the archive serves to organize mental and enforced orders in the shape of appropriate structure and to preserve, with a tremendous amount of effort, the memory of past orders.” And from Mattern we see that archives demonstrate the interconnected technological, social, intellectual, and architectural infrastructures they require. This embodiment is entwined with certain politics and epistemologies, and takes place in large part through aesthetics.

“The Good Life” is a project by artists Tega Brain and Sam Lavigne, a piece of archival performance art that positions your email inbox as the stage. It is based on a portion of the emails sent between Enron employees from the late 1990s to the early 2000s. This large-scale archive of emails was the first of its kind, and was the training database used for many early natural language processing (NLP) algorithms – including most current spam filters, and early versions of Siri. By allowing your inbox to be hijacked for a period of time of your choosing (between 7 and 28 years), you too can embody “The Good Life” of white collar, mainly white, mainly male, corporate workers (and some criminals) through the language-based architectures of late 90s corporate culture. In doing so, we can all explore the enduring nature and wide usage of digital archives, “the impulse to archive” against “the right to be forgotten,” the inescapability of bias in training data sets, and the aesthetic of emails, the poetry, and the “rational” world order of this corporate elite.

Enron started out as an energy company. Based in Houston, Texas, it was considered “America’s most innovative company” for six years in a row. It employed 20,000 people, and in 2000, the year before it collapsed, it claimed revenues of $101 billion. It embodied a vision of American corporate success, constantly scaling and growing, moving from energy into creating new financial instruments, from trading to investments in broadband. Right before its collapse, it was in partnership with Blockbuster to stream movies online – it could have been Netflix. In 2001, its stock price collapsed, and in the fallout, the company and its executives were found to have been involved in price fixing, misrepresentation of earnings, institutionalized accounting fraud, and generally corrupt business practices. When it declared bankruptcy, it was the largest bankruptcy in American history.

As a consequence, the Federal Energy Regulatory Commission (FERC) acquired the company’s data, including the massive archive of emails that had been sent to, from, and between employees – 1.6 million emails in total. After complaints, some of these emails were removed from the archive. We can consider this a form of selecting and curating the archive, though as Breakell notes, “any selection process is problematic.” One hundred employees were given 10 days to search through and remove personal emails (of their coworkers, their friends, their family members, their children). These workers were told to search for terms like “social security number,” “credit card number,” and “divorce.” However, as you can still find emails sent between divorcing spouses and flirting coworkers through The Good Life’s database, it’s clear many of these searches were not particularly effective. The resulting archive of 500,000 emails was the first large-scale archive of its kind to be made publicly available. It is still one of the only large public email collections that’s easily and freely accessible online.
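To see why a keyword scrub of this kind leaves so much behind, here is a minimal sketch in Python. The three search terms are the ones quoted above; everything else (the function name, the sample messages) is hypothetical and invented for illustration, and none of it is drawn from the Enron corpus or from FERC’s actual procedure, which may have worked quite differently.

```python
# Hypothetical sketch of a keyword-based scrub. The search terms are the
# ones quoted in the essay; the sample messages are invented for
# illustration and are NOT taken from the Enron corpus.
SEARCH_TERMS = ["social security number", "credit card number", "divorce"]


def flag_for_removal(body, terms=SEARCH_TERMS):
    """Return True if any search term appears in the message body (case-insensitive)."""
    text = body.lower()
    return any(term in text for term in terms)


sample_messages = [
    "My social security number is on the form I faxed over.",
    "The lawyers want the house settled before we sign the papers.",
    "Dinner Friday? Don't tell anyone in the office.",
]

# Only the first message contains one of the literal search terms.
flagged = [flag_for_removal(message) for message in sample_messages]
print(flagged)
```

Only the first message is flagged. The second and third are plainly personal, but neither contains a literal search term, which is one plausible reason so many private exchanges survived the scrub.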

As Hal Foster writes, “no place” is the literal meaning of “utopia.” The name of the artists’ project, “The Good Life,” speaks to Foster and Breakell’s point – in the “no place” of the archive, we see the archival impulse go further: we can imagine “possible scenarios of alternative social relations.” To fully experience “The Good Life,” you can opt in for your own email inbox to receive a slightly reduced version of the archive. You can have 225,000 emails in total sent to your inbox in their original order, with time-spacing equivalent to that in which they were originally sent. Originally the project provided the option to have the emails sent over 5 days, 30 days, or 1 year, but these tiers had to be canceled because the emails kept getting blacklisted as spam. Given that modern spam filters were originally built off this database of emails, this seems ironic. Now, your options are to sign up to receive the emails every day for 7, 14, or 28 years.
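The idea of replaying an archive “in the order and with an equivalent time-spacing” can be sketched as a simple rescaling of timestamps. This is only an illustration under stated assumptions: the function names, the sample dates, and the start date are hypothetical, a year is approximated as 365 days, and nothing here is claimed to be how Brain and Lavigne’s service actually schedules its mail.

```python
from datetime import datetime, timedelta


def rescale_send_times(original_times, start, duration):
    """Map original timestamps onto [start, start + duration],
    preserving their order and the proportions of the gaps between them."""
    first, last = min(original_times), max(original_times)
    span = (last - first).total_seconds()
    if span == 0:
        # All messages share one timestamp: deliver them all at the start.
        return [start for _ in original_times]
    return [
        start + duration * ((t - first).total_seconds() / span)
        for t in original_times
    ]


def emails_per_day(total_emails, years):
    """Average daily volume if the archive is spread evenly over `years` (365-day years)."""
    return total_emails / (years * 365)


# Invented sample timestamps, not real message dates from the archive.
original = [datetime(1998, 1, 1), datetime(1999, 1, 1), datetime(2002, 1, 1)]
start = datetime(2018, 10, 1)      # hypothetical sign-up date
duration = timedelta(days=7 * 365)  # the 7-year tier

scheduled = rescale_send_times(original, start, duration)
average = emails_per_day(225_000, 7)
```

On these assumptions, the first message arrives on the sign-up date, the last arrives seven years later, and the 225,000 emails average out to roughly 88 a day on the 7-year tier (about 44 on the 14-year tier and 22 on the 28-year tier).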

Beyond an examination of the banality and volume of email even from its earliest usage, the project brings into play a much deeper critical commentary on contemporary digital archives, perhaps especially unintentional ones. First, we can consider the political, social, and cultural architecture of an archive, the importance of archives, and the enduring legacy of this particular database. Finn Brunton notes “the FERC had unintentionally produced a remarkable object: the public and private mailing activities of 158 people in the upper echelons of a major corporation, frozen in place like the ruins of Pompeii for future researchers.” As the first of its kind, it has been used to train spam filters and email recognition technologies like the prioritization rules in your inbox, and to support fraud detection, counterterrorism operations, and the study of workplace behavioral patterns. The hegemonic ordering of an archive that Zielinski writes about is very much alive and enduring. There is a good chance that at least something on your phone is running off software that used this archive as its training database.

It matters, then, that the users from whom this archive was generated came from a particularly narrow group of people. This archive was used to build NLP algorithms because it was assumed to be representative of how people use email. But algorithms are only as good as the data provided, even or perhaps especially when they operate on a large scale. As we discussed last week, biased inputs can generate and embed biased outputs in both allocation and representation. What cause for concern does it give us that so much of the epistemic scaffolding of our current information management systems is built off the corporate (and at least somewhat corrupt) working elite of the 1990s and early 2000s? On the other hand, as artist Mimi Onuoha has pointed out, today many of our current datasets are built off the personal data of those who have no choice, or limited choice, but to sign away their data, typically the structurally disadvantaged. This archive then offers a rare view into a group of users normally afforded more “privacy” than most people.

However, it is clear there was still a personal cost. The scrubbing of the archive did not clear out, for example, a named husband and wife emailing each other as their divorce proceeded. Employees may not have been aware in the 1990s that their emails would ever resurface, least of all for public perusal. It is likely that corporate practice has changed since this time with increased awareness of the permanence of emails – the concept of “huddling” in corporate culture today is to take something offline, to communicate without leaving a digital trace. And even though we all know on an abstract level that email is not private, most of us today would still be deeply uncomfortable with our emails being publicly available in a searchable format with our names attached to them. While it is clear that this email database deeply embodies the archival impulse, it also speaks to the right to be forgotten.

Though we might ask if that is realistic. We are now all contributing to digital archives many orders of magnitude larger than the 500,000 emails of the Enron database. Every email, click, like, and hover over a link, along with many other forms of digital footprint, is now collected by the biggest (and some not so big) corporate players in the world. What machines are ultimately being trained off datasets produced by our digital labors, and what implications does this have for both material and immaterial orders? Is anarchive even possible in this terrain? The artists suggest that by rendering your inbox into a timewarp between 1998 and the present day, you subvert the ability of your email provider’s algorithms to make accurate sense of your data. Their “service obfuscates your personal emails, and it breaks the machine learning’s algorithms for understanding you.” They add: the real benefit is that it also makes it impossible for you to use your email.

Though there is a strong case to be made for examining the material infrastructure required to enable email technologies, for most people, emails appear largely through immaterial means. And yet, clearly they too operate at an aesthetic level. The Good Life’s commitment to replicating the Enron employees’ experience is achieved through the Windows 95 interface. And while we might imagine email as standardized communication, the variation in content is analogous to what Zielinski describes in his write-up of VALIE EXPORT’s work. Formally similar frames can bring to the forefront the heterogeneity of what is contained, in this case in emails. In teaching the machines how to “think” through human language, this archive is showing a range of human communications. Granted, this is limited both by it being explicitly written content (which differs greatly from human speech, for example) and by the narrow collection of humans whose “labor” was used to generate this.

Mattern writes of a critical reviewer of an early article who was at pains to point out that highlighting the aesthetic experience might suggest that poetry is devoid of “intellectual or political engagement,” and might fail to acknowledge that “poets even think rationally.” Given the current political debates about whose speech is considered “rational” and “unemotional,” I thought it was telling that artist Constant Dullaart and NYU data scientist Leon Yin created an experiment with Brain and Lavigne’s project: a predictive text generator based on the Enron corpus. When the generator was fed a “poem” (itself found in the Enron database), it emulated the speech patterns of the emails to create this rather poetic response:

…I put my arms in front of me

The company, that Enron companies,

the service of the company

so the company

so the company seedness.

And went to pull her nearer

To the CIO,

The CCPM

Please no California

Thanks company

And the company

So the company.

And realized that my new best friend

Business conceding the company

so the company

so the companies seedness.

Was nothing but a mirror

Of the company

So the company.

One Reply

shannon says:

October 24, 2018 at 9:16 pm

I asked my students to manually comb through the Enron corpus of emails (a dataset that has machine learning, to train software) to find patterns that computers could miss but humans would notice. @turniplan found a web of racist/sexist jokes and began sketching the connections: pic.twitter.com/k0gd7aJ5y5

— kelli anderson (@kellianderson) October 24, 2018
