Much of my interest in my art and design practice is in surveillance and how it is enabled through technology. My research primarily covers the systems and technologies created by government agencies, from GCHQ to the NYPD, as well as systems and technologies created by corporate institutions and marketed and sold to local and federal governments. These technologies are top secret, trade secrets, or both. Moreover, the policies, strategies, and rules of engagement behind them are even more secretive, hidden behind impenetrable secret courts, cries of national security, and private contracts. Through leaked documents and product demos, however, we can glean the priorities, hierarchies, and intent of these technologies.
In 2013, Edward Snowden, an NSA contractor, released documents uncovering the extent of the power the NSA holds and the abuses it commits. The documents outline several mass surveillance projects that the NSA maintains. One of these technologies is XKEYSCORE, the NSA’s “widest-reaching” system (those are the NSA’s own words) for developing intelligence from computer networks; the program covers nearly everything a typical user does on the internet, including emails, websites visited, and Google searches. XKEYSCORE continuously collects so much internet data that it can be stored for only 3-5 days at a time. XKEYSCORE was also used to hack into systems, giving the NSA the keys to cell phone communications. Though FISA restricts how the NSA may surveil US citizens, FISA warrants that allow the use of XKEYSCORE may lead to some US-based information being swept up by the wide-reaching, filterless data gathering. Whistleblowers have claimed that they could “wiretap anyone, from you or your accountant, to a federal judge, to even the president,” given only a personal email address.
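That 3-5 day limit implies a rolling retention window: old captures are evicted as new ones arrive. A minimal sketch of that policy (this is purely illustrative, not actual NSA code; the record names and 3-day constant are my own assumptions):

```python
# Sketch of a rolling-buffer retention policy like the 3-5 day
# window described above. Everything here is hypothetical.
from collections import deque
from datetime import datetime, timedelta

RETENTION = timedelta(days=3)
buffer = deque()  # (timestamp, record) pairs, oldest first

def ingest(record, now):
    """Add a capture, then evict anything older than the window."""
    buffer.append((now, record))
    while buffer and now - buffer[0][0] > RETENTION:
        buffer.popleft()

t0 = datetime(2013, 6, 1)
ingest("packet A", t0)
ingest("packet B", t0 + timedelta(days=4))
# "packet A" has aged out; only "packet B" remains
```

The point of the sketch is the trade-off the slide decks imply: collection so total that even the NSA can only afford a sliding window of it.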
XKEYSCORE’s (or XKS’s) strength is its ability to go deep. Because much of the traffic on the internet is anonymous (thanks to the efforts of privacy activists), XKS’s dragnet approach allows analysts to pick up on small bits of info and begin creating profiles on “targets”. As XKS relies on scraping and acquiring telecom signals, it looks at different points of access into the telecom systems [pg.11]. These primarily include phone numbers, email addresses, log-ins, and other internet activity. Looking at the slide deck for XKS, we can glean some of the classifications that are of interest, and what makes someone susceptible to surveillance. They are sprinkled throughout the deck [pg.14-20]. Something I find interesting is not only data as classification, but ease of access to data as a class [pg.23]. For me, two major themes in the classifications stand out. One is anti-globalist and islamophobic: this person doesn’t belong in this place. The other is a curious position on internet security and safety: Are you exposed? We’ll target you! Do you care about protecting yourself on the Internet? We’ll target you!
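The mechanics of those “small bits of info” can be sketched simply: a reused selector (an email address, a phone number, a log-in) ties otherwise-anonymous fragments of traffic back to one profile. The events and selector values below are invented for illustration; this is an assumption about the approach, not XKS itself:

```python
# Sketch of selector-based profile building: group anonymous
# traffic fragments by the selector value they share.
from collections import defaultdict

# Hypothetical captured fragments: (selector_type, selector_value, activity)
events = [
    ("email", "target@example.com", "webmail login from a cafe IP"),
    ("phone", "+00-555-0100", "SIM registered in a new country"),
    ("login", "target@example.com", "post on a privacy-tools forum"),
]

def build_profiles(events):
    """Group otherwise-unrelated fragments by shared selector value."""
    profiles = defaultdict(list)
    for sel_type, sel_value, activity in events:
        profiles[sel_value].append((sel_type, activity))
    return dict(profiles)

profiles = build_profiles(events)
# One reused email address links two separate fragments to one target
```

Each fragment alone reveals little; the dragnet’s power is in the join.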
Palantir is a data mining company founded by Peter Thiel, co-founder and former CEO/”don” of PayPal. The company uses data fusion to solve big problems, from national defense to improving medical patient outcomes to supply chain management. Data fusion is the process of taking different sets of data and finding trends between them. This is exemplified in Palantir’s efforts to aid law enforcement, currently for the LAPD and, secretly for six years, for the NOPD. Palantir’s models utilize datasets that include court filings, licenses, addresses, phone numbers, and social media data. Like other data-mining models, Palantir’s uses this to index probability for a given target; but instead of indexing the likelihood of buying a product, or voting for a candidate, it models the likelihood of committing a crime.
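Data fusion, at its core, is a join on a shared identity followed by a score. A toy sketch of the idea (the datasets, field names, and weights below are all invented; Palantir’s actual models are proprietary and certainly far more complex):

```python
# Sketch of "data fusion": join disparate per-person records on a
# shared ID, then collapse them into a single score. All values
# and weights here are hypothetical.
court_filings = {"p1": 2, "p2": 0}      # prior charges per person
known_associates = {"p1": 3, "p2": 1}   # ties to prior offenders
social_flags = {"p1": 1, "p2": 0}       # flagged social-media activity

def fuse_and_score(person_id):
    """Fuse the three datasets and apply an invented weighted sum."""
    return (5 * court_filings.get(person_id, 0)
            + 3 * known_associates.get(person_id, 0)
            + 1 * social_flags.get(person_id, 0))

# Swap the input datasets and the same machinery predicts purchases,
# votes, or "crime risk" -- only the labels change.
```

The unsettling part is exactly that interchangeability: the pipeline is indifferent to whether the output means “likely customer” or “likely criminal”.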
In this so-called “crime forecasting”, Palantir used models that treated gun violence as a communicable disease. That is to say: those who were related to, or closely associated with, people who have committed crimes were considered likely to commit a crime, too. For those who have already been charged with a crime, the model generates an automated “chronic offender score”; above a certain threshold, the individual is placed on a watch list. The individual is notified that they will be under increased surveillance, to be removed only if they have no interactions with law enforcement officers, a murky condition given that law enforcement officers are now encouraged to scrutinize that citizen. Companies like Palantir allow local law enforcement to bolster its tactics of surveillance. Unlike the NSA’s dragnet, Palantir enables law enforcement to take a laser focus in its efforts to surveil certain people in their communities. In the slide deck presenting and pitching their work to other cities, they highlight where Palantir gathers its data: jail calls and phone logs, gang affiliation data, crime data, and social media [pg.13]. This again perpetuates the same violence, as they use data that is already skewed (we know that the criminal justice system disproportionately targets Black and Hispanic people) to find “new” criminals, ones who have not yet been inducted into that system. We often see in the slide decks that “indirect” connections are plotted in the system, glossing over the fact that these are not “indirect” but “systematic” connections, in a system that already favors guilt by association [pg.16].
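The “communicable disease” framing has a concrete mechanical consequence: risk propagates along social ties, so someone with no record can cross the watch-list threshold purely through association. A minimal sketch of that logic (names, weights, and threshold are invented; this is my reading of the approach, not Palantir’s code):

```python
# Sketch of contagion-style risk scoring: a person's score is their
# own history plus half-weight "contagion" from each associate.
# Everything here -- ties, histories, weights -- is hypothetical.
associations = {          # who is socially tied to whom
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
}
prior_offenses = {"alice": 2, "bob": 0, "carol": 0}
THRESHOLD = 1.0

def chronic_offender_scores():
    """Own offenses plus 0.5x each associate's offenses."""
    scores = {}
    for person, ties in associations.items():
        own = prior_offenses.get(person, 0)
        inherited = sum(0.5 * prior_offenses.get(t, 0) for t in ties)
        scores[person] = own + inherited
    return scores

watch_list = sorted(p for p, s in chronic_offender_scores().items()
                    if s >= THRESHOLD)
# bob lands on the watch list with zero offenses of his own,
# purely through his association with alice
```

Run on skewed arrest data, a model like this mechanically reproduces the skew: whoever the system has already over-policed seeds the “contagion” for everyone around them.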
IBM is one of the oldest tech companies around. The popular IBM PC was released over 37 years ago, in 1981. Since then IBM has grown from a computer company into a computational one, and it has always been a major player in emerging technologies. The case has been no different with artificial intelligence. Perhaps most famously, IBM developed Watson, a machine-learning, natural-language question-answering system that appeared on Jeopardy!, brutally defeating its human opponents. Since then Watson has been built out into one of the largest services IBM offers, extending the question-answering features to computer vision, text-to-speech, speech-to-text, and much more. With IBM’s rapid growth in the AI sector, it comes as no surprise that they have lent their machine learning prowess to systems of surveillance.
Last month The Intercept, in partnership with the Investigative Fund, reported on software developed by IBM for the NYPD. Through leaked corporate documents, the public is able to see how, and what, the NYPD is interested in surveilling. Unbeknownst to the public, IBM engineers had used access to the NYPD’s CCTV system of over 500 cameras to tag individuals and train the system’s models. The data of public citizens is claimed to be kept safe via NDAs and background checks. For IBM, the NYPD was one of its first serious customers in surveillance tech, especially after 9/11. Looking through the slides of the leaked documents, we can see how the NYPD prioritizes and categorizes its citizens in order to find state aggressors. One of the most appalling categorizations is that of skin color. It is reminiscent of IBM’s body camera tech, which allowed the categorization of people by “ethnicity” tags such as “Asian,” “Black,” and “White.” For me, though, the standout classifications are less the blunt offense of classifying citizens by skin color than the more nuanced treatment of the cameras themselves. The defaults of these annotations [pg.33+34], such as “Light” as the default skin color and “Black” as the default torso color (perhaps more a nod to the default fashion of NYC), further reinforce the idea that the camera provides a ground truth. Further, I’m curious about the classification field for “Large Amount of Skin in Torso” and the general interest in needing skin. The NYPD has largely denied using these fields, claiming to have recognized early on their tendency to profile. But an IBM engineer offers a telling statement to The Intercept: “A company won’t invest in what the customer doesn’t want.”
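Why defaults matter so much here can be made concrete. When an annotation schema ships with pre-filled values, every detection an annotator does not explicitly review silently inherits them, and the default quietly becomes the “ground truth” the model trains on. A sketch of that failure mode (the field names follow the slides as described above, but the class and camera ID are my own hypothetical construction):

```python
# Sketch of how schema defaults become "ground truth": fields no
# annotator touches still carry values into the training data.
# The class and IDs are hypothetical; the defaults mirror the
# leaked slides [pg.33+34].
from dataclasses import dataclass

@dataclass
class PersonAnnotation:
    camera_id: str
    skin_color: str = "Light"   # default per the leaked slides
    torso_color: str = "Black"  # default per the leaked slides

# An annotator who records only the camera leaves everything else
# to the defaults; no human ever judged this person's skin color,
# yet the record now asserts one.
det = PersonAnnotation(camera_id="cctv-042")
```

A model trained on thousands of such records cannot distinguish “someone labeled this Light” from “no one looked”, which is exactly the kind of silent bias the annotation slides invite.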