Precedent Analysis: Data Selfie

As platforms such as Facebook “mine” our data, selling and sharing it with advertising companies and governments, the question of what is being collected and how it’s being aggregated and applied has piqued the interest of a not-so-small population. Data Selfie runs in the background of your Chrome web browser and tracks your Facebook activity: what you click and type, how long you look at things, and the keywords associated with the things you look at. It then takes that data and runs it through the type of algorithms that render it useful for advertisers and/or intelligence personnel. Yet instead of selling it off or using it for unwarranted surveillance, Data Selfie presents it back to you via a visual profile that looks like this:

Screenshot by DATA X


The project’s goal is not only to make people more conscious of the type of data being collected on them but also to visualize the way algorithms “make meaning” of and/or categorize that data for use. It does this by creating an easy visual language that uses color to delineate which of the four types of data (looked, liked, link clicked, typed) is being used in each category (and, briefly, how it’s being used). For example:

I imagine the makers of the project envisioned a scenario in which users review their Data Selfie with great fear, as they’re presented with uncanny reflections of themselves and/or uncomfortable truths that have been revealed from the depths of their subconscious. 

Yet, from what I can tell, Data Selfie may be less about revelation of self (the user) and more successful in revealing the (current) limits of Machine Learning and algorithmic data processing: the nuance and noise that computers haven’t quite figured out how to sift through (yet).

To speak anecdotally, I’ve used Data Selfie for over a month now and it’s convinced I’m a man. It also thinks I hold positive sentiments towards Donald Trump and Ben Carson. My main keywords & concepts are topped by vague terms such as “20th Century Fox” and “White People”. My shopping preferences suggest I’m influenced by family when making product purchases and that I don’t eat out frequently (surprise! it’s one of my worst habits). My top friends are a weird combination of people I vaguely know, people I have never heard of, and (ironically) someone who really dislikes me. If anything, Data Selfie should be renamed Data Funhouse: I can see traces of myself, but the image is generally warped.

Of course, the erratic results are a matter of false assumptions that are programmed into the algorithms (it should also be noted that Data Selfie does not attempt to visualize all the data that is likely being collected by Facebook). The category where it’s easiest to see this is “Top Friends,” where the amount of time spent on a friend’s post is equated to the closeness of your relationship to that person. (Anyone who has ever used the internet can see what is problematic about that.) But it’s in some of the other categories, such as “Gender” and “Religion,” where one has to wonder how this algorithm is measuring “Maleness” or “Jewishness”.
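
To make the “Top Friends” assumption concrete, here is a minimal Python sketch of what a time-spent ranking could look like. This is my own illustration and not Data Selfie’s actual code: the function name rank_top_friends, the event format, and the sample data are all hypothetical.

```python
from collections import defaultdict


def rank_top_friends(view_events, top_n=3):
    """Rank friends by total seconds spent looking at their posts.

    Hypothetical illustration: it assumes "closeness" is nothing more
    than accumulated viewing time, which is exactly the shaky premise.
    """
    totals = defaultdict(float)
    for friend, seconds in view_events:
        totals[friend] += seconds
    # Most viewing time first; nothing here knows *why* you lingered.
    return sorted(totals, key=totals.get, reverse=True)[:top_n]


# Made-up sample data: (friend, seconds spent on one of their posts)
events = [
    ("close_friend", 12.0),
    ("acquaintance", 45.0),  # a long post you got stuck reading
    ("stranger", 30.0),      # a video that autoplayed
    ("close_friend", 8.0),
    ("acquaintance", 20.0),
]

top = rank_top_friends(events)
print(top)
```

With this made-up data, the acquaintance and the stranger both outrank the actual close friend, simply because their posts held the screen longer.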

So while Data Selfie attempts to let users peek into the black box of “Big Data,” it still conceals the process by which most of the categories are formed. It’d be more interesting to me if I could see the variables that come to inform Machine Learning’s perception of gender or religion, and maybe down the line the Chrome extension will expand to do just that. Till then, I’m somewhat reassured that sarcasm is not quite perceptible by machine learning, so with that in mind… God Bless President Trump.