This week’s readings are concerned with the challenges of indexing, storing, and accessing photo archives, all of which are complicated by “the constant problem of adequately attending to the different levels of content in any picture” (Vestberg).
The things a photograph shows are references to pre-existing objects in the world, making every photograph a “mini-archive within the archive,” according to Vestberg. Yet, she adds, these things might have little to do with what the picture depicts, or in other words, what the photo might mean: “Accounting for what a picture shows is never the same as describing what it depicts” (Vestberg).
This complicates not only the work of professional picture archivists, but also the photo management of anyone with a camera on hand, which today is virtually everyone. The problems illustrated in the readings gave me a new appreciation for the mess that is my own photo archive, which I want to talk about today. (Apologies for taking my own work as an example! It seemed to fit when I first started writing, but at this point I feel embarrassed.)
I have owned a camera since 2011 and have since accumulated over 15,000 pictures, not counting the thousands more taken with mobile phones. This archive consists mostly of “fine art” photography: not in the sense that it’s “fine” or “art,” but in that its focus is aesthetic rather than personal or documentary. I keep my pictures in Adobe Lightroom, a photo management and editing application. By default, Lightroom displays them in a grid reminiscent of a contact sheet from analog times. Pictures are grouped in folders by import date or, if this metadata is available, by date taken. These folders, perhaps the digital equivalent of physical vertical files, are listed in the sidebar, like a single filing cabinet slid open.
I have never found a good way to organize my collection in this system. To make a long story short, it’s a mess that I avoided dealing with until two years ago, when I wanted to make a photo book. Figuring that organizing the database myself was a lost cause, and unable to rely on any metadata whatsoever, I decided it would be an interesting experiment to computer-curate the book: to automate all metadata collection and organize the material based purely on computable information.
So I used online computer vision services by Google and Microsoft to generate captions and tags for each image. Then, I collected dominant colors by averaging pixel distributions, and applied the Histogram of Oriented Gradients algorithm to quantify image composition. This way I was able to generate about 850 data points for each picture.
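To make the color and composition step a little more concrete, here is a minimal sketch of what such feature extraction could look like. It is an illustration only, not the project's actual code: the function names are hypothetical, the image is assumed to be an H x W x 3 NumPy array, and the orientation histogram is a simplified, whole-image stand-in for the full Histogram of Oriented Gradients algorithm (which additionally works on local cells and normalizes over blocks).

```python
import numpy as np

def mean_color(image):
    """Average RGB value over all pixels of an H x W x 3 image array."""
    return image.reshape(-1, 3).mean(axis=0)

def orientation_histogram(gray, bins=9):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    Simplified relative to full HOG: one histogram for the whole image,
    with no cells and no block normalization.
    """
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into the range [0, 180).
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

def feature_vector(image):
    """Concatenate color and composition features into one vector."""
    gray = image.mean(axis=2)
    return np.concatenate([mean_color(image) / 255.0, orientation_histogram(gray)])
```

Each picture ends up as a short list of numbers. Appending further measurements, such as scores for machine-generated tags, is how a vector like this could grow to hundreds of entries.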
Without knowing it, I had tapped into what Kamin calls the “discourse of affinities”: processing and organizing images solely by their formal content rather than their documentary value. At the Minneapolis College of Art and Design in the 1960s, former MoMA librarian Bernard Karpel, preceded by Warburg’s Mnemosyne Atlas in the 1920s and Malraux’s 1949 Musée Imaginaire, sought to “force a reorientation of basic library approaches away from the historical and factual formulas to those that can follow the exploitation of the image in semantic and aesthetic terms” (Kamin).
Kamin notes that “the discourse of affinities is manifest in discussions around pattern recognition and machine vision” today, the very technology I used to complete my project. Here, the generated tags and captions are purely descriptive, as they have to be derived from the image in its machine-readable form. Only sometimes, accidentally, do they transcend into something bigger: when the computer gets things wrong.
Assuming that at this point we are all somewhat familiar with the politics of machine learning and computer vision, I only want to point out that they become visible in my project, too, before moving on to another aspect that I think is relevant.
One of my frustrations with Adobe Lightroom is its rigid organizing structure, which does not at all address the aforementioned “problem of adequately attending to the different levels of content in any picture” (Vestberg). Generally, it seems that photo storage infrastructure is unable to account for complex interrelations between pictures: in an ideal spatial organizing system you would want a photo of a black cat, for example, to be close to other felines as well as animals in general, but also in the proximity of internet memes, the color black, and folklore.
The vertical filing cabinet in the physical archive and its digital counterpart in Adobe Lightroom are both limited in their spatial configuration. There are only so many attributes that can be taken into account simultaneously in adjacent files, folders, and cabinets in the physical world with its three dimensions. However, in geometry it doesn’t matter whether you describe a space in three, four, five, six, or more dimensions. So theoretically, my cat picture could be in the animals section in three dimensions but also next to folklore/religion in an additional dimension.
To imagine a fourth dimension, one would have to be able to picture a fourth axis perpendicular to the other three, and to imagine 800 more is unthinkable. Yet, although we can’t see it, I was able to computationally arrange my own photo collection in 850-dimensional space, considering all 850 criteria at once.
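Although such a space can't be pictured, it is easy to compute with: each photo is simply a row of 850 numbers, and closeness is ordinary Euclidean distance taken over all of them at once. The sketch below assumes a hypothetical `features` array with one row per photo. The function name and the choice of Euclidean distance are illustrative assumptions, not necessarily what the project used.

```python
import numpy as np

def nearest_neighbors(features, index, k=5):
    """Indices of the k photos closest to photo `index` in feature space."""
    features = np.asarray(features, dtype=float)
    # Distance from the chosen photo to every photo, across all dimensions.
    distances = np.linalg.norm(features - features[index], axis=1)
    distances[index] = np.inf  # a photo should not count as its own neighbor
    return np.argsort(distances)[:k]
```

In this picture the black cat can sit near other animals along some dimensions and near folklore along others, because no single axis has to carry the whole ordering.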
If you gloss over the fact that many aspects of a picture’s meaning can’t be derived or represented computationally, a high-dimensional archive possibly enables a superior ordering logic, one where all these simultaneous connections become possible, and where formal as well as documentary attributes could be considered side by side and all at once, if only they can be quantified.
But there is a catch. Rendering this archive visible requires reducing all 850 dimensions again to the two or three we can perceive. To do this, I used a dimensionality reduction algorithm called t-SNE. The algorithm calculates the distances between all pictures in high-dimensional space and arranges them in 2D, trying to achieve a similar distribution.
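For readers who want to try this step, here is a minimal sketch using the `TSNE` class from scikit-learn. The library choice, the function name `project_to_2d`, and the parameter values are assumptions for illustration; the essay does not say which implementation or settings were used.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_to_2d(features, perplexity=10.0, seed=0):
    """Reduce an (n_photos, n_features) array to (n_photos, 2) with t-SNE.

    `perplexity` roughly controls how many neighbors each photo tries to
    stay close to; it must be smaller than the number of photos.
    """
    features = np.asarray(features, dtype=float)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(features)
```

Calling `project_to_2d` on an array with one 850-entry row per photo returns one (x, y) position per photo. Different seeds or perplexity values generally produce different layouts, which already hints at how provisional any single 2D arrangement is.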
The result is a map of affinities where pictures that were close together in high-dimensional space are grouped together in 2D as well. Unfortunately, perfect accuracy is impossible. Much as flat maps of our spherical Earth always distort the globe in one way or another (e.g. rendering Greenland or the poles either huge, tiny, or distorted), t-SNE can never account for all of the spatial relationships in high-dimensional space at once.
Finally, to get the arrangement into book form, I used another algorithm that attempts to compute the shortest path through all points in the arrangement, thereby defining the page order. The sequence is bound as an accordion book, a 95-foot-long journey across categories, formal characteristics, and descriptions simultaneously. It’s a new way to traverse the collection, but one that leaves the viewer unable to see the whole picture, where it all made sense.
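Finding the truly shortest path through many points is the traveling salesman problem, which is why an algorithm can only attempt it. The essay does not name the algorithm used, so the sketch below swaps in the simplest heuristic as an assumption: always turn to the nearest picture that has not been placed yet. The function names are hypothetical, and `points` is assumed to be an array of 2D positions such as a t-SNE layout.

```python
import numpy as np

def greedy_page_order(points, start=0):
    """Order points by repeatedly stepping to the nearest unvisited one."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    order = [start]
    visited[start] = True
    for _ in range(n - 1):
        distances = np.linalg.norm(points - points[order[-1]], axis=1)
        distances[visited] = np.inf  # never return to a placed picture
        nxt = int(np.argmin(distances))
        order.append(nxt)
        visited[nxt] = True
    return order

def path_length(points, order):
    """Total length of the path that visits the points in the given order."""
    points = np.asarray(points, dtype=float)
    steps = np.diff(points[order], axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())
```

A greedy tour like this is usually longer than the optimum, and it tends to end with a few long jumps back to pictures it skipped. On the page, such a jump would show up as an abrupt break between neighboring spreads.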
Nina Lager Vestberg, “Ordering, Searching, Finding,” Journal of Visual Culture 12:3 (2013): 472-89.
Diana Kamin, “Mid-Century Visions, Programmed Affinities: The Enduring Challenges of Image Classification,” Journal of Visual Culture 16:3 (2017): 310-36.