There is a consensus across all readings that archival collections are not neutral historical records, but are shaped, both deliberately and inadvertently, by archivists, technology, infrastructure, culture, and political ideology. This shaping extends beyond what is archived to what gets recorded in formats that lend themselves to archiving in the first place.
In the world of machine learning, the slogan "everything is a vector" underwrites the field's supposed omnipotence: any information can supposedly be encoded in numerical, tabular form that can be computed with. Work in this field frequently uses, or "mines", digital archives. Yet even if everything in those repositories were a vector, vectors would only point at records, not at the things themselves. The data processing pipeline doesn't draw from the source; it merely connects to the tap. It misses the realia: things that hold meaning but were never archivable in the first place. Data scientists should talk with archivists more often.
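The lossiness of "everything is a vector" can be made concrete with a toy sketch. The following bag-of-words encoder (the function name and vocabulary are illustrative, not any particular library's API) shows the basic move: a record is reduced to counts over a fixed vocabulary, and anything the vocabulary cannot name simply vanishes from the vector.

```python
def vectorize(text, vocabulary):
    """Encode a text as a count vector over a fixed vocabulary.

    Crude tokenization: lowercase, replace punctuation with spaces,
    split on whitespace. Everything outside `vocabulary` is discarded.
    """
    cleaned = "".join(c if c.isalnum() or c.isspace() else " "
                      for c in text.lower())
    tokens = cleaned.split()
    return [tokens.count(word) for word in vocabulary]

vocab = ["archive", "letter", "photograph"]
vec = vectorize("A letter about the archive, and another letter.", vocab)
# vec == [1, 2, 0]: the record is now computable, but only along the
# axes the vocabulary happens to provide; the rest is gone.
```

The point is not that this encoder is naive (real embeddings are far richer) but that every encoding, however sophisticated, fixes in advance which aspects of a record can register at all.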