Data recycling

I found it fascinating to read about the history of machine voice recognition: a development that, in the search for an empirical standard of American English, eliminates the richness of its speakers’ origins and, with them, the history of the land in which the language is rooted. It places white males as the standard speakers of the language, while speakers from other backgrounds go unaccounted for because they are considered “unrealistic” or “abnormal”.

Because this standardization of speech is the basis of current voice recognition and analysis technologies, it was disturbing to read the examples of trials and the legal implications the “standard” language has when it is used to judge “non-standard” citizens. Under this methodology, people from different backgrounds are recognized not as individuals but as interchangeable members of a community.

In the race to develop exciting new products, companies pay more attention to the technology than to the data it relies on, often reusing existing, biased data with little regard for how it was collected, categorized, and scrubbed. We should stop recycling data.

One Reply

  • Excellent, Liliana. I particularly appreciate your comments regarding the ethics — and wisdom! — of recycling and reifying old training sets. We see this in so many contexts — including the use of Henrietta Lacks’s cancer cells in medical research.

    This is your fourth and final post — thank you!