The events of the 1989 Tiananmen Square protests remain shrouded in
mystery. One of the most iconic figures from those protests is the
man known to the world only as “Tank Man,” a figure who courageously stood in
front of a line of military tanks.
While many
details about him, including his fate, remain unknown, one thing is absolutely
certain: Tank Man did not take a selfie with the tanks behind him.
Yet
Google displayed an AI-generated image this week implying otherwise
as its top search result for “Tank Man,” after indexing a popular Reddit post.
Much of the
conversation accompanying the recent astronomical rise in AI capabilities has
focused on the pragmatic, material risks of the technology, such as machine
bias and AI safety. But there has been far less discussion of the epistemic
challenges that the technology’s outputs — like the distorted Tank Man — pose
to our shared foundation of knowledge.
As students
embedded in a broader research community, we should be concerned by how
generative AI threatens to tear apart the fabric of intellectual discourse.
Moreover, regardless of the posts we occupy after leaving Cambridge, we
must be prepared for the possibility that the impacts of AI will reach into
every corner of our lives. In short, we are living in a transformational
moment, and it’s an open question whether and how we, and society writ large,
will rise to the challenge.
The progress
made by generative image tools over the last decade has been unprecedented. In
2014, image generation software could barely patch together a
pixelated, distorted human face, and now OpenAI’s latest release of DALL-E can
capture the intricate nuances of fantastical prompts, including “a vast
landscape made entirely of various meats.”
Putting
these advances in dialogue with philosopher Regina Rini’s scholarship yields
some concerning conclusions. Rini argues that certain types of media — for her,
specifically video and audio recordings — function as epistemic backstops, or
foundational layers of knowledge that underpin our collective informational
environments. It’s similar to the notion of “pulling the receipts” to prove
your involvement in a certain situation, as opposed to just relying on the
credibility of your assurance.
The rapid rise
of indiscernible AI-generated images and videos poses a central challenge to
this backstop role, undermining a core epistemic function of our media. Beyond
the harm done in each individual case, the recurring inability
to distinguish real images from fake ones may lead us to reflexively
distrust all recordings, thereby eroding a common basis of knowledge.
When I
conversed via email with Matthew Kopec, the program director of Harvard’s
Embedded EthiCS program, he shared this sentiment, writing that “these tools
pose a serious threat to the health of our information ecosystems.”
At first glance, one could argue that these advancements are but the latest development in a long history of media distortion: The vision of a pure, misinformation-free information ecosystem was always an illusion. We’ve been building our lives on messy portraits for as long as we can remember. From the horribly colored sepia Instagram filters of the early 2010s to the teeth-whitening madness of FaceTune, distortions are now the expectation on social media, not the exception.
And tenuous information has been part of our ecosystem for centuries. Part of the demand of being educated citizens is distinguishing between misinformation of all sorts — whether government propaganda, corporate ad campaigns, or straight-up fake news — and reality. The Tiananmen Square protests, fittingly, exemplify the struggle to find truth amid the noise.
Up until
now, though, the tools at our disposal have been relatively adequate. We
evaluate the trustworthiness of where information comes from, how it’s been
generated, and the context in which it’s presented. We ask our aunt where she
got her most recent political news, and we know when Photoshopped images
look a little too good to be true.
But the
newest generation of imaging tools threatens to change that.
For
starters, human intuition is no match for this technology: people can no longer
reliably differentiate between human-made and AI-generated content. Ana Andrijevic — a
visiting researcher at Harvard Law School writing her dissertation on the
impact of AI on copyright — pointed me to a 2019 survey by Pew Research
Center, taken before the advent of generative AI, which found that while 63
percent of respondents recognized the issue of made-up images, an almost equal
number thought it was too much to ask the average person to recognize them.
“I am sure
that we would reach similar conclusions today,” she wrote in an email. “Even if
we can ask users to be even more critical of the images they are confronted
with, I don’t think we can validly ask them to be able to detect on their own
whether they are confronted with a Generative AI image.”
Intuitively,
the solution to a technical problem is more technology, but that
approach is similarly fruitless. Multiple platforms have started to acknowledge
that they have an AI content problem. TikTok announced earlier this
week that creators must explicitly disclose AI-generated content, while
Instagram is reportedly working on an AI-detection tool of its
own.
But the
experts I spoke with were pessimistic about such initiatives. Mehtab Khan, a
fellow at Harvard’s Berkman Klein Center for Internet and Society studying
generative AI tools, wrote in an email that “current practices are clearly
insufficient to deal with the challenges posed by generative AI.”
Human
moderation, for one, is a nonstarter.
“The task of
moderating online content will be even more difficult when misinformation can
appear solely in image form, with no caption needed to have the desired
misleading effect,” Kopec wrote.
With human
moderation off the table, what about digital tools? Those fall short as
well. As Andrijevic described it to me in an email, there is a “clear
imbalance between the development and release of AI tools” and the technologies
designed to detect them.
She pointed
out that nearly all of the AI-detection tools that companies
have created — whether OpenAI’s tool for DALL-E images or Google DeepMind’s SynthID — only work
for images created with the same company’s technology. For example, Andrijevic wrote,
Google’s technology “cannot be applied to images created by other tools, such
as OpenAI’s Dall-E, or Midjourney.”
Given the
inevitable flood of pseudo-real images and our current inability to distinguish
them, we may finally be approaching the situation that Rini feared. The trust
in basic sources of information that she argues underpins intellectual exchange
will become increasingly strained, and we don’t yet have a
clear solution. The task that lies before us, then, is to imagine what comes
next.
“As the
father of two young kids, I can only admit that I have absolutely no idea what
digital literacy will need to look like in five or ten years and that this
ignorance on my part deeply concerns me,” Kopec wrote.
“I hope we,
as a society, put our best minds to work on that puzzle,” he added.
Of course,
what is to come in the future, particularly with regard to emerging
technologies, is necessarily speculative. But addressing these challenges requires
a concerted effort from all parties affected — technologists at AI companies,
policymakers in DC, and philosophers like Rini — to solidify the informational
ground we stand on. Otherwise, we risk living in a history constantly revising
itself.