Meta AI Image Decoder recreates mental imagery from brain scans

The computing interfaces of the not-too-distant future might move beyond touchscreens and keyboards — even past eyes and hand gestures, to the inside of our own minds.

Society is not quite there yet, but we are moving closer. Researchers at Meta Platforms, Inc., parent of Facebook, Instagram, WhatsApp and Oculus VR, today announced Image Decoder, a new deep learning application based on Meta’s open source foundation model DINOv2 that translates brain activity, in near real time, into highly accurate images of what the subject is looking at or thinking of.

In other words, if a Meta researcher were sitting in a room, unable to see the subject, and even if the subject were on the other side of the world, the Image Decoder would let the researcher see what the subject was looking at or imagining based on their brain activity, provided the subject was at a neuroimaging facility undergoing a scan in an MEG machine.

The researchers, who work at the Facebook Artificial Intelligence Research lab (FAIR) and PSL University in Paris, describe their work and the Image Decoder system in more detail in a new paper.

In notes provided to TechForgePulse over email by a spokesperson, Meta wrote that “this research strengthens Meta’s long-term research initiative to understand the foundations of human intelligence, identify its similarities as well as differences compared to current machine learning algorithms, and ultimately help to build AI systems with the potential to learn and reason like humans.”

How Meta’s Image Decoder works

In their paper, Meta’s researchers describe the technology underpinning Image Decoder.

It essentially combines two hitherto largely separate fields. The first is machine learning, specifically deep learning, in which layered neural networks learn patterns from large amounts of data: for example, by analyzing labeled data and then attempting to correctly label new data. The second is magnetoencephalography (MEG), a technique that measures and records brain activity non-invasively from outside the head, using instruments that pick up the tiny changes in the brain’s magnetic fields as a person thinks.

Meta researchers trained a deep learning algorithm on 63,000 prior MEG results from four participants (two women and two men, with a mean age of 23) across 12 sessions, in which the participants saw 22,448 unique images, plus 200 repeated images from that original pool.

The Meta team used DINOv2, a self-supervised computer vision model that Meta released publicly in April 2023. Rather than being built for a single task, DINOv2 produces general-purpose image representations that other models can build on; it was trained on a curated dataset of roughly 142 million images.

The researchers trained the Image Decoder on pairs of this raw MEG data and the image the person was actually seeing when their brain produced that activity.

In this way, by comparing the MEG data to the actual source image, the algorithm learned to decipher what specific shapes and colors were represented in the brain and how.
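To make that training step more concrete, here is a minimal Python sketch of one common way to align brain recordings with images: a contrastive loss that rewards each MEG embedding for being closer to its own image’s embedding than to the other images in a batch. It assumes both kinds of data have already been turned into fixed-length vectors (for the images, that is the kind of role a model like DINOv2 can play). The function names, the temperature value and the toy vectors are hypothetical, and this is an illustration of the general technique, not Meta’s actual training code.

```python
import math

def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(meg_embs, img_embs, temperature=0.1):
    """InfoNCE-style loss: meg_embs[i] and img_embs[i] come from the same trial.

    For each MEG embedding, the other images in the batch act as negatives,
    and the loss is the cross-entropy of picking the true image.
    """
    n = len(meg_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(meg_embs[i], img_embs[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / n

# Hypothetical toy embeddings for two trials.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each MEG vector points toward its own image
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each MEG vector points toward the wrong image
print(contrastive_loss(aligned, images) < contrastive_loss(swapped, images))  # → True
```

The loss is close to zero when each MEG vector points toward its own image and large when the pairs are swapped, which is the pressure that pushes a decoder to map brain activity onto the right picture. CLIP-style training usually averages this in both directions (MEG to image and image to MEG); the one-directional version is kept here for brevity.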

Promising results and ethical considerations

The Image Decoder system is far from perfect, but the researchers were encouraged by the results: in its best-performing cases, it retrieved or recreated the correct image from the MEG data with up to 70% accuracy, seven times better than existing methods.
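Retrieval here means picking the right picture out of a pool of candidates. A minimal Python sketch of how such retrieval can be scored, assuming each decoded brain signal and each candidate image is already a vector: compare the decoded vector with every candidate by cosine similarity, rank the candidates, and count a hit when the true image lands in the top k. The labels, vectors and function names below are hypothetical toy values, and this article does not say which top-k setting stands behind the 70% figure.

```python
import math

def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve(pred, candidates, k=1):
    """Indices of the k candidates most similar to the decoded embedding."""
    scores = [cosine(pred, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

def top_k_accuracy(preds, true_idx, candidates, k=1):
    """Fraction of trials whose true image is among the top-k retrieved."""
    hits = sum(1 for p, t in zip(preds, true_idx) if t in retrieve(p, candidates, k))
    return hits / len(preds)

# Hypothetical toy pool: one embedding per candidate image.
labels = ["broccoli", "caterpillar", "speaker"]
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical decoded embeddings for three trials, and the true image of each.
preds = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
true_idx = [0, 1, 2]

print([labels[retrieve(p, candidates)[0]] for p in preds])
# → ['broccoli', 'caterpillar', 'broccoli']
print(top_k_accuracy(preds, true_idx, candidates, k=1))  # 2 of 3 trials correct
print(top_k_accuracy(preds, true_idx, candidates, k=3))  # → 1.0
```

Raising k always makes the score easier to achieve (with k equal to the pool size, every trial counts as a hit), so an accuracy figure is only meaningful alongside the pool size and the k used.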

Some of the imagery that the Image Decoder successfully retrieved from a pool of potential images included pictures of broccoli, caterpillars, and audio speaker cabinets. It was less successful at decoding more complex and varied imagery, including tacos, guacamole, and beans.

Graphic showing how well Meta’s Image Decoder decoded different MEG data into imagery. Credit: Meta Platforms, Inc.

“Overall, our findings outline a promising avenue for real-time decoding of visual representations in the lab and in the clinic,” the researchers write.

However, they noted that the technology poses “several ethical considerations,” as being able to look inside a person’s mind is a new level of invasiveness that technology has not yet attained on a large scale.

The most notable of these considerations, the researchers write, is “the necessity to preserve mental privacy,” though they don’t state exactly how this would be achieved.

The fact that this work is funded by a company that has already been fined billions of dollars for violating consumer privacy with its products is also a notable concern, though the researchers don’t directly address this elephant in the room.

For now, however, technological limitations would prevent this technique from being used to read a person’s thoughts without their consent. Namely, the Image Decoder works best on concrete imagery of physical objects and sights a person has actually seen.

“By contrast, decoding accuracy considerably diminishes when individuals are tasked to imagine representations,” the researchers note.

In addition, “decoding performance seems to be severely compromised when participants are engaged in disruptive tasks, such as counting backward (Tang et al., 2023). In other words, the subjects’ consent is not only a legal but also and primarily a technical requirement for brain decoding.”

So a person whose brain activity was being run through the Image Decoder without their consent could try to stop it by resorting to a technique such as counting backward, provided they were aware of that option and of the circumstances they were in.
