If you've never had to consciously decode a face, the problem sounds trivial. For a lot of autistic people, it isn't.
The signal is there: the microexpressions, the shifts around the eyes and mouth that most allistic people read in under a second. What differs is the default pipeline that turns "face seen" into "emotion understood." Sometimes it runs slowly. Sometimes it runs accurately but costs a lot of cognitive bandwidth. Sometimes it works in one context and not another.
Structured interventions have existed for two decades. Simon Baron-Cohen's Mind Reading software. The PEERS program at UCLA. Dozens of smaller clinical tools. Most are flashcard systems: show a face, ask what emotion it shows, give feedback. They work. They are also blunt. Static stimuli, feedback that feels judgmental, no sense that the tool is paying attention to you.
That's the part neural networks genuinely change.
What the models do
Facial expression recognition is a well-solved computer vision problem. The reason: the thing being classified is a finite set of muscle movements. FACS (Ekman and Friesen, 1970s) decomposes a face into Action Units. AU1 is the inner brow raiser. AU12 is the lip corner puller. Forty-six in total. Tools like OpenFace detect these in real time; emotion classifiers train on datasets like CK+ and AffectNet.
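To make the decomposition concrete, here is a toy sketch of the AU-to-emotion step. The prototype sets loosely follow EMFACS-style conventions (happiness as AU6 + AU12, and so on); a real pipeline like OpenFace emits per-AU intensities rather than the simple present/absent sets assumed here, and the `guess_emotion` function is an illustration, not any tool's actual API.

```python
# Toy mapping from detected Action Units to candidate emotion labels.
# AU sets loosely follow EMFACS-style conventions; real detectors emit
# per-AU intensities, which this sketch reduces to present/absent.

EMOTION_PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},     # inner brow raiser, brow lowerer, lip corner depressor
    "surprise":  {1, 2, 5, 26},  # brow raisers, upper lid raiser, jaw drop
    "anger":     {4, 5, 7, 23},  # brow lowerer, lid raiser/tightener, lip tightener
    "disgust":   {9, 15},        # nose wrinkler, lip corner depressor
}

def guess_emotion(active_aus: set[int]) -> tuple[str, float]:
    """Return the best-matching label and its overlap score (0..1)."""
    best, best_score = "neutral", 0.0
    for label, proto in EMOTION_PROTOTYPES.items():
        score = len(active_aus & proto) / len(proto)
        if score > best_score:
            best, best_score = label, score
    return best, best_score

# A Duchenne smile (AU6 + AU12) matches the happiness prototype fully;
# AU12 alone (a social smile) still leans "happiness" but scores lower.
print(guess_emotion({6, 12}))  # → ('happiness', 1.0)
print(guess_emotion({12}))     # → ('happiness', 0.5)
```

The gap between those two scores is exactly the kind of distinction flashcard tools flatten and a live AU readout can surface.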
Multimodal vision-language models (GPT-4V, LLaVA) skip the decomposition and describe what a face is doing in plain language. Sometimes they're wrong in revealing ways. Sometimes they're disturbingly right.
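A minimal sketch of that plain-language route, assuming an OpenAI-style chat payload with image input. The model name and prompt wording are my assumptions, and nothing here sends a network request; the function only builds the request body.

```python
import base64

def build_describe_request(image_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Build an OpenAI-style chat payload asking a VLM to describe a face.

    Message shape follows the Chat Completions image-input format; the
    model name and prompt text are assumptions for illustration. No
    request is sent here.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Describe what this face is doing in plain language: "
                          "brows, eyes, mouth. Describe the movements only; "
                          "do not name an emotion.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
```

The prompt deliberately asks for movements rather than an emotion label: that keeps the model in FACS-like descriptive territory instead of letting its own cultural priors supply the verdict.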
Both unlock something the flashcard tools couldn't: private, real-time feedback that doesn't feel like a judgment.
What matters more than accuracy
The biggest benefit isn't that the AI is accurate. It's that the AI is not a person.
Practicing face reading in front of another human is its own stressor. You're being looked at and trying to look correctly at the same time. A camera and a model on your laptop, with feedback nobody else sees, is a completely different experience. It lowers the stakes by several orders of magnitude.
That's the target. Not "AI that fixes face reading." AI that makes practicing face reading survivable.
Where it fails
Two failure modes worth naming.
1. The training data is narrow. Most FER datasets are Western, overwhelmingly white, and built on adults performing exaggerated posed expressions. A model trained on CK+ will confidently misread faces that don't match its distribution. Used on a child learning to read their own family, it teaches a confidently wrong map.
2. The framing is political. Many autism interventions implicitly define success as "becomes indistinguishable from an allistic person." Autistic adults have been rejecting this framing for years, often the same adults who benefited from social skills training as children but now describe it as having taught them to mask rather than to thrive. Build a system that trains users to recognize allistic expressions by allistic standards, call the result "improved social cognition," and you haven't built a neutral tool. You've picked a side.
The principle
Build these tools. Build them with your eyes open.
The useful question isn't "can we classify faces correctly?" That's solved. The useful questions are:
- Who asked for this?
- Is the user doing this because they want to, or because someone else wants them to perform differently?
- Can they turn it off, change its goals, disagree with its feedback?
- Does the tool treat being autistic as a preference or as a pathology?
These are design questions, not technical ones. They matter long before any model gets trained.
I work in this space with a community of neurodivergent developers. We keep landing on the same rule: the person using the tool has to choose it, configure it, and control it. Anything less is a politer version of the old problem.
Neural networks can teach someone to read faces. What's in doubt is whether the thing we're teaching is worth learning, and whether the person learning it got to decide.