When will artificial intelligence really pass the test?

Artificial intelligence is getting very good at doing certain things.

Google’s AI team has unveiled impressive early detection and warning systems for floods and wildfires. AI can aid wildlife conservation by analyzing millions of animal images. Somewhat controversially, AI can generate art and write articles.

These specialized AI systems continuously improve at their tasks, but how do you measure an AI’s general intelligence and its ability to adapt and react to unanticipated changes?

“Most AI systems are evaluated in a narrow set of problems,” says Larry Holder, a computer science professor at Washington State University. “It’s what we call an AI savant that might be really good at one problem and terrible at another.”

Larry Holder

Holder and his research team at the School of Electrical Engineering and Computer Science are working to measure how well more robust AI systems can solve a wide array of tasks with unexpected twists: an IQ test of sorts for machine learning.

They started in 2018 with AIQ, a free tool to evaluate AI abilities, developed by Holder and Christopher Pereyda, then an undergraduate and now a WSU doctoral student. AIQ provides an environment to test and rank AI systems on tasks like playing video games, answering SAT problems, and solving a Rubik’s Cube.

Christopher Pereyda

The timing of the AIQ effort coincided with Holder and WSU receiving a grant of just over $1 million from the Defense Advanced Research Projects Agency (DARPA) to develop and evaluate AI systems for their ability to handle novelty.

In the project, DARPA’s Science of Artificial Intelligence and Learning for Open-world Novelty (SAIL-ON), eight teams run AI systems while four other teams, including WSU, throw unexpected challenges at the AIs. Three tests, called “domains,” let the AIs test their mettle against novelties.

The first test, CartPole, simulates a pole on top of a cart, with the goal of keeping the pole upright by moving the cart from side to side. Easy enough for an AI, until WSU or another team introduces wind or an incline.
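For readers who want to see the idea in action, here is a rough Python sketch of a cart-pole simulation with a "wind" novelty. It uses the standard textbook cart-pole equations, but everything else is an assumption made for illustration: the constants, the simple fixed controller, the name `run_episode`, and the choice to model wind as a constant sideways force on the cart. It is not the SAIL-ON test environment or any team's AI.

```python
import math

# Illustrative constants (assumptions, not the SAIL-ON settings).
GRAVITY = 9.8            # m/s^2
CART_MASS = 1.0          # kg
POLE_MASS = 0.1          # kg
POLE_HALF_LENGTH = 0.5   # m
PUSH_FORCE = 10.0        # N, size of each left/right push
TIME_STEP = 0.02         # s
MAX_ANGLE = math.radians(12)  # pole counts as fallen beyond this tilt
MAX_POSITION = 2.4            # cart counts as off the track beyond this


def run_episode(wind_force=0.0, max_steps=200):
    """Return how many steps the pole stays up (at most max_steps).

    wind_force is the hypothetical novelty: a constant sideways force
    on the cart that the controller was never designed for.
    """
    x, x_dot, theta, theta_dot = 0.0, 0.0, 0.05, 0.0  # start slightly tilted
    total_mass = CART_MASS + POLE_MASS
    pole_moment = POLE_MASS * POLE_HALF_LENGTH

    for step in range(max_steps):
        # Fixed controller: push toward the side the pole is leaning or
        # swinging. It has no knowledge of the wind.
        push = PUSH_FORCE if theta + theta_dot > 0 else -PUSH_FORCE
        force = push + wind_force

        # Standard cart-pole dynamics, advanced with a simple Euler step.
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        temp = (force + pole_moment * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_moment * theta_acc * cos_t / total_mass

        x += TIME_STEP * x_dot
        x_dot += TIME_STEP * x_acc
        theta += TIME_STEP * theta_dot
        theta_dot += TIME_STEP * theta_acc

        if abs(x) > MAX_POSITION or abs(theta) > MAX_ANGLE:
            return step  # failed: report the steps survived
    return max_steps


baseline = run_episode(wind_force=0.0)
windy = run_episode(wind_force=50.0)  # deliberately exaggerated gust
print("steps survived without wind:", baseline)
print("steps survived with wind:", windy)
```

The wind here is far stronger than the controller's own push, so the fixed strategy cannot recover. A system that handles novelty would need to notice the change and adjust, which is the ability the tests are meant to probe.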

AI systems play a version of the classic first-person shooter video game Doom for their next test. In ViZDoom, which is designed specifically for AIs, “players” must shoot their way through monsters while avoiding damage.

It’s pretty straightforward until Holder and his team introduce some surprises, such as teleporting enemies to other rooms or eliminating extra ammo. The AI might think, “I’ll shoot like crazy because I can just pick up ammo if I need more. And now suddenly, you don’t have that and so you have to change your strategy,” Holder says.

The third domain connects to a WSU research strength: smart home environments. Diane Cook, Regents Professor in computer science, leads several projects to bring AI into homes to help seniors and others who need assistance. She is also part of the SAIL-ON grant.

In the SAIL-ON test, an AI tries to determine what a person in a smart home is doing, such as cooking, cleaning, or exercising. A lot of sensors collect data but, Holder asks, what if a sensor stops working? What if a person “fools” the sensors by doing something like opening and closing a door while pretending to go for a walk?

With each phase of the SAIL-ON program, WSU and the other teams introduce more hidden novelties. “We don’t tell anybody about the novelties,” Holder says. In a complex setting like the smart home, AI systems really need to adapt to understand what’s happening.

Holder would like to see AI systems include a component that detects and adapts to novelties, rather than having responses to specific novelties hard-coded into the system. His team eventually wants to release its “novelty generator,” the tool that introduces these surprises, to the public.

“Anybody could play these different environments, encounter novelty, and measure how well their system does against it,” Holder says. “I think it would challenge the AI community and help motivate them to build more general purpose adaptability into their systems.

“In order for AI to progress and become more robust, it needs to be able to deal with lots of different tasks,” Holder continues. For example, a robot assistant in a home might do laundry, cook, and clean. “They need to be able to do all those tasks pretty well, but they need to be able to adapt to changes that the programmers may not have anticipated.”

Still, “I think we’re a long way off from an AI system that essentially can compete or exceed human capabilities in all areas, like Data from Star Trek,” Holder says.

Web exclusive

Podcast: The ethical dilemmas of AI-generated art and writing

“AI for wildlife conservation—from an AI” (WSM Spring 2023)

“Researchers improve security for smart systems” (WSU Insider, November 7, 2022)

“Artificial intelligence: How to measure the ‘I’ in AI” (TechTalks, December 3, 2019)

“Unleash all this creativity: Google AI’s breathtaking potential” (Axios, November 3, 2022)

“Scientists Increasingly Can’t Explain How AI Works” (Motherboard/Vice, November 1, 2022)
