An Experimental Target-Recognition AI Mistakenly Thought It Was Succeeding 90% of the Time

The American military news site Defense One shares a cautionary tale from top U.S. Air Force Major General Daniel Simpson (assistant deputy chief of staff for intelligence, surveillance, and reconnaissance). Simpson describes the Air Force's experience with an experimental AI-based target-recognition program that had seemed to be performing well:

Initially, the AI was fed data from a sensor that looked for a single surface-to-surface missile at an oblique angle, Simpson said. Then it was fed data from another sensor that looked for multiple missiles at a near-vertical angle. "What a surprise: the algorithm did not perform well. It actually was accurate maybe about 25 percent of the time," he said.

That's an example of what's sometimes called brittle AI, which "occurs when any algorithm cannot generalize or adapt to conditions outside a narrow set of assumptions," according to a 2020 report by researcher and former Navy aviator Missy Cummings. When the data used to train the algorithm consists of too much of one type of image or sensor data from a unique vantage point, and not enough from other vantages, distances, or conditions, you get brittleness, Cummings said. In settings like driverless-car experiments, researchers will just collect more data for training. But that can be very difficult in military settings where there might be a whole lot of data of one type — say overhead satellite or drone imagery — but very little of any other type because it wasn't useful on the battlefield...

Simpson said the low accuracy rate of the algorithm wasn't the most worrying part of the exercise. While the algorithm was only right 25 percent of the time, he said, "It was confident that it was right 90 percent of the time, so it was confidently wrong. And that's not the algorithm's fault. It's because we fed it the wrong training data."
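In machine-learning terms, "confidently wrong" describes a poorly calibrated model. The confidence it reports for its predictions does not match how often those predictions are actually right. The Python sketch below shows one common way to measure that gap, the binned expected calibration error (ECE). It assumes the model emits a confidence score between 0 and 1 with each prediction. The function names and the toy data are hypothetical illustrations, chosen only to echo the 25 percent and 90 percent figures Simpson cited. They do not come from the Air Force program.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of predictions that were right."""
    return sum(correct) / len(correct)


def mean_confidence(confidences: Sequence[float]) -> float:
    """Average confidence the model reported for its own predictions."""
    return sum(confidences) / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned expected calibration error (ECE).

    Predictions are grouped into equal-width confidence bins (lo, hi];
    each non-empty bin contributes |bin accuracy - bin mean confidence|,
    weighted by the fraction of all samples that fall in it.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are half-open (lo, hi]; a confidence of exactly 0.0 goes in bin 0.
        idx = [
            i for i, c in enumerate(confidences)
            if lo < c <= hi or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        bin_acc = sum(correct[i] for i in idx) / len(idx)
        bin_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(bin_acc - bin_conf)
    return ece


# Hypothetical toy data echoing the reported figures: 20 predictions,
# each made with 90% confidence, of which only 5 (25%) are correct.
confidences = [0.9] * 20
correct = [True] * 5 + [False] * 15

print(f"accuracy:        {accuracy(correct):.2f}")
print(f"mean confidence: {mean_confidence(confidences):.2f}")
print(f"ECE:             {expected_calibration_error(confidences, correct):.2f}")
```

With every prediction made at 90 percent confidence but only a quarter of them correct, all samples land in a single bin, so the ECE equals the raw gap of 0.90 − 0.25 = 0.65. A well-calibrated model would score near zero. ECE depends on the number of bins, and with small samples it is a noisy estimate.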

Read more of this story at Slashdot.