Putting The Decis Agents To The Test

Why we're participating in the Metaculus Forecasting Tournament

The Decis Intelligence agents weren't designed to be forecasting tools. (In fact, I have a draft article explaining why we aren't in the prediction business.) However, as I was building out the scenario planning tools, it became obvious that for them to be useful, I needed a way to determine how realistic and accurate the scenario likelihood estimates actually were.

But, as Andrew Ng wrote recently, evals and benchmarking are hard and expensive to set up. And in this case, how do you even evaluate the effectiveness of a forward-looking forecast?

Enter the Metaculus AI Benchmarking series.

This is a series of tournaments "designed to benchmark AI forecasting capabilities against top human forecasters on complex, real-world questions". These tournaments not only let us see how realistically the agents assess how a situation will play out, but also let us measure their effectiveness against human forecasters.

Over the next three months, the Decis agents will conduct their analysis and make forecasts on hundreds of questions, with those forecasts compared to human forecasters, other bots, and, critically, the actual outcomes of real-world events.

This means that the tournament is a wholly objective test of the agents' performance in real-world conditions (and a very public one: the bot is named decis-ai-bot1, so it's going to be hard to hide). We're already 190+ forecasts in, so the questions are coming thick and fast, and some are resolving soon, meaning we will see some results before the end of April.
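As an aside on what "objective" looks like in practice: once a question resolves, a probabilistic forecast can be scored directly against the outcome using a proper scoring rule. Metaculus has its own scoring and leaderboard mechanics, so the sketch below is purely illustrative, a minimal Python example computing Brier and log scores over a handful of hypothetical resolved forecasts, not the tournament's actual scoring.

```python
import math

def brier_score(p: float, outcome: int) -> float:
    """Squared error between a forecast probability and a binary outcome (lower is better)."""
    return (p - outcome) ** 2

def log_score(p: float, outcome: int) -> float:
    """Log of the probability assigned to what actually happened (higher is better)."""
    return math.log(p if outcome == 1 else 1.0 - p)

# Hypothetical resolved forecasts: (probability assigned to "yes", actual outcome).
resolved = [(0.80, 1), (0.30, 0), (0.60, 0)]

mean_brier = sum(brier_score(p, o) for p, o in resolved) / len(resolved)
mean_log = sum(log_score(p, o) for p, o in resolved) / len(resolved)
print(f"Mean Brier score: {mean_brier:.3f}")
print(f"Mean log score:   {mean_log:.3f}")
```

The point of a proper scoring rule is that the best long-run strategy is to report your honest probability, which is exactly the behaviour we want to test in the agents.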

So I'm very much looking forward to seeing how the agents perform and am particularly excited to test the subject-matter-agent and worldSIM capabilities.

A big thanks to the Metaculus team for putting the tournament together and for making the bot-building framework so straightforward. 🚀