A GPT-4 Capability Forecasting Challenge

This is a game that tests your ability to predict ("forecast") how well GPT-4 will perform on various types of questions. (In case you've been living under a rock these last few months, GPT-4 is a state-of-the-art "AI" language model that can solve all kinds of tasks.)

Many people speak very confidently about what capabilities large language models do and do not have (and sometimes even could or could never have). I get the impression that most people who make such claims don't even know what current models can do. So: put yourself to the test.

WARNING: I made this website in August of 2023, AGES ago in LLM time. The questions here are about OpenAI's GPT-4 as it was at launch. Current LLMs likely behave very differently.

How likely do you think GPT-4 is to answer the question below correctly? Enter a number between 0 and 1, where 0 means you think the model has a 0% chance of getting the question right, and 1 means a 100% chance of getting it right. But don't be over-confident!

Question:

What is the capital of France?

Answer:

Paris
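
To see why over-confidence is risky, here is a minimal sketch of how a probability forecast could be scored. The text above does not say which scoring rule the game uses, so the log loss here is purely an assumption for illustration, and the function name log_loss is made up. Lower scores are better.

```python
import math

def log_loss(forecast: float, correct: bool) -> float:
    """Penalty for one forecast under log loss (an assumed scoring rule).

    forecast: the probability you entered that the model answers correctly.
    correct:  whether the model actually answered correctly.
    """
    # Clamp away from exactly 0 and 1 so the logarithm stays finite.
    eps = 1e-9
    p = min(max(forecast, eps), 1 - eps)
    # Take the probability you assigned to what actually happened.
    return -math.log(p if correct else 1 - p)

# Compare a cautious, a confident, and a very confident forecast.
for forecast in (0.5, 0.9, 0.99):
    if_right = log_loss(forecast, True)
    if_wrong = log_loss(forecast, False)
    print(f"forecast={forecast}: penalty if right={if_right:.3f}, if wrong={if_wrong:.3f}")
```

Under this assumed rule, moving from 0.9 to 0.99 saves you very little when the model gets the question right, but costs you a lot when it gets the question wrong. That asymmetry is the reason to hold back from entering 0 or 1 unless you are really sure.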