Computing

Large Language Models Are Improving Exponentially

A new METR study shows that LLM capabilities are doubling roughly every seven months, suggesting that by 2030 models may reliably complete month-long human tasks, though real-world factors may slow this growth

Techscribe
IEEE Techverse
November 30, 2025
4 min

Benchmarking large language models (LLMs) is difficult because their main goal, producing human-like text, doesn't align with traditional performance metrics. Still, measuring progress is essential for understanding how rapidly LLMs are improving. A recent study from Model Evaluation & Threat Research (METR) proposed a new metric called the "task-completion time horizon," which measures the length of tasks, in terms of how long they take expert human programmers, that an LLM can complete with a given reliability.
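
To make the idea concrete, here is a minimal sketch of how such a time horizon could be estimated. The logistic fit and the toy task data are illustrative assumptions for this article, not METR's actual code or dataset; the point is simply that the horizon is read off as the human task length at which the model's success rate crosses the chosen reliability level (here, 50 percent).

```python
# Toy sketch: estimate a 50%-reliability "task-completion time horizon".
# Fit success probability against the log of how long each task takes a
# human expert, then find the task length where predicted success = 0.5.
# The data and the logistic-fit approach are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# (human minutes to complete the task, did the LLM succeed? 1 = yes)
tasks = [(2, 1), (5, 1), (10, 1), (30, 1), (60, 0), (120, 0), (480, 0)]

X = np.log([[minutes] for minutes, _ in tasks])   # log human task length
y = np.array([success for _, success in tasks])

clf = LogisticRegression().fit(X, y)

# P(success) = 0.5 where the fitted logit crosses zero.
horizon_minutes = np.exp(-clf.intercept_[0] / clf.coef_[0][0])
print(f"Estimated 50%-reliability time horizon: ~{horizon_minutes:.0f} human-minutes")
```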

Using this metric, METR found that leading LLMs have shown exponential improvement, with capabilities doubling every seven months. Their analysis suggests that by 2030, top LLMs could complete, with 50 percent reliability, complex software or creative tasks such as launching a company, writing a novel, or enhancing an existing model. Many such tasks could be finished far faster by AI than by humans.
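
The extrapolation behind that 2030 estimate is straightforward compound doubling. The sketch below assumes, purely for illustration, a roughly one-hour 50%-reliability horizon in early 2025 and a 167-hour working month; only the seven-month doubling time comes from the study.

```python
# Illustrative extrapolation of a capability horizon that doubles every
# seven months. The starting date and one-hour starting horizon are
# assumptions for this example, not figures reported by METR.
from datetime import date

DOUBLING_MONTHS = 7            # doubling time reported in the study
START = date(2025, 3, 1)       # assumed reference point
START_HORIZON_HOURS = 1.0      # assumed 50%-reliability horizon then

def projected_horizon(target: date) -> float:
    """Projected 50%-reliability horizon, in hours of human work."""
    months = (target.year - START.year) * 12 + (target.month - START.month)
    return START_HORIZON_HOURS * 2 ** (months / DOUBLING_MONTHS)

hours = projected_horizon(date(2030, 3, 1))
print(f"Projected horizon in March 2030: ~{hours:.0f} hours "
      f"(~{hours / 167:.1f} working months)")  # 167 h ≈ one working month
```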

The researchers also introduced a “messiness” score, noting that LLMs struggle more with real-world, unstructured tasks. While the rapid pace raises concerns reminiscent of self-improving AI scenarios, METR emphasizes that practical constraints, such as hardware limitations and robotics challenges, may prevent runaway growth. Nonetheless, the potential benefits and risks of such powerful systems make accurate benchmarking crucial.

Read more: https://spectrum.ieee.org/large-language-model-performance
