A new AI coding challenge has revealed its first winner, and set a new bar for AI software engineers.
On Wednesday at 5 p.m. PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win itself was his final score: he answered only 7.5% of the test questions correctly.
“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding, “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. The K Prize runs offline with limited compute, so it favors smaller and open models.”
Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.
Similar to the well-known SWE-Bench system, the K Prize tests models against issues pulled from GitHub as a measure of how well they can handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For the first round, model submissions were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.
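The article doesn’t publish the K Prize’s actual collection harness, but the timed-entry idea can be sketched roughly: gather only GitHub issues opened after the submission cutoff, so nothing in the test set could have existed when models were frozen. In the hypothetical Python snippet below, the repository name, token handling, and exact cutoff year are illustrative assumptions, not details from the competition.

```python
# Minimal sketch of the timed-entry idea described above; this is NOT the
# K Prize's actual harness. It pulls only GitHub issues created after a
# cutoff date, so a model submitted before that date could not have
# trained on any of them. Repo, token, and cutoff year are illustrative.
import requests  # assumes the third-party `requests` package is installed

CUTOFF_DATE = "2025-03-12"  # round-one submission deadline; year assumed


def issues_after_cutoff(repo: str, token: str) -> list[dict]:
    """Return issues in `repo` (e.g. 'psf/requests') opened after CUTOFF_DATE."""
    query = f"repo:{repo} type:issue created:>{CUTOFF_DATE}"
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query, "per_page": 100},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]  # each item carries title, body, labels, URL, etc.
```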
The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the gap is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer that question soon.
“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing every few months.”
It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available, but with existing benchmarks increasingly in question, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.
“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell whether the problem is contamination, or just targeting the SWE-Bench leaderboard with a human in the loop.”
For Konwinski, it’s not just a better benchmark but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”