ASTRA: HackerRank's coding benchmark for LLMs


Feb 11, 2025 - 23:19

We help companies hire & upskill developers. A customer recently asked: What % of HackerRank problems can LLMs solve? That got us thinking—how should hiring evolve when AI can translate natural language to code?

Our belief: AI will handle much of code generation, so developers will be assessed more on how they apply software development lifecycle (SDLC) skills while working with AI assistants.

To explore this, we’re benchmarking LLMs on real-world software dev scenarios, starting with 65 unseen problems across 10 domains. Beyond correctness, we evaluated consistency, an often overlooked aspect of AI reliability. We’re open-sourcing the dataset on Hugging Face and expanding it to cover more domains, ambiguous specs, and harder challenges.
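To make the correctness/consistency distinction concrete, here is a minimal sketch of how the two could be summarized from repeated runs. The post does not define either metric, so everything here is an assumption for illustration: the function name `summarize_runs`, scores in the range 0 to 1 (fraction of test cases passed), and the use of the standard deviation across runs as the consistency measure (lower spread means more consistent).

```python
from statistics import mean, median, pstdev


def summarize_runs(scores_by_problem):
    """Summarize repeated-run scores for one model.

    scores_by_problem maps a problem id to a list of per-run scores in [0, 1].
    This input shape is hypothetical, not taken from the ASTRA dataset.
    """
    # Correctness: average score per problem, then averaged over problems.
    per_problem_mean = [mean(runs) for runs in scores_by_problem.values()]
    # Consistency (assumed definition): spread of scores across runs of the
    # same problem, summarized by the median over problems.
    per_problem_spread = [pstdev(runs) for runs in scores_by_problem.values()]
    return {
        "average_score": mean(per_problem_mean),
        "median_spread": median(per_problem_spread),
    }


# Made-up scores: p1 is solved every time, p2 only on alternate runs.
summary = summarize_runs({
    "p1": [1.0, 1.0, 1.0],
    "p2": [0.0, 1.0, 0.0, 1.0],
})
print(summary)
```

Two models with the same average score can differ sharply on the spread value, which is why consistency is worth reporting separately from correctness.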

Would love the HN community’s take on this!


Comments URL: https://news.ycombinator.com/item?id=43015631
