Anthropic's Claude Sonnet 4.5 now scores 77% on a key software engineering benchmark and can work autonomously for over 30 hours on complex tasks.