June 2025 — San Francisco. In a breakthrough for AI-assisted security research, researcher Sean Heelan has revealed that OpenAI’s o3 large language model independently discovered a zero-day vulnerability (CVE-2025-37899) in the Linux kernel. The flaw affects ksmbd, the in-kernel SMB3 server responsible for network file sharing on Linux systems.

What makes this discovery extraordinary is that the model identified the vulnerability without any auxiliary tools, symbolic execution, or static analysis frameworks—just by reading and reasoning over raw code via API calls to o3.


A Zero-Day Found With a Prompt

The vulnerability, CVE-2025-37899, is a use-after-free triggered by a race condition in the smb2_session_logoff handler. It arises when two threads share access to a session: one frees a pointer (sess->user) during a logoff, while the other continues to access it—resulting in potential kernel memory corruption or arbitrary code execution.

OpenAI’s o3 model successfully identified this flaw in code spanning over 12,000 lines. It even produced a structured vulnerability report, outlining the issue, affected paths, and exploitation conditions—something typically expected from experienced security researchers.

“LLMs have made a real leap in code reasoning. They won’t replace experts, but they can make us dramatically more efficient,” Heelan wrote in his blog post.


Not Just Luck: Outperforms Other Models

To benchmark o3, Heelan tested it against a previously known vulnerability (CVE-2025-37778), also a use-after-free in the Kerberos session setup path. Out of 100 test runs:

  • o3 detected the bug 8 times
  • Claude Sonnet 3.7 found it 3 times
  • Claude Sonnet 3.5 failed to detect it at all

This roughly 2–3× performance advantage suggests o3 is the most capable LLM yet tested for real-world vulnerability research.


Key Advantages of o3 in Vulnerability Research

True code reasoning — o3 correctly identifies concurrency bugs requiring understanding of thread interleaving and shared state.

Human-like reporting — its outputs resemble a concise, well-written vulnerability disclosure.

No need for toolchains — it operates solely on textual prompts and raw code.

Accelerates review — helps validate existing bugs, evaluate patch completeness, and spot overlooked edge cases.


Key Limitations

⚠️ High false positive rate — In some tests, the signal-to-noise ratio was 1:50.

⚠️ False negatives still common — Complex or large codebases reduce detection rates.

⚠️ Prompt engineering still critical — Crafting the right code context is essential for success.

⚠️ Does not understand runtime behavior — Unlike dynamic tools (e.g., fuzzers), o3 lacks real-time insight into system execution.


Why This Zero-Day Matters

The most striking detail is a subtle insight: while hunting for the logoff vulnerability, o3 also flagged a flaw in the fix previously suggested for CVE-2025-37778. The original patch nulled the pointer after freeing it, on the assumption that this would prevent misuse. o3 correctly reasoned that in multi-threaded scenarios another thread could load the pointer after the free but before the NULL store, so the fix was insufficient.

In some of its outputs, o3 pointed this out—demonstrating not just pattern matching but genuine multi-threaded reasoning.


Implications for the Security Community

This discovery marks a paradigm shift. For what appears to be the first time, a general-purpose language model has:

  • Identified a real, critical kernel vulnerability before it was publicly known
  • Outperformed competing models in security-focused benchmarks
  • Demonstrated real value in code auditing and patch validation workflows

The implications are clear: LLMs are now viable assistants for vulnerability research. Not replacements—but accelerators. They should be integrated into toolchains, IDEs, and CI pipelines to amplify analyst efficiency and provide a second set of eyes in security-critical codebases.


Conclusion

With o3, OpenAI has shown that large language models can move beyond theoretical promise to practical impact in offensive and defensive security. While they remain imperfect, their capability is now well above the noise floor and worthy of integration into serious security workflows.

CVE-2025-37899 has since been patched. Administrators running Linux distributions with ksmbd support are urged to apply the latest kernel updates immediately.

Source: sean.heelan.io
