GPT-5's true identity revealed, its first programming test dazzles the internet, a game is created in seconds with just one sentence, and OpenAI's two leaders prepare for AGI

GPT-5 is getting closer. Today a mysterious model, Horizon Alpha, went viral: its first coding tests showed remarkable performance, and a range of third-party benchmarks followed. Shortly before the expected release, two core members of OpenAI admitted in an interview that current models still have bottlenecks, while remaining confident that the scaling law has no end in sight.

Signs that GPT-5 is about to be released keep getting stronger.

This morning, a mysterious model called Horizon Alpha suddenly appeared on OpenRouter, and charts and test results quickly spread across the internet.

The Horizon Alpha model has a 256K context, is extremely responsive, and is very good at creative writing.

It also has "reasoning" capabilities, with a reasoning token budget twice that of o4-mini.

When it comes to programming, Horizon Alpha is unbeatable.

From a one-sentence prompt it can generate games such as "Fruit Ninja" and "Alien Catches Cow", it can produce advertisements directly from a logo image, and it easily passes the "hexagonal physics simulation" test.

On the EQ-Bench creative writing benchmark, Horizon Alpha ranked first, far ahead of o3 and Gemini 2.5 Pro.

What’s even more amazing is that it can complete 20-digit multiplication operations within 30 seconds.
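A claim like this is easy to check independently, because Python integers have arbitrary precision. The sketch below is only an illustration and not part of the original tests: `random_n_digit` and `check_product` are hypothetical helper names, and the model's answer is assumed to arrive as a string or an integer.

```python
import random

def random_n_digit(n, rng):
    """Return a random integer with exactly n decimal digits."""
    return rng.randrange(10 ** (n - 1), 10 ** n)

def check_product(a, b, claimed):
    """Compare a model's claimed product with the exact product.

    `claimed` may be an int or a string such as "1,234,567"; commas and
    surrounding whitespace are stripped before comparison.
    """
    claimed_int = int(str(claimed).replace(",", "").strip())
    return a * b == claimed_int

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the operands are reproducible
    a = random_n_digit(20, rng)
    b = random_n_digit(20, rng)
    print(f"Ask the model for {a} * {b}")
    # Stand-in for the model's reply: here we simply use the exact answer.
    print(check_product(a, b, str(a * b)))
```

In practice the string passed to `check_product` would be whatever the model returned for the prompt.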

Various code-named models that were previously leaked, such as lobster, zenith, summit, etc., amazed everyone in multiple tests.

All signs point to the full GPT-5 lineup being the most powerful set of models yet.

Google has indexed an OpenAI documentation page for GPT-5; the page currently returns a 404.

Most of what is known about Horizon Alpha comes from hands-on tests by users.

The mysterious Horizon Alpha debuts, boasting incredible programming

Horizon Alpha can currently be tried on the OpenRouter platform.

Link: https://openrouter.ai/chat?room=orc-1754007231-sX8GtgCUyNkHh6O6In2l

At inference time, Horizon Alpha's throughput reaches 120 tokens/s, compared with 60-80 tokens/s for Claude Sonnet 4.

In the throughput comparison test, Horizon Alpha is currently the fastest.
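To put those throughput figures in perspective, the arithmetic can be sketched as follows. The rates are the ones quoted above; the 2,000-token response length and the function name `seconds_to_generate` are assumptions chosen purely for illustration.

```python
def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to stream num_tokens at a constant decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

if __name__ == "__main__":
    response_tokens = 2000  # hypothetical response length
    rates = [
        ("Horizon Alpha", 120),
        ("Claude Sonnet 4 (low end)", 60),
        ("Claude Sonnet 4 (high end)", 80),
    ]
    for name, rate in rates:
        seconds = seconds_to_generate(response_tokens, rate)
        print(f"{name}: {seconds:.1f} s")
```

This ignores time to first token and any variation in rate during generation, so it is a rough comparison only.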

Stunning physics simulation, build web pages in seconds

Some users asked it to create a fully functional Windows 95 retro desktop. The result was surprisingly good, and it was generated extremely quickly.

Another test simulates physics by bouncing a ball inside a polygon.

Whether the polygon is a hexagon or a triangle, and even when the space available to the ball shrinks, the simulation still behaves correctly.

A more challenging version involves 20 balls bouncing inside a rotating heptagon. One impressed user said, "This is one of the best versions I've ever seen."
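For readers unfamiliar with this kind of test, here is a minimal sketch of the core logic such a simulation needs: one ball, gravity, and reflection off the walls of a static regular polygon. It is not the code Horizon Alpha produced; the function names, the time step, and the restitution value are all assumptions, and the rotating, multi-ball variants described above would need additional handling.

```python
import math

def regular_polygon(n, radius):
    """Vertices of a regular n-gon centered at the origin, counter-clockwise."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def signed_distance(point, a, b):
    """Distance from point to the line through edge a->b.

    Positive values lie on the interior side of a counter-clockwise
    polygon. Also returns the inward unit normal (nx, ny).
    """
    ex, ey = b[0] - a[0], b[1] - a[1]
    length = math.hypot(ex, ey)
    nx, ny = -ey / length, ex / length
    dist = (point[0] - a[0]) * nx + (point[1] - a[1]) * ny
    return dist, nx, ny

def step(pos, vel, verts, ball_r, dt, gravity=-9.8, restitution=0.9):
    """Advance the ball by one time step and resolve wall collisions."""
    vx, vy = vel[0], vel[1] + gravity * dt
    x, y = pos[0] + vx * dt, pos[1] + vy * dt
    for i in range(len(verts)):
        a, b = verts[i], verts[(i + 1) % len(verts)]
        dist, nx, ny = signed_distance((x, y), a, b)
        if dist < ball_r:
            # Push the ball back so it just touches the wall.
            x += (ball_r - dist) * nx
            y += (ball_r - dist) * ny
            # Reflect the velocity component that points into the wall.
            vn = vx * nx + vy * ny
            if vn < 0:
                vx -= (1 + restitution) * vn * nx
                vy -= (1 + restitution) * vn * ny
    return (x, y), (vx, vy)

if __name__ == "__main__":
    hexagon = regular_polygon(6, 1.0)
    pos, vel = (0.0, 0.0), (1.0, 0.0)
    for _ in range(1000):
        pos, vel = step(pos, vel, hexagon, ball_r=0.05, dt=0.005)
    print(pos, vel)
```

The usual failure in this test is a ball that tunnels through a wall or gains energy on each bounce; the push-back and the restitution factor below 1 are the two pieces that guard against that in this sketch.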

Horizon Alpha can create a web page displaying a series of simple and fun browser games in 3 minutes and 48 seconds.

The same prompt was given to Horizon Alpha: “Create a visually interesting shader that can be run in a twigl application to make it look like a stormy ocean.”

Wharton professor Ethan Mollick said the result was the best he had seen so far and that it was produced very quickly.

When users asked it to "create a business website related to dog walking", Horizon Alpha first asked a number of clarifying questions, whereas Sonnet 4 went straight to proposing a solution.

Left: Horizon Alpha; Right: Claude Sonnet 4

Ultimately, judging by the build results, the Horizon Alpha output is high-quality and concise, while the Sonnet 4 output is longer, more comprehensive, and more creative.

Top: Horizon Alpha; Bottom: Claude Sonnet 4

Horizon Alpha can also build a banking website on its own.

Excellent design and aesthetic sense

AI commentator Matthew Berman personally tested its SVG creation and UI design abilities, and Horizon Alpha quickly generated a professional-looking, aesthetically pleasing image.

Simon Willison, a well-known figure in the AI community, has said that the evolution of AI models can be traced through their drawings of a "pelican riding a bicycle".

Horizon Alpha's output on the same SVG test is now the strongest of any model.

Some other great SVG examples.

It should be noted that despite its extraordinary showing in these tests, speculation suggests Horizon Alpha may be only a small model.

Whichever GPT-5 variant it turns out to be, the next step is to wait for OpenAI to release it.

Interview with OpenAI's "two heroes", praised by Altman

Shortly before the expected release of GPT-5, OpenAI's two research leaders, Chief Scientist Jakub Pachocki and Chief Research Officer Mark Chen, gave a major joint interview.

This golden pair are the "two giants" behind the development of GPT-5.

The exclusive interview, conducted by MIT Technology Review, unexpectedly won over Sam Altman.

He praised it highly, saying, "I usually think articles like this miss the point, but this one really captures the essence of their collaboration."

So what did the interview say to earn such praise from Altman?

OpenAI's best partners

Anyone familiar with OpenAI’s internal personnel changes knows that Jakub Pachocki and Mark Chen are both rising stars.

Their styles are very different, but they complement each other perfectly.

Mark Chen, a former quantitative trader on Wall Street, is well-dressed and speaks eloquently, with a background that would seem to have little to do with AI.

After joining OpenAI, he quickly became a key driving force behind DALL·E, GPT-4's multimodal capabilities, and Codex, and he is good at turning complex research into products that everyone can use.

Jakub Pachocki, a low-key theoretical computer scientist, succeeded Ilya Sutskever as chief scientist after Sutskever's departure and is obsessed with pushing the limits of AI reasoning and creativity.

Regarding the internal division of roles, Pachocki said, "Chen is responsible for building and managing the research team, while I am responsible for setting the research roadmap and establishing our long-term technical vision."

Their way of working together can be described as "seamless switching".

No matter how complex the technical problem, Pachocki and Chen work in close step to solve it quickly.

A yardstick for AGI: autonomous time

The outside world currently expects GPT-5 to be a stronger, faster, and more versatile behemoth.

In the interview, Mark Chen did not address GPT-5 directly, but he admitted that "we are always trying to understand the technical bottlenecks of deep learning. Even the most powerful reasoning model currently cannot effectively connect knowledge."

Pachocki added, “We are still at the very beginning of the reasoning paradigm.”

What is crucial is how to enable a model to conduct long-term learning and exploration and come up with novel ideas.

At the same time, in their view, the Scaling Law is far from reaching its ceiling, and by investing more computing resources and data, the model will become better and better.

When asked how to view AGI, Mark Chen proposed an indicator - the model's ability to work autonomously for a longer period of time, namely "autonomous time".

This concept is simple yet profound. It represents the length of time that AI can continue to make progress in solving complex problems without human intervention.

This vision far exceeds the capabilities of current models, whose autonomy is limited to a few minutes to an hour and which often get stuck when encountering unfamiliar scenarios.

Mathematics + Programming, the Holy Grail of AI?

Recently, OpenAI models achieved strong results in two top competitions:

First, a second-place finish in the AtCoder World Tour Finals. Second, a gold-medal-level score on the IMO 2025 problems.

In the AtCoder competition, first place went to the human contestant Psyho, whose victory over the model showcased distinctly human creative thinking, a moment that recalls the AlphaGo match against Lee Sedol in Go.

“We’re talking about programming and math here, but it’s really about creativity, coming up with novel ideas, connecting ideas from different fields,” Pachocki said.

In their view, mathematics and programming are the cornerstones of "general intelligence."

References:

https://x.com/karminski3/status/1950987896565182587

https://x.com/chetaslua/status/1950784759799718161

https://www.technologyreview.com/2025/07/31/1120885/the-two-people-shaping-the-future-of-openais-research/

This article comes from the WeChat public account "Xinzhiyuan", author: Xinzhiyuan, and is published by 36Kr with authorization.
