Codex, Opus, Gemini try to build Counter Strike

In the last week we’ve had three major model updates: Gemini 3 Pro, Codex Max 5.1, and Claude Opus 4.5. We thought we’d give them a challenge:

Build a basic version of Counter Strike. The game had to have a 3D UI and it had to be multiplayer.

If you’re curious, pull up a (ideally large) computer screen and you can try each model’s handiwork yourself:

  1. Codex Max 5.1: https://cscodex.vercel.app/
  2. Claude Opus 4.5: https://csclaude.vercel.app/
  3. Gemini 3 Pro: https://csgemini.vercel.app/

We have a full video of us walking through the build here, but for those who prefer text, this post is for you.

We’ll go over some of our high-level impressions of each model, then dive deeper into how each one handled specific prompts.

The setup

We signed up for the highest-tier plan at each model provider and used the defaults set for their CLI. For Codex, that’s 5.1 codex-max on the medium setting. For Claude, it’s Opus 4.5. And for Gemini, it’s 3 Pro.

Then we gave each model about 7 consecutive prompts. The prompts were divided into two categories:

Frontend: At first, agents only had to worry about game mechanics: design the scene, the enemies, the shooting logic, and some sound effects.

Backend: Once that was done, agents had to make the game multiplayer. They needed to build a selection of rooms that users could join and start shooting in.

A high-level overview

So, how did each model perform?

In a familiar tune with other Anthropic models, Opus 4.5 won on the frontend. It made nicer maps, nicer characters, nicer guns, and generally had the right scene from the start.

Once the design was done, Gemini 3 Pro started to win on the backend. It hit fewer errors when adding multiplayer and persistence. In general, Gemini did best at making logical rather than visual changes.

Codex Max felt like an “in between” model on both the frontend and backend. It got a lot of “second place” marks in our book. It did reasonably well on the frontend and reasonably well on the backend, but felt less spiky than the other models.

Here is the scorecard in detail:

[Scorecard table. Columns: Codex, Claude, Gemini. Rows: Frontend (boxes + physics, characters + guns, POV gun, sounds), Backend (moving, shooting, saving rooms), Bonus.]

Okay, now let’s dive deeper into each prompt.

Goal number 1 was to establish the physics for the game. The models were required to design a map with a first-person perspective and the ability to shoot enemies.

Prompt

I want you to create a browser-based version of Counter Strike using three.js.

For now, just keep it local: don’t worry about backends, Instant, or anything like that.

For the first version, create a first-person view of the main character with crosshairs. Place enemies in random locations. Enemies have HP. You can shoot them, and kill them. When an enemy is killed, they respawn.

Make everything simple polygons – rectangles.

Here’s a side-by-side comparison of the visuals offered by each model:

Visually, Claude came up with the most interesting map. There were obstacles, a nice floor, and you could see everything well.

Gemini also did a good job.

Codex hit an error on its first run [1] (it called a function without importing it), but it fixed that right away. Once the bug was fixed, its map was the least pleasing to look at. Things were dark, there were no obstacles, and it was hard to make out the floor.

Now that we had a map and some polygons, we asked the models to style the characters. This was our prompt:

I want you to make the enemies look like humans. Use a bunch of square polygons to represent a person, and maybe even a small gun

Here’s the result of their work:

Again, it feels like Claude did the best job here. Its characters look quite human, almost at the level of design in Minecraft. Gemini did well too. Codex improved its characters, but everything was one color, which really let it down compared to the others.

We then asked each model to add a gun to our first-person view. When we shot, we wanted a recoil animation.

I want you to make it so I also have a gun in my field of view. The gun should shake a little when I fire.

Here’s a side-by-side comparison of how the recoil felt for each model:

Here both Claude and Codex got the gun working in one shot, though Claude’s gun looks more like a real pistol.

Gemini had trouble trying to stick the gun to the camera. This took us a lot of back and forth until we realized that the gun was transparent.
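To make the recoil request concrete, here is a minimal sketch of one way a “kick and recover” animation could be modeled. It is not taken from any of the three models’ output: the names (`RecoilState`, `fire`, `update`) and all the constants are assumptions chosen for illustration.

```typescript
// Hypothetical recoil model: none of these names or numbers come from the
// models' generated code.
interface RecoilState {
  offset: number; // how far the gun is currently kicked back, in scene units
}

const KICK = 0.12; // assumed kick distance added per shot
const MAX_OFFSET = 0.3; // assumed cap so rapid fire cannot push the gun too far
const RECOVERY = 10; // assumed recovery rate, per second

// Called once per shot: push the gun back, clamped to the cap.
function fire(state: RecoilState): RecoilState {
  return { offset: Math.min(state.offset + KICK, MAX_OFFSET) };
}

// Called once per frame: ease the gun back toward its rest position.
// Exponential decay keeps the motion independent of the frame rate.
function update(state: RecoilState, dtSeconds: number): RecoilState {
  return { offset: state.offset * Math.exp(-RECOVERY * dtSeconds) };
}
```

In a three.js scene, the offset would typically be applied each frame to the local position of a gun mesh that is attached to the camera.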

We were almost done with the frontend: the last step was sound. Here’s what we asked:

I want you to use chiptunes to animate the sound of shots. I want to animate deaths too.

All models added sounds quite easily. The last part of our prompt, “I want to animate deaths too,” was added on the fly in the video. Our intention was to add sound to the deaths, but that’s not what happened.

All three models misunderstood the sentence in the same way: they thought we wanted to animate how the characters died. Fair enough: re-reading the sentence, we would have understood it the same way too.

Here are the results they came up with:

All the models got sound working easily. They all added death animations too, but we thought Claude’s looked the most fun.

Now that all the models had a real frontend, we asked them to make it multiplayer.

We didn’t want the models to worry about shots right away: goal 1 was to share movement positions. Here’s what we asked them to do:

I want you to use Instant presence.

Don’t save anything in the database, just use presence and topics. You can look up the docs.

There should be just one single room.

You no longer need randomly placed enemies. All players get a place.

For now, don’t worry about the shots. Let’s make it so that the players’ positions are set by presence.

Gemini got this right in one shot. Both Codex and Claude needed some more prodding.

[Scorecard row: Moving. Columns: Codex, Claude, Gemini.]

It was interesting to see how each model attempted to solve the problems:

Codex used a lot of introspection. It would constantly look through the TypeScript library to see which functions were available. It didn’t seem to look at the docs as much.

Claude leaned heavily on the docs. It read and re-read our documentation on presence, but rarely introspected the library like Codex did.

Gemini seemed to do both. It looked at the docs, but, we think because it ran the build step continuously, it also caught any TypeScript errors and fixed them.

Gemini made the fastest progress here, although once we pasted the errors back, they all succeeded.

Then we moved on to making shots work. Here was the prompt:

Now let’s make shots work. When I shoot, send the shot as a topic, and make it affect the target’s HP. When the target’s HP reaches zero, they should die and respawn.

[Scorecard row: Shooting. Columns: Codex, Claude, Gemini.]

Claude got this right in one shot. Gemini and Codex had a few issues to fix, but pasting the errors back resolved them.
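The shot flow in the prompt can be sketched roughly as follows. This is an illustration under our own assumptions, not Instant’s API and not what any model generated: the `ShotEvent` and `Player` shapes, the `applyShot` helper, and the `MAX_HP` value are all hypothetical.

```typescript
// Hypothetical shapes for illustration only.
interface ShotEvent {
  shooterId: string;
  targetId: string;
  damage: number;
}

interface Player {
  id: string;
  hp: number;
  deaths: number;
}

const MAX_HP = 100; // assumed starting HP

// Each client would run this when a shot event arrives on the topic.
// Only the targeted player applies the damage, so every client stays
// the owner of its own HP.
function applyShot(me: Player, shot: ShotEvent): Player {
  if (shot.targetId !== me.id) return me; // not aimed at us
  const hp = me.hp - shot.damage;
  if (hp > 0) return { ...me, hp };
  // HP reached zero: count the death and respawn at full health.
  return { ...me, hp: MAX_HP, deaths: me.deaths + 1 };
}
```

For example, a player with 100 HP who receives three 40-damage shots aimed at them would drop to 60, then 20, then respawn at 100 with one death recorded.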

Now that all the models had a single room working, it was time to support multiple rooms.

The reason we added this challenge was to see (a) how they would deal with a new API (persistence), and (b) how they would deal with the refactoring required for multiple rooms.

So, now I want you to make it so that the main page is actually a list of maps. Since our UI uses a lot of polygons, make the style polygonal too.

Make the UI resemble the old Counter Strike map selection screen. I want you to save these maps in the database. Every map has a name. Use a script to create 5 random maps with nice names.

Then, add some permissions so anyone can view the maps, but they can’t create or edit them.

When you join a map, you can use the map ID as the room ID for presence.

Map UI

All models did well with the UI. Here’s how each one looked:

We liked Gemini’s UI the most, but they were all great.

Persistence

And persistence worked well too. They all dutifully created the schema for the maps, pushed a migration, and seeded 5 maps.

Refactor

But things got complicated in the refactor.

[Scorecard row: Saving rooms. Columns: GPT 5.1 Codex Max (Medium), Claude Opus 4.5, Gemini 3 Pro.]

Gemini got the job done in one shot. It also chose to keep the map ID in the URL, which made it much handier to use. Codex needed one back-and-forth over a query error.

But Claude really got stuck. The culprit was hooks. Since useEffect can run multiple times, it ran into some very subtle bugs. For example, it created 2 canvas objects instead of 1. There were also multiple animation refs running simultaneously.

It was hard for Claude to fix things on its own. To unblock it, we had to put on our engineer hats and actually look at the code.
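To illustrate the class of bug, here is a small sketch that runs without React. One common reason an effect runs twice is React’s Strict Mode, which in development runs each effect as setup, cleanup, setup; whether that was the exact trigger in Claude’s code is an assumption on our part. `runLikeStrictMode`, `Container`, `buggyEffect`, and `fixedEffect` are hypothetical names, not React APIs.

```typescript
// A stand-in for React's effect lifecycle so the sketch is self-contained.
type Cleanup = () => void;
type Effect = () => Cleanup | undefined;

// Mimics a development double-invoke: setup -> cleanup -> setup.
function runLikeStrictMode(effect: Effect): void {
  const cleanup = effect();
  if (cleanup) cleanup();
  effect();
}

// Hypothetical game container that counts what is currently alive.
class Container {
  canvases = 0;
  loops = 0;
}

// Buggy: no cleanup, so the second run stacks a second canvas and loop.
function buggyEffect(c: Container): Effect {
  return () => {
    c.canvases += 1;
    c.loops += 1;
    return undefined;
  };
}

// Fixed: the cleanup removes the canvas and stops the animation loop,
// so a re-run starts from a clean slate.
function fixedEffect(c: Container): Effect {
  return () => {
    c.canvases += 1;
    c.loops += 1;
    return () => {
      c.canvases -= 1;
      c.loops -= 1;
    };
  };
}
```

Running `buggyEffect` through `runLikeStrictMode` leaves two canvases and two loops alive, while `fixedEffect` leaves one of each. In a real component, the cleanup would be where the canvas element is removed and the pending animation frame is cancelled.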

However this gave us some ideas:

  1. Claude’s issues were a lot like a human’s. How many of us get frustrated with useEffect running twice, or with dependency arrays being wrong? We think improving React’s DX on these two issues could really help both humans and agents.
  2. And what if a non-programmer had been building it? They would have been really stuck. We believe more tools are needed to move from “strictly vibe coding” to “real programming”. Right now the jump feels very steep.

Finally, all the models actually created a multiplayer FPS with zero hand-written code! This is great.

Parting thoughts

Well, the models have certainly improved. They can take much higher-level feedback and much higher-level documentation. What really impressed us, though, was how much they could iterate on their own work thanks to the CLI.

However, there is still much left to do. The promise that you’ll never have to look at code doesn’t seem realistic yet.
