# Gemini 3.5 Flash and Antigravity 2: Speed, Cost, and What the Benchmarks Actually Show


Google recently released Gemini 3.5 Flash and the Antigravity 2 developer suite. The marketing positions Flash as "frontier-level performance at 4x the speed" and "often at less than half the cost." **The benchmark data presents a more specific picture: exceptional speed and strong agentic capability, but significantly weaker coding scores than Google's own comparisons suggest**, and real-world costs that contradict the per-token pricing narrative.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/JegffwBtXJ0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>


## Gemini 3.5 Flash: specifications

The model has a 1 million token context window, supporting roughly 1,500 pages of text or a large codebase as full context. Maximum output is 64,000 tokens. It accepts text, images, video, audio, and PDF inputs natively.

## Benchmark results

### Google's benchmarks

![Benchmark table comparing Gemini 3.5 Flash with other leading AI models across coding, agentic workflows, and UI control](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/5ef161f8-a25c-4ecf-9f68-8b6a28f74800/md1x =1280x720)

Google's internal benchmarks show Flash within a few percentage points of GPT-5.5 on `Terminal-bench 2.1` and `SWE-bench Pro`, and ahead of Claude Opus 4.7 on `Terminal-bench`. On agentic benchmarks (`MCP Atlas`, `Toolathlon`), Google's data shows it leading all compared models.

### Artificial Analysis independent benchmarks

![Artificial Analysis Coding Index bar chart ranking AI models by coding benchmark performance](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/6412f113-4aad-41e0-5f3e-ee71dd258600/lg2x =1280x720)

The Artificial Analysis Coding Index scores Gemini 3.5 Flash at 45.0, below Gemini 3.1 Pro (56.5) and well below top-tier coding models. This contradicts Google's "in line with GPT-5.5" framing. The Artificial Analysis Agentic Index tells a different story: Flash shows significant improvement over its predecessors and ranks competitively with frontier models. The agentic strength is real; the coding parity claim is not supported by independent data.

### Output speed

![Bar chart from Artificial Analysis comparing output speed in tokens per second with Gemini 3.5 Flash at the top](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/27941745-d078-4531-4855-0f690d47c300/lg1x =1280x720)

Gemini 3.5 Flash produces 277 output tokens per second according to Artificial Analysis measurements, the highest of any model currently available. This is not a marginal advantage. For real-time conversational applications or any use case sensitive to perceived latency, this speed is the model's strongest differentiator.
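To put the throughput figure in concrete terms, here is a minimal sketch that converts tokens per second into streaming time. It assumes a steady output rate and ignores time to first token and network overhead; the function name and the 500-token reply size are illustrative assumptions, while 277 tokens per second and the 64,000-token output cap come from the figures above.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 277.0) -> float:
    """Time to stream a response at a steady output rate.

    Simplification: ignores time to first token and any network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # 500 tokens is a hypothetical chat reply; 64,000 is the stated output cap.
    for tokens in (500, 64_000):
        print(f"{tokens:>6} tokens: {generation_seconds(tokens):.1f} s")
```

Under these assumptions a short reply streams in under two seconds, and even a maximum-length response finishes in just under four minutes.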

## Real-world cost analysis

Per-token pricing for Gemini 3.5 Flash is $1.50 per million input tokens and $9.00 per million output tokens. This is cheaper than Claude Opus 4.7 ($5/$25) and GPT-5.5 ($5/$30) on a per-token basis.

Per-token pricing does not reflect total cost for agentic workloads. The relevant metric is cost to complete a task, which depends on how many tokens the model uses to do so.
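The distinction can be sketched in a few lines. The per-million prices below are the ones quoted above; the token counts in the usage example are hypothetical, chosen only to show how a model that is cheaper per token can still cost more per task if it consumes more tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens."""
    input_per_million: float
    output_per_million: float


# Per-token prices as quoted in this article.
PRICES = {
    "Gemini 3.5 Flash": Price(1.50, 9.00),
    "Claude Opus 4.7": Price(5.00, 25.00),
    "GPT-5.5": Price(5.00, 30.00),
}


def task_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one task given the tokens it actually consumed."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


if __name__ == "__main__":
    # Hypothetical usage: Flash consumes 4x the tokens of GPT-5.5 on the same task.
    flash = task_cost(PRICES["Gemini 3.5 Flash"], 4_000_000, 200_000)
    gpt = task_cost(PRICES["GPT-5.5"], 1_000_000, 50_000)
    print(f"Flash: ${flash:.2f}  GPT-5.5: ${gpt:.2f}")
```

With these made-up token counts, the Flash run comes to $7.80 against $6.50 for GPT-5.5, despite the lower per-token price.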

![Chart showing total cost in USD to run the full Artificial Analysis intelligence benchmark suite for each AI model](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/c68a7ae7-1afd-443e-864f-f0f282ad1a00/orig =1280x720)

When Artificial Analysis ran their complete evaluation suite, Gemini 3.5 Flash cost $1,552 to complete. That is about 5.5 times the $282 for Gemini 3 Flash and roughly 1.8 times the approximately $870 for Gemini 3.1 Pro. GPT-5.5 medium also came in below $1,552.
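The multiples can be checked directly from the dollar figures. This is a small sketch using only the suite costs quoted above; the $870 figure is approximate, so the resulting ratio is approximate too, and the helper name is an illustrative choice.

```python
# Total cost in USD to run the Artificial Analysis suite, as quoted above.
SUITE_COST_USD = {
    "Gemini 3.5 Flash": 1552.0,
    "Gemini 3 Flash": 282.0,
    "Gemini 3.1 Pro": 870.0,  # approximate figure
}


def cost_ratio(model: str, baseline: str) -> float:
    """How many times more the suite cost on `model` than on `baseline`."""
    return SUITE_COST_USD[model] / SUITE_COST_USD[baseline]


if __name__ == "__main__":
    for baseline in ("Gemini 3 Flash", "Gemini 3.1 Pro"):
        ratio = cost_ratio("Gemini 3.5 Flash", baseline)
        print(f"Gemini 3.5 Flash vs {baseline}: {ratio:.2f}x")
```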

The reason is turn count. In agentic evaluations, Flash required an average of 49 turns per task, one of the highest counts measured. Each turn passes the full conversation history as input, so if every turn adds a similar amount of history, cumulative input tokens grow roughly quadratically with the number of turns. A model requiring 49 turns on a task therefore accumulates input tokens much faster than the 2.45x difference in turn count suggests, relative to a model completing the same task in 20 turns. The "less than half the cost" claim applies to per-token pricing in isolation; it does not hold for most practical agentic use cases.
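A rough model makes the turn-count effect concrete. The sketch below assumes every turn appends the same number of tokens to the history and that the full history is billed as fresh input each turn (no prompt caching, no context truncation). The 1,000 tokens per turn is a hypothetical value; only the 49-turn and 20-turn counts come from the text.

```python
def cumulative_input_tokens(turns: int, tokens_added_per_turn: int) -> int:
    """Total input tokens billed when each turn resends the whole history.

    Turn t sends t * tokens_added_per_turn tokens, so the total is
    tokens_added_per_turn * turns * (turns + 1) / 2, which is quadratic in turns.
    """
    return sum(turn * tokens_added_per_turn for turn in range(1, turns + 1))


if __name__ == "__main__":
    per_turn = 1_000  # hypothetical tokens appended to the history each turn
    long_run = cumulative_input_tokens(49, per_turn)
    short_run = cumulative_input_tokens(20, per_turn)
    print(f"49 turns: {long_run:,} input tokens")
    print(f"20 turns: {short_run:,} input tokens")
    print(f"ratio: {long_run / short_run:.2f}x for {49 / 20:.2f}x the turns")
```

Under these assumptions, 49 turns bill 1,225,000 input tokens against 210,000 for 20 turns, roughly 5.8 times the input for 2.45 times the turns. Prompt caching or history truncation would reduce the gap in practice.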

## Antigravity 2 application

The Antigravity 2 app is a three-pane AI coding environment: project and conversation management on the left, main interaction and diff review in the center, and a prompt input and integrated terminal at the bottom. This layout matches the current standard for this category of tools.

Tested on two tasks:

**Simple UI (cafe website).** The model produced a visually polished single-page website, comparable to or better than Claude Opus 4.7 for this task. Simple UI generation appears to be a strength.

**Full-stack application (finance dashboard).** The resulting application was functional, but the UI looked generic next to the Claude Opus version. Flash completed the task in 5 minutes versus 20 minutes for Opus, a 4x difference, while the Opus output was noticeably more polished.

![Personal finance dashboard application generated by Gemini 3.5 Flash within the Antigravity 2 environment](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/1af69920-d323-46dc-11e2-200edc13c600/lg2x =1280x720)

The tradeoff is consistent with the benchmark data: significantly faster, with output quality that varies by task complexity.

## Antigravity CLI

The Antigravity CLI replaces the original Gemini CLI, which is being discontinued. The new CLI is rewritten in Go and is closed-source; the previous version was open-source. Functionality is similar. The move to closed source is a regression for developers who relied on the open-source version for customization or integration.

## Summary

Gemini 3.5 Flash is the fastest model currently available by a significant margin and performs well on agentic benchmarks. **For real-time conversational applications and multi-step agentic workflows where speed matters, it is a strong option.** For coding tasks judged by independent benchmarks, it scores below Gemini 3.1 Pro and well below top-tier coding models, which does not match the parity with GPT-5.5 that Google's own comparisons suggest. For cost-sensitive agentic work, the per-token price advantage disappears once turn count is factored in.

**Antigravity 2 is a competent entry into the AI coding tool space without meaningful differentiation from existing tools**. The CLI's shift to closed source is a step back. The underlying model's speed advantage comes through in simple generation tasks; complex, design-sensitive work shows the gap between Flash and top-tier alternatives.

Gemini 3.5 Pro, which is forthcoming, may address the coding and cost-efficiency gaps. Until then, the Flash release is best understood as a speed-optimized specialist rather than a general-purpose replacement for existing frontier models.