On Feb. 5, Google expanded the availability of its updated Gemini 2.0 Flash generative AI model to the Gemini API, completing the rollout of the model intended for low latency and enhanced performance through so-called AI “reasoning.” Gemini app users on both desktop and mobile gained access to Gemini 2.0 Flash last week.
Gemini 2.0 Flash is now available through Google’s API
Developers can access Gemini 2.0 Flash in Google AI Studio and Vertex AI. Pricing depends on the type of input and output and on the Google AI Studio or Vertex AI subscription tier. Details can be found on Google’s developer blog.
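As a rough illustration of what API access looks like, the sketch below builds a request to the Gemini REST endpoint and extracts the generated text. The model name and endpoint path follow Google’s public documentation as of this writing, but treat the exact paths, field names, and the `GEMINI_API_KEY` variable as assumptions to verify against the developer blog.

```python
import json
import os
import urllib.request

# Assumed public REST endpoint for Gemini 2.0 Flash; confirm against
# Google's current API documentation before relying on it.
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/"
            "models/gemini-2.0-flash:generateContent")


def build_request(prompt: str) -> dict:
    """Build the JSON body the generateContent endpoint expects."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str, api_key: str) -> str:
    """Send a prompt and return the first candidate's text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The first candidate's first text part holds the generated answer.
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # Only fires a real request when an API key is present.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate("Summarize this article in one sentence.", key))
```

The same call is available through Google’s official client libraries; the raw-HTTP version here just makes the request shape explicit.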
Gemini 2.0 Flash offers a context window of one million tokens (roughly 600,000 to 800,000 English words, since many words span more than one token). Like OpenAI o3 or DeepSeek’s R1, Gemini 2.0 Flash is designed to slow down some of its generative prediction processes to “reason” through relatively complex coding, math, and science problems.
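The tokens-to-words conversion above is back-of-envelope arithmetic; the 0.6–0.8 words-per-token range used here is an illustrative assumption, not an official tokenizer statistic.

```python
def words_from_tokens(tokens: int, words_per_token: float) -> int:
    """Estimate how many English words fit in a token budget.

    words_per_token is an assumed ratio; real tokenizers vary by text.
    """
    return round(tokens * words_per_token)


CONTEXT_WINDOW = 1_000_000  # Gemini 2.0 Flash's advertised token limit

low = words_from_tokens(CONTEXT_WINDOW, 0.6)   # 600,000 words
high = words_from_tokens(CONTEXT_WINDOW, 0.8)  # 800,000 words
```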
2.0 Pro Experimental shows the best of Gemini
Google has several other versions of Gemini 2.0 cooking: the standard Flash variant, Gemini 2.0 Flash-Lite, and Gemini 2.0 Pro Experimental. The Experimental version is the latest to reach general release; users can find it in Google AI Studio, Vertex AI, and the Gemini app with a Gemini Advanced subscription. Google reported that Gemini 2.0 Pro Experimental outperforms its counterparts on benchmarks such as MMLU-Pro and LiveCodeBench.
Gemini 2.0 Flash-Lite is open for business in public preview
At the midpoint between the Gemini 1.5 Flash and Gemini 2.0 Flash models is 2.0 Flash-Lite, which approaches 2.0’s performance while maintaining 1.5’s price point. Users of Google AI Studio and Vertex AI can find 2.0 Flash-Lite in public preview.
Gemini 2.0 Flash’s place in Google’s shifting approach to AI responsibility
The new availability options for Gemini 2.0 came shortly after Google updated its AI safety policy to remove language prohibiting uses such as weapons and surveillance.
In the related blog post, the Gemini team said they used novel reinforcement learning techniques to teach Gemini to double-check itself, improving its responses and, in particular, producing more desirable answers to “sensitive prompts.”
“We’re also leveraging automated red teaming to assess safety and security risks, including those posed by risks from indirect prompt injection, a type of cybersecurity attack which involves attackers hiding malicious instructions in data that is likely to be retrieved by an AI system,” wrote Koray Kavukcuoglu, chief technology officer of Google DeepMind, in the blog post.