On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini exceeds “current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.” It’s a follow-up to PaLM 2, an earlier AI model that Google hoped would match GPT-4 in capability.
A specially tuned English version of its mid-level Gemini model is available now in over 170 countries as part of the Google Bard chatbot, although not in the EU or the UK due to potential regulatory issues.
Like GPT-4, Gemini can handle multiple types (or “modes”) of input, making it multimodal. That means it can process text, code, images, and even audio. The goal is to make a type of artificial intelligence that can accurately solve problems, give advice, and answer questions in various fields—from the mundane to the scientific. Google says this will power a new era in computing, and it hopes to tightly integrate the technology into its products.
“Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information,” writes Google. “Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering, and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.”
Google says Gemini will be available in three sizes: Gemini Ultra (“for highly complex tasks”), Gemini Pro (“for scaling across a wide range of tasks”), and Gemini Nano (“for on device tasks” like Google’s Pixel 8 Pro smartphone). The three sizes likely differ mainly in parameter count. More parameters mean a bigger neural network that is generally more capable of executing complex tasks but requires more computational power to run. That means Nano, the smallest, is designed to run locally on consumer devices, while Ultra can only run on data center hardware.
“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” wrote Google CEO Sundar Pichai in a statement. “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead and for the opportunities Gemini will unlock for people everywhere.”
Although Gemini will come in three sizes, only the mid-level model is available for public use. As mentioned above, Google Bard now runs a specially tuned version of Gemini Pro. From our informal testing so far, Gemini Pro appears to perform much better than the previous version of Bard, which was based on Google’s PaLM 2 language model.
Google also claims that Gemini is more scalable and efficient than its previous AI models when run on Google’s custom Tensor Processing Units (TPU). “On TPUs,” Google says, “Gemini runs significantly faster than earlier, smaller and less-capable models.”
And it’s purportedly great at coding. Google trained a special coding-centric version of Gemini called AlphaCode 2, which “excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science,” according to Google. Gemini is also excellent at inflating Google’s PR language. If the models were any less capable and revolutionary, would the marketing copy be any less breathless? It’s doubtful.