Amid an escalating industry race in AI, Google has introduced Gemini, which the company’s official statement describes as the largest and most capable AI model it has ever built. Gemini was built from the ground up as a multimodal AI model, “which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” according to the press release.
Gemini is the product of a large-scale collaboration among Google Research, DeepMind, and other teams across Google. An in-depth report on the creation and engineering behind Gemini is available from Google DeepMind.
Gemini Ultra, Pro and Nano
Gemini 1.0, the company’s initial release of the multimodal model, comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Notably, Gemini Nano is built to run natively on Google Pixel 8 hardware with the Android 14 operating system.
Gemini Ultra, the flagship model variant, is in limited public preview today by invitation only. It is expected to become generally available by the second quarter of 2024. Gemini Nano is also in preview, and Google is accepting applications for access. Gemini Pro is generally available today.
On Google’s official Gemini launch page, the company highlights its benchmark results across image, video and audio use cases. In the multimodal benchmarks Google reports there, Gemini outperforms OpenAI’s GPT-4V in every listed use case.
The page also showcases the model’s ability to take varied forms of input and produce varied forms of output, such as interpreting musical notation, working through physics problems, and turning a video of a school of fish at sea into code that mimics the video.
Hands-On Training with Google Gemini
If you’re a developer or just love to learn by doing, then Google has a handful of recommended multimodal prompts and training for you to explore.
First, you can explore how Gemini interprets image context and recognizes patterns in this example on Google Developers. The guide also demonstrates Gemini’s ability to understand context within image sequences, magic tricks, tool use, coding, and more.
If you’re looking for ways to incorporate Gemini Pro in your custom application, you can access the Gemini API in Google AI Studio and build your integration through Vertex AI on Google Cloud.
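For a sense of what a first integration might look like, here is a minimal sketch of a text-only call to Gemini Pro over REST, using only the Python standard library. It is an illustration under assumptions, not Google’s reference code: the endpoint path, the `gemini-pro` model name, and the request and response shapes follow the Gemini API documentation as of the launch and may have changed since, and the `GEMINI_API_KEY` environment variable and both function names are hypothetical names chosen for this example.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: endpoint and model name as documented for the Gemini API at launch.
# Check the current API reference before relying on either.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    query = urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(prompt: str, api_key: str) -> str:
    """Send the prompt and return the text of the first candidate."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        payload = json.load(resp)
    # Assumed response shape: candidates -> content -> parts -> text.
    return payload["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    # GEMINI_API_KEY is a hypothetical variable name used only in this sketch.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate_text("Explain multimodal AI in one sentence.", key))
    else:
        print("Set GEMINI_API_KEY to call the API.")
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent. For production use, Google’s official client SDKs or Vertex AI are likely a better fit than hand-built requests.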
Gemini Nano, designed for Pixel 8 hardware, is in preview access as mentioned above. Google is accepting applications for early access, and developers will need to use the Google AI Edge SDK for Android to run Gemini Nano on-device in their apps.
If you’re a data science student or professional, you’ll appreciate this Jupyter notebook, hosted on Google Colab and GitHub, which walks through several Gemini use cases.