Key Highlights from Google I/O 2024

Google I/O 2024 is this year’s edition of Google’s annual developer conference, which is usually hosted in Mountain View, California. As is customary, Google used the opportunity to announce a plethora of new products and services.

I have compiled a list of the announcements made during the keynote speech, which is usually the first event of Google I/O.

Ask Photos Search

Gemini goes beyond a simple search, recognizing different contexts, from doing laps in the pool to snorkeling in the ocean. We are rolling out Ask Photos this summer with more capabilities to come. Multimodality radically expands the questions we can ask and the answers we will get back.

Long context takes this a step further, enabling us to bring in even more information: hundreds of pages of text, hours of audio, a full hour of video, or the entire code repository you need.

Gemini 1.5 Pro

A 1 million token context window is now possible with Gemini 1.5 Pro.

I’m excited to announce that we are bringing this improved version of Gemini 1.5 Pro to all developers globally. Gemini 1.5 Pro, with a 1 million token context window, is now directly available for consumers in Gemini Advanced and can be used across 35 languages.

So today, we are expanding the context window to 2 million tokens. This represents the next step on our journey toward the ultimate goal of infinite context.

Say you couldn’t make the PTA meeting, and the recording of the meeting is an hour long. If it’s from Google Meet, you can ask Gemini to give you the highlights. There’s a parents group looking for volunteers, and you’re free that day, so Gemini can draft a reply. Gemini 1.5 Pro is available today in Workspace Labs.

NotebookLM

NotebookLM is going to take all the materials on the left as input and turn them into a lively science discussion personalized for him. “So, let’s dive into physics. What’s on deck for today?” “Well, we’re starting with the basics: force and motion.” “Okay, and that means we have to talk about Sir Isaac Newton and his three laws of motion.”

What’s amazing is that my son and I can join the conversation and steer it in whichever direction we want. When I tap join: “Hold on, we have a question. What’s up, Josh?” “Yeah, can you give my son Jimmy a basketball example?” “Hey Jimmy, that’s a fantastic idea. Basketball is actually a great way to visualize force and motion. Let’s break it down.”

“First, imagine a basketball just sitting there on the court. It’s not moving, right? That’s because all the forces acting on it are balanced. The downward pull of gravity...” It connected the dots and created that age-appropriate example for him, making AI helpful for everyone.

Gemini 1.5 Flash

Last year, we reached a milestone on that path when we formed Google DeepMind. So today, we’re introducing Gemini 1.5 Flash. Flash is a lighter-weight model compared to Pro. Starting today, you can use 1.5 Flash and 1.5 Pro with up to 1 million tokens in Google AI Studio and Vertex AI.

Project Astra

Today, we have some exciting new progress to share about the future of AI assistance, which we’re calling Project Astra. “Tell me when you see something that makes a sound.” I see a speaker, which makes a sound. “What is that part of the speaker called?” It is the tweeter that produces high-frequency sounds. “What does that part of the code do?” This code defines encryption and decryption functions. It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector (IV). “What can I add here to make this system faster?” Adding a cache between the server and the database could improve speed.

Today, we’re introducing a series of updates across our generative media tools with new models covering images, music, and video.

Imagen 3

I’m so excited to introduce Imagen 3. Imagen 3 is more photorealistic, allowing you to literally count the whiskers on its snout, with richer details, such as the incredible sunlight in the shot, and fewer visual artifacts or distorted images. You can sign up today to try Imagen 3 in ImageFX, part of our suite of AI tools, at labs.google.

Music AI Sandbox

Together with YouTube, we’ve been building Music AI Sandbox, a suite of professional music AI tools that can create new instrumental sections from scratch, transfer styles between tracks, and more.

Veo, AI-Generated Video

Today, I’m excited to announce our newest, most capable generative video model, called Veo. Veo creates high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or time-lapse sequences and further edit your videos using additional prompts. You can use Veo in our new experimental tool called VideoFX, where we’re exploring features like storyboarding and generating longer scenes.

Not only is it important for the model to understand where an object or subject should be in space, but it also needs to maintain this consistency over time, just like the car in this video. Over the coming weeks, some of these features will be available to select creators through VideoFX at labs.google, and the waitlist is open now.

6th Gen TPUs Trillium

Today, we are excited to announce the sixth generation of TPUs, called Trillium. Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation. We will make Trillium available to our Cloud customers in late 2024.

Multi-step Reasoning in Google Search

We’re making AI Overviews even more helpful for your most complex questions. To make this possible, we’re introducing multi-step reasoning in Google Search. Soon, you’ll be able to ask Search to find the best yoga or Pilates studios in Boston and show you details on their intro offers and the walking time from Beacon Hill. You get some studios with great ratings and their introductory offers, and you can see the distance for each. For example, one studio is just a 10-minute walk away. Right below, you see where they’re located, laid out visually. Search breaks your bigger question down into all its parts and figures out which problems it needs to solve and in what order.

Next, take planning for example. Now, you can ask Search to create a 3-day meal plan for a group that’s easy to prepare. Here, you get a plan with a wide range of recipes from across the web. If you want to get more veggies in, you can simply ask Search to swap in a vegetarian dish. You can export your meal plan or get the ingredients as a list just by tapping here. Soon, you’ll be able to ask questions with video right in Google Search.

Ask with Video

I’m going to take a video and ask Google, “Why will this not stay in place?” In nearly an instant, Google gives me an AI Overview. It provides some reasons why this might be happening and steps I can take to troubleshoot. You’ll start to see these features rolling out in Search in the coming weeks.

New Gmail Mobile Features

And now, we’re excited that the new Gemini-powered side panel will be generally available next month. Three new capabilities are coming to Gmail mobile. It looks like there’s an email thread with lots of emails that I haven’t read, and luckily for me, I can simply tap the summarize option up top and skip reading this long back and forth. Now, Gemini pulls up this helpful mobile card as an overlay, where I can read a nice summary of all the salient information I need.

I can simply type out my question in the mobile card and say, “Compare my roof repair bids by price and availability.” This new Q&A feature makes it so easy to get quick answers on anything in my inbox without having to search Gmail, open the email, and look for specific information and attachments. I also see some suggested replies from Gemini, such as declining the service or suggesting a new time. These new capabilities in Gemini and Gmail will start rolling out this month to Labs users.

For example, an email has a PDF attachment that is a receipt from a hotel, and I see a suggestion in the side panel to help me organize and track my receipts.

Step one: Create a Drive folder and put this receipt and the 37 others it found into that folder.
Step two: Extract the relevant information from the receipts in that folder into a new spreadsheet. Gemini offers you the option to automate this so that this particular workflow is run on all future emails.

Gemini does the hard work of extracting all the right information from all the files in that folder and generates this sheet for you. It also shows me where the money is spent. Gemini not only analyzes the data from the sheet but also creates a nice visual to help me see the complete breakdown by category. This particular ability will be rolled out to Labs users.

Gemini-powered teammate called Chip

This September, we’re prototyping a virtual Gemini-powered teammate named Chip. Chip has been given a specific job role with a set of descriptions on how to be helpful to the team, which you can see here. Some of the jobs are to monitor and track projects, organize information, provide context, and a few more things.

When asked, “Are we on track for launch?” Chip gets to work, searching through everything it has access to, synthesizing what’s found, and coming back with an up-to-date response. There it is: a clear timeline, a nice summary, and even a flagged potential issue the team should be aware of in the first message.

Because we’re in a group space, everyone can follow along, and anyone can jump in at any time. As you see, someone just did, asking Chip to help create a document to address the issue.

Gemini Live (video AI)

And this summer, you can have an in-depth conversation with Gemini using your voice. We’re calling this new experience Live. When you go Live, you’ll be able to open your camera so Gemini can see what you see and respond to your surroundings in real time. We’re also rolling out a new feature that lets you customize Gemini for your own needs and create personal experts on any topic you want. We’re calling these Gems. Just tap to create a Gem, write your instructions once, and come back whenever you need it.

For example, I created a Gem that acts as a personal writing coach. It specializes in short stories with mysterious twists and even builds on the story drafts in my Google Drive. Gems will roll out in the coming months. This reasoning and intelligence all come together in the new trip-planning experience in Gemini Advanced.

Trip planning with Gemini

We’re going to Miami! My son loves art, my husband loves seafood, and our flight and hotel details are already in my Gmail inbox. Gemini starts by gathering all kinds of information from search engines and using helpful extensions like Maps and Gmail to make sense of these variables. The end result is a personalized vacation plan presented in Gemini’s new Dynamic UI. I like these recommendations, but my family likes to sleep in, so I tap to change the start time. Just like that, Gemini adjusts my itinerary for the rest of the trip. This new trip planning experience will be rolling out to Gemini Advanced this summer.

You can upload your entire thesis, sources, notes, research, and soon interview audio recordings and videos. It can dissect your main points, identify improvements, and even role-play as your professor. Maybe you have a side hustle selling handcrafted products. Simply upload all of your spreadsheets and ask Gemini to visualize your earnings. Gemini goes to work calculating your returns and pulling its analysis together into a single chart. Of course, your files are not used to train our models. Later this year, we’ll be doubling the long context window to two million tokens.

We’re putting AI-powered search right at your fingertips. For example, let’s say my son needs help with a tricky physics word problem like this one. If he’s stumped on this question, instead of putting me on the spot, he can circle the exact part he’s stuck on and get step-by-step instructions right where he’s already doing the work. This new capability is available today.

Contextual awareness

Now we’re making Gemini context-aware. My friend Pete is asking if I want to play pickleball this weekend, so I’m going to reply and try to be funny by saying, “Uh, is that like tennis but with, uh, pickles?” Then, I’ll say, “Uh, create an image of tennis with pickles.” One new thing you’ll notice is that the Gemini window now hovers above the app, so I stay in the flow.

Okay, so that generated some pretty good images. What’s nice is I can then drag and drop any of these directly into the Messages app below. Cool, let me send that. Because it’s context-aware, Gemini knows I’m looking at a video, so it proactively shows me an “Ask this video” chip. I type, “What is the two-bounce rule?” This uses signals like YouTube’s captions, which means you can use it on billions of videos.

So give it a moment, and there it is. Starting with Pixel later this year, we’ll be expanding what’s possible with our latest model, Gemini Nano, with multimodality.

TalkBack & Gemini Nano

Several years ago, we developed TalkBack, an accessibility feature that helps people navigate their phones through touch and spoken feedback. Now, we’re taking that to the next level with the multimodal capabilities of Gemini Nano. When someone sends Cara a photo, she’ll get a richer and clearer description of what’s happening, and the model even works when there’s no network connection. These improvements to TalkBack are coming later this year.

Gemini Pro and Flash prices

Gemini 1.5 Pro is $7 per 1 million tokens, and I’m excited to share that for prompts up to 128k, it’ll be 50% less at $3.50. Additionally, 1.5 Flash will start at 35 cents per 1 million tokens. Today marks the debut of our newest member, PaliGemma, our first vision-language open model, and it’s available right now.
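
To get a feel for what these prices mean in practice, here is a minimal Python sketch that estimates the cost of a single prompt using only the figures quoted above. The function name, the model labels "pro" and "flash", and the reading of "128k" as 128,000 tokens are my own assumptions for illustration. The sketch also assumes the rate applies to the whole prompt and ignores any separate pricing for output tokens, so treat it as rough arithmetic rather than Google's actual billing logic.

```python
# Prices quoted in the keynote, in US dollars per 1 million tokens.
PRO_PRICE_LONG = 7.00    # Gemini 1.5 Pro, prompts above 128k tokens
PRO_PRICE_SHORT = 3.50   # Gemini 1.5 Pro, prompts up to 128k tokens (50% less)
FLASH_PRICE = 0.35       # Gemini 1.5 Flash starting price

# Assumption: "128k" is read here as 128,000 tokens.
SHORT_PROMPT_LIMIT = 128_000


def estimate_prompt_cost(model: str, prompt_tokens: int) -> float:
    """Return a rough cost in dollars for one prompt.

    `model` is a hypothetical label: "pro" or "flash".
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")

    if model == "pro":
        # The cheaper rate applies only to prompts up to the 128k limit.
        if prompt_tokens <= SHORT_PROMPT_LIMIT:
            price = PRO_PRICE_SHORT
        else:
            price = PRO_PRICE_LONG
    elif model == "flash":
        price = FLASH_PRICE
    else:
        raise ValueError(f"unknown model label: {model!r}")

    return price * prompt_tokens / 1_000_000


if __name__ == "__main__":
    for label, tokens in [("pro", 100_000), ("pro", 1_000_000), ("flash", 1_000_000)]:
        cost = estimate_prompt_cost(label, tokens)
        print(f"{label:>5} {tokens:>9,} tokens: ${cost:.2f}")
```

Under these assumptions, a prompt that fills the full 1 million token window on Pro is billed at the $7 rate, while the same prompt on Flash costs a small fraction of that.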

I’m also thrilled to announce Gemma 2, the next generation of Gemma, which will be available in June. Today, we’re expanding SynthID to two new modalities: text and video. In the coming months, we’ll be open-sourcing SynthID text watermarking.

I’m excited to introduce LearnLM, our new family of models based on Gemini and fine-tuned for learning. We’re also developing some pre-made Gems that will be available in the Gemini app and web experience, including one called Learning Coach.

Conclusion

These are all exciting products and projects by Google. However, it remains to be seen how all these will impact our lives.

Samuel Effiok

A seasoned software engineer with more than eleven years of experience who writes about news and international topics on the side. Afolabi, who holds a degree in Electrical/Electronics Engineering, combines technical know-how with a sharp awareness of global events to bring a distinctive analytical viewpoint to his work. Afolabi is the one to turn to for perceptive commentary on world affairs.