Project Astra: Google’s Leap into Multimodal AI, Challenging ChatGPT with Advanced Understanding and Interaction
Will Knight
ChatGPT is less than two years old, yet the idea of interacting with AI by typing text already feels outdated.
At today's Google I/O developer conference, Demis Hassabis, the executive leading the company's effort to regain its position at the forefront of AI, unveiled a "next-generation AI assistant" called Project Astra. A video demonstration showed it running both as a smartphone app and on a prototype pair of smart glasses. The project makes good on a promise Hassabis made about Gemini's capabilities when the model was first launched last December.
Responding to voice commands, Astra made sense of objects and scenes viewed through the device's cameras and conversed about them in natural language. It identified a computer speaker and described its components, recognized a London neighborhood from the view out an office window, read and evaluated code on a computer monitor, composed a short limerick about some pencils, and recalled where a person had left a pair of glasses.
The vision of AI that Google presented is strikingly similar to the one OpenAI showed off on Monday. OpenAI unveiled a new interface for ChatGPT that can hold rapid-fire voice conversations and discuss what is captured by a smartphone camera or shown on a computer screen. That version of ChatGPT, powered by a new AI model called GPT-4o, features a voice that sounds strikingly human and conveys emotion with expressive tones, mimicking feelings such as surprise and even flirtatiousness.
Project Astra is built on an upgraded version of Gemini Ultra, Google's answer to the model that has powered ChatGPT since March 2023. Gemini Ultra, like OpenAI's GPT-4o, is "multimodal": it has been trained on audio, images, video, and text, enabling it to natively ingest, combine, and generate content across all of those formats. Google's and OpenAI's shift to this technology marks the start of a new era in generative AI. Until now, the breakthroughs that brought ChatGPT and its rivals to the world were built primarily on models that handled only text, which had to be bolted onto other systems to gain image or audio capabilities.
In a conversation prior to the day's event, Hassabis expressed his belief that chatbots relying solely on text are merely a temporary phase leading to much more advanced and, ideally, beneficial AI assistants. He further emphasized, "Gemini was conceived with this foresight," explaining, "That's the reason for its multimodal design."
The latest iterations of Gemini and ChatGPT, equipped with the capabilities to see, hear, and communicate, offer remarkable demonstrations. However, it remains uncertain how they will integrate into professional environments or personal settings.
Pulkit Agrawal, an assistant professor at MIT who works on AI and robotics, is impressed by the latest demos from Google and OpenAI, which highlight how rapidly multimodal AI has progressed. OpenAI launched GPT-4V, a model capable of parsing images, in September 2023. Agrawal says he was struck by Gemini's ability to make sense of live changes to a diagram on a whiteboard, a capability the latest version of OpenAI's ChatGPT also appears to have.
Agrawal notes that the assistants demoed by Google and OpenAI could provide both companies with fresh training data as users interact with the systems in the real world. "But they have to be useful," he adds. "The big question is what people will use them for, and that's still not very clear."
Google announced that Project Astra will be accessible via a newly developed platform named Gemini Live later in the year. Hassabis mentioned that the firm is currently in the process of evaluating various smart glasses prototypes and remains undecided about releasing any of them.
Astra's potential could offer Google an opportunity to revive a variant of its unsuccessful Glass smart glasses. However, attempts to create hardware compatible with generative AI have faced challenges to date. Even with the remarkable demonstrations by OpenAI and Google, multimodal models still struggle to completely grasp the physical environment and the objects it contains, setting boundaries on their capabilities.
"Creating a mental representation of the surrounding physical environment is crucial for developing intelligence that closely resembles human intelligence," states Brenden Lake, an associate professor at New York University who employs artificial intelligence to study human cognitive abilities.
Lake observes that the leading artificial intelligence models of today primarily focus on language due to the majority of their training material being derived from text gathered from books and online sources. This approach significantly diverges from the human method of language acquisition, which occurs through engagement with the physical environment. He comments on the development of multimodal models, noting, "It's the opposite of how children develop."
Hassabis believes that imbuing AI models with a deeper understanding of the physical world will be key to further progress in AI and to making systems like Project Astra more robust. He points to other frontiers of AI research, such as Google DeepMind's work on game-playing AI programs, as potentially helpful. Hassabis and others believe such research could prove transformative for robotics, an area into which Google is also pouring resources.
A multimodal universal agent assistant, Hassabis said, is on the path to artificial general intelligence, a much-anticipated but largely undefined future point at which machines can do anything a human brain can. "It's not AGI or anything like that, but it marks the start of something," he said.
Updated on May 14, 2024, at 4:15 pm EDT: This article has been updated to clarify the full name of Google's project.