DeepMind’s Chatbot-Driven Robot Ushers in a New Era of AI-Assisted Automation
In a sprawling open-plan office in Mountain View, California, a tall, slender robot on wheels has been busy playing tour guide and informal office helper, thanks to a large language model upgrade that Google DeepMind announced today. The robot uses the latest version of Google's Gemini large language model both to parse commands and to find its way around the space.
When a person tells it, "Find me somewhere to write," the robot dutifully trundles off and leads them to a pristine whiteboard elsewhere in the building.
The "Google helper" robot, equipped with Gemini, excels at processing both video and textual data, along with the capability to absorb vast quantities of data from pre-recorded office tour videos. This functionality enables the robot to understand its surroundings and accurately follow orders that necessitate a level of practical reasoning. By integrating Gemini with a unique algorithm, the robot is programmed to perform designated actions, like making turns, based on the instructions it receives and its visual perception of the environment.
When Gemini launched in December, Demis Hassabis, the head of Google DeepMind, told WIRED that the model's multimodal capabilities would likely unlock new abilities in robotics, and that the company's researchers were hard at work testing its potential in robotic applications.
In a newly published paper, the researchers report that the robot proved up to 90 percent reliable at navigating, even when given tricky commands such as finding a misplaced coaster. The DeepMind team writes that the system has significantly improved the naturalness of human-robot interaction and made the robot markedly more usable.
The demo neatly illustrates how large language models could reach beyond the digital realm to do useful physical work. Most chatbots operate within the confines of a web browser or app, though they can increasingly handle visual and auditory input as well, as both Google and OpenAI have recently demonstrated. In May, Hassabis showed off an upgraded version of Gemini capable of making sense of an office layout as seen through a smartphone camera.
Academic and industry research labs are racing to see how language models might enhance robots' abilities. The May program for the International Conference on Robotics and Automation, a key gathering for robotics researchers, listed nearly twenty papers involving the use of vision language models.
Investors are pouring money into startups that aim to apply the latest AI advances to robotics. Several researchers who worked on a notable project at Google have left to found a startup called Physical Intelligence, which secured an initial $70 million in funding; it aims to combine large language models with real-world training to give robots general problem-solving abilities. Skild AI, founded by roboticists from Carnegie Mellon University, shares that ambition and recently announced $300 million in funding.
Just a few years ago, a robot needed a detailed map of its environment and carefully chosen commands to navigate successfully. Large language models contain useful information about the physical world, and newer versions trained on images and video as well as text, known as vision language models, can answer questions that require perception. Gemini allows Google's robot to parse visual as well as spoken instructions, for example following a route sketched on a whiteboard to reach a new destination.
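One way to picture the whiteboard example: the vision language model reads the sketch and returns an ordered list of landmarks, which an ordinary navigation loop then works through. The sketch below, again in Python, uses a stubbed-out model call and hypothetical names (vlm_extract_route, follow); it is an assumption about the shape of such a pipeline, not the paper's actual code.

    def vlm_extract_route(whiteboard_photo: str) -> list[str]:
        """Placeholder: a real system would ask the vision language model
        to read the drawn route and return the landmarks in order."""
        return ["exit meeting room", "kitchen", "elevator bank"]

    def follow(route: list[str]) -> None:
        for landmark in route:
            # On a real robot, each landmark becomes a navigation goal,
            # matched against what the camera currently sees.
            print(f"navigating to: {landmark}")

    follow(vlm_extract_route("whiteboard.jpg"))

Because the model's output is just a list of named waypoints, the same follower code works whether the route came from a sketch, a spoken instruction, or a typed command.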
In their paper, the researchers say they plan to test the system on different kinds of robots. They add that Gemini should be able to handle more complex questions, such as "Do they have my favorite drink today?" asked by a person whose desk is littered with empty Coke cans.