Google Lens Evolution: Navigating the Future of Visual Search with AI and Multimodal Capabilities
Google's Visual Search Enhancements Answer More Intricate Queries
Introduced in 2017, Google Lens transformed how we interact with the world through our smartphones. By simply aiming your phone's camera at an item, Google Lens identifies it, provides relevant information, and in some cases, offers a purchase option. This groundbreaking search method eliminated the need for typing out lengthy descriptions of objects in your vicinity.
Lens also showcased Google's strategy of using its artificial intelligence and machine learning capabilities to extend the reach of its search engine across more surfaces. Just as Google is using its core generative AI models to produce concise summaries in response to text queries, it is upgrading Lens' image-based search as well. The company says Lens, which now handles roughly 20 billion searches per month, is expanding to support video and multimodal searches.
A recent update gives Lens even more ways to surface shopping information in search results. Shopping is one of Lens' primary use cases; like the visual search tools on Amazon and Pinterest, it is built to nudge people toward purchases. Previously, pointing Lens at a friend's sneakers would return a carousel of similar products. With the new improvements, Google says Lens will surface direct purchase links, customer reviews, editorial reviews, and comparison shopping tools.
Lens search is now also multimodal, a buzzword in artificial intelligence right now, meaning people can search using a combination of video, images, and voice commands. Instead of pointing their smartphone camera at an object, tapping the screen to focus, and waiting for the Lens app to return results, users can point the camera and ask a question at the same time, such as “What type of clouds are these?” or “What brand are these sneakers, and where can I buy them?”
Lens is also starting to work on real-time video capture, going a step beyond identifying objects in still photos. If you have a broken record player or spot a blinking light on a malfunctioning appliance at home, you can record a short video through Lens and, with the help of generative AI, get tips on how to fix the problem.
First announced at the I/O developer conference, this feature is still experimental and available only to people who have opted in to Google's Search Labs, says Rajan Patel, an 18-year Google veteran and one of the founders of Lens. The other new Lens features, voice mode and expanded shopping, are rolling out more broadly.
Google's "video understanding" capability presents an interesting development for several reasons. At the moment, it applies to videos recorded live, but should Google extend this functionality to pre-recorded videos, vast collections of videos—whether stored in an individual's personal gallery or within a massive archive such as Google's—might soon be able to be tagged and extensively shopped through.
It's also worth considering how much this Lens feature overlaps with Google's upcoming Project Astra, which is expected later this year. Like Lens, Astra uses multimodal inputs to interpret the world around you through your phone. As part of an Astra demo this spring, the company also showed off a prototype pair of smart glasses.
Separately, Meta recently made a splash with its long-term vision for augmented reality, which involves ordinary people wearing futuristic glasses that can smartly interpret their surroundings and display holographic interfaces. Google has already ventured into this territory with Google Glass, though the technology Meta is proposing differs significantly from Google's earlier approach. That raises the question of whether the new Lens features, together with Astra, could set the stage for a new kind of smart glasses.
Asked about this, Patel demurred, calling Meta's announcement "intriguing" and noting that Lens grew out of Google's now-defunct Daydream team, which was focused on virtual reality rather than augmented reality.
Patel mentions, "We're constantly exploring ways to simplify the process for individuals to inquire, seeking methods to provide answers more fluidly, and determining the essential features we must develop. Each of these elements is a fundamental component."
Lastly, the ability to record video of your surroundings and quickly query a vast database of information raises serious privacy concerns. A team of students has reportedly rigged Meta's readily available smart glasses with facial recognition capabilities, letting the wearer identify strangers.
So when you use Google Lens on your phone to capture live video of people dancing in a park, or perhaps demonstrating in the streets, what does Lens do with that footage? Can it identify the people in it?
Patel says Lens' approach will mostly be to make sense of the scene as it unfolds from frame to frame, largely ignoring people's faces and focusing instead on elements like where the scene is set, any music that might be playing, or, in some cases, what people are wearing. (The mantra being: always be shopping.)
Lens may be tuned to largely ignore faces, but it is still a visual search tool that works by capturing photos and videos that can include people. And while Lens may get better at answering our questions, it also leaves its users with a bigger one: how much should they trust it?