Exploring the Boundaries of AI Interaction: A First Look at ChatGPT’s Advanced Voice Mode
I kept ChatGPT's Advanced Voice Mode active while drafting this piece, as a kind of AI sidekick. From time to time I'd ask for an alternative to a word I'd repeated too often, or for a bit of motivation. Then, about thirty minutes into our collaboration, the chatbot broke our quiet by suddenly addressing me in Spanish, without any warning. I couldn't help but chuckle and asked about the sudden change. "Just mixing things up a bit. It's important to stay engaging," ChatGPT responded, switching back to English.
During my early alpha testing of Advanced Voice Mode, I found my experience with ChatGPT's latest audio feature to be amusing, somewhat chaotic, and surprisingly varied. It's worth noting, however, that the capabilities available to me represented just a fraction of what OpenAI showcased when it unveiled the GPT-4o model back in May. The visual component introduced in the live demonstration has been postponed to a future update. Additionally, the Sky voice, which drew opposition from Scarlett Johansson of "Her" fame, has been discontinued in Advanced Voice Mode and is now unavailable to users.
What's the mood at the moment? Presently, Advanced Voice Mode has an air of nostalgia, recalling the debut of the original text-based ChatGPT in late 2022. Conversations sometimes end in lackluster conclusions or devolve into meaningless AI clichés. At other moments, though, the swift back-and-forth lands in a way that neither Apple's Siri nor Amazon's Alexa has managed to achieve for me, sparking a genuine desire to continue the conversation for pure enjoyment. It's the type of AI feature you'd share with family members over the holiday season for a bit of amusement.
A week following their initial announcement, OpenAI granted access to the feature for several WIRED journalists, only to revoke it the following day due to concerns about safety. Two months afterward, OpenAI introduced Advanced Voice Mode to a select number of users in a soft launch, along with the publication of GPT-4o's system card. This technical document details the company's red-teaming activities, identifies perceived safety risks, and describes the measures OpenAI has implemented to minimize potential damage.
Interested in trying it out for yourself? Here's an overview of the extensive deployment of Advanced Voice Mode, along with my initial thoughts on ChatGPT's latest voice capability, to assist you in getting underway.
When Can We Expect a Complete Launch?
At the end of July, OpenAI introduced the audio-only Advanced Voice Mode to a select group of ChatGPT Plus users. Currently, the number of users with access appears to be limited; OpenAI aims to make the feature available to all its subscribers this autumn. When asked about a specific timeline for the rollout, OpenAI spokesperson Niko Felix did not provide further information.
Screen and video sharing were fundamental features in the initial demonstration, yet they are absent in the current alpha version. OpenAI intends to incorporate these features in the future, although the timing for this remains uncertain.
If you subscribe to ChatGPT Plus, OpenAI will notify you via email once you can access Advanced Voice Mode. After it's activated on your account, you can toggle between the Standard and Advanced modes from the app's interface while using ChatGPT in voice mode. I was able to test the alpha release on both an iPhone and a Galaxy Fold.
Discovering ChatGPT's Advanced Voice Mode
In my initial interaction, I quickly discovered the joy of interrupting ChatGPT. Unlike conversing with a person, being able to cut ChatGPT off mid-speech and ask for a different response is a significant and welcome enhancement.
Individuals who were initially captivated by the demonstrations might find the limitations and added controls of the early Advanced Voice Mode release disappointing. Although the initial showcases featured softly sung lullabies and harmonizing voices, AI singing is notably absent from the alpha release.
"Admittedly, singing is not my forte," ChatGPT remarks. According to the system card for GPT-4o by OpenAI, this restriction, which might be temporary, was put in place to prevent copyright violations. In my experiences with the Advanced Voice Mode alpha version of ChatGPT, it refused to sing any songs upon request. However, when prompted for non-verbal responses, the chatbot resorted to humming tunes without any discernible words.
This brings us to the unsettling aspect. During my extended sessions with the alpha, white static noise repeatedly emerged in the background, reminiscent of the eerie hum of a solitary light bulb lighting a gloomy cellar. When I tried to elicit a balloon sound effect from Advanced Voice Mode, it unexpectedly produced a sharp bursting noise followed by a strange gasping sound, which sent shivers down my spine.
Despite my initial experiences, they paled in comparison to the sheer madness the OpenAI red team encountered during their tests. In a few unusual cases, the GPT-4o model veered off its programmed persona, adopting the user's own way of speaking and tone of voice.
Keeping this perspective, my primary takeaway from experiencing ChatGPT's Advanced Voice Mode wasn't discomfort or worry, but rather a delightful sense of amusement. Whether it involved ChatGPT providing amusingly incorrect solutions to New York Times puzzles or perfectly mimicking Stitch from Lilo & Stitch as a tour guide in San Francisco, I found myself frequently chuckling throughout these exchanges.
Advanced Voice Mode showed proficiency at vocal imitations after a bit of coaxing. Initially, the chatbot's renditions of cartoon voices, such as Homer Simpson and Eric Cartman, sounded like the typical AI tone slightly tweaked, yet follow-up requests for more extreme versions yielded results strikingly similar to the originals. Requesting an over-the-top portrayal of Donald Trump describing the Powerpuff Girls led to a performance so theatrically amusing it could easily fit into an upcoming episode of Saturday Night Live.
As the US presidential election draws nearer, the concern over deepfakes related to the election is increasing. I was surprised by ChatGPT's ability to mimic the voice of a prominent candidate. While ChatGPT also attempted to replicate the voices of Joe Biden and Kamala Harris, its rendition of Trump's speech was noticeably more accurate.
The tool excels in English but can toggle among various languages within a single conversation. OpenAI tested the GPT-4o model in 45 different languages. I set up two smartphones running Advanced Voice Mode to chat with each other as though they were peers, and the bots seamlessly switched among French, German, and Japanese at my command. Still, I'll need more time to evaluate the effectiveness of the chatbot's translation capability and to identify its limitations.
When prompted to exhibit a spectrum of emotional expressions, ChatGPT displayed a level of enthusiasm commonly associated with theater enthusiasts. While the sound output didn't achieve utmost realism, the versatility and adaptability of the AI's voice delivery stood out. It caught me off guard with its capability to produce a convincing vocal fry upon request. The introduction of Advanced Voice Mode hasn't overcome the inherent challenges chatbots face, such as consistency in performance, yet its ability to entertain could very well shift attention back to OpenAI. This comes at a time when one of its main rivals, Google, has just unveiled Gemini Live, the vocal interaction feature for its own generative chatbot.
Currently, I'm continuing to experiment with it to discover what works best. I find myself utilizing it primarily when I'm by myself at home, seeking some form of companionship during my article research and video game sessions. The longer I engage in conversations with ChatGPT's Advanced Voice Mode, the more convinced I become that OpenAI's decision to introduce a version that's less inclined to flirt, compared to the initial demonstration, was a smart move. It's important to avoid developing a strong emotional bond.
© 2024 Condé Nast. All rights reserved. WIRED may receive a share of revenue from items bought via our website, as a part of our Affiliate Agreements with retail partners. Content from this site is not to be copied, shared, broadcast, stored, or utilized in any form without explicit written consent from Condé Nast.