Unlocking the Future of AI: The Allen Institute Releases Groundbreaking Open Source AI Model with Visual Abilities
The Latest Open Source AI Model Enhances AI Agents' Capabilities
A new open source AI model with improved visual capabilities could let more developers, researchers, and startups build AI agents that carry out useful tasks on your computer.
Today, the Allen Institute for AI (Ai2) introduced the Multimodal Open Language Model, or Molmo, which can interpret images as well as converse through a chat interface. That means it can make sense of what's on a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating file directories, and drafting documents.
"With the launch of this technology, it's now possible for a wider audience to implement a multimodal model," states Ali Farhadi, head of Ai2, a Seattle, Washington-based research institute, and a computer science professor at the University of Washington. "This should pave the way for the development of cutting-edge applications."
AI agents are widely touted as the next big thing in artificial intelligence, with OpenAI, Google, and others racing to develop them. The term has become a buzzword of late, but the grand vision is for AI to go well beyond chatting, reliably taking complex, sophisticated actions on computers when given a command. That capability has yet to materialize at any kind of scale.
Some of the most powerful AI models, including OpenAI's GPT-4, Anthropic's Claude, and Google DeepMind's Gemini, can already process visual input, and they are being used to power some experimental AI agents. But those models are closed, accessible only through a paid API.
Meta has released a family of AI models called Llama under a license that limits their commercial use, but it has yet to provide developers with a multimodal version. Meta is expected to announce several new products, perhaps including new Llama models, at its Connect event today.
"An open source, multimodal model allows any startup or researcher with an idea to attempt it," states Ofir Press, a postdoctoral researcher at Princeton University focusing on AI agents.
Because Molmo is open source, Press says, developers will find it easier to fine-tune their agents for specific tasks, such as working with spreadsheets, by supplying additional training data. Models like GPT-4 allow only limited fine-tuning through their APIs, whereas a fully open model can be modified extensively. "Having access to an open-source model opens up a plethora of possibilities," Press says.
Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a much smaller 1-billion-parameter one designed to run on mobile devices. A model's parameter count refers to the number of adjustable values it contains for storing and processing data, and it roughly corresponds to the model's capabilities.
Ai2 says that despite its relatively small size, Molmo performs as well as considerably larger commercial models because it was carefully trained on high-quality data. Unlike Meta's Llama, the new model is also fully open source, with no restrictions on its use. Ai2 is releasing the training data behind the model as well, giving researchers deeper insight into how it works.
Releasing powerful models is not without risk. Such models could be adapted for nefarious ends; we may someday, for instance, see the emergence of malicious AI agents designed to automate cyberattacks.
Ai2's Farhadi argues that Molmo's efficiency and portability will let developers build more powerful software that runs natively on smartphones and other portable devices. "The billion-parameter model is now performing at the level of, or in the league of, models that are at least 10 times bigger," he says.
Building useful AI agents may depend on more than just more efficient multimodal models, however. A key challenge is making the models work more reliably. That may well require further breakthroughs in AI's reasoning abilities, something OpenAI has sought to tackle with its latest model, o1, which can work through problems step by step. The next step may be to give multimodal models such reasoning abilities.
Currently, the launch of Molmo indicates that AI agents are on the brink of becoming more accessible, potentially extending their utility beyond the major players dominating the AI industry.