AI Chatbots Produce Instant Imagery and Mechanical Robots Embrace Learning
At a tech event in San Francisco last November, Sam Altman, CEO of the AI company OpenAI, was asked what surprises the field of artificial intelligence might hold in 2024.
His response was quick and to the point: "Chatbots, such as GPT-3, will make a quantum leap that nobody expected." Alongside him was James Manyika, an executive at Google, who nodded and added, "I second that statement."
Rapid Improvements
One attribute defines the AI industry this year: rapid, steady improvement of the technology, with each advance building on the last, enabling AI to generate new types of media, mimic human reasoning in new ways, and extend into the real world through a new breed of robots.
In the coming months, AI-powered image generators like DALL-E and Midjourney will produce videos and images instantaneously, gradually merging with chatbots like ChatGPT.
This means chatbots will evolve beyond digital text to interact with images, videos, graphs, and other media types, exhibiting behavior closer to human reasoning by performing more complex tasks in fields such as mathematics and science. As the technology moves into robots, it will also help solve problems outside the digital realm.
Many of these advancements have been taking shape inside leading research laboratories and tech products since last year. In 2024, however, these products will grow more powerful and will be used by far more people.
David Luan, CEO of the AI start-up Adept, believes that "the fast-paced advancement of AI is relentless and inevitable."
Companies like OpenAI and Google are advancing AI far more quickly than other technologies because of the way its underlying systems are built.
Most software applications are built by engineers one piece of computer code at a time, a slow and tedious process. In contrast, companies are improving AI more rapidly because the technology is based on neural networks, computational systems capable of learning skills by analyzing digital data. A neural network can learn to generate text on its own by identifying patterns in diverse data, such as Wikipedia articles, books, and digital texts drawn from the internet.
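The idea of learning to generate text from patterns in data, rather than from hand-written rules, can be illustrated with a toy sketch. The example below is not a neural network and is not how any company's system is built; it is a simple word-pair counter, and the tiny corpus and the function names are made up for illustration. It only shows the general principle of predicting the next word from patterns observed in text.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real systems learn from vastly more text.
TOY_CORPUS = [
    "the robot folds the shirt",
    "the robot moves the box",
    "the robot folds the towel",
]

def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts

def predict_next(model, word):
    """Return the most frequently observed follower of `word`, or None."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(model, start, max_words=4):
    """Extend `start` one predicted word at a time, up to `max_words` words."""
    words = [start]
    while len(words) < max_words:
        next_word = predict_next(model, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)

if __name__ == "__main__":
    model = train_bigram_model(TOY_CORPUS)
    print(generate(model, "the"))
```

No rule in this sketch says that "robot" tends to follow "the"; that regularity comes entirely from the counts gathered from the example sentences. Neural networks learn far richer patterns, but the contrast with hand-coded software is the same.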
Changes in 2024
Here is a guide to how AI will change this year, starting with the nearest-term developments, which are expected to propel the technology's capabilities further.
* Instant Videos: Until now, AI-powered applications have mostly generated text and still images in response to prompts. For example, DALL-E can create images resembling real photographs within seconds based on requests such as: "A unicorn diving in front of the Golden Gate Bridge."
This year will likely see companies like OpenAI, Google, Meta, and Runway unveil image generators that allow users to create videos as well. Some of these companies have already built prototype tools that create instant videos based on short text prompts.
Moreover, these companies will probably seek to integrate the powers of image and video generation into chatbots to enhance the latter's capabilities.
* Multimodal Chatbots: Chatbots and image generators, initially developed as separate tools, are merging; last year, OpenAI launched a new version of ChatGPT capable of generating images as well as text.
AI companies are building "multimodal systems," meaning their AI can handle multiple media types. These systems learn their skills by analyzing images, text, and perhaps other media like graphs, sounds, and videos, so they can produce their own text, images, and sounds. Because these systems also learn the relationships between different media types, they will be able to understand one type of media and respond in another. In other words, someone may feed an image into a chatbot, and the chatbot will respond with text.
* Improved "Logic" and AI Agents: When Altman speaks of AI's quantum leap, he refers to chatbots with better "logical thinking" that can perform more complex tasks, such as solving tricky mathematical problems and generating detailed computer programs.
The goal is to build systems capable of solving a problem carefully and logically through a series of discrete steps, each building on the last, since this is how human reasoning works, at least in some cases.
There is a debate among top scientists over whether chatbots are truly capable of thinking with such logic. Some argue that these systems show little real logic and merely repeat behavior observed in internet data. However, OpenAI and others are building systems that respond correctly to difficult questions on topics such as mathematics, computer programming, physics, and other sciences.
Nick Frost, a former Google researcher and assistant director at the AI start-up Cohere, believes that "higher trust levels in these systems will increase their popularity." If chatbots truly become more logical, they can easily transition into "AI agents."
* AI Agents: Tech companies are teaching AI systems how to handle complex problems step by step, and in the same manner, they can also improve the ability of chatbots to use software applications and websites on behalf of users.
Researchers are working to turn chatbots into a new type of autonomous system called an "AI agent." This means a chatbot will become capable of using software applications, websites, and other online tools, such as calendars and travel sites, allowing people to finally delegate their work to such agents. However, this could also lead to AI agents wholly taking over certain jobs.
Today, chatbots already operate as agents for simple tasks, such as scheduling meetings, editing files, analyzing data, and creating graphs. But the performance of these tools is not always up to standard and can break down when faced with more complex tasks.
This year, AI companies are expected to unveil more capable agents; Luan anticipates that "users will be able to delegate any tedious and tiresome tasks from their day-to-day work on the computer to such an agent."
These tasks include tracking expenses in an app like QuickBooks or logging vacation days in an app like Workday. In the long term, agents' capabilities will expand beyond software and internet services to the world of robotics.
* Smarter Robots: In the past, robots were programmed to perform the same task over and over again, like moving boxes of similar size and shape. Today, researchers use the same technology that powers chatbots to enable traditional robots to tackle more complicated and perhaps unfamiliar tasks.
Just as a chatbot learns to anticipate the next word in a sentence by analyzing vast amounts of digital text, robots can learn to predict what happens in the real world by analyzing countless videos showing objects and bodies being lifted and moved.
This will be a year of integrating AI's powers into robots that mostly work behind the scenes, such as mechanical arms folding shirts in laundromats or sorting goods in warehouses. Tech titans, including Elon Musk, are also working to bring humanoid robots into people's homes.