OpenAI unveils Realtime API and other features for developers


OpenAI didn’t release any new models at its Dev Day event, but the new API features it announced will excite developers who want to build powerful apps on its models.

OpenAI has had a tough few weeks, with its CTO, Mira Murati, and other senior researchers joining the ever-growing list of former employees. The company is also under increasing pressure from rival flagship models, including open-source models that offer developers cheaper, highly capable options.

The new features OpenAI unveiled were the Realtime API (in beta), vision fine-tuning, and efficiency-boosting tools like prompt caching and model distillation.

Realtime API

The Realtime API is the most exciting new feature, albeit in beta. It enables developers to build low-latency, speech-to-speech experiences in their apps without using separate models for speech recognition and text-to-speech conversion.

With this API, developers can now create apps that support real-time spoken conversations with AI, such as voice assistants or language learning tools, all over a single persistent WebSocket connection rather than a pipeline of separate calls. It’s not quite the seamless experience that ChatGPT’s Advanced Voice Mode offers, but it’s close.
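To give a sense of the API’s shape, here is a minimal sketch of a session over the raw WebSocket interface, using the event names from OpenAI’s beta documentation (response.create, response.audio.delta, response.done). Since the API is in beta, treat the model name and event payloads as subject to change; the sketch assumes the third-party websockets package, whose header keyword is extra_headers in releases before version 14 and additional_headers afterwards.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    # Use extra_headers= instead on websockets < 14.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the model for a spoken (and transcribed) response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user and ask how you can help.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.audio.delta":
                pass  # event["delta"] holds a base64-encoded audio chunk
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

In a real app, the audio deltas would be decoded and streamed to the user’s speakers while microphone audio is sent back up the same socket.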

It’s not cheap, though: approximately $0.06 per minute of audio input and $0.24 per minute of audio output.

Vision fine-tuning

Vision fine-tuning within the API allows developers to enhance their models’ ability to understand and interact with images. By fine-tuning GPT-4o using images, developers can create applications that excel in tasks like visual search or object detection.
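Training data uses the same chat-style JSONL format as text fine-tuning, with images added to the user turn. Here is a minimal sketch of preparing and submitting one example, assuming the gpt-4o-2024-08-06 snapshot named in OpenAI’s announcement; the traffic-sign task, file name, and image URL are hypothetical, and a real job needs many more examples.

```python
import json
from openai import OpenAI

# One training example in the chat-format JSONL the fine-tuning API expects.
# Images can be public URLs or base64 data URIs; the assistant turn is the
# label the model learns to produce.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What traffic sign is shown?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sign.jpg"}},
            ],
        },
        {"role": "assistant", "content": "A 30 km/h speed-limit sign."},
    ]
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # the snapshot vision fine-tuning launched on
)
```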

This feature is already being leveraged by companies like Grab, which improved the accuracy of its mapping service by fine-tuning the model to recognize traffic signs from street-level images.

OpenAI also gave an example of how GPT-4o could generate additional content for a website after being fine-tuned to stylistically match the site’s existing content.

Prompt caching

To improve cost efficiency, OpenAI introduced prompt caching, a tool that reduces the cost and latency of frequently repeated API calls. When a prompt shares a prefix with a recently processed one, the cached portion of the input tokens is billed at a 50% discount and processed faster. This feature is especially useful for applications that resend long context on every call, like chatbots and customer service tools.
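Caching applies automatically to prompts past a minimum length (1,024 tokens at launch) and matches on exact prefixes, so the practical rule is to put static content first and variable content last. A minimal sketch, assuming a suitably long system prompt; the cached_tokens field in the usage object, exposed by recent versions of the openai Python SDK, reports how much of the prompt was served from cache:

```python
from openai import OpenAI

client = OpenAI()

# Static content (instructions, few-shot examples) goes first so every call
# shares the same prefix; only the final user turn varies.
LONG_SYSTEM_PROMPT = "..."  # imagine >1,024 tokens of instructions/examples

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": LONG_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": question},              # varies per call
        ],
    )
    # How many prompt tokens came from cache (0 on a cold call).
    print("cached tokens:", response.usage.prompt_tokens_details.cached_tokens)
    return response.choices[0].message.content
```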

Price comparison of cached and uncached input tokens for OpenAI’s API. Source: OpenAI

Model distillation

Model distillation allows developers to fine-tune smaller, more cost-efficient models using the outputs of larger, more capable ones. This is a game-changer because distillation previously required multiple disconnected steps and tools, making it a time-consuming and error-prone process.

Before OpenAI’s integrated Model Distillation feature, developers had to manually orchestrate different parts of the process, like generating data from larger models, preparing fine-tuning datasets, and measuring performance with various tools.

Developers can now automatically store output pairs from larger models like GPT-4o and use those pairs to fine-tune smaller models like GPT-4o-mini. The whole process of dataset creation, fine-tuning, and evaluation can be done in a more structured, automated, and efficient way.
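The entry point is the store parameter on the Chat Completions API, which logs input/output pairs as stored completions. A minimal sketch of that logging step; the metadata tags are hypothetical and simply make the pairs easier to filter later when assembling the distillation dataset:

```python
from openai import OpenAI

client = OpenAI()

# store=True persists this input/output pair as a stored completion that can
# later be reviewed, filtered by the metadata tags, and exported as a
# fine-tuning dataset for a smaller model.
response = client.chat.completions.create(
    model="gpt-4o",
    store=True,
    metadata={"task": "ticket-triage", "version": "v1"},
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)
```

From there, the stored pairs can be turned into a fine-tuning job for a smaller model like GPT-4o-mini and scored with evals to check that the distilled model holds up.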

The streamlined developer workflow, lower latency, and reduced costs make OpenAI’s GPT-4o models an attractive prospect for developers looking to ship powerful apps quickly. It will be interesting to see which applications these multimodal features make possible.