OpenAI announces multimodal desktop GPT with new voice and vision capabilities

May 14, 2024
OpenAI has unveiled a new multimodal version of its desktop GPT, dubbed GPT-4o, which accepts text, voice, and visual prompts and responds at nearly human-like speeds. The upgraded model can quickly interpret and respond to diverse inputs such as photos, documents, facial expressions, and handwritten information. Although these capabilities mark significant progress, analysts note that OpenAI is catching up to competitors like Google, which had previously introduced advanced multimodal models. GPT-4o is also set to include improved memory functions, enhancing its ability to learn from previous interactions. Despite these advancements, GPT-4o's context window size remains unchanged, and the model faces challenges regarding potential misuse in real-time applications.
