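At Mobile World Congress (MWC) 2025, Google announced a major expansion of its Gemini Live AI assistant, introducing real-time video and on-screen query features. These new capabilities let users interact with Gemini through their smartphone's camera feed and on-screen content, enabling real-time, multimodal AI assistance.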

How the New Features Work

Google’s latest Gemini AI update expands its capabilities by integrating Live Video AI Queries and On-Screen AI Interaction, two features designed to bring real-time intelligence to everyday mobile experiences.

They allow users to interact with Gemini in a more natural and intuitive way, without relying solely on text-based prompts.

Live Video AI Queries

The Live Video AI Queries feature enables users to point their smartphone camera at objects, text, or scenes and ask Gemini questions based on what it sees.

Whether identifying unfamiliar landmarks, solving math problems from a written equation, or providing step-by-step guidance on repairing a household item, Gemini can process the live feed and generate relevant responses.

Live Video AI Queries builds on Google’s previous AI-powered image recognition technologies but goes a step further, making the AI assistant capable of analyzing dynamic, real-time video rather than static images.

On-Screen AI Interaction

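The second feature, On-Screen AI Interaction, allows Gemini to analyze content displayed on the user’s phone screen and provide relevant information or assistance. This means users can summon Gemini while reading an article, reviewing a document, or browsing a website to get explanations, summaries, or translations without switching apps.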

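For example, a user reading a scientific paper can ask Gemini to simplify complex terminology, while someone reviewing a contract can request a plain-language summary of the legal text. This seamless integration of AI into everyday browsing and work tasks eliminates the need to copy and paste content into a separate chatbot interface.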

Google’s Strategy: Expanding Gemini AI Beyond Text

Google’s latest AI advancements align with its broader strategy to transform Gemini AI into a full-fledged research and productivity tool.

Earlier in February, Google added its Deep Research feature to the Gemini Android app, which allows Gemini Advanced users to conduct structured investigations by compiling and analyzing multiple sources.

This shift toward multimodal AI interactions—incorporating Google’s Gemini 2.0 Flash Thinking model—suggests a growing emphasis on making AI a real-time, interactive assistant rather than just a chatbot.

The focus on AI-driven mobile experiences at MWC 2025 places Google in direct competition with Apple’s upcoming AI initiatives for iOS 18 and OpenAI’s continued push with new capabilities in ChatGPT, like the live video support for Advanced Voice Mode it added last December.

Google’s Gemini AI, now integrated into Android’s core functionality, aims to position itself as a default AI companion for mobile users.

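Beyond consumer applications, Google’s AI capabilities extend to productivity tools. Just days before MWC, Google integrated Gemini AI into Google Sheets, enabling automated data analysis and visualization in competition with Microsoft’s AI-powered Excel Copilot. The introduction of Gemini 2.0 Pro and Flash-Lite in early February brought significant technical improvements, particularly in reasoning and memory. These models now support a context window of up to 2 million tokens, allowing Gemini to process more information in a single session.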
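
To make the context-window point concrete, the sketch below estimates whether a batch of documents would fit in a model's window. It is only a rough illustration: the four-characters-per-token ratio is a crude assumption for English text rather than Gemini's actual tokenizer, and the function name, the output reserve, and the sample sizes are hypothetical.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English text, not a real tokenizer


def fits_in_context(documents, context_window, reserve_for_output=8_192):
    """Estimate whether the documents fit in a context window.

    context_window and reserve_for_output are token counts. The reserve
    (a hypothetical default) leaves room for the model's response.
    """
    estimated_tokens = sum(len(doc) for doc in documents) // CHARS_PER_TOKEN
    return estimated_tokens + reserve_for_output <= context_window


if __name__ == "__main__":
    # Ten hypothetical documents of 400,000 characters each (about 1M tokens).
    docs = ["x" * 400_000] * 10
    print(fits_in_context(docs, context_window=1_048_576))
    print(fits_in_context(docs, context_window=128_000))
```

Under these assumptions, the same batch fits in a window of roughly one million tokens but not in a 128,000-token one, which is the practical difference a larger window makes for long research sessions.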

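In addition, Google is positioning Gemini as a long-term AI research tool. The Deep Research feature, which enables multi-source research, and Gemini’s memory updates both underscore a role for the AI that goes beyond simple chatbot interactions.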

By incorporating live video and screen analysis, Google is bridging the gap between research-oriented AI tools and everyday user interactions.

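The move points to an eventual shift toward AI-enhanced augmented reality (AR) applications, in which users interact with the world around them through intelligent overlays.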

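Addressing the Challenges of Real-Time AI

While Google’s latest Gemini features mark a significant step forward, they also pose significant challenges. The ability to process live video raises concerns about privacy, security, and accuracy.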

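Ensuring that AI-generated responses are both reliable and free of bias remains a critical challenge for Google and its competitors. Two Harvard students, for example, demonstrated how Meta’s AI-powered Ray-Ban smart glasses, when combined with facial recognition software, could reveal people’s personal details in real time.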

AI Model Benchmarks – LLM Leaderboard

Last updated: Mar 3, 2025

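| Organization | Model | Context | Parameters (B) | Input $/M | Output $/M | License | GPQA | MMLU | MMLU Pro | DROP | HumanEval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| openai | o3 | 128,000 | - | - | - | Proprietary | 87.70% | - | - | - | - |
| anthropic | Claude 3.7 Sonnet | 200,000 | - | $3.00 | $15.00 | Proprietary | 84.80% | - | - | - | - |
| xai | Grok-3 | 128,000 | - | - | - | Proprietary | 84.60% | - | - | - | - |
| xai | Grok-3 Mini | 128,000 | - | - | - | Proprietary | 84.60% | - | - | - | - |
| openai | o3-mini | 200,000 | - | $1.10 | $4.40 | Proprietary | 79.70% | 86.90% | - | - | - |
| openai | o1-pro | 128,000 | - | - | - | Proprietary | 79.00% | - | - | - | - |
| openai | o1 | 200,000 | - | $15.00 | $60.00 | Proprietary | 78.00% | 91.80% | - | - | 88.10% |
| google | Gemini 2.0 Flash Thinking | 1,000,000 | - | - | - | Proprietary | 74.20% | - | - | - | - |
| openai | o1-preview | 128,000 | - | $15.00 | $60.00 | Proprietary | 73.30% | 90.80% | - | - | - |
| deepseek | DeepSeek-R1 | 131,072 | 671 | $0.55 | $2.19 | Open | 71.50% | 90.80% | 84.00% | 92.20% | - |
| anthropic | Claude 3.5 Sonnet | 200,000 | - | $3.00 | $15.00 | Proprietary | 67.20% | 90.40% | 77.60% | 87.10% | 93.70% |
| qwen | QwQ-32B-Preview | 32,768 | 32.5 | $0.15 | $0.20 | Open | 65.20% | - | - | - | - |
| google | Gemini 2.0 Flash | 1,048,576 | - | - | - | Proprietary | 62.10% | - | 76.40% | - | - |
| openai | o1-mini | 128,000 | - | $3.00 | $12.00 | Proprietary | 60.00% | 85.20% | - | - | 92.40% |
| deepseek | DeepSeek-V3 | 131,072 | 671 | $0.27 | $1.10 | Open | 59.10% | 88.50% | 75.90% | 91.60% | - |
| google | Gemini 1.5 Pro | 2,097,152 | - | $2.50 | $10.00 | Proprietary | 59.10% | 85.90% | 75.80% | 74.90% | 84.10% |
| microsoft | Phi-4 | 16,000 | 14.7 | $0.07 | $0.14 | Open | 56.10% | 84.80% | 70.40% | 75.50% | 82.60% |
| xai | Grok-2 | 128,000 | - | $2.00 | $10.00 | Proprietary | 56.00% | 87.50% | 75.50% | - | 88.40% |
| openai | GPT-4o | 128,000 | - | $2.50 | $10.00 | Proprietary | 53.60% | 88.00% | 74.70% | - | - |
| google | Gemini 1.5 Flash | 1,048,576 | - | $0.15 | $0.60 | Proprietary | 51.00% | 78.90% | 67.30% | - | 74.30% |
| xai | Grok-2 mini | 128,000 | - | - | - | Proprietary | 51.00% | 86.20% | 72.00% | - | 85.70% |
| meta | Llama 3.1 405B Instruct | 128,000 | 405 | $0.90 | $0.90 | Open | 50.70% | 87.30% | 73.30% | 84.80% | 89.00% |
| meta | Llama 3.3 70B Instruct | 128,000 | 70 | $0.20 | $0.20 | Open | 50.50% | 86.00% | 68.90% | - | 88.40% |
| anthropic | Claude 3 Opus | 200,000 | - | $15.00 | $75.00 | Proprietary | 50.40% | 86.80% | 68.50% | 83.10% | 84.90% |
| qwen | Qwen2.5 32B Instruct | 131,072 | 32.5 | - | - | Open | 49.50% | 83.30% | 69.00% | - | 88.40% |
| qwen | Qwen2.5 72B Instruct | 131,072 | 72.7 | $0.35 | $0.40 | Open | 49.00% | - | 71.10% | - | 86.60% |
| openai | GPT-4 Turbo | 128,000 | - | $10.00 | $30.00 | Proprietary | 48.00% | 86.50% | - | 86.00% | 87.10% |
| amazon | Nova Pro | 300,000 | - | $0.80 | $3.20 | Proprietary | 46.90% | 85.90% | - | 85.40% | 89.00% |
| meta | Llama 3.2 90B Instruct | 128,000 | 90 | $0.35 | $0.40 | Open | 46.70% | 86.00% | - | - | - |
| qwen | Qwen2.5 14B Instruct | 131,072 | 14.7 | - | - | Open | 45.50% | 79.70% | 63.70% | - | 83.50% |
| mistral | Mistral Small 3 | 32,000 | 24 | $0.07 | $0.14 | Open | 45.30% | - | 66.30% | - | 84.80% |
| qwen | Qwen2 72B Instruct | 131,072 | 72 | - | - | Open | 42.40% | 82.30% | 64.40% | - | 86.00% |
| amazon | Nova Lite | 300,000 | - | $0.06 | $0.24 | Proprietary | 42.00% | 80.50% | - | 80.20% | 85.40% |
| meta | Llama 3.1 70B Instruct | 128,000 | 70 | $0.20 | $0.20 | Open | 41.70% | 83.60% | 66.40% | 79.60% | 80.50% |
| anthropic | Claude 3.5 Haiku | 200,000 | - | $0.10 | $0.50 | Proprietary | 41.60% | - | 65.00% | 83.10% | 88.10% |
| anthropic | Claude 3 Sonnet | 200,000 | - | $3.00 | $15.00 | Proprietary | 40.40% | 79.00% | 56.80% | 78.90% | 73.00% |
| openai | GPT-4o mini | 128,000 | - | $0.15 | $0.60 | Proprietary | 40.20% | 82.00% | - | 79.70% | 87.20% |
| amazon | Nova Micro | 128,000 | - | $0.04 | $0.14 | Proprietary | 40.00% | 77.60% | - | 79.30% | 81.10% |
| google | Gemini 1.5 Flash 8B | 1,048,576 | 8 | $0.07 | $0.30 | Proprietary | 38.40% | - | 58.70% | - | - |
| ai21 | Jamba 1.5 Large | 256,000 | 398 | $2.00 | $8.00 | Open | 36.90% | 81.20% | 53.50% | - | - |
| microsoft | Phi-3.5-MoE-instruct | 128,000 | 60 | - | - | Open | 36.80% | 78.90% | 54.30% | - | 70.70% |
| qwen | Qwen2.5 7B Instruct | 131,072 | 7.6 | $0.30 | $0.30 | Open | 36.40% | - | 56.30% | - | 84.80% |
| xai | Grok-1.5 | 128,000 | - | - | - | Proprietary | 35.90% | 81.30% | 51.00% | - | 74.10% |
| openai | GPT-4 | 32,768 | - | $30.00 | $60.00 | Proprietary | 35.70% | 86.40% | - | 80.90% | 67.00% |
| anthropic | Claude 3 Haiku | 200,000 | - | $0.25 | $1.25 | Proprietary | 33.30% | 75.20% | - | 78.40% | 75.90% |
| meta | Llama 3.2 11B Instruct | 128,000 | 10.6 | $0.06 | $0.06 | Open | 32.80% | 73.00% | - | - | - |
| meta | Llama 3.2 3B Instruct | 128,000 | 3.2 | $0.01 | $0.02 | Open | 32.80% | 63.40% | - | - | - |
| ai21 | Jamba 1.5 Mini | 256,144 | 52 | $0.20 | $0.40 | Open | 32.30% | 69.70% | 42.50% | - | - |
| openai | GPT-3.5 Turbo | 16,385 | - | $0.50 | $1.50 | Proprietary | 30.80% | 69.80% | - | 70.20% | 68.00% |
| meta | Llama 3.1 8B Instruct | 131,072 | 8 | $0.03 | $0.03 | Open | 30.40% | 69.40% | 48.30% | 59.50% | 72.60% |
| microsoft | Phi-3.5-mini-instruct | 128,000 | 3.8 | $0.10 | $0.10 | Open | 30.40% | 69.00% | 47.40% | - | 62.80% |
| google | Gemini 1.0 Pro | 32,760 | - | $0.50 | $1.50 | Proprietary | 27.90% | 71.80% | - | - | - |
| qwen | Qwen2 7B Instruct | 131,072 | 7.6 | - | - | Open | 25.30% | 70.50% | 44.10% | - | - |
| mistral | Codestral-22B | 32,768 | 22.2 | $0.20 | $0.60 | Open | - | - | - | - | 81.10% |
| cohere | Command R+ | 128,000 | 104 | $0.25 | $1.00 | Open | - | 75.70% | - | - | - |
| deepseek | DeepSeek-V2.5 | 8,192 | 236 | $0.14 | $0.28 | Open | - | 80.40% | - | - | 89.00% |
| google | Gemma 2 27B | 8,192 | 27.2 | - | - | Open | - | 75.20% | - | - | 51.80% |
| google | Gemma 2 9B | 8,192 | 9.2 | - | - | Open | - | 71.30% | - | - | 40.20% |
| xai | Grok-1.5V | 128,000 | - | - | - | Proprietary | - | - | - | - | - |
| moonshotai | Kimi-k1.5 | 128,000 | - | - | - | Proprietary | - | 87.40% | - | - | - |
| nvidia | Llama 3.1 Nemotron 70B Instruct | 128,000 | 70 | - | - | Open | - | 80.20% | - | - | - |
| mistral | Ministral 8B Instruct | 128,000 | 8 | $0.10 | $0.10 | Open | - | 65.00% | - | - | 34.80% |
| mistral | Mistral Large 2 | 128,000 | 123 | $2.00 | $6.00 | Open | - | 84.00% | - | - | 92.00% |
| mistral | Mistral NeMo Instruct | 128,000 | 12 | $0.15 | $0.15 | Open | - | 68.00% | - | - | - |
| mistral | Mistral Small | 32,768 | 22 | $0.20 | $0.60 | Open | - | - | - | - | - |
| microsoft | Phi-3.5-vision-instruct | 128,000 | 4.2 | - | - | Open | - | - | - | - | - |
| mistral | Pixtral-12B | 128,000 | 12.4 | $0.15 | $0.15 | Open | - | 69.20% | - | - | 72.00% |
| mistral | Pixtral Large | 128,000 | 124 | $2.00 | $6.00 | Open | - | - | - | - | - |
| qwen | QvQ-72B-Preview | 32,768 | 73.4 | - | - | Open | - | - | - | - | - |
| qwen | Qwen2.5-Coder 32B Instruct | 128,000 | 32 | $0.09 | $0.09 | Open | - | 75.10% | 50.40% | - | 92.70% |
| qwen | Qwen2.5-Coder 7B Instruct | 128,000 | 7 | - | - | Open | - | 67.60% | 40.10% | - | 88.40% |
| qwen | Qwen2-VL-72B-Instruct | 32,768 | 73.4 | - | - | Open | - | - | - | - | - |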
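
As a rough illustration of how the pricing columns in the leaderboard can be used, the sketch below estimates the cost of a single request. It assumes that "Input $/M" and "Output $/M" are US dollar prices per million tokens, which the leaderboard does not state outright. The function name and the sample workload are hypothetical, and the three prices are copied from the leaderboard rows for those models.

```python
# Assumption: prices are USD per million tokens, taken from the
# leaderboard's "Input $/M" and "Output $/M" columns.
PRICES = {
    "Claude 3.7 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (2.50, 10.00),
    "DeepSeek-R1": (0.55, 2.19),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request to the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 prompt tokens and 1,000 response tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 20_000, 1_000):.4f}")
```

Because output tokens are priced several times higher than input tokens for most listed models, response length can dominate the bill even when prompts are long.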