Microsoft has strengthened its AI portfolio by launching Phi-4-mini and Phi-4-multimodal, expanding its Phi-4 family. These new models reinforce the company’s focus on developing compact AI systems that maintain high efficiency while delivering performance on par with larger models.

The introduction of Phi-4-mini, a lightweight text-based AI model, and Phi-4-multimodal, which incorporates image-processing capabilities, positions Microsoft to compete in the growing sector of small, high-performance AI.

The update follows Microsoft’s decision to open-source Phi-4 in January 2025, making it freely available under an MIT license.

Phi-4-mini continues this trend of accessibility, while Phi-4-multimodal introduces capabilities that align with recent AI advancements by OpenAI, Google, and Meta.

Following this success, Microsoft took the significant step of releasing the Phi-4 model on Hugging Face. Microsoft engineer Shital Shah confirmed the decision, noting: “A lot of people have been asking us to release the weights. A few even uploaded bootlegged Phi-4 weights on Hugging Face. Well, wait no more. Today, we are releasing the official Phi-4 model on Hugging Face! With MIT license!!”
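
For developers who want to try the open-weight release, the standard Hugging Face transformers workflow applies. Here is a minimal sketch, assuming the "microsoft/phi-4" repository id and its default chat template (check the model card for exact requirements):

```python
# Minimal sketch: load the open-weight Phi-4 release from Hugging Face and
# generate a reply. The repo id and chat usage are assumptions based on the
# public release described above; consult the model card before relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # MIT-licensed weights published by Microsoft
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps memory manageable
    device_map="auto",           # place layers on available GPUs/CPU
)

messages = [{"role": "user", "content": "Summarize grouped-query attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```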

Phi-4-multimodal is a 5.6B parameter model that seamlessly integrates speech, vision, and text processing into a single, unified architecture. According to Microsoft, the “model enables more natural and context-aware interactions, allowing devices to understand and reason across multiple input modalities simultaneously.”

“Whether interpreting spoken language, analyzing images, or processing textual information, it delivers highly efficient, low-latency inference—all while optimizing for on-device execution and reduced computational overhead.”

Phi-4-multimodal can process visual and audio inputs together, and achieves much stronger performance on multiple benchmarks than other existing state-of-the-art omni models.

Phi-4-multimodal audio and visual benchmarks (Source: Microsoft)

Phi-4-multimodal has also demonstrated strong capabilities in speech-related tasks, emerging as a leading open model in multiple areas. It outperforms specialized models like WhisperV3 and SeamlessM4T-v2-Large in both automatic speech recognition (ASR) and speech translation (ST), and has claimed the top position on the Hugging Face OpenASR leaderboard with an impressive word error rate of 6.14%.
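
For context, word error rate (WER) counts the minimum number of word-level substitutions, deletions, and insertions needed to turn a model’s transcript into the reference, divided by the number of reference words. A minimal sketch, using made-up transcripts for illustration:

```python
# Word error rate via word-level Levenshtein distance:
# WER = (substitutions + deletions + insertions) / reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution over six reference words -> WER of about 0.167.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```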

According to Microsoft, “The model has a gap with close models, such as Gemini-2.0-Flash and GPT-4o-realtime-preview, on speech question answering (QA) tasks as the smaller model size results in less capacity to retain factual QA knowledge.”

Phi-4-multimodal speech benchmarks (Source: Microsoft)

Despite its smaller size of only 5.6B parameters, Phi-4-multimodal demonstrates remarkable vision capabilities across various benchmarks, most notably achieving strong performance on mathematical and science reasoning.

According to Microsoft, it “maintains competitive performance on general multimodal capabilities, such as document and chart understanding, Optical Character Recognition (OCR), and visual science reasoning, matching or exceeding close models like Gemini-2-Flash-lite-preview/Claude-3.5-Sonnet.”

Phi-4-multimodal vision benchmarks (Source: Microsoft)

The other model, Phi-4-mini, is a 3.8B parameter model with a dense, decoder-only transformer architecture featuring grouped-query attention, a 200,000-token vocabulary, and shared input-output embeddings. It supports sequences of up to 128,000 tokens with high accuracy and scalability.
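
Grouped-query attention, one of the architecture choices named above, shares each key/value head across several query heads, shrinking the key/value cache during long-context inference without giving up multi-head queries. A minimal PyTorch sketch, with dimensions that are illustrative rather than Phi-4-mini’s actual configuration:

```python
# Illustrative grouped-query attention: each key/value head serves a group of
# query heads, reducing KV-cache memory. All sizes here are made up for the demo.
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 1, 16, 256
n_q_heads, n_kv_heads = 8, 2          # 4 query heads share each KV head
head_dim = d_model // n_q_heads

x = torch.randn(batch, seq_len, d_model)
w_q = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
w_k = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
w_v = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

q = w_q(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
k = w_k(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
v = w_v(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

# Repeat each KV head so every group of query heads attends to its shared KV.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

# Causal masking matches the decoder-only setting described above.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 32])
```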

According to Microsoft, “Phi-4-mini can reason through queries, identify and call relevant functions with appropriate parameters, receive function outputs, and incorporate those results into its responses. This creates an extensible agent-based system where the model’s capabilities can be enhanced by connecting it to external tools, application programming interfaces (APIs), and data sources through well-defined function interfaces.” A minimal sketch of such a function-calling loop follows.
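
The loop below sketches that pattern in a model-agnostic way; the generate() stub, the JSON call format, and the get_weather tool are hypothetical stand-ins, not Microsoft’s actual function-calling API:

```python
# Hypothetical function-calling loop in the style described above. The
# generate() stub stands in for a real Phi-4-mini inference call; the tool
# registry and JSON call format are illustrative assumptions.
import json

def get_weather(city: str) -> str:
    """Hypothetical external tool the model can invoke."""
    return f"22°C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def generate(messages: list[dict]) -> str:
    """Stub for a model call; a real system would run Phi-4-mini here."""
    return json.dumps({"function": "get_weather", "arguments": {"city": "Redmond"}})

messages = [{"role": "user", "content": "What's the weather in Redmond?"}]
reply = generate(messages)

call = json.loads(reply)
if call.get("function") in TOOLS:
    # Execute the requested tool with the model-supplied arguments...
    result = TOOLS[call["function"]](**call["arguments"])
    # ...then feed the output back so the model can compose a final answer.
    messages.append({"role": "tool", "name": call["function"], "content": result})
    print(messages[-1])
```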

Why Microsoft Is Betting on Smaller AI Models

Phi-4-mini and Phi-4-multimodal fit into Microsoft’s broader shift toward AI models that balance performance with accessibility, as enterprises increasingly explore smaller models that do not require high-end GPUs or extensive cloud resources.

One of the main drivers of this strategy is synthetic data, which Microsoft used to refine Phi-4’s problem-solving abilities. By training the AI on curated synthetic datasets rather than relying solely on web-scraped content, Microsoft can ensure better logical reasoning without unnecessary computational overhead. This approach played a key role in Phi-4’s strong mathematical performance, reinforcing the idea that well-trained small models can challenge larger AI systems.

Another key element is Microsoft’s decision to balance open-source accessibility with enterprise cloud integration. By making Phi-4-mini publicly available while keeping Phi-4-multimodal within the Azure ecosystem, Microsoft caters both to independent developers and to enterprises that rely on managed AI solutions.

This dual approach contrasts with OpenAI, which has restricted access to its latest models, and Mistral AI, which has focused on local deployment rather than cloud-based AI services.

Competition From Hugging Face, Mistral AI, and Google

Microsoft’s expansion of the Phi-4 series comes at a time when other companies are prioritizing efficient, smaller-scale AI models. Hugging Face has introduced SmolVLM-256M and SmolVLM-500M, lightweight multimodal models designed to run on low-power devices with less than 1GB of RAM. These models target developers seeking AI solutions that do not require high-end infrastructure, putting them in direct competition with Microsoft’s Phi-4-multimodal.

Mistral AI has also strengthened its position by releasing Ministral 3B and Ministral 8B, two compact models designed for on-device processing. Unlike cloud-dependent AI, these models are built to function entirely on local hardware, addressing the growing demand for privacy-focused AI that does not require an internet connection. According to Mistral, “customers have been pushing for options that do not rely on cloud infrastructure but still deliver fast response times.” The company also claims the models outperform comparable offerings from Microsoft and Google.

Google has followed these developments as well, offering efficient models within its AI ecosystem that can be deployed on Google Cloud through Vertex AI. Microsoft’s Phi-4 lineup now enters a market that is rapidly evolving toward accessible, multimodal, and locally processed AI solutions.

Multimodal AI is particularly useful for automating document analysis, search indexing, and AI-driven research, areas where Microsoft has established interests. By integrating these capabilities into Phi-4, Microsoft extends its AI applications beyond traditional text-based models while maintaining the efficiency advantages of its compact architecture.

Microsoft’s Position in a Fiercely Competitive AI Market

Competition in the AI space is intensifying as more companies prioritize models that are both efficient and scalable. Mistral AI’s expansion into the Asia-Pacific market and its IPO plans highlight the growing investment in lightweight AI. Meanwhile, Hugging Face continues to solidify its position as an open-source AI leader, offering smaller, adaptable AI systems as alternatives to proprietary models.

Microsoft’s AI strategy remains distinctive in that it bridges the gap between open research and commercial AI deployment. While the company financially backs OpenAI, its own AI division is building models that serve as alternatives to OpenAI’s closed-source approach. This places Microsoft in the position of being both a backer of and a competitor in the evolving AI landscape.

As AI adoption grows across industries, demand is rising for models that can run efficiently on a wide range of hardware. Microsoft’s latest Phi-4 releases suggest that small, high-performance models may play a larger role in enterprise AI development. Rather than focusing solely on increasing parameter counts, companies are now optimizing training techniques and fine-tuning architectures to improve efficiency without compromising accuracy.
