    NVIDIA Releases Dynamo v0.9.0: A Massive Infrastructure Overhaul Featuring FlashIndexer, Multi-Modal Support, and Removed NATS and ETCD

    February 20, 2026

    NVIDIA has just released Dynamo v0.9.0. This is the most significant infrastructure upgrade for the distributed inference framework to date. This update simplifies how large-scale models are deployed and managed. The release focuses on removing heavy dependencies and improving how GPUs handle multi-modal data.

    The Great Simplification: Removing NATS and etcd

    The biggest change in v0.9.0 is the removal of NATS and etcd. In previous versions, these tools handled messaging and service discovery, respectively. However, they added an ‘operational tax’ by requiring teams to run and maintain extra clusters.

    NVIDIA replaced these with a new Event Plane and a Discovery Plane. The system now uses ZMQ (ZeroMQ) for high-performance transport and MessagePack for data serialization. For teams using Kubernetes, Dynamo now supports Kubernetes-native service discovery. This change makes the infrastructure leaner and easier to maintain in production environments.
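To give a feel for why a compact binary format suits an event plane, here is a minimal sketch that hand-encodes a small subset of the MessagePack format using only the Python standard library. It is not Dynamo's code: the event fields (`worker_id`, `kind`, `blocks`) are hypothetical, and a real deployment would presumably use a full MessagePack library with a ZeroMQ socket for transport.

```python
import json
import struct


def pack(obj):
    """Encode a small subset of MessagePack (nil, bool, small unsigned ints,
    short strings, short lists and short dicts). Illustration only."""
    if obj is None:
        return b"\xc0"
    if obj is True:
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                      # positive fixint
        if 0 <= obj <= 0xFF:
            return b"\xcc" + bytes([obj])            # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)  # uint 16
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)  # uint 32
        raise ValueError("integer outside the subset handled by this sketch")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw    # fixstr
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw  # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, list):
        if len(obj) > 15:
            raise ValueError("list too long for this sketch")
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("dict too long for this sketch")
        out = bytes([0x80 | len(obj)])               # fixmap
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Hypothetical event a worker might publish; the field names are assumptions.
event = {"worker_id": 7, "kind": "kv_stored", "blocks": [1, 2, 3]}
packed = pack(event)
as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
print(len(packed), "bytes as MessagePack vs", len(as_json), "bytes as JSON")
```

The binary form drops quotes, colons and commas in favour of one-byte type and length headers, which is where the size saving comes from.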

    Multi-Modal Support and the E/P/D Split

    Dynamo v0.9.0 expands multi-modal support across 3 main backends: vLLM, SGLang, and TensorRT-LLM. This allows models to process text, images, and video more efficiently.

    A key feature in this update is the E/P/D (Encode/Prefill/Decode) split. In standard setups, a single GPU often handles all 3 stages. This can cause bottlenecks during heavy video or image processing. v0.9.0 introduces Encoder Disaggregation. You can now run the Encoder on a separate set of GPUs from the Prefill and Decode workers. This allows you to scale your hardware based on the specific needs of your model.
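As a rough illustration of why separate pools help, the sketch below sizes each stage independently from its own per-request GPU time. All numbers and names (`stage_seconds`, `target_utilization`) are hypothetical and are not taken from Dynamo; the estimate is simply arrival rate multiplied by service time, with some headroom.

```python
import math


def workers_needed(requests_per_sec, seconds_per_request, target_utilization=0.7):
    """Estimate workers for one stage: busy workers = arrival rate x service time,
    divided by the utilization we are willing to run at."""
    busy = requests_per_sec * seconds_per_request
    return max(1, math.ceil(busy / target_utilization))


# Hypothetical GPU seconds each stage spends per request on a video-heavy workload.
stage_seconds = {"encode": 0.40, "prefill": 0.15, "decode": 1.20}


def size_pools(requests_per_sec):
    """Return an independent worker count for each of the three stages."""
    return {
        stage: workers_needed(requests_per_sec, seconds)
        for stage, seconds in stage_seconds.items()
    }


plan = size_pools(requests_per_sec=10)
print(plan)
```

With a single combined pool, every GPU would have to be provisioned for the sum of all three stages. Sizing them separately lets an encode-heavy workload add encoder GPUs without also adding decode capacity.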

    Sneak Preview: FlashIndexer

    This release includes a sneak preview of FlashIndexer. This component is designed to solve latency issues in distributed KV cache management.

    When working with large context windows, moving Key-Value (KV) data between GPUs is a slow process. FlashIndexer improves how the system indexes and retrieves these cached tokens. This results in a lower Time to First Token (TTFT). While still a preview, it represents a major step toward making distributed inference feel as fast as local inference.
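The article does not describe how FlashIndexer works internally, so the following is only a generic sketch of the kind of lookup a distributed KV cache index has to answer: which worker already holds the longest cached prefix of a new prompt. The class name `PrefixIndex`, the block size, and the chained-hash scheme are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per KV block (hypothetical; real systems use larger blocks)


def block_hashes(tokens):
    """Hash each full block together with everything before it,
    so two equal hashes imply two equal prefixes."""
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for i in range(0, full, BLOCK_SIZE):
        block = ",".join(map(str, tokens[i:i + BLOCK_SIZE])).encode()
        prev = hashlib.sha256(prev + block).digest()
        hashes.append(prev)
    return hashes


class PrefixIndex:
    """Maps chained block hashes to the workers that hold those KV blocks."""

    def __init__(self):
        self._owners = defaultdict(set)

    def record(self, worker, tokens):
        """Register that `worker` has cached every full block of `tokens`."""
        for h in block_hashes(tokens):
            self._owners[h].add(worker)

    def best_worker(self, tokens):
        """Return (worker, matched_blocks) for the longest cached prefix, or (None, 0)."""
        best, depth = None, 0
        for n, h in enumerate(block_hashes(tokens), start=1):
            owners = self._owners.get(h)
            if not owners:
                break
            best, depth = min(owners), n  # min() only makes tie-breaking deterministic
        return best, depth


index = PrefixIndex()
index.record("worker-a", [1, 2, 3, 4, 5, 6, 7, 8])
index.record("worker-b", [1, 2, 3, 4, 9, 9, 9, 9])
print(index.best_worker([1, 2, 3, 4, 5, 6, 7, 8, 10, 11]))
```

A request routed to the worker with the deepest match can reuse that worker's cached blocks and skip the matching part of prefill, which is the mechanism by which a faster index can lower TTFT.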

    Smart Routing and Load Estimation

    Managing traffic across hundreds of GPUs is difficult. Dynamo v0.9.0 introduces a smarter Planner that uses predictive load estimation.

    The system uses a Kalman filter to predict upcoming load from previously observed load. It also supports routing hints from the Kubernetes Gateway API Inference Extension (GAIE). This allows the network layer to communicate directly with the inference engine. If a specific GPU group is overloaded, the system can more accurately route new requests to idle workers.
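For readers unfamiliar with the technique, here is a minimal one-dimensional Kalman filter that smooths noisy load observations, plus a routing helper that picks the worker with the lowest estimate. This is a textbook sketch, not the Planner's actual implementation; the noise parameters, the worker names, and the `pick_worker` helper are hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter tracking a slowly drifting load value."""

    def __init__(self, estimate=0.0, variance=1.0,
                 process_noise=0.01, measurement_noise=1.0):
        self.x = estimate           # current load estimate
        self.p = variance           # uncertainty of that estimate
        self.q = process_noise      # how much the true load may drift per step
        self.r = measurement_noise  # how noisy each observation is

    def update(self, measurement):
        """Fold one new observation into the estimate and return it."""
        self.p += self.q                         # predict: load assumed steady, uncertainty grows
        gain = self.p / (self.p + self.r)        # Kalman gain: how much to trust the observation
        self.x += gain * (measurement - self.x)  # correct the estimate toward the observation
        self.p *= 1.0 - gain                     # uncertainty shrinks after the correction
        return self.x


def pick_worker(filters):
    """Return the name of the worker with the lowest estimated load."""
    return min(filters, key=lambda name: filters[name].x)


# Hypothetical observed load samples (for example, queued requests) per GPU group.
observed = {
    "gpu-0": [8.0, 9.0, 10.0, 9.0, 8.0],
    "gpu-1": [2.0, 3.0, 2.0, 3.0, 2.0],
}
filters = {name: ScalarKalman() for name in observed}
for name, samples in observed.items():
    for sample in samples:
        filters[name].update(sample)

choice = pick_worker(filters)
print(choice)
```

The gain balances the two noise terms: a larger `measurement_noise` makes the estimate react more slowly to a single spike, while a larger `process_noise` lets it follow real shifts in load more quickly.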

    The Technical Stack at a Glance

    The v0.9.0 release updates several core components to newer versions, including a release candidate of TensorRT-LLM. Here is the breakdown of the supported backends and libraries:

    Component       Version
    vLLM            v0.14.1
    SGLang          v0.5.8
    TensorRT-LLM    v1.3.0rc1
    NIXL            v0.9.0
    Rust Core       dynamo-tokens crate

    The dynamo-tokens crate, written in Rust, keeps token handling fast. For data transfer between GPUs, Dynamo continues to rely on NIXL (NVIDIA Inference Xfer Library) for RDMA-based communication.

    Key Takeaways

  • Infrastructure Decoupling (Goodbye NATS and etcd): The release completes the modernization of the communication architecture. By replacing NATS and etcd with a new Event Plane (using ZMQ and MessagePack) and Kubernetes-native service discovery, the system removes the ‘operational tax’ of managing external clusters.
  • Full Multi-Modal Disaggregation (E/P/D Split): Dynamo now supports a complete Encode/Prefill/Decode (E/P/D) split across all 3 backends (vLLM, SGLang, and TRT-LLM). This allows you to run vision or video encoders on separate GPUs, preventing compute-heavy encoding tasks from bottlenecking the text generation process.
  • FlashIndexer Preview for Lower Latency: The ‘sneak preview’ of FlashIndexer introduces a specialized component to optimize distributed KV cache management. It is designed to make the indexing and retrieval of conversation ‘memory’ significantly faster, with the aim of further reducing the Time to First Token (TTFT).
  • Smarter Scheduling with Kalman Filters: The system now uses predictive load estimation powered by Kalman filters. This allows the Planner to forecast GPU load more accurately and handle traffic spikes proactively, supported by routing hints from the Kubernetes Gateway API Inference Extension (GAIE).


