    Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

    May 20, 2026

    Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha, and it now has more in store for enterprise builders around the globe. Today, the firm co-founded by former Googler and "Attention Is All You Need" co-author Aidan Gomez unveiled Command A+, a highly optimized, 218-billion-parameter language model engineered specifically for complex reasoning, multimodal document processing, and agentic workflows.

    The most significant aspect of the release is not just the model’s capabilities; it is its accessibility.

    Cohere is releasing the model weights for free on the popular AI code sharing repository Hugging Face under a highly permissive Apache 2.0 open-source license, a first for the company, according to a post on X by Gomez, now Cohere's CEO. In doing so, Cohere is making a calculated bet on "sovereign AI": the thesis that enterprises, governments, and developers should have the ability to run, control, and adapt frontier-grade AI entirely within their own secure environments, without sacrificing performance.

    Sparse architecture with extreme quantization

    At the architectural level, Command A+ represents a major evolution from Cohere’s previous dense models. It is a decoder-only Sparse Mixture-of-Experts (MoE) Transformer.

    While the model houses a relatively modest 218 billion total parameters, only 25 billion are active during any given generation step. That makes for a much lighter footprint and requires far fewer compute resources at inference (serving the model in production environments to end users or via agents) than proprietary models from U.S. giants, such as OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.7, which third-party observers estimate to be in the trillions of parameters.

    This sparse architecture is the key to the model’s efficiency. In plain terms, an MoE model routes incoming queries only to the specific "expert" neural networks best suited to handle them, leaving the rest of the model dormant.
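
The routing idea described above can be sketched in a few lines of Python. This is a minimal illustration of generic top-k expert routing, not Cohere's implementation: the toy experts, the router scores, and the choice of `k=2` are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs by router weight."""
    return sum(weight * experts[i](x) for i, weight in route_top_k(router_scores, k))

# Hypothetical toy experts: each one is just a scalar function.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
router_scores = [0.1, 2.0, 1.5, -1.0]  # made-up router scores for one token

selected = route_top_k(router_scores, k=2)
output = moe_forward(3.0, experts, router_scores, k=2)
print("experts used:", [i for i, _ in selected])
print("blended output:", output)
```

Only two of the four toy experts are ever called for this input. The other two cost nothing at inference time, which is the source of the efficiency the article describes.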

    This is a familiar formulation and one followed by most leading LLMs these days, allowing models to retain the vast knowledge base and nuanced reasoning capabilities of a giant, but at the faster speeds and reduced compute and energy requirements of a much smaller model, since only a fraction of parameters are ever activated at any time.

    Where Cohere has gone a step beyond most with Command A+ is its heavy focus on hardware efficiency through quantization, a process that compresses the model's memory footprint by reducing the precision of its parameters.

    Command A+ is available in 16-bit (BF16), 8-bit (FP8), and a highly compressed 4-bit (W4A4) format.

    The W4A4 quantization is the technical centerpiece of this release. Typically, reasoning models suffer an outsized "quantization tax," where compressing the model leads to visible regressions in complex problem-solving.

    Cohere mitigated this by only quantizing the MoE experts to 4-bit, while keeping the critical attention pathways at full precision, supplemented by a technique called Quantization-Aware Distillation.
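
To make "quantizing to 4-bit" concrete, here is a minimal sketch of symmetric 4-bit integer quantization applied to a handful of weights. It is a generic textbook scheme chosen for illustration. The article does not describe Cohere's actual number format, its scaling granularity, or how Quantization-Aware Distillation is applied, and the weight values below are made up.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto the code 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

weights = [0.42, -1.30, 0.07, 0.91, -0.55, 1.75]  # made-up example values
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each weight is stored as one of only 16 possible codes plus a shared scale, so the round-trip error is bounded by half the scale. That rounding error is the "quantization tax" the article refers to, and keeping sensitive layers at higher precision is one way to limit its effect.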

    The result is a nearly lossless compression that allows this massive model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.
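
A rough back-of-the-envelope calculation shows why the 4-bit format matters for this hardware claim. The sketch below assumes 192 GB of memory for a B200 and 80 GB per H100. Those capacities are commonly cited figures, not numbers from the article, and some B200 configurations list less. It also treats every weight as stored at the stated bit width, so the W4A4 figure is a lower bound: the article says attention layers stay at higher precision, and runtime memory for the KV cache and activations is ignored.

```python
TOTAL_PARAMS = 218e9  # total parameter count stated in the article

# Bits per weight for each released format.
FORMATS = {"BF16": 16, "FP8": 8, "W4A4": 4}

# Assumed GPU memory capacities in GB (not from the article).
GPU_MEMORY_GB = {"1x B200": 192, "2x H100": 2 * 80}

def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB, using 1 GB = 1e9 bytes."""
    return num_params * bits_per_weight / 8 / 1e9

footprints = {
    name: weight_footprint_gb(TOTAL_PARAMS, bits) for name, bits in FORMATS.items()
}

for name, gb in footprints.items():
    fits = [gpu for gpu, capacity in GPU_MEMORY_GB.items() if gb < capacity]
    print(f"{name}: ~{gb:.0f} GB of weights, fits on: {fits or 'neither setup'}")
```

Under these assumptions, only the 4-bit weights fit within either hardware budget, which is consistent with the single-B200 and dual-H100 deployment the article describes.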

    The speed gains are equally notable. According to performance data released by the company, the W4A4 quantization at low concurrency achieves 375 tokens per second (TPS) with a Time-to-First-Token (TTFT) latency of just 113 milliseconds, representing up to a 63% increase in output speed and a 17% reduction in latency compared to the previous Command A Reasoning model.

    Furthermore, Cohere has overhauled the model's tokenizer. Tokenizers break text down into the fragments that AI models process. The new tokenizer is highly optimized for global enterprise use, featuring native support for 48 languages.

    More importantly, it dramatically improves tokenization efficiency for non-European languages, reducing the number of tokens required to generate responses in Arabic by 20%, Japanese by 18%, and Korean by 16%. Because inference costs are calculated per token, this translates directly to lower operational costs for global, multilingual or non-English deployments.
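
The link between token counts and cost can be shown with simple arithmetic. In the sketch below, the percentage reductions come from the article, while the monthly token volume and the per-million-token price are hypothetical placeholders, not Cohere pricing.

```python
# Token reductions reported in the article for the new tokenizer.
TOKEN_REDUCTION = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

BASELINE_TOKENS = 500_000_000  # hypothetical monthly output volume
PRICE_PER_MILLION = 2.00       # hypothetical price in USD, not a real quote

def cost_after_reduction(baseline_tokens, price_per_million, reduction):
    """Cost of generating the same text once the token count shrinks."""
    new_tokens = baseline_tokens * (1 - reduction)
    return new_tokens / 1_000_000 * price_per_million

baseline_cost = BASELINE_TOKENS / 1_000_000 * PRICE_PER_MILLION
new_costs = {
    language: cost_after_reduction(BASELINE_TOKENS, PRICE_PER_MILLION, reduction)
    for language, reduction in TOKEN_REDUCTION.items()
}

for language, cost in new_costs.items():
    print(f"{language}: ${baseline_cost:,.2f} -> ${cost:,.2f}")
```

Because the bill scales linearly with tokens, a 20% drop in token count is a 20% drop in cost for the same output, whatever the actual price turns out to be.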

    Agentic workflows and high benchmarks on math, specialized fields

    While raw speed and size dictate deployment, a model’s utility is defined by its product capabilities. Command A+ was built specifically for "agentic" tasks — workflows where the AI operates autonomously or semi-autonomously, uses external tools, queries databases, and synthesizes information across multiple steps.

    The benchmark leaps over the previous generation are stark.

    On 𝜏²-Bench Telecom, which tests complex reasoning, the model jumped from a 37% score to 85%. On Terminal-Bench Hard, which measures agentic coding performance, it climbed from 3% to 25%. In complex mathematics, it scored 90% on AIME 25, up from 57%.

    Command A+ punches above its weight class (25B active parameters) in pure reasoning and mathematics, competing directly with much larger models like DeepSeek V4 Pro on math benchmarks. However, on deep agentic coding and broad general-intelligence indexes, it currently trails the latest generations from Chinese open source rivals like DeepSeek, Z.ai (GLM), and MiniMax.

    That said, comparing them directly ignores Cohere's core value proposition: hardware efficiency.

    Beyond the benchmarks, Command A+ introduces deep integrations for enterprise trust and verification. The model supports conversational tool use via standard chat templates, allowing developers to connect it seamlessly to internal APIs, search engines, or SQL databases.

    Crucially, Command A+ features native citation generation. When Command A+ retrieves information from an external tool, it doesn't just synthesize the answer; it generates explicit "grounding spans." Using special tags embedded in the output, the model directly links every factual claim it makes to the specific source document or database row it pulled the information from.
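
An application consuming this kind of output might separate claims from their sources roughly as follows. The `<cite source="...">` tag format here is a hypothetical stand-in invented for this sketch. The article does not specify the tags Command A+ actually emits, and the response text and source identifiers are made up.

```python
import re

# Hypothetical tag format, for illustration only (not the model's real tags).
SPAN_PATTERN = re.compile(r'<cite source="([^"]+)">(.*?)</cite>', re.DOTALL)

def extract_grounding_spans(text):
    """Return (claim, source_id) pairs for every tagged span in the text."""
    return [(m.group(2), m.group(1)) for m in SPAN_PATTERN.finditer(text)]

def strip_tags(text):
    """Return the plain answer text with the citation tags removed."""
    return SPAN_PATTERN.sub(lambda m: m.group(2), text)

def unknown_sources(text, known_sources):
    """List cited source ids that match none of the retrieved documents."""
    return [src for _, src in extract_grounding_spans(text) if src not in known_sources]

# Made-up model response and tool results.
response = (
    'Total sales for the day were <cite source="sql_result_0">$48,200</cite>, '
    'led by the <cite source="sql_result_1">Toronto store</cite>.'
)
retrieved = {
    "sql_result_0": "SELECT SUM(amount) ... -> 48200",
    "sql_result_1": "SELECT store ... -> Toronto",
}

spans = extract_grounding_spans(response)
plain_answer = strip_tags(response)
dangling = unknown_sources(response, retrieved)

print(plain_answer)
for claim, source in spans:
    print(f"  {claim!r} is grounded in {source}")
print("citations with no matching source:", dangling)
```

Checking that every cited identifier maps to a document the tool really returned is one simple way to flag an answer for review before it reaches a user.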

    For enterprises in heavily regulated industries like finance, healthcare, or legal, this traceability is the difference between an interesting prototype and a production-ready application. If a user asks for a daily sales report, the model will output the total sales amount and explicitly cite the database query result that provided that number, minimizing the risk of undetected hallucinations.

    Additionally, Command A+ is fully multimodal, capable of processing both text and images natively within its massive 128K input context window, making it highly effective for complex document processing, such as analyzing scanned invoices, charts, or technical manuals.

    The first fully Apache 2.0 licensed Cohere AI model

    In the current AI landscape, "open source" has become a fraught term. Many leading AI companies release their model weights under restrictive commercial licenses or acceptable use policies that explicitly forbid large enterprises from using the models for commercial purposes, or prohibit the models from being used to train competing AI systems.

    Indeed, Cohere's prior models, including Command R and Command R+, were released under a CC-BY-NC 4.0 (Creative Commons NonCommercial) license. While their model weights were open for researchers and developers to download, tinker with, and evaluate, they were strictly prohibited from being used for commercial purposes without purchasing a separate enterprise license from Cohere or going through its application programming interface (API), similar to the arrangement many enterprises use for accessing AI models from OpenAI, Anthropic, Google and other leading labs.

    Cohere has changed up its approach by releasing Command A+ under the Apache 2.0 license. This is a critical distinction for the developer community. Apache 2.0 is a true, OSI-approved open-source license. It allows anyone—from independent developers to Fortune 500 corporations—to use, modify, distribute, and commercialize the model without paying licensing fees or adhering to restrictive non-compete clauses.

    As Gomez wrote on X, the decision was championed by fellow Cohere co-founder Nick Frosst, who posted a two-minute overview calling it "the best model we've ever put out."

    For the enterprise, this license means total vendor independence. A company can download the Command A+ weights, fine-tune them on highly classified internal data, and deploy them on their own private servers or air-gapped networks. They are not tethered to Cohere’s infrastructure, pricing changes, or API uptime. It is the ultimate realization of sovereign AI.

    The release was met with immediate traction across the AI developer ecosystem, driven heavily by its day-one integration with major open-source inference frameworks like Hugging Face and vLLM.

    What's next?

    The release of Command A+ marks a maturing of the open-source AI ecosystem. By combining frontier-level reasoning, robust agentic tool use, and multimodal capabilities with an architecture specifically designed for hardware efficiency, Cohere is changing the calculus for enterprise AI adoption.

    The requirement of massive, centralized compute clusters has long been a bottleneck for companies prioritizing data privacy and cost control. By democratizing access to a model of this caliber under a true open-source license, Cohere has provided the enterprise market with exactly what it has been asking for: the power of the cloud, capable of running securely in the server room down the hall.


