
    Meet OAT: The New Action Tokenizer Bringing LLM-Style Scaling and Flexible, Anytime Inference to the Robotics World

    February 9, 2026






    Robots are entering their GPT-3 era. For years, researchers have tried to train robots using the same autoregressive (AR) models that power large language models (LLMs). If a model can predict the next word in a sentence, it should be able to predict the next move for a robotic arm. However, a technical wall has blocked this progress: continuous robot movements are difficult to turn into discrete tokens.

    A team of researchers from Harvard University and Stanford University has released a new framework called Ordered Action Tokenization (OAT) to bridge this gap.

    https://arxiv.org/pdf/2602.04215

    The Messy Reality of Robot Actions

    Tokenization turns complex data into a sequence of discrete numbers (tokens). For robots, the data to tokenize are actions: continuous signals such as joint angles. Previous strategies each had a serious flaw:

    • Binning: Discretizes every action dimension at every timestep into one of a fixed number of bins. While simple, it emits one token per value, creating long sequences that make training and inference slow.
    • FAST (Frequency-space Action Sequence Tokenization): Compresses movements into frequency coefficients. It is fast but often produces ‘undecodable’ sequences where small errors cause the robot to halt or move unpredictably.
    • Learned Latent Tokenizers: These use a learned ‘dictionary’ of movements. They always decode to a valid action but lack a specific order, meaning the model treats early and late tokens as equally important.
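
    To make the sequence-length problem of binning concrete, here is a minimal sketch of a uniform binning tokenizer. The chunk size (16 timesteps, 7 action dimensions), the value range [-1, 1], and the 256 bins are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def bin_tokenize(actions, low=-1.0, high=1.0, num_bins=256):
    """Map each continuous value to an integer bin id in [0, num_bins - 1]."""
    clipped = np.clip(actions, low, high)
    scaled = (clipped - low) / (high - low)            # rescale to [0, 1]
    ids = np.minimum((scaled * num_bins).astype(np.int64), num_bins - 1)
    return ids.reshape(-1)                             # one token per value

def bin_detokenize(tokens, shape, low=-1.0, high=1.0, num_bins=256):
    """Map bin ids back to the centers of their bins."""
    centers = (tokens.astype(np.float64) + 0.5) / num_bins
    return (centers * (high - low) + low).reshape(shape)

# Hypothetical action chunk: 16 timesteps of a 7-dimensional action.
rng = np.random.default_rng(0)
chunk = rng.uniform(-1.0, 1.0, size=(16, 7))

tokens = bin_tokenize(chunk)
recon = bin_detokenize(tokens, chunk.shape)

print("tokens per chunk:", len(tokens))   # timesteps * dimensions
print("max reconstruction error:", np.abs(recon - chunk).max())
```

    The token count grows as timesteps times dimensions, so an autoregressive policy has to generate every one of those tokens before the robot can act on the chunk.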

    The Three Golden Rules of OAT

    The research team identified three essential properties (desiderata) for a functional robot tokenizer:

    • High Compression (P.1): Token sequences must be short to keep models efficient.
    • Total Decodability (P.2): The decoder must be a total function, ensuring every possible token sequence maps to a valid movement.
    • Causal Ordering (P.3): Tokens must have a left-to-right structure where early tokens capture global motion and later tokens refine details.
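
    The difference between a total and a partial decoder (P.2) can be illustrated with a toy example. Everything here is hypothetical: the vocabulary size, the random codebooks, and the length-prefixed format of the partial decoder are stand-ins chosen for illustration, not the decoders used by OAT or FAST.

```python
import numpy as np

VOCAB_SIZE = 16               # hypothetical vocabulary size
NUM_TOKENS = 8                # fixed sequence length
HORIZON, ACTION_DIM = 4, 2    # hypothetical action chunk shape

rng = np.random.default_rng(0)
# One codebook per token position; every id in [0, VOCAB_SIZE) has an entry.
codebooks = rng.normal(size=(NUM_TOKENS, VOCAB_SIZE, HORIZON * ACTION_DIM))

def total_decode(tokens):
    """Total: defined for every sequence in {0..VOCAB_SIZE-1}^NUM_TOKENS."""
    tokens = np.asarray(tokens)
    parts = codebooks[np.arange(NUM_TOKENS), tokens]   # (NUM_TOKENS, H * D)
    # tanh keeps every decoded value inside the valid range [-1, 1].
    return np.tanh(parts.sum(axis=0)).reshape(HORIZON, ACTION_DIM)

def partial_decode(tokens):
    """Partial: a length-prefixed format that rejects inconsistent sequences."""
    declared = tokens[0]
    if declared != len(tokens) - 1:
        raise ValueError("undecodable sequence")
    return np.asarray(tokens[1:], dtype=float) / VOCAB_SIZE

# Feed both decoders the same random token sequences, as a sloppy policy might.
samples = rng.integers(0, VOCAB_SIZE, size=(1000, NUM_TOKENS))
total_ok = sum(total_decode(s).shape == (HORIZON, ACTION_DIM) for s in samples)
partial_ok = 0
for s in samples:
    try:
        partial_decode(s)
        partial_ok += 1
    except ValueError:
        pass

print("total decoder succeeded on", total_ok, "of", len(samples))
print("partial decoder succeeded on", partial_ok, "of", len(samples))
```

    With a total decoder, a prediction error degrades the action but never leaves the robot without one.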

    The Secret Sauce: Nested Dropout and Registers

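    OAT uses a transformer encoder with register tokens to summarize action chunks. To force the model to encode the most important information first, the research team trained it with nested dropout: trailing tokens are randomly dropped during training, so the decoder must often reconstruct the action chunk from a prefix alone, and the earliest tokens learn to carry the coarse, global motion.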
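
    As a rough sketch of the general nested dropout idea (not the authors' training code), the masking step can be written as follows. The uniform distribution over prefix lengths, the tensor sizes, and the choice to zero out dropped tokens are all assumptions made for illustration.

```python
import numpy as np

def nested_dropout_mask(batch_size, num_tokens, rng):
    """Sample a keep-length k per example and keep only the first k tokens."""
    # Assumption: k is drawn uniformly from 1..num_tokens.
    keep = rng.integers(1, num_tokens + 1, size=batch_size)
    positions = np.arange(num_tokens)[None, :]           # (1, num_tokens)
    mask = (positions < keep[:, None]).astype(np.float32)
    return mask, keep

rng = np.random.default_rng(0)
# Hypothetical register-token latents: (batch, tokens, latent_dim).
latents = rng.normal(size=(4, 8, 16)).astype(np.float32)

mask, keep = nested_dropout_mask(batch_size=4, num_tokens=8, rng=rng)
# Tokens after position k are zeroed before the decoder sees them.
truncated = latents * mask[:, :, None]

print(keep)
print(mask)
```

    Because token 1 survives every truncation while token 8 survives only when nothing is dropped, the reconstruction loss pushes the most broadly useful information toward the front of the sequence.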


    Breaking the Benchmarks

    The research team tested OAT across 20+ tasks in 4 major simulation benchmarks. OAT consistently outperformed the industry-standard Diffusion Policy (DP) and previous tokenizers.

    Performance Results

    Benchmark    OAT Success Rate    DP Success Rate    Bin Token Count    OAT Token Count
    LIBERO       56.3%               36.6%              224                8
    RoboMimic    73.1%               67.1%              224                8
    MetaWorld    24.4%               19.3%              128                8
    RoboCasa     54.6%               54.0%              384                8
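
    The figures reported above can be turned into two derived quantities: how many times shorter OAT sequences are than binned ones, and the success-rate gap over Diffusion Policy in percentage points. The short script below only does that arithmetic on the reported numbers; the dictionary layout is an arbitrary choice.

```python
# Reported values: success rates in percent, token counts per action chunk.
results = {
    "LIBERO":    {"oat": 56.3, "dp": 36.6, "bin_tokens": 224, "oat_tokens": 8},
    "RoboMimic": {"oat": 73.1, "dp": 67.1, "bin_tokens": 224, "oat_tokens": 8},
    "MetaWorld": {"oat": 24.4, "dp": 19.3, "bin_tokens": 128, "oat_tokens": 8},
    "RoboCasa":  {"oat": 54.6, "dp": 54.0, "bin_tokens": 384, "oat_tokens": 8},
}

# Sequence-length reduction relative to binning.
ratios = {name: r["bin_tokens"] / r["oat_tokens"] for name, r in results.items()}
# Success-rate difference versus Diffusion Policy, in percentage points.
gains = {name: round(r["oat"] - r["dp"], 1) for name, r in results.items()}

for name in results:
    print(f"{name:10s} {ratios[name]:4.0f}x fewer tokens, "
          f"{gains[name]:+.1f} points vs DP")
```

    The sequence-length reduction ranges from 16x to 48x, while the margin over Diffusion Policy varies widely, from 0.6 points on RoboCasa to 19.7 points on LIBERO.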

    ‘Anytime’ Inference: Speed vs. Precision

    The most practical benefit of OAT is prefix-based detokenization. Since the tokens are ordered by importance, you can stop the model early.

    • Coarse Actions: Decoding just 1 or 2 tokens gives the robot a general direction quickly, which is useful for low-latency tasks.
    • Fine Actions: Generating all 8 tokens provides the high-precision details needed for complex insertions.

    This allows for a smooth trade-off between computation cost and action fidelity that previous fixed-length tokenizers could not offer.
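
    The trade-off can be demonstrated with a toy ordered tokenizer. Note that this is a deliberate substitute, not OAT: it uses greedy residual quantization with fixed random codebooks, whereas OAT learns its tokenizer with a transformer. The sizes, the shrinking codebook scale, and the zero "no refinement" code are all assumptions chosen so that the coarse-to-fine behavior is easy to see.

```python
import numpy as np

NUM_TOKENS, VOCAB_SIZE, DIM = 8, 32, 14   # hypothetical sizes

rng = np.random.default_rng(0)
# Level i has a codebook whose entries shrink with i, so early tokens are coarse.
codebooks = np.stack([
    rng.normal(size=(VOCAB_SIZE, DIM)) * 0.5 ** i for i in range(NUM_TOKENS)
])
codebooks[:, 0, :] = 0.0   # id 0 means "no refinement" at every level

def encode(action):
    """Greedy residual quantization: each token encodes what is still missing."""
    residual, tokens = action.copy(), []
    for level in range(NUM_TOKENS):
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(np.argmin(dists))
        tokens.append(idx)
        residual = residual - codebooks[level, idx]
    return tokens

def decode_prefix(tokens, k):
    """Decode from the first k tokens only; later levels contribute nothing."""
    out = np.zeros(DIM)
    for level, idx in enumerate(tokens[:k]):
        out += codebooks[level, idx]
    return out

action = rng.normal(size=DIM)   # stand-in for a flattened action chunk
tokens = encode(action)
errors = [float(np.linalg.norm(action - decode_prefix(tokens, k)))
          for k in range(NUM_TOKENS + 1)]

for k, err in enumerate(errors):
    print(f"prefix of {k} tokens -> reconstruction error {err:.3f}")
```

    In this toy setup the error never increases as the prefix grows, so a caller can stop generating tokens as soon as its latency budget runs out and still decode a usable, if coarser, action.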

    Key Takeaways

    • Solving the Tokenization Gap: OAT addresses a fundamental limitation in applying autoregressive models to robotics by introducing a learned tokenizer that simultaneously achieves high compression, total decodability, and causal ordering.
    • Ordered Representation via Nested Dropout: By utilizing nested dropout during training, OAT forces the model to prioritize global, coarse motion patterns in early tokens while reserving later tokens for fine-grained refinements.
    • Total Decodability and Reliability: Unlike prior frequency-domain methods like FAST, OAT ensures the detokenizer is a total function, meaning every possible token sequence generates a valid action chunk, preventing runtime execution failures.
    • Flexible ‘Anytime’ Inference: The ordered structure enables prefix-based decoding, allowing robots to execute coarse actions from just one or two tokens to save computation or full eight-token sequences for high-precision tasks.
    • Superior Performance Across Benchmarks: Autoregressive policies equipped with OAT consistently outperform diffusion-based baselines and other tokenization schemes, achieving a 52.3% aggregate success rate and superior results in real-world ‘Pick & Place’ and ‘Stack Cups’ tasks.


    Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.






