
    This AI Paper Introduces TinyLoRA, A 13-Parameter Fine-Tuning Method That Reaches 91.8 Percent GSM8K on Qwen2.5-7B

    March 24, 2026


    Researchers from FAIR at Meta, Cornell University, and Carnegie Mellon University have demonstrated that large language models (LLMs) can learn to reason using a remarkably small number of trained parameters. The research team introduces TinyLoRA, a parameterization that can scale down to a single trainable parameter under extreme sharing settings. Using this method on a Qwen2.5-7B-Instruct backbone, the research team achieved 91.8% accuracy on the GSM8K benchmark with only 13 parameters, totaling just 26 bytes in bf16.

    Overcoming the Constraints of Standard LoRA

Standard Low-Rank Adaptation (LoRA) adapts a frozen linear layer $W \in \mathbb{R}^{d \times k}$ using trainable matrices $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times k}$. The trainable parameter count, $r(d + k)$ per adapted layer, still scales with layer width, which leaves a nontrivial lower bound even at rank 1. For a model like Llama3-8B, this minimum update size is approximately 3 million parameters.
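As a rough sanity check on that lower bound, the sketch below tallies rank-1 LoRA parameters across a hypothetical Llama3-8B-like architecture (32 layers, hidden size 4096, MLP width 14336 — assumed shapes for illustration, not the model's exact configuration):

```python
# Rough trainable-parameter count for rank-1 LoRA; the layer shapes below
# are assumed Llama3-8B-like values, used only for illustration.
def lora_params(d: int, k: int, r: int) -> int:
    return d * r + r * k  # A is d x r, B is r x k

hidden, mlp, layers = 4096, 14336, 32
per_layer = (
    4 * lora_params(hidden, hidden, r=1)   # q, k, v, o attention projections
    + 2 * lora_params(hidden, mlp, r=1)    # gate and up MLP projections
    + lora_params(mlp, hidden, r=1)        # down MLP projection
)
total = layers * per_layer
print(f"{total:,}")  # ~2.8 million trainable parameters even at rank 1
```

Even at the smallest possible rank, the update remains in the millions of parameters, which is the floor TinyLoRA is designed to break.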

TinyLoRA circumvents this by building upon LoRA-XS, which utilizes the truncated Singular Value Decomposition (SVD) of the frozen weights. While LoRA-XS typically requires at least one $r \times r$ trainable matrix per adapted module, TinyLoRA replaces that matrix with a low-dimensional trainable vector $v \in \mathbb{R}^u$ projected through a fixed random tensor $P \in \mathbb{R}^{u \times r \times r}$.

    The update rule is defined as:


$$W' = W + U\Sigma\left(\sum_{i=1}^{u} v_i P_i\right)V^{\top}$$

By applying a weight-tying factor $n_{\text{tie}}$, the total trainable parameter count scales as $O(nmu/n_{\text{tie}})$, where $n$ is the number of layers and $m$ the number of adapted modules per layer, allowing updates to scale down to a single parameter when all modules across all layers share the same vector.
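The update rule can be sketched in PyTorch as follows. This is a minimal sketch under assumptions: initialization scales, which modules to adapt, and the exact sharing scheme here are illustrative, not the paper's recipe.

```python
import torch

class TinyLoRALinear(torch.nn.Module):
    """Frozen linear layer with a TinyLoRA update:
    W' = W + U @ Sigma @ (sum_i v_i P_i) @ V^T, where only v is trained.
    """
    def __init__(self, weight, r=2, u=13, v=None):
        super().__init__()
        self.register_buffer("W", weight)  # frozen d x k weight
        # Truncated SVD of the frozen weight (rank-r factors, also frozen).
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U[:, :r])           # d x r
        self.register_buffer("S", torch.diag(S[:r]))  # r x r
        self.register_buffer("Vh", Vh[:r, :])         # r x k
        # Fixed (never trained) random projection tensor P: u x r x r.
        self.register_buffer("P", torch.randn(u, r, r) / r)
        # v may be a shared Parameter (weight tying across modules).
        self.v = v if v is not None else torch.nn.Parameter(torch.zeros(u))

    def effective_weight(self):
        R = torch.einsum("u,urs->rs", self.v, self.P)  # r x r mixing matrix
        return self.W + self.U @ self.S @ R @ self.Vh

    def forward(self, x):
        return x @ self.effective_weight().T

# Tying one 13-parameter vector across two modules:
shared_v = torch.nn.Parameter(torch.zeros(13))
modules = [TinyLoRALinear(torch.randn(64, 64), r=2, u=13, v=shared_v)
           for _ in range(2)]
unique = {id(p): p for m in modules for p in m.parameters()}
trainable = sum(p.numel() for p in unique.values())
print(trainable)  # 13 — a single shared vector covers both modules
```

Because $v$ is initialized to zeros, the effective weight starts exactly at the frozen $W$, and training only ever moves the $u$ shared scalars.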

    Reinforcement Learning: The Catalyst for Tiny Updates

    A core finding of the research is that Reinforcement Learning (RL) is fundamentally more efficient than Supervised Finetuning (SFT) at extremely low parameter counts. The research team reports that models trained via SFT require updates 100 to 1,000 times larger to reach the same performance as those trained with RL.

    This gap is attributed to the ‘information density’ of the training signal. SFT forces a model to absorb many bits of information—including stylistic noise and irrelevant structures of human demonstrations—because its objective treats all tokens as equally informative. In contrast, RL (specifically Group Relative Policy Optimization or GRPO) provides a sparser but cleaner signal. Because rewards are binary (e.g., exact match for a math answer), reward-relevant features correlate with the signal while irrelevant variations cancel out through resampling.
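The cancellation argument above can be illustrated with the group-relative advantage step alone (the rewards below are made up; GRPO's full objective also includes clipping and a KL term not shown here):

```python
import numpy as np

def grpo_advantages(rewards):
    # Normalize each sampled completion's reward against its group's
    # mean and standard deviation (the group-relative baseline in GRPO).
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Eight sampled answers to one problem; 1 = exact-match reward, 0 = wrong.
group = np.array([1, 0, 0, 1, 1, 0, 0, 0], dtype=float)
adv = grpo_advantages(group)
print(adv.round(2))  # correct samples get positive advantage, wrong negative
```

Stylistic quirks shared by all samples in the group contribute nothing to the advantage, which is why the surviving signal concentrates on reward-relevant behavior.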

    Optimization Guidelines for Devs

    The research team isolated several strategies to maximize the efficiency of tiny updates:

    • Optimal Frozen Rank (r): Analysis showed that a frozen SVD rank of r=2 was optimal. Higher ranks introduced too many degrees of freedom, complicating the optimization of the small trainable vector.
    • Tiling vs. Structured Sharing: The research team compared 'structured' sharing (modules of the same type share parameters) with 'tiling' (modules at nearby depths share parameters). Surprisingly, tiling was more effective, suggesting no inherent benefit to restricting parameter sharing to specific projections such as Query or Key modules.
    • Precision: In bit-constrained regimes, storing parameters in fp32 proved most performant bit-for-bit, even when accounting for its larger footprint compared to bf16 or fp16.
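The difference between the two sharing schemes can be sketched as a group-assignment function (module names and counts here are illustrative, not the paper's exact layout):

```python
def tiling_groups(n_layers, modules_per_layer, n_vectors):
    """'Tiling': modules at nearby depths (flattened layer order) share a vector."""
    total = n_layers * modules_per_layer
    return [min(i * n_vectors // total, n_vectors - 1) for i in range(total)]

def structured_groups(n_layers, module_types):
    """'Structured': all modules of the same type share a vector across layers."""
    return [module_types.index(t) for _ in range(n_layers) for t in module_types]

# Four layers, two adapted modules per layer, two shared vectors:
print(tiling_groups(4, 2, 2))            # [0, 0, 0, 0, 1, 1, 1, 1]
print(structured_groups(4, ["q", "k"]))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

Tiling carves the model into contiguous depth bands, whereas structured sharing stripes each vector across every layer by module type; the paper's finding is that the depth-band grouping works better.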

    Benchmark Performance

    The research team reports that Qwen2.5 models often needed around 10x fewer updated parameters than Llama3 models to reach similar performance in their setup.

    | Model | Parameters Trained | GSM8K Pass@1 |
    | --- | --- | --- |
    | Qwen2.5-7B-Instruct (Base) | 0 | 88.2% |
    | Qwen2.5-7B-Instruct | 1 | 82.0% |
    | Qwen2.5-7B-Instruct | 13 | 91.8% |
    | Qwen2.5-7B-Instruct | 196 | 92.2% |
    | Qwen2.5-7B-Instruct (Full FT) | ~7.6 Billion | 91.7% |

    On harder benchmarks, a 196-parameter update for Qwen2.5-7B-Instruct retained 87% of the absolute performance improvement of full finetuning averaged across six difficult math benchmarks, including MATH500 and AIME24.

    Key Takeaways

    • Extreme Parameter Efficiency: It is possible to train a Qwen2.5-7B-Instruct model to achieve 91.8% accuracy on the GSM8K math benchmark using only 13 parameters (26 total bytes).
    • The RL Advantage: Reinforcement Learning (RL) is fundamentally more efficient than Supervised Finetuning (SFT) in low-capacity regimes; SFT requires 100–1000x larger updates to reach the same performance level as RL.
    • TinyLoRA Framework: The research team developed TinyLoRA, a new parameterization that uses weight tying and random projections to scale low-rank adapters down to a single trainable parameter.
    • Optimizing the “Micro-Update”: For these tiny updates, fp32 precision is more bit-efficient than half-precision formats, and “tiling” (sharing parameters across nearby depths) outperforms structured sharing by module type.
    • Scaling Trends: As models grow larger, they become more ‘programmable’ with fewer absolute parameters, suggesting that trillion-scale models could potentially be tuned for complex tasks using just a handful of bytes.
