Pockot珀刻机

Research Note: Small Models Make the Floor Measurable

Question

The off-grid question is not "can a pocket device run the largest model?" It is "which tasks become useful at 1B, 3B, 7B, or 13B parameters under a strict power budget?" Pockot needs a model-size ladder before it can talk about autonomy.

Source-Backed Data Points

  • Meta's Llama 3.2 release includes lightweight text-only 1B and 3B models intended for select edge and mobile devices. Source: Meta Llama 3.2.
  • The same Meta release states that the Llama 3.2 1B and 3B models support a 128K-token context length. Source: Meta Llama 3.2.
  • LoRA reports a reduction in trainable parameters by 10,000 times and GPU memory requirement by 3 times compared with GPT-3 175B fine-tuning with Adam. Source: arXiv 2106.09685.

Reading

Small models change the device question because usefulness can be judged task by task. A 1B or 3B model may be enough for local summaries, simple extraction, command parsing, or document search when paired with retrieval. The same model may still fail on tasks that need deep multi-step reasoning or broad knowledge. Both statements can be true.

Compression and adaptation are also separate. Quantization stores weights at fewer bits per parameter, so a model fits and runs in less memory. LoRA-style adapters reduce what has to be trained for a task. Neither automatically creates a self-improving device. They create knobs that an offline system might use under clear limits.
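To make the adapter knob concrete, here is a minimal sketch of the parameter arithmetic for one weight matrix. It relies on the LoRA formulation in which a frozen d_out x d_in matrix receives a trainable low-rank update built from two factors of rank r, giving r * (d_out + d_in) trainable parameters. The 4096 x 4096 layer size and rank 8 are illustrative assumptions, not figures from any model named in this note, and the function names are hypothetical.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in one dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the two low-rank factors:
    B with shape (d_out, rank) and A with shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Assumed, illustrative sizes (not taken from a specific model).
D_OUT, D_IN, RANK = 4096, 4096, 8

full = full_params(D_OUT, D_IN)
lora = lora_trainable_params(D_OUT, D_IN, RANK)
reduction = full / lora

print(f"full matrix:  {full:,} parameters")
print(f"LoRA factors: {lora:,} trainable parameters")
print(f"reduction:    {reduction:.0f}x for this one matrix")
```

The per-matrix ratio here is far smaller than the 10,000x headline figure, which refers to GPT-3 175B as a whole and depends on which matrices are adapted and at what rank. The sketch only shows why the trainable count scales with the rank rather than with the full matrix size.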

Tool Rule

Pockot will model parameter count and bits per parameter explicitly. A 3B 4-bit model should appear as a memory estimate (roughly 1.5 GB for the weights alone), not as a quality claim. The next tool version should add measured tokens per second by device and runtime.
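The rule above can be sketched as a small calculation. This is not Pockot's actual tool: the `ModelSpec` class, the `ladder_table` helper, and the choice of 16, 8, and 4 bits are assumptions made for illustration. The estimate covers weights only, using params x bits / 8 bytes. It leaves out the KV cache, activations, runtime overhead, and any per-block scale metadata that real quantization formats add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record: parameter count (in billions) and bits per parameter."""
    params_billion: float
    bits_per_param: float

    def weight_bytes(self) -> float:
        # Weights only: parameters * bits, converted to bytes.
        return self.params_billion * 1e9 * self.bits_per_param / 8

    def weight_gb(self) -> float:
        # Decimal gigabytes (1 GB = 1e9 bytes).
        return self.weight_bytes() / 1e9


# Ladder sizes come from the note's question; bit widths are assumed.
LADDER_BILLIONS = (1, 3, 7, 13)
BIT_WIDTHS = (16, 8, 4)


def ladder_table() -> dict:
    """Map (params_billion, bits) -> estimated weight memory in GB."""
    return {
        (p, b): ModelSpec(p, b).weight_gb()
        for p in LADDER_BILLIONS
        for b in BIT_WIDTHS
    }


for (p, b), gb in ladder_table().items():
    print(f"{p:>2}B @ {b:>2}-bit: {gb:6.2f} GB (weights only)")
```

Reporting the figure as "weights only" keeps the output a memory estimate rather than a claim about quality or speed, which matches the rule's intent. Measured tokens per second would be a separate column once the data exists.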