§01 — MODEL WEIGHTS
Model Catalog
Open-weight language models from 1B to 35B parameters, selected for prompt adherence, tool calling, and reasoning — optimized for on-device deployment with EdgeAI.
Tool Calling
Models that reliably generate structured function calls, enabling agentic workflows on-device.
Reasoning
Chain-of-thought and hybrid thinking modes for complex multi-step problem solving at the edge.
Edge-Optimized
Every model runs locally on CPUs, GPUs, or NPUs — no cloud dependency, no per-token costs.
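As an illustration of what "structured function calls" means in practice, here is a minimal sketch of how an on-device host might validate a model's tool call before executing it. The tool names, the `{"name": ..., "arguments": ...}` JSON shape, and the `parse_tool_call` helper are all assumptions for this example. They are not part of EdgeAI or of any specific model's output format.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str},
    "set_timer": {"seconds": int},
}

def parse_tool_call(raw: str):
    """Return (name, arguments) if raw is a valid call to a known tool, else None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model output was not valid JSON
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return None
    schema = TOOLS.get(name)
    if schema is None:
        return None  # unknown tool
    if set(args) != set(schema):
        return None  # missing or unexpected arguments
    for key, expected in schema.items():
        # Exact type match, so a bool is not accepted where an int is required.
        if type(args[key]) is not expected:
            return None
    return name, args

print(parse_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

Rejecting malformed calls up front keeps an agent loop from acting on partial or hallucinated output, which matters more for the smaller models in the catalog.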
Tier 1 · 1 – 4B
Ultra-Light
Phones, IoT, Embedded
Qwen3-1.7B
1.7B · Smallest model with real tool calling and hybrid thinking modes.
Llama 3.2 3B Instruct
3B · Broadest ecosystem support with native tool use and Meta safety tools.
Phi-4-mini-instruct
3.8B · Best overall at this size for tool calling, with a 128K context window.
Ministral 3B
3.4B · 256K context, vision support, and native function calling.
Tier 2 · 4 – 14B
Small
Laptops, Jetson, Single GPU
Qwen3-8B
8B · Best balance of size and capability, with hybrid thinking modes.
Gemma 3 12B
12B · Multimodal, with 140+ languages and Google edge SDK support.
DeepSeek-R1-Distill-Qwen-14B
14B · Best-in-class reasoning at this size, distilled from DeepSeek-R1.
Qwen3-14B
14B · Strongest tool calling in the tier, with a 128K context window (via YaRN context extension).
Tier 3 · 14 – 35B
Edge Server
Workstations, Edge Servers
Qwen3-30B-A3B (MoE)
30B / 3B active · Large-model intelligence at small-model compute cost, with the best edge efficiency in the tier.
Mistral Small 3.2 24B
24B · Best dense model for tool calling, on par with Llama 70B.
Gemma 3 27B
27B · Beats Gemini 1.5 Pro, with the best Google ecosystem integration.
Qwen3-32B
32B · Strongest dense Qwen3, with hybrid thinking and 128K context (via YaRN context extension).
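The "30B / 3B active" entry in this tier can be made concrete with a rough calculation. The sketch below uses the common rule of thumb of about 2 FLOPs per active parameter per generated token for a forward pass. That rule is an assumption here, it ignores attention and memory-bandwidth effects, and the function name is made up for this example.

```python
def forward_flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token: 2 FLOPs per active parameter."""
    return 2.0 * active_params_b * 1e9

# Parameter counts in billions, taken from the catalog entries above.
moe = forward_flops_per_token(3.0)     # Qwen3-30B-A3B: about 3B active per token
dense = forward_flops_per_token(32.0)  # Qwen3-32B: all 32B parameters active

ratio = dense / moe
print(f"dense/MoE compute ratio: {ratio:.1f}x")  # prints "dense/MoE compute ratio: 10.7x"
```

Note that the MoE model still has to keep all 30B weights in memory. The saving applies to per-token compute, not to storage.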
Notable Picks
Honorable Mentions
Standout models worth watching — novel architectures, first-of-their-kind releases, and compact VLMs for edge vision tasks.
gpt-oss-20b
20B (MoE) · OpenAI's first open-weight language model since GPT-2. An MXFP4-quantized MoE that fits in 16GB, released under the Apache 2.0 license.
LFM2.5-1.2B-Instruct
1.2B · Liquid AI's edge-native architecture: 2x faster than Qwen3 on CPU, optimized for embedded SoCs.
SmolVLM2-2.2B-Instruct
2.2B · Hugging Face's compact VLM for image understanding, with strong vision performance under 3B parameters.
Moondream2
1.8B · Tiny vision-language model that runs in 2GB RAM, ideal for edge visual understanding tasks.
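The memory figures quoted in this list follow from simple arithmetic on parameter count and bits per weight. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes every weight is stored at the stated precision and ignores the KV cache, activations, and runtime overhead. Real checkpoints may keep some tensors at higher precision. The helper name is made up for this example.

```python
def weight_memory_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight storage in GB (10^9 bytes) for params_b billion parameters."""
    return params_b * bits_per_weight / 8.0

# MXFP4 stores 4-bit elements plus one shared 8-bit scale per block of 32 values,
# which works out to 4.25 bits per weight on average.
MXFP4_BITS = 4 + 8 / 32

print(weight_memory_gb(20, MXFP4_BITS))  # 10.625 (20B parameters at MXFP4)
print(weight_memory_gb(20, 16))          # 40.0   (same model at 16-bit precision)
```

Under these assumptions a 20B model needs roughly 10.6 GB for weights at MXFP4, which is consistent with the 16GB figure once working memory is added, while the 16-bit version would not fit.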