Model Catalog

Smallest model with real tool calling and hybrid thinking modes.

Llama 3.2 3B Instruct

Broadest ecosystem support with native tool use and Meta safety tools.

Phi-4-mini-instruct

3.8B

Best overall at this size for tool calling, with a 128K context window.

Ministral 3B

3.4B

Tool Calling · Multimodal

256K context, vision support, and native function calling.

Tier 2 · 4 – 14B

Small

Laptops, Jetson, Single GPU

Qwen3-8B

Best balance of size and capability with hybrid thinking modes.

Gemma 3 12B

12B

Multimodal with 140+ languages and Google edge SDK support.

DeepSeek-R1-Distill-Qwen-14B

14B

Reasoning

Best-in-class reasoning at this size, distilled from DeepSeek-R1.

Qwen3-14B

14B

Strongest tool calling in this tier, with a 128K context window.

Tier 3 · 14 – 35B

Edge Server

Workstations, Edge Servers

Qwen3-30B-A3B (MoE)

30B / 3B active

Large-model intelligence at small-model compute cost: only 3B of the 30B parameters are active per token, giving the best edge efficiency.
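
A rough way to read the "30B / 3B active" figure: weight memory scales with total parameters, while per-token compute scales with active parameters. The sketch below uses only the two numbers from this card. The 4-bit weight width and the rule of thumb of about 2 FLOPs per active parameter per token are assumptions for illustration, not measured values, and the function names are hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9


def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params


TOTAL_PARAMS = 30e9   # from the card: 30B total
ACTIVE_PARAMS = 3e9   # from the card: 3B active per token
ASSUMED_BITS = 4.0    # assumption: 4-bit quantized weights

# Memory is paid for every expert, because all weights must be resident.
moe_memory_gb = weight_memory_gb(TOTAL_PARAMS, ASSUMED_BITS)

# Compute is paid only for the parameters routed to on each token.
moe_flops = flops_per_token(ACTIVE_PARAMS)
dense_flops = flops_per_token(TOTAL_PARAMS)
compute_ratio = dense_flops / moe_flops
```

Under these assumptions the model needs roughly the memory of a 30B model but about a tenth of the per-token compute of a dense 30B model, which is the trade-off the card describes.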

Mistral Small 3.2 24B

24B

Best dense model in this tier for tool calling, reportedly on par with Llama 3.3 70B.

Gemma 3 27B

27B

Reported to beat Gemini 1.5 Pro, with the best Google ecosystem integration.

Qwen3-32B

32B

Strongest dense Qwen3 with hybrid thinking and 128K context.

Notable Picks

Honorable Mentions

Standout models worth watching — novel architectures, first-of-their-kind releases, and compact VLMs for edge vision tasks.

gpt-oss-20b

20B (MoE)

One of OpenAI's first open-weight language models since GPT-2. MXFP4-quantized MoE that fits in 16 GB; Apache 2.0 licensed.
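
The "fits in 16GB" claim can be sanity-checked with back-of-the-envelope arithmetic. MXFP4 stores 4-bit elements plus one shared 8-bit scale per block of 32 values, or 4.25 bits per weight. The sketch below assumes, for simplicity, that all 20B parameters are stored in MXFP4 and ignores KV cache and runtime overhead; real checkpoints may keep some tensors at higher precision, so treat the result as a rough estimate. The function names are hypothetical.

```python
ELEMENT_BITS = 4   # MXFP4 element width
SCALE_BITS = 8     # one shared scale per block
BLOCK_SIZE = 32    # elements sharing one scale


def mxfp4_bits_per_weight() -> float:
    """Effective storage per weight, including the amortized block scale."""
    return ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE


def mxfp4_weight_gb(n_params: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), assuming every
    parameter is MXFP4 (a simplifying assumption)."""
    return n_params * mxfp4_bits_per_weight() / 8 / 1e9


N_PARAMS = 20e9      # from the card: 20B
BUDGET_GB = 16.0     # from the card: 16 GB

estimated_gb = mxfp4_weight_gb(N_PARAMS)
headroom_gb = BUDGET_GB - estimated_gb
```

Under these assumptions the weights take about 10.6 GB, leaving a few gigabytes of the 16 GB budget for the KV cache and runtime.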

LFM2.5-1.2B-Instruct

1.2B

Liquid AI's edge-native architecture, reported as 2x faster than Qwen3 on CPU and optimized for embedded SoCs.

SmolVLM2-2.2B-Instruct

2.2B

Hugging Face's compact VLM for image understanding, with strong vision performance under 3B parameters.

Moondream2

1.8B