llmfit: what it is, what problem it solves & why it's gaining traction
What it solves
llmfit is a terminal tool designed to help users find the right-sized Large Language Model (LLM) for their specific hardware. It eliminates the guesswork of whether a model will fit in VRAM or run at acceptable speeds by automatically detecting system specs (RAM, CPU, GPU) and scoring models based on quality, speed, and fit.
How it works
The tool analyzes your hardware and compares it against a database of hundreds of models and providers. It calculates a composite score for each model, estimating the best quantization level and the expected tokens per second (tok/s) based on your system's bandwidth and memory. It supports multiple local runtime providers including Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio.
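As a rough illustration of the kind of estimate described above, the sketch below picks the highest-precision quantization whose weights fit in VRAM, derives a bandwidth-bound tok/s figure, and combines quality, speed, and fit into one score. It is not llmfit's actual code: the bits-per-weight values, the fixed overhead, the score weights, the `Hardware` type, and all function names are assumptions made for this example. It relies on the common rule of thumb that decoding is memory-bound, so tok/s is roughly memory bandwidth divided by the size of the weights.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Approximate bits per weight for a few quantization levels (illustrative values).
QUANT_BITS = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


@dataclass
class Hardware:
    vram_gb: float          # usable GPU memory in GB
    bandwidth_gbps: float   # memory bandwidth in GB/s


def weights_gb(params_b: float, bits: float) -> float:
    """Size of the weights in GB for a model with params_b billion parameters."""
    return params_b * bits / 8.0


def best_quant(params_b: float, hw: Hardware,
               overhead_gb: float = 1.5) -> Optional[Tuple[str, float]]:
    """Return (quant name, weights in GB) for the highest-precision level that fits.

    overhead_gb is a hypothetical fixed allowance for KV cache and runtime buffers.
    """
    for name, bits in sorted(QUANT_BITS.items(), key=lambda kv: kv[1], reverse=True):
        size = weights_gb(params_b, bits)
        if size + overhead_gb <= hw.vram_gb:
            return name, size
    return None  # nothing fits at any listed quantization


def est_tok_s(size_gb: float, hw: Hardware) -> float:
    """Upper-bound decode speed: each token reads all weights once."""
    return hw.bandwidth_gbps / size_gb


def composite_score(quality: float, tok_s: float, headroom: float,
                    target_tok_s: float = 30.0,
                    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted mix of quality, speed, and fit; quality and headroom are in [0, 1]."""
    w_quality, w_speed, w_fit = weights
    speed = min(tok_s / target_tok_s, 1.0)  # saturate once the target speed is met
    return w_quality * quality + w_speed * speed + w_fit * headroom
```

Under these assumptions, a 7B model on a 24 GB card with 1000 GB/s of bandwidth is assigned F16 (about 14 GB of weights) with an upper bound of roughly 71 tok/s. Measured throughput is typically lower.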
Who it’s for
It is for users running local LLMs who want to choose models that match their available hardware, and for those planning a hardware upgrade who want to see which models would become runnable.
Highlights
- Hardware Detection & Simulation: Automatically detects your system specs or allows you to simulate different hardware to see what would fit.
- Interactive TUI: A Vim-inspired terminal interface for searching, filtering, and comparing models.
- Community Leaderboard: Integrates with localmaxxing.com to show real-world performance data (tok/s, TTFT, VRAM) from other users with similar hardware.
- Plan Mode: Estimates the hardware requirements (VRAM/RAM/CPU) needed to run a specific model configuration.
- Download Manager: Built-in tools to manage model downloads and directory configurations.
- Live Inference Bench: Measures actual performance (TTFT, TPS) against locally running providers.
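The two metrics named in the Live Inference Bench highlight can be computed from any token stream. The sketch below is a minimal, provider-agnostic version and not llmfit's implementation: `measure_stream` and its `clock` parameter are hypothetical names, and the stream is assumed to be a plain Python iterable that yields tokens as they arrive.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter
                   ) -> Tuple[float, float, int]:
    """Return (TTFT in seconds, decode tokens per second, token count)."""
    start = clock()
    first = None
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = end - first
    # TPS covers tokens after the first, so prompt processing is excluded.
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return ttft, tps, count
```

Passing the clock as a parameter lets the function be tested with fixed timestamps; in real use the default `time.perf_counter` applies and `tokens` would be the streaming response of a locally running provider.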
Sources
- AlexsJones/llmfit