Ollama for Mac is one of the simplest ways to run large language models locally on your computer, and yes, it works natively on both Apple Silicon and Intel Macs. No cloud subscription, no API costs, no data leaving your machine. This guide covers installing Ollama on macOS, running models like Llama 3 and Mistral, and setting up a proper chat interface.
What is Ollama?
Ollama is an open-source tool that lets you download and run large language models directly on your own hardware. Think of it as a package manager for AI models: pick a model, run one command, and you’re chatting with it in seconds. It handles downloads, configuration, and serving automatically.
On Mac, Ollama runs as a lightweight background process in your menu bar and exposes a local API on port 11434. Apple Silicon Macs (M1, M2, M3, M4) get good performance because Ollama uses Metal to offload computation to the GPU. Intel Macs work too, but inference is slower, especially for larger models.
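Because the server listens on a fixed local port, you can verify it from Terminal. A minimal sketch, assuming the default port 11434 (the root endpoint replies "Ollama is running" when the app is up):

```shell
# Ping the local Ollama server; --max-time stops curl from hanging if it's down.
base_url="http://localhost:11434"
reply=$(curl -s --max-time 2 "$base_url" || echo "Ollama is not running")
echo "$reply"
```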
Supported models include Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, and dozens more from the Ollama model library. All models run fully offline once downloaded.
How to install Ollama on Mac
- Go to ollama.com and click “Download for macOS”
- Open the downloaded .zip file; it extracts the Ollama app
- Drag Ollama to your Applications folder
- Launch Ollama; a llama icon appears in your menu bar
- Open Terminal (Applications → Utilities → Terminal)
- Run your first model:
ollama run llama3
If you use Homebrew, you can also install via terminal:
brew install ollama
Then start the service with ollama serve. Either method works; the GUI installer is simpler for most people.
How to run AI models with Ollama on Mac
Once Ollama is running, every model follows the same pattern: ollama run [modelname]. The first run downloads the model, which takes a few minutes depending on size. After that, it loads from local storage.
Running Llama 3
In Terminal: ollama run llama3
Llama 3 is Meta’s open model and a reasonable default. The 8B version (about 4.7GB) runs on Macs with 8GB RAM. There’s also a 70B version, which needs 40GB+ of RAM. Once downloaded, type your message and press Enter.
Running Mistral
In Terminal: ollama run mistral
Mistral 7B is fast and works well for coding and general Q&A. It’s about 4.1GB and runs on any Mac with 8GB RAM. Many developers prefer it over Llama 3 for speed.
Running other models
A few more worth trying:
- ollama run gemma3: Google’s Gemma 3, good on reasoning tasks
- ollama run phi4: Microsoft’s Phi-4, small but capable
- ollama run deepseek-r1: DeepSeek’s reasoning model, strong for math and logic
- ollama run qwen2.5-coder: Alibaba’s coding-focused model
- ollama run llava: multimodal model that can analyze images
Browse the full list at ollama.com/library. Each model page shows file size and RAM requirements.
List downloaded models: ollama list. Remove a model: ollama rm modelname.
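Beyond the interactive chat, ollama run also accepts a one-shot prompt as an argument and prints the reply before exiting, which is handy for scripting. A small sketch, guarded so it degrades gracefully on a machine without Ollama:

```shell
prompt="Explain unified memory in one sentence."
if command -v ollama >/dev/null 2>&1; then
  # One-shot mode: prints the model's reply and exits instead of opening a chat.
  ollama run llama3 "$prompt"
else
  echo "ollama not installed"
fi
```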
Add a chat UI: Open WebUI for Ollama
The terminal works fine, but if you want a ChatGPT-style interface with conversation history, file uploads, and model switching, Open WebUI is the best option. It’s a browser-based frontend that connects to your local Ollama instance.
The easiest install method uses Docker. If you don’t have it, download from docker.com first.
- Make sure Ollama is running (llama icon in menu bar)
- Open Terminal and run:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
- Wait about a minute for the container to start
- Open your browser and go to http://localhost:3000
- Create a local account; no data goes anywhere external
- Pick a model from the dropdown and start chatting
If you’d rather skip Docker, Open WebUI also has a pip installer: pip install open-webui, then open-webui serve. The Docker method is more reliable.
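Once the container exists, ordinary Docker commands manage it. A few that come up often (the container name matches the docker run command above; the destructive lines are commented out so nothing is removed by accident):

```shell
container="open-webui"
# docker logs "$container"     # watch startup output
# docker stop "$container"     # stop the UI without deleting it
# To update: remove the container, pull the new image, then repeat the
# docker run command from the steps above.
# docker rm -f "$container" && docker pull ghcr.io/open-webui/open-webui:main
docker ps --filter "name=$container" 2>/dev/null || echo "docker not available"
```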
System requirements for Ollama on Mac
| Component | Minimum | Recommended |
|---|---|---|
| Chip | Intel Mac | Apple Silicon (M1/M2/M3/M4) |
| RAM | 8GB | 16GB+ |
| Storage | 10GB free | 50GB+ for multiple models |
| macOS | macOS 11 Big Sur | macOS 14 Sonoma or later |
RAM is the main constraint. A 7B-8B model typically needs about 5-6GB. A 13B model needs 8-10GB. A 70B model needs 40GB+, which limits it to high-end M2/M3/M4 Pro/Max/Ultra configs. If your Mac has 8GB RAM, stay with 7B and 8B models.
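Those figures follow from a rough rule of thumb rather than an official formula: at the 4-bit quantization most Ollama models ship with, weights take about half a gigabyte per billion parameters, plus a couple of gigabytes of overhead for the KV cache and runtime. Sketched in shell arithmetic:

```shell
params_b=8                      # parameter count in billions (Llama 3 8B)
weights_gb=$(( params_b / 2 ))  # ~0.5 bytes per parameter at 4-bit quantization
est_gb=$(( weights_gb + 2 ))    # add KV cache and runtime overhead
echo "Llama 3 ${params_b}B: budget roughly ${est_gb}GB of free RAM"
```

Plugging in 70 gives 37GB, in line with the 40GB+ figure above.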
Apple Silicon Macs are noticeably faster because the CPU and GPU share unified memory. An M1 Pro with 16GB RAM generates roughly 50-60 tokens per second with Llama 3 8B. An Intel Mac with the same RAM manages around 5-10 tokens per second.
Common issues and fixes
Model download stalls
Large models can be 4-8GB and may appear frozen on slow connections. Run ollama pull llama3 directly in Terminal to see a progress bar. If it genuinely stalls, press Ctrl+C and re-run; Ollama resumes from where it stopped.
Ollama not starting or menu bar icon missing
Open Activity Monitor and search “ollama”. If a process is running but there’s no icon, quit it and relaunch from Applications. If Ollama won’t launch at all, go to System Settings → Privacy & Security; macOS sometimes blocks apps from outside the App Store on first launch. Click “Open Anyway”.
Out of memory errors
If you see “model too large” or your system becomes sluggish during inference, the model needs more RAM than you have free. Switch to a smaller variant; use llama3:8b instead of llama3:70b, for example.
Port 11434 already in use
Ollama’s API runs on port 11434. If something else is using it, Ollama fails silently. Check with lsof -i :11434 in Terminal. If Ollama is already running in the background, quit it from the menu bar first, then relaunch.
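If the conflicting process is something you can’t stop, you can also move Ollama to another port via the OLLAMA_HOST environment variable; the port number below is just an example. A sketch:

```shell
port=11434
# See what currently owns the default port (prints nothing if it's free):
lsof -i :"$port" 2>/dev/null || true
# Or run the server somewhere else entirely; clients must then use this address:
# OLLAMA_HOST=127.0.0.1:11500 ollama serve
echo "default Ollama port: $port"
```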
FAQ
Is Ollama free on Mac?
Yes. Ollama is open source under the MIT license. All models in the official library are also free. No subscriptions, no API limits, no usage fees.
Does Ollama work on older Intel Macs?
Yes, on macOS 11 Big Sur or later. Expect 3-10 tokens per second on a 7B model compared to 50+ on M-series chips. Smaller models like Phi-4 mini or Gemma 3 2B run better on older hardware.
Is my data private?
Yes. Everything runs locally. No prompts or responses are sent to any server. This is why a lot of people use Ollama: full privacy for sensitive data or confidential work.
Can I use Ollama as an API for my own apps?
Yes. Ollama exposes a REST API at http://localhost:11434, including OpenAI-compatible endpoints under /v1. You can call it from Python, JavaScript, or any HTTP client. Many tools designed for the OpenAI API work with it; just change the base URL.
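As a concrete sketch, here is a non-streaming call to the native /api/generate endpoint with curl, assuming the default port and that llama3 is already pulled (the OpenAI-compatible routes live under /v1, e.g. /v1/chat/completions):

```shell
payload='{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
# POST to the native endpoint; the reply is a single JSON object with a
# "response" field. Guarded so the command degrades if the server is down.
curl -s --max-time 5 -d "$payload" http://localhost:11434/api/generate \
  || echo "request failed (is Ollama running?)"
```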
How do I update models?
Run ollama pull modelname. Ollama only downloads changed layers, so updates are usually faster than the initial download.
Can Ollama run multiple models at once?
Yes, though each loaded model uses RAM. Ollama unloads models that haven’t been used recently. You can adjust this with the OLLAMA_KEEP_ALIVE environment variable.
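By default a model is unloaded after roughly five minutes of inactivity. To keep it resident longer for the macOS menu-bar app, you set the variable with launchctl and restart Ollama; the value below is just an example, and the commands that touch the system are commented out:

```shell
keep_alive="1h"   # example value: keep loaded models in memory for an hour
# For the menu-bar app (applies to apps launched afterwards, so restart Ollama):
# launchctl setenv OLLAMA_KEEP_ALIVE "$keep_alive"
# If you run the server manually in Terminal instead:
# OLLAMA_KEEP_ALIVE="$keep_alive" ollama serve
echo "OLLAMA_KEEP_ALIVE=${keep_alive}"
```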
Running AI locally on a Mac has gotten genuinely accessible. Ollama handles most of the setup complexity, and Apple Silicon’s unified memory makes it fast enough for real use. If you’re also running AI tools on Windows, take a look at our guides on ChatGPT for PC, Claude AI for PC, and Stable Diffusion for PC. For a local model runner with a built-in GUI, our LM Studio for Mac guide covers a solid alternative to Ollama.