Ollama for Mac is one of the simplest ways to run large language models locally on your computer, and yes, it works natively on both Apple Silicon and Intel Macs. No cloud subscription, no API costs, no data leaving your machine. This guide covers installing Ollama on macOS, running models like Llama 3 and Mistral, and setting up a proper chat interface.
What is Ollama?
Ollama is an open-source tool that lets you download and run large language models directly on your own hardware. Think of it as a package manager for AI models: pick a model, run one command, and you’re chatting with it in seconds. It handles downloads, configuration, and serving automatically.
On Mac, Ollama runs as a lightweight background process in your menu bar and exposes a local API on port 11434. Apple Silicon Macs (M1, M2, M3, M4) get good performance because Ollama uses Metal to offload computation to the GPU. Intel Macs work too, but inference is slower, especially for larger models.
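Because the server listens on a fixed local port, you can verify it from Terminal. A minimal sketch, assuming the default port 11434 (the root endpoint replies "Ollama is running" when the app is up):

```shell
# Ping the local Ollama server; --max-time stops curl from hanging if it's down.
base_url="http://localhost:11434"
reply=$(curl -s --max-time 2 "$base_url" || echo "Ollama is not running")
echo "$reply"
```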
Supported models include Llama 3, Mistral, Gemma, Phi, Qwen, DeepSeek, and dozens more from the Ollama model library. All models run fully offline once downloaded.
How to install Ollama on Mac
- Go to ollama.com and click “Download for macOS”
- Open the downloaded .zip file; it extracts the Ollama app
- Drag Ollama to your Applications folder
- Launch Ollama; a llama icon appears in your menu bar
- Open Terminal (Applications → Utilities → Terminal)
- Run your first model:
ollama run llama3
If you use Homebrew, you can also install via terminal:
brew install ollama
Then start the service with ollama serve. Either method works; the GUI installer is simpler for most people.
How to run AI models with Ollama on Mac
Once Ollama is running, every model follows the same pattern: ollama run [modelname]. The first run downloads the model, which takes a few minutes depending on size. After that, it loads from local storage.
Running Llama 3
In Terminal: ollama run llama3
Llama 3 is Meta’s open model and a reasonable default. The 8B version (about 4.7GB) runs on Macs with 8GB RAM. There’s also a 70B version, which needs 40GB+ of RAM. Once downloaded, type your message and press Enter.
Running Mistral
In Terminal: ollama run mistral
Mistral 7B is fast and works well for coding and general Q&A. It’s about 4.1GB and runs on any Mac with 8GB RAM. Many developers prefer it over Llama 3 for speed.
Running other models
A few more worth trying:
- ollama run gemma3: Google’s Gemma 3, good on reasoning tasks
- ollama run phi4: Microsoft’s Phi-4, small but capable
- ollama run deepseek-r1: DeepSeek’s reasoning model, strong for math and logic
- ollama run qwen2.5-coder: Alibaba’s coding-focused model
- ollama run llava: multimodal model that can analyze images
Browse the full list at ollama.com/library. Each model page shows file size and RAM requirements.
List downloaded models: ollama list. Remove a model: ollama rm modelname.
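Beyond the interactive chat, ollama run also accepts a one-shot prompt as an argument and prints the reply before exiting, which is handy for scripting. A small sketch, guarded so it degrades gracefully on a machine without Ollama:

```shell
prompt="Explain unified memory in one sentence."
if command -v ollama >/dev/null 2>&1; then
  # One-shot mode: prints the model's reply and exits instead of opening a chat.
  ollama run llama3 "$prompt"
else
  echo "ollama not installed"
fi
```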
Add a chat UI: Open WebUI for Ollama
The terminal works fine, but if you want a ChatGPT-style interface with conversation history, file uploads, and model switching, Open WebUI is the best option. It’s a browser-based frontend that connects to your local Ollama instance.
The easiest install method uses Docker. If you don’t have it, download from docker.com first.
- Make sure Ollama is running (llama icon in menu bar)
- Open Terminal and run:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
- Wait about a minute for the container to start
- Open your browser and go to http://localhost:3000
- Create a local account; no data goes anywhere external
- Pick a model from the dropdown and start chatting
If you’d rather skip Docker, Open WebUI also has a pip installer: pip install open-webui, then open-webui serve. The Docker method is more reliable.
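Once the container exists, ordinary Docker commands manage it. A few that come up often (the container name matches the docker run command above; the destructive lines are commented out so nothing is removed by accident):

```shell
container="open-webui"
# docker logs "$container"     # watch startup output
# docker stop "$container"     # stop the UI without deleting it
# To update: remove the container, pull the new image, then repeat the
# docker run command from the steps above.
# docker rm -f "$container" && docker pull ghcr.io/open-webui/open-webui:main
docker ps --filter "name=$container" 2>/dev/null || echo "docker not available"
```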
System requirements for Ollama on Mac
| Component | Minimum | Recommended |
|---|---|---|
| Chip | Intel Mac | Apple Silicon (M1/M2/M3/M4) |
| RAM | 8GB | 16GB+ |
| Storage | 10GB free | 50GB+ for multiple models |
| macOS | macOS 11 Big Sur | macOS 14 Sonoma or later |
RAM is the main constraint. A 7B-8B model typically needs about 5-6GB. A 13B model needs 8-10GB. A 70B model needs 40GB+, which limits it to high-end M2/M3/M4 Pro/Max/Ultra configs. If your Mac has 8GB RAM, stay with 7B and 8B models.
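Those figures follow from a rough rule of thumb rather than an official formula: at the 4-bit quantization most Ollama models ship with, weights take about half a gigabyte per billion parameters, plus a couple of gigabytes of overhead for the KV cache and runtime. Sketched in shell arithmetic:

```shell
params_b=8                      # parameter count in billions (Llama 3 8B)
weights_gb=$(( params_b / 2 ))  # ~0.5 bytes per parameter at 4-bit quantization
est_gb=$(( weights_gb + 2 ))    # add KV cache and runtime overhead
echo "Llama 3 ${params_b}B: budget roughly ${est_gb}GB of free RAM"
```

Plugging in 70 gives 37GB, in line with the 40GB+ figure above.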
Apple Silicon Macs are noticeably faster because the CPU and GPU share unified memory. An M1 Pro with 16GB RAM generates roughly 50-60 tokens per second with Llama 3 8B. An Intel Mac with the same RAM manages around 5-10 tokens per second.
Common issues and fixes
Model download stalls
Large models can be 4-8GB and may appear frozen on slow connections. Run ollama pull llama3 directly in Terminal to see a progress bar. If it genuinely stalls, press Ctrl+C and re-run; Ollama resumes from where it stopped.
Ollama not starting or menu bar icon missing
Open Activity Monitor and search “ollama”. If a process is running but there’s no icon, quit it and relaunch from Applications. If Ollama won’t launch at all, go to System Settings → Privacy & Security; macOS sometimes blocks apps from outside the App Store on first launch. Click “Open Anyway”.
Out of memory errors
If you see “model too large” or your system becomes sluggish during inference, the model needs more RAM than you have free. Switch to a smaller variant; use llama3:8b instead of llama3:70b, for example.
Port 11434 already in use
Ollama’s API runs on port 11434. If something else is using it, Ollama fails silently. Check with lsof -i :11434 in Terminal. If Ollama is already running in the background, quit it from the menu bar first, then relaunch.
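If the conflicting process is something you can’t stop, you can also move Ollama to another port via the OLLAMA_HOST environment variable; the port number below is just an example. A sketch:

```shell
port=11434
# See what currently owns the default port (prints nothing if it's free):
lsof -i :"$port" 2>/dev/null || true
# Or run the server somewhere else entirely; clients must then use this address:
# OLLAMA_HOST=127.0.0.1:11500 ollama serve
echo "default Ollama port: $port"
```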
FAQ
Is Ollama free on Mac?
Yes. Ollama is open source under the MIT license. All models in the official library are also free. No subscriptions, no API limits, no usage fees.
Does Ollama work on older Intel Macs?
Yes, on macOS 11 Big Sur or later. Expect 3-10 tokens per second on a 7B model compared to 50+ on M-series chips. Smaller models like Phi-4 mini or Gemma 3 2B run better on older hardware.
Is my data private?
Yes. Everything runs locally. No prompts or responses are sent to any server. This is why a lot of people use Ollama: full privacy for sensitive data or confidential work.
Can I use Ollama as an API for my own apps?
Yes. Ollama exposes a REST API at http://localhost:11434, including OpenAI-compatible endpoints under /v1. You can call it from Python, JavaScript, or any HTTP client. Many tools designed for the OpenAI API work with it; just change the base URL.
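As a concrete sketch, here is a non-streaming call to the native /api/generate endpoint with curl, assuming the default port and that llama3 is already pulled (the OpenAI-compatible routes live under /v1, e.g. /v1/chat/completions):

```shell
payload='{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
# POST to the native endpoint; the reply is a single JSON object with a
# "response" field. Guarded so the command degrades if the server is down.
curl -s --max-time 5 -d "$payload" http://localhost:11434/api/generate \
  || echo "request failed (is Ollama running?)"
```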
How do I update models?
Run ollama pull modelname. Ollama only downloads changed layers, so updates are usually faster than the initial download.
Can Ollama run multiple models at once?
Yes, though each loaded model uses RAM. Ollama unloads models that haven’t been used recently. You can adjust this with the OLLAMA_KEEP_ALIVE environment variable.
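By default a model is unloaded after roughly five minutes of inactivity. To keep it resident longer for the macOS menu-bar app, you set the variable with launchctl and restart Ollama; the value below is just an example, and the commands that touch the system are commented out:

```shell
keep_alive="1h"   # example value: keep loaded models in memory for an hour
# For the menu-bar app (applies to apps launched afterwards, so restart Ollama):
# launchctl setenv OLLAMA_KEEP_ALIVE "$keep_alive"
# If you run the server manually in Terminal instead:
# OLLAMA_KEEP_ALIVE="$keep_alive" ollama serve
echo "OLLAMA_KEEP_ALIVE=${keep_alive}"
```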
Running AI locally on a Mac has gotten genuinely accessible. Ollama handles most of the setup complexity, and Apple Silicon’s unified memory makes it fast enough for real use. If you’re also running AI tools on Windows, take a look at our guides on ChatGPT for PC, Claude AI for PC, and Stable Diffusion for PC. For a local model runner with a built-in GUI, our LM Studio for Mac guide covers a solid alternative to Ollama.