Cloud (Preview)

Run larger models, faster, using Ollama's cloud
$20/mo
Ollama's cloud lets you:
- Speed up model inference
  Run models using datacenter-grade hardware, returning responses much faster.
- Run larger models
  Upgrade to the newest hardware, making it possible to run larger models.
- Privacy first
  Ollama does not retain your data to ensure privacy and security.
- Save battery life
  Take the load of running models off your Mac, Windows, or Linux computer, giving you performance back for your other apps.
Frequently asked questions
- What is Ollama's cloud?
  Ollama's cloud is a new way to run open models using datacenter-grade hardware. Many new models are too large to fit on widely available GPUs, or run very slowly. Ollama's cloud provides a way to run these models fast while using Ollama's App, CLI, and API.
- Does Ollama's cloud work with Ollama's CLI?
  Yes! See the docs for more information; there is also a short usage sketch after this list.
- Does Ollama's cloud work with Ollama's API and JavaScript/Python libraries?
  Yes! See the docs for more information; a hedged Python example follows this list.
- What data do you retain in Ollama's cloud?
  Ollama does not log or retain any queries.
- Where is the hardware that powers Ollama's cloud located?
  All hardware is located in the United States.
- What are the usage limits for Ollama's cloud?
  Ollama's cloud includes hourly and daily limits to avoid capacity issues. Usage-based pricing will be available soon, allowing models to be consumed in a metered fashion.
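
To make the CLI and library answers above concrete, here is a minimal sketch using the official Python library. The host URL, the Bearer-token header, and the "-cloud" model tag are assumptions for illustration, not confirmed values; the docs are the authority on the actual cloud endpoint, available cloud model names, and how API keys are issued. On the CLI side, the docs describe running a cloud model with the same "ollama run" command used for local models.

    # A hedged sketch: chat with a cloud-hosted model via the official Python library.
    # pip install ollama
    from ollama import Client

    # Assumptions for illustration only: the cloud host, the API key header, and the
    # "-cloud" model tag are placeholders; check the docs for the real values.
    client = Client(
        host="https://ollama.com",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    )

    # Same chat call shape as when talking to a local Ollama server.
    response = client.chat(
        model="gpt-oss:120b-cloud",
        messages=[{"role": "user", "content": "Why can cloud inference be faster?"}],
    )
    print(response["message"]["content"])

The JavaScript library follows the same chat(model, messages) shape; again, see the docs for cloud-specific configuration.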