AI models

The models behind Jarvis: on-device transcription for the ears, and a brain you choose (Claude, OpenAI, Gemini, or a local model). What runs locally, and what needs your own key.

Jarvis uses two kinds of model: the ears that turn your speech into text, and the brain that reasons and acts.

The ears: transcription

Speech-to-text runs on-device by default and needs no internet connection. Smaller models are faster and larger ones are more accurate, so you pick the trade-off. If you want a cloud transcription service for extra accuracy, add your own API key.

The brain: reasoning and action

For agent work, choose the model that does the thinking:

  • Cloud: Claude, OpenAI, or Gemini, using your own API key.
  • Local: run an on-device model for fully offline, fully private operation.

Picking a trade-off

  • Local models keep everything on your Mac and work offline.
  • Cloud models are generally more capable on hard, multi-step work, but they need an internet connection and your own API key.
  • You can mix: on-device transcription with a cloud brain, or all local.
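The combinations above can be sketched as a small model of what each setup implies. This is an illustration only: the names Location, Setup, needs_api_key, and works_offline are hypothetical and do not reflect Jarvis's actual settings or API. It assumes that any cloud component requires your own key, and that only an all-local setup works with no internet.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    LOCAL = "local"  # runs on-device
    CLOUD = "cloud"  # calls a provider using your own API key


@dataclass(frozen=True)
class Setup:
    """Hypothetical pairing of brain and ears; not Jarvis's real settings format."""
    brain: Location                  # the model that reasons and acts
    ears: Location = Location.LOCAL  # transcription is on-device by default

    def needs_api_key(self) -> bool:
        # Assumption: any cloud component is used with a key you supply.
        return Location.CLOUD in (self.brain, self.ears)

    def works_offline(self) -> bool:
        # Assumption: only an all-local setup runs with no internet.
        return self.brain is Location.LOCAL and self.ears is Location.LOCAL


# Mixed: on-device transcription with a cloud brain.
mixed = Setup(brain=Location.CLOUD)

# All local: nothing leaves the machine.
all_local = Setup(brain=Location.LOCAL)
```

Under these assumptions, the mixed setup needs a key and an internet connection for the brain only, while the all-local setup needs neither.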