xorbitsai/inference

A unified, production‑ready inference framework (also called Xinference) that lets you serve and run large language models, speech recognition, and multimodal models locally, on the cloud, or on‑premises through a single API supporting a variety of hardware, model types, and deployment modes.