Python Package · MIT License · v2.0+

One API for
All LLMs.
Zero Lock-In.

PlugLLM gives you a single, clean Python interface for 13+ AI providers — OpenAI, Gemini, Groq, Claude, Mistral, Ollama, and more. Built-in memory, streaming, and async. Switch providers in one line.

$ pip install plugllm
— GitHub Stars
13+ Providers
4 Core Methods
<5K Lines of Code
demo.py — PlugLLM · python
from plugllm import ChatOpenAI, ChatGroq, LLMFactory

# ── Direct provider usage ──────────────────
llm = ChatOpenAI(api_key="sk-...", model="gpt-4o")
reply = llm.ask("What is machine learning?")
print(reply)

# ── Factory: switch providers in 1 line ───
llm = LLMFactory.create(
    "groq", api_key="gsk_...",
    model="llama-3.3-70b-versatile"
)
print(llm.generate("Explain AI in one sentence"))

# ── Built-in session memory ────────────────
chat = ChatGroq(api_key="gsk_...", max_history=10)
chat.chat("My name is Alice", session_id="u1")
response = chat.chat("What's my name?", session_id="u1")
# → "Your name is Alice!"

# ── Real-time streaming ────────────────────
for chunk in llm.stream("Tell me a story"):
    print(chunk, end="", flush=True)
Running · plugllm 2.0
Python 3.9+
Supported Providers
🤖 OpenAI 💎 Google Gemini ⚡ Groq 🧠 Anthropic Claude 🌟 Mistral AI 🏠 Ollama (Local) 🔵 DeepSeek 🌊 Cohere 🐉 Alibaba Qwen 🌙 Moonshot Kimi ⚙️ xAI Grok 🦙 Meta Llama 🇮🇳 SarvamAI
✦ Why PlugLLM

Everything you need.
Nothing you don't.

A focused, production-ready library that does one thing exceptionally well: unified LLM access.

🔌
Unified API
Identical .generate(), .chat(), .stream(), and .ask() methods across every provider. Swap OpenAI for Groq with a one-line change.
🧠
Built-in Conversation Memory
Native sliding-window deque memory per session ID. Configure max_history=20 and forget about context management.
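The sliding-window behaviour can be sketched in plain Python with `collections.deque` — an illustrative model of per-session bounded memory, not PlugLLM's actual internals:

```python
from collections import deque

class SessionMemory:
    """Illustrative sliding-window memory keyed by session ID."""

    def __init__(self, max_history: int = 20):
        self.max_history = max_history
        self.sessions: dict[str, deque] = {}

    def append(self, session_id: str, role: str, content: str) -> None:
        # A bounded deque silently drops the oldest message once full.
        window = self.sessions.setdefault(
            session_id, deque(maxlen=self.max_history)
        )
        window.append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        return list(self.sessions.get(session_id, []))

memory = SessionMemory(max_history=2)
memory.append("u1", "user", "My name is Alice")
memory.append("u1", "assistant", "Hi Alice!")
memory.append("u1", "user", "What's my name?")  # evicts the oldest entry
print(len(memory.history("u1")))  # → 2
```

Because the deque is bounded, the window stays a fixed size with no explicit trimming code.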
🌊
Real-time Streaming
Sync and async streaming out of the box. for chunk in llm.stream(prompt) just works for every provider — perfect for chat UIs.
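The consumption pattern is the same regardless of provider. A self-contained sketch, using a stand-in generator in place of a real `llm.stream()` call:

```python
from typing import Iterator

def fake_stream(text: str) -> Iterator[str]:
    """Stand-in for llm.stream(): yields the reply in small chunks."""
    for word in text.split():
        yield word + " "

# Accumulate chunks as they arrive; a real UI would render each one.
reply = ""
for chunk in fake_stream("Streaming keeps UIs responsive"):
    reply += chunk  # in a real app: print(chunk, end="", flush=True)
print(reply.strip())
```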
🔄
Full Async Support
Every synchronous method has an async counterpart — agenerate(), achat(), astream(), aask(). Compatible with asyncio, FastAPI, and any async framework.
python · async
import asyncio
from plugllm import ChatGemini

async def main():
    llm = ChatGemini()
    result = await llm.agenerate("Hello!")

    async for chunk in llm.astream("Tell a story"):
        print(chunk, end="", flush=True)

asyncio.run(main())
๐Ÿญ
LLMFactory & Fluent API
Create any provider dynamically with LLMFactory.create("groq", ...). Chain with .with_system(), .with_temperature(0.7), .call().
🔒
Production Error Handling
Typed exception hierarchy: AuthenticationError, RateLimitError, HTTPStatusError. Built-in retries, configurable timeouts, structured ChatResponse.
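A typed hierarchy lets callers catch narrowly (one failure mode) or broadly (any library error). A minimal sketch of the pattern — the class names mirror those above, but this is an illustration, not PlugLLM's actual definitions:

```python
class PlugLLMError(Exception):
    """Base class: catching this handles any library error."""

class AuthenticationError(PlugLLMError):
    """Raised for invalid or missing API keys."""

class RateLimitError(PlugLLMError):
    """Raised when the provider throttles requests."""

class HTTPStatusError(PlugLLMError):
    """Carries the HTTP status code alongside the message."""
    def __init__(self, status_code: int, message: str):
        super().__init__(message)
        self.status_code = status_code

# Narrow handlers run first; the base class is the catch-all.
try:
    raise RateLimitError("429 Too Many Requests")
except AuthenticationError:
    handled = "auth"
except PlugLLMError as e:
    handled = f"generic: {e}"
print(handled)  # → "generic: 429 Too Many Requests"
```

Because every specific error subclasses the base, a single `except PlugLLMError` is always a safe fallback.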
✦ 13+ Providers

Every major LLM.
One import.

From cloud giants to local models — all accessible through a single unified interface.

Provider · Class · Links
🤖 OpenAI
ChatOpenAI · Get Key ↗
💎 Google Gemini
ChatGemini · Get Key ↗ · Free
⚡ Groq
ChatGroq · Get Key ↗ · Free
🧠 Anthropic Claude
ChatClaude · Get Key ↗
🌟 Mistral AI
ChatMistral · Get Key ↗ · Free
⚙️ xAI Grok
ChatGrok · Get Key ↗
🔵 DeepSeek
ChatDeepSeek · Get Key ↗
🏠 Ollama
ChatOllama · Install ↗ · Local
🌊 Cohere
ChatCohere · Get Key ↗
🐉 Alibaba Qwen
ChatQwen · Get Key ↗
🌙 Moonshot Kimi
ChatKimi · Get Key ↗
🦙 Meta Llama
ChatLlama · Flexible
🇮🇳 SarvamAI
ChatSarvam · Get Key ↗ · Free
✦ Comparison

PlugLLM vs the rest.

Focused simplicity beats heavyweight orchestration for most production AI apps.

Capability · PlugLLM
Switch provider in 1 line — ✓ Yes: change class or factory string
Built-in memory — ✓ Native deque, per-session
Streaming (sync + async) — ✓ Both sync & async, unified
Package footprint — ✓ <5K LoC, ultra-light
Learning curve — ✓ Minutes
✦ Tutorial

From zero to production
in 5 steps.

A complete, hands-on guide covering installation through advanced async patterns.

1
Install & Configure API Keys
Install the package and set your provider API keys as environment variables or pass them directly.
bash · installation
# Install PlugLLM from PyPI
pip install plugllm

# Optional: with dev dependencies
pip install "plugllm[dev]"

# Set your API keys
export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="AIza..."
export GROQ_API_KEY="gsk_..."
export ANTHROPIC_API_KEY="sk-ant-..."
2
Basic Generation & Q&A
Four core methods work identically across all providers. Mix and match freely.
python · basic usage
from plugllm import ChatOpenAI, ChatGroq, LLMFactory

# .generate() — simple text completion
llm = ChatOpenAI(model="gpt-4o")
text = llm.generate("What is the difference between AI and ML?")

# .ask() — Q&A with optional system prompt
answer = llm.ask(
    "Explain transformers architecture",
    system_prompt="You are a senior ML engineer. Be concise.",
    max_tokens=300
)
print(answer.content)   # answer is a ChatResponse
print(answer.model)     # "gpt-4o"

# LLMFactory — dynamic provider switching
llm2 = LLMFactory.create("gemini", model="gemini-2.5-flash")
result = llm2.generate("Summarize the Transformer paper")

# Fluent chaining interface
response = (
    ChatGroq()
    .with_system("You are a Python expert.")
    .with_temperature(0.2)
    .call("Write a binary search function")
)
3
Session-based Conversation Memory
Built-in sliding-window memory per session ID. No external storage required.
python · session memory
from plugllm import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", max_history=20)

# Multi-user session isolation
llm.chat("My name is Alice, I live in Paris", session_id="alice")
llm.chat("My favourite language is Python",   session_id="alice")
llm.chat("I'm Bob, a data scientist in Tokyo", session_id="bob")

# Each session has completely isolated memory
print(llm.chat("Where do I live?",    session_id="alice"))
# → "You live in Paris."

print(llm.chat("What do I work with?", session_id="bob"))
# → "You work as a data scientist."

# Clear memory for a session
llm.clear_history(session_id="alice")
history = llm.get_history(session_id="bob")
4
Real-time Token Streaming
Stream responses token-by-token for live UIs, terminals, and chat applications.
python · streaming
from plugllm import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")

# Synchronous streaming
for chunk in llm.stream("Write a short poem about Python"):
    print(chunk, end="", flush=True)

# Stream with system prompt
for chunk in llm.stream(
    "Explain async/await in Python",
    system_prompt="You are a concise technical writer."
):
    print(chunk, end="", flush=True)
5
Async & High-Concurrency
Run parallel LLM calls with asyncio โ€” ideal for batch processing and high-throughput APIs.
python · asyncio
import asyncio
from plugllm import ChatGemini, ChatGroq, ChatOpenAI

async def parallel_summarize(text: str) -> dict:
    providers = {
        "GPT-4o":  ChatOpenAI(model="gpt-4o-mini"),
        "Gemini":  ChatGemini(model="gemini-2.5-flash"),
        "Llama":   ChatGroq(model="llama-3.3-70b-versatile"),
    }
    prompt = f"Summarize in 2 sentences:\n{text}"
    tasks = {name: llm.agenerate(prompt) for name, llm in providers.items()}
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))

async def main():
    summaries = await parallel_summarize(
        "Transformer architecture uses self-attention..."
    )
    for provider, summary in summaries.items():
        print(f"\n[{provider}]\n{summary}")

asyncio.run(main())
✦ Real-World Projects

Build something real.

Three complete production-ready projects you can run right now.

PROJECT 01
Multi-LLM Content Summarizer
Benchmark multiple providers simultaneously and compare their summaries side by side.
OpenAI Groq Gemini
python
from plugllm import ChatOpenAI, ChatGroq, ChatGemini

def multi_summarize(text: str, max_tokens: int = 120):
    providers = {
        "GPT-4o": ChatOpenAI(model="gpt-4o-mini"),
        "Groq":   ChatGroq(model="llama-3.3-70b-versatile"),
        "Gemini": ChatGemini(model="gemini-2.5-flash"),
    }
    results = {}
    prompt = f"Summarize this concisely:\n\n{text}"
    for name, llm in providers.items():
        r = llm.ask(prompt, max_tokens=max_tokens)
        results[name] = {"summary": r.content, "tokens": r.usage}
    return results
PROJECT 02
Customer Support Bot
A stateful support bot with per-user session memory and sentiment-aware escalation.
Mistral Session Memory Stateful
python
from plugllm import ChatMistral

SYSTEM = """You are a helpful support agent.
Be empathetic, solution-focused, and brief."""

class SupportBot:
    def __init__(self):
        self.llm = ChatMistral(
            model="mistral-large-latest",
            max_history=15,
        )

    def handle(self, user_id: str, message: str) -> str:
        return self.llm.chat(
            message, session_id=user_id,
            system_prompt=SYSTEM,
        )

    def reset(self, user_id: str):
        self.llm.clear_history(session_id=user_id)
PROJECT 03
Local Code Assistant
Fully offline code explainer and reviewer using Ollama — no API key, no cloud, no cost.
Ollama 100% Local Free
python
from plugllm import ChatOllama

# Requires: ollama pull deepseek-coder
assistant = ChatOllama(model="deepseek-coder:6.7b")

SYSTEM = """You are an expert code reviewer.
Explain clearly, identify bugs, and suggest
improvements with specific line references."""

def explain_code(code: str) -> str:
    return assistant.ask(
        f"Explain and review:\n\n```python\n{code}\n```",
        system_prompt=SYSTEM,
    ).content

def review_code(code: str) -> str:
    return assistant.ask(
        f"Find bugs and improvements:\n\n```python\n{code}\n```",
        system_prompt=SYSTEM,
    ).content
✦ API Reference

Complete API documentation.

Every method, parameter, and return type — all in one place.

⚡ Core Methods
generate() — Basic text completion. Returns str.
chat() — Memory-aware conversation. Returns ChatResponse.
ask() — Q&A with optional system prompt. Returns ChatResponse.
stream() — Token streaming iterator. Yields str chunks.
agenerate() — Async version of generate().
achat() — Async version of chat().
astream() — Async generator; yields str chunks.
aask() — Async version of ask().
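One common way a library pairs each sync method with an async counterpart is to delegate the blocking call to a worker thread. An illustrative sketch of that pattern (not necessarily how PlugLLM implements it):

```python
import asyncio

class Client:
    def generate(self, prompt: str) -> str:
        # Blocking network call in a real client; a stub here.
        return f"echo: {prompt}"

    async def agenerate(self, prompt: str) -> str:
        # Run the blocking sync method without stalling the event loop.
        return await asyncio.to_thread(self.generate, prompt)

print(asyncio.run(Client().agenerate("hi")))  # → "echo: hi"
```

This keeps the sync and async APIs behaviourally identical, since both ultimately execute the same code path.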
โš™๏ธ Configuration Parameters
api_keystrProvider API key (or env var)
modelstrModel name/version string
temperaturefloatCreativity 0.0โ€“2.0 (default: 0.7)
max_tokensintMax output tokens (default: 1024)
max_historyintMemory window size (default: 10)
timeoutintRequest timeout in seconds
max_retriesintRetry count on failure (default: 3)
top_pfloatNucleus sampling probability
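The max_retries setting typically means "retry up to N times after the first failure, then re-raise." A plain-Python sketch of that behaviour with exponential backoff — illustrative only, not PlugLLM's actual retry code:

```python
import time

def call_with_retries(fn, max_retries: int = 3, base_delay: float = 0.01):
    """Retry fn up to max_retries times after the initial attempt."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

attempts = []
def flaky():
    """Fails twice, then succeeds — simulates a transient provider error."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"

print(call_with_retries(flaky))  # → "ok" after 2 failed attempts
```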
📦 ChatResponse Object
python
response = llm.ask("What is Python?")

response.content       # str — the text response
response.model         # str — model used
response.usage         # dict — token counts
response.raw_response  # dict — full API response
response.finish_reason # str — "stop", "length"

# Message factory helpers
from plugllm.types import Message
msg = Message.user("Hello!")
msg = Message.assistant("Hi there!")
msg = Message.system("You are an expert.")
🔒 Error Handling
python
from plugllm import ChatOpenAI
from plugllm.types import (
    AuthenticationError,
    RateLimitError,
    HTTPStatusError,
    PlugLLMError,
)

llm = ChatOpenAI(max_retries=3)

try:
    response = llm.generate("Hello!")

except AuthenticationError as e:
    print(f"Invalid API key: {e}")

except RateLimitError as e:
    print(f"Rate limited: {e}")

except HTTPStatusError as e:
    print(f"HTTP {e.status_code}: {e}")

except PlugLLMError as e:
    print(f"General error: {e}")

Start building today.

One command to install. One import to access every major LLM. Zero vendor lock-in.

$ pip install plugllm