🔌 PlugLLM Documentation

📘 Version 2.0.0 • One API to rule them all — Unified interface for 13+ LLM providers

📖 Overview

PlugLLM provides a unified interface for interacting with multiple Large Language Model providers. It abstracts away provider-specific implementation details, offering consistent methods for generation, chat, and streaming across all supported providers.

// ES Module imports
import { ChatOpenAI, Message, ChatResponse } from 'plugllm';
Supported Providers

Provider   Default Model
OpenAI     gpt-4o
Google     gemini-2.0-flash
Groq       llama-3.3-70b
Anthropic  claude-sonnet-4-5
xAI        grok-3-mini
Mistral    mistral-large
Meta       Llama-4-Maverick
DeepSeek   deepseek-chat
Alibaba    qwen-plus
Moonshot   moonshot-v1-8k
Cohere     command-a-03-2025
SarvamAI   sarvam-2b-v0.5
Ollama     gemma3 (local)

📦 Installation

npm install plugllm
# or
yarn add plugllm
# or
pnpm add plugllm

🧠 Core Concepts

BaseLLM

BaseLLM is an abstract class that all provider implementations extend. It defines the standard interface and shared functionality.

Properties

Property     Type    Description
model        string  The model identifier being used
temperature  number  Sampling temperature (0-2)
maxTokens    number  Maximum tokens to generate
maxHistory   number  Maximum messages retained in conversation history
apiKey       string  API key for the provider

Methods

Method                                  Return Type             Description
generate(prompt, kwargs)                Promise<ChatResponse>   Generate a response from a prompt
stream(prompt, kwargs)                  AsyncGenerator<string>  Stream a response chunk by chunk
chat(message, options, kwargs)          Promise<ChatResponse>   Continue a conversation with memory
ask(userPrompt, options, kwargs)        Promise<ChatResponse>   Simple Q&A with an optional system prompt
askStream(userPrompt, options, kwargs)  AsyncGenerator<string>  Stream Q&A responses
getConversationHistory(sessionId)       Message[]               Retrieve conversation history
clearConversation(sessionId)            void                    Clear history (preserves the system message)
resetConversation(sessionId)            void                    Full reset, including the system message
setSystemMessage(message, sessionId)    void                    Set the system prompt for a session

ChatResponse

Standardized response object returned by all generation methods.

interface ChatResponse {
  /** The generated text content */
  content: string;
  
  /** The model used for generation */
  model: string;
  
  /** Token usage statistics */
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  
  /** Raw response from the provider API */
  rawResponse: any;
  
  /** Reason why generation stopped */
  finishReason: 'stop' | 'length' | 'content_filter' | 'tool_calls' | null;
}
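Every field above is present on the object returned by generate(), chat(), and ask(). A minimal sketch of consuming the shape — the literal below is a hypothetical response used for illustration, not the result of a live API call:

```javascript
// Hypothetical object matching the ChatResponse shape (no API call made)
const response = {
  content: 'The capital of France is Paris.',
  model: 'gpt-4o',
  usage: { promptTokens: 14, completionTokens: 8, totalTokens: 22 },
  rawResponse: {},
  finishReason: 'stop'
};

// 'length' means the maxTokens budget was exhausted before a natural stop
if (response.finishReason === 'length') {
  console.warn('Response truncated; consider raising maxTokens');
}

console.log(`${response.model}: ${response.usage.totalTokens} tokens used`);
```

Checking finishReason after each call is a cheap way to catch silently truncated output.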

Message Factory

Factory for creating standardized message objects used in conversation history.

// Static factory methods
Message.user(content: string): Message
Message.assistant(content: string): Message
Message.system(content: string): Message

// Message interface
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

Example

import { Message } from 'plugllm';

const messages = [
  Message.system('You are a helpful assistant'),
  Message.user('What is the capital of France?'),
  Message.assistant('The capital of France is Paris.'),
  Message.user('What is its population?')
];

📚 API Reference

BaseLLM Class

Abstract base class providing common functionality for all LLM providers.

Constructor Options

interface BaseLLMOptions {
  /** API key for the provider (reads from env if omitted) */
  apiKey?: string;
  
  /** Model identifier */
  model?: string;
  
  /** Sampling temperature (0-2) */
  temperature?: number;
  
  /** Maximum tokens to generate */
  maxTokens?: number;
  
  /** Maximum messages retained in history (default: 10) */
  maxHistory?: number;
  
  /** Base URL for API requests (provider-specific) */
  baseURL?: string;
}
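All of these fields are optional. As an illustration, a conservative configuration might look like the following — the specific values are examples, not recommendations:

```javascript
// Illustrative BaseLLMOptions object; apiKey is omitted so the provider's
// environment variable (e.g. OPENAI_API_KEY) would be used instead
const options = {
  model: 'gpt-4o',
  temperature: 0.2, // low temperature for more deterministic output
  maxTokens: 512,   // cap the length of each generation
  maxHistory: 20    // retain up to 20 messages per session (default is 10)
};

// new ChatOpenAI(options) would apply these settings to every call
console.log('temperature:', options.temperature);
```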

generate()

import { ChatOpenAI, Message } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

// String prompt
const fromString = await llm.generate('Explain quantum computing');

// Message array
const fromMessages = await llm.generate([
  Message.system('You are a physics professor'),
  Message.user('Explain quantum computing')
]);

// With provider-specific kwargs
const withKwargs = await llm.generate('Hello', {
  top_p: 0.9,
  frequency_penalty: 0.5
});

stream()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

for await (const chunk of llm.stream('Tell me a story')) {
  process.stdout.write(chunk);
}

chat()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

// Default session
await llm.chat('My name is Alice');
await llm.chat('What is my name?'); // Remembers context

// Multiple sessions
await llm.chat('I like Python', { sessionId: 'user1' });
await llm.chat('I like JavaScript', { sessionId: 'user2' });

ask()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

const response = await llm.ask(
  'What is machine learning?',
  { systemPrompt: 'You are a patient teacher. Explain simply.' }
);

askStream()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

for await (const chunk of llm.askStream('Count from 1 to 10')) {
  process.stdout.write(chunk);
}

Fluent Interface

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

const response = await llm
  .withSystem('You are a math tutor')
  .withUser('What is the derivative of x²?')
  .withTemperature(0.3)
  .withMaxTokens(200)
  .call();

Fluent Methods

Method                  Description
withSystem(content)     Set the system message
withUser(content)       Add a user message
withAssistant(content)  Add an assistant message
withTemperature(value)  Set the sampling temperature
withMaxTokens(value)    Set the maximum token count
call(kwargs)            Execute the request with the current chain

Conversation Management

// Get history
const history = llm.getConversationHistory('default');

// Clear history (preserves system message)
llm.clearConversation('default');

// Full reset
llm.resetConversation('default');

// Set system message
llm.setSystemMessage('You are a helpful coding assistant', 'coding-session');

LLMFactory

Factory class for creating provider instances dynamically.

type Provider = 
  | 'openai' | 'chatopenai'
  | 'gemini' | 'chatgemini' | 'google'
  | 'groq' | 'chatgroq'
  | 'claude' | 'chatclaude' | 'anthropic'
  | 'grok' | 'chatgrok' | 'xai'
  | 'mistral' | 'chatmistral'
  | 'llama' | 'chatllama' | 'meta'
  | 'deepseek' | 'chatdeepseek'
  | 'qwen' | 'chatqwen' | 'alibaba'
  | 'kimi' | 'chatkimi' | 'moonshot'
  | 'cohere' | 'chatcohere'
  | 'sarvam' | 'chatsarvamai'
  | 'ollama' | 'chatollama';

import { LLMFactory } from 'plugllm';

const llm = LLMFactory.create('groq', {
  apiKey: 'gsk_xxx',
  model: 'llama-3.3-70b-versatile',
  temperature: 0.7
});

v1 API (Legacy)

Simplified API for quick prototyping.

import { config, generate, chat, resetChat } from 'plugllm';

// Configure once
config({
  provider: 'openai',
  apiKey: 'sk-xxx',
  model: 'gpt-4o'
});

// Generate
const reply = await generate('What is JavaScript?');

// Stateful chat
const r1 = await chat('My name is Bob');
const r2 = await chat('What is my name?');

// Reset
resetChat();

🔌 Provider-Specific Classes

Each provider class extends BaseLLM and may include provider-specific methods or properties.

ChatOpenAI

OpenAI GPT models (GPT-4, GPT-4o, GPT-3.5).

Environment Variable: OPENAI_API_KEY | Default Model: gpt-4o

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: 'sk-xxx',
  model: 'gpt-4o',
  organization: 'org-xxx' // Optional
});

ChatGemini

Google Gemini models.

Environment Variable: GEMINI_API_KEY | Default Model: gemini-2.0-flash

import { ChatGemini } from 'plugllm';

const llm = new ChatGemini({
  apiKey: 'AIza...',
  model: 'gemini-2.0-flash'
});

ChatGroq

Groq's ultra-fast inference.

Environment Variable: GROQ_API_KEY | Default Model: llama-3.3-70b-versatile

import { ChatGroq } from 'plugllm';

const llm = new ChatGroq({
  apiKey: 'gsk_xxx',
  model: 'llama-3.3-70b-versatile'
});

ChatClaude

Anthropic Claude models.

Environment Variable: ANTHROPIC_API_KEY | Default Model: claude-sonnet-4-5

import { ChatClaude } from 'plugllm';

const llm = new ChatClaude({
  apiKey: 'sk-ant-xxx',
  model: 'claude-sonnet-4-5'
});

ChatGrok

xAI Grok models.

Environment Variable: XAI_API_KEY | Default Model: grok-3-mini

import { ChatGrok } from 'plugllm';

const llm = new ChatGrok({
  apiKey: 'xai-xxx',
  model: 'grok-3-mini'
});

ChatMistral

Mistral AI models.

Environment Variable: MISTRAL_API_KEY | Default Model: mistral-large-latest

import { ChatMistral } from 'plugllm';

const llm = new ChatMistral({
  apiKey: 'xxx',
  model: 'mistral-large-latest'
});

ChatLlama

Meta Llama models via Llama API.

Environment Variable: LLAMA_API_KEY | Default Model: Llama-4-Maverick-17B

import { ChatLlama } from 'plugllm';

const llm = new ChatLlama({
  apiKey: 'xxx',
  model: 'Llama-4-Maverick-17B'
});

ChatDeepSeek

DeepSeek models.

Environment Variable: DEEPSEEK_API_KEY | Default Model: deepseek-chat

import { ChatDeepSeek } from 'plugllm';

const llm = new ChatDeepSeek({
  apiKey: 'xxx',
  model: 'deepseek-chat'
});

ChatQwen

Alibaba Qwen models.

Environment Variable: DASHSCOPE_API_KEY | Default Model: qwen-plus

import { ChatQwen } from 'plugllm';

const llm = new ChatQwen({
  apiKey: 'xxx',
  model: 'qwen-plus'
});

ChatKimi

Moonshot Kimi models.

Environment Variable: MOONSHOT_API_KEY | Default Model: moonshot-v1-8k

import { ChatKimi } from 'plugllm';

const llm = new ChatKimi({
  apiKey: 'xxx',
  model: 'moonshot-v1-8k'
});

ChatCohere

Cohere models.

Environment Variable: CO_API_KEY | Default Model: command-a-03-2025

import { ChatCohere } from 'plugllm';

const llm = new ChatCohere({
  apiKey: 'xxx',
  model: 'command-a-03-2025'
});

ChatSarvamAI

SarvamAI Indian language models.

Environment Variable: SARVAM_API_KEY | Default Model: sarvam-2b-v0.5

import { ChatSarvamAI } from 'plugllm';

const llm = new ChatSarvamAI({
  apiKey: 'xxx',
  model: 'sarvam-2b-v0.5'
});

ChatOllama

Local Ollama models.

Environment Variable: None required | Default Model: gemma3 | Default Base URL: http://localhost:11434

import { ChatOllama } from 'plugllm';

const llm = new ChatOllama({
  model: 'llama3',
  baseURL: 'http://localhost:11434'
});

⚠️ Error Types

PlugLLM provides typed errors for better error handling.

import {
  AuthenticationError,
  RateLimitError,
  ValidationError,
  APIError,
  NetworkError
} from 'plugllm/types';

Error Class          Description
AuthenticationError  Invalid or missing API key
RateLimitError       Rate limit exceeded
ValidationError      Invalid parameters or configuration
APIError             The provider API returned an error
NetworkError         Network connectivity issues

Example

import { ChatOpenAI } from 'plugllm';
import { 
  AuthenticationError, 
  RateLimitError, 
  ValidationError, 
  APIError, 
  NetworkError 
} from 'plugllm/types';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

try {
  const response = await llm.generate('Hello');
  console.log(response.content);
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Check your API key');
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit hit, retry after:', error.retryAfter);
  } else if (error instanceof ValidationError) {
    console.error('Invalid parameters:', error.message);
  } else if (error instanceof APIError) {
    console.error('Provider error:', error.statusCode, error.message);
  } else if (error instanceof NetworkError) {
    console.error('Connection failed:', error.message);
  } else {
    console.error('Unknown error:', error);
  }
}

💡 Usage Examples

Multi-Turn Conversation

import { ChatOpenAI, Message } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o'
});

async function conversation() {
  const sessionId = 'user-123';
  
  // Set context
  llm.setSystemMessage(
    'You are an expert JavaScript developer. Provide concise answers.',
    sessionId
  );
  
  // Multi-turn conversation
  const responses = [];
  
  responses.push(await llm.chat('What is a closure?', { sessionId }));
  console.log('Assistant:', responses[0].content);
  
  responses.push(await llm.chat('Give me a practical example', { sessionId }));
  console.log('Assistant:', responses[1].content);
  
  responses.push(await llm.chat('How does it relate to lexical scoping?', { sessionId }));
  console.log('Assistant:', responses[2].content);
  
  // View history
  const history = llm.getConversationHistory(sessionId);
  console.log(`Conversation length: ${history.length} messages`);
}

conversation();

Streaming with Progress

import { ChatGroq } from 'plugllm';

const llm = new ChatGroq({
  apiKey: process.env.GROQ_API_KEY,
  model: 'llama-3.3-70b-versatile'
});

async function streamWithProgress() {
  let chunkCount = 0;
  
  process.stdout.write('Generating: ');
  
  for await (const chunk of llm.stream('Explain the theory of relativity')) {
    chunkCount++;
    process.stdout.write(chunk);
  }
  
  console.log(`\n\nReceived ${chunkCount} chunks`);
}

streamWithProgress();

Comparing Multiple Providers

import { LLMFactory } from 'plugllm';

async function compareProviders(prompt) {
  const providers = [
    { name: 'OpenAI', config: { provider: 'openai', model: 'gpt-4o' } },
    { name: 'Claude', config: { provider: 'claude', model: 'claude-sonnet-4-5' } },
    { name: 'Gemini', config: { provider: 'gemini', model: 'gemini-2.0-flash' } }
  ];
  
  const results = await Promise.all(
    providers.map(async ({ name, config }) => {
      const llm = LLMFactory.create(config.provider, config);
      const start = Date.now();
      const response = await llm.ask(prompt);
      const duration = Date.now() - start;
      
      return {
        provider: name,
        response: response.content,
        tokens: response.usage.totalTokens,
        duration: `${duration}ms`
      };
    })
  );
  
  results.forEach(r => {
    console.log(`\n=== ${r.provider} ===`);
    console.log(`Duration: ${r.duration}`);
    console.log(`Tokens: ${r.tokens}`);
    console.log(`Response: ${r.response.slice(0, 100)}...`);
  });
}

compareProviders('What is the meaning of life?');

Building a Simple CLI Chatbot

import readline from 'readline';
import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o'
});

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat() {
  console.log('🤖 Chatbot started. Type "exit" to quit.\n');
  
  const ask = () => {
    rl.question('You: ', async (input) => {
      if (input.toLowerCase() === 'exit') {
        console.log('Goodbye!');
        rl.close();
        return;
      }
      
      process.stdout.write('Bot: ');
      
      // Note: askStream is stateless; use chat() if the bot should remember turns
      for await (const chunk of llm.askStream(input)) {
        process.stdout.write(chunk);
      }
      
      console.log('\n');
      ask();
    });
  };
  
  ask();
}

chat();

Parallel Processing with Multiple Sessions

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o'
});

async function parallelSessions() {
  const sessions = ['user1', 'user2', 'user3'];
  
  // Set different system prompts for each session
  llm.setSystemMessage('You are a math tutor', 'user1');
  llm.setSystemMessage('You are a history teacher', 'user2');
  llm.setSystemMessage('You are a coding mentor', 'user3');
  
  // Process multiple conversations in parallel
  const results = await Promise.all(
    sessions.map(sessionId => 
      llm.chat('What can you teach me?', { sessionId })
    )
  );
  
  results.forEach((result, index) => {
    console.log(`Session ${sessions[index]}:`, result.content.slice(0, 100));
  });
}

parallelSessions();