🔌 PlugLLM Documentation

📘 Version 2.0.0 • One API to rule them all — Unified interface for 13+ LLM providers

📖 Overview

PlugLLM provides a unified interface for interacting with multiple Large Language Model providers. It abstracts away provider-specific implementation details, offering consistent methods for generation, chat, and streaming across all supported providers.

// ES Module imports
import { ChatOpenAI, Message, ChatResponse } from 'plugllm';
Supported Providers

Provider   Default Model
OpenAI     gpt-4o
Google     gemini-2.0-flash
Groq       llama-3.3-70b
Anthropic  claude-sonnet-4-5
xAI        grok-3-mini
Mistral    mistral-large
Meta       Llama-4-Maverick
DeepSeek   deepseek-chat
Alibaba    qwen-plus
Moonshot   moonshot-v1-8k
Cohere     command-a-03-2025
SarvamAI   sarvam-2b-v0.5
Ollama     gemma3 (local)

📦 Installation

npm install plugllm
# or
yarn add plugllm
# or
pnpm add plugllm

🧠 Core Concepts

BaseLLM

BaseLLM is an abstract class that all provider implementations extend. It defines the standard interface and shared functionality.

Properties

Property     Type    Description
model        string  The model identifier being used
temperature  number  Sampling temperature (0-2)
maxTokens    number  Maximum tokens to generate
maxHistory   number  Maximum messages retained in conversation history
apiKey       string  API key for the provider

Methods

Method                                  Return Type             Description
generate(prompt, kwargs)                Promise<ChatResponse>   Generate a response from a prompt
stream(prompt, kwargs)                  AsyncGenerator<string>  Stream a response chunk by chunk
chat(message, options, kwargs)          Promise<ChatResponse>   Continue a conversation with memory
ask(userPrompt, options, kwargs)        Promise<ChatResponse>   Simple Q&A with an optional system prompt
askStream(userPrompt, options, kwargs)  AsyncGenerator<string>  Stream Q&A responses
getConversationHistory(sessionId)       Message[]               Retrieve conversation history
clearConversation(sessionId)            void                    Clear history (preserves the system message)
resetConversation(sessionId)            void                    Full reset, including the system message
setSystemMessage(message, sessionId)    void                    Set the system prompt for a session

ChatResponse

Standardized response object returned by all generation methods.

interface ChatResponse {
  /** The generated text content */
  content: string;
  
  /** The model used for generation */
  model: string;
  
  /** Token usage statistics */
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  
  /** Raw response from the provider API */
  rawResponse: any;
  
  /** Reason why generation stopped */
  finishReason: 'stop' | 'length' | 'content_filter' | 'tool_calls' | null;
}
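Every field above is present on the object returned by generate(), chat(), and ask(). A minimal sketch of consuming the shape — the literal below is a hypothetical response used for illustration, not the result of a live API call:

```javascript
// Hypothetical object matching the ChatResponse shape (no API call made)
const response = {
  content: 'The capital of France is Paris.',
  model: 'gpt-4o',
  usage: { promptTokens: 14, completionTokens: 8, totalTokens: 22 },
  rawResponse: {},
  finishReason: 'stop'
};

// 'length' means the maxTokens budget was exhausted before a natural stop
if (response.finishReason === 'length') {
  console.warn('Response truncated; consider raising maxTokens');
}

console.log(`${response.model}: ${response.usage.totalTokens} tokens used`);
```

Checking finishReason after each call is a cheap way to catch silently truncated output.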

Message Factory

Factory for creating standardized message objects used in conversation history.

// Static factory methods
Message.user(content: string): Message
Message.assistant(content: string): Message
Message.system(content: string): Message

// Message interface
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

Example

import { Message } from 'plugllm';

const messages = [
  Message.system('You are a helpful assistant'),
  Message.user('What is the capital of France?'),
  Message.assistant('The capital of France is Paris.'),
  Message.user('What is its population?')
];

📚 API Reference

BaseLLM Class

Abstract base class providing common functionality for all LLM providers.

Constructor Options

interface BaseLLMOptions {
  /** API key for the provider (reads from env if omitted) */
  apiKey?: string;
  
  /** Model identifier */
  model?: string;
  
  /** Sampling temperature (0-2) */
  temperature?: number;
  
  /** Maximum tokens to generate */
  maxTokens?: number;
  
  /** Maximum messages retained in history (default: 10) */
  maxHistory?: number;
  
  /** Base URL for API requests (provider-specific) */
  baseURL?: string;
}
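All of these fields are optional. As an illustration, a conservative configuration might look like the following — the specific values are examples, not recommendations:

```javascript
// Illustrative BaseLLMOptions object; apiKey is omitted so the provider's
// environment variable (e.g. OPENAI_API_KEY) would be used instead
const options = {
  model: 'gpt-4o',
  temperature: 0.2, // low temperature for more deterministic output
  maxTokens: 512,   // cap the length of each generation
  maxHistory: 20    // retain up to 20 messages per session (default is 10)
};

// new ChatOpenAI(options) would apply these settings to every call
console.log('temperature:', options.temperature);
```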

generate()

import { ChatOpenAI, Message } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

// String prompt
const fromString = await llm.generate('Explain quantum computing');

// Message array
const fromMessages = await llm.generate([
  Message.system('You are a physics professor'),
  Message.user('Explain quantum computing')
]);

// With provider-specific kwargs
const withKwargs = await llm.generate('Hello', {
  top_p: 0.9,
  frequency_penalty: 0.5
});

stream()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

for await (const chunk of llm.stream('Tell me a story')) {
  process.stdout.write(chunk);
}

chat()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

// Default session
await llm.chat('My name is Alice');
await llm.chat('What is my name?'); // Remembers context

// Multiple sessions
await llm.chat('I like Python', { sessionId: 'user1' });
await llm.chat('I like JavaScript', { sessionId: 'user2' });

ask()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

const response = await llm.ask(
  'What is machine learning?',
  { systemPrompt: 'You are a patient teacher. Explain simply.' }
);

askStream()

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

for await (const chunk of llm.askStream('Count from 1 to 10')) {
  process.stdout.write(chunk);
}

Fluent Interface

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

const response = await llm
  .withSystem('You are a math tutor')
  .withUser('What is the derivative of x²?')
  .withTemperature(0.3)
  .withMaxTokens(200)
  .call();

Fluent Methods

Method                  Description
withSystem(content)     Set the system message
withUser(content)       Add a user message
withAssistant(content)  Add an assistant message
withTemperature(value)  Set the sampling temperature
withMaxTokens(value)    Set the maximum token count
call(kwargs)            Execute the request with the current chain

Conversation Management

// Get history
const history = llm.getConversationHistory('default');

// Clear history (preserves system message)
llm.clearConversation('default');

// Full reset
llm.resetConversation('default');

// Set system message
llm.setSystemMessage('You are a helpful coding assistant', 'coding-session');

LLMFactory

Factory class for creating provider instances dynamically.

type Provider = 
  | 'openai' | 'chatopenai'
  | 'gemini' | 'chatgemini' | 'google'
  | 'groq' | 'chatgroq'
  | 'claude' | 'chatclaude' | 'anthropic'
  | 'grok' | 'chatgrok' | 'xai'
  | 'mistral' | 'chatmistral'
  | 'llama' | 'chatllama' | 'meta'
  | 'deepseek' | 'chatdeepseek'
  | 'qwen' | 'chatqwen' | 'alibaba'
  | 'kimi' | 'chatkimi' | 'moonshot'
  | 'cohere' | 'chatcohere'
  | 'sarvam' | 'chatsarvamai'
  | 'ollama' | 'chatollama';

import { LLMFactory } from 'plugllm';

const llm = LLMFactory.create('groq', {
  apiKey: 'gsk_xxx',
  model: 'llama-3.3-70b-versatile',
  temperature: 0.7
});

v1 API (Legacy)

Simplified API for quick prototyping.

import { config, generate, chat, resetChat } from 'plugllm';

// Configure once
config({
  provider: 'openai',
  apiKey: 'sk-xxx',
  model: 'gpt-4o'
});

// Generate
const reply = await generate('What is JavaScript?');

// Stateful chat
const r1 = await chat('My name is Bob');
const r2 = await chat('What is my name?');

// Reset
resetChat();

🔌 Provider-Specific Classes

Each provider class extends BaseLLM and may include provider-specific methods or properties.

ChatOpenAI

OpenAI GPT models (GPT-4, GPT-4o, GPT-3.5).

Environment Variable: OPENAI_API_KEY | Default Model: gpt-4o

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: 'sk-xxx',
  model: 'gpt-4o',
  organization: 'org-xxx' // Optional
});

ChatGemini

Google Gemini models.

Environment Variable: GEMINI_API_KEY | Default Model: gemini-2.0-flash

import { ChatGemini } from 'plugllm';

const llm = new ChatGemini({
  apiKey: 'AIza...',
  model: 'gemini-2.0-flash'
});

ChatGroq

Groq's ultra-fast inference.

Environment Variable: GROQ_API_KEY | Default Model: llama-3.3-70b-versatile

import { ChatGroq } from 'plugllm';

const llm = new ChatGroq({
  apiKey: 'gsk_xxx',
  model: 'llama-3.3-70b-versatile'
});

ChatClaude

Anthropic Claude models.

Environment Variable: ANTHROPIC_API_KEY | Default Model: claude-sonnet-4-5

import { ChatClaude } from 'plugllm';

const llm = new ChatClaude({
  apiKey: 'sk-ant-xxx',
  model: 'claude-sonnet-4-5'
});

ChatGrok

xAI Grok models.

Environment Variable: XAI_API_KEY | Default Model: grok-3-mini

import { ChatGrok } from 'plugllm';

const llm = new ChatGrok({
  apiKey: 'xai-xxx',
  model: 'grok-3-mini'
});

ChatMistral

Mistral AI models.

Environment Variable: MISTRAL_API_KEY | Default Model: mistral-large-latest

import { ChatMistral } from 'plugllm';

const llm = new ChatMistral({
  apiKey: 'xxx',
  model: 'mistral-large-latest'
});

ChatLlama

Meta Llama models via Llama API.

Environment Variable: LLAMA_API_KEY | Default Model: Llama-4-Maverick-17B

import { ChatLlama } from 'plugllm';

const llm = new ChatLlama({
  apiKey: 'xxx',
  model: 'Llama-4-Maverick-17B'
});

ChatDeepSeek

DeepSeek models.

Environment Variable: DEEPSEEK_API_KEY | Default Model: deepseek-chat

import { ChatDeepSeek } from 'plugllm';

const llm = new ChatDeepSeek({
  apiKey: 'xxx',
  model: 'deepseek-chat'
});

ChatQwen

Alibaba Qwen models.

Environment Variable: DASHSCOPE_API_KEY | Default Model: qwen-plus

import { ChatQwen } from 'plugllm';

const llm = new ChatQwen({
  apiKey: 'xxx',
  model: 'qwen-plus'
});

ChatKimi

Moonshot Kimi models.

Environment Variable: MOONSHOT_API_KEY | Default Model: moonshot-v1-8k

import { ChatKimi } from 'plugllm';

const llm = new ChatKimi({
  apiKey: 'xxx',
  model: 'moonshot-v1-8k'
});

ChatCohere

Cohere models.

Environment Variable: CO_API_KEY | Default Model: command-a-03-2025

import { ChatCohere } from 'plugllm';

const llm = new ChatCohere({
  apiKey: 'xxx',
  model: 'command-a-03-2025'
});

ChatSarvamAI

SarvamAI Indian language models.

Environment Variable: SARVAM_API_KEY | Default Model: sarvam-2b-v0.5

import { ChatSarvamAI } from 'plugllm';

const llm = new ChatSarvamAI({
  apiKey: 'xxx',
  model: 'sarvam-2b-v0.5'
});

ChatOllama

Local Ollama models.

Environment Variable: None required | Default Model: gemma3 | Default Base URL: http://localhost:11434

import { ChatOllama } from 'plugllm';

const llm = new ChatOllama({
  model: 'llama3',
  baseURL: 'http://localhost:11434'
});

⚠️ Error Types

PlugLLM provides typed errors for better error handling.

import {
  AuthenticationError,
  RateLimitError,
  ValidationError,
  APIError,
  NetworkError
} from 'plugllm/types';

Error Class          Description
AuthenticationError  Invalid or missing API key
RateLimitError       Rate limit exceeded
ValidationError      Invalid parameters or configuration
APIError             The provider API returned an error
NetworkError         Network connectivity issues

Example

import { ChatOpenAI } from 'plugllm';
import { 
  AuthenticationError, 
  RateLimitError, 
  ValidationError, 
  APIError, 
  NetworkError 
} from 'plugllm/types';

const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });

try {
  const response = await llm.generate('Hello');
  console.log(response.content);
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Check your API key');
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit hit, retry after:', error.retryAfter);
  } else if (error instanceof ValidationError) {
    console.error('Invalid parameters:', error.message);
  } else if (error instanceof APIError) {
    console.error('Provider error:', error.statusCode, error.message);
  } else if (error instanceof NetworkError) {
    console.error('Connection failed:', error.message);
  } else {
    console.error('Unknown error:', error);
  }
}

💡 Usage Examples

Multi-Turn Conversation

import { ChatOpenAI, Message } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o'
});

async function conversation() {
  const sessionId = 'user-123';
  
  // Set context
  llm.setSystemMessage(
    'You are an expert JavaScript developer. Provide concise answers.',
    sessionId
  );
  
  // Multi-turn conversation
  const responses = [];
  
  responses.push(await llm.chat('What is a closure?', { sessionId }));
  console.log('Assistant:', responses[0].content);
  
  responses.push(await llm.chat('Give me a practical example', { sessionId }));
  console.log('Assistant:', responses[1].content);
  
  responses.push(await llm.chat('How does it relate to lexical scoping?', { sessionId }));
  console.log('Assistant:', responses[2].content);
  
  // View history
  const history = llm.getConversationHistory(sessionId);
  console.log(`Conversation length: ${history.length} messages`);
}

conversation();

Streaming with Progress

import { ChatGroq } from 'plugllm';

const llm = new ChatGroq({
  apiKey: process.env.GROQ_API_KEY,
  model: 'llama-3.3-70b-versatile'
});

async function streamWithProgress() {
  let chunkCount = 0;
  
  process.stdout.write('Generating: ');
  
  for await (const chunk of llm.stream('Explain the theory of relativity')) {
    chunkCount++;
    process.stdout.write(chunk);
  }
  
  console.log(`\n\nReceived ${chunkCount} chunks`);
}

streamWithProgress();

Comparing Multiple Providers

import { LLMFactory } from 'plugllm';

async function compareProviders(prompt) {
  const providers = [
    { name: 'OpenAI', config: { provider: 'openai', model: 'gpt-4o' } },
    { name: 'Claude', config: { provider: 'claude', model: 'claude-sonnet-4-5' } },
    { name: 'Gemini', config: { provider: 'gemini', model: 'gemini-2.0-flash' } }
  ];
  
  const results = await Promise.all(
    providers.map(async ({ name, config }) => {
      const llm = LLMFactory.create(config.provider, config);
      const start = Date.now();
      const response = await llm.ask(prompt);
      const duration = Date.now() - start;
      
      return {
        provider: name,
        response: response.content,
        tokens: response.usage.totalTokens,
        duration: `${duration}ms`
      };
    })
  );
  
  results.forEach(r => {
    console.log(`\n=== ${r.provider} ===`);
    console.log(`Duration: ${r.duration}`);
    console.log(`Tokens: ${r.tokens}`);
    console.log(`Response: ${r.response.slice(0, 100)}...`);
  });
}

compareProviders('What is the meaning of life?');

Building a Simple CLI Chatbot

import readline from 'readline';
import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o'
});

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat() {
  console.log('🤖 Chatbot started. Type "exit" to quit.\n');
  
  const ask = () => {
    rl.question('You: ', async (input) => {
      if (input.toLowerCase() === 'exit') {
        console.log('Goodbye!');
        rl.close();
        return;
      }
      
      process.stdout.write('Bot: ');
      
      // Note: askStream is stateless; use chat() if the bot should remember turns
      for await (const chunk of llm.askStream(input)) {
        process.stdout.write(chunk);
      }
      
      console.log('\n');
      ask();
    });
  };
  
  ask();
}

chat();

Parallel Processing with Multiple Sessions

import { ChatOpenAI } from 'plugllm';

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o'
});

async function parallelSessions() {
  const sessions = ['user1', 'user2', 'user3'];
  
  // Set different system prompts for each session
  llm.setSystemMessage('You are a math tutor', 'user1');
  llm.setSystemMessage('You are a history teacher', 'user2');
  llm.setSystemMessage('You are a coding mentor', 'user3');
  
  // Process multiple conversations in parallel
  const results = await Promise.all(
    sessions.map(sessionId => 
      llm.chat('What can you teach me?', { sessionId })
    )
  );
  
  results.forEach((result, index) => {
    console.log(`Session ${sessions[index]}:`, result.content.slice(0, 100));
  });
}

parallelSessions();