🔌 PlugLLM Documentation
📖 Overview
PlugLLM provides a unified interface for interacting with multiple Large Language Model providers. It abstracts away provider-specific implementation details, offering consistent methods for generation, chat, and streaming across all supported providers.
// ES Module imports
import { ChatOpenAI, Message, ChatResponse } from 'plugllm';
Supported models: gpt-4o, gemini-2.0-flash, llama-3.3-70b, claude-sonnet-4-5, grok-3-mini, mistral-large, Llama-4-Maverick, deepseek-chat, qwen-plus, moonshot-v1-8k, command-a-03-2025, sarvam-2b-v0.5, gemma3 (local)
📦 Installation
npm install plugllm
# or
yarn add plugllm
# or
pnpm add plugllm
🧠 Core Concepts
BaseLLM
BaseLLM is an abstract class that all provider implementations extend. It defines the standard interface and shared functionality.
Properties
| Property | Type | Description |
|---|---|---|
| model | string | The model identifier being used |
| temperature | number | Sampling temperature (0-2) |
| maxTokens | number | Maximum tokens to generate |
| maxHistory | number | Maximum messages retained in conversation history |
| apiKey | string | API key for the provider |
Methods
| Method | Return Type | Description |
|---|---|---|
| generate(prompt, kwargs) | Promise&lt;ChatResponse&gt; | Generate a response from a prompt |
| stream(prompt, kwargs) | AsyncGenerator&lt;string&gt; | Stream a response token by token |
| chat(message, options, kwargs) | Promise&lt;ChatResponse&gt; | Continue a conversation with memory |
| ask(userPrompt, options, kwargs) | Promise&lt;ChatResponse&gt; | Simple Q&A with optional system prompt |
| askStream(userPrompt, options, kwargs) | AsyncGenerator&lt;string&gt; | Stream Q&A responses |
| getConversationHistory(sessionId) | Message[] | Retrieve conversation history |
| clearConversation(sessionId) | void | Clear history (preserves system message) |
| resetConversation(sessionId) | void | Full reset including system message |
| setSystemMessage(message, sessionId) | void | Set system prompt for a session |
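The maxHistory cap means older turns are dropped as a conversation grows, while the system message survives clearConversation(). The library's internal trimming is not shown in this reference; the sketch below uses a hypothetical trimHistory helper to illustrate the usual approach: keep the system message and only the newest maxHistory non-system messages.

```typescript
// Hypothetical helper illustrating maxHistory-style trimming.
// The system message is always preserved; only the newest
// `maxHistory` user/assistant messages are kept.
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

function trimHistory(history: Message[], maxHistory: number): Message[] {
  const system = history.filter(m => m.role === 'system');
  const rest = history.filter(m => m.role !== 'system');
  return [...system, ...rest.slice(-maxHistory)];
}
```

With maxHistory = 2, a history of one system message and three turns keeps the system message plus the two most recent turns.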
ChatResponse
Standardized response object returned by all generation methods.
interface ChatResponse {
/** The generated text content */
content: string;
/** The model used for generation */
model: string;
/** Token usage statistics */
usage: {
promptTokens: number;
completionTokens: number;
totalTokens: number;
};
/** Raw response from the provider API */
rawResponse: any;
/** Reason why generation stopped */
finishReason: 'stop' | 'length' | 'content_filter' | 'tool_calls' | null;
}
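Because every provider returns this same shape, downstream concerns like token accounting can stay provider-agnostic. A minimal sketch (the usage fields are reproduced locally so the snippet is self-contained; totalTokensUsed is a hypothetical helper, not part of the library):

```typescript
// Provider-agnostic token accounting over the standardized
// ChatResponse.usage block.
interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

function totalTokensUsed(responses: { usage: Usage }[]): number {
  // Sum totalTokens across a batch of responses, regardless of provider.
  return responses.reduce((sum, r) => sum + r.usage.totalTokens, 0);
}
```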
Message Factory
Factory for creating standardized message objects used in conversation history.
// Static factory methods
Message.user(content: string): Message
Message.assistant(content: string): Message
Message.system(content: string): Message
// Message interface
interface Message {
role: 'user' | 'assistant' | 'system';
content: string;
}
Example
import { Message } from 'plugllm';
const messages = [
Message.system('You are a helpful assistant'),
Message.user('What is the capital of France?'),
Message.assistant('The capital of France is Paris.'),
Message.user('What is its population?')
];
📚 API Reference
BaseLLM Class
Abstract base class providing common functionality for all LLM providers.
Constructor Options
interface BaseLLMOptions {
/** API key for the provider (reads from env if omitted) */
apiKey?: string;
/** Model identifier */
model?: string;
/** Sampling temperature (0-2) */
temperature?: number;
/** Maximum tokens to generate */
maxTokens?: number;
/** Maximum messages retained in history (default: 10) */
maxHistory?: number;
/** Base URL for API requests (provider-specific) */
baseURL?: string;
}
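When apiKey is omitted, the key is read from the provider's environment variable (listed per provider below). A sketch of that resolution pattern, assuming the common "explicit value wins, else environment, else fail fast" behavior (resolveApiKey is a hypothetical helper, not a library export):

```typescript
// Sketch of the "explicit key, else environment variable" resolution
// described for the apiKey option. Fails fast when neither is set,
// rather than sending an unauthenticated request.
function resolveApiKey(explicit: string | undefined, envVar: string): string {
  const key = explicit ?? process.env[envVar];
  if (!key) {
    throw new Error(`Missing API key: pass apiKey or set ${envVar}`);
  }
  return key;
}
```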
generate()
import { ChatOpenAI, Message } from 'plugllm';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
// String prompt
const response = await llm.generate('Explain quantum computing');
// Message array
const fromMessages = await llm.generate([
  Message.system('You are a physics professor'),
  Message.user('Explain quantum computing')
]);
// With provider-specific kwargs
const withKwargs = await llm.generate('Hello', {
  top_p: 0.9,
  frequency_penalty: 0.5
});
stream()
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
for await (const chunk of llm.stream('Tell me a story')) {
process.stdout.write(chunk);
}
chat()
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
// Default session
await llm.chat('My name is Alice');
await llm.chat('What is my name?'); // Remembers context
// Multiple sessions
await llm.chat('I like Python', { sessionId: 'user1' });
await llm.chat('I like JavaScript', { sessionId: 'user2' });
ask()
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
const response = await llm.ask(
'What is machine learning?',
{ systemPrompt: 'You are a patient teacher. Explain simply.' }
);
askStream()
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
for await (const chunk of llm.askStream('Count from 1 to 10')) {
process.stdout.write(chunk);
}
Fluent Interface
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
const response = await llm
.withSystem('You are a math tutor')
.withUser('What is the derivative of x²?')
.withTemperature(0.3)
.withMaxTokens(200)
.call();
Fluent Methods
| Method | Description |
|---|---|
| withSystem(content) | Set system message |
| withUser(content) | Add user message |
| withAssistant(content) | Add assistant message |
| withTemperature(value) | Set temperature |
| withMaxTokens(value) | Set max tokens |
| call(kwargs) | Execute with current chain |
Conversation Management
// Get history
const history = llm.getConversationHistory('default');
// Clear history (preserves system message)
llm.clearConversation('default');
// Full reset
llm.resetConversation('default');
// Set system message
llm.setSystemMessage('You are a helpful coding assistant', 'coding-session');
LLMFactory
Factory class for creating provider instances dynamically.
type Provider =
| 'openai' | 'chatopenai'
| 'gemini' | 'chatgemini' | 'google'
| 'groq' | 'chatgroq'
| 'claude' | 'chatclaude' | 'anthropic'
| 'grok' | 'chatgrok' | 'xai'
| 'mistral' | 'chatmistral'
| 'llama' | 'chatllama' | 'meta'
| 'deepseek' | 'chatdeepseek'
| 'qwen' | 'chatqwen' | 'alibaba'
| 'kimi' | 'chatkimi' | 'moonshot'
| 'cohere' | 'chatcohere'
| 'sarvam' | 'chatsarvamai'
| 'ollama' | 'chatollama';
import { LLMFactory } from 'plugllm';
const llm = LLMFactory.create('groq', {
apiKey: 'gsk_xxx',
model: 'llama-3.3-70b-versatile',
temperature: 0.7
});
v1 API (Legacy)
Simplified API for quick prototyping.
import { config, generate, chat, resetChat } from 'plugllm';
// Configure once
config({
provider: 'openai',
apiKey: 'sk-xxx',
model: 'gpt-4o'
});
// Generate
const reply = await generate('What is JavaScript?');
// Stateful chat
const r1 = await chat('My name is Bob');
const r2 = await chat('What is my name?');
// Reset
resetChat();
🔌 Provider-Specific Classes
Each provider class extends BaseLLM and may include provider-specific methods or properties.
ChatOpenAI
OpenAI GPT models (GPT-4, GPT-4o, GPT-3.5).
Environment Variable: OPENAI_API_KEY | Default Model: gpt-4o
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({
apiKey: 'sk-xxx',
model: 'gpt-4o',
organization: 'org-xxx' // Optional
});
ChatGemini
Google Gemini models.
Environment Variable: GEMINI_API_KEY | Default Model: gemini-2.0-flash
import { ChatGemini } from 'plugllm';
const llm = new ChatGemini({
apiKey: 'AIza...',
model: 'gemini-2.0-flash'
});
ChatGroq
Groq's ultra-fast inference.
Environment Variable: GROQ_API_KEY | Default Model: llama-3.3-70b-versatile
import { ChatGroq } from 'plugllm';
const llm = new ChatGroq({
apiKey: 'gsk_xxx',
model: 'llama-3.3-70b-versatile'
});
ChatClaude
Anthropic Claude models.
Environment Variable: ANTHROPIC_API_KEY | Default Model: claude-sonnet-4-5
import { ChatClaude } from 'plugllm';
const llm = new ChatClaude({
apiKey: 'sk-ant-xxx',
model: 'claude-sonnet-4-5'
});
ChatGrok
xAI Grok models.
Environment Variable: XAI_API_KEY | Default Model: grok-3-mini
import { ChatGrok } from 'plugllm';
const llm = new ChatGrok({
apiKey: 'xai-xxx',
model: 'grok-3-mini'
});
ChatMistral
Mistral AI models.
Environment Variable: MISTRAL_API_KEY | Default Model: mistral-large-latest
import { ChatMistral } from 'plugllm';
const llm = new ChatMistral({
apiKey: 'xxx',
model: 'mistral-large-latest'
});
ChatLlama
Meta Llama models via Llama API.
Environment Variable: LLAMA_API_KEY | Default Model: Llama-4-Maverick-17B
import { ChatLlama } from 'plugllm';
const llm = new ChatLlama({
apiKey: 'xxx',
model: 'Llama-4-Maverick-17B'
});
ChatDeepSeek
DeepSeek models.
Environment Variable: DEEPSEEK_API_KEY | Default Model: deepseek-chat
import { ChatDeepSeek } from 'plugllm';
const llm = new ChatDeepSeek({
apiKey: 'xxx',
model: 'deepseek-chat'
});
ChatQwen
Alibaba Qwen models.
Environment Variable: DASHSCOPE_API_KEY | Default Model: qwen-plus
import { ChatQwen } from 'plugllm';
const llm = new ChatQwen({
apiKey: 'xxx',
model: 'qwen-plus'
});
ChatKimi
Moonshot Kimi models.
Environment Variable: MOONSHOT_API_KEY | Default Model: moonshot-v1-8k
import { ChatKimi } from 'plugllm';
const llm = new ChatKimi({
apiKey: 'xxx',
model: 'moonshot-v1-8k'
});
ChatCohere
Cohere models.
Environment Variable: CO_API_KEY | Default Model: command-a-03-2025
import { ChatCohere } from 'plugllm';
const llm = new ChatCohere({
apiKey: 'xxx',
model: 'command-a-03-2025'
});
ChatSarvamAI
SarvamAI Indian language models.
Environment Variable: SARVAM_API_KEY | Default Model: sarvam-2b-v0.5
import { ChatSarvamAI } from 'plugllm';
const llm = new ChatSarvamAI({
apiKey: 'xxx',
model: 'sarvam-2b-v0.5'
});
ChatOllama
Local Ollama models.
Environment Variable: None required | Default Model: gemma3 | Default Base URL: http://localhost:11434
import { ChatOllama } from 'plugllm';
const llm = new ChatOllama({
model: 'llama3',
baseURL: 'http://localhost:11434'
});
⚠️ Error Types
PlugLLM provides typed errors for better error handling.
import {
AuthenticationError,
RateLimitError,
ValidationError,
APIError,
NetworkError
} from 'plugllm/types';
| Error Class | Description |
|---|---|
| AuthenticationError | Invalid or missing API key |
| RateLimitError | Rate limit exceeded |
| ValidationError | Invalid parameters or configuration |
| APIError | Provider API returned an error |
| NetworkError | Network connectivity issues |
Example
import { ChatOpenAI } from 'plugllm';
import {
AuthenticationError,
RateLimitError,
ValidationError,
APIError,
NetworkError
} from 'plugllm/types';
const llm = new ChatOpenAI({ apiKey: 'sk-xxx' });
try {
const response = await llm.generate('Hello');
} catch (error) {
switch (error.name) {
case 'AuthenticationError':
console.error('Check your API key');
break;
case 'RateLimitError':
console.error('Rate limit hit, retry after:', error.retryAfter);
break;
case 'ValidationError':
console.error('Invalid parameters:', error.message);
break;
case 'APIError':
console.error('Provider error:', error.statusCode, error.message);
break;
case 'NetworkError':
console.error('Connection failed:', error.message);
break;
default:
console.error('Unknown error:', error);
}
}
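Since RateLimitError carries a retryAfter value, a small retry wrapper is a natural companion to the switch above. A hedged sketch (withRetry is not part of PlugLLM; retryAfter is assumed to be in seconds, so adjust if your provider reports milliseconds):

```typescript
// Generic retry wrapper keyed on the RateLimitError shape described above.
// Retries only rate-limit failures, waiting `retryAfter` seconds between
// attempts; all other errors are rethrown immediately.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.name !== 'RateLimitError' || attempt >= maxAttempts) throw error;
      const waitMs = (error.retryAfter ?? 1) * 1000;
      await new Promise(resolve => setTimeout(resolve, waitMs));
    }
  }
}
```

Usage: `const response = await withRetry(() => llm.generate('Hello'));`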
💡 Usage Examples
Multi-Turn Conversation
import { ChatOpenAI, Message } from 'plugllm';
const llm = new ChatOpenAI({
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4o'
});
async function conversation() {
const sessionId = 'user-123';
// Set context
llm.setSystemMessage(
'You are an expert JavaScript developer. Provide concise answers.',
sessionId
);
// Multi-turn conversation
const responses = [];
responses.push(await llm.chat('What is a closure?', { sessionId }));
console.log('Assistant:', responses[0].content);
responses.push(await llm.chat('Give me a practical example', { sessionId }));
console.log('Assistant:', responses[1].content);
responses.push(await llm.chat('How does it relate to lexical scoping?', { sessionId }));
console.log('Assistant:', responses[2].content);
// View history
const history = llm.getConversationHistory(sessionId);
console.log(`Conversation length: ${history.length} messages`);
}
conversation();
Streaming with Progress
import { ChatGroq } from 'plugllm';
const llm = new ChatGroq({
apiKey: process.env.GROQ_API_KEY,
model: 'llama-3.3-70b-versatile'
});
async function streamWithProgress() {
let chunkCount = 0;
process.stdout.write('Generating: ');
for await (const chunk of llm.stream('Explain the theory of relativity')) {
chunkCount++;
process.stdout.write(chunk);
}
console.log(`\n\nReceived ${chunkCount} chunks`);
}
streamWithProgress();
Comparing Multiple Providers
import { LLMFactory } from 'plugllm';
async function compareProviders(prompt) {
const providers = [
{ name: 'OpenAI', config: { provider: 'openai', model: 'gpt-4o' } },
{ name: 'Claude', config: { provider: 'claude', model: 'claude-sonnet-4-5' } },
{ name: 'Gemini', config: { provider: 'gemini', model: 'gemini-2.0-flash' } }
];
const results = await Promise.all(
providers.map(async ({ name, config }) => {
const llm = LLMFactory.create(config.provider, config);
const start = Date.now();
const response = await llm.ask(prompt);
const duration = Date.now() - start;
return {
provider: name,
response: response.content,
tokens: response.usage.totalTokens,
duration: `${duration}ms`
};
})
);
results.forEach(r => {
console.log(`\n=== ${r.provider} ===`);
console.log(`Duration: ${r.duration}`);
console.log(`Tokens: ${r.tokens}`);
console.log(`Response: ${r.response.slice(0, 100)}...`);
});
}
compareProviders('What is the meaning of life?');
Building a Simple CLI Chatbot
import readline from 'readline';
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4o'
});
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
async function chat() {
console.log('🤖 Chatbot started. Type "exit" to quit.\n');
const ask = () => {
rl.question('You: ', async (input) => {
if (input.toLowerCase() === 'exit') {
console.log('Goodbye!');
rl.close();
return;
}
process.stdout.write('Bot: ');
for await (const chunk of llm.askStream(input)) {
process.stdout.write(chunk);
}
console.log('\n');
ask();
});
};
ask();
}
chat();
Parallel Processing with Multiple Sessions
import { ChatOpenAI } from 'plugllm';
const llm = new ChatOpenAI({
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4o'
});
async function parallelSessions() {
const sessions = ['user1', 'user2', 'user3'];
// Set different system prompts for each session
llm.setSystemMessage('You are a math tutor', 'user1');
llm.setSystemMessage('You are a history teacher', 'user2');
llm.setSystemMessage('You are a coding mentor', 'user3');
// Process multiple conversations in parallel
const results = await Promise.all(
sessions.map(sessionId =>
llm.chat('What can you teach me?', { sessionId })
)
);
results.forEach((result, index) => {
console.log(`Session ${sessions[index]}:`, result.content.slice(0, 100));
});
}
parallelSessions();