LangChain and LlamaIndex: Building LLM Applications
By Jared Chung
Introduction
LangChain and LlamaIndex are two of the most popular frameworks for building LLM applications. While they overlap in functionality, each has distinct strengths. In this post, we'll explore both frameworks with practical examples to help you choose the right tool for your project.
Overview Comparison
| Feature | LangChain | LlamaIndex |
|---|---|---|
| Focus | General LLM orchestration | Data indexing & retrieval |
| Best For | Agents, chains, complex workflows | RAG, document Q&A |
| Learning Curve | Steeper | Gentler |
| Flexibility | Very high | Moderate |
| Abstractions | Many layers | Focused |
| Community | Larger | Growing |
LangChain Fundamentals
LangChain provides building blocks for LLM applications: prompts, models, chains, memory, and agents.
Installation
pip install langchain langchain-openai langchain-community
Basic Usage
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
# Initialize model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Simple completion
messages = [
SystemMessage(content="You are a helpful coding assistant."),
HumanMessage(content="Write a Python function to calculate factorial.")
]
response = llm.invoke(messages)
print(response.content)
Prompt Templates
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
# Simple template
prompt = PromptTemplate.from_template(
"Translate the following text to {language}: {text}"
)
formatted = prompt.format(language="French", text="Hello, how are you?")
print(formatted)
# Chat template
chat_prompt = ChatPromptTemplate.from_messages([
("system", "You are a {role} expert."),
("human", "{question}")
])
messages = chat_prompt.format_messages(role="Python", question="Explain decorators")
response = llm.invoke(messages)
Chains with LCEL
LangChain Expression Language (LCEL) provides a declarative way to compose chains:
from langchain_core.output_parsers import StrOutputParser
# Create a chain
chain = chat_prompt | llm | StrOutputParser()
# Run the chain
result = chain.invoke({"role": "data science", "question": "What is feature engineering?"})
print(result)
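Because LCEL chains are Runnables, they also support batching and streaming without extra code. A quick illustration (the inputs are arbitrary):
# Process several inputs in parallel
results = chain.batch([
    {"role": "Python", "question": "What are generators?"},
    {"role": "SQL", "question": "What is a window function?"},
])
# Stream tokens as they are generated
for token in chain.stream({"role": "Python", "question": "Explain list comprehensions"}):
    print(token, end="", flush=True)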
Structured Output
from pydantic import BaseModel, Field
from typing import List
class MovieReview(BaseModel):
title: str = Field(description="Movie title")
rating: int = Field(description="Rating from 1-10")
pros: List[str] = Field(description="Positive aspects")
cons: List[str] = Field(description="Negative aspects")
# Use structured output
structured_llm = llm.with_structured_output(MovieReview)
result = structured_llm.invoke("Review the movie Inception")
print(f"Title: {result.title}")
print(f"Rating: {result.rating}/10")
RAG with LangChain
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
# Load documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)
# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# Create RAG chain
template = """Answer based on the following context:
{context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# Query
answer = rag_chain.invoke("What is the main topic of the document?")
print(answer)
Agents
Agents use an LLM to decide which tools to call and with what arguments:
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool
from langchain import hub
# Define tools
@tool
def search_database(query: str) -> str:
"""Search the product database for information."""
# Simulate database search
return f"Found 3 products matching '{query}': ProductA, ProductB, ProductC"
@tool
def calculate_price(product: str, quantity: int) -> str:
"""Calculate the total price for a product."""
prices = {"ProductA": 10, "ProductB": 25, "ProductC": 15}
price = prices.get(product, 0) * quantity
return f"Total price for {quantity}x {product}: ${price}"
tools = [search_database, calculate_price]
# Create the agent using a prebuilt tool-calling prompt pulled from the LangChain Hub
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Run agent
result = agent_executor.invoke({
"input": "Find products related to laptops and calculate price for 5 ProductA items"
})
print(result["output"])
Memory for Conversations
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
# Create chain
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
("placeholder", "{history}"),
("human", "{input}")
])
chain = prompt | llm | StrOutputParser()
# Add memory
store = {}
def get_session_history(session_id: str):
if session_id not in store:
store[session_id] = ChatMessageHistory()
return store[session_id]
with_memory = RunnableWithMessageHistory(
chain,
get_session_history,
input_messages_key="input",
history_messages_key="history"
)
# Conversation
config = {"configurable": {"session_id": "user123"}}
response1 = with_memory.invoke({"input": "My name is Alice"}, config=config)
print(response1)
response2 = with_memory.invoke({"input": "What's my name?"}, config=config)
print(response2) # "Your name is Alice"
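Each session_id gets its own history, so a new session starts with a clean slate:
# A different session id has no memory of the earlier exchange
new_config = {"configurable": {"session_id": "user456"}}
response3 = with_memory.invoke({"input": "What's my name?"}, config=new_config)
print(response3)  # The model won't know the name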
LlamaIndex Fundamentals
LlamaIndex focuses on connecting LLMs with your data through efficient indexing and retrieval.
Installation
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
Basic Usage
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Load documents
documents = SimpleDirectoryReader("./data").load_data()
# Create index
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of the documents?")
print(response)
Customizing LLM and Embeddings
from llama_index.core import Settings
# Configure globally
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Or per-index
index = VectorStoreIndex.from_documents(
documents,
embed_model=OpenAIEmbedding(model="text-embedding-3-small")
)
query_engine = index.as_query_engine(
llm=OpenAI(model="gpt-4o-mini")
)
Node Parsing (Chunking)
from llama_index.core.node_parser import (
SentenceSplitter,
SemanticSplitterNodeParser
)
# Simple splitting
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
# Semantic splitting
semantic_splitter = SemanticSplitterNodeParser(
embed_model=OpenAIEmbedding(),
breakpoint_percentile_threshold=95
)
semantic_nodes = semantic_splitter.get_nodes_from_documents(documents)
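Nodes from either splitter can be passed straight into an index in place of raw documents:
from llama_index.core import VectorStoreIndex
# Build the index over pre-parsed nodes rather than whole documents
index = VectorStoreIndex(nodes)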
Different Index Types
from llama_index.core import (
VectorStoreIndex,
SummaryIndex,
TreeIndex,
KeywordTableIndex
)
# Vector index - best for semantic search
vector_index = VectorStoreIndex.from_documents(documents)
# Summary index - best for summarization
summary_index = SummaryIndex.from_documents(documents)
# Tree index - hierarchical for complex queries
tree_index = TreeIndex.from_documents(documents)
# Keyword index - for keyword-based retrieval
keyword_index = KeywordTableIndex.from_documents(documents)
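All four index types expose the same query-engine interface, so switching retrieval strategies is usually a one-line change. A quick illustration (the questions are placeholders):
# Same API regardless of the underlying index structure
print(vector_index.as_query_engine().query("What are the key findings?"))
print(summary_index.as_query_engine().query("Summarize the documents"))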
Advanced RAG with LlamaIndex
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
# Create index
index = VectorStoreIndex.from_documents(documents)
# Custom retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10 # Retrieve more candidates
)
# Add post-processors
similarity_filter = SimilarityPostprocessor(similarity_cutoff=0.7)
# Create query engine
query_engine = RetrieverQueryEngine(
retriever=retriever,
node_postprocessors=[similarity_filter]
)
response = query_engine.query("Explain the key concepts")
print(response)
Re-ranking
from llama_index.postprocessor.cohere_rerank import CohereRerank
# Add a Cohere reranker (requires the llama-index-postprocessor-cohere-rerank package and a Cohere API key)
reranker = CohereRerank(api_key="your-key", top_n=5)
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[reranker]
)
Chat Engine with Memory
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(
chat_mode="context",
memory=memory,
system_prompt="You are a helpful assistant for answering questions about our documentation."
)
# Conversation
response1 = chat_engine.chat("What products do you offer?")
print(response1)
response2 = chat_engine.chat("Tell me more about the first one")
print(response2)
Agents in LlamaIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
# Create tools from query engines (product_index and faq_index are assumed to be indexes built earlier over the product docs and FAQs)
product_tool = QueryEngineTool(
query_engine=product_index.as_query_engine(),
metadata=ToolMetadata(
name="product_search",
description="Search for product information"
)
)
faq_tool = QueryEngineTool(
query_engine=faq_index.as_query_engine(),
metadata=ToolMetadata(
name="faq_search",
description="Search frequently asked questions"
)
)
# Create agent
agent = ReActAgent.from_tools(
[product_tool, faq_tool],
llm=OpenAI(model="gpt-4o-mini"),
verbose=True
)
response = agent.chat("What's your return policy for electronics?")
When to Use Each Framework
Choose LangChain When:
- Building complex agent workflows
- Need fine-grained control over every component
- Creating multi-step chains with branching logic
- Integrating many different tools and APIs
- Building conversational applications with complex state
Choose LlamaIndex When:
- Primary focus is document Q&A / RAG
- Need efficient indexing over large document collections
- Want simpler, more opinionated abstractions
- Building knowledge bases or search systems
- Need specialized index types (tree, keyword, graph)
Use Both Together:
from langchain.tools import Tool
from llama_index.core import VectorStoreIndex
# Create LlamaIndex query engine
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Wrap as LangChain tool
def search_docs(query: str) -> str:
response = query_engine.query(query)
return str(response)
doc_search_tool = Tool(
name="DocumentSearch",
func=search_docs,
description="Search through company documents"
)
# Use in a LangChain agent (reusing the tool-calling agent created earlier)
agent_executor = AgentExecutor(agent=agent, tools=[doc_search_tool, ...])
Production Considerations
- Caching: Both frameworks support caching embeddings and LLM responses (example after this list)
- Streaming: Enable streaming for better UX in chat applications
- Observability: Integrate with LangSmith or Arize Phoenix
- Error handling: Implement retries and fallbacks
- Cost tracking: Monitor token usage (snippet after the streaming examples below)
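For response caching in LangChain, one option is the global LLM cache; a minimal sketch using the in-memory cache (swap in SQLiteCache from langchain_community for persistence across runs):
from langchain_core.globals import set_llm_cache
from langchain_core.caches import InMemoryCache
# Identical prompts now reuse the cached response instead of calling the API again
set_llm_cache(InMemoryCache())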
# LangChain streaming
for chunk in chain.stream({"input": "Hello"}):
print(chunk, end="", flush=True)
# LlamaIndex streaming
streaming_response = query_engine.query("Explain this concept")
for text in streaming_response.response_gen:
print(text, end="", flush=True)
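For basic cost tracking with OpenAI models, LangChain's get_openai_callback helper tallies token usage per call; a short sketch reusing the chain above:
from langchain_community.callbacks import get_openai_callback
with get_openai_callback() as cb:
    chain.invoke({"input": "Hello"})
    print(f"Tokens: {cb.total_tokens}, estimated cost: ${cb.total_cost:.4f}")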
Complete Example: Document Q&A System
Here's a complete, production-ready document Q&A system using LlamaIndex:
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
Settings,
StorageContext,
load_index_from_storage
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.memory import ChatMemoryBuffer
from pathlib import Path
import os
class DocumentQASystem:
"""Production-ready document Q&A system using LlamaIndex."""
def __init__(
self,
documents_dir: str = "./documents",
persist_dir: str = "./storage",
model: str = "gpt-4o-mini",
chunk_size: int = 512,
chunk_overlap: int = 50
):
# Configure settings
Settings.llm = OpenAI(model=model, temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(
chunk_size=chunk_size,
chunk_overlap=chunk_overlap
)
self.documents_dir = documents_dir
self.persist_dir = persist_dir
self.index = None
self.chat_engine = None
def build_index(self):
"""Build or load the index."""
if Path(self.persist_dir).exists():
print("Loading existing index...")
storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
self.index = load_index_from_storage(storage_context)
else:
print("Building new index...")
documents = SimpleDirectoryReader(self.documents_dir).load_data()
print(f"Loaded {len(documents)} documents")
self.index = VectorStoreIndex.from_documents(documents)
self.index.storage_context.persist(persist_dir=self.persist_dir)
print(f"Index persisted to {self.persist_dir}")
def query(self, question: str, similarity_top_k: int = 5) -> dict:
"""Query the index and return answer with sources."""
if self.index is None:
self.build_index()
query_engine = self.index.as_query_engine(
similarity_top_k=similarity_top_k,
response_mode="compact"
)
response = query_engine.query(question)
return {
"answer": str(response),
"sources": [
{
"text": node.node.text[:200] + "...",
"score": node.score,
"metadata": node.node.metadata
}
for node in response.source_nodes
]
}
def chat(self, message: str) -> str:
"""Chat with memory."""
if self.chat_engine is None:
if self.index is None:
self.build_index()
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
self.chat_engine = self.index.as_chat_engine(
chat_mode="context",
memory=memory,
system_prompt="""You are a helpful assistant answering questions
about the documentation. Be concise and cite specific sections when possible."""
)
response = self.chat_engine.chat(message)
return str(response)
def reset_chat(self):
"""Reset chat memory."""
self.chat_engine = None
# Usage
if __name__ == "__main__":
# Initialize system
qa_system = DocumentQASystem(documents_dir="./docs")
# Build/load index
qa_system.build_index()
# Single query
result = qa_system.query("What are the key features?")
print(f"Answer: {result['answer']}\n")
print("Sources:")
for source in result['sources']:
print(f" - [{source['score']:.3f}] {source['text'][:80]}...")
# Chat conversation
print("\n--- Starting chat ---")
print(qa_system.chat("What products do you offer?"))
print(qa_system.chat("Tell me more about the first one"))
print(qa_system.chat("What's the pricing?"))
Complete Example: Multi-Tool Agent with LangChain
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_community.utilities import WikipediaAPIWrapper
from pydantic import BaseModel, Field
from typing import Optional
import json
# Define tools with proper schemas
class CalculatorInput(BaseModel):
expression: str = Field(description="Mathematical expression to evaluate")
@tool(args_schema=CalculatorInput)
def calculator(expression: str) -> str:
"""Evaluate a mathematical expression and return the result."""
try:
# Safe evaluation (in production, use a proper math parser)
allowed_chars = set("0123456789+-*/.() ")
if all(c in allowed_chars for c in expression):
result = eval(expression)
return f"Result: {result}"
return "Error: Invalid expression"
except Exception as e:
return f"Error: {str(e)}"
class SearchInput(BaseModel):
query: str = Field(description="Search query")
@tool(args_schema=SearchInput)
def web_search(query: str) -> str:
"""Search Wikipedia for information on a topic."""
wikipedia = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=1000)
return wikipedia.run(query)
class WeatherInput(BaseModel):
city: str = Field(description="City name")
@tool(args_schema=WeatherInput)
def get_weather(city: str) -> str:
"""Get current weather for a city (simulated)."""
# In production, use a real weather API
weather_data = {
"new york": {"temp": 72, "condition": "Sunny"},
"london": {"temp": 59, "condition": "Cloudy"},
"tokyo": {"temp": 68, "condition": "Partly cloudy"},
}
city_lower = city.lower()
if city_lower in weather_data:
data = weather_data[city_lower]
return f"Weather in {city}: {data['temp']}F, {data['condition']}"
return f"Weather data not available for {city}"
class MultiToolAgent:
"""Agent that can use multiple tools to answer questions."""
def __init__(self, model: str = "gpt-4o-mini"):
self.llm = ChatOpenAI(model=model, temperature=0)
self.tools = [calculator, web_search, get_weather]
# Create prompt
self.prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful assistant with access to various tools.
Use the tools when needed to answer questions accurately.
Always cite your sources when using search results."""),
("placeholder", "{chat_history}"),
("human", "{input}"),
("placeholder", "{agent_scratchpad}")
])
# Create agent
agent = create_tool_calling_agent(self.llm, self.tools, self.prompt)
self.executor = AgentExecutor(
agent=agent,
tools=self.tools,
verbose=True,
max_iterations=5,
handle_parsing_errors=True
)
self.chat_history = []
def ask(self, question: str) -> str:
"""Ask the agent a question."""
result = self.executor.invoke({
"input": question,
"chat_history": self.chat_history
})
# Update history
self.chat_history.append(("human", question))
self.chat_history.append(("assistant", result["output"]))
return result["output"]
def clear_history(self):
"""Clear conversation history."""
self.chat_history = []
# Usage
if __name__ == "__main__":
agent = MultiToolAgent()
# Test different capabilities
print("=== Calculator ===")
print(agent.ask("What is 15% of 250?"))
print("\n=== Weather ===")
print(agent.ask("What's the weather like in Tokyo?"))
print("\n=== Search ===")
print(agent.ask("Tell me about the history of Python programming language"))
print("\n=== Combined ===")
print(agent.ask("What's the population of France and what's that divided by 1000?"))
Error Handling and Resilience
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_openai import ChatOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential
import json
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ResilientLLMWrapper:
"""LLM wrapper with retry logic and fallbacks."""
def __init__(
self,
primary_model: str = "gpt-4o-mini",
fallback_model: str = "gpt-3.5-turbo"
):
self.primary = ChatOpenAI(model=primary_model, temperature=0)
self.fallback = ChatOpenAI(model=fallback_model, temperature=0)
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=4, max=60)
)
def invoke_with_retry(self, messages):
"""Invoke with automatic retry."""
return self.primary.invoke(messages)
def invoke_with_fallback(self, messages):
"""Try primary, fall back to secondary on failure."""
try:
return self.invoke_with_retry(messages)
except Exception as e:
logger.warning(f"Primary model failed: {e}, using fallback")
return self.fallback.invoke(messages)
# Usage in a chain
def safe_json_parse(text: str) -> dict:
"""Safely parse JSON with error handling."""
try:
return json.loads(text)
except json.JSONDecodeError:
# Try to extract JSON from text
import re
json_match = re.search(r'\{.*\}', text, re.DOTALL)
if json_match:
return json.loads(json_match.group())
return {"error": "Failed to parse JSON", "raw": text}
# Build resilient chain
resilient_llm = ResilientLLMWrapper()
chain = (
{"input": RunnablePassthrough()}
| RunnableLambda(lambda x: [{"role": "user", "content": x["input"]}])
| RunnableLambda(resilient_llm.invoke_with_fallback)
| RunnableLambda(lambda x: x.content)
)
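A quick check of the resilient chain (assumes OPENAI_API_KEY is set; the prompt text is arbitrary):
answer = chain.invoke("Explain exponential backoff in one sentence.")
print(answer)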
Monitoring and Observability
from datetime import datetime
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import json
import time
@dataclass
class QueryLog:
"""Log entry for a query."""
timestamp: str
query: str
response: str
latency_ms: float
model: str
tokens_used: int = 0
sources_retrieved: int = 0
error: Optional[str] = None
class ObservableRAGSystem:
"""RAG system with built-in observability."""
def __init__(self, qa_system):
self.qa_system = qa_system
self.logs: List[QueryLog] = []
def query(self, question: str) -> dict:
"""Query with logging."""
start = time.perf_counter()
error = None
result = {}
try:
result = self.qa_system.query(question)
except Exception as e:
error = str(e)
raise
finally:
latency = (time.perf_counter() - start) * 1000
log = QueryLog(
timestamp=datetime.now().isoformat(),
query=question,
response=result.get("answer", ""),
latency_ms=latency,
model="gpt-4o-mini",
sources_retrieved=len(result.get("sources", [])),
error=error
)
self.logs.append(log)
return result
def get_metrics(self) -> Dict:
"""Get aggregated metrics."""
if not self.logs:
return {}
latencies = [log.latency_ms for log in self.logs if not log.error]
errors = [log for log in self.logs if log.error]
return {
"total_queries": len(self.logs),
"successful_queries": len(self.logs) - len(errors),
"error_rate": len(errors) / len(self.logs),
"avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0,
"p95_latency_ms": sorted(latencies)[int(len(latencies) * 0.95)] if latencies else 0,
"avg_sources_retrieved": sum(log.sources_retrieved for log in self.logs) / len(self.logs)
}
def export_logs(self, filepath: str):
"""Export logs to JSON file."""
with open(filepath, 'w') as f:
json.dump([vars(log) for log in self.logs], f, indent=2)
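A usage sketch that wraps the DocumentQASystem from the earlier example (the directory and questions are illustrative):
qa_system = DocumentQASystem(documents_dir="./docs")
observable = ObservableRAGSystem(qa_system)
observable.query("What are the key features?")
observable.query("How is pricing structured?")
print(observable.get_metrics())
observable.export_logs("query_logs.json")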
Conclusion
Both frameworks are excellent choices for LLM applications:
- LangChain: More flexible, better for complex orchestration and agents
- LlamaIndex: More focused, better for data-centric applications and RAG
Many production systems use both: LlamaIndex for efficient RAG and LangChain for agent orchestration.
Key recommendations:
- Start with LlamaIndex for pure document Q&A
- Use LangChain when you need complex agent workflows
- Combine both when you need the best of both worlds
- Add observability early - it's much harder to add later
- Implement proper error handling with retries and fallbacks
References
- LangChain Documentation: https://python.langchain.com
- LlamaIndex Documentation: https://docs.llamaindex.ai
- LangChain Expression Language: https://python.langchain.com/docs/expression_language
- LangSmith (Observability): https://smith.langchain.com