LangChain and LlamaIndex: Building LLM Applications

Jared Chung
Introduction

LangChain and LlamaIndex are two of the most popular frameworks for building LLM applications. While their functionality overlaps, each has distinct strengths. In this post, we'll explore both frameworks with practical examples to help you choose the right tool for your project.

Overview Comparison

Feature          LangChain                            LlamaIndex
Focus            General LLM orchestration            Data indexing & retrieval
Best For         Agents, chains, complex workflows    RAG, document Q&A
Learning Curve   Steeper                              Gentler
Flexibility      Very high                            Moderate
Abstractions     Many layers                          Focused
Community        Larger                               Growing

LangChain Fundamentals

LangChain provides building blocks for LLM applications: prompts, models, chains, memory, and agents.

Installation

pip install langchain langchain-openai langchain-community
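
The examples below assume an OpenAI API key is available to the client libraries, typically through the OPENAI_API_KEY environment variable (the value shown is a placeholder):

import os

# Both frameworks' OpenAI integrations read the key from the environment
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; prefer setting this in your shell or a .env file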

Basic Usage

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Simple completion
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Write a Python function to calculate factorial.")
]

response = llm.invoke(messages)
print(response.content)

Prompt Templates

from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

# Simple template
prompt = PromptTemplate.from_template(
    "Translate the following text to {language}: {text}"
)

formatted = prompt.format(language="French", text="Hello, how are you?")
print(formatted)

# Chat template
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role} expert."),
    ("human", "{question}")
])

messages = chat_prompt.format_messages(role="Python", question="Explain decorators")
response = llm.invoke(messages)

Chains with LCEL

LangChain Expression Language (LCEL) provides a declarative way to compose chains:

from langchain_core.output_parsers import StrOutputParser

# Create a chain
chain = chat_prompt | llm | StrOutputParser()

# Run the chain
result = chain.invoke({"role": "data science", "question": "What is feature engineering?"})
print(result)

Structured Output

from pydantic import BaseModel, Field
from typing import List

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: int = Field(description="Rating from 1-10")
    pros: List[str] = Field(description="Positive aspects")
    cons: List[str] = Field(description="Negative aspects")

# Use structured output
structured_llm = llm.with_structured_output(MovieReview)

result = structured_llm.invoke("Review the movie Inception")
print(f"Title: {result.title}")
print(f"Rating: {result.rating}/10")

RAG with LangChain

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Load documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()

# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Create RAG chain
template = """Answer based on the following context:

{context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query
answer = rag_chain.invoke("What is the main topic of the document?")
print(answer)

Agents

Agents use LLMs to decide which tools to use:

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool
from langchain import hub

# Define tools
@tool
def search_database(query: str) -> str:
    """Search the product database for information."""
    # Simulate database search
    return f"Found 3 products matching '{query}': ProductA, ProductB, ProductC"

@tool
def calculate_price(product: str, quantity: int) -> str:
    """Calculate the total price for a product."""
    prices = {"ProductA": 10, "ProductB": 25, "ProductC": 15}
    price = prices.get(product, 0) * quantity
    return f"Total price for {quantity}x {product}: ${price}"

tools = [search_database, calculate_price]

# Create agent
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run agent
result = agent_executor.invoke({
    "input": "Find products related to laptops and calculate price for 5 ProductA items"
})
print(result["output"])

Memory for Conversations

from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Create chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("placeholder", "{history}"),
    ("human", "{input}")
])

chain = prompt | llm | StrOutputParser()

# Add memory
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

with_memory = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

# Conversation
config = {"configurable": {"session_id": "user123"}}

response1 = with_memory.invoke({"input": "My name is Alice"}, config=config)
print(response1)

response2 = with_memory.invoke({"input": "What's my name?"}, config=config)
print(response2)  # "Your name is Alice"

LlamaIndex Fundamentals

LlamaIndex focuses on connecting LLMs with your data through efficient indexing and retrieval.

Installation

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

Basic Usage

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of the documents?")
print(response)

Customizing LLM and Embeddings

from llama_index.core import Settings

# Configure globally
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Or per-index
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small")
)

query_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini")
)

Node Parsing (Chunking)

from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser
)

# Simple splitting
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)

# Semantic splitting
semantic_splitter = SemanticSplitterNodeParser(
    embed_model=OpenAIEmbedding(),
    breakpoint_percentile_threshold=95
)
semantic_nodes = semantic_splitter.get_nodes_from_documents(documents)

Different Index Types

from llama_index.core import (
    VectorStoreIndex,
    SummaryIndex,
    TreeIndex,
    KeywordTableIndex
)

# Vector index - best for semantic search
vector_index = VectorStoreIndex.from_documents(documents)

# Summary index - best for summarization
summary_index = SummaryIndex.from_documents(documents)

# Tree index - hierarchical for complex queries
tree_index = TreeIndex.from_documents(documents)

# Keyword index - for keyword-based retrieval
keyword_index = KeywordTableIndex.from_documents(documents)
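
Each index exposes the same query-engine interface, so swapping retrieval strategies is a one-line change. A minimal sketch (the response_mode argument is passed through to the response synthesizer):

# Summarize the whole collection via the summary index
summary_response = summary_index.as_query_engine(response_mode="tree_summarize").query(
    "Summarize these documents in three bullet points"
)
print(summary_response)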

Advanced RAG with LlamaIndex

from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# Create index
index = VectorStoreIndex.from_documents(documents)

# Custom retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10  # Retrieve more candidates
)

# Add post-processors
similarity_filter = SimilarityPostprocessor(similarity_cutoff=0.7)

# Create query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[similarity_filter]
)

response = query_engine.query("Explain the key concepts")
print(response)

Re-ranking

# Requires: pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank

# Add reranker
reranker = CohereRerank(api_key="your-key", top_n=5)

query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker]
)

Chat Engine with Memory

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt="You are a helpful assistant for answering questions about our documentation."
)

# Conversation
response1 = chat_engine.chat("What products do you offer?")
print(response1)

response2 = chat_engine.chat("Tell me more about the first one")
print(response2)

Agents in LlamaIndex

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent

# Create tools from query engines
# (product_index and faq_index are assumed to be prebuilt indexes, e.g. via VectorStoreIndex.from_documents)
product_tool = QueryEngineTool(
    query_engine=product_index.as_query_engine(),
    metadata=ToolMetadata(
        name="product_search",
        description="Search for product information"
    )
)

faq_tool = QueryEngineTool(
    query_engine=faq_index.as_query_engine(),
    metadata=ToolMetadata(
        name="faq_search",
        description="Search frequently asked questions"
    )
)

# Create agent
agent = ReActAgent.from_tools(
    [product_tool, faq_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    verbose=True
)

response = agent.chat("What's your return policy for electronics?")

When to Use Each Framework

Choose LangChain When:

  • Building complex agent workflows
  • Need fine-grained control over every component
  • Creating multi-step chains with branching logic
  • Integrating many different tools and APIs
  • Building conversational applications with complex state

Choose LlamaIndex When:

  • Primary focus is document Q&A / RAG
  • Need efficient indexing over large document collections
  • Want simpler, more opinionated abstractions
  • Building knowledge bases or search systems
  • Need specialized index types (tree, keyword, graph)

Use Both Together:

from langchain.tools import Tool
from llama_index.core import VectorStoreIndex

# Create LlamaIndex query engine
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Wrap as LangChain tool
def search_docs(query: str) -> str:
    response = query_engine.query(query)
    return str(response)

doc_search_tool = Tool(
    name="DocumentSearch",
    func=search_docs,
    description="Search through company documents"
)

# Use in a LangChain agent (the agent itself must be created with the same tool list)
agent_executor = AgentExecutor(agent=agent, tools=[doc_search_tool, ...])

Production Considerations

  1. Caching: Both support caching embeddings and responses
  2. Streaming: Enable for better UX in chat applications
  3. Observability: Integrate with LangSmith or Arize Phoenix
  4. Error handling: Implement retries and fallbacks
  5. Cost tracking: Monitor token usage

# LangChain streaming
for chunk in chain.stream({"input": "Hello"}):
    print(chunk, end="", flush=True)

# LlamaIndex streaming (the query engine must be created with streaming=True)
streaming_query_engine = index.as_query_engine(streaming=True)
streaming_response = streaming_query_engine.query("Explain this concept")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)

Complete Example: Document Q&A System

Here's a complete, production-ready document Q&A system using LlamaIndex:

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    Settings,
    StorageContext,
    load_index_from_storage
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.memory import ChatMemoryBuffer
from pathlib import Path
import os

class DocumentQASystem:
    """Production-ready document Q&A system using LlamaIndex."""

    def __init__(
        self,
        documents_dir: str = "./documents",
        persist_dir: str = "./storage",
        model: str = "gpt-4o-mini",
        chunk_size: int = 512,
        chunk_overlap: int = 50
    ):
        # Configure settings
        Settings.llm = OpenAI(model=model, temperature=0)
        Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
        Settings.node_parser = SentenceSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap
        )

        self.documents_dir = documents_dir
        self.persist_dir = persist_dir
        self.index = None
        self.chat_engine = None

    def build_index(self):
        """Build or load the index."""
        if Path(self.persist_dir).exists():
            print("Loading existing index...")
            storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
            self.index = load_index_from_storage(storage_context)
        else:
            print("Building new index...")
            documents = SimpleDirectoryReader(self.documents_dir).load_data()
            print(f"Loaded {len(documents)} documents")

            self.index = VectorStoreIndex.from_documents(documents)
            self.index.storage_context.persist(persist_dir=self.persist_dir)
            print(f"Index persisted to {self.persist_dir}")

    def query(self, question: str, similarity_top_k: int = 5) -> dict:
        """Query the index and return answer with sources."""
        if self.index is None:
            self.build_index()

        query_engine = self.index.as_query_engine(
            similarity_top_k=similarity_top_k,
            response_mode="compact"
        )

        response = query_engine.query(question)

        return {
            "answer": str(response),
            "sources": [
                {
                    "text": node.node.text[:200] + "...",
                    "score": node.score,
                    "metadata": node.node.metadata
                }
                for node in response.source_nodes
            ]
        }

    def chat(self, message: str) -> str:
        """Chat with memory."""
        if self.chat_engine is None:
            if self.index is None:
                self.build_index()

            memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
            self.chat_engine = self.index.as_chat_engine(
                chat_mode="context",
                memory=memory,
                system_prompt="""You are a helpful assistant answering questions
                about the documentation. Be concise and cite specific sections when possible."""
            )

        response = self.chat_engine.chat(message)
        return str(response)

    def reset_chat(self):
        """Reset chat memory."""
        self.chat_engine = None


# Usage
if __name__ == "__main__":
    # Initialize system
    qa_system = DocumentQASystem(documents_dir="./docs")

    # Build/load index
    qa_system.build_index()

    # Single query
    result = qa_system.query("What are the key features?")
    print(f"Answer: {result['answer']}\n")
    print("Sources:")
    for source in result['sources']:
        print(f"  - [{source['score']:.3f}] {source['text'][:80]}...")

    # Chat conversation
    print("\n--- Starting chat ---")
    print(qa_system.chat("What products do you offer?"))
    print(qa_system.chat("Tell me more about the first one"))
    print(qa_system.chat("What's the pricing?"))

Complete Example: Multi-Tool Agent with LangChain

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_community.utilities import WikipediaAPIWrapper
from pydantic import BaseModel, Field
from typing import Optional
import json

# Define tools with proper schemas
class CalculatorInput(BaseModel):
    expression: str = Field(description="Mathematical expression to evaluate")

@tool(args_schema=CalculatorInput)
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression and return the result."""
    try:
        # Safe evaluation (in production, use a proper math parser)
        allowed_chars = set("0123456789+-*/.() ")
        if all(c in allowed_chars for c in expression):
            result = eval(expression)
            return f"Result: {result}"
        return "Error: Invalid expression"
    except Exception as e:
        return f"Error: {str(e)}"

class SearchInput(BaseModel):
    query: str = Field(description="Search query")

@tool(args_schema=SearchInput)
def web_search(query: str) -> str:
    """Search Wikipedia for information on a topic."""
    wikipedia = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=1000)
    return wikipedia.run(query)

class WeatherInput(BaseModel):
    city: str = Field(description="City name")

@tool(args_schema=WeatherInput)
def get_weather(city: str) -> str:
    """Get current weather for a city (simulated)."""
    # In production, use a real weather API
    weather_data = {
        "new york": {"temp": 72, "condition": "Sunny"},
        "london": {"temp": 59, "condition": "Cloudy"},
        "tokyo": {"temp": 68, "condition": "Partly cloudy"},
    }

    city_lower = city.lower()
    if city_lower in weather_data:
        data = weather_data[city_lower]
        return f"Weather in {city}: {data['temp']}F, {data['condition']}"
    return f"Weather data not available for {city}"

class MultiToolAgent:
    """Agent that can use multiple tools to answer questions."""

    def __init__(self, model: str = "gpt-4o-mini"):
        self.llm = ChatOpenAI(model=model, temperature=0)
        self.tools = [calculator, web_search, get_weather]

        # Create prompt
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful assistant with access to various tools.
Use the tools when needed to answer questions accurately.
Always cite your sources when using search results."""),
            ("placeholder", "{chat_history}"),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}")
        ])

        # Create agent
        agent = create_tool_calling_agent(self.llm, self.tools, self.prompt)
        self.executor = AgentExecutor(
            agent=agent,
            tools=self.tools,
            verbose=True,
            max_iterations=5,
            handle_parsing_errors=True
        )

        self.chat_history = []

    def ask(self, question: str) -> str:
        """Ask the agent a question."""
        result = self.executor.invoke({
            "input": question,
            "chat_history": self.chat_history
        })

        # Update history
        self.chat_history.append(("human", question))
        self.chat_history.append(("assistant", result["output"]))

        return result["output"]

    def clear_history(self):
        """Clear conversation history."""
        self.chat_history = []


# Usage
if __name__ == "__main__":
    agent = MultiToolAgent()

    # Test different capabilities
    print("=== Calculator ===")
    print(agent.ask("What is 15% of 250?"))

    print("\n=== Weather ===")
    print(agent.ask("What's the weather like in Tokyo?"))

    print("\n=== Search ===")
    print(agent.ask("Tell me about the history of Python programming language"))

    print("\n=== Combined ===")
    print(agent.ask("What's the population of France and what's that divided by 1000?"))

Error Handling and Resilience

from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_openai import ChatOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ResilientLLMWrapper:
    """LLM wrapper with retry logic and fallbacks."""

    def __init__(
        self,
        primary_model: str = "gpt-4o-mini",
        fallback_model: str = "gpt-3.5-turbo"
    ):
        self.primary = ChatOpenAI(model=primary_model, temperature=0)
        self.fallback = ChatOpenAI(model=fallback_model, temperature=0)

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=60)
    )
    def invoke_with_retry(self, messages):
        """Invoke with automatic retry."""
        return self.primary.invoke(messages)

    def invoke_with_fallback(self, messages):
        """Try primary, fall back to secondary on failure."""
        try:
            return self.invoke_with_retry(messages)
        except Exception as e:
            logger.warning(f"Primary model failed: {e}, using fallback")
            return self.fallback.invoke(messages)


# Usage in a chain
def safe_json_parse(text: str) -> dict:
    """Safely parse JSON with error handling."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Try to extract JSON from text
        import re
        json_match = re.search(r'\{.*\}', text, re.DOTALL)
        if json_match:
            return json.loads(json_match.group())
        return {"error": "Failed to parse JSON", "raw": text}

# Build resilient chain
resilient_llm = ResilientLLMWrapper()

chain = (
    {"input": RunnablePassthrough()}
    | RunnableLambda(lambda x: [{"role": "user", "content": x["input"]}])
    | RunnableLambda(resilient_llm.invoke_with_fallback)
    | RunnableLambda(lambda x: x.content)
)

Monitoring and Observability

from datetime import datetime
from dataclasses import dataclass
from typing import List, Dict, Optional
import json
import time

@dataclass
class QueryLog:
    """Log entry for a query."""
    timestamp: str
    query: str
    response: str
    latency_ms: float
    model: str
    tokens_used: int = 0
    sources_retrieved: int = 0
    error: Optional[str] = None

class ObservableRAGSystem:
    """RAG system with built-in observability."""

    def __init__(self, qa_system):
        self.qa_system = qa_system
        self.logs: List[QueryLog] = []

    def query(self, question: str) -> dict:
        """Query with logging."""
        start = time.perf_counter()
        error = None
        result = {}

        try:
            result = self.qa_system.query(question)
        except Exception as e:
            error = str(e)
            raise
        finally:
            latency = (time.perf_counter() - start) * 1000

            log = QueryLog(
                timestamp=datetime.now().isoformat(),
                query=question,
                response=result.get("answer", ""),
                latency_ms=latency,
                model="gpt-4o-mini",
                sources_retrieved=len(result.get("sources", [])),
                error=error
            )
            self.logs.append(log)

        return result

    def get_metrics(self) -> Dict:
        """Get aggregated metrics."""
        if not self.logs:
            return {}

        latencies = [log.latency_ms for log in self.logs if not log.error]
        errors = [log for log in self.logs if log.error]

        return {
            "total_queries": len(self.logs),
            "successful_queries": len(self.logs) - len(errors),
            "error_rate": len(errors) / len(self.logs),
            "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0,
            "p95_latency_ms": sorted(latencies)[int(len(latencies) * 0.95)] if latencies else 0,
            "avg_sources_retrieved": sum(log.sources_retrieved for log in self.logs) / len(self.logs)
        }

    def export_logs(self, filepath: str):
        """Export logs to JSON file."""
        with open(filepath, 'w') as f:
            json.dump([vars(log) for log in self.logs], f, indent=2)

Conclusion

Both frameworks are excellent choices for LLM applications:

  • LangChain: More flexible, better for complex orchestration and agents
  • LlamaIndex: More focused, better for data-centric applications and RAG

Many production systems use both - LlamaIndex for efficient RAG and LangChain for agent orchestration.

Key recommendations:

  1. Start with LlamaIndex for pure document Q&A
  2. Use LangChain when you need complex agent workflows
  3. Combine both when you need the best of both worlds
  4. Add observability early - it's much harder to add later
  5. Implement proper error handling with retries and fallbacks
