Prompt Engineering: Getting Better Results from LLMs
By Jared Chung
Introduction
Prompt engineering is the practice of crafting inputs that reliably produce the outputs you need from Large Language Models. As LLMs become central to applications ranging from chatbots to code generation, the ability to communicate effectively with these models becomes a valuable skill.
The difference between a mediocre and an excellent prompt can mean the difference between:
- Inconsistent results vs. reliable, reproducible outputs
- Multiple retry attempts vs. first-time success
- Unparseable text vs. structured data ready for your application
This guide covers the fundamental techniques that work across all major LLMs.
The Core Techniques
Understanding the Spectrum
Prompt engineering techniques exist on a spectrum from simple to complex:
| Technique | When to Use | Token Cost | Best For |
|---|---|---|---|
| Zero-Shot | Clear, simple tasks | Low | Classification, extraction |
| Few-Shot | Custom formats or categories | Medium | Domain-specific tasks |
| Chain-of-Thought | Multi-step reasoning | Higher | Math, logic, analysis |
| Structured Output | Application integration | Medium | APIs, data pipelines |
The key insight: start simple and add complexity only when needed. Zero-shot prompts work surprisingly well for many tasks, and you should only escalate to more sophisticated techniques when simpler approaches fail.
Zero-Shot Prompting
Zero-shot prompting asks the model to perform a task without providing any examples. The model relies entirely on its training to understand what you want.
When Zero-Shot Works Well
Zero-shot is effective when:
- The task is unambiguous (sentiment classification, summarization)
- The output format is natural (text, simple labels)
- The domain is general (not industry-specific jargon)
Anatomy of a Good Zero-Shot Prompt
A well-structured prompt includes:
[Role/Context] - Who is the AI?
[Task] - What should it do?
[Input] - What to process?
[Format] - How to structure output?
[Constraints] - What to avoid?
Example:
prompt = """You are a customer service classifier.
Classify the following customer message into exactly one category:
- billing
- technical
- shipping
- general
Customer message: "I was charged twice for my subscription"
Respond with only the category name, nothing else."""
The explicit constraint ("Respond with only the category name") prevents verbose explanations that make parsing difficult.
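As a minimal sketch of running this prompt, assuming the OpenAI Python SDK (the same `client.chat.completions.create` interface used in the structured-output examples later in this guide):
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output suits classification
)

category = response.choices[0].message.content.strip().lower()
print(category)  # expected: "billing"
```
Setting temperature to 0 keeps repeated runs consistent, which matters when the label feeds directly into application logic.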
Common Zero-Shot Mistakes
- Too vague: "Summarize this" → Better: "Summarize in 3 bullet points under 15 words each"
- No format guidance: "Extract the dates" → Better: "Extract dates as ISO format: YYYY-MM-DD"
- Ambiguous scope: "Fix the code" → Better: "Fix the IndexError on line 15 and explain the cause"
Few-Shot Learning
When zero-shot produces inconsistent results, few-shot learning provides examples that demonstrate the desired behavior. The model learns the pattern from your examples and applies it to new inputs.
How Examples Guide the Model
Few-shot works through pattern recognition. The model identifies:
- Input structure: What kind of data am I receiving?
- Output format: How should I structure my response?
- Decision logic: What reasoning connects input to output?
Example: Custom Classification
```python
messages = [
    {"role": "system", "content": "Classify support tickets."},
    {"role": "user", "content": "My payment failed three times"},
    {"role": "assistant", "content": "billing"},
    {"role": "user", "content": "The app crashes on startup"},
    {"role": "assistant", "content": "technical"},
    {"role": "user", "content": "Package hasn't arrived in 2 weeks"},
    {"role": "assistant", "content": "shipping"},
    {"role": "user", "content": "I was charged in wrong currency"},
]
# Model learns pattern → outputs "billing"
```
How Many Examples?
| Task Complexity | Recommended | Why |
|---|---|---|
| Simple classification | 2-3 | Pattern is obvious |
| Custom categories | 3-5 | Need to show boundaries |
| Complex reasoning | 4-6 | Multiple steps to demonstrate |
| Creative/style | 1-2 | Showing tone, not logic |
More examples aren't always better—they consume tokens and can cause the model to overfit to specific patterns rather than generalizing.
Example Selection Matters
Choose examples that:
- Cover edge cases: Include borderline cases that define category boundaries
- Are diverse: Don't repeat similar examples
- Match expected inputs: Use realistic data similar to production
- Demonstrate the hard cases: Easy cases don't teach the model much
Chain-of-Thought Prompting
Chain-of-thought (CoT) prompting encourages the model to reason step-by-step before providing an answer. This dramatically improves performance on tasks requiring:
- Mathematical calculations
- Logical reasoning
- Multi-step analysis
- Complex decision-making
Why Step-by-Step Reasoning Helps
LLMs generate text token by token. Without explicit reasoning:
- The model might jump to a conclusion before considering all factors
- Errors in early reasoning steps compound without correction
- The model can't "backtrack" once tokens are generated
Chain-of-thought forces the model to show its work, which:
- Surfaces reasoning errors that can be caught
- Breaks complex problems into manageable steps
- Grounds the final answer in explicit logic
Zero-Shot CoT: The Magic Phrase
Simply adding "Let's think step by step" to your prompt triggers reasoning:
```python
# Without CoT - often fails on complex math
prompt = """If a store sells 15% of 80 items on Monday and 20% of the
remaining items on Tuesday, how many items are left?"""

# With CoT - much higher accuracy
prompt = """If a store sells 15% of 80 items on Monday and 20% of the
remaining items on Tuesday, how many items are left?

Let's think step by step."""
```
The model will then work through:
- 15% of 80 = 12 items sold Monday
- 80 - 12 = 68 items remaining
- 20% of 68 = 13.6 ≈ 14 items sold Tuesday
- 68 - 14 = 54 items left
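Putting this into practice, here is a sketch using the same OpenAI client as in the zero-shot example; asking for a final "Answer:" line is just one convenient convention (an assumption, not a standard) for extracting the number after the reasoning:
```python
import re

from openai import OpenAI

client = OpenAI()

cot_prompt = """If a store sells 15% of 80 items on Monday and 20% of the
remaining items on Tuesday, how many items are left?

Let's think step by step, then give the final count on a line starting with "Answer:"."""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0,
)

reasoning = response.choices[0].message.content
match = re.search(r"Answer:\s*(\d+)", reasoning)  # pull the final number out of the reasoning text
answer = int(match.group(1)) if match else None
print(answer)  # with the rounding shown above, this is 54
```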
When to Use Chain-of-Thought
| Use CoT | Avoid CoT |
|---|---|
| Math problems | Simple classification |
| Logic puzzles | Direct extraction |
| Multi-step analysis | Summarization |
| Decisions with tradeoffs | Translation |
| Debugging/troubleshooting | Formatting tasks |
CoT adds latency and cost. For simple tasks, it's unnecessary overhead.
Structured Output
For applications that consume LLM outputs, unstructured text is problematic. Structured output techniques ensure responses follow a predictable format.
JSON Mode
Most modern APIs support JSON mode, which constrains the model to output valid JSON:
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": """Extract entities from text. Return JSON:
{
  "people": ["name1", "name2"],
  "organizations": ["org1"],
  "locations": ["loc1"],
  "dates": ["YYYY-MM-DD"]
}""",
        },
        {"role": "user", "content": text},
    ],
    response_format={"type": "json_object"},
)
```
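JSON mode guarantees syntactically valid JSON, but not that your exact keys are present, so it is still worth parsing defensively; a brief sketch continuing from the response above:
```python
import json

data = json.loads(response.choices[0].message.content)

# Treat every key as optional rather than assuming the model followed the template exactly
people = data.get("people", [])
dates = data.get("dates", [])
```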
Schema Enforcement with Pydantic
For type safety and validation, define schemas that the model must follow:
```python
from pydantic import BaseModel
from typing import List

class ExtractedData(BaseModel):
    sentiment: str        # positive, negative, neutral
    confidence: float     # 0.0 to 1.0
    key_topics: List[str]
    summary: str

# Include schema in prompt
schema_json = ExtractedData.model_json_schema()
```
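A sketch of wiring the schema into a request and validating the reply, assuming the same OpenAI client as in the JSON-mode example (the prompt wording and sample input are illustrative):
```python
from openai import OpenAI
from pydantic import ValidationError

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": f"Analyze the user's text. Respond with JSON matching this schema:\n{schema_json}",
        },
        {"role": "user", "content": "The new release is fast, but setup was confusing."},
    ],
    response_format={"type": "json_object"},
)

try:
    result = ExtractedData.model_validate_json(response.choices[0].message.content)
except ValidationError as err:
    # Retry, fall back to defaults, or log here so schema drift is caught before it propagates
    print(err)
```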
This gives you:
- Automatic validation of LLM output
- Type hints in your IDE
- Clear documentation of expected format
Best Practices for Structured Output
- Always provide the schema in the prompt - Don't assume the model knows your format
- Use simple types - Arrays, objects, strings, numbers work reliably
- Include example output - Show exactly what valid JSON looks like
- Validate before using - Even with JSON mode, validate against your schema
Advanced Techniques
Self-Consistency
For high-stakes decisions, generate multiple responses and aggregate:
- Run the same prompt 3-5 times with temperature > 0
- Extract the final answer from each response
- Take the majority vote
This reduces single-run errors and provides a confidence signal (how often did answers agree?).
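A minimal sketch of self-consistency for the arithmetic problem above, assuming the OpenAI client and the `cot_prompt` string from the chain-of-thought sketch; `n=5` requests five samples in one call and a simple majority vote picks the answer:
```python
import re
from collections import Counter

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": cot_prompt}],  # CoT prompt defined earlier
    temperature=0.7,  # sampling diversity is what makes the votes informative
    n=5,              # five independent completions
)

answers = []
for choice in response.choices:
    match = re.search(r"Answer:\s*(\d+)", choice.message.content)
    if match:
        answers.append(int(match.group(1)))

if answers:
    winner, votes = Counter(answers).most_common(1)[0]
    print(winner, votes / len(answers))  # agreement rate doubles as a rough confidence signal
```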
Prompt Chaining
Complex tasks often work better as a sequence of simpler prompts:
Task: Research report on a topic
Chain:
1. Generate research questions → questions
2. Answer each question → raw_findings
3. Identify themes → themes
4. Synthesize into report → final_report
Each step has a focused task, making debugging easier and quality higher.
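A sketch of that chain as a plain sequence of calls; the `ask` helper and the prompts are illustrative, not a fixed recipe:
```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single-turn helper: send one prompt, return the text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

topic = "how retrieval-augmented generation affects chatbot accuracy"

questions = ask(f"List 5 focused research questions about {topic}.")
raw_findings = ask(f"Answer each of these questions in 2-3 sentences:\n{questions}")
themes = ask(f"Identify the 3 main themes in these findings:\n{raw_findings}")
final_report = ask(
    f"Write a short report on {topic}, organized around these themes:\n{themes}\n\nFindings:\n{raw_findings}"
)
```
Because each intermediate result is a plain string, you can log, inspect, or swap out individual steps without touching the rest of the chain.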
Role Prompting
Assigning a specific persona influences the model's vocabulary, depth, and perspective:
```python
personas = {
    "expert": "You are a senior software architect with 20 years experience.",
    "beginner": "You are explaining to someone new to programming.",
    "skeptic": "You are a critical reviewer looking for flaws.",
}
```
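The persona typically goes in the system message; a brief sketch, assuming the client from the earlier examples:
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": personas["expert"]},
        {"role": "user", "content": "Should a three-person startup adopt microservices?"},
    ],
)
```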
Role prompting is especially useful for:
- Technical depth (expert roles)
- Accessibility (teacher roles)
- Quality assurance (reviewer roles)
Common Patterns
The RISEN Framework
A structured approach to prompt construction:
- Role: Who is the AI?
- Instructions: What should it do?
- Situation: What's the context?
- Examples: Demonstrations of desired behavior
- Narrowing: Constraints and format
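One way to turn the framework into a reusable template; the helper below is a hypothetical sketch, and its field names simply mirror the five RISEN parts:
```python
def risen_prompt(role: str, instructions: str, situation: str, examples: str, narrowing: str) -> str:
    """Assemble a prompt following the RISEN structure."""
    return (
        f"{role}\n\n"
        f"{instructions}\n\n"
        f"Context: {situation}\n\n"
        f"Examples:\n{examples}\n\n"
        f"Constraints:\n{narrowing}"
    )

prompt = risen_prompt(
    role="You are a support triage assistant.",
    instructions="Classify the incoming message as billing, technical, shipping, or general.",
    situation="Messages come from customers of a consumer subscription product.",
    examples='"My card was declined" -> billing\n"The app will not open" -> technical',
    narrowing="Respond with only the category name.",
)
```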
Temperature and Sampling
| Use Case | Temperature | Why |
|---|---|---|
| Code generation | 0.0-0.2 | Determinism matters |
| Factual Q&A | 0.0-0.3 | Accuracy over creativity |
| Creative writing | 0.7-1.0 | Variety and novelty |
| Brainstorming | 0.8-1.2 | Maximum divergence |
For reproducible results, use temperature=0 and set a seed parameter if available.
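For example (a sketch, assuming the client from earlier; `seed` is supported by some providers, including recent OpenAI chat models, and should be treated as best-effort):
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    temperature=0,  # determinism matters for code generation
    seed=42,        # best-effort reproducibility where the provider supports it
)
```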
Negative Constraints
Telling the model what NOT to do is often as important as what to do:
constraints = """
- Do NOT include code examples
- Do NOT use bullet points
- Do NOT exceed 100 words
- Do NOT use technical jargon
"""
Negative constraints prevent common failure modes and keep outputs focused.
Testing and Iteration
Treat Prompts Like Code
Good prompt engineering practices:
- Version control - Track prompt changes over time
- Test suites - Define expected outputs for given inputs
- Regression testing - Ensure changes don't break existing cases
- A/B testing - Compare prompt variations on real data
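A sketch of what a regression test for the ticket classifier might look like; `classify` and `CASES` are hypothetical names standing in for your own prompt wrapper and fixture data, and the test runs under pytest:
```python
from openai import OpenAI

client = OpenAI()

CASES = [
    ("I was charged twice for my subscription", "billing"),
    ("The app crashes on startup", "technical"),
    ("My package is two weeks late", "shipping"),
]

def classify(message: str) -> str:
    """Wrap the classification prompt and return the predicted label."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Classify support tickets as billing, technical, shipping, or general. "
                           "Respond with only the category name.",
            },
            {"role": "user", "content": message},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

def test_known_cases():
    for message, expected in CASES:
        assert classify(message) == expected
```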
Evaluation Metrics
How to measure prompt quality:
| Metric | Measures | How to Calculate |
|---|---|---|
| Accuracy | Correctness | % matching expected output |
| Consistency | Reliability | Variance across multiple runs |
| Format compliance | Parseability | % valid JSON/schema matches |
| Latency | Speed | Response time in ms |
| Cost | Efficiency | Tokens used per request |
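Accuracy and consistency are straightforward to compute once you have a small labeled set; a sketch reusing the hypothetical `classify` helper from the test example above:
```python
labeled = [
    ("I was charged twice for my subscription", "billing"),
    ("The login page shows a 500 error", "technical"),
    ("Where is my order?", "shipping"),
]

# Accuracy: fraction of predictions matching the expected label
predictions = [classify(message) for message, _ in labeled]
accuracy = sum(pred == expected for pred, (_, expected) in zip(predictions, labeled)) / len(labeled)

# Consistency: run one input several times and measure agreement with the most common answer
runs = [classify("My invoice is wrong") for _ in range(5)]
consistency = runs.count(max(set(runs), key=runs.count)) / len(runs)

print(f"accuracy={accuracy:.2f}, consistency={consistency:.2f}")
```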
Iterative Improvement
When a prompt isn't working:
- Identify failure mode - What specifically is wrong?
- Add constraints - Explicitly forbid the bad behavior
- Add examples - Show the correct behavior
- Simplify - Maybe the task is too complex for one prompt
- Escalate - Try a more capable model
Conclusion
Effective prompt engineering comes down to clear communication:
- Be specific - Vague prompts produce vague results
- Start simple - Only add complexity when needed
- Show examples - Few-shot learning is surprisingly powerful
- Request structure - JSON mode enables reliable parsing
- Encourage reasoning - Chain-of-thought improves accuracy on hard problems
- Test systematically - Treat prompts as code that needs testing
The best prompts evolve through experimentation. Start with a simple approach, measure what's working, and iterate.
References
- Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". NeurIPS 2022.
- Wang, X., et al. (2023). "Self-Consistency Improves Chain of Thought Reasoning in Language Models". ICLR 2023.
- OpenAI Prompt Engineering Guide - Official best practices.
- Anthropic Prompt Engineering - Claude-specific guidance.
- Learn Prompting - Comprehensive community resource.