Best Practices
Prompt Engineering Guide
Master the art and science of crafting effective prompts — from zero-shot to advanced reasoning patterns
Published: 2026-04-05 · Last updated: 2026-04-10
Prompt Engineering Guide
Prompt engineering is the practice of designing, optimizing, and systematizing input prompts to get reliable, high-quality results from Large Language Models. It combines an understanding of model behavior with structured communication patterns.
This guide covers everything from foundational principles to advanced reasoning techniques used by AI engineers in production systems.
Core Principles
1. Be Clear and Specific
Vague prompts produce vague outputs. The more precise your instructions, the more useful the response.
| ❌ Vague | ✅ Specific |
|---|---|
Tell me about AI | Explain the difference between narrow AI and general AI in 3 bullet points, with one real-world example each |
Write code | Write a Python function that takes a list of dicts and returns those sorted by a given key, with type hints and error handling |
Summarize this | Summarize the following article in 5 bullet points, focusing on business impact and technical implications |
2. Provide Context and Role
You are a senior ML engineer with 10 years of experience building production NLP systems.
Explain the concept of attention mechanisms to a software engineer who knows Python
but has never studied deep learning. Use analogies from web development where possible.
3. Specify Output Format
Analyze the following code for security vulnerabilities.
Format your response as:
1. **Critical Issues** (list with severity)
2. **Recommendations** (prioritized list)
3. **Fixed Code** (complete rewritten version)
4. **Explanation** (2-3 sentences per fix)
Code:
{insert_code_here}
4. Use Delimiters and Structure
<task>Classify the sentiment of the following review</task>
<review>
The product arrived late and the packaging was damaged.
However, the item itself works perfectly and exceeds expectations.
Mixed feelings overall.
</review>
<output_format>
Sentiment: [Positive/Negative/Mixed]
Confidence: [0-100%]
Key phrases: [list]
Reasoning: [2-3 sentences]
</output_format>
Fundamental Techniques
Zero-Shot Prompting
Ask the model to perform a task without examples.
Translate the following text to French: "The future of AI is collaborative."
Best for: Common tasks the model has seen during training (translation, summarization, classification).
Few-Shot Prompting
Provide examples to establish the expected pattern.
Convert these movie titles to emoji representations:
Input: "The Lion King"
Output: 🦁👑
Input: "Finding Nemo"
Output: 🔍🐠
Input: "The Matrix"
Output: 💊🕶️
Input: "Interstellar"
Output:
Best for: Tasks requiring specific formatting, style transfer, or non-obvious mappings.
Chain of Thought (CoT)
Encourage step-by-step reasoning before answering.
A factory produces 500 widgets per day. 12% are defective.
Of the non-defective widgets, 80% are shipped immediately
and the rest are stored. How many widgets are stored per week (5 days)?
Let's solve this step by step:
Why it works: Forces the model to "show its work," reducing calculation errors and logical mistakes. Research shows CoT can improve accuracy on math and reasoning tasks by 10-40%.
Role Prompting
Assign a specific persona or expertise level.
You are a cybersecurity expert conducting a penetration test review.
Evaluate this authentication implementation for:
1. Common attack vectors (OWASP Top 10)
2. Cryptographic weaknesses
3. Session management flaws
4. Rate limiting adequacy
Be specific about exploit scenarios, not just general advice.
Advanced Techniques
Tree of Thoughts (ToT)
Explore multiple reasoning paths and select the best one.
I need to design a database schema for a multi-tenant SaaS application.
Think about this from THREE different perspectives:
1. **Performance-first**: Optimize for read-heavy workloads
2. **Security-first**: Maximum tenant isolation
3. **Cost-first**: Minimize storage and compute costs
For each perspective, outline the schema design, then recommend a balanced approach that considers all three.
ReAct (Reasoning + Acting)
Combine reasoning with tool use or external actions.
You have access to a code execution environment.
Task: Find the bug in this Python code.
Process:
Thought: I should first understand what the code does.
Action: Run the code with sample input.
Observation: [output]
Thought: The output reveals X issue. Let me check Y.
Action: [next action]
...
Final Answer: [explanation + fix]
Constitutional AI / Self-Critique
Ask the model to review and improve its own output.
First, write a product description for a new AI-powered notebook.
Then, review your description and identify:
- Any exaggerated claims
- Missing key information a buyer would want
- Sentences that could be clearer
Finally, write an improved version addressing all issues.
Meta-Prompting
Use the model to design better prompts for itself.
I want to get the best possible code review from an LLM.
What prompt should I use? Consider:
- What context the model needs
- What output format is most useful
- How to prevent common LLM mistakes in code review
Design the optimal prompt and explain your reasoning for each design choice.
Prompt Templates
Code Generation Template
# Role
You are an expert {language} developer specializing in {specialty}.
# Task
Create a {function/class/module} that {description}.
# Requirements
- {requirement_1}
- {requirement_2}
- {requirement_3}
- Include comprehensive error handling
- Add type hints and docstrings
- Write unit tests
# Constraints
- Do NOT use {forbidden_library}
- Must be compatible with {version}
- Performance target: {time_complexity}
# Output Format
1. Implementation code
2. Usage example
3. Time/space complexity analysis
4. Edge cases handled
Analysis & Research Template
# Context
Analyze the following {type}:
{content_or_topic}
# Task
{specific_analysis_request}
# Evaluation Criteria
- Accuracy and factual correctness
- Logical consistency
- Completeness of coverage
- Practical actionability
# Output Format
- **Executive Summary** (3-5 sentences)
- **Key Findings** (bullet points, prioritized)
- **Detailed Analysis** (organized by theme)
- **Recommendations** (numbered, with rationale)
- **Uncertainties** (what's unclear or needs more data)
Data Extraction Template
Extract the following information from the text below:
Fields to extract:
- Company name
- Product name
- Key features (list)
- Pricing information
- Release date
- Target audience
Text:
{text}
Return the result as a JSON object. Use null for fields not found in the text.
Controlling Output Behavior
Temperature and Sampling
| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0–0.2 | Deterministic, always picks most likely | Code generation, data extraction, factual Q&A |
| 0.3–0.5 | Focused with slight variation | Technical writing, documentation, analysis |
| 0.6–0.8 | Creative but grounded | Brainstorming, creative writing, ideation |
| 0.9–1.0 | Highly creative, unpredictable | Poetry, fiction, exploratory thinking |
Additional Controls
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
temperature=0.3, # Creativity level
top_p=0.9, # Nucleus sampling
max_tokens=1000, # Output length limit
frequency_penalty=0.0, # Reduce repetition (-2.0 to 2.0)
presence_penalty=0.0, # Encourage new topics (-2.0 to 2.0)
stop=["\n\n\n"], # Custom stop sequences
response_format={"type": "json_object"} # Structured output
)
Anti-Patterns to Avoid
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Vague instructions | Unpredictable, generic outputs | Be specific about task, format, scope |
| Contradictory requirements | Model picks one or fails both | Prioritize requirements clearly |
| Assuming unstated knowledge | Model may not know your context | Provide necessary background info |
| No output format spec | Inconsistent structure | Define exact format expected |
| Overly long prompts | Key instructions get lost | Use structure, headings, delimiters |
| Ignoring model limitations | Hallucinations, wrong answers | Ask for confidence levels; verify facts |
| Single massive prompt | Quality degrades with complexity | Break into steps; use chaining |
Testing & Iteration Framework
- Baseline: Start with the simplest prompt that could work
- Add constraints: Narrow scope, specify format, add examples
- Test edge cases: Try inputs that might break the prompt
- Measure outputs: Define success criteria (accuracy, format compliance, usefulness)
- A/B test: Run two prompt variants on the same inputs
- Iterate: Refine based on systematic observation, not gut feel
Evaluation Checklist
- Does the output consistently match the requested format?
- Are factual claims accurate (spot-check with external sources)?
- Does the output handle edge cases gracefully?
- Is the output length appropriate for the use case?
- Would a human expert consider this output useful?
Tools & Ecosystem
| Tool | Purpose | Link |
|---|---|---|
| Prompt Libraries | Curated collections of tested prompts | OpenAI Cookbook, Anthropic Examples |
| LMQL / Guidance | Constrained generation and prompt programming | github.com/eth-sri/lmql |
| LangChain Prompts | Prompt templates + versioning | python.langchain.com |
| Promptfoo | Prompt evaluation and benchmarking | promptfoo.dev |
| DSPy | Programmatic prompt optimization | github.com/stanfordnlp/dspy |
| LangSmith | Prompt tracing and evaluation | smith.langchain.com |
Related Documentation
- RAG Systems — Ground prompts with retrieved context
- Function Calling — Let models call external APIs
- Structured Outputs — Enforce exact output schemas
- Evaluation Metrics — Measure prompt effectiveness
Related docs
Distributed Training at Scale
Engineering systems for training 100B+ parameter models — cluster design, networking, fault tolerance, and the operational challenges of frontier model training
Developer Productivity Evaluator Agent Implementation Guide
Architecture, workflow design, metrics, and rollout guidance for a developer productivity evaluator agent in production.
Developer Productivity Executor Agent Implementation Guide
Architecture, workflow design, metrics, and rollout guidance for a developer productivity executor agent in production.
Related agents
Developer Productivity Evaluator Agent
Developer Productivity agent blueprint focused on score outputs against explicit rubrics so teams can compare variants, regressions, and rollout quality over time for engineering teams want reliable help with issue triage, runbook guidance, and change review without obscuring system ownership.
Developer Productivity Executor Agent
Developer Productivity agent blueprint focused on take well-bounded actions across tools and systems once a plan, permission model, and fallback path are already defined for engineering teams want reliable help with issue triage, runbook guidance, and change review without obscuring system ownership.
Developer Productivity Memory Agent
Developer Productivity agent blueprint focused on maintain durable task state, summarize interaction history, and preserve only the context worth carrying forward for engineering teams want reliable help with issue triage, runbook guidance, and change review without obscuring system ownership.