Bad bug reports waste more engineering time than the bugs themselves. A report that says "the page is broken" sends a developer on a scavenger hunt — checking logs, trying to reproduce the issue, asking the reporter for details they should have included upfront. Multiply that by dozens of bugs a week and the cost is significant.
Good bug reports are a skill, and AI can help you write them faster without sacrificing quality. These patterns also extend beyond reports into the full issue lifecycle: triage, analysis, reproduction, and postmortems.
These five patterns cover the core tasks: structured bug reports, issue triage, error log analysis, reproduction step generation, and incident postmortems.
Pattern 1: The Structured Bug Report
A complete bug report answers every question a developer would ask before they start working. This pattern ensures nothing is missing.
The Template
You are a QA engineer writing a bug report for a development team.
I will describe a bug I encountered. Write a structured bug report based on my description.
My description of the issue:
[describe what happened — what you were doing, what went wrong, what you expected]
Additional context:
- Product/feature area: [which part of the application]
- Environment: [browser, OS, device, app version — whatever is relevant]
- Frequency: [does this happen every time, intermittently, or only under specific conditions?]
- Error messages: [any error messages, console errors, or status codes you saw]
- Screenshots or recordings: [note if available — the AI cannot see them, but it should reference them in the report]
Write the bug report with:
1. Title: A specific, searchable summary (not "bug in page" — "Checkout fails with 500 error when applying discount code to cart over $200")
2. Severity: Classify as critical / high / medium / low based on user impact
3. Steps to reproduce: Numbered steps that someone unfamiliar with the issue can follow
4. Expected behavior: What should have happened
5. Actual behavior: What actually happened
6. Environment details: Browser, OS, version, account type, etc.
7. Additional notes: Any patterns, workarounds, or related issues
Constraints:
- Steps to reproduce must be specific enough that a developer can follow them without asking questions
- If my description is missing key information, include it in the report with a [NEEDS INFO] tag so I can fill it in
- Do not speculate about the root cause unless the evidence strongly suggests it
- Use neutral language — describe the behavior, do not assign blame
- Keep the total report concise — under 200 words excluding steps
Why It Works
The "[NEEDS INFO]" tag for missing details is the key feature. Instead of the AI inventing plausible-sounding details (which leads to wasted debugging time), it explicitly flags gaps. The searchable title requirement makes the bug findable later — a detail that saves significant time in teams with large bug backlogs.
Example Output
Given the description "I was trying to buy something and used a discount code but it just showed an error", the template produces:
Title: Checkout returns 500 error when applying discount code SAVE20 to cart totaling over $200
>
Severity: High — Blocks purchase completion for users with discount codes.
>
Steps to reproduce:
1. Log in with a standard user account
2. Add items to cart totaling over $200 [NEEDS INFO: does the exact amount matter? Test with amounts near $200 to find the threshold]
3. Proceed to checkout
4. Enter discount code "SAVE20" in the promo code field
5. Click "Apply"
6. Observe the error
>
Expected behavior: Discount code is applied and the total reflects the discount.
>
Actual behavior: Page displays a generic error message. [NEEDS INFO: exact error message text? Check browser console for the HTTP status code — reporter mentioned 500 but this needs confirmation]
>
Environment: [NEEDS INFO: browser, OS, device]
>
Additional notes: Reporter encountered this once. Reproduction consistency is unknown — test whether the issue occurs with all discount codes or only SAVE20, and whether cart total is a factor.
Pattern 2: The Issue Triage Assistant
When you have a backlog of bugs and feature requests, triage — deciding what to fix first — is critical. This pattern helps you prioritize systematically.
The Template
You are a technical lead triaging a list of issues for an engineering team.
Our context:
- Product: [what the product does]
- Current priorities: [what the team is focused on this sprint/quarter]
- Team capacity: [how many developers, any constraints]
- Upcoming releases or deadlines: [anything time-sensitive]
Issues to triage:
[paste a list of issues — titles and brief descriptions are enough]
For each issue, assess:
1. Severity: Critical (system down, data loss), High (feature broken, major UX issue), Medium (feature degraded, workaround exists), Low (cosmetic, minor inconvenience)
2. Impact scope: How many users or use cases are affected
3. Urgency: Must fix now, should fix this sprint, can wait for next sprint, backlog
4. Effort estimate: Small (< 1 day), Medium (1-3 days), Large (3+ days) — rough sizing only
5. Recommendation: Fix now, schedule this sprint, schedule next sprint, backlog, or needs more information
After individual assessments, provide:
- A prioritized list (top 5 items to address first)
- Any items that seem related and might share a root cause
- Items that need more information before they can be properly triaged
Constraints:
- Prioritize based on user impact and business risk, not just technical interest
- If two items are equally important, prefer the one with lower effort (quick wins first)
- Flag items where severity is unclear due to incomplete information
- Do not assign to specific developers — that is the team lead's decision
- Tone: pragmatic and decisive
Why It Works
The multi-factor assessment (severity, scope, urgency, effort) prevents single-dimensional prioritization. Without this structure, teams tend to prioritize either by severity alone (ignoring effort) or by effort alone (doing easy things first regardless of impact). The "related issues" identification can reveal underlying problems that fixing once would close multiple tickets.
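The tiebreak rule in the constraints ("equally important, prefer lower effort") can be encoded as a simple scoring function. This is a sketch of one possible weighting, not a standard formula; the weights are illustrative and should be tuned to your own team's judgment:

```python
# Illustrative weights: severity dominates, effort only breaks ties.
SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}
EFFORT_WEIGHT = {"small": 1, "medium": 2, "large": 3}

def triage_score(severity: str, effort: str) -> float:
    """Higher score means fix sooner.

    The 1/effort term is at most the gap between severity levels,
    so a lower-severity issue never outranks a higher-severity one;
    it only reorders issues of comparable severity (quick wins first).
    """
    return SEVERITY_WEIGHT[severity] + 1.0 / EFFORT_WEIGHT[effort]
```

Sorting a backlog by this score reproduces the "Issue 2 before Issue 1" ordering in the example below: both are high severity, but the CSV export fix is small effort.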
Example Output
For triaging eight issues on a SaaS dashboard product:
Issue 1: "Dashboard charts show incorrect data after timezone change"
Severity: High — data accuracy issue affects trust in the product
Impact: All users who change their timezone or are in non-UTC zones
Urgency: Should fix this sprint — this is a data integrity issue
Effort: Medium — likely a timezone conversion bug in the chart data pipeline
Recommendation: Schedule this sprint
>
Issue 2: "Export to CSV button returns empty file"
Severity: High — core feature is completely broken
Impact: Any user who exports data
Urgency: Fix now — this is a regression, it worked last release
Effort: Small — likely a recent code change broke the export endpoint
Recommendation: Fix now (top priority)
>
Issue 3: "Logo on login page is slightly misaligned on Safari"
Severity: Low — cosmetic only
Impact: Safari users on the login page
Urgency: Backlog
Effort: Small
Recommendation: Backlog — fix opportunistically if someone is already touching the login page
>
[Assessments for remaining issues would follow]
>
Prioritized top 5:
1. Issue 2 — CSV export broken (high impact, small effort, regression)
2. Issue 1 — Timezone chart data (high impact, data integrity)
3. Issue 5 — API rate limit not enforced (security risk)
4. Issue 7 — Email notifications delayed by 2+ hours (user-facing SLA issue)
5. Issue 4 — Search results pagination broken on page 3+ (medium impact, small effort)
>
Potentially related: Issues 1 and 6 both involve timezone handling — investigate whether they share a root cause in the date utility functions.
Pattern 3: The Error Log Analyzer
When you are staring at a stack trace or error log and need to make sense of it quickly, this pattern helps you parse the information and identify the likely issue.
The Template
You are a senior developer analyzing an error log.
Context:
- Application type: [web app, API server, mobile app, CLI tool, etc.]
- Tech stack: [language, framework, database, relevant infrastructure]
- When this started: [when the error first appeared — after a deploy, at a specific time, randomly]
- Frequency: [constant, intermittent, increasing]
Error log:
[paste the error log, stack trace, or relevant log output]
Analysis requested:
1. Error summary: What is happening in plain language (1-2 sentences)
2. Root cause hypothesis: The most likely cause based on the error and context, with reasoning
3. Key lines: Which specific lines in the log are most informative and why
4. Suggested investigation steps: 3-5 specific things to check, in order of likelihood
5. Quick fix vs. proper fix: If there is an obvious quick fix, describe it — and then describe what a proper long-term fix would look like
Constraints:
- Do not assume the error message is accurate — sometimes error messages are misleading or generic
- If the stack trace points to a library or framework, mention whether this is a known issue pattern for that library
- If the error could have multiple causes, list the top 2-3 with relative likelihood
- Be specific in investigation steps — "check the database" is not helpful, "check if the connection pool is exhausted by running [command]" is helpful
- Tone: calm and methodical — this might be a production incident
Why It Works
The "do not assume the error message is accurate" instruction is crucial. Experienced developers know that error messages frequently mislead — a "connection timeout" might be a DNS issue, and a "null pointer" might be a permissions problem. The quick fix vs. proper fix distinction acknowledges that production incidents need immediate relief and long-term solutions, and those are often different things.
Example Output
For a Node.js application throwing "ECONNREFUSED" errors after a deployment:
Error summary: The application is unable to connect to a service (likely the database or an external API) on the expected host and port. Connections are being actively refused, meaning the target service is either not running, not listening on that port, or not reachable from the application's network.
>
Root cause hypothesis (most likely): The deployment changed the environment configuration, and the connection string or host variable is pointing to an incorrect or unreachable address. Likelihood: 60%.
>
Second possibility: The target service (database) is running but has reached its connection limit and is refusing new connections. Likelihood: 25%.
>
Third possibility: A network/firewall rule changed alongside the deployment, blocking outbound connections. Likelihood: 15%.
>
Key lines:
- Line 14: Error: connect ECONNREFUSED 10.0.3.42:5432 — the IP and port tell you exactly where it is trying to connect. Verify this matches your current database host.
- Line 22: at TCPConnectWrap.afterConnect — this confirms it is a TCP-level connection failure, not an authentication or query error. The connection never establishes.
>
Investigation steps:
1. Check the DATABASE_URL environment variable in the deployed environment — compare it to the previous working version
2. From the application host, run nc -zv 10.0.3.42 5432 to verify network reachability
3. Check if the database service is running: pg_isready -h 10.0.3.42 -p 5432
4. Check connection pool metrics — if using a pool, verify active connections have not hit the max
5. Review the deployment diff for any changes to environment configuration files
>
Quick fix: If the environment variable changed, revert it to the known-working value and redeploy. Proper fix: Add a startup health check that validates database connectivity before the application accepts traffic, and add monitoring alerts for connection failures.
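The nc check in investigation step 2 can also be scripted, which is handy when you want the same probe in a health check later. A minimal Python sketch (the host and port come from the example log, not real infrastructure):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection, roughly what nc -zv does.

    Returns False on a refused connection (the ECONNREFUSED case),
    a timeout, or an unresolvable host. It says nothing about
    authentication or application health, only network reachability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False from the application host but True from your laptop, suspect a network or firewall rule rather than the service itself.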
Pattern 4: The Reproduction Step Generator
The hardest part of many bugs is reproducing them. This pattern takes vague reports and generates systematic reproduction strategies.
The Template
You are a QA engineer creating a reproduction plan for a reported bug.
Bug report:
[paste the original bug report or description]
Product context:
- Application type: [web app, mobile app, API, etc.]
- Known user states that affect behavior: [logged in vs. guest, free vs. paid, admin vs. regular, etc.]
- Known environmental factors: [browser differences, mobile vs. desktop, region, etc.]
- Recent changes: [any recent releases, config changes, or migrations]
Create a reproduction plan with:
1. Primary reproduction attempt: Step-by-step instructions based on the most literal interpretation of the report
2. Variable isolation: A list of factors to test individually to narrow down the trigger:
- User state variations to try
- Environmental variations to try
- Data variations to try (e.g., different input values, edge cases)
3. Edge case checklist: 5-8 specific edge cases that often cause bugs like this one (e.g., empty inputs, special characters, concurrent actions, boundary values)
4. If not reproducible: A structured set of questions to ask the reporter that would help narrow down the conditions
Constraints:
- Start with the simplest reproduction attempt and add complexity only if needed
- Each variation should change exactly one factor so you can isolate the cause
- Include negative tests — conditions where the bug should NOT occur — to confirm the trigger
- Do not assume the reporter's description is complete — common steps like "log in" are often omitted
- Keep the plan executable in under 30 minutes
Why It Works
The "change one factor at a time" approach is the scientific method applied to debugging. Without this structure, QA testers often change multiple things at once and can never pinpoint which factor triggers the bug. The negative tests (where the bug should not occur) confirm the boundary conditions and prevent false positives.
Example Output
For a reported bug: "Image upload fails sometimes":
Primary reproduction attempt:
1. Log in with a standard user account
2. Navigate to the profile settings page
3. Click "Change Avatar" (or wherever image upload is available)
4. Select a standard JPEG image under 2 MB
5. Click "Upload" and observe the result
>
If this succeeds, the bug is conditional. Proceed to variable isolation.
>
Variable isolation:
>
File type variations (change one at a time):
- Test with: JPEG, PNG, GIF, WebP, SVG, BMP
- Test with: HEIC/HEIF (common on iOS — often causes upload issues)
>
File size variations:
- Under 1 MB, exactly at the limit (if known), slightly over the limit, significantly over (10 MB+)
>
File name variations:
- Standard name: photo.jpg
- Spaces in name: my photo.jpg
- Special characters: photo(1).jpg, photo#2.jpg
- Very long filename: 200+ characters
- Unicode characters in filename
>
User state:
- Free account vs. paid account (different upload limits?)
- New account vs. established account
- Profile already has an image vs. first-time upload
>
Environment:
- Chrome, Firefox, Safari, Edge
- Desktop vs. mobile browser
- Fast connection vs. throttled (simulate slow 3G in dev tools)
>
Edge case checklist:
- [ ] Upload a 0-byte file (empty file with a .jpg extension)
- [ ] Upload a file with the wrong extension (rename a .txt to .jpg)
- [ ] Upload while on a slow or interrupted connection
- [ ] Upload two images in rapid succession (double-click or duplicate submission)
- [ ] Upload immediately after session refresh
- [ ] Upload the same file that was already uploaded (duplicate)
- [ ] Upload a very large image (50 MB+) to test timeout handling
- [ ] Upload from a mobile device using the camera option vs. file picker
>
If not reproducible, ask the reporter:
1. What type of file were you uploading? (File extension and approximate size)
2. What device and browser were you using?
3. Did you see an error message, or did the upload appear to do nothing?
4. Were you on WiFi or mobile data?
5. Has upload worked for you before, or is this the first time you tried?
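Several items on the edge case checklist (the 0-byte file, the wrong real type, the long and unicode filenames) can be generated up front so the whole batch is ready before a test session. A sketch with illustrative names and contents only:

```python
import pathlib

JPEG_MAGIC = b"\xff\xd8\xff"  # the first bytes of a real JPEG file

def make_upload_fixtures(dest: str) -> list[pathlib.Path]:
    """Create edge-case upload test files in dest and return their paths."""
    d = pathlib.Path(dest)
    d.mkdir(parents=True, exist_ok=True)
    fixtures = [
        (d / "empty.jpg", b""),                          # 0-byte file
        (d / "not_an_image.jpg", b"plain text"),         # wrong real type
        (d / ("a" * 200 + ".jpg"), JPEG_MAGIC),          # 200+ char filename
        (d / "f\u00f3to_\u6d4b\u8bd5.jpg", JPEG_MAGIC),  # unicode filename
    ]
    for path, content in fixtures:
        path.write_bytes(content)
    return [path for path, _ in fixtures]
```

These files are stubs, not valid images; they exercise the server's validation paths, not its image processing.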
Pattern 5: The Incident Postmortem
After a production incident, a postmortem captures what happened, why, and how to prevent it from happening again. This pattern produces a structured document that is honest without being punitive.
The Template
You are a site reliability engineer writing an incident postmortem.
Incident details:
- What happened: [describe the incident — what broke, what users experienced]
- When: [timeline — when it started, when it was detected, when it was resolved]
- Duration: [total time of impact]
- Severity: [how bad it was — number of users affected, revenue impact, data loss, etc.]
- Root cause: [what caused the incident — if known]
- How it was detected: [monitoring alert, customer report, internal discovery]
- How it was resolved: [what actions fixed it]
Write a postmortem with:
1. Incident summary: 3-4 sentences covering what happened, the impact, and the resolution (this is what leadership will read)
2. Timeline: A minute-by-minute or hour-by-hour log of key events from detection to resolution
3. Root cause analysis: What caused the incident, what conditions allowed it to happen, and whether it could have been prevented with existing safeguards
4. Contributing factors: Additional factors that made the incident worse or delayed resolution (not root causes, but amplifiers)
5. What went well: Things that worked during the incident response
6. What went poorly: Things that slowed down detection or resolution
7. Action items: Specific, assignable tasks to prevent recurrence — each with a priority and a suggested owner (role, not name)
8. Lessons learned: 2-3 takeaways for the broader team
Constraints:
- Blameless tone — focus on systems and processes, not individual mistakes
- Action items must be specific and achievable — "improve monitoring" is not an action item, "add a latency alert on the checkout endpoint with a 5-second threshold" is
- Do not minimize the incident or its impact — honest assessment builds trust
- Include at least one action item that addresses detection speed, not just prevention
- Keep the full postmortem under 800 words
Why It Works
The "blameless" constraint is foundational to effective postmortems — teams that blame individuals create cultures where people hide mistakes instead of learning from them. The "what went well" section is often skipped but matters: it identifies existing safeguards worth preserving and reinforces good incident response behavior.
Example Output
For a 45-minute checkout outage caused by a database migration:
## Incident Postmortem: Checkout Outage — April 10, 2026
>
### Summary
On April 10 at 14:22 UTC, a database migration deployed to production locked the orders table, causing all checkout requests to time out. The outage lasted 45 minutes and affected all users attempting to complete purchases. Approximately 1,200 checkout attempts failed during this period. The issue was resolved by rolling back the migration at 15:07 UTC.
>
### Timeline
- 14:15 — Migration deployed to production as part of routine release
- 14:22 — First checkout timeout errors appear in logs
- 14:28 — Monitoring alert fires for elevated 500 error rate on /checkout endpoint
- 14:31 — On-call engineer acknowledges alert, begins investigation
- 14:38 — Database lock identified as the cause via a pg_stat_activity query
- 14:42 — Team decides to roll back the migration rather than wait for it to complete
- 14:55 — Rollback script executed
- 15:07 — All checkout requests processing normally, incident closed
>
### Root Cause
The migration added an index to the orders table using CREATE INDEX without the CONCURRENTLY option. On a table with 4.2 million rows, this operation acquired a write lock that blocked all INSERT operations — including checkout transactions. The migration ran without issue in staging, where the orders table has only 50,000 rows and the index build completed in under a second.
>
### Contributing Factors
- Staging database does not mirror production data volume — the migration appeared safe based on staging performance
- The migration was part of a batch release and was not flagged for individual review
- No pre-deployment check exists for potentially locking database operations
>
### What Went Well
- The monitoring alert fired within 6 minutes of impact starting
- The team correctly identified the root cause within 10 minutes
- The rollback decision was made quickly — no extended debate about waiting it out
>
### What Went Poorly
- 6-minute detection delay — users experienced errors before we knew about it
- No migration review process caught the missing CONCURRENTLY keyword
- The rollback took 13 minutes from decision (14:42) to execution (14:55) because the rollback script had to be written on the spot
>
### Action Items
1. Add a CI check for locking migrations — Lint all SQL migrations for operations that acquire table-level locks without CONCURRENTLY. Owner: Backend lead. Priority: High.
2. Seed staging with production-scale data — Mirror production volume in staging so performance issues surface before production. Owner: Platform team. Priority: High.
3. Pre-write rollback scripts — Every migration must include a tested rollback script before it is approved. Owner: Backend team (policy change). Priority: Medium.
4. Add checkout-specific latency alert — Current alerts trigger on error rate only. Add a p95 latency alert at 3 seconds on the checkout endpoint. Owner: SRE. Priority: High.
>
### Lessons Learned
- Migrations that are fast on small datasets can be catastrophic on production-scale data. Always test with realistic volumes.
- Having a pre-written rollback script would have cut resolution time by half.
- The team's incident response was strong — the gap was in prevention, not response.
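Action item 1 from this example is small enough to sketch. A naive line-based lint (a production check should parse the SQL properly; Postgres locking semantics are assumed here) that a CI job could run over new migration files:

```python
import re

# Statements that take a long table-level lock on Postgres when the
# table is large. Extend this list for your own schema operations.
LOCKING_PATTERNS = [
    # CREATE [UNIQUE] INDEX without CONCURRENTLY blocks writes
    re.compile(r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)", re.IGNORECASE),
]

def find_locking_statements(sql: str) -> list[str]:
    """Return the migration lines that match a known locking pattern."""
    return [
        line.strip()
        for line in sql.splitlines()
        if any(p.search(line) for p in LOCKING_PATTERNS)
    ]
```

Failing the build on a non-empty result would have flagged this migration before it shipped.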
Quick Tips for Bug Report and Triage Prompts
- Paste actual error messages and logs. The AI cannot analyze what it cannot see. Paste the real output, not a paraphrase.
- Include the context around changes. Most bugs follow a change — a deployment, a configuration update, a data migration. Always mention what changed recently.
- Specify your tech stack. Error analysis is dramatically better when the AI knows the language, framework, and infrastructure you are using.
- Use these for drafts, not final versions. AI-generated bug reports and postmortems should be reviewed and refined by the people who have firsthand knowledge of the issue.
- Build a library of templates. Save your best bug report and postmortem prompts in your project wiki so the whole team uses consistent formats.
When to Use Templates vs. Freeform Prompts
Use these templates for recurring quality processes — writing up bugs during testing cycles, triaging backlogs, or running postmortems after incidents. The structure ensures nothing is missed and makes outputs comparable across the team.
Go freeform when you are debugging in real time and need the AI to help you think through a specific, unique problem. For those sessions, use the CRAFT framework from our prompt writing guide to provide enough context for a useful response.
For instant prompt generation without building templates manually, SurePrompts' AI Prompt Generator can structure your engineering workflow requests automatically.