

In my experience, user stories are the working unit of agile teams, and bad stories cause bad sprints. Writing a good story well takes time, which is why I see teams chronically underinvest in them. AI changes the math for me: drafting a high-quality user story now takes 5-10 minutes instead of 30-45, and the team can iterate quickly.
In this guide I share the prompts, templates, and review patterns that I've found make AI user story generation reliable, plus the failure modes I've watched catch teams off guard. Use it as the working manual I wish I'd had earlier.
A great user story has:
Skip any of these and the story underperforms.
A working workflow:
End-to-end: 10-15 minutes per story instead of 30-45.
Prompt 1: Generate a story from a problem statement
“You are a senior product owner. Below is a problem statement. Generate a user story in the format: ‘As a [user] I want [goal] so that [reason]’. Include 6-10 acceptance criteria covering happy path, edge cases, error states, and DoD. Output as markdown.”
Prompt 2: Refine an existing story
“Take this story and improve it: [paste]. Score it against INVEST. For any failed criterion, propose a specific fix.”
Prompt 3: Generate acceptance criteria only
“Take this story: [paste]. Generate 8-12 acceptance criteria. Use Given/When/Then format. Cover happy path, edge cases, error states, observability, and DoD.”
Prompt 4: Split a story
“This story is too big: [paste]. Suggest 3 ways to split it that each deliver independent customer value. For each split, name the new stories and what gets deferred.”
Prompt 5: Convert a feature description to stories
“Below is a feature description from a roadmap. Decompose it into 5-10 user stories. For each: title, narrative, AC. Order by suggested priority.”
INVEST = Independent, Negotiable, Valuable, Estimable, Small, Testable.
A useful audit prompt:
“Run INVEST checks on this story: [paste]. For each criterion, state pass or fail and why. Propose specific changes for any failed criterion.”
The output is a six-line audit, one line per INVEST criterion. Address the failures and re-run the check.
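If you run the audit prompt from a script rather than a chat window, a small parser can pull out the failed criteria for the next iteration. The sketch below is a minimal Python example. It assumes the model was asked to answer one line per criterion in the form `Criterion: pass - reason`; that output format and the function names are hypothetical, not something the prompt above guarantees.

```python
from __future__ import annotations

# The six INVEST criteria, in the order the audit is expected to report them.
INVEST = ["Independent", "Negotiable", "Valuable", "Estimable", "Small", "Testable"]


def parse_invest_audit(audit_text: str) -> dict[str, tuple[bool, str]]:
    """Parse lines of the ASSUMED form 'Criterion: pass - reason' or 'Criterion: fail - reason'."""
    results: dict[str, tuple[bool, str]] = {}
    for line in audit_text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and any preamble the model adds
        name, rest = line.split(":", 1)
        name = name.strip().lstrip("-* ")  # tolerate simple bullet markers
        if name not in INVEST:
            continue
        verdict, _, reason = rest.strip().partition(" - ")
        results[name] = (verdict.strip().lower() == "pass", reason.strip())
    missing = [c for c in INVEST if c not in results]
    if missing:
        raise ValueError(f"audit is missing criteria: {missing}")
    return results


def failed_criteria(audit_text: str) -> list[str]:
    """Return the criteria to address before re-running the audit, in INVEST order."""
    results = parse_invest_audit(audit_text)
    return [c for c in INVEST if not results[c][0]]


audit = """Independent: pass - no blocking dependencies
Negotiable: pass - leaves room to discuss the design
Valuable: pass - clear benefit to the shopper
Estimable: fail - payment provider not yet chosen
Small: fail - covers three separate workflows
Testable: pass - criteria are objective"""
print(failed_criteria(audit))  # ['Estimable', 'Small']
```

Raising on a missing criterion is a deliberate choice: a partial audit is easy to mistake for a clean one.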
AI-generated AC tends toward the generic. Improve it with these techniques:
A working AC generation prompt:
“Generate 8-12 acceptance criteria for this story: [paste]. Use Given/When/Then. Each AC must be objectively testable, reference a specific user behaviour or system state, and avoid vague language. Cover happy path, 3 edge cases, 2 error states, observability hooks, and DoD.”
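A quick lint pass can catch the most obvious misses before human review. This Python sketch checks that each criterion follows Given/When/Then in order, flags words from a small vague-language list, and checks the 8-12 count the prompt asks for. The word list and the `lint_acceptance_criteria` name are assumptions for illustration, and a regex cannot judge whether a criterion is truly testable.

```python
import re

# Hypothetical starter list of vague terms; tune it to your team's vocabulary.
VAGUE_TERMS = {"fast", "quickly", "easy", "intuitive", "user-friendly", "seamless", "appropriate", "etc"}

# "Given ... When ... Then ..." in that order, optionally prefixed with a "- " bullet.
GIVEN_WHEN_THEN = re.compile(r"^\s*(?:-\s*)?Given\b.*\bWhen\b.*\bThen\b", re.IGNORECASE)


def lint_acceptance_criteria(criteria: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious was found."""
    problems = []
    for i, ac in enumerate(criteria, start=1):
        if not GIVEN_WHEN_THEN.match(ac):
            problems.append(f"AC {i}: not in Given/When/Then form")
        # Lowercase word tokens (hyphenated words kept whole) compared to the vague list.
        words = set(re.findall(r"[a-z][a-z-]*", ac.lower()))
        vague = sorted(words & VAGUE_TERMS)
        if vague:
            problems.append(f"AC {i}: vague wording {vague}")
    if not 8 <= len(criteria) <= 12:
        problems.append(f"expected 8-12 criteria, got {len(criteria)}")
    return problems


good = "Given a signed-in shopper, When they tap Pay, Then the order is confirmed within 2 seconds"
bad = "The page should load fast"
for problem in lint_acceptance_criteria([good, bad]):
    print(problem)
```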
Common splitting patterns AI applies well:
A useful prompt:
“This story is too big: [paste]. Apply 3 different splitting strategies. For each, output the resulting stories. Recommend the best split for delivering customer value fastest.”
A higher-leverage workflow: generate stories directly from user research.
“Below are 8 user interview transcripts. Identify pain points worth solving. For the top 5 pain points, generate user stories with AC. Include a count of how many users mentioned each pain.”
This converts research into an actionable backlog in minutes, and it pairs especially well with continuous discovery practices.
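Model-reported mention counts are worth spot-checking before they drive prioritisation. A minimal Python cross-check, assuming you map each pain point to a few hand-picked keywords (the mapping and `count_pain_mentions` are hypothetical): it counts each transcript at most once per pain point, which matches "how many users mentioned each pain" only if each transcript is one interviewee.

```python
from collections import Counter


def count_pain_mentions(transcripts: list[str], pain_keywords: dict[str, list[str]]) -> Counter:
    """Count how many transcripts mention each pain point at least once.

    Assumes one interviewee per transcript and a hand-picked keyword list per
    pain point; plain substring matching is crude and will miss paraphrases.
    """
    counts: Counter = Counter()
    for text in transcripts:
        lowered = text.lower()
        for pain, keywords in pain_keywords.items():
            if any(keyword.lower() in lowered for keyword in keywords):
                counts[pain] += 1  # at most once per transcript
    return counts


transcripts = [
    "Exporting to CSV is painful and slow.",
    "I can never find the export button.",
    "Login keeps timing out on me.",
    "Export broke again, and login is slow.",
]
pains = {"export": ["export"], "login": ["login", "sign in"], "search": ["search"]}
print(count_pain_mentions(transcripts, pains).most_common(5))  # [('export', 3), ('login', 2)]
```

If the local count and the model's count disagree sharply, reread the transcripts for that pain point rather than trusting either number.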
Before any AI-generated story enters the backlog:
A 60-second review pass per story prevents bad stories from entering planning.
Story template:
As a [specific user]
I want to [specific goal]
So that [concrete benefit]

Acceptance criteria:
- Given X, When Y, Then Z
- (more)

Out of scope:
- …

Dependencies:
- …
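If stories are assembled or exported by a script, the story template can be captured as a small data structure. The sketch below is a hypothetical Python `UserStory` class, not part of any tool, that renders the template as plain text. It writes `None` under a section that has no items so empty sections stay visible in review.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    """Hypothetical container mirroring the story template fields."""
    user: str
    goal: str
    benefit: str
    acceptance_criteria: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the story in the template's layout."""
        lines = [
            f"As a {self.user}",
            f"I want to {self.goal}",
            f"So that {self.benefit}",
            "",
        ]
        for heading, items in (
            ("Acceptance criteria", self.acceptance_criteria),
            ("Out of scope", self.out_of_scope),
            ("Dependencies", self.dependencies),
        ):
            lines.append(f"{heading}:")
            # Show an explicit "None" so an empty section reads as a decision, not an omission.
            lines.extend(f"- {item}" for item in (items or ["None"]))
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"


story = UserStory(
    user="returning shopper",
    goal="save my card",
    benefit="checkout is faster",
    acceptance_criteria=["Given a saved card, When I check out, Then it is preselected"],
)
print(story.to_text())
```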
Refinement template:
- Story title
- Narrative (As a… I want… So that…)
- Background context
- AC
- Design link
- Dependencies
- DoD additions specific to this story
Consumer software: focus on user emotions and journey moments. AC includes UX states.
B2B SaaS: focus on workflow integration and admin / end-user separation. AC includes role-based behaviour.
Mobile: AC includes platform-specific (iOS/Android) considerations, offline state, push notifications.
Hardware: AC includes physical states, calibration, error recovery.
Internal tools: simpler narratives, focus on process compliance and audit logging.
Marketing/ops: stories about tasks rather than user experiences. AC includes deliverables and approval flows.
For teams to benefit:
A team-wide prompt library is one of the highest-leverage investments a scrum master or PO can make.
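One lightweight way to share a prompt library is to store the prompts as templates in version control, so everyone fills in the same wording. A minimal Python sketch using the standard library's `string.Template`; the dictionary keys, `build_prompt`, and the `$story` placeholder (standing in for `[paste]`) are assumptions for illustration.

```python
from string import Template

# Hypothetical shared prompt library. Wording follows the prompts in this guide,
# with "[paste]" replaced by a $story placeholder that code can fill in.
PROMPT_LIBRARY = {
    "refine": Template(
        "Take this story and improve it: $story. Score it against INVEST. "
        "For any failed criterion, propose a specific fix."
    ),
    "acceptance_criteria": Template(
        "Take this story: $story. Generate 8-12 acceptance criteria. "
        "Use Given/When/Then format. Cover happy path, edge cases, error states, "
        "observability, and DoD."
    ),
    "split": Template(
        "This story is too big: $story. Suggest 3 ways to split it that each "
        "deliver independent customer value. For each split, name the new "
        "stories and what gets deferred."
    ),
    "invest_audit": Template(
        "Run INVEST checks on this story: $story. For each criterion, state pass "
        "or fail and why. Propose specific changes for any failed criterion."
    ),
}


def build_prompt(name: str, story: str) -> str:
    """Fill a named prompt; raises KeyError if the prompt is not in the library."""
    return PROMPT_LIBRARY[name].substitute(story=story)


print(build_prompt("split", "As a shopper I want one-click checkout so that I buy faster"))
```

Keeping the library under version control also gives you a history to consult when output quality changes, which helps separate prompt drift from other causes.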
Track:
If quality drops, the issue is usually prompt drift, not AI itself.
Paul Lister, an Agilist and a Certified Scrum Trainer (CST) with 20+ years of experience, coaches Scrum courses and co-founded the Surrey & Sussex Agile meetup. He also writes short stories and novels, and has directed and produced short films.
QUICK FACTS
No. Use AI for the bulk, grunt-work stories. Write the strategic, ambiguous, or politically sensitive stories by hand, because for those the writing process itself helps clarify thinking.