You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This report summarises an automated exploration of the agentic-workflows agent across 4 representative software worker personas. 4 scenarios were tested; all produced high-quality, production-realistic workflow suggestions.
✅ strict: true applied consistently across all 4 scenarios without prompting
✅ gh-proxy mode universally adopted — no scenario used local mode or raw MCP server startup
✅ Security posture maintained — agent job stayed read-only in every case; all writes routed through safe-outputs
✅ Bash allowlist scoping was context-aware — narrow lists on PR-triggered workflows (untrusted input), broad ["*"] on scheduled workflows (no untrusted input)
✅ Report routing worked correctly — scheduled + discussion-output scenarios were properly routed to the report prompt pattern (e.g., close-older-discussions, mentions: false)
Top Patterns
Most common triggers: pull_request (with optional paths filter) for reactive workflows; schedule with natural-language syntax (every Monday at 08:00 UTC, daily on weekdays) for periodic reports
Most recommended tools: gh-proxy mode + targeted toolsets (issues, actions, pull_requests) — never over-provisioned
Universal security practices: strict: true, read-only agent permissions, safe-outputs for every write, label/mention spam prevention (allowed-github-references: [], max-bot-mentions: 0)
View High Quality Responses (Top 3)
Backend Engineer — DB Migration Reviewer (5.0/5.0)
Used paths filter on pull_request to avoid running on every PR
Structured analysis with 🔴 Critical / 🟡 Warning / 🟢 Best Practice tiers
Added duplicate-comment guard (checks for existing comment before posting)
Correctly narrowed bash to [cat, grep, diff, find, wc]
Fallback behaviour for zero-issues weeks explicitly handled in prompt
View Areas for Improvement
Cache-memory not surfaced: None of the 4 scenarios suggested using cache-memory for cross-run state persistence (e.g., storing a baseline coverage % between PR runs, or tracking failure trend data for the CI health report). The .github/aw/memory.md guidance could be more proactively recommended for stateful workflows.
workflow_dispatch auto-addition undocumented in prompts: The compiler automatically adds workflow_dispatch to scheduled workflows, but this is not mentioned in agent responses. Users may be surprised when they see it in the compiled lock file — a brief note in prompt responses would improve transparency.
Broad vs. narrow bash decision rationale: The DevOps scenario correctly used bash: ["*"] for a scheduled workflow with no untrusted input. However, this decision was made implicitly. Explicitly stating why (no user-controlled input in this trigger) would help users understand when to deviate from the default narrow allowlist.
Recommendations
Add a cache-memory usage example to .github/aw/memory.md targeted at PR workflows (e.g., storing baseline metrics between runs) and report workflows (e.g., tracking week-over-week trend deltas). Currently the agent does not proactively suggest persistence even when it would clearly improve workflow quality.
Document workflow_dispatch auto-injection in .github/aw/triggers.md so agents can mention it in responses — users should know their scheduled workflow can also be run on-demand without frontmatter changes.
Add a decision guide for bash allowlist scope to .github/aw/create-agentic-workflow.md — a simple rule: "narrow bash when the trigger exposes untrusted user input (issues, PRs, comments); bash: ['*'] is acceptable for scheduled or workflow_dispatch triggers with no untrusted input."
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Overview
This report summarises an automated exploration of the
agentic-workflowsagent across 4 representative software worker personas. 4 scenarios were tested; all produced high-quality, production-realistic workflow suggestions.Key Findings
strict: trueapplied consistently across all 4 scenarios without promptinggh-proxymode universally adopted — no scenario usedlocalmode or raw MCP server startupsafe-outputs["*"]on scheduled workflows (no untrusted input)reportprompt pattern (e.g.,close-older-discussions,mentions: false)Top Patterns
pull_request(with optionalpathsfilter) for reactive workflows;schedulewith natural-language syntax (every Monday at 08:00 UTC,daily on weekdays) for periodic reportsgh-proxymode + targetedtoolsets(issues,actions,pull_requests) — never over-provisionedstrict: true, read-only agent permissions,safe-outputsfor every write, label/mention spam prevention (allowed-github-references: [],max-bot-mentions: 0)View High Quality Responses (Top 3)
Backend Engineer — DB Migration Reviewer (5.0/5.0)
pathsfilter onpull_requestto avoid running on every PR[cat, grep, diff, find, wc]QA Tester — Coverage Analysis (5.0/5.0)
hide-older-comments: truekeeps PR clean across push eventsadd-labels.allowed: [coverage-drop]prevents label injection abusedefault,actions,pull_requests) scoped preciselyProduct Manager — Weekly Digest (5.0/5.0)
mentions: false+allowed-github-references: []correctly prevents notification spam from issue titlesclose-older-discussions: true+title-prefixprevents discussion accumulationView Areas for Improvement
Cache-memory not surfaced: None of the 4 scenarios suggested using
cache-memoryfor cross-run state persistence (e.g., storing a baseline coverage % between PR runs, or tracking failure trend data for the CI health report). The.github/aw/memory.mdguidance could be more proactively recommended for stateful workflows.workflow_dispatchauto-addition undocumented in prompts: The compiler automatically addsworkflow_dispatchto scheduled workflows, but this is not mentioned in agent responses. Users may be surprised when they see it in the compiled lock file — a brief note in prompt responses would improve transparency.Broad vs. narrow bash decision rationale: The DevOps scenario correctly used
bash: ["*"]for a scheduled workflow with no untrusted input. However, this decision was made implicitly. Explicitly stating why (no user-controlled input in this trigger) would help users understand when to deviate from the default narrow allowlist.Recommendations
Add a
cache-memoryusage example to.github/aw/memory.mdtargeted at PR workflows (e.g., storing baseline metrics between runs) and report workflows (e.g., tracking week-over-week trend deltas). Currently the agent does not proactively suggest persistence even when it would clearly improve workflow quality.Document
workflow_dispatchauto-injection in.github/aw/triggers.mdso agents can mention it in responses — users should know their scheduled workflow can also be run on-demand without frontmatter changes.Add a decision guide for bash allowlist scope to
.github/aw/create-agentic-workflow.md— a simple rule: "narrowbashwhen the trigger exposes untrusted user input (issues, PRs, comments);bash: ['*']is acceptable for scheduled orworkflow_dispatchtriggers with no untrusted input."References:
Beta Was this translation helpful? Give feedback.
All reactions