openclaw-github-dedupe

Investigate a cluster of GitHub issues and PRs, determine canonical candidates, post duplicate/related status, preserve contributor credit, and execute cleanup actions. Supports autonomous mode for provided-link-only closeout, merge/fix follow-through, changelog, and post-merge issue/PR cleanup.

What this skill does


# Issue/PR Cluster Deduper

Use this skill when a cluster of GitHub issues and pull requests sharing a common failure mode has been reported (via Slack, iMessage, support threads, or a manual list), and you need an evidence-based dedupe recommendation, execution pass, or autonomous closeout run.

## Purpose

Provide a consistent, evidence-driven triage pass for issue and PR clusters so duplicate work is folded, contributor credit is preserved, and cleanup actions stay auditable.

Execution is command-led and conservative: drive decisions from `gh` readbacks plus deterministic file/metadata checks only, and avoid speculative local analysis beyond triage logic.

Autonomous mode differs from broad dedupe. It starts from user-provided links/refs and follows only links found inside those refs. It first decides whether the cluster is still viable and needed against current `main`, then drives the selected fix/merge/closure path to completion when repo policy allows.

Primary goal: make every run action-ready with explicit per-item actions, links, and command outcomes.

## Vision

Treat each cluster as a triage system, not a documentation exercise. The expected behavior is deterministic classification, auditable comment text, safe mutation flow, and fast operator comprehension.

## Principles (read before execution)

These are defined in `principles.md`:

- Evidence over narrative
- Root-cause alignment over title similarity
- Credit-preserving attribution
- Safe defaults (`dry-run` and blockers)
- Complete audit trail in comments and labels
- Humans first: communicate like a senior developer advocate, not a script.

Read `constitution.md` for governance requirements and decision quality rules before choosing final outcomes.

## Operator experience and communication defaults

This workflow is designed for high-velocity maintainers and external contributors:

- Keep every reply short, clear, and respectful.
- Lead with what happened, then why.
- Include a plain-English next step and reopen path.
- Avoid abrupt or accusatory language, especially on duplicates.
- Never hide uncertainty; if blocked, say what is missing and what you need next.
- Treat the assistant as a Developer Experience lead: helpful, practical, and direct without sounding mechanical.
- Use examples as style guidance, not templates to paste verbatim.
- Vary sentence ordering and opener lines per message so repeated runs do not sound identical.
- Do not repeat the exact same message body twice in one cluster run unless no safe variation is possible.
- Use 2-4 short, conversational paragraphs, and avoid robotic one-liner templates.
- If intent is clear, keep tone warm and explanatory even when issuing a close.

## When to use

- You have a cluster of suspected duplicates to classify as canonical/related/independent.
- You need a concrete action plan with exact statuses instead of generic commentary.
- The user asked to execute safe closure/label/comment steps.
- You need to avoid duplicate PR churn by identifying which change should stay canonical.

## Inputs

- `cluster_refs` (required): list of issue/PR references as IDs or URLs.
- `mode` (optional): `plan` (default), `execute`, or `autonomous`.
- `channel` (optional): source context like `slack`, `imessage`, `support`, etc.
- `repo` (optional): explicit `owner/repo` when not using current checkout.
- `canonical_hint` (optional): explicit preference when ambiguity exists.
- `merge_guard` (optional): `high|medium` mergeability strictness; default `high`.
- `max_changed_files_for_canonical` (optional): default `30`.
- `max_delta_lines_for_canonical` (optional): default `2500`.
- `min_greptile_score` (optional): default `65` when available.
- `body_noise_mode` (optional): `strict|medium` for junk body tolerance; default `medium`.
- `reuse_copy_detection` (optional): `off|on` for bot/copied-work checks; default `on`.
- `bot_author_pattern` (optional): list of substrings for suspicious authors.
- `merge_tool_pref` (optional): `auto`, `gh`, `merge-skill`, or `land-skill`; default `auto`.
- `dry_run` (optional): `0|1` to force non-mutating mode.
- `output_mode` (optional): `compact|detailed`; default `detailed`.
- `triage_report` (optional): path to a triage report file (like `/Users/vincentkoc/Desktop/triage_report.md`) to seed initial cluster candidates.
- `search_limit` (optional): max similar items per item from GH search; default `12`.
- `search_queries` (optional): explicit query terms (space-separated).
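
The inputs and defaults above can be captured in a small typed container. This is a minimal sketch, not part of the skill: the `DedupeInputs` class and the `ALLOWED` table are hypothetical names, and `dry_run` is left unset because the list states no default for it.

```python
from dataclasses import dataclass, field
from typing import Optional

# Enumerated values copied from the input list above.
ALLOWED = {
    "mode": {"plan", "execute", "autonomous"},
    "merge_guard": {"high", "medium"},
    "body_noise_mode": {"strict", "medium"},
    "reuse_copy_detection": {"off", "on"},
    "merge_tool_pref": {"auto", "gh", "merge-skill", "land-skill"},
    "output_mode": {"compact", "detailed"},
}


@dataclass
class DedupeInputs:
    """Hypothetical container; the skill itself does not define this class."""

    cluster_refs: list[str]  # required: issue/PR IDs or URLs
    mode: str = "plan"
    channel: Optional[str] = None
    repo: Optional[str] = None
    canonical_hint: Optional[str] = None
    merge_guard: str = "high"
    max_changed_files_for_canonical: int = 30
    max_delta_lines_for_canonical: int = 2500
    min_greptile_score: int = 65
    body_noise_mode: str = "medium"
    reuse_copy_detection: str = "on"
    bot_author_pattern: list[str] = field(default_factory=list)
    merge_tool_pref: str = "auto"
    dry_run: Optional[int] = None  # 0|1; no default is stated in the list
    output_mode: str = "detailed"
    triage_report: Optional[str] = None
    search_limit: int = 12
    search_queries: Optional[str] = None  # space-separated terms

    def __post_init__(self) -> None:
        # Reject an empty cluster and any value outside the documented choices.
        if not self.cluster_refs:
            raise ValueError("cluster_refs is required")
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")
```

Validating at construction time keeps a mistyped `mode` from silently falling through to a mutating path.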

## Autonomous mode

Trigger this path only when the user explicitly says `autonomous mode` or sets `mode=autonomous`. This is closeout work, not exploration.

- Do not run a broad GitHub search by default.
- Start only from refs/URLs the user supplied, refs linked from those items (issue/PR bodies, comments, review threads, closing refs, commit messages, and PR descriptions), or refs in an explicit local cluster artifact.
- If you need a broad search, stop and state the exact reason. Do not silently expand.
- Prefer full GitHub URLs in comments and final output.
- Do not ask before routine issue closeout when confidence is high and repo policy allows it. Ask before bulk PR close/reopen when local repo instructions require it.
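
One way to keep the scope rule mechanical is to build the allowed set from the seed refs plus whatever those seeds link to, and nothing further. The sketch below is an assumption-laden illustration: `extract_refs` and `allowed_scope` are hypothetical helpers, the regexes only cover full GitHub issue/PR URLs and same-repo `#123` shorthand, and "one hop from the seeds" is one reading of the rule above.

```python
import re

# Full issue/PR URLs, e.g. https://github.com/owner/repo/pull/34
URL_RE = re.compile(r"https://github\.com/([\w.-]+/[\w.-]+)/(?:issues|pull)/(\d+)")
# Same-repo shorthand such as "#12"; skips URL fragments and owner/repo#N forms.
SHORT_RE = re.compile(r"(?<![\w/])#(\d+)\b")


def extract_refs(text: str, default_repo: str) -> set[tuple[str, int]]:
    """Return (repo, number) pairs mentioned in a body, comment, or commit message."""
    refs = {(repo, int(num)) for repo, num in URL_RE.findall(text)}
    refs |= {(default_repo, int(num)) for num in SHORT_RE.findall(text)}
    return refs


def allowed_scope(
    seed_refs: set[tuple[str, int]],
    hydrated_text: dict[tuple[str, int], str],
    default_repo: str,
) -> set[tuple[str, int]]:
    """Seeds plus refs linked from the seeds; linked items are not expanded again."""
    scope = set(seed_refs)
    for ref in seed_refs:
        scope |= extract_refs(hydrated_text.get(ref, ""), default_repo)
    return scope
```

Anything outside the returned set would need the explicit "stop and state the reason" step before it is touched.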

Before editing, merging, or closing anything, prove the cluster is still actionable against current `main`:

1. Fetch current `origin/main` and identify whether the reported behavior is already fixed, obsolete, or still reproduces from code/tests.
2. Hydrate every provided/ref-linked item with `gh issue view` or `gh pr view`; include bodies, comments, labels, state, checks, review threads, and linked closing refs.
3. Classify each item as `needed`, `covered`, `stale`, or `independent`.
4. Identify the canonical path: existing merged commit/PR, existing open PR if mergeable or repairable, or new fix branch/PR only if no viable PR exists and the bug is confirmed from code/tests.
5. Do not enter drive mode until the canonical path and closeout target are explicit.
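
Steps 2, 3, and 5 can be sketched as a command builder plus a gate. This is illustrative only: `hydrate_command` and `ready_for_drive_mode` are hypothetical helpers, and the `--json` field lists are a conservative subset that `gh` is believed to expose (review threads and closing refs may need a separate lookup). The commands are built but not executed here.

```python
from typing import Optional

# Assumed subset of `gh ... view --json` fields; verify against your gh version.
JSON_FIELDS = {
    "issue": ["title", "body", "state", "labels", "comments", "url"],
    "pr": ["title", "body", "state", "labels", "comments", "url",
           "statusCheckRollup", "reviews", "mergeable"],
}

# Classification labels from step 3.
CLASSES = {"needed", "covered", "stale", "independent"}


def hydrate_command(kind: str, number: int, repo: str) -> list[str]:
    """Build the argv for `gh issue view` or `gh pr view` with JSON output."""
    if kind not in JSON_FIELDS:
        raise ValueError(f"kind must be 'issue' or 'pr', got {kind!r}")
    return ["gh", kind, "view", str(number), "--repo", repo,
            "--json", ",".join(JSON_FIELDS[kind])]


def ready_for_drive_mode(
    classes: dict[str, str],
    canonical_path: Optional[str],
    closeout_target: Optional[str],
) -> bool:
    """Step 5 gate: every item classified, canonical path and target explicit."""
    unknown = [ref for ref, label in classes.items() if label not in CLASSES]
    if unknown:
        raise ValueError(f"unclassified or invalid items: {unknown}")
    return bool(classes) and bool(canonical_path) and bool(closeout_target)
```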

- If an open PR is canonical: address actionable review comments, fix CI/changelog, rebase on `main`, run targeted local gates, push, wait for relevant checks, and merge per repo policy.
- If `main` already contains the fix: use that merged PR/commit as canonical and close only high-confidence covered duplicates.
- If no PR exists and the bug is real: create a worktree/branch from `origin/main`, patch the smallest implicated surface, add focused tests/changelog when user-facing, open a draft PR, drive it through review/CI where permitted, then merge when clean.
- After merge/fix confirmation: close covered issues/PRs with a short comment naming the canonical full URL and why the item is covered.
- Leave independent or low-confidence items open with a relationship comment only when useful.
- Verify closure state with `gh issue view` / `gh pr view` after actions.
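
The choice among the three canonical paths can be written as a small decision function. This is a sketch under assumptions: the `CanonicalPath` enum and `choose_canonical_path` are hypothetical names, the boolean inputs stand in for evidence gathered from `gh` readbacks and code/tests, and the precedence (merged fix, then open PR, then new fix) follows the order in which step 4 lists the options.

```python
from enum import Enum


class CanonicalPath(Enum):
    MERGED_FIX = "use the merged commit/PR already on main"
    OPEN_PR = "drive the existing open PR to merge"
    NEW_FIX = "create a new fix branch and draft PR"
    NONE = "no canonical path; escalate for manual review"


def choose_canonical_path(
    fix_on_main: bool,
    viable_open_pr: bool,
    bug_confirmed: bool,
) -> CanonicalPath:
    """Pick the canonical path from evidence flags, most conservative first."""
    if fix_on_main:
        return CanonicalPath.MERGED_FIX
    if viable_open_pr:  # mergeable or repairable
        return CanonicalPath.OPEN_PR
    if bug_confirmed:  # confirmed from code/tests and no viable PR exists
        return CanonicalPath.NEW_FIX
    return CanonicalPath.NONE
```

A new branch is reached only when both earlier checks fail, which matches the "only if no viable PR exists" condition.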

Autonomous mode final output may be compact, but it must include: canonical URL, landed commit/PR state, validation performed, closed refs, and items intentionally left open.

## Outputs

- Per-item action matrix with an explicit status (`KEEP_OPEN_CANONICAL`, `CLOSE_DUPLICATE`, `KEEP_OPEN_RELATED`, `KEEP_OPEN_UNRELATED`, `MANUAL_REVIEW_REQUIRED`) and the title and author on each row.
- Evidence matrix with root-cause mapping, scope deltas, and risk blockers.
- Credit chain and attribution rationale (single-credit default, dual-credit only by exception).
- Required command set (dry-run and execution mode): close, comment, label, merge, and changelog actions.
- Escalation list for `MANUAL_REVIEW_REQUIRED` outcomes, with confidence score and hard-stop reason.
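
As an illustration of the action matrix and escalation list, the sketch below models one row per item and renders a markdown table. The `ActionRow` fields, the 0 to 1 confidence scale, and the table layout are assumptions; only the five status names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    KEEP_OPEN_CANONICAL = "KEEP_OPEN_CANONICAL"
    CLOSE_DUPLICATE = "CLOSE_DUPLICATE"
    KEEP_OPEN_RELATED = "KEEP_OPEN_RELATED"
    KEEP_OPEN_UNRELATED = "KEEP_OPEN_UNRELATED"
    MANUAL_REVIEW_REQUIRED = "MANUAL_REVIEW_REQUIRED"


@dataclass
class ActionRow:
    ref: str
    title: str
    author: str
    status: Status
    confidence: float = 1.0      # hypothetical 0..1 scale
    hard_stop_reason: str = ""   # filled only for escalations


def render_matrix(rows: list[ActionRow]) -> str:
    """Render the per-item matrix with title and author on every row."""
    lines = ["| Ref | Title | Author | Status |", "| --- | --- | --- | --- |"]
    for row in rows:
        lines.append(
            f"| {row.ref} | {row.title} | @{row.author} | {row.status.value} |"
        )
    return "\n".join(lines)


def escalations(rows: list[ActionRow]) -> list[ActionRow]:
    """Rows that need a human decision before any mutation."""
    return [r for r in rows if r.status is Status.MANUAL_REVIEW_REQUIRED]
```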

## Sub-agent orchestration

Use sub-agents in this chain for higher consistency and fewer mistakes:

- `agents/cluster-intake-agent.md`
- `agents/cluster-similarity-agent.md`
- `agents/cluster-evidence-agent.md`
- `agents/cluster-decision-agent.md`
- `agents/cluster-synthesis-agent.md`

Run in sequence and feed each output to the next.

- Intake → Simi
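
The sequential handoff between the five agents can be sketched as a simple fold over the chain. This is a minimal illustration: `run_chain` and the `run_agent` callable are hypothetical, and how each agent file is actually invoked is left to the host environment.

```python
from typing import Callable

# Agent files in the order listed above.
AGENT_CHAIN = [
    "agents/cluster-intake-agent.md",
    "agents/cluster-similarity-agent.md",
    "agents/cluster-evidence-agent.md",
    "agents/cluster-decision-agent.md",
    "agents/cluster-synthesis-agent.md",
]


def run_chain(initial: dict, run_agent: Callable[[str, dict], dict]) -> dict:
    """Run each agent in order, feeding its output to the next one."""
    payload = initial
    for agent in AGENT_CHAIN:
        payload = run_agent(agent, payload)
    return payload
```

Passing the runner in as a callable keeps the ordering testable with a stub before any real agent is wired up.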
