evidence-evaluation
Framework for assessing evidence quality in decision-making
What this skill does
# Evidence Evaluation Skill

A systematic framework for assessing the quality and reliability of evidence used in decision-making.

## Purpose

Evidence-based decisions require more than just having data - they require understanding how trustworthy that data is. This skill provides tools to distinguish facts from assumptions, evaluate source quality, and document evidence systematically.

---

## Fact vs Assumption

Understanding the difference between facts and assumptions is fundamental to evidence evaluation.

### Facts

**Definition**: Verifiable statements supported by direct observation, measurement, or documentation.

**Characteristics**:

- Can be independently verified
- Based on direct evidence
- Remain true regardless of perspective
- Have clear provenance

**Examples**:

| Statement | Why It's a Fact |
|-----------|-----------------|
| "Server response time averaged 245ms over 1000 requests" | Measured and logged |
| "The contract expires on March 15, 2026" | Documented in signed agreement |
| "Three customers reported this bug in the last week" | Tracked in support system |
| "The test suite has 847 passing tests" | Verified by running tests |

### Assumptions

**Definition**: Beliefs or expectations accepted as true without direct verification.

**Characteristics**:

- May feel obvious but lack proof
- Often based on past patterns
- Can vary by perspective
- May be reasonable but unverified

**Examples**:

| Statement | Why It's an Assumption |
|-----------|------------------------|
| "Users prefer the new design" | No user research conducted |
| "Performance will improve with more servers" | Not tested in this context |
| "The competitor will respond with a price cut" | Prediction, not fact |
| "The team can deliver this in two sprints" | Estimate, not commitment |

### Converting Assumptions to Facts

When an assumption is critical to a decision, consider how to verify it:

```
Assumption: "Most users access the app on mobile"
-> Verification: Check analytics data for device breakdown
-> Result: "68% of sessions are from mobile devices" (fact)

Assumption: "The API can handle 10x current load"
-> Verification: Run load tests
-> Result: "API failed at 7x load with memory exhaustion" (fact)

Assumption: "Customers want feature X"
-> Verification: Conduct user interviews or surveys
-> Result: "23 of 30 interviewed customers expressed need for X" (fact)
```

---

## Source Quality Framework

Evaluate evidence sources across four dimensions, each rated High/Medium/Low.

### 1. Credibility (Who)

Assesses the trustworthiness of the information source.

| Rating | Criteria | Examples |
|--------|----------|----------|
| **High** | Expert in relevant domain; established track record; no conflicts of interest; peer-reviewed or audited | Academic research, official documentation, recognized industry experts, audited financial statements |
| **Medium** | Knowledgeable but not expert; some track record; minor potential conflicts; reviewed but not rigorously | Industry reports, experienced practitioners, trade publications, internal analysis |
| **Low** | Unknown expertise; no track record; significant conflicts of interest; unreviewed | Anonymous sources, marketing materials, unverified claims, self-reported without validation |

**Questions to Ask**:

- What are the source's qualifications on this topic?
- Does the source have a track record of accuracy?
- What incentives might bias the source?
- Has the information been reviewed or verified by others?

### 2. Methodology (How)

Assesses how the information was gathered or conclusions were reached.

| Rating | Criteria | Examples |
|--------|----------|----------|
| **High** | Rigorous, documented methodology; appropriate sample size; controlled conditions; reproducible | Scientific studies, comprehensive testing, statistical analysis with proper controls |
| **Medium** | Reasonable methodology with some gaps; moderate sample size; some documentation | Surveys with decent response rates, pilot tests, case studies with multiple examples |
| **Low** | Unclear or flawed methodology; small or biased samples; anecdotal; not reproducible | Single anecdotes, informal polls, untested hypotheses, "common knowledge" |

**Questions to Ask**:

- How was this data collected?
- Is the sample size adequate and representative?
- Could this result be replicated?
- What variables were controlled?

### 3. Recency (When)

Assesses how current the information is relative to the decision context.

| Rating | Criteria | Examples |
|--------|----------|----------|
| **High** | Current or real-time; collected within relevant timeframe; accounts for recent changes | Live metrics, recent market research, current documentation, data from this quarter |
| **Medium** | Reasonably recent; may not reflect latest changes; still largely applicable | Last year's data, recent but not current reports, slightly outdated documentation |
| **Low** | Outdated; from before significant changes; may no longer apply | Multi-year-old research, pre-pivot data, documentation from deprecated systems |

**Questions to Ask**:

- When was this information collected or published?
- Have relevant conditions changed since then?
- Is there a more recent source available?
- Does the age of this data matter for this decision?

### 4. Relevance (What)

Assesses how directly the information applies to the specific decision.

| Rating | Criteria | Examples |
|--------|----------|----------|
| **High** | Directly addresses the question; same context and conditions; specific to this situation | Data from this system, research on this exact problem, feedback from target users |
| **Medium** | Related but not exact; similar context; requires some extrapolation | Industry benchmarks, analogous case studies, research on similar problems |
| **Low** | Tangentially related; different context; significant extrapolation required | General statistics, research from different domains, opinions on different problems |

**Questions to Ask**:

- Does this evidence directly address our specific question?
- How similar is the source context to our situation?
- What assumptions are needed to apply this evidence here?
- Is there more directly relevant evidence available?

---

## Evidence Quality Score

Combine the four dimension ratings into an overall quality assessment.

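The four High/Medium/Low ratings and the point values and score bands given under Scoring Method can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the names `Rating`, `EvidenceRatings`, and `overall_quality` are hypothetical, and the sample ratings are taken from the industry-report example in this section.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    """Hypothetical enum using the document's point values: High=3, Medium=2, Low=1."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class EvidenceRatings:
    """Hypothetical record holding one rating per framework dimension."""
    credibility: Rating
    methodology: Rating
    recency: Rating
    relevance: Rating

    def total(self) -> int:
        # Sum of the four dimension scores (maximum 12).
        return int(self.credibility + self.methodology + self.recency + self.relevance)


def overall_quality(total: int) -> str:
    """Map a total score to the band names used in the scoring table."""
    if total >= 10:
        return "Strong Evidence"
    if total >= 7:
        return "Moderate Evidence"
    if total >= 4:
        return "Weak Evidence"
    return "Poor Evidence"


# Ratings from the document's industry-report example.
report = EvidenceRatings(
    credibility=Rating.MEDIUM,
    methodology=Rating.MEDIUM,
    recency=Rating.HIGH,
    relevance=Rating.MEDIUM,
)
print(report.total(), overall_quality(report.total()))  # 9 Moderate Evidence
```

Keeping the ratings in a typed record rather than a bare sum preserves the per-dimension rationale, which is what a reviewer needs when a score sits near a band boundary.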
### Scoring Method

Assign points to each rating:

- High = 3 points
- Medium = 2 points
- Low = 1 point

Calculate total (max 12 points):

| Total Score | Overall Quality | Guidance |
|-------------|-----------------|----------|
| 10-12 | **Strong Evidence** | Can rely on for important decisions |
| 7-9 | **Moderate Evidence** | Useful but seek corroboration |
| 4-6 | **Weak Evidence** | Use cautiously; actively seek better sources |
| 1-3 | **Poor Evidence** | Do not rely on; treat as unverified assumption |

With all four dimensions rated, the lowest possible total is 4, so the 1-3 band cannot occur for a fully rated source.

### Example Evaluation

**Evidence**: "Industry report says 40% of companies are adopting this technology"

| Dimension | Rating | Rationale |
|-----------|--------|-----------|
| Credibility | Medium | Reputable firm but funded by vendor |
| Methodology | Medium | Survey of 500 companies; self-reported |
| Recency | High | Published this quarter |
| Relevance | Medium | Includes our industry but broader scope |

**Score**: 2 + 2 + 3 + 2 = 9 (Moderate Evidence)

**Conclusion**: Useful directional indicator but seek additional sources before major commitments.

---

## Red Flags

Warning signs that evidence may be unreliable:

### Conflict of Interest

- Source benefits from a particular conclusion
- Funding source has stake in results
- Vendor-provided benchmarks for their own products
- *Example*: Security vendor's report on rising cyber threats

### Cherry-Picking

- Only favorable data points presented
- Timeframe chosen to support conclusion
- Comparison groups selected to look good
- *Example*: "Our growth this quarter" ignoring down quarters

### Missing Methodology

- No explanation of h