building-role-mining-for-rbac-optimization
Apply bottom-up and top-down role mining techniques to discover optimal RBAC roles from existing user-permission assignments, reducing role explosion and enforcing least privilege.
What this skill does
# Building Role Mining for RBAC Optimization
## Overview
Role mining is the process of analyzing existing user-permission assignments to discover optimal roles for a Role-Based Access Control (RBAC) system. Organizations accumulate excessive permissions over time through job changes, project assignments, and ad-hoc access grants, leading to "role explosion" where thousands of granular roles exist with significant overlap. Role mining uses data analysis -- including clustering algorithms, formal concept analysis, and graph-based methods -- to consolidate permissions into a minimal set of roles that accurately represent business functions while enforcing least privilege.
## When to Use
- When deploying or configuring building role mining for rbac optimization capabilities in your environment
- When establishing security controls aligned to compliance requirements
- When building or improving security architecture for this domain
- When conducting security assessments that require this implementation
## Prerequisites
- Export of current user-permission assignments (CSV/database)
- Identity governance platform or directory service access
- Python 3.9+ with pandas, scikit-learn, numpy
- Understanding of organizational structure and job functions
- Stakeholder access for role validation workshops
## Core Concepts
### Role Mining Approaches
| Approach | Description | Best For |
|----------|-------------|----------|
| Bottom-Up | Analyze existing permissions to discover common patterns | Large datasets with organic permission growth |
| Top-Down | Design roles from business requirements and job descriptions | Greenfield RBAC or organizational restructuring |
| Hybrid | Combine bottom-up analysis with top-down business validation | Most production environments |
### Role Mining Algorithms
**1. Permission Clustering**: Group users with similar permission sets using k-means or hierarchical clustering. Users in the same cluster share a common role.
**2. Formal Concept Analysis (FCA)**: Mathematical framework that identifies complete set of concepts (user groups sharing exact permission sets) from a binary user-permission matrix.
**3. Graph-Based Mining**: Model users and permissions as a bipartite graph, then find dense subgraphs representing candidate roles.
**4. Boolean Matrix Decomposition**: Decompose the user-permission matrix U into U ≈ R × P where R maps users to roles and P maps roles to permissions.
### Role Mining Metrics
| Metric | Formula | Target |
|--------|---------|--------|
| Role Count | Total distinct roles after mining | Minimize |
| Coverage | Permissions explained by mined roles / Total permissions | > 95% |
| Weighted Structural Complexity (WSC) | Sum of role-user + role-permission assignments | Minimize |
| Deviation | Extra permissions not covered by assigned roles | < 5% |
## Workflow
### Step 1: Extract User-Permission Data
Collect the current access state from all identity sources:
```python
import pandas as pd
import numpy as np
# Load user-permission assignments
# Format: user_id, permission_id (one row per assignment)
assignments = pd.read_csv("user_permissions.csv")
# Create binary user-permission matrix (UPA matrix)
upa_matrix = assignments.pivot_table(
index="user_id",
columns="permission_id",
aggfunc="size",
fill_value=0
)
upa_matrix = (upa_matrix > 0).astype(int)
print(f"Users: {upa_matrix.shape[0]}")
print(f"Permissions: {upa_matrix.shape[1]}")
print(f"Assignments: {assignments.shape[0]}")
print(f"Density: {upa_matrix.values.sum() / upa_matrix.size:.2%}")
```
### Step 2: Bottom-Up Role Discovery Using Clustering
```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
def find_optimal_clusters(matrix, max_k=50):
"""Find optimal number of roles using silhouette analysis."""
scores = []
for k in range(2, min(max_k, matrix.shape[0])):
clustering = AgglomerativeClustering(
n_clusters=k, metric="jaccard", linkage="average"
)
labels = clustering.fit_predict(matrix)
score = silhouette_score(matrix, labels, metric="jaccard")
scores.append((k, score))
optimal_k = max(scores, key=lambda x: x[1])[0]
return optimal_k, scores
def mine_roles_clustering(upa_matrix, n_clusters):
"""Mine roles using hierarchical clustering on Jaccard distance."""
clustering = AgglomerativeClustering(
n_clusters=n_clusters, metric="jaccard", linkage="average"
)
user_matrix = upa_matrix.values
labels = clustering.fit_predict(user_matrix)
roles = {}
for cluster_id in range(n_clusters):
cluster_users = upa_matrix.index[labels == cluster_id]
cluster_permissions = upa_matrix.loc[cluster_users]
# Core role = permissions held by >80% of cluster members
permission_frequency = cluster_permissions.mean()
core_permissions = permission_frequency[permission_frequency >= 0.8].index.tolist()
roles[f"Role_{cluster_id}"] = {
"permissions": core_permissions,
"user_count": len(cluster_users),
"users": cluster_users.tolist(),
"coverage": permission_frequency[permission_frequency >= 0.8].mean()
}
return roles, labels
```
### Step 3: Formal Concept Analysis
```python
def mine_roles_fca(upa_matrix, min_support=3):
"""Mine roles using Formal Concept Analysis (frequent closed itemsets)."""
from itertools import combinations
users = upa_matrix.index.tolist()
permissions = upa_matrix.columns.tolist()
concepts = []
# Find all maximal permission sets shared by at least min_support users
for size in range(len(permissions), 0, -1):
for perm_combo in combinations(permissions, size):
perm_set = set(perm_combo)
# Find users who have ALL permissions in this set
matching_users = []
for user in users:
user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1])
if perm_set.issubset(user_perms):
matching_users.append(user)
if len(matching_users) >= min_support:
# Check if this is a closed concept (no superset with same extent)
is_closed = True
for concept in concepts:
if set(matching_users) == set(concept["users"]) and \
perm_set.issubset(set(concept["permissions"])):
is_closed = False
break
if is_closed:
concepts.append({
"permissions": list(perm_set),
"users": matching_users,
"support": len(matching_users)
})
if len(concepts) > 100: # Limit for performance
break
return concepts
```
### Step 4: Evaluate and Select Roles
```python
def evaluate_role_set(roles, upa_matrix):
"""Evaluate the quality of a mined role set."""
total_assignments = upa_matrix.values.sum()
covered_assignments = 0
extra_assignments = 0
for role_name, role_data in roles.items():
role_perms = set(role_data["permissions"])
for user in role_data["users"]:
user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1])
covered = role_perms.intersection(user_perms)
extra = role_perms - user_perms
covered_assignments += len(covered)
extra_assignments += len(extra)
metrics = {
"total_roles": len(roles),
"total_assignments": total_assignments,
"covered_assignments": covered_assignments,
"coverage_rate": covered_assignments / total_assignments if total_assignments else 0,
"extra_permissions": extra_assignments,
"deviation_rate": extra_assignments / (covered_assignments + extra_assignments) if (covered_assignments + extra_assignmentRelated in General
modeling-omnistudio-epc-catalog
IncludedSalesforce Industries CME EPC product-modeling skill for Product2-based catalog creation. Use when creating EPC products, configuring product attributes, building offer bundles with Product Child Items, or reviewing EPC DataPack JSON metadata for product catalog changes. TRIGGER when: user creates or updates Product2 EPC records, AttributeAssignment payloads, AttributeMetadata/AttributeDefaultValues, Offer bundles, or ProductChildItem relationships. DO NOT TRIGGER when: designing OmniScripts/FlexCards/Integration Procedures (use building-omnistudio-omniscript, building-omnistudio-flexcard, or building-omnistudio-integration-procedure), implementing Apex business logic (use generating-apex), or troubleshooting deployment pipelines (use deploying-metadata).
relationship-science-coach
IncludedUse this skill for direct, practical adult relationship coaching: couples conflict, repair, trust, marriage, dating, flirting, attachment patterns, emotional connection, sex, desire differences, eroticism, kink negotiation, affection, love languages, breakups, and long-term passion. Draw on Gottman, EFT and Hold Me Tight, attachment science, modern sex research, Perel, Nagoski, Kerner, Schnarch, Love and Stosny, and flexible love-language tools. Be concrete and low-hedge. Redirect only for imminent danger, abuse, coercive control, minors, non-consent, self-harm, stalking, or medical/legal/psychiatric decisions.
building-sf-integrations
IncludedSalesforce integration architecture and runtime plumbing with 120-point scoring. Use this skill to set up Named Credentials, External Credentials, External Services, REST/SOAP callout patterns, Platform Events, and Change Data Capture. TRIGGER when: user sets up Named Credentials, External Services, REST/SOAP callouts, Platform Events, CDC, or touches .namedCredential-meta.xml files. DO NOT TRIGGER when: Connected App/OAuth config (use configuring-connected-apps), Apex-only logic (use generating-apex), or data import/export (use handling-sf-data).
venue-templates
IncludedAccess comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
let-fate-decide
IncludedDraws the 12 Houses of the Zodiac Tarot spread to inject entropy into planning when prompts are vague, ambiguous, or casually delegated. Interprets the spread to guide next steps. Use when the user says 'let fate decide', 'YOLO', 'whatever', 'idk', or other nonchalant phrases, makes Yu-Gi-Oh references, or when you are about to arbitrarily pick between multiple reasonable approaches. Prefer over ask-questions-if-underspecified when the user's tone is casual or playful rather than precision-seeking.
net-ops
IncludedCross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH.