oraclecloud-observability

Set up programmatic monitoring, logging, and alarms for OCI resources. Use when configuring OCI Monitoring metrics, creating alarm rules, publishing custom metrics, or searching logs via the Logging service. Trigger with "oraclecloud observability", "oci monitoring", "oci alarms", "oci logging", "oracle cloud observability".

# Oracle Cloud Observability

## Overview

Set up programmatic monitoring for OCI infrastructure using the Monitoring, Logging, and Notifications services. The OCI Console buries these features behind nested menus, and the status page has historically failed to acknowledge outages (e.g., London region, January 2026). This skill builds monitoring you control through code (metric queries, alarm rules, custom metric publishing, and log searches), so that problems surface in your own alerts instead of waiting for the status page to report them.

**Purpose:** Create a code-driven observability stack that queries metrics, fires alarms, publishes custom metrics, and searches logs without depending on the OCI Console.

## Prerequisites

- **OCI tenancy** with an API signing key in `~/.oci/config`
- **Python 3.8+** with `pip install oci`
- **Compartment OCID** containing the resources to monitor
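- **IAM policies** granting `manage alarms` and `read metrics` in the target compartment; publishing custom metrics (Step 3) also needs `use metrics`, notifications (Step 4) need `manage ons-family`, and log search (Step 5) needs `read log-content`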
- **Notification topic** created for alarm destinations (or create one in Step 4)
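
The IAM permissions above can be rendered as policy statements with a small helper. This is a minimal sketch: `build_policy_statements`, the group name, and the compartment name are hypothetical placeholders, and the helper only formats the standard `Allow group <group> to <verb> <resource> in compartment <name>` statement form; it does not create the policy.

```python
from typing import List, Sequence, Tuple

# The two grants named in the prerequisites; extend the tuple as needed.
DEFAULT_GRANTS: Sequence[Tuple[str, str]] = (
    ("manage", "alarms"),
    ("read", "metrics"),
)


def build_policy_statements(
    group: str,
    compartment: str,
    grants: Sequence[Tuple[str, str]] = DEFAULT_GRANTS,
) -> List[str]:
    """Hypothetical helper: format one policy statement per (verb, resource)."""
    return [
        f"Allow group {group} to {verb} {resource} in compartment {compartment}"
        for verb, resource in grants
    ]


if __name__ == "__main__":
    # Placeholder group and compartment names
    for statement in build_policy_statements("ObservabilityAdmins", "prod"):
        print(statement)
```

The resulting strings are meant to be pasted into the statements of an IAM policy.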

## Instructions

### Step 1: Query Metrics with MonitoringClient

OCI publishes built-in metrics for compute, networking, block storage, and more. Query them programmatically:

```python
import oci
from datetime import datetime, timedelta, timezone

config = oci.config.from_file("~/.oci/config")
monitoring = oci.monitoring.MonitoringClient(config)

# The SDK accepts timezone-aware datetimes for the query window
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Mean CPU utilization in 5-minute intervals for instances in one
# availability domain (the "Uocm:" AD prefix is tenancy-specific; replace it)
response = monitoring.summarize_metrics_data(
    compartment_id="ocid1.compartment.oc1..example",
    summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_computeagent",
        query='CpuUtilization[5m]{availabilityDomain = "Uocm:US-ASHBURN-AD-1"}.mean()',
        start_time=start,
        end_time=end,
    ),
)

# One MetricData entry per metric stream (typically one per instance)
for metric in response.data:
    name = metric.dimensions.get("resourceDisplayName", "unknown")
    for dp in metric.aggregated_datapoints:
        print(f"{name} {dp.timestamp}: {dp.value:.1f}% CPU")
```
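
MQL queries like the one above follow a `Metric[interval]{dimension = "value"}.statistic()` shape. The sketch below assembles that string from parts so dimension filters are not hand-quoted. `build_mql` is a hypothetical helper, and it covers only this simple shape (no grouping functions or `||` alternatives).

```python
from typing import Dict, Optional


def build_mql(metric: str, interval: str, statistic: str,
              dimensions: Optional[Dict[str, str]] = None) -> str:
    """Compose e.g. CpuUtilization[5m]{availabilityDomain = "AD"}.mean()."""
    query = f"{metric}[{interval}]"
    if dimensions:
        # Multiple dimension filters are comma-separated inside the braces
        filters = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        query += "{" + filters + "}"
    return f"{query}.{statistic}()"


if __name__ == "__main__":
    print(build_mql("CpuUtilization", "5m", "mean",
                    {"availabilityDomain": "Uocm:US-ASHBURN-AD-1"}))
    # Appending a comparison turns the query into an alarm condition
    print(build_mql("CpuUtilization", "5m", "mean") + " > 80")
```

Appending a comparison such as `" > 80"` yields an alarm condition in the same form as the one used for alarm rules.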

### Step 2: Create Alarm Rules

Alarms trigger when a metric crosses a threshold. Create them via SDK so they survive Console UI changes:

```python
# Reuses `monitoring` from Step 1
alarm = monitoring.create_alarm(
    oci.monitoring.models.CreateAlarmDetails(
        display_name="High CPU Alert",
        compartment_id="ocid1.compartment.oc1..example",         # where the alarm lives
        metric_compartment_id="ocid1.compartment.oc1..example",  # where the metrics come from
        namespace="oci_computeagent",
        query="CpuUtilization[5m].mean() > 80",
        severity="CRITICAL",
        body="CPU utilization exceeded 80% for 5 minutes.",
        destinations=["ocid1.onstopic.oc1..example"],  # notification topic OCID (see Step 4)
        is_enabled=True,
        pending_duration="PT5M",               # condition must persist 5 minutes before firing
        repeat_notification_duration="PT15M",  # re-notify every 15 minutes while firing
    )
).data
print(f"Alarm created: {alarm.display_name} ({alarm.id})")
```
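
`pending_duration` and `repeat_notification_duration` are ISO 8601 durations. The sketch below parses the `PT#H#M#S` subset and applies a simplified local model of pending behavior: the alarm fires only after the condition has held for the whole pending window. This is an approximation for reasoning about thresholds, not OCI's actual evaluator; it assumes evenly spaced evaluations one minute apart, and both helper names are hypothetical.

```python
import math
import re
from typing import List

# Only the PT#H#M#S form is handled here (no days, weeks, or fractions)
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(iso: str) -> int:
    """Convert e.g. "PT5M" to 300."""
    match = _DURATION.match(iso)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {iso!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def would_fire(values: List[float], threshold: float, pending: str,
               step_seconds: int = 60) -> bool:
    """Simplified model: True once `values` exceed `threshold` for `pending`.

    `values` are assumed to be evenly spaced evaluations `step_seconds` apart.
    """
    needed = max(1, math.ceil(duration_seconds(pending) / step_seconds))
    streak = 0
    for value in values:
        # Any non-breaching evaluation resets the pending window
        streak = streak + 1 if value > threshold else 0
        if streak >= needed:
            return True
    return False
```

Under this model, a single dip below 80% inside the five-minute window restarts the count, which is why `pending_duration` suppresses alerts on short spikes.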

### Step 3: Publish Custom Metrics

Push application-level metrics into OCI Monitoring so they can trigger the same alarm system:

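```python
from datetime import datetime, timezone

# PostMetricData must be sent to the telemetry-ingestion endpoint, not the
# default telemetry endpoint used for queries, so use a separate client
ingestion = oci.monitoring.MonitoringClient(
    config,
    service_endpoint=f"https://telemetry-ingestion.{config['region']}.oraclecloud.com",
)

response = ingestion.post_metric_data(
    oci.monitoring.models.PostMetricDataDetails(
        metric_data=[
            oci.monitoring.models.MetricDataDetails(
                namespace="custom_app",  # custom namespaces cannot use the "oci_" prefix
                compartment_id="ocid1.compartment.oc1..example",
                name="RequestLatencyMs",
                dimensions={"service": "api-gateway", "endpoint": "/v1/orders"},
                datapoints=[
                    oci.monitoring.models.Datapoint(
                        timestamp=datetime.now(timezone.utc),
                        value=142.5,
                    )
                ],
            )
        ]
    )
)

# The call can succeed overall while rejecting individual metrics
if response.data.failed_metrics_count:
    print(f"Rejected metrics: {response.data.failed_metrics}")
else:
    print("Custom metric published: RequestLatencyMs = 142.5ms")
```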
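
Posting every raw request latency individually multiplies API calls. One option, sketched below, is to validate the namespace locally and collapse raw samples into one mean per minute before building `Datapoint` objects. The namespace rule encoded here (starts with a letter; letters, digits, and underscores only; `oci_` prefix reserved) is assumed from the OCI documentation and should be checked, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Assumed naming rule; verify against the OCI Monitoring docs
_NAMESPACE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def valid_custom_namespace(namespace: str) -> bool:
    """Reject malformed names and the reserved "oci_" prefix."""
    return bool(_NAMESPACE.match(namespace)) and not namespace.startswith("oci_")


def per_minute_means(
    samples: Iterable[Tuple[datetime, float]],
) -> List[Tuple[datetime, float]]:
    """Collapse raw (timestamp, value) samples into one mean per minute."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    return [(minute, sum(vals) / len(vals)) for minute, vals in sorted(buckets.items())]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)
    raw = [(t0, 100.0), (t0.replace(second=50), 200.0),
           (t0.replace(minute=1, second=5), 50.0)]
    for minute, mean in per_minute_means(raw):
        # Each pair would become Datapoint(timestamp=minute, value=mean)
        print(minute.isoformat(), mean)
```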

### Step 4: Set Up Notifications

Create a notification topic and email subscription to receive alarm alerts:

```python
# Topics are managed through the control-plane client
control_plane = oci.ons.NotificationControlPlaneClient(config)
# Subscriptions are created through the data-plane client
notifications = oci.ons.NotificationDataPlaneClient(config)

# Create topic
topic = control_plane.create_topic(
    oci.ons.models.CreateTopicDetails(
        name="infra-alerts",
        compartment_id="ocid1.compartment.oc1..example",
        description="Infrastructure alarm notifications",
    )
).data

# Subscribe an email endpoint; the subscription stays pending until the
# recipient follows the confirmation link OCI sends to that address
notifications.create_subscription(
    oci.ons.models.CreateSubscriptionDetails(
        topic_id=topic.topic_id,
        compartment_id="ocid1.compartment.oc1..example",
        protocol="EMAIL",
        endpoint="oncall@example.com",  # placeholder address
    )
)
print(f"Topic created: {topic.topic_id}")
```
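
Alarms accept a list of destination topic OCIDs, so one way to separate urgent pages from routine email is to keep one topic per audience and pick destinations by severity. The routing table and topic OCIDs below are hypothetical placeholders; the keys are the four OCI alarm severities (`CRITICAL`, `ERROR`, `WARNING`, `INFO`).

```python
from typing import Dict, List

# Hypothetical routing table; replace the OCIDs with real topic OCIDs
SEVERITY_TOPICS: Dict[str, str] = {
    "CRITICAL": "ocid1.onstopic.oc1..pager",
    "ERROR": "ocid1.onstopic.oc1..pager",
    "WARNING": "ocid1.onstopic.oc1..email",
    "INFO": "ocid1.onstopic.oc1..email",
}


def destinations_for(severity: str,
                     routes: Dict[str, str] = SEVERITY_TOPICS) -> List[str]:
    """Return a `destinations` list suitable for CreateAlarmDetails."""
    try:
        return [routes[severity.upper()]]
    except KeyError:
        raise ValueError(f"unknown alarm severity: {severity}") from None
```

Passing `destinations_for("CRITICAL")` as `destinations` keeps the severity and routing decision in one place instead of repeating topic OCIDs in every alarm.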

### Step 5: Search Logs

Query the OCI Logging service to find specific events across your infrastructure:

```python
from datetime import datetime, timedelta, timezone

logging_search = oci.loggingsearch.LogSearchClient(config)

end = datetime.now(timezone.utc)
results = logging_search.search_logs(
    oci.loggingsearch.models.SearchLogsDetails(
        time_start=end - timedelta(hours=1),
        time_end=end,
        # Search every log in the compartment, keeping only HTTP 500 entries
        search_query=(
            'search "ocid1.compartment.oc1..example" '
            '| where data.statusCode = 500'
        ),
        is_return_field_info=False,
    )
)

# Each result's `data` holds one matching log record
for log_entry in results.data.results:
    print(log_entry.data)
```
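
The search scope can be narrowed from a compartment to a log group or a single log by appending OCIDs separated by `/`. The sketch below assembles such a query and joins filter conditions with `and`. `build_log_search` is a hypothetical helper, and the scope format is assumed from the Logging query-language documentation.

```python
from typing import List, Optional


def build_log_search(
    compartment_id: str,
    log_group_id: Optional[str] = None,
    log_id: Optional[str] = None,
    where: Optional[List[str]] = None,
) -> str:
    """Build 'search "<scope>" | where <cond> and <cond>'."""
    if log_id and not log_group_id:
        raise ValueError("log_id requires log_group_id")
    # Scope narrows from compartment to log group to a single log
    scope = "/".join(part for part in (compartment_id, log_group_id, log_id) if part)
    query = f'search "{scope}"'
    if where:
        query += " | where " + " and ".join(where)
    return query


if __name__ == "__main__":
    print(build_log_search("ocid1.compartment.oc1..example",
                           where=["data.statusCode = 500"]))
```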

### Step 6: Health Check Probes

Monitor endpoint availability with OCI Health Checks:

```python
health = oci.healthchecks.HealthChecksClient(config)

health.create_http_monitor(
    oci.healthchecks.models.CreateHttpMonitorDetails(
        compartment_id="ocid1.compartment.oc1..example",
        display_name="API Health Check",
        targets=["api.example.com"],
        protocol="HTTPS",
        port=443,
        path="/health",
        interval_in_seconds=30,
        timeout_in_seconds=10,
        is_enabled=True
    )
)
print("Health check probe created: api.example.com/health every 30s")
```
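
The probe interval determines how much traffic the monitor sends to the endpoint. Assuming each target is probed once per interval from each vantage point, the daily volume is `86400 / interval_in_seconds × targets × vantage points`, so a 30-second interval produces 2,880 probes per day per target per vantage point. `probes_per_day` is a hypothetical helper for that arithmetic.

```python
def probes_per_day(interval_in_seconds: int, targets: int = 1,
                   vantage_points: int = 1) -> int:
    """Estimated daily probes, assuming one probe per target per vantage
    point every interval (a simplifying assumption)."""
    if interval_in_seconds <= 0:
        raise ValueError("interval_in_seconds must be positive")
    return (86_400 // interval_in_seconds) * targets * vantage_points


if __name__ == "__main__":
    print(probes_per_day(30))                    # one target, one vantage point
    print(probes_per_day(30, vantage_points=3))  # same monitor from three vantage points
```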

## Output

Successful completion produces:

- Metric queries returning CPU utilization for your compartment (other `oci_computeagent` metrics, such as memory and network, follow the same pattern)
- Alarm rules that fire to notification topics when thresholds are breached
- Custom application metrics published to OCI Monitoring
- A notification topic with email subscription for alert delivery
- Log search queries for troubleshooting 500 errors and other events
- HTTP health check probes for endpoint availability monitoring

## Error Handling

| Error | Code | Cause | Solution |
|-------|------|-------|----------|
| NotAuthenticated | 401 | Bad API key or expired config | Verify `~/.oci/config` fingerprint matches your API key |
| NotAuthorizedOrNotFound | 404 | Missing IAM policy for monitoring | Add: `Allow group X to manage alarms in compartment Y` |
| TooManyRequests | 429 | Rate limited on metric queries | Reduce query frequency; cache results for dashboards |
| InternalError | 500 | OCI Monitoring service issue | Check [OCI Status](https://ocistatus.oraclecloud.com) and retry |
| InvalidParameter | 400 | Wrong MQL query syntax | Verify namespace and metric name; use `list_metrics` to discover available metrics |
| ServiceError status -1 | N/A | Request timeout on large queries | Narrow the time window or add dimension filters |
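
For 429 and 500 responses the table suggests retrying, and for timeouts it suggests narrowing the time window. The sketch below shows both with the standard library: exponential backoff with jitter, and splitting a long range into smaller windows. It uses a stand-in exception with a `status` attribute (mirroring `oci.exceptions.ServiceError`) so it runs without OCI credentials; the helper names are hypothetical. The OCI SDK also ships retry strategies (for example `oci.retry.DEFAULT_RETRY_STRATEGY`), which may be preferable for SDK calls.

```python
import random
import time
from datetime import datetime, timedelta
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
RETRYABLE_STATUSES = {429, 500}


class FakeServiceError(Exception):
    """Offline stand-in for oci.exceptions.ServiceError (has `.status`)."""

    def __init__(self, status: int):
        super().__init__(f"service error {status}")
        self.status = status


def call_with_backoff(fn: Callable[[], T], attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` on 429/500 with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except FakeServiceError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == attempts - 1:
                raise
            # Waits of up to 0.5s, 1s, 2s, ... scaled by random jitter
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")


def split_window(start: datetime, end: datetime,
                 chunk: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows no longer than `chunk`."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be positive")
    windows = []
    cursor = start
    while cursor < end:
        step_end = min(cursor + chunk, end)
        windows.append((cursor, step_end))
        cursor = step_end
    return windows
```

Each window from `split_window` can be passed as `start_time`/`end_time` to a separate query, which keeps individual requests small when a long range times out.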

## Examples

**Quick metric check with OCI CLI:**

```bash
# List the metrics in the oci_computeagent namespace
oci monitoring metric list \
  --compartment-id ocid1.compartment.oc1..example \
  --namespace oci_computeagent

# List all alarms
oci monitoring alarm list \
  --compartment-id ocid1.compartment.oc1..example
```

**List all metrics in a namespace to discover what's available:**

```python
import oci

config = oci.config.from_file("~/.oci/config")
monitoring = oci.monitoring.MonitoringClient(config)

metrics = monitoring.list_metrics(
    compartment_id="ocid1.compartment.oc1..example",
    list_metrics_details=oci.monitoring.models.ListMetricsDetails(
        namespace="oci_computeagent"
    )
).data

for m in metrics:
    print(f"{m.name} — dimensions: {m.dimensions}")
```
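
`list_metrics` can return several entries with the same metric name, one per distinct dimension combination (for example, one per instance), so the loop above may print repeated names. A small grouping step gives one line per metric with the dimension keys it supports. The sketch uses a `namedtuple` stand-in for the SDK's `Metric` model (which exposes `name` and `dimensions`) so it runs offline; `dimension_keys_by_metric` is a hypothetical helper.

```python
from collections import namedtuple
from typing import Dict, Iterable, List, Set

# Offline stand-in for oci.monitoring.models.Metric
Metric = namedtuple("Metric", ["name", "dimensions"])


def dimension_keys_by_metric(metrics: Iterable) -> Dict[str, List[str]]:
    """Map each metric name to the sorted set of dimension keys seen."""
    keys: Dict[str, Set[str]] = {}
    for m in metrics:
        keys.setdefault(m.name, set()).update((m.dimensions or {}).keys())
    return {name: sorted(found) for name, found in sorted(keys.items())}


if __name__ == "__main__":
    sample = [
        Metric("CpuUtilization", {"resourceId": "ocid1.instance.oc1..a", "availabilityDomain": "AD-1"}),
        Metric("CpuUtilization", {"resourceId": "ocid1.instance.oc1..b", "availabilityDomain": "AD-1"}),
        Metric("MemoryUtilization", {"resourceId": "ocid1.instance.oc1..a"}),
    ]
    for name, dims in dimension_keys_by_metric(sample).items():
        print(f"{name}: {', '.join(dims)}")
```

With the real client, the `metrics` list returned by `list_metrics` can be passed in directly.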

## Resources

- [OCI Monitoring](https://docs.oracle.com/en-us/iaas/Content/Monitoring/home.htm) — metrics, alarms, and MQL query language
- [OCI Logging](https://docs.oracle.com/en-us/iaas/Content/Logging/h

Source: https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/oraclecloud-pack/skills/oraclecloud-observability
