diff --git a/README.md b/README.md
index 3937a5c..e9e0ea6 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,7 @@ Copy skills to your agent's skills location:

| Plugin | Description |
|--------|-------------|
-| [sapcc](plugins/sapcc/) | All SAP CC skills and MCP server configuration. Covers compute, networking, storage, identity, quota, audit, metrics, registry, and endpoint services. |
+| [sapcc](plugins/sapcc/) | All SAP CC skills and MCP server configuration. Covers compute, networking, storage, identity, quota, audit, metrics, registry, endpoint services, DNS, secrets, object storage, file systems, load balancing, images, bare metal, and autoscaling. |

### Skills

@@ -60,6 +60,14 @@ Copy skills to your agent's skills location:
| [sapcc-metrics](plugins/sapcc/skills/sapcc-metrics/) | Maia | PromQL queries, metric discovery, monitoring |
| [sapcc-registry](plugins/sapcc/skills/sapcc-registry/) | Keppel | Container images, vulnerability status, federation |
| [sapcc-connectivity](plugins/sapcc/skills/sapcc-connectivity/) | Archer | Private endpoint services, service discovery |
+| [sapcc-dns](plugins/sapcc/skills/sapcc-dns/) | Designate | DNS zones, recordsets, FQDN management |
+| [sapcc-secrets](plugins/sapcc/skills/sapcc-secrets/) | Barbican | Secret metadata, certificate inventory, audit |
+| [sapcc-object-storage](plugins/sapcc/skills/sapcc-object-storage/) | Swift | Containers, objects, storage inspection |
+| [sapcc-filesystems](plugins/sapcc/skills/sapcc-filesystems/) | Manila | Shared NFS/CIFS file systems |
+| [sapcc-loadbalancer](plugins/sapcc/skills/sapcc-loadbalancer/) | Octavia | Load balancers, listeners, pools |
+| [sapcc-images](plugins/sapcc/skills/sapcc-images/) | Glance | VM images, snapshots, boot sources |
+| [sapcc-baremetal](plugins/sapcc/skills/sapcc-baremetal/) | Ironic | Physical server nodes, provisioning |
+| [sapcc-autoscaling](plugins/sapcc/skills/sapcc-autoscaling/) | Castellum | Automatic quota scaling, resize operations |
| [credential-setup](plugins/sapcc/skills/credential-setup/) | Keystone | Guided auth setup with keychain storage |

### Rules

@@ -85,7 +93,7 @@ Skills use progressive disclosure:
3. Reference files load on-demand for deep-dive content
4. Skill context releases when the task completes

-10 skills installed = ~500 tokens at startup. Full context only when needed.
+18 skills installed = ~900 tokens at startup. Full context only when needed.

## Security Philosophy

diff --git a/plugins/sapcc/skills/sapcc-autoscaling/SKILL.md b/plugins/sapcc/skills/sapcc-autoscaling/SKILL.md
new file mode 100644
index 0000000..368925b
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-autoscaling/SKILL.md
@@ -0,0 +1,168 @@
---
name: sapcc-autoscaling
description: >
  Autoscaling operations via Castellum in SAP Converged Cloud.
  Triggers: autoscaling, castellum, auto-resize, quota scaling, capacity management, resize operation
version: 1.0.0
metadata:
  service: [castellum]
  task: [inspect, monitor, debug]
  persona: [platform-engineer, devops]
---

# SAP CC Autoscaling (Castellum)

Inspect Castellum autoscaling: check resource configurations, view pending operations, and diagnose failed resize attempts. Castellum automatically adjusts project quotas and resource sizes based on configured thresholds.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `castellum_get_project_resources` | Get autoscaling config and status for a project | `project_id` (UUID, required) |
| `castellum_list_pending_operations` | List scheduled but incomplete resize operations | `project_id`, `asset_type` |
| `castellum_list_recently_failed_operations` | List recent resize failures | `project_id`, `asset_type`, `max_age` (default: 1d) |

## What Castellum Does

Castellum watches resource usage and automatically resizes when thresholds are hit:

```
Usage crosses HIGH threshold → Castellum schedules UPSIZE
Usage drops below LOW threshold → Castellum schedules DOWNSIZE
```

Assets it manages:
- `project-quota:compute:cores` — vCPU quota
- `project-quota:compute:ram` — RAM quota
- `project-quota:compute:instances` — instance count quota
- `project-quota:block-storage:capacity` — volume storage
- NFS share sizes, server root disks, etc.

## Gotchas

### 1. Castellum manages QUOTA, not individual resources

Castellum doesn't scale your application. It adjusts quotas and resource sizes. For example, it can increase your project's compute cores quota when usage exceeds 80%, but it doesn't create new servers.

### 2. project_id is required — you must know which project

Unlike most other tools here, Castellum requires an explicit `project_id`. Get it from `keystone_token_info` or `keystone_list_projects`.

### 3. Operations can be PENDING without issues

A pending operation means a resize is scheduled. This is normal — Castellum batches operations and may wait for cooldown periods between resizes.

### 4. Failed operations have a reason

`castellum_list_recently_failed_operations` shows WHY a resize failed. Common reasons:
- Quota exceeded at the domain level (project can't grow)
- Backend capacity exhausted
- Conflicting resize already in progress

### 5. max_age controls the failure lookback window

Default is `1d` (24 hours). Use `7d` for broader investigation, `12h` for recent issues only. Accepts Go-style duration suffixes: `s`, `m`, `h`, `d`.

### 6. asset_type filters are specific strings

Format: `project-quota:<service>:<resource>`. Examples:
- `project-quota:compute:cores`
- `project-quota:compute:ram`
- `project-quota:block-storage:capacity`

### 7. Castellum only acts on configured resources

Not all resources have autoscaling configured. `castellum_get_project_resources` shows which resources ARE configured and their thresholds. No configuration = no autoscaling.

### 8. Cooldown prevents thrashing

After a resize, Castellum waits before acting again. If you see a resource at threshold but no pending operation, it may be in cooldown.

## Common Workflows

### Check Autoscaling Configuration

```
1. keystone_token_info → get current project_id
2. castellum_get_project_resources(project_id=<project-id>)
3. Review: which resources are configured, thresholds, current status
```

### Are There Pending Resizes?

```
1. castellum_list_pending_operations(project_id=<project-id>)
2. If empty: no scheduled resizes (normal)
3. If populated: review what's being resized and when
```

### Diagnose Autoscaling Failures

```
1. castellum_list_recently_failed_operations(project_id=<project-id>, max_age=7d)
2. Review failure reasons
3. Common: domain quota ceiling hit, backend capacity
4. Cross-reference with Limes: limes_get_project_quota → is project at domain cap?
```
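
The two parameter formats above (gotchas 5 and 6) are the easiest things to get wrong when composing these calls. A minimal validation sketch in Python, based only on the formats documented here; the helper name is illustrative glue code, not part of the MCP server:

```python
import re

# Asset types documented above: project-quota:<service>:<resource>
ASSET_TYPE_RE = re.compile(r"^project-quota:[a-z-]+:[a-z-]+$")

# max_age accepts Go-style duration suffixes per gotcha 5: s, m, h, d
MAX_AGE_RE = re.compile(r"^\d+[smhd]$")

def failed_ops_params(project_id: str, asset_type: str | None = None,
                      max_age: str = "1d") -> dict:
    """Build a parameter dict for castellum_list_recently_failed_operations."""
    if asset_type and not ASSET_TYPE_RE.match(asset_type):
        raise ValueError(f"bad asset_type: {asset_type!r}")
    if not MAX_AGE_RE.match(max_age):
        raise ValueError(f"bad max_age: {max_age!r} (use e.g. '12h', '7d')")
    params = {"project_id": project_id, "max_age": max_age}
    if asset_type:
        params["asset_type"] = asset_type
    return params

print(failed_ops_params("<project-id>", "project-quota:compute:cores", "7d"))
```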

### "Why didn't my quota grow?"

```
1. castellum_get_project_resources(project_id) → is the resource configured?
2. If not configured: autoscaling won't act
3. If configured: check thresholds — is usage actually above HIGH?
4. castellum_list_recently_failed_operations → did it try and fail?
5. Check cooldown — did it resize recently and is waiting?
```

### Correlate with Quota

```
1. castellum_get_project_resources(project_id) → autoscaling config
2. limes_get_project_quota → current quota and usage
3. Compare: is usage near threshold? Has quota been growing?
```

## Troubleshooting

### No autoscaling configured for a resource

- Castellum configuration is per-resource, per-project
- Not all projects have autoscaling enabled
- Configuration requires admin or project-admin access (not via MCP tools)

### Operations keep failing

- Check `castellum_list_recently_failed_operations(max_age=7d)` for patterns
- If "domain quota exceeded": the project has hit its domain-level cap. Need domain admin to increase domain quota.
- If "backend capacity": physical capacity exhausted in the region/AZ
- If "conflicting operation": wait for the existing operation to complete

### Autoscaling seems too slow

- Castellum has deliberate cooldown periods (typically 5-15 minutes)
- It batches multiple threshold crossings
- For urgent needs: manual quota adjustment via Limes is faster

### Resource at threshold but no pending operation

- Cooldown period active (recent resize within last N minutes)
- Castellum may not have polled yet (typical interval: 5 minutes)
- Resource may not be configured for autoscaling

## Security Considerations

- Autoscaling configuration reveals capacity management strategy
- Failed operations reveal infrastructure limits and bottlenecks
- project_id in URLs is validated (UUID format) to prevent injection
- Autoscaling policies are read-only via MCP — no configuration changes possible

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| Current quota and usage | Limes | `limes_get_project_quota(project_id=<project-id>)` |
| Domain-level quota cap | Limes | `limes_get_domain_quota(domain_id=<domain-id>)` |
| Project identity | Keystone | `keystone_token_info`, `keystone_list_projects` |
| Who changed autoscaling config | Hermes | `hermes_list_events(target_type=castellum)` |
| Compute resources being scaled | Nova | `nova_list_servers` (to see actual usage) |

diff --git a/plugins/sapcc/skills/sapcc-baremetal/SKILL.md b/plugins/sapcc/skills/sapcc-baremetal/SKILL.md
new file mode 100644
index 0000000..9790ccf
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-baremetal/SKILL.md
@@ -0,0 +1,152 @@
---
name: sapcc-baremetal
description: >
  Bare metal node management via Ironic in SAP Converged Cloud.
  Triggers: bare metal, ironic, physical server, baremetal, BMC, IPMI, redfish, hardware
version: 1.0.0
metadata:
  service: [ironic]
  task: [inspect, manage, debug]
  persona: [platform-engineer]
---

# SAP CC Bare Metal (Ironic)

Inspect Ironic bare metal nodes: list nodes, check provision/power states, and understand maintenance status. Ironic manages physical servers in the cloud, enabling bare metal provisioning alongside VMs.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `ironic_list_nodes` | List baremetal nodes | `provision_state`, `maintenance`, `driver`, `resource_class`, `instance_uuid`, `fault`, `owner` |
| `ironic_get_node` | Full detail for a single node | `node_id` (UUID or name) |

## Security Note

**BMC credentials (IPMI/Redfish passwords) are intentionally excluded from responses.** The MCP server strips `driver_info` and `properties` fields that may contain hardware management credentials. This is a security boundary.

## Gotchas

### 1. Provision state is the lifecycle — not power state

| provision_state | Meaning |
|----------------|---------|
| `available` | Ready to be provisioned (no instance) |
| `active` | Running an instance |
| `deploying` | Instance being deployed onto the node |
| `cleaning` | Being wiped between tenants |
| `error` | Failed operation — needs investigation |
| `manageable` | Enrolled but not yet made available |

### 2. Power state is separate from provision state

A node can be `provision_state=active` but `power_state=power off` (unexpected shutdown). Power states: `power on`, `power off`, `None` (unknown).

### 3. Maintenance mode = node excluded from scheduling

When `maintenance=true`, Nova will not schedule new instances to this node. Existing instances may still be running. Maintenance is set manually by operators or automatically on repeated failures.

### 4. instance_uuid links to Nova

When a node has an instance deployed, `instance_uuid` contains the Nova server UUID. Use `nova_get_server(instance_uuid)` to see the VM running on this hardware.

### 5. resource_class determines scheduling

Nodes declare their resource class (e.g., `baremetal`, `baremetal.large`). Nova flavors reference these classes to match workloads to appropriate hardware.

### 6. driver indicates management protocol

Common drivers: `ipmi` (legacy BMC), `redfish` (modern REST-based BMC). The driver determines how the node is powered on/off and booted.

### 7. fault indicates why a node is broken

When a node enters error/maintenance, the `fault` field explains why: `power failure`, `clean failure`, `deploy failure`, etc. This is the first thing to check for broken nodes.

### 8. Nodes are owned by projects

The `owner` field shows which project can provision instances on this node. Filter by owner to see nodes allocated to your project.

## Common Workflows

### Inventory Bare Metal Nodes

```
1. ironic_list_nodes()
2. Review: name/UUID, provision_state, power_state, maintenance
3. Flag any in error state or maintenance
```

### Find Available Nodes

```
1. ironic_list_nodes(provision_state=available, maintenance=false)
2. These nodes are ready for instance deployment
3. Check resource_class to match with desired flavor
```

### Check What Instance Runs on a Node

```
1. ironic_get_node(node_id=<node-id>) → note instance_uuid
2. nova_get_server(server_id=<instance-uuid>) → instance details
```

### Troubleshoot a Node in Error

```
1. ironic_get_node(node_id=<node-id>) → check fault field
2. Check maintenance flag — was it set automatically?
3. Check provision_state for last failed transition
4. hermes_list_events(target_type=baremetal/node, target_id=<node-id>)
```

### Find Nodes by Owner Project

```
1. ironic_list_nodes(owner=<project-id>)
2. Shows all nodes allocated to that project
3. Combine with provision_state filter for specific views
```
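
Gotchas 1, 2, 3, and 7 combine naturally into one triage pass over a node listing. A sketch, assuming node dicts carry the fields named above (the sample data is invented):

```python
def triage_nodes(nodes: list[dict]) -> list[str]:
    """Flag nodes needing attention, per the gotchas above."""
    findings = []
    for n in nodes:
        name = n.get("name") or n["uuid"]
        if n.get("provision_state") == "error":
            findings.append(f"{name}: ERROR, fault={n.get('fault')!r}")
        if n.get("maintenance"):
            reason = n.get("maintenance_reason") or "no reason set"
            findings.append(f"{name}: in maintenance ({reason})")
        # Gotcha 2: an active node should normally be powered on
        if n.get("provision_state") == "active" and n.get("power_state") != "power on":
            findings.append(f"{name}: active but power_state={n.get('power_state')!r}")
    return findings

sample = [
    {"uuid": "a1", "name": "bm-01", "provision_state": "active", "power_state": "power off"},
    {"uuid": "a2", "name": "bm-02", "provision_state": "error", "fault": "clean failure",
     "maintenance": True, "maintenance_reason": "auto: repeated clean failures"},
]
for line in triage_nodes(sample):
    print(line)
```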

## Troubleshooting

### Node in error state

- Check `fault` field first — it describes the failure
- Common faults: `power failure` (BMC unreachable), `clean failure` (disk wipe failed), `deploy failure` (image deployment failed)
- Check Hermes for the triggering event

### Node stuck in "deploying"

- Deployment may have timed out
- Check if the node lost network connectivity during deploy
- BMC may be unresponsive — the node can't be power-cycled

### Node in maintenance unexpectedly

- May have been set automatically after repeated failures
- Check `maintenance_reason` in node details
- Requires operator intervention to clear maintenance flag

### Power state is "None"

- BMC is unreachable — can't determine actual power state
- Check network connectivity to the BMC/management network
- Driver may need reconfiguration

## Security Considerations

- Node listings reveal physical infrastructure topology
- resource_class and driver info reveal hardware types and management protocols
- instance_uuid mapping reveals which workloads run on which physical hardware
- Physical access to BMC = full control of hardware — BMC credential exposure is critical
- Maintenance patterns reveal infrastructure health/reliability

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| Instance on a node | Nova | `nova_get_server(<instance-uuid>)` |
| Who modified node state | Hermes | `hermes_list_events(target_type=baremetal/node)` |
| Node network ports | Neutron | `neutron_list_ports(device_id=<node-id>)` |
| Compute quota for baremetal | Limes | `limes_get_project_quota(service=compute)` |

diff --git a/plugins/sapcc/skills/sapcc-dns/SKILL.md b/plugins/sapcc/skills/sapcc-dns/SKILL.md
new file mode 100644
index 0000000..6dabf01
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-dns/SKILL.md
@@ -0,0 +1,126 @@
---
name: sapcc-dns
description: >
  DNS zone and recordset management via Designate in SAP Converged Cloud.
  Triggers: dns, zone, recordset, designate, domain, A record, CNAME, MX, TXT, nameserver
version: 1.0.0
metadata:
  service: [designate]
  task: [manage, inspect, debug]
  persona: [developer, platform-engineer]
---

# SAP CC DNS (Designate)

Manage DNS zones and recordsets: list zones, inspect zone details, and query recordsets. Designate is OpenStack's multi-tenant DNS-as-a-Service.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `designate_list_zones` | List DNS zones in current project | `name`, `status`, `type` (returns: ID, name, email, TTL, status, type, serial, created_at) |
| `designate_get_zone` | Full detail for a single zone | `zone_id` (UUID) |
| `designate_list_recordsets` | List recordsets in a zone | `zone_id` (required), `name`, `type`, `status`, `data` |

## Gotchas

### 1. Zones are project-scoped — you cannot see other projects' zones

Each project manages its own DNS zones. If you expect to see a zone but don't, verify you're authenticated to the correct project with `keystone_token_info`.

### 2. Zone names MUST end with a dot

DNS convention requires fully qualified domain names (FQDNs) to end with a trailing dot. `example.com.` is correct; `example.com` may not match filters. The API returns names with trailing dots.
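
A tiny normalization helper for the trailing-dot convention above; illustrative only:

```python
def to_fqdn(name: str) -> str:
    """Ensure a zone/record name carries the trailing dot (gotcha 2)."""
    return name if name.endswith(".") else name + "."

assert to_fqdn("example.com") == "example.com."
assert to_fqdn("example.com.") == "example.com."
```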

### 3. Recordsets require a zone_id — you cannot list all recordsets globally

You must first identify the zone, then list its recordsets. Workflow: `designate_list_zones` → pick zone → `designate_list_recordsets(zone_id=...)`.

### 4. Status transitions: PENDING → ACTIVE

After creation or modification, zones and recordsets go through `PENDING` status before becoming `ACTIVE`. A `PENDING` record is not yet propagated to DNS servers.

### 5. Recordset types matter for filtering

Common types: `A` (IPv4), `AAAA` (IPv6), `CNAME` (alias), `MX` (mail), `TXT` (arbitrary text, SPF, DKIM), `SRV` (service locator), `NS` (nameserver delegation).

### 6. The `data` filter searches record values

Use `data` to find records pointing to a specific IP or target. For example, `data=10.0.1.5` finds all A records pointing to that IP. Useful for "what DNS names point to this server?"

### 7. TTL controls cache duration

TTL (Time To Live) in seconds controls how long resolvers cache the record. Low TTL (60-300s) = faster propagation of changes. High TTL (3600-86400s) = less DNS traffic but slower updates. Zone-level TTL is the default; recordset-level TTL overrides it.

### 8. Zone type PRIMARY vs SECONDARY

`PRIMARY` zones are authoritative — you manage records directly. `SECONDARY` zones are replicas of an external primary — records are read-only copies. Most user zones are PRIMARY.

## Common Workflows

### Discover DNS Zones in Project

```
1. designate_list_zones()
2. Review zones — note name, status, type
3. designate_get_zone(zone_id) for full detail
```

### Find All Records in a Zone

```
1. designate_list_zones() → identify target zone
2. designate_list_recordsets(zone_id=<zone-id>)
3. Scan results for A, CNAME, MX, TXT records
```

### "What DNS points to this IP?"

```
1. designate_list_zones() → get all zones
2. For each zone: designate_list_recordsets(zone_id=<zone-id>, data=<ip>)
3. Matches show which names resolve to that IP
```

### Verify DNS Configuration for a Service

```
1. designate_list_zones(name=<zone-name>) → find the zone
2. designate_list_recordsets(zone_id=<zone-id>, name=<record-name>) → find specific record
3. Check: correct type, correct data, status=ACTIVE
```
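
The "What DNS points to this IP?" workflow above is a nested sweep over zones and recordsets. A sketch with the two lookups injected as callables standing in for the MCP tools; the wrapper shape and canned data are hypothetical:

```python
from typing import Callable

def names_for_ip(ip: str,
                 list_zones: Callable[[], list[dict]],
                 list_recordsets: Callable[..., list[dict]]) -> list[str]:
    """Return every record name whose data matches the given IP."""
    matches = []
    for zone in list_zones():  # step 1: all zones in the project
        for rs in list_recordsets(zone_id=zone["id"], data=ip):  # step 2: filter by data
            matches.append(f'{rs["name"]} ({rs["type"]}) in zone {zone["name"]}')
    return matches

# Usage with canned data standing in for the MCP tools:
zones = [{"id": "z1", "name": "example.com."}]
rsets = {"z1": [{"name": "web.example.com.", "type": "A", "records": ["10.0.1.5"]}]}
print(names_for_ip(
    "10.0.1.5",
    lambda: zones,
    lambda zone_id, data: [r for r in rsets[zone_id] if data in r["records"]],
))
```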

## Troubleshooting

### Zone not found

- Verify zone name includes trailing dot: `example.com.`
- Check you're in the correct project: `keystone_token_info`
- Zone may be in another project — DNS is project-scoped

### Recordset status is PENDING for > 5 minutes

- May indicate a backend issue — check Hermes audit trail
- `hermes_list_events(target_type=dns/recordset, outcome=failure)`

### DNS not resolving despite ACTIVE status

- Check TTL — old cached value may not have expired at resolver
- Verify the zone's NS records point to correct nameservers
- Ensure the zone itself is ACTIVE (not just the recordset)

## Security Considerations

- DNS records reveal infrastructure topology (server IPs, service names)
- TXT records may contain verification tokens, SPF policies, or DKIM keys
- MX records expose mail server infrastructure
- Treat zone data as internal — it maps your service architecture

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| Server at an IP address | Nova | `nova_list_servers(ip=<ip>)` |
| Who modified DNS records | Hermes | `hermes_list_events(target_type=dns/recordset)` |
| Load balancer VIP in DNS | Octavia | `octavia_get_loadbalancer` → check VIP address |
| Network for a DNS-referenced IP | Neutron | `neutron_list_ports` |

diff --git a/plugins/sapcc/skills/sapcc-filesystems/SKILL.md b/plugins/sapcc/skills/sapcc-filesystems/SKILL.md
new file mode 100644
index 0000000..c4013c8
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-filesystems/SKILL.md
@@ -0,0 +1,125 @@
---
name: sapcc-filesystems
description: >
  Shared file system management via Manila in SAP Converged Cloud.
  Triggers: shared filesystem, manila, NFS, CIFS, file share, network storage, mount
version: 1.0.0
metadata:
  service: [manila]
  task: [inspect, manage, debug]
  persona: [developer, platform-engineer]
---

# SAP CC Shared Filesystems (Manila)

Manage Manila shared file systems: list shares, inspect details, and understand access states. Manila provides network-attached storage (NFS, CIFS) that can be mounted by multiple servers simultaneously.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `manila_list_shares` | List file shares in current project | `name`, `status`, `share_proto` (returns: ID, name, status, protocol, size, availability zone) |
| `manila_get_share` | Full detail for a single share | `share_id` (UUID) |

## Gotchas

### 1. Shares are network-mounted — different from block storage

Unlike Cinder volumes (attached to one server via iSCSI/FC), Manila shares are network file systems (NFS/CIFS) accessible by multiple servers simultaneously. Use Manila for shared data, Cinder for dedicated block devices.

### 2. Protocol determines mount method

| Protocol | Mount Style | Typical Use |
|----------|-------------|-------------|
| NFS | `mount -t nfs <export-path> /mnt` | Linux servers, most common |
| CIFS | `mount -t cifs //<server>/<share> /mnt` | Windows or mixed environments |
| GlusterFS | Gluster mount | Distributed storage |
| CephFS | Ceph FUSE/kernel mount | Ceph-backed storage |

### 3. Status "available" = ready to mount

Only shares in `available` status can be mounted.

### 4. Size is in GiB

Same as Cinder — all capacity reported in gibibytes.

### 5. Shares need access rules to be mountable

A share in `available` status still requires access rules (IP-based or user-based) before any server can mount it.

### 6. Availability zone affects which servers can mount

Cross-AZ mounting may not be supported. Ensure the share's AZ matches the servers that need it.

### 7. Share network provides the connection path

Shares are associated with a share network (a Neutron network/subnet). Servers must have connectivity to that network.

## Common Workflows

### List All File Shares

```
1. manila_list_shares()
2. Review: name, protocol, status, size, AZ
```

### Inspect a Specific Share

```
1. manila_list_shares(name=<name>) → find the share
2. manila_get_share(share_id=<share-id>) → full details
3. Note: export_location shows the mount path
```

### Find NFS Shares Available for Mounting

```
1. manila_list_shares(share_proto=NFS, status=available)
2. Note export_location for mount commands
3. Verify AZ matches your server's AZ
```
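
Given a share's protocol and its `export_location` (from `manila_get_share`), the mount command follows the protocol table above. A sketch; the export locations in the usage lines are invented examples:

```python
def mount_command(protocol: str, export_location: str, mountpoint: str = "/mnt") -> str:
    """Compose a mount command for NFS/CIFS shares (other protocols need their own clients)."""
    proto = protocol.upper()
    if proto == "NFS":
        return f"mount -t nfs {export_location} {mountpoint}"
    if proto == "CIFS":
        # CIFS generally also needs credentials; <user> is a placeholder
        return f"mount -t cifs {export_location} {mountpoint} -o username=<user>"
    raise ValueError(f"no generic mount command for {protocol}")

print(mount_command("NFS", "10.180.0.5:/shares/share-42"))
print(mount_command("CIFS", "//10.180.0.6/share-42"))
```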

### Troubleshoot a Share in Error State

```
1. manila_get_share(share_id=<share-id>) → check status
2. hermes_list_events(target_type=manila/share, target_id=<share-id>)
3. Common causes: backend capacity, network issue, quota exceeded
```

## Troubleshooting

### Share stuck in "creating"

- Backend provisioning may be slow for large shares
- If > 10 minutes, likely a backend issue
- Check Hermes: `hermes_list_events(target_type=manila/share, outcome=failure)`

### Cannot mount despite "available" status

- Check access rules (not visible via MCP tools)
- Verify network connectivity between server and share network
- Check security groups allow NFS traffic (port 2049) or CIFS (port 445)
- Verify server is in the same AZ

### Quota exhausted

- `limes_get_project_quota(service=sharev2)` → check usage vs quota

## Security Considerations

- Share export locations reveal network topology and storage paths
- NFS shares may have permissive access rules (entire subnet)
- Shared access means data accessible to multiple servers — ensure restrictive access rules
- Audit share creation/deletion via Hermes for compliance

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| Servers that can mount a share | Nova | `nova_list_servers` (filter by AZ) |
| Network for share connectivity | Neutron | `neutron_list_networks`, `neutron_list_subnets` |
| Who created/modified shares | Hermes | `hermes_list_events(target_type=manila/share)` |
| Share quota remaining | Limes | `limes_get_project_quota(service=sharev2)` |

diff --git a/plugins/sapcc/skills/sapcc-images/SKILL.md b/plugins/sapcc/skills/sapcc-images/SKILL.md
new file mode 100644
index 0000000..ce280ef
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-images/SKILL.md
@@ -0,0 +1,144 @@
---
name: sapcc-images
description: >
  Image management via Glance in SAP Converged Cloud.
  Triggers: image, glance, VM image, OS image, snapshot, AMI, disk image, boot image
version: 1.0.0
metadata:
  service: [glance]
  task: [inspect, manage, debug]
  persona: [developer, platform-engineer]
---

# SAP CC Images (Glance)

Inspect Glance images: list available images, check details, and understand image properties. Glance stores disk images used to boot Nova servers.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `glance_list_images` | List images available to current project | `name`, `status`, `visibility`, `owner` (returns: ID, name, status, visibility, disk/container format, size) |
| `glance_get_image` | Full detail for a single image | `image_id` (UUID) |

## Gotchas

### 1. Visibility controls who can see and use an image

| Visibility | Who can see | Who can use |
|------------|-------------|-------------|
| `public` | Everyone | Everyone |
| `private` | Owner project only | Owner project only |
| `shared` | Owner + explicitly shared projects | Owner + shared projects |
| `community` | Everyone | Everyone (but not in default listings) |

Most production images are `public` (provided by platform team) or `private` (project-specific snapshots).

### 2. Status "active" = ready to use

Only `active` images can be used to boot servers. Other statuses:
- `queued`: Metadata created, no data uploaded yet
- `saving`: Data currently being uploaded
- `deactivated`: Administratively disabled (cannot boot, but data exists)
- `killed`: Upload failed

### 3. Size is in bytes — can be very large

Image sizes are raw bytes. A typical Linux image is 2-10 GB. Convert: `size / 1024 / 1024 / 1024` for GiB.
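
Gotcha 3's conversion as a helper you can sanity-check against the `size` field:

```python
def bytes_to_gib(size: int) -> float:
    """Glance reports image size in raw bytes; convert to GiB (binary units)."""
    return size / 1024 ** 3

print(f"{bytes_to_gib(4_294_967_296):.1f} GiB")  # a 4 GiB image
```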
### 4. disk_format and container_format matter for compatibility

| disk_format | Description |
|-------------|-------------|
| `vmdk` | VMware (most common in SAP CC) |
| `raw` | Uncompressed disk |
| `qcow2` | QEMU/KVM compressed |
| `vhd` | Hyper-V |

Container format is almost always `bare` in practice.

### 5. Public images are platform-provided — don't delete them

Images with `visibility=public` are maintained by the SAP CC platform team. Your project uses them but doesn't own them. You can only modify/delete `private` images you own.

### 6. Image properties contain OS metadata

`glance_get_image` returns properties like `os_type`, `os_distro`, `os_version`, `hw_vif_model`, `hypervisor_type`. Use these to identify the operating system and compatibility requirements.

### 7. Snapshots are private images created from servers

When you snapshot a server, it creates a private Glance image. These consume image quota and storage. Old snapshots should be cleaned up.

### 8. owner field is a project UUID

The `owner` filter accepts a project UUID. Use `keystone_list_projects` to find project UUIDs if needed.

## Common Workflows

### Find Available Boot Images

```
1. glance_list_images(visibility=public, status=active)
2. Review: name, size, disk_format
3. Look for naming patterns: "Ubuntu 22.04", "SLES 15 SP5", "Windows 2022"
```

### Find Project Snapshots

```
1. glance_list_images(visibility=private, owner=<project-id>)
2. These are your project's server snapshots
3. Check sizes and dates for cleanup candidates
```

### Inspect Image Before Booting

```
1. glance_get_image(image_id=<image-id>)
2. Check: status=active, disk_format compatible with your hypervisor
3. Note: min_disk, min_ram requirements
4. Review os_distro, os_version for the OS
```
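
Step 3 of "Inspect Image Before Booting" is a mechanical comparison of the image's minimums against a flavor. A sketch using the field names mentioned above (`min_disk` in GB, `min_ram` in MB); the sample values are invented:

```python
def can_boot(image: dict, flavor: dict) -> list[str]:
    """Return reasons the flavor cannot boot the image; empty list means compatible."""
    problems = []
    if image.get("status") != "active":
        problems.append(f"image status is {image.get('status')!r}, not 'active'")
    if flavor["disk"] < image.get("min_disk", 0):  # both in GB
        problems.append(f"flavor disk {flavor['disk']} GB < min_disk {image['min_disk']} GB")
    if flavor["ram"] < image.get("min_ram", 0):    # both in MB
        problems.append(f"flavor ram {flavor['ram']} MB < min_ram {image['min_ram']} MB")
    return problems

image = {"status": "active", "min_disk": 10, "min_ram": 2048}
flavor = {"name": "m1.small", "disk": 20, "ram": 1024}
print(can_boot(image, flavor))  # ['flavor ram 1024 MB < min_ram 2048 MB']
```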

### Find Image Used by a Server

```
1. nova_get_server(server_id) → note image reference
2. glance_get_image(image_id) → full image details
```

## Troubleshooting

### Image not found

- Image may be private to another project
- Image may have been deleted — check Hermes audit trail
- Try without filters to see all accessible images

### Cannot boot server from image

- Check image status is `active` (not `deactivated` or `queued`)
- Check `min_disk` and `min_ram` — flavor must meet minimums
- Verify disk_format is compatible with the target hypervisor

### Image in "queued" status for a long time

- Upload may have failed silently
- Check Hermes: `hermes_list_events(target_type=image, target_id=<image-id>)`
- May need to delete and re-upload

## Security Considerations

- Private images may contain sensitive configurations or credentials baked in
- Image names and properties reveal infrastructure stack (OS versions, patch levels)
- Old, unpatched images are a security risk — check os_version against known CVEs
- Snapshots may capture ephemeral credentials or session data from running servers

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| Servers using an image | Nova | `nova_list_servers(image=<image-id>)` |
| Who created/deleted images | Hermes | `hermes_list_events(target_type=image)` |
| Image quota usage | Limes | `limes_get_project_quota(service=image)` |
| Flavors meeting min_disk/min_ram | Nova | `nova_list_flavors` |

diff --git a/plugins/sapcc/skills/sapcc-loadbalancer/SKILL.md b/plugins/sapcc/skills/sapcc-loadbalancer/SKILL.md
new file mode 100644
index 0000000..022e4d0
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-loadbalancer/SKILL.md
@@ -0,0 +1,164 @@
---
name: sapcc-loadbalancer
description: >
  Load balancer management via Octavia in SAP Converged Cloud.
  Triggers: load balancer, octavia, listener, pool, VIP, health monitor, L7, reverse proxy, LB
version: 1.0.0
metadata:
  service: [octavia]
  task: [inspect, manage, debug]
  persona: [developer, platform-engineer]
---

# SAP CC Load Balancers (Octavia)

Inspect Octavia load balancers: list LBs, view listeners and pools, and troubleshoot connectivity. Octavia provides L4/L7 load balancing as a service.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `octavia_list_loadbalancers` | List LBs in current project | `name`, `provisioning_status`, `operating_status`, `vip_address`, `vip_subnet_id`, `provider` |
| `octavia_get_loadbalancer` | Full detail for a single LB | `loadbalancer_id` (UUID) |
| `octavia_list_listeners` | List listeners | `loadbalancer_id`, `name`, `protocol`, `protocol_port` |
| `octavia_list_pools` | List backend pools | `loadbalancer_id`, `name`, `protocol`, `lb_algorithm` |

## Octavia Object Model

```
Load Balancer (VIP address)
├── Listener (protocol:port — e.g., HTTPS:443)
│   └── Pool (backend group)
│       ├── Member (backend server:port)
│       ├── Member
│       └── Health Monitor (checks member health)
└── Listener (HTTP:80)
    └── Pool
        └── Members...
```

## Gotchas

### 1. Two status fields — provisioning vs operating

| Field | Meaning |
|-------|---------|
| `provisioning_status` | Infrastructure state: ACTIVE, PENDING_CREATE, PENDING_UPDATE, ERROR |
| `operating_status` | Traffic state: ONLINE, OFFLINE, DEGRADED, ERROR, NO_MONITOR |

An LB can be `provisioning_status=ACTIVE` but `operating_status=DEGRADED` if some members are down.
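
The two status fields in gotcha 1 combine into a single health verdict. A sketch of that interpretation:

```python
def lb_health(lb: dict) -> str:
    """Classify a load balancer using both status fields (gotcha 1)."""
    prov, op = lb.get("provisioning_status"), lb.get("operating_status")
    if prov != "ACTIVE":
        return f"infrastructure problem: provisioning_status={prov}"
    if op == "ONLINE":
        return "healthy"
    if op == "DEGRADED":
        return "some pool members failing health checks"
    if op == "NO_MONITOR":
        return "no health monitor configured; member health unknown"
    return f"not serving traffic: operating_status={op}"

print(lb_health({"provisioning_status": "ACTIVE", "operating_status": "DEGRADED"}))
```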

### 2. VIP address is the entry point

The Virtual IP (VIP) is what clients connect to. It's on a specific subnet. DNS should point to this address. The VIP does NOT change when backend members are added/removed.

### 3. Listeners define what traffic to accept

Each listener binds protocol+port. You cannot have two listeners on the same port. Common patterns:
- HTTPS:443 (with TLS termination via Barbican cert)
- HTTP:80 (redirect to HTTPS or direct)
- TCP:3306 (database passthrough)

### 4. Pools define where traffic goes

A pool groups backend members (servers). Key attributes:
- `lb_algorithm`: ROUND_ROBIN, LEAST_CONNECTIONS, SOURCE_IP
- Members are server IP:port pairs
- Health monitor checks member availability

### 5. TERMINATED_HTTPS means TLS terminates at the LB

The LB decrypts HTTPS, forwards plain HTTP to backends. Requires a Barbican certificate reference. Backends only need to handle HTTP.

### 6. Operating status NO_MONITOR = no health checks configured

Without a health monitor, the LB cannot detect down members. Traffic goes to all members regardless of health. Always configure health monitors in production.

### 7. Providers: amphora vs ovn

- `amphora`: Full-featured (L7 rules, TLS termination, health monitors). Uses dedicated VMs.
- `ovn`: Lightweight L4 only. Lower overhead but fewer features.

### 8. Filter listeners/pools by loadbalancer_id

Without `loadbalancer_id` filter, you get ALL listeners/pools across all LBs in the project. Always filter by LB for a specific investigation.

## Common Workflows

### Inventory Load Balancers

```
1. octavia_list_loadbalancers()
2. Review: name, VIP address, provisioning/operating status
3. Flag any with operating_status != ONLINE
```

### Full LB Topology

```
1. octavia_get_loadbalancer(loadbalancer_id=<lb-id>) → VIP, status
2. octavia_list_listeners(loadbalancer_id=<lb-id>) → what ports are open
3. octavia_list_pools(loadbalancer_id=<lb-id>) → backend groups + members
```
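
The "Full LB Topology" walk above, sketched as code. The three callables stand in for the corresponding MCP tools; the wiring and sample data are hypothetical:

```python
from typing import Callable

def print_topology(lb_id: str,
                   get_lb: Callable[[str], dict],
                   list_listeners: Callable[[str], list[dict]],
                   list_pools: Callable[[str], list[dict]]) -> None:
    """Walk LB → listeners → pools, mirroring the object model above."""
    lb = get_lb(lb_id)
    print(f'LB {lb["name"]} VIP={lb["vip_address"]} ({lb["operating_status"]})')
    for lis in list_listeners(lb_id):
        print(f'  listener {lis["protocol"]}:{lis["protocol_port"]}')
    for pool in list_pools(lb_id):
        print(f'  pool {pool["name"]} algo={pool["lb_algorithm"]}')

# Usage with canned data in place of real tool calls:
print_topology(
    "lb-1",
    lambda i: {"name": "web-lb", "vip_address": "10.0.0.10", "operating_status": "ONLINE"},
    lambda i: [{"protocol": "HTTPS", "protocol_port": 443}],
    lambda i: [{"name": "web-pool", "lb_algorithm": "ROUND_ROBIN"}],
)
```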

### Find LB by VIP Address

```
1. octavia_list_loadbalancers(vip_address=<ip>)
2. Or: neutron_list_ports() and match fixed_ips to the VIP
```

### Troubleshoot Degraded LB

```
1. octavia_get_loadbalancer(loadbalancer_id) → check operating_status
2. octavia_list_pools(loadbalancer_id) → check pool operating_status
3. If DEGRADED: some members are failing health checks
4. Verify backend servers are running: nova_list_servers
5. Check security groups allow health check traffic: neutron_list_security_groups
```

## Troubleshooting

### LB provisioning_status is ERROR

- Check Hermes: `hermes_list_events(target_type=loadbalancer)` for failure details
- Common causes: subnet full (no IP for VIP), quota exhausted, backend unavailable
- May need to delete and recreate

### LB operating_status is OFFLINE

- All members are failing health checks
- Check backend servers are running and healthy
- Verify security groups allow traffic from the LB subnet to member ports

### operating_status is DEGRADED

- Some but not all members are unhealthy
- Identify failing members via pool status
- Common: one server crashed or is overloaded

### Listener on port 443 but no HTTPS

- Check if listener protocol is `TERMINATED_HTTPS` (needs Barbican cert)
- Or `TCP` (passthrough — TLS handled by backend)
- `HTTP` on 443 works but is plain HTTP on a non-standard port

## Security Considerations

- VIP addresses reveal public-facing services
- Listener protocols reveal what services are exposed
- Pool members reveal backend server topology
- Health monitor endpoints may be unauthenticated — check they don't expose sensitive data
- TERMINATED_HTTPS references Barbican certificates — cert rotation matters

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| VIP subnet details | Neutron | `neutron_list_subnets` |
| Backend server status | Nova | `nova_get_server(<member-server-id>)` |
| TLS certificates | Barbican | `barbican_get_secret` (cert referenced by listener) |
| Who modified the LB | Hermes | `hermes_list_events(target_type=loadbalancer)` |
| DNS pointing to VIP | Designate | `designate_list_recordsets(data=<vip>)` |
| LB quota | Limes | `limes_get_project_quota(service=network)` |

diff --git a/plugins/sapcc/skills/sapcc-object-storage/SKILL.md b/plugins/sapcc/skills/sapcc-object-storage/SKILL.md
new file mode 100644
index 0000000..c9259ba
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-object-storage/SKILL.md
@@ -0,0 +1,126 @@
---
name: sapcc-object-storage
description: >
  Object storage operations via Swift in SAP Converged Cloud.
  Triggers: object storage, swift, container, blob, S3, bucket, object, file upload
version: 1.0.0
metadata:
  service: [swift]
  task: [inspect, manage, debug]
  persona: [developer, platform-engineer]
---

# SAP CC Object Storage (Swift)

Inspect Swift object storage: list containers, browse objects, and retrieve object metadata. Swift provides S3-compatible, project-scoped object/blob storage.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `swift_list_containers` | List containers in current account | `prefix`, `limit` (returns: name, object count, total bytes) |
| `swift_list_objects` | List objects in a container | `container` (required), `prefix`, `delimiter`, `limit` (returns: name, bytes, content_type, last_modified, hash) |
| `swift_get_object_metadata` | Get metadata for a specific object | `container` (required), `object` (required) (returns: content_type, content_length, etag, last_modified) |

## Gotchas

### 1. These tools provide metadata only — no object content retrieval

You can list containers, list objects, and get object metadata. You CANNOT download or read object content through these tools. This prevents accidentally dumping large binary files into the LLM context.
### 2. Containers are flat namespaces with pseudo-directories

Swift does not have real directories. Use `delimiter=/` to create pseudo-directory listings. Objects named `logs/2024/01/data.json` appear as directory `logs/` when using `delimiter=/`.

### 3. Object count and bytes are at container level

`swift_list_containers` shows aggregate stats per container. Use for quick capacity assessment.

### 4. The `prefix` filter enables efficient browsing

Use `prefix` to navigate pseudo-directories:
- `prefix=logs/` → all objects starting with "logs/"
- `prefix=logs/2024/` → narrow to a year
- Combined with `delimiter=/` → see only immediate "children"

### 5. `limit` defaults to 100

Swift containers can hold millions of objects. Default returns first 100. Use prefix+delimiter for efficient navigation.

### 6. `hash` is the MD5 of the object content (etag)

Use it to verify object integrity or detect changes without downloading content.

### 7. Container names are URL-safe strings, not UUIDs

Unlike most OpenStack resources, containers are identified by name (not UUID). Case-sensitive.

### 8. Object storage is eventually consistent for overwrites

Read-after-write is consistent for new objects. Overwrites/deletes may take seconds to propagate.

## Common Workflows

### Inventory Storage Containers

```
1. swift_list_containers()
2. Review: container names, object counts, total bytes
3. Identify large containers or unusual names
```

### Browse Container Contents

```
1. swift_list_containers() → identify target
2. swift_list_objects(container=<container>, delimiter="/") → top-level
3. swift_list_objects(container=<container>, prefix=<dir/>, delimiter="/") → drill down
```

### Check Object Details

```
1. swift_list_objects(container=<container>, prefix=<name>) → find object
2. swift_get_object_metadata(container=<container>, object=<object-name>)
3. Review: size, content_type, last_modified
```
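
Pseudo-directory browsing (gotchas 2 and 4) amounts to repeated listing with a growing `prefix`. A sketch against canned data; a real `swift_list_objects` response carries richer entries, but the prefix/delimiter mechanics are the same:

```python
def browse(list_objects, container: str, prefix: str = "") -> None:
    """Print one pseudo-directory level: objects plus 'subdir/' entries (delimiter='/')."""
    for entry in list_objects(container=container, prefix=prefix, delimiter="/"):
        kind = "dir" if entry.endswith("/") else "obj"
        print(f"[{kind}]  {entry}")

# Canned data simulating a container with pseudo-directories
names = ["logs/2024/01/data.json", "logs/2024/02/data.json", "readme.txt"]

def fake_list(container, prefix, delimiter):
    seen, out = set(), []
    for n in names:
        if not n.startswith(prefix):
            continue
        rest = n[len(prefix):]
        if delimiter in rest:  # collapse deeper paths into one pseudo-directory entry
            d = prefix + rest.split(delimiter)[0] + delimiter
            if d not in seen:
                seen.add(d)
                out.append(d)
        else:
            out.append(n)
    return out

browse(fake_list, "backups")                 # top level: logs/ and readme.txt
browse(fake_list, "backups", "logs/2024/")   # drill down: 01/, 02/
```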

### Assess Storage Usage

```
1. swift_list_containers() → total bytes per container
2. Sum across containers for project total
3. Compare with quota: limes_get_project_quota(service=object-store)
```

## Troubleshooting

### Container not found

- Container names are case-sensitive — verify exact casing
- Check project scope: `keystone_token_info`

### Object listing returns empty

- Container may genuinely be empty
- `prefix` filter may be too restrictive — try without prefix

### Large container — cannot see all objects

- Use `prefix` + `delimiter` to navigate instead of listing all

## Security Considerations

- Object names and container names may reveal data classification
- `last_modified` timestamps reveal activity patterns
- Container listings expose data inventory — treat as confidential
- Never attempt to retrieve object content that might contain secrets

## Cross-Service References

| Need | Service | Tool |
|------|---------|------|
| Object storage quota | Limes | `limes_get_project_quota(service=object-store)` |
| Who uploaded/deleted objects | Hermes | `hermes_list_events(target_type=object-store/object)` |
| Container used by Keppel | Keppel | `keppel_list_accounts` (registry backing storage) |
| Project context | Keystone | `keystone_token_info` |

diff --git a/plugins/sapcc/skills/sapcc-secrets/SKILL.md b/plugins/sapcc/skills/sapcc-secrets/SKILL.md
new file mode 100644
index 0000000..3a1b619
--- /dev/null
+++ b/plugins/sapcc/skills/sapcc-secrets/SKILL.md
@@ -0,0 +1,120 @@
---
name: sapcc-secrets
description: >
  Secret metadata management via Barbican (Key Manager) in SAP Converged Cloud.
  Triggers: secret, key, certificate, barbican, key manager, encryption, passphrase, credential store
version: 1.0.0
metadata:
  service: [barbican]
  task: [inspect, audit, manage]
  persona: [developer, security, platform-engineer]
---

# SAP CC Secrets (Barbican)

Inspect secret metadata stored in Barbican (OpenStack Key Manager). The MCP server provides metadata-only access — **secret payloads are never returned** for security.

## MCP Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `barbican_list_secrets` | List secrets (metadata only) | `name`, `secret_type` (returns: secret_ref, name, status, secret_type, algorithm, bit_length, created, expiration) |
| `barbican_get_secret` | Get metadata for a single secret | `secret_id` (UUID) (returns: name, status, secret_type, algorithm, bit_length, mode, created, updated, expiration, content_types) |

## Critical Security Note

**The secret payload (the actual key/certificate/password value) is NEVER returned by these tools.** This is an intentional security boundary. The MCP server only exposes metadata — enough to inventory and audit secrets, but never enough to extract sensitive material.

If a user asks to "show me the secret value" or "get the password from Barbican" — explain that this is not possible through these tools and is by design.

## Gotchas

### 1. Payload is never returned — metadata only

You will see name, type, algorithm, bit_length, expiration — but never the actual secret value. This is a security feature, not a limitation.

### 2. Secret types determine usage context

| Type | Typical Use |
|------|-------------|
| `symmetric` | Encryption keys (AES, etc.) |
| `public` | Public keys (RSA, EC) |
| `private` | Private keys (RSA, EC) |
| `passphrase` | Passwords, tokens |
| `certificate` | X.509 certificates |
| `opaque` | Arbitrary binary data |

### 3. Status "ACTIVE" means ready for use

Secrets go through: `PENDING` → `ACTIVE`. 
Only `ACTIVE` secrets can be consumed. + +### 4. Expiration is informational — Barbican does not auto-delete + +A secret past its `expiration` date is still retrievable. The expiration field is advisory — it tells you the secret SHOULD have been rotated, but Barbican does not enforce it. + +### 5. secret_ref contains the UUID + +The `secret_ref` field is a full URL. Extract the UUID from the end for use with `barbican_get_secret`. + +### 6. Secrets are project-scoped + +You only see secrets belonging to your current project. Cross-project secret sharing requires explicit ACLs. + +### 7. algorithm + bit_length describe cryptographic strength + +For symmetric keys: `AES` + `256` = AES-256. For asymmetric: `RSA` + `4096` = RSA-4096. Use this to audit security standards compliance. + +## Common Workflows + +### Inventory All Secrets + +``` +1. barbican_list_secrets() +2. Review: name, type, status, expiration +3. Flag any with expired dates or weak algorithms +``` + +### Find Certificates Nearing Expiration + +``` +1. barbican_list_secrets(secret_type=certificate) +2. Check expiration field for each +3. Certificates past or near expiration need rotation +``` + +### Audit Secret Usage for Compliance + +``` +1. barbican_list_secrets() → full inventory +2. barbican_get_secret(secret_id) for each → detailed metadata +3. Cross-reference with Hermes: hermes_list_events(target_type=key-manager/secret) +``` + +## Troubleshooting + +### No secrets found + +- Project may not use Barbican +- Check project scope: `keystone_token_info` +- Secrets may be stored under different names than expected + +### Secret status is not ACTIVE + +- `PENDING`: Store operation may have failed +- Check Hermes: `hermes_list_events(target_type=key-manager/secret, outcome=failure)` + +## Security Considerations + +- Even metadata reveals security posture: weak algorithms, expired certs, naming patterns +- Secret names may reveal infrastructure details (service names, environments) +- Report expired secrets and weak algorithms (< AES-128, < RSA-2048) as findings +- Never suggest workarounds to extract secret payloads + +## Cross-Service References + +| Need | Service | Tool | +|------|---------|------| +| Who accessed/created secrets | Hermes | `hermes_list_events(target_type=key-manager/secret)` | +| Secret quota usage | Limes | `limes_get_project_quota(service=key-manager)` | +| TLS certificates for LBs | Octavia | `octavia_list_listeners` (TERMINATED_HTTPS uses Barbican) | +| Project identity context | Keystone | `keystone_token_info` |
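
As a worked example of the inventory and expiration workflows above: a sketch that audits `barbican_list_secrets` metadata, using the field names from the tool table and the thresholds from Security Considerations (< AES-128, < RSA-2048). The sample records are invented:

```python
from datetime import datetime, timezone

WEAK_MIN_BITS = {"AES": 128, "RSA": 2048}

def audit_secrets(secrets: list[dict]) -> list[str]:
    """Flag expired secrets and weak algorithms; payloads are never involved."""
    now = datetime.now(timezone.utc)
    findings = []
    for s in secrets:
        label = f'{s["name"]} ({s.get("secret_type")})'
        exp = s.get("expiration")
        if exp and datetime.fromisoformat(exp.replace("Z", "+00:00")) < now:
            findings.append(f"{label}: expired {exp}")
        algo = (s.get("algorithm") or "").upper()
        bits = s.get("bit_length") or 0
        if algo in WEAK_MIN_BITS and bits < WEAK_MIN_BITS[algo]:
            findings.append(f"{label}: weak {algo}-{bits}")
    return findings

sample = [
    {"name": "tls-prod", "secret_type": "certificate", "expiration": "2023-01-01T00:00:00Z"},
    {"name": "db-key", "secret_type": "symmetric", "algorithm": "aes", "bit_length": 64},
]
for finding in audit_secrets(sample):
    print(finding)
```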