Kubernetes Process with Resource Ratio Anomalies

Original Source: [splunk source]
Name:Kubernetes Process with Resource Ratio Anomalies
id:0d42b295-0f1f-4183-b75e-377975f47c65
version:4
date:2024-10-17
author:Matthew Moore, Splunk
status:experimental
type:Anomaly
Description:The following analytic detects anomalous changes in resource utilization ratios for processes running on a Kubernetes node. It leverages process metrics collected via an OTEL collector and hostmetrics receiver, analyzed through Splunk Observability Cloud. The detection uses a lookup table containing average and standard deviation values for various resource ratios (e.g., CPU:memory, CPU:disk operations). Significant deviations from these baselines may indicate compromised processes, malicious activity, or misconfigurations. If confirmed malicious, this could signify a security breach, allowing attackers to manipulate workloads, potentially leading to data exfiltration or service disruption.
Data_source:
search:| mstats avg(process.*) as process.* where `kubernetes_metrics` by host.name k8s.cluster.name k8s.node.name process.executable.name span=10s
| eval cpu:mem = 'process.cpu.utilization'/'process.memory.utilization'
| eval cpu:disk = 'process.cpu.utilization'/'process.disk.operations'
| eval mem:disk = 'process.memory.utilization'/'process.disk.operations'
| eval cpu:threads = 'process.cpu.utilization'/'process.threads'
| eval disk:threads = 'process.disk.operations'/'process.threads'
| eval key = 'k8s.cluster.name' + ":" + 'host.name' + ":" + 'process.executable.name'
| lookup k8s_process_resource_ratio_baseline key
| fillnull
| eval anomalies = ""
| foreach stdev_* [ eval anomalies =if( '<<MATCHSTR>>' > ('avg_<<MATCHSTR>>' + 4 * 'stdev_<<MATCHSTR>>'), anomalies + "<<MATCHSTR>> ratio higher than average by " + tostring(round(('<<MATCHSTR>>' - 'avg_<<MATCHSTR>>')/'stdev_<<MATCHSTR>>' ,2)) + " Standard Deviations. <<MATCHSTR>>=" + tostring('<<MATCHSTR>>') + " avg_<<MATCHSTR>>=" + tostring('avg_<<MATCHSTR>>') + " 'stdev_<<MATCHSTR>>'=" + tostring('stdev_<<MATCHSTR>>') + ", " , anomalies) ]
| eval anomalies = replace(anomalies, ",\s$", "")
| where anomalies!=""
| stats count values(anomalies) as anomalies by host.name k8s.cluster.name k8s.node.name process.executable.name
| where count > 5
| rename host.name as host
| `kubernetes_process_with_resource_ratio_anomalies_filter`


how_to_implement:To implement this detection, follow these steps: * Deploy the OpenTelemetry Collector (OTEL) to your Kubernetes cluster. * Enable the hostmetrics/process receiver in the OTEL configuration. * Ensure that the process metrics, specifically Process.cpu.utilization and process.memory.utilization, are enabled. * Install the Splunk Infrastructure Monitoring (SIM) add-on. (ref: https://splunkbase.splunk.com/app/5247) * Configure the SIM add-on with your Observability Cloud Organization ID and Access Token. * Set up the SIM modular input to ingest Process Metrics. Name this input "sim_process_metrics_to_metrics_index". * In the SIM configuration, set the Organization ID to your Observability Cloud Organization ID. * Set the Signal Flow Program to the following: data('process.threads').publish(label='A'); data('process.cpu.utilization').publish(label='B'); data('process.cpu.time').publish(label='C'); data('process.disk.io').publish(label='D'); data('process.memory.usage').publish(label='E'); data('process.memory.virtual').publish(label='F'); data('process.memory.utilization').publish(label='G'); data('process.cpu.utilization').publish(label='H'); data('process.disk.operations').publish(label='I'); data('process.handles').publish(label='J'); data('process.threads').publish(label='K') * Set the Metric Resolution to 10000. * Leave all other settings at their default values. * Run the Search Baseline Of Kubernetes Container Network IO Ratio
known_false_positives:unknown
References:
  -https://github.com/signalfx/splunk-otel-collector-chart
drilldown_searches:
  :
tags:
  analytic_story:
    - 'Abnormal Kubernetes Behavior using Splunk Infrastructure Monitoring'
  asset_type:Kubernetes
  confidence:50
  impact:50
  message:Kubernetes Process with Resource Ratio Anomalies on host $host$
  mitre_attack_id:
    - 'T1204'
  observable:
    name:'host'
    type:'Hostname'
    - role:
      - 'Victim'
  product:
    - 'Splunk Enterprise'
    - 'Splunk Enterprise Security'
    - 'Splunk Cloud'
  required_fields:
    - 'process.*'
    - 'host.name'
    - 'k8s.cluster.name'
    - 'k8s.node.name'
    - 'process.executable.name'
  risk_score:25
  security_domain:network

tests:
  :
manual_test:None