
Sensor: Ephemeral / Cloud-Init

Ephemeral nodes are hosts that exist for a bounded period: cloud instances that spin up and down on demand, CI/CD test workers, short-lived lab VMs, and autoscaled bare-metal pools. The Telovix Sensor installs and enrolls normally on these nodes.

There is no special "ephemeral mode" in the sensor or Console. Ephemeral nodes use the same enrollment flow as long-lived hosts. The differences are operational: token scope, naming conventions, and how you manage stale records in Sensors.

Prerequisites

  • Enrollment token already created in the Console (one-time or cluster token), or operator role to create one via the Console UI or API
  • Target nodes running Linux with systemd
  • Kernel version 5.4 or later on each target node
  • BTF (BPF Type Format) enabled on each target node (/sys/kernel/btf/vmlinux must exist)
  • Outbound TCP connectivity from each target node to the Console on port 15483
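The prerequisites above can be verified in a provisioning hook before the install script runs. The following is a sketch: console.example.com is a placeholder for your Console hostname, and the version parsing assumes a standard `uname -r` format.

```shell
#!/bin/bash
# Preflight sketch for the prerequisites above. console.example.com is a
# placeholder; substitute your Console hostname.

kernel_at_least() {
  # kernel_at_least MAJOR MINOR "x.y.z-..." -> success if x.y >= MAJOR.MINOR
  local maj min rel="$3"
  maj="${rel%%.*}"
  min="${rel#*.}"; min="${min%%[.-]*}"
  [ "$maj" -gt "$1" ] || { [ "$maj" -eq "$1" ] && [ "$min" -ge "$2" ]; }
}

kernel_at_least 5 4 "$(uname -r)" || echo "kernel too old: $(uname -r) (need >= 5.4)"
[ -d /run/systemd/system ]      || echo "systemd not detected"
[ -f /sys/kernel/btf/vmlinux ]  || echo "BTF not available (/sys/kernel/btf/vmlinux)"
# TCP reachability probe using bash's /dev/tcp (no netcat required)
timeout 3 bash -c 'exec 3<>/dev/tcp/console.example.com/15483' 2>/dev/null \
  || echo "cannot reach console.example.com:15483"
```

Run this in the same provisioning phase as the install script; any line it prints points at a prerequisite to fix first.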

Enrollment token strategy

Two token types are relevant for ephemeral infrastructure:

| Token type | TTL | Reuse | Best for |
| --- | --- | --- | --- |
| One-time bare-metal token | 15 minutes (configurable) | Single use, consumed at enrollment | Controlled one-at-a-time provisioning |
| Kubernetes cluster token | 365 days | Reusable by all nodes in the cluster | DaemonSets, autoscaled node pools using Kubernetes |

For bare-metal or VM autoscaling outside Kubernetes, there is no persistent reusable "fleet token" beyond the cluster token type. Generate enrollment tokens as needed, or automate token generation via the API and inject them into instance user data at launch time.

::: note
Do not share enrollment tokens across environments. A token generated for a lab node should not be reused in production provisioning. Tokens are short-lived by design; use the API to generate fresh ones per launch batch rather than embedding a single long-lived token in a machine image.
:::


Node naming for ephemeral fleets

The node name shown in the Console defaults to the system hostname. On ephemeral nodes, hostnames are often random or recycled, which makes the Sensors list difficult to interpret after the fact.

Recommended approach: construct a stable, unique name from instance metadata available at provisioning time and pass it to the install script. Examples:

```bash
# AWS: use the instance ID (instances enforcing IMDSv2 also require a
# session token header; see the AWS instance metadata documentation)
NODE_NAME="$(curl -sf http://169.254.169.254/latest/meta-data/instance-id)-worker"

# GCP: use the instance name from metadata
NODE_NAME="$(curl -sf "http://metadata.google.internal/computeMetadata/v1/instance/name" \
  -H "Metadata-Flavor: Google")"

# Generic: hostname + timestamp
NODE_NAME="$(hostname -s)-$(date +%Y%m%d%H%M%S)"

# CI: use the CI pipeline and job ID
NODE_NAME="ci-${CI_PIPELINE_ID:-0}-${CI_JOB_ID:-0}"
```

Use tags to classify the node for filtering. For example: env:lab,lifecycle:ephemeral,pool:ci-workers. Tags are applied once at enrollment and can be edited afterward from Sensors > [sensor] > Edit Tags. They appear in the Sensors filter bar and in alert context.


Cloud-init / user data pattern

The most reliable way to enroll an ephemeral node is to run the install script in the provisioning phase, before the workload starts. The following patterns work with any cloud-init compatible provisioning system.

cloud-config (YAML format)

The install script accepts enrollment token, node role, node name, and tags as inline options. Resolve the node name from instance metadata before calling the script.

```yaml
#cloud-config
runcmd:
  - |
    # runcmd entries execute under /bin/sh, which lacks process
    # substitution, so pipe the install script into bash instead
    NODE_NAME="$(curl -sf http://169.254.169.254/latest/meta-data/instance-id 2>/dev/null || hostname)"
    curl -fsSL https://console.example.com:15483/api/v1/sensor/install.sh | bash -s -- \
      --token "<token>" \
      --role generic_linux \
      --name "${NODE_NAME}" \
      --tags "env:lab,pool:autoscale"
```

Shell-based user data

```bash
#!/bin/bash
set -euo pipefail

# Resolve node identity from instance metadata
INSTANCE_ID="$(curl -sf http://169.254.169.254/latest/meta-data/instance-id 2>/dev/null || hostname)"

bash <(curl -fsSL https://console.example.com:15483/api/v1/sensor/install.sh) \
  --token "<token>" \
  --role generic_linux \
  --name "${INSTANCE_ID}-worker" \
  --tags "env:lab,pool:autoscale"
```

Packer image build

To bake the sensor into a machine image, stage the binary and its directories during the build phase so each node starts with the sensor already installed. The enrollment token, however, must still be injected at launch time rather than baked into the image, because:

  • The token is single-use and expires in 15 minutes
  • Baking a token into an image shares that token across all instances launched from it

Pattern for Packer builds:

```json
{
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "apt-get install -y curl",
        "curl -fsSL -o /usr/local/bin/telovix-sensor https://console.example.com:15483/api/v1/sensor/download/amd64",
        "chmod +x /usr/local/bin/telovix-sensor",
        "mkdir -p /etc/telovix-sensor /var/lib/telovix-sensor"
      ]
    }
  ]
}
```

Then inject the enrollment details at launch time via user data and start the service:

```bash
# At instance launch (via user data):
INSTANCE_ID="$(curl -sf http://169.254.169.254/latest/meta-data/instance-id)"

bash <(curl -fsSL https://console.example.com:15483/api/v1/sensor/install.sh) \
  --token "<fresh-token>" \
  --role generic_linux \
  --name "${INSTANCE_ID}-worker" \
  --tags "env:lab,pool:packer-base"
```

Automating token generation

For large-scale ephemeral provisioning, generate enrollment tokens from the Console rather than pre-creating them manually. In the Console, go to Sensors > Enrollment Tokens > Generate Token to create a new one-time token. For automated provisioning pipelines, use the Console API (see the Console REST API reference) to generate tokens programmatically and inject them into instance user data at launch time.

Tokens are valid for 15 minutes by default (configurable from Console Settings > Enrollment). For large autoscale groups launching many instances simultaneously, generate tokens in bulk before the launch batch or extend the TTL for that window.
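As a sketch of what that automation can look like: the endpoint path, request body, and response field below are assumptions, so check the Console REST API reference for the actual contract.

```shell
#!/bin/bash
# Sketch: mint a fresh one-time enrollment token per launch batch.
# /api/v1/enrollment-tokens and the "token" response field are assumed
# names; confirm them against the Console REST API reference.

CONSOLE="https://console.example.com:15483"

extract_token() {
  # Pull the "token" field out of a JSON response without requiring jq
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

generate_token() {
  curl -sf -X POST "${CONSOLE}/api/v1/enrollment-tokens" \
    -H "Authorization: Bearer ${TELOVIX_API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{"type": "one_time"}' | extract_token
}

# Usage in a launch pipeline:
#   TOKEN="$(generate_token)"
#   ...inject "$TOKEN" into the instance user data for this batch...
```

Keep the API key in your provisioning system's secret store, not in the image or user data.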


Fleet state for ephemeral nodes

When an ephemeral node terminates without running the uninstall script, its sensor record remains in Sensors. The Console uses the heartbeat timeout to change the sensor's status:

| Status | Condition |
| --- | --- |
| healthy | Heartbeats arriving normally |
| degraded (watch) | No heartbeat for approximately 45 seconds (3 missed cycles at the 15-second interval) |
| stale | No heartbeat for approximately 90 seconds (configurable from Console Settings) |
| offline | No heartbeat for approximately 4x the stale threshold (~6 minutes by default) |

The retention loop removes sensor metrics after 30 days (configurable from Console Settings > Retention). However, sensor records (the fleet entry, certificates, and audit history) are not automatically deleted. On high-volume ephemeral fleets, orphaned records accumulate in Sensors.

To delete orphaned sensor records:

  • From the Console UI: Sensors > select the sensor > Actions > Delete. This requires the admin role.

Deletion is permanent and is recorded in the audit log with action type sensor_deleted.

For bulk cleanup of stale sensors, filter Sensors by status stale using the Status filter, then select and delete the records in bulk. Alternatively, use the Console REST API (see the API reference) to list sensors by status and delete them programmatically.
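The API route can be scripted. This sketch assumes a sensor list endpoint with a status filter and a per-sensor DELETE route; the exact paths and response shape are assumptions to verify against the API reference.

```shell
#!/bin/bash
# Sketch: bulk-delete stale sensor records. The /api/v1/sensors paths and
# the "id" response field are assumed; requires an admin-scoped API key.

CONSOLE="https://console.example.com:15483"

sensor_ids() {
  # Extract every "id" value from a JSON sensor list on stdin
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

delete_sensor() {
  curl -sf -X DELETE \
    -H "Authorization: Bearer ${TELOVIX_API_KEY}" \
    "${CONSOLE}/api/v1/sensors/$1"
}

# Usage:
#   curl -sf -H "Authorization: Bearer ${TELOVIX_API_KEY}" \
#     "${CONSOLE}/api/v1/sensors?status=stale" \
#     | sensor_ids | while read -r id; do delete_sensor "$id"; done
```

Because each deletion lands in the audit log as sensor_deleted, a scheduled run of this loop leaves a traceable record of the cleanup.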


Graceful cleanup on instance termination

If your instance lifecycle tooling supports pre-termination hooks (AWS Auto Scaling lifecycle hooks, GCP shutdown scripts, Azure scheduled-event terminate notifications), run the uninstall script before the instance terminates:

```bash
# In a shutdown/termination hook:
bash <(curl -fsSL \
  --cert /var/lib/telovix-sensor/client.cert.pem \
  --key /var/lib/telovix-sensor/client.key.pem \
  --cacert /etc/telovix-sensor/console-ca.cert.pem \
  "https://console.example.com:15483/api/v1/sensor/uninstall.sh")
```

The uninstall script notifies the Console to decommission the sensor over mTLS, removes the sensor record, then removes all local files. If the Console is unreachable, the script completes the local cleanup anyway and logs a warning. You may need to delete the sensor record manually from the Console in that case.
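On hosts without cloud-level hooks, a systemd unit can approximate the same behavior: a oneshot service whose ExecStop runs the uninstall at shutdown, ordered after network-online.target so the network is still up when the stop job runs. A sketch (the unit name is illustrative; the certificate paths match the hook example above):

```ini
# /etc/systemd/system/telovix-decommission.service (illustrative name)
[Unit]
Description=Decommission Telovix sensor at shutdown
# Stop jobs run in reverse start order, so ordering this unit after
# network-online.target keeps the network up while ExecStop runs.
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/bash -c 'bash <(curl -fsSL \
  --cert /var/lib/telovix-sensor/client.cert.pem \
  --key /var/lib/telovix-sensor/client.key.pem \
  --cacert /etc/telovix-sensor/console-ca.cert.pem \
  https://console.example.com:15483/api/v1/sensor/uninstall.sh)'

[Install]
WantedBy=multi-user.target
```

Enable the unit (systemctl enable telovix-decommission) so its stop job is queued on every shutdown.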


Practical recommendations

| Concern | Recommendation |
| --- | --- |
| Token scope | Generate fresh tokens per launch batch, not per instance. Do not bake tokens into images. |
| Node naming | Build the name from stable instance metadata (instance ID, job ID) rather than from hostnames. |
| Tags | Use env:lab, lifecycle:ephemeral, or pool:<name> to separate ephemeral nodes from production in fleet filters. |
| Policies | Use observe-only packs on ephemeral nodes. Enforcement on short-lived infrastructure adds operational risk without proportional benefit. |
| Cleanup | Run the uninstall script in shutdown hooks when available. For nodes without hooks, schedule periodic bulk deletion of stale records via the API. |
| License limit | Each enrolled sensor (even stale ones) counts as an active node until the record is deleted. On high-volume fleets, delete stale records regularly to stay within the license max_protected_nodes limit. |

Released under the Telovix Commercial License.