Pilot Checklist

A successful pilot establishes that enrollment works reliably on your target nodes, that the fleet model fits your environment, that the data landing in the Console is actionable, and that you understand the rollback path before you scale. This checklist is organized as a sequence: each phase builds on the previous one.

Phase 1: Infrastructure readiness

Before enrolling any sensors, confirm the Console and databases are healthy.

[ ] Console host is sized (2 CPU / 2 GB RAM minimum) and reachable from pilot nodes on port 15483 (Telovix self-hosted default)
[ ] PostgreSQL 15+ is running and reachable from the Console host
[ ] ClickHouse 24.8+ is running and reachable from the Console host
[ ] Console health endpoints both return HTTP 200: curl https://<console>:15483/healthz and curl https://<console>:15483/readyz
[ ] Signed license has been imported and the Console UI is accessible (setup wizard completed)
[ ] At least one admin account exists and login works
[ ] Sensor binaries (telovix-sensor and/or telovix-sensor-telecom) are staged in sensor-binaries/{amd64,arm64}/
[ ] Platform vertical matches your nodes: standard for general Linux, telecom for nodes running 5G or O-RAN workloads
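
The health-endpoint item above can be scripted so it runs the same way before every enrollment batch. This is a minimal sketch, assuming `curl` is installed on the host running the check; the function names and the `console.example.internal` hostname are illustrative and not part of Telovix.

```shell
# Print the two health-check URLs for a Console host.
# Port 15483 is the self-hosted default named in the checklist.
console_health_urls() {
  local host port
  host="$1"
  port="${2:-15483}"
  printf 'https://%s:%s/healthz\n' "$host" "$port"
  printf 'https://%s:%s/readyz\n' "$host" "$port"
}

# Return 0 only if every health URL answers with HTTP 200.
check_console_health() {
  local url code failed
  failed=0
  for url in $(console_health_urls "$@"); do
    # -w '%{http_code}' prints only the status code; 000 means no response.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
    echo "$url -> $code"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}

# Example (hypothetical hostname):
#   check_console_health console.example.internal
```

If the Console uses a certificate that the checking host does not trust, curl will report 000; in that case pass the CA bundle with curl's `--cacert` option rather than disabling verification.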

Phase 2: Kernel readiness on pilot nodes

For each node you plan to enroll, run the following checks:

bash

# Kernel version (minimum 5.4, recommended 6.x)
uname -r

# BTF available (required)
test -f /sys/kernel/btf/vmlinux && echo "BTF OK" || echo "BTF MISSING"

# BPF filesystem mounted (required)
grep -q bpf /proc/mounts && echo "BPF FS OK" || echo "BPF FS NOT MOUNTED"

# BPF LSM active (required for enforcement and LSM hooks)
grep -q bpf /sys/kernel/security/lsm && echo "LSM BPF OK" || echo "LSM BPF NOT ACTIVE"

# systemd present (required for VM/bare-metal install)
command -v systemctl >/dev/null && echo "systemd OK" || echo "No systemd"

[ ] Every pilot node passes all five checks above
[ ] PREEMPT_RT nodes (O-DU, vDU) identified. The sensor handles them automatically, but record which nodes they are for reference
[ ] If any node is missing BTF, resolve before enrollment (kernel recompile or distribution kernel upgrade may be needed)
[ ] If BPF LSM is not active, LSM hooks will not fire: file access detection and privilege escalation detection fall back to kprobes only
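
`uname -r` only prints the kernel release, so the 5.4 minimum still has to be compared by eye. A small helper can do that comparison. This is a sketch; `kernel_meets_minimum` is a hypothetical name, and it assumes the release string starts with `<major>.<minor>`.

```shell
# Return 0 if the given kernel release is 5.4 or newer.
# $1: a release string such as the output of `uname -r` (e.g. 5.15.0-91-generic)
kernel_meets_minimum() {
  local release major minor
  release="$1"
  major="${release%%.*}"        # text before the first dot
  minor="${release#*.}"         # text after the first dot
  minor="${minor%%[!0-9]*}"     # keep only the leading digits
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 4 ]; }
}

# Example:
#   kernel_meets_minimum "$(uname -r)" && echo "Kernel OK" || echo "Kernel TOO OLD"
```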

Phase 3: Node naming and tagging convention

Agreeing on this before enrollment avoids cleanup later.

[ ] Node naming convention decided (example: <site>-<role>-<index>, such as oslo-amf-01)
[ ] Tags to pass to the sensor install command agreed (example: site:oslo-dc1,env:pilot,plmn:242-01)
[ ] Node roles mapped for each pilot host (see table below)
[ ] Sensor flavor decided per node: standard or telecom

Common node roles to assign at enrollment:

Role	Use for
`generic_linux`	Any standard Linux host
`amf`	5G Core AMF
`smf`	5G Core SMF
`upf`	5G Core UPF
`o_du`	O-RAN Distributed Unit
`o_ru`	O-RAN Radio Unit
`o_cu_cp`	O-RAN Central Unit Control Plane
`o_cu_up`	O-RAN Central Unit User Plane
`near_rt_ric`	O-RAN near-RT RIC
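
Once a convention is agreed, it can be checked mechanically before the install command is run. The sketch below assumes the example convention from this phase (`<site>-<role>-<index>` in lowercase, and comma-separated `key:value` tags). Both patterns and both function names are illustrative; adjust them to whatever your team actually decides.

```shell
# Return 0 if the name looks like <site>-<role>-<index>, e.g. oslo-amf-01.
# Assumption: lowercase letters and digits only, index of at least two digits.
valid_node_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+-[a-z0-9_]+-[0-9]{2,}$'
}

# Return 0 if the string is a comma-separated list of key:value tags,
# e.g. site:oslo-dc1,env:pilot,plmn:242-01.
valid_tags() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9_]+:[A-Za-z0-9._-]+(,[a-z0-9_]+:[A-Za-z0-9._-]+)*$'
}

# Example:
#   valid_node_name oslo-amf-01 && valid_tags "site:oslo-dc1,env:pilot" && echo "OK"
```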

Phase 4: First enrollment

Start with one standard Linux node and, if applicable, one telecom node. Do not enroll your full fleet yet.

[ ] Enrollment token generated from Sensors > Deploy Sensor (15-minute TTL)
[ ] Install command run as root on the target node
[ ] Sensor appears in Sensors within 15 seconds of the first heartbeat
[ ] Trust state shows healthy (not stale, degraded, or revoked)
[ ] Node role and flavor label appear correctly in the Sensors row
[ ] Tags applied at install time are visible in Sensors > [sensor]
[ ] Recent events are flowing: process execution, network connections, and file access events visible within 60 seconds of activity on the node
[ ] For telecom nodes: NF role is detected and the Telco section shows data

Check enrollment health on the node:

bash

systemctl status telovix-sensor
journalctl -u telovix-sensor -n 30 --no-pager | grep -E "enrolled|heartbeat|error"

If the sensor shows stale, heartbeats are not reaching the Console. By default, the sensor marks itself stale after 90 seconds without a successful heartbeat. Check network connectivity and clock synchronization on both hosts.
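
To make the staleness arithmetic concrete: with the 15-second heartbeat cycle used elsewhere in this checklist, the default 90-second threshold corresponds to six consecutive missed heartbeats. The sketch below only illustrates that calculation. The function names are hypothetical, and treating exactly 90 seconds as stale is an assumption.

```shell
HEARTBEAT_INTERVAL=15   # seconds, the heartbeat cycle named in this checklist
STALE_THRESHOLD=90      # seconds, the default staleness threshold

# Number of whole heartbeat intervals that fit into an elapsed time in seconds.
missed_heartbeats() {
  echo $(( $1 / $HEARTBEAT_INTERVAL ))
}

# Return 0 if the gap between the last heartbeat and now reaches the threshold.
# $1: last successful heartbeat (epoch seconds), $2: current time (epoch seconds)
is_stale() {
  [ $(( $2 - $1 )) -ge "$STALE_THRESHOLD" ]
}

# Triage commands to run on the node (not executed by this sketch):
#   timedatectl show -p NTPSynchronized --value   # prints "yes" when the clock is synced
#   curl -s -o /dev/null -w '%{http_code}\n' https://<console>:15483/healthz
```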

Phase 5: Signal quality review

Run the sensor in observe-only mode for at least 14 days before considering enforcement. This is the anomaly-scoring learning window: behavioral baselines are still building, and anomaly alerts are suppressed to avoid noise from false positives.

[ ] Sensor has been running for at least 48 hours on each pilot node before reviewing alerts
[ ] Alert inbox reviewed: are the alerts meaningful for this environment?
[ ] At least one controlled test event triggered and traced (examples: a new outbound connection, a suspicious process execution, an SSH login from an unexpected host)
[ ] Process tree view used to trace at least one event from the originating process back to its ancestor chain
[ ] Event kinds visible in Sensors > [sensor] > Recent Events match expectations for each node type (process_exec, network_connect, file_open, privilege_change, etc.)
[ ] For telecom nodes: NGAP KPI view, PFCP session view, and/or O-RAN interface status shows data
[ ] Resource usage on pilot nodes is acceptable (CPU less than 2%, RSS less than 120 MB under normal load)
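
The memory half of the resource item can be spot-checked on a node. This is a sketch, assuming the sensor runs as the `telovix-sensor` systemd unit shown in Phase 4 and that `ps` from procps is available; the function names are hypothetical, and 120 MB is treated as 120 × 1024 KiB. It does not measure CPU.

```shell
# Return 0 if an RSS value in KiB is below the limit in MB (default 120).
rss_within_limit() {
  local rss_kib limit_mb
  rss_kib="$1"
  limit_mb="${2:-120}"
  [ "$rss_kib" -lt $(( $limit_mb * 1024 )) ]
}

# Print the resident set size of the sensor's main process in KiB.
sensor_rss_kib() {
  local pid
  pid=$(systemctl show -p MainPID --value telovix-sensor) || return 1
  [ "${pid:-0}" -gt 0 ] || return 1      # MainPID is 0 when the unit is not running
  ps -o rss= -p "$pid" | tr -d ' '
}

# Example:
#   rss_within_limit "$(sensor_rss_kib)" && echo "RSS OK" || echo "RSS OVER LIMIT"
```

A single reading is only a snapshot; sample it several times under normal load before ticking the item.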

Phase 6: Policy pack assignment

A policy pack provides structured detection coverage beyond the always-on baseline collection.

[ ] At least one policy pack assigned to a pilot sensor via Sensors > [sensor] > Assign Pack
[ ] Pack appears in the heartbeat response (visible in Sensors > [sensor])
[ ] Events tagged with the pack ID appear in the event feed within one heartbeat cycle (15 seconds)
[ ] Enforcement verification run: Sensors > [sensor] > Enforcement shows the readiness class

Before allowing enforcement advancement, the Telovix Console computes one of three readiness classes:

Class	Meaning
`NotReady`	One or more blocking conditions exist: pack is not enforcement-capable, sensor is not in live runtime mode, trust is revoked or degraded, sensor health is critical, or sensor is disabled
`ReviewRequired`	No hard blockers but caution conditions exist: trust is not fully healthy, health is degraded or watch, or no recent activity has been observed
`CandidateForEnforceReady`	No blocking or caution conditions. Safe to advance to `enforce_ready`.
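
The table implies an order of precedence: any blocking condition wins over any caution condition. The sketch below expresses only that ordering. It is an illustration of the table, not the Console's implementation, and the function name and count-based inputs are assumptions.

```shell
# Print the readiness class for a given number of blocking and caution conditions.
# $1: count of blocking conditions, $2: count of caution conditions
readiness_class() {
  if [ "$1" -gt 0 ]; then
    echo NotReady
  elif [ "$2" -gt 0 ]; then
    echo ReviewRequired
  else
    echo CandidateForEnforceReady
  fi
}

# Example: one blocker (pack not enforcement-capable) and one caution (no recent activity)
#   readiness_class 1 1
```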

Do not advance past observe until you have reviewed the enforcement verification output and have confidence in the baseline.

The enforcement state machine has three states and only advances forward one step at a time:

observe -> enforce_ready -> enforced

Returning to observe is always allowed from any state. You cannot skip enforce_ready and jump directly to enforced.
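
The transition rules can be written down as a small lookup, which is useful when scripting change reviews. This is a sketch of the rules as stated here, not Console code; `can_transition` is a hypothetical name, and rejecting a step back from enforced to enforce_ready is an assumption, since only the return to observe is described.

```shell
# Return 0 if moving from state $1 to state $2 is allowed.
can_transition() {
  case "$1:$2" in
    # Returning to observe is allowed from any state.
    observe:observe|enforce_ready:observe|enforced:observe) return 0 ;;
    # Forward moves, one step at a time.
    observe:enforce_ready) return 0 ;;
    enforce_ready:enforced) return 0 ;;
    # Everything else, including observe -> enforced, is rejected.
    *) return 1 ;;
  esac
}

# Example:
#   can_transition observe enforced || echo "must pass through enforce_ready first"
```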

Phase 7: Compliance evidence (if applicable)

[ ] Compliance view opened for the framework relevant to your deployment
[ ] At least one framework report generated and reviewed (CIS v8, NIS2, 3GPP TS 33.117, or O-RAN WG11)
[ ] Evidence gaps documented: controls that require more data collection time or additional nodes
[ ] At least one evidence bundle exported for a specific control to validate the download and format

Phase 8: Rollback and operational ownership

Before scaling, confirm the rollback path is understood and owned.

[ ] At least one operator knows how to disable a sensor from the Console (Sensors > [sensor] > Actions > Disable)
[ ] At least one operator knows how to revert a pack to observe state (Sensors > [sensor] > Enforcement > Set to Observe)
[ ] Console backup process agreed (Telovix self-hosted only): PostgreSQL data, ClickHouse data, and the Console configuration directory
[ ] Sensor re-enrollment process tested: generate a re-enrollment token and confirm the existing sensor identity is preserved. A re-enrollment token updates credentials on the existing sensor record. A plain enrollment token creates a new sensor identity.
[ ] Escalation contact for license issues identified (Portal account owner or support contact)
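
As a starting point for the backup discussion, the PostgreSQL and configuration parts could be captured as follows. This is a rough sketch under several assumptions: the database name, the configuration directory, and the destination root are all parameters you must supply, since this checklist does not specify them, and `pg_dump` is assumed to be able to authenticate through your usual environment. The function names are hypothetical.

```shell
# Print the backup directory path for a destination root and a timestamp.
backup_dir_for() {
  printf '%s/telovix-console-%s\n' "$1" "$2"
}

# Dump PostgreSQL and archive the Console configuration directory.
# $1: PostgreSQL database name, $2: Console configuration directory, $3: destination root
backup_console() {
  local dest
  dest=$(backup_dir_for "$3" "$(date -u +%Y%m%dT%H%M%SZ)")
  mkdir -p "$dest" || return 1
  # Custom-format dump, restorable with pg_restore.
  pg_dump --format=custom --file="$dest/postgres.dump" "$1" || return 1
  tar -czf "$dest/console-config.tar.gz" -C "$2" . || return 1
  # ClickHouse data needs its own backup step; it is not covered by this sketch.
  echo "$dest"
}
```

Whatever process you agree on, test a restore at least once during the pilot; an unverified backup does not establish a rollback path.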

Phase 9: Scale readiness

Only after all previous phases are complete:

[ ] Enrollment token strategy agreed: one-time tokens for manual nodes, cluster enrollment tokens for Kubernetes
[ ] Sensor binary staging confirmed for all architectures and flavors needed in the broader fleet
[ ] Console resource plan reviewed for the expected node count (PostgreSQL and ClickHouse sizing)
[ ] If more than 1,500 nodes are expected: Redpanda Scale tier evaluated
[ ] Sensor staleness threshold reviewed (default 90 seconds; adjustable in Console Settings) and confirmed appropriate for the network environment
[ ] Certificate renewal window noted: renewal_recommended at 72 hours before expiry, renewal_due at 24 hours. Ensure operators know to act on trust health alerts before the cert expires.
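
The renewal windows translate into simple thresholds on the time remaining before expiry. The sketch below illustrates them; `renewal_recommended` and `renewal_due` come from this checklist, while `ok` and `expired` are placeholder labels, and the function names and certificate path are hypothetical. The second helper assumes `openssl` and GNU `date`.

```shell
# Print the renewal state for a number of seconds remaining before expiry.
renewal_state() {
  if [ "$1" -le 0 ]; then
    echo expired                 # placeholder label, not a Telovix state name
  elif [ "$1" -le $(( 24 * 3600 )) ]; then
    echo renewal_due
  elif [ "$1" -le $(( 72 * 3600 )) ]; then
    echo renewal_recommended
  else
    echo ok                      # placeholder label, not a Telovix state name
  fi
}

# Print the seconds remaining before a PEM certificate expires.
# $1: path to the certificate file (location depends on your install)
seconds_until_expiry() {
  local not_after expiry now
  not_after=$(openssl x509 -enddate -noout -in "$1") || return 1   # notAfter=<date>
  expiry=$(date -d "${not_after#notAfter=}" +%s) || return 1
  now=$(date +%s)
  echo $(( $expiry - $now ))
}

# Example:
#   renewal_state "$(seconds_until_expiry /path/to/sensor-cert.pem)"
```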

Pilot success criteria

A pilot is ready to move to production rollout when all of the following are true:

Criterion	How to verify
Enrollment is reliable	Enrolled at least 3 nodes; all reached `healthy` trust state without manual intervention
Events are flowing	Event feed shows at least `process_exec`, `network_connect`, and `file_open` events on each node
Alerts are actionable	At least one alert reviewed and triaged to a meaningful conclusion
Rollback is understood	At least one operator demonstrated the disable and observe revert actions
Compliance evidence is exportable	At least one framework report downloaded and verified
Resource impact is acceptable	CPU and memory overhead on pilot nodes is within agreed limits
