
You know that feeling when you sit down on a Friday to "fix one small bug" and wake up on Tuesday with two entire feature epics shipped, a redesigned navigation system, and your agents auto-updating themselves from GitHub releases?

Yeah. That happened.

Buckle up — this is a big one. ☕


🎨 The Great Dark Mode Migration

Let's start with the one that annoyed me every single day: half the app was still bright white. Login page? Dark. Dashboard? Blinding. It was like walking from a movie theater into a parking lot at noon.

The diagnosis was brutal: 418 hardcoded colors scattered across the entire frontend. Every bg-white, every text-gray-800, every border-gray-200 — all of them completely ignoring the dark mode class on the root element.

The fix? A surgical strike across 23 files, 149 replacements:

Before                       After
bg-white                     bg-white dark:bg-zinc-900
text-gray-800                text-gray-800 dark:text-zinc-100
border-gray-200              border-gray-200 dark:border-zinc-800
bg-blue-100 text-blue-800    bg-blue-500/20 text-blue-400

That last one is my favorite trick. Instead of having separate light/dark badge colors, translucent backgrounds (bg-*-500/20) with bright text (text-*-400) look great in both modes. Stolen from Discord's playbook, honestly.

Result: Every. Single. Page. Now dark. Login ✅ Dashboard ✅ Nodes ✅ Jobs ✅ Vulnerabilities ✅ Security Center ✅ Services ✅ Reports ✅ Settings ✅

No more retina burns at 2 AM. You're welcome. 🌙


๐Ÿ” E30: Patch & Update Orchestration

This is the big kahuna. The one feature every IT admin asks about first: "Can it patch my machines?"

Before v0.6.0, Octofleet could tell you about vulnerabilities. It could even remediate some of them through the remediation engine. But it didn't have a proper patch management pipeline — you know, the whole ring-based, staged rollout, compliance-tracking thing that makes SCCM admins feel warm and fuzzy.

Now it does.

What We Built

๐Ÿ—„๏ธ Server Side (Phase 1)

Five new database tables form the backbone:

  • patch_catalog — Your master list of patches with severity, KB numbers, and affected products
  • patch_rings — Deployment rings (Dev → Pilot → Broad → Critical) with configurable delays
  • patch_deployments — Scheduled rollouts targeting specific rings
  • patch_deployment_results — Per-node results tracking
  • patch_catalog_nodes — Which nodes need which patches
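The ring idea boils down to "same deployment, staggered start dates." Here's a minimal sketch of how the staged delays could be computed; the ring names and delay values are illustrative, not the actual patch_rings schema:

```python
from datetime import date, timedelta

# Hypothetical ring configuration: (name, days after deployment approval).
# Mirrors the Dev -> Pilot -> Broad progression described above.
RINGS = [("Dev", 0), ("Pilot", 3), ("Broad", 7)]

def ring_schedule(approved_on: date) -> dict[str, date]:
    """Compute each ring's rollout start date from the approval date."""
    return {name: approved_on + timedelta(days=delay) for name, delay in RINGS}
```

A deployment approved on January 1st would then hit Dev immediately, Pilot on the 4th, and Broad on the 8th.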

Twenty API endpoints under /api/v1/patches/ handle everything from catalog management to deployment approval workflows.

The frontend got a proper /patches page with four tabs: Catalog, Rings, Deployments, and Compliance. There's a deployment wizard that walks you through selecting patches, picking a ring, setting a schedule, and kicking it off.

๐Ÿ–ฅ๏ธ Agent Side (Phase 2)

Here's where it gets spicy. PatchScanner.cs talks directly to the Windows Update Agent COM API:

var updateSession = new UpdateSession();
var updateSearcher = updateSession.CreateUpdateSearcher();
var searchResult = updateSearcher.Search("IsInstalled=0");

Every 6 hours, the agent scans for missing updates and reports back to the server. No WSUS required. No SCCM. Just the agent talking to Windows Update and telling headquarters what's missing.

The scanner categorizes patches by severity (Critical, Important, Moderate) and maps them to KB numbers so the server-side catalog stays in sync.

Is it SCCM? No. Is it enough for 90% of environments under 500 nodes? Absolutely. 💪


๐Ÿ“ E31: Configuration Baselines & Drift Management

If E30 is about "are my machines patched?", E31 is about "are my machines configured correctly?"

Think Group Policy auditing, but vendor-agnostic and with actual drift detection.

The Concept

You define a baseline — a collection of rules that describe how a machine should be configured. Things like:

  • 🔒 Password policy: minimum 12 characters
  • 🛡️ Windows Firewall: enabled on all profiles
  • 🚫 Guest account: disabled
  • 💻 RDP: Network Level Authentication required

Then you assign that baseline to nodes or groups. Octofleet evaluates the rules against inventory data and tells you who's compliant and who's drifting.

Phase 1: The Engine

  • config_baselines — Named baselines (e.g., "Windows Server 2022 Hardening")
  • config_baseline_rules — Individual rules with expected values and evaluation logic
  • config_baseline_assignments — Which nodes/groups get which baselines
  • config_baseline_evaluations — Results from the last evaluation run
  • config_drift_events — Timeline of when drift was detected (and resolved)

The evaluation engine runs server-side against inventory data that agents already collect. No extra agent configuration needed.
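The core of such an engine is small. Here's a minimal sketch, assuming a rule shape of (key, operator, expected value); the real config_baseline_rules schema is richer than this:

```python
def evaluate(rules: list[dict], inventory: dict) -> list[dict]:
    """Check each rule against a node's inventory snapshot."""
    results = []
    for rule in rules:
        actual = inventory.get(rule["key"])
        if rule["op"] == "equals":
            ok = actual == rule["expected"]
        elif rule["op"] == "gte":
            ok = actual is not None and actual >= rule["expected"]
        else:
            ok = False  # unknown operator: fail closed, flag as drift
        results.append({"rule": rule["key"], "compliant": ok, "actual": actual})
    return results
```

A node with an 8-character password minimum against a 12-character rule would come back non-compliant, which is exactly the drift event you want on the timeline.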

Phase 2: CIS Benchmarks & Auto-Remediation

This is where it gets really cool. We built two CIS benchmark templates out of the box:

Windows Server 2022/2025 L1 (9 rules):

  • Password history, max age, min length, complexity
  • Account lockout threshold and duration
  • Windows Firewall profiles
  • Remote Desktop NLA

Windows 11 Enterprise L1 (6 rules):

  • UAC enforcement
  • Windows Defender real-time protection
  • BitLocker drive encryption
  • Audit policy configuration

Each rule has an evaluation type (registry, policy, service, feature) and many come with auto-remediation scripts. Click "Remediate" and Octofleet generates the appropriate PowerShell command:

# For non-Server SKUs (Win10/11):
try { 
    Install-WindowsFeature Windows-Defender -IncludeManagementTools -ErrorAction Stop 
} catch { 
    Add-WindowsCapability -Online -Name 'Microsoft.Windows.Defender~~~~' -ErrorAction Stop 
}

That try/catch fallback? Yeah, that's because Install-WindowsFeature only exists on Server editions. Spent a fun hour debugging why remediation worked on HYPERV02 but crashed on desktop machines. The joys of cross-SKU PowerShell. 🙃


🧭 The Mega Dropdown Navigation

The old navbar had seven top-level items and a secondary tab bar on the Security page. It was fine when we had 15 pages. We now have 40+. Something had to give.

Enter the mega dropdown:

┌─────────────────────────────────────────────┐
│  🔒 Security                                │
├──────────────────┬──────────────────────────┤
│  Monitoring      │  Compliance              │
│  ──────────────  │  ──────────────────────  │
│  📊 Dashboard    │  📏 Baselines            │
│  🔍 Findings     │  📋 CIS Benchmarks       │
│  📡 Events       │  🔄 Drift Events         │
│  📁 File Audit   │  🛡️ Posture              │
│  🧠 Behavior     │  💊 Remediation          │
│  👁️ Activity     │  ⚖️ Retention            │
│  📦 Evidence     │  🎯 Policies             │
└──────────────────┴──────────────────────────┘

Two columns. Section headers. 14 items organized by function. Persistent context — the dropdown highlights which section you're currently in.

The navbar now has six dropdowns: Fleet (emerald), Software (blue), Infra (amber), Security (red, 2-column mega), Ops (cyan), and Admin (purple). Each with its own color identity so you always know where you are.

Goodbye, tab bar. You served us well. 👋


🔧 The Job System Bug That Wasn't a Bug

This one's a good story.

Users reported that jobs showed "device" instead of the actual hostname in the jobs list. Also, some jobs were stuck at 0/0 instances. Classic.

Round 1: I added a LEFT JOIN nodes to resolve hostnames. Committed. Deployed. Still showing "device."

Round 2: Turns out FastAPI dispatches to the first matching route that was registered, so the endpoint in routers/jobs.py was shadowing the one in main.py. My fix was in main.py. The wrong file. 🤦

Round 3: Fixed routers/jobs.py with the proper JOIN. Now hostnames show up. But wait — why are jobs stuck at 0/0 instances?

Round 4: The create_job() function in routers/jobs.py was inserting into the jobs table… but never creating job_instances. Jobs existed in the database, but agents had nothing to pick up. It's like placing an order at a restaurant that never gets sent to the kitchen.

Round 5: Fixed instance creation. Jobs now show instances: 1. But agents still aren't executing them?!

Round 6: The agent polls as win-baltasa, which resolves to BALTASA. But job_instances.node_id stores UUIDs like d7cc6f42-735a-409c-.... The query was comparing apples to UUIDs. Added a node lookup step that resolves hostname → UUID before querying.
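In miniature, the lookup step is just this. The in-memory NODES dict stands in for the real nodes table (keyed here by the resolved, lowercased hostname), and the placeholder UUID is obviously fake:

```python
import uuid

# Hypothetical stand-in for the nodes table: resolved hostname -> node UUID.
NODES = {"baltasa": uuid.UUID("00000000-0000-0000-0000-000000000001")}

def pending_instances(agent_hostname: str, instances: list[dict]) -> list[dict]:
    """Resolve hostname -> UUID first, then filter job_instances by node_id."""
    node_id = NODES.get(agent_hostname.lower())
    if node_id is None:
        return []  # unknown node: nothing to hand out
    return [i for i in instances if i["node_id"] == node_id]
```

Comparing the hostname string directly against the node_id column matches nothing, ever; resolving first is the whole fix.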

Six rounds. One "simple" feature. Welcome to distributed systems.

Jobs now show actual hostnames, create proper instances, and agents actually pick them up. What a concept. 🎪


🤖 Self-Updating Agents

Speaking of agents — v0.6.0 includes the PatchScanner and PostureCollector, so all nodes needed an update. The auto-updater checks GitHub releases on startup, downloads the ZIP, extracts, and restarts itself.
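The "is there a newer release?" check reduces to a tuple comparison on the version numbers. A sketch, assuming "v0.6.0"-style tags (the actual updater is C#, this is just the logic):

```python
def needs_update(local: str, latest_tag: str) -> bool:
    """Compare a local version against a GitHub release tag like 'v0.6.0'."""
    parse = lambda v: tuple(int(x) for x in v.lstrip("v").split("."))
    return parse(latest_tag) > parse(local)
```

Tuple comparison gets the ordering right where naive string comparison would not (e.g., "0.10.0" vs "0.9.0").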

The catch: When running as a Windows Service, the agent can't cleanly restart itself (you can't kill the process that's killing the process). In interactive/PowerShell mode? Works beautifully. As a service? The restart script spawns, but the timing is… unpredictable.

Current rollout status:

  • ✅ DESKTOP-B4GCTCV — self-updated to v0.6.0
  • ✅ BALTASA — manual PowerShell restart → auto-updated
  • ✅ HYPERV02 — updated via restart job
  • 🔄 SCVMM, SQLSERVER1 — in progress

For the remaining two, a quick Restart-Service OctofleetNodeAgent on the box does the trick. Not elegant, but it works.


๐Ÿ–ฅ๏ธ Hardware Fleet Dashboard

New page alert! /fleet/hardware now shows a fleet-wide hardware overview:

  • 📦 Total Storage: 12.44 TB across all nodes
  • 💽 Disk Health: 5 healthy, 4 unknown (virtual disks don't report SMART)
  • 🧮 CPU Breakdown: AMD Ryzen 9800X3D, Intel i7-13700, i9-13900K, i7-8550U
  • ⚠️ Issues: Auto-detects drives above 90% capacity

All data pulled from the hardware_current table that agents populate automatically. Zero configuration required.
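The rollups themselves are simple aggregation. A sketch over hardware_current-style rows; the column names here are illustrative, not the real schema:

```python
def storage_summary(disks: list[dict]) -> dict:
    """Fleet-wide total storage plus a list of drives above 90% capacity."""
    total_tb = sum(d["size_bytes"] for d in disks) / 1024**4
    alerts = [d["device"] for d in disks
              if d["used_bytes"] / d["size_bytes"] > 0.90]
    return {"total_tb": round(total_tb, 2), "alerts": alerts}
```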


🩹 The Remediation Engine Saga

The vulnerability remediation pipeline got a lot of love this release. Here's a highlight reel of bugs found and squashed:

  • Remediation jobs failed to create: node_id was inserted as text while the column is uuid. Fix: cast to ::uuid.
  • Agent never picked up jobs: the status filter only matched approved, but the engine creates jobs as pending. Fix: match IN ('pending', 'approved').
  • Dashboard showed wrong counts: the API returned completed while the frontend expected success. Fix: return both key formats.
  • SSE stream crashed: json.dumps() on raw UUID objects. Fix: a default=str serializer.
  • 500 duplicate jobs: double-clicks on the "Remediate" button. Fix: cleaned them up and added a debounce.
  • winget returned exit code 1: "No upgrade available" is technically exit 1. Fix: treat it as success.
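That last winget quirk comes down to normalizing the exit status before recording the job result. A sketch of the idea; matching on the message text alongside the code is an assumption for illustration, not the exact Octofleet logic:

```python
def winget_outcome(exit_code: int, stdout: str) -> str:
    """Map a winget upgrade run to a job result."""
    if exit_code == 0:
        return "success"
    if exit_code == 1 and "No upgrade available" in stdout:
        return "success"  # nothing to do is not a failure
    return "failed"
```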

After remediation succeeds, vulnerabilities are now automatically marked as fixed in node_vulnerabilities. The vulnerability dashboard filters these out, so your numbers actually go down when you fix things. Revolutionary, I know. 📉


📊 By The Numbers

Since v0.5.6 (5 days ago):

Metric                 Count
Commits                50+
New API endpoints      40+
New DB tables          10
Files changed          100+
New frontend pages     8
E2E tests passing      41/44
Dark mode pages        9/9 ✅
Bugs squashed          ~20
Coffee consumed        ☕☕☕☕☕

What's Next?

The roadmap still has plenty of meat on it:

  • E33 — Software Metering & License Tracking (P2)
  • E34 — Network Discovery & Topology (P2)
  • E35 — Enterprise Reporting Suite (P3)
  • Agent registry inventory for CIS registry rule evaluation
  • Build agent v0.6.0 binary via CI (currently requires Windows + dotnet publish)

Oh, and main.py is still 11,500 lines. That number only goes in one direction around here, and it's not the direction I'd like. But that's a problem for future me.


The full release is tagged as v0.6.0 on GitHub. Star the repo if you're into this kind of thing. Or don't. I'm not your dad. 🐙

— Benedikt