· Veytron Technologies · Edge AI · 5 min read
AI Log Triage for IoT Devices — Without Sending Data to the Cloud
A client needed AI-powered incident analysis for their IoT fleet, but security policy prohibited sending device telemetry to any external service. We built it on their existing hardware.
The project started as an internal R&D tool.
The client’s engineering team was running tests on their FreeRTOS IoT devices and spending a disproportionate amount of time manually analysing test logs — scrolling through event sequences, cross-referencing error codes against documentation, trying to reconstruct what happened during a failed test run. We built them an AI-assisted triage tool to do that work automatically: feed in the logs, get back a readable summary of what went wrong and why.
For the prototype, we used a public LLM API. Fast to build, good results, the team adopted it immediately. It became part of their standard testing workflow.
That success led to the next question: could this run on production device telemetry? The answer was yes — but it hit a wall immediately. Production telemetry contained customer device data, and the client’s information security policy explicitly prohibited sending that data to any external service. OpenAI, Anthropic, Microsoft’s Azure OpenAI Service — all off the table. Even cloud-hosted models running in a nominally private Azure tenant did not satisfy the policy.
The capability was proven. The task was to rebuild it so that no data ever left their infrastructure.
Why FreeRTOS logs make this tractable
The first question was whether a locally hosted model could realistically handle this task. The answer depends heavily on the type of device.
These devices ran FreeRTOS, not Linux. That distinction matters enormously for AI log analysis. A Linux system produces thousands of log lines per minute — kernel scheduler events, driver messages, systemd chatter — the vast majority of which is noise. Feeding that to a local model is impractical without a significant pre-filtering pipeline.
FreeRTOS devices log only application-level events: task state changes, sensor readings, MQTT outcomes, error codes, connectivity events. An incident window is typically 20–80 lines of structured, purposeful output. There is almost no noise. You can hand the full incident log directly to the model and it sees exactly what happened, in the right sequence.
This is the right problem for a 7B-class model on commodity hardware. Not because the hardware is impressive — it is not — but because the input is compact and the signal density is high.
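To make the compactness concrete, here is a minimal sketch. The log lines below are invented for illustration (not the client's actual format), but they show the shape of a FreeRTOS-style incident window and why it fits comfortably inside even a small model's context:

```python
# Illustrative only: hypothetical FreeRTOS-style application log lines,
# invented for this sketch (not the client's real telemetry format).
incident_log = """\
2024-03-07T02:14:01Z TASK sensor_poll state=RUNNING
2024-03-07T02:14:02Z SENSOR temp=41.2C ok
2024-03-07T02:14:05Z MQTT publish topic=telemetry/temp rc=0
2024-03-07T02:14:33Z MQTT publish topic=telemetry/temp rc=-1 err=E_CONN_LOST
2024-03-07T02:14:34Z NET wifi_disconnect reason=BEACON_TIMEOUT
2024-03-07T02:14:40Z TASK mqtt_client state=BLOCKED
2024-03-07T02:15:12Z NET wifi_connect rssi=-67
2024-03-07T02:15:13Z MQTT reconnect rc=0
"""

lines = incident_log.splitlines()
# Rough token estimate (~4 characters per token is a common rule of thumb).
approx_tokens = len(incident_log) // 4

# A full 20-80 line incident window stays far below a 7B model's context
# window, so the whole thing can be handed over without pre-filtering.
print(len(lines), approx_tokens)
```

Every line carries signal, so there is nothing to filter before the model sees it.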
The non-obvious hard part: finding where the problem actually was
The practical engineering challenge turned out to be something we did not anticipate upfront.
Logs were not delivered in real time. The client’s telemetry pipeline batched device events and forwarded them on a fixed schedule — in production, that typically meant once per day, or on-demand when an engineer manually triggered a sync. By the time the log arrived for analysis, the device had long since recovered or moved on. The log buffer covering the incident was somewhere in the middle of the data we received, surrounded by normal-operation entries before and recovery events after.
A naive prompt — “here are the logs, what went wrong?” — produced unreliable results because the model would try to summarize the entire window, including the recovery phase, often diluting or missing the actual failure.
The solution required a two-stage approach. The first stage asked the model to locate the incident within the log — identify the timestamp range where the failure sequence began and ended, based on event patterns rather than any prior knowledge of what to look for. Only once that window was identified did we pass it to the analysis stage. Getting this first-stage prompt right took significant iteration: the model needed to distinguish a brief failure-and-recovery sequence from normal operational variance, without being told what the failure would look like.
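The two-stage flow can be sketched as a small pipeline. Everything here is illustrative: the prompt wording, the `WINDOW <start> <end>` reply format, and the stub model are our invented stand-ins, not the production prompts, and the model is passed in as a plain callable so the orchestration logic is visible:

```python
import re
from typing import Callable

# Hypothetical prompt templates, simplified for illustration.
LOCATE_PROMPT = (
    "Below is a device log. Identify the timestamp range covering the "
    "failure sequence only (not normal operation before it, not recovery "
    "after it). Answer as: WINDOW <start> <end>\n\n{log}"
)
ANALYZE_PROMPT = "Explain what went wrong in this incident log and why:\n\n{window}"

def triage(log_lines: list[str], ask_model: Callable[[str], str]) -> str:
    """Two-stage triage: first locate the incident window, then analyse it."""
    # Stage 1: ask the model only to bound the failure sequence.
    reply = ask_model(LOCATE_PROMPT.format(log="\n".join(log_lines)))
    match = re.search(r"WINDOW\s+(\S+)\s+(\S+)", reply)
    if match:
        start, end = match.groups()
        # Keep only lines whose leading timestamp falls inside the window.
        window = [l for l in log_lines if start <= l.split()[0] <= end]
    else:
        # Fall back to the whole log if the model gave no usable window.
        window = log_lines
    # Stage 2: analyse only the bounded window.
    return ask_model(ANALYZE_PROMPT.format(window="\n".join(window)))

# Demo with a stub in place of the real local model:
log = [
    "02:14:05 MQTT publish rc=0",
    "02:14:33 MQTT publish rc=-1 err=E_CONN_LOST",
    "02:15:13 MQTT reconnect rc=0",
]

def stub_model(prompt: str) -> str:
    if prompt.startswith("Below is a device log"):
        return "WINDOW 02:14:33 02:14:40"
    return "summary"

result = triage(log, stub_model)
```

The important property is that stage two never sees the recovery phase, which is what kept the single-prompt version from converging on the actual failure.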
This kind of prompt engineering is where most of the real work in applied AI lives. The model selection and infrastructure are secondary.
Making the answers specific, not generic
A model with no context about the client’s specific devices will give generic answers: “possible memory issue,” “connection timeout,” “check your configuration.” Technically correct, practically useless.
The gap between a generic answer and a useful one is knowledge — the client’s own firmware changelogs, device-specific error code documentation, past incident post-mortems. We built a RAG layer that indexed this documentation locally and retrieved the most relevant sections for each query before the model ran.
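The retrieval step can be sketched as follows. This is a toy version under loud assumptions: the production system used a real embedding model and a vector store, while this sketch substitutes bag-of-words cosine similarity, and the document contents are invented examples of the kinds of sources indexed (changelogs, error-code docs, post-mortems):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical local knowledge base (contents invented for this sketch).
docs = [
    "Firmware 2.3.1 changelog: fixed E_CONN_LOST after broker keepalive change",
    "Error E_SENSOR_TIMEOUT: I2C bus contention on sensor_poll task",
    "Post-mortem 2023-11: fleet-wide MQTT reconnect storm after DNS outage",
]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k most relevant documents for a query, best first."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Retrieved sections are prepended to the analysis prompt before the model runs.
context = retrieve("MQTT publish failed with E_CONN_LOST on broker")
```

The retrieved sections are injected into the analysis prompt, which is what lets the model cite a specific firmware version instead of guessing.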
The difference in output quality was substantial. Instead of “possible MQTT connection failure,” the model would identify the specific firmware version where this error pattern was introduced, reference the relevant changelog entry, and suggest the exact service restart or update path that fixed it in a previous incident. That is the difference between a demo and a tool engineers actually use.
What ran where
The model serving ran on the client’s existing on-premise server — no new hardware purchased. We chose Mistral 7B in Q4_K_M quantization, served via Ollama. On CPU-only hardware of this class, triage summaries come back in roughly 90–120 seconds. For asynchronous post-incident analysis this is acceptable; the alternative was engineers spending 20–40 minutes on manual investigation.
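Talking to the local model is a plain HTTP call against Ollama's default endpoint. A minimal sketch, assuming Ollama's standard `/api/generate` JSON interface on its default port; the `"mistral"` model tag is a placeholder for whatever quantized variant is actually pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_request(prompt: str, model: str = "mistral") -> dict:
    # stream=False returns the full completion in one JSON body, which is
    # fine for asynchronous post-incident triage (no need for token streaming).
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt: str) -> str:
    """Send a prompt to the locally served model; no external network calls."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_request("Explain what went wrong in this incident log.")
```

Because the endpoint is loopback-only, the no-external-calls property is enforced by the network layout rather than by policy alone.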
The full stack — model server, embedding model, vector store, orchestration — ran on a single server alongside the OpenSearch cluster, with no external network calls at any point in the pipeline.
What the client got
Engineers went from spending 20–40 minutes on initial triage to validating a pre-generated summary in under 5 minutes. Not every summary was correct — roughly 15% required manual review and correction — but even those cases had the right log window and relevant documentation already surfaced, making the investigation faster regardless.
More importantly: the client’s security posture did not change. No telemetry left their network. The system was auditable, self-contained, and ran on hardware they already owned.
If you have a device fleet generating structured telemetry and engineers spending too much time on triage, this architecture is worth considering — especially if data residency requirements rule out cloud AI APIs. Get in touch if you want to talk through whether it applies to your situation.