JAlcocerTech E-books

Chapter 17 — The Raspberry Pi as AI Node: MQTT Meets the LLM

This chapter is the consolidation of two years of IoT work: a Raspberry Pi 4 running local AI models, connected to the MQTT broker, able to process sensor data and respond in natural language.

Local LLMs on a Pi

A Raspberry Pi 4 with 8GB RAM can run quantized language models in the 7B parameter range at 1–3 tokens/second; smaller 3B-class models are noticeably quicker. That's not fast: at those rates, even a short 30-token answer to a sensor query takes 10–30 seconds. For interactive chat, it's too slow. For autonomous monitoring tasks that run once per hour, it's fine.

The practical stack:

  • Ollama for model serving — one command to download and run a model
  • Llama 3.2 3B or Phi-3.5 mini for Pi-compatible size
  • Python + requests to interact with the Ollama API
# On the Pi. Note: `ollama serve` blocks the terminal; skip it if the
# installer already set Ollama up as a systemd service, or run it in
# a second shell.
ollama serve
ollama pull phi3.5

# Test (the API streams JSON lines by default)
curl http://localhost:11434/api/generate \
  -d '{"model": "phi3.5", "prompt": "Is 22°C a good temperature for a bedroom?"}'

The MQTT + AI Pipeline

The architecture: an MQTT subscriber that accumulates sensor readings, packages them as context, and periodically queries the local LLM for anomalies or recommendations.

import json

import paho.mqtt.client as mqtt
import requests

readings = {}   # latest value per topic
msg_count = 0   # total messages received, used for batching

def on_message(client, userdata, msg):
    global msg_count
    try:
        value = float(msg.payload.decode())
    except ValueError:
        return  # ignore payloads that aren't plain numbers
    readings[msg.topic] = value
    msg_count += 1

    # Every 20 new readings, ask the LLM for analysis. Count
    # messages, not len(readings): the dict holds one entry per
    # topic, so its length plateaus once every topic has reported.
    if msg_count % 20 == 0:
        context = json.dumps(readings, indent=2)
        prompt = f"""
        Here are the current sensor readings from my home:
        {context}

        Are there any values outside normal ranges?
        Should I take any action?
        """
        # Blocking for 10-30s inside the callback is acceptable at
        # this message rate; a busier bus would need a worker thread.
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "phi3.5", "prompt": prompt, "stream": False},
            timeout=120,
        )
        analysis = resp.json()["response"]

        # Publish the analysis back to MQTT for Home Assistant
        client.publish("home/ai/analysis", analysis)

# paho-mqtt 2.x requires an explicit callback API version
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("home/sensors/#")
client.loop_forever()
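
With no hardware on the bus yet, you can exercise the whole loop by publishing synthetic readings from any machine that can reach the broker. A minimal sketch; the topic names are made up to match the subscription above:

import random
import time

import paho.mqtt.client as mqtt

pub = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
pub.connect("localhost", 1883)
pub.loop_start()

# 25 loops x 2 topics = 50 messages, enough to cross the
# subscriber's 20-message batching threshold twice
for _ in range(25):
    pub.publish("home/sensors/bedroom/temperature", f"{random.uniform(18, 24):.1f}")
    pub.publish("home/sensors/bedroom/humidity", f"{random.uniform(35, 55):.0f}")
    time.sleep(1)

pub.loop_stop()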

The analysis is published back to MQTT. Home Assistant picks it up as a text sensor entity, displays it on a dashboard, and can trigger a notification when the LLM flags something unusual.
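
Free text is awkward to automate on directly. One option, purely a sketch and not something Home Assistant requires, is to have the subscriber derive a machine-readable flag from the analysis and publish it on a second topic. The topic name and keyword list here are hypothetical:

def publish_alert_flag(client, analysis):
    """Turn the LLM's free-text analysis into a coarse ON/OFF flag."""
    alert_words = ("above", "high", "outside", "warning", "recommend")
    flag = "ON" if any(w in analysis.lower() for w in alert_words) else "OFF"
    # retain=True so Home Assistant sees the last state on reconnect
    client.publish("home/ai/alert", flag, retain=True)

Called right after the home/ai/analysis publish, this gives Home Assistant an MQTT binary sensor to trigger automations on, which is sturdier than matching substrings in dashboard text. The end-to-end flow: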

sequenceDiagram
    participant Sensor as DHT22 + Pico W
    participant Broker as MQTT Broker
    participant Py as Python Subscriber
    participant Ollama as Ollama (Pi 4)
    participant HA as Home Assistant

    Sensor->>Broker: publish home/sensors/... (every 30s)
    Broker->>Py: deliver readings
    Py->>Py: accumulate 20 readings
    Py->>Ollama: POST /api/generate\n{context + prompt}
    Ollama->>Py: analysis text (10–30s)
    Py->>Broker: publish home/ai/analysis
    Broker->>HA: deliver analysis text
    HA->>HA: display on dashboard
    HA-->>HA: trigger notification\nif anomaly detected

Pico W publishing sensor data over MQTT — the input end of the AI analysis pipeline
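
For reference, the publisher side of that caption as a minimal MicroPython sketch for a Pico W with a DHT22. The pin number, Wi-Fi credentials, client ID, and broker address are placeholders, and it assumes umqtt.simple is available (install it with mpremote mip install umqtt.simple if your firmware doesn't bundle it):

import time

import dht
import network
from machine import Pin
from umqtt.simple import MQTTClient

WIFI_SSID = "your-ssid"      # placeholder
WIFI_PASS = "your-password"  # placeholder
BROKER = "192.168.1.10"      # placeholder broker address

# Join the Wi-Fi network before touching MQTT
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect(WIFI_SSID, WIFI_PASS)
while not wlan.isconnected():
    time.sleep(0.5)

sensor = dht.DHT22(Pin(15))  # data pin is a placeholder
client = MQTTClient("pico-bedroom", BROKER)
client.connect()

while True:
    sensor.measure()
    client.publish(b"home/sensors/bedroom/temperature", str(sensor.temperature()).encode())
    client.publish(b"home/sensors/bedroom/humidity", str(sensor.humidity()).encode())
    time.sleep(30)  # matches the cadence in the diagram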

The Realistic Expectation

A 3B parameter model running at a few tokens per second on a Pi will not replace a data analyst. It will catch obvious patterns (“temperature has been rising steadily for 3 hours”), flag values outside normal ranges (“CO2 is at 1800 ppm, significantly above typical indoor levels”), and suggest simple actions (“consider opening a window or turning on ventilation”).

That’s useful. Not magic. The value is in turning passive sensor data into a proactive notification system without building complex rule engines manually.


Takeaway: Ollama on a Pi 4 runs quantized 3B models at usable speed for batch analysis. Use MQTT as the bus between sensors, AI analysis, and Home Assistant. Set realistic expectations — local small models are useful, not omniscient.