Building a Multi-Tenant AIOps Platform with FastAPI and React
Introduction
Building enterprise-grade software is never a simple task. When I joined Discretelogix to work on utilITise — a multi-tenant AIOps platform — the challenge was clear: give IT teams full visibility into their Microsoft Intune-enrolled device fleet, automate incident detection, and resolve issues autonomously using AI.
In this post, I'll walk through the key architectural decisions we made and the lessons learned.
The Architecture
Backend: FastAPI + SQLModel + Async PostgreSQL
We chose FastAPI as our backend framework for several reasons:
- Native async support with asyncio
- Automatic OpenAPI documentation
- Pydantic v2 for strict type validation
- Better performance than Django on high-concurrency endpoints, where async request handling matters most
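from fastapi import Depends, FastAPI
from sqlmodel.ext.asyncio.session import AsyncSession

# get_session, get_current_user, User, and DeviceService are project-level
# helpers defined elsewhere in the codebase.

app = FastAPI(title="utilITise API")


@app.get("/devices/{tenant_id}")
async def get_devices(
    tenant_id: str,
    session: AsyncSession = Depends(get_session),
    current_user: User = Depends(get_current_user),
):
    # The service layer receives the tenant and the caller so it can scope the query.
    return await DeviceService.list_devices(session, tenant_id, current_user)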
Multi-Tenant Data Isolation
Every API endpoint enforces tenant-scoped queries. We implemented row-level security at the application layer, with the aim that even if an individual query has a bug, one tenant's data is not exposed to another.
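To make the idea concrete, here is a minimal sketch of application-layer tenant scoping. It is not the utilITise implementation: it uses the standard-library sqlite3 module in place of async PostgreSQL so it runs anywhere, and the names TenantContext, DeviceRepository, make_demo_db, and the devices table layout are all hypothetical. The point it illustrates is that the tenant filter lives in one place, so individual callers cannot forget it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical holder for the tenant resolved from the authenticated user."""
    tenant_id: str


class DeviceRepository:
    """All device queries go through here, so the tenant filter is always applied."""

    def __init__(self, conn: sqlite3.Connection, ctx: TenantContext):
        self._conn = conn
        self._ctx = ctx

    def list_devices(self, status=None):
        # The tenant predicate is added unconditionally; callers only add extras.
        sql = "SELECT id, name, status FROM devices WHERE tenant_id = ?"
        params = [self._ctx.tenant_id]
        if status is not None:
            sql += " AND status = ?"
            params.append(status)
        sql += " ORDER BY id"
        return self._conn.execute(sql, params).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with devices for two made-up tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO devices (tenant_id, name, status) VALUES (?, ?, ?)",
        [
            ("acme", "laptop-01", "compliant"),
            ("acme", "laptop-02", "noncompliant"),
            ("globex", "tablet-01", "compliant"),
        ],
    )
    return conn


if __name__ == "__main__":
    repo = DeviceRepository(make_demo_db(), TenantContext("acme"))
    for row in repo.list_devices():
        print(row)
```

A repository bound to one tenant simply has no method that returns another tenant's rows. Application-layer scoping like this is often paired with database-level policies as a second line of defense, but that is a separate design decision.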
Real-Time Updates with WebSockets + Redis Pub/Sub
For live device status updates, we built a WebSocket layer backed by Redis pub/sub:
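from fastapi import WebSocket

# manager (connection manager) and redis (pub/sub client wrapper) are
# project-level objects defined elsewhere.
@app.websocket("/ws/devices/{tenant_id}")
async def device_updates(websocket: WebSocket, tenant_id: str):
    await manager.connect(websocket, tenant_id)
    # Forward every message published on this tenant's channel to the client.
    async with redis.subscribe(f"tenant:{tenant_id}:devices") as channel:
        async for message in channel:
            await websocket.send_json(message)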
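The handler above relies on a manager object that the post does not show. As a rough sketch of what such a per-tenant connection manager might look like, the following keeps one set of sockets per tenant and broadcasts only within that set. The ConnectionManager class, its disconnect and broadcast methods, and the FakeWebSocket stand-in are assumptions for illustration; only connect(websocket, tenant_id) appears in the original code.

```python
import asyncio
from collections import defaultdict


class ConnectionManager:
    """Hypothetical manager that tracks open sockets per tenant."""

    def __init__(self):
        self._connections = defaultdict(set)

    async def connect(self, websocket, tenant_id: str) -> None:
        await websocket.accept()
        self._connections[tenant_id].add(websocket)

    def disconnect(self, websocket, tenant_id: str) -> None:
        self._connections[tenant_id].discard(websocket)
        if not self._connections[tenant_id]:
            del self._connections[tenant_id]

    async def broadcast(self, tenant_id: str, message: dict) -> None:
        # Iterate over a copy so failed sockets can be removed while looping.
        for ws in list(self._connections.get(tenant_id, ())):
            try:
                await ws.send_json(message)
            except Exception:
                # A real app would likely catch WebSocketDisconnect specifically.
                self.disconnect(ws, tenant_id)


class FakeWebSocket:
    """Test double exposing the two methods the manager calls."""

    def __init__(self):
        self.accepted = False
        self.sent = []

    async def accept(self) -> None:
        self.accepted = True

    async def send_json(self, data) -> None:
        self.sent.append(data)


async def demo():
    manager = ConnectionManager()
    acme_ws, globex_ws = FakeWebSocket(), FakeWebSocket()
    await manager.connect(acme_ws, "acme")
    await manager.connect(globex_ws, "globex")
    await manager.broadcast("acme", {"device": "laptop-01", "status": "offline"})
    return acme_ws, globex_ws


if __name__ == "__main__":
    acme_ws, globex_ws = asyncio.run(demo())
    print(acme_ws.sent, globex_ws.sent)
```

Keying connections by tenant means a message for one tenant is never iterated over another tenant's sockets, which mirrors the per-tenant channel naming used on the Redis side.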
The AI Layer: Anthropic Claude + FastMCP
The most exciting part of the platform is the self-healing pipeline. When an incident is detected, Claude agents autonomously:
- Diagnose the root cause using device telemetry
- Determine the optimal remediation action
- Execute the fix via Microsoft Graph API
- Report the outcome back to the dashboard
We used FastMCP to expose IT tools as callable functions that Claude can invoke during its reasoning process.
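The general pattern of exposing functions as named, callable tools can be sketched in plain Python. To be clear, this is not FastMCP's API: it is a dependency-free stand-in that shows the idea of registering functions, describing them to a model, and dispatching a call by name. The tool names restart_device and get_compliance_state, and their return values, are made up for the example.

```python
import inspect
from typing import Any, Callable

# Registry of tool name -> Python callable (a stand-in for a real MCP server).
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so it can be called by name later."""
    TOOLS[func.__name__] = func
    return func


@tool
def restart_device(device_id: str) -> dict:
    """Hypothetical tool: queue a restart for one device."""
    # A real tool would call the device management API here.
    return {"device_id": device_id, "action": "restart", "status": "queued"}


@tool
def get_compliance_state(device_id: str) -> dict:
    """Hypothetical tool: return a canned compliance state."""
    return {"device_id": device_id, "compliant": True}


def describe_tools() -> list[dict]:
    """Produce the name, description, and parameter list an agent would be shown."""
    return [
        {
            "name": name,
            "description": inspect.getdoc(func) or "",
            "parameters": list(inspect.signature(func).parameters),
        }
        for name, func in TOOLS.items()
    ]


def call_tool(name: str, arguments: dict) -> Any:
    """Dispatch a tool call, rejecting unknown tools and malformed arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    func = TOOLS[name]
    # bind() raises TypeError if arguments are missing or unexpected.
    bound = inspect.signature(func).bind(**arguments)
    return func(*bound.args, **bound.kwargs)


if __name__ == "__main__":
    print(describe_tools())
    print(call_tool("restart_device", {"device_id": "dev-1"}))
```

Checking the tool name and argument shape before dispatch is the first, cheapest layer of validation; a library such as FastMCP handles schema generation and transport on top of this basic idea.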
Lessons Learned
- Multi-tenancy is a first-class concern — design it into your data models from day one, not as an afterthought.
- WebSockets and Redis pub/sub are a powerful combo for real-time features without the complexity of a full message queue.
- LLM agents need guardrails — always validate AI-generated actions before executing them in production environments.
- TypeScript on the frontend saves hours of debugging — invest in proper type definitions for your API contracts.
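The guardrails lesson can be illustrated with a small validation step that runs before any AI-proposed action is executed. This is a sketch under assumptions, not the platform's actual policy: ProposedAction, ALLOWED_TOOLS, NEEDS_APPROVAL, and validate_action are hypothetical names, and the three checks (allowlist, tenant ownership, human approval for disruptive actions) are just one plausible set of rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical shape of an action an agent asks to run."""
    tool: str
    device_id: str
    tenant_id: str


# Assumed policy: only these tools may run, and some need a human sign-off.
ALLOWED_TOOLS = {"sync_device", "restart_device"}
NEEDS_APPROVAL = {"restart_device"}


def validate_action(action: ProposedAction, tenant_devices: dict, approved: bool = False):
    """Return (ok, reason); the caller executes the action only when ok is True."""
    if action.tool not in ALLOWED_TOOLS:
        return False, "tool not on allowlist"
    if action.device_id not in tenant_devices.get(action.tenant_id, set()):
        return False, "device does not belong to tenant"
    if action.tool in NEEDS_APPROVAL and not approved:
        return False, "human approval required"
    return True, "ok"


if __name__ == "__main__":
    devices = {"acme": {"dev-1"}}
    action = ProposedAction("restart_device", "dev-1", "acme")
    print(validate_action(action, devices))
    print(validate_action(action, devices, approved=True))
```

Returning a reason alongside the verdict makes it straightforward to log why an action was blocked and to surface that in a dashboard.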
Conclusion
Building utilITise taught me that the real challenge of enterprise software isn't the individual technologies — it's how they interact. FastAPI, React, TypeScript, Redis, and Anthropic Claude each solve a specific problem, but making them work seamlessly together requires careful architectural thinking.
If you're building something similar, feel free to reach out!