llm-reliability-platform

Currently in development: a production-oriented LLMOps and RAG platform for reliable document-based AI assistants with citations, evaluation, observability, fallback routing, cost tracking, and live deployment.

Overview

The project is designed to be more than a simple chatbot. It aims to demonstrate how RAG-based LLM applications can be built, tested, observed, rate-limited, cost-controlled, and deployed in a production-like environment.

Tech stack

Python, FastAPI, Pydantic, PostgreSQL, pgvector, Redis, Docker, GitHub Actions, OpenAI API, LangChain, LangGraph, OpenTelemetry, Prometheus, Grafana, Next.js, TypeScript, Tailwind CSS, Hetzner, Cloudflare, Caddy

Planned platform capabilities

  • Document upload, parsing, deterministic chunking, embeddings, vector search, and RAG answers with citations.
  • LLM gateway with provider abstraction, routing, fallback behavior, retries, timeouts, rate limits, token tracking, and budget checks.
  • Golden datasets, evaluation runner, citation checks, regression thresholds, and CI-based eval gates.
  • Evidence page with screenshots, evaluation reports, architecture documentation, runbooks, and deployment notes.
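
As an illustration of the planned gateway's fallback and retry behavior, the following is a minimal sketch and not the project's actual implementation. The names `Provider`, `ProviderError`, and `complete_with_fallback` are hypothetical, and each provider is modeled as a plain callable rather than a real API client. Timeouts, rate limits, and budget checks are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion
    max_retries: int = 1        # extra attempts after the first failure


def complete_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, completion)."""
    errors: List[str] = []
    for provider in providers:
        # One initial attempt plus max_retries retries per provider.
        for attempt in range(provider.max_retries + 1):
            try:
                return provider.name, provider.call(prompt)
            except ProviderError as exc:
                errors.append(f"{provider.name} attempt {attempt + 1}: {exc}")
    # Every provider exhausted its attempts.
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers for demonstration only (assumptions, not real clients).
def always_down(prompt: str) -> str:
    raise ProviderError("timeout")


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = complete_with_fallback(
    "hi", [Provider("primary", always_down), Provider("backup", echo)]
)
```

Here the primary provider fails on both of its attempts, so the request is served by the backup. A real gateway would likely add backoff between retries and record token usage per attempt.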

Architecture direction

  • FastAPI backend with Pydantic schemas, async SQLAlchemy, Alembic migrations, PostgreSQL, pgvector, and Redis.
  • Next.js web app for demo, evidence, architecture, and operational visibility.
  • Self-hosted observability with OpenTelemetry, Prometheus, Grafana, structured logs, request IDs, and trace IDs.
  • Docker Compose deployment on Hetzner with Cloudflare DNS/proxy/TLS and Caddy as reverse proxy.
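
One way structured logs could carry a request ID is sketched below using only the Python standard library. This is an assumption about how such logging might look, not the project's implementation: `request_id_var`, `new_request_id`, and `JsonFormatter` are hypothetical names, and in a FastAPI app the ID would typically be set in middleware at the start of each request.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's ID; "-" is used outside of any request.
request_id_var = contextvars.ContextVar("request_id", default="-")


def new_request_id() -> str:
    """Generate a request ID and bind it to the current context."""
    rid = uuid.uuid4().hex
    request_id_var.set(rid)
    return rid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object including the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "request_id": request_id_var.get(),
            }
        )


# Wire the formatter to a handler on a hypothetical "app" logger.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

rid = new_request_id()
logger.info("retrieved %d chunks", 4)
```

Because the ID lives in a context variable, concurrent async requests each see their own value without passing it through every function call. A trace ID from OpenTelemetry could be added to the same JSON object in a similar way.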