llm-reliability-platform
Currently in development: a production-oriented LLMOps and RAG platform for reliable document-based AI assistants with citations, evaluation, observability, fallback routing, cost tracking, and live deployment.
Overview
The project is designed to be more than a simple chatbot. It aims to demonstrate how RAG-based LLM applications can be built, tested, observed, rate-limited, cost-controlled, and deployed in a production-like environment.
Tech stack
Python, FastAPI, Pydantic, PostgreSQL, pgvector, Redis, Docker, GitHub Actions, OpenAI API, LangChain, LangGraph, OpenTelemetry, Prometheus, Grafana, Next.js, TypeScript, Tailwind CSS, Hetzner, Cloudflare, Caddy
Planned platform capabilities
- Document upload, parsing, deterministic chunking, embeddings, vector search, and RAG answers with citations.
- LLM gateway with provider abstraction, routing, fallback behavior, retries, timeouts, rate limits, token tracking, and budget checks.
- Golden datasets, evaluation runner, citation checks, regression thresholds, and CI-based eval gates.
- Evidence page with screenshots, evaluation reports, architecture documentation, runbooks, and deployment notes.
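As a rough illustration of the gateway item above, the sketch below shows one way ordered fallback, per-provider retries, and a budget check could fit together. It is not the project's implementation: `Provider`, `Gateway`, `ProviderError`, `BudgetExceeded`, and the flat `cost_per_call` field are all hypothetical names chosen for this example, and timeouts, rate limits, and token-level accounting are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""


class BudgetExceeded(Exception):
    """Raised when the next attempt would exceed the configured budget."""


@dataclass
class Provider:
    # All fields are illustrative assumptions, not a real provider SDK.
    name: str
    call: Callable[[str], str]   # prompt -> completion text
    cost_per_call: float         # flat cost per attempt, in USD
    max_retries: int = 1         # extra attempts after the first failure


@dataclass
class Gateway:
    providers: Sequence[Provider]  # ordered by preference
    budget_usd: float
    spent_usd: float = 0.0
    attempts: list = field(default_factory=list)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, completion), trying providers in order."""
        for provider in self.providers:
            for _ in range(provider.max_retries + 1):
                # Budget check happens before every attempt.
                if self.spent_usd + provider.cost_per_call > self.budget_usd:
                    raise BudgetExceeded(provider.name)
                # Assumption: failed attempts are billed as well.
                self.spent_usd += provider.cost_per_call
                self.attempts.append(provider.name)
                try:
                    return provider.name, provider.call(prompt)
                except ProviderError:
                    continue  # retry, then fall through to the next provider
        raise ProviderError("all providers failed")
```

A gateway built with a failing primary provider and a working secondary one would record every attempt in `attempts`, which gives a simple hook for the token and cost tracking listed above.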
Architecture direction
- FastAPI backend with Pydantic schemas, async SQLAlchemy, Alembic migrations, PostgreSQL, pgvector, and Redis.
- Next.js web app for demo, evidence, architecture, and operational visibility.
- Self-hosted observability with OpenTelemetry, Prometheus, Grafana, structured logs, request IDs, and trace IDs.
- Docker Compose deployment on Hetzner with Cloudflare DNS/proxy/TLS and Caddy as reverse proxy.
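To make the "structured logs with request IDs" item above more concrete, here is a minimal standard-library sketch of how a per-request ID could be attached to every JSON log line. The names `request_id_var`, `new_request_id`, and `JsonFormatter` are assumptions made for this example. In the planned stack the ID would presumably be set in a FastAPI middleware and correlated with OpenTelemetry trace IDs, which is not shown here.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID of the request currently being handled; "-" outside a request.
request_id_var = contextvars.ContextVar("request_id", default="-")


def new_request_id() -> str:
    """Generate a request ID and bind it to the current context."""
    rid = uuid.uuid4().hex
    request_id_var.set(rid)
    return rid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object including the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "request_id": request_id_var.get(),
            }
        )


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Because the ID lives in a `ContextVar`, concurrent asyncio tasks each see their own value, which suits an async backend.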