AI / LLM Engineering
Enterprise AI Assistant Platform (RAG)
A production, company-wide AI assistant grounded in internal knowledge, with fine-grained department-scoped access control and full enterprise-identity integration — end to end, from architecture through deployment.
3 (fast, primary, embeddings)
Models orchestrated
Attribute-based, by department
Access control
Zero — managed identity + vault
Credentials in app config
The Problem
Employees needed a single AI chat assistant that could answer questions using real internal knowledge (intranet pages, IT documentation, department-specific content) — but department-specific information had to stay restricted to the right audience, and the whole thing needed to be secured to enterprise identity, not a shared login.
My Approach
- Chose an open-source chat frontend and self-hosted it, backed by a multi-model LLM backend (a fast model and a primary model, plus a dedicated embeddings model) so cost and latency could be tuned per use case.
- Designed a Retrieval-Augmented Generation (RAG) proxy service that sits between the chat frontend and the LLM: it intercepts a chat request, runs a hybrid vector + semantic search against a managed search index, injects the most relevant retrieved chunks into the prompt, and only then forwards to the model — so answers are grounded in real content instead of the model's general knowledge.
- Implemented attribute-based access control: knowledge-base documents are tagged by department, and retrieval is filtered by the requesting user's department attribute from the identity provider — not by static group membership — so access stays correct as people move between departments without manual group management.
- Wired single sign-on end-to-end via OpenID Connect against the enterprise identity provider, including session handling and secure secret storage.
- Hardened the service for production before rollout: cryptographic JWT signature verification (never disabling signature checks), per-IP rate limiting, locked-down CORS, standard security headers, and file-upload validation (magic-byte checking + size limits) on the knowledge-ingestion path.
- Deployed the entire stack from Infrastructure as Code, with all secrets in a managed secrets vault accessed via managed identity — no credentials in application config.
Stack
AI/LLM
Retrieval-Augmented Generation (RAG)Vector + semantic hybrid searchMulti-model routing (chat/fast/embeddings)Self-hosted open-source chat UI
Cloud
Azure AI Foundry / Azure OpenAIAzure AI SearchAzure Container AppsAzure Cosmos DBAzure Key VaultAzure Container Registry
Backend
Python (FastAPI)
Identity/Security
OpenID Connect SSOJWT verificationAttribute-based access controlRate limitingCSP/security headers
IaC
Bicep
Practices
Secure-by-default API designManaged-identity secret accessProduction hardening checklist before go-live
Skills Demonstrated
- ▸End-to-end RAG system design and implementation
- ▸Enterprise identity integration (OIDC SSO, attribute-based authorization)
- ▸Secure API and service design for production, not just prototype quality
- ▸Full-stack ownership: from architecture and IaC through deployment and hardening
- ▸Translating a vague "AI assistant for the company" ask into a scoped, secure, shippable system