AI / LLM Engineering

Enterprise AI Assistant Platform (RAG)

A production, company-wide AI assistant grounded in internal knowledge, with fine-grained department-scoped access control and full enterprise-identity integration — end to end, from architecture through deployment.

3 (fast, primary, embeddings)

Models orchestrated

Attribute-based, by department

Access control

Zero — managed identity + vault

Credentials in app config

The Problem

Employees needed a single AI chat assistant that could answer questions using real internal knowledge (intranet pages, IT documentation, department-specific content) — but department-specific information had to stay restricted to the right audience, and the whole thing needed to be secured to enterprise identity, not a shared login.

My Approach

Chose an open-source chat frontend and self-hosted it, backed by a multi-model LLM backend (a fast model and a primary model, plus a dedicated embeddings model) so cost and latency could be tuned per use case.
Designed a Retrieval-Augmented Generation (RAG) proxy service that sits between the chat frontend and the LLM: it intercepts a chat request, runs a hybrid vector + semantic search against a managed search index, injects the most relevant retrieved chunks into the prompt, and only then forwards to the model — so answers are grounded in real content instead of the model's general knowledge.
Implemented attribute-based access control: knowledge-base documents are tagged by department, and retrieval is filtered by the requesting user's department attribute from the identity provider — not by static group membership — so access stays correct as people move between departments without manual group management.
Wired single sign-on end-to-end via OpenID Connect against the enterprise identity provider, including session handling and secure secret storage.
Hardened the service for production before rollout: cryptographic JWT signature verification (never disabling signature checks), per-IP rate limiting, locked-down CORS, standard security headers, and file-upload validation (magic-byte checking + size limits) on the knowledge-ingestion path.
Deployed the entire stack from Infrastructure as Code, with all secrets in a managed secrets vault accessed via managed identity — no credentials in application config.

Stack

AI/LLM

Retrieval-Augmented Generation (RAG)Vector + semantic hybrid searchMulti-model routing (chat/fast/embeddings)Self-hosted open-source chat UI

Cloud

Azure AI Foundry / Azure OpenAIAzure AI SearchAzure Container AppsAzure Cosmos DBAzure Key VaultAzure Container Registry

Backend

Python (FastAPI)

Identity/Security

OpenID Connect SSOJWT verificationAttribute-based access controlRate limitingCSP/security headers

IaC

Bicep

Practices

Secure-by-default API designManaged-identity secret accessProduction hardening checklist before go-live

Skills Demonstrated

▸End-to-end RAG system design and implementation
▸Enterprise identity integration (OIDC SSO, attribute-based authorization)
▸Secure API and service design for production, not just prototype quality
▸Full-stack ownership: from architecture and IaC through deployment and hardening
▸Translating a vague "AI assistant for the company" ask into a scoped, secure, shippable system