Project 01 — Agentic AI

Multi-Agent AI Research Assistant

View on GitHub ↗
Agentic AI · LangGraph · LangChain · GPT-4 · Tool Use · FastAPI · Python · Docker

The Problem

Research is tedious. You search, read, cross-reference, synthesise, and repeat — and most of it is mechanical work that doesn't need a human brain. I wanted to build a system where multiple AI agents handle different parts of the research process and coordinate with each other, the way a small research team would.

Architecture

Orchestrator Agent: LangGraph-based supervisor that breaks research queries into sub-tasks, delegates to specialist agents, and synthesises their outputs into a coherent report (routing sketched after this list)
Research Agent: Searches the web, extracts key information, and produces structured summaries with source citations
Analysis Agent: Takes research outputs and identifies patterns, contradictions, and knowledge gaps across multiple sources
Writing Agent: Produces the final structured report with proper citations, maintaining consistent tone and logical flow
Tool Layer: Web search, document parsing, citation extraction, and structured output formatting via function calling
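
To make the conditional routing concrete, here's a minimal sketch of how a supervisor graph like this wires together in LangGraph. The state schema, node names, and stubbed agent bodies are illustrative placeholders, not the project's actual code:

```python
# Minimal supervisor graph: route() is the orchestrator's decision point,
# picking the next agent based on what the shared state still lacks.
from typing import TypedDict

from langgraph.graph import END, StateGraph

class ResearchState(TypedDict):
    query: str
    findings: list[str]   # structured summaries from the research agent
    analysis: str         # patterns and gaps from the analysis agent
    report: str           # final output from the writing agent

def research(state: ResearchState) -> dict:
    # Real version: web search + extraction via tool calls
    return {"findings": state["findings"] + [f"summary for '{state['query']}'"]}

def analyse(state: ResearchState) -> dict:
    return {"analysis": f"patterns across {len(state['findings'])} source(s)"}

def write(state: ResearchState) -> dict:
    return {"report": f"Report: {state['analysis']}"}

def route(state: ResearchState) -> str:
    # Conditional routing: decide the next step from the current state
    if not state["findings"]:
        return "research"
    if not state["analysis"]:
        return "analyse"
    return "write"

graph = StateGraph(ResearchState)
graph.add_node("research", research)
graph.add_node("analyse", analyse)
graph.add_node("write", write)
graph.set_entry_point("research")
graph.add_conditional_edges("research", route)
graph.add_conditional_edges("analyse", route)
graph.add_edge("write", END)

final = graph.compile().invoke(
    {"query": "agent communication", "findings": [], "analysis": "", "report": ""}
)
print(final["report"])
```

Because each node is an ordinary function over typed state, every agent can be unit-tested without spinning up the whole graph.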

Design Decisions

  • LangGraph over raw LangChain because I needed conditional routing — the orchestrator decides which agent to call based on the current state of the research
  • Separate agents with distinct system prompts rather than one monolithic chain — each agent is focused and testable independently
  • Structured outputs via function calling to ensure agents communicate in predictable formats, not free-text (schema example below)
  • FastAPI service wrapping the agent system so it can be called as an API, not just run as a script
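
The structured-output point is easier to show than to explain. A sketch of the idea, assuming Pydantic v2; the field names are placeholders, not my actual schemas:

```python
# Agents exchange validated objects, not prose. If an agent emits junk,
# validation fails loudly instead of silently degrading downstream.
from pydantic import BaseModel, Field

class Finding(BaseModel):
    claim: str = Field(description="One factual statement from a source")
    source_url: str
    confidence: float = Field(ge=0.0, le=1.0)

class ResearchOutput(BaseModel):
    query: str
    findings: list[Finding]
    gaps: list[str] = Field(description="Questions the sources left open")

# With LangChain, function calling can force an agent to emit this schema:
#   llm = ChatOpenAI(model="gpt-4").with_structured_output(ResearchOutput)
#   out = llm.invoke("Research agent communication protocols")
# Downstream agents then read typed fields instead of parsing free text:
out = ResearchOutput.model_validate({
    "query": "agent protocols",
    "findings": [{"claim": "Structured outputs reduce drift",
                  "source_url": "https://example.com", "confidence": 0.8}],
    "gaps": ["No benchmarks beyond three agents"],
})
```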

What I Learned

The hardest part of multi-agent systems isn't building individual agents — it's designing the communication protocol between them. When agents pass free-text to each other, quality degrades fast. Structured outputs and clear state machines made the system dramatically more reliable. Also: agent orchestration frameworks are still immature, so expect to build custom routing logic.

Project 02 — MLOps

Heart Disease Prediction — End-to-End MLOps System

View on GitHub ↗
MLOps · Docker · Kubernetes · FastAPI · MLflow · Prometheus · Grafana · GitHub Actions · Python

The Problem

Everyone trains heart disease prediction models. Almost nobody deploys them properly. I wanted to see what it actually takes to go from a trained model to something running in a real infrastructure environment — versioned, reproducible, monitored, and deployed without manual steps.

What I Built

Data & Training Layer: Preprocessing pipeline with reproducible artifact management using MLflow experiment tracking
Inference Layer: FastAPI service with input validation (Pydantic), health check endpoints, and Prometheus metrics exposure (sketched after this list)
Infrastructure Layer: Docker containerisation + Kubernetes (Minikube) orchestration with declarative manifests
Observability Layer: Prometheus scraping metrics + Grafana dashboards for inference latency, error rates, and request volume
CI/CD Layer: GitHub Actions pipeline automating build, test, and deployment on every push
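
The inference layer is small enough to sketch in one file. This is the rough shape, assuming prometheus_client for the metrics endpoint; the field names, bounds, and stubbed model call are illustrative:

```python
# FastAPI service shape: Pydantic rejects bad input before it reaches the
# model, /health backs the Kubernetes probes, /metrics feeds Prometheus.
import time

from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app
from pydantic import BaseModel, Field

class PatientFeatures(BaseModel):
    age: int = Field(ge=0, le=120)
    resting_bp: float = Field(gt=0)
    cholesterol: float = Field(gt=0)

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # Prometheus scrape target

PREDICTIONS = Counter("predictions", "Predictions served")
LATENCY = Histogram("inference_latency_seconds", "Model inference latency")

@app.get("/health")
def health():
    return {"status": "ok"}  # liveness/readiness probe target

@app.post("/predict")
def predict(features: PatientFeatures):
    start = time.perf_counter()
    risk = 0.5  # real version: model.predict_proba(...) on the features
    LATENCY.observe(time.perf_counter() - start)
    PREDICTIONS.inc()
    return {"heart_disease_risk": risk}
```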

Why I Made These Choices

  • MLflow over manual tracking because I got burned trying to reproduce an experiment three weeks later and couldn't (logging sketch below)
  • FastAPI over Flask because Pydantic catches bad input before it hits the model, and the auto-docs save everyone time
  • Kubernetes even on Minikube — overkill, sure, but the manifests work identically on any real cluster
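
The MLflow discipline is only a few lines per training run. A sketch, with illustrative parameter and metric names:

```python
# Every run records its parameters, metrics, and artifacts, so "which
# settings produced that model three weeks ago?" has a stored answer.
import mlflow

mlflow.set_experiment("heart-disease")
with mlflow.start_run():
    mlflow.log_params({"model": "random_forest", "n_estimators": 200})
    # ... fit the preprocessing + model pipeline here ...
    mlflow.log_metric("roc_auc", 0.91)  # placeholder value
    # mlflow.log_artifact("preprocessing_pipeline.pkl")  # version the exact transformer
```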

What I Took Away

The biggest lesson wasn't about any single tool — it was about wiring them together. A misconfigured health check had Kubernetes restarting my pod in a loop for an hour before I figured it out. The thing that stuck: observability isn't a nice-to-have. If you can't see what your model is doing in production, you're guessing.
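
One pattern that would have prevented that restart loop (not necessarily what I shipped, but the standard fix) is separating liveness, meaning "the process is up", from readiness, meaning "the model is actually loaded", so Kubernetes stops killing a pod that is merely slow to start. A sketch with illustrative endpoint names:

```python
# Liveness says "don't restart me"; readiness says "don't route traffic
# yet". Pointing both probes at one naive endpoint conflates the two.
from fastapi import FastAPI, Response

app = FastAPI()
model = None  # set by a startup hook once the model file is loaded

@app.get("/livez")
def livez():
    return {"status": "alive"}  # failing this probe triggers a restart

@app.get("/readyz")
def readyz(response: Response):
    if model is None:
        response.status_code = 503  # not ready: pause traffic, don't restart
        return {"status": "loading"}
    return {"status": "ready"}
```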

Project 03 — Data Engineering

Customer Churn Prediction — Automated ELT Pipeline & ML System

View on GitHub ↗
Python · Prefect · ETL Pipeline · Supervised Learning · Data Engineering · Automation · PostgreSQL

The Problem

Churn data is scattered across CRM exports, usage logs, and billing records — and it's always messier than you expect. The real challenge isn't training a model. It's building a pipeline that reliably pulls from all these sources, cleans the data, and feeds it into a training loop without someone manually fixing things every time.

What I Built

ELT Pipeline: Prefect-orchestrated Python pipeline that pulls from multiple data sources and consolidates everything into a clean training set (skeleton below)
Data Quality: Validation and cleaning steps that catch the kind of data issues that silently tank model performance
Model Training: Tested several supervised learning approaches and landed on one that hits 90%+ accuracy consistently
Speed: Profiled and optimised the pipeline until it ran 40% faster — turns out my first attempt had some embarrassingly inefficient data loading
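
The flow's skeleton looks roughly like this, assuming Prefect 2.x; the source names and cleaning steps stand in for the real connectors:

```python
# ELT skeleton: extract from each source (with retries), validate the
# merged frame, then load the clean training set for the model step.
import pandas as pd
from prefect import flow, task

@task(retries=2)
def extract(source: str) -> pd.DataFrame:
    # Real version: CRM exports, usage logs, billing records
    return pd.DataFrame({"customer_id": [1, 2], "source": [source, source]})

@task
def validate(df: pd.DataFrame) -> pd.DataFrame:
    # The silent model-killers: nulls, duplicates, impossible values
    return df.dropna().drop_duplicates()

@task
def load(df: pd.DataFrame) -> None:
    df.to_csv("training_set.csv", index=False)  # real version: PostgreSQL

@flow
def churn_elt():
    frames = [extract(s) for s in ("crm", "usage", "billing")]
    load(validate(pd.concat(frames, ignore_index=True)))

if __name__ == "__main__":
    churn_elt()  # or run on a schedule via a Prefect deployment
```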

Outcome

A pipeline that runs on schedule, processes messy multi-source data, retrains the model, and doesn't need me to hold its hand. The 40% speed-up came entirely from those data-loading fixes; profiling found bottlenecks that guessing never would have.

Project 04 — Production Case Study

Adobe Journey Optimizer — Production Campaign Platform at Scale

ℹ️ This is a case study of my production work at HCLTech. No proprietary code is shared — only outcomes, decisions, and learnings.
Production · Adobe Journey Optimizer · 20M+ Users · Campaign Engineering · Data Pipelines · Personalization

What I Owned

  • Ran the whole lifecycle of in-app campaigns through Adobe Journey Optimizer — from setup to delivery to post-launch monitoring
  • Shipped 40+ campaigns during Black Friday and Cyber Monday, which is exactly when you can't afford to mess up
  • Reached 20M+ users with these campaigns and saw CTR and purchase completion rise 10-15% through better targeting
  • Made sure everything was production-ready before it went out — QA coordination, render validation, the boring but critical stuff
  • Built the data pipelines that power the personalisation engine behind these campaigns

The Part I'm Proudest Of

Zero critical post-launch issues across all 40+ campaigns during the busiest retail season of the year. That doesn't happen by accident — it takes careful QA, solid audience segmentation, and a lot of coordination between engineering, QA, and product teams who all have different priorities.

Project 05 — LLM + RAG

AI-Powered Customer Support Ticket System

View on GitHub ↗
Python · FastAPI · LangChain · RAG · PostgreSQL · Docker · Sentence Transformers · ChromaDB

The Problem

I watched support agents spend their days answering the same questions that were already solved somewhere in the knowledge base. The answers existed — they were just buried. I wanted to build something that could read an incoming ticket, figure out what kind of issue it is, find similar past resolutions, and draft a response. Not to replace agents, but to stop them from doing the same repetitive lookup work over and over.

What I Built

Ticket Classification: A classifier that sorts tickets into categories (billing, technical, account, feature request) — gets it right about 92% of the time
RAG-Powered Suggestions: Searches past resolved tickets and KB articles to find similar issues, then drafts a response the agent can edit and send (retrieval sketch below)
Smart Routing: Figures out priority and which team should handle it based on the ticket content, the customer's history, and how similar tickets were resolved before
API Layer: FastAPI service that handles ticket submission, status tracking, and plugs into the agent dashboard
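
The retrieval step is the heart of it. A minimal sketch with sentence-transformers and ChromaDB; the ticket text and collection name are made up, and the real index is batch-loaded from PostgreSQL:

```python
# Embed past resolutions locally, store them in ChromaDB, and pull the
# nearest matches for each incoming ticket to seed the draft response.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # local, no per-query cost
tickets = chromadb.Client().create_collection("resolved_tickets")

resolved = [
    ("T-101", "Refund fails with error BILL-402", "Re-run the payout job"),
    ("T-102", "Password reset link expired", "Reissue from the admin panel"),
]
tickets.add(
    ids=[t[0] for t in resolved],
    embeddings=[model.encode(t[1]).tolist() for t in resolved],
    metadatas=[{"resolution": t[2]} for t in resolved],
)

incoming = "Customer's refund keeps failing, they see BILL-402"
hits = tickets.query(
    query_embeddings=[model.encode(incoming).tolist()], n_results=1
)
print(hits["metadatas"][0][0]["resolution"])  # feeds the drafting prompt
```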

Why I Made These Choices

  • Sentence-transformers instead of OpenAI's API for embeddings — I wanted this to run without racking up per-query costs
  • Hybrid search (vector + BM25) because technical tickets are full of specific error codes and product names that need exact matching (scorer sketch below)
  • PostgreSQL handles the ticket lifecycle and audit trail, ChromaDB handles the vector search — each does what it's good at
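
The hybrid part is just a weighted blend of two scores. A sketch using rank_bm25 alongside the embeddings above; the fusion weight is illustrative and worth tuning:

```python
# Lexical BM25 catches exact tokens like "BILL-402"; embeddings catch
# paraphrases; a weighted sum of normalised scores ranks candidates.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "Refund fails with error BILL-402",
    "Password reset link expired",
    "Billing proration question on plan upgrade",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
bm25 = BM25Okapi([d.lower().split() for d in docs])
doc_vecs = model.encode(docs, normalize_embeddings=True)

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    lexical = np.asarray(bm25.get_scores(query.lower().split()))
    lexical = lexical / (lexical.max() or 1.0)  # crude normalisation
    semantic = doc_vecs @ model.encode(query, normalize_embeddings=True)
    return alpha * lexical + (1 - alpha) * semantic

best = docs[int(np.argmax(hybrid_scores("refund error BILL-402")))]
print(best)
```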

Outcome

In testing, first-response time dropped by about 60% compared to manual triage. The RAG component finds relevant past resolutions for over 75% of incoming tickets. The agents still make the final call on every response — the system just does the tedious lookup work for them.