About
I am currently pursuing my M.S. in Computer Science at the University of Colorado Boulder, where I also conduct AI research at the Leeds School of Business. My work focuses on building multi-agent LLM systems and production-grade MLOps infrastructure.
Previously, I spent two years at Wells Fargo scaling virtual assistants to a quarter-million enterprise users; today I build data platforms at Credible Data that make complex analytics accessible to everyone. I care deeply about performance, reliability, and the craft of writing clean, maintainable code.
When I am not coding, you will find me optimizing LLM inference engines or exploring the frontiers of RAG systems. I believe the best software feels invisible: it just works, beautifully.
Experience
AI Research Intern
Leeds School of Business, CU Boulder • Boulder, CO
- Built multi-agent LLM advisory platform with React/FastAPI on GCP Cloud Run, serving 100+ concurrent users
- Integrated Gemini and Llama with live model switching via Kubernetes, achieving 2-3s response times
- Engineered RAG pipeline with context-aware memory, processing 20+ academic documents per session
- Cut student research onboarding time 40% through AI-assisted document summarization
Software Engineer
Credible Data • Boulder, CO
- Built data platform in React/TypeScript/Node.js with 7 chart types and Malloy semantic query support
- Integrated Claude Agent SDK for autonomous AI data discovery with real-time streaming
- Designed multi-database architecture across PostgreSQL, Neo4j, and BigQuery
- Deployed containerized pnpm monorepo to GCP Cloud Run via GitHub Actions CI/CD
Software Engineer
Wells Fargo • Hyderabad, India
- Scaled virtual assistant to 250,000+ enterprise users with 3s response times
- Built BERT intent recognition pipeline with 16 intents and 8 NER categories
- Achieved 95%+ test coverage with xUnit/Moq, cutting production bugs 40%
- Automated multi-environment CI/CD with Enterprise Pipeline, enabling same-day releases
Selected Projects
MLOps Platform
End-to-end model training and deployment infrastructure
- Cut model deployment time from days to 15 minutes via automated CI/CD
- Engineered FastAPI serving with A/B testing at 1,000 req/sec and <50ms p95 latency
- Integrated Prometheus/Grafana with automated drift detection and one-click rollback
Optimized LLM Inference Engine
High-performance language model serving infrastructure
- Optimized Llama-3-8B on CUDA with vLLM, achieving 3x throughput improvement
- Applied INT8/INT4 quantization and Flash Attention 2, cutting GPU memory 60%
- Deployed streaming FastAPI server with WebSockets serving 50+ concurrent users at <200ms TTFT
GradCompass
AI-powered graduate application assistant with specialized agents
- Built platform with 6 specialized LLM agents serving 20+ testers, with coverage of 2,000+ universities
- Architected FastAPI backend with PostgreSQL, Google OAuth, and SQLAlchemy ORM
- Built AI visa interview simulator with the Gemini API and speech-to-text, achieving 2s response times
Cine-Stellation
Interactive movie recommendation engine with visual discovery
- Built TF-IDF recommendation engine across 9,600+ films at 90% accuracy and <1s query response
- Developed Next.js frontend with real-time HTML5 Canvas constellation graphs
Skills
Languages
Frameworks
Machine Learning
Databases
Infrastructure
Get in Touch
I am always open to discussing new projects, research opportunities, or just chatting about the future of AI infrastructure.
© 2025 Girish Jeswani. Built with care in Boulder, Colorado.