Yash Ghogre
AI Engineer
AI Engineer transforming complex problems into intelligent, scalable solutions, from custom ML models to integrated LLM applications.
About Me
I'm an AI Engineer driven by a deep curiosity to understand how complex systems work, not just that they do. This curiosity has led me to build foundational LLM architectures like LLaMA 2 and GPT-2 from scratch just to see their inner workings. I apply this same "from-the-ground-up" mindset to solve practical problems, whether I'm architecting a scalable memory framework or optimizing code to win a GPU-accelerated computing codeathon. I thrive on bridging the gap between deep theory and real-world application, building intelligent solutions that are robust and highly efficient.
Featured Projects
Rivet: Autonomous AI Software Engineer
Developed an autonomous coding agent capable of end-to-end software development. Rivet uses iterative reasoning loops to plan changes, execute them, and debug the results across complex codebases, following an engineer's full workflow from task breakdown to final implementation.
Mem1: Memory Framework for LLMs
Independently developed a scalable memory framework for LLMs and autonomous agents based on the Mem0 research paper, engineering a multi-component retrieval pipeline and a CLI assistant.
Core LLM Architecture (LLaMA 2 & GPT-2)
Engineered complete, from-scratch PyTorch implementations of LLaMA 2 (7B) and GPT-2 (124M), demonstrating deep proficiency in modern transformer design and components such as rotary position embeddings (RoPE), grouped-query attention (GQA), and KV caching.
Autograd Engine from Scratch
Designed and implemented a Python-based automatic differentiation engine, supporting dynamic computation graphs and diverse tensor operations, improving computational efficiency by 30%.
Tech Stack
Programming Languages
Frameworks/Libraries
Databases
Cloud & Tools
Experience
Turbo ML (Puch AI)
AI Engineering Intern (Core LLM & Agents)
- Architected a LangGraph multi-agent system for autonomous research, reducing user research time by ~60%.
- Deployed self-hosted search infrastructure for a WhatsApp chatbot, achieving low-latency retrieval without external APIs.
- Implemented a production RAG pipeline, improving factual accuracy by 40% and reducing hallucinations.
- Engineered a geolocation engine that handles unstructured user intent, boosting local search relevance by 30%.
- Deployed stateful workflows to Kubernetes, enabling horizontal scaling for concurrent user sessions.
Dunlin
ML Intern (Model Serving & MLOps)
- Built an ensemble voting system (DistilBERT + AutoGluon), improving transaction classification by 20%.
- Reduced P95 inference latency using async FastAPI endpoints with request batching and improved resource utilization.
- Implemented AWS S3 model versioning and artifact management, ensuring 100% pipeline reproducibility.
Education
Bachelor of Technology
Computer Technology
Yeshwantrao Chavan College of Engineering, Nagpur
Achievements
Winner
GPU-Accelerated Computing and Codeathon
Runner-up
Kaggle Datathon Competition