Available for Work

Yash Ghogre

AI Engineer

Nagpur, MH

AI Engineer transforming complex problems into intelligent, scalable solutions, from custom ML models to integrated LLM applications.

About Me

I'm an AI Engineer driven by a deep curiosity to understand how complex systems work, not just that they do. This curiosity has led me to build foundational LLM architectures like LLaMA 2 and GPT-2 from scratch just to see their inner workings. I apply this same "from-the-ground-up" mindset to solve practical problems, whether I'm architecting a scalable memory framework or optimizing code to win a GPU-accelerated computing codeathon. I thrive on bridging the gap between deep theory and real-world application, building intelligent solutions that are robust and highly efficient.

Featured Projects

Rivet: Autonomous AI Software Engineer

Developed an autonomous coding agent capable of end-to-end software development. Rivet leverages advanced reasoning loops to plan, execute, and debug complex codebases, simulating a full-cycle engineer's workflow from task breakdown to final implementation.

Python
LangGraph
Docker

Mem1: Memory Framework for LLMs

Independently developed a scalable memory framework for LLMs and autonomous agents based on the Mem0 research paper, engineering a multi-component retrieval pipeline and a CLI assistant.

Python
Qdrant
MongoDB

Core LLM Architecture (LLaMA 2 & GPT-2)

Engineered complete, from-scratch PyTorch implementations of LLaMA 2 (7B) and GPT-2 (124M), demonstrating deep proficiency in modern transformer design and components like RoPE, GQA, and KV Caching.

PyTorch
Python
CUDA

Autograd Engine from Scratch

Designed and implemented a Python-based automatic differentiation engine, supporting dynamic computation graphs and diverse tensor operations, improving computational efficiency by 30%.

Python

Tech Stack

Programming Languages

Python
C++
C
JavaScript

Frameworks/Libraries

PyTorch
FastAPI
Next.js
React
Express.js
Node.js
NumPy
Pandas
Scikit-learn

Databases

MongoDB
SQL
Redis

Cloud & Tools

AWS (S3)
Docker
Git
HTML
CSS
Socket.IO

Experience

Turbo ML (Puch AI)

AI Engineering Intern (Core LLM & Agents)

Apr 2025 – Oct 2025, Remote (CA, USA)
  • Architected a LangGraph multi-agent system for autonomous research, reducing user research time by ~60%.
  • Deployed self-hosted search infrastructure for a WhatsApp chatbot, achieving low-latency retrieval without external APIs.
  • Implemented a production RAG pipeline, improving factual accuracy by 40% and reducing hallucinations.
  • Engineered a geolocation engine for unstructured intent, boosting local search relevance by 30%.
  • Deployed stateful workflows to Kubernetes, enabling horizontal scaling for concurrent user sessions.

Dunlin

ML Intern (Model Serving & MLOps)

Jun 2024 – Sep 2024, Remote (DE, USA)
  • Built an ensemble voting system (DistilBERT + AutoGluon), improving transaction classification by 20%.
  • Reduced P95 inference latency using async FastAPI endpoints with request batching and optimized resource utilization.
  • Implemented AWS S3 model versioning and artifact management, ensuring 100% pipeline reproducibility.

Education

Bachelor of Technology

Computer Technology

Yeshwantrao Chavan College of Engineering, Nagpur

June 2026, GPA: 8.01

Achievements

Winner

GPU-Accelerated Computing and Codeathon

Runner-up

Kaggle Datathon Competition