Available for Work

Yash Ghogre

AI Engineer

Nagpur, MH

AI Engineer transforming complex problems into intelligent, scalable solutions—from custom ML models to integrated LLM applications.

About Me

I'm an AI Engineer driven by a deep curiosity to understand how complex systems work, not just that they do. This curiosity has led me to build foundational LLM architectures like LLaMA 2 and GPT-2 from scratch just to see their inner workings. I apply this same "from-the-ground-up" mindset to solve practical problems, whether I'm architecting a scalable memory framework or optimizing code to win a GPU-accelerated computing codeathon. I thrive on bridging the gap between deep theory and real-world application, building intelligent solutions that are robust and highly efficient.

Featured Projects

Rivet: Autonomous AI Software Engineer

Developed an autonomous coding agent capable of end-to-end software development. Rivet leverages advanced reasoning loops to plan, execute, and debug complex codebases, simulating a full-cycle engineer's workflow from task breakdown to final implementation.

Python

LangGraph

Docker

Mem1: Memory Framework for LLMs

Independently developed a scalable memory framework for LLMs and autonomous agents based on the Mem0 research paper, engineering a multi-component retrieval pipeline and a CLI assistant.

Python

Qdrant

MongoDB

Core LLM Architecture (LLaMA 2 & GPT-2)

Engineered complete, from-scratch PyTorch implementations of LLaMA 2 (7B) and GPT-2 (124M), demonstrating deep proficiency in modern transformer design and components like RoPE, GQA, and KV Caching.

PyTorch

Python

CUDA

Autograd Engine from Scratch

Designed and implemented a Python-based automatic differentiation engine, supporting dynamic computation graphs and diverse tensor operations, improving computational efficiency by 30%.

Python

Tech Stack

Programming Languages

Python

C++

JavaScript

Frameworks/Libraries

PyTorch

FastAPI

Next.js

React.js

Express.js

Node.js

NumPy

Pandas

Scikit-learn

Databases

MongoDB

SQL

Redis

Cloud & Tools

AWS (S3)

Docker

Git

HTML

CSS

Socket.IO

Experience

Turbo ML (Puch AI)

AI Engineering Intern (Core LLM & Agents)

Apr 2025 – Oct 2025

Remote (CA, USA)

Architected a LangGraph multi-agent system for autonomous research, reducing user research time by ~60%.
Deployed self-hosted search infrastructure for a WhatsApp chatbot, achieving low-latency retrieval without external APIs.
Implemented a production RAG pipeline, improving factual accuracy by 40% and reducing hallucinations.
Engineered a geolocation engine for unstructured intent, boosting local search relevance by 30%.
Deployed stateful workflows to Kubernetes, enabling horizontal scaling for concurrent user sessions.

Dunlin

ML Intern (Model Serving & MLOps)

Jun 2024 – Sep 2024

Remote (DE, USA)

Built an ensemble voting system (DistilBERT + AutoGluon), improving transaction classification by 20%.
Reduced P95 inference latency using async FastAPI endpoints with request batching and optimized resource utilization.
Implemented AWS S3 model versioning and artifact management, ensuring 100% pipeline reproducibility.

Education

Bachelor of Technology

Computer Technology

Yeshwantrao Chavan College of Engineering, Nagpur

June 2026

GPA: 8.01

Achievements

Winner

GPU-Accelerated Computing and Codeathon

Runner-up

Kaggle Datathon Competition