projects

ASR NEMO TOKENIZER

2024

Built a tokenizer for Malayalam, a Dravidian language spoken by 45 million people across southern India. The implementation plugs into NVIDIA's NeMo ASR framework and applies linguistic knowledge of the language to improve speech recognition for a low-resource setting.

ASR, NVIDIA NeMo, open source

GOOGLE MAPS SCRAPER

2024

A Python-based web scraping tool that extracts and compiles data from Google Maps. This project is designed to gather business information, reviews, and ratings, providing a comprehensive dataset for analysis.

Python, web scraping, Google Maps

MedSynth

2025

A tool for generating synthetic medical data with semantic retrieval and validation. The project focuses on creating realistic patient records while preserving data privacy and complying with regulations.

medical, data generation, privacy

ClinomeX

2025

A comprehensive web application for analyzing patient data to assess cancer and diabetes risk based on genetic markers and personal health metrics.

medical, health, data analysis

AI Web Assistant (UNDER DEVELOPMENT)

2025

A Streamlit application that leverages Google Gemini (via LangChain) and the browser-use package to perform automated web research, extract key information, and generate structured summaries on any query.

Streamlit, Google Gemini, web research

SmolDocling OCR (UNDER DEVELOPMENT)

2025

A desktop-ready OCR tool that uses Streamlit and Hugging Face's SmolDocling-256M model to process JPEG/PNG documents entirely on-device, outputting plain text or structured JSON with no internet connection or external APIs required.

Streamlit, OCR, Hugging Face