projects
ASR NEMO TOKENIZER
2024
Engineered a tokenization system for Malayalam, a Dravidian language spoken by 45 million people in southern India. Built on NVIDIA's NeMo ASR framework, it applies linguistic knowledge of the language to speech recognition, with the goal of improving accuracy in a low-resource setting.
GOOGLE MAPS SCRAPER
2024
A Python-based web scraping tool that extracts and compiles data from Google Maps. This project is designed to gather business information, reviews, and ratings, providing a comprehensive dataset for analysis.
MedSynth
2025
A tool for generating synthetic medical data with semantic retrieval and validation. It creates realistic patient records while preserving data privacy and supporting regulatory compliance.
ClinomeX
2025
A comprehensive web application for analyzing patient data to assess cancer and diabetes risk based on genetic markers and personal health metrics.
AI Web Assistant UNDER DEVELOPMENT
2025
A Streamlit application that leverages Google Gemini (via LangChain) and the browser-use package to perform automated web research, extract key information, and generate structured summaries on any query.
SmolDocling OCR UNDER DEVELOPMENT
2025
A desktop-ready OCR tool that uses Streamlit and Hugging Face’s SmolDocling-256M model to process JPEG/PNG documents entirely on-device, outputting plain text or structured JSON with no internet connection or external APIs needed.