Hello! I'm Shagun Dwivedi
I'm an AI researcher with a passion for data, writing systems, languages, and accuracy. I currently work at the Centre for Interdisciplinary AI, FLAME University, Pune, India. I have a BS in Data Science and Applications from IIT Madras and a BA (Hons) in Ancient Indian History and Archaeology from the University of Lucknow.
I'm interested in natural language processing, the use of language models in machine perception, and evaluation benchmarks and metrics for LLMs.
Check out some of my projects
Analysis of Different Tokenizers and Their Effect on Downstream Tasks for Hindi and Marathi
Evaluation of various subword tokenization methods and their impact on language modeling performance for Hindi and Marathi.
Proposing a tokenization method that could reduce sub-optimal tokenization, in particular the over-segmentation of text in Indic languages, and its effect on language modeling (using T5).
MANUSCRIPT IN REVIEW
Downscaling Terrestrial Water Storage Anomalies
Algorithm that produces terrestrial water storage anomaly (TWSA) estimates at a higher spatial resolution using 6 months of precipitation and temperature data.
Implemented a neighborhood-weighted iterative optimization algorithm to downscale TWSA from 3°×3° to 0.1°×0.1° resolution, achieving better RMSE and spatial frequency scores than GLDAS estimates.
MANUSCRIPT IN PREPARATION
Converting Custom-Embedded Subsetted Non-Unicode Fonts to Searchable Formats
Extracting Indic text from non-Unicode standard PDFs with embedded fonts to a searchable and Unicode-compliant format.
Expanding on a previously published case study in Gujarati by including Hindi, Marathi, and Bangla documents.
Handwritten Text Recognition for Pre-Colonial Sanskrit manuscripts
Digitizing manuscripts on Vedanta and Mimansa to make them searchable.
Fine-tuned a ResNet-BiLSTM-CTC model for handwritten manuscripts that outperforms out-of-the-box HTR models on grapheme cluster error rate and character error rate.
AI Language Practice Applications for French and German
Practice platform for beginner French 101 and German 101 courses, customised to university course syllabi.
Designed a multi-agent system, supported by several custom prompt-engineered OpenAI bots, for the dialog framework. Presented at AIET 2025.
Customer Behavior Prediction
Predicting user behaviour based on the restaurant's offers.
Preprocessed the data, then trained, validated, and evaluated multiple supervised classification models and selected the best-performing one.
Indian Weather API
API for India Meteorological Department Weather Data
Access a brief or detailed weather report based on a station's name. Get the forecast for the next seven days.
AirIndia Case Study
Analysis and formulation of growth strategies for AirIndia after its acquisition by TATA Group
Case study ranked first in Winter Consulting'22, IIT Guwahati. Conducted extensive market research and formulated strategies for the airline's improvement based on the growth strategy framework.
TrackOn App
Flask-based web application for tracking habits, activities, and other life parameters.
Users can register and log in to create multiple trackers, each with multiple logs, and review their progress over time with graphs and trend lines.