Hi! I’m a freshman at Caltech majoring in Computer Science.
I’m interested in machine learning, particularly its applications to scientific research, as well as signal processing and full-stack development.
You can reach me here:
I also have profiles on Google Scholar and ORCiD.
Publications
- Arjun Sharma, Vinesh Maguire Rajpaul, "Positive and unlabelled machine learning reveals new fast radio burst repeater candidates", Monthly Notices of the Royal Astronomical Society, Volume 533, Issue 3, September 2024
doi: 10.1093/mnras/stae1972
Projects
- Fast radio bursts and positive unlabelled machine learning
2023-2024
Fast radio bursts (FRBs) are high-energy radio transients of unknown astronomical origin. Some FRB sources repeat, offering valuable scientific insights. However, past uses of supervised machine learning (ML) to identify ‘hidden repeaters’ overlooked the positive and unlabelled (PU) nature of FRB repeater data: true labels exist only for known repeaters, while every other source is unlabelled rather than confirmed as non-repeating. The key contribution of my paper was to identify the need for different, PU-specific techniques for this problem.
I trained and optimised an ensemble of five PU-specific classifiers, which identified 66 repeater candidates with high confidence, 18 of which had been missed by previous techniques. Compared to supervised methods, the PU classifiers improved recall by 4.62% and consistently scored better on PU-specific metrics.
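Because unlabelled FRBs cannot be treated as confirmed negatives, ordinary precision cannot be computed directly. As an illustration of what a PU-specific metric looks like, the sketch below implements the criterion proposed by Lee and Liu, recall² / P(ŷ = 1), where recall is estimated on the labelled positives only. If labelled positives are a random sample of all positives, this quantity is proportional to precision × recall. It is an example of this kind of metric, not necessarily one used in the paper, and the function name is hypothetical.

```python
import numpy as np


def lee_liu_score(is_labelled, y_pred):
    """PU-specific score recall**2 / P(y_pred = 1) (Lee and Liu's criterion).

    Illustrative helper; the name is hypothetical.
    is_labelled: 1 for known (labelled) positives, 0 for unlabelled examples.
    y_pred:      the classifier's 0/1 predictions for the same examples.
    Assumes at least one labelled positive.
    """
    is_labelled = np.asarray(is_labelled, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Recall can only be estimated on the examples whose label we actually know.
    recall = y_pred[is_labelled].mean()
    # Fraction of *all* examples predicted positive; no negative labels needed.
    predicted_positive_rate = y_pred.mean()
    if predicted_positive_rate == 0:
        return 0.0
    return float(recall ** 2 / predicted_positive_rate)


# Usage: 2 known repeaters among 10 sources.
labelled = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
selective = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # flags both known repeaters plus one candidate
everything = [1] * 10                        # flags every source as a repeater
print(lee_liu_score(labelled, selective))   # about 3.33 (= 1 / 0.3)
print(lee_liu_score(labelled, everything))  # 1.0
```

Both classifiers have perfect recall on the known repeaters, but the one that flags every source is penalised for its high predicted-positive rate.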
I presented my findings at the 2023 MIT Undergraduate Research Technology Conference and published them in the Monthly Notices of the Royal Astronomical Society. The project was also one of 20 to win the grand prize at India’s 2024 IRIS National Science Fair, and was selected as part of Team India for the 2024 International Science & Engineering Fair in Los Angeles.
The code for this project was written in Python and made extensive use of the scikit-learn, SciPy, Optuna and pandas libraries. To provide standardised methods for fitting and inference, I created custom open-source implementations of pre-existing PU learning algorithms, including Modified PU Logistic Regression and PUExtraTrees.
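To illustrate what a PU learner with standardised fitting and inference methods can look like, here is a minimal sketch of the classic Elkan and Noto approach. It is neither of the algorithms named above nor the project’s code. It trains an ordinary classifier g(x) to separate labelled (s = 1) from unlabelled (s = 0) examples, estimates c = P(s = 1 | y = 1) as the mean of g over held-out labelled positives, and rescales g(x) / c into an estimate of P(y = 1 | x). It assumes labelled positives are selected completely at random. The plain NumPy logistic regression, the class name `ElkanNotoPU` and its parameters are illustrative assumptions chosen to keep the example dependency-light.

```python
import numpy as np


class ElkanNotoPU:
    """Minimal PU classifier after Elkan and Noto, with a scikit-learn-style interface.

    Illustrative sketch only (hypothetical class, not the project's implementation).
    s = 1 marks labelled positives; s = 0 marks unlabelled examples, which mix
    positives and negatives. Assumes labelled positives are a uniformly random
    sample of all positives, and that the held-out split contains at least one.
    """

    def __init__(self, lr=0.1, n_iter=3000, holdout=0.2, seed=0):
        self.lr = lr            # gradient-descent step size (illustrative default)
        self.n_iter = n_iter    # number of full-batch gradient steps
        self.holdout = holdout  # fraction of data held out to estimate c
        self.seed = seed

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def _g(self, X):
        # "Non-traditional" classifier g(x), an estimate of P(s = 1 | x).
        return self._sigmoid(X @ self.w_ + self.b_)

    def fit(self, X, s):
        X = np.asarray(X, dtype=float)
        s = np.asarray(s, dtype=float)
        order = np.random.default_rng(self.seed).permutation(len(s))
        n_hold = int(len(s) * self.holdout)
        hold, train = order[:n_hold], order[n_hold:]

        # Step 1: plain logistic regression of s (labelled vs unlabelled) on x.
        self.w_, self.b_ = np.zeros(X.shape[1]), 0.0
        Xt, st = X[train], s[train]
        for _ in range(self.n_iter):
            err = self._g(Xt) - st  # gradient of the mean log-loss w.r.t. the logit
            self.w_ -= self.lr * Xt.T @ err / len(st)
            self.b_ -= self.lr * err.mean()

        # Step 2: c = P(s = 1 | y = 1), estimated on held-out labelled positives.
        held_pos = hold[s[hold] == 1]
        self.c_ = float(self._g(X[held_pos]).mean())
        return self

    def predict_proba(self, X):
        # Step 3: P(y = 1 | x) = P(s = 1 | x) / c, clipped to a valid probability.
        p = np.clip(self._g(np.asarray(X, dtype=float)) / self.c_, 0.0, 1.0)
        return np.column_stack([1.0 - p, p])

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X)[:, 1] >= threshold).astype(int)


# Usage on synthetic data: 500 positives, 500 negatives, 150 positives labelled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (500, 2)), rng.normal(-2.0, 1.0, (500, 2))])
s = np.zeros(1000)
s[rng.choice(500, size=150, replace=False)] = 1  # true c is 0.3 by construction
model = ElkanNotoPU().fit(X, s)
print(model.c_, model.predict(X[:5]))
```

Treating every unlabelled source as a negative would push g(x) below 0.5 for most true positives; dividing by c is what corrects for the missing labels.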
I am grateful for the mentorship of Dr. Vinesh Maguire Rajpaul (University of Cambridge) in this project.
- Fourier transforms and audio recognition: signal processing research paper
2023-2024
For my IB Mathematics Extended Essay, I studied and implemented signal processing methods to rapidly identify short audio clips.
I developed and analysed a technique that generated memory-efficient hashes of a database of audio clips by taking Hann-windowed short-time Fourier transforms of approximately 1-second frames. A simple matching algorithm then identified arbitrary 10-second clips from the database with 96.8% accuracy, and 2.5-second clips with 93.8% accuracy. With background noise added at the same intensity as the main audio, it still achieved 82.8% accuracy on 10-second clips.
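A rough NumPy-only sketch of this kind of pipeline: each non-overlapping ~1-second frame is multiplied by a Hann window, transformed with a real FFT, and reduced to a compact hash made of the strongest frequency bin in each of a few bands. Matching slides the clip’s hashes along each stored track and counts exact agreements. The band edges, the hash design and the function names are hypothetical choices for illustration, not the essay’s exact method, and the sketch assumes clips start on a frame boundary (overlapping frames would relax this).

```python
import numpy as np

FRAME_SECONDS = 1.0  # ~1 s frames, as described above
# Hypothetical frequency bands (Hz); each contributes one peak to the hash.
# Requires a sample rate of at least 8 kHz so every band lies below Nyquist.
BANDS = [(40, 300), (300, 1000), (1000, 2000), (2000, 4000)]


def fingerprint(samples, sr):
    """Return one hash per non-overlapping frame: the peak FFT bin in each band."""
    n = int(sr * FRAME_SECONDS)
    window = np.hanning(n)  # Hann window reduces spectral leakage
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    band_bins = [np.flatnonzero((freqs >= lo) & (freqs < hi)) for lo, hi in BANDS]
    hashes = []
    for start in range(0, len(samples) - n + 1, n):
        spectrum = np.abs(np.fft.rfft(samples[start:start + n] * window))
        # A 4-integer tuple per frame is far smaller than the raw spectrum.
        hashes.append(tuple(int(b[np.argmax(spectrum[b])]) for b in band_bins))
    return hashes


def best_match(clip_hashes, database):
    """Return (track name, score) for the track and offset with the most equal hashes."""
    best_name, best_score = None, -1
    for name, track_hashes in database.items():
        for offset in range(len(track_hashes) - len(clip_hashes) + 1):
            segment = track_hashes[offset:offset + len(clip_hashes)]
            score = sum(a == b for a, b in zip(clip_hashes, segment))
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score


# Usage with synthetic "tracks": one sine tone per band, changing every second.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr


def synthetic_track(seconds):
    return np.concatenate([
        sum(np.sin(2 * np.pi * rng.integers(lo + 10, hi - 10) * t) for lo, hi in BANDS)
        for _ in range(seconds)
    ])


tracks = {"track_a": synthetic_track(8), "track_b": synthetic_track(8)}
database = {name: fingerprint(x, sr) for name, x in tracks.items()}
clip = tracks["track_a"][3 * sr:6 * sr]  # a 3 s clip starting on a frame boundary
print(best_match(fingerprint(clip, sr), database))  # ('track_a', 3)
```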
The code for this project was written in Python using the NumPy and SciPy libraries. I wrote programs to scrape public-domain audio files, generate frequency-domain hashes of an audio library, recognise clips from the library, and add random Gaussian noise.
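For the noise experiment, “equal intensity” can be read as equal average power, i.e. a signal-to-noise ratio (SNR) of 0 dB. Below is a minimal NumPy sketch of adding white Gaussian noise at a chosen SNR; the function name and the equal-power interpretation are assumptions rather than the essay’s exact procedure.

```python
import numpy as np


def add_gaussian_noise(samples, snr_db=0.0, rng=None):
    """Return samples plus white Gaussian noise at the requested SNR (in dB).

    Hypothetical helper. Assumption: "equal intensity" = equal average power,
    so snr_db = 0 gives noise whose mean power equals the signal's.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)  # SNR_dB = 10 log10(Ps / Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
    return samples + noise


# Usage: a 440 Hz tone sampled at 8 kHz for 10 s, with equal-power noise.
sr = 8000
t = np.arange(10 * sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_gaussian_noise(clean, snr_db=0.0, rng=np.random.default_rng(1))
```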
- MLCryptHunt: online platform for annual cryptic hunt event
2021-2023
I designed and built from scratch the web portal that hosted the 2021-23 editions of my school’s annual puzzle-hunt-style competition, and managed its deployment. Players, competing as members of four houses, logged in and tackled dozens of cryptic puzzles together, making tens of thousands of answer attempts over periods of 72+ hours.
Key features I implemented included synchronisation of questions across potentially hundreds of users playing concurrently for the same house; server-side moderation, logging and anti-cheating tools; and automatic switching of the website’s status between four possible modes.
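One way to picture per-house synchronisation is to keep progress server-side, keyed by house rather than by user, so every member of a house sees the same solved puzzles and every attempt is logged for moderation. The pure-Python sketch below is hypothetical: the class and method names, the house names and the answer-normalisation rule are all assumptions. In a Django deployment this state would live in the PostgreSQL database, with a transaction playing the role of the in-memory lock used here.

```python
import threading
from dataclasses import dataclass, field


def normalise(answer: str) -> str:
    """Hypothetical answer rule: ignore case and all whitespace."""
    return "".join(answer.lower().split())


@dataclass
class HouseProgress:
    solved: set = field(default_factory=set)
    # Audit log of (user, puzzle_id, raw guess) for moderation and anti-cheating review.
    attempts: list = field(default_factory=list)


class HuntState:
    """Server-side puzzle state shared by every member of a house (illustrative only)."""

    def __init__(self, answers, houses):
        self._answers = {pid: normalise(a) for pid, a in answers.items()}
        self._houses = {house: HouseProgress() for house in houses}
        self._lock = threading.Lock()  # serialises concurrent submissions

    def submit(self, house, user, puzzle_id, guess):
        """Log an attempt; if correct, mark the puzzle solved for the whole house."""
        with self._lock:
            progress = self._houses[house]
            progress.attempts.append((user, puzzle_id, guess))
            correct = normalise(guess) == self._answers[puzzle_id]
            if correct:
                progress.solved.add(puzzle_id)
            return correct

    def solved_for(self, house):
        """What any member of `house` should currently see as solved."""
        with self._lock:
            return set(self._houses[house].solved)


# Usage: one member's correct answer is immediately visible to the whole house.
state = HuntState({"p1": "Red Herring"}, houses=["Alpha", "Beta"])  # hypothetical houses
state.submit("Alpha", "alice", "p1", "red  herring")
print(state.solved_for("Alpha"))  # {'p1'}
print(state.solved_for("Beta"))   # set()
```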
I developed the platform with Django, worked with a PostgreSQL database deployed to Heroku, created REST APIs with Django REST Framework, and styled the UI using TailwindCSS.
- Secure volunteer tutoring app for schools and students
2022
I developed this all-in-one platform for facilitating volunteer tutoring for the Government of India’s Smart India Hackathon 2022, where it was declared one of five winners nationwide in the ‘Smart Education’ category and one of fifty winners overall across 3000+ submissions.
In line with recommendations from India’s 2020 National Education Policy, this app connects students to volunteer tutors to enable personalised tutoring under the supervision of their schools. Crucially, it is adapted to the diverse needs of Indian students, allowing them to find tutors with the right language, age, subjects, and educational qualifications.
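The filtering step of such a tutor search can be sketched as a simple predicate over tutor profiles. Everything here is hypothetical: the field names, the age-range preference and the ordering of qualification levels are assumptions, and in the Django backend the same filter would more naturally be expressed as a database query.

```python
from dataclasses import dataclass

# Hypothetical qualification levels, lowest to highest.
QUALIFICATIONS = ["secondary", "undergraduate", "postgraduate"]


@dataclass
class Tutor:  # hypothetical profile fields
    name: str
    languages: set
    age: int
    subjects: set
    qualification: str


@dataclass
class TutorRequest:  # hypothetical student preferences
    language: str
    subject: str
    min_age: int
    max_age: int
    min_qualification: str


def matching_tutors(tutors, request):
    """Return tutors who teach the subject in the language, fit the age range
    and hold at least the requested qualification level."""
    min_level = QUALIFICATIONS.index(request.min_qualification)
    return [
        t for t in tutors
        if request.language in t.languages
        and request.subject in t.subjects
        and request.min_age <= t.age <= request.max_age
        and QUALIFICATIONS.index(t.qualification) >= min_level
    ]


tutors = [
    Tutor("A", {"Hindi", "English"}, 20, {"Physics"}, "undergraduate"),
    Tutor("B", {"Tamil"}, 22, {"Physics"}, "postgraduate"),
    Tutor("C", {"Hindi"}, 17, {"Physics"}, "secondary"),
]
request = TutorRequest("Hindi", "Physics", 18, 30, "undergraduate")
print([t.name for t in matching_tutors(tutors, request)])  # ['A']
```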
I built the backend of this app in Python using Django, the cross-platform mobile app in Dart using Flutter, and the web dashboard with BulmaCSS. Data from a PostgreSQL database was serialised and transmitted over a REST API built using Django REST Framework, and JSON Web Tokens (JWTs) were used to store authentication state. Real-time chat was implemented using a Firebase Firestore database, static and media files were stored in AWS S3, and private Zoom meetings were generated automatically using the Zoom API.