My name is Lovkush Agarwal. I recently decided to change careers and become a data scientist. Following David Robinson's advice, I decided to create this blog to record my progress, learning, and projects.
Improving the ratings of professional squash players using data and the Elo rating system
Oct 12, 2021
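For readers unfamiliar with Elo, here is a minimal sketch of the standard update rule (my own illustration, not code from the post):

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Update rating_a after one match against rating_b.

    score_a is 1 for a win, 0 for a loss (0.5 for a draw).
    The constants 400 and k=32 are the conventional Elo choices.
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)

# Example: a 1500-rated player beats a 1600-rated player.
print(elo_update(1500, 1600, 1))  # rating rises by roughly 20 points
```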
I summarise the two main ideas in Sathe and Aggarwal's paper Similarity Forests.
Apr 17, 2021
Collider bias can completely skew research findings. I describe some examples that highlight how this non-obvious bias arises.
Feb 21, 2021
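As a taster of the kind of effect the post discusses, here is a small simulation (my own toy example, not one from the post) in which two independent traits become correlated once we condition on a collider:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two genuinely independent traits, e.g. "talent" and "likeability".
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

# A collider that both traits feed into, e.g. "gets hired".
hired = (x + y) > 1.0

print(np.corrcoef(x, y)[0, 1])                # ~0: no real association
print(np.corrcoef(x[hired], y[hired])[0, 1])  # clearly negative once we condition on the collider
```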
While trying to find the best clustering of some text data, I unintentionally stumbled upon some visually striking plots, which I think are highly aesthetic and artistic.
Jan 1, 2021
I spent a solid week improving my Demo Day talk with the help of several people at Faculty. In this post, I detail the changes that were made and the big lessons I learnt.
Dec 23, 2020
My program to process hundreds of documents would mysteriously stop on a particular document. After a couple of hours of checking all my code, I managed to isolate the problem to a particular regex search. I describe what I learnt in this blog post.
Dec 20, 2020
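The post has the details of the actual culprit; purely as a general illustration, one classic way a single regex search can appear to hang is catastrophic backtracking from nested quantifiers:

```python
import re

# A nested quantifier like (a+)+ can backtrack exponentially when the
# match ultimately fails, so this one call can take effectively forever.
pattern = re.compile(r"(a+)+b")
text = "a" * 30 + "c"   # no "b", so the match must fail

# pattern.search(text)  # uncomment to watch it hang

# A common fix is to rewrite the pattern without nested quantifiers,
# e.g. r"a+b", which fails fast on the same input.
```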
Bokeh is amazing! I learnt about it earlier this week and I want to illustrate its prowess by remaking plots from the Part II series using Bokeh.
Nov 1, 2020
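To give a flavour of how little code an interactive Bokeh plot needs (a generic example, not one of the remade plots):

```python
import numpy as np
from bokeh.plotting import figure, output_file, show

x = np.linspace(0, 2 * np.pi, 200)

# An interactive figure with the default pan/zoom/save tools.
p = figure(title="sin(x)", width=600, height=350,
           x_axis_label="x", y_axis_label="sin(x)")
p.line(x, np.sin(x), line_width=2)

output_file("sine.html")  # writes a standalone interactive HTML page
show(p)
```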
The process I used to make the animations was inefficient and not programmatic. I could not work out how to adapt the matplotlib animation tools to my situation so I asked for help. Here I describe what I learnt from the help that I received.
Oct 18, 2020
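For context, the core of matplotlib's animation tooling is FuncAnimation, which redraws the figure by calling an update function once per frame; a minimal sketch (not the post's code):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    # Redraw only the line's y-data each frame rather than rebuilding the plot.
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)

anim = FuncAnimation(fig, update, frames=100, interval=50, blit=True)
anim.save("wave.gif")  # or plt.show() for an interactive window
```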
I create various charts to help visualise the difference between L1 and L2 regularisation. The pattern is clear: L1 regularisation does tend to force parameters to zero.
Oct 11, 2020
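The effect is easy to reproduce on synthetic data; a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2), which is my own illustration rather than the post's charts:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Toy data where only 5 of 20 features actually matter.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 drives most of the irrelevant coefficients exactly to zero;
# L2 only shrinks them towards zero.
print("zero coefficients, L1:", np.sum(lasso.coef_ == 0))
print("zero coefficients, L2:", np.sum(ridge.coef_ == 0))
```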
I end this series by describing some experiments I did with the sinusoidal case, before I realised that the learning rate was too big. Spoiler alert: turns out my first instincts from Part I were correct all along...
Oct 1, 2020
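The failure mode is easy to see even in one dimension; a toy sketch (not the post's neural-network setup) of gradient descent diverging when the learning rate is too big:

```python
# Gradient descent on f(x) = x**2, whose gradient is 2*x.
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(run(lr=0.1))   # converges towards the minimum at 0
print(run(lr=1.1))   # learning rate too big: the iterates diverge
```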
I finally dip my toes into some dimension reduction and clustering algorithms, by visualising the data I scraped in Part I of this series.
Sep 28, 2020
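As a generic sketch of the pipeline (TF-IDF, then dimension reduction, then clustering), using toy sentences rather than the data scraped in Part I:

```python
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stocks fell sharply today", "markets rallied after the news"]

# Text -> sparse TF-IDF vectors -> 2 dense dimensions -> clusters.
X = TfidfVectorizer().fit_transform(docs)
X2 = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)

plt.scatter(X2[:, 0], X2[:, 1], c=labels)
plt.show()
```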
In 80,000 Hours' interview with Spencer Greenberg, Spencer describes a surprisingly simple yet largely unknown version of Bayes' Theorem via odds instead of probabilities. In this post, I will describe the various ways I have conceptualised Bayes' Theorem, ending with the interpretation that Spencer describes using odds.
Sep 24, 2020
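For reference, the odds form of Bayes' Theorem says the posterior odds are the prior odds multiplied by the likelihood ratio:

```latex
\underbrace{\frac{P(H \mid E)}{P(\lnot H \mid E)}}_{\text{posterior odds}}
= \underbrace{\frac{P(H)}{P(\lnot H)}}_{\text{prior odds}}
\times \underbrace{\frac{P(E \mid H)}{P(E \mid \lnot H)}}_{\text{likelihood ratio}}
```

For example, prior odds of 1:4 combined with evidence that is three times as likely under the hypothesis give posterior odds of 3:4.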
I practise some web-scraping and pandas manipulation by scraping squash ranking data from Wikipedia.
Sep 17, 2020
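The heavy lifting for this kind of scrape can be done by pandas.read_html; a minimal sketch where the URL and table index are illustrative guesses rather than the ones used in the post:

```python
import pandas as pd

# read_html parses every <table> on the page into a list of DataFrames
# (it needs lxml or html5lib installed).
url = "https://en.wikipedia.org/wiki/PSA_World_Rankings"  # illustrative URL
tables = pd.read_html(url)

rankings = tables[0]  # which index holds the rankings depends on the page layout
print(rankings.head())
```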
I add the stochasticity to Stochastic Gradient Descent by using mini-batches. In my previous post, I was hoping this would solve my local-minimum problem with the sinusoidal data. To my dismay, it did not help. However, I discover what the problem was all along.
Sep 17, 2020
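For concreteness, here is a minimal mini-batch SGD loop on a toy linear-regression problem (a self-contained illustration, not the code from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression: y = 3x + 1 plus noise.
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3 * X[:, 0] + 1 + rng.normal(scale=0.1, size=1000)

w, b, lr, batch_size = 0.0, 0.0, 0.1, 32

for epoch in range(20):
    # The "stochastic" part: shuffle, then update on one small batch at a time.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        w -= lr * 2 * np.mean(err * xb)   # gradient of mean squared error w.r.t. w
        b -= lr * 2 * np.mean(err)        # gradient w.r.t. b

print(w, b)  # should end up close to 3 and 1
```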
I continue my project to visualise and understand gradient descent. This time I try to fit a neural network to linear, quadratic and sinusoidal data.
Sep 11, 2020
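A compact way to reproduce this kind of experiment is scikit-learn's MLPRegressor; a sketch fitting noisy sine data (my own illustration, the posts may use a different setup):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Noisy samples from a sine curve.
X = rng.uniform(-np.pi, np.pi, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=500)

# A small fully connected network; tanh suits a smooth periodic target.
model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                     max_iter=5000, random_state=0)
model.fit(X, y)

print(model.score(X, y))  # R^2 close to 1 if the fit has captured the curve
```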