My name is Lovkush Agarwal. I recently decided to change careers and become a
data scientist. Following David Robinson’s
advice, I decided to create this blog, to record my progress, learning and projects.

Improving the ratings of professional squash players using data and the Elo rating system

Oct 12, 2021

I summarise the two main ideas in Sathe and Aggarwal's paper Similarity Forests.

Apr 17, 2021

Collider bias can completely skew research findings. I describe some examples that highlight how this non-obvious bias arises.

Feb 21, 2021

While trying to find the best clustering of some text data, I unintentionally stumbled upon some visually striking plots, which I think are highly aesthetic and artistic.

Jan 1, 2021

I spent a solid week improving my Demo Day talk with the help of several people at Faculty. In this post, I detail the changes that were made and the big lessons I learnt.

Dec 23, 2020

My program to process hundreds of documents would mysteriously stop on a particular document. After a couple of hours of checking all my code, I managed to isolate the problem to a particular regex search. I describe what I learnt in this blog post.

Dec 20, 2020

Bokeh is amazing! I learnt about it earlier this week and I want to illustrate its prowess by remaking plots from the Part II series using Bokeh.

Nov 1, 2020

The process I used to make the animations was inefficient and not programmatic. I could not work out how to adapt the matplotlib animation tools to my situation so I asked for help. Here I describe what I learnt from the help that I received.

Oct 18, 2020

I create various charts to help visualise the difference between L1 and L2 regularisation. The pattern is clear and L1 regularisation does tend to force parameters to zero.

Oct 11, 2020
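As a rough illustration of why L1 regularisation tends to force parameters to exactly zero, here is a minimal one-parameter sketch. It is not taken from the post itself: the quadratic loss, the function names and the example values are all assumptions chosen for simplicity. For the loss 0.5*(w - a)^2, adding an L1 penalty lam*|w| gives a minimiser that is exactly zero whenever |a| <= lam, whereas adding an L2 penalty 0.5*lam*w^2 only shrinks the parameter towards zero.

```python
def l1_minimiser(a, lam):
    """Minimiser of 0.5*(w - a)**2 + lam*abs(w), i.e. soft-thresholding.

    Hypothetical helper for illustration only.
    """
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # any |a| <= lam is mapped to exactly zero


def l2_minimiser(a, lam):
    """Minimiser of 0.5*(w - a)**2 + 0.5*lam*w**2.

    Hypothetical helper for illustration only.
    """
    return a / (1 + lam)  # shrinks towards zero but is zero only if a is zero


# Illustrative (made-up) unregularised parameter values and penalty strength.
values = [2.0, 0.5, -0.3, -1.5]
lam = 1.0

l1_params = [l1_minimiser(a, lam) for a in values]
l2_params = [l2_minimiser(a, lam) for a in values]

print("L1:", l1_params)
print("L2:", l2_params)
```

In this toy setting the two small parameters become exactly zero under L1, while under L2 every parameter is scaled down but stays non-zero.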

I end this series by describing some experiments I did with the sinusoidal case, before I realised that the learning rate was too big. Spoiler alert: turns out my first instincts from Part I were correct all along...

Oct 1, 2020

I finally dip my toes into some dimension reduction and clustering algorithms, by visualising the data I scraped in Part I of this series.

Sep 28, 2020

In the 80000 Hours' interview of Spencer Greenberg, Spencer describes a surprisingly simple yet largely unknown version of Bayes' Theorem via odds instead of probabilities. In this post, I will describe the various ways I have conceptualised Bayes' Theorem, ending with the interpretation that Spencer describes using odds.

Sep 24, 2020
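The odds form of Bayes' Theorem mentioned in the entry above says that posterior odds equal prior odds multiplied by the likelihood ratio. A minimal sketch follows; the function names and the diagnostic-test numbers are assumptions made up for illustration and do not come from the post or the interview.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' Theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1 + odds)


# Hypothetical example: a condition with 1% prevalence, so prior odds are 1:99.
prior = Fraction(1, 99)

# Hypothetical test: P(positive | condition) = 0.90, P(positive | no condition) = 0.05,
# so a positive result is 18 times more likely if the condition is present.
likelihood_ratio = Fraction(90, 5)

post = posterior_odds(prior, likelihood_ratio)
prob = odds_to_probability(post)

print("posterior odds:", post)
print("posterior probability:", prob, "which is about", round(float(prob), 3))
```

Exact fractions are used so the arithmetic can be checked by hand: prior odds of 1:99 times a likelihood ratio of 18 give posterior odds of 2:11, which is a probability of 2/13.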

I practise some web-scraping and pandas manipulation by scraping squash ranking data from Wikipedia.

Sep 17, 2020

I add the stochasticity in Stochastic Gradient Descent by using mini-batches. In my previous post, I was hoping this would solve my local minimum problem with the sinusoidal data. To my dismay, it did not help. However, I discover what the problem was all along.

Sep 17, 2020

I continue my project to visualise and understand gradient descent. This time I try to fit a neural network to linear, quadratic and sinusoidal data.

Sep 11, 2020