My name is Lovkush Agarwal. I recently decided to change careers and become a data scientist. Following David Robinson's advice, I decided to create this blog to record my progress, learning, and projects.
Improving the ratings of professional squash players using data and the Elo rating system
Oct 12, 2021
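For readers unfamiliar with Elo, here is a minimal sketch of the standard update rule (my own illustration, not code from the post):

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Update rating_a after one match against rating_b.

    score_a is 1 for a win, 0 for a loss (0.5 for a draw).
    The constants 400 and k=32 are the conventional Elo choices.
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)

# Example: a 1500-rated player beats a 1600-rated player.
print(elo_update(1500, 1600, 1))  # rating rises by roughly 20 points
```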
I summarise the two main ideas in Sathe and Aggarwal's paper Similarity Forests.
Apr 17, 2021
Collider bias can completely skew research findings. I describe some examples that highlight how this non-obvious bias arises.
Feb 21, 2021
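As a taster of the kind of effect the post discusses, here is a small simulation (my own toy example, not one from the post) in which two independent traits become correlated once we condition on a collider:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two genuinely independent traits, e.g. "talent" and "likeability".
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

# A collider that both traits feed into, e.g. "gets hired".
hired = (x + y) > 1.0

print(np.corrcoef(x, y)[0, 1])                # ~0: no real association
print(np.corrcoef(x[hired], y[hired])[0, 1])  # clearly negative once we condition on the collider
```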
While trying to find the best clustering of some text data, I unintentionally stumbled upon some visually striking plots, which I think are highly aesthetic and artistic.
Jan 1, 2021
I spent a solid week improving my Demo Day talk with the help of several people at Faculty. In this post, I detail the changes that were made and the big lessons I learnt.
Dec 23, 2020
My program to process hundreds of documents would mysteriously stop on a particular document. After a couple of hours of checking all my code, I managed to isolate the problem to a particular regex search. I describe what I learnt in this blog post.
Dec 20, 2020
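The post has the details of the actual culprit; purely as a general illustration, one classic way a single regex search can appear to hang is catastrophic backtracking from nested quantifiers:

```python
import re

# A nested quantifier like (a+)+ can backtrack exponentially when the
# match ultimately fails, so this one call can take effectively forever.
pattern = re.compile(r"(a+)+b")
text = "a" * 30 + "c"   # no "b", so the match must fail

# pattern.search(text)  # uncomment to watch it hang

# A common fix is to rewrite the pattern without nested quantifiers,
# e.g. r"a+b", which fails fast on the same input.
```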
Bokeh is amazing! I learnt about it earlier this week and I want to illustrate its prowess by remaking plots from the Part II series using Bokeh.
Nov 1, 2020
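To give a flavour of how little code an interactive Bokeh plot needs (a generic example, not one of the remade plots):

```python
import numpy as np
from bokeh.plotting import figure, output_file, show

x = np.linspace(0, 2 * np.pi, 200)

# An interactive figure with the default pan/zoom/save tools.
p = figure(title="sin(x)", width=600, height=350,
           x_axis_label="x", y_axis_label="sin(x)")
p.line(x, np.sin(x), line_width=2)

output_file("sine.html")  # writes a standalone interactive HTML page
show(p)
```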
The process I used to make the animations was inefficient and not programmatic. I could not work out how to adapt the matplotlib animation tools to my situation so I asked for help. Here I describe what I learnt from the help that I received.
Oct 18, 2020
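For context, the core of matplotlib's animation tooling is FuncAnimation, which redraws the figure by calling an update function once per frame; a minimal sketch (not the post's code):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    # Redraw only the line's y-data each frame rather than rebuilding the plot.
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)

anim = FuncAnimation(fig, update, frames=100, interval=50, blit=True)
anim.save("wave.gif")  # or plt.show() for an interactive window
```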
I create various charts to help visualise the difference between L1 and L2 regularisation. The pattern is clear: L1 regularisation does tend to force parameters to zero.
Oct 11, 2020
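The effect is easy to reproduce on synthetic data; a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2), which is my own illustration rather than the post's charts:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Toy data where only 5 of 20 features actually matter.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 drives most of the irrelevant coefficients exactly to zero;
# L2 only shrinks them towards zero.
print("zero coefficients, L1:", np.sum(lasso.coef_ == 0))
print("zero coefficients, L2:", np.sum(ridge.coef_ == 0))
```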
I end this series by describing some experiments I did with the sinusoidal case, before I realised that the learning rate was too big. Spoiler alert: turns out my first instincts from Part I were correct all along...
Oct 1, 2020
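The failure mode is easy to see even in one dimension; a toy sketch (not the post's neural-network setup) of gradient descent diverging when the learning rate is too big:

```python
# Gradient descent on f(x) = x**2, whose gradient is 2*x.
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(run(lr=0.1))   # converges towards the minimum at 0
print(run(lr=1.1))   # learning rate too big: the iterates diverge
```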
I finally dip my toes into some dimension reduction and clustering algorithms, by visualising the data I scraped in Part I of this series.
Sep 28, 2020
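As a generic sketch of the pipeline (TF-IDF, then dimension reduction, then clustering), using toy sentences rather than the data scraped in Part I:

```python
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stocks fell sharply today", "markets rallied after the news"]

# Text -> sparse TF-IDF vectors -> 2 dense dimensions -> clusters.
X = TfidfVectorizer().fit_transform(docs)
X2 = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)

plt.scatter(X2[:, 0], X2[:, 1], c=labels)
plt.show()
```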
In 80,000 Hours' interview with Spencer Greenberg, Spencer describes a surprisingly simple yet largely unknown version of Bayes' Theorem via odds instead of probabilities. In this post, I will describe the various ways I have conceptualised Bayes' Theorem, ending with the interpretation that Spencer describes using odds.
Sep 24, 2020
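For reference, the odds form of Bayes' Theorem says the posterior odds are the prior odds multiplied by the likelihood ratio:

```latex
\underbrace{\frac{P(H \mid E)}{P(\lnot H \mid E)}}_{\text{posterior odds}}
= \underbrace{\frac{P(H)}{P(\lnot H)}}_{\text{prior odds}}
\times \underbrace{\frac{P(E \mid H)}{P(E \mid \lnot H)}}_{\text{likelihood ratio}}
```

For example, prior odds of 1:4 combined with evidence that is three times as likely under the hypothesis give posterior odds of 3:4.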
I practise some web-scraping and pandas manipulation by scraping squash ranking data from Wikipedia.
Sep 17, 2020
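The heavy lifting for this kind of scrape can be done by pandas.read_html; a minimal sketch where the URL and table index are illustrative guesses rather than the ones used in the post:

```python
import pandas as pd

# read_html parses every <table> on the page into a list of DataFrames
# (it needs lxml or html5lib installed).
url = "https://en.wikipedia.org/wiki/PSA_World_Rankings"  # illustrative URL
tables = pd.read_html(url)

rankings = tables[0]  # which index holds the rankings depends on the page layout
print(rankings.head())
```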
I add the stochasticity to Stochastic Gradient Descent by using mini-batches. In my previous post, I was hoping this would solve my local-minimum problem with the sinusoidal data. To my dismay, it did not help. However, I discover what the problem was all along.
Sep 17, 2020
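For concreteness, here is a minimal mini-batch SGD loop on a toy linear-regression problem (a self-contained illustration, not the code from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression: y = 3x + 1 plus noise.
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3 * X[:, 0] + 1 + rng.normal(scale=0.1, size=1000)

w, b, lr, batch_size = 0.0, 0.0, 0.1, 32

for epoch in range(20):
    # The "stochastic" part: shuffle, then update on one small batch at a time.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        w -= lr * 2 * np.mean(err * xb)   # gradient of mean squared error w.r.t. w
        b -= lr * 2 * np.mean(err)        # gradient w.r.t. b

print(w, b)  # should end up close to 3 and 1
```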
I continue my project to visualise and understand gradient descent. This time I try to fit a neural network to linear, quadratic and sinusoidal data.
Sep 11, 2020
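A compact way to reproduce this kind of experiment is scikit-learn's MLPRegressor; a sketch fitting noisy sine data (my own illustration, the posts may use a different setup):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Noisy samples from a sine curve.
X = rng.uniform(-np.pi, np.pi, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=500)

# A small fully connected network; tanh suits a smooth periodic target.
model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                     max_iter=5000, random_state=0)
model.fit(X, y)

print(model.score(X, y))  # R^2 close to 1 if the fit has captured the curve
```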