SQL in Python
I ran the SQL commands in Python using the module sqlite3. I used this article to help me set it all up. Below is the code to run queries on the database movies.db in python.
import sqlite3
def run_sql(query):
connection = sqlite3.connect('movies.db')
cursor = connection.cursor()
cursor.execute(query)
results = cursor.fetchall()
cursor.close()
connection.close()
return results
Understanding the database
The data is in the form of a database, movies.db, and was obtained from the online course CS50, which in turn was obtained from imdb. To find out the structure of the tables in the database, I ran the following code (obtained from stackoverflow).
def produce_schema():
sql = "SELECT name FROM sqlite_master WHERE type = 'table';"
tables = run_sql(sql)
schema = {}
for table in tables:
sql = f"SELECT sql FROM sqlite_master WHERE type = 'table' and name = '{table[0]}'"
results = run_sql(sql)
print(results[0][0])
schema[table[0]] = results[0][0]
return schema
produce_schema()
For the task of determining bacon numbers, the relevant tables and columns are:
-
people
, with columnsid
andname
. Each person has a unique id. There are 1044499 people in the table. -
stars
, with columnsmovie_id
andperson_id
. Each row tells you that the actor with idperson_id
starred in the movie with idmovie_id
.
Kevin Bacon has an id of 102, found by running the query SELECT id FROM people WHERE name = 'Kevin Bacon'
The recursive query
To understand recursion in SQL, I recommend this guide which explains recursion and how to use recursion in SQL, and then the SQLite documentation to better understand some implementation details and what options are available to you.
The query I created to produce the table of bacon numbers is:
WITH RECURSIVE
costars(id1, id2) AS
(
SELECT stars1.person_id, stars2.person_id
FROM stars AS stars1
JOIN stars AS stars2
ON stars1.movie_id = stars2.movie_id
),
bacon(id, num) AS
(
VALUES(102, 0)
UNION
SELECT id2, num+1
FROM bacon JOIN costars
ON bacon.id = costars.id1
WHERE num < 13
)
SELECT bacon.id, name, MIN(num)
FROM bacon JOIN people ON bacon.id = people.id
GROUP BY bacon.id
ORDER BY num
- The table
costars
lists all pairs of actors that co-starred in a movie. This is obtained by doing a self-join of the stars table. - The table
bacon
is the main table.id
is the id of the actor andnum
is the bacon number.- It starts off with the entry
(102,0)
. 102 is Kevin Bacon’s id and 0 is Kevin Bacon’s Bacon number. - Then for any entry
(id, num)
already in the table, we add a new row(id2, num+1)
, wheneverid2
co-starred withid
. -
num < 13
indicates we will only have a maximum bacon number of 13. Without this manual limit, the recursive query would never end. This is because the underlying data is not acyclic: e.g. if a is a co-star with b, then b is a co-star of a. The number 13 was chosen via trial-and-error. The resulting table does not change if I increase the limit further, which implies that the maximum bacon number is 13.
- It starts off with the entry
- In the end, I select the relevant data from this recursive construction. Because each actor can appear many times in this construction, I use
GROUP BY
to ensure each actor appears only once. I useMIN(num)
to select each actor’s earliest appearance.
The two main problems with this query are that:
- It is inefficient. There is huge redundancy as actors appear many times in the recursive construction. I do not think there is a way of avoiding this within SQL.
- I have to know what the maximum Bacon number is for the query to produce a complete list. I found this using trial-and-error.
Bacon number of 13
By running a simple query, I find there are two people with the maximum bacon number of 13, Javier Ordonez and Kimberly Peters. Using the program created for this CS50 AI project, I could find the path from these actors to Kevin Bacon. As expected, it takes 13 steps (always satisfying to see two different programs being consistent!) and they are below. Note they have the same path.
- Javier Ordonez/Kimberly Peters and Amanda Brass starred in PRND
- Amanda Brass and Michael Bayouth starred in Park Reverse Neutral Drive, PRND (Director’s cut)
- Michael Bayouth and Brandy Bourdeaux starred in Grease Trek
- Brandy Bourdeaux and Kim Beuché starred in Murder Inside of Me
- Kim Beuché and Ed Baccari starred in Island, Alicia
- Ed Baccari and Aida Angotti starred in Late Watch
- Aida Angotti and Lamont Copeland starred in Bottom Out
- Lamont Copeland and Ashley Marie Arnold starred in Eye Was Blind
- Ashley Marie Arnold and Sid Bernstein starred in The Rodnees: We Mod Like Dat!
- Sid Bernstein and Chuck Berry starred in The Beatles: The Lost Concert
- Chuck Berry and Eric Clapton starred in Chuck Berry Hail! Hail! Rock ‘n’ Roll
- Eric Clapton and Tom Cruise starred in Close Up
- Tom Cruise and Kevin Bacon starred in A Few Good Men