dvd-rental-store-database-analysis

DVD Rental Store Database Analysis

Wellington da Silva

2022-09-07

1 Introduction

In this project I leverage SQL with MySQL to perform an exploratory data analysis on a DVD rental store database.

The database analyzed here is Sakila, an example database distributed with MySQL.

1.2 The database

Let’s first access the database and list its tables:

USE sakila;
SHOW FULL TABLES;

The Sakila database is comprised of the following tables:

Tables_in_sakila Description
actor Lists information for all actors, such as the first and last names.
address Contains address information for customers, staff, and stores, as well as the telephone numbers for the addresses.
category Lists the categories that can be assigned to a movie.
city Contains the list of the cities in the database.
country Contains the list of the countries in the database.
customer Contains information of all customers.
film Contains information of all movies potentially in stock in the stores.
film_actor Relates the actors to the movies they take part in as a many-to-many relationship through foreign keys.
film_category Relates movies and categories through foreign keys.
film_text Contains the title and description of the movies.
inventory Lists the copies of the movies that are in stock in each store.
language Lists the possible languages that the movies can have.
payment Records each payment made by a customer, with information such as the amount and the rental being paid for (when applicable).
rental Contains information on each rental of each inventory item, such as the customer who rented what item, when it was rented, and when it was returned.
staff Contains the information of all staff members.
store Contains information about all stores in the system, such as the address and who is the manager of the store.

The figure below shows the relations between the tables:

Sakila EER Diagram

2 Analysis

2.1 Movies

What are the rental rate (the cost to rent the movie) categories and how many movies are there in each one?

SELECT rental_rate, COUNT(film_id) AS number_of_films 
FROM film
GROUP BY rental_rate;

The result of the above query is displayed in the following table:

rental_rate number_of_films
0.99 341
4.99 336
2.99 323

Which rating do we have the most movies in?

SELECT rating, COUNT(film_id) AS number_of_films
FROM film
GROUP BY rating
ORDER BY number_of_films DESC;
rating number_of_films
PG-13 223
NC-17 210
R 195
PG 194
G 178

Which rating is most prevalent in each store (considering the total number of film copies)?

SELECT film.rating, inventory.store_id, COUNT(inventory.inventory_id) AS total_number_of_copies
FROM inventory
LEFT JOIN film
ON inventory.film_id = film.film_id
GROUP BY inventory.store_id, film.rating
ORDER BY total_number_of_copies DESC;
rating store_id total_number_of_copies
PG-13 1 525
PG-13 2 493
PG 2 480
NC-17 2 479
NC-17 1 465
R 2 462
PG 1 444
R 1 442
G 2 397
G 1 394

How many times each movie has been rented out?

The following query will return each unique film Id and how many times it has been rented out.

SELECT inventory.film_id, COUNT(inventory.film_id) AS number_of_rentals
FROM rental
LEFT JOIN inventory
ON rental.inventory_id=inventory.inventory_id
GROUP BY inventory.film_id
ORDER BY 2 DESC;

As there were 958 rows returned, only the top 10 movies with respect to the total number of rentals are displayed here:

film_id number_of_rentals
103 34
738 33
730 32
382 32
767 32
489 32
331 32
418 31
735 31
1000 31

For each movie, when was the first and the last time it was rented out?

SELECT inventory.film_id, film.title, MIN(rental.rental_date) AS rented_first_on, MAX(rental.rental_date) AS rented_last_on
FROM inventory
LEFT JOIN rental
ON rental.inventory_id=inventory.inventory_id
LEFT JOIN film
ON film.film_id=inventory.film_id
GROUP BY inventory.film_id;
film_id title rented_first_on rented_last_on
1 ACADEMY DINOSAUR 2005-05-27 07:03:28 2005-08-23 01:01:01
2 ACE GOLDFINGER 2005-07-07 19:46:51 2006-02-14 15:16:03
3 ADAPTATION HOLES 2005-05-31 04:50:07 2005-08-23 13:54:39
4 AFFAIR PREJUDICE 2005-05-27 20:44:36 2006-02-14 15:16:03
5 AFRICAN EGG 2005-05-28 07:53:38 2006-02-14 15:16:03
6 AGENT TRUMAN 2005-05-26 15:32:46 2005-08-21 16:03:01
7 AIRPLANE SIERRA 2005-06-20 21:11:50 2005-08-22 17:18:05
8 AIRPORT POLLOCK 2005-05-25 19:37:47 2005-08-23 20:24:36
9 ALABAMA DEVIL 2005-07-06 18:32:49 2005-08-23 14:26:51
10 ALADDIN CALENDAR 2005-06-15 11:03:24 2005-08-21 20:49:21

Revenue per Movie

SELECT film.film_id, film.title, SUM(payment.amount) AS total_revenue
FROM payment
LEFT JOIN rental
ON payment.rental_id=rental.rental_id
LEFT JOIN inventory
ON rental.inventory_id=inventory.inventory_id
LEFT JOIN film
ON inventory.film_id=film.film_id
GROUP BY film.film_id
ORDER BY 3 DESC;
film_id title total_revenue
879 TELEGRAPH VOYAGE 231.73
973 WIFE TURN 223.69
1000 ZORRO ARK 214.69
369 GOODFELLAS SALUTE 209.69
764 SATURDAY LAMBS 204.72
893 TITANS JERK 201.71
897 TORQUE BOUND 198.72
403 HARRY IDAHO 195.70
460 INNOCENT USUAL 191.74
444 HUSTLER PARTY 190.78

2.2 Customers

What is the last rental date of each customer?

SELECT customer_id, MAX(rental_date) AS last_rental_date
FROM rental
GROUP BY customer_id;
customer_id last_rental_date
1 2005-08-22 20:03:46
2 2005-08-23 17:39:35
3 2005-08-23 07:10:14
4 2005-08-23 07:43:00
5 2006-02-14 15:16:03
6 2005-08-23 06:41:32
7 2005-08-21 04:49:48
8 2005-08-23 14:31:19
9 2006-02-14 15:16:03
10 2005-08-22 21:59:29

Who are the customers who have rented at least 30 times?

SELECT rental.customer_id, customer.first_name, customer.last_name, customer.email, COUNT(rental.rental_id) AS rentals_amount
FROM rental
LEFT JOIN customer
ON rental.customer_id=customer.customer_id
GROUP BY rental.customer_id
HAVING rentals_amount >= 30;
customer_id first_name last_name email rentals_amount
1 MARY SMITH MARY.SMITH@sakilacustomer.org 32
5 ELIZABETH BROWN ELIZABETH.BROWN@sakilacustomer.org 38
7 MARIA MILLER MARIA.MILLER@sakilacustomer.org 33
15 HELEN HARRIS HELEN.HARRIS@sakilacustomer.org 32
20 SHARON ROBINSON SHARON.ROBINSON@sakilacustomer.org 30
21 MICHELLE CLARK MICHELLE.CLARK@sakilacustomer.org 35
23 SARAH LEWIS SARAH.LEWIS@sakilacustomer.org 30
26 JESSICA HALL JESSICA.HALL@sakilacustomer.org 34
27 SHIRLEY ALLEN SHIRLEY.ALLEN@sakilacustomer.org 31
28 CYNTHIA YOUNG CYNTHIA.YOUNG@sakilacustomer.org 32

Who rented the most?

SELECT rental.customer_id, customer.first_name, customer.last_name, customer.email, COUNT(rental.rental_id) AS rentals_amount
FROM rental
LEFT JOIN customer
ON rental.customer_id=customer.customer_id
GROUP BY rental.customer_id
ORDER BY rentals_amount DESC
LIMIT 1;
customer_id first_name last_name email rentals_amount
148 ELEANOR HUNT ELEANOR.HUNT@sakilacustomer.org 46

What is the last rental date of each active customer?

SELECT customer.customer_id, customer.first_name, customer.last_name, MAX(rental.rental_date) AS last_rental_date
FROM rental
JOIN customer
ON rental.customer_id=customer.customer_id
WHERE rental.customer_id IN (
    SELECT customer.customer_id
    FROM customer
    WHERE customer.active = 1
)
GROUP BY rental.customer_id;
customer_id first_name last_name last_rental_date
1 MARY SMITH 2005-08-22 20:03:46
2 PATRICIA JOHNSON 2005-08-23 17:39:35
3 LINDA WILLIAMS 2005-08-23 07:10:14
4 BARBARA JONES 2005-08-23 07:43:00
5 ELIZABETH BROWN 2006-02-14 15:16:03
6 JENNIFER DAVIS 2005-08-23 06:41:32
7 MARIA MILLER 2005-08-21 04:49:48
8 SUSAN WILSON 2005-08-23 14:31:19
9 MARGARET MOORE 2006-02-14 15:16:03
10 DOROTHY TAYLOR 2005-08-22 21:59:29

How much each active customer has spent on average?

SELECT customer.customer_id, customer.first_name, customer.last_name, AVG(payment.amount) AS average_spent
FROM payment
JOIN customer
ON payment.customer_id=customer.customer_id
WHERE payment.customer_id IN (
    SELECT customer.customer_id
    FROM customer
    WHERE customer.active = 1
)
GROUP BY payment.customer_id;
customer_id first_name last_name average_spent
1 MARY SMITH 3.708750
2 PATRICIA JOHNSON 4.767778
3 LINDA WILLIAMS 5.220769
4 BARBARA JONES 3.717273
5 ELIZABETH BROWN 3.805789
6 JENNIFER DAVIS 3.347143
7 MARIA MILLER 4.596061
8 SUSAN WILSON 3.865000
9 MARGARET MOORE 3.903043
10 DOROTHY TAYLOR 3.990000

2.3 Monthly analysis

Rentals per month

SELECT EXTRACT(MONTH FROM rental_date) AS month, COUNT(rental_id) AS rentals
FROM rental
GROUP BY month
ORDER BY month;
month rentals
2 182
5 1156
6 2311
7 6709
8 5686

Revenue Per Month

SELECT EXTRACT(MONTH FROM payment_date) AS month, SUM(amount) AS revenue
FROM payment
GROUP BY month
ORDER BY month;
month revenue
2 514.18
5 4824.43
6 9631.88
7 28373.89
8 24072.13