---[REF]---"style":"heading"---[REF]---
---[SIDEBAR]---
"type":"objectives",
"title":"Learning Objectives",
"text":"After completing this lesson, the successful student will be able to:
- Create aliases for tablenames to simplify the syntax for joins.
- Describe how joining tables may lead to overcounting records.
- Explain what an inner join is and how to create one.
- Explain what an outer join is and how to create one.
- Explain what a self join is and how to create one.
The syntax for a query that accomplishes this is:
SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count
FROM artist a JOIN work w ON a.artist_id = w.artist_id
JOIN museum m ON w.museum_id = m.museum_id
GROUP BY full_name
ORDER BY museum_name DESC;
---[EVALUATE]---
"type":"evaluate",
"query":"SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count \nFROM artist a JOIN work w ON a.artist_id = w.artist_id \nJOIN museum m ON w.museum_id = m.museum_id \nGROUP BY full_name \nORDER BY museum_name DESC;"
---[EVALUATE]---
This query returns the correct results. It retrieves the full_name from the artist table and counts of the number of museums that exhibit each artist's work. All artists are included in the query result and the results are sorted by museum_count in descending order.
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:artist-museum-count
---[SQL]---
SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count
FROM artist a JOIN work w ON a.artist_id = w.artist_id
JOIN museum m ON w.museum_id = m.museum_id
GROUP BY full_name
ORDER BY museum_count DESC;
---[SQL]---
This query produces a list of the artist and calculates the number of museums that exhibit the work of each artist. Notice the keyword DISTINCT in the argument for the COUNT function. Only the unique values for museum_id will be included in the calculation of the count for each artist.
The use of the keyword DISTINCT is important in this case. Since artist can create more than one work of art, it is possible for an artist to have more than one work of art exhibited in the same museum. This would lead to museums being counted multiple times for the same artist. The keyword distinct ensures that this does not happen.
For comparison, examine the query below. It omits the keyword DISTINCT to demonstrates how much overcounting would happen.
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:artist-museum-count-no-distincts
---[SQL]---
SELECT full_name, COUNT(m.museum_id) AS museum_count_no_distinct
FROM artist a JOIN work w ON a.artist_id = w.artist_id
JOIN museum m ON w.museum_id = m.museum_id
GROUP BY full_name
ORDER BY museum_count_no_distinct DESC;
---[SQL]---
The overcounting in this query can be extensive. Claude Monet's work is exhibited in 27 museums in the database, but counted 188 times in the second query. Pierre-Auguste Renoir museum count is even more overcounted. His work is exhibited in 14 museum, but counted 188 times. This happens because more than one of each artists' work is exhibit in the same museum. Using the keyword DISTINCT in the COUNT function eliminates this problem since only the unique values for museum_id will be included in the count.
---[SIDEBAR]---
"type":"tip",
"title":"Joins and the keyword DISTINCT",
"text":"As a beginning query writer it can be difficult to know when you need to use DISTINCT when writing SQL statements with JOINs. It is generally best to error on the side of including DISTINCT with JOINs when you are unsure."
---[SIDEBAR]---
## Types of Joins
All of the SQL statements discussed in this lesson and in ---[REF]---"style":"link","pageId":"lesson-joining-tables"---[REF]--- have used inner join queries. There are other types of joins that return different results bases on how the joining fields - generally primary and foreign key pairs - are used. The remainder of this lesson will describe three common join types: inner joins, outer joins, and self joins.
### Inner Joins
An inner join is the most common type of join. In fact, it is the default way to join tables. Technically, the syntax for an inner join is: FROM table1 INNER JOIN table2 ON table1.primary_key = table2.foreign_key. Although this is technically the syntax, you will generally see the keyword INNER omitted from the SQL statement. This has been the case with all of the join queries you have seen so far in this resource.
An inner join combines tables in a query result by matching the values for the primary and foreign keys in the joined tables. Those rows that contain matching values are included in the query result. Any row that does not match will be omitted from the query result. An example will best illustrate how this works.
---[SIDEBAR]---
"type":"term",
"title":"Inner Joins",
"text":"A join that combines rows from tables where only those rows that contain a match for the values of the primary and foreign key columns will be included in the query results."
---[SIDEBAR]---
Suppose you want to retrieve the subjects for each of the works of art. Show only those works with a subject.
---[EVALUATE]---
"type":"plan",
"task":"Retrieve the subjects for each of the works of art.",
"columns":"work name, subject",
"tables":"work, subject",
"rows":"works that have a subject",
"order":"none specified"
---[EVALUATE]---
The syntax for a query that accomplishes this is:
SELECT name, subject
FROM work w JOIN subject s ON w.work_id = s.work_id;
---[EVALUATE]---
"type":"evaluate",
"query":"SELECT name, subject \nFROM work w JOIN subject s ON w.work_id = s.work_id;"
---[EVALUATE]---
This query returns the correct results. It retrieves the name from the work table and subject from the subject table. Since this query uses an INNER JOIN, only those works of art with a subject (and subjects with a work of art) are included in the query result.
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:work-subject-inner
---[SQL]---
SELECT name, subject
FROM work w JOIN subject s ON w.work_id = s.work_id;
---[SQL]---
The results of the query are a list of works of art and the subjects in the work. Since a work of art can haave more than one subject, a work of art may be represented on more than one row of the query result. Notice that there are 2132 rows in the results. There are 2132 matches found between the work and subject tables.
It is instuctive to compare this to the number of rows on the work and subject tables. The following queries count the rows on these tables.
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:work-count-subject-count
---[SQL]---
SELECT count(*) as work_count
FROM work;
SELECT count(*) as subject_count
FROM subject;
---[SQL]---
Notice that there are 4294 works of art (rows on the work table) and 2132 rows on the subject table. This means that all of the subjects match with a work of art. However, there are a number of works of art that aren't related to a subject. With the information you have at this point, it would be impossible to determine how may works of art are not associated with a subject. To determine that, you would need to use an outer join.
### Outer Joins
Outer joins combine rows from tables differently than inner joins. Where an inner join returns only those rows from both tables that have values for the primary and foreign keys that match, outer joins also return rows with values that don't match. There are three types of outer joins. Each of these types work slightly differently providing you flexibility to combine tables in ways to fit the task of the query.
---[SIDEBAR]---
"type":"term",
"title":"Outer Joins",
"text":"A join that combines rows from tables even if there is not a match for the values in the primary and foreign keys."
---[SIDEBAR]---
#### Left and Right Outer Joins
To understand how left and right outer joins work, you first need to know what is meant by left and right. Left and right refer to the position of the tables in the JOIN syntax of the FROM clause. In the statement FROM table1 JOIN table2, table1 is on the left side of the join and table2 is on the right.
In a left outer join, *all* of the rows from the table on the left side of the keyword JOIN will be returned. Only those rows from the table on the right side of JOIN that contain a value for the primary/foreign key pair column will be included in the join. For example, the syntax 'FROM work LEFT OUTER JOIN subject on work.work_id = subject.work_id' will combine the work and subject table so that:
1. All of the rows on the subject table that contain a work_id that matches a work_id on the work table will be included in the query result. As with an inner join, rows with matching values will be combined in the query results. Those rows on the subject table where the work_id does not match a value on the work table will be *excluded* from the query results.
2. All of the rows on the work table will be included in the query result, regardless of whether the value for the work_id on the work table matches a value for work_id on the subject table. Those rows with matching values on both tables will be combined in the query result. Those rows on the work table that do not match a row on the subject table will still be included. These rows will be combined null values for the columns of the subject table.
As you might expect, a right outer join is similar to a left join. The difference is that with a right outer join only matching rows from the table to the left of the keyword JOIN will be included in the query result and all rows will be returned from the table on the right of JOIN. All right outer join queries can also be written as a left outer join by changing the order of the tables in the FROM clause.
---[SIDEBAR]---
"type":"caution",
"title":"Right Outer Join",
"text":"Because a right outer join can be rewritten as a left outer join, some DBMS do not support right outer joins. It is generally a best practice to use left outer joins in place of right outer joins."
---[SIDEBAR]---
The syntax for a left outer join is 'FROM table1 LEFT OUTER JOIN table2 ON table1.primary_key = table2.foreign_key'.
---[SIDEBAR]---
"type":"tip",
"title":"Left Outer Join Syntax",
"text":"While the technical syntax for a left join is LEFT OUTER JOIN, you will often see the syntax LEFT JOIN. Most DBMS will allow either syntax for a left join."
---[SIDEBAR]---
A common use for left outer join queries is to find exceptions, or data that does not fit an expected pattern. For example, you may want to write a query to show the works of art that do not have a subject.
---[EVALUATE]---
"type":"plan",
"task":"Show the works of art that do not have a subject.",
"columns":"work name, subject",
"tables":"work, subject",
"rows":"works that do not have a subject",
"order":"none specified"
---[EVALUATE]---
The syntax for a query that accomplishes this is:
SELECT name, subject
FROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id
WHERE subject IS NULL;
---[EVALUATE]---
"type":"evaluate",
"query":"SELECT name, subject \nFROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id \nWHERE subject IS NULL;"
---[EVALUATE]---
This query returns the correct results. It retrieves the name from the work table and subject from the subject table. Since this query uses an LEFT OUTER JOIN, all works of art will be included in the query result even if they don't have a subject. Those works of art without a subject will have a null value for the subject. Because of this, the WHERE clause is used to filter those rows with a null value for subject. Only those works of art with a null value for subject will be included in the query results.
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:work-subject-left
---[SQL]---
SELECT name, subject
FROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id
WHERE subject IS NULL;
---[SQL]---
The result of the query is the works of art without a subject. Notice that all of the rows returned in the query result have a null value for the subject column. 2388 works of art do not have a subject in the database.
The following set of queries drives home the differences between inner join and left outer join queries.
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:work-subject-join-compare
---[SQL]---
SELECT count(distinct w.work_id) as works_without_subjects
FROM work w LEFT OUTER JOIN subject s on w.work_id = s.work_id
where s.work_id is null;
SELECT count(distinct w.work_id) as works_with_subjects
FROM work w INNER JOIN subject s on w.work_id = s.work_id;
SELECT count(distinct work_id) as number_of_works
FROM work
---[SQL]---
The first query uses a left outer join to count the number of works of art on the work table that don't have a corresponding subject on the subject table. The second query uses an inner join to count the number of works of art on the work table that have a corresponding subject on the subject table. The final query counts the number of works of art. Notice that the total of the count in the first two queries equals the total number of works of art.
#### Full Outer Joins
Full outer join queries are not common. In a full outer join, all rows from both tables in the join will be included in the query result regardless of whether the values for the primary and foreign key columms match in either table. Those rows that match will be combined in the query result. When the row from either table does not match, it is still included in the results with null values for the corresponding columns from the other table.
The syntax for a full outer join is 'FROM table1 FULL OUTER JOIN table2 ON table1.primary_key = table2.foreign_key'.
### Self Joins
The last join type discussed in this lesson is a self join. As the name implies, a self join happens when a single table is joined to itself. A self join can be an inner join or an outer join. The syntax for an inner join version of a self join query is 'FROM table t1 JOIN table t2 ON t1.key_column = t2.key_column'.
Conceptually, a self join is not really different than a join between two different tables. In a self join tables being joined can be thought of as two instances of the same table. Because the table is joined to itself, you must use aliases to refer to each 'instance', otherwise the DBMS will not be able to differentiate between the instances of the table.
Often in a self join, you don't use primary and foreign key values as the basis for the join. In fact, you generally won't have a column that can act as the foreign key for its own table. Instead you can use any two columns (or different versions of the same column) as the basis for the join.
This is best seen in an example. Suppose you want to retrieve all of the artists with of the same nationality as the artist Henri Manguin.
---[EVALUATE]---
"type":"plan",
"task":"Retrieve all of the artists with of the same nationality as the artist Henri Manguin.",
"columns":"full_name, nationality",
"tables":"artist",
"rows":"nationality is the same as nationality for Henri Manguin",
"order":"none specified"
---[EVALUATE]---
The syntax for a query that accomplishes this is:
SELECT a2.full_name, a2.nationality
FROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality
WHERE a1.full_name = 'Henri Manguin';
---[EVALUATE]---
"type":"evaluate",
"query":"SELECT a2.full_name, a2.nationality \nFROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality \nWHERE a1.full_name = 'Henri Manguin';"
---[EVALUATE]---
This query returns the correct results. It displays the full_name and nationality of the artists with the same nationality as Henri Manguin. To accomplish this, two instances of the artist table are joined together. The first version is aliased as a1 and represents Henri Manguin. The second is aliased as a2 and represents all of the artists. The WHERE clause (WHERE a1.full_name = 'Henri Manguin') ensures that the only data on table a1 includes is that for Henri Manguin. The join condition ON a1.nationality = a2.nationality, compares the nationality for Henri Manguin (a1.nationality) with the nationality for the artists in table a2 (all of the artists). Because the query is an inner join, only those rows in table a2 (that represents all artists) where the nationality matches the the value for nationality in table a1 (which is the nationality of Henri Manguin) will be included in the query result. The query displays the full_name and nationality columns from table a2, so the query will display all artists included in the query results.
**Try the SQL statement.**
---[SQL]---
connection:sql_book_connection
schema:canvas
diagram:canvas
owner:atlas-query
result:artist-self-manguin-nationality
---[SQL]---
SELECT a2.full_name, a2.nationality
FROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality
WHERE a1.full_name = 'Henri Manguin';
---[SQL]---
Each of the artists in the query results have a nationality of French. The nationality of Henri Manguin is French. The query retrieves the correct results.
## Summary
This lesson covered advanced issues related to joins. It described creating table aliases and issues related to joins and duplicate records. It also highlighted three types of joins (inner joins, outer joins, and self joins) and how to create them.