---[REF]---"style":"heading"---[REF]--- ---[SIDEBAR]--- "type":"objectives", "title":"Learning Objectives", "text":"After completing this lesson, the successful student will be able to:
  1. Create aliases for tablenames to simplify the syntax for joins.
  2. Describe how joining tables may lead to overcounting records.
  3. Explain what an inner join is and how to create one.
  4. Explain what an outer join is and how to create one.
  5. Explain what a self join is and how to create one.
" ---[SIDEBAR]--- ## Using Table Aliases The syntax for joining multiple tables together in the same query is not difficult, but it can be tedious. Just the syntax of the FROM clause requires that you use the names of the tables being joined at least twice. If a query uses columns that appear in more than one column in other clauses, the table names will be used even more times. Consider this example: ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-name-work-name-museum-name ---[SQL]--- SELECT full_name, work.name AS work_name, museum.name AS museum_name FROM artist JOIN work ON artist.artist_id = work.artist_id JOIN museum ON work.museum_id = museum.museum_id ORDER BY full_name; ---[SQL]--- Notice that column 'name' appears on both the museum and work tables. To use these columns in the SELECT clause, they need to be fully specified with the tablename. In total, the work table is referred to four time in the query syntax, the museum table is referred to three times, and the artist table is referred to twice. To simplify the syntax of the query, each of the tables can be referred to using an alias. Each of the tables could be aliased as the first letter of the table name (m for museum, for example). The syntax for creating an alias for a table is slightly different than the syntax for creating an alias for a column name in the SELECT clause. To alias a table, specify the actual name of the table followed by a space and the alias you wish to use. The syntax to alias the museum table as the letter m is FROM museum m. Once a table is aliased, it is referenced using the alias everywhere else in the SQL statement. ---[SIDEBAR]--- "type":"note", "title":"Table Aliases", "text":"Table aliases are useful for simplifying the syntax of a SQL statement. There are times that you must use an alias for the query to work. This is the case for queries that use a self join. Self joins are discussed later in this lesson." ---[SIDEBAR]--- The previous query can be rewritten with aliases for each of the table. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-name-work-name-museum-name-alias ---[SQL]--- SELECT full_name, w.name AS work_name, m.name AS museum_name FROM artist a JOIN work w ON a.artist_id = w.artist_id JOIN museum m ON w.museum_id = m.museum_id ORDER BY full_name; ---[SQL]--- ---[SIDEBAR]--- "type":"note", "title":"Order of Execution and Table Aliases", "text":"Recall that the clauses of a SQL statement are not processed by the DBMS in the same order that they are written. The FROM clause is the first to be executed. Because a table is renamed with an alias in the FROM clause, that alias can be used anywhere else in the query." ---[SIDEBAR]--- ## Joins and Duplication of Data In ---[REF]---"style":"link","pageId":"lesson-table-relationships"---[REF]--- you learned about cardinality and the types of relationships between tables. Recall that a one to many relationship means that a row on one table can be related to many rows on the related table. In the museum database, several of the relationships are one to many. One of these is the relationship between artist and work. Each work of art is related to only one artist, but a given artist could create many works of art. One to many relationship can have an important implication for queries that involve multiple tables. Depending on the task you want to perform, you might find that the results of your queries contain unexpected or unintended duplicate rows. For example, you may want to create an aggregate query to count the number of museums that exhibit each artists' work. Name the calculated field museum_count. Sort the results by museum_count in descending order. ---[EVALUATE]--- "type":"plan", "task":"Count the number of museums that exhibit each artists' work. Sort the results by museum_count in descending order.", "columns":"artist name, museum_count", "tables":"artist, work, museum", "rows":"include: all artists, only unique museums|show: each artist", "order":"alphabetically by full_name" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to count the number of museums that exhibit each artists' work. Name the calculated field museum_count. Sort the results by museum_count in descending order.", "pretext": "To count the number of museums that exhibit each artist's work, you can use the following SQL statement with a GROUP BY clause and the COUNT() function:", "response": "SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count \nFROM artist a JOIN work w ON a.artist_id = w.artist_id \nJOIN museum m ON w.museum_id = m.museum_id \nGROUP BY full_name \nORDER BY museum_count DESC;", "posttext": "In this query::
  1. We join the artist, work, and museum tables to establish the relationships between artists, their works, and the museums exhibiting those works.
  2. We use the GROUP BY clause to group the results by the artist's full name (full_name).
  3. Within each group, we use the COUNT(DISTINCT museum.museum_id) function to count the distinct museums where the artist's work is exhibited. COUNT(DISTINCT) ensures that each museum is counted only once per artist, even if the same museum exhibits multiple works by the same artist.
  4. Finally, we order the results by museum_count in descending order.
The result will display each artist's name along with the count of museums that exhibit their work." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count FROM artist a JOIN work w ON a.artist_id = w.artist_id JOIN museum m ON w.museum_id = m.museum_id GROUP BY full_name ORDER BY museum_name DESC;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count \nFROM artist a JOIN work w ON a.artist_id = w.artist_id \nJOIN museum m ON w.museum_id = m.museum_id \nGROUP BY full_name \nORDER BY museum_name DESC;" ---[EVALUATE]--- This query returns the correct results. It retrieves the full_name from the artist table and counts of the number of museums that exhibit each artist's work. All artists are included in the query result and the results are sorted by museum_count in descending order. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-museum-count ---[SQL]--- SELECT full_name, COUNT(DISTINCT m.museum_id) AS museum_count FROM artist a JOIN work w ON a.artist_id = w.artist_id JOIN museum m ON w.museum_id = m.museum_id GROUP BY full_name ORDER BY museum_count DESC; ---[SQL]--- This query produces a list of the artist and calculates the number of museums that exhibit the work of each artist. Notice the keyword DISTINCT in the argument for the COUNT function. Only the unique values for museum_id will be included in the calculation of the count for each artist. The use of the keyword DISTINCT is important in this case. Since artist can create more than one work of art, it is possible for an artist to have more than one work of art exhibited in the same museum. This would lead to museums being counted multiple times for the same artist. The keyword distinct ensures that this does not happen. For comparison, examine the query below. It omits the keyword DISTINCT to demonstrates how much overcounting would happen. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-museum-count-no-distincts ---[SQL]--- SELECT full_name, COUNT(m.museum_id) AS museum_count_no_distinct FROM artist a JOIN work w ON a.artist_id = w.artist_id JOIN museum m ON w.museum_id = m.museum_id GROUP BY full_name ORDER BY museum_count_no_distinct DESC; ---[SQL]--- The overcounting in this query can be extensive. Claude Monet's work is exhibited in 27 museums in the database, but counted 188 times in the second query. Pierre-Auguste Renoir museum count is even more overcounted. His work is exhibited in 14 museum, but counted 188 times. This happens because more than one of each artists' work is exhibit in the same museum. Using the keyword DISTINCT in the COUNT function eliminates this problem since only the unique values for museum_id will be included in the count. ---[SIDEBAR]--- "type":"tip", "title":"Joins and the keyword DISTINCT", "text":"As a beginning query writer it can be difficult to know when you need to use DISTINCT when writing SQL statements with JOINs. It is generally best to error on the side of including DISTINCT with JOINs when you are unsure." ---[SIDEBAR]--- ## Types of Joins All of the SQL statements discussed in this lesson and in ---[REF]---"style":"link","pageId":"lesson-joining-tables"---[REF]--- have used inner join queries. There are other types of joins that return different results bases on how the joining fields - generally primary and foreign key pairs - are used. The remainder of this lesson will describe three common join types: inner joins, outer joins, and self joins. ### Inner Joins An inner join is the most common type of join. In fact, it is the default way to join tables. Technically, the syntax for an inner join is: FROM table1 INNER JOIN table2 ON table1.primary_key = table2.foreign_key. Although this is technically the syntax, you will generally see the keyword INNER omitted from the SQL statement. This has been the case with all of the join queries you have seen so far in this resource. An inner join combines tables in a query result by matching the values for the primary and foreign keys in the joined tables. Those rows that contain matching values are included in the query result. Any row that does not match will be omitted from the query result. An example will best illustrate how this works. ---[SIDEBAR]--- "type":"term", "title":"Inner Joins", "text":"A join that combines rows from tables where only those rows that contain a match for the values of the primary and foreign key columns will be included in the query results." ---[SIDEBAR]--- Suppose you want to retrieve the subjects for each of the works of art. Show only those works with a subject. ---[EVALUATE]--- "type":"plan", "task":"Retrieve the subjects for each of the works of art.", "columns":"work name, subject", "tables":"work, subject", "rows":"works that have a subject", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to retrieve the subjects for each of the works of art. Retrieve only those works with a subject.", "pretext": "To retrieve the subjects for each of the works of art and retrieve only those works with a subject, you can use the following SQL statement:", "response": "SELECT name, subject \nFROM work w JOIN subject s ON w.work_id = s.work_id;", "posttext": "In this query::
  1. We join the work and subject tables based on the work_id column to link works of art with their associated subjects.
  2. We select the name column from the work table (work of art's name) and the subject column from the subject table (subject of the work).
The result will display the names of works of art and their associated subjects, but only for works that have a subject." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT name, subject FROM work w JOIN subject s ON w.work_id = s.work_id;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT name, subject \nFROM work w JOIN subject s ON w.work_id = s.work_id;" ---[EVALUATE]--- This query returns the correct results. It retrieves the name from the work table and subject from the subject table. Since this query uses an INNER JOIN, only those works of art with a subject (and subjects with a work of art) are included in the query result. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-subject-inner ---[SQL]--- SELECT name, subject FROM work w JOIN subject s ON w.work_id = s.work_id; ---[SQL]--- The results of the query are a list of works of art and the subjects in the work. Since a work of art can haave more than one subject, a work of art may be represented on more than one row of the query result. Notice that there are 2132 rows in the results. There are 2132 matches found between the work and subject tables. It is instuctive to compare this to the number of rows on the work and subject tables. The following queries count the rows on these tables. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-count-subject-count ---[SQL]--- SELECT count(*) as work_count FROM work; SELECT count(*) as subject_count FROM subject; ---[SQL]--- Notice that there are 4294 works of art (rows on the work table) and 2132 rows on the subject table. This means that all of the subjects match with a work of art. However, there are a number of works of art that aren't related to a subject. With the information you have at this point, it would be impossible to determine how may works of art are not associated with a subject. To determine that, you would need to use an outer join. ### Outer Joins Outer joins combine rows from tables differently than inner joins. Where an inner join returns only those rows from both tables that have values for the primary and foreign keys that match, outer joins also return rows with values that don't match. There are three types of outer joins. Each of these types work slightly differently providing you flexibility to combine tables in ways to fit the task of the query. ---[SIDEBAR]--- "type":"term", "title":"Outer Joins", "text":"A join that combines rows from tables even if there is not a match for the values in the primary and foreign keys." ---[SIDEBAR]--- #### Left and Right Outer Joins To understand how left and right outer joins work, you first need to know what is meant by left and right. Left and right refer to the position of the tables in the JOIN syntax of the FROM clause. In the statement FROM table1 JOIN table2, table1 is on the left side of the join and table2 is on the right. In a left outer join, *all* of the rows from the table on the left side of the keyword JOIN will be returned. Only those rows from the table on the right side of JOIN that contain a value for the primary/foreign key pair column will be included in the join. For example, the syntax 'FROM work LEFT OUTER JOIN subject on work.work_id = subject.work_id' will combine the work and subject table so that: 1. All of the rows on the subject table that contain a work_id that matches a work_id on the work table will be included in the query result. As with an inner join, rows with matching values will be combined in the query results. Those rows on the subject table where the work_id does not match a value on the work table will be *excluded* from the query results. 2. All of the rows on the work table will be included in the query result, regardless of whether the value for the work_id on the work table matches a value for work_id on the subject table. Those rows with matching values on both tables will be combined in the query result. Those rows on the work table that do not match a row on the subject table will still be included. These rows will be combined null values for the columns of the subject table. As you might expect, a right outer join is similar to a left join. The difference is that with a right outer join only matching rows from the table to the left of the keyword JOIN will be included in the query result and all rows will be returned from the table on the right of JOIN. All right outer join queries can also be written as a left outer join by changing the order of the tables in the FROM clause. ---[SIDEBAR]--- "type":"caution", "title":"Right Outer Join", "text":"Because a right outer join can be rewritten as a left outer join, some DBMS do not support right outer joins. It is generally a best practice to use left outer joins in place of right outer joins." ---[SIDEBAR]--- The syntax for a left outer join is 'FROM table1 LEFT OUTER JOIN table2 ON table1.primary_key = table2.foreign_key'. ---[SIDEBAR]--- "type":"tip", "title":"Left Outer Join Syntax", "text":"While the technical syntax for a left join is LEFT OUTER JOIN, you will often see the syntax LEFT JOIN. Most DBMS will allow either syntax for a left join." ---[SIDEBAR]--- A common use for left outer join queries is to find exceptions, or data that does not fit an expected pattern. For example, you may want to write a query to show the works of art that do not have a subject. ---[EVALUATE]--- "type":"plan", "task":"Show the works of art that do not have a subject.", "columns":"work name, subject", "tables":"work, subject", "rows":"works that do not have a subject", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to show the works of art that do not have a subject.", "pretext": "To show the works of art that do not have a subject, you can use a SQL statement with a LEFT JOIN and a WHERE clause to filter for rows where the subject is NULL. Here's the SQL statement:", "response": "SELECT name, subject \nFROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id \nWHERE subject IS NULL;", "posttext": "In this query::
  1. We use a LEFT JOIN to join the work and subject tables based on the work_id column. This will include all works of art, even those without a subject.
  2. We use a WHERE clause to filter the results and select only those works where the subject is NULL. This condition ensures that we retrieve only works of art that do not have a subject.
  3. We select the name column from the work table (work of art's name) and the subject column from the subject table.
The result will display the names of works of art that do not have a subject." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT name, subject FROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id WHERE subject IS NULL;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT name, subject \nFROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id \nWHERE subject IS NULL;" ---[EVALUATE]--- This query returns the correct results. It retrieves the name from the work table and subject from the subject table. Since this query uses an LEFT OUTER JOIN, all works of art will be included in the query result even if they don't have a subject. Those works of art without a subject will have a null value for the subject. Because of this, the WHERE clause is used to filter those rows with a null value for subject. Only those works of art with a null value for subject will be included in the query results. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-subject-left ---[SQL]--- SELECT name, subject FROM work w LEFT OUTER JOIN subject s ON w.work_id = s.work_id WHERE subject IS NULL; ---[SQL]--- The result of the query is the works of art without a subject. Notice that all of the rows returned in the query result have a null value for the subject column. 2388 works of art do not have a subject in the database. The following set of queries drives home the differences between inner join and left outer join queries. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-subject-join-compare ---[SQL]--- SELECT count(distinct w.work_id) as works_without_subjects FROM work w LEFT OUTER JOIN subject s on w.work_id = s.work_id where s.work_id is null; SELECT count(distinct w.work_id) as works_with_subjects FROM work w INNER JOIN subject s on w.work_id = s.work_id; SELECT count(distinct work_id) as number_of_works FROM work ---[SQL]--- The first query uses a left outer join to count the number of works of art on the work table that don't have a corresponding subject on the subject table. The second query uses an inner join to count the number of works of art on the work table that have a corresponding subject on the subject table. The final query counts the number of works of art. Notice that the total of the count in the first two queries equals the total number of works of art. #### Full Outer Joins Full outer join queries are not common. In a full outer join, all rows from both tables in the join will be included in the query result regardless of whether the values for the primary and foreign key columms match in either table. Those rows that match will be combined in the query result. When the row from either table does not match, it is still included in the results with null values for the corresponding columns from the other table. The syntax for a full outer join is 'FROM table1 FULL OUTER JOIN table2 ON table1.primary_key = table2.foreign_key'. ### Self Joins The last join type discussed in this lesson is a self join. As the name implies, a self join happens when a single table is joined to itself. A self join can be an inner join or an outer join. The syntax for an inner join version of a self join query is 'FROM table t1 JOIN table t2 ON t1.key_column = t2.key_column'. Conceptually, a self join is not really different than a join between two different tables. In a self join tables being joined can be thought of as two instances of the same table. Because the table is joined to itself, you must use aliases to refer to each 'instance', otherwise the DBMS will not be able to differentiate between the instances of the table. Often in a self join, you don't use primary and foreign key values as the basis for the join. In fact, you generally won't have a column that can act as the foreign key for its own table. Instead you can use any two columns (or different versions of the same column) as the basis for the join. This is best seen in an example. Suppose you want to retrieve all of the artists with of the same nationality as the artist Henri Manguin. ---[EVALUATE]--- "type":"plan", "task":"Retrieve all of the artists with of the same nationality as the artist Henri Manguin.", "columns":"full_name, nationality", "tables":"artist", "rows":"nationality is the same as nationality for Henri Manguin", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to retrieve the artists with the same nationality as Henri Manguin.", "pretext": "To retrieve the artists with the same nationality as Henri Manguin, you can use the following SQL statement:", "response": "SELECT a2.full_name, a2.nationality \nFROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality \nWHERE a1.full_name = 'Henri Manguin';", "posttext": "In this query::
  1. We first create two aliases for the artist table, denoted as a1 and a2, to distinguish between the artist being compared and the artists we are retrieving.
  2. We use a self-join on the artist table. The first instance, a1, represents Henri Manguin, and the second instance, a2, represents other artists.
  3. We compare the nationality of Henri Manguin (a1.nationality) with the nationality of other artists (a2.nationality).
  4. We use a WHERE clause to filter for rows where a1.full_name is equal to 'Henri Manguin', ensuring that we are only retrieving artists with the same nationality as Henri Manguin.
The result will display the names of artist with the same nationality as Henri Manguin." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT a2.full_name, a2.nationality FROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality WHERE a1.full_name = 'Henri Manguin';
---[EVALUATE]--- "type":"evaluate", "query":"SELECT a2.full_name, a2.nationality \nFROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality \nWHERE a1.full_name = 'Henri Manguin';" ---[EVALUATE]--- This query returns the correct results. It displays the full_name and nationality of the artists with the same nationality as Henri Manguin. To accomplish this, two instances of the artist table are joined together. The first version is aliased as a1 and represents Henri Manguin. The second is aliased as a2 and represents all of the artists. The WHERE clause (WHERE a1.full_name = 'Henri Manguin') ensures that the only data on table a1 includes is that for Henri Manguin. The join condition ON a1.nationality = a2.nationality, compares the nationality for Henri Manguin (a1.nationality) with the nationality for the artists in table a2 (all of the artists). Because the query is an inner join, only those rows in table a2 (that represents all artists) where the nationality matches the the value for nationality in table a1 (which is the nationality of Henri Manguin) will be included in the query result. The query displays the full_name and nationality columns from table a2, so the query will display all artists included in the query results. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-self-manguin-nationality ---[SQL]--- SELECT a2.full_name, a2.nationality FROM artist a1 JOIN artist a2 ON a1.nationality = a2.nationality WHERE a1.full_name = 'Henri Manguin'; ---[SQL]--- Each of the artists in the query results have a nationality of French. The nationality of Henri Manguin is French. The query retrieves the correct results. ## Summary This lesson covered advanced issues related to joins. It described creating table aliases and issues related to joins and duplicate records. It also highlighted three types of joins (inner joins, outer joins, and self joins) and how to create them.