---[REF]---"style":"heading"---[REF]--- ---[SIDEBAR]--- "type":"objectives", "title":"Learning Objectives", "text":"After completing this lesson, the successful student will be able to:
  1. Describe what a subquery is.
  2. Use a subquery to construct a filter condition in the WHERE clause.
  3. Use a subquery to create a calculated field the SELECT clause.
  4. Use a subquery to a create a temporary table in the FROM clause.
" ---[SIDEBAR]--- ## What is a Subquery? Throughout the lessons in this resource, we have used the terms SELECT statement and query almost interchangably. While this is not necessarily bad, it is helpful to understand the difference between the two to help understand what a subquery is. A SELECT statement is used to retrieve data from a database. A query is any SQL statement. This includes any of the SELECT statements used so far in this resource to retrieve data from a database. A query may also be a different type of SQL statement such as a statement that adds, changes, or deletes data in the database. ---[SIDEBAR]--- "type":"term", "title":"Query", "text":"A SQL statement used to interact with a database." ---[SIDEBAR]--- A query can also include multiple SELECT statements. This happens when a SELECT statement is used within another SELECT statement. When one SELECT statement is included inside of another SELECT statement, it is called a subquery. ---[SIDEBAR]--- "type":"term", "title":"Subquery", "text":"A SELECT statement that is included inside of another SELECT statement." ---[SIDEBAR]--- For simplicity in this discussion, the query that contains the subquery will be called an "outer" query. Subqueries can be used in several ways. A subquery can be inserted in the SELECT, FROM, or WHERE clauses of the outer query. They can also be writen as an independend subsuqry, so that the subquery could be executed as a standalone query. It could also be writed as a dependent or correlated subquery. In this case, data must be passed from the outer query to the subquery when it is executed. ---[SIDEBAR]--- "type":"term", "title":"Independent Subquery", "text":"A subquery that could be executed as a standalone query without the outer query." ---[SIDEBAR]--- ---[SIDEBAR]--- "type":"term", "title":"Dependent Subquery", "text":"A subquery that requires data passed to it from the outer query when the query is executed." ---[SIDEBAR]--- The remainder of this lesson will be divided into three section. The first section will describe how to use a subquery in the WHERE clause of the outer query. Subqueries in the SELECT clause will be discussed in the second section. The final section will demonstrate how to use a subquery in the FROM clause of the outer query. The lesson will highlight when a subquery is independent or dependent. ### Using Subqueries in the WHERE Clause Probably the easiest place to use a subquery is in the WHERE clause. Recall that the WHERE clause builds filters to determine which rows will be included in a query result. These filters are based on logical tests. A logical test compares the values in a column to a specified value using a mathmatical operator. The clause WHERE width > 10 creates a filter that will include all of the rows that contain a value for the width column that is greater than 10. Consider that you may want to include all of the rows from the work table that have a value for the width column that is greater than the average width. To accomplish this, you would likely need two separate queries. The first query would be used to calculate the average width of the works of art. ---[EVALUATE]--- "type":"plan", "task":"Display the works of art with a greater than average width.", "columns":"all columns", "tables":"work", "rows": "width is greater than the average width", "order": "none specified" ---[EVALUATE]--- The syntax for this query would be: SELECT AVG(width) FROM work; **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas|visible owner:atlas-query result:average-work-width ---[SQL]--- SELECT AVG(width) FROM work; ---[SQL]--- The average width is calculated as 23.32. Now that you know the average width, you can a query to display the works of art with a greater than average width. The syntax for this query is: SELECT * FROM work WHERE width > 23.32 **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:works-width-more-than-23.32 ---[SQL]--- SELECT * FROM work WHERE width > 23.32 ---[SQL]--- Although this approach is effective (you are able to display the works of art with a greater than average width), it is not the best approach. The main problem with this approach is that the average width will change every time a new work of art is added to the table. This means that you will need to execute the first query to calculate the average width and then edit the second query with the updated average value every time you wish to generate the list. This is a case where a subquery will be very helpful. You can replace the calculated average value in the second query with a subquery that calculates the average every time the query is executed. The syntax for this query would be: SELECT * FROM work WHERE width > (SELECT AVG(width) FROM work); ---[EVALUATE]--- "type":"evaluate", "query":"SELECT *
FROM work
WHERE width > (SELECT AVG(width) FROM work);" ---[EVALUATE]--- This query will retrieve the correct rows. The outer query displays all of the columns from the work table. The WHERE clause of the outer query is constructed to ensure that the rows included in the result have a value for the width column that is greater than the value returned by the subquery. The subquery calculates the average value for the width column of the work table. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas owner:atlas-query diagram:canvas result:works-above-average-width-subquery ---[SQL]--- SELECT * FROM work WHERE width > (SELECT AVG(width) FROM work); ---[SQL]--- Notice the syntax of the query. Whenever you use a subquery, you always enclose the subquery in parenthese. Additionally, the subquery replaces the hard-coded average in the previous query. In this case, the subquery calculates the average value. This calculated average is then used to create the logical test for the filter in the outer query. Constructing the query in this way makes accounts for any change to the average width that may occur when new works of art are added to the work table. A subquery in the WHERE clause is always used in similar ways. The subquery will return a value or a set of values that will be used in the outer query to complete a logical test. Consider a few examples. Suppose you want display all of the works by Renior using a subquery in the WHERE clause. ---[EVALUATE]--- "type":"plan", "task":"Display all of the works by Renior using a subquery in the WHERE clause.", "columns":"artwork name", "tables":"work", "rows":"artist_id for artist with last name of Renoir", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will retrieve this data: ---[AI]--- [ { "prompt": "Write a SQL statement to display the names of the works by Renior using a subquery in the WHERE clause.", "pretext": "To display the names of the works by Renoir using a subquery in the WHERE clause, you can write an SQL query based on the table structures you provided earlier. Here's the query:", "response": "SELECT name \nFROM work \nWHERE artist_id = (SELECT artist_id FROM artist WHERE last_name = 'Renoir');", "posttext": "In this query we use a subquery in the WHERE clause to find the artist_id for the artist with the full_name \"Renoir\" in the \"artist\" table. The subquery (SELECT artist_id FROM artist WHERE full_name = 'Renoir') retrieves the artist ID for Renoir. The main query then selects the names of works from the \"work\" table where the artist_id matches the artist ID retrieved from the subquery. This query will display the names of the works by Renoir based on the artist's full name." } ] ---[AI]---
The SELECT statement to retrieve this data is: SELECT name FROM work WHERE artist_id = (SELECT artist_id FROM artist WHERE last_name = 'Renoir');
---[EVALUATE]--- "type":"evaluate", "query":"SELECT name
FROM work
WHERE artist_id = (SELECT artist_id FROM artist WHERE last_name = 'Renoir');" ---[EVALUATE]--- This SQL statement retrieves the correct records. In this query, the operator in the WHERE clause includes a subquery that returns the artist_id for the artist whose last_name is Renoir. This value is then used in a logical test the filters the works of art to show only those whose artist_id matches the id returned by the subquery. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:renoir-works-subquery ---[SQL]--- SELECT name FROM work WHERE artist_id = (SELECT artist_id FROM artist WHERE last_name = 'Renoir'); ---[SQL]--- The subquery is necessary in this query, because the artist last_name column is not on the work table. This task could also be accomplished using a query that joins the work and artist tables. This is often the case with a subquery in the WHERE clause. Whether you use a subquery or a join query in these instances is a matter of personal preference. ---[SIDEBAR]--- "type":"caution", "title":"Subquery Performance", "text":"Using subqueries can be a processor intensive process for the DBMS. You may find that the DBMS can more easily process a join query than one with a subquery." ---[SIDEBAR]--- Notice the syntax of the logical test. It uses the '=' operator. This works because we expect only one artist_id to be returned by the subquery. If you think that a subquery might return more than value (perhaps more than one artist has a last name of Renoir), you would need to use the IN operator that can handle a list of values. For example, suppose you want to display the names of works of art by Canadian artists using a subquery in the WHERE clause. ---[EVALUATE]--- "type":"plan", "task":"Display the names of works of art by Canadian artists using a subquery in the WHERE clause.", "columns":"artwork name", "tables":"work", "rows":"artist_id for Canadian artists", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will retrieve this data: ---[AI]--- [ { "prompt": "Write a SQL statement to display the names of the works by Canadian artists using a subquery in the WHERE clause.", "pretext": "To display the names of the works by Canadian artists using a subquery in the WHERE clause, you can use a SQL query like the following, assuming that the \"nationality\" column in the \"artist\" table is used to identify the artist's nationality:", "response": "SELECT name \nFROM work \nWHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian');", "posttext": "In this query we use a subquery in the WHERE clause to find artist_ids for artists with the nationality \"Canadian\" in the \"artist\" table. The subquery (SELECT a.artist_id FROM artist a WHERE a.nationality = 'Canadian') retrieves the artist IDs for Canadian artists. The main query then selects the names of works from the \"work\" table where the artist_id is in the list of artist IDs retrieved from the subquery. This query will display the names of the works by Canadian artists based on their nationality." } ] ---[AI]---
The SELECT statement to retrieve this data is: SELECT name FROM work WHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian');
---[EVALUATE]--- "type":"evaluate", "query":"SELECT name
FROM work
WHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian');" ---[EVALUATE]--- This SQL statement retrieves the correct records. In this query, the WHERE clause includes a subquery that returns the artist_id for the Canadian artists. This value is then used in a logical test the filters the works of art to show only those whose artist_id matches the ids returned by the subquery. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:canadian-artist-works-subquery ---[SQL]--- SELECT name FROM work WHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian'); ---[SQL]--- The subquery is necessary in this query, because the artist nationality column is not on the work table. Notice the syntax of the logical test. It uses the IN operator. This is necessary, because the subquery is likely to return more than one artist_id (there is likely more than one Canadian artist). ---[SIDEBAR]--- "type":"tip", "title":"Subqueries and Logical Operators", "text":"Recall that the IN operator handles a list of values and the '=' operator can be used to compare only a single value. You may not know that the IN operator can also handle a list of values that contains only a single value. For this reason, it is generally best to use the IN operator rather than the '=' operator when using a subquery in the WHERE clause. The IN operator can be used when the subquery returns any number of values, where the '=' operator only works when the subquery produces a single value." ---[SIDEBAR]--- Consider one final example of using a subquery in the WHERE clause. Suppose you wish to display all of the museums that exhibit the works of art created by Canadian artists. ---[EVALUATE]--- "type":"plan", "task":"Display all of the museums that exhibit the works of art created by Canadian artists.", "columns":"museum name", "tables":"museum", "rows":"Canadian artists", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will retrieve this data: ---[AI]--- [ { "prompt": "Write a SQL statement to display the names of the museums that exhibit works by Canadian artists using subqueries in the WHERE clause.", "pretext": "To display the names of the museums that exhibit works by Canadian artists using subqueries in the WHERE clause, you can use a SQL query like the following, assuming you have a \"museum\" table and a \"work\" table with artist information, and the artist's nationality is stored in the \"artist\" table:", "response": "SELECT museum.name \nFROM museum \nWHERE museum_id IN (SELECT museum_id FROM work WHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian'));", "posttext": "In this query the innermost subquery (SELECT a.artist_id FROM artist a WHERE a.nationality = 'Canadian') is used to find artist_ids for Canadian artists. The middle subquery (SELECT DISTINCT w.museum_id FROM work w WHERE w.artist_id IN ...), uses the artist_ids from the previous subquery to find the museum_ids of works created by Canadian artists. The outer query selects the names of museums from the \"museum\" table where the museum_id is in the list of museum_ids from the middle subquery. This query will display the names of museums that exhibit works by Canadian artists." } ] ---[AI]---
The SELECT statement to retrieve this data is: SELECT museum.name FROM museum WHERE museum_id IN (SELECT museum_id FROM work WHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian'));
---[EVALUATE]--- "type":"evaluate", "query":"SELECT museum.name
FROM museum
WHERE museum_id IN
   (SELECT museum_id FROM work WHERE artist_id IN
      (SELECT artist_id FROM artist WHERE nationality = 'Canadian')
   );" ---[EVALUATE]--- This SQL statement retrieves the correct records. In this query, the WHERE clause includes two subqueries. The first subquery returns the artist_id for the Canadian artists. The second uses these artist_ids to produce a set of museum_ids on the work table for the works of art created by Canadian artists. These values are then used in a logical test the filters the museums to show only those works of art by Canadian artists (defined by the subqueries). **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:museums-with-canadian-artist-works-subquery ---[SQL]--- SELECT museum.name FROM museum WHERE museum_id IN (SELECT museum_id FROM work WHERE artist_id IN (SELECT artist_id FROM artist WHERE nationality = 'Canadian')); ---[SQL]--- Notice that this query contains two subqueries. A query can theoretically contain any number of subqueries. As you review the subqueries in this section, you will notice that all of the subqueries could be executed as standalone queries. This means that each of these subqueries are independent subqueries. Also note one other commonallity. A subquery in the where clause will produce either a single value or a set of values to be used as part of a logical test. If the subquery returns anthing other than a value or values compatible with the logical test, it will not work. In the last example, if the inner most subquery returned a list of artist names rather than artist_ids it would not work. ### Using Subqueries in the SELECT clause A subquery used in the SELECT clause is not used as part of a logical test. Recall that the SELECT clause is used to define the columns and calculations that will be included in the query result. This means that a subquery in the SELECT clause will define a column that will be included in the query result. For example, a subquery could be used to display the number of works of art exhibited in each museum using a subquery in the SELECT clause. ---[EVALUATE]--- "type":"plan", "task":"Display the number of works of art exhibited in each museum using a subquery in the SELECT clause sorted by artwork_count from high to low.", "columns":"museum name, artwork_count", "tables":"museum", "rows":"all museums", "order":"descending order by artwork_count" ---[EVALUATE]--- A possible SQL statement that could be used to address this task is: SELECT name, (SELECT count(*) from work) AS artwork_count FROM museum ORDER BY artwork_count DESC; **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:museum-artwork-count-independent-subquery ---[SQL]--- SELECT name, (SELECT count(*) from work) AS artwork_count FROM museum ORDER BY artwork_count DESC; ---[SQL]--- Notice the syntax of the subquery. It is enclosed in parentheses as before. In this case, it is used just as you would specify any other column or calculation. It is part of the comma separated list of items to be displayed in the query result. Note how the subquery is renamed using the keyword AS so that the results of the subquery have a header label at the top of the column in the query results. In this case, the values returned by the subquery do not provide the correct results. The subquery returns the number of works of art fromt he work table. This value is repeated for each row of the query result. Instead, you would like to see the number of works of art exhibited in each museum. This would likely be different for each museum. To accomplish this, you need to use a dependent subquery.
You can use a chat with ^ai_name^ to create the SELECT statement that will retrieve this data: ---[AI]--- [ { "prompt": "Write a SQL statement to display the names of museums and the number of works of art, named artwork_count, exhibited in each museum. Use a subquery in the SELECT clause. Sort the results in descending order by artwork_count.", "pretext": "To display the names of museums and the number of works of art exhibited in each museum, you can use a subquery in the SELECT clause along with a method to combine the \"museum\" and \"work\" tables. Here's the SQL statement to achieve this, sorted in descending order by artwork_count:", "response": "SELECT name, (SELECT COUNT(*) FROM work WHERE work.museum_id = museum.museum_id) AS artwork_count \nFROM museum \nORDER BY artwork_count DESC;", "posttext": "In this query we use a subquery in the SELECT clause to count the number of works of art in the \"work\" table associated with each museum by matching the museum_id. We alias the result of the subquery as artwork_count for clarity. The main query selects the museum name from the \"museum\" table and the artwork count from the subquery. The results are then sorted in descending order by artwork_count. This query will display the names of museums and the number of works of art exhibited in each museum, sorted in descending order by the number of artworks." } ] ---[AI]---
The SELECT statement to retrieve this data is: SELECT name, (SELECT COUNT(*) FROM work WHERE work.museum_id = museum.museum_id) AS artwork_count FROM museum ORDER BY artwork_count DESC;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT name, (SELECT COUNT(*) FROM work WHERE work.museum_id = museum.museum_id) AS artwork_count
FROM museum
ORDER BY artwork_count DESC;" ---[EVALUATE]--- This SQL statement retrieves the correct records. It displays the museum name and artwork_count for each museum. Unlike the last query, this one will calculate and display the number of works of art exhibited in each museum. This is done with the WHERE clause in the subquery. The condition WHERE work.museum_id = museum.museum_id makes it so the subquery will be processed for each row of the outer query. The museum_id for each row of the outer query is used in the subquery to determine which works of art to count (those that match the museum_id of the current row of the outer query). This means that the subquery is dependent on the outer query. It will not execute without the museum_ids of the outer query. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:museums-with-artwork-count-dependent-subquery ---[SQL]--- SELECT name, (SELECT COUNT(*) FROM work WHERE work.museum_id = museum.museum_id) AS artwork_count FROM museum ORDER BY artwork_count DESC; ---[SQL]--- The results display the name of each museum. They also show the number of works of art associated with each museum. The number of works of art for each museum is calculated using a dependent or correlated subquery. ### Using Subqueries in the FROM Clause It is also common to use a subquery in the FROM clause. Recall that the FROM clause is used to specify which tables are the source of the data for the query. A subquery in the FROM clause must produce a table - a set of columns and rows. You could use a subquery in the FROM clause to complete the task in the last example. That task was to display the names of museums and the number of works of art (artwork_count) exhibited in each museum. ---[EVALUATE]--- "type":"plan", "task":"Display the number of works of art exhibited in each museum using a subquery in the FROM clause sorted by artwork_count from high to low.", "columns":"museum name, artwork_count", "tables":"museum, subquery (with the artwork_count column)", "rows":"all museums", "order":"descending order by artwork_count" ---[EVALUATE]--- For this to work, the subquery in the FROM clause must include at least two things: 1. A calculated column, named artwork_count, based on the work table that counts the number of works of art in each museum. 2. A way to join the subquery with the museum table so that the museum names can be associated with the artwork_count for each museum.
You can use a chat with ^ai_name^ to create the SELECT statement that will retrieve this data: ---[AI]--- [ { "prompt": "Write a SQL statement to display the names of museums and the number of works of art, named artwork_count, exhibited in each museum. Use a subquery in the FROM clause. Sort the results in descending order by artwork_count.", "pretext": "To display the names of museums and the number of works of art exhibited in each museum, using a subquery in the FROM clause, and sorting the results in descending order by artwork_count, you can use a SQL statement like this:", "response": "SELECT name, artwork_count \nFROM museum JOIN (SELECT museum_id, COUNT(*) AS artwork_count FROM work GROUP BY museum_id) subquery \nON museum.museum_id = subquery.museum_id \nORDER BY artwork_count DESC;", "posttext": "In this query the subquery in the FROM clause groups the works of art by museum_id and calculates the count of artworks for each museum. The main query then joins the \"museum\" table with the subquery results using the museum_id to connect the two tables. It selects the museum name from the \"museum\" table and the artwork count from the subquery results (aliased as artwork_count). The results are sorted in descending order by artwork_count. This query will display the names of museums and the number of works of art exhibited in each museum, sorted in descending order by the number of artworks." } ] ---[AI]---
The SELECT statement to retrieve this data is: SELECT name, artwork_count FROM museum JOIN (SELECT museum_id, COUNT(*) AS artwork_count FROM work GROUP BY museum_id) subquery ON museum.museum_id = subquery.museum_id ORDER BY artwork_count DESC;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT name, artwork_count
FROM museum JOIN
   (SELECT museum_id, COUNT(*) AS artwork_count FROM work GROUP BY museum_id) subquery
ON museum.museum_id = subquery.museum_id
ORDER BY artwork_count DESC;" ---[EVALUATE]--- This SQL statement retrieves the correct results. It displays the museum name and artwork_count for each museum. In this query, the subquery returns the museum_id and the count of the works of art for each museum grouped by the museum_id. The subquery is then joined with the museum table on the museum_id column in the museum table and the subquery. The museum name column is displayed from the museum table and the artwork_count column is displayed from the subquery. The results are sorted by artwork_count in descending order. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:museums-with-artwork-count-from-subquery ---[SQL]--- SELECT name, artwork_count FROM museum JOIN (SELECT museum_id, COUNT(*) AS artwork_count FROM work GROUP BY museum_id) subquery ON museum.museum_id = subquery.museum_id ORDER BY artwork_count DESC; ---[SQL]--- The results display the name of each museum. They also show the number of works of art associated with each museum. The number of works of art for each museum is calculated using a subquery in the FROM clause. This subquery is technically an independent subquery, since it could be executed without the outer query. ## Summary In this lesson, you learned how to use subqueries. In particular, you learned how to use subqueries in the WHERE SELECT, and FROM clauses. Specifically, we discussed that: * A subquery in the WHERE clause returns a value or set of values to be used in a logical test in the WHERE clause. * A subquery in the SELECT clause returns a column or calculated column that is displayed in the query results. * A subquery in the FROM clause creates a table (a set of columns and rows) that can be used as a source of the data for the query.