---[REF]---"style":"heading"---[REF]--- ---[SIDEBAR]--- "type":"objectives", "title":"Learning Objectives", "text":"After completing this lesson, the successful student will be able to:
  1. Describe the uses of aggregate functions.
  2. Use aggregate functions to summarize the values in a table.
" ---[SIDEBAR]--- ## Using Aggregate Functions The previous lesson described how to use scalar functions, or functions that perform calculations for each row of a query result. This lesson will describe how to use aggregate functions that perform summary calculations across the rows of a query result. Examples of aggregate calculations include: * Counting the rows in a table or query result * Calculating the total or average of the values in a table column * Finding the largest or smallest value in a table column. Queries that include aggregate functions are different than other queries. Each of the example queries presented in this resource have either returned all of the rows of the database table or a smaller set of rows that meet the filter conditions set in the WHERE clause. Aggregate queries display a summary of the data in a table. The table rows are not displayed. ---[SIDEBAR]--- "type":"term", "title":"Aggregate Function", "text":"A function that performs summary calculations across the rows of a database table or query result." ---[SIDEBAR]--- There are five aggregate functions supported in structured query language. Unlike the scalar functions discussed in the last lesson, each of these functions are supported by the different DBMS. Each of these functions are described in the table below. #### SQL Aggregate Functions
| Function | Argument(s) |Result | |:--------:|------------------------------|------------------------------| | SUM | A column | Returns the total of the values in a column | | AVG | A column | Returns the average of the values in a column | | COUNT | A column | Returns the number of rows in a column | | MAX | A column | Returns the largest value in a column | | MIN | A column | Returns the smallest value in a column |
The remaining sections in this lesson describes how to use each of these functions, how the functions handle distinct values, and how to write queries that combine calculations. ## The SUM Function The SUM function is used to calculate a total of the values in a column. The SUM function requires a column name as the only argument. Since the SUM function computes the total of the values in the column, the result is a single value. Also, since this value is calculated, you will want to create an alias for the result in your query. ---[SIDEBAR]--- "type":"caution", "title":"Aliases and Column Names", "text":"Some DBMS will not allow you to use a column name as an alias in a query. Given that aliases are often used with calculations, it is probably best to use a more descriptive alias anyway." ---[SIDEBAR]--- The work table in the museum database contains information about the individual pieces of art. Included in this data is the width and height (in inches) of the pieces of work. Suppose you need to calculate the total width of all of the art in the database. Name the calculated field 'total_width' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the total width of all of the art in the database.", "columns":"total widths of artwork", "tables":"work", "rows":"all", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write an SQL statement to calculate the total width of the artwork. Name the calculated field total_width.", "pretext": "Certainly! Here's the SQL statement to calculate the total width of the artwork and name the calculated field as total_width:", "response": "SELECT SUM(width) AS total_width \nFROM work;", "posttext": "This query calculates the sum of the width column from the work table and renames the result as total_width." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT SUM(width) AS total_width FROM work;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT SUM(width) AS total_width \nFROM work;" ---[EVALUATE]--- This query returns the correct results. It calculates the total width of the works of art from the work table. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas|visible owner:atlas-query result:work-total-width ---[SQL]--- SELECT SUM(width) AS total_width FROM work; ---[SQL]--- This example demonstrates a couple of important things about the using the SUM function. First, the syntax of the SUM function is straightforward. As with the scalar functions discussed in the previous lesson, you insert the name of the function - SUM - followed by parentheses with the column name to be used in the calculation - in this case the width column. Second, the result of this function is the total of the values in the width column. The total is a single value, so there is only one row returned in the query result. Since the calculated field is the only field in the SELECT clause, this query will produce exactly *one* row and *one* column. The SUM function returns the total of the values that are included in the query for the specified column. This is an important point. The exception to this is NULL values. A NULL value is not included in the SUM calculation. You can apply a filter to the query that will limit the values that will be included in the SUM. In the last example, the total width of *all* works is calculated. Assume you want to adjust the prior SQL statement to calculate the width of only the Renaissance style artwork. Name the calculated field 'total_width_renaissance' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the width of only the Renaissance style artwork.", "columns":"total widths of artwork", "tables":"work", "rows":"style is Renaissance", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Adjust the prior SQL statement to calculate the total width of only the Renaissance style artwork. Name the calculated field total_width_renaissance.", "pretext": "Certainly! Here's the adjusted SQL statement to calculate the total width of artworks with the Renaissance style and name the calculated field as total_width_renaissance:", "response": "SELECT SUM(width) AS total_width \nFROM work \n WHERE style = 'Renaissance';", "posttext": "This query calculates the sum of the width column from the work table for artworks with the style 'Renaissance' and renames the result as total_width_renaissance." } ] ---[AI]--- **Try the SQL statement.**
The syntax for a query that accomplishes this is: SELECT SUM(width) AS total_width FROM work WHERE style = 'Renaissance';
---[EVALUATE]--- "type":"evaluate", "query":"SELECT SUM(width) AS total_width \nFROM work \n WHERE style = 'Renaissance';" ---[EVALUATE]--- This query produces the correct results. The sum of the width of the artworks is calculated from the work table. Only those works of art with a style of Renaissance will be included in the query results and therefore the calculation. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-total-width-renaissance ---[SQL]--- SELECT SUM(width) AS total_width FROM work WHERE style = 'Renaissance'; ---[SQL]--- Notice that the syntax of the SUM function does not change, but the calculated total width does change. The reason for this is that the WHERE clause reduces the values included in the SUM calculation. It filters the artwork included in the calculation to be only those with a style of Renaissance. ---[SIDEBAR]--- "type":"note", "title":"WHERE Clause and Aggregate Functions", "text":"Recall that the WHERE clause is executed before the SELECT clause even though the SELECT clause appears first in the SQL statement. Because of this, the rows filtered by the WHERE clause impact value calucated by an aggregate function. Only those rows that satisfy the criteria set in the WHERE clause will be included in the calculation." ---[SIDEBAR]--- The SUM function can calculate the total of more than just the values in a column. It would probably be more helpful to know the total area (height * width) of the artwork to know how much wallspace is needed to showcase the art. Create a SQL statement that calculates the total area (height * width) of the artwork. Name the calculated field 'total_area' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the total area (height * width) of the artwork.", "columns":"total area (width*height) of artwork", "tables":"work", "rows":"all", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to calculate the total area of the artwork. Name the calculated field total_area.", "pretext": "To calculate the total area of the artwork, you'll need to multiply the width and height columns for each artwork and then sum up the resulting areas. Here's the SQL statement to achieve that and name the calculated field as total_area:", "response": "SELECT SUM(width * height) AS total_area \nFROM work;", "posttext": "This query calculates the sum of the product of width and height for each artwork in the work table and names the result as total_area, representing the total combined area of all artworks." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT SUM(width * height) AS total_area FROM work;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT SUM(width * height) AS total_area \nFROM work;" ---[EVALUATE]--- The query will produce the correct results. The total area is calculated from the width and height columns of the work table. All works of art are included in the query calculation. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-total-area ---[SQL]--- SELECT SUM(width * height) AS total_area FROM work; ---[SQL]--- In this example, the area of the individual works is calculated by multiplying each work's width by its height. The SUM function then totals the area of each artwork. The result is the total area of all artwork. Notice the argument of the SUM function. It is the formula to calculate the area. The query could also include a WHERE clause to create a filter to calculate the total area for specific works of art. ## The AVG Function The AVG function works much like the SUM function. It takes the average, or mean, of the values in a column. Suppose you want to calculate the average area (height * width) of the Renaissance artwork. Name the calculated field 'average_renaissance_area' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the average area (height * width) of the Renaissance artwork.", "columns":"average area (width*height) of artwork", "tables":"work", "rows":"style is Renaissance", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to calculate the average area of the Renaissance artwork. Name the calculated field average_renaissance_area.", "pretext": "To calculate the average area of Renaissance artworks, you'll need to first calculate the area (width * height) for each Renaissance artwork and then find the average of those areas. Here's the SQL statement to achieve that and name the calculated field as average_renaissance_area:", "response": "SELECT AVG(width * height) AS average_renaissance_area \nFROM work \nWHERE style = 'Renaissance';", "posttext": "This query calculates the average of the product of width and height for each Renaissance artwork in the work table and names the result as average_renaissance_area." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT AVG(width * height) AS average_renaissance_area FROM work WHERE style = 'Renaissance';
---[EVALUATE]--- "type":"evaluate", "query":"SELECT AVG(width * height) AS average_renaissance_area \nFROM work \nWHERE style = 'Renaissance';" ---[EVALUATE]--- This SQL statement will produce the correct results. The average area will be calculated from the width and height columns of the work table. Only those works of art with a style of Renaissance will be included in the average calculation. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-total-area-renaissance ---[SQL]--- SELECT AVG(width * height) AS average_renaissance_area FROM work WHERE style = 'Renaissance'; ---[SQL]--- In this example, the WHERE clause filters the query to include only those works of art with the Renaissance style. The area for each of these works is calculated by multiplying the width of each times the height. The AVG function calculates the average of the areas. As with the SUM function, NULL values are not included in the AVG calculation. ## The COUNT Function The COUNT function returns the number of rows in a query result. The argument for the count function is also generally a column name. Since the COUNT function returns the number of rows in a query result, the wildcard character '*' can be used for its argument. Consider this example. Suppose you want to count the number of artists in the database. Name the calculated field 'number_of_artists' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Count the number of artists in the database.", "columns":"number of artists", "tables":"artist", "rows":"all", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to count the number of artists. Name the calculation number_of_artists.", "pretext": "To count the number of artists in the artist table, you can use the SQL COUNT function. Here's the SQL statement to achieve that and name the calculated field as number_of_artists:", "response": "SELECT COUNT(*) AS number_of_artists \nFROM artist;", "posttext": "This query uses the COUNT function to count the number of rows (artists) in the artist table and names the result as number_of_artists." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT COUNT(*) AS number_of_artists FROM artist;
---[EVALUATE]--- "type":"evaluate", "query":"SELECT COUNT(*) AS number_of_artists \nFROM artist;" ---[EVALUATE]--- This query will produce the correct results. The number of artists is calculated by applying the COUNT function to the rows of the artist table. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-count ---[SQL]--- SELECT COUNT(*) AS number_of_artists FROM artist; ---[SQL]--- In this example, the COUNT function is invoked using '*' as the argument. It will return the number of rows on the artist table. Since each row of the table contains data about a unique artist, this will also be the number of artists in the database. The query could also be rewriten with one of the columns of the artist table as the argument as follows: ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-count ---[SQL]--- SELECT COUNT(artist_id) AS number_of_artists FROM artist; ---[SQL]--- In this query, the count function will return the number of rows in the artist_id column of the artist table. This will also equal the number of artists in the database. If you filter the query using a WHERE clause, the result of the COUNT function will change. Also, like the other functions discussed in this lesson, NULL values will not be included in the COUNT calculation if you use a column name as the argument to the function. However, if you use the wildcard '*' character as the argument, all rows (even those that contain NULL values) *will be included* in the COUNT calculation. ## The MAX Function The MAX function returns the largest value in a specified column. The argument for the MAX function is a column name or calculation (like the SUM and AVG functions). The MAX function ignores NULL values. For example, suppose you want to calculate the largest area for a work of art. Name the calculation 'largest_area' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the largest area for a work of art.", "columns":"largest area", "tables":"work", "rows":"all", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to find the largest area for a work of art. Name the calculation largest_area.", "pretext": "To find the largest area for a work of art, you can use the MAX function to find the maximum product of width and height from the work table. Here's the SQL statement to achieve that and name the calculated field as largest_area:", "response": "SELECT MAX(width * height) AS largest_area \nFROM work;", "posttext": "This query calculates the maximum area (width * height) among all artworks in the work table and names the result as largest_area." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT MAX(width * height) AS largest_area FROM work;
---[EVALUATE]--- "type":"plan", "query":"SELECT MAX(width * height) AS largest_area \nFROM work;" ---[EVALUATE]--- This SQL statement will return the correct results. The largest area for a work of art is calculated from the width and height columns on the work table. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-largest-area ---[SQL]--- SELECT MAX(width * height) AS largest_area FROM work; ---[SQL]--- In this example, the MAX function is used to find the artwork with the largest area. As before, the area of each work of art is first calculated as the product of the width and the height. ## The MIN Function The MIN function is used to return the smallest value in a specified column. The argument for the MIN function is a column name or calculation (like the SUM and AVG functions). The MIN function ignores NULL values. Assume you want to calculate the smallest area for a work of art. Name the calculation 'smallest_area' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the smallest area for a work of art.", "columns":"smallest area", "tables":"work", "rows":"all", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to find the smallest area for a work of art. Name the calculation smallest_area.", "pretext": "To find the smallest area for a work of art, you can use the MIN function to find the minimum product of width and height from the work table. Here's the SQL statement to achieve that and name the calculated field as smallest_area:", "response": "SELECT MIN(width * height) AS smallest_area \nFROM work;", "posttext": "This query calculates the minimum area (width * height) among all artworks in the work table and names the result as smallest_area." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT MIN(width * height) AS smallest_area FROM work;
---[EVALUATE]--- "type":"plan", "query":"SELECT MIN(width * height) AS smallest_area \nFROM work;" ---[EVALUATE]--- This query will produce the correct calculation. The width and height columns on the work table are used to calculate the area for each work of art. The MIN function is used to calculated the smallest of the calculated areas. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:work-smallest-area ---[SQL]--- SELECT MIN(width * height) AS smallest_area FROM work; ---[SQL]--- In this example, the MIN function is used to find the artwork with the smallest area. As before, the area of each work of art is first calculated as the product of the width and the height. ---[SIDEBAR]--- "type":"tip", "title":"MAX and MIN with Non-Numeric Data", "text":"Although counter-intuitive, you can use the MAX and MIN functions with columns that don't contain numbers. If you calculate the MAX of a column with text, the result would be the last value alphabetically. The opposite would be true for the MIN function. One way to think of a maximum or minimum value is that it is the first or last value in a sorted list. This could be a list of numbers, dates, or text values." ---[SIDEBAR]--- ## Aggregates on Distinct Values Recall from ---[REF]---"style":"link","pageId":"lesson-retrieving-data"---[REF]---, that the keyword DISTINCT is used to remove duplicates in a query result. The keyword DISTINCT is also relevant for aggregate functions. Each of the five aggregate functions will perform a calculations on all of the rows in a query. Each non-NULL value will be included in the calculations performed by these functions - including duplicate values. You can use DISTINCT to remove duplicates from an aggregate calculation. The syntax for this is below. Notice the difference in the results returned by the two queries in the example. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-count-birth-compare-distinct ---[SQL]--- SELECT COUNT(birth) AS number_of_artists from artist; SELECT COUNT(DISTINCT birth) AS number_of_years_with_artist_births from artist; ---[SQL]--- The first query essentially counts the artists. The argument for the COUNT function is the birth column. The COUNT function will return the number of rows in the birth column of the artist table. Since each row of the artist table contains a unique artist, counting the rows of the birth column of the artist table will count the number of artists in the database. The result of the second query should be interpretted very differently. The keyword DISTINCT in the argument of the COUNT function means that duplicate values for birth will be ignored. Since there are multiple artists born in the same year, many rows will be excluded in the COUNT calculation. What will be counted is the unigue set of birth years for the artists. Each year that an artist is born will be counted only once. Hence, the result of this query is the number of years with artist births. ---[SIDEBAR]--- "type":"tip", "title":"Limitations on DISTINCT and Aggregate Functions", "text":"There are a couple of important limitations on using the keyword DISTINCT with an aggregate function. First, DISTINCT cannot be used with the expression COUNT(*) - a column name must be used as the argument. Second, DISTINCT should also be used with a column name as the argument. It may not work with some DBMS when using a calculation as the argument to an aggregate function." ---[SIDEBAR]--- ## Combining Aggregate Functions All of the examples in this lesson have used a single aggregate function in the SELECT clause. SQL statement can have as many aggregate functions in the SELECT clause as you want. For example, you may want to calculate the smallest and largest birth year in the artist table. Name the first column 'earliest_birth_year' (without the quotes) and the second column 'lastest_birth_year' (without the quotes). ---[EVALUATE]--- "type":"plan", "task":"Calculate the smallest and largest birth year in the artist table.", "columns":"smallest birth year, largest birth year", "tables":"artist", "rows":"all", "order":"none specified" ---[EVALUATE]---
You can use a chat with ^ai_name^ to create the SELECT statement that will do this: ---[AI]--- [ { "prompt": "Write a SQL statement to calculate the smallest and largest birth year for an artist. Name the first column 'earliest_birth_year' (without the quotes) and the second column 'lastest_birth_year' (without the quotes).", "pretext": "To calculate the smallest and largest birth years for artists, you can use the MIN and MAX functions on the birth column in the artist table. Here's the SQL statement to achieve that and name the columns as specified:", "response": "SELECT MIN(birth) AS earliest_birth_year, MAX(birth) AS latest_birth_year \nFROM artist;", "posttext": "This query calculates the smallest (earliest) and largest (latest) birth years among all artists in the artist table and assigns the calculated values to the columns named earliest_birth_year and latest_birth_year, respectively." } ] ---[AI]---
The syntax for a query that accomplishes this is: SELECT MIN(birth) AS earliest_birth_year, MAX(birth) AS latest_birth_year FROM artist;
---[EVALUATE]--- "type":"plan", "query":"SELECT MIN(birth) AS earliest_birth_year, MAX(birth) AS latest_birth_year \nFROM artist;" ---[EVALUATE]--- This query will produce both calculations correctly. The birth column from the artist table is used twice. First it is used with the MIN function to find the earliest (smallest) birth year. Then it is used with the MAX function to calculate the latest birth year. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:artist-earliest-latest-birth ---[SQL]--- SELECT MIN(birth) AS earliest_birth_year, MAX(birth) AS latest_birth_year FROM artist; ---[SQL]--- Notice that the two aggregate calculations are separated with a comma in the SELECT clause. This is just like adding more than one column to the SELECT clause. Since there are two calculated fields in the query, the result will have two columns. Since each calculation will produce a single result, the query will produce only one row. ## Summary Aggregate functions allow you to perform summary calculations in a query. This lesson discussed how to use each of the five aggregate functions: SUM, AVE, COUNT, MAX, and MIN. It also described how to use the keyword DISTINCT with an aggregate function and how to combine more than one aggregate calculation in the same query result.