# 2. Retrieving Data ---[SIDEBAR]--- "type":"objectives", "title":"Learning Objectives", "text":"After completing this lesson, the successful student will be able to:
  1. Use the SELECT statement to retrieve one or more columns of data from a table.
  2. Use the keyword DISTINCT to show only unique rows in a database query
  3. Use the keyword LIMIT to show a specified number of rows in a query result
" ---[SIDEBAR]--- ## The SELECT Statement Structured query language (SQL) is used to retrieve and manipulate data stored in database tables. SQL uses a set of reserved plain english words, called keywords, to form sentence-like commands. This lesson will describe how to use select statements to retrieve one of more columns of data from a table. ---[SIDEBAR]--- "type":"term", "title":"Keyword", "text":"A reserved word that is part of the SQL language." ---[SIDEBAR]--- Because SQL statements are sentence-like, they are meant to be easy to create and to understand. For example, the statement "SELECT name FROM museum" is a valid SQL statement. It is also fairly easy to understand. It would be used to retrieve all of the values from the "name" column stored on the museum table. The keywords in this statement are "SELECT" and "FROM". Because keyword are part of the SQL language, they are "reserved" meaning they are not used to name tables or columns in database. You would not see a column named "select" or a table named "from" in a database. ---[SIDEBAR]--- "type":"note", "title":"Capitalizing SQL Statements", "text":"SQL statements are not case sensitive. The words in a statement can be in uppercase, lowercase, or mixed case. \"SELECT\", \"select\", and \"SeLeCt\" will all be treated the same when they are excuted. When we refer to a keyword in this set of lessons, we will always render it in upper case to differentiate it as a keyword, even though they do not need to be capitalized for the SQL statement to execute correctly." ---[SIDEBAR]--- ---[SIDEBAR]--- "type":"caution", "title":"Table and Column Names and Case", "text":"Although the Structured Query Language is not case-sensitive, the names of tables, columns, and values may be. For example, some database management systems are configured to treat the column name \"firstname\" as different than \"FirstName\"." ---[SIDEBAR]--- There are six main keywords in the SQL language used to retrieve data from a database. Working together, these keywords are used to precisely define the data you want to retrieve from the database. These keywords are: 1. SELECT - used to list the columns you would like to retrieve 2. FROM - used to specify the table or tables you want to get the data from 3. WHERE - used to create conditions or filters on the data that will be retrieved 4. GROUP BY - used to categorize the data that is retrieved from the database 5. HAVING - used to create more complex conditions or filters on the data that will be retrieved 6. ORDER BY - used to sort the data that is retrieved There are two important rules for using these keywords to create a select statement to retrieve data from a database. First, you must provide at least two pieces of information: the columns you would like to retrieve and the table or tables from which you want to retrieve the data. In other words, you must have both SELECT and FROM in every select statement. Second, the order of the keywords is important. All select statements must follow the order listed above. SELECT will always be first and FROM will always be second. The remaining keywords, if used, will appear same order every time. If a keyword is not used, it is just omitted from the statement. In this lesson, we will only use SELECT and FROM. The other keywords will be described in other lessons. ---[SIDEBAR]--- "type":"note", "title":"The Order the Clauses are Executed", "text":"You might be curious to know that the clauses of the SELECT statement are excuted in a different order. The order of execution is FROM, WHERE, GROUP BY, HAVING, SELECT, then ORDER BY. We'll discuss why this matters in subsequent lessons." ---[SIDEBAR]--- ## Creating Queries Creating a SELECT query to fill a data need involves at least two important skills. First, you need to learn how to logically structure a solution that retrieves the needed data. With a good plan of attack in place, the second skill involves implementing the logical solution using the syntax (keywords and rules for using them) of SQL. Getting both the logic and the syntax right is vital to writing an effective query. Incorrect queries will produce either a logic or a syntax error. A logic error happens when the logic of the query is not right. A query with a logic error will return data that is incorrect. A syntax error happens when the SQL language is not used correctly. A query with a syntax error will return an error message. In many cases it is much easier to spot a syntax error than a logic error, becuase a logic error work (it will return some data even if it is not the right data). To help you learn both the logic and the syntax of SQL, the examples in this set of lessons model a four-step process to creating queries: 1. Plan to retrieve the data 2. Write a SQL statement that implements the plan 3. Examine the proposed query, revising the SQL statement if needed 4. Try the SQL statement by running it against the database ### Plan to retrieve the data Constructing a plan to retrieve the desired data is the first step. The answers to four questions will be helpful in constructing this plan. The four questions are: ---[EVALUATE]--- "type":"plan", "task":"", "columns":"", "tables":"", "rows":"", "order":"" ---[EVALUATE]--- ### Write a SQL statement With the plan in place, write the SQL statement for the query. The answers to these questions will determine what will how the SQL statement will be written. You will be encouraged to use ^ai_name^ to help you with the SQL syntax you need.The examples in this resource will suggest the syntax you need. ### Examine the proposed query Examing the proposed SQL statement. Review the syntax to make sure that the query adequately addresses the plan you made in the first step, by revisiting the four questions again. If one or more aspects of the plan are not satisfied, you will need to make adjustments to the proposed query syntax. ---[EVALUATE]--- "type":"examine", "task":"", "columns":"", "tables":"", "rows":"", "order":"" ---[EVALUATE]--- ### Try the SQL statement Once you have a query that you think it logically and syntactically sound, try to execute the query. You can run the queries you create directly in the book and see the results.
---[SIDEBAR]--- "type":"tip", "title":"Examples in the Lessons", "text":"The best way to learn a new technology skill like writing SQL statements is to practice writing them. You will find opportunities to practice throughout the lessons of this book. Each example will present a question that can be answered with data from a database. You will see a SQL statement that retrieves the needed data in a text box. You can execute the statement and retrieve the actual data. You can also edit the statement to see how changes to the statement will result in different data being returned." ---[SIDEBAR]---
---[SIDEBAR]--- "type":"tip", "title":"Examples in the Lessons", "text":"The best way to learn a new technology skill like writing SQL statements is to practice writing them. You will find opportunities to practice throughout the lessons of this book. Each example will present a question that can be answered with data from a database. You will then see the transcript of a chat with ^ai_name^ to help you generate a SQL statement that retrieves the needed data in a text box. You can execute the statement and retrieve the actual data. You can also edit the statement to see how changes to the statement will result in different data being returned." ---[SIDEBAR]---
This lesson features example SQL statements that interact with the museum database (see ---[museumDBDiagram]---). The museum database has five tables: 1. artist: contains information about the artists whose work is exhibited in the museums in the database. 2. work: contains information about the individual works of art exhibited in the museums in the database. 3. subject: contains information about the topic or focus of the works of art in the database 4. museum: contains information about the museums in the database 5. museum_hours: contains information about the days and times the museums are open ---[FIGURE]--- "id":"museumDBDiagram", "title":"The Museum Database Diagram", "url":"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTUdqdAE1lgoTZdSeCFhvs7DjP1IpmkrmcgvG4cvbm6s6bj8S5HMHVZvqDcktQH8pfW0Y_YWPrNTMJYDgjRqMYn3Th50b_MXDvNjACrzqiqLTKecTaWj9fZpVjy1-lKG4ElONb6MU4Wxevq8KSeL4BedSCJld7EdmE1W4EIXibj4rkCYNCBbaG6FYL/s16000/canvas_transparent.png", "altText": "A diagram of the museum database. The museum database has the following tables: artist, subject, work, museum, and museum_hours." ---[FIGURE]--- Each of the examples in this lesson highlight how to write SQL statements to retrieve one or more columns from a table in the museum database. They also describe some basic ways to limit the number of rows that are returned in the query results. ## Retrieving Individual Columns Perhaps the most basic SELECT statement is used to return a single column from a database table. This statement will have both a SELECT clause and a FROM clause. The SELECT clause is used to define the columns you wish to display in the query result. The FROM clause is used to specify the table from which you want to return the data. For example, let's say you want to see the names of the museums in the database. ---[EVALUATE]--- "type":"plan", "task":"List the names of all museums.", "columns":"name", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]--- You can use ^ai_name^ to create the SQL statement you need to do that. ---[AI]--- [ { "prompt": "Write a SQL statement to list the names of museums", "pretext": "To list the names of museums from a database, you would need to know the specific table and column where the museum names are stored. Assuming there is a table called \"Museums\" with a column named \"Name,\" you can use the following SQL statement:", "response": "SELECT Name \nFROM Museums;", "posttext": "This query selects the \"Name\" column from the \"Museums\" table and retrieves all the museum names. Replace \"Museums\" with the actual table name in your database to execute the query successfully." } ] ---[AI]--- ---[EVALUATE]--- "type":"examine", "task":"List the names of all museums.", "query":"SELECT name
FROM Museums;", "columns":"name", "tables":"museum", "rows":"All", "order":"Not specified", "table_issues":"Museums" ---[EVALUATE]--- Because ^ai_name^ doesn't know anything about the database, it will need to guess the table name and column name. Notice that it does a good job of guessing the column name correctly. The name of each museum is stored in the "name" column. ^ai_name^ does not guess the table name correctly, though. Notice that the information about the museums is stored on the "Museum" table. You could fix this by either modifying the SQL to use the correct tablename or you could provide ^ai_name^ with additional information. You could message ^ai_name^: ---[AI]--- [ { "prompt": "Try again. The table is named museum.", "pretext": "Apologies for the mistake. Here's the revised SQL statement to list the names of museums from a table named \"Museum\":", "response": "SELECT Name \nFROM Museum;", "posttext": "This query selects the \"Name\" column from the \"Museum\" table and retrieves all the museum names. Replace \"Museum\" with the actual table name in your database to execute the query successfully." } ] ---[AI]--- ---[EVALUATE]--- "type":"examine", "task":"List the names of all museums.", "query":"SELECT name
FROM Museum;", "columns":"name", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]---
You can use this SQL statement to do that: SELECT name FROM museum; ---[EVALUATE]--- "type":"examine", "task":"List the names of all museums.", "query":"SELECT name
FROM museum;", "columns":"name", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]---
Now the table name matches what is in the database.The query will return the "name" column from the "museum" table. The SELECT statement is set to retrieve a single column called "name" from the "museum" table. The desired column name is specified right after the SELECT keyword. The FROM keyword specifies the name of the table from which to retrieve the data. In this case, the rows will come from the museum table. A SELECT statement without a filter (discussed in a later lesson) returns all the rows in a table. This query does not have a filter nor is it sorted (also discussed in a later lesson). **Try the SQL statement.** Click the "Play" button below the statement to see the results. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas|visible owner:atlas-query result:museum-name ---[SQL]--- SELECT name FROM museum; ---[SQL]--- The previous statement uses the SELECT statement to retrieve a single column called "name" from the "museum" table. The desired column name is specified right after the SELECT keyword, and the FROM keyword specifies the name of the table from which to retrieve the data. ---[SIDEBAR]--- "type":"tip", "title":"Terminating Statements", "text":"You can execute more than one SQL statement at the same time. Multiple SQL statements need to be separated by semicolons. For most database management systems, a semicolon is not required at the end of a single SQL statement executed alone. If your particular DBMS complains, you might have to add one. There is no harm in adding a semicolon when it is not required." ---[SIDEBAR]--- All extra white space within a SQL statement is ignored when that statement is processed. SQL statements can be specified on one long line or broken up over many lines. So, these three statements are will all be treated the same: 1. SELECT name FROM    museum; 2. SELECT name FROM   museum; 3. SELECT      name FROM      museum; **Try the SQL statements.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:museum-name height:10 ---[SQL]--- SELECT name FROM museum; SELECT name FROM museum; SELECT name FROM museum; ---[SQL]--- Most people who write a lot of SQL statements find that breaking them up over multiple lines makes them easier to read and debug. ## Retrieving Multiple Columns The SELECT statement used to retrieve more than one column from a table is very similar to the statement used to retrieve a single column. The only difference is that each column must be specified right after the SELECT keyword. Each column is separated by a comma. ---[SIDEBAR]--- "type":"tip", "title":"Be Careful with Commas", "text":"When selecting multiple columns, be sure to specify a comma between each column name, but not after the last column name. Adding a comma that the end of the list of columns will result in an error." ---[SIDEBAR]--- Let's say you want see the name, city and country of each of the museums in the database. ---[EVALUATE]--- "type":"plan", "task":"List the names, cities, and countries of all museums.", "columns":"name, city, country", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]---
You can use ^ai_name^ to generate a SQL statement to retrieve this data. ---[AI]--- [ { "prompt": "Write a SQL statement to list the names, cities, and countries of museums", "pretext": "Certainly! Here's an example of a SQL statement to list the names, cities, and countries of museums:", "response": "SELECT name, city, country \nFROM museums;", "posttext": "In this example, it is assumed that there is a table named \"museums\" in the database, which contains columns named \"name\", \"city\", and \"country\" to store the respective information for each museum. You may need to adjust the table and column names based on your specific database schema." } ] ---[AI]--- ---[EVALUATE]--- "type":"examine", "task":"List the names, cities, and countries of all museums.", "query":"SELECT name, city, country
FROM Museums;", "columns":"name, city, country", "tables":"museum", "rows":"All", "order":"Not specified", "table_issues":"Museums" ---[EVALUATE]--- Again, ^ai_name^ assumes that the table is named "museums". You can adjust the table name or prompt ^ai_name^ to provide the correct table name. ---[AI]--- [ { "prompt": "Try again. The table is named museum.", "pretext": "Apologies for the mistake. Here's the corrected SQL statement to list the names, cities, and countries of museums from a table named \"museum\":", "response": "SELECT name, city, country \nFROM museum;", "posttext": "This statement will retrieve the columns \"name\", \"city\", and \"country\" from the \"museum\" table and display the respective information for each museum." } ] ---[AI]---
The following SELECT statement retrieves the three columns from the Museum table: SELECT name, city, country FROM museum;
---[EVALUATE]--- "type":"examine", "task":"List the names, cities, and countries of all museums.", "query":"SELECT name, city, country
FROM museum;", "columns":"name, city, country", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]--- The query is ready to try. The name, city, and country columns will be included in the query result, because they are listed in the SELECT clause. Notice that the three columns are separated by commas. The FROM clause ensures that the data will come from the museum table. All rows of the museum table will be returned. The query does not specify any order for the displayed rows. **Try the SQL statement.** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:museum-3 height:3 ---[SQL]--- SELECT name, city, country FROM museum; ---[SQL]--- ---[SIDEBAR]--- "type":"note", "title":"Ordering Columns", "text":"You probably noticed that the columns are displayed in the query result in the order that they are specified after the keyword SELECT. You can display columns in any order by changing the order in the SELECT statement." ---[SIDEBAR]--- ## Retrieving All Columns There are times that you will want to retrieve all of the columns from a database table. You could list all of the column names in the the SELECT statement; however, there is a nice shortcut that you can use to save time. The asterisk "*" is a wildcard character that can be used to retrieve all columns from a table. Suppose you want to get a quick look at all of the data on the museum table. ---[EVALUATE]--- "type":"plan", "task":"List all of the data stored on the museum table.", "columns":"All (*)", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]--- The following SQL statement will retrieve all of the columns from the museum table: SELECT * FROM museum; ---[EVALUATE]--- "type":"examine", "task":"List all of the data stored on the museum table.", "query":"SELECT *
FROM museum;", "columns":"All (*)", "tables":"museum", "rows":"All", "order":"Not specified" ---[EVALUATE]--- Using the * character in the SELECT clause will return all of the columns on the table. The museum table is set in the FROM clause. All rows will be displayed and the order of the rows is not specified. **Try the SQL statement** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas result:museum owner:atlas-query ---[SQL]--- SELECT * FROM museum; ---[SQL]--- When you use the wildcard (*), all columns in the table are returned. Generally, the columns will be returned in the order that they appear in the table. It is usually best to specify the columns that you want to retrieve in your query and not use the wildcard. Retrieving unneeded columns can slow the process of retrieving data from a database. ---[SIDEBAR]--- "type":"tip", "title":"Using Wildcards to Explore Data", "text":"It can be helpful to explore the data in a database to help you effectively write queries to retrieve the specific data you need. This is particularly true if you don't have a database diagram to show you the names of the columns in the database tables. Using the wildcard character to retrieve all of the columns in a database table give you a quick and easy way to explore the data stored on the table." ---[SIDEBAR]--- ## Removing Duplicate Rows All of the queries in this lesson have returned all of the rows from the source table. There are times that you don't want to retrieve all rows. Often the rows retrieved by a query repeat. For example, you may wish to list all of the countries in the museum database with a museum. ---[EVALUATE]--- "type":"plan", "task":"List all of the countries with a museum.", "columns":"country", "tables":"museum", "rows":"Only the unique countries", "order":"Not specified" ---[EVALUATE]--- The SQL statement that retrieves the cities with museums is: SELECT country FROM museum; ---[EVALUATE]--- "type":"examine", "task":"List all of the countries with a museum.", "query": "SELECT country
FROM museum;", "columns":"country", "tables":"museum", "rows":"Only the unique countries", "order":"Not specified", "row_issues":"All rows (even duplicate countries) will be displayed" ---[EVALUATE]--- Although the SQL statement is not correct, it is instructive to see what the query will produce. **Try the SQL Statement** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:all-museum-countries ---[SQL]--- SELECT country FROM museum; ---[SQL]--- Notice that the SQL statement returns 57 rows even though there are only 16 different countries on the museum table. Many of the countries displayed appear in the results more than once. It would be easier to use the results if each country were list only once. You can edit this query to show only the unique values for country. This is done by adding the keyword DISTINCT directly after the keyword SELECT. The updated SQL statement would be: SELECT DISTINCT country FROM museum; ---[EVALUATE]--- "type":"examine", "task":"List all of the countries with a museum.", "query": "SELECT DISTINCT country
FROM museum;", "columns":"country", "tables":"museum", "rows":"Only the unique countries", "order":"Not specified" ---[EVALUATE]--- With the keyword DISTINCT in the SELECT clause, only the unique countries will be displayed. **Try the SQL statement** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:distinct-museum-countries ---[SQL]--- SELECT DISTINCT country FROM museum; ---[SQL]--- Notice that how adding the keyword DISTINCT reduces the result set to 16 rows. Each of the rows in the query result contains a unique country.
Artificial intelligence tools like ^ai_name^ can successfully determine when the keyword DISTINCT is needed. Consider the chat below that asks ^ai_name^ to construct a query to retrieve the countries with a museum. ---[AI]--- [ { "prompt": "Write a SQL statement to list the countries on the Museum table.", "pretext": "To list the countries in the \"Museum\" table, assuming there is a column named \"country\" in the table that stores the country names, you can use the following SQL statement:", "response": "SELECT DISTINCT country \nFROM museum;", "posttext": "This query selects the distinct values from the \"country\" column of the \"Museum\" table. The DISTINCT keyword ensures that each country name is returned only once, even if there are multiple museums in the same country. The result will be a list of distinct country names present in the \"Museum\" table." } ] ---[AI]--- Notice that ^ai_name^ not only includes the keyword DISTINCT in the SQL statement, but also provides an explanation for why DISTINCT is appropriate.
When you use DISTINCT in a SQL statement, it applies to all of the columns of the query not just the first column. For example, retrieving both the country and state (as shown below) will return more than 16 rows. A country with more than one state will appear once for every state on the table. The *combination* of countries and states will all be unique. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:distinct-comparison ---[SQL]--- SELECT DISTINCT country, state FROM museum; SELECT DISTINCT country FROM museum; ---[SQL]--- Notice how both queries in the example return a unique set of values. In the second query, the unique list of countries is returned. In the first, the unique combinations of countries and states are returned. ## Limiting Results There are times that you want to retrieve only the first row or set of rows from a table. Unfortunately, the different database management systems have implemented this in different ways. A common way to retrieve only a few rows involves using the keyword LIMIT. Suppose you want to retrieve the first five rows of the museum table. ---[EVALUATE]--- "type":"plan", "task":"List the first five rows of the museum table.", "columns":"All (*)", "tables":"museum", "rows":"The first five rows", "order":"Not specified" ---[EVALUATE]--- The SQL statement used to retrieve the first five rows of the museum table is: SELECT * FROM museum LIMIT 5; ---[EVALUATE]--- "type":"examine", "task":"List the first five rows of the museum table.", "query":"SELECT * FROM museum LIMIT 5;", "columns":"All (*)", "tables":"museum", "rows":"The first five rows", "order":"Not specified" ---[EVALUATE]--- All columns will be retrieved from the museum table. The syntax "LIMIT 5" ensures that the first five rows of the result will be displayed. No order is specified for the query results. **Try the SQL Statement** ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:limit ---[SQL]--- SELECT * FROM museum LIMIT 5; ---[SQL]--- ## Using Comments SQL statements are instructions that are processed by the database management system to interact with the database. There are times that you want include notes in the the SQL statement that you don't want to be processed. These notes are called comments. ---[SIDEBAR]--- "type":"term", "title":"Comments", "text":"Notes placed inside a SQL statement that are not processed by the database management system." ---[SIDEBAR]--- Here are a few reasons you might want to leave some comment in your SQL statements: 1. The SQL statements you write can become complex. It is often helpful to leave descriptive comments in the statement to explain the logic of the statement. 2. You may want to leave a comment at the beginning of the SQL statement to describe what the statement does. 3. There are times you may want to troubleshoot a SQL statement that isn't working as you expect it to. It can be helpful to temporarily stop part of a statement from being executed, by putting it in a comment, to see the effect of removing that part of the statement. Most database management systems support at least two ways to insert a comment. The first is an inline comment. An inline comment is added using two hyphens (--). Any text on the same line after the hyphens will not be processed. For example, the SQL statement below will retrieve the list of museums in the database, ignoring the text, "This is a comment" when the statement is processed. ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:inline-comment ---[SQL]--- SELECT name --This is a comment FROM museum; ---[SQL]--- You can also create comments that can span across the lines of a SQL statement. Multiline comments begin with the combination of the slash and astericks characters "/*" and end with the astericks and slash characters "*/". For example, the text "This is a multiline comment" will not be processed in this SQL statement: ---[SQL]--- connection:sql_book_connection schema:canvas diagram:canvas owner:atlas-query result:block-comment ---[SQL]--- SELECT name /* This is a multiline comment*/ FROM museum ---[SQL]--- ## Summary This lesson discussed simple SQL statements that retrieve data from a database. It covered retrieving a single column and multiple columns. Additionally, it described how to use the wildcard character to retrieve all of the columns in a table. Finally, it covered how to retrieve only unique rows, a specified number of rows, and how to embed comments in a SQL statement.