Home

SAS 9.1 SQL Procedure: User's Guide

image

Contents

1. Using PROC SQL with the SAS Macro Facility The macro facility is a programming tool that you can use to extend and customize SAS software It reduces the amount of text that you must type to perform common or Programming with the SQL Procedure A Creating Macro Variables in PROC SOL 121 repeated tasks The macro facility can improve the efficiency and usefulness of your SQL programs The macro facility allows you to assign a name to character strings or groups of SAS programming statements From that point on you can work with the names rather than with the text itself For more information about the SAS macro facility see SAS Macro Language Reference Macro variables provide an efficient way to replace text strings in SAS code The macro variables that you create and name are called user defined macro variables Those that are defined by SAS are called automatic macro variables PROC SQL produces three automatic macro variables SQLOBS SQLRC and SQLOOPS to help you troubleshoot your programs For more information about these automatic macro variables see Using the PROC SQL Automatic Macro Variables on page 126 Creating Macro Variables in PROC SQL Other software vendors SQL products allow the embedding of SQL into another language References to variables columns of that language are termed host variable references They are differentiated from references to columns in tables by names that are prefixed with
2. Using the CASE OPERAND Form You can also construct a CASE expression by using the CASE OPERAND form as in the following example This example selects states and assigns them to a region based on the value of the Continent column proc sql outobs 12 title Assigning Regions to Continents select Name Continent case Continent when North America then Continental U S when Oceania then Pacific Islands else None end as Region from sql unitedstates Note When you use the CASE OPERAND form of the CASE expression the conditions must all be equality tests that is they cannot use comparison operators or other types of operators as are used in Using a Simple CASE Expression on page 22 a Output 2 13 Using a CASE Expression in the CASE OPERAND Form Assigning Regions to Continents Continent Region America Continental Alaska America Continental Arizona America Continental Arkansas America Continental California America Continental Colorado America Continental Connecticut America Continental Delaware America Continental District of Columbia America Continental eaaeaeageqag eaeaea a Gg ANNA NAUHAAANAAUNHAHAN Florida America Continental America Continental U Georgia Hawaii Oceania Pacific Islands 24 Replacing Missing Values A Chapter 2 Replacing Missing Values The COALESCE function enables you to replace missing values in a co
3. Note that some of the weights are negative Solution Use the following PROC SQL code to obtain weighted averages that are shown in the following output proc sql title Weighted Averages from Sample Data select Gender sum Value Weight sum Weight as WeightedAverage from select Gender Value case when Weight gt 0 then Weight else 0 end as Weight from Sample group by Gender Output 6 2 PROC SQL Output for Weighted Averages Weighted Averages from Sample Data Weighted Gender Average 1864 026 1015 91 136 How It Works A Chapter 6 How It Works This solution uses an in line view to create a temporary table that eliminates the negative data values in the Weight column The in line view is a query that selects the Gender and Value columns uses a CASE expression to select the value from the Weight column If Weight is greater than zero then it is retrieved if Weight is less than zero then a value of zero is used in place of the Weight value select Gender Value case when Weight gt 0 then Weight else 0 end as Weight from Sample The first or outer SELECT statement in the query selects the Gender column constructs a weighted average from the results that were retrieved by the in line view The weighted average is the sum of the products of Value and Weight divided by the sum of the Weights select Gender sum Value Weight sum Weight as
4. ySd8 SAS Publishing SAS 9 1 SQL Procedure User s Guide The Power to Know The correct bibliographic citation for this manual is as follows SAS Institute Inc 2004 SAS 9 1 SQL Procedure User s Guide Cary NC SAS Institute Inc SAS 9 1 SQL Procedure User s Guide Copyright 2004 SAS Institute Inc Cary NC USA ISBN 1 59047 334 5 All rights reserved Produced in the United States of America No part of this publication may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical photocopying or otherwise without the prior written permission of the publisher SAS Institute Inc U S Government Restricted Rights Notice Use duplication or disclosure of this software and related documentation by the U S government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52 227 19 Commercial Computer Software Restricted Rights J une 1987 SAS Institute Inc SAS Campus Drive Cary North Carolina 27513 1st printing J anuary 2004 SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential For more information about our ebooks elearning products CDs and hard copy books visit the SAS Publishing Web site at support sas com publishing or call 1 800 727 3228 SAS and all other SAS Institute Inc product or service names are registered trademarks o
5. Amarillo Anchorage Annapolis Atlanta Augusta Austin Baker Baltimore Bangor Baton Rouge 16 Eliminating Duplicate Rows from the Query Results A Chapter 2 If you want to select more than one column then you must separate the names of the columns with commas as in this example which selects the City and State columns in the SQL USCITY COORDS table proc sql outobs 12 title U S Cities and Their States select City State from sql uscitycoords Output 2 3 Selecting Multiple Columns U S Cities and Their States Albuquerque Amarillo Anchorage Annapolis Atlanta Augusta Austin Baker Baltimore Bangor Baton Rouge Note When you select specific columns PROC SQL displays the columns in the order in which you specify them in the SELECT clause a Eliminating Duplicate Rows from the Query Results In some cases you might want to find only the unique values in a column For example if you want to find the unique continents in which U S states are located then you might begin by constructing the following query proc sql outobs 12 title Continents of the United States select Continent from sql unitedstates Retrieving Data from a SingleTable A Determining the Structure of a Table 17 Output 2 4 Selecting a Column with Duplicate Values Continents of the United States Continent America America America America America America America America America America Am
6. Baker Eugene Klamath Falls Portland Salem Olympia Seattle Spokane Accessing a DBMS with SAS ACCESS Software SAS ACCESS software for relational databases provides an interface between SAS software and data in other vendors database management systems SAS ACCESS software provides dynamic access to DBMS data through the SAS ACCESS LIBNAME statement and the PROC SQL Pass Through Facility The LIBNAME statement enables you to assign SAS librefs to DBMS objects such as schemas and databases The Pass Through Facility enables you to interact with a DBMS by using its SQL syntax without leaving your SAS session It is generally recommended that you use the SAS ACCESS LIBNAME statement to access your DBMS data because doing so is usually the fastest and most direct method of accessing DBMS data The LIBNAME statement offers the following advantages Programming with the SQL Procedure A Using Libname Engines 129 Significantly fewer lines of SAS code are required to perform operations in your DBMS For example a single LIBNAME statement establishes a connection to your DBMS enables you to specify how your data is processed and enables you to easily browse your DBMS tables in SAS You do not need to know your DBMS s SQL language to access and manipulate your DBMS data You can use SAS procedures such as PROC SQL or DATA step programming on any libref that references DBMS data You can read insert update de
7. China Yugoslavia Germany Colombia India Romania Hungary res Argentina Egypt India AvgHigh The OILPROD table contains oil production statistics from oil producing countries 8 Notes about the Example Tables A Chapter 1 Output 1 9 OILPROD Partial Output OILPROD Barrels Country 1 400 000 Canada 2 500 000 China 3 000 000 Egypt 900 000 Indonesia 1 500 000 Iran 4 000 000 Iraq 600 000 Kuwait 2 500 000 Libya 1 500 000 Mexico 3 400 000 Nigeria 2 000 000 Norway 3 500 000 Oman 900 000 Saudi Arabia 9 000 000 United States of America 8 000 000 The OILRSRVS table lists approximate oil reserves of oil producing countries Output 1 10 OILRSRVS Partial Output OILRSRVS Country Barrels 9 200 000 000 Canada 7 000 000 000 China 25 000 000 000 Egypt 4 000 000 000 Gabon 1 000 000 000 Indonesia 5 000 000 000 Iran 90 000 000 000 Iraq 110 000 000 000 Kuwait 95 000 000 000 Libya 30 000 000 000 Mexico 50 000 000 000 Nigeria 16 000 000 000 Norway 11 000 000 000 Saudi Arabia 260 000 000 000 United Arab Emirates 100 000 000 The CONTINENTS table contains geographic data that relates to world continents Introduction to the SQL Procedure A Notes about the Example Tables 9 Output 1 11 CONTINENTS CONTINENTS HighPoint Height LowPoint Africa 11506000 Kilimanjaro Lake Assal Antarctica 5500000 Vinson Massif Asia 16988000 Everest Dead Sea Australia 2968000 Kosciusko Lak
8. Creating Tables Likean Existing Table 92 Copying an Existing Table 93 Using Data S amp Options 93 Inserting Rowsinto Tables 93 Inserting Rows with the SET Clause 93 Inserting Rows with the VALUES Clause 94 Inserting Rows with a Query 95 Updating Data Values in a Table 96 Updating All Rows in a Column with the Same Expression 96 Updating Rows in a Column with Different Expressions 97 Handling UpdateErrors 98 Deleting Rows 98 Altering Columns 99 Addinga Column 99 Modifyinga Column 100 Ddetinga Column 101 Creating an Index 102 Using PROC SQL to Createlndexes 102 Tips for Creating Indexes 102 Deleting Indexes 103 Ddetinga Table 103 Using SQL Procedure Tables in SAS Software 103 Creating and Using Integrity Constraints in a Table 103 Creating and Using PROC SQL Views 105 Creating Views 106 Describing a View 106 Updating a View 107 Embedding a Libnameina View 107 Ddetinga View 108 Specifying In Line Views 108 Tips for Using SQL Procedure Views 109 Using SQL Procedure Views in SAS Software 109 90 Introduction A Chapter 4 Introduction This chapter shows you how to create a table update tables alter existing tables delete a table create indexes use integrity constraints in table creation create views Creating Tables The CREATE TABLE statement enables you to create tables without rows from column definitions or to create tables from a query result You can also use CREA
9. and label of a column To change a column s name use the RENAME data set option You cannot change a column s data type by using the MODIFY clause The following MODIFY clause permanently changes the format for the Population column proc sql title World s Largest Countries alter table sql newcountries modify population format commal15 select name population from sql newcountries Creating and Updating Tables and Views A Deleting a Column 101 Output 4 13 Modifying a Column Format Largest Countries Population 160 310 357 1 202 215 077 929 009 120 Indonesia 202 393 859 Russia 151 089 979 United States 263 294 808 You may have to change a column s width and format before you can update the column For example before you can prefix a long text string to Name you must change the width and format of Name from 35 to 60 The following statements modify and update the Name column proc sql title World s Largest Countries alter table sql newcountries modify name char 60 format 60 update sql newcountries set name The United Nations member country is name select name from sql newcountries Output 4 14 Changing a Column s Width World s Largest Countries United Nations member country is Brazil United Nations member country is China United Nations member country is India United Nations member country is Indonesia United Nations member country is Russia United Nations
10. for_key Referential name SQL Restrict Set Null USPOSTAL NOTE SQL table SQL USPOSTAL bufsize 8192 has the following integrity constraint s Integrity On On Constraint Variables Reference Delete Update _NM0001_ Not Null for_key Foreign Key name SQL MYSTATES Restrict Set Null Integrity constraints cannot be used in views For more information about integrity constraints see SAS Language Reference Concepts Creating and Using PROC SQL Views A PROC SQL view contains a stored query that is executed when you use the view in a SAS procedure or DATA step Views are useful because they 106 Creating Views A Chapter 4 often save space because a view is frequently quite small compared with the data that it accesses prevent users from continually submitting queries to omit unwanted columns or rows shield sensitive or confidential columns from users while enabling the same users to view other columns in the same table ensure that input data sets are always current because data is derived from tables at execution time hide complex joins or queries from users Creating Views To create a PROC SQL view use the CREATE VIEW statement as shown in the following example proc sql title Current Population Information for Continents create view sql newcontinents as select continent sum population as totpop format commal5 label Total Population sum area
11. in the SQL procedure the combination of data from two or more tables or from two or more SAS data views to produce a single result table A conventional join which is often called an inner join returns a result table for all the rows in one table that have one or more matching rows in the other table s as specified by the sql expression See also outer join join criteria The set of parameters that determine how tables are to be joined J oin criteria are usually specified in a WHERE expression or in an SQL ON clause See also join outer join inner join missing value in SAS a term that describes the contents of a variable that contains no data for a particular row or observation By default SAS prints or displays a missing numeric value as a single period and it prints or displays a missing character value as a blank space In the SQL procedure a missing value is equivalent to an SQL NULL value natural join a type of join that returns selected rows from tables in which one or more columns in each table has the same name and the same data type and contains the same value See join outer join in the SQL procedure an inner join that is augmented with rows that do not match with any row from the other table s in the join Outer joins are of three kinds left right and full See also join PROC SQL view a SAS data set of type VIEW that is created by the SQL procedure A PROC SQL view contains no data Instead it stores info
12. see the SQL Procedure chapter in the Base SAS Procedures Guide Table Name Contains Information About View Name DICTIONARY CATALOGS SAS catalogs and their entries SASHELP VCATALG DICTIONARY COLUMNS columns or variables and their SASHELP VCOLUMN attributes DICTIONARY DICTIONARIES all DICTIONARY tables SASHELP VDCTNRY DICTIONARY EXTFILES filerefs and external storage SASHELP VEXTFL locations of the external files 118 Accessing SAS System Information Using DICTIONARY Tables A Chapter 5 Table Name Contains Information About View Name DICTIONARY INDEXES indexes that exist for SAS data SASHELP VINDEX sets DICTIONARY MEMBERS SAS files SASHELP VMEMBER DICTIONARY OPTIONS current settings of SAS system SASHELP VOPTION options DICTIONARY STYLES ODS styles SASHELP VSTYLE DICTIONARY TABLES SAS data files and views SASHELP VTABLE DICTIONARY VIEWS SAS data views SASHELP VVIEW To see how each DICTIONARY table is defined submit a DESCRIBE TABLE statement This example shows the definition of DICTIONARY TABLES proc sql describe table dictionary tables The results are written to the SAS log Output 5 4 Definition of DICTIONARY TABLES NOTE SQL table DICTIONARY TABLES was created like create table DICTIONARY TABLES libname char 8 label Library Name memname char 32 label Member Name memtype char 8 label Member Type memlabel char 256 label Dataset Label typemem char 8 label Dataset Ty
13. the row You specify values by using a SET clause or VALUES clause You can also insert the rows resulting from a query Under most conditions you can insert data into tables through PROC SQL and SAS ACCESS views See Updating a View on page 107 Inserting Rows with the SET Clause With the SET clause you assign values to columns by name The columns can appear in any order in the SET clause The following INSERT statement uses multiple SET clauses to add two rows to NEWCOUNTRIES proc sql insert into sql newcountries 94 Inserting Rows with the VALUES Clause A Chapter 4 set name Bangladesh capital Dhaka population 126391060 set name Japan capital Tokyo population 126352003 title World s Largest Countries select name format 20 capital format 15 population format commal15 0 from sql newcountries Output 4 5 Rows Inserted with the SET Clause World s Largest Countries Capital Population Brazil Brasilia 160 310 357 China Beijing 1 202 215 077 India New Delhi 929 009 120 Indonesia Jakarta 202 393 859 Russia Moscow 151 089 979 United States Washington 263 294 808 Bangladesh Dhaka 126 391 060 Japan Tokyo 126 352 003 Note the following features of SET clauses As with other SQL clauses use commas to separate columns In addition you must use a semicolon after the last SET clause only If you omit data for a column then the value in that column is a mis
14. 2 Note When you use an ORDER BY clause you change the order of the output but not the order of the rows that are stored in the table A Note ThePROC SQL default sort order is ascending Output 2 16 Sorting by Column Country Populations Population Vatican City Tuvalu Nauru Turks and Caicos Islands Leeward Islands Cayman Islands San Marino Liechtenstein Gibraltar Monaco Saint Kitts and Nevis Marshall Islands Sorting by Multiple Columns You can sort by more than one column by specifying the column names separated by commas in the ORDER BY clause The following example sorts the SQL COUNTRIES table by two columns Continent and Name proc sql outobs 12 title Countries Sorted by Continent and Name select Name Continent from sql countries order by Continent Name Output 2 17 Sorting by Multiple Columns Countries Sorted by Continent and Name Continent Bermuda Iceland Kalaallit Nunaat Algeria Angola Africa Benin Africa Botswana Africa Burkina Faso Africa Burundi Africa Cameroon Africa Cape Verde Africa Central African Republic Africa Retrieving Data froma SingleTable A Sorting by Calculated Column 27 Note The results list countries without continents first because PROC SQL sorts missing values first in an ascending sort A Specifying a Sort Order To order the results specify ASC for ascending or DESC for descending You can specify a sort order fo
15. 34 248 705 1073518 Armenia 3 556 864 11500 Australia 18 255 944 2966200 Austria 8 033 746 32400 The following DESCRIBE TABLE statement writes a CREATE TABLE statement to the SAS log proc sql describe table sql densities Output 4 3 SAS Log for DESCRIBE TABLE Statement for DENSITIES NOTE SQL table SQL DENSITIES was created like create table SQL DENSITIES bufsize 8192 Name char 35 format 15 informat 35 label Country Population num format COMMA10 informat BEST8 label Population SquareMiles num format BEST8 informat BEST8 label SquareMiles Density num format 6 2 i In this form of the CREATE TABLE statement assigning an alias to a column renames the column while assigning a label does not In this example the Area column has been renamed to SquareM iles and the calculated column has been named Densities However the Name column retains its name and its display label is country Creating Tables Like an Existing Table To create an empty table that has the same columns and attributes as an existing table or view use the LIKE clause in the CREATE TABLE statement In the following example the CREATE TABLE statement creates the NEWCOUNTRIES table with 6 columns and 0 rows and with the same column attributes as those in COUNTRIES The DESCRIBE TABLE statement writes a CREATE TABLE statement to the SAS log proc sql create table sql newcountries like sql countr
16. 69 Including All Rows with the Union Join A union join combines two tables without attempting to match rows All columns and rows from both tables are included Combining tables with a union join is similar to combining them with the OUTER UNION set operator see Combining Queries with Set Operators on page 81 A union join s output can be limited by a WHERE clause This example shows a union join of the same One and Two tables that were used earlier to demonstrate a cross join proc sql select from one union join two Output 3 20 Union Join Matching Rows with a Natural Join A natural join automatically selects columns from each table to use in determining matching rows With a natural join PROC SQL identifies columns in each table that have the same name and type rows in which the values of these columns are equal are returned as matching rows The ON clause is implied This example produces the same results as the example in Specifying the Order of J oin Output on page 59 proc sql outobs 6 title Oil Production Reserves of Countries select country barrelsperday Production barrels Reserve from sql oilprod natural join sql oilrsrvs order by barrelsperday desc Output 3 21 Natural Inner Join of OILPROD and OILRSRVS Oil Production Reserves of Countries Country Production Reserve Saudi Arabia 9 000 000 260 000 000 000 United States of America 8 000 000 30 000 000 000
17. 93 114 708 Germany 81 890 690 RFPrFrRPNNNND W Using Aggregate Functions with Unique Values You can use DISTINCT with an aggregate function to cause the function to use only unique values from a column Counting Unique Values The following query returns the number of distinct nonmissing continents in the SQL COUNTRIES table proc sql title Number of Continents in the COUNTRIES Table select count distinct Continent as Count from sql countries Output 2 37 Using DISTINCT with the COUNT Function Number of Continents in the COUNTRIES Table Note You cannot use select count distinct to count distinct rows in a table This code generates an error because PROC SQL does not know which duplicate column values to eliminate Counting Nonmissing Values Compare the previous example with the following query which does not use the DISTINCT keyword This query counts every nonmissing occurrence of a continent in the SQL COUNTRIES table including duplicate values 44 Summarizing Data with Missing Values A Chapter 2 proc sql title Countries for Which a Continent is Listed select count Continent as Count from sql countries Output 2 38 Effect of Not Using DISTINCT with the COUNT Function Countries for Which a Continent is Listed Counting All Rows In the previous two examples countries that have a missing value in the Continent column are ignored by the COUNT fun
18. 980323 VID003 980223 VID001 980215 VID003 980223 You want to use this table to create a summary report that shows the sales for each produc for each month of the quarter Solution Use the following PROC SQL code to create a column for each month of the quarter and use the summary function SUM in combination with the GROUP BY statement to accumulate the monthly sales for each product Practical Problem Solving with PROC SQL A How It Works 147 proc sql title First Quarter Sales by Product select Product sum Jan label Jan sum Feb label Feb sum Mar label Mar from select Product case when substr InvoiceDate 3 2 01 then InvoiceAmount end as Jan case when substr InvoiceDate 3 2 02 then InvoiceAmount end as Feb case when substr InvoiceDate 3 2 03 then InvoiceAmount end as Mar from work sales group by Product Output 6 16 PROC SQL Output for a Summary Report First Quarter Sales by Product VID001 VID003 VID005 VID010 VID014 Note Missing values in the matrix indicate that no sales occurred for that given product in that month How It Works This solution uses an in line view to create three temporary columns J an Feb and Mar based on the month part of the invoice date column The in line view is a query that selects the product column uses a CASE expression to assign the value of invoice amount to one of three columns J an
19. LOOPS options then you are prompted to stop or continue processing when the limits set by these options are reached Checking Syntax with the NOEXEC Option and the VALIDATE Statement To check the syntax of a PROC SQL step without actually executing it use the NOEXEC option or the VALIDATE statement Both the NOEXEC option and the VALIDATE statement work essentially the same way The NOE XEC option can be used once in the PROC SQL statement and the syntax of all queries in that PROC SQL step will be checked for accuracy without executing them The VALIDATE statement must be specified before each SELECT statement in order for that statement to be checked for accuracy without executing If the syntax is valid then a message is written to the SAS log to that effect if the syntax is invalid then an error message is displayed The automatic macro variable SQLRC contains an error code that indicates the validity of the syntax For an example of the VALIDATE statement used in PROC SQL see Validating a Query on page 52 For an example of using the VALIDATE statement in a SAS AF application see Using the PROC SQL Automatic Macro Variables on page 126 Note Thereis an interaction between the PROC SQL EXEC and ERRORSTOP options when SAS is running in a batch or noninteractive session For more information see the section about the SQL procedure in Base SAS Procedures Guide Expanding SELECT with the FEEDBACK Option The FEEDBACK opt
20. MAX function can cause the same calculation to repeat for every row This occurs whenever PROC SQL remerges data Remerging occurs whenever any of the following conditions exist The SELECT clause references a column that contains an aggregate function that is not listed in a GROUP BY clause The SELECT clause references a column that contains an aggregate function and other column s that are not listed in the GROUP BY clause One or more columns or column expressions that are listed in a HAVING clause are not included in a subquery or a GROUP BY clause In this example PROC SQL writes the population of China which is the largest population in the table proc sql outobs 12 title Largest Country Populations select Name Population format comma20 max Population as MaxPopulation format comma20 from sql countries order by Population desc 42 Remerging Summary Statistics A Chapter 2 Output 2 35 Using Aggregate Functions Largest Country Populations Population MaxPopulation 1 202 215 077 1 202 215 077 929 009 120 1 202 215 077 United States 263 294 808 1 202 215 077 Indonesia 202 393 859 1 202 215 077 Brazil 160 310 357 1 202 215 077 Russia 151 089 979 1 202 215 077 Bangladesh 126 387 850 1 202 215 077 Japan 126 345 434 1 202 215 077 Pakistan 123 062 252 1 202 215 077 Nigeria 99 062 003 1 202 215 077 Mexico 93 114 708 1 202 215 077 Germany 81 890 690 1 202 215 077 In some
21. Program Department Payrate Gadgets Whatnots Jack Colton Mickey Raymond Dean Proffit Antoinette Lily Sydney Wade Alan Traherne Elizabeth Bennett Practical Problem Solving with PROC SQL A Solution 151 You want to update the table by increasing each salesperson s payrate based on the total sales of gadgets and whatnots and taking into consideration some factors that are based on department code Specifically anyone who sells over 10 000 gadgets merits an extra 5 per hour Anyone selling between 5 000 and 10 000 gadgets also merits an incentive pay but E Department salespersons are expected to be better sellers than those in the other departments so their gadget sales incentive is 2 per hour compared to 3 per hour for those in other departments Good sales of whatnots also entitle sellers to added incentive pay The algorithm for whatnot sales is that the top level level 1 in each department salespersons merit an extra 50 per hour for whatnot sales over 2 000 and level 2 salespersons merit an extra 1 per hour for sales over 2 000 Solution Use the following PROC SQL code to create a new value for the Payrate column Actually Payrate is updated twice for each row once based on sales of gadgets and again based on sales of whatnots proc sql update incentives set payrate case when gadgets gt 10000 then payrate 5 00 when gadgets gt 5000 then case when department in El E2 then payr
22. WeightedAverage Finally the query uses a GROUP BY clause to combine the data so that the calculation is performed for each gender group by Gender Comparing Tables Problem You have two copies of a table One of the copies has been updated You want to see which rows have been changed Background Information There are two tables the OLDSTAFF table and NEWSTAFF table The NEWSTAFF table is a copy of OLDSTAFF Changes have been made to NEWSTAFF You want to find out what changes have been made Practical Problem Solving with PROC SQL A Solution 137 Output 6 3 Sample Input Tables for Table Comparison Wilson Singleton Thompson Wilson Singleton Thompson Chen Old Staff Table Terence Georgina Sanford Archie Fran Adam Jack New Staff Table Middle Terence Georgina Sanford Archie Fran Adam John Bill 661 0012 661 3243 661 8897 661 4432 661 4328 661 8332 661 0980 661 6781 Phone 661 0012 661 3243 661 2231 661 4432 661 4328 661 8332 661 0980 661 6781 661 8099 Location Location Solution To display only the rows that have changed in the new version of the table use the EXCEPT set operator between two SELECT statements proc sql title Updated Rows select from newstaff except select from oldstaff Output 6 4 Rows That Have Changed Bridges Thompson Updated Rows Georgina John 661 8099 661 2231 661 6781 Location
23. as a line split character in the labels proc sql outobs 10 title Populations and or Coordinates of World Cities select City City WORLDCITYCOORDS format 20 Capital Capital COUNTRIES format 20 Population Latitude Longitude from sql countries full join sql worldcitycoords on Capital City and Name Country Output 3 17 Full Outer Join of COUNTRIES and WORLDCITYCOORDS Populations and or Coordinates of World Cities City Capital WORLDCITYCOORDS COUNTRIES Population Latitude Longitude 146436 Abu Dhabi 2818628 Abuja 99062003 Acapulco Accra Accra Adana Addis Ababa Adelaide Aden 17395511 Addis Ababa 59291170 68 Specialty Joins A Chapter 3 Specialty Joins Three types of joins cross joins union joins and natural joins are special cases of the standard join types Including All Combinations of Rows with the Cross Join A cross join is a Cartesian product it returns the product of two tables Likea Cartesian product a cross join s output can be limited by a WHERE clause This example shows a cross join of the tables One and Two Output 3 18 Tables One and Two Table One proc sql select from one cross join two Output 3 19 Cross Join The SAS System Like a conventional Cartesian product a cross join causes a note regarding Cartesian products in the SAS log Retrieving Data from Multiple Tables A Specialty Joins
24. coordinates of the capitals of the states in the United States then you need to join the UNITEDSTATES table which contains the state capitals with the USCITYCOORDS table which contains the coordinates of cities in the United States Because cities must be joined along with their states for an accurate join similarly to the previous example you must join the tables on both the city and state columns of the tables J oining the cities by joining the UNITEDSTATES Capital column to the USCITYCOORDS City column is straightforward However in the UNITEDSTATES table the Name column contains the full state name while in USCITY COORDS the states are specified by their postal code It is therefore impossible to directly join the two tables on their state columns To solve this problem it is necessary to use the POSTALCODES table which contains both the state names and their postal codes as an intermediate table to make the correct relationship between UNITEDSTATES and USCITYCOORDS The correct solution joins the UNITEDSTATES Name column to the POSTALCODES Name column matching the full state names and the POSTALCODES Code column to the USCITYCOORDS State column matching the state postal codes title Coordinates of State Capitals proc sql outobs 10 select us Capital format 15 us Name State format 15 pe Code c Latitude c Longitude from sql unitedstates us sql postalcodes pc sql uscitycoords c where us Capital c City
25. for Computing Subtotal Percentages Partial Output Sample Data for Subtotal Percentages Obs State Answer YES YES YES YES 1 2 3 4 5 6 7 8 9 0 Solution Use the following PROC SQL code to compute the subtotal percentages Practical Problem Solving with PROC SQL A Background Information 141 proc sql titlel Survey Responses select survey Answer State count State as Count calculated Count Subtotal as Percent format percent8 2 from survey select Answer count as Subtotal from survey group by Answer as survey2 where survey Answer survey2 Answer group by survey Answer State quit Output 6 8 PROC SQL Output That Computes Percentages within Subtotals Survey Responses Answer Percent How It Works This solution uses a subquery to calculate the subtotal counts for each answer The code joins the result of the subquery with the original table and then uses the calculated state count as the numerator and the subtotal from the subquery as the denominator for the percentage calculation The query uses a GROUP BY clause to combine the data so that the calculation is performed for State within each answer group by survey Answer State Counting Duplicate Rows in a Table Problem You want to count the number of duplicate rows in a table and generate an output column that shows how many times each row occurs Background Information There is one input table call
26. list of U S states that have New at the beginning of their names proc sql title New U S States select Name from sql unitedstates where Name eqt New Output 2 30 Using a Truncated String Comparison Operator New U S States New Hampshire New Jersey New Mexico New York Using a WHERE Clause with Missing Values If a column that you specify in a WHERE clause contains missing values then a query might provide unexpected results For example the following query returns all features from the SQL FEATURES table that have a depth of less than 500 feet incorrect output proc sql outobs 12 title World Features with a Depth of Less than 500 Feet 38 Using a WHERE Clause with Missing Values A Chapter 2 select Name Depth from sql features where Depth 1t 500 order by Depth Output 2 31 Using a WHERE Clause with Missing Values Incorrect Output World Features with a Depth of Less than 500 Feet Kalahari Nile Citlaltepec Lena Mont Blanc Borneo Rub al Khali Amur Yosemite Cook Mackenzie Peace Mekong However because PROC SQL treats missing values as smaller than nonmissing values features that have no depth listed are also included in the results To avoid this problem you could adjust the WHERE expression to check for missing values and exclude them from the query results as follows corrected output proc sql outobs 12 title World Feat
27. log that informs you that this occurred Table 2 7 Differences between the HAVING Clause and WHERE Clause A HAVING dause A WHERE dause is typically used to specify condition s for is used to specify conditions for including or including or excluding groups of rows froma excluding individual rows from a table table must follow the GROUP BY clausein a query if must precede the GROUP BY clausein a query used with a GROUP BY clause if used with a GROUP BY clause is affected by a GROUP BY clause when there is not affected by a GROUP BY clause is no GROUP BY clause the HAVING clauseis treated like a WHERE cause iS processed after the GROUP BY clause and iS processed before a GROUP BY clause if there any aggregate functions is one and before any aggregate functions Using HAVING with Aggregate Functions The following query returns the populations of all continents that have more than 15 countries proc sql title Total Populations of Continents with More than 15 Countries select Continent sum Population as TotalPopulation format commal6 count as Count from sql countries group by Continent having count gt 15 order by Continent 52 Validating a Query A Chapter 2 The HAVING expression contains the COUNT function which counts the number of rows within each group Output 2 50 Using HAVING with the COUNT Function Total Populations of Continents with More than 15 Countries Co
28. macro variable The INOBS option restricts PROC SQL to using the first five rows of the COUNTRIES table A comma and a space are used to delimit the values in the macro variable proc sql noprint inobs 5 select Name into countries separated by from sql countries sput amp countries Output 5 11 Concatenating Values in Macro Variables 4 proc sql noprint inobs 5 5 select Name 6 7 into countries separated by from sql countries WARNING Only 5 records were read from SQL COUNTRIES due to INOBS option 8 9 put amp countries Afghanistan Albania Algeria Andorra Angola 124 Defining Macros to Create Tables A Chapter 5 The leading and trailing blanks are trimmed from the values before the macro variables are created If you do not want the blanks to be trimmed then add NOTRIM to the INTO clause Here is the previous example with NOTRIM added proc sql noprint inobs 5 select Name into countries separated by NOTRIM from sql countries sput amp countries Output 5 12 Concatenating Values in Macro Variables Blanks Not Removed proc sql noprint inobs 5 select Name into countries separated by NOTRIM from sql countries WARNING Only 5 records were read from SQL COUNTRIES due to INOBS option 5 6 Sput amp countries Afghanistan Albania Algeria Andorra Angola Defining Macros to Create Tables Macros are useful as interfaces for table creati
29. member country is United States Deleting a Column The DROP clause deletes columns from tables The following DROP clause deletes UNDate from NEWCOUNTRIES proc sql alter table sql newcountries drop undate 102 Creating an Index A Chapter 4 Creating an Index An index is a file that is associated with a table The index enables access to rows by index value Indexes can provide quick access to small subsets of data and they can enhance table joins You can create indexes but you cannot instruct PROC SQL to use an index PROC SQL determines whether it is efficient to use the index Some columns may not be appropriate for an index In general create indexes for columns that have many unique values or are columns that you use regularly in joins Using PROC SQL to Create Indexes You can create a simple index which applies to one column only The name of a simple index must be the same as the name of the column that it indexes Specify the column name in parentheses after the table name The following CREATE INDEX statement creates an index for the Area column in NEWCOUNTRIES proc sql create index area on sql newcountries area You can also create a composite index which applies to two or more columns The following CREATE INDEX statement creates the index Places for the Name and Continent columns in NEWCOUNTRIES proc sql create index places on sql newcountries name continent To ensure that ea
30. of Table Two should appear in the output Compare this query s output to Output 3 2 proc sql select from one two where one x two x 58 Inner Joins A Chapter 3 Output 3 4 Table One and Table Two Joined Table One and Table Two The output contains only one row because only one value in column X matches from each table In an inner join only the matching rows are selected Outer joins can return nonmatching rows they are covered in Outer J oins on page 65 Note that the column names in the WHERE clause are prefixed by their table names This is Known as qualifying the column names and it is necessary when you specify columns that have the same name from more than one table Qualifying the column name avoids creating an ambiguous column reference Using Table Aliases A table alias is a temporary alternate name for a table You specify table aliases in the FROM clause Table aliases are used in joins to qualify column names and can make a query easier to read by abbreviating table names The following example compares the oil production of countries to their oil reserves by joining the OILPROD and OILRSRVS tables on their Country columns Because the Country columns are common to both tables they are qualified with their table aliases You could also qualify the columns by prefixing the column names with the table names Note TheAS keyword is optional a proc sql outobs 6 title Oil Production Reserv
31. one column list of with attributes modifying 100 multicolumn joins renaming 100 replacing missing values selecting selecting all columns selecting specific columns sorting with missing values sorting by sorting by column position sorting by multiple columns sorting by unselected columns summarizing data in multiple columns unique values comparison operators inner joins with truncated string concatenating query results conditional operators correlated subqueries counting all rows duplicate rows 141 nonmissing values unique values CREATE INDEX statement 102 cross joins D data files See tables data set options creating tables with DATA step vs SQL procedure DBMS access 128 DBMS data displaying with SQL Procedure Pass Through Facility DBMS tables PROC SQL views of querying 129 debugging queries DICTIONARY tables DICTIONARY COLUMNS 119 DICTIONARY TABLES tips for 120 DICTIONARY COLUMNS 119 DICTIONARY TABLES E errors update errors 98 example tables 4j EXCEPT operator EXISTS condition F FEEDBACK option expanding SELECT statement with fields See columns files See tables filtering grouped data HAVING clause HAVING clause vs WHERE clause HAVING clause with aggregate functions foreign key FROM clause full outer joins G general integrity constraints GROUP BY clause grouping data by multiple columns by one column filtering grouped data missing va
32. processed by an SQL procedure statement for example the number of rows that were formatted and displayed in SAS output by a SELECT statement or the number of rows that were deleted by a DELETE statement SQLOOPS contains the number of iterations that the inner loop of PROC SQL processes The number of iterations increases proportionally with the complexity of the query See Limiting Iterations with the LOOPS Option on page 112 for details SQLRC contains a status value that indicates the success of the PROC SQL statement For a complete list of the values that this macro returns see the Base SAS Procedures Guide Users of SAS AF software can access these automatic macro variables in SAS Component Language programs by using the SYMGET function The following example uses the VALIDATE statement in a SAS AF software application to check the syntax of a block of code Before it issues the CREATE VIEW statement the application checks that the view is accessible submit sql immediate validate amp viewdef end submit if symget SOLRC gt 4 then do the view is not valid end else do submit sql immediate create view amp viewname as amp viewdef end submit end The following example retrieves the data from the COUNTRIES table but does not display it because the NOPRINT option is specified in the PROC SQL statement The PUT macro language statement displays the three automatic macro variable values in the SAS l
33. rolling winter Practical Problem Solving with PROC SQL A How It Works 149 You want to reorder this chore list so that all the chores are grouped by season starting with spring and progressing through the year Simply ordering by Season makes the list appear in alphabetical sequence fall spring summer winter Solution Use the following PROC SQL code to create a new column Sorter that will have values of 1 through 4 for the seasons spring through winter Use the new column to order the query but do not select it to appear options nodate nonumber linesize 80 pagesize 60 proc sql title Garden Chores by Season in Logical Order select Project Hours Season from select Project Hours Season case when Season spring then 1 when Season summer then 2 when Season fall then 3 when Season winter then 4 else end as Sorter from chores order by Sorter Output 6 18 PROC SQL Output for a Customized Sort Sequence Garden Chores by Season in Logical Order Project Season tilling spring raking spring planting spring seeding spring aerating spring mowing summer feeding summer edging summer weeding summer raking fall mulching fall planting fall rolling winter pruning winter sweeping winter How It Works This solution uses an in line view to create a temporary column that can be used as an ORDER BY column The in line view is a query that s
34. seconds cpu time 0 01 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK OCEAN has 4 observations and 6 variables DATA statement used Total process time real time 0 01 seconds cpu time 0 01 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK RIVER has 12 observations and 6 variables DATA statement used Total process time real time 0 02 seconds cpu time 0 02 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK SEA has 13 observations and 6 variables DATA statement used Total process time real time 0 03 seconds cpu time 0 02 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK WATERFALL has 4 observations and 6 variables DATA statement used Total process time real time 0 02 seconds cpu time 0 02 seconds How It Works Practical Problem Solving with PROC SQL A Solution This solution uses the INTO clause to store values in macro variables The first SELECT statement counts the unique variables and stores the result in macro variable N The second SELECT statement creates a range of macro variables one for each unique value and stores each unique value in one of the macro variables Note the use of the LEFT function which trims leading blanks from the value of the N macro variable The MAKEDS macro uses all the macro variables that were cre
35. table with itself Use subqueries when the result that you want requires more than one query and each subquery provides a subset of the table involved in the query If a membership question is asked then a subquery is usually used If the query requires a NOT EXISTS condition then you must use a subquery because NOT EXISTS operates only in a subquery the same principle holds true for the EXISTS condition Many queries can be formulated as joins or subqueries Although the PROC SQL query optimizer changes some subqueries to joins a join is generally more efficient to process Retrieving Data from Multiple Tables A Working with Two or More Query Results 81 Combining Queries with Set Operators Working with Two or More Query Results PROC SQL can combine the results of two or more queries in various ways by using the following set operators UNION produces all unique rows from both queries EXCEPT produces rows that are part of the first query only INTERSECT produces rows that are common to both query results OUTER UNION _ concatenates the query results The operator is used between the two queries for example select columns from table set operator select columns from table Place a semicolon after the last SELECT statement only Set operators combine columns from two queries based on their position in the referenced tables without regard to the individual column names Columns in the s
36. than an equivalent subquery Resetting PROC SQL Options with the RESET Statement Use the RESET statement to add drop or change the options in the PROC SQL statement You can list the options in any order in the PROC SQL and RESET statements Options stay in effect until they are reset This example first uses the NOPRINT option to prevent the SELECT statement from displaying its result table in SAS output It then resets the NOPRINT option to PRINT the default and adds the NUMBER option which displays the row number in the result table proc sql noprint title Countries with Population Under 20 000 select Name Population from sql countries reset print number select Name Population from sql countries where population lt 20000 Output 5 3 Resetting PROC SQL Options with the RESET Statement Countries with Population Under 20 000 Population Leeward Islands Nauru Turks and Caicos Islands Tuvalu Vatican City Improving Query Performance There are several ways to improve query performance Some of them include using indexes and composite indexes using the keyword ALL in set operations when you know that there are no duplicate rows or when it does not matter if you have duplicate rows in the result table omitting the ORDER BY clause when you create tables and views using in line views instead of temporary tables or vice versa using joins instead of subqueries using WHERE expressions to limi
37. those rows that have high temperature equal to low temperature The WHERE clause also prevents a city from being joined to itself city ne City and Country ne Country although in this case it is highly unlikely that the high temperature would be equal to the low temperature for the same city Output 3 14 Amsterdam Auckland Auckland Retrieving Data from Multiple Tables Joining a Table to Itself Self Join Cities High Temps Country Netherlands New Zealand New Zealand AvgHigh Cities San Juan Lagos Manila Low Temps Country Puerto Rico Nigeria Philippines A Outer Joins Berlin Berlin Bogota Cape Town Copenhagen Dublin Glasgow London Oslo Reykjavik Stockholm Germany Germany Colombia South Africa Denmark Ireland Scotland England Norway Iceland Sweden Lagos Manila Bangkok San Juan Singapore Bombay Nassau Singapore Singapore Caracas San Juan Nigeria Philippines Thailand Puerto Rico Singapore India Bahamas Singapore Singapore Venezuela Puerto Rico Outer Joins Outer joins are inner joins that are augmented with rows from one table that do not match any row from the other table in the join The resulting output includes rows that match and rows that do not match from the join s source tables Nonmatching rows have null values in the columns from the unmatched table Use the ON clause instead of the WHERE clause to specify the column or columns o
38. 00 000 Iran 4 000 000 90 000 000 000 Norway 3 500 000 11 000 000 000 Mexico 3 400 000 50 000 000 000 China 3 000 000 25 000 000 000 Creating Inner Joins Using INNER JOIN Keywords The INNER J OIN keywords can be used to join tables The ON clause replaces the WHERE clause for specifying columns to join PROC SQL provides these keywords primarily for compatibility with the other joins OUTER RIGHT and LEFT J OIN Using INNER J OIN with an ON clause provides the same functionality as listing tables in the FROM clause and specifying join columns with a WHERE clause This code produces the same output as the previous code but uses the INNER J OIN construction proc sql select p country barrelsperday Production barrels Reserves from sql oilprod p inner join sql oilrsrvs r on p country r country order by barrelsperday desc Joining Tables Using Comparison Operators Tables can be joined by using comparison operators other than the equal sign in the WHERE clause for a list of comparison operators see Retrieving Rows Based on a Comparison on page 31 In this example all U S cities in the USCITYCOORDS table are selected that are south of Cairo Egypt The compound WHERE clause specifies the city of Cairo in the WORLDCITY COORDS table and joins USCITYCOORDS and WORLDCITYCOORDS on their Latitude columns using a less than 1t operator proc sql title US Cities South of Cairo Egypt select u
39. 05 CAUTION Make sure that you specify the ELSE clause If you omit the ELSE clause then each row that is not described in one of the WHEN clauses receives a missing value for the column that you are updating This happens because the CASE expression supplies a missing value to the SET clause and the Population column is multiplied by a missing value which produces a missing value A Handling Update Errors While you are updating or inserting rows in a table you may receive an error message that the update or insert cannot be performed By using the UNDO _ POLICY option you can control whether the changes that have already been made will be permanent The UNDO _POLICY option in the PROC SQL and RESET statements determines how PROC SQL handles the rows that have been inserted or updated by the current INSERT or UPDATE statement up to the point of error UNDO_POLICY REQUIRED is the default It undoes all updates or inserts up to the point of error UNDO_POLICY NONE does not undo any updates or inserts UNDO_POLICY OPTIONAL undoes any updates or inserts that it can undo reliably Deleting Rows The DELETE statement deletes one or more rows in a table or in a table that underlies a PROC SQL or SAS ACCESS view For more information about deleting rows from views see Updating a View on page 107 The following DELETE statement deletes the names of countries that begin with the letter R proc sql delete from sql newcountri
40. 1 creating multiple macros defining to create tables 124 match merges vs joins MEAN function summarizing data missing values grouping data containing overlaying 138 replacing in columns sorting columns with summarizing data with WHERE clause with multicolumn joins N natural joins nested subqueries NOEXEC option syntax checking with 113 null values inner joins and O observations See rows ODS Output Delivery System SQL procedure with 132 ORDER BY clause omitting query performance and outer joins full outer joins left outer joins right outer joins 66 OUTER UNION operator OUTOBS option restricting row processing output adding text to formatting with REPORT procedure overlaying missing values 138 P percentages within subtotals performance queries 115 primary key 104 PROC SQL views creating deleting 108 describing embedding libnames in in line views 108 in SAS 109 of DBMS tables 130 tips for 109 updating 107 programming with SQL procedure Q qualifying column names queries ALL keyword in set operations breaking into steps combining with set operators creating with SQL procedure 112 DBMS tables debugging 112 duplicate rows and performance 116 in line views vs temporary tables 116 indexes and inserting rows with limiting iterations 112 performance improvement 115 restricting row processing 112 subqueries validating query results concat
41. 1170 Adelaide Australia Aden Yemen COALESCE can be used in both inner and outer joins For more information about COALESCE see Replacing Missing Values on page 24 Comparing DATA Step Match Merges with PROC SQL Joins Many SAS users are familiar with using a DATA step to merge data sets This section compares merges to joins DATA step match merges and PROC SQL joins can produce the same results However a significant difference between a match merge and a join is that you do not have to sort the tables before you join them When All of the Values Match When all of the values match in the BY variable and there are no duplicate BY variables you can use an inner join to produce the same result as a match merge To demonstrate this result here are two tables that have the column Flight in common The values of Flight are the same in both tables FLTSUPER FLTDEST Flight Supervisor Flight Destination 145 Kang 145 Brussels 150 Miller 150 Paris 155 Evanko 155 Honolulu FLTSUPER and FLTDEST are already sorted by the matching column Flight A DATA step merge produces Output 3 23 data merged merge FltSuper FltDest by Flight run proc print data merged noobs title Table MERGED run 72 Comparing DATA Step Match Merges with PROC SQL Joins A Chapter 3 Output 3 23 Merged Tables When All the Values Match Table MERGED Flight Supervisor Destination 145 Kang Brussels 150 Miller Paris 155
42. 138 How It Works A Chapter 6 How It Works The EXCEPT operator returns rows from the first query that are not part of the second query In this example the EXCEPT operator displays only the rows that have been added or changed in the NEWSTAFF table Note Any rows that were deleted from OLDSTAFF will not appear Overlaying Missing Data Values Problem You are forming teams for a new league by analyzing the averages of bowlers when they were members of other bowling leagues When possible you will use each bowler s most recent league average However if a bowler was not in a league last year then you will use the bowler s average from the prior year Background Information There are two tables LEAGUE1 and LEAGUE 2 that contain bowling averages for last year and the prior year respectively The structure of the tables is not identical because the data was compiled by two different secretaries However the tables do contain essentially the same type of data Output 6 5 Sample Input Tables for Overlaying Missing Values Bowling Averages from Leaguel Fullname Bowler AvgScore Alexander Delarge John T Chance Jack T Colton 141 Andrew Shepherd 185 Bowling Averages from League2 FirstName LastName AMFNo AvgScore Mickey Raymond Jack Chance Patrick O Malley Practical Problem Solving with PROC SQL A How It Works 139 Solution The following PROC SQL code combines the information from two tables LEAGU
43. 384 801 818 Africa 706 611 183 Europe 811 680 062 Asia 3 379 469 458 Here is a SAS program that produces the same result title Large Countries Grouped by Continent proc summary data sql countries where Population gt 1000000 class Continent var Population output out sumPop sum TotPop run proc sort data SumPop by totPop run 4 Notes about the Example Tables A Chapter 1 proc print data SumPop noobs var Continent TotPop format TotPop commal5 where _type 1 run Output 1 2 Sample DATA Step Output Large Countries Grouped by Continent Continent TotPop Oceania 3 422 548 Australia 18 255 944 Central America and Caribbean 65 283 910 South America 316 303 397 North America 384 801 818 Africa 706 611 183 Europe 811 680 062 Asia 3 379 469 458 This example shows that PROC SQL can achieve the same results as base SAS software but often with fewer and shorter statements The SELECT statement that is shown in this example performs summation grouping sorting and row selection It also displays the query s results without the PRINT procedure PROC SQL executes without using the RUN statement After you invoke PROC SQL you can submit additional SQL procedure statements without submitting the PROC statement again Use the QUIT statement to terminate the procedure Notes about the Example Tables For all examples the following global statements are in effect o
44. ALL B 84 Producing Rows That Belong to Both Query Results INTERSECT A Chapter 3 select from sql a except all select from sql b Output 3 38 Producing Rows That Are in Only the First Query Result EXCEPT ALL A EXCEPT ALL B Producing Rows That Belong to Both Query Results INTERSECT The INTERSECT operator returns rows from the first query that also occur in the second proc sql title A INTERSECT B select from sql a intersect select from sql b Output 3 39 Producing Rows That Belong to Both Query Results INTERSECT A INTERSECT B The output of an INTERSECT ALL operation contains the rows produced by the first query that are matched one to one with a row produced by the second query In this example the output of INTERSECT ALL is the same as INTERSECT Retrieving Data from Multiple Tables A Concatenating Query Results OUTER UNION 85 Concatenating Query Results OUTER UNION The OUTER UNION operator concatenates the results of the queries This example concatenates tables A and B proc sql title A OUTER UNION B select from sql a outer union select from sql b Output 3 40 Concatenating the Query Results OUTER UNION A OUTER UNION B Notice that OUTER UNION does not overlay columns from the two tables To overlay columns in the same position use the CORRESPONDING keyword proc sql title A OUTER UNION CORR B select f
45. Albany Hartford Albuquerqu Amarillo Anchorage Annapolis Santa Fe Carlsbad Nome Washington Atlanta Knoxville Augusta Portland Austin San Antoni Baker Lewiston Baltimore Dover OrRFrRFONOUW OF OPO OCOOCOOWNDN CO amp The outer query joins the table to itself and determines the distance between the first city Alin table A and city B2 the first city that is not equal to city Al in Table B PROC SQL then runs the subquery The subquery does another self join and calculates the minimum distance between city Al and all other cities in the table other than city Al The outer query tests to see if the distance between cities Al and B2 is equal tothe minimum distance that was calculated by the subquery If they are equal then a row that contains cities Al and B2 with their coordinates and distance is written When to Use Joins and Subqueries Use a join or a subquery any time that you reference information from multiple tables J oins and subqueries are often used together in the same query In many cases you can solve a data retrieval problem by using a join a subquery or both Here are some guidelines for using joins and queries o If your report needs data that is from more than one table then you must perform a join Whenever multiple tables or views are listed in the FROM clause those tables become joined If you need to combine related information from different rows within a table then you can join the
46. BY clause It does not override your operating environment s default comparison operations for the WHERE clause a Operating Environment Information See the SAS documentation for your operating environment for more information about the default and other sorting sequences for your operating environment a 30 Sorting Columns That Contain Missing Values A Chapter 2 Sorting Columns That Contain Missing Values PROC SQL sorts nulls or missing values before character or numeric data therefore when you specify ascending order missing values appear first in the query results The following example sorts the rows in the CONTINENTS table by the LowPoint column proc sql title Continents Sorted by Low Point select Name LowPoint from sql continents order by LowPoint Because three continents have a missing value in the LowPoint column those continents appear first in the output Note that because the query does not specify a secondary sort rows that have the same value in the LowPoint column such as the first three rows of output are not displayed in any particular order In general if you do not explicitly specify a sort order then PROC SQL output is not guaranteed to be in any particular order Output 2 22 Sorting Columns That Contain Missing Values Continents Sorted by Low Point LowPoint Central America and Caribbean Antarctica Oceania Europe Caspian Sea Asia Dead Sea North America Death Valley Africa Lak
47. C SQL executes the view the stored query assigns the libref For SAS ACCESS libnames PROC SQL establishes a connection toa DBMS The scope of the libref is local to the view and does not conflict with any identically named librefs in the SAS session When the query finishes the libref is disassociated The connection to the DBMS is terminated and all data in the library becomes unavailable The advantage of embedded libnames is that you can store engine host options and DBMS connection information such as passwords in the view That in turn means that you do not have to remember and reenter that information when you want to use the libref 108 Deleting a View A Chapter 4 Note The USING LIBNAME clause must be the last clause in the SELECT statement Multiple clauses can be specified separated by commas A In the following example the libname OILINFO is assigned and a connection is made to an ORACLE database proc sql create view sql viewl as select from oilinfo reserves as newreserves using libname oilinfo oracle user username pass password path dbms path For more information about the SAS ACCESS LIBNAME statement see the SAS ACCESS documentation for your DBMS The following example embeds a SAS LIBNAME statement in a view proc sql create view sql view2 as select from oil reserves using libname oil SAS data library Deleting a View To delete a view use the DROP VIEW statement proc sql drop view sq
48. E statement You can also use the CONTENTS statement in the DATASETS procedure to get a description of NEWSTATES Creating Tables from a Query Result To create a PROC SQL table from a query result use a CREATE TABLE statement and place it before the SELECT statement When a table is created this way its data is derived from the table or view that is referenced in the query s FROM clause The new table s column names are as specified in the query s SELECT clause list The column attributes the type length informat and format are the same as those of the selected source columns The following CREATE TABLE statement creates the DENSITIES table from the COUNTRIES table The newly created table is not displayed in SAS output unless you query the table Note the use of the OUTOBS option which limits the size of the DENSITIES table to 10 rows proc sql outobs 10 title Densities of Countries create table sql densities as select Name Country format 15 Population format commal0 0 Area as SquareMiles Population Area format 6 2 as Density from sql countries select from sql densities 92 Creating Tables Like an Existing Table A Chapter 4 Output 4 2 Table Created from a Query Result Densities of Countries Country Population SquareMiles Density Afghanistan 17 070 323 251825 Albania 3 407 400 11100 Algeria 28 171 132 919595 Andorra 64 634 200 Angola 9 901 050 481300 Antigua and Bar 65 644 171 Argentina
49. E1 and LEAGUE2 The program uses all the values from the LEAGUE 1 table if available and replaces any missing values with the corresponding values from the LEAGUE2 table The results are shown in the following output options nodate nonumber linesize 80 pagesize 60 proc sql title Averages from Last Year s League When Possible title2 Supplemented when Available from Prior Year s League select coalesce lastyr fullname trim prioryr firstname prioryr lastname as Name format 26 coalesce lastyr bowler prioryr amfno as Bowler coalesce lastyr avgscore prioryr avgscore as Average format 8 from leaguel as lastyr full join league2 as prioryr on lastyr bowler prioryr amfno order by Bowler Output 6 6 PROC SQL Output for Overlaying Missing Values Averages from Last Year s League When Possible Supplemented when Available from Prior Year s League Bowler Average Mickey Raymond Patrick O Malley Andrew Shepherd Alexander Delarge Jack T Colton John T Chance How It Works This solution uses a full join to obtain all rows from LEAGUE as well as all rows from LEAGUE2 The program uses the COALESCE function on each column so that whenever possible there is a value for each column of a row Using the COALESCE function on a parenthesized list of expressions returns the first nonmissing value that is found For each row the following code returns the AvgScore column from LEAGUE1 for Average coalesc
50. ETWEEN AND Operators To select rows based on a range of values you can use the BETWEEN AND operators This example selects countries that have latitudes within five degrees of the E quator proc sql outobs 12 title Equatorial Cities of the World select City Country Latitude from sql worldcitycoords where Latitude between 5 and 5 Note In the tables used in these examples latitude values that are south of the Equator are negative Longitude values that are west of the Prime Meridian are also negative a Note Because the BETWEEN AND operators are inclusive the values that you specify in the BETWEEN AND expression are included in the results a 36 Using Other Conditional Operators A Chapter 2 Output 2 28 Using the BETWEEN AND Operators Equatorial Cities of the World Country Latitude Fortaleza Brazil Bogota Colombia Cali Colombia Brazzaville Congo Quito Ecuador Cayenne French Guiana Accra Ghana Medan Indonesia Palembang Indonesia Nairobi Kenya Kuala Lumpur Malaysia Using the LIKE Operator The LIKE operator enables you to select rows based on pattern matching For example the following query returns all countries in the SQL COUNTRIES table that begin with the letter Z and are any number of characters long or end with the letter a and are five characters long proc sql titlel Country Names that Begin with the Letter Z title2 or Are 5 Characters Long and End with the Letter a
51. Evanko Honolulu With PROC SQL presorting the data is not necessary The following PROC SQL join gives the same result as that shown in Output 3 23 proc sql title Table MERGED select s flight Supervisor Destination from fltsuper s fltdest d where s Flight d Flight When Only Some of the Values Match When only some of the values match in the BY variable you can use an outer join to produce the same result as a match merge To demonstrate this result here are two tables that have the column Flight in common The values of Flight are not the same in both tables FLTSUPER FLTDEST Flight Supervisor Flight Destination 145 Kang 145 Brussels 150 Miller 150 Paris 155 Evanko 165 Seattle 157 Lei A DATA step merge produces Output 3 24 data merged merge fltsuper fltdest by flight run proc print data merged noobs title Table MERGED run Retrieving Data from Multiple Tables A Comparing DATA Step Match Merges with PROC SQL Joins 73 Output 3 24 Merged Tables When Some of the Values Match Table MERGED Flight Supervisor Destination 145 Kang Brussels 150 Miller Paris 155 Evanko 157 Lei 165 Seattle To get the same result with PROC SQL use an outer join so that the query result will contain the nonmatching rows from the two tables In addition use the COALESCE function to overlay the Flight columns from both tables The following PROC SQL join gives the same result as
52. Feb or Mar depending upon the value of the month part of the invoice date column case when substr InvoiceDate 3 2 01 then InvoiceAmount end as Jan case when substr InvoiceSate 3 2 02 then InvoiceAmount end as Feb 148 Creating a Customized Sort Order A Chapter 6 case when substr InvoiceDate 3 2 03 then InvoiceAmount end as Mar The first or outer SELECT statement in the query selects the product uses the summary function SUM to accumulate the J an Feb and Mar amounts uses the GROUP BY statement to produce a line in the table for each product Notice that dates are stored in the input table as strings If the dates were stored as SAS dates then the CASE expression could be written as follows case when month InvoiceDate 1 then InvoiceAmount end as Jan case when month InvoiceDate 2 then InvoiceAmount end as Feb case when month InvoiceDate 3 then InvoiceAmount end as Mar Creating a Customized Sort Order Problem You want to sort data in a logical but not alphabetical sequence Background Information There is one input table called CHORES that contains the following data Output 6 17 Sample Input Data for a Customized Sort Garden Chores Project Season pruning winter mowing summer mulching fall raking fall raking spring planting spring planting fall sweeping winter edging summer seeding spring tilling spring aerating spring feeding summer
53. Iran 4 000 000 90 000 000 000 Norway 3 500 000 11 000 000 000 Mexico 3 400 000 50 000 000 000 China 3 000 000 25 000 000 000 70 Using the Coalesce Function in Joins A Chapter 3 The advantage of using a natural join is that the coding is streamlined The ON clause is implied and you do not need to use table aliases to qualify column names that are common to both tables These two queries return the same results proc sql select a W a X Y Z from tablel a left join table2 b on a W b W and a X b X order by a W proc sql select W X Y Z from tablel natural left join table2 order by W If you specify a natural join on tables that do not have at least one column with a common name and type then the result is a Cartesian product You can use a WHERE dause to limit the output Because the natural join makes certain assumptions about what you want to accomplish you should know your data thoroughly before using it You could get unexpected or incorrect results if for example you are expecting two tables to have only one column in common when they actually have two You can use the FEEDBACK option to see exactly how PROC SQL is implementing your query See Using PROC SQL Options to Create and Debug Queries on page 112 for more information about the FEEDBACK option A natural join assumes that you want to base the join on equal values of all pairs of common columns To base the join on inequalities or other compariso
54. Jakarta 202 393 859 Russia Moscow 151 089 979 United States Washington 263 294 808 The new column is added to NEWCOUNTRIES but it has no data values The following UPDATE statement changes the missing values for Density from missing to the appropriate population densities for each country proc sql update sql newcountries set density population area 100 Modifying a Column A Chapter 4 title Population Density Table select name format 20 capital format 15 population format commal15 0 density from sql newcountries Output 4 12 Filling in the New Column s Values Population Density Table Population Capital Population Density Brazil Brasilia 160 310 357 China Beijing 1 202 215 077 India New Delhi 929 009 120 Indonesia Jakarta 202 393 859 Russia Moscow 151 089 979 United States Washington 263 294 808 For more information about how to change data values see Updating Data Values in a Table on page 96 You can accomplish the same update by using an arithmetic expression to create the Population Density column as you recreate the table proc sql create table sql newcountries as select population area as density label Population Density format 6 2 from sql newcountries See Calculating Values on page 19 for another example of creating columns with arithmetic expressions Modifying a Column You can use the MODIFY clause to change the width informat format
55. OUNTRIES group by continent Updating a View You can update data through a PROC SQL and SAS ACCESS view with the INSERT DELETE and UPDATE statements under the following conditions You can update only a single table through a view The underlying table cannot be joined to another table or linked to another table with a set operator The view cannot contain a subquery If the view accesses a DBMS table then you must have been granted the appropriate authorization by the external database management system for example ORACLE You must have installed the SAS ACCESS software for your DBMS See the SAS ACCESS documentation for your DBMS for more information about SAS ACCESS views You can update a column in a view by using the column s alias but you cannot update a derived column that is a column that is produced by an expression In the following example you can update SquareMiles but not Density proc sql create view mycountries as select Name area as SquareMiles population area as Density from sql countries You can update a view that contains a WHERE clause The WHERE clause can be in the UPDATE clause or in the view You cannot update a view that contains any other clause such as ORDER BY HAVING and so forth Embedding a Libname in a View You can embed a SAS LIBNAME statement or a SAS ACCESS LIBNAME statement in a view by using the USING LIBNAME clause When PRO
56. Output 6 22 SQL UNITEDSTATES with Updated Population Data Partial Output UNITEDSTATES Capital Population Continent Statehood Alabama Montgomery 4447100 America 14DEC1819 Alaska Juneau 626932 America 03JAN1959 Arizona Phoenix 5130632 America 14FEB1912 Arkansas Little Rock 2447996 America 15JUN1836 California Sacramento 31518948 America 09SEP1850 Colorado Denver 3601298 America 01AUG1876 Connecticut Hartford 3405565 America 09JAN1788 Delaware Dover 707232 America 07DEC1787 District of Colum Washington 612907 America 21FEB1871 Florida Tallahassee 13814408 America 03MAR1845 How It Works The UPDATE statement updates values in the SQL UNITEDSTATES table here with the alias U For each row in the SQL UNITEDSTATES table the in line view in the SET clause returns a single value For rows that have a corresponding row in SQL NEWPOP this value is the value of the Population column from SQL NEWPOP For rows that do not have a corresponding row in SQL NEWPOP this value is missing In both cases the returned value is assigned to the Population column The WHERE clause ensures that only the rows in SQL UNITEDSTATES that havea corresponding row in SQL NEWPOP are updated by checking each value of Name against the list of state names that is returned from the in line view Without the WHERE clause rows that do not have a corresponding row in SQL NEWPOP would have their Population values updated to missing Creating and Using Mac
57. Overview A Chapter 6 Creating a Customized Sort Order 148 Problen 148 Background Information 148 Solution 149 How It Works 149 Conditionally Updating a Table 150 Problen 150 Background Information 150 Solution 151 How It Works 152 Updating a Table with Values from Another Table 153 Problen 153 Background Information 153 Solution 153 How It Works 154 Creating and Using Macro Variables 154 Problem 154 Background Information 154 Solution 155 How It Works 157 Using PROC SQL Tables in Othe SAS Procedures 157 Problen 157 Background Information 157 Solution 157 How It Works 159 Overview This section shows you examples of solutions that PROC SQL can provide Each example includes a statement of the problem to solve background information that you must know to solve the problem the PROC SQL solution code and an explanation of how the solution works Computing a Weighted Average Problem You want to compute a weighted average of a column of values Background Information There is one input table called Sample that contains the following data Practical Problem Solving with PROC SQL A Solution 135 Output 6 1 Sample Input Table for Weighted Averages Sample Data for Weighted Average Obs Value Weight Gender 2893 35 9 0868 56 13 26 2171 901 43 4 0605 2942 68 5 6557 621 16 24 3306 361 50 13 8971 2575 09 29 3734 2157 07 7 0687 690 73 40 1271 2085 80 24 4795 RRRA 1 2 3 4 5 6 7 8 9 0
58. R1 from the outer query are looked up in the inner query by means of the index An index can improve the processing of a table join if the columns that participate in the join are indexed in one of the tables This optimization can be done for equijoin queries only that is when the WHERE expression specifies that tablel X table2 Y Using the Keyword ALL in Set Operations Set operators such as UNION OUTER UNION EXCEPT and INTERSECT can be used to combine queries Specifying the optional ALL keyword prevents the final process that eliminates duplicate rows from the result table You should use the ALL form when you know that there are no duplicate rows or when it does not matter if the duplicate rows remain in the result table Omitting the ORDER BY Clause When Creating Tables and Views If you specify the ORDER BY clause when a table or view is created then the data is always displayed in that order unless you specify another ORDER BY clausein a query that references that table or view As with any kind of sorting procedure using ORDER BY when retrieving data has certain performance costs especially on large tables If the order of your output is not important for your results then your queries will typically run faster without an ORDER BY clause Using In Line Views versus Temporary Tables It is often helpful when you are exploring a problem to break a query down into several steps and create temporary tables to hold the intermediate r
59. ROC SQL from displaying the results of the query The PUT statement writes the contents of the macro variables to the SAS log proc sql noprint select country barrels into countryl barrels1 from sql oilrsrvs Sput amp countryl amp barrelsl 122 Creating Macro Variables in PROC SQL A Chapter 5 Output 5 8 Creating Macro Variables from the First Row of a Query Result proc sql noprint select country barrels into countryl barrels1l from sql oilrsrvs 4 5 6 7 8 9 Sput amp countryl amp barrelsl Algeria 9 200 000 000 NOTE PROCEDURE SQL used real time 0 12 seconds Creating a Macro Variable from the Result of an Aggregate Function A useful feature of macro variables is that they enable you to display data values in SAS titles The following example prints a subset of the WORLDTEMPS table and lists the highest temperature in Canada in the title proc sql outobs 12 reset noprint select max AvgHigh into maxtemp from sql worldtemps where country Canada reset print title The Highest Temperature in Canada amp maxtemp select city AvgHigh format 4 1 from sql worldtemps where country Canada Note You must use double quotation marks in the TITLE statement to resolve the reference to the macro variable a Output 5 9 Including a Macro Variable Reference in the Title The Highest Temperature in Canada Avg Montreal Quebec Toronto Creating Multipl
60. Singapore 2 887 301 Luxembourg 405 980 Malta 370 633 Maldives 254 495 Bangladesh 126 387 850 Bahrain 591 800 Taiwan 21 509 839 Channel Islands 146 436 Barbados 258 534 Korea South 45 529 277 Mauritius 1 128 057 Sorting by Column Position You can sort by any column within the SELECT clause by specifying its numerical position By specifying a position instead of a name you can sort by a calculated column that has no alias The following example does not assign an alias to the calculated density column Instead the column position of 4 in the ORDER BY clause refers to the position of the calculated column in the SELECT clause proc sql outobs 12 title World Population Densities per Square Mile select Name Population format commal2 Area format comma8 Population Area format commal0 label Density from sql countries order by 4 desc Note PROC SQL uses a label if one has been assigned as a heading for a column that does not have an alias A Output 2 20 Sorting by Column Position World Population Densities per Square Mile Population Density Hong Kong 5 857 414 Singapore 2 887 301 Luxembourg 405 980 Malta 370 633 Maldives 254 495 Bangladesh 126 387 850 Bahrain 591 800 Taiwan 21 509 839 Channel Islands 146 436 Barbados 258 534 Korea South 45 529 277 Mauritius 1 128 057 Retrieving Data froma SingleTable A Specifying a Different Sorting Sequence 29 Sorting by Unselected Column
61. T clause nor the optional HAVING clause of the associated table expression referenced a summary function Grouping by Multiple Columns To group by multiple columns separate the column names with commas within the GROUP BY clause You can use aggregate functions with any of the columns that you select The following example groups by both Location and Type producing total square miles for the deserts and lakes in each location in the SQL FEATURES table proc sql title Total Square Miles of Deserts and Lakes select Location Type sum Area as TotalArea format commal6 from sql features where type in Desert Lake group by Location Type 48 Grouping and Sorting Data A Chapter 2 Output 2 45 Grouping by Multiple Columns Total Square Miles of Deserts and Lakes Location TotalArea 3 725 000 Africa 50 958 Asia 25 300 Australia Desert 300 000 Canada Lake 12 275 China Desert 500 000 Europe Asia Lake 143 550 North America Desert 140 000 North America Lake 77 200 Russia Lake 11 780 Saudi Arabia Desert 250 000 Grouping and Sorting Data You can order grouped results with an ORDER BY clause The following example takes the previous example and adds an ORDER BY clause to change the order of the Location column from ascending order to descending order proc sql title Total Square Miles of Deserts and Lakes select Location Type sum Area as TotalArea format commal6 from s
62. T less than where Depth lt 5000 gt GE greater than or equal where Statehood to gt 01jan1860 d lt LE less than or equal to where Population lt 5000000 The following example subsets the SQL UNITEDSTATES table by including only states with populations greater than 5 000 000 people proc sql title States with Populations over 5 000 000 select Name Population format commal0 from sql unitedstates where Population gt 5000000 order by Population desc 32 Retrieving Rows That Satisfy Multiple Conditions A Chapter 2 Output 2 24 Retrieving Rows Based on a Comparison States with Populations over 5 000 000 Population California 31 518 948 New York 18 377 334 Texas 18 209 994 Florida 13 814 408 Pennsylvania 12 167 566 Illinois 11 813 091 Ohio 11 200 790 Michigan 9 571 318 New Jersey 7 957 196 North Carolina 7 013 950 Georgia 6 985 572 Virginia 6 554 851 Massachusetts 6 071 816 Indiana 5 769 553 Washington 5 307 322 Missouri 5 285 610 Tennessee 5 149 273 Wisconsin 5 087 770 Maryland 5 014 048 Retrieving Rows That Satisfy Multiple Conditions You can use logical or Boolean operators to construct a WHERE clause that contains two or more expressions The following table lists the logical operators that you can use Table 2 3 Logical Boolean Operators Symbol Mnemonic Equivalent Definition Example amp AND specifies that both the Continent Asia previous and following and Popu
63. TE TABLE to copy an existing table Creating Tables from Column Definitions You can create a new table without rows by using the CREATE TABLE statement to define the columns and their attributes You can specify a column s name type length informat format and label The following CREATE TABLE statement creates the NEWSTATES table proc sql create table sql newstates state char 2 2 character column for state abbreviation date num column for date of entry into the US informat date9 with an informat format date9 and format of DATE9 population num column for population The table NEWSTATES has 3 columns and 0 rows The char 2 modifier is used to change the length for State Use the DESCRIBE TABLE statement to verify that the table exists and to see the column attributes The following DESCRIBE TABLE statement writes a CREATE TABLE statement to the SAS log proc sql describe table sql newstates Creating and Updating Tables and Views A Creating Tables from a Query Result 91 Output 4 1 Table Created from Column Definitions 1 proc sql 2 describe table sql newstates NOTE SQL table SQL NEWSTATES was created like create table SOL NEWSTATES bufsize 8192 state char 2 date num format DATE9 informat DATE9 population num i DESCRIBE TABLE writes a CREATE TABLE statement to the SAS log even if you did not create the table with the CREATE TABL
64. Using an In Line View Countries With Population GT Caribbean Countries Bangladesh Brazil China Germany India Indonesia Japan Mexico Nigeria Pakistan Philippines Russia United States Vietnam Population 126 387 850 160 310 357 1 202 215 077 81 890 690 929 009 120 202 393 859 126 345 434 93 114 708 99 062 003 123 062 252 70 500 039 151 089 979 263 294 808 73 827 657 TotCarib 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 66 815 930 Tips for Using SQL Procedure Views Avoid using an ORDER BY clause in a view If you specify an ORDER BY clause then the data must be sorted each time that the view is referenced If data is used many times in one program or in multiple programs then it is more efficient to create a table rather than a view If a view is referenced often in one program then the data must be accessed at each reference If the view resides in the same SAS data library as the contributing table s then specify a one level name in the FROM clause The default for the libref for the FROM clause s table or tables is the libref of the library that contains the view This prevents you from having to change the view if you assign a different libref to the SAS data library that contains the view and its contributing table or tables This tip is used in the view that is desc
65. a Philippines Bangkok Thailand Singapore Singapore Bombay India Kingston Jamaica San Juan Puerto Rico Calcutta India Havana Cuba Nassau Bahamas Displaying Sums The following example uses the SUM function to return the total oil reserves for all countries in the SQL OILRSRVS table proc sql title World Oil Reserves select sum Barrels format commal8 as TotalBarrels from sql oilrsrvs Note TheSUM function produces a single row of output for the requested sum because no nonaggregate value appears in the SELECT clause a Retrieving Data froma Single Table A Remerging Summary Statistics 41 Output 2 34 Displaying Sums World Oil Reserves TotalBarrels 878 300 000 000 Combining Data from Multiple Rows into a Single Row In the previous example PROC SQL combined information from multiple rows of data into a single row of output Specifically the world oil reserves for each country were combined to form a total for all countries Combining or rolling up of rows occurs when the SELECT clause contains only columns that are specified within an aggregate function the WHERE clause if there is one contains only columns that are specified in the SELECT clause Remerging Summary Statistics The following example uses the MAX function to find the largest population in the SQL COUNTRIES table and displays it in a column called MaxPopulation Aggregate functions such as the
66. a colon The host variable stores the values of the object items that are listed in the SELECT clause The only host language that is currently available in SAS is the macro language which is part of Base SAS software When a calculation is performed on a column s value its result can be stored using macro variable in the macro facility The result can then be referenced by that name in another PROC SQL query or SAS procedure Host variable stores the values of the object items that are listed in the SELECT clause Host variable can be used only in the outer query of a SELECT statement not in a subquery Host variable cannot be used in a CREATE statement If the query produces more than one row of output then the macro variable will contain only the value from the first row If the query has no rows in its output then the macro variable is not modified or if the macro variable does not exist yet it is not created The PROC SQL macro variable SQLOBS contains the number of rows that are produced by the query Creating Macro Variables from the First Row of a Query Result If you specify a single macro variable in the INTO clause then PROC SQL assigns the variable the value from the first row only of the appropriate column in the SELECT list In this example amp country1 is assigned the value from the first row of the Country column and amp barrels1 is assigned the value from the first row of the Barrels column The NOPRINT option prevents P
67. ables A Inner Joins 57 Output 3 2 Cartesian Product of Table One and Table Two Table One and Table Two J oining tables in this way returns the Cartesian product of the tables Each row from the first table is combined with every row from the second table When you run this query the following message is written to the SAS log Output 3 3 Cartesian Product Log Message NOTE The execution of this query involves performing one or more Cartesian product joins that can not be optimized The Cartesian product of large tables can be huge Typically you want a subset of the Cartesian product You specify the subset by declaring the join type There are two types of joins Inner J oins return a result table for all the rows in a table that have one or more matching rows in the other table or tables that are listed in the FROM clause Outer J oins are inner joins that are augmented with rows that did not match with any row from the other table in the join There are three kinds of outer joins left right and full Inner Joins An inner join returns only the subset of rows from the first table that matches rows from the second table You can specify the columns that you want to be compared for matching values in a WHERE clause The following code adds a WHERE clause to the previous query The WHERE clause specifies that only rows whose values in column X of Table One match values in column X
68. about the operators that are available for use with subqueries see the section about the SQL procedure in the Base SAS Procedures Guide Output 3 30 Correlated Subquery Oil Reserves of Countries in Africa Country Barrels 9 200 000 000 4 000 000 000 1 000 000 000 30 000 000 000 16 000 000 000 Testing for the Existence of a Group of Values The EXISTS condition tests for the existence of a set of values An EXISTS condition is true if any rows are produced by the subquery and it is false if no rows are produced Conversely the NOT EXISTS condition is true when a subquery produces an empty table This example produces the same result as Output 3 30 EXISTS checks for the existence of countries that have oil reserves on the continent of Africa Note that the WHERE clause in the subquery now contains the condition continent Africa that was in the outer query in the previous example proc sql title Oil Reserves of Countries in Africa select from sql oilrsrvs o where exists 78 Multiple Levels of Subquery Nesting A Chapter 3 select Continent from sql countries c where o Country c Name and Continent Africa Output 3 31 Testing for the Existence of a Group of Values Oil Reserves of Countries in Africa Country Barrels 9 200 000 000 4 000 000 000 1 000 000 000 30 000 000 000 Nigeria 16 000 000 000 Multiple Levels of Subquery Nesting Subqueries can be nested so
69. ame relative position in the two queries must have the same data types The column names of the tables in the first query become the column names of the output table For information about using set operators with more than two query results see the section about the SQL procedure in the Base SAS Procedures Guide The following optional keywords give you more control over set operations ALL does not suppress duplicate rows When the keyword ALL is specified PROC SQL does not make a second pass through the data to eliminate duplicate rows Thus using ALL is more efficient than not using it ALL is not necessary with the OUTER UNION operator CORRESPONDING CORR overlays columns that have the same name in both tables When used with EXCEPT INTERSECT and UNION CORR suppresses columns that are not in both tables Each set operator is described and used in an example based on the following two tables Output 3 34 Tables Used in Set Operation Examples Table A 82 Producing Unique Rows from Both Queries UNION A Chapter 3 Table B Whereas join operations combine tables horizontally set operations combine tables vertically Therefore the set diagrams that are included in each section are displayed vertically Producing Unique Rows from Both Queries UNION The UNION operator combines two query results It produces all the unique rows that result from both queries that is it returns a row if it occurs in the fi
70. and us Name pc Name and pe Code c State 64 Inner Joins A Chapter 3 Output 3 13 Selecting Data from More Than Two Tables Coordinates of State Capitals Capital Latitude Longitude Annapolis Maryland Atlanta Georgia Augusta Maine Austin Texas Baton Rouge Louisiana Bismarck North Dakota Boise Idaho Boston Massachusetts Carson City Nevada Showing Relationships within a Single Table Using Self Joins When you need to show comparative relationships between values in a table it is sometimes necessary to join columns within the same table J coining a table to itself is called a sef join or reflexive join You can think of a self join as PROC SQL making an internal copy of a table and joining the table to its copy For example the following code uses a self join to select cities that have average yearly high temperatures equal to the average yearly low temperatures of other cities proc sql title Cities High Temps Cities Low Temps select High City format 12 High Country format 12 High AvgHigh Low City format 12 Low Country format 12 Low AvgLow from sql worldtemps High sql worldtemps Low where High AvgHigh Low AvgLow and High city ne Low city and High country ne Low country Notice that the WORLDTEMPS table is assigned two aliases High and Low Conceptually this makes a copy of the table so that a join may be made between the table and its copy The WHERE clause selects
71. as totarea format commal5 label Total Area from sql countries group by continent select from sql newcontinents Output 4 17 An SQL Procedure View Current Population Information for Continents Total Continent Population Total Area 384 772 876 800 Africa 710 529 592 11 299 595 Asia 3 381 858 879 12 198 325 Australia 18 255 944 2 966 200 Central America and Caribbean 66 815 930 291 463 Europe 813 335 288 9 167 084 North America 384 801 818 8 393 092 Oceania 5 342 368 129 600 South America 317 568 801 6 885 418 Note In this example each column has a name If you are planning to use a view in a procedure that requires variable names then you must supply column aliases that you can reference as variable names in other procedures For more information see Using SQL Procedure Views in SAS Software on page 109 a Describing a View The DESCRIBE VIEW statement writes a description of the PROC SQL view tothe SAS log The following SAS log describes the view NEWCONTINENTS which is created in Creating Views on page 106 Creating and Updating Tables and Views A Embedding a Libname in a View 107 proc sql describe view sql newcontinents Output 4 18 SAS Log from DESCRIBE VIEW Statement NOTE SQL view SQL NEWCONTINENTS is defined as select continent SUM population as totpop label Total Population format COMMA15 0 SUM area as totarea label Total Area format COMMA15 0 from SQL C
72. at have a larger population than Belgium The first query does this with a join the second with a subquery Output 5 2 shows the STIMER results from the SAS log proc sql stimer select us name us population from sql unitedstates as us sql countries as w where us population gt w population and w name Belgium select Name population from sql unitedstates where population gt select population from sql countries where name Belgium Output 5 2 Comparing Run Times of Two Queries 4 proc sql stimer NOTE SQL Statement used real time 0 00 seconds cpu time 0 01 seconds select us name us population from sql unitedstates as us sql countries as w where us population gt w population and w name Belgium The execution of this query involves performing one or more Cartesian product joins that can not be optimized SQL Statement used real time 0 10 seconds cpu time 0 05 seconds select Name population from sql unitedstates where population gt select population from sql countries where name Belgium SQL Statement used real time 0 09 seconds cpu time 0 09 seconds Programming with the SQL Procedure A Using Indexes to Improve Performance 115 Compare the CPU time of the first query that uses a join 0 05 seconds with 0 09 seconds for the second query that uses a subquery Although there are many factors that influence the run times of queries in general a join runs faster
73. ate 2 00 else payrate 3 00 end else payrate end update incentives set payrate case when whatnots gt 2000 then case when department in E2 M2 U2 then payrate 1 00 else payrate 0 50 end else payrate end title Adjusted Payrates Based on Sales of Gadgets and Whatnots select from incentives 152 How It Works A Chapter 6 Output 6 20 PROC SQL Output for Conditionally Updating a Table Adjusted Payrates Based on Sales of Gadgets and Whatnots Department Payrate Gadgets Whatnots Jack Colton Mickey Raymond Dean Proffit Antoinette Lily Sydney Wade Alan Traherne Elizabeth Bennett How It Works This solution performs consecutive updates to the payrate column of the incentive table The first update uses a nested case expression first determining a bracket that is based on the amount of gadget sales greater than 10 000 calls for an incentive of 5 between 5 000 and 10 000 requires an additional comparison That is accomplished with a nested case expression that checks department code to choose between a 2 and 3 incentive update incentives set payrate case when gadgets gt 10000 then payrate 5 00 when gadgets gt 5000 then case when department in E1 E2 then payrate 2 00 else payrate 3 00 end else payrate end The second update is similar though simpler All sales of whatnots over 2 000 merit an incentive either 50 or 1 depending on t
74. ated in the PROC SQL step The macro uses a DO loop to execute a DATA step for each unique value writing rows that contain a given value of Type to a SAS data set of the same name The Type variable is dropped from the output data sets For more information about SAS macros see SAS Macro Language Reference Using PROC SQL Tables in Other SAS Procedures Problem You want to show the average high temperatures in degrees Celsius for European countries on a map Background Information The SQL WORLDTEMPS table has average high and low temperatures for various cities around the world Output 6 25 WORLDTEMPS Partial Output Amsterdam Athens Auckland Bangkok Beijing Belgrade Berlin Bogota Bombay WORLDTEMPS Country AvgHigh Netherlands Greece New Zealand Thailand China Yugoslavia Germany Colombia India Solution Use the following PROC SQL and PROC GMAP code to produce the map You must license SAS GRAPH software to use PROC GMAP 158 Solution A Chapter 6 options fmtsearch sashelp mapfmts proc sql create table extremetemps as select country round mean avgHigh 32 1 8 as High input put country glcsmn best as ID from sql worldtemps where calculated id is not missing and country in select name from sql countries where continent Europe group by country quit proc gmap map maps europe data extremetemps all id id block high levels 3 title Average High T
75. ating State Values US Cities City Latitude Longitude Anchorage Nome Juneau Mobile Montgomery Birmingham Hot Springs Little Rock Flagstaff The following code uses PROC REPORT to format the output so that the state codes appear only once for each state group A WHERE clause subsets the data so that the report lists the coordinates of cities in Pacific Rim states only For complete information about PROC REPORT see the Base SAS Procedures Guide proc sql noprint create table sql cityreport as select from sql uscitycoords group by state proc report data sql cityreport headline headskip 128 Accessing a DBMS with SAS ACCESS Software A Chapter 5 title Coordinates of U S Cities in Pacific Rim States column state city Coordinates latitude longitude define state order format 2 width 5 State define city order format 15 width 15 City define latitude display format 4 width 8 Latitude define longitude display format 4 width 9 Longitude where state AK or state HI or state WA or state OR or state CA run Output 5 17 PROC REPORT Output Showing the First Occurrence Only of Each State Value Coordinates of U S Cities in Pacific Rim States Coordinates Latitude Longitude Anchorage Juneau Nome Sitka El Centro Fresno Long Beach Los Angeles Oakland Sacramento San Diego San Francisco San Jose Honolulu
76. blel columnA table2 columnB The SQL procedure processes joins that do not have an equijoin condition in a sequential fashion evaluating each row against the WHERE expression that is joins without an equijoin condition are not evaluated using sort merge or index lookup techniques Evaluating left and right outer joins is generally comparable to or only slightly slower than a standard inner join A full outer join usually requires two passes over both tables in the join although the SQL procedure tries to store as much data as possible in buffers thus for small tables an outer join may be processed with only one physical read of the data Accessing SAS System Information Using DICTIONARY Tables DICTIONARY tables are special read only PROC SQL tables They retrieve information about all the SAS data libraries SAS data sets SAS system options and external files that are associated with the current SAS session PROC SQL automatically assigns the DICTIONARY libref To get information from DICTIONARY tables specify DICTIONARY tablenamein the FROM clause DICTIONARY table name is valid in PROC SQL only However SAS provides PROC SQL views based on the DICTIONARY tables that can be used in other SAS procedures and in the DATA step These views are stored in the SASHELP library and are commonly called SASHELP views The following table lists some of the DICTIONARY tables and the names of their corresponding views For a complete list
77. bquery runs it passes the results back to the outer query Correlated subqueries can return single or multiple values Retrieving Data from Multiple Tables A Testing for the Existence of a Group of Values 77 This example selects all major oil reserves of countries on the continent of Africa proc sql title Oil Reserves of Countries in Africa select from sql oilrsrvs o where Africa select Continent from sql countries c where c Name o Country The outer query selects the first row from the OILRSRVS table and then passes the value of the Country column Algeria to the subquery At this point the subquery internally looks like this select Continent from sql countries c where c Name Algeria The subquery selects that country from the COUNTRIES table The subquery then passes the country s continent back to the WHERE clause in the outer query If the continent is Africa then the country is selected and displayed The outer query then selects each subsequent row from the OILRSRVS table and passes the individual values of Country to the subquery The subquery returns the appropriate values of Continent to the outer query for comparison in its WHERE clause Note that the WHERE clause uses an equal operator You can use an if the subquery returns only a single value However if the subquery returns multiple values then you must use IN or a comparison operator with ANY or ALL For detailed information
78. cases you might need to use an aggregate function so that you can use its results in another calculation To do this you need only to construct one query for PROC SQL to automatically perform both calculations This type of operation also causes PROC SQL to remerge the data For example if you want to find the percentage of the total world population that resides in each country then you construct a single query that obtains the total world population by using the SUM function divides each country s population by the total world population PROC SQL runs an internal query to find the sum and then runs another internal query to divide each country s population by the sum proc sql outobs 12 title Percentage of World Population in Countries select Name Population format commal4 Population sum Population 100 as Percentage format comma8 2 from sql countries order by Percentage desc Note When a query remerges data PROC SQL displays a note in the log to indicate that data remerging has occurred a Retrieving Data from a Single Table A Using Aggregate Functions with Unique Values 43 Output 2 36 Remerging Summary Statistics Percentage of World Population in Countries Population Percentage 1 202 215 077 929 009 120 United States 263 294 808 Indonesia 202 393 859 Brazil 160 310 357 Russia 151 089 979 Bangladesh 126 387 850 Japan 126 345 434 Pakistan 123 062 252 Nigeria 99 062 003 Mexico
79. ced if the GLCSMN format does not recognize the country name The GROUP BY clause is required so that the mean temperature can be calculated for each country rather than for the entire table The PROC GMAP step uses the ID variable to identify each country and places a block representing the High value on each country on the map The ALL option ensures that countries such as the United Kingdom in this example that do not have High values are also drawn on the map In the BLOCK statement the LEVELS option specifies how many response levels are used in the graph For more information about the GMAP procedure see SAS GRAPH Reference Volumes 1 and 2 161 APPENDIX 1 Recommended Reading Recommended Reading 161 Recommended Reading Here is the recommended reading list for this title Base SAS Procedures Guide Cody s Data Cleaning Techniques Using SAS Software Combining and Modifying SAS Data Sets Examples SAS GRAPH Reference Volumes 1 and 2 SAS Language Reference Concepts SAS Language Reference Dictionary SAS Macro Language Reference For a complete list of SAS publications see the current SAS Publishing Catalog To order the most current publications or to receive a free copy of the catalog contact a SAS representative at SAS Publishing Sales SAS Campus Drive Cary NC 27513 Telephone 800 727 3228 Fax 919 677 8166 E mail sasbook sas com Web addres
80. ch value of the indexed column or each combination of values of the columns in a composite index is unique use the UNIQUE keyword proc sql create unique index places on sql newcountries name continent Using the UNIQUE keyword causes SAS to reject any change to a table that would cause more than one row to have the same index value Tips for Creating Indexes The name of the composite index cannot be the same as the name of one of the columns in the table If you use two columns to access data regularly such as a first name column and a last name column from an employee database then you should create a composite index for the columns Keep the number of indexes to a minimum to reduce disk space and update costs Use indexes for queries that retrieve a relatively small number of rows less than 15 In general indexing a small table does not result in a performance gain In general indexing on a column with a small number less than 6 or 7 of distinct values does not result in a performance gain Creating and Updating Tables and Views A Creating and Using Integrity Constraints in a Table 103 You can use the same column in a simple index and in a composite index However for tables that have a primary key integrity constraint do not create more than one index that is based on the same column as the primary key Deleting Indexes To delete an index from a tab
81. criptor information plus other information that is required for retrieving data values from other SAS data sets or from files whose contents are in other software vendors file formats SAS data view a type of SAS data set that retrieves data values from other files A SAS data view contains only descriptor information such as the data types and lengths of the variables columns plus other information that is required for retrieving data values from other SAS data sets or from files that are stored in other software vendors file formats SAS data views are of member type VIEW See also SAS data set simple index an index that uses the values of only one variable to locate observations See also composite index index SQL Structured Query Language a standardized high level query language that is used in relational database management systems to create and manipulate database management system objects SAS implements SQL through the SQL procedure Structured Query Language See SQL Structured Query Language table in the SQL procedure a SAS data file See also SAS data file union join a type of join that returns all rows with their respective values from each input table Columns that do not exist in one table will have null missing values for those rows in the result table See join view a definition of a virtual data set The definition is named and stored for later use A view contains no data it merely describes or de
82. ction To obtain a count of all rows in the table including countries that are not on a continent you can use the following code in the SELECT clause proc sql title Number of Countries in the SQL COUNTRIES Table select count as Number from sql countries Output 2 39 Using the COUNT Function to Count All Rows in a Table Number of Countries in the SQL COUNTRIES Table Summarizing Data with Missing Values When you use an aggregate function with data that contains missing values the results might not provide the information that you expect because many aggregate functions ignore missing values Finding Errors Caused by Missing Values The AVG function returns the average of only the nonmissing values The following query calculates the average length of three features in the SQL FEATURES table Angel Falls and the Amazon and Nile rivers incorrect output proc sql title Average Length of Angel Falls Amazon and Nile Rivers Retrieving Data froma SingleTable A Grouping Data 45 select Name Length avg Length as AvgLength from sql features where Name in Angel Falls Amazon Nile Output 2 40 Finding Errors Caused by Missing Values Incorrect Output Average Length of Angel Falls Amazon and Nile Rivers Length AvgLength 4072 5 Angel Falls 4072 5 Nile 4072 5 Because no length is stored for Angel Falls the average includes only the Amazon and Ni
83. d print the view by using the PRINT procedure The LIBNAME engine optimizes the processing of joins and WHERE clauses by passing these operations directly to the DBMS to take advantage of DBMS indexing and processing capabilities libname mydblib oracle user user id password password proc sql create view LON as select flight dates idnum from mydblib schedule where dest LON quit proc print data work LON noobs run Programming with the SQL Procedure A Displaying DBMS Data with the PROC SQL Pass Through Facility 131 Output 5 19 Output from the PRINT Procedure FLIGHT 219 04MAR1998 219 04MAR1998 219 04MAR1998 219 O4MAR1998 219 04MAR1998 219 04MAR1998 219 O5MAR1998 219 O5MAR1998 219 O5MAR1998 219 O5MAR1998 219 O5MAR1998 219 O5MAR1998 219 06MAR1998 219 06MAR1998 219 06MAR1998 219 06MAR1998 219 O06MAR1998 219 06MAR1998 Displaying DBMS Data with the PROC SQL Pass Through Facility Use the PROC SQL Pass Through Facility when you want to interact with DBMS data by using SQL syntax that is specific to your DBMS In this example SAS ACCESS connects to an ORACLE database by using the alias ora2 selects all rows in the STAFF table and displays the first 15 rows of data by using PROC SQL proc sql outobs 15 connect to oracle as ora2 user user id password password select from connection to ora2 select lname fname state from staff disconnect from ora2 q
84. dref Einstein relativity saddref Smythe sailing saddref Naish sailing Output 5 13 Defining Macros to Create Tables 34 taddref Conner sailing NOTE 1 row was inserted into SQL REFEREE NOTE Conner has been added for subject sailing 35 taddref Fay sailing NOTE 1 row was inserted into SQL REFEREE NOTE Fay has been added for subject sailing 36 addref Einstein relativity NOTE 1 row was inserted into SQL REFEREE NOTE Einstein has been added for subject relativity 37 taddref Smythe sailing NOTE 1 row was inserted into SQL REFEREE NOTE Smythe has been added for subject sailing 38 taddref Naish sailing The output has a row added with each execution of the 4DDREF macro When the table contains three referee names it is displayed in SAS output with the message that it can accept no more referees Output 5 14 Result Table and Message Created with SAS Macro Language Interface ERROR Naish not inserted for subject sailing There are 3 referees already Subject Conner sailing Fay sailing Smythe sailing 126 Using the PROC SQL Automatic Macro Variables A Chapter 5 Using the PROC SQL Automatic Macro Variables PROC SQL assigns values to three automatic macro variables after it executes each statement You can use these macro variables to test your SQL programs and to determine whether to continue processing SQLOBS contains the number of rows that were
85. e North Frigid North Pole 90 Arctic Cirde 67 North Temperate Arctic Cirde 67 Tropic of Cancer 23 Torrid Tropic of Cancer 23 Tropic of Capricorn 23 South Temperate Tropic of Capricorn 23 Antarctic Cirde 67 South F rigid Antarctic Cirde 67 South Pole 90 In this example a CASE expression determines the climate zone for each city based on the value in the Latitude column in the SQL WORLDCITYCOORDS table The query also assigns an alias of Location to the value You must close the CASE logic with the END keyword proc sql outobs 12 title Climate Zones of World Cities select City Country Latitude case when Latitude gt 67 then North Frigid when 67 ge Latitude ge 23 then North Temperate when 23 gt Latitude gt 23 then Torrid when 23 ge Latitude ge 67 then South Temperate else South Frigid end as ClimateZone from sql worldcitycoords order by City Retrieving Data froma SingleTable A Output 2 12 Using a Simple CASE Expression Acapulco Accra Adana Addis Ababa Adelaide Aden Ahmenabad Algiers Alice Springs Amman Amsterdam Climate Zones of World Cities Country Mexico Ghana Turkey Ethiopia Australia Yemen India Algeria Australia Jordan Netherlands Latitude Assigning Values Conditionally ClimateZone North Temperate Torrid Torrid North Temperate Torrid South Temperate Torrid Torrid North Temperate South Temperate North Temperate North Temperate 23
86. e Assal Australia Lake Eyre South America Valdes Peninsula Retrieving Rows That Satisfy a Condition The WHERE clause enables you to retrieve only rows from a table that satisfy a condition WHERE clauses can contain any of the columns in a table including unselected columns Using a Simple WHERE Clause The following example uses a WHERE clause to find all countries that are in the continent of Europe and their populations proc sql outobs 12 title Countries in Europe select Name Population format commal0 from sql countries where Continent Europe Retrieving Data froma Single Table A Retrieving Rows Based on a Comparison 31 Output 2 23 Using a Simple WHERE Clause Countries in Europe Population Albania 3 407 400 Andorra 64 634 Austria 8 033 746 Belarus 10 508 000 Belgium 10 162 614 Bosnia and Herzegovina 4 697 040 Bulgaria 8 887 111 Channel Islands 146 436 Croatia 4 744 505 Czech Republic 10 511 029 Denmark 5 239 356 England 49 293 170 Retrieving Rows Based on a Comparison You can use comparison operators in a WHERE clause to select different subsets of data The following table lists the comparison operators that you can use Table 2 2 Comparison Operators Symbol Mnemonic Definition Example Equivalent EQ equal to where Name Asia or or or lt gt NE not equal to where Name ne Africa gt GT greater than where Area gt 10000 lt L
87. e Eyre Central America Europe 3745000 El brus Caspian Sea North America 9390000 McKinley Death Valley Oceania R South America 6795000 Aconcagua Valdes Peninsul The FEATURES table contains statistics that describe various types of geographical features such as oceans lakes and mountains Output 1 12 FEATURES Partial Output FEATURES Location Height Length Aconcagua Mountain Argentina Amazon River South America Amur River Asia 5 Andaman Sea 218100 Angel Falls Waterfall Venezuela Annapurna Mountain Nepal Aral Sea Lake Asia 25300 Ararat Mountain Turkey Arctic Ocean 5105700 Atlantic Ocean 33420000 Baffin Island Arctic 183810 Baltic Sea 146500 Baykal Lake Russia 11780 Bering Sea 873000 Black Sea 196100 10 CHAPTER Retrieving Data from a Single Table Overview of the SELECT Statement 12 SELECT and FROM Clauses 12 WHERE Clause 13 ORDER BY Clause 13 GROUP BY Clause 13 HAVING Clause 13 Ordering the SELECT Statement 14 Sdecting Columns in a Table 14 Sdecting All Columns in a Table 14 Selecting Specific Columns in a Table 15 Eliminating Duplicate Rows from the Query Results 16 Determining the Structure of a Table 17 Creating New Columns 18 Adding Text to Output 18 Calculating Values 19 Assigning a Column Alias 20 Referring to a Calculated Column by Alias 21 Assigning Values Conditionally 21 Using a Simple CASE Expression 22 Using the CASE OPERAND Form 23 Replacing Mi
88. e IN Operator The IN operator enables you to include values within a list that you supply The following example uses the IN operator to include only the mountains and waterfalls in the SQL FEATURES table proc sql outobs 12 title World Mountains and Waterfalls select Name Type Height format commal0 from sql features where Type in Mountain Waterfall order by Height Output 2 26 Using the IN Operator World Mountains and Waterfalls Height Niagara Falls Waterfall Yosemite Waterfall Tugela Falls Waterfall Angel Falls Waterfall Kosciusko Mountain Pico Duarte Mountain Cook Mountain Matterhorn Mountain Wilhelm Mountain Mont Blanc Mountain Ararat Mountain Vinson Massif Mountain Using the IS MISSING Operator The IS MISSING operator enables you to identify rows that contain columns with missing values The following example selects countries that are not located on a continent that is these countries have a missing value in the Continent column Retrieving Data froma Single Table A Using Other Conditional Operators 35 proc sql title Countries with Missing Continents select Name Continent from sql countries where Continent is missing Note ThelS NULL operator is the same as and interchangeable with the IS MISSING operator Output 2 27 Using the IS MISSING Operator Countries with Missing Continents Continent Iceland Kalaallit Nunaat Using the B
89. e Macro Variables You can create one new macro variable per row from the result of a SELECT statement Use the keywords THROUGH THRU or a hyphen in an INTO clause to create a range of macro variables This example assigns values to macro variables from the first four rows of the Name column and the first three rows of the Population column The PUT statements write the results to the SAS log proc sql noprint select name Population into countryl country4 popl pop3 Programming with the SQL Procedure A _ Concatenating Values in Macro Variables 123 from sql countries Sput amp countryl amp popl Sput amp country2 amp pop2 Sput amp country3 amp pop3 Sput amp country4 Output 5 10 Creating Multiple Macro Variables 4 proc sql noprint 5 select name Population 6 into countryl country4 popl pop3 7 from sql countries 8 9 put amp countryl amp popl Afghanistan 17070323 10 put amp country2 amp pop2 Albania 3407400 11 put amp country3 amp pop3 Algeria 28171132 12 put amp country4 Andorra Concatenating Values in Macro Variables You can concatenate the values of one column into one macro variable This form is useful for building a list of variables or constants Use the SEPARATED BY keywords to specify a character to delimit the values in the macro variable This example assigns the first five values from the Name column of the COUNTRIES table to the amp countries
90. e SAS Macro Facility 120 Creating Macro Variables in PROC SQL 121 Creating Macro Variables from the First Row of a Query Result 121 Creating a Macro Variable from the Result of an Aggregate Function 122 Creating Multiple Macro Variables 122 Concatenating Values in Macro Variables 123 Defining Macros to CreateTables 124 Using the PROC SQL Automatic Macro Variables 126 Formatting PROC SQL Output Using the REPORT Procedure 127 Accessing a DBMS with SAS ACCESS Software 128 Using LibnameEngines 129 Queryinga DBMS Table 129 Creating a PROC SQL View of a DBMS Table 130 Displaying DBMS Data with the PROC SQL Pass Through Facility 131 Using the Output Ddivery System ODS with PROC SQL 132 Introduction This section shows you the PROC SQL options that are most useful in creating and debugging queries ways to improve query performance 112 Using PROC SQL Options to Create and Debug Queries A Chapter 5 what dictionary tables are and how they can be useful in gathering information about the elements of SAS o how to use PROC SQL with the SAS macro facility how to use PROC SQL with the REPORT procedure how to access DBMSs by using SAS ACCESS software how to format PROC SQL output by using the SAS Output Delivery System ODS Using PROC SQL Options to Create and Debug Queries PROC SQL supports options that can give you greater control over PROC SQL while you are developing a query T
91. e city nearest to each city in the USCITYCOORDS table The query must first select a city A compute the distance from city A to every other city and finally select the city with the minimum distance from city A This can be done by joining the USCITYCOORDS table to itself self join and then determining the closest distance between cities by using another self join in a subquery This is the formula to determine the distance between coordinates SORT Latitude2 Latitudel 2 Longitude2 Longitudel1 2 Although the results of this formula are not exactly accurate because of the distortions caused by the curvature of the earth they are accurate enough for this example to determine if one city is closer than another proc sql outobs 10 title Neighboring Cities select a City format 10 a State a Latitude Lat a Longitude Long b City format 10 b State b Latitude Lat b Longitude Long sqrt b latitude a latitude 2 b longitude a longitude 2 as dist format 6 1 from sql uscitycoords a sql uscitycoords b where a city ne b city and calculated dist select min sqrt d latitude c latitude 2 d longitude c longitude 2 from sql uscitycoords c sql uscitycoords d where c city a city and c state a state and d city ne c city order by a city 80 When to Use Joins and Subqueries A Chapter 3 Output 3 33 Combining a Join with a Subquery Neighboring Cities
92. e lastyr avgscore prioryr avgscore as Average format 8 If this value of AvgScore is missing then COALESCE returns the AvgScore column from LEAGUE2 for Average If this value of AvgScore is missing then COALESCE returns a missing value for Average 140 Computing Percentages within Subtotals A Chapter 6 In the case of the Name column the COALESCE function returns the value of FullName from LEAGUE 1 if it exists If not then the value is obtained from LEAGUE2 by using both the TRIM function and concatenation operators to combine the first name and last name columns trim prioryr firstname prioryr lastname Finally the table is ordered by Bowler The Bowler column is the result of the COALESCE function coalesce lastyr bowler prioryr amfno as Bowler Because the value is obtained from either table you cannot confidently order the output by either the value of Bowler in LEAGUE1 or the value of AMFNoin LEAGUE 2 but only by the value that results from the COALESCE function Computing Percentages within Subtotals Problem You want to analyze answers to a survey question to determine how each state responded Then you want to compute the percentage of each answer that a given state contributed For example what percentage of all NO responses came from North Carolina Background Information There is one input table called SURVEY that contains the following data the first ten rows are shown Output 6 7 Input Table
93. ed DUPLICATES that contains the following data 142 Solution A Chapter 6 Output 6 9 Sample Input Table for Counting Duplicates Sample Data for Counting Duplicates First LastName Name City State Smith John Richmond Virginia Johnson Mary Miami Florida Smith John Richmond Virginia Reed Sam Portland Oregon Davis Karen Chicago Illinois Davis Karen Chicago Illinois Thompson Jennifer Houston Texas Smith John Richmond Virginia Johnson Mary Miami Florida WODAANDU SF WN HE Solution Use the following PROC SQL code to count the duplicate rows proc sql title Duplicate Rows in DUPLICATES Table select count as Count from Duplicates group by LastName FirstName City State having count gt 1 Output 6 10 PROC SQL Output for Counting Duplicates Duplicate Rows in DUPLICATES Table LastName FirstName Chicago Illinois Miami Florida Richmond Virginia How It Works This solution uses a query that selects all columns counts all rows groups all of the rows in the Duplicates table by matching rows excludes the rows that have no duplicates Note You must include all of the columns in your table in the GROUP BY clause to find exact duplicates Practical Problem Solving with PROC SQL A Solution 143 Expanding Hierarchical Data in a Table Problem You want to generate an output column that shows a hierarchical relationship among rows in a tab
94. elect one a One one b two a Two two b from one two where one b two b Retrieving Data from Multiple Tables A Inner Joins 61 Output 3 9 Joining Tables That Contain Null Values Table One Table Two In order to specify only the nonmissing values for the join use the S NOT MISSING operator proc sql select one a One one b two a Two two b from one two where one b two b and one b is not missing Output 3 10 Results of Adding IS NOT MISSING to Joining Tables That Contain Null Values One and Two Joined 62 Inner Joins A Chapter 3 Creating Multicolumn Joins When a row is distinguished by a combination of values in more than one column use all the necessary columns in the join For example a city name could exist in more than one country To select the correct city you must specify both the city and country columns in the joining query s WHERE clause This example displays the latitude and longitude of capital cities by joining the COUNTRIES table with the WORLDCITYCOORDS table To minimize the number of rows in the example output the first part of the WHERE expression selects capitals with names that begin with the letter L from the COUNTRIES table proc sql title Coordinates of Capital Cities select Capital format 12 Name format 12 City format 12 Country format 12 Latitude Longitude from sql countries sql worldcitycoords where Capital
95. elects the Project Hours and Season columns 150 Conditionally Updating a Table A Chapter 6 uses a CASE expression to remap the seasons to the new column Sorter spring to 1 summer to 2 fall to 3 and winter to 4 select project hours season case when season spring then 1 when season summer then 2 when season fall then 3 when season winter then 4 else end as sorter from chores The first or outer SELECT statement in the query selects the Project Hours and Season columns orders rows by the values that were assigned to the seasons in the Sorter column that was created with the in line view Notice that the Sorter column is not included in the SELECT statement That causes a note to be written to the log indicating that you have used a column in an ORDER BY statement that does not appear in the SELECT statement In this case that is exactly what you wanted to do Conditionally Updating a Table Problem You want to update values in a column of a table based on the values of several other columns in the table Background Information There is one table called INCENTIVES that contains information on sales data There is one record for each salesperson that includes a department code a base pay rate and sales of two products gadgets and whatnots Output 6 19 Sample Input Data to Conditionally Change a Table Sales Data for Incentives
96. emperatures for European Countries title2 Degrees Celsius run quit Practical Problem Solving with PROC SQL A How It Works 159 Figure 6 1 PROC GMAP Output Average High Temperatures for European Countries Degrees Celsius High Gm 20 23 Ca 24 26 ae 27 32 How It Works The SAS system option FMTSEARCH tells SAS to search in the SASHELP MAPFMTS catalog for map related formats In the PROC SQL step a temporary table is created with Country High and ID columns The calculation round mean avgHigh 32 1 8 does the following 160 How It Works A Chapter 6 1 For countries that are represented by more than one city the mean of the cities average high temperatures is used for that country 2 That value is converted from degrees Fahrenheit to degrees Celsius 3 The result is rounded to the nearest degree The PUT function uses the GLCSMN format to convert the country name toa country code The INPUT function converts this country code which is returned by the PUT function as a character value into a numeric value that can be understood by the GMAP procedure See SAS Language Reference Dictionary for details about the PUT and INPUT functions The WHERE clause limits the output to European countries by checking the value of the Country column against the list of European countries that is returned by the in line view Also rows with missing values of ID are eliminated Missing ID values could be produ
97. enating creating macro variables from 121 creating tables from deleting duplicate rows R records See rows referential integrity constraints reflexive joins relational theory relations remerging summary statistics renaming columns REPORT procedure formatting SQL output 127 RESET statement resetting SQL procedure options resetting options 115 retrieving rows based on comparison multiple conditions rows that satisfy a condition simple WHERE clause right outer joins rows See also retrieving rows counting counting duplicates deleting deleting duplicates duplicates 116 inserting inserting with queries inserting with SET clause inserting with VALUES clause restricting row processing S SAS ACCESS LIBNAME statement accessing DBMS data SAS data files See tables SELECT statement expanding with FEEDBACK option SELECT clause SELECT statement SQL procedure FROM clause GROUP BY clause ordering clauses SELECT clause WHERE clause self joins SET clause inserting rows with set operators ALL keyword combining queries sort order creating 148 sorting data by calculated column by column by column position by multiple columns by unselected columns columns with missing values grouping and sort order sorting sequence sorting sequence SQL SQL procedure automatic macro variables 126 creating macro variables creating queries 112 cumulative time for debugging queries example tables Al f
98. eract with the DBMS data Querying a DBMS Table This example uses the SQL procedure to query the ORACLE table PAYROLL The PROC SQL query retrieves all job codes and provides a total salary amount for each job code libname mydblib oracle user user id password password path path name schema schema name proc sql select jobcode label Jobcode sum salary as total label Total for Group format dollar11 2 from mydblib payroll group by jobcode quit 130 Using Libname Engines A Chapter 5 Output 5 18 Output from Querying a DBMS Table Total for Group 232 148 00 253 433 00 447 790 00 230 537 00 228 002 00 498 076 00 296 875 00 210 161 00 157 149 00 543 264 00 879 252 00 21 009 00 128 162 00 249 492 00 671 499 00 476 155 00 Creating a PROC SQL View of a DBMS Table PROC SQL views are stored query expressions that read data values from their underlying files which can include SAS ACCESS views of DBMS data While DATA step views of DBMS data can only be used to read the data PROC SQL views of DBMS data can be used to update the underlying data if the following conditions are met the PROC SQL view is based on only one DBMS table or on a DBMS view that is based on only one DBMS table the PROC SQL view has no calculated fields The following example uses the LIBNAME statement to connect to an ORACLE database create a temporary PROC SQL view of the ORACLE table SCHEDULE an
99. erica Oceania You can eliminate the duplicate rows from the results by using the DISTINCT keyword in the SELECT clause Compare the previous example with the following query which uses the DISTINCT keyword to produce a single row of output for each continent that is in the SQL UNITEDSTATES table proc sql title Continents of the United States select distinct Continent from sql unitedstates Output 2 5 Eliminating Duplicate Values Continents of the United States Continent North America Oceania Note When you specify all of a table s columns in a SELECT clause with the DISTINCT keyword PROC SQL eliminates duplicate rows or rows in which the values in all of the columns match from the results Determining the Structure of a Table To obtain a list of all of the columns in a table and their attributes you can use the DESCRIBE TABLE statement The following example generates a description of the SQL UNITEDSTATES table PROC SQL writes the description to the log proc sql describe table sql unitedstates 18 Creating New Columns A Chapter 2 Output 2 6 Determining the Structure of a Table Partial Log NOTE SQL table SQL UNITEDSTATES was created like create table SQL UNITEDSTATES bufsize 12288 Name char 35 format 35 informat 35 label Name Capital char 35 format 35 informat 35 label Capital Population num format BEST8 informat BEST8 label Popula
100. ers that PROC SQL displayed in the previous example proc sql outobs 12 title U S Postal Codes select Postal code for Name label is Code label from sql postalcodes Output 2 8 Suppressing Column Headers in Output U S Postal Codes Postal Alabama Postal Alaska Postal American Samoa Postal Arizona Postal Arkansas Postal California Postal Colorado Postal Connecticut Postal Delaware Postal District Of Columbia Postal Florida Postal Georgia Calculating Values You can perform calculations with values that you retrieve from numeric columns The following example converts temperatures in the SQL WORLDTEMPS table from Fahrenheit to Celsius proc sql outobs 12 title Low Temperatures in Celsius select City AvgLow 32 5 9 format 4 1 from sql worldtemps Note This example uses the FORMAT attribute to modify the format of the calculated output See Specifying Column Attributes on page 24 for more information a 20 Assigning a Column Alias A Chapter 2 Output 2 9 Calculating Values Low Temperatures in Celsius Amsterdam Athens Auckland Bangkok Beijing Belgrade Berlin Bogota Bombay Bucharest Budapest oe 8 8 ew ew OPORPUWONWAHAANIODHDN Assigning a Column Alias By specifying a column alias you can assign a new name to any column within a PROC SQL query The new name must follow the rules for SAS names The na
101. es where name like R A note in the SAS log tells you how many rows were deleted Output 4 10 SAS Log for DELETE statement NOTE 1 row was deleted from SQL NEWCOUNTRIES Creating and Updating Tables and Views A Adding a Column 99 Note For PROC SQL tables SAS deletes the data in the rows but retains the space in the table CAUTION If you use the DELETE statement without a WHERE clause then all rows are deleted Altering Columns The ALTER TABLE statement adds modifies and deletes columns in existing tables You can use the ALTER TABLE statement with tables only it does not work with views A note appears in the SAS log that describes how you have modified the table Adding a Column The ADD clause adds a new column to an existing table You must specify the column name and data type You can also specify a length LENGTHS format FORMATS informat INFORMAT 3 and a label LABEL The following ALTER TABLE statement adds the numeric data column Density to the NEWCOUNTRIES table proc sql alter table sql newcountries add density num label Population Density format 6 2 title Population Density Table select name format 20 capital format 15 population format commal5 0 density from sql newcountries Output 4 11 Adding a New Column Population Density Table Population Capital Population Density Brasilia 160 310 357 Beijing 1 202 215 077 New Delhi 929 009 120 Indonesia
102. es of Countries select from sql oilprod as p sql oilrsrvs as r where p country r country Output 3 5 Abbreviating Column Names by Using Table Aliases Oil Production Reserves of Countries Barrels Country Country Barrels 1 400 000 9 200 000 000 Canada 2 500 000 Canada 7 000 000 000 China 3 000 000 China 25 000 000 000 Egypt 900 000 Egypt 4 000 000 000 Indonesia 1 500 000 Indonesia 5 000 000 000 Iran 4 000 000 Iran 90 000 000 000 Note that each table s Country column is displayed Typically once you have determined that a join is functioning correctly you include just one of the matching columns in the SELECT clause Retrieving Data from Multiple Tables A Inner Joins 59 Specifying the Order of Join Output You can order the output of joined tables by one or more columns from either table The next example s output is ordered in descending order by the BarrelsPerDay column It is not necessary to qualify BarrelsPerDay because the column exists only in the OILPROD table proc sql outobs 6 title Oil Production Reserves of Countries select p country barrelsperday Production barrels Reserves from sql oilprod p sql oilrsrvs r where p country r country order by barrelsperday desc Output 3 6 Ordering the Output of Joined Tables Oil Production Reserves of Countries Country Production Reserves Saudi Arabia 9 000 000 260 000 000 000 United States of America 8 000 000 30 000 0
103. esult EXCEPT 83 Producing Rows That Belong to Both Query Results INTERSECT 84 Concatenating Query Results OUTER UNION 85 Producing Rows from the First Query or the Second Query 86 56 Introduction A Chapter 3 Introduction This chapter shows you how to select data from more than one table by joining the tables together use subqueries to select data from one table based on data values from another table combine the results of more than one query by using set operators Note Unless otherwise noted the PROC SQL operations that are shown in this chapter apply to views as well as tables For more information about views see Chapter 4 Creating and Updating Tables and Views on page 89 a Selecting Data from More Than One Table by Using Joins The data that you need for a report could be located in more than one table In order to select the data from the tables join the tables in a query J oining tables enables you to select data from multiple tables as if the data were contained in one table J oins do not alter the original tables The most basic type of join is simply two tables that are listed in the FROM clause of a SELECT statement The following query joins the two tables that are shown in Output 3 1 and creates Output 3 2 proc sql title Table One and Table Two select from one two Output 3 1 Table One and Table Two Table Retrieving Data from Multiple T
104. esults After you have worked through the problem combining the queries into one query using in line views can be more efficient However under certain circumstances it is more efficient to use temporary tables You should try both methods to determine which is more efficient for your case Comparing Subqueries with Joins Many subqueries can also be expressed as joins In general a join is processed at least as efficiently as the subquery PROC SQL stores the result values for each unique Programming with theSQL Procedure A Accessing SAS System Information Using DICTIONARY Tables 117 set of correlation columns temporarily thereby eliminating the need to calculate the subquery more than once Using WHERE Expressions with Joins When joining tables you should specify a WHERE expression J oins without WHERE expressions are often time consuming to evaluate because of the multiplier effect of the Cartesian product For example joining two tables of 1 000 rows each without specifying a WHERE expression or an ON clause produces a result table with one million rows The SQL procedure executes and obtains the correct results on unbalanced WHERE expressions or ON join expressions in an equijoin as shown here but handles them inefficiently where tablel columnA table2 columnB 0 It is more efficient to rewrite this clause to balance the expression so that columns from each table are on alternate sides of the equals condition where ta
105. fines data that is stored elsewhere SAS data views can be created by the ACCESS and SQL procedures 166 Glossary WHERE clause in the SQL procedure the keyword WHERE followed by one or more WHERE expressions WHERE expression a type of SAS expression that specifies a condition for selecting observations for processing by a DATA step or a PROC step WHERE expressions can contain special operators that are not available in other SAS expressions WHERE expressions can appear in a WHERE statement a WHERE data set option a WHERE clause or a WHERE command Index Index 167 A aggregate functions creating macro variables from result of HAVING clause with table of unique values with ALL keyword set operators and 116 automatic macro variables SQL procedure 126 BETWEEN AND operators Boolean operators C calculated columns assigning column alias to referring to by alias sorting by Cartesian product cross joins 68 CASE expression CASE OPERAND form CASE OPERAND form COALESCE function in joins column alias assigning to calculated columns referring to calculated columns column attributes list of specifying column definitions creating tables from column headers suppressing column names qualifying columns adding 99 altering 99 assigning values conditionally calculating values creating deleting DICTIONARY COLUMNS 119 finding for reports grouping by multiple columns grouping by
106. g example selects all columns in the SQL USCITYCOORDS table which contains latitude and longitude values for U S cities proc sql outobs 12 title U S Cities with Their States and Coordinates select from sql uscitycoords Note The OUTOBS option limits the number of rows observations in the output OUTOBS is similar to the OBS data set option OUTOBS is used throughout this document to limit the number of rows that are displayed in examples a Note In the tables used in these examples latitude values that are south of the Equator are negative Longitude values that are west of the Prime Meridian are also negative A Retrieving Data froma SingleTable A Selecting Specific Columns in a Table 15 Output 2 1 Selecting All Columns in a Table U S Cities with Their States and Coordinates State Latitude Longitude Albuquerque Amarillo Anchorage Annapolis Atlanta Augusta Austin Baker Baltimore Bangor Baton Rouge Note When you select all columns PROC SQL displays the columns in the order in which they are stored in the table Selecting Specific Columns in a Table To select a specific column in a table list the name of the column in the SELECT clause The following example selects only the City column in the SQL USCITY COORDS table proc sql outobs 12 title Names of U S Cities select City from sql uscitycoords Output 2 2 Selecting One Column Names of U S Cities Albuquerque
107. g SAS System Information Using DICTIONARY Tables Using PROC SQL with the SAS Macro Facility Formatting PROC SQL Output Using the REPORT Procedure Accessing a DBMS with SAS ACCESS Software Using the Output Delivery System ODS with PROC SQL 132 Chapter 6 A Practical Problem Solving with PROC SQL 133 Overview Computing a Weighted Average Comparing Tables 136 Overlaying Missing Data Values Computing Percentages within Subtotals Counting Duplicate Rows in a Table Expanding Hierarchical Data in a Table Summarizing Data in Multiple Columns Creating a Summary Report Creating a Customized Sort Order Conditionally Updating a Table 150 Updating a Table with Values from Another Table 153 Creating and Using Macro Variables Using PROC SQL Tables in Other SAS Procedures Appendix 1 A Recommended Reading Recommended Reading Glossary index 167 CHAPTER Introduction to the SQL Procedure What IsSSQL 1 What Is the SQL Procedure 1 Terminology 2 Tables 2 Queis 2 Views 2 Null Values 3 Comparing PROC SQL with the SAS DATA Step 3 Notes about the Example Tables 4 What Is SQL Structured Query Language SQL is a standardized widely used language that retrieves and updates data in relational tables and databases A rdation is a mathematical concept that is similar to the mathematical concept of a set Relations are represented physically as two dimensional tables that are arranged in rows and columns Rela
108. h values from another table 153 temporary tables vs in line views truncated string comparison operators U union joins UNION operator unique values aggregate functions with counting counting all rows counting nonmissing values updating tables 96 conditionally errors values from another table 153 user defined macro variables 121 V VALIDATE statement syntax checking with validating queries VALUES clause inserting rows wtih variables See columns views See PROC SQL views W weighted averages 134 WHERE clause MEAN function with missing values with retrieving rows conditionally summarizing data vs HAVING clause WHERE expressions joins with 117 Your Turn If you have comments or suggestions about SAS 9 1 SQL Procedure User s Guide please send them to us on a photocopy of this page or send us electronic mail For comments about this book please return the photocopy to SAS Publishing SAS Campus Drive Cary NC 27513 email yourturn sas com For suggestions about the software please return the photocopy to SAS Institute Inc Technical Support Division SAS Campus Drive Cary NC 27513 email suggest sas com
109. he INOBS S OUTOBSsS and LOOPS options reduce query execution time by limiting the number of rows and number of iterations that PROC SQL processes The EXEC and VALIDATE statements enable you to quickly check the syntax of a query The FEEDBACK option displays the columns that are represented by a SELECT statement The PROC SQL STIMER option records and displays query execution time You can set an option initially in the PROC SQL statement and then use the RESET statement to change the same option s setting without ending the current PROC SQL step Here are the PROC SQL options that are most useful when you are writing and debugging queries Restricting Row Processing with the INOBS and OUTOBS Options When you are developing queries against large tables you can reduce the amount of time that it takes for the queries to run by reducing the number of rows that PROC SQL processes Subsetting the tables with WHERE statements is one way to do this Using the INOBS and the OUTOBS options are other ways The INOBS option restricts the number of rows that PROC SQL takes as input from any single source For example if you specify INOBS 10 then PROC SQL uses only 10 rows from any table or view that is specified in a FROM clause If you specify INOBS 10 and join two tables without using a WHERE clause then the resulting table Cartesian product contains a maximum of 100 rows The INOBS option is similar to the SAS
110. he department level that again is accomplished by means of a nested case expression update incentives set payrate case when whatnots gt 2000 then case when department in E2 M2 U2 then payrate 1 00 else payrate 0 50 end else payrate end Practical Problem Solving with PROC SQL A Solution 153 Updating a Table with Values from Another Table Problem You want to update the SQL UNITEDSTATES table with updated population data Background Information The SQL NEWPOP table contains updated population data for some of the U S states Output 6 21 Table with Updated Population Data Updated U S Georgia Washington Arizona Alabama Oklahoma Connecticut Iowa West Virginia Idaho Maine New Hampshire North Dakota Alaska Population Data Population 20 851 820 8 186 453 5 894 121 5 130 632 4 447 100 3 450 654 3 405 565 2 926 324 1 808 344 1 293 953 1 274 923 1 235 786 642 200 626 932 Solution Use the following PROC SQL code to update the population information for each state in the SQL UNITEDSTATES table proc sql title UNITEDSTATES update sql unitedstates as u set population select population from sql newpop as n where u name n state where u name in select state from sql newpop select Name format 17 Capital format 15 Population Area Continent format 13 Statehood format date9 from sql unitedstates 154 How It Works A Chapter 6
111. he following query uses the SUM function to list the total population of each continent The GROUP BY clause groups the countries by continent and the ORDER BY clause puts the continents in alphabetical order select Continent sum Population from sql countries group by Continent order by Continent HAVING Clause The HAVING clause works with the GROUP BY clause to restrict the groups in a query s results based on a given condition PROC SQL applies the HAVING condition after grouping the data and applying aggregate functions For example the following query restricts the groups to include only the continents of Asia and Europe select Continent sum Population from sql countries group by Continent 14 Selecting Columns in a Table A Chapter 2 having Continent in Asia Europe order by Continent Ordering the SELECT Statement When you construct a SELECT statement you must specify the clauses in the following order 1 SELECT FROM WHERE GROUP BY HAVING ORDER BY Note Only the SELECT and FROM clauses are required A The PROC SQL SELECT statement and its clauses are discussed in further detail in the following sections ao oa Aa WwW N Selecting Columns in a Table When you retrieve data from a table you can select one or more columns by using variations of the basic SELECT statement Selecting All Columns in a Table Use an asterisk in the SELECT clause to select all columns in a table The followin
112. ies describe table sql newcountries Creating and Updating Tables and Views A Inserting Rows with the SET Clause 93 Output 4 4 SAS Log for DESCRIBE TABLE Statement for NEWCOUNTRIES NOTE SQL table SQL NEWCOUNTRIES was created like create table SQL NEWCOUNTRIES bufsize 16384 Name char 35 format 35 informat 35 Capital char 35 format 35 informat 35 label Capital Population num format BEST8 informat BEST8 label Population Area num format BEST8 informat BEST8 Continent char 35 format 35 informat 35 label Continent UNDate num format YEAR4 i Copying an Existing Table A quick way to copy a table using PROC SQL is to use the CREATE TABLE statement with a query that returns an entire table This example creates COUNTRIES1 which contains a copy of all the columns and rows that are in COUNTRIES create table countriesl as select from sql countries Using Data Set Options You can use SAS data set options in the CREATE TABLE statement The following CREATE TABLE statement creates COUNTRIES2 from COUNTRIES The DROP option deletes the UN Date column and UN Date does not become part of COUNTRIES2 create table countries2 as select from sql countries drop UNDate Inserting Rows into Tables Use the INSERT statement to insert data values into tables The INSERT statement first adds a new row to an existing table then inserts the values that you specify into
113. in is functionally the same as a Cartesian product See Cartesian product join distinct a keyword that causes the SQL procedure to remove duplicate rows from the output equijoin a kind of join in the SQL procedure When two tables are joined for example the value of a column in the first table must equal the value of the column in the second table in an SQL expression See also join 164 Glossary group in the SQL procedure a set of rows that all have the same combination of values for the columns that are specified in a GROUP BY clause in line view a query expression that is nested in the SQL procedure s FROM clause It can takea table alias but cannot be named permanently It can be referenced only in the query or statement in which it is defined index in SAS software a component of a SAS data set that contains the data values of a key variable or variables paired with a location identifier for the observation that contains the value The value identifier pairs are ordered in a structure that enables SAS to search by a value of a variable See also composite index simple index inner join See join integrity constraints a set of data validation rules that you can specify in order to restrict the data values that can be stored for a variable in a SAS data file Integrity constraints help you preserve the validity and consistency of your data join to combine data from two or more tables into a single result table join
114. ion expands a SELECT ALL statement into the list of columns it represents Any PROC SQL view is expanded into the underlying query and all expressions are enclosed in parentheses to indicate their order of evaluation The FEEDBACK option also displays the resolved values of macros and macro variables For example the following query is expanded in the SAS log proc sql feedback select from sql countries Output 5 1 Expanded SELECT Statement NOTE Statement transforms to select COUNTRIES Name COUNTRIES Capital COUNTRIES Population COUNTRIES Area COUNTRIES Continent COUNTRIES UNDate from SQL COUNTRIES 114 Timing PROC SQL with the STIMER Option A Chapter 5 Timing PROC SQL with the STIMER Option Certain operations can be accomplished in more than one way For example there is often a join equivalent to a subquery Although factors such as readability and maintenance come into consideration generally you will choose the query that runs fastest The SAS system option STIMER shows you the cumulative time for an entire procedure The PROC SQL STIMER option shows you how fast the individual statements in a PROC SQL step are running This enables you to optimize your query Note For the PROC SQL STIMER option to work the SAS system option STIMER must also be specified This example compares the execution times of two queries Both queries list the names and populations of states in the UNITEDSTATES table th
115. jects you cannot insert rows or columns alter column attributes or add integrity constraints to them For DICTIONARY TABLES and SASHELP VTABLE if a table is read protected with a password then the only information that is listed for that table is the library name member name member type and type of password protection All other information is set to missing When querying a DICTIONARY table SAS launches a discovery process that gathers information that is pertinent to that table Depending on the DICTIONARY table that is being queried this discovery process can search libraries open tables and execute views Unlike other SAS procedures and the DATA step PROC SQL can mitigate this process by optimizing the query before the discovery process is launched Therefore although it is possible to access DICTIONARY table information with SAS procedures or the DATA step by using the SASHELP views it is often more efficient to use PROC SQL instead SAS does not maintain DICTIONARY table information between queries Each query of a DICTIONARY table launches a new discovery process Therefore if you are querying the same DICTIONARY table several times in a row then you can get even better performance by creating a temporary SAS data set by using the DATA step SET statement or PROC SQL CREATE TABLE AS statement that includes the information that you want and running your query against that data set
116. l newcontinents Specifying In Line Views In some cases you may want to use a query in a FROM clause instead of a table or view You could create a view and refer toit in your FROM clause but that process involves two steps To save the extra step specify the view in line enclosed in parentheses in the FROM clause An in line view is a query that appears in the FROM clause An in line view produces a table internally that the outer query uses to select data Unlike views that are created with the CREATE VIEW statement in line views are not assigned names and cannot be referenced in other queries or SAS procedures as if they were tables An in line view can be referenced only in the query in which it is defined In the following query the populations of all Caribbean and Central American countries are summed in an in line query The WHERE clause compares the sum with the populations of individual countries Only countries that have a population greater than the sum of Caribbean and Central American populations are displayed proc sql title Countries With Population GT Caribbean Countries select w Name w Population format commal5 c TotCarib from select sum population as TotCarib format commal5 from sql countries where continent Central America and Caribbean as c sql countries as w where w population gt c TotCarib Creating and Updating Tables and Views A Using SQL Procedure Views in SAS Software 109 Output 4 19
117. lation gt conditions must be true 3000009 or or OR specifies that either the Population lt previous or the following 1000000 or condition must be true Population z 5000000 or or a NOT specifies that the Continent lt gt following condition must Africa be false The following example uses two expressions to include only countries that are in Africa and that have a population greater than 20 000 000 people proc sql title Countries in Africa with Populations over 20 000 000 select Name Population format commal0 from sql countries where Continent Africa and Population gt 20000000 order by Population desc Retrieving Data froma Single Table A Using Other Conditional Operators 33 Output 2 25 Retrieving Rows That Satisfy Multiple Conditions Countries in Africa with Populations over 20 000 000 Population Nigeria 99 062 003 Egypt 59 912 259 Ethiopia 59 291 170 South Africa 44 365 873 Congo Democratic Republic of 43 106 529 Sudan 29 711 229 Morocco 28 841 705 Kenya 28 520 558 Tanzania 28 263 033 Algeria 28 171 132 Uganda 20 055 584 Note You can use parentheses to improve the readability of WHERE clauses that contain multiple or compound expressions such as the following where Continent Africa and Population gt 2000000 or Continent Asia and Population gt 1000000 Using Other Conditional Operators You can use many different conditional operators i
118. ld check for and exclude missing values from your results to avoid unexpected results For example if there were an employee with a blank supervisor ID number and an employee with a blank ID then they would produce an erroneous match in the results a Summarizing Data in Multiple Columns Problem You want to produce a grand total of multiple columns in a table Practical Problem Solving with PROC SQL A How It Works 145 Background Information There is one input table called SALES that contains the following data Output 6 13 Sample Input Table for Summarizing Data from Multiple Columns Sample Data for Summarizing Data from Multiple Columns Obs Salesperson January February March Smith 1000 650 800 Johnson 0 900 900 Reed 1200 700 850 Davis 1050 900 1000 Thompson 750 850 1000 Peterson 900 600 500 Jones 800 900 1200 Murphy 700 800 700 Garcia 400 1200 1150 WODANHDU PWN E You want to create output that shows the total sales for each month and the total sales for all three months Solution Use the following PROC SQL code to produce the monthly totals and grand total proc sql title Total First Quarter Sales select sum January as JanTotal sum February as FebTotal sum March as MarTotal sum calculated JanTotal calculated FebTotal calculated MarTotal as GrandTotal format dollarl0 from Sales Output 6 14 PROC SQL Output for Summarizing Data from Multiple Columns Total First Quarter Sale
119. le Background Information There is one input table called EMPLOYEES that contains the following data Output 6 11 Sample Input Table for Expanding a Hierarchy Sample Data for Expanding a Hierarchy First LastName Name Supervisor O Log a Smith John 1002 Johnson Mary None Reed Sam None Davis Karen 1003 Thompson Jennifer 1002 Peterson George 1002 Jones Sue 1003 Murphy Janice 1003 Garcia Joe 1002 1 2 3 4 5 6 7 8 9 You want to create output that shows the full name and ID number of each employee who has a supervisor along with the full name and ID number of that employee s supervisor Solution Use the following PROC SQL code to expand the data proc sql title Expanded Employee and Supervisor Data select A ID label Employee ID trim A FirstName A LastName label Employee Name B ID label Supervisor ID trim B FirstName B LastName label Supervisor Name from Employees A Employees B where A Supervisor B ID and A Supervisor is not missing 144 How It Works A Chapter 6 Output 6 12 PROC SQL Output for Expanding a Hierarchy Expanded Employee and Supervisor Data Employee Supervisor Employee Name Supervisor Name John Smith Mary Johnson Jennifer Thompson Mary Johnson George Peterson Mary Johnson Joe Garcia Mary Johnson Karen Davis Sam Reed Sue Jones Sam Reed Janice Murphy Sam Reed How It Works This solution uses a self join reflexive j
120. le use the DROP INDEX statement The following DROP INDEX statement deletes the index Places from NEWCOUNTRIES proc sql drop index places from sql newcountries Deleting a Table To delete a PROC SQL table use the DROP TABLE statement proc sql drop table sql newcountries Using SQL Procedure Tables in SAS Software Because PROC SQL tables are SAS data files you can use them as input toa DATA step or to other SAS procedures For example the following PROC MEANS step calculates the mean for Area for all countries in COUNTRIES proc means data sql countries mean maxdec 2 title Mean Area for All Countries var area run Output 4 15 Using a PROC SQL Table in PROC MEANS Mean Area for All Countries The MEANS Procedure Analysis Variable Area Creating and Using Integrity Constraints in a Table Integrity constraints are rules that you specify to guarantee the accuracy completeness or consistency of data in tables All integrity constraints are enforced when you insert delete or alter data values in the columns of a table for which integrity 104 Creating and Using Integrity Constraints ina Table A Chapter 4 constraints have been defined Before a constraint is added to a table that contains existing data all the data is checked to determine that it satisfies the constraints You can use geneal integrity constraints to verify that data in a column is nonmissing unique both nonmissing a
121. le rivers The average is therefore incorrect Compare the result from the previous example with the following query which includes a CASE expression to handle missing values corrected output proc sql title Average Length of Angel Falls Amazon and Nile Rivers select Name Length case when Length is missing then 0 else Length end as NewLength avg calculated NewLength as AvgLength from sql features where Name in Angel Falls Amazon Nile Output 2 41 Finding Errors Caused by Missing Values Corrected Output Average Length of Angel Falls Amazon and Nile Rivers Length NewLength AvgLength Angel Falls Nile Grouping Data The GROUP BY clause groups data by a specified column or columns When you use a GROUP BY clause you also use an aggregate function in the SELECT clause or ina HAVING clause to instruct PROC SQL in how to summarize the data for each group PROC SQL calculates the aggregate function separately for each group 46 Grouping by One Column A Chapter 2 Grouping by One Column The following example sums the populations of all countries to find the total population of each continent proc sql title Total Populations of World Continents select Continent sum Population format commal4 as TotalPopulation from sql countries where Continent is not missing group by Continent Note Countries for which a continent is not listed are excluded by the WHERE cla
122. les from other views or from SAS ACCESS views Other SAS procedures and the DATA step can use a PROC SQL view as they would any SAS data file For more information about views see Chapter 4 Creating and Updating Tables and Views on page 89 Introduction to the SQL Procedure A Comparing PROC SQL with the SAS DATA Step 3 Null Values According to the ANSI Standard for SQL a missing value is called a null value It is not the same as a blank or zero value However to be compatible with the rest of SAS PROC SQL treats missing values the same as blanks or zero values and considers all three to be null values This important concept comes up in several places in this document Comparing PROC SQL with the SAS DATA Step PROC SQL can perform some of the operations that are provided by the DATA step and the PRINT SORT and SUMMARY procedures The following query displays the total population of all the large countries countries with population greater than 1 million on each continent proc sql title Population of Large Countries Grouped by Continent select Continent sum Population as TotPop format commal5 from sql countries where Population gt 1000000 group by Continent order by TotPop quit Output 1 1 Sample SQL Output Population of Large Countries Grouped by Continent Continent TotPop Oceania 3 422 548 Australia 18 255 944 Central America and Caribbean 65 283 910 South America 316 303 397 North America
123. lest value NMISS number of missing values probability of a greater absolute value of PRT Student s t RANGE range of values STD standard deviation STDERR standard error of the mean SUM sum of values SUMWGT sum of the WEIGHT variable values Student s t value for testing the hypothesis that T the population mean is zero USS uncorrected sum of squares VAR variance 1 Inthe SQL procedure each row has a weight of 1 Note You can use most other SAS functions in PROC SQL but they are not treated as aggregate functions a 40 Summarizing Data with a WHERE Clause A Chapter 2 Summarizing Data with a WHERE Clause You can use aggregate or summary functions by using a WHERE clause For a complete list of the aggregate functions that you can use see Table 2 6 on page 39 Using the MEAN Function with a WHERE Clause This example uses the MEAN function to find the annual mean temperature for each country in the SQL WORLDTEMPS table The WHERE clause returns countries with a mean temperature that is greater than 75 degrees proc sql outobs 12 title Mean Temperatures for World Cities select City Country mean AvgHigh AvgLow as MeanTemp from sql worldtemps where calculated MeanTemp gt 75 order by MeanTemp desc Note You must use the CALCULATED keyword to reference the calculated column Output 2 33 Using the MEAN Function with a WHERE Clause Mean Temperatures for World Cities Country MeanTemp Nigeria Manil
124. lete and append data as well as create and drop DBMS tables by using normal SAS syntax The LIBNAME statement provides more control over DBMS operations such as locking spooling and data type conversion through the many LIBNAME options and data set options The LIBNAME engine optimizes the processing of joins and WHERE clauses by passing these operations directly to the DBMS to take advantage of the indexing and other processing capabilities of your DBMS An exception to this recommendation occurs when you need to use SQL that does not conform to the ANSI standard The SAS ACCESS LIBNAME statement accepts only ANSI standard SQL but the PROC SQL Pass Through Facility accepts all the extensions to SQL that are provided by your DBMS Another advantage of this access method is that Pass Through Facility statements enable the DBMS to optimize queries when the queries have summary functions such as AVG and COUNT GROUP BY clauses or columns that were created by expressions such as the COMPUTED function Examples of both of these methods of interacting with DBMS data are presented below See SAS ACCESS for Relational Databases Reference for comprehensive information about SAS ACCESS software Using Libname Engines Use the LIBNAME statement to read from and write toa DBMS object as if it were a SAS data set After connecting to a DBMS table or by view using the LIBNAME statement you can use PROC SQL to int
125. like L and Capital City London occurs once as a capital city in the COUNTRIES table However in WORLDCITY COORDS London is found twice as a city in England and again as a city in Canada Specifying only capital City in the WHERE expression yields the following incorrect output Output 3 11 Selecting Capital City Coordinates incorrect output Coordinates of Capital Cities Capital i Country Latitude Longitude Bolivia Bolivia London England Canada Lima Peru i Peru Lisbon Portugal i Portugal London England London England Notice in the output that the inner join incorrectly matches London England to both London Canada and London England By also joining the country name columns together COUNTRIES Name to WORLDCITYCOORDS Country the rows match correctly proc sql title Coordinates of Capital Cities select Capital format 12 Name format 12 City format 12 Country format 12 latitude longitude from sql countries sql worldcitycoords where Capital like L and Capital City and Name Country Retrieving Data from Multiple Tables A Inner Joins 63 Output 3 12 Selecting Capital City Coordinates correct output Coordinates of Capital Cities Capital i Country Latitude Longitude Portugal i Portugal England England Selecting Data from More Than Two Tables The data that you need could be located in more than two tables For example if you want to show the
126. lues in data sorting and without summarizing 168 Index H HAVING clause aggregate functions with filtering grouped data vs WHERE clause hierarchical data expanding in tables 143 host variable references 121 l in line views vs temporary tables 116 IN operator indexes creating deleting query performance and 115 INNER JOIN keywords inner joins comparison operators for creating with INNER JOIN keywords data from multiple tables multicolumn joins null values and order of output reflexive joins self joins showing relationships within a table table aliases INOBS option restricting row processing inserting rows with queries with SET clause with VALUES clause integrity constraints INTERSECT operator IS MISSING operator IS NOT MISSING operator iterations limiting J joins Cartesian product COALESCE function in comparing with subqueries cross joins 68 inner joins natural joins outer joins reducing size of results union joins vs match merges vs subqueries when to use WHERE expressions with L left outer joins libname engines accessing DBMS data 129 querying DBMS tables libnames embedding in PROC SQL views LIKE operator logical operators LOOPS option limiting iterations macro facility SQL procedure with 120 macro variables 121 concatenating values in creating creating from aggregate function results creating from query results creating in SQL procedure 12
127. lumn with a new value that you specify For every row that the query processes the COALESCE function checks each of its arguments until it finds a nonmissing value then returns that value If all of the arguments are missing values then the COALESCE function returns a missing value For example the following query replaces missing values in the LowPoint column in the SQL CONTINENTS table with the words Not Available proc sql title Continental Low Points select Name coalesce LowPoint Not Available as LowPoint from sql continents Output 2 14 Using the COALESCE Function to Replace Missing Values Continental Low Points LowPoint Lake Assal Antarctica Not Available Asia Dead Sea Australia Lake Eyre Central America and Caribbean Not Available Europe Caspian Sea North America Death Valley Oceania Not Available South America Valdes Peninsula The following CASE expression shows another way to perform the same replacement of missing values however the COALESCE function requires fewer lines of code to obtain the same results proc sql title Continental Low Points select Name case when LowPoint is missing then Not Available else Lowpoint end as LowPoint from sql continents Specifying Column Attributes You can specify the following column attributes which determine how SAS data is displayed FORMAT INFORMAT LABEL LENGTH If you do not specify these attributes then PROC SQL
128. me persists only for that query When you use an alias to name a column you can use the alias to reference the column later in the query PROC SQL uses the alias as the column heading in output The following example assigns an alias of LowCelsius to the calculated column from the previous example proc sql outobs 12 title Low Temperatures in Celsius select City AvgLow 32 5 9 as LowCelsius format 4 1 from sql worldtemps Output 2 10 Assigning a Column Alias to a Calculated Column Low Temperatures in Celsius LowCelsius Amsterdam Athens Auckland Bangkok Beijing Belgrade Berlin Bogota Bombay Bucharest Budapest eo 8 l OPOrFPUWONWAANOHDN Retrieving Data froma SingleTable A Assigning Values Conditionally 21 Referring to a Calculated Column by Alias When you use a column alias to refer to a calculated value you must use the CALCULATED keyword with the alias to inform PROC SQL that the value is calculated within the query The following example uses two calculated values LowC and HighC to calculate a third value Range proc sql outobs 12 title Range of High and Low Temperatures in Celsius select City AvgHigh 32 5 9 as HighC format 5 1 AvgLow 32 5 9 as LowC format 5 1 calculated HighC calculated LowC as Range format 4 1 from sql worldtemps Note You can specify a calculated column only in a SELECT clause or a WHERE clause A Ou
129. n a WHERE clause The following table lists other operators that you can use Table 2 4 Conditional Operators Operator Definition Example ANY specifies that at least one where Population gt any select of a set of values obtained Population from sql countries from a subquery must satisfy a given condition ALL specifies that all of the where Population gt all select values obtained froma Population from sql countries subquery must satisfy a given condition BETWEEN AND tests for values within an where Population between 1000000 inclusive range and 5000000 CONTAINS tests for values that where Continent contains contain a specified string America EXISTS tests for the existence ofa where exists select from set of values obtained sql oilprod from a subquery 34 Using Other Conditional Operators A Chapter 2 Operator Definition Example IN tests for values that match where Name in Africa one of a list of values Asia IS NULL or IS MISSING tests for missing values where Population is missing LIKE tests for values that g where Continent like A match a specified pattern k tests for values that sound where Name Tiland like a specified value 1 You can use a percent symbol to match any number of characters You can use an underscore _ to match one arbitrary character Note All of these operators can be prefixed with the NOT operator to form a negative condition A Using th
130. n operator The subquery must return only one value or else the query fails and an error message is printed to the log This query uses a subquery in its WHERE clause to select U S states that have a population greater than Belgium The subquery is evaluated first and then it returns the population of Belgium to the outer query proc sql title U S States with Population Greater than Belgium select Name State population format commal0 from sql unitedstates where population gt select population from sql countries where name Belgium Internally this is what the query looks like after the subquery has executed proc sql title U S States with Population Greater than Belgium select Name State population format commal0 from sql unitedstates where population gt 10162614 The outer query lists the states whose populations are greater than the population of Belgium Output 3 27 Single Value Subquery States with Population Greater than Belgium Population California 31 518 948 Florida 13 814 408 Illinois 11 813 091 New York 18 377 334 Ohio 11 200 790 Pennsylvania 12 167 566 Texas 18 209 994 Multiple Value Subqueries A multiple value subquery can return more than one value from one column It is used in a WHERE or HAVING expression that contains IN or a comparison operator that is modified by ANY or ALL This example displays the populations of oil producing countries The
131. n operators use standard inner or outer join syntax Using the Coalesce Function in Joins As you can see from the previous examples the nonmatching rows in outer joins contain missing values By using the COALESCE function you can overlay columns so that only the row from the table that contains data is listed Recall that COALESCE takes a list of columns as its arguments and returns the first nonmissing value that it encounters This example adds the COALESCE function to the previous example to overlay the COUNTRIES Capital WORLDCITYCOORDS City and COUNTRIES Name columns COUNTRIES Name is supplied as an argument to COALESCE because some islands do not have capitals proc sql outobs 10 title Populations and or Coordinates of World Cities select coalesce Capital City Name format 20 City coalesce Name Country format 20 Country Population Latitude Longitude from sql countries full join sql worldcitycoords on Capital City and Name Country Retrieving Data from Multiple Tables A Comparing DATA Step Match Merges with PROC SQL Joins 71 Output 3 22 Using COALESCE in Full Outer Join of COUNTRIES and WORLDCITYCOORDS Populations and or Coordinates of World Cities Country Population Latitude Longitude Channel Islands Channel Islands 146436 Abadan Iran Abu Dhabi United Arab Emirates 2818628 Abuja Nigeria 99062003 Acapulco Mexico Accra Ghana 17395511 Adana Turkey 7 Addis Ababa Ethiopia 5929
132. n which you are joining the tables However you can continue to use the WHERE clause to subset the query result Including Nonmatching Rows with the Left Outer Join A left outer join lists matching rows and rows from the left hand table the first table listed in the FROM clause that do not match any row in the right hand table A left join is specified with the keywords LEFT J OIN and ON For example to list the coordinates of the capitals of international cities join the COUNTRIES table which contains capitals with the WORLDCITYCOORDS table which contains cities coordinates by using a left join The left join lists all capitals regardless of whether the cities exist in WORLDCITYCOORDS Using an inner join would list only capital cities for which there is a matching city in WORLDCITYCOORDS proc sql outobs 10 title Coordinates of Capital Cities select Capital format 20 Name Country format 20 Latitude Longitude from sql countries a left join sql worldcitycoords b on a Capital b City and a Name b Country order by Capital 66 Outer Joins A Chapter 3 Output 3 15 Left Join of COUNTRIES and WORLDCITYCOORDS Coordinates of Capital Cities Capital Country Latitude Longitude Channel Islands Abu Dhabi United Arab Emirates Abuja Nigeria Accra Ghana Addis Ababa Ethiopia Algiers Algeria Almaty Kazakhstan Amman Jordan Amsterdam Netherlands Andorra la Vella Andorra Including Nonmatching R
133. nd unique within a specified set or range of values You can also apply referential integrity constraints to link the values in a specified column called a primary key of one table to values of a specified column in another table When linked toa primary key a column in the second table is called a fordgn key When you define referential constraints you can also choose what action occurs when a value in the primary key is updated or deleted You can prevent the primary key value from being updated or deleted when matching values exist in the foreign key This is the default You can allow updates and deletions to the primary key values By default any affected foreign key values are changed to missing values However you can specify the CASCADE option to update foreign key values instead Currently the CASCADE option does not apply to deletions You can choose separate actions for updates and for deletions Note Integrity constraints cannot be defined for views A The following example creates integrity constraints for a table MYSTATES and another table USPOSTAL The constraints are as follows state name must be unique and nonmissing in both tables population must be greater than 0 continent must be either North America or Oceania proc sql create table sql mystates state char 15 population num continent char 15 contraint specifications const
134. neau Phoenix Little Rock Sacramento Denver Hartford Dover Washington Tallahassee Atlanta Honolulu Boise Springfield Indianapolis UNITEDSTATES Population 4227437 604929 3974962 2447996 31518948 3601298 3309742 707232 612907 13814408 6985572 1183198 1109980 11813091 5769553 656400 114000 53200 163700 104100 5500 2500 100 65800 59400 10900 83600 57900 36400 Continent America America America America America America America America America America America Oceania North America North America North America Statehood 14DEC1819 03JAN1959 14FEB1912 15JUN1836 09SEP1850 01AUG1876 09JAN1788 07DEC1787 21FEB1871 03MAR1845 02JAN1788 21AUG1959 03JUL1890 03DEC1818 11DEC1816 The POSTALCODES table contains postal code abbreviations Introduction to the SQL Procedure Output 1 7 POSTALCODES Partial Output POSTALCODES Alabama Alaska American Samoa Arizona Arkansas California Colorado Connecticut Delaware District Of Columbia Florida Georgia Guam Hawaii Idaho A Notes about the Example Tables 7 The WORLDTEMPS table contains average high and low temperatures from various international cities Output 1 8 WORLDTEMPS Partial Output Amsterdam Athens Auckland Bangkok Beijing Belgrade Berlin Bogota Bombay Bucharest Budapest Buenos Ai Cairo Calcutta WORLDTEMPS Country Netherlands Greece New Zealand Thailand
135. need to change the file specifications A Note Some browsers require an extension of HTM or HTML on the filename ods html body odsout htm proc sql outobs 12 title U S Cities with Their States and Coordinates select from sql uscitycoords ods html close Display 5 1 ODS HTML Output Coordinates of U S Cities State Latitude Longitude Albany NY Albuquerque NM Amarillo TA Anchorage AK Annapolis MD Atlanta GA Augusta ME Austin TX Baker OR Baltimore MD Bangor ME Baton Rouge LA CHAPTER Practical Problem Solving with PROC SQL Overview 134 Computing a Weighted Average 134 Problem 134 Background Information 134 Solution 135 How It Works 136 Comparing Tables 136 Problen 136 Background Information 136 Solution 137 How It Works 138 Overlaying Missing Data Values 138 Problen 138 Background Information 138 Solution 139 How It Works 139 Computing Percentages within Subtotals 140 Problen 140 Background Information 140 Solution 140 How It Works 141 Counting Duplicate Rows in a Table 141 Problem 141 Background Information 141 Solution 142 How It Works 142 Expanding Hierarchical Data in a Table 143 Problem 143 Background Information 143 Solution 143 How It Works 144 Summarizing Data in Multiple Columns 144 Problem 144 Background Information 145 Solution 145 How It Works 145 Creating a Summary Report 146 Problem 146 Background Information 146 Solution 146 How It Works 147 133 134
136. ng Values 49 Filtering Grouped Data 50 Using a Simple HAVING Clause 50 Choosing Between HAVING and WHERE 51 Using HAVING with Aggregate Functions 51 Validating a Query 52 Overview of the SELECT Statement This chapter shows you how to retrieve data from a single table by using the SELECT statement validate the correctness of a SELECT statement by using the VALIDATE statement With the SELECT statement you can retrieve data from tables or data that is described by SAS data views Note The examples in this chapter retrieve data from tables that are SAS data sets However you can use all of the operations that are described here with SAS data views A The SELECT statement is the primary tool of PROC SQL You use it to identify retrieve and manipulate columns of data from a table You can also use several optional clauses within the SELECT statement to place restrictions on a query SELECT and FROM Clauses The following simple SELECT statement is sufficient to produce a useful result select Name from sql countries The SELECT statement must contain a SELECT clause and a FROM clause both of which are required in a PROC SQL query This SELECT statement contains Retrieving Data froma SingleTable A Overview of the SELECT Statement 13 a SELECT clause that lists the Name column a FROM clause that lists the table in which the Name column resides WHERE Clause The WHERE clause enables you
137. ntinent TotalPopulation Africa 710 529 592 Asia 3 381 858 879 Central America and Caribbean 66 815 930 Europe 813 481 724 Validating a Query The VALIDATE statement enables you to check the syntax of a query for correctness without submitting it to PROC SQL PROC SQL displays a message in the log to indicate whether the syntax is correct proc sql validate select Name Statehood from sql unitedstates where Statehood 1t 01Jan1800 d Output 2 51 Validating a Query Partial Log 3 proc sql 4 validate 5 select Name Statehood 6 from sql unitedstates a where Statehood lt 01Jan1800 d NOTE PROC SQL statement has valid syntax The following example shows an invalid query and the corresponding log message proc sql validate select Name Statehood from sql unitedstates where 1t 01Jan1800 d Retrieving Data froma SingleTable A Validating a Query 53 Output 2 52 Validating an Invalid Query Partial Log proc sql validate select Name Statehood from sql unitedstates where lt 01Jan1800 d ERROR 22 322 Syntax error expecting one of the following amp lt lt lt gt gt gt AND CONTAINS EQ GE GROUP GT HAVING LE LIKE LT NE OR ORDER ERROR 76 322 Syntax error statement will be ignored NOTE The SAS System stopped processing this step because of errors 54 55 CHAPTER Retrieving Da
138. o amp n data amp amp type amp i drop type set sql features if type amp amp type amp i run send mend makeds makeds 156 Solution A Chapter 6 Output 6 24 Log 240 proc sql noprint 241 select count distinct type 242 into n 243 from sql features 244 select distinct type 245 into typel type left amp n 246 from sql features 247 quit NOTE PROCEDURE SQL used Total process time real time 0 04 seconds cpu time 0 03 seconds macro makeds do i 1 to amp n data amp amp type amp i drop type set sql features if type amp amp type amp i run send mend makeds smakeds There were 74 observations read from the data set SQL FEATURES The data set WORK DESERT has 7 observations and 6 variables DATA statement used Total process time real time 1 14 seconds cpu time 0 41 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK ISLAND has 6 observations and 6 variables DATA statement used Total process time real time 0 02 seconds cpu time 0 00 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK LAKE has 10 observations and 6 variables DATA statement used Total process time real time 0 01 seconds cpu time 0 01 seconds There were 74 observations read from the data set SQL FEATURES The data set WORK MOUNTAIN has 18 observations and 6 variables DATA statement used Total process time real time 0 02
139. og For more information about the PUT statement and the SAS macro facility see SAS Macro Language Reference proc sql noprint select from sql countries sput SQLOBS amp sqlobs SQLOOPS amp sqloops SQLRC amp sqlrc Output 5 15 Using the PROC SQL Automatic Macro Variables SQLOBS 1 SQLOOPS 11 SQLRC 0 Programming with the SQL Procedure A Formatting PROC SQL Output Using the REPORT Procedure 127 Notice that the value of SQLOBS is 1 When the NOPRINT option is used and no table or macro variables are created SQLOBS returns a value of 1 because only one row is processed Note You can use the AUTOMATIC_ option in the PUT statement to list the values of all automatic macro variables The list depends on the SAS products that are installed at your site a Formatting PROC SQL Output Using the REPORT Procedure SQL provides limited output formatting capabilities Some SQL vendors add output formatting statements to their products to address these limitations SAS has reporting tools that enhance the appearance of PROC SQL output For example SQL cannot display the first occurrence only of a repeating value ina column in its output The following example lists cities in the USCITY COORDS table Notice the repeating values in the State column proc sql outobs 10 title US Cities select State City latitude Longitude from sql uscitycoords order by state Output 5 16 USCITYCOORDS Table Showing Repe
140. oin to match employees and their supervisors The SELECT clause assigns aliases of A and B to two instances of the same table and retrieves data from each instance From instance A the SELECT clause selects the ID column and assigns it a label of Employee ID selects and concatenates the FirstName and LastName columns into one output column and assigns it a label of Employee Name From instance B the SELECT clause selects the ID column and assigns it a label of Supervisor ID selects and concatenates the FirstName and LastName columns into one output column and assigns it a label of Supervisor Name In both concatenations the SELECT clause uses the TRIM function to remove trailing spaces from the data in the FirstName column then concatenates the data with a single space and the data in the LastName column to produce a single character value for each full name trim A FirstName A LastName label Employee Name When PROC SQL applies the WHERE clause the two table instances are joined The WHERE clause conditions restrict the output to only those rows in table A that have a supervisor ID that matches an employee ID in table B This operation provides a supervisor ID and full name for each employee in the original table except for those who do not have a supervisor where A Supervisor B ID and A Supervisor is not missing Note Although there are no missing values in the Employees table you shou
141. on You can use the SAS macro facility to help you create new tables and add rows to existing tables The following example creates a table that lists people to serve as referees for reviews of academic papers No more than three people per subject are allowed in a table The macro that is defined in this example checks the count of referees before it inserts a new referee s name into the table The macro has two parameters the referee s name and the subject matter of the academic paper proc sql create table sql referee Name char 15 Subject char 15 define the macro macro addref name subject local count are there three referees in the table reset noprint select count into count from sql referee where subject amp subject Sif amp count ge 3 then do reset print title ERROR amp name not inserted for subject amp subject title2 There are 3 referees already Programming with the SQL Procedure A Defining Macros to Create Tables 125 select from sql referee where subject amp subject reset noprint send Zelse do insert into sql referee name subject values amp name amp subject Sput NOTE amp name has been added for subject amp subject send Smend Submit the ADDREF macro with its two parameters to add referee names to the table Each time you submit the macro a message is written to the SAS log saddref Conner sailing saddref Fay sailing sad
142. ormatting output macro facility with 120 ODS with 132 programming with resetting options syntax checking terminology timing individual statements vs DATA step SQL Procedure Pass Through Facility displaying DBMS data SQLOBS macro variable SQLOOPS macro variable STIMER option timing SQL procedure 114 Structured Query Language See SQL subqueries comparing with joins 116 correlated subqueries multiple nesting levels multiple value single value testing for a group of values vs joins when to use subtotals percentages within summarizing data aggregate functions combining data from multiple rows displaying sums in multiple columns 144 missing values in data remerging summary statistics WHERE clause for summary functions summary reports creating sums displaying syntax checking 113 T table aliases tables altering columns 99 Cartesian product comparing 136 copying existing tables counting duplicate rows 141 creating creating with macros 124 creating without rows 90 Index 169 deleting deleting rows 98 example tables K expanding hierarchical data 143 inserting rows integrity constraints modifying columns selecting all columns selecting columns selecting specific columns SQL tables in other procedures 157 SQL tables in SAS 103 structure of temporary tables vs in line views update errors updating conditionally 150 updating values updating wit
143. ows with the Right Outer Join A right join specified with the keywords RIGHT J OIN and ON is the opposite of a left join nonmatching rows from the right hand table the second table listed in the FROM clause are included with all matching rows in the output This example reverses the join of the last example it uses a right join to select all the cities from the WORLDCITY COORDS table and displays the population only if the city is the capital of a country that is if the city exists in the COUNTRIES table proc sql outobs 10 title Populations of Capitals Only select City format 20 Country Country format 20 Population from sql countries right join sql worldcitycoords on Capital City and Name Country order by City Retrieving Data from Multiple Tables A Outer Joins 67 Output 3 16 Right Join of COUNTRIES and WORLDCITYCOORDS Acapulco Accra Adana Addis Ababa Adelaide Aden Ahmenabad Algiers Alice Springs Populations of Capitals Only Country Population Mexico Ghana 17395511 Turkey Ethiopia Australia Yemen India Algeria Australia 59291170 28171132 Selecting All Rows with the Full Outer Join A full outer join specified with the keywords FULL J OIN and ON selects all matching and nonmatching rows This example displays the first ten matching and nonmatching rows from the City and Capital columns of WORLDCITYCOORDS and COUNTRIES Note that the pound sign is used
144. pe crdate num format DATETIME informat DATETIME label Date Created modate num format DATETIME informat DATETIME label Date Modified nobs num label Number of Observations obslen num label Observation Length nvar num label Number of Variables protect char 3 label Type of Password Protection compress char 8 label Compression Routine encrypt char 8 label Encryption npage num label Number of Pages pcompress num label Percent Compression reuse char 3 label Reuse Space bufsize num label Bufsize delobs num label Number of Deleted Observations indxtype char 9 label Type of Indexes Similarly you can use the DESCRIBE VIEW statement to see how the SASHELP views are constructed proc sql describe view sashelp vtable Programming with the SQL Procedure A Using DICTIONARY COLUMNS 119 Output 5 5 Description of SASHELP VTABLE NOTE SQL view SASHELP VTABLE is defined as select from DICTIONARY TABLES Using DICTIONARY TABLES After you know how a DICTIONARY table is defined you can use its column names in SELECT clauses and subsetting WHERE clauses to get more specific information The following query retrieves information about permanent tables and views that appear in this document proc sql title All Tables and Views in the SQL Library select libname memname memtype nobs from dictionary
145. ptions nodate nonumber linesize 80 pagesize 60 libname sql SAS data library The tables that are used in this document contain geographic and demographic data The data is intended to be used for the PROC SQL code examples only it is not necessarily up to date or accurate The COUNTRIES table contains data that pertains to countries The Area column contains a country s area in square miles The UNDate column contains the year a country entered the United Nations if applicable Introduction to the SQL Procedure A Output 1 3 COUNTRIES Partial Output Afghanistan Albania Algeria Andorra Angola Antigua and Barbuda Argentina Armenia Australia Austria Azerbaijan Bahamas Bahrain Bangladesh Barbados Capital Tirane Algiers Andorra la Vell Luanda St John s Buenos Aires Yerevan Canberra Vienna Baku Nassau Manama Dhaka Bridgetown COUNTRIES Population 17070323 3407400 28171132 64634 9901050 65644 34248705 3556864 18255944 8033746 7760064 275703 591800 1 2639E8 258534 251825 11100 919595 200 481300 171 1073518 11500 2966200 32400 33400 5400 300 57300 200 Notes about the Example Tables Continent UNDate Europe Africa Europe Africa Central America South America Asia Australia Europe Asia Central America Asia Asia Central America 5 The WORLDCITY COORDS table contains latitude and longitude data for world cities Cities in the We
146. ql title Table JOINED select from fltsuper s fltdest d where s Flight d Flight Output 3 26 PROC SQL Join of the FLTSUPER and FLTDEST Tables Table JOINED Flight Supervisor Flight Destination Brussels Edmonton Ramirez Brussels Ramirez Edmonton Miller Paris Miller Madrid Picard Paris Picard Madrid PROC SQL builds the Cartesian product and then lists the rows that meet the WHERE clause condition The WHERE clause returns two rows for each supervisor one row for each destination Because Flight has duplicate values and there is no other matching column there is no way to associate Kang only with Brussels Ramirez only with Edmonton and so on For more information about DATA step match merges see SAS Language Reference Dictionary Using Subqueries to Select Data While a table join combines multiple tables into a new table a subquery enclosed in parentheses selects rows from one table based on values in another table A subquery or inner query is a query expression that is nested as part of another query expression Retrieving Data from Multiple Tables A Multiple Value Subqueries 75 Depending on the clause that contains it a subquery can return a single value or multiple values Subqueries are most often used in the WHERE and the HAVING expressions Single Value Subqueries A singlevalue subquery returns a single row and column It can be used in a WHERE or HAVING clause with a compariso
147. ql features where type in Desert Lake group by Location Type order by Location desc Output 2 46 Grouping with an ORDER BY Clause Total Square Miles of Deserts and Lakes Location TotalArea Saudi Arabia 250 000 Russia 11 780 North America 77 200 North America Desert 140 000 Europe Asia Lake 143 550 China Desert 500 000 Canada Lake 12 275 Australia Desert 300 000 Asia Lake 25 300 Africa Desert 3 725 000 Africa Lake 50 958 Grouping with Missing Values When a column contains missing values PROC SQL treats the missing values as a single group This can sometimes provide unexpected results Retrieving Data froma SingleTable A Grouping with Missing Values 49 Finding Grouping Errors Caused by Missing Values In this example because the SQL COUNTRIES table contains some missing values in the Continent column the missing values combine to form a single group that has the total area of the countries that have a missing value in the Continent column incorrect output proc sql outobs 12 title Areas of World Continents select Name format 25 Continent sum Area format commal2 as TotalArea from sql countries group by Continent order by Continent Name The output is incorrect because Bermuda Iceland and Kalaallit Nunaat are not actually part of the same continent however PROC SQL treats them that way because they all have a missing character value in the Continent column O
148. r each column in the ORDER BY clause When you specify multiple columns in the ORDER BY clause the first column determines the primary row order of the results Subsequent columns determine the order of rows that have the same value for the primary sort The following example sorts the SQL FEATURES table by feature type and name proc sql outobs 12 title World Topographical Features select Name Type from sql features order by Type desc Name Note TheASC keyword is optional because the PROC SQL default sort order is ascending Output 2 18 Specifying a Sort Order World Topographical Features Angel Falls Waterfall Niagara Falls Waterfall Tugela Falls Waterfall Yosemite Waterfall Andaman Sea Baltic Sea Bering Sea Black Sea Caribbean Sea Gulf of Mexico Sea Hudson Bay Sea Mediterranean Sea Sorting by Calculated Column You can sort by a calculated column by specifying its alias in the ORDER BY clause The following example calculates population densities and then performs a sort on the calculated Density column proc sql outobs 12 title World Population Densities per Square Mile select Name Population format commal2 Area format comma8 Population Area as Density format commal0 from sql countries order by Density desc 28 Sorting by Column Position A Chapter 2 Output 2 19 Sorting by Calculated Column World Population Densities per Square Population Density Hong Kong 5 857 414
149. r more information about how PROC SQL handles errors during data insertions see Handling Update Errors on page 98 Updating Data Values in a Table You can use the UPDATE statement to modify data values in tables and in the tables that underlie PROC SQL and SAS ACCESS views For more information about updating views see Updating a View on page 107 The UPDATE statement updates data in existing columns it does not create new columns To add new columns see Altering Columns on page 99 and Creating New Columns on page 18 The examples in this section update the original NEWCOUNTRIES table Updating All Rows in a Column with the Same Expression The following UPDATE statement increases all populations in the NEWCOUNTRIES table by five percent proc sql update sql newcountries set population population 1 05 title Updated Population Values select name format 20 capital format 15 population format commal15 0 from sql newcountries Creating and Updating Tables and Views A Updating Rows in a Column with Different Expressions 97 Output 4 8 Updating a Column for All Rows Updated Population Values Capital Population Brasilia 168 325 875 Beijing 1 262 325 831 New Delhi 975 459 576 Indonesia Jakarta 212 513 552 Russia Moscow 158 644 478 United States Washington 276 459 548 Updating Rows in a Column with Different Expressions If you want to update some but not all of a column
150. r trademarks of SAS Institute Inc in the USA and other countries indicates USA registration Other brand and product names are registered trademarks or trademarks of their respective companies Contents Chapter 1 A Introduction to the SQL Procedure hl What Is SQL H What Is the SQL Procedure y Terminology J Comparing PROC SQL with the SAS DATA Step 3 Notes about the Example Tables fa Chapter 2 A Retrieving Data from a Single Table 11 Overview of the SELECT Statement 12 Selecting Columns in a Table 14 Creating New Columns Sorting Data 25 Retrieving Rows That Satisfy a Condition Summarizing Data Grouping Data 45 Filtering Grouped Data Validating a Query 52 Chapter 3 A Retrieving Data from Multiple Tables 55 Introduction Selecting Data from More Than One Table by Using J oins Using Subqueries to Select Data When to Use J oins and Subqueries Combining Queries with Set Operators Chapter 4 A Creating and Updating Tables and Views Introduction Creating Tables Inserting Rows into Tables Updating Data Values in a Table Deleting Rows Altering Columns Creating an Index Deleting a Table Using SQL Procedure Tables in SAS Software Creating and Using Integrity Constraints in a Table Creating and Using PROC SQL Views Chapter 5 A Programming with the SQL Procedure 111 Introduction 111 Using PROC SQL Options to Create and Debug Queries 112 Improving Query Performance 115 Accessin
151. raint prim key primary key state constraint population check population gt 0 constraint continent check continent in North America Oceania create table sql uspostal name char 15 code char 2 not null constraint specified as a column attribute constraint for_key foreign key name links NAME to the references sql mystates primary key in MYSTATES on delete restrict forbids deletions to STATE unless there is no matching NAME value on update set null allows updates to STATE Creating and Updating Tables and Views A Creating and Using PROC SOL Views 105 changes matching NAME values to missing The DESCRIBE TABLE statement displays the integrity constraints in the SAS log as part of the table description The DESCRIBE TABLE CONSTRAINTS statement writes only the constraint specifications to the SAS log proc sql describe table sql mystates describe table constraints sql uspostal Output 4 16 SAS Log Showing Integrity Constraints NOTE SQL table SQL MYSTATES was created like create table SOQL MYSTATES bufsize 8192 state char 15 population num continent char 15 yi create unique index state on SQL MYSTATES state Integrity Where On Constraint Type Variables Clause Reference Delete Update 49 continent Check continent in North America Oceania 48 population Check population gt 0 47 prim_key Primary Key state
152. ribed in Creating Views on page 106 Avoid creating views that are based on tables whose structure may change A view is no longer valid when it references a nonexistent column Using SQL Procedure Views in SAS Software You can use PROC SQL views as input to a DATA step or to other SAS procedures The syntax for using a PROC SQL view in SAS is the same as that for a PROC SQL table For an example see Using SQL Procedure Tables in SAS Software on page 103 110 111 CHAPTER Programming with the SQL Procedure Introduction 111 Using PROC SQL Options to Createand Debug Queries 112 Restricting Row Processing with the NOBS and OUTOBS Options 112 Limiting Iterations with the LOOPS Option 112 Checking Syntax with the NOEXEC Option and the VALIDATE Statement 113 Expanding SELECT with the FEEDBACK Option 113 Timing PROC SQL with the STIMER Option 114 Resetting PROC SQL Options with the RESET Statement 115 Improving Query Performance 115 Using Indexes to Improve Performance 115 Using the Keyword ALL in S amp Operations 116 Omitting the ORDER BY Clause Whe Creating Tables and Views 116 Using In Line Views versus Temporary Tables 116 Comparing Subqueries with J oins 116 Using WHERE Expressions with J oins 117 Accessing SAS System Information Using DICTIONARY Tables 117 Using DICTIONARY TABLES 119 Using DICTIONARY COLUMNS 119 Tips for Using DICTIONARY Tables 120 Using PROC SQL with th
153. rmation that enables it to read data values from other files which can include SAS data files SAS ACCESS views DATA step views or other PROC SQL views A PROC SQL view s output can be either a subset or a superset of one or more files See also view Glossary 165 query a set of instructions that requests particular information from one or more data sources query expression query in PROC SQL one or more table expressions that can be linked with set operators The primary purpose of a query expression is to retrieve data from tables PROC SQL views or SAS ACCESS views In PROC SQL the SELECT statement is contained in a query expression row in relational database management systems the horizontal component of a table It is analogous to a SAS observation SAS data file a SAS data set that contains data values as well as descriptor information that is associated with the data The descriptor information includes information such as the data types and lengths of the variables as well as which engine was used to create the data A PROC SQL table is a SAS data file SAS data files are of member type DATA See also SAS data set SAS data view SAS data set a file whose contents are in one of the native SAS file formats There are two types of SAS data sets SAS data files and SAS data views SAS data files contain data values in addition to descriptor information that is associated with the data SAS data views contain only the des
154. ro Variables Problem You want to create a separate data set for each unique value of a column Background Information The SQL FEATURES data set contains information on various geographical features around the world Practical Problem Solving with PROC SQL A Solution 155 Output 6 23 FEATURES Partial Output FEATURES Location Height Length Aconcagua Mountain Argentina Amazon River South America Amur River Asia Andaman Sea 218100 Angel Falls Waterfall Venezuela Annapurna Mountain Nepal Aral Sea Lake Asia 25300 Ararat Mountain Turkey Arctic Ocean 5105700 Atlantic Ocean 33420000 Solution To create a separate data set for each type of feature you could go through the data set manually to determine all the unique values of Type and then write a separate DATA step for each type or a single DATA step with multiple OUTPUT statements This approach is labor intensive error prone and impractical for large data sets The following PROC SQL code counts the unique values of Type and puts each value in a separate macro variable The SAS macro that follows the PROC SQL code uses these macro variables to create a SAS data set for each value You do not need to know beforehand how many unique values there are or what the values are proc sql noprint select count distinct type into n from sql features select distinct type into typel type left amp n from sql features quit macro makeds do i l t
155. rom sql a outer union corr select from sql b 86 Producing Rows from the First Query or the Second Query A Chapter 3 Output 3 41 Concatenating the Query Results OUTER UNION CORR A OUTER UNION CORR B Producing Rows from the First Query or the Second Query Thereis no keyword in PROC SQL that returns unique rows from the first and second table but not rows that occur in both Here is one way you can simulate this operation queryl except query2 union query2 except queryl1 This example shows how to use this operation proc sql title A EXCLUSIVE UNION B select from sql a except select from sql b union select from sql b except select from sql a Output 3 42 Producing Rows from the First Query or the Second Query A EXCLUSIVE UNION B Retrieving Data from Multiple Tables A Producing Rows from the First Query or the Second Query 87 The first EXCEPT returns one unique row from the first table table A only The second EXCEPT returns one unique row from the second table table B only The middle UNION combines the two results Thus this query returns the row from the first table that is not in the second table as well as the row from the second table that is not in the first table 88 89 CHAPTER Creating and Updating Tables and Views Introduction 90 Creating Tables 90 Creating Tables from Column Definitions 90 Creating Tables from a Query Result 91
156. rst table the second or both UNION does not return duplicate rows If a row occurs more than once then only one occurrence is returned proc sql title A UNION B select from sql a union select from sql b Output 3 35 Producing Unique Rows from Both Queries UNION A UNION B You can use the ALL keyword to request that duplicate rows remain in the output proc sql title A UNION ALL B select from sql a union all select from sql b Retrieving Data from Multiple Tables A Producing Rows That Are in Only the First Query Result EXCEPT 83 Output 3 36 Producing Rows from Both Queries UNION ALL A UNION ALL B Producing Rows That Are in Only the First Query Result EXCEPT The EXCEPT operator returns rows that result from the first query but not from the second query In this example the row that contains the values 3 and three exists in the first query table A only and is returned by EXCEPT proc sql title A EXCEPT B select from sql a except select from sql b Output 3 37 Producing Rows That Are in Only the First Query Result EXCEPT A EXCEPT B Note that the duplicated row in Table A containing the values 2 and two does not appear in the output EXCEPT does not return duplicate rows that are unmatched by rows in the second query Adding ALL keeps any duplicate rows that do not occur in the second query proc sql title A EXCEPT
157. s JanTotal FebTotal MarTotal GrandTotal 7500 8100 22 400 How It Works Recall that when you specify one column as the argument to an aggregate function the values in that column are calculated When you specify multiple columns the values in each row of the columns are calculated This solution uses the SUM function 146 Creating a Summary Report A Chapter 6 to calculate the sum of each month s sales then uses the SUM function a second time to total the monthly sums into one grand total sum calculated JanTotal calculated FebTotal calculated MarTotal as GrandTotal format dollarl0 An alternative way to code the grand total calculation is to use nested functions sum sum January sum February sum March as GrandTotal format dollarl0 Creating a Summary Report Problem You have a table that contains detailed sales information You want to produce a summary report from the detail table Background Information There is one input table called SALES that contains detailed sales information There is one record for each sale for the first quarter that shows the site product invoice number invoice amount and invoice date Output 6 15 Sample Input Table for Creating a Summary Report Sample Data to Create Summary Sales Report Invoice Product Invoice InvoiceDate VID010 980126 VID010 980126 VID005 980309 VID014 980309 VID003 980330 VID010 980330 VID003 980302 VID003 980223 VID003
158. s You can sort query results by columns that are not included in the query For example the following query returns all the rows in the SQL COUNTRIES table and sorts them by population even though the Population column is not included in the query proc sql outobs 12 title Countries Sorted by Population select Name Continent from sql countries order by Population Output 2 21 Sorting by Unselected Columns Countries Sorted by Population Continent Vatican City Europe Tuvalu Oceania Nauru Oceania Turks and Caicos Islands Central America and Caribbean Leeward Islands Central America and Caribbean Cayman Islands Central America and Caribbean San Marino Europe Liechtenstein Europe Gibraltar Europe Monaco Europe Saint Kitts and Nevis Central America and Caribbean Marshall Islands Oceania Specifying a Different Sorting Sequence SORTSEQ is a PROC SQL statement option that specifies the sorting sequence for PROC SQL to use when a query contains an ORDER BY clause Use this option only if you want to use a sorting sequence other than your operating environment s default sorting sequence Possible values include ASCII EBCDIC and some languages other than English For example in an operating environment that supports the EBCDIC sorting sequence you could use the following option in the PROC SQL statement to set the sorting sequence to EBCDIC proc sql sortseq ebcdic Note SORTSEQ affects only the ORDER
159. s support sas com publishing For other SAS Institute business call 919 677 8000 Customers outside the United States should contact their local SAS office 162 163 Glossary calculated column in a query a column that does not exist in any of the tables that are being queried but which is created as a result of a column expression Cartesian product a type of join that matches each row from each joined table to each row from all other joined tables See cross join join column in relational databases a vertical component of a table Each column has a unique name contains data of a specific type and has certain attributes A column is analogous to a variable in SAS terminology column alias a temporary alternate name for a column in the SQL procedure Aliases are optionally specified in the SELECT clause to name or rename columns An alias is one word See also column column expression a set of operators and operands that when evaluated results in a single data value The resulting data value can be either a character value or a numeric value composite index an index that locates observations in a SAS data set by the values of two or more key variables See also index simple index condition in PROC SQL the part of the WHERE clause that contains the search criteria In the condition you specify which rows are to be retrieved cross join a type of join that returns the product of joined tables A cross jo
160. s City us State us Latitude world city world latitude from sql worldcitycoords world sql uscitycoords us 60 Inner Joins A Chapter 3 where world city Cairo and us latitude 1t world latitude Output 3 7 Using Comparison Operators to Join Tables US Cities South of Cairo Egypt State Latitude i Latitude Honolulu Key West Miami San Antonio Tampa When you run this query the following message is written to the SAS log Output 3 8 Comparison Query Log Message NOTE The execution of this query involves performing one or more Cartesian product joins that can not be optimized Recall that you see this message when you run a query that joins tables without specifying matching columns in a WHERE clause PROC SQL also displays this message whenever tables are joined by using an inequality operator The Effects of Null Values on Joins Most database products treat nulls as distinct entities and do not match them in joins PROC SQL treats nulls as missing values and as matches for joins Any null will match with any other null of the same type character or numeric in a join The following example joins Table One and Table Two on column B There are null values in column B of both tables Notice in the output that the null value in row c of Table One matches all the null values in Table Two This is probably not the intended result for the join proc sql title One and Two Joined s
161. s values then use a WHERE expression in the UPDATE statement You can use multiple UPDATE statements each with a different expression However each UPDATE statement can have only one WHERE clause The following UPDATE statements result in different population increases for different countries in the NEWCOUNRTRIES table proc sql update sql newcountries set population population 1 05 where name like B update sql newcountries set population population 1 07 where name in China Russia title Selectively Updated Population Values select name format 20 capital format 15 population format commal5 0 from sql newcountries Output 4 9 Selectively Updating a Column Selectively Updated Population Values Capital Population Brazil Brasilia 168 325 875 China Beijing 1 286 370 132 India New Delhi 929 009 120 Indonesia Jakarta 202 393 859 Russia Moscow 161 666 278 United States Washington 263 294 808 You can accomplish the same result with a CASE expression update sql newcountries set population population 98 Handling Update Errors A Chapter 4 case when name like B then 1 05 when name in China Russia then 1 07 else 1 end If the WHEN clause is true then the corresponding THEN clause returns a value that the SET clause then uses to complete its expression In this example when Name starts with the letter B the SET expression becomes population population 1
162. select Name from sql countries where Name like Z or Name like _ a Output 2 29 Using the LIKE Operator Country Names that Begin with the Letter zZ or Are 5 Characters Long and End with the Letter a The percent sign and underscore _ are wild card characters For more information about pattern matching with the LIKE comparison operator see the SQL Procedure chapter in the Base SAS Procedures Guide Retrieving Data from a Single Table A Using a WHERE Clause with Missing Values 37 Using Truncated String Comparison Operators Truncated string comparison operators are used to compare two strings They differ from conventional comparison operators in that before executing the comparison PROC SQL truncates the longer string to be the same length as the shorter string The truncation is performed internally neither operand is permanently changed The following table lists the truncated comparison operators Table 2 5 Truncated String Comparison Operators Symbol Definition Example EQT equal to truncated strings where Name eqt Aust GTT greater than truncated strings where Name gtt Bah LTT less than truncated strings where Name ltt An GET greater than or equal to truncated strings where Country get United A LET less than or equal to truncated strings where Lastname let Smith NET not equal to truncated strings where Style net TWO The following example returns a
163. sing value To specify that a value is missing use a blank in single quotation marks for character values and a period for numeric values Inserting Rows with the VALUES Clause With the VALUES clause you assign values to a column by position The following INSERT statement uses multiple VALUES clauses to add rows to NEWCOUNTRIES Recall that NEWCOUNTRIES has six columns so it is necessary to specify a value or an appropriate missing value for all six columns See the results of the DESCRIBE TABLE statement in Creating Tables Like an Existing Table on page 92 for information about the columns of NEWCOUNTRIES proc sql insert into sql newcountries values Pakistan Islamabad 123060000 values Nigeria Lagos 99062000 title World s Largest Countries select name format 20 capital format 15 population format commal15 0 from sql newcountries Creating and Updating Tables and Views A Inserting Rows with a Query 95 Output 4 6 Rows Inserted with the Values Clause World s Largest Countries Capital Population Brazil Brasilia 160 310 357 China Beijing 1 202 215 077 India New Delhi 929 009 120 Indonesia Jakarta 202 393 859 Russia Moscow 151 089 979 United States Washington 263 294 808 Pakistan Islamabad 123 060 000 Nigeria Lagos 99 062 000 Note the following features of VALUES clauses As with other SQL clauses use commas
164. ssing Values 24 Specifying Column Attributes 24 Sorting Data 25 Sorting by Column 25 Sorting by Multiple Columns 26 Specifying a Sort Order 27 Sorting by Calculated Column 27 Sorting by Column Position 28 Sorting by Unsdected Columns 29 Specifying a Different Sorting Sequence 29 Sorting Columns That Contain Missing Values 30 Retrieving Rows That Satisfy a Condition 30 Using a Simple WHERE Clause 30 Retrieving Rows Based on a Comparison 31 R amp rieving Rows That Satisfy Multiple Conditions 32 Using Other Conditional Operators 33 Using the IN Operator 34 Using theIS MISSING Operator 34 Using the BETWEEN AND Operators 35 Using theLIKE Operator 36 Using Truncated String Comparison Operators 37 12 Overview of the SELECT Statement A Chapter 2 Using a WHERE Clausewith Missing Values 37 Summarizing Data 39 Using Aggregate Functions 39 Summarizing Data with a WHERE Clause 40 Using the MEAN Function with a WHERE Clause 40 DisplayingSums 40 Combining Data from Multiple Rows intoa SingleRow 41 Remnerging Summary Statistics 41 Using Aggregate Functions with Unique Values 43 Counting Unique Values 43 Counting Nonmissing Values 43 Counting All Rows 44 Summarizing Data with Missing Values 44 Finding Errors Caused by Missing Values 44 Grouping Data 45 Grouping by One Column 46 Grouping without Summarizing 46 Grouping by Multiple Columns 47 Grouping and Sorting Data 48 Grouping with Missing Values 48 Finding Grouping Errors Caused by Missi
165. stern hemisphere have negative longitude coordinates Cities in the Southern hemisphere have negative latitude coordinates Coordinates are rounded to the nearest degree Output 1 4 WORLDCITYCOORDS Partial Output WORLDCITCOORDS Country Latitude Longitude Afghanistan Algeria Argentina Argentina Argentina Australia Australia Australia Australia Australia Australia Australia Austria Bahamas Bangladesh Buenos Aires Cordoba Tucuman Adelaide Alice Springs Brisbane Darwin Melbourne Perth Sydney Vienna Nassau Chittagong The USCITY COORDS table contains the coordinates for cities in the United States Because all cities in this table are in the Western hemisphere all of the longitude coordinates are negative Coordinates are rounded to the nearest degree 6 Notes about the Example Tables A Chapter 1 Output 1 5 USCITYCOORDS Partial Output Albuquerque Amarillo Anchorage Annapolis Atlanta Augusta Austin Baker Baltimore Bangor Baton Rouge Birmingham Bismarck Boise USCITYCOORDS State Latitude Longitude The UNITEDSTATES table contains data that is associated with the states The Statehood column contains the date when the state was admitted into the Union Output 1 6 UNITEDSTATES Partial Output Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Colum Florida Georgia Hawaii Idaho Illinois Indiana Capital Montgomery Ju
166. subquery first returns all countries that are found in the OILPROD 76 Correlated Subqueries A Chapter 3 table The outer query then matches countries in the COUNTRIES table to the results of the subquery proc sql outobs 5 title Populations of Major Oil Producing Countries select name Country Population format commal15 from sql countries where Name in select Country from sql oilprod Output 3 28 Multiple Value Subquery Using IN Populations of Major Oil Producing Countries Country Population 28 171 132 28 392 302 1 202 215 077 59 912 259 Indonesia 202 393 859 If you use the NOT IN operator in this query then the query result will contain all the countries that are not contained in the OILPROD table proc sql outobs 5 title Populations of NonMajor Oil Producing Countries select name Country Population format commal15 from sql countries where Name not in select Country from sql oilprod Output 3 29 Multiple Value Subquery Using NOT IN Populations of NonMajor Oil Producing Countries Country Population Afghanistan 17 070 323 Albania 3 407 400 Andorra 64 634 Angola 9 901 050 Antigua and Barbuda 65 644 Correlated Subqueries The previous subqueries have been simple subqueries that are self contained and that execute independently of the outer query A corrdated subquery requires a value or values to be passed to it by the outer query After the su
167. system option OBS The OUTOBS option restricts the number of rows that PROC SQL displays or writes to a table For example if you specify OU TOBS 10 and insert values into a table by using a query then PROC SQL inserts a maximum of 10 rows into the resulting table OUTOBS is similar to the SAS data set option OBS In a simple query there might be no apparent difference between using INOBS or OUTOBS Other times however it is important to choose the correct option For example taking the average of a column with INOBS 10 returns an average of only 10 values from that column Limiting Iterations with the LOOPS Option The LOOPS option restricts PROC SQL to the number of iterations that are specified in this option through its inner loop By setting a limit you can prevent Programming with the SQL Procedure A Expanding SELECT with the FEEDBACK Option 113 queries from consuming excessive computer resources For example joining three large tables without meeting the join matching conditions could create a huge internal table that would be inefficient to process Use the LOOPS option to prevent this from happening You can use the number of iterations that are reported in the SQLOOPS macro variable after each PROC SQL statement is executed to gauge an appropriate value for the LOOPS option For more information see Using the PROC SQL Automatic Macro Variables on page 126 If you use the PROMPT option with the NOBS OUTOBSs or
168. t the size of result tables created with joins Using Indexes to Improve Performance Indexes are created with the CREATE INDEX statement in the SQL procedure or alternatively with the MODIFY and INDEX CREATE statements in the DATASETS 116 Using the Keyword ALL in Set Operations A Chapter 5 procedure Indexes are stored in specialized members of a SAS data library and havea SAS member type of INDEX The values that are stored in an index are automatically updated if you make a change to the underlying data Indexes can improve the performance of certain classes of retrievals For example if an indexed column is compared to a constant value in a WHERE expression then the index will likely improve the query s performance Indexing the column that is specified in a correlated reference to an outer table also improves a subquery s and hence query s performance Composite indexes can improve the performance of queries that compare the columns that are named in the composite index with constant values that are linked by using the AND operator For example if you have a compound index on the columns CITY and STATE and the WHERE expression is specified as WHERE CITY xxx AND STATE Yyy then the index can be used to select that subset of rows more efficiently Indexes can also benefit queries that have a WHERE clause of the form where varl in select iteml from tablel The values of VA
169. ta from Multiple Tables Introduction 56 Selecting Data from More Than One Table by Using oins 56 Inner J oins 57 Using Table Aliases 58 Specifying the Order of J oin Output 59 Creating Inne J oins Using INNER J OIN Keywords 59 J oining Tables Using Comparison Operators 59 The Effects of Null Values on J oins 60 Creating Multicolumn J oins 62 Selecting Data from More Than Two Tables 63 Showing Rdationships within a Single Table Using Saf J oins 64 Outer J oins 65 Including Nonmatching Rows with the Left Outer J oin 65 Including Nonmatching Rows with the Right Outer J oin 66 Saecting All Rows with the Full Outer J oin 67 Specialty oins 68 Including All Combinations of Rows with the Cross oin 68 Including All Rows with the Union Join 69 Matching Rows with a Natural Join 69 Using the Coalesce Function inJ oins 70 Comparing DATA Step Match Merges with PROC SQL J oins 71 When All of the Values Match 71 When Only Some of the Values Match 72 When the Position of the Values Is Important 73 Using Subqueries toSdect Data 74 Single Value Subqueries 75 Multiple Value Subqueries 75 Corrdated Subqueries 76 Testing for the Existence of a Group of Values 77 Multiple Levels of Subquery Nesting 78 Combining a J oin with a Subquery 79 When to Use oins and Subqueries 80 Combining Queries with S amp Operators 81 Working with Two or More Query Results 81 Producing Unique Rows from Both Queries UNION 82 Producing Rows That Arein Only the First Query R
170. tables where libname SQL Output 5 6 Tables and Views Used in This document All Tables and Views in the SQL Library Library Member Number of Member Name Observations CITYREPORT CONTINENTS COUNTRIES DENSITIES FEATURES MYSTATES NEWCONTINENTS NEWCOUNTRIES NEWSTATES OILPROD OILRSRVS POSTALCODES STATECODES UNITEDSTATES USCITYCOORDS WORLDCITYCOORDS WORLDTEMPS Using DICTIONARY COLUMNS DICTIONARY tables are useful when you want to find specific columns to include in reports The following query shows which of the tables that are used in this document contain the Country column proc sql title All Tables that Contain the Country Column 120 Tips for Using DICTIONARY Tables A Chapter 5 select libname memname name from dictionary columns where name Country and libname SQL Output 5 7 Using DICTONARY COLUMNS to Locate Specific Columns All Tables that Contain the Country Column Library Member Name Column Name OILPROD Country OILRSRVS Country WORLDCITYCOORDS Country WORLDTEMPS Country Tips for Using DICTIONARY Tables You cannot use data set options with DICTIONARY tables The DICTIONARY DICTIONARIES table contains information about each column in all DICTIONARY tables Many character values such as member names and libnames are stored as all uppercase characters you should design your queries accordingly Because DICTIONARY tables are read only ob
171. that shown in Output 3 24 proc sql select coalesce s Flight d Flight as Flight Supervisor Destination from fltsuper s full join fltdest d on s Flight d Flight When the Position of the Values Is Important When you want to merge two tables and the position of the values is important you might need to use a DATA step merge To demonstrate this idea here are two tables to consider FLTSUPER FLTDEST Flight Supervisor Flight Destination 145 Kang 145 Brussels 145 Ramirez 145 Edmonton 150 Miller 150 Paris 150 Picard 150 Madrid 155 Evanko 165 Seattle 157 Lei For Flight 145 Kang matches with Brussels and Ramirez matches with Edmonton Because the DATA step merges data based on the position of values in BY groups the values of Supervisor and Destination match appropriately A DATA step merge produces Output 3 25 data merged merge fltsuper fltdest by flight run proc print data merged noobs title Table MERGED run 74 Using Subqueries to Select Data A Chapter 3 Output 3 25 Match Merge of the FLTSUPER and FLTDEST Tables Table MERGED Flight Supervisor Destination 145 Kang Brussels 145 Ramirez Edmonton 150 Miller Paris 150 Picard Madrid 155 Evanko 157 Lei 165 Seattle PROC SQL does not process joins according to the position of values in BY groups Instead PROC SQL processes data only according to the data values Here is the result of an inner join for FLTSUPER and FLTDEST proc s
172. that the innermost subquery returns a value or values to be used by the next outer query Then that subquery s value s are used by the next outer query and so on Evaluation always begins with the innermost subquery and works outward This example lists cities in Africa that are in countries with major oil reserves The innermost query is evaluated first It returns countries that are located on the continent of Africa The outer subquery is evaluated It returns a subset of African countries that have major oil reserves by comparing the list of countries that was returned by the inner subquery against the countries in OILRSRVS Finally the WHERE clause in the outer query lists the coordinates of the cities that exist in the WORLDCITY COORDS table whose countries match the results of the outer subquery proc sql title Coordinates of African Cities with Major Oil Reserves select from sql worldcitycoords where country in select Country from sql oilrsrvs o where o Country in select Name from sql countries c where c Continent Africa Retrieving Data from Multiple Tables A Combining a Join with a Subquery 79 Output 3 32 Multiple Levels of Subquery Nesting Coordinates of African Cities with Major Oil Reserves Country Latitude Longitude Benghazi Lagos Combining a Join with a Subquery You can combine joins and subqueries in a single query Suppose that you want to find th
173. tion Area num format BEST8 informat BEST8 Continent char 35 format 35 informat 35 label Continent Statehood num i Creating New Columns In addition to selecting columns that are stored in a table you can create new columns that exist for the duration of the query These columns can contain text or calculations PROC SQL writes the columns that you create as if they were columns from the table Adding Text to Output You can add text to the output by including a string expression or literal expression in a query The following query includes two strings as additional columns in the output proc sql outobs 12 title U S Postal Codes select Postal code for Name is Code from sql postalcodes Output 2 7 Adding Text to Output U S Postal Codes Postal Alabama Postal Alaska Postal American Samoa Postal Arizona Postal Arkansas Postal California Postal Colorado Postal Connecticut Postal Delaware Postal District Of Columbia Postal Florida Postal Georgia Retrieving Data froma SingleTable A Calculating Values 19 To prevent the column headers Name and Code from printing you can assign a label that starts with a special character to each of the columns PROC SQL does not output the column name when a label is assigned and it does not output labels that begin with special characters For example you could use the following query to suppress the column head
174. tional theory was developed by E F Codd an IBM researcher and first implemented at IBM in a prototype called System R This prototype evolved into commercial IBM products based on SQL The Structured Query Language is now in the public domain and is part of many vendors products What Is the SQL Procedure The SQL procedure is SAS implementation of Structured Query Language PROC SQL is part of Base SAS software and you can use it with any SAS data set table Often PROC SQL can be an alternative to other SAS procedures or the DATA step You can use SAS language elements such as global statements data set options functions informats and formats with PROC SQL just as you can with other SAS procedures PROC SQL can generate reports generate summary statistics retrieve data from tables or views combine data from tables or views create tables views and indexes update the data values in PROC SQL tables update and retrieve data from database management system DBMS tables 2 Terminology A Chapter 1 modify a PROC SQL table by adding modifying or dropping columns PROC SQL can be used in an interactive SAS session or within batch programs and it can include global statements such as TITLE and OPTIONS Terminology Tables A PROC SQL tableis the same as a SAS data file It is a SAS file of type DATA PROC SQL tables consist of rows and columns The ro
175. to restrict the data that you retrieve by specifying a condition that each row of the table must satisfy PROC SQL output includes only those rows that satisfy the condition The following SELECT statement contains a WHERE clause that restricts the query output to only those countries that have a population that is greater than 5 000 000 people select Name from sql countries where Population gt 5000000 ORDER BY Clause The ORDER BY clause enables you to sort the output from a table by one or more columns that is you can put character values in either ascending or descending alphabetical order and you can put numerical values in either ascending or descending numerical order The default order is ascending For example you can modify the previous example to list the data by descending population select Name from sql countries where Population gt 5000000 order by Population desc GROUP BY Clause The GROUP BY clause enables you to break query results into subsets of rows When you use the GROUP BY clause you use an aggregate function in the SELECT clause or a HAVING clause to instruct PROC SQL how to group the data For details about aggregate functions see Summarizing Data on page 39 PROC SQL calculates the aggregate function separately for each group When you do not use an aggregate function PROC SQL treats the GROUP BY clause as if it were an ORDER BY clause and any aggregate functions are applied to the entire table T
176. to separate columns In addition you must use a semicolon after the last VALUES clause only If you omit data for a column without indicating a missing value then you receive an error message and the row is not inserted To specify that a value is missing use a space in single quotation marks for character values and a period for numeric values Inserting Rows with a Query You can insert the rows from a query result into a table The following query returns rows for large countries over 130 million in population from the COUNTRIES table The INSERT statement adds the data to the empty table NEWCOUNTRIES which was created earlier in Creating Tables Like an Existing Table on page 92 proc sql create table sql newcountries like sql countries proc sql title World s Largest Countries insert into sql newcountries select from sql countries where population ge 130000000 select name format 20 capital format 15 population format commal5 0 from sql newcountries 96 Updating Data Values in a Table A Chapte 4 Output 4 7 Rows Inserted with a Query World s Largest Countries Capital Population Brasilia 160 310 357 Beijing 1 202 215 077 New Delhi 929 009 120 Indonesia Jakarta 202 393 859 Russia Moscow 151 089 979 United States Washington 263 294 808 If your query does not return data for every column then you receive an error message and the row is not inserted Fo
177. tput 2 11 Referring to a Calculated Column by Alias Range of High and Low Temperatures in Celsius Amsterdam Athens Auckland Bangkok Beijing Belgrade Berlin Bogota Bombay Bucharest Budapest 8 ew ee NWNHWOANOOWONAEF DN ee 8 OPORFRPUWONWAAIODN DOAONFWDWWHNAINDHDO Note Because this query sets a numeric format of 4 1 on the HighC LowC and Range columns the values in those columns are rounded to the nearest tenth As a result of the rounding some of the values in the HighC and LowC columns do not reflect the range value output for the Range column When you round numeric data values this type of error sometimes occurs If you want to avoid this problem then you can specify additional decimal places in the format a Assigning Values Conditionally CASE expressions enable you to interpret and change some or all of the data values in a column to make the data more useful or meaningful 22 Assigning Values Conditionally A Chapter 2 Using a Simple CASE Expression You can use conditional logic within a query by using a CASE expression to conditionally assign a value You can use a CASE expression anywhere that you can use a column name The following table which is used in the next example describes the world climate zones rounded to the nearest degree Table 2 1 World Climate Zones This climate is between at this and at this zone latitude latitud
178. uit Output 5 20 Output from the Pass Through Facility Example ALIBRANDI ALHERTANT ABDULLAH ALVAREZ MERCEDES ALVAREZ CARLOS BAREFOOT JOSEPH BAUCOM WALTER BANADYGA JUSTIN BLALOCK RALPH BALLETTI MARIE BOWDEN EARL BRANCACCIO JOSEPH BREUHAUS JEREMY BRADY CHRISTINE BREWCZAK JAKOB 132 Using the Output Delivery System ODS with PROC SQL A Chapter 5 Using the Output Delivery System ODS with PROC SQL The Output Delivery System ODS enables you to produce the output from PROC SQL in a variety of different formats such as PostScript HTML or list output ODS defines the structure of the raw output from SAS procedures and from the SAS DATA step The combination of data with a definition of its output structure is called an output object Output objects can be sent to any of the various ODS destinations which include listing HTML output and printer When new destinations are added to ODS they will automatically become available to PROC SQL to all other SAS procedures that support ODS and tothe DATA step For more information about ODS see SAS Output Ddivery System User s Guide The following example opens the HTML destination and specifies ODSOUT HTM as the file that will contain the HTML output The output from the PROC SQL procedure is sent to ODSOUT HTM Note This example uses filenames that may not be valid in all operating environments To run the example successfully in your operating environment you may
179. ures with a Depth of Less than 500 Feet select Name Depth from sql features where Depth 1t 500 and Depth is not missing order by Depth Output 2 32 Using a WHERE Clause with Missing Values Corrected Output World Features with a Depth of Less than 500 Feet Aral Sea Victoria Hudson Bay North Retrieving Data from a Single Table A Using Aggregate Functions 39 Summarizing Data You can use an aggregate function or summary function to produce a statistical summary of data in a table The aggregate function instructs PROC SQL in how to combine data in one or more columns If you specify one column as the argument to an aggregate function then the values in that column are calculated If you specify multiple arguments then the arguments or columns that are listed are calculated When you use an aggregate function PROC SQL applies the function to the entire table unless you use a GROUP BY clause You can use aggregate functions in the SELECT or HAVING clauses Note See Grouping Data on page 45 for information about producing summaries of individual groups of data within a table Using Aggregate Functions The following table lists the aggregate functions that you can use Table 2 6 Aggregate Functions Function Definition AVG MEAN mean or average of values COUNT FREQ N number of nonmissing values CSS corrected sum of squares CV coefficient of variation percent MAX largest value MIN smal
180. use A Output 2 42 Grouping by One Column Total Populations of World Continents Total Continent Population Africa 710 529 592 Asia 3 381 858 879 Australia 18 255 944 Central America and Caribbean 66 815 930 Europe 872 192 202 North America 384 801 818 Oceania 5 342 368 South America 317 568 801 Grouping without Summarizing When you use a GROUP BY clause without an aggregate function PROC SQL treats the GROUP BY clause as if it were an ORDER BY clause and displays a message in the log that informs you that this has happened The following example attempts to group high and low temperature information for each city in the SQL WORLDTEMPS table by country proc sql outobs 12 title High and Low Temperatures select City Country AvgHigh AvgLow from sql worldtemps group by Country The output and log show that PROC SQL transforms the GROUP BY clause into an ORDER BY clause Retrieving Data froma SingleTable A Grouping by Multiple Columns 47 Output 2 43 Grouping without Aggregate Functions High and Low Temperatures Country AvgHigh Buenos Aires Argentina Sydney Australia Vienna Austria Nassau Bahamas Hamilton Bermuda Sao Paulo Brazil Rio de Janeiro Brazil Quebec Canada Montreal Canada Toronto Canada Beijing China Output 2 44 Grouping without Aggregate Functions Partial Log WARNING A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELEC
181. uses attributes that are already saved in the table or if no attributes are saved then it uses the default attributes Retrieving Data froma Single Table A Sorting by Column 25 The following example assigns a label of state to the Name column and a format of COMMA1O0 to the Area column proc sql outobs 12 title Areas of U S States in Square Miles select Name label State Area format commal0 from sql unitedstates Note Using the LABEL keyword is optional For example the following two select clauses are the same select Name label State Area format commal0 select Name State Area format commal0 A Output 2 15 Specifying Column Attributes Areas of U S States in Square Miles Alabama Alaska 656 400 Arizona 114 000 Arkansas 53 200 California 163 700 Colorado 104 100 Connecticut 5 500 Delaware 2 500 District of Columbia 100 Florida 65 800 Georgia 59 400 Hawaii 10 900 Sorting Data You can sort query results with an ORDER BY clause by specifying any of the columns in the table including unselected or calculated columns Sorting by Column The following example selects countries and their populations from the SQL COUNTRIES table and orders the results by population proc sql outobs 12 title Country Populations select Name Population format commal0 from sql countries order by Population 26 Sorting by Multiple Columns A Chapter
182. utput 2 47 Finding Grouping Errors Caused by Missing Values Incorrect Output Areas of World Continents Continent TotalArea 876 800 Iceland 876 800 Kalaallit Nunaat 876 800 Algeria Africa 11 299 595 Angola Africa 11 299 595 Benin Africa 11 299 595 Botswana Africa 11 299 595 Burkina Faso Africa 11 299 595 Burundi Africa 11 299 595 Cameroon Africa 11 299 595 Cape Verde Africa 11 299 595 Central African Republic Africa 11 299 595 To correct the query from the previous example you can write a WHERE clause to exclude the missing values from the results corrected output proc sql outobs 12 title Areas of World Continents select Name format 25 Continent sum Area format commal2 as TotalArea from sql countries where Continent is not missing group by Continent order by Continent Name 50 Filtering Grouped Data A Chapter 2 Output 2 48 Adjusting the Query to Avoid Errors Due to Missing Values Corrected Output Areas of World Continents Continent TotalArea 11 299 595 Africa 11 299 595 Africa 11 299 595 Botswana Africa 11 299 595 Burkina Faso Africa 11 299 595 Burundi Africa 11 299 595 Cameroon Africa 11 299 595 Cape Verde Africa 11 299 595 Central African Republic Africa 11 299 595 Chad Africa 11 299 595 Comoros Africa 11 299 595 Congo Africa 11 299 595 Note Aggregate functions such as the SUM function can cause the same calculation to repeat for every row This occurs
183. whenever PROC SQL remerges data See Remerging Summary Statistics on page 41 for more information about remerging Filtering Grouped Data You can use a HAVING clause with a GROUP BY clause to filter grouped data The HAVING clause affects groups in a way that is similar to the way in which a WHERE clause affects individual rows When you use a HAVING clause PROC SQL displays only the groups that satisfy the HAVING expression Using a Simple HAVING Clause The following example groups the features in the SQL FEATURES table by type and then displays only the numbers of islands oceans and seas proc sql title Numbers of Islands Oceans and Seas select Type count as Number from sql features group by Type having Type in Island Ocean Sea order by Type Retrieving Data from a Single Table A Using HAVING with Aggregate Functions 51 Output 2 49 Using a Simple HAVING Clause Numbers of Islands Oceans and Seas Choosing Between HAVING and WHERE The differences between the HAVING clause and the WHERE clause are shown in the following table Because you use the HAVING clause when you work with groups of data queries that contain a HAVING clause usually also contain the following a GROUP BY clause an aggregate function Note When you use a HAVING clause without a GROUP BY clause PROC SQL treats the HAVING clause as if it were a WHERE clause and provides a message in the
184. ws correspond to observations in SAS data files and the columns correspond to variables The following table lists equivalent terms that are used in SQL SAS and traditional data processing SQL Term SAS Term Data Processing Term table SAS data file file row observation record column variable field You can create and modify tables by using the SAS DATA step or by using the PROC SQL statements that are described in Chapter 4 Creating and Updating Tables and Views on page 89 Other SAS procedures and the DATA step can read and update tables that are created with PROC SQL DBMS tables are tables that were created with other software vendors database management systems PROC SQL can connect to update and modify DBMS tables with some restrictions For more information see Accessing a DBMS with SAS ACCESS Software on page 128 Queries Queries retrieve data from a table view or DBMS A query returns a query result which consists of rows and columns from a table With PROC SQL you use a SELECT statement and its subordinate clauses to form a query Chapter 2 Retrieving Data from a Single Table on page 11 describes how to build a query Views PROC SQL views do not actually contain data as tables do Rather a PROC SQL view contains a stored SELECT statement or query The query executes when you use the view in a SAS procedure or DATA step When a view executes it displays data that is derived from existing tab

Download Pdf Manuals

image

Related Search

Related Contents

  この取扱説明書は、必ず最終ューザー様までお届けください。 胴  INTERSURGICAL FICHE TECHNIQUE  owners manual manual del usuario  ZX1200i/ZX1300i/ZX1600i BARCODE PRINTER USER MANUAL    

Copyright © All rights reserved.
Failed to retrieve file