Advanced SAS Programming – Day 12 & 13 (Introduction to PROC SQL)




Day 12 – Introduction to PROC SQL

  • Group By and Having clauses in SQL
  • – Using aggregate functions to summarize data in SQL
  • – Using nested Where clause in SQL
  • – Modifying datasets using Create Table Like, Create Table As, Alter Table and Update clauses

Day 13 – PROC SQL Continued

– Using SQL queries in SAS
– Queries to create a table and select values from a table
– Using SQL conditional statements (Where and Case When clauses)
– Order by clause in SQL to sort datasets

SAS Programming – Day 10

Arrays in SAS

  • SAS arrays are another way to temporarily group and refer to SAS variables. A SAS array provides a different name to reference a group of variables
  • Array statement begins with keyword ARRAY followed by array name and N – number of elements within array
  • _temporay_  option is used to create a temporary array

Base SAS Programming – Day 3

Conditional Statements in SAS:

  • Where statement and If statement are the 2 types of conditional statements in SAS
  • Where Statement can be used in both Proc step and Data step
  • If statement can be used only in Data step

Base SAS Programming – Day 7

Introduction to Important PROCs – FREQ, SUMMARY and FUNCTIONS in SAS

  1. Proc Freq
    • To determine the frequency of occurrence of values in categorical variables
    • Results in frequency, cumulative frequency, percentage and cumulative percentage
    • Can create a n-way crosstab using tables statement and * between the variables
    • Crosstab results in frequency, percentage, row percent and column percent
    • norow, nocol, nocumm, nopercent options can be used with Tables statement to customise the result
  2. Proc Summary
    • Similar to proc Means, used to extract basic statistics / summarize data
    • Print or Output option MUST be used in Proc Summary to get the result
    • By and Class statements can be used in Proc Summary
  3. Functions in SAS
    • There are different types of functions in SAS, mainly used for data manipulation
    • Text functions, Data Type conversion functions, Math / Stat functions, Date and Time functions
    • Type conversion functions
      • Input – to convert character data type to numeric
      • Put – to convert numeric data type to character
    • Text functions
      • Substr – to extract part of a string, based on number of characters
      • Scan – to extract part of a string, based on a delimiter

Base SAS Programming – Day 4

Sorting and De-duping datasets in SAS

  1. Proc Sort
    • By statement is used in Proc Sort to sort variables in a datset
    • By default, the specified variable(s) is sorted in ascending order
    • Descending option / keyword is used in the ‘By’ statement along with the variable to sort it in descending order
    • Only the variable followed by ‘descending’ will be sorted in descending order, the rest of them, if any will be sorted in ascending order
    • Noduprecs option – is used to extract only the non duplicate observations of the dataset
    • Nodupkey option – is used to extract only the non duplicate observations of a particular variable(s) specified in the By statement
  2. Proc Print
    • By and Sum statement in SAS – using aggregate  functions in Proc Print procedure
  3. Proc Append
    • To append SAS datasets, is to stack observations of one dataset over the other. It is like a horizontal join.
    • The datasets being appended must have the same variables (same data type and variable name)
    • Force option – to append datasets with uncommon variable names/attributes
    • Appending of datasets can be done using data and set statement, which is more of a union of datasets.

Base SAS Programming – Day 2

  1. Proc Contents – other option
    • Position – to list the variables in alphabetic order as well as in order of creation in the dataset.
    • _all_ option – to list out the contents of all the files / datasets in a library
    • nods option – used only with _all_ , to list out only the file/dataset names in a library
  2. Data step:
    • Creating a copy of a dataset using data and set statement.
    • Keep and Drop options – to modify a dataset by retaining only the required variables in it. Can be used in both Data and Set statements.
    • Rename and Label statements to change the name of the variable and provide a brief description to it.
    • Using Firstobs and obs option in Data step
  3. Proc Print: Options
    • Firstobs and Obs – to specify the number of observations (based on observation number) to be printed in the result / output window
    • Label – Prints the label of the variable in place of the variable name in the output.
    • n – specifies the  number of observations in the result