Base SAS Programming – Day 4

Sorting and De-duping datasets in SAS

  1. Proc Sort
    • By statement is used in Proc Sort to name the variable(s) by which the observations of a dataset are sorted
    • By default, the specified variable(s) is sorted in ascending order
    • Descending keyword is placed in the ‘By’ statement immediately before a variable to sort by that variable in descending order
    • Only the variable that immediately follows ‘descending’ is sorted in descending order; the rest of the By variables, if any, are still sorted in ascending order
    • Noduprecs option – removes duplicate observations, i.e. observations in which the values of all variables match those of the previous observation written to the output
    • Nodupkey option – keeps only the first observation for each unique combination of values of the variable(s) specified in the By statement
  2. Proc Print
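    • By and Sum statements in Proc Print – By prints the (sorted) data in groups, and Sum adds totals for the listed numeric variables, giving a subtotal per By group and a grand total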
  3. Proc Append
    • To append SAS datasets is to stack the observations of one dataset below those of another. It is a vertical concatenation of rows, not a horizontal join.
    • The datasets being appended must have the same variables (same data type and variable name)
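    • Force option – to append datasets whose variables do not match (the Data= dataset has variables that are not in the Base= dataset, or variables with a different type or a longer length); variables missing from the Base= dataset are dropped
    • Datasets can also be appended using a Data step with a Set statement that lists both datasets, which is more of a union of the datasets.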
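
As a rough illustration of the Day 4 topics above, here is a minimal sketch. The datasets work.sales and work.new_sales, their variables (region, rep, amount) and all values are hypothetical and made up for this example; they are not taken from the video.

```sas
/* Hypothetical sample data; the second "East Ann 100" row is an exact duplicate */
data work.sales;
    input region $ rep $ amount;
    datalines;
East Ann 100
West Bob 250
East Ann 100
East Cara 300
West Dan 150
;
run;

/* A second hypothetical dataset with the same variables */
data work.new_sales;
    input region $ rep $ amount;
    datalines;
North Eve 400
;
run;

/* Sort by region (ascending by default), then by amount descending within region */
proc sort data=work.sales out=work.sales_sorted;
    by region descending amount;
run;

/* NODUPRECS: sorting by every variable places identical rows next to each other,
   so the repeated "East Ann 100" row is dropped */
proc sort data=work.sales out=work.no_dup_rows noduprecs;
    by region rep amount;
run;

/* NODUPKEY: keep only the first observation for each distinct region */
proc sort data=work.sales out=work.one_per_region nodupkey;
    by region;
run;

/* BY and SUM in PROC PRINT: subtotal of amount per region plus a grand total.
   The input must already be sorted by the BY variable. */
proc print data=work.sales_sorted;
    by region;
    sum amount;
run;

/* Data step alternative: SET with two datasets stacks them into a new dataset */
data work.all_sales;
    set work.sales work.new_sales;
run;

/* PROC APPEND: add the rows of new_sales to the end of the existing sales dataset */
proc append base=work.sales data=work.new_sales;
run;
```

Note that Proc Append modifies the Base= dataset in place, while the Data step version reads both inputs and writes a new dataset. Without the Out= option, Proc Sort would likewise overwrite the input dataset with the sorted result.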

Base SAS Programming – Day 1

The Base SAS video series assumes that the viewer has no background in SAS programming and little or no prior exposure to programming of any kind. The series comprises 9 video lectures (about an hour and a half each on average), plus additional videos covering advanced topics such as PROC SQL and SAS Macros.

Day 1 topics are listed below:

  1. Intro to Libraries, Data step and Proc step in SAS
  2. Data step example:
    • Creating a sample dataset using Datalines / Cards statement
    • Understanding data types in SAS
    • Informat and Format
    • Label
  3. Proc step example: Overview
    • Proc Contents – to know the dataset structure, with list of variables and number of observations
      • Varnum option – to list the variables of a dataset in creation order (otherwise the list is in alphabetic order)
    • Proc Print – to view the data in a dataset.
      • Label option and Var statement
    • Proc Means – to extract basic statistics for the numeric variables in a dataset; by default these are N, Mean, Standard Deviation, Minimum and Maximum
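
The Day 1 topics can be sketched in one short program, shown here as an illustrative example only. The dataset work.employees, its variables and its values are hypothetical and made up for this sketch; it uses the temporary WORK library, so no library assignment is needed.

```sas
/* Data step: create a small sample dataset from in-stream data (hypothetical values) */
data work.employees;
    /* name is character ($); hire_date is read with the date9. informat; salary is numeric */
    input name $ hire_date :date9. salary;
    /* Formats control how the stored values are displayed */
    format hire_date date9. salary dollar10.;
    /* Labels give the variables descriptive names for reports */
    label name      = "Employee Name"
          hire_date = "Hire Date"
          salary    = "Annual Salary";
    datalines;
Ann 01JAN2020 52000
Bob 15MAR2019 61000
Cara 30JUN2021 48000
;
run;

/* Proc Contents: dataset structure; VARNUM lists variables in creation order */
proc contents data=work.employees varnum;
run;

/* Proc Print: LABEL shows labels as column headings, VAR selects the columns */
proc print data=work.employees label;
    var name salary;
run;

/* Proc Means: default statistics (N, Mean, Std Dev, Minimum, Maximum) for salary */
proc means data=work.employees;
    var salary;
run;
```

The informat (date9. in the Input statement) tells SAS how to read the raw text, while the format (date9. and dollar10. in the Format statement) only changes how the stored numeric value is displayed. The Var statement in Proc Means is used here because hire_date is also stored as a number and would otherwise be summarized as well.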