SAS
Contents
- 1 Basics
- 2 DATA steps
- 3 PROCS
- 4 Macros
- 5 Enterprise Miner
Basics
PROC FORMAT
proc format;
    value cfmt 0 = "Non-Charter" 1 = "Charter";
    value ufmt 0 = "Rural" 1 = "Urban";
run;
Example with string data
PROC FORMAT;
    VALUE $soil_frmt 'STP' = 'Reconstructed prairie'
                     'REM' = 'Remnant prairie'
                     'CUL' = 'Cultivated land';
    VALUE $sterile_frmt 'Y' = 'yes' 'N' = 'no';
    VALUE $species_frmt 'L' = 'Leadplant' 'C' = 'Coneflower';
RUN;
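Defining a format does nothing until it is attached to a variable. A minimal sketch of applying the numeric formats, assuming a made-up toy data set (schools_demo and its values are hypothetical, not from the course data):

```sas
/* Hypothetical toy data: three schools with 0/1 indicator codes */
data schools_demo;
    input school charter urban;
    datalines;
1 0 1
2 1 1
3 0 0
;
run;

/* Same numeric formats as defined above, repeated so the sketch is self-contained */
proc format;
    value cfmt 0 = "Non-Charter" 1 = "Charter";
    value ufmt 0 = "Rural"       1 = "Urban";
run;

/* FORMAT attaches the labels for display only; the stored values stay 0/1 */
proc freq data=schools_demo;
    tables charter * urban;
    format charter cfmt. urban ufmt.;
run;
```

Note the trailing period when a format is referenced (cfmt.), which is not written when the format is defined.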
PROC IMPORT
PROC IMPORT OUT= WORK.Charter_Wide
    DATAFILE= "G:\data\Classes\ST775\BYSH\BYSH Data sets and R scripts\Ch9 Two- Level Longitudinal\chart_wide_condense-SAS.csv"
    DBMS=CSV REPLACE;
    GETNAMES=YES;
RUN;
- load from file
- limit number of lines on import
- missover option
- import from excel
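A rough sketch of the items in the list above. The file paths, column names, and sheet name are all placeholders (assumptions, not real files), and DBMS=XLSX needs the SAS/ACCESS Interface to PC Files to be licensed:

```sas
/* Hypothetical path: replace with a real CSV file */
filename rawcsv "C:\temp\example.csv";

/* DATA step read. FIRSTOBS=2 skips the header line and OBS=101 stops after
   line 101, so at most 100 data rows are read. MISSOVER sets the remaining
   variables to missing when a line is short, instead of reading the next line. */
data first100;
    infile rawcsv dsd dlm=',' firstobs=2 obs=101 missover;
    input id name :$20. score;   /* hypothetical columns */
run;

/* Excel import (hypothetical workbook and sheet name) */
proc import out=work.from_excel
    datafile="C:\temp\example.xlsx"
    dbms=xlsx replace;
    sheet="Sheet1";
    getnames=yes;
run;
```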
Functions
- LOG - natural logarithm
- LAG - a function used in data step
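- PROBNORM - area under the standard normal curve to the left of z, i.e. P(Z < z)
- For X ~ N(mu, sigma^2): P(X < x) = probnorm((x - mu) / sigma); e.g. with mu = 12.3 and sigma = 2.3, P(X < 10) = probnorm((10 - 12.3) / 2.3)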
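The three functions above can be tried in a throwaway DATA step. This is only a sketch: the data set name lag_demo and the four numbers in it are made up for illustration.

```sas
data _null_;
    /* LOG is the natural logarithm, so log(exp(1)) is 1 */
    ln_e = log(exp(1));
    /* P(X < 10) for X ~ N(12.3, 2.3**2): z = (10 - 12.3) / 2.3 = -1,
       and the area to the left of z = -1 is about 0.1587 */
    p = probnorm((10 - 12.3) / 2.3);
    put ln_e= p= 6.4;   /* written to the log */
run;

data lag_demo;
    input y @@;
    prev_y = lag(y);      /* y from the previous observation; missing on row 1 */
    diff   = y - prev_y;  /* first difference */
    datalines;
3 5 9 10
;
run;
```

LAG keeps its own queue of past values, so it is safest to call it unconditionally on every row rather than inside an IF.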
Other stuff
- distribution functions
- TITLE
- PROC EXPORT
- FILENAME infile "W:\SAS_projects\ST775\seeds2.csv";
- LIBNAME musicraw "W:\SAS_projects\ST775\music";
DATA steps
- DATA name;
- INFILE 'K:\LG\iicbu\IICBU\colettace\SAS_examples\datasets\running.dat';
- INPUT class 1 sex $ 3 race1_min 5 race1_sec 7-8 race2_min 10 race2_sec 12-13;
- LABEL gestage="Gestational Age (days)" bweight="Birth Weight (grams)";
- WHERE
- SET other_data1 other_data2 other_data3;
- BY varname;
- DO
- DROP varname1 varname2;
- KEEP varname3 varname4;
- put
- FORMAT urban ufmt. charter cfmt.;
- MERGE dataset1 dataset2; BY variablename;
- MERGE Charter_Long nonc_2 (in = notc);
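A small sketch of a match-merge using the IN= data set option, with two made-up data sets (left_demo and right_demo are hypothetical). IN= creates a temporary 0/1 flag that says whether the current row received a contribution from that data set:

```sas
data left_demo;
    input id x;
    datalines;
1 10
2 20
3 30
;
run;

data right_demo;
    input id y;
    datalines;
2 200
3 300
4 400
;
run;

/* Both inputs are already in id order; otherwise run PROC SORT on each first */
data both_demo;
    merge left_demo (in = in_left) right_demo (in = in_right);
    by id;
    if in_left and in_right;   /* keep only ids found in both: 2 and 3 */
run;
```

Changing the last condition to "if in_left;" would give the equivalent of a left join.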
IF
- IF varname ^= .;
- subsetting IF: keeps only the rows where the condition is true (here, rows where varname is not missing)
- THEN
- ELSE
IF instrument = "orche" THEN orch = 1;
ELSE orch = 0;
IF perform_type = "Large Ensemble" THEN large = 1;
ELSE large = 0;
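When more than one statement should run per branch, THEN and ELSE can each take a DO ... END block. A sketch with made-up data (flags_demo, score, and label_txt are hypothetical names):

```sas
data flags_demo;
    input instrument $ score;
    if score ^= .;                      /* subsetting IF: drop rows with a missing score */
    if instrument = "orche" then do;    /* DO ... END groups several statements */
        orch = 1;
        label_txt = "orchestral";
    end;
    else do;
        orch = 0;
        label_txt = "other";
    end;
    datalines;
orche 5
piano .
voice 7
;
run;
```

The piano row is dropped by the subsetting IF before the IF/THEN/ELSE logic is reached.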
PROCS
PROC PRINT
PROC SORT
Syntax:
PROC SORT <collating-sequence-option> <other option(s)>;
    BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;
The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables. The SORT procedure either replaces the original data set or creates a new data set. PROC SORT produces only an output data set.
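A short sketch of the two behaviours described above, using the sashelp.class sample data set; the output data set names are arbitrary:

```sas
/* OUT= writes a sorted copy and leaves the input untouched.
   Without OUT=, PROC SORT tries to replace the input data set in place. */
proc sort data=sashelp.class out=class_sorted;
    by sex descending height;   /* ascending sex, tallest first within sex */
run;

/* NODUPKEY keeps only the first observation for each BY value */
proc sort data=sashelp.class out=one_per_sex nodupkey;
    by sex;
run;
```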
PROC UNIVARIATE
Example: Test for normality
title3 "Test whether the systolic BP for entire group is normally distributed";
proc univariate data=problem5_3 normal;
    var sys_bp;
    histogram sys_bp / normal midpoints=90 to 140 by 2.5;
    probplot / square;
run;
Other Example
proc univariate data=merged noprint;
    by gender notsorted;
    var height weight;
    histogram height weight;
run;
PROC BOXPLOT
PROC BOXPLOT data=WORK.chart_long; PLOT MathAvgScore * charter / GRID HORIZONTAL BOXSTYLE=SCHEMATIC; RUN;
PROC MEANS
Descriptive Statistics
proc means data=problem10_3 mean std min max; by sex; var race1_time; run;
Hypothesis testing
* Mean time for girls in race 1 > 78s?;
DATA problem10_3;
    SET problem10_3;   * read the existing rows before adding the new variable;
    test_race1 = race1_time - 78;
RUN;
proc means data=problem10_3 t probt;
    by sex;
    var test_race1;
run;
Documentation
Syntax:
PROC MEANS <option(s)> <statistic-keyword(s)>;
    BY <DESCENDING> variable-1 <... <DESCENDING> variable-n><NOTSORTED>;
    CLASS variable(s) </ option(s)>;
    FREQ variable;
    ID variable(s);
    OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)> <id-group-specification(s)> <maximum-id-specification(s)> <minimum-id-specification(s)> < / option(s)> ;
    TYPES request(s);
    VAR variable(s) < / WEIGHT=weight-variable>;
    WAYS list;
    WEIGHT variable;
The MEANS procedure provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations. For example, PROC MEANS
- calculates descriptive statistics based on moments
- estimates quantiles, which includes the median
- calculates confidence limits for the mean
- identifies extreme values
- performs a t test.
By default, PROC MEANS displays output. You can also use the OUTPUT statement to store the statistics in a SAS data set. PROC MEANS and PROC SUMMARY are very similar.
Example
** Obtain the average of the three math scores by school. ;
PROC MEANS DATA = Charter_Long noprint nway;
    class SchoolNum;
    var AvgMathScore;
    output out=math_mean mean = Mean_Math_Score;
RUN;
* Get Charter and Urban to make plots;
DATA CU;
    set charter_wide;
    keep SchoolNum urban charter;
run;
data math_mean;
    merge math_mean CU;
    by SchoolNum;
RUN;
PROC FREQ
Pivot Table Example
PROC FREQ DATA=music; TABLES orch * large; RUN;
Chi-Squared test of Proportion
proc freq data=problem10_8; table sex / chisq testp = (0.5, 0.5); run;
Documentation
Syntax:
PROC FREQ <options> ;
    BY variables ;
    EXACT statistic-options </ computation-options> ;
    OUTPUT <OUT=SAS-data-set> options ;
    TABLES requests </ options> ;
    TEST options ;
    WEIGHT variable </ option> ;
The FREQ procedure produces one-way to n-way frequency and contingency (crosstabulation) tables. For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ provides stratified analysis by computing statistics across, as well as within, strata.
For one-way frequency tables, PROC FREQ computes goodness-of-fit tests for equal proportions or specified null proportions. For one-way tables, PROC FREQ also provides confidence limits and tests for binomial proportions, including tests for noninferiority and equivalence.
For contingency tables, PROC FREQ can compute various statistics to examine the relationships between two classification variables. For some pairs of variables, you might want to examine the existence or strength of any association between the variables. To determine if an association exists, chi-square tests are computed. To estimate the strength of an association, PROC FREQ computes measures of association that tend to be close to zero when there is no association and close to the maximum (or minimum) value when there is perfect association.
The statistics for contingency tables include the following:
- chi-square tests and measures
- measures of association
- risks (binomial proportions) and risk differences for 2 x 2 tables
- odds ratios and relative risks for 2 x 2 tables
- tests for trend
- tests and measures of agreement
- Cochran-Mantel-Haenszel statistics
BY variables
Syntax: BY variables;
You can specify a BY statement with PROC FREQ to obtain separate analyses of observations in groups that are defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If you specify more than one BY statement, only the last one specified is used.
If your input data set is not sorted in ascending order, use one of the following alternatives:
- Sort the data by using the SORT procedure with a similar BY statement.
- Specify the NOTSORTED or DESCENDING option in the BY statement for the FREQ procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).
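The first alternative (sort, then use the same BY statement) is the usual pattern. A minimal sketch on the sashelp.class sample data; the name class_by_sex is arbitrary:

```sas
/* Sort a copy of the data by the BY variable */
proc sort data=sashelp.class out=class_by_sex;
    by sex;
run;

/* One frequency table of age per value of sex */
proc freq data=class_by_sex;
    by sex;
    tables age;
run;
```

Running the PROC FREQ step against unsorted data with a plain BY statement stops with an error about the data not being sorted in ascending sequence, which is usually the cue to add the PROC SORT step.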
TABLE options
CHISQ option (PROC FREQ, TABLES statement)
Syntax: CHISQ <(chisq-options)>
Requests chi-square tests of homogeneity or independence and measures of association that are based on the chi-square statistic. For two-way tables, the chi-square tests include the Pearson chi-square, likelihood ratio chi-square, and Mantel-Haenszel chi-square tests. The chi-square measures include the phi coefficient, contingency coefficient, and Cramér's V. For 2 x 2 tables, the CHISQ option also provides Fisher's exact test and the continuity-adjusted chi-square test.
For one-way tables, the CHISQ option provides the Pearson chi-square goodness-of-fit test. You can also request the likelihood ratio goodness-of-fit test for one-way tables by specifying the LRCHI chisq-option in parentheses after the CHISQ option. By default, the one-way chi-square tests are based on the null hypothesis of equal proportions. Alternatively, you can provide null hypothesis proportions or frequencies by specifying the TESTP= or TESTF= chisq-option, respectively.
You can specify the following chisq-options in parentheses after the CHISQ option:
- DF=df specifies the degrees of freedom for the chi-square tests. The value of df must not be zero.
- LRCHI requests the likelihood ratio goodness-of-fit test for one-way tables.
- TESTF=(values) | SAS-data-set specifies null hypothesis frequencies for the one-way chi-square goodness-of-fit tests.
- TESTP=(values) | SAS-data-set specifies null hypothesis proportions for the one-way chi-square goodness-of-fit tests.
- WARN=value | (values) controls the warning message for the validity of the asymptotic Pearson chi-square test. By default, PROC FREQ displays a warning message when more than 20% of the table cells have expected frequencies that are less than 5.
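A sketch of both uses of CHISQ on the sashelp.class sample data. The parenthesized chisq-option form assumes a reasonably recent SAS/STAT release; on older releases TESTP= is given as a separate TABLES option, as in the earlier example.

```sas
/* One-way goodness-of-fit test: null proportions 0.5 / 0.5
   for the two levels of sex */
proc freq data=sashelp.class;
    tables sex / chisq(testp=(0.5 0.5));
run;

/* Two-way table: Pearson, likelihood ratio and Mantel-Haenszel chi-square,
   plus phi, the contingency coefficient and Cramer's V */
proc freq data=sashelp.class;
    tables sex * age / chisq;
run;
```

With only 19 observations spread over the sex-by-age table, the second step is likely to trigger the expected-frequency warning described under WARN=.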
PROC CORR
proc corr data=problem15_1 nosimple; var house_size; with family_income; run;
PROC REG
- create a linear model
proc reg data=problem15_1 plots; model peak_hour_load = aircon_capac; plot peak_hour_load * aircon_capac; * Plot of data with fitted line; run;
PROC FMM
PROC FMM data=a (where=(Player=1)); MODEL Score = /k=3 parms( -5 4 13 **GUESSES**); RUN;
PROC SGPLOT
Bar plot
proc SGPLOT data=problem6_8; vbar momeduc; run;
Histogram
title 'Mileage Distribution';
proc sgplot data=sashelp.cars;
    histogram mpg_city;
    density mpg_city / type=normal legendlabel='Normal' lineattrs=(pattern=solid);
    density mpg_city / type=kernel legendlabel='Kernel' lineattrs=(pattern=solid);
    keylegend / location=inside position=topright across=1;
    xaxis display=(nolabel);
run;
Regression plot
proc sgplot data=elephant.data; REG x=AGE y=MATINGS; RUN;
Boxplot
proc sgplot data = Charter_Long; hbox AvgMathScore / group = charter; title 9.2a Average Math Score by Type of School; title2 All Data; run;
Scatterplot with regression line
proc sgplot data = Charter_Long; scatter x = schPctsped y = AvgMathScore; reg x = schPctsped y = AvgMathScore; run;
PROC SGPANEL
Example Spaghetti plot by species with loess fit
PROC SGPANEL DATA=seeds_long;
    WHERE plant <= 71;
    PANELBY plant / COLUMNS=5 ROWS=5 spacing=8;
    SERIES X=time13 y=hgt / GROUP=plant LINEATTRS = (COLOR = gray);
    SCATTER X=time13 y=hgt;
    LOESS X=time13 Y=hgt / lineattrs = (color = black thickness =2);
RUN;
PROC TTEST
- two sample ttest
- pooled ttest
- paired ttest
Example
proc ttest data=question6; class group_cat; var psa psa_ln; run;
Documentation
Syntax:
PROC TTEST <options> ;
    CLASS variable ;
    PAIRED variables ;
    BY variables ;
    VAR variables </ options> ;
    FREQ variable ;
    WEIGHT variable ;
The TTEST procedure performs t tests and computes confidence limits for one sample, paired observations, two independent samples, and the AB/BA crossover design.
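A sketch of the paired, one-sample, and two-sample forms. The bp_demo data set and its numbers are made up for illustration, and the null value 62 is arbitrary; sashelp.class is the sample data set shipped with SAS.

```sas
/* Hypothetical before/after measurements on five subjects */
data bp_demo;
    input before after;
    datalines;
120 115
132 128
118 119
141 135
125 121
;
run;

/* Paired t test on the differences before - after */
proc ttest data=bp_demo;
    paired before*after;
run;

/* One-sample t test of H0: mean height = 62 */
proc ttest data=sashelp.class h0=62;
    var height;
run;

/* Two-sample t test: the output reports both the pooled (equal variance)
   and the Satterthwaite (unequal variance) results */
proc ttest data=sashelp.class;
    class sex;
    var height;
run;
```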
PROC NPAR1WAY
Wilcoxon
proc npar1way data=question6 wilcoxon; class group_cat; var psa psa_ln; run;
PROC MIXED
- PROC MIXED documentation
- "A mixed linear model is a generalization of the standard linear model used in the GLM procedure, the generalization being that the data are permitted to exhibit correlation and nonconstant variability."
- "The mixed linear model, therefore, provides you with the flexibility of modeling not only the means of your data (as in the standard linear model) but their variances and covariances as well."
- "The primary assumptions underlying the analyses performed by PROC MIXED are as follows:
- "The data are normally distributed (Gaussian).
- "The means (expected values) of the data are linear in terms of a certain set of parameters.
- "The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED.
Example
proc mixed data = charter_long noclprint;
    class SchoolNum;
    model AvgMathScore = Year0809 year0910 / s;
    random Int / sub = SchoolNum g gcorr;
    title 9.5.3 Piecewise linear model;
run;
Example2
PROC MIXED DATA=chart_long NOCLPRINT CL COVTEST METHOD=REML;
    CLASS schoolid;
    MODEL MathAvgScore = charter urban schPctfree schPctsped year08
                         charter*year08 urban*year08 schPctsped*year08 / SOLUTION RESIDUAL CL;
    RANDOM INT year08/ SUBJECT=schoolid TYPE=UN G SOLUTION GCORR;
    ODS EXCLUDE WHERE=( _PATH_ ? 'ResidualPlots' );
    ODS EXCLUDE "The Mixed Procedure"."Solution for Random Effects";
RUN;
Example3
PROC MIXED DATA=music_final_model NOCLPRINT CL COVTEST METHOD=REML;
    CLASS id;
    MODEL na = previous students juried public solo mpqpem mpqab orch mpqnem mpqnem*solo / SOLUTION RESIDUAL CL;
    RANDOM INT previous public / SUBJECT=id TYPE=UN G SOLUTION GCORR;
RUN;
PROC GENMOD
PROC GENMOD DATA=elephant.data; * CLASS ; MODEL matings=age age2/ DIST=poisson LINK=log; RUN;
Quasi-likelihood
PROC GENMOD DATA=elephant_quad_model; MODEL MATINGS=AGE / DIST=poisson LINK=log DSCALE; RUN;
PROC SQL
PROC SQL ; CREATE TABLE elephant_summary AS SELECT AGE, MEAN(MATINGS) AS MEAN_MATINGS FROM elephant.data GROUP BY AGE; QUIT;
PROC SURVEYSELECT
proc surveyselect data = nonc out = nonc_2 method = srs seed = 275214 SAMPSIZE=80; run;
Macros
Simple Example
%macro mytest( indep_var );
    proc freq data=skyline;
        table gender * &indep_var / chisq;
        *table var2 / chisq cellchi2;
    run;
%mend mytest;
%mytest( compare );
%mytest( argumentation );
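Instead of calling a macro once per variable, the calls can be generated by a %DO loop over a space-separated list. %DO and %LOCAL are only valid inside a macro definition, so the loop needs its own wrapper. A sketch, where freq_each and its parameter names are made up for illustration:

```sas
%macro freq_each(dsn, varlist);
    %local i this_var;
    /* COUNTW counts the words in the list; %SCAN pulls out the i-th one */
    %do i = 1 %to %sysfunc(countw(&varlist));
        %let this_var = %scan(&varlist, &i);
        proc freq data=&dsn;
            tables &this_var;
        run;
    %end;
%mend freq_each;

/* One PROC FREQ step per variable in the list */
%freq_each(sashelp.class, sex age);
```

List items that themselves contain commas or blanks need a different delimiter (passed as the third argument of %SCAN and COUNTW) or macro quoting, since the default delimiters would split them.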
Crazy Example
%macro sphyg_mixed(Y, YNAME, X, X2, FILE);
    ODS TRACE ON;
    ODS PDF FILE="K:\LG\iicbu\IICBU\colettace\SAS_projects\SardiNIA\output\&FILE.all_output.pdf" STYLE=HTMLEncore;
    TITLE1 Mixed Effects Model Results for &YNAME. for men and women combined.;
    PROC MIXED DATA=sphygdat NOCLPRINT NOITPRINT MAXFUNC=400 COVTEST;
        CLASS id_individual Sex machine_ver;
        MODEL &Y = &X &X2 / SOLUTION RESIDUAL CL OUTPRED=&Y._pred OUTPREDM=&Y._predm; * DDFM=KENWARDROGER ;
        RANDOM INT Time / SUBJECT=id_individual TYPE=UN G SOLUTION GCORR;
        ODS OUTPUT solutionf=&Y._sf(rename=(estimate=&Y._fe));
        ODS OUTPUT solutionr=&Y._sr(rename=(estimate=&Y._re));
        ODS EXCLUDE "The Mixed Procedure"."Solution for Random Effects";
    RUN;
    PROC EXPORT DATA=&Y._sf
        OUTFILE= "K:\LG\iicbu\IICBU\colettace\SAS_projects\SardiNIA\output\&FILE._&Y._solutionf.csv"
        DBMS=csv REPLACE;
    RUN;
    PROC EXPORT DATA=&Y._sr
        OUTFILE= "K:\LG\iicbu\IICBU\colettace\SAS_projects\SardiNIA\output\&FILE._&Y._solutionr.csv"
        DBMS=csv REPLACE;
    RUN;
    DATA sphyglib.&Y._predm;
        SET &Y._predm;
    RUN;
    * save/reload results to/from disk, needed this for some reason;
    DATA sphyglib.&Y._pred;
        SET &Y._pred;
        KEEP id_individual Wave reading agegroup &X &Y pred resid StdErrPred;
    RUN;
    DATA &Y._pred;
        SET sphyglib.&Y._pred;
        FORMAT Sex sex_frmt.;
        FORMAT agegroup agegroup_frmt.;
    RUN;
    *TITLE1 'Contents of &YNAME. dataset AFTER running model';
    *PROC CONTENTS DATA=&Y._pred;
    *RUN;
    TITLE1 &YNAME. Model Checks;
    TITLE2 Correlation between obs and pred values from LME model;
    PROC CORR DATA = &Y._pred;
        VAR &Y pred;
    RUN;
    goptions htext = 2 hby = 2;* colors = (black);
    symbol1 cv=black v=dot height = 0.5 i=none;
    axis1 label = (a=90 'Observed');
    axis2 label = ('Predicted');
    PROC GPLOT DATA=&Y._pred;
        PLOT &Y * pred / vaxis = axis1 haxis = axis2;
        TITLE Observed vs. Predicted for &YNAME;
    RUN;
    QUIT;
    goptions htext = 2 hby = 2;* colors = (black);
    symbol1 cv=black v=dot height = 0.5 i=none;
    axis1 label = (a=90 'Residual');
    axis2 label = ('Predicted');
    PROC GPLOT DATA=&Y._pred;
        PLOT resid*pred / vref = 0 vaxis = axis1 haxis = axis2;
        TITLE Residuals vs. Predicted for &YNAME;
    RUN;
    QUIT;
    PROC REG DATA=&Y._pred PLOTS=NONE;
        MODEL pred = &Y / RSQUARE RMSE;
    RUN;
    QUIT;
    ODS PDF CLOSE;
    ODS TRACE OFF;
%mend sphyg_mixed;

%let covars = machine_ver Sex fage fage2 Time;
%let covars2 = fAge*Time;
%let other_covars = exmWeight exmHeight exmBMI exmWaist pwv labsGlicemia labsHdl labsTrigliceridi labsColesterolo;
%let exp_var_name_list= pwv_ln SP C_SP P_SP DP C_DP P_DP HR P_MEANP C_MEANP;
%let exp_var_desc_list = ln(PWV), Systolic Pressure, Central Systolic Pressure, Peripheral Systolic Pressure, Diastolic Pressure, Central Diastolic Pressure, Peripheral Diastolic Pressure, Heart Rate, Peripheral Mean Pressure, Central Mean Pressure;
/* macro function signature %macro sphyg_mixed(Y, YNAME, X, X2, FILE); */
%local i this_var this_description;
%do i=1 %to %sysfunc( countw( &exp_var_name_list ) );
    %let this_var = %scan( &exp_var_name_list, &i );
    %let this_description = %scan( &exp_var_desc_list, &i );
    %sphyg_mixed( &this_var, &this_description, &covars &other_covars, &covars2 , &this_var._model2 );