Categorical Analysis using SAS
Discussion Board Assignment for Ordinal Variables:
Use the class survey data (see also the list of variables and list of values) to examine the relationship between two ordinal variables, or between an ordinal variable and a binary variable. The ordinal variable may be one that you have constructed from a continuous variable (this will certainly be the case if you are using the class survey data). Compare the results from an appropriate testing method that accounts for the ordinal nature of the data with the results from the Pearson chi-square test. Write a one- or two-sentence lay summary of your findings.
EXAMPLE.
I decided to look at the number of pairs of shoes owned in relation to whether one prefers to vacation at the beach or by hiking. I converted the number of pairs of shoes into an ordinal variable in the data step (presented below). I also created a new variable called “Pref” with outcomes coded as “Beach” or “Hiking”. Those who answered “neither” in the BeachHiking variable are left missing for “Pref,” so they are not included in the analyses.
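data new_survey;
  set class.class_survey;
  /* Without an explicit length, Pref would get length 5 from 'Beach' */
  /* and 'Hiking' would be truncated to 'Hikin'. */
  length Pref $6;
  /* Ordinal shoe categories: 1 = under 10, 2 = 10-14, 3 = 15-19, 4 = 20 or more pairs */
  if shoes = . then cat_shoes = .;
  else if shoes GE 20 then cat_shoes = 4;
  else if shoes GE 15 then cat_shoes = 3;
  else if shoes GE 10 then cat_shoes = 2;
  else cat_shoes = 1;
  label cat_shoes = "Quartiles of Numbers of Pairs of Shoes";
  /* Pref stays missing (blank) for any other value of BeachHiking, including "neither" */
  if BeachHiking = 1 then Pref = 'Beach';
  else if BeachHiking = 2 then Pref = 'Hiking';
run;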
Alternatively, I could have added the statement “where not(BeachHiking=0);” to my proc freq step and used the variable BeachHiking itself as the response. However, I liked having the outcomes labeled as “Beach” and “Hiking” rather than as 1 and 2.
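The alternative mentioned above could be sketched roughly as follows. This is only an illustration: the format name prefmt is hypothetical, and the sketch assumes BeachHiking is numeric with 1 = beach and 2 = hiking, as the data step implies. Attaching a format is one way to keep the “Beach” and “Hiking” labels without creating a new variable.

```sas
/* Hypothetical format (prefmt) that labels the numeric codes */
proc format;
  value prefmt 1 = 'Beach'
               2 = 'Hiking';
run;

proc freq data=new_survey;
  where not(BeachHiking=0);       /* drop those who answered "neither" */
  format BeachHiking prefmt.;     /* display 1 and 2 as Beach and Hiking */
  tables cat_shoes*BeachHiking / nopercent nocol chisq cmh trend;
run;
```

The analysis in this example uses the Pref variable from the data step instead.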
Then I ran the following code:
proc freq data=new_survey;
table cat_shoes*Pref/nopercent nocol chisq cmh trend;
run;
Relevant output is below:
Table of cat_shoes by Pref
cat_shoes (Quartiles of Numbers of Pairs of Shoes) | | Pref: Beach | Pref: Hiking | Total |
1 | Frequency | 10 | 19 | 29 |
 | Row Pct | 34.48 | 65.52 | |
2 | Frequency | 18 | 9 | 27 |
 | Row Pct | 66.67 | 33.33 | |
3 | Frequency | 15 | 7 | 22 |
 | Row Pct | 68.18 | 31.82 | |
4 | Frequency | 26 | 16 | 42 |
 | Row Pct | 61.90 | 38.10 | |
Total | Frequency | 69 | 51 | 120 |
Frequency Missing = 14
Statistic | DF | Value | Prob |
Chi-Square | 3 | 8.5761 | 0.0355 |
Likelihood Ratio Chi-Square | 3 | 8.5686 | 0.0356 |
Mantel-Haenszel Chi-Square | 1 | 3.8745 | 0.0490 |
Phi Coefficient | | 0.2673 | |
Contingency Coefficient | | 0.2583 | |
Cramer’s V | | 0.2673 | |
Cochran-Armitage Trend Test | |
Statistic (Z) | 1.9766 |
One-sided Pr > Z | 0.0240 |
Two-sided Pr > |Z| | 0.0481 |
Cochran-Armitage Trend Test:
ASSUMPTIONS: Large sample size, independent random sample, explanatory variable is ordinal.
H0: No linear trend in the proportion preferring the beach across the ranked categories of owned pairs of shoes vs. HA: There is an increasing or decreasing trend in the proportion preferring the beach across the ranked categories of owned pairs of shoes
TEST STAT: Z=1.9766
Two-sided P-value: 0.048
CONCLUSION: Reject the null hypothesis at the 0.05 significance level. There is a significant trend in preference for beach over hiking across the ranked categories of number of pairs of shoes owned (p=0.048).
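As a check on where Z=1.9766 comes from, the statistic can be reproduced by hand from the frequency table. The sketch below assumes equally spaced scores s_i = 1, 2, 3, 4 (the values of cat_shoes, which is what PROC FREQ uses by default for a numeric row variable); n_i are the row totals, n_{i1} the Beach counts, and p = 69/120 is the overall proportion preferring the beach.

```latex
\[
Z = \frac{\sum_i n_{i1}\,(s_i - \bar{s})}
         {\sqrt{p\,(1-p)\,\sum_i n_i\,(s_i - \bar{s})^2}},
\qquad
\bar{s} = \frac{29(1) + 27(2) + 22(3) + 42(4)}{120} = \frac{317}{120} \approx 2.6417
\]
\[
\sum_i n_{i1}(s_i - \bar{s}) = \bigl(10 + 36 + 45 + 104\bigr) - 69\,(2.6417) = 195 - 182.275 = 12.725
\]
\[
\sum_i n_i (s_i - \bar{s})^2 = 1007 - \frac{317^2}{120} \approx 169.59,
\qquad
Z \approx \frac{12.725}{\sqrt{(0.575)(0.425)(169.59)}} \approx \frac{12.725}{6.438} \approx 1.977
\]
```

With the same scores, the Mantel-Haenszel chi-square in the output equals (n-1)/n times Z squared, that is (119/120)(1.9766)^2, or about 3.87, which is why the two p-values (0.0490 and 0.0481) are nearly identical.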
Pearson Chi-square:
ASSUMPTIONS: Large sample size, independent random sample.
H0: No relationship between the two variables vs. HA: Preference for the beach depends on the category of owned pairs of shoes (the ordering of the categories is not used)
TEST STAT: observed chi-square=8.58 ~ chi-square with 3 d.f.
P-value: 0.036
CONCLUSION: Reject the null hypothesis at the 0.05 significance level. There is a significant association between number of pairs of shoes owned and preference for beach over hiking (p=0.036).
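The large-sample assumption listed above is usually judged from the expected cell counts. One way to display them, sketched here under the assumption that the new_survey data set from the data step is available, is to add the EXPECTED option to the TABLES statement:

```sas
proc freq data=new_survey;
  /* EXPECTED prints the expected count under independence in each cell */
  tables cat_shoes*Pref / expected norow nocol nopercent chisq;
run;
```

From the frequencies shown in the table, the smallest expected count is 22 × 51 / 120 = 9.35 (category 3, hiking), so all expected counts are comfortably above the usual rule of thumb of 5.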
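Note that the two tests provide very similar results. The trend test uses a single degree of freedom and is most sensitive to a steady increase or decrease across the categories, whereas the Pearson chi-square (3 d.f.) can detect any pattern of differences. Here there appears to be a threshold effect rather than a steady trend: the rate of beach preference is very different for those in the first quartile (34.5%) than for those in the upper three quartiles of shoe ownership (61.9% to 68.2%), which may explain why the Pearson p-value (0.036) is slightly smaller than the trend p-value (0.048).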
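The threshold pattern noted above could be explored by collapsing the upper three categories and comparing them with the lowest one. The following is a minimal sketch only: the data set name shoes_threshold and the indicator low_shoes are hypothetical, and because the cutpoint was chosen after looking at the data, any p-value from this comparison should be read as exploratory rather than confirmatory.

```sas
data shoes_threshold;
  set new_survey;
  /* low_shoes (hypothetical): 1 = lowest category, 0 = upper three categories */
  if cat_shoes = . then low_shoes = .;
  else low_shoes = (cat_shoes = 1);
run;

proc freq data=shoes_threshold;
  tables low_shoes*Pref / nopercent nocol chisq;
run;
```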
Lay summary, as I would write it in a paper, based on the trend test, which I chose a priori:
There was a significant association between number of pairs of shoes owned and whether one preferred vacationing at the beach or hiking (p=0.048). Only 34.5% of those in the lowest quartile of number of pairs of shoes owned preferred the beach, compared with 61.9% to 68.2% of those in the upper three quartiles.