SAS CODE
%web_drop_table(WORK.GAPMINDER);
PROC IMPORT DATAFILE="/home/mst07221/gapminder.csv"
DBMS=CSV
OUT=WORK.GAPMINDER;
GETNAMES=YES;
RUN;
***********************************************************************
PROC STANDARD: creates a standardized copy (mean 0, standard deviation 1) of the explanatory variable.
PROC MEANS: confirms that the standardization worked
**********************************************************************;
DATA gapminder2;
SET WORK.GAPMINDER;
zurbanrate = urbanrate;
RUN;
PROC STANDARD DATA=gapminder2 MEAN=0 STD=1 OUT=zgapminder2;
VAR zurbanrate;
RUN;
PROC MEANS DATA=zgapminder2;
VAR zurbanrate urbanrate femaleemployrate;
RUN;
***********************************************************************
BASIC LINEAR REGRESSION. Generate a scatter plot.
**********************************************************************;
PROC SGPLOT DATA=zgapminder2;
reg x=zurbanrate y=femaleemployrate / lineattrs=(color=blue thickness=2);
title "Scatterplot for the Association Between Urban Rate and Female Employment Rate";
yaxis label="Female Employment Rate";
xaxis label="Standardized Urban Rate";
RUN;
title;
***********************************************************************
The GLM procedure uses the method of least squares to fit general linear models.
The SOLUTION option produces parameter estimates.
**********************************************************************;
PROC GLM DATA=zgapminder2;
model femaleemployrate=zurbanrate/solution;
RUN;
%web_open_table(WORK.GAPMINDER);
RESULTS
The MEANS Procedure
Variable | N | Mean | Std Dev | Minimum | Maximum |
---|---|---|---|---|---|
zurbanrate | 203 | 7.328566E-16 | 1.0000000 | -1.9446211 | 1.8129907 |
urbanrate | 203 | 56.7693596 | 23.8449326 | 10.4000000 | 100.0000000 |
femaleemployrate | 178 | 47.5494381 | 14.6257425 | 11.3000002 | 83.3000031 |
The GLM Procedure
Number of Observations Read | 213 |
---|---|
Number of Observations Used | 173 |
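The gap between observations read and observations used comes from rows that are missing a value for either model variable. A minimal sketch for checking this, assuming the zgapminder2 data set created by the program is still in the WORK library; the column aliases n_read and n_complete are illustrative names, not part of the original program:

```sas
/* Count all rows and the rows where both model variables are present. */
/* CMISS returns the number of missing arguments, so 0 means complete. */
PROC SQL;
  SELECT COUNT(*) AS n_read,
         SUM(CMISS(zurbanrate, femaleemployrate) = 0) AS n_complete
  FROM zgapminder2;
QUIT;
```

If the data are unchanged, the two counts should line up with the observations read and used that PROC GLM reports.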
The GLM Procedure
Dependent Variable: femaleemployrate
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 1 | 3446.74019 | 3446.74019 | 17.28 | <.0001 |
Error | 171 | 34100.73113 | 199.41948 | | |
Corrected Total | 172 | 37547.47132 | | | |
R-Square | Coeff Var | Root MSE | femaleemployrate Mean |
---|---|---|---|
0.091797 | 29.57241 | 14.12160 | 47.75260 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
zurbanrate | 1 | 3446.740194 | 3446.740194 | 17.28 | <.0001 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
zurbanrate | 1 | 3446.740194 | 3446.740194 | 17.28 | <.0001 |
Parameter | Estimate | Standard Error | t Value | Pr > |t| |
---|---|---|---|---|
Intercept | 47.72703970 | 1.07366269 | 44.45 | <.0001 |
zurbanrate | -4.58870848 | 1.10374814 | -4.16 | <.0001 |
- Of the 213 observations read, 173 were used in the model; the other 40 were dropped because they are missing a value for femaleemployrate or urbanrate.
- The overall F test is significant (F = 17.28, p < 0.0001). We can reject the null hypothesis of no association and conclude that urban rate is significantly associated with the female employment rate.
- The parameter estimates show a slope of -4.59 and an intercept of 47.73 (beta0 = 47.73, beta1 = -4.59).
- Therefore the best-fit line is: femaleemployrate = 47.73 - 4.59 * zurbanrate, where zurbanrate is the standardized urban rate. A one-standard-deviation increase in urban rate (about 23.8 in the original urbanrate units) is associated with a 4.59-point decrease in the female employment rate. This negative association is also evident in the fit plot.
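- The p-values for both the intercept and the slope are very small (both p < 0.0001). The slope is therefore significantly different from zero, which supports a linear association between femaleemployrate and urbanrate. The p-value alone does not show that a straight line is the best functional form.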
- The R-square value of 0.092 means that the explanatory variable accounts for only 9.2% of the variance in the response variable. Most of the variation in femaleemployrate is left unexplained, so other variables not included in this model, some of which may confound the relationship with urbanrate, likely play a role.
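The figures quoted in the points above can be recomputed from the printed output. The following is a minimal sketch, not part of the original program: it copies the sums of squares, error mean square, slope, and the urbanrate standard deviation from the GLM and MEANS tables, and the variable names (ss_model, slope_raw, and so on) are illustrative.

```sas
/* Recompute R-square, F, and the raw-unit slope from the printed output. */
DATA _NULL_;
  ss_model = 3446.74019;    /* Model sum of squares (GLM table)        */
  ss_total = 37547.47132;   /* Corrected total sum of squares          */
  ms_error = 199.41948;     /* Error mean square                       */
  slope_z  = -4.58870848;   /* Slope for the standardized predictor    */
  sd_urban = 23.8449326;    /* Std Dev of urbanrate (MEANS table)      */

  r_square  = ss_model / ss_total;   /* share of variance explained    */
  f_value   = ss_model / ms_error;   /* model DF = 1, so MS = SS       */
  slope_raw = slope_z / sd_urban;    /* change per one unit of urbanrate */

  PUT r_square= 8.6 f_value= 8.2 slope_raw= 8.4;
RUN;
```

The first two values written to the log should agree with the R-Square and F Value in the GLM output. The third expresses the slope in the original urbanrate units: roughly -0.19, meaning each additional unit of urban rate is associated with about a 0.19-point lower female employment rate.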