
**Question 1:**

(15 Points) Using the Baylor dataset (available on Canvas) in STATA, select two __categorical__ variables that you think may be correlated.

(a) Write a sentence or two explaining the two variables you selected and why you think they may be correlated. Note which one you think of as the independent variable and which one as the dependent variable.

(b) Write a null hypothesis and an alternative hypothesis. Report the alpha-level you will use.

(c) Perform a chi-square test of independence.

(d) Present the full table, including the expected values and the row, column and table totals. Present the test statistic and the p-value. Report whether you reject or retain the null hypothesis. Write a few sentences interpreting these results.
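The expected counts and test statistic in part (d) follow from simple arithmetic: each expected count is (row total × column total) / N, and the statistic sums (observed - expected)² / expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. In Stata this is typically requested with `tabulate var1 var2, chi2 expected`, where `var1` and `var2` stand in for your chosen variables. As a minimal sketch for checking the logic, the standard-library Python below reproduces the uncorrected Pearson calculation on a hypothetical 2×2 table; the counts are invented and are not from the Baylor dataset.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    y = x / 2.0
    # This equals the regularized upper incomplete gamma Q(df/2, x/2);
    # start from a closed-form base case and step up by 1 in the shape parameter.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        if y > 0:
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q

def chi_square_test(table):
    """Pearson chi-square test of independence on a table of observed counts.

    Returns (expected counts, statistic, degrees of freedom, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # table total
    # Expected count for each cell = row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, stat, df, chi2_sf(stat, df)

# Hypothetical counts: rows = categories of the independent variable,
# columns = categories of the dependent variable
observed = [[30, 20], [10, 40]]
expected, stat, df, p = chi_square_test(observed)
print(expected, round(stat, 3), df, p)
```

For this invented table every expected count is 20 or 30, the statistic is 50/3 ≈ 16.67 with 1 degree of freedom, and the p-value falls well below .05. The p-value helper uses the standard recurrence for the incomplete gamma function instead of a statistics library, so it only handles integer degrees of freedom.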

**Question 2:**

(15 Points) Using the WDI dataset (available on Canvas) in STATA, select two variables that are both measured at the __ratio level__ that you think may be correlated.

(a) Write a sentence or two explaining the two variables you selected and why you think they may be correlated. Note which one you think of as the independent variable and which one as the dependent variable.

(b) Write a null hypothesis and an alternative hypothesis. Report the alpha-level you will use.

(c) Perform a regression analysis.

(d) Write the regression equation from this analysis, including the slope and y-intercept. Report the R². Report the p-value for the slope coefficient and whether or not you reject the null hypothesis.
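Part (d) asks for the quantities that Stata's `regress` command reports for a bivariate model. To show what those numbers are, the sketch below computes them by hand in standard-library Python: slope b = Sxy / Sxx, intercept a = ȳ - b·x̄, R² = 1 - SSE / SST, and the slope's t statistic b / SE(b) with n - 2 degrees of freedom. The data points are hypothetical, and the p-value itself (the two-tailed probability from the t distribution) is left to Stata rather than reimplemented here.

```python
import math

def simple_regression(x, y):
    """Bivariate OLS fit y = a + b*x.

    Returns (intercept, slope, R-squared, t statistic for the slope).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual (unexplained) and total sums of squares
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    se_slope = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope, n - 2 df
    return intercept, slope, r2, slope / se_slope

# Hypothetical data: x = independent variable, y = dependent variable
a, b, r2, t = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}, t = {t:.3f}")
```

With these five invented points the fit is ŷ = 2.2 + 0.6x with R² = 0.6 and t ≈ 2.12 on 3 degrees of freedom.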

(e) Make a scatter plot of these two variables in STATA and include the fitted regression line. Include this graph in your final exam document.

(f) Write a few sentences interpreting the analysis.

**Question 3:**

(20 Points) Using the WDI dataset (available on Canvas) in STATA, you want to explore how several factors may be connected to the overall life expectancy (v40) of people in different nations. The variables you think might be important are GDP per capita (v25), energy consumption per capita (v20), air pollution as indicated by exposure to high levels of particulate matter (v49), and the percentage of people living in urban areas (v59).

(a) Present a correlation matrix of all of these variables including a significance test for each (use a .05 alpha level). Explain what each of these bivariate relationships suggests about factors that are connected to life expectancy.
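Stata's `pwcorr` command with the `sig` option produces a matrix of this kind, for example `pwcorr v40 v25 v20 v49 v59, sig`. The standard-library Python sketch below shows the calculation behind each cell: Pearson's r for every pair of variables, plus the usual t test of H0: ρ = 0, t = r·√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. It assumes complete data with no missing values, which real WDI data will not satisfy, and uses three invented variables; the helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.inf  # perfect correlation: the test statistic is unbounded
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def correlation_matrix(columns):
    """Pairwise r for every pair in a {name: values} dict (assumes no missing values)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical variables observed on the same five cases
data = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5], "z": [5, 3, 4, 1, 2]}
m = correlation_matrix(data)
for name in data:
    print(name, {other: round(r, 3) for other, r in m[name].items()})
```

Here r(x, y) = 6/√60 ≈ 0.775 and r(x, z) = -0.8. For a single pair, the t statistic for r equals the t statistic for the slope of the corresponding simple regression, so both tests give the same p-value.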

(b) Do a multiple regression analysis with life expectancy as the dependent variable, including the four other variables as independent variables. Present and interpret the results from this regression analysis (i.e., include the full regression equation, the significance tests for each slope coefficient, and the R² value).
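In Stata the model in part (b) is a single command, `regress v40 v25 v20 v49 v59`. To make the reported coefficients, standard errors, and R² less of a black box, the sketch below fits ordinary least squares with several predictors in standard-library Python by solving the normal equations, b = (XᵀX)⁻¹Xᵀy, and takes each standard error from the diagonal of σ̂²(XᵀX)⁻¹ with σ̂² = SSE / (n - k), where k counts the intercept. The helper names and data are hypothetical, and explicit matrix inversion is chosen for readability; production software typically uses more numerically stable methods.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # crude absolute tolerance
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(y, predictors):
    """Regress y on a constant plus each list in `predictors`.

    Returns (coefficients, standard errors, t statistics, R-squared);
    index 0 is the intercept. Assumes complete data and n > k.
    """
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance with n - k degrees of freedom
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    t = [c / s if s > 0 else math.inf for c, s in zip(coef, se)]
    return coef, se, t, 1 - sse / sst

# Hypothetical example: one outcome and two made-up predictors
coef, se, t, r2 = ols([9, 9, 18, 18, 30, 27], [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
print("coefficients:", coef, "R^2:", r2)
```

Run with a single predictor on the points (1, 2), (2, 4), (3, 5), (4, 4), (5, 5), `ols` reproduces the bivariate fit ŷ = 2.2 + 0.6x with R² = 0.6. Note that `regress` drops every country missing any of the five variables, so its N can be smaller than the pairwise Ns that `pwcorr` reports.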

(c) What does the regression analysis suggest about the relationships among these variables that differs from what the bivariate relationships (from the correlation matrix) might lead you to expect?

WDI data: https://drive.google.com/file/d/1Cjgf0BAtZpNKAMXcH…

Baylor data: https://drive.google.com/file/d/1dYzMkyQZv98V6Y-n9…