Saturday, April 30, 2016

Introduction to Regression

Describing the data used in the statistical analysis:

The Data Sample
The research is based on the Gapminder codebook, which covers 15 socio-economic factors from 2007 across 213 countries. Based on a review of prior research, I chose to study the correlation between urban rate and female employ rate among these 213 countries.

Female employ rate:
  • 178 of the 213 countries have female employment statistics listed.
  • The highest female employment rate listed is 83.3%, from Burundi.
  • The lowest female employment rate listed is 11.3%, from the West Bank and Gaza.
  • The most frequent female employment rate is 42.1%, shared by 4 countries (1.9% of the 213): Bulgaria, Poland, Albania and Maldives.
  • Of the 178 countries with female employment rate statistics, 76 of these countries (42.7%) have an employment rate equal to or greater than 50%.

Urban rate:
  • 203 of the 213 countries have urbanization rate statistics listed.
  • The highest urban rate listed is 100%, from 6 countries: Hong Kong / China, Singapore, Macao / China, Cayman Islands, Monaco and Bermuda.
  • The lowest urban rate listed is 10.4%, from Burundi. 
  • The most frequent urban rate is 100%, observed in the 6 countries with the highest urban rates (2.8% of the 213).
  • Of the 203 countries with urbanization rate statistics, 115 of these countries (56.7%) have an urban rate equal to or greater than 50%.
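
The percentages quoted in the two summaries above can be re-derived from the counts. The following is a minimal sketch in Python; the helper name `share_pct` is my own and is not part of the Gapminder material.

```python
def share_pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Counts taken from the summaries above.
female_half = share_pct(76, 178)   # female employ rate >= 50%, of 178 reporting countries
urban_half = share_pct(115, 203)   # urban rate >= 50%, of 203 reporting countries
female_mode = share_pct(4, 213)    # countries sharing the 42.1% female employ rate
urban_mode = share_pct(6, 213)     # countries with a 100% urban rate

print(female_half, urban_half, female_mode, urban_mode)  # 42.7 56.7 1.9 2.8
```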
The data is available at the country level and the analysis is focused on the 173 countries that have both female employ rate and urban rate values.
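
Restricting the analysis to countries that have both values amounts to a complete-case filter. Below is a small sketch of that idea in plain Python. The field names `femaleemployrate` and `urbanrate` are assumed here for illustration, and every row except Burundi (whose two values are quoted above) is a made-up placeholder, not real Gapminder data.

```python
# Toy records: only the Burundi values come from the text; the rest are invented.
rows = [
    {"country": "Burundi", "femaleemployrate": 83.3, "urbanrate": 10.4},
    {"country": "Example A", "femaleemployrate": 42.1, "urbanrate": None},   # urban rate missing
    {"country": "Example B", "femaleemployrate": None, "urbanrate": 100.0},  # employ rate missing
    {"country": "Example C", "femaleemployrate": 55.0, "urbanrate": 61.5},   # complete
]

def complete_cases(records, fields):
    """Keep only the records in which every listed field has a value."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

kept = complete_cases(rows, ("femaleemployrate", "urbanrate"))
print([r["country"] for r in kept])  # ['Burundi', 'Example C']
```

Applied to the full dataset, a filter of this kind is what would leave the 173 countries used in the analysis.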

Procedures
The 2007 female employ rate comes from the International Labour Organization (ILO), a specialized agency of the United Nations. The ILO collects the data in order to set labor standards, develop policies and devise programs promoting decent work for all women and men. The data is derived from three major types of sources: establishment surveys (including administrative reports submitted by establishments or enterprises to the national authorities and assimilated to surveys), labor force/household surveys, and administrative sources (mostly social security records). Each type of source has its own characteristics and provides certain types of data.

The 2007 urban rate data is drawn by Gapminder from the World Bank’s World Development Indicators, which is the World Bank’s primary collection of development indicators, and from the United Nations Population Fund. The data is compiled from national census data provided by officially recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates. The World Bank collects the data in order to promote economic and social progress in developing countries by helping to raise productivity so that their people may live a better and fuller life. The United Nations Population Fund uses the data to address population and development issues, with an emphasis on reproductive health and gender equality, within the context of international development goals.

Measures
The female employ rate represents the percentage of a country’s female population, age 15 and above, that was employed during the given year. Gapminder aggregates data for 213 countries from the International Labour Organization’s Key Indicators of the Labor Market Program. It represents the response variable in the analysis. 

The urban rate refers to the percentage of people living in urban areas in 213 countries, as defined by national statistical offices. It is calculated using World Bank and United Nations Population Fund 2007 population estimates and urban ratios from the United Nations World Urbanization Prospects. It represents the explanatory variable in the analysis.

To keep the analysis manageable given the wide data ranges, the explanatory and response variables were each grouped into 6 distinct categories. Countries with missing values were excluded from the analysis.
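
The cut points for the 6 categories are not listed here, so the following is only a sketch of one way such a grouping could be done. It assumes equal-width bins over the 0 to 100 percent range, and the function name `to_category` is hypothetical.

```python
def to_category(value, n_bins=6, low=0.0, high=100.0):
    """Map a percentage to a category 1..n_bins using equal-width bins (an assumption)."""
    if not low <= value <= high:
        raise ValueError(f"value {value} is outside [{low}, {high}]")
    # Multiply before dividing so that exact bin boundaries are not
    # shifted by the rounding error of a precomputed bin width.
    index = int((value - low) * n_bins // (high - low)) + 1
    return min(index, n_bins)  # value == high belongs to the top category

# Values quoted earlier in the post.
print(to_category(10.4))   # lowest urban rate (Burundi) -> 1
print(to_category(83.3))   # highest female employ rate (Burundi) -> 5
print(to_category(100.0))  # highest urban rate -> 6
```

Equal-width bins are easy to explain, but they can leave some categories nearly empty; quantile-based cut points would be an alternative if balanced group sizes matter more.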