Posted: October 27th, 2022

DATA ANALYTICS

Data Analysis of US Population

Netra Basyal 11900378

Ashutosh Sen Thapa 11801770

Sonia Thapa 11801446

Mohan Khadka 11801650

Introduction

This report analyzes the population of the United States, its regions and its states from 1910 to 2010, using data provided by the US Census Bureau. Population analysis has long been used to see how population fluctuates and what causes those changes. We took the data, presented it in a more understandable form through data visualization, and researched how and why the population fluctuated over this period. We also projected the population 50 years ahead using regression lines, which carry the average growth over the period forward.

In this analysis project, our group of four took the last 100 years of data on population and population growth rate for the USA, its regions and its 50 states, then described and compared the figures and projected them into the future. During the analysis, we found that the South and West of the country are becoming heavily populated, while California has emerged as the most populous state, ending New York’s long streak.

About Data

Analysis Techniques

To analyze the US census data of the last 100 years, we chose two analysis techniques: descriptive analysis and predictive analysis.

Descriptive Analysis: Descriptive analysis, also known as insight into the past, is an analysis technique that describes and summarizes raw data in a simpler form that people can understand. It is in high demand for analyzing past data; in our case, this is the US census data from 1910 to 2010. Descriptive analysis is useful for researching past data and understanding how it might affect future results. It mostly relies on basic arithmetic such as sums, averages and percentages, which we apply to show outcomes such as the total population of the US, the average growth rate, and the maximum and minimum population of the country or its states. Thus, descriptive analysis is normally used when we need to understand data at an aggregate level and to summarize and describe it in a simpler way.

Predictive Analysis: Predictive analysis can also be described as understanding the future, because this method predicts future values from the given data and provides actionable insights based on it. Predictive analysis uses probability to estimate future results, so the predicted outcome is never 100% certain; the results are estimates rather than facts. The process is as follows: first, all the data acquired from the past is accumulated; next, patterns are sought in the data; finally, statistical models and algorithms are applied to show the relationships between the data sets. In this project, we used regression lines to predict the population 50 years ahead from the US Census Bureau's population data for the past century. Thus, predictive analysis is mostly used when we want to forecast the future.
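The regression-line approach described above can be sketched in a few lines of Python. This is only an illustration, not the Excel workbook used in the project: the function names `fit_line` and `predict` are hypothetical, the decade totals are approximate US census figures in millions that are not quoted in this report (only the 1910 and 2010 values are), and the target year of 2060 assumes that "50 years" is counted from 2010.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted regression line at x."""
    return slope * x + intercept


# Census years 1910, 1920, ..., 2010.
years = list(range(1910, 2011, 10))

# ASSUMPTION: approximate decennial totals in millions, rounded,
# used here for illustration only.
population_millions = [92.2, 106.0, 123.2, 132.2, 151.3, 179.3,
                       203.3, 226.5, 248.7, 281.4, 308.7]

slope, intercept = fit_line(years, population_millions)

# ASSUMPTION: the 50-year horizon is measured from 2010.
target_year = 2060
projection = predict(slope, intercept, target_year)

print(f"Fitted growth: about {slope:.2f} million people per year")
print(f"Linear projection for {target_year}: about {projection:.0f} million")
```

A straight line assumes a constant number of people added each year, so the projection is best read as a rough trend rather than a forecast.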

Data Visualization

Data visualization is the process of presenting accumulated data graphically to help people understand its significance. It represents data through patterns, trends and correlations, using charts, graphs and maps. This helps deliver information efficiently and effectively and gives people a better overview of the data. Data visualization has become a popular tool for research, teaching and development. In our project we used Microsoft Excel to visualize the information in a simpler and more understandable form, which lets us explain each graph without unnecessary complexity.

United States Analysis

Here we have the population trend of the past 100 years. According to the bureau, the population grew from 92,228,531 in 1910 to 308,745,538 in 2010. The US population grew steadily over this period, increasing by approximately 217 million within these 100 years. In other words, the population of the US more than tripled.
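The arithmetic behind these figures can be checked with a short Python sketch. The two census totals are the ones quoted in the text; the average annual compound growth rate is a derived figure added here for illustration and does not appear in the original analysis.

```python
# Census totals quoted in the report.
pop_1910 = 92_228_531
pop_2010 = 308_745_538
span_years = 2010 - 1910

# Absolute increase and growth factor over the century.
increase = pop_2010 - pop_1910
growth_factor = pop_2010 / pop_1910

# Derived for illustration: the constant yearly rate that would
# turn the 1910 total into the 2010 total over 100 years.
annual_rate = growth_factor ** (1 / span_years) - 1

print(f"Absolute increase: {increase:,}")
print(f"Growth factor: {growth_factor:.2f}")
print(f"Average annual growth: {annual_rate:.2%}")
```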

The data above shows the growth rate of the US population from 1910 to 2010, which fluctuated over these years. The growth rate was 21% in 1910 but only 9.7% in 2010. It fell in 1910-1920 and dropped sharply in 1930-1940, possibly due to the First World War and, in the 1930s, the Great Depression. The growth rate has also been declining gradually since 1970. The main reasons behind this were urbanization, family planning and investment in children.

The following diagram shows the increase in the population of the US regions from 1910 to 2010. In 1910, the West was in last position, but by 2010 it had risen to second. The populations of the South and the West skyrocketed, whereas those of the remaining two regions increased only slightly.
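As a rough illustration of how a decade growth rate such as the 9.7 figure for 2010 is obtained, the sketch below computes the percentage change between two consecutive census counts. The helper name `decade_growth_rate` is hypothetical, and the 2000 census total is an assumption brought in for the example; it is not quoted in this report.

```python
def decade_growth_rate(previous, current):
    """Percentage change between two consecutive census counts."""
    return (current - previous) / previous * 100


# ASSUMPTION: 2000 census total, not quoted in the report.
pop_2000 = 281_421_906
# 2010 census total as quoted in the report.
pop_2010 = 308_745_538

rate_2010 = decade_growth_rate(pop_2000, pop_2010)
print(f"Growth rate 2000-2010: {rate_2010:.1f}%")
```

Applying the same function to each pair of consecutive decades gives the full growth-rate series plotted in the chart.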

Regression Analysis

US Regions

In our group report we have used descriptive statistical analysis. Comparing the population of the US regions in 1910 and 1960, the following change can be seen in the pie charts. In 1910, the Midwest and the South jointly held first position, while the Northeast and the West were second and third respectively. By 1960, the shares of the Midwest and the South had decreased slightly, by 3 and 1 percentage points respectively, while the share of the West had increased drastically.

In the same way, we calculated the regional population shares for 2010, shown in the following pie chart. In 2010, the South had the largest population, with 37% of the total. The Northeast had the smallest, with 18%.

According to the regression analysis, in 1950 the West will hold 36.1% of the total US population, about 130,000,000 people in the region.

In the following figure, we have calculated the states with the maximum and minimum population for each year. New York had the largest population from 1910 to 1970, whereas Alaska had the smallest from 1910 to 1990. California had the largest population from 1980 to 2010, and Wyoming had the smallest from 2000 to 2010.
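The percentage shares behind the 2010 pie chart can be reproduced with a short sketch like the one below. The regional counts are an assumption: they are 2010 Census regional totals that are not listed in this report, although they sum to the 2010 national total quoted earlier.

```python
# ASSUMPTION: 2010 Census regional totals, not listed in the report.
region_pop_2010 = {
    "Northeast": 55_317_240,
    "Midwest": 66_927_001,
    "South": 114_555_744,
    "West": 71_945_553,
}

total = sum(region_pop_2010.values())

# Each region's share of the national population, in percent.
shares = {region: pop / total * 100 for region, pop in region_pop_2010.items()}

largest = max(shares, key=shares.get)
smallest = min(shares, key=shares.get)

for region, share in shares.items():
    print(f"{region}: {share:.0f}%")
print(f"Largest: {largest}, smallest: {smallest}")
```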

In this report, we calculated and analyzed the population of the US regions from collected data. We used two analysis techniques, one predictive and one descriptive. Using pie charts and line graphs, we presented and summarized the collected data. Descriptive statistics describe the basic features of the data in a study and provide simple summaries about the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data.

Predictive data analysis is the process of extracting information from an existing data set in order to determine patterns and predict the future. In our report, we used a regression line fitted to the population trend of the last century to predict the future population.

References

Halo. (2020). Descriptive, Predictive, and Prescriptive Analytics Explained. Available at: https://halobi.com/blog/descriptive-predictive-and-prescriptive-analytics-explained/

SearchBusinessAnalytics. (2020). What is data visualization? – Definition from WhatIs.com. Available at: https://searchbusinessanalytics.techtarget.com/definition/data-visualization

Lam, D. (2011). How the World Survived the Population Bomb: Lessons From 50 Years of Extraordinary Demographic History. Demography, 48(4), pp.1231-1262.

