COMP47670 Assignment 1: Data Collection & Preparation

Deadline: Monday 23rd March 2020

Overview:

The objective of this assignment is to collect a dataset from one or more open web APIs of your choice, and use Python to preprocess and analyse the collected data.

The assignment should be implemented as a single Jupyter Notebook (not a script). Your notebook should be clearly documented, using comments and Markdown cells to explain the code and results.

Tasks:

For this assignment you should complete the following tasks:

1. Data identification:

· Choose at least one open web API as your data source (i.e. not a static or pre-collected dataset). If you decide to use more than one API, these APIs should be related in some way.

2. Data collection:

· Collect data from your API(s) using Python. Depending on the API(s), you may need to repeat the collection process multiple times to download sufficient data.

· Store the collected data in an appropriate file format for subsequent analysis (e.g. JSON, XML, CSV).

3. Data preparation and analysis:

· Load and represent the data using an appropriate data structure (i.e. records/items as rows, described by features as columns).

· Apply any preprocessing steps that might be required to clean or filter the data before analysis. Where more than one API is used, apply suitable data integration methods.

· Analyse, characterise, and summarise the cleaned dataset, using tables and plots where appropriate. Clearly explain and interpret any analysis results which are produced.

· Summarise any insights which you gained from your analysis of the data. Suggest ideas for further analysis which could be performed on the data in future.
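The collection-and-storage steps above can be sketched as follows. This is a minimal pattern only; the endpoint URL and field names are hypothetical placeholders, not part of the assignment:

```python
import json
import urllib.request

def fetch_json(url):
    """Download and decode a JSON response from an open web API."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def save_records(records, path):
    """Store collected records in a JSON file for later analysis."""
    with open(path, "w") as f:
        json.dump(records, f, indent=2)

# Example usage (hypothetical endpoint):
# data = fetch_json("https://api.example.com/v1/observations?city=Dublin")
# save_records(data, "observations.json")
```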

Guidelines:

· The assignment should be completed individually. Any evidence of plagiarism will result in a 0 grade.

· Submit your assignment via the COMP47670 Brightspace page. Your submission should be in the form of a single ZIP file containing the notebook (i.e. IPYNB file) and your data. If your data is too large to upload, please include a smaller sample of the data in the ZIP file.

· In the notebook please clearly state your full name and your student number. Also provide links to the home pages for the API(s) which you used.

· Hard deadline: Submit by the end of Monday 23rd March 2020

· 1-5 days late: 10% deduction from overall mark

· 6-10 days late: 20% deduction from overall mark

· No assignments accepted after 10 days without extenuating circumstances approval and/or medical certificate.

COMP41680 Sample API Assignment

In [5]:

import os
import urllib.request
import csv
import pandas as pd

Task 1: Identify one or more suitable web APIs

API Chosen:

The single API chosen for this assignment was the one provided by www.worldweatheronline.com

Specifically, the historic weather data API –

http://developer.worldweatheronline.com/api/docs/historical-weather-api.aspx

The API is no longer freely available, but a free 60-day trial is provided upon signing up; this entitles the user to 500 calls to the API per day.
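As an aside, the query string for a call like this can be assembled with `urllib.parse.urlencode`, which also handles escaping of spaces in location names. A sketch using the documented past-weather endpoint (the `build_url` helper is illustrative, not from the notebook):

```python
from urllib.parse import urlencode

BASE = "http://api.worldweatheronline.com/premium/v1/past-weather.ashx"

def build_url(api_key, location, start, end):
    """Assemble a historical-weather request URL from its query parameters."""
    params = {
        "key": api_key,
        "q": location,
        "format": "csv",
        "date": start,
        "enddate": end,
        "tp": "24",  # one reading per 24 hours, i.e. a daily value
    }
    return BASE + "?" + urlencode(params)
```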

The API key I received, which works here, is fbaf429501ff4c7f92b8463217d103

In [3]:

api_key = "fbaf429501ff4c7f92b8463217d103"

Task 2: Collect data from your chosen API(s)

Collecting Raw Data – Functions needed:

The following 3 functions were written to allow multiple calls to the API, as only limited data is available per call.

These functions are commented throughout and are called below:

In [4]:

#create a file with set headings - 2 diff types of data to store
def create_file(file_loc, headings):
    with open(file_loc, "w", newline='') as write_file:  # as in get_and_write_data function
        f = csv.writer(write_file)
        f.writerow(headings)

#function to call the API, retrieve the raw csv data, and write to a file
def get_and_write_data(link, file_loc):
    response = urllib.request.urlopen(link)
    html = response.read().decode()
    with open(file_loc, "a", newline='') as write_file:  # open/create the file; newline='' to prevent blank lines being written
        f = csv.writer(write_file)
        lines = html.strip().split("\n")
        for l in lines:
            if l[0] == "#":  # skip the comment lines in the return of each API call
                continue
            elif l[0:10] in ["Not Availa", "There is n"]:
                # skip lines where no data is present, i.e. returns saying "Not Available" or
                # "There is no weather data available for the date provided. Past data is
                # available from 1 July, 2008 onwards only."
                continue
            else:  # otherwise it is data and so should be written
                l = l.split(",")  # it comes in as a string, so convert to a list for easier writing and manipulation
                f.writerow(l)
                #print("Line Written")
    #return print("Monthly Data Appending to Raw File - Completed")

# function to take in the parameters set and then use them to build a link
# to be passed into the get_and_write_data function
def get_raw_data(file_loc, api_key, location, year, month):
    # month needs to be a string to avoid invalid token errors for ints,
    # as the API needs a leading 0 for single-digit months
    while year <= 2016:  # iterate over all years available in the API, namely July 2008 to date
        # the end date in the call must be adjusted, as the API doesn't return full
        # values if the date doesn't exist, e.g. the 31st of February
        if month == "02":
            end_day = "28"
        elif month in ["04", "06", "09", "11"]:
            end_day = "30"
        else:
            end_day = "31"
        # the building of the link is what decides the data returned; it's available in hourly intervals,
        # for any location, in different formats - the documentation below outlines the possibilities
        # http://developer.worldweatheronline.com/api/docs/historical-weather-api.aspx
        link = ("http://api.worldweatheronline.com/premium/v1/past-weather.ashx?key=" + api_key +
                "&q=" + location + "&format=csv&date=" + str(year) + "-" + month + "-01" +
                "&enddate=" + str(year) + "-" + month + "-" + str(end_day) + "&tp=24")
        get_and_write_data(link, file_loc)
        year = year + 1
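A note on the end-day logic above: `calendar.monthrange` computes the true last day of a month, which would also handle leap-year Februaries (a fixed "28" silently drops 29 February in leap years). A sketch, with `month_range` as a hypothetical replacement helper:

```python
import calendar

def month_range(year, month):
    """Return (start, end) ISO dates covering a whole month, leap years included."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return (f"{year}-{month:02d}-01", f"{year}-{month:02d}-{last_day}")
```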

Task 3: Parse the collected data, and store it in an appropriate file format

Collecting Raw Data and writing raw data to CSV:

The following code retrieves the raw data from the API using the above functions and writes it to a CSV file.

This data needs extensive cleaning and manipulation before it can be used.

In [98]:

### Set variables, get the raw data from the API and store it in the file location set here
location = "Dublin"
raw_file_loc = "weather-data-raw.csv"
# create a file with no headings to store the raw data; no headings are needed as the
# data returns 2 distinct CSV line types with different numbers of columns
create_file(raw_file_loc, " ")
# the API only returns 1 month worth of data at a time,
# so a loop to iterate over all months beginning at Jan is needed
# the API needs the month in 0x format, therefore months 1-9 need to have a 0 added to the front;
# hence the conversion between int and str here, and a string is passed through to the function
print("Begin Raw Data Collection")
month = 1
while month <= 12:
    if month < 10:
        month = "0" + str(month)
    else:
        month = str(month)
    get_raw_data(raw_file_loc, api_key, location, 2008, month)
    month = int(month) + 1
print("Raw Data Collection Completed \n")

Begin Raw Data Collection
Raw Data Collection Completed
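The zero-padding of month numbers can also be done with `str.zfill` (or an f-string format spec), avoiding the int/str juggling in the loop:

```python
# "7" -> "07", "12" -> "12": pad month numbers to two digits for the API
months = [str(m).zfill(2) for m in range(1, 13)]
```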

Task 4: Load and represent the data using an appropriate data structure. Apply any pre-processing steps to clean/filter/combine the data

Parsing Raw Data:

The raw data returns alternating lines of values: a short line of daily values for each day, and a longer line of "hourly" data which is in fact a daily average, given how the call to the API has been configured (tp=24).

These need to be parsed and the required data saved. While only one data set was needed, I decided to keep and write both sets to different files, for future-proofing.

In [ ]:

hourly_file = "weather-data-hourly.csv"
daily_file = "weather-data-daily.csv"
#these are the headings as provided by the API documentation
hourly_headings = ["date", "time", "tempC", "tempF", "windspeedMiles", "windspeedKmph",
                   "winddirdegree", "winddir16point", "weatherCode", "weatherIconUrl",
                   "weatherDesc", "precipMM", "humidity", "visibilityKm", "pressureMB",
                   "cloudcover", "HeatIndexC", "HeatIndexF", "DewPointC", "DewPointF",
                   "WindChillC", "WindChillF", "WindGustMiles", "WindGustKmph",
                   "FeelsLikeC", "FeelsLikeF"]
daily_headings = ["date", "maxtempC", "maxtempF", "mintempC", "mintempF", "sunrise",
                  "sunset", "moonrise", "moonset"]
#call on the function to create the files as needed
create_file(hourly_file, hourly_headings)
create_file(daily_file, daily_headings)
# open the raw data and then, based on the length of the line, write to the appropriate file
# the len of the lines is actually around 58-62 and over 180, so 100 was chosen for safety;
# this can be easily changed in future if the API changes
raw_data = open(raw_file_loc, "r")
lines = raw_data.readlines()
for l in lines:
    # print(len(l))
    if len(l) <= 100:
        with open(daily_file, "a", newline='') as daily:
            df = csv.writer(daily)
            l = l.split(",")
            df.writerow(l)
    elif len(l) > 101:
        with open(hourly_file, "a", newline='') as hourly:
            hf = csv.writer(hourly)
            l = l.split(",")
            hf.writerow(l)
    else:
        continue
raw_data.close()
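Routing rows on raw character length works, but it is brittle if the API ever changes field widths; counting parsed fields is a more robust variant. A sketch (`route_row` is a hypothetical helper, not from the notebook):

```python
import csv

def route_row(line, daily_cols=9):
    """Classify a raw CSV line as 'daily' or 'hourly' by its field count."""
    fields = next(csv.reader([line]))  # parse one CSV line into its fields
    return "daily" if len(fields) <= daily_cols else "hourly"
```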

Utilising Pandas and further Data Modification

  • With the CSV files written, these are imported using Pandas.
  • 2 columns were chosen for analysis, namely temperature and precipitation for each day.
  • The date field was stored as a string, so this was converted to a datetime to allow for time-based analysis.

In [5]:

hourly_data = pd.read_csv(hourly_file)
daily_data = pd.read_csv(daily_file)
#convert date string to datetime - http://stackoverflow.com/questions/17134716/convert-dataframe-column-type-from-string-to-datetime
# suppress the "A value is trying to be set on a copy of a slice from a DataFrame" warning -
# the same warning was appearing using a for loop, the index and .loc, and that loop took
# 5 minutes to run on my machine
pd.options.mode.chained_assignment = None  # default='warn'
hourly_data['date'] = pd.to_datetime(hourly_data['date'])  # removed from to_datetime {, format="YYYY-MM-DD"}
#for i in simplified_data.index:
#    simplified_data.loc[i, 'date'] = pd.to_datetime(simplified_data.loc[i, 'date'])
simplified_data = hourly_data[["date", "tempC", "precipMM"]]  # extract temp and precip data for analysis and visualisation
simplified_data = simplified_data.sort_values(by=['date'])  # reorder the data by date

In [101]:

hourly_data[0:5]

Out[101]:

date  time  tempC  tempF  windspeedMiles  windspeedKmph  winddirdegree  winddir16point  weatherCode  weatherIconUrl  ...  HeatIndexC  HeatIndexF  DewPointC  DewPointF  WindChillC  WindChillF  WindGustMiles  WindGustKmph  FeelsLikeC  FeelsLikeF

(first 5 rows of the hourly data, dated 2009-01-01 to 2009-01-05)

5 rows × 26 columns

In [102]:

simplified_data[0:5]

Out[102]:

            date  tempC  precipMM
1329  2008-07-01     15      15.8
1330  2008-07-02     16       9.9
1331  2008-07-03     14      25.5
1332  2008-07-04     15       4.6
1333  2008-07-05     16      37.5

Missing Data

The final pre-processing steps are to look for missing data, to see whether further pre-processing is needed.

In [138]:

#look for missing data
simplified_data.isnull().sum() # no missing values in the reduced dataset

Out[138]:

date 0
tempC 0
precipMM 0
dtype: int64

In [104]:

simplified_data.dtypes.value_counts()

Out[104]:

float64 1
int64 1
datetime64[ns] 1
dtype: int64

There are no nulls in the data, and no strings either; this means there are no values such as "Not Available" or, for example, "No moonrise" in a moonrise column.

Both of these strongly indicate that all values are present.

The final pre-processing step is to compute monthly averages, creating a reduced-size dataset that can be visualised more easily while remaining accurate and indicative of each month's rain and temperature.

In [107]:

monthly = simplified_data.groupby([pd.Grouper(key='date', freq='M')]) # http://stackoverflow.com/questions/32982012/grouping-dataframe-by-custom-date
avg_month = monthly.mean() #create a new DF based on the mean of the groupby object created above
print(avg_month[0:5])

tempC precipMM
date
2008-07-31 16.419355 8.290323
2008-08-31 17.064516 7.906452
2008-09-30 15.833333 5.186667
2008-10-31 13.032258 5.477419
2008-11-30 10.766667 3.250000
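The same monthly averages can be obtained with `resample`, a common alternative to `pd.Grouper` once the date column is set as the index; a sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2008-07-01", "2008-07-15", "2008-08-02"]),
    "tempC": [14.0, 18.0, 17.0],
    "precipMM": [2.0, 4.0, 1.0],
})
# resample to month-end frequency and average each month's readings
monthly_avg = df.set_index("date").resample("M").mean()
```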

Task 5: Analyse and summarise the cleaned dataset

Descriptive Statistics

Initially of the Data Set containing all daily data:

In [110]:

print("\nSimplified_data columns:\n" + str(simplified_data.columns) + "\n")
print("Simplified_data Descriptive Stats:\n")
print(simplified_data.describe())

Simplified_data columns:
Index(['date', 'tempC', 'precipMM'], dtype='object')
Simplified_data Descriptive Stats:
tempC precipMM
count 2740.000000 2740.000000
mean 12.941241 3.267628
std 4.517523 5.706110
min 2.000000 0.000000
25% 10.000000 0.100000
50% 13.000000 1.100000
75% 16.000000 3.800000
max 25.000000 52.400000

In [111]:

print("Descriptive Stats:\n")
print(avg_month.describe())

Descriptive Stats:
tempC precipMM
count 92.000000 92.000000
mean 12.742613 3.327383
std 3.926780 2.076563
min 4.000000 0.100000
25% 9.403226 1.858871
50% 12.768817 2.784516
75% 16.108333 4.209516
max 21.354839 12.500000

As can be seen from comparing the two sets of descriptive stats, the monthly averaging seems to have removed outliers (e.g. the max precipitation of 52 mm) and reduced the standard deviation, while the quartiles have remained largely the same.

Matplotlib and Pandas Graphing

In [112]:

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

Line Graphs and Area Plot

In [28]:

plt.figure()
avg_month.plot()
plt.title("Avg Monthly Temperature and Precipitation in Dublin since July 2008\n")
plt.ylabel("Temperature C | Precipitation MM")
plt.xlabel("Time")
plt.show()

In [35]:

avg_month.plot.area(stacked=False)

Out[35]:

The basic line graph and area plot show how temperature and precipitation interact; as expected, temperature rises and falls with the time of year.

Precipitation doesn't seem to follow the same expected trend. It seems the Irish reputation for never-ending rain is well deserved, although rainfall would appear to have fallen in recent years.
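One way to check that apparent decline without the seasonal cycle dominating the picture is a 12-month rolling mean. A sketch on a toy series (in the notebook this would be applied to avg_month['precipMM']):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2008-07-31", periods=24, freq="M")
precip = pd.Series(np.linspace(5.0, 3.0, 24), index=idx)  # toy, gently declining series
rolling = precip.rolling(12).mean()  # averages out the within-year seasonal cycle
```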

Stacked Histogram

Shows the distribution of the data.

In [33]:

avg_month.plot.hist(stacked=True)

Out[33]:

Scatter Plots

Explore the data, look for patterns, outliers, etc.

In [38]:

avg_month.plot.scatter(x="tempC", y="precipMM", s=50)

Out[38]:

In [114]:

plt.scatter(avg_month['tempC'], avg_month['precipMM'])
plt.show()

In [39]:

from pandas.plotting import scatter_matrix
scatter_matrix(avg_month, alpha=0.2, figsize=(6, 6), diagonal='kde')

Out[39]:

array([[<AxesSubplot>, <AxesSubplot>],
       [<AxesSubplot>, <AxesSubplot>]], dtype=object)

In [40]:

from pandas.plotting import scatter_matrix
scatter_matrix(daily_data, alpha=0.2, figsize=(6, 6), diagonal='kde')

Out[40]:

array([[<AxesSubplot>, <AxesSubplot>, <AxesSubplot>, <AxesSubplot>],
       [<AxesSubplot>, <AxesSubplot>, <AxesSubplot>, <AxesSubplot>],
       [<AxesSubplot>, <AxesSubplot>, <AxesSubplot>, <AxesSubplot>],
       [<AxesSubplot>, <AxesSubplot>, <AxesSubplot>, <AxesSubplot>]], dtype=object)

Dual Axis Line Graphs

In [54]:

plt.figure()
ax = avg_month.plot(secondary_y=['precipMM'])
ax.set_ylabel("Temperature C")
ax.right_ax.set_ylabel("Precipitation MM")
plt.title("Avg Monthly Temperature and Precipitation in Dublin since July 2008\n")
plt.xlabel("Time")
plt.show()

In [55]:

avg_month.plot(subplots=True, figsize=(6, 6));

Final Manipulation, Exploration and Visualisation

Temperature v Precipitation

For the purposes of this exploration, 2 new data frames were created, grouping by temperature and by precipitation respectively, and comparing each to the mean value of the other. This data was then explored as outlined further below.

In [132]:

#x="tempC", y="precipMM"
avg_month_temp = avg_month.groupby("tempC")
temp_data = avg_month_temp.mean() #create a new DF based on the mean of the groupby object created above
print(temp_data[4:7])

precipMM
tempC
4.000000 12.500000
5.333333 0.100000
6.322581 4.441935
6.392857 2.585714
6.903226 2.574194

In [135]:

plt.figure()
avg_month_temp.mean().plot()  #secondary_y=['precipMM']
plt.title("Avg amount (mm) of Precipitation as Temperature increases (Dublin since July 2008)\n")
plt.xlabel("Temperature - C")
plt.ylabel("Precipitation MM")
plt.show()

In [136]:

#x="tempC", y="precipMM"
avg_month_precip = avg_month.groupby("precipMM")
precip_data = avg_month_precip.mean() #create a new DF based on the mean of the groupby object created above
print(precip_data[0:1])

tempC
precipMM
0.100000 5.333333
0.722581 11.387097
0.733333 19.166667

In [137]:

plt.figure()
avg_month_precip.mean().plot()  #secondary_y=['precipMM']
plt.title("Avg Temperature as amount of Precipitation increases (Dublin since July 2008)\n")
plt.xlabel("Precipitation - MM")
plt.ylabel("Temperature - C")
plt.show()

Tentative Conclusion

Further in-depth studies and tests could be carried out to assess the statistical significance of the results, and to incorporate other meteorological datasets. However, based on the current data, there does not seem to be a strong relationship between the level of rain and the temperature.

So it doesn’t really matter how hot it gets in Dublin, we can still expect rain!
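That impression can be quantified with a correlation coefficient; in the notebook this would be avg_month['tempC'].corr(avg_month['precipMM']). A sketch on toy numbers:

```python
import pandas as pd

temp = pd.Series([4.0, 8.0, 12.0, 16.0, 20.0])
rain = pd.Series([3.1, 2.8, 3.3, 2.9, 3.2])
r = temp.corr(rain)  # Pearson correlation; values near 0 mean little linear relationship
```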
