Outline for Assignment 2

Using the Documenting Research Guide and the Assignment 2 Instructions, develop your outline.


When you begin working with the data, use the document Assignment 2: Finding the right data to get started.

Submit the outline as an MS Word document. Use the APA 7 standards for all citations and references in the outline. Ensure that the document includes your name; do not include your student identification number. Include a cover page with your name; you may use the Student Paper Template's cover page.

Both the Documenting Research Guide and the Student Paper Template can be found in the Useful Information folder in the content area.

Submit your outline on or before Sunday at 11:59 PM Eastern Time.

By submitting this paper, you agree: (1) that you are submitting your paper to be used and stored as part of the SafeAssign™ services in accordance with the Blackboard Privacy Policy; (2) that your institution may use your paper in accordance with UC's policies; and (3) that the use of SafeAssign will be without recourse against Blackboard Inc. and its affiliates.


Research Assignment 2: Finding the right data

This document is provided to help you work through this robust research question. It offers insight into how to address these fields by working with three of the fields, along with some tips for working with the data in R. Work through the fields one piece at a time, shown here in bold.

This research question requires several filters to obtain the secondary data sample that is necessary for this research. Consider the field that represents whether a survey respondent has used the SO job board or is aware of the board but has never used it. Once you have identified the variable that represents this field, what unique values can be found in it?

The question limits the scope of the data sample to respondents that use the job board and those that are aware of it but have never used it. We don't have enough information yet! What is the survey question associated with these survey answers?

Can you determine which unique answers or values you need to keep now?

Respondents that use it: how would they answer this survey question? Yes
Respondents that don't use it, but know about the job board? No, I knew…

When you have lengthy character fields, it can get cumbersome to isolate the strings. If even one letter is out of place, your filter will not perform as expected. Using the entire data frame of 88,883 observations, the first chunk below returns zero observations; the second chunk returns 75,532 observations.

While the second option works, you may find that method slow to type and prone to typing errors. How about this approach?

The function str_detect(), or string detect, searches a character string for the specified contents.

To keep it simple, don't use the first word or last word of the string. Want to know why, or want to know more? Email me or use help in RStudio.

What are the most influential features when predicting whether a survey respondent has used the SO job board or is aware of the board but has never used it when considering respondents who reported residing in the country …(your countries listed here); reported their age as somewhere between 18 and 65 years old; and that indicated that they were either not at all, somewhat, or very confident in their manager; reported an undergraduate major in either an engineering field, information systems, or web design, or statistics; in addition to the responses these respondents reported regarding employment; how often the respondent contributes to open source; and whether or not they code for a hobby; when the respondent indicated that the number of years they have been coding is somewhere within one to 49 years, using the data from SO (2019)?

> unique(df$SOJobs)
[1] "No, I didn't know that Stack Overflow had a job board"
[2] "No, I knew that Stack Overflow had a job board but have never used or visited it"
[3] "Yes"
[4] NA

Have you ever used or visited Stack Overflow Jobs?

df <- filter(df, # 'and' statement
             SOJobs == "Yes",
             SOJobs == "No, I knew that Stack Overflow had a job board but have never used or visited it")

df <- filter(df, # 'or' statement
             SOJobs == "Yes" |
             SOJobs == "No, I knew that Stack Overflow had a job board but have never used or visited it")

df <- filter(df, str_detect(SOJobs, "knew") | SOJobs == "Yes")
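A quick way to check that the str_detect() approach kept only the two intended responses (a minimal sketch; the 75,532 figure is the count quoted above for the full 88,883-observation data frame):

table(df$SOJobs, useNA = "ifany") # only "Yes" and the "No, I knew..." response should remain
nrow(df)                          # starting from the full data set, this should again be 75,532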


Let's look at another field from the research question: reported an undergraduate major in either an engineering field, information systems, or web design, or statistics.

What unique values exist for this field?

Do you need the survey question? Maybe not. You do need to capture the questions in the sample section. Here I've highlighted the words coinciding with the research question and truncated the list.

What's the best method to approach this? Think about how you could use the string detect function here. When you pick the words to detect, consider all the unique values. How can you validate that the filter worked as you intended? Try using the function table() before and after your filter. The function table() will alphabetize your unique field names, so they won't appear in the same order as they did with unique(). Check the remaining fields and how often each of them occurs in the data.

    > unique(df$UndergradMajor)
    [1]
    [2] Web development or web design
    [3] Computer science, computer engineering, or software engineering
    [4] Mathematics or statistics
    [5] Another engineering discipline (ex. civil, electrical, mechanical)
    [6] Information systems, information technology, or system administration
    [7] A business discipline (ex. accounting, finance, marketing)
    [8] A natural science (ex. biology, chemistry, physics)
    [9] A social science (ex. anthropology, psychology, political science)
    [10] A humanities discipline (ex. literature, history, philosophy)
    [11] Fine arts or performing arts (ex. graphic design, music, studio art)
    [12] A health science (ex. nursing, pharmacy, radiology)
    [13] I never declared a major

    > unique(df$UndergradMajor)
    [1]
    [2] Web development or web design
    [3] Computer science, computer engineering, or software engineering
    [4] Mathematics or statistics
    [5] Another engineering discipline (ex. civil, electrical, mechanical)
    [6] Information systems, information technology, or system administration
    [7] A business discipline (ex. accounting, finance, marketing)

df <- df %>%
  filter(str_detect(UndergradMajor, "engineering") |
         str_detect(UndergradMajor, "information") |
         str_detect(UndergradMajor, "statistics") |
         str_detect(UndergradMajor, "web")) %>%
  droplevels() # droplevels() is necessary when dropping factor levels

    > table(df$UndergradMajor)
    A business discipline (ex. accounting, finance, marketing)
    1841
    A health science (ex. nursing, pharmacy, radiology)
    323
    A humanities discipline (ex. literature, history, philosophy)
    1571
    A natural science (ex. biology, chemistry, physics)
    3232
    A social science (ex. anthropology, psychology, political science)
    1352
    Another engineering discipline (ex. civil, electrical, mechanical)
    6222
    Computer science, computer engineering, or software engineering

    > table(df$UndergradMajor)
    Another engineering discipline (ex. civil, electrical, mechanical)
    6222
    Computer science, computer engineering, or software engineering
    47214
    Information systems, information technology, or system administration
    5253
    Mathematics or statistics
    2975
    Web development or web design
    3422
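If you would rather validate programmatically than scan the table() output (a minimal sketch; the keywords mirror the filter above, and as.character() simply guards against the field being a factor):

# every remaining UndergradMajor value should contain at least one of the filter keywords
all(str_detect(as.character(df$UndergradMajor), "engineering|information|statistics|web"))
# TRUE means no unintended majors survived the filter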


When you think it's a field with numeric values, but it isn't. From the research question: when the respondent indicated that the number of years they have been coding is somewhere within one to 49 years.

Pay attention to what these fields contain. If the data type is a character or factor field, you cannot use numeric values to filter. Look at what happens here when working with the entire data set.

Technically, both calls try to filter for the same information. Is either one correct? Nope. What unique values exist?

Do you see the quotation marks? R is interpreting every value here as a string, not a number.

Which function call is correct?

The first returns 87,938. The second returns 86,445. What's happening here? Because these filters use != (does not equal), you have to consider how that impacts whether you use a comma or the vertical pipe |. For this example, you can see the difference when you use unique(), which shows the number of unique values.

An odd set of outcomes, right? How did it go from 53 to 52 and from 53 to 50? There were two labels filtered out in both function calls, yet neither removed exactly two unique values.

Both the 'or' and the 'and' statements removed NA. The 'or' statement did not remove either string completely. The 'and' statement is needed here. The field still has to be converted to a numeric type, then filtered for the range in the research question.

> nrow(filter(df, YearsCode >= 1, YearsCode <= 49))
[1] 59127
> nrow(filter(df, YearsCode != "More than 50 years", YearsCode != "Less than 1 year"))
[1] 86445

> unique(df$YearsCode)
 [1] "4"                  NA                   "3"                  "16"
 [5] "13"                 "6"                  "8"                  "12"
 [9] "2"                  "5"                  "17"                 "10"
[13] "14"                 "35"                 "7"                  "Less than 1 year"
[17] "30"                 "9"                  "26"                 "40"
[21] "19"                 "15"                 "20"                 "28"
[25] "25"                 "1"                  "22"                 "11"
[29] "33"                 "50"                 "41"                 "18"
[33] "34"                 "24"                 "23"                 "42"
[37] "27"                 "21"                 "36"                 "32"
[41] "39"                 "38"                 "31"                 "37"
[45] "More than 50 years" "29"                 "44"                 "45"
[49] "48"                 "46"                 "43"                 "47"
[53] "49"

df <- filter(df, YearsCode != "Less than 1 year" |  # using an 'or' statement
                 YearsCode != "More than 50 years")

df <- filter(df, YearsCode != "Less than 1 year",   # using an 'and' statement
                 YearsCode != "More than 50 years")

> length(unique(df$YearsCode)) # unfiltered original data
[1] 53
> length(unique(df$YearsCode)) # filtered with the 'or' statement above
[1] 52
> length(unique(df$YearsCode)) # filtered with the 'and' statement above
[1] 50
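Why does the 'or' statement keep both strings? A small illustration with a made-up vector (not the survey data) shows the logic: a value equal to one of the two strings still passes the other != test, so | lets it through, while the comma (an 'and') requires both tests to pass.

x <- c("4", "Less than 1 year", "More than 50 years", NA)
x != "Less than 1 year" | x != "More than 50 years" # TRUE TRUE TRUE NA -> both strings kept
x != "Less than 1 year" & x != "More than 50 years" # TRUE FALSE FALSE NA -> both strings dropped
# filter() treats an NA condition as FALSE, which is why both versions drop the NA rows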


What range of data is available for this field, which represents the number of years the respondent has been programming, after the 'and' statement?

There is one more step you need to take before you're done with this field. You changed it; now validate those last changes. How do you know it's correct?

Beyond the three variables shown in this document so far, if you run into trouble trying to model this data in your analysis, there are two things you can look for.

The first: is your data a regular data frame, not a tbl_df or tibble? Convert it to a data frame with df <- as.data.frame(df).

The second: did you validate the changes you made? What does a summary of your data look like? How many levels do your factors have? Is it clean? If the function summary() returns factor levels with a count of 0, your data is not clean!
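A quick structural check covering both points might look like this (a minimal sketch using base R functions; df is the working data frame from the steps above):

class(df)                              # should be "data.frame" only, not "tbl_df"/"tibble"
df <- as.data.frame(df)                # convert it if needed
summary(df)                            # scan for factor levels with a count of 0
sapply(Filter(is.factor, df), nlevels) # base R Filter(), not dplyr::filter(); levels per factor column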

If you were filtering for Australia, Russia, and the Netherlands, along with only full-time and part-time workers, and the use of summary() returned the following, your data is not clean! If you attempt to train a random forest model on unclean data, as shown, it could take hours to process and may finish with an error. Clean the data first. Empty levels? Try this:

The final caveat: it will be a lot easier to read the confusion matrices if you change the labels of the outcome variable before training your model.

> range(df$YearsCode %>% as.numeric())
[1] 1 50 # after the 'and' statement, the field is 1:50
         # the field needs to be permanently changed to numeric
         # the field will need to be filtered again

> df$YearsCode <- as.numeric(df$YearsCode)
> df <- filter(df, YearsCode >= 1, YearsCode <= 49)
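To validate those last changes (a minimal sketch; the 1-to-49 bounds come from the research question):

class(df$YearsCode)                                     # should now be "numeric"
range(df$YearsCode, na.rm = TRUE)                       # should be 1 49 after the final filter
sum(df$YearsCode < 1 | df$YearsCode > 49, na.rm = TRUE) # should be 0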

    Employment Country
    Employed full-time :2252 Australia :845
    Employed part-time : 108 Russian Federation:764
    Independent contractor, freelancer, or self-employed: 0 Netherlands :751
    Not employed, and not looking for work : 0 Afghanistan : 0
    Not employed, but looking for work : 0 Albania : 0
    Retired : 0 Algeria : 0
    (Other) : 0

> df <- droplevels(df) # no more empty factor levels!
> summary(df) # after dropping the empty factor levels, summary returns this

Employment                 Country
Employed full-time:2252    Australia         :845
Employed part-time: 108    Netherlands       :751
                           Russian Federation:764

> levels(df$SOJobs) # filtered SOJobs for analysis has two levels
[1] "No, I knew that Stack Overflow had a job board but have never used or visited it"
[2] "Yes"

> levels(t2$SOJobs) <- c("No","Yes") # change labels; order of labels matters
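Because the order of the labels matters, it can help to confirm the mapping before and after relabeling (a minimal sketch; t2 stands for whatever object you are relabeling, as in the line above):

levels(t2$SOJobs)                   # before: the "No, I knew..." level is listed first, so "No" will replace it
levels(t2$SOJobs) <- c("No", "Yes") # the first new label replaces the first existing level
table(t2$SOJobs)                    # after: the counts should be unchanged; only the labels differ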



Research Assignment 2

The Outline for Research Assignment 2 and Research Assignment 2 will use this document. Use the Documenting Research Guide to understand how to use the information in this document for either of these submissions. Ask questions, if needed!

• Problem:
  Employers' external job postings need to be posted to the one job board that targets their model candidate, so that only applicants who are perfect for the role apply. In reality, jobs are typically posted in numerous places, and both suitable and unsuitable candidates apply for the role. Using specific candidate characteristics and a specific job board, considering what may or may not influence the use of a specific job board will lead to better targeting of candidates, reducing redundant job postings, and decreasing the number of unfit candidates.

• Question 1:
  What are the most influential features when predicting whether a survey respondent has used the SO job board or is aware of the board but has never used it when considering respondents who reported residing in the country of Spain, Australia, or Brazil; reported their age as somewhere between 18 and 65 years old; and that indicated that they were either not at all, somewhat, or very confident in their manager; reported an undergraduate major in either an engineering field, information systems, or web design, or statistics; in addition to the responses these respondents reported regarding employment; how often the respondent contributes to open source; and whether or not they code for a hobby; when the respondent indicated that the number of years they have been coding is somewhere within one to 49 years, using the data from SO (2019)?

• Question 2:
  You are responsible for developing a second research question. This question must meet the criteria from Unit 1 Part 1. Additionally, it must relate to the problem statement. It does not have to use the same subset of data as the other research question. The analysis method must be one demonstrated in the lectures. When completing the outline, make sure to include both the given question and the well-developed, sound research question you have developed.

• Data:
  • The data and data dictionaries are online.
    o Note: The raw data in your program must be in its original form. Do not modify the data outside of the programming. Use the data dictionary to understand the data.
    o The data and data dictionary are downloaded together. When you visit the site, ensure you select the 2019 survey, and cite and reference the source in your work.
      ▪ Stack Overflow. (2019). Stack Overflow annual developer survey [Data set and code book]. https://insights.stackoverflow.com/survey/
  • Create a subset of data to represent the sample of secondary data in this analysis, based on the research questions.


• Data Cleaning:
  • Do not remove missing values during cleaning.
  • When changing an object or part of an object, validate that the change occurred as expected.
  • The steps taken in cleaning are not discussed in the research paper.

• Analyze:
  • When analyzing the given research question, you must use a random forest model (see the sketch after this list).
    o You must attempt to improve the model's performance with one of the methods covered in Unit 5.
    o The research question you write must make use of a method of analysis demonstrated in the lectures from this course.
    o Accuracy is not suitable in and of itself to determine the validity and reliability of the model.
  • The sub-stages of Analyze (profile, prepare, and apply) are necessary at least two times. This method is for programming, not documenting research.
  • Ensure you establish that the model is valid and reliable before discussing the influential indicators.
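As a rough orientation only (a minimal sketch, not the required workflow: it assumes the cleaned data frame df from the walkthrough above, already subset to the fields in the research question, with SOJobs relabeled to No/Yes, and it uses the randomForest package; your predictors, split, and any tuning must follow your own analysis and the Unit 5 material):

library(randomForest)

# assumes the modeling columns in df are clean factors/numerics with no remaining NAs
set.seed(1234)                                   # reproducible split
idx   <- sample(nrow(df), floor(0.7 * nrow(df))) # simple 70/30 train/test split
train <- df[idx, ]
test  <- df[-idx, ]

rf <- randomForest(SOJobs ~ ., data = train,     # outcome against the remaining predictors
                   ntree = 500, importance = TRUE)

pred <- predict(rf, newdata = test)
table(Predicted = pred, Actual = test$SOJobs)    # confusion matrix; report more than accuracy alone
importance(rf)                                   # candidate influential features
varImpPlot(rf)                                   # visual ranking of the predictors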

• Results section and discussion section:
  • Ensure that assertions and assessments in the results and discussion sections are derived from the analysis in R.
  • Do not speculate. Use evidence. When documenting the results, consider the generalizability.
  • Explain what was done to improve model performance in words: not programming functions, variable names, or argument names. Assume the reader cannot see the programming code or raw data but needs to understand what you did to improve the performance.

• Future recommendations:
  • Include recommendations for future analysis, based on the research in R.
  • Explore the insights you can gain from this model and provide your interpretations when documenting your research.

• Bonus challenge:
  Compare the influential indicators in predicting the outcome depending on the country by creating separate models for each country (see the sketch after this section). Describe whether or not there were distinct differences in the contribution of the different predictors. Do not speculate when discussing the findings.

  Tip: An additional research question that meets the five criteria from the first lecture will bring this additional analysis into the focus of the research. The challenge does not replace the original research requirements for this assignment. If you were to complete the challenge, there would be three research questions.
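One way to structure the per-country comparison (a minimal sketch under the same assumptions as the Analyze sketch above: a cleaned df restricted to the research-question fields, with a Country factor and a No/Yes SOJobs outcome):

by_country <- split(df, df$Country)               # one data frame per country
rf_list <- lapply(by_country, function(d) {
  d <- droplevels(d[, names(d) != "Country"])     # Country is constant within each subset
  randomForest(SOJobs ~ ., data = d, ntree = 500, importance = TRUE)
})
lapply(rf_list, importance)                       # compare the importance rankings across countries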

• Required files to submit:
  1) Research paper in APA 7 format; MS Word document file type
  2) R script; final version with file type .r


• Important Information:
  • You will receive an email confirming the submission. If you receive that email, your submission has been received.
    o Any error you see at submission comes from SafeAssign.
    o SafeAssign does not recognize .r file types; the warning does not impact the submission.
  • The research paper must be written in a professional writing style, following the APA 7 student paper format; use the student paper template.
    o The document shall be 3-5 pages and at least 1,000 words. The page count does not include the cover page, tables, figures, or the reference page.
    o Ensure that every reference in the reference list is also cited in the text.
    o Do not forget to cite and reference the source of the data.
  • It is ill-advised to modify the problem statement and research question provided.
  • If the research problem or research questions are modified, the requirements of the analysis will not change, nor will the objective outlined in the original research question.
  • There are several different versions of this assignment. If the submitted work is in line with a different version than the one assigned, the submitted work is a demonstration of academic dishonesty. Do not share your work with peers. Do not accept work that you did not do.
  • Take a look at the rubric to get the best grade possible.

