Outline for Assignment 2

Using the Documenting Research Guide and the Assignment 2 Instructions, develop your outline.


When you begin working with the data, use the document Assignment 2: Finding the right data to get started.

Submit the outline as an MS Word document. Use the APA 7 standards for all citations and references in the outline. Ensure that the document includes your name; do not include your student identification number. Include a cover page with your name; you may use the Student Paper Template's cover page.

Both the Documenting Research Guide and the Student Paper Template can be found in the Useful Information folder in the content area.

Submit your outline on or before Sunday at 11:59 PM Eastern Time.

By submitting this paper, you agree: (1) that you are submitting your paper to be used and stored as part of the SafeAssign™ services in accordance with the Blackboard Privacy Policy; (2) that your institution may use your paper in accordance with UC's policies; and (3) that the use of SafeAssign will be without recourse against Blackboard Inc. and its affiliates.


Research Assignment 2: Finding the right data

This document is provided to help you work through this robust research question. It offers insight into how to address these fields by working with three of the fields, along with some tips for working with the data in R. Work through the fields one piece at a time, shown here in bold.

This research question requires several filters to obtain the secondary data sample that is necessary for this research. Consider the field that represents whether a survey respondent has used the SO job board or is aware of the board but has never used it. Once you have identified the variable that represents this field, what unique values can be found in it?

The question limits the scope of the data sample to respondents that use the job board and those that are aware of it but have never used it. We don't have enough information yet! What is the survey question associated with these survey answers?

Can you determine which unique answers or values you need to keep now?

Respondents that use it: how would they answer this survey question? Yes
Respondents that don't use it, but know about the job board? No, I knew…

When you have lengthy character fields, it can get cumbersome to isolate the strings. If even one letter is out of place, your filter will not perform as expected. Using the entire data frame of 88,883 observations, the first chunk below returns zero observations; the second chunk returns 75,532 observations.

While the second option works, you may find that method slow to type and prone to typing errors. How about this approach?

The function str_detect(), or string detect, searches a character string for the specified contents.

To keep it simple, don't use the first word or last word of the string. Want to know why, or want to know more? Email me or use help in RStudio.

What are the most influential features when predicting whether a survey respondent has used the SO job board or is aware of the board but has never used it when considering respondents who reported residing in the country …(your countries listed here); reported their age as somewhere between 18 and 65 years old; and that indicated that they were either not at all, somewhat, or very confident in their manager; reported an undergraduate major in either an engineering field, information systems, or web design, or statistics; in addition to the responses these respondents reported regarding employment; how often the respondent contributes to open source; and whether or not they code for a hobby; when the respondent indicated that the number of years they have been coding is somewhere within one to 49 years, using the data from SO (2019)?

> unique(df$SOJobs)
[1] "No, I didn't know that Stack Overflow had a job board"
[2] "No, I knew that Stack Overflow had a job board but have never used or visited it"
[3] "Yes"
[4] NA

Have you ever used or visited Stack Overflow Jobs?

df <- filter(df, # 'and' statement
             SOJobs == "Yes",
             SOJobs == "No, I knew that Stack Overflow had a job board but have never used or visited it")

df <- filter(df, # 'or' statement
             SOJobs == "Yes" |
             SOJobs == "No, I knew that Stack Overflow had a job board but have never used or visited it")

df <- filter(df, str_detect(SOJobs, "knew") | SOJobs == "Yes")
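A quick way to check that the str_detect() approach kept only the two intended responses (a minimal sketch; the 75,532 figure is the count quoted above for the full 88,883-observation data frame):

table(df$SOJobs, useNA = "ifany") # only "Yes" and the "No, I knew..." response should remain
nrow(df)                          # starting from the full data set, this should again be 75,532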


Let's look at another field from the research question: reported an undergraduate major in either an engineering field, information systems, or web design, or statistics.

What unique values exist for this field?

Do you need the survey question? Maybe not. You do need to capture the questions in the sample section. Here I've highlighted the words coinciding with the research question and truncated the list.

What's the best method to approach this? Think about how you could use the string detect function here. When you pick the words to detect, consider all the unique values. How can you validate that the filter worked as you intended? Try using the function table() before and after your filter. The function table() will alphabetize your unique field names, so they won't appear in the same order as they did with unique(). Check the remaining fields and how often each of them occurs in the data.

    > unique(df$UndergradMajor)
    [1]
    [2] Web development or web design
    [3] Computer science, computer engineering, or software engineering
    [4] Mathematics or statistics
    [5] Another engineering discipline (ex. civil, electrical, mechanical)
    [6] Information systems, information technology, or system administration
    [7] A business discipline (ex. accounting, finance, marketing)
    [8] A natural science (ex. biology, chemistry, physics)
    [9] A social science (ex. anthropology, psychology, political science)
    [10] A humanities discipline (ex. literature, history, philosophy)
    [11] Fine arts or performing arts (ex. graphic design, music, studio art)
    [12] A health science (ex. nursing, pharmacy, radiology)
    [13] I never declared a major

    > unique(df$UndergradMajor)
    [1]
    [2] Web development or web design
    [3] Computer science, computer engineering, or software engineering
    [4] Mathematics or statistics
    [5] Another engineering discipline (ex. civil, electrical, mechanical)
    [6] Information systems, information technology, or system administration
    [7] A business discipline (ex. accounting, finance, marketing)

df <- df %>%
  filter(str_detect(UndergradMajor, "engineering") |
         str_detect(UndergradMajor, "information") |
         str_detect(UndergradMajor, "statistics") |
         str_detect(UndergradMajor, "web")) %>%
  droplevels() # droplevels() is necessary when dropping factor levels

    > table(df$UndergradMajor)
    A business discipline (ex. accounting, finance, marketing)
    1841
    A health science (ex. nursing, pharmacy, radiology)
    323
    A humanities discipline (ex. literature, history, philosophy)
    1571
    A natural science (ex. biology, chemistry, physics)
    3232
    A social science (ex. anthropology, psychology, political science)
    1352
    Another engineering discipline (ex. civil, electrical, mechanical)
    6222
    Computer science, computer engineering, or software engineering

    > table(df$UndergradMajor)
    Another engineering discipline (ex. civil, electrical, mechanical)
    6222
    Computer science, computer engineering, or software engineering
    47214
    Information systems, information technology, or system administration
    5253
    Mathematics or statistics
    2975
    Web development or web design
    3422
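If you would rather validate programmatically than scan the table() output (a minimal sketch; the keywords mirror the filter above, and as.character() simply guards against the field being a factor):

# every remaining UndergradMajor value should contain at least one of the filter keywords
all(str_detect(as.character(df$UndergradMajor), "engineering|information|statistics|web"))
# TRUE means no unintended majors survived the filter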


When you think it's a field with numeric values, but it isn't. From the research question: when the respondent indicated that the number of years they have been coding is somewhere within one to 49 years.

Pay attention to what these fields contain. If the data type is a character or factor field, you cannot use numeric values to filter. Look at what happens here when working with the entire data set.

Technically, both calls try to filter for the same information. Is either one correct? Nope. What unique values exist?

Do you see the quotation marks? R is interpreting every value here as a string, not a number.

Which function call is correct?

The first returns 87,938. The second returns 86,445. What's happening here? Because these filters use != (does not equal), you have to consider how that impacts whether you use a comma or the vertical pipe |. For this example, you can see the difference when you use unique(), which shows the number of unique values.

An odd set of outcomes, right? How did it go from 53 to 52 and from 53 to 50? There were two labels filtered out in both function calls, yet neither removed exactly two unique values.

Both the 'or' and the 'and' statements removed NA. The 'or' statement did not remove either string completely. The 'and' statement is needed here. The field still has to be converted to a numeric type, then filtered for the range in the research question.

> nrow(filter(df, YearsCode >= 1, YearsCode <= 49))
[1] 59127
> nrow(filter(df, YearsCode != "More than 50 years", YearsCode != "Less than 1 year"))
[1] 86445

> unique(df$YearsCode)
 [1] "4"                  NA                   "3"                  "16"
 [5] "13"                 "6"                  "8"                  "12"
 [9] "2"                  "5"                  "17"                 "10"
[13] "14"                 "35"                 "7"                  "Less than 1 year"
[17] "30"                 "9"                  "26"                 "40"
[21] "19"                 "15"                 "20"                 "28"
[25] "25"                 "1"                  "22"                 "11"
[29] "33"                 "50"                 "41"                 "18"
[33] "34"                 "24"                 "23"                 "42"
[37] "27"                 "21"                 "36"                 "32"
[41] "39"                 "38"                 "31"                 "37"
[45] "More than 50 years" "29"                 "44"                 "45"
[49] "48"                 "46"                 "43"                 "47"
[53] "49"

df <- filter(df, YearsCode != "Less than 1 year" |  # using an 'or' statement
                 YearsCode != "More than 50 years")

df <- filter(df, YearsCode != "Less than 1 year",   # using an 'and' statement
                 YearsCode != "More than 50 years")

> length(unique(df$YearsCode)) # unfiltered original data
[1] 53
> length(unique(df$YearsCode)) # filtered with the 'or' statement above
[1] 52
> length(unique(df$YearsCode)) # filtered with the 'and' statement above
[1] 50
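Why does the 'or' statement keep both strings? A small illustration with a made-up vector (not the survey data) shows the logic: a value equal to one of the two strings still passes the other != test, so | lets it through, while the comma (an 'and') requires both tests to pass.

x <- c("4", "Less than 1 year", "More than 50 years", NA)
x != "Less than 1 year" | x != "More than 50 years" # TRUE TRUE TRUE NA -> both strings kept
x != "Less than 1 year" & x != "More than 50 years" # TRUE FALSE FALSE NA -> both strings dropped
# filter() treats an NA condition as FALSE, which is why both versions drop the NA rows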


What range of data is available for this field, which represents the number of years the respondent has been programming, after the 'and' statement?

There is one more step you need to take before you're done with this field. You changed it; now validate those last changes. How do you know it's correct?

Beyond the three variables shown in this document so far, if you run into trouble trying to model this data in your analysis, there are two things you can look for.

The first: is your data a regular data frame, not a tbl_df or tibble? Convert it to a data frame with df <- as.data.frame(df).

The second: did you validate the changes you made? What does a summary of your data look like? How many levels do your factors have? Is it clean? If the function summary() returns factor levels with a count of 0, your data is not clean!
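A quick structural check covering both points might look like this (a minimal sketch using base R functions; df is the working data frame from the steps above):

class(df)                              # should be "data.frame" only, not "tbl_df"/"tibble"
df <- as.data.frame(df)                # convert it if needed
summary(df)                            # scan for factor levels with a count of 0
sapply(Filter(is.factor, df), nlevels) # base R Filter(), not dplyr::filter(); levels per factor column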

If you were filtering for Australia, Russia, and the Netherlands, along with only full-time and part-time workers, and the use of summary() returned the following, your data is not clean! If you attempt to train a random forest model on unclean data, as shown, it could take hours to process and may finish with an error. Clean the data first. Empty levels? Try this:

The final caveat: it will be a lot easier to read the confusion matrices if you change the labels of the outcome variable before training your model.

> range(df$YearsCode %>% as.numeric())
[1] 1 50 # after the 'and' statement, the field is 1:50
         # the field needs to be permanently changed to numeric
         # the field will need to be filtered again

> df$YearsCode <- as.numeric(df$YearsCode)
> df <- filter(df, YearsCode >= 1, YearsCode <= 49)
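To validate those last changes (a minimal sketch; the 1-to-49 bounds come from the research question):

class(df$YearsCode)                                     # should now be "numeric"
range(df$YearsCode, na.rm = TRUE)                       # should be 1 49 after the final filter
sum(df$YearsCode < 1 | df$YearsCode > 49, na.rm = TRUE) # should be 0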

    Employment Country
    Employed full-time :2252 Australia :845
    Employed part-time : 108 Russian Federation:764
    Independent contractor, freelancer, or self-employed: 0 Netherlands :751
    Not employed, and not looking for work : 0 Afghanistan : 0
    Not employed, but looking for work : 0 Albania : 0
    Retired : 0 Algeria : 0
    (Other) : 0

> df <- droplevels(df) # no more empty factor levels!
> summary(df) # after dropping the empty factor levels, summary returns this

Employment                 Country
Employed full-time:2252    Australia         :845
Employed part-time: 108    Netherlands       :751
                           Russian Federation:764

> levels(df$SOJobs) # filtered SOJobs for analysis has two levels
[1] "No, I knew that Stack Overflow had a job board but have never used or visited it"
[2] "Yes"

> levels(t2$SOJobs) <- c("No","Yes") # change labels; order of labels matters
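Because the order of the labels matters, it can help to confirm the mapping before and after relabeling (a minimal sketch; t2 stands for whatever object you are relabeling, as in the line above):

levels(t2$SOJobs)                   # before: the "No, I knew..." level is listed first, so "No" will replace it
levels(t2$SOJobs) <- c("No", "Yes") # the first new label replaces the first existing level
table(t2$SOJobs)                    # after: the counts should be unchanged; only the labels differ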



Research Assignment 2

The Outline for Research Assignment 2 and Research Assignment 2 will use this document. Use the Documenting Research Guide to understand how to use the information in this document for either of these submissions. Ask questions, if needed!

• Problem:
  Employers' external job postings need to be posted to the one job board that targets their model candidate, so that only applicants who are perfect for the role apply. In reality, jobs are typically posted in numerous places, and both suitable and unsuitable candidates apply for the role. Using specific candidate characteristics and a specific job board, considering what may or may not influence the use of a specific job board will lead to better targeting of candidates, reducing redundant job postings, and decreasing the number of unfit candidates.

• Question 1:
  What are the most influential features when predicting whether a survey respondent has used the SO job board or is aware of the board but has never used it when considering respondents who reported residing in the country of Spain, Australia, or Brazil; reported their age as somewhere between 18 and 65 years old; and that indicated that they were either not at all, somewhat, or very confident in their manager; reported an undergraduate major in either an engineering field, information systems, or web design, or statistics; in addition to the responses these respondents reported regarding employment; how often the respondent contributes to open source; and whether or not they code for a hobby; when the respondent indicated that the number of years they have been coding is somewhere within one to 49 years, using the data from SO (2019)?

• Question 2:
  You are responsible for developing a second research question. This question must meet the criteria from Unit 1 Part 1. Additionally, it must relate to the problem statement. It does not have to use the same subset of data as the other research question. The analysis method must be one demonstrated in the lectures. When completing the outline, make sure to include both the given question and the well-developed, sound research question you have developed.

• Data:
  • The data and data dictionaries are online.
    o Note: The raw data in your program must be in its original form. Do not modify the data outside of the programming. Use the data dictionary to understand the data.
    o The data and data dictionary are downloaded together. When you visit the site, ensure you select the 2019 survey, and cite and reference the source in your work.
      ▪ Stack Overflow. (2019). Stack Overflow annual developer survey [Data set and code book]. https://insights.stackoverflow.com/survey/
  • Create a subset of data to represent the sample of secondary data in this analysis, based on the research questions.


• Data Cleaning:
  • Do not remove missing values during cleaning.
  • When changing an object or part of an object, validate that the change occurred as expected.
  • The steps taken in cleaning are not discussed in the research paper.

• Analyze:
  • When analyzing the given research question, you must use a random forest model (see the sketch after this list).
    o You must attempt to improve the model's performance with one of the methods covered in Unit 5.
    o The research question you write must make use of a method of analysis demonstrated in the lectures from this course.
    o Accuracy is not suitable in and of itself to determine the validity and reliability of the model.
  • The sub-stages of Analyze (profile, prepare, and apply) are necessary at least two times. This method is for programming, not documenting research.
  • Ensure you establish that the model is valid and reliable before discussing the influential indicators.
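As a rough orientation only (a minimal sketch, not the required workflow: it assumes the cleaned data frame df from the walkthrough above, already subset to the fields in the research question, with SOJobs relabeled to No/Yes, and it uses the randomForest package; your predictors, split, and any tuning must follow your own analysis and the Unit 5 material):

library(randomForest)

# assumes the modeling columns in df are clean factors/numerics with no remaining NAs
set.seed(1234)                                   # reproducible split
idx   <- sample(nrow(df), floor(0.7 * nrow(df))) # simple 70/30 train/test split
train <- df[idx, ]
test  <- df[-idx, ]

rf <- randomForest(SOJobs ~ ., data = train,     # outcome against the remaining predictors
                   ntree = 500, importance = TRUE)

pred <- predict(rf, newdata = test)
table(Predicted = pred, Actual = test$SOJobs)    # confusion matrix; report more than accuracy alone
importance(rf)                                   # candidate influential features
varImpPlot(rf)                                   # visual ranking of the predictors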

• Results section and discussion section:
  • Ensure that assertions and assessments in the results and discussion sections are derived from the analysis in R.
  • Do not speculate. Use evidence. When documenting the results, consider the generalizability.
  • Explain what was done to improve model performance in words: not programming functions, variable names, or argument names. Assume the reader cannot see the programming code or raw data but needs to understand what you did to improve the performance.

• Future recommendations:
  • Include recommendations for future analysis, based on the research in R.
  • Explore the insights you can gain from this model and provide your interpretations when documenting your research.

• Bonus challenge:
  Compare the influential indicators in predicting the outcome depending on the country by creating separate models for each country (see the sketch after this section). Describe whether or not there were distinct differences in the contribution of the different predictors. Do not speculate when discussing the findings.

  Tip: An additional research question that meets the five criteria from the first lecture will bring this additional analysis into the focus of the research. The challenge does not replace the original research requirements for this assignment. If you were to complete the challenge, there would be three research questions.
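One way to structure the per-country comparison (a minimal sketch under the same assumptions as the Analyze sketch above: a cleaned df restricted to the research-question fields, with a Country factor and a No/Yes SOJobs outcome):

by_country <- split(df, df$Country)               # one data frame per country
rf_list <- lapply(by_country, function(d) {
  d <- droplevels(d[, names(d) != "Country"])     # Country is constant within each subset
  randomForest(SOJobs ~ ., data = d, ntree = 500, importance = TRUE)
})
lapply(rf_list, importance)                       # compare the importance rankings across countries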

• Required files to submit:
  1) Research paper in APA 7 format; MS Word document file type
  2) R script; final version with file type .r


• Important Information:
  • You will receive an email confirming the submission. If you receive that email, your submission has been received.
    o Any error you see at submission comes from SafeAssign.
    o SafeAssign does not recognize .r file types; the warning does not impact the submission.
  • The research paper must be written in a professional writing style, following the APA 7 student paper format; use the student paper template.
    o The document shall be 3-5 pages and at least 1,000 words. The page count does not include the cover page, tables, figures, or the reference page.
    o Ensure that every reference in the reference list is also cited in the text.
    o Do not forget to cite and reference the source of the data.
  • It is ill-advised to modify the problem statement and research question provided.
  • If the research problem or research questions are modified, the requirements of the analysis will not change, nor will the objective outlined in the original research question.
  • There are several different versions of this assignment. If the submitted work is in line with a different version than the one assigned, the submitted work is a demonstration of academic dishonesty. Do not share your work with peers. Do not accept work that you did not do.
  • Take a look at the rubric to get the best grade possible.

