Welcome to Data Analytics for Development

CVEN 5837 - Summer 2022

Lars Schöbitz

Welcome! 👋

Meet the lecturer

Lars Schöbitz (he/him)

  • Environmental Engineer
  • Open Science Specialist at ETH Zurich
  • Independent Instructor for Data Science with R
  • Twitter: @larnsce

Learning Goals (for the course)

  • Be familiar with the most commonly used qualitative and quantitative data collection methods and tools.
  • Be able to employ remote sensing data, in-situ data, and analysis tools to illustrate the utility of solutions for water, agriculture, disaster forecasting and relief, air quality, and global health.

Why are you here?

Pick an item

What does the item you have picked have to do with the reason for you being here?

Topics

  • Overview of qualitative and quantitative research methods and tools
  • The data science lifecycle
  • Data organization in spreadsheets
  • Exploratory data analysis using visualization
  • Concept of tidy data and data tidying
  • Data transformation and descriptive statistics
  • Data communication using the Quarto open-source scientific and technical publishing system

Learning Objectives (for this week)

  1. Learners can navigate the platforms that are used for the course
  2. Learners can render a file on RStudio Cloud in PDF format
  3. Learners can list the six elements of the data science lifecycle
  4. Learners can identify four components of a Quarto file (YAML, code chunk, R code, markdown)

Classroom tools

Live Coding Exercises

  • Instructor writes and narrates code out loud
  • Instructor explains elements and principles that are relevant
  • Code is displayed on second screen / split screen
  • Learners join by writing and executing the same code
  • Learners “code-along” with the instructor

Pair Programming Exercises

  • Two learners work together in a breakout session
  • One person (the driver) shares the screen and does the typing
  • The other person (the navigator) offers comments and suggestions
  • Roles get switched

Platforms and Tools

  • R
  • RStudio (Cloud)
  • tidyverse R Packages
  • Quarto publishing system

cven5837-ss22.github.io/website/ 🔖

RStudio Cloud

Screen setup

  • Who uses a second external screen?
  • If so, post “Yes” in the Zoom chat

Live Coding Exercise

live-01a-setup - RStudio Cloud Setup

  1. Head over to rstudio.cloud
  2. Create a free account if you do not have one yet
  3. Open the link that is posted to the Zoom chat
  4. Accept the invitation to join the cven5837-ss22 workspace
  5. Post “ready” to the Zoom chat when you are done

Break

10 minutes

Data Science Lifecycle

Think, Pair, Share

Question

  1. What is your mental model of the Data Science Lifecycle?
  • Think for 2 minutes
  • Pair with your neighbour for 4 minutes
  • Share your answer with the class

Deep End

Live Coding Exercise

live-01b-data-science-lifecycle - Data Science Lifecycle

  1. Head over to rstudio.cloud
  2. Open the workspace for the course (cven5837-ss22)
  3. Open “Projects”
  4. Open the “course-materials” project
  5. Follow along with me

Break

5 minutes

R

Packages

base R

sqrt(49)
sum(1, 2)
  • Functions come with R

R Packages

library(dplyr)
  • Installed once in the Console: install.packages("dplyr")
  • Loaded per script
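
The difference between installing and loading can be sketched as follows. This is a minimal illustration that assumes dplyr is already installed; the `n_distinct()` call is only there to show that a package function can also be reached with the `package::function()` form.

```r
# Install once per computer or cloud project (run in the Console,
# not inside a script), so the line is left commented out here:
# install.packages("dplyr")

# Load at the top of every script that uses the package
library(dplyr)

# A single function can also be called with the package prefix,
# which works even when library() has not been run
distinct_count <- dplyr::n_distinct(c("a", "b", "a"))
distinct_count
```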

Functions & Arguments

library(dplyr)
library(gapminder)  # provides the gapminder data frame

filter(.data = gapminder,
       year == 2007)
  • Function: filter()
  • Argument: .data =
  • Arguments following: year == 2007 (what to do with the data)
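
A minimal sketch of the same pattern on a small hand-made data frame, so that it runs without the gapminder data. The data frame `countries` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data, not taken from gapminder
countries <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(10, 12, 50, 55)
)

# .data names the data frame; each following argument is a condition
# that a row must satisfy to be kept
rows_2007 <- filter(.data = countries, year == 2007)
rows_2007

# Several conditions are combined with AND
rows_2007_large <- filter(.data = countries, year == 2007, pop > 20)
rows_2007_large
```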

Objects

library(dplyr)
library(gapminder)  # provides the gapminder data frame

gapminder_yr_2007 <- filter(.data = gapminder,
                            year == 2007)
  • Function: filter()
  • Argument: .data =
  • Arguments following: year == 2007 (what to do with the data)
  • Object: gapminder_yr_2007
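
Storing a result in an object makes it available for later steps. The sketch below uses a made-up data frame `countries` instead of gapminder; all names and values are illustrative only.

```r
library(dplyr)

# Hypothetical toy data, not taken from gapminder
countries <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(10, 12, 50, 55)
)

# Assignment stores the result and prints nothing
countries_2007 <- filter(.data = countries, year == 2007)

# Typing the object name prints its contents
countries_2007

# The stored object can be reused in further calculations
mean_pop_2007 <- mean(countries_2007$pop)
mean_pop_2007
```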

Operators

library(dplyr)
library(gapminder)  # provides the gapminder data frame

gapminder_yr_2007 <- gapminder |>
  filter(year == 2007)
  • Function: filter()
  • Argument: .data = (supplied by the pipe, here gapminder)
  • Arguments following: year == 2007 (what to do with the data)
  • Object: gapminder_yr_2007
  • Assignment operator: <-
  • Pipe operator: |>
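
The pipe passes the object on its left as the first argument of the function on its right. A minimal sketch comparing a nested call with a piped one, using a made-up data frame `countries` (hypothetical names and values); the native pipe `|>` needs R 4.1 or newer.

```r
library(dplyr)

# Hypothetical toy data, not taken from gapminder
countries <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(10, 12, 50, 55)
)

# Nested call: has to be read from the inside out
result_nested <- summarise(filter(countries, year == 2007),
                           total_pop = sum(pop))

# Piped call: reads top to bottom, one step per line
result_piped <- countries |>
  filter(year == 2007) |>
  summarise(total_pop = sum(pop))

# Both approaches give the same total
result_nested$total_pop
result_piped$total_pop
```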

Rules

Rules of dplyr functions:

  • First argument is always a data frame
  • Subsequent arguments say what to do with that data frame
  • Always return a data frame
  • Don’t modify in place
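
These rules can be checked with a short sketch. The data frame `countries` is made up for illustration; the point is that the input is unchanged after the call and the result only exists once it is assigned.

```r
library(dplyr)

# Hypothetical toy data, not taken from gapminder
countries <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2002, 2007, 2002, 2007),
  pop     = c(10, 12, 50, 55)
)

# Data frame in (first argument), data frame out
filtered <- filter(countries, year == 2007)
is.data.frame(filtered)

# The input is not modified in place: it still has all four rows
nrow(countries)
nrow(filtered)
```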

Course information

Weekly Structure

Monday: Learning reflections are due
Tuesday: Lecture
Wednesday: Feedback (grading) on assignment
Thursday: Student hours on Zoom (10 am to 12 pm CEST)
Friday: Homework assignment is due

Homework assignments

  • Weekly programming assignments
  • 75% of the total grade

Learning reflections

  • Reflections on the different class elements (lecture, homework assignment, readings)
  • Minimum 200 words
  • 25% of the total grade

Grading

grade percent
A+ 97
A 93
A- 90
B+ 87
B 83
B- 80
C+ 77
C 73
C- 70
D+ 67
D 63
D- 60
F 0

Late work policy

  • Accepted up to 2 working days after the deadline (25% penalty for each day)
    • Tuesday for homework assignments (-50%)
    • Wednesday for learning reflections (-50%)
  • Work handed in more than two working days after the due date will be graded 0%

Homework week 1

Homework due dates

  • All material on course website
  • Homework assignment due: Friday, 8th July
  • Learning reflection due: Monday, 11th July

Thanks! 🌻

Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

Access slides as PDF on GitHub

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.