Data Visualisation & Data Organisation in Spreadsheets

CVEN 5837 - Summer 2022

Lars Schöbitz

Solving coding problems

Tips for search engines

  • Use actionable verbs that describe what you want to do
  • Be specific
  • Add R to the search query
  • Add the name of the R package to the search query
  • Scroll through the top 5 results (don’t just pick the first)

Example: “How to remove a legend from a plot in R ggplot2”

Stack Overflow

What is it?

  • The biggest support network for (coding) problems
  • Can be intimidating at first
  • Up-vote system


  • First, briefly read the question that was posted
  • Then, read the answer marked as “correct”
  • Then, read one or two more answers with high votes
  • Then, check out the “Linked” posts
  • Always give credit for the solution

Give credit

ggplot(data = global_waste_data_kg_year,
       mapping = aes(x = income_id, 
                     y = capita_kg_year,
                     color = income_id)) +
  ## Remove legend ref:
  theme(legend.position = "none")
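
The snippet above has no geom layer and relies on a data frame that is not shown here. A minimal self-contained sketch of the same idea, assuming made-up values in a hypothetical stand-in data frame, with the credit comment left as a placeholder for the link to the answer that was used:

```r
library(ggplot2)

# Hypothetical stand-in for global_waste_data_kg_year (all values are made up)
waste_example <- data.frame(
  income_id = factor(c("HIC", "HIC", "UMC", "UMC", "LMC", "LMC", "LIC", "LIC"),
                     levels = c("HIC", "UMC", "LMC", "LIC")),
  capita_kg_year = c(520, 610, 380, 410, 220, 260, 150, 170)
)

p_no_legend <- ggplot(data = waste_example,
                      mapping = aes(x = income_id,
                                    y = capita_kg_year,
                                    color = income_id)) +
  geom_boxplot() +
  # Remove legend, ref: <link to the answer you used>
  theme(legend.position = "none")

p_no_legend
```

The comment sits directly above the line it documents, so the credit stays attached to the borrowed code when the plot is edited later.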

Other sources for help

  • Our Google Space for the course
  • RStudio Community Forum:
  • Documentation websites:
  • Twitter community: #rstats

Minimal reproducible example (reprex)

  • Needed when asking questions online
  • Good support information:
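
A sketch of what a reprex could look like, assuming the question is about removing a legend. It loads the package it needs, uses a dataset that ships with R (mtcars) instead of private data, and contains only the code required to show the problem:

```r
# Load only the packages needed to reproduce the problem
library(ggplot2)

# Use a built-in dataset so that anyone can run the code
p <- ggplot(data = mtcars,
            mapping = aes(x = wt,
                          y = mpg,
                          color = factor(cyl))) +
  geom_point()

# Question: how do I remove the legend from this plot?
p
```

If the reprex package is installed, copying this code to the clipboard and calling reprex::reprex() renders the code together with its output in a form that can be pasted into a forum post.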


Learning Objectives (for this week)

  1. Learners can describe the four main aesthetic mappings that can be used to visualise data using the ggplot2 R Package
  2. Learners can control the colour scaling applied to a plot using colour as an aesthetic mapping
  3. Learners can compare three different geoms and their use case
  4. Learners can apply a theme to control font types and sizes within a plot
  5. Learners can apply 12 principles for data organisation in spreadsheets in the layout of a collected dataset

Exploratory Data Analysis with ggplot2

R Package ggplot2

  • ggplot2 is tidyverse’s data visualization package
  • gg in ggplot2 stands for Grammar of Graphics
  • Inspired by the book Grammar of Graphics by Leland Wilkinson
  • Documentation:
  • Book:

Code structure

  • ggplot() is the main function in ggplot2
  • Plots are constructed in layers
  • Structure of the code for plots can be summarized as
ggplot(data = [dataset], 
       mapping = aes(x = [x-variable], 
                     y = [y-variable])) +
   geom_xxx() +
   other options

Code structure

ggplot(data = gapminder_yr_2007)

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes()) 

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp))  

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp)) +

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp)) +
  geom_boxplot() +

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp)) +
  geom_boxplot() +
  theme_minimal(base_size = 14)
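
The slides do not show where gapminder_yr_2007 comes from. One plausible way to create it, assuming the gapminder and dplyr packages are installed and that the object is simply the gapminder data filtered to the year 2007:

```r
library(ggplot2)
library(dplyr)
library(gapminder)  # assumption: source of the gapminder data frame

# Keep only the observations for the year 2007
gapminder_yr_2007 <- filter(gapminder, year == 2007)

# Same layers as in the final build step above
p_lifeexp <- ggplot(data = gapminder_yr_2007,
                    mapping = aes(x = continent,
                                  y = lifeExp)) +
  geom_boxplot() +
  theme_minimal(base_size = 14)

p_lifeexp
```

Storing the plot in an object (here p_lifeexp, a name chosen for illustration) is optional; it makes it easier to add further layers or save the plot later.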

Live Coding Exercise: Reproduce this plot


  1. Head over to
  2. Open the workspace for the course (cven5837-ss22)
  3. Open “Projects”
  4. Open the “course-materials” project
  5. Follow along with me



Visualising numerical data

Types of variables


discrete variables

  • non-negative
  • whole numbers
  • e.g. number of students, roll of a die

continuous variables

  • infinite number of values
  • also dates and times
  • e.g. length, weight, size


categorical variables

  • finite number of values
  • distinct groups (e.g. EU countries, continents)
  • ordinal if levels have natural ordering (e.g. week days, school grades)
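
The variable types above map onto data types in R. A small sketch with made-up values (all object names are chosen for illustration):

```r
# Discrete: whole numbers, stored as integer (note the L suffix)
n_students <- c(12L, 25L, 31L)

# Continuous: can take any value within a range, stored as double
weight_kg <- c(61.5, 72.3, 80.0)

# Dates are also treated as continuous
sample_date <- as.Date(c("2022-07-04", "2022-07-05", "2022-07-06"))

# Categorical: a finite set of distinct groups, stored as factor
continent <- factor(c("Asia", "Europe", "Asia"))

# Ordinal: categorical with a natural ordering of the levels
grade <- factor(c("B", "A", "C"),
                levels = c("C", "B", "A"),
                ordered = TRUE)

class(n_students)    # "integer"
class(weight_kg)     # "numeric"
class(sample_date)   # "Date"
levels(continent)    # "Asia" "Europe"
grade[1] > grade[3]  # TRUE, because B ranks above C
```

The distinction matters for plotting: ggplot2 picks a continuous or a discrete scale depending on the type of the mapped variable.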

Data Collection Tools

  • Questionnaires for survey based data
  • Spreadsheets for manual experimental/observational data
  • Sensors for automated near real-time data

Survey tools

Commonly used in the Global Engineering and Development sector

  • KOBO Toolbox
  • mWater
  • OpenDataKit

Data Organisation in Spreadsheets

Read the paper (it’s part of your homework), but you can also:

  • Go through the annotated slides:
  • Watch Karl Broman give the talk (02:36 to 45:00):
  • Read the content on a website:

But above all, apply it to your data



Because it will make your life easier!
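
A few of the recommendations from the paper can be illustrated in R. The sketch below assumes a small made-up dataset and a hypothetical file name. It shows a rectangular layout with one row per observation, consistent column names without spaces, dates written as YYYY-MM-DD, an explicit NA instead of an empty cell, and storage as a plain-text CSV file:

```r
# Made-up observations in a rectangular layout: one row per observation
waste_samples <- data.frame(
  sample_id   = c("s001", "s002", "s003"),
  sample_date = c("2022-07-04", "2022-07-05", "2022-07-06"),  # YYYY-MM-DD
  location    = c("market", "school", "market"),
  weight_kg   = c(12.4, NA, 9.8)  # missing value marked explicitly
)

# Save as plain text; the file name is hypothetical
csv_path <- file.path(tempdir(), "waste_samples.csv")
write.csv(waste_samples, csv_path, row.names = FALSE)

# Read the file back and convert the date column to a proper Date
waste_read <- read.csv(csv_path)
waste_read$sample_date <- as.Date(waste_read$sample_date)
str(waste_read)
```

Data organised this way can be read into R without manual clean-up, which is the practical payoff of applying the principles during data collection.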

License? CC0 (!)

Screenshot taken on 2022-03-23

Pair Programming Exercise

Pair Programming Exercises

  • Two learners work together in a break out session
  • One person (the driver) shares the screen and does the typing
  • The other person (the navigator) offers comments and suggestions
  • Roles get switched


  1. Head over to
  2. Open the workspace for the course (cven5837-ss22)
  3. Open “Projects”
  4. Open the “course-materials” project

Homework week 2

Bring your own data

  • Generate data by doing a short survey or observational study
  • Find a dataset online that interests you
  • Use a dataset that you already have available

Homework due dates

  • All material on course website
  • Homework assignment due: Friday, 15th July
  • Learning reflection due: Monday, 18th July

Thanks! 🌻

Slides created via revealjs and Quarto: Access slides as PDF on GitHub

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.