Take-home Exercise 4

In this Take-home Exercise, I will attempt to answer question 3 of Challenge 2 of the VAST Challenge 2022. I will reveal and describe the daily routines of 2 participants of Engagement, Ohio USA by using appropriate static and interactive statistical graphics methods.

Jeremiah Lam https://sg.linkedin.com/in/jeremiah-lam-6156238a (School of Computing and Information Systems)https://scis.smu.edu.sg/
2022-05-23

Overview

In this take-home exercise, appropriate static and interactive statistical graphic methods are used to reveal the routines of the participants in the city of Engagement, Ohio USA.

The data is processed and prepared by using appropriate tidyverse, scales, viridis, lubridate, ggthemes. gridExtra, readxl, knitr, data.table and ViSiElse family of packages.

Getting Started

Before getting started, it is important to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('scales', 'viridis', 
             'lubridate', 'ggthemes', 
             'gridExtra', 'tidyverse', 
             'readxl', 'knitr',
             'data.table', 'ViSiElse','clock', 'janitor', 'plotly', 'ggiraph', 'patchwork', 'gganimate')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing Data

The code chunk below imports data provided by VAST Challenge 2022 into R by using read_csv() of readr package and saves it as a tibble data frame.

logs <- read_csv('data/ParticipantStatusLogs1.csv')

It is always a good practice to examine the imported data frame before further analysis is performed.

For example, kable() can be used to review the structure of the imported data frame.

kable(head(logs))
timestamp currentLocation participantId currentMode hungerStatus sleepStatus apartmentId availableBalance jobId financialStatus dailyFoodBudget weeklyExtraBudget
2022-03-01 POINT (-2724.6277665310454 6866.2081834436985) 0 AtHome JustAte Sleeping 926 1286.5196 254 Stable 12 1104.3026
2022-03-01 POINT (-1526.9372331431534 5582.2951345645315) 1 AtHome JustAte Sleeping 928 860.5742 929 Stable 12 926.7144
2022-03-01 POINT (-1360.9905987829304 2108.804385379679) 2 AtHome JustAte Sleeping 291 1298.1845 348 Stable 16 848.8029
2022-03-01 POINT (-1558.517200825967 5600.664347152427) 3 AtHome JustAte Sleeping 1243 1180.6417 316 Stable 12 819.3254
2022-03-01 POINT (976.2409614204214 4574.575079082071) 4 AtHome JustAte Sleeping 194 -681.6506 177 Unstable 20 0.0000
2022-03-01 POINT (-1525.6957374012197 1994.5285187115571) 5 AtHome JustAte Sleeping 243 1103.2286 33 Stable 20 386.2154

Preparation of Data

To prepare the data, first we select the logs specific to participant 500 and 501.

logs_selected <- logs %>%
  filter(participantId == 500 | participantId == 501) %>%
  select(participantId, timestamp, currentMode)
logs_selected1 <- logs_selected %>%
  mutate(timestamp = ymd_hms(timestamp, quiet = TRUE )) %>%
  mutate(wkday = weekdays(timestamp)) %>%
  mutate(hour = hour(timestamp)) %>%
  mutate(date = date(timestamp)) %>%
  mutate(Start = timestamp) %>%
  mutate(End = timestamp + 5*60)
p1 <- logs_selected1 %>% 
  filter(date == '2022-03-01') 

p2 <- logs_selected1 %>% 
  filter(date == '2022-03-05')

Plotting the graph

Interesting to note that there’s a distinct difference in lifestyles between participant 500 and 501, even on a Tuesday, participant 501 is finding time for recreational activities and is intermitently travelling about via transport to different places throughout the day. Participant 500 on the other hand, is more sedentary, only travelling to and from work, and during lunch, before heading home after work.

ggplot(p1, aes(x=Start, xend=End, y=currentMode, yend=currentMode)) +
  geom_segment(aes(group = seq_along(start)), size=10)+
  xlab("Time") +
  ylab("Activity") +
  ggtitle("Typical Tuesday")+ 
  facet_wrap(~participantId, nrow=2)

No surprises that both participants spend most of their time on a Saturday doing recreational activities and travelling about. However, it is interesting to note that participant 501 doesn’t dine out on weekends.

ggplot(p2, aes(x=Start, xend=End, y=currentMode, yend=currentMode)) +
  geom_segment(aes(group = seq_along(start)), size=10)+
  xlab("Time") +
  ylab("Activity") +
  ggtitle("Saturday")+ 
  facet_wrap(~participantId, nrow=2)

The heatmap below gives a broader overview on how the participants allocate their time throughout the week.

p3 <- logs_selected1 %>%
  select(participantId, currentMode, wkday, date, Start, End) %>%
  group_by(participantId, currentMode, wkday, date) %>%
  summarise(count = n()*5)

ggplot(p3,
       aes(wkday,
           currentMode,
           fill = count)) +
  geom_tile(color = "white",
            size = 0.1) +
  theme_tufte(base_family = "Helvetica") +
  coord_equal() + 
  scale_fill_gradient(name = "Frequency of activity(by mins)",
                      low = "sky blue",
                      high = "dark blue") +
  labs(x = "Day",
       y = "Activity",
       title = "Frequency of activity by day") +
  theme(axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6),
axis.text.x = element_text(size = 8, angle = 90)) +
  facet_wrap(~participantId, ncol = 1)