Overview

In this take-home exercise, appropriate static and interactive statistical graphic methods are used to reveal the routines of the participants in the city of Engagement, Ohio USA.

The data is processed and prepared by using appropriate tidyverse, scales, viridis, lubridate, ggthemes. gridExtra, readxl, knitr, data.table and ViSiElse family of packages.

Getting Started

Before getting started, it is important to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('scales', 'viridis', 
             'lubridate', 'ggthemes', 
             'gridExtra', 'tidyverse', 
             'readxl', 'knitr',
             'data.table', 'ViSiElse','clock', 'janitor', 'plotly', 'ggiraph', 'patchwork', 'gganimate')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing Data

The code chunk below imports data provided by VAST Challenge 2022 into R by using read_csv() of readr package and saves it as a tibble data frame.

logs <- read_csv('data/ParticipantStatusLogs1.csv')

It is always a good practice to examine the imported data frame before further analysis is performed.

For example, kable() can be used to review the structure of the imported data frame.

kable(head(logs))

timestamp	currentLocation	participantId	currentMode	hungerStatus	sleepStatus	apartmentId	availableBalance	jobId	financialStatus	dailyFoodBudget	weeklyExtraBudget
2022-03-01	POINT (-2724.6277665310454 6866.2081834436985)	0	AtHome	JustAte	Sleeping	926	1286.5196	254	Stable	12	1104.3026
2022-03-01	POINT (-1526.9372331431534 5582.2951345645315)	1	AtHome	JustAte	Sleeping	928	860.5742	929	Stable	12	926.7144
2022-03-01	POINT (-1360.9905987829304 2108.804385379679)	2	AtHome	JustAte	Sleeping	291	1298.1845	348	Stable	16	848.8029
2022-03-01	POINT (-1558.517200825967 5600.664347152427)	3	AtHome	JustAte	Sleeping	1243	1180.6417	316	Stable	12	819.3254
2022-03-01	POINT (976.2409614204214 4574.575079082071)	4	AtHome	JustAte	Sleeping	194	-681.6506	177	Unstable	20	0.0000
2022-03-01	POINT (-1525.6957374012197 1994.5285187115571)	5	AtHome	JustAte	Sleeping	243	1103.2286	33	Stable	20	386.2154

Preparation of Data

To prepare the data, first we select the logs specific to participant 500 and 501.

logs_selected <- logs %>%
  filter(participantId == 500 | participantId == 501) %>%
  select(participantId, timestamp, currentMode)

logs_selected1 <- logs_selected %>%
  mutate(timestamp = ymd_hms(timestamp, quiet = TRUE )) %>%
  mutate(wkday = weekdays(timestamp)) %>%
  mutate(hour = hour(timestamp)) %>%
  mutate(date = date(timestamp)) %>%
  mutate(Start = timestamp) %>%
  mutate(End = timestamp + 5*60)

p1 <- logs_selected1 %>% 
  filter(date == '2022-03-01') 

p2 <- logs_selected1 %>% 
  filter(date == '2022-03-05')

Plotting the graph

Interesting to note that there’s a distinct difference in lifestyles between participant 500 and 501, even on a Tuesday, participant 501 is finding time for recreational activities and is intermitently travelling about via transport to different places throughout the day. Participant 500 on the other hand, is more sedentary, only travelling to and from work, and during lunch, before heading home after work.

ggplot(p1, aes(x=Start, xend=End, y=currentMode, yend=currentMode)) +
  geom_segment(aes(group = seq_along(start)), size=10)+
  xlab("Time") +
  ylab("Activity") +
  ggtitle("Typical Tuesday")+ 
  facet_wrap(~participantId, nrow=2)

No surprises that both participants spend most of their time on a Saturday doing recreational activities and travelling about. However, it is interesting to note that participant 501 doesn’t dine out on weekends.

ggplot(p2, aes(x=Start, xend=End, y=currentMode, yend=currentMode)) +
  geom_segment(aes(group = seq_along(start)), size=10)+
  xlab("Time") +
  ylab("Activity") +
  ggtitle("Saturday")+ 
  facet_wrap(~participantId, nrow=2)

The heatmap below gives a broader overview on how the participants allocate their time throughout the week.

p3 <- logs_selected1 %>%
  select(participantId, currentMode, wkday, date, Start, End) %>%
  group_by(participantId, currentMode, wkday, date) %>%
  summarise(count = n()*5)

ggplot(p3,
       aes(wkday,
           currentMode,
           fill = count)) +
  geom_tile(color = "white",
            size = 0.1) +
  theme_tufte(base_family = "Helvetica") +
  coord_equal() + 
  scale_fill_gradient(name = "Frequency of activity(by mins)",
                      low = "sky blue",
                      high = "dark blue") +
  labs(x = "Day",
       y = "Activity",
       title = "Frequency of activity by day") +
  theme(axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6),
axis.text.x = element_text(size = 8, angle = 90)) +
  facet_wrap(~participantId, ncol = 1)

Take-home Exercise 4

Overview

Getting Started

Importing Data

Preparation of Data

Plotting the graph