Weather Data Import and Cleaning

Load Libraries for This Exercise

library("ChickpeaAscoDispersal")
library("tidyverse")
library("lubridate")
library("skimr")
library("ggpubr")
library("kableExtra")
library("scales")

theme_set(theme_pubclean(base_size = 14))

Weather data

Import and select only the weather data necessary for analysis from Curyo and Horsham weather stations. The original data are located in the “data” directory. Dates for events are recorded in the “data/Dispersal experiments dates.csv” file and are used to subset the weather data in this file.

Irrigation Amount Data

Dr J. Fanning (AgVic) tested the irrigation system on two separate days, 2020-03-17 and 2020-03-18. His e-mail and conclusions follow.

This is based on the following I have checked the irrigation schedules and I irrigated for 90 minutes each time for these experiments. When I irrigated it was usually below 10kph.

The team have run the test once yesterday morning and once this morning. It is difficult, as the wind has not been dropping down to 0-5kph and there is no wind breaks out there. Based on the forecast this will be the best we can get for at least the length of the forecast ahead.

First test showed 0.12mm per min, with 10-15kph wind

Second test showed 0.13mm per min with 10-20kph wind

Even with the variability I am confident in these figures as the wind is mainly changing where is being irrigated rather than the amount based on the testing. Less irrigation into the wind. We have a Northerly wind currently where as it was westerly when we irrigated so wind would have been blowing with the length of the sprinkler system so to speak. With the piping running East/West. The attached picture is orientated with North to the top of the page so shows the length running east/west which I feels reduces the variability in irrigation.

Based on this, we elected to use 11 mm as the amount of irrigation applied during each irrigated spread event at Horsham.
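The e-mail does not state how the 11 mm figure was derived. As a rough consistency check, and assuming each irrigation ran for the 90 minutes mentioned in the e-mail, the two measured rates can be converted to totals with the sketch below. The variable names are illustrative only.

```r
# Measured application rates from the two tests (mm per minute)
rates_mm_per_min <- c(test_1 = 0.12, test_2 = 0.13)

# Irrigation duration reported in the e-mail (minutes)
duration_min <- 90

# Total irrigation implied by each test (mm)
totals_mm <- rates_mm_per_min * duration_min
totals_mm

# Average of the two tests, and the same value rounded to a whole millimetre
mean(totals_mm)
round(mean(totals_mm))
```

The two tests imply roughly 10.8 mm and 11.7 mm, so the 11 mm used here sits within the range suggested by the measurements.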

This amount is added to the rainfall total of each Horsham irrigated event in the “Summarise Weather Data by Event” code chunk.

Curyo Weather Data

In this first step, the data are imported and only the date and time, average wind speed, average wind direction and rainfall are selected. A Location column is added to identify the weather station.

Import Curyo Weather Data

Curyo_w <-
   read_csv(
      system.file(
         "extdata",
         "Curyo_SPA_2019_weather.csv",
         package = "ChickpeaAscoDispersal",
         mustWork = TRUE
      )
   ) %>%
   select(Time,
          'Wind Speed - average (km/h)',
          'Wind Direction - average (º)',
          "Rainfall - (mm)") %>%
   mutate(Time = dmy_hm(Time)) %>%
   mutate(Location = "Curyo") %>%
   select(Location, everything())

Inspect the Curyo Weather Data

skim(Curyo_w)

Data summary
Name	Curyo_w
Number of rows	43473
Number of columns	5
_______________________
Column type frequency:
character	1
numeric	3
POSIXct	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Location	0	1	5	5	0	1	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p25	p50	p75	p100	hist
Wind Speed - average (km/h)	1	9.33	7.46	3.85	7.78	13.46	55.30	▇▃▁▁▁
Wind Direction - average (º)	1	165.15	142.04	7.85	158.55	307.60	359.89	▇▁▁▃▆
Rainfall - (mm)	1	0.01	0.07	0.00	0.00	0.00	3.80	▇▁▁▁▁

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
Time	0	1	2019-01-22 12:40:00	2019-12-06 14:30:00	2019-07-01 18:00:00	43473

Horsham Weather Data

Import Horsham Weather Data

Horsham_w <-
   read_csv(
      system.file(
         "extdata",
         "Horsham_SPA_2019_weather.csv",
         package = "ChickpeaAscoDispersal",
         mustWork = TRUE
      )
   ) %>%
   select(Time,
          'Wind Speed - average (km/h)',
          'Wind Direction - average (º)',
          "Rainfall - (mm)") %>%
   mutate(Time = dmy_hm(Time)) %>%
   mutate(Location = "Horsham") %>%
   select(Location, everything())

Inspect Horsham Weather Data

skim(Horsham_w)

Data summary
Name	Horsham_w
Number of rows	38705
Number of columns	5
_______________________
Column type frequency:
character	1
numeric	3
POSIXct	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Location	0	1	7	7	0	1	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p25	p50	p75	p100	hist
Wind Speed - average (km/h)	1	8.16	6.98	2.77	6.37	12.13	49.39	▇▃▁▁▁
Wind Direction - average (º)	1	164.14	133.97	15.19	160.41	292.59	359.89	▇▂▁▅▆
Rainfall - (mm)	1	0.01	0.07	0.00	0.00	0.00	6.80	▇▁▁▁▁

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
Time	0	1	2019-01-01	2019-12-06 14:30:00	2019-06-10 07:20:00	38705

Merge and Filter the Data for Events

Create the Event Data

Event data have dates and times when trap plants were deployed, retrieved and assessed for each event.

events <-
   read_csv(
      system.file(
         "extdata",
         "Dispersal_experiment_dates.csv",
         package = "ChickpeaAscoDispersal",
         mustWork = TRUE
      )
   ) %>%
   mutate(`assessment date` = dmy(`assessment date`)) %>%
   mutate(exposed = interval(`time out`, `time removed`))

kable(events, format = "html", table.attr = 'class="table table-hover"')

site	rep	time out	time removed	assessment date	exposed
Horsham irrigated	1	2019-10-09 18:00:00	2019-10-10 18:00:00	2019-10-24	2019-10-09 18:00:00 UTC–2019-10-10 18:00:00 UTC
Horsham irrigated	2	2019-10-14 18:00:00	2019-10-15 18:00:00	2019-10-31	2019-10-14 18:00:00 UTC–2019-10-15 18:00:00 UTC
Horsham irrigated	3	2019-11-06 18:00:00	2019-11-07 18:00:00	2019-11-21	2019-11-06 18:00:00 UTC–2019-11-07 18:00:00 UTC
Horsham dryland	1	2019-10-15 08:00:00	2019-10-17 18:00:00	2019-10-31	2019-10-15 08:00:00 UTC–2019-10-17 18:00:00 UTC
Horsham dryland	2	2019-11-01 08:00:00	2019-11-08 18:00:00	2019-11-22	2019-11-01 08:00:00 UTC–2019-11-08 18:00:00 UTC
Curyo	1	2019-10-15 08:00:00	2019-10-17 18:00:00	2019-10-31	2019-10-15 08:00:00 UTC–2019-10-17 18:00:00 UTC

Filter and Merge the Locations’ Data

Filter the weather data to keep only the observations that fall within an event's exposure interval. Because events overlap at Horsham, the dryland and irrigated sites are handled separately first, then combined. To do this, first filter(), then use case_when() to match the dates and times with the events data frame and create new variables giving the site and replicate, which together identify an event.

Horsham Irrigated

Horsham_irrg <-
   Horsham_w %>%
   filter(Time %within% events[1, "exposed"] |
             Time %within% events[2, "exposed"] |
             Time %within% events[3, "exposed"]) %>%
   mutate(
      Location = case_when(
         Time %within% events[1, "exposed"] ~ events[[1, "site"]],
         Time %within% events[2, "exposed"] ~ events[[2, "site"]],
         Time %within% events[3, "exposed"] ~ events[[3, "site"]]
      )
   ) %>%
   mutate(
      Rep = case_when(
         Time %within% events[1, "exposed"] ~ events[[1, "rep"]],
         Time %within% events[2, "exposed"] ~ events[[2, "rep"]],
         Time %within% events[3, "exposed"] ~ events[[3, "rep"]]
      )
   ) %>%
   rename(site = Location, rep = Rep, time = Time) %>%
   select(site, rep, time, everything())

Horsham Rain

Horsham_rain <-
   Horsham_w %>%
   filter(Time %within% events[4, "exposed"] |
             Time %within% events[5, "exposed"]) %>%
   mutate(Location = case_when(Time %within% events[4, "exposed"] ~ events[[4, "site"]],
                               Time %within% events[5, "exposed"] ~ events[[5, "site"]])) %>%
   mutate(Rep = case_when(Time %within% events[4, "exposed"] ~ events[[4, "rep"]],
                          Time %within% events[5, "exposed"] ~ events[[5, "rep"]])) %>%
   rename(site = Location, rep = Rep, time = Time) %>%
   select(site, rep, time, everything())

Curyo Rain

Curyo_rain <-
   Curyo_w %>%
   filter(Time %within% events[which(events$site == "Curyo"), "exposed"]) %>%
   mutate(Location = case_when(Time %within% events[which(events$site == "Curyo"),
                                                    "exposed"] ~ "Curyo")) %>%
   mutate(Rep = case_when(Time %within% events[which(events$site == "Curyo"), "exposed"] ~
                             events[[which(events$site == "Curyo"), "rep"]])) %>%
   rename(site = Location, rep = Rep, time = Time) %>%
   select(site, rep, time, everything())

weather <- bind_rows(Curyo_rain, Horsham_irrg, Horsham_rain)
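To illustrate why overlapping events are tagged separately before being combined, here is a minimal base R sketch. It is not the code used in this analysis: weather_demo, events_demo and tag_event() are hypothetical names, the six-hourly readings are made up, and the two events reuse the exposure times of Horsham irrigated replicate 2 and Horsham dryland replicate 1 from the table above. The comparison includes both end points of each exposure period.

```r
# Hypothetical miniature weather log: one reading every 6 hours
weather_demo <- data.frame(
  time = seq(
    as.POSIXct("2019-10-14 12:00:00", tz = "UTC"),
    as.POSIXct("2019-10-16 12:00:00", tz = "UTC"),
    by = "6 hours"
  )
)

# Two overlapping events, with times taken from the events table
events_demo <- data.frame(
  site = c("Horsham irrigated", "Horsham dryland"),
  rep = c(2, 1),
  time_out = as.POSIXct(c("2019-10-14 18:00:00", "2019-10-15 08:00:00"),
                        tz = "UTC"),
  time_removed = as.POSIXct(c("2019-10-15 18:00:00", "2019-10-17 18:00:00"),
                            tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep the readings inside one event's exposure period and label them
tag_event <- function(i) {
  ev <- events_demo[i, ]
  inside <- weather_demo$time >= ev$time_out &
    weather_demo$time <= ev$time_removed
  out <- weather_demo[inside, , drop = FALSE]
  out$site <- rep(ev$site, nrow(out))
  out$rep <- rep(ev$rep, nrow(out))
  out
}

# Tag each event on its own, then stack the results
tagged <- do.call(rbind, lapply(seq_len(nrow(events_demo)), tag_event))

# Readings that fall in both events appear once per event
table(tagged$site)
sum(duplicated(tagged$time))
```

A single case_when() over all events would give each reading only the label of the first matching event, so readings shared by two overlapping events would be lost from the second. Tagging each site separately and then stacking keeps one copy per event.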

Rename Columns and Add Other Calculations

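The Wind Speed - average (km/h) column is converted to metres per second and renamed wind_speed. The wind direction and rainfall columns are renamed wind_direction and rainfall. The columns are then ordered as site, rep and time, followed by the weather variables; the rows are sorted by site, rep and time; and site and rep are converted to factors. Summary statistics, such as the standard deviation of the wind speed, are calculated later when the data are summarised by event.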

cleaned_weather <-
   weather %>%
   mutate(`Wind Speed - average (km/h)` = `Wind Speed - average (km/h)` /
             3.6) %>%
   rename(wind_speed = `Wind Speed - average (km/h)`) %>%
   rename(wind_direction = `Wind Direction - average (º)`) %>%
   rename(rainfall = `Rainfall - (mm)`) %>%
   select(site, rep, time, everything()) %>%
   arrange(site, rep, time) %>%
   mutate(across(c(site, rep), factor))

glimpse(cleaned_weather)

## Rows: 2,202
## Columns: 6
## $ site           <fct> Curyo, Curyo, Curyo, Curyo, Curyo, Curyo, Curyo, Curyo,…
## $ rep            <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ time           <dttm> 2019-10-15 08:00:00, 2019-10-15 08:10:00, 2019-10-15 0…
## $ wind_speed     <dbl> 3.37, 3.44, 3.30, 3.49, 3.69, 3.73, 3.61, 3.32, 3.40, 3…
## $ wind_direction <dbl> 350.537678, 357.810346, 357.167889, 357.911110, 355.975…
## $ rainfall       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
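The conversion above divides by 3.6 because one metre per second equals 3.6 kilometres per hour (3600 seconds per hour divided by 1000 metres per kilometre). A small sketch, using a hypothetical helper name kmh_to_ms(), makes the scale of the converted values easier to judge:

```r
# Hypothetical helper: convert wind speed from km/h to m/s
kmh_to_ms <- function(kmh) {
  kmh / 3.6
}

# 3.6 km/h is 1 m/s and 36 km/h is 10 m/s
kmh_to_ms(c(3.6, 36))

# The 10 km/h wind speed mentioned in the irrigation e-mail, in m/s
kmh_to_ms(10)
```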

Summarise Weather Data by Event

The weather data are now ready for summarising for each event that occurred. The resulting columns are:

mws - mean wind speed in metres per second
ws_sd - standard deviation of the wind speed in metres per second
max_ws - maximum wind speed in metres per second
min_ws - minimum wind speed in metres per second
mwd - mean wind direction in degrees, calculated with circular.averaging()
sum_rain - total rainfall plus irrigation (if applicable) during the event, in millimetres

summary_weather <-
   cleaned_weather %>%
   group_by(site, rep) %>%
   summarise(
      mws = mean(wind_speed),
      ws_sd = sd(wind_speed),
      max_ws = max(wind_speed),
      min_ws = min(wind_speed),
      mwd = circular.averaging(wind_direction),
      sum_rain = sum(rainfall)
   ) %>%
   mutate(# add the 11 mm of irrigation to the summary
      sum_rain =
         case_when(site == "Horsham irrigated" ~ sum_rain + 11,
                   TRUE ~ sum_rain))

## `summarise()` has grouped output by 'site'. You can override using the
## `.groups` argument.

kable(summary_weather,
      align = "c",
      caption = "Summary weather data for replicated rain event (spread event) per unique site.")

Summary weather data for replicated rain event (spread event) per unique site.
site	rep	mws	ws_sd	max_ws	min_ws	mwd	sum_rain
Curyo	1	3.577163	2.2302100	9.41	0.00	353.5045	0.8
Horsham dryland	1	3.085387	1.9379332	7.61	0.00	311.0201	4.6
Horsham dryland	2	3.996127	2.7133595	11.24	0.00	332.3283	18.6
Horsham irrigated	1	1.902828	0.9050266	4.01	0.35	234.6542	11.0
Horsham irrigated	2	2.233103	1.2895001	4.16	0.00	272.7072	11.0
Horsham irrigated	3	6.531586	2.1241262	11.24	2.75	333.6699	11.6
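The mean wind direction (mwd) is calculated with circular.averaging() rather than mean() because direction is an angle that wraps around at 360 degrees. The sketch below shows the general idea using a unit-vector mean. It is an illustration only: circular_mean_deg() is a hypothetical helper and is not claimed to be how circular.averaging() is implemented.

```r
# Hypothetical helper, not the package's circular.averaging():
# average the unit vectors of the directions and convert back to degrees
circular_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  mean_rad <- atan2(mean(sin(rad)), mean(cos(rad)))
  (mean_rad * 180 / pi) %% 360
}

# Two northerly readings, one either side of 0 degrees
wind_dirs <- c(350, 10)

# The arithmetic mean points due south, which is misleading
mean(wind_dirs)

# The circular mean points north (0 or 360 degrees, up to rounding error)
circular_mean_deg(wind_dirs)
```

This matters for the events summarised above, where most mean directions are north-westerly to northerly and individual readings can fall on either side of 0 degrees.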

Save Weather Data

Save weather data for use in visualisation and modelling. This only needs to be done once.

save(cleaned_weather, file = "./data/cleaned_weather.rda")
save(summary_weather, file = "./data/summary_weather.rda")

A.H. Sparks and P. Melloy

2024-05-04