By Charlie Joey Hadley | February 28, 2018
Citable Author
Charlie Joey Hadley
(orcid.org/0000-0002-3039-6849)
Citable Data
No real data used.
Surprisingly often I’m asked to help folks build maps which visualise the journeys between geographic locations, in the bad old days of early 2017 I would have used the geospheres
package and sp
.
That’s because the shortest distance between two points on a map is a great circle, and geospheres
is capable of computing such things. Note that it’s these great circles that you see on flight information screens, their exact curvature depends on the projection of the map.
However, I’m now an sf
convert and try to do everything within that package. In my original version of this blogpost I introduced my painfully inefficient way to compute these great circles with sf
and asked folks in the R4DS Slack group run by Jesse Maegan if they had any suggested improvements, which I’ve now baked in below.
Typically my datasets are collections of journeys, here’s an example of what the data looks like (these are some of the journeys I took during the 3 years I worked for Wolfram Research):
library("tidyverse")
library("sf")
raw_journey_data <- tribble(
~start.long, ~start.lat, ~end.long, ~end.lat, ~name,
-0.118092, 51.509865, -119.698189, 34.420830, "London to Santa Barbara",
31.496773, 30.026300, 24.753574, 59.436962, "New Cairo to Tallinn",
126.633333, 45.75, 46.738586, 24.774265, "Harbin to Riyadh"
)
raw_journey_data
I’ve split this post into two sections below:
- journeys_to_sf: A function for reliably converting journey data to an
sf
object that can be plotted withleaflet
andggplot2
- Creating
sf
features fromdata.frames
journeys_to_sf
Here’s my function for converting a data.frame
containing journeys to a usable "sf" "data.frame"
through the use of tidyeval:
journeys_to_sf <- function(journeys_data,
start_long = start.long,
start_lat = start.lat,
end_long = end.long,
end_lat = end.lat) {
quo_start_long <- enquo(start_long)
quo_start_lat <- enquo(start_lat)
quo_end_long <- enquo(end_long)
quo_end_lat <- enquo(end_lat)
journeys_data %>%
select(
!! quo_start_long,
!! quo_start_lat,
!! quo_end_long,
!! quo_end_lat
) %>%
transpose() %>%
map(~ matrix(flatten_dbl(.), nrow = 2, byrow = TRUE)) %>%
map(st_linestring) %>%
st_sfc(crs = 4326) %>%
st_sf(geometry = .) %>%
bind_cols(journeys_data) %>%
select(everything(), geometry)
}
This can very easily be inserted into a pipe chain for visualising with leaflet
:
library("leaflet")
raw_journey_data %>%
journeys_to_sf() %>%
st_segmentize(units::set_units(100, km)) %>%
leaflet() %>%
addTiles() %>%
addPolylines() %>%
addCircleMarkers(
lng = ~start.long,
lat = ~start.lat,
color = "green",
opacity = 1,
radius = 2
) %>%
addCircleMarkers(
lng = ~end.long,
lat = ~end.lat,
color = "red",
opacity = 1,
radius = 2
) %>%
addLegend(
colors = c(
"green",
"red"
),
labels = c(
"Journey start",
"Journey end"
)
)
A quick note to say, leaflet
is an awesome htmlwidget
for building interactive maps. If you’ve never heard of htmlwidgets
before, check out my blogpost; where’s the love for htmlwidgets?
Creating sf
features from data.frame
The sf
library allows R folks to work with GIS data using the very widely used Simple Features standard for working with 2D geometries. All sf
objects are constructed from a combination of 17 unique “simple features”, which are usefully summarised here.
Originally my blogpost used a quite inefficient method for converting data.frame
to sf
. Thankfully I’m a member of the excellent R4DS Slack group and I got great advice from Jesse Sadler and Kent Johnson. Here’s my old horrible method of doing things:
start_locs <- raw_journey_data %>%
select(
start.long,
start.lat
) %>%
setNames(c("long", "lat")) %>%
mutate(journey_id = row_number())
end_locs <- raw_journey_data %>%
select(
end.long,
end.lat
) %>%
setNames(c("long", "lat")) %>%
mutate(journey_id = row_number())
journey_data_linestrings <- start_locs %>%
bind_rows(end_locs) %>%
st_as_sf(coords = c("long", "lat")) %>%
group_by(journey_id) %>%
arrange(journey_id) %>%
summarise() %>%
st_cast("LINESTRING")
journey_data_linestrings <- st_set_crs(journey_data_linestrings, st_crs(4326))
journey_data <- raw_journey_data
st_geometry(journey_data) <- st_geometry(journey_data_linestrings)
journey_data
## Simple feature collection with 3 features and 5 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -119.6982 ymin: 24.77426 xmax: 126.6333 ymax: 59.43696
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 3 x 6
## start.long start.lat end.long end.lat name geometry
## <dbl> <dbl> <dbl> <dbl> <chr> <LINESTRING [°]>
## 1 -0.118 51.5 -120. 34.4 London … (-119.6982 34.42083, -0.1…
## 2 31.5 30.0 24.8 59.4 New Cai… (24.75357 59.43696, 31.49…
## 3 127. 45.8 46.7 24.8 Harbin … (46.73859 24.77426, 126.6…
Let’s go through the stages required and re-write my method to be more purrr
dependent:
- Create a list of
LINESTRING
s - Combine the list
LINESTRING
s into a collection ofsf
features - Join this collection of
sf
features with the original dataset
LINESTRINGs
We need LINESTRING
s for our great circles as these represent a “sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry”. As with all simple features, LINESTRING
s are constructed from matrices:
st_linestring(matrix(1:6, 3))
## LINESTRING (1 4, 2 5, 3 6)
Let’s extract the start and end coordinates from our raw_journeys_data
and convert them to a list
with purrr::transpose
:
raw_journey_data %>%
select(-name) %>%
transpose()
## [[1]]
## [[1]]$start.long
## [1] -0.118092
##
## [[1]]$start.lat
## [1] 51.50986
##
## [[1]]$end.long
## [1] -119.6982
##
## [[1]]$end.lat
## [1] 34.42083
##
##
## [[2]]
## [[2]]$start.long
## [1] 31.49677
##
## [[2]]$start.lat
## [1] 30.0263
##
## [[2]]$end.long
## [1] 24.75357
##
## [[2]]$end.lat
## [1] 59.43696
##
##
## [[3]]
## [[3]]$start.long
## [1] 126.6333
##
## [[3]]$start.lat
## [1] 45.75
##
## [[3]]$end.long
## [1] 46.73859
##
## [[3]]$end.lat
## [1] 24.77426
These lists can easily be converted to matrices using map
:
raw_journey_data %>%
select(-name) %>%
transpose() %>%
map(~ matrix(flatten_dbl(.), nrow = 2, byrow = TRUE))
## [[1]]
## [,1] [,2]
## [1,] -0.118092 51.50986
## [2,] -119.698189 34.42083
##
## [[2]]
## [,1] [,2]
## [1,] 31.49677 30.02630
## [2,] 24.75357 59.43696
##
## [[3]]
## [,1] [,2]
## [1,] 126.63333 45.75000
## [2,] 46.73859 24.77426
… and finally we can convert these to
list_of_linestrings <- raw_journey_data %>%
select(-name) %>%
transpose() %>%
map(~ matrix(flatten_dbl(.), nrow = 2, byrow = TRUE)) %>%
map(st_linestring)
list_of_linestrings
## LINESTRING (-0.118092 51.50986, -119.6982 34.42083)
## LINESTRING (31.49677 30.0263, 24.75357 59.43696)
## LINESTRING (126.6333 45.75, 46.73859 24.77426)
Collecting features
Eventually we will create an "sf" "data.frame"
which contains our LINESTRING
s in the geometry
column. We create a set of features for this column from our list of LINESTRING
s with the st_sfc
function. Note that we need to specify a coordinate system, the most obvious choice is the WGS84 coordinate reference system, that’s why we specify crs = 4326
:
list_of_linestrings %>%
st_sfc(crs = 4326)
## Geometry set for 3 features
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -119.6982 ymin: 24.77426 xmax: 126.6333 ymax: 59.43696
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
Now we have this set of features we create a collection of them in a "sf" "data.frame"
with st_sf
:
collection_sf <- list_of_linestrings %>%
st_sfc(crs = 4326) %>%
st_sf(geometry = .)
collection_sf
## Simple feature collection with 3 features and 0 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -119.6982 ymin: 24.77426 xmax: 126.6333 ymax: 59.43696
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## geometry
## 1 LINESTRING (-0.118092 51.50...
## 2 LINESTRING (31.49677 30.026...
## 3 LINESTRING (126.6333 45.75,...
Joining feature collections with data
As sf
collections store all of the GIS data in a column it’s very easy to combine this object with our original data - we use dplyr::bind_columns
collection_sf %>%
bind_cols(raw_journey_data)
## Simple feature collection with 3 features and 5 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -119.6982 ymin: 24.77426 xmax: 126.6333 ymax: 59.43696
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## geometry start.long start.lat end.long end.lat
## 1 LINESTRING (-0.118092 51.50... -0.118092 51.50986 -119.69819 34.42083
## 2 LINESTRING (31.49677 30.026... 31.496773 30.02630 24.75357 59.43696
## 3 LINESTRING (126.6333 45.75,... 126.633333 45.75000 46.73859 24.77426
## name
## 1 London to Santa Barbara
## 2 New Cairo to Tallinn
## 3 Harbin to Riyadh
We can now perform calculations on our dataset, for instance compute great circles with a resolution of 100km:
collection_sf %>%
bind_cols(raw_journey_data) %>%
st_segmentize(units::set_units(100, km)) %>%
leaflet() %>%
addTiles() %>%
addPolylines(label = ~name)
Sadly, there’s no good solution for adding arrow heads to these great circles - they would be heavily distorted by the map projection and would look ugly. An alternative solution to showing where journeys start and end is to use different coloured circle markers:
collection_sf %>%
bind_cols(raw_journey_data) %>%
st_segmentize(units::set_units(100, km)) %>%
leaflet() %>%
addTiles() %>%
addPolylines() %>%
addCircleMarkers(
lng = ~start.long,
lat = ~start.lat,
color = "green",
opacity = 1,
radius = 2
) %>%
addCircleMarkers(
lng = ~end.long,
lat = ~end.lat,
color = "red",
opacity = 1,
radius = 2
) %>%
addLegend(
colors = c(
"green",
"red"
),
labels = c(
"Journey start",
"Journey end"
)
)
Citable Author
Charlie Joey Hadley
(orcid.org/0000-0002-3039-6849)
Citable Data
No real data used.