networkD3
sankeyNetwork function
sankeyNetwork(
Links,
a data frame object with the links between the nodes
Nodes,
a data frame containing the node id and properties of the nodes
Source,
character string naming the network source variable in the Links data frame
Target,
character string naming the network target variable in the Links data frame
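Value,
character string naming the variable in the Links data frame that gives the size of each flow (drawn as the link width)
NodeID,
character string naming the variable in the Nodes data frame that holds the node names used as labels. The Source and Target values must be 0-indexed row positions in the Nodes data frame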
NodeGroup,
character string specifying the node groups in the Nodes data frame, used to color the nodes in the network
LinkGroup,
character string specifying the groups in the Links data frame, used to color the links in the network
units,
character string describing physical units (if any) for Value
colourScale,
character string specifying the categorical color scale for the nodes
fontSize,
numeric font size in pixels for the node text labels
fontFamily,
font family for the node text labels
nodeWidth,
numeric width of each node
nodePadding,
numeric; the vertical padding between nodes, which in turn influences the node heights
margin,
an integer or a named list/vector of integers for the plot margins. If using a named list/vector, the positions top, right, bottom, left are valid. If a single integer is provided, then the value will be assigned to the right margin
height,
numeric height for the network graph’s frame area in pixels
width,
numeric width for the network graph’s frame area in pixels
iterations,
numeric. Number of iterations in the diagram layout for computation of the depth (y-position) of each node
sinksRight,
boolean. If TRUE, the last nodes are moved to the right border of the plot
...
)
To save this object, the saveNetwork function can be used for the HTML version (saveNetwork(obj, "path/obj.html")). Then, the webshot function from the webshot package can be used to save it as a PNG (webshot("path/obj.html", "path/obj.png")).
Input data can be stored in 2 different formats: a connection data frame or an incidence matrix.
This post describes how to build a basic Sankey diagram from these 2 types of input.
A connection data frame lists all the connections one by one in a data frame. Usually you have a source and a target column. You can add a third column that gives further information for each connection, like the value of the flow.
This is the format you need to use the networkD3 library. Let’s build a connection data frame and represent it as a Sankey diagram:
# Libraries
library(networkD3)
library(dplyr)
# A connection data frame is a list of flows with intensity for each flow
links <- data.frame(
  source = c("group_A", "group_A", "group_B", "group_C", "group_C", "group_E"),
  target = c("group_C", "group_D", "group_E", "group_F", "group_G", "group_H"),
  value = c(2, 3, 2, 3, 1, 3)
)
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(
  name = c(as.character(links$source), as.character(links$target)) %>% unique()
)
# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat it.
# match() returns 1-based positions; networkD3 expects 0-based ids, hence the -1.
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1
# Make the Network
p <- sankeyNetwork(Links = links, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "value", NodeID = "name",
                   fontSize = 16, sinksRight = FALSE)
# Display the diagram
p
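Since the same name-to-id conversion is repeated in every example, it can be wrapped in a small helper. The sketch below uses base R only; prepare_sankey is a hypothetical name (it is not part of networkD3), and the column names source and target are assumed to match the links data frame above.

```r
# Hypothetical helper (not part of networkD3): build the node table and the
# zero-based source/target ids from a links data frame.
prepare_sankey <- function(links, source = "source", target = "target") {
  # Every distinct name appearing as a source or a target becomes a node
  node_names <- unique(c(as.character(links[[source]]),
                         as.character(links[[target]])))
  nodes <- data.frame(name = node_names, stringsAsFactors = FALSE)
  # match() gives 1-based row positions in nodes; subtract 1 for 0-based ids
  links$IDsource <- match(links[[source]], nodes$name) - 1
  links$IDtarget <- match(links[[target]], nodes$name) - 1
  list(links = links, nodes = nodes)
}

# Small illustrative data set
links <- data.frame(
  source = c("group_A", "group_A", "group_B"),
  target = c("group_C", "group_D", "group_E"),
  value  = c(2, 3, 2)
)
prepared <- prepare_sankey(links)
prepared$nodes
prepared$links
```

The two elements of the returned list can then be passed as the Links and Nodes arguments of sankeyNetwork, with Source = "IDsource" and Target = "IDtarget".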
Another example with additional graphic arguments and where the size of each node is displayed:
# Libraries
library(networkD3)
library(dplyr)
# A connection data frame is a list of flows with intensity for each flow
set.seed(123)
links <- data.frame(
  source = rep(c("group_A", "group_B", "group_C", "group_D"), each = 4),
  # Target names carry a trailing space so they stay distinct from the source nodes
  target = rep(c("group_A ", "group_B ", "group_C ", "group_D "), 4),
  value = sample(0:4, size = 16, replace = TRUE)
) %>% filter(value != 0)
links$source <- as.factor(links$source)
links$target <- as.factor(links$target)
# We add the size of each node
links$source_n <- NA
links$target_n <- NA
for (i in seq_len(nrow(links))) {
  links$source_n[i] <- paste0(links$source[i], ' (n=', sum(links$value[links$source == links$source[i]]), ')')
  links$target_n[i] <- paste0(links$target[i], ' (n=', sum(links$value[links$target == links$target[i]]), ')')
}
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(
  name_n = c(as.character(links$source_n), as.character(links$target_n)) %>% unique()
)
# Recover the bare group name by stripping the " (n=...)" suffix and the trailing space
nodes$name <- gsub(" \\(n=[0-9]+\\)", "", nodes$name_n)
nodes$name <- gsub(" $", "", nodes$name)
nodes$name <- as.factor(nodes$name)
# Flag target nodes (assumes every group appears both as a source and as a target)
nodes$target <- c(rep(0, length(unique(nodes$name))), rep(1, length(unique(nodes$name))))
nodes <- nodes %>% arrange(target, name_n)
# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat it.
links$IDsource <- match(links$source_n, nodes$name_n) - 1
links$IDtarget <- match(links$target_n, nodes$name_n) - 1
# Make the Network
my_color <- paste0('d3.scaleOrdinal() .domain(["',
                   paste(nodes$name_n, collapse = '","'),
                   '"]) .range(["#6699CC","#CC3333","orange","#666699","#6699CC","#CC3333","orange","#666699"])')
p <- sankeyNetwork(Links = links, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "value", NodeID = "name_n",
                   iterations = 0,
                   sinksRight = FALSE, nodeWidth = 20,
                   fontSize = 16, nodePadding = 20,
                   colourScale = my_color, LinkGroup = "source_n")
# Display the diagram
p
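The node-size labels built in the loop above can also be computed without a loop. The following is a minimal sketch using base R's ave(), which returns, for each row, the sum of value within that row's group. The three-row data set is made up for illustration, and its target names deliberately differ from the source names so the trailing-space trick is not needed here.

```r
# Small illustrative data set (not the one used above)
links <- data.frame(
  source = c("group_A", "group_A", "group_B"),
  target = c("group_C", "group_D", "group_C"),
  value  = c(2, 3, 1)
)

# ave() computes the per-group total and repeats it on every row of the group
links$source_n <- paste0(links$source, " (n=",
                         ave(links$value, links$source, FUN = sum), ")")
links$target_n <- paste0(links$target, " (n=",
                         ave(links$value, links$target, FUN = sum), ")")
links
```

The resulting source_n and target_n columns play the same role as in the loop version and can be matched against nodes$name_n in the same way.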
An incidence matrix is square or rectangular.
Row and column names are node names. The item in row x and column y represents the flow between x and y. In the Sankey diagram we represent all flows that are greater than 0.
Since the networkD3 library expects a connection data frame, we will first convert the dataset, and then re-use the code from above.
# Libraries
library(networkD3)
library(dplyr)
# Create an incidence matrix. Usually the flow goes from the row names to the column names.
# Remember that our connections are directed since we are working with a flow.
# matrix() fills column by column, so each group of 8 values below is one column.
data <- matrix(c(0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,
                 2,0,0,0,0,0,0,0, 3,0,0,0,0,0,0,0,
                 0,2,0,0,0,0,0,0, 0,0,3,0,0,0,0,0,
                 0,0,1,0,0,0,0,0, 0,0,0,0,3,0,0,0), 8, 8)
colnames(data) <- rownames(data) <- c("group_A", "group_B", "group_C", "group_D", "group_E", "group_F", "group_G", "group_H")
# Transform it to a connection data frame with tidyr from the tidyverse:
links <- data %>%
  as.data.frame() %>%
  tibble::rownames_to_column(var = "source") %>%
  tidyr::gather(key = "target", value = "value", -1) %>%
  filter(value != 0)
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(
  name = c(as.character(links$source), as.character(links$target)) %>% unique()
)
# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat it.
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1
# Make the Network
p <- sankeyNetwork(Links = links, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "value", NodeID = "name",
                   fontSize = 16, sinksRight = FALSE)
# Display the diagram
p
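If tidyr and tibble are not available, the matrix-to-data-frame conversion above can be approximated in base R. This is a sketch on a smaller, made-up 3 x 3 matrix: as.table() followed by as.data.frame() produces one row per (row name, column name) pair, which is then filtered to the non-zero flows.

```r
# Small illustrative incidence matrix: rows are sources, columns are targets
groups <- c("group_A", "group_B", "group_C")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(groups, groups))
m["group_A", "group_B"] <- 2
m["group_A", "group_C"] <- 3
m["group_B", "group_C"] <- 1

# One row per cell of the matrix: row name, column name, cell value
links <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(links) <- c("source", "target", "value")

# Keep only the flows that actually exist
links <- links[links$value != 0, ]
rownames(links) <- NULL
links
```

From here the nodes data frame and the 0-based ids are built exactly as in the tidyr version.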
Here is an example displaying the number of people migrating from one country (left) to another (right).
# Libraries
library(tidyverse)
library(networkD3)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# I need a long format
data_long <- data %>%
  rownames_to_column() %>%
  gather(key = 'key', value = 'value', -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
# Add a trailing space to target names so they stay distinct from the source nodes
data_long$target <- paste(data_long$target, " ", sep = "")
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name = c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat it.
data_long$IDsource <- match(data_long$source, nodes$name) - 1
data_long$IDtarget <- match(data_long$target, nodes$name) - 1
# prepare colour scale
ColourScal <- 'd3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'
# Make the Network
p <- sankeyNetwork(Links = data_long, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "value", NodeID = "name",
                   sinksRight = FALSE, colourScale = ColourScal, nodeWidth = 40, fontSize = 13, nodePadding = 20)
# Display the diagram
p
plotly
How to create Sankey diagrams in R with Plotly:
# Libraries
library(plotly)
library(rjson)
# Read the example energy data set (node labels/colors and link source/target/value)
json_file <- "https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
fig <- plot_ly(
  type = "sankey",
  domain = list(
    x = c(0,1),
    y = c(0,1)
  ),
  orientation = "h",
  valueformat = ".0f",
  valuesuffix = "TWh",
  node = list(
    label = json_data$data[[1]]$node$label,
    color = json_data$data[[1]]$node$color,
    pad = 15,
    thickness = 15,
    line = list(
      color = "black",
      width = 0.5
    )
  ),
  link = list(
    source = json_data$data[[1]]$link$source,
    target = json_data$data[[1]]$link$target,
    value = json_data$data[[1]]$link$value,
    label = json_data$data[[1]]$link$label
  )
)
fig <- fig %>% layout(
  title = "Energy forecast for 2050<br>Source: Department of Energy & Climate Change, Tom Counsell via <a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>",
  font = list(
    size = 10
  ),
  xaxis = list(showgrid = FALSE, zeroline = FALSE),
  yaxis = list(showgrid = FALSE, zeroline = FALSE)
)
# Display the diagram
fig
This document is a work of the statistics team in the Biostatistics and Medical Information Department at Saint-Louis Hospital in Paris (SBIM).
Developed and updated by Noémie Bigot and Anouk Walter-Petrich
noemie.bigot@aphp.fr; anouk.walter-petrich@u-paris.fr
Based on The R Graph Gallery by Yan Holtz.