| Title: | Bayesian Surprise for De-Biasing Thematic Maps |
|---|---|
| Description: | Implements Bayesian Surprise methodology for data visualization, based on Correll and Heer (2017) <doi:10.1109/TVCG.2016.2598839> "Surprise! Bayesian Weighting for De-Biasing Thematic Maps". Provides tools to weight event data relative to spatio-temporal models, highlighting unexpected patterns while de-biasing against known factors like population density or sampling variation. Integrates seamlessly with 'sf' for spatial data and 'ggplot2' for visualization. Supports temporal/streaming data analysis. |
| Authors: | Dmitry Shkolnik [aut, cre] |
| Maintainer: | Dmitry Shkolnik <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-22 08:37:32 UTC |
| Source: | https://github.com/dshkol/bayesiansurpriser |
Adds a new model to an existing model space.
add_model(model_space, model, prior_weight = NULL)add_model(model_space, model, prior_weight = NULL)
model_space |
A |
model |
A |
prior_weight |
Prior probability for the new model. The existing priors are rescaled to accommodate. |
Updated bs_model_space
space <- model_space(bs_model_uniform()) space <- add_model(space, bs_model_gaussian(), prior_weight = 0.3)space <- model_space(bs_model_uniform()) space <- add_model(space, bs_model_gaussian(), prior_weight = 0.3)
Simplified interface that automatically selects appropriate models and computes per-observation surprise from the model priors.
auto_surprise( observed, expected = NULL, sample_size = NULL, include_gaussian = FALSE, include_sampled = FALSE, signed = TRUE, ... )auto_surprise( observed, expected = NULL, sample_size = NULL, include_gaussian = FALSE, include_sampled = FALSE, signed = TRUE, ... )
observed |
Numeric vector of observed values |
expected |
Numeric vector of expected values (optional) |
sample_size |
Numeric vector of sample sizes (optional) |
include_gaussian |
Include Gaussian model? |
include_sampled |
Include KDE model? |
signed |
Compute signed surprise? |
... |
Additional arguments |
A bs_surprise object
observed <- c(50, 100, 150, 200, 75) expected <- c(10000, 50000, 100000, 25000, 15000) result <- auto_surprise(observed, expected)observed <- c(50, 100, 150, 200, 75) expected <- c(10000, 50000, 100000, 25000, 15000) result <- auto_surprise(observed, expected)
Updates the prior probability distribution over models given observed data, using Bayes' rule.
bayesian_update(model_space, observed, region_idx = NULL, ...)bayesian_update(model_space, observed, region_idx = NULL, ...)
model_space |
A |
observed |
Numeric vector of observed values |
region_idx |
Optional integer index for region-specific likelihood |
... |
Additional arguments passed to likelihood functions |
Applies Bayes' rule:
where P(D|M) is the likelihood of data D given model M, and P(M) is the prior.
Updated bs_model_space with posterior probabilities
# Create a model space space <- model_space( bs_model_uniform(), bs_model_gaussian() ) # Update with observed data observed <- c(10, 20, 30, 40, 50) updated <- bayesian_update(space, observed) print(updated)# Create a model space space <- model_space( bs_model_uniform(), bs_model_gaussian() ) # Update with observed data observed <- c(10, 20, 30, 40, 50) updated <- bayesian_update(space, observed) print(updated)
Creates a model that compares observed events to expected rates based on a known baseline (e.g., population). This addresses "base rate bias" where patterns in visualizations are dominated by underlying factors like population density.
bs_model_baserate(expected, normalize = TRUE, name = NULL)bs_model_baserate(expected, normalize = TRUE, name = NULL)
expected |
Numeric vector of expected values or proportions. E.g., population counts, area sizes, or any prior expectation. |
normalize |
Logical; normalize expected to sum to 1? |
name |
Optional name for the model |
Under the base rate model, expected proportions are defined by the
expected vector. The likelihood measures how well observed data
matches these expected proportions:
For example, if region A has 10% of the population, we expect 10% of events. Regions with event rates matching their population share show low surprise; regions with disproportionate rates show high surprise.
This is the primary tool for de-biasing choropleth maps.
A bs_model_baserate object
# Population-weighted base rate population <- c(10000, 50000, 100000, 25000) model <- bs_model_baserate(population) # Use in model space space <- model_space( bs_model_uniform(), bs_model_baserate(population) )# Population-weighted base rate population <- c(10000, 50000, 100000, 25000) model <- bs_model_baserate(population) # Use in model space space <- model_space( bs_model_uniform(), bs_model_baserate(population) )
Convenience function to create a base rate model from a data frame column.
bs_model_baserate_col(data, column, ...)bs_model_baserate_col(data, column, ...)
data |
Data frame or sf object |
column |
Name of column containing expected values |
... |
Additional arguments passed to |
A bs_model_baserate object
df <- data.frame(expected = c(100, 200, 300)) model <- bs_model_baserate_col(df, "expected") model$namedf <- data.frame(expected = c(100, 200, 300)) model <- bs_model_baserate_col(df, "expected") model$name
Creates a model based on a bootstrap sample of the data. Useful for assessing variability and identifying observations that are surprising relative to resampled distributions.
bs_model_bootstrap(n_bootstrap = 100, seed = NULL, name = NULL)bs_model_bootstrap(n_bootstrap = 100, seed = NULL, name = NULL)
n_bootstrap |
Number of bootstrap samples to use |
seed |
Random seed for reproducibility |
name |
Optional name for the model |
A bs_model_bootstrap object
model <- bs_model_bootstrap(n_bootstrap = 100)model <- bs_model_bootstrap(n_bootstrap = 100)
Creates a model that normalizes observations by their expected standard error, accounting for varying sample sizes. This addresses "sampling error bias" where regions with small sample sizes show artificially high variability.
bs_model_funnel( sample_size, target_rate = NULL, type = c("count", "proportion"), formula = c("paper", "poisson"), control_limits = c(2, 3), name = NULL )bs_model_funnel( sample_size, target_rate = NULL, type = c("count", "proportion"), formula = c("paper", "poisson"), control_limits = c(2, 3), name = NULL )
sample_size |
Numeric vector of sample sizes (e.g., population) |
target_rate |
Target rate/proportion. If NULL, estimated from data. |
type |
Type of data: "count" (Poisson) or "proportion" (binomial) |
formula |
Formula for likelihood computation:
|
control_limits |
Numeric vector of control limits (in SDs) for funnel plot. Default is c(2, 3) for warning and control limits. |
name |
Optional name for the model |
The de Moivre funnel model uses the insight that sampling variability decreases with sample size according to de Moivre's equation:
With formula = "paper": The model uses the formula that matches the paper's unemployment reference data:
With formula = "poisson": For count data (Poisson), the model computes z-scores as:
For proportion data (binomial):
Observations with large z-scores (far from expected after accounting for sample size) are genuinely surprising, while high rates in small regions are discounted as expected variation.
This model is essential for:
De-biasing per-capita rate maps
Creating funnel plots
Identifying genuine outliers vs. sampling noise
A bs_model_funnel object
# Population sizes for regions population <- c(10000, 50000, 100000, 25000) # Funnel model using the paper's unemployment-reference formula model <- bs_model_funnel(population, formula = "paper") # Funnel model with known target rate model <- bs_model_funnel(population, target_rate = 0.001) # For proportion data with Poisson-based formula model <- bs_model_funnel(population, type = "proportion", formula = "poisson")# Population sizes for regions population <- c(10000, 50000, 100000, 25000) # Funnel model using the paper's unemployment-reference formula model <- bs_model_funnel(population, formula = "paper") # Funnel model with known target rate model <- bs_model_funnel(population, target_rate = 0.001) # For proportion data with Poisson-based formula model <- bs_model_funnel(population, type = "proportion", formula = "poisson")
Convenience function to create a funnel model from a data frame column.
bs_model_funnel_col(data, column, ...)bs_model_funnel_col(data, column, ...)
data |
Data frame or sf object |
column |
Name of column containing sample sizes |
... |
Additional arguments passed to |
A bs_model_funnel object
df <- data.frame(sample_size = c(1000, 2000, 3000)) model <- bs_model_funnel_col(df, "sample_size") model$namedf <- data.frame(sample_size = c(1000, 2000, 3000)) model <- bs_model_funnel_col(df, "sample_size") model$name
Creates a model based on a Gaussian (normal) distribution. This parametric model is useful for detecting outliers and identifying multiple modes in data.
bs_model_gaussian(mu = NULL, sigma = NULL, fit_from_data = TRUE, name = NULL)bs_model_gaussian(mu = NULL, sigma = NULL, fit_from_data = TRUE, name = NULL)
mu |
Mean of the Gaussian. If NULL, estimated from data. |
sigma |
Standard deviation. If NULL, estimated from data. |
fit_from_data |
Logical; estimate parameters from data? |
name |
Optional name for the model |
The Gaussian model assumes data is drawn from a normal distribution. Points far from the mean (in terms of standard deviations) will have low likelihood and thus create high surprise when this model has probability mass.
This model is particularly useful for:
Detecting spatial outliers
Identifying multi-modal distributions
Combating renormalization bias (outliers get suppressed in dynamic visualizations)
A bs_model_gaussian object
# Gaussian model with parameters fit from data model <- bs_model_gaussian() # Gaussian model with fixed parameters model <- bs_model_gaussian(mu = 100, sigma = 20) # Use in model space with other models population <- c(10000, 50000, 100000, 25000) space <- model_space( bs_model_uniform(), bs_model_baserate(population), bs_model_gaussian() )# Gaussian model with parameters fit from data model <- bs_model_gaussian() # Gaussian model with fixed parameters model <- bs_model_gaussian(mu = 100, sigma = 20) # Use in model space with other models population <- c(10000, 50000, 100000, 25000) space <- model_space( bs_model_uniform(), bs_model_baserate(population), bs_model_gaussian() )
Creates a model based on a mixture of Gaussians for multi-modal data.
bs_model_gaussian_mixture(means, sds, weights = NULL, name = NULL)bs_model_gaussian_mixture(means, sds, weights = NULL, name = NULL)
means |
Vector of means for each component |
sds |
Vector of standard deviations for each component |
weights |
Vector of mixture weights (must sum to 1) |
name |
Optional name for the model |
A bs_model_gaussian_mixture object
# Two-component mixture model <- bs_model_gaussian_mixture( means = c(50, 150), sds = c(10, 20), weights = c(0.7, 0.3) )# Two-component mixture model <- bs_model_gaussian_mixture( means = c(50, 150), sds = c(10, 20), weights = c(0.7, 0.3) )
Creates a non-parametric model using kernel density estimation (KDE). This model is built from a sample of the data and can detect when subsequent observations deviate from the pattern established by early data.
bs_model_sampled( sample_frac = NULL, kernel = "gaussian", bandwidth = "nrd0", n_grid = 512, sample_indices = NULL, name = NULL )bs_model_sampled( sample_frac = NULL, kernel = "gaussian", bandwidth = "nrd0", n_grid = 512, sample_indices = NULL, name = NULL )
sample_frac |
Fraction of data to use for building the prior (0 < x < 1). If NULL, uses all data for density estimation. |
kernel |
Kernel type for density estimation. One of: "gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine" |
bandwidth |
Bandwidth selection method or numeric value. If character, one of: "nrd0", "nrd", "ucv", "bcv", "SJ". If numeric, used directly as bandwidth. |
n_grid |
Number of points in the density estimation grid (default: 512). Higher values give smoother estimates but use more memory. |
sample_indices |
Integer vector of specific indices to use for building prior.
Overrides |
name |
Optional name for the model |
The sampled model builds a density estimate from a subset of observations (typically early observations in temporal data) and measures surprise as deviation from this learned distribution.
This is useful for:
Detecting temporal changes in distribution
Building a "post hoc" model from initial observations
Detecting emerging patterns in streaming data
The likelihood for each observation is the density at that point under the KDE built from the sample.
A bs_model_sampled object
# KDE model using first 10% of data model <- bs_model_sampled(sample_frac = 0.1) # KDE with specific bandwidth model <- bs_model_sampled(bandwidth = 5) # Use specific observations for training model <- bs_model_sampled(sample_indices = 1:10)# KDE model using first 10% of data model <- bs_model_sampled(sample_frac = 0.1) # KDE with specific bandwidth model <- bs_model_sampled(bandwidth = 5) # Use specific observations for training model <- bs_model_sampled(sample_indices = 1:10)
Creates a model that assumes events are equally likely across all regions. This serves as a "null hypothesis" baseline - regions where events cluster will show high surprise under this model.
bs_model_uniform(n_regions = NULL)bs_model_uniform(n_regions = NULL)
n_regions |
Number of regions (optional, inferred from data if NULL) |
Under the uniform model, expected probability for each region is 1/n, where n is the total number of regions. The likelihood is computed as:
This is the Total Variation Distance from uniform, transformed to a probability.
The uniform model is useful for detecting spatial clustering - any concentration of events in fewer regions will produce high surprise.
A bs_model_uniform object
# Create uniform model model <- bs_model_uniform() # The model computes likelihood when used in a model space space <- model_space(model)# Create uniform model model <- bs_model_uniform() # The model computes likelihood when used in a model space space <- model_space(model)
Crime data for Canadian provinces and territories, showing mischief offenses. This dataset is adapted from the original Bayesian Surprise paper's Canada example and is useful for exploring base-rate effects: Ontario and Quebec dominate raw counts because of population, while model-based surprise scores ask which provinces are unusual relative to the chosen model space.
canada_mischiefcanada_mischief
A data frame with 13 rows and 6 variables:
Province or territory name
Population count
Number of mischief offenses
Mischief rate per 100,000 population
Proportion of total Canadian population
Proportion of total mischief offenses
Correll & Heer (2017). Surprise! Bayesian Weighting for De-Biasing Thematic Maps. IEEE InfoVis.
data(canada_mischief) # Basic exploration head(canada_mischief) # Compute surprise result <- auto_surprise( observed = canada_mischief$mischief_count, expected = canada_mischief$population ) # See which provinces are most surprising under the selected models canada_mischief$surprise <- result$surprise canada_mischief$signed_surprise <- result$signed_surprise canada_mischief[order(-abs(canada_mischief$signed_surprise)), ]data(canada_mischief) # Basic exploration head(canada_mischief) # Compute surprise result <- auto_surprise( observed = canada_mischief$mischief_count, expected = canada_mischief$population ) # See which provinces are most surprising under the selected models canada_mischief$surprise <- result$surprise canada_mischief$signed_surprise <- result$signed_surprise canada_mischief[order(-abs(canada_mischief$signed_surprise)), ]
Computes the data needed to create a funnel plot, including control limits.
compute_funnel_data( observed, sample_size, target_rate = NULL, type = c("count", "proportion"), limits = c(2, 3) )compute_funnel_data( observed, sample_size, target_rate = NULL, type = c("count", "proportion"), limits = c(2, 3) )
observed |
Numeric vector of observed values |
sample_size |
Numeric vector of sample sizes |
target_rate |
Target rate. If NULL, computed from data. |
type |
Type of data: "count" or "proportion" |
limits |
Vector of SD multiples for control limits (default: c(2, 3)) |
A data frame with columns: observed, sample_size, expected, z_score, and control limit columns (lower_2sd, upper_2sd, lower_3sd, upper_3sd, etc.)
observed <- c(50, 100, 150, 200) sample_size <- c(10000, 50000, 100000, 25000) funnel_data <- compute_funnel_data(observed, sample_size)observed <- c(50, 100, 150, 200) sample_size <- c(10000, 50000, 100000, 25000) funnel_data <- compute_funnel_data(observed, sample_size)
Computes the surprise (KL-divergence) for each observation/region, measuring how much each data point updates beliefs about the model space.
compute_surprise( model_space, observed, expected = NULL, return_signed = TRUE, return_posteriors = FALSE, return_contributions = FALSE, normalize_posterior = TRUE, ... )compute_surprise( model_space, observed, expected = NULL, return_signed = TRUE, return_posteriors = FALSE, return_contributions = FALSE, normalize_posterior = TRUE, ... )
model_space |
A |
observed |
Numeric vector of observed values (one per region) |
expected |
Numeric vector of expected values (optional, for signed surprise) |
return_signed |
Logical; compute signed surprise? |
return_posteriors |
Logical; return per-region posteriors? |
return_contributions |
Logical; return per-model contributions? |
normalize_posterior |
Logical; if TRUE (default), normalizes posteriors
to sum to 1 before computing KL divergence. This is the standard Bayesian
Surprise calculation. If FALSE, uses unnormalized posterior weights
( |
... |
Additional arguments passed to likelihood functions |
For each region i, computes:
The posterior P(M|D_i) given just that region's data
The KL-divergence from prior to posterior (surprise)
Optionally, the sign based on deviation direction
normalize_posterior = FALSE is a legacy replication mode for the original
JavaScript demo's per-region map calculation. It is not a proper KL
divergence between probability distributions and should not be used as the
default method for new analyses.
A bs_surprise object
Performs a cumulative Bayesian update, updating the prior after each observation. This is useful for streaming/temporal data.
cumulative_bayesian_update(model_space, observed, ...)cumulative_bayesian_update(model_space, observed, ...)
model_space |
A |
observed |
Numeric vector of observed values (in order) |
... |
Additional arguments passed to likelihood functions |
A list containing:
final_space: The model space after all updates
cumulative_surprise: Vector of cumulative surprise after each observation
posterior_history: Matrix of posteriors after each update
Creates a default model space suitable for most choropleth/thematic maps. Includes uniform, base rate, and funnel models.
default_model_space( expected, sample_size = expected, include_gaussian = FALSE, prior = NULL )default_model_space( expected, sample_size = expected, include_gaussian = FALSE, prior = NULL )
expected |
Numeric vector of expected values (e.g., population). Used for base rate and funnel models. |
sample_size |
Numeric vector of sample sizes. Defaults to |
include_gaussian |
Logical; include Gaussian model? |
prior |
Prior probabilities for models. Default is uniform. |
A bs_model_space object
# Default space with population as expected population <- c(10000, 50000, 100000, 25000) space <- default_model_space(population) # Include Gaussian model space <- default_model_space(population, include_gaussian = TRUE)# Default space with population as expected population <- c(10000, 50000, 100000, 25000) space <- default_model_space(population) # Include Gaussian model space <- default_model_space(population, include_gaussian = TRUE)
A simulated dataset of 50 counties with population and event counts. Some counties are designated as "hot spots" (higher than expected rates) and "cold spots" (lower than expected rates) for testing and examples.
example_countiesexample_counties
A data frame with 50 rows and 7 variables:
Unique county identifier
County name
Population count
Number of events (e.g., crimes, incidents)
Expected number of events based on population
Logical; TRUE if county has elevated rates
Logical; TRUE if county has suppressed rates
data(example_counties) # Compute surprise result <- auto_surprise( observed = example_counties$events, expected = example_counties$population ) example_counties$surprise <- result$surprise # Hot spots and cold spots should have higher surprise with(example_counties, tapply(surprise, is_hotspot, mean)) with(example_counties, tapply(surprise, is_coldspot, mean))data(example_counties) # Compute surprise result <- auto_surprise( observed = example_counties$events, expected = example_counties$population ) example_counties$surprise <- result$surprise # Hot spots and cold spots should have higher surprise with(example_counties, tapply(surprise, is_hotspot, mean)) with(example_counties, tapply(surprise, is_coldspot, mean))
Converts z-scores to two-tailed p-values under the normal distribution.
funnel_pvalue(z)funnel_pvalue(z)
z |
Numeric vector of z-scores |
Numeric vector of p-values
z <- c(-2, -1, 0, 1, 2) funnel_pvalue(z)z <- c(-2, -1, 0, 1, 2) funnel_pvalue(z)
Computes z-scores accounting for sampling variability using de Moivre's equation. This normalizes observed values by their expected standard error.
funnel_zscore(observed, expected, sample_size, type = c("count", "proportion"))funnel_zscore(observed, expected, sample_size, type = c("count", "proportion"))
observed |
Numeric vector of observed counts or rates |
expected |
Numeric vector of expected values (e.g., from base rate) |
sample_size |
Numeric vector of sample sizes (e.g., population) |
type |
Type of data: "count" (Poisson) or "proportion" (binomial) |
For count data (Poisson), SE = sqrt(expected). For proportion data (binomial), SE = sqrt(p * (1-p) / n).
Larger sample sizes result in smaller standard errors, so the same deviation is more "surprising" for larger samples.
Numeric vector of z-scores
# Observed crimes vs expected (from overall rate) observed <- c(50, 100, 150) expected <- c(45, 95, 160) sample_size <- c(10000, 50000, 100000) funnel_zscore(observed, expected, sample_size, type = "count")# Observed crimes vs expected (from overall rate) observed <- c(50, 100, 150) expected <- c(45, 95, 160) sample_size <- c(10000, 50000, 100000) funnel_zscore(observed, expected, sample_size, type = "count")
A convenience geom that combines stat_surprise with geom_sf for
easy creation of surprise maps.
geom_surprise( mapping = NULL, data = NULL, position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, fill_type = c("surprise", "signed"), models = c("uniform", "baserate", "funnel"), color = NA, colour = color, linewidth = 0.1, ... )geom_surprise( mapping = NULL, data = NULL, position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, fill_type = c("surprise", "signed"), models = c("uniform", "baserate", "funnel"), color = NA, colour = color, linewidth = 0.1, ... )
mapping |
Aesthetic mapping created by |
data |
Data (typically an sf object) |
position |
Position adjustment |
na.rm |
Remove NA values? |
show.legend |
Show legend? |
inherit.aes |
Inherit aesthetics from ggplot? |
fill_type |
Type of surprise for fill aesthetic:
|
models |
Character vector of model types to use. Options: "uniform", "baserate", "gaussian", "sampled", "funnel" |
color, colour
|
Border color for polygons |
linewidth |
Border line width |
... |
Additional arguments passed to the layer |
A list of ggplot2 layers
geom_surprise understands the following aesthetics:
sf geometry column (required)
Observed values (required)
Expected values (optional, for base rate model)
Sample sizes (optional, for funnel model)
Mapped to surprise by default
Border color
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic surprise map - geometry must be mapped explicitly ggplot(nc) + geom_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise() # Signed surprise with diverging scale ggplot(nc) + geom_surprise( aes(geometry = geometry, observed = SID74, expected = BIR74), fill_type = "signed" ) + scale_fill_surprise_diverging() # Customize appearance ggplot(nc) + geom_surprise( aes(geometry = geometry, observed = SID74, expected = BIR74), color = "white", linewidth = 0.2 ) + scale_fill_surprise() + theme_minimal()library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic surprise map - geometry must be mapped explicitly ggplot(nc) + geom_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise() # Signed surprise with diverging scale ggplot(nc) + geom_surprise( aes(geometry = geometry, observed = SID74, expected = BIR74), fill_type = "signed" ) + scale_fill_surprise_diverging() # Customize appearance ggplot(nc) + geom_surprise( aes(geometry = geometry, observed = SID74, expected = BIR74), color = "white", linewidth = 0.2 ) + scale_fill_surprise() + theme_minimal()
Creates a density plot of surprise values.
geom_surprise_density( mapping = NULL, data = NULL, which = c("surprise", "signed_surprise"), fill = "#4575b4", color = "black", ... )geom_surprise_density( mapping = NULL, data = NULL, which = c("surprise", "signed_surprise"), fill = "#4575b4", color = "black", ... )
mapping |
Aesthetic mapping |
data |
Data with surprise values |
which |
Which surprise to plot: "surprise" or "signed_surprise" |
fill |
Fill color |
color |
Border color |
... |
Additional arguments passed to |
A ggplot2 layer
Creates a histogram of surprise values.
geom_surprise_histogram( mapping = NULL, data = NULL, which = c("surprise", "signed_surprise"), bins = 30, fill = "#4575b4", color = "white", ... )geom_surprise_histogram( mapping = NULL, data = NULL, which = c("surprise", "signed_surprise"), bins = 30, fill = "#4575b4", color = "white", ... )
mapping |
Aesthetic mapping |
data |
Data with surprise values |
which |
Which surprise to plot: "surprise" or "signed_surprise" |
bins |
Number of bins |
fill |
Fill color |
color |
Border color |
... |
Additional arguments passed to |
A ggplot2 layer
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) nc_surprise <- surprise(nc, observed = SID74, expected = BIR74) ggplot(nc_surprise) + geom_surprise_histogram()library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) nc_surprise <- surprise(nc, observed = SID74, expected = BIR74) ggplot(nc_surprise) + geom_surprise_histogram()
Get the model space from a surprise result
get_model_space(x, ...)get_model_space(x, ...)
x |
A |
... |
Additional arguments (unused) |
A bs_model_space object
Extract surprise values from result objects
get_surprise(x, type = c("surprise", "signed"), ...)get_surprise(x, type = c("surprise", "signed"), ...)
x |
A |
type |
Which surprise to extract: "surprise" or "signed" |
... |
Additional arguments (unused) |
Numeric vector of surprise values
Extracts surprise values for a specific time period from temporal results.
get_surprise_at_time(temporal_result, time)get_surprise_at_time(temporal_result, time)
temporal_result |
A |
time |
Time value to extract |
A bs_surprise object for that time period
Computes the KL-divergence from prior to posterior distribution, which measures "surprise" in the Bayesian framework.
kl_divergence(posterior, prior, base = 2)kl_divergence(posterior, prior, base = 2)
posterior |
Numeric vector of posterior probabilities |
prior |
Numeric vector of prior probabilities (same length as posterior) |
base |
Base of logarithm (default: 2 for bits) |
KL-divergence is defined as:
where P is the posterior and Q is the prior. The divergence is 0 when posterior equals prior (no surprise), and increases as they differ.
Zero probabilities are handled by excluding those terms (convention that 0 * log(0) = 0).
Numeric scalar: the KL-divergence value (always non-negative)
# No surprise when prior equals posterior kl_divergence(c(0.5, 0.5), c(0.5, 0.5)) # High surprise when distributions differ kl_divergence(c(0.9, 0.1), c(0.5, 0.5)) # Maximum surprise when posterior is certain kl_divergence(c(1.0, 0.0), c(0.5, 0.5))# No surprise when prior equals posterior kl_divergence(c(0.5, 0.5), c(0.5, 0.5)) # High surprise when distributions differ kl_divergence(c(0.9, 0.1), c(0.5, 0.5)) # Maximum surprise when posterior is certain kl_divergence(c(1.0, 0.0), c(0.5, 0.5))
Computes log(sum(exp(x))) in a numerically stable way.
log_sum_exp(x)log_sum_exp(x)
x |
Numeric vector of log values |
Uses the identity: log(sum(exp(x))) = max(x) + log(sum(exp(x - max(x)))) This avoids overflow when x contains large positive values.
Numeric scalar: log(sum(exp(x)))
# Direct computation would overflow x <- c(1000, 1001, 1002) log_sum_exp(x) # Returns ~1002.41# Direct computation would overflow x <- c(1000, 1001, 1002) log_sum_exp(x) # Returns ~1002.41
Get Model Names
model_names(model_space)model_names(model_space)
model_space |
A |
Character vector of model names
Combines multiple models into a model space with prior probabilities. The model space represents the set of hypotheses about how data is generated.
model_space(..., prior = NULL, names = NULL)model_space(..., prior = NULL, names = NULL)
... |
|
prior |
Numeric vector of prior probabilities (must sum to 1). If NULL (default), uses uniform prior. |
names |
Optional character vector of names for models |
A bs_model_space object
# Create model space with uniform prior space <- model_space( bs_model_uniform(), bs_model_gaussian() ) # Create with custom prior space <- model_space( bs_model_uniform(), bs_model_baserate(c(0.2, 0.3, 0.5)), prior = c(0.3, 0.7) ) # Create from list models <- list(bs_model_uniform(), bs_model_gaussian()) space <- model_space(models)# Create model space with uniform prior space <- model_space( bs_model_uniform(), bs_model_gaussian() ) # Create with custom prior space <- model_space( bs_model_uniform(), bs_model_baserate(c(0.2, 0.3, 0.5)), prior = c(0.3, 0.7) ) # Create from list models <- list(bs_model_uniform(), bs_model_gaussian()) space <- model_space(models)
Get Number of Models
n_models(model_space)n_models(model_space)
model_space |
A |
Integer number of models
Scales values to the range 0 to 1 using min-max normalization.
normalize_minmax(x, min_val = NULL, max_val = NULL, na.rm = TRUE)normalize_minmax(x, min_val = NULL, max_val = NULL, na.rm = TRUE)
x |
Numeric vector |
min_val |
Minimum value for scaling (default: min of x) |
max_val |
Maximum value for scaling (default: max of x) |
na.rm |
Logical; remove NA values when computing range? |
Numeric vector scaled to the range 0 to 1
normalize_minmax(c(10, 20, 30, 40, 50)) normalize_minmax(c(-5, 0, 5), min_val = -10, max_val = 10)normalize_minmax(c(10, 20, 30, 40, 50)) normalize_minmax(c(-5, 0, 5), min_val = -10, max_val = 10)
Normalizes a numeric vector to sum to 1, creating a valid probability distribution.
normalize_prob(x, na.rm = FALSE)normalize_prob(x, na.rm = FALSE)
x |
Numeric vector (non-negative values expected) |
na.rm |
Logical; remove NA values before normalizing? |
If all values are zero or if sum is zero, returns uniform distribution. Negative values are set to zero with a warning.
Numeric vector summing to 1 (or containing NAs if input had NAs and na.rm = FALSE)
normalize_prob(c(1, 2, 3, 4)) normalize_prob(c(10, 20, 30)) normalize_prob(c(0, 0, 0)) # Returns uniformnormalize_prob(c(1, 2, 3, 4)) normalize_prob(c(10, 20, 30)) normalize_prob(c(0, 0, 0)) # Returns uniform
Computes rates by dividing counts by a base value (e.g., population).
normalize_rate(count, base, per = 1, na_for_zero = TRUE)normalize_rate(count, base, per = 1, na_for_zero = TRUE)
count |
Numeric vector of counts (e.g., number of events) |
base |
Numeric vector of base values (e.g., population) |
per |
Multiplier for scaling (e.g., 100000 for "per 100k") |
na_for_zero |
Logical; return NA when base is zero? |
Numeric vector of rates
# Crime rate per 100,000 population crimes <- c(50, 100, 200) population <- c(10000, 50000, 100000) normalize_rate(crimes, population, per = 100000)# Crime rate per 100,000 population crimes <- c(50, 100, 200) population <- c(10000, 50000, 100000) normalize_rate(crimes, population, per = 100000)
Scales values using quantiles instead of min/max for robustness to outliers.
normalize_robust(x, lower_quantile = 0.05, upper_quantile = 0.95, na.rm = TRUE)normalize_robust(x, lower_quantile = 0.05, upper_quantile = 0.95, na.rm = TRUE)
x |
Numeric vector |
lower_quantile |
Lower quantile for scaling (default: 0.05) |
upper_quantile |
Upper quantile for scaling (default: 0.95) |
na.rm |
Logical; remove NA values when computing quantiles? |
Numeric vector scaled approximately to the range 0 to 1, with outliers potentially outside this range
# With outliers x <- c(1, 2, 3, 4, 5, 100) # 100 is an outlier normalize_robust(x)# With outliers x <- c(1, 2, 3, 4, 5, 100) # 100 is an outlier normalize_robust(x)
Normalizes values to z-scores (standard deviations from mean).
normalize_zscore(x, center = NULL, scale = NULL, na.rm = TRUE)normalize_zscore(x, center = NULL, scale = NULL, na.rm = TRUE)
x |
Numeric vector |
center |
Value to center around (default: mean of x) |
scale |
Scale factor (default: standard deviation of x) |
na.rm |
Logical; remove NA values when computing center/scale? |
Numeric vector of z-scores
x <- c(10, 20, 30, 40, 50) normalize_zscore(x)x <- c(10, 20, 30, 40, 50) normalize_zscore(x)
Visualizes the prior and posterior probabilities of a model space.
## S3 method for class 'bs_model_space' plot( x, y = NULL, main = "Model Probabilities", col = c("#4575b4", "#d73027"), ... )## S3 method for class 'bs_model_space' plot( x, y = NULL, main = "Model Probabilities", col = c("#4575b4", "#d73027"), ... )
x |
A |
y |
Unused |
main |
Plot title |
col |
Colors for prior and posterior bars |
... |
Additional arguments passed to |
Invisibly returns the input object
Creates a simple histogram or density plot of surprise values.
## S3 method for class 'bs_surprise' plot( x, y = NULL, which = c("surprise", "signed_surprise"), type = c("histogram", "density"), main = NULL, xlab = NULL, col = "#4575b4", border = "white", ... )## S3 method for class 'bs_surprise' plot( x, y = NULL, which = c("surprise", "signed_surprise"), type = c("histogram", "density"), main = NULL, xlab = NULL, col = "#4575b4", border = "white", ... )
x |
A |
y |
Unused |
which |
Which to plot: "surprise" or "signed_surprise" |
type |
Plot type: "histogram" or "density" |
main |
Plot title |
xlab |
X-axis label |
col |
Fill color |
border |
Border color |
... |
Invisibly returns the input object
Plots a surprise map using sf's plot method. For ggplot2 integration,
see geom_surprise() and scale_fill_surprise().
## S3 method for class 'bs_surprise_sf' plot( x, y = NULL, which = c("surprise", "signed_surprise"), pal = NULL, breaks = NULL, nbreaks = 9, main = NULL, border = "grey50", lwd = 0.2, key.pos = 4, ... )## S3 method for class 'bs_surprise_sf' plot( x, y = NULL, which = c("surprise", "signed_surprise"), pal = NULL, breaks = NULL, nbreaks = 9, main = NULL, border = "grey50", lwd = 0.2, key.pos = 4, ... )
x |
A |
y |
Unused (for S3 method compatibility) |
which |
Which variable to plot: "surprise" or "signed_surprise" |
pal |
Color palette. If NULL, uses default diverging palette for signed surprise and sequential for unsigned. |
breaks |
Breaks for color scale. If NULL, uses pretty breaks. |
nbreaks |
Number of breaks if |
main |
Plot title. If NULL, auto-generated. |
border |
Border color for polygons |
lwd |
Line width for borders |
key.pos |
Position of legend (1=below, 2=left, 3=above, 4=right, NULL=none) |
... |
Additional arguments passed to |
Invisibly returns the input object
library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) result <- surprise(nc, observed = SID74, expected = BIR74) # Default plot plot(result) # Plot signed surprise plot(result, which = "signed_surprise") # Custom palette plot(result, pal = heat.colors(9))library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) result <- surprise(nc, observed = SID74, expected = BIR74) # Default plot plot(result) # Plot signed surprise plot(result, which = "signed_surprise") # Custom palette plot(result, pal = heat.colors(9))
Creates time series plots of surprise evolution.
## S3 method for class 'bs_surprise_temporal' plot( x, y = NULL, type = c("time_series", "heatmap", "cumulative"), regions = NULL, main = NULL, xlab = "Time", ylab = NULL, col = NULL, ... )## S3 method for class 'bs_surprise_temporal' plot( x, y = NULL, type = c("time_series", "heatmap", "cumulative"), regions = NULL, main = NULL, xlab = "Time", ylab = NULL, col = NULL, ... )
x |
A |
y |
Unused |
type |
Plot type: "time_series", "heatmap", or "cumulative" |
regions |
Which regions to plot (for time_series). If NULL, plots all. |
main |
Plot title |
xlab |
X-axis label |
ylab |
Y-axis label |
col |
Colors |
... |
Additional arguments |
Invisibly returns the input object
Removes a model from an existing model space.
remove_model(model_space, which)remove_model(model_space, which)
model_space |
A |
which |
Integer index or character name of model to remove |
Updated bs_model_space
Sequential color scale for unsigned surprise values. Uses viridis palettes for perceptually uniform color mapping.
scale_fill_surprise( ..., option = "inferno", direction = 1, name = "Surprise", begin = 0, end = 1 ) scale_colour_surprise( ..., option = "inferno", direction = 1, name = "Surprise", begin = 0, end = 1 ) scale_color_surprise( ..., option = "inferno", direction = 1, name = "Surprise", begin = 0, end = 1 )scale_fill_surprise( ..., option = "inferno", direction = 1, name = "Surprise", begin = 0, end = 1 ) scale_colour_surprise( ..., option = "inferno", direction = 1, name = "Surprise", begin = 0, end = 1 ) scale_color_surprise( ..., option = "inferno", direction = 1, name = "Surprise", begin = 0, end = 1 )
... |
Arguments passed to |
option |
Viridis palette option: "magma" (A), "inferno" (B), "plasma" (C), "viridis" (D), "cividis" (E), "rocket" (F), "mako" (G), "turbo" (H) |
direction |
Direction of palette (1 = low-to-high, -1 = reversed) |
name |
Legend title |
begin, end
|
Range of palette to use (0 to 1) |
A ggplot2 scale object
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic usage - geometry must be mapped explicitly ggplot(nc) + geom_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise()library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic usage - geometry must be mapped explicitly ggplot(nc) + geom_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise()
Discrete binned color scale for surprise values. Useful for creating choropleth maps with clearly distinguishable categories.
scale_fill_surprise_binned( n.breaks = 5, palette = "YlOrRd", direction = 1, name = "Surprise", ... ) scale_colour_surprise_binned( n.breaks = 5, palette = "YlOrRd", direction = 1, name = "Surprise", ... ) scale_color_surprise_binned( n.breaks = 5, palette = "YlOrRd", direction = 1, name = "Surprise", ... )scale_fill_surprise_binned( n.breaks = 5, palette = "YlOrRd", direction = 1, name = "Surprise", ... ) scale_colour_surprise_binned( n.breaks = 5, palette = "YlOrRd", direction = 1, name = "Surprise", ... ) scale_color_surprise_binned( n.breaks = 5, palette = "YlOrRd", direction = 1, name = "Surprise", ... )
n.breaks |
Number of breaks/bins |
palette |
ColorBrewer palette name. For unsigned surprise, sequential palettes like "YlOrRd", "Oranges", "Reds" work well. |
direction |
Direction of palette (1 = normal, -1 = reversed) |
name |
Legend title |
... |
Additional arguments passed to |
A ggplot2 scale object
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Binned surprise scale - geometry must be mapped explicitly ggplot(nc) + geom_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise_binned(n.breaks = 5)library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Binned surprise scale - geometry must be mapped explicitly ggplot(nc) + geom_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise_binned(n.breaks = 5)
Diverging color scale for signed surprise values. Negative values (lower than expected) are shown in one color, positive values (higher than expected) in another, with neutral (zero) in the middle.
scale_fill_surprise_diverging( low = scales::muted("blue"), mid = "white", high = scales::muted("red"), midpoint = 0, name = "Signed Surprise", ... ) scale_colour_surprise_diverging( low = scales::muted("blue"), mid = "white", high = scales::muted("red"), midpoint = 0, name = "Signed Surprise", ... ) scale_color_surprise_diverging( low = scales::muted("blue"), mid = "white", high = scales::muted("red"), midpoint = 0, name = "Signed Surprise", ... )scale_fill_surprise_diverging( low = scales::muted("blue"), mid = "white", high = scales::muted("red"), midpoint = 0, name = "Signed Surprise", ... ) scale_colour_surprise_diverging( low = scales::muted("blue"), mid = "white", high = scales::muted("red"), midpoint = 0, name = "Signed Surprise", ... ) scale_color_surprise_diverging( low = scales::muted("blue"), mid = "white", high = scales::muted("red"), midpoint = 0, name = "Signed Surprise", ... )
low |
Color for negative surprise (lower than expected). Default is blue. |
mid |
Color for zero surprise (as expected). Default is white. |
high |
Color for positive surprise (higher than expected). Default is red. |
midpoint |
Value of the midpoint. Default is 0. |
name |
Legend title |
... |
Additional arguments passed to |
A ggplot2 scale object
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Signed surprise with diverging scale - geometry must be mapped explicitly ggplot(nc) + geom_surprise( aes(geometry = geometry, observed = SID74, expected = BIR74), fill_type = "signed" ) + scale_fill_surprise_diverging()library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Signed surprise with diverging scale - geometry must be mapped explicitly ggplot(nc) + geom_surprise( aes(geometry = geometry, observed = SID74, expected = BIR74), fill_type = "signed" ) + scale_fill_surprise_diverging()
Binned diverging scale for signed surprise values.
scale_fill_surprise_diverging_binned( n.breaks = 7, palette = "RdBu", direction = -1, name = "Signed Surprise", ... )scale_fill_surprise_diverging_binned( n.breaks = 7, palette = "RdBu", direction = -1, name = "Signed Surprise", ... )
n.breaks |
Number of breaks/bins |
palette |
ColorBrewer diverging palette name. Options: "RdBu", "RdYlBu", "BrBG", "PiYG", "PRGn", "PuOr", "RdGy", "Spectral" |
direction |
Direction of palette (1 = normal, -1 = reversed) |
name |
Legend title |
... |
Additional arguments |
A ggplot2 scale object
Create a scale with manually specified breaks for surprise values.
scale_fill_surprise_manual( breaks = c(0.5, 1, 1.5, 2), colors = c("#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"), name = "Surprise", labels = waiver(), na.value = "grey50", ... )scale_fill_surprise_manual( breaks = c(0.5, 1, 1.5, 2), colors = c("#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"), name = "Surprise", labels = waiver(), na.value = "grey50", ... )
breaks |
Numeric vector of break points |
colors |
Character vector of colors (length should be length(breaks) + 1 for binned, or length(breaks) for continuous) |
name |
Legend title |
labels |
Labels for legend |
na.value |
Color for NA values |
... |
Additional arguments |
A ggplot2 scale object
Diverging color scale for signed surprise with breaks at meaningful thresholds based on the interpretation of surprise values.
scale_fill_surprise_thresholds( breaks = c(-1, -0.5, -0.1, 0.1, 0.5, 1), palette = "RdBu", direction = -1, name = "Signed\nSurprise", na.value = "grey80", ... ) scale_colour_surprise_thresholds( breaks = c(-1, -0.5, -0.1, 0.1, 0.5, 1), palette = "RdBu", direction = -1, name = "Signed\nSurprise", na.value = "grey80", ... ) scale_color_surprise_thresholds( breaks = c(-1, -0.5, -0.1, 0.1, 0.5, 1), palette = "RdBu", direction = -1, name = "Signed\nSurprise", na.value = "grey80", ... )scale_fill_surprise_thresholds( breaks = c(-1, -0.5, -0.1, 0.1, 0.5, 1), palette = "RdBu", direction = -1, name = "Signed\nSurprise", na.value = "grey80", ... ) scale_colour_surprise_thresholds( breaks = c(-1, -0.5, -0.1, 0.1, 0.5, 1), palette = "RdBu", direction = -1, name = "Signed\nSurprise", na.value = "grey80", ... ) scale_color_surprise_thresholds( breaks = c(-1, -0.5, -0.1, 0.1, 0.5, 1), palette = "RdBu", direction = -1, name = "Signed\nSurprise", na.value = "grey80", ... )
breaks |
Numeric vector of break points. Default uses meaningful thresholds: -1, -0.5, -0.1, 0.1, 0.5, 1 (symmetric around 0) |
palette |
ColorBrewer diverging palette. Default "RdBu". |
direction |
Palette direction. Default -1 (red = positive/higher). |
name |
Legend title |
na.value |
Color for NA values |
... |
Additional arguments passed to scale |
Meaningful thresholds for interpreting signed surprise:
|surprise| < 0.1: Trivial - essentially as expected
|surprise| 0.1-0.5: Minor to moderate deviation
|surprise| 0.5-1.0: Substantial - genuinely surprising
|surprise| > 1.0: High - very surprising
Positive values indicate higher than expected; negative indicate lower.
A ggplot2 scale object
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) result <- surprise(nc, observed = SID74, expected = BIR74) nc$signed_surprise <- get_surprise(result, "signed") ggplot(nc) + geom_sf(aes(fill = signed_surprise)) + scale_fill_surprise_thresholds()library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) result <- surprise(nc, observed = SID74, expected = BIR74) nc$signed_surprise <- get_surprise(result, "signed") ggplot(nc) + geom_sf(aes(fill = signed_surprise)) + scale_fill_surprise_thresholds()
Updates the prior probabilities of a model space.
set_prior(model_space, prior)set_prior(model_space, prior)
model_space |
A |
prior |
Numeric vector of new prior probabilities (must sum to 1) |
Updated bs_model_space
Computes kernel density estimates for point or polygon sf objects.
st_density( x, method = c("kde", "kriging"), bandwidth = NULL, n = 100, weights = NULL, ... )st_density( x, method = c("kde", "kriging"), bandwidth = NULL, n = 100, weights = NULL, ... )
x |
An sf object with point or polygon geometries |
method |
Density estimation method: "kde" (kernel density) or "kriging" (requires additional packages) |
bandwidth |
Bandwidth for KDE. If NULL, estimated automatically. |
n |
Grid size for density estimation |
weights |
Optional weights for each feature |
... |
Additional arguments passed to density estimation functions |
A list with density estimates and grid information
library(sf) # Create random points pts <- st_as_sf(data.frame( x = rnorm(100), y = rnorm(100) ), coords = c("x", "y")) # Compute density dens <- st_density(pts)library(sf) # Create random points pts <- st_as_sf(data.frame( x = rnorm(100), y = rnorm(100) ), coords = c("x", "y")) # Compute density dens <- st_density(pts)
Evaluates a density estimate at the locations of features in an sf object.
st_density_at(density, x)st_density_at(density, x)
density |
Density estimate from |
x |
sf object with features to evaluate at |
Numeric vector of density values
Convenience wrapper for computing surprise on sf objects.
st_surprise(x, observed, expected = NULL, ...)st_surprise(x, observed, expected = NULL, ...)
x |
An sf object |
observed |
Column name (unquoted or string) or numeric vector of observed values |
expected |
Column name or vector of expected values (for base rate model). If NULL and models include base rate, computed from observed. |
... |
Additional arguments passed to model likelihood functions |
A bs_surprise_sf object
library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) result <- st_surprise(nc, observed = SID74, expected = BIR74) plot(result)library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) result <- st_surprise(nc, observed = SID74, expected = BIR74) plot(result)
This stat computes Bayesian surprise for sf geometries, allowing you to visualize surprise directly in ggplot2.
stat_surprise( mapping = NULL, data = NULL, geom = "sf", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, models = c("uniform", "baserate", "funnel"), signed = TRUE, ... )stat_surprise( mapping = NULL, data = NULL, geom = "sf", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, models = c("uniform", "baserate", "funnel"), signed = TRUE, ... )
mapping |
Aesthetic mapping created by |
data |
Data (typically an sf object) |
geom |
Geometry to use (default: "sf") |
position |
Position adjustment |
na.rm |
Remove NA values? |
show.legend |
Show legend? |
inherit.aes |
Inherit aesthetics from ggplot? |
models |
Character vector of model types to use. Options: "uniform", "baserate", "gaussian", "sampled", "funnel" |
signed |
Logical; compute signed surprise? |
... |
Additional arguments passed to the layer |
A ggplot2 layer
Bayesian surprise (KL-divergence)
Signed surprise (if signed = TRUE)
library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic surprise map - geometry must be mapped explicitly ggplot(nc) + stat_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise()library(ggplot2) library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic surprise map - geometry must be mapped explicitly ggplot(nc) + stat_surprise(aes(geometry = geometry, observed = SID74, expected = BIR74)) + scale_fill_surprise()
Stat for Surprise with sf Geometries
stat_surprise_sf( mapping = NULL, data = NULL, geom = ggplot2::GeomSf, position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, models = c("uniform", "baserate", "funnel"), signed = TRUE, ... )stat_surprise_sf( mapping = NULL, data = NULL, geom = ggplot2::GeomSf, position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, models = c("uniform", "baserate", "funnel"), signed = TRUE, ... )
mapping |
Aesthetic mapping created by |
data |
Data (typically an sf object) |
geom |
Geometry to use (default: "sf") |
position |
Position adjustment |
na.rm |
Remove NA values? |
show.legend |
Show legend? |
inherit.aes |
Inherit aesthetics from ggplot? |
models |
Character vector of model types to use. Options: "uniform", "baserate", "gaussian", "sampled", "funnel" |
signed |
Logical; compute signed surprise? |
... |
Additional arguments passed to the layer |
A list containing a ggplot2 layer using StatSurpriseSf and
ggplot2::coord_sf(). The stat computes surprise and, when signed = TRUE, signed_surprise, which are available to downstream geoms via
after_stat().
Converts temporal surprise results into a format suitable for animation (e.g., with gganimate).
surprise_animate(temporal_result, include_posterior = FALSE)surprise_animate(temporal_result, include_posterior = FALSE)
temporal_result |
A |
include_posterior |
Include posterior probabilities in output? |
A data frame with columns: time, region (if applicable), surprise, signed_surprise, and optionally model posteriors
Computes surprise using a rolling window of observations.
surprise_rolling( observed, expected = NULL, window_size = 10, step = 1, models = c("uniform", "baserate", "funnel"), ... )surprise_rolling( observed, expected = NULL, window_size = 10, step = 1, models = c("uniform", "baserate", "funnel"), ... )
observed |
Numeric vector of observed values (time-ordered) |
expected |
Numeric vector of expected values |
window_size |
Number of observations in the window |
step |
Step size for moving the window |
models |
Model specification |
... |
Additional arguments passed to |
A list with surprise values for each window position
Computes surprise over time, allowing beliefs to update as new data arrives. This is useful for detecting changes in patterns over time and for streaming data applications.
surprise_temporal( data, time_col, observed, expected = NULL, region_col = NULL, models = c("uniform", "baserate", "funnel"), update_prior = TRUE, window_size = NULL, signed = TRUE, ... )surprise_temporal( data, time_col, observed, expected = NULL, region_col = NULL, models = c("uniform", "baserate", "funnel"), update_prior = TRUE, window_size = NULL, signed = TRUE, ... )
data |
Data frame with time-indexed observations |
time_col |
Column name for time variable (unquoted or string) |
observed |
Column name for observed values |
expected |
Column name for expected values (optional) |
region_col |
Column name for region/spatial identifier (optional). If provided, computes per-region surprise at each time point. |
models |
Model specification (see |
update_prior |
Logical; should prior be updated after each time step? |
window_size |
For rolling window analysis: number of time periods to include. If NULL, uses all prior data. |
signed |
Compute signed surprise? |
... |
Additional arguments passed to |
A bs_surprise_temporal object containing:
surprise_by_time: List of surprise results for each time period
time_values: Vector of time values
cumulative_surprise: Matrix of cumulative surprise
model_space: The model space used
# Create temporal data df <- data.frame( year = rep(2010:2020, each = 5), region = rep(letters[1:5], 11), events = rpois(55, lambda = 50), population = rep(c(10000, 50000, 100000, 25000, 15000), 11) ) # Compute temporal surprise result <- surprise_temporal(df, time_col = year, observed = events, expected = population, region_col = region ) # View results print(result)# Create temporal data df <- data.frame( year = rep(2010:2020, each = 5), region = rep(letters[1:5], 11), events = rpois(55, lambda = 50), population = rep(c(10000, 50000, 100000, 25000, 15000), 11) ) # Compute temporal surprise result <- surprise_temporal(df, time_col = year, observed = events, expected = population, region_col = region ) # View results print(result)
Main function to compute Bayesian surprise for spatial or tabular data. This measures how much each observation updates beliefs about a set of models, highlighting unexpected patterns while de-biasing against known factors.
## S3 method for class 'sf' surprise( data, observed, expected = NULL, sample_size = NULL, models = c("uniform", "baserate", "funnel"), prior = NULL, signed = TRUE, ... ) surprise( data, observed, expected = NULL, sample_size = NULL, models = c("uniform", "baserate", "funnel"), prior = NULL, signed = TRUE, normalize_posterior = TRUE, ... ) ## S3 method for class 'data.frame' surprise( data, observed, expected = NULL, sample_size = NULL, models = c("uniform", "baserate", "funnel"), prior = NULL, signed = TRUE, normalize_posterior = TRUE, ... ) ## S3 method for class 'tbl_df' surprise(data, ...)## S3 method for class 'sf' surprise( data, observed, expected = NULL, sample_size = NULL, models = c("uniform", "baserate", "funnel"), prior = NULL, signed = TRUE, ... ) surprise( data, observed, expected = NULL, sample_size = NULL, models = c("uniform", "baserate", "funnel"), prior = NULL, signed = TRUE, normalize_posterior = TRUE, ... ) ## S3 method for class 'data.frame' surprise( data, observed, expected = NULL, sample_size = NULL, models = c("uniform", "baserate", "funnel"), prior = NULL, signed = TRUE, normalize_posterior = TRUE, ... ) ## S3 method for class 'tbl_df' surprise(data, ...)
data |
Data frame, tibble, or sf object |
observed |
Column name (unquoted or string) or numeric vector of observed values |
expected |
Column name or vector of expected values (for base rate model). If NULL and models include base rate, computed from observed. |
sample_size |
Column name or vector of sample sizes (for funnel model).
Defaults to |
models |
Model specification. Can be:
|
prior |
Numeric vector of prior probabilities for models.
Only used when |
signed |
Logical; compute signed surprise? |
... |
Additional arguments passed to model likelihood functions |
normalize_posterior |
Logical; if TRUE (default), normalizes posteriors before computing KL divergence. This is the standard Bayesian Surprise calculation. If FALSE, uses the unnormalized per-region posterior weights used by the original Correll & Heer JavaScript demo; this option is retained only for legacy comparison. |
For data frames: the input with surprise (and optionally
signed_surprise) columns added, plus a surprise_result attribute.
For sf objects: a bs_surprise_sf object.
# Using sf package's NC data library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic usage with default models result <- surprise(nc, observed = SID74, expected = BIR74) # With specific model types result <- surprise(nc, observed = "SID74", expected = "BIR74", models = c("uniform", "baserate", "funnel") ) # With custom model space space <- model_space( bs_model_uniform(), bs_model_baserate(nc$BIR74) ) result <- surprise(nc, observed = SID74, models = space) # View results plot(result, which = "signed_surprise")# Using sf package's NC data library(sf) nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) # Basic usage with default models result <- surprise(nc, observed = SID74, expected = BIR74) # With specific model types result <- surprise(nc, observed = "SID74", expected = "BIR74", models = c("uniform", "baserate", "funnel") ) # With custom model space space <- model_space( bs_model_uniform(), bs_model_baserate(nc$BIR74) ) result <- surprise(nc, observed = SID74, models = space) # View results plot(result, which = "signed_surprise")
Updates an existing surprise result with new observations. Useful for real-time or streaming data applications.
update_surprise( surprise_result, new_observed, new_expected = NULL, time_value = NULL, update_prior = TRUE )update_surprise( surprise_result, new_observed, new_expected = NULL, time_value = NULL, update_prior = TRUE )
surprise_result |
An existing |
new_observed |
New observed values |
new_expected |
New expected values (optional) |
time_value |
Time value for the new observation (for temporal) |
update_prior |
Update prior with current posterior? |
Updated surprise result
# Initial computation observed <- c(50, 100, 150) expected <- c(10000, 50000, 100000) result <- auto_surprise(observed, expected) # Update with new data new_obs <- c(200, 75) new_exp <- c(80000, 20000) result <- update_surprise(result, new_obs, new_exp)# Initial computation observed <- c(50, 100, 150) expected <- c(10000, 50000, 100000) result <- auto_surprise(observed, expected) # Update with new data new_obs <- c(200, 75) new_exp <- c(80000, 20000) result <- update_surprise(result, new_obs, new_exp)