Sample a fixed number of individuals per group while keeping species identity

This function standardises sampling effort by drawing the same number of individuals (N) from each sampling unit (a transect, site, locality, or any combination of grouping columns). The output keeps species identity, so biomass and functional indices can be computed on an equal-effort basis and compared across groups.
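The idea can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data are made up, and `draw_individuals()` is a hypothetical helper introduced only for illustration. Each row is expanded to one entry per individual, N entries are drawn without replacement, and the draw is tabulated back to species counts.

```r
# Toy data: column names mirror the function's defaults; values are invented.
toy <- data.frame(
  location = c("A", "A", "A", "B", "B"),
  species_code = c("sp1", "sp2", "sp3", "sp1", "sp2"),
  abundance = c(10, 5, 3, 4, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): draw n individuals from one
# group without replacement and return the sampled count per species.
draw_individuals <- function(group_df, n) {
  # One entry per individual, labelled with its species.
  pool <- rep(group_df$species_code, times = group_df$abundance)
  picked <- pool[sample.int(length(pool), size = n)]
  counts <- table(picked)
  data.frame(
    species_code = names(counts),
    sampled_abundance = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

set.seed(123)
# Draw 10 individuals from each location.
sampled <- lapply(split(toy, toy$location), draw_individuals, n = 10)
```

Every element of `sampled` sums to 10 individuals, and no species can be drawn more often than it occurs in its group because the draw is without replacement.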

Usage

timefish_sample_individuals(
  data,
  N = NULL,
  group_cols = c("location"),
  species_col = "species_code",
  abundance_col = "abundance",
  biomass_col = "biomass",
  reps = 1,
  replace = FALSE,
  strict = TRUE,
  seed = NULL,
  cores = NULL,
  benchmark_cores = TRUE,
  core_candidates = NULL,
  show_parallel_info = TRUE
)

Arguments

data: A data frame with species-level observations and abundance.
N: Number of individuals to draw per group. If NULL, the function uses the smallest positive group total, that is, the number of individuals in the least abundant non-empty group.
group_cols: Character vector with grouping columns (for example c("location"), c("location", "site"), or c("transect_id")).
species_col: Column name with species identity.
abundance_col: Column name with abundance counts.
biomass_col: Optional biomass column; if provided, sampled biomass is estimated proportionally at the individual level.
reps: Number of independent random draws per group.
replace: Logical. Should individuals be sampled with replacement?
strict: Logical. If TRUE, stops when a group has fewer than N individuals (when replace = FALSE). If FALSE, uses all available individuals in that group and reports the effective sample size.
seed: Optional integer seed for reproducibility.
cores: Number of CPU cores to use. If NULL, the user is asked to choose in interactive sessions; non-interactive sessions default to 1.
benchmark_cores: Logical. If TRUE, print an estimated runtime table by core count before running.
core_candidates: Optional vector of candidate core counts to benchmark (for example 2:8).
show_parallel_info: Logical. If TRUE, print available cores, the benchmark table, and selected cores.
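How the default N and the strict check relate to group totals can be sketched in base R. This is an illustration under assumptions, not the function's internal code: the toy data and the variable names (`totals`, `default_N`, `short_groups`, `effective_N`) are hypothetical.

```r
# Toy data with one empty group (C); values are invented.
toy <- data.frame(
  location = c("A", "A", "A", "B", "B", "C"),
  abundance = c(10, 5, 3, 4, 8, 0)
)

# Total individuals per group.
totals <- tapply(toy$abundance, toy$location, sum)

# N = NULL: smallest positive group total (the empty group is ignored).
positive <- totals[totals > 0]
default_N <- min(positive)

# With an explicit N and replace = FALSE, some groups may be too small.
N <- 15
short_groups <- names(positive)[positive < N]

# strict = TRUE would stop here if short_groups is non-empty.
# strict = FALSE would instead cap each group at what it has available.
effective_N <- pmin(positive, N)
```

Here `default_N` is 12, group B is the only group smaller than N = 15, and the capped sample sizes are 15 for A and 12 for B.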

Value

A list with N, effective_N_by_group, sampled_rows, and species_pool.

Examples

if (FALSE) { # \dontrun{
joined <- timefish_query_local_join(
  year = 2020,
  country = "Brazil",
  state = c("Santa Catarina", "Sao Paulo")
)
sampled <- timefish_sample_individuals(
  data = joined,
  N = 500,
  group_cols = c("location", "site"),
  species_col = "species_code",
  abundance_col = "abundance",
  biomass_col = "biomass",
  reps = 10,
  seed = 123
)
} # }