
Standardises sampling effort by drawing the same number of individuals (N) from each sampling unit (a transect, site, locality, or any combination of grouping columns). The output retains species identity, so biomass and functional indices can be compared across groups on an equal-effort basis.

Usage

timefish_sample_individuals(
  data,
  N = NULL,
  group_cols = c("location"),
  species_col = "species_code",
  abundance_col = "abundance",
  biomass_col = "biomass",
  reps = 1,
  replace = FALSE,
  strict = TRUE,
  seed = NULL,
  cores = NULL,
  benchmark_cores = TRUE,
  core_candidates = NULL,
  show_parallel_info = TRUE
)

Arguments

data

A data frame with species-level observations and abundance.

N

Number of individuals to draw per group. If NULL, the function uses the smallest positive total number of individuals found in any group.
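The default for N can be illustrated with a minimal base R sketch. The toy data frame and the names totals and default_N are hypothetical and are not taken from the package; the sketch only mirrors the rule described above and makes no claim about the internal implementation.

```r
# Toy data; column names follow the function's defaults.
toy <- data.frame(
  location     = c("A", "A", "B", "B", "C"),
  species_code = c("sp1", "sp2", "sp1", "sp3", "sp2"),
  abundance    = c(10, 5, 3, 4, 0)
)

# Total individuals per group: A = 15, B = 7, C = 0.
totals <- aggregate(abundance ~ location, data = toy, FUN = sum)

# Smallest total that is greater than zero; the empty group C is ignored.
default_N <- min(totals$abundance[totals$abundance > 0])
default_N  # 7
```

Under this rule every non-empty group can supply N individuals without replacement, which is presumably why the minimum positive total is a safe default.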

group_cols

Character vector with grouping columns (for example c("location"), c("location", "site"), or c("transect_id")).

species_col

Column name with species identity.

abundance_col

Column name with abundance counts.

biomass_col

Optional biomass column; if provided, sampled biomass is estimated proportionally at the individual level.
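One possible reading of "proportionally at the individual level" is sketched below in base R. It assumes that each row's biomass is the total for all individuals counted in that row, so that every individual carries biomass / abundance. That assumption, the toy data, and all variable names are illustrative and are not confirmed by the package documentation.

```r
# Toy data: 4 individuals of sp1 weighing 200 in total, 6 of sp2 weighing 60.
toy <- data.frame(
  species_code = c("sp1", "sp2"),
  abundance    = c(4, 6),
  biomass      = c(200, 60)
)

# Assumed per-individual biomass: 50 for sp1, 10 for sp2.
per_individual <- toy$biomass / toy$abundance

# Expand rows to individuals (one row index per individual) and draw 5.
set.seed(1)
pool <- rep(seq_len(nrow(toy)), times = toy$abundance)
drawn <- sample(pool, size = 5)

# Count sampled individuals per row and scale the biomass accordingly.
sampled_abundance <- tabulate(drawn, nbins = nrow(toy))
sampled_biomass   <- sampled_abundance * per_individual

data.frame(species_code = toy$species_code, sampled_abundance, sampled_biomass)
```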

reps

Number of independent random draws per group.

replace

Logical. Should individuals be sampled with replacement?

strict

Logical. If TRUE, stops when a group has fewer than N individuals (when replace = FALSE). If FALSE, uses all available individuals in that group and reports the effective sample size.
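The interaction between N, replace, and strict can be sketched as follows. The helper draw_group() is hypothetical and written only for illustration; it follows the behaviour described above but is not the package's code.

```r
# Hypothetical helper: draw N individuals from one group's pool.
draw_group <- function(individuals, N, replace = FALSE, strict = TRUE) {
  available <- length(individuals)
  if (!replace && available < N) {
    if (strict) {
      stop("group has ", available, " individuals but N = ", N)
    }
    N <- available  # effective sample size for this group
  }
  # sample.int() avoids the surprising behaviour of sample() on length-1 input.
  individuals[sample.int(available, size = N, replace = replace)]
}

pool <- rep(c("sp1", "sp3"), times = c(3, 4))      # 7 individuals

length(draw_group(pool, N = 5))                    # 5
length(draw_group(pool, N = 10, strict = FALSE))   # 7, the effective sample size
length(draw_group(pool, N = 10, replace = TRUE))   # 10, replacement lifts the limit
```

Calling draw_group(pool, N = 10) with the defaults raises an error, which corresponds to strict = TRUE.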

seed

Optional integer seed for reproducibility.
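How a seed makes repeated draws reproducible can be shown with base R alone. The function draw_reps() and the toy pool are hypothetical; whether the package seeds its random number generator in exactly this way (particularly when several cores are used) is not stated in this documentation.

```r
# Toy pool of 18 individuals across three species.
individuals <- rep(c("sp1", "sp2", "sp3"), times = c(10, 5, 3))

# Hypothetical helper: 'reps' independent draws of N individuals after seeding.
draw_reps <- function(seed, reps = 3, N = 6) {
  set.seed(seed)
  replicate(reps, sample(individuals, size = N), simplify = FALSE)
}

first  <- draw_reps(123)
second <- draw_reps(123)

identical(first, second)  # TRUE: same seed, same sequence of draws
length(first)             # 3 replicates
```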

cores

Number of CPU cores to use. If NULL, the user is prompted to choose in interactive sessions; non-interactive sessions default to 1.

benchmark_cores

Logical. If TRUE, print an estimated runtime table by core count before running.

core_candidates

Optional vector of candidate core counts to benchmark (for example 2:8).
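Before passing core_candidates, it can help to check how many cores the machine reports. The sketch below uses parallel::detectCores() from the parallel package shipped with R; the variable names are illustrative, and how the function itself treats candidates above the available count is not documented here.

```r
# Number of cores reported by the system; detectCores() can return NA.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Keep only candidate core counts the machine can actually provide.
core_candidates <- 2:8
usable <- core_candidates[core_candidates <= available]
usable
```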

show_parallel_info

Logical. If TRUE, print available cores, the benchmark table, and selected cores.

Value

A list with N, effective_N_by_group, sampled_rows, and species_pool.

Examples

if (FALSE) { # \dontrun{
joined <- timefish_query_local_join(
  year = 2020,
  country = "Brazil",
  state = c("Santa Catarina", "Sao Paulo")
)
sampled <- timefish_sample_individuals(
  data = joined,
  N = 500,
  group_cols = c("location", "site"),
  species_col = "species_code",
  abundance_col = "abundance",
  biomass_col = "biomass",
  reps = 10,
  seed = 123
)
} # }