Sample a fixed number of individuals and keep species identity by group
This function standardises sampling effort by drawing the same number of
individuals (N) within each sampling unit (transect, site, locality, or
any combination of grouping columns). The output keeps species identity, so
biomass and functional indices can be compared across groups on equal effort.
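The idea can be illustrated with a minimal base R sketch. It assumes, without confirmation from the source, that abundance counts are expanded into one entry per individual, N entries are drawn per group, and the draw is tabulated back into species counts. `sample_group()` and the toy data are hypothetical and are not part of the package.

```r
# Hypothetical helper (not the package's code): draws n individuals
# from one group's species counts and returns counts per species.
sample_group <- function(species, abundance, n, replace = FALSE) {
  # Expand counts so that each individual is one element of the pool
  pool <- rep(species, times = abundance)
  # Index-based draw avoids sample()'s special case for length-1 numeric input;
  # without replacement this errors if n exceeds the pool size
  drawn <- pool[sample.int(length(pool), size = n, replace = replace)]
  # Tabulate back to species, keeping species that were not drawn as zeros
  counts <- table(factor(drawn, levels = unique(species)))
  data.frame(species_code = names(counts),
             sampled_abundance = as.integer(counts))
}

# Hypothetical toy data: two locations with different total abundance
toy <- data.frame(
  location     = c("A", "A", "B", "B", "B"),
  species_code = c("sp1", "sp2", "sp1", "sp3", "sp4"),
  abundance    = c(30, 10, 5, 5, 2)
)

set.seed(123)
# Apply the draw within each group and bind the results together
sampled <- do.call(rbind, lapply(split(toy, toy$location), function(g) {
  out <- sample_group(g$species_code, g$abundance, n = 12)
  out$location <- g$location[1]
  out
}))
```

Location B holds exactly 12 individuals, so drawing 12 without replacement returns its full composition, while location A (40 individuals) is subsampled down to 12.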
Usage
timefish_sample_individuals(
data,
N = NULL,
group_cols = c("location"),
species_col = "species_code",
abundance_col = "abundance",
biomass_col = "biomass",
reps = 1,
replace = FALSE,
strict = TRUE,
seed = NULL,
cores = NULL,
benchmark_cores = TRUE,
core_candidates = NULL,
show_parallel_info = TRUE
)
Arguments
- data
A data frame with species-level observations and abundance.
- N
Number of individuals to draw per group. If NULL, the function uses the smallest positive total number of individuals across groups.
- group_cols
Character vector of grouping columns (for example c("location"), c("location", "site"), or c("transect_id")).
- species_col
Column name with species identity.
- abundance_col
Column name with abundance counts.
- biomass_col
Optional biomass column; if provided, sampled biomass is estimated proportionally at the individual level.
- reps
Number of independent random draws per group.
- replace
Logical. Should individuals be sampled with replacement?
- strict
Logical. If TRUE, stops when a group has fewer than N individuals (when replace = FALSE). If FALSE, uses all available individuals in that group and reports the effective sample size.
- seed
Optional integer seed for reproducibility.
- cores
Number of CPU cores to use. If NULL, the user is prompted to choose interactively; in non-interactive sessions it defaults to 1.
- benchmark_cores
Logical. If TRUE, prints a table of estimated runtimes by core count before running.
- core_candidates
Optional vector of candidate core counts to benchmark (for example 2:8).
- show_parallel_info
Logical. If TRUE, prints the available cores, the benchmark table, and the selected number of cores.
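The rules for N, strict and biomass_col can be sketched in base R as follows. The helpers `resolve_n()`, `effective_n()` and `sampled_biomass()` are hypothetical, and the biomass rule (each individual carries its species' biomass divided by its abundance in the group) is one reading of "proportionally at the individual level", not a confirmed implementation detail.

```r
# Hypothetical: default N is the smallest positive group total.
resolve_n <- function(totals, N = NULL) {
  if (is.null(N)) {
    positive <- totals[totals > 0]
    if (length(positive) == 0) stop("no group has a positive total")
    N <- min(positive)
  }
  N
}

# Hypothetical: sample size actually used for one group.
effective_n <- function(total, N, replace = FALSE, strict = TRUE) {
  # With replacement, or with enough individuals, N can always be drawn
  if (replace || total >= N) return(N)
  if (strict) {
    stop(sprintf("group has %d individuals, fewer than N = %d",
                 as.integer(total), as.integer(N)))
  }
  total  # strict = FALSE: use every available individual
}

# Assumed biomass rule: per-individual biomass times individuals drawn.
sampled_biomass <- function(biomass, abundance, sampled_abundance) {
  ifelse(abundance > 0, biomass / abundance * sampled_abundance, 0)
}

totals <- c(A = 40, B = 12, C = 0)
resolve_n(totals)  # 12: group C (zero individuals) is ignored
```

Under these assumptions, a group with 12 individuals and N = 20 stops with an error when strict = TRUE, contributes all 12 individuals when strict = FALSE, and can still yield 20 draws when replace = TRUE.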
Examples
if (FALSE) { # \dontrun{
joined <- timefish_query_local_join(
year = 2020,
country = "Brazil",
state = c("Santa Catarina", "Sao Paulo")
)
sampled <- timefish_sample_individuals(
data = joined,
N = 500,
group_cols = c("location", "site"),
species_col = "species_code",
abundance_col = "abundance",
biomass_col = "biomass",
reps = 10,
seed = 123
)
} # }
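The example draws reps = 10 replicates with seed = 123. One way to keep such replicates reproducible regardless of how many cores are used is to derive one seed per replicate from the master seed and then spread the replicates over a cluster with the parallel package. This is a minimal sketch under that assumption: `draw_reps()` and `draw_once()` are hypothetical, and the package may seed its workers differently (`parallel::clusterSetRNGStream()` is a more rigorous option for parallel random-number streams).

```r
library(parallel)

# Hypothetical worker: one replicate draw of n individuals from a pool
draw_once <- function(rep_seed, pool, n) {
  set.seed(rep_seed)
  table(factor(pool[sample.int(length(pool), n)], levels = unique(pool)))
}

# Hypothetical driver: per-replicate seeds make results independent of cores
draw_reps <- function(pool, n, reps, seed, cores = 1L) {
  set.seed(seed)
  rep_seeds <- sample.int(.Machine$integer.max, reps)
  if (cores > 1L) {
    cl <- makeCluster(cores)            # portable PSOCK cluster
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, rep_seeds, draw_once, pool = pool, n = n)
  } else {
    lapply(rep_seeds, draw_once, pool = pool, n = n)
  }
}

pool <- rep(c("sp1", "sp2", "sp3"), times = c(30, 10, 5))
res <- draw_reps(pool, n = 12, reps = 10, seed = 123)
```

Because each replicate sets its own seed inside the worker, calling `draw_reps()` with cores = 1 or cores = 4 returns the same list of draws for the same master seed.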