TimeFish: General Workflow

This tutorial walks through the main TimeFish workflow from table discovery to filtered downloads and summaries. The package is intentionally structured around small, focused steps so users can inspect the original CSV files, filter on any column, and export only the rows they need.

Installation

Install timefishr from GitHub using remotes:

install.packages("remotes")
remotes::install_github("BioScalesLab/TimeFish", subdir = "timefishr")

Alternative using pak:

install.packages("pak")
pak::pak("BioScalesLab/TimeFish/timefishr")

Then load the package:

library(timefishr)

timefishr uses the bundled local snapshot shipped with the package. Most users only need one line to get the latest approved data:

That command refreshes the package snapshot with the newest approved data and updates the local cache used by the package. By default (from_drive = "auto"), it uses the public snapshot URLs, so end users do not need any Google Drive setup. If Google Drive credentials are configured, it can also pull _integrated files from Google Drive before falling back to the public snapshots. After that call, timefish_census(), timefish_location(), and timefish_taxonomic() read the refreshed cache automatically. The tutorial still works offline because it reads the packaged files directly.

Package website: https://bioscaleslab.github.io/TimeFish/

About this tutorial

TimeFish ships with three bundled CSV tables:

tables_overview <- data.frame(
  table = c("census", "location", "taxonomic"),
  purpose = c(
    "Species observations by transect",
    "Transect, site, and geography metadata",
    "Species traits and taxonomy"
  ),
  key_fields = c(
    "transect_id, species_code, total_length, abundance",
    "transect_id, country, state, location, site, year",
    "species_code, family, genus, trophic_groups_1, trophic_groups_2"
  ),
  stringsAsFactors = FALSE
)

knitr::kable(tables_overview, caption = "The three bundled TimeFish tables")

Table: The three bundled TimeFish tables

| table     | purpose                                | key_fields                                                      |
|-----------|----------------------------------------|-----------------------------------------------------------------|
| census    | Species observations by transect       | transect_id, species_code, total_length, abundance              |
| location  | Transect, site, and geography metadata | transect_id, country, state, location, site, year               |
| taxonomic | Species traits and taxonomy            | species_code, family, genus, trophic_groups_1, trophic_groups_2 |

Workflow overview

[Figure: TimeFish workflow diagram]

Effort standardization schematics

The package includes two complementary standardization workflows: one based on minimum comparable sampled area, and another based on minimum comparable individual counts.

[Figure: Minimum sampling area workflow]
[Figure: Minimum individuals workflow]

Call timefish_available_cores() first to see how many cores are available, then choose a core setting that suits your task size:

  • Quick exploratory runs:
    • cores = 1
    • benchmark_cores = FALSE
  • Balanced daily workflow:
    • cores = min(4, timefish_available_cores())
    • benchmark_cores = TRUE
  • Heavy bootstrap / repeated sampling:
    • cores = max(1, min(8, timefish_available_cores() - 1))
    • benchmark_cores = TRUE
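
The three settings above can be sketched as a small helper. This is only an illustration: choose_cores() is a hypothetical function that is not part of timefishr, and parallel::detectCores() is assumed here as a stand-in for timefish_available_cores().

```r
# Hypothetical helper (not part of timefishr): maps a task profile to a
# core count using the three rules listed above.
choose_cores <- function(profile = c("quick", "balanced", "heavy"), available) {
  profile <- match.arg(profile)
  switch(
    profile,
    quick = 1L,
    balanced = as.integer(min(4, available)),
    heavy = as.integer(max(1, min(8, available - 1)))
  )
}

# Assumption: parallel::detectCores() stands in for timefish_available_cores().
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Core count suggested for each profile on this machine.
sapply(c("quick", "balanced", "heavy"), choose_cores, available = available)
```

The "heavy" rule leaves one core free and never drops below one, so it stays usable on single-core machines.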

Example with individual-based sampling (joined_data stands for a merged table such as the one returned by timefish_join() in section 4):

sampled <- timefish_sample_individuals(
  data = joined_data,
  N = 200,
  group_cols = c("location", "site"),
  species_col = "species_code",
  abundance_col = "abundance",
  biomass_col = "biomass",
  reps = 100,
  cores = min(4, timefish_available_cores()),
  benchmark_cores = TRUE
)
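
To make the idea of individual-based standardization concrete, here is a rough base R sketch on toy data. It assumes that each replicate draws N individuals without replacement from the abundance-weighted pool of one group; the source does not confirm that timefish_sample_individuals() works this way, and sample_individuals() is a hypothetical helper.

```r
set.seed(1)

# Toy abundances for one group; species codes and counts are illustrative.
toy <- data.frame(
  species_code = c("sp_a", "sp_b", "sp_c"),
  abundance = c(50, 30, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the timefishr implementation): expand abundances
# into one entry per individual, draw N of them, and count them per species.
sample_individuals <- function(species, abundance, N) {
  pool <- rep(species, times = abundance)
  stopifnot(N <= length(pool))
  draw <- sample(pool, size = N, replace = FALSE)
  counts <- table(factor(draw, levels = species))
  setNames(as.integer(counts), species)
}

# 100 replicates of 20 individuals each: a 3 x 100 matrix of counts.
reps <- replicate(100, sample_individuals(toy$species_code, toy$abundance, N = 20))

# Mean count per species across replicates.
rowMeans(reps)
```

Every replicate contains exactly N individuals, which is what makes groups with very different raw counts comparable.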

1. Know your data

Start by checking the available tables and their columns.

timefish_tables()
#> [1] "census"    "location"  "taxonomic"
timefish_columns("location")
#>  [1] "transect_id"      "province"         "state"            "country"         
#>  [5] "location"         "site"             "thermal_category" "no_take_zone"    
#>  [9] "longitude"        "latitude"         "ntransect"        "observer"        
#> [13] "sampling_method"  "sampling_area_m2" "day"              "month"           
#> [17] "year"             "sampling_season"  "depth"            "temperature"     
#> [21] "complexity"       "visibility"       "start_time"       "end_time"        
#> [25] "biotop"

The data are stored as the original CSV tables from the repository, so you can query fields such as country, state, location, site, year, species_code, or observer depending on the table you choose.

2. Query and filter

Use timefish_query() to filter any table by any of its columns. Character filters are matched case-insensitively by default, and timefish_between() helps with numeric ranges.

location_sc <- timefish_query(
  "location",
  country = "Brazil",
  state = "Santa Catarina",
  year = timefish_between(2007, 2010)
)

location_sc_preview <- location_sc[, c(
  "transect_id", "state", "country", "location", "site", "year",
  "observer", "no_take_zone", "depth", "biotop"
)]

knitr::kable(
  head(location_sc_preview, 8),
  caption = "Preview of filtered location rows (Brazil, Santa Catarina, 2007-2010)"
)

Table: Preview of filtered location rows (Brazil, Santa Catarina, 2007-2010)

| transect_id | state          | country | location       | site | year | observer       | no_take_zone | depth | biotop    |
|-------------|----------------|---------|----------------|------|------|----------------|--------------|-------|-----------|
| 2007_1      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 10    | interface |
| 2007_2      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 10    | interface |
| 2007_3      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 5     | slope     |
| 2007_4      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 5     | slope     |
| 2007_5      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 5     | slope     |
| 2007_6      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 10    | interface |
| 2007_7      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 10    | interface |
| 2007_8      | Santa Catarina | Brazil  | Deserta Island | West | 2007 | Diego Barneche | yes          | 10    | interface |
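
As a rough picture of what a case-insensitive character filter combined with a numeric range does, the following base R sketch filters a toy data frame. match_ci() and in_range() are hypothetical helpers, not the timefishr implementation, and the range is assumed to include both bounds.

```r
# Toy data; values are illustrative, not taken from TimeFish.
toy <- data.frame(
  state = c("Santa Catarina", "Sao Paulo", "SANTA CATARINA", "Bahia"),
  year = c(2007, 2009, 2011, 2008),
  stringsAsFactors = FALSE
)

# Hypothetical helpers (not part of timefishr).
# Case-insensitive membership test for character columns.
match_ci <- function(x, values) tolower(x) %in% tolower(values)
# Numeric range test; assumption: both bounds are inclusive.
in_range <- function(x, lower, upper) !is.na(x) & x >= lower & x <= upper

# Conditions on different columns are combined with a logical AND.
keep <- match_ci(toy$state, "santa catarina") & in_range(toy$year, 2007, 2010)
toy[keep, ]
```

Two rows match the state regardless of case, but only the 2007 row also falls inside the year range.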

You can also filter the census table by species or the taxonomic table by traits or family:

census_virginicus <- timefish_query(
  "census",
  species_code = "ani_vir"
)

taxonomic_family <- timefish_query(
  "taxonomic",
  family = "Pomacentridae"
)

list(
  census_rows = nrow(census_virginicus),
  taxonomic_rows = nrow(taxonomic_family)
)
#> $census_rows
#> [1] 3190
#> 
#> $taxonomic_rows
#> [1] 9

If you want to filter several tables at once, use timefish_query_bundle().

bundle <- timefish_query_bundle(
  location = list(country = "Brazil", state = "Santa Catarina"),
  census = list(species_code = "ani_vir")
)

list(
  census_rows = nrow(bundle$census),
  location_rows = nrow(bundle$location),
  taxonomic_rows = nrow(bundle$taxonomic)
)
#> $census_rows
#> [1] 3190
#> 
#> $location_rows
#> [1] 2330
#> 
#> $taxonomic_rows
#> [1] 189

Example: two states in the same year (2020)

Use a vector in state and a fixed year value:

location_two_states_2020 <- timefish_query(
  "location",
  country = "Brazil",
  state = c("Santa Catarina", "Sao Paulo"),
  year = 2020
)

location_two_states_2020

This pattern keeps rows from both states for the same year.
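
In base R terms, a vector of allowed values behaves like a membership test on that column, combined with the other conditions. The sketch below uses toy data and plain %in%; it is an assumption about the semantics, not the package's own code.

```r
# Toy location rows; transect ids and years are illustrative.
toy_location <- data.frame(
  transect_id = c("a1", "a2", "a3", "a4"),
  state = c("Santa Catarina", "Sao Paulo", "Santa Catarina", "Bahia"),
  year = c(2020, 2020, 2019, 2020),
  stringsAsFactors = FALSE
)

wanted_states <- c("Santa Catarina", "Sao Paulo")

# A row is kept if its state is any of the wanted values AND its year is 2020.
keep <- toy_location$state %in% wanted_states & toy_location$year == 2020
two_states_2020 <- toy_location[keep, ]

# Rows kept per state.
table(two_states_2020$state)
```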

3. Download filtered data

Once you have the filtered rows you want, write them to disk.

timefish_download(
  "location",
  country = "Brazil",
  state = "Santa Catarina",
  directory = tempdir()
)

timefish_download_census(species_code = "ani_vir")
timefish_download_uvcs(species_code = "ani_vir")
timefish_download_taxonomic(family = "Pomacentridae")

Each download helper writes a CSV or TSV file with a date-stamped name by default. You can also pass a custom file name if you want to save a curated export.
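
For readers who want to see what a date-stamped export looks like without the package, here is a minimal base R sketch. write_stamped() is a hypothetical helper, and the prefix_YYYY-MM-DD naming pattern is an assumption; the actual file names produced by timefishr may differ.

```r
# Hypothetical helper (not part of timefishr): writes a data frame to a
# date-stamped CSV or TSV file and returns the path.
write_stamped <- function(x, prefix, directory = tempdir(), type = c("csv", "tsv")) {
  type <- match.arg(type)
  stamp <- format(Sys.Date(), "%Y-%m-%d")
  path <- file.path(directory, sprintf("%s_%s.%s", prefix, stamp, type))
  sep <- if (type == "csv") "," else "\t"
  utils::write.table(x, path, sep = sep, row.names = FALSE, quote = TRUE)
  path
}

# Toy rows standing in for a filtered table.
toy <- data.frame(transect_id = c("a1", "a2"), depth = c(5, 10))

path <- write_stamped(toy, prefix = "location_filtered")
basename(path)

# Read the file back to confirm the export round-trips.
read.csv(path)
```

Writing to tempdir() keeps the example free of side effects; pass another directory to keep the file.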

4. Join and summarise

When you want to combine the tables for analysis, use timefish_join().

data <- timefish_join()

list(
  rows = nrow(data),
  columns = ncol(data),
  has_biomass = "biomass" %in% names(data)
)
#> $rows
#> [1] 197405
#> 
#> $columns
#> [1] 36
#> 
#> $has_biomass
#> [1] TRUE
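
Conceptually, the join links the three tables through the key fields listed in the overview table: transect_id connects census to location, and species_code connects census to taxonomic. The base R sketch below shows that idea with merge() on toy data. It is not the timefish_join() implementation, and the abu_sax code and all values are illustrative.

```r
# Toy versions of the three tables, keeping only a few columns each.
census <- data.frame(
  transect_id = c("t1", "t1", "t2"),
  species_code = c("ani_vir", "abu_sax", "ani_vir"),
  abundance = c(3, 10, 1),
  stringsAsFactors = FALSE
)
location <- data.frame(
  transect_id = c("t1", "t2"),
  state = c("Santa Catarina", "Sao Paulo"),
  year = c(2007, 2008),
  stringsAsFactors = FALSE
)
taxonomic <- data.frame(
  species_code = c("ani_vir", "abu_sax"),
  family = c("Haemulidae", "Pomacentridae"),
  stringsAsFactors = FALSE
)

# Attach transect metadata, then species traits, to each census row.
joined <- merge(census, location, by = "transect_id")
joined <- merge(joined, taxonomic, by = "species_code")
joined
```

Each census row keeps its own abundance and gains the location and taxonomy columns, so the row count matches the census table when every key has a match.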

The merged table can then be summarised by year or geography.

year_summary <- timefish_summary_by_year(data, metric = "abundance")
geo_summary <- timefish_summary_by_geo(data, metric = "biomass", level = "state")

head(year_summary)
#>   year     value
#> 1 2007  79.72000
#> 2 2008  91.42381
#> 3 2009  59.57463
#> 4 2010  69.03797
#> 5 2011 125.91489
#> 6 2012  83.62130
head(geo_summary)
#>   country          state longitude  latitude    value n_censos n_sites
#> 1  Brazil Santa Catarina -48.37950 -27.33282  495.823     2217      18
#> 2  Brazil      Sao Paulo -45.63326 -24.02124 6111.604      576      17
#>   n_locations
#> 1           8
#> 2           3
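
The tutorial does not state how the summary value is computed. As one plausible reading, the sketch below sums abundance within each transect and then averages those transect totals per year, using base R aggregate() on toy data. Treat both the definition and the numbers as assumptions rather than a description of timefish_summary_by_year().

```r
# Toy joined rows; values are illustrative.
toy <- data.frame(
  transect_id = c("t1", "t1", "t2", "t3"),
  year = c(2007, 2007, 2007, 2008),
  abundance = c(3, 10, 1, 6),
  stringsAsFactors = FALSE
)

# Step 1: total abundance per transect (and its year).
per_transect <- aggregate(abundance ~ transect_id + year, data = toy, FUN = sum)

# Step 2: mean of the transect totals within each year.
year_summary_toy <- aggregate(abundance ~ year, data = per_transect, FUN = mean)
names(year_summary_toy) <- c("year", "value")
year_summary_toy
```

Averaging per transect rather than summing raw rows keeps years with more transects from dominating the comparison.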

5. What to use next

The package is built around a simple flow:

  1. inspect the available CSV tables,
  2. filter the rows you need,
  3. download the filtered result,
  4. join or summarise the output for analysis.

For example:

timefish_tables()
timefish_columns("location")
timefish_query("census", species_code = "ani_vir")
timefish_download_census(species_code = "ani_vir")
timefish_join()
timefish_summary_by_geo(timefish_join(), metric = "biomass", level = "state")

For the package README and the live documentation site, see the GitHub Pages site for timefishr:

https://bioscaleslab.github.io/TimeFish/

Citation

Run this in R to see the recommended citation format:

citation("timefishr")

If you need the full contributor list for the underlying TimeFISH data record, use the Zenodo record linked in the package README.