Reproducible projects in R emphasize streamlined project setup and
efficient workflows. The R ecosystem offers a number of helpful tools,
such as renv, usethis, or here, that cover tasks ranging from setting up
a stable R environment to generating custom project structures to
resolving project paths easily. In addition, there are several packages
and templates for creating easy-to-use and transparent project
structures. Namely, tinProjects, prodigenr or workflowr are R packages
designed to facilitate reproducible research through automated project
structuring and standardization. They all promote organized project
directories, emphasize reproducibility by integrating with tools like
Git and renv, and reduce manual setup effort to ensure consistent and
error-free project initialization. These packages help build a solid
foundation for research and encourage best practices from the start,
such as using separate scripts for data processing, analysis, and
reporting, and combining code with narrative in R Markdown documents.
This organized setup improves reproducibility by making research easier
to maintain, share, and replicate. For a more comprehensive overview,
have a look at the CRAN Task View Reproducible Research.
In the context of link2GI, which relies heavily on third-party
command-line APIs and requires complex and stable folder and file
structures, a flexible, lightweight R project setup greatly improves the
integration of OS command-line tools into spatial workflows. This
applies to tools such as GDAL or the sophisticated Orfeo Toolbox (OTB)
as well as to the growing universe of r(-)spatial packages for advanced
geospatial processing.

initProj provides a complete and flexible working environment for GI
projects. The focus is on simple, efficient and reproducible project
management and data handling. The basic framework consists of a defined
folder structure, initial scripts and configuration templates, as well
as optional Git repositories and an renv environment. A corresponding
RStudio project file is also created. The function supports the
automatic installation (if needed) and loading of the required libraries
and offers various standard setup skeletons to simplify project
initialisation.
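The "install if needed, then load" behaviour mentioned above can be sketched in base R. This is only an illustration of the general pattern, not the code used inside initProj; the helper name load_or_install is hypothetical.

```r
# Hypothetical helper (not part of link2GI): install packages that are not
# yet available, then attach all requested packages.
load_or_install <- function(pkgs) {
  # requireNamespace() returns FALSE for packages that are not installed.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    # Assumes a CRAN mirror is configured via options(repos = ...).
    install.packages(to_install)
  }
  # Attach each package and report whether attaching succeeded.
  invisible(vapply(pkgs, function(p) {
    suppressPackageStartupMessages(
      library(p, character.only = TRUE, logical.return = TRUE)
    )
  }, logical(1)))
}

# Usage with packages that ship with R, so nothing needs to be installed.
loaded <- load_or_install(c("stats", "utils"))
```

The returned logical vector makes it easy to stop early if a required package could not be attached.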
The function creates the skeleton scripts main-control.R,
pre-processing.R, 10-processing.R and post-processing.R, together with
corresponding parameter configuration files stored as yaml files in
src/configs/. The script src/functions/000_settings.R holds all specific
project settings. Easy access to all project paths is provided via the
list variable dirs.
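To illustrate the idea of a path list combined with yaml parameter files, here is a minimal base R sketch. It does not call link2GI: the list dirs_demo, its element names, the config file name and the keys in it are all assumptions made for this example, and the real dirs list returned by initProj may be organised differently. The yaml part only runs if the yaml package is installed.

```r
# Stand-in project root and code subfolders (illustrative only).
root_folder <- file.path(tempdir(), "demo_project")
code_subfolders <- c("src", "src/functions", "src/configs")

# Create the folders and collect their paths in a named list,
# e.g. dirs_demo$src_configs (hypothetical naming scheme).
paths <- file.path(root_folder, code_subfolders)
for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
dirs_demo <- as.list(setNames(paths, gsub("/", "_", code_subfolders)))

# Write and read back a small parameter file (hypothetical keys).
config_file <- file.path(dirs_demo$src_configs, "pre-processing.yaml")
if (requireNamespace("yaml", quietly = TRUE)) {
  yaml::write_yaml(list(crs = "EPSG:4326", overwrite = TRUE), config_file)
  cfg <- yaml::read_yaml(config_file)
  print(cfg$crs)
}
```

Keeping every path in one list means scripts never hard-code locations, and keeping parameters in yaml files separates settings from code.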
For this reason, the link2GI package takes a lean, lightweight but
focused approach that integrates git, renv, and a highly flexible folder
and package setup process. It is simpler than existing approaches and
aims to increase efficiency, accuracy, and performance in geospatial
workflows.
When using RStudio, a new project can be created by simply selecting the Create Project Structure (link2GI) template from the File -> New Project -> New Directory -> New Project Wizard dialogue.
The basic setup of a default project, which initializes Git and renv, is done with the following call.
library(link2GI)
root_folder = tempdir() # Mandatory, variable must be in the R environment.
dirs = initProj(root_folder = root_folder, standard_setup = "baseSpatial")
It is easy to customize the folder structure. By default, the following folders are created:
[1] "level0" "level1" "level2" "run" "rawdata"
[1] "src" "src/functions" "src/configs"
Use the folders argument to create a specific folder or subfolder
structure for your project.
root_folder = tempdir() # Mandatory, variable must be in the R environment.
dirs = initProj(root_folder = root_folder,
                standard_setup = "baseSpatial",
                folders = c("data/rawdata/provider1/", "docs/quarto/"))
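The nested paths passed to folders imply intermediate directory levels. The following base R sketch shows what such a vector expands to on disk. It is an illustration under the assumption that nested folders are created recursively; it does not reproduce what initProj does internally, and demo_root is a hypothetical location.

```r
# Hypothetical demo root, separate from any real project.
demo_root <- file.path(tempdir(), "folders_demo")
folders <- c("data/rawdata/provider1/", "docs/quarto/")

# Drop trailing slashes so the paths behave the same on all platforms.
folders <- sub("/+$", "", folders)

# recursive = TRUE also creates the intermediate levels (data/, data/rawdata/).
for (f in folders) {
  dir.create(file.path(demo_root, f), recursive = TRUE, showWarnings = FALSE)
}

# List everything that now exists below the demo root.
created <- list.dirs(demo_root, full.names = FALSE, recursive = TRUE)
print(created[created != ""])
```

Two entries in the vector therefore produce five directories, which is worth keeping in mind when designing a project layout.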
A more complex call that integrates the git and renv setup and adds
some additional folders, libraries and a location tag looks like this: