PHOTO(S): © Marco Carè/Marine Photobank
Open data science means that methods, data, and code are available so others can access, reuse, and build on them without much fuss. We use a variety of programs, tools, and practices to do reproducible research.
Our workflow depends on R, RStudio, RMarkdown, and Git/GitHub. These resources will get you started.
Introduction to Open Data Science is a hands-on training book that introduces the tools, practices, and workflows that underpin our work (Lowndes et al. 2017).
The learning hub at NCEAS has many excellent trainings and resources. It is worth scrolling through the materials to see what is available. For example:
Learn R (and other skills) using Swirl
More about Git: Happy Git with R by Jenny Bryan (short course)
Collaboration is messy! Working with people can be challenging! But working together is one of the most rewarding things we can do in science (and, maybe, as humans). And, besides, the problems facing our world and the challenges of doing good science are too big to solve as individuals. So collaborate we must!
In addition to R, RStudio, RMarkdown, and GitHub, we use these resources to improve collaboration:
If you are dealing with ecological data, at some point you will need to embrace spatial data.
Coordinate Reference Systems (CRS)
A CRS describes the units used to specify real-world locations. The system most people are familiar with is latitude and longitude, but there are many other ways to describe location, each with advantages and disadvantages. When you use spatial data, you will need to understand which CRS the data uses, and possibly project it to another CRS so that it is in the same units as your other data. Here is a link to a handy primer on CRS.
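To make the idea of "same location, different units" concrete, here is a minimal sketch that projects one point from longitude/latitude in degrees to Web Mercator in meters. The choice of these two systems (EPSG:4326 and EPSG:3857), the function name, and the sample point are assumptions for illustration only, and the sketch is written in plain Python rather than R so that it needs no packages. In an R workflow you would normally let a spatial package such as sf handle this (for example with st_transform) rather than writing the formula by hand.

```python
import math

# Semi-major axis of the WGS 84 ellipsoid, in meters. Web Mercator (EPSG:3857)
# treats the Earth as a sphere with this radius.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees (EPSG:4326) to Web Mercator meters (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Hypothetical sample point near Santa Barbara, California (illustrative only).
lon, lat = -119.70, 34.42
x, y = lonlat_to_web_mercator(lon, lat)

# The same location, expressed in two different coordinate reference systems.
print(f"degrees: ({lon}, {lat})")
print(f"meters:  ({x:.0f}, {y:.0f})")
```

The two printed pairs refer to the same place. Mixing them in one analysis without reprojecting would put the layers in different locations, which is why checking the CRS comes first.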
Spatial data wrangling
An introduction to spatial analysis in R (~2-hour workshop, self-paced course by Jamie Montgomery)
Spatial analysis in R: Vectors (~2-hour workshop, self-paced course by Casey O’Hara)
Interacting with spatial files in R (~2-hour workshop, self-paced course by Jamie Montgomery)
The eco-data-science study group at the University of California, Santa Barbara has created a number of useful tutorials. So much goodness there!
Understanding relational data is very important. This chapter in Hadley Wickham & Garrett Grolemund’s R for Data Science provides a good explanation.
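As a small illustration of what "relational" means in practice, the sketch below joins two tables that share a key column. The table contents, the column names, and the left_join helper are all hypothetical, and the example uses plain Python so it runs without any packages. In R, the dplyr function of the same name (left_join) plays this role.

```python
# Two small hypothetical tables that share a key column, "species_id".
species = [
    {"species_id": 1, "common_name": "giant kelp"},
    {"species_id": 2, "common_name": "garibaldi"},
    {"species_id": 3, "common_name": "sea otter"},
]
observations = [
    {"site": "A", "species_id": 1, "count": 12},
    {"site": "A", "species_id": 2, "count": 4},
    {"site": "B", "species_id": 1, "count": 7},
    {"site": "B", "species_id": 4, "count": 1},  # no matching row in `species`
]


def left_join(left, right, key):
    """Keep every row of `left`, adding columns from the matching `right` row if any."""
    lookup = {row[key]: row for row in right}  # assumes `key` is unique in `right`
    right_cols = [col for col in right[0] if col != key]
    joined = []
    for row in left:
        match = lookup.get(row[key])
        extra = {col: (match[col] if match else None) for col in right_cols}
        joined.append({**row, **extra})
    return joined


joined = left_join(observations, species, "species_id")
for row in joined:
    print(row)
```

Every observation is kept. The one whose species_id has no match gets a missing value (None) for common_name, which is the kind of silent gap worth checking for after any join.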
Dealing with color can be painful; this color guide should help.