One challenge in answering regional or biogeographical questions (biogeography being the study of the geographic distribution of plants, animals and other forms of life) is that the data required are scarce. To help answer such questions, citizen-science or crowd-sourced Big Data are used to investigate patterns and processes at large spatial and temporal scales.
One such “proudly South African” project holding this rare kind of data (millions of species records) is the second Southern African Bird Atlas Project (SABAP2). It took over from its predecessor in 2007, and submissions grew exponentially from around 2015, when the popular, South African-built BirdLasser smartphone app was launched. The app let birdwatching enthusiasts log bird species simply and intuitively, with five-metre location accuracy.
More than two million records are currently added to the SABAP2 database every year. This is an invaluable resource for scientists and conservationists, yet it is not used as often as one might expect. Visit the SABAP2 website if you want to read further or become a citizen scientist yourself.
Alongside this large bird database, we needed environmental data to answer questions about the drivers of bird diversity across the region, because I (and undoubtedly others too) want to understand the causes of the patterns and behaviours we observe in nature. Wearing my science hat as a community ecologist, I posed the following question: how do bird communities inside the Kruger National Park (KNP) differ from those outside it and, if they do differ, what causes the differences? This has never been examined at a scale of nearly four million hectares in this part of the world.
Historical environmental data are nearly as rare as “chicken teeth”, especially at the large scales on which biogeographical studies such as ours focus. Fortunately, I found high-quality data made freely available by Copernicus, the European Union’s Earth observation programme. I did, however, use Google to process the data to suit my needs.
This gave us cover values across the region for environmental variables such as grass, trees, two types of water (permanent and seasonal) and infrastructure. I could now set off down the unforgiving path of statistical analysis, putting computer hardware and software to work on a decade’s worth of data that standard Microsoft software could not even open.
Early on, we realised that we had to deal with the substantial variation in survey effort that comes with citizen-science protocols such as SABAP2, brilliantly simple as they are. This variation makes it difficult to compare “apples with apples”.
For example, an area for which only two surveys (bird checklists) were submitted over 10 years cannot be compared with an area for which 100 surveys were submitted over the same period. Why? Judged on the raw data alone, the second area would appear to hold more species, when in fact more species would probably have been recorded in the first area too, had 100 surveys been submitted there. The species that go unrecorded because of low observer effort relate to what is called “dark diversity”, a recent term in community ecology for species that are absent from, or unrecorded in, an area but that could or should be found there.
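To illustrate why effort matters, here is a small Python sketch that tracks how the number of distinct species recorded in a single grid cell grows as checklists are added. The checklists are invented for illustration and are not SABAP2 records, and the function name accumulation_curve is hypothetical. Because each new checklist can only add species, never remove them, a cell with more checklists will tend to show higher richness even if its true community is no richer.

```python
# Hypothetical checklists for one grid cell: each set holds the species
# recorded on one survey (invented data, not real SABAP2 records).
checklists = [
    {"Cape Starling", "Fork-tailed Drongo"},
    {"Fork-tailed Drongo", "Lilac-breasted Roller"},
    {"Cape Starling", "Grey Go-away-bird"},
    {"Fork-tailed Drongo", "Southern Ground Hornbill"},
]


def accumulation_curve(checklists):
    """Return the cumulative number of distinct species after each checklist."""
    seen = set()
    curve = []
    for species in checklists:
        seen |= species  # add any species not recorded before
        curve.append(len(seen))
    return curve


print(accumulation_curve(checklists))  # [2, 3, 4, 5]
```

Plotted against the number of checklists, such a curve usually rises steeply at first and then flattens. Where it is still climbing, further surveys would probably turn up more species, which is exactly the gap between a two-checklist cell and a 100-checklist cell.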
To counter this, we applied techniques designed to account for differences in effort. We also set a minimum threshold below which grid cells with too few surveys were excluded from our analyses.
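The specific effort-correction techniques are not spelled out here, so the following is only a sketch of one common approach: sample-based (incidence) rarefaction combined with a minimum-effort filter. It is not necessarily what we used. The threshold MIN_CHECKLISTS and the helper names expected_richness and rarefy_grids are hypothetical. Each grid cell is a list of checklists (sets of species). Cells below the threshold are dropped, and the rest are rarefied to the smallest remaining number of checklists, so that richness is compared at equal effort.

```python
from math import comb

# Hypothetical minimum number of checklists per grid cell (illustrative only).
MIN_CHECKLISTS = 5


def expected_richness(checklists, m):
    """Sample-based rarefaction: expected number of species in m checklists
    drawn without replacement from the T checklists of one grid cell."""
    T = len(checklists)
    if not 1 <= m <= T:
        raise ValueError("m must be between 1 and the number of checklists")
    # Incidence frequency: on how many checklists each species was recorded.
    incidence = {}
    for species in checklists:
        for s in species:
            incidence[s] = incidence.get(s, 0) + 1
    # A species with incidence q is missed only if all m drawn checklists come
    # from the T - q that lack it; math.comb returns 0 when m > T - q.
    return sum(1 - comb(T - q, m) / comb(T, m) for q in incidence.values())


def rarefy_grids(grids, min_checklists=MIN_CHECKLISTS):
    """Drop under-surveyed cells, then rarefy the rest to the smallest remaining effort."""
    kept = {cell: c for cell, c in grids.items() if len(c) >= min_checklists}
    if not kept:
        return {}
    m = min(len(c) for c in kept.values())
    return {cell: expected_richness(c, m) for cell, c in kept.items()}


# Usage with invented cells: "A" has too few checklists and is excluded.
grids = {
    "A": [{"x"}, {"y"}],
    "B": [{"a", "b"}, {"a"}, {"b", "c"}, {"a"}, {"d"}],
    "C": [{"a"} for _ in range(10)],
}
print(rarefy_grids(grids))  # {'B': 4.0, 'C': 1.0}
```

Rarefying down to the least-surveyed cell throws away some information from well-surveyed cells. That trade-off is one reason a minimum threshold helps: it stops a handful of very sparse cells from setting the common effort level for every other cell.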