
The Predictive Power of Benthic Topography on Soft Corals and Algae

Using Logistic Regression to Analyze Predictive Power of Benthic Topography
on the Presence of Soft Corals and Algae in the Santa Barbara Channel.

K. Janus*

*Geographic Information Science, Department of IDCE, Clark University, Worcester, MA

Abstract

The availability of bathymetry and topobathy data collected with multibeam acoustic
techniques has allowed scientists to map the contours of the sea floor in exquisite detail. It has
been noted that specific contours of the sea floor correlate with the presence or absence of
certain benthic species. This study looks at bathymetry data and species presence data collected
off the coast of Santa Barbara, in the Santa Barbara Channel, as part of the USGS California
State Waters Map Series Data Catalog. It attempts to build a logistic regression model that
identifies how topographic traits may predict species presence. Using spatial measurements of
depth, slope, slope of slope, aspect, and standard deviation of depth, the presence (1) and
absence (0) of Mounds (see figure 1), Algae (see figure 2), Sea Pens (see figure 3), Sea Whips
(see figure 4), and Driftweed (see figure 5) are examined. The analysis showed that seafloor
topography alone cannot predict the presence or absence of all of the dependent variables;
however, it was able to predict, with some accuracy, the presence of algae.

Figure 1: Mounds. Mostly created from conglomerates of oysters and mussels.


Figure 2: Algae

Figure 3: Sea Pen

Figure 4: Sea Whip, a type of soft coral found from Santa Barbara to Alaska.

Figure 5: Drift weed, also known as Sargassum


Introduction

The Santa Barbara Channel is located off the coast of California between the city of Santa
Barbara on the mainland and the Channel Islands, which are located approximately forty miles
off the coast. The channel is significant because of its geographic location along the path of the
California Current [1]. During migratory season, a variety of marine mammals, including three
species of whales, pass through this narrow swath of coast to travel southward toward Mexico
or northward toward Alaska [2]. The channel is surprisingly deep, with a maximum known depth
of approximately one thousand feet, deeper than the Grand Canyon [3]. Due to the natural oil
and tar found in the crevice, the Santa Barbara Channel has historically been home to oil drilling
efforts and has occasionally been the site of major oil spills [4]. Given the economic and
ecological importance of the area, it is important to have the statistical tools to study the presence
of species in the region.

This project will study how seafloor depth and topography may predict species presence in the
Santa Barbara Channel. This project uses bathymetry and species presence/absence data
collected by USGS in 2008 and provided to the public in the Offshore Santa Barbara Data
Catalog [5].

The unit of analysis in this study is the presence or absence of a specimen at a physical location.
The dataset from USGS includes 138 kinds of specimens documented during the Santa Barbara
Channel data collection expedition of 2008; however, this study focuses on Mounds, Algae, Sea
Pens, Sea Whips, and Driftweed. Presence is signified by a latitude and longitude coordinate
position and a 1, while absence is signified by a latitude and longitude coordinate position and
a 0. The presence and absence data were recorded at equal intervals by a remotely operated
vehicle (ROV) over the duration of a transect.

To relate the presence and absence data to the seafloor, and to create a predictive equation, the
presence and absence data are compared with the following topographic features: depth, slope,
slope of slope, aspect, and standard deviation of depth. These topographic features are derived
from one bathymetry raster dataset provided by USGS. The derived layers are produced in
ArcMap. The analysis tests whether the five topographic features can be used to predict the
presence of any of the five specimens.

Because each dependent variable is Boolean and the independent variables are continuous, the
analysis uses logistic regression, a multivariate regression technique often used when the
dependent variable is either “yes” or “no” (in this case, “present” or “absent”).


Data

Data Source and Contents


Data for this study came from the United States Geological Survey’s (USGS) “California State
Waters Map Series Data Catalog”.

The independent variable data came from the “Bathymetry (2m/pixel) Offshore of Santa Barbara
map area raster file”. The resolution of the data was 2 meters by 2 meters. Bathymetry data was
collected using multi-beam depth sounders and backscatter was removed.

The dependent presence and absence data came from video footage taken in 2008 aboard a
submerged remotely operated vehicle (ROV). The submersible surveyed the seafloor in tandem
with a larger research vessel on the surface. A camera aboard the ROV filmed the sea floor
continuously, covering 14 transects of various lengths. The speed of the ROV averaged 3 meters
per second, and 570 observations were taken. A field expert reviewed the video footage and
identified the presence or absence of 138 different kinds of geologic features, flora, or fauna
during the course of the data collection. The locations of the transects and observations are
shown in figure 6.

Figure 6: This map shows the study area. The observation points (in blue) lie along transects
which are distributed normally throughout the study range and over various habitat types.
Habitat types are not included in the study but are included in the map for the sake of context.
There are 14 transects and 570 observations.

Species presence data for mounds, algae, sea pens, sea whips, and driftweed were chosen for this
analysis because these specimens are relatively abundant and ecologically important in the
region. Additionally, the California State Waters Map Series Data Catalog already has predictive
maps for the presence of isopachs, brittle stars, cup corals, and hydroids, but it does not have a
predictive analysis for mounds, algae, sea whips, or driftweed.

A goal of this study is to provide a predictive analysis for specimens that USGS has not yet
analyzed in this way. It should be noted, however, that USGS does have a predictive map for
short and tall sea pens. Sea Pens are nevertheless included in this study because of their high
abundance in the study area.

Data: Preparation
Logistic regression requires a binary dependent variable, while the independent variables may be
continuous. The dependent variables in this study are binary, with presence coded as 1 and
absence coded as 0. The independent variables (depth, slope, slope of slope, aspect, and standard
deviation of depth) are continuous.

The independent variables needed to be prepared, however, because only a bathymetry raster
of depth was available for download. The “Slope” function in the Spatial Analyst toolbar of
ArcMap was used to derive the slope raster dataset as well as the slope of slope raster dataset.
The “Aspect” function in the Spatial Analyst toolbar of ArcMap was used to derive the aspect
raster dataset. Finally, the “Focal Statistics” tool was used to derive the standard deviation raster
dataset. After the derived layers were complete, the descriptive statistics for the independent
variables were as shown in Table 1:
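
As a rough illustration of what these derived layers contain, the sketch below computes slope, slope of slope, aspect, and a focal standard deviation from a small depth grid in plain Python. It is not the ArcMap implementation: it uses simple central differences, assumes a 3x3 focal window with a population standard deviation (the window used in the study is not stated), assumes row 0 is the northern edge of the raster, and runs on a made-up tilted plane rather than the survey data. All function names are hypothetical.

```python
import math
import statistics

def _gradients(grid, cell_size, r, c):
    """Central-difference gradients; x increases with column, y with row."""
    dz_dx = (grid[r][c + 1] - grid[r][c - 1]) / (2 * cell_size)
    dz_dy = (grid[r + 1][c] - grid[r - 1][c]) / (2 * cell_size)
    return dz_dx, dz_dy

def slope_degrees(grid, cell_size):
    """Slope in degrees for interior cells (output loses a one-cell border)."""
    return [[math.degrees(math.atan(math.hypot(*_gradients(grid, cell_size, r, c))))
             for c in range(1, len(grid[0]) - 1)]
            for r in range(1, len(grid) - 1)]

def aspect_degrees(grid, cell_size):
    """Compass bearing of the downslope direction (0 = north, 90 = east)."""
    out = []
    for r in range(1, len(grid) - 1):
        row = []
        for c in range(1, len(grid[0]) - 1):
            dz_dx, dz_dy = _gradients(grid, cell_size, r, c)
            # Downslope east component is -dz_dx; rows run southward,
            # so the downslope north component is +dz_dy.
            row.append(math.degrees(math.atan2(-dz_dx, dz_dy)) % 360.0)
        out.append(row)
    return out

def focal_std(grid):
    """Population standard deviation of each interior cell's 3x3 window."""
    return [[statistics.pstdev(grid[rr][cc]
                               for rr in (r - 1, r, r + 1)
                               for cc in (c - 1, c, c + 1))
             for c in range(1, len(grid[0]) - 1)]
            for r in range(1, len(grid) - 1)]

# Toy input: a plane that deepens by 2 m per 2 m cell toward the east.
depth = [[-2.0 * c for c in range(5)] for _ in range(5)]
slope = slope_degrees(depth, cell_size=2.0)
slope_of_slope = slope_degrees(slope, cell_size=2.0)  # slope applied to the slope grid
aspect = aspect_degrees(depth, cell_size=2.0)
roughness = focal_std(depth)
```

On this plane the slope is constant, so the slope of slope is zero everywhere; a real seafloor would show nonzero values wherever the gradient changes.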

Table 1: Descriptive Statistics for Independent Variables

In Table 1, it is seen that depth has the highest standard deviation. The histograms which
demonstrate the distribution of the values are as follows in figures 6 through 10.


Figure 6: This shows the bimodal standard deviation of depth data.

Figure 7: Slope histogram


Figure 8: Histogram of slope of Slope

Figure 9: Histogram of Aspect

As seen in the histograms in figures 6 through 10, the depth data in figure 6 show a bimodal
distribution, while the other histograms show relatively normal distributions. The slope values
fall into few bins, indicating that there is little variation in slope in the study area. This is
likely because the study area lies in the shallow region of the Santa Barbara Channel, before the
seafloor drops off into the deeper regions of the channel.


To analyze the dependent variables, Global Moran’s I and incremental spatial autocorrelation of
the transects were checked in ArcMap; the results for one of the dependent variables are shown
in Figure 11. The results show that all of the dependent variables (Algae, Mounds, Sea Pens, Sea
Whips, and Driftweed) are highly clustered, with a high level of statistical confidence. This may
or may not be beneficial for the logistic regression analysis. Typically, observations should be
spatially independent rather than spatially autocorrelated. However, since the study is looking
to see if certain species reside in certain topographically characterized regions, it is possible that
the clustering will allow specific combinations of the independent variables to “stand out” as
factors which predict species presence. The results for Algae alone are shown below:
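
For readers unfamiliar with the statistic, the following is a minimal sketch of Global Moran's I for presence/absence values at point locations. It assumes binary distance-band weights (neighbors within `band` get weight 1), which may differ from the spatial conceptualization chosen in ArcMap, and it omits the z-score and p-value that ArcMap also reports. The points and values are toy data, and `morans_i` is a hypothetical name.

```python
import math

def morans_i(points, values, band):
    """Global Moran's I with binary distance-band weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbor pairs
    s0 = 0.0      # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                cross += dev[i] * dev[j]
                s0 += 1.0
    return (n / s0) * (cross / sum(d * d for d in dev))

# Six toy observations along a straight transect, one unit apart.
transect = [(float(x), 0.0) for x in range(6)]
clustered = morans_i(transect, [1, 1, 1, 0, 0, 0], band=1.0)
alternating = morans_i(transect, [1, 0, 1, 0, 1, 0], band=1.0)
```

Positive values indicate that similar values sit near each other (clustering), while negative values indicate that neighbors tend to differ; the clustered toy transect gives a positive value and the alternating one a negative value.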


Figure 11: Spatial Autocorrelation Information on ALGAE


Methods

Next, the “One-to-Many Spatial Join” function in ArcMap was used to associate each
presence/absence observation of the five Boolean dependent variables with the numeric values
of the five continuous independent variables. The spatial join matched each observation with a
value for depth, slope, slope of slope, aspect, and standard deviation of depth, putting the data
in the proper form to run a logistic regression.
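
The effect of this join can be sketched as a lookup of the raster cell under each observation point. The code below is an analogy, not the ArcMap tool: it assumes a north-up raster described by a hypothetical upper-left origin (`x_min`, `y_max`) and a square cell size, and the grid and coordinates are invented for illustration.

```python
import math

def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the cell containing (x, y), or None if outside."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

def join_observations(observations, layers, x_min, y_max, cell_size):
    """Attach one value per raster layer to each (x, y, presence) record."""
    rows = []
    for x, y, presence in observations:
        record = {"presence": presence}
        for name, grid in layers.items():
            record[name] = sample_raster(grid, x_min, y_max, cell_size, x, y)
        rows.append(record)
    return rows

# Toy 2x2 depth layer with 2 m cells; upper-left corner at (0, 4).
layers = {"depth": [[-10.0, -12.0], [-14.0, -16.0]]}
observations = [(3.0, 3.0, 1), (1.0, 1.0, 0)]
table = join_observations(observations, layers, x_min=0.0, y_max=4.0, cell_size=2.0)
```

Each resulting record holds the binary response alongside the sampled predictors, which is the one-row-per-observation layout a logistic regression expects.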

The “Table to Excel” function was used to export the joined table from ArcMap to Excel,
where it was cleaned and then imported into SPSS.

Once in SPSS, Binary Logistic Regression was run on each of the five dependent variables using
each of the five independent variables as possible explanatory variables to predict presence or
absence. The results are as follows:

Results
The results of the study showed that only Algae presence could be predicted based on the
topographic independent variables. The other variables, Sea Pens, Driftweed, Sea Whips, and
Mounds did not have presence which could be explained by topographic features on the seafloor.

Algae:

The output table above shows that 569 cases were evaluated and none were ignored.


This Classification Table indicates that, of all the observations, 486 have no presence of Algae
and 83 have presence of Algae.

The iteration history for step 0 shows the initial -2 Log Likelihood (-2LL) for the
estimation.


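In the initial step (step 0) of the logistic regression, which contains only a constant, we see that
the constant is -1.767. This is the log odds of Algae presence in the sample before any
independent variables are added, and it serves as the baseline for the final predictive equation.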
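
The step 0 constant can be checked against the class counts reported in the classification table. In a logistic model with only a constant, that constant equals the natural log of the odds of presence; the short check below assumes the counts of 83 presences and 486 absences given above.

```python
import math

present, absent = 83, 486  # counts from the classification table

# Intercept-only logistic regression: constant = ln(odds of presence).
constant = math.log(present / absent)
print(round(constant, 3))  # -1.767
```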


The hit ratio results above show that, using the independent variables as predictors, the logistic
regression in SPSS correctly predicts Algae presence and absence 87.9% of the time. It
incorrectly predicted algae presence 15 times when no algae was present, and it incorrectly
predicted algae absence 54 times when algae was present.
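
The hit ratio follows directly from the counts reported in the text. The sketch below rebuilds the classification table from the 486 absences, 83 presences, 15 false presences, and 54 false absences, and also derives the per-class rates, which are not reported in the text and are shown only as a consequence of those counts.

```python
absent, present = 486, 83
false_presence = 15   # predicted present, actually absent
false_absence = 54    # predicted absent, actually present

true_absence = absent - false_presence    # correctly predicted absences
true_presence = present - false_absence   # correctly predicted presences
total = absent + present

hit_ratio = (true_absence + true_presence) / total
absence_rate = true_absence / absent      # share of absences classified correctly
presence_rate = true_presence / present   # share of presences classified correctly

print(round(100 * hit_ratio, 1))  # 87.9
```

Comparing `presence_rate` with `absence_rate` shows how much of the overall hit ratio comes from the far more common absence class.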

The standard for what is considered a good hit ratio is found by taking the percentage of
occurrences of the larger category, which in this example is absence, with 85.4% of the
observations having no presence of Algae, and multiplying this by 0.25. The result is 21.35%.
This 25% margin is then added to the larger category percentage (85.4% + 21.35%). If the sum
is less than the overall percentage of points correctly classified by the model, then the model is
considered to have a good hit ratio. In this case the sum is 106.75%, which is greater than the
model’s hit ratio of 87.9% and, because it exceeds 100%, greater than any possible hit ratio. The
margin of 21.35% above the larger category percentage is the threshold which must be
surpassed in order to reject the null hypothesis and determine that the model is a better
predictor than random chance. Ideally, there would be more presence data in the model. More
presence data would lower the percentage of the dependent variable which is absence data, and
therefore lower the threshold which must be exceeded to reject the null hypothesis. However,
because of the nature of data collection in the field, the data often do not abide by the “desired
rules”. Of the 138 kinds of specimens detected in the study, Algae was one of the variables with
the highest presence, making it one of the most suitable for this study.
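
The arithmetic behind this criterion can be reproduced from the class counts. The sketch below assumes the rule as described in the text (larger-category share plus 25% of that share) and uses the reported hit ratio of 87.9%.

```python
absent, present = 486, 83
reported_hit_ratio = 0.879  # overall percentage correct from the SPSS output

majority_share = absent / (absent + present)   # about 0.854
criterion = majority_share * 1.25              # majority share plus a 25% margin

meets_criterion = reported_hit_ratio > criterion
criterion_is_attainable = criterion <= 1.0
```

With absences making up about 85% of the sample, the criterion lands above 1.0, so `criterion_is_attainable` is false: no classifier could pass it on data this imbalanced.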

With a significance of 0.743, the Hosmer and Lemeshow test indicates that the model is a good
fit. Any significance value above 0.05 indicates a good fit.

The Cox & Snell and Nagelkerke R2 values are 0.268 and 0.474, respectively. These are
estimates of the R2 fit of the model’s predictions to the actual spread of the data. The larger
the R2 value the better, up to a theoretical maximum of 1. The values seen here leave
something to be desired; however, they are not terribly low, and they demonstrate some
correspondence between the data and the model.
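
The two pseudo-R2 values are related: Nagelkerke's value is the Cox & Snell value divided by the largest value Cox & Snell can take for the sample, which depends only on the log likelihood of the constant-only model. The sketch below applies that standard relationship to the counts reported earlier (83 presences in 569 cases) as a consistency check; it does not reproduce the SPSS computation itself.

```python
import math

n, present, absent = 569, 83, 486
cox_snell = 0.268  # value reported in the SPSS output

# Log likelihood of the constant-only model, from the class proportions.
ll_null = present * math.log(present / n) + absent * math.log(absent / n)

# Upper bound of Cox & Snell R2 for this sample, then the rescaled value.
max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
nagelkerke = cox_snell / max_cox_snell
```

The rescaled value comes out at roughly 0.47, in line with the reported Nagelkerke figure once rounding of the Cox & Snell input is taken into account.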

The equation for the final model is as follows:

$$\pi' = 3.020 + 0.136X_1 - 0.014X_2 - 26.339X_3 + 2.868X_4 - 0.038X_5$$
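
To turn this equation into a predicted probability of presence, the linear score is passed through the logistic function. The sketch below assumes that π′ denotes the log odds, as is usual for coefficients reported by a binary logistic regression, and it takes the five inputs by position only, since the text does not state which variable each of X1 through X5 represents. The function name is hypothetical.

```python
import math

INTERCEPT = 3.020
COEFFICIENTS = (0.136, -0.014, -26.339, 2.868, -0.038)  # X1..X5, order as printed

def predicted_probability(x):
    """Probability of presence for one observation (x holds X1..X5)."""
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected five predictor values")
    log_odds = INTERCEPT + sum(b * xi for b, xi in zip(COEFFICIENTS, x))
    # Numerically stable logistic function for large positive or negative scores.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

def classify(x, cutoff=0.5):
    """1 = predicted present, 0 = predicted absent (0.5 cutoff is an assumption)."""
    return 1 if predicted_probability(x) >= cutoff else 0

baseline = predicted_probability((0.0, 0.0, 0.0, 0.0, 0.0))
```

With all inputs at zero the score is just the intercept; real predictions require the five topographic values at a location, in the same units and order used when the model was fitted.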

The significance values of these inputs in the table above should be below 0.05 for an input to be
significant at the 95% confidence level. Depth and Slope meet this requirement, while Standard
Deviation comes close. Aspect and Slope of slope are the least significant, with the highest
significance values at 0.198 and 0.472, respectively. The input with the largest coefficient in
magnitude (i.e., the coefficient which will weigh the heaviest on the predictive equation) is
standard deviation at -26.339. The input with the smallest coefficient is slope of slope. Slope of
slope therefore contributes the least to the model formula and also has the highest significance
value. Future studies may benefit from eliminating the slope of slope variable and finding
another metric of seafloor topography to study.


The limitations of the data come from the nature of the data collection. Ideally, more data would
have been collected over a wider range of topographic features, which would have given
broader context for the presence/absence data by species. However, the fact that the
presence/absence data were taken along transects eliminates data collection bias, an issue which
often skews results in favor of species presence. Additionally, the transects were taken over
multiple habitat types (see figure 6) and at multiple depths, which again helped to eliminate
collection bias.

Conclusion
The results of the study showed that Algae presence can be predicted with relative confidence in
the shallow regions of the Santa Barbara Channel based on information collected about the
topographic features.

One consequence of this result is that researchers could potentially predict species presence by
collecting only bathymetry data. This is a much cheaper and more expedient alternative to
collecting ground verification points with a submersible ROV. A further consequence is that
larger swaths of habitat could be mapped more frequently. Since the most recent data from the
Santa Barbara Channel came to USGS 10 years ago, it would be beneficial to have a system
where habitat and species presence data could be mapped more often. This would allow for the
creation of maps of species movements over time. This information could be highly valuable,
especially considering the enormous ecological and economic value of the Santa Barbara
Channel to Southern California.

The statistical consequence of this result is that it is possible to create a relatively successful
logistic regression model for species presence using topographic data. Furthermore, it could be
assumed that this model could be improved with the addition of more data (specifically species
presence data), a larger study area, and perhaps a principal component analysis data layer
created from the derived topographic features.


Citations:
[1] https://www.shsu.edu/~dl_www/bkonline/131online/f07oceancur/07index.htm
[2] https://www.nps.gov/chis/learn/nature/marine-animals.htm
[3] https://www.nps.gov/chis/learn/nature/seafloor.htm
[4] https://www.sbck.org/current-issues/oil-and-gas/
[5] https://pubs.usgs.gov/ds/781/OffshoreSantaBarbara/data_catalog_OffshoreSantaBarbara.html
