
Image Classification with RandomForests in R (and QGIS)

Nov 28, 2015

The goal of this post is to demonstrate the ability of R to classify multispectral imagery using
the RandomForests algorithm. RandomForests is currently one of the top-performing algorithms
for data classification and regression. Although they can be difficult to interpret,
RandomForests are widely popular because of their ability to classify large amounts of data with
high accuracy.

In the sections below I show how to import a Landsat image into R and how to extract pixel data
to train and fit a RandomForests model. I also explain how to speed up image classification
through parallel processing. Finally, I demonstrate how to implement this R-based
RandomForests classification in QGIS.

Loading the data in R

For the purpose of this post, I'm going to conduct a land-cover classification of a 6-band Landsat
7 image (path 7, row 57) taken in 2000 that has been processed to surface reflectance, as shown
in a previous post in my blog. Several R packages are needed, including rgdal, raster, caret,
randomForest and e1071. After installation, let's load the packages:

library(rgdal)
library(raster)
library(caret)
Now let's import the Landsat image into R as a RasterBrick object using the brick function
from the raster package. Let's also replace the original band names (e.g., X485.0.Nanometers)
with shorter ones (B1 to B5, and B7):

img <- brick("C:/data/landsat/images/2000/LE70070572000076EDC00/L7007057_20000316_refl")
names(img) <- c(paste0("B", 1:5), "B7")

We can make an RGB visualization of the Landsat image in R using the plotRGB command, for
example a false-color composite RGB 4:5:3 (near infrared - shortwave infrared - red). I'm
using the expression img * (img >= 0) to convert the negative values to zero:

plotRGB(img * (img >= 0), r = 4, g = 5, b = 3, scale = 10000)


I created a set of training areas in a polygon shapefile (training_15.shp) which stores the id for
each land cover type in a column of the attribute table called class, as shown below.

Let's use the shapefile function from the raster package to import this file into R as an object
of class SpatialPolygonsDataFrame, and let's create a variable to store the name of the class
column:

trainData <- shapefile("C:/data/landsat/shps/UTM18N_32618/training_15.shp")
responseCol <- "class"

Extracting training pixel values

Now let's extract the pixel values in the training areas for every band in the Landsat image and
store them in a data frame (called here dfAll) along with the corresponding land cover class id:

dfAll = data.frame(matrix(vector(), nrow = 0, ncol = length(names(img)) + 1))

for (i in 1:length(unique(trainData[[responseCol]]))){
  category <- unique(trainData[[responseCol]])[i]
  categorymap <- trainData[trainData[[responseCol]] == category,]
  dataSet <- extract(img, categorymap)
  dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
  df <- do.call("rbind", dataSet)
  dfAll <- rbind(dfAll, df)
}

The data frame resulting from working with my data has about 80K rows. It is necessary to work
with a smaller dataset, as training and fitting a RandomForests model with a dataset this size may
take a long time. For a start, let's subset the data by drawing 1,000 random samples:

nsamples <- 1000
sdfAll <- subset(dfAll[sample(1:nrow(dfAll), nsamples), ])

Model fitting and image classification

Next we must define and fit the RandomForests model using the train function from the caret
package. First, let's specify the model as a formula with the dependent variable (i.e., the land
cover class ids) encoded as factors. For this exercise I'll only use three bands as explanatory
variables (the red, near-infrared and shortwave-infrared bands). We then define the method as rf,
which stands for the random forest algorithm. (Note: try names(getModelInfo()) to see a
complete list of all the classification and regression methods available in the caret package.)

modFit_rf <- train(as.factor(class) ~ B3 + B4 + B5, method = "rf", data = sdfAll)
At this point we could simply use the predict command to make a raster with predictions from
the fitted model object (i.e., modFit_rf). However, it is possible to speed up computations using
the clusterR function from the raster package, which supports multi-core computing for
functions such as predict (note: the snow package has to be installed). We just need to add
one line to create a cluster object and another one to delete it after the operation is finished:

beginCluster()
preds_rf <- clusterR(img, raster::predict, args = list(model = modFit_rf))
endCluster()

The implementation of parallel computation on my 8-core laptop gave an improvement of about
70% in computation time (~14.2 min for non-parallel processing vs. ~4.1 min for the multi-core
procedure). You can see a screenshot of the classified image below.
Additional arguments for parameter tuning, such as the number of trees to grow (default 500),
the minimum size of terminal nodes, or the maximum number of terminal nodes per tree, among
others, can also be modified in or added to the model. Please refer to the documentation of the
randomForest and caret packages for more details.
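As a rough sketch (the argument values below are arbitrary examples, not recommendations), such options can be passed through caret's train function, which forwards extra arguments to randomForest, while the mtry parameter is tuned (or fixed) via tuneGrid:

modFit_rf2 <- train(as.factor(class) ~ B3 + B4 + B5,
                    method = "rf",
                    data = sdfAll,
                    ntree = 1000,      # number of trees to grow (randomForest default is 500)
                    nodesize = 5,      # minimum size of terminal nodes
                    maxnodes = 50,     # maximum number of terminal nodes per tree
                    tuneGrid = data.frame(mtry = 2))  # fix mtry instead of tuning it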

The following video shows these R commands in action in RStudio:

How to perform a RandomForests classification in QGIS using R packages

To run the QGIS version of the R script described above, you can download the script
available at the following link and save it in the R Scripts folder (or copy and paste its content
into the QGIS script editor), as explained in my previous post:

* R Script for RandomForests classification in QGIS

Watch the following video to see how to perform a RandomForests classification for a Landsat
image in QGIS using R packages:
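For reference, the sketch below shows roughly what such a Processing R script can look like. This is not the downloadable script itself: the ## parameter annotations follow the syntax of the QGIS Processing R provider, and the parameter names (Landsat, Training_areas, Class_field, Classified) are illustrative placeholders.

##Landsat=raster
##Training_areas=vector
##Class_field=field Training_areas
##Classified=output raster

# QGIS supplies Landsat, Training_areas and Class_field as R objects;
# assigning the prediction to Classified returns the result to QGIS.
library(raster)
library(caret)

dfAll <- data.frame(matrix(vector(), nrow = 0, ncol = nlayers(Landsat) + 1))
for (category in unique(Training_areas[[Class_field]])){
  categorymap <- Training_areas[Training_areas[[Class_field]] == category, ]
  vals <- extract(Landsat, categorymap)
  vals <- lapply(vals, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
  dfAll <- rbind(dfAll, do.call("rbind", vals))
}

modFit_rf <- train(as.factor(class) ~ ., method = "rf", data = dfAll)
Classified <- predict(Landsat, modFit_rf)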

Additional resources

For digging deeper into predictive model creation, I recommend you visit the caret
package website, which provides extensive documentation about data preprocessing, data
splitting, variable importance evaluation, and model fitting and tuning. Also take a look at
RStoolbox, a new R package that provides a set of tools for remote sensing processing.
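For example, here is a minimal sketch (reusing the sdfAll data frame and the modFit_rf model created above) of how caret handles data splitting and variable importance:

library(caret)

# hold out 30% of the samples for validation
inTrain  <- createDataPartition(sdfAll$class, p = 0.7, list = FALSE)
training <- sdfAll[inTrain, ]
testing  <- sdfAll[-inTrain, ]

# relative importance of each band in the fitted RandomForests model
varImp(modFit_rf)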

The R+QGIS approach shown in this post expands the image classification methods available in
QGIS. There are other image processing techniques included in QGIS such as those found in the
Semi-Automatic Classification Plugin, the GRASS GIS plugin and the Orfeo Toolbox. I suggest
you also explore these other options.

In a future post I'll write about recommended practices for accuracy assessment of classified
images through the comparison of reference data with the corresponding classification results.
Stay tuned!

You may also be interested in:

* Integrating QGIS and R: A stratified sampling example

* Prepare files for production of reflectance imagery in CLASlite using R


Comments

RIRABE 16 days ago

Hi Ali,
I solved the problem of overall accuracy. I wrote the following:

> tm <- tm_shape(preds_rf) + tm_raster(alpha = 0, n = 9, style = "pretty",
+   interval.closure = "left", labels = c("Culture", "Batis", "Savane_Arbustive",
+   "Savane_Herbeuse", "Sol_Argilo_Sableux", "Sol_Sablo_Argileux", "Zone_Humide",
+   "Eau", "Sable"), auto.palette.mapping = TRUE, max.categories = 9,
+   saturation = 1, interpolate = FALSE, title = "Land Cover of Iro region")
> tm

I get this error message:

Error: cannot allocate vector of size 676.1 Mb

How can I resolve this problem?
Sincerely,


Ali Mod RIRABE 16 days ago

Hello. You may use memory.limit to increase limits on memory allocation. Also please
read these threads on Stackoverflow and other forums: http://stackoverflow.com/quest...,
https://stat.ethz.ch/pipermail..., https://www.r-bloggers.com/mem...
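A minimal sketch of that suggestion (Windows-only, and the size value is just an arbitrary example):

memory.limit()              # report the current memory limit (in MB)
memory.limit(size = 16000)  # raise the limit, e.g. to ~16 GB (arbitrary example)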


Manikandan Sathiyanarayanan 24 days ago

Hi Ali,
I would like to do an SVM classification on a Landsat ETM+ image. Is it possible to do SVM
classification using the R language?




Ali Mod Manikandan Sathiyanarayanan 24 days ago

Hi Manikandan. Yes, it's possible. You can use the svmRadial method from the kernlab
package through caret, for instance:

mod.svm <- train(as.factor(class) ~ B3 + B4 + B5, method = "svmRadial", data = training_bc)

as I show in this other post: http://amsantac.co/blog/en/201...


There are other options for SVM as you may find in the caret manual:
http://topepo.github.io/caret/...


Manikandan Sathiyanarayanan a month ago

Hi Ali, thanks for providing the code for Random Forest; it has helped me a lot in learning R, as
I am at the learning stage of using R for classification of TM/ETM+ imagery. I'm getting an error
when I run the code:

sdfAll <- subset(dfAll[sample(1:nrow(dfAll), nsamples), ])
Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'

Could you please help me solve this error?




Ali Mod Manikandan Sathiyanarayanan a month ago

Hi Manikandan. To help you, please upload your script and a sample of your data to an
online repository and send me the download link through my contact page:
http://amsantac.co/contact.htm...


Manikandan Sathiyanarayanan Ali a month ago

The code which I use for the classification is below:

library(raster)
library(rgdal)
library(caret)
img <- brick("G:/adama/classifiction/test/layerstack1-71.img")
names(img) <- c(paste0("B", 1:6, coll = ""), "B8")
trainData <- shapefile("G:/adama/classifiction/test/training_set.shp")
responseCol <- "Classname"
dfAll = data.frame(matrix(vector(), nrow = 0, ncol = length(names(img)) + 1))
for (i in 1:length(unique(trainData[[responseCol]]))){
  category <- unique(trainData[[responseCol]])[i]
  categorymap <- trainData[trainData[[responseCol]] == category,]
  dataSet <- extract(img, categorymap)
  dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
  df <- do.call("rbind", dataSet)
  dfAll <- rbind(dfAll, df)
}

sdfAll <- subset(dfAll[sample(1:nrow(dfAll), nsamples), ])

The data are TM 7-band data, and the training data are polygons.

o

Ali Mod Manikandan Sathiyanarayanan a month ago


Did you define nsamples? I don't see it in your code


Lucy Ryan 2 months ago

Hi Ali,

This is such a useful blog, I really appreciate you sharing it.

I have a problem when I run the model:

"Something is wrong; all the Accuracy metric values are missing:


Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :2 NA's :2
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In eval(expr, envir, enclos) :
model fit failed for Resample01: mtry=2 Error in randomForest.default(x, y, mtry = param$mtry,
...) :
Can't have empty classes in y."
...and so on...

From the warnings I thought perhaps I have some NA values, but:

> modFit_rf <- train(as.factor(class) ~ R + G + B, data = sdfAll, method = "rf")


> any(is.na(sdfAll))
[1] FALSE

There are many samples with all zero values though - could this be the problem? If so, can I just
remove the zero values in a similar way to the NAs? Or perhaps I should remove them all prior
to doing anything else...


Ali Mod Lucy Ryan a month ago

Thanks Lucy for your comment. I'm glad you find my posts useful.
I'd suggest you examine the class column in your data. The warning says "Can't have
empty classes in y", so there may be some rows lacking a class label.

Are those zero values present in more than one class? If so, they may not be providing
any useful information to the classifier and could be removed. If the zero values are
present in just one class, for example water, then it might be better to keep them. Also
please examine whether zero is the value assigned to NoData pixels. Sometimes NoData
pixels are coded as -9999, for instance; it may be the case that NoData pixels in your
image are coded as zero.

Hope you find a solution for this issue soon.


Lucy Ryan Ali a month ago

Thanks Ali.

I have removed all the zeros from the whole dataset - they took up more than one class as
far as I could see.

The train function now seems to be running...but it is taking a VERY long time. It has
already been running for 24 hrs... My dataset is huge, but this is just the train part so
perhaps something has gone wrong...!

Ali Mod Lucy Ryan a month ago

Lucy, if your dataset is huge, then, yes, it may take a very long time. I'd suggest you first try
with a smaller dataset (e.g., about 10,000 observations) and time how long training the model
takes, so you have an idea of how long it would take for the whole dataset.

I'd also suggest you use the randomForest package directly, instead of the caret package.
That is:

library(randomForest)
model1 <- randomForest(as.factor(class) ~ R + G + B, data = sdfAll)

Usually this is faster than the train function from caret. Finally, it may depend on the type of
data and the number of classes, but a model trained with millions of observations may offer
only slightly higher accuracy than a model trained with a smaller dataset (i.e., thousands
of records).


Christine Swanson 2 months ago

Thank you for this comprehensive tutorial. I am trying to run the random forest classifier on a
Landsat 8 image. I have gotten to the part where you extract the training pixel values, and I am
getting the following error:

Error in .xyValues(x, coordinates(y), ..., df = df) :
  xy should have 2 columns only.
  Found these dimensions: 34, 3

My code is below:

##Load libraries
library(rgdal)
library(raster)
library(caret)
library(randomForest)
library(e1071)

##Create a files list for each image
rasters_2013217 <- list.files(path = "./Data",
                              pattern = "LC82220672013217LGN00_B.*.TIF",
                              full.names = TRUE)
rasters_2013233 <- list.files(path = "./Data",
                              pattern = "LC82220672013233LGN00_B.*.TIF",
                              full.names = TRUE)

##Load and stack raster images
l8_2013217 <- stack(rasters_2013217)
l8_2013233 <- stack(rasters_2013233)

##Load the training data
trainData <- shapefile("./Data/TO_training_points.lyr")
trainData_utm22 <- spTransform(trainData, crs(l8_2013217))
responseCol <- "Name"

##Extract training pixel values
dfAll = data.frame(matrix(vector(), nrow = 0, ncol = length(names(l8_2013217)) + 1))
for (i in 1:length(unique(trainData_utm22[[responseCol]]))){
  category <- unique(trainData_utm22[[responseCol]])[i]
  categorymap <- trainData_utm22[trainData_utm22[[responseCol]] == category,]
  dataSet <- extract(l8_2013217, categorymap)

  if(is(trainData, "SpatialPointsDataFrame")){
    dataSet <- cbind(dataSet, class = as.numeric(category))
    dfAll <- rbind(dfAll, dataSet)
  }
  if(is(trainData, "SpatialPolygonsDataFrame")){
    dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
    df <- do.call("rbind", dataSet)
    dfAll <- rbind(dfAll, df)
  }
}

------

If you have any suggestions on how I can fix this, I would be really grateful. Thank you!


Ali Mod Christine Swanson 2 months ago

Hi Christine. Thanks for commenting on my blog.

Does your training data perhaps come from a file with elevation data (Z coordinates)
included? If so, the Z data have to be dropped when importing the file into R. You can
use the pointDropZ parameter in the readOGR function for that purpose:

library(rgdal)
trainData <- readOGR("path_to_your_file", layer = "name_of_your_file", pointDropZ = TRUE)

Also please see my answer to a comment below (from Johannes May) about the modified
code version for working with points instead of polygons, if that applies to your data.

Let me know if these suggestions don't solve the problem.


RIRABE 2 months ago

Hello,
I successfully applied your tutorial to classify an image with randomForest. The result obtained
is as follows:

I have some questions:
1) How can I obtain the grids and a legend with the name of each land-cover type, like this:
2) How can I obtain the accuracy (or kappa coefficient)?
3) What is the meaning of this result? (system.time(preds_rf <- clusterR(L8_Iro_2016, raster::predict,
args = list(model = modFit_rf))))
   user  system  elapsed
  23.75    3.91  1203.91




Ali Mod RIRABE 2 months ago

Hello,

Thanks for stopping by my blog. Regarding your questions:


1) There are several options to create the grids and legend. You can use either the spplot,
rasterVis or tmap packages. For spplot examples see this link: https://edzer.github.io/sp/.
For tmap see: https://cloud.r-project.org/we.... For example, the starting code for tmap
would be:
library(tmap)
tm_shape(preds_rf) + tm_raster() + tm_grid()  # customize it by changing the default parameters
2) You can get the accuracy and kappa metrics (based on the training dataset) just by
printing your model (e.g., modFit_rf). For calculating accuracy metrics by cross-tabulating
observed and predicted classes you can use the 'confusionMatrix' command from the caret
package.
3) If you look at the elapsed time, it tells you that the previous command line took 1203
seconds (about 20 minutes) to be processed.


Johannes May 4 months ago

Using a Sentinel-2 image and 3 LC classes, I get "Error in rep(category, nrow(x)) : invalid 'times'
argument" for the rep() function. Are you able to suggest how to fix the 'times' argument? This
seems to exceed my personal R expertise. NoData is not an issue in my data. Apparently such an
error happens if 'times' becomes negative, though I'm not sure how that could happen. I hope you
have a suggestion?



Ali Mod Johannes May 4 months ago

Thanks for contacting me and for sharing your files.

The issue is that you are using a point shapefile instead of a polygon shapefile, which
the algorithm was designed for.

The following code solves the issue; it generates the data.frame with the extracted training
data using a point shapefile:

dfAll = data.frame(matrix(vector(), nrow = 0, ncol = length(names(img)) + 1))

for (i in 1:length(unique(trainData[[responseCol]]))){
  category <- unique(trainData[[responseCol]])[i]
  categorymap <- trainData[trainData[[responseCol]] == category,]
  dataSet <- extract(img, categorymap)

  if(is(trainData, "SpatialPointsDataFrame")){
    dataSet <- cbind(dataSet, class = as.numeric(category))
    dfAll <- rbind(dfAll, dataSet)
  }
  if(is(trainData, "SpatialPolygonsDataFrame")){
    dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
    df <- do.call("rbind", dataSet)
    dfAll <- rbind(dfAll, df)
  }
}

I'll be updating the code in the post with this improvement soon.


Ali Mod Johannes May 4 months ago

Hi Johannes. First please make sure that in this command line:

dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})

you are using the lapply command (not sapply). Second, as you only have 3 classes, I
suggest you iterate the command lines inside the loop by yourself. I mean, set i = 1, and then
run the six lines inside the loop. Then set i = 2, and repeat, and then do the same for i = 3. That
way it may be easier for you to identify at what step the error shows up. If you don't identify the
error, I'd suggest you send me a message through my contact page
(http://amsantac.co/contact.htm...) with links for downloading a sample of your data so I can
check them out. I'd be glad to help you solve this issue.




Shariful Islam 5 months ago

Interesting! Can you post or point to more satellite image classification examples with different
machine learning algorithms like boosting, neural networks, convolutional neural networks,
ensemble learning, SVM, ridge regression, backpropagation, etc.?




Ali Mod Shariful Islam 5 months ago

Yes, that's my plan! As soon as I have some free time (hopefully in a couple of weeks) I'll
be posting more on this topic. Stay tuned!

Shariful Islam Ali 4 months ago

Thanks. Could you elaborate on how I can save the classified image, the output of your
random forest algorithm, as a georeferenced image so that I can compare it with the raw
unclassified image in QGIS? Thanks.

Ali Mod Shariful Islam 4 months ago

Sure. Following my example, one can export the classified image with the writeRaster
command:

writeRaster(preds_rf, "exported.tif")

to export as a .TIF file, for instance.


Shariful Islam Ali 4 months ago

Thank you. Could you say why I am having trouble training with

modFit_rf <- train(as.factor(class) ~ B3 + B4 + B5, method = "J48", data = sdfAll)

where I am using the C4.5-like Trees model as described in http://topepo.github.io/caret/...

I am getting an error like:

Loading required package: RWeka
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: inDL(x, as.logical(local), as.logical(now), ...)
  error: unable to load shared object 'C:/Users/Winrock/Documents/R/win-library/3.3/rJava/libs/x64/rJava.dll':
  LoadLibrary failure: %1 is not a valid Win32 application.

Error: package or namespace load failed for RWeka

Even though I installed the 'RWeka' package separately, it was all in vain.

Thanks in advance.


Ali Mod Shariful Islam 4 months ago


For this error, please make sure R and Java have matching architectures. For example if
you have 32-bit R, you need to have 32-bit Java installed. You can find some clues for
solving this issue in this link: http://stackoverflow.com/quest...
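A few quick checks one might run (plain base-R calls, not from the original reply): a 64-bit R session reports "x86_64" and a pointer size of 8, while a 32-bit session reports "i386" and 4, and JAVA_HOME should point to a Java installation of the matching architecture:

R.version$arch           # "x86_64" for 64-bit R, "i386" for 32-bit R
.Machine$sizeof.pointer  # 8 on 64-bit R, 4 on 32-bit R
Sys.getenv("JAVA_HOME")  # should point to a matching 64-bit or 32-bit Java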


ahmed 5 months ago

Thanks for the great example. I am getting warnings whose cause I don't know: I got 35 warning
messages saying "model fit failed for Resample... Can't have empty classes in y", although I am
sure that all the polygons have a class value (as a string and also as a number), the same as in
your example.




Ali Mod ahmed 5 months ago

Hi Ahmed. A possible reason is that there are pixels with No Data values (NAs) in one or
more of the bands of your raster files. If you are using my script, you may test presence
of NAs with: any(is.na(dfAll)). To remove data.frame rows with NAs, use: na.omit(dfAll)


nandhini 5 months ago

Interesting post. I have hyperspectral images containing 102 bands. Will the above methods
work? Please advise. Thank you.




Ali Mod nandhini 5 months ago

Hi Nandhini. Thanks for your comment. Yes, these methods should work if you are
interested in conducting a supervised classification. Be aware, however, that the number
of bands is quite large, so running RandomForests in R with your hyperspectral image can be
computationally intensive. Best regards.


bleesand 7 months ago

I am following this tutorial; however, I am having difficulty loading a Landsat 8 image with 12
bands. The directory is changed to point to the image path, but there seems to be a problem
using the brick function. Each band [1:12] is a .tif file. I appreciate your assistance.




Ali Mod bleesand 7 months ago

Hi. Please verify that all the bands that you want to stack (or brick) have the same extent.
For example, the panchromatic band (B8) may have a different extent. I'd also suggest
you use the stack command (before using brick). Hope this helps to solve the problem
you found.
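A minimal sketch of that suggestion (the folder path and file pattern below are hypothetical placeholders): list the band files, excluding the 15-m panchromatic band, and stack them; stack() itself will complain (for example, about a different extent) if the layers do not line up:

library(raster)

files <- list.files("path_to_your_bands", pattern = "B[1-7]\\.TIF$", full.names = TRUE)
img <- stack(files)   # fails here if the bands have different extents or resolutions
img <- brick(img)     # convert to a RasterBrick for faster processing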


aamer 9 months ago

Nice post. I have fMRI images. I need to extract the features and prepare a dataset out of them;
after that I have to use random forests for classification.

Can you please help me with how to do that? At the same time, I also need to reduce the number
of features, as there is a huge number of them.




Ali Mod aamer 9 months ago

Thanks for your comment. That's an interesting question. What is the format of your
fMRI images? If they can be read into R as a multilayer image format, then the remaining
processing would be quite similar to what I show in this post. I'd be glad to talk to you
about that off this thread, so please send me a message through my contact page:
http://amsantac.co/contact.htm...

Regarding feature selection, please see these links: http://www.analyticsvidhya.com...,
https://www.mql5.com/en/articl...


Tim Salabim 10 months ago

For an interactive version of plotRGB have a look at mapview::viewRGB




Ali Mod Tim Salabim 9 months ago

I just tried the viewRGB function. It's awesome! The mapview package has very
interesting and useful features. Thanks a lot for sharing, Tim!


Tim Salabim Ali 9 months ago

Glad you like it! :-)


Diego J. Lizcano 10 months ago

Great post! Thanks for the detailed explanation and tutorial. Definitely a very useful tool I have
to use.



Ali Mod Diego J. Lizcano 9 months ago

Thank you very much Diego for your comment. Hope this post serves for your research.
I'll be posting new content in my blog soon. Saludos!


Maria Rafaela Braga Salum de A 10 months ago

Nice post, very useful. Would you like to try it with LiDAR data? If you do, I can share a small
piece.

Best regards,

Rafaela




kemilla Maria Rafaela Braga Salum de A 8 months ago

Hi Maria, if you want to classify DEMs, DSMs or derivatives from LiDAR, check out
this paper: http://www.mdpi.com/2072-4292/... (shameless self-promotion). There is also
a script for running random forest in R and a small LiDAR dataset to test it out.


Maria Rafaela Braga Salum de A kemilla 7 months ago

Sorry, yes, sure. I'll have a look.

Cheers,

Rafaela


Maria Rafaela Braga Salum de A kemilla 8 months ago

Thanks for the paper. The script that amsantac made is working well; I have not tried it with
LiDAR yet. Actually, I have not seen LiDAR used with Random Forest. I tried to do some things,
but I never succeeded. Can you show us?

Cheers,

Rafaela
My email: rafasalum@hotmail.com




kemilla Maria Rafaela Braga Salum de A 8 months ago

Hi Maria,

in the supplemental information of that paper (scroll down almost to the bottom) you can
download a small piece of lidar data and some training data and try with the script
provided.


Ali Mod Maria Rafaela Braga Salum de A 10 months ago

Hi Maria Rafaela. Glad to hear you liked this post. Thanks!

For LiDAR data classification, I would recommend using specialized algorithms such as those
provided by LAStools (http://rapidlasso.com/lastools...) or the MARS software
(http://www.merrick.com/Geospat...).

In particular, there is a LAStools toolbox available for QGIS (http://rapidlasso.com/2013/09/...).
Take a look at the lasclassify algorithm (http://rapidlasso.com/lastools...) for classifying buildings
and high vegetation.

Have a nice day!




bic ton a year ago

Thanks for this post. Can you please post the datasets you used? I would like to test the code on
your datasets and explore further.




Ali Mod bic ton a year ago

Thanks for your comment. I considered posting the datasets but discarded the idea due to
the large file size of the Landsat images. You may find recent images for your area of
interest using EarthExplorer, which has a user-friendly interface for browsing and
downloading images.


RIRABE 2 months ago

Hello,
I successfully applied your tutorial to classify an image with randomForest. The result obtained
is as follows:

I have some questions:
1) How can I obtain the grids and a legend with the name of each land-cover type?
2) How can I obtain the accuracy (or kappa coefficient)?
3) What is the meaning of this result? (system.time(preds_rf <- clusterR(L8_Iro_2016, raster::predict,
args = list(model = modFit_rf))))
   user  system  elapsed
  23.75    3.91  1203.91
4) How can I compute the weights for a given image needed to run the neuralnet algorithm?




Ali Mod RIRABE 2 months ago

Hello,

Thanks for stopping by my blog. Regarding your questions:

1) There are several options to create the grids and legend. You can use either the spplot,
rasterVis or tmap packages. For spplot examples see this link: https://edzer.github.io/sp/.
For tmap see: https://cloud.r-project.org/we.... For example, the starting code for tmap
would be:
library(tmap)
tm_shape(preds_rf) + tm_raster() + tm_grid()  # customize it by changing the default parameters
2) You can get the accuracy and kappa metrics (based on the training dataset) just by
printing your model (e.g., modFit_rf). For calculating accuracy metrics by cross-tabulating
observed and predicted classes you can use the 'confusionMatrix' command from the caret
package.
3) If you look at the elapsed time, it tells you that the previous command line took 1203
seconds (about 20 minutes) to be processed.
4) In neuralnet, all weights are initialized by default with random values drawn from a
standard normal distribution. If you set startweights = NULL, the weights will be
randomly initialized. If you have computed your own weights, you can enter them as
a vector for the startweights parameter.


RIRABE Ali 2 months ago

Hi,
Thank you for your explanation.
I would like to have more information about 'data' and 'reference' in confusionMatrix. When I
type:

L8_Iro_2016 <- stack("L8_Iro_2016.tif")
names(L8_Iro_2016) <- c(paste0("B", 1:3))
writeRaster(L8_Iro_2016, filename = "merge_Iro_2016.tif", overwrite = TRUE)
plotRGB(L8_Iro_2016 * (L8_Iro_2016 >= 0), r = 1, g = 2, b = 3, stretch = "hist", scale = 10000)
trainData <- shapefile("F://Sminaire_R/Classification_randomForest/training9.shp")
reponseCol1 <- "CLASS_ID"
trainData <- shapefile("F://Sminaire_R/Classification_randomForest/trainTrue9.shp")
reponseCol2 <- "CLASS_ID"

## Extracting training pixel values for Iro
dfAll = data.frame(matrix(vector(), nrow = 0, ncol = length(names(L8_Iro_2016)) + 1))
for (i in 1:length(unique(trainData[[reponseCol1]]))){
  category <- unique(trainData[[reponseCol1]])[i]
  categorymap <- trainData[trainData[[reponseCol1]] == category,]
  dataSet <- extract(L8_Iro_2016, categorymap)
  dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
  df <- do.call("rbind", dataSet)
  dfAll <- rbind(dfAll, df)
}

## Model fitting and image classification for Iro using dfAll and "rf"
modFit_rf <- train(as.factor(class) ~ B1 + B2 + B3, method = "rf", data = dfAll)
beginCluster()
# preds_rf <- clusterR(L8_Iro_2016, raster::predict, args = list(model = modFit_rf))
system.time(preds_rf <- clusterR(L8_Iro_2016, raster::predict, args = list(model = modFit_rf)))
endCluster()
plot(preds_rf)
print(modFit_rf)
data <- c("reponseCol1", "reponseCol2")

confus <- confusionMatrix(data, reference = reponseCol2, positive = NULL, prevalence = NULL)

I obtain this error message:

> confus <- confusionMatrix(data, reference = reponseCol2, positive = NULL, prevalence = NULL)
Error in confusionMatrix.default(data, reference = reponseCol2, positive = NULL, :
  the data cannot have more levels than the reference

I don't understand this message.


Ali Mod RIRABE 2 months ago

Hi. First, you have to extract the band pixel values for the trainTrue9.shp polygons:

dfAll2 = data.frame(matrix(vector(), nrow = 0, ncol = length(names(L8_Iro_2016)) + 1))

for (i in 1:length(unique(trainData[[reponseCol2]]))){
  category <- unique(trainData[[reponseCol2]])[i]
  categorymap <- trainData[trainData[[reponseCol2]] == category,]
  dataSet <- extract(L8_Iro_2016, categorymap)
  dataSet <- lapply(dataSet, function(x){cbind(x, class = as.numeric(rep(category, nrow(x))))})
  df <- do.call("rbind", dataSet)
  dfAll2 <- rbind(dfAll2, df)
}

Then apply the fitted model and evaluate accuracy:

predicted <- predict(modFit_rf, dfAll2)
confusionMatrix(predicted, dfAll2$CLASS_ID)

(I've not tested the code above so it may have bugs) Please note that you should use an
independent dataset for validation. I recommend you read these other posts which can
provide help on validation: http://amsantac.co/blog/en/201...,
http://amsantac.co/blog/en/201....

Please also note that you are overwriting the trainData object in your code.
