Thursday, May 04, 2006 

Visualizing classifier performance in R, with only 3 commands

I have recently discovered ROCR, an R package that is really useful for information retrieval (IR) students. ROCR was designed for evaluating and visualizing classifier performance, supporting combinations of many of the typical IR metrics (e.g. precision, recall, F-measure, accuracy or error). It adds only three new commands to R and integrates tightly with R's built-in graphics facilities.

Here's a short how-to. Let's assume you have the following experimental data from a binary classification problem in a file called "data.txt". The column entitled ClassifyerOutput shows the scores output by the classifier, while the column GroundTruth represents the real class of each sample.

ClassifyerOutput ^ GroundTruth
0.35 ^ 0
1.0 ^ 1
1.0 ^ 0
0.1 ^ 1
0.58 ^ 0
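
To make the two columns concrete, here is a minimal base-R sketch (no ROCR needed) that recreates the sample file and computes precision and recall by hand at a single cutoff. The cutoff of 0.5 and the variable names are my own assumptions for illustration; they are not part of the package.

```r
# Recreate the sample file shown above (same '^' separator and header).
writeLines(c(
  "ClassifyerOutput ^ GroundTruth",
  "0.35 ^ 0", "1.0 ^ 1", "1.0 ^ 0", "0.1 ^ 1", "0.58 ^ 0"
), "data.txt")

# strip.white drops the spaces around the '^' separator.
data <- read.table("data.txt", sep = "^", header = TRUE, strip.white = TRUE)

threshold <- 0.5                                  # assumed cutoff, for illustration only
predicted <- data$ClassifyerOutput >= threshold   # samples the classifier flags as positive
actual    <- data$GroundTruth == 1                # samples that really are positive

tp <- sum(predicted & actual)     # true positives
fp <- sum(predicted & !actual)    # false positives
fn <- sum(!predicted & actual)    # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
print(c(precision = precision, recall = recall))  # precision 1/3, recall 1/2
```

A precision/recall curve is simply these two numbers recomputed for every possible cutoff.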

The following R script produces a nice precision/recall curve.

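library(ROCR)
# Read the '^'-separated file; strip.white drops the spaces around the separator
data <- read.table('data.txt', sep='^', header=TRUE, strip.white=TRUE)
# Pair the classifier scores with the true labels
pred <- prediction(data$ClassifyerOutput, data$GroundTruth)
# Precision on the y-axis, recall on the x-axis
perf <- performance(pred, "prec", "rec")
# Draw the raw curve as a dotted grey line
plot(perf, col="grey82", lty=3)
# Overlay the vertically averaged curve, with box plots showing the spread
plot(perf, avg="vertical", spread.estimate="boxplot", add=TRUE)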
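
The same three commands cover other measures too. As a rough sketch, assuming the "tpr", "fpr" and "auc" measure names from the ROCR documentation, this draws a ROC curve and reads off the area under it. The scores and labels are typed in directly so the snippet runs without data.txt.

```r
library(ROCR)

# Scores and labels copied from the sample data above
scores <- c(0.35, 1.0, 1.0, 0.1, 0.58)
labels <- c(0, 1, 0, 1, 0)

pred <- prediction(scores, labels)

# ROC curve: true positive rate against false positive rate
roc <- performance(pred, "tpr", "fpr")
plot(roc)
abline(0, 1, lty = 2)  # dashed diagonal = random guessing

# Area under the ROC curve; the value sits in a one-element list
auc <- performance(pred, "auc")
auc_value <- auc@y.values[[1]]
print(auc_value)
```

With only five samples the curve is very coarse, so treat the result as a smoke test rather than a meaningful evaluation.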

It could not be simpler. The complete documentation is only 14 pages long and, assuming that you are familiar with R, in no time you'll be producing nice-looking charts from your data. I had some problems installing the package on Linux (you will also need to install gplots from the R package bundle gregmisc), but everything worked fine on my Windows machine.
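
For reference, a minimal install sketch, assuming the package is available from your configured CRAN mirror under the name ROCR. Whether gplots gets pulled in automatically depends on your R setup, so the explicit check is just a precaution.

```r
# Install ROCR and gplots only if they are not already present.
for (pkg in c("gplots", "ROCR")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

library(ROCR)
```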



If you're using the package, don't forget to cite the original authors:

Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. (2004).
"ROCR: An R package for visualizing the performance of scoring classifiers".
http://rocr.bioinf.mpi-sb.mpg.de
