Visualizing classifier performance in R, with only 3 commands
I have recently discovered ROCR, an R package that is really useful for IR students. ROCR was designed for evaluating and visualizing classifier performance, and it supports combinations of many of the typical IR metrics (e.g. precision, recall, F-measure, accuracy or error). It adds only three new commands to R and integrates tightly with R's built-in graphics facilities.
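To make these metrics concrete, here is a minimal base-R sketch (no ROCR needed) that computes them at a single cutoff from the confusion-matrix counts. The helper name `metrics_at_cutoff`, the 0.5 cutoff and the rule that a score at or above the cutoff counts as a positive prediction are illustrative assumptions. ROCR's `performance()` computes measures like these (named e.g. "prec", "rec", "f", "acc", "err") across all cutoffs at once.

```r
# Hypothetical helper (not part of ROCR): a sample is predicted positive
# when its score is >= cutoff; metrics come from the confusion-matrix counts.
metrics_at_cutoff <- function(scores, labels, cutoff) {
  predicted <- scores >= cutoff
  actual <- labels == 1
  tp <- sum(predicted & actual)    # true positives
  fp <- sum(predicted & !actual)   # false positives
  fn <- sum(!predicted & actual)   # false negatives
  tn <- sum(!predicted & !actual)  # true negatives
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(precision = precision,
    recall = recall,
    f = 2 * precision * recall / (precision + recall),  # harmonic mean (F1)
    accuracy = (tp + tn) / length(scores),
    error = (fp + fn) / length(scores))
}

# Same five scores and labels as the sample data in the how-to
scores <- c(0.35, 1.0, 1.0, 0.1, 0.58)
labels <- c(0, 1, 0, 1, 0)
m <- metrics_at_cutoff(scores, labels, cutoff = 0.5)
print(m)
```

With these five samples the sketch gives precision 1/3, recall 0.5, F 0.4, accuracy 0.4 and error 0.6. Note that precision becomes 0/0 (NaN) if no sample reaches the cutoff.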
Here's a short how-to. Let's assume you have the following experimental data from a binary classification problem in a file called "data.txt". The ClassifyerOutput column holds the scores output by the classifier, while the GroundTruth column holds the true class (1 or 0) of each sample.
```
ClassifyerOutput ^ GroundTruth
0.35 ^ 0
1.0 ^ 1
1.0 ^ 0
0.1 ^ 1
0.58 ^ 0
```
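If you want to reproduce the example, this sketch writes the sample data to data.txt in the current working directory (overwriting any existing file of that name) and reads it back with the same `read.table()` arguments as the script, plus `strip.white = TRUE` as an extra precaution against the spaces around the `^` separators.

```r
# Write the sample data with '^' as the field separator
lines <- c("ClassifyerOutput ^ GroundTruth",
           "0.35 ^ 0",
           "1.0 ^ 1",
           "1.0 ^ 0",
           "0.1 ^ 1",
           "0.58 ^ 0")
writeLines(lines, "data.txt")

# Read it back; strip.white drops the blanks around each field
data <- read.table("data.txt", sep = "^", header = TRUE, strip.white = TRUE)
str(data)
```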
The following R script produces a precision/recall curve:
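```r
library(ROCR)  # install.packages("ROCR") if it is not installed yet
# Read scores and true labels; fields are separated by '^'
data <- read.table("data.txt", sep = "^", header = TRUE)
# Pair the classifier scores with the ground-truth labels
pred <- prediction(data$ClassifyerOutput, data$GroundTruth)
# Precision (y-axis) versus recall (x-axis) across all cutoffs
perf <- performance(pred, "prec", "rec")
# Dotted grey curve, then an overlaid vertically averaged curve with boxplot
# spread (averaging pays off with several runs, e.g. cross-validation folds)
plot(perf, col = "grey82", lty = 3)
plot(perf, avg = "vertical", spread.estimate = "boxplot", add = TRUE)
```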
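To see what the curve represents, this base-R sketch (independent of ROCR) recomputes the precision/recall pairs by hand: each distinct score serves as a cutoff, and samples scoring at or above it are predicted positive. The function name `pr_points` is hypothetical, and ROCR's own output may differ in details such as how it treats the extreme cutoff where nothing is predicted positive (precision is then 0/0).

```r
# Hypothetical helper: one (recall, precision) point per distinct cutoff,
# from the highest score down to the lowest.
pr_points <- function(scores, labels) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  t(sapply(cutoffs, function(cut) {
    predicted <- scores >= cut
    tp <- sum(predicted & labels == 1)
    c(cutoff = cut,
      recall = tp / n_pos,
      precision = tp / sum(predicted))
  }))
}

scores <- c(0.35, 1.0, 1.0, 0.1, 0.58)
labels <- c(0, 1, 0, 1, 0)
pts <- pr_points(scores, labels)
print(pts)
plot(pts[, "recall"], pts[, "precision"], type = "b",
     xlab = "Recall", ylab = "Precision", ylim = c(0, 1))
```

For the five samples this yields four points: recall 0.5 with precision 0.5 (cutoff 1.0), 1/3 (cutoff 0.58) and 0.25 (cutoff 0.35), and finally recall 1.0 with precision 0.4 (cutoff 0.1).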
It could not be simpler. The complete documentation is only 14 pages long (assuming you are already familiar with R), and in no time you'll be producing nice-looking charts from your data. I had some problems installing the package on Linux (you will also need to install gplots, from the gregmisc R package bundle), but everything worked fine on my Windows machine.
If you use the package, don't forget to cite the original authors:
Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. (2004).
"ROCR: An R Package for visualizing the performance of scoring classifiers".
http://rocr.bioinf.mpi-sb.mpg.de