Transparent overlapping histograms in R

July 20th, 2010

Today I wanted to compare the histograms from two data sets, but it’s hard to see the differences when the plots overlap. Here’s an example of the problem:

a = rnorm(10000,5)
b = rnorm(10000,3)
hist(a,xlim=c(0,10))
hist(b,col="gray20",add=T)

Output:
normal histograms

I want to know what’s going on behind all that solid grey!

So I coded up a little function to add pseudo-transparency to pairs of histograms:

plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
                                breaks=NULL, xlim=NULL, ylim=NULL){

  ahist=NULL
  bhist=NULL
  
  if(!(is.null(breaks))){
    ahist=hist(a,breaks=breaks,plot=F)
    bhist=hist(b,breaks=breaks,plot=F)
  } else {
    ahist=hist(a,plot=F)
    bhist=hist(b,plot=F)

    dist = ahist$breaks[2]-ahist$breaks[1]
    breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)

    ahist=hist(a,breaks=breaks,plot=F)
    bhist=hist(b,breaks=breaks,plot=F)
  }

  if(is.null(xlim)){
    xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))   
  }

  if(is.null(ylim)){
    ylim = c(0,max(ahist$counts,bhist$counts))   
  }

  overlap = ahist
  for(i in 1:length(overlap$counts)){  
    if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
      overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
    } else {
      overlap$counts[i] = 0
    }
  }

  plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
  plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
  plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T) 
}

The results are much easier to interpret:

a = rnorm(10000,5)
b = rnorm(10000,3)
plotOverlappingHist(a,b)

overlapping histograms

I’m sure this could be improved upon and generalized to multiple histograms, but this solves my problem, so I’m calling it quits for now. Let me know if you make improvements or find it useful.


Tags: , , | 3 Comments »

3 Responses to “Transparent overlapping histograms in R”

  1. Alice Becker Says:

    Very useful. You could also use transparent colors, which could mean less programming effort:
    e.g. col=rgb(0, 1, 0,0.5)) ..the 4th argument is the degree of transparency alpha

    cheers, alice

  2. Wow – I didn’t know that R supported transparency. That makes things a lot simpler. Nice tip!

  3. a = rnorm(10000,5)
    b = rnorm(10000,3)
    hist(a,xlim=c(0,10), col=rgb(0, 1, 0,0.5))
    hist(b,col= rgb(1, 0, 0,0.5),add=T)