R2D3 and other letters and numbers

Check out the alphabet soup of web data-visualization tools I am swimming in today.

  • R is software for statistical computing and graphics.
  • d3.js is a JavaScript library for building beautiful visualizations on the web. It builds scalable vector graphics (SVG) directly from data by binding the data to elements of the document object model (DOM).
  • ggplot2 is a graphing library for R, developed by Hadley Wickham.
  • Raphaël.js is a JavaScript library for working with vector graphics. It differs from d3.js: Raphaël.js creates and manipulates vector graphics objects that are also DOM objects, while d3.js is primarily designed to tie data directly to DOM objects. There is some overlap, but they are different.
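
To make the d3.js description a little more concrete, here is a minimal sketch in Python (not JavaScript, and not d3's API) of the core idea: each datum becomes an SVG element in a document tree, with that element's attributes computed from the data. It uses the standard library's xml.etree.ElementTree as a stand-in for the browser DOM; the function name bar_chart and its bar_height and scale parameters are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def bar_chart(data, bar_height=20, scale=10):
    """Hypothetical helper: return an <svg> element with one <rect> per datum.

    A simplified, static illustration of d3-style data binding; it is not d3.
    """
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(max(data, default=0) * scale),  # widest bar sets the canvas width
        height=str(len(data) * bar_height),       # one row per datum
    )
    for i, value in enumerate(data):
        # Each datum drives the position and size of its own element.
        ET.SubElement(
            svg,
            "rect",
            x="0",
            y=str(i * bar_height),
            width=str(value * scale),
            height=str(bar_height - 2),  # leave a small gap between bars
        )
    return svg


print(ET.tostring(bar_chart([4, 8, 15]), encoding="unicode"))
```

Real d3 code does this against the live DOM in the browser and also handles data that changes over time (through its enter, update, and exit selections), which this static sketch leaves out.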

The first three are pretty powerful and, if they are not already, are fast becoming critical parts of the data toolkit. The last is a promising newcomer, worth keeping an eye on.

So far so good. If you’re a data nerd, you probably already know all this. Stick with me.

It turns out that these libraries, which do slightly different but related things and do them well, could work very well together. They're not tightly integrated (yet), but there are several efforts to change that.

Hadley Wickham, creator of the R package ggplot2, is a fan of d3.js and has suggested that the next version of ggplot2 will probably be rebuilt for the web, likely using d3. He's also working on a new R package that lets the two work together more immediately. This is great news.

He's calling it R2D3 (named, supposedly, more at the insistence of friends who are Star Wars geeks than out of his own fandom).


(Confusingly, there were some unfounded rumors that Hadley’s next version of ggplot would be called R2D3.)

There are also a few projects to get Raphaël.js to work well with d3.js. One of them is called ‘d34raphael’. Another, a bit more ambitious, is a custom build of d3 powered by Raphaël. Awesome! Guess what it's called? R2D3.

It's not that uncommon for two open source libraries to have the same name, but these libraries both address the needs of a pretty niche audience. They both work with d3.js, but one extends "upstream" toward the data and the other extends "downstream" toward the graphics. It's more than conceivable for someone to want to use all of them at the same time: R, R2D3, D3, R2D3, and Raphaël.

Apparently the two authors, Mike Hemesath and Hadley Wickham, didn't know about each other's projects when they named them. If both projects are widely adopted, it will be interesting to see whether either of them eventually decides to change its name.

 

Advertisements

3 comments

  1. mhemesath

    Hadley and I spoke about this briefly on Twitter. R2D3, the custom build of D3 and Raphael, should have a short shelf life. I'm hoping that IE8 support isn't needed much more than a year or two from now, at which point a D3 powered by Raphael will be useless.

  2. Brad Barnett (@bradrbarnett)

    Thanks for the great writeup. I've been trying to decide whether to focus on R or D3, mainly for visualization rather than analysis. Given the choice, which would you suggest sinking time into? I'm in the middle of learning Python right now, so I'm wondering whether I could just use it for some of the data wrangling and then focus on D3 for all the visualization. Any thoughts on this, or on whether R2D3 might change your answer one way or another?

    • Aman

      @Brad: Depends on your goals, of course. Here’s a generalization that might help:

      • R + ggplot2 builds on "standard" plot types, such as bar and line charts, used alone or in combination. Its sophistication lies in building your plot in layers, applying statistical transforms on the fly, and so on. You can publish your work by exporting to an image format like PNG or to PostScript/PDF. See http://docs.ggplot2.org/current/. If your goal is understanding your data or communicating your data analysis and similar work, then it makes sense to focus your viz energies on R + ggplot2. [Also consider the plotting/viz tools in the Python ecosystem, since you are using Python already.]
      • D3.js takes a different approach, letting you build your visualizations from the ground up in a way that ties the data directly to the browser's DOM. Each viz takes much more work to create, but the benefit is complete control and interactivity. You publish your work as a page on a website, and the viz can change in response to changing data or user interactions. A lot of work is going on to build tools on top of d3, using it as a kernel to drive other higher-level "languages". If your goal is to build expertise in beautiful, interactive, engaging visualizations, d3 and the current and future d3-based tools are a good place to focus.

      And yes, this answer will definitely change; the available tools are evolving rapidly. Perhaps a future workflow will involve using ggplot2 to quickly create d3-based visualizations that serve as a starting point for adding d3-style features such as interactivity. Perhaps the future lies in a Python version of ggplot that incorporates the grammar of graphics (https://github.com/ContinuumIO/Bokeh?). In the near future, d3 seems well positioned to be involved, even if invisibly behind the scenes.
