Category: Digression

Google Search definitions upgrade is amazing

Have you used a Google search to look up the definition of a word recently?

For a long while now, Google has returned a definition from some online dictionary or wiki. Now, not only has the definition section improved, but you also get the word usage over time (from Google Books n-grams) and a very cool etymology tree.

I’d love to find out more about how that etymology visual is being generated!

incorrigible - Google Search

Joining The Data Guild

I am happy to announce that recently I’ve joined forces with The Data Guild!

What is — who are — The Data Guild? Their website says:

The Data Guild brings together deeply experienced data scientists, social scientists, designers and engineers from diverse industry backgrounds to tackle important problems and challenges.

This new relationship doesn’t encroach on any of the benefits and freedoms that I enjoy by working independently, and that was an important consideration. And there are great practical reasons to work with a team. But what really attracted my interest in The Data Guild, and the reasons why I want to work with them, are less tangible than these.


When I visited universities in India a few years ago, I had noticed a strong resistance to the sharing of knowledge that leads to creative thinking and unique ideas. The system in which those schools lived seemed severely limited in this regard. But by working as an independent consultant, I am constantly fighting a similar battle.

It costs me a great deal of energy to continually expose myself to new ideas and projects, to find inter-disciplinary collaboration. And I am rarely able to bounce ideas around with someone who understands the nuances of what I am talking about; I also lack intra-disciplinary collaboration.

By being part of a community like The Data Guild, I hope to find frequent opportunities for such cross-pollination of ideas.

But that’s not the best part.


It has been two years since I started working independently as a consultant, and I have naturally been in a mood of self-assessment. I had quit my job back then because I was not satisfied with just earning a good salary. I had wanted to work on problems that I found more interesting and challenging. I feel good about my progress on this front. But I had also wanted to work on projects that had some positive impact in a way that mattered to me. In that, I have far to go.

So it was perfect timing when, last month, I met with the founders of The Data Guild — Chris Diehl, Dave Gutelius and Cameron Turner. They talked about their vision of assembling a team of experts that were passionate about doing something significant with their efforts.

There is plenty of money to be made forming a company or working for one in the “big data” world. In this nascent industry, the “low-hanging fruit” — the business models that are immediately profitable — are ones that I do not find to be satisfying. Developing a new non-relational database, or optimizing bidding strategies for advertising — these projects are often technically impressive and have good business justification. But I do not find them compelling.

I would like to spend my time working on problems that are interesting not just for their own sake, but for the impact that they have on our world. In their first blog post, The Data Guild writes:


“We shouldn’t have been surprised; the best and brightest people we know want a chance to make a difference in the world, and to work creatively on teams where they can reach their full potential.  We wanted to create a space where these incredible teams could tackle the most significant global challenges we face – but also make a living doing it.  We wanted to challenge the idea that there’s a necessary tradeoff between making a difference and making a living.”

People who think like this are people I can be proud to work with. That is the reason I’m excited about working with The Data Guild.


Kibana, Disqus at SF Python Project Night

At the #SFPython Project Night earlier this week, Disqus gave us access to a pipeline of their data. Their data is composed of conversations (posts, comments, and votes) taking place throughout the web. As one of the largest (the largest?) blog comment-hosting services, they have a large amount of data.

The Data

The data we had to work with was streaming JSON. Mostly we were there for beer, food, and socializing, but in between sips we managed to make some efforts in tinkering with it.


Rob Scott of Inkling was only there for about 20 minutes, but he had enough time to throw together a sweet recursive Python script that made it easy to parse out fields in the streaming JSON data.
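This is not Rob's script, just a minimal sketch of the same idea: walk each JSON document recursively and pull out every value stored under a given key. It assumes the stream arrives as one JSON document per line; the function names and the sample key are hypothetical.

```python
import json


def extract_field(obj, key):
    """Yield every value stored under `key` anywhere in a nested JSON value."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the same key may appear deeper in the tree.
            yield from extract_field(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from extract_field(item, key)


def parse_stream(lines, key):
    """Yield matching values from an iterable of JSON lines (assumed format)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            # Skip partial or malformed lines rather than stopping the stream.
            continue
        yield from extract_field(doc, key)
```

Since `parse_stream` accepts any iterable of lines, passing `sys.stdin` would let it read from a curl pipe.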

I wrote a script to examine documents in the stream and conditionally add an event to one of several time-series. I was planning to use the stream to populate time series for things like “Votes”, “Posts by Users with n+ followers”, “Comments by female users”, “Likes by Users in South America” etc. Then I hoped to use some existing time series anomaly detection libraries. I abandoned this effort in favor of a different idea…
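A rough sketch of that conditional time-series idea, not the script from the night itself. It assumes each document carries a `message_type` field and a numeric `timestamp` in epoch seconds; the timestamp format, the rule names, and the one-minute bucket size are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rules: series name -> predicate deciding whether a document counts.
RULES = {
    "votes": lambda doc: doc.get("message_type") == "Vote",
    "posts": lambda doc: doc.get("message_type") == "Post",
}


def update_series(series, doc, rules=RULES, bucket_seconds=60):
    """Add one event to every series whose rule matches the document."""
    ts = doc.get("timestamp")  # assumed to be epoch seconds
    if ts is None:
        return
    # Round the timestamp down to the start of its bucket.
    bucket = int(ts) // bucket_seconds * bucket_seconds
    for name, matches in rules.items():
        if matches(doc):
            series[name][bucket] += 1


# One Counter per series, mapping bucket start time to event count.
series = defaultdict(Counter)
```

Each resulting `Counter` maps a bucket start time to a count, which is the shape most anomaly detection code would want as input.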

Bobby Manuel of Shoptouch walked by and suggested using ElasticSearch + Kibana, which works very well with JSON data. We all agreed this was a great idea. It is also exactly the kind of thing that’s great for a hackathon — a pretty impressive output for a very small amount of work.

Elasticsearch + Kibana, very quickly

There wasn’t much time left when Bobby shared his idea. I had to start from scratch, installing Elasticsearch and Kibana, so I took several shortcuts. Instead of working with the JSON stream, I piped a few seconds of the data to a file, indexed it via bulk import in Elasticsearch, and set up a Kibana dashboard.
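For anyone who hasn't used the bulk import: the Elasticsearch bulk endpoint takes newline-delimited JSON, with an action line in front of each document and a newline at the very end. Below is a minimal sketch of converting a captured file of JSON lines into that shape. The index name `disqus` is hypothetical, and older Elasticsearch versions also expected a `_type` in the action line, so adjust for your version.

```python
import json


def to_bulk_body(lines, index="disqus"):
    """Turn an iterable of JSON lines into a body for the _bulk endpoint."""
    # The same action line precedes every document: index it, auto-generate the id.
    action = json.dumps({"index": {"_index": index}})
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        rows.append(action)
        # json.dumps escapes non-ASCII by default, so the body stays plain ASCII.
        rows.append(json.dumps(doc))
    # The bulk format requires a newline after the final row.
    return "\n".join(rows) + "\n" if rows else ""
```

The resulting string can be written to a file and sent to the `/_bulk` endpoint with curl's `--data-binary` option, which preserves the newlines.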

The following are a few screenshots of the results. The first shows a breakdown of events in the pipeline by Message Type (Vote, Post, Threadvote, etc).

Disqus events by message_type


The second is what Kibana calls a “histogram” — I would find a different name — showing counts of events in buckets of 100 milliseconds. The interesting thing here is that Kibana easily parsed the timestamp once I specified the field. I was also able to filter by time ranges.

Histogram of events by time


There were some other interesting screenshots that I will not post here since I don’t actually own this data, but I was starting to explore the actual content of the data and users. A simple query revealed that “LOL” is more common than “ha”. The Disqus API allowed me to look up user details by the user id in the stream, so there is potential for augmenting the data stream in various ways.


This didn’t take much time; the biggest time-suck was what I thought was an encoding/unicode related error during the bulk import. If I had had more time I would have liked to work with the actual JSON data stream rather than a small segment of it.

One step would be to ease resource constraints by using something better than my little laptop. An EC2 instance would be enough; for a hack, I think it would be just fine to stick ES and Kibana on the same box.

The second step would be to find an easy way to continually index the stream in Elasticsearch. One approach would be to use the pipe input API in Logstash. I was already using curl to pipe the data stream to stdin, so this would be a straightforward proof-of-concept. Easy trumps robust for hacks. Alternatively, I could have written a script that catches the incoming JSON documents, adds the necessary metadata, and XPUTs them into Elasticsearch. I explored neither of these approaches, and had another burrito instead.
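The script approach might look something like the sketch below, which I did not actually run that night. It assumes one JSON object per line on stdin, a local Elasticsearch at `http://localhost:9200`, a hypothetical index named `disqus`, and the `/{index}/_doc/{id}` URL layout of recent Elasticsearch versions (older versions used a type name in that position). The `indexed_at` field is made-up metadata for illustration.

```python
import hashlib
import json
import sys
import time
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single-node Elasticsearch
INDEX = "disqus"                  # hypothetical index name


def build_put(raw_line, now=None, es_url=ES_URL, index=INDEX):
    """Build a PUT request that indexes one JSON document (assumed to be an object)."""
    doc = json.loads(raw_line)
    # Hypothetical metadata: when we indexed the event, in epoch milliseconds.
    doc["indexed_at"] = int((time.time() if now is None else now) * 1000)
    # Derive the id from the raw line so replaying the stream overwrites, not duplicates.
    doc_id = hashlib.sha1(raw_line.strip().encode("utf-8")).hexdigest()
    return urllib.request.Request(
        f"{es_url}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def main(stream=sys.stdin):
    """Read JSON lines from the stream and PUT each one into Elasticsearch."""
    for raw_line in stream:
        if not raw_line.strip():
            continue
        try:
            request = build_put(raw_line)
        except json.JSONDecodeError:
            continue  # skip malformed lines; easy trumps robust
        with urllib.request.urlopen(request) as response:
            response.read()
```

Calling `main()` at the bottom of the file and piping the curl output into the script would index documents one at a time. That is slow compared with bulk requests, but fine for a proof-of-concept.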


I’m almost embarrassed that the Elasticsearch family of tools wasn’t already in my tool-belt for exploratory data analysis. They are now. It takes much less time than I had expected to set up and get useful results. They are flexible with various types of input. And the above hacks barely scratch the surface of what is possible to do with ES.

These tools aren’t just for finely tuned production environments to deliver specific functionality around search.  Give them a try for data exploration and quick results … and hackathons.

/cc @bobbymanuel @cpdomina @northisup @rbscott7 @disqus

Acquisition of Drink Me magazine

Some of you may know that for some years now I’ve been working with Drink Me magazine. Recently, Drink Me was acquired, marking the end of a wonderful digression in my life.

I met Daniel Yaffe, the co-founder of Drink Me magazine, in 2008, on the advisory board of Momentus International. Before long I was working with Daniel and his co-founder Eriq Wities on developing a website for Open Content’s new project. The work was low stress, the people were fun to work with, and we all became good friends.

I’ve seen and heard Drink Me magazine described in many ways. To me, the magazine is an amazing and wonderfully successful effort to encourage that part of you that wonders how to make that drink you’re holding, that part of you that is more interested in gaining knowledge than in losing brain cells when imbibing a beverage. For the curious, inquisitive, and discriminating, for those who want to expand their horizons or deepen their appreciation, Drink Me is the perfect magazine.

Working with the Drink Me team has been a great learning experience for me, both technically and otherwise. The many perks included the frequent Drink Me Issue Release parties and sponsored events, festivals and tappings and little-known parties. I learned about beer and brewing, about more types of alcohol than I ever thought existed, and endless alcohol-related trivia. I’ve read every issue of the magazine, usually cover-to-cover, and even wrote a guest post for the blog once. I met many very interesting people, brewers and bartenders and cicerones and various types of alcohol aficionados.  The Drink Me crew and board members are a very talented group of designers and writers and entrepreneurs and organizers, and I had an opportunity to meet and even work with many of them.

My friendship with Daniel and Eriq has been the most important thing to come out of the project, and that’s not going anywhere. I’m keeping tabs on both of them, especially Daniel’s exciting new project, which I’m not sure I’m supposed to talk about here, so I won’t. But you’re going to love it.

So, here’s a toast to Drink Me and everyone who’s been involved with the project so far… and a toast as well to the new owners, Cornelius and his team at Firewater Partners!

I sincerely hope the spirit of Drink Me lives on and grows in its new home. I look forward to reading every future issue, cover-to-cover.

Press Release:

Drink Me Magazine:
Firewater Partners:

China: Norway, too? Really?

I recently shared about how China was poking at India by publishing a controversial map on their passports. Well, China is irking another country I happen to feel close to … Norway.

Link to article by Daniel Drezner:

China has left Norway off a list of countries whose citizens can visit Beijing without a visa. The list includes all the EU countries, the US, Russia, Japan, major Latin-American countries, and others.

Why? Apparently Norway or its citizens are “low-quality” and “badly behaved.”

*snicker*

The pot calling the kettle black, if you ask me… or if you ask the experts, like Drezner. In his article, he wonders how exactly one should measure quality and good behavior:

  • 2012 Corruption Perceptions Index: Norway — Rank 7. China — Rank 80.
  • 2012 Rule of Law Index: Norway — Rank 1. China — Rank 94 (out of 97).
  • 2012 Prosperity Index: Norway — Rank 1. China — Rank 55.
  • 2012 Commitment to Development Index: Norway — Rank 2. China — unranked
  • etc

A poem by a twitter bot

A poem by a twitter-bot.

I really wanna see Miami play
I’m eating like a healthy bitch today.
I notice I surprise myself a lot.
Like I’m a student not a astronaut.

You’re giving me a cardiac arrest
Mint chocolate truffle kisses are the best.
I see an angel with a broken wing.
mi baby anything a anything

I’d be a great assassin ….. Fighting Crime
I’m waiting. Waiting for the perfect time…
Another freaking day in paradise??
Relationships require sacrifice

By Pentametron 2013 (@pentametron)

Tweets credits (in order of appearance):

@RealPeterBotros, @omgchel, @0BEY_BlondeWiz, @T_Mackjr,
@Fay_alqahtani, @whatsaidjay, @sPiKeDiTaGaIn, @SandyAniska
@NotSoThick, @bellinaabra, @gmarkumdavis, @ReWjR93