Detached HEAD — a git discovery

Recently I found myself with a detached HEAD. In Git.

This was the first time I had encountered such a thing. When you check out a commit directly instead of a branch, HEAD points at that commit rather than at any branch: that is a detached HEAD. Any commits you make in that state are branchless. There is a pretty easy fix, and the solution is easy to find on Stack Overflow (SO).

Check out SO: Why did git detach my head?

I retraced my steps to figure out exactly how this happened.

I created a branch (git branch newfeature; git checkout newfeature) and then cloned my repository for further work on this branch. This created an ambiguity for git: both the clone and master branch had a branch named newfeature. When I pulled my work from master with git pull, the commits were not attached to any branch.

The symptoms

I didn’t recognize this unfamiliar situation. I did notice I couldn’t find all those commits.

  • They weren’t visible with git log or git log newfeature.
  • git status with newfeature checked out showed a clean working directory.

With help from @ddunlop, I was finally able to view the commits with git log <hash>. I got the commit hash using git log in my cloned repo.

This is how I resolved the problem.

  1. git checkout <hash>.  I checked out my most recent commit using its hash. Git informed me that I was now in a ‘detached HEAD’ state. After that it was easy. I googled the provocative “detached HEAD” message and did some learning.
  2. git checkout newfeature
  3. git branch newfeature_2 6e51426cdb
  4. git merge newfeature_2
  5. git checkout master
  6. git merge newfeature

Then I just deleted the extra branches.

In the process, I also learned about “tracking” branches. Check out the useful SO: Switch branch without detaching head

Interpretation of Silhouette Plots (Clustering)

I am cross-posting here one of my answers on Stack Exchange – Cross Validated (that’s like Stack Overflow for statistics). The question had already been answered and accepted when I posted my answer several months later, but I chose to spend some time putting in my thoughts anyway.

I’m particularly interested in the interpretation of simple plots in the context of exploratory data analysis, and am planning to compile resources for data explorers on this subject. My plan is to do this via a wiki — which I have already installed but not yet populated with much information. So busy these days!

Q: How to interpret mean of Silhouette plot?

[I have paraphrased the OP's question here, because I am reluctant to copy someone else's content onto my blog. -AA]

How does one determine the number of clusters through interpretation of a Silhouette plot?

My answer

(Click through to my answer on Cross Validated for the most recent version of the answer, which may have changed.)

Sergey’s answer contains the critical point, which is that the silhouette coefficient quantifies the quality of clustering achieved — so you should select the number of clusters that maximizes the silhouette coefficient.


The long answer is that the best way to evaluate the results of your clustering efforts is to start by actually examining — human inspection — the clusters formed and making a determination based on an understanding of what the data represents, what a cluster represents, and what the clustering is intended to achieve.

There are numerous quantitative methods of evaluating clustering results which should be used as tools, with full understanding of the limitations. They tend to be fairly intuitive in nature, and thus have a natural appeal (like clustering problems in general).

Examples: cluster mass / radius / density, cohesion or separation between clusters, etc. These concepts are often combined, for example, the ratio of separation to cohesion should be large if clustering was successful.

The way clustering is measured is informed by the type of clustering algorithm used. For example, measuring the quality of a complete clustering algorithm (in which all points are put into clusters) can be very different from measuring the quality of a threshold-based fuzzy clustering algorithm (in which some points might be left un-clustered as ‘noise’).


The silhouette coefficient is one such measure. It works as follows:

For each point p, first find the average distance between p and all other points in the same cluster (this is a measure of cohesion, call it A). Then find the average distance between p and all points in the nearest cluster (this is a measure of separation from the closest other cluster, call it B). The silhouette coefficient for p is the difference between B and A divided by the greater of the two (max(A,B)).

We evaluate the silhouette coefficient of each point, and from these we can obtain the ‘overall’ average silhouette coefficient.

Intuitively, we are trying to measure the space between clusters. If cluster cohesion is good (A is small) and cluster separation is good (B is large), the numerator will be large and the coefficient will be close to 1; when A and B are about equal, the coefficient is close to 0.
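The definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation, and it makes a few assumptions that are not in the answer itself: distances are Euclidean, the "nearest cluster" is the one with the smallest average distance to p, and a point alone in its cluster is given a coefficient of 0. The function name silhouette_coefficients is made up for this example.

```python
import math


def silhouette_coefficients(points, labels):
    """Return the silhouette coefficient of every point.

    points: list of coordinate tuples; labels: one cluster label per point.
    Requires at least two clusters.
    """
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    coefficients = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumption: a singleton cluster scores 0.
            coefficients.append(0.0)
            continue
        # A: average distance to the other points in the same cluster
        # (the distance from the point to itself contributes 0 to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # B: average distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        largest = max(a, b)
        coefficients.append((b - a) / largest if largest > 0 else 0.0)
    return coefficients


# Two tight, well-separated pairs of points: every coefficient is near 1.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
scores = silhouette_coefficients(points, labels)
average = sum(scores) / len(scores)
print(scores, average)
```

The overall average is simply the mean of the per-point values, which is the number a silhouette plot summarizes for each choice of k.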

I’ve constructed an example here to demonstrate this graphically.

[Figures: “Clustering coefficient”; “Results of clustering for nclusters = 2:5”]

In these plots the same data is plotted five times; the colors indicate the clusters created by k-means clustering, with k = 1,2,3,4,5. That is, I’ve forced a clustering algorithm to divide the data into 2 clusters, then 3, and so on, and colored the graph accordingly.

The silhouette plot shows that the silhouette coefficient was highest when k = 3, suggesting that’s the optimal number of clusters. In this example we are lucky to be able to visualize the data and we might agree that indeed, three clusters best capture the segmentation of this data set.

If we were unable to visualize the data, perhaps because of higher dimensionality, a silhouette plot would still give us a suggestion. However, I hope my somewhat long-winded answer here also makes the point that this “suggestion” could be very insufficient or just plain wrong in certain scenarios.
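The selection step can be sketched the same way: compute the average silhouette coefficient for each candidate clustering and keep the k with the highest value. The data and the labelings below are invented for illustration (three clumps on a line, with labels assigned by hand as stand-ins for what k-means might return), and the helper names mean_silhouette and best_k are hypothetical. As noted above, the winner is a suggestion to check against the data, not a verdict.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # Assumption: a singleton cluster contributes 0.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, labelings):
    """labelings maps k to a list of labels; return (best k, all scores)."""
    scores = {k: mean_silhouette(points, labels) for k, labels in labelings.items()}
    return max(scores, key=scores.get), scores


# Made-up data: three clumps on a line, clustered by hand for k = 2, 3, 4.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
labelings = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 0, 1, 1, 2, 3],
}
k, scores = best_k(points, labelings)
print(k, scores)  # k is 3 for this data
```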

Acquisition of Drink Me magazine

Some of you may know that for some years now I’ve been working with Drink Me magazine. Recently, Drink Me was acquired, marking the end of a wonderful digression in my life.

I met Daniel Yaffe, the co-founder of Drink Me magazine, in 2008, on the advisory board of Momentus International. Before long I was working with Daniel and his co-founder Eriq Wities on developing a website for Open Content’s new project. The work was low stress, the people were fun to work with, and we all became good friends.

I’ve seen and heard Drink Me magazine described in many ways. To me, the magazine is an amazing and wonderfully successful effort to encourage that part of you that wonders how to make that drink you’re holding, that part of you that is more interested in gaining knowledge than in losing brain cells when imbibing a beverage. For the curious, inquisitive, and discriminating, for those who want to expand their horizons or deepen their appreciation, Drink Me is the perfect magazine.

Working with the Drink Me team has been a great learning experience for me, both technically and otherwise. The many perks included the frequent Drink Me Issue Release parties and sponsored events, festivals and tappings and little-known parties. I learned about beer and brewing, about more types of alcohol than I ever thought existed, and endless alcohol-related trivia. I’ve read every issue of the magazine, usually cover-to-cover, and even wrote a guest post for the blog once. I met many very interesting people, brewers and bartenders and cicerones and various types of alcohol aficionados.  The Drink Me crew and board members are a very talented group of designers and writers and entrepreneurs and organizers, and I had an opportunity to meet and even work with many of them.

My friendship with Daniel and Eriq has been the most important thing to come out of the project, and that’s not going anywhere. I’m keeping tabs on both of them, especially Daniel’s exciting new project, which I’m not sure I’m supposed to talk about here, so I won’t. But you’re going to love it.

So, here’s a toast to Drink Me and everyone that’s been involved with the project so far… and a toast as well to the new owners Cornelius and his team at Firewater Partners!

I sincerely hope the spirit of Drink Me lives on and grows in its new home. I look forward to reading every future issue, cover-to-cover.

Press Release: http://firewaterpartners.com/Media/FIREWATER_PARTNERS_Drink_Me_PR_Jan_22.pdf

Drink Me Magazine: http://drinkmemag.com
Firewater Partners: http://firewaterpartners.com

A list of public open Data Sets

I have been collecting a small list of public / open data sets for my own personal use. I have put the list online as a first entry on a new wiki. You can check it out here:

http://eda.amanahuja.me/PublicDataSets

A couple other comments, for the curious:

  • The wiki is also tied to another domain, so you can see the same page from http://eda.fenristech.com/PublicDataSets. This is an unresolved internal conflict. Suggestions welcome. 
  • The purpose of this wiki isn’t quite defined, but I have a good idea of where it is headed. Stay tuned to learn more.
  • I really wanted a way to sync an Evernote notebook to a MoinMoin wiki. I don’t think something like that already exists, nor do I have the time to work on it, but it would be damn convenient right now.

 

R2D3 and other letters and numbers

Check out the alphabet soup of data web visualizations I am swimming in today.

  • R is statistical and computational software.
  • d3.js is a JavaScript library for building beautiful visualizations on the web. It generates scalable vector graphics (SVG) directly from data through the document object model (DOM).
  • ggplot2 is a graphing library for R, developed by Hadley Wickham.
  • Raphaël.js – This is a JavaScript library for working with vector graphics. (It’s different: Raphaël.js creates and manipulates vector graphical objects that are also DOM objects. D3.js is primarily designed to tie data directly to DOM objects.  There is some overlap, but they’re different.)

The first three are pretty powerful and, if they are not already, are fast becoming critical parts of the data toolkit. The last is a promising newcomer, worth keeping an eye on.

So far so good. If you’re a data nerd, you probably already know all this. Stick with me.

It turns out that all these libraries, doing slightly different but related things, and doing them well, would work very well together. They’re not tightly integrated (yet) but there are several efforts to make it so.

Hadley Wickham, creator of the R package ggplot2, is a fan of d3.js and has suggested that the next version of ggplot2 will probably be redone on the web, likely using d3. He’s also working on a new R library that more immediately allows them to work well together. This is great news.

He’s calling it R2D3 (named, supposedly, more at the insistence of friends who are Star Wars geeks than due to his own fandom).

[Image: r2d3]

(Confusingly, there were some unfounded rumors that Hadley’s next version of ggplot would be called R2D3.)

There are also a few projects to get Raphaël.js to work well with d3.js. One of them is called ‘d34raphael’. Another, a bit more ambitious, is a custom build of d3 powered by Raphaël. Awesome! Guess what it’s called? R2D3.

It’s not that uncommon for two open source libraries to have the same name, but these libraries both address the needs of a pretty niche audience. They both work with d3.js, but one extends “upstream” toward the data and the other extends “downstream” toward the graphics. It’s more than conceivable for someone to want to use all of them at the same time: R, R2D3, D3, R2D3, and Raphaël.

Apparently the two authors, Mike Hemesath and Hadley Wickham, didn’t know about each other’s projects when they named their own. If both projects are adopted widely, it will be interesting to see if either of them eventually decides to change names.

 

China: Norway, too? Really?

I recently wrote about how China was poking at India by publishing a controversial map on its passports. Well, China is irking another country I happen to feel close to … Norway.

Link to article by Daniel Drezner:
http://drezner.foreignpolicy.com/posts/2012/12/06/is_norway_a_low_quality_badly_behaved_country

China has left Norway off a list of countries that can visit Beijing without a visa. The list includes all the EU countries, the US, Russia, Japan, major Latin-American countries, and others.

Why? Apparently Norway or its citizens are “low-quality” and “badly behaved.”

*snicker*

The pot calling the kettle black, if you ask me… or if you ask the experts, like Drezner. In his article, he wonders how exactly one should measure quality and good behavior:

  • 2012 Corruption Perceptions Index: Norway — Rank 7. China — Rank 80.
  • 2012 Rule of Law Index: Norway — Rank 1. China — Rank 94 (out of 97).
  • 2012 Prosperity Index: Norway — Rank 1. China — Rank 55.
  • 2012 Commitment to Development Index: Norway — Rank 2. China — unranked
  • etc

A poem by a twitter bot

A poem by a twitter-bot.

I really wanna see Miami play
I’m eating like a healthy bitch today.
I notice I surprise myself a lot.
Like I’m a student not a astronaut.

You’re giving me a cardiac arrest
Mint chocolate truffle kisses are the best.
I see an angel with a broken wing.
mi baby anything a anything

I’d be a great assassin ….. Fighting Crime
I’m waiting. Waiting for the perfect time…
Another freaking day in paradise??
Relationships require sacrifice

By Pentametron 2013 (@pentametron)
http://pentametron.com/

Tweet credits (in order of appearance):

@RealPeterBotros, @omgchel, @0BEY_BlondeWiz, @T_Mackjr,
@Fay_alqahtani, @whatsaidjay, @sPiKeDiTaGaIn, @SandyAniska
@NotSoThick, @bellinaabra, @gmarkumdavis, @ReWjR93