virtualenv for an NLTK project with ipython configuration

On a new Ubuntu machine, I needed to use NLTK. This serves as a quick reference for myself, and maybe you’ll find it useful as well.

> mkdir bananaproject
> cd bananaproject
> virtualenv ENV
> source ENV/bin/activate

I created a new folder for my project, and a new virtualenv for it. Virtualenv comes in damn handy for managing portability and dependencies across multiple Python projects. The last command activated the virtual environment, so subsequent commands now take place within it.
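To confirm that the activation actually took effect, it can help to ask Python itself. This is a minimal sketch, not part of my original setup; the helper name in_virtualenv is made up for illustration. It relies on the fact that classic virtualenv sets sys.real_prefix, while newer tools make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv sets sys.real_prefix on the interpreter it creates.
    if hasattr(sys, "real_prefix"):
        return True
    # The stdlib venv module and newer virtualenv releases instead leave
    # sys.base_prefix pointing at the system Python.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("prefix:", sys.prefix)
print("inside a virtualenv:", in_virtualenv())
```

Run it with the python that is first on your PATH after activating; the prefix should point inside ENV.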

> pip install yolk
> yolk -l

I installed yolk, which lists what is installed and ready to use in my virtualenv. I use this to check dependencies before installing NLTK.
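If yolk is not available, a similar listing can be produced from Python. The sketch below is an alternative I am swapping in, not what yolk does internally; it assumes Python 3.8 or later for importlib.metadata, and installed_packages is a hypothetical helper name.

```python
from importlib import metadata  # available from Python 3.8


def installed_packages():
    """Return sorted (name, version) pairs for distributions visible on sys.path."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing or broken
            pairs.add((name, dist.metadata.get("Version")))
    # Sort case-insensitively by package name.
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in installed_packages():
    print(name, version)
```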

> sudo apt-get install python-numpy
> pip install pyyaml

NumPy is a package I’m okay with having installed system-wide, not just in this virtualenv. PyYAML, on the other hand, I installed just for this project.
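With some packages system-wide and others inside the virtualenv, it is easy to lose track of which copy Python will pick up. A small sketch for checking this follows; module_origin is a hypothetical helper, and it is only meant for top-level module names such as numpy and yaml (the import name of PyYAML).

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


for name in ("numpy", "yaml"):
    origin = module_origin(name)
    if origin is None:
        print(name, "is not importable from this interpreter")
    else:
        print(name, "would be loaded from", origin)
```

A path under ENV means the virtualenv copy is used; a path under the system Python directories means the system-wide one is.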

> mkdir ENV/src
> cd ENV/src
> wget
> unzip
> cd nltk-2.0.1rc1
> python setup.py install

Self-explanatory. Of course, the link to NLTK will soon be outdated, so look up the latest release before downloading. The virtualenv was activated while I ran the install.

At this point I thought I was done, but when I started ipython and tried to import nltk, I got an import error. I need to tell ipython about the python executable I’m using and the changes to sys.path.
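A quick way to see this kind of mismatch is to run a few lines inside ipython and look at which interpreter is running and whether any sys.path entry points into the virtualenv. This is only a diagnostic sketch; path_entries_under is a hypothetical helper, and it assumes the VIRTUAL_ENV environment variable set by the activate script.

```python
import os
import sys


def path_entries_under(root, paths=None):
    """Return the entries of `paths` (default: sys.path) located inside `root`."""
    paths = sys.path if paths is None else paths
    root = os.path.abspath(root)
    prefix = root + os.sep
    found = []
    for entry in paths:
        if not entry:
            continue  # an empty string means the current directory; ignore it
        full = os.path.abspath(entry)
        if full == root or full.startswith(prefix):
            found.append(entry)
    return found


print("interpreter:", sys.executable)
env_root = os.environ.get("VIRTUAL_ENV")
if env_root:
    print("sys.path entries inside the virtualenv:", path_entries_under(env_root))
else:
    print("VIRTUAL_ENV is not set; no virtualenv appears to be active")
```

If the interpreter is the system one and the list is empty, packages installed only in the virtualenv will not be importable from that session.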

This is only necessary because of the way I set up my virtualenv and the order in which I installed things. A simple alternative is to use a virtualenv with the --no-site-packages option, and then install ipython afresh for that project.

An older post on this problem came in handy. However, it was written in 2009, and I’m using ipython 0.12. A slight variation is necessary for ipython >= 0.11.

> vi ~/.ipython/
[ Use this, or a variation thereof: ]
> ipython profile create

The profile create command tells ipython to create default config files, which we can then play with. The command will tell you where the file has been created, and in it we need to find this line:
c.InteractiveShellApp.exec_files = []
and change it to:
c.InteractiveShellApp.exec_files = ['']

Now whenever I start ipython, the script will be executed, which will set my sys.path variables the way I need them. I can now happily import numpy and import nltk in ipython.
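The startup script itself is not reproduced in this post. As a rough sketch only, a script of this kind could look like the following. Everything in it is an assumption made for illustration: the function names are made up, the layout ENV/lib/pythonX.Y/site-packages is the usual one on Linux, and it assumes ipython runs on the same Python version as the virtualenv.

```python
import os
import site
import sys


def virtualenv_site_packages(env_root):
    """Guess the site-packages directory of the virtualenv rooted at env_root."""
    # Assumes the POSIX layout and the Python version of the running interpreter.
    version = "python%d.%d" % sys.version_info[:2]
    return os.path.join(env_root, "lib", version, "site-packages")


def add_virtualenv_to_path(env_root):
    """Add the virtualenv's site-packages to sys.path.

    Returns the directory that was added, or None if it does not exist.
    """
    site_dir = virtualenv_site_packages(env_root)
    if not os.path.isdir(site_dir):
        return None
    # addsitedir appends the directory and processes any .pth files in it.
    site.addsitedir(site_dir)
    return site_dir


env_root = os.environ.get("VIRTUAL_ENV")  # set by `source ENV/bin/activate`
if env_root:
    print("virtualenv site-packages:", add_virtualenv_to_path(env_root))
```

Note that site.addsitedir appends to the end of sys.path, so if a package exists both system-wide and in the virtualenv, the system-wide copy is still found first. A script that should prefer the virtualenv copy would need to move the new entry to the front.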

In [1]: import nltk
In [2]: phrase = nltk.word_tokenize("That was easy.")
In [3]: nltk.pos_tag(phrase)
Out[3]: [('That', 'DT'), ('was', 'VBD'), ('easy', 'JJ'), ('.', '.')]



  1. Andrew

    Hey Aman, I was just running into this issue with virtualenv and NLTK (and ipython for that matter). Thanks for the write-up. I’ll be sure to try it out.

