Udacity ud1111 Anaconda and Jupyter Notebooks notebook

Completed on 2017-06-20.

The course can be found here.



Managing environments

As I mentioned before, conda can be used to create environments to isolate your projects. To create an environment, use conda create -n env_name list of packages in your terminal. Here -n env_name sets the name of your environment (-n for name) and list of packages is the list of packages you want installed in the environment. For example, to create an environment named my_env and install numpy in it, type conda create -n my_env numpy.

When creating an environment, you can specify which version of Python to install in the environment. This is useful when you’re working with code in both Python 2.x and Python 3.x. To create an environment with a specific Python version, do something like conda create -n py3 python=3 or conda create -n py2 python=2. I actually have both of these environments on my personal computer. I use them as general environments not tied to any specific project, but rather for general work with each Python version easily accessible. These commands will install the most recent version of Python 3 and 2, respectively. To install a specific version, use conda create -n py python=3.3 for Python 3.3.
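If you want to script environment creation, the same conda create commands can be assembled from Python. This is a minimal sketch, not part of conda itself: conda_create_command and create_env are hypothetical helper names, and running the command assumes conda is installed and on your PATH. conda will still ask for confirmation before installing, just as it does in the terminal.

```python
import shutil
import subprocess


def conda_create_command(env_name, packages=(), python=None):
    """Build the argument list for `conda create -n env_name [python=X] packages...`.

    Hypothetical helper: it only assembles the command described above.
    """
    cmd = ["conda", "create", "-n", env_name]
    if python is not None:
        # python=3 picks the latest Python 3; python=3.3 pins a specific version
        cmd.append("python={}".format(python))
    cmd.extend(packages)
    return cmd


def create_env(env_name, packages=(), python=None):
    """Run the command; assumes conda is installed and on PATH."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    # conda prompts for confirmation before installing, like it does in the terminal
    subprocess.run(conda_create_command(env_name, packages, python), check=True)


print(conda_create_command("my_env", ["numpy"]))
# ['conda', 'create', '-n', 'my_env', 'numpy']
print(conda_create_command("py", python="3.3"))
# ['conda', 'create', '-n', 'py', 'python=3.3']
```

Keeping the command builder separate from the part that runs it makes the argument list easy to check before anything gets installed.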

Entering an environment

Once you have an environment created, use source activate my_env to enter it on OSX/Linux. On Windows, use activate my_env. In conda 4.4 and later, conda activate my_env works on every platform.

When you’re in the environment, you’ll see the environment name in the terminal prompt, something like (my_env) ~ $. The environment has only a few packages installed by default, plus the ones you installed when creating it. You can check this with conda list. Installing packages in the environment works the same as before: conda install package_name. This time, though, the packages you install will be available only when you’re in the environment. To leave the environment, type source deactivate (on OSX/Linux). On Windows, use deactivate.
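From inside Python you can double-check which environment you’re actually running in, which helps when a package you installed seems to be missing. A minimal sketch, assuming conda’s activation scripts set the CONDA_DEFAULT_ENV variable (current versions do); active_environment is a hypothetical helper name. sys.prefix is the directory of the environment the interpreter runs from.

```python
import os
import sys


def active_environment():
    """Report which conda environment the running interpreter belongs to."""
    return {
        # Set by conda's activation scripts; usually unset when no environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # The directory of the environment this Python interpreter runs from
        "prefix": sys.prefix,
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
    }


info = active_environment()
print("Environment:", info["conda_env"])
print("Prefix:", info["prefix"])
print("Python:", info["python"])
```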

More environment actions

Saving and loading environments

A really useful feature is sharing environments so others can install all the packages used in your code, with the correct versions. You can save the packages to a YAML file with conda env export > environment.yaml. The first part, conda env export, writes out all the packages in the environment, including the Python version.

The exported text lists the name of the environment and all of its dependencies, along with their versions. The second part of the export command, > environment.yaml, writes the exported text to the YAML file environment.yaml. This file can now be shared, and others will be able to create the same environment you used for the project.

To create an environment from an environment file use conda env create -f environment.yaml. This will create a new environment with the same name listed in environment.yaml.
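The export and re-create steps can also be wrapped in a small Python script, for example to refresh environment.yaml before each commit. This is a sketch assuming conda is on your PATH; export_environment and create_from_file are hypothetical helper names that simply run the two conda commands above.

```python
import subprocess


def export_environment(env_name, path="environment.yaml"):
    """Save `conda env export -n env_name` to a file, like `> environment.yaml` in the shell."""
    result = subprocess.run(
        ["conda", "env", "export", "-n", env_name],
        capture_output=True,  # collect the YAML instead of printing it
        text=True,
        check=True,  # raise CalledProcessError if conda fails
    )
    with open(path, "w") as f:
        f.write(result.stdout)
    return path


def create_from_file(path="environment.yaml"):
    """Recreate an environment from a file, like `conda env create -f environment.yaml`."""
    subprocess.run(["conda", "env", "create", "-f", path], check=True)


# Usage (assumes conda is on PATH and an environment named my_env exists):
# export_environment("my_env")
# create_from_file("environment.yaml")
```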

Listing environments

If you forget what your environments are named (happens to me sometimes), use conda env list to list all the environments you’ve created. You should see a list of environments, with an asterisk next to the environment you’re currently in. The default environment, the one used when you aren’t in any other, is called root (renamed base in conda 4.4 and later).
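conda env list also accepts --json, which is easier to process from a script than the plain listing. Below is a sketch that turns that JSON into readable names; list_env_names is a hypothetical helper, and the sample output (an "envs" array of environment paths) is assumed for illustration rather than captured from a real machine.

```python
import json
import os


def list_env_names(json_text):
    """Turn the output of `conda env list --json` into readable names.

    Environments under an `envs` directory are reported by name; anything
    else (the root install, or an environment created at a custom path)
    is reported by its full path.
    """
    names = []
    for path in json.loads(json_text)["envs"]:
        if os.path.basename(os.path.dirname(path)) == "envs":
            names.append(os.path.basename(path))
        else:
            names.append(path)
    return names


# Sample output, assumed for illustration; in practice capture it with
# subprocess.run(["conda", "env", "list", "--json"], capture_output=True, text=True).stdout
sample = '{"envs": ["/opt/anaconda3", "/opt/anaconda3/envs/py2", "/opt/anaconda3/envs/py3"]}'
print(list_env_names(sample))
# ['/opt/anaconda3', 'py2', 'py3']
```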

Removing environments

If there are environments you don’t use anymore, conda env remove -n env_name will remove the specified environment (here, named env_name).

Best practices

Using environments

One thing that’s helped me tremendously is having separate environments for Python 2 and Python 3. I used conda create -n py2 python=2 and conda create -n py3 python=3 to create two separate environments, py2 and py3. Now I have a general use environment for each Python version. In each of those environments, I’ve installed most of the standard data science packages (numpy, scipy, pandas, etc.).

I’ve also found it useful to create environments for each project I’m working on. It works great for non-data-related projects too, like web apps built with Flask. For example, I have an environment for my personal blog, which uses Pelican.

Sharing environments

When sharing your code on GitHub, it’s good practice to make an environment file and include it in the repository. This makes it easier for people to install all the dependencies for your code. For people not using conda, I also usually include a pip requirements.txt file, generated with pip freeze > requirements.txt (learn more here).
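If pip isn’t handy, Python’s standard library can produce a similar name==version list. This is an approximation, not how pip freeze works: pip also handles editable installs and hides some of its own tooling, so prefer the real command for files you publish. freeze_lines and write_requirements are hypothetical helper names, and importlib.metadata needs Python 3.8 or later.

```python
from importlib import metadata  # standard library, Python 3.8+


def freeze_lines():
    """Approximate `pip freeze`: one name==version line per installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            lines.add("{}=={}".format(name, dist.version))
    return sorted(lines, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the approximate freeze output to a requirements file."""
    with open(path, "w") as f:
        f.write("\n".join(freeze_lines()) + "\n")
    return path
```

Running write_requirements() inside the activated environment captures that environment’s packages, since importlib.metadata only sees what the current interpreter can import.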

More to learn

To learn more about conda and how it fits in the Python ecosystem, check out this article by Jake VanderPlas: Conda myths and misconceptions. And here’s the conda documentation you can reference later.

On Python versions at Udacity

Most Nanodegree programs at Udacity will be (or are already) using Python 3 almost exclusively.

Why we’re switching to Python 3

At this point, there are enough new features in Python 3 that it doesn’t make much sense to stick with Python 2 unless you’re working with old code. All new Python code should be written for version 3.

The main breakage between Python 2 and 3

For the most part, Python 2 code will work with Python 3. Of course, most new features introduced with Python 3 versions won’t be backwards compatible. The place where your Python 2 code will fail most often is the print statement.

For most of Python’s history including Python 2, printing was done like so:

print "Hello", "world!"
> Hello world!

This was changed in Python 3 to a function.

print("Hello", "world!")
> Hello world!

The print function was back-ported to Python 2 in version 2.6 through the __future__ module:

# In Python 2.6+
from __future__ import print_function
print("Hello", "world!")
> Hello world!

The print statement doesn’t work in Python 3. If you want to print something and have it work in both Python versions, you’ll need to import print_function in your Python 2 code.
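You can see the breakage directly by asking a Python 3 interpreter to compile each form. A small sketch; compiles is a hypothetical helper that uses the built-in compile() and reports whether the snippet is valid syntax for the interpreter running it.

```python
def compiles(source):
    """Return True if `source` is valid syntax for the running (Python 3) interpreter."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles('print "Hello", "world!"'))    # False: the Python 2 print statement
print(compiles('print("Hello", "world!")'))   # True: the print function
print(compiles('from __future__ import print_function\nprint("Hello", "world!")'))  # True
```

The last case shows that the __future__ import is harmless on Python 3, which is why code that includes it runs under both versions.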

Note for students in the Data Analyst Nanodegree program

Currently, most of the materials for this Nanodegree program are still guaranteed to work only for Python 2.7. You can quickly set up an environment for the current DAND program by opening the Resources tab and downloading an appropriate YAML file.

Note for students in the Machine Learning Engineer Nanodegree program

Currently, the Machine Learning Engineer Nanodegree program requires Python 2.7 to finish all the projects.

Jupyter Notebooks

What are Jupyter notebooks?

Markdown cells

If you don’t have experience with LaTeX, please read this primer on using it to create math expressions.

Magic keywords


Here’s the list of all available magic commands.

Converting notebooks

For example, to convert a notebook to an HTML file, run this in your terminal:

jupyter nbconvert --to html notebook.ipynb

As always, learn more about nbconvert from the documentation.

Creating a slideshow

You can see an example of a slideshow here.

Slides are full slides that you move through left to right. Sub-slides show up in the slideshow by pressing up or down. Fragments are hidden at first, then appear with a button press. Skip leaves a cell out of the slideshow, and Notes keeps the cell as speaker notes.
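The slide type you pick for each cell is stored in the notebook file itself, which is JSON. A minimal sketch, assuming the nbformat 4 layout where each cell carries metadata.slideshow.slide_type; markdown_cell, build_notebook, and the file name slides_demo.ipynb are hypothetical. The "-" value is my understanding of the toolbar’s default, meaning the cell continues the current slide.

```python
import json

# Slide types described above, plus "-" (continue the current slide)
SLIDE_TYPES = {"slide", "subslide", "fragment", "skip", "notes", "-"}


def markdown_cell(text, slide_type):
    """An nbformat 4 Markdown cell tagged with a slideshow type."""
    if slide_type not in SLIDE_TYPES:
        raise ValueError("unknown slide type: {}".format(slide_type))
    return {
        "cell_type": "markdown",
        "metadata": {"slideshow": {"slide_type": slide_type}},
        "source": text,
    }


def build_notebook(cells):
    """Assemble (text, slide_type) pairs into a minimal nbformat 4 notebook dict."""
    return {
        "cells": [markdown_cell(text, kind) for text, kind in cells],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


nb = build_notebook([
    ("# Title slide", "slide"),
    ("Shown with the down arrow", "subslide"),
    ("Appears on the next key press", "fragment"),
    ("Only in speaker notes", "notes"),
    ("Left out of the slideshow", "skip"),
])

with open("slides_demo.ipynb", "w") as f:
    json.dump(nb, f, indent=1)
```

The resulting file should convert with jupyter nbconvert like any notebook saved from the browser.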

Running the slideshow

To create the slideshow from the notebook file, you’ll need to use nbconvert:

jupyter nbconvert notebook.ipynb --to slides

This just converts the notebook to the necessary files for the slideshow, but you need to serve it with an HTTP server to actually see the presentation.

To convert it and see it immediately, use:

jupyter nbconvert notebook.ipynb --to slides --post serve
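The nbconvert commands in this section can also be run from a Python script, for instance to rebuild the HTML and slides for several notebooks at once. A sketch assuming Jupyter is installed and jupyter is on your PATH; nbconvert here is a hypothetical wrapper around the CLI, not the nbconvert library’s own Python API.

```python
import subprocess


def nbconvert(notebook, to="slides", serve=False):
    """Run `jupyter nbconvert NOTEBOOK --to FORMAT`, optionally with `--post serve`.

    Hypothetical wrapper; assumes Jupyter is installed and `jupyter` is on PATH.
    """
    cmd = ["jupyter", "nbconvert", notebook, "--to", to]
    if serve:
        # --post serve starts a local server and opens the slides in the browser;
        # the call keeps running until you stop the server
        cmd += ["--post", "serve"]
    subprocess.run(cmd, check=True)
    return cmd


# Usage:
# nbconvert("notebook.ipynb", to="html")    # HTML export
# nbconvert("notebook.ipynb")               # slideshow files only
# nbconvert("notebook.ipynb", serve=True)   # convert and present
```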

This will open up the slideshow in your browser so you can present it.