Setting Up Python for Machine Learning on Windows

This post was originally published on Real Python on Oct 31st, 2018 by Renato Candido.

Python has been widely used for numerical and scientific applications in recent years. However, to perform numerical computations efficiently, Python relies on external libraries, sometimes implemented in other languages, such as the NumPy library, whose core is written in C and which builds on linear algebra routines traditionally written in Fortran.

Due to these dependencies, sometimes it isn’t trivial to set up an environment for numerical computations, linking all the necessary libraries. It’s common for people to struggle to get things working in workshops involving the use of Python for machine learning, especially when they are using an operating system that lacks a package management system, such as Windows.

In this article, you’ll:

  • Walk through the details for setting up a Python environment for numerical computations on a Windows operating system
  • Be introduced to Anaconda, a Python distribution designed to circumvent these setup problems
  • See how to install the distribution on a Windows machine and use its tools to manage packages and environments
  • Use the installed Python stack to build a neural network and train it to solve a classic classification problem

Introducing Anaconda and Conda

Python comes with pip, a package management system used to install and manage software packages written in Python; it has been bundled with the official installers since Python 3.4 and 2.7.9. However, for numerical computations, there are several dependencies that are not written in Python, so the initial releases of pip could not solve the problem by themselves.

To circumvent this problem, Continuum Analytics released Anaconda, a Python distribution focused on scientific applications, and Conda, a package and environment management system used by the Anaconda distribution. It’s worth noting that more recent versions of pip can handle external dependencies using wheels, but, by using Anaconda, you’ll be able to install critical libraries for data science more smoothly. (You can read more on this discussion here.)

Although Conda is tightly coupled to the Anaconda Python Distribution, the two are distinct projects with different goals:

  • Anaconda is a full distribution of the software in the PyData ecosystem, including Python itself along with binaries for several third-party open-source projects. Besides Anaconda, there’s also Miniconda, which is a minimal Python distribution including basically Conda and its dependencies so that you can install only the packages you need, from scratch.

  • Conda is a package, dependency, and environment management system that could be installed without the Anaconda or Miniconda distribution. It runs on Windows, macOS, and Linux and was created for Python programs, but it can package and distribute software for any language. The main purpose is to solve external dependencies issues in an easy way, by downloading pre-compiled versions of software.

    In this sense, it is more like a cross-platform version of a general-purpose package manager such as APT or YUM, which helps to find and install packages in a language-agnostic way. Conda is also an environment manager, so if you need a package that requires a different version of Python, you can use Conda to set up a separate environment with a totally different version of Python while keeping your usual version of Python in your default environment.

There’s a lot of discussion regarding the creation of another package management system for the Python ecosystem. It’s worth mentioning that Conda’s creators pushed Python standard packaging to the limit and only created a second tool when it was clear that it was the only reasonable way forward.

Curiously, even Guido van Rossum, in his talk at the inaugural PyData meetup in 2012, said that, when it comes to packaging, “it really sounds like your needs are so unusual compared to the larger Python community that you’re just better off building your own.” (You can watch a video of this discussion.) More information about this discussion can be found here and here.

Anaconda and Miniconda have become the most popular Python distributions, widely used for data science and machine learning in various companies and research laboratories. They are free and open source projects and currently include 1400+ packages in the repository. In the following section, we’ll go through the installation of the Miniconda Python distribution on a Windows machine.

Installing the Miniconda Python Distribution

In this section, you’ll see step-by-step how to set up a data science Python environment on Windows. Instead of the full Anaconda distribution, you’ll be using Miniconda to set up a minimal environment containing only Conda and its dependencies, and you’ll use that to install the necessary packages.

The installation processes for Miniconda and Anaconda are very similar. The basic difference is that Anaconda provides an environment with a lot of pre-installed packages, many of which are never used. (You can check the list here.) Miniconda is minimalist and clean, and it allows you to easily install any of Anaconda’s packages.

In this article, the focus will be on using the command line interface (CLI) to set up the packages and environments. However, it’s possible to use Conda to install Anaconda Navigator, a graphical user interface (GUI), if you wish.

Miniconda can be installed using an installer available here. You’ll notice there are installers for Windows, macOS, and Linux, and for 32-bit or 64-bit operating systems. You should consider the appropriate architecture according to your Windows installation and download the Python 3.x version (at the time of writing this article, 3.7).

There’s no reason to use Python 2 on a fresh project anymore. If you do need Python 2 for a project, because of a library that has not been updated, you can set up a Python 2 environment using Conda even if you installed the Miniconda Python 3.x distribution, as you will see in the next section.

After the download finishes, you just have to run the installer and follow the installation steps:

  • Click on Next on the welcome screen:
Miniconda Installer 1
  • Click on I Agree to agree to the license terms:
Miniconda Installer 2
  • Choose the installation type and click Next. Another advantage of using Anaconda or Miniconda is that it is possible to install the distribution using a local account. (It isn’t necessary to have an administrator account.) If this is the case, choose Just Me. Otherwise, if you have an administrator account, you may choose All Users:
Miniconda Installer 3
  • Choose the install location and click Next. If you’ve chosen to install just for you, the default location will be the folder Miniconda3 under your user’s personal folder. It’s important not to use spaces in the folder names in the path to Miniconda, since many Python packages have problems when spaces are used in folder names:
Miniconda Installer 4
  • In Advanced Installation Options, the suggestion is to use the default choices, which are to not add Anaconda to the PATH environment variable and to register Anaconda as the default Python. Click Install to begin installation:
Miniconda Installer 5
  • Wait while the installer copies the files:
Miniconda Installer 6
  • When the installation completes, click on Next:
Miniconda Installer 7
  • Click on Finish to finish the installation and close the installer:
Miniconda Installer 8

As Anaconda was not included in the PATH environment variable, its commands won’t work in the Windows default command prompt. To use the distribution, you should start its own command prompt, which can be done by clicking on the Start button and on Anaconda Prompt under Anaconda3 (64 bit):

Anaconda Prompt Start

When the prompt opens, you can check if Conda is available by running conda --version:

(base) C:\Users\IEUser>conda --version
conda 4.5.11
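
If you want a script to perform the same check, the following is a minimal sketch in Python. It assumes that the conda command is reachable on the PATH of the process running the script (as it is inside the Anaconda Prompt). The function names parse_conda_version and installed_conda_version are made up for this illustration:

```python
import re
import shutil
import subprocess


def parse_conda_version(output):
    """Extract (major, minor, patch) from text such as 'conda 4.5.11'."""
    match = re.search(r"conda\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_conda_version():
    """Return the version of the conda found on PATH, or None if there is none."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    result = subprocess.run(
        [conda, "--version"], capture_output=True, text=True, check=True
    )
    # Look at both streams in case the version is printed on stderr.
    return parse_conda_version(result.stdout + result.stderr)


print(parse_conda_version("conda 4.5.11"))  # (4, 5, 11)
```

Calling installed_conda_version() from a regular Windows command prompt would be expected to return None with the default installation options, since Conda was not added to the PATH.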

To get more information about the installation, you can run conda info:

(base) C:\Users\IEUser>conda info

     active environment : base
    active env location : C:\Users\IEUser\Miniconda3
            shell level : 1
       user config file : C:\Users\IEUser\.condarc
 populated config files : C:\Users\IEUser\.condarc
          conda version : 4.5.11
    conda-build version : not installed
         python version : 3.7.0.final.0
       base environment : C:\Users\IEUser\Miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/win-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\Users\IEUser\Miniconda3\pkgs
                          C:\Users\IEUser\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\IEUser\Miniconda3\envs
                          C:\Users\IEUser\AppData\Local\conda\conda\envs
                          C:\Users\IEUser\.conda\envs
               platform : win-64
             user-agent : conda/4.5.11 requests/2.19.1 CPython/3.7.0 Windows/10 Windows/10.0.17134
          administrator : False
             netrc file : None
           offline mode : False
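
The report above follows a simple "key : value" layout, where entries such as channel URLs continue over several lines. As a rough sketch, assuming that layout stays the same, it could be turned into a dictionary like this (parse_conda_info is a hypothetical helper, and the sample is a shortened copy of the output above):

```python
def parse_conda_info(text):
    """Parse the 'key : value' layout printed by `conda info` into a dict.

    Each key maps to a list of values, because some entries
    continue over several indented lines.
    """
    info = {}
    current_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        key, separator, value = line.partition(" : ")
        if separator:
            current_key = key.strip()
            info[current_key] = [value.strip()]
        elif current_key is not None:
            # A line without ' : ' continues the previous entry.
            info[current_key].append(line.strip())
    return info


SAMPLE = r"""
     active environment : base
          conda version : 4.5.11
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
               platform : win-64
"""

info = parse_conda_info(SAMPLE)
print(info["conda version"])      # ['4.5.11']
print(len(info["channel URLs"]))  # 2
```

Conda can also print this report as JSON when you run conda info --json, which is usually a sturdier input for scripts than the human-readable text.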

Now that you have Miniconda installed, let’s see how Conda environments work.

Understanding Conda Environments

When you start developing a project from scratch, it’s recommended that you use the latest versions of the libraries you need. However, when working with someone else’s project, such as when running an example from Kaggle or GitHub, you may need to install specific versions of packages or even another version of Python due to compatibility issues.

This problem may also occur when you try to run an application you developed long ago that depends on a particular library version and no longer works with newer releases of that library.

Virtual environments are a solution to this kind of problem. By using them, it is possible to create multiple environments, each one with different versions of packages. A typical Python setup includes Virtualenv, a tool to create isolated Python virtual environments, widely used in the Python community.

Conda includes its own environment manager and presents some advantages over Virtualenv, especially concerning numerical applications, such as the ability to manage non-Python dependencies and the ability to manage different versions of Python, which is not possible with Virtualenv. Besides that, Conda environments are entirely compatible with default Python packages that may be installed using pip.

The Miniconda installation provides Conda and a root environment with a version of Python and some basic packages installed. Besides this root environment, it is possible to set up additional environments including different versions of Python and packages.

Using the Anaconda prompt, it is possible to check the available Conda environments by running conda env list:

(base) C:\Users\IEUser>conda env list
# conda environments:
#
base                  *  C:\Users\IEUser\Miniconda3

This base environment is the root environment, created by the Miniconda installer. It is possible to create another environment, named otherenv, by running conda create --name otherenv:

(base) C:\Users\IEUser>conda create --name otherenv
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\otherenv


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate otherenv
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As the message printed at the end of the creation process indicates, you can activate the otherenv environment by running conda activate otherenv. You’ll notice the environment has changed by the name between parentheses at the beginning of the prompt:

(base) C:\Users\IEUser>conda activate otherenv

(otherenv) C:\Users\IEUser>

You can open the Python interpreter within this environment by running python:

(otherenv) C:\Users\IEUser>python
Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

The interpreter that starts is Python 3.7.0, the same version as in the root base environment. To exit the Python interpreter, just run quit():

>>> quit()

(otherenv) C:\Users\IEUser>
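
When several environments exist, it can be useful to confirm which installation the running interpreter actually belongs to. The short sketch below uses only the standard library. CONDA_DEFAULT_ENV is an environment variable that Conda sets on activation, so it is normally absent outside Conda:

```python
import os
import sys

# Path of the interpreter that is running this code.
print("executable:", sys.executable)
# Root folder of the installation or environment the interpreter belongs to.
print("prefix:    ", sys.prefix)
# Version of the running interpreter.
print("version:   ", ".".join(str(n) for n in sys.version_info[:3]))
# Name of the active Conda environment, if Conda has set it.
print("conda env: ", os.environ.get("CONDA_DEFAULT_ENV", "<not set>"))
```

You can paste these lines into the interpreter prompt of any environment and compare the prefix with the locations reported by conda env list.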

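To deactivate the otherenv environment and go back to the root base environment, you can run conda deactivate, as suggested by the output of conda create. With the Conda version used here, the shorter deactivate command also works on Windows: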

(otherenv) C:\Users\IEUser>deactivate

(base) C:\Users\IEUser>

As mentioned earlier, Conda allows you to easily create environments with different versions of Python, which is not straightforward with Virtualenv. To include a different Python version within an environment, you have to specify it by using python=<version> when running conda create. For example, to create an environment named py2 with Python 2.7, you have to run conda create --name py2 python=2.7:

(base) C:\Users\IEUser>conda create --name py2 python=2.7
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\py2

  added / updated specs:
    - python=2.7


The following NEW packages will be INSTALLED:

    certifi:        2018.8.24-py27_1
    pip:            10.0.1-py27_0
    python:         2.7.15-he216670_0
    setuptools:     40.2.0-py27_0
    vc:             9-h7299396_1
    vs2008_runtime: 9.00.30729.1-hfaea7d5_1
    wheel:          0.31.1-py27_0
    wincertstore:   0.2-py27hf04cefb_0

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py2
#
# To deactivate an active environment, use
#
#     $ conda deactivate


(base) C:\Users\IEUser>

As shown by the output of conda create, this time some new packages were installed, since the new environment uses Python 2. You can check the new environment indeed uses Python 2 by activating it and running the Python interpreter:

(base) C:\Users\IEUser>conda activate py2

(py2) C:\Users\IEUser>python
Python 2.7.15 |Anaconda, Inc.| (default, May  1 2018, 18:37:09) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
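
If you create environments often, you may prefer to build the conda create command in a script. The following is a sketch under a few assumptions: conda is on the PATH, the function names are invented for this example, and the --yes flag is added to skip the Proceed ([y]/n)? confirmation, which the article's interactive session does not do:

```python
import shutil
import subprocess


def conda_create_command(name, python_version=None, packages=()):
    """Build the argument list for `conda create --name <name> ...`."""
    command = ["conda", "create", "--name", name, "--yes"]
    if python_version is not None:
        command.append(f"python={python_version}")
    command.extend(packages)
    return command


def create_environment(name, python_version=None, packages=()):
    """Run the command; raises CalledProcessError if conda reports a failure."""
    conda = shutil.which("conda")
    if conda is None:
        raise RuntimeError("conda was not found on PATH")
    command = conda_create_command(name, python_version, packages)
    # Replace the bare program name with the full path that was found.
    subprocess.run([conda] + command[1:], check=True)


print(conda_create_command("py2", python_version="2.7"))
# ['conda', 'create', '--name', 'py2', '--yes', 'python=2.7']
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact command before anything is installed.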

Now, if you run conda env list, you should see the two environments that were created, besides the root base environment:

(py2) C:\Users\IEUser>conda env list
# conda environments:
#
base                     C:\Users\IEUser\Miniconda3
otherenv                 C:\Users\IEUser\Miniconda3\envs\otherenv
py2                   *  C:\Users\IEUser\Miniconda3\envs\py2


(py2) C:\Users\IEUser>

In the list, the asterisk indicates the activated environment. It is possible to remove an environment by running conda remove --name <environment name> --all. Since it is not possible to remove an activated environment, you should first deactivate the py2 environment, to remove it:

(py2) C:\Users\IEUser>deactivate

(base) C:\Users\IEUser>conda remove --name py2 --all

Remove all packages in environment C:\Users\IEUser\Miniconda3\envs\py2:


## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\py2


The following packages will be REMOVED:

    certifi:        2018.8.24-py27_1
    pip:            10.0.1-py27_0
    python:         2.7.15-he216670_0
    setuptools:     40.2.0-py27_0
    vc:             9-h7299396_1
    vs2008_runtime: 9.00.30729.1-hfaea7d5_1
    wheel:          0.31.1-py27_0
    wincertstore:   0.2-py27hf04cefb_0

Proceed ([y]/n)? y


(base) C:\Users\IEUser>
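
The listing printed by conda env list earlier in this section can also be read by a script, for example to find out which environment is active. The sketch below assumes the text layout shown in this article and paths without spaces (parse_env_list is a hypothetical helper):

```python
def parse_env_list(text):
    """Return (name, path, is_active) tuples from `conda env list` output."""
    environments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        is_active = "*" in fields  # the asterisk marks the active environment
        fields = [field for field in fields if field != "*"]
        if len(fields) < 2:
            continue  # entries without a name are ignored in this sketch
        environments.append((fields[0], fields[1], is_active))
    return environments


SAMPLE = r"""
# conda environments:
#
base                     C:\Users\IEUser\Miniconda3
otherenv                 C:\Users\IEUser\Miniconda3\envs\otherenv
py2                   *  C:\Users\IEUser\Miniconda3\envs\py2
"""

environments = parse_env_list(SAMPLE)
active = [name for name, _, is_active in environments if is_active]
print(active)  # ['py2']
```

Running conda env list --json returns the environment paths as JSON, which avoids parsing column-aligned text altogether.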

Now that you’ve covered the basics of managing environments with Conda, let’s see how to manage packages within the environments.

Understanding Basic Package Management With Conda

Within each environment, software packages can be installed using the Conda package manager. The root base environment created by the Miniconda installer includes some packages by default that are not part of the Python standard library.

The default installation includes the minimum packages necessary to use Conda. To check the list of installed packages in an environment, you just have to make sure it is activated and run conda list. In the root environment, the following packages are installed by default:

(base) C:\Users\IEUser>conda list
# packages in environment at C:\Users\IEUser\Miniconda3:
#
# Name                    Version                   Build  Channel
asn1crypto                0.24.0                   py37_0
ca-certificates           2018.03.07                    0
certifi                   2018.8.24                py37_1
cffi                      1.11.5           py37h74b6da3_1
chardet                   3.0.4                    py37_1
conda                     4.5.11                   py37_0
conda-env                 2.6.0                         1
console_shortcut          0.1.1                         3
cryptography              2.3.1            py37h74b6da3_0
idna                      2.7                      py37_0
menuinst                  1.4.14           py37hfa6e2cd_0
openssl                   1.0.2p               hfa6e2cd_0
pip                       10.0.1                   py37_0
pycosat                   0.6.3            py37hfa6e2cd_0
pycparser                 2.18                     py37_1
pyopenssl                 18.0.0                   py37_0
pysocks                   1.6.8                    py37_0
python                    3.7.0                hea74fb7_0
pywin32                   223              py37hfa6e2cd_1
requests                  2.19.1                   py37_0
ruamel_yaml               0.15.46          py37hfa6e2cd_0
setuptools                40.2.0                   py37_0
six                       1.11.0                   py37_1
urllib3                   1.23                     py37_0
vc                        14                   h0510ff6_3
vs2015_runtime            14.0.25123                    3
wheel                     0.31.1                   py37_0
win_inet_pton             1.0.1                    py37_1
wincertstore              0.2                      py37_0
yaml                      0.1.7                hc54c509_2

(base) C:\Users\IEUser>

To manage the packages, you should also use Conda. Next, let’s see how to search, install, update, and remove packages using Conda.

Searching and Installing Packages

Conda installs packages from repositories called channels, and some default channels are configured by the installer. To search for a specific package, you can run conda search <package name>. For example, this is how you search for the keras package (a machine learning library):

(base) C:\Users\IEUser>conda search keras
Loading channels: done
# Name                  Version           Build  Channel
keras                     2.0.8  py35h15001cb_0  pkgs/main
keras                     2.0.8  py36h65e7a35_0  pkgs/main
keras                     2.1.2          py35_0  pkgs/main
keras                     2.1.2          py36_0  pkgs/main
keras                     2.1.3          py35_0  pkgs/main
keras                     2.1.3          py36_0  pkgs/main

... (more)

According to the previous output, there are different versions of the package and different builds for each version, such as for Python 3.5 and 3.6.

The previous search shows only exact matches for packages named keras. To perform a broader search, including all packages containing keras in their names, you should use the wildcard *. For example, when you run conda search *keras*, you get the following:

(base) C:\Users\IEUser>conda search *keras*
Loading channels: done
# Name                  Version           Build  Channel
keras                     2.0.8  py35h15001cb_0  pkgs/main
keras                     2.0.8  py36h65e7a35_0  pkgs/main
keras                     2.1.2          py35_0  pkgs/main
keras                     2.1.2          py36_0  pkgs/main
keras                     2.1.3          py35_0  pkgs/main
keras                     2.1.3          py36_0  pkgs/main

... (more)

keras-applications           1.0.2          py35_0  pkgs/main
keras-applications           1.0.2          py36_0  pkgs/main
keras-applications           1.0.4          py35_0  pkgs/main

... (more)

keras-base                2.2.0          py35_0  pkgs/main
keras-base                2.2.0          py36_0  pkgs/main

... (more)

As the previous output shows, there are some other keras related packages in the default channels.
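
The difference between an exact name and a wildcard pattern can be imitated with the fnmatch module from the Python standard library. This is only an analogy for the glob idea, not Conda's own matching code, and the helper name matching is made up for the example:

```python
from fnmatch import fnmatchcase

# Package names taken from the search output above.
names = ["keras", "keras-applications", "keras-base"]


def matching(pattern, candidates):
    """Return the candidates that match a glob-style pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


print(matching("keras", names))    # ['keras']
print(matching("*keras*", names))  # ['keras', 'keras-applications', 'keras-base']
print(matching("keras-*", names))  # ['keras-applications', 'keras-base']
```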

To install a package, you should run conda install <package name>. By default, the newest version of the package will be installed in the active environment. So, let’s install the package keras in the environment otherenv that you’ve already created:

(base) C:\Users\IEUser>conda activate otherenv

(otherenv) C:\Users\IEUser>conda install keras
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\otherenv

  added / updated specs:
    - keras


The following NEW packages will be INSTALLED:

    _tflow_1100_select:  0.0.3-mkl
    absl-py:             0.4.1-py36_0
    astor:               0.7.1-py36_0
    blas:                1.0-mkl
    certifi:             2018.8.24-py36_1
    gast:                0.2.0-py36_0
    grpcio:              1.12.1-py36h1a1b453_0
    h5py:                2.8.0-py36h3bdd7fb_2
    hdf5:                1.10.2-hac2f561_1
    icc_rt:              2017.0.4-h97af966_0
    intel-openmp:        2018.0.3-0
    keras:               2.2.2-0
    keras-applications:  1.0.4-py36_1
    keras-base:          2.2.2-py36_0
    keras-preprocessing: 1.0.2-py36_1
    libmklml:            2018.0.3-1
    libprotobuf:         3.6.0-h1a1b453_0
    markdown:            2.6.11-py36_0
    mkl:                 2019.0-117
    mkl_fft:             1.0.4-py36h1e22a9b_1
    mkl_random:          1.0.1-py36h77b88f5_1
    numpy:               1.15.1-py36ha559c80_0
    numpy-base:          1.15.1-py36h8128ebf_0
    pip:                 10.0.1-py36_0
    protobuf:            3.6.0-py36he025d50_0
    python:              3.6.6-hea74fb7_0
    pyyaml:              3.13-py36hfa6e2cd_0
    scipy:               1.1.0-py36h4f6bf74_1
    setuptools:          40.2.0-py36_0
    six:                 1.11.0-py36_1
    tensorboard:         1.10.0-py36he025d50_0
    tensorflow:          1.10.0-mkl_py36hb361250_0
    tensorflow-base:     1.10.0-mkl_py36h81393da_0
    termcolor:           1.1.0-py36_1
    vc:                  14-h0510ff6_3
    vs2013_runtime:      12.0.21005-1
    vs2015_runtime:      14.0.25123-3
    werkzeug:            0.14.1-py36_0
    wheel:               0.31.1-py36_0
    wincertstore:        0.2-py36h7fe50ca_0
    yaml:                0.1.7-hc54c509_2
    zlib:                1.2.11-h8395fce_2

Proceed ([y]/n)?

Conda manages the necessary dependencies for a package when it is installed. Since the package keras has a lot of dependencies, Conda installs this long list of packages along with it.

It’s worth noting that, since the keras package’s newest build uses Python 3.6 and the otherenv environment was created using Python 3.7, the package python version 3.6.6 was included as a dependency. After confirming the installation, you can check that the Python version for the otherenv environment is downgraded to the 3.6.6 version.

Sometimes, you don’t want packages to be downgraded, and it would be better to just create a new environment with the necessary version of Python. To check the list of new packages, updates, and downgrades necessary for a package without installing it, you should use the parameter --dry-run. For example, to check the packages that will be changed by the installation of the package keras, you should run the following:

(otherenv) C:\Users\IEUser>conda install keras --dry-run

However, if necessary, it is possible to change the default Python of a Conda environment by installing a specific version of the package python. To demonstrate that, let’s create a new environment called envpython:

(otherenv) C:\Users\IEUser>conda create --name envpython
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\envpython


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate envpython
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As you saw before, since the root base environment uses Python 3.7, envpython is created including this same version of Python:

(base) C:\Users\IEUser>conda activate envpython

(envpython) C:\Users\IEUser>python
Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

(envpython) C:\Users\IEUser>

To install a specific version of a package, you can run conda install <package name>=<version>. For example, this is how you install Python 3.6 in the envpython environment:

(envpython) C:\Users\IEUser>conda install python=3.6
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\envpython

  added / updated specs:
    - python=3.6

The following NEW packages will be INSTALLED:

    certifi:        2018.8.24-py36_1
    pip:            10.0.1-py36_0
    python:         3.6.6-hea74fb7_0
    setuptools:     40.2.0-py36_0
    vc:             14-h0510ff6_3
    vs2015_runtime: 14.0.25123-3
    wheel:          0.31.1-py36_0
    wincertstore:   0.2-py36h7fe50ca_0


Proceed ([y]/n)?
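
Package plans like the ones in this section, including those printed with --dry-run, can be scanned by a script to see whether a sensitive package such as python would be touched. The following is a sketch that assumes the plan layout printed by the Conda version used in this article, which may differ in other versions (parse_new_packages is a hypothetical helper):

```python
def parse_new_packages(plan_text):
    """Map package name to 'version-build' for the NEW packages section of a plan."""
    packages = {}
    in_section = False
    for line in plan_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("The following"):
            # A section header; only the INSTALLED section is collected here.
            in_section = stripped.endswith("will be INSTALLED:")
            continue
        if stripped.startswith("Proceed"):
            break
        if in_section and ": " in stripped:
            name, _, version = stripped.partition(":")
            packages[name.strip()] = version.strip()
    return packages


SAMPLE = """
The following NEW packages will be INSTALLED:

    certifi:        2018.8.24-py36_1
    pip:            10.0.1-py36_0
    python:         3.6.6-hea74fb7_0

Proceed ([y]/n)?
"""

plan = parse_new_packages(SAMPLE)
print(plan.get("python"))  # 3.6.6-hea74fb7_0
```

A script could refuse to continue when python appears in the plan, leaving you to decide whether a separate environment would be a better choice.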

If you need to install more than one package in an environment, you can run conda install only once, passing the names of all the packages. To illustrate that, let’s install numpy, scipy, and matplotlib, basic packages for numerical computation, in the root base environment:

(envpython) C:\Users\IEUser>deactivate

(base) C:\Users\IEUser>conda install numpy scipy matplotlib
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3

  added / updated specs:
    - matplotlib
    - numpy
    - scipy

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
libpng-1.6.34              |       h79bbb47_0         1.3 MB
mkl_random-1.0.1           |   py37h77b88f5_1         267 KB
intel-openmp-2019.0        |              117         1.7 MB
qt-5.9.6                   |   vc14h62aca36_0        92.5 MB
matplotlib-2.2.3           |   py37hd159220_0         6.5 MB
tornado-5.1                |   py37hfa6e2cd_0         668 KB
pyqt-5.9.2                 |   py37ha878b3d_0         4.6 MB
pytz-2018.5                |           py37_0         232 KB
scipy-1.1.0                |   py37h4f6bf74_1        13.5 MB
jpeg-9b                    |       hb83a4c4_2         313 KB
python-dateutil-2.7.3      |           py37_0         260 KB
numpy-base-1.15.1          |   py37h8128ebf_0         3.9 MB
numpy-1.15.1               |   py37ha559c80_0          37 KB
mkl_fft-1.0.4              |   py37h1e22a9b_1         120 KB
kiwisolver-1.0.1           |   py37h6538335_0          61 KB
pyparsing-2.2.0            |           py37_1          96 KB
cycler-0.10.0              |           py37_0          13 KB
freetype-2.9.1             |       ha9979f8_1         470 KB
icu-58.2                   |       ha66f8fd_1        21.9 MB
sqlite-3.24.0              |       h7602738_0         899 KB
sip-4.19.12                |   py37h6538335_0         283 KB
------------------------------------------------------------
                                       Total:       149.5 MB

The following NEW packages will be INSTALLED:

blas:            1.0-mkl
cycler:          0.10.0-py37_0
freetype:        2.9.1-ha9979f8_1
icc_rt:          2017.0.4-h97af966_0
icu:             58.2-ha66f8fd_1
intel-openmp:    2019.0-117
jpeg:            9b-hb83a4c4_2
kiwisolver:      1.0.1-py37h6538335_0
libpng:          1.6.34-h79bbb47_0
matplotlib:      2.2.3-py37hd159220_0
mkl:             2019.0-117
mkl_fft:         1.0.4-py37h1e22a9b_1
mkl_random:      1.0.1-py37h77b88f5_1
numpy:           1.15.1-py37ha559c80_0
numpy-base:      1.15.1-py37h8128ebf_0
pyparsing:       2.2.0-py37_1
pyqt:            5.9.2-py37ha878b3d_0
python-dateutil: 2.7.3-py37_0
pytz:            2018.5-py37_0
qt:              5.9.6-vc14h62aca36_0
scipy:           1.1.0-py37h4f6bf74_1
sip:             4.19.12-py37h6538335_0
sqlite:          3.24.0-h7602738_0
tornado:         5.1-py37hfa6e2cd_0
zlib:            1.2.11-h8395fce_2

Proceed ([y]/n)?

Now that you’ve covered how to search and install packages, let’s see how to update and remove them using Conda.

Updating and Removing Packages

Sometimes, when new packages are released, you need to update them. To do so, you may run conda update <package name>. In case you wish to update all the packages within one environment, you should activate the environment and run conda update --all.

To remove a package, you can run conda remove <package name>. For example, this is how you remove numpy from the root base environment:

(base) C:\Users\IEUser>conda remove numpy
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3

  removed specs:
    - numpy

The following packages will be REMOVED:

    matplotlib: 2.2.3-py37hd159220_0
    mkl_fft:    1.0.4-py37h1e22a9b_1
    mkl_random: 1.0.1-py37h77b88f5_1
    numpy:      1.15.1-py37ha559c80_0
    scipy:      1.1.0-py37h4f6bf74_1

Proceed ([y]/n)?

It’s worth noting that when you remove a package, all packages that depend on it are also removed.
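
The idea behind this cascade can be sketched as a walk over the dependency relationships: a package has to go if anything it depends on is going. The dependency table below is a simplified, hand-written assumption for illustration, and the code is not how Conda itself is implemented:

```python
def packages_removed_with(target, depends_on):
    """Return target plus every package that directly or indirectly depends on it."""
    to_remove = {target}
    changed = True
    while changed:
        changed = False
        for package, dependencies in depends_on.items():
            if package not in to_remove and to_remove & set(dependencies):
                to_remove.add(package)
                changed = True
    return to_remove


# Simplified dependency table (an assumption for this example; the real
# package metadata lists many more packages and version constraints).
depends_on = {
    "numpy": [],
    "scipy": ["numpy"],
    "mkl_fft": ["numpy"],
    "mkl_random": ["numpy"],
    "matplotlib": ["numpy", "cycler"],
    "cycler": [],
}

print(sorted(packages_removed_with("numpy", depends_on)))
# ['matplotlib', 'mkl_fft', 'mkl_random', 'numpy', 'scipy']
```

With this table, the result matches the list of packages that conda remove numpy proposes to remove above, while cycler stays because it does not depend on numpy.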

Using Channels

Sometimes, you won’t find the packages you want to install on the default channels configured by the installer. For example, this is what happens when you search for pytorch, another machine learning package:

(base) C:\Users\IEUser>conda search pytorch
Loading channels: done

PackagesNotFoundError: The following packages are not available from current channels:

pytorch


Current channels:

https://repo.anaconda.com/pkgs/main/win-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/free/win-64
https://repo.anaconda.com/pkgs/free/noarch
https://repo.anaconda.com/pkgs/r/win-64
https://repo.anaconda.com/pkgs/r/noarch
https://repo.anaconda.com/pkgs/pro/win-64
https://repo.anaconda.com/pkgs/pro/noarch
https://repo.anaconda.com/pkgs/msys2/win-64
https://repo.anaconda.com/pkgs/msys2/noarch


To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

In this case, you may search for the package on anaconda.org. If you search for pytorch, you’ll get the following results:

PyTorch Anaconda Search

The channel pytorch has a package named pytorch with version 0.4.1. To install a package from a specific channel you can use the -c <channel> parameter with conda install:

(base) C:\Users\IEUser>conda install -c pytorch pytorch
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3

  added / updated specs:
    - pytorch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pytorch-0.4.1              |py37_cuda90_cudnn7he774522_1       590.4 MB  pytorch

The following NEW packages will be INSTALLED:

    pytorch: 0.4.1-py37_cuda90_cudnn7he774522_1 pytorch

Proceed ([y]/n)?

Alternatively, you can add the channel, so that Conda uses it to search for packages to install. To list the current channels used, you can run conda config --get channels:

(base) C:\Users\IEUser>conda config --get channels
--add channels 'defaults'   # lowest priority

(base) C:\Users\IEUser>

The Miniconda installer configures only the defaults channel. When you add more channels, you need to set their priority, which determines the channel a package is installed from when it’s available from more than one.

To add a channel with the lowest priority to the list, run conda config --append channels <channel name>. To add a channel with the highest priority, run conda config --prepend channels <channel name>. It’s recommended to add new channels with low priority, so that the default channels keep taking precedence over the others. So, alternatively, you can install pytorch by adding the pytorch channel and running conda install pytorch:

(base) C:\Users\IEUser>conda config --append channels pytorch

(base) C:\Users\IEUser>conda config --get channels
--add channels 'pytorch'   # lowest priority
--add channels 'defaults'   # highest priority

(base) C:\Users\IEUser>conda install pytorch
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3

  added / updated specs:
    - pytorch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pytorch-0.4.1              |py37_cuda90_cudnn7he774522_1       590.4 MB  pytorch

The following NEW packages will be INSTALLED:

    pytorch: 0.4.1-py37_cuda90_cudnn7he774522_1 pytorch

Proceed ([y]/n)?
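The effect of channel order can be sketched with a simplified model in plain Python. This is only an illustration under stated assumptions: the INDEX contents are made up, the helper names are hypothetical, and real Conda also weighs package versions and its channel_priority setting when solving:

```python
def append_channel(channels, name):
    """Model `conda config --append channels <name>`: add with lowest priority."""
    return [c for c in channels if c != name] + [name]


def prepend_channel(channels, name):
    """Model `conda config --prepend channels <name>`: add with highest priority."""
    return [name] + [c for c in channels if c != name]


def pick_channel(package, channels, index):
    """Return the first (highest-priority) channel that offers the package."""
    for channel in channels:
        if package in index.get(channel, set()):
            return channel
    return None


# Hypothetical channel contents, for illustration only.
INDEX = {
    "defaults": {"numpy", "scipy"},
    "pytorch": {"pytorch", "numpy"},
}

channels = append_channel(["defaults"], "pytorch")
print(channels)                                  # highest priority first
print(pick_channel("pytorch", channels, INDEX))  # only the pytorch channel has it
print(pick_channel("numpy", channels, INDEX))    # defaults wins the tie
```

Because pytorch was appended, a package offered by both channels still comes from defaults, which is the reason for adding new channels with low priority.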

Not all packages are available on Conda channels. However, this is not a problem, since you can also use pip to install packages inside Conda environments. Let’s see how to do this.

Using pip Inside Conda Environments

Sometimes, you may need pure Python packages and, generally, these packages are not available on Conda’s channels. For example, if you search for unipath, a package to deal with file paths in Python, Conda won’t be able to find it.

You could search for the package on Anaconda.org and use another channel to install it. However, since unipath is a pure Python package, you can use pip to install it, as you would on a regular Python setup. The only difference is that you should use the pip provided by the Conda package pip. To illustrate that, let’s create a new environment called newproject. As mentioned before, you can do this by running conda create:

conda create --name newproject

Next, to have pip installed, you should activate the environment and install the Conda package pip:

(base) C:\Users\IEUser>conda activate newproject

(newproject) C:\Users\IEUser>conda install pip
Solving environment: done

## Package Plan ##

  environment location: C:\Users\IEUser\Miniconda3\envs\newproject

  added / updated specs:
    - pip


The following NEW packages will be INSTALLED:

    certifi:        2018.8.24-py37_1
    pip:            10.0.1-py37_0
    python:         3.7.0-hea74fb7_0
    setuptools:     40.2.0-py37_0
    vc:             14-h0510ff6_3
    vs2015_runtime: 14.0.25123-3
    wheel:          0.31.1-py37_0
    wincertstore:   0.2-py37_0

Proceed ([y]/n)?

Finally, use pip to install the package unipath:

(newproject) C:\Users\IEUser>pip install unipath
Collecting unipath
Installing collected packages: unipath
Successfully installed unipath-1.1
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

(newproject) C:\Users\IEUser>

After installation, you can list the installed packages with conda list and check that Unipath was installed using pip:

(newproject) C:\Users\IEUser>conda list
# packages in environment at C:\Users\IEUser\Miniconda3\envs\newproject:
#
# Name                    Version                   Build  Channel
certifi                   2018.8.24                py37_1
pip                       10.0.1                   py37_0
python                    3.7.0                hea74fb7_0
setuptools                40.2.0                   py37_0
Unipath                   1.1                       <pip>
vc                        14                   h0510ff6_3
vs2015_runtime            14.0.25123                    3
wheel                     0.31.1                   py37_0
wincertstore              0.2                      py37_0

(newproject) C:\Users\IEUser>
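You can also ask Python itself which tool recorded an installation. The sketch below assumes Python 3.8 or later (where importlib.metadata is available) and relies on the INSTALLER file that pip writes into a package’s metadata. What Conda-built packages record there varies, so treat the result as a hint. The installer_of name is a hypothetical helper:

```python
from importlib import metadata


def installer_of(distribution_name):
    """Return the recorded installer of a distribution, or None if it isn't installed."""
    try:
        dist = metadata.distribution(distribution_name)
    except metadata.PackageNotFoundError:
        return None
    # pip writes its name into the INSTALLER metadata file; the file may be absent.
    installer = dist.read_text("INSTALLER")
    return installer.strip() if installer else "unknown"


# Run this inside the activated environment.
print(installer_of("Unipath"))
```

If the package isn’t present in the active environment, the function returns None instead of raising an error.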

It’s also possible to install packages from a version control system (VCS) using pip. For example, let’s install supervisor, version 4.0.0dev0, available in a Git repository. As Git is not installed in the newproject environment, you should install it first:

(newproject) C:\Users\IEUser>conda install git

Then, install supervisor, using pip to install it from the Git repository:

(newproject) C:\Users\IEUser>pip install -e git+https://github.com/Supervisor/supervisor@abef0a2be35f4aae4a4edeceadb7a213b729ef8d#egg=supervisor

After the installation finishes, you can see that supervisor is listed in the installed packages list:

(newproject) C:\Users\IEUser>conda list
#
# Name                    Version                   Build  Channel
certifi                   2018.8.24                py37_1
git                       2.18.0               h6bb4b03_0
meld3                     1.0.2                     <pip>
pip                       10.0.1                   py37_0
python                    3.7.0                hea74fb7_0
setuptools                40.2.0                   py37_0
supervisor                4.0.0.dev0                <pip>

... (more)
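The argument passed to pip packs several pieces of information into one string: the protocol, the repository, the revision to check out, and the package name. As a rough sketch, you can take it apart with the standard library. The parse_vcs_requirement helper is hypothetical and only mirrors the layout used in this example, not pip’s own parser:

```python
from urllib.parse import parse_qs, urlsplit


def parse_vcs_requirement(requirement):
    """Split a VCS requirement of the form <scheme>://<host>/<path>@<rev>#egg=<name>."""
    parts = urlsplit(requirement)
    # Everything after "@" in the path is the revision (commit, tag, or branch).
    repo_path, _, revision = parts.path.partition("@")
    egg = parse_qs(parts.fragment).get("egg", [None])[0]
    return {
        "scheme": parts.scheme,
        "repository": parts.netloc + repo_path,
        "revision": revision or None,
        "egg": egg,
    }


info = parse_vcs_requirement(
    "git+https://github.com/Supervisor/supervisor"
    "@abef0a2be35f4aae4a4edeceadb7a213b729ef8d#egg=supervisor"
)
for key, value in info.items():
    print(key, "=", value)
```

Pinning the revision to a full commit hash, as done here, means the same source is installed every time the command runs.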

Now that you know the basics of using environments and managing packages with Conda, let’s create a simple machine learning example to solve a classic problem using a neural network.

A Simple Machine Learning Example

In this section, you’ll set up the environment using Conda and train a neural network to function like an XOR gate.

An XOR gate implements the digital logic exclusive OR operation, which is widely used in digital systems. It takes two digital inputs, each equal to either 0 (a digital false value) or 1 (a digital true value), and outputs 1 (true) if the inputs are different or 0 (false) if they are equal. The following table (referred to as a truth table in digital systems terminology) summarizes the XOR gate operation:

Input A    Input B    Output: A XOR B
0          0          0
0          1          1
1          0          1
1          1          0

The XOR operation can be interpreted as a classification problem, given that it takes two inputs and should classify them into one of two classes represented by 0 or 1, depending on whether the inputs are equal to each other or different from one another.
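In Python, the XOR of two bits is available as the ^ operator, so the truth table can be generated rather than typed by hand. The short sketch below also arranges it as a list of inputs and a list of class labels, which is the shape a classifier expects. The names X and y are just a common convention:

```python
from itertools import product

# Print the truth table: every combination of two bits and their XOR.
for a, b in product((0, 1), repeat=2):
    print(a, b, a ^ b)

# The same data as a classification problem: inputs and class labels.
X = [list(pair) for pair in product((0, 1), repeat=2)]
y = [[a ^ b] for a, b in X]
print(X)
print(y)
```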

It is commonly used as a first example to train a neural network because it is simple and, at the same time, demands a nonlinear classifier, such as a neural network. The neural network will use only the data from the truth table, without knowledge about where it came from, to “learn” the operation performed by the XOR gate.
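To get a feeling for why a linear classifier isn’t enough, you can try every small integer choice of weights for a single threshold unit and check whether any of them reproduces a truth table. The sketch below is an illustration, not a proof, and the grid range is an arbitrary assumption. AND can be reproduced this way, XOR cannot:

```python
from itertools import product

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def linearly_separable(table, grid=range(-3, 4)):
    """Search a small grid for w1, w2, bias such that (w1*a + w2*b + bias > 0) matches the table."""
    for w1, w2, bias in product(grid, repeat=3):
        if all((w1 * a + w2 * b + bias > 0) == bool(out)
               for (a, b), out in table.items()):
            return True
    return False


print("AND:", linearly_separable(AND))
print("XOR:", linearly_separable(XOR))
```

The search fails for XOR on any grid: the two rows with output 1 require w1 + bias > 0 and w2 + bias > 0, while the rows with output 0 require bias <= 0 and w1 + w2 + bias <= 0, and these conditions contradict each other.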

To implement the neural network, let’s create a new Conda environment, named nnxor:

(base) C:\Users\IEUser>conda create --name nnxor

Then, let’s activate it and install the package keras:

(base) C:\Users\IEUser>conda activate nnxor

(nnxor) C:\Users\IEUser>conda install keras

keras is a high-level API that makes it easy to implement neural networks on top of well-known machine learning libraries, such as TensorFlow.

You’ll train the following neural network to act as an XOR gate:

Neural Network to Implement XOR

The network takes two inputs, A and B, and feeds them to two neurons, represented by the big circles. Then, it takes the outputs of these two neurons and feeds them to an output neuron, which should provide the classification according to the XOR truth table.
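Before training anything, it can help to see that a network of this shape is able to compute XOR at all. The sketch below evaluates a two-neuron hidden layer and one output neuron, all with sigmoid activations, in plain Python. The weight and bias values are hand-picked assumptions for illustration (they make the hidden neurons behave roughly like OR and NAND, and the output like AND). They are not the values training will find, and the bias terms are an addition not drawn in the figure:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(a, b, hidden, output):
    """Compute the network output for inputs a and b.

    hidden: two (weight_a, weight_b, bias) tuples, one per hidden neuron.
    output: one (weight_h0, weight_h1, bias) tuple for the output neuron.
    """
    h = [sigmoid(w_a * a + w_b * b + bias) for w_a, w_b, bias in hidden]
    w_h0, w_h1, bias = output
    return sigmoid(w_h0 * h[0] + w_h1 * h[1] + bias)


# Hand-picked values (assumptions): hidden neurons ~ OR and NAND, output ~ AND.
HIDDEN = [(20.0, 20.0, -10.0), (-20.0, -20.0, 30.0)]
OUTPUT = (20.0, 20.0, -30.0)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(forward(a, b, HIDDEN, OUTPUT), 4))
```

The printed values are close to 0 for equal inputs and close to 1 for different inputs, so rounding them reproduces the truth table.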

In brief, the training process consists of adjusting the values of the weights w_1 through w_6 so that the output is consistent with the XOR truth table. To do so, input examples are fed in one at a time, the output is calculated according to the current values of the weights, and, by comparing the output with the desired output given by the truth table, the values of the weights are adjusted in a step-by-step process.

To organize the project, you’ll create a folder named nnxor within the Windows user’s folder (C:\Users\IEUser) with a file named nnxor.py to store the Python program that implements the neural network:
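A single weight adjustment of the kind described above can be sketched in plain Python. This is a simplified illustration, not what keras runs internally: the starting weights are arbitrary assumptions, bias terms are included alongside the weights, and the loss is the squared error for one example:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the output for one input pair x."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
              for w, b in zip(params["W1"], params["b1"])]
    out = sigmoid(params["W2"][0] * hidden[0]
                  + params["W2"][1] * hidden[1] + params["b2"])
    return hidden, out


def loss(params, x, target):
    return (forward(params, x)[1] - target) ** 2


def sgd_step(params, x, target, learning_rate=0.1):
    """Return new parameters after one gradient descent step on one example."""
    hidden, out = forward(params, x)
    # Chain rule through the squared error and the output sigmoid.
    delta_out = 2.0 * (out - target) * out * (1.0 - out)
    # Propagate the error back through each hidden sigmoid.
    delta_hidden = [delta_out * params["W2"][j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(2)]
    return {
        "W1": [[params["W1"][j][i] - learning_rate * delta_hidden[j] * x[i]
                for i in range(2)] for j in range(2)],
        "b1": [params["b1"][j] - learning_rate * delta_hidden[j] for j in range(2)],
        "W2": [params["W2"][j] - learning_rate * delta_out * hidden[j]
               for j in range(2)],
        "b2": params["b2"] - learning_rate * delta_out,
    }


# Arbitrary starting values (assumptions for this sketch).
params = {"W1": [[0.5, -0.5], [0.3, 0.8]], "b1": [0.0, 0.0],
          "W2": [1.0, -1.0], "b2": 0.0}

before = loss(params, (0, 1), 1)
params_after = sgd_step(params, (0, 1), 1)
after = loss(params_after, (0, 1), 1)
print("loss before:", before)
print("loss after: ", after)
```

One step only nudges the weights, so the loss for that example drops slightly. Training repeats this for every example, many times over.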

nnxor.py file

In the nnxor.py file, you’ll define the network, perform the training, and test it:

import numpy as np
np.random.seed(444)

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0], [1], [1], [0]])

model = Sequential()
model.add(Dense(2, input_dim=2))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.1)
model.compile(loss='mean_squared_error', optimizer=sgd)

model.fit(X, y, batch_size=1, epochs=5000)

if __name__ == '__main__':
    print(model.predict(X))

First, you import numpy, initialize a random seed, so that you can reproduce the same results when running the program again, and import the keras objects you’ll use to build the neural network.

Then, you define an X array, containing the 4 possible A-B sets of inputs for the XOR operation and a y array, containing the outputs for each of the sets of inputs defined in X.

The next five lines define the neural network. The Sequential() model is one of the models provided by keras to define a neural network, in which the layers of the network are defined in a sequential way. Then you define the first layer of neurons, composed of two neurons fed by two inputs, and set its activation function to a sigmoid function on the following line. Finally, you define the output layer, composed of one neuron with the same activation function.

The following two lines define the details of the network’s training. To adjust the weights of the network, you’ll use stochastic gradient descent (SGD) with a learning rate of 0.1, and you’ll use the mean squared error as the loss function to be minimized.

Finally, you perform the training by running the fit() method, using X and y as training examples and updating the weights after every training example is fed into the network (batch_size=1). The number of epochs represents the number of times the whole training set will be used to train the neural network.

In this case, you’re repeating the training 5000 times using a training set containing 4 input-output examples. By default, each time the training set is used, the training examples are shuffled.
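To make the relationship between epochs, batch size, and weight updates concrete, here’s a small calculation. The weight_updates helper is hypothetical, and it assumes one weight update per batch, with the last batch of an epoch allowed to be smaller:

```python
import math


def weight_updates(n_examples, batch_size, epochs):
    """Number of weight updates: batches per epoch times number of epochs."""
    return epochs * math.ceil(n_examples / batch_size)


# The settings used in nnxor.py: 4 examples, batch_size=1, 5000 epochs.
print(weight_updates(n_examples=4, batch_size=1, epochs=5000))  # 20000
# Feeding all 4 examples as a single batch would give one update per epoch.
print(weight_updates(n_examples=4, batch_size=4, epochs=5000))  # 5000
```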

On the last line, after the training process has finished, you print the predicted values for the 4 possible input examples.

By running this script, you’ll see the evolution of the training process and the performance improvement as new training examples are fed into the network:

(nnxor) C:\Users\IEUser>cd nnxor

(nnxor) C:\Users\IEUser\nnxor>python nnxor.py
Using TensorFlow backend.
Epoch 1/5000
2018-09-16 09:49:05.987096: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-09-16 09:49:05.993128: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
4/4 [==============================] - 0s 39ms/step - loss: 0.2565
Epoch 2/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 3/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 4/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 5/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566
Epoch 6/5000
4/4 [==============================] - 0s 0us/step - loss: 0.2566

After the training finishes, you can check the predictions the network gives for the possible input values:

Epoch 4997/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
Epoch 4998/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
Epoch 4999/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
Epoch 5000/5000
4/4 [==============================] - 0s 0us/step - loss: 0.0034
[[0.0587215 ]
 [0.9468337 ]
 [0.9323144 ]
 [0.05158457]]

As you defined X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]]), the expected output values are 0, 1, 1, and 0, which is consistent with the predicted outputs of the network once you round them to obtain binary values.
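You can check this rounding, and the mean squared error it corresponds to, with a few lines of plain Python. The values below are copied from the output shown above. The loss that keras logs is averaged while the weights are still being updated within the epoch, so expect it to match only approximately:

```python
# Predictions printed by the script for [0, 0], [0, 1], [1, 0], [1, 1].
predictions = [0.0587215, 0.9468337, 0.9323144, 0.05158457]
expected = [0, 1, 1, 0]

# Round each prediction to the nearest binary value.
rounded = [round(p) for p in predictions]
print(rounded)

# Mean squared error between the raw predictions and the truth table.
mse = sum((p - t) ** 2 for p, t in zip(predictions, expected)) / len(expected)
print(round(mse, 4))
```

The rounded values match the truth table, and the computed error agrees with the loss of 0.0034 reported in the final epochs.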

Where To Go From Here

Data science and machine learning applications are emerging in the most diverse areas, attracting more people. However, setting up an environment for numerical computation can be a complicated task, and it’s common to find users having trouble in data science workshops, especially when using Windows.

In this article, you’ve covered the basics of setting up a Python numerical computation environment on a Windows machine using the Anaconda Python distribution.

Now that you have a working environment, it’s time to start working with some applications. Python is one of the most used languages for data science and machine learning, and Anaconda is one of the most popular distributions, used in various companies and research laboratories. It provides several packages to install libraries that Python relies on for data acquisition, wrangling, processing, and visualization.

Fortunately, there are a lot of tutorials about these libraries available at Real Python, including the following:

Also, if you’d like a deeper understanding of Anaconda and Conda, check out the following links:
