
Introduction to Python

This tutorial assumes that the reader is a scientist who is new to Python and programming, and who wants to process, analyse and plot data, and perhaps go further. It also assumes that the reader will use Python 3 (the most current major version at the time of writing). Programming basics (e.g. interpreted vs. compiled languages) are not discussed here, since this is more of a 'quick start' guide.

Installation Requirements

Creating the first program

Here, we could skip straight to writing code. However, we will first do a few 'housekeeping' steps which will help in building good habits for when you have several different programming projects and more complex code.

Creating a project folder

Create a folder on your hard drive in which you will store all of your projects. It helps to have all of them in one place. You can call it what you want. Let's say it is called python_programs. Then create a folder inside python_programs called MyFirstScript. Navigate into this folder using your command line.

Initialise a git repository

An explanation of using git is outside the scope of this tutorial. There is an introduction to git here. If you already know how it works, initialise the repository now. It is good practice to create a git repository and regularly stage and commit changes to your files, so that you can follow what you have done over time. Being able to do this is very useful for more complex projects and vital when working in a team.

Create a virtual environment for the project

Virtual environments can be thought of as separate compartments in which Python libraries or packages (extra Python code that you can import into your own) are stored. Often, these packages depend on each other. If a given library has more than one version available (e.g. 1.0, 1.1), the versions may behave differently. This means that a package must be used with the right versions of the packages that it depends upon. Sometimes, two libraries need the same package, but in different versions. Sometimes, you might want to update a package, but forget that doing so could break another package that depends on the old version. Virtual environments help you to manage these potential conflicts.
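As a rough illustration, the sketch below uses only the standard library to report whether the running interpreter belongs to a virtual environment, and which version of a package is installed in it. The helper names in_virtual_environment and installed_version are made up for this example. The check assumes the environment was created with Python's built-in venv module; conda environments are organised differently and are not detected by it.

```python
import sys
from importlib import metadata


def in_virtual_environment():
    """Return True if this interpreter runs inside a venv-style environment."""
    # Inside an environment created with venv, sys.prefix points at the
    # environment, while sys.base_prefix points at the Python it was made from.
    # Assumption: conda environments are NOT detected by this comparison.
    return sys.prefix != sys.base_prefix


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print("Inside a virtual environment:", in_virtual_environment())
# numpy is only an example name; None is printed if it is not installed.
print("numpy version:", installed_version("numpy"))
```

Running this in two different environments should show how the same package name can resolve to different versions (or to nothing at all) depending on which environment is active.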

For more about virtual environments, have a look here.

Creating a Python script

Now, finally, onto creating a program! If you have a text editor installed, open it. Create a new file called HelloWorld.py (because 'Hello World' is customary) and save it in the MyFirstScript directory. Inside this file, type:

print('Hello World')

Save the file, and go back to the command line. Then type python HelloWorld.py and hit return. You should get a response like this:

python HelloWorld.py
Hello World

And there you go, you can program in Python. Take a break!

Running a Jupyter Notebook

A very useful program for running Python (and R, Julia) code alongside plots and notes is the Jupyter Notebook (standalone or included in Jupyter Lab). This platform is great for learning, collaborating and sharing data analysis, but it is not without its drawbacks: one study in 2019 noted that "24% of 863,878 publicly available Jupyter notebooks on GitHub could be successfully re-executed, and only 4% produced the same results" (DOI). One reason for this is that notebooks allow the user to execute code out of sequence, which makes it hard to track how the values of variables were set. However, notebooks are particularly useful for loading and working with large data sets, or for operations which may take a while to run. Having code right next to written explanations is also a bonus. Jupyter Notebooks can be converted to other formats, including Markdown. An introduction to Jupyter notebooks is given here.
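The out-of-sequence problem can be sketched in plain Python. In the toy example below, each string stands in for one notebook cell, and all cells share one namespace, much as the cells of a notebook share one kernel. The cell contents and the run_cells helper are invented for illustration; this is not how Jupyter itself is implemented.

```python
# Each string stands in for one notebook cell.
cells = {
    1: "x = 1",
    2: "x = x * 10",
    3: "result = x + 1",
}


def run_cells(order):
    """Execute the cells in the given order in one shared namespace."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace["result"]


# Running every cell once, top to bottom.
in_order = run_cells([1, 2, 3])

# Accidentally running cell 2 twice before cell 3.
rerun_cell_2 = run_cells([1, 2, 2, 3])

print(in_order)       # 11
print(rerun_cell_2)   # 101
```

The cells are identical in both runs, yet the result differs, and nothing in the saved notebook records which order was used. Restarting the kernel and running all cells from the top is one way to check that a notebook gives the same answer every time.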

Other options for running scripts

In the above section, we ran a Python script by telling the Python interpreter to run it from the command line. This way is one of many. We can also start a Python interpreter (something which takes in Python code and runs it) by simply typing python on the command line. You can then type in Python code (like the 'HelloWorld' example above), which will be executed when you press return.
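For instance, the lines below could be typed into the interpreter one at a time, or saved in a script and run as before. They are a small sketch, not part of the original example, that checks which Python is being used; the variable name major_version is arbitrary.

```python
import sys

# The tutorial assumes Python 3, so the major version should be 3.
major_version = sys.version_info.major
print("Python major version:", major_version)

# The full path of the interpreter that is running this code.
print("Interpreter:", sys.executable)
```

If the path printed is not the one you expected, the command line may be picking up a different Python installation from the one you intended to use.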

If you downloaded Anaconda (the 'full' version of Miniconda), the integrated development environment (IDE) Spyder was probably installed on your computer. You can try opening it from the command line (by typing spyder) or through the Anaconda launcher. Other options are VSCode or VSCodium.

Tips

  • Type everything out yourself. This may seem pointless if you can copy-paste, but it really helps with understanding examples early on when you're learning.
  • If something doesn't work, try to fix it. Make a change and re-run.
  • The print function is your friend (in Python, at least)!
  • The good thing about programming, in contrast to experimental work, is that feedback can be practically instantaneous, so the learning cycle can be equally fast.
  • If you really can't figure out how to do something, search engines help a lot (e.g. type python how to slice numpy array).
  • The forum Stack Overflow often comes at the top of the list, and contains many good answers on many different questions (but beware... it's the internet).
  • Often, you are not the first person ever to have the problem!
  • ChatGPT is now a thing. It is a useful resource, but don't let it do all of the thinking for you!
  • Making code easy to read is important both for yourself and others.
      • Well-written Python code can state clearly enough what is going on.
      • Stick to the coding standards for the packages you are using.
      • Commenting your code goes a long way towards making it easier to understand.
      • You will not write perfect code the first time. Write to get the code working, then refactor to make it more readable.
      • However, we are scientists, not software engineers - what we want is reproducible and transparent data processing. It doesn't have to be super-efficient or be the perfect code for the job.
  • Keep learning! Everything is connected!
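As a small illustration of the tips on print and on commenting, the sketch below prints intermediate values inside a function so that each step can be checked. The function and the measurement values are made up for this example.

```python
def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    total = sum(values)
    count = len(values)
    # Temporary print: check the intermediate values before trusting the result.
    print(f"total={total}, count={count}")
    return total / count


# Made-up example data.
measurements = [2.0, 4.0, 9.0]
average = mean(measurements)
print(f"average={average}")
```

Once the function behaves as expected, the temporary print line can be removed, while the docstring and comments stay to explain what the code does.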

Using Packages

Packages guide

Using numpy

Numpy guide

Using pandas

Pandas guide

Plotting

Matplotlib guide

SciPy, Sklearn, and more

Sklearn guide

The RDKit

The RDKit guide

Loading text data

TBC

Writing data to files

TBC