Introduction to Python
This tutorial assumes that the reader is a scientist who is new to Python and to programming, and who wants to process, analyse and plot data (and go beyond). It also assumes that the reader will use Python 3 (the current major version at the time of writing). Programming basics (e.g. interpreted vs. compiled languages) will not be discussed here, since this is intended as a 'quick start' guide.
Installation Requirements
Creating the first program
Here, we could skip straight to writing code. However, we will first go through a few 'housekeeping' steps that help build good habits for when you have several programming projects and more complex code.
Creating a project folder
Create a folder on your hard drive in which you will store all of your projects. It helps to have all of them in one place. You can call it what you want; let's say it is called python_programs. Then create a folder inside python_programs called MyFirstScript. Navigate into this folder using your command line.
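If you prefer to do this from Python rather than by hand, the same folder layout can be created with the standard-library pathlib module. This is a minimal sketch: the helper name make_project_folder is hypothetical, and the default folder names are simply the ones suggested above.

```python
from pathlib import Path


def make_project_folder(base_dir, root_name="python_programs",
                        project_name="MyFirstScript"):
    """Create base_dir/root_name/project_name if needed and return its path.

    make_project_folder is a hypothetical helper written for this tutorial.
    """
    project_dir = Path(base_dir) / root_name / project_name
    # parents=True also creates python_programs if it does not exist yet;
    # exist_ok=True means calling this twice does not raise an error.
    project_dir.mkdir(parents=True, exist_ok=True)
    return project_dir
```

For example, make_project_folder(Path.home()) would create python_programs/MyFirstScript inside your home directory (assuming that is where you want your projects to live).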
Initialise a git repository
An explanation of how to use git is outside the scope of this tutorial; there is an introduction to git here. If you already know how it works, initialise the repository now. It is good practice to create a git repository and to stage and commit changes to your files regularly, so that you can follow what you have done over time. This is very useful for more complex projects and vital when working in a team.
Create a virtual environment for the project
Virtual environments can be thought of as separate compartments in which Python libraries or packages (extra Python code that you can import into your own) are stored. Often, these packages depend on each other. If a library has more than one version available (e.g. 1.0, 1.1), the different versions may behave differently. This means that a package must be used with the right versions of the packages that it depends upon. Sometimes, two libraries may need the same package, but in different versions. Sometimes, you might want to update a package and forget that this could break another package that depends on the old version. Virtual environments help you to manage these potential conflicts.
For more about virtual environments, have a look here.
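Python ships with the venv module for creating virtual environments. (Anaconda and Miniconda come with conda, which has its own environment system; this sketch covers only venv.) The usual route is the command line, python -m venv .venv, but the same thing can be done from Python. This is a minimal sketch; the helper names create_project_env and in_virtual_environment are hypothetical.

```python
import sys
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir (hypothetical helper).

    Similar in effect to running `python -m venv <env_dir>` on the command line.
    """
    env_dir = Path(env_dir)
    # with_pip=True installs pip into the new environment so packages can be added later.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def in_virtual_environment():
    """Return True if this interpreter is running inside a venv-style environment.

    Inside a venv, sys.prefix points at the environment, while sys.base_prefix
    points at the Python installation it was created from.
    This check does not detect conda environments.
    """
    return sys.prefix != sys.base_prefix
```

After creating an environment, activate it before installing packages with pip: for example source .venv/bin/activate on macOS/Linux, or .venv\Scripts\activate on Windows.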
Creating a Python script
Now, finally, onto creating a program! If you have a text editor installed, open it. Create a new file called HelloWorld.py (because 'Hello World' is customary) and save it in the MyFirstScript directory. Inside this file, type:
print('Hello World')
Save the file and go back to the command line. Then type python HelloWorld.py and hit return. You should get a response like this:
python HelloWorld.py
Hello World
And there you go, you can program in Python. Take a break!
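Once that works, a common next step is to put code in a function and only run it when the file is executed as a script. The sketch below is one possible way to extend HelloWorld.py; the function name greet is hypothetical, not part of the tutorial.

```python
def greet(name):
    """Build a greeting such as 'Hello World'."""
    return f"Hello {name}"


# This block runs only when the file is executed directly (python HelloWorld.py),
# not when it is imported from another script.
if __name__ == "__main__":
    print(greet("World"))
```

Running python HelloWorld.py with this version should still print Hello World, but greet can now also be imported and reused elsewhere.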
Running a Jupyter Notebook
A very useful program for running Python (and R, Julia) code alongside plots and notes is the Jupyter Notebook (standalone or included in JupyterLab). This platform is great for learning, collaborating and sharing data analysis, but it is not without its drawbacks: one study in 2019 noted that "24% of 863,878 publicly available Jupyter notebooks on GitHub could be successfully re-executed, and only 4% produced the same results" (DOI). One reason for this is that notebooks allow the user to execute cells out of sequence, which makes it hard to track how the values of variables were set. However, notebooks are particularly useful for loading and working with large data sets, or for operations that take a while to run, because the results stay in memory and later cells can reuse them without repeating the slow step. Having code right next to written explanations is also a bonus. Jupyter Notebooks can be converted to other formats, including Markdown. An introduction to Jupyter notebooks is given here.
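The out-of-order problem can be illustrated without Jupyter itself. The sketch below is a toy model, not how Jupyter is implemented: each hypothetical "cell" is a string of Python code, and all cells are run with exec in one shared namespace, much as a notebook kernel keeps one set of variables for all cells. Running the same cells in a different order gives a different result.

```python
# Toy "notebook": each entry is the code in one cell.
CELLS = {
    1: "x = 10",
    2: "x = x * 2",
    3: "result = x + 1",
}


def run_cells(order):
    """Run the toy cells in the given order and return the final `result`."""
    namespace = {}  # plays the role of the kernel's memory, shared by all cells
    for cell_number in order:
        exec(CELLS[cell_number], namespace)
    return namespace["result"]


print(run_cells([1, 2, 3]))     # top to bottom: 21
print(run_cells([1, 2, 2, 3]))  # cell 2 accidentally run twice: 41
print(run_cells([1, 3]))        # cell 2 skipped: 11
```

Restarting the kernel and running all cells from top to bottom before sharing a notebook avoids this kind of hidden state.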
Other options for running scripts
In the above section, we ran a Python script by telling the Python interpreter to run it from the command line. This is one of many ways. We can also start an interactive Python interpreter (something which takes in Python code and runs it) by simply typing python in the command line. You can then type in Python code (like the 'HelloWorld' example above), which will be executed when you press return.
If you downloaded Anaconda (the 'full' version of Miniconda), the integrated development environment (IDE) Spyder was probably installed on your computer. You can try opening it from the command line (spyder) or through the Anaconda launcher. Other options are VSCode or VSCodium.
Tips
- Type everything out yourself. This may seem pointless if you can copy-paste, but it really helps with understanding examples early on when you're learning.
- If something doesn't work, try to fix it. Make a change and re-run.
- The print function is your friend (in Python, at least)!
- The good thing about programming, in contrast to experimental work, is that feedback can be practically instantaneous, so the learning cycle can be equally fast.
- If you really can't figure out how to do something, search engines help a lot (e.g. type python how to slice numpy array).
- The forum Stack Overflow often comes at the top of the list, and contains many good answers to many different questions (but beware... it's the internet).
- Often, you are not the first person ever to have the problem!
- ChatGPT is now a thing. It is a useful resource, but don't let it do all of the thinking for you!
- Making code easy to read is important, both for yourself and for others.
  - Well-written Python code can show clearly what is going on.
  - Stick to the coding standards for the packages you are using.
  - Commenting your code goes a long way towards making it understandable.
  - You will not write perfect code the first time. Write code that works first, then refactor it to make it more readable.
  - However, we are scientists, not software engineers: what we want is reproducible and transparent data processing. The code doesn't have to be super-efficient or perfect for the job.
- Keep learning! Everything is connected!
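The tips about print and readable code can be combined in a small sketch: a descriptive function name, a docstring, comments, and a print call that shows intermediate values while you check that the function behaves as expected. The function mean_of and the example measurements are hypothetical, not part of any package.

```python
def mean_of(values, verbose=False):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)
    count = len(values)
    if verbose:
        # Temporary check: print intermediate values to confirm they look sensible.
        print(f"total={total}, count={count}")
    return total / count


measurements = [1.0, 2.0, 3.0]  # hypothetical example data
print(mean_of(measurements, verbose=True))
```

Keeping the print behind a verbose flag means it can be switched off once you trust the result, instead of deleting and re-adding it.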
Using Packages
Using numpy
Using pandas
Plotting
SciPy, Sklearn, and more
The RDKit
Loading text data
TBC
Writing data to files
TBC