An Introduction to Image Processing with Python

Astronomical Laboratory 29:137, Fall 2013
by Philip Kaaret



Introduction

In order to gain a deeper understanding of how astronomical images are processed, we will be writing our own software to do the processing.  To minimize the effort in doing this, we will use the computer language Python.  Python is an interactive and extensible programming language.  It has extensions to load images from FITS files, process images using compact code without loops, and display images on the screen.  During this week's lab, you will become familiar with the basics of python and some of the particulars about handling images in python.

Python is free software.  So, if you want to work on learning python outside of class, you can obtain it via the links on the class web page.  Note that we will use Python 2.7 and not Python 3 (this is because matplotlib only recently added Python 3 support).  We will use a number of python packages to process images (as numerical arrays), read FITS files, and display data.  Anaconda includes python and the other needed packages.  If you use straight python (e.g. under Linux), you will need to install these packages separately (which is very easy for distributions like Ubuntu).  As a reminder, when we start using telescopes on the roof and elsewhere for astronomical observations and measurements, each lab team will need to have the software installed on a laptop computer.  It would be good to identify a (windows) laptop in the next few days, install the software, and try it out.

First example program - reading and plotting an image

To get started with python, let's look at a simple program to read in an image in FITS format and plot it on the screen.  The program is here: showimage.py.  You should download the program and then load it into a text editor of your choice.

The first line is a comment about what the program does.  It is good practice to liberally comment your code.

The second set of lines (3-7) tells python that we will be using a bunch of extensions.  Extensions are pre-written modules to do various tasks.  numpy, or 'Numerical Python' is a package to define mathematical constructions like matrices (or arrays) and make it easy to handle them in python.  matplotlib is a scientific plotting package.  pyfits is a package to read and write FITS files.

The first thing that we actually do in the program is on line 12, where we read in a FITS file.  The program currently reads in a file called 'testpattern1.fit'.  You should edit this line to read in one of the files that you made in the last lab.  Note if you are running python in the same directory as your data, you need only the file name.  Otherwise, you will need to add the file's directory.

On line 15, we copy the image data from the FITS file to a variable called img1.  img1 is actually a numpy array and later on we'll discuss how to do math operations on numpy arrays (it's easy).

The next set of lines, 18-21, plot the image.  Back in the import stuff, we imported the plotting package "as plt".  This allows us to use the shorthand "plt" to refer to the plotting package.  All four lines call routines in the matplotlib.pyplot package using plt as shorthand.  The first line sets the plotting to interactive mode; this just means that the program continues on after making the plot rather than waiting for the user to look at the plot before continuing.  The second line sets up the color map.  Since the camera is black and white, we use a gray scale map to make realistic looking images.  The third line uses the "imshow" routine to make a plot of the image data using the selected color map.  The four line actually puts the plot up on the screen.

Once you have edited the program to read in your file (edit line 12, see above), go ahead and run the program.  Start up ipython.  ipython is an interactive, shell interface to python that uses unix-like commands.  To setup Anaconda python for the plotting interface that we use, type '%pylab' without the quotes.  You will want to do this every time you start ipython or the program will freeze when you make plots.  If you are not running Anaconda, this command probably isn't necessary.  Some useful commands are: 'ls' - to list the contents of the current directory, 'cd' - to change the current directory, and 'pwd' - to print the current directory.  Use cd to move to the directory where showimage.py is and type 'run showimage' without the quotes and press Enter.  ipython should pop up a displaying the image.  Note that the window is interactive: one can zoom, pan, or save the image in a variety of formats.  If you have problems running the program, make sure that you are in the right directory and that you have edited the program to look for the FITS file in the correct place.  Make sure the program works before proceeding.  Record in your lab notebook, the procedures that you followed to display the image including the directories where showimage.py and your image are saved.  Remember to start on a new page and write today's date and a short description of today's activities (like 'Intro to Python') at the top of the page.

FITS files

After you have run the program, it should leave you at the ipython command prompt.  Now type 'print h[0].header'.   The ipython screen will be filled with a bunch of information encoded in the file.  FITS is a "self-documenting" file format.  In addition to the image data, the file contains 'meta-data' describing the image parameters and how and when the image was obtained.  These were written by CCDOps when you saved the file in the last lab.  The meta-data consists of a keyword name, a value, and an optional comment.

For example, the NAXIS keyword gives the number of axes, it should be 2.  The NAXIS1 and NAXIS2 keywords give the number of pixels along the two axes of the image.  Record these in your lab notebook.  If you saved a full image (as instructed in the last lab), these should match the number of pixels in the CCD.  Check the CCD documentation to see if that is true and record in your notebook.  The CCD-TEMP keyword records the CCD temperature when the image was obtained.  Record that in your notebook and note the units.  Check the TELESCOP keyword, it should contain what you wrote for 'Telescope Description'.  Write in your notebook if TELESCOP agrees with FOCALLEN and what you wrote in your notebook about the lens used for the image.  As long as you pay attention while taking the data, the meta-data in FITS files provides a nice way to check that you are working with the correct data files.

Second example program - difference image

Now download diffimage.py and load it into your editor.  Modify lines 12 and 13 to read in the two images of the test pattern that you took in the first lab.  This programs reads in two images, extracts the image data, calculates a new array which is the difference in the two images, and then plots all three images.  The difference image is calculated on line 33.  Note how simply the code is.  The other new element in the code is the use of plt.figure(#) to pop-up multiple windows on the screen.  Note that only one plt.show() is needed to display all three images.

Take a look at your three images.  If the two images that you took with the CCD were under exactly identical conditions, then the test pattern should disappear in the difference image.  Did this happen?  Write down your conclusions and also any thoughts about how to improve the subtraction in your notebook.  Also, can you think of some astronomical application for difference imaging?

Third example program - image histogram

Now we'll use python to calculate some statistics and plot a histogram of the pixel values in our images.  Get histimage.py.  The first parts should be familiar.  Edit the line that reads the FITS file to use your image file.  The new stuff starts at line 26.  The numpy/matplotlib routines to make histograms and calculate statistics need to have one-dimensional arrays as input.  In line 26, we find the size of the 2-d array img.  the img.shape routine returns two values in a 'tuple'.  Python tuples are sets of values separated by commas and sometime enclosed in parenthesis.  In line 26, nx gets set to the first value in the tuple and ny gets set to the second value.  In line 27, we make a new (1-d) array imgh.  It has the same values as the 2-d image array img, but arranged like ducks in a row.

Following this are a few lines to calculate statistics of the image values.  min, max, and mean should be self explanatory.  std gives the standard deviation of a set of data.  The standard deviation is a measure of the fluctuations of the data around the average value.  It is the square root of the average of the squared deviations from the mean, i.e.,  std = sqrt(mean((x-mean(x))**2)), where ** is Python's operator for raising a number to a power.  Record the statistics of the pixel values of your image.

Then, we plot the histogram.  A histogram is a graphical representation of the distribution of a set of values.  In this case, the values are the intensity readings from the CCD for each pixel.  If you arrange these in a 2-d array, you get an image.  A histogram discards the spatial information and only shows the distribution of the values themselves, in particular the frequency with which each value (or range of values) occurs.  The matplotlib routine hist plots histograms.  The first argument is a 1-d of values, the other arguments set the plot parameters.  The most crucial of these is 'bins' which sets the number of bins in the histogram.  By default, the bins will evenly spaced between the lowest and highest input values.

In calculating the statistics of a data set, one often prefers to discard outlier, e.g. pixels that have high values because they are saturated or 'hot' or pixels that have low values because they are damaged.  Examine the histogram of your pixel values.  Are there outliers?  In lines 41-44, we set an allowed range for 'good' pixels values (between plow and phi) and then make a new array, imghcut, keeping only the values in that range.  Adjust the values for plow and phi according to your histogram and record them.  Note that the current coordinates of the mouse cursor are displayed in the plot window when you move the mouse over the plot.  Then, run histimage.py again and record the statistics of the pixel values of your image when keeping only the 'good' pixels.

Now try Python for yourself

Spend the rest of the lab learning python.  We will be using python throughout the semester, so it is a good investment of your time.  A good way to do this is to set yourself a goal and then try to write the needed code.  For example, you could add some lines to the histogram plotting program that overplot the mean value on the histogram.  After getting that to work, you might try also plotting the mean ± the standard deviation (maybe with dashed lines).  That will give you a nice graphical illustration of how the standard deviation is a measure of the fluctuations of the data around the average value.  Later on, we will use the median (the value such that half the data lies above and half lies below) in addition to the mean.  Try calculating the median and plotting it and see how it compares to the mean.  In the next lab, you will need to make plots of values.  You might try making an array of the integers from 1 to 10, finding their squares, and then plotting the square versus the value.

Resources for learning Python

The resources below are a good place to start.  If you want to do something specific in python, a good first step is to search on the internet, e.g. try searching for 'python median'.

Interactive, online tutorial: http://www.codecademy.com/tracks/python
PyFITS documentation: http://pythonhosted.org/pyfits/
NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial
Matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
Python official web site: http://www.python.org/
PyFits official site: http://www.stsci.edu/institute/software_hardware/pyfits
Question and answer site for programming: http://stackoverflow.com