An Introduction to Image Processing with Python

Astronomical Laboratory ASTR:4850, Spring 2018
by Philip Kaaret



Introduction

In order to gain a deeper understanding of how astronomical images are processed, we will be writing our own software to do the processing. To minimize the effort in doing this, we will use the computer language Python. Python is an interactive and extensible programming language. It has extensions to load images from FITS files, process images using compact code without loops, and display images on the screen.  During this week's lab, you will become familiar with the basics of python and some of the particulars about handling images in python. The following is written assuming no prior knowledge of Python. If you are familiar with Python and have a preferred editor and environment, please use those.

Python is free software. So, if you want to work on learning python outside of class, you can obtain it via the links on the class web page. Note that we will use Python 2.7 and not Python 3 (this is because your professor is a fossil and doesn't like to change). We will use a number of python packages to process images (as numerical arrays), read FITS files, and display data. Anaconda includes python and the other needed packages. If you use straight python (e.g. under Linux), you will need to install these packages separately (which is very easy for distributions like Ubuntu). If you have a mac, you're on your own.

You'll be using python heavily during the semester, so it would be good to install it on a computer that you have access to outside of class (like a laptop if you own one). To install Anaconda, use this link https://www.anaconda.com/download/ .

The machines in the lab run 32-bit windows. Once you get python installed, try running ipython. After you get to the ipython command line, you should type
cd h:/
to change the current directory to where your files get stored.

The ipython command line uses unix commands.  Some useful commands are: 'ls' - to list the contents of the current directory, 'cd' - to change the current directory, and 'pwd' - to print the current directory.

First example program - reading and plotting an image

To get started with python, let's look at a simple program to read in an image in FITS format and plot it on the screen. The program is here: showimage.py. You should download the program and then load it into a text editor of your choice. Notepad messes up the formatting. Wordpad is ok. Gedit and Notepad++ are pretty nice, but you may need to install them. People who like wasting space on their screen use atom. Real geeks use emacs and its author is a saint. vi sucks out your soul; the extension for emacs that makes it behave like vi is evil.

Now that you have showimage.py loaded in an editor of your choice, let's have a look at it. The first line is a comment about what the program does. It is good practice to liberally comment your code.

The second set of lines (4-7) tells python that we will be using a bunch of extensions. Extensions are pre-written modules to do various tasks. numpy, or 'Numerical Python' is a package to define mathematical constructions like matrices (or arrays) and make it easy to handle them in python. matplotlib is a scientific plotting package. pyfits is a package to read and write FITS files.

The first thing that we actually do in the program is on line 11, where we open a FITS file. The program currently opens a file called 'platescale20171215.fit'. You should edit this line to read in one of the files that you made in the last lab. Note if you are running python in the same directory as your data, you need only the file name. Otherwise, you will need to add the file's directory.

On line 14, we copy the image data from the FITS file to a variable called img1. img1 is actually a numpy array and later on we'll discuss how to do math operations on numpy arrays (it's easy).

The next set of lines, 17-20, plot the image. Back in the import stuff, we imported the plotting package matplotlib.pyplot "as plt". This allows us to use the shorthand "plt" to refer to the plotting package. All four lines call routines in the matplotlib.pyplot package using plt as shorthand. The first line sets the plotting to interactive mode; this just means that the program continues on after making the plot rather than stopping for the user to look at the plot and then close it. The second line sets up the color map. Since the camera is black and white, we use a gray scale map to make realistic looking images. The third line uses the "imshow" routine to make a plot of the image data using the selected color map. The fourth line actually puts the plot up on the screen.

Once you have edited the program to read in your file (edit line 11, see above), go ahead and run the program. Start up ipython. ipython is an interactive, shell interface to python that uses unix-like commands. Use cd to move to the directory where showimage.py is and type 'run showimage' without the quotes and press Enter. ipython should pop up a window that displays the image. Note that the window is interactive: one can zoom, pan, or save the image in a variety of formats. If you have problems running the program, make sure that you are in the right directory and that you have edited the program to look for the FITS file in the correct place. Make sure the program works before proceeding. Write down the procedures that you followed to display the image including the directories where showimage.py and your image are saved.

If Anaconda freezes when you make plots, exit out of it and then %pylab when you start up ipython again. You will then want to do this every time you start ipython.

FITS files

After you have run the program, it should leave you at the ipython command prompt. Now type 'print h[0].header'. The ipython screen will be filled with a bunch of information encoded in the file. FITS is a "self-documenting" file format. In addition to the image data, the file contains 'meta-data' describing the image parameters and how and when the image was obtained.  These were written by CCDOps when you saved the file in the last lab. The meta-data consists of a keyword name, a value, and an optional comment.

For example, the NAXIS keyword gives the number of axes, it should be 2. The NAXIS1 and NAXIS2 keywords give the number of pixels along the two axes of the image. If you saved a full image (as instructed in the last lab), these should match the number of pixels in the CCD.  Check the CCD documentation to see if that is true. The CCD-TEMP keyword records the CCD temperature when the image was obtained. As long as you pay attention while taking the data, the meta-data in FITS files provides a nice way to check that you are working with the correct data files.

Second example program - difference image

Now download diffimage.py and load it into your editor. Modify lines 11 and 12 to read in the two images of the test pattern that you took in the first lab.  This programs reads in two images, extracts the image data, calculates a new array which is the difference in the two images, and then plots all three images. The difference image is calculated on line 32. Note how simple the code is. The other new element in the code is the use of plt.figure(#) to plot multiple windows on the screen. Note that only one plt.show() is needed to display all three images.

Take a look at your three images. If the two images that you took with the CCD were under exactly identical conditions, then the test pattern should disappear in the difference image. Did this happen?  Write down your conclusions and also any thoughts about how to improve the subtraction in your notebook. Also, can you think of some astronomical application for difference imaging?

Third example program - image histogram

Now we'll use python to calculate some statistics and plot a histogram of the pixel values in our images. Get histimage.py. The first parts should be familiar. Edit the line that reads the FITS file to use your image file. After setting up figure 1 on line 21, we clear the figure on line 22 using plt.clf(). This resets anything that was done to the plotting window.

The interesting stuff starts at line 26. The numpy/matplotlib routines to make histograms and calculate statistics need to have one-dimensional arrays as input. In line 26, we find the size of the 2-d array img. the img.shape routine returns two values in a 'tuple'. Python tuples are sets of values separated by commas and sometime enclosed in parenthesis. In line 26, nx gets set to the first value in the tuple and ny gets set to the second value. In line 27, we make a new (1-d) array imgh. It has the same values as the 2-d image array img, but arranged like ducks in a row.

Following this are a few lines to calculate statistics of the image values. min, max, and mean should be self explanatory. std gives the standard deviation of a set of data. The standard deviation is a measure of the fluctuations of the data around the average value. It is the square root of the average of the squared deviations from the mean, i.e.,  std = sqrt(mean((x-mean(x))**2)), where ** is Python's operator for raising a number to a power.

Then, we plot the histogram. A histogram is a graphical representation of the distribution of a set of values. In this case, the values are the intensity readings from the CCD for each pixel. If you arrange these in a 2-d array, you get an image. A histogram discards the spatial information and only shows the distribution of the values themselves, in particular the frequency with which each value (or range of values) occurs.  The matplotlib routine hist plots histograms. The first argument is a 1-d array of values, the other arguments set the plot parameters. The most crucial of these is 'bins' which sets the number of bins in the histogram. By default, the bins will evenly spaced between the lowest and highest input values.

In calculating the statistics of a data set, one often prefers to discard outliers, e.g. pixels that have high values because they are saturated or 'hot' or pixels that have low values because they are damaged. Examine the histogram of your pixel values. Are there outliers?  In lines 41-44, we set an allowed range for 'good' pixels values (between plow and phi) and then make a new array, imghcut, keeping only the values in that range. Adjust the values for plow and phi according to your histogram. Note that the current coordinates of the mouse cursor are displayed in the plot window when you move the mouse over the plot. Then, run histimage.py again and see the statistics of the pixel values of your image change when keeping only the 'good' pixels. If you don't have any hot pixels or dead pixels (with very low values), then you have a good CCD.

Now try Python for yourself

To encourage you to learn python, the following exercises are due at the beginning of the next lab. You should do these exercises by yourself, not with your lab partners. If you have not previously had much exposure to python, speak to an instructor if you need more time to learn enough python to finish the assignment.

1) Add code to the histogram plotting program to overplot the mean value on the histogram. After getting that to work, add more code to plotting the mean ± the standard deviation with dashed lines. This plot gives a nice graphical illustration of how the standard deviation is a measure of the fluctuations of the data around the average value. Then add code to calculate the median (the value such that half the data lies above and half lies below). Plot the median and see how it compares to the mean. Print out your code and a copy of your plot with mean, standard deviations, and median to hand in.

2) In the next lab, you will need to make plots of values.  Write code to make an array of the integers from 1 to 10, find their squares, and then plot the square versus the value. Print out your code and a copy of your plot to hand in.

3) Add some lines to diffimage.py so that all values less than zero in the difference image are set to zero. Print out your code to hand in and highlight the added lines. Note that you do not need any loops to do this (and points will be taken off if you use loops).

4) Add some lines to diffimage.py to calculate the sum of a 9x9 box pixels centered on pixel 100,100 in the difference image.  Print out your code to hand in and highlight the added lines.

Resources for learning Python

The resources below are a good place to start. If you want to do something specific in python, a good first step is to search on the internet, e.g. try searching for 'python median'.

Interactive, online tutorial: http://www.codecademy.com/tracks/python
PyFITS documentation: http://pythonhosted.org/pyfits/
NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial
Matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
Python official web site: http://www.python.org/
PyFits official site: http://www.stsci.edu/institute/software_hardware/pyfits
Question and answer site for programming: http://stackoverflow.com