An Introduction to Image Processing with
Python
Astronomical Laboratory 29:137, Fall 2013
by Philip Kaaret
Introduction
In order to gain a deeper understanding of how astronomical images
are processed, we will be writing our own software to do the
processing. To minimize the effort in doing this, we will use
the computer language Python. Python is an
interactive and extensible programming language. It
has extensions to load images from FITS files, process images using
compact code without loops, and display images on the screen.
During this week's lab, you will become familiar with the basics of
python and some of the particulars about handling images in python.
Python is free software. So, if you want to work on learning
python outside of class, you can obtain it via the links on the
class web page. Note that we will use Python 2.7 and not
Python 3 (this is because matplotlib only recently added Python 3
support). We will use a number of python packages to process
images (as numerical arrays), read FITS files, and display
data. Anaconda includes python and the other needed
packages. If you use straight python (e.g. under Linux), you
will need to install these packages separately (which is very easy
for distributions like Ubuntu). As a reminder, when we start
using telescopes on the roof and elsewhere for astronomical
observations and measurements, each lab team will need to have the
software installed on a laptop computer. It would be good to
identify a (windows) laptop in the next few days, install the
software, and try it out.
First example program - reading and plotting an image
To get started with python, let's look at a simple program to read
in an image in FITS format and plot it on the screen. The
program is here: showimage.py. You
should download the program and then load it into a text editor of
your choice.
The first line is a comment about what the program does. It is
good practice to liberally comment your code.
The second set of lines (3-7) tells python that we will be using a
bunch of extensions. Extensions are pre-written modules to do
various tasks. numpy, or 'Numerical Python' is a package to
define mathematical constructions like matrices (or arrays) and make
it easy to handle them in python. matplotlib is a scientific
plotting package. pyfits is a package to read and write FITS
files.
The first thing that we actually do in the program is on line 12,
where we read in a FITS file. The program currently reads in a
file called 'testpattern1.fit'. You should edit this line to
read in one of the files that you made in the last lab. Note
if you are running python in the same directory as your data, you
need only the file name. Otherwise, you will need to add the
file's directory.
On line 15, we copy the image data from the FITS file to a variable
called img1. img1 is actually a numpy array and later on we'll
discuss how to do math operations on numpy arrays (it's easy).
The next set of lines, 18-21, plot the image. Back in the
import stuff, we imported the plotting package "as plt". This
allows us to use the shorthand "plt" to refer to the plotting
package. All four lines call routines in the matplotlib.pyplot
package using plt as shorthand. The first line sets the
plotting to interactive mode; this just means that the program
continues on after making the plot rather than waiting for the user
to look at the plot before continuing. The second line sets up
the color map. Since the camera is black and white, we use a
gray scale map to make realistic looking images. The third
line uses the "imshow" routine to make a plot of the image data
using the selected color map. The four line actually puts the
plot up on the screen.
Once you have edited the program to read in your file (edit line 12,
see above), go ahead and run the program. Start up
ipython. ipython is an interactive, shell interface to python
that uses unix-like commands. To setup Anaconda python for the
plotting interface that we use, type '%pylab' without the
quotes. You will want to do this every time you start ipython
or the program will freeze when you make plots. If you are not
running Anaconda, this command probably isn't necessary. Some
useful commands are: 'ls' - to list the contents of the current
directory, 'cd' - to change the current directory, and 'pwd' - to
print the current directory. Use cd to move to the directory
where showimage.py is and type 'run showimage' without the quotes
and press Enter. ipython should pop up a displaying the
image. Note that the window is interactive: one can zoom, pan,
or save the image in a variety of formats. If you have
problems running the program, make sure that you are in the right
directory and that you have edited the program to look for the FITS
file in the correct place. Make sure the program works before
proceeding. Record in your lab notebook, the procedures that
you followed to display the image including the directories where
showimage.py and your image are saved. Remember to start on a
new page and write today's date and a short description of today's
activities (like 'Intro to Python') at the top of the page.
FITS files
After you have run the program, it should leave you at the ipython
command prompt. Now type 'print h[0].header'. The
ipython screen will be filled with a bunch of information encoded in
the file. FITS is a "self-documenting" file format. In
addition to the image data, the file contains 'meta-data' describing
the image parameters and how and when the image was obtained.
These were written by CCDOps when you saved the file in the last
lab. The meta-data consists of a keyword name, a value, and an
optional comment.
For example, the NAXIS keyword gives the number of axes, it should
be 2. The NAXIS1 and NAXIS2 keywords give the number of pixels
along the two axes of the image. Record these in your lab
notebook. If you saved a full image (as instructed in the last
lab), these should match the number of pixels in the CCD.
Check the CCD documentation to see if that is true and record in
your notebook. The CCD-TEMP keyword records the CCD
temperature when the image was obtained. Record that in your
notebook and note the units. Check the TELESCOP keyword, it
should contain what you wrote for 'Telescope Description'.
Write in your notebook if TELESCOP agrees with FOCALLEN and what you
wrote in your notebook about the lens used for the image. As
long as you pay attention while taking the data, the meta-data in
FITS files provides a nice way to check that you are working with
the correct data files.
Second example program - difference image
Now download diffimage.py and load it
into your editor. Modify lines 12 and 13 to read in the two
images of the test pattern that you took in the first lab.
This programs reads in two images, extracts the image data,
calculates a new array which is the difference in the two images,
and then plots all three images. The difference image is
calculated on line 33. Note how simply the code is. The
other new element in the code is the use of plt.figure(#) to pop-up
multiple windows on the screen. Note that only one plt.show()
is needed to display all three images.
Take a look at your three images. If the two images that you
took with the CCD were under exactly identical conditions, then the
test pattern should disappear in the difference image. Did
this happen? Write down your conclusions and also any thoughts
about how to improve the subtraction in your notebook. Also,
can you think of some astronomical application for difference
imaging?
Third example program - image histogram
Now we'll use python to calculate some statistics and plot a
histogram of the pixel values in our images. Get histimage.py. The first parts should
be familiar. Edit the line that reads the FITS file to use
your image file. The new stuff starts at line 26. The
numpy/matplotlib routines to make histograms and calculate
statistics need to have one-dimensional arrays as input. In
line 26, we find the size of the 2-d array img. the img.shape
routine returns two values in a 'tuple'. Python tuples are
sets of values separated by commas and sometime enclosed in
parenthesis. In line 26, nx gets set to the first value in the
tuple and ny gets set to the second value. In line 27, we make
a new (1-d) array imgh. It has the same values as the 2-d
image array img, but arranged like ducks in a row.
Following this are a few lines to calculate statistics of the image
values. min, max, and mean should be self explanatory.
std gives the standard deviation of a set of data. The
standard deviation is a measure of the fluctuations of the data
around the average value. It is the square root of the average
of the squared deviations from the mean, i.e., std =
sqrt(mean((x-mean(x))**2)), where ** is Python's operator for
raising a number to a power. Record the statistics of the
pixel values of your image.
Then, we plot the histogram. A histogram is a graphical
representation of the distribution of a set of values. In this
case, the values are the intensity readings from the CCD for each
pixel. If you arrange these in a 2-d array, you get an
image. A histogram discards the spatial information and only
shows the distribution of the values themselves, in particular the
frequency with which each value (or range of values) occurs.
The matplotlib routine hist plots histograms. The first
argument is a 1-d of values, the other arguments set the plot
parameters. The most crucial of these is 'bins' which sets the
number of bins in the histogram. By default, the bins will
evenly spaced between the lowest and highest input values.
In calculating the statistics of a data set, one often prefers to
discard outlier, e.g. pixels that have high values because they are
saturated or 'hot' or pixels that have low values because they are
damaged. Examine the histogram of your pixel values. Are
there outliers? In lines 41-44, we set an allowed range for
'good' pixels values (between plow and phi) and then make a new
array, imghcut, keeping only the values in that range. Adjust
the values for plow and phi according to your histogram and record
them. Note that the current coordinates of the mouse cursor
are displayed in the plot window when you move the mouse over the
plot. Then, run histimage.py again and record the statistics
of the pixel values of your image when keeping only the 'good'
pixels.
Now try Python for yourself
Spend the rest of the lab learning python. We will be using
python throughout the semester, so it is a good investment of your
time. A good way to do this is to set yourself a goal and then
try to write the needed code. For example, you could add some
lines to the histogram plotting program that overplot the mean value
on the histogram. After getting that to work, you might try
also plotting the mean ± the standard deviation (maybe with dashed
lines). That will give you a nice graphical illustration of
how the standard deviation is a measure of the fluctuations of the
data around the average value. Later on, we will use the
median (the value such that half the data lies above and half lies
below) in addition to the mean. Try calculating the median and
plotting it and see how it compares to the mean. In the next
lab, you will need to make plots of values. You might try
making an array of the integers from 1 to 10, finding their squares,
and then plotting the square versus the value.
Resources for learning Python
The resources below are a good place to start. If you want to
do something specific in python, a good first step is to search on
the internet, e.g. try searching for 'python median'.
Interactive, online tutorial: http://www.codecademy.com/tracks/python
PyFITS documentation: http://pythonhosted.org/pyfits/
NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial
Matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
Python official web site: http://www.python.org/
PyFits official site: http://www.stsci.edu/institute/software_hardware/pyfits
Question and answer site for programming: http://stackoverflow.com