An Introduction to Image Processing with
Python
Astronomical Laboratory ASTR:4850, Spring 2018
by Philip Kaaret
Introduction
In order to gain a deeper understanding of how astronomical images
are processed, we will be writing our own software to do the
processing. To minimize the effort in doing this, we will use the
computer language Python. Python is an interactive
and extensible programming language. It has extensions to
load images from FITS files, process images using compact code
without loops, and display images on the screen. During this
week's lab, you will become familiar with the basics of python and
some of the particulars about handling images in python. The
following is written assuming no prior knowledge of Python. If you
are familiar with Python and have a preferred editor and
environment, please use those.
Python is free software. So, if you want to work on learning python
outside of class, you can obtain it via the links on the class web
page. Note that we will use Python 2.7 and not Python 3 (this is
because your professor is a fossil and doesn't like to change). We
will use a number of python packages to process images (as numerical
arrays), read FITS files, and display data. Anaconda includes python
and the other needed packages. If you use straight python (e.g.
under Linux), you will need to install these packages separately
(which is very easy for distributions like Ubuntu). If you have a
mac, you're on your own.
You'll be using python heavily during the semester, so it would be
good to install it on a computer that you have access to outside of
class (like a laptop if you own one). To install Anaconda, use this
link
https://www.anaconda.com/download/
.
The machines in the lab run 32-bit windows. Once you get python
installed, try running ipython. After you get to the ipython command
line, you should type
cd h:/
to change the current directory to where your files get stored.
The ipython command line uses unix commands. Some useful
commands are: 'ls' - to list the contents of the current directory,
'cd' - to change the current directory, and 'pwd' - to print the
current directory.
First example program - reading and plotting an image
To get started with python, let's look at a simple program to read
in an image in FITS format and plot it on the screen. The program is
here: showimage.py. You should download
the program and then load it into a text editor of your choice.
Notepad messes up the formatting. Wordpad is ok. Gedit and Notepad++
are pretty nice, but you may need to install them. People who like
wasting space on their screen use atom. Real geeks use emacs and its
author is a saint.
vi sucks out your soul; the extension for emacs that makes it behave
like vi is evil.
Now that you have showimage.py loaded in an editor of your choice,
let's have a look at it. The first line is a comment about what the
program does. It is good practice to liberally comment your code.
The second set of lines (4-7) tells python that we will be using a
bunch of extensions. Extensions are pre-written modules to do
various tasks. numpy, or 'Numerical Python' is a package to define
mathematical constructions like matrices (or arrays) and make it
easy to handle them in python. matplotlib is a scientific plotting
package. pyfits is a package to read and write FITS files.
The first thing that we actually do in the program is on line 11,
where we open a FITS file. The program currently opens a file called
'platescale20171215.fit'. You should edit this line to read in one
of the files that you made in the last lab. Note if you are running
python in the same directory as your data, you need only the file
name. Otherwise, you will need to add the file's directory.
On line 14, we copy the image data from the FITS file to a variable
called img1. img1 is actually a numpy array and later on we'll
discuss how to do math operations on numpy arrays (it's easy).
The next set of lines, 17-20, plot the image. Back in the import
stuff, we imported the plotting package matplotlib.pyplot "as plt".
This allows us to use the shorthand "plt" to refer to the plotting
package. All four lines call routines in the matplotlib.pyplot
package using plt as shorthand. The first line sets the plotting to
interactive mode; this just means that the program continues on
after making the plot rather than stopping for the user to look at
the plot and then close it. The second line sets up the color map.
Since the camera is black and white, we use a gray scale map to make
realistic looking images. The third line uses the "imshow" routine
to make a plot of the image data using the selected color map. The
fourth line actually puts the plot up on the screen.
Once you have edited the program to read in your file (edit line 11,
see above), go ahead and run the program. Start up ipython. ipython
is an interactive, shell interface to python that uses unix-like
commands. Use cd to move to the directory where showimage.py is and
type 'run showimage' without the quotes and press Enter. ipython
should pop up a window that displays the image. Note that the window
is interactive: one can zoom, pan, or save the image in a variety of
formats. If you have problems running the program, make sure that
you are in the right directory and that you have edited the program
to look for the FITS file in the correct place. Make sure the
program works before proceeding. Write down the procedures that you
followed to display the image including the directories where
showimage.py and your image are saved.
If Anaconda freezes when you make plots, exit out of it and then
%pylab when you start up ipython again. You will then want to do
this every time you start ipython.
FITS files
After you have run the program, it should leave you at the ipython
command prompt. Now type 'print h[0].header'. The ipython screen
will be filled with a bunch of information encoded in the file. FITS
is a "self-documenting" file format. In addition to the image data,
the file contains 'meta-data' describing the image parameters and
how and when the image was obtained. These were written by
CCDOps when you saved the file in the last lab. The meta-data
consists of a keyword name, a value, and an optional comment.
For example, the NAXIS keyword gives the number of axes, it should
be 2. The NAXIS1 and NAXIS2 keywords give the number of pixels along
the two axes of the image. If you saved a full image (as instructed
in the last lab), these should match the number of pixels in the
CCD. Check the CCD documentation to see if that is true. The
CCD-TEMP keyword records the CCD temperature when the image was
obtained. As long as you pay attention while taking the data, the
meta-data in FITS files provides a nice way to check that you are
working with the correct data files.
Second example program - difference image
Now download diffimage.py and load it
into your editor. Modify lines 11 and 12 to read in the two images
of the test pattern that you took in the first lab. This
programs reads in two images, extracts the image data, calculates a
new array which is the difference in the two images, and then plots
all three images. The difference image is calculated on line 32.
Note how simple the code is. The other new element in the code is
the use of plt.figure(#) to plot multiple windows on the screen.
Note that only one plt.show() is needed to display all three images.
Take a look at your three images. If the two images that you took
with the CCD were under exactly identical conditions, then the test
pattern should disappear in the difference image. Did this
happen? Write down your conclusions and also any thoughts
about how to improve the subtraction in your notebook. Also, can you
think of some astronomical application for difference imaging?
Third example program - image histogram
Now we'll use python to calculate some statistics and plot a
histogram of the pixel values in our images. Get histimage.py. The first parts should be
familiar. Edit the line that reads the FITS file to use your image
file. After setting up figure 1 on line 21, we clear the figure on
line 22 using plt.clf(). This resets anything that was done to the
plotting window.
The interesting stuff starts at line 26. The numpy/matplotlib
routines to make histograms and calculate statistics need to have
one-dimensional arrays as input. In line 26, we find the size of the
2-d array img. the img.shape routine returns two values in a
'tuple'. Python tuples are sets of values separated by commas and
sometime enclosed in parenthesis. In line 26, nx gets set to the
first value in the tuple and ny gets set to the second value. In
line 27, we make a new (1-d) array imgh. It has the same values as
the 2-d image array img, but arranged like ducks in a row.
Following this are a few lines to calculate statistics of the image
values. min, max, and mean should be self explanatory. std gives the
standard deviation of a set of data. The standard deviation is a
measure of the fluctuations of the data around the average value. It
is the square root of the average of the squared deviations from the
mean, i.e., std = sqrt(mean((x-mean(x))**2)), where ** is
Python's operator for raising a number to a power.
Then, we plot the histogram. A histogram is a graphical
representation of the distribution of a set of values. In this case,
the values are the intensity readings from the CCD for each pixel.
If you arrange these in a 2-d array, you get an image. A histogram
discards the spatial information and only shows the distribution of
the values themselves, in particular the frequency with which each
value (or range of values) occurs. The matplotlib routine hist
plots histograms. The first argument is a 1-d array of values, the
other arguments set the plot parameters. The most crucial of these
is 'bins' which sets the number of bins in the histogram. By
default, the bins will evenly spaced between the lowest and highest
input values.
In calculating the statistics of a data set, one often prefers to
discard outliers, e.g. pixels that have high values because they are
saturated or 'hot' or pixels that have low values because they are
damaged. Examine the histogram of your pixel values. Are there
outliers? In lines 41-44, we set an allowed range for 'good'
pixels values (between plow and phi) and then make a new array,
imghcut, keeping only the values in that range. Adjust the values
for plow and phi according to your histogram. Note that the current
coordinates of the mouse cursor are displayed in the plot window
when you move the mouse over the plot. Then, run histimage.py again
and see the statistics of the pixel values of your image change when
keeping only the 'good' pixels. If you don't have any hot pixels or
dead pixels (with very low values), then you have a good CCD.
Now try Python for yourself
To encourage you to learn python, the following exercises are due at
the beginning of the next lab. You should do these exercises by
yourself, not with your lab partners. If you have not previously had
much exposure to python, speak to an instructor if you need more
time to learn enough python to finish the assignment.
1) Add code to the histogram plotting program to overplot the mean
value on the histogram. After getting that to work, add more code to
plotting the mean ± the standard deviation with dashed lines. This
plot gives a nice graphical illustration of how the standard
deviation is a measure of the fluctuations of the data around the
average value. Then add code to calculate the median (the value such
that half the data lies above and half lies below). Plot the median
and see how it compares to the mean. Print out your code and a copy
of your plot with mean, standard deviations, and median to hand in.
2) In the next lab, you will need to make plots of values.
Write code to make an array of the integers from 1 to 10, find their
squares, and then plot the square versus the value. Print out your
code and a copy of your plot to hand in.
3) Add some lines to diffimage.py so that all values less than zero
in the difference image are set to zero. Print out your code to hand
in and highlight the added lines. Note that you do not need any
loops to do this (and points will be taken off if you use loops).
4) Add some lines to diffimage.py to calculate the sum of a 9x9 box
pixels centered on pixel 100,100 in the difference image.
Print out your code to hand in and highlight the added lines.
Resources for learning Python
The resources below are a good place to start. If you want to do
something specific in python, a good first step is to search on the
internet, e.g. try searching for 'python median'.
Interactive, online tutorial: http://www.codecademy.com/tracks/python
PyFITS documentation: http://pythonhosted.org/pyfits/
NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial
Matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
Python official web site: http://www.python.org/
PyFits official site: http://www.stsci.edu/institute/software_hardware/pyfits
Question and answer site for programming: http://stackoverflow.com