An Introduction to Image Processing with
Python
Astronomical Laboratory ASTR:4850, Fall 2015
by Philip Kaaret
Introduction
In order to gain a deeper understanding of how astronomical images
are processed, we will be writing our own software to do the
processing. To minimize the effort in doing this, we will use
the computer language Python. Python is an
interactive and extensible programming language. It
has extensions to load images from FITS files, process images using
compact code without loops, and display images on the screen.
During this week's lab, you will become familiar with the basics of
python and some of the particulars about handling images in python.
Python is free software. So, if you want to work on learning
python outside of class, you can obtain it via the links on the
class web page. Note that we will use Python 2.7 and not
Python 3 (this is because matplotlib only recently added Python 3
support, actually it's because your professor is a fossil and
doesn't like to change). We will use a number of python
packages to process images (as numerical arrays), read FITS files,
and display data. Anaconda includes python and (almost all)
the other needed packages. If you use straight python (e.g.
under Linux), you will need to install these packages separately
(which is very easy for distributions like Ubuntu). If you
have a mac, you're on your own.
You'll be using python heavily during the semester, so it would be
good to install it on a computer that you have access to outside of
class (like a laptop if you own one).
To install Anaconda, use this link http://continuum.io/downloads.
The machines in the lab run 32-bit windows. Once you get
python installed, try running ipython. After you get to the
ipython command line, you should type
cd h:/
to change the current directory to where your files get
stored. The ipython command line uses unix commands.
Some useful commands are: 'ls' - to list the contents of the current
directory, 'cd' - to change the current directory, and 'pwd' - to
print the current directory.
I prefer to edit .py files using gedit. You may need to
install gedit. Most other text editors also work just fine.
You'll need to install pyfits by going to this page http://www.stsci.edu/institute/software_hardware/pyfits/Download.
First example program - reading and plotting an image
To get started with python, let's look at a simple program to read
in an image in FITS format and plot it on the screen. The
program is here: showimage.py. You
should download the program and then load it into a text editor of
your choice. Notepad messes up the formatting. Wordpad
is ok. Gedit should be installed on the machines and is pretty
nice.
The first line is a comment about what the program does. It is
good practice to liberally comment your code.
The second set of lines (3-7) tells python that we will be using a
bunch of extensions. Extensions are pre-written modules to do
various tasks. numpy, or 'Numerical Python' is a package to
define mathematical constructions like matrices (or arrays) and make
it easy to handle them in python. matplotlib is a scientific
plotting package. pyfits is a package to read and write FITS
files.
The first thing that we actually do in the program is on line 12,
where we read in a FITS file. The program currently reads in a
file called 'testpattern1.fit'. You should edit this line to
read in one of the files that you made in the last lab. Note
if you are running python in the same directory as your data, you
need only the file name. Otherwise, you will need to add the
file's directory.
On line 15, we copy the image data from the FITS file to a variable
called img1. img1 is actually a numpy array and later on we'll
discuss how to do math operations on numpy arrays (it's easy).
The next set of lines, 18-21, plot the image. Back in the
import stuff, we imported the plotting package "as plt". This
allows us to use the shorthand "plt" to refer to the plotting
package. All four lines call routines in the matplotlib.pyplot
package using plt as shorthand. The first line sets the
plotting to interactive mode; this just means that the program
continues on after making the plot rather than waiting for the user
to look at the plot before continuing. The second line sets up
the color map. Since the camera is black and white, we use a
gray scale map to make realistic looking images. The third
line uses the "imshow" routine to make a plot of the image data
using the selected color map. The four line actually puts the
plot up on the screen.
Once you have edited the program to read in your file (edit line 12,
see above), go ahead and run the program. Start up
ipython. ipython is an interactive, shell interface to python
that uses unix-like commands. To setup Anaconda python for the
plotting interface that we use, type '%pylab' without the
quotes. You will want to do this every time you start ipython
or the program will freeze when you make plots. If you are not
running Anaconda, this command probably isn't necessary. Use
cd to move to the directory where showimage.py is and type 'run
showimage' without the quotes and press Enter. ipython should
pop up a displaying the image. Note that the window is
interactive: one can zoom, pan, or save the image in a variety of
formats. If you have problems running the program, make sure
that you are in the right directory and that you have edited the
program to look for the FITS file in the correct place. Make
sure the program works before proceeding. Record in your lab
notebook, the procedures that you followed to display the image
including the directories where showimage.py and your image are
saved. Remember to start on a new page and write today's date
and a short description of today's activities (like 'Intro to
Python') at the top of the page.
FITS files
After you have run the program, it should leave you at the ipython
command prompt. Now type 'print h[0].header'. The
ipython screen will be filled with a bunch of information encoded in
the file. FITS is a "self-documenting" file format. In
addition to the image data, the file contains 'meta-data' describing
the image parameters and how and when the image was obtained.
These were written by CCDOps when you saved the file in the last
lab. The meta-data consists of a keyword name, a value, and an
optional comment.
For example, the NAXIS keyword gives the number of axes, it should
be 2. The NAXIS1 and NAXIS2 keywords give the number of pixels
along the two axes of the image. Record these in your lab
notebook. If you saved a full image (as instructed in the last
lab), these should match the number of pixels in the CCD.
Check the CCD documentation to see if that is true and record in
your notebook. The CCD-TEMP keyword records the CCD
temperature when the image was obtained. Record that in your
notebook and note the units. Check the TELESCOP keyword, it
should contain what you wrote for 'Telescope Description'.
Write in your notebook if TELESCOP agrees with FOCALLEN and what you
wrote in your notebook about the lens used for the image. As
long as you pay attention while taking the data, the meta-data in
FITS files provides a nice way to check that you are working with
the correct data files.
Second example program - difference image
Now download diffimage.py and load it
into your editor. Modify lines 12 and 13 to read in the two
images of the test pattern that you took in the first lab.
This programs reads in two images, extracts the image data,
calculates a new array which is the difference in the two images,
and then plots all three images. The difference image is
calculated on line 33. Note how simply the code is. The
other new element in the code is the use of plt.figure(#) to plot
multiple windows on the screen. Note that only one plt.show()
is needed to display all three images.
Take a look at your three images. If the two images that you
took with the CCD were under exactly identical conditions, then the
test pattern should disappear in the difference image. Did
this happen? Write down your conclusions and also any thoughts
about how to improve the subtraction in your notebook. Also,
can you think of some astronomical application for difference
imaging?
Third example program - image histogram
Now we'll use python to calculate some statistics and plot a
histogram of the pixel values in our images. Get histimage.py. The first parts should
be familiar. Edit the line that reads the FITS file to use
your image file. The new stuff starts at line 26. The
numpy/matplotlib routines to make histograms and calculate
statistics need to have one-dimensional arrays as input. In
line 26, we find the size of the 2-d array img. the img.shape
routine returns two values in a 'tuple'. Python tuples are
sets of values separated by commas and sometime enclosed in
parenthesis. In line 26, nx gets set to the first value in the
tuple and ny gets set to the second value. In line 27, we make
a new (1-d) array imgh. It has the same values as the 2-d
image array img, but arranged like ducks in a row.
Following this are a few lines to calculate statistics of the image
values. min, max, and mean should be self explanatory.
std gives the standard deviation of a set of data. The
standard deviation is a measure of the fluctuations of the data
around the average value. It is the square root of the average
of the squared deviations from the mean, i.e., std =
sqrt(mean((x-mean(x))**2)), where ** is Python's operator for
raising a number to a power. Record the statistics of the
pixel values of your image.
Then, we plot the histogram. A histogram is a graphical
representation of the distribution of a set of values. In this
case, the values are the intensity readings from the CCD for each
pixel. If you arrange these in a 2-d array, you get an
image. A histogram discards the spatial information and only
shows the distribution of the values themselves, in particular the
frequency with which each value (or range of values) occurs.
The matplotlib routine hist plots histograms. The first
argument is a 1-d of values, the other arguments set the plot
parameters. The most crucial of these is 'bins' which sets the
number of bins in the histogram. By default, the bins will
evenly spaced between the lowest and highest input values.
In calculating the statistics of a data set, one often prefers to
discard outlier, e.g. pixels that have high values because they are
saturated or 'hot' or pixels that have low values because they are
damaged. Examine the histogram of your pixel values. Are
there outliers? In lines 41-44, we set an allowed range for
'good' pixels values (between plow and phi) and then make a new
array, imghcut, keeping only the values in that range. Adjust
the values for plow and phi according to your histogram and record
them. Note that the current coordinates of the mouse cursor
are displayed in the plot window when you move the mouse over the
plot. Then, run histimage.py again and record the statistics
of the pixel values of your image when keeping only the 'good'
pixels.
Now try Python for yourself
Spend the rest of the lab learning python. We will be using
python throughout the semester, so it is a good investment of your
time. A good way to do this is to set yourself a goal and then
try to write the needed code. Also, going through the Code
Academy and NumPy tutorials would be useful (the links are on the
main class web page).
There is no class on Labor day, so it would be good to spend 12
hours over the next two weeks learning python. There are
python exercises due on 9/14 to encourage this.
Python exercises due at the next of the next lab:
1) Add code to the histogram plotting program to overplot the mean
value on the histogram. After getting that to work, add more
code to plotting the mean ± the standard deviation with dashed
lines. This plot gives a nice graphical illustration of how
the standard deviation is a measure of the fluctuations of the data
around the average value. Then add code to calculate the
median (the value such that half the data lies above and half lies
below). Plot the median and see how it compares to the
mean. Print out your code and a copy of your plot with mean,
standard deviations, and median to hand in.
2) In the next lab, you will need to make plots of values.
Write code to make an array of the integers from 1 to 10, find their
squares, and then plot the square versus the value. Print out
your code and a copy of your plot to hand in.
3) Add some lines to diffimage.py so that all values less than zero
in the difference image are set to zero. Print out your code
to hand in and highlight the added lines. Note that you do not
need any loops to do this (and points will be taken off if you use
loops).
4) Add some lines to diffimage.py to calculate the sum of a 9x9 box
pixels centered on pixel 100,100 in the difference image.
Print out your code to hand in and highlight the added lines.
Resources for learning Python
The resources below are a good place to start. If you want to
do something specific in python, a good first step is to search on
the internet, e.g. try searching for 'python median'.
Interactive, online tutorial: http://www.codecademy.com/tracks/python
PyFITS documentation: http://pythonhosted.org/pyfits/
NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial
Matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
Python official web site: http://www.python.org/
PyFits official site: http://www.stsci.edu/institute/software_hardware/pyfits
Question and answer site for programming: http://stackoverflow.com