Lecture #12

Scientific Computing Using Python - PHYS:4905

Lecture #12 - Prof. Kaaret

Image display and processing

Today we'll looking at processing and displaying images in Python. Most of the code we'll examine involves calling routines in various Python libraries. As discussed in lecture #4, a large part of the effort in using libraries to perform tasks is figuring out what function calls you need to make to do what you want. The code that you'll use today are useful examples for plotting images and histograms, making contour and surface plots, and smoothing images.

M57 - the Ring nebula

The image we will use today was taken using the Hubble Space Telescope and is of the object known as Messier 57 (M 57) and also known as the Ring nebula. Objects like M57 are called "planetary nebula" because they have a disk-like shape and look like planets if viewed through a telescope with low resolution. "Planetary" nebula have nothing to do with planets. The name persists because astronomers are too steeped in their own history.

"Planetary" nebula are actually shells of gas expelled when a star sheds it outer layers as it ends the giant phase of its evolution and becomes a white dwarf - a hot, compact stellar remnant. The white dwarf in the Ring nebula has a temperature of 1.2×10⁵ K and is visible as the white dot in the center. Ultraviolet light from the white dwarf illuminates the surrounding gas. The colors in the image are approximately true colors. The color image was assembled from three images taken through different narrow band filters with the Hubble telescope's Wide Field Planetary Camera 2. Blue isolates emission from very hot helium (the filter bandpass is centered at 469 nm to capture He II), which is located primarily close to the hot central star. Green represents ionized oxygen ([O III] at 502 nm), which is located farther from the star. Red shows ionized nitrogen ([N II] at 658 nm), which is radiated from the coolest gas, located farthest from the star. M57 is about a light-year in diameter and is located about 2000 light-years away in the direction of the constellation Lyra.

Reading and plotting an image

To get started, let's look at a simple program to read in an image in FITS format and plot it on the screen. Load the program plot_image.py and the FITS image file m57_f658n.fits into a directory on your computer. We are going to run through plot_image one 'cell' at a time. Spyder allows you to execute programs in what it calls 'cells' with the start of each cell marked by a comment line beginning with #%%. Open plot_image in Spyder and put your cursor on the first line. Then press the button to "Run current call and go to the next one" or press Shift+Enter.

Spyder should list the lines executed in the console pane. The first cell has a bunch of imports and a plt.ion() command to setup interactive plotting mode. Run the next cell. This cell loads the image file and puts the image data into a variable called img. Go to the variable explorer window and you should see img. It is an array of floats with a size of 1200 by 1400.

Run the next cell. This cell makes a histogram of the values of each pixel in the image. A histogram shows the distribution of values. We call plt.hist with bins=100, meaning that our histogram has 100 bins. By default, plt.hist will make those bins go from the minimum value in the input data to the maximum value. Look at the plot in the histogram window. It shows that there are a whole lot of very low values (values in the first bins) with a small tail going to higher bins. The histogram isn't that useful. The cell also plots an image using plt.imshow, say this as 'plot dot image show'. To plot the image, we first set up a color map, which is maps from pixel values to color. We have loaded a gray scale map that will produce what looks like a black and white image. By default, plt.imshow adjusts the color map to cover from the minimum value in the input data to the maximum value. The image is even less useful because it looks totally black.

The problem with our first try at a histogram and an image is that we have outliers in the data - meaning one or more points with values very different from the typical values. We can have a look at this by executing the next cell, which you should do now. That cell prints out some statistics about the image, in particular the minimum, the maximum, the mean, and the median. We can see that the maximum is much larger than the mean or the median. That's our problem.

We need to set the histogram range with more sensible values. The next block uses the image statistics to try to get a better scaling. There are several lines of code to choose from to set the variable vmax, which we will use to set the maximum value for scaling. Think about what you want to use (maximum, mean, or median) and uncomment the appropriate line. Also adjust the numerical factor in front of your chosen statistic. Then run the cell. However, use "Run the current cell" (Ctrl+Enter ) instead of "Run current call and go to the next one" so that you can run this cell repeatedly as you play with different statistics and different numerical factors. Recall that you can use the zoom button in the "Better Histogram" window to adjust your view of that histogram. Keep trying different statistics and/or different numerical factors until you are happy with your image and histogram.

Smoothing images

Contour and surface plots are more sensitive to noise or fluctuations in the data. So before generating those plots, we have chosen to smooth our data. We do this by applying a 'Gaussian filter'. A one dimensional Gaussian looks like the function below.

The equation for a Gaussian is

f(x) = \frac{1}{\sqrt{2π{σ}^{2}}}{e}^{-\frac{{(x-μ)}^{2}}{2{σ}^{2}}}

where µ is the mean and σ is the standard deviation which is determines the width. The Full Width at Half Maximum (FWHM) = 2.35σ. The factor in front makes the integral over all real numbers equal to one.

Since we have a 2D image, we use a 2D Gaussian that looks like the image below. A 2D Gaussian is the product of two Gaussians, one for x and one for y.

To smooth the image, we look at a particular pixel. We place a 2D Gaussian centered at that pixel. We then calculate a sum by multiplying the Gaussian times the image pixel value for each pixel in the Gaussian. We replace the value of the particular pixel with the value of the sum. In principle, the Gaussian has infinite extent in each dimension, but in practice, we limit the Gaussian and the sum to cover only a finite number of pixels. The Gaussian is normalized so that its sum over selected set of pixels is unity. This process is called convolution. The wikipedia page on convolution, http://en.wikipedia.org/wiki/Convolution, has a good explanation with some nice graphics to help you get some intuition about what convolution means and does to functions. One can do convolution with functions other than Gaussians. The function selected for convolution with the image is called the 'kernel'. We use gaussian_filter from the scipy.ndimage.filters package that handles all of these details for us. That function takes two parameters. One is the image to be smoothed and the other is the standard deviation for the Gaussian, which is called sigma. Sigma can be given a single number, which makes a symmetric function, or as a sequence with a different value for each axis.

Smoothing will change the range of pixel values in the image. Thus, after smoothing, we need to re-evaluate the range of interesting pixel values to use for our plots. Execute the cell that does smoothing and then plots the smoothed image. Try adjusting the value of sigma to see what it does to the plots. Each time you adjust sigma, you may need to adjust the how vmax is set in order to get a good image. Try values of sigma of 1, 2, 5, 10, 20, and 50. How would you describe the effect of different values for sigma on the image? After you are done, set sigma back to 10.

Contour and surface plots

Contour and surface plots are different ways of presenting data. They have advantages and disadvantages relative to images. The choice of which type of plot to use is often aesthetic and the plots of different types are sometimes combined. For example, to compare gamma-ray and radio images of a nebula, one might overlay contours generated from the radio data on to an image of the gamma-ray data.

To make contour and surface plots in Python, we need three variables: one that holds the X coordinate of each bit of data (in our case, the pixel location), one that holds the Y coordinate, and one that holds the Z coordinate (in our case, the intensity in the pixel). We have the Z data already in the image. The first thing that the next block does is generate arrays with the needed X and Y coordinates. We first generate arrays of numbers for X and Y. These arrays must have the same length as the corresponding axis of the image. Our code uses the pixel number for the coordinate value. You might want to use something like the astronomical coordinate values instead. We then take those arrays and combine them into a grid. The resulting arrays x and y have exactly the same shape as the image. Each entry gives the x or y coordinate of the corresponding pixel in the image.

For contour plots, we need to set the contour levels. These are the pixel values at which the contours are drawn. For example, if we pick a level of 0.5, the plot will have a contour that separates pixels with values above 0.5 from those with values below 0.5. We have chosen to generate contour levels from vmin to vmax using np.linspace and then drop the values at vmin and vmax (those contours would be empty, think about it). The selected levels go into the variables contours. The plt.contour function makes the contour plot and returns information about the contours that can then be used by plt.clabel to label the contours.

Surface plots are representations of 3D images. (I am still waiting for my holographic 3D display.) We need to import Axes3D from mpl_toolkits.mplot3d to get functions to set up 3D plots. The plot_surface function then plots surfaces in the graphical space set up with Axes3D. Note that you can click and drag in the surface plot window if you want to see the plot from different angles. It's cool.

Your turn

We used some simple statistics to attempt to adjust the scaling for our images. Another possible way to scale the data would be to adjust the lower and upper bound on the scale so that some percentage of the pixel values lie below the lower bound and some percentage lie below the upper bound. Modify the code in plot_image.py so that it takes two parameters, vmin_frac and vmax_frac, that adjust vmin and vmax so that vmin_frac of the pixels have values below vmin and vmax_frac have values below vmax. To start, you might use vmin_frac = 0.01 and vmax_frac = 0.95, but note that they must be adjustable. Save this code for your future use, in particular to hand in for homework #11.

FYI

The image that we used was downloaded from the Hubble Legacy Archive (HLA). The HLA is "designed to optimize science from the Hubble Space Telescope by providing online, enhanced Hubble products". If you want to plot more Hubble images, you can access the HLA at https://hla.stsci.edu/hlaview.html. Type an object name in the box (M16 is a nice one), click on Search, then click on the Images tab. You can then download by right clicking (or whatever mac people do) on one of the FITS-Science links. You should be able to read the FITS file directly into today's program. However, you'll need to do some work on setting vmin and vmax as the raw HST images usually include outliers.