Using Python to Read and Plot Data
Before Class
Python is an interactive and extensible programming language. It has
extensions to load data from FITS files, process data using compact
code without loops, and display plots on the screen. During this
project, you will become familiar with the basics of python and some
of the particulars about handling data in python.
Python is free software. So, if you want to work on learning python
outside of class, you can obtain it via the links at the end of
this page. Note that we will use Python 2.7 and not Python 3 (this
is because matplotlib only recently added Python 3 support and we
don't want to change yet). We will use a number of python packages
to read FITS files and display data. You'll be using python heavily
during the semester, so it would be good to install it on a computer
that you have access to outside of class (like a laptop if you own
one).
If you are using a Windows machine or a Mac, we suggest using
Anaconda, which eases the installation process and includes most of
the needed python packages. To download Anaconda, use this link: http://continuum.io/downloads.
You'll also need to install pyfits by going to this page: http://www.stsci.edu/institute/software_hardware/pyfits/Download.
If you use a Linux machine, you can either use Anaconda or do a
straight install of python using whatever package manager your
system comes with. You will need to install the NumPy, matplotlib,
ipython, and pyfits packages separately, which is very easy for
distributions like Ubuntu. For those using remote access to orfeo,
the required software is already installed.
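One quick way to confirm that an installation is complete is to try importing each package. The sketch below is only an illustration: the helper name check_packages is made up for this example, and the import names (numpy, matplotlib, IPython, pyfits) are assumed to match the packages listed above.

```python
import importlib


def check_packages(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Any failure on import means the package is not usable as installed.
            status[name] = False
    return status


if __name__ == "__main__":
    # Import names assumed for the packages this project relies on.
    wanted = ["numpy", "matplotlib", "IPython", "pyfits"]
    for name, ok in sorted(check_packages(wanted).items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

Any package reported as MISSING needs to be installed before class.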
You will need a text editor to edit the python programs. Matt
prefers to edit .py files using gedit. On Windows machines, Notepad
messes up the formatting, while WordPad is ok. On Ubuntu (including
orfeo), gedit should be installed by default and is pretty nice.
Most other text editors work just fine. You should have a text
editor on your machine before class and have loaded henley.py (see
below) into it to make sure it looks ok.
We will analyze a data file that contains the oxygen line emission
measured for different sky fields by Henley and Shelton (2012).
Follow the link to the Henley and Shelton (2012) paper and find the
'Online Data' link. This link brings you to a page of links to online
data cited in the paper. Many papers place their large tables online
through the VizieR Catalogue Service. Go there to find Tables 1, 2, and
4 of this paper. Make a directory on your machine called hs2012 and
download all three tables in FITS format. Have a look at the files
using fv and check that they look ok.
Instructions
Write a python program to read in the table of oxygen emission line
strengths from Henley and Shelton (2012). Plot the line intensities
and their ratio
versus Galactic latitude and longitude. Discuss the implications of
your results for the geometry of the halo emission. You may wish to
address what can be learned from repeated observations of the same
field. This should be done individually and handed in individually.
- Figure out how to download the table of your choosing as a
FITS binary table. Make sure to uncheck the RA and DEC columns
when selecting which columns you want before downloading the
binary FITS table because those two columns cause issues in
pyfits. You can examine these files using a FITS viewer (e.g.,
'fv').
- To get started with python, let's look at a simple program to
read in a FITS format file and plot some of its contents on the
screen. The program is here: henley.py.
You should download the program and then load it into a text
editor of your choice.
- The first line is a comment about what the program does. It
is good practice to liberally comment your code.
- The first thing that we actually do in the program is on line
7, where we read in a FITS file using the 'pyfits' module. The
program currently reads in a file called 'table.fit'. You should
edit this line to read in the file that you downloaded from
http://vizier.cfa.harvard.edu/. Note, if you are running python
in the same directory as your data, you need only the file name.
Otherwise, you will need to add the file's directory.
- Next, the program reads in the column names, followed by
reading in individual columns as arrays.
- You may need to change the column names in these lines of
code to match the column names in the FITS file.
- The next step filters out the null entries in the table, keeping only those values for which line intensities were measured.
- In order to plot the data on an Aitoff projection of the sky,
the Galactic longitude coordinates need to be in radians and
wrapped over the range [-π,π]. This is done in lines 22-26.
- After all this, we can begin plotting the data.
- Now let's actually run the program.
- Start up ipython. ipython is an interactive shell
interface to python that uses unix-like commands.
- To set up Anaconda python for the plotting interface that we
use, type '%pylab' without the quotes. You will want to do
this every time you start ipython or the program will freeze
when you make plots. If you are not running Anaconda, this
command probably isn't necessary.
- Use cd to move to the directory where henley.py is and type
'run henley' without the quotes and press Enter. ipython
should pop up a window displaying the image. Note that the window
is interactive: one can zoom, pan, or save the image in a
variety of formats. If you have problems running the
program, make sure that you are in the right directory and that
you have edited the program to look for the FITS file in the
correct place. Make sure the program works before
proceeding.
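The steps walked through above (read the table, drop null entries, wrap the longitudes, plot on an Aitoff projection) can be sketched roughly as follows. This is not the contents of henley.py, only an illustration under stated assumptions: the table is assumed to sit in FITS extension 1, null entries are assumed to appear as NaN, the column names in the commented-out call are hypothetical, and the helper names are invented here. Matplotlib's Aitoff projection expects both longitude and latitude in radians, so the latitude is converted as well.

```python
import numpy as np


def load_columns(filename, names):
    """Read the named columns from a FITS binary table (assumed in extension 1)."""
    import pyfits  # imported here so the other helpers work without pyfits
    hdulist = pyfits.open(filename)
    print(hdulist[1].columns.names)   # show which column names are available
    table = hdulist[1].data
    columns = [np.array(table.field(name)) for name in names]  # copy the data
    hdulist.close()
    return columns


def keep_measured(*columns):
    """Keep only the rows in which every column holds a finite (non-null) value."""
    good = np.ones(len(columns[0]), dtype=bool)
    for col in columns:
        good &= np.isfinite(col)
    return [np.asarray(col)[good] for col in columns]


def wrap_longitude(lon_deg):
    """Convert longitudes in degrees [0, 360) to radians in [-pi, pi]."""
    lon_deg = np.asarray(lon_deg, dtype=float)
    wrapped = np.where(lon_deg > 180.0, lon_deg - 360.0, lon_deg)
    return np.radians(wrapped)


def plot_aitoff(lon_rad, lat_rad, values, label):
    """Scatter the values on an Aitoff projection (coordinates in radians)."""
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='aitoff')
    points = ax.scatter(lon_rad, lat_rad, c=values, s=10)
    ax.grid(True)
    cbar = fig.colorbar(points)
    cbar.set_label(label)
    plt.show()


# With a real file, something like this (column names are hypothetical):
# glon, glat, o7 = load_columns('table.fit', ['GLON', 'GLAT', 'O7'])

# Synthetic stand-in for the table columns; NaN marks a null entry.
glon = np.array([10.0, 200.0, 350.0, 90.0])
glat = np.array([45.0, -30.0, 60.0, -75.0])
o7 = np.array([5.2, np.nan, 3.1, 7.8])

glon, glat, o7 = keep_measured(glon, glat, o7)
lon_rad = wrap_longitude(glon)
lat_rad = np.radians(glat)
# plot_aitoff(lon_rad, lat_rad, o7, 'line intensity')  # opens a plot window
```

Keeping the filtering and wrapping in small functions makes it easy to reuse them when you plot a different column or the line ratio.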
What to hand in
Read the Henley and Shelton (2012) paper. Using the given Python
code as a starting point, write your own program that will plot the
line intensities and their ratio versus Galactic latitude and
longitude. Discuss the implications of your results for the geometry
of the halo emission. You may wish to address what can be learned
from repeated observations of the same field.
- Short description of how you obtained the tables and which
one(s) you used for this project and why
- Brief discussion of modifications you made to the code
- Plots versus Galactic longitude and latitude
- The example code plots individual data points from the table. You should consider plotting average line intensities and ratios in bins of latitude and longitude (Why?).
- Some helpful python functions might be: np.digitize(), np.linspace(), np.mean(); or you may find a more straightforward way using for loops.
- Aitoff projection plots of line intensities and ratio
- Discussion of results and implications
- Address what can be learned from repeated observations of the
same field
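For the binned averages, one possible approach with np.digitize() and np.linspace() is sketched below. The function name binned_mean, the bin edges, and the sample values are made up for illustration; you would substitute the latitude (or longitude) and intensity arrays from your table and choose your own bins.

```python
import numpy as np


def binned_mean(x, y, edges):
    """Average y in bins of x; returns (bin centers, mean of y per bin)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.asarray(edges, dtype=float)
    nbins = len(edges) - 1
    idx = np.digitize(x, edges) - 1    # bin i covers edges[i] <= x < edges[i+1]
    idx[x == edges[-1]] = nbins - 1    # put the right-most edge in the last bin
    means = np.full(nbins, np.nan)     # empty bins stay NaN
    for i in range(nbins):
        in_bin = idx == i
        if in_bin.any():
            means[i] = y[in_bin].mean()
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means


# Made-up sample values standing in for latitude (degrees) and intensity.
lat = [-80.0, -70.0, -10.0, 10.0, 70.0, 80.0]
intensity = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
edges = np.linspace(-90.0, 90.0, 4)    # three bins: [-90,-30), [-30,30), [30,90]

centers, means = binned_mean(lat, intensity, edges)
print(centers)
print(means)
```

Points that fall outside the range of the bin edges are ignored, and a bin with no points is reported as NaN rather than zero.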
Resources for learning Python
The resources below are a good place to start. If you want to do
something specific in python, a good first step is to search on the
internet, e.g. try searching for 'python median'.
Python 2.7 Download: http://www.python.org/download/releases/2.7.8/
Interactive, online tutorial: http://www.codecademy.com/tracks/python
PyFITS documentation: http://pythonhosted.org/pyfits/
NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial
Matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
Python official web site: http://www.python.org/
PyFits official site: http://www.stsci.edu/institute/software_hardware/pyfits
Question and answer site for programming: http://stackoverflow.com