Scientific Computing Using Python - PHYS:4905 - Fall 2018

Lecture #04 - 8/30/2018 - Prof. Kaaret


Code snippets

When you start using libraries to perform tasks, a large part of the effort is reading about the library to figure out what function calls you need to make in order to get it to do what you want.  After you do this once, it is a good idea to save that code so that you can copy it, rather than having to figure out everything again.  It is even easier, if you let someone else figure out how to interface with the library and then copy their snippets.  The web site stackexchange.com is a useful resource as are general web searches.  This lecture presents several code snippets that you may find useful.

Importing data

Often, particularly if you are an experimentalist, you'll want to process data produced by some instrument.  To do so, you'll need to get those data into Python.  That requires code that can read the data files and correctly interpret their format.

Reading text files -  A surprising amount of data comes in the form of text files.  Text files have the advantange that humans can read them in a text editor, like the one in Spyder, but the disadvantage that they tend to take more disk space than binary format files.  Most text data files are organized with each line in the file containing a related set of data.  For example, if you have a probe submerged in a river taking repeated measurements, you might have on one line: the time of the measurement, the temperature of the water, and the flow speed of the water.  The other lines would have the same format, but, of course, different data.

To read a text data file into Python, you need to know the format.  One common format is to have values separated by commas.  These are called comma seperated values files or csv files and are sometimes given the extension .csv.  Here is an example of a csv file containing orbit data for the HaloSat CubeSat, halosat_orbit.csv.  The first column is time whcih is given in TAI seconds since 1 January 2000.  International Atomic Time (TAI, from the French name temps atomique international) doesn't include leap seconds.  The other three columns give the position of the spacecraft in a reference frame centered on the Earth and rotating with the Earth (called ECEF). The z-axis extends through true north, which is basically the average direction of the Earth's rotational axis, and the x-axis intersects the sphere of the earth at 0° latitude (the equator) and 0° longitude (prime meridian in Greenwich).

The following program will read the file into Python and put the data into four numpy arrays called time, posx, posy, and posz.

# read in halosat_orbit.csv
import numpy as np
f = open('halosat_orbit.csv') # open the text data file
t1, t2, t3, t4 = [], [], [], [] # temporary list to hold data
for line in f : # read each line in the file in turn
  print(line)
  s1, s2, s3, s4 = line.split(',') # split the input line at the commas
  t1.append(float(s1))  # convert the string to a float and add to the list
  t2.append(float(s2))
  t3.append(float(s3))
  t4.append(float(s4))
f.close() # close the text data file
time = np.array(t1)  # make numpy arrays from the lists
posx = np.array(t2)
posy = np.array(t3)
posz = np.array(t4)

The second line opens the file with the given name and creates an object that points to the file.  The third line makes empty lists to store the data.  Why do we use lists rather than numpy arrays at this point?  The forth line sets up a loop that reads each line from the file, one at a time.  In the body of the loop, we first print out the line just read in.  You might want to comment this out after you know the code works.  Then we split the line into smaller strings at the commas; this gives us individual strings for each of the numbers without an commas.  Then we convert each string into a float and append it at the end of the appropriate list.  After the loop is done, we close the file, which is very important because we won't be able to open the file again until it is closed.  Then we convert the lists into numpy arrays.

Run the program to check if it works.  Note that it doesn't print anything out, so after running look at the variables in the Variable explorer pane.  Check if time has the same length as the position variables. 

People often have to read in text data file, so NumPy has a routine to do this called loadtxt.  The code about could be shortened to

(time, posx, posy, posz) = np.loadtxt('halosat_orbit.csv', delimiter=',', unpack=True)

The loadtxt routine reads in a text file and then splits each line into pieces, just like our code above.  The data file name is the only required argument.  Using delimiter=',' tells loadtxt to divide each line using commas.  The default output of loadtxt is a 2D array.  Using unpack=True makes loadtxt transposes the data so you can write each column to a variable using a tuple.  The routine has lots of options, as shown athttps://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html.  It allows comment lines in the file.  By default any line that starts with # is a comment, but you can change the character if you like.  Also, you can skip rows at the beginning of the file, useful if your file starts with a block of text describing the contents.  loadtxt does the job in most cases, but sometimes the data are formatted in a way that it can't handle.  In that case, you can write a custom routine starting with the longer code above.

Reading Flexible Image Transport System (FITS) files  - A disadvantage of using text files for data is that they tend to be large.  For example "175.5928192138672," is 18 bytes while a float in Python uses 8 bytes.  Also, it is difficult to preserve the structure of data in standard format text files like csv.  There are many, many data formats that store numbers in binary formats.  Examples include mp3 and wav for audo data, mp4 for video data, and Hierarchical Data Format for particle physics data (currently HDF5).  Astrophysicists like to use the Flexible Image Transport System of FITS.  FITS is a standard file format used for everything from images from optical telescopes to spacecraft orbit data to X-ray event lists.  FITS can store a great variety of data structures and preserve those structures.  FITS is also self commenting.  Just as commenting code is important, comment data may be even more important.  If you forget the conditions under which some data were obtained, those data often become useless.

This file halosat_orbit.fits contains orbit data for the HaloSat CubeSat in FITS format.  Let's run the program below to read the data into Python.

# read data from a FITS file
# import libraries
import numpy as np
import astropy.io.fits as pyfits

fitsfile = 'halosat_orbit.fits' # name of FITS file
f = pyfits.open(fitsfile) # read in the FITS file
h = f[1].header # copy the FITS header
d = f[1].data # put all of the data into the variable d
f.close() # close the FITS file

# move the data from d to numpy arrays
time = array(d['TIME']) # TAI Time
pos = array(d['POSITION']) # position
as a 3-component vector in ECEF [km]

The program uses the AstroPy library that includes a bunch of astronomically inclined functions and ones to read FITS files.  The name of the file is in fitsfile.  The function open in the astropy.io.fits library, which we have decided to refer to with the name pyfits, is similar to the open function built into Python, but reads FITS files.  It returns an object that points to the file, similar to the standard open function.  However, FITS files can have multiple extensions, each of which can contain different types of data.  The data of interest in our file is in extension 1, so we access it using f[1].  After you run the code, try typing f[1].header at the command line.  This will print out the header for extension 1 containing a description of the data organized as a bunch of keywords with values and descriptions.  NAXIS2 tells you how many data points you have.  The TTYPE1, TFORM1, and TUNIT1 keywords give you a description, format, and units for the TAI time data. You get at the data in extension 1 using f[1].data.  We put that data into the variable d and then close the file.  The last two lines move data from d into numpy arrays.  Note that position is a three component vector.  Using FITS, we can preserve the structure of the original data (as a vector rather than 3 floats).


Exporting data

You may also want to save the results of calculations that you have done with Python or you may want to use Python to acquire data from instruments and save those to files. 

Writing text files -  So far, we have printed to the console in order to get results out.  If you have a large set of results, you may want to print to a file instead.  The open function can also be used to create and write to files.  Let's try the program below.

# write squares from 1 to 9 to a csv file
import numpy as np
f = open('squares.csv', 'w') # open the file for writing
f.write('# integer, square of integer\n')
for x in np.arange(1, 10) :
  f.write(str(x)+', '+str(x**2)+'\n')
f.close()

In the third line, adding a second parameter of 'w' to the open statement makes open create a new file and write to it rather than read from it.  If the file already exists, it will be overwritten, so be careful about your files.  In the fourth line, we use the object f associated with the file, sometimes called a handle, to write a comment line to the file.  The comment starts with a # and ends with a \n.  Back slashes are used in Python (and C and C++) to indicate special characters.  \n is a newline.  In contrast with the print function, we have to explicitly add newlines when using the write function.  We then loop over the integers from 1 to 9 and write a line to the file with each new integer and its square, separated by a comma and ending the line with \n.  To finish, we close the file.  That makes sure that everything gets recorded.  If you forget to close the file, then sometime the last few writes can get lost.

Writing FITS files  - Writing the same data to a FITS file is slightly more complicated.  The piece of the code that wrote the HaloSat FITS file you used above is at write_fits.py.  It will make errors if you run it because the rest of the code that filled the data arrays is not included.  Many of the lines are used to make the file self documenting.  For example, it is standard when writing FITS files that one includes a MISSION keyword that specifies the mission or the name of the ground-based telescope or instrument that produced the data.  Also, one typically records when the file was created and the time range over which the data were collected.  This particular code snippet stores individual NumPy arrays are columns in the file.  The units for each column and a short description of the data are included in the file.  The documentation for writing FITS files along with examples is available at http://docs.astropy.org/en/stable/io/fits/.

Note that I haven't discussed NumPy's built in procedures to store data.  I do not recommend using them since they introduce yet another data format that is not particularly self documenting.  If you want your data to have a long life, use a self-documenting file format like FITS or a text file format with comments.

Plotting

There are no functions to make plots built into Python.  However, there is a very extensive plotting package called Matplotlib that is very widely used in scientific applications of Python.  The best way to learn Matplotlib is to look at examples.  Let's start with the
program below.  Copy it into Spyder, save it with an appropriate file name, and run it.

# plot y = x^2 - 9x for x = 0 to 10
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 10.0, 100) # generate 100 evenly spaced points from 0 to 10
y = -x**2 + 9*x

plt.ion() # make plotting interactive
plt.figure('Quadratic') # make a plot window called Quadratic
plt.clf() # clear the plot window
plt.xlabel('X') # label the X-axis
plt.ylabel('Y') # labdel the Y-axis
plt.title('Quadratic  y = -x^2 + 9x') # give the plot a title
#plt.xscale('log') # scale is linear by default
#plt.yscale('log')
plt.plot(x, y, '-k', label='quadratic') # plot the quadratic as a solid, black line
plt.plot(x, 0*x, ':r', label='zero') # plot y=0 as a dotted red line
plt.legend(loc='upper left') # add a legend to explain the curves
plt.show() # display the plot

To begin, we need to import the pyplot part of the matplotlib library and we call it plt for convenience.  Then, we fill the x with our chosen plot range and then compute y according to our chosen equation.  All of the lines starting with plt are calls to matplotlib.  There are several different ways to access matplotlib and pyplot provides a particularly easy one.  The individual calls are documented.  The basic idea with pyplot is that we create a plot window and then the following commands modify the plot(s) in that window.  One can then create another plot window and work on a new plot in that window. 

Making Spyder use external plot windows and saving plots for use elsewhere  By default, Sypder will put plots in the console pane.  A great feature of matplotlib is that you can interact with the plots and this is lost if the plot is in the console.  To make Spyder give each plot its own window, do Tools/Preferences then click on 'Ipython console' and then on the Graphics tab and change 'Graphics backend' to Automatic.  Then click Apply.  Now run your the plotting program again and it should produce a new window for the plot.  If you move your mouse cursor inside the plot, you'll see the coordinates of the cursor in the plot variables.  You can use the magnifying glass button to zoom in on a region of the plot and the home buttom to get back to the whole plot.  You can save the plot with the floppy disk button.  You can choose a variety of file formats.  I use png (Portable Network Graphics) for saving plots for presentations and eps (encapsulated postscript) for saving plots for publications.  The textbook suggest svg (scalable vector graphics) for saving plots in a format that you can then edit if you like.

Matplotlib resources - Matplotlib is extremely capable.  If you have an idea of a plot that you would like to make, go to https://matplotlib.org/gallery.html and see if there is an image similar to what you want.  You can then click on image to find the code that produced it.

Assignment

The remainder of today's class will be devoted to working through chapter 4 of the textbook where all of these topics are covered.  As you read the chapter,  enter the commands in the purple highlights into Spyder and see what you get.  When you are done with the chapter, do the fourth assignment which will be due at the start of class on Tuesday, September 4.

HW #4 is due at the beginning of class on Tuesday, September 4.
https://homepage.physics.uiowa.edu/~pkaaret/2018f_p4905/hw04.html