Often, particularly if you are an experimentalist, you'll want to
process data produced by some instrument. To do so, you'll
need to get those data into Python. That requires code that
can read the data files and correctly interpret their format.
Reading text files - A surprising amount of data
comes in the form of text files. Text files have the
advantage that humans can read them in a text editor, like the one
in Spyder, but the disadvantage that they tend to take more disk
space than binary format files. Most text data files are
organized with each line in the file containing a related set of
data. For example, if you have a probe submerged in a river
taking repeated measurements, you might have on one line: the time
of the measurement, the temperature of the water, and the flow
speed of the water. The other lines would have the same
format, but, of course, different data.
To read a text data file into Python, you need to know the
format. One common format is to have values separated by
commas. These are called comma separated values files or csv
files and are sometimes given the extension .csv. Here is an
example of a csv file containing orbit data for the HaloSat
CubeSat, halosat_orbit.csv.
The first column is time which is given in TAI seconds since 1
January 2000. International Atomic Time (TAI, from the
French name temps atomique international) doesn't include leap
seconds. The other three columns give the position of the
spacecraft in a reference frame centered on the Earth and rotating
with the Earth (called ECEF). The z-axis extends through true
north, which is basically the average direction of the Earth's
rotational axis, and the x-axis intersects the sphere of the earth
at 0° latitude (the equator) and 0° longitude (prime meridian in
Greenwich).
The following program will read the file into Python and put the
data into four numpy arrays called time, posx, posy, and posz.
# read in
halosat_orbit.csv
import numpy as np
f = open('halosat_orbit.csv') # open the text data file
t1, t2, t3, t4 = [], [], [], [] # temporary list to hold data
for line in f : # read each line in the file in turn
print(line)
s1, s2, s3, s4 = line.split(',') # split the input line
at the commas
t1.append(float(s1)) # convert the string to a
float and add to the list
t2.append(float(s2))
t3.append(float(s3))
t4.append(float(s4))
f.close() # close the text data file
time = np.array(t1) # make numpy arrays from the lists
posx = np.array(t2)
posy = np.array(t3)
posz = np.array(t4)
The second line opens the file with the given name and creates an
object that points to the file. The third line makes empty
lists to store the data. Why do we use lists rather than
numpy arrays at this point?
The forth line sets up a loop that reads each line from the file,
one at a time. In the body of the loop, we first print out
the line just read in. You might want to comment this out
after you know the code works. Then we split the line into
smaller strings at the commas; this gives us individual strings
for each of the numbers without any commas. Then we convert
each string into a float and append it at the end of the
appropriate list. After the loop is done, we close the file,
which is very important because we won't be able to open the file
again until it is closed. Then we convert the lists into
numpy arrays.
Run the program to check if it works. Note that it doesn't
print anything out, so after running look at the variables in the
Variable explorer pane. Check if time has the same length as
the position variables.
People often have to read in text data file, so NumPy has a routine to do this called loadtxt. The code above could be shortened to
(time, posx, posy,
posz) = np.loadtxt('halosat_orbit.csv', delimiter=',',
unpack=True)
The loadtxt
routine reads in a text file and then splits each line into
pieces, just like our code above. The data file name is the
only required argument. Using delimiter=',' tells loadtxt to divide
each line using commas. The default output of loadtxt is a 2D
array. Using unpack=True
makes loadtxt
transpose its output so you can write each column to a variable
using a tuple on the left hand side of the equal sign. The
routine has lots of options, as shown at https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html.
loadtxt allows
comment lines in the file. By default any line that starts
with # is a comment, but you can change the character if you
like. Also, you can skip rows at the beginning of the file;
this is useful if your file starts with a block of text describing
the contents. loadtxt
does the job in most cases, but sometimes the data are formatted
in a way that it can't handle. In that case, you can write a
custom routine starting with the code in the first example.
Reading Flexible Image Transport System (FITS) files - A
disadvantage of using text files for data is that they tend to be
large. For example "175.5928192138672," is 18 bytes while a
float in Python uses 8 bytes. Also, it is difficult to
preserve the structure of data in standard format text files like
csv. There are many, many data formats that store numbers in
binary formats. Examples include mp3 and wav for audo data,
mp4 for video data, and Hierarchical Data Format for particle
physics data (currently HDF5). Astrophysicists like to use
the Flexible Image Transport System of FITS. FITS is a
standard file format used for everything from images from optical
telescopes to spacecraft orbit data to X-ray event lists.
FITS can store a great variety of data structures and preserve
those structures. FITS is also self commenting. Just
as commenting code is important, comment data may be even more
important. If you forget the conditions under which some
data were obtained, those data often become useless.
This file halosat_orbit.fits
contains orbit data for the HaloSat CubeSat in FITS format.
Let's run the program below to read the data into Python.
# read data from a
FITS file
# import libraries
import numpy as np
import astropy.io.fits as pyfits
fitsfile = 'halosat_orbit.fits' # name of FITS file
f = pyfits.open(fitsfile) # read in the FITS file
h = f[1].header # copy the FITS header
d = f[1].data # put all of the data into the variable d
f.close() # close the FITS file
# move the data from d to numpy arrays
time = array(d['TIME']) # TAI Time
pos = array(d['POSITION']) # position as a 3-component vector in ECEF
[km]
The program uses the AstroPy library that includes a bunch of
astronomically inclined functions and ones to read FITS
files. The name of the file is set in the variable fitsfile.
The function open in the astropy.io.fits library, which we have
decided to refer to with the name pyfits, is similar to the open
function built into Python, but reads FITS files. It returns
an object that points to the file, similar to the standard open
function. However, FITS files can have multiple extensions,
each of which can contain different types of data. The data
of interest in our file is in extension 1, so we access it using
f[1]. After you run the code, try typing f[1].header at the
command line. This will print out the header for extension 1
containing a description of the data organized as a bunch of keywords
with values and descriptions. NAXIS2 tells you how many data
points you have. The TTYPE1, TFORM1, and TUNIT1 keywords
give you a description, format, and units for the TAI time data.
You get at the data in extension 1 using f[1].data. We put
that data into the variable d and then close the file. The
last two lines move data from d into numpy arrays. Note that
position is a three component vector. Using FITS, we can
preserve the structure of the original data (as a vector rather
than 3 floats).
You may also want to save the results of calculations that you
have done with Python or you may want to use Python to acquire
data from instruments and save those to files.
Writing text files - So far, we have printed to the
console in order to get results out. If you have a large set
of results, you may want to print to a file instead. The
open function can also be used to create and write to files.
Let's try the program below.
# write squares from
1 to 9 to a csv file
import numpy as np
f = open('squares.csv', 'w') # open the file for writing
f.write('# integer, square of integer\n')
for x in np.arange(1, 10) :
f.write(str(x)+', '+str(x**2)+'\n')
f.close()
In the third line, adding a second parameter of 'w' to the open
statement makes open create a new file and write to it rather than
read from it. If the file already exists, it will be
overwritten, so be careful about your files. In the fourth
line, we use the object f associated with the file, sometimes called a
handle, to write a comment line to the file. The comment
starts with a # and ends with a \n. Back slashes are used in
Python (and C and C++) to indicate special characters. \n is
a newline. In contrast with the print function, we have to
explicitly add newlines when using the write function. We
then loop over the integers from 1 to 9 and write a line to the
file with each new integer and its square, separated by a comma
and ending the line with \n. To finish, we close the
file. That makes sure that everything gets recorded.
If you forget to close the file, then sometime the last few writes
can get lost.
Writing FITS files - Writing the same data to a
FITS file is slightly more complicated. The piece of the
code that wrote the HaloSat FITS file you used above is at write_fits.py. It will make
errors if you run it because the rest of the code that filled the
data arrays is not included. Many of the lines are used to
make the file self documenting. For example, it is standard
when writing FITS files that one includes a MISSION keyword that
specifies the mission or the name of the ground-based telescope or
instrument that produced the data. Also, one typically
records when the file was created and the time range over which
the data were collected. This particular code snippet stores
individual NumPy arrays as columns in the file. The units
for each column and a short description of the data are included
in the file. The documentation for writing FITS files along
with examples is available at http://docs.astropy.org/en/stable/io/fits/.
There are no functions to make plots built into Python.
However, there is a very extensive plotting package called
Matplotlib that is very widely used in scientific applications of
Python. The best way to learn Matplotlib is to look at
examples. Let's start with the
program below. Copy it into Spyder, save it with an
appropriate file name, and run it.
The remainder of today's class will be devoted to working through chapter 4 of the textbook where all of these topics are covered. As you read the chapter, enter the commands in the purple highlights into Spyder and see what you get. When you are done with the chapter, start on homework #4.
HW #4 is due at the beginning of the
next class. The random walk assignment is a reasonably
challenging programming task and additional help is available if
you ask.