Week 1: Overview of the course


Welcome to the astrophysics advanced major sequence. Our purpose is to understand how physics plays out in the cosmos, and to see how studying the cosmos has been central to advancing our physics. Our goal will be to do more than describe, but to actually explain-- although that requires fitting these ideas into our limited brains, which necessitates simplification and idealization. Navigating simplification and idealization, while still obtaining useful quantities, is one of the central themes of both physics and, especially, astronomy. But first we must better understand the goal of any explanation.

What's in an explanation

Although there will not be a test question on what constitutes a good explanation, there will be many test and homework questions that require an explanation as part of the answer, so it is worth understanding the elements of a good scientific explanation. It is also worth the effort because part of understanding is being able to explain things to yourself and your peers, and because you may soon be called on to be a teacher or teaching assistant.

Explanations vs. descriptions

A good explanation does more than describe the answer: it moves through the "how" into the "how come." This means it must empower us to make predictions not just about when the phenomenon being explained will occur, but also about when it should not occur. If the explanation does not pass that test, it is not an explanation, it is just a description.

Example: Saying that planets orbit in the same plane because they form from disks is just a description-- an explanation also tells why disks themselves form, as that is a necessary part of the same question. So we must say that disks form from gas that has angular momentum that must be conserved, and also the virial theorem must be obeyed. The latter requires that the characteristic speed of the orbiting particles must scale like the inverse square root of the characteristic distance from the star (right?), whereas the former says that the mass of all the particles that carry the angular momentum (so orbit in the same sense in roughly the same plane), times their characteristic speed, must scale like the inverse of the characteristic distance from the star (yes?). Put those two statements together, and you can see that the mass of the stuff that is carrying the angular momentum must scale like the inverse square root of the characteristic distance to the star! In short, as the orbiting gas cloud shrinks (to satisfy the virial theorem as energy is dissipated), more and more mass has to be participating in the angular momentum, i.e., more and more of the mass has to be in a disklike orbit. This is an explanation, and it also tells you what the characteristic radius of the disk must be once it forms, given its mass and angular momentum.
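To make that chain of scalings explicit, here is a minimal worked version of the argument (the symbols M_*, r, M_d, and L are introduced just for this sketch: the stellar mass, the characteristic orbital distance, the mass carrying the angular momentum, and that angular momentum):

    % virial theorem: orbital speed around the central mass
    v \sim \sqrt{G M_* / r} \;\propto\; r^{-1/2}
    % angular momentum conservation for the gas that carries L
    L \sim M_d \, v \, r = \mathrm{const} \;\Rightarrow\; M_d \, v \;\propto\; r^{-1}
    % combine the two scalings
    M_d \;\propto\; \frac{1}{v \, r} \;\propto\; r^{-1/2}
    % once M_d and L are fixed, the characteristic disk radius follows
    r_{\mathrm{disk}} \sim \frac{L^2}{G M_* M_d^{\,2}}

So as r shrinks, M_d must grow: more and more of the mass has to sit in disklike orbits, and the final radius is fixed by the disk's mass and angular momentum, just as stated above.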

Now, of course the real world is never really that simple-- if the angular momentum really stayed constant during gravitational contraction, we could get disks easily enough, but we could not get the central star! So there actually must be some way to extract angular momentum from a cloud undergoing gravitational collapse. The gas that is losing a large amount of its angular momentum can form the star, and the gas that keeps most of its angular momentum is what forms the disk and the planets. What separates these populations of gas is a complicated story that is a subject of modern research. So we see that a good explanation not only describes what happens, it allows us to estimate the central quantities involved, and it also brings us into contact with the research frontier when we push the explanation beyond its simplest idealizations. Note how just saying "the planets are in a plane because they form from a disk" does none of those things!

Explanations are not unique

An explanation is not a specific prediction, so it is not unique-- there may be many ways to explain the same phenomenon, all making the same predictions. But a good explanation should exhibit certain basic properties, some of which I suggest below.

Example: You could explain orbits by saying that objects both fall and move sideways at the right rate to maintain a fixed distance, which is a kinematic description, or you could explain them dynamically by saying that objects moving in a circle are accelerating toward the center, and that acceleration is due to the force of gravity. Or you could enter a rotating frame and say the balance is between gravity and the centrifugal force. There are many options, and the more you know, the more useful and powerful that knowledge is. There is value in being able to provide multiple explanations, because you may understand one better than another, and it is a good way to catch your own misconceptions.
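To put the dynamical and rotating-frame versions side by side, here is a minimal sketch for a circular orbit of radius r around a mass M (symbols chosen just for this illustration):

    % inertial frame: centripetal acceleration supplied by gravity
    \frac{v^2}{r} = \frac{G M}{r^2} \;\Rightarrow\; v = \sqrt{\frac{G M}{r}}
    % rotating frame, with \Omega = v/r: gravity balanced by the centrifugal force
    \frac{G M m}{r^2} = m \, \Omega^2 r

Both say the same thing as the kinematic version: the object falls toward the center at just the rate needed to keep its distance fixed while it moves sideways.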

Explanations are unifying

An explanation that only works on the exact process it was created for is of no value; it is essentially just a restatement that the process happens because that process happens. So a good explanation must encompass more than just the process it is being used to understand-- it must unify that process with other processes that were not already obviously the same. Unification is central to gaining conceptual power over phenomena, because once we recognize similarities in seemingly disparate phenomena (the resonances of a musical instrument and the resonances of a spectral line, for example), that is when we start to create an understanding that has predictive power in a wide range of situations.

Use what is already known

Unification is particularly powerful when it connects to other things that are already better understood than the phenomena in question. Often the person asking the question (which may be you, of course) has prior knowledge of some aspects of the issue, and they certainly have common experiences that people share. A good explanation is framed so as to make contact in some way with these common experiences and basic pre-existing knowledge. Sometimes that means placing yourself in the shoes of the person asking the question (which is particularly easy if that person actually is you). If you are presenting the explanation for grading, putting yourself in the grader's shoes means trying to hit on the key points that are being looked for in the grading rubric. That means avoiding verbose answers that mostly repeat vague claims, and instead looking for concise and pointed "bullets" that are crucial to the explanation and will score points in the grading. If you are presenting the explanation to a student, putting yourself in their shoes means finding the concepts they already understand, and building the explanation out of those concepts. If you are asking yourself for your own explanation (which you should do constantly as a student), notice the key concepts that come up, and make sure you have a good grasp of them-- explaining things to yourself is how you find out what you do and do not yet understand.

Example: Many people can assimilate the information in a diagram more quickly than in words, so take advantage of this general ability to interpret figures whenever possible. This is not just true in learning and in teaching, it is also true in oral presentations and in research papers, which are fundamentally about explaining things. Often it helps to think of any explanation as centering on the understanding of a single key figure, and all the logical steps necessary to make sense of that figure.

Kill the misconceptions

The flip side of connecting with what is already known is that it can bring pre-existing misconceptions into the question, and if the new answer does not explicitly displace those misconceptions, it can get merged with them in a rather grotesque way. Always look for inconsistencies in the things you hold to be true, because you will not automatically notice them without looking for them, whereas the exams you are given will likely reveal them if you have not already.

Example: If you tell students that an orbit represents some kind of balance involving the force of gravity, they might think the balance has to do with the action/reaction pair involved in the mutual gravity between the two objects, or between the gravities of those objects and other orbiting planets. If you do not specifically kill those misconceptions (say by pointing out that the action/reaction pair balance is responsible not for an orbit but for the conservation of momentum of the system as a whole, and by saying that an orbit can exist even if there are no other orbiting planets), then the misconceptions will become embedded in what otherwise seems to be a successful explanation.

Seek contrasts and balances

Our brains are adept at thinking in terms of contrasts, and in terms of opposing aspects that are either in balance or imbalance. Often explanations can be most powerful when they are framed in these same terms, a kind of "yin-yang" approach where understanding is empowered by manipulating the tension between the opposites.

Example: The above explanation of the formation of disks can be viewed as resulting from the confrontation between Kepler's second and third laws; tides can be viewed as caused by the contrast between the gravitational acceleration at various points; and the temperature of the Earth's surface can be explained in terms of a balance between the visible light absorbed from the Sun and the infrared light emitted by the ground.
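The last of those balances can be written out as a worked equation. A minimal sketch, treating the Earth as a blackbody with albedo A that absorbs sunlight over its cross section and radiates over its whole surface (symbols introduced just for this sketch):

    % absorbed sunlight = emitted infrared
    \frac{L_\odot (1 - A)}{4 \pi d^2} \, \pi R_\oplus^2 \;=\; 4 \pi R_\oplus^2 \, \sigma T^4
    \;\Rightarrow\; T = \left[ \frac{L_\odot (1 - A)}{16 \pi \sigma d^2} \right]^{1/4} \approx 255\ \mathrm{K} \quad (A \approx 0.3)

The roughly 30 K gap between that number and the actual mean surface temperature is the greenhouse effect, which is itself another balance.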

Make contact with what is fundamentally unknown

Any answer to a "why" or "how come" question can be used to generate new "why" or "how come" questions based on that answer. This produces a chain of questions and answers that must eventually result in the answer "I don't know." But a truly good explanation will result, at the end of that chain, in the answer "no one knows." When that happens, you know you have connected the initial question to some fundamental scientific unknown, and then you are truly finished. You should not feel disappointed when you encounter such a fundamental uncertainty, you should instead marvel-- whether you are asking yourself the question, or someone else is. Science is not designed to completely demystify a question, it is designed to replace superficial mysteries with more profound ones. The advancement of physics involves increasingly profound questions, like replacing "why is energy conserved" with "why, and under what circumstances, does the universe exhibit a symmetry in regard to the passage of time." This allows us to look for reasons behind the time symmetry, and helps us understand how violation of that symmetry, such as in the Big Bang, leads to violation of that conservation law.
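That particular replacement is Noether's theorem in action. A minimal sketch of the connection, in Lagrangian language (assuming only that you have met a Lagrangian \mathcal{L}(q, \dot{q}, t) before):

    % the Hamiltonian (energy) changes only if the Lagrangian depends explicitly on time
    H \equiv \sum_i p_i \dot{q}_i - \mathcal{L}, \qquad \frac{dH}{dt} = -\frac{\partial \mathcal{L}}{\partial t}

So if the laws contain no explicit dependence on time, energy is conserved; when that symmetry fails, as in the expanding universe of the Big Bang, so does the conservation law.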

Additional examples

Why is the sky blue?
Particles in the air can scatter light; that's why we can see dust in a sunbeam or vapor from a tea kettle, or even air itself (in the case of the blue sky). The reason the sky is blue is that air scatters blue light better than red light-- which is also why the Sun looks red when setting: a long column of air intervenes when the Sun is low, and this preferentially removes the blue light that the air scatters. The "missing" blue light contributes to someone else's blue sky, leaving your red sunset. The explanation is not done though-- the reason blue light scatters better than red is that the frequency of oscillation of visible light, though very fast, is actually quite slow compared to the resonant modes of oscillation of air particles. This means the particles respond quickly compared to the oscillations in the electric field of the light, allowing the particle response to be nearly in force balance with the electromagnetic forces from the light. Hence if the red and blue light from the Sun are equally bright (which is why the Sun looks so white), then the amplitude of the red-light-induced oscillations is the same as the amplitude of the blue-light-induced oscillations. But the emission of light by an oscillating dipole is proportional to the square of the acceleration, which is the square of the amplitude times the fourth power of the frequency. With no difference in amplitude, all that remains is the dependence on frequency to the fourth power, hence blue light scatters about 4 times more than does red light (as blue light has a frequency about 1.4 times that of red light, and 1.4 to the 4th power is about 4). This preferential scattering of blue light is called "Rayleigh scattering."
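The factor-of-four arithmetic at the end is easy to check; here is a minimal sketch in Python (the wavelengths 450 nm and 650 nm are just representative choices for "blue" and "red"):

    # Rayleigh scattering efficiency scales as frequency^4, i.e. wavelength^-4
    lam_blue = 450e-9   # a representative blue wavelength, in meters
    lam_red = 650e-9    # a representative red wavelength, in meters

    freq_ratio = lam_red / lam_blue      # nu_blue / nu_red, since nu = c / lambda
    scatter_ratio = freq_ratio ** 4      # relative scattering efficiency, blue vs. red

    print(f"frequency ratio  ~ {freq_ratio:.2f}")     # about 1.4
    print(f"scattering ratio ~ {scatter_ratio:.1f}")  # about 4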

Notice that this explanation connects to what is already known (dust in sunbeams and steam from a kettle), and provides a kind of yin-yang contrast between the blue sky and the red setting Sun. Finally, it connects with a fundamental unknown, which is why an accelerating charge radiates the way it does (you could take that one more step by deriving the Larmor formula, but you still don't know why charges couple to light that way-- no one does).

Another example: why is the ocean blue?
The ocean is blue because a body of water mirrors whatever color light you shine on it, and the sky is generally blue, so the ocean generally mirrors that and looks blue. Water is not inherently blue, as you know if you've ever seen the red setting Sun reflected on the ocean, or the white Moon. (Judge for yourself if this answer connects with what is known, kills misconceptions, and offers contrasts in colors. What fundamental unknowns does it make contact with?)

Why are high-mass main-sequence stars so much more luminous than the Sun?
Stars are leaky buckets of light. Thus their luminosity depends on how much light is in the star at any given time, divided by how long it takes to leak out. A higher-mass star will reach fusion temperatures in its core when it still has a much larger radius than the Sun, because its large mass makes its gravity more effective at heating it as it contracts to a given size. Since the size of the "bucket" is set by when the core reaches fusion temperatures, a high-mass main-sequence star will have a larger volume than a low-mass one. The energy density in the light, on the other hand, is similar, because that depends only on temperature, and both high- and low-mass main-sequence stars have similar temperatures (as set by fusion). At fixed opacity, the characteristic diffusion time is also similar, because it scales with the radius times the optical depth, which scales with the mass/radius ratio-- which is nearly constant, again because that ratio characterizes the core temperature by the virial theorem. Thus the diffusion time for the light to get out is not sensitive to mass, and all that ends up mattering is the total radiant energy in the "bucket," i.e., the volume of the star-- which is much larger for high-mass main-sequence stars (even though they are all considered "dwarf stars"). Note that the value of the fusion temperature was never mentioned, nor anything about how the fusion rate depends on anything, as none of those things matter except that the fusion acts to regulate the core temperature.
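Here is the "leaky bucket" argument as an explicit chain of scalings (a sketch at fixed opacity kappa, with V the stellar volume, tau the optical depth, and t_diff the photon diffusion time; symbols introduced just for this sketch):

    % luminosity ~ radiant energy content / time to leak out
    L \;\sim\; \frac{a T^4 \, V}{t_{\mathrm{diff}}}, \qquad
    t_{\mathrm{diff}} \;\sim\; \frac{R \, \tau}{c} \;\sim\; \frac{R}{c} \, \kappa \rho R \;\sim\; \frac{\kappa M}{R \, c}
    % virial theorem at a fixed (fusion-regulated) core temperature: M/R is roughly constant
    \Rightarrow\; t_{\mathrm{diff}} \;\approx\; \mathrm{const}, \qquad
    L \;\propto\; V \;\propto\; R^3 \;\propto\; M^3

That luminosity scaling, roughly the cube of the mass, is in the right neighborhood of the observed mass-luminosity relation for stars whose opacity is nearly constant (electron scattering).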

Why are some stars red and others white?
Red stars have lower surface temperatures, which means lower average energy per particle, because a lower temperature reservoir is loath to give up as much energy to each mode of particle motion (that's what temperature means). Light comes in quanta called photons, and each photon is made by a single discrete process that requires a quantum of energy equal to what the photon will have, so whenever such quanta of energy are rare in the processes that create the photons (as for blue photons from stars with low surface T), the blue photons will be rare as well, whereas red ones will not be. But at high surface T, the blue photons will be just as common as red, because the high-temperature reservoir is similarly able to excite either one, and the star looks white. A star can even look a bit blue, because being higher-momentum particles, blue photons actually have more states in phase space available to them!
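A minimal numerical sketch of this, comparing the blackbody (Planck) radiance at a representative blue and red wavelength for a cool and a hot surface (the particular temperatures and wavelengths are just illustrative):

    import numpy as np

    h, c, k = 6.626e-34, 2.998e8, 1.381e-23   # Planck constant, speed of light, Boltzmann constant (SI)

    def planck(lam, T):
        """Blackbody spectral radiance B_lambda(T) in SI units."""
        return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * k * T))

    lam_blue, lam_red = 450e-9, 650e-9
    for T in (3500.0, 10000.0):               # a cool (red) surface vs. a hot surface, in kelvin
        ratio = planck(lam_blue, T) / planck(lam_red, T)
        print(f"T = {T:6.0f} K : blue/red radiance ratio ~ {ratio:.2f}")

At the cool temperature the blue radiance comes out several times fainter than the red; at the hot temperature the blue is actually a bit brighter, which is the white-to-bluish appearance described above.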

Other insights from the laws of physics
Understanding the laws not only allows us to find explanations, it also can give us perspective for framing the rest of what we know. For example, it is often held that astronomy makes humans seem very small, since our ~ 1 meter scale is dwarfed by some 27 orders of magnitude in size by the diameter of the observable universe, and a 1 with 27 zeroes behind it is surely a ghastly large number. But the laws of physics tell us several things that are not normally included in that story:
1) The laws of statistical mechanics say that the conditions necessary to produce systems as complex and organized as human intelligence must be incredibly unlikely to achieve and maintain. Thus, in order for them to appear, there has to be available a spectacular number of opportunities. This requires a vast universe-- a universe with a single Sun and a single earthlike planet, as the ancient Greeks envisioned, would certainly never suffice.
2) The combination of the laws of quantum mechanics (via the parameter h), special relativity (via the parameter c), and general relativity (via G) allows these parameters to be combined in only one way to create a length scale. That scale is called the Planck length, the square root of hG/c^3, and it is some 35 orders of magnitude smaller than a human being. So if we seem small in a universe that is at least 27 orders of magnitude larger than we are, we must remember that we are even more, 35 orders of magnitude, larger than the size scale on which the universe can function in the way we understand. If we think of the Planck length and the Planck time (some 10^-44 seconds) as the fundamental bricks of which space and time are built, then we ourselves are built of a colossal number of those spatial bricks, and the timescales on which we act are a colossal number of those temporal bricks. If you think the Ents of Middle Earth were large and slow, you haven't met us!
Incidentally, another aspect of the Planck length and Planck time is that they give the scales on which the laws of physics as we know them break down. Specifying the location of an event to within the accuracy of the Planck length, or the moment when it happened to within the accuracy of the Planck time, requires, by the uncertainty principle, an energy whose mass equivalent is the one combination of h, c, and G that has units of mass. This is the Planck mass, and it is surprisingly large, on the 10^-8 kg scale, an amount of mass you could hold in your hand and recognize is there. But that is also the amount of mass that, if confined to the Planck length, would create a black hole, and destroy the system you were trying to locate to that accuracy. For this reason, and other issues about how the laws as we know them combine, we can say that the laws are built to break down. They tell us just where they cannot work any more (they might fail sooner than that, but they tell us they must fail there). This is often regarded as a terrible problem for the laws, but I would argue it is actually not a bug but a feature. Ironically, laws that cannot tell us when they must fail are more clearly incomplete than laws that do!
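All of these numbers follow directly from the constants; a minimal sketch in Python (using hbar rather than h, which changes the results only by a factor of order unity and none of the orders-of-magnitude statements):

    import math

    G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
    c = 2.998e8        # speed of light, m/s
    hbar = 1.055e-34   # reduced Planck constant, J s

    l_planck = math.sqrt(hbar * G / c**3)   # ~1.6e-35 m, some 35 orders of magnitude below ~1 m
    t_planck = math.sqrt(hbar * G / c**5)   # ~5.4e-44 s
    m_planck = math.sqrt(hbar * c / G)      # ~2.2e-8 kg

    # The Schwarzschild radius of a Planck mass is comparable to the Planck length,
    # which is the "built to break down" statement in numbers.
    r_s = 2 * G * m_planck / c**2           # ~3e-35 m, i.e. about two Planck lengths

    print(f"Planck length: {l_planck:.2e} m")
    print(f"Planck time:   {t_planck:.2e} s")
    print(f"Planck mass:   {m_planck:.2e} kg")
    print(f"Schwarzschild radius of the Planck mass: {r_s:.2e} m")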
So these are all examples of insights that understanding the laws, not just using them to make predictions or solve problems, can provide to us. It is very much our goal to capitalize on such insights and explanations, and not just predictions and tests of the laws, in all that follows in this course. As a result, some of the questions we will pose will be hypothetical, "what if" kinds of questions; they might at first seem perplexing, but gaining this kind of insight is the purpose behind them.

Scientific knowledge comes in the form of testing and honing models
This may seem obvious, but the point is there are two types of quantities in science: the parameters we build into our quantitative models, and the data we use to test and hone those models. It is an important distinction, as the two types of numbers often get confused in informal descriptions of how science works. I will use the notation x for an observable datapoint, and y for a model quantity used to make sense of x. Hence x comes from experience, and is empirical, while y comes from conceptualization, and is rationalized. The interplay between observation and theory that is the central core of the scientific method involves using new x to better constrain y, and the current state of y to motivate new and better x values. Ranges in x constitute what we call "error", and ranges in y constitute what we call "uncertainty", but the two are never the same thing. We often take pains to teach students the difference between precision and accuracy, which are types of observational error, but we rarely make the equally important distinction between the error in a measurement and the uncertainty in the parameters of a model built to understand those measurements. Finally, neither the data x, nor the model parameters y, are ever the same thing as "what really is," which I will call z. Science consists of better x producing better y; there is never any z in the formal process, and this can be a source of considerable confusion.
In particular, the existence of a successful scientific model does not displace the possibility of other successful models that are quite different, and whichever model has higher accuracy may be regarded as "more correct", but that does not make it better for all purposes, nor does it suggest we are "converging on absolute correctness". Absolute correctness is not the way science works, and more importantly, it is not even the way science is intended to work. These subtle but important distinctions are central to establishing some form of scientific rigor, a topic that receives little attention and can lead to significant misunderstanding. It has also led to upheaval and strife in history, because when science itself is not understood, scientific revolutions cannot be put in their proper context.
To give a better sense of what I mean, consider that I was once asked if I believe in the existence of dark energy, a topic that has already generated a Nobel prize. I paused, unable to understand what I was being asked, but I assumed what they really wanted was for me to use my scientific experience and training to assess whether the model we call dark energy is our currently most successful model for explaining the data at hand. So I said it was the current best model for explaining existing data and making predictions, with the highest probability of success for the future observations it motivates, which is all we ever expect from a scientific model. But I never said that I believed in it, because I have no such relationship to the dark energy model, and they shouldn't care if I did, because answering in those terms would mean leaving my role as a scientist.

The geocentric vs. heliocentric model of the solar system
The classic example of the difference between data x, model attribute y, and absolute truth z, is the history of solar system models. This is a beautiful example of how science works, how it does not work, and why failing to understand the difference caused no small inconvenience to a man named Galileo-- and continues to generate societal upheaval even to this very day. Here y is geocentric vs. heliocentric, and x is the nature of gravity, the smallness of stellar parallax, the retrograde motion of the planets, the moons of Jupiter, and the presence of stars in the sky that could only be like the Sun if they are unbelievably far away. We have data to explain, and models showing various degrees of success in various situations, but there is never any z in science-- there is never whether the Earth really is the center of the universe, or whether the Sun really is the center of the solar system, because quite frankly neither claim holds any value if interpreted as some kind of absolute truth. Interpreting them that way is simply a mistake about what science is trying to do, and it shortchanges the significant accomplishments of science that have allowed us to navigate the solar system with spaceships, and someday perhaps to visit other planets ourselves, or (not any time soon) even live on them.

Bayesian inference
The way we implement these quantities x and y in the scientific method is via the concept of probabilities, and the mathematics for formally manipulating them is called Bayesian inference, after a Presbyterian minister (which seems ironic but actually makes perfect sense) named Thomas Bayes in the 1700s. The probabilities that relate to measurement error take the form of a conditional probability I will call p_d(x | y), where the little p means it is a conditional probability, the subscript _d means it refers to empirical data (not a theoretical model), and x denotes the possible outcomes of a measurement conditioned on the value y of some model parameter. So p_d(x | y) means: if we assume y is a true description (hypothetically), then what is the probability that we will obtain data x in our experiment? The way probability relates to the model is that we let y be a parameter in our model, and we say P_m(y) is our current expected probability (often called the expectation "prior" to some new experiment) that y is the appropriate value of the model parameter. So if I were modeling my own height and then doing an experiment to test that model, P_m(y) would be my currently expected probability distribution for my height model (which has a range that reflects my uncertainty about how to best model my own height), and p_d(x | y) would be the distribution of outcomes x of a height measurement if each given value of y is regarded as the true height (again, the concept of a true height is a hypothetical device as always). Hence p_d(x | y) includes all our information about the measurement process itself, and no information about the universe at all (because it ranges through all possible y), whereas P_m(y) includes all our prior understanding of the universe, with no information about the experiment about to be done. The Bayesian approach completely separates what we think we know about the universe from our acts of learning new things that can test and hone that knowledge, and this is the essential feature of the scientific method as it is currently formalized.

The process of using a new datapoint x to hone our model presents us with a new conditional probability, p_m(y | x), where the _m means we are referring to the conditional probability of our model taking value y given that we just observed x. In other words, p_m(y | x) will be our new P_m(y) if our data comes out x, and Bayes' theorem tells us how to take our building blocks, the "prior" P_m(y) and the workings of our observation p_d(x | y), and obtain p_m(y | x). In class we will find what makes sense is
p_m(y | x) = p_d(x | y) * P_m(y) / P_d(x),
where P_d(x) = sum over y of p_d(x | y) * P_m(y) is our expected probability that our next observation will come out x, given our "prior" state of knowledge expressed in P_m(y). This is how science hones its model parameters, and it is what goes into every physical constant you can look up in a textbook-- though textbooks do not always explain the process whereby these values are obtained, why the current best value of a model parameter is quite different from the data x that went into it, or why it is also not a statement of some absolute truth z.
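To see these pieces operate together on the height example from above, here is a minimal numerical sketch (a discrete grid of candidate heights y, a Gaussian measurement model for p_d(x | y), and a Gaussian prior P_m(y); every number in it is made up purely for illustration):

    import numpy as np

    # grid of candidate model values y: "my height is y meters"
    y = np.linspace(1.60, 1.90, 301)

    # prior P_m(y): current belief about the height model (illustrative numbers)
    prior = np.exp(-0.5 * ((y - 1.75) / 0.05) ** 2)
    prior /= prior.sum()

    # measurement model p_d(x | y): a tape-measure reading x scatters around the
    # hypothetically true value y with about 1 cm of measurement error
    def likelihood(x, y, sigma=0.01):
        return np.exp(-0.5 * ((x - y) / sigma) ** 2)

    # a new datapoint arrives
    x_obs = 1.78

    # Bayes' theorem: p_m(y | x) = p_d(x | y) * P_m(y) / P_d(x),
    # with P_d(x) = sum over y of p_d(x | y) * P_m(y)
    unnormalized = likelihood(x_obs, y) * prior
    evidence = unnormalized.sum()            # P_d(x)
    posterior = unnormalized / evidence      # p_m(y | x), which becomes the new P_m(y)

    print(f"prior mean     = {np.sum(y * prior):.3f} m")
    print(f"posterior mean = {np.sum(y * posterior):.3f} m")

The posterior is pulled strongly toward the new measurement here only because the (made-up) measurement error is much smaller than the (made-up) prior uncertainty; that tradeoff between the two probability distributions is the whole content of the formula.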