Week 1: Overview of the course
Welcome to the astrophysics advanced major sequence. Our purpose is
to understand how physics plays out in the cosmos, and to see how
studying the cosmos was central for advancing our physics.
Our goal will be not just to describe, but to actually explain--
although that requires fitting these ideas in our limited brains, which
necessitates simplification and idealization.
Navigating simplification and idealization, while still obtaining useful
quantities, is one of the central themes of both physics and, especially, astronomy.
But first we must better understand the goal of any explanation.
What's in an explanation
Although there will not be a test question on what constitutes a good
explanation, there will be many test and homework questions that require
an explanation as part of the answer, so it is worth understanding the
elements of a good scientific explanation.
This is also true because part of understanding is being able to explain
things to yourself and your peers-- and besides, you may soon be called on
to be a teacher or teaching assistant.
Explanations vs. descriptions
A good explanation does more than describe the answer: it moves
through the "how" into the "how come."
This means it must empower us to make predictions about not just when
the phenomenon being explained will occur, but also when it should
not occur.
If the explanation does not pass that test, it is not an explanation--
it is just a description.
Example: Saying that planets orbit in the same plane because they form
from disks is just a description-- an explanation also tells why disks
themselves form, as that is a necessary part of the same question.
So we must say that disks form from gas that has angular momentum that
must be conserved, and also the virial theorem must be obeyed. The
latter requires that the characteristic speed of the orbiting particles
must scale like the inverse square root of the characteristic distance
from the star (right?), whereas the former says that the
mass of all the particles that carry the angular momentum (so
orbit in the same sense in roughly the same plane),
times their characteristic speed, must scale like the inverse of the
characteristic distance from the star (yes?).
Put those two statements together, and you can see that the mass of the
stuff that is carrying the angular momentum must scale like the inverse
square root of the characteristic distance to the star!
In short, as the orbiting gas cloud shrinks (to satisfy the virial
theorem as energy is dissipated), more and more mass has to be participating
in the angular momentum, i.e., more and more of the mass has to be in
a disklike orbit.
This is an explanation, and it also tells you what the characteristic
radius of the disk must be once it forms, given its mass
and angular momentum.
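To make that scaling argument explicit, here is a minimal power-law sketch in Python (purely illustrative-- it just tracks the exponents assumed above, with the central mass treated as dominant):

    # Track power-law exponents: each quantity ~ r**exponent, where r is the
    # characteristic distance from the star.
    v_exp = -0.5            # virial theorem: orbital speed v ~ r**(-1/2)
    Mv_exp = -1.0           # angular momentum conservation: M*v*r fixed, so M*v ~ r**(-1)
    M_exp = Mv_exp - v_exp  # divide out the speed to isolate the mass
    print(M_exp)            # -0.5: the mass in disklike orbits ~ r**(-1/2), growing as r shrinks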
Now, of course the real world is never really that simple-- if the angular
momentum really stayed constant during gravitational contraction, we
could get disks easily enough, but we could not get the central star!
So there actually must be some way to extract angular momentum from a cloud
undergoing gravitational collapse.
The gas that is losing a large amount of its angular momentum can form the
star, and the gas that keeps most of its angular momentum is what forms
the disk and the planets. What separates these populations of gas is
a complicated story that is a subject of modern research.
So we see that a good explanation not only describes what happens, it allows
us to estimate the central quantities involved, and it also brings us into
contact with the research frontier when we push the explanation beyond its
simplest idealizations.
Note how just saying "the planets are in a plane because they form from a
disk" does none of those things!
Explanations are not unique
An explanation is not a specific prediction, so it is not unique-- there
may be many ways to explain the same phenomenon, all making the same
predictions.
But a good explanation should exhibit certain basic properties, some of
which I suggest below.
Example: You could explain orbits by saying that objects both fall and
move sideways at the right rate to maintain a fixed distance, which is
a kinematic description, or you
can explain it dynamically by saying that objects moving in a circle
are accelerating toward the center, and that acceleration is due
to the force of gravity.
Or enter a rotating frame and say the
balance is between gravity and the centrifugal force. There are many
options, and the more you know, the more useful and powerful that
knowledge becomes. There is value in being able to provide multiple explanations,
because you may understand one better than another, and it is a good way
to catch your own misconceptions.
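As a concrete check on the dynamical version of that explanation, here is a minimal sketch that balances the centripetal acceleration v^2/r against the gravitational acceleration GM/r^2 (the numerical values are just standard constants, quoted for illustration):

    import math

    G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
    M_sun = 1.989e30    # mass of the Sun, kg
    r = 1.496e11        # 1 AU, m

    # Circular orbit: v**2 / r = G*M/r**2, so v = sqrt(G*M/r)
    v = math.sqrt(G * M_sun / r)
    print(v / 1e3)      # ~29.8 km/s, the Earth's orbital speed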
Explanations are unifying
An explanation that only works on the exact process it was created for
is of no value-- it is essentially just a restatement that the process
happens because that process happens.
So a good explanation must encompass more than just the process it is being
used to understand-- it must unify that process with other processes that
were not already obviously the same.
Unification is central to gaining conceptual power over phenomena, because
once we recognize similarities in seemingly disparate phenomena (the
resonances of a musical instrument, and the resonances of a spectral line,
for example), that is when we start to create an understanding that
has predictive power in a wide range of situations.
Use what is already known
Unification is particularly powerful when it connects to other things
that are already better understood than the phenomena in question.
Often the person asking the question (which may be you, of course)
has prior knowledge of some aspects
of the issue, and they certainly have common experiences that people share.
A good explanation is framed so as to make contact in some way with these
common experiences, and basic pre-existing knowledge.
Sometimes that means placing yourself in the shoes of the person asking
the question (which is particularly easy if that person actually is you).
If you are presenting the explanation for grading, putting yourself
in the grader's shoes means trying to hit on the key points that are
being looked for in the grading rubric.
That means avoiding verbose answers that mostly repeat vague claims;
instead, look for concise and pointed "bullets" that are crucial to
the explanation and will score points in the grading.
If you are presenting the explanation to a student, putting yourself
in their shoes means finding the concepts they already understand,
and building the explanation out of those concepts.
If you are asking yourself for your own explanation (which you should do
constantly as a student), notice the key concepts that come up, and make
sure you have a good grasp of them-- explaining things to yourself is how
you find out what you do and do not yet understand.
Example: Many people can assimilate the information in a diagram
more quickly than in words, so take advantage of this general ability
to interpret figures whenever possible.
This is true not just in learning and teaching; it
is also true in oral presentations and in research papers, which are
fundamentally about explaining things.
Often it helps to think of any explanation as centering on the understanding
of a single key figure, and all the logical steps necessary to make sense
of that figure.
Kill the misconceptions
The flip side of connecting with what is already known is that it can
bring pre-existing misconceptions into the question, and if the new
answer does not explicitly displace those misconceptions, it can get merged
with them in a rather grotesque way.
Always look
for inconsistencies in the things you hold to be true, because you
will not automatically notice them without looking for them, whereas the
exams you are given will likely reveal them if you have not found them already.
Example: If you tell students that an orbit represents some
kind of balance involving
the force of gravity, they might think the balance has to do with
the action/reaction pair involved in the mutual gravity between the two
objects, or between the gravities of those objects and other orbiting planets.
If you do not specifically kill those misconceptions (say by pointing out that
the action/reaction pair balance is responsible not for an orbit but for
the conservation of momentum of the system as a whole, and by saying that
an orbit can exist even if there are no other orbiting planets), then the
misconceptions will become embedded in
what otherwise seems to be a successful explanation.
Seek contrasts and balances
Our brains are adept at thinking
in terms of contrasts, and in terms of opposing aspects
that are either in balance or imbalance.
Often explanations can be most powerful when they are framed in
these same terms, a kind of "yin-yang" approach where understanding
is empowered by manipulating the tension between the opposites.
Example: The above explanation of the
formation of disks can be viewed as resulting from the confrontation between
Kepler's second and third laws; tides can be viewed as caused by the
contrast between the gravitational acceleration at various points; and
the temperature of the Earth's surface can be explained in terms of
a balance between the visible light absorbed from the Sun and the
infrared light
emitted by the ground.
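To illustrate the last of those balances numerically, here is a minimal sketch that sets absorbed sunlight equal to emitted infrared (the solar constant and albedo are standard rough values, used only for illustration):

    S = 1361.0        # solar constant at Earth, W/m^2
    A = 0.3           # rough Bond albedo (assumed value)
    sigma = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

    # Balance: S*(1-A)*pi*R^2 absorbed  =  4*pi*R^2*sigma*T^4 emitted
    T = (S * (1 - A) / (4 * sigma)) ** 0.25
    print(T)          # ~255 K, the equilibrium temperature before greenhouse warming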
Make contact with what is fundamentally unknown
Any answer to a "why" or "how come" question can be used to generate
new "why" or "how come" questions based on that answer.
This produces a chain of questions and answers that must eventually
result in the answer "I don't know."
But a truly good explanation will result, at the end of that chain,
in the answer "no one knows."
When that happens, you know you have connected the initial question
to some fundamental scientific unknown, and then you are truly
finished.
You should not feel disappointed when you encounter such a fundamental
uncertainty; you should instead marvel at it-- whether you are asking
yourself the question, or someone else is.
Science is not designed to completely demystify a question; it is designed to replace
superficial mysteries with more profound ones.
The advancement of physics involves increasingly profound questions, like replacing
"why is energy conserved" with "why, and under what circumstances, does the universe
exhibit a symmetry in regard to the passage of time."
This allows us to look for reasons behind the time symmetry, and helps us understand
how violation of that symmetry, such as in the Big Bang, leads to a failure of that
conservation law.
Additional examples
Why is the sky blue?
Particles in the air can scatter
light--
that's why we can see dust in a sunbeam or vapor from a
tea kettle, or even air itself (in the case of the blue sky).
The reason the sky is blue is that
air scatters blue light better than red light-- which is also why
the Sun looks red when setting: a long column of
air intervenes when the Sun is low, and this preferentially removes
the blue light that the air scatters.
The "missing" blue light contributes
to someone
else's blue sky, leaving your red sunset.
The explanation is not done though--
the reason blue light scatters better than red is that the frequency
of oscillation of
visible light, though very fast, is actually quite slow compared
to the resonant modes of oscillation of air particles.
This means the particles can respond more quickly than the electric field
of the light oscillates, allowing the particle response to stay
nearly in force balance with the electromagnetic
forces from the light.
Hence if the red and blue light from the Sun are equally bright (which is why
the Sun looks so white), then
the amplitude of the red-light-induced oscillations is the same
as the amplitude of the blue-light-induced oscillations.
But the emission of light by an oscillating dipole is proportional to
the square of the acceleration, which is the square of the amplitude
times the fourth power of the frequency.
With no difference in amplitude,
all that remains is the dependence on frequency to the fourth power, hence
blue light scatters about 4 times more than does
red light (as blue light has a frequency about 1.4 times that of red light, and
1.4 to the 4th power is about 4).
This preferential scattering of blue light is
called "Rayleigh scattering."
Notice that this explanation connects to what is already known (dust
in sunbeams and steam from a kettle), and provides a kind of yin-yang
contrast between the blue sky and the red setting Sun.
Finally, it connects with a fundamental unknown, which is why an
accelerating charge radiates the way it does (you could take that one
more step by deriving the Larmor formula, but you still don't know why
charges couple to light that way-- no one does).
Another example: why is the ocean blue?
The ocean is blue because a body of water mirrors whatever color light
you shine on it, and the sky is generally blue, so the ocean generally
mirrors that and looks blue.
Water is not inherently blue, as you know if you've ever seen the red
setting Sun reflected on the ocean, or the white Moon.
(Judge for yourself if this answer connects with what is known, kills
misconceptions, and offers contrasts in colors. What fundamental
unknowns does it make contact with?)
Why are high-mass main-sequence stars so much more luminous than the Sun?
Stars are leaky buckets of light. Thus their luminosity depends on
how much light is in the star at any given time,
divided by how long it takes to leak out.
A higher-mass star will reach fusion temperatures in its core when it
still has a much larger radius than the Sun, because its
large mass makes its gravity more
effective at heating it as it contracts to a given size.
Since the size of the "bucket" is set by whenever the core
reaches fusion temperatures, a high-mass main-sequence star will have
a larger volume than a low-mass one.
The energy density in the light, on the other hand, is similar,
because that depends only on temperature, and both high- and low-mass
main-sequence stars have similar temperatures (as set by fusion).
At fixed opacity, the characteristic diffusion time is also similar,
because it scales with the radius times the optical depth, and that
product scales with
the mass/radius ratio-- which is nearly constant again because that
ratio sets the core temperature by the virial theorem.
Thus the diffusion time for the light to get out is not sensitive to mass,
and all that ends up mattering is the total radiant energy in the "bucket,"
i.e., the volume of the star-- which is much larger for high-mass
main-sequence stars (even though they are all considered "dwarf stars").
Note that the value of the fusion temperature was never mentioned, nor
anything about how fusion rate depends on anything, as none of those
things matter except that the fusion acts to regulate the core temperature.
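A minimal power-law sketch of that "leaky bucket" argument (the exponents track scalings in the stellar mass M only, holding the core temperature and the opacity fixed as argued above):

    # Each quantity ~ M**exponent; R ~ M because M/R is pinned by the virial
    # theorem at the fusion-regulated core temperature.
    R_exp = 1.0            # stellar radius
    U_exp = 3.0 * R_exp    # radiant energy in the "bucket" ~ T**4 * R**3, with T fixed
    t_exp = 1.0 - R_exp    # diffusion time ~ R * tau ~ M/R, so exponent 0
    L_exp = U_exp - t_exp  # luminosity = energy content / leak-out time
    print(L_exp)           # 3.0: L ~ M**3 in this idealization, a steep mass-luminosity relation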
Why are some stars red and others white?
Red stars have lower surface temperatures, which means lower
average energy per particle, because a lower-temperature reservoir
is loath to give up as much energy to each mode of particle motion
(that's what temperature means).
Light comes in quanta called photons, and each photon is made by a
single discrete process that requires a quantum of energy equal to
what the photon will have. So whenever such quanta of energy are rare
in the processes that create the photons (as for
blue photons from stars with low surface T), the blue photons
will be rare as well, whereas red ones will not be.
But at high surface T, the blue photons will be just as common
as red, because the high-temperature reservoir is similarly
able to excite either one, and
the star looks white.
A star can even look a bit blue, because, being higher-momentum
particles, blue photons actually have more states in phase space
available to them!
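As a rough sketch of that argument, one can use a Boltzmann factor exp(-E/kT) as a stand-in for how willing the reservoir is to supply each photon energy (the temperatures and wavelengths below are just representative values):

    import math

    h_c = 1240.0      # h*c in eV*nm
    k_B = 8.617e-5    # Boltzmann constant in eV/K

    E_red = h_c / 650.0    # ~1.9 eV photon
    E_blue = h_c / 450.0   # ~2.8 eV photon

    for T in (3500.0, 10000.0):   # a cool star vs. a hot star
        # suppression of blue photons relative to red photons
        print(T, math.exp(-(E_blue - E_red) / (k_B * T)))
    # ~0.06 at 3500 K: blue photons are rare, the star looks red
    # ~0.4  at 10000 K: blue photons are nearly as available, the star looks white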
Other insights from the laws of physics
Understanding the laws not only allows us to find explanations, it also can
give us perspective for framing the rest of what we know.
For example, it is often held that astronomy makes humans seem very small,
since our ~ 1 meter scale is dwarfed by some 27 orders of magnitude in size
by the diameter of the observable universe, and a 1 with 27 zeroes behind it
is surely a ghastly large number. But the laws of physics tell us several
things that are not normally included in that story:
1) the laws of statistical mechanics say that the conditions necessary to
produce systems as complex and organized as human intelligence must be
incredibly unlikely to achieve and maintain. Thus, in order for them to
appear, there has to be available a spectacular number of opportunities.
This requires a vast universe-- and certainly a universe with a single
Sun and a single earthlike planet, as the ancient Greeks envisioned, would
never suffice.
2) the combination of the laws of quantum mechanics (via the parameter h),
special relativity (via the parameter c), and general relativity (via G)
allows these parameters to be combined in only one way to create a length
scale. That scale is called the Planck length, the square root of hG/c^3,
and it is some 35 orders of magnitude smaller than a human being. So if
we seem small in a universe that is at least 27 orders of magnitude larger
than we are, we must remember that we are even larger, by 35 orders of magnitude,
than the size scale on which the universe can function in the way
we understand it. If we think of the Planck length and the Planck time (some
10^-44 seconds) as the fundamental bricks of which space and time are built,
then we ourselves are built of a colossal number of those spatial bricks, and the
timescales on which we act are a colossal number of those temporal bricks.
If you think the Ents of Middle Earth were large and slow, you haven't met us!
Incidentally, another aspect of the Planck length and Planck time is that they
give the scales on which the laws of physics as we know them break down.
To specify the location of an event to within the accuracy of the
Planck length, or the moment when it happened to within the accuracy of the
Planck time, requires, by the uncertainty principle, an energy whose mass
equivalent is the one combination of h, c, and G with units of mass. This is the
Planck mass, and it is surprisingly large, on the 10^-8 kg scale, which is an amount
of mass you could hold in your hand and recognize is there. But that is
also the amount of mass that, if confined to the Planck length, would create
a black hole, and destroy the system you were trying to locate to that accuracy.
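For the record, here is a minimal sketch of those numbers (using h rather than hbar, as in the text; the stray factors of 2*pi do not matter for order-of-magnitude purposes):

    import math

    h = 6.626e-34     # Planck constant, J*s
    c = 2.998e8       # speed of light, m/s
    G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2

    l_planck = math.sqrt(h * G / c**3)   # ~4e-35 m, ~35 orders of magnitude below a human
    t_planck = math.sqrt(h * G / c**5)   # ~1e-43 s (~5e-44 s if hbar is used instead)
    m_planck = math.sqrt(h * c / G)      # ~5e-8 kg, a mass you could hold in your hand
    print(l_planck, t_planck, m_planck)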
For this reason, and other issues about how the laws as we know them combine,
we can say that the laws are built to break down. They tell us just where
they cannot work any more (they might fail sooner than that, but they tell us
they must fail there). This is often regarded as a terrible problem for the
laws, but I would argue it is actually not a bug but a feature. Ironically, laws that
cannot tell us when they must fail are more clearly incomplete than laws that
do!
So these are all examples of insights that understanding the laws, not just using
them to make predictions or solve problems, can provide to us. It is very much our
goal to capitalize on such insights and explanations, and not just predictions and
tests of the laws, in all that follows in this course. As a result, some of the
questions we will pose will be hypothetical, "what if" kinds of questions, which
might at first seem perplexing, but this is the purpose behind them.
Scientific knowledge comes in the form of testing and honing models
This may seem obvious, but the point is there are two types of quantities
in science, the parameters we build into our quantitative models, and
the data we use to test and hone those models.
It is an important distinction, as the two types of numbers often get
confused in informal descriptions of how science works.
I will use the notation x for an observable datapoint, and y for a
model quantity used to make sense of x. Hence x comes from experience,
and is empirical, while y comes from conceptualization, and is rationalized.
The interplay between observation and theory that is the central core of
the scientific method involves using new x to better constrain y, and
the current state of y to motivate new
and better x values.
Ranges in x constitute what we call "error", and ranges in y constitute
what we call "uncertainty", but the two are never the same thing.
We often take pains to teach students the difference between precision
and accuracy, which are types of observational error, but we rarely make
the equally important distinction between the error in a measurement and
the uncertainty in the parameters in a model built to understand those
measurements. Finally, neither the data x, nor the model parameters y,
are ever the same thing as "what really is" z. Science consists of
better x producing better y; there is never any z in the formal process,
and this can be a source of considerable confusion.
In particular, the existence of a successful scientific model does not
rule out the possibility of other successful models that are quite different,
and whichever model has higher accuracy may be regarded as "more correct", but
that does not make it better for all purposes, nor does it suggest we are
"converging on absolute correctness". Absolute correctness is not
the way science works, and more importantly, it is not even the way
science is intended to work.
These subtle but
important distinctions are central to establishing some
form of scientific rigor, a topic that
receives little attention and can lead to significant misunderstanding.
It has also led to upheaval and strife in history, because when science
itself is not understood, scientific revolutions cannot be put in their
proper context.
To give a better sense of what I mean, consider that I was once asked
if I believe in the existence of dark energy, a topic that has already
generated a Nobel prize. I paused, unable to understand what I was
being asked, but I assumed what they really wanted was for me to use
my scientific experience and training to assess that the model we call
dark energy was our currently most successful model for explaining the
data at hand. So I said it was the current best model for explaining
existing data and making predictions with the highest probability for
success about the future observations it motivates, which is all we ever
expect from a scientific model. But I never said that I believed in it
because I have no such relationship to the dark energy model, and they
shouldn't care if I did because I would be leaving my role as scientist.
The geocentric vs. heliocentric model of the solar system
The classic example of the difference between data x, model attribute y,
and absolute truth z, is the history of solar system models.
This is a beautiful example of how science works, how it does
not work, and why failing to understand the difference caused no
small inconvenience to a man named Galileo-- and continues to generate
societal upheaval even to this very day. Here y is geocentric vs.
heliocentric, x is the nature of gravity, the smallness of stellar
parallax, retrograde motion of the planets, the moons of Jupiter, and
the presence of stars in the sky that could only be like the Sun if
they are unbelievably far away. We have data to explain, and models
showing various degrees of success in various situations, but there is
never any z in science-- there is never whether the Earth really is the
center of the universe, or if the Sun really is the center of the solar
system, because quite frankly neither one holds any value if interpreted
as some kind of absolute truth. Such interpretations are simply a mistake about
what science is trying to do, and they shortchange the significant accomplishments
of science that have allowed us to navigate the solar system with space ships, and
someday perhaps visit the planets ourselves, or (not any time soon) even live on them.
Bayesian inference
The way we implement these quantities x and y in the scientific method
is via the concept of probabilities, and the mathematics for
formally manipulating
them is called Bayesian inference, after a Presbyterian minister (which seems
ironic but actually makes perfect sense) named Thomas Bayes, who worked in the 1700s.
The probabilities that relate to measurement error are in the form of a
conditional probability I will call p_d(x | y), where the little p means
it is a conditional probability, the subscript _d means it refers to
empirical data
(not a theoretical model), and x ranges over the possible outcomes of a measurement,
conditioned on the value y of some model parameter. So p_d(x | y) means: if we assume
y is a true description (hypothetically), then what is the probability that
we will obtain data x in our experiment. The way probability relates to
the model is we let y be a parameter in our model, and we say P_m(y) is our
current expected probability (often called the expectation "prior" to some
new experiment) that y is the appropriate value of the model
parameter. So if I was modeling my own height and then doing an experiment
to test that model, P_m(y) is my currently expected probability distribution
for my height model (which has a range that reflects my uncertainty about
how to best model my own height), and p_d(x | y) is the distribution of
outcomes x of a height measurement if each given value of y is regarded
as the true height (again, the concept of a true height is a hypothetical
device as always). Hence p_d(x | y) includes all our information about
the measurement process itself, and no information about the universe at
all (because it ranges through all possible y), whereas P_m(y) includes
all our prior understanding of the universe, with no information about
the experiment about to be done. The Bayesian approach completely
separates what we think we know about the universe from our acts of learning
new things that can test and hone that knowledge, and this is the essential
feature of the scientific method as it is currently formalized.
The process of using a new datapoint x to hone our model
presents us with a new conditional probability,
p_m(y | x), where the _m means we are referring to the conditional
probability of our model taking value y given that we just observed x.
In other words, p_m(y | x) will be our new P_m(y) if our data comes out x,
and the Bayes theorem tells us how to take our building blocks,
the "prior" P_m(y) and the workings of our observation p_d(x | y),
to obtain p_m(y | x). In class we will find what makes sense is
p_m(y | x) = p_d(x | y)*P_m(y)/P_d(x)
where P_d(x) = sum over y of p_d(x | y)*P_m(y) is our expected probability
that our next observation will come out x, given our "prior" state of
knowledge expressed in P_m(y). This is how science hones its model
parameters, and it goes into every physical constant you can look up in
a textbook-- though the textbook does not always explain the process whereby
these values are obtained, why the current best value of a
model parameter is quite a different thing from the data x that went into it,
and why it is also not a statement of some absolute truth z.
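To make the machinery concrete, here is a minimal sketch of Bayes' theorem applied to the height-modeling example above (the grid of heights, the prior, and the measurement error are all made-up numbers, chosen only for illustration):

    import numpy as np

    # Model parameter y: my height in cm, on a coarse hypothetical grid.
    y = np.array([178.0, 180.0, 182.0])
    P_m = np.array([0.25, 0.5, 0.25])   # prior P_m(y), the current state of knowledge

    def p_d(x, y, sigma=1.0):
        # p_d(x | y): probability of measuring x if y were (hypothetically) the
        # true height, here a Gaussian measurement error of assumed width sigma.
        return np.exp(-0.5 * ((x - y) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    x_obs = 181.0                              # the new datapoint
    P_d_x = np.sum(p_d(x_obs, y) * P_m)        # P_d(x), the expected probability of this outcome
    posterior = p_d(x_obs, y) * P_m / P_d_x    # Bayes: p_m(y | x), the honed model
    print(posterior)                           # probability shifts toward the values consistent with x_obs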