In 1948 Claude
Shannon forged a link between the thermodynamic concept of entropy and a new
formal concept of information. This event marked the beginning of information
theory. The discovery captured the imagination of Ed Jaynes, a physicist with a
strong interest in statistical mechanics and probability theory. His expertise
in statistical mechanics meant that he understood entropy better than many. His
recognition of probability theory as an extended form of logic meant that he
understood that probability calculations (and therefore all of science) are
concerned not directly with truths about reality, as many have supposed, but
with information about truths.

The distinction
may seem strange – science accepts that there are statements about nature that
are objectively either true or false, and definitely not some combination of
true and false, so the most desirable goal must be to know which of the
options, ‘true’ or ‘false’, is the case. But the truth values of such statements
are not accessible to human sensation, and therefore remain hidden also from
human science. This is a difficult fact for intelligent animals like us to deal
with, but we have learned to do so, partly by inventing a set of procedures
called science. Science acknowledges that the truth of a proposition cannot be
known with certainty, and so it sets out instead to determine the probability
of truth. For this purpose, it combines empirical information and logic.

For Ed Jaynes, therefore, Shannon’s new information theory
was instantly recognizable as a breakthrough of massive importance. Jaynes
thought about this new tool, meditated on it, digested it, and played with it
intensely. One of the outcomes of this meditation was a beautiful idea known as
maximum entropy. The title of this blog, then, is a tribute to Edwin Jaynes, to
this beautiful idea of his, and to the many more exceptional ideas he produced.

As a physicist,
I never received much education in statistics and probability – we know the sum
and product rules, we know how to write down the formulae for the Poisson and
normal distributions and how to calculate a mean and a standard deviation, and
that’s about it, really. Oh, and some typically badly understood model fitting by
maximum likelihood (we call it the ‘method of least squares’, which, if you know
stats, tells you how limited our understanding is).

During my PhD
studies in semiconductor physics, I became very dissatisfied with this
situation, as it gradually dawned on me that scientific method and statistical
inference must rightly be considered synonymous: they are both the rational
procedure for estimating what is likely to be true, given our necessarily
limited information. I set out to teach myself as much as I could about
statistics. Not surprisingly, my first investigations led me to what is often
referred to as orthodox methodology. I laboured with the traditional hypothesis
tests – t-tests and so forth – but I found the whole framework very
unpalatable: confused, disjointed, self-contradicting – just ugly. Then I
stumbled on Bayes’ theorem, and my world view was elevated to a higher plane.
Some time after that I discovered Ed Jaynes’ book, ‘Probability Theory: The
Logic of Science,’ and my horizon was expanded again, by another order of magnitude.
Problems that I had thought to be only approachable by the orthodox methods
became recognizable as simple extensions of Bayes’ theorem, and any nagging
doubts I had about the validity of the Bayesian program were banished by
Jaynes’ clearly formulated logic.

It is not that I
am totally against orthodox (sometimes called frequentist) methods. But the
success of frequentist techniques is limited to the range of circumstances in
which they do a reasonable job of approximating Bayes’ theorem. The range of
applications, however, in which the two approaches diverge is unfortunately
quite large, while orthodox theory seems to have nothing fundamental to say about when to expect such divergence.

Bayes’ theorem
works by taking a prior probability distribution and combining it with some
data to produce an updated distribution, known as the posterior probability.
After the next set of data comes in, the posterior probability is treated as
the new prior, and another update is performed. The process goes on as long as
we wish, with the posterior probability distributions presumably narrowing ever
closer upon a particular hypothesis.
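This sequential updating can be sketched in a few lines of code. The example below is a minimal illustration, not anything from Jaynes: it assumes a coin-flip setting with a Beta prior on the heads probability, where the conjugate update is just simple addition, and the data batches are invented for the demonstration.

```python
# Sequential Bayesian updating, sketched for a coin-flip example.
# Assumes a Beta(a, b) prior on the heads probability; each batch of
# data produces a posterior, which then serves as the next prior.

def update(prior, heads, tails):
    """Conjugate Beta-Binomial update: Beta(a, b) -> Beta(a+heads, b+tails)."""
    a, b = prior
    return (a + heads, b + tails)

def posterior_mean(params):
    a, b = params
    return a / (a + b)

# Start from a uniform prior, Beta(1, 1), and feed in data in batches.
prior = (1, 1)
for heads, tails in [(7, 3), (6, 4), (8, 2)]:
    prior = update(prior, heads, tails)  # posterior becomes the new prior

print(prior)                  # (22, 10)
print(posterior_mean(prior))  # 22/32 = 0.6875
```

Each pass through the loop is one turn of the crank described above: the distribution tightens around the heads frequency as the data accumulate.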

One of the
problems we might anticipate with this procedure, however, is this: where does
the process start? What do we use as our original prior? The principle of
indifference works in many cases. Indifference works like this: if I am told
that a 6 sided die is to be thrown, with no additional information about the
die or the method of throwing, then symmetry considerations require that the
probability for any of the sides to end up facing upwards is 1/6. For some
more complex situations, however, indifference fails. One of the things that the principle of maximum
entropy achieves is to provide a technique for assigning priors in a huge range
of new problems that cannot be tackled using the principle of indifference.

As Shannon
discovered, information can be considered as the flip side of entropy, a
thermodynamic idea representing disorder – the more information, the less
entropy. Why then should science be interested in maximizing entropy? What we
are looking for is the probability distribution that incorporates whatever
information we have, without inadvertently incorporating any assumed
information that we do not have. We need that probability distribution with the
maximum amount of entropy possible, given the constraints set by our available
information. Maximum entropy, therefore, is a tool for specifying exactly how
much information we possess on a given matter, which is evidently one of the
highest possible goals of honest, rational science. This is why I feel that
‘maximum entropy’ is an appropriate title for this blog about scientific
method.
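The idea can be made concrete with a small numerical sketch. The example below is illustrative rather than anything stated in the post: it assumes the classic dice setting Jaynes used (a die whose long-run mean is constrained, here to the arbitrary value 4.5), and uses the known exponential form of the maximum entropy solution, p_i proportional to exp(lam * i), solving for lam by bisection.

```python
import math

def maxent_die(target_mean, lo=-10.0, hi=10.0, tol=1e-12):
    """Maximum entropy distribution over die faces 1..6 with a fixed mean.

    The maxent solution under a mean constraint has the exponential form
    p_i proportional to exp(lam * i); we find lam by bisection, which works
    because the mean is a strictly increasing function of lam.
    """
    faces = range(1, 7)

    def mean_for(lam):
        weights = [math.exp(lam * i) for i in faces]
        z = sum(weights)
        return sum(i * w for i, w in zip(faces, weights)) / z

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * i) for i in faces]
    z = sum(weights)
    return [w / z for w in weights]

p = maxent_die(4.5)
print([round(x, 4) for x in p])  # probabilities tilt toward the high faces
```

Note that with the unconstrained mean of 3.5, the same procedure returns the uniform 1/6 assignment: maximum entropy reproduces the principle of indifference whenever indifference applies, and extends it whenever extra information (here, the constrained mean) must be incorporated without smuggling in anything more.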

Substantial portions of *Probability Theory: The Logic of Science*, by E. T. Jaynes, can be viewed here.

More resources relating to Ed Jaynes and his writings can be found at http://bayes.wustl.edu/

The scientific method is normally defined as being a social phenomenon - due to the so-called "confirmation" phase. Inductive inference is not necessarily social. So: these things can't really be synonyms, unless you define "the scientific method" in a rather unusual manner.

Hi Tim

I'm not really sure what your point is, but I define scientific method to be the systematic evaluation of empirical evidence in order to distinguish what is probably true from what is not. I find that perfectly reasonable.

There are social aspects to science, in that, for example, one person's ideas can inspire the experiments or theoretical developments of another. Peer review and replication are also social, but they are rooted in the need to reduce the probability for systematic error and bias, and are therefore part of the process of inductive inference.

Science depends, however, on many processes that are not scientific. For example, the research that gets done depends heavily on who gets funding. This does not imply that the funding mechanisms are part of scientific method, or are necessarily logical.

The ease with which an idea is accepted as part of the scientific consensus depends on peer review, replication, etc., as well as several processes that are not scientific. These latter provide some measure of the extent to which scientists betray their own methodology.