Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Biology
BiologyBotanyMicrobiologyEntomologyEvolutionPaleontology
Chemistry
General ChemistryAnalytical ChemistryElectrochemistryOrganic Synthesis
Earth Science
GeologyMineralogyOceanographyMeteorologyEarthquakes
Physics
General PhysicsResearchRelativityParticle PhysicsElectromagnetismFusionOpticsAcousticsNew Theories

Natural Science Forum / Physics / Research / June 2008



Tip: Looking for answers? Try searching our database.

definition and interpretation of entropy

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
chris.meyer123@googlemail.com - 23 Jun 2008 23:09 GMT
It is often said, that entropy is the average number of yes-no
questions you have to ask to obtain the information "which event"
occurred.

In Honerkamp "Statistical Physics An Advanced Approach with
Applications" (2.4.1) I found an explanation of the above dictum, but
I don't understand it. I first cite Honerkamp, tell you what I don't
understand and pose some further questions below.

Perhaps someone can help me!

Honerkamp writes:

"Let {A_1,...,A_N} be a complete, disjoint set of events, i.e A_1
u ....u A_N = \Omega.

Furthermore, let P be a probability defined for those events. We then
define the entropy as S = -k \sum_{i=1}^{N} P(A_i) ln(P(A_i)). Here k
represents a factor which we set 1 for the moment. In the framework of
statistical mechanics k will be Boltzmann's constant k_B."
..
"If an event has occurred, then, as we will show in a moment, -log_2
P(A_j) is a good measure of the number of questions to be asked in
order to find out that it is just A_j which is realized."
..

and now the section I don't understand:

"To show that -log_2 P(A_j) is just equal to the number of required
yes-or-no questions, we first divide \Omega into two disjoint domains
\Omega_1 and \Omega_2 such that \sum_{A_i \in \Omega_1} P(A_i) =
\sum_{A_i \in \Omega_2} P(A_i) = 1/2."

I think this is in general not possible. Consider just the trivial
case P(A_1) = 1 and P(A_i) = 0 for i != 1.

Honerkamp proceeds:

"The first question is now: Is A_j in \Omega_1? Having the answer to
this question we next consider the set containing A_j and multiply the
probabilities for this set by a factor of 2. The sum of probabilities
for this set is now again equal to 1, and we are in the same position
as before with the set \Omega: We divide  it again and ask the
corresponding yes-or-no question.This procedure ends after k steps,
where k is the smallest integer such that 2^kP(A_j) becomes equal to
or larger than 1. Consequently, -log_2P(A_j) is a good measure of the
number of yes-or-no questions needed."

Is there a (simple) way to modify this algorithm to get a satisfactory
interpretation of the concept of entropy? Let me make this question
more concrete:

Assume you know the probabilities P_1,...,P_N, you want to find out
what event (A_1 or ... or A_N) occurred and you have an algorithm aa
to do so. Then, if you apply your algorithm and suppose the result
will be A_j, you have to ask, say f_aa(A_j) questions.

Since I have statistical mechanics in mind, let N be a natural number
or infinity (Honerkamp doesn't say something about this as I can see)
and let us assume, that the following series converge (in the case of
N = infinity) (do you agree, that this can be assumed??).

In that case, the average number of questions you have to ask is
C_aa((P_k)_k\in{1...N}) := \sum_j P(A_j)f_aa(A_j).

Then the question is: Is there a modification mm of the algorithm
which Honerkamp described (and which doesn't work in my opinion), that
leads to f_mm = -log_2? If not, does an algorithm ll exist at all,
which leads to f_ll = -log_2? If not, how would you motivate the
definition of entropy? If yes, is it true that there is no other
algorithm aa such that C_aa((P_k)_{k\in\{1...N\}}) <  C_ll((P_k)_{k\in
\{1...N\}}) for some (P_k)_{k \in \{1...N\}} (P_k \in [0,1]...). If
yes, can you give me a mathematical rigorous proof. If not, can you
give me an algorithm cc which serves as counterexample? In the latter
case: why is then entropy not defined as sum_j P(A_j) f_cc(A_j) ?

Are there other ways to motivate, that -log_2P(A_j) is a good measure
for the "increment of knowledge" by receiving the information that A_j
is realized.

Surely -log_2P(A_j) >= 0 and -log_2P(A_j) = 0 iff P(A_j) = 1, which
are reasonable properties of such a measure, but that's not enough to
motivate the formula -log_2P(A_j)...

References to books or papers where (parts of) my questions were
answered would be nice.

Thanks,

Chris M
Ian Parker - 26 Jun 2008 04:43 GMT
On 23 Jun, 23:09, chris.meyer...@googlemail.com wrote:
> It is often said, that entropy is the average number of yes-no
> questions you have to ask to obtain the information "which event"
[quoted text clipped - 82 lines]
> References to books or papers where (parts of) my questions were
> answered would be nice.

I think your book explains things very badly. Let us look at a quantum
mechanical system. No matter what its complexity it can be expressed
in terms of a large but finite number of quantum states. A finite
number of quantum states can be expressed as a binary number. It says
a bit placed un an up/down true/false position.

Quantum theory gives us a clear numerical value, without quantum
theory we would have to express states in terms of Cantor's infinite
sets.

The second law of thermodynamics says that heat flows from a higher to
a lower temperature. We define temperature such that dE/dS = T

T = Temperature
E = Energy
S = Entropy

This is the normal definition of temperature. Entropy increases and
heat flows from a higher to a lower temperature. Entropy defined
statistically is

Log(number of states)

We may thus say that the temperature of a body is dE/d(Log N) where N
is the number of states. This is where we take our two partitions.
Physicists tend to think in terms of "e" whereas information theory
tends to use the base 2. Heat flows so as to maximize the number of
Quantum states. We thus find out how many states are created by a
nanojoule in each partition.

A statistical view point means that we can readily assign temperatures
to sub systems within a solid or a gas. We can define color
temperatures, we can say that the temperature of the electrons in a
florescent tube is 20,000K, similarly for carriers in an LED. What is
the meaning of these "temperatures"? It is quantum mechanical and
statistical.

A binary number does have a feel of Hutter. We can express the first
100 Megabytes of Wikipaedia in the form of a binary number. Hutter has
defined AI in entropic terms. What is entropy? It is the same thing as
compression.

 - Ian Parker
Einar Andreas Rodland - 26 Jun 2008 21:20 GMT
[Moderator's note: Lines reformatted to make them shorter.  Although 80
is a maximum to enable nice display on almost all terminals, it's better
to keep your own, original, unquoted text in posts at 72 characters per
line or less, to enable it to be quoted a couple of times and still keep
things at less than 80 characters per line total.  Also, don't use more
than 2 characters (including space) for a quote symbol.  -P.H.]

> It is often said, that entropy is the average number of yes-no
> questions you have to ask to obtain the information "which event"
> occurred.

This is an explanation of entropy which I suppose is more popular in
information sciences than in physics, hence Ian Parker's response.

However, let me address your issue from the point you make it.

> Honerkamp writes:
>
[quoted text clipped - 20 lines]
> I think this is in general not possible. Consider just the trivial
> case P(A_1) = 1 and P(A_i) = 0 for i != 1.

Well, in your trivial counter-example, you don't have to ask any
questions at all: you already know the answer is A_1. However, writing
P_i for P(A_i), let's take (P_1,P_2)=(2/3,1/3), and it also breaks down
since you always have to ask one question, but the entropy is less than
one bit: the reason for this is you don't get the optimal 1/2--1/2 split
corresponding to 1 bit of information, but only the 2/3--1/3 split
corresponding to less than 1 bit of information.

Honerkamp's explanation thus works only to the extent you can divide the
remaining alternatives into two groups such that each group has
likelihood 1/2. Of course, this will generally not be the case, and the
average number of yes-no questions will therefore be at least the number
of bits in the entropy: i.e. at least -Sum[P_i*log2(P_i)].

However, let's instead say you have X_1, X_2,...,X_K drawn independently
from 1...N with probabilities (P_1,...,P_N). Again, if the entropy of
the distribution is S = s bits, then even given an optimal strategy the
expected number of questions would have to be at least K*s. However,
given any epsilon>0, for sufficiently large K you can do with less than
K*(s+epsilon).

E.g. with (P_1,P_2)=(2/3,1/3) and K=2, your first question could be "Is
X_1=X_2=1?" which could at least help you get the expected number of
questions below 2 (to 17/9 if I'm right).

> Honerkamp proceeds:
>
[quoted text clipped - 11 lines]
> interpretation of the concept of entropy? Let me make this question
> more concrete:

I seem to recall there is an algorithm where you encode
(X_1,X_2,...,X_K) as a number in the interval [0,1]---X_1 determines
which of the intervals [0,P_1),[P_1,P_1+P_2),..., X_2 determines
subintervals similarly, etc.---and you can split the remaining interval
in half by each yes-no question. I suspect you cannot take the
K=infinity limit directly with respect to the encoding in [0,1] since
you'd want 0.99999...<1 in this encoding, but if you use the equivalent
ordering of (X_1,...), you can always make perfect 1/2--1/2 splits, so
Honerkamp's algorithm would have worked (with some caution I suppose
since K=infinity).

I'm afraid it's too long since I was looking at this to remember any
references.

Einar
a student - 26 Jun 2008 21:20 GMT
On Jun 24, 8:09am, chris.meyer...@googlemail.com wrote:
> It is often said, that entropy is the average number of yes-no
> questions you have to ask to obtain the information "which event"
[quoted text clipped - 4 lines]
> I don't understand it. I first cite Honerkamp, tell you what I don't
> understand and pose some further questions below.

The proof you describe does sound rather handwaving.  One good book,
which from memory discusses this interpretation (and others) of
entropy, is `Information Theory' by Robert Ash  there is a recent
Dover edition.

Entropy is only an `average' concept, and one has to consider a number
of events to pull out its meaning.  You can check Ash or other
textbooks for complete rigour, but here is the basic argument.

1.  For a random event generator, that generates outcome k with
probability p_k, consider a sequence of N outcomes (k1,k2,=85,kN) from
the generator (eg, a dice is thrown N times).  It is assumed that each
outcome is generated independently of the others.   The probability of
generating a given sequence is then given by
    P = (p_1)^(n_1) (p_2)^(n_2) =85. ,
where n_k is the number of times that outcome k appears in the
sequence.

2.  Now, as N is increased, we expect that any particular outcome k
will, typically, appear in the sequence a total of approximately
      n_k = Np_k
times (rounded to the nearest integer).  Such a sequence is called a
`typical sequence'.  It follows that the probability of generating
given typical sequence is given by, approximately,
   P_typ = (p_1)^(Np_1) (p_2)^(Np_2) =85.  = 2^{-NH] ,
where H is the Shannon entropy function
   H = - sum_k p_k log_2 p_k .

3.  It can be shown rigorously is that, for N sufficiently large, the
probability of obtaining a `typical' sequence of outcomes becomes as
close to 1 as one desires.  Hence, any non-typical sequences can be
ignored FAPP.  You will have to consult Ash or another textbook for
this 'law of large numbers'.

4.  Since each typical sequence has the same probability, P_typ, of
occurring, and the sum of these probabilities is very close to unity
for large N, the number of typical sequences of length N must be
approximately given by
   N_typ =  1 / P_typ  = 2^{NH} .

5.  We are now almost finished.  To determine which actual typical
sequence has occurred in a given run of N outcomes, how many yes/no
questions must I ask?  But this is same problem as guessing a number
from 1 to N_typ.  I number the possible typical sequences from 1 to
N_typ, and ask `is the sequence of outcomes in the first half or in
the second half.  And so on.  This will take at most log_2 N_typ
questions, i.e., the number of questions required is
    N_q = log_2 N_typ = NH.

6.  Finally, the average number of questions required per outcome is
then
     N_q / N = H.
Arnold Neumaier - 28 Jun 2008 19:08 GMT
chris.meyer123@googlemail.com schrieb:
> It is often said, that entropy is the average number of yes-no
> questions you have to ask to obtain the information "which event"
> occurred.
>
> References to books or papers where (parts of) my questions were
> answered would be nice.

You may find the discussion in Appendix A of
    http://lanl.arxiv.org/abs/0705.3790
illuminating.

Arnold Neumaier
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.