Fine and Coarse Tuning, Renormalizability and Probabilistic Reasoning
Alexander R. Pruss
September 26, 2003
The laws of physics depend on a number of basic constants. It has been argued that these constants appear “fine tuned” for the possibility of life. If the mass of the proton were significantly different from what it is, say, then life would be impossible. But, it is claimed, no more basic scientific reason can be given for why the mass of the proton falls in the life-admitting range rather than outside of it. The proponent of the fine tuning argument (FTA) concludes that this, together with many similar cases, gives some evidence for the existence of an intelligent designer of the cosmos who chose the values of the basic constants in the laws of nature in such a way that life would be possible.
The main point of this paper is to respond to an old objection, the Renormalizability Objection, as defended with great plausibility recently by Timothy McGrew, Lydia McGrew and Eric Vestrup [MMV].[1] For simplicity, suppose we are dealing with a single constant K which in fact has the value 12.4, in some appropriate unit system. Suppose, further, that it turns out that we know that life can only exist if 8.0<K<20.2. Assume, further, that the laws of physics do not prescribe any value for K nor yield any objective stochastic process that sets the value of K. The laws make sense for any positive value of K, and only for positive values of K. I will call a set of values “the life-admitting range” provided that all values within the set are physically compatible with life and no values outside the set are. The FTA then notes that it is a priori unlikely that a constant on which the only prior constraint is that it be positive should fall in the life-admitting range, which is a subset of the interval from 8.0 to 20.2.
The Renormalizability Objector, however, responds that we do not have an appropriate probability assignment on the set of positive real numbers (0,∞) to make sense of the claim that it is “unlikely” that the constant should fall in the life-admitting range. If the constant were physically constrained to lie in the set (0,1000000), then the FTA would work just fine. The probability that the constant should fall in the life-admitting range would then be no bigger than (20.2−8.0)/1000000=0.0000122, because we could reasonably assign a uniform probability measure to the set of possible positive constants (0,1000000). I will in general use the standard notation (a,b)={ x : a < x < b }, [a,b] = { x : a ≤ x ≤ b }, (a,b] = { x : a < x ≤ b } and [a,b) = { x : a ≤ x < b }. But there is no appropriate probability measure on the set (0,∞) that embodies our intuition that all values of K are a priori equally likely. Specifically, an appropriate probability measure would be one such that the probability of every interval (a,a+L) of length L would be the same (or, more precisely, such that the probability of K’s falling into every such interval would be the same), no matter what the value of a was, and one that assigns a probability to every Borel subset of (0,∞).
Our not having such a probability measure is not just due to our ignorance. In fact, no such probability measure can exist. It is easy to see that such a probability measure would also assign the same probability to every interval (a,a+L] of length L.[2] But now let p be the probability of the interval (0,1]. Then, this is also the probability of (1,2], and of (2,3], and so on. But by the countable additivity property of classical (Kolmogorovian) probabilities, since the intervals (0,1], (1,2], (2,3], … are pairwise disjoint, it follows that:
P((0,∞)) = P((0,1] ∪ (1,2] ∪ (2,3] ∪ …) = P((0,1]) + P((1,2]) + P((2,3]) + … = p + p + p + ….
Now the left hand side here is finite—indeed, equal to one. The only way an infinite sum of p’s can have a finite total is if p=0. But then it follows that P((0,∞))=0, which contradicts the fact that P is a probability measure on the set (0,∞), since a probability measure on a set assigns probability one to the whole set. Thus, the assumption of an appropriate probability measure leads to absurdity, in light of the countable additivity property of probability.
Without a probability measure on the space of all possible values of K, there is no way to make sense of the FTA based on the “smallness” of the life-admitting range which is a subset of (8.0,20.2). This is the Renormalizability Objection: we cannot “renormalize” the state space (0,∞) so as to give it probability one while preserving the intuition that all values of K are prima facie equally likely.
There is a natural response. If we restrict the space of possible values of K to some subset (0,L) of (0,∞), there is no difficulty in assigning probabilities. We just let the probability measure PL of (Borel measurable) subsets of (0,L) be that associated with the uniform distribution on (0,L). Thus, the probability of a subset of the form (a,b) is (b−a)/L. We can then extend this definition to all (Borel measurable) subsets of (0,∞) via a limiting procedure:
P∞(A) = lim_(L→∞) PL(A ∩ (0,L)).
This seems to support the FTA very nicely: P∞((8.0,20.2)) is in fact equal to zero since PL((8.0,20.2))=(20.2−8.0)/L if 20.2≤L, and hence tends to zero as L tends to infinity. Unfortunately we have two problems here. First of all, P∞ is not a probability measure in the sense of classical probability theory. For instance, P∞ fails to be countably additive since PL((n,n+1])=1/L if n+1≤L, so that P∞((n,n+1])=0, while P∞((0,∞))=1, and (0,∞) is the union of the intervals (n,n+1]. Moreover, P∞ fails to assign probabilities to all Borel subsets of (0,∞), as can be seen without undue difficulty.[3]
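To make the limiting behavior concrete, here is a rough numerical sketch (the helper functions and the particular values of L are my own illustrative choices) of how PL((8.0,20.2)) shrinks to zero while PL of a sparse set in the spirit of note [3] oscillates and so has no limit:

    # Sketch: PL(A) = (length of A intersected with (0,L)) / L for two kinds of set A.

    def clipped_length(intervals, L):
        # total length of a union of disjoint intervals, clipped to (0, L)
        return sum(max(0, min(b, L) - min(a, L)) for a, b in intervals)

    def P_L(intervals, L):
        return clipped_length(intervals, L) / L

    life_range = [(8.0, 20.2)]
    for L in [100, 10**4, 10**6, 10**8]:
        print("P_L(life range), L =", L, ":", P_L(life_range, L))   # tends to 0

    # A set in the spirit of note [3]: a union of intervals (A_0,A_1), (A_2,A_3), ...
    # with A_n growing so fast that A_(n-1)/A_n tends to 0.  I use A_n = 2**(n**2)
    # here purely for numerical convenience; any sufficiently fast-growing sequence
    # behaves the same way.
    A = [2**(n**2) for n in range(8)]
    sparse_set = [(A[2*k], A[2*k + 1]) for k in range(4)]
    for n in range(1, 8):
        print("P_L(sparse set), L = A_%d :" % n, P_L(sparse_set, A[n]))
    # The values hover near 1 for odd n and near 0 for even n, so the limit
    # defining P_infinity does not exist for such a set.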
Suppose, however, that we are not daunted by these technicalities—as I will argue later we should not be daunted. We now face MMV’s ingenious Coarse Tuning Objection. If K, which is in fact 12.4, must lie between 8.0 and 20.2 to allow for life, we may well be quite impressed, and the FTA may have some intuitive pull on us. But suppose that instead we found scientifically that in order for life to exist, K would have to lie between 0.0000000001 and 10000000000. Our amazement at the fine tuning would surely cease. The argument for intelligent design based on such coarse tuning would impress no one. But if we use the limiting probability “measure” P∞, then we are committed to the Coarse Tuning Argument (CTA) being just as good as the Fine Tuning Argument. For just as P∞((8.0,20.2))=0, so too P∞((0.0000000001,10000000000))=0. If the former equation supports the existence of a designer of the universe, so does the latter, and to exactly the same degree. But since the CTA is, or so it is alleged, an evidently bad one, likewise the FTA based on this limiting account of probabilities is a bad argument.
The challenge MMV sets is thus to come up with a version of the FTA that does not simultaneously support the CTA. But this is a challenge I will reject. Instead, I will defend the CTA. But before doing this, I will argue that the difficulties in the probabilistic reconstruction of the FTA are not due to the argument’s being a bad one, but due to the fact that classical probability theory has its limitations for modeling the epistemic phenomenon of credence.
I will, however, add a sceptical note at the end. The MMV objection may have been answered, but perhaps, private assurances from the McGrews notwithstanding, the objection merely points towards a more serious objection, to which at present I have no answer.
The theory of epistemic probabilities claims that in some sense we can model credences, degrees of belief, within a classical probability calculus. Before continuing with the discussion, we will need to define classical Kolmogorovian probability theory. We are given a state space Ω, together with a certain σ-algebra B of subsets of Ω, known as the “measurable subsets” or “events”. A σ-algebra is a collection of subsets with the property that the empty set is a member, the complement of a member of the σ-algebra is a member and the union of a countable number of members is also a member. Furthermore, we have a probability measure P on Ω which is a real-valued function on B satisfying:
(NonNeg) P(A)≥0 for all A in B;
(CountAdd) For every sequence A1,A2,… of disjoint members of B we have: P(A1∪A2∪A3∪…)=P(A1)+P(A2)+P(A3)+….
(TotalOne) P(Ω)=1.[4]
A claim that we can model credences within classical probability theory then seems to be a claim that given an agent and a sufficiently large collection P of propositions beliefs in some of which we wish to model, we can find a probability space ⟨Ω,B,P⟩ such that each proposition p in P is associated with an event p* in B, such that P(p*) is equal to the agent’s degree of credence in p, and such that
(C1) (p1 & p2 & …)* = p1* ∩ p2* ∩ …,
(C2) (p1 or p2 or …)* = p1* ∪ p2* ∪ …,
(C3) p* ⊆ q* whenever p entails q, and
(C4) (~p)* = (p*)^c, where A^c indicates the complement of a set.
One might think of Ω as a set of situations and p* as the set of those situations which would make p true, or one might think of Ω as a set of worlds and p* as those worlds at which p is true, or one might even think of Ω as the set of all functions f from P to {T,F} such that it is possible that all and only the members p of P such that f(p)=T are true and let p* be the set of all such functions that assign T to p, and then in each case choose B appropriately. Frequently, the distinction between a proposition p in P and the event p* is blurred, and if done carefully this is quite harmless—I will not be avoiding talk of the probability of a proposition p.
Unfortunately, the claim that probabilities model credences is empirically false except in the simplest cases. People’s credences do not satisfy the probability calculus because they tend to be inconsistent. This is not a problem. While it is false that our credences are modeled by the probability calculus, it is surely plausible that they ought to be. Inconsistency is to be avoided, after all. Thus, even if the probabilistic account of credences is descriptively false, it may be normatively true. And that is all that we need for analyzing arguments like the FTA, since the question there is not of what we in fact believe, but of what we ought to believe.
However, there are very serious problems with the thesis that classical probabilities ought to model credences. Suppose I know with complete certainty that X could have any value in the interval [0,1], and there is no reason to favor any value over any other. Thus, P(X∈[0,1]) is 1. But if we assign a uniform probability distribution to [0,1], which is surely what the setup calls for, then P(X=1/2)=0: indeed, the probability of X taking on any particular value is zero. This implies that P(X∈[0,1] and X≠1/2)=1. But surely one’s credence that X is between 0 and 1 (inclusive), and one’s credence that X is between 0 and 1 (inclusive) and is not 1/2 are not the same: the latter credence is surely smaller. Likewise, there should be a difference between one’s credence that X is 1/2 and one’s credence that X is the largest integer. One has a certainty with respect to the latter, arising from the simple fact that there just is no largest integer, that one does not have with respect to the former, even though in both cases the numerical probability assignment will be zero.
It seems that classical numerical probabilities are not fine-grained enough to fit even with idealized credences. For a case closer to the renormalizability considerations, suppose I learn that tomorrow a very large conventional bomb will explode somewhere in the Eastern Hemisphere: I then have some reason to worry. If I learn only that it will explode somewhere on Earth, I have less reason to worry. Suppose, finally, that all I learn is that it will explode somewhere or other in an infinite universe.
But my credences in the last case cannot be modeled with classical probability theory. Divide up the infinite universe into an infinite sequence, B1, B2, …, of disjoint rectangular blocks of equal size, with the Earth and a large buffer zone around the Earth being contained in B1. My credence that the blast will be centered in Bn ought then to be the same as my credence that the blast will be centered in Bm, no matter what n and m are. Not having any further information about the location of the blast does not let me assign a higher credence to any one blast location. Now, I rightly assign credence one to the disjunctive claim that the blast will be centered in B1, or in B2, or in B3, or …. Assuming these are mutually exclusive possibilities, if pn is the proposition that the blast will be centered in Bn we would have to have P(p1*)+P(p2*)+…=1 by CountAdd. Since all the pn have the same credence, we would obtain a contradiction: the only way an infinite sum of the same finite number can be finite is if each summand is zero, but then the sum is also zero, not one.
Hence, my credences in the infinite case are not modeled by the classical probability calculus, not even normatively modeled. Yet we can meaningfully talk of these credences and engage in likelihood-based reasoning about them, where I use the word “likelihood” as being more general than the specific term “probability” which I will use in connection with classical probability theory.
Moreover, we can engage in some inferences on the basis of these likelihoods. Before, I assumed that I had no idea of any sort of bias in the choice of blast location. Suppose now that this no-bias claim is only one of two hypotheses, the other being that a malevolent agent chose the blast location, and assume that it turns out that the blast occurs in a large population center on Earth. The choice to opt for the malevolent agent theory would already be a good one if our initial knowledge was that the blast would occur somewhere on Earth, and the null hypothesis said that there was no bias in favor of one place on Earth over another. But as the volume of space in which the blast could occur increases, the likelihood, on the null hypothesis, that the blast would occur near any given populated location only decreases.
This can be formulated as a limiting case argument, but need not be. Rather, the point of talking of expanding volumes of space is to highlight a general observation:
(Expand-1) Under the hypothesis that there is no bias in the process in favor of one volume of space over another, the likelihood that the blast would occur in any one specified location decreases as we add more space for the blast to happen in.
This is intuitively quite clear and we can justify it as follows. If S* is a volume of space containing a smaller volume of space S, both of which contain some location a, then we can say the following about the case where what we know is just that the blast will occur in S*. Either the blast will occur in S or in S*−S. If it occurs in S*−S then it will certainly not occur at a. If it occurs in S, then it might well occur at a. The situation where we add to the information that the blast is in S* the additional information that it occurs in S is just like the situation where from the beginning we knew the blast would occur in S. Thus, the likelihood that the blast would occur at a is lower if we just know that the blast will occur in S*, since that allows for the possibility of the blast occurring in S*−S, in which case it is certain not to occur at a.
There are other ways of convincing someone that in the case of infinite space we do not need to worry about the blast. For instance, a quick calculation shows that the likelihood that the blast will occur within a thousand miles of Earth is less than 10^-44 of the likelihood that the blast will occur somewhere in the Andromeda Galaxy. Any likelihood that is that much smaller than some other likelihood is negligible. Note that for this argument we do not need to make any assumptions on whether space is finite or not: all we need to assume is that it contains Earth and the Andromeda Galaxy, and to note that it is just so much more likely that the blast should be in the Andromeda Galaxy that we need not worry. It would be strange to suppose that this argument only works in the case of finite space, and if space is infinite it fails. Surely, if anything, it works better in the infinite case.
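The “quick calculation” is just a comparison of volumes under the no-bias hypothesis. Here is a rough sketch of the sort of estimate involved; the figures for the Andromeda Galaxy’s size are round textbook values I am assuming for illustration, and modeling it as a simple disk is of course crude:

    import math

    MILE_PER_LY = 5.879e12           # miles in one light year (approximate)

    # Sphere of radius 1000 miles around Earth
    r_earth_zone = 1000.0
    v_near_earth = (4 / 3) * math.pi * r_earth_zone**3

    # Andromeda modeled crudely as a disk ~220,000 ly across and ~1000 ly thick
    r_andromeda = 110_000 * MILE_PER_LY
    thickness = 1_000 * MILE_PER_LY
    v_andromeda = math.pi * r_andromeda**2 * thickness

    # Under the no-bias hypothesis, likelihoods are proportional to volumes.
    print("volume ratio:", v_near_earth / v_andromeda)
    # roughly 5e-43 with these crude figures; including the galaxy's extended
    # halo pushes the ratio lower still
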
Similarly, when offered a bet on which one pays a billion dollars—or indeed any amount up to 10^44 dollars—if the blast occurs within a thousand miles of Earth and is paid a dollar if the blast occurs in the Andromeda Galaxy, it is rational to accept the bet. And a fortiori it is rational to accept such a bet if one is paid the dollar if the blast occurs anywhere in an infinite universe at least a thousand miles from Earth. After all, in every case in which one gets paid in the former bet one would get paid in the latter bet. Running the same argument with larger finite regions than the Andromeda Galaxy, we conclude that it is rational to accept a bet at any finite odds against the blast happening within a thousand miles of Earth in an infinite universe. The best way to explain the rationality of such bet acceptance is that the credence of the blast happening within a thousand miles of Earth is lower than any credence associated with a non-zero epistemic probability.
Here is a different way to the conclusion. Consider first a two-step game.
(Blast Game). There will be a blast—you know that. You do not have any information on whether the universe is finite or not, or else we assume it is infinite. First step: You are told whether the blast is going to be within 10^100 miles of Earth or not. Then you are asked if you want to play the game—you would pay a hundred dollars if it ends up within a thousand miles of Earth and get a dollar otherwise. Second step: The blast happens and the payments take place.
Now, whatever you were told at the first step, it is rational to play. If you’re told it is going to be within 10^100 miles of Earth, you can make a conventional probability calculation that it is worth playing the game, by calculating the probability of its being within a thousand miles of Earth given it is within a sphere of radius 10^100 miles, and of course it is worth playing. On the other hand, if you are told it is not within 10^100 miles of Earth, it is worth playing, since you’re guaranteed to win. So, no matter what you are told in step one, it is rational to play at step two. Thus, it would seem to be rational just to tell the other player before step one that you are going to play. In other words, step one is unnecessary—it is rational to play the game anyway. So the game is rational to play. And it does not matter if we are dealing with a hundred dollars or a billion or any finite amount. It is worth betting against the blast being near Earth, given an infinite universe and no bias, no matter what the finite odds offered are.
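For the first branch of the reasoning, the conventional calculation is just an expected-value computation under a uniform-over-volume assumption. A minimal sketch (the payoff figures are the ones in the game; treating likelihood as proportional to volume is the no-bias assumption):

    from fractions import Fraction

    # Probability of landing within 1000 miles of Earth, given the blast is
    # somewhere within 10**100 miles of Earth and locations are weighted by volume:
    p_near = Fraction(1000, 10**100) ** 3          # ratio of sphere volumes, (r/R)^3

    # Blast Game payoffs: lose $100 if within 1000 miles, win $1 otherwise
    expected_value = -100 * p_near + 1 * (1 - p_near)
    print(float(p_near))           # 1e-291
    print(expected_value > 0)      # True: worth playing on this branch too
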
If it is insisted that there is a significant difference between Blast Game and a one-step variant where you are not told whether the blast is within 10^100 miles of Earth or not, so that it is rational to play the two-step game but not the one-step game, consider some absurd consequences. Suppose you’re playing the two-step game. The room you are playing in is noisy, and you fear that you will not hear whether the blast is within 10^100 miles of Earth or not. If it is rational to play the two-step game but not the one-step game, then surely you should work hard to strain your ears as to what the answer is. But it is strange that things are such that you should strain your ears to hear an answer that is not going to affect your behavior. Or suppose that you did not in fact hear the answer, but know that an answer is given. What is the point of asking what the answer is, if whatever the answer, what it will be rational for you to do is the same?
Or consider the order in which things are done. Since you know ahead of time that it will be rational for you to wager whatever you are told in the first step, it is likewise rational, if it is convenient for you, to take the dollar out ahead of time. Suppose that wagering is done by putting the dollar into a box, and one counts as having wagered if the dollar is in the box within a minute of the completion of the first step. Since you know you will wager whatever you are told, you put the dollar into the box ahead of time. Do you have any reason at this point to listen to the answer to the question of whether the blast is within 10^100 miles of Earth or not? After all, whatever answer is given, you will not have reason to remove the dollar. Or suppose instead that once the dollar is put into the box, the box closes permanently. Would it be any less rational to put the dollar in ahead of time in this case? After all, the only reason it might not be rational is if one thinks that one might be in a position in which it might be rational to remove the dollar, and one knows one will not be in such a position. But if it would be rational to put the dollar in ahead of time in this case, why would the answer to the question matter? Would one have any rational reason for listening to the answer in this case? Surely not. But if not, then one would be just playing the one-step variant of the game.
If one thinks that rationality at game playing goes hand-in-hand with credences (except perhaps in the case of games where the number of times one is allowed to play depends on the very variables one is betting on, as in the Sleeping Beauty case[5]), and one agrees that both the two-step and one-step games are rational to play, then one has to say here that one has a less than 1/100 credence in the claim that the blast is within a thousand miles of Earth.
Likewise, similar considerations make one opt for the malevolent agent hypothesis rather than the null hypothesis in our earlier situation. Under the null hypothesis, a blast in a large population center on Earth would be utterly unlikely, while under the malevolent agent hypothesis it is just what one would expect.
Admittedly, all the reasoning here cannot be reconstructed within a standard Bayesian epistemology based on the classical probability calculus. But the reasoning is nonetheless reasonable. For instance, Expand-1 is a highly plausible thesis. Likewise, even if absolute probability judgments cannot be made about the likelihoods of the blast being on Earth or in the Andromeda Galaxy, a relative probability judgment can be made. A rational person ought not, if she believes the null hypothesis that there is no bias in the choice of explosion location, fear the blast. Given an alternate hypothesis of a malevolent agent, she ought to opt for that hypothesis if the blast occurs in a large population center, at least assuming Earth is the only populated area of the universe.
The inapplicability of the classical probability calculus only shows that even the normative application of this calculus to credences is fraught with difficulties. This is no surprise. The probabilistic account of credences has a hard time handling cases that are either impossible to normalize, such as the blast in an infinite universe, or cases in which intuitively different credences must all be assigned probability zero, such as the uniformly distributed variable on [0,1] discussed earlier.
There are a number of such infinitary cases. Suppose that there are infinitely many sapient creatures in the universe. You learn that one will be tortured to death tomorrow. Plainly, if there were six billion sapient creatures, you would have little self-centered reason to worry. The more sapient creatures there are, the less reason you have to worry. And with an infinite number, there is in a sense no reason to worry at all. We can reason this out either intuitively, or by noting that even if it is someone in our galaxy who is to be tortured, it is much more likely that it will be someone other than you, and if it is someone not in our galaxy, you don’t need to worry at all.
Now, if the reasoning in any of the above infinitary cases is correct, then likewise so is the Coarse Tuning Argument. We can, for instance, cite the following intuitive principle:
(Expand-2) Under the hypothesis that there is no bias in the relevant processes in favor of one interval over another of equal length, the likelihood that the constant K would occur in any one specified interval decreases as we allow the constant to range over a greater subset of positive reals.
If we had known that K was limited to occurring within (0,1000), then we would estimate the likelihood that it would occur within (8.0,20.2) at (20.2−8.0)/1000=1.22%. Once the restriction is removed, our likelihood estimate had better go down. The restriction, intuitively, would have increased the likelihood of K’s being in the life-admitting range. The same reasoning, however, works whatever the life-admitting range is, as long as this range is finite. But since the reasoning is sound, the conclusion is that the Coarse Tuning Argument is a good one.
It may seem somewhat counterintuitive, however, to think that it would be rational to accept the following game. You know nothing about K other than that it is positive. You pay ten thousand dollars to play. If it turns out that K is in the range (0,10^100,000), you lose. Otherwise, you get your stake back plus one dollar. If the above reasoning is sound, then playing this game must be rational. But no doubt few of us would play this game.
This thought experiment, however, is prey to several confounding variables. We are risk averse. Many people would not stake ten thousand dollars on a game where the chance of winning is 999,999,999 out of a billion, if the net gain for winning were only a dollar, despite the fact that standard decision theory says this is rational. Furthermore, the gain of a dollar may be seen as completely insignificant if the game is to be played only once.
Consider how the earlier reasoning in the case of the blast applies here. Suppose we were offered a variant game. You put $10,000 in the pot. Your opponent puts in a dollar. If K is in the range (0,10^100,000), your opponent gets the pot. If K is in the range (10^100,000, 10^100,000+10^1,000,000), you get the pot. Otherwise, you both get your money back. Intuitively your chance of winning is 10^900,000 times bigger than your opponent’s chance of winning, so the game seems a rational one to play. But plainly the game as originally described is if anything a more rational game to play, since any value of K that would make you win this variant game would make you win the original game.
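The intuitive ratio of chances is just the ratio of the two intervals’ lengths, which arbitrary-precision integer arithmetic can check directly (a sketch; the interval endpoints are the ones in the variant game):

    # Lengths of the two winning ranges in the variant game
    opponent_range = 10**100_000          # length of (0, 10^100,000)
    your_range = 10**1_000_000            # length of (10^100,000, 10^100,000 + 10^1,000,000)

    ratio = your_range // opponent_range
    print(ratio == 10**900_000)           # True: your range is 10^900,000 times longer
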
There may, however, be further resistance to playing the game due to some conviction that there is a non-zero, and indeed non-negligible, probability that K is in (0,10^100,000). This conviction is either well-founded or not. If it is badly founded, then we should dismiss it from our considerations. If, however, it is well-founded, then it may well be irrational to play the game. However, then the renormalizability objection to the FTA fails. For we have just admitted that it is permissible to assign some non-zero probability p to K’s falling within (0,10^100,000). But if we can do that, then the problems with infinities disappear.
What might make one take the horn of saying that the intuition is well-founded is an inductive argument. All the other constants in the laws of nature, expressed in “natural” units (e.g., units defined by such natural choices as setting the speed of light, the gravitational constant, the mass of the electron, etc., to one) are less than 10^100,000. Thus, probably, so is K.
I am sceptical of this inductive argument, because I think that simple induction might only be applicable over classes of objects that may be reasonably thought to have a common origin, or to be members of the same natural kind, or to have their properties explained by the same laws of nature. But the values of different constants of nature are not explained by laws of nature, because the values are themselves a part of the laws of nature, and the laws of nature are not self-explanatory. Nor is there much reason to suppose the constants to have the same origin, unless of course one posits the design hypothesis or some other First Cause hypothesis, which presumably the FTA’s typical opponent will not be willing to do. Nor is “constants in laws of nature” a natural kind. After all, its members are not natural objects, but numbers. Moreover, given that the laws of nature, of which the constants are aspects, in some sense govern natural kinds, it would seem to be a category mistake of some sort to suppose that “constants in laws of nature” is a natural kind. In any case, whether the argument succeeds or not, the FTA survives. If the argument succeeds, however, then the Coarse Tuning Argument fails.
It is tempting to try to solve the renormalizability problem by using infinitesimals. We now do have mathematically coherent “non-standard arithmetic” theories of infinitesimals, so that for instance we can coherently talk about a “non-standard” number which is bigger than zero but less than every positive real number.[6] Infinitesimals can then be used both to counter the renormalizability problem and to challenge my claim that the countably additive probability calculus cannot be used to model such things as the difference between the credence that a uniformly distributed random variable on [0,1] does not take on the value 1/2 and the credence that it does not take on a value equal to the greatest integer.
Consider first how infinitesimals can be used to challenge the latter claim: this argument was essentially used by Timothy McGrew in discussion. Given our random variable X, we can say that the probability that X=1/2 is a positive infinitesimal i, while the probability that X is the largest integer is just zero. Thus, as long as we extend classical probability theory to include infinitesimals, these kinds of cases do not seem to challenge the basic idea that countably additive probability theory yields the right normative analysis of our credences.
Of course, the defender of the FTA can also try to use infinitesimals. Let i be an appropriately chosen positive infinitesimal. If we are dealing once again with probabilities for the constant K on our state space (0,∞), we can then define P((n,n+1])=i for every non-negative integer n. If A is a subset of (n,n+1], we can then define P(A)=i·μ(A), where μ(A) is the measure that a uniform probability density on (n,n+1] would assign to A. This definition is exactly right since P(A)=P((n,n+1])·P(A|(n,n+1])=i·μ(A), because our assumptions about K force us to assign a uniform distribution once we have conditioned on an interval of finite length. We can then extend the definition to any measurable subset S of (0,∞) by the formula:
P(S) = P(S∩(0,1]) + P(S∩(1,2]) + P(S∩(2,3]) + ….
This approach assigns an infinitesimal probability to any subset S of finite extent (i.e., any subset S which is contained in (0,N] for some finite N). Hence, it licenses the Coarse Tuning Argument. However, it preserves the intuition that the FTA is better than the CTA, because the smaller the subset S, the smaller its infinitesimal probability, so we do have a better argument with smaller life-admitting ranges.
Unfortunately, as Timothy McGrew has pointed out[7], such an approach is incoherent. Here is a simple way to see this. Consider first the fact that:
P((0,∞)) = P((0,1])+P((1,2])+P((2,3])+… = i + i + i + ….
But observe that also:
P((0,∞)) = P((0,2])+P((2,4])+P((4,6])+… = 2i + 2i + 2i + ….
Therefore, P((0,∞))=2P((0,∞)). Thus, P((0,∞))=0, which is absurd since P((0,∞)) ≥ i and i>0.
Thus, it does not seem possible to solve the problem just by using infinitesimal probabilities while retaining countable additivity. However, note that this argument of McGrew’s tells equally well against the use of infinitesimals to refute my claim that to model credences about our uniformly distributed random variable X on [0,1] we need more than just countably additive probability theory. For the same kind of a contradiction can be set up from the use of infinitesimals in that setting. Supposedly, we are to set P(X=x)=i where x is a number in [0,1] and i is a positive infinitesimal. Moreover, since all values are intuitively equally likely, this is true for every number x in [0,1]. Let S={1,1/2,1/3,…}. Then, by countable additivity:
P(X∈S)=P(X=1)+P(X=1/2)+P(X=1/3)+…=i+i+i+….
But likewise:
P(X∈S)=P(X=1 or X=1/2)+P(X=1/3 or X=1/4)+P(X=1/5 or X=1/6)+…=2i+2i+2i+….
Consequently, as before, P(X∈S)=2P(X∈S), and hence P(X∈S)=0, which contradicts the fact that P(X∈S)≥i>0.
It seems that infinitesimals, thus, help neither side in the debate. Technically, the problem arises from the fact that the infinite sum Σ_(n=1)^∞ f(n) is ambiguous when f takes on non-standard values. The problem is that Σ_(n=1)^∞ f(n) is defined as the limit of the partial sums Σ_(n=1)^N f(n) as N tends to infinity. Now we say that the limit of a_n is a if and only if for every ε>0 there is a finite integer N such that if n≥N then |a_n−a|<ε. Unfortunately, what this means depends on whether we mean the quantifications to be over non-standard ε and N (with “finite” now meaning “hyperfinite”) or over standard ε and N. The latter simply will not do, because once we are dealing with infinitesimals, we no longer have unambiguous limits if the limiting procedure is defined via standard ε and N.[8] So the quantifications are to be over non-standard ε and N. Unfortunately, that too turns out to fail as a result of the fact that our problem was so set up that it was a given that X and K take on only standard real values. Thus, we would have to define, say, P(X=1/n) in such a way that it is equal to i if n is finite and equal to 0 if n is infinite. The problem with this is that f(n) = P(X=1/n), to be defined as a non-standard function, must be defined using terms available in the purely non-standard language, and “finite” relative to the standard reals is not available: only “hyperfinite” is.
Alternately, there seems to be a different infinitesimal-based way of dealing with the renormalizability problem as long as one is willing to allow for the epistemic possibility that our constant K can take on some but not all non-standard values. If one could argue that we know prior to the observation of K that K<k where k is some positive infinite number (if one has infinitesimals, one has infinite numbers: if i is a positive infinitesimal, then 1/i is a positive infinity), then one could define a non-standard uniform probability measure on (0,k). The non-standard interval (8.0,20.2) would then have a well-defined non-standard probability measure which would end up being infinitesimal. Thus, we could run the FTA: the probability of K falling into the life-admitting range is infinitesimal, and smaller than in the case of a CTA.
The problem with this is just that modeling our knowledge with a non-standard uniform probability measure on (0,k) fails to reflect our physics. Since physics involves standard-valued functions, and I will assume that we know this with certainty, the probability that K lies in the range, say, [k-1,k] is exactly zero, since there are no standard numbers in that range. However, the uniform probability measure assigns probability 1/k>0 to that range.
Nor will it help to say that we do not have it in our background knowledge that K is a standard number. For once we allow K to be non-standard, then any way of fixing a constant k such that P(K<k)=1 will be ad hoc, and will fail to reflect what we know. Infinitesimals, thus, seem to be of no help.
All the difficulties in applying the probability calculus to the infinitary case were caused by CountAdd. It would be prima facie surprising, however, if our credences either did or ought to satisfy CountAdd. For CountAdd requires a distinction between orders of infinity, between countable infinities, like the cardinality of the set of natural numbers, and uncountable infinities, like the cardinality of the set of real numbers. CountAdd cannot be required for uncountable unions of events. If it were required for them, we could not have a uniform probability distribution on an interval like [0,1], whereas such probability distributions are essential to our applications of probability theory. To see this, observe that under the uniform probability distribution on [0,1], the singleton set { x } always has zero probability. The union of all these singleton sets as x ranges over the set [0,1] is [0,1], and hence has probability one. But the sum, even an uncountable sum, of zeros is still zero. Thus, zero would equal one if CountAdd were required for uncountable unions of events. Hence, it is essential to CountAdd that it be required only for countable unions. But it would be surprising if our epistemology had always depended on the distinction between countable and uncountable sums, even before the distinction had been made. Thus, it is prima facie unlikely that our epistemology involves such a distinction.
One might have the intuition that if one has a disjunction of mutually exclusive propositions, then the probability of the disjunction is equal to the sum of the probabilities of the disjuncts. Or, more weakly, one might have the intuition that if each disjunct is completely unlikely, i.e., has probability zero, then so does the disjunction (this would be weaker than CountAdd). Unfortunately, as we have just seen, these intuitions are false when extended to uncountable disjunctions, even though nothing in our reasons for accepting these claims had anything to do with the distinction between countable and uncountable cases. This opens the possibility that what is wrong is not just taking these intuitions to apply to uncountable cases, but perhaps that what is wrong is taking these intuitions to apply to infinite disjunctions. After all, there is a bright line between the infinite and the finite. There is good reason to be sceptical of the extension of intuitions about finite cases to infinite cases, and our acceptance of these intuitions about disjunctions is no doubt at least causally grounded in our perception that they are correct in everyday finite cases.
Let me put the argument this way. Any intuition we have for requiring countable additivity is an intuition for requiring uncountable additivity. But we do not, and cannot, require uncountable additivity. Hence, likewise, we should not require countable additivity. The McGrews’ intuition, expressed in conversation, that countable additivity is needed so that the local and the global are tied together would also lead one to correlate the probabilities of singletons with the probability of the whole space, and this should make it clear that the intuition is suspect.
There is thus no reason to object to replacing CountAdd with:
(FiniteAdd) For every finite sequence A1,A2,…,An of disjoint members of B we have: P(A1∪A2∪…∪An)=P(A1)+P(A2)+…+P(An).
We keep NonNeg and TotalOne as before. We have a choice about what to do with the requirement that B be a σ-algebra. Either we can weaken the requirement so that it is only required that B is an algebra, i.e., closed under complementation and finite unions, or else we can keep the requirement. The resulting theory, in either case, I will call “probability* theory”. For most epistemological purposes, it makes no difference whether we employ probability* theory or probability theory: for instance, Bayes’ theorem holds equally well in both cases.
But probability* theory has a significant advantage for the kinds of infinitary cases that we are interested in. For there is a probability* measure on (0,∞) which has the property of being translation invariant, i.e., P(x+A)=P(A) where x+A={ x+y : y∈A } is the set A shifted over by x, where x>0, and which even matches the function P∞ which we had already defined whenever P∞ is defined (i.e., whenever the limit defining it converges). This probability* measure fits well with the intuition that each value of the constant in question is equally likely. I will sketch in the next section how one gets such a probability* measure, but for now take it on authority.
This probability* measure will have the property that if A is any finite-length interval, then P(A)=0. This does lead to the Coarse Tuning Argument. But given the contention that P correctly embodies the intuitions concerning the matter, we should simply accept the Coarse Tuning Argument.
On the other hand, we can find sets A that have P(A) anywhere between 0 and 1. For instance, the set A(a)=[0,a]∪[1,1+a]∪[2,2+a]∪… will have probability* exactly a, assuming 0≤a≤1. Thus, we do not get a design argument from every life-admitting range, but only from some, either ones that are sufficiently sparsely distributed on the real line, like A(10^-100), or ones that are contained in a finite interval.
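The claim about A(a) can be checked against the limiting frequencies that the probability* measure is required to match wherever they exist. A quick numerical sketch (the function names and the cutoff values L are my own illustrative choices):

    def frac_covered_by_Aa(a, L):
        # fraction of (0, L) covered by A(a) = [0,a] ∪ [1,1+a] ∪ [2,2+a] ∪ ...
        full_blocks = int(L)
        covered = full_blocks * a + max(0.0, min(a, L - full_blocks))
        return covered / L

    def frac_covered_by_interval(lo, hi, L):
        # fraction of (0, L) covered by the bounded interval (lo, hi)
        return max(0.0, min(hi, L) - min(lo, L)) / L

    for L in [10.5, 1000.5, 10**6 + 0.5]:
        print(L, frac_covered_by_Aa(0.25, L), frac_covered_by_interval(8.0, 20.2, L))
    # The first column of fractions settles down to 0.25 = a, the second to 0,
    # matching the probability* values claimed in the text.
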
Similarly, we can come up with a probability* measure that is applicable to the question of where the nuclear blast will occur in an infinite universe, and which coincides with an appropriate limit of probabilities whenever the latter is defined. This probability* measure will again end up having the property that if V is a finite volume of space, then the probability* of the blast’s occurring in V is zero. And this ability to account for the intuition that we have nothing to fear if we learn there will be a blast somewhere in an infinite universe is in favor of this approach.
Note, finally, that while arguments for countable additivity of credences are also arguments for uncountable additivity, there is a good argument for finite additivity that does not seem to imply infinite additivity. Suppose we don’t have finite additivity. Then, a Dutch Book can be imposed on us. This is a situation where our credences make it rational for us to lay a system of bets that would always result in our loss. For instance, suppose I assign probability 1/2 to event A, 1/3 to disjoint event B and 2/3 to the union of the two events. Then, it is rational for me to lay the following bets:
If A, then I get $0.51, and if not A, then I pay $0.49.
If B, then I get $0.68, and if not B, then I pay $0.32.
If A or B, then I pay $0.32, and if neither A nor B, then I get $0.68.
Each bet is favorable by my own credences: the first has expected value 0.5×$0.51 − 0.5×$0.49 = $0.01, and the second and third each have expected value $0.04/3. Now, there are three possibilities. Suppose that neither A nor B happens. Then I pay $0.49 on the first bet and $0.32 on the second, and get $0.68 on the third, so I am behind by thirteen cents. Suppose A happens. Then I get $0.51 on the first bet, pay $0.32 on the second (since B does not happen), and pay $0.32 on the third (since A or B happened): again I am behind by thirteen cents. Suppose B happens. Then I pay $0.49 on the first bet, get $0.68 on the second, and pay $0.32 on the third; once more I am behind by thirteen cents.
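A small check that each bet looks favorable given the non-additive credences 1/2, 1/3 and 2/3, and yet that every possible outcome produces the same net loss; the payoff numbers are the ones used above:

    # Credences violating finite additivity: P(A)=1/2, P(B)=1/3, P(A or B)=2/3
    credence = {"A": 1/2, "B": 1/3, "A or B": 2/3}

    # Each bet: (event, payoff if the event happens, payoff if it does not happen)
    bets = [("A", 0.51, -0.49), ("B", 0.68, -0.32), ("A or B", -0.32, 0.68)]

    def happens(event, outcome):
        # outcome is one of "A", "B", "neither"; "A" counts as making "A or B" happen, etc.
        return outcome in event

    for event, win, lose in bets:
        ev = credence[event] * win + (1 - credence[event]) * lose
        print("EV of bet on", event, "=", round(ev, 4))      # positive for all three

    for outcome in ["A", "B", "neither"]:
        net = sum(win if happens(event, outcome) else lose for event, win, lose in bets)
        print("outcome", outcome, "-> net", round(net, 2))   # -0.13 every time
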
It’s a bad thing to have credences that allow for a Dutch Book. Hence we need finite additivity. But why do we need countable additivity?
There is a final suggestion as to why we may need it. This comes from Timothy McGrew[9]. We want to have available to us the many tools of statistical analysis, tools that include things like the Central Limit Theorem (CLT), which says that, under appropriate assumptions, the distribution of a suitably normalized sum of independent random variables approaches a Gaussian distribution as the number of summands goes to infinity. Unfortunately, such theorems tend to require countable additivity.
Of course, one might just say that we can have countable additivity in many cases, and in these cases we can use the standard statistical tools. It would have been unjustified to infer in the 18th century that nothing is smaller than the lower resolving limit of the optical microscopes of the time from the fact that if something were smaller than that limit, we could not use our scientific tools on it. We use the tools we have when they are applicable.
But there is a more satisfying answer here. At least some statistical tools carry over in the finitely additive case. As a case in point, consider the CLT. This says that if X1,X2,… is a sequence of independent random variables with mean zero and finite non-zero variances and satisfying an auxiliary size condition, and if Sn=X1+…+Xn and σn^2 is the variance of Sn, then P(Sn/σn > x) converges to Φ(x)=P(G > x) as n goes to infinity, where G is a Gaussian random variable with mean zero and variance one. But note what the CLT does not say. It does not give us any concrete information about how close P(Sn/σn > x) should be expected to be to Φ(x) in practice. For in practice, of course, we are dealing with a finite n, and a limiting theorem only tells us what happens in the limit. As far as the CLT goes, it might be that P(Sn/σn > x) is not at all close to Φ(x) until n exceeds 10^100.
Fortunately, there are CLT-like theorems that are informative in finite cases. For instance, the Berry-Esseen estimate as refined by van Beek[10] implies that:
|P(Sn/σn > x) − Φ(x)| ≤ 0.7975 (E[|X1|^3]+…+E[|Xn|^3]) / σn^3.
This means that if we have some handle on the size of σn and on the third moments E[|Xi|^3] (and we anyway need some estimates of the size of the Xi to apply a CLT), we can give an upper bound on how far away the distribution of Sn is from being Gaussian even for finite n. This is much more informative than the CLT.
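For a concrete feel for the bound, here is a small sketch that evaluates it for sums of n fair ±1 coin flips (for which E[|Xi|^3]=1 and σn=√n, so the right-hand side is 0.7975/√n) and compares it with the exactly computable error at x=0.5; the choice of distribution and of x is mine, purely for illustration:

    import math

    def Phi(x):
        # Tail of the standard Gaussian, P(G > x), as in the text
        return 0.5 * math.erfc(x / math.sqrt(2))

    def exact_tail(n, x):
        # P(S_n / sqrt(n) > x) for S_n a sum of n independent fair +/-1 flips:
        # S_n = 2*H - n with H ~ Binomial(n, 1/2)
        threshold = x * math.sqrt(n)
        return sum(math.comb(n, h) for h in range(n + 1) if 2 * h - n > threshold) / 2**n

    x = 0.5
    for n in [10, 100, 1000]:
        actual_error = abs(exact_tail(n, x) - Phi(x))
        bound = 0.7975 * n / (math.sqrt(n) ** 3)     # = 0.7975 / sqrt(n)
        print(n, round(actual_error, 4), "<=", round(bound, 4))
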
And the Berry-Esseen-van Beek estimate holds just as well if the random variables are defined only on a finitely additive probability space. This is a theorem: I sketch a proof in the appendix. The same method of proof shows that many other results that come along with explicit estimates of the distance to the limit carry over to the finitely additive case, including some rather esoteric ones like very recent generalizations of Hsu-Robbins-Erdős estimates.[11] Thus, at least some convergence estimates continue to work in the finitely additive case.
All this said, there is a serious difficulty that remains. We can take any region in parameter space and re-arrange it. For instance, instead of talking about a law that involves a constant K that is a positive real number, we can talk about a law that involves the constant C which is strictly between 0 and 1, where C=e^(−K), rewriting the law to involve C instead of K. Should we work with C or K? Should we go for the constant that gives the “simpler” law? What is the simplicity of a law? Is not one formulation rather than another a purely arbitrary thing? This is a serious problem. For given just about any constant and just about any life-admitting range that isn’t the complete range, we can transform the equations in such a way that the life-admitting range seems to be almost all of parameter space—and in such a way that it seems to be almost none of it. I don’t know how to solve this problem. But some intuition about simplicity of law might help, and it is likely that we need some such intuition to do ordinary science. Nonetheless this is not a particularly satisfying state of affairs.[12]
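To illustrate the re-arrangement worry with the running example: under the transformation C=e^(−K) the whole parameter space (0,∞) maps onto (0,1), and a uniform prior over C makes the life-admitting range look tiny; under a different, equally monotone but gerrymandered transformation of my own devising it looks like nearly all of parameter space. A sketch:

    import math

    K_LO, K_HI = 8.0, 20.2          # the life-admitting range of K from the text

    # Transformation 1: C = exp(-K), a decreasing bijection from (0, infinity) onto (0, 1).
    # Proportion of (0,1) that the life-admitting range occupies under a uniform prior on C:
    prop1 = math.exp(-K_LO) - math.exp(-K_HI)
    print(prop1)                     # about 3.4e-4: "fine tuning"

    # Transformation 2 (my own gerrymandered choice): C = 1/2 + arctan((K - 14.0)/0.01)/pi,
    # an increasing bijection onto (0, 1) that stretches the range (8.0, 20.2).
    def to_C(K):
        return 0.5 + math.atan((K - 14.0) / 0.01) / math.pi

    prop2 = to_C(K_HI) - to_C(K_LO)
    print(prop2)                     # about 0.999: hardly any tuning to speak of
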
The proof proceeds in three steps. We start with the original Berry-Esseen-van Beek (BEvB) estimate for random variables on a σ-algebra. We then prove, in order:
(A) The BEvB estimate holds for random variables defined on a finitely additive probability space if each of the random variables X1,…,Xn can only take on finitely many values (i.e., if there is a finite set V such that P({X1,…,Xn}⊆V)=1).
(B) The BEvB estimate holds for the random variables X1,…,Xn defined on a finitely additive probability space if there is a finite number M such that P(|X1|<M & … & |Xn|<M)=1.
(C) The BEvB estimate holds in general for random variables defined on a finitely additive probability space.
Before we give the proof, we need to explain one thing. The BEvB estimate in more than one place uses the expectation of a random variable. The expectation of a random variable is defined as a Lebesgue integral of that random variable, considered as a function on the probability space. Lebesgue integrals are defined in the case of countably additive measures. So we need a finitely additive replacement for the notion of the expectation of a random variable. Fortunately, we can do this. If X is a non-negative random variable on a finitely additive probability space, then we can define E[X] = ∫_0^∞ P(X≥x) dx. This is well-defined since P(X≥x) is a monotone function of x, even if P is only a finitely additive measure, and so the integral makes sense as an ordinary (improper Riemann) integral. We still need to define the general case. This, too, is easy. If X is a random variable, we can write X=X+−X−, where X+ is the positive part of the random variable, defined by X+(ω)=X(ω) for ω in the probability space whenever X(ω)>0, and X+(ω)=0 otherwise, and X− is the negative part, defined as (−X)+. Then we can define E[X]=E[X+]−E[X−] whenever at least one of the terms on the right hand side is finite. It is easy to check that this coincides with the standard definition if X is a random variable defined with respect to a countably additive probability space.
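As a sanity check on the tail-integral definition of expectation, one can verify numerically that ∫_0^∞ P(X≥x) dx reproduces the usual mean for a familiar distribution; here, purely for illustration, an exponential random variable with mean 2 and a crude Riemann sum:

    import math

    def tail_exponential(x, mean=2.0):
        # P(X >= x) for an exponential random variable with the given mean
        return math.exp(-x / mean)

    def expectation_from_tail(tail, upper=200.0, dx=0.001):
        # crude Riemann sum for the integral of P(X >= x) over x in (0, upper)
        steps = int(upper / dx)
        return sum(tail(i * dx) for i in range(steps)) * dx

    print(expectation_from_tail(tail_exponential))   # close to 2.0, the usual E[X]
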
The first step of our proof is the easiest. Suppose that the random variables X1,…,Xn are defined as measurable over an algebra B (where an algebra is defined just as a σ-algebra, but with closure only required under finite unions). Let B0 be the collection of all subsets of the probability space Ω of the form { ω in Ω : (X1(ω),…,Xn(ω))=(a1,…,an) } where a1,…,an are numbers that are in the range of possible values of X1,…,Xn. Since there are only finitely many such possible values, B0 is a finite set. Moreover, it is easy to see that the intersection of any two distinct members of B0 is empty. Let B* be the collection of all countable unions (including the empty union, which is itself the empty set) of members of B0. The fact that the intersection of any two distinct members of B0 is empty implies that B* is a σ-algebra—this is an easy exercise. Note, too, that B* is finite as B0 is: in fact, any member of B* is a finite union of members of B0. Moreover, any set in B0 is a member of B, since X1,…,Xn are measurable over B. It follows from these facts that B* is a subset of B, and moreover is a σ-algebra, even if B is not.
We can now consider the probability space ⟨Ω,B*,P⟩, where P is the same probability measure as on B. But since B* is a finite collection of subsets, countable additivity for P is entailed by finite additivity (any countable sequence of disjoint members of B* has all but finitely many members of the sequence be the empty set). Thus ⟨Ω,B*,P⟩ is a bona fide countably additive probability space. Note that the distributions of variables measurable over B* defined on this space coincide with those distributions as defined on our finitely additive space ⟨Ω,B,P⟩. Applying the original BEvB estimate on ⟨Ω,B*,P⟩, we automatically get the BEvB estimate for our variables X1,…,Xn on ⟨Ω,B,P⟩. Q.E.D.
Step (B) takes some actual calculations. Fix any number ε>0. Let X1^ε,…,Xn^ε be a collection of independent random variables measurable over our finitely additive probability space with the properties that (i) the mean of each variable is zero, (ii) P(|Xi^ε−Xi|>ε)=0 for every i, and (iii) each Xi^ε can only take on at most finitely many values. Thus, Xi^ε is guaranteed to be ε-close to Xi. It is easy to construct such random variables. For instance, for ω in Ω, choose the unique integer n such that εn/2 ≤ Xi(ω) < ε(n+1)/2, and set Yi^ε(ω)=εn/2. Then, all possible values of Yi^ε(ω) are of the form εn/2 for an integer n, and moreover all the values are between −M−ε/2 and M+ε/2, and there are only finitely many such values. Observe that Yi^ε is always within ε/2 of Xi. Since Xi has mean zero, this implies that |E[Yi^ε]| is at most ε/2. We can then let Xi^ε=Yi^ε−E[Yi^ε] and it is easy to verify (i), (ii) and (iii).
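The discretization used here is just rounding down to a grid of spacing ε/2 and then recentering; a minimal sketch of the construction on a list of sample values (the sample values and ε are arbitrary illustrative choices, with a sample mean standing in for the expectation):

    import math

    def discretize(values, eps):
        # Round each value down to the grid {eps*n/2 : n an integer}, then recenter.
        Y = [eps * math.floor(2 * v / eps) / 2 for v in values]
        mean_Y = sum(Y) / len(Y)            # stands in for E[Y_i^eps]
        X_eps = [y - mean_Y for y in Y]     # the recentered, finitely-valued variable
        return Y, X_eps

    values = [0.37, -1.24, 1.37, -0.5]      # pretend sample values of X_i, mean zero
    eps = 0.1
    Y, X_eps = discretize(values, eps)

    print(all(0 <= v - y < eps / 2 for v, y in zip(values, Y)))    # True: Y is within eps/2 of X_i
    print(all(abs(v - x) <= eps for v, x in zip(values, X_eps)))   # True: |X_i^eps - X_i| <= eps
    print(len(set(Y)))                                             # only finitely many grid values
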
By step (A), the BEvB holds for X1^ε,…,Xn^ε. But now observe that one can estimate how close the quantities E[|Xi^ε|^3] and E[(X1^ε+…+Xn^ε)^2] are to E[|Xi|^3] and E[(X1+…+Xn)^2], and we can also notice that there is a positive function g(ε), depending implicitly on the numbers M and σn, such that Φ(x) and Φ(y) are within distance g(ε) whenever x and y are within distance 2ε/σn, and g(ε) tends to zero as ε does. These kinds of estimates can be used to go from the BEvB for X1^ε,…,Xn^ε to the BEvB for X1,…,Xn in the limit as ε goes to zero. The details are a little involved, though fairly straightforward, and left to an interested reader. The important thing to remember when working out the details is that one must not assume that the variables are over a countably additive probability space.
Finally, we are left with step (C). This is relatively easy. Fix a large finite positive number M. Define Zi^M(ω)=Xi(ω) whenever |Xi(ω)|≤M/2 and set Zi^M(ω)=0 otherwise. Let Wi^M=Zi^M−E[Zi^M]. Then, |Wi^M| never exceeds M. Consequently, we can apply the BEvB to W1^M,…,Wn^M. Moreover, using the relatively easy to prove observation that E[|Wi^M|^p] converges to E[|Xi|^p], whenever the latter is finite, as M tends to infinity[13], we can obtain BEvB for X1,…,Xn.
[1] “Probabilities and the Fine-Tuning Argument: A Sceptical View”, Mind 110 (2001), 1027–1038.
[2] Proof: The probability of (a,a+L] is just the limit of the probabilities of (a,a+L+ε) as ε>0 tends to zero. Since the latter probabilities do not depend on a, but only on L and ε, neither does that of (a,a+L].
[3] Let A_n=2^(n^n). Put A_(−1)=0. Let A=(A_0,A_1)∪(A_2,A_3)∪(A_4,A_5)∪…. Let p_n be PL(A) for L=A_n. Then, if n>1 is odd, it is easy to see that (A_n−A_(n−1))/A_n ≤ p_n ≤ (A_n−A_(n−1)+A_(n−2))/A_n. Now, A_(n−1)/A_n and A_(n−2)/A_n both tend to zero as n tends to infinity. It follows that p_n tends to 1 as n tends to infinity along odd integers. On the other hand, if n>1 is even, then p_n < A_(n−1)/A_n, and so p_n tends to zero along even integers. Consequently, p_n has no well-defined limit as n tends to infinity: it tends to zero along some sequences of n and to one along others. Hence, P∞(A) is not defined, even though A is a Borel set, being a countable union of open intervals.
[4] This presentation is based on Kiyosi Itô (ed.), Encyclopedic Dictionary of Mathematics, 2nd edition, Cambridge, MA and London: MIT Press, 1987, s.v. “Probability Theory”.
[5] James Dreier (correspondence, 2001) has argued that Elga’s Sleeping Beauty problem (see Adam Elga, “Self-locating Belief and the Sleeping Beauty Problem”, Analysis 60 (2000), 143–147) can be solved by letting betting rationality diverge from credences. However, the Sleeping Beauty case is precisely one where the number of times you are given a chance to bet or make up your mind depends on what the value of the unknown variable that you are interested in is.
[6] ??ref
[7] Fine Tuning Workshop, Notre Dame University, April, 2003.
[8] To see this, observe that if a is a limit by this definition, so will a+i be if i is an infinitesimal. But limits, when they exist, are supposed to be unique.
[9] Fine Tuning Workshop, Notre Dame University, April, 2003.
[10] “An application of Fourier methods to the problem of sharpening the Berry-Esseen inequality”, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 23 (1972), 187–196.
[11] For instance the main theorem in my “A general Hsu-Robbins-Erdős type estimate of tail probabilities of sums of independent identically distributed random variables”, Periodica Mathematica Hungarica (forthcoming) generalizes to the finitely additive case.
[12] I would like to thank all of the participants and organizers of the Fine Tuning Workshop at Notre Dame University in April, 2003, for many discussions of these issues.
[13] The easiest way to show this is to use the Dominated Convergence Theorem after writing out our special definition of E[X] for non-negative X.