Ok, so yesterday I tweeted this:

Annoying: when people conflate the difference between random and merely unpredictable processes.

It received a fair bit of attention. This is my attempt to explain the context. Please bear in mind that I am neither a statistician nor a scientist of any kind. These are just my reflections on a portion of education policy class last night that I found confusing. If you have anything more interesting and/or more accurate to say about randomness and/or predictability, then please chime in.

We were discussing different models of the policymaking process. Traditional/rational models posit that the process proceeds from deliberation about the problem, to identification and evaluation of all possible solutions, to implementation and evaluation, all in an organized, reason- and/or evidence-governed way.

On the contrary, an alternative model, the “garbage can model,” expresses and depends on the idea that policy choices are not made in such an organized, rational way. Rather, there are various “streams” contributing to the outcome: problems themselves and how they are defined, politics, and policies. The confluence of all these factors results in a kind of “organizational anarchy,” in which multiple streams collide to influence policy outcomes, often in ways that are unintended by policy actors. It’s apparently called the “garbage can” model in order to emphasize how the various policy factors are all just sort of mixed together in a container, like garbage, and that the way they collide and converge is disorderly. There’s a concise description of the model here, if you’re interested.

In discussing the garbage can model, my prof switched between saying that the outcome was “random” and “unpredictable,” as if the two are interchangeable in this context. In fairness to him, other authors writing about the garbage can model around the internet seem to have a tendency to do the same thing. However, I think this can’t be quite right. Like I said, I’m no expert on the meanings or applications of the concept of randomness. But, to me, as a student in a policy class, what “random” conveys is that all of the possible policy outcomes are *equally as likely* to clump together in the garbage can and ultimately get selected. That’s just got to be false; some policies are surely more likely to be selected than others. For instance, ceteris paribus, a policy benefiting business leaders with lobbyists may be more likely to be selected than a policy benefiting a small and non-vocal group of grassroots activists. The policy outcomes may *seem* random, because we can’t tell why a certain blend of factors resulted in them, but that doesn’t make the process *actually* random.

Saying that the garbage can process is “unpredictable” is fine, though. The factors that contribute to the process are many, varied, difficult to observe, and difficult to measure. Also, note that this process could be both random and unpredictable, in which case it would be unpredictable *in virtue of being random*. But I don’t think that’s the case here.

The reason this matters is that if the garbage can model of education policy is true and accurately described as “random,” then the appropriate response is to become thoroughly pessimistic about the possibility for anyone or any group to influence policy outcomes. After all, that would mean that, as soon as the policy factors mix up in the garbage can, they become equally likely to be selected. So why even bother? But unpredictability is weaker – it suggests that we don’t know everything about the policy process and should be realistic about our prospects for influencing it, our ability to avoid unintended consequences, etc., without dismissing the possibility altogether.

Updating to add: None of this is to say that the garbage can model is worthless. It helpfully reminds us that policy work is very, very messy. It just isn’t *random*.

TL;DR – The dichotomy is false. There is no “excluded middle.”

I think most applied science treats randomness and unpredictability as interchangeable, but their contexts are different. If we can’t explain it, it can be modeled as a random process. If a random process can accurately describe a cause-and-effect chain, what’s the point of saying it isn’t random?

For example, the motions of an atom or molecule follow no known pattern, and are indeed random by most definitions (Google “Brownian motion” for more info). That doesn’t discount the possibility that a process or law exists, just that we don’t (or can’t) know it.

I think the real key is thinking about cause-and-effect. We observe effects and rarely observe causes. If we can’t explain why something happens, it might be best described by a random process. Random, in this context, means unpredictable. It means a properly weighted die or random number generator could put up a decent fight against the best expert predictions.

Another example: pseudo-random numbers are generated by a process that is obscure and unknowable to most people. To them, the numbers really are random. But a computer can’t do random things, so the number is pseudo-random: the computer uses a process you don’t know or understand to create a random-to-you number. Pseudo-randomness is important for reproducible research, since a “seed” can lead to the same random numbers being produced time and time again.
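A minimal sketch of that reproducibility point, using Python’s standard `random` module (a pseudo-random generator): re-seeding with the same value replays the exact same “random” sequence.

```python
import random

# Seed the pseudo-random generator, then draw some numbers.
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(5)]

# Re-seeding with the same value replays the exact same sequence.
random.seed(42)
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)  # True: same seed, same "random" numbers
```

Anyone without the seed sees an unpredictable stream; anyone with the seed and the algorithm can reproduce it exactly.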

So predictability and randomness are definitely intertwined. They’re not orthogonal. Pseudo-random numbers appear random because you can’t predict them, but anyone with the algorithm and the seed could reproduce them. For most applications, this is good enough.

“… what ‘random’ conveys is that all of the possible policy outcomes are equally as likely.” This is most certainly not the definition of randomness scientists employ. You’re referring to a “uniform distribution” like the roll of a die. But there are other distributions as well.

Suppose policies A, B, and C are in the garbage can. If the chosen policy follows a uniform distribution, then each policy is chosen 1/3 of the time, like a 3-sided die. But, policy A could be chosen 99% of the time, policy B 1% of the time, and policy C never. Policy is still a random variable, and it isn’t “predictable” in the sense that you know what policy is going to be chosen a priori each time. Of course, A is still a good guess.
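To make the A/B/C case concrete, here’s a toy simulation in Python (the 99%/1%/0% weights are just the hypothetical numbers from the example above): every draw is random, yet the outcome is anything but uniform.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

policies = ["A", "B", "C"]
weights = [0.99, 0.01, 0.0]  # A: 99%, B: 1%, C: never

# Each draw is random, but the distribution is heavily skewed toward A.
draws = random.choices(policies, weights=weights, k=10_000)
print(Counter(draws))
```

A weight of zero means C is never chosen, and A remains “a good guess” every time, even though no single draw is known in advance.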

Hope this helps. Looking forward to talking more if you wish.

OK, so I have a few thoughts here. First, I agree with you (obviously) that the conflation of “random” and “unpredictable” is problematic. It’s probably a general reflection of the fact that the nature of probabilistic statements is unclear overall, though. It’s hard to make sense of what words like “likely” and “unlikely” or even “random” mean. The SEP article on interpretations of probability has a lot on this.

The biggest caution I’d make here is that “random” certainly DOESN’T mean that all outcomes are equally likely. There are two notions at play in predicting the outcome of a stochastic process: the selection method (in this case, random choice), and the probability DISTRIBUTION. The probability distribution gives the probability that any given possibility will actually be the possibility chosen. A case where all possibilities are equally likely to be picked is a special case in which the probability distribution is uniform.

Here’s an example. Consider a genuinely random lottery that produces (at random) a number between 1 and 10. What does it mean to say that the production is RANDOM? Just that no amount of information about the prior state of the system could allow you to predict what number is going to be produced. That’s not the same as saying that you don’t know ANYTHING about what numbers will come out, though. We might know, for instance, that some numbers come out more frequently than others do–we might know that the probability distribution is skewed toward certain values. A distribution that looked like a bell curve, for instance, would mean that values around the middle of the range would be more likely to be selected–that doesn’t make the process not random, it just means that the random selection isn’t completely uniform across all possibilities.
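As a sketch of that bell-curve lottery (the weights below are made up, shaped only to peak in the middle): each draw is still unpredictable, but middle numbers come out far more often than the extremes.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is reproducible

numbers = list(range(1, 11))
# Hypothetical bell-shaped weights: the distribution peaks around 5 and 6.
weights = [1, 2, 4, 7, 9, 9, 7, 4, 2, 1]

# Random draws from a skewed (non-uniform) distribution over 1..10.
draws = random.choices(numbers, weights=weights, k=10_000)
print(Counter(draws))  # middle values dominate; extremes are rare
```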

So the upshot for your point, I guess, is that even if the garbage can model is random, that doesn’t mean that all hope is lost: we could look for ways to influence the distribution such that we could skew it toward values we want to come out. Does that make sense?

(please excuse disjointedness and possible unproductiveness of the following)

@Jeffrey-

Thanks for elaborating. I basically agree with you, and appreciate the examples. I definitely understand the value of modeling unknowable stuff as ‘random.’ And actually I never thought the two were dichotomous; even though they’re conceptually distinct it seems right that any given process could be either, both, or neither.

@Jon-

Ok, so I figured that establishing that randomness and unpredictability are different would be *much* easier than actually putting my finger on what either means, strictly. I might still be working off an elementary school kind of version of “random.”

I get that probability distributions are generally not uniform, and can provide some information about what the outcome of the process will be. So in the stupid marbles-in-a-bag example, there might be more greens in there or whatever, making it quite definitely *not* the case that the marble you pick is equally as likely to be green as red (of which there are fewer).

But is it wrong to say that, if the marble picking process is random, the likelihood of choosing any one *particular* marble should equal the likelihood of picking any other *particular* marble?
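For what it’s worth, here’s a toy version of the marble case (a hypothetical bag of 7 green and 3 red): even if each *particular* marble is equally likely, the *colors* still come out unevenly.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical bag: 7 green marbles and 3 red ones.
bag = ["green"] * 7 + ["red"] * 3

# random.choice picks each particular marble with equal probability (1/10),
# but at the color level green comes up about 70% of the time.
draws = [random.choice(bag) for _ in range(10_000)]
print(Counter(draws))
```

So a uniform distribution over particular marbles coexists with a skewed distribution over colors; which one you see depends on how you describe the outcomes.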

(trying hard to avoid spending the next few hours on the SEP probability article, as much as I would like to…)

I guess I’m just saying, “I agree.”

Random and unpredictable are certainly not the same, and things which are random are certainly not always independently and identically distributed (that’s the i.i.d. thing you probably see stats folks writing fairly frequently as a set of assumptions).

I think the point that you’re getting at though is lost in this semantic discussion over the proper use of jargon. Now let me give a thoroughly jargon-filled restatement of where I think you’re coming from.

The policymaking process is very complex, with many competing simultaneous processes that make direct modeling of all of the relationships between input, process, and outcome practically out of reach. This is true to such an extent that in many instances, using a random model gives at least as accurate a prediction as any more coherent model that attempts to understand why we observe certain outcomes. However, that’s not to say that there are not many constraints on how policy is actually made. Not only is the universe of all possible outcomes greatly limited by some of these constraints, but there are well-reasoned and sometimes well-understood grounds for assigning different prior likelihoods to various outcomes, generated by variation in inputs and by the constraints on the processes available to those inputs.

In this sense, I would think that policymaking is random but not unpredictable. That is, there are constraints which limit the number of types of states and determine the number of states within each type, but amongst these states the system is randomly distributed. As a result, we can talk about having macro-expectations for outcomes and predict the likelihood of certain scenarios.

The problem is that we’re still working out a) What types of states there are and b) How do certain conditions precisely determine and distribute these states? In thermodynamics or quantum mechanics this work is easier. In social science, it’s a bitch. So in the end we have incomplete information about the possibilities that are out there and about the relative probabilities of these events (incomplete information about the posterior probability and likelihood).

These comments have made me wonder whether the following is accurate:

1) Random processes are always unpredictable.

2) Unpredictable processes are sometimes random.

So while unpredictability doesn’t imply randomness, it’s surely a necessary condition.

@Jason-

Spot on, and thanks, only wish I had written that myself.

@Jeffrey-

Yep, that seems right to me. 1 because a process can be unpredictable *precisely in virtue of* its being random, and 2 because a process can be unpredictable for *some other reason* like that it’s just some really complicated social science-y situation.

Taleb makes the argument in the second part of /Black Swan/, chapter 3 or 4 (of that part), that any process that is unpredictable should be treated as random. He’s pretty humble as thinkers go, and is against assuming that everything is knowable. He’s actually very annoyed by the mathematical minutiae of the difference between true randomness and pseudo-randomness or “deterministic chaos”.

If a variable is unpredictable and we don’t know the generating process (it’s unpredictable precisely *because* we don’t understand the process), it’s random in practice and should be treated as such.

Just another data point to consider when asking whether the distinction is that important.

Thanks for mentioning this. I have been meaning to read /Black Swan/ for quite some time, and it is definitely worth something that Taleb thinks that.

Interesting discussion here. If I may add my “two cents,” I am reminded of an adage by Diane Ravitch, which I have translated into French, and then back into English: “Rien de bon ne peut venir de toute réforme que les enseignants n’embrassent pas”

“Nothing good can come from any reform that teachers do not embrace.”