Puzzle from Reddit (link)

04/19/2015

My friend linked me a programming puzzle on Reddit and I found it interesting. I didn't get anywhere thinking about it, but I brought it up with my friends and we came up with a neat solution. This post outlines our approach to solving the puzzle (solution below, pardon my artwork as always).

Assumes all information given is consistent!

Problem description

As a brief problem description: you are given N input statements, each giving the probability of some combination of events. For instance, in the default example above, the input/output should read:


P(!A & B) = 0.01

P(!A & !B) = 0.85

P(B) = 0.12

What is P(A & !B)?


In other words, the first statement says that the event "not A and B" happens with probability 1%. Given many probability assignments like this, the challenge is to find the probability of a target event; in the example above, that target is P(A & !B).

Renaming variables for convenience in linear algebra

The input statements are not conducive to linear arithmetic manipulation in their undoctored form. For instance, if we are given

P(A) = 0.1

P(B) = 0.4

we can't combine those equations to get P(A & B), P(A || B), P(A & !B), or really any meaningful combination of the two events; to do so we would need some quantitative measure of how related A and B are. Notably, however, if we know that A and B are mutually exclusive, then we can set up linear systems where the probability of one event or the other is simply the sum of the two individual probabilities. And that's the first trick - rephrasing the inputs as combinations of mutually exclusive events.
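To see concretely why P(A) and P(B) alone can't determine P(A & B), here's a quick Python check (my own toy example, not from the original post): two joint distributions that agree on P(A) = 0.1 and P(B) = 0.4 but disagree on P(A & B), so the marginals alone can't pin it down.

# Two joint distributions over (A, B) with the same marginals
# P(A) = 0.1 and P(B) = 0.4 but different values of P(A & B).
dist1 = {(True, True): 0.04, (True, False): 0.06,
         (False, True): 0.36, (False, False): 0.54}
dist2 = {(True, True): 0.10, (True, False): 0.00,
         (False, True): 0.30, (False, False): 0.60}

for dist in (dist1, dist2):
    p_a = sum(p for (a, b), p in dist.items() if a)
    p_b = sum(p for (a, b), p in dist.items() if b)
    print(round(p_a, 2), round(p_b, 2), dist[(True, True)])
# prints: 0.1 0.4 0.04
#         0.1 0.4 0.1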

In a universe of N boolean events, there are 2^N different, mutually exclusive outcomes. Specifically, let's label the original events X0, X1, ..., XN-1, and label the entire-universe outcomes Y0, Y1, ..., Y(2^N - 1). Let's assume that Y0 corresponds to the specific outcome of

!X0 & !X1 & ... & !XN-2 & !XN-1

and Y1 corresponds to
!X0 & !X1 & ... & !XN-2 & XN-1

and Y2 corresponds to
!X0 & !X1 & ... & XN-2 & !XN-1

etc, "incrementing" from "not Xi" to "Xi" like counting in binary.

If we rephrase the problem inputs in this grammar, using Y instead of X, we now get the benefit of knowing that our "events" are mutually exclusive; only one element of the Y0 ... Y(2^N - 1) sequence can be true. Because of this, we can add and subtract probabilities and use linear algebraic reasoning. To make this transition, define Y'i as P(Yi) for all i in [0, 2^N - 1].
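To make the Y-grammar concrete, here is a minimal Python sketch (my own illustration; the original post doesn't include code) that enumerates the 2^N atomic outcomes and maps an input statement to the set of Yi it covers.

from itertools import product

def atomic_outcomes(n):
    # Yield (index, assignment) pairs; assignment[i] is True iff Xi holds.
    # Counting in binary with X(N-1) as the least significant bit matches
    # the Y0, Y1, Y2, ... ordering described above.
    for idx, bits in enumerate(product([False, True], repeat=n)):
        yield idx, bits

def covered_indices(n, constraints):
    # constraints maps a variable index to its required truth value,
    # e.g. {0: False, 1: True} means !X0 & X1; unmentioned variables are free.
    return [idx for idx, bits in atomic_outcomes(n)
            if all(bits[var] == val for var, val in constraints.items())]

# With N = 2: Y0 = !X0 & !X1, Y1 = !X0 & X1, Y2 = X0 & !X1, Y3 = X0 & X1
print(covered_indices(2, {0: False, 1: True}))  # [1]    -> P(!X0 & X1) = Y'1
print(covered_indices(2, {1: True}))            # [1, 3] -> P(X1) = Y'1 + Y'3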

Solving the example

Bringing this back to something concrete, let's start with the example given in the dashed brown box, except let's rename A to X0 and B to X1 to be consistent with our naming convention. We then have


P(!X0 & X1) = 0.01

P(!X0 & !X1) = 0.85

P(X1) = 0.12

What is P(X0 & !X1)?


This can be rephrased in the new "Y"-grammar defined above as follows:

P(Y1) = 0.01

P(Y0) = 0.85

P(Y1 || Y3) = 0.12

What is P(Y2)?


Note that X1 actually corresponds to the union of the two events X1 & !X0 and X1 & X0, which is why X1 corresponds to Y1 || Y3. Anyway - we can again rephrase in the new "Y'"-grammar:

Y'1 = 0.01

Y'0 = 0.85

Y'1 + Y'3 = 0.12

What is Y'2?


As you may have realized, there's an additional equality which goes unsaid: the Y'i all sum to 1, since the universe is exactly partitioned into the Yi outcomes. So, our final linear system is

Y'1 = 0.01

Y'0 = 0.85

Y'1 + Y'3 = 0.12

Y'0 + Y'1 + Y'2 + Y'3 = 1

What is Y'2?


which we can easily solve with Gaussian row reduction.
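For completeness, here's what that solve looks like with numpy (my choice of tooling; the post itself doesn't show code, and np.linalg.solve does the row reduction for us):

import numpy as np

# Rows are the four equations above; columns are Y'0, Y'1, Y'2, Y'3.
A = np.array([
    [0, 1, 0, 0],   # Y'1                     = 0.01
    [1, 0, 0, 0],   # Y'0                     = 0.85
    [0, 1, 0, 1],   # Y'1 + Y'3               = 0.12
    [1, 1, 1, 1],   # Y'0 + Y'1 + Y'2 + Y'3   = 1
], dtype=float)
b = np.array([0.01, 0.85, 0.12, 1.0])

y = np.linalg.solve(A, b)
print(y)     # [0.85, 0.01, 0.03, 0.11] up to float rounding
print(y[2])  # P(X0 & !X1) = Y'2 = 0.03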

Incomplete information

If we take a step back, we solved the example problem by finding the probability of every single true/false assignment to each of the input events. If they've given us enough information, then this is a fine approach. But what if the question was just


P(!X0 & X1) = 0.01

P(!X0 & !X1) = 0.85

What is P(X0)?


In this case, we don't have enough information to solve for all of {Y'0, Y'1, Y'2, Y'3}, but we can still evaluate P(X0), which is Y'2 + Y'3. Rewritten in the "Y'"-grammar,

Y'1 = 0.01

Y'0 = 0.85

Y'0 + Y'1 + Y'2 + Y'3 = 1

What is Y'2 + Y'3?


Re-written again as an augmented matrix, M, where the columns are the coefficients of Y'0 through Y'3 and the last column holds the right-hand sides:

[ 0  1  0  0 | 0.01 ]
[ 1  0  0  0 | 0.85 ]
[ 1  1  1  1 | 1    ]

This is the second trick of this problem. How can we solve for one particular linear combination of variables without solving for all of the variables? Naive Gaussian elimination hinges on being able to solve for every variable and is therefore not a valid solution here. One solution is to find some combination of the input equations which gives you the specific linear combination you're looking for. In the example above, note that

1 * (Y'0 + Y'1 + Y'2 + Y'3) - 1 * (Y'1) - 1 * (Y'0) = Y'2 + Y'3

So the problem has changed from solving for the Y'i to solving for the coefficients by which the input equations can be combined to get the desired linear combination!

To set up the new linear system (which solves for the coefficients of the input equations, not for the Y'i) we can take the transpose of the unaugmented M matrix and augment it with the coefficients of the desired linear combination. Neat, right? We are now solving for the coefficients we can use to linearly combine the input equations to get our desired probability. Here is the resultant augmented matrix. The left side is the transpose of M and the right side is the coefficients of the Y'i representing our desired probability, which is Y'2 + Y'3.

[ 0  1  1 | 0 ]
[ 1  0  1 | 0 ]
[ 0  0  1 | 1 ]
[ 0  0  1 | 1 ]
which, using Gaussian row reduction, we can simplify to

[ 1  0  0 | -1 ]
[ 0  1  0 | -1 ]
[ 0  0  1 |  1 ]
[ 0  0  0 |  0 ]

The right-side coefficients here tell us how to linearly combine our input equations, M, to get our desired answer. In matrix form, this is

[ -1  -1  1 ] * [ 0.01  0.85  1 ]^T = -0.01 - 0.85 + 1

to give our final answer of 0.14 for this problem. And that's all there is to it!
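For the curious, here is the whole incomplete-information pipeline as a short numpy sketch. The library choice and the use of a least-squares solve are my own, not the post's; the post only needs some exact combination of the inputs to exist, in which case the residual is zero.

import numpy as np

# Input equations over Y'0, Y'1, Y'2, Y'3 and their right-hand sides.
M = np.array([
    [0, 1, 0, 0],   # Y'1                     = 0.01
    [1, 0, 0, 0],   # Y'0                     = 0.85
    [1, 1, 1, 1],   # Y'0 + Y'1 + Y'2 + Y'3   = 1
], dtype=float)
b = np.array([0.01, 0.85, 1.0])
t = np.array([0, 0, 1, 1], dtype=float)   # target combination: Y'2 + Y'3

# Solve M^T c = t for the combining coefficients c.
c, residual, rank, _ = np.linalg.lstsq(M.T, t, rcond=None)
print(np.round(c, 6))   # [-1. -1.  1.]
print(round(c @ b, 6))  # P(X0) = 0.14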
