×

We use cookies to help make LingQ better. By visiting the site, you agree to our cookie policy.


image

Bayesian Statistics: Techniques and Models, 1.10 (V) Non-conjugate Models

1.10 (V) Non-conjugate Models

[MUSIC] Let's talk about a couple examples of models that don't have nice, clean posterior distributions. We'll first look at an example of a one parameter model that is not conjugate. Suppose we have values that represent the percentage change in total personnel from last year to this year for, we'll say, ten companies. These companies come from a particular industry. We're going to assume for now, that these are independent measurements from a normal distribution with a known variance equal to one, but an unknown mean. So we'll say the percentage change in the total personnel for company I, given the unknown mean mu. Will be distributed normally with mean mu, and we're just going to use variance 1. In this case, the unknown mean could represent growth for this particular industry. It's the average of the growth of all the different companies. The small variance between the companies and percentage growth might be appropriate if the industry is stable. We know that the conjugate prior for mu in this location would be a normal distribution. But suppose we decide that our prior believes about mu are better reflected using a standard t distribution with one degree of freedom. So we could write that as the prior for mu is a t distribution with a location parameter 0. That's where the center of the distribution is. A scale parameter of 1 to make it the standard t distribution similar to a standard normal, and 1 degree of freedom. This particular prior distribution has heavier tails than the conjugate and normal distribution, which can more easily accommodate the possibility of extreme values for mu. It is centered on zero so, that a priori, there is a 50% chance that the growth is positive and a 50% chance that the growth is negative. Recall that the posterior distribution of mu is proportional to the likelihood times the prior. Let's write the expression for that in this model. That is the posterior distribution for mu given the data y1 through yn is going to be proportional to the likelihood which is this piece right here. It is a product from i equals 1 to n, in this case that's 10. Densities from a normal distribution. Let's write the density from this particular normal distribution. Is 1 over the square root of 2 pi. E to the negative one-half. Yi minus the mean squared, this is the normal density for each individual Yi and we multiplied it for likelihood. The density for this t prior looks like this. It's 1 over pi times 1 plus Mu squared. This is the likelihood times the prior. If we do a little algebra here, first of all, we're doing this up to proportionality. So, constants being multiplied by this expression are not important. So, the square root of 2 pi being multiplied n times, just creates the constant number, and this pi out here creates a constant number. We're going to drop them in our next step. So this is now proportional too, we're removing this piece and now we're going to use properties of exponents. The product of exponents is the sum of the exponentiated pieces. So we have the exponent of negative one-half times the sum from i equals 1 to n, of Yi minus mu squared. And then we're dropping the pie over here, so times 1 plus mu squared. We're going to do a few more steps of algebra here to get a nicer expression for this piece. But we're going to skip ahead to that. We've now added these last two expressions. To arrive at this expression here for the posterior, or what's proportional to the posterior distribution. This expression right here is almost proportional to a normal distribution except we have this 1 plus mu squared term in the denominator. We know the posterior distribution up to a constant but we don't recognize its form as a standard distribution. That we can integrate or simulate from, so we'll have to do something else. Let's move on to our second example. For a two parameter example, we're going to return to the case where we have a normal likelihood. And we're now going to estimate mu and sigma squared, because they're both unknown. Recall that if sigma squared were known, the conjugate prior from mu would be a normal distribution. And if mu were known, the conjugate prior we could choose for sigma squared would be an inverse gamma. We saw earlier that if you include sigma squared in the prior for mu, and use the hierarchical model that we presented earlier, that model would be conjugate and have a closed form solution. However, in the more general case that we have right here, the posterior distribution does not appear as a distribution that we can simulate or integrate. Challenging posterior distributions like these ones and most others that we'll encounter in this course kept Bayesian in methods from entering the main stream of statistics for many years. Since only the simplest problems were tractable. However, computational methods invented in the 1950's, and implemented by statisticians decades later, revolutionized the field. We do have the ability to simulate from the posterior distributions in this lesson as well as for many other more complicated models. How we do that is the subject of next week's lesson. [MUSIC]


1.10 (V) Non-conjugate Models

[MUSIC] Let's talk about a couple examples of models that don't have nice, clean posterior distributions. We'll first look at an example of a one parameter model that is not conjugate. Suppose we have values that represent the percentage change in total personnel from last year to this year for, we'll say, ten companies. These companies come from a particular industry. We're going to assume for now, that these are independent measurements from a normal distribution with a known variance equal to one, but an unknown mean. So we'll say the percentage change in the total personnel for company I, given the unknown mean mu. Will be distributed normally with mean mu, and we're just going to use variance 1. In this case, the unknown mean could represent growth for this particular industry. It's the average of the growth of all the different companies. The small variance between the companies and percentage growth might be appropriate if the industry is stable. We know that the conjugate prior for mu in this location would be a normal distribution. But suppose we decide that our prior believes about mu are better reflected using a standard t distribution with one degree of freedom. So we could write that as the prior for mu is a t distribution with a location parameter 0. That's where the center of the distribution is. A scale parameter of 1 to make it the standard t distribution similar to a standard normal, and 1 degree of freedom. This particular prior distribution has heavier tails than the conjugate and normal distribution, which can more easily accommodate the possibility of extreme values for mu. It is centered on zero so, that a priori, there is a 50% chance that the growth is positive and a 50% chance that the growth is negative. Recall that the posterior distribution of mu is proportional to the likelihood times the prior. Let's write the expression for that in this model. That is the posterior distribution for mu given the data y1 through yn is going to be proportional to the likelihood which is this piece right here. It is a product from i equals 1 to n, in this case that's 10. Densities from a normal distribution. Let's write the density from this particular normal distribution. Is 1 over the square root of 2 pi. E to the negative one-half. Yi minus the mean squared, this is the normal density for each individual Yi and we multiplied it for likelihood. The density for this t prior looks like this. It's 1 over pi times 1 plus Mu squared. This is the likelihood times the prior. If we do a little algebra here, first of all, we're doing this up to proportionality. So, constants being multiplied by this expression are not important. So, the square root of 2 pi being multiplied n times, just creates the constant number, and this pi out here creates a constant number. We're going to drop them in our next step. So this is now proportional too, we're removing this piece and now we're going to use properties of exponents. The product of exponents is the sum of the exponentiated pieces. So we have the exponent of negative one-half times the sum from i equals 1 to n, of Yi minus mu squared. And then we're dropping the pie over here, so times 1 plus mu squared. We're going to do a few more steps of algebra here to get a nicer expression for this piece. But we're going to skip ahead to that. We've now added these last two expressions. To arrive at this expression here for the posterior, or what's proportional to the posterior distribution. This expression right here is almost proportional to a normal distribution except we have this 1 plus mu squared term in the denominator. We know the posterior distribution up to a constant but we don't recognize its form as a standard distribution. That we can integrate or simulate from, so we'll have to do something else. Let's move on to our second example. For a two parameter example, we're going to return to the case where we have a normal likelihood. And we're now going to estimate mu and sigma squared, because they're both unknown. Recall that if sigma squared were known, the conjugate prior from mu would be a normal distribution. And if mu were known, the conjugate prior we could choose for sigma squared would be an inverse gamma. We saw earlier that if you include sigma squared in the prior for mu, and use the hierarchical model that we presented earlier, that model would be conjugate and have a closed form solution. However, in the more general case that we have right here, the posterior distribution does not appear as a distribution that we can simulate or integrate. Challenging posterior distributions like these ones and most others that we'll encounter in this course kept Bayesian in methods from entering the main stream of statistics for many years. Since only the simplest problems were tractable. However, computational methods invented in the 1950's, and implemented by statisticians decades later, revolutionized the field. We do have the ability to simulate from the posterior distributions in this lesson as well as for many other more complicated models. How we do that is the subject of next week's lesson. [MUSIC]