Questions regarding Homework 4

by FRANCESCO BRAND -
Number of replies: 4

Hi Ginevra,

I have a couple of questions regarding homework 4.

Regarding the first exercise, something odd happens when using pyro.param:

1) By using this notation

    y = pyro.param("y", (theta1*x)/(theta2+x), obs=obs)

inside the model, we get a correct clamping of the observations, but apparently the posterior is always very close to the prior (I tried Gaussian priors for theta1 and theta2 and, no matter where I centered them, the posterior was always perfectly aligned with the mean and variance of the prior). In fact, I suspect it is not conditioning at all and is just sampling from the prior, regardless of the observed y.
2) Even worse, by using
    y = pyro.param("y", (theta1*x)/(theta2+x))
and then, outside the model,
    conditioned_model = pyro.condition(model1, data={"y": y_obs})
returning y from the conditioned model yields values of y that are not clamped to the observed y!
    
Therefore, I have no idea how to reliably condition on a delta distribution (since the MCMC algorithm doesn't work with Deltas, as Simone pointed out during the Q&A).

I also have a question regarding the second point of exercise 2: since we have a defined relation between y, x, theta1 and theta2, it seems to me that, if one clamped y, x and theta2 to fixed values, there would be only one possible value of theta1, given by inverting our relation (namely theta1 = y*(theta2 + x)/x, granted that the inverse exists).
So for each triplet of values there is just one possible sample available for theta1/theta2. How do we avoid determinism in this case? Should we cycle through the different observations that we have in order to actually obtain different values for the thetas?

Thanks in advance for your time.

Best Regards,

Francesco Brand


In reply to FRANCESCO BRAND

Re: Questions regarding Homework 4

by GINEVRA CARBONE -

Hi Francesco,


1) According to the documentation, pyro.param is supposed to make the specified weight "trainable", meaning that you can perform inference and condition on it. So it should work, unless there is some bug in the version of Pyro we are using (none that I'm aware of).

Otherwise, you could avoid using it by just defining an additional distribution in your model and using the equation as an intermediate relationship between variables. For example, it could be the mean of a normal distribution for y:

yhat = (theta1 * x) / (theta2 + x)                        # deterministic relation between the variables
y = pyro.sample("y", dist.Normal(yhat, 1.0), obs=y_obs)  # obs clamps y to the observed data
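
Putting it together, a minimal model along these lines should condition correctly; the Gaussian priors and the unit observation noise below are placeholder choices of mine, not something prescribed by the homework:

    import pyro
    import pyro.distributions as dist

    def model1(x, y_obs=None):
        # x, y_obs: torch tensors with the data from the exercise text
        # example Gaussian priors on the two parameters (centres and scales are arbitrary here)
        theta1 = pyro.sample("theta1", dist.Normal(1.0, 1.0))
        theta2 = pyro.sample("theta2", dist.Normal(1.0, 1.0))
        # the given relation enters only through the mean of the likelihood
        yhat = (theta1 * x) / (theta2 + x)
        # obs=y_obs clamps y to the observed values during inference
        with pyro.plate("data", x.shape[0]):
            pyro.sample("y", dist.Normal(yhat, 1.0), obs=y_obs)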


2) Since you inferred the distributions of theta1 and theta2 in the previous exercise, you can use them to set the bivariate normal distribution for theta: just substitute the appropriate mean vector and correlation parameter rho, which you can compute from the posterior chains. Then, at each iteration, you sample from the conditional distributions, which you can derive from the bivariate normal.
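
Concretely, both conditionals of a bivariate normal are themselves normal, so a single Gibbs sweep only needs the two means, the two standard deviations and rho (all estimated from the chains of exercise 1). A rough NumPy sketch, with function and variable names of my own choosing:

    import numpy as np

    def gibbs_bivariate_normal(mu, sigma, rho, n_iter=5000, seed=0):
        # Gibbs sampler for a bivariate normal with mean vector mu,
        # standard deviations sigma and correlation rho
        rng = np.random.default_rng(seed)
        samples = np.zeros((n_iter, 2))
        t1, t2 = mu[0], mu[1]                           # start from the mean
        for i in range(n_iter):
            # theta1 | theta2 ~ N(mu1 + rho*s1/s2*(t2 - mu2), s1^2*(1 - rho^2))
            m1 = mu[0] + rho * sigma[0] / sigma[1] * (t2 - mu[1])
            t1 = rng.normal(m1, sigma[0] * np.sqrt(1 - rho**2))
            # theta2 | theta1 ~ N(mu2 + rho*s2/s1*(t1 - mu1), s2^2*(1 - rho^2))
            m2 = mu[1] + rho * sigma[1] / sigma[0] * (t1 - mu[0])
            t2 = rng.normal(m2, sigma[1] * np.sqrt(1 - rho**2))
            samples[i] = (t1, t2)
        return samples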


Hope I helped.

Best

Ginevra

In reply to GINEVRA CARBONE

Re: Questions regarding Homework 4

by MICHELE RISPOLI -
I'll avoid opening another thread since my doubts are on the same subject.
I'm sorry, but even after reading your answer I haven't understood what to do in point 2 of exercise 2.

Here's what I understood and have done so far:
In exercise 1 we're asked to implement an HMC simulation to sample from the posterior distributions p(theta1|x,y) and p(theta2|x,y), where x and y are the vectors of samples given in the exercise text.
This involves giving an explicit definition of the model (i.e. choosing the hyperparameters and at least the analytical forms for X and the priors for the thetas), running the Pyro implementation of HMC on our model, and adjusting the parameters of the priors and of the HMC simulation in order to obtain the best approximation according to the stationarity diagnostics, i.e. R-hat and n_eff (a rough sketch of this pipeline is below).
The result of these operations is thus a set of samples from said posteriors, which we can use to make their distplot.
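
Roughly, the pipeline I have in mind looks like this (just a sketch, assuming a model function model1(x, y_obs) with the priors already chosen):

    from pyro.infer import MCMC, HMC

    hmc_kernel = HMC(model1)
    mcmc = MCMC(hmc_kernel, num_samples=1000, warmup_steps=500)
    mcmc.run(x, y_obs)                # x, y_obs: the data given in the exercise text
    mcmc.summary()                    # reports n_eff and r_hat for theta1 and theta2
    posterior = mcmc.get_samples()    # dict with the chains for "theta1" and "theta2"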

Now, in exercise 2 we implement a Gibbs sampler which can produce samples from a bivariate normal with arbitrary mean and correlation parameter rho.
In order to run it we have to specify the mean vector and correlation parameter of the Gaussian we wish to sample from.
Now, we may run it on a bivariate Gaussian whose mean vector is the mean of the samples obtained in exercise 1, and whose rho is computed as the correlation coefficient between said samples (numpy.corrcoef?), but that would simply produce another set of samples (see the small sketch after this paragraph).
My understanding is that the mean of the latter set of samples would be the estimate for the thetas, but, assuming that our implementation is fine, wouldn't that coincide with (or be very close to) the input means?
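
By "the mean of the samples obtained in exercise 1" and the correlation between them I mean something like this, where chain1 and chain2 are hypothetical names for the posterior chains of theta1 and theta2:

    import numpy as np

    def moments_from_chains(chain1, chain2):
        # mean vector, standard deviations and correlation estimated from the HMC chains
        mu = np.array([chain1.mean(), chain2.mean()])
        sigma = np.array([chain1.std(), chain2.std()])
        rho = np.corrcoef(chain1, chain2)[0, 1]    # off-diagonal entry of the 2x2 correlation matrix
        return mu, sigma, rho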

I hope my explanation is clear, and I thank you in advance for your time.
In reply to MICHELE RISPOLI

Re: Questions regarding Homework 4

by GINEVRA CARBONE -

Hi Michele,


Your interpretation of both points is correct. My idea was to give you two options:

a) use the prior described in ex 2.1

b) use the posterior estimate from ex. 1 as a prior for ex. 2.2

In the first case, you would ideally get a similar result from both exercises.

In the second case you would get a "refinement" of your first estimate, and ex 2.2 would also serve as a double check for both exercises. In fact, if the convergence analysis and the implementation of the Gibbs sampler are correct, the posterior estimate of ex. 2.2 should not change much with respect to the prior, as you pointed out.


Best

Ginevra