Homework 2 - Exercise 2 (ecdf implementation)


by MICHELE RISPOLI
Number of replies: 3
I report the text of ex.2 of homework 2 for practicality:
---
Implement the empirical cumulative distribution function F_X(x) = cdf(dist, x), taking as inputs a pyro.distributions object dist, corresponding to the distribution of X, and an integer value x.

Suppose that X ~ N(0,1) and plot F_X(x).

(sorry for the messy format)
---
Now here are my doubts:
1. Shouldn't the ecdf be derived from a sample of a distribution? Why would we feed the distribution object to the function instead of a vector of samples from said `dist` object?
2. If the `x` input parameter was intended to be the input of the F_X(x) function, why would it be an integer? Shouldn't it be possible to choose it anywhere in the distribution's support, which is more likely to be the real numbers? This is especially true for the case asked for in the second point!

Thanks in advance,
Michele
In reply to MICHELE RISPOLI

Re: Homework 2 - Exercise 2 (ecdf implementation)

by GINEVRA CARBONE

Hi Michele,

The request for an integer input was my mistake; it was intended to be any real number. I updated the notebook soon after noticing the error (https://github.com/ginevracoal/statistical-machine-learning/blob/master/homeworks/homework_02.ipynb).

The ecdf should compute the estimates from a vector of samples from the input distribution and, additionally, it should be able to return the estimated value corresponding to a specific input x. I hope the request is clearer now.
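
A minimal sketch of the two requirements above, assuming a plain-Python stand-in rather than the notebook's actual solution: the hypothetical helper `ecdf` takes a vector of samples and a real `x`, and returns the fraction of samples less than or equal to `x`. The samples here come from the standard library's `random` module purely to keep the example self-contained; with Pyro they would instead be drawn from the `dist` object.

```python
import bisect
import random


def ecdf(samples, x):
    """Estimate F_X(x) as the fraction of samples that are <= x."""
    ordered = sorted(samples)
    # bisect_right returns how many of the sorted values are <= x.
    return bisect.bisect_right(ordered, x) / len(ordered)


# Stand-in for sampling X ~ N(0, 1); seed and sample size are arbitrary choices.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

print(ecdf(samples, 0.0))        # roughly 0.5 for a standard normal
print(ecdf([1, 2, 3, 4], 2.5))   # 0.5
```

Evaluating `ecdf` on a grid of real `x` values and plotting the results gives the step-shaped estimate of F_X(x) that the exercise asks for.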

Ginevra

In reply to GINEVRA CARBONE

Re: Homework 2 - Exercise 2 (ecdf implementation)

by MICHELE RISPOLI
Hello Ginevra,
thanks for the answer, it did clear my doubts.

I reckon I should have asked the question earlier, since by the time I wrote the post I had already submitted my solutions.
Because of these doubts, in my submitted solution I chose to explain and implement my best guess at what the exercise asked: ecdf takes a vector of samples from a pyro.distributions object and an arbitrary x value. I believed the choice of the sampling parameters (i.e. size and random seed) was better left to the function caller, and that a non-deterministic function was not desirable.

If I understood your answer correctly, though, we were supposed to draw the sample inside the function.
I understand this was a toy problem, so I could probably have avoided worrying about the sampling parameters mentioned above, but since I raised it: what is the common/best approach to random seeding inside functions in a realistic scenario?


In reply to MICHELE RISPOLI

Re: Homework 2 - Exercise 2 (ecdf implementation)

by GINEVRA CARBONE
Yes, this was just a simple example, and in general there is no need to worry about the style of your implementation; I'm only interested in the conceptual part of it.
I don't think there is a "best approach" for seeding. Any choice is correct as long as the results of that function call are reproducible.
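
One way to get that reproducibility, sketched here with a plain-Python stand-in rather than Pyro itself: the function draws its sample internally but accepts an optional seed, so a caller who fixes the seed gets the same estimate on every call. The names `ecdf_from_sampler`, `n_samples`, and `seed` are hypothetical and not taken from the homework.

```python
import random


def ecdf_from_sampler(x, n_samples=1000, seed=None):
    """Draw N(0, 1) samples internally and return the ECDF estimate at x.

    A fixed seed makes the call deterministic; seed=None leaves it random.
    """
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(s <= x for s in samples) / n_samples


# Same seed, same estimate on every call.
first = ecdf_from_sampler(0.5, seed=42)
second = ecdf_from_sampler(0.5, seed=42)
assert first == second
```

With Pyro, the comparable step would be calling `pyro.set_rng_seed` before sampling, which seeds the global generators rather than a local one.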