Disclaimer: I initially posted this question on math.stackexchange.com, but a user there suggested I should post it here instead.
I've seen mortality rates for the coronavirus reported for different countries. For example, Worldometers gives the worldwide mortality rate as $3.4\%$. Under the estimate it says 'Globally, about $3.4\%$ of reported COVID-19 cases have died[...]'. However, this seems to me to be a fundamentally flawed estimate, and I can think of several problems with it.
First of all, people don't just die instantly with some probability when they get infected. I can't find a source for this number right now, but I've read it's usually about 14 days from when you get infected to when you die or recover. Anyway, the exact number isn't too important, just that it is definitely not 0 days. So then, you would have to modify the formula to something more like $\text{mortality rate} = \frac{\text{deaths}}{\text{cases 14 days ago}}$, right?
However, this is still not good enough. We can't just plug in the number of reported cases, as that grossly underestimates the real number of cases. Most countries around the world have literally just stopped testing people who are not 'at risk'. If they are just using the number of deaths due to the virus and the reported cases, then they will get a number that is way too big. Now, let's say that they think about this, and instead only include the deaths of people who had been tested for the virus to account for this. Now they have a biased sample, because, heuristically, only the people who are the most sick will be tested, while the people who are young and healthy and show no symptoms yet are not tested. So both of these methods will give a wrong mortality rate.
I have also seen a formula where you calculate the mortality rate like this: $\text{mortality rate} = \frac{\text{deaths}}{\text{deaths + recovered}}$. This formula makes more sense to me than the others, but when applied it seemingly gives a much higher mortality rate than these other methods, whilst I would have thought that the other methods were flawed in such a way that they would give too high a mortality rate, not too low.
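To make the comparison between these three formulas concrete, here is a minimal Python sketch of a toy epidemic. Every number in it is an assumption I made up for illustration (a true fatality of $1\%$, a fixed 14-day delay to death or recovery, a 5-day doubling time), and it only models the time-lag problem, not the under-testing or sampling-bias problems.

```python
# Toy model only: every constant below is an assumption, not real data.
TRUE_FATALITY = 0.01  # assumed true fraction of infected people who die
DELAY_DAYS = 14       # assumed fixed delay from infection to death or recovery
DOUBLING_DAYS = 5     # assumed doubling time of cumulative cases
DAYS = 60             # length of the simulated epidemic

# Cumulative cases per day, growing exponentially from 100.
cum_cases = [100 * 2 ** (day / DOUBLING_DAYS) for day in range(DAYS)]

# Every case resolves exactly DELAY_DAYS after infection.
cum_resolved = [0.0] * DELAY_DAYS + cum_cases[: DAYS - DELAY_DAYS]
cum_deaths = [TRUE_FATALITY * r for r in cum_resolved]
cum_recovered = [(1 - TRUE_FATALITY) * r for r in cum_resolved]


def naive_rate(deaths, cases):
    """deaths / cases, both taken today."""
    return deaths[-1] / cases[-1]


def lagged_rate(deaths, cases, lag=DELAY_DAYS):
    """deaths today / cases `lag` days ago."""
    return deaths[-1] / cases[-1 - lag]


def resolved_rate(deaths, recovered):
    """deaths / (deaths + recovered), both taken today."""
    return deaths[-1] / (deaths[-1] + recovered[-1])


naive = naive_rate(cum_deaths, cum_cases)
lagged = lagged_rate(cum_deaths, cum_cases)
resolved = resolved_rate(cum_deaths, cum_recovered)
print(f"naive:    {naive:.4f}")
print(f"lagged:   {lagged:.4f}")
print(f"resolved: {resolved:.4f}")
```

In this toy model the naive ratio comes out at roughly $0.14\%$, while the lagged and deaths/(deaths + recovered) formulas both return the assumed $1\%$. That agreement depends on the assumption that deaths and recoveries take equally long and that all cases are counted; neither needs to hold for real data.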
Next, I thought that maybe they have just calculated the mortality rate for reported cases, and rolled with that, being careful not to apply that anywhere it would not be valid. However, I found the following method to estimate the number of real cases today in this article by Tomas Pueyo.
If we assume for now a death rate of $1\%$, then we can use that to estimate the real number of cases based on the number of deaths. So, if we have $D$ deaths today and the mortality rate is $1\%$, then we can assume about $100D$ people had the virus $14$ days ago. Then, using a made-up doubling period of about $5$ days (there is real data on this, of course), the number of cases will double about three times in those $14$ days, so the real number of cases today is about $100D \times 2^3 = 800D$.
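The back-calculation described above can be sketched in a few lines of Python. This is my own minimal sketch, not code from the article; the function name `estimate_true_cases` and its parameters are hypothetical, and the default values are just the assumed numbers from this question.

```python
def estimate_true_cases(deaths_today, fatality_rate=0.01,
                        days_to_death=14, doubling_days=5,
                        round_doublings=False):
    """Estimate today's true case count from today's deaths.

    All defaults are assumed values for illustration, not measured ones.
    """
    # People who must have been infected `days_to_death` days ago
    # to produce this many deaths at the assumed fatality rate.
    infected_then = deaths_today / fatality_rate
    # How many times the case count has doubled since then.
    doublings = days_to_death / doubling_days
    if round_doublings:
        doublings = round(doublings)
    return infected_then * 2 ** doublings


print(estimate_true_cases(1))                        # 2.8 doublings
print(estimate_true_cases(1, round_doublings=True))  # rounded to 3 doublings
```

For a single death, rounding to three doublings reproduces the figure of about 800 cases, while keeping the unrounded $14/5 = 2.8$ doublings gives roughly 700. Either way, the result scales directly with the assumed fatality rate, so a biased rate feeds straight into the estimate.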
This method makes sense to me, but only if we use the real mortality rate, not one that is based on a biased sample. Aside from this one quirk, I haven't found anything else I think is wrong in that article, and I've seen nothing but praise for Tomas Pueyo's analysis. Notably, Khan Academy made a video that uses this method.
The only reliable method I can think of to calculate the real mortality rate right now would be to conduct an experiment in which you infect a group of volunteers, and then you calculate the mortality rate from only that group. That way, you would get an unbiased sample. However, this experiment would obviously be incredibly unethical and I doubt it has been done.
Does anyone know how to estimate the real mortality rate during the epidemic?