{"success":1,"msg":"","color":"rgb(28, 35, 49)","title":"Minimax and Bayesian estimation of the unseen under tail-index regularity<\/b>","description":"webinar","title2":"","start":"2021-04-09 14:00","end":"2021-04-09 15:00","responsable":"Botond Szabo <\/i><\/a>","speaker":"Zacharie Naulet (Université Paris-Sud)","id":"34","type":"webinar","timezone":"Europe\/Amsterdam","activity":"zoom (Meeting ID: 916 2981 6385, Passcode: 049612)","abstract":"In this talk, I will discuss the famous problem of estimating the number of unseen species in a population. This problem has a long history in statistics, but it has recently received a lot of attention after the breakthrough of Orlitsky et al., who established impressive information-theoretic limits on the predictability of the unseen. For a suitable notion of loss, they proved that minimax estimation of the unseen over all population distributions from a sample of size $n$ is possible if and only if the population size is at most $o(n \\log(n))$. Their result is interesting in many ways, but perhaps disappointing for statisticians, as the estimator they provide works only for large sample sizes. This leaves open the question of what can be done if we are willing to assume some regularity of the population distribution. On the Bayesian side, the literature has extensively investigated random partition models, which provide many ways of estimating the unseen. The most famous models, such as Poisson-Kingman partitions or Pitman-Yor processes, generate populations that have a finite \"tail-index\", which entirely determines the asymptotic behaviour of the number of species. Inspired by this long line of work, we carry out a minimax analysis of the unseen problem over classes of population distributions having a finite tail-index $\\alpha \\in (0,1)$. Although our analysis is mostly frequentist, the upper bound is derived by constructing an estimator using Bayesian nonparametric (BNP) arguments, and the construction easily extends to the classical Pitman-Yor estimates. 
Importantly, our estimator can be computed efficiently, as I will illustrate with some simulations. The main challenge lies in deriving the minimax lower bound. To this end, we propose a generic machinery for obtaining minimax lower bounds in partition models, which is of independent interest as it can be used for many other quantities. Interestingly, this machinery also relies on BNP arguments. In the end, our main result is that estimating the tail-index and estimating the unseen are equivalent problems, and that, under suitable second-order assumptions on the tail-index, minimax estimation of the unseen is possible all the way up to population sizes as large as $o(\\exp(c n^{\\alpha}))$ and impossible for larger populations, in contrast with the $o(n \\log(n))$ limit under no regularity assumption. Joint work with Stefano Favaro."}