About Us

The continuously growing capacity to acquire and store data sets calls for new approaches to process data efficiently and extract relevant information. In fact, large data sets are most interesting when they are actually 'strange' and allow us to learn about the complex mechanisms generating them. In these contexts, it may be difficult or even counterproductive to employ parametric statistical models for the learning process.

The Center for the Discovery of Structures in Complex Data is funded by a grant awarded in 2018 by Iniciativa Científica Milenio from the Chilean Ministry of Economy to a group of Statisticians. The center is based at Pontificia Universidad Católica de Chile. The Center focuses on new statistical approaches for the efficient identification, reconstruction and classification of relevant structural information in complex data sets.

Associate Researchers

Bevilacqua, Moreno

Full Professor. Department of Statistics, Universidad de Valparaiso.

Jara, Alejandro (Director)

Associate Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

Porcu, Emilio

Adjunct Professor. Department of Mathematics, Universidad de Atacama.

Quintana, Fernando (Deputy Director)

Full Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

Sing-Long, Carlos

Assistant Professor. Institute of Mathematical and Computational Engineering, School of Mathematics and Engineering, Pontificia Universidad Católica de Chile.

Young Researchers

Beaudry, Isabelle

Assistant Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

García-Zattera, María José

Assistant Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

Guzman, Cristobal

Ph.D. in Mathematics. Institute for Mathematical and Computational Engineering, School of Mathematics and Engineering, Pontificia Universidad Católica de Chile.

Senior Researchers

MacEachern, Steve

Full Professor. Department of Statistics, The Ohio State University.

Müller, Peter

Full Professor. Department of Mathematics, The University of Texas at Austin.

Prünster, Igor

Full Professor. Institute of Data Science and Analytics, Bocconi University.

Research Lines

For the period 2018-2021, the Center for the Discovery of Structures in Complex Data will focus on the following aspects of statistical learning in the context of complex data:

(I) The development, study of properties, and implementation of scalable Bayesian nonparametric approaches for collections of probability measures indexed by predictors, including settings where both responses and predictors are defined on non-standard spaces,

(II) The development, study of properties, and the implementation of nonparametric approaches for misclassified doubly-interval-censored time-to-event data, and

(III) The development, study of properties, and the implementation of nonparametric approaches for space and time data.


Networking - Visitors

Past Visitors

  • Amy Herring, Professor, Department of Statistical Sciences, Duke University, January, 14 - 20th, 2019
  • David B. Dahl, Professor, Department of Statistics, Brigham Young University, January, 8 - 17th, 2019
  • Tamara Fernández, Research Associate, Gatsby Computational Neurosci Unit, University College London, December 17th, 2018 - January 14th, 2019
  • Carlos Díaz-Avalos. Professor, Departamento de Probabilidad y Estadística, IIMAS, UNAM, 12 - 26th, 2018
  • Nishant Mehta. Assistant Professor, Department of Computer Science, University of Victoria, November, 12 - 26th, 2018
  • Alejandro Murua. Professor, Department of Statistics, University of Montreal. September, 7 - 16th, 2018
  • Garritt Page. Associate Professor, Department of Statistics, Brigham Young University. August, 7 - 14th, 2018
  • Evan Ray. Assistant Professor, Department of Statistics, Mount Holyoke College. August, 12 - 17th, 2018

Future Research Seminars

Past Research Seminars


Summarizing distributions of latent structure

In a typical Bayesian analysis, considerable effort is placed on "fitting the model" (e.g., obtaining samples from the posterior distribution), but this is only half of the inference problem. Meaningful inference usually requires summarizing the posterior distribution of the parameters of interest. Posterior summaries can be especially important in communicating the results and conclusions from a Bayesian analysis to a diverse audience. If the parameters of interest live in R^n, common posterior summaries are means, medians, and modes. Summarizing posterior distributions of parameters with complicated structure is a more difficult problem. For example, the "average" network in the posterior distribution on a network is not easily defined. This paper reviews methods for summarizing distributions of latent structure and then proposes a novel search algorithm for posterior summaries. We apply our method to distributions on variable selection indicators, partitions, feature allocations, and networks. We illustrate our approach in a variety of models for both simulated and real datasets.
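To make the difficulty of "averaging" a structured parameter concrete, the sketch below summarizes posterior draws of a partition. It is not the search algorithm proposed in the paper; it uses a simpler, well-known alternative: build the pairwise co-clustering (similarity) matrix from the draws, then pick the sampled partition that is closest to it in squared error. The function names and the label-list representation of a partition are assumptions made for this illustration.

```python
def similarity_matrix(partitions):
    """Fraction of sampled partitions in which items i and j share a cluster.

    Each partition is a list of cluster labels, one label per item.
    """
    n = len(partitions[0])
    weight = 1.0 / len(partitions)
    sim = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += weight
    return sim


def least_squares_partition(partitions):
    """Return the sampled partition closest (in squared error) to the
    pairwise similarity matrix. This is an illustrative point summary only."""
    sim = similarity_matrix(partitions)
    n = len(sim)

    def loss(labels):
        # Compare the 0/1 co-clustering indicator of this partition
        # with the averaged co-clustering frequencies.
        return sum(
            ((labels[i] == labels[j]) - sim[i][j]) ** 2
            for i in range(n)
            for j in range(n)
        )

    return min(partitions, key=loss)


# Hypothetical posterior draws for four items (labels are arbitrary).
draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
summary = least_squares_partition(draws)
```

Restricting the search to partitions that were actually sampled keeps the sketch short; a dedicated search over the full partition space, as the abstract describes, can find better summaries.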


RKHS testing for censored data

We introduce kernel-based tests for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life testing. Our approach is based on computing distances between probability distribution embeddings in a reproducing kernel Hilbert space (RKHS). This approach has previously been applied in many machine learning and statistical settings with very good results. The main advantage of these methods is the ability of kernels to deal with complex, high-dimensional data. In this talk we revert to the real-line problem, in which the complexity of the data is due to censored observations. In particular, we propose an extension of this set of tools to censored data, derive its asymptotic results, and explain its relation to dominant approaches in survival analysis such as the log-rank test. We conclude with an empirical evaluation of our methods, in which we outperform competing approaches in multiple scenarios.
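The distance between RKHS embeddings of two distributions is commonly called the maximum mean discrepancy (MMD). As a rough sketch of the uncensored building block only, the code below computes the biased (V-statistic) estimate of squared MMD for two fully observed samples on the real line. The Gaussian kernel, the bandwidth value, and the function names are assumptions for illustration; the extension to censored data described in the talk is not reproduced here.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples.

    Equals the squared RKHS distance between the two empirical
    kernel mean embeddings.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]

# Samples from the same distribution should give a much smaller value
# than samples from clearly different distributions.
mmd_same = mmd_squared(same_a, same_b)
mmd_diff = mmd_squared(same_a, shifted)
```

In practice a test would compare the statistic against a null distribution, for example obtained by permuting the pooled sample.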

A sequential approach to updating posterior information

In this talk we show the performance of a sequential Monte Carlo (SMC) algorithm. As a prerequisite to understanding it, we discuss the Metropolis-Hastings algorithm and illustrate the general idea of particle-based methods. The SMC algorithm presented here is a particular case of the sequential methods, where the objective is to update the posterior distribution in "static" models.
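For readers unfamiliar with the prerequisite, here is a minimal random-walk Metropolis-Hastings sketch. The standard normal target, the step size, and all function names are assumptions chosen for illustration; the SMC algorithm from the talk itself is not shown.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis_hastings(log_density, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)). The proposal
        # is symmetric, so the Hastings correction term cancels.
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # On rejection the chain stays at the current value, which is
        # recorded again.
        samples.append(x)
    return samples


samples = metropolis_hastings(log_target, n_samples=20000, seed=1)
sample_mean = sum(samples) / len(samples)
```

Only the log-density up to a constant is needed, which is why the same routine can target a posterior whose normalizing constant is unknown.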

Spatial point processes as an analysis tool in ecology

Spatial point processes have gained popularity in recent years because of their usefulness in answering a variety of questions across scientific fields. In community ecology, point processes have proved useful for detecting intra- and interspecific interactions in forest ecosystems and for assessing the risk of, and the factors associated with, ecological disturbances such as forest fires. Although parameter estimation for models in applications of spatial point processes can be complicated, computational advances have made acceptable numerical approximations possible, which has contributed to their use in many fields of human knowledge. This talk gives an overview of the theoretical foundations of spatial point processes, illustrated with an example of their application to the construction of forest fire risk maps.

Fast Rates for Unbounded Losses: from ERM to Generalized Bayes

I will present new excess risk bounds for randomized and deterministic estimators, discarding boundedness assumptions to handle general unbounded loss functions like log loss and squared loss under heavy tails. These bounds have a PAC-Bayesian flavor in both derivation and form, and their expression in terms of the information complexity forms a natural connection to generalized Bayesian estimators. The bounds hold with high probability and a fast $\tilde{O}(1/n)$ rate in parametric settings, under the recently introduced 'central' condition (or various weakenings of this condition with consequently weaker results) and a type of 'empirical witness of badness' condition. The former conditions are related to the Tsybakov margin condition in classification and the Bernstein condition for bounded losses, and they help control the lower tail of the excess loss. The 'witness' condition is new and suitably controls the upper tail of the excess loss. These conditions and our techniques revolve tightly around a pivotal concept, the generalized reversed information projection, which generalizes the reversed information projection of Li and Barron. Along the way, we connect excess risk (a KL divergence in our language) to a generalized Rényi divergence, generalizing previous results connecting Hellinger distance to KL divergence. This is joint work with Peter Grünwald.

Discovering Interactions Using Covariate Informed Random Partition Models

Combination chemotherapy treatment regimens created for patients diagnosed with childhood acute lymphoblastic leukemia have had great success in improving cure rates. Unfortunately, patients prescribed these types of treatment regimens have displayed susceptibility to the onset of osteonecrosis. Some have suggested that this is due to pharmacokinetic interaction between two agents in the treatment regimen (asparaginase and dexamethasone) and other physiological variables. Determining which physiological variables to consider when searching for interactions in scenarios like these, minus a priori guidance, has proved to be a challenging problem, particularly if interactions influence the response distribution in ways beyond shifts in expectation or dispersion only. In this paper we propose an exploratory technique that is able to discover associations between covariates and responses in a very general way. The procedure connects covariates to responses very flexibly through dependent random partition prior distributions, and then employs machine learning techniques to highlight potential associations found in each cluster. We apply the method to data produced from a study dedicated to learning which physiological predictors influence severity of osteonecrosis multiplicatively.

Cox regression with Potts-driven latent clusters model

We consider a Bayesian nonparametric survival regression model with latent partitions. Our goal is to predict survival, and to cluster survival patients within the context of building prognosis systems. We propose the Potts clustering model as a prior on the covariate space so as to drive cluster formation on individuals and/or Tumor-Node-Metastasis stage system patient blocks. For any given partition, our model assumes an interval-wise Weibull distribution for the baseline hazard rate. The number of intervals is unknown. It is estimated with a lasso-type penalty given by a sequential double exponential prior. Estimation and inference are done with the aid of MCMC. To simplify the computations, we use Laplace's approximation to estimate some constants and to propose parameter updates within MCMC. We illustrate the methodology with an application to cancer survival.

A Bayesian Nonparametric Multiple Testing Procedure for Comparing Several Treatments Against a Control

We propose a Bayesian nonparametric strategy to test for differences between a control group and several treatment regimes. Most of the existing tests for this type of comparison are based on the differences between location parameters. In contrast, our approach identifies differences across the entire distribution, avoids strong modeling assumptions over the distributions for each treatment, and accounts for multiple testing through the prior distribution on the space of hypotheses. The proposal is compared to other commonly used hypothesis testing procedures under simulated scenarios. A real application is also analyzed with the proposed methodology.

Temporal and Spatio-Temporal Random Partition Models

Data that are spatially referenced often represent an instantaneous point in time at which the spatial process is measured. Because of this, it is becoming more common to monitor spatial processes over time. We propose capturing the temporal evolution of dependent structures by jointly modeling a sequence of partitions indexed by time. We derive a few characteristics from the joint model and show how it impacts dependence at the observation level. Computation strategies are detailed, and we apply the method to Chilean standardized testing scores.


MiDaS workshops aim to highlight recent advances in modeling and computation through the lens of applied, domain-driven problems that require flexible statistical models. The workshops bring together leading experts and talented young researchers working on applications and theory of flexible parametric and nonparametric (Bayesian) statistics. The workshops focus on new statistical approaches for the efficient identification, reconstruction and classification of relevant structural information in complex data sets. The MiDaS 2019 workshop will be held at the hotel Enjoy of Viña del Mar, Viña del Mar, Chile, March 25th to 29th, 2019. For more details please click here.


MiDaS Outreach Videos

MiDaS - Outreach Video 1 (Spanish)
MiDaS - Highlights 2018 Big DATA Olympiads (Spanish)

Other Videos about Statistics

General video about statistics (Spanish)
TED talk by Arthur Benjamin
TED talk by Alan Smith
Statistics is for everyone
Statisticians making a difference
Statisticians in other fields

Big DATA Olympiad

This is a contest in which teams of high school students from Chilean schools solve problems of data analysis. The objective of the competition is to stimulate the interest of students in Statistics and Data Science.

The intent of the competition is to allow competitors to ‘get their hands dirty’ by performing in-depth analysis of the data in order to come up with the best recommendation to address the problem.

The competition has two stages. In the pre-selection phase, the teams must prepare a written report using basic statistical techniques and MS Excel. The teams selected in this stage will be invited to a week of training at the Faculty of Mathematics of the Pontifical Catholic University of Chile. The training will cover modern techniques for the description and visualization of data, as well as the statistical software R. After the training, the final competition will be held. The travel and accommodation costs of selected teams from regions other than the Metropolitan Region will be covered by the competition.

The Selection Committee will be formed by Professors of the Department of Statistics of the Faculty of Mathematics of the UC.

For more details please click here.

Future Outreach Conferences and Seminars

Women in Data Science Santiago at UC

As part of the 2019 Stanford Women in Data Science (WiDS) conference, MiDaS is proud to host an event celebrating the women of statistics and data science in Santiago.

The WiDS initiative aims to inspire and educate data scientists worldwide, regardless of gender, and support women in the field. WiDS started as a conference at Stanford in November 2015. Now, WiDS includes a global conference, with 150+ regional events worldwide; a datathon, encouraging participants to hone their skills; and a podcast, featuring leaders in the field talking about their work, and their journeys.

We invite all women (and the men who want to support them) to join us for a day of conversation, connection, networking, training and awareness raising. Speakers include Industry leaders, shapeshifters and datapreneurs.

Date in Santiago: Monday 4th March 2019.

For more details please click here.

Past Outreach Conferences and Seminars

The Big Data Revolution in Biomedical Research

In association with 'Congreso Futuro 2019', MiDaS was proud to host this event at the Catholic University of Chile. In this seminar, leading international researchers discussed, for the general public, the revolution that large data sets have generated in biomedical research. The seminar took place on January 15th, 2019. The speakers included Professors Amy Herring of Duke University, Gerd Antes of the University of Freiburg, and Harris Lewin of the University of California at Davis.


Big Data: The revolution of the information in Biomedical Research

MiDaS, along with the School of Medicine of the Catholic University of Chile, co-organized this event at the Catholic University of Chile, where researchers from MiDaS gave talks illustrating how the research results obtained in our center can help researchers in the biomedical sciences draw better conclusions. The event took place on December 18th, 2018.


Outreach Talks


Job opportunities

Postdoc position
We are looking for highly motivated statisticians, data scientists or computer scientists interested in applying for a Postdoctoral Research Grant from the Chilean National Fund for Scientific and Technological Research (FONDECYT), most likely opening in August 2019. Researchers who attained a doctoral degree on January 1st, 2016 or later may apply to this competition. A local researcher at a Chilean university must sponsor the proposal, and MiDaS researchers would play that role. Therefore, the proposal should be about statistical methods for complex data. The projects last for 2 or 3 years and the candidate must declare a full-time commitment to the research work. However, its execution is compatible with other paid academic, research and/or outreach activities of up to 6 hours per week in the sponsoring institution. The grant will cover salary (approximately USD 30,400/year), travel and operational expenses (USD 6,700/year), and health insurance (USD 670/year). Interested postdoctoral applicants should send a formal application to midas AT mat.uc.cl including the following information: (i) cover letter, (ii) CV, (iii) publication list, and (iv) summary of research accomplishments and potential research interest. Please do not hesitate to contact us for further details.

How to Contact Us

Call or email us at

Phone: +56 22 354 4506
Fax: +56 22 354 4506

Send email

Visit us at

Faculty of Mathematics UC,
Campus San Joaquin, Vicuña Mackenna 4860, Macul

View on Google Map

Be social

Twitter: @MiDaS_Chile
Facebook: facebook.com/midas.mat.uc.cl
Instagram: instagram.com/MiDaS_Chile