The continuously growing capacity to acquire and store data sets calls for new approaches to process data efficiently and extract relevant information. In fact, large data sets are most interesting when they are actually 'strange' and allow us to learn about the complex mechanisms that generate them. In these contexts, it may be difficult or even counterproductive to employ parametric statistical models for the learning process.

The Center for the Discovery of Structures in Complex Data is funded by a grant awarded in 2018 by Iniciativa Científica Milenio of the Chilean Ministry of Economy to a group of statisticians. The Center is based at Pontificia Universidad Católica de Chile. It focuses on new statistical approaches for the efficient identification, reconstruction and classification of relevant structural information in complex data sets.

## Associate Researchers

#### Bevilacqua, Moreno

Full Professor. Department of Statistics, Universidad de Valparaíso.

#### Jara, Alejandro (Director)

Associate Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

#### Quintana, Fernando (Deputy Director)

Full Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

#### Sing-Long, Carlos

Assistant Professor. Institute of Mathematical and Computational Engineering, School of Mathematics and Engineering, Pontificia Universidad Católica de Chile.

## Young Researchers

#### Beaudry, Isabelle

Assistant Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

#### García-Zattera, María José

Assistant Professor. Department of Statistics, School of Mathematics, Pontificia Universidad Católica de Chile.

#### Guzman, Cristobal

Ph.D. in Mathematics. Institute for Mathematical and Computational Engineering, School of Mathematics and Engineering, Pontificia Universidad Católica de Chile.

## Senior Researchers

#### MacEachern, Steve

Full Professor. Department of Statistics, The Ohio State University.

#### Müller, Peter

Full Professor. Department of Mathematics, The University of Texas at Austin.

#### Prünster, Igor

Full Professor. Institute of Data Science and Analytics, Bocconi University.

## Research Lines

For the period 2018-2021, the Center for the Discovery of Structures in Complex Data will focus on the following aspects of statistical learning in the context of complex data:

(I) The development, study of properties, and the implementation of scalable Bayesian nonparametric approaches for collections of probability measures indexed by predictors, including settings where both responses and predictors are defined on non-standard spaces,

(II) The development, study of properties, and the implementation of nonparametric approaches for misclassified doubly-interval-censored time-to-event data, and

(III) The development, study of properties, and the implementation of nonparametric approaches for space and time data.

## Visitors


## Past Visitors

• Carlos Díaz-Ávalos. Professor, Departamento de Probabilidad y Estadística, IIMAS, UNAM, 12 - 26th, 2018
• Nishant Mehta. Assistant Professor, Department of Computer Science, University of Victoria. November, 12 - 26th, 2018
• Alejandro Murua. Professor, Department of Statistics, University of Montreal. September, 7 - 16th, 2018
• Garritt Page. Associate Professor, Department of Statistics, Brigham Young University. August, 7 - 14th, 2018
• Evan Ray. Assistant Professor, Department of Statistics, Mount Holyoke College. August, 12 - 17th, 2018

## Danilo Alvares

### A sequential approach to updating posterior information

In this talk we show the performance of a sequential Monte Carlo (SMC) algorithm. As a prerequisite for understanding it, we discuss the Metropolis-Hastings algorithm and also illustrate the general idea of particle-based methods. The SMC algorithm presented here is a particular case of sequential methods, where the objective is to update the posterior distribution in "static" models.
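As a rough illustration of the idea, and not the algorithm presented in the talk, the sketch below updates a posterior one observation at a time using a reweight, resample and move scheme. Everything specific in it is an assumption made for the example: the Gaussian model with known unit variance, the N(0, 10²) prior, the effective-sample-size threshold, the random-walk step size, and the function names.

```python
import math
import random


def log_lik(theta, y):
    """Log-likelihood of y under N(theta, 1), up to an additive constant."""
    return -0.5 * (y - theta) ** 2


def log_prior(theta, prior_sd=10.0):
    """Log-density of the assumed N(0, prior_sd^2) prior, up to a constant."""
    return -0.5 * (theta / prior_sd) ** 2


def normalize(log_w):
    """Turn log-weights into normalized weights without underflow."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def smc_static(data, n_particles=2000, ess_frac=0.5, step=0.5, seed=0):
    """Sequentially update a particle approximation of p(theta | data).

    `data` is a list of observations. Returns (particles, weights).
    """
    rng = random.Random(seed)
    # Start from equally weighted draws of the prior.
    particles = [rng.gauss(0.0, 10.0) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    for t, y in enumerate(data, start=1):
        # Reweight: multiply each weight by the likelihood of the new point.
        log_w = [lw + log_lik(th, y) for lw, th in zip(log_w, particles)]
        w = normalize(log_w)
        ess = 1.0 / sum(wi * wi for wi in w)
        if ess < ess_frac * n_particles:
            # Resample (multinomial) to drop particles with negligible weight.
            particles = rng.choices(particles, weights=w, k=n_particles)
            log_w = [0.0] * n_particles
            # Move: one random-walk Metropolis-Hastings step per particle,
            # targeting the posterior given the observations seen so far.
            seen = data[:t]
            for i, th in enumerate(particles):
                prop = th + rng.gauss(0.0, step)
                log_ratio = log_prior(prop) - log_prior(th) + sum(
                    log_lik(prop, z) - log_lik(th, z) for z in seen
                )
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    particles[i] = prop
    return particles, normalize(log_w)
```

Calling `smc_static(data)` on a list of observations returns weighted particles, and the weighted average `sum(p * w for p, w in zip(particles, weights))` estimates the posterior mean. Under the assumed conjugate model this can be compared with the closed-form value `sum(data) / (len(data) + 1 / 100)`. The Metropolis-Hastings move after resampling is what restores diversity among duplicated particles, which is why the abstract treats that algorithm as a prerequisite.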

## Carlos Díaz-Ávalos

### Spatial point processes as an analysis tool in ecology

Spatial point processes have gained popularity in recent years because of their usefulness in answering a variety of questions across scientific fields. In community ecology, point processes have proved useful for detecting intra- and interspecific interactions in forest ecosystems and for assessing the risk of, and the factors associated with, ecological disturbances such as forest fires. Although estimating model parameters in applications of spatial point processes can be difficult, computational advances have made acceptable numerical approximations possible, which has contributed to their use in many fields of human knowledge. This talk gives an overview of the theoretical foundations of spatial point processes, illustrated with an example of their application to the construction of forest fire risk maps.

## Nishant Mehta

### Fast Rates for Unbounded Losses: from ERM to Generalized Bayes

I will present new excess risk bounds for randomized and deterministic estimators, discarding boundedness assumptions to handle general unbounded loss functions like log loss and squared loss under heavy tails. These bounds have a PAC-Bayesian flavor in both derivation and form, and their expression in terms of the information complexity forms a natural connection to generalized Bayesian estimators. The bounds hold with high probability and a fast $\tilde{O}(1/n)$ rate in parametric settings, under the recently introduced 'central' condition (or various weakenings of this condition with consequently weaker results) and a type of 'empirical witness of badness' condition. The former conditions are related to the Tsybakov margin condition in classification and the Bernstein condition for bounded losses, and they help control the lower tail of the excess loss. The 'witness' condition is new and suitably controls the upper tail of the excess loss. These conditions and our techniques revolve tightly around a pivotal concept, the generalized reversed information projection, which generalizes the reversed information projection of Li and Barron. Along the way, we connect excess risk (a KL divergence in our language) to a generalized Rényi divergence, generalizing previous results connecting Hellinger distance to KL divergence. This is joint work with Peter Grünwald.

## Fernando Quintana

### Discovering Interactions Using Covariate Informed Random Partition Models

Combination chemotherapy treatment regimens created for patients diagnosed with childhood acute lymphoblastic leukemia have had great success in improving cure rates. Unfortunately, patients prescribed these types of treatment regimens have displayed susceptibility to the onset of osteonecrosis. Some have suggested that this is due to pharmacokinetic interaction between two agents in the treatment regimen (asparaginase and dexamethasone) and other physiological variables. Determining which physiological variables to consider when searching for interactions in scenarios like these, minus a priori guidance, has proved to be a challenging problem, particularly if interactions influence the response distribution in ways beyond shifts in expectation or dispersion only. In this paper we propose an exploratory technique that is able to discover associations between covariates and responses in a very general way. The procedure connects covariates to responses very flexibly through dependent random partition prior distributions, and then employs machine learning techniques to highlight potential associations found in each cluster. We apply the method to data produced from a study dedicated to learning which physiological predictors influence severity of osteonecrosis multiplicatively.

## Alejandro Murua

### Cox regression with Potts-driven latent clusters model

We consider a Bayesian nonparametric survival regression model with latent partitions. Our goal is to predict survival, and to cluster survival patients within the context of building prognosis systems. We propose the Potts clustering model as a prior on the covariate space so as to drive cluster formation on individuals and/or Tumor-Node-Metastasis stage system patient blocks. For any given partition, our model assumes an interval-wise Weibull distribution for the baseline hazard rate. The number of intervals is unknown. It is estimated with a lasso-type penalty given by a sequential double exponential prior. Estimation and inference are done with the aid of MCMC. To simplify the computations, we use Laplace's approximation method to estimate some constants, and to propose parameter updates within MCMC. We illustrate the methodology with an application to cancer survival.

## Luis Gutierrez

### A Bayesian Nonparametric Multiple Testing Procedure for Comparing Several Treatments Against a Control

We propose a Bayesian nonparametric strategy to test for differences between a control group and several treatment regimes. Most of the existing tests for this type of comparison are based on the differences between location parameters. In contrast, our approach identifies differences across the entire distribution, avoids strong modeling assumptions over the distributions for each treatment, and accounts for multiple testing through the prior distribution on the space of hypotheses. The proposal is compared to other commonly used hypothesis testing procedures under simulated scenarios. A real application is also analyzed with the proposed methodology.

## Garritt Page

### Temporal and Spatio-Temporal Random Partition Models

Data that are spatially referenced often represent an instantaneous point in time at which the spatial process is measured. Because of this, it is becoming more common to monitor spatial processes over time. We propose capturing the temporal evolution of dependent structures by jointly modeling a sequence of partitions indexed by time. We derive a few characteristics from the joint model and show how it impacts dependence at the observation level. Computation strategies are detailed, and we apply the method to Chilean standardized testing scores.

## Workshops

MiDaS workshops aim to highlight recent advances in modeling and computation through the lens of applied, domain-driven problems that require flexible statistical models. The workshops bring together leading experts and talented young researchers working on applications and theory of flexible parametric and nonparametric (Bayesian) statistics. The workshops focus on new statistical approaches for the efficient identification, reconstruction and classification of relevant structural information in complex data sets. The MiDaS 2019 workshop will be held at the Enjoy hotel in Viña del Mar, Chile, from March 25th to 29th, 2019.

## MiDaS Outreach Videos

##### Statisticians in other fields

This is a contest in which teams of high school students from Chilean schools solve problems of data analysis. The objective of the competition is to stimulate the interest of students in Statistics and Data Science.

The intent of the competition is to allow competitors to ‘get their hands dirty’ by performing an in-depth analysis of the data in order to come up with the best recommendation to address the problem.

The competition has two stages. In the pre-selection phase, the teams must prepare a written report using basic statistical techniques and MS Excel. The teams selected in this stage will be invited to a week of training at the Faculty of Mathematics of the Pontifical Catholic University of Chile. The training will cover modern techniques for the description and visualization of data, as well as the statistical software R. After the training, the final competition will take place. Accommodation and travel costs for selected teams from regions other than the Metropolitan Region will be covered by the competition.

The Selection Committee will be formed by Professors of the Department of Statistics of the Faculty of Mathematics of the UC.

## Women in Data Science Santiago at UC

As part of the 2019 Stanford Women in Data Science (WiDS) conference, MiDaS is proud to host an event celebrating the women of statistics and data science in Santiago.

The WiDS initiative aims to inspire and educate data scientists worldwide, regardless of gender, and support women in the field. WiDS started as a conference at Stanford in November 2015. Now, WiDS includes a global conference, with 150+ regional events worldwide; a datathon, encouraging participants to hone their skills; and a podcast, featuring leaders in the field talking about their work, and their journeys.

We invite all women (and the men who want to support them) to join us for a day of conversation, connection, networking, training and awareness raising. Speakers include industry leaders, shapeshifters and datapreneurs.

Date in Santiago: Monday 4th March 2019.

## Job opportunities

##### Postdoc position
We are looking for highly motivated statisticians, data scientists or computer scientists interested in applying for a Postdoctoral Research Grant from the Chilean National Fund for Scientific and Technological Research (FONDECYT), most likely opening in August 2019. Researchers who attained a doctoral degree on or after January 1st, 2016, may apply to this competition. A local researcher at a Chilean university must sponsor the proposal, and MiDaS researchers would play that role. Therefore, the proposal should be about statistical methods for complex data. The projects last for 2 or 3 years and the candidate must declare a full-time commitment to the research work. However, its execution is compatible with other paid academic, research and/or outreach activities for up to 6 hours per week in the sponsoring institution. The grant will cover salary (approximately USD 30,400/year), travel and operational expenses (USD 6,700/year), and health insurance (USD 670/year). Interested postdoctoral applicants should send a formal application to midas AT mat.uc.cl including the following information: (i) cover letter, (ii) CV, (iii) publication list, and (iv) summary of research accomplishments and potential research interest. Please do not hesitate to contact us for further details.

#### Call or email us at

Phone: +56 22 354 4506
Fax: +56 22 354 4506

#### Visit us at

Faculty of Mathematics UC,
Campus San Joaquin, Vicuña Mackenna 4860, Macul