The Department of Statistics and Data Sciences is pleased to announce the line-up for the 2014 Spring SDS Seminar Series. In its 4th year, the lecture series provides participants with the opportunity to hear from leading scholars and experts who work in different applied areas, including business, biology, text mining, computer vision, economics, and public health.

The series is envisioned as a vital contribution to the intellectual, cultural, and scholarly environment at The University of Texas at Austin for students, faculty, and the wider community. Each talk is free of charge and open to the public. For more information, contact Sasha Schellenberg.

**January 14, 2014 – Tamara Broderick**

(University of California, Berkeley, *Department of Statistics*) *"Feature allocations, paintboxes, and probability functions"* **GDC 4.302, 1:00 to 2:00 PM**

**January 17, 2014 – Po-Ling Loh**

(University of California, Berkeley, *Department of Statistics*) *"Nonconvex methods for high-dimensional regression with noisy and missing data"* **CLA 0.122, 1:00 to 2:00 PM**

**January 21, 2014 – Xiang Zhou**

(University of Chicago, *Department of Statistics*) *"Polygenic modeling with Bayesian sparse linear mixed models in genome-wide association studies"* **GDC 4.302, 1:00 to 2:00 PM**

**January 22, 2014 – Sara Wade**

(University of Cambridge, *Machine Learning Group*) *"A Predictive Study of Dirichlet Process Mixture Models for Curve Fitting"* **CLA 1.104, 1:00 to 2:00 PM**

**January 24, 2014 – Lizhen Lin**

(Duke University, *Department of Statistical Science*) *"Shape constrained regression using Gaussian process projections"* **CLA 1.106, 1:00 to 2:00 PM**

**March 7, 2014 – David Dunson**

(Duke University, *Department of Statistical Science*) *"Robust and Scalable Bayes via the Median Posterior"* **CBA 4.330, 2:00 to 3:00 PM**

**March 21, 2014 – Yuan (Alan) Qi**

(Purdue University, *Department of Statistics*) *"Scalable Gaussian Process Inference for Big Data"* **CBA 4.330, 2:00 to 3:00 PM**

**March 28, 2014 – Mark Girolami**

(University of Warwick, *Department of Statistics*) *"Defining Posterior Measures on the Hilbert Space of Differential Equation Solutions"* **CBA 4.330, 2:00 to 3:00 PM**

**April 4, 2014 – Yee Whye Teh**

(University of Oxford, *Department of Statistics*) *"Mondrian Forests: Efficient Random Forests for Streaming Data via Bayesian Nonparametrics"* **CLA 0.102, 2:00 to 3:00 PM**

**April 11, 2014 – Debashis Ghosh**

(Penn State, *Department of Statistics*) *Cancelled* **CBA 4.330, 2:00 to 3:00 PM**

**April 16, 2014 – Max Welling**

(University of Amsterdam, *Faculty of Science*) *"Austerity in MCMC-Land: Cutting the Computational Budget"*

**April 25, 2014 – David Draper**

(University of California, Santa Cruz, *Department of Applied Mathematics & Statistics*) *"Bayesian model specification: toward a Theory of Applied Statistics"*

**Tamara Broderick** (University of California, Berkeley, *Department of Statistics*)

**Title:** "Feature allocations, paintboxes, and probability functions"

**Abstract:** Clustering involves placing entities into mutually exclusive categories. We wish to relax the requirement of mutual exclusivity, allowing objects to belong simultaneously to multiple classes, a formulation that we refer to as "feature allocation." The first step is a theoretical one. In the case of clustering, the class of probability distributions over exchangeable partitions of a dataset has been characterized (via exchangeable partition probability functions and the Kingman paintbox). These characterizations support an elegant nonparametric Bayesian framework for clustering in which the number of clusters is not assumed to be known a priori. We establish an analogous characterization for feature allocation; we define notions of "exchangeable feature probability functions" and "feature paintboxes" that lead to a Bayesian framework that does not require the number of features to be fixed a priori. The second step is a computational one. Rather than appealing to Markov chain Monte Carlo for Bayesian inference, we develop a method to transform Bayesian methods for feature allocation (and other latent structure problems) into optimization problems with objective functions analogous to K-means in the clustering setting. These yield approximations to Bayesian inference that scale to large inference problems.

**Po-Ling Loh** (University of California, Berkeley, *Department of Statistics*)

**Title:** "Nonconvex methods for high-dimensional regression with noisy and missing data"

**Abstract:** Noisy and missing data are prevalent in many real-world statistical estimation problems. Popular techniques for handling nonidealities in data, such as imputation and expectation-maximization, are often difficult to analyze theoretically and/or terminate in local optima of nonconvex functions; these problems are only exacerbated in high-dimensional settings. We present new methods for obtaining high-dimensional regression estimators in the presence of corrupted data, and provide theoretical guarantees for the statistical consistency of our methods. Although our estimators also arise as minima of nonconvex functions, we show the rather surprising result that all stationary points are clustered around a global minimum. Motivated by a fundamental connection between linear regression and inverse covariance matrices, we demonstrate an important application of our method to graphical model estimation with noisy and missing data.

**Xiang Zhou** (University of Chicago, *Department of Statistics*)

**Title:** "Polygenic modeling with Bayesian sparse linear mixed models in genome-wide association studies"

**Abstract:** Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, and so are expected to perform well in different situations. However, in practice, for a given dataset one typically does not know which assumptions will be more accurate. Motivated by this, I consider a hybrid of the two, which we refer to as a "Bayesian sparse linear mixed model" (BSLMM), that includes both these models as special cases. I address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. I apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes (i.e., "chip heritability"), and phenotype (or breeding value) prediction. For PVE estimation, I demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction, it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem.

**Sara Wade** (University of Cambridge, *Machine Learning Group*)

**Title:** "A Predictive Study of Dirichlet Process Mixture Models for Curve Fitting"

**Abstract:** In this talk, we examine the use of Dirichlet process mixtures for curve fitting. An important modelling aspect in this setting is the choice between constant and covariate-dependent weights. By examining the problem of curve fitting from a predictive perspective, we show the advantages of using covariate-dependent weights. These advantages are a result of the incorporation of covariate proximity in the latent partition. However, closer examination of the partition yields further complications, which arise from the vast number of total partitions. To overcome this, we propose to modify the probability law of the random partition to strictly enforce the notion of covariate proximity, while still maintaining certain properties of the Dirichlet process. This allows the distribution of the partition to depend on the covariate in a simple manner and greatly reduces the total number of possible partitions, resulting in improved curve fitting and faster computations. Numerical illustrations are presented.

**Lizhen Lin** (Duke University, *Department of Statistical Science*)

**Title:** "Shape constrained regression using Gaussian process projections"

**Abstract:** Shape-constrained regression analysis has applications in dose-response modeling, environmental risk assessment, disease screening, and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. In this talk, I will discuss nonparametric methods for estimating shape-constrained (mainly monotone-constrained) regression functions. I will focus on a novel Bayesian method from our recent work for estimating monotone curves and surfaces using Gaussian process projections. Inference is based on projecting posterior samples from the Gaussian process. Theory is developed on continuity of the projection and rates of contraction. Our approach leads to simple computation with good performance in finite samples. The projection approach can be applied to other constrained function estimation problems, including in multivariate settings.

**David Dunson** (Duke University, *Department of Statistical Science*)

**Title:** "Robust and Scalable Bayes via the Median Posterior"

**Abstract:** Bayesian methods have great promise in big data sets, but this promise has not been fully realized due to the lack of scalable computational methods. Usual MCMC and SMC algorithms bog down as the size of the data and the number of parameters increase. For massive data sets, it has become routine to rely on penalized optimization approaches implemented on distributed computing systems. The most popular scalable approximation algorithms rely on variational Bayes, which lacks theoretical guarantees and badly underestimates posterior covariance. Another problem with Bayesian inference is the lack of robustness; data contamination and corruption are particularly common in large data applications and cannot easily be dealt with using traditional methods. We propose to solve both the robustness and the scalability problems using a new alternative to exact Bayesian inference that we refer to as the *median posterior*. Data are divided into subsets and stored on different computers prior to analysis. For each subset, we obtain a stochastic approximation to the full data posterior, and run MCMC to generate samples from this approximation. The median posterior is defined as the geometric median of the subset-specific approximations, and can be rapidly approximated. We show several strong theoretical results for the median posterior, including general theorems on concentration rates and robustness. The methods are illustrated through simple examples, including Gaussian process regression with outliers.

Joint work with Stas Minsker, Lizhen Lin and Sanvesh Srivastava
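The aggregation step described above, taking a geometric median across subsets, can be illustrated with a small sketch. It is a simplification and not the speakers' method: the median posterior takes the geometric median of whole subset posterior distributions under a metric on probability measures, whereas this toy version only takes the geometric median of subset posterior *means* in the plane, computed with Weiszfeld's algorithm. The Gaussian model, flat prior, number of subsets, and contamination pattern are all assumptions chosen for illustration.

```python
import numpy as np


def geometric_median(points, tol=1e-8, max_iter=1000):
    """Weiszfeld iteration for argmin_x sum_i ||x - points[i]||_2."""
    points = np.asarray(points, dtype=float)
    x = points.mean(axis=0)  # start from the ordinary (non-robust) mean
    for _ in range(max_iter):
        dists = np.linalg.norm(points - x, axis=1)
        # Avoid division by zero if x lands exactly on one of the points.
        weights = 1.0 / np.maximum(dists, 1e-12)
        x_new = weights @ points / weights.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
n, m = 10_000, 10                    # observations, subsets ("machines")
data = rng.normal(true_mu, 1.0, size=(n, 2))
data[: n // m] += 50.0               # corrupt every row of the first subset

# Toy model (an assumption): x_i ~ N(mu, I) with a flat prior on mu.
# Each subset posterior with the likelihood raised to the power m is then
# N(subset mean, I / n); here it is summarized by its mean only.
subsets = np.array_split(data, m)
subset_means = np.array([s.mean(axis=0) for s in subsets])

naive = subset_means.mean(axis=0)         # dragged toward the corrupted subset
robust = geometric_median(subset_means)   # stays near true_mu
```

With one of the ten subsets fully corrupted, `naive` ends up several units away from `true_mu`, while `robust` stays close to it, a small-scale picture of the robustness the abstract describes.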

**Yuan (Alan) Qi** (Purdue University, *Department of Statistics*)

**Title:** "Scalable Gaussian Process Inference for Big Data"

**Abstract:** Gaussian process (GP) models are powerful Bayesian nonparametric models. However, they are computationally intensive; given big data, the high computational cost has become a bottleneck for applications of GPs. To address this issue, my group has developed a series of sparse GP inference algorithms. Today I will cover two of our recent works. First, I will present a scalable GP learning method, EigenGP, which provides nonlinear predictions in a subspace spanned by GP eigenfunctions, automatically learned from data. EigenGP enjoys accurate prediction and uncertainty quantification with fast computation. I will demonstrate its effectiveness on regression, time series forecasting, and semi-supervised classification. Second, I will present GP models on graphs and tensors for predicting unknown interactions or elements. Compared with state-of-the-art methods on benchmark datasets, our method achieves a striking three-fold error reduction. On tensors with billions of elements (which were impossible to handle with existing GP inference methods), our method, using a distributed online inference algorithm on a new hierarchical Bayesian model, achieves higher prediction accuracy in less time than the state-of-the-art alternative.

**Mark Girolami** (University of Warwick, *Department of Statistics*)

**Title:** "Defining Posterior Measures on the Hilbert Space of Differential Equation Solutions"

**Abstract:** Solving the forward and inverse problems when quantifying uncertainty in models of physical systems described by ordinary and partial differential equations requires a coherent probabilistic framework. Quantifying sources of uncertainty in the forward problem must include the non-analytic nature of solutions of ODEs and PDEs, which in all but the simplest cases demands finite-dimensional functional approximations, e.g., finite elements and discrete-time numerical integration. The epistemic nature of this uncertainty can be formally defined by imposing appropriate prior measures on the Hilbert space of vector fields and corresponding solutions via the Radon-Nikodym derivative, leading to continuous posterior measures over solutions. This work describes such methods to probabilistically solve ODEs and PDEs, with proofs of consistency provided. Examples will include uncertainty quantification (UQ) for the Navier-Stokes equations, chaotic PDEs (Kuramoto-Sivashinsky), and biochemical kinetics.

**Yee Whye Teh** (University of Oxford, *Department of Statistics*)

**Title:** "Mondrian Forests: Efficient Random Forests for Streaming Data via Bayesian Nonparametrics"

**Abstract:** Ensembles of randomized decision trees are widely used for classification and regression tasks in machine learning and statistics. They achieve competitive predictive performance and are computationally efficient to train (batch setting) and test, making them excellent candidates for real-world prediction tasks. However, the most popular variants (such as Breiman's random forest and extremely randomized trees) work only in the batch setting and cannot handle streaming data easily. In this talk, I will present Mondrian Forests, where random decision trees are generated from a Bayesian nonparametric model called a Mondrian process (Roy and Teh, 2009). Making use of the remarkable consistency properties of the Mondrian process, we develop a variant of extremely randomized trees that can be constructed efficiently in an incremental fashion, making their use on streaming data simple and efficient. Experiments on real-world classification tasks demonstrate that Mondrian Forests achieve predictive performance comparable with existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs. accuracy tradeoff.

Joint work with Balaji Lakshminarayanan and Daniel Roy
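For readers unfamiliar with the Mondrian process, here is a minimal sketch of sampling one Mondrian partition of an axis-aligned box, assuming the standard generative description from Roy and Teh (2009): the time to the next cut is exponential with rate equal to the box's total side length, the cut dimension is chosen with probability proportional to side length, the cut position is uniform along that side, and recursion stops once the accumulated time exceeds the budget. It does not implement Mondrian Forests themselves, which additionally restrict trees to the range of the observed data and extend them as new points arrive; the function name and budget value are illustrative.

```python
import numpy as np


def sample_mondrian(lower, upper, budget, rng, time=0.0):
    """Sample a Mondrian partition of the box [lower, upper].

    Returns a list of leaf boxes as (lower, upper) array pairs.
    """
    lengths = upper - lower
    linear_dim = lengths.sum()
    # Waiting time to the next cut: exponential with rate = sum of side lengths.
    time += rng.exponential(1.0 / linear_dim)
    if time > budget:
        return [(lower, upper)]  # budget exhausted: this box is a leaf
    # Cut dimension chosen with probability proportional to its side length,
    # cut location uniform along that side.
    d = rng.choice(len(lengths), p=lengths / linear_dim)
    cut = rng.uniform(lower[d], upper[d])
    left_upper = upper.copy()
    left_upper[d] = cut
    right_lower = lower.copy()
    right_lower[d] = cut
    # Both children continue from the same cut time.
    return (sample_mondrian(lower, left_upper, budget, rng, time)
            + sample_mondrian(right_lower, upper, budget, rng, time))


rng = np.random.default_rng(0)
leaves = sample_mondrian(np.zeros(2), np.ones(2), budget=3.0, rng=rng)
total_area = sum(np.prod(u - l) for l, u in leaves)  # leaves tile the unit square
```

A larger `budget` yields finer partitions; with a budget of zero the whole box is returned as a single leaf.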

**Debashis Ghosh** (Penn State, *Department of Statistics*)

**Title:** *Cancelled*

**Abstract:**

**Max Welling** (University of Amsterdam, *Faculty of Science*)

**Title:** "Austerity in MCMC-Land: Cutting the Computational Budget"

**Abstract:** Will MCMC survive the “Big Data revolution”? Current MCMC methods for posterior inference compute the likelihood of a model *for every data-case* in order to make a single binary decision: to accept or reject a proposed parameter value. Compare this with stochastic gradient descent, which uses O(1) computations per iteration. In this talk I will discuss two MCMC algorithms that cut the computational budget of an MCMC update. The first algorithm, “stochastic gradient Langevin dynamics” (and its successor “stochastic gradient Fisher scoring”), performs updates based on stochastic gradients and ignores the Metropolis-Hastings step altogether. The second algorithm uses an approximate Metropolis-Hastings rule where accept/reject decisions are made with high (but not perfect) confidence based on sequential hypothesis tests. We argue that for any finite sampling window, we can choose hyper-parameters (step size, confidence level) such that the extra bias introduced by these algorithms is more than compensated by the reduction in variance due to the fact that we can draw more samples. We anticipate a new framework where bias and variance contributions to the sampling error are optimally traded off.
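The first algorithm mentioned above can be sketched on a toy problem. The snippet runs stochastic gradient Langevin dynamics on the posterior of a Gaussian mean: each update uses a minibatch gradient rescaled by N/n plus injected Gaussian noise with variance equal to the step size, and no Metropolis-Hastings correction is applied. The model, prior, fixed step size, and batch size are assumptions for illustration (SGLD is usually presented with a decreasing step-size schedule); with a fixed step the sampler's stationary distribution is slightly off, a small instance of the bias the abstract argues can be traded for lower variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (assumed for illustration): x_i ~ N(theta, 1), prior theta ~ N(0, 10).
N = 1_000
x = rng.normal(2.0, 1.0, size=N)
prior_var = 10.0

# Exact conjugate posterior, used only to check the sampler.
post_prec = 1.0 / prior_var + N
post_mean = x.sum() / post_prec
post_sd = post_prec ** -0.5


def grad_log_prior(theta):
    return -theta / prior_var


def grad_log_lik(theta, batch):
    # Gradient of sum_i log N(x_i | theta, 1) over the minibatch only.
    return np.sum(batch - theta)


def sgld(n_iter=22_000, batch_size=100, step=1e-5, burn_in=2_000):
    """SGLD with a fixed step size and no Metropolis-Hastings accept/reject."""
    theta = 0.0
    samples = []
    for t in range(n_iter):
        idx = rng.integers(0, N, size=batch_size)  # minibatch, with replacement
        # Unbiased estimate of the full-data log-posterior gradient.
        grad = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, x[idx])
        # Half-step along the gradient plus N(0, step) Langevin noise.
        theta = theta + 0.5 * step * grad + rng.normal(0.0, np.sqrt(step))
        if t >= burn_in:
            samples.append(theta)
    return np.array(samples)


samples = sgld()
```

Each iteration touches only `batch_size` of the `N` data points, yet `samples.mean()` and `samples.std()` should land close to the closed-form `post_mean` and `post_sd` that this conjugate toy model makes available.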

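**David Draper** (University of California, Santa Cruz, *Department of Applied Mathematics & Statistics*)

**Title:** "Bayesian model specification: toward a Theory of Applied Statistics"

**Abstract:**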