2022 Seminars

Spring 2022 Seminars

Association and Causation: Attributes and Effects of Judges in Equal Employment Opportunity Commission Litigation Outcomes

Date: Friday, May 6, 2022

Speaker: Michael Sobel

Affiliation: Columbia University

Federal Communications Commission (FCC)—"Addressing the Spectrum Crunch at the Federal Communications Commission"

Karla Hoffman

Professor, George Mason University

Date: Friday, April 29, 2022

Abstract

From mobile devices to improved Wi-Fi, from connected cars to satellites, the demand for more wireless connectivity grows every day. These uses require more dedicated spectrum, but much of the available spectrum has already been allocated. Resolving the “spectrum crunch” requires creative solutions and new approaches. In this talk, we will discuss different ways the Federal Communications Commission has attempted to use the available spectrum more efficiently: repacking television stations and then using optimization to determine and dynamically re-schedule the TV transition, reconfiguring 37 and 39 GHz licensees, and creating dynamic licenses in 3.5 GHz. We will discuss how mathematical optimization could be used in each of these scenarios. This is joint work with the optimization team led by Brian Smith, who received his master’s degree from the SEOR department of George Mason University.

About the speaker

Karla Hoffman is a Professor within the School of Engineering and Computing of George Mason University. She received her BS in mathematics from Rutgers and an M.B.A. and a D.Sc. from GWU. Previously, she worked as a mathematician at the National Institute of Standards and Technology (NIST). She has received NIST’s Applied Research Award, a Commerce Department Silver Medal, and GMU’s Distinguished Faculty Award, and is a Fellow of the Institute for Operations Research and the Management Sciences (INFORMS), where she also received the Edelman Prize and the Kimball Medal. Dr. Hoffman’s primary areas of research are optimization and auction design and testing. Her research focuses on the development of new algorithms for solving complex problems arising in industry and government. She consults to the FCC on auction design and testing and has served as a consultant on combinatorial optimization problems for the telecommunications, transportation, and military industries.

Least Squares Estimation of a Quasiconvex Regression Function

Rohit Patra

Assistant Professor, University of Florida

Date: Friday, April 22, 2022

Abstract

We develop a new approach for the estimation of a multivariate function based on the economic axioms of quasiconvexity (and monotonicity). On the computational side, we prove the existence of the quasiconvex constrained least squares estimator (LSE) and provide a characterization of the function space to compute the LSE via a mixed integer quadratic programme. On the theoretical side, we provide finite sample risk bounds for the LSE via a sharp oracle inequality. Our results allow for errors to depend on the covariates and to have only two finite moments. We illustrate the superior performance of the LSE against some competing estimators via simulation. Finally, we use the LSE to estimate the production function for the Japanese plywood industry and the cost function for hospitals across the US.

About the speaker

I am an assistant professor in the Department of Statistics at the University of Florida. My research centers around semiparametric/nonparametric methodology and large sample theory: efficient estimation in semiparametric models, nonparametric function estimation (with special emphasis on shape constrained estimation), and likelihood and bootstrap based inference in (non-standard) parametric and nonparametric models. The main motivation of the research is in developing nonparametric procedures that are automated (free from tuning parameters) but still flexible enough to incorporate data-driven features.

My research has applications in broad areas such as genetics (multiple testing problems), economics (utility and production function estimation and binary response models), causal inference (conditional independence) and astronomy (analysis of accretion of galaxies), among other fields.

Spatiotemporal modeling of an estuarine decapod using Bayesian inference: environmental drivers of juvenile blue crab abundance

Grace Chiu

Professor of Environmental Statistics at the Virginia Institute of Marine Science, College of William & Mary

Date: Friday, April 15, 2022

Abstract

Nursery grounds substantially enhance secondary production of commercially exploited fish and crustacean populations by providing food and refugia for their juveniles. Previous small-scale studies for blue crabs have emphasized seagrass meadows as highly productive nurseries. Yet, to generalize inference of nursery function, identify highly productive regions, and inform regional management, it is vital to unify digitized data on structurally complex habitats with survey data over larger spatiotemporal scales. Thus, we construct five Bayesian hierarchical models with various spatiotemporal dependence structures on 21 years of data across temperate estuaries in Virginia to infer nursery habitat value for blue crabs. Our results indicate that 1) the nonseparable spatiotemporal model outperformed the simpler models in cross validations, and 2) salt marsh surface area and turbidity, not seagrass, are the strongest determinants of local juvenile blue crab production. These results highlight the need to consider nursery function at multiple spatiotemporal resolutions, and therefore spatiotemporal dependence in large-scale fisheries catch data, for robust inference on local productivity. Details of our work can be found in Hyman et al. (2022) in Frontiers in Marine Science, DOI: 10.3389/fmars.2022.834990.

About the speaker

Grace Chiu is Professor of Environmental Statistics at the Virginia Institute of Marine Science (VIMS), home to William & Mary's Graduate School of Marine Science. Her career has spanned three countries (US, Canada, Australia) as an academic and a federal government scientist. In her research, she develops computationally intensive Bayesian models to understand complex natural phenomena from human societies and the environment. At W&M, she advises and teaches statistics to VIMS graduate students, and advises honors students in the Computational & Applied Mathematics & Statistics (CAMS) program. For 25 years, she has been a devoted educator of statistics to undergraduate and graduate students from a wide range of disciplines. Since joining VIMS in 2019, she has been actively developing an advanced statistics curriculum for the School of Marine Science. Grace is also an affiliate faculty member at VCU, University of Washington, University of Waterloo, and the Australian National University.

New measures for assessing non-ignorable selection bias in non-probability samples and low response rate probability samples

Brady T. West

Research Associate Professor

University of Michigan-Ann Arbor

Date: Friday, April 8, 2022

Abstract

Recent developments in survey statistics have yielded simple, novel measures of the non-ignorable selection bias in estimates of means, proportions, and regression coefficients that may arise due to deviations from ignorable sample selection, where these deviations might be introduced by the sampling mechanism (e.g., non-probability sampling) or survey nonresponse. This presentation will review the computation of these indicators, the data required to compute them, software tools for computing them, and examples of their use and interpretation based on real survey data. Future directions for research in this area, including ongoing work to assess selection bias in pre-election polls conducted for the 2020 presidential election, will be provided in conclusion.

About the speaker

Brady T. West is a research associate professor in the Survey Methodology Program, located within the Survey Research Center at the Institute for Social Research (ISR) on the University of Michigan-Ann Arbor (U-M) campus. He earned his PhD from the Michigan Program in Survey and Data Science (formerly the Michigan Program in Survey Methodology) in 2011. Before that, he received an MA in Applied Statistics from the U-M Statistics Department in 2002, being recognized as an Outstanding First-year Applied Masters student, and a BS in Statistics with Highest Honors and Highest Distinction from the U-M Statistics Department in 2001. His current research interests include total survey error / total data quality, responsive and adaptive survey design, interviewer effects, survey paradata, the analysis of complex sample survey data, and multilevel regression models for clustered and longitudinal data. He has developed short courses on statistical analysis using SAS, SPSS, R, Stata, and HLM, and regularly consults on the use of procedures in these software packages for the analysis of longitudinal and clustered data. The author or co-author of more than 180 peer-reviewed publications in survey statistics, survey methodology, applied statistics, and public health, in addition to three edited volumes on survey methodology, he is also the lead author of Linear Mixed Models: A Practical Guide Using Statistical Software (Third Edition, with Kathy Welch and Andrzej Galecki), and a co-author of a book entitled Applied Survey Data Analysis (with Steven Heeringa and Patricia Berglund), the second edition of which was published by Chapman Hall in 2017.

“Geographer’s” perspectives on analyzing spatial data

David Wong

Professor, George Mason University

Date: Friday, April 1, 2022

Abstract

Instead of focusing on one specific research topic, this talk shares some geographers’ views on analyzing spatial data. Statisticians and geographers may approach spatial data analysis differently, partly due to the differences in their statistical skills. But the differences may also reflect how they perceive space differently. Thus, some geographers’ perspectives may be ignored by statisticians. In this talk, I will review some challenges that geographers have encountered in analyzing spatial data. While some challenges are well-known and have been investigated for decades, others have been raised only recently. The goal of the talk is to share different views and to facilitate cross-disciplinary communication.

About the speaker

David Wong is a Professor in the Geography & Geoinformation Science Department. Except for two years spent teaching at the University of Hong Kong between 2013 and 2015, he has been teaching at Mason since 1993. He has broad research interests, ranging from geovisualization to the more socially oriented issues in spatial epidemiology and aging. His primary research interest is in population analysis, particularly in measuring segregation. His publications include three co-authored books and more than 90 papers in peer-reviewed journals. His research has been funded by HUD, the U.S. Census Bureau, and NIH (both NICHD and NCI through R01, R03 and contracts). He has served on the editorial boards of seven international journals in GIS, spatial analysis and population.

Equivariant machine learning, structured like classical physics

Soledad Villar

Assistant Professor, Department of Applied Mathematics & Statistics, and Mathematical Institute for Data Science, Johns Hopkins University

Date: Friday, March 25, 2022

Abstract

There has been enormous progress in the last few years in designing conceivable (though not always practical) neural networks that respect the gauge symmetries – or coordinate freedom – of physical law. Some of these frameworks make use of irreducible representations, some make use of higher order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries, but a large fraction (possibly all) of classical physics is equivariant to translation, rotation, reflection (parity), boost (relativity), and permutations. Here we show that it is simple to parameterize universally approximating polynomial functions that are equivariant under these symmetries, or under the Euclidean, Lorentz, and Poincaré groups, at any dimensionality d. The key observation is that nonlinear O(d)-equivariant (and related-group-equivariant) functions can be expressed in terms of a lightweight collection of scalars — scalar products and scalar contractions of the scalar, vector, and tensor inputs. These results demonstrate theoretically that gauge-invariant deep learning models for classical physics with good scaling for large problems are feasible right now.

Pathfinder: Parallel quasi-Newton variational inference

Bob Carpenter

Center for Computational Mathematics, Flatiron Institute

Date: Friday, March 11, 2022

Abstract

In this talk, I'll introduce Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL-divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.

Joint work with Lu Zhang, Aki Vehtari, and Andrew Gelman. A preprint is available on arXiv.

About the speaker

Bob Carpenter joined the Flatiron Institute’s Center for Computational Mathematics in March 2020. He previously was a research scientist at Columbia University, Alias-I (LingPipe), SpeechWorks, and Lucent Bell Labs. Carpenter was also previously a professor of computational linguistics at Carnegie Mellon University. Carpenter is known for developing Stan, a probabilistic programming language, and is one of the Stan core developers. In addition to numerous publications, Carpenter has written two books on computational linguistics. Carpenter has also received grants from the NSF, ONR, Sloan, IES, and NIH for his various programming projects. Carpenter has a B.A. in Math and Computer Science from Michigan State University and a Ph.D. in Cognitive Science and Computer Science from the University of Edinburgh.

Statistical hurdles and swamps in predicting future forests and winegrowing regions

Elizabeth Wolkovich

Associate Professor, Department of Forest & Conservation Sciences, University of British Columbia

Date: Friday, March 4, 2022

Abstract

Climate change is having large impacts on natural and agricultural systems around the globe. Mitigating the worst consequences requires models that mechanistically predict changes. Towards that goal, my lab (Temporal Ecology Lab) works on models to better predict the most reported biological impact: shifts in phenology, the timing of recurring life history events such as leafout and flowering. Phenological records of cherry blossoms are the longest written records on earth, yet we still struggle to accurately predict them across space, time and climatic change. Here I review several major areas of research where statistical inference has been critical to my lab's insights and advances, but which also highlight some of the deep methodological issues in the field: plant sensitivity to warming temperatures over time and space, timing mismatches between critical species interactions (for example, plants and pollinators) and predicting shifting winegrowing regions with warming.

About the speaker

Elizabeth Wolkovich is an Associate Professor in Forest and Conservation Sciences and Canada Research Chair at the University of British Columbia. She runs the Temporal Ecology Lab, which focuses on understanding how climate change shapes plants and plant communities, with a focus on shifts in the timing of seasonal development (e.g., budburst, flowering and fruit maturity), known as phenology. Her lab both collects new data on forest trees and winegrapes and collates existing data to provide global estimates of shifts in phenology with warming from plants to birds and other animals, and to understand how human choices will impact future winegrowing regions. Her research benefits from an interdisciplinary team of collaborators from agriculture, biodiversity science, climatology, evolution and viticulture, as well as from shared long-term datasets from across North America and Europe.

Improved Small Domain Estimation via Compromise Regression Weights

Thomas A. Louis

Professor Emeritus of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Date: Friday, February 25, 2022

Abstract

Shrinkage estimates of small domain attributes combine a noisy direct estimate with a more stable, regression-based estimate. When the regression model is misspecified, estimation performance for the noisier domains can suffer due to substantial shrinkage towards a poorly estimated regression surface. To address this issue, we introduce a class of empirically determined weights used to estimate the regression that improve performance for the noisy domains. The weights are a convex combination of those that produce the best linear unbiased predictor (BLUP) and those that produce the observed best predictor (OBP) of Jiang and co-authors. The convex combination is found by minimizing an unbiased estimate of the summed mean-squared prediction error, producing the “compromise best predictor” (CBP). This data-adaptive mixture of regression weights retains the robustness of the OBP while maintaining much of the advantage of the BLUP when the regression model is correct. We compare the BLUP, OBP and CBP via simulation and demonstrate their output in estimating gait speed in older adults. Joint work with Nick Henderson and Ravi Varadhan.

Marginal and Conditional Sufficient Variable Screening for Ultrahigh Dimensional Data

Chenlu Ke

Assistant Professor, Department of Statistical Sciences and Operations Research, Virginia Commonwealth University

Date: Friday, February 18, 2022

Abstract

Many contemporary research problems in diverse fields are characterized by ultrahigh dimensional datasets, where the number of variables can be much higher than the sample size. Extracting core information by identifying low-dimensional representations of predictive features is very challenging given the interrelations, redundancy, and noise embedded in ultrahigh dimensional data. Traditional variable selection and regularization methods are no longer applicable or favorable in terms of computational expediency, statistical accuracy and algorithmic stability. Variable screening aims to swiftly filter out redundant variables through independence learning. In this talk, we will introduce a novel unified framework of variable screening for ultrahigh dimensional data based on the notion of sufficiency. Candidate variables are ranked according to their marginal and conditional contributions to the response measured by "kernel" inverse regression statistics. Our screening procedure is model-free and applicable to continuous and categorical responses. When prior information is available or when potential confounding exists, the method can be readily extended to achieve conditional variable screening, where the conditional set can also be ultrahigh dimensional. The proposed framework enjoys the sure screening property and the rank consistency property in the regime of sufficient variable selection, with which its superiority over existing methods is well-established. We will also demonstrate the advantages of our method through simulation studies and real data applications.

About the speaker

Dr. Chenlu Ke is an Assistant Professor in the Department of Statistical Sciences and Operations Research at Virginia Commonwealth University. Chenlu obtained her PhD in Statistics in 2019 from the University of Kentucky. Her research focuses on developing variable selection and dimension reduction methods for ultrahigh dimensional data as well as their applications in survival analysis.

Let’s Talk About Data Ethics

Wendy Martinez

Director, Mathematical Statistics Research Center, Bureau of Labor Statistics

Date: Friday, February 11, 2022

Abstract

I have had the honor of giving talks on data ethics at various events around the world, including the US, Bangladesh, Hong Kong, and Japan. These talks sparked very interesting conversations about the ethical use of data. These conversations made me realize that statisticians and data scientists must be intentional in our application of ethical guidelines for statistical practice. I also learned that data ethics is something we all need to worry about, regardless of where we work and live in the world. I will begin my presentation by offering my definition of data ethics and will then provide a few real-world examples where ethical concerns arose. I will conclude the discussion by providing examples of data ethics frameworks and efforts from around the world.

About the speaker

Wendy Martinez has been serving as the Director of the Mathematical Statistics Research Center at the Bureau of Labor Statistics (BLS) for over ten years. Before that, she worked in several research positions throughout the Department of Defense. She held the position of Science and Technology Program Officer at the Office of Naval Research, where she established a research portfolio comprised of academia and industry performers developing data science products for the future Navy and Marine Corps. Her areas of interest include computational statistics, exploratory data analysis, and text data mining. She is the lead author of three books on MATLAB and statistics. Dr. Martinez was elected as a Fellow of the American Statistical Association (ASA) in 2006 and is an elected member of the International Statistical Institute. She was honored by the American Statistical Association when she received the ASA Founders Award at the JSM 2017 conference. Wendy is also proud and grateful to have been elected as the 2020 ASA President.

Fall 2022 Seminars

A Distributed Approach for Learning Spatial Heterogeneity

Zhengyuan Zhu, Professor, Department of Statistics

Iowa State University

Date: Friday, December 2, 2022

Abstract

Spatial regression is widely used for modeling the relationship between a spatial dependent variable and explanatory covariates. In many applications there is spatial heterogeneity in such relationships, i.e., the regression coefficients may vary across space. It is a fundamental and challenging problem to detect the systematic variation in the model and determine which locations share common regression coefficients and where the boundary is. In this talk, we introduce a Spatial Heterogeneity Automatic Detection and Estimation (SHADE) procedure for automatically and simultaneously subgrouping and estimating covariate effects for spatial regression models, and present a distributed spanning-tree-based fused-lasso regression (DTFLR) approach to learn spatial heterogeneity in distributed network systems, where the data are locally collected and held by nodes. To solve the problem in parallel, we design a distributed generalized alternating direction method of multipliers algorithm, which has a simple node-based implementation scheme and enjoys a linear convergence rate. Theoretical and numerical results as well as real-world data analysis will be presented to show that our approach outperforms existing works in terms of estimation accuracy, computation speed, and communication costs.

About the speaker

Dr. Zhengyuan Zhu is the College of Liberal Arts and Sciences Dean's Professor, Director of the Center for Survey Statistics Methodology, and Professor of Statistics in the Department of Statistics at Iowa State University. He received his B.S. in Mathematics from Fudan University and Ph.D. in Statistics from the University of Chicago. His research interests include spatial statistics, survey statistics, machine learning, statistical data integration, and applications in environmental science, agriculture, remote sensing, and official statistics. He is a fellow of the American Statistical Association, and an elected member of the International Statistical Institute.

Are Decision Trees as Powerful as Neural Networks?

Jason M. Klusowski, Assistant Professor, Department of Operations Research and Financial Engineering

Princeton University

Abstract

Decision trees and neural networks are conventionally seen as two contrasting approaches to learning. The popular belief is that decision trees compromise accuracy for being easy to use and understand, whereas neural networks are more accurate, but at the cost of being less transparent. In this talk, we challenge the status quo by showing that, under suitable conditions, decision trees that recursively place splits along linear combinations of the covariates achieve modeling power and predictive accuracy similar to those of single-hidden-layer neural networks. Importantly, the analytical framework presented here can accommodate many existing computational tools in the literature, such as those based on randomization, dimensionality reduction, and mixed-integer optimization.

About the speaker

Jason Klusowski is an assistant professor in the department of Operations Research and Financial Engineering (ORFE) at Princeton University. Prior to joining Princeton, he was an assistant professor in the Department of Statistics at Rutgers University, New Brunswick. He received his PhD in Statistics and Data Science from Yale University in 2018. His research interests lie broadly in statistical machine learning, with an emphasis on describing the tensions among interpretability, statistical accuracy, and computational feasibility.

Principal Flow, Sub-Manifold and Boundary

Zhigang Yao, Associate Professor, Department of Statistics and Data Science

National University of Singapore

Date: Friday, November 11, 2022

Abstract

While classical statistics has dealt with observations which are real numbers or elements of a real vector space, nowadays many statistical problems of high interest in the sciences deal with the analysis of data which consist of more complex objects, taking values in spaces which are naturally not (Euclidean) vector spaces but which still feature some geometric structure. I will discuss the problem of finding principal components for multivariate datasets that lie on an embedded nonlinear Riemannian manifold within the higher-dimensional space. The aim is to extend the geometric interpretation of PCA, while being able to capture the non-geodesic form of variation in the data. I will introduce the concept of a principal sub-manifold, a manifold passing through the center of the data, and at any point on the manifold extending in the direction of highest variation in the space spanned by the eigenvectors of the local tangent space PCA. We show that the principal sub-manifold yields the usual principal components in Euclidean space. We illustrate how to find, use and interpret the principal sub-manifold, by which a principal boundary can be further defined for data sets on manifolds.

About the speaker

Zhigang Yao is an Associate Professor in the Department of Statistics and Data Science at the National University of Singapore (NUS). His current research is focused on the interface between statistics and geometry, especially on the manifold fitting problem. Currently he is a member of the Center of Mathematical Sciences and Applications at Harvard University. He also holds a courtesy joint appointment with the Department of Mathematics at NUS. He is a Faculty Affiliate of the Institute of Data Science (IDS) at NUS. He has held several visiting positions, including a Visiting Professorship at EPFL. He received his Ph.D. in Statistics from the University of Pittsburgh in 2011. His thesis advisors were Bill Eddy at Carnegie Mellon and Leon Gleser at the University of Pittsburgh. He was an Assistant Professor at NUS from 2014 to 2020. Before joining NUS, he worked with Victor Panaretos as a postdoctoral researcher at EPFL from 2011 to 2014.

Dynamic Mechanistic Spatio-Temporal Modeling for (Re)Emerging Epidemics

Ali Arab, Associate Professor, Department of Mathematics and Statistics
Georgetown University

Date: Friday, November 4, 2022

Abstract

The dynamics of emerging and reemerging epidemics are complex to understand and thus difficult to model. Moreover, data for rare conditions (over time and space) often include excess zeros, which may result in inefficient inference and ineffective prediction for such processes. This is a common issue in modeling rare or emerging diseases, diseases that are not common in specific areas or specific time periods, or conditions that are hard to detect. A common approach to modeling data with excess zeros is to use zero-modified models (i.e., hurdle and zero-inflated models). Here, we discuss a mechanistic science-based modeling framework to effectively model the dynamics of disease spread based on zero-modified hierarchical modeling approaches. Our proposed method combines ideas from mechanistic physical-statistical modeling and zero-modified modeling to effectively model the early stages of infectious disease pandemics, which is critical in combating the spread of disease. To demonstrate our work, we provide a case study of modeling the spread of Lyme disease based on confirmed cases of the disease in Virginia during the period 2001-2016.

About the speaker

Ali Arab is an Associate Professor of Statistics in the Department of Mathematics and Statistics of Georgetown University. His methodological research is in spatio-temporal and spatial statistics, and hierarchical Bayesian modeling. He is interested in applications of statistics in environmental science, epidemiology of infectious diseases, ecology, and science and human rights. His current research is focused on developing methodological tools for studying problems at the intersection of climate change and social/natural phenomena; in particular, these projects are focused on bird phenology and climate change, climate and conflict driven forced migration, and climate change and vector-borne disease. Ali also serves as one of the American Statistical Association representatives to the American Association for the Advancement of Science (AAAS) Science and Human Rights Coalition.

Environmental Exposures and Public Health

Jenna Krall, Assistant Professor, Department of Global and Community Health

George Mason University

Date: Friday, October 28, 2022

Abstract

Air pollution is associated with increased cardiorespiratory emergency department visits and hospitalizations.  Because air pollution is a chemical mixture of both particles and gases, challenges remain in determining which air pollutants are most harmful.  Furthermore, because the air pollution mixture varies over time and space, identifying exposure settings that are most harmful is critical to protecting health.  This talk will discuss statistical and methodological approaches for estimating exposure to environmental mixtures and their impacts on health.

About the speaker

Jenna R. Krall, PhD is an Assistant Professor in the Department of Global and Community Health at George Mason University in Fairfax, VA.  Her research interests include estimating exposure to environmental mixtures, such as air pollution, and their impacts on human health.  Dr. Krall received her PhD in Biostatistics from Johns Hopkins University and completed a postdoctoral fellowship in Biostatistics at Emory University.

Challenges in Constructing Human-centric Natural Language Interfaces

Ziyu Yao, Assistant Professor, Department of Computer Science

George Mason University

Date: Friday, October 21, 2022

About the speaker

Ziyu Yao is an Assistant Professor in the Department of Computer Science at George Mason University. She received her PhD from the Ohio State University in 2021. Her research interests lie in Natural Language Processing, Artificial Intelligence, and their applications to other disciplines. In particular, she has been focusing on developing natural language interfaces (e.g., question answering systems) that can reliably assist humans in various domains (e.g., Software Engineering and Clinical Informatics). She was awarded the Presidential Fellowship by OSU in 2020 and the Graduate Student Research Award by the OSU CSE department in 2021. Her work in NLP for Clinical Informatics won the Best Paper Award in IEEE BIBM 2021.

Efficient Shape-constrained Inference for the Autocovariance Sequence from a Reversible Markov Chain

Hyebin Song, Assistant Professor, Department of Statistics

The Pennsylvania State University

Date: Friday, October 14, 2022

Abstract

In this talk, I will present a novel shape-constrained estimator of the autocovariance sequence resulting from a reversible Markov chain. A motivating application for studying this problem is the estimation of the asymptotic variance in central limit theorems for Markov chains. Asymptotic variance is a key quantity in quantifying the uncertainty of the sample mean from Markov chain iterates, so accurate estimation of asymptotic variance has both statistical and practical significance. Our approach is based on the key observation that the representability of the autocovariance sequence as a moment sequence imposes certain shape constraints, which we can exploit in the estimation procedure. I will discuss the theoretical properties of the proposed estimator and provide strong consistency guarantees for it. Finally, I will empirically demonstrate the effectiveness of our estimator in comparison with other current state-of-the-art methods for Markov chain Monte Carlo variance estimation, including batch means, spectral variance estimators, and the initial convex sequence estimator.
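As background on the quantity being estimated, here is a minimal sketch of the batch means estimator, one of the comparison methods named above (not the shape-constrained estimator presented in the talk). The function names, the square-root rule for the number of batches, and the AR(1) test chain are illustrative assumptions; NumPy is assumed to be available.

```python
import numpy as np


def batch_means_variance(chain, n_batches=None):
    """Estimate the asymptotic variance of the sample mean of a Markov chain
    using non-overlapping batch means."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    if n_batches is None:
        # Assumed default: roughly sqrt(n) batches of length sqrt(n).
        n_batches = int(np.floor(np.sqrt(n)))
    if n_batches < 2:
        raise ValueError("need at least two batches")
    batch_len = n // n_batches
    if batch_len < 1:
        raise ValueError("chain is too short for the requested number of batches")
    # Drop any leftover iterates so the chain splits evenly into batches.
    batches = x[: n_batches * batch_len].reshape(n_batches, batch_len)
    batch_means = batches.mean(axis=1)
    # Batch length times the sample variance of the batch means.
    return batch_len * batch_means.var(ddof=1)


def simulate_ar1(n, phi, seed=0):
    """Simulate a stationary Gaussian AR(1) chain with unit innovation variance,
    a simple example of a reversible Markov chain."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - phi**2)  # draw the start from the stationary law
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


if __name__ == "__main__":
    chain = simulate_ar1(200_000, phi=0.5)
    # For this AR(1) chain the true asymptotic variance is 1 / (1 - phi)^2 = 4.
    print(batch_means_variance(chain))
```

For the AR(1) chain the true value is known in closed form, which makes it a convenient sanity check for any asymptotic variance estimator.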

About the speaker

Hyebin Song is an Assistant Professor at the Pennsylvania State University. She received her PhD in Statistics from the University of Wisconsin-Madison in 2020, advised by Garvesh Raskutti. Her research interests include high-dimensional statistics and semi-parametric inference, shape-constrained inference, and applications in biomedical research.

Copula-based approaches for analyzing non-Gaussian spatial data

Huixia Judy Wang, Department Chair and Professor, Department of Statistics
George Washington University

Date: Friday, October 7, 2022

Abstract

Many existing methods for analyzing spatial data rely on the Gaussian assumption, which is violated in many applications such as wind speed, precipitation and COVID mortality data. In this talk, I will discuss several recent developments of copula-based approaches for analyzing non-Gaussian spatial data. First, I will introduce a copula-based spatio-temporal model for analyzing spatio-temporal data and a semiparametric estimator. Second, I will present a copula-based multiple indicator kriging model for the analysis of non-Gaussian spatial data by thresholding the spatial observations at a given set of quantile values. The proposed algorithms are computationally simple, since they model the marginal distribution and the spatio-temporal dependence separately. Instead of assuming a parametric distribution, the approaches model the marginal distributions nonparametrically and thus offer more flexibility. The methods also provide convenient ways to construct both point and interval predictions based on the estimated conditional quantiles. I will present some numerical results, including analyses of wind speed and precipitation data. If time allows, I will also discuss recent work on a copula-based approach for analyzing spatial count data.

About the speaker

Huixia Judy Wang is the Chair of the Department of Statistics at George Washington University. She received her PhD in Statistics from the University of Illinois at Urbana-Champaign in 2006. She taught at North Carolina State University for eight years and then moved to George Washington University. She served as a Program Director in the Division of Mathematical Sciences at the National Science Foundation for several years. Her research interests span a wide range of fields, including quantile regression, extreme value theory and applications, bioinformatics and biostatistics, nonparametric and semiparametric methods, as well as regression, survival analysis, longitudinal and spatial data analysis, and missing data.

Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

Tianshu Feng

Assistant Professor, George Mason University

Date: Friday, September 30, 2022

Abstract

When a financial institution declines an application for credit, an adverse action (AA) is said to occur. The applicant is then entitled to an explanation for the negative decision. The talk focuses on credit decisions based on a predictive model for probability of default and proposes a methodology for AA explanation. The problem involves identifying the important predictors responsible for the negative decision and is straightforward when the underlying model is additive. However, it becomes non-trivial even for linear models with interactions. We consider models with low-order interactions and develop a simple and intuitive approach based on first principles. We then show how the methodology generalizes to the well-known Shapley decomposition and the recently proposed concept of Baseline Shapley (B-Shap). Unlike other Shapley techniques in the literature for local interpretability of machine learning results, B-Shap is computationally tractable since it involves just function evaluations. An illustrative case study is used to demonstrate the usefulness of the method.
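To make the idea of attributing a decision through function evaluations concrete, here is a minimal sketch of Baseline Shapley values computed by exhaustive enumeration. It follows the standard Shapley formula with absent features set to a baseline point; the function name `baseline_shapley` and the toy model are hypothetical, and this is not the speaker's implementation. Exhaustive enumeration needs on the order of 2^d function evaluations, so it suits only a small number of predictors d.

```python
from itertools import combinations
from math import factorial


def baseline_shapley(f, x, baseline):
    """Attribute f(x) - f(baseline) to each feature. Features outside a
    coalition are held at their baseline value; only evaluations of f are used."""
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(d)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    """Hypothetical linear model with one two-way interaction."""
    return 1.0 + 2.0 * z[0] + 3.0 * z[1] + 4.0 * z[0] * z[1]


if __name__ == "__main__":
    print(baseline_shapley(toy_model, x=[1.0, 1.0], baseline=[0.0, 0.0]))
    # -> [4.0, 5.0]
```

In the toy model the interaction term contributes 4 to the difference from the baseline, and the decomposition splits it equally between the two features, so the attributions sum to f(x) - f(baseline) = 9.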

About the speaker

Tianshu Feng's work is data-driven and centers on the systematic approach to processing, visualizing, analyzing, modeling, and examining data with complex features. This involves developing and applying novel, flexible, and reliable models via interdisciplinary collaborations in various areas, such as transportation, bioinformatics, healthcare, and finance. His research interests include machine learning and statistical modeling, explainable AI, model fairness and robustness, data exploration, and active learning. Prior to joining Mason, Tianshu was a Quantitative Analytics Specialist at Wells Fargo. He received his PhD degree in Industrial Engineering from the University of Washington and his Bachelor's degree in Statistics from the University of Science and Technology of China.

Scalable Bayesian p-generalized Probit and Logistic Regression Via Coresets

Katja Ickstadt
Professor and Department Chair, Department of Statistics, Technical University of Dortmund

Date: Friday, September 23, 2022

Abstract

In this talk, we consider data reduction techniques like sketching and coresets that retain the statistical information with only little distortion, quantified by theoretical bounds. Our approaches address resource restrictions like memory access, communication cost, and runtime. Coresets are small, possibly weighted data sets designed to approximate an input data set with respect to a computational problem. Often, they are subsets of the input data obtained via sampling techniques. Here, we study coresets for generalized linear models, in particular for binary outcomes. We will present coreset approaches for logistic regression as well as for p-generalized probit regression, the latter also in a Bayesian framework. The resulting reduced data sets have better scaling properties and allow for efficient computations via the established (classic) algorithms.

About the speaker

Katja Ickstadt (Faculty of Statistics, TU Dortmund University, Dortmund, Germany) studied mathematics with a focus on technology at the Technical University of Darmstadt, Germany, where she received her doctorate in mathematics in 1994. Before her habilitation in mathematics at the Technical University of Darmstadt in 2001, she spent several years abroad, with research and teaching at the University of Basel, Switzerland, Duke University, North Carolina, USA, and the University of North Carolina in Chapel Hill, USA. In her research, Katja Ickstadt focuses on regression methods for very large, high-dimensional data, spatial and spatio-temporal models for biological and epidemiological problems, and the analysis of Gaussian process models. In particular, Bayesian methods are in the foreground. She is involved in the German region of the International Biometric Society and is Co-editor of Biometrics.

Big Spatial Data Learning: A Parallel Solution

Shan Yu
Assistant Professor, Department of Statistics, University of Virginia

Date: Friday, September 09, 2022

Abstract

We are living in the era of “Big Data.” A significant portion of big data is big spatial data captured through advanced technologies or large-scale simulations. Explosive growth in spatial and spatiotemporal data emphasizes the need for developing new and computationally efficient methods and credible theoretical support tailored for analyzing such large-scale data. Parallel statistical computing has proved to be a handy tool when dealing with big data. However, it is hard to execute conventional spline regressions in parallel. In this talk, I will present a novel parallel smoothing technique for generalized partially linear spatially varying coefficient models, which can be used under different hardware parallelism levels. Moreover, combined with concurrent computing, the proposed method can be easily extended to distributed systems. The newly developed method is evaluated through several simulation studies and an analysis of the US Loan Application Data.

About the speaker

Dr. Shan Yu joined the Department of Statistics at the University of Virginia as an Assistant Professor in August 2020 after receiving her Ph.D. from Iowa State University. Her research interests focus on advanced statistical methods for complex-structured data, statistical machine learning, and "big data" analytics. Specifically, she has been engaged in projects utilizing non-/semi-parametric regression methods, spatial/spatiotemporal data analysis, biomedical imaging analysis, statistical epidemiology, and trajectory data analysis.