Seminar 2021-09-17

Supervised Network Centrality Estimation and Prediction

Dr. Linda Zhao, Professor of Statistics and Data Science, The Wharton School, University of Pennsylvania

Date: Friday, September 17, 2021
Time: 11am–noon
Live-stream: The virtual talk will be live-streamed via Zoom. For connection information, please contact


Directed networks play a ubiquitous and crucial role in our lives and have implications for individuals’ behavior and outcomes. A node’s position in the network, usually captured by its centrality, is an important intermediary of network effects and is often incorporated into regression models to elucidate the effect of the network on an outcome variable of interest. In empirical studies, researchers often adopt a two-stage procedure to evaluate the network effect: first estimate the centrality from the observed network, and then use the estimated centrality in a regression. Despite the prevalence of this two-stage procedure, it fails to account for observational errors in the observed network and lacks valid inference. We first propose a unified inferential framework that combines the network error model and the regression-on-centrality model, under which we prove the shortcomings of the two-stage procedure in estimating the centrality and demonstrate the consequent undesirable effects on the outcome regression. We then propose a novel supervised network centrality estimation and prediction (SuperCENT) methodology that simultaneously combines the information from the two essential models. The proposed method always provides superior estimates of the centrality and of the true underlying network compared with the two-stage procedure, and it produces better network effect estimation and more accurate outcome prediction when the observational error of the network is severe. We further derive the distributions of the centrality and network effect estimators for both SuperCENT and the two-stage procedure, which can be used to construct valid confidence intervals. Our model is applied to predict the currency risk premium based on the centrality of the global trade network. We show that a trading strategy based on centralities estimated by SuperCENT yields returns three times as high as those of the two-stage method.
Joint work with Cai, J., Yang, D., Zhu, W. and Shen, H.
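For readers less familiar with the baseline, the following is a minimal sketch of the two-stage procedure the abstract critiques. It is not SuperCENT. It assumes, for illustration only, that the observed directed network is a rank-one signal plus noise, A = d * u v^T + E, with hub and authority centralities taken as the leading left and right singular vectors of A. The functions `leading_singular_vectors`, `ols`, `two_stage`, and `simulate`, and all of the simulated data and effect sizes, are hypothetical and use only the Python standard library.

```python
import math
import random


def matvec(A, x):
    # A x for a dense matrix stored as a list of rows
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def tmatvec(A, x):
    # A^T x: entry j is sum_i A[i][j] * x[i]
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def normalize(x):
    s = math.sqrt(sum(t * t for t in x))
    return [t / s for t in x]


def leading_singular_vectors(A, iters=200):
    """Power iteration for the top left (hub) and right (authority) singular vectors of A."""
    v = normalize([1.0] * len(A[0]))
    for _ in range(iters):
        u = normalize(matvec(A, v))
        v = normalize(tmatvec(A, u))
    if sum(u) < 0:  # singular vectors are defined up to sign; flip both together
        u, v = [-t for t in u], [-t for t in v]
    return u, v


def ols(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination with pivoting."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    M = [XtX[i] + [Xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (M[c][p] - sum(M[c][k] * beta[k] for k in range(c + 1, p))) / M[c][c]
    return beta


def two_stage(A, y):
    # Stage 1: estimate centralities from the noisy network alone.
    u_hat, v_hat = leading_singular_vectors(A)
    # Stage 2: regress the outcome on the estimates as if they were exact.
    X = [[1.0, ui, vi] for ui, vi in zip(u_hat, v_hat)]
    return ols(X, y), u_hat, v_hat


def simulate(seed=0, n=40, d=20.0, sigma_a=0.2, sigma_y=0.01):
    """Hypothetical data: A = d * u v^T + Gaussian noise, y = b0 + b_u * u + b_v * v + noise."""
    rng = random.Random(seed)
    u = normalize([rng.uniform(0.5, 1.5) for _ in range(n)])
    v = normalize([rng.uniform(0.5, 1.5) for _ in range(n)])
    A = [[d * u[i] * v[j] + rng.gauss(0.0, sigma_a) for j in range(n)] for i in range(n)]
    b0, b_u, b_v = 1.0, 2.0, -1.0  # made-up effects, for illustration only
    y = [b0 + b_u * u[i] + b_v * v[i] + rng.gauss(0.0, sigma_y) for i in range(n)]
    return A, y, u, v


if __name__ == "__main__":
    A, y, u, v = simulate()
    beta_hat, u_hat, v_hat = two_stage(A, y)
    print("two-stage coefficients (intercept, hub, authority):", beta_hat)
```

Because `u_hat` and `v_hat` carry the estimation error from the noisy network into the second stage, the regression coefficients inherit that error; this is the issue the abstract raises. SuperCENT, as described above, instead combines the network and outcome models, and that joint estimator is not reproduced here.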

About the speaker

Linda Zhao is a full professor of statistics in the Wharton School. She received her Ph.D. from Cornell in 1993 and joined the University of Pennsylvania in 1994. She is a Fellow of the IMS. Her research interests include modern machine learning methods, replicability in science, network and high-dimensional data, housing price prediction, and Bayesian methods. Current projects include the equity ownership network and its relationship to firm performance and innovation activities; identifying signals from noisy data using nonparametric Bayesian schemes; and model-free data analysis. Her work has been supported by the NSF for over 20 years. For the past five years, she has been developing and teaching a modern data mining course for undergraduate, MBA, master's, and Ph.D. students across the entire Penn campus. Students describe it as one of the most fun and useful courses offered at Penn. She is also an avid ballroom dancer and loves to travel around the world.