I am a statistician interested in both statistical methodological research and statistical applications. My recent research interests in statistical methodology include functional data analysis, big data analytics, lifetime data analysis, and ROC curve methodology. I welcome collaborations from all disciplines and have collaborated with researchers from engineering, public health, computer science, and biology.
I joined Virginia Tech in 2006 upon obtaining my PhD from the Department of Statistics at Purdue University. My earlier alma maters are the University of Science and Technology of China and Johns Hopkins University.
Functional data analysis and nonparametric smoothing
Types of data: data of curves/surfaces/shapes, long time series or collections of time series, image data, data generated from nonlinear processes, data of complex objects.
High dimensional data and statistical learning
Types of data: bioinformatics data, data of high dimensions, network data, data for classification or clustering.
Lifetime data analysis and survival models
Types of data: data on failure times or terminal events, censored data, data with limits of detection.
Diagnostic test data and ROC curve methodology
Types of data: data on medical diagnostic tests, industrial testing data.
Feb 2025: Our paper, "An accurate computational approach for partial likelihood using Poisson-binomial distributions", got accepted by Computational Statistics and Data Analysis. The paper introduces a new Poisson-binomial distribution based approach to accurately compute the partial likelihood in the original Cox model that works especially well for tied survival data. The corresponding R package can be downloaded from ExactCoxPBD.
Sep 20, 2024: Zhiyuan Du successfully completed his final defense and will join AbbVie, Inc. as a Senior Research Statistician. Congratulations!
Jun 2024: Our paper, "Reliability study of battery lives: A functional degradation analysis approach", got accepted by Annals of Applied Statistics. The paper introduces the first functional model for degradation analysis of quality control data collected on heterogeneous domains.
Apr 2023: Our paper, "Contrast tests for groups of functional data", got accepted by Canadian Journal of Statistics. The paper introduces a new functional linear contrast test procedure for correlated functional data that can be used after a rejection of the functional ANOVA test.
Feb 2023: Our paper, "Minimax Nonparametric Multi-sample Test under Smoothing", got accepted by Statistica Sinica. The paper proposes a new penalized likelihood ratio test for multi-sample testing that is minimax optimal.
Aug 3, 2022: Quyen Do successfully completed her final defense and will join Corning, Inc. as a Statistical Engineer. Congratulations!
May 2022: Our paper, "Shape Constrained Kernel PDF and PMF Estimation", got accepted by Statistica Sinica. The paper extends our estimation method for shape-constrained multivariate regression functions to the estimation of shape constrained PDFs and PMFs.
Selected Publications
Cho, Y., Hong, Y., and Du, P. (2025).
An accurate computational approach for partial likelihood using Poisson-binomial distributions. Computational Statistics and Data Analysis, to appear. (The R package ExactCoxPBD)
Cho, Y., Do, Q., Du, P., and Hong, Y. (2024).
Reliability study of battery lives: A functional degradation analysis approach. Annals of Applied Statistics 18(4), 3185-3204.
Du, Z., Du, P., and Liu, A. (2024).
Likelihood ratio combination of multiple biomarkers via smoothing spline estimated densities. Statistics in Medicine 43(7), 1372–1383.
Do, Q. and Du, P. (2024).
Contrast tests for groups of functional data. Canadian Journal of Statistics 52(3), 713-733.
Xing, X., Shang, Z., Du, P., Ma, P., Zhong, W., and Liu, J. S. (2023).
Minimax nonparametric multi-sample test under smoothing.
Statistica Sinica, to appear.
Du, P., Parmeter, C. F., and Racine, J. S. (2023).
Shape constrained kernel PDF and PMF estimation.
Statistica Sinica, to appear.
Jin, H., Sun, X., and Du, P. (2023).
Optimal function-on-function regression with interaction between functional predictors.
Statistica Sinica 33(2), 1047-1068.
Xu, Y., Du, P., Senger, R., Robertson, J., and Pirkle, J. (2021).
ISREA: An efficient peak-preserving baseline correction algorithm for Raman spectra.
Applied Spectroscopy 75(1), 34-45.
Gao, Z., Du, P., Jin, R., and Robertson, J. L. (2020).
Surface temperature monitoring in liver procurement via functional variance change point analysis,
Annals of Applied Statistics 14(1), 143-159.
Gao, Z., Shang, Z., Du, P., and Robertson, J. L. (2019).
Variance change point detection under a smoothly-changing mean trend with application to liver procurement,
Journal of the American Statistical Association 114(526), 773-781.
Du, P., Sun, Z., Chen, H., Cho, J.-H., and Xu, S. (2018).
Statistical estimation of malware detection metrics in the absence of ground truth,
IEEE Transactions on Information Forensics and Security 13(12), 2965-2980.
Chen, T. and Du, P. (2018).
Promotion time cure rate model with nonparametric form of covariate effects,
Statistics in Medicine 37(10), 1625-1635.
Sun, X., Du, P., Wang, X., and Ma, P. (2018).
Optimal penalized function-on-function regression under a reproducing kernel Hilbert space framework,
Journal of the American Statistical Association 113(524), 1601-1611.
Lian, H., Du, P., Li, Y., and Liang, H. (2014).
Partially linear structure identification in generalized additive models with NP-dimensionality,
Computational Statistics and Data Analysis 80, 197-208.
Chen, Y., Du, P., and Wang, Y. (2014).
Variable selection in linear models,
WIREs Computational Statistics 6, 1-9.
(a "hopefully" comprehensive review of variable selection for linear regression models)
Du, P. and Wang, X. (2014).
Penalized likelihood functional regression,
Statistica Sinica 24(2), 1017-1041. (program available at the journal site)
Wang, X., Du, P., and Shen, J. (2013).
Smoothing splines with varying smoothing parameter,
Biometrika 100(4), 955-970. (send us an email for the program used in the paper)
Du, P., Parmeter, C. F., and Racine, J. S. (2013).
Nonparametric kernel regression with multiple predictors and multiple shape constraints,
Statistica Sinica 23(3), 1347-1371.
Liang, H. and Du, P. (2012).
Maximum likelihood estimation in logistic regression models with a diverging number of covariates,
Electronic Journal of Statistics 6, 1838-1846.
Du, P., Cheng, G., and Liang, H. (2012).
Semiparametric regression models with additive nonparametric components and high dimensional parametric components,
Computational Statistics and Data Analysis 56(6), 2006-2017.
Wang, L., Du, P., and Liang, H. (2012).
Two-component mixture cure rate model with spline estimated nonparametric components,
Biometrics 68, 726-735.
Ma, S. and Du, P. (2012).
Variable selection in partly linear regression model with diverging dimensions for right censored data,
Statistica Sinica 22, 1003-1020.
Du, P., Jiang, Y., and Wang, Y. (2011).
Smoothing spline ANOVA frailty model for recurrent event data,
Biometrics 67, 1330-1339.
Du, P. and Ma, S. (2010).
Frailty model with spline estimated nonparametric hazard function,
Statistica Sinica 20(2), 561-581.
Du, P., Ma, S., and Liang, H. (2010).
Penalized variable selection procedure for Cox models with semiparametric relative risk,
Annals of Statistics 38, 2092-2117.
Du, P. and Tang, L. (2009).
Transformation-invariant and nonparametric monotone smooth estimation of ROC curves,
Statistics in Medicine 28, 349-359.