Repeated cross-sectional single-cell time series data confound several sources of variation: measurement noise, stochastic cell-to-cell variation and cells progressing at different rates. Time series from single-cell assays are particularly susceptible to this confounding because the measurements are not averaged over populations of cells. When several genes are assayed in parallel, these effects can be estimated and corrected for under certain smoothness assumptions on cell progression.
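The confounding described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions and not the model used by the authors: the sinusoidal expression profile, the Gaussian offsets in progression, the parameter values and the function names (simulate, within_time_variance) are all hypothetical choices made for illustration.

```python
import math
import random


def simulate(capture_times, cells_per_time, tau_sd, noise_sd, seed=0):
    """Return (capture_time, expression) pairs for one hypothetical gene."""
    rng = random.Random(seed)
    data = []
    for t in capture_times:
        for _ in range(cells_per_time):
            # Assumption: each cell has progressed slightly more or less
            # than its recorded capture time suggests.
            true_time = t + rng.gauss(0.0, tau_sd)
            # Assumption: a smooth (sinusoidal) expression profile plus
            # independent Gaussian measurement noise.
            expression = math.sin(true_time) + rng.gauss(0.0, noise_sd)
            data.append((t, expression))
    return data


def within_time_variance(data):
    """Average sample variance of expression among cells sharing a capture time."""
    groups = {}
    for t, y in data:
        groups.setdefault(t, []).append(y)
    variances = []
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        variances.append(sum((y - mean) ** 2 for y in ys) / (len(ys) - 1))
    return sum(variances) / len(variances)


capture_times = [0.0, 1.0, 2.0, 3.0]
# Same measurement noise in both cases; only the spread in progression differs.
synchronised = simulate(capture_times, 500, tau_sd=0.0, noise_sd=0.1)
asynchronous = simulate(capture_times, 500, tau_sd=0.5, noise_sd=0.1)

var_sync = within_time_variance(synchronised)
var_async = within_time_variance(asynchronous)
print("within-capture-time variance, synchronised cells:", var_sync)
print("within-capture-time variance, asynchronous cells:", var_async)
```

In this sketch the measurement noise is identical in both runs, yet the cells that progress at different rates show a larger spread at each capture time. An analysis that treats capture time as the true time would attribute that extra spread to noise or cell-to-cell variation.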
We present a principled probabilistic model with a Bayesian inference scheme to analyse such data. We demonstrate our method's utility on public microarray, nCounter and RNA-seq data sets from three organisms. Our method almost perfectly recovers withheld capture times in an Arabidopsis data set, accurately estimates cell cycle peak times in a human prostate cancer cell line and correctly identifies two precocious cells in a study of paracrine signalling in mouse dendritic cells. Furthermore, our method compares favourably with Monocle, a state-of-the-art technique. We also show, using held-out data, that uncertainty in the temporal dimension is a common confounder and should be accounted for in analyses of repeated cross-sectional time series.
Our method is available on CRAN in the DeLorean package.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Bioinformatics (Oxford, England). 2016 Jun 17 [Epub ahead of print]
John E Reid, Lorenz Wernisch
Biostatistics Unit, MRC, Cambridge, CB2 0SR, United Kingdom.