Barbara Lerner, Emery Boose, and Luis Perez. 2018. “Using Introspection to Collect Provenance in R.” Informatics, 5, 12.
Abstract: Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts. In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R's powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
Xueyuan Han, Thomas Pasquier, and Margo Seltzer. 2018. “Provenance-based Intrusion Detection: Opportunities and Challenges.” In Workshop on the Theory and Practice of Provenance (TaPP’18). USENIX.
Thomas Pasquier, Xueyuan Han, Thomas Moyer, Adam Bates, Olivier Hermant, David Eyers, Jean Bacon, and Margo Seltzer. 2018. “Runtime Analysis of Whole-System Provenance.” In Conference on Computer and Communications Security (CCS'18). ACM.
Thomas Pasquier, Matthew Lau, Xueyuan Han, Elizabeth Fong, Barbara Lerner, Emery Boose, Merce Crosas, Aaron Ellison, and Margo Seltzer. 2018. “Sharing and Preserving Computational Analyses for Posterity with encapsulator.” IEEE Computing in Science and Engineering (CiSE).
Thomas Pasquier, Jatinder Singh, Julia Powles, David Eyers, Margo Seltzer, and Jean Bacon. 2017. “Data provenance to audit compliance with privacy policy in the Internet of Things.” Springer Personal and Ubiquitous Computing.
Xueyuan Han, Thomas Pasquier, Tanvi Ranjan, Mark Goldstein, and Margo Seltzer. 2017. “FRAPpuccino: Fault-detection through Runtime Analysis of Provenance.” In Workshop on Hot Topics in Cloud Computing (HotCloud'17). USENIX.
Thomas Pasquier, Matthew K. Lau, Ana Trisovic, Emery R. Boose, Ben Couturier, Mercè Crosas, Aaron M. Ellison, Valerie Gibson, Chris R. Jones, and Margo Seltzer. 2017. “If these data could talk.” Nature Scientific Data, 4.
Thomas Pasquier, Xueyuan Han, Mark Goldstein, Thomas Moyer, David Eyers, Margo Seltzer, and Jean Bacon. 2017. “Practical Whole-System Provenance Capture.” In Symposium on Cloud Computing (SoCC’17). ACM.
E. R. Boose and B. S. Lerner. 2017. “Replication of Data Analyses: Provenance in R.” In Stepping in the Same River Twice: Replication in Biological Research, edited by A. Shavit and A. M. Ellison. Yale University Press.
Nikilesh Balakrishnan, Thomas Bytheway, Lucian Carata, Oliver RA Chick, James Snee, Sherif Akoush, Ripduman Sohan, Margo Seltzer, and Andy Hopper. 2015. “Recent advances in computer architecture: the opportunities and challenges for provenance.” In Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP).
Lucian Carata, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, and Andy Hopper. 2014. “A primer on provenance.” Communications of the ACM, 57, 5, Pp. 52–60.
Michelle A Borkin, Chelsea S Yeh, Madelaine Boyd, Peter Macko, Krzysztof Z Gajos, Margo Seltzer, and Hanspeter Pfister. 2013. “Evaluation of filesystem provenance visualization tools.” IEEE Transactions on Visualization and Computer Graphics, 19, 12, Pp. 2476–2485.
Peter Macko, Daniel Margo, and Margo Seltzer. 2013. “Local clustering in provenance graphs.” In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, Pp. 835–840. ACM.
Peter Macko, Marc Chiarini, and Margo Seltzer. 2011. “Collecting Provenance via the Xen Hypervisor.” In TaPP.
Elaine Angelino, Uri Braun, David A Holland, and Daniel W Margo. 2011. “Provenance Integration Requires Reconciliation.” In TaPP.
Peter Macko and Margo Seltzer. 2011. “Provenance map orbiter: Interactive exploration of large provenance graphs.” In TaPP.
Kiran-Kumar Muniswamy-Reddy and Margo Seltzer. 2010. “Provenance as first class cloud data.” ACM SIGOPS Operating Systems Review, 43, 4, Pp. 11–16.
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo I Seltzer. 2010. “Provenance for the Cloud.” In FAST, 10: Pp. 15–14.
Aaron M Ellison. 2010. “Repeatability and transparency in ecological research.” Ecology, 91, 9, Pp. 2536–2539.
Uri Braun, Margo I Seltzer, Adriane Chapman, Barbara T Blaustein, David M Allen, and Len Seligman. 2010. “Towards Query Interoperability: PASSing PLUS.” In TaPP, Pp. 1–10.