%0 Journal Article
%J Informatics
%D 2018
%T Using Introspection to Collect Provenance in R
%A Barbara Lerner
%A Emery Boose
%A Luis Perez
%K data provenance
%K R
%K RDataTracker
%X Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R's powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
%B Informatics
%V 5
%G eng
%U http://www.mdpi.com/2227-9709/5/1/12/htm
%N 12

%0 Conference Paper
%B Workshop on the Theory and Practice of Provenance (TaPP’18)
%D 2018
%T Provenance-based Intrusion Detection: Opportunities and Challenges
%A Xueyuan Han
%A Thomas Pasquier
%A Seltzer, Margo
%B Workshop on the Theory and Practice of Provenance (TaPP’18)
%I USENIX
%G eng

%0 Conference Paper
%B Conference on Computer and Communications Security (CCS'18)
%D 2018
%T Runtime Analysis of Whole-System Provenance
%A Thomas Pasquier
%A Xueyuan Han
%A Moyer, Thomas
%A Bates, Adam
%A Hermant, Olivier
%A Eyers, David
%A Bacon, Jean
%A Seltzer, Margo
%B Conference on Computer and Communications Security (CCS'18)
%I ACM
%G eng

%0 Journal Article
%J IEEE Computing in Science and Engineering (CiSE)
%D 2018
%T Sharing and Preserving Computational Analyses for Posterity with encapsulator
%A Thomas Pasquier
%A Matthew Lau
%A Xueyuan Han
%A Elizabeth Fong
%A Barbara Lerner
%A Emery Boose
%A Merce Crosas
%A Aaron Ellison
%A Seltzer, Margo
%B IEEE Computing in Science and Engineering (CiSE)
%I IEEE
%G eng

%0 Journal Article
%J Springer Personal and Ubiquitous Computing
%D 2017
%T Data provenance to audit compliance with privacy policy in the Internet of Things
%A Thomas Pasquier
%A Jatinder Singh
%A Julia Powles
%A Eyers, David
%A Seltzer, Margo
%A Bacon, Jean
%B Springer Personal and Ubiquitous Computing
%I Springer
%G eng
%U https://link.springer.com/epdf/10.1007/s00779-017-1067-4?author_access_token=SodeflNjmza43xrdBnb0Rfe4RwlQNchNByi7wbcMAY42eQQGCyJLzFVf9criL4waDvy0TWCA7mDdrcsBlffflkDzHERaEGboYD7ss5BbmwTjz0ZZg1CvzGksodvWzqnRBMz2mMhdcz28clv5y0FM2Q%3D%3D

%0 Conference Paper
%B Workshop on Hot Topics in Cloud Computing (HotCloud'17)
%D 2017
%T FRAPpuccino: Fault-detection through Runtime Analysis of Provenance
%A Xueyuan Han
%A Thomas Pasquier
%A Tanvi Ranjan
%A Mark Goldstein
%A Seltzer, Margo
%B Workshop on Hot Topics in Cloud Computing (HotCloud'17)
%I USENIX
%G eng

%0 Journal Article
%J Nature Scientific Data
%D 2017
%T If these data could talk
%A Thomas Pasquier
%A Lau, Matthew K.
%A Trisovic, Ana
%A Boose, Emery R.
%A Couturier, Ben
%A Mercè Crosas
%A Ellison, Aaron M.
%A Gibson, Valerie
%A Jones, Chris R.
%A Seltzer, Margo
%B Nature Scientific Data
%V 4
%G eng
%U https://www.nature.com/articles/sdata2017114

%0 Conference Paper
%B Symposium on Cloud Computing (SoCC’17)
%D 2017
%T Practical Whole-System Provenance Capture
%A Thomas Pasquier
%A Xueyuan Han
%A Mark Goldstein
%A Moyer, Thomas
%A Eyers, David
%A Seltzer, Margo
%A Bacon, Jean
%B Symposium on Cloud Computing (SoCC’17)
%I ACM
%G eng

%0 Book Section
%B Stepping in the Same River Twice: Replication in Biological Research
%D 2017
%T Replication of Data Analyses: Provenance in R
%A E. R. Boose
%A B. S. Lerner
%E A. Shavit
%E A. M. Ellison
%B Stepping in the Same River Twice: Replication in Biological Research
%I Yale University Press
%G eng
%& Replication of Data Analyses: Provenance in R

%0 Conference Paper
%B Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP)
%D 2015
%T Recent advances in computer architecture: the opportunities and challenges for provenance
%A Balakrishnan, Nikilesh
%A Bytheway, Thomas
%A Carata, Lucian
%A Chick, Oliver RA
%A Snee, James
%A Akoush, Sherif
%A Sohan, Ripduman
%A Seltzer, Margo
%A Hopper, Andy
%B Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP)
%G eng

%0 Journal Article
%J Communications of the ACM
%D 2014
%T A primer on provenance
%A Carata, Lucian
%A Akoush, Sherif
%A Balakrishnan, Nikilesh
%A Bytheway, Thomas
%A Sohan, Ripduman
%A Seltzer, Margo
%A Hopper, Andy
%B Communications of the ACM
%I ACM
%V 57
%P 52–60
%G eng
%N 5

%0 Journal Article
%J IEEE Transactions on Visualization and Computer Graphics
%D 2013
%T Evaluation of filesystem provenance visualization tools
%A Michelle A. Borkin
%A Chelsea S. Yeh
%A Madelaine Boyd
%A Peter Macko
%A Krzysztof Z. Gajos
%A Seltzer, Margo
%A Hanspeter Pfister
%B IEEE Transactions on Visualization and Computer Graphics
%I IEEE
%V 19
%P 2476–2485
%G eng
%N 12

%0 Conference Paper
%B Proceedings of the 22nd ACM international conference on Information & Knowledge Management
%D 2013
%T Local clustering in provenance graphs
%A Peter Macko
%A Daniel Margo
%A Seltzer, Margo
%B Proceedings of the 22nd ACM international conference on Information & Knowledge Management
%I ACM
%P 835–840
%G eng

%0 Conference Paper
%B TaPP
%D 2011
%T Collecting Provenance via the Xen Hypervisor.
%A Peter Macko
%A Chiarini, Marc
%A Seltzer, Margo
%A Harvard, SEAS
%B TaPP
%G eng

%0 Conference Paper
%B TaPP
%D 2011
%T Provenance Integration Requires Reconciliation.
%A Angelino, Elaine
%A Uri Braun
%A Holland, David A
%A Margo, Daniel W
%B TaPP
%G eng

%0 Conference Paper
%B TaPP
%D 2011
%T Provenance map orbiter: Interactive exploration of large provenance graphs.
%A Peter Macko
%A Seltzer, Margo
%B TaPP
%G eng

%0 Journal Article
%J ACM SIGOPS Operating Systems Review
%D 2010
%T Provenance as first class cloud data
%A Kiran-Kumar Muniswamy-Reddy
%A Seltzer, Margo
%B ACM SIGOPS Operating Systems Review
%I ACM
%V 43
%P 11–16
%G eng
%N 4

%0 Conference Paper
%B FAST
%D 2010
%T Provenance for the Cloud.
%A Kiran-Kumar Muniswamy-Reddy
%A Peter Macko
%A Seltzer, Margo I
%B FAST
%V 10
%P 15–14
%G eng

%0 Journal Article
%J Ecology
%D 2010
%T Repeatability and transparency in ecological research
%A Ellison, Aaron M
%B Ecology
%V 91
%P 2536–2539
%G eng
%U http://harvardforest.fas.harvard.edu/sites/harvardforest.fas.harvard.edu/files/ellison-pubs/2010/ellison_2010_ecology.pdf
%N 9

%0 Conference Paper
%B TaPP
%D 2010
%T Towards Query Interoperability: PASSing PLUS.
%A Uri Braun
%A Seltzer, Margo I
%A Chapman, Adriane
%A Blaustein, Barbara T
%A Allen, M David
%A Seligman, Len
%B TaPP
%P 1–10
%G eng

%0 Conference Paper
%B Workshop on the Theory and Practice of Provenance
%D 2009
%T The Case for Browser Provenance.
%A Margo, Daniel W
%A Seltzer, Margo I
%B Workshop on the Theory and Practice of Provenance
%G eng

%0 Conference Paper
%B USENIX Annual technical conference
%D 2009
%T Layering in Provenance Systems.
%A Kiran-Kumar Muniswamy-Reddy
%A Uri Braun
%A Holland, David A
%A Peter Macko
%A MacLean, Diana L
%A Margo, Daniel W
%A Seltzer, Margo I
%A Robin Smogor
%B USENIX Annual technical conference
%G eng

%0 Conference Paper
%B Workshop on the Theory and Practice of Provenance
%D 2009
%T Making a Cloud Provenance-Aware.
%A Kiran-Kumar Muniswamy-Reddy
%A Peter Macko
%A Seltzer, Margo I
%B Workshop on the Theory and Practice of Provenance
%G eng

%0 Conference Paper
%B Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications
%D 2009
%T Provenance: a future history
%A Cheney, James
%A Stephen Chong
%A Foster, Nate
%A Seltzer, Margo
%A Vansummeren, Stijn
%B Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications
%I ACM
%P 957–964
%G eng

%0 Journal Article
%D 2008
%T Choosing a data model and query language for provenance
%A Holland, David A
%A Braun, Uri Jacob
%A Diana Maclean
%A Kiran-Kumar Muniswamy-Reddy
%A Seltzer, Margo I
%I Springer
%G eng

%0 Journal Article
%J Concurrency and Computation: Practice and Experience
%D 2008
%T PASSing the provenance challenge
%A Holland, David A
%A Seltzer, Margo I
%A Uri Braun
%A Kiran-Kumar Muniswamy-Reddy
%B Concurrency and Computation: Practice and Experience
%I Wiley Online Library
%V 20
%P 531–540
%G eng
%N 5

%0 Conference Paper
%B HotSec
%D 2008
%T Securing Provenance.
%A Uri Braun
%A Avraham Shinnar
%A Seltzer, Margo I
%B HotSec
%G eng

%0 Journal Article
%J Ecological Informatics
%D 2007
%T Ensuring reliable datasets for environmental models and forecasts
%A Boose, Emery R.
%A Ellison, Aaron M.
%A Osterweil, Leon J.
%A Clarke, Lori A.
%A Podorozhny, Rodion
%A Hadley, Julian L.
%A Wise, Alexander
%A Foster, David R.
%K Analytic web
%K Little-JIL
%K Metadata
%K Process
%K Sensor network
%K Water flux
%B Ecological Informatics
%V 2
%P 237–247
%@ 1574-9541
%G eng
%R 10.1016/j.ecoinf.2007.07.006

%0 Conference Paper
%B International Provenance and Annotation Workshop
%D 2006
%T Issues in automatic provenance collection
%A Uri Braun
%A Simson Garfinkel
%A Holland, David A
%A Kiran-Kumar Muniswamy-Reddy
%A Seltzer, Margo I
%B International Provenance and Annotation Workshop
%I Springer
%P 171–183
%G eng

%0 Conference Paper
%B USENIX Annual Technical Conference, General Track
%D 2006
%T Provenance-aware storage systems.
%A Kiran-Kumar Muniswamy-Reddy
%A Holland, David A
%A Uri Braun
%A Seltzer, Margo I
%B USENIX Annual Technical Conference, General Track
%P 43–56
%G eng

%0 Conference Paper
%B Data Engineering Workshops, 2005. 21st International Conference on
%D 2005
%T Provenance-aware sensor data storage
%A Jonathan Ledlie
%A Chaki Ng
%A Holland, David A
%B Data Engineering Workshops, 2005. 21st International Conference on
%I IEEE
%P 1189–1189
%G eng