Dataverse Meeting Agenda

July 11th: Dataverse Community Day 1

Room: Armenise Amphitheater, Harvard Medical School

Keynote & Update on Dataverse Recording

8:30 - 9am Registration and Coffee

9:00 - 9:45am Welcome & Keynote 

Welcome from Mercè Crosas, Chief Data Science and Technology Officer, IQSS 

Keynote: Sarah Thomas, Vice President for the Harvard Library and University Librarian (Presentation Slides)

10:00 - 10:30am Update on Dataverse Part 1 

Mercè Crosas, Chief Data Science & Technology Officer, IQSS (Presentation Slides)

Sonia Barbosa, Manager of Data Curation & Manager of the Murray Research Archive, IQSS 

Elizabeth Quigley, User Experience Lead, IQSS 

(Presentation Slides)

10:30 - 10:45am Coffee Break with snacks

10:45am - 12:15pm Dataverse Repositories Around the World Part 1

Moderator: Sonia Barbosa

Piotr Sliz, Associate Professor, Harvard Medical School (Presentation Slides)

Sherry Lake, Scholarly Repository Librarian, University of Virginia (Presentation Slides)

Ryan Steans, Assistant Director, Texas Digital Library (Presentation Slides)

Sophia Lafferty-Hess, Research Data Manager, Odum Institute Data Archive (Presentation Slides)

Miguel Angel Mardero, Coordinator, CARINIANA Network (Presentation Slides)

Sebastian Karcher, Associate Director, QDR

Nic Weber, Associate Technical Director, QDR

(Presentation Slides)

12:15 - 1:30pm Lunch

1:30 - 2:30pm Using Dataverse Part 1 (for social sciences and biomedical sciences)

Moderator: Simo Goshev

Richard Ball, Associate Professor of Economics, Haverford College & Project TIER (Presentation Slides)

Dan O'Brien, Research Director, Boston Area Research Initiative (BARI) & Assistant Professor, Northeastern University (Presentation Slides)

Andres Colubri, Researcher, Sabeti Lab, Broad Institute (Presentation Slides)

Caroline Shamu, Assistant Professor, Harvard Medical School  (Presentation Slides)

2:30 - 3pm Coffee Break

3:00 - 4:30pm Using Dataverse Part 2 (by schools and colleges, research groups and journals)

Moderator: Sophia Lafferty-Hess

Emily Gustainis, Deputy Director, Countway Medical Library at Harvard Medical School  (Presentation Slides)

William G. Jacoby, Professor in the Department of Political Science at Michigan State University & Editor of American Journal of Political Science (Presentation Slides)

Alvaro Lima, Founder, Digaai (Live Demo)

David Ruvolo, Managing Editor, World Historical (Presentation Slides)

Dana Sievers, Research Assistant, PSI (Presentation Slides)

Barbara Mento, Data/GIS Librarian, Boston College (Presentation Slides)

Garth Griffin, Data Scientist, Collaboration with Radcliffe Academic Ventures (Presentation Slides)

Sonia Barbosa, Manager of Data Curation & Manager of the Murray Research Archive, IQSS (Presentation Slides)

Alex Caracuzzo, Research Data & Collections Librarian, Baker Library, Harvard Business School (Presentation Slides)

4:30 - 5pm Summary of First Day

Mercè Crosas, Chief Data Science & Technology Officer, IQSS

5:30pm Reception and Social Event at IQSS, with Poster Session

Room: Cafeteria on the first floor, 1737 Cambridge Street, Cambridge, MA (transportation provided if you have signed up for it)

Welcome remarks from IQSS


July 12th: Dataverse Community Day 2

Room: Armenise Amphitheater, Harvard Medical School

Keynote & Update on Dataverse Recording

8:30 - 9:00am Registration and Coffee 

9:00 - 9:45am Keynote

Trisha Cruse, Executive Director, DataCite (Presentation Slides)

10:00 - 10:30am Update on Dataverse Part 2 

Len Wisniewski, Director of Engineering, IQSS

Gustavo Durand, Technical Lead, The Dataverse Project

Philip Durbin, Software Developer, The Dataverse Project

Steve Kraffmiller, Software Developer, The Dataverse Project

(Presentation Slides)

10:30 - 10:45am Coffee Break

10:45am - 12:15pm Dataverse Repositories Around the World Part 2

Moderator: Sherry Lake

Alan Darnell, Director of Scholars Portal Services (Presentation Slides)

Jonathan Crabtree, Assistant Director for Archives & Information Technology at Odum Institute (Presentation Slides)

Yin Shenqin, Assistant Director, Fudan University Social Science Data Research Center (Presentation Slides)

David Raila, Senior Research Programmer, National Data Service Labs (Presentation Slides)

Jochen Apel, Senior Technical Officer and Librarian, heiDATA at Heidelberg University (Presentation Slides)

Odu Obiajulu, Senior IT Engineer, UIT Open Data (Presentation Slides)

12:15 - 1:30pm Lunch

1:30 - 2:15pm Tools Connected with Dataverse

Moderator: Bill McKinney

WorldMap and BOP
Ben Lewis, Senior GIS Specialist, Center for Geographic Analysis at IQSS, Harvard University

(Presentation Slides)

Funded by a grant from the Sloan Foundation, the Center for Geographic Analysis (CGA) at Harvard is developing a remotely hosted "big geodata" archive, updated in real time, which will be made available from within Dataverse. This is a prototype for a new Dataverse data type that is hosted outside Dataverse, supports streaming updates, and is accessed via an API. The CGA is developing 1) the software and hardware platform to support interactive exploration of a billion spatio-temporal objects; 2) an API to provide query access to the archive from Dataverse; and 3) client-side tools for querying and visualizing the contents of the archive and extracting data subsets. The initial system will focus on "geo-tweets", which are tweets containing a GPS coordinate from the originating device. Currently 1-2% of tweets are geo-tweets, about 8 million per day. The CGA has been harvesting geo-tweets since 2012.

Vito D'Orazio, Assistant Professor, University of Texas at Dallas 
James Honaker, Senior Research Scientist, Harvard University

(Presentation Slides)

TwoRavens is a tool for statistical analysis that allows users across the range of statistical expertise to explore data and appropriately construct and interpret statistical models. As a gesture-driven, graphical tool, it lets users visually explore data, construct statistical models, and interpret results with minimal training in statistical software. TwoRavens integrates with remote data repositories, including Harvard's Dataverse, and may be used to analyze data stored in the repository without ever downloading a local copy of the data. This is particularly useful in a classroom setting, where an instructor may upload example data to Dataverse, explore and analyze that data using TwoRavens in any classroom with an Internet connection, and have confidence that her students can explore and analyze that same data, using the same tool, from any device that can browse the Web. Analyzing data remotely through a graphical interface is also particularly useful for datasets that contain sensitive information, as many secure datasets in Dataverse currently do. Our project homepage includes user guides, links to our Dataverse, and sample demonstration material.

Scientific data inside scientific articles: Authorea and integration to data repositories
Adyam Ghebre and Josh Nicholson

Tools that scientists use for the preparation of scholarly manuscripts, such as Microsoft Word and LaTeX, function offline, do not account for the digital nature of research data, and are not designed for collaboration. Authorea allows scientists to collaboratively write rich data-driven manuscripts on the web and offers a dynamic, interactive experience with an article's text, images, data, and code - paving the road to increased data sharing, data re-use, research reproducibility and Open Science. In this talk, we will showcase Authorea and its current data deposit offering and discuss plans to integrate with data repository systems.

2:15 - 2:30pm Break and Walk to Breakout Sessions

2:30 - 4:00pm Afternoon Breakout Sessions 

Data Provenance Session with Margo Seltzer (TMEC 426 Conference Room)

Operational Dataverse Session with Kevin Condon and Len Wisniewski (TMEC 443 Conference Room)

Train the Trainer Session with Sonia Barbosa (TMEC 445 Conference Room)

Demos of Tools Integrated with Dataverse (Modell 100A Fred S. Rosen Lecture Hall)

User Testing of Upcoming Dataverse Features (Modell 100A Fred S. Rosen Lecture Hall)

4:15 - 5:30pm Closing Remarks and Reception

July 13th: Developers Day

9:30 - 10:30am Data Management Workflow Integration

Room: TMEC 426 Conference Room, Tosteson Medical Education Center, Harvard Medical School

This session will focus on the integration of Dataverse with Starfish Storage and ResearchSpace E-Notebooks. This integration could apply or extend to other file data management solutions and e-notebooks.

10:30 - 11:00am Coffee Break 

11:00am - 12:30pm Developers Discussion with Dataverse Dev Team

This session will provide a forum for developers to meet with the core Dataverse development team and discuss topics of interest. Possible topics include the Dataverse permissions model, the registration provider and export abstractions, and future plans for dynamic metadata (dataset and file level). If participants are interested, the session will also include an overview of the APIs available to application developers who would like to embed Dataverse programmatically into their workflows and tools.

Gustavo Durand, Technical Lead, The Dataverse Project and Dataverse Project Development Team

12:30 - 1:30pm Lunch with Privacy Workshop