Slides for the presentations are available in an Open Science Framework project.

Videos for the sessions are linked below in the agenda.

Tuesday June 15 - Plenary Sessions

All times Eastern Daylight Time (EDT) (UTC−04:00)


9:45am - 10:00am
Coffee via Zoom

10:00am - 10:10am
Gary King (IQSS, Harvard University)

10:10am - 11:30am
Main Presentation/Keynotes
Moderator: Mercè Crosas (IQSS, Harvard University)
Carole Anne Goble (University of Manchester)
"FAIR workflows and Research Objects get a workout."
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub and RO-Crate Research Objects, that’s how! A step towards FAIR Digital Objects gets a workout.

Daniel S. Katz (National Center for Supercomputing Applications)
"Progress Toward More Sustainable Research Software"
This talk will lay out a vision for research software in research, and will discuss progress towards two elements of this vision. The first is making research software FAIR - findable, accessible, interoperable, and reusable, and the second is making research software citable. In both cases, past, current, and future work is split between definitions and implementation.

Herbert Van de Sompel (DANS)
"FAIR Signposting: A KISS Approach to a Burning Issue"
My involvement, over the years, in a range of interoperability efforts has brought the insight that two factors strongly influence adoption: addressing a burning issue and delivering a KISS solution to tackle it. Undoubtedly, FAIR and FAIR DOs are burning issues. FAIR Signposting is an ad-hoc repository interoperability effort that squarely fits in this problem space and that purposely specifies a KISS solution, hoping to inspire wide adoption.

11:30am - 11:45am

11:45am - 12:15pm
Dataverse Software - New Features and Future Plans
Danny Brooke (IQSS, Harvard University)

12:15pm - 12:45pm
Community Update Videos

12:45pm - 1:00pm
#Dataverse2021: What's Ahead on Wednesday and Thursday
Sonia Barbosa (IQSS, Harvard University)

Tuesday June 15 - Breakout Session

All times Eastern Daylight Time (EDT) (UTC−04:00)

6:00pm - 7:00pm
GDCC/Governance Session 1:
Mercè Crosas (IQSS, Harvard University)
Jonathan Crabtree (Odum Institute UNC Chapel Hill)
The session is open to all existing Global Dataverse Community Consortium (GDCC) members and others interested in becoming a member or learning more about GDCC. It will address the following topics:

  • Financial updates
  • Ongoing projects
  • GDCC website
  • Governance and Advisory Board

Two identical sessions are offered so that all GDCC members have an opportunity to participate. The other session is offered on Wednesday at 8:00am EDT.

Wednesday June 16 - Breakout Sessions

All times Eastern Daylight Time (EDT) (UTC−04:00)

8:00am - 9:00am
GDCC/Governance Session 2:
Mercè Crosas (IQSS, Harvard University)
Jonathan Crabtree (Odum Institute UNC Chapel Hill)
The session is open to all existing Global Dataverse Community Consortium (GDCC) members and others interested in becoming a member or learning more about GDCC. It will address the following topics:

  • Financial updates
  • Ongoing projects
  • GDCC website
  • Governance and Advisory Board

Two identical sessions are offered so that all GDCC members have an opportunity to participate. The other session is offered on Tuesday at 6:00pm EDT.

8:00am - 9:30am
CoreTrustSeal:
Sonia Barbosa (IQSS, Harvard University)
Karine Burger (Portage Network)
Philipp Conzett (UiT The Arctic University of Norway, DataverseNO)
Laura Rezende (Federal University of Goiás, Brazil)
The CoreTrustSeal is the most widely used certification for helping establish trustworthy data repositories. The application process to achieve certification includes a peer-reviewed self-assessment of a repository’s facilities, organization, and policies.

It’s been one year since our last conversation about Dataverse repositories and CoreTrustSeal certification. In this breakout session, we will learn more about the recently published Dataverse Project Guide for CoreTrustSeal and from repository managers about the impact that receiving certification has had or will have upon their Dataverse installations.
Chairs: Ceilyn Boyd, James Doiron, Katie Mika

9:45am - 11:15am
Flexible Metadata/Controlled Vocabularies:
Philipp Conzett (UiT The Arctic University of Norway)
John Graybeal (Stanford)
Jim Myers (Global Dataverse Community Consortium)
Slava Tykhonov (DANS-KNAW/SSHOC Dataverse)
This session is about flexible metadata support in the Dataverse software. Following up on our discussions at the Dataverse Community Meeting in 2020, we will present and discuss ongoing efforts and future ideas for improving the FAIR support of the Dataverse software, focusing on metadata schemas and controlled vocabularies. Important issues will be raised, such as interoperability, compliance with metadata standards, integration with external services, and the maintenance and sustainability of flexible metadata support. The session will consist of three presentations, each followed by Q&A and discussion. We encourage you to add questions beforehand in this Google doc.
Chairs: Marion Wittenberg, Philipp Conzett, Slava Tykhonov

11:30am - 1:00pm
Remote Storage and Large Datasets:
Kate Dreher (CIMMYT): Exploring the Costs of Freely Sharing Data Through Dataverse at CIMMYT
Jesús Herrera de la Cruz (CIMMYT): A Pilot Integration of a Blockchain-Based Storage Solution With Dataverse
Ben Golub (Storj): A New Model for Globally Sustainable, Decentralized Storage
Vas Vasiliadis (Globus): Dataverse and Globus Integration
Jim Myers (GDCC): Going Direct for Larger Data Uploads
Deirdre Kirmis and Matt Harp (Systems & Security, Technology Services, ASU Library): ASU Library Large File Direct Upload Project
This session will cover the need for affordable large data support. Currently, the high bandwidth costs associated with cloud storage prevent many repositories from supporting and sharing large data. This limits the support repositories can provide to research communities that need to share large data, and ultimately limits how repositories can evolve to support big data analysis. How do we make it affordable for repositories to meet this need? How can we complement existing infrastructure and tools with new approaches to achieve these objectives? How can we enhance collaboration between developers and mortals to find solutions?
Chairs: Jesús Herrera de la Cruz, Sonia Barbosa, Amber Leahey

1:15pm - 2:45pm
Terms of Use, Licensing, Installation Policies:
Julian Gautier (IQSS, Harvard University)
Wim Hugo (DANS)
High-quality access to data outputs and published research objects depends on repositories clearly defining terms and licenses for reuse. This session will present an overview of the importance of data licensing, describe some of the recent work in the Dataverse Community for adding software support for multiple license options, and introduce opportunities and implications for machine readability and automation of licensing in the Dataverse software and other repositories. Following short presentations and Q&A, the discussion will be structured around complex licensing scenarios and community input on policies for data access and reuse, including privacy, GDPR, and other related subjects.
Chair: Katie Mika

3:00pm - 4:30pm
Sensitive Data:
Julie Goldman (Harvard Countway Library): Preparing for Sensitive Data Use Throughout the Data Lifecycle
Sebastian Karcher (Qualitative Data Repository): Sensitive Qualitative Data at QDR: Current Processes and Next Steps
Ellen Kraffmiller (IQSS, Harvard University) and Raman Prasad (SEAS, Harvard University): Dataverse and OpenDP
Caroline Wood (Harvard Medical School): Health Policy Research Data
Aakash Sharma, Håvard D. Johansen, and Thomas Bye Nilsen (DataverseNO UiT Department of Computer Science): Compliant Sharing of Sensitive Data with Dataverse and Lohpi
Sensitive data support is becoming an increasingly important need for the research community. During this session, we will hear about the management and support of sensitive data. This will include an introduction to the DPCreator; overviews of the management of sensitive qualitative data and health policy research data; and an update on the Harvard Medical School’s DMP tool and how it can help the community prepare to share sensitive data.
Chairs: Sonia Barbosa, Julian Gautier

4:45pm - 6:15pm
Geospatial Data:
Kevin Worthington (CSU): Geospatial Data Discovery and Exploration at CSU Libraries
Wim Hugo (DANS): Geospatial Data Use Cases
Maura Carbone (Harvard), Marc McGee (Harvard): Harvard Dataverse/Harvard Geospatial Library Integration
Jamie Jamison, Kristian Allen, and Zhiyuan Yao (UCLA): Integration of Dataverse and Geospatial Applications at UCLA
Paul Dante (UBC): Geodisy
Jim Myers (GDCC): Ideas for Supporting Geospatial Previews in Dataverse
This session will highlight current efforts and provide a forum for discussion of geospatial data within the Dataverse software. At present, the Dataverse software’s geospatial data capability consists solely of manually entered geospatial metadata. Researchers could better discover and explore geospatial data if the Dataverse software had improved geospatial capabilities; however, the many different geospatial data formats complicate this effort. Our community is pursuing a broad range of geospatial development and design discussions, and this session aims to further these discussions collectively and develop priorities for next steps.
Chairs: Jim Myers, Kevin Worthington

6:30pm - 8:00pm
Integrations and External Tools:
Steven McEachern (Australian Data Archive)
Lukas Rosario (RE3)
Andreas Oliveira (RE3)
Nick Lilovich (The Collaboratory)
The External Tools Framework was originally designed to connect the Dataverse Software to tools that preview, explore, and curate datasets. More recently, we've been examining additional ways to link to interesting projects and enhance what is possible within the Dataverse Ecosystem. We'll start with a brief intro discussing the history and philosophy behind the framework, then view some demos on these new and innovative projects: a tool to upload from Globus, a tool that can tie into a Request Access Workflow, a project that highlights datasets and papers that are conceptually similar to one another, and a tool for code readability and reproducibility. Finally, we'll have some time for discussion on these developments and other ways we can continue to expand the framework.
Chairs: Gustavo Durand, Leonid Andreev

Thursday June 17 - Breakout Sessions

All times Eastern Daylight Time (EDT) (UTC−04:00)

8:00am - 9:30am
Software Metadata and Containerization:
Arfon Smith (GitHub): Five Ways You Can Use GitHub to Automate Scholarly Work
Stian Soiland-Reyes (University of Manchester): Capturing Just Enough Data, Software and Metadata With RO-Crate
Martin Fenner (DataCite): Standardize Software Metadata Using CodeMeta
Support for computational artifacts such as software, workflows, and containers is needed in the Dataverse software in order to meet the demands of advanced data analysis and enable research reproducibility and reuse. This session will feature invited presentations on CodeMeta software metadata, Research Object and RO-Crate metadata, Git integration, and adequate documentation and guidelines. We aim to identify practical considerations and implementation steps on the topic for the Dataverse software platform. Following the presentations and Q&A, we will have a discussion on shaping Dataverse software support for CodeMeta, RO-Crate, and Git integration.
Chairs: Oliver Bertuch, Dorothea Iglezakis, Ana Trisovic

9:45am - 11:15am
Ceilyn Boyd and Tricia Patterson (Harvard Library): Preservation Pathways for Research Data at Harvard Library
Grant Hurley (Scholars Portal): AIPing Research Data: Integrating Dataverse and Archivematica
Rondineli Saad (SciELO Network, Brazil): Preserving Dataverse Using Archivematica
Nic Weber (QDR), Jim Myers (GDCC), Sebastian Karcher (QDR), Seba Ostrowski (QDR): Human-in-the-Loop Preservation at QDR
Courtney Mumma (Texas Digital Library): Preservation of the Texas Data Repository
Deirdre Kirmis (ASU Library): ASU Research Data Repository Backup Workflow
The main question of this session is which preservation practices ensure that data in a Dataverse installation remain usable for as long as the organization needs to keep them. The presentations will explore existing best practices for managing data preservation and the efforts the Dataverse community has made so far to archive content while ensuring interoperability, consistency, and the safety and security of digital data.
Chairs: Sonia Barbosa, Miguel Ángel Márdero Arellano

11:30am - 1:00pm
Curation Workflows:
MingJing Peng (Australian Data Archive): ADA Deposit and Preservation Tool (ADAPT2)
Philipp Conzett (UiT The Arctic University of Norway): Curation Support in DataverseNO
Mara Blake (Johns Hopkins University & Data Curation Network): Collaborative and Local Data Curation: The Data Curation Network and the JHU Data Archive
Amber Leahey (Scholars Portal): Curating Survey Data in Dataverse
Mandy Gooch (Odum Institute: CoRe2): Supporting Reproducible Research and Data Curation: An Overview of the Odum Institute’s Data Curation Practices and Tools
Michael Steeleworthy and Alex Cooper (NDRIO-Portage Network Dataverse Metadata Working Group, Curation Guide Working Group): Documenting Best Practices: Dataverse Metadata and Curation Guides
Sonia Barbosa (IQSS, Harvard University): Harvard Dataverse Curation Services
In order for data to be more easily discoverable, better understood, and reusable, it must be “curated”. There are many levels of data curation. The panelists for this session will describe the tools, best practices, and the level of curation performed on datasets in their respective Dataverse repositories.
Chairs: Sonia Barbosa, Sherry Lake

1:15pm - 1:45pm
Closing Session:
Danny Brooke (IQSS, Harvard University)

Monday June 14 - Workshop Session

All times Eastern Daylight Time (EDT) (UTC−04:00)

8:00am - 9:30am
Automated CI/CD Testing, Installation and Deployment of a Dataverse Installation on a Cloud:
Slava Tykhonov (DANS-KNAW/SSHOC Dataverse)
Don Sizemore (Odum Institute UNC Chapel Hill)
Samuel Bernardo (LIP/EOSC Synergy)
Stefan Kasberger (AUSSDA)
Over the last several years, the Dataverse Project has increased in maturity and has moved forward with CI/CD as a pipeline concept. In addition to the traditional DevOps topics (installation and deployment with Vagrant, Ansible, and Jenkins, and automated testing), attendees will discuss how the Dataverse Community currently uses cloud technologies such as Docker and Kubernetes to help with the development process and to run Dataverse in both production and non-production environments. Serverless computing has also grown in importance recently, as it allows shared services to be deployed as part of the common data infrastructure.

You will learn more about the cloud-related features available in the pyDataverse module, which is widely used by the Dataverse Community for various tasks. We will also show some practical use cases of the Selenium framework for automated testing integrated with Jenkins.
You will also learn about the Software Quality as a Service (SQAaaS) framework that was developed in the EOSC Synergy project in the context of the European Open Science Cloud, which covers the following topics:

  • Promoting adoption of software best practices
  • Automatically validating the quality of software and services, both thematic and generic
  • Promoting the adoption of FAIR data principles
  • Leveraging actionable features on data repositories to analyse and validate FAIR compliance

Further automating these processes allows the community to extend the capabilities of the Dataverse Software ecosystem and build maintainable, sustainable services integrated with the data repository.
Chairs: Slava Tykhonov, Don Sizemore

10:00am - 11:30am
Introduction to the Dataverse Software:
Sonia Barbosa (IQSS, Harvard University)
This session will introduce the Dataverse Software for data sharing and preservation, using the Harvard Dataverse Repository as the model. You will have an opportunity to learn about the software and how it can support institutional repositories, research projects, team projects, teaching courses, and the general data sharing needs of your community, including:

  • How the Dataverse Software supports FAIR data sharing standards
  • How to curate data using the Dataverse Software
  • Data analysis and visualization tool integrations
  • Metadata support
  • Featured APIs
  • Sharing restricted and sensitive data

You will have an opportunity to ask questions during this session.