Method Preservation: workflows and models matter

Our Digital Future - Multidisciplinary Perspectives on Long Term Data Preservation and Access

Carole Goble (The University of Manchester, UK; The Software Sustainability Institute UK)

Data preservation is, of course, important. But so is the preservation of method. In practice, the exchange, reuse, reproduction and preservation of data-centric experiments require bundling and exchanging the experimental methods, computational codes, algorithms, workflows and so on along with the narrative and the data. FAIR Research Objects (ROs) [1] are composite and evolutionary, just as research itself is never “finished”: codes fork, data is updated, algorithms are revised, workflows break, service updates are released.

In the EU Wf4ever project we set out a Research Object Framework for preserving computational workflows, in particular those using remote and independently stewarded datasets and third-party services [2]. This framework is the basis of the community effort on the Common Workflow Language and has been developed for models in the FAIRDOM Systems Biology Commons [3] and the STELAR Asthma eLab [4]. The RO term has gathered momentum as the NIH BD2K program is building a Research Object Commons.

ROs are metadata objects for explicitly describing aggregations or packages of content: boxes of components and assembly instructions, with a shipping manifest for what is in the box and where it is from. We specify the ontologies needed to construct manifests (aggregation and annotation) and to guide their content (checklists, provenance, versioning, dependencies). The RO container is implemented using off-the-shelf platforms, like Zip, BagIt, and Docker. Not all of the RO content is physically inside the container; much of it is likely to be held outside and referenced logically, so the containers have “holes” in them. We need ways of knowing where that content is and whether it has changed, and identifiers to glue the whole thing together.
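The container-with-holes idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Research Object specification: the manifest layout, the field names (`aggregates`, `annotations`, `bundled`, `sha256`), the file name `manifest.json`, and the functions `build_manifest`, `write_container` and `changed_resources` are all hypothetical. It assumes a Zip container, records bundled files and external URIs side by side, and uses a checksum as one possible way of noticing that external content has changed.

```python
import hashlib
import io
import json
import zipfile


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(local_files: dict, external_refs: dict, notes: dict = None) -> dict:
    """Build a simplified, hypothetical manifest (not the RO specification).

    local_files maps a path inside the container to its bytes.
    external_refs maps a URI to the bytes seen at packaging time; only the
    URI and checksum are recorded, so that content stays outside (a "hole").
    notes optionally maps a path or URI to a free-text annotation.
    """
    aggregates = []
    for path, data in sorted(local_files.items()):
        aggregates.append({"uri": path, "bundled": True, "sha256": sha256_hex(data)})
    for uri, data in sorted(external_refs.items()):
        aggregates.append({"uri": uri, "bundled": False, "sha256": sha256_hex(data)})
    annotations = [{"about": k, "note": v} for k, v in sorted((notes or {}).items())]
    return {"aggregates": aggregates, "annotations": annotations}


def write_container(local_files: dict, manifest: dict) -> bytes:
    """Write the manifest and the bundled files into an in-memory Zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, data in local_files.items():
            zf.writestr(path, data)
    return buf.getvalue()


def changed_resources(manifest: dict, current: dict) -> list:
    """Return the URIs whose current bytes no longer match the recorded checksum.

    current maps a URI or path to bytes as retrieved now; fetching them is
    left to the caller. Entries missing from current are not checked.
    """
    changed = []
    for entry in manifest["aggregates"]:
        uri = entry["uri"]
        if uri in current and sha256_hex(current[uri]) != entry["sha256"]:
            changed.append(uri)
    return changed


if __name__ == "__main__":
    # Placeholder workflow file and placeholder URI, for illustration only.
    local = {"workflow.cwl": b"cwlVersion: v1.0\n"}
    external = {"https://example.org/data/v1.csv": b"a,b\n1,2\n"}
    manifest = build_manifest(local, external, {"workflow.cwl": "main analysis workflow"})
    container = write_container(local, manifest)
    # Simulate the remote dataset being updated after packaging.
    print(changed_resources(manifest, {"https://example.org/data/v1.csv": b"a,b\n1,3\n"}))
```

A checksum only detects that something changed, not what changed or why; the provenance and versioning metadata mentioned above would be needed to explain the difference.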

In this talk I will discuss workflow and computational method reproducibility, and how the Research Object metadata framework helps preserve computational artifacts alongside their data. I will also raise the importance of the sustainability of software in the preservation landscape [5].

[1] Bechhofer et al (2013) Why Linked Data is Not Enough for Scientists, Future Generation Computer Systems, doi:10.1016/j.future.2011.08.004 http://www.researchobject.org

[2] Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, Web Semantics: Science, Services and Agents on the World Wide Web, doi:10.1016/j.websem.2015.01.003

[3] Wolstencroft et al (2015) SEEK: a systems biology data and model management platform, BMC Systems Biology, doi:10.1186/s12918-015-0174-y http://www.fair-dom.org

[4] Custovic et al (2015) The Study Team for Early Life Asthma Research (STELAR) consortium ‘Asthma e-lab’: team science bringing data, methods and investigators together, Thorax, doi:10.1136/thoraxjnl-2015-206781 https://www.asthmaelab.org

[5] The Software Sustainability Institute UK, http://www.software.ac.uk
