
Open Science and Sustainable Software for Data-driven Discovery

Monday, 18 July 2022, 9.15am to 5.30pm
Location: Aurora Innovation Centre - British Antarctic Survey, High Cross, Madingley Road, Cambridge, CB3 0ET

Cambridge Centre for Data-Driven Discovery (C2D3) and British Antarctic Survey (BAS) invite participants to the Open Science and Sustainable Software for Data-driven Discovery workshop. 

This in-person one-day workshop will explore the topics of reproducible data science, ethical data science, collaborative data science, The Turing Way, trustworthy AI, and transparency. The day will give you the opportunity to meet peers with experiences similar to your own and to build your research support network. 

The morning session will start with a presentation on the importance and benefits of openness and sustainability, followed by guidance on online resources and local support available to participants. We will round off the morning with hands-on laptop exercises, where the workshop tutors will be available to help you through the tasks. 

The afternoon session will start with a panel discussion with speakers from different disciplines, who will discuss how they use open science and sustainable software in their research. We will then hear about real-world use cases from three speakers. There will be plenty of opportunity throughout the afternoon to engage with the speakers. 

We will finish the day with a drinks reception! 

Participant prerequisites

This is an introductory workshop suited to participants with little or no experience of the topics covered. It can also serve as a refresher for more experienced participants. 

Participants should bring a laptop for the hands-on exercises. 


Places are very limited, so please register only if you are able to attend in person for the whole day. 

Please reserve your space using the appropriate link below if you are eligible under one of the four categories:

There will be a waiting list in operation once all the available spaces have been filled. 


09:15-09:30 Registration and arrival refreshments

Session 1: Introduction to open data science tools and techniques 

09:30-11:00 Setting yourself up for effective and open data science 

11:00-11:30 Break with refreshments

11:30-13:00 Exploring the tricks and tools of the trade 

13:00-14:00 Lunch

Session 2: Exploring data science through discussion and real-world use cases 

14:00-15:30 Panel Discussion

15:30-16:00 Break with refreshments

16:00-17:30 Presentations of real-world use cases

17:30-18:15 Drinks reception

Travel Information

Cycle - Allow approximately 30 minutes from the station and 20 minutes from the city centre.

Car - The British Antarctic Survey (BAS) is located close to the M11 motorway, on the outskirts of Cambridge. Visitor parking is available at Madingley Road Park and Ride.

Bus, Train or further information - Travel to BAS

Open Science and Sustainable Software for Data-driven Discovery Workshop poster

Forthcoming talks


Thursday, 7 July 2022, 4.00pm to 5.00pm
Speaker: Heidi Howard, Microsoft Research
Venue: FW11 and


Synthetics with Digital Humans

Friday, 8 July 2022, 12.00pm to 1.00pm
Speaker: Dr. Erroll Wood (Staff Software Engineer at Google)
Venue: (meeting ID: 649 250 9351 / passcode: 7mu5ZJ)


Nowadays, collecting the right dataset for machine learning is often more challenging than choosing the model. We address this with photorealistic synthetic training data: labelled images of humans made using computer graphics. With synthetics we can generate clean labels without annotation noise or error, produce labels that would otherwise be impossible to annotate by hand, and easily control variation and diversity in our datasets. I will show you how synthetics underpin our work on understanding humans, including how they enable fast and accurate 3D face reconstruction in the wild.


Dr. Erroll Wood is a Staff Software Engineer at Google, working on Digital Humans. Previously, he was a member of Microsoft's Mixed Reality AI Lab, where he worked on hand tracking for HoloLens 2, avatars for Microsoft Mesh, synthetic data for face tracking, and Holoportation. He did his PhD at the University of Cambridge, working on gaze estimation.


Combining multi-omics and biological knowledge to extract disease mechanisms

Monday, 11 July 2022, 3.00pm to 4.00pm
Speaker: Julio Saez-Rodriguez, Faculty of Medicine of Heidelberg University, Director of the Institute of Computational Biomedicine and Group Leader at the EMBL-Heidelberg University Molecular Medicine Partnership Unit (MMPU)
Venue: CRUK CI Lecture Theatre

Multi-omics technologies, in particular those with single-cell and spatial resolution, provide unique opportunities to study the deregulation of intra- and inter-cellular processes in cancer and other diseases. In this talk I will present recent methods and applications from our group towards this aim, with a focus on computational approaches that combine data with biological knowledge within statistical and machine learning methods. This combination allows us to increase both the statistical power of our approaches and the mechanistic interpretability of the results. I will also discuss the value of performing perturbation studies, combined with mathematical modelling, to increase both our understanding and the therapeutic opportunities. Finally, I will show how, using novel microfluidics-based technologies, this approach can also be applied directly to biopsies, allowing us to build mechanistic models for individual cancer patients and to use these models to propose new therapies.

Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

Tuesday, 12 July 2022, 3.00pm to 4.00pm
Speaker: Martin Fajčík (Brno University of Technology)
Venue: Computer Lab, FW26


We present Claim-Dissector: a novel latent variable model for fact-checking and fact-analysis which, given a claim and a set of retrieved provenances, learns jointly (i) which provenances are relevant to the claim and (ii) the veracity of the claim. We show that our system achieves state-of-the-art results on FEVER, comparable to the two-stage systems often used in traditional fact-checking pipelines, while using significantly fewer parameters and less computation.
Our analysis shows that the proposed approach also learns, without direct supervision, not just which provenances are relevant but also which of them support the claim and which deny it. This not only adds interpretability but also makes it possible to detect claims with conflicting evidence automatically. Furthermore, we study whether our model can learn fine-grained relevance cues while using coarse-grained supervision. We show that our model can achieve competitive sentence-level recall while using only paragraph-level relevance supervision. Finally, moving towards the finest granularity of relevance, we show that our framework is capable of achieving strong token-level interpretability. To do this, we present a new benchmark focusing on token-level interpretability: human annotators mark the tokens in relevant provenances that they considered essential when making their judgement. We then measure how similar these annotations are to the tokens our model focuses on. Our code, dataset and demo will be released online.


Martin Fajčík (read as Fay-Cheek) is a PhD candidate in Natural Language Processing in the Knowledge Technology Research Group at FIT-BUT in Brno, Czech Republic, advised by Prof. Pavel Smrž (ž is read like the j in French "Jean"). Since 2021, he has also worked as a research assistant at the IDIAP Research Institute in Martigny, Switzerland. His PhD work focuses on open-domain knowledge processing, mainly question answering and fact-checking. He enjoys good hikes and informal discussions over tea.

Statistics Clinic Summer 2022 I

Wednesday, 13 July 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

If you would like to participate, please fill in the sign-up form. The deadline for signing up for a session is 12pm on Monday 11 July. Subject to the availability of members of the Statistics Clinic team, we will confirm your in-person or remote appointment.

This event is open only to members of the University of Cambridge (and affiliated institutes). Please be aware that we are unable to offer consultations outside clinic hours.