High Dimensional Big Data Engineering

Friday, 22 January 2016, 9.00am to 5.00pm
Location: FW11, Computer Laboratory, Cambridge

Thanks to advances in monitoring devices and modelling techniques, modern big data is growing in both quantity and dimensionality. Many popular applications, such as document classification, pattern recognition, intrusion detection and recommender systems, involve processing and understanding the information contained in high-dimensional datasets. The intelligence of these applications relies heavily on the efficiency of processing the datasets and extracting meaningful patterns from them, and on the accuracy of searching. In practice, the balance between efficiency and accuracy plays a key role in building scalable services.

The workshop, held at the Computer Laboratory, University of Cambridge, on Friday 22 January 2016, brought together experts in computer science, statistics and mathematics from leading institutes within and beyond the UK. It focused on state-of-the-art engineering and algorithmic solutions adopted in realistic, large-scale applications of high-dimensional big data.


Presentations and speakers



Large-Volume, High-Dimensional Data Processing at Thomson Reuters

Dr. Jochen Leidner (Director), Corporate Research & Development, Thomson Reuters, UK

Random Projection Ensemble Classification

Prof. Richard Samworth, Statistical Laboratory, University of Cambridge, UK

Multiple Random Projection Tree (MRPT) on High-Dimensional Data

Prof. Teemu Roos, Department of Computer Science, University of Helsinki, Finland

High-Dimensional Big Data Analysis

Dr. Dimitris Tasoulis (Senior Execution Researcher), Winton Capital Management, UK

A Knowledge Graph for Education and Learning

Mads Holmen (CEO), Bibblio Inc., UK

Multi Scale Machine Learning Methodologies for Molecular Biology Data

Dr. Pietro Lio’, Computer Laboratory, University of Cambridge, UK

Statistical Calculations at Scale Using Decisions and Emulation

Dr. Daniel Lawson, School of Social and Community Medicine, University of Bristol, UK

Uncovering Multi-Modal Spread Modes using Joint Diagonalisation

Dr. Eiko Yoneki, Computer Laboratory, University of Cambridge, UK

Optimal Hyperplanes for Clustering. Early results from High Dimensional Genomics Data

Dr. David Hofmeyr, Lancaster University

Accurate estimation of breakouts in high-dimensional panel data

Leonid Torgovitski, Mathematical Institute, University of Cologne, Germany

Forthcoming talks

Achieving Consistent Low Latency for Wireless Real-Time Communications with the Shortest Control Loop

Thursday, 18 August 2022, 4.00pm to 5.00pm
Speaker: Zili Meng, Tsinghua University
Venue: FW11 and

Real-time communication (RTC) applications like video conferencing or cloud gaming require consistent low latency to provide a seamless interactive experience. However, wireless networks including WiFi and cellular, albeit providing a satisfactory median latency, drastically degrade at the tail due to frequent and substantial wireless bandwidth fluctuations. We observe that the control loop for the sending rate of RTC applications is inflated when congestion happens at the wireless access point (AP), resulting in untimely rate adaption to wireless dynamics. Existing solutions, however, suffer from the inflated control loop and fail to quickly adapt to bandwidth fluctuations. In this paper, we propose Zhuge, a pure wireless AP based solution that reduces the control loop of RTC applications by separating congestion feedback from congested queues. We design a Fortune Teller to precisely estimate per-packet wireless latency upon its arrival at the wireless AP. To make Zhuge deployable at scale, we also design a Feedback Updater that translates the estimated latency to comprehensible feedback messages for various protocols and immediately delivers them back to senders for rate adaption. Trace-driven and real-world evaluation shows that Zhuge reduces the ratio of large tail latency and RTC performance degradation by 17% to 95%.

Speaker Bio: Zili is a third-year PhD student at Tsinghua University. His current research focuses on real-time video communications. He has published several papers in SIGCOMM / NSDI and received the Microsoft Research Asia PhD Fellowship, the Gold Medal of the SIGCOMM 2018 Student Research Competition, and two best paper awards.

BSU Seminar: "Genome-wide genetic models for association, heritability analyses and prediction"

Monday, 22 August 2022, 4.30pm to 5.30pm
Speaker: David Balding, Honorary Professor of Statistical Genetics at UCL Genetics Institute and University of Melbourne
Venue: Seminar Rooms 1 & 2, School of Clinical Medicine, Hills Road, Cambridge CB2 0SP

Although simultaneous analysis of genome-wide SNPs has been popular for over a decade, the problems posed by more SNPs than study participants (more parameters than data points), and by correlations among the SNPs, have not been adequately overcome, so that almost all published genome-wide analyses are suboptimal. While much attention has been paid to the shape of prior distributions for SNP effect sizes, we argue that this attention is misplaced. We focus on what we call the "heritability model": a low-dimensional model for the expected heritability at each SNP, which is key to both individual-data and summary-statistic analyses. The 1-df uniform heritability model has been implicitly adopted in a wide range of analyses. Replacing it with better heritability models, using predictors based on allele frequency, linkage disequilibrium and functional annotations, leads to substantial improvements in estimates of heritability and selection parameters over traits, and over genome regions, as well as improvements in gene-based association testing and prediction. Key collaborators: Doug Speed (Aarhus, Denmark) and Melbourne PhD student Anubhav Kaphle.

Statistics Clinic Summer 2022 III

Wednesday, 31 August 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

If you would like to participate, please fill in the sign-up form. The deadline for signing up for a session is 12pm on Monday 29 August. Subject to the availability of members of the Statistics Clinic team, we will confirm your in-person or remote appointment.

This event is open only to members of the University of Cambridge (and affiliated institutes). Please be aware that we are unable to offer consultations outside clinic hours.

Statistics Clinic Summer 2022 IV

Wednesday, 21 September 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

Abstract not available

Title to be confirmed

Monday, 26 September 2022, 3.00pm to 4.00pm
Speaker: Christopher Yau, University of Manchester
Venue: CRUK CI Lecture Theatre

Abstract not available