Learning Materials

We have collated texts and online resources of interest to data scientists. We are always looking for recommended reading material and videos to help our students and research community with their learning. If you have a good read or video to recommend, please get in touch.

  • Self-paced online learning
  • Cambridge University Press Textbooks
  • More recommended textbooks
  • MOOCs: Massive Open Online Courses


Self-paced online learning

Accelerate science with Python and Pandas e-learning module


A new open e-learning module from Accelerate and Cambridge Spark offers training in Python programming for research challenges.

You’ll learn the fundamentals of Python, some of the kinds of data it can handle, and how to store that data. You’ll also learn about Pandas, a powerful data analysis library, and use it to analyse data and create visualisations. You’ll work on a real dataset of Nigerian healthcare facilities and use code to assess how changes to policy might affect this data.

Cambridge University Press Textbooks

The full textbook catalogue for Cambridge University Press can be found here.

A few books from the catalogue that would be of interest to data scientists are listed below.

Data Mining and Machine Learning

Online access

Authors: Mohammed J. Zaki, Rensselaer Polytechnic Institute, New York, Wagner Meira, Jr, Universidade Federal de Minas Gerais, Brazil

This textbook for senior undergraduate and graduate courses provides a comprehensive, in-depth overview of data mining, machine learning and statistics, offering solid guidance for students, researchers, and practitioners. The book lays the foundations of data analysis, pattern mining, clustering, classification and regression, with a focus on the algorithms and the underlying algebraic, geometric, and probabilistic concepts. New to this second edition is an entire part devoted to regression methods, including neural networks and deep learning.

Foundations of Data Science

Online access

Authors: Avrim Blum, Toyota Technological Institute at Chicago, John Hopcroft, Cornell University, New York, Ravi Kannan, Microsoft Research, India

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing.

Hands-On Introduction to Data Science

Online access

Authors: Chirag Shah, University of Washington

This book introduces the field of data science in a practical and accessible manner, using a hands-on approach that assumes no prior knowledge of the subject. The foundational ideas and techniques of data science are presented independently of any particular technology, so students can develop a firm understanding of the subject without a strong technical background, and the material will remain relevant even after tools and technologies change.

Linear Algebra

Online access

Authors: Elizabeth S. Meckes, Case Western Reserve University, Ohio, Mark W. Meckes, Case Western Reserve University, Ohio

Linear Algebra offers a unified treatment of both matrix-oriented and theoretical approaches to the course, which will be useful for classes with a mix of mathematics, physics, engineering, and computer science students. Major topics include singular value decomposition, the spectral theorem, linear systems of equations, vector spaces, linear maps, matrices, eigenvalues and eigenvectors, linear independence, bases, coordinates, dimension, matrix factorizations, inner products, norms, and determinants.

Mathematics for Machine Learning

Online access

Authors: Marc Peter Deisenroth, University College London, A. Aldo Faisal, Imperial College London, Cheng Soon Ong, Data61, CSIRO

This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines.

Mining of Massive Datasets

Online access

Authors: Jure Leskovec, Stanford University, California, Anand Rajaraman, Milliways Laboratories, California, Jeffrey David Ullman, Stanford University, California

This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing.

Pattern Recognition and Neural Networks

Online access

Authors: Brian D. Ripley, University of Oxford

With unparalleled coverage and a wealth of case studies, this book gives valuable insight into both the theory and the enormously diverse applications (which can be found in remote sensing, astrophysics, engineering and medicine, for example). So that readers can develop their skills and understanding, many of the real data sets used in the book are available from the author's website.


More recommended textbooks

Machine Learning: A Probabilistic Perspective

Access via University of Cambridge Library: ebook available on iDiscover; physical copies in various department and college libraries.

Authors: Kevin P. Murphy, University of British Columbia

This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.

Pattern Recognition and Machine Learning

Access via University of Cambridge Library: unfortunately no ebook available at time of writing; physical copies in various department and college libraries.

Authors: Christopher M. Bishop, Microsoft Research Cambridge

The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, something no other book does in the context of machine learning. No previous knowledge of pattern recognition or machine learning concepts is assumed.


MOOCs: Massive Open Online Courses

Online courses aimed at anyone with a general or specialised interest and looking to gain more skills.

Class Central

MOOC access

This website has collated hundreds of MOOCs, which you can browse by subject matter: computer science, data science, programming, mathematics, health & medicine, etc. Let us know if you access a great MOOC through this website, or if you encounter any issues.

FutureLearn

MOOC access

FutureLearn offers a diverse selection of courses from leading universities and cultural institutions around the world. More about FutureLearn here. Previous MOOCs have included 'Data Science in the Games Industry', 'Programming for Everybody (Getting Started with Python)', and 'Teaching Physical Computing with Raspberry Pi and Python'.




The material listed above is just a selection of learning resources that you may find helpful. C2D3 are not endorsing the resources (e.g. individual MOOCs on repository websites). If there are any issues with the above material, please let us know.