A collection of texts and online resources that would be of interest to data scientists has been collated. We are always looking for recommended reading material and videos to help our students and research community with their learning. If you have a good read or video to recommend, please get in touch.
- Self-paced online learning
- Cambridge University Press Textbooks
- More recommended textbooks
- MOOCs: Massive Open Online Courses
Self-paced online learning
Accelerate science with Python and Pandas e-learning module
A new open e-learning module from Accelerate and Cambridge Spark offers training in Python programming for research challenges.
You’ll learn the fundamentals of Python, some of the kinds of data it can handle, and how to store that data. You’ll also learn about a powerful data analysis library called Pandas, and you’ll use this to analyse data, and create excellent visualisations. You’ll be working on a real dataset of Nigerian healthcare facilities, and using code to decide on how changes to policy might impact this data.
Cambridge University Press Textbooks
The full textbook catalogue for Cambridge University Press can be found here.
A few books from the catalogue that would be of interest to data scientists are listed below.
Data Mining and Machine Learning
Authors: Mohammed J. Zaki, Rensselaer Polytechnic Institute, New York, Wagner Meira, Jr, Universidade Federal de Minas Gerais, Brazil
This textbook for senior undergraduate and graduate courses provides a comprehensive, in-depth overview of data mining, machine learning and statistics, offering solid guidance for students, researchers, and practitioners. The book lays the foundations of data analysis, pattern mining, clustering, classification and regression, with a focus on the algorithms and the underlying algebraic, geometric, and probabilistic concepts. New to this second edition is an entire part devoted to regression methods, including neural networks and deep learning.
Foundations of Data Science
Authors: Avrim Blum, Toyota Technological Institute at Chicago, John Hopcroft, Cornell University, New York, Ravi Kannan, Microsoft Research, India
This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing.
Hands-On Introduction to Data Science
Authors: Chirag Shah, University of Washington
This book introduces the field of data science in a practical and accessible manner, using a hands-on approach that assumes no prior knowledge of the subject. The foundational ideas and techniques of data science are provided independently from technology, allowing students to easily develop a firm understanding of the subject without a strong technical background, as well as being presented with material that will have continual relevance even after tools and technologies change.
Linear Algebra
Authors: Elizabeth S. Meckes, Case Western Reserve University, Ohio, Mark W. Meckes, Case Western Reserve University, Ohio
Linear Algebra offers a unified treatment of both matrix-oriented and theoretical approaches to the course, which will be useful for classes with a mix of mathematics, physics, engineering, and computer science students. Major topics include singular value decomposition, the spectral theorem, linear systems of equations, vector spaces, linear maps, matrices, eigenvalues and eigenvectors, linear independence, bases, coordinates, dimension, matrix factorizations, inner products, norms, and determinants.
Mathematics for Machine Learning
Authors: Marc Peter Deisenroth, University College London, A. Aldo Faisal, Imperial College London, Cheng Soon Ong, Data61, CSIRO
This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines.
Mining of Massive Datasets
Authors: Jure Leskovec, Stanford University, California, Anand Rajaraman, Milliways Laboratories, California, Jeffrey David Ullman, Stanford University, California
This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing.
Pattern Recognition and Neural Networks
Authors: Brian D. Ripley, University of Oxford
With unparalleled coverage and a wealth of case-studies this book gives valuable insight into both the theory and the enormously diverse applications (which can be found in remote sensing, astrophysics, engineering and medicine, for example). So that readers can develop their skills and understanding, many of the real data sets used in the book are available from the author's website.
More recommended textbooks
Machine Learning: A Probabilistic Perspective
Access via University of Cambridge Library: ebook available on iDiscover; physical copies in various department and college libraries.
Authors: Kevin P. Murphy, University of British Columbia
This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.
Pattern Recognition and Machine Learning
Access via University of Cambridge Library: unfortunately no ebook available at time of writing; physical copies in various department and college libraries.
Authors: Christopher M. Bishop, Microsoft Research Cambridge
The book presents approximate inference algorithms that give fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, at a time when no other books applied graphical models to machine learning. No previous knowledge of pattern recognition or machine learning concepts is assumed.
MOOCs: Massive Open Online Courses
Online courses aimed at anyone with a general or specialised interest and looking to gain more skills.
This website has collated hundreds of MOOCs and you can browse by subject matter: computer science, data science, programming, mathematics, health & medicine etc. Let us know if you access a great MOOC through this website, or there are any issues.
FutureLearn offers a diverse selection of courses from leading universities and cultural institutions from around the world. More about FutureLearn here. Previous MOOCs have included 'Data Science in the Games Industry', 'Programming for Everybody (Getting Started with Python)', and 'Teaching Physical Computing with Raspberry Pi and Python'.
The material listed above is just a selection of learning resources that you may find helpful. C2D3 are not endorsing the resources (e.g. individual MOOCs on repository websites). If there are any issues with the above material, please let us know.