A collection of texts and online resources that may interest data scientists has been collated below. We are always looking for recommended reading material and videos to help our students and research community with their learning. If you have a good read or video to recommend, please get in touch.
Cambridge University Press Textbooks
The full textbook catalogue for Cambridge University Press can be found here.
A few books from the catalogue that may interest data scientists are listed below.
Free Access Until 31st May
For students, researchers and all learners whose education will be disrupted over the coming weeks and months, Cambridge University Press has opened up its entire catalogue of online textbooks for free. All 700 texts are freely available to everybody until 31 May 2020. Full information about the free access is here.
Data Mining and Machine Learning
Authors: Mohammed J. Zaki, Rensselaer Polytechnic Institute, New York; Wagner Meira, Jr, Universidade Federal de Minas Gerais, Brazil
This textbook for senior undergraduate and graduate courses provides a comprehensive, in-depth overview of data mining, machine learning and statistics, offering solid guidance for students, researchers, and practitioners. The book lays the foundations of data analysis, pattern mining, clustering, classification and regression, with a focus on the algorithms and the underlying algebraic, geometric, and probabilistic concepts. New to this second edition is an entire part devoted to regression methods, including neural networks and deep learning.
Foundations of Data Science
Authors: Avrim Blum, Toyota Technological Institute at Chicago; John Hopcroft, Cornell University, New York; Ravi Kannan, Microsoft Research, India
This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing.
Hands-On Introduction to Data Science
Authors: Chirag Shah, University of Washington
This book introduces the field of data science in a practical and accessible manner, using a hands-on approach that assumes no prior knowledge of the subject. The foundational ideas and techniques of data science are presented independently of any particular technology, so students without a strong technical background can develop a firm understanding of the subject, and the material stays relevant even as tools and technologies change.
Linear Algebra
Authors: Elizabeth S. Meckes, Case Western Reserve University, Ohio; Mark W. Meckes, Case Western Reserve University, Ohio
Linear Algebra offers a unified treatment of both matrix-oriented and theoretical approaches to the course, which will be useful for classes with a mix of mathematics, physics, engineering, and computer science students. Major topics include singular value decomposition, the spectral theorem, linear systems of equations, vector spaces, linear maps, matrices, eigenvalues and eigenvectors, linear independence, bases, coordinates, dimension, matrix factorizations, inner products, norms, and determinants.
Mathematics for Machine Learning
Authors: Marc Peter Deisenroth, University College London; A. Aldo Faisal, Imperial College London; Cheng Soon Ong, Data61, CSIRO
This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines.
Mining of Massive Datasets
Authors: Jure Leskovec, Stanford University, California; Anand Rajaraman, Milliways Laboratories, California; Jeffrey David Ullman, Stanford University, California
This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing.
Pattern Recognition and Neural Networks
Authors: Brian D. Ripley, University of Oxford
With unparalleled coverage and a wealth of case studies, this book gives valuable insight into both the theory and the enormously diverse applications (which can be found in remote sensing, astrophysics, engineering and medicine, for example). So that readers can develop their skills and understanding, many of the real data sets used in the book are available from the author's website.
More recommended textbooks
Machine Learning: A Probabilistic Perspective
Access via University of Cambridge Library: ebook available on iDiscover; physical copies in various department and college libraries.
Authors: Kevin P. Murphy, University of British Columbia
This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.
Pattern Recognition and Machine Learning
Access via University of Cambridge Library: unfortunately no ebook available at time of writing; physical copies in various department and college libraries.
Authors: Christopher M. Bishop, Microsoft Research Cambridge
The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, an approach that, at the time of publication, no other machine learning textbook took. No previous knowledge of pattern recognition or machine learning concepts is assumed.