Data Management in Machine Learning Systems

Springer Nature
Ebook
157

About this ebook

Large-scale data analytics using machine learning (ML) underpins many modern data-driven applications. ML systems provide means of specifying and executing these ML workloads in an efficient and scalable manner. Data management is at the heart of many ML systems due to data-driven application characteristics, data-centric workload characteristics, and system architectures inspired by classical data management techniques.

In this book, we follow this data-centric view of ML systems and aim to provide a comprehensive overview of data management in ML systems for the end-to-end data science or ML lifecycle. We review multiple interconnected lines of work: (1) ML support in database (DB) systems, (2) DB-inspired ML systems, and (3) ML lifecycle systems. Covered topics include: in-database analytics via query generation and user-defined functions, factorized and statistical-relational learning; optimizing compilers for ML workloads; execution strategies and hardware accelerators; data access methods such as compression, partitioning and indexing; resource elasticity and cloud markets; as well as systems for data preparation for ML, model selection, model management, model debugging, and model serving. Given the rapidly evolving field, we strive for a balance between an up-to-date survey of ML systems, an overview of the underlying concepts and techniques, as well as pointers to open research questions. Hence, this book might serve as a starting point for both systems researchers and developers.

About the authors

Matthias Boehm is a professor at Graz University of Technology, Austria, where he holds a BMVIT-endowed chair for data management. Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA, with a focus on compilation and runtime techniques for declarative, large-scale machine learning. He received his Ph.D. from Dresden University of Technology, Germany in 2011 with a dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award, and a 2016 SIGMOD Research Highlight Award.

Arun Kumar is an Assistant Professor at the University of California, San Diego. He received his Ph.D. from the University of Wisconsin-Madison in 2016. His research interests are in the intersection of data management, systems, and ML, with a focus on making ML-based data analytics easier, faster, cheaper, and more scalable. Ideas from his work have been adopted by many companies, including EMC, Oracle, Cloudera, Facebook, and Microsoft. He is a recipient of the Best Paper Award at SIGMOD 2014, the 2016 CS dissertation research award from UW-Madison, a 2016 Google Faculty Research Award, and a 2018 Hellman Fellowship.

Jun Yang is a Professor of Computer Science at Duke University, where he has been teaching since receiving his Ph.D. from Stanford University in 2001. He is broadly interested in databases and data-intensive systems. He is a recipient of the NSF CAREER Award, IBM Faculty Award, HP Labs Innovation Research Award, and Google Faculty Research Award. He also received the David and Janet Vaughan Brooks Teaching Award at Duke. His current research interests lie in making data analysis easier and more scalable for scientists, statisticians, and journalists.
