This book is aimed at students, professionals, and data enthusiasts who want to learn, implement, and automate machine learning pipelines using Spark ML in real-world environments. It covers the full path from data ingestion to model deployment in production, with hands-on integration of widely used services and platforms, including AWS, Azure, Google Cloud, Databricks, Hadoop, Kubernetes, Apache Airflow, S3, BigQuery, Redshift, and Delta Lake.
The content covers:
• Integration of Spark ML with cloud environments and data platforms
• Construction and automation of pipelines with Spark MLlib and Airflow
• Implementation of supervised and unsupervised models
• Deployment, monitoring, and management of models in cloud and hybrid environments
• Workflow optimization with Delta Lake, BigQuery, and Redshift
• Tuning techniques, cross-validation, and MLOps fundamentals
• Performance analysis and scalability of machine learning solutions
All examples and routines are meant as starting points that can be adapted to different academic and professional contexts. The goal is to deliver technical onboarding, practical autonomy, and mastery of the most widely used integrations in the market.
spark ml, aws, azure, google cloud, databricks, hadoop, airflow, s3, bigquery, redshift, delta lake, pipelines, mlops, deploy, automation, predictive models
Diego Rodrigues
Technical Author and Independent Researcher
ORCID: https://orcid.org/0009-0006-
StudioD21 Smart Tech Content & Intell Systems
Email: [email protected]
LinkedIn: linkedin.com/in/diegoexpertai
Diego Rodrigues is an international technical author (tech writer) focused on the structured production of applied knowledge. He is the founder of StudioD21 Smart Tech Content & Intell Systems, where he leads the creation of intelligent frameworks and the publication of didactic technical books supported by artificial intelligence, such as the Kali Linux Extreme series and SMARTBOOKS D21, among others.
He holds 42 international certifications issued by institutions such as IBM, Google, Microsoft, AWS, Cisco, Meta, EC-Council, Palo Alto, and Boston University, and works in the fields of Artificial Intelligence, Machine Learning, Data Science, Big Data, Blockchain, Connectivity Technologies, Ethical Hacking, and Threat Intelligence.
Since 2003, he has developed more than 200 technical projects for brands in Brazil, the USA, and Mexico. In 2024, he established himself as one of the leading technical book authors of the new generation, with over 180 titles published in six languages. His work is based on his proprietary TECHWRITE 2.3 applied technical writing protocol, focused on scalability, conceptual precision, and practical applicability in professional environments.