Prepare data and environment for ML and DL
This section describes how to prepare your data and your Azure Databricks environment for machine learning and deep learning.
Prepare data
The articles in this section cover aspects of loading and preprocessing data that are specific to ML and DL applications.
- Load data for machine learning and deep learning
- Preprocess data for machine learning and deep learning
Prepare environment
Databricks Runtime for Machine Learning (Databricks Runtime ML) is a ready-to-use environment optimized for machine learning and data science. It includes many external libraries, such as TensorFlow, PyTorch, Horovod, scikit-learn, and XGBoost. It also provides extensions that improve performance, including GPU acceleration in XGBoost, distributed deep learning using HorovodRunner, and model checkpointing using a Databricks File System (DBFS) FUSE mount.
To use Databricks Runtime ML, select the ML version of the runtime when you create your cluster.
Note
To access data in Unity Catalog for machine learning workflows, the access mode for the cluster must be single user (assigned). Shared clusters are not compatible with Databricks Runtime for Machine Learning.
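The two settings described above (an ML version of the runtime and single user access mode) can be sketched as a cluster specification of the kind the Clusters REST API accepts. This is a minimal sketch, not a definitive configuration: the helper name, cluster name, runtime version string, node type, and user name are illustrative assumptions, so substitute values that are valid in your workspace.

```python
import json


def build_ml_cluster_spec(user_name,
                          spark_version="14.3.x-cpu-ml-scala2.12",
                          node_type_id="Standard_DS3_v2",
                          num_workers=2):
    """Return a cluster spec dict that uses an ML runtime in single user mode.

    All default values are assumptions chosen for illustration only.
    """
    # ML runtime version strings contain "-ml-"; reject anything else early.
    if "-ml-" not in spark_version:
        raise ValueError(f"{spark_version!r} is not an ML runtime version")
    return {
        "cluster_name": "ml-single-user-example",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Single user (assigned) access mode, needed for Unity Catalog access.
        "data_security_mode": "SINGLE_USER",
        "single_user_name": user_name,
    }


spec = build_ml_cluster_spec("someone@example.com")
print(json.dumps(spec, indent=2))
```

The sketch only builds and prints the request body; it does not call the workspace. The same choices are available in the cluster creation UI.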
Install libraries
You can install additional libraries to create a custom environment for your notebook or cluster.
- To make a library available for all notebooks running on a cluster, create a cluster library. You can also use an init script to install libraries on clusters upon creation.
- To install a library that is available only to a specific notebook session, use notebook-scoped Python libraries.
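The two options above can be sketched as follows. The first part builds a request body of the kind the Libraries REST API accepts for installing a cluster library from PyPI; the trailing comment shows the notebook-scoped alternative. The helper name, cluster ID, and package choice are hypothetical placeholders, and the sketch only prints the body rather than sending it.

```python
import json


def build_library_install_request(cluster_id, pypi_packages):
    """Return a request body that installs PyPI packages as cluster libraries."""
    return {
        "cluster_id": cluster_id,
        # One entry per package, each wrapped in a "pypi" library spec.
        "libraries": [{"pypi": {"package": name}} for name in pypi_packages],
    }


# Hypothetical cluster ID and package, for illustration only.
request_body = build_library_install_request(
    "0123-456789-abcde123", ["lightgbm==4.3.0"]
)
print(json.dumps(request_body, indent=2))

# Notebook-scoped alternative: run this magic command in a notebook cell.
# The library is then available only to that notebook session.
# %pip install lightgbm==4.3.0
```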
Use GPU clusters
You can create GPU clusters to accelerate deep learning tasks. For information about creating Azure Databricks GPU clusters, see GPU-enabled compute. Databricks Runtime ML includes GPU hardware drivers and NVIDIA libraries such as CUDA.
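Once a cluster is running, it can be useful to confirm from a notebook that a GPU is actually visible before starting a training job. The following is a minimal sketch that assumes PyTorch as the framework; the helper name `pick_device` is hypothetical, and the function falls back to the CPU when PyTorch is not installed or no CUDA device is found.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # bundled with Databricks Runtime ML; may be absent elsewhere
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training will run on: {device}")
```

On a CPU-only cluster the same code runs unchanged and reports `cpu`, so a notebook does not need separate code paths for the two cluster types.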