Example Jupyter notebooks show how to enrich data with Open Datasets
The example Jupyter notebooks for Azure Open Datasets show you how to load open datasets and use them to enrich demo data. The notebooks process the data with either Apache Spark or Pandas.
Important
In a non-Spark environment, certain Open Datasets classes let you download only one month of data at a time. This limit helps avoid a MemoryError with large datasets.
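One way to work within a one-month limit is to split a longer date range into calendar-month windows and load each window separately. The following is a minimal sketch that uses only the Python standard library. The helper name `month_windows` is hypothetical and is not part of the Open Datasets SDK; the actual load call is left as a comment because it depends on the dataset class you use.

```python
from datetime import date, timedelta


def month_windows(start: date, end: date):
    """Yield (window_start, window_end) pairs, one per calendar month, covering [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    current = start
    while current <= end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Clip the window so it never runs past the requested end date.
        window_end = min(next_month - timedelta(days=1), end)
        yield current, window_end
        current = next_month


windows = list(month_windows(date(2019, 11, 15), date(2020, 1, 10)))
for window_start, window_end in windows:
    # Assumption: load one window at a time here with the dataset class of your choice.
    print(window_start, window_end)
```

Each window spans at most one calendar month, so the frames can be loaded one by one and then filtered or aggregated before they are combined.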
Load NOAA Integrated Surface Database (ISD) data
Notebook | Description |
---|---|
Load one recent month of weather data into a Pandas dataframe | Learn how to load historical weather data into a Pandas dataframe. |
Load one recent month of weather data into a Spark dataframe | Learn how to load historical weather data into a Spark dataframe. |
Join demo data with NOAA ISD data
Notebook | Description |
---|---|
Join demo data with weather data - Pandas | Join a one-month demo dataset of sensor locations with weather readings in a Pandas dataframe. |
Join demo data with weather data - Spark | Join a demo dataset of sensor locations with weather readings in a Spark dataframe. |
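To illustrate the kind of join these notebooks perform, here is a minimal Pandas sketch that attaches the most recent weather reading to each sensor event. It is not the notebooks' own method: the data is synthetic, the column names (`station_id`, `event_time`, `obs_time`, `temperature`) are assumptions, and each sensor is assumed to be already matched to a weather station.

```python
import pandas as pd

# Synthetic demo data: sensor events, each already tagged with a weather station (assumption).
sensors = pd.DataFrame({
    "station_id": ["A", "A", "B"],
    "event_time": pd.to_datetime(
        ["2020-01-01 00:20", "2020-01-01 01:50", "2020-01-01 00:40"]
    ),
    "sensor": ["s1", "s2", "s3"],
})

# Synthetic weather readings per station.
weather = pd.DataFrame({
    "station_id": ["A", "A", "B"],
    "obs_time": pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 00:30"]
    ),
    "temperature": [1.5, 2.0, -3.0],
})

# merge_asof requires both frames to be sorted by their time keys.
enriched = pd.merge_asof(
    sensors.sort_values("event_time"),
    weather.sort_values("obs_time"),
    left_on="event_time",
    right_on="obs_time",
    by="station_id",               # only match readings from the same station
    direction="backward",          # take the latest reading at or before the event
    tolerance=pd.Timedelta("1h"),  # ignore readings more than one hour old
)
print(enriched[["sensor", "event_time", "obs_time", "temperature"]])
```

The `tolerance` argument keeps stale readings from being attached to an event; rows with no reading inside the window get missing values instead.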
Join NYC taxi data with NOAA ISD data
Notebook | Description |
---|---|
Taxi trip data enriched with weather data - Pandas | Load more than one month of NYC green taxi data and enrich it with weather data in a Pandas dataframe. This example overrides the get_pandas_limit method to balance data load performance against the amount of data loaded. |
Taxi trip data enriched with weather data - Spark | Load NYC green taxi data and enrich it with weather data in a Spark dataframe. |
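As a rough illustration of enriching trip records with weather, the sketch below averages weather readings per hour and joins them to trips by pickup hour. It is a simplified stand-in rather than the notebooks' implementation: the data is synthetic and the column names (`pickup_time`, `fare`, `obs_time`, `temperature`) are assumptions.

```python
import pandas as pd

# Synthetic taxi trips (assumed columns).
trips = pd.DataFrame({
    "pickup_time": pd.to_datetime(
        ["2016-01-01 08:05", "2016-01-01 08:40", "2016-01-01 09:10"]
    ),
    "fare": [7.5, 12.0, 5.0],
})

# Synthetic weather readings (assumed columns).
weather = pd.DataFrame({
    "obs_time": pd.to_datetime(
        ["2016-01-01 08:00", "2016-01-01 08:30", "2016-01-01 09:00"]
    ),
    "temperature": [2.0, 4.0, 5.0],
})

# Reduce the weather data to one averaged row per hour before joining.
hourly = (
    weather.assign(hour=weather["obs_time"].dt.floor("60min"))
    .groupby("hour", as_index=False)["temperature"]
    .mean()
)

# A left join keeps every trip, even when no weather reading exists for its hour.
trips_enriched = trips.assign(hour=trips["pickup_time"].dt.floor("60min")).merge(
    hourly, on="hour", how="left"
)
print(trips_enriched)
```

Aggregating the weather data first keeps the join one-to-many from the weather side, so the number of trip rows does not change.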