Deep Learning for Climate Model Emulation

Data Science Capstone - DSC 180A/B Section B03 (TA: Yanyi)

Introduction to Topic

The choices humanity makes in the next few decades will determine how much warmer the Earth will be by the end of the century, with implications for billions of lives and trillions of dollars in GDP. Many different emission pathways exist that are compatible with the Paris climate agreement, and many more are possible that miss that target. While some of the most complex climate models have simulated a small selection of these, it is impractical to use these computationally expensive models to fully explore the space of possibilities or assess all the associated risks. Our lab has recently developed a state-of-the-art climate model emulation benchmark to enable fast, accurate and reliable predictions for any given scenario: ClimateBench.

Phase I - Replication

The aim of reproducing a paper’s results is to affirm the original authors’ findings and methodologies. This process is vital in science to ensure that results are robust and reliable, and not merely due to chance or error. A successful reproduction would reinforce the evidence that the constructed emulators faithfully reproduce the underlying climate model and can be trusted for such tasks. It also provides a deeper understanding of the methods applied, such as long short-term memory networks. Ultimately, this endeavor seeks to enable fast and efficient sampling of different climate scenarios to improve decision making.

The paper we will be working with, linked here, is:

Watson-Parris, D., Rao, Y., Olivié, D., Seland, Ø., … “ClimateBench v1.0: A benchmark for data-driven climate projections”. Journal of Advances in Modeling Earth Systems 14, e2021MS002954.

Accessing the ClimateBench Dataset

While the processed dataset is publicly available, generating it yourselves will be instructive and is an important part of the replication process. You will be given access to the Casper data analysis cluster at the National Center for Atmospheric Research (NCAR), which has sufficient resources to perform the analyses throughout the project. Please note that this is a national facility with shared resources, so be mindful of your requests and be sure to abide by their rules.

The data we will use is available from the sixth Coupled Model Intercomparison Project (CMIP6), which represents the combined efforts of dozens of international research laboratories running hundreds of thousands of simulation years of experiments. The data (all 30 petabytes!) is publicly archived and available, e.g. here, and was also recently mirrored to the cloud here. Fortunately, all the data you will need is already available on Casper, so you shouldn’t need to download any large datasets, which can be quite cumbersome.


Click the “topic” links below for details regarding the readings, questions, and tasks for that week.

Week    Topic
Summer  Summer preparation
1       Introduction to topic, domain, and paper
2       Dive into the ClimateBench dataset
3-4     Begin data preprocessing and learn about xarray
5-7     Start implementing regression models
8-9     Perform validation and testing of baselines
10      Project wrap up and debrief
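
To give a flavor of the preprocessing and baseline regression work in weeks 3-7, here is a minimal sketch on synthetic data. It is not the ClimateBench pipeline: the grid, the variable names (`tas`, `cumulative_co2`), and all numbers are assumptions invented for illustration. It uses NumPy only; it computes an area-weighted (cosine of latitude) global mean from a latitude-longitude field and fits a least-squares line relating that mean to cumulative emissions.

```python
import numpy as np

def global_mean(field, lat_deg):
    """Area-weighted mean of a (time, lat, lon) array over lat and lon."""
    weights = np.cos(np.deg2rad(lat_deg))   # cell area scales with cos(latitude)
    zonal_mean = field.mean(axis=-1)        # average over longitude -> (time, lat)
    return (zonal_mean * weights).sum(axis=-1) / weights.sum()

# Synthetic stand-in data (hypothetical values, not CMIP6 output)
rng = np.random.default_rng(0)
years = np.arange(2015, 2101)
lat = np.linspace(-87.5, 87.5, 36)
lon = np.linspace(0.0, 355.0, 72)
cumulative_co2 = np.linspace(0.0, 3000.0, years.size)  # hypothetical units

# Warming proportional to cumulative emissions, stronger toward the poles, plus noise
pattern = 1.0 + 0.5 * np.abs(np.sin(np.deg2rad(lat)))[:, None] * np.ones(lon.size)
noise = rng.normal(0.0, 0.05, (years.size, lat.size, lon.size))
tas = 0.0005 * cumulative_co2[:, None, None] * pattern + noise

# Reduce the gridded field to one global-mean value per year
gmt = global_mean(tas, lat)

# Simple linear baseline: global-mean temperature as a function of cumulative CO2
slope, intercept = np.polyfit(cumulative_co2, gmt, deg=1)
predicted = slope * cumulative_co2 + intercept
rmse = np.sqrt(np.mean((predicted - gmt) ** 2))
print(f"slope={slope:.6f}, rmse={rmse:.4f}")
```

In the project itself you would work with labeled xarray objects rather than bare arrays, but the weighting logic is the same.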

Phase II


Possible options for extension include:

  • Incorporating multiple climate models at different levels of fidelity
  • Improving the regression models or dimensionality reduction approaches, perhaps using graph-based regularization
  • Extending the variables that are emulated
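
As one illustration of what a dimensionality reduction step can look like, the sketch below applies principal component analysis (known in climate science as empirical orthogonal function analysis) through a singular value decomposition. It is only an assumed example on synthetic data; the function name `pca_reduce` and all array sizes are hypothetical, and it is not meant to suggest which approach an extension should take.

```python
import numpy as np

def pca_reduce(samples, n_components):
    """Project (n_samples, n_features) data onto its leading principal components."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal spatial patterns, ordered by explained variance
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T   # low-dimensional coordinates of each sample
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)
    return scores, components, mean, explained

# Synthetic fields built from two hidden spatial modes plus small noise
rng = np.random.default_rng(1)
n_time, n_cells = 50, 200
modes = rng.normal(size=(2, n_cells))
amplitudes = rng.normal(size=(n_time, 2)) * np.array([5.0, 2.0])
fields = amplitudes @ modes + rng.normal(0.0, 0.01, (n_time, n_cells))

scores, components, mean, explained = pca_reduce(fields, n_components=2)

# Map the reduced representation back to the full grid
reconstructed = scores @ components + mean
print(scores.shape, float(explained.sum()))
```

Here each 200-cell field is summarized by two numbers, which a regression model can take as inputs or predict as outputs in place of the full grid.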

Section & Group Participation

Participation in the weekly discussion section is mandatory. Each week during Phase I, you are responsible for doing the reading/task assigned in the schedule and submitting answers to the listed questions before the discussion section begins.

The weekly assigned questions help me see how you are all doing on the project, focus your work for the week, and prepare you for discussion. If you have questions about your work, please ask them in section, on Discord, or in office hours (I will rarely comment on your submitted answers).

Office Hours

Wednesdays 10-11am in HDSI room 325.

You’re also welcome to join our group meetings on Mondays at 1pm in Nierenberg Hall room 432 at SIO.