Handling missing values

Missing values (NaNs) are a problem when preparing data for machine learning models, because models need complete data to work properly and may crash without it. Ideally, every field contains complete data.

However, NaNs occur naturally in some cases, for example in variables that are only relevant on land or at sea, such as sea surface temperature (sst). By default, data containing NaNs is rejected as invalid. To accept NaNs and compute statistics correctly in their presence, add the allow_nans key to the configuration and list the variables that may contain them.

Here is an example configuration:

statistics:
  allow_nans: [sst, ci]
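
The behavior described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the tool's actual implementation: the function name compute_statistics, the dictionary layout, and the statistic names are hypothetical, and NumPy's NaN-aware reductions (nanmean, nanstd, nanmin, nanmax) stand in for whatever the real statistics code does. Variables not listed in allow_nans are rejected if they contain NaNs. Listed variables have their NaNs skipped when the statistics are computed.

```python
import numpy as np


def compute_statistics(fields, allow_nans=()):
    """Hypothetical helper: per-variable statistics with an allow-list for NaNs.

    fields     -- dict mapping a variable name to an array of values
    allow_nans -- names of variables that are permitted to contain NaNs
    """
    stats = {}
    for name, values in fields.items():
        values = np.asarray(values, dtype=float)

        # Default behavior: NaNs make the data invalid unless the variable is allowed.
        if np.isnan(values).any() and name not in allow_nans:
            raise ValueError(
                f"Variable {name!r} contains NaNs and is not listed in allow_nans"
            )

        # NaN-aware reductions ignore missing points instead of propagating NaN.
        stats[name] = {
            "mean": float(np.nanmean(values)),
            "stdev": float(np.nanstd(values)),
            "minimum": float(np.nanmin(values)),
            "maximum": float(np.nanmax(values)),
        }
    return stats


# Made-up values for illustration; sst is undefined (NaN) over land.
fields = {
    "2t": np.array([280.0, 285.0, 290.0]),
    "sst": np.array([np.nan, 290.0, 292.0]),
}

stats = compute_statistics(fields, allow_nans=["sst"])
print(stats["sst"])
```

Calling compute_statistics(fields) without the allow_nans argument raises a ValueError for sst, which mirrors the default rejection of data containing NaNs.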