Subsetting datasets

Subsetting is the action of filtering the dataset by its first dimension (dates).

start

This option sets the first date to keep in the dataset. You can pass a date, an integer or a string:

open_dataset(dataset, start=1980)

end

This option sets the last date to keep in the dataset. As with the start option, you can pass a date, an integer or a string:

open_dataset(dataset, end="2020-12-31")

Within each group below, the values are equivalent ways of specifying start or end:

  • 2020 and "2020"

  • 202306, "202306" and "2023-06"

  • 20200301, "20200301" and "2020-03-01"

Note that start="2020" is equivalent to start="2020-01-01", while end="2020" is equivalent to end="2020-12-31".

Note also that the frequency of the dataset changes how the end option is interpreted:

  • end="2020" with a frequency of one hour is equivalent to end="2020-12-31 23:00:00"

  • end="2020" with a frequency of 6 hours is equivalent to end="2020-12-31 18:00:00"
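The rules above can be illustrated with a small standalone sketch. The helper resolve_bound below is hypothetical and is not part of the library; it only mimics the behaviour described here, assuming that dates fall on a grid starting at midnight and that the frequency, given in hours, divides 24. The handling of a year-month value used as end (last day of that month) is also an assumption extrapolated from the year-only case.

```python
import calendar
from datetime import datetime


def resolve_bound(value, kind, frequency_hours=1):
    """Expand a shorthand such as 2020, "202306" or "2020-03-01" into a datetime.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    text = str(value).replace("-", "")  # 2020, "202306", "2020-03-01" -> digits only
    year = int(text[0:4])

    if kind == "start":
        # Missing parts default to the earliest possible value.
        month = int(text[4:6]) if len(text) >= 6 else 1
        day = int(text[6:8]) if len(text) >= 8 else 1
        return datetime(year, month, day)

    # kind == "end": missing parts default to the latest possible value.
    month = int(text[4:6]) if len(text) >= 6 else 12
    if len(text) >= 8:
        day = int(text[6:8])
    else:
        day = calendar.monthrange(year, month)[1]  # last day of that month

    # Last time step of the day on a grid that starts at 00:00.
    last_hour = (23 // frequency_hours) * frequency_hours
    return datetime(year, month, day, last_hour)


print(resolve_bound("2020", "start"))
print(resolve_bound("2020", "end", frequency_hours=1))
print(resolve_bound("2020", "end", frequency_hours=6))
```

With an hourly frequency the last call resolves to 23:00 on 31 December, and with a 6-hourly frequency to 18:00, matching the two cases listed above.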

frequency

You can change the frequency of the dataset by passing a string to the frequency option:

ds = open_dataset(dataset, frequency="6h")

The new frequency must be a multiple of the original frequency.
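One way to picture this constraint: lowering the frequency amounts to keeping every n-th date, which only works when n is a whole number. The sketch below is an assumption about the general idea, not the library's code; subsample_step is a hypothetical helper and frequencies are expressed in hours for simplicity.

```python
from datetime import datetime, timedelta


def subsample_step(original_hours, requested_hours):
    """Return how many original time steps make up one requested step.

    Hypothetical helper for illustration only.
    """
    if requested_hours % original_hours != 0:
        raise ValueError(
            f"{requested_hours}h is not a multiple of the original {original_hours}h"
        )
    return requested_hours // original_hours


# One day of hourly dates, standing in for the dataset's first dimension.
hourly = [datetime(2020, 1, 1) + timedelta(hours=h) for h in range(24)]

step = subsample_step(original_hours=1, requested_hours=6)
six_hourly = hourly[::step]  # keep every 6th date: 00:00, 06:00, 12:00, 18:00

print([d.hour for d in six_hourly])
```

Requesting, say, a 5-hourly frequency from a 2-hourly dataset raises an error in this sketch, because no whole number of original steps adds up to the new one.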

To artificially increase the frequency, you can use the interpolate_frequency option. This will create new dates in the dataset by linearly interpolating the data values between the original dates.

ds = open_dataset(dataset, interpolate_frequency="10m")
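To make the idea of linear interpolation between dates concrete, here is a minimal sketch on a single scalar series. It is not how the library implements interpolate_frequency; interpolate_linear is a hypothetical function, and it assumes that the new step evenly divides the gap between consecutive original dates.

```python
from datetime import datetime, timedelta


def interpolate_linear(dates, values, step):
    """Insert new dates every `step` and linearly interpolate the values.

    Hypothetical helper for illustration only.
    """
    new_dates, new_values = [], []
    for i in range(len(dates) - 1):
        d0, d1 = dates[i], dates[i + 1]
        v0, v1 = values[i], values[i + 1]
        span = (d1 - d0).total_seconds()
        current = d0
        while current < d1:
            # Weight goes from 0 at d0 towards 1 at d1.
            weight = (current - d0).total_seconds() / span
            new_dates.append(current)
            new_values.append(v0 + weight * (v1 - v0))
            current += step
    # The final original date is kept as is.
    new_dates.append(dates[-1])
    new_values.append(values[-1])
    return new_dates, new_values


# Two hourly samples, refined to a 10-minute step.
dates = [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 1)]
values = [0.0, 6.0]

fine_dates, fine_values = interpolate_linear(dates, values, timedelta(minutes=10))
for d, v in zip(fine_dates, fine_values):
    print(d.strftime("%H:%M"), round(v, 3))
```

The original dates keep their original values; only the dates in between are synthesised, so the result contains no information beyond what the coarser dataset already held.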