/*!
Provides a parser for [POSIX's `TZ` environment variable][posix-env].

NOTE: Sadly, at time of writing, the actual parser is in `src/shared/posix.rs`.
This is so it can be shared (via simple code copying) with proc macros like
the one found in `jiff-tzdb-static`. The parser populates a "lowest common
denominator" data type. In normal use in Jiff, this type is converted into
the types defined below. This module still does provide the various time zone
operations. Only the parsing is written elsewhere.

The `TZ` environment variable is most commonly used to set a time zone. For
example, `TZ=America/New_York`. But it can also be used to tersely define DST
transitions. Moreover, the format is not just used as an environment variable,
but is also included at the end of TZif files (version 2 or greater). The IANA
Time Zone Database project also [documents the `TZ` variable][iana-env] with
a little more commentary.

Note that we (along with pretty much everyone else) don't strictly follow
POSIX here. Namely, `TZ=America/New_York` isn't a POSIX compatible usage,
and I believe it technically should be `TZ=:America/New_York`. Nevertheless,
apparently some group of people (IANA folks?) decided `TZ=America/New_York`
should be fine. From the [IANA `theory.html` documentation][iana-env]:

> It was recognized that allowing the TZ environment variable to take on values
> such as 'America/New_York' might cause "old" programs (that expect TZ to have
> a certain form) to operate incorrectly; consideration was given to using
> some other environment variable (for example, TIMEZONE) to hold the string
> used to generate the TZif file's name. In the end, however, it was decided
> to continue using TZ: it is widely used for time zone purposes; separately
> maintaining both TZ and TIMEZONE seemed a nuisance; and systems where "new"
> forms of TZ might cause problems can simply use legacy TZ values such as
> "EST5EDT" which can be used by "new" programs as well as by "old" programs
> that assume pre-POSIX TZ values.

Indeed, even [musl subscribes to this behavior][musl-env]. So that's what we do
here too.
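
The name lookup described above can be sketched roughly as follows. This is
only an illustration and not Jiff's implementation: `iana_name_candidate` is
a hypothetical helper showing that `TZ=:America/New_York` and
`TZ=America/New_York` end up naming the same IANA time zone once the optional
`:` prefix is dropped.

```rust
/// Hypothetical helper (not part of Jiff): returns the string that would be
/// tried as an IANA time zone name for the given `TZ` value.
///
/// With a leading `:`, POSIX says the rest is implementation defined, and in
/// practice it is treated as an IANA name. Without the `:`, the whole value
/// is the candidate name, assuming it did not already parse as a POSIX time
/// zone string.
fn iana_name_candidate(tz: &str) -> &str {
    tz.strip_prefix(':').unwrap_or(tz)
}
```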

Note that a POSIX time zone like `EST5` corresponds to the UTC offset `-05:00`,
and `GMT-4` corresponds to the UTC offset `+04:00`. Yes, it's backwards. How
fun.
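
To make the sign flip concrete, here is a minimal sketch. The helper
`posix_hours_to_utc_offset_seconds` is hypothetical (it isn't part of Jiff)
and only handles whole hours. It assumes the signed hour count has already
been parsed out of the `TZ` string.

```rust
/// Hypothetical helper (not part of Jiff): converts the signed hour count
/// written in a POSIX time zone string into a UTC offset in seconds.
///
/// POSIX writes the number of hours to *add* to local time to get UTC, so the
/// UTC offset has the opposite sign: the `5` in `EST5` becomes `-05:00` and
/// the `-4` in `GMT-4` becomes `+04:00`.
fn posix_hours_to_utc_offset_seconds(posix_hours: i32) -> i32 {
    -posix_hours * 3_600
}
```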

# IANA v3+ Support

While this module and many of its types are directly associated with POSIX,
this module also plays a supporting role for `TZ` strings in the IANA TZif
binary format for versions 2 and greater. Specifically, for versions 3 and
greater, some minor extensions are supported here via `IanaTz::parse`. But
using `PosixTz::parse` is limited to parsing what is specified by POSIX.
Nevertheless, we generally use `IanaTz::parse` everywhere, even when parsing
the `TZ` environment variable. The reason for this is that it seems to be what
other programs do in practice (for example, GNU date).
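
As a rough sketch of one such extension: POSIX limits the hour of a DST
transition time to an unsigned value in `0..=24`, while TZif v3+ `TZ` strings
permit a signed hour with a magnitude of up to `167` (see RFC 9636). The
names below (`TzDialect` and `transition_hour_is_valid`) are made up for
illustration and do not correspond to anything in Jiff.

```rust
/// Hypothetical enum (not part of Jiff) naming the two `TZ` string flavors.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum TzDialect {
    /// Strictly what POSIX specifies.
    Posix,
    /// POSIX plus the extensions permitted in TZif version 3 and newer.
    IanaV3Plus,
}

/// Reports whether `hour` is an acceptable hour component for a DST
/// transition time (the part after the `/` in a rule) in the given dialect.
fn transition_hour_is_valid(dialect: TzDialect, hour: i16) -> bool {
    match dialect {
        // POSIX: no sign allowed, and at most 24 hours.
        TzDialect::Posix => (0..=24).contains(&hour),
        // TZif v3+: the hour may be signed, so a rule can end in `/-2`.
        TzDialect::IanaV3Plus => (-167..=167).contains(&hour),
    }
}
```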

# `no-std` and `no-alloc` support

A big part of this module works fine in core-only environments. But because
core-only environments provide no means of indirection, and embedding a
`PosixTimeZone` into a `TimeZone` without indirection would use up a lot of
space (and thereby make `Zoned` quite chunky), we provide core-only support
principally through a proc macro. Namely, a `PosixTimeZone` can be parsed by
the proc macro and then turned into static data.

POSIX time zone support isn't explicitly provided directly as a public API
for core-only environments, but is implicitly supported via TZif. (Since TZif
data contains POSIX time zone strings.)

[posix-env]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
[iana-env]: https://data.iana.org/time-zones/tzdb-2024a/theory.html#functions
[musl-env]: https://wiki.musl-libc.org/environment-variables
*/

use core::fmt::Debug;

use crate::{
    civil::DateTime,
    error::{err, Error, ErrorContext},
    shared,
    timestamp::Timestamp,
    tz::{
        timezone::TimeZoneAbbreviation, AmbiguousOffset, Dst, Offset,
        TimeZoneOffsetInfo, TimeZoneTransition,
    },
    util::{array_str::Abbreviation, escape::Bytes, parse},
};

/// The result of parsing the POSIX `TZ` environment variable.
///
/// A `TZ` variable can either be a time zone string with an optional DST
/// transition rule, or it can begin with a `:` followed by an arbitrary set of
/// bytes that is implementation defined.
///
/// In practice, the content following a `:` is treated as an IANA time zone
/// name. Moreover, even if the `TZ` string doesn't start with a `:` but
/// corresponds to an IANA time zone name, then it is interpreted as such.
/// (See the module docs.) However, this type only encapsulates the choices
/// strictly provided by POSIX: either a time zone string with an optional DST
/// transition rule, or an implementation defined string with a `:` prefix. If,
/// for example, `TZ="America/New_York"`, then that case isn't encapsulated by
/// this type. Callers needing that functionality will need to handle the error
/// returned by parsing this type and layer their own semantics on top.
#[cfg(feature = "tz-system")]
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum PosixTzEnv {
    /// A valid POSIX time zone with an optional DST transition rule.
    Rule(PosixTimeZoneOwned),
    /// An implementation defined string. This occurs when the `TZ` value
    /// starts with a `:`. The string returned here does not include the `:`.
    Implementation(alloc::boxed::Box<str>),
}

#[cfg(feature = "tz-system")]
impl PosixTzEnv {
    /// Parse a POSIX `TZ` environment variable string from the given bytes.
    fn parse(bytes: impl AsRef<[u8]>) -> Result<PosixTzEnv, Error> {
        let bytes = bytes.as_ref();
        if bytes.get(0) == Some(&b':') {
            let Ok(string) = core::str::from_utf8(&bytes[1..]) else {
                return Err(err!(
                    "POSIX time zone string with a ':' prefix contains \
                     invalid UTF-8: {:?}",
                    Bytes(&bytes[1..]),
                ));
            };
            Ok(PosixTzEnv::Implementation(string.into()))
        } else {
            PosixTimeZone::parse(bytes).map(PosixTzEnv::Rule)
        }
    }

    /// Parse a POSIX `TZ` environment variable string from the given `OsStr`.
    pub(crate) fn parse_os_str(
        osstr: impl AsRef<std::ffi::OsStr>,
    ) -> Result<PosixTzEnv, Error> {
        PosixTzEnv::parse(parse::os_str_bytes(osstr.as_ref())?)
    }
}

#[cfg(feature = "tz-system")]
impl core::fmt::Display for PosixTzEnv {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        match *self {
            PosixTzEnv::Rule(ref tz) => write!(f, "{tz}"),
            PosixTzEnv::Implementation(ref imp) => write!(f, ":{imp}"),
        }
    }
}

/// An owned POSIX time zone.
///
/// That is, a POSIX time zone whose abbreviations are inlined into the
/// representation. As opposed to a static POSIX time zone whose abbreviations
/// are `&'static str`.
pub(crate) type PosixTimeZoneOwned = PosixTimeZone<Abbreviation>;

/// A static POSIX time zone whose abbreviations are `&'static str`.
pub(crate) type PosixTimeZoneStatic = PosixTimeZone<&'static str>;

/// A POSIX time zone.
///
/// # On "reasonable" POSIX time zones
///
/// Jiff only supports "reasonable" POSIX time zones. A "reasonable" POSIX time
/// zone is a POSIX time zone that has a DST transition rule _when_ it has a
/// DST time zone abbreviation. Without the transition rule, it isn't possible
/// to know when DST starts and stops.
///
/// POSIX technically allows a DST time zone abbreviation *without* a
/// transition rule, but the behavior is literally unspecified. So Jiff just
/// rejects them.
///
/// Note that if you're confused as to why Jiff accepts `TZ=EST5EDT` (where
/// `EST5EDT` is an example of an _unreasonable_ POSIX time zone), that's
/// because Jiff rejects `EST5EDT` as a POSIX time zone and instead attempts
/// to use it as an IANA time zone identifier. And indeed, the IANA Time Zone
/// Database contains an entry for `EST5EDT` (presumably for legacy reasons).
///
/// We also expect `TZ` strings parsed from IANA v2+ formatted `tzfile`s to
/// be reasonable, otherwise parsing fails. This seems to be consistent with
/// the [GNU C Library]'s treatment of the `TZ` variable: it only documents
/// support for reasonable POSIX time zone strings.
///
/// Note that a V2 `TZ` string is precisely identical to a POSIX `TZ`
/// environment variable string. A V3 `TZ` string however supports signed DST
/// transition times, and hours in the range `0..=167`. The V2 and V3 here
/// reference how `TZ` strings are defined in the TZif format specified by
/// [RFC 9636]. V2 is the original version of it straight from POSIX, whereas
/// V3+ corresponds to an extension added to V3 (and newer versions) of the
/// TZif format. V3 is a superset of V2, so in practice, Jiff just permits
/// V3 everywhere.
///
/// [GNU C Library]: https://www.gnu.org/software/libc/manual/2.25/html_node/TZ-Variable.html
/// [RFC 9636]: https://datatracker.ietf.org/doc/rfc9636/
#[derive(Clone, Debug, Eq, PartialEq)]
// NOT part of Jiff's public API
#[doc(hidden)]
// This ensures the alignment of this type is always *at least* 8 bytes. This
// is required for the pointer tagging inside of `TimeZone` to be sound. At
// time of writing (2024-02-24), this explicit `repr` isn't required on 64-bit
// systems since the type definition is such that it will have an alignment of
// at least 8 bytes anyway. But this *is* required for 32-bit systems, where
// the type definition at present only has an alignment of 4 bytes.
#[repr(align(8))]
pub struct PosixTimeZone<ABBREV> {
    inner: shared::PosixTimeZone<ABBREV>,
}

impl PosixTimeZone<Abbreviation> {
    /// Parse an IANA tzfile v3+ `TZ` string from the given bytes.
    #[cfg(feature = "alloc")]
    pub(crate) fn parse(
        bytes: impl AsRef<[u8]>,
    ) -> Result<PosixTimeZoneOwned, Error> {
        let bytes = bytes.as_ref();
        let inner = shared::PosixTimeZone::parse(bytes.as_ref())
            .map_err(Error::shared)
            .map_err(|e| {
                e.context(err!("invalid POSIX TZ string {:?}", Bytes(bytes)))
            })?;
        Ok(PosixTimeZone { inner })
    }

    /// Like `parse`, but parses a POSIX TZ string from a prefix of the
    /// given input. Any remaining input is returned.
    #[cfg(feature = "alloc")]
    pub(crate) fn parse_prefix<'b, B: AsRef<[u8]> + ?Sized + 'b>(
        bytes: &'b B,
    ) -> Result<(PosixTimeZoneOwned, &'b [u8]), Error> {
        let bytes = bytes.as_ref();
        let (inner, remaining) =
            shared::PosixTimeZone::parse_prefix(bytes.as_ref())
                .map_err(Error::shared)
                .map_err(|e| {
                    e.context(err!(
                        "invalid POSIX TZ string {:?}",
                        Bytes(bytes)
                    ))
                })?;
        Ok((PosixTimeZone { inner }, remaining))
    }

    /// Converts from the shared-but-internal API for use in proc macros.
    #[cfg(feature = "alloc")]
    pub(crate) fn from_shared_owned(
        sh: shared::PosixTimeZone<Abbreviation>,
    ) -> PosixTimeZoneOwned {
        PosixTimeZone { inner: sh }
    }
}

impl PosixTimeZone<&'static str> {
    /// Converts from the shared-but-internal API for use in proc macros.
    ///
    /// This works in a `const` context by requiring that the time zone
    /// abbreviations are `static` strings. This is used when converting
    /// code generated by a proc macro to this Jiff internal type.
    pub(crate) const fn from_shared_const(
        sh: shared::PosixTimeZone<&'static str>,
    ) -> PosixTimeZoneStatic {
        PosixTimeZone { inner: sh }
    }
}

impl<ABBREV: AsRef<str> + Debug> PosixTimeZone<ABBREV> {
    /// Returns the appropriate time zone offset to use for the given
    /// timestamp.
    ///
    /// If you need information like whether the offset is in DST or not, or
    /// the time zone abbreviation, then use `PosixTimeZone::to_offset_info`.
    /// But that API may be more expensive to use, so only use it if you need
    /// the additional data.
    pub(crate) fn to_offset(&self, timestamp: Timestamp) -> Offset {
        Offset::from_ioffset_const(
            self.inner.to_offset(timestamp.to_itimestamp_const()),
        )
    }

    /// Returns the appropriate time zone offset to use for the given
    /// timestamp.
    ///
    /// This also includes whether the offset returned should be considered
    /// to be "DST" or not, along with the time zone abbreviation (e.g., EST
    /// for standard time in New York, and EDT for DST in New York).
    pub(crate) fn to_offset_info(
        &self,
        timestamp: Timestamp,
    ) -> TimeZoneOffsetInfo<'_> {
        let (ioff, abbrev, is_dst) =
            self.inner.to_offset_info(timestamp.to_itimestamp_const());
        let offset = Offset::from_ioffset_const(ioff);
        let abbreviation = TimeZoneAbbreviation::Borrowed(abbrev);
        TimeZoneOffsetInfo { offset, dst: Dst::from(is_dst), abbreviation }
    }

    /// Returns a possibly ambiguous timestamp for the given civil datetime.
    ///
    /// The given datetime should correspond to the "wall" clock time of what
    /// humans use to tell time for this time zone.
    ///
    /// Note that "ambiguous timestamp" is represented by the possible
    /// selection of offsets that could be applied to the given datetime. In
    /// general, it is only ambiguous around transitions to-and-from DST. The
    /// ambiguity can arise as a "fold" (when a particular wall clock time is
    /// repeated) or as a "gap" (when a particular wall clock time is skipped
    /// entirely).
    pub(crate) fn to_ambiguous_kind(&self, dt: DateTime) -> AmbiguousOffset {
        let iamoff = self.inner.to_ambiguous_kind(dt.to_idatetime_const());
        AmbiguousOffset::from_iambiguous_offset_const(iamoff)
    }

    /// Returns the timestamp of the most recent time zone transition prior
    /// to the timestamp given. If one doesn't exist, `None` is returned.
    pub(crate) fn previous_transition<'t>(
        &'t self,
        timestamp: Timestamp,
    ) -> Option<TimeZoneTransition<'t>> {
        let (its, ioff, abbrev, is_dst) =
            self.inner.previous_transition(timestamp.to_itimestamp_const())?;
        let timestamp = Timestamp::from_itimestamp_const(its);
        let offset = Offset::from_ioffset_const(ioff);
        let dst = Dst::from(is_dst);
        Some(TimeZoneTransition { timestamp, offset, abbrev, dst })
    }

    /// Returns the timestamp of the soonest time zone transition after the
    /// timestamp given. If one doesn't exist, `None` is returned.
    pub(crate) fn next_transition<'t>(
        &'t self,
        timestamp: Timestamp,
    ) -> Option<TimeZoneTransition<'t>> {
        let (its, ioff, abbrev, is_dst) =
            self.inner.next_transition(timestamp.to_itimestamp_const())?;
        let timestamp = Timestamp::from_itimestamp_const(its);
        let offset = Offset::from_ioffset_const(ioff);
        let dst = Dst::from(is_dst);
        Some(TimeZoneTransition { timestamp, offset, abbrev, dst })
    }
}

impl<ABBREV: AsRef<str>> core::fmt::Display for PosixTimeZone<ABBREV> {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        core::fmt::Display::fmt(&self.inner, f)
    }
}

// The tests below require parsing which requires alloc.
#[cfg(feature = "alloc")]
#[cfg(test)]
mod tests {
    use super::*;

    #[cfg(feature = "tz-system")]
    #[test]
    fn parse_posix_tz() {
        // We used to parse this and then error when we tried to
        // convert to a "reasonable" POSIX time zone with a DST
        // transition rule. We never actually used unreasonable POSIX
        // time zones and it was complicating the type definitions, so
        // now we just reject it outright.
        assert!(PosixTzEnv::parse("EST5EDT").is_err());

        let tz = PosixTzEnv::parse(":EST5EDT").unwrap();
        assert_eq!(tz, PosixTzEnv::Implementation("EST5EDT".into()));

        // We require implementation strings to be UTF-8, because we're
        // sensible.
        assert!(PosixTzEnv::parse(b":EST5\xFFEDT").is_err());
    }
}
365        assert!(PosixTzEnv::parse(b":EST5\xFFEDT").is_err());
366    }
367}