Struct Deserializer

pub struct Deserializer<'de, R, E: EntityResolver = PredefinedEntityResolver>
where
    R: XmlRead<'de>,
{
    reader: XmlReader<'de, R, E>,
    read: VecDeque<DeEvent<'de>>,
    write: VecDeque<DeEvent<'de>>,
    limit: Option<NonZeroUsize>,
    key_buf: String,
}

A structure that deserializes XML into Rust values.

Fields§

§reader: XmlReader<'de, R, E>

An XML reader that streams events into this deserializer.

§read: VecDeque<DeEvent<'de>>

When deserializing sequences, we sometimes have to skip unwanted events. Those events must be stored and then replayed. This is the replay buffer, which supplies events while it is not empty. Once it is exhausted, events are requested from Self::reader.

§write: VecDeque<DeEvent<'de>>

When deserializing sequences, we sometimes have to skip events, because XML is tolerant of element order: even if the XSD specifies the order strictly (using xs:sequence), most XML parsers allow order violations. That means that the elements forming a sequence can be interleaved with other elements that are not related to that sequence.

To support this, the deserializer scans events, skips the unwanted ones, and stores them here. After a call to Self::start_replay(), all events are moved from this buffer to Self::read.
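The interplay of the two buffers can be sketched with the standard library alone. This is a simplified model, not the crate's code: the `Replayer` type, the string events, and the `items_first` helper are all hypothetical, and the real deserializer stops at the end of the parent element rather than at the end of input.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the deserializer: events are plain strings.
struct Replayer {
    reader: std::vec::IntoIter<&'static str>,
    read: VecDeque<&'static str>,  // replay buffer
    write: VecDeque<&'static str>, // skip buffer
}

impl Replayer {
    fn new(events: Vec<&'static str>) -> Self {
        Self {
            reader: events.into_iter(),
            read: VecDeque::new(),
            write: VecDeque::new(),
        }
    }

    /// Replayed events come first; the reader is used only when `read` is empty.
    fn next(&mut self) -> Option<&'static str> {
        match self.read.pop_front() {
            Some(event) => Some(event),
            None => self.reader.next(),
        }
    }

    /// Sets an unwanted event aside in the skip buffer.
    fn skip_event(&mut self, event: &'static str) {
        self.write.push_back(event);
    }

    /// Moves every skipped event to the replay buffer, ahead of anything
    /// that is still waiting there.
    fn start_replay(&mut self) {
        self.write.append(&mut self.read);
        std::mem::swap(&mut self.read, &mut self.write);
    }
}

/// Yields all `item` events first, then the skipped events in their original order.
fn items_first(events: Vec<&'static str>) -> Vec<&'static str> {
    let mut de = Replayer::new(events);
    let mut out = Vec::new();
    while let Some(event) = de.next() {
        if event == "item" {
            out.push(event);
        } else {
            de.skip_event(event);
        }
    }
    de.start_replay();
    while let Some(event) = de.next() {
        out.push(event);
    }
    out
}

fn main() {
    let order = items_first(vec!["item", "other", "item", "tail"]);
    println!("{:?}", order);
}
```

Keeping skipped events in a separate buffer until replay starts means that events skipped during one pass are not handed back to the same pass.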

§limit: Option<NonZeroUsize>

Maximum number of events that can be skipped when processing sequences that occur out-of-order. This field is used to prevent potential denial-of-service (DoS) attacks which could cause infinite memory consumption when parsing a very large amount of XML into a sequence field.
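A minimal sketch of how such a limit could be enforced, assuming the check happens each time an event is added to the skip buffer. The `TooManyEvents` struct and both functions are hypothetical stand-ins written for this illustration; only the standard library is used.

```rust
use std::collections::VecDeque;
use std::num::NonZeroUsize;

/// Hypothetical error mirroring the role of `DeError::TooManyEvents`.
#[derive(Debug, PartialEq)]
struct TooManyEvents(NonZeroUsize);

/// Stores `event` in the skip buffer unless the buffer already holds `limit` events.
fn skip_event(
    write: &mut VecDeque<String>,
    limit: Option<NonZeroUsize>,
    event: String,
) -> Result<(), TooManyEvents> {
    if let Some(max) = limit {
        if write.len() >= max.get() {
            return Err(TooManyEvents(max));
        }
    }
    write.push_back(event);
    Ok(())
}

/// Tries to buffer `count` events and reports how many fit before an error.
fn buffered_before_error(limit: Option<NonZeroUsize>, count: usize) -> usize {
    let mut write = VecDeque::new();
    for i in 0..count {
        if skip_event(&mut write, limit, format!("event-{}", i)).is_err() {
            break;
        }
    }
    write.len()
}

fn main() {
    println!("{}", buffered_before_error(NonZeroUsize::new(2), 5));
    println!("{}", buffered_before_error(None, 5));
}
```

With `None` the buffer grows without bound, which is why a limit is advisable for untrusted input.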

§key_buf: String

Buffer that stores an attribute name as the field name exposed to serde consumers.

Implementations§

impl<'de, R, E> Deserializer<'de, R, E>
where R: XmlRead<'de>, E: EntityResolver,

fn new(reader: R, entity_resolver: E) -> Self

Create an XML deserializer from one of the possible quick_xml input sources.

Typically it is more convenient to use one of these methods instead:

  • Deserializer::from_str
  • Deserializer::from_reader

pub fn is_empty(&self) -> bool

Returns true if all events were consumed.

pub const fn get_ref(&self) -> &R

Returns the underlying XML reader.

use serde::Deserialize;
use quick_xml::de::Deserializer;
use quick_xml::Reader;

#[derive(Deserialize)]
struct SomeStruct {
    field1: String,
    field2: String,
}

// Try to deserialize from broken XML
let mut de = Deserializer::from_str(
    "<SomeStruct><field1><field2></SomeStruct>"
//   0                           ^= 28        ^= 41
);

let err = SomeStruct::deserialize(&mut de);
assert!(err.is_err());

let reader: &Reader<_> = de.get_ref().get_ref();

assert_eq!(reader.error_position(), 28);
assert_eq!(reader.buffer_position(), 41);

pub fn event_buffer_size(&mut self, limit: Option<NonZeroUsize>) -> &mut Self

Sets the maximum number of events that can be skipped during deserialization of sequences.

If <element> contains more than the specified number of nested elements, $text or CDATA nodes, then DeError::TooManyEvents is returned during deserialization of a sequence field (any type that uses deserialize_seq for deserialization, for example, Vec<T>).

This method can be used to prevent a DoS attack and unbounded memory consumption when parsing a very large XML document into a sequence field.

It is strongly recommended to set limit to some value when you parse data from untrusted sources. Choose a value that covers the number of events your typical XML documents can have between different elements that belong to the same sequence.

§Examples

Suppose we deserialize the following structure:

struct List {
  item: Vec<()>,
}

The XML that we try to parse looks like this:

<any-name>
  <item/>
  <!-- Buffering starts at this point -->
  <another-item>
    <some-element>with text</some-element>
    <yet-another-element/>
  </another-item>
  <!-- Buffer will be emptied at this point; 7 events were buffered -->
  <item/>
  <!-- There is nothing to buffer, because the elements follow each other -->
  <item/>
</any-name>

Here, when we deserialize the item field, we need to buffer 7 events before we can deserialize the second <item/>:

  • <another-item>
  • <some-element>
  • $text(with text)
  • </some-element>
  • <yet-another-element/> (virtual start event)
  • <yet-another-element/> (virtual end event)
  • </another-item>

Note that <yet-another-element/> is internally represented as 2 events: one for the start tag and one for the end tag. In the future this can be eliminated, but for now we use the auto-expanding feature of the reader, because this simplifies the deserializer code.
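The count of 7 can be reproduced with a small model of the event stream. The `Node` type and both functions below are hypothetical and exist only for this illustration; the one rule taken from the text is that an empty element yields a start event and an end event.

```rust
/// Hypothetical, simplified XML node used only for this illustration.
enum Node {
    Element { name: &'static str, children: Vec<Node> },
    Empty(&'static str),
    Text(&'static str),
}

/// Appends the events produced by `node`, expanding empty elements into start + end.
fn push_events(node: &Node, out: &mut Vec<String>) {
    match node {
        Node::Element { name, children } => {
            out.push(format!("<{}>", name));
            for child in children {
                push_events(child, out);
            }
            out.push(format!("</{}>", name));
        }
        Node::Empty(name) => {
            out.push(format!("<{}>", name));
            out.push(format!("</{}>", name));
        }
        Node::Text(text) => out.push(format!("$text({})", text)),
    }
}

/// Events buffered for the `<another-item>` subtree from the example above.
fn another_item_events() -> Vec<String> {
    let tree = Node::Element {
        name: "another-item",
        children: vec![
            Node::Element {
                name: "some-element",
                children: vec![Node::Text("with text")],
            },
            Node::Empty("yet-another-element"),
        ],
    };
    let mut events = Vec::new();
    push_events(&tree, &mut events);
    events
}

fn main() {
    for event in another_item_events() {
        println!("{}", event);
    }
}
```

When choosing a limit, each empty element between sequence items therefore counts twice.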

fn peek(&mut self) -> Result<&DeEvent<'de>, DeError>

fn next(&mut self) -> Result<DeEvent<'de>, DeError>

fn skip_checkpoint(&self) -> usize

Returns the mark after which all events skipped by calls to Self::skip() are replayed once Self::start_replay() is called.

fn skip(&mut self) -> Result<(), DeError>

Extracts an XML tree of events from the input and stores them in the skipped-events buffer, from which they can be retrieved later. You MUST call Self::start_replay() after calling this to give access to the skipped events and release internal buffers.

fn skip_event(&mut self, event: DeEvent<'de>) -> Result<(), DeError>

fn start_replay(&mut self, checkpoint: usize)

Moves the buffered events that were skipped after the given checkpoint from the Self::write skip buffer to the Self::read buffer.

After this method is called, Self::peek() and Self::next() start returning the events that were previously skipped by Self::skip(); only when all of those events have been consumed does the deserializer resume draining events from the underlying reader.

This method MUST be called if Self::skip() was called any number of times after Self::new() or start_replay(); otherwise events will be lost.
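The checkpoint semantics can be sketched as follows, assuming that a checkpoint is simply the length of the skip buffer at the moment it was taken. The free function and the `demo` helper are hypothetical; the real method operates on the deserializer's own fields.

```rust
use std::collections::VecDeque;

/// Moves events skipped after `checkpoint` from `write` to the front of `read`.
fn start_replay<T>(read: &mut VecDeque<T>, write: &mut VecDeque<T>, checkpoint: usize) {
    // Everything stored at or after the checkpoint belongs to the current sequence.
    let mut replay = write.split_off(checkpoint);
    // Events still waiting in `read` must come after the newly replayed ones.
    replay.append(read);
    *read = replay;
}

/// Returns the contents of (`read`, `write`) after replaying from a nested checkpoint.
fn demo() -> (Vec<&'static str>, Vec<&'static str>) {
    let mut read: VecDeque<&'static str> = VecDeque::from(vec!["pending"]);
    // "outer" was skipped by an enclosing sequence and must stay in `write`.
    let mut write: VecDeque<&'static str> = VecDeque::from(vec!["outer"]);

    let checkpoint = write.len();
    write.push_back("inner-1");
    write.push_back("inner-2");

    start_replay(&mut read, &mut write, checkpoint);
    (read.into_iter().collect(), write.into_iter().collect())
}

fn main() {
    let (read, write) = demo();
    println!("read = {:?}, write = {:?}", read, write);
}
```

Under this assumption, events skipped by an outer sequence stay in the skip buffer while an inner sequence replays only its own events.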

fn read_string(&mut self) -> Result<Cow<'de, str>, DeError>

fn read_string_impl( &mut self, allow_start: bool, ) -> Result<Cow<'de, str>, DeError>

Consumes consecutive Text and CData events (both referred to below as text) and merges them into one string. If there are no such events, returns an empty string.

If allow_start is false, then only text events are consumed; for other events an error is returned (see the table below).

If allow_start is true, then two or three events are expected:

  • DeEvent::Start
  • (optional) DeEvent::Text, whose content is returned
  • DeEvent::End; if there was no text event, an empty string is returned

The corresponding events are consumed.

§Handling events

The table below shows how events are handled by this method:

| Event | XML | Handling |
|---|---|---|
| DeEvent::Start | <tag>...</tag> | If allow_start == true, the result is determined by the second table; otherwise emits UnexpectedStart("tag") |
| DeEvent::End | </any-tag> | This situation is impossible; the method will panic if it happens |
| DeEvent::Text | text content or <![CDATA[cdata content]]> (probably mixed) | Returns event content unchanged |
| DeEvent::Eof | | Emits UnexpectedEof |

Second event, consumed if DeEvent::Start was received and allow_start == true:

| Event | XML | Handling |
|---|---|---|
| DeEvent::Start | <any-tag>...</any-tag> | Emits UnexpectedStart("any-tag") |
| DeEvent::End | </tag> | Returns an empty slice. The reader guarantees that the tag matches the open one |
| DeEvent::Text | text content or <![CDATA[cdata content]]> (probably mixed) | Returns event content unchanged, expects the </tag> after that |
| DeEvent::Eof | | Emits InvalidXml(IllFormed(MissingEndTag)) |
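The two tables can be read as a small state machine. The sketch below models them with hypothetical `Event` and `Error` types over a queue of events; it is not the crate's implementation. What happens when text is not followed by an end tag is not specified by the tables, so the handling of that case is an assumption.

```rust
use std::collections::VecDeque;

/// Hypothetical, owned stand-in for `DeEvent`.
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
    Eof,
}

/// Hypothetical stand-in for the relevant `DeError` cases.
#[derive(Debug, PartialEq)]
enum Error {
    UnexpectedStart(String),
    UnexpectedEof,
    MissingEndTag,
}

/// Pops the next event, reporting `Eof` once the queue is empty.
fn next(events: &mut VecDeque<Event>) -> Event {
    events.pop_front().unwrap_or(Event::Eof)
}

fn read_string_impl(events: &mut VecDeque<Event>, allow_start: bool) -> Result<String, Error> {
    match next(events) {
        // First table.
        Event::Text(text) => Ok(text),
        Event::Start(name) if !allow_start => Err(Error::UnexpectedStart(name)),
        Event::End(name) => unreachable!("unexpected </{}>", name),
        Event::Eof => Err(Error::UnexpectedEof),
        // Second table: the event that follows an allowed start tag.
        Event::Start(_) => match next(events) {
            Event::Start(name) => Err(Error::UnexpectedStart(name)),
            Event::End(_) => Ok(String::new()),
            Event::Eof => Err(Error::MissingEndTag),
            Event::Text(text) => match next(events) {
                Event::End(_) => Ok(text),
                // Assumption: anything other than the end tag is an error here.
                Event::Start(name) => Err(Error::UnexpectedStart(name)),
                _ => Err(Error::MissingEndTag),
            },
        },
    }
}

fn main() {
    let mut events = VecDeque::from(vec![
        Event::Start("tag".to_string()),
        Event::Text("content".to_string()),
        Event::End("tag".to_string()),
    ]);
    println!("{:?}", read_string_impl(&mut events, true));
}
```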

fn read_text(&mut self, name: QName<'_>) -> Result<Cow<'de, str>, DeError>

Consumes one DeEvent::Text event and ensures that it is followed by a DeEvent::End event.

§Parameters
  • name: the name of the tag opened before reading the text. The corresponding end tag must appear in the input immediately after the text.

fn read_to_end(&mut self, name: QName<'_>) -> Result<(), DeError>

Drops all events up to and including the end event with the name name. This method should be called after Self::next().
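A minimal sketch of this kind of draining, using hypothetical `Event` and `MissingEndTag` types. It assumes the start event was already consumed by the caller and that nested elements with the same name are tracked with a depth counter; the text above does not state how the crate handles nesting.

```rust
use std::collections::VecDeque;

/// Hypothetical, owned stand-in for `DeEvent`.
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Hypothetical error for input that ends before the closing tag.
#[derive(Debug, PartialEq)]
struct MissingEndTag;

/// Drops events up to and including the `End` that closes the already-opened `name`.
fn read_to_end(events: &mut VecDeque<Event>, name: &str) -> Result<(), MissingEndTag> {
    // Number of nested elements with the same name that are still open.
    let mut depth = 0usize;
    loop {
        match events.pop_front() {
            Some(Event::Start(n)) if n == name => depth += 1,
            Some(Event::End(n)) if n == name => {
                if depth == 0 {
                    return Ok(());
                }
                depth -= 1;
            }
            // Any other event is simply dropped.
            Some(_) => {}
            None => return Err(MissingEndTag),
        }
    }
}

fn main() {
    // The opening <tag> is assumed to have been consumed already.
    let mut events = VecDeque::from(vec![
        Event::Text("ignored".to_string()),
        Event::Start("tag".to_string()),
        Event::End("tag".to_string()),
        Event::End("tag".to_string()),
        Event::Start("next".to_string()),
    ]);
    println!("{:?}, remaining: {:?}", read_to_end(&mut events, "tag"), events);
}
```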

impl<'de> Deserializer<'de, SliceReader<'de>>

pub fn from_str(source: &'de str) -> Self

Creates a new deserializer that borrows data from the specified string.

A deserializer created with this method will not resolve custom entities.

impl<'de, E> Deserializer<'de, SliceReader<'de>, E>
where E: EntityResolver,

pub fn from_str_with_resolver(source: &'de str, entity_resolver: E) -> Self

Creates a new deserializer that borrows data from the specified string and uses the specified entity resolver.

impl<'de, R> Deserializer<'de, IoReader<R>>
where R: BufRead,

pub fn from_reader(reader: R) -> Self

Creates a new deserializer that copies data from the specified reader into an internal buffer.

If you already have a string, use Self::from_str instead, because it borrows instead of copying. If you have a &[u8] that is known to represent UTF-8, you can decode it first and then use from_str.

A deserializer created with this method will not resolve custom entities.

impl<'de, R, E> Deserializer<'de, IoReader<R>, E>
where R: BufRead, E: EntityResolver,

pub fn with_resolver(reader: R, entity_resolver: E) -> Self

Creates a new deserializer that copies data from the specified reader into an internal buffer and uses the specified entity resolver.

If you already have a string, use Self::from_str instead, because it borrows instead of copying. If you have a &[u8] that is known to represent UTF-8, you can decode it first and then use from_str.

Trait Implementations§

impl<'de, 'a, R, E> Deserializer<'de> for &'a mut Deserializer<'de, R, E>
where R: XmlRead<'de>, E: EntityResolver,

fn deserialize_char<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Characters are represented as strings.

fn deserialize_string<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Owned strings are represented the same way as non-owned ones.

fn deserialize_bytes<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Forwards deserialization to deserialize_any.

fn deserialize_byte_buf<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Forwards deserialization to deserialize_bytes.

fn deserialize_unit_struct<V>( self, _name: &'static str, visitor: V, ) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Named units are represented the same way as unnamed units.

fn deserialize_tuple<V>( self, _len: usize, visitor: V, ) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Tuples are represented the same way as sequences.

fn deserialize_tuple_struct<V>( self, _name: &'static str, len: usize, visitor: V, ) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Named tuples are represented the same way as unnamed tuples.

fn deserialize_map<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Forwards deserialization to deserialize_struct with an empty name and no fields.

fn deserialize_identifier<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Identifiers are represented as strings.

fn deserialize_ignored_any<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Forwards deserialization to deserialize_unit.

fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

A unit is represented in XML as an xs:element or as text/CDATA content. Any content inside the xs:element is ignored and skipped.

Produces a unit struct from any of the following inputs:

  • any <tag ...>...</tag>
  • any <tag .../>
  • any consecutive text / CDATA content (can consist of several parts delimited by comments and processing instructions)

§Events handling

| Event | XML | Handling |
|---|---|---|
| DeEvent::Start | <tag>...</tag> | Calls visitor.visit_unit(), consumes all events up to and including the corresponding End event |
| DeEvent::End | </tag> | This situation is impossible; the method will panic if it happens |
| DeEvent::Text | text content or <![CDATA[cdata content]]> (probably mixed) | Calls visitor.visit_unit(). The content is ignored |
| DeEvent::Eof | | Emits UnexpectedEof |

fn deserialize_newtype_struct<V>( self, _name: &'static str, visitor: V, ) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Forwards deserialization to the inner type. Always calls Visitor::visit_newtype_struct with the same deserializer.

type Error = DeError

The error type that can be returned if some error occurs during deserialization.

fn deserialize_i8<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an i8 value.

fn deserialize_i16<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an i16 value.

fn deserialize_i32<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an i32 value.

fn deserialize_i64<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an i64 value.

fn deserialize_u8<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a u8 value.

fn deserialize_u16<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a u16 value.

fn deserialize_u32<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a u32 value.

fn deserialize_u64<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a u64 value.

fn deserialize_i128<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an i128 value.

fn deserialize_u128<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a u128 value.

fn deserialize_f32<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an f32 value.

fn deserialize_f64<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an f64 value.

fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a bool value.

fn deserialize_str<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a string value and does not benefit from taking ownership of buffered data owned by the Deserializer.

fn deserialize_struct<V>( self, _name: &'static str, fields: &'static [&'static str], visitor: V, ) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a struct with a particular name and fields.

fn deserialize_enum<V>( self, _name: &'static str, _variants: &'static [&'static str], visitor: V, ) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an enum value with a particular name and possible variants.

fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting a sequence of values.

fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Hint that the Deserialize type is expecting an optional value.

fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, DeError>
where V: Visitor<'de>,

Require the Deserializer to figure out how to drive the visitor based on what data type is in the input.

fn is_human_readable(&self) -> bool

Determine whether Deserialize implementations should expect to deserialize their human-readable form.

impl<'de, 'a, R, E> SeqAccess<'de> for &'a mut Deserializer<'de, R, E>
where R: XmlRead<'de>, E: EntityResolver,

An accessor to sequence elements forming a value for top-level sequence of XML elements.

Technically, multiple top-level elements violate the XML rule of a single top-level element, but we treat them as several concatenated XML documents.

type Error = DeError

The error type that can be returned if some error occurs during deserialization.

fn next_element_seed<T>( &mut self, seed: T, ) -> Result<Option<T::Value>, Self::Error>
where T: DeserializeSeed<'de>,

This returns Ok(Some(value)) for the next value in the sequence, or Ok(None) if there are no more remaining items.

fn next_element<T>(&mut self) -> Result<Option<T>, Self::Error>
where T: Deserialize<'de>,

This returns Ok(Some(value)) for the next value in the sequence, or Ok(None) if there are no more remaining items.

fn size_hint(&self) -> Option<usize>

Returns the number of elements remaining in the sequence, if known.

Auto Trait Implementations§

impl<'de, R, E> Freeze for Deserializer<'de, R, E>
where R: Freeze, E: Freeze,

impl<'de, R, E = PredefinedEntityResolver> !RefUnwindSafe for Deserializer<'de, R, E>

impl<'de, R, E> Send for Deserializer<'de, R, E>
where R: Send, E: Send,

impl<'de, R, E> Sync for Deserializer<'de, R, E>
where R: Sync, E: Sync,

impl<'de, R, E> Unpin for Deserializer<'de, R, E>
where R: Unpin, E: Unpin,

impl<'de, R, E = PredefinedEntityResolver> !UnwindSafe for Deserializer<'de, R, E>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.