Utilities related to FFI bindings.
This module provides utilities to handle data across non-Rust interfaces, like other programming languages and the underlying operating system. It is mainly of use for FFI (Foreign Function Interface) bindings and code that needs to exchange C-like strings with other languages.
Rust represents owned strings with the String type, and borrowed slices of strings with the str primitive. Both are always in UTF-8 encoding, and may contain nul bytes in the middle, i.e., if you look at the bytes that make up the string, there may be a \0 among them. Both String and str store their length explicitly; there are no nul terminators at the end of strings like in C.
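As a small illustration of the two properties above, the following sketch reports the stored byte length of a string slice and whether it contains a nul byte. The helper name `describe` is hypothetical and chosen only for this example.

```rust
/// Returns the stored byte length of `s` and whether it contains a nul byte.
/// (`describe` is a hypothetical helper used only for illustration.)
fn describe(s: &str) -> (usize, bool) {
    // `len()` reads the stored length; it does not scan for a terminator.
    let byte_len = s.len();
    // A nul byte in the middle is ordinary data for a Rust string.
    let has_nul = s.bytes().any(|b| b == 0);
    (byte_len, has_nul)
}
```

Note that the length is counted in UTF-8 bytes, not in characters, so a non-ASCII character contributes more than one to the result.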
C strings are different from Rust strings:
Encodings - Rust strings are UTF-8, but C strings may use other encodings. If you are using a string from C, you should check its encoding explicitly, rather than just assuming that it is UTF-8 like you can do in Rust.
Character size - C strings may use char or wchar_t-sized characters; please note that C’s char is different from Rust’s. The C standard leaves the actual sizes of those types open to interpretation, but defines different APIs for strings made up of each character type. Rust strings are always UTF-8, so different Unicode characters will be encoded in a variable number of bytes each. The Rust type char represents a ‘Unicode scalar value’, which is similar to, but not the same as, a ‘Unicode code point’.
Nul terminators and implicit string lengths - Often, C strings are nul-terminated, i.e., they have a \0 character at the end. The length of a string buffer is not stored, but has to be calculated; to compute the length of a string, C code must manually call a function like strlen() for char-based strings, or wcslen() for wchar_t-based ones. Those functions return the number of characters in the string excluding the nul terminator, so the buffer length is really len+1 characters. Rust strings don’t have a nul terminator; their length is always stored and does not need to be calculated. In Rust, accessing a string’s length is an O(1) operation because the length is stored, whereas in C it is an O(n) operation because the length needs to be computed by scanning the string for the nul terminator.
Internal nul characters - When C strings have a nul terminator character, this usually means that they cannot have nul characters in the middle — a nul character would essentially truncate the string. Rust strings can have nul characters in the middle, because nul does not have to mark the end of the string in Rust.
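The terminator and interior-nul rules in the list above can be observed from Rust through CString. The sketch below is illustrative only; `c_lengths` is a hypothetical helper name, not part of the standard library.

```rust
use std::ffi::CString;

/// Returns (length without the terminator, buffer length with it),
/// or `None` if `text` contains a nul byte.
/// (`c_lengths` is a hypothetical helper used only for illustration.)
fn c_lengths(text: &str) -> Option<(usize, usize)> {
    // `CString::new` fails when the input contains a nul byte,
    // because C code would see the string as truncated at that point.
    let c_string = CString::new(text).ok()?;
    // The value a C `strlen()` call would report for this string.
    let len = c_string.as_bytes().len();
    // The real buffer also holds the trailing `\0`, so it is `len + 1`.
    let buffer_len = c_string.as_bytes_with_nul().len();
    Some((len, buffer_len))
}
```

For example, `c_lengths("hello")` yields a length of 5 and a buffer length of 6, while a string such as `"he\0llo"` is rejected.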
Representations of non-Rust strings
CString and CStr are useful when you need to transfer UTF-8 strings to and from languages with a C ABI, like Python.
From Rust to C: CString represents an owned, C-friendly string: it is nul-terminated, and has no internal nul characters. Rust code can create a CString out of a normal string (provided that the string doesn’t have nul characters in the middle), and then use a variety of methods to obtain a raw *mut u8 that can then be passed as an argument to functions which use the C conventions for strings.
From C to Rust: CStr represents a borrowed C string; it is what you would use to wrap a raw *const u8 that you got from a C function. A CStr is guaranteed to be a nul-terminated array of bytes. Once you have a CStr, you can convert it to a Rust &str if it’s valid UTF-8, or lossily convert it by adding replacement characters.
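A minimal sketch of both directions follows. It assumes no real C library: the raw pointer is simply taken from a CString and wrapped again, which stands in for a pointer received from C. The pointer is typed as `c_char`, the platform's C `char`, which is what the standard library methods actually use. The function names `round_trip` and `decode` are hypothetical.

```rust
use std::ffi::{c_char, CStr, CString};

/// Rust -> raw pointer -> Rust. Returns `None` if `text` contains a nul byte.
/// (`round_trip` is a hypothetical helper used only for illustration.)
fn round_trip(text: &str) -> Option<String> {
    // Owned, nul-terminated copy that C code could read.
    let owned = CString::new(text).ok()?;
    // Raw pointer of the kind a C function would receive or return.
    let ptr: *const c_char = owned.as_ptr();
    // SAFETY: `ptr` comes from `owned`, which is still alive here and is
    // guaranteed to be nul-terminated.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };
    Some(borrowed.to_string_lossy().into_owned())
}

/// Wraps bytes that already end in a nul and reports whether they are valid
/// UTF-8, together with a lossy conversion.
/// (`decode` is a hypothetical helper used only for illustration.)
fn decode(bytes_with_nul: &[u8]) -> Option<(bool, String)> {
    // Fails if the terminator is missing or a nul appears before the end.
    let c_str = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    let is_utf8 = c_str.to_str().is_ok();
    // Invalid sequences become U+FFFD replacement characters.
    Some((is_utf8, c_str.to_string_lossy().into_owned()))
}
```

In real FFI code the pointer must stay valid for as long as the C side uses it, so the CString has to outlive the call that receives its pointer.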
OsString and OsStr are useful when you need to transfer strings to and from the operating system itself, or when capturing the output of external commands. Conversions between OsString, OsStr and Rust strings work similarly to those for CString and CStr.
OsString losslessly represents an owned platform string. However, this representation is not necessarily in a form native to the platform. In the Rust standard library, various APIs that transfer strings to/from the operating system use OsString instead of plain strings. For example, env::var_os() is used to query environment variables; it returns an Option<OsString>. If the environment variable exists you will get a Some(os_string), which you can then try to convert to a Rust string. This yields a Result, so that your code can detect errors in case the environment variable did not in fact contain valid Unicode data.
OsStr losslessly represents a borrowed reference to a platform string. However, this representation is not necessarily in a form native to the platform. It can be converted into a UTF-8 Rust string slice in a similar way to OsString.
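The environment-variable flow described above might look like the following sketch. The helper names `read_var` and `borrowed_as_str` are hypothetical; only `env::var_os`, `OsString::into_string` and `OsStr::to_str` are standard library calls.

```rust
use std::env;
use std::ffi::{OsStr, OsString};

/// Looks up `name` and tries to convert the value to a Rust string.
/// `None` means the variable is unset; `Some(Err(_))` means it is set but
/// does not contain valid Unicode, and hands back the original `OsString`.
/// (`read_var` is a hypothetical helper used only for illustration.)
fn read_var(name: &str) -> Option<Result<String, OsString>> {
    env::var_os(name).map(|os_string| os_string.into_string())
}

/// Borrowed counterpart: `to_str` returns `None` instead of an error
/// when the platform string is not valid Unicode.
/// (`borrowed_as_str` is a hypothetical helper used only for illustration.)
fn borrowed_as_str(value: &OsStr) -> Option<&str> {
    value.to_str()
}
```

Keeping the failed conversion as an OsString means the caller can still pass the original value back to the operating system unchanged.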
Conversions
On Unix
On Unix, OsStr implements the std::os::unix::ffi::OsStrExt trait, which augments it with two methods, from_bytes and as_bytes. These do inexpensive conversions from and to byte slices.
Additionally, on Unix OsString implements the std::os::unix::ffi::OsStringExt trait, which provides from_vec and into_vec methods that consume their arguments, and take or produce vectors of u8.
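A minimal sketch of the Unix byte conversions follows. It compiles only on Unix targets, because the extension traits live in `std::os::unix`. The function names `bytes_round_trip` and `is_unicode` are hypothetical.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Borrows `raw` as an `OsStr`, copies it into an owned `OsString`,
/// and then takes the bytes back out.
/// (`bytes_round_trip` is a hypothetical helper used only for illustration.)
fn bytes_round_trip(raw: &[u8]) -> Vec<u8> {
    // `from_bytes` only reinterprets the slice; nothing is copied here.
    let borrowed: &OsStr = OsStr::from_bytes(raw);
    // `from_vec` consumes the vector and `into_vec` consumes the `OsString`.
    let owned: OsString = OsString::from_vec(borrowed.as_bytes().to_vec());
    owned.into_vec()
}

/// Reports whether the bytes form a platform string that is valid Unicode.
/// (`is_unicode` is a hypothetical helper used only for illustration.)
fn is_unicode(raw: &[u8]) -> bool {
    OsStr::from_bytes(raw).to_str().is_some()
}
```

Arbitrary bytes survive the round trip unchanged, even when they are not valid UTF-8 and therefore cannot be viewed as a `&str`.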
On Windows
An OsStr can be losslessly converted to a native Windows string. And a native Windows string can be losslessly converted to an OsString.
On Windows, OsStr implements the std::os::windows::ffi::OsStrExt trait, which provides an encode_wide method. This provides an iterator that can be collected into a vector of u16. After a nul character is appended, this is the same as a native Windows string.
Additionally, on Windows OsString implements the std::os::windows::ffi::OsStringExt trait, which provides a from_wide method to convert a native Windows string (without the terminating nul character) to an OsString.
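The wide-string conversions can be sketched as below. The `#[cfg(windows)]` functions use the real `encode_wide` and `from_wide` methods. The `#[cfg(not(windows))]` versions are stand-ins added only so the sketch also compiles on other platforms; they go through a lossy UTF-16 conversion and are not lossless. The names `to_wide_nul` and `from_wide_no_nul` are hypothetical.

```rust
use std::ffi::{OsStr, OsString};
use std::iter::once;

/// Encodes `s` as 16-bit units and appends the terminating nul.
#[cfg(windows)]
fn to_wide_nul(s: &OsStr) -> Vec<u16> {
    use std::os::windows::ffi::OsStrExt;
    s.encode_wide().chain(once(0)).collect()
}

/// Stand-in for non-Windows targets (assumption: lossy UTF-16 is acceptable
/// for demonstration purposes only).
#[cfg(not(windows))]
fn to_wide_nul(s: &OsStr) -> Vec<u16> {
    s.to_string_lossy().encode_utf16().chain(once(0)).collect()
}

/// Converts 16-bit units, without the terminating nul, to an `OsString`.
#[cfg(windows)]
fn from_wide_no_nul(wide: &[u16]) -> OsString {
    use std::os::windows::ffi::OsStringExt;
    OsString::from_wide(wide)
}

/// Stand-in for non-Windows targets (same assumption as above).
#[cfg(not(windows))]
fn from_wide_no_nul(wide: &[u16]) -> OsString {
    OsString::from(String::from_utf16_lossy(wide))
}
```

When converting back, the caller strips the trailing nul first, for example by slicing off the last element of the vector produced by `to_wide_nul`.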
c_longlong: Equivalent to C’s signed long long (long long) type.
c_ulonglong: Equivalent to C’s unsigned long long type.