RtlUTF8ToUnicodeN - NtDoc

Native API online documentation, based on the System Informer (formerly Process Hacker) phnt headers
#ifndef _NTRTL_H

/**
 * The RtlUTF8ToUnicodeN routine translates the specified source string into a Unicode string, using the 8-bit Unicode Transformation Format (UTF-8) code page.
 *
 * \param UnicodeStringDestination Pointer to a caller-allocated buffer to receive the translated string.
 * \param UnicodeStringMaxByteCount Maximum number of bytes to be written at UnicodeStringDestination. If this value causes the translated string to be truncated, RtlUTF8ToUnicodeN returns an error status.
 * \param UnicodeStringActualByteCount Pointer to a caller-allocated variable that receives the length, in bytes, of the translated string. This parameter can be NULL.
 * \param UTF8StringSource Pointer to the UTF-8 source string to be translated.
 * \param UTF8StringByteCount Size, in bytes, of the string at UTF8StringSource.
 * \return NTSTATUS Successful or errant status.
 * \sa https://learn.microsoft.com/en-us/windows/win32/devnotes/rtlutf8tounicoden
 */
NTSYSAPI
NTSTATUS
NTAPI
RtlUTF8ToUnicodeN(
    _Out_writes_bytes_to_(UnicodeStringMaxByteCount, *UnicodeStringActualByteCount) PWSTR UnicodeStringDestination,
    _In_ ULONG UnicodeStringMaxByteCount,
    _Out_opt_ PULONG UnicodeStringActualByteCount,
    _In_reads_bytes_(UTF8StringByteCount) PCCH UTF8StringSource,
    _In_ ULONG UTF8StringByteCount
    );

#endif

// ntifs.h

NTSYSAPI NTSTATUS RtlUTF8ToUnicodeN(
  [out, optional] PWSTR  UnicodeStringDestination,
  [in]            ULONG  UnicodeStringMaxByteCount,
  [out]           PULONG UnicodeStringActualByteCount,
  [in]            PCCH   UTF8StringSource,
  [in]            ULONG  UTF8StringByteCount
);
View the official Windows Driver Kit DDI reference
// wdm.h

NTSYSAPI NTSTATUS RtlUTF8ToUnicodeN(
  [out, optional] PWSTR  UnicodeStringDestination,
  [in]            ULONG  UnicodeStringMaxByteCount,
  [out]           PULONG UnicodeStringActualByteCount,
  [in]            PCCH   UTF8StringSource,
  [in]            ULONG  UTF8StringByteCount
);
View the official Windows Driver Kit DDI reference
// wdm.h

NTSTATUS WINAPI RtlUTF8ToUnicodeN(
  _Out_     PWSTR  UnicodeStringDestination,
  _In_      ULONG  UnicodeStringMaxByteCount,
  _Out_opt_ PULONG UnicodeStringActualByteCount,
  _In_      PCCH   UTF8StringSource,
  _In_      ULONG  UTF8StringByteCount
);
View the official Win32 development documentation

NtDoc

This function is documented in the Windows Driver Kit in two places: the ntifs.h and the wdm.h DDI references.

Windows Driver Kit DDI reference (nf-ntifs-rtlutf8tounicoden)

RtlUTF8ToUnicodeN function (ntifs.h)

Description

The RtlUTF8ToUnicodeN routine converts a UTF-8 string to a Unicode string.

Parameters

UnicodeStringDestination [out, optional]

A pointer to a caller-allocated destination buffer into which the routine writes the Unicode output string. If this parameter is NULL, the routine writes the required size of the output buffer to *UnicodeStringActualByteCount.

UnicodeStringMaxByteCount [in]

Specifies the maximum number of bytes that the routine can write to the buffer that UnicodeStringDestination points to. If UnicodeStringDestination = NULL, set UnicodeStringMaxByteCount = 0.

UnicodeStringActualByteCount [out]

A pointer to a location into which the routine writes the actual number of bytes that the routine has written to the buffer that UnicodeStringDestination points to. If UnicodeStringDestination is non-NULL, this count never exceeds the value of UnicodeStringMaxByteCount. If UnicodeStringDestination is NULL, this count is the number of bytes that are required to contain the entire output string.

UTF8StringSource [in]

A pointer to the UTF-8 source string.

UTF8StringByteCount [in]

Specifies the number of bytes in the UTF-8 source string that the UTF8StringSource parameter points to.

Return value

RtlUTF8ToUnicodeN returns STATUS_SUCCESS if the call is successful and all UTF-8 character codes in the input string were converted to the corresponding Unicode character codes in the output string. It returns STATUS_SOME_NOT_MAPPED if the call is successful but one or more input characters were invalid and were converted to the Unicode replacement character, U+FFFD. Possible error return values include the following error codes:

Return code                 Description
STATUS_BUFFER_TOO_SMALL     The UnicodeStringMaxByteCount parameter specifies a buffer size that is too small to contain the entire output string.
STATUS_INVALID_PARAMETER    The UnicodeStringDestination and UnicodeStringActualByteCount parameters are both NULL.
STATUS_INVALID_PARAMETER_4  The UTF8StringSource parameter is NULL.

Remarks

The Unicode output string is null-terminated only if the UTF-8 input string is null-terminated.
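
Because termination of the output follows the input, the value passed as UTF8StringByteCount decides whether the converted string ends in a null character. The following is a minimal sketch of that choice; the helper name utf8_source_byte_count is hypothetical and does not come from any Windows header, and a real caller would also need to confirm that the result fits in a ULONG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pick the value to pass as UTF8StringByteCount.
 * include_terminator != 0 counts the trailing '\0', so the converted
 * output would also end in a null character; 0 leaves it out, so the
 * output would not be null-terminated. */
size_t utf8_source_byte_count(const char *utf8, int include_terminator)
{
    assert(utf8 != NULL);
    size_t len = strlen(utf8);            /* bytes before the terminator */
    return include_terminator ? len + 1 : len;
}
```

Passing the larger count is the simpler option when the result will be handed to code that expects a null-terminated wide string.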

The routine returns STATUS_BUFFER_TOO_SMALL if the UnicodeStringMaxByteCount parameter specifies a buffer size that is too small to contain the entire output string. In this case, the routine writes as many Unicode characters as will fit in the buffer, and the *UnicodeStringActualByteCount value specifies the number of valid bytes that the routine has written to the buffer. The partial string that is contained in the output buffer might not include a terminating null character.

You can make an initial call to RtlUTF8ToUnicodeN to obtain the required output buffer size, and then call RtlUTF8ToUnicodeN again to obtain the Unicode output string. In the initial call, set UnicodeStringDestination = NULL and UnicodeStringMaxByteCount = 0, and the routine will write the required buffer size to *UnicodeStringActualByteCount. Next, allocate a buffer of the required size and call RtlUTF8ToUnicodeN a second time to obtain the Unicode output string.
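
The two-call pattern can be sketched as follows. So that the example builds outside the WDK, it uses a HYPOTHETICAL stand-in, demo_utf8_to_utf16, which takes its parameters in the same order as RtlUTF8ToUnicodeN but only widens ASCII bytes; it models the sizing contract described above and is not the real routine. The DEMO_ status constants are local copies of the documented NTSTATUS values. On Windows, the real routine would be called in place of the stand-in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the NTSTATUS values, so the sketch builds anywhere. */
#define DEMO_STATUS_SUCCESS              0x00000000u
#define DEMO_STATUS_INVALID_PARAMETER    0xC000000Du
#define DEMO_STATUS_BUFFER_TOO_SMALL     0xC0000023u
#define DEMO_STATUS_INVALID_PARAMETER_4  0xC00000F2u

/* HYPOTHETICAL stand-in with the parameter order of RtlUTF8ToUnicodeN.
 * ASCII only: one input byte becomes one 16-bit unit. */
uint32_t demo_utf8_to_utf16(uint16_t *dst, uint32_t dst_max_bytes,
                            uint32_t *actual_bytes,
                            const char *src, uint32_t src_bytes)
{
    if (dst == NULL && actual_bytes == NULL) return DEMO_STATUS_INVALID_PARAMETER;
    if (src == NULL) return DEMO_STATUS_INVALID_PARAMETER_4;

    if (dst == NULL) {                       /* size query */
        *actual_bytes = src_bytes * 2u;
        return DEMO_STATUS_SUCCESS;
    }

    uint32_t units = dst_max_bytes / 2u;     /* whole 16-bit units that fit */
    if (units > src_bytes) units = src_bytes;
    for (uint32_t i = 0; i < units; i++)
        dst[i] = (unsigned char)src[i];
    if (actual_bytes != NULL) *actual_bytes = units * 2u;
    return units < src_bytes ? DEMO_STATUS_BUFFER_TOO_SMALL : DEMO_STATUS_SUCCESS;
}

/* Two-call pattern: query the size, allocate, then convert.
 * Returns a malloc'ed buffer (caller frees) or NULL on failure. */
uint16_t *demo_convert_two_pass(const char *src, uint32_t src_bytes,
                                uint32_t *out_bytes)
{
    assert(out_bytes != NULL);
    uint32_t needed = 0;

    /* First call: no destination, zero capacity. */
    if (demo_utf8_to_utf16(NULL, 0, &needed, src, src_bytes) != DEMO_STATUS_SUCCESS)
        return NULL;

    uint16_t *buf = malloc(needed ? needed : sizeof *buf);
    if (buf == NULL) return NULL;

    /* Second call: buffer of exactly the reported size. */
    if (demo_utf8_to_utf16(buf, needed, out_bytes, src, src_bytes) != DEMO_STATUS_SUCCESS) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Calling demo_convert_two_pass("abc", 4, &n) converts the three letters plus the terminator, so the buffer holds four 16-bit units and n receives the byte count of the output.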

RtlUTF8ToUnicodeN supports Unicode surrogate pairs. However, a surrogate leading word value that is not followed by a trailing word value, or a trailing word value that is not preceded by a leading word value, is not recognized as a valid character and is replaced by the Unicode replacement character, U+FFFD.

RtlUTF8ToUnicodeN continues to convert the input string to an output string until it reaches the end of the source buffer or the end of the destination buffer, whichever occurs first. The routine converts any null characters in the input string to null characters in the output string. If the input string contains a terminating null character, but the null character is not located at the end of the source buffer, the routine continues past the terminating null character until it reaches the end of the available buffer space.

The RtlUnicodeToUTF8N routine converts a Unicode string to a UTF-8 string.

You can use the RtlUTF8ToUnicodeN and RtlUnicodeToUTF8N routines to perform a lossless conversion of valid text strings between the UTF-8 and Unicode formats. However, strings that have arbitrary data values are likely to violate the Unicode rules for encoding surrogate pairs, and any information that is contained in the invalid values in an input string is lost and cannot be recovered from the resulting output string.

See also

RtlUnicodeToUTF8N


Windows Driver Kit DDI reference (nf-wdm-rtlutf8tounicoden)

RtlUTF8ToUnicodeN function (wdm.h)

Description

The RtlUTF8ToUnicodeN routine converts a UTF-8 string to a Unicode string.

Parameters

UnicodeStringDestination [out, optional]

A pointer to a caller-allocated destination buffer into which the routine writes the Unicode output string. If this parameter is NULL, the routine writes the required size of the output buffer to *UnicodeStringActualByteCount.

UnicodeStringMaxByteCount [in]

Specifies the maximum number of bytes that the routine can write to the buffer that UnicodeStringDestination points to. If UnicodeStringDestination = NULL, set UnicodeStringMaxByteCount = 0.

UnicodeStringActualByteCount [out]

A pointer to a location into which the routine writes the actual number of bytes that the routine has written to the buffer that UnicodeStringDestination points to. If UnicodeStringDestination is non-NULL, this count never exceeds the value of UnicodeStringMaxByteCount. If UnicodeStringDestination is NULL, this count is the number of bytes that are required to contain the entire output string.

UTF8StringSource [in]

A pointer to the UTF-8 source string.

UTF8StringByteCount [in]

Specifies the number of bytes in the UTF-8 source string that the UTF8StringSource parameter points to.

Return value

RtlUTF8ToUnicodeN returns STATUS_SUCCESS if the call is successful and all UTF-8 character codes in the input string were converted to the corresponding Unicode character codes in the output string. It returns STATUS_SOME_NOT_MAPPED if the call is successful but one or more input characters were invalid and were converted to the Unicode replacement character, U+FFFD. Possible error return values include the following error codes:

Return code                 Description
STATUS_BUFFER_TOO_SMALL     The UnicodeStringMaxByteCount parameter specifies a buffer size that is too small to contain the entire output string.
STATUS_INVALID_PARAMETER    The UnicodeStringDestination and UnicodeStringActualByteCount parameters are both NULL.
STATUS_INVALID_PARAMETER_4  The UTF8StringSource parameter is NULL.

Remarks

The Unicode output string is null-terminated only if the UTF-8 input string is null-terminated.

The routine returns STATUS_BUFFER_TOO_SMALL if the UnicodeStringMaxByteCount parameter specifies a buffer size that is too small to contain the entire output string. In this case, the routine writes as many Unicode characters as will fit in the buffer, and the *UnicodeStringActualByteCount value specifies the number of valid bytes that the routine has written to the buffer. The partial string that is contained in the output buffer might not include a terminating null character.

You can make an initial call to RtlUTF8ToUnicodeN to obtain the required output buffer size, and then call RtlUTF8ToUnicodeN again to obtain the Unicode output string. In the initial call, set UnicodeStringDestination = NULL and UnicodeStringMaxByteCount = 0, and the routine will write the required buffer size to *UnicodeStringActualByteCount. Next, allocate a buffer of the required size and call RtlUTF8ToUnicodeN a second time to obtain the Unicode output string.

RtlUTF8ToUnicodeN supports Unicode surrogate pairs. However, a surrogate leading word value that is not followed by a trailing word value, or a trailing word value that is not preceded by a leading word value, is not recognized as a valid character and is replaced by the Unicode replacement character, U+FFFD.

RtlUTF8ToUnicodeN continues to convert the input string to an output string until it reaches the end of the source buffer or the end of the destination buffer, whichever occurs first. The routine converts any null characters in the input string to null characters in the output string. If the input string contains a terminating null character, but the null character is not located at the end of the source buffer, the routine continues past the terminating null character until it reaches the end of the available buffer space.

The RtlUnicodeToUTF8N routine converts a Unicode string to a UTF-8 string.

You can use the RtlUTF8ToUnicodeN and RtlUnicodeToUTF8N routines to perform a lossless conversion of valid text strings between the UTF-8 and Unicode formats. However, strings that have arbitrary data values are likely to violate the Unicode rules for encoding surrogate pairs, and any information that is contained in the invalid values in an input string is lost and cannot be recovered from the resulting output string.

See also

RtlUnicodeToUTF8N


Win32 development documentation (rtlutf8tounicoden)

RtlUTF8ToUnicodeN function

Translates the specified source string into a Unicode string, using the 8-bit Unicode Transformation Format (UTF-8) code page.

Parameters

UnicodeStringDestination [out]

A pointer to a caller-allocated buffer that receives the translated string.

UnicodeStringMaxByteCount [in]

Maximum number of bytes to be written at UnicodeStringDestination. If this value causes the translated string to be truncated, RtlUTF8ToUnicodeN returns an error status.

UnicodeStringActualByteCount [out, optional]

A pointer to a caller-allocated variable that receives the length, in bytes, of the translated string. This parameter is optional and can be NULL. If the string is truncated, the returned value is the byte count of the truncated string.

UTF8StringSource [in]

A pointer to the string to be translated.

UTF8StringByteCount [in]

Size, in bytes, of the string at UTF8StringSource.

Return value

RtlUTF8ToUnicodeN returns one of the following NTSTATUS values:

Return code                 Description
STATUS_SUCCESS              The string was converted to Unicode.
STATUS_SOME_NOT_MAPPED      An invalid input character was encountered and replaced. This status is considered a success status.
STATUS_INVALID_PARAMETER    Both UnicodeStringDestination and UnicodeStringActualByteCount were NULL.
STATUS_INVALID_PARAMETER_4  UTF8StringSource was NULL.
STATUS_BUFFER_TOO_SMALL     The output written to UnicodeStringDestination was truncated.
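
A caller usually needs to tell apart a clean conversion, a conversion with replaced characters, a truncated result, and an outright failure. The sketch below is one way to do that. The DEMO_ constants are local copies of the NTSTATUS values for the codes listed above, and demo_outcome, demo_classify, and demo_nt_success are hypothetical names; demo_nt_success mirrors the usual NT_SUCCESS test, which treats any status with the high bit clear as a success.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the NTSTATUS values, so the sketch builds anywhere. */
#define DEMO_STATUS_SUCCESS              0x00000000u
#define DEMO_STATUS_SOME_NOT_MAPPED      0x00000107u
#define DEMO_STATUS_INVALID_PARAMETER    0xC000000Du
#define DEMO_STATUS_BUFFER_TOO_SMALL     0xC0000023u
#define DEMO_STATUS_INVALID_PARAMETER_4  0xC00000F2u

typedef enum {
    DEMO_CONVERTED,                    /* every character mapped           */
    DEMO_CONVERTED_WITH_REPLACEMENTS,  /* some input became U+FFFD         */
    DEMO_TRUNCATED,                    /* destination buffer was too small */
    DEMO_FAILED                        /* invalid arguments or other error */
} demo_outcome;

/* Success statuses have the high bit clear. */
int demo_nt_success(uint32_t status)
{
    return (status & 0x80000000u) == 0;
}

/* Map a returned status onto the four cases a caller has to handle. */
demo_outcome demo_classify(uint32_t status)
{
    switch (status) {
    case DEMO_STATUS_SUCCESS:          return DEMO_CONVERTED;
    case DEMO_STATUS_SOME_NOT_MAPPED:  return DEMO_CONVERTED_WITH_REPLACEMENTS;
    case DEMO_STATUS_BUFFER_TOO_SMALL: return DEMO_TRUNCATED;
    default:                           return DEMO_FAILED;
    }
}
```

Checking only for equality with STATUS_SUCCESS would reject a STATUS_SOME_NOT_MAPPED result even though the output buffer holds a complete, usable string.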

Remarks

Although UnicodeStringActualByteCount is optional and can be NULL, callers should provide storage for it, because the received length can be used to determine whether the conversion was successful.

If the output is truncated and an invalid input character is also encountered, the function returns the STATUS_BUFFER_TOO_SMALL error.

If UnicodeStringDestination is NULL, the function returns in *UnicodeStringActualByteCount the number of bytes required to hold the translated string without truncation.

RtlUTF8ToUnicodeN does not modify the source string unless the UnicodeStringDestination and UTF8StringSource pointers are equivalent. The returned Unicode string is not null-terminated.

Callers of RtlUTF8ToUnicodeN must be running at IRQL < DISPATCH_LEVEL.

Requirements

Requirement               Value
Minimum supported client  Windows 7 [desktop apps only]
Minimum supported server  Windows Server 2008 R2 [desktop apps only]
Header                    Wdm.h
DLL                       Ntdll.dll

See also

RtlUnicodeToUTF8N