C# - How to Reverse a Unicode String


Perhaps because the .NET Framework has no built-in String.Reverse method, implementations of one are posted very frequently.
Unfortunately, most of these implementations do not handle characters outside Unicode's Basic Multilingual Plane correctly. These supplementary characters have code points between U+10000 and U+10FFFF, so they cannot be represented with a single 16-bit char. In UTF-16 (which is how .NET strings are encoded), each of these characters is represented as two C# chars: a high surrogate followed by a low surrogate. When the string is reversed, the order of these two chars has to be preserved.
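To make the problem concrete, here is a minimal sketch of the kind of naive reversal described above. The class name NaiveReverseDemo and the method NaiveReverse are hypothetical names chosen for this illustration; the sample string uses U+1D11E (MUSICAL SYMBOL G CLEF), which UTF-16 encodes as the surrogate pair D834 DD1E.

```csharp
using System;

// Hypothetical demo class, used only to illustrate the failure mode.
public static class NaiveReverseDemo
{
    // Naive reversal: reverses UTF-16 code units and ignores surrogate pairs.
    public static string NaiveReverse(string input)
    {
        char[] chars = input.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    public static void Main()
    {
        // 'a', high surrogate D834, low surrogate DD1E, 'b'
        string original = "a\U0001D11Eb";
        string reversed = NaiveReverse(original);

        Console.WriteLine(original.Length);                    // 4 (code units, not characters)
        Console.WriteLine(char.IsHighSurrogate(original[1]));  // True
        Console.WriteLine(char.IsLowSurrogate(original[2]));   // True

        // After the naive reversal the low surrogate precedes the high
        // surrogate, which is not a valid UTF-16 sequence.
        Console.WriteLine(char.IsLowSurrogate(reversed[1]));   // True
        Console.WriteLine(char.IsHighSurrogate(reversed[2]));  // True
    }
}
```

The reversed string still has four chars, but the pair is now in the wrong order, so the supplementary character can no longer be decoded.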
Here's our method that reverses a string while handling surrogate code units correctly:

/// <summary>
/// Reverses the specified string.
/// </summary>
/// <param name="input">The string to reverse.</param>
/// <returns>The input string, reversed.</returns>
/// <remarks>This method correctly reverses strings containing supplementary characters
/// (which are encoded with two surrogate code units).</remarks>
public static string Reverse(this string input)
{
    if (input == null)
        throw new ArgumentNullException("input");

    // allocate a buffer to hold the output
    char[] output = new char[input.Length];
    for (int outputIndex = 0, inputIndex = input.Length - 1; outputIndex < input.Length; outputIndex++, inputIndex--)
    {
        // check for surrogate pair
        if (input[inputIndex] >= 0xDC00 && input[inputIndex] <= 0xDFFF &&
            inputIndex > 0 && input[inputIndex - 1] >= 0xD800 && input[inputIndex - 1] <= 0xDBFF)
        {
            // preserve the order of the surrogate pair code units
            output[outputIndex + 1] = input[inputIndex];
            output[outputIndex] = input[inputIndex - 1];
            outputIndex++;
            inputIndex--;
        }
        else
        {
            output[outputIndex] = input[inputIndex];
        }
    }

    return new string(output);
}
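A usage sketch may help here. Extension methods must be declared in a static class, so the sample below wraps the same logic in a hypothetical StringExtensions class so that it compiles on its own. As an assumption of this sketch, the range comparisons are replaced with the framework's char.IsLowSurrogate and char.IsHighSurrogate, which test the same ranges; the Program class and the sample string are also just for illustration.

```csharp
using System;

// Hypothetical container class; extension methods must live in a static class.
public static class StringExtensions
{
    // Same logic as the method above, using char.IsLowSurrogate and
    // char.IsHighSurrogate instead of explicit range checks.
    public static string Reverse(this string input)
    {
        if (input == null)
            throw new ArgumentNullException("input");

        char[] output = new char[input.Length];
        for (int outputIndex = 0, inputIndex = input.Length - 1; outputIndex < input.Length; outputIndex++, inputIndex--)
        {
            if (inputIndex > 0 &&
                char.IsLowSurrogate(input[inputIndex]) &&
                char.IsHighSurrogate(input[inputIndex - 1]))
            {
                // keep the high surrogate before the low surrogate
                output[outputIndex + 1] = input[inputIndex];
                output[outputIndex] = input[inputIndex - 1];
                outputIndex++;
                inputIndex--;
            }
            else
            {
                output[outputIndex] = input[inputIndex];
            }
        }

        return new string(output);
    }
}

public static class Program
{
    public static void Main()
    {
        // 'a', U+1D11E (surrogate pair D834 DD1E), 'b'
        string original = "a\U0001D11Eb";
        string reversed = original.Reverse();

        Console.WriteLine(reversed == "b\U0001D11Ea");      // True
        Console.WriteLine(reversed.Reverse() == original);  // True
    }
}
```

An unpaired surrogate in the input does not match the pair check, so it is copied like any other char.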
