C# – .NET 4.7 – Stream encoding problem – incorrect reads of Unicode and ASCII characters

I wrote a small function to write the program's data to a stream. Presumably this stream could go anywhere: to a file (the current use case), to a socket, to memory, wherever. So naturally I just wrote the data using the default encoding. After some testing it threw an encoding exception, so I had to encode the string into a byte array and write the bytes instead.

There’s a problem though: the bytes written do not decode back into the same string when read. This is not a problem for ASCII characters that can be typed on a keyboard, but it becomes a problem once we begin to use Unicode characters and, apparently, 27 ASCII characters.

Here is the test case. I encourage you to run it:

using System.IO;
using System.Text;
using System;

public class TestCase
{
    public static void Main(string[] args)
    {
        readwrite_tests();
    }
    
    public static void readwrite_tests()
    {
        string temps, result;
        ulong count = 0;
        byte[] buffer = new byte[sizeof(char) * 4];

        using(MemoryStream mem = new MemoryStream(buffer))
        using (BinaryReader reader = new BinaryReader(mem, Encoding.Default))
        using (BinaryWriter writer = new BinaryWriter(mem, Encoding.Default))
        {
            for(char c = char.MinValue; c <= 0xfff; ++c)
            {
                temps = c.ToString();
                if(mem.Position != 0) mem.Seek(0, SeekOrigin.Begin);
                result = read_write(temps, writer, reader, mem);
                if(!result.Equals(temps))
                {
                    //Console.Write("char: " + c.ToString() + "  int: " + ((int)c).ToString() +
                    //    "tread: " + result + "  int: [");
                    //foreach (char d in result) Console.Write(((int)d).ToString() + " ");
                    //Console.WriteLine("]");
                    ++count;
                }
            }
        }
        Console.WriteLine("Incorrect reads is " + count.ToString() + 
            " out of " + int.Parse("fff", System.Globalization.NumberStyles.HexNumber));
        Console.WriteLine("Correct Reads: " + ((ulong)int.Parse("fff", System.Globalization.NumberStyles.HexNumber) - count));
    }

    public static string read_write(string s, BinaryWriter writer, BinaryReader reader, Stream stream)
    {
        string read_string = "";

        byte[] bytes = Encoding.Default.GetBytes(s);
        writer.Write(bytes.Length);
        writer.Write(bytes);
        stream.Seek(0, SeekOrigin.Begin);
        try
        {
            read_string = Encoding.Default.GetString(reader.ReadBytes(reader.ReadInt32()));
        }
        catch(EndOfStreamException)
        {
        }
        return read_string;
    }
}

Please run this on https://dotnetfiddle.net/ to observe the results.

As you can see, we have only 238 correct reads. I don’t understand why this is happening. Let me know if there is any more information I can provide, but I have tried quite a bit, including using JsonSerializer instead (with the same results).

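Choose an explicit encoding and stick to it, preferably UTF-8. In .NET 4.7.2 the default encoding (at least on .NET Fiddle) is Western European (Windows), a single-byte code page, so any character it cannot represent is replaced during encoding and cannot be recovered when the bytes are decoded. In .NET 5 it is Unicode (UTF-8).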

If you don’t believe me, add this line to your read_write routine:

Console.WriteLine(Encoding.Default.EncodingName);
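
The advice above can be sketched as follows. This is a minimal illustration rather than a drop-in replacement for your code: the class name ExplicitEncodingSketch and the helpers RoundTrip and CountMismatches are made up for this example, and ISO-8859-1 is used only as a stand-in for a legacy single-byte encoding because it is available on every .NET version without extra setup. The encoding is passed in explicitly instead of relying on Encoding.Default, and the same encoding is used for writing and for reading.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch: names are illustrative, not from the original post.
public static class ExplicitEncodingSketch
{
    // Writes s as a length-prefixed byte array and reads it back,
    // using the same explicitly chosen encoding on both sides.
    public static string RoundTrip(string s, Encoding encoding)
    {
        using (var mem = new MemoryStream())
        {
            // leaveOpen: true so disposing the writer does not close the stream.
            using (var writer = new BinaryWriter(mem, encoding, true))
            {
                byte[] bytes = encoding.GetBytes(s);
                writer.Write(bytes.Length);
                writer.Write(bytes);
            }

            mem.Seek(0, SeekOrigin.Begin);

            using (var reader = new BinaryReader(mem, encoding, true))
            {
                int length = reader.ReadInt32();
                return encoding.GetString(reader.ReadBytes(length));
            }
        }
    }

    // Counts the characters in the range tested in the question
    // (0 through 0xfff inclusive) that do not survive a round trip.
    public static int CountMismatches(Encoding encoding)
    {
        int mismatches = 0;
        for (int c = 0; c <= 0xfff; ++c)
        {
            string original = ((char)c).ToString();
            if (RoundTrip(original, encoding) != original) ++mismatches;
        }
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine("UTF-8 mismatches: " +
            CountMismatches(Encoding.UTF8));
        // Assumption: ISO-8859-1 stands in for a legacy single-byte code page.
        Console.WriteLine("ISO-8859-1 mismatches: " +
            CountMismatches(Encoding.GetEncoding("iso-8859-1")));
    }
}
```

With UTF-8 every character in the tested range should come back unchanged, while the single-byte encoding loses every character it has no byte for. Whichever encoding you pick, the reader and the writer have to agree on it, so it is safer to name it in code than to depend on a platform default.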