Byte arrays, typed values, binary reader, and fwrite
I was trying to read a binary file created by a native app using the C# BinaryReader class, but kept getting weird numbers. When I checked the hex in Visual Studio I saw that the bytes were backwards from what I expected, which pointed to an endianness issue. This threw me for a loop, since I was writing the file from C++ on the same machine I was reading it back on in C#, and I wasn’t sending any data over the network. Endianness is usually an issue across machine architectures or over the wire, so I was a little confused.
The issue is that I ran into an endianness problem by writing values byte by byte in one place while relying on the actual data type of the value everywhere else. Let me demonstrate the issue.
What happens if I write 65297 (0xFF11) byte by byte from C++?
#include "stdafx.h"
#include "fstream"
int \_tmain(int argc, \_TCHAR\* argv[])
{
char buffer[] = { 0xFF, 0x11 };
auto \_stream = fopen("test2.out","wb");
fwrite(buffer, 1, sizeof(buffer), \_stream);
fclose(\_stream);
}
And read it in using the following C# code
public void ReadBinary()
{
    using (var reader = new BinaryReader(new FileStream(@"test2.out", FileMode.Open)))
    {
        // read two bytes and print them out in hex
        foreach (var b in reader.ReadBytes(2))
        {
            Console.Write("{0:X}", b);
        }
        Console.WriteLine();

        // go back to the beginning
        reader.BaseStream.Seek(0, SeekOrigin.Begin);

        // read a two byte short and print it out in hex
        var val = reader.ReadUInt16();
        Console.WriteLine("{0:X}", val);
    }
}
What would you expect to get? You might think both reads print the same thing: a 16 bit unsigned integer is 2 bytes, so reading it as a typed value should be no different from reading the two bytes individually, right?
Actually, I got
FF11 <-- reading in two bytes
11FF <-- reading in a two byte short
What gives?
Turns out that since I’m on a little endian system (Intel x86), typed values are laid out in memory least significant byte first. The BinaryReader class in C# always reads multi-byte values as little endian, and fwrite in C++ writes them as little endian too, as long as you write the typed value itself rather than individual bytes. By writing the bytes out by hand, most significant byte first, I had effectively written a big endian value.
When you write a value byte by byte, the machine’s byte order never comes into play: the bytes land in the file in exactly the order you wrote them. This means you should use consistent write semantics. If you are going to write values byte by byte, always write them byte by byte. If you are going to use typed data, always write typed data. If you mix the two paradigms you can get into weird situations where some numbers are “big endian” (because they were written most significant byte first) and others are little endian (because they were written as typed data).
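To make the difference concrete, here’s a small sketch of my own (not part of the original program; the file name mixed.out is just an example) that writes the same 16 bit value both ways:

#include <cstdio>
#include <cstdint>

int main()
{
    std::uint16_t value = 0xFF11;

    // Paradigm 1: write the value byte by byte, most significant byte first.
    // These bytes land in the file as FF 11 no matter what the machine is.
    unsigned char bytes[] = { 0xFF, 0x11 };

    auto stream = std::fopen("mixed.out", "wb");
    std::fwrite(bytes, 1, sizeof(bytes), stream);   // file now contains FF 11

    // Paradigm 2: write the typed value directly. On little endian x86 the
    // in-memory layout is 11 FF, so that's what gets written.
    std::fwrite(&value, sizeof(value), 1, stream);  // file now contains FF 11 11 FF
    std::fclose(stream);
    return 0;
}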
Here’s a good quote from the IBM blog on writing endianness-independent code that summarizes the effect:
Endianness does matter when you use a type cast that depends on a certain endian being in use.
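In other words, reinterpreting raw bytes as a typed value gives a different answer depending on the host’s byte order. Here’s a tiny illustration of my own (using memcpy, which has the same effect as the type cast the quote is talking about):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    unsigned char buffer[] = { 0xFF, 0x11 };  // the bytes as written to the file

    // Treat the raw bytes as a uint16_t. The result depends on the host's
    // byte order: 0x11FF on little endian x86, 0xFF11 on a big endian machine.
    std::uint16_t value;
    std::memcpy(&value, buffer, sizeof(value));
    std::printf("%X\n", value);
    return 0;
}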
If you do happen to need to write byte by byte, and you want to read values back directly as typed data in C#, you can make use of Jon Skeet’s MiscUtil, which contains big endian and little endian binary reader/writer classes. By using the big endian reader you can read files whose values were written byte by byte (most significant byte first) from C++.
Here is a fixed version
using (var reader = new EndianBinaryReader(new BigEndianBitConverter(), new FileStream(@"test2.out", FileMode.Open)))
{
    // same two reads as before, but with MiscUtil's big endian reader
    foreach (var b in reader.ReadBytes(2))
    {
        Console.Write("{0:X}", b);
    }
    Console.WriteLine();

    reader.BaseStream.Seek(0, SeekOrigin.Begin);

    var val = reader.ReadUInt16();
    Console.WriteLine("{0:X}", val);
}
Which spits out
FF11
FF11