Sometimes we get a file with BOM (byte order mark, or preamble) bytes at the start, which indicate that the file is Unicode encoded and, for UTF-16 and UTF-32, which byte order it uses. We don’t always care about these bytes and simply want to remove the BOM (if one exists).
Here’s some fairly simple code that reads a stream or file, with the code to “skip the BOM” at the bottom. Note that it only checks for the UTF-8 BOM (the bytes EF BB BF):
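// Requires: using System; using System.IO; using System.Linq; using System.Text;
using (var stream =
    File.Open(currentLogFile,
        FileMode.Open,
        FileAccess.Read,
        FileShare.ReadWrite))
{
    var length = stream.Length;
    var bytes = new byte[length];
    var numBytesToRead = (int)length;
    var numBytesRead = 0;
    while (numBytesToRead > 0)
    {
        // read the file in chunks of up to 1024 bytes
        var n = stream.Read(
            bytes,
            numBytesRead,
            Math.Min(1024, numBytesToRead));
        if (n == 0)
        {
            // end of stream reached early (e.g. the file was truncated by another process)
            break;
        }
        numBytesRead += n;
        numBytesToRead -= n;
    }
    // keep only the bytes that were actually read
    if (numBytesRead < bytes.Length)
    {
        Array.Resize(ref bytes, numBytesRead);
    }
    // skip the UTF-8 BOM, but only if the data is long enough and starts with it
    var bom = new UTF8Encoding(true).GetPreamble();
    var hasBom = bytes.Length >= bom.Length &&
        bom.SequenceEqual(bytes.Take(bom.Length));
    return hasBom ?
        bytes.Skip(bom.Length).ToArray() :
        bytes;
}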
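The BOM check at the end of the code above can also be pulled out into a small helper that works on any byte array, which makes it easier to reuse and test. This is only a minimal sketch: the names `BomHelper` and `StripUtf8Bom` are hypothetical (they are not part of the original code or of .NET), and it handles the UTF-8 BOM only.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not from the original code) that strips a leading UTF-8 BOM.
public static class BomHelper
{
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        // The UTF-8 preamble is the three bytes EF BB BF.
        var bom = new UTF8Encoding(true).GetPreamble();

        // Too short to contain a BOM, or the leading bytes differ:
        // return the input unchanged.
        if (bytes.Length < bom.Length ||
            !bytes.Take(bom.Length).SequenceEqual(bom))
        {
            return bytes;
        }

        // Copy everything after the BOM into a new array.
        return bytes.Skip(bom.Length).ToArray();
    }
}
```

With a helper like this, the last statement of the reading code could become `return BomHelper.StripUtf8Bom(bytes);`. Checking the length first means that files shorter than three bytes (including empty ones) are returned as-is instead of causing an `IndexOutOfRangeException`.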