How to read a UTF-8 file with a BOM?

Wikipedia on the BOM:

The byte order mark (BOM) is a Unicode character, U+FEFF byte order mark (BOM), whose appearance as a magic number at the start of a text stream can signal several things to a program consuming the text:[1]

  • What byte order, or endianness, the text stream is stored in;
  • The fact that the text stream is Unicode, to a high level of confidence;
  • Which of several Unicode encodings that text stream is encoded as.

BOM use is optional, and, if used, should appear at the start of the text stream.
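
The third point in the quote can be sketched as a lookup on the first few bytes of a stream. The class name BomDetector and its methods are made up for this illustration. The byte signatures are the encoded forms of U+FEFF in each encoding. Note that FF FE 00 00 could also be UTF-16LE text that starts with a NUL character, so treating it as UTF-32LE is a heuristic.

```java
class BomDetector {

    // Returns the encoding name implied by a leading BOM, or null if none is recognized.
    static String detect(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32 is checked before UTF-16 because FF FE 00 00 also starts with FF FE.
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data (as unsigned values) with the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        System.out.println(detect(withBom));    // UTF-8
        System.out.println(detect(withoutBom)); // null
    }
}
```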


UTF-8 files are a special case: adding a BOM to them is not recommended, because it can break tools that do not expect it, Java included. Java's UTF-8 decoder does not treat the BOM specially, so if a BOM is present it is not discarded and shows up as data. I ran into this problem while working on integrations. I had built a web service that accepts request data. The request payload looked fine, and I was wondering why the XML validator was failing, until I noticed that there was one strange character at the beginning of the data.
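
To make the problem concrete, here is a minimal sketch that decodes a byte array starting with the UTF-8 BOM bytes (EF BB BF). The class name BomAsData and the sample payload are invented for illustration. It uses only the standard String(byte[], Charset) constructor.

```java
import java.nio.charset.StandardCharsets;

class BomAsData {

    // Decodes raw bytes as UTF-8, the way a naive reader would.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF, followed by "<a/>".
        byte[] payload = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        String text = decode(payload);

        // The BOM survives decoding as a leading U+FEFF character.
        System.out.println(text.length());              // 5, not 4
        System.out.println(text.charAt(0) == '\uFEFF'); // true
        System.out.println(text.startsWith("<"));       // false
    }
}
```

A parser that expects the document to begin with "<" sees U+FEFF first, which would explain a validation failure like the one described above.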


Here is a Java example of how to read a text file and ignore the BOM (GitHub link):

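package com.vilbay;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Main {

    // U+FEFF is the BOM code point; in UTF-8 it is encoded as the bytes EF BB BF.
    public static final String UTF8_BOM = "\uFEFF";

    public static void main(String[] args) throws IOException {

        // example.txt is looked up on the classpath, relative to this class's package.
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
                Main.class.getResourceAsStream("example.txt"), StandardCharsets.UTF_8))) {

            boolean firstLine = true;

            for (String line; (line = bufferedReader.readLine()) != null;) {
                if (firstLine) {
                    // Only the first line can start with the BOM.
                    line = Main.removeUTF8BOM(line);
                    firstLine = false;
                }
                // Process the line; printing is used here only for illustration.
                System.out.println(line);
            }
        }
    }

    // Returns the string without its leading BOM character, if it has one.
    private static String removeUTF8BOM(String string) {
        if (string.startsWith(UTF8_BOM)) {
            string = string.substring(1);
        }
        return string;
    }
}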

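An alternative is to drop the BOM at the byte level, before any decoding happens, so that readers and parsers further down never see it. The following is only a sketch of that idea and not part of the example above: the class BomSkipper and its methods are hypothetical names, and it assumes Java 9 or later for readNBytes and readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkipper {

    // Returns a stream positioned just after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = pushback.readNBytes(head, 0, 3);
        boolean hasBom = read == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
        if (!hasBom && read > 0) {
            // Not a BOM: put the bytes back so the caller still sees them.
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    // Convenience helper for in-memory data: decodes UTF-8 bytes, ignoring a leading BOM.
    static String decodeWithoutBom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decodeWithoutBom(withBom)); // hi
    }
}
```

The stream returned by skipUtf8Bom can be wrapped in an InputStreamReader in the same way as the resource stream in the example above, in which case the first-line check is no longer needed.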
