Saturday, February 09, 2013

org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence

I came across this error while trying to unmarshal some XML data received by a REST service:

org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence

I would only get the error sporadically. I eventually determined that it was triggered by certain characters in the XML being passed in. When I changed the declared encoding of the incoming XML to ISO-8859-1, one particular document worked fine, but I didn't want to move away from UTF-8, since it is pretty much the global standard now.

After digging a little deeper, I decided to change my unmarshal code from this:

            // getBytes() with no argument encodes using the platform default charset
            byte[] bArray = xmlInput.getBytes();
            ByteArrayInputStream bais = new ByteArrayInputStream(bArray);
            JAXBContext jc = JAXBContext.newInstance(Article.class);
            Unmarshaller u = jc.createUnmarshaller();
            Article article = (Article) u.unmarshal(bais);


to this:

            StringReader reader = new StringReader(xmlInput);
            JAXBContext jc = JAXBContext.newInstance(Article.class);
            Unmarshaller u = jc.createUnmarshaller();
            Article article = (Article) u.unmarshal(reader);


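This solved my problems! Apparently the conversion of the XML into a byte array was causing the problem. The likely explanation is that getBytes() with no argument encodes the string using the platform's default charset, which is not necessarily UTF-8, so the parser ends up reading non-UTF-8 bytes as if they were UTF-8. A StringReader hands the parser characters directly, so no byte encoding is involved at all. My initial code was taken from a tutorial I had gone through, so I hadn't considered that it would be the cause, but I'm glad I found it.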
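
For anyone who does need to keep a byte stream, here is a minimal sketch of what I believe was going on. It is not my service code: the class name EncodingDemo, the helper parses(), and the sample document are made up for illustration, and it uses the JDK's built-in DocumentBuilder rather than JAXB so that it runs without extra dependencies. ISO-8859-1 stands in for a non-UTF-8 platform default charset, which is an assumption about the original environment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original service.
class EncodingDemo {

    // Returns true if the JDK's built-in XML parser accepts the given bytes.
    static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed UTF-8 input surfaces as a parse or I/O failure.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The document declares UTF-8 and contains one non-ASCII character.
        String xmlInput = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<article><title>caf\u00e9</title></article>";

        // Assumption: simulates a platform whose default charset is not UTF-8.
        byte[] latin1 = xmlInput.getBytes(StandardCharsets.ISO_8859_1);

        // Encoding explicitly as UTF-8 matches the XML declaration.
        byte[] utf8 = xmlInput.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes parse: " + parses(latin1));
        System.out.println("UTF-8 bytes parse: " + parses(utf8));
    }
}
```

So if a ByteArrayInputStream is required, passing an explicit charset such as xmlInput.getBytes(StandardCharsets.UTF_8) should avoid the mismatch. When the XML is already in a String, the StringReader version is still simpler.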