LiteByteStringUtf8
This pattern silently corrupts certain byte sequences from the serialized protocol message. Use ByteString or byte[] directly.

Severity: ERROR

The problem

To serialize a MessageLite, call toByteString to get a ByteString, which is effectively an immutable wrapper over a byte[]. This ByteString can be passed around and later deserialized into a message using MyMessage.Builder.mergeFrom(ByteString).

ByteString#toStringUtf8 copies UTF-8 encoded byte data living inside the ByteString to a java.lang.String, replacing any invalid UTF-8 byte sequences with � (the Unicode replacement character).

The flagged code serializes a protocol message to a ByteString, then immediately converts it to a Java String using the toStringUtf8 method. However, a serialized protocol buffer is arbitrary binary data, not UTF-8-encoded text. Any byte sequence that is not valid UTF-8 is replaced during the conversion, so the resulting String may not match the actual serialized bytes, and the original message cannot be reliably recovered from it.
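The corruption can be reproduced with the JDK alone. The sketch below is illustrative and rests on two assumptions: that toStringUtf8 decodes the same way as new String(bytes, UTF_8), and that the three hard-coded bytes stand in for a real serialized message (they are the wire encoding of a message whose field 1 is the varint 300). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTripDemo {
  // Assumed sample payload: tag 0x08 (field 1, varint), then 300 encoded as 0xAC 0x02.
  static final byte[] WIRE_BYTES = {0x08, (byte) 0xAC, 0x02};

  /** Decodes the bytes as UTF-8 text and encodes them again, as a String detour would. */
  static byte[] roundTripThroughString(byte[] serialized) {
    // 0xAC is a continuation byte with no lead byte, so the decoder replaces it with U+FFFD.
    String asText = new String(serialized, StandardCharsets.UTF_8);
    // U+FFFD re-encodes as the three bytes 0xEF 0xBF 0xBD, not as the original 0xAC.
    return asText.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] after = roundTripThroughString(WIRE_BYTES);
    System.out.println("before: " + Arrays.toString(WIRE_BYTES));
    System.out.println("after:  " + Arrays.toString(after));
    System.out.println("equal:  " + Arrays.equals(WIRE_BYTES, after));
  }
}
```

The three input bytes come back as five, so a parser reading the re-encoded bytes no longer sees the original message. Keeping the payload in a byte[] or ByteString avoids the decoding step entirely.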

Instead of holding the serialized protocol message in a Java String, carry around the actual bytes in a ByteString, byte[], or some other equivalent container for arbitrary binary data.

Suppression

Suppress false positives by adding an @SuppressWarnings("LiteByteStringUtf8") annotation to the enclosing element.