Cay Horstmann’s Unblog




Project Valhalla promises to flatten objects in the Java Virtual Machine. After all, an array of fifty million values is much better than an array of fifty million references to objects, each with a header and a value. Just before I gave a presentation about it at the JFall conference, I saw an article on inside.java that promised such flattening with an array of LocalDate objects. But I couldn’t reproduce it. The culprit? Serialization, of course. Read on for the gory details!

Project Valhalla

Eleven years in the making, Project Valhalla promises: “Codes like a class, works like an int.” And it’s really getting there. You define a value class or value record, such as

public value record Point(double x, double y) {
   public Point moveBy(double dx, double dy) { return new Point(x + dx, y + dy); }
}

The JVM will try to represent it as a 16-byte value rather than as a reference to a heap object containing a header plus 16 bytes.

Right now, flattening is not easy to observe. In my conference presentation, I had to use an internal API to create an array that could be flattened:

Point[] path = (Point[]) ValueClass.newNullRestrictedNonAtomicArray(Point.class,
    NPOINTS, new Point(0, 0));

At some point in the future, you will be able to write something like this:

var path = new Point![NPOINTS]{ new Point(0, 0) }; // Syntax may change

The morning before the presentation, I came across an article by Dan Smith that made it look easy. He builds a collection of LocalDate objects, calls toArray, and processes the values. With Valhalla, the simple benchmark runs significantly faster. No internal API required.
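If you want to try this yourself, here is a minimal sketch of such a benchmark. It is not the code from Dan Smith’s article. The class name DateTest (chosen to match the jcmd command used later), the start date, the element count, and the use of Stream.iterate are assumptions made for illustration. It compiles and runs on any recent JDK. Whether the array actually gets flattened depends on the JDK build.

```java
import java.time.LocalDate;
import java.util.stream.Stream;

// Hypothetical benchmark in the spirit of the inside.java article (not its actual code)
public class DateTest {
    // Builds n consecutive dates starting at 2000-01-01, collected with toArray
    static LocalDate[] makeDates(int n) {
        return Stream.iterate(LocalDate.of(2000, 1, 1), d -> d.plusDays(1))
            .limit(n)
            .toArray(LocalDate[]::new);
    }

    // A trivial workload that visits every element
    static long sumDays(LocalDate[] dates) {
        long sum = 0;
        for (LocalDate d : dates) sum += d.getDayOfMonth();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 50_000_000;
        LocalDate[] dates = makeDates(n);
        long start = System.nanoTime();
        long sum = sumDays(dates);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum = " + sum + ", time = " + millis + " ms");
        // Optionally keep the process alive so that jcmd can inspect the heap
        if (args.length > 1 && args[1].equals("wait")) System.in.read();
    }
}
```

To run it, use java DateTest 50000000 wait. With fifty million dates, you may need a larger heap, such as -Xmx4g. While the program waits for input, run jcmd from another terminal. On a Valhalla build, value classes are a preview feature, so you will likely also need --enable-preview.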

But this only works with the “Early Access” release posted at https://jdk.java.net/valhalla/. When I built the Valhalla JDK from the Git repository, there was no improvement. I was baffled.

The whole point of my presentation was “Trust, but verify.” So, following my own sage advice, I ran

jcmd DateTest GC.class_histogram | head

And indeed, with the EA build, we can see the flattening:

1:             1      400000016  [Ljava.time.LocalDate; (java.base@26-jep401ea2)

Woohoo: a flat array of 50_000_000 × 8 bytes, plus a 16-byte header.

But with Valhalla built from source, no such luck:

    1:      50000003     1200000072  java.time.LocalDate (java.base@26-internal)
    2:             1      200000016  [Ljava.time.LocalDate; (java.base@26-internal)

That’s fifty million LocalDate objects, each with its own header, plus an array of fifty million references to them, which has a header too.
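The numbers in both histograms can be checked by hand. The arithmetic below assumes a 64-bit HotSpot VM with compressed references and compressed class pointers, which are the defaults for heaps under 32 GB. Under that assumption, object headers take 12 bytes, references take 4 bytes, array headers take 16 bytes (a 12-byte header plus a 4-byte length), and sizes are padded to a multiple of 8 bytes. The 24-byte object size agrees with the histogram, since 1,200,000,072 / 50,000,003 = 24.

```latex
\begin{aligned}
\text{flat array (EA build):}\quad & 50{,}000{,}000 \times 8 + 16 = 400{,}000{,}016 \text{ bytes} \\
\text{array of references:}\quad & 50{,}000{,}000 \times 4 + 16 = 200{,}000{,}016 \text{ bytes} \\
\text{one LocalDate object:}\quad & 12 + 4 + 2 + 2 = 20 \to 24 \text{ bytes after padding} \\
\text{all LocalDate objects:}\quad & 50{,}000{,}003 \times 24 = 1{,}200{,}000{,}072 \text{ bytes}
\end{aligned}
```

Without flattening, the dates therefore cost about 1.4 GB. The flat array needs about 0.4 GB.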

Why the difference? Mystery.

Nullability

On our way to solving this mystery, we need to talk about null. Java being Java, any object reference can potentially be null, and being an instance of a value class (or value record) doesn’t change that. So, with a value type Point,

Point p = null;

This is perfectly fine.

This comes at a price. An extra bit is needed to indicate whether there is a null or a flat area filled with goodness. In the case of Point, a potentially null value requires 129 bits: 64 for each double, plus the null bit. In my presentation, I was able to demonstrate effective flattening only with the non-public method newNullRestrictedNonAtomicArray, which promises that there are no null values. That’s because Valhalla doesn’t yet have syntax for declaring such a restriction. The ! syntax will come later.

With Valhalla, LocalDate is a value class. Its instances can be flattened into their year/month/day representation, or they can be null. Surely there is code out there in which a null LocalDate indicates that no date was available.

On current hardware, an implicitly flattenable value needs to fit into eight bytes, including the null indicator bit. That would seem to rule out LocalDate. It has three instance fields:

    private final int year;
    private final short month;
    private final short day;

Together with the null bit, that’s 32 + 16 + 16 + 1 = 65 bits, one bit too many. This is why I didn’t see any optimization with Valhalla built from source.
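Here is the same bit budget as a tiny Java sketch. The helper bitsWithNullMarker is hypothetical and not a JVM API. It simply adds up the field sizes plus one null bit, and it ignores padding, alignment, and whatever layout the JVM actually chooses.

```java
// Hypothetical helper, not a JVM API: adds up field sizes plus one null bit,
// ignoring padding, alignment, and the layout the JVM actually picks
public class BitBudget {
    static int bitsWithNullMarker(int... fieldSizes) {
        int total = 1; // the null indicator bit
        for (int size : fieldSizes) total += size;
        return total;
    }

    public static void main(String[] args) {
        int point = bitsWithNullMarker(Double.SIZE, Double.SIZE);                 // double x, y
        int localDate = bitsWithNullMarker(Integer.SIZE, Short.SIZE, Short.SIZE); // int, short, short
        System.out.println("Point: " + point + " bits");          // Point: 129 bits
        System.out.println("LocalDate: " + localDate + " bits, fits in 64: "
            + (localDate <= Long.SIZE));                          // LocalDate: 65 bits, fits in 64: false
    }
}
```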

Why did the Early Access build do better? In it, LocalDate was changed to

    private final int year;
    private final byte month;
    private final byte day;

And why not? Months and days fit comfortably into a byte. Now LocalDate takes 48 bits, or 49 with the null bit. (In practice, the null bit takes up a whole byte.) Small enough to flatten.
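To check that 49 bits really do fit, here is a toy encoding that packs a nullable (int, byte, byte) triple into a single long. The class PackedDate and its bit layout are made up for illustration. They are not how the JVM represents flattened values. They only show that the information fits into 64 bits.

```java
// Toy model, NOT the JVM's actual layout: packs a nullable (int, byte, byte)
// date into one long. Bits 0-31: year, 32-39: month, 40-47: day,
// bit 48: "present" marker (0 means null). Bits 49-63 stay unused.
public class PackedDate {
    static final long PRESENT = 1L << 48;
    static final long NULL = 0L;

    static long pack(int year, byte month, byte day) {
        return PRESENT
            | ((long) (day & 0xFF) << 40)
            | ((long) (month & 0xFF) << 32)
            | (year & 0xFFFF_FFFFL);
    }

    static boolean isNull(long packed) { return (packed & PRESENT) == 0; }
    static int year(long packed) { return (int) packed; }
    static byte month(long packed) { return (byte) (packed >>> 32); }
    static byte day(long packed) { return (byte) (packed >>> 40); }

    public static void main(String[] args) {
        long date = pack(2025, (byte) 12, (byte) 31);
        System.out.println(year(date) + "-" + month(date) + "-" + day(date)); // 2025-12-31
        System.out.println(isNull(date) + " " + isNull(NULL));                // false true
        System.out.println(Long.numberOfLeadingZeros(date));                  // 15: only 49 bits used
    }
}
```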

Serialization

Sadly, the change from short to byte in LocalDate was soon undone. Why? Serialization. Of course.

Of course? I looked more closely. The LocalDate class doesn’t just write its field values to an ObjectOutputStream, which would indeed be fragile. Instead, it uses the writeReplace mechanism to serialize a separate object that stores the year as an int and the month and day as bytes, for efficiency. This has been the case since Java 8.

So, changing the fields from short to byte shouldn’t make any difference. The JDK experts thought so too, and in Java 25 they made exactly that change, which enabled the significant performance improvement in the inside.java article.

Why did they undo such an auspicious change? The reason is subtle. The writeReplace mechanism did what it was supposed to do: it decoupled the wire format from the internal representation. Serializing java.time.LocalDate objects with Java 8 and deserializing them with Java 25, or vice versa, worked perfectly.

But then someone serialized the LocalDate.class object, possibly deep inside some data structure. And that didn’t go so well.

More than you ever wanted to know about serialization


In 1996, within a few weeks of the release of Java 1.0, the first edition of Core Java was published. Its companion code included a rudimentary serialization library, a feature that was conspicuously missing from Java at the time. Serialization isn’t hard. The main trick is to recognize objects that have been encountered before, so that each shared object is written only once. When Java 1.1 came out with a proper serialization mechanism, I updated the coverage to drop my own library in favor of the official one, including a deep dive into the bytes of the wire format.

The basic idea is simple. An object is serialized by writing its fields, excluding those that are static or transient. Conversely, when an object is read back in, those fields are restored from the values in the object stream.
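Here is a minimal round trip through the default mechanism. The class Counter is hypothetical. The instance field comes back from the stream, while the transient field is restored to its default value. An object that is reachable twice is written only once and comes back as a single object.

```java
import java.io.*;

public class DefaultSerialization {
    // Hypothetical example class
    static class Counter implements Serializable {
        static int created = 0;   // static: not part of the serialized state
        int count;                // written to and restored from the stream
        transient String label;   // transient: skipped, comes back as null

        Counter(int count) {
            this.count = count;
            this.label = "counter " + count;
            created++;            // this constructor does not run during deserialization
        }
    }

    // Serializes obj into memory and reads it back
    static Object roundTrip(Object obj) {
        try {
            var bytes = new ByteArrayOutputStream();
            try (var out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (var in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter copy = (Counter) roundTrip(new Counter(42));
        System.out.println(copy.count);   // 42
        System.out.println(copy.label);   // null

        // An object reachable twice is written once, then referred to by a handle
        Counter shared = new Counter(7);
        Object[] pair = (Object[]) roundTrip(new Object[] { shared, shared });
        System.out.println(pair[0] == pair[1]); // true
    }
}
```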

What happens when a class changes after an object was written? The serialized data contains the serialVersionUID of the class: a hash of the names and modifiers of the class, its interfaces, fields, and methods. If that hash changes, deserialization fails. That makes sense. If the class has changed in any way, all bets are off about the meaning of those old field values.

But classes evolve all the time, and wouldn’t it be a shame if a harmless change stood in the way of deserialization? For this reason, the Java designers provided a mechanism for “versioning”.

To opt into versioning, a class declares a static field:

private static final long serialVersionUID = ...L;

That value, not the hash, is then used to identify the class. The programmer is responsible for changing the serialVersionUID whenever the data representation changes incompatibly, or for keeping it the same and dealing with any compatibility issues.
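You can ask the serialization machinery which UID it uses for a class by calling ObjectStreamClass.lookup followed by getSerialVersionUID. In this sketch, the classes Unversioned and Versioned are hypothetical. The first one gets a UID computed from its structure, and the second one uses its declared value.

```java
import java.io.ObjectStreamClass;
import java.io.Serial;
import java.io.Serializable;

public class VersionDemo {
    // Hypothetical class without a declared UID: the UID is a computed hash
    static class Unversioned implements Serializable {
        int x;
    }

    // Hypothetical class with a declared UID: the declared value is used
    static class Versioned implements Serializable {
        @Serial private static final long serialVersionUID = 42L;
        int x;
    }

    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(Unversioned.class).getSerialVersionUID()); // some hash
        System.out.println(ObjectStreamClass.lookup(Versioned.class).getSerialVersionUID());   // 42
    }
}
```

Adding a field to Unversioned changes its computed UID. Versioned keeps reporting 42 until someone edits the declaration.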

For example, the java.util.Date class provides a serialVersionUID and declares all of its instance fields transient, so none of them are saved by default. Instead, its writeObject method writes the number of milliseconds since the Unix epoch:

@java.io.Serial
private void writeObject(ObjectOutputStream s) throws IOException {
    s.defaultWriteObject();
    s.writeLong(getTimeImpl());
}

With this setup, the Date class is free to change its internal representation at will, as long as it can always convert it to and from those milliseconds.
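Your own classes can use the same pattern. This sketch uses a hypothetical Moment class that stores seconds and nanoseconds in transient fields but, like Date, writes only a millisecond count. It also includes the matching readObject method, which converts the milliseconds back into the internal representation.

```java
import java.io.*;

public class MillisDemo {
    // Hypothetical class following the java.util.Date approach: the internal
    // representation is transient, and the stream holds only milliseconds
    static class Moment implements Serializable {
        @Serial private static final long serialVersionUID = 1L;
        private transient long seconds;
        private transient int nanos;

        Moment(long millis) { setMillis(millis); }

        long getMillis() { return seconds * 1000 + nanos / 1_000_000; }

        private void setMillis(long millis) {
            seconds = Math.floorDiv(millis, 1000L);
            nanos = (int) Math.floorMod(millis, 1000L) * 1_000_000;
        }

        @Serial
        private void writeObject(ObjectOutputStream s) throws IOException {
            s.defaultWriteObject(); // no non-transient fields, so no field data
            s.writeLong(getMillis());
        }

        @Serial
        private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
            s.defaultReadObject();
            setMillis(s.readLong());
        }
    }

    // Serializes obj into memory and reads it back
    static Object roundTrip(Object obj) {
        try {
            var bytes = new ByteArrayOutputStream();
            try (var out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (var in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Moment copy = (Moment) roundTrip(new Moment(1_700_000_000_123L));
        System.out.println(copy.getMillis()); // 1700000000123
    }
}
```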

What’s with the java.io.Serial annotation? Over time, the rules for serialization have become somewhat Byzantine. Annotating all serialization-related fields and methods is good practice, and it allows the compiler to check that the programmer got everything right.

An unhappy example is java.math.BigInteger. Once upon a time, it serialized a number of fields that no longer exist. To maintain backward compatibility, data for all of those fields is still written. The class declares a static serialPersistentFields array listing all the “fields” that should be saved, thereby overriding the default mechanism of saving the non-static, non-transient fields.

@java.io.Serial
private static final ObjectStreamField[] serialPersistentFields = {
    new ObjectStreamField("signum", Integer.TYPE),
    new ObjectStreamField("magnitude", byte[].class),
    new ObjectStreamField("bitCount", Integer.TYPE),
    new ObjectStreamField("bitLength", Integer.TYPE),
    new ObjectStreamField("firstNonzeroByteNum", Integer.TYPE),
    new ObjectStreamField("lowestSetBit", Integer.TYPE)
};

The writeObject method then writes the field data:

@java.io.Serial
private void writeObject(ObjectOutputStream s) throws IOException {
    ObjectOutputStream.PutField fields = s.putFields();
    fields.put("signum", signum);
    fields.put("magnitude", magSerializedForm());
    // The values written for cached fields are compatible with older
    // versions, but are ignored in readObject so don't otherwise matter.
    fields.put("bitCount", -1);
    fields.put("bitLength", -1);
    fields.put("lowestSetBit", -2);
    fields.put("firstNonzeroByteNum", -2);
    s.writeFields();
}
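On the reading side, the counterpart of putFields is readFields. Here is a sketch with a hypothetical Money class. Its current representation is a single long, but its stream format keeps two legacy “fields”, dollars and cents, which exist only in serialPersistentFields.

```java
import java.io.*;

public class PersistentFieldsDemo {
    // Hypothetical class: it now stores one long, but its stream format
    // keeps two legacy "fields", dollars and cents, for compatibility
    static class Money implements Serializable {
        @Serial private static final long serialVersionUID = 1L;

        @Serial
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("dollars", Long.TYPE),
            new ObjectStreamField("cents", Integer.TYPE)
        };

        private long totalCents; // the current representation, not in the stream

        Money(long totalCents) { this.totalCents = totalCents; }

        long totalCents() { return totalCents; }

        @Serial
        private void writeObject(ObjectOutputStream s) throws IOException {
            ObjectOutputStream.PutField fields = s.putFields();
            fields.put("dollars", totalCents / 100);
            fields.put("cents", (int) (totalCents % 100));
            s.writeFields();
        }

        @Serial
        private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = s.readFields();
            long dollars = fields.get("dollars", 0L); // 0L is the default if absent
            int cents = fields.get("cents", 0);
            totalCents = dollars * 100 + cents;
        }
    }

    // Serializes obj into memory and reads it back
    static Object roundTrip(Object obj) {
        try {
            var bytes = new ByteArrayOutputStream();
            try (var out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (var in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Money copy = (Money) roundTrip(new Money(12_345));
        System.out.println(copy.totalCents()); // 12345
    }
}
```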

The java.time.LocalDate class handles serialization more elegantly. It bypasses normal serialization with a writeReplace method:

@java.io.Serial
private Object writeReplace() {
    return new Ser(Ser.LOCAL_DATE_TYPE, this);
}

This means that a Ser object is written to the object stream instead of the LocalDate object.

The Ser object writes the year, month, and day as one int and two byte values, even though LocalDate has three instance variables of type int, short, and short.


Incidentally, Ser uses the Externalizable interface for slightly more efficient serialization, but that’s not important right now.

The important thing is that classes can take control over serialization, writing and reading data that is stable and independent of the current implementation.
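Here is a sketch of the same idea for a hypothetical SimpleDate class with int, short, and short fields. The nested Proxy class plays the role of Ser, but it is a simplification and not the JDK’s code. It writes an int and two bytes with Externalizable, and its readResolve method turns the deserialized proxy back into a SimpleDate. The readObject method that always throws is a common safeguard, also used by the java.time classes, against streams crafted to bypass the proxy.

```java
import java.io.*;

public class ProxyDemo {
    // Hypothetical class with LocalDate-like int/short/short fields
    static final class SimpleDate implements Serializable {
        @Serial private static final long serialVersionUID = 1L;
        private final int year;
        private final short month;
        private final short day;

        SimpleDate(int year, int month, int day) {
            this.year = year;
            this.month = (short) month;
            this.day = (short) day;
        }

        @Override public boolean equals(Object o) {
            return o instanceof SimpleDate d && d.year == year && d.month == month && d.day == day;
        }

        @Override public int hashCode() { return (year * 31 + month) * 31 + day; }

        // Write a proxy to the stream instead of this object
        @Serial private Object writeReplace() { return new Proxy(this); }

        // Reject streams that try to bypass the proxy
        @Serial private void readObject(ObjectInputStream s) throws InvalidObjectException {
            throw new InvalidObjectException("Deserialization via proxy only");
        }
    }

    // Hypothetical stand-in for java.time.Ser: it alone defines the wire format
    static final class Proxy implements Externalizable {
        @Serial private static final long serialVersionUID = 1L;
        private SimpleDate date;

        public Proxy() {} // Externalizable requires a public no-arg constructor
        Proxy(SimpleDate date) { this.date = date; }

        @Override public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(date.year);
            out.writeByte(date.month);
            out.writeByte(date.day);
        }

        @Override public void readExternal(ObjectInput in) throws IOException {
            int year = in.readInt();
            int month = in.readByte();
            int day = in.readByte();
            date = new SimpleDate(year, month, day);
        }

        // Replace the deserialized proxy with the date it carries
        @Serial private Object readResolve() { return date; }
    }

    // Serializes obj into memory and reads it back
    static Object roundTrip(Object obj) {
        try {
            var bytes = new ByteArrayOutputStream();
            try (var out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (var in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new SimpleDate(2025, 12, 31));
        System.out.println(copy.getClass().getSimpleName());          // SimpleDate
        System.out.println(copy.equals(new SimpleDate(2025, 12, 31))); // true
    }
}
```

The stream for an instance contains only the proxy’s int and two bytes. That means the fields of SimpleDate could change from short to byte without affecting serialized instances. Serializing the class object SimpleDate.class is another matter.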

Why did it fail?

So, let’s recap. Valhalla wants to change the LocalDate fields from int, short, short to int, byte, byte. Serialization is flexible enough to accommodate that, because LocalDate had the foresight to abandon the default mechanism in favor of writeReplace.

And that really works. You can freely exchange serialized LocalDate instances between Java 25 and the Valhalla Early Access builds.

But it fails when serializing the class object LocalDate.class.

To find out why, you need to dig deeper into the wire protocol. A class object is saved as

0x76 classDesc

where classDesc contains, among other things, the names and types of the class’s non-static, non-transient fields.

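When the class object is read back, these field descriptions are matched against the fields of the local class. If their types don’t match, deserialization of the class object fails because the fields are incompatible.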

Let that sink in. Even though the fields play no role in serializing LocalDate instances, a mismatch in their types causes the class object to fail to deserialize.
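You can watch this happen by serializing LocalDate.class and looking at the first bytes of the stream. This sketch relies on the stream header (the magic number 0xACED followed by version 5) and on the type codes TC_CLASS (0x76) and TC_CLASSDESC (0x72) from java.io.ObjectStreamConstants. The class name ClassObjectDemo is made up. The program also prints the fields that ObjectStreamClass records for LocalDate, and those depend on the JDK it runs on.

```java
import java.io.*;
import java.time.LocalDate;

public class ClassObjectDemo {
    // Serializes a single object and returns the raw bytes of the stream
    static byte[] serialize(Object obj) {
        try {
            var bytes = new ByteArrayOutputStream();
            try (var out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (var in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(LocalDate.class);
        // Bytes 0-3 are the stream header; then comes 0x76 (class object), 0x72 (classDesc)
        System.out.println(data[4] == ObjectStreamConstants.TC_CLASS);     // true
        System.out.println(data[5] == ObjectStreamConstants.TC_CLASSDESC); // true

        // The fields recorded in the class descriptor (they depend on the JDK version)
        for (ObjectStreamField f : ObjectStreamClass.lookup(LocalDate.class).getFields())
            System.out.println(f.getName() + ": " + f.getTypeCode());

        // On the same JDK, the class object round-trips without trouble
        System.out.println(deserialize(data) == LocalDate.class); // true
    }
}
```

The mismatch only shows up when the bytes are written by one JDK and read by another JDK whose LocalDate declares different field types.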

There is a fix. The LocalDate class needs to disavow any fields for the purpose of serialization:

private static final ObjectStreamField[] serialPersistentFields = {};

Then the field descriptions are compatible again, because fields that are present on only one side don’t cause a mismatch. https://bugs.openjdk.org/browse/JDK-8371410 proposes exactly this fix.
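A quick way to see the effect of the fix is to compare class descriptors. The two classes here are hypothetical stand-ins with LocalDate’s field layout. One relies on the default serializable fields. The other declares an empty serialPersistentFields array, so its descriptor lists no fields at all.

```java
import java.io.*;

public class NoFieldsDemo {
    // Hypothetical stand-in that relies on the default serializable fields
    static class WithDefaults implements Serializable {
        @Serial private static final long serialVersionUID = 1L;
        private int year;
        private short month;
        private short day;
    }

    // Hypothetical stand-in that disavows all fields for serialization
    static class WithNoFields implements Serializable {
        @Serial private static final long serialVersionUID = 1L;
        @Serial private static final ObjectStreamField[] serialPersistentFields = {};
        private int year;
        private short month;
        private short day;
    }

    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(WithDefaults.class).getFields().length); // 3
        System.out.println(ObjectStreamClass.lookup(WithNoFields.class).getFields().length); // 0
    }
}
```

On its own, the empty declaration means that no instance data is written at all. It therefore only makes sense together with a writeReplace proxy, as in LocalDate.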

Hopefully this fix will go in, so that arrays of LocalDate can be flattened, as they should be.

Conclusion

Everyone loves to hate serialization, and for once, everyone is right.

If you really must serialize your own classes, don’t rely on the default mechanism that dumps the instance fields. Instead, design a wire format that is stable over the long term, and use the writeReplace mechanism to write that stable data, as LocalDate does. Set serialVersionUID to 42L, or whatever your favorite constant is. And finally, set serialPersistentFields to an empty array. (Not to null, though: a null value is treated as if serialPersistentFields weren’t declared at all, which brings back the default fields.)
