Saturday, October 12, 2013

Still using Externalizable to get faster Serialization with Java?


Update: Turning off detection of cycles (FSTConfiguration.setUnshared(true)) improves performance even further. However, in this mode an object that is referenced twice is also written twice.

According to popular belief, the only way to get fast Java object serialization is a handcrafted implementation of the Externalizable interface.

Manually coding "readExternal" and "writeExternal" methods is an error-prone and boring task. Additionally, whenever a class's fields change, the Externalizable methods need to be adapted.

Contrary to popular belief, a good implementation of generic serialization can be faster than a handcrafted implementation of the Externalizable interface 99% if not 100% of the time.

Fortunately, I managed to save the world with the fast-serialization library by addressing the disadvantages of Serialization vs. Externalizable.

The Benchmark

The following class will be benchmarked.
import java.io.Serializable;

public class BlogBench implements Serializable {
    public BlogBench(int index) {
        // vary the strings per instance so we benchmark string encoding, not identity references
        str = "Some Value "+index;
        str1 = "Very Other Value "+index;
        switch (index%3) {
            case 0: str2 = "Default Value"; break;
            case 1: str2 = "Other Default Value"; break;
            case 2: str2 = "Non-Default Value "+index; break;
        }
    }
    private String str;
    private String str1;
    private String str2;
    private boolean b0 = true;
    private boolean b1 = false;
    private boolean b2 = true;
    private int test1 = 123456;
    private int test2 = 234234;
    private int test3 = 456456;
    private int test4 = -234234344;
    private int test5 = -1;
    private int test6 = 0;
    private long l1 = -38457359987788345L;
    private long l2 = 0L;
    private double d = 122.33;
}
For the Externalizable variant, a copy of the class above implements Externalizable with handwritten readExternal/writeExternal methods
(source is here).
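The linked source is not reproduced here. For readers who have never written one, a handcrafted Externalizable copy of BlogBench might look roughly like the sketch below. The class name BlogBenchExternalizable, the field order and the roundTrip helper are our own assumptions, not the original benchmark source. It shows the chore described above: every field appears twice, once per method, in exactly the same order, and the class needs a public no-arg constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical handwritten Externalizable copy of BlogBench; the field order is our own choice.
public class BlogBenchExternalizable implements Externalizable {
    String str, str1, str2;
    boolean b0 = true, b1 = false, b2 = true;
    int test1 = 123456, test2 = 234234, test3 = 456456;
    int test4 = -234234344, test5 = -1, test6 = 0;
    long l1 = -38457359987788345L, l2 = 0L;
    double d = 122.33;

    // Externalizable requires a public no-arg constructor on the reading side.
    public BlogBenchExternalizable() {}

    public BlogBenchExternalizable(int index) {
        str = "Some Value " + index;
        str1 = "Very Other Value " + index;
        switch (index % 3) {
            case 0: str2 = "Default Value"; break;
            case 1: str2 = "Other Default Value"; break;
            default: str2 = "Non-Default Value " + index; break;
        }
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Every field is written by hand in a fixed order (writeUTF rejects null strings).
        out.writeUTF(str); out.writeUTF(str1); out.writeUTF(str2);
        out.writeBoolean(b0); out.writeBoolean(b1); out.writeBoolean(b2);
        out.writeInt(test1); out.writeInt(test2); out.writeInt(test3);
        out.writeInt(test4); out.writeInt(test5); out.writeInt(test6);
        out.writeLong(l1); out.writeLong(l2);
        out.writeDouble(d);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must mirror writeExternal exactly; adding a field means editing both methods.
        str = in.readUTF(); str1 = in.readUTF(); str2 = in.readUTF();
        b0 = in.readBoolean(); b1 = in.readBoolean(); b2 = in.readBoolean();
        test1 = in.readInt(); test2 = in.readInt(); test3 = in.readInt();
        test4 = in.readInt(); test5 = in.readInt(); test6 = in.readInt();
        l1 = in.readLong(); l2 = in.readLong();
        d = in.readDouble();
    }

    // Writes the object with a plain JDK ObjectOutputStream and reads it back.
    static BlogBenchExternalizable roundTrip(BlogBenchExternalizable obj) {
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bout.toByteArray()))) {
                return (BlogBenchExternalizable) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling BlogBenchExternalizable.roundTrip(new BlogBenchExternalizable(5)) should return a copy with identical field values. Since writeUTF throws on null, a production version would also need explicit null handling, yet another detail to get right by hand.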
The main loop for fast-serialization is identical; I just replace "ObjectOutputStream" with "FSTObjectOutput" and "ObjectInputStream" with "FSTObjectInput".
Result:

  • Externalizable performance with JDK serialization is much better than Serializable
  • FST serializes faster than the manually written "readExternal/writeExternal" implementation

Size is 319 bytes for JDK Serializable, 205 for JDK Externalizable and 160 for FST serialization. Pretty big gain for a search/replace operation vs. handcrafted coding ;-). BTW, if the Externalizable class is serialized with FST, it is still slightly slower than letting FST do generic serialization.
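Sizes like these are easy to check by counting the bytes a stream produces. The sketch below does this for JDK serialization only; SerPoint and ExtPoint are hypothetical stand-ins, not the benchmark classes. With Serializable, the JDK writes each field's name and type into the class descriptor, while the Externalizable twin only describes the class and then writes raw values. Absolute numbers will therefore differ from the figures above, but the direction should match.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSize {
    // Hypothetical Serializable class: the stream describes every field by name and type.
    public static class SerPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        int xCoordinate = 1;
        int yCoordinate = 2;
    }

    // Hypothetical Externalizable twin: the class is described, field values are written raw.
    public static class ExtPoint implements Externalizable {
        int xCoordinate = 1;
        int yCoordinate = 2;

        public ExtPoint() {} // required by Externalizable

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(xCoordinate);
            out.writeInt(yCoordinate);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            xCoordinate = in.readInt();
            yCoordinate = in.readInt();
        }
    }

    // Bytes JDK serialization produces for one object, including the 4-byte stream header.
    static int jdkSize(Object obj) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.size();
    }
}
```

jdkSize(new SerializedSize.ExtPoint()) should come out smaller than jdkSize(new SerializedSize.SerPoint()); the gap grows with the number and length of field names.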

There is still room for improvement ..

The test class is rather small, so setup and allocation of the input and output streams account for a significant share of the measured times. Fortunately, FST provides mechanisms to reuse both FSTObjectInput and FSTObjectOutput. This yields ~200 ns better read and write times.

So "new FSTObjectOutput(outputStream)" is replaced with
FSTConfiguration fstConf = FSTConfiguration.getDefaultConfiguration();
...
fstConf.getObjectOutput(bout)
There is even more improvement ..

Since Externalizable does not track references, and the test class does not need it, we turn off reference tracking for our sample by using the @Flat annotation. We can also exploit the fact that "str2" most likely contains one of two default values ..

@Flat
public class BlogBenchAnnotated  implements Serializable {
    public BlogBenchAnnotated(int index) {
        // avoid benchmarking identity references instead of StringPerf
        str = "Some Value "+index;
        str1 = "Very Other Value "+index;
        switch (index%3) {
            case 0: str2 = "Default Value"; break;
            case 1: str2 = "Other Default Value"; break;
            case 2: str2 = "Non-Default Value "+index; break;
        }
    }
    @Flat private String str;
    @Flat private String str1;
    @OneOf({"Default Value","Other Default Value"})
    @Flat private String str2;
    // ... remaining primitive fields as in BlogBench
}
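The idea behind @OneOf can be illustrated without FST: if a field usually holds one of a few known strings, a one-byte index is far cheaper than the string itself. The sketch below is a hypothetical encoder written only for this illustration (the OneOfCodec name, the 0xFF escape marker and the use of writeUTF are our assumptions); it is not FST's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical codec illustrating the idea behind @OneOf; NOT FST's real encoding.
public class OneOfCodec {
    private static final int ESCAPE = 0xFF; // marks "not one of the known values"
    private final List<String> known;

    public OneOfCodec(String... knownValues) {
        if (knownValues.length >= ESCAPE) {
            throw new IllegalArgumentException("too many known values for a one-byte index");
        }
        this.known = Arrays.asList(knownValues);
    }

    // Values must be non-null (writeUTF rejects null).
    public void write(DataOutputStream out, String value) throws IOException {
        int idx = known.indexOf(value);
        if (idx >= 0) {
            out.writeByte(idx);      // known value: a single byte
        } else {
            out.writeByte(ESCAPE);   // unknown value: escape marker ...
            out.writeUTF(value);     // ... plus 2-byte length and the string bytes
        }
    }

    public String read(DataInputStream in) throws IOException {
        int idx = in.readUnsignedByte();
        return idx == ESCAPE ? in.readUTF() : known.get(idx);
    }

    // Encodes a single value into a fresh buffer.
    public byte[] encode(String value) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bout)) {
            write(out, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    public String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With new OneOfCodec("Default Value", "Other Default Value"), either default value encodes to a single byte, while "Non-Default Value 2" costs 1 + 2 + 19 = 22 bytes. The price is that reader and writer must agree on the list of known values.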


and another one ..

To instantiate the correct class at read time, the class name must be transmitted. However, in many cases both reader and writer know (at least most of) the serialized classes at compile time. FST lets you register classes in advance, so only a number instead of the full class name is transmitted.
FSTConfiguration.getDefaultConfiguration().registerClass(BlogBenchAnnotated.class);
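Conceptually, registration replaces the class name in the stream with a small integer both sides agree on in advance. The hypothetical ClassRegistry below sketches that idea with plain java.io; it is not FST's implementation, and it assumes both sides register the same classes in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: registered classes travel as a 2-byte id, others by full name.
public class ClassRegistry {
    private final Map<Class<?>, Integer> ids = new HashMap<>();
    private final List<Class<?>> classes = new ArrayList<>();

    // Reader and writer must register the same classes in the same order.
    public void register(Class<?> c) {
        if (!ids.containsKey(c)) {
            if (classes.size() > Short.MAX_VALUE) throw new IllegalStateException("too many classes");
            ids.put(c, classes.size());
            classes.add(c);
        }
    }

    public void writeClass(DataOutputStream out, Class<?> c) throws IOException {
        Integer id = ids.get(c);
        if (id != null) {
            out.writeShort(id);          // registered: just the id
        } else {
            out.writeShort(-1);          // unregistered: marker, then the full class name
            out.writeUTF(c.getName());
        }
    }

    public Class<?> readClass(DataInputStream in) throws IOException, ClassNotFoundException {
        short id = in.readShort();
        return id >= 0 ? classes.get(id) : Class.forName(in.readUTF());
    }

    // Encodes a single class reference into a fresh buffer.
    public byte[] encode(Class<?> c) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bout)) {
            writeClass(out, c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    public Class<?> decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readClass(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After register(java.util.ArrayList.class), an ArrayList reference costs 2 bytes, while an unregistered java.util.HashMap costs 2 + 2 + 17 bytes for the marker and its name.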



What's "Bulk"?

Setting up or reusing streams still costs some nanoseconds, so when benchmarking read/write of a single tiny object, a good part of the per-object time is stream initialization. If an array of 10 benchmark objects is written, the per-object time drops below 300 ns for both read and write.
Frequently an application writes more than one object into a single stream. RPC-encoding applications that use some kind of batching, or simply write into the same stream and call "flush" after each object, can actually achieve <300 ns times in the real world. Of course, object reference sharing must then be turned off (FSTConfiguration.setShared(false)).
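The JDK has a comparable trade-off when many objects go through one long-lived ObjectOutputStream. The sketch below uses plain JDK streams, no FST, and the class name SharedVsReset is our own. It writes the same object several times into one stream: without reset() every write after the first is only a back-reference, so the reader gets the identical instance back; calling ObjectOutputStream.reset() after each write makes the stream forget what it has written, so each write is a full copy and the reader gets distinct objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SharedVsReset {
    // Writes `obj` `copies` times into ONE stream (optionally resetting after each write)
    // and reads all objects back.
    static List<Object> writeAndReadBack(Serializable obj, int copies, boolean resetEachTime) {
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
                for (int i = 0; i < copies; i++) {
                    out.writeObject(obj);
                    if (resetEachTime) {
                        out.reset(); // forget written objects: the next write is a full copy
                    }
                    out.flush();     // e.g. push each message to the network immediately
                }
            }
            List<Object> result = new ArrayList<>();
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bout.toByteArray()))) {
                for (int i = 0; i < copies; i++) {
                    result.add(in.readObject());
                }
            }
            return result;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Besides changing identity semantics, reset() also keeps a long-lived stream from holding a reference to every object it has ever written.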

For completeness: JDK (with manual Externalizable) bulk yields 1197 ns read and 378 ns write, so it also profits from less initialization. Unfortunately, reusing ObjectInputStream/ObjectOutputStream is not that easy, mainly because ObjectOutputStream already writes some bytes into the underlying stream when it is instantiated.
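Those initial bytes are easy to observe. The small sketch below (plain JDK; the StreamHeader class is our own) constructs an ObjectOutputStream and returns what has reached the underlying stream before any object is written: the 4-byte header, i.e. the magic number 0xACED followed by protocol version 5. The matching ObjectInputStream constructor reads these 4 bytes and blocks until they arrive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class StreamHeader {
    // Bytes written to the underlying stream by the ObjectOutputStream constructor alone.
    static byte[] headerBytes() {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try {
            new ObjectOutputStream(bout); // constructor writes STREAM_MAGIC and STREAM_VERSION
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }
}
```

Because reader and writer must each consume or produce this header exactly once per stream, a new ObjectOutputStream per message on a long-lived connection only works if the receiver also creates a new ObjectInputStream per message.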

Note that if the (constant) initialization time is taken out of the benchmarks, FST's relative performance gains are even higher (see benchmarks on the fast-serialization site).

Links:

Source of this Benchmark
FST Serialization Library (recently moved from Google Code to GitHub)



32 comments:

  1. I am planning to use the fast-serialization library, but first I have a question. Is it possible to define an alternative classloader in the reader?

    Also, do you have any documentation on the internal implementation of fast-cast?

  2. Since 1.55 it is possible to set a ClassLoader on the shared FSTConfiguration object. If you need a ClassLoader per InputStream, you currently have to create a new FSTConfiguration for each stream, which is very slow. A distinct ClassLoader per InputStream will come with 2.0 (other people have requested it). However, you can work around it in the 1.x version by setting a custom ClassLoader in FSTConfiguration and implementing client-specific class loading using a ThreadLocal.
    Regarding fast-cast: there is the wiki usage documentation and the source code :-). The algorithms used are also explained in the documentation, but the source is poorly documented. As fast-cast does not have broad test coverage yet, I'd be happy to help out with responsive support and issue fixing if needed.

  3. I am testing the fast-serialization library, but when I replaced ObjectInputStream and ObjectOutputStream on a socket, my app stopped working.

    client
    out = conf.getObjectOutput(s.getOutputStream());
    out.writeObject(msg);
    out.flush();
    s is a socket

    In the other side:

    Server
    FSTObjectInput in = conf.getObjectInput(str.getInputStream());
    ### it blocks here ....
    JCL_message msg = (JCL_message) in.readObject();


    André Luís


    1. Oops, we always serialize from/to byte arrays, which are then put into (NIO) buffers. Streaming from/to blocking streams is probably untested. I'll check this tomorrow.
      I've opened a ticket here:
      https://github.com/RuedigerMoeller/fast-serialization/issues/10

    2. Thanks, I'm developing a middleware for high performance computing and would like to test the FST as a way to optimize the communication.

  4. FST is used in production at Eurex Exchange in an 80-node clustered application handling up to 150k transactions per second. Using FST gave a major speed boost here. Standard Java serialization performs very badly with very short messages (remote calls) due to initialization overhead.

    Analysis of your issue:

    FST cannot handle blocking streams, as this would require fetching literally each single byte from the underlying stream, which is very expensive. Because FST prefetches when reading, it runs into a blocking read. Supporting this would require changes that would slow things down a lot.

    Instead, when dealing with blocking streams, you have to apply a minor tweak:

    Sender:
    Serialize to a byte[], then write len+byte[]

    Receiver:
    len = readInt, then read byte[len], then deserialize

    see
    https://github.com/RuedigerMoeller/fast-serialization/blob/master/src/test_nojunit/java/gitissue10/GitIssue10.java

    for an example with sockets.

    With FST:
    time for 10000 req/resp (20000 encodes, 20000 decodes) 681
    time for 10000 req/resp (20000 encodes, 20000 decodes) 680
    time for 10000 req/resp (20000 encodes, 20000 decodes) 685
    time for 10000 req/resp (20000 encodes, 20000 decodes) 683
    time for 10000 req/resp (20000 encodes, 20000 decodes) 691
    time for 10000 req/resp (20000 encodes, 20000 decodes) 678

    With JDK:
    time for 10000 req/resp (20000 encodes, 20000 decodes) 1926
    time for 10000 req/resp (20000 encodes, 20000 decodes) 1909
    time for 10000 req/resp (20000 encodes, 20000 decodes) 1892
    time for 10000 req/resp (20000 encodes, 20000 decodes) 1901
    time for 10000 req/resp (20000 encodes, 20000 decodes) 1908
    time for 10000 req/resp (20000 encodes, 20000 decodes) 1907

    Also some notes:
    a) You cannot simply reuse JDK's ObjectOutputStream to write a series of objects to a socket,
    as the ObjectOutputStream keeps a reference to every object written (unless reset() is called). This causes a memory leak.
    b) For high performance use NIO; otherwise you'll need a thread per client.

    1. Er ..

      kind regards,
      Rüdiger
      :-)

    2. Note: link has changed to https://github.com/RuedigerMoeller/fast-serialization/blob/1.x/src/test_nojunit/java/gitissue10/GitIssue10.java

  5. Thank you very much.
      I will test your solution. My application uses a thread per client. I will try NIO to see which is faster.

    André

    1. Threads get really slow as the number of clients grows. So if you expect more than, say, 5 clients, NIO will pay off a lot, as processing several clients within a single thread is much more cache friendly; also, blocking a thread (as required by blocking sockets) is expensive. Check out the Netty project; it supports zero-copy NIO.

  6. Thank you admin for sharing this valuable post.

  7. Thank you for fast-serialization, it is extremely fast. However, I have some performance issues with the following code
    to serialize:
    configuration.asByteArray((Serializable)obj);

    and to deserialize, where data is a byte array:
    configuration.asObject(data);

    For the most part it works great, but when my object is medium-sized (almost 2.5 MB), the time taken to serialize jumps to over 600 ms. I ran the metrics for 11,000 iterations across 2 threads, and 50+ iterations took anywhere from 100 to 700 ms to complete. The average was 47 ms. I am assuming there is some thread synchronization going on, which makes the metrics look bad. I would appreciate your help in bringing it down to reasonable levels.

    1. Hi, FST is used and tested for message encoding and persistence of smaller object graphs (<5k) in a massively multithreaded environment. For very large objects, performance is known to degrade (though it is still better than the alternatives).

      Can you provide a runnable Test case and add an issue at

      https://github.com/RuedigerMoeller/fast-serialization/issues

      please ?

      If there is a corner-case performance issue, I can fix it as long as I am able to reproduce it.

      Thanks + regards,
      Rüdiger

    2. Also note that 'asByteArray' and 'asObject' are convenience methods producing some garbage (unnecessary array alloc + copy). They are not the fastest way to use FST (however, good enough for 'normal' requirements).

    3. Just a note: I just realized we are using one FSTConfiguration per thread to avoid contention. Do this for high-throughput apps where there is little processing and only multithreaded encoding/decoding of small objects going on.

      E.g. using a ThreadLocal like:
      ThreadLocal<FSTConfiguration> cfg = new ThreadLocal<FSTConfiguration>() {
          @Override
          protected FSTConfiguration initialValue() {
              return FSTConfiguration.createDefaultConfiguration();
          }
      };

  8. This comment has been removed by a blog administrator.

  9. This comment has been removed by the author.

  10. Does it support backward compatibility? E.g. after writing data to a file (from an object), if I change the object's variables, will it still be able to read the existing variables' data (for that object) from the file?

    1. Since ~2.20, versioning using annotations is supported. For maximum recoverability you can use the MinBin codec, which allows reading files without access to the original classes. Also, with 2.30 a JSON codec will be added. Note these codecs are an order of magnitude slower than the binary codec used by default.

  11. It's not possible to support this automagically, as field names are not written (slow!). However, the current 2.x version supports adding fields while keeping backward compatibility using the "Version" annotation.
    Other ways to tackle this issue are (a) explicit conversion using some kind of export format, (b) creating a subclass of the original data class in order to extend it, or (c) using the (slower) FST MinBin codec, which also serializes field names, so it can handle this similarly to JSON.

  12. Hi,
    could you compare FST benchmarks with UKA.transport by JavaParty (http://www.ipd.uka.de/JavaParty/ukatransport.html)? It is an old solution, but I hope it can be compared.

    1. They are hardly comparable, as FST is completely compatible with the standard JDK interfaces. This means it uses the existing serialization implementation of JDK system classes, so you don't have to change anything in your source: FSTObjectOutput is a compatible and faster replacement for ObjectOutputStream.

      In contrast, ukatransport requires implementing special interfaces and providing specific marshalling methods and constructors. Besides the effort, this also rules out serialization of JDK classes or of serializable classes from dependent libraries.

      Emulating the weird mechanics of stock JDK serialization can have a cost; the good news is that FST lets you override slow implementations in a non-invasive fashion (by registering serializers).

      Besides that, I'll make a simple benchmark once I find time ..

      regards,
      ruediger

  13. Hello, FST is a wonderful tool! I was wondering whether it is possible to use FST for JBoss remote invocations instead of JBoss Serialization?
    Best regards

    1. Given the feature set provided (JDK-compatible), it should be absolutely possible. However:
      1) It's likely that JBoss Remoting is tied to their serialization implementation via some special extensions of JBoss Serialization.
      2) Class loading inside the app server might be an issue; it might be necessary to change/enhance the ClassLoader setting in FST.
      3) If JBoss Remoting is a synchronous RPC implementation, network latency, not serialization, is the bottleneck. Unfortunately, synchronous communication is used frequently in the Java EE stack, which easily reduces throughput by a factor of 10, so serialization performance improvements are dwarfed.

  14. How can I upgrade to 2.38 if my classes were earlier serialized using version 1.63? Would I still be able to deserialize my classes?

    1. Pls see https://github.com/RuedigerMoeller/fast-serialization/issues/62

    2. Ah, that means no backward compatibility, that is really sad. Any chance this can happen again, e.g. version 3.0 not being compatible with 2.0 serialized objects?

    3. It's unlikely that the format will change; however, FST is "performance first", so I won't hesitate to break backward compatibility if needed. You should not rely on it for long-term persistence, or be prepared to convert your data. FST's main target is communication encoding in distributed systems and short-lived data.

      Keeping compatibility has a significant performance cost (plus a huge additional testing effort); it does not make sense to slow things down for all users to satisfy the needs of edgy (mis-)use cases.
      If you need something slow and backward compatible, there is JDK serialization :).

      Note that changing classes frequently breaks compatibility of existing serialized data, so mid term any software needs to deal with conversion anyway.

  15. Hi
    Thanks for this awesome Java serialiser, FST is much better than Java native serialisation.


    Thanks

  16. Thanks for sharing this information and keep updating us. This is informative and really useful for me.

