Monday, December 22, 2014

A persistent KeyValue Server in 40 lines and a sad fact

This post originally was submitted to the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

It has also been published on voxxed.com.

Picking up on Peter's well-written overview of the uses of Unsafe, I'll do a short fly-by on how low-level techniques in Java can save development effort by enabling a higher level of abstraction, or allow for Java performance levels probably unknown to many.

My major point is to show that conversion of objects to bytes and vice versa is an important fundamental, affecting virtually any modern Java application.

Hardware likes to process streams of bytes, not object graphs connected by pointers, as "All memory is tape" (M. Thompson, if I remember correctly).

Many basic technologies are therefore hard to use with vanilla Java heap objects:
  • Memory mapped files - a great and simple technology to persist application data safely, quickly and easily.
  • Network communication is based on sending packets of bytes
  • Interprocess communication (shared memory)
  • Large main memory of today's servers (64GB to 256GB), which causes GC issues
  • CPU caches work best on data stored as a contiguous stream of bytes in memory
So use of the Unsafe class in most cases boils down to helping transform a Java object graph into a contiguous memory region and vice versa, using either
  • [performance enhanced] object serialization or
  • wrapper classes to ease access to data stored in a contiguous memory region.
(Code & examples of this post can be found here)

    Serialization based Off-Heap

    Consider a retail web application with potentially millions of registered users. We are not interested in representing the data in a relational database, as all that is needed is quick retrieval of a user's data once they log in. Additionally, one would like to traverse the social graph quickly.

    Let's take a simple user class holding some attributes and a list of 'friends' making up a social graph.
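
    A minimal sketch of what such a user class might look like. The field names and the choice to store friends as ids (keys into the user map) rather than object references are assumptions for illustration, not the class from the linked sources.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical user record; all field names are illustrative.
class User implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String uid;
    private final String firstName;
    private final String name;
    private long lastLogin;

    // Friends are kept as ids, so serializing one user
    // does not drag the whole social graph along with it.
    private final List<String> friends = new ArrayList<>();

    User(String uid, String firstName, String name, long lastLogin) {
        this.uid = uid;
        this.firstName = firstName;
        this.name = name;
        this.lastLogin = lastLogin;
    }

    void addFriend(String friendUid) {
        friends.add(friendUid);
    }

    String getUid() { return uid; }
    String getFirstName() { return firstName; }
    String getName() { return name; }
    long getLastLogin() { return lastLogin; }
    void setLastLogin(long lastLogin) { this.lastLogin = lastLogin; }

    // Read-only view of the friend ids.
    List<String> getFriends() { return Collections.unmodifiableList(friends); }
}
```

    Storing ids instead of references keeps each record small and independent, at the cost of one extra map lookup per hop when traversing the graph.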


    The easiest way to store this on heap is a simple, huge HashMap.
    Alternatively, one can use off-heap maps to store large amounts of data. An off-heap map stores its keys and values in the native heap, so garbage collection does not need to track this memory. In addition, the native heap can be told to get synchronized to disk automatically (memory mapped files). This even works if your application crashes, as the OS manages write-back of changed memory regions.

    There are some open source off-heap map implementations out there with various feature sets (e.g. ChronicleMap). For this example I'll use a plain and simple implementation featuring fast iteration (optional full scan search) and ease of use.

    Serialization is used to store objects; deserialization is used to pull them back onto the Java heap. Conveniently, I have written the (afaik) fastest fully JDK-compliant object serialization on the planet, so I'll make use of that.
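
    To illustrate the store/load cycle, here is a deliberately tiny sketch, not the off-heap map used in this post. It assumes plain JDK serialization instead of fast-serialization, an append-only record layout, and an on-heap index for brevity (a real off-heap map keeps its index off heap too). The class name TinyOffHeapStore is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: an append-only store keeping serialized values outside the Java heap.
class TinyOffHeapStore<V extends Serializable> {
    private final ByteBuffer memory;                             // e.g. a direct or memory-mapped buffer
    private final Map<String, Integer> index = new HashMap<>();  // key -> record offset (on heap for brevity)

    TinyOffHeapStore(ByteBuffer memory) {
        this.memory = memory;
    }

    void put(String key, V value) {
        byte[] bytes = serialize(value);
        if (memory.remaining() < Integer.BYTES + bytes.length) {
            throw new IllegalStateException("store is full");
        }
        int offset = memory.position();
        memory.putInt(bytes.length);   // record layout: [int length][payload bytes]
        memory.put(bytes);
        index.put(key, offset);        // an overwritten key simply points at the newest record
    }

    @SuppressWarnings("unchecked")
    V get(String key) {
        Integer offset = index.get(key);
        if (offset == null) {
            return null;
        }
        ByteBuffer view = memory.duplicate();   // independent position, same underlying memory
        view.order(memory.order());
        view.position(offset);
        byte[] bytes = new byte[view.getInt()];
        view.get(bytes);
        return (V) deserialize(bytes);
    }

    private static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    Constructed with ByteBuffer.allocateDirect(...), the values live in native memory. Passing a MappedByteBuffer obtained from FileChannel.map(...) instead would back the same bytes with a file, although in this sketch the index would still have to be persisted or rebuilt separately.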


    Done:
    • Persistence by memory mapping a file (the map will reload upon creation).
    • The Java heap stays nearly empty, leaving it free for real application processing with Full GC < 100ms.
    • Significantly less overall memory consumption. A serialized user record is ~60 bytes, so in theory 300 million records fit into 180GB of server memory. No need to raise the big data flag and run 4096 Hadoop nodes on AWS ;).

    Comparing a regular in-memory Java HashMap and a fast-serialization based persistent off-heap map, each holding 15 million user records, shows the following results (on an older 3GHz XEON 2x6):

                                      consumed Java Heap (MB)   Full GC (s)   Native Heap (MB)   get/put ops per s   required VM size (MB)
    HashMap                           6.865,00                  26,039        0                  3.800.000,00        12.000,00
    OffheapMap (Serialization based)  63,00                     0,026         3.050              750.000,00          500,00

    [test source / blog project] Note: You'll need at least 16GB of RAM to execute them.

    As one can see, even with fast serialization there is a heavy penalty (~factor 5) in access performance. Still, compared to other persistence alternatives, it's superior (1-3 microseconds per "get" operation, "put()" very similar).

    Use of JDK serialization would perform at least 5 to 10 times slower (direct comparison below) and therefore render this approach useless.


    Trading performance gains against higher level of abstraction: "Serverize me"


    A single server won't be able to serve (hundreds of) thousands of users, so we somehow need to share data among processes, or even better, across machines.

    Using a fast implementation, it's possible to use (fast-) serialization generously for over-the-network messaging. Again: if this ran 5 to 10 times slower, it just wouldn't be viable. Alternative approaches require an order of magnitude more work to achieve similar results.

    By wrapping the persistent off-heap hash map in an Actor implementation (async ftw!), a few lines of code make up a persistent KeyValue server with a TCP-based and an HTTP interface (uses kontraktor actors). Of course, the Actor can still be used in-process if one decides so later on.
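
    The actor idea itself can be sketched with plain JDK classes. The following is not the kontraktor API and does no networking; it only shows the core pattern, assuming a single-threaded mailbox in front of the map and futures as results. KeyValueActor is a made-up name and the HashMap is a stand-in for the persistent off-heap map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative actor-style facade: all access to the map is funneled through one thread.
class KeyValueActor<V> implements AutoCloseable {
    private final Map<String, V> store = new HashMap<>();   // stand-in for the off-heap map
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Messages are processed one at a time in submission order, so no locking is needed.
    CompletableFuture<Void> put(String key, V value) {
        return CompletableFuture.runAsync(() -> store.put(key, value), mailbox);
    }

    // Completes with null if the key is unknown.
    CompletableFuture<V> get(String key) {
        return CompletableFuture.supplyAsync(() -> store.get(key), mailbox);
    }

    @Override
    public void close() {
        mailbox.shutdown();   // already queued messages are still processed
    }
}
```

    Because callers only ever see futures, the same interface works whether the map lives in the same process or behind a network connection, which is what makes publishing such an actor over TCP or HTTP a small step.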


    Now that's a micro service. Given that it lacks any attempt at optimization and is single threaded, it's reasonably fast [same XEON machine as above]:
    • 280_000 successful remote lookups per second
    • 800_000 in case of failed lookups (key not found)
    • serialization based TCP interface (1 liner)
    • a stringy webservice for the REST-of-us (1 liner).
    [source: KVServer, KVClient] Note: You'll need at least 16GB of RAM to execute the test.

    A real world implementation might want to double performance by directly putting received serialized object byte[] into the map instead of encoding it twice (encode/decode once for transmission over wire, then decode/encode for offheaping map).

    "RestActorServer.Publish(..);" is a one liner to also expose the KVActor as a webservice in addition to raw tcp:




    C like performance using flyweight wrappers / structs

    With serialization, regular Java objects are transformed to a byte sequence. One can do the opposite: create wrapper classes which read data from fixed or computed positions of an underlying byte array or native memory address (e.g. see this blog post).

    By moving the wrapper's base offset, it's possible to access different records. Copying such a "packed object" boils down to a memory copy. In addition, it's pretty easy to write allocation-free code this way. One downside is that reading/writing single fields has a performance penalty compared to regular Java objects. This can be made up for by using the Unsafe class.

    "Flyweight" wrapper classes can be implemented manually as shown in the blog post cited; however, as code grows this starts getting unmaintainable.
    Fast-serialization provides "struct emulation" as a byproduct, supporting creation of flyweight wrapper classes from regular Java classes at runtime. Low-level byte fiddling in application code can be avoided for the most part this way.
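
    For reference, a hand-written flyweight might look like the following sketch. It uses ByteBuffer rather than Unsafe or fst-structs, and the record layout (instrument id, quantity, price) is a made-up example.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size record: [long instrumentId][int quantity][double price] = 20 bytes.
class PriceFlyweight {
    static final int SIZE = Long.BYTES + Integer.BYTES + Double.BYTES;
    private static final int ID_OFFSET = 0;
    private static final int QTY_OFFSET = 8;
    private static final int PRICE_OFFSET = 12;

    private final ByteBuffer buffer;
    private int base;   // byte offset of the record currently being viewed

    PriceFlyweight(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Re-points the wrapper at another record; no object is allocated.
    PriceFlyweight moveTo(int recordIndex) {
        base = recordIndex * SIZE;
        return this;
    }

    long getInstrumentId() { return buffer.getLong(base + ID_OFFSET); }
    void setInstrumentId(long id) { buffer.putLong(base + ID_OFFSET, id); }

    int getQuantity() { return buffer.getInt(base + QTY_OFFSET); }
    void setQuantity(int quantity) { buffer.putInt(base + QTY_OFFSET, quantity); }

    double getPrice() { return buffer.getDouble(base + PRICE_OFFSET); }
    void setPrice(double price) { buffer.putDouble(base + PRICE_OFFSET, price); }
}
```

    A single PriceFlyweight instance can walk over millions of records in a direct buffer by calling moveTo(i) in a loop, so iteration does not allocate. The hand-maintained offsets are exactly the part that becomes hard to maintain as the number of fields grows.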



    How a regular Java class can be mapped to flat memory (fst-structs):


    Of course there are simpler tools out there to help reduce manual programming of encoding (e.g. Slab), which might be more appropriate for many cases and use less "magic".

    What kind of performance can be expected using the different approaches (sad fact incoming)?

    Let's take the following struct-class consisting of a price update and an embedded struct denoting a tradable instrument (e.g. stock) and encode it using various methods:

    a 'struct' in code

    Pure encoding performance:

    Structs         fast-Ser (no shared refs)   fast-Ser       JDK Ser (no shared)   JDK Ser
    26.315.000,00   7.757.000,00                5.102.000,00   649.000,00            644.000,00




    Real world test with messaging throughput:

    In order to get a basic estimate of the differences in a real application, I ran an experiment on how different encodings perform when used to send and receive messages at a high rate via reliable UDP messaging:

    The Test:
    A sender encodes messages as fast as possible and publishes them using reliable multicast, a subscriber receives and decodes them.

    Structs        fast-Ser (no shared refs)   fast-Ser       JDK Ser (no shared)   JDK Ser
    6.644.107,00   4.385.118,00                3.615.584,00   81.582,00             79.073,00

    (Tests done on I7/Win8, XEON/Linux scores slightly higher, msg size ~70 bytes for structs, ~60 bytes serialization).

    Slowest compared to fastest: factor of 82. The test highlights an issue not covered by micro-benchmarking: encoding and decoding should perform similarly, as actual throughput is determined by Min(Encoding performance, Decoding performance). For unknown reasons, JDK serialization manages to encode the tested message about 500_000 times per second, while decoding performance is only 80_000 per second, so in the test the receiver gets dropped quickly:

    "
    ...
    ***** Stats for receive rate:   80351   per second *********
    ***** Stats for receive rate:   78769   per second *********
    SUB-ud4q has been dropped by PUB-9afs on service 1
    fatal, could not keep up. exiting
    "
    (Creating backpressure here probably isn't the right way to address the issue ;-)  )
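
    The Min(Encoding performance, Decoding performance) relation is easy to check for any codec. The sketch below measures both directions for JDK serialization with a naive timing loop (no warm-up, so the absolute numbers are only indicative; a harness such as JMH would be more reliable). CodecRates and its method names are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class CodecRates {
    static byte[] encode(Serializable message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object decode(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the task repeatedly and returns executions per second.
    static double ratePerSecond(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return iterations * 1e9 / elapsedNanos;
    }

    // A pipeline cannot run faster than its slower side.
    static double sustainableRate(double encodesPerSecond, double decodesPerSecond) {
        return Math.min(encodesPerSecond, decodesPerSecond);
    }

    public static void main(String[] args) {
        String message = "price update";
        byte[] wire = encode(message);
        double enc = ratePerSecond(200_000, () -> encode(message));
        double dec = ratePerSecond(200_000, () -> decode(wire));
        System.out.printf("encode %.0f/s, decode %.0f/s, sustainable %.0f/s%n",
                enc, dec, sustainableRate(enc, dec));
    }
}
```

    With the figures quoted above, sustainableRate(500_000, 80_000) yields 80_000, which is roughly the receive rate visible in the log before the subscriber is dropped.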

    Conclusion:
    • A fast serialization allows for a level of abstraction in distributed applications that is impossible if the serialization implementation
      - is too slow
      - is incomplete, e.g. cannot handle arbitrary serializable object graphs
      - requires manual coding/adaptations (this would put many restrictions on actor message types, Futures and Spores, and be a maintenance nightmare)
    • Low-level utilities like Unsafe enable different representations of data, resulting in extraordinary throughput or guaranteed latency boundaries (allocation-free main path) for particular workloads. These are impossible to achieve, by a large margin, with the JDK's public tool set.
    • In distributed systems, communication performance is of fundamental importance. Removing Unsafe is not the biggest fish to fry looking at the numbers above. JSON or XML won't fix this ;-).
    • While the HotSpot VM has reached an extraordinary level of performance and reliability, CPU is wasted in some parts of the JDK like there's no tomorrow. Given we are living in the age of distributed applications and data, moving stuff over the wire should be easy to achieve (not manually coded) and as fast as possible.

    Addendum: bounded latency

    A quick Ping Pong RTT latency benchmark showing that Java can compete with C solutions easily, as long as the main path is allocation free and techniques like those described above are employed:



    [credits: charts+measurement done with HdrHistogram]

    This is an "experiment" rather than a benchmark (so do not read: 'Proven: Java faster than C'); it shows that low-level Java can compete with C in at least this low-level domain.
    Of course it's not exactly idiomatic Java code; however, it's still easier to handle, port and maintain compared to a JNI or pure C(++) solution. Low latency C(++) code won't be that idiomatic either ;-)

    About me: I am a solution architect freelancing at an exchange company in the area of realtime GUIs, middleware, and low latency CEP (Complex Event Processing), hacking nightly at https://github.com/RuedigerMoeller.


    70 comments:

    1. why off heap map not part of jdk?

    2. I've gotta ask, where is the gif of the kid falling from?

    3. There might be an interesting article here, but those animations are phenomenally annoying.

    4. It's a live action parody of the game QWOP

    5. Re: animated gif, it looks conspicuously like somebody re-enacting the game QWOP

    6. The GIF of the kid falling is someone cosplaying the QWOP game at the Anime North 2013 convention (or maybe 2014, 2012).

      Replies
      1. wasn't aware .. just browsed it out the internet and found it somehow reminds me on doing zero alloc structs on the JVM :)

    7. Unfortunately, Oracle is removing Unsafe from Java 9 release ...

      Replies
      1. fast serialization does not rely on unsafe. If its present it makes use of it (Android compatible), for most applications difference is < 5%. Only exception are large native arrays, the diff there is >100%. MemMapped currently uses Unsafe however its wrapped and could be replaced by a bytebuffer implementation (with some degradation, hope Oracle provides replacement for missing features in J9)
