Tuesday, October 28, 2014

The Internet is running in debug mode

With the rise of the Web, textual encodings like XML/JSON have become very popular. Of course, textual message encoding has many advantages from a developer's perspective. What most developers are not aware of is how expensive encoding and decoding of textual messages really is compared to a well-defined binary protocol.

It's common to define a system's behaviour by its protocol. Actually, a protocol mixes up two distinct aspects of communication, namely:

  • Encoding of messages
  • Semantics and behavior (request/response, signals, state transitions of the communicating parties ..)

Frequently (not always), these two very distinct aspects are mixed up without need. So we are forced to run the whole internet in "debug mode", as 99% of webservice and webapp communication is done using textual protocols.

The overhead in CPU consumption compared to a well-defined binary encoding is a factor of ~3-5 (JSON) up to >10-20 (XML). The unnecessary waste of bandwidth also adds to that greatly (yes, you can zip, but this in turn will waste even more CPU).
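
To make the size side of this concrete, here is a minimal Python sketch that encodes the same message once as JSON and once with a fixed binary layout. The message fields (`seq`, `price`, `qty`) and the layout are assumptions made up for illustration; the sketch only compares encoded sizes and does not measure CPU time.

```python
import json
import struct

# Hypothetical message used only for illustration.
message = {"seq": 1234567890123, "price": 101.25, "qty": 500}

# Textual encoding: field names and decimal digits travel with every message.
text_bytes = json.dumps(message).encode("utf-8")

# Binary encoding with a layout both sides agree on in advance:
# big-endian int64, float64, int32 -> 8 + 8 + 4 = 20 bytes, no padding.
LAYOUT = struct.Struct(">qdi")
binary_bytes = LAYOUT.pack(message["seq"], message["price"], message["qty"])


def decode_binary(data):
    """Rebuild the message from the fixed binary layout."""
    seq, price, qty = LAYOUT.unpack(data)
    return {"seq": seq, "price": price, "qty": qty}


if __name__ == "__main__":
    print("text bytes:  ", len(text_bytes))
    print("binary bytes:", len(binary_bytes))
    print("round trip ok:", decode_binary(binary_bytes) == message)
```

The binary form drops the field names and the digit-by-digit number formatting, which is also the work a text parser has to undo on the receiving side.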

I haven't calculated the numbers, but this is environmental pollution on a big scale. Unnecessary CPU consumption to this extent wastes a lot of energy (global warming, anyone?).

The solution is easy:
  • Standardize on some simple encodings (pure binary, self-describing binary ("binary JSON"), textual)
  • Define the behavioral part of a protocol (exclude encoding)
  • Use textual encoding during development, use binary in production.
Man, we could save a lot of webserver hardware if only HTTP headers could be binary encoded ..
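
A minimal Python sketch of what "define the behavior, exclude the encoding" could look like. Everything here is hypothetical (the `JsonCodec` and `BinaryCodec` classes, the `handle_poll` function, and the single-sequence-number message); it is meant as an illustration under those assumptions, not as a reference design.

```python
import json
import struct


class JsonCodec:
    """Textual encoding: easy to read in logs, handy during development."""

    def encode(self, seq):
        return json.dumps({"seq": seq}).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))["seq"]


class BinaryCodec:
    """Binary encoding: one big-endian int64, always 8 bytes."""

    _layout = struct.Struct(">q")

    def encode(self, seq):
        return self._layout.pack(seq)

    def decode(self, data):
        return self._layout.unpack(data)[0]


def handle_poll(request_bytes, codec, last_seq_on_server):
    """Behavioral part: tell the client the newest sequence number known.

    This function never looks at the wire format; it only talks to the codec.
    """
    client_seq = codec.decode(request_bytes)
    return codec.encode(max(client_seq, last_seq_on_server))


if __name__ == "__main__":
    for codec in (JsonCodec(), BinaryCodec()):
        reply = handle_poll(codec.encode(5), codec, last_seq_on_server=9)
        print(type(codec).__name__, len(reply), codec.decode(reply))
```

Swapping `JsonCodec` for `BinaryCodec` changes what goes over the wire but not a single line of `handle_poll`, which is the whole point of keeping the two aspects apart.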


  1. There go all our pretty logs and quick curl tests and such. Text protocols allow a quick look into what's going on; binary protocols don't. As far as I last checked, services don't self-heal: a human has to jump in. Which might be a reflection of other things going wrong, but that is how it is :)

    1. I think that's a narrow view. In fact, text is also binary encoded. It's because of standardization on how to read e.g. UTF-8 that tools "know" how to decode it. One could standardize on a self-describing binary format and provide a similar tool chain of "pretty printers" without problems (think of it like a "special" charset encoding). However, in the first place, protocol behaviour must not be mixed up with message encoding.

    2. """ and provide a similar tool chain of "pretty printers" without problems (think of it like a "special" charset encoding)"""

      This is an even more narrow view. Text formats work with what we already have, here and now. What you propose is replacing them with something that's not out there, is not interoperable, and that most tools don't know about.

      First get the tooling you say is easy to provide "without problems", then ask for the change to binary formats.

    3. I'd like to remind you the internet should serve users, not developers :). Anyway, HTTP/2 will solve the issues. Took only like 15 years and will take like 10 years until it's widely adopted :)

  2. I fully agree with your desire to separate protocol handling from that of encoding (format). Doing this is an under-appreciated way both to decouple handling and to allow for more efficient handling, by choosing the most suitable encoding for the context. Sometimes this should be a well-known textual format (for interoperability, flexibility and diagnostics); other times a more compact binary representation makes more sense. Protocols should not unnecessarily limit the choice.

    I am not sure I share your concern on performance aspects: without disputing specific encoding numbers, I just question their significance in the big picture. In the case of HTTP, for example, decoding of header values is such a minuscule part of handling that even if it was changed to binary encoding, the benefits in many cases would not be significant at all. Payload remains as-is, and the real complexity in HTTP comes from managing connections, state, liveness checks and other protocol-level aspects.

    1. Regarding the HTTP header, I left out context. E.g. for a low-latency, many-client long-polling HTTP server, the payload consists of a single sequence number (of the last received message), so header parsing indeed adds significant overhead when there are no pending messages (processing is session lookup + sequence number comparison) in this special case. Another example is DoS protection: it eats significant processing time to weed out DoS requests from application requests.

    2. Ofc in general you are right in that "header parsing" is not a good example regarding the big picture...

    3. Any in-between component (proxy, reverse proxy) needs to encode and decode (parse) headers; it's probably of relevance especially for the tiny payloads typical for e.g. webservice remote calls.

  3. My best citation on the mec.symp.group ... The next time I have to write another JSON parser/serializer (brrrr Jackson... :() I'll pray that someone listens to you..
    :) thanks Rüdiger

    1. You are doing Jackson wrong; it helps in dealing with JSON and afaik is pretty fast. It has not invented JSON ..

    2. The Java binding and the streaming API are not free.. in any way. Although Jackson is "pretty" fast, it does not deliver any zero-copy (AFAIK) ability in the serialisation stage... and produces TONS of garbage. Comparing it with a serious serializer (hand-made?) that is really GC-free is simple. But that's another story... so far, if I have to send a long, why on earth do I have to send more than 8 bytes? ^^ P.S. Jackson is not a bad tool per se, and I agree with you that it is a great help if you don't want to deal directly with JSON...

    3. Hm .. do you have some kind of reusable open-source variant of a zero-copy, low-garbage JSON parser? I'd be interested in something like that, as Jackson duplicates some stuff I already do in the serialization layer, such that the JSON codec is well below what would be possible.

    4. Hi Rüdiger,
      I only have custom home-rolled libs that I've developed for my own needs... undocumented too :P
      But TextWire of https://github.com/OpenHFT/Chronicle-Bytes looks very promising... if you wait a few weeks, I've contacted Peter Lawrey to contribute to this repo, and maybe there will be a little more docs and examples for it :)

    5. Wrong repo sorry :P
      I really need a coffee this morning... https://github.com/OpenHFT/Chronicle-Wire

    6. Interesting bottom-up approach (many "planned" features though ;) ). In contrast, fast-serialization goes top-down, providing different wire formats to represent serialized object graphs (binary, JSON). Maybe I could add a Chronicle Wire codec to fst once C-wire is in a more mature state.

  4. MsgPack is efficient, schema-less, and has a 1:1 mapping with JSON (unlike BSON despite its name).

    1. I am aware of MsgPack. I even tried to build a codec for fast-serialization based on MsgPack but somehow lost track. Might be a better alternative to JSON for actor <=> JavaScript remoting. Are there any Java benchmarks?

  5. I used to push Sun's XDR [1] for this very reason. Client-server apps with binary protocols, too. That's because the web was more inefficient, complex, and insecure in about every area. It was crap. Its main advantages were the network effect, instant distribution, and widespread compatibility of HTML/JS. We could've just fixed the problems in our native C/S and P2P models but adopted the web instead.

    Two other good alternatives from long ago were Juice [2] for applets and Globe [3] for WAN architecture.

    [1] https://en.wikipedia.org/wiki/External_Data_Representation

    [2] http://www.modulaware.com/mdlt69.htm

    [3] http://www.cs.vu.nl/~philip/globe/

    Nick P

    1. Wow, the Globe project looks interesting. Annoyed by the lack of abstraction and poor performance of existing distributed application products, I am active in a somewhat similar direction: http://ruedigermoeller.github.io/kontraktor/ . Well, it's actually mostly JVM-bound, erm, and not that global ;).

  6. We use binary protocols from day 1 at Aerospike - that's one of the "small" reasons we massively reduce server counts compared to other databases. You might be surprised how hard it is to fight against an entire industry - hardware companies don't like getting cut out, cloud companies don't like getting cut out, open source companies that charge by node count don't like getting cut out. All of these guys pay lip service to efficiency, then bury technology that is actually more efficient. Flash (SSD) storage is similar - you end up paying a lot less for most use cases compared to DRAM and compared to Rotational, but only a handful of database companies have optimized for Flash.

    1. Agree. In addition, there is a widespread lack of knowledge of what is technically possible. People's gut has adapted to crappy tech. Premature scale-out dominates.

  7. It's better to use a binary protocol for service-to-service communication: much more efficient than any text protocol, and it takes less bandwidth too.
