Saturday, June 27, 2015

Don't REST! Revisiting RPC performance and where Fowler might have been wrong ...

[Edit: The title is clickbait, of course. Fowler is aware of the async/sync issues and recently posted http://martinfowler.com/articles/microservice-trade-offs.html with a clarifying section regarding async.]

Hello my dear citizens of planet earth ...

There are many good reasons to decompose large software systems into decoupled message passing components (team size + decoupling, partial + continuous software delivery, high availability, flexible scaling + deployment architecture, ...).

With distributed applications comes the need for ordered, point-to-point message passing. This is different from client/server relations, where many clients send requests at a low rate and the server can choose to scale by processing requests concurrently on multiple threads.
Remote messaging performance is to distributed systems what method invocation performance is to non-distributed, monolithic applications.
(Guess what is one of the most optimized areas in the JVM: method invocation.)

[Edit: with "REST" I also refer to HTTP-based webservice-style APIs; this is somewhat imprecise.]

Revisiting high level remoting Abstractions

There have been various attempts at building high-level, location-transparent abstractions (e.g. CORBA, distributed objects), however in general those ideas have not found broad acceptance.

This article by Martin Fowler sums up the common-sense view pretty well:

http://martinfowler.com/articles/distributed-objects-microservices.html

Though not explicitly written, the article implies synchronous remote calls, where a sender blocks and waits for a remote result to arrive, thereby paying the cost of a full network round trip for each remote call performed.

With asynchronous remote calls, many of the complaints no longer hold true. When using asynchronous message passing, the granularity of remote calls ceases to be significant.

"course grained" processing

remote.getAgeAndBirth().then( (age,birth) -> .. );

is not significantly faster than two "fine grained" calls

all( remote.getAge(), remote.getBirth() ) 
   .then( resultArray -> ... ); 

as both variants include network round trip latency only once.

With synchronous remote calls, on the other hand, every single remote method call pays the penalty of one network round trip, and only then do Fowler's arguments hold.
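To make this concrete, here is a minimal, self-contained Java sketch using CompletableFuture. The "remote" calls are simulated with a 50 ms sleep standing in for one network round trip; all names and timings are made up for illustration and belong to no tested framework:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncGranularity {
    static final ExecutorService NET = Executors.newFixedThreadPool(2);

    // each simulated remote call costs one 50 ms "round trip"
    static CompletableFuture<Integer> getAge() {
        return CompletableFuture.supplyAsync(() -> { sleep(50); return 42; }, NET);
    }
    static CompletableFuture<String> getBirth() {
        return CompletableFuture.supplyAsync(() -> { sleep(50); return "1973-01-01"; }, NET);
    }
    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // both fine-grained requests go out immediately, so their round trips overlap
        String result = getAge()
            .thenCombine(getBirth(), (age, birth) -> age + " / " + birth)
            .join();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(result);
        // roughly one round trip (~50 ms), not two sequential ones (~100 ms)
        System.out.println(elapsed < 100 ? "latency paid once" : "latency paid twice");
        NET.shutdown();
    }
}
```

The blocking join() is only there to keep the example short; in an actor system the combined result would be processed in a callback instead.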

Another element changing the picture is the availability of "spores": snippets of code that can be passed over the network and executed on the receiver side, e.g.

remote.doWithPerson( "Heinz", heinz -> {
    // executed remotely, streams data back to the caller
    stream( heinz.salaries().sum() / 12 ); finish();
}).then( averageSalary -> ... );

Spores can be implemented efficiently thanks to the availability of VMs and JIT compilation.

Actor systems and similar asynchronous message passing approaches have gained popularity in recent years. The main motivation was to ease concurrency, following the insight that multithreading with shared data does not scale well and is hard to master in an industrial-grade software development environment.

As large servers are in essence "distributed systems in a box", those approaches apply to distributed systems as well.

In the following I'll test the remote invocation performance of some frameworks. I'd like to prove that established frameworks are far from what is technically possible, and to show that popular technology choices such as REST are fundamentally unsuited as the foundation of large, fine-grained distributed applications.


Test Participants

Disclaimer: As the tested products are of medium to high complexity, there is a danger of misconfiguration or test errors, so if anybody has a (verified) improvement to one of the test cases, just drop me a comment or file an issue to the GitHub repository containing the tests:

https://github.com/RuedigerMoeller/remoting-benchmarks.

I verified by searching forums etc. that the numbers are roughly in line with what others have observed.

Features I expect from a distributed application framework:
  • Ideally fully location transparent. At the least, there should be a concise way (e.g. annotations, generators) to semi-automate marshalling.
  • It maps responses to their originating requests automatically (via callbacks, futures/promises or similar).
  • It is asynchronous.

products tested (Disclaimer: I am the author of kontraktor):
  • Akka 2.11
    Akka provides a high level programming interface, marshalling and networking details are mostly invisible to application code (full location transparency).
  • Vert.x 3.1
    provides a weaker level of abstraction compared to actor systems, e.g. there are no remote references. Vert.x has a symbolic notion of network communication (event bus, endpoints).
    As it's "polyglot", marshalling and message encoding need some manual support.
    Vert.x is kind of a platform and addresses many practical aspects of distributed applications such as application deployment, integration of popular technology stacks, monitoring, etc.
  • REST (RestExpress)
    As HTTP 1.1 based REST is limited by latency (synchronous protocol), I chose this one more or less at random.
  • Kontraktor 3, a distributed actor system on Java 8. I believe it hits a sweet spot regarding performance, ease of use and mental-model complexity. Kontraktor provides a concise, mostly location-transparent, high-level programming model (promises, streams, spores) supporting many transports (TCP, HTTP long poll, WebSockets).

Libraries skipped:
  • finagle - requires me to clone and build their fork of thrift 0.5 first. Then I'd have to define thrift messages, then generate, then finally run it. 
  • parallel universe - at the time of writing, the actor remoting was not in a testable state ("Galaxy" is alpha), the examples are without build files, and the Gradle build did not work. Once I managed to build, the programs expected configuration files which I could not find. Maybe worth a revisit (accepting pull requests as well :) ).

The Test

I took a standard remoting example:
The "Ask" test case:
The sender sends a message of two numbers; the remote receiver answers with the sum of those two numbers. The remoting layer has to track and match requests and responses, as there can be tens of thousands "in flight".
The "Tell" test case:
The sender sends fire-and-forget. No reply is sent from the receiver.
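To illustrate what "track and match" means, here is a hypothetical minimal sketch (not the code of any tested framework) of the correlation bookkeeping a remoting layer needs for the ask pattern: each request is assigned an id, a promise is parked under that id, and the incoming response completes it, so thousands of requests can be in flight at once.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class AskCorrelation {
    // pending requests, keyed by request id
    static final Map<Long, CompletableFuture<Long>> pending = new ConcurrentHashMap<>();
    static final AtomicLong ids = new AtomicLong();

    // "send" a request; the remote side is simulated inline here
    static CompletableFuture<Long> askSum(long a, long b) {
        long id = ids.incrementAndGet();
        CompletableFuture<Long> fut = new CompletableFuture<>();
        pending.put(id, fut);
        // pretend the wire delivered (id, a, b) and the receiver answered with the sum
        onResponse(id, a + b);
        return fut;
    }

    // called when a response frame (id, result) arrives from the network
    static void onResponse(long id, long result) {
        CompletableFuture<Long> fut = pending.remove(id);
        if (fut != null) fut.complete(result);
    }

    public static void main(String[] args) {
        System.out.println("sum = " + askSum(20, 22).join());
    }
}
```

In a real remoting layer the map would be per connection and the response would arrive asynchronously from a reader thread; the bookkeeping is the same.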

Results

Attention: Don't miss notes below charts.

Platform: Linux CentOS 7, dual socket, 20 real cores @ 2.5 GHz, 64 GB RAM. As the tests are ordered point-to-point, none of the tests made use of more than 4 cores.

                         tell Sum (msg/second)    ask Sum (msg/second)
Kontraktor Idiomatic     1.900.000                860.000
Kontraktor Sum-Object    1.450.000                795.000
Vert.x 3                 200.000                  200.000
AKKA (Kryo)              120.000                  65.000
AKKA                     70.000                   64.500
RestExpress              15.000                   15.000
Rest >15 connections     48.000                   48.000

let me chart that for you ..



Remarks:

  • Kontraktor 3 outperforms by a huge margin. I verified the test is correct and all messages are transmitted (if in doubt just clone the git repo and reproduce). 
  • Vert.x 3 seems to have built-in rate limiting. I saw peaks of 400k messages/second, however it averaged at 200k (hints for improvement welcome). In addition, the first connecting sender only gets 15k/second throughput; if I stop and reconnect, throughput is as charted.
    I tested the very first Vert.x 3 final release. For marshalling, fast-serialization (FST) was used (also used in Kontraktor). Will update as Vert.x 3 matures.
  • Akka. I spent quite some time on improving the performance, with mediocre results. As Kryo is roughly the same speed as FST serialization, I'd expect at least 50% of Kontraktor's performance.

    Edit:
    Further analysis shows Akka is hit by poor serialization performance. It has an option to use Protobuf for encoding, which might improve results (but why did Kryo not help, then?).

    Implications of using Protobuf:
    * each message needs to be defined in a .proto file, and a generator has to be run
    * frequently additional data transfer happens, like "app data => generated messages => app data"
    * no reference sharing support, so no cyclic object graphs can be transmitted
    * no implicit compression via serialization's reference sharing
    * unsure whether the ask() test profits, as it did not profit from Kryo either
    * Kryo performance is in the same ballpark as Protobuf but did not help that much either

    Smell: I had several people contacting me aiming to improve the Akka results. They somehow disappear.

    Once I find time I might add a Protobuf test. It's a pretty small test program, so if there were an easy fix, it should not be a huge effort to provide it. The git repo linked above contains a Maven-buildable, ready-to-use project.
  • REST. The poor throughput is not caused by RestExpress (which I found quite straightforward to use) but by HTTP 1.1's dependence on latency. If one moves a server to other hardware (e.g. a different subnet, the cloud), the throughput of a service can change drastically due to different latency. This might change with HTTP 2.
    Good news is: You can <use> </any> <chatty> { encoding: for messages }, as it won't make a big difference for point to point REST performance.
    Only when opening many connections (>20) concurrently does throughput increase. This messes up transaction/message ordering, so it can only be used for idempotent operations (a species mostly known from white papers and conference slides, rarely seen in the wild).


Misc Observations


Backpressure

Sending millions of messages as fast as possible can be tricky to implement in a non-blocking environment. A naive send loop
  • might block the processing thread,
  • builds up a large outbound queue, as putting is faster than taking + sending,
  • can prevent incoming callbacks from being enqueued and executed (= deadlock or OOM).
Of course this is a synthetic test case, however similar situations exist, e.g. when streaming large query results or sending large blobs to other nodes (e.g. init with reference data).

None of the libraries (except REST) handled this out of the box:
  • Kontraktor requires a manual increase of queue sizes over the default (32k) in order not to deadlock in the "ask" test. In addition, it is required to programmatically adapt the send rate using the backpressure signal raised by the TCP stack (network send blocks). This can be done non-blocking, "offer()" style.
  • For Vert.x I used a periodic task sending a burst of 500 to 1000 messages. Unfortunately the optimal number of messages per burst depends on hardware performance, so the test might need adaptation when run on e.g. a laptop.
  • For Akka I sent 1 million messages every 30 seconds in order to avoid implementing application-level flow control. It just queues up messages and degrades to something like 50 msg/s when used naively (big loop).
  • REST was not problematic here (synchronous HTTP 1.1 anyway). Degraded by default.
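The "offer()" style send loop can be sketched generically with plain Java threads and a bounded queue standing in for the actor/network layer (all names are made up; this is an illustration of the pattern, not any framework's code): the producer reacts to a full outbound queue instead of blocking or queueing unboundedly.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class OfferStyleSender {
    public static void main(String[] args) throws InterruptedException {
        // bounded outbound queue standing in for the remoting layer's send queue
        ArrayBlockingQueue<long[]> outbound = new ArrayBlockingQueue<>(1024);
        final int N = 100_000;

        // consumer simulates the network writer, deliberately stalling now and then
        Thread writer = new Thread(() -> {
            long sum = 0;
            try {
                for (int i = 1; i <= N; i++) {
                    long[] msg = outbound.take();
                    sum += msg[0] + msg[1];
                    if (i % 10_000 == 0) Thread.sleep(1); // simulated slow network
                }
            } catch (InterruptedException e) { return; }
            System.out.println("receiver sum " + sum);
        });
        writer.start();

        int sent = 0;
        long rejected = 0;
        while (sent < N) {
            // offer() returns false instead of blocking when the queue is full:
            // that is the backpressure signal. A real sender would return to the
            // event loop here rather than spin.
            if (outbound.offer(new long[]{ 1, 2 })) sent++;
            else { rejected++; Thread.yield(); }
        }
        writer.join();
        System.out.println("sent " + sent + ", offers rejected: " + rejected);
    }
}
```

Whenever the simulated network stalls, offer() starts failing and the producer backs off; nothing blocks and no unbounded queue builds up.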



Why is Kontraktor remoting that much faster?
  • premature optimization
  • adaptive batching works wonders, especially when applied to reference-sharing serialization
  • small performance compromises stack up, so reduce them bottom-up
Kontraktor is actually still far from optimal. It generates and serializes a "MethodCall { int targetId, [..], String methodName, Object[] args }" for each remoted message. It does not use Externalizable or other ways of bypassing generic (fast-)serialization.
Throughputs beyond 10 million remote method invocations/sec have been proven possible, at the cost of a certain fragility + complexity (unique ids and distributed systems ...) + manual marshalling optimizations.
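The adaptive batching idea can be sketched generically (an illustration of the concept, not Kontraktor's actual writer loop): block for the first message, then drain whatever else has accumulated since the last flush and send it as one batch, so batch size automatically grows under load and shrinks when idle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AdaptiveBatcher {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4096);
        // simulate a burst of 1000 queued messages
        for (int i = 0; i < 1000; i++) queue.put(i);

        List<Integer> batch = new ArrayList<>();
        int batches = 0, messages = 0;
        while (messages < 1000) {
            batch.clear();
            // block for at least one message, then grab everything else available
            batch.add(queue.take());
            queue.drainTo(batch, 511); // cap batch size at 512 messages
            // a real remoting layer would serialize + write the whole batch here,
            // letting reference-sharing serialization compress it as one unit
            messages += batch.size();
            batches++;
        }
        System.out.println(messages + " messages in " + batches + " batches");
    }
}
```

Under light load each "batch" is a single message (no added latency); under heavy load batches approach the cap and serialization/syscall overhead is amortized across hundreds of messages.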


Conclusion

  • As scepticism regarding distributed object abstractions is mostly performance-related, high-performance asynchronous remote invocation is a game changer.
  • Popular libraries have room for improvement in this area.
  • Don't use REST/HTTP to connect systems in (micro-)service oriented architectures: point-to-point performance is horrible. It has its applications in the area of (WAN) web services, platform-neutral, easily accessible APIs, and client/server patterns.
  • Asynchronous programming is different and requires different/new solution patterns (at the source code level). Learning to use asynchronous messaging primitives is unavoidable.
    "Pseudo-synchronous" approaches (e.g. fibers) are good for scaling multithreading, but do not work out for distributed systems.
  • Lack of craftsmanship can kill visions.


12 comments:

  1. How about

    Data data = remote.getData();
    data.getAge(), data.getBirth()

    1. That's the "use coarse-grained APIs to overcome latency" approach.

      The problem is, if there are say 80 columns instead of 2, the "getData()" thingy will be expensive encoding- and bandwidth-wise. This usually results in methods like getUpperHalfData(), getLowerHalfData() and getDetailData() being added to the remote interface, still transferring way more data than actually required. It helps reduce latency effects, however it is still an (inefficient) kludge that cannot be applied in general throughout a system.

      With sync calls a thread gets blocked, a very limited resource. One cannot go beyond, say, ~1000 concurrent requests, though your hardware can easily support tens of thousands of concurrent requests/transactions CPU- and network-wise. Just imagine what throughput you would get with the sum test above using sync RPC (estimate: 5..50k depending on network latency).

      In theory this also applies to inter-thread communication (replace threads with actors), however for most applications threads work fine and devs are used to them. So: lower reward, same effort in that area.

  2. Reviewing your library and benchmarks, it appears that there is bias in favor of kontraktor. Even though you claim to have spent significant time optimizing Akka this isn't evident from the benchmark. Note that I don't use any of these libraries and am just eyeballing the code, but from what I see its an invalid comparison.

    The benchmarks are of a single actor with one sender and one consumer. This single-producer / single-consumer (SPSC) throughput test is not a good indicator of concurrency and doesn't model a real application's behavior. This favors kontraktor's design choices much more than Akka's.

    Akka uses ForkJoinPool as its default executor, which is good at handling lots of actors being enqueued/dequeued quickly. Kontraktor uses a ThreadPoolExecutor with an unbounded LinkedBlockingQueue, which typically has worse performance. However this isn't evident in the benchmark because the executor is never contended and threads are not context switching rapidly.

    Akka is configured with the default mailbox, a ConcurrentLinkedQueue, even though the documentation recommends using a SingleConsumerOnlyUnboundedMailbox when a BalancingDispatcher is not used. Kontraktor uses a multiple-producer / single-consumer (MPSC) bounded array queue, which allows it to discard messages when overflowing (whereas Akka may not). The array queue performs excellently because it takes advantage of caching effects and is not contended, but at 512k size it is heavyweight. The reason Akka prefers a linked MPSC queue is to avoid wasted memory when having millions of actors in the system (at least that was a condition Viktor Klang had when asking for an algorithm on concurrency-interest).

    An MPSC linked queue can outperform an array queue under contention by using a combining backoff arena [1]. My implementation is 2.5x faster on a 4-core laptop and matches the performance with no contention (and beats it in the "optimistic" mode). That gap would increase on a larger machine because the caching effects are offset by producer contention. The array queue is faster than ConcurrentLinkedQueue with low/mild contention, and similar at high.

    I am sure there are many other differences in configuration and design choices that make the comparison unfair. We all know benchmarking is hard, but you don't acknowledge that or highlight any caveats of your scenario. There's nothing to indicate that you reached out for a peer review or profiled to understand the performance behavior of the different libraries.

    Like I said I don't use any of these libraries. Unfortunately if I decided to then I couldn't justify a choice based on your benchmarks because I think your analysis is flawed. Instead I'd have to look at other benchmarks, the api / feature set, and the community. I might even hold the benchmarks against you after seeing many other projects knowingly lie for self promotion, and I hold integrity as one of the primary characteristic an engineer must display.

    It would be really interesting to see accurate benchmarks and I hope your current set don't mislead you in development of your library.

    [1] https://github.com/ben-manes/caffeine/wiki/SingleConsumerQueue

    1. Hello Ben,

      deleted my previous reply as i have a tendency to rage hard at times :)

      1. I actually would be happy if Akka would show better results, from experience its:
      Be 2 times faster and you'll get appreciated, be >10 times faster and you are a troll.

      2. Point-to-point remoting performance is the foundation of good scale-out performance. That's why I am testing the basic pattern. In addition,
      for many applications serial, ordered streaming is of high importance.

      3. If a benchmark is utterly wrong, I get a verified correction the next day. If my benchmark tells the truth, people come up with all kinds of half-baked + mystic advice. I am aware I can be wrong and I am capable of accepting + publishing that. Just show me ze code.
      If you check the forums you'll see what I measured is in line with user complaints.

      4. It seems you missed that I am measuring **remoting** performance. Queue performance is mostly irrelevant for this test, as most CPU time is spent on marshalling and networking. So your argument is basically off topic ;). That's why using Kryo made a difference and changing dispatchers did not (I tried, see https://github.com/RuedigerMoeller/remoting-benchmarks/blob/master/akka/Server/src/main/resources/application.conf).
      Akka probably suffers from naively implemented marshalling and networking. Process-local messaging is on par. At a mere 70k-100k messages per second, contention is not an issue ..

      5. Kontraktor uses a bounded MPSC queue of JCTools, it does not use a ThreadPool based scheduler (the pool in Actors.java is used to isolate blocking calls)

      6. Your ConcurrentLinkedQueue seems like nice work, I hope you are not "lying for self promotion" ;)

      7. Citing you: "... your analysis is flawed. .. projects knowingly lie for self promotion ..."

      This kind of badmouthing after displaying utter failure to understand the determining factors of distributed messaging performance .. hm .. questionable. It is also naive to believe some threadpool/configuration magic will skyrocket point-to-point remoting performance by a factor of 10 (hint: it doesn't).

      8. I'll tell you what a misleading benchmark is: "High Performance. 50 million msg/sec on a single machine", cited from the Akka front page. I don't know why tech companies spend 80% on marketing and 20% on software development, but I can show you the results of that (hint: scroll up).

      9. Akka's remoting slowness, from what I can see, stems from some implementation/detail design mistakes, candidates being:

      1) definition of serializer libs per class (BTW: errors with object sharing) => cannot batch at the serialization level.
      2) even when using Kryo, it seems JDK serialization gets used for some classes. Instantiating + initializing a JDK ObjectInputStream already consumes ~14 microseconds => limits the message rate to 71428 per second (scroll up and compare ..).

      > Like I said I don't use any of these libraries.

      I don't buy that, even if you tell it 3 times (lol).

  3. This comment has been removed by the author.

    1. I have done some further analysis and my suspicion was confirmed: Akka suffers from poor serialization performance. So it's actually not Akka performing poorly but the way it serializes data. I'd be interested in having a protocol buffers benchmark.

      On the other hand: Protobuf serialization requires some explicit work. It's a significant drop in productivity if each message being remoted has to be declared in a .proto file and Protobuf wrappers have to be generated. In a real application one then frequently has to transform application data => protobuf wrapper => send, receive message => move protobuf wrapper data to application-level data. One rarely uses generated wrappers directly in application code (cannot add methods, because generated :) ).

      In addition Protobuf does not support shared references, so one cannot simply pass cyclic and complex objects. It will also be interesting whether the ask() test profits; from the Kryo example one can see that only the "tell" benchmark improves.

      I'll add a disclaimer to the post, as I agree it's unfair to put an "Akka is slow" stamp on it instead of "serialization with Akka is slow".

      On another note: fast-serialization as used by Kontraktor outperforms Protobuf by a good margin (especially in high-load batching scenarios). However, the gap might close.

  4. You are discussing about of good techniques by this post. Remote messaging is best technique and this information is clear my doubts about it. I learn Java from Best Java Development Course Training Institute in Jaipur so its information is very important for me. Please keep updating more information about it.

  5. Great article, learned couple of things and clear couple of misconception. thanks..

  6. I am experimenting with Kontraktor to build a simulation system for testing a component. I'd be interested in understanding better how to use the different actor 'calling' mechanisms. Here is my problem. I have a plain java class that loops through 96 timeslots each representing 15 minutes. During each slot metadata determines the number of each of 2 types of messages that will be sent, either to a MQ queue or a file or both. These are 'metered across the 15 minutes, so sending 900 messages in a 15 minute slot would mean sending 1 per second.

    How I implemented this is a loop, in which I use first mesageType1.sendForTheseIds(ArrayList); and then mesageType2.sendForTheseIds(ArrayList); The message sending Actors meter the output of their messages across the 15 minutes. The loop also has a 15 minute sleep before continuing. Given this sleep I do not have issues with messages coming out of order so did no use onSerial.

    Is there a better way to implement this scenario, better using the actor system?

    1. I meant "did not use serialOn" in that 2nd to last sentence

    2. "serialOn" is a very special method used to prevent out-of-order execution of code performing async IO or networking calls. In general, of course, an actor is FIFO.
      Using sleep inside an actor thread is a no-go; there are better ways to implement cyclic tasks (e.g. self().delayed( 1000, () -> loopMethod() )).

      Can we continue this discussion on GitHub? I'd prefer to see your full source code, as I am not completely sure I understand what exactly you are doing :).

      Just enter an issue on the Kontraktor project.

  7. Seems to be very informative blog. Comment section clears the doubt. Thanks for uploading.
