Friday, October 17, 2014

Follow up: Executors and Cache Locality Experiment

Thanks to Jean Philippe Bempel who challenged my results (for a reason), I discovered an issue in last post: Code-completion let me accidentally choose Executors.newSingleThreadScheduledExecutor() instead of Executors.newSingleThreadExecutor(), so the pinned-to-thread-actor results are actually even better than reported previously. The big picture has not changed that much, but its still worthwhile reporting.

On a second note: There are many other aspects to concurrent scheduling such as queue implementations etc.. Especially if there is no "beef" inside the message processing, these differences become more dominant compared to cache misses, but this is another problem that has been covered extensively by other people in depth (e.g. Nitsan Wakart).

Focus of this experiment is locality/cache misses, keep in mind different queueing implementations of executors for sure add dirt/bias.

As requested, I add results from the linux "perf" tool to prove there are significant differences in cache misses caused by random assignment of Thread - to Actor as done by ThreadPoolExecutor and WorkStealingExecutor.

Check out my recent post for a description of the test case.

Results with adjusted SingleThreadExecutor (XEON 2 socket, each 6 cores, no HT)

As in previous post, "dedicated" actor-pinned-to-thread performs best. For very small local state, there are only few cache misses so differences are small, but widen once a bigger chunk of memory is accessed by each actor. Note that ThreadPool is hampered by its internal scheduling/queuing mechanics, regardless of locality, it performs weak.

When increasing number of Actors to 8000 (so 1000 actors per thread), "Workstealing" and "Dedicated" perform similar. Reason: executing 8000 actors round robin creates cache misses for both executors. Note that in a real world server its likely that there are active and inactive actors, so I'd expect "Dedicated" to perform slightly better than in this synthetic test.

"perf stat -e" and "perf stat -cs" results

(only 2000, 4000, 8000 local size tests where run)

333,669,424 cache-misses                                                
19.996366007 seconds time elapsed
185,440 context-switches                                            
20.230098005 seconds time elapsed
=> 9,300 context switches per second

2,524,777,488 cache-misses                                                
39.610565607 seconds time elapsed
381,385 context-switches                                            
39.831169694 seconds time elapsed
=> 9,500 context switches per second

3,213,889,492 cache-misses                                                
92.141264115 seconds time elapsed
25,387,972 context-switches                                            
87.547306379 seconds time elapsed
=>290,000 context switches per second

A quick test with a more realistic test method

In order to get a more realistic impression I replaced the synthetic int-iteration by some dirty "real world" dummy stuff (do some allocation and HashMap put/get). Instead of increasing the size of the "localstate" int array, I increase the HashMap size  (should also have negative impact on locality).

Note that this is rather short processing, so queue implementations and executor internal implementation might dominate locality here. This test is run on Opteron 8c16t * 2Sockets, a processor with 8kb L1 cache size only. (BTW: impl is extra dirty, so no performance optimization comments pls, thx)

As ThreadPoolExecutor is abnormous bad in this Test/Processor combination, plain numbers:

64 HMap entries256 HMapentries2000 HMapentries4000 HMapentries32k HMapentries320k HMapentries

Conclusions basically stay same as in original post. Remember cache misses are only one factor of overall runtime performance, so there are workloads where results might look different. Quality/specialization of queue implementation will have huge impact in case processing consists of only some lines of code.

Finally, my result: 
Pinning actors to threads created lowest cache miss rates in any case tested.


  1. Hi Rudiger
    Small typo at the beginning
    "Code-completion let me accidentally choose Executors.newSingleThreadScheduledExecutor() instead of Executors.newSingleThreadScheduledExecutor()"
    Again you selected the Scheduled one :)
    Thanks for sharing

  2. Argh .. poor me .. I am considering to fork openjdk and delete this method :-) Thx

  3. Existing without the answers to the difficulties you’ve sorted out through this guide is a critical case, as well as the kind which could have badly affected my entire career if I had not discovered your website
    Digital Marketing Training in rajajinagar

  4. Your good knowledge and kindness in playing with all the pieces were very useful. I don’t know what I would have done if I had not encountered such a step like this
    Digital Marketing Training in rajajinagar

  5. This blog is the general information for the feature. You got a good work for these blog.We have a developing our creative content of this mind.Thank you for this blog. This for very interesting and useful.
    Click here:
    python training in Bangalore
    Click here:
    python training in Bangalore

  6. I read this post two times, I like it so much, please try to keep posting & Let me introduce other material that may be good for our community.
    Blue Prism Training in Pune

    Blueprism training in tambaram

    Blueprism training in annanagar

  7. I read this post two times, I like it so much, please try to keep posting & Let me introduce other material that may be good for our community.
    java online training | java training in pune

    java training in chennai | java training in bangalore

  8. Devops is not a Tool.Devops Is a Practice, Methodology, Culture or process used in an Organization or Company for fast collaboration, integration and communication between Development and Operational Teams. In order to increase, automate the speed of productivity and delivery with reliability.

    python training in bangalore
    aws training in bangalore
    artificial intelligence training in bangalore
    data science training in bangalore
    machine learning training in bangalore
    hadoop training in bangalore
    devops training in bangalore

  9. Gaining Python certifications will validate your skills and advance your career.
    python certification

  10. This is most informative and also this post most user friendly and super navigation to all posts... Thank you so much for giving this information to me.. 
    python training in pune | python training institute in chennai | python training in Bangalore

  11. It would have been the happiest moment for you,I mean if we have been waiting for something to happen and when it happens we forgot all hardwork and wait for getting that happened.

    Microsoft Azure online training
    Selenium online training
    Java online training
    Java Script online training
    Share Point online training

  12. Have you been thinking about the power sources and the tiles whom use blocks I wanted to thank you for this great read!! I definitely enjoyed every little bit of it and I have you bookmarked to check out the new stuff you post
    devops online training

    aws online training

    data science with python online training

    data science online training

    rpa online training

  13. Excellent blog I visit this blog it's really awesome. The important thing is that in this blog content written clearly and understandable. The content of information is very informative.
    Oracle Fusion HCM Online Training
    Oracle Fusion SCM Online Training
    Oracle Fusion Financials Online Training
    oracle fusion financials classroom training
    Workday HCM Online Training
    Oracle Fusion HCM Classroom Training

  14. A befuddling web diary I visit this blog, it's incredibly grand. Strangely, in this present blog's substance made motivation behind fact and sensible. The substance of information is instructive
    Oracle Fusion Financials Online Training
    Oracle Fusion HCM Online Training
    Oracle Fusion SCM Online Training

  15. Thank you for providing such an awesome article and it is a very useful blog for others to read.

    Oracle ICS Online Training

  16. Thanks for providing a useful article containing valuable information. start learning the best online software courses.

    Workday Online Training