Wednesday, July 31, 2013

Impact of large primitive arrays (BLOBS) on Garbage Collection

While OldGen GC duration ("FullGC") depends on the number of objects and the locality of their references to each other, young generation GC duration depends on the size of OldSpace memory. Alexej Ragozin has run extensive tests that give a good impression of young GC duration vs. heap size.

There are several workarounds/techniques to reduce the impact of long-lived data on OldGen GC:

  1. Put large parts of mostly static data off-heap using ByteBuffer.allocateDirect() or Unsafe.allocateMemory(). This memory is then used to store data (e.g. using fast serialization such as http://code.google.com/p/fast-serialization/ [oops, I did it again] or specialized solutions like http://code.google.com/p/vanilla-java/wiki/HugeCollections ).
    The downside is that one frequently has to implement manual memory management on top.
  2. "Instance saving" on heap by serializing into byte arrays or by transforming data structures. This usually involves open-addressed hashmaps without "Entry" objects and large primitive arrays instead of many small objects, like
    // one primitive array per field instead of one object per record
    class ReferenceDataArray {
        int x[];
        double y[];
        long z[];
        public ReferenceDataArray(int size) {
             x = new int[size];
             y = new double[size];
             z = new long[size];
        }
        public int getX(int index) { return x[index]; }
        public double getY(int index) { return y[index]; }
        public long getZ(int index) { return z[index]; }
    }
    as well as replacing generic collections of <Integer>, <Long> by specialized implementations holding primitive int, long, ... directly.
    Whether it is worth crippling your code this way is questionable, but the option exists.
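Technique (1) can be illustrated with a minimal sketch, assuming fixed-size records of one int, one double and one long stored at computed offsets in a single direct buffer. The class name OffHeapRecords and the record layout are hypothetical; this is not the API of fast-serialization or HugeCollections.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapRecords {
    // hypothetical record layout: int x (4 bytes) | double y (8 bytes) | long z (8 bytes)
    private static final int RECORD_SIZE = 4 + 8 + 8;
    private final ByteBuffer buf;

    public OffHeapRecords(int capacity) {
        if (capacity > Integer.MAX_VALUE / RECORD_SIZE) {
            throw new IllegalArgumentException("a single ByteBuffer is limited to 2GB");
        }
        // allocateDirect places the storage outside the Java heap: the GC only sees
        // the small ByteBuffer object, not the records inside it
        buf = ByteBuffer.allocateDirect(capacity * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    public void set(int index, int x, double y, long z) {
        int base = index * RECORD_SIZE;
        buf.putInt(base, x);         // absolute puts: no position bookkeeping needed
        buf.putDouble(base + 4, y);
        buf.putLong(base + 12, z);
    }

    public int getX(int index) { return buf.getInt(index * RECORD_SIZE); }
    public double getY(int index) { return buf.getDouble(index * RECORD_SIZE + 4); }
    public long getZ(int index) { return buf.getLong(index * RECORD_SIZE + 12); }

    public static void main(String[] args) {
        OffHeapRecords recs = new OffHeapRecords(1000);
        recs.set(42, 7, 3.5, 123456789L);
        System.out.println(recs.getX(42) + " " + recs.getY(42) + " " + recs.getZ(42));
    }
}
```

The native memory is only released when the ByteBuffer object itself is collected, and reusing freed slots is left to the application, which is the manual memory management mentioned above.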
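An "open addressed hashmap without Entry objects" from (2) might look roughly like this sketch, assuming long keys and values, linear probing, and a reserved key (0) marking empty slots. LongLongMap is a hypothetical name and removal is omitted. Compared to HashMap<Long, Long>, which allocates an entry object plus boxed keys and values per mapping, the whole map consists of two primitive arrays.

```java
public class LongLongMap {
    private static final long EMPTY = 0L; // sketch assumption: key 0 is reserved for "free slot"
    private long[] keys;   // keys and values live in two parallel primitive arrays,
    private long[] values; // so there is no per-entry object for the GC to trace
    private int size;

    public LongLongMap(int initialCapacity) {
        // round up to a power of two so that "& (length - 1)" can replace "%"
        int cap = Integer.highestOneBit(Math.max(4, initialCapacity) - 1) << 1;
        keys = new long[cap];
        values = new long[cap];
    }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative hashing spreads sequential keys
        return (int) (h >>> 32) & (keys.length - 1);
    }

    public void put(long key, long value) {
        if (key == EMPTY) throw new IllegalArgumentException("key 0 is reserved in this sketch");
        if ((size + 1) * 2 > keys.length) grow(); // keep the load factor at or below 0.5
        int i = slot(key);
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & (keys.length - 1); // linear probing: try the next slot
        }
        if (keys[i] == EMPTY) size++;
        keys[i] = key;
        values[i] = value;
    }

    public long get(long key, long defaultValue) {
        int i = slot(key);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & (keys.length - 1);
        }
        return defaultValue;
    }

    public int size() { return size; }

    private void grow() {
        long[] oldKeys = keys, oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new long[oldValues.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != EMPTY) put(oldKeys[i], oldValues[i]); // re-insert into the larger table
        }
    }

    public static void main(String[] args) {
        LongLongMap map = new LongLongMap(16);
        for (long k = 1; k <= 1000; k++) map.put(k, k * k);
        System.out.println(map.get(30, -1) + " " + map.get(5000, -1) + " " + map.size());
    }
}
```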

Going the route outlined in (2) improves the efficiency of OldGen GC a lot. FullGC duration can stay in the range of 2s even with heap sizes around 8 GB. CMS performs significantly better, as it can scan OldSpace faster and therefore needs less headroom to avoid a Full GC.

However, YoungGen GC duration still scales with OldSpace size.

The scaling effect is usually attributed to "card marking". The VM has to remember which areas of OldSpace have been modified in a way that makes them reference objects in YoungGen, because young GC must treat those references as roots. This is done with a kind of bit field in which each bit (or byte) records the state (modified, reference created, or similar) of a chunk ("card") of OldSpace.
Primitive arrays are basically BLOBs for the VM: they cannot contain references to other Java objects, so in theory there is no need to scan or card-mark areas containing BLOBs during GC. One could, for example, allocate large primitive arrays from the top of OldSpace and all other objects from the bottom, thereby reducing the number of cards to scan.
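To make card marking more concrete, here is a toy model (not HotSpot code): OldSpace is treated as a flat address range, every reference store into it dirties the byte of its 512-byte card (HotSpot's card size), and a young collection only looks for old-to-young pointers inside dirty cards. CardTableModel and its methods are hypothetical names.

```java
public class CardTableModel {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes per card
    static final byte CLEAN = 0, DIRTY = 1;
    final byte[] cards;                // one byte per card of old space

    CardTableModel(long oldSpaceBytes) {
        cards = new byte[(int) ((oldSpaceBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    // conceptual write barrier: runs on every reference store at offset 'addr' into old space
    void onReferenceStore(long addr) {
        cards[(int) (addr >>> CARD_SHIFT)] = DIRTY;
    }

    // young GC: find dirty cards, (a real collector would scan their objects here), then clear them
    int scanDirtyCards() {
        int dirty = 0;
        for (int i = 0; i < cards.length; i++) { // note: visits every card, clean or not
            if (cards[i] == DIRTY) {
                dirty++;
                cards[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableModel ct = new CardTableModel(1L << 30); // 1 GB of old space
        ct.onReferenceStore(0);
        ct.onReferenceStore(100);   // same card as offset 0
        ct.onReferenceStore(4096);
        System.out.println(ct.cards.length + " cards, " + ct.scanDirtyCards() + " dirty");
    }
}
```

In this model a byte[] never triggers the write barrier, so its cards stay clean, yet the scan loop still iterates over every card entry; the cost per young GC therefore still grows with OldSpace size even when almost nothing is dirty.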

Theory: blobs (primitive arrays) result in shorter young GC pauses than an equal amount of heap allocated in small objects.

Therefore I did a small test measuring the effect of allocating large primitive arrays (such as byte[], int[], long[], double[]) on NewGen GC duration.

The Test

import java.awt.Rectangle;
import java.util.ArrayList;

public class BlobTest {
    static ArrayList<byte[]> blobs = new ArrayList<>();
    static Object randomStuff[] = new Object[300000];

    public static void main( String arg[] ) {
        final long GB = 1024L * 1024 * 1024;
        if ( Runtime.getRuntime().maxMemory() > 2 * GB ) { // 'autodetect' available blob space from -Xmx
            int blobGB = (int) (Runtime.getRuntime().maxMemory() / GB);
            System.out.println("Allocating " + blobGB * 32 + " 32MB blobs ... (=" + blobGB + "GB) ");
            for (int i = 0; i < blobGB * 32; i++) {
                blobs.add(new byte[32 * 1024 * 1024]);
            }
            System.gc(); // force the VM to adapt ..
        }
        // create eden-collected temporaries with a medium promotion rate
        // (the promotion rate can be adjusted by the size of randomStuff[])
        while ( true ) {
            randomStuff[(int) (Math.random() * randomStuff.length)] = new Rectangle();
        }
    }
}


The while loop at the bottom simulates the allocating application. Because it overwrites randomly chosen indices of the randomStuff array, most Rectangle instances become garbage quickly, as their index is soon overwritten with another instance. Due to the randomness, however, some indices are not hit for a while, so their objects live longer and get promoted. The larger the array, the less likely an index is overwritten soon, and the higher the promotion rate to OldSpace.
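A back-of-the-envelope model of this promotion rate, assuming each loop iteration overwrites one uniformly chosen slot and ignoring survivor-space overflow: with N = randomStuff.length, the chance that a given slot is not hit during the next k iterations is

```latex
P_{\text{survive}}(k) = \left(1 - \frac{1}{N}\right)^{k} \approx e^{-k/N},
\qquad
P_{\text{promoted}} \approx e^{-tE/N}
```

Here E (a symbol introduced for this sketch) is the number of Rectangle allocations that fit into eden between two young GCs, and t is roughly the tenuring threshold (2 in the command lines used here), so an object must outlive about tE iterations to be promoted. Since the exponent shrinks as N grows, a larger randomStuff[] raises the promoted fraction.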

To avoid bias from the VM's auto-adjustment, I pin the NewGen size, so the only variation is the allocation of large byte[] arrays on top of the allocation loop. (Note that these settings are designed to encourage promotion; they are in no way optimal.)


Command line:

java -Xms1g -Xmx1g -verbose:gc -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=12 -XX:NewSize=100m -XX:MaxNewSize=100m -XX:MaxTenuringThreshold=2 BlobTest
By adding more GB, the upper part of the test uses any heap above 1 GB to allocate byte[] arrays.

java -Xms3g -Xmx3g -verbose:gc -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=12 -XX:NewSize=100m -XX:MaxNewSize=100m -XX:MaxTenuringThreshold=2 BlobTest
...
...
java -Xms11g -Xmx11g -verbose:gc -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=12 -XX:NewSize=100m -XX:MaxNewSize=100m -XX:MaxTenuringThreshold=2 BlobTest

The test uses byte[] arrays; I verified that int[] and long[] behave exactly the same (the array length must then be divided by the element size to allocate the same amount of memory).


Results

(jdk 1.7_u21)






The 'Objects' test was done by replacing the static byte[] allocation loop in the benchmark with

            // requires an additional field: static ArrayList<Object[]> nonblobs = new ArrayList<>();
            for ( int i = 0; i < blobGB * 2700000; i++ ) {
                nonblobs.add(new Object[] {
                         new Rectangle(), new Rectangle(), new Rectangle(),
                         new Rectangle(), new Rectangle(), new Rectangle(),
                         new Rectangle(), new Rectangle(), new Rectangle(),
                         new Rectangle() });
            }


Conclusion

Flattening data structures using on-heap allocated primitive arrays ('BLOBS') reduces OldGen GC overhead very effectively.
Young gen pauses are slightly reduced with CMS, so the scaling with OldGen size is damped but not gone. For the default GC (PSYoung), minor pauses are actually slightly longer when the heap is filled with BLOBs.
I am not sure whether the observed variance in young gen duration has anything to do with "card marking"; however, I am satisfied with having quantified the effects of different allocation types and sizes :-)

Further Improvement incoming ..


With an ingenious little optimization coming up in JDK 7u40, scanning of unmarked cards speeds up by a factor of 8.

Additionally, note that for this test -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=512 gave the best results.
At least CMS young gen pause scaling is not too bad.

And G1?

G1 fails to run the test. Even if only 6GB of byte[] are allocated with an 11GB heap, it is still much more disruptive than CMS. It works if I use small byte[] chunks of 1MB and set the G1 region ("page") size to 32MB. Even then, pauses are longer than with CMS. G1 seems to have problems with large arrays, which will be problematic for IO-intensive applications that require many big byte buffers.
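A rough way to see why chunk size and region size matter for G1, assuming that G1 allocates any object larger than half a region as a "humongous" object occupying a contiguous run of whole regions, and assuming a 16-byte array header (64-bit VM with compressed oops). HumongousCheck and its helpers are hypothetical, and 4MB/32MB are just example region sizes.

```java
public class HumongousCheck {
    static final long MB = 1024L * 1024;
    static final long ARRAY_HEADER = 16; // assumption: 64-bit HotSpot with compressed oops

    // heap footprint of a byte[] of the given length, rounded up to 8-byte object alignment
    static long byteArraySize(long length) {
        return (ARRAY_HEADER + length + 7) & ~7L;
    }

    // assumption: objects larger than half a region are allocated as humongous
    // (whether the boundary is > or >= does not matter for the sizes below)
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // a humongous object occupies a contiguous run of whole regions
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    static void report(String label, long arrayLength, long regionBytes) {
        long size = byteArraySize(arrayLength);
        if (isHumongous(size, regionBytes)) {
            long regions = regionsNeeded(size, regionBytes);
            System.out.printf("%s: humongous, %d region(s), %d MB reserved%n",
                    label, regions, regions * regionBytes / MB);
        } else {
            System.out.printf("%s: regular allocation%n", label);
        }
    }

    public static void main(String[] args) {
        report("32MB byte[], 4MB regions", 32 * MB, 4 * MB);
        report("32MB byte[], 32MB regions", 32 * MB, 32 * MB);
        report("1MB byte[], 32MB regions", 1 * MB, 32 * MB);
    }
}
```

Under these assumptions a 32MB blob never fits exactly into whole regions because of its header, so the model counts one extra, almost empty region per blob, while 1MB chunks with 32MB regions stay below the humongous threshold altogether.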
