This has been posted (by me ofc) on the Java Advent Calendar originally. Check it out for more interesting articles:
http://www.javaadvent.com/2013/12/big-data-reactive-way.html
A metatrend going on in the IT industry is the shift from query-based, batch-oriented systems to (soft) realtime updated systems. While this is often associated only with financial trading, there are many other examples, such as "just-in-time" logistics systems, airlines doing realtime pricing of passenger seats based on demand and load, C2C auction systems like eBay, realtime traffic control and many more.
It is likely this trend will continue, as the (commercial) value of information is time dependent: it decreases with the age of the information.
Automated trading in the finance sector is just a forerunner in this area, because a time advantage of a few microseconds can be worth millions of dollars. It is natural that realtime processing systems evolve faster in this domain.
However, large parts of traditional IT infrastructure are not designed for reactive, event-based systems. From query-based databases to the request-response-based HTTP protocol, the common paradigm is to store and query data "when needed".
Current Databases are static and query-oriented

Current approaches to data management such as SQL and NoSQL databases focus on data transactions and static queries of data. Databases provide convenience in slicing and dicing data, but they do not support keeping the results of complex queries up to date in real time. The up-and-coming NoSQL databases still focus on computing a static result.
Databases are clearly not "reactive".
Current Messaging Products provide poor query/filtering options

Current messaging products are weak at filtering. Messages are separated into different streams (or topics), so clients can do a coarse preselection of the data they receive. However, this frequently means a client application receives roughly ten times more data than needed and has to do fine-grained filtering "on top".
A big disadvantage is that the topic approach builds filter capabilities "into" the system's data design.
For example, if a stock exchange system splits streams on a per-stock basis, a client application still needs to subscribe to all streams in order to provide a dynamically updated list of the "most active" stocks. Querying usually means "replay + search the complete message history".
I had the enjoyment of doing the conceptual and technical design of a large-scale realtime system, so I'd like to share a generic, scalable solution for continuous query processing at high volume and large scale.

A scalable, "continuous query" distributed Datagrid
It is common for realtime processing systems to be designed "event sourced": persistence is replaced by journaling transactions. System state is kept in memory; the transaction journal is required only for historic analysis and crash recovery.
Client applications do not query, but listen to event streams instead. A common issue with event-sourced systems is the "late joining client" problem: a late client would have to replay the whole system event journal in order to get an up-to-date snapshot of the system state.
In order to support late-joining clients, a kind of "Last Value Cache" (LVC) component is required. The LVC holds the current system state and allows late joiners to bootstrap by querying it.
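To make the bootstrap concrete, here is a minimal Java sketch of how a late-joining client could combine an LVC snapshot with the live event stream; all names (lvcClient, Snapshot, eventStream, localMirror) are hypothetical and only for illustration:

    // 1. query a snapshot of the current state from the LVC
    Snapshot snap = lvcClient.querySnapshot("Orders");
    // 2. remember the journal position the snapshot corresponds to
    long seqNo = snap.lastAppliedSequence();
    // 3. build the local in-memory state from the snapshot
    snap.rows().forEach(localMirror::apply);
    // 4. subscribe to the event stream, replaying only events newer than
    //    the snapshot, so no update is lost or applied twice
    eventStream.subscribeFrom("Orders", seqNo + 1, localMirror::apply);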
In a high-performance, large-data system, the LVC component becomes a bottleneck as the number of clients rises.
Generalizing the Last Value Cache: Continuous Queries
In a continuous query data cache, a query result is kept up to date automatically. Queries are replaced by subscriptions.
    subscribe * from Orders where
        symbol in ['ALV', 'BMW'] and
        volume > 1000 and
        owner = 'MyCompany'

creates a message stream which initially performs a query operation and afterwards updates the result set whenever a data change affecting the query result happens (transparently to the client application). The system ensures each subscriber receives exactly the change notifications necessary to keep its "live" query result up to date.
The data access patterns differ from those of static data management in several ways:
- High write volume: realtime systems create a high volume of write accesses/changes to the data.
- Fewer full table scans: only late-joining clients or a change of a query's condition require a full data scan. Because continuous queries make "refreshing" a query result obsolete, the read/write ratio is roughly 1:1 (if one counts the change notification resulting from a transaction as a "read access").
- The majority of the load is generated when evaluating the conditions of active continuous subscriptions on each change of data. Consider a transaction load of 100,000 changes per second with 10,000 active continuous queries: this requires 100,000 * 10,000 = 1 billion evaluations of query conditions per second. And that is still an underestimate: when a record gets updated, it must be tested whether the record matched the query condition before the update and whether it matches after the update. A record's update may therefore result in an "add" notification (the record matches after the change) or a "remove" notification (the record no longer matches after the change) for a query subscription, as the sketch below illustrates.
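To make the last bullet concrete, here is a minimal Java sketch of the before/after evaluation; the Order record mirrors the Orders table of the subscription example above, while the Notification kinds and the evaluate method are assumptions for illustration:

    import java.util.function.Predicate;

    // Hypothetical types, modeled on the Orders example above.
    record Order(String symbol, int volume, String owner) {}
    enum Notification { ADD, REMOVE, UPDATE, NONE }

    final class ContinuousQueryEval {
        // A record change must be tested against the query condition twice:
        // once with the state before the change, once with the state after it.
        static Notification evaluate(Predicate<Order> query, Order before, Order after) {
            boolean matchedBefore = before != null && query.test(before);
            boolean matchesAfter  = after != null && query.test(after);
            if (!matchedBefore && matchesAfter) return Notification.ADD;    // enters the result set
            if (matchedBefore && !matchesAfter) return Notification.REMOVE; // leaves the result set
            if (matchedBefore && matchesAfter)  return Notification.UPDATE; // stays in it, but changed
            return Notification.NONE;                                       // irrelevant to this subscriber
        }
    }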
Data Cluster Nodes ("LastValueCache Nodes")
Data is organized in tables and stored column-oriented. Each table's data is evenly partitioned among all data grid nodes (= Last Value Cache nodes, "LVC nodes"). By adding data nodes to the cluster, capacity is increased, and snapshot queries (which initialize a subscription) are sped up through increased concurrency.
There are three basic transactions/messages processed by the data grid nodes:
- AddRow(table, newRow)
- RemoveRow(table, rowId)
- UpdateRow(table, rowId, diff)
The data grid nodes provide a lambda-alike (row iterator) interface that supports iterating a table's rows with plain Java code. This can be used to perform map-reduce jobs and, as a specialization, the initial query required by newly subscribing clients. Since the ongoing evaluation of continuous queries is done in the "Gateway" nodes, the load on the data nodes correlates only weakly with the number of clients.
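A minimal Java sketch of what such a data node interface might look like; the first three methods mirror the transactions listed above, while the type names and the interface itself are assumptions for illustration:

    import java.util.function.Consumer;

    // Placeholder types, details omitted.
    record Row(Object... columns) {}
    record RowId(long id) {}
    record Diff(String column, Object newValue) {}

    // Hypothetical sketch of a data grid node's interface.
    interface DataNode {
        // the three basic transactions from above
        void addRow(String table, Row newRow);
        void removeRow(String table, RowId rowId);
        void updateRow(String table, RowId rowId, Diff diff);

        // lambda-alike row iteration: runs plain (JIT-compiled) Java code
        // against this node's partition, e.g. for the initial snapshot query
        // of a new subscription or for map-reduce style jobs
        void forEachRow(String table, Consumer<Row> action);
    }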
All transactions processed by a data grid node are (re-)broadcast as multicast "Change Notification" messages.
Gateway Nodes
Gateway nodes track the subscriptions/connections of client applications. They listen to the global stream of change notifications and check whether a change influences the result of a continuous query (= a subscription). This is very CPU-intensive.
Two things make this work:
- Because queries are defined in plain Java, query conditions profit from JIT compilation; there is no need to parse and interpret a query language. HotSpot is one of the best optimizing JIT compilers on the planet (see the sketch below).
- Since multicast is used for the stream of global changes, additional Gateway nodes can be added with virtually no impact on the throughput of the cluster.
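As a sketch of the first point, the subscribe statement from above can be expressed as a plain Java predicate that HotSpot will JIT-compile; the gateway.subscribe call is hypothetical, and Order is the record from the evaluation sketch above:

    import java.util.function.Predicate;

    // The query condition as plain Java code, no query-language interpreter involved.
    Predicate<Order> condition = o ->
            (o.symbol().equals("ALV") || o.symbol().equals("BMW"))
            && o.volume() > 1000
            && o.owner().equals("MyCompany");

    // Hypothetical Gateway API: the condition is evaluated against every change
    // notification, and only relevant changes are forwarded to the subscriber.
    gateway.subscribe("Orders", condition, change -> client.send(change));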
Processor (or Mutator) Nodes
These nodes implement logic on top of the cluster data. E.g. a statistics processor runs a continuous query for each table, incrementally counts each table's rows, and writes the results back to a "statistics" table, so a monitoring client application can subscribe to realtime data on current table sizes. Another example would be a "matcher processor" in a stock exchange: it listens to the orders for a stock, and if two orders match, it removes them and adds a trade to the "trades" table.
If one sees the whole cluster as a kind of giant spreadsheet, the processors implement the formulas of this spreadsheet.
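As an illustration, the statistics processor could look roughly like the following; the subscription and write-back calls reuse the hypothetical APIs from the sketches above:

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical statistics processor: maintains a live row count for the
    // "Orders" table and writes it back to a "statistics" table, which
    // monitoring clients can in turn subscribe to.
    AtomicLong orderCount = new AtomicLong();
    gateway.subscribe("Orders", row -> true, change -> {
        switch (change.kind()) {               // kinds as in the evaluation sketch
            case ADD    -> orderCount.incrementAndGet();
            case REMOVE -> orderCount.decrementAndGet();
            default     -> { }                 // UPDATE leaves the count unchanged
        }
        // write the result back into the grid: the processor acts as one of
        // the "spreadsheet formulas" over the cluster data
        dataNode.updateRow("statistics", new RowId(1),
                           new Diff("rowCount", orderCount.get()));
    });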
Scaling Out
- With data size: increase the number of LVC nodes.
- With the number of clients: increase the number of subscription-processing (Gateway) nodes.
- With transactions per second (TP/S): scale up the processor nodes and LVC nodes.
Conclusion
Building realtime processing software backed by a continuous query system simplifies application development a lot.
- It is model-view-controller at large scale. Astonishingly, patterns used in GUI applications for decades have not regularly been extended to the backing data storage systems.
- Any server-side processing can be partitioned in a natural way. A processor node creates an in-memory mirror of its data partition using continuous queries; processing results are streamed back to the data grid. Computation-intensive jobs, e.g. the risk computation of derivatives, can be scaled by adding processor instances that subscribe to distinct partitions of the data ("sharding").
- The size of the code base shrinks significantly (both business logic and front end). A lot of code in handcrafted systems deals with keeping data up to date.
About me
I am a technical architect/senior developer consultant at a European company heavily involved in stock & derivative trading systems.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!