Real Time Analytics – What It’s Like to Hold the Holy Grail

Real-time analytics has moved from the realm of fantastic goal to deliverable reality. And while our internal and external customers — who have waited patiently (and sometimes not so patiently) for ultra-fast processing and decision-enabling analytics — are now starry-eyed at the prospect of processing data streams at the speed at which they enter the organisation, CIOs, BI and data warehouse directors, and data architects must manage those expectations very carefully.

In one sense, it’s as though we’ve finally got one of the Holy Grails of business intelligence within our grasp! We want it so badly, and we can’t afford to let it slip through our fingers.

On the one hand, we stand poised to deliver a quantum leap in business capability. Until recently, processing data for analytics as it arrived was possible only for the smallest and least complex data sets, if at all. But with advances in storage paradigms like Hadoop 2.0, cluster resource managers like YARN, processing engines like Spark, and hardware improvements that keep raising the volumes feasible for in-memory computing, larger and more complex data sets are suddenly accessible in real time. The IT stewards of these tools want them in the hands of business users, who can leverage them to make more effective and more timely decisions and drive the business forward.
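
To make that shift concrete, here is a minimal sketch of in-memory stream processing in Spark (Structured Streaming, PySpark). The Kafka topic "events" and the broker address are hypothetical placeholders; the point is that aggregates update continuously as data arrives rather than waiting for a batch load.

```python
# Minimal Spark Structured Streaming sketch; topic and broker are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("realtime-sketch").getOrCreate()

# Read the live stream; Kafka records expose key/value plus a timestamp.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Aggregate in memory as data arrives: counts per key per one-minute window.
counts = (events
          .select(F.col("key").cast("string").alias("key"), "timestamp")
          .groupBy(F.window("timestamp", "1 minute"), "key")
          .count())

# Emit running results continuously; a real deployment would write to a
# dashboard-facing sink rather than the console.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```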

On the other hand, increasing volumes carry corresponding costs, which often rise steeply as volumes grow. Data warehouse owners also understand that real-time processing of incoming data streams often rules out the data validation and cleansing that business users have come to expect from their non-real-time sources. It is a difficult conversation to explain that one data set is validated but not available in real time, while another is available in real time but not validated. As always, the business wants everything, perfect, and immediately!
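
One way to keep that conversation honest, sketched below with hypothetical field names, is to tag every record with its validation status, so reports can display live-but-unvalidated figures alongside validated batch figures rather than silently mixing the two.

```python
def tag_provenance(record, validated):
    # Hypothetical field names: surface the trade-off in the data itself
    # so consumers can see which figures have been through validation.
    record["validation_status"] = "validated" if validated else "unvalidated"
    record["as_of"] = "last-batch-load" if validated else "real-time"
    return record
```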

The savvy BI architect will use three criteria to manage how real-time analytics are pushed out to the business through the company's analytics platforms:

Discernment: Not all data needs to be available in real time. Careful integration of real-time data streams with static reference data and deep (even offline) analytics is the recipe for achieving big value that is also sustainable and extensible.
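
As a rough sketch of that integration, the snippet below joins a live stream against a static reference table; the warehouse path, topic name, and payload schema are hypothetical. Spark allows a streaming DataFrame to be joined directly to a static one, so the reference data can stay on its normal batch refresh cycle.

```python
# Sketch: enrich a real-time stream with batch-maintained reference data.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("discernment-sketch").getOrCreate()

# Slow-changing dimension, loaded once and refreshed on the batch cycle.
products = spark.read.parquet("/warehouse/dim_product")

# Fast-moving sales stream carrying a small JSON payload.
schema = StructType([
    StructField("product_id", StringType()),
    StructField("amount", DoubleType()),
])
sales = (spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "sales")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("s"))
         .select("s.*"))

# Stream-static join: only the stream is real-time; the reference data
# keeps its validated, batch-cadence lineage.
enriched = sales.join(products, "product_id", "left")
```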

Reliability: Data streams that dry up without warning or suffer intermittent interruptions are of little use if they're offline at the moment a salesperson is demonstrating a heatmap of Twitter sentiment for a brand's hashtag campaign. More subtly, the company's infrastructure must also be robust enough to accommodate the traffic and bandwidth demands of real-time analytics processing, whether inbound data volumes from providers or outbound connections to sales personnel in the field.
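
One defensive pattern, sketched below in plain Python with hypothetical fetch_feed and load_cached_snapshot stand-ins, is to retry a flaky feed with exponential backoff and fall back to the last good snapshot, so a dashboard degrades to clearly labelled stale data instead of going blank mid-demonstration.

```python
import time

def fetch_with_fallback(fetch_feed, load_cached_snapshot,
                        max_retries=5, base_delay=1.0):
    """Return (data, freshness); both callables are hypothetical stand-ins."""
    for attempt in range(max_retries):
        try:
            return fetch_feed(), "live"
        except ConnectionError:
            # Back off 1s, 2s, 4s, ... before retrying the feed.
            time.sleep(base_delay * 2 ** attempt)
    # The feed is down: serve stale-but-labelled data rather than nothing.
    return load_cached_snapshot(), "cached"
```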

Capability: The marriage of real-time and batch-cycle analytics requires careful preparation, with special attention paid to the integration points. A marketer watching public Instagram feeds during a concert doesn't need heavy business-rule processing; data forced through the entire suite of business rules might not come back in time to shift marketing choices mid-show for maximum impact. The key is to match the capability to the scenario and build accordingly.
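
A minimal way to express that matching, with the rule functions below as hypothetical placeholders, is to route each record through only the depth of rule processing its scenario actually requires.

```python
def cheap_sanity_checks(record):
    # Fast path: only what the live use case needs, e.g. is the payload usable?
    return record.get("text") is not None

def full_rule_suite(record):
    # Batch path: placeholder for the complete (and slow) business-rule
    # suite, e.g. cross-system validation, deduplication, enrichment checks.
    return cheap_sanity_checks(record)

def process(record, realtime=True):
    """Route a record through the depth of processing its scenario needs."""
    passes = cheap_sanity_checks(record) if realtime else full_rule_suite(record)
    return record if passes else None

# The concert-time marketer reads from the fast path; the overnight
# warehouse load replays the same records through the full suite.
```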

Real-time analytics promises to be one of the brightest horizons in this decade's BI and data warehousing developments — without question, it's this year's Holy Grail.


DataHub Writer: Douglas R. Briggs
Mr. Briggs has been active in the fields of Data Warehousing and Business Intelligence for the entirety of his 17-year career. He was responsible for the early adoption and promulgation of BI at one of the world's largest consumer product companies and developed its initial BI competency centre. He has consulted with numerous other companies about effective BI practices. He holds a Master of Science degree in Computer Science from the University of Illinois at Urbana-Champaign and a Bachelor of Arts degree from Williams College (Mass.).
