Dipping Your Toe in Hadoop…

10 years ago

By now your CEO or CIO has heard of Hadoop, and you’ve likely been asked for your input on what the company’s Big Data strategy is going to be. Or perhaps you got pinned down in the hallway in a kind of CIO drive-by: “Hey, what are we doing with Hadoop? I need a deck on this for three-year budget planning by Friday.”

It’s a reasonable request. Companies striving for the ever-elusive competitive advantage in data-driven decision-making are combing through massive amounts of structured and unstructured information. To compete in the marketplace, they want the freshest data (despite ever-lengthening batch cycles), the deepest history (despite increasing costs for data retention), and the fastest response times (despite increasing competition for hardware performance between batch and interactive demands).

Fortunately, you’ve done your homework. You know what Hadoop is, and you’ve done some research or attended a few webinars. These seemingly outlandish expectations are exactly where Hadoop excels. And even if the company is not yet swimming in the deep end, you’re ready to dip your toe in the water. The key to making this first step successful, and to generating interest and commitment for a larger implementation, is to start by solving the IT problem, not the business problem! Specifically, use Hadoop to shoulder some of the batch cycle burden currently carried by the data warehouse itself. From there, use the freed-up capacity in the batch cycle to deliver on some of those business demands, either fresher data streams processed in faster cycle times or deeper data histories available for ad-hoc analysis.

From this launching pad of minor-but-increasing successes, the Hadoop cluster can serve as a centralised data archive for the entire organisation, based solely on its capacity to ingest data effectively. Next, you can take advantage of its flexible organisational capabilities by increasing the data footprint. As you move more of your existing data streams into the Hadoop cluster and refocus the data warehouse on preparing analytics, you can also begin to add new data sources and novel data types, such as unstructured data streams from social media, clickstreams, and PLC and sensor logs. The step after that is to reach into that rich data “soup” with existing data analysis and discovery tools and ladle out the information you’ve been sitting on all this time. At that point you’re leveraging the power of the technology itself and preparing for the final phase of maturity, in which you point the heavy weapons at your data: conducting extensive data discovery, employing sophisticated data science, and enjoying rapid analytics prototyping and refinement.

While integrating Hadoop into your company is certainly more than a hallway conversation with the CIO can reasonably encapsulate, it’s good to know that a pathway exists between where you are and where you can get to, even if you have to adapt a few of the steps along the way.

DataHub Writer: Douglas R. Briggs
Mr. Briggs has been active in the fields of Data Warehousing and Business Intelligence for the entirety of his 17-year career. He was responsible for the early adoption and promulgation of BI at one of the world’s largest consumer product companies and developed their initial BI competency centre. He has consulted with numerous other companies about effective BI practices. He holds a Master of Science degree in Computer Science from the University of Illinois at Urbana-Champaign and a Bachelor of Arts degree from Williams College (Mass.).
