Hitachi Vantara has begun shipping Pentaho 8.0, a new release of the company's business analytics software with advanced connectivity to streaming data sources for real-time business analytics.
As past practices of analyzing historical data give way to analyzing data in real time or near-real-time, businesses are increasingly demanding that their business intelligence tools be capable of handling streaming data from Internet of Things networks, social applications and cloud systems.
Market researcher IDC forecasts that the volume of generated data will increase tenfold by 2025. More importantly, 25 percent of that data will be real time, with 95 percent of the real-time data streaming from IoT systems.
"A lot of businesses have come to the conclusion that data is relevant. But they have also discovered that they don't know how to integrate data, analyze it and use it to gain insights," said Dennis Wilbrink, a data and analytics consultant at Incentro, a Netherlands-based solution provider and longtime Pentaho partner.
Anticipating the growing demand for ways to derive value from all that data, Hitachi acquired business analytics software developer Pentaho in 2015.
The company recently combined the Pentaho business with its Hitachi Data Systems and Hitachi Insight Group operations within Hitachi Vantara, a wholly owned subsidiary.
"Pentaho 8.0 is really all about improving connectivity to these real-time data streams," said Arik Pelkey, senior director of Pentaho product marketing, in an interview with CRN.
Pentaho 8.0 offers improved connectivity to streaming data sources, most notably the Apache Kafka publish/subscribe messaging system that handles large data volumes in a growing number of IT organizations. The 8.0 edition specifically enables real-time processing with specialized steps that connect Pentaho Data Integration to Kafka.
The new release also fully enables stream data ingestion and processing using either the software's native engine or the Spark in-memory processing engine.
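For readers unfamiliar with the publish/subscribe model behind systems like Kafka, the idea can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the `Topic` class and its `publish` and `poll` methods are hypothetical names invented for this sketch, not the Kafka client API and not Pentaho's Kafka steps. It assumes the simplest possible design, in which producers append records to a log and each consumer keeps its own read position.

```python
from collections import defaultdict
from typing import Any, Dict, List


class Topic:
    """Toy append-only log with per-consumer offsets (hypothetical, not Kafka's API)."""

    def __init__(self) -> None:
        self._log: List[Any] = []                       # records in arrival order
        self._offsets: Dict[str, int] = defaultdict(int)  # next index to read, per consumer

    def publish(self, record: Any) -> None:
        """Producer side: append a record to the end of the log."""
        self._log.append(record)

    def poll(self, consumer: str) -> List[Any]:
        """Consumer side: return records this consumer has not yet seen."""
        start = self._offsets[consumer]
        records = self._log[start:]
        self._offsets[consumer] = len(self._log)  # advance this consumer's position
        return records


# Usage: two independent consumers read the same stream at their own pace.
readings = Topic()
readings.publish({"sensor": "pump-1", "temp_c": 71})
readings.publish({"sensor": "pump-2", "temp_c": 64})

first_batch = readings.poll("dashboard")   # both records
readings.publish({"sensor": "pump-1", "temp_c": 73})
second_batch = readings.poll("dashboard")  # only the new record
archive_batch = readings.poll("archive")   # all three records
```

Because every consumer tracks its own offset, a slow consumer does not hold up a fast one, which is one reason this model suits high-volume streams.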
Those new capabilities will help meet the demands that Incentro is seeing from customers for the ability to handle streaming data from IoT systems, automotive systems and financial service systems, said Wilbrink in an interview with CRN.
"The notion of streaming data is becoming increasingly important to critical business operations," Incentro's Wilbrink said, pointing to use cases in financial services and telecommunications.
Pentaho 7.1 offered some real-time analytical capabilities, Wilbrink said, but "with the release of 8.0, they have done a lot of development on the streaming data side."
The new Pentaho release builds on its enterprise-level security for Cloudera and Hortonworks platforms by supporting the Knox Gateway for authenticating users for Hadoop services.
The 8.0 edition also provides a number of new features and functions to help IT optimize data processing resources.
Adaptive execution, which matches workloads to the most appropriate processing engine without rewriting data integration logic, was introduced in Pentaho 7.1. The 8.0 release makes it easier to set up, use and secure adaptive execution. It also makes adaptive execution available for the Hortonworks platform.
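The principle behind adaptive execution, writing the integration logic once and choosing the engine at run time, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the two stand-in "engines" and the row-count threshold are hypothetical and do not reflect how Pentaho actually selects an engine.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Transform = Callable[[Iterable[Row]], List[Row]]


def clean_readings(rows: Iterable[Row]) -> List[Row]:
    """Integration logic, written once: drop empty readings, convert C to F."""
    return [
        {"sensor": r["sensor"], "temp_f": r["temp_c"] * 9 / 5 + 32}
        for r in rows
        if r.get("temp_c") is not None
    ]


def run_local(transform: Transform, rows: List[Row]) -> List[Row]:
    """Stand-in for a native, single-process engine."""
    return transform(rows)


def run_partitioned(transform: Transform, rows: List[Row], partitions: int = 4) -> List[Row]:
    """Stand-in for a cluster engine: split the rows, apply the same logic per chunk."""
    out: List[Row] = []
    for i in range(partitions):
        out.extend(transform(rows[i::partitions]))
    return out


def choose_engine(row_count: int, threshold: int = 1000) -> str:
    """Hypothetical rule: small workloads run locally, large ones are partitioned."""
    return "partitioned" if row_count >= threshold else "local"


def execute(transform: Transform, rows: Iterable[Row], threshold: int = 1000) -> Tuple[str, List[Row]]:
    """Pick an engine for this workload and run the unchanged transform on it."""
    materialized = list(rows)
    engine = choose_engine(len(materialized), threshold)
    runner = run_partitioned if engine == "partitioned" else run_local
    return engine, runner(transform, materialized)


small = [{"sensor": "a", "temp_c": 100}, {"sensor": "b", "temp_c": None}]
engine_small, result_small = execute(clean_readings, small)

large = [{"sensor": str(i), "temp_c": 0} for i in range(1000)]
engine_large, result_large = execute(clean_readings, large)
```

The point of the design is that `clean_readings` never changes; only the runner behind it does, so moving a workload to a different engine requires no rewrite of the transformation itself.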
The 8.0 release allows IT managers to utilize additional compute nodes and spread workloads across all available computation resources to match demand. It also supports the popular Avro and Parquet big data file formats.
Wilbrink said the enhancements move data processing closer to where the data resides, significantly speeding up processing times.
The Incentro data and analytics consultant said he has been evaluating the community edition of Pentaho 8.0, which came out earlier, and has been waiting for the general availability of the commercial release.
Another focus of the 8.0 release is speeding up the development and implementation times for business analysis projects.
"There's a shortage of big data developers. So you want to get the most out of those resources," Pelkey said, noting that data preparation consumes between 60 percent and 80 percent of the time spent on business analysis projects.