StreamSets, the startup that develops software for managing the performance of data pipelines, has extended the capabilities of its product portfolio with a new product for managing the flow of data from "edge" devices, and a new cloud service that helps businesses gain more control over multiple data pipelines.
The additions extend the range of the company's StreamSets Data Operations Platform portfolio for managing the performance of data flows to and from big data applications.
"A good metaphor for us is air traffic control for data," said Rick Bilodeau, StreamSets marketing vice president, in an interview with CRN. "We think of that as our role in the data world."
Managing the lifecycle of "data in motion" from operational applications, Internet of Things networks, industrial sensors, real-time analytical applications and cybersecurity endpoint systems is a challenge for businesses. While custom-coded software has been the traditional remedy, Bilodeau said such systems are difficult to maintain and update.
StreamSets, founded in 2014 and based in San Francisco, debuted its StreamSets Data Collector software in 2015 and StreamSets Dataflow Performance Manager in 2016.
More businesses are building pipelines to fuel their next-generation analytics projects and make business intelligence more pervasive throughout their organizations, said Clarke Patterson, StreamSets' product marketing executive.
Such analytics projects are shifting from batch mode to streaming data analysis, performing more lightweight analysis on edge devices rather than just collecting data, and mixing different analytical workloads – including "hardcore machine learning and artificial intelligence," Patterson said.
"We see a lot of opportunity moving forward," Patterson said.
The new StreamSets Data Collector Edge (SDC Edge) is an open-source, ultralight version of the StreamSets Data Collector software that provides data collection and ingestion from edge systems such as Internet of Things and cybersecurity devices.
The software takes up only 5MB, making it ideal for resource- and connectivity-constrained systems, according to the company. Today, data ingestion logic for such applications is often custom-coded for specific devices. Some packaged software, such as Splunk and General Electric's Predix, has its own data ingestion capabilities.
Based on the Go programming language, SDC Edge supports a range of operating systems including Linux and Android. It performs computations such as data normalization, redaction and aggregation, and is architected to support full-featured edge analytics including machine-learning and deep-learning models.
"Portability is key here," Patterson said. The company sees industrial IoT and cybersecurity as the two most common use cases for the SDC Edge software.
Last week StreamSets debuted StreamSets Control Hub, a higher-level management tool for streamlining the development, deployment and operational management of many-to-many data flows that span edge systems, on-premises data centers and multiple cloud platforms.
A component of the StreamSets Enterprise Edition, the new StreamSets Control Hub includes a cloud-based design tool and shared pipeline repository, automated deployment and provisioning, and data governance support through integration with enterprise catalogs like Cloudera Navigator and Apache Atlas.
StreamSets relies on the channel as part of its go-to-market strategy, partnering with resellers, systems integrators and OEMs – the latter including industrial equipment manufacturers. Solution provider partners often have practices in specific areas such as IoT and cybersecurity, Patterson said.
StreamSets was named a Gartner "Cool Vendor" in May and the company raised $20 million in Series B financing the same month.