This is the most popular use case, according to Gallivan. Online retailers want to find out what shoppers are doing on their sites: which pages they visit, where they linger, how long they stay, and when they leave.
"That's all unstructured clickstream data," said Gallivan. "Pentaho takes that and blends it with transaction data, which is very structured data that sits in our customers' ERP [enterprise resource planning] system that says what the customers actually bought."
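To make "blending" a little more concrete, here is a minimal Python sketch. It is an illustration, not Pentaho's implementation: it joins loosely structured clickstream events to structured purchase records on a shared customer ID. The field names (`customer_id`, `seconds`, `amount`) and the sample data are hypothetical, and a production pipeline would run in an ETL tool or on a cluster rather than in memory.

```python
from collections import defaultdict

# Hypothetical clickstream events: one loosely structured dict per page view.
clicks = [
    {"customer_id": "c1", "page": "/shoes", "seconds": 40},
    {"customer_id": "c1", "page": "/shoes/runner-x", "seconds": 95},
    {"customer_id": "c2", "page": "/bags", "seconds": 12},
]

# Hypothetical rows exported from an ERP system: structured purchase records.
orders = [
    {"customer_id": "c1", "sku": "runner-x", "amount": 120.0},
]


def blend(clicks, orders):
    """Join browsing behavior with purchases on a shared customer_id key."""
    pages = defaultdict(int)  # page views per customer
    dwell = defaultdict(int)  # total seconds on site per customer
    for event in clicks:
        pages[event["customer_id"]] += 1
        dwell[event["customer_id"]] += event["seconds"]

    spend = defaultdict(float)  # total purchase amount per customer
    for order in orders:
        spend[order["customer_id"]] += order["amount"]

    # Full outer join: keep customers who browsed, bought, or both.
    customers = set(pages) | set(spend)
    return {
        cid: {
            "pages_viewed": pages.get(cid, 0),
            "seconds_on_site": dwell.get(cid, 0),
            "total_spend": spend.get(cid, 0.0),
        }
        for cid in sorted(customers)
    }


result = blend(clicks, orders)
```

In this toy data, customer `c2` browsed but bought nothing, which is exactly the kind of signal a retailer cannot see from transaction data alone.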
Internet of Things
The second most popular use case involves IoT-connected devices managed by hardware, sensor, and information security companies. "These devices are sitting in their customers' environment, and they phone home with information about the use, health, or security of the device," said Gallivan.
Storage manufacturer NetApp, for instance, uses Pentaho software to collect and organize "tens of millions of messages a week" that arrive from NetApp devices deployed at its customers' sites. This unstructured machine data is then structured, loaded into Hadoop, and pulled out for analysis by NetApp.
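The step of turning unstructured machine data into structured records can be sketched in Python as follows. The article does not describe NetApp's actual message format, so this assumes hypothetical `key=value` text messages with `device`, `ts`, `metric`, and `value` fields; malformed messages are counted and set aside rather than loaded. Loading the resulting records into Hadoop is out of scope here.

```python
import re
from datetime import datetime, timezone

# Hypothetical message format: space-separated key=value pairs.
PAIR = re.compile(r"(\w+)=(\S+)")


def structure(message):
    """Turn one 'key=value ...' text message into a typed record, or None."""
    fields = dict(PAIR.findall(message))
    try:
        return {
            "device_id": fields["device"],
            # Assumes 'ts' is a Unix timestamp in seconds.
            "timestamp": datetime.fromtimestamp(int(fields["ts"]), tz=timezone.utc),
            "metric": fields["metric"],
            "value": float(fields["value"]),
        }
    except (KeyError, ValueError, OverflowError, OSError):
        # Missing fields or unparseable values: treat as malformed.
        return None


def structure_batch(messages):
    """Structure a batch, returning (good records, count of rejected messages)."""
    records, rejected = [], 0
    for message in messages:
        record = structure(message)
        if record is None:
            rejected += 1
        else:
            records.append(record)
    return records, rejected
```

Keeping a count of rejected messages, rather than silently dropping them, makes it easier to notice when a device starts sending data in an unexpected format.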
Data warehouse optimization
This is an "IT-efficiency play," Gallivan said. A large company, hoping to boost the efficiency of its enterprise data warehouse, will look for unstructured or "active" archive data that might be stored more cost effectively on a Hadoop platform. "We help customers determine what data is better suited for a lower-cost computing platform."
Big data service refinery
This means using big-data technologies to break down silos across data stores and sources to increase corporate efficiency.
A large global financial institution, for instance, wanted to move from next-day to same-day balance reporting for its corporate banking customers. It brought in Pentaho to take data from multiple sources, process and store it in Hadoop, and then pull it out again. This allowed the bank's marketing department to examine the data "more on an intra-day than a longer-frequency basis," Gallivan told us.
"It was about driving an efficiency gain that they couldn't get with their existing relational data infrastructure. They needed big-data technologies to collect this information and change the business process."
The final use case involves large enterprises with sophisticated information security architectures, as well as security vendors looking for more efficient ways to store petabytes of event or machine data. In the past, these companies stored this information in relational databases. "These traditional systems weren't scaling, both from a performance and cost standpoint," said Gallivan, adding that Hadoop is a better option for storing machine data.