This week’s BIGcast brings in OnApproach’s CEO, Paul Ablack, to talk with John Best about his vision for the credit union industry. The second episode of the Data Lake BIGcast Series, Data Lake: What It Takes, discusses how the creation of an industry-standard data lake could completely revolutionize the credit union industry. Paul goes into depth on the capabilities of a data lake and how this transformation benefits not only credit unions but the entire community.
Laying the Groundwork for the Future
OnApproach is a credit union service organization (CUSO) that is focused on collaborative analytics for the credit union industry. OnApproach provides credit unions with a middleware solution for efficiently organizing their data and realizing its full potential. Paul explains:
Data is extremely valuable for all the initiatives underway. The big challenge for credit unions is the integration of that data and putting it to the correct use. Our mission is to bring all that data into one place and make it easily accessible for not just credit unions, but the industry as a whole. We call this the CU App Store, where app developers and vendors can come in and create an app that credit unions can use to perform analysis (i.e. models of attrition, outreach, and predictive analytics).
The Caspian Data Lake will take multiple years to develop, but it will revolutionize the way companies use their data. Paul takes it a step further, saying, “We want to create a community. Caspian is that community where all of the data comes together and allows credit unions, data scientists and fintechs to work on that data.”
OnApproach is transforming this complex data lake dream into reality one step at a time, by focusing on:
- Their partnership with CUFX
- Complying with NIST standards
- Implementing Hadoop for real-time data uploads
One Common Language
CUFX stands for Credit Union Financial Exchange and is an initiative focused on improving data standardization to aid with the overall system integration between credit unions and outside vendors. In this episode of BIGcast, Paul explains the differentiating characteristic of the Caspian Data Lake:
I’ll tell you one other thing we do which is different from most other data lakes. Because of the normalization of the data to CUFX, it gives us a tremendous advantage as we avoid the ingestion process of that data. Users of our lake skip this complex process because the data is already mapped and fits well together to make sense…The CUFX is a data standard for everything, making the data one common language that works on the lake and the application programming interface (CU App Store).
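To make the "one common language" idea concrete, here is a minimal sketch of what normalizing records from two different core systems into a shared shape could look like. Everything in it is an assumption for illustration: the field names, the two "core systems," and the CommonAccount record are hypothetical and are not taken from the CUFX specification or from Caspian.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


# Hypothetical common record. The field names are illustrative only and
# are NOT the actual CUFX schema.
@dataclass(frozen=True)
class CommonAccount:
    member_id: str
    account_type: str
    balance_cents: int
    opened_on: date


def from_core_a(row: dict) -> CommonAccount:
    """Hypothetical core system A: dollar strings and MM/DD/YYYY dates."""
    month, day, year = (int(part) for part in row["OpenDt"].split("/"))
    return CommonAccount(
        member_id=row["MbrNo"].strip(),
        account_type=row["AcctTyp"].lower(),
        # Decimal avoids floating-point rounding when converting to cents.
        balance_cents=int(Decimal(row["Bal"]) * 100),
        opened_on=date(year, month, day),
    )


def from_core_b(row: dict) -> CommonAccount:
    """Hypothetical core system B: integer cents and ISO dates."""
    return CommonAccount(
        member_id=str(row["member"]),
        account_type=row["type"].lower(),
        balance_cents=row["balance_cents"],
        opened_on=date.fromisoformat(row["opened"]),
    )


# Once both sources are mapped, downstream code only sees one shape.
records = [
    from_core_a({"MbrNo": " 1001 ", "AcctTyp": "SAVINGS",
                 "Bal": "1250.75", "OpenDt": "03/15/2015"}),
    from_core_b({"member": 2002, "type": "Checking",
                 "balance_cents": 98000, "opened": "2018-07-01"}),
]
total_cents = sum(record.balance_cents for record in records)
```

The point of the sketch is the design choice Paul describes: the mapping work happens once, at the edge, so that anyone building on the lake can treat every record the same way regardless of where it came from.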
However, standardizing credit union member data does raise security concerns. A collaborative community of this kind requires a high level of security. Users cannot simply come in and take sensitive information from credit unions, because all of the available data is both masked and encrypted. Caspian is a workspace where personally identifiable information (PII) is safeguarded and cannot be extracted in any manner. Regarding security provisions, Paul says, “We are following the health care industry. We follow the NIST standards, a guide to protecting confidentiality of data when managing data.”
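As a rough illustration of what masking PII can mean in practice, the sketch below replaces a direct identifier with a keyed hash and hides all but the last four digits of an account number. This is an assumed, simplified technique (pseudonymization with HMAC-SHA256 from Python's standard library), not a description of how Caspian masks or encrypts data and not a substitute for following NIST guidance. The key, field names, and sample record are hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key. A real system would load this from a key
# management service, never from source code.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so records can still be
    joined, but the original value cannot be read back from the token.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_account_number(number: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of an account number."""
    if len(number) <= visible:
        return "*" * len(number)
    return "*" * (len(number) - visible) + number[-visible:]


def mask_record(record: dict) -> dict:
    """Return a copy of the record that is safe to expose for analysis."""
    return {
        "member_token": pseudonymize(record["ssn"]),
        "account": mask_account_number(record["account"]),
        "balance": record["balance"],  # non-identifying field kept as-is
    }


masked = mask_record(
    {"ssn": "123-45-6789", "account": "000123456789", "balance": 1520.10}
)
```

An analyst working with the masked record can still group by member_token or model balances, but never sees the underlying identifier.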
Going Above and Beyond: More than Just a Relational Database
Managing different types of data from a plethora of sources poses a difficult challenge for credit unions: storing and extracting that data efficiently. Part of the answer lies in how consistently and how often data is uploaded. In other words, more frequent uploads at shorter intervals yield more accurate and up-to-date data.
Paul describes the current state of OnApproach’s uploads: “Right now we upload data daily, but we are going to set up the lake to handle real-time use cases. We use the Hadoop file system (primary data storage that provides high-performance access to data across multiple nodes) that can handle much larger volumes of data.” This suits Caspian well, as a data lake stores greater quantities and more formats of data. A relational database alone, such as a SQL database, is simply not enough to handle the scale of querying that a data lake requires. Paul goes into further detail:
The best example I can give you is Netflix. If you had to handle that many queries at one time with a normal relational database it could get really slow. Single-threaded databases like that can cause the queries to get lined up behind each other. In the Hadoop world, they can replicate this data across multiple nodes (computers). The nodes all sit in a central network with access to all the data…Imagine a wall containing all of these nodes, and users can go in and access them. They are so fast that they can handle a lot of these queries simultaneously.
The Perfect Industry for Innovation: One Built Upon Collaboration
There are upwards of 5,600 credit unions in the United States, with 5,300 of them below the $1 billion mark in assets. Alone, they don’t have anywhere near sufficient data to perform predictive analytics with confidence. However, with the implementation of an industry data lake, everything could change. Making one big collaborative data pool with thousands of different credit unions would significantly improve the capabilities of the entire community. The data would become a lot more meaningful and a lot more accurate for reporting and analytics. Paul closes with his thoughts on this tremendous opportunity with Caspian: “We want to bring the best and the brightest to the lake, collecting more and more data to make innovation possible with this community. It’s almost like a perfect storm of technology coming together at the right time. The exponential possibilities that this could bring in a short amount of time makes this so exciting.”
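The statistical intuition behind pooling can be sketched with a short calculation. The example below estimates how the margin of error around an observed rate (for instance, an attrition rate) shrinks as the number of member records grows. The member counts and the 5% rate are made-up numbers for illustration, and the formula is the textbook normal approximation for a proportion, which assumes independent records and comparable member populations across credit unions. Real pooled data would need more careful treatment.

```python
import math


def margin_of_error(rate: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed proportion.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(rate * (1.0 - rate) / sample_size)


# Hypothetical figures, chosen only to show the effect of pooling.
assumed_rate = 0.05           # assumed 5% attrition rate
members_per_cu = 20_000       # assumed size of one small credit union
pooled_credit_unions = 1_000  # assumed number of credit unions in the pool

single = margin_of_error(assumed_rate, members_per_cu)
pooled = margin_of_error(assumed_rate, members_per_cu * pooled_credit_unions)

# The margin shrinks with the square root of the increase in records.
improvement = single / pooled

print(f"single credit union: +/- {single:.4%}")
print(f"pooled data lake:    +/- {pooled:.4%}")
print(f"improvement factor:  {improvement:.1f}x")
```

Under these assumptions, pooling a thousand times more records tightens the estimate by a factor equal to the square root of 1,000, which is the kind of gain that makes predictive models on rare events more trustworthy.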
To listen to the entire podcast, go to https://www.big-fintech.com/Media/BIGcast/ArticleID/241/Data-Lakes-What-it-takes