
Battling bank fraud with graph databases


As the world continues its transformation to an always-on status, data breaches and, in turn, fraud, are on the rise. In fact, according to CreditCards.com, data breaches totaled 1,540 worldwide in 2014 with 12% of those breaches occurring in the financial services sector. While fraud is not completely preventable, there are approaches banks can take to significantly mitigate the growing issue, including focusing on relationships in banking data to uncover patterns of suspicious fraudster activity.

Looking at data relationships doesn’t necessarily mean gathering new or more data. The key is to look at existing data in a way that makes its underlying connections explicit, which is what graph databases are designed to do. A growing number of financial institutions are using them to solve a variety of data problems, in particular to identify advanced fraud scenarios in real time.

PayPal uses graph techniques for just this purpose, performing sophisticated fraud detection on eBay and StubHub transactions. International Data Corporation (IDC) estimates that this has already saved PayPal more than $700 million while enabling the company to perform predictive fraud analysis.

As we know, there are various types of fraud, with first-party bank fraud, insurance fraud and e-commerce fraud being some of the most troublesome.

First-party fraud involves criminals who apply for credit cards, loans, overdrafts and unsecured banking credit lines with no intention of repaying them. Aite Group suggests that first-party fraud will be responsible for an estimated $28.9 billion in credit losses by the end of 2016. The surprisingly large size of these losses is due to the difficulty of identifying first-party fraud, in which fraudsters behave like legitimate customers until the day they cash in their inflated accounts and abscond with the money.

At the same time, there is a relationship between the number of fraud participants and the potential value of their illegal gains. A fraud ring of just two individuals, sharing only a phone number and an address, can create four synthetic identities with fake names and, with four to five accounts for each synthetic identity, a total of approximately 18 accounts. Assuming an average of $5,600 in credit exposure per account, the bank’s loss could be over $100,000 as a result.
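The arithmetic behind that estimate can be sketched in a few lines. The dollar figure and account counts come from the example above. The growth rule used here (identities equal the number of people raised to the number of shared identifiers, so 2 ** 2 = 4) is an assumption that happens to reproduce the four identities in the example, and 4.5 accounts per identity is simply the midpoint of "four to five."

```python
# Back-of-the-envelope estimate of a fraud ring's credit exposure.
# Assumption (not stated in the article): synthetic identities grow as
# people ** shared_identifiers, which matches the example (2 ** 2 = 4).

def ring_exposure(people, shared_identifiers, accounts_per_identity, exposure_per_account):
    identities = people ** shared_identifiers
    accounts = round(identities * accounts_per_identity)
    return identities, accounts, accounts * exposure_per_account

identities, accounts, loss = ring_exposure(
    people=2,
    shared_identifiers=2,         # phone number and address
    accounts_per_identity=4.5,    # midpoint of "four to five"
    exposure_per_account=5_600,   # dollars of credit exposure per account
)
print(identities, accounts, loss)  # 4 18 100800
```

Under the same assumed rule, adding a third participant or a third shared identifier raises the exposure sharply, which is the point of the example.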

Catching fraud rings before they cause significant damage is still difficult because traditional methods of fraud detection are not geared to look for shared identifiers and layers of indirection that can only be uncovered via connected analysis. The good news is that while these exponential relationships are what make such schemes so damaging for banks, they are also the characteristic that leaves the schemes open to graph detection techniques.
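To make "connected analysis" concrete, here is a minimal sketch that groups applicants who are linked, directly or through intermediaries, by a shared phone number or address. It uses a plain union-find structure in memory rather than a graph database, and the applicant names, identifier values and the find_rings helper are all hypothetical.

```python
from collections import defaultdict

def find_rings(applications, min_size=2):
    """Group applicants linked, directly or indirectly, by a shared identifier."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Connect every applicant node to each identifier node it uses.
    for name, identifiers in applications.items():
        for kind, value in identifiers.items():
            union(("person", name), (kind, value))

    groups = defaultdict(set)
    for name in applications:
        groups[find(("person", name))].add(name)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)

# Hypothetical data: A and B share a phone, B and C share an address.
applications = {
    "A": {"phone": "555-0100", "address": "1 Elm St"},
    "B": {"phone": "555-0100", "address": "9 Oak Ave"},
    "C": {"phone": "555-0199", "address": "9 Oak Ave"},
    "D": {"phone": "555-0142", "address": "4 Pine Rd"},
}
rings = find_rings(applications)
print(rings)  # [['A', 'B', 'C']]
```

A and C share nothing directly, yet they land in the same group because both are tied to B. That layer of indirection is what a record-by-record check would miss.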

By contrast, standard detection methods, such as flagging deviations from normal purchasing patterns, use discrete data, not the connections exploited by these complicated fraud rings. Discrete methods are useful for catching fraudsters acting on their own, but fall short when it comes to collectives who may work across borders, even across continents. Furthermore, such methods are prone to the notorious “false positive,” which produces annoyed customers and lost revenue opportunities.

Uncovering fraud rings with traditional relational database technologies requires modeling the data as a set of tables and columns, then carrying out a series of complex joins and self-joins. Queries like this work, but can be complex to build and expensive to run. Scaling them in a way that supports real-time access also poses significant technical challenges, with performance becoming exponentially worse, not only as the size of the ring increases, but also as the size of the total dataset grows.
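The join problem is easy to see on a toy table. The sketch below uses Python's built-in sqlite3 module with a hypothetical holder table: finding directly linked holders takes one self-join, and every additional layer of indirection takes another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holder (name TEXT, phone TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO holder VALUES (?, ?, ?)",
    [
        ("A", "555-0100", "1 Elm St"),
        ("B", "555-0100", "9 Oak Ave"),
        ("C", "555-0199", "9 Oak Ave"),
        ("D", "555-0142", "4 Pine Rd"),
    ],
)

# One hop: holders who directly share a phone number or an address.
one_hop = conn.execute(
    """
    SELECT h1.name AS first, h2.name AS second
    FROM holder h1
    JOIN holder h2
      ON h1.name < h2.name
     AND (h1.phone = h2.phone OR h1.address = h2.address)
    ORDER BY first, second
    """
).fetchall()

# Two hops: holders linked through an intermediary need a second self-join.
two_hop = conn.execute(
    """
    SELECT DISTINCT h1.name AS first, h3.name AS third
    FROM holder h1
    JOIN holder h2
      ON h2.name <> h1.name
     AND (h1.phone = h2.phone OR h1.address = h2.address)
    JOIN holder h3
      ON h3.name <> h2.name
     AND h3.name > h1.name
     AND (h2.phone = h3.phone OR h2.address = h3.address)
    ORDER BY first, third
    """
).fetchall()

print(one_hop)  # [('A', 'B'), ('B', 'C')]
print(two_hop)  # [('A', 'C')]
```

Each further hop means another copy of the table in the query, and the depth of the ring has to be guessed in advance. A graph traversal has neither constraint.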

Existing fraud detection infrastructure can be extended to support ring detection by running appropriate entity link analysis queries in a graph database, with checks triggered at key stages of the customer and account lifecycle, such as when a credit balance threshold is hit or a check bounces.

Real-time graph traversals tied to the right kinds of events can help banks identify probable fraud rings, during or even before the bust-out occurs.
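An event-triggered check of this kind might look like the following sketch. It is an in-memory illustration under stated assumptions, not any particular product's API: the LinkGraph class, the on_event hook, the event names and the thresholds are all hypothetical. When a risky event arrives, a bounded breadth-first traversal counts how many other account holders are reachable through shared identifiers.

```python
from collections import defaultdict, deque

class LinkGraph:
    """Bipartite graph linking account holders to the identifiers they use."""

    def __init__(self):
        self.neighbors = defaultdict(set)

    def link(self, holder, kind, value):
        a, b = ("holder", holder), (kind, value)
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def linked_holders(self, holder, max_hops=4):
        """Breadth-first search for other holders within max_hops edges.

        Two edges (holder -> identifier -> holder) make one holder-to-holder link.
        """
        start = ("holder", holder)
        seen = {start}
        queue = deque([(start, 0)])
        found = set()
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in self.neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    if nxt[0] == "holder":
                        found.add(nxt[1])
                    queue.append((nxt, depth + 1))
        return found

def on_event(graph, holder, event, ring_threshold=2):
    """Hypothetical hook: flag a risky event on a holder tied to a cluster of others."""
    risky = {"check_bounced", "credit_threshold_hit"}
    return event in risky and len(graph.linked_holders(holder)) >= ring_threshold

graph = LinkGraph()
for name, phone, address in [
    ("A", "555-0100", "1 Elm St"),
    ("B", "555-0100", "9 Oak Ave"),
    ("C", "555-0199", "9 Oak Ave"),
    ("D", "555-0142", "4 Pine Rd"),
]:
    graph.link(name, "phone", phone)
    graph.link(name, "address", address)

print(on_event(graph, "A", "check_bounced"))  # True: A is linked to B and C
print(on_event(graph, "D", "check_bounced"))  # False: D shares nothing
```

Because the traversal starts from one account and stops after a fixed number of hops, its cost depends on the size of the neighborhood it explores rather than on the size of the whole dataset.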

To sum up, traditional database technologies, while necessary for certain types of prevention, are not designed to detect the most elaborate fraud operations. In contrast, graph databases provide a unique ability to uncover a variety of important fraud patterns in real time, either in groups or on an individual basis, making them a powerful addition to any financial services firm’s security arsenal.

Mr. Eifrem is CEO of San Mateo, Calif.-based Neo Technology.