What are Graph Databases useful for?

By David Mizell | June 01, 2016

Relational Databases are great. But…
Relational databases are great. After 40-some years of development and refinement, they are reliable, powerful and capable. They can hold huge amounts of data. Some of them can be updated thousands of times per second. If you want to query a relational database about sales, sales per product or sales per product per region, you’re in good shape. Any time you’re looking for information about some item, or sums or averages of many items of the same type of information, you’ll get the answer back quickly.

What are relational databases not good at? They fall down when you’re looking for relationships between data items, patterns of relationships or interactions between multiple data items. Let’s contrast two queries: first, suppose you wrote a query that amounted to:

show all the employees who work in our Houston store

A relational database would scan through the employees table, looking for matches to “Houston” in the “location” field in the table. You’d get the answer back in milliseconds. Now, instead, suppose you asked:

show all of the employees whose management chain includes the person who taught their new-hire-orientation class

The employee’s date of hire might be a field in the same employees table, but the new-hire-orientation class rosters are probably in another table. So the query has to take every employee and search the class rosters to find which new-hire-orientation class they attended. Let’s assume that this points us to the instructor of that class. Now we have to do a really elaborate search. Is the orientation instructor the employee’s supervisor? No? Well, is the orientation instructor the supervisor’s supervisor? And on and on we crawl up the employee’s management chain, with each step up typically costing the relational database another join against the employees table. One of two things will happen: the relational database will take minutes, even hours, to get all the answers, or the database system will run out of resources and fail to return any answers. Graph databases, on the other hand, can return answers to this second query in milliseconds. Because they are built to search through graphs, they can traverse a management chain many times faster than a relational database could, if it succeeded at all.
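To make the traversal concrete, here is a minimal Python sketch of the second query over a tiny in-memory graph. It is only an illustration of following "reports to" edges one hop at a time, not how any particular graph database is implemented. The employee names, the `supervisor` and `orientation_instructor` mappings, and both function names are hypothetical.

```python
# Hypothetical data: employee -> direct supervisor (None at the top of the chain).
supervisor = {
    "alice": "bob",
    "bob": "carol",
    "carol": None,
    "dave": "carol",
}

# Hypothetical data: employee -> instructor of their new-hire-orientation class.
orientation_instructor = {
    "alice": "carol",
    "bob": "dave",
    "dave": "carol",
}


def management_chain(employee):
    """Yield each manager above `employee`, following one edge per step."""
    seen = set()  # guards against accidental cycles in the data
    boss = supervisor.get(employee)
    while boss is not None and boss not in seen:
        seen.add(boss)
        yield boss
        boss = supervisor.get(boss)


def taught_by_own_manager():
    """Employees whose orientation instructor appears in their management chain."""
    return sorted(
        emp
        for emp, instructor in orientation_instructor.items()
        if instructor in management_chain(emp)
    )


print(taught_by_own_manager())
```

Each step up the chain is a single dictionary lookup here, which is the kind of direct edge-following that the text credits graph databases with; a relational engine would usually express the same walk as repeated self-joins or a recursive query.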

SPARQL, the query language used by many graph databases, looks a lot like SQL, so you might infer that a graph database can handle many of the same types of queries that relational databases are good at. You’d be right. The graph database might not be as fast, but it’ll do OK. Where it really shines is on those complex queries that the relational database can’t handle well, if at all.

Applications that make good use of a graph database
What kinds of applications can make good use of a graph database? Applications where it’s useful to find patterns of relationships between data items. This is not every database application (it’s no accident that relational databases are so popular), but there are many significant applications whose data cannot be analyzed accurately and efficiently with a relational database:

In many intelligence and law enforcement applications, it’s important to look for a pattern of events. Any one of the events may look innocuous, but the view of all of them together, and how they are directly or indirectly related to each other, is ominous. Chief Bad Guy sends an email to Bad Guy A, makes a phone call to Bad Guy B and sends a courier message to Bad Financier. Bad Guy A takes a train to Berlin. Bad Guy B takes a flight to Berlin. Bad Financier wires money to Bad Guy C. Bad Guy C lives in Berlin. Bad Guys A and B take a flight from Berlin to Atlanta.

Similarly, investment banks guarding against insider trading have to look for a suspicious pattern of actions, not necessarily any single action. Investment Banker A gets insider information about Stock S. Banker A sends email to Joe IT, an information technologist at the same bank. Joe IT phones Banker A. At close of business, Banker A and Joe IT badge out within seconds of each other. That night, Joe IT makes a purchase of Stock S.

The bioinformatics research community has largely adopted graph databases and the SPARQL query language. They are a natural fit to the huge network of relationships between all the chemicals present in the human body. One of our bioinformatics customers traced a chain of “this interacts with this” relationships from a drug designed to combat AIDS, via various proteins, human cells and other molecules, out to a discovery that the same drug might be effective against breast cancer.

Many vendors of consumer products have become interested in social network analysis (SNA), which involves constructing a graph of relationships between people. Facebook is an example of a social network, with its links between people and their friends. Graph searches might reveal which people are probably influential among their friends, which groups of friends share a common interest, and so on, which could lead to some very sophisticated, highly targeted marketing strategies.
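As a rough illustration of the social network case, the Python sketch below counts how many friends each person has in a small friendship graph and picks the most connected person. Friend count is only a crude stand-in for influence; real SNA uses richer measures. The names, the `friendships` list and the `friend_counts` function are all hypothetical.

```python
from collections import Counter

# Hypothetical undirected friendship edges.
friendships = [
    ("ann", "ben"),
    ("ann", "cat"),
    ("ann", "dan"),
    ("ben", "cat"),
    ("dan", "eve"),
]


def friend_counts(edges):
    """Return how many friendship edges touch each person."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


counts = friend_counts(friendships)
# most_common(1) returns a one-element list holding the (person, count) pair
# with the highest count.
most_connected = counts.most_common(1)[0]
print(most_connected)
```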

Okay, that’s a quick look at why graph databases are worth considering. In short, graph databases can answer complex questions about relationships that relational databases can’t handle well, if at all. A truly sophisticated, effective analytics environment will include both relational and graph databases.

About the author
David Mizell is a Senior Manager of Technical Projects at Cray Inc.
