Sharing data in a Microservices Architecture using GraphQL

Published in GetNinjas, Jul 12, 2017

Here at GetNinjas we’ve faced some problems integrating the components of our Microservices Architecture, and while we were exploring the available options, GraphQL showed up as an excellent fit. I’ll focus on the problems with three common ways to share data between services, and at the end suggest GraphQL as a better option for some cases.

Common strategies

Strategy #1 — Share database access

Integrating systems through their databases seems so easy that it is tempting. All we have to do is share database access among the services, so that system A can access the database of system B and so forth. What is the problem with this approach? Here are some of them:

#1.1 Performance
Performance problems in service A can affect service B and make it hard to scale them independently; in fact, they are no longer independent.

#1.2 Duplicated business rules
Imagine that system A has a list of users with a status property that indicates whether the user is activated. If service B fetches these users directly from A’s database, you’ll have to filter by activated users in two places, and if this rule changes, the chances of one of them becoming outdated are high.
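
To make the duplication concrete, here is a minimal JavaScript sketch. The rows, the status values and both function names are hypothetical; the sketch assumes the rule in service A was changed while the copy in service B was forgotten.

```javascript
// Hypothetical rows from service A's users table.
const users = [
  { id: 1, name: "Ana", status: "activated" },
  { id: 2, name: "Bob", status: "deactivated" },
  { id: 3, name: "Caio", status: "pending_review" },
];

// Service A owns the rule. Assume it was later changed so that
// users pending review also count as active.
function activeUsersInServiceA(rows) {
  return rows.filter(
    (u) => u.status === "activated" || u.status === "pending_review"
  );
}

// Service B reads the same table directly and still carries the old copy.
function activeUsersInServiceB(rows) {
  return rows.filter((u) => u.status === "activated");
}

const idsA = activeUsersInServiceA(users).map((u) => u.id);
const idsB = activeUsersInServiceB(users).map((u) => u.id);

// The two services now disagree about who is active.
console.log(idsA, idsB);
```

If service B asked service A for its active users instead, only one copy of the rule would exist.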

OK, this strategy breaks basic rules of Microservices, so it’s not worth analyzing its problems any further.

Microservices prefer letting each service manage its own database, either different instances of the same database technology, or entirely different database systems. (Decentralized Data Management)

Strategy #2 — Sync all data in a centralized database

If you have a data warehouse you may feel tempted to use it, but here are some details you have to pay attention to.

#2.1 Different types of data source
If the components of your architecture use different types of data source, for instance some using PostgreSQL or MySQL and others using MongoDB, Redis, Cassandra, Neo4j… it can be hard to sync all of them into a single centralized database because of the different formats.

#2.2 Race conditions
The sync takes some time to occur, so when a service tries to access information that has not been synced yet, it will work with an outdated version of the data and you will start to experience seemingly random problems. The common workaround is to add some delay before requesting that information, but it’s not a real solution because it brings other problems.
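
The stale read can be reproduced with a small simulation. This is only a sketch: the two in-memory maps stand in for the owning service's database and the centralized copy, and the function names and the order key are made up for illustration.

```javascript
// Hypothetical stores: the owning service's database and the centralized copy.
const serviceDb = new Map();
const centralDb = new Map();

// Writes always go to the owning service first.
function writeToService(key, value) {
  serviceDb.set(key, value);
}

// The sync job runs later and copies everything over.
function runSync() {
  for (const [key, value] of serviceDb) centralDb.set(key, value);
}

writeToService("order:42", { status: "created" });
runSync();

writeToService("order:42", { status: "paid" });
// Another service reads the central copy before the next sync runs.
const staleRead = centralDb.get("order:42").status;

runSync();
const freshRead = centralDb.get("order:42").status;

console.log(staleRead, freshRead);
```

Adding a delay before the read only narrows the window between the write and the sync; it does not remove it.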

#2.3 Changes in the service’s schema
When you change the schema of a service, the change will be synced, and other services that rely on the old format will break. You can minimize this problem by writing a giant integration test that runs after changes to each component, but a change in one service should not impact other services (see Componentization via Services).

#2.4 Change the type of data source
You may need to change the type of your data source, for example from a document database to a relational one (here at GetNinjas we had a case like that). This change will impact all the services using this data.

#2.5 Dynamic calculated information
There are cases in which the information is not persisted but calculated right before use, like the URL of a user’s avatar. This type of information lives inside the application, in places like Decorators, Presenters and config files. It is not persisted in the database and obviously will not be synced, so with this strategy you will have to duplicate the logic to arrive at the same URL.
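
As an illustration, an avatar URL helper could look like the sketch below. The CDN host, the URL layout and the field names are assumptions, not the actual GetNinjas implementation.

```javascript
// Hypothetical config value that lives in the application, not in the database.
const CDN_HOST = "https://cdn.example.com";

// Presenter-style helper: the URL is derived from persisted fields at read time.
function avatarUrl(user, size = 64) {
  return `${CDN_HOST}/avatars/${user.id}/${user.avatarFile}?size=${size}`;
}

// Only these raw fields would reach a centralized database.
const persistedUser = { id: 7, avatarFile: "a1b2c3.png" };

console.log(avatarUrl(persistedUser, 128));
```

A service reading the synced row sees only the id and avatarFile values, so it would need its own copy of both the host and the URL layout.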

#2.6 Duplication of common queries
Let’s say you have a query that returns the top five similar products and another service wants to use it. In a synced database strategy you will need to duplicate that query. The problem is obvious: when the query changes, you will need to change every copy too.

#2.7 Data format
It's a good practice to store data raw (without formatting) and format it when displaying. When you use an API to integrate systems, the API can also return formatted data, so the formatting logic is not spread across the entire architecture. With this strategy you will have to store the data already formatted or reimplement the logic.
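
A small sketch of the API-side alternative: the owning service keeps the raw value and returns a formatted one next to it. The product, the field names and the currency format are assumptions chosen for illustration, and the helper only handles non-negative integer amounts.

```javascript
// Raw value as stored: an integer amount of cents, no formatting.
const product = { id: 10, name: "Faucet repair", priceInCents: 19990 };

// Formatting rule owned by the service that exposes the data.
function formatPrice(cents, symbol = "R$") {
  const whole = Math.floor(cents / 100);
  const fraction = String(cents % 100).padStart(2, "0");
  return `${symbol} ${whole},${fraction}`;
}

// What an API response could carry: raw and formatted side by side.
function presentProduct(p) {
  return {
    id: p.id,
    name: p.name,
    price: p.priceInCents,
    priceFormatted: formatPrice(p.priceInCents),
  };
}

console.log(presentProduct(product));
```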

Syncing databases into one big place can be a good choice for analysis purposes, but for sharing data between services, not so much.

Strategy #3 — REST APIs

At this point, things start to get much better. But we can list a few problems.

#3.1 Hard to reuse existing endpoints
Most likely, each service will need its own set of fields. Adding new fields to an existing endpoint not only affects every service consuming that endpoint, it also decreases the performance of the API.

#3.2 Create specific endpoints for each service
Seems to be a good idea, but as you start to add more and more endpoints, the API becomes messy and the development time grows as well. The need for a documentation system like Swagger begins to make sense.

For some cases you can stop reading here: REST APIs can do a really great job integrating services. But GraphQL is here, and understanding this tool can be very useful!

GraphQL as an API for some services

I’ll describe below a few benefits of GraphQL over REST.

I'm preparing a post about pain points of GraphQL, stay tuned :)
Update 09/19/17: Pain Points of GraphQL

Flexibility
You can query the schema by navigating through multiple resources via their relations, saving round trips to the server.

Performance
Ask for only what you need, not a resource that responds with all data.

GitHub’s GraphQL API: example of a query and its response in the GraphiQL IDE.

Highlights

Query multiple "resources" at the same time in a single request (in this case user and repository).
The response matches the shape of the query exactly. You do not need to read the documentation or run the request to know the response structure.
You can pass arguments to fields (like in the avatarUrl that receives size).
Query nested resources (e.g. organizations are children of user).
Pagination ready to be used (see Relay Cursor Connections Specification).
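
The idea of asking only for what you need, including nested resources, can be sketched in plain JavaScript. The `select` helper below is not how a GraphQL server is implemented; it is a hypothetical illustration of a response mirroring the requested shape, and all the data is made up.

```javascript
// Projects `source` onto `selection`: `true` keeps a field as-is,
// a nested object selects sub-fields (applied to each item of an array).
function select(source, selection) {
  if (Array.isArray(source)) {
    return source.map((item) => select(item, selection));
  }
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    const value = source[field];
    result[field] = sub === true ? value : select(value, sub);
  }
  return result;
}

// Everything the service knows about the user (made-up data).
const fullUser = {
  id: "u1",
  name: "The Octocat",
  email: "octocat@example.com",
  organizations: [{ id: "o1", name: "github", billingEmail: "billing@example.com" }],
};

// Roughly the GraphQL selection `{ name organizations { name } }`.
const selection = { name: true, organizations: { name: true } };

const response = select(fullUser, selection);
console.log(JSON.stringify(response));
```

The consumer never receives the email or billingEmail values because it never asked for them.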

Documentation
Document field by field during development.

GitHub’s GraphQL API: example of auto-generated documentation in the GraphiQL IDE.

Highlights:

The description of types or fields appears at the top of the documentation window.
The type system helps you define your schema in a more natural way.
You can mark fields as deprecated (like databaseId of Organization Type).

Development
Evolve the entire API schema instead of only one REST API endpoint. When you add a field it can be used by other consumers.

Organic versioning
Simply add fields when you need them, or mark them as deprecated when you plan to stop using them.

Monitoring
Track the usage of each field in your API: who is using it, how it performs, and whether deprecated fields are still being used.
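
One way to approach per-field tracking is to wrap each resolver with a counter. The sketch below assumes resolvers are plain functions; `trackField`, the in-memory `fieldUsage` map and the sample organization are hypothetical, and a real setup would likely send these numbers to a metrics system.

```javascript
// In-memory usage counters keyed by "Type.field".
const fieldUsage = new Map();

// Wraps a resolver so that every call is counted before it runs.
function trackField(typeName, fieldName, resolve) {
  const key = `${typeName}.${fieldName}`;
  return (...resolverArgs) => {
    fieldUsage.set(key, (fieldUsage.get(key) || 0) + 1);
    return resolve(...resolverArgs);
  };
}

const organizationResolvers = {
  name: trackField("Organization", "name", (org) => org.name),
  // databaseId is the deprecated field mentioned earlier in the text.
  databaseId: trackField("Organization", "databaseId", (org) => org.databaseId),
};

const org = { name: "getninjas", databaseId: 123 };
organizationResolvers.name(org);
organizationResolvers.name(org);
organizationResolvers.databaseId(org);

console.log([...fieldUsage.entries()]);
```

A deprecated field whose counter stays at zero for long enough is a reasonable candidate for removal.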

Of course, GraphQL will not replace every case we are used to solving with other technologies. There are many out there: RESTful, SOAP, Sockets, Protobuf, some kind of binary protocol over UDP, Queues, etc. Knowing the other options will help you achieve your goals faster and better.

GraphQL as API Gateway

Instead of requesting information from a specific service, you can request it from an API Gateway, which hides which service owns the data.

Let’s use an online store as an example. Services that have more information to expose may use GraphQL, and services that expose less may use a simple REST API.

In this approach it doesn’t really matter whether a service uses a REST API or a GraphQL API. Actually, if your service is really thin, GraphQL will only add extra complexity, so a simple REST API can be better.
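
A rough sketch of the gateway idea for the online store: the consumer calls one place, and the gateway knows which service owns each piece of data. Both service clients are in-memory stubs with made-up names and data; in practice they would be network calls to a GraphQL service and to a REST service.

```javascript
// Stub standing in for a service that exposes a lot of data (imagined as GraphQL).
const productService = {
  productById: (id) => ({ id, name: "Drill", priceInCents: 25900 }),
};

// Stub standing in for a thin service (imagined as a simple REST API).
const reviewService = {
  reviewsForProduct: (productId) => [{ productId, rating: 5 }],
};

// The gateway hides which service owns which part of the answer.
const gateway = {
  product(id) {
    const product = productService.productById(id);
    const reviews = reviewService.reviewsForProduct(id);
    return { ...product, reviews };
  },
};

const result = gateway.product("p1");
console.log(JSON.stringify(result));
```

The caller of the gateway does not need to know that the reviews come from a different service than the product.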

Merging the APIs in one place

Update: 02/26/2018

At this point, you may be asking yourself how to join all the APIs into a single GraphQL schema. There are some tools to help you :)

GraphQL schema stitching
I believe this term was coined by the Apollo team. It refers to the act of merging GraphQL schemas.

This post summarizes how schema stitching works, and here you can find examples that build it using the Apollo GraphQL Tools.

Another interesting tool for combining GraphQL schemas is graphql-weaver (A tool to combine, link and transform GraphQL schemas), which was actually born before the tools created by Apollo.

Reimplement the schema
In this situation, you can rewrite the whole schema, fetching the data in each field resolver, like in this PersonType example:

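...
person: {
  type: PersonType,
  args: {
    id: { type: GraphQLString },
  },
  // Return the promise so GraphQL waits for the REST response;
  // without `return` the field would always resolve to undefined.
  resolve: (root, args) => {
    return fetch(`${BASE_URL}/person/${args.id}/`)
      .then(res => res.json())
      .then(json => json.person)
  },
},
...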

If you choose to go this way, take a look at Wrapping a REST API in GraphQL, and at DataLoader (to prevent N+1 fetches).
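
To show why batching helps with N+1 fetches, here is a simplified loader written from scratch. It is not the DataLoader library or its API: it has no caching or de-duplication and only batches calls made in the same tick. `createBatchLoader` and `fetchPeopleByIds` are hypothetical names, and the batch endpoint is a stub.

```javascript
// Collects keys requested in the same tick and fetches them in one batch.
function createBatchLoader(batchFn) {
  let queue = [];

  function flush() {
    const batch = queue;
    queue = [];
    const keys = batch.map((entry) => entry.key);
    Promise.resolve()
      .then(() => batchFn(keys))
      .then(
        (values) => batch.forEach((entry, i) => entry.resolve(values[i])),
        (error) => batch.forEach((entry) => entry.reject(error))
      );
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // Schedule a single flush when the first key of a batch arrives.
        if (queue.length === 0) queueMicrotask(flush);
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Stub batch endpoint: one request for many ids instead of one request per id.
const requestedBatches = [];
function fetchPeopleByIds(ids) {
  requestedBatches.push(ids);
  return Promise.resolve(ids.map((id) => ({ id, name: `Person ${id}` })));
}

const personLoader = createBatchLoader(fetchPeopleByIds);

// Three resolvers asking for a person in the same tick share one batch.
const people = Promise.all(["1", "2", "3"].map((id) => personLoader.load(id)));
people.then((result) => console.log(requestedBatches.length, result.length));
```

In a resolver, calling the loader's load function with an id would replace the direct fetch, so a list of people costs one request to the REST API instead of one per item.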

Conclusion

Integrating services through their databases is not a good option. An architecture that takes advantage of the best of each technology should bring better results (as always).

Special thanks to Cristhiane Almeida, Daniel Tamai and Ion D. Filho, who helped review an earlier draft of this post.

I hope you’ve enjoyed the ideas presented here. If you liked it, please consider tapping or clicking the 👏 icon to recommend it to others so they can enjoy it too. And feel free to share it widely in your favorite social network :-)