10 bad surprises I encountered as an engineer when writing B2B SaaS integrations (and how to deal with them)
An engineer shares 10 (bad) surprises he encountered as he started to implement customer facing integrations at his last B2B SaaS company
When I started to write customer facing integrations between (B2B) SaaS products I quickly learned that it is a lot thornier than it looks at first sight. In theory it is just about calling an external REST API, parsing some JSON and connecting the external fields to your internal data structure. Shouldn’t be that hard, right?
Well, in reality I kept running into problem after problem at my last B2B SaaS as we scaled from a single type of integration with 1-2 connected external systems to 3+ integration types with dozens of connected systems. I really wish somebody had warned me how much harder this would be than I imagined when I got started.
In the spirit of paying it forward, and hopefully saving you some time and headaches in building your own customer facing integrations, I assembled a list of the 10 worst surprises I ran into when building out our integrations.
Let’s dive in:
Solution: Plan for scale from day 1. You will need it.
When I got started with writing integrations for our SaaS product I did some brief market research, spoke to a few business people and came to the conclusion that for the type of integration we were looking to build (importing customer’s fuel costs for their cars) there seemed to be about 5 vendors which together had 70-80% market share at least.
I reasoned that if we could get to about 80% coverage of the market we should be good and thus planned our integrations architecture with the support of 5, maybe 10 different systems in mind.
3 years later this type of integration supported 50+ different vendors with more than 100 different data formats. These integrations are used by 800+ different customers, each of whom has between 1 and 4 different formats set up. So in total we are talking about 2000-3000 active connections which import data from customers every week (which created plenty of interesting debugging issues, more on that below).
What I had missed in my initial analysis was how important these integrations were for our customers. Our application was focused on vehicle costs, and the cost of a car’s fuel was a central piece of that. No customer wanted to enter this data manually, so if they had some vehicles for which we did not support the import of their fuel costs they immediately saw much less value in our software. This meant that supporting a good part of the long tail of providers was crucial for our business.
The second problem which I had underestimated was just how big the long tail of providers was: Sure, the top 5 providers give you ~70% coverage, but the remaining 30% are spread over dozens if not hundreds of smaller niche providers. And it turns out that many customers, for historical reasons, or to optimize costs or coverage, used more than one provider.
Solution: How I would approach it today
What I have learned since is that the same long tail problem exists in almost every category of (SaaS) software: Yes there are some major vendors in every category, but they rarely each have more than 5-10% market share. So by the time you cover the top 5-6 vendors you have at best a 50-60% coverage of the market. This is almost never enough, so if the integration you are looking to build is a core part of the value proposition of your product you should plan for scale in your connected systems from day one.
Solution: Keep your architecture flexible and expect to rewrite it a few times.
This one was very counterintuitive for me: After we had integrated with about 10 external systems I expected the next integrations to get progressively easier. After all we had already seen 10 different architectures, surely numbers 11 through 15 would resemble one of these?
It turns out that I was wrong and in fact the opposite happened: The first 10 systems that we integrated with were all fairly large players and had an (at least somewhat decent) API with proper documentation. But beyond that we got into the long tail, and many of the systems we encountered either did not offer a proper API or offered one that was badly documented and rudimentary at best.
This was bad news for our internal integrations architecture: Once we had implemented the first 5 integrations we had started to see patterns and with a quick refactor had abstracted these away into a mini-framework to make writing additional integrations faster and less error prone. This was helpful for integrations number 5 to 10. But the long tail did not fit well with our abstracted framework and we quickly had more exceptions, where we were working around our own framework, than standard cases. We had no choice but to rewrite our mini-framework again to accommodate these new kinds of integrations.
Solution: Keep it flexible
Today I try to keep the architecture for the first 10 or so integrated systems fairly flexible. The variety of APIs is just too big to squeeze them all into a rigid framework where you expect e.g. a certain endpoint to behave in a certain way, rate limiting to be handled in a certain way or an HTTP status code to always be representative of the same error.
Keep it flexible and think of your mini framework as more of a helper library that provides common functions rather than a strict corset for how the interaction with the external system should be structured.
Solution: Budget for it accordingly and build for robustness from day 1.
In theory customer facing integrations should provide tremendously scalable value: Write an integration once and potentially thousands of customers can get value from it. Whilst this is somewhat true, I have found that in reality 90% of the work is in maintaining, debugging and fixing these integrations: External systems change, not all behaviour (and edge cases) is properly documented, different customers use these external systems differently which can lead to issues in how you map data or interpret values, a sync schedule that worked fine with 10 connected customers breaks when you hit 30, etc. Maintaining customer facing integrations is an endless game of whack-a-mole.
Solution: Accept it, budget for it and build for robustness from day 1
The most important part about this is to be aware that building a new integration is a long term commitment. Budget for it accordingly and make sure you have the engineering capacity to maintain the integrations that you have built. A broken integration is often worse than no integration at all and can quickly erode a customer’s trust in any data in the system.
Wherever you can, build for robustness from day 1, because…
Solution: Build for scale and robustness from day 1.
Most teams underestimate the amount of traffic they will have from their customer facing integrations. Let’s do some quick math and say you have 500 customers which each have 2-3 active integrations in your product. Each integration performs on average 500 API calls a day to interact with an external system. This gives us 500 × 2-3 × 500 = 500,000 to 750,000 API calls a day, or roughly 15 to 22 million API calls a month.
Most SaaS products which reach medium sized scale with a few hundred customers will see millions of API calls in their integrations every month.
This means that even small edge cases which only affect 0.5-1% of your API calls will affect roughly 100,000 to 200,000 API calls a month. And if an issue affects only 1% of integrations, that still affects about a dozen customers in our example above.
There is no real way around this: If you build integrations which are popular with your customers (as we sure hope they are!) you will reach scale on these quite quickly. And because you only control one side of the system, you have limited control over error rates and how to mitigate issues when they arise.
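The back-of-the-envelope numbers from this example can be reproduced with a few lines of Python. This is only a sketch: the 30-day month and the helper name `monthly_calls` are my assumptions, while the other figures come straight from the example (500 customers, 2-3 integrations each, 500 calls per integration per day).

```python
def monthly_calls(customers, integrations_per_customer, calls_per_day, days=30):
    """API calls per month; the 30-day month is an assumption."""
    return customers * integrations_per_customer * calls_per_day * days


# Low and high end of the "2-3 active integrations" range from the example.
low = monthly_calls(500, 2, 500)
high = monthly_calls(500, 3, 500)

# Calls hit by an edge case affecting 0.5% (low end) or 1% (high end) of calls.
edge_case_low = low * 5 // 1000
edge_case_high = high // 100

# Active integrations hit by an issue that affects 1% of integrations.
integrations_low = 500 * 2 // 100
integrations_high = 500 * 3 // 100

print(f"{low:,} to {high:,} API calls per month")
print(f"{edge_case_low:,} to {edge_case_high:,} calls per month hit by a 0.5-1% edge case")
print(f"{integrations_low} to {integrations_high} integrations hit by a 1% issue")
```

Depending on which end of each range you pick, a rare edge case lands somewhere between 75,000 and 225,000 calls a month, which is why "rare" stops meaning much at this volume.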
Solution: Build for scale and robustness from day 1
Build for robustness whenever you can: Handling rate limits, expired access tokens, temporary outages, format parsing issues and revoked permissions should all be standard fare. You will run into them, trust me.
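To make this a bit more concrete, here is a minimal sketch of a retry helper for two of these cases, rate limits and temporary outages. The exception classes and the name `call_with_retries` are hypothetical; in a real integration you would raise them from whatever HTTP client you use, for example when you see a 429 or 503 response. Expired tokens and revoked permissions need different handling (refresh or notify the customer) and are not covered here.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical base class for errors that are worth retrying."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds suggested by the API, if any


class RateLimited(RetryableError):
    """The external API told us to slow down."""


class TemporaryOutage(RetryableError):
    """The external API is temporarily unavailable."""


def call_with_retries(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry on RetryableError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RetryableError as err:
            if attempt == max_attempts:
                raise  # give up and let the caller log and alert
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's own hint
            else:
                delay = base_delay * 2 ** (attempt - 1)
            # A little jitter so many connections do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is only there so the helper can be tested without actually waiting.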
If you see a potential edge case that could be problematic, at least add an info-level logging message with the request payload and full response. Many edge cases can be hard to reproduce and debugging them will be so much easier when you have visibility into the actual requests when the issue arose.
You probably also want to set up an early warning system that notifies you of issues before customers notice them, but more on that below.
Solution: Only add external systems when there is a clear ROI case (return on investment)
As customer facing integrations are often central to the value customers get out of your SaaS product it is almost impossible to deprecate an integration once it exists: You would essentially be telling your customers that their business is no longer welcome with you.
Yet the incentives for adding a new integration are unbalanced: Sales and business development are always eager to add more integrations as they keep hearing about systems their leads are using during the sales process, but which are not yet supported by your product. Each of these missing integrations can become a deal blocker and so Sales is keen on having more supported integrations (or ideally a blanket promise that you will support whatever integrations their signed customers demand).
Sometimes it is strategically important that your company promises a lead (often for large Enterprise deals) that a certain type of system will be supported if they sign as it is central to how they will use your product. But beware of making these kinds of promises lightly as the maintenance burden of every additional integration is significant.
Solution: Only add integrations when there is a clear ROI case or it is a strategic differentiator
Every new integration should require a clear ROI case that outlines how the benefits (e.g. additional revenue from new or existing customers) outweigh the costs of creating and maintaining the integration.
By creating and sharing these ROI cases within your company you can get two major benefits for everybody involved:
First you only add additional integrations when there is an actual business case behind it, which is especially important as you start scaling into the long tail of providers.
Second it also helps Sales and business development teams by giving them clarity on how many customers are required to support their case for an additional integration. By clearly stating the decision criteria it turns a potentially emotional topic into a rational one and can help align both the engineering and sales teams behind the common goal of creating value for the customer whilst also making sure it leaves a profit margin for the business.
Solution: Provide clear expectations and as much insight into the integration status for customers as possible.
Here is a situation that will for sure NEVER happen:
A customer creates a new lead in Salesforce, switches tabs to your email marketing tool and constantly hits refresh on the contacts list until they see their new lead appear. If they can’t find it after 3 minutes they are already on the phone with your support team and in full panic thinking about how many leads already got lost in the sync between the two systems (and mind you if the total count of leads between the two does not match…)
Many customers have a deep distrust for technical systems and a few hard core users will double check everything. The good news is that these customers will find any issue with your integrations. The bad news is that each of these (perceived, not always real) issues will for sure land on your desk and with every issue they find they will trust your product and the integration a little bit less.
Solution: Provide clear expectations and insights into the status of the integration
Many integrations only run periodically in the background. If you can, give customers visibility into when an integration last ran (or when a lead/deal/item was last updated/synchronised). This can pre-empt many support queries and give your customers peace of mind that everything is working as it should. A button to manually trigger a refresh for a specific record can also help with problems where a change was not picked up (yet) and lets a customer fix a problem by herself without having to bother you or customer support.
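As a rough illustration of what such visibility could be built on, here is a small sketch of a per-connection status record that turns the last run time and the sync interval into a message you can show in the UI. The class `SyncStatus` and its fields are hypothetical names, not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SyncStatus:
    """Hypothetical per-connection status record shown to the customer."""

    interval: timedelta                     # the documented sync interval
    last_run_at: Optional[datetime] = None  # None until the first sync has run
    last_error: Optional[str] = None        # short message if the last run failed

    def next_run_at(self) -> Optional[datetime]:
        if self.last_run_at is None:
            return None
        return self.last_run_at + self.interval

    def describe(self, now: datetime) -> str:
        """Human-readable status line for the integration settings page."""
        if self.last_run_at is None:
            return "Not synced yet"
        minutes_ago = int((now - self.last_run_at).total_seconds() // 60)
        minutes_left = max(0, int((self.next_run_at() - now).total_seconds() // 60))
        text = f"Last synced {minutes_ago} min ago, next sync in about {minutes_left} min"
        if self.last_error:
            text += f" (last run failed: {self.last_error})"
        return text


now = datetime.now(timezone.utc)
status = SyncStatus(interval=timedelta(hours=1), last_run_at=now - timedelta(minutes=12))
print(status.describe(now))
```

A manual refresh button would then simply run the sync for one record and update `last_run_at` for it.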
Where detailed insights are not available at least document the periodic sync intervals (also for customer support, which will otherwise ask you at least 4 times a week when integration X will sync next…) and the conditions required for a record to be updated or pushed out.
If you do not provide deep insights into the status of the integration you can expect to get many tickets on your desk from customers who believe record X should have been synced, item Y was not pushed out properly or an integration is not working “as they expect it”. In my experience a good 60-70% of such issues are either user problems or arose out of a misunderstanding of when/how an integration syncs information. But they still take a lot of time to debug because you need to rule out an underlying bug. The more of these issues you can nip in the bud the better.
Solution: Embrace it and build your integration mini-framework/library with it in mind
Many integrations look like fairly straight forward data mappings on the surface: For example, importing contacts from an external system might look as easy as mapping the fields in that system to the fields in your data schema.
Alas, things are rarely this simple in reality. Not only do different systems often have very different schemas but they also have different ways of representing the same thing: What is an enum (categorical field) in one system might be a free-form text field in the next. Or what is represented as an attribute in one system is a linked child object in a one-to-many relationship in another.
It gets worse when you consider the actual values: Sometimes a very similar value can have subtle differences in meaning and the correct mapping makes or breaks your integration from the customer’s perspective.
Solution: Embrace the complexity and build your abstraction at the right level
All of these things mean that integrations are never as easy to build as just transforming data from one schema to another. Embrace it and make sure your integration mini-framework (or helper function library) leaves room for custom logic as you load data from the external system or push it out.
A good level of abstraction is often the action you want to perform: Provide a common interface for instance to load all contacts from the external system. But how these contacts are loaded and transformed into your internal schema is probably something you want to leave to the implementation of each integration.
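Here is a minimal sketch of what abstracting at the level of the action could look like. Everything in it is made up for illustration: the two source systems, their field names and the `Contact` schema. The records passed to the constructors stand in for whatever the external API returns.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Contact:
    """Hypothetical internal schema."""
    external_id: str
    email: str
    is_active: bool


class ContactSource(ABC):
    """Common interface: it fixes the action, not how the data is mapped."""

    @abstractmethod
    def load_contacts(self) -> List[Contact]:
        ...


class EnumStatusSource(ContactSource):
    """Hypothetical system A: status is an enum ('ACTIVE' or 'ARCHIVED')."""

    def __init__(self, records: List[Dict[str, str]]):
        self.records = records

    def load_contacts(self) -> List[Contact]:
        return [Contact(r["id"], r["email"], r["status"] == "ACTIVE") for r in self.records]


class FreeTextStatusSource(ContactSource):
    """Hypothetical system B: status is free-form text typed by users."""

    ACTIVE_WORDS = {"active", "current", "yes"}

    def __init__(self, records: List[Dict[str, str]]):
        self.records = records

    def load_contacts(self) -> List[Contact]:
        contacts = []
        for r in self.records:
            # Custom logic lives in the integration, not in the framework.
            state = r.get("state", "").strip().lower()
            contacts.append(Contact(r["contact_id"], r["mail"], state in self.ACTIVE_WORDS))
        return contacts


def import_all(sources: List[ContactSource]) -> List[Contact]:
    """The calling code only knows about the action, not about each system."""
    contacts: List[Contact] = []
    for source in sources:
        contacts.extend(source.load_contacts())
    return contacts
```

The calling code stays the same when integration number 11 shows up with yet another way of representing a status; only a new `ContactSource` is needed.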
Solution: Keep an eye on the external API’s change log and add logging for format errors
If you build towards another system’s API you are inherently in a position where you do not control what the other system does. Luckily many APIs are quite stable and change infrequently, but every now and then you will find that you have to rewrite a large part of an integration because some endpoints changed, formats got updated or entire sections of an API have been deprecated.
Unfortunately there is no silver bullet here and until you are large enough to have others build towards you (which comes with its own issues, more on that in another post) this is a constant part of life as an engineer working on customer facing integrations.
Solution: Follow change logs, subscribe to updates and add logging on format errors
Many bigger APIs (which also tend to be the more actively developed ones and hence the ones which experience more frequent changes) provide mailing lists or update feeds where you can subscribe to their change logs. Make sure you don’t get too far behind as it is often easier to upgrade from just 1-2 versions ago than it is to catch up after 5 years of neglected changes.
For the APIs which provide no easy way to subscribe to changes I have found it helpful to add error-level logging on format errors: A barrage of these is often an indicator that something has changed in the API. Of course at this point it is technically already too late, but unfortunately many small and less well maintained APIs are bad at versioning and noticing fast when something has changed is still better than not noticing at all.
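A minimal sketch of this kind of format-error logging with Python’s standard `logging` module could look like the following. The required fields, the logger name and the function `parse_transaction` are assumptions made up for this example.

```python
import logging

logger = logging.getLogger("integrations.format")

# Hypothetical fields we expect in every record of the external API.
REQUIRED_FIELDS = ("id", "amount", "currency")


def parse_transaction(integration, payload):
    """Parse one external record; log at error level if the format is off."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        logger.error(
            "format error integration=%s missing=%s payload=%r",
            integration, ",".join(missing), payload,
        )
        return None
    try:
        amount = float(payload["amount"])
    except (TypeError, ValueError):
        logger.error(
            "format error integration=%s bad_amount=%r payload=%r",
            integration, payload["amount"], payload,
        )
        return None
    return {"id": str(payload["id"]), "amount": amount, "currency": payload["currency"]}
```

An alert on the rate of these error messages per integration is then a cheap way to notice that an API has changed under you.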
Solution: Rather log too many things and consider building some small tools for frequent issues you need to debug
This one might be a bit more of a personal opinion but I have heard similar complaints from other engineers who are working on customer facing integrations. Because many monitoring tools have steep usage-based pricing, which is charged per metric tracked or GB of logs ingested, it quickly becomes prohibitively expensive to store debug information on every API call for every customer in these systems.
But this is not the only issue: On top of bad-fit pricing, I don’t know of any monitoring or logging tool which has a built-in concept of a customer or integration. When debugging issues it is often helpful to understand when a certain sync last ran for a specific customer, which items were updated and why, etc. You can get there with detailed logging and some custom scripts to parse these logs for you, but this still feels very manual and labour-intensive for something quite essential for running customer facing integrations at scale.
Solution: Generous logging and scripts for common debug tasks
Unfortunately I have not been able to find any good solutions here. If you have found an approach that works well for you with some standard system I would love to hear about it (see below for contact details).
Meanwhile my solution has been to be quite verbose in my info-level logging together with some custom scripts to e.g. only return log messages related to a certain customer ID within a certain time interval.
My log messages typically log whenever a certain integration runs for a specific customer (start time, customer id, integration/connection id, possibly configuration parameters), when a certain item is synced or not (e.g. “Skipped lead because it does not match condition X”, be sure to include appropriate internal or external ids) and when an error is encountered with the external API. These logs get stored in a separate log stream which has a fairly short retention time (7-30 days typically) as they are only meant to help debug recent sync issues.
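To give an idea of what this can look like, here is a small sketch that writes one JSON object per log line and filters the lines by customer ID and time interval. The JSON-lines format and the names `log_line` and `filter_logs` are assumptions for this example, not a description of a specific tool.

```python
import json
from datetime import datetime, timezone


def log_line(event, customer_id, connection_id, ts=None, **fields):
    """Build one JSON log line that always carries customer and connection ids."""
    ts = ts or datetime.now(timezone.utc)
    entry = {
        "ts": ts.isoformat(),
        "event": event,
        "customer_id": customer_id,
        "connection_id": connection_id,
    }
    entry.update(fields)
    return json.dumps(entry)


def filter_logs(lines, customer_id, start, end):
    """Yield parsed entries for one customer with start <= ts < end."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict) or "ts" not in entry:
            continue
        if entry.get("customer_id") != customer_id:
            continue
        if start <= datetime.fromisoformat(entry["ts"]) < end:
            yield entry


day = datetime(2024, 1, 1, tzinfo=timezone.utc)
lines = [
    log_line("sync_started", "cust_1", "conn_7", ts=day.replace(hour=8)),
    log_line("item_skipped", "cust_1", "conn_7", ts=day.replace(hour=9),
             reason="does not match condition X", external_id="lead_42"),
    log_line("sync_started", "cust_2", "conn_9", ts=day.replace(hour=9)),
]
for entry in filter_logs(lines, "cust_1", day.replace(hour=9), day.replace(hour=10)):
    print(entry["event"], entry.get("reason"))
```

Because every line carries the customer and connection ids, the same filter works for sync starts, skipped items and API errors alike.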
Solution: Insight into integration status for customers and building for reliability helps avoid the worst, for the other cases good debugging logs as described in point 9 help a lot
Background syncs, where your system periodically loads data from an external system or pushes data out to it, are great for customers as they keep their data magically up to date. But when they break debugging them can be a horrible experience for engineers: Without good logging in place it can be almost impossible to understand what happened in a sync 5 days ago and why a certain record has not been updated when it should have been.
I recently talked to an engineer who spent a full 3 days debugging a background sync issue in one of his integrations and unfortunately in my experience this is not an uncommon occurrence.
Solution: Let customers help themselves, build for reliability and be generous with your logging
This one has no silver bullet.
The first line of defence is to make sure as few user problems (instead of bugs) land on your debugging list as possible. For this make sure customers understand what is going on in the background sync whenever possible (see problem 6 for more details).
Next handle all of the common reliability issues properly so they don’t have you chasing down a potential bug for hours just to find out that it was a fluke with the external system and now all is good and dandy again (see problem 4 for more on that topic).
And last but not least, add generous logging from day 1 so that if you need to dig in at least you have good visibility into what happened when and why. Problem 9 above has some best practices for that.
I am sure I forgot some big issues in the list above. What was your worst surprise when you started building customer facing integrations? Or do you have a creative solution to one of the problems above?
I would love to hear about your experience, tweet to us at @NangoHQ or join our Slack to chime in on the conversation!
I hope this list of issues along with my best practices is helpful for you.
I wish you all the best with your next integration!