Recently I was stress testing an API I built by creating a migration tool to migrate the data form version 1 of the API to version 2 of the API. The team that built version 1 of the API left out some rather important features and the DB structure of v1 was pretty bad so we decided the best course of action was to migrate the data out of version 1 into version 2 and depreciate v1.
I was migrating about 5 million records (of various types) out of the old API and into the new API. While doing that I noticed that the time it took to migrate the photos was taking much longer than the time it took to migrate the sessions.
I looked in my API call log table and noticed the rtime (response time) of an average call to import a batch of 1000 photos was about 225 seconds. Whoa, not acceptable. Digging deeper I found that there was a constraint check that was being done on the application layer that should also be taking place in the database.
A session consists of a sequence of photos, the sequence and the photos are unique to that session. I was performing this check with a lookup to the database to see if we had a photo for the given session with the same sequence number already in the database. I was using a pivot table to form a many-to-many relationship on the photos and the sessions.
The many-to-many relationship was a design flaw, it could not happen in any scenario. Additionally I could not force an integrity check by setting up a UNIQUE key on the session.id and photo.sequence due to the pivot table.
For kicks I included a picture of my migration tool.
I quickly realized there was a design flaw and removed the pivot table, added a foreign key ‘session_id’ into the `photos` table, created some KEYS for faster indexing and UNIQUE KEY to force a constraint on the data at the database layer. Modified my code to reflect these changes and then began migrating the photos again.
WOW! Pay dirt. The response time dropped from 225 seconds to about 10 seconds for each batch of 1000 entities. What an improvement. Goes to show you the importance of proper planning, good design and stress testing of your application.
You can see the improvements in the response time of the API calls in the table featured in this image. Green represents the response time AFTER the structural changes to the database. Those response times are in seconds, so as you can see a call that was taking 3 to 4 minutes to complete was now happening in 8-12 seconds.