Breaking down performance: Introducing delays into data retrieval

Breaking down performance: Introducing delays into data retrieval

We had a performance problem. Certain data needed to be retrieved from a variety of tables and needed to be queried. However, this took way too long.

We started by optimizing the table, trying to adjust indexes, adjusting our queries. We got the time down somewhat, but it was deemed this functionality was just too complex to optimize further. Too much had to be done on the fly, and too much data was segregated. We couldn’t change the data structure, as this was crucial for all other functionality. We got stuck.

Until we asked our client about the freshness of the data. How much delay did they allow? The client was convinced the data should be less than a day old. We were surprised. A day? Then why were we doing all these calculations on the fly?

The answer is simple, of course – the functionality was build the same way every other functionality was built. The performance issue only crept in once we saw the scope of the data it had to handle, and it only became a real problem when conventional optimizations did not work well enough. We could do it differently, however.

Essentially, this meant we could take all day to collect the information – as long as the user interaction was lightning fast. We knew it didn’t take that long, just too long for user interaction to be pleasant.

We decided to work as follows:
blogexplain

Every 2 hours, an SQL job retrieves the entire complex table, with all possible fields that can be shown or filtered pre-calculated. This is all combined into a temporary table, so users can keep using the active table.

Once the  table is completely done, the temporary table is switched with the active table. This is quick and unnoticeable for the user. The old table is then deleted in the background.

After all this, it is possible to create indexes on this new active table as well, to increase performance on certain commonly filtered fields. However, the process of combining everything into a single searchable table will already give a big performance boost.

Depending on the heaviness of the calculations, the period can be increased or lengthened. Your client can tell you what the minimal freshness of the data needs to be.


Kevin Deyne

A 25-year-old Software Engineer with a passion for Web, Java and Writing. Working at RealDolmen, he focuses on customer-centric projects that can actually help people and move organisations forward. Most hours of the day, he's thinking about code, integrating architectures and how to solve the next big problem. He also wrote a Lovecraft-inspired thriller called Whitewood and is working on Envir, a high-end project management tool.

Leave a Reply