Sometimes making the simplest enhancement can be difficult to implement at a technical level. Simple is relative off course but from a BI perspective somethings I would consider simple to include:
- Adding a single column to a report
- Changing a calculation in a report or cube
The reason everyday tasks in a BI environment become difficult can mostly be put down to technical debt. This got me thinking, what are the design choices that unnecessarily increase technical dept in BI? I thought of many such design choices and decided it is best to go through each one of them in a series of posts. First on the list is the SQL CASE statement. No self respecting BI practitioner can live without the CASE statement but if used incorrectly this statement can lead to much cursing. The CASE statement is a classical quick solution. Within a few seconds you can:
- Correct spelling mistakes, rename items and correct formatting issues
- Group data into reporting categories
- Basically change source data almost in way you want
The trouble with CASE statements is that they take a life of their own and grow reporting requirements change. The only thing constant in life is change. Some of the issues associated with using the CASE statement in BI include:
- Data integrity – if the same CASE statement is not applied everywhere that particular piece of data is used, it will appear to end users as if the data is not the same across different reports. This leads to mistrust of data and questions on data integrity. This is the most serious consideration to keep in mind when using a CASE statement.
- Performance can become and issue
- Tracking of changes. Using start and end dates in database table it is possible to track changes to data. This is not so simple if changes to data are done via a CASE statement that is changed. Using version control for your SQL statements might mitigate this risk but requires developer intervention to recount the history of data*If the reporting data is in a look up table, a business user is able to make sense of the data. However it is coded in a CASE statement, you need a developer to interpret to the business user what the code is doing. This ties up two resources and becomes a burden on the developer to explain to the business every time there is a code change/new person.
- Code reuse may become an issue depending on where in the BI stack the CASE statement is made.
He is my rule of thumb I would recommend you follow when it comes to the CASE statement. The lower down your BI stack the better it is to use a CASE statement. The only time it is quite acceptable to use a CASE statement in your BI presentation layer is if the CASE statement deals with dynamic data.
CASE WHEN year(my_date) = year(getdate()) THEN “Current Year” WHEN year(my_date)
CASE
WHEN year(my_date) = year(getdate()) THEN "Current Year" WHEN year(my_date)
Example of acceptable statement to use in BI presentation layer.
CASE WHEN my_name = ‘john’ THEN ‘John’ WHEN my_name = ‘MARY’ THEN ‘Mary’ ELSE my_name END
CASE
WHEN my_name = 'john' THEN 'John'
WHEN my_name = 'MARY' THEN 'Mary'
ELSE my_name END
Example of CASE statement that should be avoided in a report but rather should be used in the ETL layer.
In conclusion, the CASE statement can help you meet deadlines and get things done quick but fixing the source data at the ETL layer is always the best design decision you can make. This leads to better transparency in data shown in analytical tools available to your end users.