choices

Which data source should I use to build my reports?

Today, thanks to the democratization of reporting tools and their ease of connecting to many different data sources, it is easier than ever to create reports using multiple sources. However, we must take the time to stop and ask ourselves if our report is connected to the right data.

In this article, I want to talk about the different systems that exist within your company in which you can build reports. Also I want to provide you with some questions to ask yourself to make sure you are connecting to the right data source for your type of report.

Types of systems

In this section, I will elaborate on 4 systems: the source system, the data lake, the data warehouse and the data cube.

Source Systems

Your source systems are the systems that your business uses on a daily basis to conduct its activities. Whether it’s your ERP to do your accounting or your inventory and production tracking, your CRM to track your customers or in-house applications to assist your internal processes. These systems often have integrated reports and one can often connect to them to make personalized reports.

Data Lake

The data lake is a system that is not present in all organizations, but it usually gathers all the company’s data in a raw and transformed format. It also allows you to manage data that is difficult to integrate into traditional databases, such as images and videos. The data lake is equivalent to the hard drive of your computer.

Data Warehouse

The data warehouse is a classic business intelligence (BI) system. It is a database modeled differently than the source system. The source system aims to write information quickly and the warehouse aims to read information quickly for reporting purposes. However, there is an effort to be made to model and transform our data.

Data Cube

The data cube acts as an abstraction layer on your data warehouse. This means that it is often difficult for an end user to interact with a database such as a data warehouse. The cube is designed to be easily navigable via an Excel file and a dynamic cross table, uses the user’s language and allows for more advanced calculations such as year over year comparisons.

 

Questions to ask yourself before creating a report.

Now that you know more about these four systems, let’s get back to our reports. I want to share with you some helpful questions and rules of thumb to make sure your report uses the right source. Because, you don’t want to be forced to change a source mid-development and lose valuable time.

How much refresh time is needed?

One of the first questions to ask is what data lag am I willing to live with. The closer our report is to the source system, the closer our data is to real time. The more sensitive our report is to this question, the closer to real time we should be. Keep in mind though, your report is going to put pressure on your source system and you don’t want to prevent people from using it for their daily activities for reporting.

Am I interested in integrating data from multiple systems?

Here, we want to know if your report should contain data from more than one system. While modern reporting tools can do the job of integration, it’s wise to use more robust data integration tools to do the job. That doesn’t stop you from using your reporting tool to do a proof of concept and then operationalize it in a more solid tool.

Is my need regular or ad-hoc?

Next, you need to ask yourself if the report you are requesting will be used on a regular basis or on an ad hoc basis. To develop in a warehouse or a cube, it takes time and resources. You don’t want to invest that time for an analysis that you have to do for a special event and that won’t happen afterwards. These systems are at their best for known issues. For ad-hoc needs, your data lake will be of great use to you.

Am I interested in having a history?

One of my favorite questions. You have to ask yourself if you want to see the data at given time and not in real time. What I mean is to see the data as it was in the past (e.g. an employee’s manager history). Often source systems keep the most up-to-date version of this information, but your data lake, data warehouse and cube have this information. In technical terms, this is called Slowly Changing Dimensions..

Do I need complex transformations?

The final question to ask is what types of data transformations are needed to complete my report. If the transformations are complex (e.g. current year vs. previous year), it is wise to do this type of transformation outside of your source system. If the need is ad-hoc, your lake will serve you well, if the need is regular, take the time to develop it in your data warehouse.

The beauty of these questions is the fact that even if you don’t have all of these systems in place, your answers may also push you to implement one of them.

Conclusion

In this article, we saw 4 different places you can take your data to build your reports. I’ve also given you 5 questions to ask yourself to make a more informed decision in selecting your source. I hope this article helps and if you have any other questions to offer, feel free to add them in the comments!

Gabriel Guimond-Prevost – Data Architect
Neos Group Inc.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top