Over the course of my career, I’ve built dozens of data models to improve the accuracy, consistency, and turnaround time for data analytics and reporting.
Along the way, I learned what to do–and perhaps more importantly what NOT to do.
This article is written for anyone who is responsible for extracting insights and automating reporting using SQL. It is NOT a guide on how to make efficient data models so data engineers can leave now.
If you are a data analysts, data scientists, analytics engineers, and business intelligence professional, you should finish this quick read.
First, what is a data model?
The phrase “data model” sounds intimidating, but you’ve likely already built a few. A table is a data model. A view is a data model. A data model is something materialized that can be directly queried.

Sometimes creating a data model is as simple as creating a table / view from an existing query. There may be valid reasons why you want your logic and output to remain hidden, but in general, it’s best to materialize commonly run queries as views or tables. If something is helpful for you, there’s a chance it’s helpful for others (not to mention that others can help improve your query).
How do I know if I need a data model?
- If you are repeatedly using the same Common Table Expression (CTE) across multiple queries
- If you are repeatedly making the same joins between tables, especially if the join logic is complicated
- If a common question takes a long time to answer
- If implicit business knowledge is required to interpret a table correctly
In general, SQL-related pain points are the best sign that there is a need for some data modeling.
Considerations for your data model
- Decide on the level of granularity
- What does a row of data represent in your model?
- This is what I spend the most time by far thinking about when designing a new data model
- In general, there are 3 types of granularity to consider:
- Event-based: e.g. click, registration, invoice, etc.
- Entity-based: e.g. company, member, business unit, etc.
- Time-based: month, since launch, calendar year
- These levels of granularity can exist in isolation or be combined e.g. aggregate registrations by customer this year
- It’s not always best to choose the most granular as that can have run-time issues and may require complex aggregation and CTEs to use effectively
- My goal when building a data model is to answer 90% of questions without the use of CTEs. The ideal is that queries become as simple as SELECT * FROM x WHERE y
- Sometimes this means what I thought would be 1 data model becomes 2 or 3 related data models–that’s okay!
- Build it broad
- It’s really easy to create a new data model that answers your immediate need, but challenge yourself to make your table as conceptually broad as possible
- E.g. instead of only including “attended appointments” (your immediate need), include “all appointments” (a likely need of someone else that is conceptually similar to include with minimal additional effort)
- Making your table broad incentives other users to adopt and enhance your work (which makes your life easier and makes you look like a SQL rockstar)
- Rollout slowly
- Don’t make the mistake of announcing to the whole company that your v1 data model is available to use–it will inevitably have some issues and that will damage your credibility and slow adoption
- Instead use the v1 data model yourself for a few weeks and continue to tweak it in response to real-world scenarios
- Then let a few others know and tweak some more based on their feedback
- Announce only after you and others have thoroughly tested it works with real-world scenarios
- Add helper columns
- When aggregating always include things like min, max mode, and array so the loss of granularity feels like less of a limitation
Conclusion
There you have it. Some things to consider when building your data model and please do build a data model. Sharing your logic with others is the right thing to do for the company and your career.