Enterprise-grade architecture means managing your data model lifecycle. A model lifecycle refers to the series of stages and processes that a data model undergoes from its creation to its eventual retirement or replacement. This lifecycle encompasses development, testing, deployment, and maintenance.
Managing this lifecycle is crucial to ensuring the accuracy, reliability, and effectiveness of data analysis processes. By consistently managing a model's lifecycle, you can adapt to changing data, business requirements, and insights.
This article explores some important phases, from designing an enterprise model, to releasing it to end users, to maintaining it after deployment.
Designing your datasets
Designing enterprise-grade models means adhering to best practices and testing thoroughly. This work is usually done in teams, which makes processes like version control important for avoiding mistakes.
Enterprise grade features
Incorporating enterprise-grade features into a Power BI dataset ensures that the dataset is robust, efficient, and secure, catering to the diverse and intricate needs of large organizations.
Firstly, implementing Row Level Security (RLS) is imperative, as it ensures that data is filtered and displayed based on the user's role or permissions, safeguarding sensitive information. Incremental refresh can also be employed, especially for large fact tables, to optimize the data load process by only updating new or changed data, thereby improving performance and reducing resource consumption. Aggregation tables can also be leveraged: they provide a summarized version of your detailed data, allowing faster query performance and efficient data retrieval without scanning the entire dataset. Lastly, using perspectives is an excellent practice in enterprise environments with multifaceted datasets. Perspectives provide a tailored view of the dataset, highlighting specific tables, columns, and measures, thus simplifying the user experience and ensuring that users are only presented with the most relevant data.
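To make the RLS idea concrete: in Power BI, a role carries a DAX filter expression that restricts which rows its members can see. The Python sketch below simulates that behavior with a hypothetical sales table and made-up role names; it is an illustration of the concept, not Power BI's actual implementation.

```python
# Hypothetical illustration of Row Level Security (RLS):
# each role carries a row filter, and a query only ever returns
# rows that pass the filter for the querying user's role.

sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 75},
]

# Role definitions: role name -> row predicate. Each predicate mirrors
# a DAX filter such as [Region] = "East" placed on an RLS role.
roles = {
    "east_analyst": lambda row: row["region"] == "East",
    "west_analyst": lambda row: row["region"] == "West",
    "admin": lambda row: True,  # unfiltered access
}

def query(table, role):
    """Return only the rows the given role is allowed to see."""
    predicate = roles[role]
    return [row for row in table if predicate(row)]

# A user in east_analyst sees only East rows (total 175), never West data.
print(sum(r["amount"] for r in query(sales, "east_analyst")))  # 175
```

The key property is that filtering happens in the model, not in the report: whichever visual or query a user runs, the rows they are not permitted to see simply do not exist from their point of view.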
These features, when implemented with diligence, can transform a Power BI dataset into a powerful, enterprise-ready tool.
What is version control?
Utilizing Power BI version control offers several advantages. Firstly, it provides a central location for Power BI content, eliminating the need to email files and ensuring everyone has access to the latest versions. It also automatically maintains all report versions, meaning you can roll back to any previous version when required. Version control systems prevent simultaneous editing by multiple team members, thereby reducing conflicts. In Power BI, this could be as simple as checking out files to lock them, so multiple people cannot edit the same file at the same time, or using an enterprise solution such as Git.
Moving your datasets from design to production
Once a dataset is created, we do not want to give it to end users immediately. First, it needs to undergo testing and validation, a process with a number of steps that ensure it is truly ready to be deployed to end users.
Git with Fabric
Recently, Microsoft introduced Power BI Developer mode, which allows users to sync their changes with Git. Simply put, Git is a distributed version control system widely used in enterprise development. It has a number of features that maintain good processes and limit mistakes. Through Power BI Developer mode, Fabric currently integrates with Azure DevOps, a Microsoft service for managing Git repositories and version control. To learn more about Git, visit the overview here.
SharePoint and the Power BI Version Control App
Knowing your users is key. While Git and similar advanced methods are ideal, not all organizations have the skills to use them. Because Power BI often has many citizen developers, simpler version control methods are often needed. Here, SharePoint works very well: it has built-in version control and check-in / check-out ability. The Power BI version control app is a free app, courtesy of Power BI Tips, that simplifies SharePoint version control.
Environments
Typically, we do not want to work directly on versions of datasets or reports end users are using. We need a system that allows us to make changes and test before we publish to our end users.
Environments enable this process. We typically have three core environments - development, testing, and production. In Power BI, we can implement each of these as its own workspace. Development (Dev) is the environment where changes, updates, and new features are developed. Testing (Test) is the environment where datasets and reports are thoroughly tested to ensure functionality, performance, and accuracy. Production (Prod) is the live, operational environment where the final versions of reports are accessed by end users.
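The essential rule of this setup is that content moves through the stages in order, never skipping Test on the way to Production. A minimal sketch of that promotion order, with hypothetical workspace names, might look like this:

```python
# Sketch of the Dev -> Test -> Prod promotion order. Workspace names
# are illustrative; any naming convention works as long as it is consistent.
STAGES = ["Dev", "Test", "Prod"]

workspaces = {
    "Dev": "Sales Analytics [Dev]",
    "Test": "Sales Analytics [Test]",
    "Prod": "Sales Analytics",
}

def next_stage(current):
    """Return the stage content is promoted to next, or None once in Prod."""
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None

# Content in Dev is promoted to Test, then to Prod, and no further.
print(next_stage("Dev"))   # Test
print(next_stage("Prod"))  # None
```

Keeping the Prod workspace name free of a suffix, as above, is a common convention so that end users only ever see the clean, final name.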
Deployment pipelines
Once we make changes, we publish them to the Dev workspace. Then, once we are happy, we move them to Test. Here, we test our models and reports to make sure our changes are correct. For this to run well, we need a process to move everything and keep track of what is where.
For Power BI Premium users, deployment pipelines bring efficient management of report and model transitions between environments. Deployment pipelines automate the process of promoting content through environments and ensure consistency between them.
A "diff compare," short for "difference comparison," refers to the process of analyzing and displaying the variations between two versions of a file or set of files. It highlights the specific lines, sections, or elements that have been added, modified, or removed in one version compared to another. We could use this to see how our changes are different to the version that is published. Or, we could easily see the difference between the version in Dev and the version in Test.
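As a generic illustration of a diff compare, the example below uses Python's standard difflib on two made-up versions of some measure definitions; added lines are marked with "+" and removed lines with "-", which mirrors what model comparison tools surface for dataset metadata.

```python
import difflib

# Two hypothetical versions of a model's measure definitions:
# the published version, and a local copy with one new measure.
published = """Total Sales = SUM(Sales[Amount])
Profit = [Total Sales] - [Total Cost]
""".splitlines(keepends=True)

local = """Total Sales = SUM(Sales[Amount])
Profit = [Total Sales] - [Total Cost]
Margin % = DIVIDE([Profit], [Total Sales])
""".splitlines(keepends=True)

# unified_diff prefixes added lines with "+" and removed lines with "-".
diff = list(difflib.unified_diff(published, local,
                                 fromfile="published", tofile="local"))
print("".join(diff))
```

Running this shows at a glance that the only change is the new "Margin %" measure, so a reviewer can approve exactly that change and nothing else.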
Datasets are represented as code, so this works especially well. The external tool ALM Toolkit allows diff comparisons across different versions of your dataset (for example, between a local desktop model and the published version). Additionally, it allows you to select the exact changes you want to deploy, so you do not, for instance, push an accidental column deletion to the published model.
Maintaining your datasets after publishing
Once the model is created, there is a responsibility to maintain and manage it. This includes communicating with end users, helping them find the model, and ensuring they can trust the data within it.
Enhancing the Dataset's Professional Presentation
Once a Power BI dataset is published, it's crucial to further configure its presentation and credibility attributes. This includes setting the dataset image and crafting an apt description. An illustrative dataset image creates an immediate visual impression, helping users identify and associate the dataset with its purpose and content.
A well-written description, on the other hand, gives potential users a quick overview of the dataset's content, purpose, and any other pertinent details. Descriptions also appear as a tooltip when hovering over the dataset name, either in the Data Hub or when viewing datasets via Get Data in Power BI Desktop. These elements combined ensure that the dataset does not just become one name among a sea of datasets. Instead, it stands out, communicates its value at a glance, and thereby encourages wider adoption among users.
Promotion, Discovery, and Establishing Trust
Promotion, discovery, and certification are three pillars that ensure shared datasets' effectiveness and wide adoption. Promotion of a dataset is the process of elevating its visibility status, making it easier for users in an organization to find it among other datasets.
Promoting a dataset signifies its value and relevance, nudging users to prioritize its utilization. To further enhance discoverability, datasets should be optimized for search, ensuring that users can find them with relative ease when they are searching for data. However, beyond mere discovery, trust is paramount. That is where certification comes into play. Certifying a dataset signals to users that it has met certain criteria for accuracy, completeness, and reliability.
When users see a dataset has been certified, it instantly builds confidence in its quality, ensuring that the insights drawn from it are based on credible data. This trust factor is vital for decision-makers who rely on these datasets for crucial business decisions.
The authors
This overview of enterprise architecture and data lifecycle considerations has been written by Reid Havens and Steve Campbell of Havens Consulting and Sunny BI, respectively. You can read more about them on our authors page.
If you would like to take a much deeper dive into upgrading your enterprise architecture, check out their upcoming webinar.