dbt (data build tool) has rapidly become the lingua franca of the modern data stack, but running it in a production healthcare environment introduces constraints that a typical SaaS analytics deployment never encounters: PHI sensitivity, strict audit requirements, regulatory scrutiny, and zero tolerance for incorrect metrics that could influence clinical decisions.
Project Structure for Compliance
Organise your dbt project with PHI data access as a first-class concern. Apply a dedicated phi tag to every model that contains PHI, and enforce row-level security policies at the database layer, since dbt itself does not control who can query the resulting tables. Use dbt's meta fields to document PHI sensitivity on every model and column; this metadata is written to dbt's manifest artifact, so it can be exported and used to drive automated data cataloguing and access control reviews. Never allow PHI-containing models to be materialised into development environments without proper masking.
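As a rough illustration of that export step, the sketch below walks a parsed manifest.json and lists every model and column flagged as PHI. The meta key name contains_phi is a hypothetical convention, not something dbt defines, and the default path assumes the standard target directory.

```python
import json
from pathlib import Path


def phi_inventory(manifest: dict, meta_key: str = "contains_phi") -> list[dict]:
    """Return one row per model or column whose meta sets the (hypothetical) PHI key."""
    rows = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots and so on
        # Model-level meta can appear on the node itself or under its config.
        config_meta = (node.get("config") or {}).get("meta") or {}
        model_meta = {**config_meta, **(node.get("meta") or {})}
        if model_meta.get(meta_key):
            rows.append({"model": node["name"], "column": None})
        for column_name, column in (node.get("columns") or {}).items():
            if (column.get("meta") or {}).get(meta_key):
                rows.append({"model": node["name"], "column": column_name})
    return rows


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest that dbt writes after a parse, compile or run."""
    return json.loads(Path(path).read_text())
```

Calling phi_inventory(load_manifest()) after a dbt run gives a flat list that could be written to CSV for an access review. A row with column set to None means the model as a whole is flagged.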
Testing That Actually Catches Problems
The built-in dbt tests (not_null, unique, accepted_values, relationships) are the floor, not the ceiling. For healthcare analytics, add custom tests for clinical business rules: patient ages that are physiologically plausible, diagnosis codes that are valid ICD-10, admission dates that do not fall after discharge dates. These domain-specific tests catch upstream data quality issues before they propagate into reporting and affect clinical decision-making.
A dbt test suite that only checks for null values and uniqueness is like a smoke detector that only triggers at 1000°C. Useful in theory, useless in practice.
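In dbt these rules would normally be written as SQL tests that return the failing rows. The Python sketch below expresses the same three rules in that "return the failures" style so the logic is easy to read. The column names, the 120-year age ceiling and the regular expression are assumptions; in particular the pattern only checks the general shape of an ICD-10-CM code, not membership in the official code set.

```python
import re
from datetime import date

# Shape check only: letter, digit, alphanumeric, optional "." plus 1-4 alphanumerics.
ICD10_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")
MAX_PLAUSIBLE_AGE = 120  # assumed ceiling; adjust to local policy


def failing_rows(rows: list[dict]) -> list[tuple[str, list[str]]]:
    """Return (encounter_id, reasons) for every row that breaks a clinical rule."""
    failures = []
    for row in rows:
        reasons = []
        if not 0 <= row["age_years"] <= MAX_PLAUSIBLE_AGE:
            reasons.append("implausible_age")
        if not ICD10_SHAPE.fullmatch(row["diagnosis_code"]):
            reasons.append("malformed_icd10")
        # Open encounters have no discharge date yet, so only compare when present.
        discharged = row["discharge_date"]
        if discharged is not None and row["admit_date"] > discharged:
            reasons.append("admit_after_discharge")
        if reasons:
            failures.append((row["encounter_id"], reasons))
    return failures


sample = [
    {"encounter_id": "enc-1", "age_years": 54, "diagnosis_code": "E11.9",
     "admit_date": date(2024, 1, 2), "discharge_date": date(2024, 1, 5)},
    {"encounter_id": "enc-2", "age_years": 140, "diagnosis_code": "XYZ",
     "admit_date": date(2024, 3, 10), "discharge_date": date(2024, 3, 1)},
]
print(failing_rows(sample))
```

As with a dbt test, an empty result means the check passes. Validating codes against the real ICD-10 set would need a reference table joined in the warehouse rather than a pattern.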
Documentation as a Compliance Artifact
In regulated healthcare environments, the dbt docs site is not just a developer convenience; it is a compliance artifact. Every model should have a description explaining what the data represents, its source system, and known limitations. Every column containing PHI should be tagged and described. Use dbt's exposures to document downstream dashboards and reports, creating a complete lineage graph from raw EHR data to the KPI on the CMO's dashboard. During an audit, this lineage shows exactly which sources feed a given report, and when a metric looks wrong it narrows root-cause analysis to the models upstream of the affected exposure.
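To make the lineage idea concrete, here is a minimal sketch that starts from a named exposure in a parsed manifest.json and walks upstream until it reaches source tables. It relies on the exposures and parent_map sections of the manifest; the exposure and node names in the usage note are hypothetical.

```python
def upstream_sources(manifest: dict, exposure_name: str) -> list[str]:
    """List the unique_ids of every source that feeds the named exposure."""
    exposure = next(
        (e for e in manifest.get("exposures", {}).values() if e["name"] == exposure_name),
        None,
    )
    if exposure is None:
        raise KeyError(f"no exposure named {exposure_name!r} in manifest")

    parent_map = manifest.get("parent_map", {})
    # Seed the walk with the nodes the exposure declares it depends on.
    stack = list(exposure.get("depends_on", {}).get("nodes", []))
    seen: set[str] = set()
    sources: set[str] = set()
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        if node_id.startswith("source."):
            sources.add(node_id)  # reached raw data, stop climbing this branch
        else:
            stack.extend(parent_map.get(node_id, []))
    return sorted(sources)
```

For an exposure called cmo_dashboard, upstream_sources(manifest, "cmo_dashboard") would return the raw tables behind that dashboard, which is the list an auditor typically asks for first.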