Epic Caboodle Data Warehouse: How the Data Model Works and Who Needs It vs Clarity

Star Schema
Caboodle’s dimensional data model – fact tables joined to dimension tables for analytical querying
EncounterFact
The central Caboodle fact table – one row per encounter with pre-computed metrics and dimension keys
Azure Synapse
Cloud-based Caboodle deployment using Microsoft’s distributed analytics platform
External Data
Caboodle accepts non-Epic data sources – financial, claims, HR – making it a true enterprise data warehouse

Most health systems that implement Epic Caboodle use it as a more expensive version of Clarity rather than as the enterprise data warehouse it is designed to be. They query Caboodle the same way they query Clarity, skip the dimensional model’s advantages, and then wonder why the investment did not produce the analytical capability they expected. This article explains how Caboodle’s data model actually works, when Caboodle is the right tool versus when Clarity is sufficient, and what operational and analytical use cases each platform serves best.

Epic Caboodle Architecture: Where It Fits in the Data Stack

Epic’s data architecture has three distinct layers. Chronicles is the operational database where clinical transactions occur in real time. Clarity is the relational reporting database extracted nightly from Chronicles, structured for SQL reporting. Caboodle is the enterprise data warehouse (EDW) built from Clarity data through a second ETL layer, structured for dimensional analysis and BI platform integration.

Caboodle sits above Clarity in the data stack. It is not a replacement for Clarity – it depends on Clarity as its primary data source. The ETL from Clarity to Caboodle runs after the Clarity ETL completes, typically in the early morning hours. Caboodle therefore sits one ETL step behind Clarity in availability – if the Clarity ETL finishes at 4:00 AM and the Caboodle ETL runs from 4:30 AM to 6:00 AM, Caboodle reflects the same clinical cutoff as Clarity (activity through the prior evening) but is not queryable until roughly two hours later. The full Cogito architecture context is covered in the Epic Cogito Reporting and Analytics guide.

The key distinction between Caboodle and Clarity is the data model, not the data itself. Both contain largely the same clinical information – but Clarity presents it in normalized relational tables that mirror Epic’s operational structure, while Caboodle presents it in a dimensional model optimized for analytical queries. The data model difference determines which analytical tasks are easier, faster, and more natural in each platform. Understanding the full Epic module landscape provides context for how Caboodle fits into the broader implementation – covered in the Epic EHR Learning Hub.

The Dimensional Data Model: Facts, Dimensions, and Star Schema

Caboodle is organized as a dimensional model – specifically, a star schema. A star schema has two types of tables: fact tables that contain measurable events and pre-computed metrics, and dimension tables that contain descriptive attributes used to filter and group those facts. The name comes from the visual pattern when you draw the schema: the fact table is at the center, and dimension tables radiate outward like points of a star.

Fact Tables: Measurements and Events

Fact tables in Caboodle contain one row per measurable event. EncounterFact has one row per patient encounter. MedicationAdministrationFact has one row per medication administration. OrderFact has one row per clinical order. Each fact table contains the measurements relevant to its event – LOS for encounters, administration time for medication events, result value for lab orders – and a set of foreign keys that link to dimension tables.

Fact tables are designed to be queried in aggregate. You SUM charges from EncounterFact, AVERAGE LOS, COUNT encounters by department. Querying individual rows from a fact table is technically possible but not the primary use case. If you need patient-level detail with many columns, Clarity is often more appropriate than Caboodle for that specific query pattern.
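The aggregate query pattern can be sketched with an in-memory SQLite database. The table and column names here (EncounterFact, DepartmentDim, LosDays) are illustrative stand-ins, not Epic's actual Caboodle schema:

```python
import sqlite3

# Minimal star-schema sketch: one fact table, one dimension table.
# Names are hypothetical, not the real Caboodle schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DepartmentDim (DepartmentKey INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE EncounterFact (
    EncounterKey  INTEGER PRIMARY KEY,
    DepartmentKey INTEGER REFERENCES DepartmentDim(DepartmentKey),
    LosDays       REAL    -- pre-computed during ETL, not at query time
);
INSERT INTO DepartmentDim VALUES (1, 'Cardiology'), (2, 'Medicine');
INSERT INTO EncounterFact VALUES (10, 1, 3.5), (11, 1, 2.5), (12, 2, 5.0);
""")

# The canonical analytical pattern: aggregate the fact table,
# group by a dimension attribute.
rows = conn.execute("""
    SELECT d.DeptName, COUNT(*) AS encounters, AVG(f.LosDays) AS avg_los
    FROM EncounterFact f
    JOIN DepartmentDim d ON d.DepartmentKey = f.DepartmentKey
    GROUP BY d.DeptName
    ORDER BY d.DeptName
""").fetchall()
print(rows)  # [('Cardiology', 2, 3.0), ('Medicine', 1, 5.0)]
```

Every analytical question against a star schema reduces to this shape: aggregate measures from the fact table, filtered and grouped by dimension attributes.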

Dimension Tables: Descriptive Attributes

Dimension tables contain the descriptive context that makes fact table metrics meaningful. PatientDim contains patient demographics – age, sex, race, insurance type. DepartmentDim contains department name, specialty, facility, and location. ProviderDim contains provider name, specialty, and NPI. DiagnosisDim contains ICD-10 codes and descriptions. DateDim contains calendar attributes like fiscal year, quarter, week number, and day of week.

DateDim is one of Caboodle’s most powerful features. Breaking down encounter volume by fiscal quarter, by day of week, or month over month all becomes trivial once EncounterFact is joined to DateDim. In Clarity, the same analysis requires date arithmetic functions applied to raw timestamp columns. In Caboodle, you join to DateDim and filter on FISCAL_QUARTER or DAY_OF_WEEK_NAME. The dimensional model eliminates much of the date-calculation complexity that makes Clarity analytical queries difficult to write correctly.
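A small sketch of the DateDim pattern, again using SQLite with hypothetical table and column names (DateKey, FiscalQuarter); here calendar quarters stand in for fiscal ones. Note the query itself contains no date arithmetic at all:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DateDim (DateKey TEXT PRIMARY KEY, FiscalQuarter TEXT,
                      DayOfWeekName TEXT);
CREATE TABLE EncounterFact (EncounterKey INTEGER, DateKey TEXT);
""")

# Populate DateDim for Jan-Jun 2025. In a real warehouse the ETL builds
# this once; calendar quarters substitute for fiscal quarters here.
d = date(2025, 1, 1)
while d <= date(2025, 6, 30):
    quarter = f"Q{(d.month - 1) // 3 + 1}"
    conn.execute("INSERT INTO DateDim VALUES (?, ?, ?)",
                 (d.isoformat(), quarter, d.strftime("%A")))
    d += timedelta(days=1)

conn.executemany("INSERT INTO EncounterFact VALUES (?, ?)",
                 [(1, "2025-01-15"), (2, "2025-02-03"), (3, "2025-04-20")])

# No DATEDIFF, no timestamp math -- just join and group on DateDim attributes.
rows = conn.execute("""
    SELECT dd.FiscalQuarter, COUNT(*)
    FROM EncounterFact f
    JOIN DateDim dd ON dd.DateKey = f.DateKey
    GROUP BY dd.FiscalQuarter
    ORDER BY dd.FiscalQuarter
""").fetchall()
print(rows)  # [('Q1', 2), ('Q2', 1)]
```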

Why Star Schema Queries Are Faster for Analytics

The star schema’s analytical performance advantage comes from the pre-aggregation and pre-computation built into the ETL process. When Caboodle’s ETL loads EncounterFact, it calculates and stores the LOS for each encounter directly in the fact table. A Clarity query for average LOS must calculate DATEDIFF(MINUTE,…)/1440 for every row in the result set. A Caboodle query can simply AVG the pre-computed LOS_DAYS column. This pre-computation advantage compounds when you are aggregating millions of encounter rows across multiple dimensions simultaneously.

The tradeoff is that pre-computed metrics are static until the next ETL run. The LOS in EncounterFact reflects the LOS as calculated during the most recent ETL, using the metric definition built into the Caboodle ETL logic. If your organization’s LOS definition differs from Epic’s Caboodle default calculation, the pre-computed value is wrong – and you must either modify the ETL logic or recalculate in your query. This is the validation step most Caboodle implementations skip.
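When the pre-computed value does not match the local definition, recalculating in the query layer looks like the sketch below: read the stored metric alongside a recomputation from raw timestamps, then use whichever matches the canonical definition. Column names (AdmitInstant, DischargeInstant, LosDays) are illustrative, and SQLite's julianday stands in for SQL Server date arithmetic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EncounterFact (
    EncounterKey INTEGER PRIMARY KEY,
    AdmitInstant TEXT, DischargeInstant TEXT,
    LosDays REAL   -- the ETL's pre-computed value
);
INSERT INTO EncounterFact VALUES
    (1, '2025-03-01 08:00', '2025-03-04 08:00', 3.0),
    (2, '2025-03-02 06:00', '2025-03-03 18:00', 1.5);
""")

# Recompute LOS in the query layer from the raw timestamps so the two
# values can be compared row by row against the stored metric.
rows = conn.execute("""
    SELECT EncounterKey, LosDays,
           ROUND(julianday(DischargeInstant) - julianday(AdmitInstant), 2)
             AS los_recalc
    FROM EncounterFact
""").fetchall()
print(rows)  # [(1, 3.0, 3.0), (2, 1.5, 1.5)]
```

Rows where LosDays and los_recalc diverge are exactly the rows the validation step exists to catch.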

Key Caboodle Tables for Analysts

| Table Name | Type | Grain (One Row Per) | Key Columns / Metrics | Primary Use Case |
|---|---|---|---|---|
| EncounterFact | Fact | Patient encounter | LOS_DAYS, READMISSION_FLG, RISK_SCORE | Volume, LOS, readmission, population analytics |
| PatientDim | Dimension | Patient | AGE_AT_ENCOUNTER, PAYER_TYPE_C | Demographics, payer mix, population segmentation |
| DepartmentDim | Dimension | Department | DEPT_NAME, SPECIALTY, FACILITY | Departmental filtering and grouping |
| DiagnosisDim | Dimension | Diagnosis code | ICD10_CODE, DESCRIPTION, CHRONIC_FLG | Diagnosis analysis, case mix, chronic disease |
| ProviderDim | Dimension | Provider | PROV_NAME, NPI, SPECIALTY, DEPT_ID | Provider productivity, referral patterns, credentialing |
| DateDim | Dimension | Calendar date | FISCAL_YEAR, FISCAL_QUARTER, DAY_OF_WEEK_NAME | Time-based analysis, fiscal period reporting |
| MedicationAdministrationFact | Fact | Medication administration | ADMIN_ACTION, DOSE_GIVEN | Medication compliance, pharmacy analytics |
| OrderFact | Fact | Clinical order | ORDER_TO_RESULT_MINUTES (lab), ORDER_STATUS | CPOE utilization, lab turnaround, ordering patterns |
Real Scenario – Academic Medical Center, Caboodle Readmission Analysis

An academic medical center’s quality team needed a 30-day all-cause readmission analysis broken down by primary diagnosis category (using CCS groupings of ICD-10 codes), payer type, age group, and attending service – for the last 24 months, compared month-over-month. In Clarity, this query required joining PAT_ENC, PATIENT, PAT_ENC_DX, CLARITY_EDG, and CLARITY_SER across 24 months of data, with complex date arithmetic for month-over-month comparison and manual CCS code grouping logic. The initial Clarity query took 22 minutes to run and required 340 lines of SQL written by a senior data analyst. The same analysis in Caboodle joined EncounterFact to PatientDim (payer type), DiagnosisDim (CCS grouping pre-loaded in dim), ProviderDim (attending service), and DateDim (month-over-month via FISCAL_MONTH columns). The Caboodle query ran in 4 minutes, used 28 lines of SQL, and the READMISSION_FLG pre-computed metric eliminated the self-join that had been the most complex part of the Clarity version. The 22-minute Clarity run time was also a problem during the ETL window – the Caboodle query could run at any time without impacting the Clarity reporting environment.

Caboodle vs Clarity: When to Use Which for Analytics

The decision between Caboodle and Clarity is not about which one is better – both are valuable for different analytical tasks. The decision is about fit: which platform’s data model matches the question being asked?

| Dimension | Epic Clarity | Epic Caboodle |
|---|---|---|
| Data model | Normalized relational (mirrors Chronicles structure) | Dimensional star schema (optimized for analytics) |
| Query complexity | High for multi-dimensional analysis – many joins required | Lower for multi-dimensional analysis – pre-aggregated |
| Performance at scale | Can slow significantly for large multi-table analytical queries | Better for large analytical queries across many rows |
| Near-real-time data | Yes – near-real-time extracts available for select tables | No – nightly ETL only; one cycle behind Clarity |
| Operational reporting | Better fit – Workbench reports run against Clarity | Not designed for operational/near-real-time reporting |
| Enterprise analytics | Workable but complex for multi-dimensional analysis | Better fit – built for enterprise analytical queries |
| External data integration | Epic data only | Supports non-Epic data sources (financial, claims, HR) |
| BI platform integration | Possible – direct SQL Server connection | Preferred – dimensional model maps naturally to BI tools |
| Pre-computed metrics | None – all calculations at query time | LOS, readmission flags, risk scores pre-calculated in ETL |
| Implementation cost | Included in Epic license (Clarity access) | Additional license + infrastructure investment |

A practical decision framework: if the question can be answered by a Reporting Workbench report running against Clarity, use Clarity. If the question requires multi-year trend analysis across multiple dimensions, integration with non-Epic data, or BI platform visualization at scale, use Caboodle. If the question requires near-real-time data (same-day operational metrics), neither Caboodle nor Clarity nightly extract will work – you need near-real-time Clarity extracts or a different data source. Writing direct SQL queries against Clarity for operational analysis is covered in the Epic Clarity SQL for Analysts guide.

External Data Integration in Epic Caboodle

Caboodle’s most underused capability is its support for non-Epic data sources. Caboodle’s ETL framework can ingest data from external systems – financial accounting systems, claims data, HR systems, patient satisfaction platforms (Press Ganey, HCAHPS), external benchmarks (Vizient, Premier) – and link it to Epic clinical data through common keys like patient MRN, encounter ID, or provider NPI.

This integration capability is what makes Caboodle a genuine enterprise data warehouse rather than just an Epic-specific analytical database. A health system that integrates claims data into Caboodle can answer questions like: which patients seen in our ambulatory clinics subsequently had claims with another health system, and what were those services? A health system that integrates financial data can answer: what is the actual margin per DRG after accounting for labor costs from the HR system? Neither of these questions is answerable from Epic data alone.

External Data Integration Architecture

External data enters Caboodle through the Epic Foundation System’s ETL pipeline or through a custom ETL process that loads data directly into Caboodle’s custom schema. Epic provides a custom schema namespace within Caboodle specifically for non-Epic data – this keeps external data organized separately from the Epic-managed fact and dimension tables and prevents it from being overwritten by Epic ETL updates.

The linking mechanism between external data and Epic data requires a common identifier. Patient-level linkage uses PAT_ID or MRN. Encounter-level linkage uses CSN (Contact Serial Number) or visit date range. Provider-level linkage uses NPI. Build analysts designing external data integration must confirm which identifier the external system uses and verify that the same patient/provider appears with the same identifier in both systems before assuming a JOIN will produce correct matches.

Real Scenario – Regional Health System, Claims Integration Linkage Problem

A regional health system integrated Medicare claims data into Caboodle to identify patients who had visits at the health system but received subsequent care at competing facilities. The integration plan linked claims to Epic patients via Medicare Beneficiary Identifier (MBI). During data validation, the team discovered that Epic’s patient records stored the MBI in a custom field that had inconsistent formatting – some records had hyphens (1EG4-TE5-MK72), some had no hyphens (1EG4TE5MK72), and some had leading spaces from a legacy import. The claims data used the hyphenated format consistently. The JOIN linked only 67% of expected patient records because of the format inconsistency. Resolving the linkage required a data cleansing step on the Epic MBI field before the JOIN would work correctly. This is a routine data quality problem in any enterprise integration – but it is often not discovered until the first validation run, which in this case was three weeks before the scheduled production deployment.
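The cleansing step from this scenario reduces to canonicalizing the identifier on both sides before the join. A minimal sketch, using the MBI formats described above (the helper name is hypothetical, and a production version would also validate the MBI character pattern):

```python
import re

def normalize_mbi(raw: str) -> str:
    """Canonicalize an MBI for matching: strip whitespace and hyphens,
    uppercase. Illustrative cleansing only -- a real integration would
    also validate the MBI pattern before trusting the match."""
    return re.sub(r"[-\s]", "", raw).upper()

# The three inconsistent formats from the scenario all collapse
# to a single canonical join key.
variants = ["1EG4-TE5-MK72", "1EG4TE5MK72", "  1eg4te5mk72"]
keys = {normalize_mbi(v) for v in variants}
print(keys)  # {'1EG4TE5MK72'}
```

Applying the same normalization to both the Epic field and the claims field before joining is what restored the match rate in cases like this one.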

Pre-Built Derived Metrics and When to Validate Them

Caboodle’s pre-built derived metrics are one of its most valuable features and one of its most common sources of reporting errors. Epic calculates and stores metrics like LOS, 30-day readmission flags, and risk scores directly in fact tables during the ETL process. These pre-computed values are faster to query than real-time calculations. But they use Epic’s definition of each metric, which may not match your organization’s adopted definition.

Readmission Flag Validation

The READMISSION_FLG in EncounterFact uses Epic’s default readmission definition: an unplanned inpatient readmission within 30 days of an index inpatient discharge, for the same patient at the same facility. This definition differs from CMS’s Hospital Readmission Reduction Program (HRRP) definition, which applies risk adjustment based on diagnosis and excludes planned readmissions using a different exclusion list. It differs from some organizations’ internal definitions that count readmissions to any facility in the network, not just the same facility.

Before using any Caboodle derived metric for performance management, regulatory reporting, or payer contracting, validate the metric definition against your organization’s canonical definition. The validation process: run the pre-computed Caboodle metric for a prior period, then run your own calculation of the same metric from first principles, compare the results. Any discrepancy requires investigation. The BAT vs UAT methodology that applies to clinical system testing applies equally to data warehouse validation – described in the BAT vs UAT guide.
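The compare-against-first-principles step can be sketched as follows. The data and the simple 30-day rule are illustrative (no planned-readmission or transfer exclusions), and the stored flags are hypothetical sample values, one of them deliberately wrong:

```python
from datetime import date

# Hypothetical rows: (encounter_id, patient_id, admit, discharge, stored_flag)
encounters = [
    (1, "A", date(2025, 1, 1),  date(2025, 1, 5),  0),
    (2, "A", date(2025, 1, 20), date(2025, 1, 25), 1),  # 15 days after #1
    (3, "B", date(2025, 1, 2),  date(2025, 1, 4),  0),
    (4, "B", date(2025, 3, 20), date(2025, 3, 22), 1),  # 75 days: flag suspect
]

def readmission_flags(rows):
    """First-principles 30-day all-cause flag: admitted within 30 days of
    the same patient's previous discharge. Deliberately simple."""
    flags, last_discharge = {}, {}
    for enc_id, pat, admit, discharge, _ in sorted(rows, key=lambda r: r[2]):
        prev = last_discharge.get(pat)
        flags[enc_id] = int(prev is not None and (admit - prev).days <= 30)
        last_discharge[pat] = discharge
    return flags

computed = readmission_flags(encounters)
# Any row where the stored flag disagrees with the recomputation
# is a discrepancy that requires investigation.
discrepancies = [e[0] for e in encounters if computed[e[0]] != e[4]]
print(discrepancies)  # [4]
```

The output is the investigation worklist: each listed encounter either exposes a definition difference (e.g., same-facility vs network-wide) or a data problem.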

LOS Validation in Caboodle

LOS_DAYS in EncounterFact is calculated from admission and discharge timestamps during the Caboodle ETL. The arithmetic (timestamp difference converted to decimal days) is consistent; the open question is which timestamps are used. Epic’s LOS may use the A01 (admit) to A03 (discharge) ADT event timestamps, which is the correct clinical LOS. Some organizations’ billing LOS definitions use registration timestamp to discharge timestamp. These produce slightly different numbers that matter when your Caboodle LOS is being compared against payer data that uses a different definition.
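The timestamp-choice difference is easy to make concrete. With hypothetical timestamps for one encounter, the clinical (admit-to-discharge) and billing-style (registration-to-discharge) definitions diverge by exactly the registration-to-admit gap:

```python
from datetime import datetime

# One illustrative encounter: registration precedes the ADT admit event.
registration  = datetime(2025, 5, 1, 6, 30)   # front-desk registration
adt_admit     = datetime(2025, 5, 1, 9, 0)    # A01 admit event
adt_discharge = datetime(2025, 5, 4, 9, 0)    # A03 discharge event

# Clinical LOS (A01 -> A03) vs billing-style LOS (registration -> discharge),
# both expressed in decimal days.
clinical_los = (adt_discharge - adt_admit).total_seconds() / 86400
billing_los  = (adt_discharge - registration).total_seconds() / 86400
print(round(clinical_los, 3), round(billing_los, 3))  # 3.0 3.104
```

A 0.1-day gap per encounter is invisible in a single chart review but becomes a systematic bias when averaged across thousands of encounters and compared to payer data built on the other definition.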

Caboodle on Azure Synapse: Cloud Deployment Considerations

Epic increasingly supports Caboodle deployment on Microsoft Azure Synapse Analytics rather than on-premises SQL Server. Azure Synapse is a cloud-based distributed analytics platform that provides elastic compute scaling, integration with Azure Machine Learning, native Power BI connectivity, and consumption-based pricing. For large health systems with multi-facility Epic deployments and complex analytical workloads, Azure Synapse offers significant performance advantages for the largest Caboodle queries.

Cloud Caboodle vs On-Premises Caboodle

| Dimension | On-Premises Caboodle (SQL Server) | Cloud Caboodle (Azure Synapse) |
|---|---|---|
| Infrastructure | Organization-managed SQL Server hardware and licensing | Microsoft Azure-managed – no hardware to provision |
| Compute scaling | Fixed – limited by purchased server capacity | Elastic – scale up/down based on query demand |
| Large query performance | Can bottleneck at server capacity ceiling | Distributed query engine handles large parallel workloads |
| Cost model | Fixed capital cost (hardware + SQL Server license) | Consumption-based – cost increases with query volume |
| HIPAA compliance | Managed by organization’s data center security controls | Requires Microsoft Azure HIPAA BAA and configuration review |
| Power BI integration | SQL Server connection via Power BI Desktop or gateway | Native Azure connectivity – no gateway required |
| Best for | Organizations with existing data center infrastructure; predictable query loads | Organizations seeking elastic scaling; ML integration; Power BI-first BI strategy |

The most significant risk with cloud Caboodle is unoptimized query cost. Azure Synapse pricing is consumption-based – a poorly optimized query that performs a full distributed scan of a large fact table in 8 seconds might generate significant compute charges if run frequently. Organizations migrating from on-premises Caboodle where query cost was invisible (included in fixed infrastructure) must establish query governance, resource monitoring, and cost alerting before migrating analytical workloads to Azure Synapse.

Connecting Caboodle to External BI Platforms

Caboodle’s dimensional model makes it the natural backend for external Business Intelligence platforms – Power BI, Tableau, Qlik, and others. The star schema structure maps directly to the way BI tools think about data: drag-and-drop dimensions for filters and groupings, pre-computed measures for instant aggregation. Connecting a BI platform to Caboodle typically produces far better performance and usability than connecting the same BI platform directly to Clarity’s normalized tables.

Connection Architecture and HIPAA Requirements

Connecting a BI platform to Caboodle requires a read-only database connection – a SQL Server connection string with a service account that has SELECT-only permissions on the Caboodle schema. On-premises connections use the BI tool’s SQL Server connector. Azure Synapse connections use Azure-native connectors that do not require an on-premises data gateway.

HIPAA compliance requires that the BI platform is covered by a Business Associate Agreement (BAA) with the health system. Microsoft Power BI’s enterprise tier (Power BI Premium or Power BI for Government) supports BAA coverage. Tableau can be covered via BAA when deployed in a HIPAA-eligible configuration. Consumer-tier BI tools without enterprise compliance certification must not be connected to Caboodle if the data contains PHI. The privacy officer must confirm BAA status for any external platform before the Caboodle connection is established. Epic Reporting Workbench’s relationship to Caboodle and external BI platforms is covered in the Epic Reporting Workbench guide.

DirectQuery vs Import Mode in Power BI

When connecting Power BI to Caboodle, analysts must choose between DirectQuery mode (Power BI sends queries to Caboodle in real time when users interact with dashboards) and Import mode (Power BI imports a scheduled snapshot of Caboodle data into its own storage). DirectQuery provides up-to-date data but generates Caboodle compute load with every dashboard interaction. Import mode reduces Caboodle query load but introduces a data freshness lag equal to the import schedule. For Caboodle (which is already one ETL cycle behind Chronicles), a daily import schedule in Power BI adds another data freshness layer – Import mode with a daily refresh at 7:00 AM and a Caboodle ETL that completes at 6:00 AM produces a tolerable one-day maximum data lag for most analytical use cases.

Caboodle Implementation: What Teams Consistently Underestimate

Caboodle implementations are consistently underestimated in three areas: ETL validation, derived metric validation, and data governance. These are the same categories that cause Caboodle implementations to deliver less analytical value than expected.

ETL Validation Takes Longer Than Expected

Validating that the Caboodle ETL correctly populated fact and dimension tables from Clarity is not a quick exercise. EncounterFact record counts must be reconciled against Clarity PAT_ENC. Dimension table values must be checked against Clarity reference tables. External data joins must be validated against the source system’s own record counts. A thorough ETL validation for a medium-sized health system (300-500 beds, 3-5 years of historical data) typically takes 4-8 weeks when done correctly.
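The record-count reconciliation step can be sketched as a single query over both sides. Table names here are illustrative stand-ins for Clarity PAT_ENC and Caboodle EncounterFact, with one encounter deliberately missing from the fact table:

```python
import sqlite3

# In-memory stand-ins for the source (Clarity) and target (Caboodle) tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clarity_pat_enc (csn INTEGER, contact_date TEXT);
CREATE TABLE caboodle_encounter_fact (encounter_key INTEGER, date_key TEXT);
INSERT INTO clarity_pat_enc VALUES
    (1, '2025-06-01'), (2, '2025-06-01'), (3, '2025-06-02');
INSERT INTO caboodle_encounter_fact VALUES
    (1, '2025-06-01'), (2, '2025-06-01');   -- encounter 3 missing
""")

# Reconcile counts per service date; any nonzero delta needs investigation.
rows = conn.execute("""
    SELECT d,
           SUM(src) AS clarity_n,
           SUM(tgt) AS caboodle_n,
           SUM(src) - SUM(tgt) AS delta
    FROM (
        SELECT contact_date AS d, 1 AS src, 0 AS tgt FROM clarity_pat_enc
        UNION ALL
        SELECT date_key, 0, 1 FROM caboodle_encounter_fact
    )
    GROUP BY d
    HAVING SUM(src) != SUM(tgt)
""").fetchall()
print(rows)  # [('2025-06-02', 1, 0, 1)] -- one encounter missing from the fact
```

In practice the same reconciliation is repeated per table, per date range, and per facility, which is why the validation phase takes weeks rather than days.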

Governance Must Be Built Before the First Report Is Published

The same report proliferation and metric inconsistency problem that affects Reporting Workbench (47 versions of the census report) will occur with Caboodle-based analytics if governance is not in place before the first dashboard goes live. Establish canonical metric definitions, report certification processes, and BI platform access governance before publishing any Caboodle-based analytics to clinical or executive stakeholders. The investment in governance structure pays back immediately in analytical credibility.

Roles, Skills, and Career Path for Caboodle Work

Caboodle Data Analyst
Writes analytical SQL against the Caboodle dimensional model. Validates derived metrics against canonical definitions. Builds BI platform data sets. Requires Epic Cogito certification plus dimensional modeling knowledge and SQL proficiency.
Data Architect / EDW Lead
Designs Caboodle schema extensions, manages ETL configuration, designs external data integration pipelines, and oversees BI platform connection architecture. Senior role requiring Epic Cogito knowledge plus enterprise data warehouse and cloud platform expertise.
Clinical Informatics Analyst
Bridges clinical operations and Caboodle analytics. Owns canonical metric definitions, validates derived metric logic against clinical intent, and supports governance committee. Often holds both clinical and analytics credentials.
BI Developer
Builds Power BI, Tableau, or Qlik dashboards connected to Caboodle. Designs semantic layers, DirectQuery vs Import mode strategy, row-level security, and report publication governance. Works closely with data architect on connection architecture.
| Role | Certification | Key Skills | Salary Range (2026) |
|---|---|---|---|
| Caboodle Data Analyst | Epic Cogito | Dimensional SQL, fact/dimension model, metric validation | $90,000 – $130,000 |
| Data Architect / EDW Lead | Epic Cogito + Azure/cloud cert | ETL design, external integration, Azure Synapse, BI architecture | $120,000 – $165,000 |
| Clinical Informatics Analyst | Epic Cogito + clinical background | Metric governance, CMS quality, analytical credibility | $105,000 – $145,000 |
| BI Developer (Caboodle-connected) | Power BI / Tableau cert + Epic Cogito | Star schema BI modeling, DirectQuery, row-level security | $95,000 – $135,000 |
The Caboodle Investment That Returns Value Fastest

Validate every pre-computed Caboodle derived metric against your organization’s canonical definition before using it in any report or dashboard. Start with the three metrics that appear in the most leadership conversations: readmission rate, length of stay, and patient volume. Run Epic’s pre-computed value alongside your own calculation from first principles. Document the differences. Correct the ones that matter for your use case – either by adjusting the ETL logic or by recalculating in the query layer. Organizations that do this before the first Caboodle dashboard goes live avoid six months of stakeholder trust erosion from metric discrepancies that analysts cannot explain.

Downloads: Caboodle Data Warehouse Templates

📋
Caboodle Go-Live Readiness Checklist (PDF)
Pre-go-live gates for ETL validation, fact table record count reconciliation, derived metric validation, external data linkage verification, BI platform HIPAA BAA, Power BI DirectQuery vs Import mode decision, and governance structure.

Download Checklist (PDF)

📊
Caboodle Derived Metric Validation Tracker (Excel)
Track every Caboodle pre-computed metric: Epic’s built-in definition, your organization’s canonical definition, validation result (match / discrepancy), resolution action (use as-is, recalculate in query, modify ETL), and sign-off status.

Download Tracker (Excel)

🧪
Caboodle vs Clarity Decision Guide (PDF)
Decision framework for choosing the right Epic data platform per use case: operational reporting, near-real-time dashboards, multi-dimensional enterprise analytics, external data integration, BI platform connection, and CMS regulatory reporting.

Download Decision Guide (PDF)
