Epic Clarity SQL for Analysts: Writing Queries, Common Tables, and Data Validation Patterns
Epic Clarity SQL is one of the most valuable technical skills in healthcare IT – and one of the most misused. Analysts who know how to write efficient, safe queries against Clarity can answer operational and clinical questions in minutes that would otherwise require weeks of report development. Analysts who treat Clarity like a general-purpose SQL database produce incorrect results, degrade reporting system performance, and in worst cases expose PHI to unauthorized access. This article covers how Clarity is structured, which tables matter most, how to write queries that produce accurate results, and the safety practices that separate professional Clarity work from dangerous improvisation.
- How Clarity Is Structured: ETL, Tables, and Relationships
- The Most Important Clarity Tables for Analysts
- Writing Clarity SQL: Patterns That Work
- Date and Time Handling in Clarity
- Data Validation Patterns Against Clarity
- What Separates Useful SQL from Dangerous SQL
- HIPAA, Access Controls, and Clarity Query Safety
- Query Performance and Clarity Database Etiquette
- Common Clarity SQL Mistakes and How to Avoid Them
- Downloads
How Clarity Is Structured: ETL, Tables, and Relationships
Clarity is a Microsoft SQL Server relational database populated via ETL from Epic’s operational database (Chronicles). The ETL runs on a defined schedule – typically nightly for most tables, with near-real-time extracts available for select high-urgency tables like ADT data and order status. Clarity does not hold real-time data by default. A query run at 9:00 AM against a nightly-extract table reflects data through the prior evening.
Chronicles is a hierarchical, proprietary database organized around master files and records. Clarity maps Chronicles’ hierarchical structure to normalized relational tables. Each Chronicles master file becomes one or more Clarity tables. The Chronicles Patient (EPT) master file becomes the PATIENT table in Clarity. The Chronicles Encounter (PAT_ENC) record becomes the PAT_ENC table in Clarity. These mappings are documented in Epic’s Clarity Data Dictionary, which is accessible to organizations with Epic licenses. The broader context of Epic’s data architecture – including how Clarity relates to Caboodle and the Reporting Workbench – is covered in the Epic Cogito Reporting and Analytics guide.
How Clarity Table Names Map to Clinical Concepts
Clarity table names follow a predictable convention. Tables prefixed with PAT_ relate to patient or encounter data. Tables prefixed with ORD_ relate to orders. CLB_ tables relate to clinical billing and charge data. HSP_ tables relate to hospital account and financial data. DOCS_ tables relate to document and clinical note data. ZC_ tables are category (lookup/reference) tables that contain the valid values for categorical fields – ZC_PAT_CLASS contains the valid patient class codes, ZC_DISP_ENC_TYPE contains encounter disposition types.
ZC_ tables are essential for making categorical data human-readable. A query against PAT_ENC returns numeric codes for patient class, encounter type, and department. Joining to the corresponding ZC_ table replaces those codes with descriptive names. Queries that skip ZC_ joins return raw codes that clinical and operational stakeholders cannot interpret – and may interpret incorrectly if they guess at what a code means.
The Most Important Clarity Tables for Analysts
| Table Name | Clinical Domain | Key Columns | Common Use Case |
|---|---|---|---|
| PAT_ENC | Patient encounters (all types) | PAT_ENC_CSN_ID, PAT_ID, HOSP_ADMSN_TIME, HOSP_DISCHRG_TIME, DEPARTMENT_ID, ENC_TYPE_C | Patient volume, LOS, encounter type breakdown |
| CLARITY_ADT | Patient movement (ADT events) | PAT_ENC_CSN_ID, EVENT_TYPE_C, EFFECTIVE_TIME, FROM_BASE_CLASS_C, TO_BASE_CLASS_C | LOS calculation, transfer tracking, boarding time |
| PAT_ENC_DX | Encounter diagnoses (ICD-10) | PAT_ENC_CSN_ID, DX_ID, LINE, PRIMARY_DX_YN | Diagnosis reporting, case mix, ICD-10 analysis |
| ORDER_PROC | Procedure orders (lab, imaging) | ORDER_PROC_ID, PAT_ENC_CSN_ID, PROC_ID, ORD_STATUS_C, ORDER_TIME | Order utilization, CPOE analysis, turnaround time |
| ORDER_RESULTS | Lab and result values | ORDER_PROC_ID, COMPONENT_ID, ORD_VALUE, RESULT_TIME, RESULT_FLAG_C | Lab result analysis, critical value reporting |
| MAR_ADMIN_INFO | Medication administration (eMAR) | ORDER_MED_ID, TAKEN_TIME, MAR_ACTION_C, SIG_USER_ID, DOSE_UNIT_C | Medication compliance, administration timing analysis |
| PATIENT | Patient demographics | PAT_ID, PAT_MRN_ID, BIRTH_DATE, SEX_C, PAT_STATUS_C | Demographics, EMPI analysis, patient population |
| CLARITY_DEP | Department reference | DEPARTMENT_ID, DEPARTMENT_NAME, REV_LOC_ID, SPECIALTY_DEP_C | Department name lookup, location reference |
| CLB_PROV_CHARGE | Professional billing charges | TX_ID, PAT_ENC_CSN_ID, PROC_ID, AMOUNT, POST_DATE | Charge capture analysis, revenue cycle reporting |
Understanding PAT_ENC_CSN_ID: The Key That Connects Everything
PAT_ENC_CSN_ID is the Contact Serial Number – Epic’s unique identifier for a single patient encounter. It is the primary join key across most Clarity tables. When you need to connect encounter data to diagnoses, orders, results, medications, and charges, PAT_ENC_CSN_ID is the column that ties them together. Every analyst writing Clarity SQL must understand this identifier before writing any multi-table query.
PAT_ID is the patient-level identifier – it identifies the patient across all encounters. PAT_ENC_CSN_ID identifies a single visit. A patient with 20 ED visits has one PAT_ID and 20 PAT_ENC_CSN_IDs. Confusing these two identifiers produces queries that either aggregate incorrectly (using PAT_ID where you need CSN) or count visits as patients (using CSN where you need patient-level distinct counts).
Writing Clarity SQL: Patterns That Work
Pattern 1: Patient Volume by Department and Encounter Type
The most fundamental Clarity query counts encounters by department for a defined period. This pattern is the starting point for nearly all operational reporting:
d.DEPARTMENT_NAME,
zc.NAME AS ENC_TYPE,
COUNT(pe.PAT_ENC_CSN_ID) AS ENCOUNTER_COUNT,
COUNT(DISTINCT pe.PAT_ID) AS UNIQUE_PATIENTS
FROM PAT_ENC pe
INNER JOIN CLARITY_DEP d ON pe.DEPARTMENT_ID = d.DEPARTMENT_ID
LEFT JOIN ZC_ENC_TYPE zc ON pe.ENC_TYPE_C = zc.ENC_TYPE_C
WHERE pe.HOSP_ADMSN_TIME >= ‘2026-01-01’
AND pe.HOSP_ADMSN_TIME < ‘2026-02-01’
AND pe.ENC_TYPE_C IN (3, 101, 50) — Hospital Encounter, Office Visit, ED
GROUP BY d.DEPARTMENT_NAME, zc.NAME
ORDER BY ENCOUNTER_COUNT DESC;
Note the distinction between COUNT(PAT_ENC_CSN_ID) for encounters and COUNT(DISTINCT PAT_ID) for unique patients. Reporting “patient volume” requires clarity (no pun intended) about whether the stakeholder wants encounter count or unique patient count. These numbers are different and tell different operational stories. Always confirm which the stakeholder needs before delivering the query.
Pattern 2: Length of Stay Using CLARITY_ADT
Length of stay calculations must use CLARITY_ADT timestamp events rather than PAT_ENC admission and discharge times alone. CLARITY_ADT captures every patient movement event, allowing separation of active care time from boarding time:
pe.PAT_ENC_CSN_ID,
p.PAT_MRN_ID,
pe.HOSP_ADMSN_TIME,
pe.HOSP_DISCHRG_TIME,
DATEDIFF(HOUR, pe.HOSP_ADMSN_TIME, pe.HOSP_DISCHRG_TIME) AS LOS_HOURS,
CAST(DATEDIFF(HOUR, pe.HOSP_ADMSN_TIME, pe.HOSP_DISCHRG_TIME) AS DECIMAL(10,2)) / 24.0 AS LOS_DAYS,
d.DEPARTMENT_NAME AS ADMIT_DEPT
FROM PAT_ENC pe
INNER JOIN PATIENT p ON pe.PAT_ID = p.PAT_ID
INNER JOIN CLARITY_DEP d ON pe.DEPARTMENT_ID = d.DEPARTMENT_ID
WHERE pe.HOSP_ADMSN_TIME >= ‘2026-01-01’
AND pe.HOSP_ADMSN_TIME < ‘2026-02-01’
AND pe.ENC_TYPE_C = 3 — Hospital Encounter (inpatient)
AND pe.HOSP_DISCHRG_TIME IS NOT NULL; — Exclude still-admitted patients
The WHERE clause filtering on HOSP_DISCHRG_TIME IS NOT NULL is essential. Including patients still admitted produces an artificially inflated LOS because DATEDIFF against a NULL discharge time returns NULL – but if your aggregation drops NULLs, your average silently excludes current admits. If DATEDIFF coerces the NULL, you get wrong numbers. Always be explicit about whether you want currently admitted patients included or excluded, and handle it in the WHERE clause.
Pattern 3: Diagnosis Analysis with ICD-10 Codes
Pulling encounters by primary diagnosis requires joining PAT_ENC to PAT_ENC_DX and then to the EDG_CURRENT_ICD10 table for the actual ICD-10 code:
edg.CODE AS ICD10_CODE,
edg.CODE_DESC AS DIAGNOSIS_NAME,
COUNT(pe.PAT_ENC_CSN_ID) AS ENCOUNTER_COUNT,
COUNT(DISTINCT pe.PAT_ID) AS UNIQUE_PATIENTS
FROM PAT_ENC pe
INNER JOIN PAT_ENC_DX dx ON pe.PAT_ENC_CSN_ID = dx.PAT_ENC_CSN_ID
INNER JOIN CLARITY_EDG edg ON dx.DX_ID = edg.DX_ID
WHERE pe.HOSP_ADMSN_TIME >= ‘2026-01-01’
AND pe.HOSP_ADMSN_TIME < ‘2026-04-01’
AND dx.PRIMARY_DX_YN = ‘Y’ — Primary diagnosis only
AND edg.CODE LIKE ‘A41%’ — Sepsis ICD-10 range
GROUP BY edg.CODE, edg.CODE_DESC
ORDER BY ENCOUNTER_COUNT DESC;
The PRIMARY_DX_YN = ‘Y’ filter is critical. PAT_ENC_DX contains all diagnoses for an encounter – primary, secondary, and others. Without this filter, you count encounters multiple times (once per diagnosis) and inflate your counts. The LINE column in PAT_ENC_DX provides diagnosis sequence – LINE = 1 is typically the primary diagnosis, but PRIMARY_DX_YN is the authoritative flag. Use PRIMARY_DX_YN, not LINE alone. CPOE and order workflows that generate the diagnoses being reported here are covered in the Epic EHR Orders and CPOE guide.
Date and Time Handling in Clarity
Clarity stores timestamps in local time by default. This creates a subtle but important problem for health systems that span time zones: a patient admitted in Chicago and a patient admitted in Phoenix on the same calendar date may have timestamps that are 2 hours apart in Clarity. If your organization has multiple time zones, confirm with your Epic technical team how timestamps are stored and whether any conversion is needed before aggregating across facilities.
Date range filtering in Clarity SQL must always use half-open intervals. Use WHERE HOSP_ADMSN_TIME >= '2026-01-01' AND HOSP_ADMSN_TIME < '2026-02-01' rather than BETWEEN '2026-01-01' AND '2026-01-31'. The BETWEEN operator includes both endpoints. On a datetime column, BETWEEN '2026-01-01' AND '2026-01-31' excludes encounters with admission times after midnight on January 31 (e.g., 2026-01-31 06:00:00) unless you specify the end of day. The half-open interval is unambiguous and correct for any timestamp.
DATEDIFF in SQL Server calculates the number of interval boundaries crossed between two dates, not the actual elapsed time. DATEDIFF(DAY, ‘2026-01-31 23:59’, ‘2026-02-01 00:01’) returns 1 – even though only 2 minutes have elapsed. For clinical LOS calculations, use DATEDIFF(HOUR, …) and divide by 24, or use DATEDIFF(MINUTE, …) and divide by 1440. This gives you a decimal LOS in days that correctly represents partial days.
A health system quality team developed a monthly inpatient LOS report for the CMO’s performance dashboard. The report used DATEDIFF(DAY, HOSP_ADMSN_TIME, HOSP_DISCHRG_TIME) to calculate LOS. The CMO received the report and compared it to the health system’s prior vendor’s LOS calculation. The numbers were consistently 0.3-0.5 days lower than the prior system’s figures. Investigation revealed the prior system had calculated LOS as total hours divided by 24, counting partial days correctly. The Clarity query using DATEDIFF(DAY) was counting midnight crossings, which underrepresented LOS for patients admitted in the morning and discharged the following morning – a common inpatient pattern. The fix was to change the calculation to CAST(DATEDIFF(MINUTE, HOSP_ADMSN_TIME, HOSP_DISCHRG_TIME) AS DECIMAL(10,2)) / 1440.0. The CMO received a corrected report with a note explaining the methodology change. No patient safety issue arose, but 3 months of performance data needed recalculation.
Data Validation Patterns Against Clarity
Data validation against Clarity answers two distinct questions. The first is technical: does the Clarity data match what is in Chronicles (did the ETL run correctly)? The second is clinical: does the Clarity data accurately represent the clinical reality it is supposed to capture? Both questions require different validation approaches.
ETL Validation: Record Count Reconciliation
The simplest ETL validation confirms that Clarity PAT_ENC counts match Epic’s operational encounter counts for the same period:
— Compare this to Epic’s ADT daily census report for the same period
SELECT
CAST(pe.CONTACT_DATE AS DATE) AS CONTACT_DATE,
pe.ENC_TYPE_C,
COUNT(pe.PAT_ENC_CSN_ID) AS CLARITY_COUNT
FROM PAT_ENC pe
WHERE pe.CONTACT_DATE >= CAST(GETDATE() AS DATE) – 1
AND pe.CONTACT_DATE < CAST(GETDATE() AS DATE)
GROUP BY CAST(pe.CONTACT_DATE AS DATE), pe.ENC_TYPE_C
ORDER BY CONTACT_DATE, pe.ENC_TYPE_C;
Field-Level Validation: Sample-Based Spot Checks
Reconciling record counts confirms the ETL ran. It does not confirm field-level accuracy. A field-level validation pulls a sample of Clarity records and compares individual field values to what appears in the Epic clinical application for the same encounter. This is the validation step most teams skip – and it catches the subtle data transformation errors that count reconciliation misses.
A structured field-level validation should cover: patient name and MRN (PATIENT table vs Epic registration screen), admission and discharge timestamps (PAT_ENC vs clinical encounter), primary diagnosis code (PAT_ENC_DX vs diagnosis list in chart), attending provider ID (PAT_ENC vs provider in clinical encounter), and for financial validation, charge amounts (CLB_PROV_CHARGE vs charge review in Resolute). BAT methodology as applied to data validation is described in the BAT vs UAT guide.
Clinical Validation: Does the Data Match Reality?
Clinical validation goes beyond ETL accuracy. It asks whether the Clarity data correctly represents the clinical situation. A report that shows 95% medication administration compliance may be technically correct in Clarity but clinically misleading if nurses are documenting “given” for medications they did not actually administer. The data in Clarity reflects documentation – not always the clinical event itself. Understanding the clinical documentation workflow that generates the data being queried is a prerequisite for valid clinical reporting. The documentation workflows are covered in the EpicCare Inpatient ClinDoc guide.
What Separates Useful SQL from Dangerous SQL in Clarity
The difference between useful and dangerous Clarity SQL is not query complexity. It is the combination of access scope, query impact on system performance, and the clinical accuracy of the results returned. A sophisticated query run responsibly is useful. A simple query run carelessly against the wrong population is dangerous.
| Category | Useful SQL Practice | Dangerous SQL Practice | Why It Matters |
|---|---|---|---|
| Access scope | Read-only access only; specific columns in SELECT | SELECT * on wide tables; write access to Clarity | SELECT * pulls all columns including PHI not needed for the analysis |
| Date filtering | Always filter by indexed date column with narrow range | No date filter; scanning full table history | Full table scan on PAT_ENC (millions of rows) degrades Clarity for all users |
| Row limiting | TOP 100 or TOP 1000 during development and validation | Removing TOP limit before validating logic is correct | Unrestricted results expose more PHI than needed during query development |
| DML statements | SELECT only; no INSERT, UPDATE, DELETE, DROP | Any DML against Clarity – even accidental | DML can corrupt clinical data and invalidate ETL integrity |
| PHI handling | Minimum necessary columns; results stored in HIPAA-covered location | Exporting PHI to desktop, email, or personal storage | HIPAA breach risk; audit trail violation |
| Join logic | Explicit JOIN type (INNER/LEFT) with documented reason | Implicit joins (comma-separated FROM clause) | Implicit joins can produce cartesian products that inflate counts silently |
Never run DML (INSERT, UPDATE, DELETE, TRUNCATE, DROP) against Clarity. This cannot be overstated. Clarity is an ETL-managed database. Its data integrity depends on the ETL process owning all writes. A manual UPDATE to a Clarity table will be overwritten by the next ETL run – or may not be overwritten if the ETL only processes changed records, leaving your manual change corrupting the data indefinitely. Most organizations grant only read-only database access to Clarity. If you have write access, use it only in designated scratch or staging schemas, never against Epic’s Clarity schema objects.
A reporting analyst at a regional hospital ran an ad hoc Clarity query to support a payer audit. The query joined PAT_ENC, PATIENT, PAT_ENC_DX, ORDER_PROC, and ORDER_RESULTS with no date filter – effectively scanning 8 years of patient data across 5 tables simultaneously. The query ran for 47 minutes before the DBA terminated it. During those 47 minutes, the Reporting Workbench was unresponsive for all users because the query had consumed all available Clarity SQL Server CPU resources. The daily ETL job that was running concurrently was delayed by 3 hours, causing all next-morning Cogito reports to show prior-day data. The analyst had not been aware that Clarity is a shared resource – not a personal query environment. The organization responded by implementing SQL resource governor limits per user session and requiring all ad hoc queries over 30-day date ranges to be reviewed by the reporting team lead before execution.
HIPAA, Access Controls, and Clarity Query Safety
Every Clarity query that returns PHI is subject to HIPAA’s minimum necessary standard. The minimum necessary standard requires that covered entities limit the protected health information they access to the minimum amount necessary to accomplish the intended purpose. An analyst running a patient volume report by department does not need patient names, dates of birth, or SSNs – those fields should not be included in the SELECT list.
Clarity access should be granted at the column or view level, not at the table level. An analyst who needs encounter counts for quality reporting does not need access to the PATIENT table’s SSN or financial columns. Database views that expose only the columns needed for each use case are the correct access model. Many organizations implement this through Clarity user schemas – each analyst role gets access to specific views rather than the underlying Clarity tables.
Query results that contain PHI must be stored only in HIPAA-covered locations. This means network drives or cloud storage covered by your organization’s BAA (Business Associate Agreement) – not personal desktops, personal email, or consumer cloud storage. Analysts who download PHI query results to a personal laptop to work on them remotely have created a potential HIPAA breach regardless of intent. Establish and communicate clear data handling rules for Clarity query results before granting access.
Query Performance and Clarity Database Etiquette
Clarity is a shared production reporting database. Your query competes for resources with every other concurrent query and with the ETL process. Writing performant queries is not optional – it is part of responsible Clarity access.
Index Awareness and Sargable Predicates
Clarity tables are indexed on their primary keys and on common filtering columns like HOSP_ADMSN_TIME, CONTACT_DATE, and DEPARTMENT_ID. Always filter on indexed columns where possible, and write your WHERE clause predicates in a “sargable” form – one that allows the SQL optimizer to use the index. A sargable predicate is one where the column is not wrapped in a function.
Non-sargable (bad): WHERE YEAR(HOSP_ADMSN_TIME) = 2026 – wrapping the column in YEAR() prevents index use.
Sargable (good): WHERE HOSP_ADMSN_TIME >= '2026-01-01' AND HOSP_ADMSN_TIME < '2027-01-01' – the index can seek directly to the date range.
Query Development Practices
Always develop Clarity queries with a TOP clause during testing. Use TOP 100 or TOP 1000 while validating your logic. This limits the result set and the resources consumed. Only remove the TOP clause when the query logic is confirmed correct and you are running the final production version. Using SELECT TOP 100 … with a representative date range catches most logic errors without running a full population query.
Avoid running large analytical queries during peak Clarity usage hours – typically 7:00 AM to 10:00 AM when operational staff run their daily reports. Schedule resource-intensive queries during off-peak hours. If your organization has a SQL Server Agent or job scheduler, use it to run large queries overnight after the ETL completes.
Common Clarity SQL Mistakes and How to Avoid Them
| Mistake | What Goes Wrong | Correct Approach |
|---|---|---|
| Counting CSN instead of PAT_ID for patient counts | COUNT(CSN) = encounter count, not patient count. A patient with 5 visits counts as 5. | Use COUNT(DISTINCT PAT_ID) for unique patient counts. Confirm with stakeholder which they need. |
| Joining PAT_ENC_DX without filtering LINE or PRIMARY_DX_YN | Encounter counted once per diagnosis. A patient with 4 diagnoses inflates the count 4x. | Add WHERE dx.PRIMARY_DX_YN = ‘Y’ for primary diagnosis only. Or filter by LINE = 1 if primary flag is unreliable. |
| Using DATEDIFF(DAY) for LOS calculation | DATEDIFF counts midnight crossings. A patient admitted at 11 PM and discharged at 1 AM = 1 day, not 2 hours. | Use DATEDIFF(MINUTE,…) / 1440.0 or DATEDIFF(HOUR,…) / 24.0 for accurate decimal LOS. |
| No date filter on large tables | Full table scan on PAT_ENC or ORDER_PROC reads millions of rows. Degrades system for all users. | Always include a WHERE clause on an indexed date column. Start with a narrow range during development. |
| Using BETWEEN on datetime columns | BETWEEN includes both endpoints. ‘BETWEEN 2026-01-01 AND 2026-01-31’ misses encounters after midnight on Jan 31. | Use >= start AND < end (half-open interval). Always precise with datetime columns. |
| Skipping ZC_ table joins | Stakeholders receive numeric codes. They guess meanings or ask you to “translate” – wasting both your time. | Always join ZC_ tables for categorical columns. Include both code and name in SELECT. |
| Including currently admitted patients in average LOS | Open encounters inflate average LOS because their stay is still accumulating. | Filter WHERE HOSP_DISCHRG_TIME IS NOT NULL. Or explicitly include open encounters with a flag column. |
Always add a date filter and a TOP clause before running any new Clarity query for the first time. Always. Without a date filter, a join that looks reasonable on paper can trigger a full table scan across years of patient data in seconds. The TOP clause limits the blast radius while you confirm the query returns the expected shape of results. Validate the logic on a small sample. Remove TOP only when you are confident the query is correct and necessary. This practice protects Clarity performance, limits PHI exposure during development, and catches logic errors before they reach leadership dashboards.
Authoritative References
- HHS OCR – HIPAA Minimum Necessary Requirement: Guidance on PHI Access Limitations for Covered Entities
- Microsoft SQL Server – Query Processing Architecture Guide: Sargability, Indexing, and Execution Plan Optimization
