Epic Clarity SQL for Analysts

Epic Clarity SQL for Analysts: Writing Queries, Common Tables, and Data Validation Patterns

PAT_ENC
The central Clarity encounter table – every patient visit has a row here
CLARITY_ADT
Patient movement events – the source for LOS, boarding time, and ADT reporting
Read-Only
Clarity access must always be read-only – never run DML against a clinical reporting database
HIPAA
Every Clarity query that returns PHI must be covered by minimum necessary justification and audit logging

Epic Clarity SQL is one of the most valuable technical skills in healthcare IT – and one of the most misused. Analysts who know how to write efficient, safe queries against Clarity can answer operational and clinical questions in minutes that would otherwise require weeks of report development. Analysts who treat Clarity like a general-purpose SQL database produce incorrect results, degrade reporting system performance, and in worst cases expose PHI to unauthorized access. This article covers how Clarity is structured, which tables matter most, how to write queries that produce accurate results, and the safety practices that separate professional Clarity work from dangerous improvisation.

How Clarity Is Structured: ETL, Tables, and Relationships

Clarity is a Microsoft SQL Server relational database populated via ETL from Epic’s operational database (Chronicles). The ETL runs on a defined schedule – typically nightly for most tables, with near-real-time extracts available for select high-urgency tables like ADT data and order status. Clarity does not hold real-time data by default. A query run at 9:00 AM against a nightly-extract table reflects data through the prior evening.

Chronicles is a hierarchical, proprietary database organized around master files and records. Clarity maps Chronicles’ hierarchical structure to normalized relational tables. Each Chronicles master file becomes one or more Clarity tables. The Chronicles Patient (EPT) master file becomes the PATIENT table in Clarity. The Chronicles Encounter (PAT_ENC) record becomes the PAT_ENC table in Clarity. These mappings are documented in Epic’s Clarity Data Dictionary, which is accessible to organizations with Epic licenses. The broader context of Epic’s data architecture – including how Clarity relates to Caboodle and the Reporting Workbench – is covered in the Epic Cogito Reporting and Analytics guide.

How Clarity Table Names Map to Clinical Concepts

Clarity table names follow a predictable convention. Tables prefixed with PAT_ relate to patient or encounter data. Tables prefixed with ORD_ relate to orders. CLB_ tables relate to clinical billing and charge data. HSP_ tables relate to hospital account and financial data. DOCS_ tables relate to document and clinical note data. ZC_ tables are category (lookup/reference) tables that contain the valid values for categorical fields – ZC_PAT_CLASS contains the valid patient class codes, ZC_DISP_ENC_TYPE contains encounter disposition types.

ZC_ tables are essential for making categorical data human-readable. A query against PAT_ENC returns numeric codes for patient class, encounter type, and department. Joining to the corresponding ZC_ table replaces those codes with descriptive names. Queries that skip ZC_ joins return raw codes that clinical and operational stakeholders cannot interpret – and may interpret incorrectly if they guess at what a code means.

The Most Important Clarity Tables for Analysts

Table NameClinical DomainKey ColumnsCommon Use Case
PAT_ENCPatient encounters (all types)PAT_ENC_CSN_ID, PAT_ID, HOSP_ADMSN_TIME, HOSP_DISCHRG_TIME, DEPARTMENT_ID, ENC_TYPE_CPatient volume, LOS, encounter type breakdown
CLARITY_ADTPatient movement (ADT events)PAT_ENC_CSN_ID, EVENT_TYPE_C, EFFECTIVE_TIME, FROM_BASE_CLASS_C, TO_BASE_CLASS_CLOS calculation, transfer tracking, boarding time
PAT_ENC_DXEncounter diagnoses (ICD-10)PAT_ENC_CSN_ID, DX_ID, LINE, PRIMARY_DX_YNDiagnosis reporting, case mix, ICD-10 analysis
ORDER_PROCProcedure orders (lab, imaging)ORDER_PROC_ID, PAT_ENC_CSN_ID, PROC_ID, ORD_STATUS_C, ORDER_TIMEOrder utilization, CPOE analysis, turnaround time
ORDER_RESULTSLab and result valuesORDER_PROC_ID, COMPONENT_ID, ORD_VALUE, RESULT_TIME, RESULT_FLAG_CLab result analysis, critical value reporting
MAR_ADMIN_INFOMedication administration (eMAR)ORDER_MED_ID, TAKEN_TIME, MAR_ACTION_C, SIG_USER_ID, DOSE_UNIT_CMedication compliance, administration timing analysis
PATIENTPatient demographicsPAT_ID, PAT_MRN_ID, BIRTH_DATE, SEX_C, PAT_STATUS_CDemographics, EMPI analysis, patient population
CLARITY_DEPDepartment referenceDEPARTMENT_ID, DEPARTMENT_NAME, REV_LOC_ID, SPECIALTY_DEP_CDepartment name lookup, location reference
CLB_PROV_CHARGEProfessional billing chargesTX_ID, PAT_ENC_CSN_ID, PROC_ID, AMOUNT, POST_DATECharge capture analysis, revenue cycle reporting

Understanding PAT_ENC_CSN_ID: The Key That Connects Everything

PAT_ENC_CSN_ID is the Contact Serial Number – Epic’s unique identifier for a single patient encounter. It is the primary join key across most Clarity tables. When you need to connect encounter data to diagnoses, orders, results, medications, and charges, PAT_ENC_CSN_ID is the column that ties them together. Every analyst writing Clarity SQL must understand this identifier before writing any multi-table query.

PAT_ID is the patient-level identifier – it identifies the patient across all encounters. PAT_ENC_CSN_ID identifies a single visit. A patient with 20 ED visits has one PAT_ID and 20 PAT_ENC_CSN_IDs. Confusing these two identifiers produces queries that either aggregate incorrectly (using PAT_ID where you need CSN) or count visits as patients (using CSN where you need patient-level distinct counts).

Writing Clarity SQL: Patterns That Work

Pattern 1: Patient Volume by Department and Encounter Type

The most fundamental Clarity query counts encounters by department for a defined period. This pattern is the starting point for nearly all operational reporting:

SELECT
d.DEPARTMENT_NAME,
zc.NAME AS ENC_TYPE,
COUNT(pe.PAT_ENC_CSN_ID) AS ENCOUNTER_COUNT,
COUNT(DISTINCT pe.PAT_ID) AS UNIQUE_PATIENTS
FROM PAT_ENC pe
INNER JOIN CLARITY_DEP d ON pe.DEPARTMENT_ID = d.DEPARTMENT_ID
LEFT JOIN ZC_ENC_TYPE zc ON pe.ENC_TYPE_C = zc.ENC_TYPE_C
WHERE pe.HOSP_ADMSN_TIME >= ‘2026-01-01’
AND pe.HOSP_ADMSN_TIME < ‘2026-02-01’
AND pe.ENC_TYPE_C IN (3, 101, 50) — Hospital Encounter, Office Visit, ED
GROUP BY d.DEPARTMENT_NAME, zc.NAME
ORDER BY ENCOUNTER_COUNT DESC;

Note the distinction between COUNT(PAT_ENC_CSN_ID) for encounters and COUNT(DISTINCT PAT_ID) for unique patients. Reporting “patient volume” requires clarity (no pun intended) about whether the stakeholder wants encounter count or unique patient count. These numbers are different and tell different operational stories. Always confirm which the stakeholder needs before delivering the query.

Pattern 2: Length of Stay Using CLARITY_ADT

Length of stay calculations must use CLARITY_ADT timestamp events rather than PAT_ENC admission and discharge times alone. CLARITY_ADT captures every patient movement event, allowing separation of active care time from boarding time:

SELECT
pe.PAT_ENC_CSN_ID,
p.PAT_MRN_ID,
pe.HOSP_ADMSN_TIME,
pe.HOSP_DISCHRG_TIME,
DATEDIFF(HOUR, pe.HOSP_ADMSN_TIME, pe.HOSP_DISCHRG_TIME) AS LOS_HOURS,
CAST(DATEDIFF(HOUR, pe.HOSP_ADMSN_TIME, pe.HOSP_DISCHRG_TIME) AS DECIMAL(10,2)) / 24.0 AS LOS_DAYS,
d.DEPARTMENT_NAME AS ADMIT_DEPT
FROM PAT_ENC pe
INNER JOIN PATIENT p ON pe.PAT_ID = p.PAT_ID
INNER JOIN CLARITY_DEP d ON pe.DEPARTMENT_ID = d.DEPARTMENT_ID
WHERE pe.HOSP_ADMSN_TIME >= ‘2026-01-01’
AND pe.HOSP_ADMSN_TIME < ‘2026-02-01’
AND pe.ENC_TYPE_C = 3 — Hospital Encounter (inpatient)
AND pe.HOSP_DISCHRG_TIME IS NOT NULL; — Exclude still-admitted patients

The WHERE clause filtering on HOSP_DISCHRG_TIME IS NOT NULL is essential. Including patients still admitted produces an artificially inflated LOS because DATEDIFF against a NULL discharge time returns NULL – but if your aggregation drops NULLs, your average silently excludes current admits. If DATEDIFF coerces the NULL, you get wrong numbers. Always be explicit about whether you want currently admitted patients included or excluded, and handle it in the WHERE clause.

Pattern 3: Diagnosis Analysis with ICD-10 Codes

Pulling encounters by primary diagnosis requires joining PAT_ENC to PAT_ENC_DX and then to the EDG_CURRENT_ICD10 table for the actual ICD-10 code:

SELECT
edg.CODE AS ICD10_CODE,
edg.CODE_DESC AS DIAGNOSIS_NAME,
COUNT(pe.PAT_ENC_CSN_ID) AS ENCOUNTER_COUNT,
COUNT(DISTINCT pe.PAT_ID) AS UNIQUE_PATIENTS
FROM PAT_ENC pe
INNER JOIN PAT_ENC_DX dx ON pe.PAT_ENC_CSN_ID = dx.PAT_ENC_CSN_ID
INNER JOIN CLARITY_EDG edg ON dx.DX_ID = edg.DX_ID
WHERE pe.HOSP_ADMSN_TIME >= ‘2026-01-01’
AND pe.HOSP_ADMSN_TIME < ‘2026-04-01’
AND dx.PRIMARY_DX_YN = ‘Y’ — Primary diagnosis only
AND edg.CODE LIKE ‘A41%’ — Sepsis ICD-10 range
GROUP BY edg.CODE, edg.CODE_DESC
ORDER BY ENCOUNTER_COUNT DESC;

The PRIMARY_DX_YN = ‘Y’ filter is critical. PAT_ENC_DX contains all diagnoses for an encounter – primary, secondary, and others. Without this filter, you count encounters multiple times (once per diagnosis) and inflate your counts. The LINE column in PAT_ENC_DX provides diagnosis sequence – LINE = 1 is typically the primary diagnosis, but PRIMARY_DX_YN is the authoritative flag. Use PRIMARY_DX_YN, not LINE alone. CPOE and order workflows that generate the diagnoses being reported here are covered in the Epic EHR Orders and CPOE guide.

Date and Time Handling in Clarity

Clarity stores timestamps in local time by default. This creates a subtle but important problem for health systems that span time zones: a patient admitted in Chicago and a patient admitted in Phoenix on the same calendar date may have timestamps that are 2 hours apart in Clarity. If your organization has multiple time zones, confirm with your Epic technical team how timestamps are stored and whether any conversion is needed before aggregating across facilities.

Date range filtering in Clarity SQL must always use half-open intervals. Use WHERE HOSP_ADMSN_TIME >= '2026-01-01' AND HOSP_ADMSN_TIME < '2026-02-01' rather than BETWEEN '2026-01-01' AND '2026-01-31'. The BETWEEN operator includes both endpoints. On a datetime column, BETWEEN '2026-01-01' AND '2026-01-31' excludes encounters with admission times after midnight on January 31 (e.g., 2026-01-31 06:00:00) unless you specify the end of day. The half-open interval is unambiguous and correct for any timestamp.

DATEDIFF in SQL Server calculates the number of interval boundaries crossed between two dates, not the actual elapsed time. DATEDIFF(DAY, ‘2026-01-31 23:59’, ‘2026-02-01 00:01’) returns 1 – even though only 2 minutes have elapsed. For clinical LOS calculations, use DATEDIFF(HOUR, …) and divide by 24, or use DATEDIFF(MINUTE, …) and divide by 1440. This gives you a decimal LOS in days that correctly represents partial days.

Real Scenario – Health System Quality Team, LOS Calculation Error

A health system quality team developed a monthly inpatient LOS report for the CMO’s performance dashboard. The report used DATEDIFF(DAY, HOSP_ADMSN_TIME, HOSP_DISCHRG_TIME) to calculate LOS. The CMO received the report and compared it to the health system’s prior vendor’s LOS calculation. The numbers were consistently 0.3-0.5 days lower than the prior system’s figures. Investigation revealed the prior system had calculated LOS as total hours divided by 24, counting partial days correctly. The Clarity query using DATEDIFF(DAY) was counting midnight crossings, which underrepresented LOS for patients admitted in the morning and discharged the following morning – a common inpatient pattern. The fix was to change the calculation to CAST(DATEDIFF(MINUTE, HOSP_ADMSN_TIME, HOSP_DISCHRG_TIME) AS DECIMAL(10,2)) / 1440.0. The CMO received a corrected report with a note explaining the methodology change. No patient safety issue arose, but 3 months of performance data needed recalculation.

Data Validation Patterns Against Clarity

Data validation against Clarity answers two distinct questions. The first is technical: does the Clarity data match what is in Chronicles (did the ETL run correctly)? The second is clinical: does the Clarity data accurately represent the clinical reality it is supposed to capture? Both questions require different validation approaches.

ETL Validation: Record Count Reconciliation

The simplest ETL validation confirms that Clarity PAT_ENC counts match Epic’s operational encounter counts for the same period:

— Count of encounters registered in the last 24 hours
— Compare this to Epic’s ADT daily census report for the same period
SELECT
CAST(pe.CONTACT_DATE AS DATE) AS CONTACT_DATE,
pe.ENC_TYPE_C,
COUNT(pe.PAT_ENC_CSN_ID) AS CLARITY_COUNT
FROM PAT_ENC pe
WHERE pe.CONTACT_DATE >= CAST(GETDATE() AS DATE) – 1
AND pe.CONTACT_DATE < CAST(GETDATE() AS DATE)
GROUP BY CAST(pe.CONTACT_DATE AS DATE), pe.ENC_TYPE_C
ORDER BY CONTACT_DATE, pe.ENC_TYPE_C;

Field-Level Validation: Sample-Based Spot Checks

Reconciling record counts confirms the ETL ran. It does not confirm field-level accuracy. A field-level validation pulls a sample of Clarity records and compares individual field values to what appears in the Epic clinical application for the same encounter. This is the validation step most teams skip – and it catches the subtle data transformation errors that count reconciliation misses.

A structured field-level validation should cover: patient name and MRN (PATIENT table vs Epic registration screen), admission and discharge timestamps (PAT_ENC vs clinical encounter), primary diagnosis code (PAT_ENC_DX vs diagnosis list in chart), attending provider ID (PAT_ENC vs provider in clinical encounter), and for financial validation, charge amounts (CLB_PROV_CHARGE vs charge review in Resolute). BAT methodology as applied to data validation is described in the BAT vs UAT guide.

Clinical Validation: Does the Data Match Reality?

Clinical validation goes beyond ETL accuracy. It asks whether the Clarity data correctly represents the clinical situation. A report that shows 95% medication administration compliance may be technically correct in Clarity but clinically misleading if nurses are documenting “given” for medications they did not actually administer. The data in Clarity reflects documentation – not always the clinical event itself. Understanding the clinical documentation workflow that generates the data being queried is a prerequisite for valid clinical reporting. The documentation workflows are covered in the EpicCare Inpatient ClinDoc guide.

What Separates Useful SQL from Dangerous SQL in Clarity

The difference between useful and dangerous Clarity SQL is not query complexity. It is the combination of access scope, query impact on system performance, and the clinical accuracy of the results returned. A sophisticated query run responsibly is useful. A simple query run carelessly against the wrong population is dangerous.

CategoryUseful SQL PracticeDangerous SQL PracticeWhy It Matters
Access scopeRead-only access only; specific columns in SELECTSELECT * on wide tables; write access to ClaritySELECT * pulls all columns including PHI not needed for the analysis
Date filteringAlways filter by indexed date column with narrow rangeNo date filter; scanning full table historyFull table scan on PAT_ENC (millions of rows) degrades Clarity for all users
Row limitingTOP 100 or TOP 1000 during development and validationRemoving TOP limit before validating logic is correctUnrestricted results expose more PHI than needed during query development
DML statementsSELECT only; no INSERT, UPDATE, DELETE, DROPAny DML against Clarity – even accidentalDML can corrupt clinical data and invalidate ETL integrity
PHI handlingMinimum necessary columns; results stored in HIPAA-covered locationExporting PHI to desktop, email, or personal storageHIPAA breach risk; audit trail violation
Join logicExplicit JOIN type (INNER/LEFT) with documented reasonImplicit joins (comma-separated FROM clause)Implicit joins can produce cartesian products that inflate counts silently

Never run DML (INSERT, UPDATE, DELETE, TRUNCATE, DROP) against Clarity. This cannot be overstated. Clarity is an ETL-managed database. Its data integrity depends on the ETL process owning all writes. A manual UPDATE to a Clarity table will be overwritten by the next ETL run – or may not be overwritten if the ETL only processes changed records, leaving your manual change corrupting the data indefinitely. Most organizations grant only read-only database access to Clarity. If you have write access, use it only in designated scratch or staging schemas, never against Epic’s Clarity schema objects.

Real Scenario – Regional Hospital, Unfiltered Query Degrading Clarity Performance

A reporting analyst at a regional hospital ran an ad hoc Clarity query to support a payer audit. The query joined PAT_ENC, PATIENT, PAT_ENC_DX, ORDER_PROC, and ORDER_RESULTS with no date filter – effectively scanning 8 years of patient data across 5 tables simultaneously. The query ran for 47 minutes before the DBA terminated it. During those 47 minutes, the Reporting Workbench was unresponsive for all users because the query had consumed all available Clarity SQL Server CPU resources. The daily ETL job that was running concurrently was delayed by 3 hours, causing all next-morning Cogito reports to show prior-day data. The analyst had not been aware that Clarity is a shared resource – not a personal query environment. The organization responded by implementing SQL resource governor limits per user session and requiring all ad hoc queries over 30-day date ranges to be reviewed by the reporting team lead before execution.

HIPAA, Access Controls, and Clarity Query Safety

Every Clarity query that returns PHI is subject to HIPAA’s minimum necessary standard. The minimum necessary standard requires that covered entities limit the protected health information they access to the minimum amount necessary to accomplish the intended purpose. An analyst running a patient volume report by department does not need patient names, dates of birth, or SSNs – those fields should not be included in the SELECT list.

Clarity access should be granted at the column or view level, not at the table level. An analyst who needs encounter counts for quality reporting does not need access to the PATIENT table’s SSN or financial columns. Database views that expose only the columns needed for each use case are the correct access model. Many organizations implement this through Clarity user schemas – each analyst role gets access to specific views rather than the underlying Clarity tables.

Query results that contain PHI must be stored only in HIPAA-covered locations. This means network drives or cloud storage covered by your organization’s BAA (Business Associate Agreement) – not personal desktops, personal email, or consumer cloud storage. Analysts who download PHI query results to a personal laptop to work on them remotely have created a potential HIPAA breach regardless of intent. Establish and communicate clear data handling rules for Clarity query results before granting access.

Query Performance and Clarity Database Etiquette

Clarity is a shared production reporting database. Your query competes for resources with every other concurrent query and with the ETL process. Writing performant queries is not optional – it is part of responsible Clarity access.

Index Awareness and Sargable Predicates

Clarity tables are indexed on their primary keys and on common filtering columns like HOSP_ADMSN_TIME, CONTACT_DATE, and DEPARTMENT_ID. Always filter on indexed columns where possible, and write your WHERE clause predicates in a “sargable” form – one that allows the SQL optimizer to use the index. A sargable predicate is one where the column is not wrapped in a function.

Non-sargable (bad): WHERE YEAR(HOSP_ADMSN_TIME) = 2026 – wrapping the column in YEAR() prevents index use.
Sargable (good): WHERE HOSP_ADMSN_TIME >= '2026-01-01' AND HOSP_ADMSN_TIME < '2027-01-01' – the index can seek directly to the date range.

Query Development Practices

Always develop Clarity queries with a TOP clause during testing. Use TOP 100 or TOP 1000 while validating your logic. This limits the result set and the resources consumed. Only remove the TOP clause when the query logic is confirmed correct and you are running the final production version. Using SELECT TOP 100 … with a representative date range catches most logic errors without running a full population query.

Avoid running large analytical queries during peak Clarity usage hours – typically 7:00 AM to 10:00 AM when operational staff run their daily reports. Schedule resource-intensive queries during off-peak hours. If your organization has a SQL Server Agent or job scheduler, use it to run large queries overnight after the ETL completes.

Common Clarity SQL Mistakes and How to Avoid Them

MistakeWhat Goes WrongCorrect Approach
Counting CSN instead of PAT_ID for patient countsCOUNT(CSN) = encounter count, not patient count. A patient with 5 visits counts as 5.Use COUNT(DISTINCT PAT_ID) for unique patient counts. Confirm with stakeholder which they need.
Joining PAT_ENC_DX without filtering LINE or PRIMARY_DX_YNEncounter counted once per diagnosis. A patient with 4 diagnoses inflates the count 4x.Add WHERE dx.PRIMARY_DX_YN = ‘Y’ for primary diagnosis only. Or filter by LINE = 1 if primary flag is unreliable.
Using DATEDIFF(DAY) for LOS calculationDATEDIFF counts midnight crossings. A patient admitted at 11 PM and discharged at 1 AM = 1 day, not 2 hours.Use DATEDIFF(MINUTE,…) / 1440.0 or DATEDIFF(HOUR,…) / 24.0 for accurate decimal LOS.
No date filter on large tablesFull table scan on PAT_ENC or ORDER_PROC reads millions of rows. Degrades system for all users.Always include a WHERE clause on an indexed date column. Start with a narrow range during development.
Using BETWEEN on datetime columnsBETWEEN includes both endpoints. ‘BETWEEN 2026-01-01 AND 2026-01-31’ misses encounters after midnight on Jan 31.Use >= start AND < end (half-open interval). Always precise with datetime columns.
Skipping ZC_ table joinsStakeholders receive numeric codes. They guess meanings or ask you to “translate” – wasting both your time.Always join ZC_ tables for categorical columns. Include both code and name in SELECT.
Including currently admitted patients in average LOSOpen encounters inflate average LOS because their stay is still accumulating.Filter WHERE HOSP_DISCHRG_TIME IS NOT NULL. Or explicitly include open encounters with a flag column.
The Clarity SQL Practice That Prevents the Most Problems

Always add a date filter and a TOP clause before running any new Clarity query for the first time. Always. Without a date filter, a join that looks reasonable on paper can trigger a full table scan across years of patient data in seconds. The TOP clause limits the blast radius while you confirm the query returns the expected shape of results. Validate the logic on a small sample. Remove TOP only when you are confident the query is correct and necessary. This practice protects Clarity performance, limits PHI exposure during development, and catches logic errors before they reach leadership dashboards.

Authoritative References

Downloads: Clarity SQL Reference Tools

๐Ÿ“Š
Clarity SQL Query Library – Common Patterns (Excel)
Ready-to-use Clarity SQL query templates: patient volume by department, LOS calculation, ICD-10 diagnosis analysis, medication administration compliance, order turnaround time, charge capture by service, and readmission identification. Includes table references and ZC_ join notes.

Download Query Library (Excel)

๐Ÿ“‹
Clarity Table Reference Guide (PDF)
Quick-reference for the 25 most commonly queried Clarity tables: table name, clinical domain, key columns, primary join keys, ETL schedule type (nightly vs near-real-time), and common mistakes per table.

Download Table Reference (PDF)

๐Ÿงช
Clarity SQL Safety and Data Validation Checklist (PDF)
Pre-query safety checklist: read-only access confirmed, date filter added, TOP clause added, PHI minimized, output location HIPAA-covered. Plus data validation steps: ETL record count reconciliation, field-level spot check, and clinical accuracy confirmation protocol.

Download Safety Checklist (PDF)

Scroll to Top