Medicare Cost & Quality
What the same procedure costs across hospitals, how far charges sit above what Medicare actually pays, and whether spending more buys better outcomes — a two-fact star schema on public CMS data.
The brief
Healthcare prices are famously incoherent. The same routine procedure can carry wildly different "charges" at two hospitals across town, and almost nobody pays the charge anyway — Medicare pays its own administered rate. So a fair question for anyone who pays for care (an insurer, an employer, a policy team) is: does paying a hospital more actually buy better outcomes?
That question needs two datasets joined together, which is exactly the kind of work I did as a BI analyst in healthcare — so I built the warehouse to answer it from public CMS data.
The data
Two extracts from the Centers for Medicare & Medicaid Services, both public, no authentication:
- Medicare Inpatient Hospitals by Provider and Service — for every hospital and every MS-DRG (procedure group): total discharges, average covered charge (what the hospital bills), average total payment, average Medicare payment. ~150,000 hospital × DRG rows.
- Hospital Readmissions Reduction Program (HRRP) — for every hospital: the risk-adjusted 30-day excess-readmission ratio across six conditions.
The two files share one column under two different names: HRRP's Facility ID is the inpatient file's Rndrng_Prvdr_CCN, the CMS Certification Number. That single shared key is the whole project — it's what lets cost and quality be compared at the same hospital.
The pipeline
The fetch step resolves the inpatient file from the CMS DCAT catalog rather than a hardcoded URL (CMS rotates the download link each release year; the catalog lookup survives that). HRRP pages through the Provider Data Catalog API. Both land in Snowflake, get cast and cleaned in dbt staging, and conform into the marts.
The schema
This is a two-fact model, and that's the deliberate part. The cost data and the quality data live at genuinely different grains — one row per hospital × DRG for charges, one row per hospital × condition for readmissions. Forcing them into a single table would either explode into a meaningless DRG × condition cross-join or throw away the per-DRG charge detail.
So they stay as separate facts at their natural grains, sharing a conformed dim_provider. A dbt relationships test enforces the CCN join as a contract — if a future CMS release ships an HRRP hospital that isn't in the charges file, the build fails loudly instead of silently dropping it from every cost-vs-quality comparison. The full grain/conformance reasoning is in docs/star_schema.md.
Three questions
Q1 · Same procedure, different price
DRG 470 — major joint replacement without complications — is about as standardized as inpatient care gets. And yet across 1,212 hospitals the average charge runs from $19k to $384k — a 20× spread, median $80k — while the actual Medicare payment over the same hospitals varies only ~1.7×. The charge isn't tracking the cost of care; it's a negotiating anchor. The query behind this uses PERCENTILE_CONT window functions to get the distribution shape in a single pass — see sql_highlights/01.
Q2 · The markup gap
Zoom out from one DRG to whole service lines, and the markup — billed charge over Medicare payment — is remarkably uniform: nervous-system and digestive procedures top out at 6.4×, transplants sit at 5.6×, and everything else falls in between. The list price isn't tracking the complexity of the care; it's a roughly fixed multiple applied across the board. The number that matters here is discharge-weighted: a high-volume DRG counts more than a rare one. That weighting (sum of weighted measures over sum of discharges) is the difference between a real rollup and the naïve average-of-averages that quietly lies. sql_highlights/02 shows it.
Q3 · Does spending buy quality?
This is the join the whole schema was built for. Bucket 2,789 hospitals into five groups by what Medicare pays them per discharge (the cost fact), then plot each group's average 30-day excess-readmission ratio (the quality fact), joined through the shared provider dimension. There's a faint downward tilt — the best-paid quintile (avg $21.9k/discharge) lands just below the risk-adjusted expectation, the lowest-paid ($8.4k) just above — but the effect is barely there: the correlation is r = −0.081, meaning hospital payment explains well under 1% of the variance in readmissions.
That near-flatness is the finding. Paying a hospital 2.6× more buys, at most, a rounding error of better outcomes. It's the result payers and policy researchers keep arriving at, and being able to reproduce it from public data with a clean two-fact join — rather than asserting it — is the point of the project.
What this shows
- Two-fact conformed-dimension modeling — the part of dimensional modeling that's easy to describe and hard to do right.
- Honest aggregation — discharge-weighting non-additive averages, dropping suppressed cells, risk-adjusted ratios used correctly.
- SQL that earns its keep — percentile windows, NTILE bucketing, a cross-fact correlation, QUALIFY in the repo's highlights.
- dbt discipline — staging/marts layering, a cross-fact relationships test that encodes the CCN contract, accepted_range and accepted_values tests, a schema-name macro for clean MARTS/STG.
- Healthcare-domain fluency — DRGs, service lines, the charge/payment/Medicare-payment distinction, HRRP risk adjustment. The domain my résumé already lives in, now on the portfolio.
Stack
| Layer | Tool | |---|---| | Source | CMS data.cms.gov (public API, no auth) | | Warehouse | Snowflake Standard (trial) | | Ingest | Snowflake CLI + Python (DCAT-resolved fetch) | | Transform | dbt Core with dbt_utils | | Tests | dbt schema tests incl. cross-fact relationships | | Presentation | Hand-coded SVG charts on this page |
Where this goes next
- Multiple data years — adding 2019–2023 introduces a dim_date and turns both facts into periodic snapshots, at which point SCD2 on dim_provider earns its keep (hospitals open, close, and merge across years).
- Geographic cost-of-living adjustment — Medicare already wage-indexes payments; normalizing charges the same way would separate "expensive market" from "expensive hospital."
- A fact_review-style patient-experience fact — CMS publishes HCAHPS survey scores keyed on the same CCN, a third fact that drops straight into the conformed model.