Baskets of Apples: Comparing reimbursements across plan MRFs

Hi Readers!

This month, we’ll demonstrate another use case for price transparency machine readable file (MRF) data by geographically mapping and comparing negotiated rates. And because it worked so well last time, I’ll try to make ‘apples’ related commentary a few more times in the process.

So, what’s interesting about this type of comparison? Why is it worth blogging about? Well, you don’t often see geographic analyses like I’m suggesting floating around on the Internet. The only source of data you could use to do this - commercial claims data - not only costs millions at a national level, it also has restrictions. Specifically, the claim payer is anonymized, and analysts are strictly prohibited from deanonymizing the payer. If you ever did identify which payer was which, you certainly couldn’t blog about it to show a nice pretty chart of how much payer A pays out relative to payer B in some county in the USA without a lawsuit on your hands.

MRF files are relatively new, publicly accessible, federal disclosures of reimbursement information. They come with no such use case restrictions. While some payers will write papers effectively saying “you should only use this if you combine it with claims and volumes data we won’t allow you to have”, other payers are already doing these kinds of analyses to joust with each other in RFPs and in court.

You don't have to be a payer to pull this off; Serif Health has built tools that make getting clean data for analyses like this simple and cost effective. So, let's walk through a basket comparison of healthcare reimbursements using data from our puller, noting the complications and complexities along the way.

Step 1: Picking Targets

In order to do a basket comparison of plan payment amounts, we’re going to need to pick a few plans to compare, a codeset to compare over, and a provider set.

If you pick national plans, you’re going to involve an order of magnitude more data points. And if you pick plans from different payers, you’re going to have potentially vast differences in the provider network included, or in code ranges available and well-populated in the data set due to differences in how plans might bill or reimburse.

While those might be exciting or interesting (and we can cover some of those in future posts!), we need to get a clean methodology established first. In the spirit of MVP & iteration, let’s simplify as much as possible, and pick two state-specific plans, from the same payer, in Texas - the Blue Choice PPO vs. Blue Advantage HMO. Note the HMO also has a BAV file as well.

From common assumptions about how healthcare works, we will assume the PPO would pay more for the same basket of goods and services than the HMO. Let's see if we can validate that with our analysis, shall we?

The codeset ‘basket’ we select from each plan for the analysis is important, since it informs the conclusions we can draw. If we limit the codeset down too much, we’ll wind up with a narrow lens on plan expense that doesn’t extend to other code ranges. For example, if we only compare outpatient shoppable services, we’re missing a big part of the healthcare spend picture - inpatient DRGs. If we only go for inpatient, we’re missing the significantly more common visits and procedures that most healthcare participants will actually experience.

That might lead you to conclude that well, we should just compare all codes. Unfortunately, there’s tens of thousands of billing codes. Including several thousands of low-value and rarely billed codes might wash out meaningful and important takeaways with noise data. Plus, the more codes you include, the more likely you are to run into 1) incomparable or bad data 2) gaps in payer data that will skew the analysis based on your gap fill methodology 3) computational issues like out of memory conditions. Each of these issues makes accurate comparisons complex to do correctly across larger codesets; for each code you pick you’ll have a data cleaning and QA exercise on your hands to make sure you’re actually working with 'apple-y' data and not something else.

At the end of the day, what matters here is that you’re picking a basket of codes that matters for your use case, is published by the payers, and results in a workable data set. So for this blog post, we went with a sampling of 20 codes across the billing code ranges and classes. Generally, we wanted to use billing codes that fell in “common procedures” or “shoppable services” lists published by various parties to inform price transparency disclosures, with an additional restriction - we only want to use codes that have a published baseline payment amount in standard CMS schedules.

The code list we landed on:

DRG 470 - Major hip and knee joint replacement
DRG 743 - Uterine and adnexa procedures
G0121 - Colonoscopy
11102 - Biopsy
27441 - Arthroplasty, knee, tibial plateau
29880 - Arthroscopy, knee, surgical
45378 - Colonoscopy, flexible; diagnostic,
59409 - Vaginal delivery only (with or without episiotomy and/or forceps)
66984 - Extracapsular cataract removal
70450 - Computed tomography, head or brain
73721 - MRI Lower Extremity Joint
76700 - Diagnostic Ultrasound
77065 - Diagnostic mammography
90837 - Psychotherapy, 1 hour
93000 - ECG
97110 - Physical Therapy 15m
99203 - New patient office visit, 30m
99213 - Established patient office visit, 30m
99283 - Emergency Dept. Level 3
99395 - Periodic preventative visit

This list is small enough to run analysis and generate output on a laptop, and it covers a broad spectrum of health care utilization - inpatient and outpatient surgeries, visit codes, ER, PT, Behavioral Health, and radiology are all included. Due to lack of standard baseline schedules, vaccines, anesthesia, labs, drugs, and exotic episodes (like cancer care) were excluded.

Finally, provider selection. Another basket, different kind of apples. Similar issues apply here as on the codeset side; the more you include, the more you might wind up with gaps or invalid data points in the comparison. The fewer you pick, the more biased your conclusions will be.

We can take a shortcut on this one and say 'let’s allow all provider NPIs through our data pull, and in post processing, we'll only keep the ones that show up in both networks'. That way we’re not cherry picking anything here - we’re avoiding basket selection issues entirely.

Many payers pay the same amount to providers across their PPO / HMO product lines, meaning this analysis run for such a payer would result in a net delta of zero. BCBS of Texas, however, is one of the entities that tiers payment amounts by product. So the final comparison will effectively tell us if there's a delta between PPO and HMO payment for any provider who sits in both product networks, which is exactly what we want to validate.

This decision does mean we’re distorting the PPO network somewhat - by only including providers who are already in the HMO, we are likely leaving out more expensive providers that choose to only be in-network in the PPO. Thankfully, that analysis is simple and can be done as a follow-on blog post after this one!

Step 2: Getting data.

Picking a machine readable file set to work with is easy, thanks to Serif Health’s data inventory. Our operations team gathers MRF data that covers tens of thousands of insurance plan ID’s monthly from ~170 payers nationwide, and allows for a flexible set of filters and output options to get just the data you need in the format you want.

Unfortunately, for this particular kind of comparison, MRF data isn’t ‘batteries included’. The raw MRF files do not have an easily identifiable geolocated place of service, there’s no way to know if a data point would actually be billed in the real world, and the data comes with no baseline amount or percentages to allow easy comparison.

Thankfully we’ve built features into our ETL pipeline (a.k.a. ‘data puller’) that will fill these gaps and make our lives substantially easier.

As a concrete example, let's look at source MRF data and how it can be enriched. A typical rate gets mapped to a provider in an in-network MRF file by tax ID number and a list of NPIs. Simplified (a lot):

_{{ TIN: ‘11-1111111’, NPI: [1111111111, 122222222, 1333333333], code: 99211, rate: 30.0 }}

Our puller allows us to explode NPI Lists, assign a primary taxonomy to each NPI, and geolocate each NPI to a zip3 using NPPES records, turning our example data into three identified and geolocated rows of data much more suitable for geolocated comparison. Exploding is useful as it gives us a rough proxy to billed volumes based on listed provider group size; a provider group of size 10 will count 10x as much as a provider group with one NPI. Why a 3 digit zip? We want to cover a broader geographic spectrum around each provider. Most providers practice at multiple locations; without a lookup source for getting multiple locations, we instead work with the assumption they could practice anywhere in a surrounding zip3.

_{{ TIN: ‘11-1111111’, NPI: 1111111111, rate: 30.0, code: 99211, taxonomy: ‘Family Medicine’, zip3: ‘451’ }}

_{{ TIN: ‘11-1111111’, NPI: 122222222, rate: 30.0, code: 99211, taxonomy: ‘Registered Nurse’, zip3: ‘451’ }}

_{{ TIN: ‘11-1111111’, NPI: 1333333333, rate: 30.0, code: 99211, taxonomy: ‘Ophthalmologist’, zip3: ‘992’ }}

Our puller also allows us to attach 'baseline' amounts to every data point, using the three national Center for Medicare and Medicaid Services (CMS) payment schedules (IPPS, OPPS, and PFS). Based on the code type, billing arrangement, and place of service data (elements not shown in our trivial example) in each data row, we can reference the appropriate CMS schedule and append the national payment amount and % of CMS ratio to the data point. Carrying forward row one from our previous example:

_{{ TIN: ‘11-1111111’, NPI: 1111111111, rate: 30.0, code: 99211, taxonomy: ‘Family Medicine’, zip3: ‘451’, billing_class: ‘professional’, site: 11, baseline: 23.38, baseline_schedule: ‘PFS - Nonfacility’, percent_cms: 128.31 }}

Last, but not least, we can use a new feature (going live in September) in our pulls output that leverages publicly available volumes data to flag which rates are not likely billable in practice a.k.a. ‘Zombie rates’ - an ophthalmologist doing a knee surgery, a knee surgeon removing cataracts, a knee replacement in an office setting, etc. etc. Again, extending our data row:

So, we set up a Serif Health data pull against the two BCBS networks to get two extracts, limited to the codes we chose, Texas NPIs only, baselined to national payment amounts and geolocated to a zip3, with billable rates flagged in the data row.

Aside: Reimbursement pros might be thinking, ‘national payment amount? Why not use the MAC or facility specific schedule so you get localized percentages that take into account the regional labor costs, real estate, and overhead differences?’ The reason is, it’s more complex to do that lookup (good luck with the IPPS pricer) and it ultimately doesn’t matter that much here. Using MAC specific schedules is most important when you want to say “this plan pays x% of CMS or this hospital gets paid x% of CMS” and have that statement mean relatively the same thing when one entity is in Boston and the other entity is in Dallas. But for our basket comparison, we’re only going to be comparing a provider to themselves across a localized plan. So we care about the sign of the delta more than the specific magnitude, and what matters most to us is picking a consistent denominator more than a cost-adjusted one.

Step 3: Cleaning and Normalization

From the pulled data output we have, we need to walk over the output data rows to build a visualization. We can do that with a pretty simple script, with some additional cleaning and normalization logic.

Cleaning logic:

Only keep rates for NPIs seen in both output files, ignore others
Ignore flagged ‘zombie rates’ - unbillable provider / code combos and places of service / code combos
Ignore any billing arrangements that don’t map to the CMS schedules, as they can’t be baselined or easily compared - specifically, drop all bundled, capitation, per diem and percentage arrangements.
Ignore any modifier’d rates that aren’t radiology subcomponents and thus also can’t be baselined to CMS
Drop obvious errors, bugs, and placeholder values payers are publishing in the source files. The most egregious will show up as ridiculous outliers (we saw last month some percentage rates published as negotiated, for example) - a simple enough filter is to remove all rates below 10% of CMS and above 10x (1000%) of CMS baseline.

Doing this cleaning, normalization and cross-comparison, we wound up with 102,072 common NPIs across both plans out of 139,530 in the PPO and 103,074 in the HMO that were geolocatable to a zip and had a primary taxonomy. Printing all price rows available for all NPIs across our 20 codes, we wound up with nearly 7 million data points. Dropping zombies and outliers trimmed the valid data set down to around 5 million output rows, and generated a CSV nearly 300MB uncompressed. Hopefully you now understand why we only picked 20 codes!

Normalization logic:

For each Zip3, calculate a final number that represents the expense ratio for the plan overall in that zip - effectively one number representing the overall ‘relative to Medicare’ ratio. An accurate but very complex calculation would be sum all billed-in-the-wild volumes, calculate a weighting ratio for each billing code, and multiply each per-code sums by that ratio. But instead, since we already eliminated zombies, we’ll just add all the remaining negotiated rate values to a numerator variable and add the CMS baseline rates for each of those negotiated values to a denominator variable. After all rates in the zip3 have been summed together across all providers and codes, divide the numerator by the denominator to get a net ‘percentage of CMS’ value for the zip3.
Finally, we need to generate a delta between the calculated plan ratios. There’s a few ways we can do this, but for simplicity’s sake, we’ll just subtract the calculated ratio for plan A from that calculated for plan B in each zip3. This gives us a somewhat arcane number - the absolute delta of % CMS to % CMS in a zip3 is not often how you’ll see plans compared - but again, for our analysis we’re not trying to generate headlines or make strong claims about Plan A being X percent more expensive than plan B. We’re just trying to understand whether the HMO reimburses more, less, or about the same as the PPO, and this kind of delta will be enough to do that.

After the normalization step, we wind up with a data structure with just a few elements - a zip 3, its ratio for plan A, its ratio for plan B, and the delta between them.

Step 4: Display & Takeaways

For what we’re doing, geographic heat maps rule, and the only tool I could find to do this kind of display cheaply and easily is Excel. Our script's last step is to expand each zip3 to zip5s and dump out the data into CSV format, with our final output containing two columns - a five digit zip code, and its calculated “Relative % of CMS Ratio” between the two plans. Mapping this geographically in Excel results in the following heat map…drumroll please…

Pretty! What does this tell us? Most importantly, the data confirms our hypothesis - the BCBS HMO reimburses relatively less than the PPO across every zip in the state, for our common codes basket, across identical providers. We shouldn’t get too lost in the weeds of what -.1 vs -.4 means given our cherry-picked code set, arbitrary delta calculation, and various simplifying assumptions - but we can comfortably say ‘relatively less reimbursement for our selected codes’.

It also tells us that there’s some geographic variance of the reimbursement delta, which is probably worth a deeper dive if you’re in an outlier area. Meaning, if you’re practicing medicine in the dark blue regions of the panhandle or between Abilene and Dallas Fort Worth, you bill all of these codes, and you’re in both the BCBS PPO and HMO, you’d probably want to email me so you can dig in to the underlying data and better understand why the MRFs report that the BCBS HMO is paying you significantly less relative to the BCBS PPO, when that’s not happening elsewhere in the state. (From a spot check, it looks like DRGs are reimbursing far lower in those areas in the HMO).

Summary

As you can see, comparing a ‘basket of apples’ across two health care plans isn’t simple. But it is possible - it just takes work, patience and some data massaging.

The good news is that Serif Health has the batteries included capability to pull, filter, extend, and clean MRF data to bootstrap this and other analyses you might want to perform, across any plans nationwide. We’re happy to help you with a custom data pull for your use case - just get in touch!

‍

Signal New

APIs

Data Delivery

Reporting & Analytics

Providers

Plans

Employers

Innovators

Blog

Step 1: Picking Targets

Step 2: Getting data.

Step 3: Cleaning and Normalization

Step 4: Display & Takeaways

Summary