Hi folks. Matt here, chiming in with our May edition of machine readable file processing notes. We have two fun ones this month!
HealthSparq's Non-Transparent Transparency Portal
Just to disprove readers who think I'm writing clickbait, I'll lead with my smoking gun evidence - here's the "current" BCBS NE portal, which is showing a 6/1 'date created' next to every file (screenshot taken May 23).
If you're reading this on the date of posting, you can now start from a place of belief in what I'm going to explain about Aetna.
Aetna has posed a specific processing challenge since going live last July - navigating the variety of different transparency in coverage (TiC) machine readable file directories. Each Aetna directory hosts multiple indexes and a mix of national vs. local files. Aetna’s opaque, GUID-based file naming scheme also changes monthly, making it hard to automate pulls month-over-month or keep track of index -> plan -> file name associations.
There are at least 5 different Aetna indexes. National files tend to be 900+ MB compressed, ranging up to 4.3 GB compressed for the EPO and national open access managed choice file. State-specific HMO and group plans tend to be 80 to 250 MB, depending on density of the network and covered life counts.
The directories we have inventoried:
- Fully insured - Aetna’s national and state-specific group exchange plans. Possibly large group?
- Self-insured - ALICSI / ALICUNDER100 seem to be two different self-insured group hosted pages, one for large group and one for small group plans, and tend to include the plan sponsor TIN.
- AetnaCVS individual exchange plans - these tend to be smaller individual HIOS IDs that don’t have much scale.
- Aetna Signature Administrators - administrative services only arrangements / TPA structured plans. These files tend to be national in scale and around 185 GB unpacked.
- Texas Health posts its own unique plans and directories at a different branded URL.
Unpacking the URL structure here, you can see that all of these MRF directories have the same root URL: https://health1.aetna.com/app/public/#/one/insurerCode=AETNACVS_I
The next URL query parameter “brandCode=” is what changes to inform which branded portal and directory you get. TEXASFI = Texas Health Fully Insured, ALICSI = Aetna Self insured, ALICFI = Aetna Fully insured, and so on.
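As a rough sketch of that URL structure, the snippet below builds a portal URL per brand code and pulls the brand code back out of a URL. The root URL and the three brand codes come from this post; the `&` separator between the two parameters, and the function names, are assumptions for illustration and should be checked against a live portal link.

```python
import re
from typing import Optional

# Root shared by all of the Aetna MRF directories (from this post).
ROOT = "https://health1.aetna.com/app/public/#/one/insurerCode=AETNACVS_I"

# Brand codes named in this post; the list is not exhaustive.
BRAND_CODES = {
    "TEXASFI": "Texas Health Fully Insured",
    "ALICSI": "Aetna Self insured",
    "ALICFI": "Aetna Fully insured",
}


def portal_url(brand_code: str, separator: str = "&") -> str:
    """Build a portal URL for one brand code.

    ASSUMPTION: the separator before brandCode= is "&"; the post does not
    spell it out, so it is left as a parameter.
    """
    return f"{ROOT}{separator}brandCode={brand_code}"


def brand_code_of(url: str) -> Optional[str]:
    """Return the brandCode value found in a portal URL, or None."""
    match = re.search(r"brandCode=([A-Za-z0-9_]+)", url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for code, label in BRAND_CODES.items():
        print(f"{label}: {portal_url(code)}")
```

Keeping the brand codes in one table makes it easier to notice when a directory appears or disappears between monthly passes.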
We know this portal is maintained and white-labeled by HealthSparq because it uses the same headers and page chrome as other plans that have similar portals:
Again, we’ve been tracking the different indexes, pages, and networks for all of these plans going back many months. For Aetna, we will pull various state-specific networks for different clients, in addition to the national files for all major plans - PPO, Open Access, EPO, HMO, etc. This month, on 5/12, we noticed a few concerning things.
First, several of the regional Fully Insured indexes that provided state-specific HMO networks in popular states like New York and Texas were no longer present.
Second, when we pulled down the Aetna national fully insured index file labeled “2023-05-05_Aetna-Life-Insurance-Company_index.json.gz”, we were no longer able to locate the nationwide, broad-scale EPO/PPO/Choice networks that we've been able to consistently find for months.
After further investigation, we realized that there was an error in Aetna’s MRF directory and index files. The plans were indeed still listed in the directory and MRF files for those plans were downloadable by searching for Aetna TIN 06-6033492. When you punched in the TIN, you saw the familiar Aetna EPO / HMO / PPO machine readable file download links, and they worked.
But, when you downloaded the “Table of Contents” index file, and scanned through it - you didn't find any plans with TIN 06-6033492. You can verify for yourself with a cached copy we downloaded on May 17th and saved here: https://mrf.serifhealth.com/oneoff/2023-05-05_Aetna-Life-Insurance-Company_index.json.gz.
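If you want to run that check yourself, here is a minimal sketch. It assumes the index follows the CMS table-of-contents layout (`reporting_structure` entries that each carry a `reporting_plans` list with a `plan_id`), and it strips hyphens and spaces before comparing, since a TIN may be written with or without the hyphen. The function names are ours, not part of any library.

```python
import gzip
import json
import re


def normalize_id(value) -> str:
    """Drop hyphens and whitespace so '06-6033492' and '066033492' compare equal."""
    return re.sub(r"[\s-]", "", str(value))


def load_toc(path: str) -> dict:
    """Read a gzipped table-of-contents file fully into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def plans_matching_tin(toc: dict, tin: str) -> list:
    """Return every reporting plan whose plan_id matches the given TIN."""
    wanted = normalize_id(tin)
    hits = []
    for structure in toc.get("reporting_structure", []):
        for plan in structure.get("reporting_plans", []):
            if normalize_id(plan.get("plan_id", "")) == wanted:
                hits.append(plan)
    return hits


if __name__ == "__main__":
    toc = load_toc("2023-05-05_Aetna-Life-Insurance-Company_index.json.gz")
    print(len(plans_matching_tin(toc, "06-6033492")), "plans found for the TIN")
```

This loads the whole index at once, which is fine for a table of contents of modest size; a very large index would call for a streaming parser instead.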
We went back to verify the issue this morning and it has been fixed, with a new file, same name, that unpacks 7 KB larger and does have all the national networks and files included: https://mrf.healthsparq.com/aetnacvs-egress.nophi.kyruushsq.com/prd/mrf/AETNACVS_I/ALICFI/2023-05-05/tableOfContents/2023-05-05_Aetna-Life-Insurance-Company_index.json.gz
But the portal still shows the ‘date created’ field for everything as 5-5.
We’re also now seeing Aetna of Texas and Aetna of New York table of contents files that weren’t present in the portal at all when we did our ingestion pass earlier in the month. Combine this with the fact that we have multiple documented examples of postings being forward dated, including the screenshot above and our post last month showing Univera's portal on 4/28 where all files were dated 5/1, and it’s fair to conclude that HealthSparq's portal tool is allowing these postings to be generated, updated, and edited on the fly with arbitrary dates in the ‘created’ field.
The code for connecting the displayed plans and files in the portal seems to be divorced from what’s generating the underlying machine readable files, allowing a plan to be listed in the website directory without actually passing the plan ID and MRF URL into the actual index file as we saw with Aetna earlier this month.
The processing note for readers out there is to be careful when you’re automating ingest or dates based on published TOC / index files from HealthSparq portals. Assume the dates in the portal and the files themselves are completely mutable and subject to file swaps / belated postings / updates under the hood with no record of the change. You’ll need to cache everything and potentially scan multiple times per month to catch errors or under-the-hood file swaps, since the dates on files and TOCs are not trustworthy.
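One way to make "cache everything" concrete is to keep your own ledger keyed on content hashes rather than on any posted date. The sketch below is an illustration only: the ledger format, file name, and `record_snapshot` helper are hypothetical, and fetching the bytes is left to whatever downloader you already use.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_snapshot(url: str, payload: bytes, ledger_path: Path) -> bool:
    """Log a new ledger entry when the bytes behind a URL have changed.

    Returns True if the content differs from the last snapshot (or is the
    first one seen), False if it is byte-for-byte identical.
    """
    digest = hashlib.sha256(payload).hexdigest()
    ledger = json.loads(ledger_path.read_text()) if ledger_path.exists() else {}
    history = ledger.setdefault(url, [])
    if history and history[-1]["sha256"] == digest:
        return False
    history.append({
        "sha256": digest,
        "bytes": len(payload),
        # Our own observation time, independent of the portal's 'date created'.
        "seen_at": datetime.now(timezone.utc).isoformat(),
    })
    ledger_path.write_text(json.dumps(ledger, indent=2))
    return True
```

Running this on every scan pass gives you a record of when a file with an unchanged name and posted date was actually swapped.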
It seems…odd…for a product managing federal transparency compliance postings to be engineered to allow this level of non-transparent mutation.
BCBS of Minnesota’s IP Blocking of Index Fetches
One of the more annoying automation issues we see in the wild is blocking of file fetches from EC2 / cloud provider IP addresses. These issues won’t appear on your home computer, but try to do a data pull from a Lambda, EC2, or EKS instance in production, and you’ll get an error.
The irony should be lost on no one that the insurance companies who are required to post machine readable files have ‘protected’ these files with firewalls that prevent machines from actually reading them.
One such entity, BCBS of Minnesota, uses Imperva on their base URL path https://mktg.bluecrossmn.com/ - we know this to be true from the X-CDN header sent back in the response when requesting their linked index file from https://www.bluecrossmn.com/transparency-coverage-machine-readable-files:
We also know it from the fact that the otherwise empty body sent back when requesting the file from an EC2 machine includes the word ‘incapsula’ with no further data.
Cool beans, BCBS. Thanks for the helpful output!
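A pipeline can at least fail loudly on this instead of saving the stub as if it were an index. Here is a small sketch of the two signals described above. It assumes the X-CDN header value mentions Imperva or Incapsula by name, and that a real index body starts with a JSON object; both helper names are ours.

```python
def served_via_imperva(headers: dict) -> bool:
    """Check the X-CDN response header (case-insensitive name) for Imperva.

    ASSUMPTION: the header value contains 'imperva' or 'incapsula'.
    """
    for name, value in headers.items():
        if name.lower() == "x-cdn":
            lowered = str(value).lower()
            return "imperva" in lowered or "incapsula" in lowered
    return False


def is_incapsula_stub(body: bytes) -> bool:
    """True when the body mentions 'incapsula' and is not a JSON object."""
    text = body.decode("utf-8", errors="replace")
    if "incapsula" not in text.lower():
        return False
    # A real table-of-contents file is a JSON object, so it starts with "{".
    return not text.lstrip().startswith("{")


if __name__ == "__main__":
    headers = {"X-CDN": "Imperva"}  # example values, not a captured response
    body = b"<html><iframe src='/_Incapsula_Resource'></iframe></html>"
    print(served_via_imperva(headers), is_incapsula_stub(body))
```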
We also know that Highmark has a ‘partnership’ with BCBS Minnesota from their MRF page, here: https://mrfdata.hmhs.com/, plus the fact that the local in-network file URLs for Minnesota MRFs appear to be served by hmhs.com instead of bluecrossmn.com. That is, Highmark is actually serving the in-network files under the hood…so it stands to reason that they’re serving the index file too.
Sure enough, they are.
But it's not as simple as taking the link directly from the Highmark MRF page. That URL, which is currently set to https://mrfdata.hmhs.com/files/363/pa/inbound/local/2023-05-01_Blue_Cross_and_Blue_Shield_of_Minnesota_index.json, still has a 363/PA prefix applied to the file URL - and when you download it, you're actually looking at Pennsylvania's table of contents, not Minnesota's.
Instead, when you take the filename from Highmark and combine it with the appropriate base URL from the in-network files, you get the right output: https://mrfdata.hmhs.com/files/720/mn/inbound/local/2023-05-01_Blue_Cross_and_Blue_Shield_of_Minnesota_index.json
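That URL fix-up can be sketched in a few lines: keep the index filename, and take the scheme, host, and directory from any Minnesota in-network file URL. This assumes the in-network files sit in the same directory the index should live in; the in-network filename in the usage example is a made-up placeholder, and `rebase_index_url` is our own helper name.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def rebase_index_url(index_url: str, in_network_url: str) -> str:
    """Combine the index filename with the directory of an in-network file URL."""
    filename = posixpath.basename(urlsplit(index_url).path)
    base = urlsplit(in_network_url)
    directory = posixpath.dirname(base.path)
    new_path = posixpath.join(directory, filename)
    return urlunsplit((base.scheme, base.netloc, new_path, "", ""))


if __name__ == "__main__":
    listed = (
        "https://mrfdata.hmhs.com/files/363/pa/inbound/local/"
        "2023-05-01_Blue_Cross_and_Blue_Shield_of_Minnesota_index.json"
    )
    # Hypothetical in-network filename; only its directory matters here.
    in_network = (
        "https://mrfdata.hmhs.com/files/720/mn/inbound/local/"
        "example_in-network-rates.json.gz"
    )
    print(rebase_index_url(listed, in_network))
```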
Check out the headers - just a plain ‘ol Apache server.
And, when hit from EC2:
So, there you go - fix up your index file URLs and ingest away!
We hope this post helps our readers with MRF processing issues, and as always, get in touch if you need help working with the data.