Data acquisition and matching
The primary data source for the Community Benefit Insight tool is IRS Form 990. Additional data sources are used to confirm nonprofit hospital status and provide contextual information about each hospital or health system. This supplemental information also allows for hospital comparisons. Data sources include:
- IRS Form 990
- Schedule H
- CMS Cost Report
- CMS Providers of Service
- AHA hospital data
- Kaiser Family Foundation
- Hilltop Institute
Data acquisition and matching
The steps for acquiring and matching relevant hospital data are as follows:
1. Electronic IRS Form 990 data (including all submitted schedules) are extracted, by employer identification number or tax ID (EIN), from the Amazon Web Services (AWS) hosting site. Broad selection criteria are used to capture nonprofit hospital EINs from these sources:
   a. IRS Exempt Organization Annual Extract of Financial Data (where Schedule H submission is indicated)
   b. IRS Exempt Organization Business Master File Extract (where foundation or NTEE code is a hospital)
   c. List of nonprofit hospital EINs collected in research performed by Dr. Gregory Tung and colleagues at the University of Colorado Denver, Department of Health Systems, Management, and Policy
   d. List of nonprofit hospital EINs collected by Northeastern University in development of the Community Benefit Web Tool prototype, a precursor to CBI
2. Nonprofit hospitals, along with their CCN (CMS certification number), are identified from the following sources, subject to the criteria in (d):
   a. CMS Cost Report – where PRVDR_CTRL_TYPE_CD is voluntary nonprofit, church, or other
   b. CMS Providers of Service (POS) – where GNRL_CNTL_TYPE_CD is church, private (not for profit), or other
   c. American Hospital Association (AHA) Annual Survey – where CNTRL is not for profit church-operated or not for profit other
   d. Only hospitals meeting these criteria are retained:
      - Critical Access (CAH)
3. Name and address information is extracted from the Form 990 and Schedule H data in (1) and from the nonprofit CCN records in (2).
4. Addresses are standardized by running them through Google's Geocoding API.
5. Form 990 and Schedule H data (1) are matched to CCNs (2) by name and standardized address. The results form the EIN-to-CCN crosswalk, which contains exact matches based on:
   a. Standardized address match
   b. Hospital name plus city and state match (including city and state ensures that facilities with the same name in different locations do not produce a false match)
6. Nonprofit hospitals without an exact match, or with a questionable match, are output for further examination. This occurs in approximately 20% of cases in any given year. These cases fall into three categories:
   a. Exact matches where the CCN's EIN has changed from other years. This may be valid, but further examination is warranted to confirm that an erroneous match did not occur.
   b. Partial matches, where some components of the name and/or address are similar.
   c. No valid match. This category includes CCNs that submitted paper returns and are therefore not found in the electronic data extracted in (1).
7. After further analysis is performed on the cases noted above (6a-c) and correct EIN-to-CCN crosswalk information is obtained:
   a. Records are added to the list of exact matches in (5) and carried forward to (8) to build the Community Benefit Insight database.
   b. If the EIN is determined to be correct but no electronic form is available, the return is sent to GuideStar for extraction of IRS data from the paper form (approximately 4% of nonprofit hospitals per year).
8. The EIN-to-CCN crosswalk (5 and 7) is used to build the Community Benefit Insight database from these sources:
   a. Electronic Form 990 (Parts I and III) and Schedule H information
   b. CMS POS – Hospital county, bed count, medical school, church, and teaching affiliations
   c. Area Health Resource File (AHRF) – County-level information for the hospital's county, such as per capita income, median income, percent in poverty, percent under age 65 without health insurance, and unemployment rate
   d. AHA – Sole community provider and system affiliation
   e. Kaiser Family Foundation website – ACA Medicaid expansion and enrollment indicators
   f. Hilltop Institute – State community benefits reporting requirement indicator
Data release schedule
The time from the end of a nonprofit hospital's tax period to when its return is made publicly available by the IRS is currently about 16 months.
Because tax year 2015 periods can end in calendar year 2016, some returns will not be available until late 2017 or early 2018. Once the majority of electronically submitted tax year 2015 data are available, they will be processed and made available on CBI, approximately spring 2018.
Tax year 2016 data availability will be actively monitored. Once the data are determined to be approximately 50% complete, they will be processed and added to CBI (estimated fall 2018). The remainder of the tax year 2016 data will be processed and added to CBI when most of the electronic returns are available, approximately spring 2019.
Processing of tax year 2017 returns will follow a methodology similar to that described for the 2016 data.
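The timing above follows from simple date arithmetic, sketched below in Python. The 16-month lag is the approximate figure stated in the text, not a fixed IRS rule, and the function names and example period-end dates are illustrative assumptions.

```python
import calendar
from datetime import date

LAG_MONTHS = 16  # approximate lag from the text; actual release timing varies


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def estimated_release(tax_period_end):
    """Estimate when a return becomes publicly available, under the assumed lag."""
    return add_months(tax_period_end, LAG_MONTHS)


# Illustrative tax year 2015 period ends: calendar-year and fiscal-year filers.
calendar_year_filer = estimated_release(date(2015, 12, 31))
june_fiscal_filer = estimated_release(date(2016, 6, 30))
november_fiscal_filer = estimated_release(date(2016, 11, 30))
```

Under these assumptions, a tax year 2015 return for a period ending in June 2016 would appear around October 2017, and one ending in November 2016 around March 2018, which is consistent with the late 2017 to early 2018 window described above.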