Data acquisition and matching
The primary data source for the Community Benefit Insight tool is IRS Form 990. Additional data sources are used to confirm nonprofit hospital status and provide contextual information about each hospital or health system. This supplemental information also allows for hospital comparisons. Data sources include:
- IRS Form 990
- Schedule H
- CMS Cost Report
- CMS Providers of Service
- Kaiser Family Foundation
- Hilltop Institute
Data acquisition and matching
The steps for acquiring and matching relevant hospital data are as follows:
1. Electronic IRS Form 990 data (including all submitted schedules) is extracted, by employer identification number or tax ID (EIN), from the Amazon Web Services (AWS) hosting site. Broad selection criteria are used to capture nonprofit hospital EINs from these sources:
   a. IRS Exempt Organization Annual Extract of Financial Data (where Schedule H submission is indicated)
   b. IRS Exempt Organization Business Master File Extract (where the foundation or NTEE code indicates a hospital)
   c. List of nonprofit hospital EINs collected in research performed by Dr. Gregory Tung and colleagues at the University of Colorado Denver, Department of Health Systems, Management, and Policy
   d. List of nonprofit hospital EINs collected by Northeastern University in development of the Community Benefit Web Tool prototype, a precursor to CBI
2. Nonprofit hospitals, along with their CCN (CMS certification number), are identified from these sources and must meet the retention criteria below:
   a. CMS Cost Report – where PRVDR_CTRL_TYPE_CD is voluntary nonprofit, church, or other
   b. CMS Providers of Service (POS) – where GNRL_CNTL_TYPE_CD is church, private (not for profit), or other
   c. Only hospitals meeting these criteria are retained:
      - Critical Access (CAH)
3. Name and address information is extracted from the Form 990 and Schedule H data of (1) above and the nonprofit CCNs of (2) above.
4. Addresses are standardized by running them through Google's Geocoding API.
5. Form 990 and Schedule H data (1) are matched to CCNs (2) by name and standardized address. The results create the EIN-to-CCN crosswalk, which contains exact matches based on:
   a. Standardized address match
   b. Hospital name plus city and state match (including city and state ensures that facilities with the same name in different locations do not produce a false match)
6. Nonprofit hospitals without an exact match, or with a questionable match, are output for further examination. This occurs in approximately 20% of cases in any given year. These cases occur as:
   a. Exact matches where the CCN's EIN has changed from other years. This may be valid, but further examination is warranted to confirm that an erroneous match did not occur.
   b. Partial matches, where some components of the name and/or address are similar.
   c. No valid match. This category includes CCNs that submitted paper returns and are therefore not found in the electronic data extracted in (1) above.
7. After further analysis is performed on the cases noted above (6a-c) and correct EIN-to-CCN crosswalk information is obtained:
   a. Records are added to the list of exact matches in (5) and carried forward to (8) to build the Community Benefit Insight database.
   b. If the EIN is determined to be correct but no electronic form is available, the return is sent to GuideStar for extraction of IRS data from the paper form (approximately 4% of nonprofit hospitals per year).
8. The EIN-to-CCN crosswalk (5 and 7) is used to build the Community Benefit Insight database from these sources:
   a. Electronic Form 990 (Parts I and III) and Schedule H information
   b. CMS POS – Hospital county, bed count, and medical school, church, and teaching affiliations
   c. Area Health Resource File (AHRF) – County-level information for the hospital's county, such as per capita income, median income, percent in poverty, percent under 65 without health insurance, and unemployment rate
   d. Kaiser Family Foundation website – ACA Medicaid expansion and enrollment indicators
   e. Hilltop Institute – State community benefits reporting requirement indicator
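The exact-match rules described above (standardized address, or hospital name plus city and state), the review queue for unmatched hospitals, and the check for EINs that changed between years can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CBI implementation: the record layout (`ein`, `ccn`, `name`, `address`, `city`, `state`), the `normalize` helper, the function names, and the sample records are all hypothetical, and addresses are assumed to have been standardized already.

```python
def normalize(text):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def name_key(rec):
    """Hospital name plus city and state, so that same-named facilities
    in different places do not collide."""
    return (normalize(rec["name"]), normalize(rec["city"]), normalize(rec["state"]))


def build_crosswalk(irs_records, cms_records):
    """Return (crosswalk, needs_review).

    crosswalk maps CCN -> EIN for exact matches; needs_review lists the
    CCNs with no exact match, which would go to manual examination.
    """
    ein_by_address = {}
    ein_by_name = {}
    for rec in irs_records:
        ein_by_address.setdefault(normalize(rec["address"]), rec["ein"])
        ein_by_name.setdefault(name_key(rec), rec["ein"])

    crosswalk, needs_review = {}, []
    for rec in cms_records:
        # Rule 1: standardized address match.
        ein = ein_by_address.get(normalize(rec["address"]))
        # Rule 2: hospital name plus city and state match.
        if ein is None:
            ein = ein_by_name.get(name_key(rec))
        if ein is None:
            needs_review.append(rec["ccn"])
        else:
            crosswalk[rec["ccn"]] = ein
    return crosswalk, needs_review


def changed_eins(current, previous):
    """CCNs whose matched EIN differs from a prior year's crosswalk."""
    return sorted(ccn for ccn, ein in current.items()
                  if ccn in previous and previous[ccn] != ein)


# Made-up records for illustration only.
irs = [
    {"ein": "EIN-A", "name": "General Hospital",
     "address": "1 Main St, Springfield, IL", "city": "Springfield", "state": "IL"},
    {"ein": "EIN-B", "name": "General Hospital",
     "address": "9 Oak Ave, Springfield, MA", "city": "Springfield", "state": "MA"},
]
cms = [
    {"ccn": "CCN-1", "name": "GENERAL HOSPITAL",
     "address": "1 Main St,  Springfield, IL", "city": "Springfield", "state": "IL"},
    {"ccn": "CCN-2", "name": "General Hospital",
     "address": "PO Box 5, Springfield, MA", "city": "Springfield", "state": "MA"},
    {"ccn": "CCN-3", "name": "Valley Clinic",
     "address": "4 Elm Rd, Dover, DE", "city": "Dover", "state": "DE"},
]
crosswalk, needs_review = build_crosswalk(irs, cms)
```

In this sample, `CCN-1` matches on address, `CCN-2` matches only on name plus city and state, and `CCN-3` lands in `needs_review`. Partial (fuzzy) matching is deliberately left out, since the text assigns those cases to manual examination.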
Data release schedule
The majority of IRS Form 990 returns from tax-exempt organizations that operate hospitals are available 16 months after the end of the organization’s tax period. However, these organizations can select their own tax reporting period. For example, a 2016 tax reporting year may start late in calendar year 2016 and extend well into calendar year 2017. Therefore, the availability of returns can vary significantly across organizations.
We have found that, each fall, approximately 50% of returns for the second previous tax year are available. For example, in the fall of 2018, approximately 50% of tax year 2016 records were available. The majority of returns for that same tax year are available by the following spring. In other words, in spring 2019, the majority of tax year 2016 returns were available.
Following these patterns, CBI is updated in the fall and spring of each year with records for the second previous tax year. Any late-arriving records from prior tax years are also processed and added during these update periods.
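The timing arithmetic above can be made concrete with a short Python sketch. The function names are hypothetical, the 16-month lag is taken from the text as a typical figure rather than a guarantee, and the spring rule (three calendar years back) is an assumption inferred from the fall 2018 and spring 2019 examples, both of which refer to tax year 2016.

```python
from datetime import date


def add_months(d, months):
    """First day of the month that falls `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def expected_availability(period_end):
    """Month in which a return is typically available: 16 months after
    the end of the organization's tax period."""
    return add_months(period_end, 16)


def target_tax_year(update_year, season):
    """Tax year loaded in a given update, inferred from the examples:
    fall 2018 -> 2016, and the following spring (2019) -> 2016."""
    if season == "fall":
        return update_year - 2
    if season == "spring":
        return update_year - 3
    raise ValueError("season must be 'fall' or 'spring'")


# A tax period ending 31 December 2016 versus one ending 30 June 2017.
calendar_filer = expected_availability(date(2016, 12, 31))
fiscal_filer = expected_availability(date(2017, 6, 30))
```

Under these assumptions the calendar-year filer's return is expected around April 2018 and the fiscal-year filer's around October 2018, which illustrates why availability for a single tax year is spread across the fall and spring updates.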
|Tax Year||# Tax Reporting Entities|