The NIS dataset
Working with the National Inpatient Sample (NIS) database can be
challenging because of its size (>7M records per year and 170+ variables)
and the limited support for non-proprietary statistical software
packages. The goal of readHCUP is to let researchers focus on their
research rather than on loading data into R. With this in mind,
readHCUP’s read_nis() lets researchers read an NIS dataset with a
single function call.
Working with the NIS dataset
Once you have purchased the NIS dataset from HCUP, you can read it
(the .ASC file) into R with a single function, read_nis(). The following
code uses a synthetic NIS 2019 dataset (to avoid publishing real data)
to show how to read NIS datasets into R:
df <- read_nis("NIS_2019_test_data.ASC", 2019)
The path to the dataset and the year it was produced are all read_nis() needs to read the dataset into R and return it as a tibble.
NOTE: read_nis() currently reads data using the readr package. In the
future, there will be additional support for other methods, such as
read.table.
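read_nis() handles the column layout for you. If you are curious how a fixed-width text file like an .ASC file can be read with readr, the sketch below uses readr::read_fwf() with fwf_positions(). The column names, start/end positions, and two-line sample file are hypothetical stand-ins for illustration. The real NIS layout comes from HCUP's file specifications, and this is not necessarily how read_nis() works internally.

```r
library(readr)

# Hypothetical layout for illustration only: real NIS column positions
# come from HCUP's file specifications, not from this example.
layout <- fwf_positions(
  start     = c(1, 4, 6),
  end       = c(3, 5, 6),
  col_names = c("AGE", "AMONTH", "DIED")
)

# Write a tiny synthetic fixed-width file standing in for an .ASC file.
# Columns: AGE in positions 1-3, AMONTH in 4-5, DIED in 6.
asc_path <- tempfile(fileext = ".ASC")
writeLines(c(" 45 30", " 72121"), asc_path)

# Read it back; leading/trailing spaces in each field are trimmed by
# default (trim_ws = TRUE), and every column is parsed as an integer
sample_df <- read_fwf(
  asc_path,
  col_positions = layout,
  col_types = cols(.default = col_integer())
)

sample_df
```

read_fwf() also has an n_max argument, which is handy for previewing a few rows of a very large file.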
Corrected datasets
By default, read_nis() returns the corrected version of the data. For
example, HCUP released corrections for PCLASS_ORPROC in the NIS 2019
and 2020 datasets. Normally, you would need to download a CSV file with
the corrections and then update the values in the dataset. That is a
hassle with 7M+ records, so read_nis() applies the corrections
automatically.
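To make the manual step concrete, here is a minimal base-R sketch that applies a corrections table by matching records on KEY_NIS. The keys and PCLASS_ORPROC values are made up. The corrections table is assumed to have KEY_NIS and PCLASS_ORPROC columns, which is an assumption. Check the actual layout of the CSV HCUP publishes.

```r
# Hypothetical NIS-like records (values are made up)
nis <- data.frame(
  KEY_NIS       = c(101, 102, 103, 104),
  PCLASS_ORPROC = c(0L, 1L, 1L, 0L)
)

# Hypothetical corrections; in practice these would be read from the CSV
# HCUP publishes (e.g. with read.csv()), whose column names may differ
corrections <- data.frame(
  KEY_NIS       = c(102, 104),
  PCLASS_ORPROC = c(0L, 1L)
)

# Position of each record's key in the corrections table (NA = no correction)
idx <- match(nis$KEY_NIS, corrections$KEY_NIS)
has_fix <- !is.na(idx)

# Overwrite only the records that have a correction
nis$PCLASS_ORPROC[has_fix] <- corrections$PCLASS_ORPROC[idx[has_fix]]

nis$PCLASS_ORPROC  # 0 0 1 1
```

Using match() keeps the update a single vectorized lookup instead of a loop over millions of rows.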
- Note: For the corrections to be applied, KEY_NIS and PCLASS_ORPROC
must be included in your dataset. If they are not, read_nis() will
still return the data, but you will receive a warning that the
corrections were not applied.
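The column requirement in the note above can be pictured as a simple guard like the one below. This illustrates the behavior the note describes. It is not readHCUP's internal code, and the function name can_apply_corrections() and its warning text are hypothetical.

```r
# Hypothetical helper: returns TRUE if corrections can be applied; otherwise
# it warns and returns FALSE so the caller can return the data unchanged
can_apply_corrections <- function(data) {
  required <- c("KEY_NIS", "PCLASS_ORPROC")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    warning("Corrections were not applied; missing column(s): ",
            paste(missing_cols, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

can_apply_corrections(data.frame(KEY_NIS = 1, PCLASS_ORPROC = 0))  # TRUE
can_apply_corrections(data.frame(KEY_NIS = 1))  # FALSE, with a warning
```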
If you don’t want the corrections applied automatically, use
corrected = FALSE:
# Read the first 10 records of the dataset without corrections.
df <- read_nis("NIS_2019_test_data.ASC", 2019, n_max = 10, corrected = FALSE)
Supported datasets
The structure of the NIS dataset can change each year, which means
read_nis() must be updated to support each new NIS dataset.
You can find a list of readHCUP’s supported datasets by running the
following:
supported_datasets
#> $data
#> [1] "NIS 2016" "NIS 2017" "NIS 2018" "NIS 2019" "NIS 2020"
#>
#> $dataset_file_name
#> [1] "NIS_2016_CORE" "NIS_2017_CORE" "NIS_2018_CORE" "NIS_2019_CORE_V2"
#> [5] "NIS_2020_CORE_V2"
data is the name and year of the dataset, and dataset_file_name is the
file name provided by the HCUP Central Distributor.
If the dataset is not supported, you will receive an error message:
df_error <- read_nis("nis", 2040)
#> Error in read_nis("nis", 2040): This is not a currently supported dataset.
#> A list of supported datasets can be found using: View(supported_datasets)
If you’re working with a dataset that is not currently supported, please open an issue on GitHub, and we’ll work on adding it to the list of supported datasets.
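If you want to check for support before calling read_nis(), the entries of supported_datasets can be used directly. The sketch below copies the printed supported_datasets output into a local list so it runs without the package. It assumes the two vectors line up position by position, as the printed output suggests. The helper names is_supported() and file_name_for() are hypothetical.

```r
# Local copy of the supported_datasets output printed earlier; with
# readHCUP loaded you could use supported_datasets itself instead
supported <- list(
  data = c("NIS 2016", "NIS 2017", "NIS 2018", "NIS 2019", "NIS 2020"),
  dataset_file_name = c("NIS_2016_CORE", "NIS_2017_CORE", "NIS_2018_CORE",
                        "NIS_2019_CORE_V2", "NIS_2020_CORE_V2")
)

# Hypothetical helper built on the "NIS <year>" naming used in `data`
is_supported <- function(year) paste("NIS", year) %in% supported$data

# Hypothetical helper; assumes data[i] and dataset_file_name[i] describe
# the same dataset (returns NA for unsupported years)
file_name_for <- function(year) {
  supported$dataset_file_name[match(paste("NIS", year), supported$data)]
}

is_supported(2019)   # TRUE
is_supported(2040)   # FALSE
file_name_for(2019)  # "NIS_2019_CORE_V2"
```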
Descriptions
The NIS dataset has over 150 variables, which are covered in detail on
HCUP’s website. The descriptions() function returns the descriptions of
all the variables as a tibble:
d_list <- descriptions("nis", 2019)
head(d_list)
#> # A tibble: 6 × 2
#> variable labels
#> <chr> <chr>
#> 1 AGE Age in years at admission
#> 2 AGE_NEONATE Neonatal age (first 28 days after birth) indicator
#> 3 AMONTH Admission month
#> 4 AWEEKEND Admission day is a weekend
#> 5 DIED Died during hospitalization
#> 6 DISCWT NIS discharge weight
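Because the result has variable and labels columns, ordinary subsetting works for looking up descriptions. To keep the sketch runnable without the package, it rebuilds the six rows printed above as a plain data frame. With readHCUP loaded, you would use d_list from descriptions() directly.

```r
# The six rows printed above, rebuilt as a plain data frame
d_list <- data.frame(
  variable = c("AGE", "AGE_NEONATE", "AMONTH", "AWEEKEND", "DIED", "DISCWT"),
  labels = c(
    "Age in years at admission",
    "Neonatal age (first 28 days after birth) indicator",
    "Admission month",
    "Admission day is a weekend",
    "Died during hospitalization",
    "NIS discharge weight"
  ),
  stringsAsFactors = FALSE
)

# Look up the label for one variable
d_list$labels[d_list$variable == "DIED"]  # "Died during hospitalization"

# Find variables whose label mentions a keyword (case-insensitive)
hits <- d_list[grepl("admission", d_list$labels, ignore.case = TRUE), ]
hits$variable  # "AGE" "AMONTH" "AWEEKEND"
```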