NIS • readHCUP

library(readHCUP)

The NIS dataset

Working with the National Inpatient Sample (NIS) database can be challenging because of its size (>7M records per year and 170+ variables) and the limited support for non-proprietary statistical software packages. The goal of readHCUP is to let researchers focus on their research rather than on loading data into R. With this in mind, readHCUP’s read_nis() reads a dataset with a single function call.

Working with the NIS dataset

Once you have purchased the NIS dataset from HCUP, you can read the NIS dataset (.ASC file) into R using a single function, read_nis(). The following code uses a synthetic NIS 2019 dataset (to avoid publishing real data) to show how to read NIS datasets into R:

df <- read_nis("NIS_2019_test_data.ASC", 2019)

The path to the dataset and the year it was produced are all we need to read the dataset into R and save it as a tibble.

Note: read_nis() currently reads data using the readr package. In the future, there will be additional support for other methods such as read.table().
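For intuition about what reading an .ASC file involves, here is a minimal sketch that parses a tiny fixed-width text file. It assumes the .ASC file is fixed-width ASCII, and it uses base R's read.fwf() rather than readr so that it runs without extra packages. The column widths and values are invented for illustration; this is not the real NIS layout and not read_nis()'s actual implementation.

```r
# Hypothetical three-column fixed-width layout (NOT the real NIS layout).
widths    <- c(3, 2, 1)
col_names <- c("AGE", "AMONTH", "DIED")

# Write two synthetic records to a temporary file.
asc_path <- tempfile(fileext = ".ASC")
writeLines(c(" 45 10", " 71121"), asc_path)

# Split each line by character position and name the columns.
toy <- utils::read.fwf(asc_path, widths = widths, col.names = col_names)
print(toy)

# Remove the temporary file.
unlink(asc_path)
```

With the real dataset, read_nis() takes care of the layout for the year you pass in, so none of these positions need to be specified by hand.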

Corrected datasets

By default, read_nis() returns the corrected version of the data. For example, HCUP released corrections for PCLASS_ORPROC in the NIS 2019 and 2020 datasets. Usually, you’d need to download a CSV file with the corrections and then update the values in the dataset yourself. This process can be a hassle when there are 7M+ records, so read_nis() applies the corrections automatically.

Note: For the corrections to be applied, KEY_NIS and PCLASS_ORPROC need to be included in your dataset. If they are not included, read_nis() will still return the data, and you will receive a warning that corrections were not applied.
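To illustrate the manual process that read_nis() saves you from, the sketch below updates PCLASS_ORPROC by matching on KEY_NIS. Both data frames are synthetic, and the shape of the corrections table (one row per record, keyed by KEY_NIS) is an assumption for illustration; the real HCUP corrections file and readHCUP's internals may differ.

```r
# Synthetic records; the values are made up.
nis <- data.frame(
  KEY_NIS       = c(101, 102, 103),
  PCLASS_ORPROC = c(0, 1, 0)
)

# Hypothetical corrections table: one row per record that needs a new value.
corrections <- data.frame(
  KEY_NIS       = c(102, 103),
  PCLASS_ORPROC = c(0, 1)
)

# Corrections can only be matched if both columns are present.
required <- c("KEY_NIS", "PCLASS_ORPROC")
if (all(required %in% names(nis))) {
  # Position of each record's key in the corrections table (NA if absent).
  idx     <- match(nis$KEY_NIS, corrections$KEY_NIS)
  has_fix <- !is.na(idx)
  # Overwrite only the records that have a correction.
  nis$PCLASS_ORPROC[has_fix] <- corrections$PCLASS_ORPROC[idx[has_fix]]
} else {
  warning("Corrections were not applied: KEY_NIS and PCLASS_ORPROC are required.")
}

print(nis)
```

Record 101 has no entry in the corrections table, so its value is left unchanged.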

If you don’t want the corrections to be automatically applied, use corrected = FALSE:

# Read the first 10 records of the dataset without corrections.
df <- read_nis("NIS_2019_test_data.ASC", 2019, n_max = 10, corrected = FALSE)

Supported datasets

The structure of the NIS dataset can change each year, which means read_nis() needs to be updated to support each NIS dataset. You can find a list of readHCUP’s supported datasets by running the following:

supported_datasets
#> $data
#> [1] "NIS 2016" "NIS 2017" "NIS 2018" "NIS 2019" "NIS 2020"
#> 
#> $dataset_file_name
#> [1] "NIS_2016_CORE"    "NIS_2017_CORE"    "NIS_2018_CORE"    "NIS_2019_CORE_V2"
#> [5] "NIS_2020_CORE_V2"

data: the name of the dataset and its year.
dataset_file_name: the file name provided by the HCUP Central Distributor.

If the dataset is not supported, you will receive an error message:

df_error <- read_nis("nis", 2040)
#> Error in read_nis("nis", 2040): This is not a currently supported dataset. 
#>   A list of supported datasets can be found using: View(supported_datasets)

If you’re working with a dataset that is not currently supported, please open an issue on GitHub, and we’ll work on adding it to the list of supported datasets.
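If you would rather check for support before calling read_nis(), one option is to test the dataset name and year against the list. The sketch below copies the list printed above into a local object so that it runs on its own; with readHCUP attached you would use supported_datasets directly. is_supported() is a hypothetical helper, not part of readHCUP.

```r
# Local copy of the list printed above (an assumption for a standalone example).
supported <- list(
  data = c("NIS 2016", "NIS 2017", "NIS 2018", "NIS 2019", "NIS 2020"),
  dataset_file_name = c("NIS_2016_CORE", "NIS_2017_CORE", "NIS_2018_CORE",
                        "NIS_2019_CORE_V2", "NIS_2020_CORE_V2")
)

# Hypothetical helper: build a label such as "NIS 2019" and look it up.
is_supported <- function(dataset, year, supported) {
  paste(toupper(dataset), year) %in% supported$data
}

is_supported("nis", 2019, supported)
is_supported("nis", 2040, supported)
```

The first call returns TRUE and the second returns FALSE, which matches the error shown above for the year 2040.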

Descriptions

The NIS dataset has over 150 variables, which are covered in detail on HCUP’s website. The descriptions() function allows you to get a list of all of the variable descriptions:

d_list <- descriptions("nis", 2019)
head(d_list)
#> # A tibble: 6 × 2
#>   variable    labels                                            
#>   <chr>       <chr>                                             
#> 1 AGE         Age in years at admission                         
#> 2 AGE_NEONATE Neonatal age (first 28 days after birth) indicator
#> 3 AMONTH      Admission month                                   
#> 4 AWEEKEND    Admission day is a weekend                        
#> 5 DIED        Died during hospitalization                       
#> 6 DISCWT      NIS discharge weight
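
Because the descriptions come back as a two-column table, ordinary subsetting is enough to look up a single variable or to search the labels by keyword. The sketch below rebuilds the six rows printed above as a plain data frame so that it runs without the package; with readHCUP attached you would apply the same subsetting to the result of descriptions().

```r
# The six rows shown above, rebuilt by hand for a standalone example.
d_list <- data.frame(
  variable = c("AGE", "AGE_NEONATE", "AMONTH", "AWEEKEND", "DIED", "DISCWT"),
  labels   = c("Age in years at admission",
               "Neonatal age (first 28 days after birth) indicator",
               "Admission month",
               "Admission day is a weekend",
               "Died during hospitalization",
               "NIS discharge weight"),
  stringsAsFactors = FALSE
)

# Exact lookup: the label for one variable.
died_label <- d_list$labels[d_list$variable == "DIED"]
print(died_label)

# Keyword search: every variable whose label mentions "admission".
admission_vars <- d_list[grepl("admission", d_list$labels, ignore.case = TRUE), ]
print(admission_vars)
```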