---
title: "qalytools"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{qalytools}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteDepends{ggplot2}
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 80
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.align = "center",
    fig.width = 8,
    fig.height = 5
)
```

qalytools provides a simple and intuitive user interface for the 
analysis of [EQ-5D](https://euroqol.org/eq-5d-instruments) surveys. It builds
upon the [eq5d](https://cran.r-project.org/package=eq5d) package to facilitate
the calculation of QALY metrics and other related values across multiple
surveys.

## Overview

The package provides a range of functions:

- Coercion functions `as_eq5d3l()`, `as_eq5d5l()` and `as_eq5dy3l()` for
  converting existing \R objects to EQ5D data frame subclasses (EQ5D3L, EQ5D5L
  and EQ5DY3L).

- The calculation of utility values based on a range of different value sets.
  This functionality is provided via the `calculate_utility()`, `add_utility()`
  and `available_valuesets()` functions which are wrappers around the
  functionality of the underlying eq5d package.

- The calculation of different Quality of Life Years (QALY) metrics including
  unadjusted 'raw' values, and the disutility from both perfect health and,
  optionally, a specified baseline. See `calculate_qalys()`.

- The calculation of the Paretian Classification of Health Change (PCHC) in an
  individual's health state between two surveys via `calculate_pchc()` (again
  wrapping functionality of the eq5d package).

- Easy calculation of responses with a health limitation (i.e. a non-one
  response in one of the dimensions) via `calculate_limitation()`.

Whilst this vignette provides an introduction to the core functionality of the
package, an example analysis can be found in the
[EQ5D Analysis](https://timtaylor.github.io/qalytools/articles/example_analysis.html)
article on the package website.


## The EQ5D object class

We define an `EQ5D` object as a table with data represented in long format that
meets a few additional criteria:

- It contains columns that represent dimensions from the EQ5D survey
  specification as well as a column representing the Visual Analogue Score
  (columns 'mobility', 'self_care', 'usual', 'pain', 'anxiety' and 'vas' in the
  example table below)

- Dimension values must be whole numbers, bounded between 1 and 3 for EQ5D3L and
  EQ5D3Y surveys or bounded between 1 and 5 for EQ5D5L surveys.

- It contains a column that acts as a unique respondent identifier
  (`respondentID`) and another that identifying different surveys over time
  (`surveyID). Together these should uniquely identify a response (i.e. no
  combination of these should be duplicated within the given data frame).

Table: Example of an EQ5D5L object

| surveyID| respondentID| time_index| mobility| self_care| usual| pain| anxiety| vas|sex    | age|
|--------:|------------:|----------:|--------:|---------:|-----:|----:|-------:|---:|:------|---:|
|        .|            .|         . |        .|         .|     .|    .|       .|   .|.      |   .|
|        1|            1|         30|        3|         2|     2|    1|       1|  87|Female |  25|
|        1|            2|         30|        2|         2|     2|    2|       2|  65|Male   |  30|
|        1|            3|         30|        1|         3|     3|    1|       3|  76|Male   |  39|
|        1|            4|         30|        1|         2|     3|    2|       4|  55|Female |  60|
|        1|            5|         30|        2|         1|     4|    1|       1|  80|Male   |  28|
|        1|            6|         30|        1|         3|     2|    1|       1|  83|Male   |  59|
|        .|            .|         . |        .|         .|     .|    .|       .|   .|.      |   .|

In qalytools these `EQ5D` objects are implemented as a subclass of data frame
and we provide associated methods for working with this class.

## Usage

In the following examples we make use of a synthetic EQ-5D-5L data set that is
included in the package

```{r}
library(qalytools)
library(ggplot2)

# Example EQ5D5L data
data("EQ5D5L_surveys")
str(EQ5D5L_surveys)
```

### Coercion

Before utilising methods from qalytools we must first coerce our input data to
the `<EQ5D5L>` object class via `as_eq5d5l()`

```{r}
(dat <- as_eq5d5l(
    EQ5D5L_surveys,
    mobility = "mobility",
    self_care = "self_care",
    usual = "usual",
    pain = "pain",
    anxiety = "anxiety",
    respondentID = "respondentID",
    surveyID = "surveyID",
    vas = "vas"
))
```

### Descriptive methods

To obtain a quick overview of the data we can call `summary()`. By default this
returns the output as a list of data frames, showing frequency counts and
proportions, split by `surveyID`

```{r}
head(summary(dat), n = 2)
```

Alternatively we can set the parameter `tidy` to TRUE and obtain the summary
data in a "tidy" (long) format.

```{r}
summary(dat, tidy = TRUE)
```

It can be useful to look at the percentage of responses, across dimensions,
that are not equal to one. This can be done with the `calculate_limitation()`
function

```{r}
(limitation <- calculate_limitation(dat))

ggplot(limitation, aes(x = surveyID, y = without_limitation, group = dimension)) +
    geom_line(aes(colour = dimension)) +
    theme_light() +
    scale_y_continuous(n.breaks = 10, expand = c(0.005, 0.005), limits = c(0, 1)) +
    scale_x_discrete() +
    scale_fill_discrete(name = "dimension") +
    ylab("Without limitation")
```

### Calculating utility values

To calculate the utility values we call `calculate_utility()` on our EQ5D object
with additional arguments specifying the countries and type we are interested
in. `calculate_utility()` will return the desired values split by country, type,
respondentID and surveyID. If you prefer to augment the utility values on top
of the input object we also provide the `add_utility()` function.

We can obtain a list of compatible value sets (across countries and type) by
passing our EQ5D object directly to the `available_valuesets()` function or by
passing a comparable string.

For EQ5D3L inputs the type can be:

-   TTO, the time trade-off valuation technique;

-   VAS, the visual analogue scale valuation technique;

-   RCW, a reverse crosswalk conversion to EQ5D5L values; or

-   DSU, the NICE Decision Support Unit's model that allows mappings on to
EQ5D5L values accounting for both age and sex.

For EQ5D5L inputs this can be:

-   VT, value sets generated via a EuroQol standardised valuation study
protocol;

-   CW, a crosswalk conversion EQ5D3L values; or

-   DSU, the NICE Decision Support Unit's model that allows mappings on to
EQ5D5L values accounting for both age and sex.

Note that `available_valuesets()`is a convenience wrapper around the
`eq5d::valuesets()` function. It will return a A [`tibble`][tibble::tibble] with
columns representing the EQ5D version, the value set country and the valueset
type.

```{r}
vs <- available_valuesets(dat)
head(vs, 10)

# comparable character values will also give the same result
identical(vs, available_valuesets("eq5d5l"))

# cross walk comparison
vs <- subset(vs, Country %in% c("England", "UK") & Type %in% c("VT", "CW"))
util <- calculate_utility(dat, type = vs$Type, country = vs$Country)

# plot the results
util$fill <- paste(util$.utility_country, util$.utility_type)
ggplot(util, aes(x = surveyID, y = .value, fill = fill)) +
    geom_boxplot(lwd = 1, outlier.shape = 4) +
    stat_summary(
        mapping = aes(group = .utility_country),
        fun = mean,
        geom = "point",
        position = position_dodge(width = 0.75),
        shape = 21,
        color = "black",
        fill = "white"
    ) +
    theme_light() +
    geom_hline(yintercept = 0, linetype = "longdash", size = 0.6, color = "grey30") +
    scale_y_continuous(n.breaks = 10, expand = c(0.005, 0.005)) +
    scale_x_discrete(name = "value set")
```

For further details about the available value sets, consult the documentation of
the wrapped eq5d package via  `vignette(topic = "eq5d", package = "eq5d")`.

### QALY calculations

Quality of life years can be calculated directly from utility values. By
default, two different metrics are provided. Firstly, a "raw" value which is
simply the scaled area under the utility curve and, secondly, a value which
represents the loss from full health.

```{r}
qalys <-
    dat |>
    add_utility(type = "VT", country = c("Denmark", "France")) |>
    calculate_qalys(time_index = "time_index")

subset(qalys, .qaly == "raw")
subset(qalys, .qaly == "loss_vs_fullhealth")
```

`calculate_qalys()` also allows us to calculate the loss from a specified
baseline in one of two ways. Firstly, a character string `baseline_survey`
argument can be placed which matches a survey present in the utility data. If
the argument is passed as such then the utility values from the specified survey
are used to calculate the loss. Note that the survey is still included in the
raw, unadjusted calculation, prior to the calculation of loss.

```{r}
# Reload the example data and combine with some baseline measurements
data("EQ5D5L_surveys")
dat <- as_eq5d5l(
    EQ5D5L_surveys,
    mobility = "mobility",
    self_care = "self_care",
    usual = "usual",
    pain = "pain",
    anxiety = "anxiety",
    respondentID = "respondentID",
    surveyID = "surveyID",
    vas = "vas"
)

add_utility(dat, type = "VT", country = c("Denmark", "France")) |>
    calculate_qalys(baseline_survey = "survey01", time_index = "time_index") |>
    subset(.qaly == "loss_vs_baseline")
```

Alternatively the `baseline_survey` argument can be specified as a data frame
with a column corresponding to the respondentID and another representing the
associated utility. Optionally columns corresponding to the utility country and
utility type can be included to allow more granular comparisons. Note that for
this specification of baseline, it is **not** included in the unadjusted, raw,
calculation.

```{r}
split_dat <- split(dat, dat$surveyID == "survey01")
surveys <- split_dat[[1]]
baseline <- split_dat[[2]]
utility_dat <- add_utility(
    surveys,
    type = "VT",
    country = c("Denmark", "France")
)
baseline_utility <-
    baseline |>
    add_utility(type = "VT", country = c("Denmark", "France")) |>
    subset(select = c(respondentID, .utility_country, .utility_type, .value))

calculate_qalys(utility_dat, baseline_survey = baseline_utility, time_index = "time_index") |>
    subset(.qaly == "loss_vs_baseline")
```

### Other functionality

#### Paretian Classification of Health Change (PCHC)

```{r}
data("eq5d3l_example")
dat <- as_eq5d3l(
    eq5d3l_example,
    respondentID = "respondentID",
    surveyID = "surveyID",
    mobility = "MO",
    self_care = "SC",
    usual = "UA",
    pain = "PD",
    anxiety = "AD",
    vas = "vas"
)
grp1 <- subset(dat, Group == "Group1")
grp2 <- subset(dat, Group == "Group2")
calculate_pchc(grp1, grp2)
calculate_pchc(grp1, grp2, by.dimension = TRUE)
```