Confidence intervals and tests in emmeans

Russ Lenth

2018-01-09

Contents

This vignette describes various ways of summarizing emmGrid objects.

  1. emmGrid objects, described
  2. summary(),confint(), andtest()`
  3. Back-transforming to response scale (See also the “transformations” vignette)
  4. Multiplicity adjustments
  5. Using “by” variables
  6. Joint (omnibus) tests
  7. Testing equivalence, noninferiority, nonsuperiority
  8. Graphics (in “basics” vignette)

Vignette index

Objects, structures, and methods

The ref_grid() and emmeans() functions are introduced in the “Basics” vignette. These functions, and a few related ones, return an object of class emmGrid:

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
pigs.rg <- ref_grid(pigs.lm)
class(pigs.rg)
## [1] "emmGrid"
## attr(,"package")
## [1] "emmeans"

pigs.emm.s <- emmeans(pigs.rg, "source")
class(pigs.emm.s)
## [1] "emmGrid"
## attr(,"package")
## [1] "emmeans"

If you simply show these objects, you get different-looking results:

pigs.rg
## 'emmGrid' object with variables:
##     source = fish, soy, skim
##     percent =  9, 12, 15, 18
## Transformation: "log"

pigs.emm.s
##  source   emmean         SE df lower.CL upper.CL
##  fish   3.394492 0.03668122 23 3.318612 3.470373
##  soy    3.667260 0.03744798 23 3.589793 3.744727
##  skim   3.796770 0.03938283 23 3.715300 3.878240
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

This is based on guessing what users most need to see when displaying the object. You can override these defaults; for example,

str(pigs.emm.s)
## 'emmGrid' object with variables:
##     source = fish, soy, skim
## Transformation: "log"

The most important method for emmGrid objects is summary(). It is used as the default for displaying an emmeans() result like pigs.emm.s. This summary() method for emmGrid objects) actually produces a data.frame, but with extra bells and whistles:

class(summary(pigs.emm.s))
## [1] "summary_emm" "data.frame"

This can be useful to know because if you want to actually use emmeans() results in other computations, you should save its summary, and then you can access those results just like you would access data in a data frame. The emmGrid object itself is not so accessible. There is a print.summary_emm() function that is what actually produces the output you see above – a data frame with extra annotations.

Back to Contents

summary() and its relatives

As you may have gathered, the most important method for emmGrid objects is summary(). It has a lot of options, and the detailed documentation via help("summary.emm") is worth a look.

Just summary(<object>) by itself will produce a summary that varies somewhat according to context. It does this by setting different defaults for the infer argument, which consists of two logical values, specifying confidence intervals and tests, respectively. The summary of a newly made reference grid will show just estimates and standard errors, but not confidence intervals or tests (that is, infer = c(FALSE, FALSE)). The summary of an emmeans() result, as we see above, will have intervals, but no tests (i.e., infer = c(TRUE, FALSE)); and the result of a contrast() call (see comparisons and contrasts) will show test statistics and P values, but not intervals (i.e., infer = c(FALSE, TRUE)). There are courtesy methods confint() and test() that just call summary() with the appropriate infer setting; for example,

test(pigs.emm.s)
##  source   emmean         SE df t.ratio p.value
##  fish   3.394492 0.03668122 23  92.540  <.0001
##  soy    3.667260 0.03744798 23  97.929  <.0001
##  skim   3.796770 0.03938283 23  96.407  <.0001
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale.

It is not particularly useful, though, to test these EMMs against the default of zero – which is why tests are not usually shown. It makes a lot more sense to test them against some target concentration, say 40. And suppose we want to do a one-sided test to see if the concentration is greater than 40. Remembering that the response is log-transformed in this model,

test(pigs.emm.s, null = log(40), side = ">")
##  source   emmean         SE df     null t.ratio p.value
##  fish   3.394492 0.03668122 23 3.688879  -8.026  1.0000
##  soy    3.667260 0.03744798 23 3.688879  -0.577  0.7153
##  skim   3.796770 0.03938283 23 3.688879   2.740  0.0058
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## P values are right-tailed

Back to Contents

Back-transforming

Transformations and link functions are supported an several ways in emmeans, making this a complex topic worthy of its own vignette. Here, we show just the most basic approach. Namely, specifying the argument type = "response" will cause the displayed results to be back-transformed to the response scale, when a transformation or link function is incorporated in the model. For example, let’s try the preceding test() call again:

test(pigs.emm.s, null = log(40), side = ">", type = "response")
##  source response       SE df null t.ratio p.value
##  fish   29.79952 1.093083 23   40  -8.026  1.0000
##  soy    39.14451 1.465883 23   40  -0.577  0.7153
##  skim   44.55704 1.754782 23   40   2.740  0.0058
## 
## Results are averaged over the levels of: percent 
## P values are right-tailed 
## Tests are performed on the log scale

Note what changes and what doesn’t change. In the test() call, we still use the log of 40 as the null value; null must always be specified on the linear-prediction scale, in this case the log. In the output, the displayed estimates, as well as the null value, are shown back-transformed. As well, the standard errors are altered (using the delta method). However, the t ratios and P values are identical to the preceding results. That is, the tests themselves are still conducted on the linear-predictor scale (as is noted in the output).

Similar statements apply to confidence intervals on the response scale:

confint(pigs.emm.s, side = ">", level = .90, type = "response")
##  source response       SE df lower.CL upper.CL
##  fish   29.79952 1.093083 23 28.39159      Inf
##  soy    39.14451 1.465883 23 37.25735      Inf
##  skim   44.55704 1.754782 23 42.30080      Inf
## 
## Results are averaged over the levels of: percent 
## Confidence level used: 0.9 
## Intervals are back-transformed from the log scale

With side = ">", a lower confidence limit is computed on the log scale, then that limit is back-transformed to the response scale. (We have also illustrated how to change the confidence level.)

Back to Contents

Multiplicity adjustments

Both tests and confidence intervals may be adjusted for simultaneous inference. Such adjustments ensure that the confidence coefficient for a whole set of intervals is at least the specified level, or to control the overall significance level for a whole family of tests. This is done via the adjust argument. For ref_grid() and emmeans() results, the default is adjust = "none". For most contrast() results, adjust is often something else, depending on what type of contrasts are created. For example, pairwise comparisons default to adjust = "tukey", i.e., the Tukey HSD method. The summary() function sometimes changes adjust if it is inappropriate. For example, with

confint(pigs.emm.s, adjust = "tukey")
##  source   emmean         SE df lower.CL upper.CL
##  fish   3.394492 0.03668122 23 3.300060 3.488925
##  soy    3.667260 0.03744798 23 3.570854 3.763666
##  skim   3.796770 0.03938283 23 3.695383 3.898157
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95 
## Conf-level adjustment: sidak method for 3 estimates

the adjustment is changed to the Sidak method because the Tukey adjustment is inappropriate unless you are doing pairwise comparisons.

The adjustment method that is never inappropriate is Bonferroni; however, it can be quite conservative. Using adjust = "mvt" is the closest to being the “exact” all-around method, as it uses the multivariate t distribution (and the mvtnorm package) with the same covariance structure as the estimates to determine the adjustment. However, this comes at high computational expense as the computations are done using simulation techniques. For a large set of tests (and especially confidence intervals), the computational lag becomes noticeable if not intolerable.

For tests, adjust increases the P values over those otherwise obtained with adjust = "none", making it harder to declare an individual test as “significant.” Compare the following adjusted tests with the unadjusted ones previously computed.

test(pigs.emm.s, null = log(40), side = ">", adjust = "bonferroni")
##  source   emmean         SE df     null t.ratio p.value
##  fish   3.394492 0.03668122 23 3.688879  -8.026  1.0000
##  soy    3.667260 0.03744798 23 3.688879  -0.577  1.0000
##  skim   3.796770 0.03938283 23 3.688879   2.740  0.0175
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## P value adjustment: bonferroni method for 3 tests 
## P values are right-tailed

Back to Contents

“By” variables

Sometimes you want to break a summary down into smaller pieces; for this purpose, the by argument in summary() is useful. For example,

confint(pigs.rg, by = "source")
## source = fish:
##  percent prediction         SE df lower.CL upper.CL
##        9   3.220292 0.05361160 23 3.109388 3.331196
##       12   3.399846 0.04934177 23 3.297775 3.501917
##       15   3.437691 0.05482126 23 3.324284 3.551097
##       18   3.520141 0.05474644 23 3.406889 3.633393
## 
## source = soy:
##  percent prediction         SE df lower.CL upper.CL
##        9   3.493060 0.04979018 23 3.390061 3.596058
##       12   3.672614 0.04893581 23 3.571383 3.773846
##       15   3.710459 0.05070383 23 3.605570 3.815347
##       18   3.792909 0.06398483 23 3.660546 3.925272
## 
## source = skim:
##  percent prediction         SE df lower.CL upper.CL
##        9   3.622569 0.05011840 23 3.518892 3.726247
##       12   3.802124 0.04938053 23 3.699972 3.904275
##       15   3.839968 0.05486610 23 3.726469 3.953468
##       18   3.922419 0.06459690 23 3.788790 4.056048
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

If there is also an adjust in force when by variables are used, the adjustment is made separately on each by group; e.g., in the above, we would be adjusting for sets of 4 intervals, not all 12 together.

There can be a by specification in emmeans() (or equivalently, a | in the formula); and if so, it is passed on to summary() and used unless overridden by another by. Here are examples, not run:

emmeans(pigs.lm, ~ percent | source)     ### same results as above
summary(.Last.value, by = percent)       ### grouped the other way

Specifying by = NULL will remove all grouping.

Back to Contents

Joint tests

From the above, we already know how to test individual results. For pairwise comparisons (details in the “comparisons” vignette), we might do

pigs.prs.s <- pairs(pigs.emm.s)
pigs.prs.s
##  contrast      estimate         SE df t.ratio p.value
##  fish - soy  -0.2727678 0.05293450 23  -5.153  0.0001
##  fish - skim -0.4022777 0.05415929 23  -7.428  <.0001
##  soy - skim  -0.1295098 0.05304280 23  -2.442  0.0570
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 3 estimates

But suppose we want an omnibus test that all these comparisons are zero. Easy enough, using the joint argument in test (note: not available in summary()):

test(pigs.prs.s, joint = TRUE)
##  df1 df2 F.ratio p.value note
##    2  23  28.849  <.0001  d  
## 
## d: df1 reduced due to linear dependence

Notice that there are three comparisons, but only 2 d.f. for the test, as cautioned in the message.

The test produced with joint = TRUE is a “type III” test (assuming the default equal weights are used to obtain the EMMs). See more on these types of tests for higher-order effects in the “interactions” vignette section on contrasts.

Back to Contents

Testing equivalence, noninferiority, and nonsuperiority

The delta argument in summary() or test() allows the user to specify a threshold value to use in a test of equivalence, noninferiority, or nonsuperiority. An equivalence test is kind of a backwards significance test, where differences enough smaller than delta are the ones that can be significant. The help page for summary.emmGrid gives the details of these tests. Suppose in the present example, we consider two sources to be equivalent if they are within 25% of each other. We can test this as follows:

test(pigs.prs.s, delta = log(1.25), adjust = "none")
##  contrast      estimate         SE df t.ratio p.value
##  fish - soy  -0.2727678 0.05293450 23   0.937  0.8209
##  fish - skim -0.4022777 0.05415929 23   3.308  0.9985
##  soy - skim  -0.1295098 0.05304280 23  -1.765  0.0454
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## Statistics are tests of equivalence with a threshold of 0.22314 
## P values are left-tailed

By our 25% standard, soy and skim are equivalent at the \(\alpha = .05\) level, when no multiplicity adjustment is used.

Back to Contents

Graphics

Graphical displays of emmGrid objects are described in the “basics” vignette