The MTC framework was convenient for synthesizing available evidence and estimating % MBL < 80 mL at various follow-up times, but our focus was not on comparing treatment classes, nor on comparing treatments within classes (the data generally were not sufficient). The validity and reliability of the evidence for compounds within the same class (e.g., COCs) vary among studies, and pooled estimates for treatment classes may not account for some variation in efficacy within the class. The main aim was to estimate efficacy at a follow-up time of 3 months, with corresponding credible intervals, as inputs in a microsimulation economic model evaluating the relative cost and health impact of the eight treatment classes. The analysis produced posterior median estimates of % MBL < 80 mL that plausibly reflect the current evidence: a high level of efficacy for LNG-IUS and endometrial ablation and somewhat lower efficacy for oral treatments. LNG-IUS and ablation, however, are designed for long-term (1 year or longer) reduction of menstrual bleeding. For women who prefer oral treatments and reversible contraception, COCs are an appropriate option.
Our evidence synthesis used a Bayesian framework, rather than a frequentist analysis, because Bayesian methods for indirect comparisons and MTCs are much more fully developed, offer greater flexibility in handling the special features of our data (e.g., availability of both direct and indirect evidence for some comparisons, uncertainty of % MBL < 80 mL and % PBAC < 100 estimated from summary statistics, accounting for missing data), and avoid problems associated with inverse-variance weighting based on estimated variances, such as bias and confidence-interval coverage that departs substantially from the nominal value. However, the use of a treatment-class effect, rather than treatment-specific effects within a class, may not fully account for some variation in efficacy between interventions within the same class.
A number of systematic reviews have synthesized the evidence for subsets of the treatment classes, and a network meta-analysis compared six second-generation endometrial ablation techniques (primarily with the class of first-generation hysteroscopic devices as the reference treatment). However, this is the first study that has combined data on eight treatment options for heavy menstrual bleeding.
The systematic reviews comparing LNG-IUS and endometrial ablation have produced inconsistent conclusions. Marjoribanks et al. concluded that resection or ablation was more effective than LNG-IUS at controlling bleeding at 1 year, but the evidence for longer-term effects was inconclusive. Lethaby et al., addressing the same comparison from the opposite perspective, also found that LNG-IUS produced a smaller mean reduction in MBL (the primary endpoint) than ablation and was associated with progestogen side effects, but they found no evidence of a difference in satisfaction or perceived quality of life between LNG-IUS and ablation. In contrast, Kaunitz et al. subsequently used trial-level means and standard deviations of PBAC scores (their primary endpoint) from six RCTs to compare LNG-IUS and ablation. They concluded that at 6, 12, and 24 months, LNG-IUS was at least as effective as ablation in reducing MBL.
Middleton et al. identified 30 RCTs that compared pairs of treatments from the classes first-generation ablation, second-generation ablation, LNG-IUS, and hysterectomy, and assembled individual patient data (IPD) from 17 of them. The primary outcome measure was satisfaction, but they also analyzed available data on MBL. Having IPD allowed them “to use previously unreported data, improve the assessment of study quality, standardize outcome measures, undertake intention-to-treat analysis, and use optimal analytical methods.” In their analysis LNG-IUS and endometrial ablation were comparable, but the authors remarked on uncertainty from small sample sizes in studies of LNG-IUS.
The findings of these systematic reviews of direct comparisons add support to the indication from our analysis that ablation is an effective treatment for HMB.
Lethaby et al. also evaluated a number of pharmacological therapies for HMB in a series of systematic reviews. They concluded that oral progestogens administered only during the luteal phase were less effective at reducing MBL than tranexamic acid, danazol, and LNG-IUS. Progestogens taken between day 5 and day 26 of the cycle, however, significantly reduced MBL from baseline, but were less effective than LNG-IUS [10, 12]. Danazol seemed to be more effective than placebo, progestogens, or COC, but confidence intervals were wide (based on pooled data from nine RCTs). Tranexamic acid was more effective than placebo and luteal-phase progestogens at reducing MBL. An additional review of the effect of COCs on MBL by Farquhar and Brown located only one cross-over study of 45 women, which found no significant difference in MBL between COC and danazol or non-steroidal anti-inflammatory drugs. Marjoribanks et al. concluded that the results of these reviews “suggest that the LNG-IUS system provides a better alternative to surgery than oral medication. Levels of satisfaction and quality of life reported by women with an LNG-IUS system are similar to those in women who have undergone transcervical endometrial ablation or balloon ablation. Surgical methods are significantly more effective in reducing bleeding at one year, but studies with longer follow up did not show an ongoing advantage for surgery.”
These systematic reviews therefore agree with the estimates from our MTC that LNG-IUS and ablation are the most effective of the treatments studied at reducing MBL, that progestogens given for less than 2 weeks out of 4 during the menstrual cycle are least effective, and that danazol, progestogens given for close to 3 weeks out of 4, and tranexamic acid also showed efficacy. Our MTC was able to produce stronger evidence to support the use of COCs in HMB, largely by identifying studies that were not published at the time of the review by Farquhar and Brown. The previous systematic reviews found no direct-comparison studies for oral progestogens versus placebo, danazol versus TXA, danazol versus LNG-IUS, or LNG-IUS versus placebo, but our MTC suggested comparative efficacy for these treatments.
As mentioned in the introduction, we encountered several methodological challenges. First, the limited amount of data available contributed to the substantial uncertainty in most of our estimates of efficacy. Most of the studies had fairly modest sample sizes (median number of patients per arm 33, range 9 to 164). Also, as indicated earlier, studies varied greatly in the measures used and in study designs; 21 direct comparisons were spread among 8 treatment classes, and only one pair of treatment classes had more than 3 direct comparisons (Figure 1).
Further, the small number of follow-up times that were common across treatment classes (Table 1) increased uncertainty. Some variation in follow-up time is a consequence of the nature of the treatment classes. For example, for endometrial ablation, follow-up times of 6, 12, and 24 months are common. Among the seven studies that compared ablation and LNG-IUS, only one reported efficacy at 3 months. Thus, the wide credible interval at 3 months (94 percentage points) is not surprising. For other treatment classes (e.g., danazol and TXA), follow-up times of 1, 2, 3, and 6 months are more appropriate. The apparent lack of consensus on follow-up times among researchers studying a particular treatment class presents a challenge for evidence synthesis.
Estimation of % MBL < 80 mL (or % PBAC < 100) from summary statistics for MBL (or PBAC) introduced additional uncertainty, and each of the three distinct sets of summary statistics (mean and standard deviation, median and minimum and maximum, and median and quartiles) required a separate procedure for estimating % MBL < 80 mL or % PBAC < 100 (and a further, more-complicated procedure for estimating the standard error of the estimate). We wanted, however, to use as much of the available evidence as possible. In some articles we were unable to extract the same measure of efficacy at all follow-up times, or even for both treatments. Investigators showed little consensus on the measures of efficacy to report, with no convergence of approach over time. Future researchers can facilitate meta-analyses and MTCs by reporting outcomes in a more consistent way and in sufficient detail (e.g., in a supplemental file, available online) for secondary analysis.
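As a rough illustration of the kind of calculation involved, the sketch below converts two of the three types of summary statistics into an estimated proportion below a threshold. It is not the procedure used in this analysis: the lognormal assumption for MBL, the moment-matching approach, and the function names `prop_below_from_mean_sd` and `prop_below_from_quartiles` are all assumptions introduced here for illustration, and the sketch does not attempt the standard error of the estimate.

```python
from math import log, sqrt
from statistics import NormalDist


def prop_below_from_mean_sd(mean, sd, threshold=80.0):
    """Estimated proportion below `threshold`, given a reported mean and SD.

    ASSUMPTION (not from the article): the outcome is lognormal, with the
    log-scale parameters obtained by matching the reported mean and SD.
    """
    sigma2 = log(1.0 + (sd / mean) ** 2)   # log-scale variance
    mu = log(mean) - sigma2 / 2.0          # log-scale mean
    return NormalDist(mu, sqrt(sigma2)).cdf(log(threshold))


def prop_below_from_quartiles(q1, median, q3, threshold=80.0):
    """Estimated proportion below `threshold`, given median and quartiles.

    ASSUMPTION (not from the article): the outcome is lognormal, so the
    log of the median estimates the log-scale mean and the log-scale
    interquartile range divided by 2 * z(0.75) estimates the log-scale SD.
    """
    z75 = NormalDist().inv_cdf(0.75)
    mu = log(median)
    sigma = (log(q3) - log(q1)) / (2.0 * z75)
    return NormalDist(mu, sigma).cdf(log(threshold))


if __name__ == "__main__":
    # Hypothetical arm-level summaries, chosen only to exercise the functions.
    print(round(prop_below_from_mean_sd(100.0, 75.0), 3))
    print(round(prop_below_from_quartiles(40.0, 80.0, 160.0), 3))
```

Under these assumptions, a mean of 100 mL with an SD of 75 mL implies a median of 80 mL, so the estimated proportion below 80 mL is one half. A different distributional assumption would give a different answer, which is one reason this step adds uncertainty.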
The greater use of measures based on PBAC scores may reflect a shift away from the burden that use of MBL places on trial participants (who must collect their sanitary material for laboratory analysis). We used only data based on the PBAC score developed by Higham et al. because it was much more common in the articles that we encountered than scores based on other pictorial charts. The validity of both PBAC and the alkaline hematin method requires consistent use of the specific validated sanitary materials. Deviations from this requirement may affect estimates of efficacy, but they are difficult to measure and are not reported in the studies’ results, which adds to the unexplained variation and uncertainty of the estimates.
In some MTC meta-analyses it may be advantageous to include RCTs that evaluated only one, or even none, of the treatments of interest. When comparisons of efficacy are the focus, the network of evidence would then ordinarily include all the treatments evaluated in those RCTs. Five of the RCTs in our data evaluated treatments that were not considered in the microsimulation model, and we did not include data from the other arms of those RCTs. In four of the five, the other treatment was mefenamic acid (which is no longer considered a strong treatment option), and in the fifth it was hysterectomy.
Several areas would benefit from attention in future work: the effect of including additional treatment classes in the evidence network, including RCTs that reported outcomes based on other pictorial charts, incorporating results from observational studies, and synthesizing evidence on patient-focused outcomes such as satisfaction and health-related quality of life.