Decompression with Fusion Is Not Superior to Decompression Alone in Lumbar Spondylosis Based on Randomized Controlled Trials: A Meta-analysis and Systematic Review

Objective: To compare the efficacy of decompression alone (D) and decompression with fusion (F) in patients with herniated disc (HD) and lumbar spinal stenosis (LSS), with or without degenerative spondylolisthesis (DS), based on randomized controlled trials (RCTs). Summary of Background Data: Whether F is superior to D for LSS and HD remains controversial, and several RCTs have recently been published. Methods: PUBMED/MEDLINE, EMBASE, the Cochrane Library and Web of Science were searched from January 1970 to March 2018. Two reviewers assessed eligible trials and extracted information, including basic characteristics and primary and secondary measures. A meta-analysis was then performed, together with subgroup analyses by DS and by follow-up time (36 months). The strength of evidence and recommendation was evaluated with the GRADE system. Result: A total of 9 RCTs with 857 patients met the inclusion criteria; average age, sex ratio and preoperative visual analog scale (VAS) scores did not differ significantly between groups. Among the primary measures, there was no difference between the D and F groups in the change in VAS for back pain or leg pain (MD = -0.03, P = 0.94; MD = 0.11, P = 0.86, respectively), in patients' satisfaction (P = 0.48) or in the change in Oswestry Disability Index (ODI) (P = 0.29). Secondary measures showed no difference in complication rate (P = 0.50) or reoperation rate (P = 0.11), whereas the F group had significantly longer operation duration (P < 0.0001), more blood loss (P = 0.004) and longer hospital stays (P < 0.0001). In the DS subgroup analysis, all measures were broadly consistent with the overall meta-analysis. The follow-up subgroup analysis showed a higher reoperation rate in the D group in the middle-to-long term (>36 months). According to the GRADE system, the quality of this meta-analysis was "High". Conclusion: F yields no better clinical results than D alone in lumbar spondylosis (LS), regardless of DS and follow-up time. According to GRADE, the strength of recommendation was "Strong".


Introduction
Lumbar spinal stenosis (LSS) and herniated disc (HD) are the most common vertebral and disc diseases in lumbar spondylosis (LS). LSS is characterized by narrowing of the central vertebral canal and lateral recesses, and HD by protrusion of the intervertebral disc [1,2]. Degenerative spondylolisthesis (DS), which affects approximately 4.1% of the general population [3] and usually accompanies LSS, arises from degenerative changes that cause one vertebral body to slip over another, producing symptoms such as intermittent neurogenic claudication and radicular back and leg pain. The Spine Patient Outcomes Research Trial (SPORT) established that surgical intervention is superior to conservative care for symptomatic lumbar spondylosis [4]. Decompression (D) is a recommended surgical approach for LS, and decompression with fusion (F) is even regarded as the gold standard surgery for DS because of the stability it provides [5]. However, whether fusion is strictly necessary remains controversial [6][7][8]. Over the last 2 decades, several reviews comparing surgical outcomes between D alone and F for LS have been published, and some of them concluded that F had better clinical outcomes [3,9,10]. However, with the publication of high-quality randomized controlled trials (RCTs) of D and F for LS that drew somewhat different conclusions, opinion on this question has become more divided. A new meta-analysis is therefore still needed, because previous analyses included lower-quality studies such as non-randomized controlled trials (nRCTs), neglected the data published by Försth et al. [11], provided little evidence on outcomes other than the primary ones, and did not grade the recommendation for the meta-analysis as a whole. We therefore conducted a systematic review and meta-analysis to compare the overall efficacy of D and F in patients with LSS (with or without DS) and HD based on published RCTs.

Search strategies
PUBMED/MEDLINE, EMBASE, the Cochrane Library and Web of Science were searched for English-language articles published from January 1970 to March 2018. The following search strategy was used: (laminotomy OR laminectomy OR fenestration OR hemilaminectomy OR decompression) AND (lumbar spondylolisthesis OR lumbar spinal stenosis OR lumbar canal stenosis OR degenerative lumbar spondylolisthesis OR slipped disk OR protrusion OR herniated disc) AND (fusion OR arthrodesis). Two reviewers independently screened all studies for eligibility.
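The search string above is three groups of synonyms joined by AND. As a minimal sketch, it can be assembled programmatically so that the term lists stay easy to audit. The function and variable names are illustrative only, and each database has its own field tags and syntax, which this sketch does not attempt to reproduce.

```python
# Term groups copied from the search strategy described in the text.
PROCEDURE_TERMS = ["laminotomy", "laminectomy", "fenestration",
                   "hemilaminectomy", "decompression"]
CONDITION_TERMS = ["lumbar spondylolisthesis", "lumbar spinal stenosis",
                   "lumbar canal stenosis",
                   "degenerative lumbar spondylolisthesis",
                   "slipped disk", "protrusion", "herniated disc"]
FUSION_TERMS = ["fusion", "arthrodesis"]


def any_of(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Require one match from every group by joining the groups with AND."""
    return " AND ".join(any_of(group) for group in groups)


QUERY = build_query(PROCEDURE_TERMS, CONDITION_TERMS, FUSION_TERMS)
print(QUERY)
```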

Inclusion and Exclusion Criteria
Included studies fulfilled the following criteria: (1) they were RCTs written in English; (2) they compared D with F for LSS (with or without DS) and HD; (3) comparative data on clinical outcomes, major complications, reoperations and other relevant perioperative outcomes could be obtained; and (4) the sample size was larger than 5 per group and the minimum follow-up time was 1 year. Exclusion criteria were: (1) non-English-language articles; (2) nRCTs, case reports, duplicate papers or review reports; (3) studies without a control group or with a small sample size (<5 patients per group); (4) participants with tumors, fractures, osteoporosis or other irrelevant diseases; (5) studies mainly concerning a surgical approach, surgical techniques or instruments; and (6) studies with incomplete or unsuitable outcomes.
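The inclusion criteria can be read as a single yes/no screen applied to each candidate study. The sketch below is only an illustration of that logic: the record fields are hypothetical, and the clinical exclusion criteria (mixed diagnoses, technique-focused papers, incomplete outcomes) would still need a reviewer's judgment rather than a flag.

```python
from dataclasses import dataclass


@dataclass
class CandidateStudy:
    # All field names are hypothetical and exist only for this sketch.
    is_rct: bool
    in_english: bool
    compares_d_vs_f: bool
    has_comparative_outcomes: bool
    smallest_group_size: int
    follow_up_months: float


def meets_inclusion_criteria(study: CandidateStudy) -> bool:
    """Apply inclusion criteria (1) to (4) as stated in the text."""
    return (
        study.is_rct
        and study.in_english
        and study.compares_d_vs_f
        and study.has_comparative_outcomes
        and study.smallest_group_size > 5      # more than 5 per group
        and study.follow_up_months >= 12       # at least 1 year
    )
```

For example, a trial with 30 patients per arm but only 6 months of follow-up would be screened out by the last condition.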

Data Extraction
Both reviewers assessed potentially eligible trials and extracted information independently from each potential study. Any discrepancies were resolved by a third reviewer to reach consensus. The following data were extracted: basic demographic characteristics and primary and secondary measures. Primary measures included the change in visual analog scale (VAS, ranging from 0 to 10, with higher scores indicating more severe pain) for back and leg pain, the Oswestry Disability Index (ODI, ranging from 0 to 100, with higher scores indicating more disability related to pain), the European Quality of Life-5 Dimensions (EQ-5D, ranging from 0 to 1, with higher scores indicating better quality of life), the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36), patients' satisfaction and walking ability. Secondary measures included the incidence of complications and reoperations, operation time, blood loss, length of hospitalization and Adjacent Segment Degeneration/Disease (ASD).

Risk of Bias and Quality Assessment
Two investigators independently graded each eligible study. We used the Cochrane Handbook for Systematic Reviews of Interventions, version 5.0 [12] for RCTs. The following domains were assessed: randomization, blinding (of patients, surgeons and assessors), allocation concealment, adequacy of outcome data, selective reporting, and other biases. Each domain of quality assessment was classified as adequate (A), unclear (B) or inadequate (C). If all domains were A, the study was A-level; if at least one domain was B, the study was B-level; if at least one domain was C, the study was C-level.
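The study-level rule described above can be written as a small function. This is a sketch under one stated assumption: when a study has both B and C domains, the worst rating (C) determines its level, since the text does not say which takes precedence. The domain names in the example are shorthand for the domains listed above.

```python
def study_level(domain_ratings):
    """Derive a study's overall level from its per-domain ratings.

    domain_ratings maps a domain name to "A" (adequate), "B" (unclear)
    or "C" (inadequate). Assumption: C takes precedence over B.
    """
    ratings = set(domain_ratings.values())
    if not ratings or not ratings <= {"A", "B", "C"}:
        raise ValueError("each domain must be rated A, B or C")
    if "C" in ratings:
        return "C"
    if "B" in ratings:
        return "B"
    return "A"


example = {
    "randomization": "A",
    "blinding": "B",
    "allocation concealment": "A",
    "outcome data": "A",
    "selective reporting": "A",
    "other biases": "A",
}
print(study_level(example))  # one unclear domain makes the study B-level
```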

Data Synthesis and Analysis
Review Manager software (RevMan version 5.3 [The Cochrane Collaboration, Oxford, United Kingdom]) was used to conduct the statistical analysis. Continuous variables were reported as weighted mean differences (WMD) with 95% confidence intervals (95% CI), and dichotomous variables were reported as odds ratios (ORs) with 95% CI. Results were regarded as statistically significant if the two-sided P was <0.05. I² was used to estimate the size of the heterogeneity. I²<50% indicated low heterogeneity, in which case the results of comparable groups were pooled using a fixed-effects model; otherwise a random-effects model was used. Subgroup analysis was considered worthwhile where it could reduce statistical heterogeneity and help identify contributing factors. Even when the overall heterogeneity was I²<50%, studies could still be divided into subgroups on professional and clinical grounds. We constructed funnel plots for the overall outcomes to assess publication bias.
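The pooling logic described in this section can be illustrated with a short sketch. It is not RevMan's implementation: it assumes generic inverse-variance weighting and a DerSimonian-Laird estimate of between-study variance, whereas RevMan also offers other methods (for example Mantel-Haenszel for odds ratios) and the text does not state which was selected. The function names, the 0.5 continuity correction and the example numbers are all hypothetical.

```python
import math


def log_odds_ratio(events_a, total_a, events_b, total_b):
    """Return (log OR, variance) for one trial from its 2x2 counts."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell when any cell is zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool(effects, variances, threshold=50.0):
    """Pool per-study effects; fixed effects if I^2 < threshold, else random."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q and the I^2 statistic (in percent).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    model, tau2 = "fixed", 0.0
    if i2 >= threshold:
        # DerSimonian-Laird between-study variance.
        c = sum(weights) - sum(w * w for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        weights = [1.0 / (v + tau2) for v in variances]
        model = "random"
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return {
        "model": model,
        "estimate": estimate,
        "se": se,
        "ci95": (estimate - 1.96 * se, estimate + 1.96 * se),
        "i2": i2,
        "tau2": tau2,
        "z": abs(estimate) / se,
    }


# Hypothetical mean differences and variances, for illustration only.
result = pool([-1.0, 1.0], [0.01, 0.01])
print(result["model"], round(result["i2"], 1))
```

For odds ratios, the pooled estimate and its confidence limits are on the log scale and would be exponentiated before reporting.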

GRADE Approach
The GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach was used to evaluate the strength of evidence [13]. Based on its parameters, the quality of evidence was classified as very low, low, moderate or high according to the GRADE handbook (version 3.2), using the GRADE profiler software (version 3.6). A Summary of Findings Table (SoF Table) was used to present the final results.

Search Result
The process of identifying relevant studies is summarized in Figure 1. A total of 2768 references were obtained from the databases mentioned above, and 9 RCTs [14][15][16][17][18][19][20][21][22] eventually met the inclusion criteria, with a total of 857 patients: 367 in the D group and 490 in the F group. As some studies were continuations of previous articles, we used the latest publication to avoid duplication; the 9 included studies were published between 1987 and 2016. Two RCTs published in 2017 by Försth et al. and Karlsson et al. [11,23] contained the same data as the study published in 2016. Although they were published later, we did not count these 2 RCTs as included studies because their outcomes were incomplete for our purposes, and we used only their partially updated information as a supplement.

Risk of Bias and Quality Assessment
According to the quality assessment criteria recommended by the Cochrane Handbook for Systematic Reviews of Interventions [12], 6 out of 9 were of high quality and a low risk of bias. One study was A-level quality [16], 5 articles were B-level [14,17,18,20,23] and 3 articles were C-level with a moderate risk of bias [15,19,21] (Figure 2).

Figure 2:
The review authors' judgments about each risk of bias item for each included study: + is "yes", - is "no", ? is "unclear".

Basic characteristics
The basic characteristics of the 9 included RCTs are recorded in Table 1. Information that is available but not shown is provided in S1 Table.

Table 1: Characteristics and surgery information of the included studies.

The participants were diagnosed with LSS combined with DS in 6 studies, LSS in 2 studies and HD in 1 study. The average age did not differ between the D group and the F group (P=0.99), nor did the sex ratio (F/M) (P=0.47). Surgical approaches in the D group were decompression alone, laminectomy and facetectomy, while those in the F group were PLIF, PLF and facet arthrodesis with or without instrumentation. Preoperative VAS for back and leg pain did not differ significantly between the two groups, as supported by 5 articles [14,16,17,18,21].

Figure 3(A):
The meta-analysis on the change of VAS on back pain between D and F group.
Grob et al. [19] reported no difference between the D group and the F group, with both groups improved relative to their preoperative state, although precise data were lacking. The number of patients with improved back pain, reported in 3 articles [14,18,19], showed no difference between the 2 groups (OR = 0.75, z = 1.27, P = 0.21).

Figure 3(B):
The meta-analysis on the change of VAS on leg pain between D and F group.
Grob et al. [19] also reported no difference between the D group and the F group, although both improved postoperatively. The number of patients with improved leg pain, reported in 2 articles [14,19], showed no difference between the 2 groups (OR = 1.79, z = 0.50, P = 0.62), and 1 article [18] reported no significant difference without specific data.
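Each pooled comparison in this section is reported with a z statistic and a two-sided P value, which are linked through the standard normal distribution. The sketch below shows that conversion with the Python standard library; the function name is illustrative, and small discrepancies from the reported values can arise because the published z statistics are rounded.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(z) / math.sqrt(2.0))


# z = 0.50 is the statistic reported for improvement in leg pain.
print(round(two_sided_p(0.50), 2))  # → 0.62
```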

Figure 4(A):
The meta-analysis on the change of ODI between D and F group.

Figure 4(B):
The meta-analysis on patients' satisfaction between D and F group.
The number of patients with increased walking distance was reported in 2 studies [14,19]; a meta-analysis of these data showed no statistical significance (OR = 1.07, z = 0.09, P = 0.93), and Aihara et al. [17] found no difference in the walking ability score (4.81 vs. 4.24) between the D and F groups.

Figure 5(A):
The meta-analysis on complications rate between D and F group.
ASD was not subdivided in this study, despite the conceptual difference between Adjacent Segment Degeneration (ASDeg) and Adjacent Segment Disease (ASDis) [24]. A meta-analysis showed a difference between the D group and the F group (OR = 2.35, z = 2.40, P = 0.02) (Figure 5(B)).

Figure 5(B):
The meta-analysis on the rate of ASD between D and F group.

Figure 6(A):
The meta-analysis on reoperation rate between D and F group.

Operation duration, blood loss and hospital stay
The duration of operation, blood loss and length of hospital stay were all reported in 5 articles [14][15][16][17][19], but one of them [19] lacked standard deviations, so 4 studies could be included in the meta-analysis. There was a statistically significant difference in operation time and blood loss between the D group and the F group (MD = -80.02, z = 4.53, P < 0.0001; MD = -339.05, z = 2.86, P = 0.004, respectively) with a random-effects model (I² = 97%; I² = 100%, respectively). Grob et al. [19] reported a significant difference in duration of operation (104 min vs. 147 min) and blood loss (300 ml vs. 762 ml) between the two groups in the original article. The length of hospitalization also differed significantly between the two groups (MD = -2.66, z = 4.43, P < 0.0001, I² = 78%).

Postoperative DS progression
DS often accompanies LSS in LS, and 6 [14,15,17,18,19,20,21] out of 9 included studies involved DS, accounting for 64.76% of the selected participants. A meta-analysis of 2 studies reporting the number of patients with postoperative DS progression showed no difference (OR = 8.59, z = 1.11, P = 0.27) (Figure 6(B)). We then performed a subgroup analysis stratified by DS.

Subgroup Meta-analysis: LSS Combined with DS
The RCT published in 2016 [14] compared a D group (66 patients) and an F group (67 patients) with DS, and the patients in the other 5 RCTs were all diagnosed with LSS combined with DS. Bridwell et al. [20] reported single-segment slip at L3/4 in 26.47% and at L4/5 in 73.5% of cases. Overall, among the secondary measures, the operation duration and blood loss differed significantly between the 2 groups (P = 0.004 and P < 0.0001, respectively), and all of the comparisons were consistent with the overall meta-analysis (

Follow up Time
Long-term follow-up has suggested that fusion surgery may accelerate degeneration of the adjacent segment without influencing the clinical result [25]. Consequently, a subgroup analysis based on follow-up time, divided into short term (<36 months) and middle-to-long term (>36 months), was performed for the primary and secondary measures, except operation duration, blood loss and hospital stay, for which such a comparison is not meaningful. Table 3 shows a statistically significant difference in the short term (<36 months) in the VAS change for leg pain (P=0.04) in favor of the D group, suggesting at least no better outcome with fusion at short-term follow-up. At middle-to-long-term follow-up, the change in ODI and the reoperation rate differed significantly in favor of the D group and the F group, respectively, which indicates that decompression alone may lead to a higher reoperation rate with longer follow-up. The other measures were in line with the overall meta-analysis, and ASD remained the most common reason for reoperation regardless of follow-up time.

Publication bias
Publication bias was assessed only for the VAS change in back pain, reoperation rate and complications, as at least five studies are required to detect asymmetry. The funnel plots showed no apparent asymmetry (not shown), suggesting that publication bias may not be a limitation.

GRADE approach
The SoF Table (Figure 7) presents the grade of the final outcomes for the D and F groups, which showed no statistically significant difference, and the "High" quality grade of this meta-analysis. Taking into account academic and clinical experience, the grade of the final outcomes and the overall quality grade of this meta-analysis, the strength of recommendation was "strong".

Discussion
The debate on the efficacy of decompression versus decompression plus fusion in lumbar spondylosis has never stopped and has intensified over several decades. Previous publications held that decompression alone is significantly less invasive than decompression combined with fusion [26,27] and demonstrated that posterior spinal fusion following decompression led to longer operative time and more blood loss, while instability of the spine is a potential consequence that needs to be considered [28,29], especially in the presence of DS. Recent publications include 3 RCTs [14][15][16] focusing on this issue, whose higher-quality, quantitative evidence made a further study feasible. The 9 RCTs included in our meta-analysis showed no difference between D and F in the primary clinical outcomes or in the secondary outcomes of complication rate and reoperation rate, while patients with fusion had more blood loss, longer operation time and longer hospital stays. To our knowledge, this is the first study based on all RCTs, including the newest publications, to perform subgroup analyses and to report grades of evidence and recommendation.
Stability is an unavoidable topic as a potential factor guiding the choice of approach. Decompression alone has been recommended for typical LSS with no history of lumbar surgery and no spinal instability [30], although decompression without fusion cannot guarantee the stability needed for satisfactory outcomes [31]. A survey [5] reported that the presence of motion on dynamic radiographs together with back pain might be reason enough to choose fusion surgery. Herkowitz et al. [21] reported a postoperative difference in spondylolisthesis between the D and F groups in the flexion-extension position (5.8 mm vs. 0.1 mm) and in the neutral position (7.9 mm vs. 5.3 mm) (both P < 0.05), with no significant difference preoperatively, whereas postoperative DS progression did not appear to follow the degree of olisthesis in our analysis. Brown et al. [32] affirmed that intraoperative spinal stiffness measurements did not predict clinical results after lumbar spine surgery. Försth et al. [14] also found no significant difference between the D and F groups in the improvement of back pain, regardless of DS, and previous studies have shown that spondylolisthesis is not associated with an increased level of back pain [33].
In the last 3 years, more studies have shown that D alone is as effective as F for LS [34]. In our meta-analysis, the primary outcomes that largely determine efficacy, such as the improvement in VAS, ODI and walking ability, showed no difference, which is in line with some recent publications. Brodke et al. [35] reported that adding fusion to decompression yielded neither a superior survival curve nor improved clinical outcomes compared with decompression alone. Although three different fusion approaches, with or without instrumentation, were included in the F group in our study, a previous report found no significant differences in SF-36 and ODI scores among 3 different fusion techniques for patients with DS and LSS [36]. Therefore, the finding of Ghogawala et al. [15] that the F group had a slightly but statistically greater improvement in SF-36 than D alone may reflect a real clinical outcome, but the absence of a difference in the overall main measures deserves more attention. Spinal fusion surgery theoretically requires more interventional procedures and often involves spinal implants or intervertebral cages [37]; accordingly, the secondary measures of operation duration, blood loss and hospital stay were clearly lower in the D group, although the values varied between studies, in agreement with most articles. We therefore believe that D alone can achieve clinical efficacy comparable to that of F [10]. However, the approach should be selected individually: when LS is accompanied by other degenerative changes, such as osteophytes or calcified ligaments, the additional stability they provide may make it possible to dispense with fusion. Matsudaira suggested that preserving the posterior elements gives better clinical results in DS [38]. Similarly, Tuli et al. considered adequate laminectomy with preservation of the integrity of the posterior ligament complex to be the best alternative [39].
The presence of degenerative spondylolisthesis has often been considered a form of instability, although there is no consensus on the definition [14], and the surgical strategy for DS is still a matter of debate. Six out of 9 RCTs involved DS, and the further subgroup analysis stratified by DS gave results similar to those of the overall meta-analysis. McCullen proposed that patients with DS may require changes in the modality of decompression without fusion to improve outcomes [40]. Several studies suggested that decompression alone may exacerbate instability and increase the degree of DS [7,41,42]. In contrast, Försth et al. showed that F did not result in clinical outcomes superior to those of D in patients with DS, and our meta-analysis of 2 studies reporting the number of patients with postoperative DS progression showed no difference. Apart from the possibility that this reflects the large proportion of participants with DS, the more likely explanation is that there is in fact no significant difference between the D and F groups.
Long-term follow-up of the two approaches has suggested no influence on clinical outcome [26]. Follow-up time is usually divided at 2 years and 5 years into short, middle and long term, whereas in this subgroup analysis short term (<36 months) and middle-to-long term (>36 months) were used as a reasonable compromise between a balanced statistical distribution and clinical sense. The change in ODI in the middle-to-long term differed statistically, but this is probably biased by the small sample and significant heterogeneity (I² > 50%). The higher reoperation rate in the D group (>36 months) may be explained by ASD; however, it contradicts the finding of Inui et al. [20] that the reoperation rate was significantly higher with fusion than with decompression alone. In addition, Försth et al. [11] and Karlsson et al. [23] extended the follow-up time to 5 years and published updated information on some measures in 2017, which, regrettably, contained only part of the outcomes we required and was therefore excluded. They reported several comparable measures with no difference between the D and F groups: the VAS change in back pain (2.8 vs. 3.2), the VAS change in leg pain (3.1 vs. 3.2), the change in ODI (26 vs. 29), the number of satisfied patients (74 vs. 64) and restenosis (7 vs. 1). Thus there were still no significant differences in clinical outcomes between the D and F groups beyond three years of follow-up.
The complications comprised surgery-associated events such as dural rupture and other adverse events such as pulmonary embolism and myocardial infarction [14]. Regrettably, a further analysis by type of complication could not be performed because the necessary data could not be obtained. The overall incidence of complications was 14.71% in the D group and 14.49% in the F group (with a range of 0 to 42%). Previous publications reported that a higher grade of spondylolisthesis and older age are risk factors for a higher complication rate [3,43], but we could not draw the same conclusion. In this meta-analysis, we found that the complication rate and reoperation rate did not differ significantly between the D and F groups, which differs from most previous studies [3]. ASD has been regarded as an unavoidable complication: in theory, the altered biomechanical function of the spine is compensated for by increased motion at the unfused segments, which then accelerates problems at the lumbar levels adjacent to the fusion and produces back pain and leg pain [44,45]. In contrast, the ASD incidence in our analysis was higher in the D group, a result favoring the F group, which indicates that ASD may not be caused by fusion but may instead reflect the natural progression of LS. This is consistent with the conclusion of Pesce et al. [46], based on an RCT with 10 years of follow-up, that ASD is part of the natural history of cervical spondylosis rather than a complication. It seems that surgeons may improperly regard ASD as the common reason for poor outcomes after fusion surgery [14].
Inui et al. [20] showed a significantly higher reoperation rate with fusion than with decompression alone. Dailey et al. [47] considered that the reoperation rate at the surgical level or adjacent levels was not associated with D or F and reported a 13% reoperation rate, close to the 10.90% in the D group and 5.71% in the F group in our study, which did not differ between the two groups. Some publications reported that the common causes of reoperation in the D group were herniation at the same segment and restenosis, while those in the F group were implant-related problems and ASD [6,21]. Brodke et al. reported that the common reason for reoperation was symptomatic adjacent segment pathology whatever the approach (D or F) [38], which is supported by the same finding in this analysis. A cost-effectiveness analysis was not included because only one of the 9 RCTs reported costs. That study, by Försth et al. [14], showed that the mean direct costs of each procedure (mainly hospital costs, including surgery) were $6,800 higher in the F group than in the D group because of the additional operating time, extended hospitalization and cost of the implant. Hallett et al. reported a cost difference of approximately USD 6,290 per patient for an additional fusion implant [48]. Given the higher cost of adding fusion, D alone is believed to be more cost-effective than instrumented fusion for selected patients.
Several limitations restrict the overall findings. First, the 9 RCTs included in this study comprise fewer participants than some related publications; the sample is limited by the number of available RCTs, although restricting the analysis to RCTs supports the quality and strength of the evidence. In addition, the somewhat unsatisfactory quality assessment, with some high-risk items, may lower the grade of recommendation: 3 RCTs still carry a moderate risk of bias, and most trials could not achieve adequate blinding, which may produce a 15% overestimation of the treatment effect. Besides, there are insufficient data on the primary outcomes of walking ability and SF-36 and insufficient further information on DS. Finally, the lack of radiographic findings may affect an objective evaluation.

Conclusions
Decompression plus fusion yields no better clinical results than decompression alone in lumbar vertebral and disc spondylosis, regardless of accompanying spondylolisthesis, and this does not change significantly at short-term or middle-to-long-term follow-up. Decompression plus fusion involves a longer duration of operation, longer hospital stays and more blood loss, and perhaps a larger cost. According to GRADE, the quality of this meta-analysis is "High" and the strength of recommendation is "strong".