The Call for Progress in Evidence-based Medicine
by Franz Porzsolt*
Private Research Institute of Clinical Economics, 89081 Ulm / Germany
*Corresponding author: Franz Porzsolt, Private Research Institute of Clinical Economics, 89081 Ulm / Germany
Received Date: 21 March, 2025
Accepted Date: 26 March, 2025
Published Date: 28 March, 2025
Citation: Porzsolt F (2025) The Call for Progress in Evidence-based Medicine. J Family Med Prim Care Open Acc 9: 278. https://doi.org/10.29011/2688-7460.100278
Introduction
Four international cardiology societies (European Society of Cardiology, American Heart Association, American College of Cardiology, World Heart Federation) issued a joint statement proposing a modification of Randomized Controlled Trials (RCT) [1]. This modification is justified due to increased administrative requirements and financial burdens, as well as a disproportionately low information gain from conventional RCTs. In a ‘joint opinion’, the design of an adaptive platform study is proposed instead of traditional RCTs [2] because promising results could be achieved by this study design in different studies [3-5].
We agree with the Joint Opinion Group’s call for a necessary optimization of the standards for gaining knowledge in the healthcare system and contribute the experience we gained while developing the Pragmatic Controlled Trial (PCT).
Background
The need to develop a specific method for the detection of nonexperimental care effects arose in the late 1980s. One of us, a young oncologist, noticed that treatment successes in patients at our university hospital were inferior to those in published oncology reports. Nearly a decade passed before a simple idea could plausibly explain the difference. We observed effects in our hospital that occur in everyday care (real-world effectiveness), whereas journals reported data that were almost exclusively generated in experimental studies under strictly controlled conditions. Although the scientific literature differentiated between “efficacy” and “effectiveness” very early [6-8], convincing methods for the undistorted detection of results under non-experimental conditions were not yet available. We did not succeed in formally describing the difference between expected and observed results until much later [9].
Our research in evidence-based medicine taught us Sir Archibald Cochrane’s and Sir Austin Bradford Hill’s three essential questions to ask before implementing an innovation in the healthcare system: “Can it work? Does it work? Is it worth it?” [10]. Cooperation with teachers and students in the “hochschule für gestaltung (hfg)” (Ulm school of design) taught us the rule “Form Follows Function (FFF)” generated by American designers and architects [11]. As citizens of Ulm, we are familiar with many of Albert Einstein’s (born in 1879 in Ulm) statements, which pointed out that problems cannot be solved by the mindset that caused them. The recommendations of the British epidemiologists and input by the American designers and the German physicist facilitated the development of a three-dimensional strategy for the evaluation of healthcare performance.
The Three-Dimensional Strategy
The concept of the three-dimensional strategy is based on the three Cochrane-Hill questions. The answer to the first question, “Can it work?”, requires proof of the effective principle (proof of principle, PoP). The second question, “Does it work?”, can be answered by demonstrating Real-World Effectiveness (RWE). The answer to the third question, “Is it worth it?”, describes the perceived value (Val) of healthcare services from an individual and a societal perspective. Efficacy and effectiveness depend on objective judgments, whereas the description of value is a subjective but essential judgment. If the proof of efficacy is supplemented by the proof of real-world effectiveness, it becomes possible not only to identify effective interventions but also to describe the endpoint-specific risk profiles of patients who can be successfully treated with an effective intervention. This classification of the successfully treatable subpopulation will help to significantly increase care efficiency. If different therapies achieve identical results in patients with identical risk profiles, this knowledge will also contribute to the development of new care strategies.
The Criteria for Distinguishing Efficacy and Effectiveness
Functions and Forms of the Three Healthcare Conditions
Patient care can be performed under three different conditions: under the non-structured, natural conditions of everyday care or under one of two types of structured study. The two structured study conditions are the strictly controlled experimental or interventional study, the Randomized Controlled Trial (RCT), and the non-experimental or observational study, the Pragmatic Controlled Trial (PCT), which is based on the principles of Bayesian statistics. In a PCT, each patient is cared for under non-structured, natural everyday care conditions, but evaluated under structured conditions by applying Bayesian statistics. The advantage of Bayesian statistics over randomization is the ability to apply statistical methods under the non-structured conditions of everyday care without altering these natural conditions.
The Bayesian method requires documentation of the intervention (therapy) and all individual risk factors that could affect any of the measured PCT endpoints. Based on the categorization of therapies and individual risk profiles, each patient can be assigned to the appropriate risk class with respect to each of the measured PCT endpoints [12-15]. Although this accurate risk classification requires large numbers of cases, it allows comparison of patients assigned to an individual risk class with respect to each measured endpoint and the intervention applied. An RCT only ensures that the risk profiles are equally distributed within the study arms. Therefore, all patient risk profiles that were not excluded in the RCT are represented in each study arm. Only a limited number of different therapies can be studied in an RCT, which significantly limits the applicability of results in everyday care. We compiled the detailed differentiation of the three healthcare conditions based on two functional and twelve formal (structural) criteria [16] (Table 1).
| Criteria | Efficacy | Effectiveness |
| --- | --- | --- |
| Functions | Care of subjects under structured conditions of an experimental study. Proof of principle (PoP) under structured conditions of an experimental study (e.g. RCT). | Care of subjects under not structured conditions of everyday care. Demonstration of real-world effectiveness (RWE) under structured conditions of an observational study (e.g. PCT). |
| Forms / structures | No agreement between the 12 criteria of an experimental study (RCT) with the criteria under the conditions of everyday care. | Agreement of six of the 12 criteria with the criteria of experimental studies and of four criteria with the criteria of everyday care. |
Table 1: Functions and forms/structures of efficacy and effectiveness [15].
Causes, Consequences, and a Possible Solution to the Terminology Conflict
The challenge of assessing healthcare outcomes in three dimensions [10], which has been unresolved for 80 years, is based on a terminology conflict involving the differentiation of efficacy from effectiveness [14]. The most likely cause of this terminology conflict is the lack of proposed alternative solutions. Therefore, the RCT has hitherto been considered the only valid method to measure the effects of healthcare.
The consequence is that most clinical decisions concerning guidelines, patient treatment, and court verdicts are decided by the experimentally driven proof of principle and not by its suitability for everyday use. These decisions are based on highly selected patient populations and often on surrogates rather than real endpoints. As a result of these two “idealized measurement conditions”, the successes achievable under everyday conditions are significantly overestimated. This overestimation can be avoided by using PCTs.
Differentiation Among the Three Outcome Dimensions and the Three Healthcare Conditions
Table 2 describes the three outcome dimensions of proof of principle (PoP), real-world effectiveness (RWE), and value (Val) from the perspectives of clinical research, health-services research, and economic research. Each addresses different questions, different healthcare conditions, different study types, and different methods of achieving evidence [13].
| Perspective | Question | Answer | Health care condition | Type of study | Method |
| --- | --- | --- | --- | --- | --- |
| Clinical research | Can it work? | Objective confirmation of proof of principle (PoP) or efficacy | Experimental study condition (ESC) | Interventional study | Randomized controlled trial (RCT) |
| Health services research | Does it work? | Objective confirmation of real-world effectiveness (RWE) | Real-world condition with systematic evaluation of outcomes | Pragmatic/Observational study | Pragmatic controlled trial (PCT) |
| Economic research | Is it worth it? | Subjective confirmation of value (Val) | Real-world condition without systematic evaluation of outcomes | Complete economic analysis | Cost-effectiveness analysis (CEA) |
Table 2: Answering the three Cochrane-Hill questions from the perspectives of clinical research, health services research, and economic research (modified from [12]).
Healthcare services have hitherto been evaluated by objective evidence of PoP, whereas final policy decisions are almost always made on the basis of the subjective estimation of value [17,18]. Because RWE is measurable, the final subjective decision can be supported by data that are closer to the objectifiable value of a healthcare service than the proof of the PoP.
Our assumption that we doctors make arbitrary decisions in everyday care is probably incorrect. Every doctor makes an implicit effort to adapt his strategy to the individual risk profile of his patient. However, this strategy has not yet been standardized [16]. In everyday healthcare, we can distinguish three conditions under which healthcare services are offered. The experimental conditions require conducting a structured RCT. These studies are well known. We recently described two functional and twelve formal criteria to distinguish these conditions from healthcare conditions under the non-structured conditions of everyday care [17,18]. In PCTs, which detect effects induced under everyday conditions, six of the twelve formal criteria are consistent with the criteria used in experimental studies, and four criteria are consistent with the criteria used in the non-structured conditions of everyday care. Two criteria of the PCT differ from all criteria of an experimental study and the criteria of everyday care [18].
Importance of the Study Question, The Study Conditions (Including Selection Criteria), and the Interpretations
Under the title “Front-end-processor”, we present data suggesting that every scientific question may be developed in four steps. If these four steps do not correspond exactly in terms of content, the risk of answering the scientific question incorrectly will increase [18]. We also address the necessary congruence of the forms (structures) and functions of research methods. Experimental methods cannot be used for the analysis of outcomes of (non-experimental) care as usual [18]. The reproduction of reported outcomes may be impossible unless the risk profiles of the investigated patients are known [19]. Scientists and policy makers use the same (experimental) data to “make” and to “take” different types of decisions [20]. Using the example of breast cancer screening, we show that the same data, analyzed in different ways, can quantify both the objective risks and the subjective perception of objective risks [18,21]. More attention should be paid to the study conditions and the selection criteria because, without their descriptions, it is difficult to draw conclusions about the scope of the results collected [20]. Both the chosen study conditions and the chosen selection criteria influence the study results via direct and indirect effects. Direct effects concern the exact formulation of the study objective, the results obtained, and their interpretability. Indirect effects of a healthcare study define its scope. Since the selection criteria in studies that are included in meta-analyses and/or confirm similar effects are often only vaguely defined, a sharp delineation of the scope of clinical trial results cannot always be deduced [17,18].
The outcome dimension (PoP, RWE, or Val) also needs to be defined because the demonstration of each of these dimensions requires different methods and strategies. The correspondence between the precise question to be answered and the choice of the most appropriate endpoints to achieve this answer is crucial for the quality of the responses obtained. The more differentiated the inclusion criteria of a study, the more uniform (but also smaller) the population studied and the more likely it will be possible to obtain consistent results.
Understanding the significant differences between inclusion and exclusion criteria is important. Unlike exclusion criteria, inclusion criteria are required for any form of health-related study. Exclusion criteria exist only in experimental studies, but not in studies describing everyday healthcare, like the PCT or studies describing the subjectively perceived added value of a healthcare service or health-related quality of life [9,19]. Both exclusion and inclusion criteria depend on the study question. The function of exclusion criteria, however, is to describe all subjects who exhibit any of the confounding factors that may bias the measurement of the primary endpoint of an experimental study. It should be noted that exclusion criteria only address treatment with the study medication and selection of the study population, not treatment of the excluded subjects with all other therapies. In other words, exclusion criteria protect the evidence of PoP from bias, but compromise the evidence of RWE because the risk profiles of patients investigated in experimental studies will barely meet the conditions of care as usual.
Deriving the Consequences
The results of all decision-relevant studies, whether experimental or pragmatic, have often been applied without exact consideration of the selection (inclusion and exclusion) criteria [19]. When evaluating services in the health care system, we scientists should pay close attention to the patient population being examined. When highly selected (experimental) populations have been investigated, recommendations for everyday care can hardly be derived. Knowledge of our patients’ endpoint-specific risk profiles can improve the quality and efficiency of healthcare. The benefits of analyzing endpoint-specific risk profiles should be particularly evident in very large study groups, as the expected variance of these profiles will be rather high. For a systematic analysis, comparable risk profiles are to be stratified into similar (high, intermediate, low) endpoint-specific risk classes. This requirement can only be met if this classification is carried out according to jointly defined criteria. These considerations presuppose the willingness to cooperate in very large projects.
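The stratified comparison described above can be illustrated with a minimal sketch. It assumes that each patient record already carries an endpoint-specific risk class and the therapy received; the records, the therapy labels "A" and "B", and the function name are hypothetical. The sketch uses plain stratified frequencies and is not the Bayesian analysis itself.

```python
from collections import defaultdict

# Hypothetical records: (endpoint-specific risk class, therapy, endpoint reached?)
records = [
    ("low", "A", True), ("low", "A", True),
    ("low", "B", True), ("low", "B", False),
    ("high", "A", False), ("high", "A", True),
    ("high", "B", False), ("high", "B", False),
]

def success_rates_by_stratum(rows):
    """Return {(risk_class, therapy): share of patients who reached the endpoint}."""
    counts = defaultdict(lambda: [0, 0])  # [successes, patients]
    for risk_class, therapy, reached in rows:
        cell = counts[(risk_class, therapy)]
        cell[0] += int(reached)
        cell[1] += 1
    return {key: successes / total for key, (successes, total) in counts.items()}

rates = success_rates_by_stratum(records)
for (risk_class, therapy), rate in sorted(rates.items()):
    print(f"{risk_class} risk, therapy {therapy}: {rate:.2f}")
```

Comparing therapies only within the same risk class keeps patients with very different risk profiles from being pooled into one figure.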
Discussion and Summary
Evidence of fitness for daily use should be demonstrated for all interventions applied in healthcare. The shift of focus from PoP to RWE can be justified by the following:
- RCT studies only involve a highly selected patient population in which the major risk factors affecting the measured primary endpoint have been eliminated by exclusion criteria. Exclusion criteria are not applied in a PCT because they would not accurately describe the population receiving healthcare under everyday conditions.
- An RCT limits the choice of healthcare options to the few interventions that can be compared and interpreted in an RCT. The PCT does not limit the choice of healthcare options. Each participant chooses the intervention expected to produce the optimal outcomes for the individual patient. This produces the healthcare conditions applied under everyday medical practice.
- An RCT is expected to ensure the equal distribution of all risk factors not excluded in the study populations. This, however, can hardly be confirmed because the size of the studied population depends on a large number of variables, like the number of risk factors, their effect sizes, and their interrelationships. The smaller the population studied in an RCT, the greater the danger that not all risks will be equally distributed in the randomized groups.
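The last point, that small randomized populations are more likely to show unequal risk distributions, can be explored with a small simulation. This is a sketch under simplifying assumptions that are not taken from the text: a single binary risk factor, a prevalence of 30%, two equally sized arms, and simple randomization by shuffling. The function name and all parameter values are hypothetical.

```python
import random

def mean_imbalance(n_patients, prevalence=0.3, n_trials=300, seed=0):
    """Average absolute difference in risk-factor prevalence between two arms."""
    rng = random.Random(seed)
    half = n_patients // 2
    total = 0.0
    for _ in range(n_trials):
        # Each patient carries the risk factor with the given probability.
        carriers = [rng.random() < prevalence for _ in range(n_patients)]
        # Simple randomization: shuffle, then split into two equal arms.
        rng.shuffle(carriers)
        arm_a, arm_b = carriers[:half], carriers[half:2 * half]
        total += abs(sum(arm_a) / half - sum(arm_b) / half)
    return total / n_trials

for n in (20, 200, 2000):
    print(n, round(mean_imbalance(n), 3))
```

Under these assumptions, the average imbalance shrinks as the study population grows. With several interacting risk factors, the population needed for a comparable balance would be larger still.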
Progress in healthcare can be achieved step by step. Patients receiving care will only notice that considerably more data are collected than before, while the care itself will remain unchanged for the time being. The advanced data collection will require several basic steps.
- Selection of the clinical health problem to be analyzed.
- Definition of the targeted endpoints of care in advance.
- Definition of the potential risk factors of the patients that may impair the achievement of these endpoints, i.e. the “endpoint-specific risk lists (ESRLs)”.
- Based on these ESRLs, clinical expert teams can form different endpoint-specific risk classes (ESRCs; high, intermediate, low).
- To evaluate the care outcomes, each patient treated is assigned to a defined ESRC (high, intermediate, low) for each measured endpoint. The methods of AI enable this complex data assessment and collection, which includes not only the risk profile of the patient but also a classification of the therapeutic measures. Usually, multiple health problems require multiple therapies in parallel in the majority of patients [12-18].
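The steps listed above can be sketched as a small data structure plus an assignment rule. Everything specific in this sketch is a hypothetical placeholder: the endpoint names, the risk factors in the lists, and the rule that maps the number of risk factors present to a risk class. In the approach described here, clinical expert teams would define the classes; a simple count is used only to keep the example short.

```python
# Step 3 (hypothetical content): endpoint-specific risk lists (ESRLs).
ESRL = {
    "survival_1y": ["age_over_75", "metastases", "renal_failure"],
    "quality_of_life": ["chronic_pain", "depression"],
}

def risk_class(patient_factors, endpoint):
    """Step 4 (assumed rule): map the count of listed risk factors to an ESRC."""
    present = sum(1 for factor in ESRL[endpoint] if factor in patient_factors)
    if present == 0:
        return "low"
    if present == 1:
        return "intermediate"
    return "high"

def classify(patient_factors):
    """Step 5: assign the patient one ESRC per measured endpoint."""
    return {endpoint: risk_class(patient_factors, endpoint) for endpoint in ESRL}

# Hypothetical patient with three documented risk factors.
patient = {"metastases", "renal_failure", "depression"}
print(classify(patient))
```

The same patient can fall into different risk classes for different endpoints, which is why the classification is carried out separately for each endpoint.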
The necessary increase in data collection may be perceived by doctors as a burden similar to the demand for randomization 30 years ago. Nevertheless, there will be a significant difference because the assessment of the risk profile will seem plausible for patients and physicians and, unlike randomization, will not affect the relationship between physician and patient.
A change in our traditional way of thinking is necessary to accept that the proof of everyday suitability of healthcare services, i.e., the new field of healthcare-services research, requires two different healthcare conditions (twin method) [15]: Care must be provided under the non-structured everyday conditions of ‘natural chaos’ prevailing in patient care, while the evaluation of healthcare outcomes requires precisely structured tools, like Bayesian statistics, with no reciprocal influence between these two methods, the care as usual and the method used for the analysis of the data. This comment should appeal to colleagues who share our concern that the uncritical interpretation of the results of experimental RCTs could affect the financial viability of our health systems.
Several scientists doubted the results of the RCTs [22,23]. However, it is possible that the method of the RCT itself is not the problem. The randomization of patients (in contrast to well-defined objects) requires compliance with specific framework conditions, e.g. the exclusion of certain risks. Consequently, the interpretation of the results of an RCT will only be valid if the limitations defined by the framework conditions are actually taken into account. Otherwise, the effects that can actually be achieved will be overestimated. RCTs cannot provide detailed data to derive new care concepts because the risk profiles of the patients treated will be too different. This assumption suggests that PCTs should not be carried out regionally, but at the national level. Without taking into account the national perspective, there is a risk of overlooking risks that are specific to certain regions. In other words, the orientation of care towards endpoint-specific risk profiles (ESRPs) is probably more complex than originally expected.
This commentary does not claim to discuss every detail of the assessment of care effects. However, it pursues the goal of increasing interest in proving RWE. Without this proof, it will hardly be possible to prove the efficiency of health services.
Acknowledgements
This comment could only be created by the valuable contributions of my colleagues: Christel Weiss (Medical Statistics, Medical Faculty of the University of Heidelberg, 68167 Mannheim / Germany), Meret Phlippen (Department ENT, University Hospital Dresden, 01307 Dresden / Germany), Felicitas Wiedemann (CLARUNIS, University Digestive Health Care Center, Basel, 4002 Basel / Switzerland), Reinaldo A. Silva-Sobrinho (Dept. Public Health, Universidade Estadual do Oeste do Paraná, CEP 85819-110, Brazil), Paulo CM Mayer (Dept. Psychology, Universidade CEUMA Imperatriz, MA, 65903-093, Brazil), and Manfred Weiss (Dept. Anesthesiology and Intensive Care Medicine, University Hospital Ulm, 89081 Ulm / Germany).
References
- Bowman L, Weidinger F, Albert MA, Fry ETA, Pinto FJ (2023) Randomized Trials Fit for the 21st Century. A Joint Opinion from the European Society of Cardiology, American Heart Association, American College of Cardiology, and the World Heart Federation. J Am Coll Cardiol 81: 1205-1210.
- Park JJH, Detry MA, Murthy S, Guyatt G, Mills EJ (2022) How to use and interpret the results of a platform trial: users’ guide to the medical literature. JAMA 327: 67-74.
- Gordon AC, Angus DC, Derde LPG (2021) Interleukin-6 Receptor Antagonists in Critically Ill Patients with Covid-19. Reply. N Engl J Med 385: 1147-1149.
- Higgins AM, Berry LR, Lorenzi E, Murthy S, McQuilten Z, et al. (2023) Long-term (180-Day) Outcomes in Critically Ill Patients With COVID-19 in the REMAP-CAP Randomized Clinical Trial. JAMA 329: 39-51.
- Barnett ML, Sax PE (2023) Long-term Follow-up After Critical COVID-19: REMAP-CAP Revisited. JAMA 329: 25-27.
- Schwartz D, Lellouch J (1967) Explanatory and pragmatic attitudes in therapeutic trials. J Chron Dis 20: 637-648.
- Grimes DA, Schulz KF (2002) An overview of clinical research: the lay of the land. Lancet 359: 57-61.
- Thiese MS (2014) Observational and interventional study design types; an overview. Biochem Med 24: 199-210.
- Wiedemann F, Porzsolt F (2022) Measuring Health-Related Quality of Life in Randomised Controlled Trials: Expected and Reported Results Do Not Match. Pragmat Obs Res 13: 9-16.
- Haynes B (1999) Can it work? Does it work? Is it worth it? The testing of healthcare interventions is evolving. BMJ 319: 652-653.
- Sullivan LH (1896) The tall office building artistically considered. Lippincott’s Magazine 57: 403-409. Reprinted in Inland Architect and News Record 27 (May 1896), pp. 32-34; Western Architect 31 (January 1922), pp. 3-11; published as “Form and Function Artistically Considered” The Craftsman 8 (July 1905), pp. 453-458.
- Porzsolt F, Eisemann M, Habs M, Wyer P (2013) Form Follows Function: Pragmatic Controlled Trials (PCTs) have to answer different questions and require different designs than Randomized Controlled Trials (RCTs). Z Gesundh Wiss 21: 307-313.
- Porzsolt F, Rocha NG, Toledo-Arruda AC, Thomaz TG, Moraes C, et al. (2015) Efficacy and Effectiveness Trials Have Different Goals, Use Different Tools, and Generate Different Messages. Pragmat Obs Res 6: 47-54.
- Porzsolt F, Wiedemann F, Phlippen M, Weiss C, Weiss M, et al. (2020) The terminology conflict on efficacy and effectiveness in healthcare. J Comp Eff Res 9: 1171-1178.
- Porzsolt F, Weiss C, Weiss M (2023) Covid-19: Twin method for demonstration of real-world effectiveness (RWE) under the conditions of day-to-day care. Gesundheitswesen 84: 22-25.
- Porzsolt F (2025) The Call for Progress in Evidence-based Medicine. Qeios ID: 3TDWGW. http://doi.org/10.32388/3TDWGW. Jan. 24th, 2025.
- Porzsolt F, Weiss M, Weiss C (2024) Applying the Rule of Designers and Architects “Form Follows Function (FFF)” Can Reduce Misinterpretations and Methodical Shortcomings in Healthcare. Trends Gen Med 2: 1-7.
- Porzsolt F, Phlippen MS, Legrum P, Weiss M (2024) The Front-End Processor Developed by Engineers — A Useful Tool for Describing the Quality and Quantity of Progress in Healthcare. Qeios ID: 8PWWZD.
- Porzsolt F, Wiedemann F, Becker SI, Rhoads CJ (2019) Inclusion and exclusion criteria and the problem of describing homogeneity of study populations in clinical trials. BMJ Evid Based Med 24: 92-94.
- Muir Gray JA (2004) Evidence based policy making. BMJ 329: 988-989.
- Porzsolt F, Pfuhl G, Kaplan RM, Eisemann M (2021) Covid-19 pandemic lessons: Uncritical communication of test results can induce more harm than benefit and raises questions on standardized quality criteria for communication and liability. Health Psychol Behav Med 9: 818-829.
- Krauss A (2018) Why all randomised controlled trials produce biased results. Ann Med 50: 312-322.
- Jureidini J, McHenry LB (2022) The illusion of evidence based medicine. BMJ 376: o702.
© by the Authors & Gavin Publishers. This is an Open Access Journal Article Published Under Attribution-Share Alike CC BY-SA: Creative Commons Attribution-Share Alike 4.0 International License.