Levels of Evidence

How to Learn What We Want to Know

By Diana L. Thompson
[Somatic Research]

I received an email recently berating the Massage Therapy Foundation for supporting a research study that did not include a control group. It gave me pause to consider that we, too, are going along with the belief that randomized controlled trials (RCTs) are the gold standard for all health-care research.

By traditional standards, RCTs are widely touted as the gold standard. But as we look for innovations in health-care delivery and insurance coverage, and as new public health policy is created, perhaps we should also look for new ways of informing our practices and educating the public about the use and benefits of complementary and alternative medicine (CAM) therapies. We should also support studies that have real-world implications and accurately represent how therapies are practiced.

Let’s start by reviewing the levels of research evidence (see diagram on page 118). A basic understanding of which research designs are commonly used, and why, will lay the groundwork for thinking beyond what currently exists. Then, we will look at research methods and at which types of studies can best tell us what we need to know to be safe and effective with our clients, and to secure our place in health care and wellness as the public health environment shifts to include prevention. Finally, we will place this in the context of what is happening this year as the National Center for Complementary and Alternative Medicine (NCCAM) determines its research funding priorities for the next five years, and how its decisions may affect somatic research.

Levels of Evidence

Assigning a level of evidence is like giving a research design a grade. An “A,” or the highest level of evidence, is awarded to studies that provide information applicable to a large group of people, a quality known as generalizability.

Bias also plays a role in determining the level of evidence. Reducing bias, that is, ensuring that the results are directly related to the intervention and not to chance or other outside influences, is the goal of every researcher reaching for high marks. Let’s look at some examples of research designs to see how this works.

Case Reports

Case reports document an interaction between one practitioner and one client. For example, a massage therapist has an elderly client suffering from lack of sleep. After several massage sessions, the client’s sleep improves and the practitioner decides to write up the results for publication. The information takes the form of a case report and contributes to the body of knowledge of our profession, informing other practitioners, referring caregivers, educators, and researchers. Case reports are a mechanism for recording information on typical and atypical clinical interactions, and may focus on anything from the interview questions, assessment techniques, and treatment to client demographics and clinical settings. While the results are not generalizable to a larger population (they show only how one person reacted to the intervention of one practitioner), they offer a perspective on what is possible and suggest what might warrant further study. Case reports can inform and shape a potential hypothesis for a larger study.

Bias is inherent in case reports. The client has already chosen the practitioner and the type of treatment, so he or she is more invested in the treatment’s success than someone randomized into an unknown protocol with an unknown practitioner would be.

Case reports are a low level of evidence. The data is not generalizable and bias is inherent.

Pilot Studies

Once an interest is identified (for example, sleep disturbances in older adults) and a hypothesis is formed (can massage therapy improve sleep in older adults?), a pilot study can be used to test the hypothesis on a larger group of people and to identify the feasibility of, and refinements for, further study. A pilot study does not include a comparison or control group and often uses a small set of participants: more than one, but usually fewer than 50. A pilot study is an opportunity to see whether the study design is effective in answering the question posed in the hypothesis. Once the protocol (intervention or techniques) and methods (the selection of participants, the application of the protocol, the measurement tools, etc.) are tested and refined, a comparison trial or RCT may be conducted.

Bias is slightly lower in pilot studies than in case reports. The participants are recruited from outside the practitioner’s existing clientele, and they do not select the practitioner. But because the study provides only one intervention, participants knowingly agree to receive that treatment, which still tends toward bias.

Pilot studies are more generalizable and have less bias than case reports, and therefore have a higher level of evidence.

Randomized Controlled Trials

RCTs typically compare two or more clinical interventions to determine which treatment is best for an identified population. RCTs involve recruiting a large number of participants with a particular condition (sleep disturbances), and a computer program or other blinded selection process randomly funnels them into one of a few different arms of a research project (music therapy, aromatherapy, massage therapy). In randomized trials, the participants do not select the type of treatment nor do they select the practitioner, thereby limiting bias.
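To make the “computer program” in this process a little more concrete, here is a minimal Python sketch of one common approach, permuted-block randomization, applied to the three arms in the sleep example. It is an illustration under stated assumptions, not how any particular trial does it: the function name `block_randomize` and the participant IDs are hypothetical, and real trials typically have a statistician generate and conceal the allocation list.

```python
import random

# The three arms from the sleep-disturbance example in the text.
ARMS = ["music therapy", "aromatherapy", "massage therapy"]


def block_randomize(participant_ids, arms=ARMS, seed=None):
    """Assign participants to arms using permuted blocks (hypothetical helper).

    Each block contains every arm exactly once, in a random order, so the
    arms stay close to equal in size as enrollment proceeds. Neither the
    participant nor the practitioner chooses the arm.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation list reproducible for auditing
    assignments = {}
    block = []
    for pid in participant_ids:
        if not block:  # start a new shuffled block once the previous one is used up
            block = list(arms)
            rng.shuffle(block)
        assignments[pid] = block.pop()
    return assignments


if __name__ == "__main__":
    ids = [f"P{n:03d}" for n in range(1, 13)]  # hypothetical participant IDs
    for pid, arm in block_randomize(ids, seed=2010).items():
        print(pid, arm)
```

With 12 participants and three arms, every arm ends up with exactly four people, while the order in which the arms appear is left to chance. Simpler schemes that flip a “coin” for each person also work but can leave the arms unequal in size.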

The term controlled often refers to a placebo treatment: something that looks like the intended intervention but doesn’t contain any healing properties. Placebos are difficult to design for somatic therapies. One problem is that we have yet to identify which component of massage therapy (the one-on-one interaction, the healing intent, the touch itself or type of touch, the relaxing environment) is most critical in an intervention, or whether the combination of the various aspects of a session is what makes the intervention effective. Even if an individual component of the session could be isolated, removing its healing properties to create a placebo would be virtually impossible.

Control groups may then become comparison groups, as in the example above, which compares massage therapy to music therapy or aromatherapy. Often, the comparison groups include a “usual care” arm (a continuation of what the primary care physician has prescribed, such as stretching or self-care exercises) to determine whether massage therapy is more or less effective than the standard of care. The inherent problem with all of these examples is that a typical massage therapy session may include all of these options: stretching homework and self-care education are common in massage therapy sessions, as are music and scents.

All that aside, RCTs are generalizable and limit bias, earning the gold-standard seal of approval for clinical trials.

Meta-Analyses

Meta-analyses combine data from similar studies to create a larger pool of evidence. For example, a search on “massage and sleep” identifies 300 articles. The articles are then included or excluded according to set criteria, leaving 50. The measurement tools used in those studies are evaluated and commonalities identified so the data can be combined. Many meta-analyses consider only RCTs, putting somatic research at a disadvantage.

Although rigorous scientific research on massage therapy has been accumulating in recent years, there are comparatively few RCTs involving massage interventions (of 9,373 research articles on massage in CAM on PubMed, 762, or about 8 percent, were randomized controlled trials).1 As a result, many meta-analyses simply say there is not enough data available to draw conclusions about the effectiveness of massage therapy.

Inclusion/exclusion criteria often eliminate studies that are not generalizable or where bias is evident, producing data that represents the highest level of evidence available.
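For readers who want to see what “combining data” can look like, here is a minimal Python sketch of one widely used pooling technique, the fixed-effect inverse-variance method, paired with an RCT-only inclusion rule like the one described above. It is an illustrative assumption, not the method of any particular meta-analysis: the `Study` record, the `include` rule, and all of the numbers are invented, and many real meta-analyses use random-effects models and dedicated software instead.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    effect: float      # e.g., a standardized mean difference in a sleep score
    std_error: float   # standard error of that effect estimate
    randomized: bool   # True if the study was an RCT


def include(study, rct_only=True):
    """Hypothetical inclusion rule: many meta-analyses keep only RCTs."""
    return study.randomized or not rct_only


def pool_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling.

    Each study is weighted by w = 1 / SE**2, so more precise studies count more.
    Pooled effect = sum(w * effect) / sum(w); pooled SE = sqrt(1 / sum(w)).
    """
    if not studies:
        raise ValueError("no studies left after applying inclusion criteria")
    weights = [1.0 / s.std_error ** 2 for s in studies]
    total = sum(weights)
    pooled = sum(w * s.effect for w, s in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Invented numbers, for illustration only; not results from massage research.
    candidates = [
        Study("A", 0.40, 0.20, True),
        Study("B", 0.10, 0.10, True),
        Study("C", 0.60, 0.30, False),  # a pilot study: dropped when rct_only=True
    ]
    kept = [s for s in candidates if include(s)]
    effect, se = pool_fixed_effect(kept)
    print(f"{len(kept)} studies pooled: effect {effect:.2f} (SE {se:.2f})")
```

Note how the inclusion rule quietly discards study C before any pooling happens. This is the mechanism by which an RCT-only criterion can leave massage research with too few studies to support a conclusion.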

Other Research Methods to Consider

Are there so few RCTs on massage therapy because there is not enough funding dedicated to large massage studies, or because designing RCTs for massage applications poses enough inherent problems that we should look to other types of research?

The answer to both is yes. The National Institutes of Health (NIH) is the largest funder of health-care research in the United States; NCCAM is the division of NIH that funds complementary health-care research. While massage therapy is the public’s number one out-of-pocket CAM expense involving practitioner intervention (versus self-care such as vitamins and herbal remedies), only one percent of NCCAM’s funding goes to massage therapy research.2

In the real world, while time and money are spent determining whether an intervention is better than a placebo, somatic therapists are combining therapies and responding to the complexity of each individual client, altering the session as the tissue changes under our hands. Clients are combining modalities, trying to find the perfect environment for healing to take place. We do not live in an either/or world; we often combine manual lymph drainage with surgery and prescription drugs, sports massage with nutrition and exercise, or craniosacral therapy with chiropractic and yoga. Beyond the need to ensure safety, determining whether one modality is better than another seems less important than studying how modalities interact with and complement one another.

This does not mean we should stop investigating basic science (why and how massage works; for example, the mechanism at work when massage helps people sleep) or abandon the RCT design. Some aspects of RCTs make perfect sense: they increase generalizability and reduce bias. But what else is there, and where should the funding go?

In February 2009, with the adoption of the American Recovery and Reinvestment Act, $10.4 billion was earmarked for NIH research, renovations, and equipment upgrades.3 President Obama specifically called for funds to be spent on comparative effectiveness research (CER). CER is “a rigorous evaluation of the impact of different options that are available for treating a given medical condition for a particular set of patients. Such a study may compare similar treatments, such as competing drugs, or it may analyze very different approaches, such as surgery and drug therapy.”4 Anywhere from $400 million to $1 billion was to be set aside for CER.

Will CER give us the information we seek? It does not require that we invent a placebo for massage, which is positive. It does, however, favor an either/or rather than an add-on method of research. That is fine in the reductionist model, but again, it does not represent real-world massage applications.

Are there other applications of outcomes research? According to the U.S. Department of Health and Human Services: “Outcomes research seeks to understand the end results of particular health-care practices and interventions. End results include effects that people experience and care about, such as change in the ability to function. In particular, for individuals with chronic conditions—where cure is not always possible—end results include quality of life as well as mortality. No longer just the domain of a small cadre of researchers, outcomes research has altered the culture of clinical practice and health-care research by changing how we assess the end results of health-care services. In doing so, it has provided the foundation for measuring the quality of care.”5

It appears there is room for interpretation. Perhaps we should encourage NIH and NCCAM to prioritize funding CER that focuses on real-world outcomes. That sounds very similar to what Congress had in mind when it created NCCAM.

NCCAM 2011–2015 Strategic Plan

NCCAM is developing a strategic plan that will shape the next five years of research funding. In response to NCCAM’s call for comments on the new strategic plan, John Weeks of www.theintegratorblog.com pointed out some serious disconnects between this proposed direction, which will determine how $600 million is spent over the next five years, and the original mandate from Congress. The 1998 congressional mandate that established NCCAM identified practical, real-world outcomes as its focus, yet less than 1 percent of NCCAM’s budget has funded effectiveness and cost-effectiveness research.6

The emphasis has been on continuing to fund RCTs and basic science, as has been done for Western medical research, rather than shifting to a new paradigm and delving into what is possible with CAM therapies. Dozens of representatives from CAM organizations have written to NCCAM Director Dr. Josephine Briggs to share their concerns about the direction NCCAM is heading, the need to limit basic science research, and the need to find new methods beyond RCTs to study outcomes.

While the debate has slowed and we await the final version of NCCAM’s strategic plan, we can consider the possibility that all levels of evidence are necessary building blocks for identifying studies that can answer our questions, and that funding real-world outcomes research should take priority over RCTs and basic science research. Let’s ensure our government funds focus on discovering how to integrate massage and bodywork into health-care systems and preventive wellness models. And let’s discover ways of designing collaborative research to define how CAM therapies work together to ensure the health of our clients, rather than focusing on identifying the one best intervention.

An LMP since 1984, Diana Thompson has created a varied and interesting career out of massage, from specializing in pre- and postsurgical lymph drainage to teaching, writing, consulting, and volunteering. Her consulting includes assisting insurance carriers with integrating massage into insurance plans and educating researchers on massage therapy theory and practice to ensure research projects and protocols are designed to match how we practice. Contact her at soapsage@comcast.net.


1. Search of “complementary and alternative medicine” on PubMed, December 7, 2009.

2. National Center for Complementary and Alternative Medicine, “Statistics on Complementary and Alternative Medicine in the United States.” Available at http://nccam.nih.gov/news/camstats (accessed January 2010).

3. The Henry J. Kaiser Family Foundation, “Kaiser Daily Health Policy Report.” Available at www.kaisernetwork.org/Daily_Reports/rep_index.cfm?DR_ID=57761 (accessed January 2010).

4. U.S. Department of Health & Human Services, Office of Extramural Research, “NIH Challenge Grants in Health and Science Research (RC1).” Available at http://grants.nih.gov/grants/funding/challenge%5Faward (accessed January 2010).

5. U.S. Department of Health & Human Services, Agency for Healthcare Research and Quality, “Outcomes Research: Fact Sheet.” Available at www.ahrq.gov/clinic/outfact.htm (accessed January 2010).

6. John Weeks, “How NCCAM’s ‘Real World’ Congressional Mandate is Optimal for NCCAM’s 2010–2015 Strategic Plan,” The Integrator Blog. Available at http://theintegratorblog.com/site/index.php?option=com_content&task=view&id=606&Itemid=189 (accessed January 2010).