Power posing: a seemingly simple solution for nerves. Image credit: PopTech via Flickr
A simple enough solution for public speaking nerves or a high-stakes job interview: look powerful. Amy Cuddy, a prominent social psychologist, spoke passionately about her research in a 2012 TED Talk. Studies she'd worked on demonstrated that by simply posing like a superhero, one could alter not only one's behavior but one's physical self: decreased stress hormones, increased confidence and better life outcomes were all shown to stem from holding a stance for mere minutes.
"Don't fake it till you make it," Cuddy says in her talk. "Fake it till you become."
When researchers attempted to rerun the study with different subjects or larger samples, though, they did not observe the same effect. In fact, they were unable to find anything close to the magnitude of the original studies. They published their findings.
Power posing quickly became an example of psychology's "replication crisis." In the early 2010s, after decades of research, academics began sharing their attempts to build on previous papers, only to find they could not demonstrate the original effects at all. Social psychology, with its famed studies that had seeped into popular culture, was undergoing a seismic shift.
[Chart: Replications often don't generate the same magnitude of effect. Scatter plot of normalized effect sizes, original effect (x-axis) vs. replication effect (y-axis), with power posing highlighted near zero. Source: FORRT Replication Database]
One point on that chart is a study that attempted to replicate the power posing hypothesis. A successful replication would sit at or near the diagonal, where the replication effect matches the original. Instead, power posing shows a near-zero effect in its replication compared to the original study. Of over 500 verified studies, most show no replication effect: anything below the diagonal found a smaller effect size than previously published, and over 50 even showed the opposite effect of their original findings.
The full extent of the replication crisis is difficult to measure. Researchers may choose to replicate studies for various reasons, which can bias the kinds of studies replicated most often, favoring, for example, simpler study designs or more plausible hypotheses.
The Framework for Open and Reproducible Research Training, or FORRT, collects replication attempts of psychological studies in an open-source, independently verified database. Users can submit replication attempts found throughout publications, and the organization verifies the results in the name of "advancing research transparency, reproducibility, rigor, and ethics through pedagogical reform and meta-scientific research."
Of the 505 collected replications, over 60% did not find the statistically significant effects reported in the original studies. Others found results that were significant but demonstrated the opposite direction of the original effect.
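Tallying those outcomes is straightforward once the database is exported; here is a sketch assuming a CSV export with an outcome column (both the file name and the column name are hypothetical):

```python
import pandas as pd

# Load FORRT's replication database; "forrt_replication_database.csv"
# and the "outcome" column are assumptions about the export format.
df = pd.read_csv("forrt_replication_database.csv")

# Share of replications by result: success, failure, no effect,
# opposite effect.
shares = df["outcome"].value_counts(normalize=True)
print(shares)
```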
[Chart: More than 60% of attempted replications failed, with some even showing reverse effects. Replication studies by result: success, failure, no effect, opposite effect. Source: FORRT Replication Database]
Brian Nosek is a psychologist, professor and co-founder of the Center for Open Science who helped bring
transparency to the field. In graduate school, he found that colleagues could not replicate seminal papers
but had no incentive to share that with the larger field.
"We'd go to the bar at the conference, and other labs would say, 'we can't replicate that either,'" Nosek
said. "Why aren't you publishing this?"
Flawed studies and hypotheses became popular for many reasons. First, publications preferred results with significant effects, meaning journals often ignored papers that showed unsuccessful replications or disproven hypotheses. More extreme effects also draw more attention from readers, incentivizing journals to favor larger, more exciting findings.
The focus on publishable results also led individual academics to skew their analyses toward data that showed a statistically significant result. Usually, this does not mean actively manipulating or changing raw test data. Instead, by continuously filtering data into different subgroups, such as by age or gender, researchers introduce the chance that some relationships look significant when the differences are, in actuality, most likely due to chance. This method of analyzing data until a significant result appears is often called 'p-hacking,' after the p-value used to determine statistical significance.
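A minimal simulation sketches why this inflates false positives. Everything here is synthetic: the outcome is pure noise, so any "significant" subgroup is significant by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def any_significant_subgroup(n=200, n_groups=8, alpha=0.05):
    """Run one null experiment and p-hack it: slice a noise-only
    outcome into arbitrary subgroups and t-test each slice."""
    outcome = rng.normal(size=n)           # pure noise: no real effect
    labels = rng.integers(0, n_groups, size=n)
    pvals = [
        stats.ttest_ind(outcome[labels == g], outcome[labels != g]).pvalue
        for g in range(n_groups)
    ]
    return min(pvals) < alpha

trials = 2000
rate = np.mean([any_significant_subgroup() for _ in range(trials)])
# With eight subgroup tests per experiment, far more than 5% of these
# noise-only experiments yield at least one p < 0.05.
print(f"Experiments with a 'significant' subgroup: {rate:.0%}")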
Another culprit is sample bias, which can occur when a study uses too few participants or participants who don't represent the broader population. Studies are often run exclusively on undergraduates at the researcher's own university, with the results then extrapolated to the general public.
Specific domains of psychology saw substantial shifts in citations and in the number of new publications, Nosek said. The fields most affected by the unsuccessful replications were ego depletion (exhausting someone's willpower to change their behavior), priming (showing or telling a person something in advance to change their reaction to other stimuli) and terror management (how awareness of death alters cultural significance and personality), all of which fall under the umbrella of social psychology.
Of the subdomains listed in the FORRT database, only 22% of findings were successfully replicated in social psychology. Differential psychology, the most replicable field in the dataset, focuses on individual and group differences in behavior.
[Chart: Social psychology was the subdomain with the largest percentage of failed replications. Percentage of successful replications by subdomain: social psychology 25%, general psychology 32%, marketing 41%, experimental philosophy 74%, differential psychology 89%. Source: FORRT Replication Database]
At the level of individual research papers, however, there has been relatively little change, according to Nosek. Citations of unsuccessfully replicated papers have not been shown to decline meaningfully after the replication.
Joining the FORRT database with citation counts from Google Scholar, I found little difference between successful replications and failures: papers whose replications all succeeded had a median of 205 citations, compared with 201 for papers with at least one failed replication.
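That comparison reduces to a small grouped median, sketched below under an assumed schema for the merged table (the file and column names are hypothetical):

```python
import pandas as pd

# merged.csv joins the FORRT records with scraped Google Scholar
# citation counts; one row per original paper.
merged = pd.read_csv("merged.csv")

# Median citations, split by whether all replications succeeded.
medians = merged.groupby("replication_successful")["citations"].median()
print(medians)  # in this analysis: 205 (successful) vs. 201 (failed)
```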
This finding contradicts other papers: researchers have previously demonstrated that overall citation counts are higher on average for unsuccessfully replicated studies, most likely due to their more extreme findings.
Notwithstanding the debate around citation counts of individual papers, psychology as a whole responded with advancements in transparency and the creation of a metascience community. The Center for Open Science popularized Registered Reports, in which researchers' methodology is peer-reviewed and accepted for publication before the results are known. Nosek also pointed to increased reproducibility, which will benefit accountability overall.
"It has to happen at a scale broader than the individual by individual replication to have impact," Nosek
said.
Methodology
The data was based on and downloaded from FORRT's replication database, found here. After compiling the data, I used each reference to gather total citation counts from Google Scholar, utilizing Playwright to automatically search the list of references and BeautifulSoup to scrape the results. I also checked each title to ensure that the first entry in Google Scholar's results was the correct study from the database. To bypass Google's CAPTCHAs, I used an automated extension called NopeCHA, which solves the CAPTCHA tasks that prevent bots from accessing the site. I then calculated the median citation counts based on whether or not the replication was successful.
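A simplified sketch of that lookup follows. The Scholar CSS selectors and the exact-match check are assumptions that mirror the approach described above, not the production script; Scholar's markup changes often, and the NopeCHA extension setup is omitted for brevity.

```python
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright


def citation_count(title: str) -> int | None:
    """Search Google Scholar for a title and return the 'Cited by'
    count of the first result, or None if it doesn't match."""
    with sync_playwright() as p:
        # In the full pipeline, the NopeCHA extension would be loaded
        # via a persistent browser context to handle CAPTCHAs.
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(f"https://scholar.google.com/scholar?q={quote_plus(title)}")
        html = page.content()
        browser.close()

    soup = BeautifulSoup(html, "html.parser")
    first = soup.select_one(".gs_ri")  # first organic result block
    if first is None:
        return None
    # Confirm the top hit is actually the study we searched for.
    heading = first.select_one(".gs_rt")
    if heading is None or title.lower() not in heading.get_text().lower():
        return None
    # The result's footer links include "Cited by N" when citations exist.
    for link in first.select(".gs_fl a"):
        if link.get_text().startswith("Cited by"):
            return int(link.get_text().split()[-1])
    return 0
```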