(UPDATE 2012-11-02: I made some changes to the prose to increase readability.)

Recently, I got into a slap fight with PolitiFactBias.com (PFB). The self-proclaimed PolitiFact whistleblower bristled at my claim that my estimate of the partisan bias of two leading fact checkers is superior to theirs. A recurring theme in the debate was PFB's finding that PolitiFact.com's "Pants on Fire" category, which PolitiFact reserves for egregious statements, is applied much more often to Republicans than to Democrats. Because "Pants on Fire" is the most subjective category on PolitiFact's Truth-O-Meter, PFB believes the disparity is evidence of PolitiFact's liberal bias.

I agree with PFB that the "Pants on Fire" category is highly subjective. That's why, when I calculate my factuality scores, I treat the category the same as I treat the "False" category. Yet treating the two categories the same doesn't account for selection bias. Perhaps PolitiFact is more likely to choose ridiculous statements that Republicans make so that it can rate them "Pants on Fire", rather than because Republicans tend to make ridiculous statements more often than Democrats.

One way to adjust for selection bias on ridiculous statements is to pretend that the "Pants on Fire" rulings never happened. Presumably, the rest of the Truth-O-Meter categories are less susceptible to partisan bias in the selection and rating of statements. Therefore, the malarkey scores calculated from a report card excluding "Pants on Fire" statements might be a cleaner estimate of the factuality of an individual or group.

To examine how excluding the "Pants on Fire" category affects the comparison of malarkey scores between Republicans and Democrats, I used Malark-O-Meter's simulation methods to statistically compare the collated malarkey scores of Rymney and Obiden after excluding the "Pants on Fire" statements from the observed PolitiFact report cards. The collated malarkey score adds up the statements in each category across all the individuals in a group (such as a campaign ticket), then calculates a malarkey score from the collated report card. I examine the range of values of the modified comparison in which we have 95% statistical confidence. I chose the collated malarkey score comparison because it is one of the comparisons my original analysis was most certain about, and because the collated malarkey score is a summary measure of the falsehood in statements made collectively by a campaign ticket.
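To make the collation step concrete, here is a minimal Python sketch. The category weights (a 0-to-100 scale with "Pants on Fire" weighted the same as "False", per my usual treatment) and the report card counts are illustrative assumptions, not Malark-O-Meter's actual code or data.

```python
# Minimal sketch of a collated malarkey score. The weights and counts
# below are illustrative assumptions, not Malark-O-Meter's actual values.
CATEGORIES = ["true", "mostly_true", "half_true",
              "mostly_false", "false", "pants_on_fire"]
WEIGHTS = {"true": 0, "mostly_true": 25, "half_true": 50,
           "mostly_false": 75, "false": 100, "pants_on_fire": 100}

def collate(report_cards):
    """Sum the ruling counts in each category across a ticket's members."""
    collated = {c: 0 for c in CATEGORIES}
    for card in report_cards:
        for c in CATEGORIES:
            collated[c] += card.get(c, 0)
    return collated

def malarkey_score(card):
    """Weighted average of ruling counts: 0 = truthful on average, 100 = pure malarkey."""
    total = sum(card.values())
    return sum(WEIGHTS[c] * n for c, n in card.items()) / total

# Hypothetical report cards for a two-person ticket:
ticket = [
    {"true": 20, "mostly_true": 18, "half_true": 15,
     "mostly_false": 8, "false": 6, "pants_on_fire": 1},
    {"true": 9, "mostly_true": 11, "half_true": 10,
     "mostly_false": 6, "false": 4, "pants_on_fire": 1},
]
print(malarkey_score(collate(ticket)))
```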

My original analysis suggested that Rymney spews 1.17 times more malarkey than Obiden (either that or fact checkers have 17% liberal bias). Because we have a small sample of fact checked statements, however, we can only be 95% confident that the true comparison (or the true partisan bias) leads to the conclusion that Rymney spewed between 1.08 and 1.27 times more malarkey than Obiden. We can, however, be 99.99% certain that Rymney spewed more malarkey than Obiden, regardless of how much more.

After excluding the "Pants on Fire" category, you know what happens to the estimated difference between the two tickets and our degree of certainty in that difference? Not much. The mean comparison drops to Rymney spewing 1.14 times more malarkey than Obiden (a difference of 0.03 times, whatever that means!). The 95% confidence intervals shift a smidge left to show Rymney spewing between 1.05 and 1.24 times more malarkey than Obiden (notice that the width of the confidence intervals does not change). The probability that Rymney spewed more malarkey than Obiden plunges (sarcasm fully intended) to 99.87%. By the way, those decimals are probably meaningless for our purposes. Basically, we can be almost completely certain that Rymney's malarkey score is higher than Obiden's.
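If you want to tinker with this sensitivity check yourself, here is a hedged sketch of the simulation logic, assuming each report card's category counts plus one parameterize a Dirichlet distribution over category proportions (the parameterization I describe in the second post on this page). The collated counts below are placeholders, not the candidates' actual report cards.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed weights, ordered [True, Mostly True, Half True,
# Mostly False, False, Pants on Fire]:
WEIGHTS = np.array([0, 25, 50, 75, 100, 100])

def simulate_scores(counts, n_sims=100_000, drop_pof=False):
    """Draw simulated malarkey scores from a Dirichlet with
    concentration = category counts + 1."""
    counts, weights = np.asarray(counts, dtype=float), WEIGHTS
    if drop_pof:  # pretend the "Pants on Fire" rulings never happened
        counts, weights = counts[:-1], weights[:-1]
    probs = rng.dirichlet(counts + 1, size=n_sims)
    return probs @ weights

# Placeholder collated counts, NOT the actual report cards:
rymney = [30, 40, 55, 45, 35, 12]
obiden = [45, 55, 60, 35, 20, 4]

for drop in (False, True):
    ratio = simulate_scores(rymney, drop_pof=drop) / simulate_scores(obiden, drop_pof=drop)
    lo, hi = np.percentile(ratio, [2.5, 97.5])
    print(f"drop PoF={drop}: mean ratio {ratio.mean():.2f}, "
          f"95% CI [{lo:.2f}, {hi:.2f}], P(ratio > 1) = {(ratio > 1).mean():.4f}")
```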

Why doesn't the comparison change that much after excluding the "Pants on Fire" rulings? There are two interacting, proximate reasons. First, the malarkey score is actually an average of malarkey scores calculated separately from the rulings of PolitiFact and The Fact Checker at The Washington Post. When I remove the "Pants on Fire" rulings from Truth-O-Meter report cards, it does nothing to The Fact Checker report cards or their associated malarkey scores.

Second, the number of "Pants on Fire" rulings is small compared to the number of other rulings. In fact, it is only 3% of the total sample of rulings across all four candidates, 2% of the Obiden collated report card, and 8% of the Rymney collated report card. So although Rymney has 4 times more "Pants on Fire" rulings than Obiden, it doesn't affect their malarkey scores from the Truth-O-Meter report cards much.

When you average one malarkey score that doesn't change all that much and another that doesn't change at all, the obvious result is that not much change happens.

What does this mean for the argument that including "Pants on Fire" rulings muddies the waters, even if I treat them the same as "False" rulings? It means that the differences I measure aren't affected heavily by the "Pants on Fire" bias, if it exists. So I'm just going to keep including them. This finding also lends credence to my argument that, if you want to call foul on PolitiFact and other top fact checkers, you need to cry foul on the whole shebang, not just one type of subjective ruling.

If you want to cry foul on all of PolitiFact's rulings, you need to estimate the potential bias in all of their rulings. That's what I did a few days ago, and it's what PFB hasn't done. I suggested a better way for them to fulfill their mission of exposing PolitiFact as liberally biased (a mission they've tried to downplay, but it clearly is their mission). Strangely, they don't want to take my advice. It's just as well, because my estimate of PolitiFact's bias (and their estimate) can just as easily be interpreted as an estimate of true party differences.
 
 
The other day, I posted estimates of the potential partisan and centrist bias in fact checker rulings. My post was critical of fact checking critics as politically different as PolitiFactBias.com and Rachel Maddow.

On Sunday, PolitiFactBias.com posted what they call a "semi-smackdown" of my claim that they provide little quantitative evidence that PolitiFact has a liberal bias.

I want to thank PolitiFactBias for engaging me in a rational debate. (I'm serious. This is good!) To show how grateful I am, I'm going to systematically tear their semi-smackdown to shreds. In the process, I will clear up points of confusion that PolitiFactBias.com (PFB.com) has about who I am, and about Malark-O-Meter's methods.

1. "Our pseudonymous subject goes by 'Brash Equilibrium.'"

My name is Benjamin Chabot-Hanowell. I prefer the Internet to know me as Brash Equilibrium, and I don't mind if people call me Brash in meatspace. The link between my true identity and my pseudonym is apparent on the Internet because I value transparency. That said, yes, call me Brash, not Benjamin.

2. "Brash goes through the trouble of adding Kessler's Pinocchios together with PolitiFact's 'Truth-O-Meter' ratings..."

I don't add the two types of report card together. Doing so would bias the estimate heavily in favor of PolitiFact, which posts many times more rulings than Kessler and is harder on Republicans than Kessler is. Instead, I calculate a malarkey score separately from each fact checker's report card (or collated report card, or subset of statements), then average the two scores for the same subset. Doing so gives the two fact checkers equal weight. I don't do this for my debate analyses because Kessler doesn't issue separate rulings for each statement made during the debates.
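A toy example of the difference between the two approaches, using made-up three-category report cards (`pf_card` and `fc_card` are hypothetical, not real counts):

```python
# Hypothetical report cards; PolitiFact posts many times more rulings.
pf_card = {"true": 80, "half_true": 120, "false": 100}  # Truth-O-Meter
fc_card = {"true": 10, "half_true": 12, "false": 8}     # The Fact Checker
W = {"true": 0, "half_true": 50, "false": 100}

def score(card):
    return sum(W[c] * n for c, n in card.items()) / sum(card.values())

# Adding the cards together lets PolitiFact's larger sample dominate:
pooled = {c: pf_card[c] + fc_card[c] for c in W}
print(score(pooled))                          # dominated by PolitiFact

# Averaging the two scores gives each fact checker equal weight:
print((score(pf_card) + score(fc_card)) / 2)
```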

3. "...and then calculates confidence intervals for various sets of ratings, based on the apparent assumption that the selection of stories is essentially random."

My confidence intervals don't assume anything about the selection of stories. What they do assume is that fact checkers assemble a sample of statements from a population of statements, which results in sampling error. The population from which those statements are selected could be everything that individual or group says. Or it could be the population of statements that are susceptible to whatever selection biases fact checkers have. Either way, the basic mechanics of calculating the confidence intervals are the same. The question is whether I have parameterized my sampling distribution properly. Basically, PFB.com is saying that I haven't.

But what would PFB.com have me do? Introduce a prior probability distribution on the concentration parameters of the Dirichlet that isn't equal to the counts in each category plus one? Where would my prior beliefs about those parameters come from? From PFB.com's allegations that PolitiFact cherry-picks liberal statements that are more likely to be true, whereas it cherry-picks conservative statements that are more likely to be false? Okay. What model should I use to characterize the strength of that bias, and its separate effects on conditional inclusion in each category?
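For concreteness, here is one hypothetical way such a bias model could enter the simulation. Every number in the `overselect` vector below is invented, which is exactly the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
WEIGHTS = np.array([0, 25, 50, 75, 100, 100])  # assumed weights, as above

def debiased_alpha(counts, overselect):
    """One hypothetical bias model: divide each category's counts by an
    assumed over-selection factor before adding one. Factors of 1
    everywhere recover the counts-plus-one parameterization I use."""
    return np.asarray(counts, dtype=float) / np.asarray(overselect) + 1

counts = [45, 55, 60, 35, 20, 4]  # placeholder collated report card
# Invented assertion: the fact checker over-samples this group's "False"
# and "Pants on Fire" statements by 20%:
overselect = [1.0, 1.0, 1.0, 1.0, 1.2, 1.2]
probs = rng.dirichlet(debiased_alpha(counts, overselect), size=10_000)
scores = probs @ WEIGHTS  # malarkey scores adjusted for the invented bias
```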

We don't know what model we should use because no one has statistically analyzed fact checker rating bias or selection bias, and that is the point of my article. Until someone does, we can only estimate how much bias might exist. To do this, we perform a thought experiment in which we assume that I am measuring fact checker bias instead of real differences among politicians. Doing so, I gave PFB.com two figures that it is free to use to support its argument that PolitiFact is biased (it will also have to assert that Glenn Kessler is biased; look for PolitiFactAndTheFactCheckerBias.com soon!).

Meanwhile, I am free to use my findings to support my argument that the Republican ticket is less factual than the Democratic ticket. The truth probably lies somewhere among those two extremes and a third: that fact checkers have a centrist bias, as partisan liberals allege. For now, we don't know exactly where the truth lies within that simplex of extremes. Although PFB.com's qualitative analysis suggests there might be some liberal bias, its authors rhetorically argue that there is a lot of bias. They actually argue that it's all bias! They present no statistical estimates of bias that cannot also be interpreted as statistical estimates of true differences.

4. "It's a waste of time calculating confidence intervals if the data set exhibits a significant degree of selection bias."

Item 3 soundly defended my methods against this criticism. In sum, it is not a waste of time. What is a waste of time? Assuming that you know how biased an organization is when you've no conclusive estimate of the strength of that bias whatsoever.

5. "Our case against PolitiFact is based on solid survey data showing a left-of-center ideological tendency among journalists, an extensive set of anecdotes showing mistakes that more often unfairly harm conservatives and our own study of PolitiFact's bias based on its ratings."

Survey data that shows journalists tend to be liberal doesn't automatically allow you to conclude that fact checker rulings are all bias. It doesn't give you an estimate of the strength of that bias if it exists. All it does is give one pause. And, yeah, it gives me pause, as I stated in my article when I conceded that there could be as much as 17% liberal bias in fact checker rulings!

6. "Our study does not have a significant selection bias problem."

I highly doubt that. That PFB.com makes this assumption about its research, which relies heavily on blog entries in which it re-interprets a limited subset of PolitiFact rulings, makes me as suspicious of it as it is suspicious of PolitiFact.

7. "Brash's opinion of PolitiFact Bias consists of an assertion without any apparent basis in fact."

And I never said it did. That is, in fact, the whole point of my article. Similarly, however, PFB.com's rhetoric about the strength of PolitiFact's bias has little evidentiary support. At least I recognize the gaps in my knowledge!

My methods, however, have much stronger scientific foundations than PFB.com's.

8. In response to one of my recommendations about how to do better fact checking, PFB.com writes, "How often have we said it? Lacking a control for selection bias, the aggregated ratings tell us about PolitiFact and The Fact Checker, not about the subjects whose statements they grade."

No. They tell us about both the subjects whose statements they grade and about the raters. We don't know the relative importance of these two factors in determining the results. PFB.com thinks it does. Actually, so do I. Our opinions differ markedly. Neither is based on a good estimate of how much bias there is among fact checkers.

Subjectively, however, I think it's pretty ridiculous to assume that it's all just bias. But I guess someday we'll see!

9. "We need fact checkers who know how to draw the line between fact and opinion."

Sorry, PFB.com, you're never going to get that. What we actually need is a statistical method to estimate the influence of political beliefs on the report cards of individuals assembled from the rulings of professional fact checkers, and then a statistical method to adjust for that bias.

10. "And critics who know enough to whistle a foul when "fact checkers" cross the line and conflate the two."

Yes. People like you and Rachel Maddow (strange bedfellows, to be sure!) are valuable whistle blowers. But your value isn't in estimating the strength of political bias among fact checkers.

UPDATE (same day): PFB.com and I fling more poo at one another here.
 

    about

    Malark-O-blog published news and commentary about the statistical analysis of the comparative truthfulness of the 2012 presidential and vice presidential candidates. It has since closed down while its author makes bigger plans.

    author

    Brash Equilibrium is an evolutionary anthropologist and writer. His real name is Benjamin Chabot-Hanowell. His wife calls him Babe. His daughter calls him Papa.

    what is malarkey?

    It's a polite word for bullshit. Here, it's a measure of falsehood. 0 means you're truthful on average. 100 means you're 100% full of malarkey. Details.

    what is simulated malarkey?

    Fact checkers only rate a small sample of the statements that politicians make. How uncertain are we about the real truthfulness of politicians? To find out, treat fact checker report cards like an experiment, and use random number generators to repeat that experiment a lot of times to see all the possible outcomes. Details.

    malark-O-glimpse

    Can you tell the difference between the 2012 presidential election tickets from just a glimpse at their simulated malarkey score distributions?

    [Figure: simulated malarkey score distributions; dark = pres, light = vp]

    fuzzy portraits of malarkey

    Simulated distributions of malarkey for each 2012 presidential candidate with 95% confidence interval on either side of the simulated average malarkey score. White line at half truthful. (Rounded to nearest whole number.)

    • 87% certain Obama is less than half full of malarkey.
    • 100% certain Romney is more than half full of malarkey.
    • 66% certain Biden is more than half full of malarkey.
    • 70% certain Ryan is more than half full of malarkey.
    (Probabilities rounded to nearest percent.)

    fuzzy portraits of ticket malarkey

    Simulated distributions of collated and average malarkey for each 2012 presidential election ticket, with 95% confidence interval labeled on either side of the simulated malarkey score. White line at half truthful. (Rounded to nearest whole number.)

    [Figure: fuzzy ticket portraits of the 2012 election tickets, 2012-10-16]
    • 81% certain Obama/Biden's collective statements are less than half full of malarkey.
    • 100% certain Romney/Ryan's collective statements are more than half full of malarkey.
    • 51% certain the Democratic candidates are less than half full of malarkey.
    • 97% certain the Republican candidates are on average more than half full of malarkey.
    • 95% certain the candidates' statements are on average more than half full of malarkey.
    • 93% certain the candidates themselves are on average more than half full of malarkey.
    (Probabilities rounded to nearest percent.)

    Comparisons

    Simulated probability distributions of the difference between the malarkey scores of one 2012 presidential candidate or party and another, with 95% confidence interval labeled on either side of the simulated mean difference. Blue bars are when Democrats spew more malarkey, red when Republicans do. White line and purple bar at equal malarkey. (Rounded to nearest hundredth.)

    • 100% certain Romney spews more malarkey than Obama.
    • 55% certain Ryan spews more malarkey than Biden.
    • 100% certain Romney/Ryan collectively spew more malarkey than Obama/Biden.
    • 94% certain the Republican candidates spew more malarkey on average than the Democratic candidates.
    (Probabilities rounded to nearest percent.)

    2012 prez debates

    presidential debates

    Simulated probability distribution of the malarkey spewed by individual 2012 presidential candidates during debates, with 95% confidence interval labeled on either side of simulated mean malarkey. White line at half truthful. (Rounded to nearest whole number.)

    • 66% certain Obama was more than half full of malarkey during the 1st debate.
    • 81% certain Obama was less than half full of malarkey during the 2nd debate.
    • 60% certain Obama was less than half full of malarkey during the 3rd debate.
    (Probabilities rounded to nearest percent.)

    • 78% certain Romney was more than half full of malarkey during the 1st debate.
    • 80% certain Romney was less than half full of malarkey during the 2nd debate.
    • 66% certain Romney was more than half full of malarkey during the 3rd debate.
    (Probabilities rounded to nearest percent.)

    aggregate 2012 prez debate

    Distributions of malarkey for collated 2012 presidential debate report cards and the average presidential debate malarkey score.
    • 68% certain Obama's collective debate statements were less than half full of malarkey.
    • 68% certain Obama was less than half full of malarkey during the average debate.
    • 67% certain Romney's collective debate statements were more than half full of malarkey.
    • 57% certain Romney was more than half full of malarkey during the average debate.
     (Probabilities rounded to nearest percent.)

    2012 vice presidential debate

    • 60% certain Biden was less than half full of malarkey during the vice presidential debate.
    • 89% certain Ryan was more than half full of malarkey during the vice presidential debate.
    (Probabilities rounded to nearest percent.)

    overall 2012 debate performance

    Malarkey score from collated report card comprising all debates, and malarkey score averaged over candidates on each party's ticket.
    • 72% certain Obama/Biden's collective statements during the debates were less than half full of malarkey.
    • 67% certain the average Democratic ticket member was less than half full of malarkey during the debates.
    • 87% certain Romney/Ryan's collective statements during the debates were more than half full of malarkey.
    • 88% certain the average Republican ticket member was more than half full of malarkey during the debates.

    (Probabilities rounded to nearest percent.)

    2012 debate self comparisons

    Simulated probability distributions of the difference in malarkey that a 2012 presidential candidate spews normally compared to how much they spewed during a debate (or aggregate debate), with 95% confidence interval labeled on either side of the simulated mean difference. Light bars mean less malarkey was spewed during the debate than usual, dark bars more. White bar at equal malarkey. (Rounded to nearest hundredth.)

    individual 2012 presidential debates

    • 80% certain Obama spewed more malarkey during the 1st debate than he usually does.
    • 84% certain Obama spewed less malarkey during the 2nd debate than he usually does.
    • 52% certain Obama spewed more malarkey during the 3rd debate than he usually does.
    • 51% certain Romney spewed more malarkey during the 1st debate than he usually does.
    • 98% certain Romney spewed less malarkey during the 2nd debate than he usually does.
    • 68% certain Romney spewed less malarkey during the 3rd debate than he usually does.

    (Probabilities rounded to nearest percent.)

    aggregate 2012 presidential debate

    • 58% certain Obama's statements during the debates were more full of malarkey than they usually are.
    • 56% certain Obama spewed more malarkey than he usually does during the average debate.
    • 73% certain Romney's statements during the debates were less full of malarkey than they usually are.
    • 86% certain Romney spewed less malarkey than he usually does during the average debate.

    (Probabilities rounded to nearest percent.)

    vice presidential debate

    • 70% certain Biden spewed less malarkey during the vice presidential debate than he usually does.
    • 86% certain Ryan spewed more malarkey during the vice presidential debate than he usually does.

    (Probabilities rounded to nearest percent.)

    2012 opponent comparisons

    Simulated probability distributions of the difference in malarkey between the Republican candidate and the Democratic candidate during a debate, with 95% confidence interval labeled on either side of simulated mean comparison. Blue bars are when Democrats spew more malarkey, red when Republicans do. White bar at equal malarkey. (Rounded to nearest hundredth.)

    individual 2012 presidential debates

    • 60% certain Romney spewed more malarkey during the 1st debate than Obama.
    • 49% certain Romney spewed more malarkey during the 2nd debate than Obama.
    • 72% certain Romney spewed more malarkey during the 3rd debate than Obama.

    (Probabilities rounded to nearest percent.)

    aggregate 2012 presidential debate

    • 74% certain Romney's statements during the debates were more full of malarkey than Obama's.
    • 67% certain Romney was more full of malarkey than Obama during the average debate.

    (Probabilities rounded to nearest percent.)

    vice presidential debate

    • 92% certain Ryan spewed more malarkey than Biden during the vice presidential debate.

    (Probabilities rounded to nearest percent.)

    overall 2012 debate comparison

    Party comparison of 2012 presidential ticket members' collective and individual average malarkey scores during debates.
    • 88% certain that Republican ticket members' collective statements were more full of malarkey than Democratic ticket members'.
    • 86% certain that the average Republican candidate spewed more malarkey during the average debate than the average Democratic candidate.

    (Probabilities rounded to nearest percent.)

    observe & report

    Below are the observed malarkey scores, and comparisons of the malarkey scores, of the 2012 presidential candidates.

    2012 prez candidates

    Truth-O-Meter only (observed)

    candidate malarkey
    Obama 44
    Biden 48
    Romney 55
    Ryan 58

    The Fact Checker only (observed)

    candidate malarkey
    Obama 53
    Biden 58
    Romney 60
    Ryan 47

    Averaged over fact checkers

    candidate malarkey
    Obama 48
    Biden 53
    Romney 58
    Ryan 52

    2012 Red prez vs. Blue prez

    Collated bullpucky

    ticket malarkey
    Obama/Biden 46
    Romney/Ryan 56

    Average bullpucky

    ticket malarkey
    Obama/Biden 48
    Romney/Ryan 58

    2012 prez debates

    1st presidential debate

    opponent malarkey
    Romney 61
    Obama 56

    2nd presidential debate (town hall)

    opponent malarkey
    Romney 31
    Obama 33

    3rd presidential debate

    opponent malarkey
    Romney 57
    Obama 46

    collated presidential debates

    opponent malarkey
    Romney 54
    Obama 46

    average presidential debate

    opponent malarkey
    Romney 61
    Obama 56

    vice presidential debate

    opponent malarkey
    Ryan 68
    Biden 44

    collated debates overall

    ticket malarkey
    Romney/Ryan 57
    Obama/Biden 46

    average debate overall

    ticket malarkey
    Romney/Ryan 61
    Obama/Biden 56

    the raw deal

    You've come this far. Why not just check out the raw data Malark-O-Meter is using? I promise you: it is as riveting as a phone book.

