Malark-O-Meter's mission is to statistically analyze fact checker rulings to make comparative judgments about the factuality of politicians, and to measure our uncertainty in those judgments. Malark-O-Meter's methods, however, have a serious problem. To borrow terms made popular by Nate Silver's new book, Malark-O-Meter isn't yet good at distinguishing the signal from the noise. Moreover, we can't even distinguish one signal from another. I know. It sucks. But I'm just being honest. Without honestly appraising how well Malark-O-Meter fulfills its mission, there's no way to improve its methods.

Note: if you aren't familiar with how Malark-O-Meter works, I suggest you visit the Methods section.

The signals that we can't distinguish from one another are the real differences in factuality between individuals and groups, versus the potential ideological biases of fact checkers. For example, I've shown in a previous post that Malark-O-Meter's analysis of the 2012 presidential election could lead you to believe either that Romney is between four and 14 percent more full of malarkey than Obama, or that PolitiFact and The Fact Checker have on average a liberal bias that gives Obama between a four and 14 percentage point advantage in truthfulness, or that the fact checkers have a centrist bias that shrinks the difference between the two candidates to just six percent of what frothy-mouthed partisans believe it truly is. Although I've verbally argued that fact checker bias is probably not as strong as either conservatives or liberals believe, no one...NO ONE...has adequately measured the influence of political bias on fact checker rulings.

In a previous post on Malark-O-blog, I briefly considered some methods to measure, adjust, and reduce political bias in fact checking. Today, let's discuss the other problem with Malark-O-Meter's methods: that we can't tell signal from noise. The problem is a bit different than the one Silver describes in his book, which is that people have a tendency to see patterns and trends when there aren't any. Instead, the problem is how a signal might influence the amount of noise that we estimate.

Again, the signal is potential partisan or centrist bias. The noise comes from sampling error, which occurs when you take an incomplete sample of all the falsifiable statements that a politician makes. Malark-O-Meter estimates the sampling error of a fact checker report card by randomly drawing report cards from a Dirichlet distribution, which describes the probability distribution of the proportion of statements in each report card category. Sampling error is higher the smaller your sample of statements. The greater your sampling error, the less certain you will be in the differences you observe among individuals' malarkey scores.
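Here is a minimal sketch of the sample size effect, using only Python's standard library. It rests on assumptions that are not spelled out in this post: a five-category report card, Dirichlet parameters equal to the observed counts plus one, and a hypothetical malarkey score that weights the categories 0, 25, 50, 75, and 100 from true to false. The two report cards are made up for illustration.

```python
import random

# Hypothetical category weights, from "true" (0) to "false" (100).
WEIGHTS = [0, 25, 50, 75, 100]


def draw_dirichlet(alphas, rng):
    """Draw one vector of category proportions from a Dirichlet distribution."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def simulate_scores(counts, n_draws=10000, seed=1):
    """Simulate malarkey scores for a report card given as category counts."""
    rng = random.Random(seed)
    alphas = [c + 1 for c in counts]  # assumption: observed counts plus one
    scores = []
    for _ in range(n_draws):
        props = draw_dirichlet(alphas, rng)
        scores.append(sum(w * p for w, p in zip(WEIGHTS, props)))
    return scores


def interval_width(scores, level=0.95):
    """Width of the central interval holding `level` of the simulated scores."""
    ordered = sorted(scores)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return hi - lo


# Two made-up report cards with the same proportions but different sizes.
small_card = [4, 4, 4, 4, 4]        # 20 statements
large_card = [40, 40, 40, 40, 40]   # 200 statements

small_width = interval_width(simulate_scores(small_card))
large_width = interval_width(simulate_scores(large_card))
print(f"20 statements:  interval width = {small_width:.1f}")
print(f"200 statements: interval width = {large_width:.1f}")
```

Under these assumptions, the 20-statement card should produce a much wider interval than the 200-statement card, even though the proportions in each category are identical.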

To illustrate the sample size effect, I've reproduced a plot of the simulated malarkey score distributions for Obama, Romney, Biden, and Ryan, as of November 11th, 2012. Obama and Romney average ~272 and ~140 rated statements per fact checker, respectively. Biden and Ryan average ~37 and ~21 statements per fact checker, respectively. The difference in the spread of their probability distributions is clear from the histograms and the differences between the upper and lower bounds of the labeled 95% confidence intervals.

Picture
The trouble is that Malark-O-Meter's sampling distribution assumes that the report card of all the falsifiable statements an individual ever made would have similar proportions in each category as the sample report card. And that assumption implies another one: that the ideological biases of fact checkers, whether liberal or centrist, do not influence the probability that a given statement of a given truthfulness category is sampled.

In statistical analysis, this is called selection bias. The conservative ideologues at PolitiFactBias.com (and Zebra FactCheck, and Sublime Bloviations; they're all written by at least one of the same two guys, really) suggest that fact checkers could bias the selection of their statements toward more false ones made by Republicans, and more true ones made by Democrats. Fact checkers might also be biased toward selecting some statements that make them appear more left-center so that they don't seem too partisan. I'm pretty sure there are some liberals out there who would agree that fact checkers purposefully choose a roughly equal number of true and false statements by conservative and liberal politicians so that they don't seem partisan. In fact, that's a common practice for at least one fact checker, FactCheck.org. The case for centrist bias isn't as clear for PolitiFact or The Fact Checker.

I think it will turn out that fact checkers' partisan or centrist biases, whether in rating or sampling statements, are too weak to swamp the true differences between individuals or groups. It is, however, instructive to examine the possible effects of selection bias on malarkey scores and their sampling errors. (In contrast, the possible effects of ideological bias on the observed malarkey scores are fairly obvious.)

My previous analysis of the possible liberal and centrist biases of fact checkers was pretty simple. To estimate the possible partisan bias, I simply compared the probability distribution of the observed differences between the Democratic and Republican candidates to ones in which the entire distribution was shifted so that the mean difference was zero, or so that the difference between the parties was reversed. To estimate possible centrist bias, I simply divided the probability distribution that I simulated by the size of the difference that frothy-mouthed partisans would expect, which is large. That analysis assumed that the width of the margin of error in the malarkey score, which is determined by the sampling error, remained constant after accounting for fact checker bias. But that isn't true.

There are at least two ways that selection bias can influence the simulated margin of error of a malarkey score. One way is that selection bias can diminish the efficiency of a fact checker's search for statements to fact check, leading to a smaller sample size of statements on each report card. Again, the smaller the sample size, the wider the margin of error. The wider the margin of error, the more difficult it is to distinguish among individuals, holding the difference in their malarkey scores constant. So the efficiency effect of selection bias causes us to underestimate, not overestimate, our certainty in the differences in factuality that we observe. The only reason to worry about this effect, then, is that it would diminish our confidence in observed differences in malarkey scores, which might be real even though we don't know the reason (bias versus real differences in factuality) that those differences exist.

The bigger problem, of course, is that selection bias influences the probability that statements of a given truthfulness category are selected into an individual report card. Specifically, selection bias might increase the probability that more true statements are chosen over less true statements, or vice versa, depending on the partisan bias of the fact checker. Centrist selection bias might increase the probability that more half true statements are chosen, or that more equal numbers of true and false statements are chosen.
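To make that concrete, here is a small sketch of how category-dependent selection could distort a report card. It assumes a deliberately simple model that is not part of Malark-O-Meter: each category has a true share of a politician's statements, and a fact checker picks statements from each category with some relative selection weight. All of the numbers are hypothetical.

```python
def observed_proportions(true_props, selection_weights):
    """Category shares after selection: true share times selection weight, renormalized."""
    raw = [p * s for p, s in zip(true_props, selection_weights)]
    total = sum(raw)
    return [r / total for r in raw]


# Category order: true, mostly true, half true, mostly false, false.
true_props = [0.2, 0.2, 0.2, 0.2, 0.2]   # hypothetical "real" shares

# Hypothetical partisan selection: falser statements are twice as likely to be checked.
partisan_weights = [1, 1, 1, 2, 2]
# Hypothetical centrist selection: half true statements are twice as likely to be checked.
centrist_weights = [1, 1, 2, 1, 1]

partisan_view = observed_proportions(true_props, partisan_weights)
centrist_view = observed_proportions(true_props, centrist_weights)

print("partisan:", [round(p, 3) for p in partisan_view])
print("centrist:", [round(p, 3) for p in centrist_view])
```

In this toy model, the report card that the fact checker publishes no longer reflects the true shares, so any sampling distribution built from it inherits the distortion.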

The distribution of statements in a report card definitely influences the width of the simulated margin of error. Holding sample size constant, the more even the statements are distributed among the categories, the greater the margin of error. Conversely, when statements are clumped into only a few of the categories, the margin of error is smaller. To illustrate, let's look at some extreme examples.

Suppose I have an individual's report card that rates 50 statements. Let's see what happens to the spread of the simulated malarkey score distribution when we change the spread of the statements across the categories from more even to more clumped. We'll measure how clumped the statements are with something called the Shannon entropy. The Shannon entropy is a measure of uncertainty, typically measured in bits (binary digits that can be 0 or 1). In our case, entropy measures our uncertainty in the truthfulness category of a single statement sampled from all the statements that an individual has made. The higher the entropy score, the greater the uncertainty. Entropy (thus uncertainty) is greatest when the probabilities of all possible events are equal to one another.
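For readers who want to check the entropy figures themselves, here is a short sketch of the calculation in bits. The clumped probabilities assume that one category is 51 times more probable than each of the other four, which gives 51/55 for the favored category and 1/55 for each of the rest.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


equal = [1 / 5] * 5                  # all five categories equally likely
clumped = [51 / 55] + [1 / 55] * 4   # one category 51 times likelier than each other one

equal_entropy = shannon_entropy(equal)
clumped_entropy = shannon_entropy(clumped)

print(f"equal:   {equal_entropy:.1f} bits")
print(f"clumped: {clumped_entropy:.1f} bits")
print(f"ratio:   {clumped_entropy / equal_entropy:.1f}")
```

The equal case works out to log2(5), about 2.3 bits, and the clumped case to about 0.5 bits, or roughly a fifth of the maximum. Moving the favored category to a different position in the list does not change the result.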

We'll measure the spread of the simulated malarkey score distribution by the width of its 95% confidence interval. The 95% confidence interval is the range of malarkey scores that we can be 95% certain would result from another report card with the same number of statements sampled from the same person, given our beliefs about the probabilities of each statement.

We'll compare six cases. First is the case when the true probability of each category is the same. The other five cases are when the true probability of one category is 51 times greater than the probabilities of the other categories, which would define our beliefs of the category probabilities if we observed (or forced through selection bias) that all 50 statements were in one of the categories. Below is a table that collects the entropy and confidence interval width from each of the six cases, and compares them to the equal statement probability case, for which the entropy is greatest and the confidence intervals are widest. Entropies are rounded to the nearest tenth, confidence interval widths to the nearest whole number, and comparisons to the nearest tenth. Here are the meanings of the column headers.
  • Case: self explanatory
  • Ent.: Absolute entropy of assumed category probabilities
  • Comp. ent.: Entropy of assumed category probabilities compared to the case when the probabilities are all equal, expressed as a ratio
  • CI width: Width of 95% confidence interval
  • Comp. CI width: Width of 95% confidence interval compared to the case when the probabilities are all equal, expressed as a ratio

And here is the table:

Case               Ent.   Comp. ent.   CI width   Comp. CI width
Equal              2.3    1.0          18         1.0
All true           0.5    0.2          12         0.66
All mostly true    0.5    0.2          9          0.5
All half true      0.5    0.2          7          0.4
All mostly false   0.5    0.2          9          0.5
All false          0.5    0.2          12         0.66


For all the clumped cases, the entropy is 20% of the entropy for the evenly distributed case. In fact, the entropies of all the clumped cases are the same because the calculation of entropy doesn't care about which categories are more likely than others. It only cares whether some categories are more likely than others.

The lower entropy in the clumped cases corresponds to narrower confidence intervals relative to the even case, which makes sense. The more certain we think we are in the probability that any one statement will be in a given report card category, the more certain we should be in the malarkey score.
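The six-case comparison can be sketched roughly as follows. This is not Malark-O-Meter's actual code. It assumes Dirichlet parameters equal to the observed counts plus one (so 11 per category in the equal case and 51 versus 1 in the clumped cases) and a hypothetical malarkey score that weights the categories 0, 25, 50, 75, and 100 from true to false. Because the real scoring formula may differ, the interval widths it prints should not be expected to match the table exactly. The qualitative pattern is the point.

```python
import math
import random

WEIGHTS = [0, 25, 50, 75, 100]  # hypothetical score weights, true -> false
LABELS = ["true", "mostly true", "half true", "mostly false", "false"]


def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def ci_width(alphas, rng, n_draws=10000):
    """Width of the central 95% interval of simulated scores."""
    scores = []
    for _ in range(n_draws):
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(gammas)
        scores.append(sum(w * g / total for w, g in zip(WEIGHTS, gammas)))
    scores.sort()
    return scores[int(0.975 * (n_draws - 1))] - scores[int(0.025 * (n_draws - 1))]


# Assumed Dirichlet parameters: observed counts plus one, 50 statements per card.
cases = {"Equal": [11, 11, 11, 11, 11]}
for i, label in enumerate(LABELS):
    cases["All " + label] = [51 if j == i else 1 for j in range(5)]

rng = random.Random(2012)
results = {}
for name, alphas in cases.items():
    probs = [a / sum(alphas) for a in alphas]
    results[name] = (entropy_bits(probs), ci_width(alphas, rng))

base_entropy, base_width = results["Equal"]
for name, (ent, width) in results.items():
    print(f"{name:<17} {ent:4.1f} {ent / base_entropy:4.1f} "
          f"{width:5.1f} {width / base_width:5.2f}")
```

The columns printed are the same as in the table: case, entropy, comparative entropy, interval width, and comparative interval width.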

This finding suggests that if fact checker bias causes oversampling of statements in certain categories, Malark-O-Meter will overestimate our certainty in the observed differences if the true probabilities within each category are more even. This logic could apply to partisan biases that lead to oversampling of truer or more false statements, or to centrist biases that oversample half true statements. The finding also suggests that a centrist bias that leads to artificially equivalent probabilities in each category will cause Malark-O-Meter to underestimate the level of certainty in the observed differences.

Another interesting finding is that the confidence interval widths that we've explored follow a predictable pattern. Here's a bar plot of the comparative CI widths from the table above.
Picture
The confidence interval is widest in the equal probability case. From there, we see a u-shaped pattern, with the narrowest confidence intervals occurring when we oversample half true statements. The confidence intervals get wider for the cases when we oversample mostly true or mostly false statements, and wider still for the cases when we oversample true or false statements. The confidence interval widths are equivalent between the all true and all false cases, and the all mostly true and all mostly false cases.
What's going on? I don't really know yet. We'll have to wait for another day, and a more detailed analysis. I suspect it has something to do with how the malarkey score is calculated, which results in fewer malarkey score possibilities when the probabilities are more closely centered on half true statements.
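One way to probe that suspicion is sketched below. It assumes, hypothetically, that the malarkey score is a weighted average of the category proportions with weights 0, 25, 50, 75, and 100 from true to false, and that the proportions follow a Dirichlet distribution with parameters 51 for the oversampled category and 1 for the rest. Under those assumptions, the variance of the score has a closed form: the variance of the weights under the expected proportions, divided by the sum of the Dirichlet parameters plus one. When the probability mass clumps on half true, the leftover categories sit at most 50 points from the mean score. When it clumps on true or false, the leftover categories sit as far as 100 points away, so the score varies more.

```python
import math

WEIGHTS = [0, 25, 50, 75, 100]  # hypothetical score weights, true -> false
LABELS = ["true", "mostly true", "half true", "mostly false", "false"]


def score_sd(alphas):
    """Exact standard deviation of a weighted score under a Dirichlet distribution."""
    a0 = sum(alphas)
    means = [a / a0 for a in alphas]
    mean_score = sum(m * w for m, w in zip(means, WEIGHTS))
    mean_square = sum(m * w * w for m, w in zip(means, WEIGHTS))
    return math.sqrt((mean_square - mean_score ** 2) / (a0 + 1))


sds = {}
for i, label in enumerate(LABELS):
    alphas = [51 if j == i else 1 for j in range(5)]
    sds[label] = score_sd(alphas)
    print(f"All {label:<13} sd = {sds[label]:.2f}")
```

If the assumptions hold, the standard deviations are symmetric around half true and smallest there, which is the same u-shape as in the bar plot. Whether this is the whole story for the actual malarkey score still needs the more detailed analysis.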

Anyway, we're approaching a better understanding of how the selection bias among fact checkers can influence our comparative judgments of the factuality of politicians. Usefully, the same logic applies to the effects of fact checkers' rating biases in the absence of selection bias. You can expect Malark-O-Meter's honesty to continue. We're not here to prove any point that can't be proven. We're here to give an honest appraisal of how well we can compare the factuality of individuals using fact checker data. Stay tuned.
 

    about

    Malark-O-blog published news and commentary about the statistical analysis of the comparative truthfulness of the 2012 presidential and vice presidential candidates. It has since closed down while its author makes bigger plans.

    author

    Brash Equilibrium is an evolutionary anthropologist and writer. His real name is Benjamin Chabot-Hanowell. His wife calls him Babe. His daughter calls him Papa.

    what is malarkey?

    It's a polite word for bullshit. Here, it's a measure of falsehood. 0 means you're truthful on average. 100 means you're 100% full of malarkey. Details.

    what is simulated malarkey?

    Fact checkers only rate a small sample of the statements that politicians make. How uncertain are we about the real truthfulness of politicians? To find out, treat fact checker report cards like an experiment, and use random number generators to repeat that experiment a lot of times to see all the possible outcomes. Details.

    malark-O-glimpse

    Can you tell the difference between the 2012 presidential election tickets from just a glimpse at their simulated malarkey score distributions?

    Picture
    dark = pres, light = vp

    fuzzy portraits of malarkey

    Simulated distributions of malarkey for each 2012 presidential candidate with 95% confidence interval on either side of the simulated average malarkey score. White line at half truthful. (Rounded to nearest whole number.)

    Picture
    • 87% certain Obama is less than half full of malarkey.
    • 100% certain Romney is more than half full of malarkey.
    • 66% certain Biden is more than half full of malarkey.
    • 70% certain Ryan is more than half full of malarkey.
    (Probabilities rounded to nearest percent.)

    fuzzy portraits of ticket malarkey

    Simulated distributions of collated and average malarkey for each 2012 presidential election ticket, with 95% confidence interval labeled on either side of the simulated malarkey score. White line at half truthful. (Rounded to nearest whole number.)

    malarkometer fuzzy ticket portraits 2012-10-16 2012 election
    • 81% certain Obama/Biden's collective statements are less than half full of malarkey.
    • 100% certain Romney/Ryan's collective statements are more than half full of malarkey.
    • 51% certain the Democratic candidates are less than half full of malarkey.
    • 97% certain the Republican candidates are on average more than half full of malarkey.
    • 95% certain the candidates' statements are on average more than half full of malarkey.
    • 93% certain the candidates themselves are on average more than half full of malarkey.
    (Probabilities rounded to nearest percent.)

    Comparisons

    Simulated probability distributions of the difference between the malarkey scores of one 2012 presidential candidate or party and another, with 95% confidence interval labeled on either side of simulated mean malarkey. Blue bars are when Democrats spew more malarkey, red when Republicans do. White line and purple bar at equal malarkey. (Rounded to nearest hundredth.)

    Picture
    • 100% certain Romney spews more malarkey than Obama.
    • 55% certain Ryan spews more malarkey than Biden.
    • 100% certain Romney/Ryan collectively spew more malarkey than Obama/Biden.
    • 94% certain the Republican candidates spew more malarkey on average than the Democratic candidates.
    (Probabilities rounded to nearest percent.)

    2012 prez debates

    presidential debates

    Simulated probability distribution of the malarkey spewed by individual 2012 presidential candidates during debates, with 95% confidence interval labeled on either side of simulated mean malarkey. White line at half truthful. (Rounded to nearest whole number.)

    Picture
    • 66% certain Obama was more than half full of malarkey during the 1st debate.
    • 81% certain Obama was less than half full of malarkey during the 2nd debate.
    • 60% certain Obama was less than half full of malarkey during the 3rd debate.
    (Probabilities rounded to nearest percent.)

    Picture
    • 78% certain Romney was more than half full of malarkey during the 1st debate.
    • 80% certain Romney was less than half full of malarkey during the 2nd debate.
    • 66% certain Romney was more than half full of malarkey during the 3rd debate.
    (Probabilities rounded to nearest percent.)

    aggregate 2012 prez debate

    Distributions of malarkey for collated 2012 presidential debate report cards and the average presidential debate malarkey score.
    Picture
    • 68% certain Obama's collective debate statements were less than half full of malarkey.
    • 68% certain Obama was less than half full of malarkey during the average debate.
    • 67% certain Romney's collective debate statements were more than half full of malarkey.
    • 57% certain Romney was more than half full of malarkey during the average debate.
     (Probabilities rounded to nearest percent.)

    2012 vice presidential debate

    Picture
    • 60% certain Biden was less than half full of malarkey during the vice presidential debate.
    • 89% certain Ryan was more than half full of malarkey during the vice presidential debate.
    (Probabilities rounded to nearest percent.)

    overall 2012 debate performance

    Malarkey score from collated report card comprising all debates, and malarkey score averaged over candidates on each party's ticket.
    Picture
    • 72% certain Obama/Biden's collective statements during the debates were less than half full of malarkey.
    • 67% certain the average Democratic ticket member was less than half full of malarkey during the debates.
    • 87% certain Romney/Ryan's collective statements during the debates were more than half full of malarkey.
    • 88% certain the average Republican ticket member was more than half full of malarkey during the debates.

    (Probabilities rounded to nearest percent.)

    2012 debate self comparisons

    Simulated probability distributions of the difference in malarkey that a 2012 presidential candidate spews normally compared to how much they spewed during a debate (or aggregate debate), with 95% confidence interval labeled on either side of the simulated mean difference. Light bars mean less malarkey was spewed during the debate than usual. Dark bars mean more. White bar at equal malarkey. (Rounded to nearest hundredth.)

    individual 2012 presidential debates

    Picture
    • 80% certain Obama spewed more malarkey during the 1st debate than he usually does.
    • 84% certain Obama spewed less malarkey during the 2nd debate than he usually does.
    • 52% certain Obama spewed more malarkey during the 3rd debate than he usually does.
    Picture
    • 51% certain Romney spewed more malarkey during the 1st debate than he usually does.
    • 98% certain Romney spewed less malarkey during the 2nd debate than he usually does.
    • 68% certain Romney spewed less malarkey during the 3rd debate than he usually does.

    (Probabilities rounded to nearest percent.)

    aggregate 2012 presidential debate

    Picture
    • 58% certain Obama's statements during the debates were more full of malarkey than they usually are.
    • 56% certain Obama spewed more malarkey than he usually does during the average debate.
    • 73% certain Romney's statements during the debates were less full of malarkey than they usually are.
    • 86% certain Romney spewed less malarkey than he usually does during the average debate.

    (Probabilities rounded to nearest percent.)

    vice presidential debate

    Picture
    • 70% certain Biden spewed less malarkey during the vice presidential debate than he usually does.
    • 86% certain Ryan spewed more malarkey during the vice presidential debate than he usually does.

    (Probabilities rounded to nearest percent.)

    2012 opponent comparisons

    Simulated probability distributions of the difference in malarkey between the Republican candidate and the Democratic candidate during a debate, with 95% confidence interval labeled on either side of simulated mean comparison. Blue bars are when Democrats spew more malarkey, red when Republicans do. White bar at equal malarkey. (Rounded to nearest hundredth.)

    individual 2012 presidential debates

    Picture
    • 60% certain Romney spewed more malarkey during the 1st debate than Obama.
    • 49% certain Romney spewed more malarkey during the 2nd debate than Obama.
    • 72% certain Romney spewed more malarkey during the 3rd debate than Obama.

    (Probabilities rounded to nearest percent.)

    aggregate 2012 presidential debate

    Picture
    • 74% certain Romney's statements during the debates were more full of malarkey than Obama's.
    • 67% certain Romney was more full of malarkey than Obama during the average debate.

    (Probabilities rounded to nearest percent.)

    vice presidential debate

    • 92% certain Ryan spewed more malarkey than Biden during the vice presidential debate.

    (Probabilities rounded to nearest percent.)

    overall 2012 debate comparison

    Party comparison of 2012 presidential ticket members' collective and individual average malarkey scores during debates.
    • 88% certain that Republican ticket members' collective statements were more full of malarkey than Democratic ticket members'.
    • 86% certain that the average Republican candidate spewed more malarkey during the average debate than the average Democratic candidate.

    (Probabilities rounded to nearest percent.)

    observe & report

    Below are the observed malarkey scores of the 2012 presidential candidates, and comparisons among them.

    2012 prez candidates

    Truth-O-Meter only (observed)

    candidate malarkey
    Obama 44
    Biden 48
    Romney 55
    Ryan 58

    The Fact Checker only (observed)

    candidate malarkey
    Obama 53
    Biden 58
    Romney 60
    Ryan 47

    Averaged over fact checkers

    candidate malarkey
    Obama 48
    Biden 53
    Romney 58
    Ryan 52

    2012 Red prez vs. Blue prez

    Collated bullpucky

    ticket malarkey
    Obama/Biden 46
    Romney/Ryan 56

    Average bullpucky

    ticket malarkey
    Obama/Biden 48
    Romney/Ryan 58

    2012 prez debates

    1st presidential debate

    opponent malarkey
    Romney 61
    Obama 56

    2nd presidential debate (town hall)

    opponent malarkey
    Romney 31
    Obama 33

    3rd presidential debate

    opponent malarkey
    Romney 57
    Obama 46

    collated presidential debates

    opponent malarkey
    Romney 54
    Obama 46

    average presidential debate

    opponent malarkey
    Romney 61
    Obama 56

    vice presidential debate

    opponent malarkey
    Ryan 68
    Biden 44

    collated debates overall

    ticket malarkey
    Romney/Ryan 57
    Obama/Biden 46

    average debate overall

    ticket malarkey
    Romney/Ryan 61
    Obama/Biden 56

    the raw deal

    You've come this far. Why not just check out the raw data Malark-O-Meter is using? I promise you: it is as riveting as a phone book.
