Last year, while living away from my family for a year to do ethnographic fieldwork in a remote village on a tiny Lesser Antillean island, I kept myself sane and connected to the political news back home by taking up a new hobby. I applied inferential statistics and computational simulation to fact checker reports from PolitiFact.com and The Washington Post's The Fact Checker to comparatively judge the truthfulness of the 2012 presidential and vice presidential candidates, and (more importantly) to measure our uncertainty in those judgments.

The site (and its syndication on the Daily Kos) generated some good discussion, some respectable traffic, and (I hope) showed its followers the potential for a new kind of inference-driven fact checking journalism. My main conclusions from the 2012 election analysis were:

(1) The candidates aren't as different as partisans left or right would have us believe.

(2) But the Democratic ticket was somewhat more truthful than the Republican ticket, both overall, and during the debates.

(3) It's quite likely that the 2012 Republican ticket was less truthful than the 2008 Republican ticket, and somewhat likely that the 2012 Democratic ticket was less truthful than the 2008 Democratic ticket.

Throughout, I tempered these conclusions with the recognition that my analyses did not account for the possible biases of fact checkers, including biases toward fairness, newsworthiness, and, yes, political beliefs. Meanwhile, I discussed ways to work toward measuring these biases and adjusting measures of truthfulness for them. I also suggested that fact checkers should begin in earnest to acknowledge that they aren't just checking facts, but the logical validity of politicians' arguments, as well. That is, fact checkers should also become fallacy checkers who gauge the soundness of an argument, not simply the truth of its premises. 

Now, it's time to close up shop. Not because I'm abandoning what I'm proud to have done here; I'm closing up shop because I have much bigger ideas.

I've started writing up a master plan for a research institute and social media platform that will revolutionize fact checking journalism. For now, I'm calling the project Sound Check. I might have to change the name because that domain name is taken. Whatever its eventual name, Sound Check will be like FiveThirtyEight meets YouGov meets PolitiFact meets RapGenius: data-driven soundness checking journalism and research on an annotated social web. You can read more about the idea in this draft executive summary.

Anyway, over the next three years (and beyond!), I hope you're going to hear a lot about this project. Already, I've started searching for funding so that I can, once I obtain my PhD in June 2014, start working full time on Sound Check.

One plan is to become an "Upstart". Upstart is a new idea from some ex-Googlers. At Upstart, individual graduates hedge their personal risk by looking for investor/mentors, who gain returns from the Upstart's future income (which is predicted from a proprietary algorithm owned by Upstart). Think of it as a capitalist, mentoring-focused sort of patronage. Unlike Kickstarter or other crowd-funding mechanisms, where patrons get feel-good vibes and rewards, Upstart investors are investing in a person like they would invest in a company.

Another plan is, of course, to go the now almost traditional crowd-funding route, but only for clearly defined milestones of the project. For example, first I'd want to get funding to organize a meet-up of potential collaborators and investors. Next I'd want to get funding for the beta-testing of the sound checking algorithm. After that I'd get funding for a beta-test of the social network aspect of Sound Check. Perhaps these (hopefully successful) crowd-funded projects would create interest among heavy-hitting investors.

Yet another idea is to entice some university (UW?) and some wealthy person or group of people interested in civic engagement and political fact checking to partner with Sound Check in a way similar to how FactCheck.org grew out of the Annenberg Public Policy Center at University of Pennsylvania.

Sound Check is a highly ambitious idea. It will need startup funding for servers, programmers, administrative staff, as well as training and maintaining Sound Checkers (that's fact checkers who also fallacy check). So I've got my work cut out for me. I'm open to advice and new mentors. And soon, I'll be open, along with Sound Check, to investors and donors.
 
 
This week, two political science blog posts about the difference between political engagement and factual understanding stood out to Malark-O-Meter. (Thanks to The Monkey Cage for tweeting their links.) First, there's Brendan Nyhan's article at YouGov about how political knowledge doesn't guard against belief in conspiracy theories. Second, there's voteview's article about issues in the 2012 election. (Side note: this could be the Golden Era of political science blogging.) These posts stand out both as cautionary tales about what it means to be politically engaged versus factual, and as promising clues about how to assess the potential biases of professional fact checkers in order to facilitate the creation of better factuality metrics (what Malark-O-Meter is all about).

Let's start with Nyhan's disturbing look at the interactive effect of partisan bias and political knowledge on belief in the conspiracy theory that the 2012 unemployment rate numbers were manipulated for political reasons. The following pair of plots (reproduced from the original article) pretty much says it all.

First, there's the comparison of Dem, Indie, and GOP perception of whether unemployment statistics are accurate, grouped by party affiliation and low, medium, and high scores on a ten-question quiz on political knowledge.
Republicans and maybe Independents with greater political knowledge perceive the unemployment statistics to be less accurate.

Here's a similar plot showing the percent in each political knowledge and party affiliation group that believe in the conspiracy theory about the September unemployment statistic report.
Democrats appear less likely to believe the conspiracy theory the more knowledgeable they are. Republicans with greater political knowledge are more likely to believe the conspiracy theory. There's no clear effect among Independents. What's going on?

Perhaps the more knowledgeable individuals are also more politically motivated, and so is their reasoning. It just so happens that motivated reasoning in this case probably errs on the side of the politically knowledgeable Democrats.

Before discussing what this means for fact checkers and factuality metrics, let's look at what voteview writes about an aggregate answer to a different question, posed by Gallup (aka, the new whipping boy of the poll aggregators) about the June jobs report.
[Gallup chart: reactions to the June jobs report, by party affiliation]
In case you haven't figured it out, you're looking at yet another picture of motivated reasoning at work (or is it play?). Democrats were more likely than Republicans to see the jobs report as mixed or positive, whereas Republicans were more likely than Democrats to see it as negative. You might expect this effect to shrink among individuals who say they pay very close attention to news about the report because, you know, they're more knowledgeable and they really think about the issues and... NOPE!
[Gallup chart: reactions to the June jobs report, by party affiliation and attention paid to news about the report]
The more people say they pay attention to the news, the more motivated their reasoning appears to be.

What's happening here? In Nyhan's study, are the more knowledgeable people trying to skew the results of the survey to make it seem like more people believe or don't believe in the conspiracy theory? In the Gallup poll, is "paid very close attention to news about the report" code for "watched a lot of MSNBC/Fox News"? Or is it an effect similar to what we see among educated people who tend to believe that vaccinations are (on net) bad for their children despite lots and lots of evidence to the contrary? That is, do knowledgeable people know enough to be dangerous(ly stupid)?

I honestly don't know what's happening, but I do have an idea about what this might mean for the measurement of potential fact checker bias to aid the creation of better factuality metrics and fact checking methods. I think we can all agree that fact checkers are knowledgeable people. The question is, does their political knowledge and engagement have the same effect on their fact checking as it does on the perceptions of educated non-fact-checkers? If so, is the effect as strong?

I've mentioned before that a step toward better fact checking is to measure the potential effect of political bias on both the perception of fact and the rulings of fact checkers. Basically, give individuals a questionnaire that assesses their political beliefs, and see how they proceed to judge the factuality of statements made by individuals of known party affiliations, ethnicity, et cetera. To see if fact checking improves upon the motivated reasoning of non-professionals, compare the strength of political biases on the fact checking of professionals versus non-professionals. 

What these two blog posts tell me is that, when drawing such comparisons, I should take into account not only the political affiliation of the non-professionals, not only the political knowledge of the non-professionals, but the interaction of those two variables. Then, we can check which subgroup of non-professionals the professional fact checkers are most similar to, allowing us to make inferences about whether professional fact checkers suffer from the same affliction of motivated reasoning that the supposedly knowledgeable non-professionals suffer from.
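
To make that concrete, here is a minimal sketch (my own illustration, not an analysis I've actually run) of how one might test for that interaction, assuming a hypothetical survey table with a binary variable for whether a respondent's rating agrees with the professional fact checkers, a political knowledge score, and a party affiliation.

```python
# A minimal sketch, assuming hypothetical survey data: does political
# knowledge interact with party affiliation in predicting whether a
# respondent rates a statement the way professional fact checkers do?
import pandas as pd
import statsmodels.formula.api as smf

def fit_interaction_model(df: pd.DataFrame):
    """Logistic regression with a knowledge-by-party interaction term.

    Expects hypothetical columns: 'agrees_with_fact_checkers' (0/1),
    'knowledge' (quiz score), and 'party' (categorical).
    """
    model = smf.logit("agrees_with_fact_checkers ~ knowledge * C(party)", data=df)
    return model.fit(disp=False)

# Usage, with a hypothetical data frame `survey_df`:
# result = fit_interaction_model(survey_df)
# print(result.summary())  # inspect the knowledge-by-party interaction terms
```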
 
 
Recently, the Nieman Journalism Lab reported on OpenCaptions, the creation of Dan "Truth Goggles" Schultz. OpenCaptions prepares a live television transcript from closed captions, which can then be analyzed. I came across OpenCaptions back in October, when I learned about Schultz's work on Truth Goggles, which highlights web content that has been fact checked by PolitiFact. Reading about it this time reminded me of something I'd written in my critique of the fact checking industry's opinions about factuality comparison among individual politicians.

At the end of that post, I commented on a suggestion made by Kathleen Hall Jamieson of the Annenberg Public Policy Center about how to measure the volume of factuality that a politician pumps into the mediasphere. Jamieson's suggestion was to weight the claims that a politician makes by the size of their audience. I pointed out some weaknesses of this factuality metric. I also recognized that it is still useful, and described the data infrastructure necessary to calculate the metric. Basically, you need to know the size of the audience of a political broadcast (say, a political advertisement), the content of the broadcast, and the soundness of the arguments made during the broadcast.
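
To make the metric concrete, here is a minimal sketch (my own illustration of Jamieson's suggestion, with hypothetical column names) that weights each checked claim's truthfulness score by the estimated audience of the broadcast in which it appeared.

```python
# A minimal sketch, assuming a hypothetical table of checked claims with a
# 'truth_score' between 0 (false) and 1 (true) and an estimated 'audience'
# size for the broadcast in which each claim was made.
import pandas as pd

def audience_weighted_factuality(claims: pd.DataFrame) -> float:
    """Audience-weighted average truthfulness across all checked claims."""
    weights = claims["audience"] / claims["audience"].sum()
    return float((claims["truth_score"] * weights).sum())
```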

OpenCaptions shows promise as a way to collect the content of political broadcasts and publish it to the web for shared analysis. Cheers to Dan Schultz for creating yet another application that will probably be part of the future of journalism...and fact checking...and factuality metrics.
 
 
Fact checkers perform a vital public service. The truth, however, is contentious. So fact checkers take criticism from all sides. Sometimes, they deserve it. For example, Greg Marx wrote in the Columbia Journalism Review,
But here’s where the fact-checkers find themselves in a box. They’ve reached for the clear language of truth and falsehood as a moral weapon, a way to invoke ideas of journalists as almost scientific fact-finders. And for some of the statements they scrutinize, those bright-line categories work fine.

A project that involves patrolling public discourse, though, will inevitably involve judgments not only about truth, but about what attacks are fair, what arguments are reasonable, what language is appropriate. And one of the maddening things about the fact-checkers is their unwillingness to acknowledge that many of these decisions—including just what constitutes “civil discourse”—are contestable and, at times, irresolvable.
Whether or not fact checkers wield it as a "moral weapon", they certainly use the "language of truth and falsehood", and some of them attempt to define "bright-line categories". This is most true for PolitiFact and The Fact Checker, which give clear cut, categorical rulings to the statements that they cover, and whose rulings currently form the basis of Malark-O-Meter's malarkey score, which rates the average factuality of individuals and groups. 

The language of truth and falsehood does "invoke ideas of journalists as almost scientific fact-finders." But it isn't just the language of truth and falsehood that bestows upon the art of fact checking an air of science. Journalists who specialize in fact checking do many things that scientists do (but not always). They usually cover falsifiable claims, flicking a wink into Karl Popper's posthumous cup of tiddlies. They always formulate questions and hypotheses about the factuality of the claims that they cover. They usually test their hypotheses against empirical evidence rather than unsubstantiated opinion.

Yet fact checkers ignore a lot of the scientific method. For instance, they don't replicate (then again, neither do many scientists). Moreover, fact checkers like PolitiFact and The Fact Checker use rating scales that link only indirectly and quite incompletely to the logic of a claim. To illustrate, observe PolitiFact's description of its Truth-O-Meter scale.
True – The statement is accurate and there’s nothing significant missing.

Mostly True – The statement is accurate but needs clarification or additional information.

Half True – The statement is partially accurate but leaves out important details or takes things out of context.

Mostly False – The statement contains some element of truth but ignores critical facts that would give a different impression.

False – The statement is not accurate.

Pants on Fire – The statement is not accurate and makes a ridiculous claim. [Malark-O-Meter note: Remember that the malarkey score treats "False" and "Pants on Fire" statements the same.]
Sometimes, fact checkers specify in the essay component of their coverage the logical fallacies that a claim perpetrates. Yet neither the Truth-O-Meter scale nor The Fact Checker's Pinocchio scale specifies which logical fallacies were committed or how many. Instead, PolitiFact and The Fact Checker use a discrete, ordinal scale that combines accuracy in the sense of correctness with completeness in the sense of clarity.

By obscuring the reasons why something is false, these ruling scales make it easy to derive factuality metrics like the malarkey score, but difficult to interpret what those metrics mean. More importantly, PolitiFact and The Fact Checker make themselves vulnerable to the criticism that their truth ratings are subject to ideological biases because...well...because they are. Their apparent vagueness makes them so. Does this make the Truth-O-Meter and Pinocchio scales worthless? Probably not. But we can do better. Here's how.

When evaluating an argument (all claims are arguments, even if they are political sound bites), determine if it is sound. To be sound, all of an argument's premises must be true, and the argument must be valid. To be true, a premise must adhere to the empirical evidence. To be valid, an argument must commit no logical fallacies. The problem is that the ruling scales of fact checkers conflate soundness and validity. The solution is to stop doing that.
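
To pin down that distinction, here is a minimal sketch (not the web implementation I'm keeping under wraps) of a rating record that keeps the truth of the premises separate from the validity of the argument.

```python
# A minimal sketch: an argument is valid if it commits no logical fallacies,
# and sound only if it is valid AND all of its premises are true.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArgumentRating:
    premises_true: List[bool] = field(default_factory=list)  # one verdict per premise
    fallacies: List[str] = field(default_factory=list)       # named fallacies committed

    @property
    def valid(self) -> bool:
        return not self.fallacies

    @property
    def sound(self) -> bool:
        return self.valid and all(self.premises_true)
```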

When and if Malark-O-Meter grows into a fact checking entity, it will experiment with rating scales that specify and enumerate logical fallacies. It will assess both the soundness and the validity of an argument. I have an idea of how to implement this on the web that is so good, I don't want to give it away just yet.

There are thousands of years of formal logic research that stretch into the modern age. Hell, philosophy PhD Gary N. Curtis publishes an annotated and interactive taxonomic tree of logical fallacies on the web.

Stay tuned to Malark-O-Meter, where I'm staging a fact check revolution.
 
 
There's a lot of talk this week about Marco Rubio, who is already being vetted as a possible front runner in the 2016 presidential campaign...in 2012...right after the 2012 presidential campaign. In answer to the conservatives' giddiness about the Senator from Florida, liberals have been looking for ways to steal Rubio's...er...storm clouds on the horizon that could lead to potential thunder maybe in a few years? I dunno. Anyway, one example of this odd little skirmish involves a comment that Senator Rubio made in answer to a GQ interviewer's question about the age of the Earth:
GQ: How old do you think the Earth is?

Marco Rubio: I'm not a scientist, man. I can tell you what recorded history says, I can tell you what the Bible says, but I think that's a dispute amongst theologians and I think it has nothing to do with the gross domestic product or economic growth of the United States. I think the age of the universe has zero to do with how our economy is going to grow. I'm not a scientist. I don't think I'm qualified to answer a question like that. At the end of the day, I think there are multiple theories out there on how the universe was created and I think this is a country where people should have the opportunity to teach them all. I think parents should be able to teach their kids what their faith says, what science says. Whether the Earth was created in 7 days, or 7 actual eras, I'm not sure we'll ever be able to answer that. It's one of the great mysteries. [emphasis added]
"Gotcha!" say my fellow liberals (and I). Ross Douthat, conservative blogger at the New York Times (among other places), argues convincingly that it was a "politician's answer" to a politically contentious question, but rightly asks why Rubio answered in a way that fuels the "conservatives vs. science" trope that Douthat admits has basis in reality. Douthat writes that Rubio could have said instead:
I’m not a scientist, but I respect the scientific consensus that says that the earth is — what, something like a few billions of years old, right? I don’t have any trouble reconciling that consensus with my faith. I don’t think the 7 days in Genesis have to be literal 24-hour days. I don’t have strong opinions about the specifics of how to teach these issues — that’s for school boards to decide, and I’m not running for school board — but I think religion and science can be conversation partners, and I think kids can benefit from that conversation.
So why didn't Rubio say that instead of suggesting wrongly, and at odds with overwhelming scientific consensus, that the age of the Earth is one of the greatest mysteries? 

A more important issue, and one relevant to the fact checking industry that Malark-O-Meter studies and draws on to measure politicians' factuality, is this: why aren't statements like this featured in fact checking reports? The answer probably has something to do with one issue Rubio raised in his answer to GQ, and something that pops up in Douthat's wishful revision.

  • "I think the age of the universe has zero to do with how our economy is going to grow." (Rubio)
  • "...I'm not running for school board..." (Douthat)

You can easily associate these statements with a key constraint of the fact checking industry. As Glenn Kessler stated in a recent panel discussion about the fact checking industry, fact checkers are biased toward newsworthy claims that have broad appeal (PolitiFact's growing state-level fact checking effort notwithstanding). Most Americans care about the economy right now, and few Americans have ever thought scientific literacy was the most important political issue. Fact checkers play to the audience on what most people think are the most important issues of the day. I could not find one fact checked statement that a politician made about evolution or climate change that wasn't either a track record of Obama's campaign promises, or an assessment of how well a politician's statements and actions adhere to their previous positions on these issues.

What does the fact checker bias toward newsworthiness mean for Malark-O-Meter's statistical analyses of politicians' factuality? Because fact checkers aren't that interested in politicians' statements about things like biology and cosmology, the malarkey score isn't going to tell you much about how well politicians adhere to the facts on those issues. Does that mean biology, cosmology, and other sciences aren't important? Does that mean that a politician's scientific literacy doesn't impact the soundness of their legislation?

No.

The scientific literacy of politicians is salient to whether they support particular policies on greenhouse gas reduction, or stem cell research, or education, or, yes, the economy. After all, although economics is a soft science, it's still a science. And if you watched the recent extended debate between Rubio and Jon Stewart on the Daily Show, and you also read the Congressional Research Report that debunks the trickle down hypothesis, and you've read the evidence that we'd need a lot of economic growth to solve the debt problem, you'd recognize that some of Rubio's positions on how to solve our country's economic problems do not align well with the empirical evidence.

But does that mean that Rubio is full of malarkey? According to his Truth-O-Meter report card alone, no. The mean of his simulated malarkey score distribution is 45, and we can be 95% certain that, if we sampled another incomplete report card with the same number of Marco Rubio's statements, his measured malarkey score would be between 35 and 56. Not bad. By comparison, Obama, the least full of malarkey among the 2012 presidential candidates, has a simulated malarkey score based on his Truth-O-Meter report card of 44 and is 95% likely to fall between 41 and 47. The odds that Rubio's malarkey score is greater than Obama's are only 3 to 2, and the difference between their malarkey score distributions averages only one percentage point.
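
For readers wondering where those intervals and odds come from, here is a minimal sketch of the resampling idea (a reconstruction of the general approach, not Malark-O-Meter's actual code): treat a report card as a multinomial sample, redraw it many times, and compare the simulated scores. The category weights and report card counts below are hypothetical.

```python
# A minimal sketch, with hypothetical category weights and report cards:
# resample a Truth-O-Meter report card many times and compare two
# candidates' simulated malarkey scores.
import numpy as np

rng = np.random.default_rng(2012)

# Hypothetical malarkey weight per ruling category, True through False/Pants on Fire.
WEIGHTS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

def simulate_malarkey(report_card: np.ndarray, n_sims: int = 100_000) -> np.ndarray:
    """report_card: counts of rulings in each category, ordered as in WEIGHTS."""
    n = report_card.sum()
    draws = rng.multinomial(n, report_card / n, size=n_sims)  # resampled report cards
    return 100 * (draws @ WEIGHTS) / n                        # simulated malarkey scores

# Hypothetical report cards; real counts come from each candidate's PolitiFact page.
# rubio = simulate_malarkey(np.array([10, 15, 20, 15, 10]))
# obama = simulate_malarkey(np.array([80, 90, 70, 40, 30]))
# print((rubio > obama).mean())  # probability Rubio's simulated score exceeds Obama's
```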

How would a more exhaustive fact checking of Rubio's scientifically relevant statements influence his malarkey score? I don't know. Is this an indictment of truthfulness metrics like the ones that Malark-O-Meter calculates? Not necessarily. It does suggest, however, that Malark-O-Meter should look for ways to modify its methods to account for the newsworthiness bias of fact checkers.

If my dreams for Malark-O-Meter ever come to fruition, I'd like it to be at the forefront of the following changes to the fact checker industry:

  1. Measure the size and direction of association between the topics that fact checkers cover, the issues that Americans currently think are most important, and the stuff that politicians say.
  2. Develop a factuality metric for each topic (this would require us to identify the topic(s) relevant to a particular statement; see the sketch after this list).
  3. Incorporate (and create) more fact checker sites that provide information about a politicians' positions on topics that are underrepresented by the fact checker industry. For example, one could use a Truth-O-Meter-like scale to rate the positions that individuals have on scientific topics, which are often available at sites like OnTheIssues.org.
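
As a minimal illustration of the per-topic metric in item 2 (my own sketch, with hypothetical column names), one could tag each rated statement with a topic and compute a separate malarkey score per topic.

```python
# A minimal sketch, assuming a hypothetical table of rated statements with a
# 'topic' tag and a 'malarkey' value (0-100) per statement.
import pandas as pd

def topic_malarkey(statements: pd.DataFrame) -> pd.Series:
    """Average malarkey score within each topic."""
    return statements.groupby("topic")["malarkey"].mean().round(1)
```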

So it isn't that problems like these bring the whole idea of factuality metrics into question. It's just that the limitations of the fact checker data instruct us about how we might correct for them with statistical methods, and with new fact checking methods. Follow Malark-O-Meter and tell all your friends about it so that maybe we can one day aid that process.
 
 
A funny short story about the triumph and perils of endless recursions in meta-analysis. NOT a critique of meta-analysis itself.
Once upon a time, there was a land called the United States of America, which was ruled by a shapeshifter whose physiognomy and political party affiliation were recast every four years by an electoral vote, itself a reflection of the vote of the people. For centuries, the outcome of the election had been foretold by a cadre of magicians and wizards collectively known as the Pundets. Gazing into their crystal balls at the size of crowds at political rallies, they charted the course of the shapeshifting campaign. They were often wrong, but people listened to them anyway.

Then, from the labyrinthine caves beneath the Marginuvera Mountains emerged a troglodyte race known as the Pulstirs. Pasty of skin and snarfy in laughter, they challenged the hegemony of the Pundet elite by crafting their predictions from the collective utterances of the populace. Trouble soon followed. Some of the powerful new Pulstir craftsmen forged alliances with one party or another. And as more and more Pulstirs emerged from Marginuvera, they conducted more and more puls.

The greatest trouble came, unsurprisingly, from the old Pundet guard in their ill-fated attempts to merge their decrees with Pulstir findings. Unable to cope with the number of puls, and unwilling to so much as state an individual pul's marginuvera, the Pundets issued predictions that confused the people more than they informed them.

Then, one day, unbeknownst to one another, rangers emerged from the Forests of Metta Analisis. Long had each of them observed the Pundets and Pulstirs from afar. Long had they anguished over the amount of time the Pundets spent bullshyting about what the ruler of America would look like after election day rather than discussing in earnest the policies that the shapeshifter would adopt. Long had the rangers shaken their fists at the sky every time Pundets with differing loyalties supported their misbegotten claims with a smattering of gooseberry-picked puls. Long had the rangers tasted vomit at the back of their throats whenever the Pundets at Sea-en-en jabbered about it being a close race when one possible shapeshifting outcome had been on average trailing the other by several points in the last several fortnights of puls.

Each ranger retreated to a secluded cave, where they used the newfangled signal torches of the Intyrnet to broadcast their shrewd aggregation of the Pulstir's predictions. There, they persisted on a diet of espresso, Power Bars, and drops of Mountain Dew. Few hours they slept. In making their predictions, some relied only on the collective information of the puls. Others looked as well to fundamental trends of prosperity in each of America's states. 

Pundets on all (by that, we mean both) sides questioned the rangers' methods, scoffed at the certainty with which the best of them predicted that the next ruler of America would look kind of like a skinny Nelson Mandela, and would support similar policies to the ones he supported back when he had a bigger chin and lighter skin, was lame of leg, and harbored great fondness for elegantly masculine cigarette holders.

On election day, it was the rangers who triumphed, and who collectively became known as the Quants, a moniker that was earlier bestowed upon another group of now disgraced, but equally pasty rangers who may have helped usher in the Great Recession of the early Second Millennium. The trouble was that the number of Quants had increased due to the popularity and controversy surrounding their predictions. While most of the rangers correctly predicted the physiognomy of the president, they had differing levels of uncertainty in the outcome, and their predictions fluctuated to different degrees over the course of the lengthy campaign.

Soon after the election, friends of the Quants, who had also trained in the Forests of Metta Analisis, made a bold suggestion. They argued that, just as the Quants had aggregated the puls to form better predictions about the outcome of the election, we could aggregate the aggregates to make our predictions yet more accurate. 

Four years later, the Meta-Quants broadcast their predictions alongside those of the original Quants. Sure enough, the Meta-Quants predicted the outcome with greater accuracy and precision than the original Quants.

Soon after the election, friends of the Meta-Quants, who had also trained in the Forests of Metta Analisis, made a bold suggestion. They argued that, just as the Meta-Quants had aggregated the Quants to form better predictions about the outcome of the election, we could aggregate the aggregates of the aggregates to make even better predictions.

Four years later, the Meta-Meta-Quants broadcast their predictions alongside those of the Quants and the Meta-Quants. Sure enough, the Meta-Meta-Quants predicted the outcome with somewhat better accuracy and precision than the Meta-Quants, but not as much better as the Meta-Quants had over the Quants. Nobody really paid attention to that part of it.

Which is why, soon after the election, friends of the Meta-Meta-Quants, who had also trained in the Forests of Metta Analisis, made a bold suggestion. They argued that, just as the Meta-Meta-Quants had aggregated the Meta-Quants to form better predictions about the outcome of the election, we could aggregate the aggregates of the aggregates of the aggregates to make even better predictions.

...

One thousand years later, the (Meta x 253)-Quants broadcast their predictions alongside those of all the other types of Quants. By this time, 99.9999999% of Intyrnet communication was devoted to the prediction of the next election, and the rest was devoted to the prediction of the election after that. A Dyson Sphere was constructed around the sun to power the syrvers necessary to compute and communicate the prediction models of the (Meta x 253)-Quants, plus all the other types of Quants. Unfortunately, most of the brilliant people in the Solar System were employed making predictions about elections. Thus the second-rate constructors of the Dyson Sphere accidentally built its shell within the orbit of Earth, blocking out the sun and eventually causing the extinction of life on the planet.

The end.
 
 
Many accuse fact checkers like PolitiFact and The Fact Checker of bias. Most of these accusations come from the right, for which the most relevant example is politifactbias.com. Conservatives don't focus as heavily on The Washington Post's Fact Checker, perhaps because its rulings are apparently more centrist than PolitiFact's, and because PolitiFact's rulings apparently favor Democrats at least a little bit [1].

We can use Malark-O-Meter's recent analysis of the 2012 election candidates' factuality to estimate the magnitude of liberal bias necessary to explain the differences observed between the two parties and estimate our uncertainty in the size of that bias.

The simplest way to do this is to re-interpret my findings as measuring the average liberal bias of the two fact checkers, assuming that there is no difference between the two tickets. The appropriate comparison here is what I call the collated ticket malarkey, which sums all statements that the members of a ticket make in each category, then calculates the malarkey score from the collated ticket. Using statistical simulation methods, I've estimated the probability distribution of the ratio of the collated malarkey scores of Rymney to Obiden.
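
As a minimal sketch of that last step (assuming arrays of simulated collated scores like those produced by the resampling routine sketched earlier on this page), one can summarize the ratio of the two tickets' malarkey scores directly.

```python
# A minimal sketch: given arrays of simulated collated malarkey scores for
# each ticket, summarize the Rymney-to-Obiden malarkey ratio.
import numpy as np

def ratio_summary(rymney: np.ndarray, obiden: np.ndarray):
    """Mean and 95% interval of the ratio of Republican to Democratic malarkey."""
    ratio = rymney / obiden
    lo, hi = np.percentile(ratio, [2.5, 97.5])
    return ratio.mean(), lo, hi
```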

Here's a plot of that distribution with the 95% confidence intervals labeled on either side of the mean ratio. The white line lies at equal malarkey scores between the two tickets.

Interpreted as a true comparison of factuality, the probability distribution indicates that we can expect Rymney's statements are on average 17% more full of malarkey than Obiden's, although we can be 95% confident that the comparison is somewhere between 8% and 27% more red than blue malarkey. 

Interpreted as an indicator of the average bias of PolitiFact and The Fact Checker, the probability distribution suggests that, if the two tickets spew equal amounts of malarkey, then the fact checkers on average rate the Democratic ticket's statements as somewhere between 8% and 27% more truthful than the Republican ticket's statements.

I'm going to speak against my subjective beliefs as a bleeding heart liberal and say that amount of bias isn't all that unrealistic, even if the bias is entirely subconscious.

If instead we believed like a moderate conservative that the true comparison was reversed - that is, if we believed that Obiden spewed 17% more malarkey than Rymney - then it suggests that the fact checkers' average bias is somewhere between 16% and 54% for the Democrats, with a mean estimated bias of 34%.

It seems unrealistic to me that PolitiFact and The Fact Checker are on average that biased against the Republican party, even subconsciously. So while I think it's likely that bias could inflate the difference between the Republicans and Democrats, I find it much less likely that bias has reversed the comparison between the two tickets. Of course, these beliefs are based on hunches. Unlike politifactbias.com's rhetoric and limited quantitative analysis, however, they are at least grounded in good estimates of the possible bias, and our uncertainty in it.

It isn't just conservatives that accuse PolitiFact and The Fact Checker of bias. Believe it or not, liberals do, too. Liberals accuse fact checkers of being too centrist in a supposedly misguided quest to appear fair. You can look to Rachel Maddow as a representative of this camp. Maddow's accusations, like politifactbias.com's, typically nitpick a few choice rulings (which is funny, because a lot of critics on both sides accuse PolitiFact and The Fact Checker of cherrypicking).

Such accusations amount to the suggestion that fact checkers artificially shrink the difference between the two parties, making the histogram that I showed above incorrectly hover close to a ratio of one. So how much centrist bias do the fact checkers have on average?

Well, let's assume for a moment that we don't know which party spews more malarkey. We just know that, as I've estimated, the fact checkers on average rule that one party spews somewhere between 1.08 and 1.27 times the malarkey that the other party spews. Now let's put on a Rachel Maddow wig or a Rush Limbaugh bald cap and fat suit to become true partisans that believe the other side is actually, say, 95% full of crap, while our side is only 5% full of crap. This belief leads to a ratio of 19 to 1 comparing the malarkey of the enemy to our preferred party. Already, it seems unrealistic. But let's continue.

Next, divide each bin in the histogram I showed above by 19, which is the "true" ratio according to the partisans. The result is a measure of the alleged centrist bias of the average fact checker (at least at PolitiFact or The Fact Checker). Get a load of the 95% confidence interval of this new distribution: it runs from about 6% to about 7% (1.08/19 ≈ 0.06; 1.27/19 ≈ 0.07). That is, a partisan would conclude that PolitiFact and The Fact Checker are on average so centrist that their rulings shrink the difference between the two parties to a mere SIX PERCENT of what it "truly" is.

I don't know about you, but I find this accusation as hard to swallow as, if not harder than, the accusation that there is minor partisan bias among fact checkers.

Then again, my belief that fact checkers on average get it about right is entirely subjective. Given the data we currently have, it is not currently possible to tell how much partisan bias versus centrist bias versus honest mistakes versus honest fact checking contribute to the differences that I have estimated.

So what is the way forward? How can we create a system of fact checking that is less susceptible to accusations of bias, whether partisan or centrist? Here are my suggestions, which will require a lot of investment and time.

  1. More fact checking organizations. We need more large-scale fact checking institutions that provide categorical rulings like The Fact Checker and PolitiFact. The more fact checker rulings we have access to, the more fact checker rulings we can analyze and combine into some (possibly weighted) average.
  2. More fact checkers. We need more fact checkers in each institution so that we can rate more statements. The more statements we can rate, the weaker selection bias will be because, after some point, you can't cherrypick anymore.
  3. Blind fact checkers. After the statements are collected, they should be passed to people who do not see who made the statement. While it will be possible for people to figure out who made some statements, particularly when they are egregious, and particularly when they are repeated by a specific party or individual, many statements that fact checkers examine can be stripped of information about the individuals or parties involved so that fact checkers can concentrate on the facts.
  4. Embrace the partisans and centrists. There should be at least one institution that employs professional fact checkers who are, according to some objective measure, at different points along the various political dimensions that political scientists usually measure.  So long as they are professional fact checkers and not simply politically motivated hacks, let these obvious partisans and centrists subconsciously cherrypick, waffle, and misrule to their heart's content so that we can actually measure the amount of subconscious bias rather than make accusations based on scanty evidence and fact checker rulings that make our neck hairs bristle.


I hope that Malark-O-Meter will someday grow into an organization that can realize at least one of these recommendations.
[1] To see how PolitiFact and The Fact Checker disagree, and how PolitiFact is harder on Republicans, compare my PolitiFact and The Fact Checker based malarkey scores (see right side bar), and read the press release of a study done by George Mason's Center for Media and Public Affairs, and another study done by the same organization.
 
 
"The Hitchhiker's Guide to the Galaxy skips lightly over this tangle of academic abstraction, pausing only to note that the term 'Future Perfect' has been abandoned since it was discovered not to be."
(Douglas Adams, The Restaurant at the End of the Universe. Pan Books, 1980)

So where am I going with Malark-O-Meter? I've lost an unhealthy amount of sleep working on it when I should have been resting after a day of interviews and dissertation data management. My heart flutters when I'm about to finish a Malark-O-Meter analysis,  again when I'm about to share it, and again when I get some feedback. Wait a second. That must mean I'm passionate about it!

I'm passionate enough about Malark-O-Meter (M.O.M.) to think seriously about its future. I'm also passionate (almost to a fault) about doing things transparently. So I'm going to write my business plan in public view, using Google Docs. Every journey starts with one (almost) blank page (as of this writing).

Before I type another word in that document, I'll lay out what I've been thinking about so far. I'll warn you. It's too big for M.O.M.'s current britches. But I think if I play my cards right and meet the right people, all of this will one day be feasible.

I envision a non-profit organization that adheres to and extends M.O.M.'s core mission: to statistically analyze fact checker rulings to make judgments about the factuality of politicians, and to measure our uncertainty in those judgments. 

M.O.M. would continue to calculate and statistically compare intuitive measures of factuality. It would mine data sufficient to calculate malarkey scores for entire political parties, and for the complex web of politicians, PACs, SuperPACs, supporters, and surrogates that surrounds a political campaign. As data accumulate over its existence, we would be able to study changes in factuality over time, and across a larger sample of individuals, groups, and localities.

We'd extend M.O.M.'s core mission by doing our own, in-house fact check rulings. This would increase the sample size of fact checker report cards that we use to generate malarkey scores. I see a transparent fact checking system employing at least two competent fact checkers, preferably at varying points within the political spectra, who make rulings comparable to the Truth-O-Meter and The Fact Checker. By getting into the business of fact-checking, we would also be in a position to do longitudinal malarkey research with historical scope as wide and deep as our archives of political claims.

That's right. One of the projects I'd like to do is to, within reason, make a fact check report card for every American president, alive or dead. Maybe every vice president, too. American politics has a rich historical record. Let's find a new use for it! We could ask and answer questions like, "Have presidential candidates become more brazen as the power of the presidency has increased?" Of course, to fact check history properly, we'd need to consult with at least one able historian.

Another way to extend M.O.M.'s mission is to make outreach a key component. How do we make the technical machinery of M.O.M., and its equally technical output, understandable to the broadest audience possible? To fulfill this mission, we need to blur the lines between political science and journalism. This has already happened with projects like fivethirtyeight.com, and the Princeton Election Consortium (and I'll put in a plug for the less well-known but equally awesome Darryl Holman at horsesass.org, whose voting simulations you should take seriously despite the website's hilarious name). Fact checking itself is a hybrid of journalism and scholarship. M.O.M. would add to and innovate within this new information ecology.

M.O.M.'s revenues could come from several sources. Of course, there are grants, there's crowd-funding, there's "please give us money" buttons. But we could also gain revenue by doing commissioned studies for the media, for think tanks, and, yes, even for political campaigns. I'm thinking the funding will come mostly from grants, then commissioned studies, then begging for money from the crowd.

So, yeah. These ideas are big. I am serious about this project. It is not a toy or a gimmick or a game. I want to follow through with this to the end. I hope you'll follow along. And for some of you, I hope some day you will participate.

For now, stay tuned for my next analysis update, in which I'll examine all four debates, and all four candidates.
 

    about

    Malark-O-blog published news and commentary about the statistical analysis of the comparative truthfulness of the 2012 presidential and vice presidential candidates. It has since closed down while its author makes bigger plans.

    author

    Brash Equilibrium is an evolutionary anthropologist and writer. His real name is Benjamin Chabot-Hanowell. His wife calls him Babe. His daughter calls him Papa.

    what is malarkey?

    It's a polite word for bullshit. Here, it's a measure of falsehood. 0 means you're truthful on average. 100 means you're 100% full of malarkey. Details.

    what is simulated malarkey?

    Fact checkers only rate a small sample of the statements that politicians make. How uncertain are we about the real truthfulness of politicians? To find out, treat fact checker report cards like an experiment, and use random number generators to repeat that experiment a lot of times to see all the possible outcomes. Details.

    malark-O-glimpse

    Can you tell the difference between the 2012 presidential election tickets from just a glimpse at their simulated malarkey score distributions?

    dark = pres, light = vp

    fuzzy portraits of malarkey

    Simulated distributions of malarkey for each 2012 presidential candidate with 95% confidence interval on either side of the simulated average malarkey score. White line at half truthful. (Rounded to nearest whole number.)

    • 87% certain Obama is less than half full of malarkey.
    • 100% certain Romney is more than half full of malarkey.
    • 66% certain Biden is more than half full of malarkey.
    • 70% certain Ryan is more than half full of malarkey.
    (Probabilities rounded to nearest percent.)

    fuzzy portraits of ticket malarkey

    Simulated distributions of collated and average malarkey for each 2012 presidential election ticket, with 95% confidence interval labeled on either side of the simulated malarkey score. White line at half truthful. (Rounded to nearest whole number.)

    • 81% certain Obama/Biden's collective statements are less than half full of malarkey.
    • 100% certain Romney/Ryan's collective statements are more than half full of malarkey.
    • 51% certain the Democratic candidates are less than half full of malarkey.
    • 97% certain the Republican candidates are on average more than half full of malarkey.
    • 95% certain the candidates' statements are on average more than half full of malarkey.
    • 93% certain the candidates themselves are on average more than half full of malarkey.
    (Probabilities rounded to nearest percent.)

    Comparisons

    Simulated probability distributions of the difference between the malarkey scores of one 2012 presidential candidate or party and another, with 95% confidence interval labeled on either side of the simulated mean difference. Blue bars are when Democrats spew more malarkey, red when Republicans do. White line and purple bar at equal malarkey. (Rounded to nearest hundredth.)

    • 100% certain Romney spews more malarkey than Obama.
    • 55% certain Ryan spews more malarkey than Biden.
    • 100% certain Romney/Ryan collectively spew more malarkey than Obama/Biden.
    • 94% certain the Republican candidates spew more malarkey on average than the Democratic candidates.
    (Probabilities rounded to nearest percent.)

    2012 prez debates

    presidential debates

    Simulated probability distribution of the malarkey spewed by individual 2012 presidential candidates during debates, with 95% confidence interval labeled on either side of simulated mean malarkey. White line at half truthful. (Rounded to nearest whole number.)

    • 66% certain Obama was more than half full of malarkey during the 1st debate.
    • 81% certain Obama was less than half full of malarkey during the 2nd debate.
    • 60% certain Obama was less than half full of malarkey during the 3rd debate.
    (Probabilities rounded to nearest percent.)

    • 78% certain Romney was more than half full of malarkey during the 1st debate.
    • 80% certain Romney was less than half full of malarkey during the 2nd debate.
    • 66% certain Romney was more than half full of malarkey during the 3rd debate.
    (Probabilities rounded to nearest percent.)

    aggregate 2012 prez debate

    Distributions of malarkey for collated 2012 presidential debate report cards and the average presidential debate malarkey score.
    • 68% certain Obama's collective debate statements were less than half full of malarkey.
    • 68% certain Obama was less than half full of malarkey during the average debate.
    • 67% certain Romney's collective debate statements were more than half full of malarkey.
    • 57% certain Romney was more than half full of malarkey during the average debate.
     (Probabilities rounded to nearest percent.)

    2012 vice presidential debate

    • 60% certain Biden was less than half full of malarkey during the vice presidential debate.
    • 89% certain Ryan was more than half full of malarkey during the vice presidential debate.
    (Probabilities rounded to nearest percent.)

    overall 2012 debate performance

    Malarkey score from collated report card comprising all debates, and malarkey score averaged over candidates on each party's ticket.
    • 72% certain Obama/Biden's collective statements during the debates were less than half full of malarkey.
    • 67% certain the average Democratic ticket member was less than half full of malarkey during the debates.
    • 87% certain Romney/Ryan's collective statements during the debates were more than half full of malarkey.
    • 88% certain the average Republican ticket member was more than half full of malarkey during the debates.

    (Probabilities rounded to nearest percent.)

    2012 debate self comparisons

    Simulated probability distributions of the difference in malarkey that a 2012 presidential candidate spews normally compared to how much they spewed during a debate (or aggregate debate), with 95% confidence interval labeled on either side of the simulated mean difference. Light bars mean less malarkey was spewed during the debate than usual, dark bars more. White bar at equal malarkey. (Rounded to nearest hundredth.)

    individual 2012 presidential debates

    • 80% certain Obama spewed more malarkey during the 1st debate than he usually does.
    • 84% certain Obama spewed less malarkey during the 2nd debate than he usually does.
    • 52% certain Obama spewed more malarkey during the 3rd debate than he usually does.
    • 51% certain Romney spewed more malarkey during the 1st debate than he usually does.
    • 98% certain Romney spewed less malarkey during the 2nd debate than he usually does.
    • 68% certain Romney spewed less malarkey during the 3rd debate than he usually does.

    (Probabilities rounded to nearest percent.)

    aggregate 2012 presidential debate

    • 58% certain Obama's statements during the debates were more full of malarkey than they usually are.
    • 56% certain Obama spewed more malarkey than he usually does during the average debate.
    • 73% certain Romney's statements during the debates were less full of malarkey than they usually are.
    • 86% certain Romney spewed less malarkey than he usually does during the average debate.

    (Probabilities rounded to nearest percent.)

    vice presidential debate

    • 70% certain Biden spewed less malarkey during the vice presidential debate than he usually does.
    • 86% certain Ryan spewed more malarkey during the vice presidential debate than he usually does.

    (Probabilities rounded to nearest percent.)

    2012 opponent comparisons

    Simulated probability distributions of the difference in malarkey between the Republican candidate and the Democratic candidate during a debate, with 95% confidence interval labeled on either side of simulated mean comparison. Blue bars are when Democrats spew more malarkey, red when Republicans do. White bar at equal malarkey. (Rounded to nearest hundredth.)

    individual 2012 presidential debates

    • 60% certain Romney spewed more malarkey during the 1st debate than Obama.
    • 49% certain Romney spewed more malarkey during the 2nd debate than Obama.
    • 72% certain Romney spewed more malarkey during the 3rd debate than Obama.

    (Probabilities rounded to nearest percent.)

    aggregate 2012 presidential debate

    • 74% certain Romney's statements during the debates were more full of malarkey than Obama's.
    • 67% certain Romney was more full of malarkey than Obama during the average debate.

    (Probabilities rounded to nearest percent.)

    vice presidential debate

    • 92% certain Ryan spewed more malarkey than Biden during the vice presidential debate.

    (Probabilities rounded to nearest percent.)

    overall 2012 debate comparison

    Party comparison of 2012 presidential ticket members' collective and individual average malarkey scores during debates.
    • 88% certain that Republican ticket members' collective statements were more full of malarkey than Democratic ticket members'.
    • 86% certain that the average Republican candidate spewed more malarkey during the average debate than the average Democratic candidate.

    (Probabilities rounded to nearest percent.)

    observe & report

    Below are the observed malarkey scores of the 2012 presidential candidates, and comparisons computed from those scores.

    2012 prez candidates

    Truth-O-Meter only (observed)

    candidate malarkey
    Obama 44
    Biden 48
    Romney 55
    Ryan 58

    The Fact Checker only (observed)

    candidate malarkey
    Obama 53
    Biden 58
    Romney 60
    Ryan 47

    Averaged over fact checkers

    candidate malarkey
    Obama 48
    Biden 53
    Romney 58
    Ryan 52

    2012 Red prez vs. Blue prez

    Collated bullpucky

    ticket malarkey
    Obama/Biden 46
    Romney/Ryan 56

    Average bullpucky

    ticket malarkey
    Obama/Biden 48
    Romney/Ryan 58

    2012 prez debates

    1st presidential debate

    opponent malarkey
    Romney 61
    Obama 56

    2nd presidential debate (town hall)

    opponent malarkey
    Romney 31
    Obama 33

    3rd presidential debate

    opponent malarkey
    Romney 57
    Obama 46

    collated presidential debates

    opponent malarkey
    Romney 54
    Obama 46

    average presidential debate

    opponent malarkey
    Romney 61
    Obama 56

    vice presidential debate

    opponent malarkey
    Ryan 68
    Biden 44

    collated debates overall

    ticket malarkey
    Romney/Ryan 57
    Obama/Biden 46

    average debate overall

    ticket malarkey
    Romney/Ryan 61
    Obama/Biden 56

    the raw deal

    You've come this far. Why not just check out the raw data Malark-O-Meter is using? I promise you: it is as riveting as a phone book.
