This week, two political science blog posts about the difference between political engagement and factual understanding stood out to Malark-O-Meter. (Thanks to The Monkey Cage
for Tweeting their links.) First, there's Brendan Nyhan
's article at YouGov
about how political knowledge doesn't guard against belief in conspiracy theories
. Second, there's voteview
's article about issues in the 2012 election
. (Side note: This could be the Golden Era of political science blogging.) These posts stand out both as cautionary tales about what it means to be politically engaged versus factual, and as promising clues about how to assess the potential biases of professional fact checkers in order to facilitate the creation of better factuality metrics (what Malark-O-Meter is all about).
Let's start with Nyhan's disturbing look at the interactive effect of partisan bias and political knowledge on belief in the conspiracy theory that the 2012 unemployment rate numbers were manipulated for political reasons. The following pair of plots (reproduced from the original article) pretty much says it all.
First, there's the comparison of Dem, Indie, and GOP perception of whether unemployment statistics are accurate, grouped by party affiliation and low, medium, and high scores on a ten-question quiz on political knowledge.
Republicans and maybe Independents with greater political knowledge perceive the unemployment statistics to be less accurate.
Here's a similar plot showing the percent in each political knowledge and party affiliation group that believe in the conspiracy theory about the September unemployment statistic report.
Democrats appear less likely to believe the conspiracy theory the more knowledgeable they are. Republicans with greater political knowledge are more likely to believe the conspiracy theory. There's no clear effect among Independents. What's going on?
Perhaps the more knowledgeable individuals are also more politically motivated, and so is their reasoning. It just so happens that motivated reasoning in this case probably errs on the side of the politically knowledgeable Democrats.
Before discussing what this means for fact checkers and factuality metrics, let's look at what voteview writes about an aggregate answer to a different question, posed by Gallup (aka, the new whipping boy of the poll aggregators) about the June jobs report.
In case you haven't figured it out, you're looking at yet another picture of motivated reasoning at work (or is it play?). Democrats were more likely than Republicans to see the jobs report as mixed or positive, whereas Republicans were more likely than Democrats to see it as negative. You might expect this effect to shrink among individuals who say they pay very close attention to news about the report because, you know, they're more knowledgeable and they really think about the issues and... NOPE!
The more people say they pay attention to the news, the more motivated their reasoning appears to be.
What's happening here? In Nyhan's study, are the more knowledgeable people trying to skew the results of the survey to make it seem like more people believe or don't believe in the conspiracy theory? In the Gallup poll, is "paid very close attention to news about the report" code for "watched a lot of MSNBC/Fox News"? Or is it an effect similar to what we see among educated people who tend to believe that vaccinations are (on net) bad for their children despite lots and lots of evidence to the contrary? That is, do knowledgeable people know enough to be dangerous(ly stupid)?
I honestly don't know what's happening, but I do have an idea about what this might mean for the measurement of potential fact checker bias to aid the creation of better factuality metrics and fact checking methods. I think we can all agree that fact checkers are knowledgeable people. The question is, does their political knowledge and engagement have the same effect on their fact checking as it does on the perceptions of educated non-fact-checkers? If so, is the effect as strong?
I've mentioned before that a step toward better fact checking is to measure the potential effect of political bias on both the perception of fact and the rulings of fact checkers. Basically, give individuals a questionnaire that assesses their political beliefs, and see how they proceed to judge the factuality of statements made by individuals of known party affiliations, ethnicity, et cetera. To see if fact checking improves upon the motivated reasoning of non-professionals, compare the strength of political biases on the fact checking of professionals versus non-professionals.
What these two blog posts tell me is that, when drawing such comparisons, I should take into account not only the political affiliation of the non-professionals, not only the political knowledge of the non-professionals, but the interaction of those two variables. Then, we can check which subgroup of non-professionals the professional fact checkers are most similar to, allowing us to make inferences about whether professional fact checkers suffer from the same affliction of motivated reasoning that the supposedly knowledgeable non-professionals suffer from.
This is Internet commenting at its best: constructive, well-reasoned, and mainly correct. Let's address the comment point by point.

"Validity is a nice standard for mathematics and logic but it is not often found in public discourse."
I can't agree more. This unfortunate fact should not, however, discourage us from specifying and enumerating the logical fallacies that public figures commit. It should encourage us to do so, as it has encouraged the establishment of the fact checking industry.

"Even scientific conclusions are rarely (if ever) backed by valid reasoning as they typically rely on induction or inference to the best explanation."
I agree that scientists stray from valid (and sound) argumentation more often than they should. I do not, however, agree that scientists rarely if ever make sound or valid arguments. I also agree that scientists often use inductive reasoning. Scientists will continue to do so as Bayesian statistical methods proliferate. I do not, however, agree that inductive inference is immune to the assessment of soundness and, by inclusion, validity. Inductive reasoning
is probabilistic. For instance, a statistical syllogism
(following Wikipedia's example) could go,
- 90% of humans are right-handed.
- Joe is a human.
- Therefore, the probability that Joe is right-handed is 90% (therefore, if we are required to guess [one way or the other] we will choose "right-handed" in the absence of any other evidence).
You can assess the validity of this statistical syllogism by considering whether the steps in the argument follow logically from one another. You can assess its soundness by furthermore considering whether its premises are true. Are 90% of humans right-handed? Is Joe a human? Inductive logic is still logic.

"Not every claim is an argument. An argument must offer evidence intended to support a conclusion. I can claim 'I am hungry' without thereby offering any sort of argument (valid, inductive, fallacious or otherwise) in support of that claim. One cannot test the validity of a single proposition."
I agree that not every claim is an argument, either in the formal or informal sense. Every claim is, however, a premise. In such cases, we can simply determine whether or not the premise is true. Furthermore, many claims that fact checkers care about imply or support an informal (or even formal or legal) argument. In such cases, you can assess the implied informal argument's validity. Lastly, in any case where a public figure makes a claim that ties vaguely to an informal argument, that public figure deserves to be criticized for committing the ambiguity fallacy
. Politicians often commit the ambiguity fallacy. As much as possible, we should call them on it whenever they do.

"No need to check for 'both' soundness and validity. If you check for soundness, then you have already checked for validity as part of that. Perhaps you meant to say you would check for both truth of basic premises and validity of reasoning."
Correct. To be sound, an argument must be valid. What I should have said is that fact checkers conflate truth with validity.

"It depends a bit on which notion of fallacy you are working with, but arguments can fail to be valid without committing a common named fallacy. A far simpler check for validity is simply to find counterexamples to the reasoning (logically possible examples in which the basic premises of the argument are all true and in which the conclusion of the argument is false)."
I hope that Ima Pseudonym will elaborate on the logical counterexample part of this statement. If it's a viable shortcut, I'm all for it. That said, I suspect that there are many logical fallacies that do not yet have a name. Perhaps Malark-O-Meter's future army of logicians will name the unnamed!
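In the meantime, here is how the counterexample check might work mechanically for simple propositional arguments. This sketch is my own illustration, not anything Ima Pseudonym proposed: enumerate every truth assignment of the variables, and flag any assignment where all premises hold but the conclusion fails.

```python
from itertools import product

def counterexamples(variables, premises, conclusion):
    """Brute-force search for assignments where every premise is true
    but the conclusion is false (i.e., the argument is invalid)."""
    found = []
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            found.append(env)
    return found

# Modus ponens (valid): P -> Q, P, therefore Q
valid = counterexamples(
    ["P", "Q"],
    premises=[lambda e: (not e["P"]) or e["Q"], lambda e: e["P"]],
    conclusion=lambda e: e["Q"],
)

# Affirming the consequent (invalid): P -> Q, Q, therefore P
invalid = counterexamples(
    ["P", "Q"],
    premises=[lambda e: (not e["P"]) or e["Q"], lambda e: e["Q"]],
    conclusion=lambda e: e["P"],
)

print(len(valid))    # 0: no counterexample exists, so the form is valid
print(len(invalid))  # 1: P=False, Q=True defeats the argument
```

The shortcut works only when the argument can be formalized this crisply; for the loose informal arguments politicians make, naming the fallacy may still be the clearer diagnosis.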
Thank you again, Ima Pseudonym. Your move if you wish to continue playing. I like this game because you play it well. I encourage constructive criticism from you and all of Malark-O-Meter's readers. Cry 'Reason,' and let slip the dogs of logic.
There's a lot of talk this week about Marco Rubio
, who is already being vetted as a possible front runner in the 2016 presidential campaign...in 2012...right after the 2012 presidential campaign. In answer to the conservatives' giddiness about the Senator from Florida, liberals have been looking for ways to steal Rubio's...er...storm clouds on the horizon that could lead to potential thunder maybe in a few years? I dunno. Anyway, one example of this odd little skirmish involves a comment that Senator Rubio made in answer to a GQ interviewer's question about the age of the Earth. Rubio's answer drew swift criticism from my fellow liberals (and from me). Ross Douthat, conservative blogger at the New York Times
(among other places), argues convincingly
that it was a "politician's answer" to a politically contentious question, but rightly asks why Rubio answered in a way that fuels the "conservatives vs. science" trope that Douthat admits has basis in reality. Douthat even drafts, in his post, the answer he thinks Rubio could have given instead.
So why didn't Rubio say that instead of suggesting wrongly, and at odds with overwhelming scientific consensus, that the age of the Earth is one of the greatest mysteries?
A more important issue, and one relevant to the fact checking industry that Malark-O-Meter studies and draws on to measure politicians' factuality, is why statements like this aren't featured in fact checking reports. The answer probably has something to do with one issue Rubio raised in his answer to GQ, and with something that pops up in Douthat's wishful revision.
- "I think the age of the universe has zero to do with how our economy is going to grow." (Rubio)
- "...I'm not running for school board..." (Douthat)
You can easily associate these statements with a key constraint of the fact checking industry. As Glenn Kessler stated in a recent panel discussion about the fact checking industry
, fact checkers are biased toward newsworthy claims that have broad appeal (PolitiFact's growing state-level fact checking effort notwithstanding). Most Americans care about the economy right now, and few Americans have ever thought scientific literacy was the most important political issue. Fact checkers play to the audience on what most people think are the most important issues of the day. I could not find one fact checked statement that a politician made about evolution or climate change that wasn't either a track record of Obama's campaign promises, or an assessment of how well a politician's statements and actions adhere to their previous positions on these issues.
What does the fact checker bias toward newsworthiness mean for Malark-O-Meter's statistical analyses of politicians' factuality? Because fact checkers aren't that interested in politicians' statements about things like biology and cosmology, the malarkey score isn't going to tell you much about how well politicians adhere to the facts on those issues. Does that mean biology, cosmology, and other sciences aren't important? Does that mean that a politician's scientific literacy doesn't impact the soundness of their legislation?
The scientific literacy of politicians is salient to whether they support particular policies on greenhouse gas reduction, or stem cell research, or education, or, yes, the economy. After all, although economics is a soft science, it's still a science. And if you watched the recent extended debate between Rubio and Jon Stewart on the Daily Show
, and you also read the Congressional Research Report that debunks the trickle down hypothesis
, and you've read the evidence that we'd need a lot of economic growth to solve the debt problem, you'd recognize that some of Rubio's positions on how to solve our country's economic problems do not align well with the empirical evidence.
But does that mean that Rubio is full of malarkey? According to his Truth-O-Meter report card alone, no. The mean of his simulated malarkey score
distribution is 45, and we can be 95% certain that, if we sampled another incomplete report card with the same number of Marco Rubio's statements, his measured malarkey score would be between 35 and 56. Not bad. By comparison, Obama, the least full of malarkey among the 2012 presidential candidates, has a simulated malarkey score based on his Truth-O-Meter report card of 44 and is 95% likely to fall between 41 and 47. The odds that Rubio's malarkey score is greater than Obama's are only 3 to 2, and the difference between their malarkey score distributions averages only one percentage point.
How would a more exhaustive fact checking of Rubio's scientifically relevant statements influence his malarkey score? I don't know. Is this an indictment of truthfulness metrics like the ones that Malark-O-Meter calculates? Not necessarily. It does suggest, however, that Malark-O-Meter should look for ways to modify its methods to account for the newsworthiness bias of fact checkers.
If my dreams for Malark-O-Meter
ever come to fruition, I'd like it to be at the forefront of the following changes to the fact checker industry:
- Measure the size and direction of association between the topics that fact checkers cover, the issues that Americans currently think are most important, and the stuff that politicians say.
- Develop a factuality metric for each topic (this would require us to identify the topic(s) relevant to a particular statement).
- Incorporate (and create) more fact checker sites that provide information about politicians' positions on topics that are underrepresented by the fact checker industry. For example, one could use a Truth-O-Meter-like scale to rate the positions that individuals have on scientific topics, which are often available at sites like OnTheIssues.org.
So it isn't that problems like these bring the whole idea of factuality metrics into question. It's just that the limitations of the fact checker data instruct us about how we might correct for them with statistical methods, and with new fact checking methods. Follow Malark-O-Meter and tell all your friends about it so that maybe we can one day aid that process.
Malark-O-Meter's mission is to statistically analyze fact checker rulings to make comparative judgments about the factuality of politicians, and to measure our uncertainty in those judgments. Malark-O-Meter's methods, however, have a serious problem. To borrow terms made popular by Nate Silver's new book, Malark-O-Meter isn't yet good at distinguishing the signal from the noise
. Moreover, we can't even distinguish one signal from another. I know. It sucks. But I'm just being honest. Without honestly appraising how well Malark-O-Meter fulfills its mission, there's no way to improve its methods.
Note: if you aren't familiar with how Malark-O-Meter works, I suggest you visit the Methods page first.
The signals that we can't distinguish from one another are the real differences in factuality between individuals and groups, versus the potential ideological biases of fact checkers. For example, I've shown in a previous post
that Malark-O-Meter's analysis of the 2012 presidential election
could lead you to believe either that Romney is between four and 14 percent more full of malarkey than Obama, or that PolitiFact and The Fact Checker have on average a liberal bias that gives Obama between a four and 14 percentage point advantage in truthfulness, or that the fact checkers have a centrist bias that shrinks the difference between the two candidates to just six percent of what frothy-mouthed partisans believe it truly is. Although I've verbally argued that fact checker bias is probably not as strong as either conservatives or liberals believe, no one...NO ONE
...has adequately measured the influence of political bias on fact checker rulings.
In a previous post on Malark-O-blog
, I briefly considered some methods to measure, adjust, and reduce political bias in fact checking. Today, let's discuss the problem with Malark-O-Meter's methods that we can't tell signal from noise. The problem is a bit different than the one Silver describes in his book, which is that people have a tendency to see patterns and trends when there aren't any. Instead, the problem is how a signal might influence the amount of noise that we estimate.
Again, the signal is potential partisan or centrist bias. The noise comes from sampling error, which occurs when you take an incomplete sample of all the falsifiable statements that a politician makes. Malark-O-Meter estimates the sampling error of a fact checker report card by randomly drawing report cards from a Dirichlet distribution
, which describes the probability distribution of the proportion of statements in each report card category. Sampling error is higher the smaller your sample of statements. The greater your sampling error, the less certain you will be in the differences you observe among individuals' malarkey scores.
To illustrate the sample size effect, I've reproduced a plot of the simulated malarkey score distributions for Obama, Romney, Biden, and Ryan, as of November 11th, 2012. Obama and Romney average 272 and ~140 rated statements per fact checker, respectively. Biden and Ryan average ~37 and ~21 statements per fact checker, respectively. The difference in the spread of their probability distributions is clear from the histograms and the differences between the upper and lower bounds of the labeled 95% confidence intervals.
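The sample size effect described above can be demonstrated directly. This is an illustrative sketch, not Malark-O-Meter's actual code: the five categories (with "Pants on Fire" folded into "False," as this blog does), the 0-100 score weights, and the assumed true category proportions are all my own choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Five categories (Pants on Fire folded into False), with assumed
# 0-100 malarkey score weights.
WEIGHTS = np.array([0, 25, 50, 75, 100])

def ci_width(n_statements, probs, sims=100_000):
    """Width of the 95% CI of simulated malarkey scores for a report
    card of n_statements drawn with the given category proportions."""
    counts = np.round(np.asarray(probs) * n_statements)
    scores = rng.dirichlet(counts + 1, size=sims) @ WEIGHTS
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return float(hi - lo)

probs = [0.2, 0.25, 0.25, 0.2, 0.1]  # hypothetical true proportions
for n in (21, 37, 140, 272):  # roughly Ryan, Biden, Romney, Obama
    print(n, round(ci_width(n, probs), 1))
```

Holding the category proportions fixed, the confidence interval narrows steadily as the number of rated statements grows, mirroring the difference between the tight Obama and Romney distributions and the sprawling Biden and Ryan ones.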
The trouble is that Malark-O-Meter's sampling distribution assumes that the report card of all the falsifiable statements an individual ever made would have similar proportions in each category as the sample report card. And that assumption implies another one: that the ideological biases of fact checkers, whether liberal or centrist, do not influence the probability that a given statement of a given truthfulness category is sampled.
In statistical analysis, this is called selection bias. The conservative ideologues at PolitiFactBias.com (and Zebra FactCheck, and Sublime Bloviations; they're all written by at least one of the same two guys, really) suggest that fact checkers could bias the selection of their statements toward more false ones made by Republicans, and more true ones made by Democrats. Fact checkers might also be biased toward selecting some statements that make them appear more left-center so that they don't seem too partisan. I'm pretty sure there are some liberals out there who would agree that fact checkers purposefully choose a roughly equal number of true and false statements by conservative and liberal politicians so that they don't seem partisan. In fact, that's a common practice for at least one fact checker, FactCheck.org
. The case for centrist bias isn't as clear for PolitiFact or The Fact Checker.
I think it will turn out that fact checkers' partisan or centrist biases, whether in rating or sampling statements, are too weak to swamp the true differences between individuals or groups. It is, however, instructive to examine the possible effects of selection bias on malarkey scores and their sampling errors. (In contrast, the possible effects of ideological bias on the observed malarkey scores are fairly obvious.)
My previous analysis of the possible liberal and centrist biases of fact checkers was pretty simple. To estimate the possible partisan bias, I simply compared the probability distribution of the observed differences between the Democratic and Republican candidates to ones in which the entire distribution was shifted so that the mean difference was zero, or so that the difference between the parties was reversed. To estimate possible centrist bias, I simply divided the probability distribution that I simulated by the size of the difference that frothy-mouthed partisans would expect, which is large. That analysis assumed that the width of the margin of error in the malarkey score, which is determined by the sampling error, remained constant after accounting for fact checker bias. But that isn't true.
There are at least two ways that selection bias can influence the simulated margin of error of a malarkey score. One way is that selection bias can diminish the efficiency of a fact checker's search for statements to fact check, leading to a smaller sample size of statements on each report card. Again, the smaller the sample size, the wider the margin of error. The wider the margin of error, the more difficult it is to distinguish among individuals, holding the difference in their malarkey scores constant. The efficiency effect of selection bias therefore causes us to underestimate, not overestimate, our certainty in the differences in factuality that we observe. The only reason to worry about this effect is that it diminishes our confidence in observed differences in malarkey scores, which might be real even though we can't tell whether bias or true differences in factuality produced them.
The bigger problem, of course, is that selection bias influences the probability that statements of a given truthfulness category are selected into an individual report card. Specifically, selection bias might increase the probability that more true statements are chosen over less true statements, or vice versa, depending on the partisan bias of the fact checker. Centrist selection bias might increase the probability that more half true statements are chosen, or that more equal numbers of true and false statements are chosen.
The distribution of statements in a report card definitely influences the width of the simulated margin of error. Holding sample size constant, the more even the statements are distributed among the categories, the greater the margin of error. Conversely, when statements are clumped into only a few of the categories, the margin of error is smaller. To illustrate, let's look at some extreme examples.
Suppose I have an individual's report card that rates 50 statements. Let's see what happens to the spread of the simulated malarkey score distribution when we change the spread of the statements across the categories from more even to more clumped. We'll measure how clumped the statements are with something called the Shannon entropy. The Shannon entropy is a measure of uncertainty, typically measured in bits (binary digits that can be 0 or 1). In our case, entropy measures our uncertainty in the truthfulness category of a single statement sampled from all the statements that an individual has made. The higher the entropy score, the greater the uncertainty. Entropy (thus uncertainty) is greatest when the probabilities of all possible events are equal to one another.
We'll measure the spread of the simulated malarkey score distribution by the width of its 95% confidence interval. The 95% confidence interval is the range of malarkey scores that we can be 95% certain would result from another report card with the same number of statements sampled from the same person, given our beliefs about the probabilities of each statement.
We'll compare six cases. First is the case when the true probability of each category is the same. The other five cases are when the true probability of one category is 51 times greater than the probabilities of the other categories, which would define our beliefs about the category probabilities if we observed (or forced through selection bias) that all 50 statements were in one of the categories. Below is a table that collects the entropy and confidence interval width from each of the six cases, and compares them to the equal statement probability case, for which the entropy is greatest and the confidence intervals are widest. Entropies are rounded to the nearest tenth, confidence interval widths to the nearest whole number, and comparisons to the nearest tenth. Here are the meanings of the column headers.
- Case: self explanatory
- Ent.: Absolute entropy of assumed category probabilities
- Comp. ent.: Entropy of assumed category probabilities compared to the case when the probabilities are all equal, expressed as a ratio
- CI width: Width of 95% confidence interval
- Comp. CI width: Width of 95% confidence interval compared to the case when the probabilities are all equal, expressed as a ratio
And here is the table:
For all the clumped cases, the entropy is 20% of the entropy for the evenly distributed case. In fact, the entropy of all the clumped cases is the same because the calculation of entropy doesn't care about which categories are more likely than others. It only cares whether some categories are more likely than others.
The lower entropy in the clumped cases corresponds to small confidence intervals relative to the even case, which makes sense. The more certain we think we are in the probability that any one statement will be in a given report card category, the more certain we should be in the malarkey score.
This finding suggests that if fact checker bias causes oversampling of statements in certain categories, Malark-O-Meter will overestimate our certainty in the observed differences if the true probabilities within each category are more even. This logic could apply to partisan biases that lead to oversampling of truer or more false statements, or to centrist biases that oversample half true statements. The finding also suggests that a centrist bias that leads to artificially equivalent probabilities in each category will cause Malark-O-Meter to underestimate the level of certainty in the observed differences.
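The entropy and confidence interval comparison can be reproduced with a short simulation. Again, this is my own sketch rather than Malark-O-Meter's actual code: the five categories, the 0-100 score weights, and the uniform prior (one pseudo-count per category) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five categories (Pants on Fire folded into False), assumed weights.
WEIGHTS = np.array([0, 25, 50, 75, 100])

def entropy_bits(p):
    """Shannon entropy of a category distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def ci_width(alpha, sims=200_000):
    """95% CI width of malarkey scores simulated from Dirichlet(alpha)."""
    scores = rng.dirichlet(alpha, size=sims) @ WEIGHTS
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return float(hi - lo)

# 50 statements plus one pseudo-count per category (uniform prior)
even = np.full(5, 50 / 5) + 1             # statements spread evenly
clumped = np.array([0, 0, 50, 0, 0]) + 1  # all 50 rated Half True

for alpha in (even, clumped):
    probs = alpha / alpha.sum()
    print(round(entropy_bits(probs), 2), round(ci_width(alpha), 1))
```

The even case has the maximum entropy for five categories (log2 of 5, about 2.3 bits) and the widest interval; clumping everything into Half True collapses both, which is the pattern the table above reports.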
Another interesting finding is that the confidence interval widths that we've explored follow a predictable pattern. Here's a bar plot of the comparative CI widths from the table above.
The confidence interval is widest in the equal probability case. From there, we see a u-shaped pattern, with the narrowest confidence intervals occurring when we oversample half true statements. The confidence intervals get wider for the cases when we oversample mostly true or mostly false statements, and wider still for the cases when we oversample true or false statements. The confidence interval widths are equivalent between the all true and all false cases, and the all mostly true and all mostly false cases.
What's going on? I don't really know yet. We'll have to wait for another day, and a more detailed analysis. I suspect it has something to do with how the malarkey score is calculated, which results in fewer malarkey score possibilities when the probabilities are more closely centered on half true statements.
Anyway, we're approaching a better understanding of how the selection bias among fact checkers can influence our comparative judgments of the factuality of politicians. Usefully, the same logic applies to the effects of fact checkers' rating biases in the absence of selection bias. You can expect Malark-O-Meter's honesty to continue. We're not here to prove any point that can't be proven. We're here to give an honest appraisal of how well we can compare the factuality of individuals using fact checker data. Stay tuned.
(UPDATE 2012-11-02: I made some changes to the prose to increase readability.)
Recently, I got into a slap fight with PolitiFactBias.com
(PFB). The self-proclaimed PolitiFact whistle blower bristled at my claim that my estimate of the partisan bias among two leading fact checkers is superior to theirs. A recurring theme in the debate surrounded PFB's finding that PolitiFact.com
's "Pants on Fire" category, which PolitiFact reserves for egregious statements, occurs much more often for Republicans than for Democrats. Because the "Pants on Fire" category is the most subjective of the categories in PolitiFact's Truth-O-Meter, PFB believes the comparison is evidence of PolitiFact's liberal bias.
I agree with PFB that the "Pants on Fire" category is highly subjective. That's why, when I calculate my factuality scores
, I treat the category the same as I treat the "False" category. Yet treating the two categories the same doesn't account for selection bias. Perhaps PolitiFact is more likely to choose ridiculous statements that Republicans make so that they can rate them as "Pants on Fire", rather than because Republicans tend to make ridiculous statements more often than Democrats.
One way to adjust for selection bias on ridiculous statements is to pretend that "Pants on Fire" rulings never happened. Presumably, the rest of the Truth-O-Meter categories are less susceptible to partisan bias in the selection and rating of statements. Therefore, the malarkey scores calculated from a report card excluding "Pants on Fire" statements might be a cleaner estimate of the factuality of an individual or group.
To examine the effect of excluding the "Pants on Fire" category on the comparison of malarkey scores between Republicans and Democrats, I used Malark-O-Meter's simulation methods to statistically compare the collated malarkey scores of Rymney and Obiden after excluding the "Pants on Fire" statements from the observed PolitiFact report cards. The collated malarkey score adds up the statements in each category across all the individuals in a certain group (such as a campaign ticket), and then calculates a malarkey score from the collated ticket. I examine the range of values of the modified comparison in which we have 95% statistical confidence. I chose the collated malarkey score comparison because it is one of the comparisons that my original analysis
was most certain about, and because the collated malarkey score is a summary measure of the falsehood in statements made collectively by a campaign ticket.
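The collation step, and the effect of dropping "Pants on Fire," can be sketched in a few lines. The counts below are hypothetical (not the real Rymney or Obiden report cards), and the 0-100 weights are my assumption, with "Pants on Fire" weighted like "False" to mirror how the two categories are treated here.

```python
import numpy as np

# Hypothetical Truth-O-Meter counts (NOT the real report cards):
# [True, Mostly True, Half True, Mostly False, False, Pants on Fire]
ticket = {
    "candidate_a": np.array([30, 40, 35, 20, 12, 2]),
    "candidate_b": np.array([8, 12, 15, 14, 10, 3]),
}

# Assumed 0-100 score weights; Pants on Fire weighted like False.
WEIGHTS = np.array([0, 25, 50, 75, 100, 100])

def collated_score(cards, drop_pants_on_fire=False):
    """Add category counts across the ticket, then compute one
    malarkey score from the collated card; optionally exclude the
    'Pants on Fire' column entirely."""
    total = sum(cards.values())
    if drop_pants_on_fire:
        total = total[:-1]
    return float(total @ WEIGHTS[: len(total)] / total.sum())

with_pof = collated_score(ticket)
without_pof = collated_score(ticket, drop_pants_on_fire=True)
print(round(with_pof, 1), round(without_pof, 1))  # prints 45.0 43.6
```

With "Pants on Fire" rulings making up only about 2.5% of this hypothetical collated card, excluding them shifts the score by little more than a point, which is the same qualitative result reported below for the real report cards.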
My original analysis suggested that Rymney spews 1.17 times more malarkey than Obiden (either that or fact checkers have 17% liberal bias
). Because we have a small sample of fact checked statements, however, we can only be 95% confident that the true comparison (or the true partisan bias) leads to the conclusion that Rymney spewed between 1.08 and 1.27 times more malarkey than Obiden. We can, however, be 99.99% certain that Rymney spewed more malarkey than Obiden, regardless of how much more.
After excluding the "Pants on Fire" category, you know what happens to the estimated difference between the two tickets and our degree of certainty in that difference? Not much
. The mean comparison drops to Rymney spewing 1.14 times more malarkey than Obiden (a difference of 0.03 times, whatever that means!). The 95% confidence intervals shift a smidge left to show Rymney spewing between 1.05 and 1.24 times more malarkey than Obiden (notice that the width of the confidence intervals does not change). The probability that Rymney spewed more malarkey than Obiden plunges
(sarcasm fully intended) to 99.87%. By the way, those decimals are probably meaningless for our purposes. Basically, we can be almost completely certain that Rymney's malarkey score is higher than Obiden's.
Why doesn't the comparison change that much after excluding the "Pants on Fire" rulings? There are two interacting, proximate reasons. First, the malarkey score is actually an average of malarkey scores calculated separately from the rulings of PolitiFact and The Fact Checker
at The Washington Post
. When I remove the "Pants on Fire" rulings from Truth-O-Meter report cards, it does nothing to The Fact Checker report cards or their associated malarkey scores.
Second, the number of "Pants on Fire" rulings is small compared to the number of other rulings. In fact, it is only 3% of the total sample of rulings across all four candidates, 2% of the Obiden collated report card, and 8% of the Rymney collated report card. So although Rymney has 4 times more "Pants on Fire" rulings than Obiden, it doesn't affect their malarkey scores from the Truth-O-Meter report cards much.
When you average one malarkey score that doesn't change all that much and another that doesn't change at all, the obvious result is that not much change happens.
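A toy calculation makes this concrete. The scoring scheme and report card counts below are invented stand-ins, not Malark-O-Meter's actual numbers: the overall score averages a Truth-O-Meter-style score with a Fact Checker-style score, "Pants on Fire" is weighted the same as "False" (as in the analysis above), and removing it from the first card barely moves the average.

```python
# Hypothetical malarkey score: 100x the weighted mean falsehood of the
# rulings, with weights True = 0 ... False = Pants on Fire = 1.
WEIGHTS = {
    "True": 0.0, "Mostly True": 0.25, "Half True": 0.5,
    "Mostly False": 0.75, "False": 1.0, "Pants on Fire": 1.0,
}

def score(card):
    """Score one report card given as {category: statement count}."""
    total = sum(card.values())
    return 100 * sum(WEIGHTS[c] * n for c, n in card.items()) / total

# Invented collated report cards: a PolitiFact-style card (which has a
# "Pants on Fire" category at ~8% of rulings) and a Fact Checker-style
# card (which has no such category).
truth_o_meter = {"True": 20, "Mostly True": 25, "Half True": 30,
                 "Mostly False": 20, "False": 25, "Pants on Fire": 10}
fact_checker = {"True": 8, "Mostly True": 10, "Half True": 12,
                "Mostly False": 9, "False": 6}

before = (score(truth_o_meter) + score(fact_checker)) / 2
no_pof = {c: n for c, n in truth_o_meter.items() if c != "Pants on Fire"}
after = (score(no_pof) + score(fact_checker)) / 2
print(round(before, 1), round(after, 1))
```

Dropping the small "Pants on Fire" category changes only one of the two averaged scores, and only a little, so this invented overall score moves by just a couple of points, which is the pattern described above.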
What does this mean for the argument that including "Pants on Fire" rulings muddies the waters, even if I treat them the same as "False" rulings? It means that the differences I measure aren't affected heavily by the "Pants on Fire" bias, if it exists. So I'm just going to keep including them. This finding also lends credence to my argument that, if you want to call foul on PolitiFact and other top fact checkers, you need to cry foul on the whole shebang, not just one type of subjective ruling.
If you want to cry foul on all of PolitiFact's rulings, you need to estimate the potential bias in all of their rulings. That's what I did a few days ago
, and it's something PolitiFactBias.com (PFB) hasn't done. I suggested a better way for them to fulfill their mission of exposing PolitiFact as liberally biased (a mission they've tried to downplay, but it clearly is theirs). Strangely, they don't want to take my advice. It's just as well, because my estimate of PolitiFact's bias (and theirs) can just as easily be interpreted as an estimate of true party differences.
The other day, I posted estimates of the potential partisan and centrist bias in fact checker rulings
My post was critical of fact checking critics as politically different as PolitiFactBias.com and Rachel Maddow.
On Sunday, PolitiFactBias.com posted what they call a "semi-smackdown" of my claim that they provide little quantitative evidence that PolitiFact has liberal bias.
I want to thank PolitiFactBias for engaging me in a rational debate. (I'm serious. This is good!) To show how grateful I am, I'm going to systematically tear their semi-smackdown to shreds. In the process, I will clear up points of confusion that PolitiFactBias.com (PFB.com) has about who I am and about Malark-O-Meter's methods.

1. "Our pseudonymous subject goes by 'Brash Equilibrium.'"
My name is Benjamin Chabot-Hanowell
. I prefer the Internet to know me as Brash Equilibrium, and I don't mind if people call me Brash in meatspace. The link between my true identity and my pseudonym is apparent on the Internet
because I value transparency. That said, yes, call me Brash, not Benjamin.

2. "Brash goes through the trouble of adding Kessler's Pinocchios together with PolitiFact's 'Truth-O-Meter' ratings..."
I don't add the two types of report card together. Doing so would bias the estimate heavily in favor of PolitiFact, which posts many times more rulings than Kessler and is harder on Republicans than Kessler is. Instead, I calculate a malarkey score from each fact checker's report card (or collated report card, or subset of statements) and then average the two scores for the same subset. Doing so gives the two fact checkers equal weight. I don't do this for my debate analyses because Kessler doesn't issue separate rulings for each statement made during the debates.

3. "...and then calculates confidence intervals for various sets of ratings, based on the apparent assumption that the selection of stories is essentially random."
My confidence intervals don't assume anything about the selection of stories. What they do assume is that fact checkers assemble a sample of statements from a population of statements, which results in sampling error. The population from which those statements are selected could be everything that an individual or group says. Or it could be the population of statements that are susceptible to whatever selection biases fact checkers have. Either way, the basic mechanics of calculating the confidence intervals are the same. The question lies in whether I have parameterized my sampling distribution properly. Basically, PFB.com is saying that I haven't.
But what would PFB.com have me do? Introduce a prior probability distribution on the concentration parameters of the Dirichlet that isn't equal to the counts in each category plus one? Where would my prior beliefs about those parameters come from? From PFB.com's allegations that PolitiFact cherrypicks liberal statements that are more likely to be true, whereas it cherrypicks conservative statements that are more likely to be false? Okay. What model should I use to characterize the strength of that bias, and its separate effects on conditional inclusion in each category?
We don't know what model we should use because no one has statistically analyzed fact checker rating bias or selection bias, and that is the point of my article. Until someone does, we can only estimate how much bias might
exist. To do this, we perform a thought experiment in which we assume that I am measuring fact checker bias instead of real differences among politicians. In doing so, I gave PFB.com two figures that it is free to use to support its argument that PolitiFact is biased (it'll also have to assert that Glenn Kessler is biased; look for PolitiFactAndTheFactCheckerBias.com soon!).
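For the mechanically inclined, here is roughly what "concentration parameters equal to the counts in each category plus one" means in practice. The counts and the 0-to-100 category weights below are illustrative, not the actual report cards:

```python
import numpy as np

rng = np.random.default_rng(0)

counts = np.array([40, 60, 80, 50, 70])          # True ... False (invented)
weights = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # illustrative falsehood weights

# Posterior over the category proportions: a Dirichlet whose
# concentration parameters are the observed counts plus one (a flat prior).
draws = rng.dirichlet(counts + 1, size=20000)

scores = 100 * draws @ weights                   # one malarkey score per draw
lo, hi = np.percentile(scores, [2.5, 97.5])      # 95% interval
print(round(float(lo), 1), round(float(hi), 1))
```

The 2.5th and 97.5th percentiles of the simulated scores give the 95% interval; a different prior would simply replace `counts + 1` with other concentration parameters, which is exactly the move PFB.com would need a bias model to justify.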
Meanwhile, I am free to use my findings to support my argument that the Republican ticket is less factual than the Democratic ticket. The truth probably lies somewhere among those two extremes and a third: that fact checkers have centrist bias, as partisan liberals allege. For now, we don't know exactly where the truth lies within that simplex of extremes. Although PFB.com's qualitative analysis suggests there might be some liberal bias, its authors rhetorically argue that there is a lot
of bias. They actually argue that it's all bias
! They present no statistical estimates of bias that cannot also be interpreted as statistical estimates of true differences.

4. "It's a waste of time calculating confidence intervals if the data set exhibits a significant degree of selection bias."
Item 3 soundly defended my methods against this criticism. In sum, it is not a waste of time. What is a waste of time? Assuming that you know how biased an organization is when you have no conclusive estimate of the strength of that bias whatsoever.

5. "Our case against PolitiFact is based on solid survey data showing a left-of-center ideological tendency among journalists, an extensive set of anecdotes showing mistakes that more often unfairly harm conservatives and our own study of PolitiFact's bias based on its ratings."
Survey data showing that journalists tend to be liberal doesn't automatically allow you to conclude that fact checker rulings are all bias. It doesn't give you an estimate of the strength of that bias if it exists. All it does is give one pause. And, yeah, it gives me pause, as I stated in my article when I conceded that there could be as much as 17% liberal bias in fact checker rulings!

6. "Our study does not have a significant selection bias problem."
I highly doubt that. PFB.com makes this assumption about its research, which relies heavily on blog entries in which it re-interprets a limited subset of PolitiFact rulings, and that makes me as suspicious of it as it is suspicious of PolitiFact.

7. "Brash's opinion of PolitiFact Bias consists of an assertion without any apparent basis in fact."
And I never claimed it did. That is, in fact, the whole point of my article. Similarly, however, PFB.com's rhetoric about the strength of PolitiFact's bias has little evidentiary support. At least I recognize the gaps in my knowledge! My methods, however, have much stronger scientific foundations than PFB.com's.

8. In response to one of my recommendations about how to do better fact checking, PFB.com writes, "How often have we said it? Lacking a control for selection bias, the aggregated ratings tell us about PolitiFact and The Fact Checker, not about the subjects whose statements they grade."
No. The aggregated ratings tell us about both the subjects whose statements they grade and about the raters. We don't know the relative importance of these two factors in determining the results. PFB.com thinks it does. Actually, so do I. Our opinions differ markedly. Neither opinion is based on a good estimate of how much bias there is among fact checkers.
Subjectively, however, I think it's pretty ridiculous to assume that it's all just bias. But I guess someday we'll see!

9. "We need fact checkers who know how to draw the line between fact and opinion."

Sorry, PFB.com, you're never going to get that. What we actually need is a statistical method to estimate the influence of political beliefs on the report cards of individuals assembled from the rulings of professional fact checkers, and then a statistical method to adjust for that bias.

10. "And critics who know enough to whistle a foul when "fact checkers" cross the line and conflate the two."
Yes. People like you and Rachel Maddow (strange bedfellows, to be sure!) are valuable whistle blowers. But your value isn't in estimating the strength of political bias among fact checkers.

UPDATE (same day): PFB.com and I fling more poo at one another here.
Earlier this month, Michael Scherer published an article called "Fact Checking and the False Equivalence Dilemma
" on Time
's Swampland blog. Scherer wrote the article in response to criticism of a cover story he wrote about the "factual deceptions" of Barry Obama and Willard Romney. Some readers accused him of false centrism.
Scherer's defense is that we cannot reliably compare the deceptiveness of individuals or groups, especially not based on fact checker rulings. He based his defense on comments by the leaders of the fact checking industry during a press conference that Scherer attended
. (In fact, the comments responded to a question that Scherer himself asked.)
As evidenced by my previous post on estimating partisan and centrist bias from fact checker report cards
, I sympathize with Scherer's defense against frothy-mouthed partisans who are convinced that the other side tells nothing but a bunch of stuff. Yet I disagree with him and the leaders of the fact checking industry that we cannot reliably compare fact checker rulings (notice I don't say deceptiveness) across politicians and political groups.
To make my point, I'll condense into a list what the fact checking industry leaders and Michael Scherer have said about what Scherer calls the "false equivalence dilemma" (but which should be called the "false comparison dilemma"). For each item in the list, I'll describe the issue, then explain why it's not that big of a deal.

1. "...it's self selective process," says Glenn Kessler from The Fact Checker at The Washington Post.
Kessler argues that fact checkers cherrypick the statements that they fact check. No, not out of centrist or partisan bias. In this case, Kessler's talking about a bias toward the timeliness and relevance of the statement. Kessler says that he decides what to fact check based on how much he thinks the fact check will educate the public about something important, like Medicare or health insurance reform. He shies away from mere slips of the tongue.
Wait a minute. If the only bias that fact checkers had were a bias toward fact checking timely and relevant remarks about policy, that would make Malark-O-Meter's comparisons more valid, not less. Far more concerning is the possibility that some fact checkers have a fairness bias. Which brings me to...

2. "...it would look like we were endorsing the other candidate," says Brooks Jackson of FactCheck.org.
This comment raises one non-issue against comparisons while implying another. Brooks argues that by demonstrating that one politician is more deceptive than another, FactCheck.org would open itself up to accusations of partisanship. From a publishing standpoint, this makes some sense, especially if your organization wants to maintain a nonpartisan reputation. Yet the ensuing controversy might cause the buzz about your organization to get louder. Just look what's happened with Nate Silver's political calculus this week. Or better yet, look what's happened to Internet searches for PolitiFact compared to factcheck.org over the last year. (Among frothy-mouthed right-wing partisans, PolitiFact is the poster child of the liberal fact checking establishment.)
Yet from the standpoint of informing the public (which is what we're trying to do, right?), who cares if you gain a false reputation of partisan bias? Many people already believe that the fact checking industry is biased, but at least as many people find it highly readable and refreshing. Perhaps that same demographic will find lucid, academically respectable factuality comparisons similarly refreshing.
Interestingly, Jackson's comment hints at the separate issue of centrist bias among today's top fact checkers. In the quest to avoid a partisan reputation, frothy-mouthed liberals allege, the fact checking industry is too fair-minded and falsely balanced (the same criticism leveled against Scherer's cover story in Time
).
I've already shown that we can use Malark-O-Meter's statistical methods to estimate the likely level of centrist bias
(assuming that one exists). In the same article, I made suggestions for how to estimate the actual level of centrist (and partisan) bias among professional fact checkers.
Furthermore, if what we're aiming at is a more informed public, why must we always shy away from ambiguity? Yes, Malark-O-Meter's measurements are a complex mix of true difference, bias, sampling error, and perceptual error. No, we don't know the relative weights of those influences. But that doesn't make the estimates useless. In fact, it makes them something for people to discuss in light of other evidence about the comparative factuality of political groups.

3. "Politicians in both parties will stretch the truth if it is in their political interest," says Glenn Kessler.
Glenn Kessler argues that comparing politicians is fruitless because all politicians lie. Well, I statistically compared the factuality of Obama, Biden, Romney, and Ryan
. While all of them appear about half factual, there are some statistically significant differences. I estimate that Rymney's statements are collectively nearly 20% more false than Obiden's statements (I also estimated our uncertainty in that judgment). So yes, both parties' candidates appear to stretch (or maybe just not know) the facts about half the time. But one of them most likely does it more than the other, and maybe that matters
.

4. "...not all deceptions are equally deceiving, and different people will reach different judgements about which is worse," says Michael Scherer.
Scherer goes on to pose several questions along these lines, then admits he doesn't know the answers. Neither do I, but I don't think the answers matter. What matters is the extent to which an individual's or group's policy recommendations and rhetoric adhere to the facts. That is why the fact checking industry exists. If those questions bother you, then the fact checking industry writ large should bother you, not just the comparison niche that Malark-O-Meter is carving out. Furthermore, since Kessler has already established that fact checkers tend to examine statements that would lead to instructive journalism, we can be confident that most rulings that we would compare are, roughly speaking, equally cogent.
Which brings me to the straw man of the false equivalence dilemma:

5. We can't read someone's mind.
Much of the fact checking industry leaders' commentary, and Michael Scherer's subsequent blog entry, assumed that what we're comparing is the deceptiveness (or conversely the truthfulness) of individuals or groups. This opened up the criticism that we can't read people's minds to determine if they are being deceptive. All we can do is rate the factuality of what they say. I agree with this statement so much that I discuss this issue in the section of my website about the caveats to the malarkey score and its analysis.
I contend, however, that when words come out of someone's mouth that we want to fact check, that person is probably trying to influence someone else's opinion. The degree to which people influence our opinion should
be highly positively correlated with the degree to which their statements are true. No, not true in the value-laden sense. True in the sense that matters to people like scientists and court judges. So I don't think it matters whether or not we can tell if someone is trying to be deceptive. What should matter is the soundness and validity of someone's arguments. The fact checking industry exists to facilitate such evaluations. Malark-O-Meter's comparisons facilitate similar evaluations at a higher level.
Lastly, I want to address one of Michael Scherer's remarks about a suggestion by political deceptiveness research pioneer, Kathleen Hall Jamieson, who works with Brooks Jackson at the Annenberg Public Policy Center, which runs FactCheck.org.
Three things. First, this is definitely a fine idea...if you want to measure the level of deception that moved voters. But what if you simply want to measure the average factuality of the statements that an individual or group makes? In that case, there is no need to weight fact check rulings by the size of their audience. In fact, by believing this measure is a measure of individual or group factuality (rather than a measure of the effects of an individual or group's statements), you would overestimate the factuality or falsehood of highly influential people relative to less influential people.
Second, most fact check rulings are of timely and relevant statements, and they are often a campaign's main talking points. So I would be interested to see what information all that extra work would add to a factuality score.
Third, while it is difficult to do in real time, it isn't impossible, especially not in pseudo real time. (Why do we have to do it in real time, anyway? Can't people wait a day? They already wait that long or more for most fact checker rulings! Moreover, didn't we once believe real-time fact checking was too difficult? And yet that's what PolitiFact did during the debates.)
Anyway, for any given campaign ad or speech or debate, there's usually a transcript. We often know the target audience. We can also estimate the size of the audience. Come up with a systematic way to put those pieces of information together, and it will become as straightforward as...well...fact checking!
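As a toy illustration of putting those pieces together, here is what audience weighting might look like. The rulings, falsehood values, and audience sizes below are all invented; the sketch also shows why (per the first point above) the weighted figure measures the reach of falsehoods rather than a speaker's average factuality:

```python
# Toy audience weighting: weight each statement's falsehood (0-1 scale)
# by the estimated audience that heard it. Everything here is invented
# for illustration; this is not an actual Malark-O-Meter metric.
rulings = [(0.25, 60.0),   # debate claim: mostly true, huge audience (millions)
           (1.00, 2.5),    # stump speech whopper: false, small audience
           (0.50, 15.0)]   # campaign ad: half true, medium audience

unweighted = 100 * sum(f for f, _ in rulings) / len(rulings)
weighted = 100 * sum(f * a for f, a in rulings) / sum(a for _, a in rulings)
print(round(unweighted, 1), round(weighted, 1))
```

In this made-up case the audience-weighted figure lands well below the unweighted average because the worst falsehood reached the fewest people: the weighting measures impact, not the speaker's average factuality.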
In sum, so long as fact checkers are doing their job fairly well (and I think they are) people like me can do our job (oh, but I wish it actually were my job!) fairly well. That said, there is much room for improvement and innovation. Stay tuned to Malark-O-Meter, where I hope some of that will happen.
Many accuse fact checkers like PolitiFact
and The Fact Checker
of bias. Most of these accusations come from the right, for which the most relevant example is politifactbias.com
. Conservatives don't focus as heavily on The Washington Post
's Fact Checker, perhaps because its rulings are apparently more centrist than PolitiFact's, and because PolitiFact's rulings apparently favor Democrats at least a little bit.
We can use Malark-O-Meter's recent analysis of the 2012 election candidates' factuality
to estimate the magnitude of liberal bias necessary to explain the differences observed between the two parties and
estimate our uncertainty in the size of that bias.
The simplest way to do this is to re-interpret my findings as measuring the average liberal bias of the two fact checkers, assuming that there is no difference between the two tickets. The appropriate comparison here is what I call the collated ticket malarkey, which sums all statements that the members of a ticket make in each category, then calculates the malarkey score
from the collated ticket. Using statistical simulation methods
, I've estimated the probability distribution of the ratio of the collated malarkey scores of Rymney to Obiden.
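Such a simulation can be sketched as follows, with invented collated counts and an illustrative 0-to-100 scoring scheme standing in for the actual report cards:

```python
import numpy as np

rng = np.random.default_rng(42)
weights = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # True ... False (illustrative)

rymney = np.array([50, 80, 100, 90, 80])   # hypothetical collated counts
obiden = np.array([70, 100, 110, 70, 50])

def score_draws(counts, n=20000):
    """Simulate malarkey scores: Dirichlet(counts + 1) proportions x weights."""
    return 100 * rng.dirichlet(counts + 1, size=n) @ weights

ratio = score_draws(rymney) / score_draws(obiden)
lo, hi = np.percentile(ratio, [2.5, 97.5])
print(round(float(ratio.mean()), 2), round(float(lo), 2), round(float(hi), 2))
print(float((ratio > 1).mean()))
```

The 2.5th and 97.5th percentiles of the simulated ratios give the 95% confidence interval, and the share of draws above one estimates the probability that one ticket out-malarkeys the other, which is how the confidence statements above were obtained.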
Here's a plot of that distribution with the 95% confidence intervals labeled on either side of the mean ratio. The white line lies at equal malarkey scores between the two tickets.
Interpreted as a true comparison of factuality, the probability distribution indicates that we can expect Rymney's statements to be, on average, 17% more full of malarkey than Obiden's, although we can only be 95% confident that the comparison lies somewhere between 8% and 27% more red than blue malarkey.
Interpreted as an indicator of the average bias of PolitiFact and The Fact Checker, the probability distribution suggests that, if the two tickets spew equal amounts of malarkey, then the fact checkers on average rate the Democratic ticket's statements as somewhere between 8% and 27% more truthful than the Republican ticket's statements.
I'm going to speak against my subjective beliefs as a bleeding heart liberal and say that amount of bias isn't all that unrealistic, even if the bias is entirely subconscious.
If instead we believed, like a moderate conservative, that the true comparison was reversed - that is, if we believed that Obiden spewed 17% more malarkey than Rymney - then the distribution suggests that the fact checkers' average bias is somewhere between 16% and 54% in favor of the Democrats, with a mean estimated bias of 34%.
It seems unrealistic to me that PolitiFact and The Fact Checker are on average that
biased against the Republican party, even subconsciously. So while I think it's likely that bias could inflate the difference between the Republicans and Democrats, I find it much less likely that bias has reversed the comparison between the two tickets. Of course, these beliefs are based on hunches. Unlike politifactbias.com's rhetoric and limited quantitative analysis, however, they are grounded in good estimates of the possible bias and of our uncertainty in it.
It isn't just conservatives who accuse PolitiFact and The Fact Checker of bias. Believe it or not, liberals do, too. Liberals accuse fact checkers of being too centrist in a supposedly misguided quest to appear fair. You can look to Rachel Maddow
as a representative of this camp. Maddow's accusations, like politifactbias.com's, typically nitpick a few choice rulings (which is funny, because a lot of critics on both sides accuse PolitiFact and The Fact Checker of cherrypicking).
Such accusations amount to the suggestion that fact checkers artificially shrink
the difference between the two parties, making the histogram that I showed above incorrectly hover close to a ratio of one. So how much centrist bias do the fact checkers have on average?
Well, let's assume for a moment that we don't know which party spews more malarkey. We just know that, as I've estimated, the fact checkers on average rule that one party spews somewhere between 1.08 and 1.27 times the malarkey that the other party spews. Now let's put on a Rachel Maddow wig or a Rush Limbaugh bald cap and fat suit to become true partisans who believe the other side is actually, say, 95% full of crap, while our side is only 5% full of crap. This belief leads to a ratio of 19 to 1 comparing the malarkey of the enemy to our preferred party. Already, it seems unrealistic. But let's continue.
Next, divide each bin in the histogram I showed above by 19, which is the "true" ratio according to the partisans. The result is a measure of the alleged centrist bias of the average fact checker (at least at PolitiFact or The Fact Checker). Get a load of the 95% confidence interval of this new distribution: it runs from about 6% to about 7%. That is, a partisan would conclude that PolitiFact and The Fact Checker are on average so centrist that their rulings shrink the difference between the two parties to a mere SIX PERCENT of what it "truly" is.
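That shrinkage figure is easy to check directly from the interval endpoints (rather than from the full simulated distribution):

```python
# Implied centrist "shrinkage": the observed ratio interval divided by
# the ratio a hypothetical partisan believes is true (95% vs. 5% crap).
true_ratio = 0.95 / 0.05            # 19-to-1, per the imagined partisan
observed_low, observed_high = 1.08, 1.27

shrink_low = 100 * observed_low / true_ratio
shrink_high = 100 * observed_high / true_ratio
print(round(shrink_low, 1), round(shrink_high, 1))  # roughly 6 to 7 percent
```

In other words, the partisan's own premise implies that fact checkers compress the "true" 19-to-1 difference down to about six or seven percent of itself.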
I don't know about you, but I find this accusation as hard to swallow as, if not harder than, the accusation that there is minor partisan bias among fact checkers.
Then again, my belief that fact checkers on average get it about right is entirely subjective. Given the data we currently have, it is not possible to tell how much partisan bias, centrist bias, honest mistakes, and honest fact checking each contribute to the differences that I have estimated.
So what is the way forward? How can we create a system of fact checking that is less susceptible to accusations of bias, whether partisan or centrist? Here are my suggestions, which will require a lot of investment and time.
- More fact checking organizations. We need more large-scale fact checking institutions that provide categorical rulings like The Fact Checker and PolitiFact. The more fact checker rulings we have access to, the more fact checker rulings we can analyze and combine into some (possibly weighted) average.
- More fact checkers. We need more fact checkers in each institution so that we can rate more statements. The more statements we can rate, the weaker selection bias will be because, after some point, you can't cherrypick anymore.
- Blind fact checkers. After the statements are collected, they should be passed to people who do not see who made the statement. While it will be possible for people to figure out who made some statements, particularly when they are egregious, and particularly when they are repeated by a specific party or individual, many statements that fact checkers examine can be stripped of information about the individuals or parties involved so that fact checkers can concentrate on the facts.
- Embrace the partisans and centrists. There should be at least one institution that employs professional fact checkers who are, according to some objective measure, at different points along the various political dimensions that political scientists usually measure. So long as they are professional fact checkers and not simply politically motivated hacks, let these obvious partisans and centrists subconsciously cherrypick, waffle, and misrule to their heart's content so that we can actually measure the amount of subconscious bias rather than make accusations based on scanty evidence and fact checker rulings that make our neck hairs bristle.
I hope that Malark-O-Meter will someday grow into an organization that can realize at least one of these recommendations.