Explaining the Figures in Latest State of Our Unions

12.12.2011, 1:20 PM

Recently, Professor Philip Cohen raised concerns about the methodology of the new State of Our Unions report. We are happy to respond to his criticisms.

One of Cohen’s critiques was that some of the figures show similar predicted probabilities for the different groups. Indeed, Cohen stated that they are alarmingly similar. This similarity arises from the way that we chose to treat non-significant dummy variables when calculating predicted probabilities from the logistic regression models.

We faced a (common) dilemma as we created figures from the regression models: should we create the figures by including the variables and intercepts with non-significant coefficients, or should we create them using only those variables and intercepts that are significant? The first approach yields figures that more closely correspond to reality but also depicts non-significant differences, potentially misleading readers. The second approach yields more technically correct figures but can cause the confusion that Dr. Cohen has noted: the figures do not display differences that may exist between the groups in the real world. We chose to create the figures under the assumption that any non-significant, non-zero coefficient was due to chance, and so we represented it with a 0 when constructing the figures. All significant coefficients (including control covariates, which were held at their means) were included in constructing the figures.
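To make the procedure concrete, here is a minimal sketch, using hypothetical coefficients and p-values (not the report's estimates), of how predicted probabilities can be computed while treating non-significant coefficients as zero:

```python
import math

def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predicted_probability(coefs, pvalues, values, alpha=0.05):
    """Predicted probability from a logistic model, zeroing any
    coefficient (including the intercept) whose p-value exceeds
    alpha, as described in the text. All three arguments are dicts
    keyed by term name; 'intercept' gets a value of 1."""
    linpred = sum(
        (coefs[t] if pvalues[t] < alpha else 0.0) * values[t]
        for t in coefs
    )
    return logistic(linpred)

# Hypothetical coefficients and p-values, purely for illustration:
coefs  = {"intercept": -0.20, "both_attend": 0.56, "age": 0.01}
pvals  = {"intercept": 0.30,  "both_attend": 0.01, "age": 0.40}
values = {"intercept": 1.0,   "both_attend": 1.0,  "age": 45.0}

# The non-significant intercept and control drop out, so the
# prediction uses only the significant attendance coefficient:
p = predicted_probability(coefs, pvals, values)  # logistic(0.56) ≈ 0.636
```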

This was a decision we made prior to estimating the predicted probabilities, so as to avoid biasing the results in one way or another. We made the same decision for all of the figures, and, as noted, when there were non-significant differences between groups, the figures end up looking the same. In the spirit of consistency, we chose to publish the figures anyway, in part to avoid leaving ourselves open to the criticism of “cooking” the figures, but also because we thought this was the most internally consistent way of presenting the data.

Dr. Cohen also critiqued the discrepancies in predicted probability ranges in the figures. This discrepancy stems from the same source. If none of the control variables were significant and the intercept was not significant, individuals with low levels of the independent variable would have a predicted probability of being “very happy” of 50%. The only differences would then occur for those with high levels of the independent variable, who would have a predicted probability below or above 50% depending on the sign of the coefficient. This accounts for the differences between Figures 13 and A1: the intercept and some of the control covariates were significant in the model for A1, but not in the model used to generate Figure 13. Again, we chose to stay consistent with our basic methodological approach in constructing our figures.
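The 50% baseline follows directly from the logistic function: a linear predictor of zero maps to a probability of exactly one half, and a significant coefficient moves the high-level group above or below that baseline according to its sign. A quick illustration (0.56 is an arbitrary coefficient chosen for demonstration, not a report estimate):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# With a non-significant intercept treated as zero, the low-level
# group sits at exactly 50%:
baseline = logistic(0.0)   # 0.5

# The high-level group lands above or below 50% depending on the
# sign of the (significant) coefficient:
above = logistic(0.56)     # ≈ 0.636 (positive coefficient)
below = logistic(-0.56)    # ≈ 0.364 (negative coefficient)
```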

Do these decisions render the work “apparently-shoddy”? We have never once had a reviewer comment negatively on this practice in peer-reviewed studies in which we have graphed figures in the same way. Thus, using only the statistically significant coefficients appears to be a defensible technique for generating figures from regression equations.

Finally, Figures 12 and 13 deserve special comment because of the way the independent variable was modeled. Our unit of analysis for religious attendance and belief was the married couple. We chose to classify couples based on the religious attendance/beliefs of both the husband and the wife, leading to four categories (neither attends, wife only attends, husband only attends, both attend). For several of the key outcomes, once we controlled for other variables, the differences among the first three of these groups were non-significant, and thus the calculated predicted probabilities were identical. For other outcomes, these groups did have statistically significant differences, as the figures demonstrate. Again, for this analysis we were modeling types of married couples based on the combination of their religious attendance/beliefs, and the differences between these typologies were of most interest to us.

As one of the commenters on Cohen’s blog notes, an alternate approach would be to treat husbands’ and wives’ religious attendance as separate main effects, with an interaction term to see if there is an added effect when both of them attend. We plan to perform these analyses as well, but they answer a slightly different, though still very important, question.

We thank Professor Cohen for his comments and criticisms. We hope this comment helps to better explain some of the reasoning behind our methodological approach to the report.

Jeffrey Dew and W. Bradford Wilcox

3 Responses to “Explaining the Figures in Latest State of Our Unions”

  1. For folks who want to read the post Brad is responding to, you can find it here, on Professor Cohen’s blog.

    There’s an interesting criticism from someone in the comments there:

    They note that, “If participants reported attending religious worship services at least once every week, they were coded as attending weekly. If they both reported that they jointly attended they were coded as both attending.” As such, the “true” impact of attending services is your individual effect plus the interaction effect. For women, this is .15 + .56. For men, it is -.12 + .56. Based on this (and assuming that the rest of their model is correct and that a better intercept, based on the other figures, is -.2 rather than 0), my ballpark estimate is something like a 45% chance of being “very happy” for those who don’t attend services, 50% for women who attend without their husbands, 42% for men who attend without their wives, 62% for wives who attend with their husbands, and 56% for husbands who attend with their wives. So people who do things with their spouses like their spouses more than people who do things without their spouses.

    Brad, do you know, is this person correct that there’s a significant difference in happiness between those who reported attending weekly services with their spouse versus those who attended without their spouse?
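The ballpark figures in the comment above can be checked directly with the logistic function, taking the commenter's stated coefficients and assumed intercept of -0.2 at face value:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients as quoted by the commenter: 0.15 (wife attends),
# -0.12 (husband attends), 0.56 (both attend), intercept assumed -0.2.
intercept, wife, husband, joint = -0.2, 0.15, -0.12, 0.56

neither      = logistic(intercept)                    # ≈ 0.45
wife_only    = logistic(intercept + wife)             # ≈ 0.49 (rounded to 50% above)
husband_only = logistic(intercept + husband)          # ≈ 0.42
wife_both    = logistic(intercept + wife + joint)     # ≈ 0.62
husband_both = logistic(intercept + husband + joint)  # ≈ 0.56
```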

  2. In any type of regression where dummy-coded variables are used to represent groups, one group (the “omitted” or “comparison” group) is compared to all the others. In the report, couples where neither spouse attended weekly were the omitted group. The only group that was significantly more likely to be very happy in their marriage than the “neither attends regularly” group was the group in which both spouses attended regularly.

    Even if there were differences in the likelihood of being very happy in marriage between couples where both spouses attend and those where only one spouse attends (as the commenter on Dr. Cohen’s blog suggests), our model could not test those differences. It could only test differences between the omitted group and the three other groups. And just because differences exist does not mean that they are statistically significant.

    ~Jeffrey Dew and W. Bradford Wilcox

  3. Neal says:

    I think I was confused about the variable coding. I read, “If participants reported attending religious worship services at least once every week, they were coded as attending weekly,” to mean that each spouse who reported weekly attendance was coded as attending weekly, regardless of what the other spouse reported. That is why I assumed that the impact of both going to church was the sum of the individual and the joint effects. But from the response above, it looks like this indicator was only used when a person attended without their spouse.

    I would also add that Wald or likelihood-ratio tests are useful for testing whether model parameters are equivalent or are jointly significant. As Dew and Wilcox point out, the significance tests are only in comparison to the excluded category, and so whether or not an individual variable is “significant” can depend on what the reference group is. Andrew Gelman has a nice article on this called “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant” link.
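As a sketch of the Wald approach for comparing two dummy coefficients directly (all numbers here are illustrative, not estimates from the report; the covariance of the two estimates is needed and most packages will report it on request):

```python
import math

def wald_equality_stat(b1, b2, se1, se2, cov12=0.0):
    """Wald chi-square statistic (1 df) for H0: b1 == b2.
    Var(b1 - b2) = Var(b1) + Var(b2) - 2*Cov(b1, b2)."""
    var_diff = se1**2 + se2**2 - 2.0 * cov12
    return (b1 - b2) ** 2 / var_diff

# Illustrative numbers: two dummy coefficients that may each differ
# from the reference group yet not differ from each other.
stat = wald_equality_stat(0.56, 0.15, se1=0.20, se2=0.22, cov12=0.01)

# Compare to the 5% chi-square critical value with 1 df (3.841):
significant = stat > 3.841  # False for these numbers
```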

    More generally, I think that most modern statistical packages can now produce predicted values along with their associated standard errors. For example, Stata (since version 11, I think) has the “margins” command. I have not seen one, however, that lets you use some variables in the model but not for predicting values. So if that is your preference, computing by hand is probably the way to go.