Could Someone Still Be Collecting a Civil War Widow's Pension? A Possibility Proof
20 February 2024 | 8:37 pm

In 1865, a 14-year-old boy becomes a Union soldier in the U.S. Civil War. In 1931, at age 80, he marries an 18-year-old woman, who continues to collect his Civil War pension after he dies. Today, in early 2024, she is 110 years old, still collecting that pension.

I was inspired to this thought by reflecting on some long-dead people my father knew, who survive in my memory through his stories. How far back might such second-hand memories go? Farther than one might initially suppose -- in principle, back to the 1860s. An elderly philosopher alive today might easily have second-hand memories of William James (d. 1910) or Nietzsche (d. 1900), maybe even Karl Marx (d. 1883) or John Stuart Mill (d. 1873).

Second-hand memories have a quality to them that third-hand memories and historical accounts lack. Through my father's and uncle's stories, I feel a kind of personal connection to Timothy Leary (d. 1996), B.F. Skinner (d. 1990), and Abraham Maslow (d. 1970), even though I never met them, in a way I don't to other scholars of the era. It hasn't been so long since their heyday in the 1950s - 1960s, when my father and his brother knew them -- but I might still have several decades in me. My son David, currently a Cognitive Science PhD student at Institut Jean Nicod at ENS in Paris, has also heard such stories, and he could potentially live to see the 22nd century. (My daughter Kate was too young when my father died to have made much of his academic stories.)

The idea that the U.S. might still be paying a Civil War widow's pension is not as ridiculous as it seems. According to one website, the last pension-receiving Union widow died in 2003; according to another, it was 2008. The last recipient of a Civil War children's benefit died from a hip injury in 2020.

[GPT-4-generated image: an elderly Civil War widow in a cityscape in 2020]


What Types of Argument Convince People to Donate to Charity? Empirical Evidence
16 February 2024 | 4:46 pm

Back in 2020, Fiery Cushman and I ran a contest to see if anyone could write a philosophical argument that convinced online research participants to donate a surprise bonus to charity at rates statistically above control. (Chris McVey, Josh May, and I had failed to write any successful arguments in some earlier attempts.) Contributions were not permitted to mention particular real people or events, couldn't be narratives, and couldn't include graphics or vivid descriptions. We wanted to see whether relatively dry philosophical arguments could move people to donate.

We received 90 submissions (mostly from professional philosophers, psychologists, and behavioral economists, but also from other Splintered Mind readers), and we selected 20 that we thought represented a diversity of the most promising arguments. The contest winner was an argument written by Matthew Lindauer and Peter Singer, highlighting that a donation of $25 can save a child in a developing country from going blind due to trachoma, then asking the reader to reflect on how much they would be willing to donate to save their own child from going blind. (Full text here.)

Kirstan Brodie, Jason Nemirow, Fiery, and I decided to follow up by testing all 90 submitted arguments to see what features were present in the most effective arguments. We coded the arguments according to whether, for example, they mentioned children, appealed to religion, or mentioned the reader's own assumed economic good fortune -- twenty different features in all. We recruited approximately 9000 participants. Each participant had a 10% chance of winning a surprise bonus of $10. They could either keep the whole $10 or donate some portion of it to one of six effective charities. Participants decided whether to donate, and how much, before knowing whether they were among the 10% receiving the $10.

Now, unfortunately, proper statistical analysis is complicated. Because we were working with whatever came in, we couldn't balance argument features, most arguments had multiple coded features, and the coded features tended to correlate across submissions. I'll share a more sophisticated analysis of the results later. Today I'll share a simpler analysis, which looks at the coded features one by one, comparing the average donation for the arguments with a given feature to the average donation for the arguments without it.

There is something to be said, I think, for simple analyses even when they aren't perfect: They tend to be easier to understand and to have fewer "researcher degrees of freedom" (and thus less opportunity for p-hacking). Ideally, simple and sophisticated statistical analyses go hand in hand, telling a unified story.

So, what argument features appear to be relatively more versus less effective in motivating charitable giving?

Here are our results, from highest to lowest difference in mean donation. "diff" is the dollar difference in mean donation, N is the number of participants who saw an argument with the feature, n is the number of arguments containing the feature, and p is the p-value from a two-sample t-test (without correction for multiple comparisons). All analyses are tentative, pending double-checking, skeptical examination, and possibly some remaining data clean-up.
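To make the procedure concrete, here's a minimal sketch in Python of this kind of one-feature-at-a-time comparison. The data frame, column names, and numbers are all illustrative stand-ins, not our actual dataset or variable names:

```python
# Sketch of the feature-by-feature comparison; toy data only.
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# One row per participant: donation amount (0-10 dollars) plus boolean
# flags for the coded features of the argument that participant read.
df = pd.DataFrame({
    "donation": rng.uniform(0, 10, 1000).round(2),
    "mentions_children": rng.random(1000) < 0.3,
    "appeals_to_equality": rng.random(1000) < 0.05,
})

rows = []
for feature in ["mentions_children", "appeals_to_equality"]:
    with_f = df.loc[df[feature], "donation"]
    without_f = df.loc[~df[feature], "donation"]
    _, p = ttest_ind(with_f, without_f)  # two-sample t-test, uncorrected
    rows.append({
        "feature": feature,
        "mean_with": round(float(with_f.mean()), 2),
        "mean_without": round(float(without_f.mean()), 2),
        "diff": round(float(with_f.mean() - without_f.mean()), 2),
        "N": len(with_f),  # participants who saw an argument with the feature
        "p": round(float(p), 3),
    })

# Report from highest to lowest difference, as in the list below.
print(pd.DataFrame(rows).sort_values("diff", ascending=False))
```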

Predictive Argument Features, Highest to Lowest

Does the argument appeal to the notion of equality?
$3.99 vs $3.39 (diff = $.60, N = 395, n = 4, p < .001)

... mention human evolutionary history?
$3.93 vs $3.39 (diff = $.55, N = 4940, n = 5, p < .001)

... specifically mention children?
$3.76 vs $3.26 (diff = $.49, N = 4940, n = 27, p < .001)

... mention a specific, concrete benefit to others that $10 or a similar amount would bring (e.g., 3 mosquito nets or a specific inexpensive medical treatment)?
$3.75 vs $3.34 (diff = $.41, N = 1718, n = 17, p < .001)

... appeal to the diminishing marginal utility of dollars kept by (rich) donors?
$3.69 vs $3.29 (diff = $.40, N = 2843, n = 27, p < .001)

... appeal to the massive marginal utility of dollars transferred to (poor) recipients?
$3.65 vs $3.25 (diff = $.40, N = 3758, n = 36, p < .001)

... mention, or ask the participant to bring to mind, a particular person who is physically or emotionally near to them?
$3.74 vs $3.34 (diff = $.34, N = 318, n = 3, p = .061)

... mention particular needs or hardships such as clean drinking water or blindness?
$3.56 vs $3.23 (diff = $.30, N = 4940, n = 49, p < .001)

... refer to the reader's own assumed economic good fortune?
$3.58 vs $3.31 (diff = $.27, N = 3544, n = 35, p < .001)

... focus on one single issue? (e.g. trachoma)
$3.61 vs $3.40 (diff = $.21, N = 800, n = 8, p = .07)

... remind people that giving something is better than nothing? (i.e. corrective for drop-in-the-bucket thinking)
$3.56 vs $3.40 (diff = $.15, N = 595, n = 6, p = .24)

... appeal to the views of experts (e.g. philosophers, psychologists)?
$3.47 vs $3.39 (diff = $.07, N = 2629, n = 27, p = .29)

... reference specific external sources such as news reports or empirical studies?
$3.47 vs $3.40 (diff = $.07, N = 1828, n = 18, p = .41)

... explicitly mention that donation is common?
$3.46 vs $3.41 (diff = $.05, N = 736, n = 7, p = .66)

... appeal to the notion of randomness/luck (e.g., nobody chose the country they were born in)?
$3.43 vs $3.41 (diff = $.02, N = 1403, n = 14, p = .80)

... mention religion?
$3.35 vs $3.42 (diff = -$.07, N = 905, n = 9, p = .48)

... appeal to veil-of-ignorance reasoning or other perspective-taking thought experiments?
$3.29 vs $3.43 (diff = -$.14, N = 4940, n = 8, p = .20)

... mention that giving could inspire others to give? (i.e. spark behavioral contagion)
$3.29 vs $3.43 (diff = -$.14, N = 896, n = 9, p = .20)

... explicitly mention and address specific counterarguments?
$3.29 vs $3.45 (diff = -$.15, N = 1829, n = 19, p = .048)

... appeal to the self-interest of the participant?
$3.22 vs $3.49 (diff = -$.30, N = 2604, n = 22, p < .001)

From this analysis, several argument features appear to be effective in increasing participant donations:

  • mentioning children and appealing to the equality of all people,
  • mentioning concrete benefits (one or several),
  • mentioning the reader's assumed economic good fortune and the relatively large impact of a relatively small sacrifice (the "margins" features), and
  • mentioning evolutionary history (e.g., theories that human beings evolved to care more about near others than distant others).
  • Mentioning a particular near person might also have been effective, but since only three arguments were coded in this category, statistical power was poor.

In contrast, appealing to the participant's self-interest (e.g., that donating will make them feel good) appears to have backfired. Mentioning and addressing counterarguments to donation (e.g., responding to concerns that donations are ineffective or wasted) might also have backfired.

Now, I don't think we should take these results wholly at face value. For example, only five of the ninety arguments appealed to evolutionary history, and all of those arguments included at least two other seemingly effective features: particular hardships, margins, or children. In multiple regression analyses and multi-level analyses that explore how the argument features cluster, it looks like particular hardships, children, and margins might be more robustly predictive -- more on that in a future post. ETA (Feb 19): Where a feature appears in fewer than ten arguments (n < 10), effects are unlikely to be statistically robust.

What if we combine argument features? There are various ways to do this, but the simplest is to give an argument one point for each of the ten largest-effect features it contains, then perform a linear regression of donation on that score. The resulting model has an intercept of $3.09 and a slope of $.13 per feature. Thus, the model predicts that participants who read arguments with none of these features will donate $3.09, while participants who read a hypothetical argument containing all ten features will donate $4.39.
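Schematically, the calculation looks like this (the data points are invented; only the comment at the end uses the fitted values reported above):

```python
# Toy version of the combined-features regression: score each argument
# by how many of the ten largest-effect features it contains, then
# regress donations on that score. All numbers below are invented.
import numpy as np

feature_counts = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mean_donations = np.array([3.1, 3.2, 3.3, 3.5, 3.6, 3.7,
                           3.9, 4.0, 4.1, 4.3, 4.4])

# np.polyfit with degree 1 returns [slope, intercept].
slope, intercept = np.polyfit(feature_counts, mean_donations, 1)
print(f"predicted donation = ${intercept:.2f} + ${slope:.2f} x feature count")

# With the post's actual fitted values, an argument containing all ten
# features predicts a donation of $3.09 + 10 * $0.13 = $4.39.
```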

Further analysis also suggests that piling up argument features is cumulative: Arguments with at least six of the effective features generated mean donations of $3.89 (vs. $3.37), those with at least seven generated mean donations of $4.46 (vs. $3.38), and the one argument with eight of the ten effective features generated a mean donation of $4.88 (vs. $3.40) (all p's < .001). This eight-feature argument was, in fact, the best-performing argument of the ninety. (However, caution is warranted concerning the estimated effect size for any particular argument: With only about 100 participants per argument and a standard deviation of about $3, the 95% confidence interval for the effect of an individual argument is about +/- $.50.)
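That caveat is just the usual standard-error arithmetic. A quick back-of-the-envelope check, using roughly the figures above (the $2.50 lower value is my addition, to bracket the rough estimate):

```python
# 95% CI half-width for a mean: 1.96 * sd / sqrt(n).
n = 100                       # approximate participants per argument
for sd in (2.5, 3.0):         # assumed spread of donations, in dollars
    half_width = 1.96 * sd / n ** 0.5
    print(f"sd = ${sd:.2f}: +/- ${half_width:.2f}")
# Prints +/- $0.49 to +/- $0.59 -- i.e., roughly +/- $.50.
```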

------------------------------------------------------

Last month, I articulated and defended the attractiveness of moral expansion through Mengzian extension. On my interpretation of the ancient Chinese philosopher Mengzi, expansion of one's moral perspective often (typically?) begins with noticing how you react to nearby cases -- whether physically nearby (a child in front of you, about to fall into a well) or relationally nearby (your close family members) -- and proceeds by noticing that remote cases (distant children, other people's parents) are similar in important respects.

None of the twenty coded features captured exactly that. ("Particular near person" was close, but neither necessary nor sufficient: not necessary, because the coders used a stringent standard for when an argument invoked a particular near person, and not sufficient, since invoking a particular near person is only the first step in Mengzian extension.) So I asked UCR graduate student Jordan Jackson, who studies Chinese philosophy and with whom I've discussed Mengzian extension, to read all 90 arguments and code them for whether they employed Mengzian-extension-style reasoning. He found six that did.

In accord with my hypothesis about the effectiveness of Mengzian extension, the six Mengzian-extension arguments outperformed the eighty-four that did not employ it:

$3.85 vs $3.38 (diff = $.47, N = 612, n = 6, p < .001)

Among those six arguments are both the original 2020 contest winner by Lindauer and Singer and the best-performing argument in the present study -- though, as noted earlier, the best-performing argument in the current study also had many other seemingly effective features.

In case you're curious, here's the full text of that argument, adapted by Alex Garinther, quoting extensively from one of the stimuli in Lindauer et al. (2020):

    HEAR ME OUT ON SOMETHING. The explanation below is a bit long, but I promise reading the next few paragraphs will change you.

    As you know, there are many children who live in conditions of severe poverty. As a result, their health, mental development, and even their lives are at risk from lack of safe water, basic health care, and healthy food. These children suffer from malnutrition, unsanitary living conditions, and are susceptible to a variety of diseases. Fortunately, effective aid agencies (like the Against Malaria Foundation) know how to handle these problems; the issue is their resources are limited.

    HERE'S A PHILOSOPHICAL ARGUMENT: Almost all of us think that we should save the life of a child in front of us who is at risk of dying (for example, a child drowning in a shallow pond) if we are able to do so. Most people also agree that all lives are of equal moral worth. The lives of faraway children are no less morally significant than the lives of children close to us, but nearby children exert a more powerful emotional influence. Why?

    SCIENTISTS HAVE A PLAUSIBLE ANSWER: We evolved in small groups in which people helped their neighbors and were suspicious of outsiders, who were often hostile. Today we still have these “Us versus Them” biases, even when outsiders pose no threat to us and could benefit enormously from our help. Our biological history may predispose us to ignore the suffering of faraway people, but we don't have to act that way.

    By taking money that we would otherwise spend on needless luxuries and donating it to an effective aid agency, we can have a big impact. We can provide safe water, basic health care, and healthy food to children living in severe poverty, saving lives and relieving suffering.

    Shouldn't we, then, use at least some of our extra money to help children in severe poverty? By doing so, we can help these children to realize their potential for a full life. Great progress has been made in recent years in addressing the problem of global poverty, but the problem isn't being solved fast enough. Through charitable giving, you can contribute towards more rapid progress in overcoming severe poverty.

    Even a donation of $5 can save a life by providing one mosquito net to a child in a malaria-prone area. FIVE DOLLARS could buy us a large cappuccino, and that same amount of money could be used to save a life.


Grade Inflation at UC Riverside, and Institutional Pressures for Easier Grading
9 February 2024 | 3:25 pm

Recent news reports have highlighted grade inflation at elite universities: Harvard gave 79% As in 2020-2021, as did Yale in 2022-2023 (up from 67% in 2010-2011). At Harvard, the average GPA has risen from 2.55 in 1950 to 3.05 in 1975 to 3.36 in 1995 to 3.80 now. At Brown, 67% of grades were As in 2020-2021, 10% Bs, and only 1% Cs. It's not just elite universities, however: Grades have risen sharply since at least the 1980s across a wide range of schools.

I decided to look at UC Riverside's grade distributions since 2013, as faculty now have access to a tool for viewing this information. (It would be nice to look back farther, but even the changes since 2013 are interesting.)

The following chart shows grade distributions quarter by quarter for the regular academic year, from 2013 through the present. The dark blue bars at the top are As, medium blue Bs, light blue Cs, and red is D, F, or W.

[Chart: UCR grade distributions by quarter, 2013-present]
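For readers viewing this without the image, here's a minimal sketch of how such a stacked-bar chart might be generated in Python. The percentages are hypothetical fill-ins, anchored loosely to the Fall 2013, Spring 2020, and Fall 2023 figures discussed below:

```python
# Sketch of the stacked-bar grade-distribution chart described above.
# Percentages are illustrative; the real data come from UCR's internal
# grade-distribution tool.
import matplotlib.pyplot as plt

quarters = ["F13", "W14", "S14", "S20", "F23"]  # illustrative subset
grade_pct = {  # hypothetical percentages per quarter (columns sum to 100)
    "DFW": [9, 9, 10, 5, 9],
    "C":   [24, 24, 23, 15, 16],
    "B":   [35, 34, 33, 25, 30],
    "A":   [32, 33, 34, 55, 45],
}
colors = {"A": "#08306b", "B": "#4292c6", "C": "#c6dbef", "DFW": "#cb181d"}

bottom = [0.0] * len(quarters)
for grade in ["DFW", "C", "B", "A"]:  # stack so the As end up on top
    plt.bar(quarters, grade_pct[grade], bottom=bottom,
            color=colors[grade], label=grade)
    bottom = [b + p for b, p in zip(bottom, grade_pct[grade])]

plt.ylabel("% of grades")
plt.legend()
plt.show()
```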

Three things are visually obvious from this graph:

  • First, there's a spike of high grades in Spring 2020 -- presumably due to the chaos of the early days of the pandemic.
  • Second, the percentage of As is higher in recent years than in earlier years.
  • Third, the percentage of DFWs has remained about the same across the period.

In Fall 2013, 32% of enrolled students received As. In Fall 2023, 45% did. (DFWs were 9% in both terms.)

One open question is whether the new normal of about 45% As reflects a general trend independent of the pandemic spike or whether the pandemic somehow created an enduring change. Another question is whether the higher percentage of As reflects easier grading or better performance. The term "inflation" suggests the former, but of course data of this sort don't by themselves distinguish between those possibilities.

The increase in the percentage of As is evident in both lower-division and upper-division classes: from 32% to 43% in lower division and from 33% to 49% in upper division.

How about UCR philosophy in particular? I'd like to think that my own department has consistent and rigorous standards. However, as the figure below shows, the trends in UCR philosophy are similar, with an increase from 26% As in Fall 2013 to 41% As in Fall 2023:

[Chart: UCR Philosophy grade distributions by quarter, 2013-present]

Lower division philosophy classes at UCR increased from 25% As in Fall 2013 to 40% As in Fall 2023, while upper division classes increased from 26% to 47% As.

Smoothing out quarter-by-quarter differences, here is the percentage of As, Fall 2013 - Spring 2014 vs. Winter 2023 - Fall 2023, for Philosophy and some other UCR disciplines for comparison:

Philosophy: 27% to 43% (28% to 42% lower, 25% to 46% upper)
English: 20% to 33% (15% to 28% lower, 38% to 64% upper)
History: 28% to 52% (23% to 52% lower, 48% to 52% upper)
Business: 28% to 46% (20% to 24% lower, 29% to 49% upper)
Psychology: 32% to 51% (33% to 51% lower, 31% to 51% upper)
Biology: 22% to 38% (28% to 36% lower, 17% to 41% upper)
Physics: 26% to 39% (26% to 37% lower, 40% to 41% upper)

As you can see, in some disciplines at some levels, the percentage of As has almost doubled over the ten-year period.

UCR is probably not unusual in the respects I have described. However, if other people have similar analyses for their own institutions, I'd be interested to hear about them, especially if the pattern is different.

I doubt, unfortunately, that students are actually performing that much better. UCR philosophy students in 2023 were not dramatically better at writing, critical thinking, and understanding historical material than were students in 2013. I conjecture that the main cause of grade inflation is institutional pressure toward easier grading.

I see two institutional pressures toward higher grades and more relaxed standards:

Teaching evaluations: Generally, students give better teaching evaluations to professors from whom they expect better grades.[1] Other things being equal, a professor who gives few As will get worse evaluations than one who gives many As. Since professors' teaching is often judged in large part on student evaluations, professors will tend to be institutionally rewarded for giving higher grades. Professors who are easier graders, if this fact is known among the student body, will also tend to get higher enrollments.

Graduation rates: At the institutional level, success is often evaluated in terms of graduation rates. If students fail to complete their degrees, or take longer than expected to do so because they are struggling with classes, this looks bad for the institution. Thus, there is institutional pressure toward lower standards to ensure high levels of student graduation and "success".

There are fewer countervailing institutional pressures toward greater rigor and more challenging grading schemes. If classes are insufficiently rigorous, a school might risk losing its WASC accreditation, but few well-established colleges and universities are at genuine risk of losing accreditation.

At some point, the grade "A" loses its strength as a signal of excellence. If over 50% of students are receiving As, then an A is consistent with average performance. Yes, for some inspiring teachers and some amazing student groups, average performance might be truly excellent! But that's not the typical scenario.

I have one positive suggestion for how to deal with grade inflation. But before I get to it, I want to mention one other striking phenomenon: the variation in grade distributions between terms for what is nominally the same course. For example, here is the distribution chart for one of the lower division classes in UCR's Philosophy Department:

[Chart: grade distributions by term for one lower-division UCR philosophy course]

The distribution ranges from 11% As in Fall 2014 to 72% As in Fall 2020.

Some departments at some universities have moved to standardized curricula and tests, so that the same class is taught and graded similarly each term. In philosophy, this is probably not the right approach, since different instructors can reasonably want to focus on different material, approached and graded differently. Still, that degree of term-by-term variation in what is nominally the same class raises issues of fairness to students.

My suggestion is: sunlight. Let course grade distributions be widely shared and known.

Sunlight won't solve everything -- far from it -- but I do think that seeing a professor's grade distribution provides valuable context for interpreting their teaching evaluations and might disincentivize cynical strategies of inflating grades for good evaluations. I've evaluated teaching for teaching awards, for visiting instructors, and for my own colleagues, and I'm struck by how rarely information about grade distributions is even supplied in the context of evaluating teaching. A full picture of a professor's teaching should include the range of grades they distribute and, ideally, random samples of the tests and assignments that earn As, Bs, and Cs. This would put us in a better position to celebrate the work of professors with high standards and the students in their classes who live up to those standards.

Similarly, grade distributions should be made available at the departmental and institutional level. In combination with other evidence -- again, ideally, random samples of assignments awarded A, B, and C -- this can help in evaluating the extent to which departments and institutions are holding students to high standards.

Student transcripts, too, might be better understood in the context of institutions' and departments' grading standards. This would allow viewers of a transcript to know whether a student's 3.7 GPA is a rare achievement in their institutional context or simply average performance.

--------------------------------------------------

[1] A recent study suggests that grade satisfaction, rather than grading leniency per se, might be the primary driver of the correlation between students' expected grades and their course evaluations -- the two can come apart when a student is satisfied with a grade they worked hard for -- but grading leniency is an instructor's easiest path to generating grade satisfaction, so the institutional pressure remains.


