Hands on with the Federalist Papers

Peter Ralph

2026-02-04

The Federalist Papers

From the Library of Congress:

The Federalist, commonly referred to as the Federalist Papers, is a series of 85 essays written by Alexander Hamilton, John Jay, and James Madison between October 1787 and May 1788. The essays were published anonymously, under the pen name “Publius,” in various New York state newspapers of the time.

[image of the cover of The Federalist, from Wikimedia]

A mystery

Of 85 essays,

  • five were written by John Jay
  • three were written by Alexander Hamilton and James Madison together
  • fifteen were written by James Madison
  • fifty-one were written by Alexander Hamilton
  • but eleven were claimed* by both Madison and Hamilton

\(^*\) in Hamilton’s case, in a list written two days before his death by duel.

Mosteller and Wallace (1963)

Table 2.1 of counts of ‘by’, ‘from’, and ‘to’, from Mosteller and Wallace

The results (roughly)

Alexander joins forces with James Madison and John Jay to write a series of essays defending the new United States Constitution, entitled The Federalist Papers. The plan was to write a total of twenty-five essays, the work divided evenly among the three men. In the end, they wrote eighty-five essays in the span of six months. John Jay got sick after writing five. James Madison wrote twenty-nine. Hamilton wrote the other fifty-one!

Miranda, 2015

Mosteller and Wallace took a very hands-on and guided approach, counting numbers of certain ‘filler’ words per block of 200 words:

The words in an article were typed one word per line on a long paper tape, like adding machine tape. Then with scissors the tape was cut into slips, one word per slip. … When the counting was going on, if someone opened the door, slips of paper would fly about the room.

Mosteller, 2010, quoted in Text As Data

An exploratory approach

We’ll start with some exploration. A knee-jerk response to this question (“who wrote these eleven papers”) might be: “hey let’s do dimension reduction and see if the authors cluster”.

What do you think about this?

  • If the authors do cluster, why do you think this would happen? (i.e., where’s the signal for this coming from?)
  • If the authors do cluster, then does this give a good answer to the question?
  • Are there ways that this might mislead us?
  • What’s a more direct way you might answer the question?

Let’s have a look

set-up

import json, re
import pandas as pd
import numpy as np
import plotnine as p9
import scipy
import sklearn.decomposition, sklearn.manifold

The data

You can get the data from this file: data/federalist.json. It is a text file, where each line is a JSON entry, containing: author, text, date, title, paper_id, and venue.

# Each line of the file is one JSON record (one paper).
with open("data/federalist.json", 'r') as f:
    text = [json.loads(line) for line in f]

# Collect the metadata into a data frame, plus a rough word count per paper.
info = pd.DataFrame(
    { k: [t[k] for t in text] for k in ['author', 'date', 'title', 'paper_id', 'venue']}
).assign(length = [len(t['text'].split(" ")) for t in text])
info
author date title paper_id venue length
0 HAMILTON NaN General Introduction 1 For the Independent Journal 1468
1 JAY NaN Concerning Dangers from Foreign Force and Infl... 2 For the Independent Journal 1513
2 JAY NaN The Same Subject Continued (Concerning Dangers... 3 For the Independent Journal 1310
3 JAY NaN The Same Subject Continued (Concerning Dangers... 4 For the Independent Journal 1480
4 JAY NaN The Same Subject Continued (Concerning Dangers... 5 For the Independent Journal 1214
... ... ... ... ... ... ...
80 HAMILTON NaN The Judiciary Continued, and the Distribution ... 81 From McLEAN's Edition, New York 3562
81 HAMILTON NaN The Judiciary Continued 82 From McLEAN's Edition, New York 1409
82 HAMILTON NaN The Judiciary Continued in Relation to Trial b... 83 From MCLEAN's Edition, New York 5306
83 HAMILTON NaN Certain General and Miscellaneous Objections t... 84 From McLEAN's Edition, New York 3816
84 HAMILTON NaN Concluding Remarks 85 From MCLEAN's Edition, New York 2464

85 rows × 6 columns
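As a quick sanity check on the metadata, counting papers per author label is a one-liner with groupby. (The author labels and lengths below are made-up stand-ins for illustration; check the actual values in the data.)

```python
import pandas as pd

# A tiny stand-in for the real `info` table; these rows are
# hypothetical examples, not the actual file contents.
info = pd.DataFrame({
    "author": ["HAMILTON", "JAY", "JAY", "MADISON", "HAMILTON"],
    "length": [1468, 1513, 1310, 1480, 1214],
})

# Papers per author label, with mean length.
summary = info.groupby("author").agg(
    n=("author", "size"), mean_length=("length", "mean")
)
print(summary)
```

Comparing these counts against the attribution list above is a good first check that the author labels look the way we expect.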

What’s the plan?

  1. look at, and clean the data
  2. do PCA on the word count matrix
  3. think about the results
  4. adjust how we’re doing the PCA and iterate

Cleaning:

The re.sub function will be useful.

def clean(t):
    return t
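For example, a minimal cleaner might lowercase the text and use re.sub to replace anything that isn't a letter or apostrophe with a space. (This is one sketch of a cleaning step; exactly what to strip is a judgment call.)

```python
import re

def clean(t):
    # Lowercase, then replace any run of characters that isn't a
    # letter or apostrophe with a single space.
    t = t.lower()
    t = re.sub(r"[^a-z']+", " ", t)
    return t.strip()

clean("To the People of the State of New-York:")
# -> "to the people of the state of new york"
```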

Counting:

Once it’s cleaned, let’s count:

from collections import Counter

# The vocabulary: every distinct word across all (cleaned) papers.
words = np.unique(" ".join([clean(t['text']) for t in text]).split(" "))

def tabwords(x, words):
    # Count how many times each vocabulary word occurs in x.
    d = Counter(x.split(" "))
    out = np.array([d[w] for w in words])
    return out

# The papers-by-words count matrix: one row per paper, one column per word.
wordmat = np.array([tabwords(clean(t['text']), words) for t in text])
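On a tiny made-up sentence (echoing Mosteller and Wallace's counts of 'by', 'from', and 'to'), tabwords works like this (redefined here so the snippet stands alone):

```python
from collections import Counter
import numpy as np

def tabwords(x, words):
    # Count how many times each word in `words` occurs in the string x.
    d = Counter(x.split(" "))
    return np.array([d[w] for w in words])

words = np.array(["by", "from", "to"])
print(tabwords("to be sent by post or to be sent by ship", words))
# -> [2 0 2]
```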

PCA

Let’s do it “by hand” with SVD:

x = wordmat

# svds returns (U, singular values, V^T), with the singular values
# in ascending order -- so reorder everything to descending.
pcs, evals, evecs = scipy.sparse.linalg.svds(x, k=4)
eord = np.argsort(evals)[::-1]
evals = evals[eord]
evecs = evecs[eord,:]
pcs = pcs[:,eord]

# One row per paper: its metadata plus its position on each PC.
pc_df = pd.concat([
    info,
    pd.DataFrame({f"PC{k+1}" : pcs[:,k] for k in range(pcs.shape[1])})
], axis=1)

# One row per word: its loading on each PC.
loadings = pd.DataFrame(evecs.T, columns=[f"PC{k+1}" for k in range(pcs.shape[1])], index=words)
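Since svds returns its k singular triplets in ascending order, the reordering step matters. A quick way to check that logic is against numpy's full dense SVD on a small random matrix (a standalone sketch, not part of the Federalist analysis):

```python
import numpy as np
import scipy.sparse.linalg

rng = np.random.default_rng(0)
x = rng.poisson(2.0, size=(20, 50)).astype(float)

u, s, vt = scipy.sparse.linalg.svds(x, k=4)
eord = np.argsort(s)[::-1]           # sort descending, as above
s, u, vt = s[eord], u[:, eord], vt[eord, :]

# The top four singular values should match the full dense SVD.
s_full = np.linalg.svd(x, compute_uv=False)
assert np.allclose(s, s_full[:4])
```

Note that `pcs` here holds the left singular vectors; the scores are unit-scaled, and multiplying by the singular values (`pcs * evals`) would put the PCs on their natural scale.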