01 · Hereditary lineage society · founded 1890

Daughters of the American Revolution

A complete admission record of the National Society Daughters of the American Revolution from its founding in 1890 through the eve of WWII — every member's name, birthplace, parents, husband, the qualifying Revolutionary War Patriot ancestor, and the multi-generation lineage chain proving that descent. Scraped from Ancestry collection 3174, which indexes the original 166-volume DAR-published Lineage Books set.

Members captured
129,719
unique records, single-pass detail scrape
Years covered
1890–1939
DAR founded 1890; vols 1–166
Patriot ancestors
~93,000
unique single-Patriot names
Source volumes
166
~1,000 members each
Data type
Pre-indexed
field-level, not OCR
Source
Ancestry 3174
no batch API · detail-page mode

Background

The National Society Daughters of the American Revolution (DAR) is a hereditary lineage society founded in 1890. Membership requires documenting unbroken descent from a person who provided "patriotic service" to the Continental cause during the American Revolution (1775–1783). Each successful applicant submits a multi-generation lineage chain back to that Patriot ancestor. From 1890 onward the DAR published these submissions as the Lineage Books: a 166-volume set, with roughly 1,000 members per volume, transcribing each application as a paragraph of names, dates, places, and a brief summary of the Patriot's service.

For an economic-history project on social capital, civic identity, and inherited status (Chyn & Haggag), the Lineage Books are valuable because they encode both ends of a 4–6 generation family chain: a 1900s/1910s/1920s American woman on one end, and a verified 18th-century Patriot ancestor on the other. The Patriot service descriptions also include years, units, and place of death — research-quality detail you would otherwise pull from the DAR Genealogical Research System (which we also have, separately, as 132,840 Patriot records).

Field-level coverage

129,719 members, admitted between 1890 and roughly 1939. Approximately 1,000 members per published volume. Coverage of the underlying schema:

Field | Coverage | Notes
member_name | 100.0% | Member's full name including title prefix
qualifying_ancestor | 91.4% | Patriot ancestor name(s); empty when Ancestry's parser failed on the Relative line
residence | 99.4% | Member's birthplace (the field is mis-labeled "residence" by the standard schema but the DAR Lineage Books actually transcribe birth place)
member_id | 98.5% | DAR national identification number, sequential by admission
lifespan_dates (derived) | 78.9% | All (YYYY-YYYY) spans found in the lineage narrative
service_years (derived) | 66.7% | Distinct years 1750–1799 mentioned in the Patriot service blurb
patriot_birth_year (derived) | 19.5% | Only when single-Patriot AND the name is followed by a parenthetical lifespan in Comments; a conservative attribution heuristic
patriot_death_year (derived) | 19.5% | Same as above

The DAR has had ~1.4 M members in its history and currently has ~185,000 active members. Our 130 K is the pre-WWII admission cohort — every person admitted while the Lineage Books series was being published.

02 · Every column in the table

What each row contains

Each row in the final dataset is one DAR member. Eight fields come straight from Ancestry's structured detail-page table; four more are derived in our Python pipeline by parsing the freeform Comments narrative for date patterns.

Field | Description | Example
Identity
member_name | Member's full name with title prefix. 79% Mrs., 20% Miss, 0.6% no prefix. | Mrs. Sarah Vacher Williams
member_id | DAR national number; sequential by admission order. Useful as a cross-reference key and as a proxy for admission date (low IDs = admitted earlier). Not guaranteed unique; prefer source_id as the primary key. | 17148
residence | Member's birthplace. Free-form transcription with inconsistent state spellings ("N. Y.", "New York", "NY"). Normalize before grouping. | Hudson County, New York.
Patriot ancestor
qualifying_ancestor | The Revolutionary War Patriot the membership is claimed through. ~14% of records list multiple Patriots ("X and of Y"). | Dr. John Francis Vacher, and of Capt. Robert Cochran.
Source & provenance
source_id | Ancestry record ID for this DAR entry. Stable; safe primary key. | 10741
source_url | Direct link to the Ancestry detail page. | …/3174/records/10741
society_slug | Standardized society identifier across the lineage-society datasets in this project. | national-society-daughters-of-the-american-revolution
raw_text_excerpt | The full Comments narrative (lineage chain + Patriot service description), truncated to ~400 chars for spreadsheet sanity. Use the JSONL for full text. | "gender=Female | spouse= | comments=…"
Date fields derived from Comments
patriot_birth_year | Patriot's birth year. Only set when the Relative names a single Patriot AND that name is followed by a (YYYY-YYYY) span within ~80 chars in the Comments narrative. Rejects matches that would imply a Patriot born after 1770. | 1751
patriot_death_year | Same attribution rule. | 1807
lifespan_dates | All (YYYY-YYYY) spans found in Comments, semicolon-delimited. May include the Patriot, his spouse, the great-grandparents, and the member's parents. Use for full lifespan recovery. | 1739-1829;1751-1807;1735-1824
service_years | Distinct years 1750–1799 mentioned in Comments, semicolon-delimited. Strongly concentrated in 1775–1783 (the war years); see the histogram in Distributions. | 1776;1777
Standard-schema columns left blank for this collection
admission_date | Not exposed by Ancestry for 3174. | (empty)
admission_year | Not exposed. Use member_id as a rough proxy (sequential).
source_year | Volume publication year. We do not currently maintain an ID-to-volume map, but the Internet Archive identifiers in the Sources tab let you reconstruct it.

03 · One real row, picked because it's rich

Example record — DAR ID 17148

One real record from our scrape, picked because it's unusually rich: a multi-Patriot member with a full 5-generation lineage chain and detailed military-service descriptions for both Patriots.

Mrs. Sarah Vacher Williams

DAR ID 17148 · source_id 10741 · vol 18, p 57 · published 1905
Birthplace: Hudson County, New York
Spouse: George Herbert Williams, M.D.
Father: John Van Vorst
Mother: Emily Harrimond Bacot
Patriot 1: Dr. John Francis Vacher (1751–1807)
Patriot 2: Capt. Robert Cochran (1735–1824)
Service years: 1776, 1777
Admission era: ~1905 (DAR ID 17148)

Daughter of John Van Vorst and Emily Harrimond Bacot, his wife. Granddaughter of John Van Vorst and Sarah Vacher, his wife; Peter Bacot and Mary Eugenia Cochran, his wife. Gr.-granddaughter of Cornelius Van Vorst and Anna Van Horne, his wife; John Francis Vacher and Sarah Potter, his wife; Col. Charles B. Cochran and Mary Thompson, his wife. Gr.-gr.-granddaughter of Robert Cochran and Mary Elliott (1739–1829), his wife.

John Francis Vacher (1751–1807) was appointed surgeon, 1777, and served through the war. He was an original member of the New York Society of the Cincinnati. He was born near Toulon, France, died in New York City, and was buried in St. Paul's churchyard.

Robert Cochran (1735–1824) served in the naval forces of South Carolina under Commodore Gillon. He was appointed, 1776, captain of the armed cruiser "Notre Dame," to procure materials of war and clothing for the army. In his cruises to France he captured many English vessels with valuable cargoes and gained much important information for the commander-in-chief. When Charleston surrendered he was captured, exiled to St. Augustine, Fla., and was a prisoner until the close of the war. For meritorious and important service he received the unanimous thanks of the South Carolina Legislature. He was born in Massachusetts and died in Charleston, S. C.

04 · Book → HTTP → JSONL → CSV

From source to row

What does it look like end-to-end — from a typeset book page to one row in our final CSV? The trail has four hops, all visible on the example record above (Mrs. Sarah Vacher Williams, DAR ID 17148).

Page 57 of DAR Lineage Book vol 18, showing the start of Mrs. Sarah Vacher Williams's entry
The source. Page 57 of Lineage Book of the National Society of the Daughters of the American Revolution, Volume XVIII (1905). Each member entry is one paragraph: the member's name, DAR national number, birthplace, husband, descent claim, multi-generation lineage chain, and a free-form description of the Patriot's military service. Internet Archive scan.
Page 58 of DAR Lineage Book vol 18, showing the rest of the Vacher Williams entry plus the next member
The source, continued. Page 58. The Vacher Williams entry spills onto this page with the Patriot biographies for John Francis Vacher and Robert Cochran. The next member (Mrs. Sophie Holland Boucher, DAR ID 17149) starts immediately below — typical layout, no record breaks within a page.

The pipeline below is what gets us from the typeset paragraph to one CSV row.

1. Ancestry's pre-built index gives us a recordId

Ancestry parsed the 166-volume set into a structured index in collection 3174, assigning each member a stable Ancestry recordId. Our scraper iterates recordId 1..150,000 sequentially. Member 17148 happens to live at:

GET https://www.ancestry.com/search/collections/3174/records/10741

Note that recordId (10741) is not the same as member_id (17148). The recordId is Ancestry's internal ID; the member_id is the DAR national number visible on the typeset page. The mapping is messy — recordIds are not contiguous (~13% are 404s with no underlying record) and the relationship between recordId and member_id is not monotone.

2. The detail page returns server-rendered HTML

Each detail page is ~150 KB of HTML. The 8 structured fields are encoded as <dt>Label</dt><dd>Value</dd> pairs in the page body. There is no JSON island, no batch endpoint for this collection — verified via Playwright network capture. Every record requires its own HTTP request:

→ HTTP 200, ~150 KB text/html

<dt>Source Name</dt><dd>Mrs. Sarah Vacher Williams</dd>
<dt>Gender</dt><dd>Female</dd>
<dt>Birth Place</dt><dd>Hudson County, New York.</dd>
<dt>Spouse</dt><dd>George Herbert Williams</dd>
<dt>Relative</dt><dd>Dr. John Francis Vacher, and of Capt. Robert Cochran.</dd>
<dt>Father</dt><dd>John Van Vorst</dd>
<dt>Mother</dt><dd>Emily Harrimond Bacot</dd>
<dt>Comments</dt><dd>Mrs. Sarah Vacher Williams.DAR ID Number: 17148; …</dd>
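
The label/value extraction can be sketched with the standard library alone. This is a minimal illustration, not the project's parser: the class name DtDdParser and the function parse_detail_page are hypothetical, and it assumes each <dt> is followed by exactly one <dd>, as in the excerpt above.

```python
from html.parser import HTMLParser


class DtDdParser(HTMLParser):
    """Collect <dt>Label</dt><dd>Value</dd> pairs into a dict."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {}
        self._tag = None    # "dt" or "dd" while inside one, else None
        self._label = None  # most recent <dt> text still waiting for its <dd>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <a>
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag == "dt":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._tag = None


def parse_detail_page(html):
    """Return {label: value} for every <dt>/<dd> pair found in html."""
    parser = DtDdParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Text inside nested elements (for example a link wrapped around a name) is kept, because only the closing </dt> or </dd> ends a field.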

3. We parse to JSONL with one immediate write per record

The parser pulls each <dt>/<dd> pair, normalizes whitespace, and pulls the DAR ID out of the Comments narrative via regex. One JSONL line per record, written immediately:

{
  "Name": "Mrs. Sarah Vacher Williams",
  "Gender": "Female",
  "Birth Place": "Hudson County, New York.",
  "Relative": "Dr. John Francis Vacher, and of Capt. Robert Cochran.",
  "Father": "John Van Vorst",
  "Mother": "Emily Harrimond Bacot",
  "Spouse": "George Herbert Williams",
  "Comments": "…full lineage chain + both Patriot biographies…",
  "DAR ID Number": "17148",
  "recordId": 10741,
  "_status": 200,
  "_url": "https://www.ancestry.com/search/collections/3174/records/10741"
}
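
A hedged sketch of the two mechanics in this step: pulling the DAR ID out of Comments and appending one flushed JSONL line per record. The regex is inferred from the "DAR ID Number: 17148;" text shown above, and the helper names (extract_dar_id, open_jsonl_for_append, write_record) are hypothetical rather than the project's own.

```python
import json
import os
import re

# Assumed pattern, based on the "DAR ID Number: 17148;" text in Comments.
DAR_ID_RE = re.compile(r"DAR ID Number:\s*(\d+)")


def extract_dar_id(comments):
    """Return the DAR national number as a string, or None if absent."""
    match = DAR_ID_RE.search(comments or "")
    return match.group(1) if match else None


def open_jsonl_for_append(path):
    """Open path for appending, first ensuring the file ends with a newline.

    If an earlier run was interrupted mid-write, the last line may be cut off
    without a newline; the guard keeps the next record from being glued to it.
    """
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as fh:
            fh.seek(-1, os.SEEK_END)
            needs_newline = fh.read(1) != b"\n"
    out = open(path, "a", encoding="utf-8")
    if needs_newline:
        out.write("\n")
        out.flush()
    return out


def write_record(out, record):
    """Write one record as a single JSON line and push it to disk at once."""
    out.write(json.dumps(record, ensure_ascii=False) + "\n")
    out.flush()
    os.fsync(out.fileno())
```

A truncated line left by an interrupted run stays in the file as its own (invalid) line, so a reader that skips unparseable lines loses at most that one record.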

4. JSONL → CSV adds derived date columns

The Comments narrative is the only place dates appear in this collection — no structured year field is exposed. Our converter (05_to_csv.py) post-processes Comments with regex to surface four useful date columns (patriot_birth_year, patriot_death_year, lifespan_dates, service_years). The Vacher Williams record yields:

patriot_birth_year:  (blank — multi-Patriot)
patriot_death_year:  (blank — multi-Patriot)
lifespan_dates:      1739-1829;1751-1807;1735-1824
service_years:       1776;1777

For multi-Patriot records like this one, the per-Patriot birth/death-year columns are deliberately left empty — there is no reliable way to attribute a parenthetical span like (1751-1807) to one particular Patriot when the Relative field names two of them. The full set of spans is preserved in lifespan_dates so a researcher can refine downstream.
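
The derivation rules can be sketched as follows. This is an illustration under stated assumptions, not the contents of 05_to_csv.py: all names are hypothetical, the title list is illustrative, multi-Patriot detection only handles the "X and of Y" form, and years inside parenthetical spans are excluded from service_years (inferred from the example, where 1751 does not appear among the service years).

```python
import re

# (YYYY-YYYY) with either a hyphen or an en dash between the years.
SPAN_RE = re.compile(r"\((\d{4})\s*[-\u2013]\s*(\d{4})\)")
SERVICE_YEAR_RE = re.compile(r"\b(17[5-9]\d)\b")  # 1750-1799
# Illustrative, incomplete list of title prefixes.
TITLE_RE = re.compile(r"^(?:(?:Dr|Capt|Col|Maj|Gen|Lieut|Rev|Hon)\.?\s+)+")


def lifespan_dates(comments):
    """All (YYYY-YYYY) spans, in order of appearance, semicolon-delimited."""
    return ";".join(f"{b}-{d}" for b, d in SPAN_RE.findall(comments))


def service_years(comments):
    """Distinct years 1750-1799 outside parenthetical spans, sorted."""
    text = SPAN_RE.sub(" ", comments)
    years = {int(y) for y in SERVICE_YEAR_RE.findall(text)}
    return ";".join(str(y) for y in sorted(years))


def split_patriots(relative):
    """Split a Relative value on the 'and of' connector only."""
    parts = re.split(r",?\s+and of\s+", relative.strip().rstrip("."))
    return [p.strip() for p in parts if p.strip()]


def patriot_years(relative, comments, window=80, max_birth=1770):
    """Return (birth, death) for a single-Patriot record, else None."""
    patriots = split_patriots(relative)
    if len(patriots) != 1:
        return None  # multi-Patriot: leave both columns blank
    name = TITLE_RE.sub("", patriots[0])
    if not name:
        return None
    for match in re.finditer(re.escape(name), comments):
        span = SPAN_RE.search(comments[match.end():match.end() + window])
        if span:
            birth, death = int(span.group(1)), int(span.group(2))
            if birth <= max_birth:  # reject implausibly late births
                return birth, death
    return None
```

Leaving the attribution empty whenever more than one Patriot is named trades coverage for precision, which matches the conservative rule described above.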

05 · Geography · service years · birth decades · top patriots

Distributions across the corpus

Four views of the data: where members were born, when their Patriot ancestors served, when those Patriots were born, and which Patriot names recur most. Together they sketch the demographic shape of the collection.

Where members were born

Top 20 birthplaces of DAR members, after normalizing state-name conventions ("N. Y.", "New York", "NY" all map to NY):

NY: 18,471
MA: 10,166
PA: 10,164
IL: 8,272
OH: 8,069
CT: 6,137
MO: 4,911
IA: 4,799
IN: 4,478
MI: 3,832
GA: 3,815
ME: 3,574
WI: 2,999
KY: 2,968
VT: 2,947
NH: 2,916
VA: 2,622
NJ: 2,439
SC: 2,421
TN: 2,356

Top six states (NY, MA, PA, IL, OH, CT) account for ~47% of all members. The Northeast plus the early Midwest (IL, OH, IN, MI, IA) dominate. Southern colonial states appear in moderate counts: GA, VA, and SC make the top 20, while NC does not. The South's lower share reflects both Confederate-era disruption to state-society organization and the demographic facts of who the granddaughters and great-granddaughters of Patriots actually were when the DAR was admitting members in the 1890s–1910s.
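
A minimal sketch of the state normalization behind these counts, assuming the state is the last comma-separated part of the birthplace. The alias table is illustrative and deliberately incomplete, and normalize_state and STATE_ALIASES are hypothetical names, not the project's code.

```python
import re
from collections import Counter

# Illustrative aliases only; a real table would cover every state and
# the abbreviations that actually occur in the transcriptions.
STATE_ALIASES = {
    "new york": "NY", "n y": "NY", "ny": "NY",
    "massachusetts": "MA", "mass": "MA", "ma": "MA",
    "pennsylvania": "PA", "penn": "PA", "pa": "PA",
    "illinois": "IL", "ill": "IL", "il": "IL",
}


def normalize_state(birthplace):
    """Map a free-form birthplace to a two-letter code, or None if unknown."""
    if not birthplace:
        return None
    last_part = birthplace.split(",")[-1]
    # Lowercase, turn periods and other punctuation into spaces, collapse runs.
    key = " ".join(re.sub(r"[^a-z ]", " ", last_part.lower()).split())
    return STATE_ALIASES.get(key)


def state_counts(birthplaces):
    """Count members per normalized state; unmapped values go under None."""
    return Counter(normalize_state(b) for b in birthplaces)
```

Keeping unmapped values under None rather than dropping them makes it easy to see how much of the column the alias table still fails to cover.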

Patriot service years

Each Patriot biography in the Comments narrative typically names one or more years: the year of commission, the year of capture, the year of regimental service. We extracted every distinct year 1750–1799 from the Comments column. The histogram is what an American history student would predict: a low 1750–1774 baseline, then a sharp spike in 1775–1783 (the war), then a gradual drop-off through the 1790s.

1750: 2,399
1755: 1,935
1760: 2,855
1765: 2,152
1770: 2,570
1771: 1,922
1772: 2,276
1773: 2,168
1774: 3,338
1775: 14,615
1776: 20,716
1777: 18,153
1778: 11,500
1779: 8,327
1780: 9,898
1781: 8,138
1782: 4,915
1783: 3,907
1785: 3,013
1790: 2,960
1795: 2,241
1799: 2,183

Selected years; full histogram in the first-look JSON. The 1776 peak (20,716 mentions) is roughly 14% above the second-most-common year (1777, with 18,153). The pre-1775 mentions are mostly birth-year-of-grandparents or "served in French and Indian War" references; the post-1783 tail is "received pension in 1791," "moved to Ohio in 1795," etc.
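
For readers rebuilding the histogram from the CSV, a small sketch of the tally, assuming the service_years column holds semicolon-delimited years as described in the column table. The function name year_histogram is hypothetical.

```python
from collections import Counter


def year_histogram(service_year_cells):
    """Count how many records mention each year.

    Each cell is a semicolon-delimited string such as "1776;1777";
    empty or missing cells contribute nothing.
    """
    counts = Counter()
    for cell in service_year_cells:
        for token in (cell or "").split(";"):
            token = token.strip()
            if token.isdigit():
                counts[int(token)] += 1
    return counts
```

Because each cell already lists distinct years, a record contributes at most one count per year.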

Patriot birth decades

For the 25,329 records (19.5% of the corpus) where we could confidently attribute a single Patriot's birth/death years, here is the distribution of birth decades. The shape is exactly what you would expect for a Revolutionary War cohort: a sharp peak in the 1750s (15-25 years old at the start of the war), with the 1740s and 1760s as the secondary peaks.

1700s: 20
1710s: 224
1720s: 1,249
1730s: 3,671
1740s: 5,930
1750s: 9,508
1760s: 4,635
1770s: 55

The thin 1700s, 1710s, 1720s tail captures the older officer corps (people who were ~50+ years old at war start). The handful of 1770s births are typically drummer-boys or late-war militia.
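
The decade binning and the age reasoning above reduce to integer arithmetic. A small sketch, with hypothetical function names and 1775 taken as the start of the war:

```python
from collections import Counter

WAR_START = 1775  # first year of the war, per the 1775-1783 range used here


def birth_decade(year):
    """Label a birth year with its decade, e.g. 1751 -> '1750s'."""
    return f"{year // 10 * 10}s"


def age_at_war_start(birth_year):
    """Approximate age in 1775 (ignores month and day of birth)."""
    return WAR_START - birth_year


def decade_counts(birth_years):
    """Tally birth decades, skipping records with no attributed year."""
    return Counter(birth_decade(y) for y in birth_years if y is not None)
```

For the example record's first Patriot, born 1751, this gives the 1750s bin and an age of about 24 at the start of the war.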

Most-claimed Patriots

Top 20 single-Patriot names by descendant count, plus the #21 entry discussed below. Common 18th-century names dominate (John, William, James), and many of these "John Brown"s are different people across multiple states. A future research pass could disambiguate via the residence + service-description blob.

Rank | Patriot name | Descendant count
1 | John Hart | 83
2 | John Brown | 69
3 | John Williams | 56
3 | John Davis | 56
3 | James Moore | 56
6 | Peter Norton | 53
7 | John Reed | 52
8 | James Smith | 51
9 | James Williams | 48
10 | William White | 46
10 | John White | 46
10 | William Brown | 46
10 | William Smith | 46
14 | John Thompson | 45
14 | John Smith | 45
14 | William Henshaw | 45
17 | Thomas Marshall | 44
18 | Samuel Smith | 43
18 | John Allen | 43
20 | John Harris | 42
21 | Samuel Adams | 41

Samuel Adams (the Boston Tea Party Patriot) appears at #21 — but this is almost certainly a mix of the Samuel Adams plus other unrelated Samuel Adamses. Name-collision is the dominant story in this leaderboard.
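
A sketch of how a tie-aware leaderboard like this one can be produced from a list of already-normalized single-Patriot names. The function name ranked is hypothetical; the ranking rule (tied names share a rank and the next rank skips ahead) is read off the table above.

```python
from collections import Counter


def ranked(names, top=20):
    """Return (rank, name, count) rows with shared ranks for ties.

    Tied names get the rank of the first name in their group, and the
    following rank is the 1-based position in the sorted list.
    """
    rows = []
    ordered = Counter(names).most_common()
    for position, (name, count) in enumerate(ordered, start=1):
        if rows and rows[-1][2] == count:
            rank = rows[-1][0]  # same count as previous row: share its rank
        else:
            rank = position
        if rank > top:
            break
        rows.append((rank, name, count))
    return rows
```

The counts are per name string, so this step does nothing to separate different men who share a name.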

06 · ~6 days · 1 worker · Cloudflare-aware

How we collected it

Unlike most Ancestry collections, 3174 has no batch endpoint — the imageviewer JSON API that gave us 80× speedups on collections 2204 (SAR Applications) and 2221 (Great Registers) returns empty results for 3174. The collection is index-only: there are no scanned images, only typeset book transcriptions. Verification was definitive — three independent tests:

  • Every 3174 record's "report-issue" link encodes imageId="" and indexOnly=true. Compare to 2204, where the same link encodes a real imageId like 32596_242028-00010.
  • Calling /imageviewer/api/record/index-panel-data?dbId=3174&imageId=X returns HTTP 200 with {"records": []} for every imageId tried — empty string, raw recordIds, even known-good 2204 imageIds.
  • A Playwright headful capture of every XHR fired by Ancestry's own UI on a 3174 detail page shows no batch endpoint. The image-viewer sidebar that powers 2204's batch path doesn't exist for 3174 because the collection has no images.

So the only path is one HTTP request per record, fetching the ~150 KB detail HTML and parsing the <dt>/<dd> fields. To stay under Cloudflare's bot threshold:

  • 1 worker (vs 3 for image-backed collections — detail HTML is more aggressively rate-limited)
  • 0.3–0.6 s jitter per request via polite_get
  • Connection pool reset every 100 fetches to defeat per-connection bot scoring (the __cf_bm cookie, TLS session tickets, HTTP/2 stream history). Cost per reset is one fresh TLS handshake, ~100 ms, basically free.
  • 24-hour cookie freshness guard — the scraper auto-exits cleanly when cookies pass that age, before Cloudflare's clearance token can drift far enough to trigger a soft block. A macOS launchd daemon watches the local cookies.json and rsyncs to UCLA on every refresh; the running scraper picks up new cookies at the next session reset (~5–7 min) without restart.
  • Append-only JSONL with immediate flush, plus a defensive trailing-newline guard at writer open-time — defends against losing records to a SIGTERM mid-write.

Sustained rate: ~0.27 records/sec (~16 records/min, ~970/hour). Total wall-clock for 150,000 recordIds: ~6 days, with one ~5-min cookie-refresh interruption per day.
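
The pacing rules above can be sketched as a single-worker loop. This is an illustration only: crawl, CookiesExpired, and the injected callables (fetch, reset_pool, cookie_age_s) are hypothetical stand-ins, and the real polite_get and session handling are not shown in this document.

```python
import random
import time


class CookiesExpired(Exception):
    """Raised so the caller can exit cleanly and wait for fresh cookies."""


def crawl(record_ids, fetch, reset_pool, cookie_age_s, *,
          reset_every=100, max_cookie_age_s=24 * 3600,
          jitter=(0.3, 0.6), sleep=time.sleep):
    """Yield (record_id, status, body), one request at a time.

    fetch(record_id) -> (status, body) performs the HTTP request.
    reset_pool() drops and rebuilds the connection pool.
    cookie_age_s() -> seconds since the cookies were last refreshed.
    """
    for n, record_id in enumerate(record_ids, start=1):
        if cookie_age_s() > max_cookie_age_s:
            raise CookiesExpired("cookies are past the freshness limit")
        status, body = fetch(record_id)
        yield record_id, status, body  # 404s are yielded too, not retried
        if n % reset_every == 0:
            reset_pool()
        sleep(random.uniform(*jitter))  # 0.3-0.6 s pause between requests
```

Passing the network and clock functions in as arguments keeps the pacing logic testable without touching Ancestry.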

07 · 4 done · 2 future · 7 caveats

Status & caveats

Phase status

Phase | State | Outcome / notes
1. Endpoint probe | done | Detail-HTML mode confirmed for 3174. Imageviewer batch endpoint definitively dead via 3-test verification.
2. Detail-page scrape (UCLA) | done | ~6 days at 0.27 r/s, 1 worker, 7 cookie-refresh cycles. 0 soft blocks, 0 IP cooldowns, 6 clean cookie-guard exits.
3. JSONL → CSV | done | 129,719 rows; 4 derived date columns added.
4. First-look analysis | done | Distributions, missing-value patterns, attribution coverage. See the Distributions tab.
5. ID-to-volume map | future | For approximating admission_year from member_id. Each volume's title page records its publication year (e.g., vol 18 = 1905). Reconstruct via the IA scans listed in Sources.
6. Patriot disambiguation | future | "John Brown" appears 69 times. Within-state + service-description hashing should split most ambiguous names. Useful for cross-walk to DAR Genealogical Research System (132,840 Patriot records already collected).
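
Until phase 5 exists, a rough stand-in can be sketched from two facts stated in this document: volumes hold roughly 1,000 members each, and vol 18 was published in 1905. Everything else here is an assumption: the function names are hypothetical, exact volume boundaries are unknown, and a publication year is only a proxy for the admission year.

```python
MEMBERS_PER_VOLUME = 1000  # "roughly 1,000 members per volume"
LAST_VOLUME = 166

# Only the vol 18 = 1905 pairing is given in this document; the rest
# would have to be filled in from the volume title pages.
VOLUME_YEAR = {18: 1905}


def approx_volume(member_id, per_volume=MEMBERS_PER_VOLUME):
    """Estimate the volume for a DAR national number, or None if implausible."""
    volume = (member_id - 1) // per_volume + 1
    return volume if 1 <= volume <= LAST_VOLUME else None


def approx_publication_year(member_id, volume_year=VOLUME_YEAR):
    """Look up the estimated volume's publication year, if known."""
    return volume_year.get(approx_volume(member_id))
```

For the example record, member_id 17148 lands in volume 18, which agrees with the printed source. Bogus 7-digit IDs fall outside volumes 1–166 and return None.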

Caveats

  • No structured dates. Ancestry's parser for collection 3174 captures Name, Gender, Birth Place, Father, Mother, Spouse, Relative, and a long Comments narrative — but no birth year, death year, marriage year, or admission year as discrete fields. All dates here are extracted from Comments via regex; treat them as approximate and verify on the source for any individual case.
  • Birth-place column is a free-form transcription. The same state appears as "N. Y.", "New York", "NY". The Distributions tab presents canonicalized counts; the raw CSV preserves Ancestry's spelling. Some entries give a county or city plus state ("Hudson County, New York"); others are just state ("Illinois."). Strip trailing periods before parsing.
  • Multi-Patriot membership. 14% of records list multiple Patriot ancestors ("X and of Y", "X, Y, and Z"). For these, our heuristic intentionally leaves patriot_birth_year / patriot_death_year empty rather than guessing which Patriot a given (YYYY-YYYY) span belongs to. The raw spans are preserved in lifespan_dates.
  • member_id is mostly sequential, but not perfectly so. 1,172 records share an ID with at least one other record — a mix of multi-volume reprints, parser hiccups on Comments where the regex pulled the wrong digits, and a small tail of bogus 7-digit values. For analysis purposes, consider source_id (Ancestry recordId) the safer key and member_id the human-meaningful but imperfect cross-reference.
  • Patriot-name leaderboard is name-collision-heavy. "John Brown" appears 69 times — these are mostly different John Browns from different states, not 69 descendants of one man. State + service-description hashing would be required to disambiguate before treating any leaderboard row as an actual Patriot.
  • Approximately 9% of records have no Patriot recorded. Ancestry's parser failed on ~11,000 entries, typically because the original typeset paragraph used unusual punctuation that broke their column extraction. The Comments narrative for these records is intact; a researcher could extract the Relative manually if needed.
  • Books are typeset, not scanned-then-OCR'd. Unlike microfilm-based Ancestry collections, the underlying source for 3174 is the originally printed and indexed DAR Lineage Book set. Field values come from typeset text, so there are no OCR errors and no handwriting transcription noise; the parser failures noted above are a separate issue. The trade-off: no microfilm-style image to inspect on dispute. Use the Internet Archive scans (Sources tab) for visual verification.
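
The member_id caveat above suggests a quick audit before any join. A minimal sketch, assuming rows are dicts with member_id and source_id keys as in the column table; the function name and the six-digit cutoff for "bogus" values are assumptions.

```python
from collections import defaultdict


def audit_member_ids(rows, max_digits=6):
    """Find member_ids shared by several records and implausible values.

    Returns (shared, suspicious): shared maps each duplicated member_id to
    the source_ids carrying it; suspicious lists source_ids whose member_id
    is non-numeric or longer than max_digits.
    """
    by_id = defaultdict(list)
    suspicious = []
    for row in rows:
        member_id = (row.get("member_id") or "").strip()
        if not member_id:
            continue  # blank IDs are a coverage issue, not a collision
        by_id[member_id].append(row["source_id"])
        if not member_id.isdigit() or len(member_id) > max_digits:
            suspicious.append(row["source_id"])
    shared = {mid: ids for mid, ids in by_id.items() if len(ids) > 1}
    return shared, suspicious
```

Joining on source_id and carrying member_id along as an attribute sidesteps both problems.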

08 · Books · archives · cross-walks

Sources & provenance

  • National Society Daughters of the American Revolution. Lineage Book of the National Society of the Daughters of the American Revolution. Vols 1–166, published 1890–1939.
  • Ancestry.com collection 3174, U.S., Daughters of the American Revolution Lineage Books, 1890-1939. Pre-indexed; no batch endpoint.
  • Internet Archive — lineagebookNNdaug series: complete scans of vols 1–67, full-text OCR available. Sample identifier: lineagebook1817daug (vol 18, 1905; pictured in Source → row). Search prefix: archive.org → DAR Lineage Books.
  • Internet Archive — lineagebookNNrevogoog series: 28 Google-Books-sourced scans, vols 1–28+.
  • Internet Archive — lineagebooknatioNNNdaug series: 8 later-volume scans (vols 28, 31, 33, 34, 35, 48, 134, 136).
  • DAR Genealogical Research System (Patriot side, separately scraped in this project): 132,840 records covering ~110K verified Patriots and their service descriptions. Cross-walks to the Patriot names in this collection; see data/lineage_societies/dar_grs/extracted/members.csv.