A workshop at the University of Guelph May 24-25 2010
OVC LifeLong Learning Centre Rm 1713
Record linkage at the Minnesota Population Center
Ron Goeken, Lap Huynh, Tom Lenius, and Rebecca Vick (Minnesota Population Center)
This paper presents an overview of the methods used to link various samples of the United States Population Censuses to a complete-count database of the 1880 United States population. Topics include name standardization, the construction of similarity scores, and the use of support vector machines (SVMs) to classify linked records. We discuss our preliminary data release and subsequent work on our final release, including the construction of name commonness scores and birth density measures and their impact on the final linked data. We also present a number of indirect measures assessing the accuracy of our linked data and discuss the construction of weights to deal with linkage rate differentials.
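As a rough illustration of what name standardization and a similarity score can look like, the sketch below uses the generic string-matching ratio from Python's standard `difflib` module. This is an assumption made for illustration only: the abstract does not say which standardization rules or similarity measure the authors use, and the function names `standardize` and `name_similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def standardize(name):
    """Crude standardization (an assumption): lowercase, letters only."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def name_similarity(a, b):
    """Return a similarity score in [0, 1] between two standardized names.

    Uses difflib's generic matching ratio as a stand-in for whatever
    name-similarity measure a real linkage project would choose.
    """
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


if __name__ == "__main__":
    for left, right in [("Smith", "SMITH "), ("Smith", "Smyth"), ("Smith", "Jones")]:
        print(left, right, round(name_similarity(left, right), 2))
```

Scores like these would typically be one input among several (age, birthplace, and so on) to a classifier, rather than a linkage decision on their own.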
An Automated Record Linkage System – Linking 1871 Canadian Census to 1881 Canadian Census
Luiza Antonie (U of Guelph), Peter Baskerville (U of Alberta), Kris Inwood (U of Guelph) and Andrew Ross (U of Guelph)
This paper describes a recently developed linkage system for historical Canadian censuses and its application for tracking people from 1871 to 1881. The record linkage system incorporates a supervised learning module for classifying pairs of records as matches and non-matches. The classification module was trained using a set of true links that was created by experts. We evaluate the first results and provide a road map for further experimentation.
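The idea of training a classifier on expert-created true links can be sketched as follows. The abstract does not name the learning algorithm, so a simple perceptron is used here as a stand-in. The feature layout (a name-similarity score and an age-agreement flag) and the toy training pairs are hypothetical and are not taken from the 1871-1881 data.

```python
def train_perceptron(pairs, labels, max_epochs=1000):
    """Learn a linear rule separating matches from non-matches.

    pairs  -- list of feature vectors, one per candidate record pair
    labels -- list of booleans (True = expert-confirmed match)
    """
    weights = [0.0] * len(pairs[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(pairs, labels):
            target = 1 if label else -1
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            if target * activation <= 0:  # misclassified: nudge the boundary
                weights = [w + target * x for w, x in zip(weights, features)]
                bias += target
                mistakes += 1
        if mistakes == 0:  # every training pair is classified correctly
            break
    return weights, bias


def is_match(weights, bias, features):
    """Classify one candidate pair with the learned linear rule."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


# Hypothetical training data: (name similarity, age agrees within tolerance)
TRAIN_PAIRS = [(0.95, 1.0), (0.90, 1.0), (0.88, 0.0), (1.00, 1.0),
               (0.30, 0.0), (0.40, 1.0), (0.20, 0.0), (0.50, 0.0)]
TRAIN_LABELS = [True, True, True, True, False, False, False, False]

if __name__ == "__main__":
    w, b = train_perceptron(TRAIN_PAIRS, TRAIN_LABELS)
    print([is_match(w, b, pair) for pair in TRAIN_PAIRS])
```

In practice the quality of such a module depends heavily on the size and representativeness of the set of true links, which is why hand-linked training data matter.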
Using family lineage data to improve record linkage success
David Barss (Family Search)
This paper demonstrates how using lineage-linked family data from census records expands the data that can be gathered from the census and thereby improves record linkage opportunities and success when merging census data with other census years or other record collections. It shows results from using lineage-linked data samples and versions of the same data that are not lineage linked. It also shows several family relationships that can be preserved from the census but are lost using the household perspective, as well as extended relationships that can be captured with “derived records” generated as placeholders to connect identified family members.
“To Fill Dishonored Graves”: Assembling life course data for transported British convicts
Hamish Maxwell-Stewart (University of Tasmania)
Between 1803 and 1853, 69,000 male and 13,500 female convicts were transported to the British penal colony of Van Diemen’s Land, later renamed Tasmania. While the documentation for these individuals is highly detailed, information about death was only occasionally recorded on each convict’s file. Our aim is to fill this gap by linking with other classes of records, including the surgeons’ reports for the voyage to Australia and the civil death registers for Van Diemen’s Land/Tasmania. As a result, we have been able to build up a detailed picture of death rates during the voyage to the Antipodes and the initial years in the colony while the convicts were still under sentence. We are now attempting to extend this picture by exploring death rates for former convicts. This process has raised a number of interesting issues, and I will outline a range of approaches we are exploring in an attempt to address them.
What accounts for the movement of rural household heads in Logan Township?
Peter Baskerville (University of Alberta)
This paper provides a first report on a project which has linked/traced residents of Logan township (pop: 3196) in 1871 to an unusually large catchment area: the whole of Canada in 1881 and the United States in 1880. The linkage was done by hand in accordance with rigorous rules to provide a set of true links, both for use in testing and establishing a computer-generated linkage program and to further a project which focuses on credit and community in Perth County, Ontario in the late 19th and early 20th centuries. The paper focuses on household heads in 1871. Of the 521 heads for whom we have no death information, we linked 415 (79.7%) to the Canadian or US census in 1881 and 1880 respectively. We could not link 106 of the 521 heads. Two hundred and ninety-nine (72%) of the linked heads stayed in Logan and 116 (28%) moved. Through a series of logistic regressions this paper seeks to establish the personal, familial, and environmental attributes that most influenced the probability of Loganites persisting or moving in the 1870s and compares the situations of those who moved to a new region with those who persisted in Logan.
To combine Swedish historical data with modern population registers
Elisabeth Engberg (Umeå University), Maria Larsson (Umeå University) and Maria Wisselgren (Umeå University)
The Demographic Data Base (DDB) started out as a temporary employment project in the early 1970s. The aim of the organization was to computerize parish registers to make them available for research. Today the DDB is a national research resource, responsible for ensuring that historical data from parish registers and parish statistics are easily available to researchers in Sweden and in other countries. Since the 1970s the DDB has digitized parish registers and constructed one of the largest historical population databases in Europe, based on church records from the 18th and 19th centuries. The individual historical database currently contains information about more than one million people, has a depth of about ten generations, and includes around eighty parishes. The database is available for research and has been used by researchers in the social sciences and humanities as well as in medicine and science, both in Sweden and internationally. However, interest in and demand for population data from the 20th century have increased, and the question has been raised about the possibility of linking historical data with modern population registers. In Sweden there is a lack of digitized individual-level population data for the period from 1900 to the 1950s, the point from which Statistics Sweden holds digitized data. In order to meet the present needs within several fields of research, a new infrastructure is being developed by the DDB in close cooperation with Statistics Sweden. In this presentation we will talk about the preparatory work behind this new infrastructure. The different stages of linkage will be in focus, as well as the methods of secure linkage that have been developed by the DDB.
Reconstructing the history of morbidity: the Hampshire Friendly Society and its records
Martin Gorsky (London School of Hygiene and Tropical Medicine), Aravinda Guntupalli (University of Southampton), Bernard Harris (University of Southampton), Andrew Hinde (University of Southampton)
During the last two decades, economic, social and demographic historians have achieved significant advances in our understanding of the history of health and disease. However, the majority of these studies have been concerned either with the history of ‘positive health indicators’, such as height, or with the history of mortality. It has proved much harder to reconstruct the history of non-fatal illnesses, despite the valiant efforts of researchers such as James Riley, George Alter, Herb Emery and John Murray.
In addition to this, there has also been a growing interest in the use of historical records to understand what is often termed ‘lifecourse epidemiology’. Much of this work was stimulated by David Barker’s research into the impact of early-life experiences on adult development and mortality. However, other researchers, such as George Davey Smith, have argued strongly that we also need to take account of the impact of ‘insults’ across the lifecourse in our efforts to understand mortality at higher ages.
The records maintained by the Hampshire Friendly Society offer an opportunity to address both of these issues. The Society recorded information about the sickness episodes experienced by its members from 1825 onwards. In May 2007 we were awarded a grant by the UK Economic and Social Research Council which enabled us to construct a database containing information about 5552 individuals who joined the Society between 1825 and 1939 and experienced sickness episodes between 1825 and 1981. We are currently in the process of analysing these data, and hope to be able to present new results showing how sickness patterns changed over the course of these men’s lives and between different membership cohorts. We also hope to be able to present new results which illustrate the relationship between the sickness episodes experienced in earlier life and later-life mortality.
Parsing data from several sources in the Netherlands with LINKS
Kees Mandemakers (Historical Sample of the Netherlands)
LINKS stands for LINKing System for historical family reconstruction. In the first instance the project aims at reconstructing all nineteenth and early twentieth century families in the Netherlands. This reconstruction will be based on GENLIAS, which is a digitized index of all civil certificates from this period. In the second instance the system will be enlarged with other sources such as church registers, address books, tax registers and other large nominal administrative sources. With the church registers we will attempt to link 18th century material into families as well and make a connection with the 19th century material, especially the death certificates.
LINKS has formulated three requirements for successful reconstruction and dissemination: a) a dynamic parser which converts the input from the sources into a standardized data structure, b) nominal record linkage procedures with self-learning capacities and c) a retrieval system including GIS references and visualization procedures. For a schematic overview, see the scheme below. Also important is the feedback given to the archives on all kinds of errors and inconsistencies that we find in the sources delivered by the archives themselves. In my presentation I will concentrate on the parsing part of the project. By parsing we mean converting and standardizing the data from all kinds of sources into a universal format suited to the linking process, as efficiently as possible. I will elaborate on the different sources and the way we handle and standardize the data before starting the matching process. Because a lot of data are hidden in fields called ‘miscellaneous’, considerable effort is put into systematically scanning, separating and retrieving data from these fields. Preparing the data for the matching procedures means that quite a lot of redundancy is stored in the database called LINKS_cleaned. In fact, LINKS_cleaned is not one database but a system of five interconnected but separate database systems. In my presentation I will explain this system, sketch its several parts and go into the operational results.
Did Railroads Induce or Follow Economic Growth? Urbanization and Population Growth in the American Midwest, 1850-60
Jeremy Atack (Vanderbilt University), Fred Bateman (University of Georgia), Michael Haines (Colgate University) and Robert Margo (Boston University)
Using a newly developed geographic information system transportation database, we study the impact of gaining access to rail transportation on changes in population density and the rate of urbanization between 1850 and 1860 in the American Midwest. Differences-in-differences and instrumental variable analysis of a balanced panel of 278 counties reveals only a small positive effect of rail access on population density but a large positive impact on urbanization as measured by the fraction of people living in incorporated areas of 2,500 or more. Our estimates imply that one-half or more of the growth in urbanization in the Midwest in the late antebellum period may be attributable to the spread of the rail network.
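For readers unfamiliar with the differences-in-differences idea mentioned in this abstract, a minimal sketch follows. It shows only the basic two-group, two-period comparison, not the authors' panel regressions or their instrumental variable analysis. The county figures are invented purely for illustration, and the function name `diff_in_diff` is hypothetical.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Invented urban shares (percent of county population) for illustration only.
GAINED_RAIL_1850 = [10.0, 12.0]
GAINED_RAIL_1860 = [15.0, 17.0]
NO_RAIL_1850 = [8.0, 10.0]
NO_RAIL_1860 = [9.0, 11.0]

if __name__ == "__main__":
    effect = diff_in_diff(GAINED_RAIL_1850, GAINED_RAIL_1860,
                          NO_RAIL_1850, NO_RAIL_1860)
    print("Estimated effect of rail access (percentage points):", effect)
```

The comparison attributes to rail access only the part of the change in counties that gained rail that exceeds the change in counties that did not, under the assumption that both groups would otherwise have followed the same trend.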
Community Trees: From concept to publication
Ray Madsen (Family Search)
A longitudinal perspective on the French-English stature divide
John Cranfield (University of Guelph), Kris Inwood (University of Guelph) and Asher Kirk-Elleker (University of Guelph)
In this paper we bring together medical examinations during World War One with household information for 9000 soldiers located in the 1901 census. The linked data provide evidence of intergenerational occupational mobility on a scale sufficient to caution against the use of own-occupation as a proxy for socio-economic status during childhood. The two measures of socio-economic status, own-occupation and that of household head during childhood, lead to analysis of height differentials indicating that the Quebec-born and labourers were especially short and that stature for all groups declined more or less continuously during the last third of the 19th century. The 1901 census linkage adds new information that confirms being francophone is the principal source of smaller stature in Quebec. The intergenerational perspective adds useful detail and sharpens our impressions of significant differences in physical well-being by region and occupational class and of a decline at the end of the 19th century.
How did teacher recruitment and teacher career paths change as school provision became centralized? The case of Victorian Britain
David Mitch (University of Maryland)
On the accession of Victoria to the British throne, much of elementary schooling was operated on a private, adventure basis. By the time of her death in 1901, elementary schooling was largely state funded with an extensive system of rules, inspection and localized educational authorities in place to supervise its operation. One recurring issue in the history of Victorian education has been whether Victorian elementary school teachers were increasingly recruited from those with middle class parentage as standards of teacher qualification became more rigorous. Another important question is what happened to the turnover of teachers and the related point of their duration in the profession as schools were subject to funding according to examination performance. This paper will address these questions by linking those listing teacher-related occupations in British censuses between 1841 and 1901.
Between family and household: the linkage of civil records and census data, a pilot project on Quebec City, 1851-1911
Marc St-Hilaire (University of Laval) and Hélène Vézina (Université du Québec à Chicoutimi)
One of the most hazardous linkages is the one between a single individual living with his family at one census and the young, recently married one living with his own family in the next census. This issue is critical for inter-generational studies as well as for individual biographies. There are not many ways to overcome the problem: either the linkage relies on the individual name and surname only (and marginally – and in a risky way – on the age, the occupation, and the place of abode), or other sources are used that give additional clues to strengthen the link. This proposal aims to present the results of using marriage records to help with this kind of problematic linkage. The case population is that of Quebec City, from 1851 to 1911, for which the nominal census data have been entered by the Population et histoire sociale de la ville de Québec (PHSVQ) project. The linkage involves one cohort per census (10-year-old boys), which is linked to subsequent censuses. The paper presents the results of the linkage using 1) only census data and 2) marriage records (both Catholic and non-Catholic), showing how the use of a second source enhances the overall outcome.
Reconstituting the population of Antigonish, Nova Scotia using probabilistic record linking
Sue Dintelman (Pleiades Software), Tim Maness (Pleiades Software) and David Barss (Family Search)
Reconstituting populations is valuable for many reasons. Historians and demographers reconstitute populations to do longitudinal population studies of such things as migration and birth rates. Geneticists use populations to study inheritance mechanisms and to help locate specific genes. Genealogists are interested in tracing ancestry back through time and finding context for their ancestors. Reconstituting populations from vital statistics records such as births, deaths and marriages and from census records is attractive, but until recently, projects of any size required a large investment. Today, with available software and low-cost hardware, smaller organizations and even individuals can undertake population reconstitution projects. This paper outlines the specific steps used to merge birth and marriage records from Antigonish, Nova Scotia into a genealogy.
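The title of this paper refers to probabilistic record linking. One widely used formulation is the Fellegi-Sunter model, in which each compared field contributes an agreement or disagreement weight. The abstract does not say which model the authors use, so the sketch below is a generic illustration. The field names, the m- and u-probabilities, and the example records are all hypothetical.

```python
import math

# Hypothetical probabilities per field:
#   m = P(field agrees | records refer to the same person)
#   u = P(field agrees | records refer to different people)
FIELD_PROBS = {
    "surname": (0.90, 0.10),
    "given_name": (0.85, 0.20),
    "birth_year": (0.80, 0.05),
}


def field_weight(field, agrees):
    """Log2 likelihood-ratio weight for one field comparison."""
    m, u = FIELD_PROBS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(record_a, record_b):
    """Sum the field weights; higher totals favour a true match."""
    return sum(
        field_weight(field, record_a.get(field) == record_b.get(field))
        for field in FIELD_PROBS
    )


if __name__ == "__main__":
    birth = {"surname": "macdonald", "given_name": "angus", "birth_year": 1872}
    marriage = {"surname": "macdonald", "given_name": "angus", "birth_year": 1872}
    stranger = {"surname": "chisholm", "given_name": "john", "birth_year": 1850}
    print(round(match_score(birth, marriage), 2))
    print(round(match_score(birth, stranger), 2))
```

Pairs scoring above an upper threshold would be accepted as links, pairs below a lower threshold rejected, and those in between sent for manual review; the thresholds themselves are a project-specific choice.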
A longitudinal database for Norway 1800-2010
Gunnar Thorvaldsen (University of Tromsø)
A population registry is a longitudinal database in the sense that the aim is to maintain a continuously updated overview of the population in a geographic area. The registry may be national, regional or local. Longitudinal population registers are topical for two reasons in the historical context. First, as a contemporary methodology, they have for some decades in country after country replaced the traditional population control instruments based on censuses or vital registers. Second, historians in several countries are building historical population registers in order to be able to base their research on continuous collective biographies. This paper illustrates these contemporary and historical developments primarily with examples from Norway, where such registers have a century-long history and where we are currently building a historical population register on the basis of the NAPP censuses. Supplemented with other source material, this aims to include as many as possible of the 9.7 million people who were born or immigrated between 1735 and 1964. The latter year marks the introduction of the national or Central Population Register (CPR), which superseded contemporary local registers with a long history.