Structure of the CBDB

 

An Account of the Structure of the China Biographical Database (CBDB)

 

Michael A. Fuller
August 23, 2023

 

The China Biographical Database (CBDB) is a relational database of biographical information for China before the early twentieth century. Through the wide range of data it collects, CBDB offers many ways to examine the lives of past individuals and groups. While CBDB provides detailed information about people and can serve as a biographical dictionary, its more powerful use is as a tool for prosopography, the study of the lives of groups of people.

Relational Databases

A. Relational Database and the Organization of Complex Data

The social historian Robert Hartwell (1932-1996) was concerned with the kinship and social networks of Song dynasty officials and developed a relational database to study collective biographies. CBDB grew from his initial model, which he bequeathed to the Harvard Yenching Institute.

Hartwell realized that he could think of the interactions he saw in biographical data as relations between (1) people, (2) places, (3) a bureaucratic system, (4) kinship structures and (5) modes of social association. He built his relational database to capture this array of biographical data as the relations between these five “things.”

In the current version of CBDB that evolved from Hartwell’s model, we have added three more aspects of social experience through which individuals defined themselves: (6) social institutions like temples, academies, etc., (7) cultural systems for attaining social distinction, and (8) the vast webs of textual production.

This structuring of relationships between entities, categories of “things” in the world, is what a relational database does: it allows one to capture relations between complex objects in the world that interact with one another. That is, place is an entity, and the pieces of information about place that we systematically collect are the attributes of place as an entity. Similarly, people as a category is another entity. Using a relational database, we can record all the relations between people and places that we consider significant: where they were born, where they moved, where they were buried, and so on. This gives us an abstract model of relations between entities:

 

[Figure: abstract model of the relations between entities]

This abstract model, when transformed into a relational database, becomes a series of tables to represent the entities and relationships between entities in the system:

 

[Figure: the abstract model as tables: PEOPLE and PLACES (yellow), PEOPLE-PLACES (blue), and PEOPLE-PLACE TYPES (pink)]

 

These tables are of three basic types:
 

1. Tables that describe the basic entities (the yellow tables PEOPLE and PLACES above). In CBDB, these include people, places, kinship terms, bureaucratic structures, and so on. The fields in these tables capture the attributes of these entities that we want to know about. For people, this would include their names, birth and death dates, gender, and the like. For places (“addresses” in CBDB parlance), it would include the administrative level of a place, its superior or subordinate units, and the period of validity. For offices, this would include where the office fit in the administrative hierarchy during a particular dynastic period.
 

2. Tables that describe relations between basic entities. (For example, the blue PEOPLE-PLACES table.) In CBDB, these translate the relations between people and their social, physical, and cultural environment into a structured format. The fields in these tables capture the features of the relations that are considered important in describing the relationship. For instance, when a person receives a posting to serve in a bureaucratic office, in addition to the basic information of who the person was and what the office was, we also would like to know (1) where the post was, (2) if the person in fact served, and (3) when he served.
 

3. Tables that describe the types of relations between entities. (The pink PEOPLE-PLACE TYPES table.) Sometimes, there can be many ways for two “things” to interact in the world, and we need to be able to be more specific in recording the details of the interaction. In the example above, people can have many different ways of being related to a place: it might be the place at which they were formally registered, the place at which they actually lived, or the place where they were buried. We can group these relations into categories to give them structure.
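
The three table types can be sketched with Python's built-in sqlite3 module and an in-memory database. This is only an illustration: the table names, column names, and sample rows below are assumptions modeled on the diagram, not CBDB's actual schema.

```python
import sqlite3

# Hypothetical miniature schema; names are illustrative, not CBDB's own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Type 1: tables for the basic entities
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE places (place_id INTEGER PRIMARY KEY, name TEXT);
    -- Type 3: the kinds of relation a person can have to a place
    CREATE TABLE people_place_types (type_id INTEGER PRIMARY KEY, description TEXT);
    -- Type 2: one row per relation between a person and a place
    CREATE TABLE people_places (
        person_id INTEGER REFERENCES people(person_id),
        place_id  INTEGER REFERENCES places(place_id),
        type_id   INTEGER REFERENCES people_place_types(type_id)
    );
""")
conn.execute("INSERT INTO people VALUES (1, 'Person A')")
conn.executemany("INSERT INTO places VALUES (?, ?)",
                 [(10, "Place X"), (20, "Place Y")])
conn.executemany("INSERT INTO people_place_types VALUES (?, ?)",
                 [(1, "registered at"), (2, "buried at")])
conn.executemany("INSERT INTO people_places VALUES (?, ?, ?)",
                 [(1, 10, 1), (1, 20, 2)])

# Joining the relation table back to the entity and type tables
# turns the stored codes into readable statements.
rows = conn.execute("""
    SELECT p.name, t.description, pl.name
    FROM people_places pp
    JOIN people p ON p.person_id = pp.person_id
    JOIN people_place_types t ON t.type_id = pp.type_id
    JOIN places pl ON pl.place_id = pp.place_id
    ORDER BY pp.type_id
""").fetchall()
```

Each row of the result reads as a small statement such as "Person A, registered at, Place X". Adding a new kind of relation means adding one row to the types table rather than changing the structure of any table.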

 

 

B. Relational Databases and the Interactions of Complex Data

CBDB models the interactions between people and the entities—the “things”—that shape their social world. Some of these entities are easily understood in their “thingness:” places are physical entities, and the official bureaucracy has a substantial structure in premodern Chinese society.

Kinship is a bit more difficult to conceive. Anthropologists have long considered the kinship relations in a society as a structured system: some kinship ties are particularly strong, and societies are organized around these ties. People, that is, are not simply related to one another: their relationship is part of—and acquires meaning through—the kinship system of the society.

“Social relations” as a “thing” is yet more abstract but follows the same principles. If one wants to establish a social relation with another person, the society sets out patterns of what relations are appropriate and significant and what relations are not. Within the system of associations that a society values, “social capital” measures how one has positioned oneself in this network of associations.

The categories that CBDB has created for both kinship and social relations reflect the particular systems of significant distinctions we have encountered as we explore the legacy of information on individuals in premodern China. CBDB, as a relational database, then allows users to explore the interactions between these entities in the lives of groups of individuals. For example, consider the following set of entities and their relations with the basic entity PEOPLE:

 

[Figure: entities and their relations with the basic entity PEOPLE]

 

The Structure of CBDB

A. An Overview of the Entities in the Database

 

Database design uses tables to give concrete form to more abstract objects which we simply call “entities.” Since the goal of a database is to capture the relational information about entities, it remains useful to keep the abstract objects separate from the tables that represent their relations. That way, one can more easily ask the question of how the tables need to change to better stand in for the entities they represent.

 

The basic entities in CBDB (with corresponding tables) are:

 

  1. People
  2. Kinship
  3. Social (Non-kin) Associations
  4. Status Categories (modes of social distinction such as fame for calligraphy or serving as a monk)
  5. Modes of Entry into Government or other careers (e.g., passing the civil-service examinations, nepotism or the yin protection privilege)
  6. Offices (e.g., a magistrate or general)
  7. Social Institutions in which people collectively participated (from Buddhist temples and Confucian academies to the repair of city walls and bridges)
  8. Texts (including primary texts, secondary texts, and paleographic data). These include the data sources from which CBDB draws its information: primary sources, secondary scholarly compilations, and digital resources.
  9. Places (administrative units) (the names and locations of places defined as prefectures, counties, etc.)

 

Next are the relationships between people and other entities:

 

  1. Relations defined through Kinship
  2. Relations defined through Social (Non-kin) Association
  3. Status Attributions
  4. Entry into Government or other careers
  5. Postings to Offices
  6. Relations to Social Institutions
  7. Roles in Texts
  8. Relations to Places (administrative units)

 

Then there are relationships between other entities:

 

  1. Administrative Hierarchy of Places (the structure of superior and subordinate administrative prefectures, counties, etc.)
  2. Bureaucratic Organization (the changes in official bureaucracy and its reporting responsibilities over time)

 

B. Details of Tables in CBDB representing Entities and Relationships

 

In this section, I provide the basic structure of the tables in CBDB. Whenever a table includes codes that refer to other tables, I also include the name of the source table, since this relationship between tables is a central feature of relational databases.

 

Note: in relational databases, tables usually are normalized: whenever possible, they store only codes that refer to rows in other tables (these codes are called foreign keys). Such tables are not user-friendly, however, since the user must look up the codes to make sense of the data. For this reason, CBDB also provides tables whose names begin with ZZZ…; these are de-normalized tables in which the additional information (e.g., the name of a place in addition to its code) has been restored. Thus, ZZZ_BIOG_MAIN is the expanded version of BIOG_MAIN, etc.
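
The difference between a normalized table and its de-normalized counterpart can be sketched as follows. The column names and the way the expanded table is built are assumptions made for illustration. Only the table names BIOG_MAIN, ZZZ_BIOG_MAIN, and ADDR_CODES come from CBDB, and the real tables have many more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: maps an address code to its name (hypothetical columns)
    CREATE TABLE addr_codes (addr_id INTEGER PRIMARY KEY, addr_name TEXT);
    -- Normalized table: stores only the code (a foreign key)
    CREATE TABLE biog_main (
        person_id     INTEGER PRIMARY KEY,
        name          TEXT,
        index_addr_id INTEGER REFERENCES addr_codes(addr_id)
    );
    INSERT INTO addr_codes VALUES (100, 'Place X');
    INSERT INTO biog_main VALUES (1, 'Person A', 100);

    -- De-normalized copy: keeps the code and restores the name beside it
    CREATE TABLE zzz_biog_main AS
        SELECT b.person_id, b.name, b.index_addr_id,
               a.addr_name AS index_addr_name
        FROM biog_main b
        LEFT JOIN addr_codes a ON a.addr_id = b.index_addr_id;
""")

normalized = conn.execute("SELECT * FROM biog_main").fetchall()
denormalized = conn.execute("SELECT * FROM zzz_biog_main").fetchall()
```

The normalized row carries only the code 100, while the de-normalized row carries both the code and the readable name, at the cost of storing the name again for every person who shares that address.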

1. People

The table for people (ZZZ_BIOG_MAIN) assigns an ID for each person in the database and captures the following types of information.

 

a. Basic Data: This includes name, gender, date of birth, and date of death.
 

Precise dates of birth and death often are not available, and all we have is a period of years of activity (“floruit” dates). CBDB gives two floruit years: the earliest and the latest. Often when there is no data for index year (see below) or for birth and death dates, texts nonetheless provide datable references to individuals. CBDB gives the earliest and the latest known dates given in the textual sources we have examined so far. Sometimes, however, not even that is available: we simply know the reign period (nianhao) or dynasty. In order to capture the level of precision in the data, the database allows the use of reign period information for all dates. One can give a specific year within the reign period, but one also can simply indicate “beginning,” “middle”, “end”, or “unspecified.” For analytic purposes, the database will algorithmically produce Western dates from the reign period information for birth, death, years of activity, and any other date given in the traditional Chinese nianhao designation, but it will preserve the vagueness in the nianhao coding.
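
One way to picture how a reign-period date can yield Western years while keeping its vagueness is to return a range of years rather than a single year. The sketch below is a guess at the general idea, not CBDB's algorithm: the function name, the division of a reign period into thirds, and the data layout are all assumptions. The Xining 熙寧 reign period (1068 to 1077) serves as the example, and the sketch ignores the fact that traditional calendar years do not line up exactly with Western ones.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReignPeriod:
    name: str
    first_year: int  # Western year in which the reign period begins
    last_year: int   # Western year in which the reign period ends


def western_range(period: ReignPeriod,
                  year: Optional[int] = None,
                  part: str = "unspecified") -> Tuple[int, int]:
    """Return (earliest, latest) Western years for a reign-period date."""
    first, last = period.first_year, period.last_year
    if year is not None:
        # A specific year within the reign period; year 1 is the first year.
        western = first + year - 1
        if not first <= western <= last:
            raise ValueError(f"year {year} falls outside {period.name}")
        return (western, western)
    # Assumption: "beginning", "middle", "end" each cover about a third.
    third = max(1, (last - first + 1) // 3)
    if part == "beginning":
        return (first, min(first + third - 1, last))
    if part == "end":
        return (max(last - third + 1, first), last)
    if part == "middle":
        low = min(first + third, last)
        return (low, max(last - third, low))
    if part == "unspecified":
        return (first, last)
    raise ValueError(f"unknown part: {part!r}")


XINING = ReignPeriod("Xining", 1068, 1077)
exact = western_range(XINING, year=3)
vague = western_range(XINING, part="middle")
```

Returning a range keeps the precision of the source visible: a specific year collapses to a single value, while "middle of Xining" stays a span of several years that an analysis can treat accordingly.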

 

b. Ethnicity and Tribe Affiliation
 

CBDB tracks ethnicity (Han, Uighur, Tibetan, etc.). We have more than 465 codes at present. ZZZ_BIOG_MAIN has a field for the code, and the codes are defined in the table ETHNICITY_TRIBE_CODES, which organizes ethnicity and tribe designations by group and subgroup and includes variant forms for ethnicity names.

 

c. Choronym

From the Six Dynasties into the Tang, membership in a clan was of central importance in defining one’s social status. The combination of place name and clan name (like the Cui clan of Boling) defined a choronym. From the Song Dynasty onward, people still made claims of descent from a particular clan from a particular place, but these claims carried little social or political weight. ZZZ_BIOG_MAIN has a field for the choronym code, and the codes are defined in the table CHORONYM_CODES.

 

d. Index Year

For computational purposes, CBDB needs a single year value to locate a person in time. The index year is an artificial value used in analyses. In earlier versions of the database, index year was based on when the person would have turned 60 sui. However, starting with the 2021 dataset, the index year has been based on the known or projected year of birth. The 26 rules for calculating the value are complex. For more detail, see the User’s Guide.

2. Relations defined through Kinship

An instance of the Kinship relationship for an individual has three components (plus the source information):


person ID (from ZZZ_BIOG_MAIN)
kin ID (from ZZZ_BIOG_MAIN)
kinship relation (from KINSHIP_CODES)
 

This relationship is structured as: “Person A has Person B (the kin) as his/her Kinship Relation.” E.g. {Wang Anshi, Wang Anli, B-} means Wang Anshi has Wang Anli as a younger brother. CBDB captures these kinship relations in the table ZZZ_KIN_BIOG_ADDR_DATA.

The codes for the types of relationships are in the table KINSHIP_CODES. Although CBDB records all the many variations of kinship, searches for kinship networks in CBDB use an important set of four metrics for kinship distance to simplify the vast proliferation of terms. Each relationship code in the KINSHIP_CODES table has values for:

 

ancestor generation (“up”): “father’s generation” = 1, “grandfather’s generation” = 2, and so on.
descendant generation (“down”): “son” = 1, “grandson” = 2, etc.
collateral relation: “brother” = 1, “brother’s wife’s sister” = 2, and so on.
marriage relation: “wife” = 1, “wife’s father’s wife” = 2, and so on.

 

Thus brothers, step-brothers, bastard brothers, and adopted brothers all have the same set of values {up = 0; down = 0; collateral = 1; marriage = 0}.
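
A minimal sketch of how the four metrics can simplify a kinship search follows. The code “B-” and the metric values follow the examples above. The other code labels (“F”, “FF”, “W”), the unnamed kin, and the function name are assumptions made for illustration.

```python
from typing import NamedTuple


class KinDistance(NamedTuple):
    up: int          # ancestor generations
    down: int        # descendant generations
    collateral: int  # collateral steps
    marriage: int    # marriage steps


# Hypothetical excerpt of a kinship code table.
KINSHIP_CODES = {
    "B-": KinDistance(0, 0, 1, 0),  # younger brother
    "F":  KinDistance(1, 0, 0, 0),  # father (label assumed)
    "FF": KinDistance(2, 0, 0, 0),  # grandfather (label assumed)
    "W":  KinDistance(0, 0, 0, 1),  # wife (label assumed)
}


def within(code: str, max_up: int, max_down: int,
           max_collateral: int, max_marriage: int) -> bool:
    """True if the relation lies within all four distance limits."""
    d = KINSHIP_CODES[code]
    return (d.up <= max_up and d.down <= max_down
            and d.collateral <= max_collateral
            and d.marriage <= max_marriage)


# Kin recorded for one person as (kin name, relation code) pairs.
kin_of_person = [("Wang Anli", "B-"), ("Kin 2", "FF"), ("Kin 3", "W")]

# Keep kin at most one generation up or down and one collateral step
# away, with no marriage links.
close_kin = [name for name, code in kin_of_person
             if within(code, 1, 1, 1, 0)]
```

Because the search compares numbers rather than relation names, every variety of brother passes the same test without the query having to list each term.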

3. Relations defined through Non-kinship Associations

These have a three-part structure: person + association + associate. The major challenge in recording the non-kinship Associations that individuals formed over their lives is to control the proliferation of categories that we encounter in the historical sources.

Because associations are between pairs of people, there must be symmetrical types of associations. That is, if {A “is the student of” B} is in the database, then {B “is the teacher of” A} also should be in CBDB. (The current version of the program automatically generates this second entry.)
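
The automatic generation of the paired entry can be sketched as follows. The mapping of association types to their converses and the function name are assumptions for illustration, since the program's actual procedure is not described here.

```python
# Hypothetical mapping from each association type to its converse.
RECIPROCAL = {
    "student of": "teacher of",
    "teacher of": "student of",
    "friend of": "friend of",  # a symmetric relation is its own converse
}


def with_reciprocals(associations):
    """Return the associations plus the converse entry for each one."""
    result = set(associations)
    for person, assoc, associate in associations:
        result.add((associate, RECIPROCAL[assoc], person))
    return sorted(result)


entries = with_reciprocals([("A", "student of", "B")])
```

Using a set means that an association whose converse was already entered by hand is not duplicated.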

In some important cases, associations form through the mediation of institutions or people. CBDB captures these types of relations by adding further data to associations. For example, we might know of a relation between X and Y because X asked Y to write a biography for his mother’s tomb. To capture all the variations in the ways in which social relations were created, the table that records these relations (ZZZ_NONKIN_BIOG_ADDR) has grown quite complex, with a large number of fields:

 

Basic Information

1. Person ID (from ZZZ_BIOG_MAIN)
2. Associated person ID (from ZZZ_BIOG_MAIN)
3. The kind of association (from ASSOC_CODES)
4. The number of objects or events establishing the association

Information about Kinship and Other Relations that played a role in the Association

5. The kinship relation, if the association was established through a relative of the person (from KINSHIP_CODES)
6. The ID of the person whose kinship relation established the association (from ZZZ_BIOG_MAIN)
7. The kinship relation, if the association was established through a relative of the associated person (from KINSHIP_CODES)
8. The ID of the kin of the associate through whom the association was established (from ZZZ_BIOG_MAIN)
9. The ID of the person who claimed the existence of the association: for example, a son claiming it for his father (from ZZZ_BIOG_MAIN)


Time and Place of the Association

10. The place of the association (from ADDR_CODES)
11. The date of the association (year, month, and day, if known)
12. The sequence of an association, if one does not know the actual date


Contextual Information

13. The social institution at or through which the association was established (from ZZZ_SOCIAL_INSTITUTIONS)
14. The occasion on which the association was established (from OCCASION_CODES)
15. The genre of the writing that establishes the association, if relevant (from LITERARYGENRE_CODES)
16. The title of the work that established the association, if relevant
17. The scholarly topic around which the association was formed (from SCHOLARLYTOPIC_CODES)


Source and Notes

18. Source (from TEXT_CODES)
19. Note

4. Social Attributions

CBDB has a table (ZZZ_STATUS_DATA) to take note of a person’s “social distinctiveness,” that for which they are known in society. Since the dating often is uncertain, the table also has a field to record sequence, if known. Some forms of social distinctiveness may combine roles (a Buddhist monk known for his calligraphy, or a literatus who runs a printing firm). At present, CBDB records the different aspects of status under distinct categories; how best to represent combined roles is a question awaiting future research. The structure of a Status datum for a person is:

 

Status Data
Person ID (from ZZZ_BIOG_MAIN)
Status code (from STATUS_CODES)
Status sequence
Date
Source information and notes

5. Entry

Entry itself is a simple entity, just a name, a type, and a subtype. At present it largely describes entry into government, but CBDB also has begun to track categories like monks’ ordinations. Because different routes of entry entail different types of information, the instance of an entry event for an individual is more complex. If a person enters government through the examination system, for example, we would like to know the type of examination and the date of the degree. (CBDB also tracks failed examinations.) If, in contrast, one enters government through the merit of someone else, that person and the relationship to that person should also be recorded, if known. Thus if Zhang Weisan entered office through yin protection privilege deriving from his uncle Zhang Jingyi, the entry would be:
 

Person: [ID of] Zhang Weisan
Entry type: [code for] yin
Entry relation type: [code for] Uncle
Entry relation: [ID of] Zhang Jingyi

 

Since it is also possible that one can enter office through the yin privilege of a non-kin associate, the “entry event” will need to have a way to record the non-kinship relation. In the end, then, the entry event (represented in CBDB in the table ZZZ_ENTRY_DATA) has many attributes, only some of which are relevant to any particular instance:
 

Entry Data

Person ID (from ZZZ_BIOG_MAIN)
Entry type code (from ENTRY_CODES)
Entry relation type code (for kin: from KINSHIP_CODES)
Entry associate type code (for non-kin: from ASSOC_CODES)
Entry associate ID (used for both kin and non-kin, from ZZZ_BIOG_MAIN)
Entry test date (both Western and nianhao + year, if known)
Entry test ranking
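
The idea that only some attributes apply to any one entry event can be sketched as a record with optional fields. The field names, ID numbers, code strings, and examination year below are hypothetical. The consistency check is also an assumption, not a documented CBDB rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntryEvent:
    person_id: int
    entry_code: str
    kin_relation_code: Optional[str] = None  # used when entry is through kin
    assoc_type_code: Optional[str] = None    # used when entry is through non-kin
    associate_id: Optional[int] = None       # the kin or non-kin person
    exam_year: Optional[int] = None
    exam_rank: Optional[str] = None

    def __post_init__(self):
        # Assumed rule: the enabling person is either kin or non-kin, not both.
        if self.kin_relation_code and self.assoc_type_code:
            raise ValueError("give a kinship code or an association code, not both")


# Entry through the yin privilege of an uncle (IDs are made up).
yin_entry = EntryEvent(person_id=1, entry_code="yin",
                       kin_relation_code="uncle", associate_id=2)

# Entry through examination: the kin and associate fields stay empty.
exam_entry = EntryEvent(person_id=3, entry_code="examination", exam_year=1100)
```

Leaving unused fields empty, rather than creating one table per route of entry, keeps all entry events in a single structure at the cost of many blank values.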

6. Postings to Office

CBDB currently lists over 32,000 office titles. For the Tang, Song, Yuan, Ming, and Qing, CBDB also has reconstructed these offices’ place in the dynastic bureaucratic structure. Postings connect people to offices and, since most postings were away from the capital, to places. A person serves in an office at a given rank in a particular place at a specified time. However, there are instances when a posting includes jurisdiction over more than one administrative unit, and there are times when a single posting entails more than one official position. Following the rule that one-to-many relations require separate tables, information about postings requires three tables, since one posting may have more than one address and one posting may involve more than one office title: a basic postings table, a posted-to-office table, and a posted-to-office-address table.

 

Posting-Data (POSTING_DATA records posting events)
Posting ID (this is a unique number)
Person ID
Source and Notes

 

Posted-to-Office (ZZZ_POSTED_TO_OFFICE_DATA records the offices assigned in each posting)
Posting ID (from POSTING_DATA)
Office ID (from OFFICE_CODES)
Appointment Type (regular, provisional, honorary, etc.: from APPOINTMENT_TYPE_CODES)
Sequence (since often only the order of office is known with no further information about the years for any of the postings)
Year (both Western and nianhao + year: a person may have duties added while still serving in a post)
Sources and Notes

 

Posted-to-Address (ZZZ_POSTED_TO_ADDR_DATA records the addresses associated with each office in the posting)
Posting ID (from POSTING_DATA)
Office ID (from ZZZ_POSTED_TO_OFFICE_DATA)
Address ID (from ADDR_CODES)
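
The three-table arrangement can be sketched with sqlite3 as follows. The lowercase table names echo the CBDB tables, but the column names and ID values are assumptions, and the real tables carry more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posting_data (
        posting_id INTEGER PRIMARY KEY,
        person_id  INTEGER
    );
    CREATE TABLE posted_to_office_data (
        posting_id INTEGER REFERENCES posting_data(posting_id),
        office_id  INTEGER,
        sequence   INTEGER
    );
    CREATE TABLE posted_to_addr_data (
        posting_id INTEGER REFERENCES posting_data(posting_id),
        office_id  INTEGER,
        addr_id    INTEGER
    );
    -- One posting for person 1 that carries two office titles
    INSERT INTO posting_data VALUES (1, 1);
    INSERT INTO posted_to_office_data VALUES (1, 501, 1), (1, 502, 2);
    -- The first office has jurisdiction over two addresses;
    -- no address is recorded for the second office
    INSERT INTO posted_to_addr_data VALUES (1, 501, 100), (1, 501, 200);
""")

# LEFT JOIN keeps offices for which no address is recorded.
rows = conn.execute("""
    SELECT o.posting_id, o.office_id, a.addr_id
    FROM posted_to_office_data o
    LEFT JOIN posted_to_addr_data a
           ON a.posting_id = o.posting_id AND a.office_id = o.office_id
    ORDER BY o.sequence, a.addr_id
""").fetchall()
```

The single posting expands into three rows: two for the office with two addresses and one, with an empty address, for the office whose location is unknown.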

7. Places

CBDB uses a strategy for coding places that derives from the China Historical Geographic Information System (CHGIS) project and relies on the spatial entity addresses.

Addresses are specifically historical “instances” of place designation that refer to an administrative jurisdiction. Although administrative jurisdictions such as counties (xian) and prefectures (zhou and fu) were bounded spatial entities, CBDB uses the coordinates for the administrative seat as the address; it does not provide boundaries. Boundaries can be downloaded from CHGIS. If either the boundaries or the name changes, a new address must be created.

These historical instances of addresses are part of administrative hierarchies: this information is preserved in a “belongs-to” table that serves the same function as the “part-of” table in CHGIS. Thus, there are two tables:

 

Address Code (ADDR_CODES)
Address code
Address name
Administrative type
X coordinate
Y coordinate
Address first year
Address last year

 

Belongs to (ZZZ_BELONGS_TO)
Address code (from ADDR_CODES)
Belongs-to Address code (from ADDR_CODES)
Belongs-to first year
Belongs-to last year


To allow the examination of trends across dynastic boundaries, the database needs a way to examine what happens in a particular location over long periods of time. For this, CBDB relies on data about physical location, the x-y coordinates on the map. (In Geographic Information Systems [GIS] research, longitude and latitude typically are referred to as x-y coordinates.) The analytic forms allow one to use the x-y data for the addresses one has selected to define squares around those x-y coordinates and locate additional addresses across time that fall within those squares. These addresses then can be searched across the time period one has specified.
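
The square-based lookup can be sketched roughly as follows. The field names, coordinates, dates, and the half-side parameter are all made up for illustration, and the analytic forms may define their squares differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Address:
    addr_id: int
    name: str
    x: float         # longitude of the administrative seat
    y: float         # latitude of the administrative seat
    first_year: int
    last_year: int


def addresses_in_square(addresses: List[Address], x: float, y: float,
                        half_side: float, start_year: int,
                        end_year: int) -> List[Address]:
    """Addresses whose seat lies in the square centred on (x, y) and whose
    period of validity overlaps the span start_year..end_year."""
    return [a for a in addresses
            if abs(a.x - x) <= half_side and abs(a.y - y) <= half_side
            and a.first_year <= end_year and a.last_year >= start_year]


# Hypothetical addresses: the same locality under two names, a distant
# county, and a nearby county that exists only in a later period.
ADDRESSES = [
    Address(1, "County X (earlier name)", 120.0, 30.0, 960, 1126),
    Address(2, "County X (later name)", 120.1, 30.1, 1127, 1279),
    Address(3, "County Y", 125.0, 30.0, 960, 1279),
    Address(4, "County Z", 120.2, 29.9, 1368, 1644),
]

found = addresses_in_square(ADDRESSES, 120.0, 30.0, 0.5, 960, 1279)
found_ids = [a.addr_id for a in found]
```

Searching by coordinates rather than by address code is what lets the earlier and later instances of the same locality turn up together, even though each has its own address record.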

8. Relations to Places

People have many connections to place: where they were born, lived, died, and were buried, where they served in office, where they held property and ran businesses, where they visited. Since these relations to place arise out of activities recorded in separate tables in CBDB (e.g., office holding and possessions), the information appears in these various tables rather than in one place. The tables that record information about people and places are:
 

Basic biographical information relating to place (ZZZ_BIOG_ADDR_DATA)
Place of official service (ZZZ_POSTED_TO_ADDR_DATA)
The place where a non-kinship relation took place (ZZZ_NONKIN_BIOG_ADDR)
The place where people participated in social institutions (ZZZ_BIOG_INST_DATA)

 

CBDB now has a form (LookAtPlace) to allow the user to ask questions that integrate all these sources of place information. Note that at present CBDB does not systematically preserve information about places persons briefly visited, where they received their education, or where they wrote texts.

CBDB attempts to associate each person with an index place. As with index year, CBDB assigns these place associations based on available information, but the data is often incomplete. Therefore CBDB uses a hierarchy of categories of place association to assign a person’s index place. CBDB first uses the “basic affiliation” 籍貫, if available. However, this hierarchy of codes to use in assigning the index place may not be the most suitable for particular research projects. Thus, CBDB allows the user to change this order. See the User’s Guide discussion.
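
The selection of an index place from a user-adjustable hierarchy can be sketched as follows. Only the priority of the basic affiliation comes from the description above. The other category names, their order, and the function name are assumptions.

```python
from typing import Dict, Optional, Sequence

# Assumed default order; only "basic affiliation" coming first is documented.
DEFAULT_ORDER = ("basic affiliation", "actual residence", "burial place")


def index_place(person_places: Dict[str, str],
                order: Sequence[str] = DEFAULT_ORDER) -> Optional[str]:
    """Return the place for the first category in `order` that the person has."""
    for category in order:
        if category in person_places:
            return person_places[category]
    return None  # no usable place association is recorded


# Hypothetical person with no recorded basic affiliation.
places = {"burial place": "Place Y", "actual residence": "Place X"}

default_choice = index_place(places)
custom_choice = index_place(places, order=("burial place", "actual residence"))
```

Passing a different order changes the result without touching the underlying data, which is the effect of letting the user rearrange the hierarchy.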

9. Texts and Roles in Texts

There are three major types of texts of concern to the database: inscriptional and other paleographic material, printed primary texts, and secondary scholarship (in both print and digital form). Since a work like Huang Zongxi’s Song Yuan xue’an is both a scholarly compendium of earlier writings and a work in its own right, and since the paleographic materials also were written by authors who are of interest to the database, these distinctions for pre-modern texts of any sort are neither clear nor useful. CBDB accordingly treats all three types as texts. Texts have the attributes one can expect:
 

title
category of writing (inscription or manuscript/printed)
genre (the bibliographic categories common to that period)
current publication date
current publisher
current publication location

 

People can relate to the text in a variety of ways. Their roles include:
 

author
publisher
editor
collator
translator
annotator


The two tables for texts include one to record the texts themselves and another to record the roles people play in relation to texts:


Texts Codes (TEXT_CODES)
Text ID
Text Name
Date of composition
Current status: extant or not
Current Publication Information (if extant)

Roles in Texts (ZZZ_BIOG_TEXT_DATA)
Text ID (from TEXT_CODES)
Person ID (from BIOG_MAIN)
Role ID (from the table TEXT_ROLE_CODES)

10. Social Institutions and Relations to Social Institutions

People participated in the lives of their communities in many ways. A man, for example, may have served for several years as the director of an academy. That academy had students during this period: their respective roles in the academy would have served as important social links between the man and the students. The academy also had donors who contributed to its creation and upkeep and helped to define a community centered on the institution. Similar patterns appeared for Buddhist monasteries and Daoist temples.

CBDB is beginning to track this information in a way that captures the uncertainty we find in the historical sources. There are, for example, thirty-nine temples with the name Kaiyuansi 開元寺. A biographical source may tell us that Wang Anshi contributed funds to repairs at a Kaiyuansi, but we may not know (yet) which Kaiyuansi was the recipient. Other sources eventually may clarify the point, but for the moment CBDB simply records “a Kaiyuansi.” There are four tables used to record this information:

 

Social Institution Name (SOCIAL_INSTITUTION_NAME_CODES)
Institution Name ID
Institution Name

 

Social Institution (ZZZ_SOCIAL_INSTITUTIONS)
Institution Name ID (from SOCIAL_INSTITUTION_NAME_CODES)
Institution Code (this is a unique ID for each institution: the name may change, but the ID does not.)
Institution Type ID (from SOCIAL_INSTITUTION_TYPES)
Institution Address ID, etc. (from SOCIAL_INSTITUTION_ADDR)
Institution Dates (this includes the beginning and ending years, if known, as well as the first known and last known years)

 

Social Institution Addresses (SOCIAL_INSTITUTION_ADDR)
Institution Name ID (from ZZZ_SOCIAL_INSTITUTIONS)
Institution Code (from ZZZ_SOCIAL_INSTITUTIONS)
Address ID (this gives an approximate location by identifying an administrative unit; from ADDR_CODES)
XY-coordinates (this may be more precise than the coordinates associated with the Address ID. An institution may move within its locality.)
Address Type (derived from Address ID or recorded independently; from SOCIAL_INSTITUTION_ADDR_TYPES)
Address Dates

 

Relations to Social Institutions (ZZZ_BIOG_INST_DATA)
Person ID (from ZZZ_BIOG_MAIN)
Institution Name ID (from SOCIAL_INSTITUTION_NAME_CODES)
Institution Code (if only the name is known, CBDB assigns a 0 to this field)
Institutional Role Code (from BIOG_INST_CODES)
Role Dates
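
The use of 0 as a placeholder institution code can be sketched as follows. The field names, ID values, and helper functions are hypothetical. Only the convention of assigning 0 when just the name is known comes from the table description above.

```python
from dataclasses import dataclass, replace

UNKNOWN_INSTITUTION = 0  # code assigned when only the name is known


@dataclass(frozen=True)
class InstitutionRelation:
    person_id: int
    inst_name_id: int   # e.g. the ID for the name "Kaiyuansi"
    role_code: int
    inst_code: int = UNKNOWN_INSTITUTION


def is_identified(rel: InstitutionRelation) -> bool:
    """True once the relation points at one specific institution."""
    return rel.inst_code != UNKNOWN_INSTITUTION


def identify(rel: InstitutionRelation, inst_code: int) -> InstitutionRelation:
    """Return a copy of the relation tied to a specific institution."""
    if inst_code == UNKNOWN_INSTITUTION:
        raise ValueError("a specific institution code is required")
    return replace(rel, inst_code=inst_code)


# A donation to "a Kaiyuansi": the name is known, the temple is not.
donation = InstitutionRelation(person_id=1, inst_name_id=7, role_code=3)

# Later evidence settles which temple was meant (code 42 is made up).
resolved = identify(donation, inst_code=42)
```

Because the name ID is stored separately from the institution code, the record is already useful for searches by name and can be sharpened later without being re-entered.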