Collaborating Institutions


The Chinese Studies Librarian Team in the U.S.: Developing and Providing Supporting Infrastructure for Ming Biographic Data

A team of Chinese studies librarians from the U.S. formed in April 2015 to expand librarians' role in the digital scholarship environment in general and to support CBDB’s Ming biographical data collection in particular.

The librarian team is involved in three projects related to CBDB:

  • Identification and preparation of historical sources for extracting (manually and automatically) Ming biographical data
  • Development of a hierarchical tree of government offices and a mapped list of Ming government official titles
  • Completion of the English translation of Ming government official titles by extracting related entries from Charles Hucker’s Dictionary of Official Titles in Imperial China and by seeking community-based translation contributions from Ming scholars across the world via an online crowdsourcing system developed by the team

Current members of the librarian team are:

  • Xiaohe Ma (Harvard Yenching)
  • Zhaohui Xue (Stanford)
  • Susan Xue (UC Berkeley)
  • Ying Zhang (UC Irvine), project coordinator
  • Martin Heijdra (Princeton), project consultant
  • Thomas Nimick (USMA), project consultant
  • Shouxian Gao 高壽仙 (Beijing Administrative College), project consultant

You will find the 明代職官中英辭典 Chinese-English Dictionary of Ming Government Official Titles, compiled by the Chinese Studies Librarian Team in the U.S. under project coordinator Ying Zhang, here:


Digging into Data: Automating Chinese Text Extraction

The Automating Data Extraction from Chinese Texts Project aims to provide humanists and social scientists with a means of transforming 2200 years of Chinese texts into structured data. The project will develop an open-source platform (MARKUS) that allows users to apply sophisticated text-mining techniques to a wide variety of historical and literary texts. Users will be able to tag and extract personal names, dates, place names, official titles and postings, kinship ties, other social relationships, and other user-defined content. The platform will be tested against 2000 local histories spanning an 800-year period and roughly 20,000 letters and 500 notebooks dating from the seventh through the thirteenth century. Data extracted from the sample repositories will be used to enrich text-mining applications and will also be made available for research through open-access online databases and data archives.

This project involves research teams from Harvard University (United States), Birmingham University (United Kingdom), National Taiwan University (Taiwan), and the Communication and Empire team at Leiden University overseen by the original PI, Professor Hilde De Weerdt.
The team at Harvard University is the China Biographical Database (CBDB) project. CBDB is abstracting, cleaning, and revising dictionary tables from the China Biographical Database. The CBDB, China Historical GIS, and DDBC dictionaries will be transferred to the Birmingham team, which will filter out known problems. The Leiden and National Taiwan University teams will use those dictionaries to build the project platform, MARKUS.

You will find more information about Digging into Data: Automating Chinese Text Extraction here:

You can find the MARKUS platform here:


Ming Qing Women's Writings

The McGill-Harvard-Yenching Library Ming Qing Women’s Writings digitization project, MQWW, is the product of a collaborative effort between McGill University and the Harvard-Yenching Library. Initiated in 2003, the project has been directed by Grace Fong, a specialist in classical Chinese poetry and women’s writing in late imperial China, who was assisted by a McGill digitization team supervised by David McKnight, former director of McGill Library’s Digital Collections Program (now at UPenn). The project consisted of two components. The first was to digitize the entire collection of Chinese women’s writings in the holdings of the Harvard-Yenching Library (94 individual collections and large anthologies). These collections of writings were published during the Ming and Qing dynasties (1368-1644, 1644-1911), the late imperial period when, as recent scholarship has rediscovered, women’s literary culture and the printing industry flourished at an unprecedented level. Due to the biases of Confucian gender ideology, women’s writing often suffered from marginalization, neglect, and loss. The works that survived into the modern age are mostly housed in hard-to-access rare book collections in major libraries in China. The digitization project thus aimed to make available a valuable corpus of texts for research on women’s history and culture, an area which has emerged in recent years as one of the most active and innovative subfields in Chinese Studies.

The second component, carried out by the McGill team, has been to build an extensive searchable database to enhance the research scope and potential of these materials. The result is the Ming Qing Women’s Writings website, launched in summer 2005. The database has been designed especially for Chinese women’s writings of the past several hundred years, and contains information on about 5,000 women poets and other writers, more than 10,000 poems mainly written by women, several hundred historical regions, approximately 20,000 scanned images of original texts, and other useful reference information. The database can be searched using many access points, including name of author, title, poem form, social status, and region, in both Chinese and Pinyin. The full records contain a great deal of rigorously verified data. Individual collections can be viewed online in their entirety using the page viewer developed by the digitization team supervised by David McKnight.


Collaborating Institutions: University of Zurich

China and the West: 1245-2000: Database hosted by the East Asia Seminar, University of Zurich

Thematic Scope:

Information on any conceivable topic relating to cultural / intellectual / religious forms of contact and exchange between China and the West.

Main thematic areas: missionary history, general and disciplinary history (history of sinology), travel and legation accounts, German language literature, philosophy, art, and history of science.

"China and the West" and CBDB will:

  • Foster the exchange of scholarly information and increase system interoperability between already existing databases and their subparts, especially as far as the crosslinking of biographical data in CBDB and CWDB is concerned;
  • Cooperate on translating CWDB data, especially metadata but potentially also primary data, into English and Chinese;
  • Develop joint research activities in the field of "retrosinification" in CWDB, i.e. adding Chinese characters for personal names, place names, and other entities to the CWDB datasets where they are not already available;
  • Promote other activities which enhance the European biographical coverage scope of CWDB and its biographical interface density with CBDB; and
  • Support one another in the development of academic activities which advance the above-mentioned goals.

Kyoto University Institute for Research in the Humanities: Collaborating on Tang Biographical Data

The Database of Tang Figures 唐代人物データベース is part of the “Tang Knowledgebase Project” at the Center for Informatics in East Asian Studies, Institute for Research in the Humanities at Kyoto University.

As of August 2009 it held biographical data on about 4636 persons. The goal is to provide dates of birth and death where available (otherwise floruit dates), alternate names, places of origin and other relevant places, kin relations, examination degrees, official postings, writings, and source references.

Information has been collected from existing reference works and from anthologies such as the Quan Tang shi 全唐詩 and the Quan Tang wen 全唐文. The database builds on the Tang Civilization Reference Series 唐代研究のしおり, a series of reference works for Tang studies compiled by HIRAOKA Takeo 平岡武夫 and his collaborators in the 1950s and early 1960s at the Institute for Research in the Humanities, as well as on more recent reference works like the Zhongguo wenxuejia dacidian 中国文学家大辞典(唐五代卷) compiled by Zhou Zu 周祖 and his team (Beijing, Zhonghua Shuju, 1992). At the moment, only the most prominent persons have been included, with a focus on persons who left some kind of written record. The plan is to expand the scope and merge the database with the more extensive material collected as part of the Resources for Tang Studies.

The collaboration with the China Biographical Database project will foster the exchange of scholarly information and increase system interoperability between already existing databases and their subparts, especially with regard to the cross-linking of biographical data in CBDB and the Tang Knowledgebase. We will share biographical data from our respective databases for incorporation into our respective systems and provide each other with new data on Tang figures as it is developed.


Individually Volunteering Data

Collaborating Projects are typically the projects of individual scholars, rather than institutions, who have agreed to make their data available for harvesting by the China Biographical Database project. In some instances CBDB also publishes these datasets and databases through its peer-review process. Collaborating Projects that maintain their own online systems may also have formal agreements to create some level of system interoperability.


