Summary of the reporting session of CBDB Visiting Scholars (2021-2022)

The reporting session of CBDB Visiting Scholars (2021-2022) was hosted by the China Biographical Database Project (CBDB) and the Fairbank Center for Chinese Studies in CGIS Knafel K350 on the afternoon of July 29, 2022.

Dr. Lingling Gu of Zhejiang University gave a report titled "Information collation and research of female authors in Ming and Qing Dynasties". First, she introduced Gui Hai Yin 閨海吟, edited by Du Xun of Peking University, which contains biographies and bibliographic information for more than 8,600 female authors and runs to more than 1.2 million words. It is the largest compilation of premodern women writers to date. Second, she described her main work at CBDB: after running OCR on the book, she used VS Code to capture each woman's name, kinship ties, social relationships, and book information. Third, she matched the information in the book against CBDB and Ming Qing Women's Writings. Fourth, she analyzed the data on women in the book and in CBDB along multiple dimensions.


Professor Yangbo Zhou of Suzhou University of Science and Technology introduced two batches of Song dynasty biographical data in his report "Data Mining of Song Dynasty Biographical Sources: Take Poets Societies and Children For Two Examples". Professor Zhou reported that both batches were added to CBDB over the past half year. One covers 98 poetry societies comprising more than 400 poets; the other covers more than 400 children whose lives have been arranged into chronologies. Some of the data already existed in CBDB, while several hundred further records were added through the data input interface, either one by one or in batches. The data can be visualized with Gephi and QGIS and provide important material for social network analysis of Song dynasty literati.


Professor Shalan Sun of Chengdu University of Information Technology gave a report titled "Data Mining and analysis of Biographical information of Artists in Ming Dynasty, Taking Ming Hua Lu as an example". She explained why Ming Hua Lu 明畫錄 was chosen as the base text for mining biographical information on Ming dynasty painters: its data are complete, its content is systematic, and its style is regular. To improve accuracy, the raw data produced by the data mining group must be checked manually. First, the CBDB Access database was used for person-name disambiguation. Second, the raw data were corrected against historical materials and research resources from Erudition, CNKI, and Duxiu. In the end, 828 records of artists' names, 941 records of artists' aliases, and 634 records of artists' addresses were extracted from Ming Hua Lu, supplementing CBDB's biographical information on Ming dynasty artists. Using QGIS and Gephi, the research treats Ming dynasty artists as entities linked to their kin and their social associations, with the aim of building a fuller picture of the art world of that era.


Ms. Fanjing Kong of Northeast Normal University gave a report titled "The Association between CBDB and Harvard Yenching Ancient Books and Rare Books". The work aims to supplement the author information of nearly 20,000 writing records in CBDB and to bring more authoritative bibliographic information into CBDB. "Association" is carried out from two perspectives: linking author information to CBDB persons, and linking bibliographic information to CBDB books. Three methods were used. The first is the HOLLIS API, which is used to extract bibliographic information on the old and rare books. The second is to extract bibliographic information on the old and rare books from the Harvard Yenching Book List. The third is Fuzzy Lookup (a fuzzy-matching add-in for Microsoft Excel), which is applied to the bibliographic information on Harvard Yenching old and rare books provided by Zhonghua Book Company. Matching thresholds are set separately for book titles and for author names. The candidate matches are sorted by matching degree from high to low and then disambiguated manually.
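The match-then-review workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure (not the algorithm inside Excel's Fuzzy Lookup add-in), and the function names, field names, record ids, and threshold values are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(record, cbdb_rows, title_min=0.6, author_min=0.5):
    """Score a library record against CBDB rows.

    Title and author thresholds are set separately (both values are
    assumed here). Surviving candidates are returned sorted from the
    highest to the lowest combined score, ready for manual review.
    """
    scored = []
    for row in cbdb_rows:
        title_score = similarity(record["title"], row["title"])
        author_score = similarity(record["author"], row["author"])
        if title_score >= title_min and author_score >= author_min:
            combined = round((title_score + author_score) / 2, 3)
            scored.append((combined, row["id"]))
    return sorted(scored, reverse=True)


# Illustrative rows only; the ids are made up for this sketch.
record = {"title": "經義考", "author": "朱彝尊"}
cbdb_rows = [
    {"id": 1, "title": "經義考", "author": "朱彝尊"},
    {"id": 2, "title": "經義考三百卷", "author": "朱彝尊撰"},
    {"id": 3, "title": "明畫錄", "author": "徐沁"},
]

for score, row_id in rank_candidates(record, cbdb_rows):
    print(row_id, score)
```

Sorting by score puts near-certain matches first, so a human reviewer can accept those quickly and spend more time on the ambiguous candidates further down the list.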


Dr. Lixiang Qian gave a report titled "Creating the World of Ming Anthologies - Data Compilation and Research on Ming Dynasty Anthologies", which aims to clarify the publication history of Ming dynasty anthologies. The report is divided into four parts: first, the source and scale of the Ming anthology data; second, the content and evaluation of the data; third, the processing of information on authors, titles, and volumes; and fourth, the study of Ming anthology digitization. The report uses GIS, econometric statistics, and social network analysis. It reaches five conclusions: first, it identifies the best-selling authors of the Ming dynasty; second, it analyzes less popular writers through long tail theory; third, it identifies the writers with the most volumes engraved and printed and estimates what this cost; fourth, it identifies the places that produced the most writers in the Ming dynasty; and fifth, it traces the relationship between anthology publishing and Ming dynasty publishing history.


Dr. Shanhui Huang of Peking University gave a report titled "Data Mining and Analysis in Local Gazetteers -- A Case study of officials and courier posts (驛站) in Jiangxi Tongzhi 江西通志". The study aims to demonstrate data mining methodology and the kinds of analytical conclusions that local gazetteers can support, and it has three parts. The first is the mining and analysis of official records under the standardized workflow. The second is an exploratory extraction, adjustment, and analysis of courier post data. The third covers takeaways and further research: based on the work on Jiangxi Tongzhi, the most efficient and reliable approach is to carry out supplementary analysis with mixed methods, such as combining regular expressions with a BERT model. As for further research, historical and modern place names could be compared through fuzzy retrieval and matching of courier post names, and introducing slope and other natural features for fitting could further improve accuracy. Spatial network analysis and spatial correlation analysis could also be carried out to better test the influencing factors.


Dr. Qin Yu of Zhejiang University gave a report titled "An insight into index year calculation in China Biographical Database". The report introduced the concept of the "index year", a time measure pioneered by the China Biographical Database (CBDB) that replaces the vague time frame of a historical figure with a specific estimated value, so that more figures can be included in time-based calculations in the database. The index year is defined as a person's birth year where it is known, or otherwise as a birth year calculated by combining important dates of the person and his or her relatives with the "20 rules". The "20 rules" are derived from the large quantity of exact dates known in the database and are arranged in descending order of priority, so they are theoretically reliable. Random samples combined with historical investigation show that applying lower-priority rules is more error-prone than repeatedly reusing inferred results. Therefore, when multiple calculation paths are possible, the rule with the higher priority should be applied first to reduce errors, and check points should be set up. Once a result exceeds the preset difference limit, further calculation of other people's values based on it is abandoned, to curb the spread of errors in the database.
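A minimal sketch of this priority-and-check-point idea follows. It is not CBDB's implementation: the four rules, their numeric offsets, the field names, and the 20-year difference limit are hypothetical placeholders chosen for illustration, and comparing the chosen estimate against lower-priority estimates is only one assumed reading of the check point.

```python
from typing import Optional

# Hypothetical placeholder rules in descending priority. The offsets
# (30 years) are illustrative and are NOT taken from CBDB's "20 rules".


def rule_birth(p: dict) -> Optional[int]:
    return p.get("birth")


def rule_death_age(p: dict) -> Optional[int]:
    if "death" in p and "death_age" in p:
        return p["death"] - p["death_age"]
    return None


def rule_degree(p: dict) -> Optional[int]:
    return p["degree_year"] - 30 if "degree_year" in p else None


def rule_father(p: dict) -> Optional[int]:
    return p["father_index"] + 30 if "father_index" in p else None


RULES = [
    ("birth year known", rule_birth),
    ("death year minus age at death", rule_death_age),
    ("degree year minus 30", rule_degree),
    ("father's index year plus 30", rule_father),
]


def index_year(person: dict, limit: int = 20):
    """Return (year, rule_label, may_propagate), or None if no rule applies.

    The highest-priority applicable rule supplies the value. As a check
    point, the value is compared with every lower-priority estimate; if
    any differs by more than `limit`, the value is kept but flagged so
    that it is not used to infer other people's index years.
    """
    estimates = [(label, rule(person)) for label, rule in RULES]
    estimates = [(label, year) for label, year in estimates if year is not None]
    if not estimates:
        return None
    label, year = estimates[0]
    may_propagate = all(abs(other - year) <= limit for _, other in estimates[1:])
    return year, label, may_propagate


print(index_year({"death": 1200, "death_age": 60, "degree_year": 1170}))
print(index_year({"degree_year": 1170, "father_index": 1050}))
```

In the second call the two estimates disagree by 60 years, so the result is flagged and would not be passed on to relatives, which mirrors the goal of curbing error propagation.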


Dr. Hao Yu of Fudan University gave a report titled "The Textual Research of Geographic Name in China Biographical Database (CBDB)". He noted that native place is an important part of the information CBDB holds on each person and that it can be visualized on a map with GIS. For Ming dynasty figures, the existence of military households means that locating Ming military sites and other military institutions is an important step in improving CBDB, especially its geographic display. By georeferencing the relevant maps in the Chinese historical atlas and checking the present-day locations of Ming military institutions, he examined the location information for military sites and other military institutions recorded in CBDB, and on that basis considered the continuity of city sites, military defense facilities, and roads across historical periods.


Dr. Yang Li of Capital Normal University gave a report titled "Data exploration of Christian missionaries in China based on CBDB". First, regarding the information on persons in the database, she argued that the main purpose of CBDB is not to serve as a biographical dictionary but to capture relationships among people. Information about any single individual may be sparse. At the group level, however, it is the little-known missionaries who make it possible to reconstruct a social network closer to the historical situation of the time. Researchers can thus discover problems that run against intuition and common understanding, and genuinely return to the historical scene instead of repeating the same biographical details. Second, because missionary data have only begun to accumulate in CBDB, she surveyed the relevant data platforms in the field of Christianity in China and grouped them into three categories: digital-library-style databases, narrative-oriented databases, and multiple-relation databases. Third, she explored applications of the data in data analysis and geographic information visualization.


Dr. Likun Kang of Peking University gave a report titled "Data of 'Three Li' in Jing Yi Kao". Jing Yi Kao is a specialized catalogue of Chinese classical literature. It records 3 volumes of imperial commentary and classics, and 27 categories and 297 volumes of classics literature. Its text style is standardized and well suited to data mining. Most of the persons with biographical information in the "Three Rites" part of Jing Yi Kao are little-known scholars, so extracting data from this limited corpus usefully complements the biographical data in CBDB. Beyond complementing CBDB, the data analysis revealed two interesting phenomena. First, Zhu Yizun's most frequently cited sources are Huang Yuji, "Min Shu", and Lu Yuanfu. Huang Yuji and Lu Yuanfu were both colleagues and friends of Zhu Yizun in the Hanlin Academy, so the data are consistent with the historical record. Second, after the data were imported into QGIS for analysis, the ancestral homes of the scholars with biographical information in the "Three Rites" part of Jing Yi Kao turned out to be concentrated in Yongjia. Zhu Yizun took refuge in Yongjia in the first year of Kangxi, had great affection for the place, and wrote more than 60 poems about it. This pattern may reflect a more general preference in how Zhu Yizun selected works and biographical information.


Professor Peter K. Bol of Harvard University; Mr. Hongsu Wang, CBDB Project Manager; Mr. Kwok-leong Tang, Fellow of the Fairbank Center for Chinese Studies; and visiting scholars at Harvard University, including Professor Ma Min, Professor He Zhaohui, Professor Liu Lingbo, and Professor Ye Hua, also participated in the discussions.


