In my study on the recorders of the Zhuzi yulei 朱子語類 (Conversations of Master Chu, Arranged Topically), I use MARKUS to identify personal names, place names, and official titles in the text.
Then I extract the recorders’ main geographical addresses and the associated geographical coordinates by linking the person ids provided in MARKUS to the China Biographical Database (CBDB).
Putting these addresses on a map, we can observe the geographical distribution of the recorders.
Figure 1 Geographical Distribution of the Recorders (by Mao Yuan-heng)
We can also retrieve all of Zhu Xi’s disciples from CBDB and compare their distribution with the previous map.
Figure 2 Geographical Distribution of Zhu Xi’s Disciples (by Mao Yuan-heng)
Some items in the Zhuzi yulei have more than one recorder -- probably a result of transcription or because multiple disciples heard the same conversation from Zhu Xi. We can use this as a criterion to group the recorders. If we suppose that those who recorded the same item share a tie and group them together, we can obtain the following social network graph.
We can simplify the graph by only counting those who recorded the same item more than five times:
Thus in terms of recording, we can separate the recorders into groups based on those who recorded the same items.
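The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item identifiers and recorder names are placeholders, not data from the Zhuzi yulei, and the helper names `co_recording_ties` and `strong_ties` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Placeholder data (an assumption): each item maps to the recorders named for it.
items = {}
for i in range(1, 7):
    items[f"item-{i:03d}"] = ["recorder_a", "recorder_b"]
items["item-007"] = ["recorder_a", "recorder_b", "recorder_c"]
items["item-008"] = ["recorder_b", "recorder_c"]
items["item-009"] = ["recorder_d"]


def co_recording_ties(items):
    """Count, for every pair of recorders, how many items they both recorded."""
    ties = Counter()
    for recorders in items.values():
        # Sorting makes (a, b) and (b, a) the same key; set() drops repeats.
        for pair in combinations(sorted(set(recorders)), 2):
            ties[pair] += 1
    return ties


def strong_ties(ties, more_than=5):
    """Keep only pairs that recorded the same item more than `more_than` times."""
    return {pair: n for pair, n in ties.items() if n > more_than}


ties = co_recording_ties(items)
simplified = strong_ties(ties)
```

Each remaining pair can then be loaded as a weighted edge into a graph tool of one's choice.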
MARKUS also makes it quick and easy to retrieve specific information from a text, such as which place names, personal names, or official titles are mentioned most frequently.
Chu Ping-tzu, Associate Professor, National Tsing Hua University
The Song cultural elite were the first to develop an intense interest in collecting and studying ancient ritual bronze objects from the “Three Dynasties.” Several of the art catalogues and commentaries they compiled, some with illustrations, survive to this day. These antiquarian works provide information about the collectors, the items they collected, and their thoughts about antiquities. Moreover, these authors often referred to their predecessors from one or two generations before them and recorded their interactions with contemporary collectors and connoisseurs. The antiquarian works shed light on not only the Song elite’s study of antiquities but also their social relations with other collectors. With the aid of MARKUS and CBDB, my goal is to reconstruct the collecting circles for each of the antiquarian works, which will serve as a foundation for further analysis.
Here, I am using Jinshi lu 金石錄* by Zhao Mingcheng 趙明誠 (1081–1129) as an example to demonstrate how to extract information using MARKUS, how to process that information, and finally, the problems I encountered. What I show here is part of an ongoing experiment, and I welcome any comments and suggestions.
I. Extracting information
For each of the antiquarian works, I created Excel spreadsheets containing data of the collected items, collectors, and provenance by manually inputting information from earlier research. MARKUS might be useful in double-checking the missing data, particularly for the collectors mentioned in Jinshi lu, because Zhao Mingcheng used formulaic phrases when referring to the collectors: 藏…氏 and 藏…家. With “Keywords helper” in the Keyword function, the lists of collectors’ names were extracted instantaneously, as demonstrated in the following images.
Step 1: Keywords helper “藏…氏”
Step 2: Keywords helper “藏…家”
The results from Steps 1 and 2 are highly relevant, as shown in the two previous screenshots.
Step 3: Keywords helper “舊藏…家”
Zhao Mingcheng occasionally revealed the transmission of antiquities by using the term 舊藏 to record the names of former collectors. Running the Keywords helper again with “舊藏…家” allowed me to identify the earlier collectors among the results of Steps 1 and 2.
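For readers who want to try a comparable search outside MARKUS, the three helper patterns can be approximated with regular expressions. This is a rough sketch and not a description of how the Keywords helper works internally: the "…" is assumed to stand for a short run of characters without punctuation, the sample sentence is invented rather than quoted from Jinshi lu, and the function names are hypothetical.

```python
import re

# Punctuation and whitespace that a collector's name is assumed never to span.
STOP = "，。、；：\\s"


def helper_pattern(prefix, suffix, max_len=6):
    """Build a regex for 'prefix…suffix', capturing the name with its suffix."""
    gap = rf"[^{STOP}]{{1,{max_len}}}?"  # shortest possible run of name characters
    return re.compile(re.escape(prefix) + "(" + gap + re.escape(suffix) + ")")


def find_collectors(text, prefix, suffix):
    """Return every captured name for one helper pattern, in text order."""
    return helper_pattern(prefix, suffix).findall(text)


# Invented sample text (an assumption), using placeholder names 某甲 and 某乙.
sample = "甲器藏方城范氏。乙器舊藏某甲家。丙器藏某乙家。"

step1 = find_collectors(sample, "藏", "氏")    # Step 1: 藏…氏
step2 = find_collectors(sample, "藏", "家")    # Step 2: 藏…家
step3 = find_collectors(sample, "舊藏", "家")  # Step 3: 舊藏…家

# Former collectors are those Step 2 hits that Step 3 also finds.
former = [name for name in step2 if name in step3]
```

Because 藏…家 also matches inside 舊藏…家, the Step 3 results are a subset of the Step 2 results, which is what makes the final comparison possible.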
With Keywords helper, I could identify two missing collectors and thereby make my dataset more complete. MARKUS was extremely effective in extracting collectors’ names from Jinshi lu, as demonstrated above. A problem occurred, however, when I applied the keywords in the lists to mark up the text: the application did not tag terms that already carried an automatic location markup, such as 方城范氏, because MARKUS does not allow multiple tags on a single term.
II. Processing the data
With the names of the collectors and connoisseurs found and extracted, the next step was to find their social relations in CBDB, import the association data to Gephi, and produce a social network graph of Zhao Mingcheng on the basis of the records in Jinshi lu, as shown in the following image. For greater legibility, the nodes with only 1 edge are purposely hidden. The bold, weighted lines are associations extracted from Jinshi lu, whereas the thin lines are social associations collected from CBDB.
Total: Nodes: 388; Edges: 443
After filtering out the nodes with only 1 edge: Nodes: 42 (10.82% Visible); Edges: 97 (21.9% Visible)
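The filtering step and the visibility percentages can be reproduced with a short calculation. The sketch below is an illustration only: it assumes a single pass over the original degrees, which may or may not match the Gephi filter settings used for the figure, and the toy edge list and function names are hypothetical.

```python
from collections import Counter


def hide_low_degree(edges, min_degree=2):
    """Drop nodes with fewer than `min_degree` edges, and edges touching them."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept_nodes = {n for n, d in degree.items() if d >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


def percent_visible(visible, total):
    """Share of the full graph that remains visible, rounded to two decimals."""
    return round(100 * visible / total, 2)


# Toy graph (an assumption): node "d" has a single edge and is hidden.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
nodes, edges = hide_low_degree(toy_edges)

# Totals reported above: 42 of 388 nodes and 97 of 443 edges remain visible.
node_share = percent_visible(42, 388)
edge_share = percent_visible(97, 443)
```

The two shares come out at 10.82 and 21.9, matching the percentages reported above.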
The juxtaposition of the association from Jinshi lu with other types of associations from CBDB allows us to explore and understand how the collectors related to one another in a larger social context. Some nodes are noticeable in the graph. Looking at the centrality values generated by Gephi is also helpful in identifying people of greater influence in the network. For instance, the betweenness centrality that quantifies the bridging capability of the nodes indicates Zhao Mingcheng, Chao Buzhi 晁補之, Liu Qi 劉跂, and Mi Fu 米芾 as the most pronounced communicators in the network. The reason for this is certainly worthy of further investigation.
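To make the notion of betweenness centrality concrete, here is a small pure-Python sketch of Brandes' algorithm for an unweighted, undirected graph. It is not Gephi's implementation, and Gephi may scale its values differently; the node names are placeholders and the function name is hypothetical.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Count, for each node, the shortest paths between other nodes passing through it."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair has been counted from both ends.
    return {n: b / 2 for n, b in score.items()}


# Toy graph (an assumption): a triangle a-b-c with a tail c-d-e.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
centrality = betweenness(toy)
```

In the toy graph, node "c" bridges the triangle and the tail and therefore receives the highest score, which is the sense in which the text speaks of bridging capability.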
The same approach was applied to four other antiquarian works of the Northern Song to generate network graphs for each work. Comparing these network graphs allows for an assessment of their relative scale and complexity, and taking the temporal dimension into consideration further affords a delineation of the long-term development of the collecting circles. Through the visualization of the social networks, I could observe how the circles related to one another and how the shape of the overall picture changed over time. As an historian and art historian working primarily with illustrated books and objects, I have not used the markup function of MARKUS extensively. But MARKUS is a versatile tool, with excellent Keyword functions for extracting information. MARKUS could also be a useful tool for students who want to read texts in classical Chinese. With various built-in dictionaries, one can read and understand the text more efficiently. In conclusion, what is presented here is a work-in-progress project. Comments and suggestions are welcome.
- The editions used in this study are:
Ya-hwei Hsu, Assistant Professor, National Taiwan University
I use MARKUS as part of a project to data-mine the Daoist and Buddhist Canons for materia medica terms. I use MARKUS in concert with a team in Taipei at Dharma Drum Institute for the Liberal Arts (DILA) to mark up texts. Having selected my textual set, I then go through each juan that we have identified as important. I first mark up the drug terms using Keyword Markup. This then draws my eye to the “action” of the text. I then analyse how drug knowledge is being recorded in that particular juan, and mark up a sample passage for important related terms. These can include alternate drug names, place names, illness terms, anatomical terms and many more. By marking up the sample, I establish what I expect the ontology of the drug knowledge in that text to be. I then forward the text to the team members at DILA, who clean up the rest of the automatic tags and follow my lead to completely mark up that juan. Once I have checked their work, we forward it to National Taiwan University for uploading into DocuSky. (For more details on the other parts of the project please see here).
MARKUS is an exceptional tool for its ease of use and speed. We have budgeted that we can mark up roughly 16 juan per month, with one person doing it in Taipei. This means that over the course of a year, we can complete roughly 200 juan. This figure is very useful for project design and planning. Knowing this, we can analyse a set of texts and know roughly how long (and how much) it will take to get them marked up. Currently we are marking up Six Dynasties texts for drug terms and recipes. We can use our results at the end of a year to calculate how much and how long it would take to mark up the entire Buddhist and Daoist canons.
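The planning arithmetic is simple enough to write down. In the sketch below, only the rate of 16 juan per month comes from the text; the corpus sizes passed to `months_needed` are hypothetical placeholders, not figures for any actual canon.

```python
import math

JUAN_PER_MONTH = 16  # rate stated in the text, for one person


def juan_per_year(rate=JUAN_PER_MONTH, annotators=1):
    """Juan marked up in twelve months at a steady rate."""
    return rate * 12 * annotators


def months_needed(total_juan, rate=JUAN_PER_MONTH, annotators=1):
    """Whole months required to mark up `total_juan` juan."""
    return math.ceil(total_juan / (rate * annotators))


yearly = juan_per_year()               # 16 * 12 = 192, i.e. roughly 200
one_person = months_needed(200)        # hypothetical 200-juan text set
two_people = months_needed(1000, annotators=2)  # hypothetical 1000-juan set
```

Multiplying the months by a monthly cost gives the corresponding budget estimate.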
Once the marked-up texts are loaded into DocuSky, we will be able to analyse them based on all kinds of criteria. Where do different communities source their materia medica? What sects use which drugs, and at what time in history? Can we see recipes moving across the corpuses? We will be able to do large-scale analysis in ways that have never been attempted before. We can plot relationship graphs to see how close the drug repertoires are. The exciting part about this is that we are not just trying out how to research materia medica. This method could be applied to any repertoire of terms that one wants to learn about – pantheons of deities, for example, or ritual objects and bodily cultivation practices. Finally, historians doing close readings of texts will no longer have to assume that their text is representative. They can do statistical analyses to establish at scale how representative the texts they use for close-up analysis are. So I feel that this is the beginning of a new digital methodology.
There are a number of directions where MARKUS could fruitfully grow to become even more useful, but it will require investment of time and resources to make these possible. For example, it would be a great benefit if we could draw relationships between the terms in a given paragraph or text. This is important not just for prosopography, but for detailed recipe analysis as well, as drug interaction is a fundamental part of recipe design in Chinese medicine. Right now, I cannot tell if two drug names in the same paragraph are alternate names for the same substance, working in parallel in the recipe, or are antagonistic to each other. We need to refine the tool to make these relationships visible.
Furthermore, it’s very important to be able to mark up textual layers in an easy and user-friendly way. Historical Chinese texts are always layered – whether it be commentary, or different editions that have been combined, or if the text is a composite of different hands that have been spliced together. Marking these layers is critical not only for analysing the contents of the text, but dating them. It would be extremely useful even just to publish digital editions where the different layers have been marked up, allowing others to do digital analysis of the different layers.
MARKUS is very well-suited for individual researchers, allowing one a lot of privacy, and even the ability to store files in the cloud. Allowing people to do this in a collaborative way would make it much easier for teams to work together. At present, it takes a bit of training and sending of texts back and forth by email to bring collaborators onto “the same page.” It would be very helpful if everyone could see the “same page” live onscreen at the same time. Eventually, as our research in Chinese materia medica matures, I would like to work with colleagues and use MARKUS to compare textual corpuses from different languages, such as Sanskrit, Tibetan, Mongolian, and Persian. We could see how close the drug repertoires were across languages, and whether or how recipes travelled. However, MARKUS needs more development to be able to search in those languages, as well as integration with stemming tools or other kinds of features to track terms with spelling variants, or as they undergo grammatical transformation. This would enable MARKUS to have a much broader reach, opening up possibilities for translingual digital research, making it a very powerful tool for the history of science or material culture. Done in concert with teams of philological experts from multiple regions, we could use MARKUS and good translation rules to link primary sources from different languages, and do research on the global history of medicine in an entirely new way.
MARKUS has changed the way I read texts, how I collect and organise them, and how I construct arguments about them. However, I feel it has much more potential to fulfil, and not just in Chinese studies. I look forward to future changes as more and more people recognise the usefulness of this tool. Well done Brent and Hilde.
Michael Stanley-Baker, Postdoctoral Fellow, Max Planck Institute for the History of Science, Dept. III
Citation Ho, Hou Ieong Brent, and Hilde De Weerdt. MARKUS. Text Analysis and Reading Platform. 2014- http://dh.chinese-empires.eu/beta/ Funded by the European Research Council and the Digging into Data Challenge.
I am currently writing an MA Thesis on the "Ten Friends of the North City Wall" (Beiguo Shi You 北郭十友). The image that we have of this group is mainly based on later sources and a few contemporary sources that describe a group of more or less ten young poets who were active during the late Yuan and early Ming in a district near the north walls of Suzhou. We know little of the inner dynamics of this network and how they as a group fit within a larger local network of Suzhou elites during the fourteenth century.
For my research I chose an approach based on the activities that these ten individuals performed together. Examining what activities people joined in, when and where, can provide context for understanding a social relationship. Together the “Ten Friends” composed around 5000 poems spread over 10 different collections. This huge amount of material raises for me the methodological problem of how to identify the poems that help me find these activities. It is at this point that I found MARKUS especially useful. I used it to find the relevant joint activities that I would need for my research and it helped me to decide how to approach these activities.
First, I manually marked all the personal names and locations that appear in the titles of all the poems. I was then able to filter out exactly those poems in which multiple person names appeared together. These poems were often composed at an event that involved multiple individuals or they describe such an event having taken place. These poems also often told me who participated in an event, where, and when the activity was held. My results also allowed me to see at a glance which individuals participated in which activities, and which activities attracted the highest number of people. It also showed me at which places important social activities took place. I complemented my search with a key-word markup of all the other personal names and place names that appear in the poems that I had found. This helped me to find all the remaining poems that can be associated with a certain activity. This way using MARKUS I was able to find roughly twenty joint activities between the “Ten Friends” and others.
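The filtering step can be illustrated with a small sketch. It assumes the tagged titles have already been exported into a simple list of records; the record layout, the identifiers, and the person and place names are all placeholders rather than data from the poems, and the function names are hypothetical.

```python
from collections import Counter

# Placeholder records (an assumption): names and places tagged in each poem title.
poems = [
    {"id": "p1", "persons": ["person_a", "person_b"], "places": ["place_x"]},
    {"id": "p2", "persons": ["person_a"], "places": []},
    {"id": "p3", "persons": ["person_b", "person_c", "person_a"], "places": ["place_x"]},
    {"id": "p4", "persons": [], "places": ["place_y"]},
]


def joint_poems(poems, min_persons=2):
    """Keep poems whose title names at least `min_persons` different people."""
    return [p for p in poems if len(set(p["persons"])) >= min_persons]


def tally(poems, field):
    """Count how often each person or place occurs across the given poems."""
    return Counter(value for p in poems for value in set(p[field]))


joint = joint_poems(poems)
joint_ids = [p["id"] for p in joint]
people = tally(joint, "persons")   # who appears in the most joint poems
places = tally(joint, "places")    # where joint activities cluster
```

The two tallies correspond to the at-a-glance overviews of participants and places described above.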
Then, I used my findings to create a network graph that links individuals (orange) to social activities (blue). The graph visualizes which gatherings, which individuals, and which places were most significant to this social network. Central activities were important in maintaining relationships between the “Ten Friends of the North City Wall” themselves, while peripheral activities had less impact on their internal network but provided them with important opportunities to extend their circle outwards.
Social network graph of the “Ten Friends of the North City Wall” based on shared participation in activities (https://goo.gl/zP5HE4)
By locating the places on the map, and separating them into private (red) and public places (blue), I was able to quickly visualize the geographical spread of the activities. From the map I concluded that activities at privately owned places like rural residences or exile destinations brought them far outside the North City Wall district in Suzhou where they lived together. Even while they were sent to various places, their involvement in activities at more public places like scenic sites, monasteries, and local estates suggests a strong sense of local involvement.
Geographical spread of the social network of the “Ten Friends of the North City Wall” (crop) (https://goo.gl/uVKlr8)
I used MARKUS in the early stages of my research. These early results already suggest that the North City Wall district only played a small part in the social network of the “Ten Friends of the North City Wall”. In all the 5000 poems I couldn’t find a single poem or prose piece that describes a concrete joint activity or experience that took place between ten or so individuals in this part of the city. MARKUS has shown me that to understand what bound this group together through the Yuan-Ming transition one has to look elsewhere. As the graph shows, one has to take into account a complex network of activities involving for each activity a different set of individuals. I have found relationships with a wide variety of individuals: with friends, politicians, famous artists, patrons, landowners, monks, etc. Each of these relationships has its separate activities, places, and memories. As can be seen from the map, the geographical center of this group is not fixed to a single place but shifts around, while it suggests a strong sense of locality.
Overall, the visualizations were a helpful tool throughout the whole research process and allowed me to determine what approach suited each activity best and how to relate each activity to other activities. MARKUS can be a useful tool that can bring us closer to understanding and visualizing the complexities of how social networks developed in time and space during the transitional period of the late Yuan and early Ming.
Levi Voorsmit, MA student, Leiden University
MARKUS is the reason I am pursuing my current research project. Inspired by Franco Moretti’s work, I had an idea about mapping the Chinese novel, but it seemed that it would be incredibly difficult to do it on a large scale using traditional methods. Then I learned about MARKUS, and its ability to identify place names in a digitized Chinese text. I tried it on my material, and found that with some manual correction, MARKUS gives me a good overview of all the places mentioned in a particular novel (or collection of fiction, like Baijia gongan 百家公案, below).
From that point I can decide what to map (using QGIS) and how to interpret it.
Map of places in the court case collection Longtu gongan 龙图公案, ca. 1644.
This is a big change from the way literature scholars usually look at places in the novel. We generally think in terms of types of settings or Bakhtin’s chronotopes, but relatively little attention has been paid to where those settings fall on a map. We don’t usually consider how frequently a place is mentioned in a particular novel, but doing so can be revealing. I intend to continue using MARKUS to investigate the use of space in hundreds of traditional Chinese novels, in order to bring back the vast number of novels that are usually overlooked. Once we articulate meaningful frameworks in which to interpret the results, digital tools like MARKUS have the potential to change the scale at which we do research.
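Counting how often each place is mentioned is straightforward once the corrected place tags have been exported. The sketch below assumes a plain list of (place, chapter) pairs; the place names are placeholders, the layout is not a MARKUS export format, and the function names are hypothetical. Coordinates would still have to be joined in separately before mapping.

```python
import csv
import io
from collections import Counter

# Placeholder mentions (an assumption): one (place, chapter) pair per tagged place name.
mentions = [
    ("place_a", 1), ("place_a", 1), ("place_c", 2),
    ("place_b", 3), ("place_a", 4), ("place_c", 4),
]


def place_frequencies(mentions):
    """Count how many times each place is mentioned in the whole text."""
    return Counter(place for place, _chapter in mentions)


def to_csv(freqs):
    """Serialise the counts, most frequent first, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["place", "mentions"])
    for place, n in freqs.most_common():
        writer.writerow([place, n])
    return buf.getvalue()


freqs = place_frequencies(mentions)
table = to_csv(freqs)
```

A table of this kind can be loaded into a GIS package as a delimited text file once latitude and longitude columns have been added.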
Margaret Wan, Associate Professor, University of Utah
Professor Wan is currently working on a publication on this work. Stay tuned for an update.
The China Biographical Database Project is a freely accessible relational database with biographical information about approximately 370,000 people, primarily from the 7th through 19th centuries. In our efforts to populate it with biographical data, we are using MARKUS for tagging historical Chinese texts and identifying information about historical figures that we can put into our database. From 2015 onwards, we have tagged persons’ names and government office titles in epitaphs (muzhiming 墓志銘) from Tang China (618-907 AD) as well as the Compilation of Song Regulations 宋會要輯稿, which is a collection of official documents from the 10th to the 13th centuries.
In our work on Tang epitaphs, we employ regular expressions to batch tag persons’ names and office titles mentioned in such epitaphs. Once we have adjusted the regular expressions so that errors caused by irregularities in the text fall to a workable level, we load the tagged texts into MARKUS. Our helpers then use it to check whether the persons’ names and office titles are tagged accurately. Each of our helpers is assigned a number of epitaphs. The batch uploading function in MARKUS enables us to load all the epitaphs that we need to process for the project. After the checks are done, the helpers can also batch download the data instead of downloading the files one by one. We then use a tailored Python program to extract all the persons’ names in an epitaph and create temporary IDs for those persons. Our helpers then use MARKUS to create connections between those names and temporary IDs (see Figure 1). With this we obtain the names and office titles for those historical figures, which we will include in CBDB.
Figure 1, creating the relationships between office titles and person names
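The batch tagging and temporary-ID steps might look roughly like the following. This is a simplified sketch, not the project's actual program: it assumes the names and titles are already known as term lists, uses invented inline tags (`personName`, `officeTitle`) that are not claimed to be the MARKUS format, and works on an invented sentence with the placeholder names 某甲 and 某乙.

```python
import re


def build_tagger(terms, tag):
    """Return a function that wraps every occurrence of a term in <tag>…</tag>."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return lambda text: pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


def assign_temp_ids(tagged_text, epitaph_id):
    """Give each distinct tagged person a temporary ID, in order of appearance."""
    ids = {}
    for name in re.findall(r"<personName>(.*?)</personName>", tagged_text):
        if name not in ids:
            ids[name] = f"{epitaph_id}-P{len(ids) + 1:03d}"
    return ids


# Invented sample sentence and term lists (assumptions).
text = "某甲，官刺史。某乙，官長史。某甲卒。"
tag_persons = build_tagger(["某甲", "某乙"], "personName")
tag_titles = build_tagger(["刺史", "長史"], "officeTitle")

tagged = tag_titles(tag_persons(text))
temp_ids = assign_temp_ids(tagged, "E001")
```

A person mentioned twice keeps a single temporary ID, so later links between names and office titles attach to one record per person.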
In our work on the Compilation of Song Regulations, we are implementing a similar solution. But there are two main differences. First, the texts from Compilation of Song Regulations that we use are XML files drawn from an online tagging analysis platform developed by the National Taiwan University's Compilation of Song Regulations System. The tags used by their system are quite different from the ones on MARKUS. Thus we developed a Python program to convert those tags to adhere to MARKUS standards (see Figure 2). Secondly, since the biographical data that the Compilation of Song Regulations contains are much more numerous than in Tang epitaphs, we do not ask our helpers to link persons’ names to office titles manually. Instead we designed an artificial intelligence (AI) system to determine the probabilities of the relationships between persons’ names and office titles after we finish our training data. In this AI system, we have created some patterns to define the relationship between office titles and persons’ names within a paragraph in the Song Regulations text. For example, if there is an office title at the beginning of a paragraph, and if a list of persons’ names follows it, it probably means that the office title should be assigned to each of the persons’ names after it. We have already found that this is the general pattern for many paragraphs in the text. By identifying this pattern, we associate historical persons with the office titles in such government records. This training data will be sent to the team developing MARKUS for further improving its machine learning function.
Figure 2, National Taiwan University's Compilation of Song Regulations System tag format
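The leading-title pattern mentioned above can be written as a simple rule. The sketch below covers only that one deterministic pattern, not the probabilistic system itself; it assumes each paragraph has already been reduced to a sequence of (tag type, text) pairs, and the tag names, sample tokens, and function name are hypothetical.

```python
def assign_leading_title(tokens):
    """If a paragraph opens with an office title followed by names, pair them up.

    `tokens` is the paragraph's tagged terms in reading order, as
    (kind, text) pairs. Returns a list of (person, title) pairs.
    """
    if not tokens or tokens[0][0] != "officeTitle":
        return []
    title = tokens[0][1]
    pairs = []
    for kind, text in tokens[1:]:
        if kind != "personName":
            break  # the run of names directly after the title has ended
        pairs.append((text, title))
    return pairs


# Placeholder paragraphs (assumptions).
listing = [("officeTitle", "title_1"), ("personName", "person_a"),
           ("personName", "person_b"), ("officeTitle", "title_2"),
           ("personName", "person_c")]
narrative = [("personName", "person_a"), ("officeTitle", "title_1")]

listing_pairs = assign_leading_title(listing)
narrative_pairs = assign_leading_title(narrative)
```

Paragraphs that do not open with an office title return no pairs and would be left to other patterns or to manual checking.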
Hongsu Wang, Project Manager, The China Biographical Database, Harvard University
Lik Hang Tsui, Postdoctoral Fellow, The China Biographical Database, Harvard University