Download the CBDBRegexMachine

By Elif Yamangil

This Regular Expression Machine is a graphical user interface (GUI) built with the Java Swing library. It allows users to design patterns for biographical texts graphically, match those patterns against the data, and see the results immediately through a user-friendly color-coding scheme.

Its workspace consists of three main components: (1) a view that displays the textual data currently in use; (2) a list of active regular expressions, whose matches in the data are shown by color-coding; and (3) a list of shortcut regular expression parts that can be used as building blocks in the active regular expressions of (2).

CBDB Regex Machine
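The matching workflow described above (active expressions, shortcut building blocks, color-coded results) can be sketched in plain Java. This is not the tool's own code: the `{NAME}` placeholder syntax for shortcuts, the class and method names, and the sample text are hypothetical assumptions made for illustration. Each active expression has its shortcuts expanded, is compiled with `java.util.regex`, and every match is recorded together with the color assigned to that expression (the `record` requires Java 16 or later).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHighlightSketch {

    /** A matched character range, the color of the expression that matched it, and the matched text. */
    record Highlight(int start, int end, String color, String text) {}

    /**
     * Replaces each hypothetical {NAME} placeholder with its shortcut pattern.
     * The shortcut is wrapped in a non-capturing group so a quantifier placed
     * after the placeholder applies to the whole shortcut.
     */
    static String expandShortcuts(String expression, Map<String, String> shortcuts) {
        String result = expression;
        for (Map.Entry<String, String> e : shortcuts.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", "(?:" + e.getValue() + ")");
        }
        return result;
    }

    /** Matches every active expression (key) against the text and tags each match with its color (value). */
    static List<Highlight> highlight(String text, Map<String, String> activeExpressions,
                                     Map<String, String> shortcuts) {
        List<Highlight> highlights = new ArrayList<>();
        for (Map.Entry<String, String> e : activeExpressions.entrySet()) {
            Pattern pattern = Pattern.compile(expandShortcuts(e.getKey(), shortcuts));
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                highlights.add(new Highlight(m.start(), m.end(), e.getValue(), m.group()));
            }
        }
        return highlights;
    }

    public static void main(String[] args) {
        // Hypothetical shortcut: a run of Chinese numerals.
        Map<String, String> shortcuts = new LinkedHashMap<>();
        shortcuts.put("NUM", "[一二三四五六七八九十]+");

        // Active expressions mapped to display colors, kept in insertion order.
        Map<String, String> active = new LinkedHashMap<>();
        active.put("{NUM}年", "red");
        active.put("進士", "blue");

        // Made-up sample sentence for illustration.
        String text = "淳熙二年登進士第";
        for (Highlight h : highlight(text, active, shortcuts)) {
            System.out.println(h);
        }
    }
}
```

On the sample sentence this finds `二年` (from the expanded `{NUM}年` expression, tagged red) and `進士` (tagged blue). A GUI would then paint these character ranges in a Swing text component; that display step is left out of the sketch.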

Additional facilities built into the product are: (1) XML export: at the click of a button, users can create XML files that flatten the current workspace (the data and the regular expressions matched against it). This facilitates interfacing with other programs, such as Microsoft Excel and Access, for database input. (2) Save/load: users can easily save and load the workspace state, which includes the list of regular expressions and shortcuts and their color settings. (3) A handy list of pre-made regular expression examples: numerous date patterns can be added instantly to any regular expression through the GUI menus.
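The idea of flattening matches into XML can be illustrated with the JDK's standard DOM and `javax.xml.transform` APIs. This is a sketch under assumptions, not the tool's actual export format: the element names (`workspace`, `record`, `text`, `match`), the one-record-per-line layout, and the class name are hypothetical. The sample pattern is a made-up, simplified stand-in for the kind of date pattern the menus provide: a run of Chinese numerals followed by 年.

```java
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlExportSketch {

    /** Flattens each input line into a hypothetical <record> holding the line and its <match> children. */
    static String export(String[] lines, String regex) {
        try {
            Pattern pattern = Pattern.compile(regex);
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("workspace");
            root.setAttribute("regex", regex);
            doc.appendChild(root);

            for (int i = 0; i < lines.length; i++) {
                Element record = doc.createElement("record");
                record.setAttribute("line", Integer.toString(i + 1)); // 1-based line number
                Element source = doc.createElement("text");
                source.setTextContent(lines[i]);
                record.appendChild(source);

                // One <match> element per regex match, with its start offset in the line.
                Matcher m = pattern.matcher(lines[i]);
                while (m.find()) {
                    Element match = doc.createElement("match");
                    match.setAttribute("start", Integer.toString(m.start()));
                    match.setTextContent(m.group());
                    record.appendChild(match);
                }
                root.appendChild(record);
            }

            // Serialize the DOM tree to an indented XML string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML export failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration.
        String[] lines = {"淳熙二年登進士第", "嘉定十年卒"};
        System.out.println(export(lines, "[一二三四五六七八九十]+年"));
    }
}
```

Because each match becomes its own element with the source line kept alongside it, a spreadsheet or database importer can treat every `<match>` as a row without re-running the regular expression.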

The purpose of this application is to let users with no prior experience of programming or of Computer Science concepts, such as writing regular expressions, experiment effectively with data-mining Chinese biographical texts, relying only on an intuitive, template-matching understanding of the task.


  • Elif Yamangil (major developer)
  • Hou Ieong Ho (developer)
  • Sophia Huang at Harvard Asia Center (early participant)


Click to download the CBDBRegexMachine package. After unzipping it, please read "docs\Using CBDBRegexMachine.pptx" first to learn how to install the CBDBRegexMachine.


Regular Expressions.ppt

Text Extraction using Regular Expressions:

text extraction regex_materials.zip

text extraction regex_shihpei.ppt