Textual Analysis

Web-based Tools and Corpora:

Voyant Tools

The Japanese version of the web-based software Voyant Tools. It tokenizes Japanese text, eliminating the need to insert whitespace before pasting or uploading text.

KH Coder

Open-source software for quantitative content analysis and text mining. It supports Japanese, English, and numerous other languages. An English reference manual is available.

Center for Open Data in the Humanities (CODH)

Since 2017, the CODH has focused on opening up new possibilities in Japanese digital humanities and provides access to multiple open datasets, such as Pre-Modern Japanese Text (from the National Institute of Japanese Literature), Dataset of Edo Cooking Recipes, Bukan Complete Collection (biographical and geospatial data related to daimyo and shogunate officials), and Dataset of Modern Magazines.

Center for Corpus Development, NINJAL

Various web-based tools and corpora are available through the National Institute for Japanese Language and Linguistics (NINJAL). Notable corpora include Shonagon (Corpus of Contemporary Written Japanese), Chunagon (Corpus of Contemporary Written and Spoken Japanese; with free registration), and the Oxford-NINJAL Corpus of Old Japanese.

JAPANESE.GR.JP (JGJ)

A text analysis project for Japanese linguistic and literary classics, including software, results, and data sets. There is a special focus on waka poetry.

Kokalog

A system that enables full-text searches of Diet proceedings and displays a timeline showing when the search terms were used most frequently. Enclosing each search term in double quotes ("X") is advised.

Japanese Text Initiative

A collaborative effort between the University of Virginia Library Electronic Text Center and the University of Pittsburgh East Asian Library to make texts of classical Japanese literature available on the internet.

HathiTrust Research Center

Supports large-scale computational analysis of the works in the HathiTrust Digital Library. Support for Japanese, however, is not well documented.

Japan Digital Research Center

Reischauer Institute of Japanese Studies
