Miyuki Kondo, A Method for Studying Waka from the Dynastic Period (Chapter 12)

Sampling of Word Forms and Compound Words Based on N-gram Statistics (Analysis of Japanese Language in the Heian Period)

Miyuki Kondo

(A Method for Studying Waka from the Dynastic Period, Kasama-shoin, 2015, Chapter 12)

1. Introduction

There are many perspectives on how to establish criteria for identifying an individual word in classical or modern Japanese. Extensive research has been conducted on this question, but it has produced no consensus on the units that should be used to identify a word. For example, several studies conducted by the National Language Research Institute have proposed two types of units, alpha units and beta units [1], a fact that itself reveals the difficulty of recognizing words in Japanese. One extreme example is the case of compound words. To date, there have been four approaches to defining and analyzing compound words: 1) morphological approaches [2]; 2) phonological approaches [3]; 3) statistical approaches [4]; and 4) semantic approaches [5]. As there have also been numerous comparisons with other languages [6], it may appear as if this topic has been discussed from every possible angle. However, some problems have not necessarily been resolved, such as the vague methods by which compound words are identified as items in indices and dictionaries [7] and the cases in which the boundaries between compound words and words, customary expressions, and texts are unclear [8]. One reason for this difficulty is the problem of reconciling theory with reality, but it is probably also related to the fact that identification has been based primarily on researchers' introspective judgments. In more than a few instances, discrepancies in the selection of compound words as items in indices and dictionaries can be described as reflections of differences in the editors' subjective opinions and linguistic theories. Of course, one cannot doubt the importance of introspection in research on grammar, vocabulary, and diction.
However, when classical Japanese (rather than modern Japanese) is the subject of attempts to sample and analyze the forms and semantic features of compound words, introspection alone is not enough: a more reliable grasp of the linguistic phenomena is also required. Therefore, this study proposes a method of sampling and analyzing compound words by applying statistical processing techniques to a corpus of literary texts from the Heian Period. In addition, this study demonstrates one aspect of the knowledge gained from this approach.
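
As a rough illustration of what N-gram statistics over a text corpus involve, the following Python sketch counts contiguous character N-grams and lists those that recur. It is not the procedure used in this chapter, which is not described in this excerpt. The function names, the `min_count` threshold, and the three toy strings are all assumptions introduced here for illustration; the strings are not taken from the Heian corpus.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous character n-gram in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(lines, n):
    """Count character n-grams over a list of strings.

    N-grams are taken within each string only, so no n-gram
    spans the boundary between two strings.
    """
    counts = Counter()
    for line in lines:
        counts.update(char_ngrams(line, n))
    return counts


def frequent_ngrams(lines, n, min_count=2):
    """Return (n-gram, count) pairs seen at least min_count times.

    Pairs are sorted by descending count, then by the n-gram itself.
    min_count is a hypothetical cut-off, not a value from the source.
    """
    counts = ngram_counts(lines, n)
    kept = [(gram, c) for gram, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Toy strings invented for this sketch, not corpus data.
    toy_lines = ["はるがすみたつ", "はるがすみたなびく", "あきがすみ"]
    for gram, count in frequent_ngrams(toy_lines, 3):
        print(gram, count)
```

Character strings that recur across many texts are only candidates for word forms or compound words. Raw frequency cannot decide word boundaries by itself, so the recurring strings would still have to be examined against the texts.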

...... (skip)
