jamorasep.main
Module Contents
Classes
Attributes
- class jamorasep.main.Kanamap(kanamap_csv: str = None)
- load_kanamap(kanamap_csv: str = None)
- __call__(kana)
- get_2letter_morae()
- lst_katakana()
- header()
- class jamorasep.main.Morasep(kanamap_csv: str = None)
- check_if_successive_2chars_compose_mora(i, j)
Check whether two successive characters form a single mora. If they do, return that mora. If not, return a list of morae that depends on the relationship between the two characters.
- More specifically, assume that the input is a katakana string such as
[BOS]アイウキャカァィゥエ[EOS]
- Then this function evaluates the successive 2 characters
[BOS]ア, アイ, イウ, ウキ, キャ, ャカ, カァ, ァィ, ィゥ, ゥエ, エ[EOS]
- and returns the list of morae as follows
[BOS]ア: -> [BOS] (RULE 4)
アイ: -> ア (RULE 4)
イウ: -> イ (RULE 4)
ウキ: -> ウ (RULE 4)
キャ: -> キャ (RULE 0)
ャカ: -> [] (RULE 2)
カァ: -> カ, ア (RULE 1)
ァィ: -> イ (RULE 3)
ィゥ: -> ウ (RULE 3)
ゥエ: -> [] (RULE 2)
エ[EOS]: -> エ (RULE 4)
The [BOS] token should eventually be removed by the caller.
- Each rule is described in the following.
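RULE 0: If i + j matches a pattern listed in the dictionary, such as 「キャ」 or 「シュ」, the pair is returned as one mora.
RULE 1: If the pair does not match any dictionary pattern and does not form a single mora, as in 「カァ」, it is converted to a form such as 「カア」 and returned.
RULE 2: If the first character is small and the next character is full-size, as in 「ァカ」, nothing is returned at this step.
RULE 3: If the first character is small and the next character is also small, as in 「ァァ」, only the second character is returned, converted to 「ア」.
RULE 4: For any other pattern, only the first character is returned.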
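The five rules above can be sketched in a few lines of standalone Python. This is an illustration only, not the library's implementation: `TWO_LETTER_MORAE` and `SMALL_TO_LARGE` are hypothetical stand-ins for the data the library reads from its kana map, and the function names are invented for this example.

```python
# Hypothetical stand-ins for the dictionary data (assumptions, not the real kanamap).
TWO_LETTER_MORAE = {"キャ", "キュ", "キョ", "シャ", "シュ", "ショ"}
SMALL_TO_LARGE = {
    "ァ": "ア", "ィ": "イ", "ゥ": "ウ", "ェ": "エ", "ォ": "オ",
    "ャ": "ヤ", "ュ": "ユ", "ョ": "ヨ",
}


def pair_to_morae(first, second):
    """Apply RULE 0 to RULE 4 to one pair of successive characters."""
    if first + second in TWO_LETTER_MORAE:
        return [first + second]                  # RULE 0: dictionary match
    first_small = first in SMALL_TO_LARGE
    second_small = second in SMALL_TO_LARGE
    if not first_small and second_small:
        return [first, SMALL_TO_LARGE[second]]   # RULE 1: e.g. カァ -> カ, ア
    if first_small and not second_small:
        return []                                # RULE 2: already emitted earlier
    if first_small and second_small:
        return [SMALL_TO_LARGE[second]]          # RULE 3: e.g. ァィ -> イ
    return [first]                               # RULE 4: emit the first character


def split_katakana(text):
    """Walk over successive pairs, then drop the leading [BOS] token."""
    chars = ["[BOS]"] + list(text) + ["[EOS]"]
    morae = []
    for first, second in zip(chars, chars[1:]):
        morae.extend(pair_to_morae(first, second))
    return morae[1:]  # the first emitted item is always [BOS]


print(split_katakana("アイウキャカァィゥエ"))
```

Running the sketch on the example string reproduces the pair-by-pair results listed above, with the [BOS] token removed at the end as the caller is expected to do.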
- kana2mora(txt: str) -> List[str]
Convert a string of Japanese text (hiragana or katakana) into a list of morae. (A mora is a timing unit of Japanese, similar to but not the same as a syllable.) Symbols and characters other than hiragana and katakana are split character by character and returned unmodified. For example,
"あいうえお・きゃきゅきょ・一二三<tag>"
- will be converted into
["あ", "い", "う", "え", "お", "・", "きゃ", "きゅ", "きょ", "・", "一", "二", "三", "<", "t", "a", "g", ">"]
- Parameters:
txt (str) – A string of Japanese text.
- Returns:
a list of morae.
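The behaviour described for mixed input can be approximated with a greedy two-character lookahead, shown here as a minimal sketch. It is not how the library segments text (the library evaluates successive pairs with the rules of check_if_successive_2chars_compose_mora); `TWO_LETTER_MORAE` is a hypothetical, deliberately tiny table and `split_mixed_text` is a name invented for this example.

```python
# Hypothetical subset of two-letter morae (assumption; the real table is larger).
TWO_LETTER_MORAE = {"きゃ", "きゅ", "きょ", "キャ", "キュ", "キョ"}


def split_mixed_text(text):
    """Split text into morae; anything not in the table falls back to one character."""
    morae = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in TWO_LETTER_MORAE:
            morae.append(pair)      # a known two-letter mora stays together
            i += 2
        else:
            morae.append(text[i])   # kana, symbols, kanji, ASCII: one item each
            i += 1
    return morae


print(split_mixed_text("あいうえお・きゃきゅきょ・一二三<tag>"))
```

With this table, the sketch yields the same list as the documented example: symbols, kanji, and the characters of `<tag>` each become their own item.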
- modify_special_mora(morae: List[str]) -> List[str]
Modify the special mora Q (ッ). If Q is the last mora, it is replaced with a space. If the next mora starts with a vowel, Q is also replaced with a space. If the next mora starts with certain consonants, Q is replaced with the initial consonant of that mora.
- Parameters:
morae (List[str]) – A list of morae.
- Returns:
a list of morae.
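The Q handling can be illustrated with a small standalone function. The sketch assumes that the morae are already romanized and that Q appears as the literal string "Q". The documentation does not say which consonants trigger doubling, so `ASSUMED_CONSONANTS` below is a guess made for this example, and `resolve_sokuon` is a hypothetical name.

```python
# Assumed set of doubling consonants (not confirmed by the documentation).
ASSUMED_CONSONANTS = set("kstcpgzdbhfjmnrwy")


def resolve_sokuon(morae, placeholder="Q"):
    """Replace each Q with a space or with the next mora's initial consonant."""
    resolved = []
    for index, mora in enumerate(morae):
        if mora != placeholder:
            resolved.append(mora)
            continue
        # Empty string when Q is the last mora.
        following = morae[index + 1] if index + 1 < len(morae) else ""
        initial = following[:1]
        if initial in ASSUMED_CONSONANTS:
            resolved.append(initial)   # e.g. ka, Q, ta -> ka, t, ta
        else:
            resolved.append(" ")       # final Q, or Q before a vowel or symbol
    return resolved


print(resolve_sokuon(["ka", "Q", "ta"]))
print(resolve_sokuon(["a", "Q"]))
```

In this sketch, anything that is not in the assumed consonant set, including vowels, punctuation, and the end of the list, results in a space.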
- convert_lst_of_mora(lst: List[str], output_format='katakana', phoneme=False) -> List[str]
Convert a list of morae into katakana, hiragana, or another output format.
- Parameters:
lst (list) – A list of morae.
output_format (str) – The output format of the morae. Defaults to “katakana”. Options are [“katakana”, “hiragana”], and any of the columns in the kanamap.csv file, including [“kunrei”, “hepburn”, “simple-ipa”].
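A column-based conversion of this kind can be sketched with the standard csv module. The CSV text below is a hypothetical four-row sample; its layout and values are assumptions made for illustration and do not reproduce the real kanamap.csv. Passing unknown morae through unchanged is also an assumption of this sketch.

```python
import csv
import io

# Hypothetical sample table (assumed layout, not the real kanamap.csv).
SAMPLE_CSV = """katakana,hiragana,kunrei,hepburn
シ,し,si,shi
シャ,しゃ,sya,sha
カ,か,ka,ka
ン,ん,n,n
"""


def load_table(csv_text):
    """Index the rows by their katakana form."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["katakana"]: row for row in reader}


def convert_morae(morae, table, output_format="katakana"):
    """Look up each mora in the requested column; unknown morae pass through."""
    return [table[m][output_format] if m in table else m for m in morae]


table = load_table(SAMPLE_CSV)
print(convert_morae(["シャ", "カ", "ン"], table, output_format="hepburn"))
print(convert_morae(["シ", "カ"], table, output_format="hiragana"))
```

Keying the table by katakana means that each output format is just another column lookup, which is consistent with the description of output_format accepting any column name.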
- parse(txt: str = '', **kwargs) -> List[str]
Converts a string of katakana into a list of morae.
- Parameters:
txt (str) – A string of katakana.
output_format (str) – The output format of the morae. Defaults to None. If None, the output is a list of morae. Options are [“katakana”, “hiragana”], and any of the columns in the kanamap.csv file, including [“kunrei”, “hepburn”, “simple-ipa”].
phoneme (bool) – If True, the output is a list of phonemes. Defaults to False. This flag is only valid when output_format is either of [“kunrei”, “hepburn”, “simple-ipa”].
- Returns:
A list of morae.
- Return type:
list
- jamorasep.main.parse