jamorasep.main

Module Contents

Classes

Kanamap

Morasep

Attributes

parse

class jamorasep.main.Kanamap(kanamap_csv: str = None)
load_kanamap(kanamap_csv: str = None)
__call__(kana)
get_2letter_morae()
lst_katakana()
header()
class jamorasep.main.Morasep(kanamap_csv: str = None)
check_if_successive_2chars_compose_mora(i, j)

Check whether two successive characters form a single mora. If they do, return that mora. If they do not, return a list of morae that depends on the relationship between the two characters.

More specifically, assume that the input is a katakana string such as

[BOS]アイウキャカァィゥエ[EOS]

Then this function is evaluated on each pair of successive characters

[BOS]ア, アイ, イウ, ウキ, キャ, ャカ, カァ, ァィ, ィゥ, ゥエ, エ[EOS]

and returns the list of morae as follows

[BOS]ア: -> [BOS] (RULE 4)
アイ: -> ア (RULE 4)
イウ: -> イ (RULE 4)
ウキ: -> ウ (RULE 4)
キャ: -> キャ (RULE 0)
ャカ: -> [] (RULE 2)
カァ: -> カ, ア (RULE 1)
ァィ: -> イ (RULE 3)
ィゥ: -> ウ (RULE 3)
ゥエ: -> [] (RULE 2)
エ[EOS]: -> エ (RULE 4)

The caller is expected to remove the [BOS] token from the final result.

Each rule is described in the following.

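RULE 0: i + j matches a pattern listed in the dictionary, such as 「キャ」 or 「シュ」.
RULE 1: A pair such as 「カァ」, which matches no pattern in the dictionary and does not form a single mora, is converted to 「カア」 (and likewise for similar pairs) and returned.
RULE 2: For a pair such as 「ァカ」, where the first character is a small kana and the next is a full-size kana, nothing is returned at this step.
RULE 3: For a pair such as 「ァァ」, where the first character is a small kana and the next is also a small kana, only the second character is returned, converted to 「ア」.
RULE 4: For any other pattern, only the first character is returned.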
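The five rules above can be sketched as a small standalone function. This is a minimal illustration and not the library's own code: the names `TWO_LETTER_MORAE`, `SMALL_TO_FULL`, `pair_to_morae`, and `scan_pairs` are hypothetical, and the hard-coded set of two-letter morae is an assumed stand-in for the patterns that the library reads from kanamap.csv.

```python
from typing import List

# Assumption: a tiny stand-in for the two-letter morae listed in kanamap.csv.
TWO_LETTER_MORAE = {"キャ", "キュ", "キョ", "シャ", "シュ", "ショ"}

# Small kana and their full-size counterparts (hypothetical helper table).
SMALL_TO_FULL = {
    "ァ": "ア", "ィ": "イ", "ゥ": "ウ", "ェ": "エ", "ォ": "オ",
    "ャ": "ヤ", "ュ": "ユ", "ョ": "ヨ",
}


def pair_to_morae(i: str, j: str) -> List[str]:
    """Apply RULE 0 to RULE 4 to the successive characters i and j."""
    if i + j in TWO_LETTER_MORAE:      # RULE 0: the pair is one mora
        return [i + j]
    if i in SMALL_TO_FULL:
        if j in SMALL_TO_FULL:         # RULE 3: small + small
            return [SMALL_TO_FULL[j]]
        return []                      # RULE 2: small + full-size
    if j in SMALL_TO_FULL:             # RULE 1: full-size + unmatched small
        return [i, SMALL_TO_FULL[j]]
    return [i]                         # RULE 4: everything else


def scan_pairs(text: str) -> List[str]:
    """Run pair_to_morae over every successive pair of a katakana string."""
    tokens = ["[BOS]"] + list(text) + ["[EOS]"]
    morae: List[str] = []
    for i, j in zip(tokens, tokens[1:]):
        morae.extend(pair_to_morae(i, j))
    return morae[1:]  # drop the leading [BOS] token


print(scan_pairs("アイウキャカァィゥエ"))
```

Checking the rules in this order reproduces the table above: RULE 0 takes priority, the two small-kana cases are handled next, and RULE 4 is the fallback.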

kana2mora(txt: str) List[str]

Convert a string of Japanese text (hiragana or katakana) into a list of morae. (A mora is a rhythmic unit of Japanese, similar to but not identical to a syllable.) Symbols and characters other than hiragana/katakana are split character by character and returned unmodified. For example,

"あいうえお・きゃきゅきょ・一二三<tag>"

will be converted into

["あ", "い", "う", "え", "お", "・", "きゃ", "きゅ", "きょ", "・", "一", "二", "三", "<", "t", "a", "g", ">"]

Parameters:

txt (str) – A string of Japanese text.

Returns:

a list of morae.
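The input/output behaviour described above can be imitated with a much simpler greedy pass. This sketch is not how the library segments text (it decides from the patterns in kanamap.csv). It only merges a small ya/yu/yo into a preceding i-row kana, which is enough to reproduce the example. `SMALL_GLIDES`, `I_ROW`, and `split_morae` are hypothetical names.

```python
from typing import List

# Assumption: only the small ya/yu/yo glides are merged in this sketch.
SMALL_GLIDES = set("ゃゅょャュョ")
# Kana that can carry such a glide (i-row kana, voiced variants included).
I_ROW = set("きしちにひみりぎじぢびぴキシチニヒミリギジヂビピ")


def split_morae(txt: str) -> List[str]:
    """Split text into morae; non-kana characters pass through one by one."""
    morae: List[str] = []
    for ch in txt:
        if ch in SMALL_GLIDES and morae and morae[-1] in I_ROW:
            morae[-1] += ch   # attach the glide to the preceding kana
        else:
            morae.append(ch)  # kana, symbols, kanji, and ASCII alike
    return morae


print(split_morae("あいうえお・きゃきゅきょ・一二三<tag>"))
```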

modify_special_mora(morae: List[str]) List[str]

Resolve the special mora Q (ッ). If Q is the last mora, or if the next mora starts with a vowel, Q is replaced with a space. If the next mora starts with certain consonants, Q is replaced with the initial consonant of that mora.

Parameters:

morae (list) – A list of morae.

Returns:

a list of morae.
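A minimal sketch of the replacement logic described above, under several assumptions: the morae are already romanized, Q is represented by the literal string "Q", and the set of consonants that trigger doubling is the hypothetical `DOUBLING_CONSONANTS` below (the docstring only says "some kinds of consonants", so the library's actual set may differ). `resolve_q` is likewise a made-up name.

```python
from typing import List

# Assumption: initial letters treated as doubling consonants in this sketch.
DOUBLING_CONSONANTS = set("kstnhmyrwgzdbpcfj")


def resolve_q(morae: List[str], q: str = "Q") -> List[str]:
    """Replace each Q with a space or with the next mora's first consonant."""
    out: List[str] = []
    for idx, mora in enumerate(morae):
        if mora != q:
            out.append(mora)
            continue
        # Look ahead; an empty string stands for "Q is the last mora".
        nxt = morae[idx + 1] if idx + 1 < len(morae) else ""
        if nxt and nxt[0] in DOUBLING_CONSONANTS:
            out.append(nxt[0])  # double the following consonant
        else:
            out.append(" ")     # final Q, or Q followed by a vowel
    return out


print(resolve_q(["ga", "Q", "ko", "o"]))
```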

convert_lst_of_mora(lst: List[str], output_format='katakana', phoneme=False) List[str]

Convert a list of morae into katakana, hiragana, or another output format.

Parameters:
  • lst (list) – A list of morae.

  • output_format (str) – The output format of the morae. Defaults to "katakana". Options are "katakana", "hiragana", and any of the columns in the kanamap.csv file, including "kunrei", "hepburn", and "simple-ipa".
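Conceptually, the conversion is a per-mora lookup in the column named by output_format. The sketch below uses a tiny hand-written table in place of kanamap.csv. `KANAMAP_SAMPLE` and `convert_morae` are hypothetical, and passing unknown morae through unchanged is an assumption of this sketch, not documented behaviour.

```python
from typing import Dict, List

# Assumption: three sample rows standing in for kanamap.csv, keyed by katakana.
KANAMAP_SAMPLE: Dict[str, Dict[str, str]] = {
    "シ": {"hiragana": "し", "kunrei": "si", "hepburn": "shi"},
    "ツ": {"hiragana": "つ", "kunrei": "tu", "hepburn": "tsu"},
    "キャ": {"hiragana": "きゃ", "kunrei": "kya", "hepburn": "kya"},
}


def convert_morae(lst: List[str], output_format: str = "katakana") -> List[str]:
    """Map each katakana mora to the requested column of the sample table."""
    if output_format == "katakana":
        return list(lst)  # already in the requested format
    # Morae missing from the table are returned as-is (sketch-only choice).
    return [KANAMAP_SAMPLE.get(m, {}).get(output_format, m) for m in lst]


print(convert_morae(["シ", "ツ", "キャ"], output_format="hepburn"))
```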

parse(txt: str = '', **kwargs) List[str]

Converts a string of katakana into a list of morae.

Parameters:
  • txt (str) – A string of katakana.

  • output_format (str) – The output format of the morae. Defaults to None. If None, the output is a list of morae. Options are [“katakana”, “hiragana”], and any of the columns in the kanamap.csv file, including [“kunrei”, “hepburn”, “simple-ipa”].

  • phoneme (bool) – If True, the output is a list of phonemes. Defaults to False. This flag is only valid when output_format is one of "kunrei", "hepburn", or "simple-ipa".

Returns:

A list of morae.

Return type:

list

jamorasep.main.parse