meguru_tokenizer.process.noise_pytorch module

Noising utilities for batched, tokenized text in PyTorch.

Origin: https://github.com/shentianxiao/text-autoencoders

class meguru_tokenizer.process.noise_pytorch.Noiser(vocab: meguru_tokenizer.vocab.BaseVocab)[source]

Bases: object

Applies noising operations to a padded batch of token ids.

Note

x is a torch.Tensor of shape [|S|, B], where |S| is the padded sequence length and B is the batch size.
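
A minimal usage sketch. The vocab construction is omitted because it depends on which meguru_tokenizer vocab class is used; everything else follows the signatures documented below.

   import torch
   from meguru_tokenizer.process.noise_pytorch import Noiser

   # `vocab` is assumed to be an already-built meguru_tokenizer.vocab.BaseVocab
   # subclass; its construction depends on the tokenizer in use and is omitted.
   noiser = Noiser(vocab)

   # padded batch of token ids, shape [|S|, B] = [sequence length, batch size]
   x = torch.randint(5, 1000, (20, 8))

   # apply drop, blanking, substitution and a local shuffle in one call
   x_noisy = noiser.noisy(x, drop_prob=0.1, blank_prob=0.1, sub_prob=0.1, shuffle_dist=3)
   assert x_noisy.shape == x.shape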

noisy(x, drop_prob, blank_prob, sub_prob, shuffle_dist)[source]
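
Applies all four noising operations to x. Judging from the upstream text-autoencoders code credited above, this call most likely chains the individual methods and skips any whose strength is zero; the following is a sketch of that composition under that assumption, not the verbatim implementation.

   def noisy_sketch(noiser, x, drop_prob, blank_prob, sub_prob, shuffle_dist):
       # assumed order of operations; the library may differ
       if shuffle_dist > 0:
           x = noiser.word_shuffle(x, shuffle_dist)   # local reordering
       if drop_prob > 0:
           x = noiser.word_drop(x, drop_prob)         # delete tokens, re-pad
       if blank_prob > 0:
           x = noiser.word_blank(x, blank_prob)       # overwrite with a blank symbol
       if sub_prob > 0:
           x = noiser.word_substitute(x, sub_prob)    # overwrite with random tokens
       return x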
word_blank(x, p)[source]
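
Presumably replaces each token with the vocabulary's blank/unknown symbol with probability p, leaving special tokens untouched. A stand-alone sketch of that behaviour; the special ids are passed in explicitly here because the real method reads them from the vocab.

   import torch

   def word_blank_sketch(x, p, blank_id, protected_ids=(0,)):
       # positions holding padding or other special tokens are never blanked
       protected = torch.zeros_like(x, dtype=torch.bool)
       for i in protected_ids:
           protected |= x == i
       blank = (torch.rand(x.shape, device=x.device) < p) & ~protected
       return x.masked_fill(blank, blank_id)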
word_drop(x, p)[source]
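
Presumably deletes each token with probability p and re-pads the shortened sequences, so the output keeps the [|S|, B] shape. A column-by-column sketch (an actual implementation would typically also spare a sentence-start symbol):

   import torch

   def word_drop_sketch(x, p, pad_id=0):
       seq_len, batch = x.shape
       out = torch.full_like(x, pad_id)
       for b in range(batch):
           col = x[:, b]
           keep = (torch.rand(seq_len, device=x.device) >= p) & (col != pad_id)
           kept = col[keep]                 # tokens that survive the drop
           out[: kept.numel(), b] = kept    # re-pad to the original length |S|
       return out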
word_shuffle(x, k)[source]
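
Presumably a local shuffle in which each token moves at most k positions from its original index, with padding kept at the end of every sequence. A sketch of the usual add-noise-and-sort trick used for this kind of noise:

   import torch

   def word_shuffle_sketch(x, k, pad_id=0):
       seq_len, batch = x.shape
       base = torch.arange(seq_len, dtype=torch.float).unsqueeze(1).expand(seq_len, batch)
       offset = (k + 1) * torch.rand(seq_len, batch)
       offset = offset.masked_fill(x == pad_id, float(k + 1))  # keep padding last
       perm = (base + offset).argsort(dim=0)                   # per-column permutation
       return x.gather(0, perm)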
word_substitute(x, p)[source]
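
Presumably replaces each token with a random token drawn from the vocabulary with probability p, sparing padding. The id range of "ordinary" tokens used below is an assumption; in the real class it would come from the vocab.

   import torch

   def word_substitute_sketch(x, p, vocab_size, first_regular_id=4, pad_id=0):
       # random replacement ids drawn from the assumed regular-token range
       random_ids = torch.randint(first_regular_id, vocab_size, x.shape)
       substitute = (torch.rand(x.shape) < p) & (x != pad_id)
       return torch.where(substitute, random_ids, x)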