meguru_tokenizer.process.noise_pytorch module
Noising of batched, tokenized text for PyTorch.
Origin: https://github.com/shentianxiao/text-autoencoders
class meguru_tokenizer.process.noise_pytorch.Noiser(vocab: meguru_tokenizer.vocab.BaseVocab)
Bases: object
Applies noise to a padded batch tensor.
Note
x is a torch.Tensor of shape [|S|, B], where |S| is the padded sequence length and B is the batch size.
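To illustrate the [|S|, B] layout described in the note, here is a minimal sketch that builds a padded, sequence-first batch and applies a simple token-blanking noise to it. It does not call Noiser itself: the function word_blank, the constants PAD_ID and BLANK_ID, and the token ids are hypothetical and chosen only for illustration. The real padding index would come from the vocabulary.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0    # assumed padding index (hypothetical; normally taken from the vocab)
BLANK_ID = 1  # assumed id of a "blank" placeholder token (hypothetical)

# Three tokenized sentences of different lengths (token ids are made up).
sentences = [
    torch.tensor([5, 6, 7, 8]),
    torch.tensor([9, 10]),
    torch.tensor([11, 12, 13]),
]

# pad_sequence defaults to batch_first=False, so the result has shape [|S|, B]:
# rows index positions in the sentence, columns index sentences in the batch.
x = pad_sequence(sentences, padding_value=PAD_ID)


def word_blank(x: torch.Tensor, prob: float, blank_id: int, pad_id: int) -> torch.Tensor:
    """Hypothetical noise: replace each non-padding token with blank_id
    with probability prob. Padding positions are left untouched."""
    mask = (torch.rand(x.shape) < prob) & (x != pad_id)
    return x.masked_fill(mask, blank_id)


noisy = word_blank(x, prob=0.3, blank_id=BLANK_ID, pad_id=PAD_ID)
print(x.shape)  # torch.Size([4, 3])
```

Because the tensor is sequence-first, x[:, i] selects the i-th sentence and x[t] selects the t-th token of every sentence in the batch.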