Translation¶
Translations¶
-
class onmt.translate.Translation(src, src_raw, pred_sents, attn, pred_scores, tgt_sent, gold_score)[source]¶
Bases: object
Container for a translated sentence.
- Variables
src (LongTensor) – Source word IDs.
src_raw (List[str]) – Raw source words.
pred_sents (List[List[str]]) – Words from the n-best translations.
pred_scores (List[List[float]]) – Log-probs of n-best translations.
attns (List[FloatTensor]) – Attention distribution for each translation.
gold_sent (List[str]) – Words from gold translation.
gold_score (List[float]) – Log-prob of gold translation.
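For orientation, here is a minimal sketch (not part of the API itself) of consuming a Translation after decoding; trans stands for a hypothetical instance, and the attribute names follow the variables listed above:

    def show_best(trans, n_best=1):
        # trans is a hypothetical onmt.translate.Translation instance.
        print("SRC :", " ".join(trans.src_raw))
        for rank in range(n_best):
            words = trans.pred_sents[rank]   # tokens of the rank-th hypothesis
            score = trans.pred_scores[rank]  # its log-probability
            print("PRED {}: {} ({})".format(rank + 1, " ".join(words), score))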
Translator Class¶
-
class onmt.translate.Translator(model, fields, src_reader, tgt_reader, gpu=-1, n_best=1, min_length=0, max_length=100, ratio=0.0, beam_size=30, random_sampling_topk=1, random_sampling_temp=1, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, phrase_table='', data_type='text', verbose=False, report_bleu=False, report_rouge=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_score=True, logger=None, seed=-1)[source]¶
Bases: object
Translate a batch of sentences with a saved model.
- Parameters
model (onmt.modules.NMTModel) – NMT model to use for translation.
fields (dict[str, torchtext.data.Field]) – A dict mapping each side to its list of name-Field pairs.
src_reader (onmt.inputters.DataReaderBase) – Source reader.
tgt_reader (onmt.inputters.TextDataReader) – Target reader.
gpu (int) – GPU device. Set to negative for no GPU.
n_best (int) – How many beams to wait for.
min_length (int) – See onmt.translate.decode_strategy.DecodeStrategy.
max_length (int) – See onmt.translate.decode_strategy.DecodeStrategy.
beam_size (int) – Number of beams.
random_sampling_topk (int) – See onmt.translate.random_sampling.RandomSampling.
random_sampling_temp (int) – See onmt.translate.random_sampling.RandomSampling.
stepwise_penalty (bool) – Whether the coverage penalty is applied at every step.
dump_beam (bool) – Debugging option.
block_ngram_repeat (int) – See onmt.translate.decode_strategy.DecodeStrategy.
ignore_when_blocking (set or frozenset) – See onmt.translate.decode_strategy.DecodeStrategy.
replace_unk (bool) – Replace unknown tokens.
data_type (str) – Source data type.
verbose (bool) – Print/log every translation.
report_bleu (bool) – Print/log BLEU metric.
report_rouge (bool) – Print/log ROUGE metric.
report_time (bool) – Print/log total time/frequency.
copy_attn (bool) – Use copy attention.
global_scorer (onmt.translate.GNMTGlobalScorer) – Translation scoring/reranking object.
out_file (TextIO or codecs.StreamReaderWriter) – Output file.
report_score (bool) – Whether to report scores.
logger (logging.Logger or NoneType) – Logger.
-
classmethod from_opt(model, fields, opt, model_opt, global_scorer=None, out_file=None, report_score=True, logger=None)[source]¶
Alternate constructor.
- Parameters
model (onmt.modules.NMTModel) – See __init__().
fields (dict[str, torchtext.data.Field]) – See __init__().
opt (argparse.Namespace) – Command line options.
model_opt (argparse.Namespace) – Command line options saved with the model checkpoint.
global_scorer (onmt.translate.GNMTGlobalScorer) – See __init__().
out_file (TextIO or codecs.StreamReaderWriter) – See __init__().
report_score (bool) – See __init__().
logger (logging.Logger or NoneType) – See __init__().
-
translate(src, tgt=None, src_dir=None, batch_size=None, attn_debug=False, phrase_table='')[source]¶
Translate the contents of src and get gold scores from tgt.
- Parameters
src – See self.src_reader.read().
tgt – See self.tgt_reader.read().
src_dir – See self.src_reader.read() (only relevant for certain types of data).
batch_size (int) – Size of examples per mini-batch.
attn_debug (bool) – Enables attention logging.
- Returns
(list, list) –
- all_scores is a list of batch_size lists of n_best scores
- all_predictions is a list of batch_size lists of n_best predictions
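For illustration, a hedged end-to-end sketch (not from this page): it assumes opt is an argparse.Namespace carrying the usual translate options, and that onmt.model_builder.load_test_model restores (fields, model, model_opt) from the checkpoint, as in OpenNMT-py's own translate script:

    import onmt.model_builder
    from onmt.translate import Translator, GNMTGlobalScorer

    # `opt` is assumed to hold the usual translate options (model path,
    # beam_size, n_best, ...), parsed elsewhere.
    fields, model, model_opt = onmt.model_builder.load_test_model(opt)

    scorer = GNMTGlobalScorer(alpha=0.0, beta=0.0,
                              length_penalty="none", coverage_penalty="none")
    translator = Translator.from_opt(model, fields, opt, model_opt,
                                     global_scorer=scorer)

    # src follows self.src_reader.read(); for text data an iterable of
    # (byte) strings works. Passing tgt would additionally yield gold scores.
    scores, predictions = translator.translate(src=[b"Hello world ."],
                                               batch_size=1)
    print(predictions[0][0])  # best hypothesis for the first example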
-
class onmt.translate.TranslationBuilder(data, fields, n_best=1, replace_unk=False, has_tgt=False, phrase_table='')[source]¶
Bases: object
Build a word-based translation from the batch output of translator and the underlying dictionaries.
Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15].
- Parameters
data (onmt.inputters.Dataset) – Data.
fields (List[Tuple[str, torchtext.data.Field]]) – Data fields.
n_best (int) – Number of translations produced.
replace_unk (bool) – Replace unknown words using attention.
has_tgt (bool) – Whether the batch will have gold targets.
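A hedged usage sketch: OpenNMT-py's builder exposes a from_batch() method (not shown in this excerpt) that turns one batch of translator output into Translation objects; data, fields, and trans_batch are assumed to come from the translator's internal loop.

    from onmt.translate import TranslationBuilder

    builder = TranslationBuilder(data, fields, n_best=1,
                                 replace_unk=True, has_tgt=False)
    translations = builder.from_batch(trans_batch)  # per-example Translations
    for trans in translations:
        print(" ".join(trans.pred_sents[0]))        # best hypothesis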
Decoding Strategies¶
-
class onmt.translate.DecodeStrategy(pad, bos, eos, batch_size, device, parallel_paths, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length)[source]¶
Bases: object
Base class for generation strategies.
- Parameters
pad (int) – Magic integer in output vocab.
bos (int) – Magic integer in output vocab.
eos (int) – Magic integer in output vocab.
batch_size (int) – Current batch size.
device (torch.device or str) – Device for memory bank (encoder).
parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.
min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.
max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).
block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.
exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.
return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.
- Variables
pad (int) – See above.
bos (int) – See above.
eos (int) – See above.
predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.
scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.
attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) of shape (step, inp_seq_len), where inp_seq_len is the length of the sample (not the max length of all input sequences).
alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().
is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.
alive_attn (FloatTensor or NoneType) – If tensor, shape is (step, B x parallel_paths, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.
min_length (int) – See above.
max_length (int) – See above.
block_ngram_repeat (int) – See above.
exclusion_tokens (set[int]) – See above.
return_attention (bool) – See above.
done (bool) – Whether generation has finished for all paths.
-
advance(log_probs, attn)[source]¶
DecodeStrategy subclasses should override advance().
Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.
-
update_finished()[source]¶
DecodeStrategy subclasses should override update_finished().
update_finished is used to update self.predictions, self.scores, and other “output” attributes.
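The two hooks above are driven by a simple loop. The following is an illustrative sketch (not the library's exact loop) of how a translator drives any DecodeStrategy subclass; decode_step is a hypothetical callable that returns (log_probs, attn) for the current alive sequences.

    def decoding_loop(strategy, decode_step):
        """Illustrative driver for a DecodeStrategy; decode_step is hypothetical."""
        for _ in range(strategy.max_length):
            # Score the next token for every alive path.
            log_probs, attn = decode_step(strategy.alive_seq)
            strategy.advance(log_probs, attn)  # grows alive_seq, flags is_finished
            if strategy.is_finished.any():
                strategy.update_finished()     # moves finished paths to outputs
            if strategy.done:
                break
        return strategy.predictions, strategy.scores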
-
class onmt.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, n_best, mb_device, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, memory_lengths, stepwise_penalty, ratio)[source]¶
Bases: onmt.translate.decode_strategy.DecodeStrategy
Generation beam search.
Note that the attributes list is not exhaustive. Rather, it highlights tensors to document their shape. (Since the state variables’ “batch” size decreases as beams finish, we denote this axis with a B rather than batch_size.)
- Parameters
beam_size (int) – Number of beams to use (see base parallel_paths).
batch_size (int) – See base.
pad (int) – See base.
bos (int) – See base.
eos (int) – See base.
n_best (int) – Don’t stop until at least this many beams have reached EOS.
mb_device (torch.device or str) – See base device.
global_scorer (onmt.translate.GNMTGlobalScorer) – Scorer instance.
min_length (int) – See base.
max_length (int) – See base.
return_attention (bool) – See base.
block_ngram_repeat (int) – See base.
exclusion_tokens (set[int]) – See base.
memory_lengths (LongTensor) – Lengths of encodings. Used for masking attentions.
- Variables
top_beam_finished (ByteTensor) – Shape (B,).
_batch_offset (LongTensor) – Shape (B,).
_beam_offset (LongTensor) – Shape (batch_size x beam_size,).
alive_seq (LongTensor) – See base.
topk_log_probs (FloatTensor) – Shape (B x beam_size,). These are the scores used for the topk operation.
select_indices (LongTensor or NoneType) – Shape (B x beam_size,). This is just a flat view of the _batch_index.
topk_scores (FloatTensor) – Shape (B, beam_size). These are the scores a sequence will receive if it finishes.
topk_ids (LongTensor) – Shape (B, beam_size). These are the word indices of the topk predictions.
_batch_index (LongTensor) – Shape (B, beam_size).
_prev_penalty (FloatTensor or NoneType) – Shape (B, beam_size). Initialized to None.
_coverage (FloatTensor or NoneType) – Shape (1, B x beam_size, inp_seq_len).
hypotheses (list[list[Tuple[Tensor]]]) – Contains a tuple of score (float), sequence (long), and attention (float or None).
-
advance(log_probs, attn)[source]¶
DecodeStrategy subclasses should override advance().
Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.
-
update_finished()[source]¶
DecodeStrategy subclasses should override update_finished().
update_finished is used to update self.predictions, self.scores, and other “output” attributes.
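For illustration, a minimal instantiation sketch following the signature documented above; the special-token ids and lengths are hypothetical, and the GNMTGlobalScorer option strings are assumptions (see Scoring below):

    import torch
    from onmt.translate import BeamSearch, GNMTGlobalScorer

    # Hypothetical special-token ids and sizes, purely to show the wiring.
    scorer = GNMTGlobalScorer(alpha=0.0, beta=0.0,
                              length_penalty="none", coverage_penalty="none")
    beam = BeamSearch(
        beam_size=5, batch_size=2, pad=1, bos=2, eos=3, n_best=2,
        mb_device=torch.device("cpu"), global_scorer=scorer,
        min_length=0, max_length=50, return_attention=False,
        block_ngram_repeat=0, exclusion_tokens=set(),
        memory_lengths=torch.tensor([7, 9]),  # source lengths, for masking
        stepwise_penalty=False, ratio=0.0)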
-
onmt.translate.random_sampling.sample_with_temperature(logits, sampling_temp, keep_topk)[source]¶
Select next tokens randomly from the top k possible next tokens.
Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.
- Parameters
logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equal the logits when they are already log-probs of a normalized distribution.)
sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
- Returns
topk_ids – Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
topk_scores – Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].
- Return type
(LongTensor, FloatTensor)
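A minimal PyTorch re-implementation of the same idea, for reference; this is a sketch of the technique, not the library's exact code:

    import torch

    def sample_with_temperature_sketch(logits, sampling_temp, keep_topk):
        """Sketch of temperature + top-k sampling; not the library's exact code."""
        logits = logits / sampling_temp
        if keep_topk > 0:
            # Keep only the top-k logits per row; everything else gets -inf,
            # i.e. probability 0 under the categorical distribution.
            kth_best = logits.topk(keep_topk, dim=-1).values[..., -1, None]
            logits = logits.masked_fill(logits < kth_best, float("-inf"))
        dist = torch.distributions.Categorical(logits=logits)
        topk_ids = dist.sample().unsqueeze(-1)    # (batch_size, 1)
        topk_scores = logits.gather(1, topk_ids)  # (batch_size, 1)
        return topk_ids, topk_scores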
-
class onmt.translate.RandomSampling(pad, bos, eos, batch_size, device, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, memory_length)[source]¶
Bases: onmt.translate.decode_strategy.DecodeStrategy
Select next tokens randomly from the top k possible next tokens.
The scores attribute’s lists hold the score, after applying temperature, of the final prediction (either EOS or the final token if max_length is reached).
- Parameters
pad (int) – See base.
bos (int) – See base.
eos (int) – See base.
batch_size (int) – See base.
device (torch.device or str) – See base device.
min_length (int) – See base.
max_length (int) – See base.
block_ngram_repeat (int) – See base.
exclusion_tokens (set[int]) – See base.
return_attention (bool) – See base.
sampling_temp (float) – See sample_with_temperature().
keep_topk (int) – See sample_with_temperature().
memory_length (LongTensor) – Lengths of encodings. Used for masking attention.
-
advance(log_probs, attn)[source]¶
Select next tokens randomly from the top k possible next tokens.
- Parameters
log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equal the logits when they are already log-probs of a normalized distribution.)
attn (FloatTensor) – Shaped (1, B, inp_seq_len).
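A minimal instantiation sketch mirroring the signature above; the special-token ids and lengths are hypothetical. Unlike BeamSearch, this strategy decodes a single path per batch item:

    import torch
    from onmt.translate import RandomSampling

    sampler = RandomSampling(
        pad=1, bos=2, eos=3, batch_size=4, device=torch.device("cpu"),
        min_length=0, block_ngram_repeat=0, exclusion_tokens=set(),
        return_attention=False, max_length=50,
        sampling_temp=0.8, keep_topk=10,
        memory_length=torch.tensor([7, 9, 5, 8]))  # source lengths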
Scoring¶
-
class onmt.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]¶
Bases: object
Returns the length and coverage penalty functions for beam search.
- Parameters
length_pen (str) – Option name of the length penalty.
cov_pen (str) – Option name of the coverage penalty.
- Variables
has_cov_pen (bool) – Whether a coverage penalty function is set (if it is None, applying it is a no-op). Note that the converse isn’t true: setting beta to 0 also forces the coverage penalty to be a no-op.
has_len_pen (bool) – Whether a length penalty function is set (if it is None, applying it is a no-op). Note that the converse isn’t true: setting alpha to 0 also forces the length penalty to be a no-op.
coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
length_penalty (callable[[int, float], float]) – Calculates the length penalty.
-
coverage_wu(cov, beta=0.0)[source]¶
GNMT coverage re-ranking score. See “Google’s Neural Machine Translation System” [WSC+16].
cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
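The Wu et al. penalties have simple closed forms. The sketch below restates them (assuming the usual convention that beam scores are divided by the length penalty and reduced by the coverage penalty); it is illustrative rather than the library's exact code:

    import torch

    def length_penalty_wu(cur_len, alpha=0.0):
        # lp(Y) = ((5 + |Y|) / 6) ** alpha; alpha = 0 makes this 1.0 (a no-op,
        # since beam scores are divided by the length penalty).
        return ((5.0 + cur_len) / 6.0) ** alpha

    def coverage_penalty_wu(cov, beta=0.0):
        # cp = -beta * sum_j log(min(cov_j, 1.0)); cov accumulates attention
        # mass per source position, shaped (*, seq_len) as described above.
        # The sign convention (a positive penalty to subtract) is an
        # assumption matching the beta = 0 no-op noted earlier.
        return -beta * torch.min(cov, torch.ones_like(cov)).log().sum(-1)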
-
class onmt.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]¶
Bases: object
NMT re-ranking.
- Parameters
alpha (float) – Length parameter.
beta (float) – Coverage parameter.
length_penalty (str) – Length penalty strategy.
coverage_penalty (str) – Coverage penalty strategy.
- Variables
alpha (float) – See above.
beta (float) – See above.
length_penalty (callable) – See penalties.PenaltyBuilder.
coverage_penalty (callable) – See penalties.PenaltyBuilder.
has_cov_pen (bool) – See penalties.PenaltyBuilder.
has_len_pen (bool) – See penalties.PenaltyBuilder.
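A short usage sketch; the strategy names ("wu", "none") are assumptions about the option strings resolved through penalties.PenaltyBuilder:

    from onmt.translate import GNMTGlobalScorer

    scorer = GNMTGlobalScorer(alpha=0.6, beta=0.2,
                              length_penalty="wu", coverage_penalty="wu")
    # Pass as global_scorer= to Translator or BeamSearch to re-rank beams.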