Train a custom model
A model provides translations for a specific language pair and is the outcome of a successful training. To train a custom model, three mutually exclusive document types are required: training, tuning, and testing. If you provide only training data when you queue a training, Custom Translator automatically assembles the tuning and testing data. It uses a random subset of sentences from your training documents and excludes those sentences from the training data itself. A minimum of 10,000 parallel training sentences is required to train a full model.
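The automatic hold-out described above can be pictured with a minimal sketch. This is not Custom Translator's actual implementation: the function name `split_parallel_corpus`, the hold-out sizes of 500, and the fixed seed are all assumptions chosen for illustration. It only shows the idea of drawing random tuning and testing subsets and removing them from the training data.

```python
import random

def split_parallel_corpus(pairs, tune_size, test_size, seed=0):
    """Hold out random tuning and testing subsets; the rest stays as training data.

    Hypothetical helper: the sizes and seed are illustrative assumptions,
    not values documented for Custom Translator.
    """
    if tune_size + test_size >= len(pairs):
        raise ValueError("held-out sets must be smaller than the corpus")
    rng = random.Random(seed)
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    tune_idx = indices[:tune_size]
    test_idx = indices[tune_size:tune_size + test_size]
    train_idx = indices[tune_size + test_size:]
    # Sort each index list so every subset keeps the original sentence order.
    pick = lambda idx: [pairs[i] for i in sorted(idx)]
    return pick(train_idx), pick(tune_idx), pick(test_idx)

# Synthetic parallel sentences standing in for uploaded training documents.
pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(12_000)]
train, tune, test = split_parallel_corpus(pairs, tune_size=500, test_size=500)
print(len(train), len(tune), len(test))  # 11000 500 500
```

Because the three subsets are built from disjoint index ranges, no sentence pair appears in more than one of them, which mirrors the "mutually exclusive" requirement.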
Create model
Select the Train model blade.
Type the Model name.
Keep the default Full training selected or select Dictionary-only training.
Note
Full training displays all uploaded document types. Dictionary-only displays dictionary documents only.
Under Select documents, select the documents you want to use to train the model, for example, sample-English-German, and review the training cost associated with the selected number of sentences.
Select Train now.
Select Train to confirm.
Note
Notifications displays the model training in progress, for example, the Submitting data state. Training a model takes a few hours, depending on the number of selected sentences.
When to select dictionary-only training
For better results, we recommend letting the system learn from your training data. However, when you don't have enough parallel sentences to meet the 10,000-sentence minimum, or when sentences and compound nouns must be rendered as-is, use dictionary-only training. Dictionary-only training typically completes much faster than full training. The resulting model uses the baseline model for translation along with the dictionaries you've added. You won't see BLEU scores or get a test report.
Note
Custom Translator doesn't sentence-align dictionary files. Therefore, your dictionary documents must contain an equal number of source and target phrases or sentences, and they must be precisely aligned. If they aren't, the document upload fails.
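Before uploading, a quick local check can catch the most common alignment problem, a differing entry count. The sketch below is a hypothetical pre-upload helper, not the validation the service itself performs. It assumes each dictionary entry occupies one line and that the source and target sides are available as two lists of lines.

```python
def check_dictionary_alignment(source_lines, target_lines):
    """Return a list of problems; an empty list means the two sides line up.

    Hypothetical local check, assuming one dictionary entry per line.
    """
    problems = []
    if len(source_lines) != len(target_lines):
        problems.append(
            f"line count mismatch: {len(source_lines)} source vs {len(target_lines)} target"
        )
    # Flag rows where exactly one side is blank, a common sign of a shifted file.
    for number, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1):
        if bool(src.strip()) != bool(tgt.strip()):
            problems.append(f"line {number}: one side is empty")
    return problems

source = ["data center", "solid-state drive", "firewall"]
target = ["Rechenzentrum", "Solid-State-Laufwerk"]
print(check_dictionary_alignment(source, target))
# ['line count mismatch: 3 source vs 2 target']
```

A check like this can't tell whether entries that line up by position are actually correct translations of each other; that still needs a manual review.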
Model details
After successful model training, select the Model details blade.
Select the Model Name to review the training date/time, the total training time, the number of sentences used for training, tuning, testing, and dictionary, and whether the system generated the test and tuning sets. You'll use the Category ID to make translation requests.
Evaluate the model BLEU score. Review the test set: the BLEU score is the custom model's score, and the Baseline BLEU is the score of the pre-trained baseline model used for customization. A higher BLEU score means higher translation quality using the custom model.
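As a rough illustration of how the Category ID is used, the sketch below builds (but doesn't send) a request to the Translator v3 REST API with the `category` query parameter. The endpoint, header names, and body shape come from the public Translator v3 reference rather than from this article, so verify them against that reference. The function name `build_translate_request` and all placeholder values are assumptions, and the global endpoint may not apply to every resource.

```python
import json
import urllib.parse
import urllib.request

# Global Translator endpoint (assumption: your resource may use a different one).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, source_lang, target_lang, category_id, key, region):
    """Build a POST request that routes translation through a custom model."""
    query = urllib.parse.urlencode({
        "api-version": "3.0",
        "from": source_lang,
        "to": target_lang,
        "category": category_id,  # Category ID copied from Model details
    })
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )

# Placeholder values; replace them with your own before sending.
request = build_translate_request(
    "Hello", "en", "de", "YOUR_CATEGORY_ID", "YOUR_KEY", "YOUR_REGION"
)
print(request.full_url)
# To send the request: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the URL and confirm that the Category ID is attached before any billable call is made.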
Duplicate model
Select the Model details blade.
Hover over the model name and check the selection button.
Select Duplicate.
Fill in New model name.
Keep Train immediately checked if no further data will be selected or uploaded; otherwise, check Save as draft.
Select Save.
Note
If you save the model as Draft, Model details is updated with the model name in Draft status.
To add more documents, select the model name and follow the steps in the Create model section above.