Instructions for BERT models
- mBERT Baseline
  - Download the multilingual BERT model and tokenizer from the transformers repository and store them in the folder `BERT Classifier/multilingual_bert`.
  - Set the `language` you wish to train on in the `params` dictionary of `BERT_training_inference.py`.
  - Load the datasets into the model using the `data_loader` function as shown in `BERT_training_inference.py`, using the parameter `files` to specify the dataset directory and setting `csv_file` to `*_full.csv` in order to load the untranslated dataset.
  - Load the required pretrained BERT model using the parameters `path_files` and `which_bert`.
  - Set the `how_train` parameter in `BERT_training_inference.py` to `baseline`, and set the parameters `sample_ratio`, `take_ratio`, and `samp_strategy` depending on the experiment setting.
  - Call the `train_model` function; it trains the BERT model on the given dataset for the specified number of epochs. Use the parameter `to_save` to save the model at the epoch with the best validation scores. A configuration sketch follows this list.
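A minimal configuration sketch for the baseline setting, assuming the parameter names listed above are keys of the `params` dictionary in `BERT_training_inference.py`. The concrete values and the `data_loader`/`train_model` call sequence are illustrative assumptions; check the script for the actual signatures.

```python
# Sketch of an mBERT baseline configuration (keys mirror the parameters
# listed above; the values are illustrative assumptions).
params = {
    'language':      'German',                        # language to train on
    'files':         'full_data',                     # dataset directory (assumed path)
    'csv_file':      '*_full.csv',                    # untranslated dataset
    'path_files':    'multilingual_bert',             # folder with the downloaded model/tokenizer
    'which_bert':    'bert-base-multilingual-cased',  # assumed mBERT variant
    'how_train':     'baseline',                      # monolingual baseline setting
    'sample_ratio':  100,                             # assumed: use the full target dataset
    'take_ratio':    False,                           # assumed experiment setting
    'samp_strategy': 'stratified',                    # assumed sampling strategy
    'epochs':        5,                               # assumed number of training epochs
    'to_save':       True,                            # save the best-validation checkpoint
}

# Hypothetical call sequence; the real functions are defined in
# BERT_training_inference.py and may take different arguments:
# train_data, val_data = data_loader(params)
# train_model(params, train_data, val_data)
```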
- mBERT All_but_one
  - Similar to the instructions above, set the required parameters for the target language, the BERT model to be used, and the sample ratio of the target dataset.
  - Set the `how_train` parameter to `all_but_one`. The `data_loader` function will now load the datasets of all other languages fully, and the dataset of the target language in the given sample ratio, as in the sketch below.
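A sketch of the keys that change for this setting, relative to the baseline configuration above; the values are illustrative assumptions.

```python
# Overrides for the all_but_one setting, relative to the baseline sketch above
# (illustrative values; all other keys stay unchanged).
all_but_one_overrides = {
    'language':     'German',        # target language, loaded only at the given sample ratio
    'how_train':    'all_but_one',   # data_loader loads every other language in full
    'sample_ratio': 16,              # assumed sample ratio for the target-language data
}

# Hypothetical usage:
# params.update(all_but_one_overrides)
```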
- Translation + BERT Baseline
  - Set the language and other parameters as in the mBERT baseline case.
  - Set the `csv_file` parameter to `*_translated.csv`. The `data_loader` function will now load the csv files containing the texts translated to English; see the sketch below.
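A sketch of the keys that change for the translation baseline, again relative to the baseline configuration above. Swapping in an English-only model via `which_bert` is an assumption here, not something stated in the list.

```python
# Overrides for the Translation + BERT baseline, relative to the baseline sketch
# (illustrative values; all other keys stay as in the mBERT baseline).
translation_overrides = {
    'csv_file':   '*_translated.csv',   # texts machine-translated to English
    'which_bert': 'bert-base-uncased',  # assumed: an English BERT, since inputs are now English
}

# Hypothetical usage:
# params.update(translation_overrides)
```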