changes to readme

Saketh
2020-04-14 12:18:11 +05:30
parent 32f799dde0
commit 8a3dcb8d26
2 changed files with 8 additions and 6 deletions

.gitignore

@@ -1,3 +1,4 @@
api_config.py
multilingual_bert/
models_saved/
full_data/

README.md

@@ -23,11 +23,12 @@ Make sure to use **Python3** when running the scripts. The package requirements
------------------------------------------
1. **mBERT Baseline**
- 1. Set the `language` you wish to train on in the `params` dictionary of `BERT_training_inference.py`.
- 2. Load the datasets using the data_loader function as shown in `BERT_training_inference.py`, with the parameter `files` specifying the dataset directory and `csv_file` set to `*_full.csv` to load the untranslated dataset.
- 3. Load the required pretrained BERT model using the parameters `path_files` and `which_bert`.
- 4. Set the `how_train` parameter in `BERT_training_inference.py` to `baseline`, and set the parameters `sample_ratio`, `take_ratio`, and `samp_strategy` depending on the experiment setting.
- 5. Call the train_model function. It trains the BERT model on the given dataset for the specified number of epochs. Use the parameter `to_save` to save the model at the epoch with the best validation scores.
+ 1. Download the multilingual BERT model and tokenizer from the [transformers repository](https://github.com/huggingface/transformers/) and store them in the folder `BERT Classifier/multilingual_bert`.
+ 2. Set the `language` you wish to train on in the `params` dictionary of `BERT_training_inference.py`.
+ 3. Load the datasets using the data_loader function as shown in `BERT_training_inference.py`, with the parameter `files` specifying the dataset directory and `csv_file` set to `*_full.csv` to load the untranslated dataset.
+ 4. Load the required pretrained BERT model using the parameters `path_files` and `which_bert`.
+ 5. Set the `how_train` parameter in `BERT_training_inference.py` to `baseline`, and set the parameters `sample_ratio`, `take_ratio`, and `samp_strategy` depending on the experiment setting.
+ 6. Call the train_model function. It trains the BERT model on the given dataset for the specified number of epochs. Use the parameter `to_save` to save the model at the epoch with the best validation scores (see the sketch after these steps).
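To make the baseline steps concrete, here is a minimal sketch of what the `params` dictionary in `BERT_training_inference.py` could look like for this setting. Only the parameter names and the `baseline` value for `how_train` come from the steps above; the dictionary layout, the paths, and every other value are illustrative assumptions rather than the repository's actual configuration.

```python
# Hypothetical baseline configuration for BERT_training_inference.py.
# Parameter names are taken from the steps above; all values and paths
# are assumptions for illustration only.
params = {
    'language': 'German',                # target language to train on (assumed value)
    'files': 'full_data/',               # dataset directory passed to data_loader (assumed path)
    'csv_file': '*_full.csv',            # load the untranslated dataset
    'path_files': 'multilingual_bert/',  # folder with the downloaded mBERT model (assumed relative path)
    'which_bert': 'bert-base-multilingual-cased',  # assumed model identifier
    'how_train': 'baseline',             # baseline experiment setting
    'sample_ratio': 100,                 # experiment-dependent (assumed)
    'take_ratio': False,                 # experiment-dependent (assumed)
    'samp_strategy': 'stratified',       # experiment-dependent (assumed)
    'epochs': 5,                         # number of training epochs (assumed)
    'to_save': True,                     # keep the model from the best-validation epoch
}
```

With the dictionary filled in, the script's data_loader and train_model functions are invoked as described in steps 3 and 6.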
2. **mBERT All_but_one**
1. Similar to the instructions above, set the required parameters for the target language, the BERT model to be used, and the sample ratio of the target dataset (see the sketch below).
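A plausible variation on the baseline sketch for this setting is shown below; again, only the parameter names come from the step above, and all values are assumptions.

```python
# Hypothetical all-but-one setup, reusing the baseline `params` sketch above.
# Only the parameters named in the step are shown; the values are assumptions.
params['language'] = 'German'                # target language of the experiment (assumed value)
params['path_files'] = 'multilingual_bert/'  # BERT model to be used (assumed path)
params['sample_ratio'] = 10                  # sample ratio of the target dataset (assumed value)
# `how_train` must also be switched away from 'baseline'; the exact value the
# script expects for the all-but-one setting is not shown on this page.
```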
@@ -38,7 +39,7 @@ Make sure to use **Python3** when running the scripts. The package requirements
2. Set the `csv_file` parameter to `*_translated.csv`. The data_loader function will then load the CSV files containing the texts translated to English (see the snippet below).
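Continuing the hypothetical sketch from the baseline section, only the `csv_file` entry changes for the translation-based setting:

```python
# Hypothetical override for the translation-based setting: keep the rest of
# the baseline sketch and point csv_file at the English-translated data.
params['csv_file'] = '*_translated.csv'  # data_loader will now pick up the translated CSVs
```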
4. **CNN+GRU Baseline**
- 1. Download the MUSE embeddings from the [MUSE github repository](https://github.com/facebookresearch/MUSE)
+ 1. Download the MUSE embeddings from the [MUSE github repository](https://github.com/facebookresearch/MUSE) and store them in the folder `CNN_GRU/muse_embeddings`
2. The files for the CNN-GRU model are located in the `CNN_GRU` folder. The main file is called `CNN_GRU.py`.
3. In the params dictionary in `CNN_GRU.py`, set the values of parameters like `language`, `epochs`, `sample_ratio`, etc., depending on the experimental setup (a minimal sketch follows).
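A minimal sketch of how the params dictionary in `CNN_GRU.py` might be filled in; the parameter names `language`, `epochs`, and `sample_ratio` come from the step above, while the embeddings key and all values are assumptions.

```python
# Hypothetical params dictionary for CNN_GRU.py. The names language, epochs and
# sample_ratio come from the step above; embed_path and all values are
# illustrative assumptions.
params = {
    'language': 'German',                      # target language (assumed value)
    'epochs': 10,                              # number of training epochs (assumed)
    'sample_ratio': 100,                       # experiment-dependent (assumed)
    'embed_path': 'CNN_GRU/muse_embeddings/',  # hypothetical key pointing at the stored MUSE embeddings
}
```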