update READMEs

This commit is contained in:
Samet Demir
2019-12-02 00:26:06 +03:00
parent 9be31672a3
commit fca34746a6
9 changed files with 48 additions and 1179 deletions


@@ -1,50 +1,51 @@
# Neural Academic Paper Generation
inzva AI Projects #2 - Fake Academic Paper Generation Project
## Abstract
In this work, we tackle structured text generation, specifically academic paper generation in LaTeX, inspired by the surprisingly good results of basic character-level language models. Our motivation is to apply more recent and advanced methods of language modeling to a more complex dataset of LaTeX source files in order to generate realistic academic papers. Our first contribution is a dataset of LaTeX source files of recent open-access computer vision papers. Our second contribution is experimenting with recent language modeling and text generation methods, such as Transformer and Transformer-XL, to generate consistent LaTeX code. We report cross-entropy and bits-per-character (BPC) results for the trained models, and we discuss interesting aspects of some examples of the generated LaTeX code.
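BPC is simply the per-character cross-entropy expressed in bits instead of nats; a minimal sketch of the conversion used when reporting both metrics:

```python
import math

def nats_to_bpc(cross_entropy_nats):
    """Convert per-character cross-entropy (in nats) to bits-per-character."""
    return cross_entropy_nats / math.log(2)

# a cross-entropy of 1.0 nat/char corresponds to roughly 1.44 BPC
```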
## Project Dependencies
- NumPy
- TexSoup (for dataset preparation)
- BeautifulSoup (for dataset preparation)
- Tensorflow 1.12 (for RNN)
- Tensor2Tensor 1.13.4 (for Transformer)
- PyTorch (for Transformer-XL)
## Dataset
*Note: We decided not to share the dataset because of ethical concerns. However, the code can be used to recreate the dataset.*
### Dataset Preparation
To the best of our knowledge, there was no available dataset compiled from academic papers. Therefore, we decided to prepare a dataset from academic papers on arxiv.org.
All scripts related to the dataset preparation can be found in the **[dataset_generation](dataset_generation)** directory.
#### Steps for the dataset preparation:
##### 1) Select a subset of academic papers on arxiv.org
We selected Computer Vision as the topic of interest for the dataset. Therefore, we crawled arxiv.org to find papers tagged as Computer Vision between 2015 and 2018 (BeautifulSoup is used as the HTML parser).
related scripts:
* **[dataset_generation/crawler.py](dataset_generation/crawler.py)** (crawls arxiv.org as specified and writes the result to **paperlinks.txt**)
* **[dataset_generation/random_paper_sampler.py](dataset_generation/random_paper_sampler.py)** (samples examples from **paperlinks.txt** and writes the result to **selected_papers.txt**)
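The crawl essentially boils down to fetching listing pages and collecting the `/abs/` paper links. A simplified sketch (the link pattern and helper name are illustrative, not the exact logic of **crawler.py**):

```python
from bs4 import BeautifulSoup

def extract_paper_ids(listing_html):
    """Collect arXiv paper ids from the '/abs/' links on a listing page.

    Illustrative only: crawler.py implements the real crawl, including
    paging through the 2015-2018 Computer Vision listings.
    """
    soup = BeautifulSoup(listing_html, "html.parser")
    return [a["href"].rsplit("/", 1)[-1]
            for a in soup.find_all("a", href=True)
            if "/abs/" in a["href"]]
```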
##### 2) Download the source files
We downloaded the source files of the selected papers as tar files and extracted them.
related script: **[dataset_generation/downloader.py](dataset_generation/downloader.py)** (reads the selected papers from **selected_papers.txt**, downloads their source files, and extracts them)
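Conceptually this step is a fetch plus an unpack. A hedged sketch (the e-print URL pattern is an assumption for illustration, and **downloader.py** handles more cases, e.g. single-file sources):

```python
import tarfile
import urllib.request
from pathlib import Path

ARXIV_EPRINT = "https://arxiv.org/e-print/{}"  # assumed URL pattern

def fetch_source(paper_id, out_dir="sources"):
    """Download one paper's source archive (network required)."""
    dest = Path(out_dir) / paper_id
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "source.tar"
    urllib.request.urlretrieve(ARXIV_EPRINT.format(paper_id), archive)
    return extract_source(archive, dest)

def extract_source(archive, dest):
    """Unpack a downloaded tar archive into dest and return the directory."""
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return Path(dest)
```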
##### 3) Find the LaTeX source files and compile each paper into a single LaTeX file
We resolved `\include` and `\input` import statements in the LaTeX sources so that each paper could be compiled into a single LaTeX file, and wrote one such file per paper.
related script: **[dataset_generation/latex_input_resolver.py](dataset_generation/latex_input_resolver.py)** (finds the LaTeX files among the source files, reads their content using TexSoup, finds the root files (files containing a `\documentclass` statement), recursively replaces import statements with the content of the imported files, and writes one LaTeX file per paper)
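The core of the resolver is a recursive splice of imported files into their parent. A simplified regex-based sketch (**latex_input_resolver.py** uses TexSoup and handles more edge cases):

```python
import re
from pathlib import Path

INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")

def resolve_inputs(tex_path, root_dir):
    """Recursively replace \\input/\\include statements with file contents."""
    text = Path(tex_path).read_text(errors="ignore")

    def splice(match):
        name = match.group(1)
        if not name.endswith(".tex"):
            name += ".tex"
        child = Path(root_dir) / name
        if child.is_file():
            return resolve_inputs(child, root_dir)
        return match.group(0)  # leave unresolvable imports untouched

    return INPUT_RE.sub(splice, text)
```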
##### other helper scripts:
* **[dataset_generation/complete_dataset.py](dataset_generation/complete_dataset.py)** (combines the steps above: finds problematic source files and replaces them with other papers from **paperlinks.txt**)
* **[dataset_generation/renumber_paper.py](dataset_generation/renumber_paper.py)** (renames the papers sequentially as 0.tex, 1.tex, 2.tex, and so on)
Using this process, we downloaded 4-5 GB of source files, since the sources include images and other files that are not needed for our purpose. In the end, we have 799 LaTeX files, one per academic paper, which amounts to approximately 46 MB of LaTeX before preprocessing.
#### License for the dataset
* Papers are licensed under one of the Creative Commons licenses. For details: https://arxiv.org/help/license
* The papers in the dataset are listed in **dataset_generation/selected_papers.txt**. The list can be used to give credit to the papers in the dataset.
### Preprocessing
The dataset needs to be preprocessed because of noise such as comments and non-UTF-8 characters. We therefore used _preprocess_char.py_ to delete comments and characters that occur fewer times than a certain threshold (100 in our experiments).
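The idea can be sketched as two passes, one stripping comments and one dropping rare characters. The helper below is illustrative; _preprocess_char.py_ is the actual implementation:

```python
from collections import Counter

def preprocess(text, min_count=100):
    """Strip LaTeX comments, then drop characters rarer than min_count."""
    lines = []
    for line in text.splitlines():
        cut = len(line)
        for i, ch in enumerate(line):
            # an unescaped % starts a LaTeX comment
            if ch == "%" and (i == 0 or line[i - 1] != "\\"):
                cut = i
                break
        lines.append(line[:cut])
    stripped = "\n".join(lines)

    counts = Counter(stripped)
    return "".join(c for c in stripped if counts[c] >= min_count)
```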
@@ -57,39 +58,35 @@
For our baseline model, we decided to use character-level embedding. The details
| Lower-case to Upper-case Ratio | 23.95 |
| Word to Non-word Ratio | 3.17 |
## Models
### Baseline Model (RNN)
The RNN model described in the blog post "The Unreasonable Effectiveness of Recurrent Neural Networks" [1].
#### How to Run:
After preparing the dataset, run **[char-rnn.py](char-rnn.py)** to train the model.
When training is over, run **[generate_text.py](generate_text.py)**. This script will load the last
checkpoint and generate a number of characters using the learned parameters.
### Transformer
Transformer [2] is a popular attention-based sequence model.
#### How to Run:
We use [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor) [3] for Transformer model. See [t2t_paper_generation_problem](t2t_paper_generation_problem) directory for details.
### Transformer-XL
Transformer-XL [4] extends the Transformer so that long-term dependencies can be handled properly.
#### How to Run:
We use the original code shared by the authors of Transformer-XL. See the [transformer-xl](transformer-xl) directory for details.
## References
[1] The Unreasonable Effectiveness of Recurrent Neural Networks
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
[2] Vaswani, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
[3] Vaswani et al. "Tensor2Tensor for Neural Machine Translation". 2018. [arXiv:1803.07416](http://arxiv.org/abs/1803.07416)
[4] Dai et al. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context". 2018. [arXiv:1901.02860](http://arxiv.org/abs/1901.02860)


@@ -1,143 +0,0 @@
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--seq_length', type=int, default=100, help='Input sequence length given to the network')
parser.add_argument('--epochs', type=int, default=3, help='Number of training epochs')
parser.add_argument('--batch_size', type=int, default=64, help='Size of the training batches')
parser.add_argument('--d_model', type=int, default=256, help='')
parser.add_argument('--d_inner_hid', type=int, default=512, help='')
parser.add_argument('--n_head', type=int, default=4, help='')
parser.add_argument('--d_k', type=int, default=64, help='')
parser.add_argument('--d_v', type=int, default=64, help='')
parser.add_argument('--layers', type=int, default=6, help='Number of stacked multi-head-layers layers')
parser.add_argument('--dropout', type=float, default=0.1, help='')
parser.add_argument('--active_layers', type=int, default=999, help='')
parser.add_argument('--input_file', type=str, default='../dataset/preprocessed_data.txt', help='Input file path')
parser.add_argument('--chars_to_generate', type=int, default=1000, help='')
parser.add_argument('--temperature', type=float, default=None, help='')
opt = parser.parse_args()
import random, os, sys
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.initializers import *
import tensorflow.keras.backend as K
from transformer import Encoder, GetPosEncodingMatrix
# Read, then decode for py2 compat.
text = open(opt.input_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# Take a look at the first 250 characters in text
print(text[:250])
# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
# Creating a mapping from unique characters to indices
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
# Length of the vocabulary in chars
vocab_size = len(vocab)
def build_model(n_tokens, len_limit, batch_size, d_model=256, d_inner_hid=512, n_head=4, d_k=64, d_v=64, layers=6, dropout=0.1, active_layers=999):
d_emb = d_model
pos_emb = Embedding(len_limit, d_emb, trainable=False, \
weights=[GetPosEncodingMatrix(len_limit, d_emb)], \
batch_input_shape=[batch_size, None])
word_emb = Embedding(n_tokens, d_emb, batch_input_shape=[batch_size, None])
encoder = Encoder(d_model, d_inner_hid, n_head, d_k, d_v, layers, dropout, \
word_emb=word_emb, pos_emb=pos_emb)
target_layer = TimeDistributed(Dense(n_tokens, use_bias=False))
def get_pos_seq(x):
mask = K.cast(K.not_equal(x, 0), 'int32')
pos = K.cumsum(K.ones_like(x, 'int32'), 1)
return pos * mask
src_seq = Input(shape=(None,), dtype='int32')
src_pos = Lambda(get_pos_seq)(src_seq)
enc_output = encoder(src_seq, src_pos, active_layers=active_layers)
final_output = target_layer(enc_output)
model = Model(inputs=src_seq, outputs=final_output)
return model
model = build_model(vocab_size, opt.seq_length+1, 1, d_model=opt.d_model,d_inner_hid=opt.d_inner_hid,\
n_head=opt.n_head, d_k=opt.d_k, d_v=opt.d_v, layers=opt.layers, dropout=opt.dropout, active_layers=opt.active_layers)
# Directory where the checkpoints will be saved
checkpoint_dir = './experiment/training_checkpoints_seq_len_{}'.format(opt.seq_length)
#RecursionError: maximum recursion depth exceeded while loading parameters
sys.setrecursionlimit(10000)
tf.train.latest_checkpoint(checkpoint_dir)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
def generate_text(model, start_string, temperature):
# Evaluation step (generating text using the learned model)
# Number of characters to generate
num_generate = opt.chars_to_generate
# Converting our start string to numbers (vectorizing)
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)
# Empty string to store our results
text_generated = []
# Low temperatures results in more predictable text.
# Higher temperatures results in more surprising text.
# Experiment to find the best setting.
# temperature = 0.5
# Here batch size == 1
model.reset_states()
for i in range(num_generate):
predictions = model(input_eval)
# remove the batch dimension
predictions = tf.squeeze(predictions, 0)
# using a multinomial distribution to predict the word returned by the model
predictions = predictions / temperature
predicted_id = tf.random.multinomial(predictions, num_samples=1)[-1, 0].numpy()
# We pass the predicted word as the next input to the model
# along with the previous hidden state
input_eval = tf.expand_dims([predicted_id], 0)
text_generated.append(idx2char[predicted_id])
return start_string + ''.join(text_generated)
if opt.temperature is None:
temperatures = [0.1, 0.25, 0.35, 0.5, 0.65, 0.75, 0.9, 1.]
else:
temperatures = [opt.temperature]
for temperature in temperatures:
with open(os.path.join(checkpoint_dir, 'generated_text_temp_{}.txt'.format(temperature)), 'w+', encoding='utf-8') as f:
print(generate_text(model, start_string=u"\\begin{document}", temperature=temperature), file=f)


@@ -1,166 +0,0 @@
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--seq_length', type=int, default=100, help='Input sequence length given to the network')
parser.add_argument('--epochs', type=int, default=3, help='Number of training epochs')
parser.add_argument('--batch_size', type=int, default=64, help='Size of the training batches')
parser.add_argument('--d_model', type=int, default=256, help='')
parser.add_argument('--d_inner_hid', type=int, default=512, help='')
parser.add_argument('--n_head', type=int, default=4, help='')
parser.add_argument('--d_k', type=int, default=64, help='')
parser.add_argument('--d_v', type=int, default=64, help='')
parser.add_argument('--layers', type=int, default=6, help='Number of stacked multi-head-layers layers')
parser.add_argument('--dropout', type=float, default=0.1, help='')
parser.add_argument('--active_layers', type=int, default=999, help='')
parser.add_argument('--input_file', type=str, default='../dataset/preprocessed_data.txt', help='Input file path')
opt = parser.parse_args()
import random, os, sys
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.initializers import *
import tensorflow.keras.backend as K
from transformer import Encoder, GetPosEncodingMatrix
print("Arguments", opt)
# Read, then decode for py2 compat.
text = open(opt.input_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
# Creating a mapping from unique characters to indices
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
# Show how the first 13 characters from the text are mapped to integers
print('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))
# The maximum length sentence we want for a single input in characters
seq_length = opt.seq_length
examples_per_epoch = len(text) // seq_length
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
for i in char_dataset.take(5):
print(idx2char[i.numpy()])
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
for item in sequences.take(5):
print(repr(''.join(idx2char[item.numpy()])))
def split_input_target(chunk):
input_text = chunk[:-1]
target_text = chunk[1:]
return input_text, target_text
dataset = sequences.map(split_input_target)
for input_example, target_example in dataset.take(1):
print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
print("Step {:4d}".format(i))
print(" input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
print(" expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))
# Batch size
BATCH_SIZE = opt.batch_size
steps_per_epoch = examples_per_epoch // BATCH_SIZE
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
# Length of the vocabulary in chars
vocab_size = len(vocab)
def build_model(n_tokens, len_limit, batch_size, d_model=256, d_inner_hid=512, n_head=4, d_k=64, d_v=64, layers=6, dropout=0.1, active_layers=999):
d_emb = d_model
pos_emb = Embedding(len_limit, d_emb, trainable=False, \
weights=[GetPosEncodingMatrix(len_limit, d_emb)], \
batch_input_shape=[batch_size, None])
word_emb = Embedding(n_tokens, d_emb, batch_input_shape=[batch_size, None])
encoder = Encoder(d_model, d_inner_hid, n_head, d_k, d_v, layers, dropout, \
word_emb=word_emb, pos_emb=pos_emb)
target_layer = TimeDistributed(Dense(n_tokens, use_bias=False))
def get_pos_seq(x):
mask = K.cast(K.not_equal(x, 0), 'int32')
pos = K.cumsum(K.ones_like(x, 'int32'), 1)
return pos * mask
src_seq = Input(shape=(None,), dtype='int32')
src_pos = Lambda(get_pos_seq)(src_seq)
enc_output = encoder(src_seq, src_pos, active_layers=active_layers)
final_output = target_layer(enc_output)
model = Model(inputs=src_seq, outputs=final_output)
return model
model = build_model(vocab_size, seq_length+1, BATCH_SIZE, d_model=opt.d_model,d_inner_hid=opt.d_inner_hid,\
n_head=opt.n_head, d_k=opt.d_k, d_v=opt.d_v, layers=opt.layers, dropout=opt.dropout, active_layers=opt.active_layers)
for input_example_batch, target_example_batch in dataset.take(1):
example_batch_predictions = model(input_example_batch)
print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
model.summary()
def loss(labels, logits):
return tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", example_batch_loss.numpy().mean())
model.compile(
optimizer=tf.train.AdamOptimizer(),
loss=loss)
# Directory where the checkpoints will be saved
checkpoint_dir = './experiment/training_checkpoints_seq_len_{}'.format(opt.seq_length)
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
os.makedirs(checkpoint_dir, exist_ok=True)
#RecursionError: maximum recursion depth exceeded while saving parameters
sys.setrecursionlimit(10000)
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True)
EPOCHS = opt.epochs
history = model.fit(dataset.repeat(), epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])



@@ -1,167 +0,0 @@
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--seq_length', type=int, default=100, help='Input sequence length given to the network')
parser.add_argument('--epochs', type=int, default=3, help='Number of training epochs')
parser.add_argument('--batch_size', type=int, default=64, help='Size of the training batches')
parser.add_argument('--d_model', type=int, default=256, help='')
parser.add_argument('--d_inner_hid', type=int, default=512, help='')
parser.add_argument('--n_head', type=int, default=4, help='')
parser.add_argument('--d_k', type=int, default=64, help='')
parser.add_argument('--d_v', type=int, default=64, help='')
parser.add_argument('--layers', type=int, default=6, help='Number of stacked multi-head-layers layers')
parser.add_argument('--dropout', type=float, default=0.1, help='')
parser.add_argument('--active_layers', type=int, default=999, help='')
parser.add_argument('--input_file', type=str, default='../dataset/preprocessed_data.txt', help='Input file path')
opt = parser.parse_args()
import random, os, sys
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.initializers import *
import tensorflow.keras.backend as K
from transformer import Encoder, GetPosEncodingMatrix
print("Arguments", opt)
# Read, then decode for py2 compat.
text = open(opt.input_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
# Creating a mapping from unique characters to indices
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
# Show how the first 13 characters from the text are mapped to integers
print('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))
# The maximum length sentence we want for a single input in characters
seq_length = opt.seq_length
examples_per_epoch = len(text) // seq_length
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
for i in char_dataset.take(5):
print(idx2char[i.numpy()])
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
for item in sequences.take(5):
print(repr(''.join(idx2char[item.numpy()])))
def split_input_target(chunk):
input_text = chunk[:-1]
target_text = chunk[1:]
return input_text, target_text
dataset = sequences.map(split_input_target)
for input_example, target_example in dataset.take(1):
print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
print("Step {:4d}".format(i))
print(" input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
print(" expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))
# Batch size
BATCH_SIZE = opt.batch_size
steps_per_epoch = examples_per_epoch // BATCH_SIZE
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
# Length of the vocabulary in chars
vocab_size = len(vocab)
def build_model(n_tokens, len_limit, batch_size, d_model=256, d_inner_hid=512, n_head=4, d_k=64, d_v=64, layers=6, dropout=0.1, active_layers=999):
d_emb = d_model
pos_emb = Embedding(len_limit, d_emb, trainable=False, \
weights=[GetPosEncodingMatrix(len_limit, d_emb)], \
batch_input_shape=[batch_size, None])
word_emb = Embedding(n_tokens, d_emb, batch_input_shape=[batch_size, None])
encoder = Encoder(d_model, d_inner_hid, n_head, d_k, d_v, layers, dropout, \
word_emb=word_emb, pos_emb=pos_emb)
target_layer = TimeDistributed(Dense(n_tokens, use_bias=False))
def get_pos_seq(x):
mask = K.cast(K.not_equal(x, 0), 'int32')
pos = K.cumsum(K.ones_like(x, 'int32'), 1)
return pos * mask
src_seq = Input(shape=(None,), dtype='int32')
src_pos = Lambda(get_pos_seq)(src_seq)
enc_output = encoder(src_seq, src_pos, active_layers=active_layers)
final_output = target_layer(enc_output)
model = Model(inputs=src_seq, outputs=final_output)
return model
model = build_model(vocab_size, seq_length+1, BATCH_SIZE, d_model=opt.d_model,d_inner_hid=opt.d_inner_hid,\
n_head=opt.n_head, d_k=opt.d_k, d_v=opt.d_v, layers=opt.layers, dropout=opt.dropout, active_layers=opt.active_layers)
for input_example_batch, target_example_batch in dataset.take(1):
example_batch_predictions = model(input_example_batch)
print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
model.summary()
def loss(labels, logits):
return tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", example_batch_loss.numpy().mean())
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss=loss,
target_tensors=Input(shape=(None,), dtype='int32'))
# Directory where the checkpoints will be saved
checkpoint_dir = './experiment/training_checkpoints_seq_len_{}'.format(opt.seq_length)
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
os.makedirs(checkpoint_dir, exist_ok=True)
#RecursionError: maximum recursion depth exceeded while saving parameters
sys.setrecursionlimit(10000)
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True)
EPOCHS = opt.epochs
history = model.fit(dataset.repeat(), epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])


@@ -1,473 +0,0 @@
# https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/master/transformer.py
import random, os, sys
import numpy as np
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.initializers import *
import tensorflow.keras.backend as K
import tensorflow as tf
class LayerNormalization(Layer):
def __init__(self, eps=1e-6, **kwargs):
self.eps = eps
super(LayerNormalization, self).__init__(**kwargs)
def build(self, input_shape):
self.gamma = self.add_weight(name='gamma', shape=input_shape[-1:],
initializer=Ones(), trainable=True)
self.beta = self.add_weight(name='beta', shape=input_shape[-1:],
initializer=Zeros(), trainable=True)
super(LayerNormalization, self).build(input_shape)
def call(self, x):
mean = K.mean(x, axis=-1, keepdims=True)
std = K.std(x, axis=-1, keepdims=True)
return self.gamma * (x - mean) / (std + self.eps) + self.beta
def compute_output_shape(self, input_shape):
return input_shape
class ScaledDotProductAttention(Layer):
def __init__(self, d_model, attn_dropout=0.1):
self.temper = np.sqrt(d_model)
self.dropout = Dropout(attn_dropout)
super(ScaledDotProductAttention, self).__init__()
def __call__(self, q, k, v, mask):
attn = Lambda(lambda x:K.batch_dot(x[0],x[1],axes=[2,2])/self.temper)([q, k])
if mask is not None:
mmask = Lambda(lambda x:(-1e+10)*(1-x))(mask)
attn = Add()([attn, mmask])
attn = Activation('softmax')(attn)
attn = self.dropout(attn)
output = Lambda(lambda x:K.batch_dot(x[0], x[1]))([attn, v])
return output, attn
class MultiHeadAttention(Layer):
# mode 0 - big martixes, faster; mode 1 - more clear implementation
def __init__(self, n_head, d_model, d_k, d_v, dropout, mode=0, use_norm=True):
self.mode = mode
self.n_head = n_head
self.d_k = d_k
self.d_v = d_v
self.dropout = dropout
if mode == 0:
self.qs_layer = Dense(n_head*d_k, use_bias=False)
self.ks_layer = Dense(n_head*d_k, use_bias=False)
self.vs_layer = Dense(n_head*d_v, use_bias=False)
elif mode == 1:
self.qs_layers = []
self.ks_layers = []
self.vs_layers = []
for _ in range(n_head):
self.qs_layers.append(TimeDistributed(Dense(d_k, use_bias=False)))
self.ks_layers.append(TimeDistributed(Dense(d_k, use_bias=False)))
self.vs_layers.append(TimeDistributed(Dense(d_v, use_bias=False)))
self.attention = ScaledDotProductAttention(d_model)
self.layer_norm = LayerNormalization() if use_norm else None
self.w_o = TimeDistributed(Dense(d_model))
def __call__(self, q, k, v, mask=None):
d_k, d_v = self.d_k, self.d_v
n_head = self.n_head
if self.mode == 0:
qs = self.qs_layer(q) # [batch_size, len_q, n_head*d_k]
ks = self.ks_layer(k)
vs = self.vs_layer(v)
def reshape1(x):
s = tf.shape(x) # [batch_size, len_q, n_head * d_k]
x = tf.reshape(x, [s[0], s[1], n_head, s[2]//n_head])
x = tf.transpose(x, [2, 0, 1, 3])
x = tf.reshape(x, [-1, s[1], s[2]//n_head]) # [n_head * batch_size, len_q, d_k]
return x
qs = Lambda(reshape1)(qs)
ks = Lambda(reshape1)(ks)
vs = Lambda(reshape1)(vs)
if mask is not None:
mask = Lambda(lambda x:K.repeat_elements(x, n_head, 0))(mask)
head, attn = self.attention(qs, ks, vs, mask=mask)
def reshape2(x):
s = tf.shape(x) # [n_head * batch_size, len_v, d_v]
x = tf.reshape(x, [n_head, -1, s[1], s[2]])
x = tf.transpose(x, [1, 2, 0, 3])
x = tf.reshape(x, [-1, s[1], n_head*d_v]) # [batch_size, len_v, n_head * d_v]
return x
head = Lambda(reshape2)(head)
elif self.mode == 1:
heads = []; attns = []
for i in range(n_head):
qs = self.qs_layers[i](q)
ks = self.ks_layers[i](k)
vs = self.vs_layers[i](v)
head, attn = self.attention(qs, ks, vs, mask)
heads.append(head); attns.append(attn)
head = Concatenate()(heads) if n_head > 1 else heads[0]
attn = Concatenate()(attns) if n_head > 1 else attns[0]
outputs = self.w_o(head)
outputs = Dropout(self.dropout)(outputs)
if not self.layer_norm: return outputs, attn
outputs = Add()([outputs, q])
return self.layer_norm(outputs), attn
class PositionwiseFeedForward(Layer):
def __init__(self, d_hid, d_inner_hid, dropout=0.1):
self.w_1 = Conv1D(d_inner_hid, 1, activation='relu')
self.w_2 = Conv1D(d_hid, 1)
self.layer_norm = LayerNormalization()
self.dropout = Dropout(dropout)
def __call__(self, x):
output = self.w_1(x)
output = self.w_2(output)
output = self.dropout(output)
output = Add()([output, x])
return self.layer_norm(output)
class EncoderLayer(Layer):
def __init__(self, d_model, d_inner_hid, n_head, d_k, d_v, dropout=0.1):
self.self_att_layer = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=dropout)
self.pos_ffn_layer = PositionwiseFeedForward(d_model, d_inner_hid, dropout=dropout)
def __call__(self, enc_input, mask=None):
output, slf_attn = self.self_att_layer(enc_input, enc_input, enc_input, mask=mask)
output = self.pos_ffn_layer(output)
return output, slf_attn
class DecoderLayer(Layer):
def __init__(self, d_model, d_inner_hid, n_head, d_k, d_v, dropout=0.1):
self.self_att_layer = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=dropout)
self.enc_att_layer = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=dropout)
self.pos_ffn_layer = PositionwiseFeedForward(d_model, d_inner_hid, dropout=dropout)
def __call__(self, dec_input, enc_output, self_mask=None, enc_mask=None):
output, slf_attn = self.self_att_layer(dec_input, dec_input, dec_input, mask=self_mask)
output, enc_attn = self.enc_att_layer(output, enc_output, enc_output, mask=enc_mask)
output = self.pos_ffn_layer(output)
return output, slf_attn, enc_attn
def GetPosEncodingMatrix(max_len, d_emb):
    pos_enc = np.array([
        [pos / np.power(10000, 2 * (j // 2) / d_emb) for j in range(d_emb)]
        if pos != 0 else np.zeros(d_emb)
        for pos in range(max_len)
    ])
    pos_enc[1:, 0::2] = np.sin(pos_enc[1:, 0::2])  # dim 2i
    pos_enc[1:, 1::2] = np.cos(pos_enc[1:, 1::2])  # dim 2i+1
    return pos_enc
def GetPadMask(q, k):
    # 1 where k holds a real token (id != 0), 0 at padding; shape [batch, len_q, len_k].
    ones = K.expand_dims(K.ones_like(q, 'float32'), -1)
    mask = K.cast(K.expand_dims(K.not_equal(k, 0), 1), 'float32')
    mask = K.batch_dot(ones, mask, axes=[2, 1])
    return mask
def GetSubMask(s):
    # Lower-triangular (causal) mask: position i may only attend to positions <= i.
    len_s = tf.shape(s)[1]
    bs = tf.shape(s)[:1]
    mask = K.cumsum(tf.eye(len_s, batch_shape=bs), 1)
    return mask
class Encoder():
    def __init__(self, d_model, d_inner_hid, n_head, d_k, d_v,
                 layers=6, dropout=0.1, word_emb=None, pos_emb=None):
        self.emb_layer = word_emb
        self.pos_layer = pos_emb
        self.emb_dropout = Dropout(dropout)
        self.layers = [EncoderLayer(d_model, d_inner_hid, n_head, d_k, d_v, dropout) for _ in range(layers)]
    def __call__(self, src_seq, src_pos, return_att=False, active_layers=999):
        x = self.emb_layer(src_seq)
        if src_pos is not None:
            pos = self.pos_layer(src_pos)
            x = Add()([x, pos])
        x = self.emb_dropout(x)
        if return_att: atts = []
        mask = Lambda(lambda x: GetPadMask(x, x))(src_seq)
        for enc_layer in self.layers[:active_layers]:
            x, att = enc_layer(x, mask)
            if return_att: atts.append(att)
        return (x, atts) if return_att else x
class Decoder():
    def __init__(self, d_model, d_inner_hid, n_head, d_k, d_v,
                 layers=6, dropout=0.1, word_emb=None, pos_emb=None):
        self.emb_layer = word_emb
        self.pos_layer = pos_emb
        self.layers = [DecoderLayer(d_model, d_inner_hid, n_head, d_k, d_v, dropout) for _ in range(layers)]
    def __call__(self, tgt_seq, tgt_pos, src_seq, enc_output, return_att=False, active_layers=999):
        x = self.emb_layer(tgt_seq)
        if tgt_pos is not None:
            pos = self.pos_layer(tgt_pos)
            x = Add()([x, pos])
        self_pad_mask = Lambda(lambda x: GetPadMask(x, x))(tgt_seq)
        self_sub_mask = Lambda(GetSubMask)(tgt_seq)
        # Combine padding and causal masks for decoder self-attention.
        self_mask = Lambda(lambda x: K.minimum(x[0], x[1]))([self_pad_mask, self_sub_mask])
        enc_mask = Lambda(lambda x: GetPadMask(x[0], x[1]))([tgt_seq, src_seq])
        if return_att: self_atts, enc_atts = [], []
        for dec_layer in self.layers[:active_layers]:
            x, self_att, enc_att = dec_layer(x, enc_output, self_mask, enc_mask)
            if return_att:
                self_atts.append(self_att)
                enc_atts.append(enc_att)
        return (x, self_atts, enc_atts) if return_att else x
class Transformer:
    def __init__(self, i_tokens, o_tokens, len_limit, d_model=256,
                 d_inner_hid=512, n_head=4, d_k=64, d_v=64, layers=2, dropout=0.1,
                 share_word_emb=False):
        self.i_tokens = i_tokens
        self.o_tokens = o_tokens
        self.len_limit = len_limit
        self.src_loc_info = True
        self.d_model = d_model
        self.decode_model = None
        d_emb = d_model
        pos_emb = Embedding(len_limit, d_emb, trainable=False,
                            weights=[GetPosEncodingMatrix(len_limit, d_emb)])
        i_word_emb = Embedding(i_tokens.num(), d_emb)
        if share_word_emb:
            assert i_tokens.num() == o_tokens.num()
            o_word_emb = i_word_emb
        else: o_word_emb = Embedding(o_tokens.num(), d_emb)
        self.encoder = Encoder(d_model, d_inner_hid, n_head, d_k, d_v, layers, dropout,
                               word_emb=i_word_emb, pos_emb=pos_emb)
        self.decoder = Decoder(d_model, d_inner_hid, n_head, d_k, d_v, layers, dropout,
                               word_emb=o_word_emb, pos_emb=pos_emb)
        self.target_layer = TimeDistributed(Dense(o_tokens.num(), use_bias=False))
    def get_pos_seq(self, x):
        mask = K.cast(K.not_equal(x, 0), 'int32')
        pos = K.cumsum(K.ones_like(x, 'int32'), 1)
        return pos * mask
    def compile(self, optimizer='adam', active_layers=999):
        src_seq_input = Input(shape=(None,), dtype='int32')
        tgt_seq_input = Input(shape=(None,), dtype='int32')
        src_seq = src_seq_input
        # Teacher forcing: the decoder sees tgt[:-1] and is trained to predict tgt[1:].
        tgt_seq = Lambda(lambda x: x[:, :-1])(tgt_seq_input)
        tgt_true = Lambda(lambda x: x[:, 1:])(tgt_seq_input)
        src_pos = Lambda(self.get_pos_seq)(src_seq)
        tgt_pos = Lambda(self.get_pos_seq)(tgt_seq)
        if not self.src_loc_info: src_pos = None
        enc_output = self.encoder(src_seq, src_pos, active_layers=active_layers)
        dec_output = self.decoder(tgt_seq, tgt_pos, src_seq, enc_output, active_layers=active_layers)
        final_output = self.target_layer(dec_output)
        def get_loss(args):
            y_pred, y_true = args
            y_true = tf.cast(y_true, 'int32')
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
            mask = tf.cast(tf.not_equal(y_true, 0), 'float32')
            loss = tf.reduce_sum(loss * mask, -1) / tf.reduce_sum(mask, -1)
            loss = K.mean(loss)
            return loss
        def get_accu(args):
            y_pred, y_true = args
            mask = tf.cast(tf.not_equal(y_true, 0), 'float32')
            corr = K.cast(K.equal(K.cast(y_true, 'int32'), K.cast(K.argmax(y_pred, axis=-1), 'int32')), 'float32')
            corr = K.sum(corr * mask, -1) / K.sum(mask, -1)
            return K.mean(corr)
        loss = Lambda(get_loss)([final_output, tgt_true])
        self.ppl = Lambda(K.exp)(loss)
        self.accu = Lambda(get_accu)([final_output, tgt_true])
        self.model = Model([src_seq_input, tgt_seq_input], loss)
        self.model.add_loss([loss])
        self.output_model = Model([src_seq_input, tgt_seq_input], final_output)
        self.model.compile(optimizer, None)
        self.model.metrics_names.append('ppl')
        self.model.metrics_tensors.append(self.ppl)
        self.model.metrics_names.append('accu')
        self.model.metrics_tensors.append(self.accu)
    def make_src_seq_matrix(self, input_seq):
        src_seq = np.zeros((1, len(input_seq) + 3), dtype='int32')
        src_seq[0, 0] = self.i_tokens.startid()
        for i, z in enumerate(input_seq): src_seq[0, 1 + i] = self.i_tokens.id(z)
        src_seq[0, len(input_seq) + 1] = self.i_tokens.endid()
        return src_seq
    def decode_sequence(self, input_seq, delimiter=''):
        src_seq = self.make_src_seq_matrix(input_seq)
        decoded_tokens = []
        target_seq = np.zeros((1, self.len_limit), dtype='int32')
        target_seq[0, 0] = self.o_tokens.startid()
        for i in range(self.len_limit - 1):
            output = self.output_model.predict_on_batch([src_seq, target_seq])
            sampled_index = np.argmax(output[0, i, :])
            sampled_token = self.o_tokens.token(sampled_index)
            decoded_tokens.append(sampled_token)
            if sampled_index == self.o_tokens.endid(): break
            target_seq[0, i + 1] = sampled_index
        return delimiter.join(decoded_tokens[:-1])
    def make_fast_decode_model(self):
        src_seq_input = Input(shape=(None,), dtype='int32')
        tgt_seq_input = Input(shape=(None,), dtype='int32')
        src_seq = src_seq_input
        tgt_seq = tgt_seq_input
        src_pos = Lambda(self.get_pos_seq)(src_seq)
        tgt_pos = Lambda(self.get_pos_seq)(tgt_seq)
        if not self.src_loc_info: src_pos = None
        enc_output = self.encoder(src_seq, src_pos)
        self.encode_model = Model(src_seq_input, enc_output)
        enc_ret_input = Input(shape=(None, self.d_model))
        dec_output = self.decoder(tgt_seq, tgt_pos, src_seq, enc_ret_input)
        final_output = self.target_layer(dec_output)
        self.decode_model = Model([src_seq_input, enc_ret_input, tgt_seq_input], final_output)
        self.encode_model.compile('adam', 'mse')
        self.decode_model.compile('adam', 'mse')
    def decode_sequence_fast(self, input_seq, delimiter=''):
        # Same greedy decode as decode_sequence, but the encoder runs only once.
        if self.decode_model is None: self.make_fast_decode_model()
        src_seq = self.make_src_seq_matrix(input_seq)
        enc_ret = self.encode_model.predict_on_batch(src_seq)
        decoded_tokens = []
        target_seq = np.zeros((1, self.len_limit), dtype='int32')
        target_seq[0, 0] = self.o_tokens.startid()
        for i in range(self.len_limit - 1):
            output = self.decode_model.predict_on_batch([src_seq, enc_ret, target_seq])
            sampled_index = np.argmax(output[0, i, :])
            sampled_token = self.o_tokens.token(sampled_index)
            decoded_tokens.append(sampled_token)
            if sampled_index == self.o_tokens.endid(): break
            target_seq[0, i + 1] = sampled_index
        return delimiter.join(decoded_tokens[:-1])
    def beam_search(self, input_seq, topk=5, delimiter=''):
        if self.decode_model is None: self.make_fast_decode_model()
        src_seq = self.make_src_seq_matrix(input_seq)
        src_seq = src_seq.repeat(topk, 0)
        enc_ret = self.encode_model.predict_on_batch(src_seq)
        final_results = []
        decoded_tokens = [[] for _ in range(topk)]
        decoded_logps = [0] * topk
        lastk = 1
        target_seq = np.zeros((topk, self.len_limit), dtype='int32')
        target_seq[:, 0] = self.o_tokens.startid()
        for i in range(self.len_limit - 1):
            if lastk == 0 or len(final_results) > topk * 3: break
            output = self.decode_model.predict_on_batch([src_seq, enc_ret, target_seq])
            output = np.exp(output[:, i, :])
            output = np.log(output / np.sum(output, -1, keepdims=True) + 1e-8)  # log-softmax over vocab
            cands = []
            # Expand each live beam with its topk continuations.
            for k, wprobs in zip(range(lastk), output):
                if target_seq[k, i] == self.o_tokens.endid(): continue
                wsorted = sorted(list(enumerate(wprobs)), key=lambda x: x[-1], reverse=True)
                for wid, wp in wsorted[:topk]:
                    cands.append((k, wid, decoded_logps[k] + wp))
            cands.sort(key=lambda x: x[-1], reverse=True)
            cands = cands[:topk]
            backup_seq = target_seq.copy()
            for kk, zz in enumerate(cands):
                k, wid, wprob = zz
                target_seq[kk] = backup_seq[k]
                target_seq[kk, i + 1] = wid
                decoded_logps[kk] = wprob
                decoded_tokens.append(decoded_tokens[k] + [self.o_tokens.token(wid)])
                if wid == self.o_tokens.endid(): final_results.append((decoded_tokens[k], wprob))
            decoded_tokens = decoded_tokens[topk:]
            lastk = len(cands)
        # Length-normalize accumulated log-probs before ranking.
        final_results = [(x, y / (len(x) + 1)) for x, y in final_results]
        final_results.sort(key=lambda x: x[-1], reverse=True)
        final_results = [(delimiter.join(x), y) for x, y in final_results]
        return final_results
class LRSchedulerPerStep(Callback):
    # Warm-up schedule from "Attention Is All You Need":
    # lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    def __init__(self, d_model, warmup=4000):
        self.basic = d_model**-0.5
        self.warm = warmup**-1.5
        self.step_num = 0
    def on_batch_begin(self, batch, logs=None):
        self.step_num += 1
        lr = self.basic * min(self.step_num**-0.5, self.step_num * self.warm)
        K.set_value(self.model.optimizer.lr, lr)
class LRSchedulerPerEpoch(Callback):
    # Same schedule, updated once per epoch (num_per_epoch = steps per epoch).
    def __init__(self, d_model, warmup=4000, num_per_epoch=1000):
        self.basic = d_model**-0.5
        self.warm = warmup**-1.5
        self.num_per_epoch = num_per_epoch
        self.step_num = 1
    def on_epoch_begin(self, epoch, logs=None):
        self.step_num += self.num_per_epoch
        lr = self.basic * min(self.step_num**-0.5, self.step_num * self.warm)
        K.set_value(self.model.optimizer.lr, lr)
class AddPosEncoding:
    def __call__(self, x):
        _, max_len, d_emb = K.int_shape(x)
        pos = GetPosEncodingMatrix(max_len, d_emb)
        x = Lambda(lambda x: x + pos)(x)
        return x
add_layer = Lambda(lambda x: x[0] + x[1], output_shape=lambda x: x[0])
# use this because keras may get wrong shapes with Add()([])
class QANet_ConvBlock:
    def __init__(self, dim, n_conv=2, kernel_size=7, dropout=0.1):
        self.convs = [SeparableConv1D(dim, kernel_size, activation='relu', padding='same') for _ in range(n_conv)]
        self.norm = LayerNormalization()
        self.dropout = Dropout(dropout)
    def __call__(self, x):
        for i in range(len(self.convs)):
            z = self.norm(x)
            if i % 2 == 0: z = self.dropout(z)
            z = self.convs[i](z)
            x = add_layer([x, z])
        return x
class QANet_Block:
    def __init__(self, dim, n_head, n_conv, kernel_size, dropout=0.1, add_pos=True):
        self.conv = QANet_ConvBlock(dim, n_conv=n_conv, kernel_size=kernel_size, dropout=dropout)
        self.self_att = MultiHeadAttention(n_head=n_head, d_model=dim,
                                           d_k=dim // n_head, d_v=dim // n_head,
                                           dropout=dropout, use_norm=False)
        self.feed_forward = PositionwiseFeedForward(dim, dim, dropout=dropout)
        self.norm = LayerNormalization()
        self.add_pos = add_pos
    def __call__(self, x, mask):
        if self.add_pos: x = AddPosEncoding()(x)
        x = self.conv(x)
        z = self.norm(x)
        z, _ = self.self_att(z, z, z, mask)
        x = add_layer([x, z])
        z = self.norm(x)
        z = self.feed_forward(z)
        x = add_layer([x, z])
        return x
class QANet_Encoder:
    def __init__(self, dim=128, n_head=8, n_conv=2, n_block=1, kernel_size=7, dropout=0.1, add_pos=True):
        self.dim = dim
        self.n_block = n_block
        self.conv_first = SeparableConv1D(dim, 1, padding='same')
        self.enc_block = QANet_Block(dim, n_head=n_head, n_conv=n_conv, kernel_size=kernel_size,
                                     dropout=dropout, add_pos=add_pos)
    def __call__(self, x, mask):
        if K.int_shape(x)[-1] != self.dim:
            x = self.conv_first(x)
        for i in range(self.n_block):
            x = self.enc_block(x, mask)
        return x
if __name__ == '__main__':
    print('done')
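As a quick standalone check of the sinusoidal position encoding used above, the construction can be reproduced and verified in isolation. This is a minimal sketch: `pos_encoding` is a re-implementation mirroring `GetPosEncodingMatrix`, and the sizes are illustrative, not the model defaults.

```python
import numpy as np

def pos_encoding(max_len, d_emb):
    # Mirrors GetPosEncodingMatrix: angle = pos / 10000^(2*(j//2)/d_emb),
    # then sin on even dims, cos on odd dims; position 0 stays all zeros.
    pe = np.array([
        [pos / np.power(10000, 2 * (j // 2) / d_emb) for j in range(d_emb)]
        if pos != 0 else np.zeros(d_emb)
        for pos in range(max_len)
    ])
    pe[1:, 0::2] = np.sin(pe[1:, 0::2])
    pe[1:, 1::2] = np.cos(pe[1:, 1::2])
    return pe

pe = pos_encoding(50, 16)
print(pe.shape)                 # (50, 16)
print(np.allclose(pe[0], 0.0))  # True: position 0 is the zero vector
```

At position 1, dimension 0 is sin(1) and dimension 1 is cos(1), since the exponent `2*(j//2)/d_emb` is 0 for both.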

View File

@@ -5,13 +5,21 @@
pip install tensor2tensor==1.13.2
```
-### Train
+### Train Model
 ```
 python t2t_paper_generation_problem/train.py
-```
-### Generate Paper from the Trained Model
-```
-python t2t_paper_generation_problem/generate_paper.py
+
+usage: train.py [-h] [--folder FOLDER] [--model MODEL]
+                [--hparams_set HPARAMS_SET]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --folder FOLDER
+  --model MODEL
+  --hparams_set HPARAMS_SET
 ```
+### Generate Paper from the Trained Model
+See [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor)

View File

@@ -1,189 +0,0 @@
import os
import argparse
import numpy as np
import six
from tensor2tensor.data_generators import problem as problem_lib
from tensor2tensor.data_generators import text_encoder
from tensor2tensor.bin import t2t_trainer
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir
from tensor2tensor.utils import decoding
import tensorflow as tf
flags = tf.flags
FLAGS = flags.FLAGS
parser = argparse.ArgumentParser()
parser.add_argument('--seq_len', type=int, default=1024)
parser.add_argument('--sampling_temperature', type=float, default=0.6)
parser.add_argument('--folder', type=str, default="experiment")
parser.add_argument('--model', type=str, default="transformer", choices=["transformer"])
parser.add_argument('--hparams_set', type=str, default="transformer_small", choices=["transformer_small"])
opt = parser.parse_args()
folder = os.path.join(opt.folder, opt.model, opt.hparams_set)
tmp_dir = os.path.join(folder, "tmp")
data_dir = os.path.join(folder, "data")
output_dir = os.path.join(folder, "output")
generated_paper_path = os.path.join(output_dir, "generated_paper_{}.txt".format(opt.seq_len))
os.makedirs(os.path.dirname(generated_paper_path), exist_ok=True)
# Additional flags in bin/t2t_trainer.py and utils/flags.py
flags.DEFINE_string("checkpoint_path", None,
                    "Path to the model checkpoint. Overrides output_dir.")
flags.DEFINE_bool("keep_timestamp", False,
                  "Set the mtime of the decoded file to the "
                  "checkpoint_path+'.index' mtime.")
flags.DEFINE_bool("decode_interactive", False,
                  "Interactive local inference mode.")
flags.DEFINE_integer("decode_shards", 1, "Number of decoding replicas.")
flags.DEFINE_string("score_file", "", "File to score. Each line in the file "
                    "must be in the format input \t target.")
flags.DEFINE_bool("decode_in_memory", False, "Decode in memory.")
FLAGS.tmp_dir = tmp_dir
FLAGS.data_dir = data_dir
FLAGS.output_dir = output_dir
FLAGS.problem = "paper_generation_problem"
FLAGS.t2t_usr_dir = "t2t_paper_generation_problem"
FLAGS.model = opt.model
FLAGS.hparams_set = opt.hparams_set
FLAGS.decode_hparams="beam_size=1,alpha=0.6"
def create_hparams():
    return trainer_lib.create_hparams(
        FLAGS.hparams_set,
        FLAGS.hparams,
        data_dir=os.path.expanduser(FLAGS.data_dir),
        problem_name=FLAGS.problem)
def create_decode_hparams():
    decode_hp = decoding.decode_hparams(FLAGS.decode_hparams)
    decode_hp.shards = FLAGS.decode_shards
    decode_hp.shard_id = FLAGS.worker_id
    decode_in_memory = FLAGS.decode_in_memory or decode_hp.decode_in_memory
    decode_hp.decode_in_memory = decode_in_memory
    decode_hp.decode_to_file = FLAGS.decode_to_file
    decode_hp.decode_reference = FLAGS.decode_reference
    return decode_hp
trainer_lib.set_random_seed(17)
usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)
hp = create_hparams()
decode_hp = create_decode_hparams()
hp.sampling_method="random"
hp.sampling_temp=opt.sampling_temperature
estimator = trainer_lib.create_estimator(
    FLAGS.model,
    hp,
    t2t_trainer.create_run_config(hp),
    decode_hparams=decode_hp,
    use_tpu=FLAGS.use_tpu)
problem = hp.problem
def _interactive_input_tensor_to_features_dict(feature_map, hparams):
    """Convert the interactive input format (see above) to a dictionary.

    Args:
      feature_map: dict with inputs.
      hparams: model hyperparameters

    Returns:
      a features dictionary, as expected by the decoder.
    """
    inputs = tf.convert_to_tensor(feature_map["inputs"])
    x = inputs
    # Remove the batch dimension.
    num_samples = x[0]
    length = x[2]
    x = tf.slice(x, [3], tf.to_int32([length]))
    x = tf.reshape(x, [1, -1, 1, 1])
    # Transform into a batch of size num_samples to get that many random
    # decodes.
    x = tf.tile(x, tf.to_int32([num_samples, 1, 1, 1]))
    p_hparams = hparams.problem_hparams
    input_space_id = tf.constant(p_hparams.input_space_id)
    target_space_id = tf.constant(p_hparams.target_space_id)
    features = {}
    features["input_space_id"] = input_space_id
    features["target_space_id"] = target_space_id
    features["decode_length"] = inputs[1]
    features["inputs"] = x
    return features
def _interactive_input_fn(hparams, decode_length=1024, input_string=r"\documentclass"):
    num_samples = 1
    input_type = "text"
    p_hparams = hparams.problem_hparams
    has_input = "inputs" in p_hparams.modality
    vocabulary = p_hparams.vocabulary["inputs" if has_input else "targets"]
    # This should be longer than the longest input.
    const_array_size = 10000
    input_ids = vocabulary.encode(input_string)
    if has_input:
        input_ids.append(text_encoder.EOS_ID)
    x = [num_samples, decode_length, len(input_ids)] + input_ids
    assert len(x) < const_array_size
    x += [0] * (const_array_size - len(x))
    features = {
        "inputs": np.array(x).astype(np.int32),
    }
    for k, v in six.iteritems(problem_lib.problem_hparams_to_features(p_hparams)):
        features[k] = np.array(v).astype(np.int32)
    yield features
def make_input_fn_from_generator(gen):
    """Use py_func to yield elements from the given generator."""
    first_ex = six.next(gen)
    flattened = tf.contrib.framework.nest.flatten(first_ex)
    types = [t.dtype for t in flattened]
    shapes = [[None] * len(t.shape) for t in flattened]
    first_ex_list = [first_ex]

    def py_func():
        if first_ex_list:
            example = first_ex_list.pop()
        else:
            example = six.next(gen)
        return tf.contrib.framework.nest.flatten(example)

    def input_fn():
        flat_example = tf.py_func(py_func, [], types)
        _ = [t.set_shape(shape) for t, shape in zip(flat_example, shapes)]
        example = tf.contrib.framework.nest.pack_sequence_as(first_ex, flat_example)
        return example

    return input_fn
vocabulary = hp.problem_hparams.vocabulary["targets"]
output_text = r"\documentclass"
while len(output_text) < opt.seq_len:
    # Re-encode everything generated so far and ask the model to continue it.
    def input_fn():
        gen_fn = make_input_fn_from_generator(
            _interactive_input_fn(hp, decode_length=128, input_string=output_text))
        example = gen_fn()
        example = _interactive_input_tensor_to_features_dict(example, hp)
        return example
    prediction = list(estimator.predict(input_fn))[0]
    outputs = prediction["outputs"]
    if len(outputs) == 0:
        print("-> Failed to Generate Full Length Paper")
        break
    new_text = vocabulary.decode(outputs)
    output_text = output_text + new_text
with open(generated_paper_path, "wt") as f:
    f.write(output_text)
print("-> Paper Generated at", generated_paper_path)
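Stripped of the Tensor2Tensor machinery, the generation loop above is plain iterative extension: keep feeding the text generated so far back in as the prompt until the target length is reached. A minimal sketch of that control flow, where `generate_paper` and the `generate_continuation` callable are illustrative stand-ins (in the real script, one decode call wraps `estimator.predict`):

```python
def generate_paper(generate_continuation, seq_len, seed="\\documentclass"):
    # generate_continuation(prompt) -> str stands in for one decode call
    # of the trained model; an empty continuation means generation failed.
    text = seed
    while len(text) < seq_len:
        new_text = generate_continuation(text)
        if not new_text:
            print("-> Failed to Generate Full Length Paper")
            break
        text += new_text
    return text

# Toy continuation function, purely for illustration:
print(generate_paper(lambda prompt: " more LaTeX", 40))
```

Note that each iteration re-encodes the entire output so far, so generation cost grows with paper length; this matches the script's behavior.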

View File

@@ -1,3 +1,5 @@
This code is adapted from https://github.com/kimiyoung/transformer-xl.
# Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
This repository contains the code in **PyTorch** for the paper