[Teachable NLP] GPT-2 Lord of the Rings model

Teachable NLP: link
Tabtab: link
Ainize: View API


Teachable NLP is a no-code service that lets you fine-tune the GPT-2 model without writing complex code or owning a GPU (graphics card). GPT-2 is a transformer model that can generate all kinds of text, and depending on the data it is trained on, it can become an attractive model that writes novels, poems, songs, and more.

While thinking about what to train with Teachable NLP, Tolkien's world-famous novel The Lord of the Rings came to mind, and I found the text data on Kaggle.

1. Data!

THE RETURN OF THE KING

        _being the third part of
        The Lord of the Rings_

                       _Chapter 1_
        Minas Tirith

 Pippin looked out from the shelter of Gandalf's cloak. He wondered if he was awake or still sleeping, still in the swift-moving dream in which he had been wrapped so long since the great ride began. The dark world was rushing by and the wind sang loudly in his ears. He could see nothing but the wheeling stars, and away to his right vast shadows against the sky where the mountains of the South marched past. Sleepily he tried to reckon the times and stages of their journey, but his memory was drowsy and uncertain.
 There had been the first ride at terrible speed without a halt, and then in the dawn he had seen a pale gleam of gold, and they had come to the silent town and the great empty house on the hill. And hardly had they reached its shelter when the winged shadow had passed over once again, and men wilted with fear. But Gandalf had spoken soft words to him, and he had slept in a corner, tired but uneasy, dimly aware of comings and goings and of men talking and Gandalf giving orders. And then again riding, riding in the night. This was the second, no, the third night since he had looked in the Stone. And with that hideous memory he woke fully, and shivered, and the noise of the wind became filled with menacing voices.
 A light kindled in the sky, a blaze of yellow fire behind dark barriers Pippin cowered back, afraid for a moment, wondering into what dreadful country Gandalf was bearing him. He rubbed his eyes, and then he saw that it was the moon rising above the eastern shadows, now almost at the full. So the night was not yet old and for hours the dark journey would go on. He stirred and spoke.
 'Where are we, Gandalf?' he asked.
 'In the realm of Gondor,' the wizard answered. 'The land of An�rien is still passing by.'

There were three data files, and many accented characters were broken, appearing as "�".

I wanted to make a model that generates novel-like Lord of the Rings text. However, if I fed in the data as it was, the model would also learn the unnecessary spaces and words, so I decided to clean the data first.

First, I removed the titles and headings, which are not essential for training. If they were learned, the flow of the generated novel could break or awkward words could appear.

Second, I removed useless whitespace and line breaks. If the model learns bad whitespace, awkward spacing may appear in the generated results.

There weren’t many titles and headings, so I removed them by hand rather than writing code; the extra whitespace was removed with simple Python code, and then the three files were combined into one (see the sketch below).
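The whitespace cleanup only takes a few lines. Here is a minimal sketch of what such a script might look like; the file names are my assumption, so adjust them to match the Kaggle download:

import re

# Hypothetical file names -- adjust to the three books in the Kaggle dataset.
parts = [
    "01 - The Fellowship Of The Ring.txt",
    "02 - The Two Towers.txt",
    "03 - The Return Of The King.txt",
]

cleaned = []
for path in parts:
    # errors="replace" tolerates the broken accented bytes in the source files
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = re.sub(r"\s+", " ", line).strip()  # collapse runs of whitespace
            if line:  # drop lines that are now empty
                cleaned.append(line)

with open("lotr_combined.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(cleaned))

After cleaning and combining, the data looked like this: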

Hobbits are an unobtrusive but very ancient people, more numerous formerly than they are today; for they love peace and quiet and good tilled earth: a well-ordered and well-farmed countryside was their favourite haunt. They do not and did not understand or like machines more complicated than a forge-bellows, a water-mill, or a hand-loom, though they were skilful with tools. Even in ancient days they were, as a rule, shy of ‘the Big Folk’, as they call us, and now they avoid us with dismay and are becoming hard to find. They are quick of hearing and sharp-eyed, and though they are inclined to be fat and do not hurry unnecessarily, they are nonetheless nimble and deft in their movements. They possessed from the first the art of disappearing swiftly and silently, when large folk whom they do not wish to meet come blundering by; and this an they have developed until to Men it may seem magical. But Hobbits have never, in fact, studied magic of any kind, and their elusiveness is due solely to a professional skill that heredity and practice, and a close friendship with the earth, have rendered inimitable by bigger and clumsier races. For they are a little people, smaller than Dwarves: less tout and stocky, that is, even when they are not actually much shorter. Their height is variable, ranging between two and four feet of our measure. They seldom now reach three feet; but they hive dwindled, they say, and in ancient days they were taller. According to the Red Book, Bandobras Took (Bullroarer), son of Isengrim the Second, was four foot five and able to ride a horse. He was surpassed in all Hobbit records only by two famous characters of old; but that curious matter is dealt with in this book. As for the Hobbits of the Shire, with whom these tales are concerned, in the days of their peace and prosperity they were a merry folk. They dressed in bright colours, being notably fond of yellow and green; but they seldom wore shoes, since their feet had tough leathery soles and were clad in a thick curling hair, much like the hair of their heads, which was commonly brown. Thus, the only craft little practised among them was shoe-making; but they had long and skilful fingers and could make many other useful and comely things. Their faces were as a rule good-natured rather than beautiful, broad, bright-eyed, red-cheeked, with mouths apt to laughter, and to eating and drinking. And laugh they did, and eat, and drink, often and heartily, being fond of simple jests at all times, and of six meals a day (when they could get them). They were hospitable and delighted in parties, and in presents, which they gave away freely and eagerly accepted. It is plain indeed that in spite of later estrangement Hobbits are relatives of ours: far nearer to us than Elves, or even than Dwarves. Of old they spoke the languages of Men, after their own fashion, and liked and disliked much the same things as Men did. But what exactly our relationship is can no longer be discovered. The beginning of Hobbits lies far back in the Elder Days that are now lost and forgotten. Only the Elves still preserve any records of that vanished time, and their traditions are concerned almost entirely with their own history, in which Men appear seldom and Hobbits are not mentioned at all. Yet it is clear that Hobbits had, in fact, lived quietly in Middle-earth for many long years before other folk became even aware of them. And the world being after all full of strange creatures beyond count, these little people seemed of very little importance. 
But in the days of Bilbo, and of Frodo his heir, they suddenly became, by no wish of their own, both important and renowned, and troubled the counsels of the Wise and the Great.

2.5 MB of text data with 754 lines was created and is ready to be used for tuning! 1 MB of data is enough to fine-tune GPT-2.
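If you want to verify the size and line count of your own file, a quick check (using the combined-file name from the sketch above) could look like this:

import os

path = "lotr_combined.txt"  # the combined file from the sketch above
print(f"{os.path.getsize(path) / 1024 / 1024:.1f} MB")
print(sum(1 for _ in open(path, encoding="utf-8")), "lines")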

2. Training!

I uploaded the data to Teachable NLP and started tuning. At first I wanted to use large (the GPT-2 large model), but it would have taken a long time, so I chose medium (the GPT-2 medium model) as the ModelType, with 5 epochs, hoping for good results. I expected even medium to take a while, but tuning finished in about 30 minutes.

3. Generate!

The hobbits started heading towards the tower of ��owyn, as soon as the gates were shut again.
They found her still clinging to the lex in her hurry. ‘I am not a Black Rider, �omer,’ she said.

After training was completed, I clicked “Test your model” and tested the model via Tabtab.

An interesting, Lord of the Rings-style text was generated, but some of the words contain "�", which doesn't look good.

4. Training again!

I looked for a way to automatically restore the broken characters to their original letters, but there was no such module or method. Simply deleting them would not make a true Lord of the Rings either, so I googled the broken words and inferred their originals by analogy.
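Before fixing them, it helps to list exactly which words are affected, so that each broken name only has to be deciphered once. A minimal sketch, reusing the combined-file name assumed earlier:

import re
from collections import Counter

with open("lotr_combined.txt", encoding="utf-8") as f:
    text = f.read()

# Count every token that contains the U+FFFD replacement character ("�")
broken = Counter(re.findall(r"\S*\uFFFD\S*", text))
for word, count in broken.most_common():
    print(word, count)

With the list in hand, each broken name can be mapped back to its original: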

line = line.replace("D�nedain", "Dunedain")
line = line.replace("N�menor", "Numenor")
line = line.replace("Sm�agol", "Smeagol")
line = line.replace("D�agol", "Deagol")
line = line.replace("Tin�viel", "Tinuviel")
line = line.replace("L�thien", "Luthien")
line = line.replace("E�rendil", "Earendil")
line = line.replace("Und�miel", "Undomiel")
...

I found the original spellings and corrected the broken characters using Python's replace. Even if you don't know Python, you can make the same corrections with the find-and-replace function of editors such as Notepad, Google Docs, or vi.
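Put together, the whole fix fits in one short script. This is a sketch under the same assumed file names as before, with the replacement table abridged:

# Broken spelling -> inferred original (abridged; extend with the rest
# of the table above).
replacements = {
    "D�nedain": "Dunedain",
    "N�menor": "Numenor",
    "Sm�agol": "Smeagol",
    # ...
}

with open("lotr_combined.txt", encoding="utf-8") as f:
    text = f.read()

for broken, fixed in replacements.items():
    text = text.replace(broken, fixed)

with open("lotr_fixed.txt", "w", encoding="utf-8") as f:
    f.write(text)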

Once again, I uploaded the data to Teachable NLP and tuned it, again with the medium ModelType and 5 epochs.

5. Generate again!

The hobbits started heading towards the tower of The eastern window of the inn was open, and a crowd was there waiting for them inside.
‘Be careful of your words, Sam!’ said Strider in a low voice. 'I have not seen the Lady of the Galadhrim in seven days, nor spoken with the Lady again.
Some have accused me of sorcery, but I have not studied it.

Again, I tested the model via Tabtab. The broken words were gone, and the output was much more plausible.

How about the results? Don't they seem plausible? Try the model I made and write your own Lord of the Rings.
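If you would rather call the model from code than through Tabtab, the Ainize "View API" link above exposes an HTTP endpoint. The sketch below is only illustrative: the URL and the request fields are my assumptions, so copy the real ones from the API page:

import requests

# Hypothetical endpoint and payload -- check the actual Ainize "View API"
# page of the deployed model for the real URL and request format.
API_URL = "https://your-model.ainize.app/predictions/gpt-2"

response = requests.post(API_URL, json={
    "text": "The hobbits started heading towards",  # the prompt
    "length": 50,
})
response.raise_for_status()
print(response.json())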

And try making a model from your own favorite novels, too. You will be able to create a model that writes fun fanfics!




[Korean ver.] Korean version of this tutorial
