The last time I did anything serious in natural language processing was during my days at RKMVERI,
back in 2020, probably while selecting a project for my ML-1 course. At the time, I thought
language generation was a cool thing, and people were using all sorts of RNNs, LSTMs, bidirectional
LSTMs, and whatnot to generate and summarize text, and even for sentence completion. I wasn't
fascinated by NLP tasks, since the ideas didn't come naturally to me; if I had taken a linguistics
course, I might have taken a keener interest. The only idea that did grab me was story generation.
I didn't have the requisite knowledge to turn it into reality, since I was only starting to read ML
at that time, but I was fascinated by the way the human mind compresses data and spits out
interesting, sometimes randomly connected stories during REM sleep. I thought that if I took some
4-5 keywords, generated the nearest tokens to connect them, and made 3-4 passes to interpolate
more interesting ideas in between, I could perhaps come up with a story generation mechanism.
Later, when I discussed the idea with my ML-1 mentor, he said that plenty of people at Ivy League
schools were already trying to do this, so he didn't recommend investing time in something that
would take longer than the course project itself, since I could only afford about 3-4 months. So I
worked on a simpler task instead: song generation. After 2-3 days of collecting Keras code from
Kaggle, piecing together a pipeline, and repeated failed attempts, I managed to train a
bidirectional LSTM to complete a line of a song and generate about 1024 more characters. The moment
training finished and I gave it some randomly rhyming words to complete, it generated completely
unrelated rhyming garbage, with a lot of slang. I still remember giving it a prompt to complete a
song. I started with:
“I was a little boy” and the output it generated was:
“I was a little boy, she was a little bitch ...”
followed by those slurs repeated again and again.
The model was doing something I hadn't expected it to do at all: it was generating curse-word
songs, and there was no way I could present that to my mentor, so I gave up on the project and
started working on image segmentation in the medical domain. Later I found out that the dataset I
had used for training was full of curse words, and the model had simply learned to mimic it. I was
a bit disappointed, but I figured that this is how things work in ML and that I should have been
more careful while selecting the dataset. By then it was too late, though; I had already invested a
lot of time in medical image segmentation.
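For the curious, here is roughly the shape of that Kaggle-stitched pipeline. This is a minimal
sketch from memory, not the original notebook: the `lyrics.txt` corpus path, the hyperparameters,
and the sampling temperature are all placeholders.

```python
import numpy as np
from tensorflow.keras import layers, models

# Character-level corpus; "lyrics.txt" is a placeholder for whatever song dataset you scrape.
text = open("lyrics.txt", encoding="utf-8").read().lower()
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = np.array(chars)

# Slice the corpus into fixed-length windows, each labeled with its next character.
seq_len, step = 40, 3
X = np.array([[char2idx[c] for c in text[i:i + seq_len]]
              for i in range(0, len(text) - seq_len, step)])
y = np.array([char2idx[text[i + seq_len]]
              for i in range(0, len(text) - seq_len, step)])

# Bidirectional LSTM over the window, softmax over the character vocabulary.
model = models.Sequential([
    layers.Embedding(len(chars), 64),
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=10)

def generate(seed, n_chars=1024, temperature=0.8):
    """Complete `seed` by sampling one character at a time from the model."""
    out = seed.lower()
    for _ in range(n_chars):
        window = [char2idx.get(c, 0) for c in out[-seq_len:]]
        window = [0] * (seq_len - len(window)) + window  # left-pad short seeds
        probs = model.predict(np.array([window]), verbose=0)[0].astype("float64")
        probs = np.exp(np.log(probs + 1e-9) / temperature)
        probs /= probs.sum()
        out += idx2char[np.random.choice(len(chars), p=probs)]
    return out

print(generate("I was a little boy"))
```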
That song-generation experiment was the last time I touched NLP code seriously, and now I am
thinking of catching up on all the things I ditched back then. After the arrival of LLMs for
chatting (ChatGPT and the like), and after hearing Geoffrey Hinton's and Ilya Sutskever's
discussions, I thought LLMs might think the way humans think, that they might even be a bit
conscious; but after working with transformers for a while, I think that is complete nonsense.
The very idea that something like language might someday awaken consciousness in a machine feels
disgusting to me. These models we call LLMs are essentially ogres: they eat and compress a lot of
data into their latent manifold and spit out interpolated, sometimes weird, unrelated, mispredicted
stuff (what we nowadays call hallucination) in response to a query. That works for most of the
usual, mundane tasks at hand, but I don't think you can have something like
mind/consciousness/reasoning in machines. They might simulate it, but not like humans, who
actually reason their way to solutions of complex problems.
Actually, I don't even think reasoning or consciousness can be captured in a mathematical
representation; they are simply things that can't be done algorithmically (an algorithm being, in
essence, a mathematical construct). This contradicts the position I took in my previous blog post,
but that debate could go on endlessly.
So, with all this background yapping out of the way, let's dive into the PyTorch code, where we
talk about the Transformer paper, implement a translation model, and break down some of its
essential components.