A beginner’s grapplings with Natural Language Processing

Amelia Elton
5 min read · Nov 9, 2020


This article seeks to simplify the complex field of Natural Language Processing, explore how it can be used in everyday life, and look ahead to its future.

Woman, sitting at a desk with many documents, rubbing her head.
Photo by Birmingham Museums Trust on Unsplash

Human speech is messy. From the changing colloquialisms of everyday conversation to irregular verbs that break the rules for no understandable reason, human speech is the opposite of the formally structured language of computers.

It should be no surprise, then, that there is a whole field (and many subfields within) of computer science dedicated to bridging this divide between humans and computers.

What is Natural Language Processing? (NLP)

Natural Language Processing (NLP) is a field of computer science that allows computers to analyze language for its meaning, rather than see it as simply a collection of strings.

To visualize this, compare the human question “How much does a blue whale weigh when it is born?” with the computer-friendly subject headings “blue whale” and “birth weight”.

Natural Language Processing allows computers to learn, process that learning, and then manipulate language in a way that mirrors human communication. A computer should not merely be able to understand human communication, but also to replicate and manipulate it. To answer the question above, a computer well versed in NLP could reply “A blue whale weighs up to three tons at birth”. An untrained computer would merely trawl a database (provided you structured your search terms “blue whale” and “birth weight” correctly) and output a list of articles with matching keywords.
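To make the “untrained computer” concrete, here is a minimal Python sketch of plain keyword matching. The article titles and the keyword_search function are invented for illustration; a real search system would be far more involved.

```python
# Hypothetical mini "database" of article titles (invented for illustration).
ARTICLES = [
    "Blue whale birth weight and calf growth",
    "Blue whale migration routes",
    "Average birth weight of African elephants",
]


def keyword_search(terms, articles):
    """Return every article whose title contains all of the search terms."""
    hits = []
    for title in articles:
        lowered = title.lower()
        if all(term.lower() in lowered for term in terms):
            hits.append(title)
    return hits


# Carefully structured search terms find the matching article.
structured = keyword_search(["blue whale", "birth weight"], ARTICLES)
print(structured)

# The same question asked the human way matches nothing at all.
question = "How much does a blue whale weigh when it is born?"
natural = keyword_search([question], ARTICLES)
print(natural)
```

The matcher only compares strings. It returns a list of titles when the terms are phrased exactly right, and it never produces an answer of its own.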

Back to the beginning

Before we look at modern uses of Natural Language Processing, let’s take a look at its origins.

Natural Language Processing dates back to the 1940s and World War II. Researchers such as Warren Weaver and Alan Turing (and countless others) realized the potential of computers to recognize and break enemy codes. What followed was the field of machine translation, an attempt to build computer systems with a dictionary-like lookup that followed the word-order rules of a given language.

While dissecting sentences and looking up their words is a good way to break a code, it gets complicated quickly when applied to natural conversation. Remember, language is messy.

A simple sentence such as “The cat ate a bat” quickly becomes confusing when you try to dissect it literally, with no outside knowledge.

The cat = 😸

ate = IRREGULAR past tense of eat

a bat = 🦇 … 🏏 … ?

With no understanding of context, our cat could be bringing us the gift of a dead animal bat, or a dead cricket/baseball bat. Our verb is also irregular, which means we would have to explicitly tell our computer that “ate” is the past tense of “eat” before it encountered this sentence.
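The dictionary-style lookup described above can be sketched in a few lines of Python. This is an illustration only, not a reconstruction of any historical system: the LEXICON and IRREGULAR_PAST tables and the look_up function are hypothetical and hand-built for this one sentence.

```python
# Hypothetical hand-built dictionary: each word maps to every sense we listed.
LEXICON = {
    "the": ["determiner"],
    "cat": ["feline animal"],
    "a": ["determiner"],
    "bat": ["flying mammal", "piece of sports equipment"],
    "eat": ["consume food"],
}

# Irregular forms must be listed by hand; no spelling rule turns "ate" into "eat".
IRREGULAR_PAST = {"ate": "eat"}


def look_up(sentence):
    """Map each word to its list of possible senses, ignoring context entirely."""
    senses = {}
    for word in sentence.lower().split():
        base = IRREGULAR_PAST.get(word, word)
        senses[word] = LEXICON.get(base, ["unknown"])
    return senses


result = look_up("The cat ate a bat")
for word, options in result.items():
    print(word, "->", options)
```

The lookup comes back with two senses for “bat” and no way to choose between them, and it only recognizes “ate” because that form was entered in advance.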

Continuing with our cat references, the sentence “The cat’s out of the bag!” would, taken literally, leave a computer searching for our runaway cat (how did it escape the protection of the bag???) rather than feeling betrayed because a secret was shared.

Thankfully for our cats, there have been many developments in the world of Natural Language Processing since its inception. With help from the fields of linguistics and statistics, NLP has become an extremely complex subfield of Artificial Intelligence, and has led to many modern-day technologies.

Photo by visuals on Unsplash

“Siri… what’s the weather like today?”

Natural Language Processing is seamlessly integrated with many tools used in everyday life. If Siri/Alexa is the first person 🤖 you interact with when you wake up, you can thank NLP! You can also see its influence almost daily in the examples below:

  • Grammar check
  • Speech to text
  • Automatic translation
  • Predictive typing (seen in search engines)
  • Email spam detection
  • Chat bots (the friendly chat option provided as customer services on many websites)
  • Smart assistants (Siri, Alexa)

And so many more!

Why now?

Thankfully for computers, there is now an enormous amount of data available to learn from. Blogs, news sites, social media posts, Google searches, and emails are just the tip of the iceberg of information being circulated on the web.

These resources provide computers with a massive archive of human communication.

Image by Raconteur

For perspective, my laptop is currently holding 148.57 GB of data. That is an accumulation of around four years of documents, photos, and goodness knows what else, which would take me a good chunk of a week to review. 148.57 GB is a mere 0.004% of the amount of data being created in a single day by Facebook alone.
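As a quick sanity check, the two figures above can be turned around to see what daily volume they imply. This sketch uses only the numbers quoted in the paragraph and assumes decimal units (1 PB = 1,000,000 GB); the variable names are illustrative.

```python
laptop_gb = 148.57      # laptop contents, as quoted above
share_percent = 0.004   # laptop's share of one day of Facebook data, as quoted above

# If the laptop is 0.004% of the daily total, the total is laptop / 0.00004.
implied_daily_gb = laptop_gb / (share_percent / 100)

# Assumption: decimal units, so 1 PB = 1,000,000 GB.
implied_daily_pb = implied_daily_gb / 1_000_000

print(f"Implied daily volume: about {implied_daily_pb:.1f} PB")
```

Under those assumptions the comparison implies a few petabytes of new data every day, tens of thousands of times the contents of the laptop.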

With that amount of data circulating, and only more to come, Natural Language Processing is a necessary tool to navigate current and future communication.

Consider the following:

A company, trying to keep track of customer satisfaction rates
A human, or even a team of people, would be completely overwhelmed when tasked with navigating the world of online reviews. Sentiment analysis can detect opinions in text to ensure companies are up to date with their customers’ satisfaction. In a world where media influencers hold consumer attention, it is vital for companies to have an eye on customer mood.
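Detecting opinions in text (commonly called sentiment analysis) can be sketched very roughly by counting positive and negative words. The word lists and the sentiment function below are hypothetical and deliberately tiny; production systems typically rely on trained statistical models instead of a fixed list.

```python
import re

# Hypothetical word lists, invented for illustration only.
POSITIVE = {"great", "love", "excellent", "fast", "friendly"}
NEGATIVE = {"bad", "slow", "broken", "rude", "terrible"}


def sentiment(review):
    """Label a review by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great service and friendly staff!",
    "The delivery was slow and the box arrived broken.",
    "It arrived on Tuesday.",
]
labels = [sentiment(r) for r in reviews]
for review, label in zip(reviews, labels):
    print(label, "|", review)
```

A word counter like this misses sarcasm, negation (“not great”), and context, which is exactly the messiness the rest of this article describes.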

Social media platforms, attempting to quell disinformation
Natural Language Processing uses algorithms to catch speech and data patterns common in misinformation campaigns to flag articles. This allows media staff to focus on specific articles, without having to review every piece of news being posted. With disinformation holding sway in things as important as the Presidential race, this is an area that cannot be overlooked.

The potential for user-oriented NLP is broad, ranging from customer service and satisfaction to banking, with room to tackle increasingly complex queries.

With the world moving further and further online, Natural Language Processing will only become more important in making sense of all the data being produced.
