By Raymond Willey

Building a Deep Neural Network to Analyze Presidential Speeches

Updated: Oct 25, 2019

If there shall be any novelty, it will be in the mode of presenting the facts, and the inferences and observations following that presentation. –Abraham Lincoln

A more technical analysis of this project, with more detail on the underlying code and methodology, can be found on Towards Data Science. Links to the relevant Jupyter Notebooks can be found in the references at the end.

 

Transcript:

Hello everyone, and welcome to information overload! Today, we are going to be looking at how an artificial neural network can be applied to natural language processing for a classification task. Wow, that was a mouthful! Let’s translate this into English.


The human brain is itself a neural network. It is made up of billions of interconnected neurons, each firing in response to some sort of external stimulus to trigger a reaction. Consider the following example: imagine you're sitting at home and you hear sirens outside, and in your experience, whenever you've heard sirens, it has either been in response to a fire or a car accident. So you have this external event which triggers a series of responses in your brain, leading you to figure there's probably been a fire or car accident. A few moments later, you smell smoke, which could be due to an actual fire, but could also be a coincidence: your downstairs neighbors are just burning their dinner. Nevertheless, a new series of neurons in your brain goes off, and the likelihood of it being a fire goes up. Finally, you look outside and you see smoke; another line of neurons fires, and they all lead to the same conclusion: there is a fire. Of course, you aren't born with neurons designed to recognize firetrucks or smoke; it's something you learn through experience. And there aren't just a few neurons, but billions in your brain that are constantly going off in response to all sorts of stimuli. Well, artificial neural networks are designed to work the same way, in principle. With hundreds or thousands of artificial neurons and millions of connections, artificial networks are built and optimized for specific tasks.

So, let's say we want to build a network that can take on a task that was at one time considered to be uniquely human: speech and language recognition (otherwise known as Natural Language Processing, or NLP). From the time we are born, we start to learn language. It begins with simple recognition of sounds and characters, proceeds through trial and error, and continues with more formal education and training to the point where we can recognize more than just words and sentences: we can recognize styles.


Consider two well-known yet very distinct filmmakers: Wes Anderson and Quentin Tarantino. Let's have a look at a few quotes and see if you can identify whose script each comes from:

“If you shoot me in a dream, you better wake up and apologize.”

No surprise that this comes from a Tarantino classic: Reservoir Dogs. How about this one?

“I love you, but you don’t know what you’re talking about.”

This one comes from Anderson's Moonrise Kingdom. From these two quotes alone, we can see a clear distinction between the two. Ok, last one:

“Before we attack each other and tear ourselves to shreds like a pack of maniacs, let's just open the sack first and see what's actually in it. It might not even be worth the trouble.”

Well, this one is a bit trickier. My wife is a big fan of both filmmakers, but we haven't yet seen this movie: she guessed it was a Tarantino quote, but it is in fact from Anderson's Isle of Dogs. What is interesting here is that, while the first two quotes provide insight into how these directors differ, the third points to their similarities.


So, this gives you a sense of how your brain, a neural network, handles a natural language classification task. The challenge we have is to see if we can build and train an artificial network to do the same thing and provide the same type of insight. Only, instead of looking at movie quotes, our network is going to learn from presidential speeches. Can a network read a subset of passages from two presidents’ speech corpora, and use them to correctly identify the speaker of passages it has never seen before? Well, let’s find out.


To start, we want to look at two individuals with whom most people have strong familiarity, and who are known to be near opposites. This will make it easier for us to interpret the qualitative aspects of the results, allowing us to derive better insight into the strengths and weaknesses of our network before applying it to presidents who are less familiar. Where better to start than with our two most recent presidential candidates: Donald Trump and Hillary Clinton. We are only looking at speeches each candidate made on the 2016 campaign trail. Our network will take data in 50- to 100-word chunks, training on half of the data, then using what it has learned to correctly identify the speaker of the rest.
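The chunking step described above might look something like this in Python; the function and the chunk-size cutoffs here are an illustrative sketch, not the project's actual code:

```python
def chunk_transcript(text, min_words=50, max_words=100):
    """Split a speech transcript into passages of 50-100 words.

    Walks the word list in fixed strides, keeping a trailing chunk
    only if it meets the minimum length. Sizes are illustrative.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        chunk = words[start:start + max_words]
        if len(chunk) >= min_words:
            chunks.append(" ".join(chunk))
    return chunks

# A 220-word dummy speech yields two 100-word chunks; the trailing
# 20 words are dropped for falling under the 50-word minimum.
speech = " ".join(["word"] * 220)
passages = chunk_transcript(speech)
print(len(passages))  # 2
```

Each chunk then becomes one labeled training or test example for the models below.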

Now, to truly gauge how well our network performs, we want to establish a benchmark using traditional machine learning methods: because artificial neural networks are much more resource intensive, we should expect them to at least outperform traditional models. The best of three traditional model performances was chosen as the benchmark, and in this case a logistic regression was able to achieve an accuracy score of 78.5%.
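As a sketch of what such a benchmark could look like: the post doesn't say which text features were used, so TF-IDF with scikit-learn is assumed here, and the passages and labels are toy stand-ins for the real speech chunks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy passages standing in for the real 50-100 word speech chunks.
passages = [
    "we will build great things believe me",
    "tremendous deals and winning so much winning",
    "small business owners deserve our support",
    "young people need opportunity and education",
] * 10
labels = [1, 1, 0, 0] * 10  # toy labels: 1 = Trump, 0 = Clinton

# Benchmark model: TF-IDF features into a logistic regression,
# trained on half the data as described in the post.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
half = len(passages) // 2
model.fit(passages[:half], labels[:half])
accuracy = model.score(passages[half:], labels[half:])
print(round(accuracy, 3))
```

On real data, the held-out accuracy of a pipeline like this is the number the neural networks have to beat.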

So, how do neural networks compare? Well, two general types of networks were trained and tested. The first is what’s called a Recurrent Neural Network (RNN for short), which has an architecture that has been well optimized and is frequently used for these types of tasks. Sixteen different variations of this architecture were applied, and it was able to achieve an accuracy score of 93.1%! Not bad for a first run!
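The post doesn't name a framework or give the architecture, but a minimal recurrent classifier of this kind might be sketched in Keras as follows; the vocabulary size, sequence length, and layer sizes are all illustrative assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 5000  # illustrative vocabulary size
MAX_LEN = 100      # passages are at most 100 words

# Word indices -> embeddings -> LSTM -> sigmoid speaker probability.
model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# A dummy batch of 8 padded passages, just to show the shapes.
x = np.random.randint(0, VOCAB_SIZE, size=(8, MAX_LEN))
preds = model.predict(x, verbose=0)
print(preds.shape)  # (8, 1)
```

The sixteen variations mentioned above would come from tuning choices like the embedding size, number of recurrent units, and dropout.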


The other model applied was a Convolutional Neural Network (or CNN), which is traditionally used for image recognition but has recently begun to show promise in the NLP space as well. These are considerably lighter to run than Recurrent networks, but the architecture used also has more parameters that can be adjusted, so 72 variations of this model were trained and tested. So, how did our Convolutional network do? Well, not quite as well as the Recurrent network, but it still achieved a 91.7% accuracy score. Both of our models demonstrated significant improvement over our benchmark, but for a deeper analysis of the results, we'll use the predictions made by our top-performing model.
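Again as an illustrative sketch in Keras, with made-up layer sizes rather than the project's actual configuration, a convolutional text classifier of this kind slides filters over the word embeddings and keeps the strongest response via max-pooling:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 5000  # illustrative vocabulary size
MAX_LEN = 100      # passages are at most 100 words

# Embeddings -> 1-D convolution over word positions -> global
# max-pooling -> sigmoid speaker probability.
model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),
    layers.Conv1D(filters=64, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

x = np.random.randint(0, VOCAB_SIZE, size=(8, MAX_LEN))
cnn_preds = model.predict(x, verbose=0)
print(cnn_preds.shape)  # (8, 1)
```

The larger hyperparameter space (filter count, kernel size, pooling strategy) is what drives the 72 variations mentioned above.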


Now, in total, 1,780 predictions were made, of which 123 were inaccurate. It's worth noting that there are two ways in which an error can be made: either by predicting that Trump said something that was actually said by Clinton, or that a passage was spoken by Clinton when it was in fact Trump. From the upper-right and lower-left quadrants of the chart, we can see that the division of these error types is well balanced. With that in mind, let's have a look at an example of a mistake that our network made, and see if you can correctly identify the speaker:
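Tallying the two error types is straightforward; the labels below are toy values for illustration, not the actual predictions:

```python
# Off-diagonal counts of a two-class confusion matrix: passages
# Clinton spoke that the model attributed to Trump, and vice versa.
actual    = ["clinton", "trump", "clinton", "trump", "clinton", "trump"]
predicted = ["trump",   "trump", "clinton", "clinton", "clinton", "trump"]

clinton_as_trump = sum(
    1 for a, p in zip(actual, predicted) if a == "clinton" and p == "trump"
)
trump_as_clinton = sum(
    1 for a, p in zip(actual, predicted) if a == "trump" and p == "clinton"
)
print(clinton_as_trump, trump_as_clinton)  # 1 1
```

A balanced split between these two counts, as in the real results, means the model isn't systematically biased toward either speaker.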


But there are common sense things that your government could do that would give Americans more opportunities to succeed. Why don't we do it? Because powerful special interests and the tendency to put ideology ahead of political progress has led to gridlock in Congress. And how can you not be frustrated, and even angry, when you see nothing getting done? And a lot of people feel no one is on their side and no one has their back and that is not how it's supposed to be in America.”

So, is this Trump or Clinton? Given that Trump largely ran on a platform that was based on shaking things up in Washington, it wouldn’t be far-fetched to guess that this quote came from him. Unfortunately, such a guess would be incorrect, as this one comes from Hillary Clinton. This result may or may not surprise you, but the main takeaway here should be that this type of quote highlights an area where the two candidates seemingly have something in common. Still, an accuracy score of over 93% indicates that these candidates have a lot more that divides them than unifies them. So, let’s see if we can get a sense of their differences by looking at their word usage.


Now, rather than looking at just the most common words for each candidate, we want to look at the words used more frequently by one candidate relative to the other. Let's start with Hillary Clinton. Here, we see frequent references to small business and young people, as well as references to service and the Senate, likely reflecting topics related to her experience in office. That sounds about right. Let's have a look at Trump.

What jumps out the most here is that we see a lot of direct references to Clinton, more so than any other topic. While we do also see references to immigration, trade deficits, and regulations, their prominence is much less pronounced than one might expect.
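One simple way to score "words used more frequently by one candidate relative to the other" is a ratio of per-corpus rates with smoothing; the corpora below are toy strings, and the exact scoring used in the project may well differ:

```python
from collections import Counter

# Toy stand-ins for the two speech corpora.
clinton = "small business young people service senate small business"
trump = "clinton clinton immigration trade deficit clinton regulations"

def relative_freq(own_text, other_text):
    """Score each word by how much more often one speaker uses it
    than the other, with add-one smoothing so words the other
    speaker never uses don't divide by zero."""
    own, other = Counter(own_text.split()), Counter(other_text.split())
    total_own, total_other = sum(own.values()), sum(other.values())
    return {
        w: (own[w] / total_own) / ((other[w] + 1) / (total_other + 1))
        for w in own
    }

scores = relative_freq(trump, clinton)
top = max(scores, key=scores.get)
print(top)  # 'clinton'
```

Feeding these scores into a word cloud is what produces the comparisons discussed here.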


Now, we were of course all there to witness the campaign, so there’s not a whole lot of sense in taking a deep dive here. What we’re really trying to do is perform a bit of a sanity check to see if our results make sense.


However, there’s one other set of words I think is worth looking at, and that is the most common words that exist within the space of prediction errors. Can we identify common themes that seem to be a bit more universal?

Here we can see certain words and phrases that are more or less par for the course for any politician running for president: American flag, work together, etc. Now, the word opportunity led us to that quote from Hillary earlier, so let's try another one: infrastructure. Here's a quote from Donald Trump.


“And we're gonna use all of that money to invest in the infrastructure and we could even say the environmental infrastructure of our country. And remember this, remember this. When it comes to environment because I receive many environmental awards, you know, people don't wanna talk about that, that's OK. But we need absolutely crystal clear and clean water.”

Now, I don't know about you, but I don't exactly think of Donald Trump as being synonymous with environmental infrastructure and clean energy. But taken at face value, it's easy to understand why our network predicted that this quote came from Clinton. The point here is that our network is shining a spotlight on potential areas of common ground: it's still up to us to draw our own conclusions from the results. Unfortunately, we're not yet at a point where all the errors made are this easy to understand, so there's still more work to be done. As our networks are tuned and refined, and accuracy scores are driven even higher, we can give that spotlight more and more focus, thereby reducing our search area.


Before we get to that though, let's apply this process to other presidential pairings and see what we get. Starting with modern history, let's have a look at Barack Obama and Ronald Reagan. We can see that traditional methods were able to make predictions with an accuracy rate of just under 75%, while a Recurrent network was able to achieve a rate of 87.6%! This is also a significant improvement, but the lower score relative to Clinton and Trump suggests that these two may have a bit more in common than their modern counterparts. If we have a look at their word usage, we can get a sense of the themes related to their respective presidencies. With Obama we see a big focus on health care, whereas with Reagan the focus is on the Soviet Union and Cold War. Now, we're not going to go into the same level of detail here as we did with Trump and Clinton; we're just taking a quick look at some examples. But you can find links to the respective Jupyter Notebooks below if you'd like to dive deeper.

Next, let’s have a look at two of the biggest names in the nation’s history: Lincoln and Washington. Here we see a Convolutional Neural Network produced the best results, though it only slightly outperformed traditional models. Of course, with Lincoln we see clear references to slavery, so no surprises there. Yet with Washington, identifying any clear themes is a bit more difficult. It leads one to suspect that the aim of his presidency, at least from the perspective of public persona, was to keep the country united, trying to not rock the boat too much while the country was in its infancy.

Let's have a look at one more: Teddy Roosevelt and Andrew Jackson. Here we find our highest-performing neural network, with a balanced accuracy score of 94.3% and a great improvement over the benchmark. And as we look at the differences in word usage, a certain theme seems to be emerging: the greater the distance in time between presidencies, the more the differences seem to reflect the times rather than the individuals. Now, keep in mind this is just a first impression, and when we use these observations in conjunction with network predictions, we can start getting a more definitive picture.

So, where do we go from here? What can we do with these models? Well, the idea is that we can apply them to any two politicians or individuals in search of areas of common ground. In a country divided, I think it's good to look for opportunities to come together. Unfortunately, the model isn't quite ready for prime time and will need refinement to improve both accuracy and efficiency. In order to do that, we need to gather more data so that we can get a clearer sense of which models and hyperparameters yield the most consistent results, as well as what type of data features are best suited for this type of analysis. So, I went ahead and applied randomly selected models to 500 randomly selected pairs of presidents, and will share some of the initial findings here.


First off, one big issue is data imbalance. On one end of the spectrum we have Lyndon Johnson, who gave 71 speeches and for whom we have by far the most data, and on the other we have James Garfield, who was assassinated after 100 days in office and whose only documented speech was his inaugural address. If we wanted to perform a fair comparison of the two, we would have to select a sample of passages from Johnson that is equivalent in size to Garfield's. As you can imagine, this results in a great loss of data and creates a number of potential problems.
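Downsampling the larger corpus to match the smaller one can be sketched like this; the corpus sizes below are illustrative, not the real passage counts:

```python
import random

# Toy passage lists standing in for the two speakers' corpora.
johnson_passages = [f"johnson passage {i}" for i in range(2000)]
garfield_passages = [f"garfield passage {i}" for i in range(40)]

random.seed(42)  # fixed seed so the sample is reproducible

# Randomly sample the larger corpus down to the smaller one's size,
# discarding the rest; this is the data loss described above.
n = min(len(johnson_passages), len(garfield_passages))
balanced_johnson = random.sample(johnson_passages, n)

print(len(balanced_johnson), len(garfield_passages))  # 40 40
```

Throwing away 98% of one speaker's data this way is exactly why such small pairings produce the unstable scores discussed next.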

With that in mind, let’s see how sample size impacts accuracy. As we can see here, absolute network performance, as well as performance relative to the benchmark, greatly suffers when we have fewer than 500 samples. We see lower average scores and greater variation in the results, indicating that for such small samples, those results represent little more than random chance. This is something that needs to be taken into consideration when comparing other politicians in the future.

The other items we want to look at are network hyperparameters and architecture. As mentioned before, the two general types of models used were Recurrent and Convolutional networks. However, there were two different types of Recurrent networks used: one uses Long Short-Term Memory (LSTM) neurons, and the other uses Gated Recurrent Units (otherwise known as GRUs). So, let's see how the performance of these models compares:

As we can see, LSTM and GRU models greatly outperformed Convolutional networks, varying only slightly from one another. The Convolutional networks performed rather poorly overall. However, what is interesting is that these networks were still found to be the best performer in over 20% of the samples selected. This could mean that there is only a limited set of hyperparameters which make such models effective, or it could mean that these models perform better only when specific sets of data features are present. Either way, this is worth further exploration. Much work remains to be done, but the good news is that there is a clear path forward.


So, with that, there is one last thing I’d like to leave you with. When performing the analysis on all 500 pairs of presidents, I kept track of the prediction errors when our model achieved sufficient accuracy scores. With this, we can create one last word cloud that you might think of as representing some of the common themes throughout the history of US presidential politics. I hope you like it.

In the meantime, you can find links to all the notebooks below, as well as a more technical analysis of the work done in this project, along with images of all word clouds created if you want to explore further. Also, I have included a link to a Jupyter Notebook that will allow you to run an analysis on any two presidents you like if you want to give that a shot. If you have no idea what a Jupyter Notebook is, but are curious how any specific two presidents compare, feel free to let me know via email or in the comments, and I’ll be happy to produce those results for you. Thanks for watching!

 

References

A technical blog post with more detail on the code and methodology behind this analysis can be found on Towards Data Science.


The repository for this project, with all relevant Jupyter Notebooks, can be found on GitHub. One notebook, Choose_Your_Own.ipynb, will allow you to perform an analysis on any two presidents you like, assuming you are already familiar with Jupyter Notebooks.
