Text Bag of Words with Naive Bayes

Chris Tralie

2016 Presidential Debates

First, we load the first two debates and use them to build two bag of words models: one for Trump and one for Clinton. We store the models in a list, indexed by speaker, so that the same code handles both candidates without being repeated.
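As a rough illustration of this step, here is a minimal sketch of building one model per speaker. The short strings in `training` are hypothetical stand-ins for the text of the first two debates, and the helper names `tokenize` and `build_model`, along with the use of add-one smoothing, are assumptions made for this example rather than the original notebook's code.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_model(texts, vocab):
    """Return the log-probability of each vocabulary word for one speaker."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Add-one smoothing (an assumption): every word gets one extra count,
    # so words a speaker never said still have nonzero probability
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: math.log((counts[w] + 1) / total) for w in vocab}

# Hypothetical stand-ins for what each speaker said in the first two debates
speakers = ["Trump", "Clinton"]
training = [
    ["we will build a great wall", "we will win so much"],
    ["we will invest in the middle class", "we are stronger together"],
]

# Shared vocabulary: every word that either speaker used
vocab = sorted({w for texts in training for t in texts for w in tokenize(t)})

# One model per speaker, in the same order as `speakers`
models = [build_model(texts, vocab) for texts in training]

for name, model in zip(speakers, models):
    print(name, "log P(wall) =", round(model["wall"], 3))
```

Keeping `speakers`, `training`, and `models` as parallel lists means a single loop or list comprehension covers both candidates, and adding a third speaker would only require extending the lists.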

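Next, we apply the models to each example in the third debate, and we get nearly perfect accuracy! In the process, we construct something called a "confusion matrix," where each row corresponds to the true class of the example we're classifying and each column corresponds to the class the model predicted. The entry in row i, column j counts how many examples of true class i were classified as class j, so correct classifications land on the diagonal and mistakes land off of it. The more the counts are concentrated along the diagonal, the better.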
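The classification step and the confusion matrix can be sketched as follows. This is a self-contained toy version, not the original notebook's code: the training and test strings are hypothetical stand-ins for the debate transcripts, the helper names are made up for illustration, and the classifier assumes equal prior probabilities for the two speakers and skips words outside the training vocabulary.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_model(texts, vocab):
    """Log-probability of each vocabulary word, with add-one smoothing."""
    counts = Counter(w for t in texts for w in tokenize(t))
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: math.log((counts[w] + 1) / total) for w in vocab}

def classify(text, models, vocab):
    """Return the index of the model with the highest log-likelihood.

    Assumes equal priors; words outside the vocabulary are ignored.
    """
    words = [w for w in tokenize(text) if w in vocab]
    scores = [sum(model[w] for w in words) for model in models]
    return scores.index(max(scores))

# Hypothetical stand-ins for the first two debates (training data)
speakers = ["Trump", "Clinton"]
training = [
    ["we will build a great wall", "we will win so much"],
    ["we will invest in the middle class", "we are stronger together"],
]
vocab = {w for texts in training for t in texts for w in tokenize(t)}
models = [build_model(texts, vocab) for texts in training]

# Hypothetical stand-ins for the third debate: (true class index, text)
test_examples = [
    (0, "we will build a wall"),
    (0, "we will win"),
    (1, "invest in the middle class"),
    (1, "stronger together"),
]

# confusion[i][j] counts examples of true class i classified as class j
confusion = [[0] * len(speakers) for _ in speakers]
for true_idx, text in test_examples:
    confusion[true_idx][classify(text, models, vocab)] += 1

# Accuracy is the sum of the diagonal divided by the number of examples
accuracy = sum(confusion[i][i] for i in range(len(speakers))) / len(test_examples)
print(confusion)
print("accuracy:", accuracy)
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long passages, which matters once the examples are full debate responses rather than short phrases.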