1) Gather data from three different sources:
2) Assess data for quality and tidiness:
3) Clean the data to fix the quality and tidiness issues identified:
4) Analyze and visualize the wrangled data:
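The four wrangling steps above can be sketched end to end. What follows is a minimal, dependency-free sketch, not the project's actual code. The three in-memory tables are hypothetical stand-ins for the gathered sources. Their column names (tweet_id, rating_numerator, name, p1, p1_conf, favorite_count, retweet_count) are assumptions, as are the two quality issues checked and the helpers assess, clean and merge_master.

```python
from collections import Counter

# Hypothetical stand-ins for the three gathered sources, each keyed by tweet_id.
archive = [  # e.g. a tweet archive with ratings and dog names
    {"tweet_id": 1, "rating_numerator": 13, "name": "Rex"},
    {"tweet_id": 2, "rating_numerator": 12, "name": "None"},
    {"tweet_id": 2, "rating_numerator": 12, "name": "None"},  # duplicated row
    {"tweet_id": 3, "rating_numerator": 10, "name": "Bo"},
]
predictions = [  # e.g. image-classifier output (no row for tweet 3)
    {"tweet_id": 1, "p1": "Labrador_retriever", "p1_conf": 0.9},
    {"tweet_id": 2, "p1": "pug", "p1_conf": 0.7},
]
engagement = [  # e.g. favorite/retweet counts fetched per tweet
    {"tweet_id": 1, "favorite_count": 500, "retweet_count": 120},
    {"tweet_id": 2, "favorite_count": 300, "retweet_count": 80},
    {"tweet_id": 3, "favorite_count": 100, "retweet_count": 20},
]


def assess(rows):
    """Quality check: list duplicated tweet_ids and 'None' strings used as missing names."""
    counts = Counter(row["tweet_id"] for row in rows)
    duplicates = sorted(tid for tid, n in counts.items() if n > 1)
    missing_names = sorted({row["tweet_id"] for row in rows if row.get("name") == "None"})
    return {"duplicates": duplicates, "missing_names": missing_names}


def clean(rows):
    """Drop repeated tweet_ids (keeping the first) and turn 'None' strings into real None."""
    seen, cleaned = set(), []
    for row in rows:
        if row["tweet_id"] in seen:
            continue
        seen.add(row["tweet_id"])
        fixed = dict(row)  # copy so the raw source is left untouched
        if fixed.get("name") == "None":
            fixed["name"] = None
        cleaned.append(fixed)
    return cleaned


def merge_master(*tables):
    """Tidiness fix: inner-join tables on tweet_id into one row per tweet."""
    indexed = [{row["tweet_id"]: row for row in table} for table in tables]
    common_ids = set(indexed[0]).intersection(*indexed[1:])
    master = []
    for tid in sorted(common_ids):
        merged = {}
        for index in indexed:
            merged.update(index[tid])
        master.append(merged)
    return master


report = assess(archive)
master = merge_master(clean(archive), predictions, engagement)
print(report)       # {'duplicates': [2], 'missing_names': [2]}
print(len(master))  # 2 (tweet 3 has no prediction row)
```

The inner join keeps only tweets present in all three sources, which is one way to end up with a single master table. An outer join would instead keep every tweet and leave the gaps as missing values.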
Insights and visualizations produced include:
Using the cleaned dataset built from Twitter data on WeRateDogs [ref 1], a popular dog-rating account, we analyzed the data and produced the following insights and visualizations:
1) Correlation between favorite and retweet counts.
2) The trend of favorite and retweet counts with respect to time.
3) The trend of favorite and retweet counts with respect to the classification of dog species.
4) Performance of the dog image classifier.
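Items 1 to 3 above come down to a correlation and two grouped averages. The sketch below uses a hypothetical list of (date, classification, favorites, retweets) tuples in place of the real master dataset. It computes the Pearson correlation coefficient between favorite and retweet counts, then averages both counts per calendar month and per classification label. The helper names pearson and mean_by are illustrative only. The original analysis may well have used a dataframe or plotting library instead.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical rows: (date 'YYYY-MM-DD', classification, favorites, retweets)
rows = [
    ("2016-01-05", "dog", 1000, 300),
    ("2016-01-20", "hybrid", 800, 260),
    ("2016-02-03", "dog", 2000, 550),
    ("2016-02-14", "not dog", 600, 150),
    ("2016-03-01", "dog", 3000, 900),
]


def pearson(xs, ys):
    """Pearson correlation r = cov(x, y) / (std(x) * std(y)) for equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_by(rows, key):
    """Average (favorites, retweets) per group; `key` maps a row to its group label."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {
        label: (mean(r[2] for r in members), mean(r[3] for r in members))
        for label, members in sorted(groups.items())
    }


corr = pearson([r[2] for r in rows], [r[3] for r in rows])  # item 1
by_month = mean_by(rows, key=lambda row: row[0][:7])        # item 2: 'YYYY-MM' buckets
by_class = mean_by(rows, key=lambda row: row[1])            # item 3: per label
print(f"r = {corr:.3f}")
print(by_month)
print(by_class)
```

Monthly buckets are just one choice of time resolution. Weekly or daily grouping only changes the `key` function passed to `mean_by`.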
… "dog" and "hybrid", which is in line with their respective species counts: 1194 "dog" and 472 "hybrid".

… "hybrid" and "not dog" species classified by the neural network dog breed classifier, which led us to look at the performance of the classifier next.

… "not dog" named Shaggy (a Spanish Water Dog) with a p1 prediction confidence of 1.0, and looks like this ...

… "dog" and "hybrid" share the same rating of 13, while the "not dog" has a rating of 10; these ratings rank 4th and 2nd, respectively, in the top 10 dog ratings. Both the "dog" and the "hybrid" are actually Labrador Retrievers, a breed that ranks 3rd in the top 10 dog breeds. The "not dog" was misclassified and is actually a Spanish Water Dog, a breed that is not among the top 10 dog breeds.

… twitter_archive_master dataset, 305 entries were misclassified as "not dog" and 472 as "hybrid" (i.e., maybe a dog). Only 1194 were correctly classified as "dog", yet none of them attains the highest p1 prediction confidence of 1. Ironically, the "not dog" has the highest prediction confidence of 1.
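The category counts quoted above can be turned into shares and a rough accuracy figure. This sketch assumes, as the text implies, that every entry in twitter_archive_master actually shows a dog. It also assumes that "dog", "hybrid" and "not dog" are the only three labels, so the total is 1194 + 472 + 305 = 1971. "Strict" accuracy counts only "dog" as correct. "Lenient" accuracy also credits "hybrid" (maybe a dog). Both terms are labels introduced here for illustration, not metrics from the original analysis.

```python
# Label counts reported for twitter_archive_master (assumed to be exhaustive).
counts = {"dog": 1194, "hybrid": 472, "not dog": 305}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Assumes every entry pictures a dog, so only "dog" is fully correct.
strict_accuracy = counts["dog"] / total
# Hypothetical looser view: also credit "hybrid" (maybe a dog).
lenient_accuracy = (counts["dog"] + counts["hybrid"]) / total

for label, share in shares.items():
    print(f"{label:>8}: {counts[label]:5d} ({share:.1%})")
print(f"strict accuracy:  {strict_accuracy:.1%}")
print(f"lenient accuracy: {lenient_accuracy:.1%}")
```

With these counts the script reports about 60.6% strict and 84.5% lenient accuracy. The three shares (60.6%, 23.9%, 15.5%) sum to 100%.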