Project - Wrangle and Analyze Twitter Feed Data
WeRateDogs Twitter Feed Dataset Analysis
Wrangle WeRateDogs Twitter feed data to create interesting and trustworthy analyses and visualizations. This project is part of the deliverables for my Data Analyst Nanodegree with Udacity.
Dataset
Gather data from these 3 sources:
- WeRateDogs Twitter archive. This file contains basic data for 5000+ tweets, including each dog's rating, name, and “stage”.
- Tweet image predictions from the Udacity site. This file contains dog breed predictions (from a neural network classifier) for every dog image in the WeRateDogs Twitter archive.
- Twitter API (via tweepy). Use this API to query additional data (in JSON format) for each tweet ID in the WeRateDogs Twitter archive.
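The gathering step can be sketched as follows. This is a minimal offline sketch, not the project's actual code. It assumes `tweet_json.txt` holds one JSON object per line with the standard Twitter API fields `id`, `retweet_count` and `favorite_count`, and that the predictions file is tab-separated with a `tweet_id` column. The helper names `read_tweet_json` and `read_image_predictions` are hypothetical. The network calls (the `requests` download and the tweepy queries) are omitted so the parsing can run on small in-memory samples.

```python
import io
import json

import pandas as pd


def read_tweet_json(lines):
    """Parse line-delimited tweet JSON into a DataFrame of engagement counts.

    Assumes each non-empty line is one tweet object with the standard
    Twitter API fields `id`, `retweet_count` and `favorite_count`.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        records.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return pd.DataFrame(records, columns=["tweet_id", "retweet_count", "favorite_count"])


def read_image_predictions(content):
    """Read the tab-separated predictions file from raw bytes."""
    return pd.read_csv(io.BytesIO(content), sep="\t")


# Illustrative in-memory samples standing in for the real files
sample_json = [
    '{"id": 1, "retweet_count": 10, "favorite_count": 50}',
    '{"id": 2, "retweet_count": 3, "favorite_count": 12}',
]
sample_tsv = b"tweet_id\tp1\tp1_conf\tp1_dog\n1\tgolden_retriever\t0.9\tTrue\n"

api_data = read_tweet_json(sample_json)
predictions = read_image_predictions(sample_tsv)
# Inner join keeps only tweets present in both sources
combined = api_data.merge(predictions, on="tweet_id", how="inner")
```

Because `read_image_predictions` takes raw bytes, the same function can accept `response.content` from a `requests` download or the bytes of a local copy.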
Process
- After gathering the data, assess it visually and programmatically for quality and tidiness issues. Detect and document at least eight (8) quality issues and two (2) tidiness issues.
- Clean each of the issues identified, document the cleaning steps taken, and store the result in a high-quality, tidy master pandas DataFrame.
- Analyze and visualize the wrangled data to produce at least three (3) insights and one (1) visualization.
- Produce a 300-600 word written report that briefly describes the wrangling efforts, framed as an internal document. Create a second written report of at least 250 words that communicates the insights and displays the visualization(s) produced from the wrangled data, framed as an external document such as a blog post or magazine article.
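One pass through the assess and clean steps might look like the sketch below. It is illustrative only. It assumes the archive has columns named `retweeted_status_id`, `timestamp` and `rating_denominator`, plus one column per dog stage (`doggo`, `floofer`, `pupper`, `puppo`) holding either the stage name or the string `"None"`. The three quality checks and the single tidiness fix are examples, not the project's documented list of issues, and `assess` and `clean_archive` are hypothetical helpers.

```python
import pandas as pd

# Assumed stage columns, each holding the stage name or the string "None"
STAGES = ["doggo", "floofer", "pupper", "puppo"]


def assess(df):
    """Count a few example quality issues programmatically."""
    return {
        # Retweets duplicate original tweets and carry a retweeted_status_id
        "retweets": int(df["retweeted_status_id"].notna().sum()),
        # Timestamps read from CSV arrive as strings, not datetimes
        "timestamp_not_datetime": not pd.api.types.is_datetime64_any_dtype(df["timestamp"]),
        # Ratings are expected to be out of 10
        "denominator_not_10": int((df["rating_denominator"] != 10).sum()),
    }


def combine_stages(row):
    """Join the stage names present in one row; "None" marks an absent stage."""
    return ",".join(value for value in row if value != "None")


def clean_archive(df):
    """Apply example cleaning steps and return a tidy copy of the archive."""
    # Quality: keep original tweets only (drop retweets)
    clean = df[df["retweeted_status_id"].isna()].copy()
    # Quality: parse the string timestamps into datetimes
    clean["timestamp"] = pd.to_datetime(clean["timestamp"])
    # Tidiness: collapse the four stage columns into one "stage" variable
    clean["stage"] = clean[STAGES].apply(combine_stages, axis=1)
    # Rows without any stage become missing values instead of ""
    clean["stage"] = clean["stage"].mask(clean["stage"] == "")
    clean = clean.drop(columns=STAGES + ["retweeted_status_id"])
    return clean.reset_index(drop=True)


# Toy archive with one retweet, one odd denominator and one multi-stage tweet
archive = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "timestamp": ["2017-08-01 16:23:56 +0000", "2017-07-31 00:18:03 +0000",
                  "2017-07-30 15:58:51 +0000"],
    "retweeted_status_id": [None, 123.0, None],
    "rating_numerator": [13, 12, 9],
    "rating_denominator": [10, 10, 11],
    "doggo": ["None", "doggo", "doggo"],
    "floofer": ["None", "None", "None"],
    "pupper": ["None", "None", "pupper"],
    "puppo": ["None", "None", "None"],
})

report = assess(archive)
master = clean_archive(archive)
```

Collapsing the four stage columns into one `stage` column follows the tidy-data rule that each variable forms one column. Joining with a comma keeps tweets that mention more than one stage instead of silently dropping one.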
Tool
Jupyter Notebook (Python)
Programming libraries
pandas, numpy, matplotlib, seaborn, tweepy, json, requests
Artifacts
- wrangle_act.ipynb
- wrangle_report.pdf, wrangle_report.html
- act_report.pdf, act_report.html
- twitter_archive_enhanced.csv, image_predictions.tsv, tweet_json.txt, twitter_archive_master.csv, twitter_archive.db
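The `twitter_archive_master.csv` and `twitter_archive.db` artifacts can both be written from the cleaned DataFrame with pandas' own writers. Below is a minimal sketch. It assumes a table named `twitter_archive_master` inside the SQLite database; that table name and the `save_master` helper are hypothetical. It writes to a temporary directory so it can run anywhere.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def save_master(df, out_dir, table="twitter_archive_master"):
    """Write the master DataFrame to a CSV file and a SQLite table."""
    csv_path = os.path.join(out_dir, "twitter_archive_master.csv")
    db_path = os.path.join(out_dir, "twitter_archive.db")
    df.to_csv(csv_path, index=False)
    # pandas accepts a plain sqlite3 connection here, no SQLAlchemy needed
    conn = sqlite3.connect(db_path)
    try:
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()
    return csv_path, db_path


# Usage with a tiny stand-in for the cleaned master DataFrame
master = pd.DataFrame({
    "tweet_id": [1, 3],
    "rating_numerator": [13, 9],
    "stage": [None, "doggo,pupper"],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path, db_path = save_master(master, tmp)
    from_csv = pd.read_csv(csv_path)
    conn = sqlite3.connect(db_path)
    try:
        from_db = pd.read_sql_query("SELECT * FROM twitter_archive_master", conn)
    finally:
        conn.close()
```

Using `if_exists="replace"` makes the notebook cell safe to re-run, since it overwrites the table rather than appending duplicate rows.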
See code here.