Algorithms for Data Science (Fall 2018)

Course: Algorithms for Data Science

Data

This page lists blog posts about data and data handling, along with several sources of datasets.

Blogs:

  • How to Share Data with a Statistician by Jeff Leek
  • 10 Statistics Tips (and Why You Should Use Them!) by Jeff Leek
  • Improve Your Data Literacy Skills and Make the Most of Data by Geckoboard
    • Tips for Effective Data Visualization
    • Common Data Mistakes to Avoid

    Common Data Mistakes to Avoid:
    Cherry Picking, Data Dredging, Survivorship Bias, Cobra Effect, False Causality,
    Gerrymandering, Sampling Bias, Gambler’s Fallacy, Hawthorne Effect, Regression Toward the Mean,
    Simpson’s Paradox, McNamara Fallacy, Overfitting, Publication Bias, Danger of Summary Metrics

Datasets:

The following resources may be helpful for those still undecided about their course projects.

  • DataHub has a lot of structured data in formats such as RDF and CSV.
  • Awesome Public Datasets
  • UC Irvine Machine Learning Repository
  • CrowdFlower Data for Everyone library
  • Stanford Large Network Dataset Collection
  • Data Science Weekly
  • Awesome Data Science
  • Kaggle
  • Cafebazaar

For more datasets, see the following KDnuggets page:

  • Datasets for Data Mining and Data Science

Dealing with Data:

  • Data Preprocessing
  • 5 Ways To Handle Missing Values In Machine Learning Datasets
  • Handling Missing Data
  • How to Handle Missing Data
  • 7 Techniques to Handle Imbalanced Data
  • How to Handle Imbalanced Data: An Overview
  • Visualize Missing Data with VIM Package
  • Ultimate Guide to Handle Big Datasets for Machine Learning Using Dask (in Python)
  • Slide: Data Cleaning and Data Preprocessing by Nguyen Hung Son
  • Slide: Data Preprocessing by Taehyung Wang

Jupyter Notebooks:

  • 5-Day Challenge: Data Cleaning by Rachael Tatman
  • HousePrice - Data Cleaning & Visualization by Katsuhisa Kawaguchi
Algorithms-For-Data-Science is maintained by hhaji. This page was generated by GitHub Pages.