Course Project Resources:
To bring together and apply the topics covered in this course, you will work on a machine learning project. The goal of the project is to go through the complete knowledge discovery process to answer one or more questions you have about a topic of your own choosing. You will acquire the data, formulate a question (or questions) of interest, perform the data analysis, and communicate the results. Projects are programming assignments that cover the topics of this course, and every project is written as a Jupyter Notebook. The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. You can also include mathematical notation within the Markdown cells of a Jupyter Notebook using LaTeX.
Uploading and Presentation of Project
Uploading the Project to GitHub
Students implement the given assignments individually in the recommended language. Each student should create their own GitHub repository for the project. The repository should include a Readme.md file containing the analysis of the results, written as a short write-up with the following components:
- Problem statement and hypothesis
- Description of your data set and how it was obtained
- Description of any pre-processing steps you took
- What you learned from exploring the data, including visualizations
- How you chose which features to use in your analysis
- Details of your modeling process, including how you selected your models and validated them
- Your challenges and successes
- Possible extensions or business applications of your project
- Conclusions and key learnings
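As one hedged illustration of the "how you selected your models and validated them" component, a write-up might compare candidate models with k-fold cross-validation. The sketch below assumes scikit-learn is installed and uses its bundled Iris data set purely as a stand-in for your own data; the two candidate models and the names `candidates` and `mean_accuracy` are illustrative choices, not course requirements.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data set (assumption): replace with your own features X and labels y.
X, y = load_iris(return_X_y=True)

# Hypothetical candidate models to compare.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# 5-fold cross-validation: each model is fit on four folds and scored on the held-out fold.
mean_accuracy = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")

# Pick the candidate with the highest mean cross-validated accuracy.
best_model = max(mean_accuracy, key=mean_accuracy.get)
print("Selected model:", best_model)
```

Reporting the per-fold spread alongside the mean, as the loop does, makes it easier to judge whether a difference between models is meaningful.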
Final Project
Your project repository on GitHub should contain the following:
- Project paper: Any format (PDF, Markdown, etc.)
- Presentation slides: Any format (PDF, PowerPoint, Google Slides, IPython Notebook, etc.)
- Code: Commented Jupyter Notebooks, and any other code you used in the project
- Visualizations: Integrated into your paper and/or slides
- Data: Data files in “raw” or “processed” format
- Data dictionary (aka “code book”): Description of each variable, including units
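A data dictionary can be drafted from the data itself and then completed by hand. The following is a minimal sketch, assuming pandas is installed; the table `df`, its column names, and the descriptions and units in `notes` are made up for illustration and should be replaced with your own.

```python
import pandas as pd

# Tiny made-up table (assumption) standing in for a project's processed data.
df = pd.DataFrame(
    {
        "age": [34, 51, 27],
        "height_cm": [172.5, 160.0, None],
        "smoker": ["no", "yes", "no"],
    }
)

# Hypothetical descriptions and units, written by hand for each variable.
notes = {
    "age": ("Age of the participant at enrollment", "years"),
    "height_cm": ("Standing height", "cm"),
    "smoker": ("Self-reported smoking status (yes/no)", "none"),
}

# One row per variable: name, stored type, missing count, description, units.
data_dictionary = pd.DataFrame(
    {
        "variable": df.columns,
        "dtype": [str(t) for t in df.dtypes],
        "n_missing": df.isna().sum().to_numpy(),
        "description": [notes[c][0] for c in df.columns],
        "units": [notes[c][1] for c in df.columns],
    }
)

print(data_dictionary.to_string(index=False))
```

The resulting table could then be saved next to the data, for example with `data_dictionary.to_csv("data_dictionary.csv", index=False)` (the file name is only a suggestion).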
Project Presentation in Class
Each student will explain their project in a 10–15 minute presentation to the class. Presentations should clearly convey the project ideas, methods, and results, including the question(s) being addressed, the motivation for the analyses being employed, and relevant evaluations, contributions, and discussion questions.
Coding:
Programming assignments require Python 3.7 and TensorFlow, together with the additional Python packages listed below.
- Python 3.7: An interactive, object-oriented, extensible programming language.
- TensorFlow: An end-to-end open source platform for machine learning.
- NumPy: A Python package for scientific computing.
- Pandas: A Python package for high-performance, easy-to-use data structures and data analysis tools.
- Scikit-Learn: A Python package for machine learning.
- Matplotlib: A Python package for 2D plotting.
- SciPy: A Python package for mathematics, science, and engineering.
- IPython: An architecture for interactive computing with Python.
Most of the relevant software is part of the SciPy stack, a collection of Python-based open source software for mathematics, science, and engineering (which includes Python, NumPy, the SciPy library, Matplotlib, pandas, IPython, and scikit-learn). The Anaconda Python Distribution is a free, open source distribution of the SciPy stack for Linux, macOS, and Windows. With over 6 million users, it is a widely used way to set up Python and R for data science and machine learning, and is aimed at developing, testing, and training on a single machine.
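After installing a distribution, it can help to confirm that the packages listed above are actually available. The snippet below is a minimal sketch using only the Python standard library; the `REQUIRED` list and the helper name `find_missing` are illustrative, and the import names are assumptions about how each package is imported (scikit-learn, for instance, is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["tensorflow", "numpy", "pandas", "sklearn", "matplotlib", "scipy", "IPython"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
missing = find_missing(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

The check only looks each package up without importing it, so it runs quickly even when heavy packages such as TensorFlow are installed.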
Tutorial:
- NumPy Tutorial
You can learn Python via the following websites:
- SoloLearn (a great website for getting started with coding; it offers easy-to-follow lessons interspersed with quizzes to help you retain what you are learning)
- Google Developer Python Tutorial (highly recommended as a way to pick up Python in just a few hours)
LaTeX
Students can include mathematical notation within the Markdown cells of their Jupyter Notebooks using LaTeX.
- A Brief Introduction to LaTeX (PDF)
- Math in LaTeX (PDF)
- Sample Document (PDF)
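As a small illustration, a Markdown cell might contain text like the following. The mean squared error formula is only an example chosen here, with $y_i$ denoting the observed values and $\hat{y}_i$ the model's predictions; the double dollar signs mark a displayed equation and the single dollar signs mark inline math.

```latex
The mean squared error over $n$ samples is

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$
```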
Competitions
Here are some machine learning and data mining competition platforms: