Course Overview
In this course you will learn to use Python, one of the most popular programming languages for data science, for data analysis and data visualization. You will explore Python libraries that make it easier to sort and analyze datasets for emerging trends, and quickly produce Excel-quality visualizations suitable for displaying data in real-time monitoring systems.
The course introduces data science with Python libraries such as pandas and NumPy to identify trends within datasets. You will create rich visualizations with Matplotlib, Folium, and seaborn, use the open source SciPy toolset for mathematics, science, and engineering applications, and get an introduction to scikit-learn, a machine learning library.
Who should attend
This course was written for professionals interested in Python and data science, including engineers, mathematicians, actuaries, network specialists, system administrators, and developers.
Prerequisites
Keyboard proficiency and some previous Python coding experience are the only hard requirements. Students with previous exposure to Python, or to any other scripting language, will get the most from the course. For those without previous experience, Alta3 Research’s Python Basics course is recommended.
Recommended Prerequisite: Python Basics (5 days)
Follow On Courses
Outline: Python 203 - Python for Data Sciences (PDS)
Introduction to Python Libraries for Data Sciences
- Python with Jupyter Notebook overview
- Live code
- Equations
- Data cleaning
- Transformation
- Numerical simulation
- Statistical modeling
- Data visualization
- Machine Learning
- Pandas
- Filter DataFrames
- Dictionaries to DataFrames
- CSV to DataFrames
- Excel to DataFrames
- NumPy
- Work across arrays
- Requests
- Pull from RESTful APIs
- JSON
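As a taste of the pandas, NumPy, and JSON topics listed above, here is a minimal sketch. The sensor names, temperature values, and JSON payload are made up for illustration and are not course material; the payload stands in for the body of a RESTful API response so that the example runs without network access.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
readings = {
    "sensor": ["a", "b", "c", "d"],
    "temp_c": [21.5, 35.0, 19.0, 40.5],
}

# Dictionary to DataFrame.
df = pd.DataFrame(readings)

# Filter the DataFrame with a boolean mask.
hot = df[df["temp_c"] > 30.0]

# Work across a whole NumPy array at once (Celsius to Fahrenheit).
temp_f = df["temp_c"].to_numpy() * 9 / 5 + 32

# JSON text, such as an API response body, to DataFrame.
payload = '[{"sensor": "e", "temp_c": 22.0}, {"sensor": "f", "temp_c": 31.0}]'
api_df = pd.DataFrame(json.loads(payload))
```

CSV and Excel files load in a similar way through `pd.read_csv` and `pd.read_excel`.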
Sort, Analyze, and Visualize Data with Python
- Matplotlib
- Line Plots
- Area Plots
- Histograms
- Bar Charts
- Pie Charts
- Box Plots
- Scatter Plots
- Bubble Plots
- Waffle Charts
- Word Clouds
- Seaborn
- Visualization techniques
- Relational
- Categorical
- Distributions
- Regressions
- Folium
- Interactive Leaflet maps
- Choropleth visualizations
- Rich vector/raster/HTML visual markers
- Saving visualization output in various formats
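The plotting and saving topics above can be sketched as follows. This is a minimal example under stated assumptions: the yearly figures are invented, and the non-interactive `Agg` backend is chosen here only so that the code runs without a display. The same `savefig` call also accepts a file path instead of an in-memory buffer.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data, invented for this illustration.
years = [2019, 2020, 2021, 2022]
units = [120, 135, 160, 210]

# Draw a simple line plot.
fig, ax = plt.subplots()
ax.plot(years, units, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Units sold")
ax.set_title("Line plot example")

# Save the same figure in two formats.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
plt.close(fig)
```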
Python and Databases
- Creating a database engine in Python
- sqlite3
- Looking at tables in a database
- Querying relational databases
- MySQL and Python
- SQL Queries
- Filtering with SQL WHERE
- Ordering with SQL ORDER BY
- Querying with pandas
- Table relationships with INNER JOIN
- MongoDB
- Understanding NoSQL
- Python and MongoDB
- PyMongo
- Query
- Find
- Delete
- Update
- Limit
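The relational topics above (sqlite3, listing tables, WHERE, ORDER BY, and INNER JOIN) can be sketched with Python's built-in `sqlite3` module. The table names, columns, and rows below are hypothetical and exist only for this illustration; an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# In-memory database; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 25.0), (2, 2, 90.0), (3, 1, 60.0)],
)
conn.commit()

# Look at the tables in the database.
tables = [
    row[0]
    for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Join the two tables, filter with WHERE, and sort with ORDER BY.
rows = cur.execute(
    """
    SELECT customers.name, orders.total
    FROM orders
    INNER JOIN customers ON customers.id = orders.customer_id
    WHERE orders.total > ?
    ORDER BY orders.total DESC
    """,
    (50.0,),
).fetchall()
conn.close()
```

The `?` placeholder passes the threshold as a parameter rather than formatting it into the SQL string, which avoids SQL injection.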
Introduction to Machine Learning with Python
- SciPy open source ecosystem
- Numerical integration
- Interpolation
- Optimization
- Linear algebra
- Statistics
- Scikit-learn
- Applications of Machine Learning
- Training vs Testing sets
- Supervised vs Unsupervised Learning
- Python libraries suitable for Machine Learning
- Loading an example dataset
- Learning and predicting
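The scikit-learn topics above (loading an example dataset, training and testing sets, learning and predicting) can be sketched as below. The choice of the iris dataset, a k-nearest-neighbors classifier, and a 25% test split are assumptions made for this illustration, not necessarily what the course uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load an example dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Supervised learning: fit on the training set only.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict labels for unseen samples and measure accuracy on the test set.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
```

Keeping the test set out of `fit` is what makes `accuracy` an estimate of performance on new data rather than on data the model has already seen.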
Introduction to Machine Learning with Python (continued)
- Scikit-learn
- Model persistence
- Conventions
- Refitting and updating parameters
- Multiclass vs. multilabel fitting
- Moving output to remote systems
- Streaming (push) to real-time dashboard APIs
- Move data with SFTP
- Email attachments
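The model persistence topic above can be sketched with Python's built-in `pickle` module. This is a minimal example under stated assumptions: the support vector classifier and the iris dataset are picked only for illustration. Pickled models should only be loaded from trusted sources, and ideally with the same scikit-learn version that saved them.

```python
import pickle

from sklearn import datasets, svm

# Fit a small model to have something to persist.
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC()
clf.fit(X, y)

# Serialize the fitted model to bytes; these could be written to a file
# and moved to another system.
blob = pickle.dumps(clf)

# Restore the model and confirm it predicts the same labels.
restored = pickle.loads(blob)
same = (restored.predict(X) == clf.predict(X)).all()
```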