Data Science with Python
(in German: Data Science with Python)
Module-ID: FIN-INF-120513 |
| Link: | LSF |
| Responsibility: | Dr. Christian Beyer |
| Lecturer: | Dr. Christian Beyer |
| Classes: | Data Science with Python |
| Applicability in curriculum: | - M.Sc. INF: Informatik - M.Sc. INF: Schlüssel- und Methodenkompetenzen - M.Sc. INGINF: Informatik - M.Sc. INGINF: Schlüssel- und Methodenkompetenzen - M.Sc. WIF: Informatik - M.Sc. WIF: Schlüssel- und Methodenkompetenzen - M.Sc. DKE: Applied Data Science - M.Sc. VC: Computer Science - M.Sc. VC: Schlüssel- und Methodenkompetenzen - M.Sc. DE: Methoden der Informatik - M.Sc. DE: Interdisziplinäres Team-Projekt |
|
Abbreviation DSWP |
Credit Points 6 |
Semester Winter |
Term
|
Duration 1 Semester |
Language English |
Level Master |
Intended learning outcomes:
The course is about learning from data to make predictions and obtain useful insights. In the seminar, we will use the programming language Python.
The skills needed to manage and analyze data will be taught and practiced on real-world applications. Programming knowledge from other courses is helpful but not mandatory. However, students are expected to have a solid knowledge of fundamental data-analysis techniques, such as classification, regression and clustering.
After successful completion of this course, the student will be able to perform the following tasks in Python:
- Import and preprocess raw data (files, databases, web APIs)
- Transform data for modelling
- Perform exploratory data analysis with summary statistics and visualization
- Understand, build and evaluate predictive classification and regression models, including tree-based models, ensembles and boosted models
- Communicate and disseminate results and findings through reproducible documents, presentations, websites and interactive web applications
Content:
Part Fundamentals & Visualization:
Basics, scripts, workflows, vectors & functions in Python
Exploratory data visualization
Data transformation
Part Data Management & Exploratory Data Analysis:
Data cleaning & scraping
Generating hypotheses and an intuition about the data with exploratory data analysis
Data import
Data management
Relational data
Strings, categorical data, dates & time
Iteration: imperative & functional programming
Part Modeling:
Linear regression
Classification
Evaluation
Model selection & regularization (LASSO, Ridge)
Feature selection & model interpretation
Decision trees
Ensembles: random forests
Boosting: gradient boosted trees
Unsupervised learning, e.g. k-means, hierarchical clustering, self-organizing maps, principal component analysis
Topic modeling with simple graphical models
Statistical testing
Part Communication:
Communication and dissemination of results through visualization and interpretable summaries with documents, notebooks, presentations & websites
Workload:
Attendance time = 28 h:
- 2 SWS weekly seminar
Independent work outside the actual seminar time = 152 h:
- 76 h preparation and follow-up of the seminar topics
- 76 h solving the tasks, incl. work in the laboratory
180 h = 28 h attendance time + 152 h independent work
| Pre-examination requirements: | Type of examination: | Teaching method / lecture hours per week (SWS): |
|
Project with presentation and project report |
Seminar (2 SWS) |
| Prerequisites according to examination regulations: | Recommended prerequisites: |
|
none |
Area 1: Data Mining, Machine Learning, Artificial Intelligence Area 2: Databases Area 3: Programming Languages and Software Engineering Area 4: Stochastics, Applied Statistics |
| Media: | Literature: |
|
Will be provided during the seminar
|
Comments: