{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Python Basics\n",
    "\n",
    "The goal of this exercise is to introduce the following basics of Python and Jupyter:\n",
    " - [Commenting code](#commenting)\n",
    " - [Declaring variables](#variablen)\n",
    " - [Arithmetic operations](#arithmetik)\n",
    " - [Naming conventions](#naming)\n",
    " - [Lists, sets, tuples, dictionaries](#datatypes)\n",
    " - [If-else statements](#ifelse)\n",
    " - [While loop](#whileloop)\n",
    " - [For loop](#forloop)\n",
    " - [Range function](#range)\n",
    " - [Defining your own functions](#functions)\n",
    " - [Lambdas](#lambdas)\n",
    " - [Pandas basics](#pandas)\n",
    "\n",
    "This material is provided as part of the course _Informationssysteme und Datenanalyse_ by the\n",
    "research group [Datenbanken und Informationssysteme](https://tu.berlin/dima) (DIMA) at TU Berlin.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"commenting\"></a>\n",
    "## Commenting code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# This is a single-line comment\n",
    "print(\"hello world\")  # characters after the \"#\" symbol are ignored by the Python interpreter"
   ]
  },
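  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# A triple-quoted string that is not assigned to a variable is often used as a multi-line comment\n",
    "\"\"\"\n",
    "A comment\n",
    "spanning\n",
    "multiple\n",
    "lines.\n",
    "\"\"\"\n",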
    "\n",
    "print(\"hello world\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"variablen\"></a>\n",
    "## Declaring variables\n",
    "\n",
    "Variables can hold values of any type. Neither the type nor the size has to be declared in advance.\n",
    "Python is case-sensitive and, unlike Java or C, does not require a semicolon at the end of a statement.\n",
    "\n",
    "The variable being assigned stands on the left-hand side of `=`. Its new value stands on the right-hand side."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "x = 3  # integer\n",
    "y = 3.0  # floating point number\n",
    "z = \"Hello\"  # a string, delimited by \"\"\n",
    "# another string, stored in the separate variable Z (names are case-sensitive)\n",
    "Z = \"World!\"\n",
    "\n",
    "print(x, type(x))\n",
    "print(y, type(y))\n",
    "print(z, type(z))\n",
    "print(Z, type(Z))\n",
    "\n",
    "# the type of a variable can also change later through reassignment\n",
    "Z = 23.5\n",
    "print(Z, type(Z))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# bool values are written as False or True\n",
    "f = False\n",
    "t = True\n",
    "\n",
    "print(f, type(f))\n",
    "print(t, type(t))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"arithmetik\"></a>\n",
    "## Arithmetic operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "i = 5\n",
    "j = 3\n",
    "print(\"Sum : \", i + j)\n",
    "print(\"Difference : \", i - j)\n",
    "print(\"Product : \", i * j)\n",
    "print(\"Power : \", i**j)\n",
    "print(\"Modulo : \", i % j)\n",
    "print(\"Floor Division : \", i // j)\n",
    "print(\"Float Division : \", i / j)\n",
    "result = i / j\n",
    "type(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"naming\"></a>\n",
    "## Naming conventions\n",
    "\n",
    "A widely used guide to Python code style is PEP 8:\n",
    "https://www.python.org/dev/peps/pep-0008/\n",
    "\n",
    "Variables and functions should follow this convention: <br>\n",
    "snake_case <br>\n",
    "lower_case_with_underscores\n",
    "\n",
    "In practice, **camelCase** is also seen fairly often, although PEP 8 does not recommend it for variables and functions.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# snake_case\n",
    "mein_string = \"hallo123\"\n",
    "mein_zweiter_string = \"hallo345\"\n",
    "# camelCase\n",
    "meinDritterString = \"hallo567\"\n",
    "\n",
    "print(mein_string)\n",
    "print(mein_zweiter_string)\n",
    "print(meinDritterString)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"datatypes\"></a>\n",
    "## Lists, Sets, Tuples, Dictionaries\n",
    "**Lists** <br> are collections of elements that preserve their order. Lists have a variable size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "meine_liste = [87, 43, 1, 4, 321, 5, 2, 21, 1, 32, 1, 43]\n",
    "print(meine_liste, type(meine_liste))\n",
    "\n",
    "# append a single element to the end of the list\n",
    "meine_liste.append(43)\n",
    "print(meine_liste, type(meine_liste))\n",
    "\n",
    "# merge a second list into the first one\n",
    "meine_liste.extend([34, 21, 74, 146])\n",
    "print(meine_liste, type(meine_liste))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "**Sets** <br> are unordered, duplicate-free collections of elements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "mein_set = {87, 43, 1, 4, 321, 5, 2, 21, 1, 32, 1, 43}\n",
    "print(mein_set, type(mein_set))\n",
    "\n",
    "# add an element\n",
    "mein_set.add(56)\n",
    "print(mein_set, type(mein_set))\n",
    "\n",
    "# remove an element (raises a KeyError if it is not in the set)\n",
    "mein_set.remove(321)\n",
    "print(mein_set, type(mein_set))\n",
    "\n",
    "# merge a second set into the first one\n",
    "mein_set.update({32, 653, 12, 723, 145})\n",
    "print(mein_set, type(mein_set))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "**Tuples** <br>\n",
    "are like lists, but cannot be modified after creation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "mein_tuple = (1, 5, 1, 4)\n",
    "print(mein_tuple, type(mein_tuple))\n",
    "# first element\n",
    "print(mein_tuple[0])\n",
    "# last element (negative indices count from the end)\n",
    "print(mein_tuple[-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "**Dictionaries (aka maps)** <br>\n",
    "store key/value pairs. Keys are unique within a dict. Values may occur more than once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "woerter = {\"house\": \"Haus\", \"cat\": \"Katze\", \"black\": \"schwarz\"}\n",
    "\n",
    "print(woerter[\"house\"])\n",
    "\n",
    "woerter[\"river\"] = \"Fluss\"\n",
    "woerter[\"cat\"] = \"Veraenderte Katze\"\n",
    "# a key cannot be renamed in place: remove it with pop() and store its value under the new key\n",
    "woerter[\"new_house\"] = woerter.pop(\"house\")\n",
    "\n",
    "print(woerter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"ifelse\"></a>\n",
    "## If-Else statements\n",
    "In Python, blocks are defined by indentation. By convention (PEP 8), one indentation level consists of 4 spaces.\n",
    "In C or Java, by contrast, blocks are delimited by { }.\n",
    "Note: Python IDEs usually convert a Tab into 4 spaces automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "a = 3 + 18\n",
    "b = 4**2\n",
    "\n",
    "if a > b:\n",
    "    print(\"a is greater than b\")\n",
    "    print(\"a > b\")\n",
    "elif a < b:\n",
    "    print(\"a is less than b\")\n",
    "    print(\"a < b\")\n",
    "else:\n",
    "    print(\"a equals b\")\n",
    "    print(\"a == b\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "elemente = [1, 34, 442, 6, 12]\n",
    "\n",
    "if 1 in elemente:\n",
    "    print(str(1) + \" is in the list\")\n",
    "    if 34 in elemente:\n",
    "        print(str(34) + \" is also in the list\")\n",
    "else:\n",
    "    print(\"Element not found\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"whileloop\"></a>\n",
    "## While loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "i = 0\n",
    "words = [\"katze\", \"fenster\", \"haus\"]\n",
    "\n",
    "while i < len(words):\n",
    "    print(words[i])\n",
    "    i = i + 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"forloop\"></a>\n",
    "## For loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "words = [\"katze\", \"fenster\", \"haus\"]\n",
    "for w in words:\n",
    "    print(w, len(w))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"range\"></a>\n",
    "## Range function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "for i in range(15):\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "for i in range(2, 8):\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "for i in range(100, 20, -5):\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"functions\"></a>\n",
    "## Defining your own functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Iterative function without a return value (it only prints)\n",
    "def fib(n):\n",
    "    n1, n2 = 0, 1\n",
    "    count = 0\n",
    "\n",
    "    # validate the input\n",
    "    if n <= 0:\n",
    "        print(\"Invalid input!\")\n",
    "    elif n == 1:\n",
    "        print(\"Fibonacci up to\", n, \":\")\n",
    "        print(n1)\n",
    "    else:\n",
    "        print(\"Fibonacci:\")\n",
    "        while count < n:\n",
    "            print(n1)\n",
    "            nth = n1 + n2\n",
    "            # update the values\n",
    "            n1 = n2\n",
    "            n2 = nth\n",
    "            count += 1\n",
    "\n",
    "\n",
    "# fib() returns None, so it is called directly instead of printing its result\n",
    "fib(14)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Recursive function with a return value\n",
    "def fib(n):\n",
    "    # validate the input: positions start at 1\n",
    "    if n <= 0:\n",
    "        raise ValueError(\"Invalid input!\")\n",
    "    # the first number is 0\n",
    "    elif n == 1:\n",
    "        return 0\n",
    "    # the second number is 1\n",
    "    elif n == 2:\n",
    "        return 1\n",
    "    else:\n",
    "        return fib(n - 1) + fib(n - 2)\n",
    "\n",
    "\n",
    "print(fib(14))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Function with several input parameters\n",
    "def summe(a, b, c, d):\n",
    "    return a + b + c + d\n",
    "\n",
    "\n",
    "summe(2, 2, 15, 10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# The types of function parameters and return values can be declared with type annotations\n",
    "# These are not checked by the Python interpreter, however, and only serve readability\n",
    "def summe_typed(a: int, b: float) -> float:\n",
    "    return a + b\n",
    "\n",
    "\n",
    "summe_typed(2, 2.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"lambdas\"></a>\n",
    "## Lambdas\n",
    "Lambdas are small anonymous functions consisting of a single expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Lambda with one parameter\n",
    "l1 = lambda a: a + 10\n",
    "print(l1(23))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Lambda with several parameters\n",
    "l2 = lambda a, b: a * b\n",
    "print(l2(15, 6))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a id=\"pandas\"></a>\n",
    "## Pandas basics\n",
    "A short introduction to Pandas is available here:\n",
    "https://pandas.pydata.org/pandas-docs/stable/index.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# import dependencies\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "pd.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create a DataFrame from a NumPy array\n",
    "dates = pd.date_range(\"20130101\", periods=6)\n",
    "df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list(\"ABCD\"))\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create a DataFrame from a dict\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"A\": 1.0,\n",
    "        \"B\": pd.Timestamp(\"20130102\"),\n",
    "        \"C\": pd.Series(1, index=list(range(4)), dtype=\"float32\"),\n",
    "        \"D\": np.array([3] * 4, dtype=\"int32\"),\n",
    "        \"E\": pd.Categorical([\"test\", \"train\", \"test\", \"train\"]),\n",
    "        \"F\": \"foo\",\n",
    "    }\n",
    ")\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# show the first three rows\n",
    "print(df.head(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# show the last two rows\n",
    "print(df.tail(2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# summary statistics of the numeric columns\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# projection: select column A from the statistics\n",
    "df.describe()[\"A\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# selection: keep only the rows where A is greater than 0\n",
    "df[df[\"A\"] > 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# select all rows of column D by label and overwrite the column\n",
    "df.loc[:, \"D\"] = np.array([5] * len(df), dtype=np.int32)\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Funktionen anwenden\n",
    "\n",
    "# Liste von Tuplen\n",
    "matrix = [\n",
    "    (22, 34, 23),\n",
    "    (33, 31, 11),\n",
    "    (44, 16, 21),\n",
    "    (55, 32, 22),\n",
    "    (66, 33, 27),\n",
    "    (77, 35, 11),\n",
    "]\n",
    "\n",
    "# DataFrame object erzeugen\n",
    "df = pd.DataFrame(matrix, columns=list(\"xyz\"), index=list(\"abcdef\"))\n",
    "print(df)\n",
    "\n",
    "# jetzt quadrieren wir alle elemente in z\n",
    "df[\"z\"] = df[\"z\"].apply(lambda x: x * 2)\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# square only rows b and c\n",
    "# list of tuples\n",
    "matrix = [\n",
    "    (22, 34, 23),\n",
    "    (33, 31, 11),\n",
    "    (44, 16, 21),\n",
    "    (55, 32, 22),\n",
    "    (66, 33, 27),\n",
    "    (77, 35, 11),\n",
    "]\n",
    "\n",
    "# create the DataFrame object\n",
    "df = pd.DataFrame(matrix, columns=list(\"xyz\"), index=list(\"abcdef\"))\n",
    "df = df.apply(lambda x: x**2 if x.name in [\"b\", \"c\"] else x, axis=1)\n",
    "print(df)"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 1
}