{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Aufgabe 7*: Operatoren der Relationalen Algebra implementieren" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "Im folgenden sollen die Operatoren der relationalen Algebra für Pandas Dataframes implementiert werden.\n", "\n", "Gehe hierbei davon aus, dass die Spalten der Dataframes benannt sind und ignoriert den Index der DataFrames.\n", "\n", "In diesem ersten Snippet wird eine Funktion definiert, mit der du deine Lösungen später testen kannst. Du musst es dir nicht im Detail ansehen, sondern nur einmal ausführen." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "from pandas import DataFrame\n", "from typing import *\n", "from io import StringIO\n", "import numpy as np\n", "\n", "df = DataFrame(\n", " [\n", " (11, 22, 34, 23),\n", " (11, 22, 31, 11),\n", " (13, 42, 16, 21),\n", " (55, 32, 71, 22),\n", " (66, 33, 19, 27),\n", " (13, 42, 31, 11),\n", " ],\n", " columns=list(\"abyz\"),\n", ")\n", "\n", "df2 = DataFrame([(11, 22), (13, 42)], columns=list(\"ab\"))\n", "\n", "df3 = DataFrame(\n", " [\n", " (11, 22, 34, 23),\n", " (15, 22, 41, 11),\n", " (13, 42, 16, 21),\n", " (55, 32, 71, 22),\n", " (66, 34, 19, 27),\n", " ],\n", " columns=list(\"abyz\"),\n", ")\n", "\n", "test = [\n", " ('select(\"y > 30 and z <= 20\", df)', \"a,b,y,z\\r\\n11,22,31,11\\r\\n13,42,31,11\\r\\n\"),\n", " (\n", " 'project([\"z\", \"y\"], df)',\n", " \"z,y\\r\\n23,34\\r\\n11,31\\r\\n21,16\\r\\n22,71\\r\\n27,19\\r\\n11,31\\r\\n\",\n", " ),\n", " (\n", " 'rename([\"a\", \"b\", \"c\", \"d\"], df)',\n", " \"a,b,c,d\\r\\n\"\n", " \"11,22,34,23\\r\\n\"\n", " \"11,22,31,11\\r\\n\"\n", " \"13,42,16,21\\r\\n\"\n", " \"55,32,71,22\\r\\n\"\n", " \"66,33,19,27\\r\\n\"\n", " \"13,42,31,11\\r\\n\",\n", " ),\n", " (\n", " \"cross(df, df2)\",\n", " \"L_a,L_b,y,z,R_a,R_b\\r\\n\"\n", " \"11,22,34,23,11,22\\r\\n\"\n", " \"11,22,34,23,13,42\\r\\n\"\n", " \"11,22,31,11,11,22\\r\\n\"\n", " \"11,22,31,11,13,42\\r\\n\"\n", " \"13,42,16,21,11,22\\r\\n\"\n", " \"13,42,16,21,13,42\\r\\n\"\n", " \"55,32,71,22,11,22\\r\\n\"\n", " \"55,32,71,22,13,42\\r\\n\"\n", " \"66,33,19,27,11,22\\r\\n\"\n", " \"66,33,19,27,13,42\\r\\n\"\n", " \"13,42,31,11,11,22\\r\\n\"\n", " \"13,42,31,11,13,42\\r\\n\",\n", " ),\n", " (\n", " \"union(df, df3)\",\n", " \"a,b,y,z\\r\\n\"\n", " \"11,22,34,23\\r\\n\"\n", " \"11,22,31,11\\r\\n\"\n", " \"13,42,16,21\\r\\n\"\n", " \"55,32,71,22\\r\\n\"\n", " \"66,33,19,27\\r\\n\"\n", " \"13,42,31,11\\r\\n\"\n", " \"11,22,34,23\\r\\n\"\n", " \"15,22,41,11\\r\\n\"\n", " \"13,42,16,21\\r\\n\"\n", " \"55,32,71,22\\r\\n\"\n", " \"66,34,19,27\\r\\n\",\n", " ),\n", " (\"minus(df, df3)\", \"a,b,y,z\\r\\n11,22,31,11\\r\\n66,33,19,27\\r\\n13,42,31,11\\r\\n\"),\n", " (\n", " \"intersect(df, df3)\",\n", " \"a,b,y,z\\r\\n\"\n", " \"11,22,34,23\\r\\n\"\n", " \"11,22,31,11\\r\\n\"\n", " \"13,42,16,21\\r\\n\"\n", " \"55,32,71,22\\r\\n\"\n", " \"66,33,19,27\\r\\n\"\n", " \"13,42,31,11\\r\\n\",\n", " ),\n", " (\n", " \"join(df, df2)\",\n", " \"a,b,y,z\\r\\n11,22,34,23\\r\\n11,22,31,11\\r\\n13,42,16,21\\r\\n13,42,31,11\\r\\n\",\n", " ),\n", " ('join(df, df2, theta=\"z == R_b\")', \"L_a,L_b,y,z,R_a,R_b\\r\\n55,32,71,22,11,22\\r\\n\"),\n", " (\"divide(df, df2)\", \"y,z\\r\\n31,11\\r\\n\"),\n", "]\n", "\n", "\n", "def validate(i):\n", " query, result = test[i]\n", " print(f\"test {i}: {query}\")\n", " expected = pd.read_csv(StringIO(result)).reset_index(drop=True)\n", " actual = eval(query).reset_index(drop=True)\n", " try:\n", " pd.testing.assert_frame_equal(expected, actual)\n", " print(\"OK\")\n", " except:\n", " print(\"expected\")\n", " print(expected)\n", " print(\"actual\")\n", " print(actual)\n", " raise" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Aufgabe 7.1*: Basisoperatoren\n", "\n", "In den folgenden Teilaufgaben implementieren Sie die Basisoperatoren der relationalen Algebra. " ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.1.1*: Selektion ($\\sigma$)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "*Hinweis:* Der Prädikatstring ist ein beliebiger boolescher Ausdruck über die Attribute einer Entity. Du kannst die Funktion `eval` benutzen um diesen String auszuwerten. Benutze das optionale Argument `locals` um die Werte der Variablen des Ausdrucks zu übergeben. Zum Beispiel:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "eval(\"a < 10\", {\"a\": 3})" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def select(predicate: str, relation: DataFrame) -> DataFrame:\n", " # TODO: Hier code hinzufügen.\n", " return relation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(0)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.1.2*: Projektion ($\\pi$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def project(attributes: List[str], relation: DataFrame) -> DataFrame:\n", " # TODO: Hier code hinzufügen.\n", " return relation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(1)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.1.3*: rename (umbenennen) ($\\rho$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def rename(new_attribute_names: List[str], relation: DataFrame) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return relation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(2)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.1.4*: Kreuzprodukt ($\\times$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def cross(left: DataFrame, right: DataFrame) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return relation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(3)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.1.5*: Vereinigung ( $\\cup$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def union(left: DataFrame, right: DataFrame) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return relation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(4)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.1.6*: Differenz ($-$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def minus(left: DataFrame, right: DataFrame) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return relation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(5)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Aufgabe 7.2*: Abgeleitete Operatoren\n", "\n", "*Hinweis:* Die abgeleiteten Operatoren heißen so, weil man sie von den Basisoperatoren ableiten kann." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Aufgabe 7.2.1*: Schnitt ($\\cap$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def intersect(left: DataFrame, right: DataFrame) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return left" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(6)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Aufgabe 7.2.2*: Natural Join ($\\bowtie$), Theta Join ( $\\bowtie_{\\theta}$)\n", "Wenn das optionale Argument `theta` leer ist, soll ein natural join durchgeführt werden." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def join(left: DataFrame, right: DataFrame, theta: str = \"\", outer: bool = False) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return left" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(7)\n", "validate(8)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Aufgabe 7.2.3*: Division ($/$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def divide(left: DataFrame, right: DataFrame) -> DataFrame:\n", " # TODO: Hier Code hinzufügen.\n", " return left" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "validate(9)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Bei Bedarf: Alle Tests ausführen" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Alle Lösungen mittels implementierter Funktionen generieren:\n", "x = []\n", "divide(df, df2)\n", "for q, _ in test:\n", " print(q)\n", " x.append((q, eval(q).to_csv(index=False)))\n", "\n", "from pprint import pprint\n", "\n", "pprint(x)\n", "\n", "# Alle Tests ausführen:\n", "for i in range(10):\n", " validate(i)" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }