{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "83f5f63e",
   "metadata": {},
   "source": [
    "# Sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1b3db3f7",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "current = Path.cwd()\n",
    "for parent in [current, *current.parents]:\n",
    "    if (parent / '_config.yml').exists():\n",
    "        project_root = parent  # ← Add project root, not chapters\n",
    "        break\n",
    "else:\n",
    "    project_root = Path.cwd().parent.parent\n",
    "\n",
    "sys.path.insert(0, str(project_root))\n",
    "\n",
    "from shared import thinkpython, diagram, jupyturtle\n",
    "from shared.download import download\n",
    "\n",
    "# Register as top-level modules so direct imports work in subsequent cells\n",
    "sys.modules['thinkpython'] = thinkpython\n",
    "sys.modules['diagram'] = diagram\n",
    "sys.modules['jupyturtle'] = jupyturtle\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0eafbe4e",
   "metadata": {},
   "source": [
    "## What is a set\n",
    "\n",
    "A set is an **unordered** collection of unique values. Unlike lists or tuples, sets do not allow **duplicates**, and the elements have no guaranteed order. \n",
    "\n",
    "Here below is an example of a set. Note that it uses curly braces just like dictionaries. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ccf867f4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1, 2, 3}\n"
     ]
    }
   ],
   "source": [
    "s = {1, 2, 3, 3, 2}\n",
    "print(s)                ### {1, 2, 3}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aac530ad",
   "metadata": {},
   "source": [
    "Sets are useful when you care about **membership**: whether something is **in** a collection; rather than order or position.\n",
    "\n",
    "The most common use cases are:\n",
    "\n",
    "- Removing duplicates from a list\n",
    "- Checking membership quickly\n",
    "- Comparing two collections (what's shared, what's different)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "352d5069",
   "metadata": {},
   "source": [
    "For sets, the four key properties that make sets different from other data structures:\n",
    "\n",
    "1. **Unordered**: Elements have no guaranteed position. There is no first or last element.\n",
    "2. **Unique**: Duplicates are automatically removed.\n",
    "3. **Mutable**: You can add and remove elements.\n",
    "4. **Hashable elements only**: Every element must be hashable. In practice this means immutable types: `int`, `float`, `str`, `tuple`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "abf76b98",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### 1. Sets are unordered, so the order of elements is not guaranteed.\n",
    "{3, 1, 2} == {1, 2, 3}  # True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0120dc41",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 2, 3}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### 2. Sets do not allow duplicate elements, so duplicates are automatically removed.\n",
    "{1, 2, 2, 3, 3}         # {1, 2, 3}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c6ebf262",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1, 3, 4}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "### 3. Sets are mutable, so you can add or remove elements after the set is created.\n",
    "s = {1, 2, 3}\n",
    "s.add(4)                # works\n",
    "s.remove(2)             # works\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d3cb961",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%expect TypeError\n",
    "### 4. Sets can contain only hashable (immutable) elements, so you cannot include lists or other sets as elements.\n",
    "{1, \"hello\", (1, 2)}    # valid\n",
    "{[1, 2], [3, 4]}        # TypeError\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0fd5117",
   "metadata": {},
   "source": [
    "## Choosing the Right Structure\n",
    "\n",
    "| Need                                   | Use         |\n",
    "|----------------------------------------|-------------|\n",
    "| Ordered collection, duplicates allowed | `list`      |\n",
    "| Fixed data, multiple return values     | `tuple`     |\n",
    "| Key-value lookup                       | `dict`      |\n",
    "| Unique items, membership testing       | `set`       |\n",
    "| Hashable set (dict key, nested set)    | `frozenset` |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a35f0db4",
   "metadata": {},
   "source": [
    "## Hashable vs mutable\n",
    "\n",
    "We have seen that some data types are mutable and some are hashable. It's now time to learn a little more about the. \n",
    "\n",
    "- **Hashable** means an object has a hash value that stays stable during its lifetime.\n",
    "- **Mutable** objects usually are **not hashable** (for example: `list`, `dict`, `set`).\n",
    "- **Immutable** objects are often hashable (for example: `int`, `float`, `str`, `tuple`), but for tuples, all nested elements must also be hashable.\n",
    "- For hash-table types: `dict` requires **hashable keys**, and `set` requires **hashable elements**.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40b211f3",
   "metadata": {},
   "source": [
    "### Mutability"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb38a8c3",
   "metadata": {},
   "source": [
    "For the property of mutability, let's have a look over the general data structures. For in-place change methods for the common data structures, here is a comparison table. Tuple and strings are immutable, so there's no methods available for the purpose.\n",
    "\n",
    "| Method                              | List | Dict | Set | Tuple | String |\n",
    "|-------------------------------------|------|------|-----|-------|--------|\n",
    "| `.append(x)`                        | ✓    |      |     |       |        |\n",
    "| `.insert(i, x)`                     | ✓    |      |     |       |        |\n",
    "| `.extend(iter)`                     | ✓    |      |     |       |        |\n",
    "| `.update(other)`                    |      | ✓    | ✓   |       |        |\n",
    "| `.remove(x)`                        | ✓    |      | ✓   |       |        |\n",
    "| `.pop()`                            | ✓    | ✓    | ✓   |       |        |\n",
    "| `.popitem()`                        |      | ✓    |     |       |        |\n",
    "| `.add(x)`                           |      |      | ✓   |       |        |\n",
    "| `.discard(x)`                       |      |      | ✓   |       |        |\n",
    "| `.sort()`                           | ✓    |      |     |       |        |\n",
    "| `.reverse()`                        | ✓    |      |     |       |        |\n",
    "| `.setdefault(k, v)`                 |      | ✓    |     |       |        |\n",
    "| `.clear()`                          | ✓    | ✓    | ✓   |       |        |\n",
    "| `.intersection_update(t)`           |      |      | ✓   |       |        |\n",
    "| `.difference_update(t)`             |      |      | ✓   |       |        |\n",
    "| `.symmetric_difference_update(t)`   |      |      | ✓   |       |        |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "030ea21c",
   "metadata": {},
   "source": [
    "## In-place set operations\n",
    "\n",
    "In Python, mutable methods that modify in-place typically return `None`. The general rule in Python is:\n",
    "\n",
    "- Mutates in-place → returns `None`\n",
    "- Returns a new object → original unchanged\n",
    "  \n",
    "These methods modify the original set instead of creating a new one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "86f3ec74",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "update: {1, 2, 3, 4, 5}\n",
      "intersection_update: {3}\n",
      "difference_update: {1, 2}\n",
      "symmetric_difference_update: {1, 2, 4, 5}\n"
     ]
    }
   ],
   "source": [
    "x = {1, 2, 3}\n",
    "y = {3, 4, 5}\n",
    "\n",
    "x.update(y)                        # union in-place\n",
    "print(\"update:\", x)\n",
    "\n",
    "x = {1, 2, 3}\n",
    "x.intersection_update(y)\n",
    "print(\"intersection_update:\", x)\n",
    "\n",
    "x = {1, 2, 3}\n",
    "x.difference_update(y)\n",
    "print(\"difference_update:\", x)\n",
    "\n",
    "x = {1, 2, 3}\n",
    "x.symmetric_difference_update(y)\n",
    "print(\"symmetric_difference_update:\", x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7e49497",
   "metadata": {},
   "source": [
    "### Hashability\n",
    "\n",
    "Let us compare **set**, **list**, and **dict** and this time focus on hashability. \n",
    "\n",
    "\n",
    "| Feature      | Set                    | List          | Dict                          |\n",
    "|--------------|------------------------|---------------|-------------------------------|\n",
    "| Ordered      | **No**                     | Yes           | Yes (insertion order)         |\n",
    "| Duplicates   | **No**                     | Yes           | Keys: No, Values: Yes         |\n",
    "| Mutable      | **Yes**                    | Yes           | Yes                           |\n",
    "| Allows only **hashable** elements | Yes   | No            | Keys only                     |\n",
    "| Lookup speed | Avg O(1)               | O(n)          | Avg O(1)                      |\n",
    "| Use case     | Unique items, membership | Ordered items | Key-value pairs               |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17099ee3",
   "metadata": {},
   "source": [
    "In computer science, hashable means that an object can be converted into a fixed-size integer (called a hash value or hash code) via a hash function. This hash value is used to quickly locate the object in data structures like hash tables, dictionaries, and sets. For something to be hashable, it generally needs to satisfy two properties:\n",
    "\n",
    "- **Consistency**: the hash value must not change during the object's lifetime. This is why mutable objects (like Python lists) are typically not hashable because if their contents change, their hash would need to change too, which breaks lookups.\n",
    "- **Equality consistency**: if two objects are considered equal (`a == b`), they must have the same hash value. The reverse isn't required; two different objects can share a hash (called a collision), but equal objects must hash identically.\n",
    "\n",
    "Hash-based data structures like hash maps and hash sets rely on hash values to place and find items in O(1) average time. If you want to use an object as a dictionary key or store it in a set, it needs to be hashable.\n",
    "\n",
    "In summary, **hashable** means you can be consistently fingerprinted with a number, which is the prerequisite for being usable as a key in fast lookup structures."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "630069ed",
   "metadata": {},
   "source": [
    "```text\n",
    "key     →  hash function    →  index  →  bucket in array\n",
    "\"cat\"   →  hash(\"cat\")      →  42     →  array[42]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0f675460",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8096599184176479363\n",
      "6680279693667265536\n",
      "8096599184176479363\n"
     ]
    }
   ],
   "source": [
    "print(hash(\"cat\"))           ### -9223372036854775808\n",
    "print(hash(\"dog\"))           ### -867955804616931492\n",
    "print(hash(\"cat\"))           ### -9223372036854775808"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
