{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7c22d640",
   "metadata": {},
   "source": [
    "# Frozensets\n",
    "\n",
    "A frozenset is an immutable version of a set. Once created, elements cannot be added or removed. This is useful when, for example, you need to protect data from modification, to use set as dict key or set element, or for graph edges and combinations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24d63056",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "# Find project root by looking for _config.yml\n",
    "current = Path.cwd()\n",
    "for parent in [current, *current.parents]:\n",
    "    if (parent / '_config.yml').exists():\n",
    "        project_root = parent\n",
    "        break\n",
    "else:\n",
    "    project_root = Path.cwd().parent.parent\n",
    "\n",
    "# Add project root to path\n",
    "sys.path.insert(0, str(project_root))\n",
    "\n",
    "# Import shared teaching helpers and cell magics\n",
    "from shared import thinkpython, diagram, jupyturtle, structshape\n",
    "from shared.download import download\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e7afb6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%expect AttributeError\n",
    "fs = frozenset({1, 2, 3})\n",
    "fs.add(4)     # AttributeError — frozensets are immutable\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56fa6e87",
   "metadata": {},
   "source": [
    "## Creating a frozenset\n",
    "\n",
    "The syntax for creating a frozenset is:\n",
    "\n",
    "```python\n",
    "frozenset()            # empty frozenset\n",
    "frozenset(iterable)    # from any iterable (list, set, string, etc.)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "id": "9e545a44",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "frozenset({1, 2, 3})\n",
      "frozenset({'o', 'h', 'e', 'l'})\n"
     ]
    }
   ],
   "source": [
    "fs = frozenset([1, 2, 3])\n",
    "print(fs)                   # frozenset({1, 2, 3})\n",
    "fs = frozenset(\"hello\")\n",
    "print(fs)                   # element order may vary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "765dd1ec",
   "metadata": {},
   "source": [
    "## Why use a frozenset?\n",
    "\n",
    "Because it is immutable, a frozenset is hashable, which means it can be: \n",
    "\n",
    "- used as a dictionary key, or \n",
    "- stored inside another set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "id": "c1f80041",
   "metadata": {},
   "outputs": [],
   "source": [
    "# As a dict key\n",
    "d = {frozenset({1, 2}): \"pair\"}\n",
    "\n",
    "# Inside a set\n",
    "s = {frozenset({1, 2}), frozenset({3, 4})}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f91f361e",
   "metadata": {},
   "source": [
    "| Feature          | set               | frozenset              |\n",
    "|------------------|-------------------|------------------------|\n",
    "| Mutable          | Yes               | No                     |\n",
    "| Hashable         | No                | Yes                    |\n",
    "| Can be dict key  | No                | Yes                    |\n",
    "| Set operations   | All               | Non-mutating only      |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ebc4e8a",
   "metadata": {},
   "source": [
    "## Set Performance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0350b947",
   "metadata": {},
   "source": [
    "When you store a key in a dictionary or an element in a set, Python calls hash() on it to compute an integer, then uses that integer to determine where to store the value in memory. This is called a **hash table**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b918d6d",
   "metadata": {},
   "source": [
    "The benefit is that lookups are usually O(1) on average: Python jumps directly to a bucket via hash instead of scanning the whole collection. In the worst case (many collisions), lookup can degrade toward O(n)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "30103e4c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# O(n) — scans the entire list\n",
    "\"apple\" in [\"apple\", \"banana\", \"cherry\"]\n",
    "\n",
    "# Average O(1) — jumps directly via hash bucket\n",
    "\"apple\" in {\"apple\", \"banana\", \"cherry\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeb39265",
   "metadata": {},
   "source": [
    "(set-bigo)=\n",
    "**O(n)** and **O(1)** come from **Big O notation**, which describes how the **speed** of an operation scales with the **size** of the data. The full treatment of Big O comes in the Algorithms chapter, where sorting and searching will be covered.\n",
    "\n",
    "For now, just know that:\n",
    "- **Average O(1)**: hash-table lookups in `dict`/`set` are constant time on average, because Python uses the hash to jump to a bucket.\n",
    "- **Worst-case O(n)**: if many keys collide into the same bucket, Python may need to scan multiple entries.\n",
    "- **O(n)**: **linear time**. The operation gets slower as the collection grows. A list search is O(n) because Python checks each element one by one until it finds a match:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "856d0605",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = {\"apple\": 1, \"banana\": 2, \"cherry\": 3}\n",
    "d[\"apple\"]   # same speed whether dict has 3 or 3 million items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "7baa8ed3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"apple\" in [\"apple\", \"banana\", \"cherry\"]  # checks up to n items"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec48aca5",
   "metadata": {},
   "source": [
    "In short, \n",
    "\n",
    "- **O(1)** is like looking up a word in a dictionary: you go directly to the page\n",
    "- **O(n)** is like finding a word in a novel: you read until you find it"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e512f70",
   "metadata": {},
   "source": [
    "You already saw this in the **dictionary** section, and the same idea applies here:\n",
    "\n",
    "- **O(1)** means near-constant lookup time on average (`key in dict`, `item in set`).\n",
    "- **O(n)** means linear time, where Python may scan through many elements (`item in list`).\n",
    "\n",
    "So for membership checks, sets are usually much faster than lists as data grows."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b8792d1",
   "metadata": {},
   "source": [
    "## Performance Demo\n",
    "\n",
    "This benchmark mirrors the dictionary example: same data size, worst-case target (`n - 1`), and timing for membership checks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "id": "9a5bb8e6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "List lookup (O(n)): 0.004426 s\n",
      "Set lookup (avg O(1)): 0.000021 s\n",
      "Set was ~206x faster\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "n = 1_000_000\n",
    "big_list = list(range(n))\n",
    "big_set = set(range(n))\n",
    "target = n - 1                  # worst case for linear list search\n",
    "\n",
    "# O(n): list search scans until it finds target\n",
    "start = time.perf_counter()     # time.perf_counter() is better for benchmarking code\n",
    "result = target in big_list\n",
    "list_time = time.perf_counter() - start\n",
    "\n",
    "# Average O(1): set lookup jumps to hash bucket\n",
    "start = time.perf_counter()\n",
    "result = target in big_set\n",
    "set_time = time.perf_counter() - start\n",
    "\n",
    "print(f\"List lookup (O(n)): {list_time:.6f} s\")\n",
    "print(f\"Set lookup (avg O(1)): {set_time:.6f} s\")\n",
    "print(f\"Set was ~{list_time / set_time:.0f}x faster\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df369e30",
   "metadata": {},
   "source": [
    "## Set Comprehensions\n",
    "\n",
    "A **set comprehension** builds a set from an expression, using the same syntax as a list comprehension but with curly braces `{}`. Because the result is a set, duplicates are automatically discarded.\n",
    "\n",
    "The syntax for creating sets is:\n",
    "\n",
    "```python\n",
    "{expression for item in iterable}\n",
    "{expression for item in iterable if condition}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "d2c4b705",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Square set: {0, 9, 4, 1}\n",
      "Unique word lengths: {3, 4, 5}\n",
      "Vowels found: {'o', 'e'}\n"
     ]
    }
   ],
   "source": [
    "# Create a set of squares\n",
    "square_set = {x**2 for x in range(-3, 4)}\n",
    "print(\"Square set:\", square_set)\n",
    "\n",
    "# Remove duplicates and transform\n",
    "sentence = \"the quick brown fox jumps over the lazy dog\"\n",
    "unique_lengths = {len(word) for word in sentence.split()}\n",
    "print(\"Unique word lengths:\", unique_lengths)\n",
    "\n",
    "# Filter vowels\n",
    "vowels = {char.lower() for char in \"Hello World\" if char.lower() in 'aeiou'}\n",
    "print(\"Vowels found:\", vowels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb832ca4",
   "metadata": {
    "tags": [
     "thebe-interactive"
    ]
   },
   "outputs": [],
   "source": [
    "### Exercise: Set Comprehension Practice\n",
    "# 1. From nums = [1, 2, 2, 3, 4, 4, 5], build a set of squared values.\n",
    "# 2. From text = \"apple banana apple cherry\", build a set of unique word lengths.\n",
    "# 3. Print both sets.\n",
    "### Your code starts here.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "### Your code ends here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f4a54d3",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "nums = [1, 2, 2, 3, 4, 4, 5]\n",
    "squares = {n**2 for n in nums}\n",
    "\n",
    "text = \"apple banana apple cherry\"\n",
    "lengths = {len(word) for word in text.split()}\n",
    "\n",
    "print(\"Squares:\", squares)\n",
    "print(\"Unique lengths:\", lengths)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14fc1ba9",
   "metadata": {},
   "source": [
    "## Comparing with `dict` Comprehension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "26bae0c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Set comprehension — curly braces, no colon\n",
    "{x**2 for x in range(5)}\n",
    "\n",
    "# Dict comprehension — curly braces WITH colon\n",
    "{x: x**2 for x in range(5)}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fc75f84",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
