{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3a7d8b47",
   "metadata": {},
   "source": [
    "# Regex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "38f16749",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "current = Path.cwd()\n",
    "for parent in [current, *current.parents]:\n",
    "    if (parent / '_config.yml').exists():\n",
    "        project_root = parent  # ← Add project root, not chapters\n",
    "        break\n",
    "else:\n",
    "    project_root = Path.cwd().parent.parent\n",
    "\n",
    "sys.path.insert(0, str(project_root))\n",
    "\n",
    "from shared import thinkpython, diagram, jupyturtle\n",
    "from shared.download import download\n",
    "\n",
    "# Register as top-level modules so direct imports work in subsequent cells\n",
    "sys.modules['thinkpython'] = thinkpython\n",
    "sys.modules['diagram'] = diagram\n",
    "sys.modules['jupyturtle'] = jupyturtle"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8f141fd",
   "metadata": {},
   "source": [
    "String methods allow you to search and manipulate text to a certain extent. A **regular expression** (regex) is a sequence of characters that defines a **search pattern** to search, match, and manipulate text in more powerful ways that go beyond simple string methods. For example:\n",
    "\n",
    "| Task | Use |\n",
    "|---|---|\n",
    "| Check if string starts with 'http' | `str.startswith()` |\n",
    "| Replace all spaces with underscores | `str.replace()` |\n",
    "| Extract all email addresses from text | regex |\n",
    "| Validate a phone number format | regex |\n",
    "| Find words matching a complex pattern | regex |\n",
    "\n",
    "The rule of thumb: if the pattern is **fixed and simple**, use string methods. If the pattern is **variable or complex**, use regex."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ecec41c",
   "metadata": {},
   "source": [
    "For example, to search a pattern in a text, we may use the `find()` string method and an index is returned if the pattern is found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "0aa5f24b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = \"I am Dracula; and I bid you welcome, Mr. Harker,\\\n",
    "    to my house.\"\n",
    "pattern = 'Dracula'\n",
    "text.find(pattern)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b992f30",
   "metadata": {},
   "source": [
    "## Escape Sequences and Raw Strings\n",
    "\n",
    "Before using the regular expression (`re`) functions, we need to understand regex escapes and raw strings.\n",
    "\n",
    "Regex patterns use backslashes heavily. In regex, the backslash `\\` introduces escape sequences, which create special patterns or allow metacharacters to be treated as literal characters.\n",
    "\n",
    "For example, common escape sequences include:\n",
    "\n",
    "| Pattern | Meaning | Example match |\n",
    "|---|---|---|\n",
    "| **`\\d`** | **digit** | `7` |\n",
    "| **`\\w`** | **word character** (letters, digits, underscore) | `A`, `x`, `9`, `_`  (`[a-zA-Z0-9_]`) |\n",
    "| **`\\s`** | **whitespace character** | **space**, **tab**, **new line** |\n",
    "| `\\.` | literal dot | `.` |\n",
    "| `\\$` | literal dollar sign | `$` |\n",
    "| `\\\\` | literal backslash | `\\` |\n"
   ]
  },
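  {
   "cell_type": "markdown",
   "id": "e1f0a2b3",
   "metadata": {},
   "source": [
    "As a minimal sketch of the shorthand classes in the table above, the cell below runs `re.findall()` on a made-up string. The name `sample` and its contents are hypothetical, chosen only for illustration. The `r` prefix on each pattern is the raw-string notation covered in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1f0a2b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Hypothetical sample text, chosen only for illustration\n",
    "sample = 'Room 42, floor_7'\n",
    "\n",
    "digits = re.findall(r'\\d', sample)   # every single digit\n",
    "words = re.findall(r'\\w+', sample)   # runs of word characters\n",
    "spaces = re.findall(r'\\s', sample)   # every whitespace character\n",
    "\n",
    "print(digits)\n",
    "print(words)\n",
    "print(spaces)"
   ]
  },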
  {
   "cell_type": "markdown",
   "id": "6500b3b8",
   "metadata": {},
   "source": [
    "## Raw Strings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6826c201",
   "metadata": {},
   "source": [
    "A raw string, on the other hand, is a string prefixed with **`r`**, which tells Python to treat backslashes `\\` as literal characters rather than escape sequences. \n",
    "\n",
    "- Prefix with `r` to prevent Python from interpreting backslashes before the regex engine sees the pattern; works like `\\`. \n",
    "- Raw strings avoid double escaping and make patterns easier to read.\n",
    "- Without raw strings, you often need extra backslashes, like `'\\\\d+'`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c37dbeaf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\\n\n"
     ]
    }
   ],
   "source": [
    "regular = \"\\n\"   # newline character\n",
    "raw      = r\"\\n\" # literally backslash + n (two characters)\n",
    "\n",
    "print(regular)  # prints a newline\n",
    "print(raw)      # prints \\n"
   ]
  },
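  {
   "cell_type": "markdown",
   "id": "f2a1b3c4",
   "metadata": {},
   "source": [
    "The sketch below checks that a raw-string pattern and its double-escaped equivalent are the same three-character string, so the regex engine receives identical input either way. The names `escaped`, `raw`, and `numbers`, and the sample text, are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2a1b3c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "escaped = '\\\\d+'   # normal string: the backslash must be doubled\n",
    "raw = r'\\d+'       # raw string: the backslash is written once\n",
    "\n",
    "print(escaped == raw)   # both are backslash, d, plus sign\n",
    "print(len(raw))\n",
    "\n",
    "# Hypothetical sample text; either spelling of the pattern finds the same matches\n",
    "numbers = re.findall(raw, 'Call 555 or 0199')\n",
    "print(numbers)"
   ]
  },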
  {
   "cell_type": "markdown",
   "id": "018d11be",
   "metadata": {},
   "source": [
    "\n",
    "Use escapes for literal special regex characters too (for example `\\.`, `\\$`, `\\?`)."
   ]
  },
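  {
   "cell_type": "markdown",
   "id": "a3b2c4d5",
   "metadata": {},
   "source": [
    "To illustrate why escaping matters, the sketch below compares an unescaped dot, which matches any character except a newline, with an escaped dot, which matches only a literal `.`. The string `price` and the variable names are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3b2c4d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Hypothetical sample text, chosen only for illustration\n",
    "price = 'Total: 3.50 or 3x50'\n",
    "\n",
    "any_char = re.findall(r'3.50', price)      # unescaped dot matches '.' and 'x'\n",
    "literal_dot = re.findall(r'3\\.50', price)  # escaped dot matches only '.'\n",
    "\n",
    "print(any_char)\n",
    "print(literal_dot)"
   ]
  },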
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1e887253",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\\\\\n",
      "\\\\\n",
      "\\\\\\n\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "print('\\\\\\\\')           # double backslash in a normal string\n",
    "print(r'\\\\')            # double backslash in a raw string (same result)\n",
    "print(r'\\\\\\n')          # backslash + n in a raw string\n",
    "print(r'\\\\\\n' == '\\\\\\\\n')   # False: raw string is backslash + n, normal string is backslash + backslash + n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90a7cb99",
   "metadata": {},
   "source": [
    "Raw string tells Python we are treating this backslash `\\` as a special character. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
