{ "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. meta::\n", " :description: Topic: Data Structures, Difficulty: Medium, Category: Section\n", " :keywords: Big-O, complexity, efficiency, algorithm, interview preparation, list, tuple, sequence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Structures (Part I): Introduction\n", "Here we survey Python's built-in data structures. You should already be familiar with its lists and tuples, two data structures that facilitate working with sequential data. Two other critical, built-in data structures to be introduced are:\n", "\n", "- dictionary: a mapping from \"keys\" to \"values\"\n", "- sets: an unordered collection of items that can be used to perform set-algebraic operations (e.g. union and intersection) \n", "\n", "These data structures are not merely convenient constructs with some nice pre-written functionality; they also provide an interface to some optimized core utilities that have been written in C (or whatever language your Python interpreter is written in). For example, let's write a function that checks if an item is contained within an iterable:\n", "\n", "```python\n", "def is_in(seq, target):\n", " \"\"\" Returns True if `target` is contained in `seq`.\"\"\"\n", " for item in seq:\n", " if item == target:\n", " return True\n", " return False\n", "```\n", "\n", "This function mirrors the C-algorithm that Python uses \"under the hood\" for checking for membership in a list (assuming you are using the CPython interpreter, which you almost definitely are). Because their function is implemented \"at a lower level\", and need not be interpreted, we expect it to be faster than ours:\n", "```python\n", ">>> x = [1, \"moo\", 3, True, 5, None, 7, 8]\n", "\n", ">>> is_in(x, -1) # takes 980 nanoseconds on my machine\n", "False\n", "\n", ">>> -1 in x # takes 320 nanoseconds on my machine\n", "False\n", "```\n", "Here, Python's built-in sequence-membership function is 3x faster than using our own function. Furthermore, it will be important to know the advantages provided by each of the data structures. For instance, testing for membership in a set is even faster than is checking for membership in a list:\n", "\n", "```python\n", "# test for membership in a list\n", ">>> -1 in [1, \"moo\", 3, True, 5, None, 7, 8] # takes 295 nanoseconds on my machine\n", "False\n", "\n", "# test for membership in a set\n", ">>> -1 in {1, \"moo\", 3, True, 5, None, 7, 8} # takes 65 nanoseconds on my machine\n", "False\n", "```\n", "We get a 4.5x speedup in our membership test just by using a set instead of a list, because the use of a set permits Python to use an entirely different algorithm for checking for membership. On our end, we merely replaced square brackets with curly braces! Hopefully this is sufficient motivation for learning about Python's data structures and the algorithms that they utilize \"under the hood\".\n", "\n", "