
Search Algorithms

Search algorithms are methods for locating a target item within a collection or navigating structures like arrays, trees, and graphs. They range from simple scans to sophisticated, heuristic-driven strategies, and are analyzed by their time and space complexity, completeness, and optimality. Choosing the right search algorithm depends on data organization, constraints, and performance goals.

What Are Search Algorithms?

Search algorithms locate items or paths within data structures. They answer questions like: "Is this key present? Where is it? What is the shortest path from A to B?" These algorithms span from basic linear scans to advanced, heuristic-guided searches used in routing, games, and information retrieval.

Key Dimensions

  • Data model: array, linked list, hash table, tree, graph, text, spatial data.
  • Knowledge: uninformed (no domain hints) vs. informed (uses heuristics).
  • Goal: membership/index lookup, nearest neighbor, pathfinding, optimization.
  • Guarantees: completeness (finds a solution if one exists), optimality (finds the best solution), and complexity (time/space).

Fundamental Methods

Linear (Sequential) Search

Scan elements one by one until the target is found or the structure ends. Works on any iterable; no preprocessing required.

  • Time: O(n)
  • Space: O(1)
  • Pros: simplest, no assumptions
  • Cons: slow for large datasets

def linear_search(a, target):
    """Return the index of target in a, or -1 if it is absent."""
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1

Binary Search

On a sorted array, repeatedly halve the search interval.

  • Time: O(log n)
  • Space: O(1) iterative (O(log n) recursive)
  • Pros: very fast for static sorted data
  • Cons: requires sorted random-access structure

def binary_search(a, target):
    """Return the index of target in sorted list a, or -1 if it is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2   # written this way to avoid overflow in fixed-width languages
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1            # target can only be in the upper half
        else:
            hi = mid - 1            # target can only be in the lower half
    return -1

Variants: lower_bound/upper_bound, finding the first or last occurrence of a key, and finding the insertion point that keeps the array sorted.
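
These variants are available directly in Python's standard-library bisect module; a short illustration:

from bisect import bisect_left, bisect_right

a = [1, 3, 3, 3, 7, 9]               # sorted input
first = bisect_left(a, 3)            # index of the first element >= 3      -> 1
last = bisect_right(a, 3) - 1        # index of the last occurrence of 3    -> 3
count = bisect_right(a, 3) - first   # number of occurrences of 3           -> 3
insert_at = bisect_left(a, 5)        # insertion point that keeps a sorted  -> 4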

Other Array-Oriented Searches

  • Jump search: check blocks of size ~√n, then linear search within the block. Time: O(√n) on sorted arrays.
  • Interpolation search: estimates position by value distribution. Average O(log log n) on uniformly distributed sorted arrays; worst-case O(n).
  • Exponential search: find a range by repeated doubling, then binary search within it. Effective for unbounded or unknown-size sorted lists (sketched below).
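
A minimal exponential-search sketch in Python, illustrating the doubling-then-binary-search idea from the last bullet (the final step is delegated to bisect_left):

from bisect import bisect_left

def exponential_search(a, target):
    """Search a sorted list by doubling a bound, then binary searching inside it."""
    if not a:
        return -1
    bound = 1
    while bound < len(a) and a[bound] < target:   # grow the candidate range by doubling
        bound *= 2
    lo, hi = bound // 2, min(bound, len(a) - 1)
    i = bisect_left(a, target, lo, hi + 1)        # binary search within [lo, hi]
    return i if i < len(a) and a[i] == target else -1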

Data-Structure-Driven Search

Hash Tables

Use a hash function to map keys to buckets. Average O(1) lookup; worst O(n) with many collisions. Performance depends on load factor and hash quality. Variants include chaining and open addressing.
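
A minimal separate-chaining sketch (fixed bucket count, no resizing; the ChainedHashTable name is illustrative only):

class ChainedHashTable:
    """Toy hash table using separate chaining; real implementations also resize."""
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)        # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:                     # short chain when the load factor is low
            if k == key:
                return v
        return default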

Binary Search Trees (BSTs)

Store keys with ordering invariant; search follows left/right pointers.

  • Balanced trees (AVL, Red-Black, and the multiway B-Tree): O(log n) search
  • Unbalanced BST: worst-case O(n)
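
A minimal BST lookup, assuming a simple Node class with key, left, and right fields (shown here only for illustration):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(node, key):
    """Follow the ordering invariant: smaller keys go left, larger keys go right."""
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None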

Heaps

Great for finding min/max in O(1) and extracting in O(log n), but arbitrary search is O(n).
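
Python's heapq module makes these costs concrete:

import heapq

h = [5, 1, 9, 3]
heapq.heapify(h)                 # build a min-heap in O(n)
smallest = h[0]                  # peek at the minimum in O(1)
extracted = heapq.heappop(h)     # remove the minimum in O(log n)
present = 9 in h                 # arbitrary membership is still an O(n) scan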

Graph and Tree Traversals

Breadth-First Search (BFS)

Explores neighbors level by level. On unweighted graphs, BFS finds the shortest path (fewest edges).

  • Time: O(V + E)
  • Space: O(V)
  • Complete and optimal for unweighted graphs

from collections import deque

def bfs(graph, start):
    """Breadth-first traversal; returns a parent map encoding fewest-edge paths."""
    visited = {start}
    q = deque([start])
    parent = {}
    while q:
        u = q.popleft()
        for v in graph[u]:          # graph: dict mapping node -> list of neighbors
            if v not in visited:
                visited.add(v)
                parent[v] = u
                q.append(v)
    return parent
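
The parent map encodes a shortest-path tree rooted at start; a short usage sketch for recovering a fewest-edge path (the graph below is hypothetical):

def shortest_path(parent, start, goal):
    """Walk parent pointers from goal back to start; assumes goal was reached."""
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return list(reversed(path))

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_path(bfs(graph, "A"), "A", "D"))   # ['A', 'B', 'D']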

Depth-First Search (DFS)

Explores as far as possible along each branch before backtracking. Useful for connectivity, cycle detection, topological sort.

  • Time: O(V + E)
  • Space: O(V) with recursion or explicit stack
  • Complete in finite graphs; not necessarily optimal on path cost

def dfs(graph, u, visited):
    """Recursive depth-first traversal of every node reachable from u."""
    if u in visited:
        return
    visited.add(u)
    for v in graph[u]:          # graph: dict mapping node -> list of neighbors
        dfs(graph, v, visited)
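
An equivalent iterative version uses an explicit stack, which avoids hitting the recursion limit on deep graphs (a sketch, using the same dict-of-lists representation):

def dfs_iterative(graph, start):
    """Depth-first traversal with an explicit stack instead of recursion."""
    visited = set()
    stack = [start]
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                stack.append(v)
    return visited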

Weighted and Informed Path Search

Dijkstra's Algorithm

Finds shortest paths in graphs with non-negative edge weights using a priority queue.

  • Time: O((V + E) log V) with a binary heap
  • Complete and optimal for non-negative weights
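
A minimal Dijkstra sketch with heapq and lazy deletion, assuming graph is a dict mapping each node to a list of (neighbor, weight) pairs and that node labels are orderable (e.g., strings or ints):

import heapq

def dijkstra(graph, start):
    """Return shortest-path distances from start; requires non-negative weights."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                  # stale heap entry; a shorter path was already found
            continue
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist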

A* Search

Guided by a heuristic h(n), prioritizing nodes by f(n) = g(n) + h(n). If h is admissible (never overestimates) and consistent, A* is complete and optimal.

import heapq
from itertools import count

def a_star(start, goal, neighbors, cost, heuristic):
    """Best-first search ordered by f(n) = g(n) + h(n); returns a path or None."""
    tie = count()                        # tiebreaker so the heap never compares nodes
    open_heap = [(heuristic(start), next(tie), start)]
    g = {start: 0}
    parent = {}
    while open_heap:
        f, _, u = heapq.heappop(open_heap)
        if u == goal:
            return reconstruct_path(parent, u)
        for v in neighbors(u):
            tentative = g[u] + cost(u, v)
            if v not in g or tentative < g[v]:   # found a cheaper path to v
                g[v] = tentative
                parent[v] = u
                heapq.heappush(open_heap, (tentative + heuristic(v), next(tie), v))
    return None

def reconstruct_path(parent, node):
    """Walk parent pointers back from node to the start."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return list(reversed(path))

Common heuristics: Euclidean/Manhattan distance in grids, straight-line distance in maps.
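
The two grid heuristics, written as plain functions:

import math

def manhattan(p, q):
    """Admissible on 4-connected grids with unit step cost."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Admissible whenever real movement cost is at least the straight-line distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])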

Analyzing Search Algorithms

  • Time complexity: operations as n grows (e.g., O(n), O(log n), O(V+E)).
  • Space complexity: extra memory (e.g., BFS queue size).
  • Completeness: guarantees finding a solution if it exists.
  • Optimality: returns best solution under a cost metric.
  • Preprocessing vs. query trade-off: sorting or building an index speeds repeated searches.

Practical Tips and Pitfalls

  • Binary search off-by-one: carefully define mid and loop conditions to avoid infinite loops.
  • Data assumptions: binary/interpolation searches require sorted data; hashing requires a good hash and load control.
  • Memory: BFS can be memory-heavy; DFS is lighter but risks deep recursion.
  • Heuristics: ensure admissibility/consistency when you need optimal A* solutions.
  • Dynamic data: frequent updates may favor hash tables over sorted arrays + binary search.

When to Use What

  • Unsorted small/one-off lookup: Linear search.
  • Sorted static data with random access: Binary search.
  • Many lookups, key-value access: Hash table.
  • Ordered operations with frequent inserts/deletes: Balanced BST.
  • Shortest paths: BFS (unweighted), Dijkstra (non-negative weights), A* (with heuristic).

Applications

  • Databases and indexing
  • Pathfinding in maps and games
  • Compilers and symbol tables
  • Search engines and information retrieval
  • Network routing and optimization
