
Search Algorithms

Search algorithms are methods for locating a target item within a collection or navigating structures like arrays, trees, and graphs. They range from simple scans to sophisticated, heuristic-driven strategies, and are analyzed by their time and space complexity, completeness, and optimality. Choosing the right search algorithm depends on data organization, constraints, and performance goals.

What Are Search Algorithms?

Search algorithms locate items or paths within data structures. They answer questions like: "Is this key present? Where is it? What is the shortest path from A to B?" These algorithms span from basic linear scans to advanced, heuristic-guided searches used in routing, games, and information retrieval.

Key Dimensions

  • Data model: array, linked list, hash table, tree, graph, text, spatial data.
  • Knowledge: uninformed (no domain hints) vs. informed (uses heuristics).
  • Goal: membership/index lookup, nearest neighbor, pathfinding, optimization.
  • Guarantees: completeness (finds a solution if one exists), optimality (finds the best solution), and complexity (time/space).

Fundamental Methods

Linear (Sequential) Search

Scan elements one by one until the target is found or the structure ends. Works on any iterable; no preprocessing required.

  • Time: O(n)
  • Space: O(1)
  • Pros: simplest, no assumptions
  • Cons: slow for large datasets

def linear_search(a, target):
    """Return the index of target in a, or -1 if it is absent."""
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1

Binary Search

On a sorted array, repeatedly halve the search interval.

  • Time: O(log n)
  • Space: O(1) iterative (O(log n) recursive)
  • Pros: very fast for static sorted data
  • Cons: requires sorted random-access structure

def binary_search(a, target):
    """Return the index of target in sorted list a, or -1 if it is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2   # written this way to avoid overflow in fixed-width languages
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1            # target can only be in the upper half
        else:
            hi = mid - 1            # target can only be in the lower half
    return -1

Variants: lower_bound/upper_bound, finding the first or last occurrence of a key, and finding the insertion point that keeps the array sorted.
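
These variants are available directly in Python's standard-library bisect module; a short illustration:

from bisect import bisect_left, bisect_right

a = [1, 3, 3, 3, 7, 9]               # sorted input
first = bisect_left(a, 3)            # index of the first element >= 3      -> 1
last = bisect_right(a, 3) - 1        # index of the last occurrence of 3    -> 3
count = bisect_right(a, 3) - first   # number of occurrences of 3           -> 3
insert_at = bisect_left(a, 5)        # insertion point that keeps a sorted  -> 4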

Other Array-Oriented Searches

  • Jump search: check blocks of size ~√n, then linear search within the block. Time: O(√n) on sorted arrays.
  • Interpolation search: estimates position by value distribution. Average O(log log n) on uniformly distributed sorted arrays; worst-case O(n).
  • Exponential search: find a range by repeated doubling, then binary search within it. Effective for unbounded or unknown-size sorted lists (sketched below).
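
A minimal exponential-search sketch in Python, illustrating the doubling-then-binary-search idea from the last bullet (the final step is delegated to bisect_left):

from bisect import bisect_left

def exponential_search(a, target):
    """Search a sorted list by doubling a bound, then binary searching inside it."""
    if not a:
        return -1
    bound = 1
    while bound < len(a) and a[bound] < target:   # grow the candidate range by doubling
        bound *= 2
    lo, hi = bound // 2, min(bound, len(a) - 1)
    i = bisect_left(a, target, lo, hi + 1)        # binary search within [lo, hi]
    return i if i < len(a) and a[i] == target else -1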

Data-Structure-Driven Search

Hash Tables

Use a hash function to map keys to buckets. Average O(1) lookup; worst O(n) with many collisions. Performance depends on load factor and hash quality. Variants include chaining and open addressing.
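
A minimal separate-chaining sketch (fixed bucket count, no resizing; the ChainedHashTable name is illustrative only):

class ChainedHashTable:
    """Toy hash table using separate chaining; real implementations also resize."""
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)        # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:                     # short chain when the load factor is low
            if k == key:
                return v
        return default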

Binary Search Trees (BSTs)

Store keys with ordering invariant; search follows left/right pointers.

  • Balanced trees (AVL, Red-Black, and the multiway B-Tree): O(log n) search
  • Unbalanced BST: worst-case O(n)
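
A minimal BST lookup, assuming a simple Node class with key, left, and right fields (shown here only for illustration):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(node, key):
    """Follow the ordering invariant: smaller keys go left, larger keys go right."""
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None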

Heaps

Great for finding min/max in O(1) and extracting in O(log n), but arbitrary search is O(n).
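
Python's heapq module makes these costs concrete:

import heapq

h = [5, 1, 9, 3]
heapq.heapify(h)                 # build a min-heap in O(n)
smallest = h[0]                  # peek at the minimum in O(1)
extracted = heapq.heappop(h)     # remove the minimum in O(log n)
present = 9 in h                 # arbitrary membership is still an O(n) scan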

Graph and Tree Traversals

Breadth-First Search (BFS)

Explores neighbors level by level. On unweighted graphs, BFS finds the shortest path (fewest edges).

  • Time: O(V + E)
  • Space: O(V)
  • Complete and optimal for unweighted graphs

from collections import deque

def bfs(graph, start):
    """Breadth-first traversal; returns a parent map encoding fewest-edge paths."""
    visited = {start}
    q = deque([start])
    parent = {}
    while q:
        u = q.popleft()
        for v in graph[u]:          # graph: dict mapping node -> list of neighbors
            if v not in visited:
                visited.add(v)
                parent[v] = u
                q.append(v)
    return parent
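
The parent map encodes a shortest-path tree rooted at start; a short usage sketch for recovering a fewest-edge path (the graph below is hypothetical):

def shortest_path(parent, start, goal):
    """Walk parent pointers from goal back to start; assumes goal was reached."""
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return list(reversed(path))

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_path(bfs(graph, "A"), "A", "D"))   # ['A', 'B', 'D']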

Depth-First Search (DFS)

Explores as far as possible along each branch before backtracking. Useful for connectivity, cycle detection, topological sort.

  • Time: O(V + E)
  • Space: O(V) with recursion or explicit stack
  • Complete in finite graphs; not necessarily optimal on path cost

def dfs(graph, u, visited):
    """Recursive depth-first traversal of every node reachable from u."""
    if u in visited:
        return
    visited.add(u)
    for v in graph[u]:          # graph: dict mapping node -> list of neighbors
        dfs(graph, v, visited)
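
An equivalent iterative version uses an explicit stack, which avoids hitting the recursion limit on deep graphs (a sketch, using the same dict-of-lists representation):

def dfs_iterative(graph, start):
    """Depth-first traversal with an explicit stack instead of recursion."""
    visited = set()
    stack = [start]
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                stack.append(v)
    return visited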

Weighted and Informed Path Search

Dijkstra's Algorithm

Finds shortest paths in graphs with non-negative edge weights using a priority queue.

  • Time: O((V + E) log V) with a binary heap
  • Complete and optimal for non-negative weights
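
A minimal Dijkstra sketch with heapq and lazy deletion, assuming graph is a dict mapping each node to a list of (neighbor, weight) pairs and that node labels are orderable (e.g., strings or ints):

import heapq

def dijkstra(graph, start):
    """Return shortest-path distances from start; requires non-negative weights."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                  # stale heap entry; a shorter path was already found
            continue
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist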

A* Search

Guided by a heuristic h(n), prioritizing nodes by f(n) = g(n) + h(n). If h is admissible (never overestimates) and consistent, A* is complete and optimal.

import heapq
from itertools import count

def a_star(start, goal, neighbors, cost, heuristic):
    """Best-first search ordered by f(n) = g(n) + h(n); returns a path or None."""
    tie = count()                        # tiebreaker so the heap never compares nodes
    open_heap = [(heuristic(start), next(tie), start)]
    g = {start: 0}
    parent = {}
    while open_heap:
        f, _, u = heapq.heappop(open_heap)
        if u == goal:
            return reconstruct_path(parent, u)
        for v in neighbors(u):
            tentative = g[u] + cost(u, v)
            if v not in g or tentative < g[v]:   # found a cheaper path to v
                g[v] = tentative
                parent[v] = u
                heapq.heappush(open_heap, (tentative + heuristic(v), next(tie), v))
    return None

def reconstruct_path(parent, node):
    """Walk parent pointers back from node to the start."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return list(reversed(path))

Common heuristics: Euclidean/Manhattan distance in grids, straight-line distance in maps.
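
The two grid heuristics, written as plain functions:

import math

def manhattan(p, q):
    """Admissible on 4-connected grids with unit step cost."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    """Admissible whenever real movement cost is at least the straight-line distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])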

Analyzing Search Algorithms

  • Time complexity: operations as n grows (e.g., O(n), O(log n), O(V+E)).
  • Space complexity: extra memory (e.g., BFS queue size).
  • Completeness: guarantees finding a solution if it exists.
  • Optimality: returns best solution under a cost metric.
  • Preprocessing vs. query trade-off: sorting or building an index speeds repeated searches.

Practical Tips and Pitfalls

  • Binary search off-by-one: carefully define mid and loop conditions to avoid infinite loops.
  • Data assumptions: binary/interpolation searches require sorted data; hashing requires a good hash and load control.
  • Memory: BFS can be memory-heavy; DFS is lighter but risks deep recursion.
  • Heuristics: ensure admissibility/consistency when you need optimal A* solutions.
  • Dynamic data: frequent updates may favor hash tables over sorted arrays + binary search.

When to Use What

  • Unsorted small/one-off lookup: Linear search.
  • Sorted static data with random access: Binary search.
  • Many lookups, key-value access: Hash table.
  • Ordered operations with frequent inserts/deletes: Balanced BST.
  • Shortest paths: BFS (unweighted), Dijkstra (non-negative weights), A* (with heuristic).

Applications

  • Databases and indexing
  • Pathfinding in maps and games
  • Compilers and symbol tables
  • Search engines and information retrieval
  • Network routing and optimization
