Appendix E — Abstract Data Types

Author

phonchi

Published

April 12, 2023

Open In Colab


E.1 Background

Problem solving with a computer means processing data. To process data, we need to define the data type and the operation to be performed on the data. For example, to find the sum of a list of numbers, we should select the type for the number (integer or real) and define the operation (addition). The definition of the data type and the definition of the operation to be applied to the data is part of the idea behind an abstract data type (ADT)—to hide how the operation is performed on the data.

In other word, the user of an ADT needs only to know that a set of operations are available for the data type, but does not need to know how they are applied.

E.1.1 Simple ADTs

Many programming languages already define some simple ADTs as integral parts of the language. For example, the Python language defines a simple ADT called an integer. The type of this ADT is integer with predefined ranges. Python also defines several operations that can be applied on this data type (addition, subtraction, multiplication, division, and so on). Python  explicitly defines these operations on integers and what we expect as the results. A  programmer who writes a Python program to add two integers should know about the integer ADT and the operations that can be applied to it.

The programmer, however, does not need to know how these operations are actually implemented. For example, the programmer uses the expression z = x + y and expects the value of x (an integer) to be added to the value of y (an integer) and the result to be named z (an integer). The programmer does not need to know how the addition is performed. We learned in previous chapters that the way this addition is done by a computer is to store the two integers in two memory locations in two’s complement format, to load them into the CPU register, to add them in binary, and to store the result back to another memory location. The programmer, however, does not need to know this. An integer in Python is a simple abstract data type with predefined operations. How the operations are performed is not a concern for the programmer.

E.1.2 Complex ADTs

Although several simple ADTs, such as integer, float, character and so on, have been implemented and are available for use in most languages, many useful complex ADTs are not. As we will see in this chapter, we need a stack ADT, a queue ADT, and so on. To be efficient, these ADTs should be created and stored in the library of the computer to be used. The user of a stack, for example, should only need to know what operations are available for the stack, not how these operations are performed.

Therefore, with an ADT, users are not concerned with how the task is done, but rather with what it can do. In other words, the ADT consists of a set of definitions that allow programmers to use the operation while their implementation is hidden. This generalization of operations with unspecified implementations is known as abstraction. We abstract the essence of the process and leave the implementation details hidden.

The concept of abstraction means: 1. We know what a data type can do. 2. How it is done is hidden.

Let us now define an ADT. An abstract data type is a data type packaged with the operations that are meaningful for the data type. We then encapsulate the data and the operations on the data and hide them from the user.

Abstract Data Type 1. Definition of data 2. Definition of operations 3. Encapsulation of data and operation

E.1.3 Model for an abstract data type

The ADT model is shown in the Figure below. The colored area with an irregular outline represents the ADT. Inside the ADT are two different parts of the model: data structure and operations (public and private). The application program can only access the public operations through the interface. An interface is a list of public operations and data to be passed to or returned from those operations. The private operations are for internal use by the ADT. The data structures, such as arrays and linked lists, are inside the ADT and are used by the public and private operations.

Although the public operations and the interface should be independent of the implementation, the private operations are dependent on the data structures chosen during the implementation of the ADT. We will elaborate on this issue when we discuss some of the ADTs.

E.2 Stacks

A stack is a restricted linear list in which all additions and deletions are made at one end, the top. If we insert a series of data into a stack and then remove it, the order of the data is reversed. Data input as 5, 10, 15, 20, for example, would be removed as 20, 15, 10, and 5. This reversing attribute is why stacks are known as a last in, first out (LIFO) data structure.

We use many different types of stacks in our daily lives. We often talk of a stack of coins or a stack of books. Any situation in which we can only add or remove an object at the top is a stack. If we want to remove an object other than the one at the top, we must first remove all objects above it. Figure below shows three representations of stacks.

E.2.1 Operations on stacks

E.2.1.1 The stack operation

The stack operation creates an empty stack. Python’s built-in list type makes a decent stack data structure as it supports push and pop operations. Python’s lists are implemented as dynamic arrays internally, which means they occasionally need to resize the storage space for elements stored in them when elements are added or removed. The list over-allocates its backing storage so that not every push or pop requires resizing. For optimum performance, stacks based on Python lists should grow towards higher indexes and shrink towards lower ones.

stack = []

E.2.1.2 The push operation

The push operation inserts an item at the top of the stack. This operation returns the new stack with dataItem inserted at the top. Figure below shows the pictorial representation of this operation.

In Python we use append method as the push operation to put it into the stack (higher index).

stack.append(20)
stack.append(78)
stack.append(30)
print(stack)
[20, 78, 30]

E.2.1.3 The pop operation

The pop operation deletes the item at the top of the stack. The following shows the pictorial representation of this operation.

Poped = stack.pop()
print(Poped)
30

The deleted item can be used by the application program or can be just discarded. After the pop operation, the item that was under the top element before the deletion becomes the top element.

E.2.1.4 The empty operation

The empty operation checks the status of the stack.

def empty(lis):
  if len(lis) == 0:
    return True
  else:
    return False
empty(stack)
False
stack.pop()
stack.pop()
empty(stack)
True

E.2.2 Stack ADT

We define a stack as an ADT as shown below:

E.2.3 Stack applications

E.2.3.1 Reversing data items

Reversing data items requires that a given set of data items be reordered so that the first and last items are exchanged, with all of the positions between the first and last being relatively exchanged also. For example, the list (2, 4, 7, 1, 6, 8) becomes (8, 6, 1, 7, 4, 2).

E.2.3.2 Example 1: In Chapter 2 we gave a simple UML diagram to convert an integer from decimal to any base. Although the algorithm is very simple, if we print the digits of the converted integer as they are created, we will get the digits in reverse order. The print instruction in any computer language prints characters from left to right, but the algorithm creates the digits from right to left. We can use the reversing characteristic of a stack (LIFO structure) to solve the problem.

def DecimalToBinary(number):
  """
    Parameters
    ----------
    number: int
        The integer to be converted 
    Returns
    -------
    print the binary equivalent
  """
  if number == 0:
    print("0b0")
    return 
  S = []
  while number != 0:
    remainder = number % 2
    S.append(remainder)
    number = number //2
  
  print("0b",end ="")
  while len(S)!=0:
    x = S.pop()
    print(x,end ="")
DecimalToBinary(0)
0b0
DecimalToBinary(35)
0b100011

We create an empty stack first. Then we use a while loop to create the bits, but instead of printing them, we push them into the stack. When all bits are created, we exit the loop. Now we use another loop to pop the bits from the stack and print them. Note that the bits are printed in the reverse order to that in which they have been created.

E.2.3.3 Pairing data items

We often need to pair some characters in an expression. For example, when we write a mathematical expression in a computer language, we often need to use parentheses to change the precedence of operators.

3*((3+2)-5)

When we type an expression with a lot of parentheses, we often forget to pair the parentheses. One of the duties of a compiler is to do the checking for us. The compiler uses a stack to check that all opening parentheses are paired with a closing parentheses.

E.2.3.4 Exercise 1: Algorithm 12.2 shows how we can check if every opening parenthesis is paired with a closing parenthesis.

def CheckingParentheses(expression):
  """
    Parameters
    ----------
    expression: str
        The expression to be checked
    Returns
    -------
    Error messages if unpaired parentheses are found
  """

  S = []
  for char in expression:
    if char == '(':
      S.append(char)
    else:
      if char == ')':
        if len(S) == 0:
          print("a closing parenthes is not matched")
        else:
          S.pop()

  if len(S)!=0:
    print("unmatched opening parenthesis")
CheckingParentheses("3*((3+2)-5)")
CheckingParentheses("3*(3+2)-5)")
a closing parenthes is not matched
CheckingParentheses("3*(3+(2-5)")
unmatched opening parenthesis

E.2.4 Stack implementation

In this section we describe the general ideas behind the implementation of a stack ADT. At the ADT level, we use the stack and its four operations (stack, push, pop, and empty): at the implementation level, we need to choose a data structure to implement it. Stack ADT can be implemented using either an array or a linked list.

We can write four algorithms in pseudocode for the four operations we defined for stack in each implementation. We showed algorithms to handle arrays and linked lists in Chapter 11: these algorithms can be modified to create the four algorithms we need for stacks: stack, push, pop, and empty. These algorithms are even easier than those presented in Chapter 11, because the insertion and deletion is done only at the top of stack.

E.2.4.1 Example 2: The linked list implementation is as above, we have an extra record that has the name of the stack. This record also has two fields: a counter, which at each moment shows the number of data items in the stack. Another field is a pointer that points to the top element.

class Node:
  def __init__(self, data):
    self.data = data
    self.link = None
  def __repr__(self):
    return str(self.data)

class LinkedList:
  def __init__(self, nodes=None):
    self.head = None
    if nodes:
      node = Node(data=nodes.pop(0)) #pop will pop out the first entry of the list
      self.head = node
      for elem in nodes:
        node.link = Node(data=elem)
        node = node.link
  def __repr__(self): #This is to beautify printing, don't worry about this
    node = self.head
    nodes = []
    while node is not None:
      nodes.append(str(node.data))
      node = node.link
    nodes.append("None")
    return " -> ".join(nodes)
L = LinkedList(0)
# Create an empty stack
def stack():
  S={"count":0, "top":None}
  return S

def push(S,L,x):
  new = Node(x)
  if S["top"] == None:
    L.head = new
    S["top"] = L.head
  else:
    new.link = L.head
    L.head = new
    S["top"] = L.head
  S["count"] = S["count"]+1


def pop(S,L):
  if S["count"] ==0:
    return None
  x = L.head
  L.head = L.head.link
  S["top"] = L.head
  S["count"] = S["count"]-1  
  return x

def empty(S):
  if S["count"] == 0:
    return True
  else:
    return False 
S = stack()
push(S,L,20)
print(S,L)
push(S,L,78)
print(S,L)
push(S,L,30)
print(S,L)
{'count': 1, 'top': 20} 20 -> None
{'count': 2, 'top': 78} 78 -> 20 -> None
{'count': 3, 'top': 30} 30 -> 78 -> 20 -> None
print(pop(S,L))
print(S,L)
print(pop(S,L))
print(S,L)
print(pop(S,L))
print(S,L)
30
{'count': 2, 'top': 78} 78 -> 20 -> None
78
{'count': 1, 'top': 20} 20 -> None
20
{'count': 0, 'top': None} None
push(S,L,20)
print(S,L)
push(S,L,78)
print(S,L)
push(S,L,30)
print(S,L)
{'count': 1, 'top': 20} 20 -> None
{'count': 2, 'top': 78} 78 -> 20 -> None
{'count': 3, 'top': 30} 30 -> 78 -> 20 -> None
print(pop(S,L))
print(S,L)
print(pop(S,L))
print(S,L)
print(pop(S,L))
print(S,L)
30
{'count': 2, 'top': 78} 78 -> 20 -> None
78
{'count': 1, 'top': 20} 20 -> None
20
{'count': 0, 'top': None} None

E.2.4.2 Example 3: In the array implementation, we have a record that has two fields. The first field can be used to store information about the array: we have used it as the count field, which at each moment shows the number of data items in the stack. The second field is an integer that holds the index of the top element. Note that the array is shown upside down to match the linked list implementation.

import numpy as np
n = 10 # number of elements allocated to the array
A = np.zeros(n, dtype=np.int8)
# Create an empty stack
def stack():
  S={"count":0, "top":-1}
  return S

def push(S,A,x):
  S["top"] = S["top"]+1
  S["count"] = S["count"]+1
  A[S["top"]] = x

def pop(S,A):
  if S["count"] ==0:
    return None
  x = A[S["top"]]
  A[S["top"]] = 0
  S["top"] = S["top"]-1
  S["count"] = S["count"]-1  
  return x

def empty(S):
  if S["count"] == 0:
    return True
  else:
    return False 
S = stack()
push(S,A,20)
print(S,A)
push(S,A,78)
print(S,A)
push(S,A,30)
print(S,A)
{'count': 1, 'top': 0} [20  0  0  0  0  0  0  0  0  0]
{'count': 2, 'top': 1} [20 78  0  0  0  0  0  0  0  0]
{'count': 3, 'top': 2} [20 78 30  0  0  0  0  0  0  0]
print(pop(S,A))
print(S,A)
print(pop(S,A))
print(S,A)
print(pop(S,A))
print(S,A)
30
{'count': 2, 'top': 1} [20 78  0  0  0  0  0  0  0  0]
78
{'count': 1, 'top': 0} [20  0  0  0  0  0  0  0  0  0]
20
{'count': 0, 'top': -1} [0 0 0 0 0 0 0 0 0 0]
push(S,A,20)
print(S,A)
push(S,A,78)
print(S,A)
push(S,A,30)
print(S,A)
{'count': 1, 'top': 0} [20  0  0  0  0  0  0  0  0  0]
{'count': 2, 'top': 1} [20 78  0  0  0  0  0  0  0  0]
{'count': 3, 'top': 2} [20 78 30  0  0  0  0  0  0  0]
print(pop(S,A))
print(S,A)
print(pop(S,A))
print(S,A)
print(pop(S,A))
print(S,A)
30
{'count': 2, 'top': 1} [20 78  0  0  0  0  0  0  0  0]
78
{'count': 1, 'top': 0} [20  0  0  0  0  0  0  0  0  0]
20
{'count': 0, 'top': -1} [0 0 0 0 0 0 0 0 0 0]

E.3 Queues

A queue is a linear list in which data can only be inserted at one end, called the rear, and deleted from the other end, called the front. These restrictions ensure that the data are processed through the queue in the order in which it is received. In other words, a queue is a first in, first out (FIFO) structure.

Queues are familiar from everyday life. A line of people waiting for the bus at a bus station is a queue, a list of calls put on hold to be answered by a telephone operator is a queue, and a list of waiting jobs to be processed by a computer is a queue.

Figure 12.8 shows two representations of queues, one a queue of people and the other a computer queue. Both people and data enter the queue at the rear and progress through the queue until they arrive at the front. Once they are at the front of the queue, they leave the queue and are served.

E.3.1 Operations on queues

E.3.1.1 The queue operation

The queue operation creates an empty queue. A queue is a collection of objects that supports fast FIFO semantics for inserts and deletes. The insert and delete operations are sometimes called enqueue and dequeue. Unlike lists or arrays, queues typically don’t allow for random access to the objects they contain. It’s possible to use a regular list as a queue, but this is not ideal from a performance perspective. Lists are quite slow for this purpose because inserting or deleting an element at the beginning requires shifting all the other elements by one!

The collections.deque implementation is a great default choice if you’re looking for a queue data structure in Python’s standard library.

from collections import deque

queue = deque()

E.3.1.2 The enqueue operation

The enqueue operation inserts an item at the rear of the queue. After the enqueue operation, the new item becomes the last item in the queue. This operation returns the new queue with new data inserted at the rear. Figure below shows the pictorial representation of this operation.

In Python, you can use the append operation for enqueue.

queue.append(20)
queue.append(78)
queue.append(30)
queue
deque([20, 78, 30])

E.3.1.3 The dequeue operation

The dequeue operation deletes the item at the front of the queue. The deleted item can be used by the application program or can be just discarded. After the dequeue operation, the item that followed the front element becomes the front element. This operation returns the new queue with one less element. Figure below shows the pictorial representation of this operation.

front = queue.popleft()
front
20

In python, you can use the popleft operation for dequeue.

E.3.1.4 The empty operation

The empty operation checks the status of the queue.

def empty(q):
  if len(q) == 0:
    return True
  else:
    return False
empty(queue)
False
queue.popleft()
queue.popleft()
empty(queue)
True

E.3.2 Queue ADT

We define a stack as an ADT as shown below:

E.3.3 Queue applications

Queues are one of the most common of all data processing structures. They are found in virtually every operating system and network and in countless other areas. For example, queues are used in online business applications such as processing customer requests, jobs, and orders. In a computer system, a queue is needed to process jobs and for system services such as print spools.

E.3.3.1 Example 4: Queues can be used to organize databases by some characteristic of the data. For example, imagine we have a list of sorted data stored in the computer belonging to two categories: less than 1000, and greater than 1000. We can use two queues to separate the categories and at the same time maintain the order of data in their own category. Algorithm 12.3 shows the pseudocode for this operation.

def Categorizer(ilist):
  """
    Parameters
    ----------
    ilist1: list
        Input sorted list
    Returns
    -------
    Categorize data into two categories and create two separate queues and print them out
  """

  Q1 = deque()
  Q2 = deque()
  for data in ilist:
    if data < 1000:
      Q1.append(data)
    if data >=1000:
      Q2.append(data)
  
  print("Q1:")
  while len(Q1)!=0:
    x = Q1.popleft()
    print(x, end=' ')

  print("\nQ2:")
  while len(Q2)!=0:
    x = Q2.popleft()
    print(x, end=' ')
Categorizer([150, 350, 500, 700, 1000, 1200, 1550])
Q1:
150 350 500 700 
Q2:
1000 1200 1550 
Categorizer([50, 250, 900, 999, 1001, 1900, 2550])
Q1:
50 250 900 999 
Q2:
1001 1900 2550 

Another common application of a queue is to adjust and create a balance between a fast producer of data and a slow consumer of data. For example, assume that a CPU is connected to a printer. The speed of a printer is not comparable with the speed of a CPU. If the CPU waits for the printer to print some data created by the CPU, the CPU would be idle for a long time. The solution is a queue. The CPU creates as many chunks of data as the queue can hold and sends them to the queue. The CPU is now free to do other jobs. The chunks are dequeued slowly and printed by the printer. The queue used for this purpose is normally referred to as a spool queue.

E.3.4 Queue implementation

At the ADT level, we use the queue and its four operations (queue, enqueue, dequeue, and empty): at the implementation level, we need to choose a data structure to implement it. A queue ADT can be implemented using either an array or a linked list.

We can write four algorithms in pseudocode for the four operations we defined for queues in each implementation. We described algorithms to handle arrays and linked lists in Chapter 11: we can modify those algorithms to create the four algorithms we need for queues: queue, enqueue, dequeue, and empty. These algorithms are easier than those presented in Chapter 11, because insertion is done only at the end of the queue and deletion is done only at the front of the queue.

E.3.4.1 Exercise 2: In the array implementation we have a record with three fields. The first field can be used to store information about the queue: we have used this as a count field that shows the current number of data items in the queue. The second field is an integer that holds the index of the front element. The third field is also an integer, which holds the index of the rear element.

import numpy as np
n = 10 # number of elements allocated to the array
A = np.zeros(n, dtype=np.int8)
# Create an empty queue
def queue():
  Q={"count":0, "rear":-1, "front":-1}
  return Q

def enqueue(Q,A,x):
  if Q["front"]==-1:
    Q["front"] = 0
  Q["rear"] = Q["rear"]+1
  Q["count"] = Q["count"]+1
  A[Q["rear"]] = x

def dequeue(Q,A):
  if Q["count"] ==0:
    return None
  x = A[Q["front"]]
  A[Q["front"]] = 0
  Q["front"] = Q["front"]+1
  Q["count"] = Q["count"]-1  
  return x

def empty(S):
  if Q["count"] == 0:
    return True
  else:
    return False 
Q = queue()
enqueue(Q,A,20)
print(Q,A)
enqueue(Q,A,78)
print(Q,A)
enqueue(Q,A,30)
print(Q,A)
{'count': 1, 'rear': 0, 'front': 0} [20  0  0  0  0  0  0  0  0  0]
{'count': 2, 'rear': 1, 'front': 0} [20 78  0  0  0  0  0  0  0  0]
{'count': 3, 'rear': 2, 'front': 0} [20 78 30  0  0  0  0  0  0  0]
print(dequeue(Q,A))
print(Q,A)
print(dequeue(Q,A))
print(Q,A)
print(dequeue(Q,A))
print(Q,A)
20
{'count': 2, 'rear': 2, 'front': 1} [ 0 78 30  0  0  0  0  0  0  0]
78
{'count': 1, 'rear': 2, 'front': 2} [ 0  0 30  0  0  0  0  0  0  0]
30
{'count': 0, 'rear': 2, 'front': 3} [0 0 0 0 0 0 0 0 0 0]
enqueue(Q,A,20)
print(Q,A)
enqueue(Q,A,78)
print(Q,A)
enqueue(Q,A,30)
print(Q,A)
{'count': 1, 'rear': 3, 'front': 3} [ 0  0  0 20  0  0  0  0  0  0]
{'count': 2, 'rear': 4, 'front': 3} [ 0  0  0 20 78  0  0  0  0  0]
{'count': 3, 'rear': 5, 'front': 3} [ 0  0  0 20 78 30  0  0  0  0]
print(dequeue(Q,A))
print(Q,A)
print(dequeue(Q,A))
print(Q,A)
print(dequeue(Q,A))
print(Q,A)
20
{'count': 2, 'rear': 5, 'front': 4} [ 0  0  0  0 78 30  0  0  0  0]
78
{'count': 1, 'rear': 5, 'front': 5} [ 0  0  0  0  0 30  0  0  0  0]
30
{'count': 0, 'rear': 5, 'front': 6} [0 0 0 0 0 0 0 0 0 0]

E.3.4.2 Exercise 3: The linked list implementation is similar: we have an extra record that has the name of the queue. This node also has three fields: a count, a pointer that points to the front element, and a pointer that points to the rear element.

class Node:
  def __init__(self, data):
    self.data = data
    self.link = None
  def __repr__(self):
    return str(self.data)

class LinkedList:
  def __init__(self, nodes=None):
    self.head = None
    if nodes:
      node = Node(data=nodes.pop(0)) #pop will pop out the first entry of the list
      self.head = node
      for elem in nodes:
        node.link = Node(data=elem)
        node = node.link
  def __repr__(self): #This is to beautify printing, don't worry about this
    node = self.head
    nodes = []
    while node is not None:
      nodes.append(str(node.data))
      node = node.link
    nodes.append("None")
    return " -> ".join(nodes)
L = LinkedList(0)
# Create an empty queue
def queue():
  Q={"count":0, "front":None, "rear":None}
  return Q

def enqueue(Q,L,x):
  new = Node(x)
  if Q["count"] == 0:
    L.head = new
    Q["front"] = L.head
    Q["rear"] = L.head
  else:
    Q["rear"].link = new
    Q["rear"] = Q["rear"].link
  Q["count"] = Q["count"]+1


def dequeue(Q,L):
  if Q["count"] ==0:
    return None
  x = Q["front"]
  if Q["count"] ==1:
    Q["front"] = None
    Q["rear"] = None
  else:
    Q["front"] = Q["front"].link
  L.head = Q["front"]
  Q["count"] = Q["count"] - 1
  return x

def empty(Q):
  if Q["count"] == 0:
    return True
  else:
    return False 
Q = queue()
enqueue(Q,L,20)
print(Q,L)
enqueue(Q,L,78)
print(Q,L)
enqueue(Q,L,30)
print(Q,L)
{'count': 1, 'front': 20, 'rear': 20} 20 -> None
{'count': 2, 'front': 20, 'rear': 78} 20 -> 78 -> None
{'count': 3, 'front': 20, 'rear': 30} 20 -> 78 -> 30 -> None
print(dequeue(Q,L))
print(Q,L)
print(dequeue(Q,L))
print(Q,L)
print(dequeue(Q,L))
print(Q,L)
20
{'count': 2, 'front': 78, 'rear': 30} 78 -> 30 -> None
78
{'count': 1, 'front': 30, 'rear': 30} 30 -> None
30
{'count': 0, 'front': None, 'rear': None} None
enqueue(Q,L,20)
print(Q,L)
enqueue(Q,L,78)
print(Q,L)
enqueue(Q,L,30)
print(Q,L)
{'count': 1, 'front': 20, 'rear': 20} 20 -> None
{'count': 2, 'front': 20, 'rear': 78} 20 -> 78 -> None
{'count': 3, 'front': 20, 'rear': 30} 20 -> 78 -> 30 -> None
print(dequeue(Q,L))
print(Q,L)
print(dequeue(Q,L))
print(Q,L)
print(dequeue(Q,L))
print(Q,L)
20
{'count': 2, 'front': 78, 'rear': 30} 78 -> 30 -> None
78
{'count': 1, 'front': 30, 'rear': 30} 30 -> None
30
{'count': 0, 'front': None, 'rear': None} None

E.4 General linear lists

This topic is beyond our scope.

E.5 Trees

A tree consists of a finite set of elements, called nodes (or vertices), and a finite set of directed lines, called arcs, that connect pairs of the nodes. If the tree is not empty, one of the nodes, called the root, has no incoming arcs. The other nodes in a tree can be reached from the root by following a unique path, which is a sequence of consecutive arcs. Tree structures are normally drawn upside down with the root at the top as shown below:

We can divide the vertices in a tree into three categories: the root, leaves, and the internal nodes. The following showsthe number of outgoing and incoming arcs allowed for each type of node.

Type of node Incoming arc Outgoing arc
root 0 0 or more
leaf 1 0
internal 1 1 or more

A node that is directly accessible (through a single arc) from a given node is called the child: the node from which the child is directly accessible is called a parent. Nodes with a common parent are called siblings. Descendents of a node are all nodes that can be reached by that node, and a node from which all descendents can be reached is called an ancestor. Each node in a tree may have a subtree.

The subtree of each node includes one of its children and all descendents of that child. Figure below shows all subtrees for the tree in Figure above

Although trees have many applications in computer science, such as index files, their study is beyond the scope of this book. We introduce trees as a prelude to discussing one special type of tree, binary trees.

E.5.1 Binary trees

A binary tree is a tree in which no node can have more than two subtrees. In other words, a node can have zero, one, or two subtrees. These subtrees are designated as the left subtree and the right subtree. Figure below shows a binary tree with its two subtrees. Note that each subtree is itself a binary tree.

E.5.1.1 Recursive definition of binary trees

In Chapter 8 we introduced the recursive definition of an algorithm. We can also define a structure or an ADT recursively. The following gives the recursive definition of a binary tree. Note that, based on this definition, a binary tree can have a root, but each subtree can also have a root.

A binary tree is either empty or consists of a node, root, with two subtrees, in which each subtree is also a binary tree.

Figure below shows eight trees, the first of which is an empty binary tree (sometimes called a null binary tree).

E.5.2 Operations on binary trees

The six most common operations defined for a binary tree are tree (creates an empty tree) insert, delete, retrieve, empty and traversal. The first five are complex and beyond the scope of this book. We discuss binary tree traversal in this section.

E.5.2.1 Binary tree traversals

A binary tree traversal requires that each node of the tree be processed once and only once in a predetermined sequence. The two general approaches to the traversal sequence are depth-first and breadth-first traversal.

E.5.2.1.1 Depth-first traversals

Given that a binary tree consists of a root, a left subtree, and a right subtree, we can define six different depth-first traversal sequences. Computer scientists have assigned standard names to three of these sequences in the literature: the other three are unnamed but are easily derived. The standard traversals are shown in Figure below.

  • Preorder traversal. In preorder traversal the root node is processed first, followed by the left subtree and then the right subtree. The prefix pre indicates that the root node is processed before the subtrees.

  • Inorder traversal. In inorder traversal the left subtree is processed first, then the root node, and finally the right subtree. The prefix in indicates that the root node is processed between the subtrees.

  • Postorder traversal. In postorder traversal the root node is processed after the left and right subtrees have been processed. The prefix post indicates that the root is processed after the subtrees.

E.5.2.2 Example 5: Figure below shows how we visit each node in a tree using preorder traversal. The figure also shows the walking order. In preorder traversal we visit a node when we pass from its left side. The nodes are visited in this order: A, B, C, D, E, F.

E.5.2.2.1 Breadth-first traversals

In breadth-first traversal of a binary tree we process all the children of a node before proceeding with the next generation. As with depth-first traversals, we can trace the traversal with a walk.

E.5.2.3 Example 6: Figure below shows how we visit each node in a tree using breadth-first traversal. The figure also shows the walking order. The traversal order is A, B, E, C, D, F.

E.5.2.4 Binary tree applications

E.5.2.4.1 Expression trees

An arithmetic expression can be represented in three different formats: infix, postfix, and prefix. In an infix notation, the operator comes between the two operands. In postfix notation, the operator comes after its two operands, and in prefix notation it comes before the two operands. These formats are shown below for the addition of two operands A and B.

  • Prefix: + A B
  • Infix: A + B
  • Postfix: A B +

Although we use infix notation in our algorithms and in programming languages, the compiler often changes them to postfix notation before evaluating them. One way to do this conversion is to create an expression tree. In an expression tree, the root and the internal nodes are operators and the leaves are the operands. The three standard traversals (preorder, inorder, and postorder) then represent the three different expression formats: infix , postfix , and prefix. The inorder traversal produces the infix expression, the postorder traversal produces the postfix expression, and the preorder traversal produces the prefix expression. Figure below shows an expression and its expression tree. Note that only the infix notation needs parentheses.

E.5.3 Binary search trees

A binary search tree (BST) is a binary tree with one extra property: the key value of each node is greater than the key values of all nodes in each left subtree and smaller than the value of all nodes in each right subtree. Figure below shows the idea.

E.5.3.1 Example 7: Figure below shows some binary trees that are BSTs and some that are not. Note that a tree is a BST if all its subtrees are BSTs and the whole tree is also a BST.

E.5.3.1.1 BST implementation

BSTs can be implemented using either arrays or linked lists. However, linked list structures are more common and more efficient. A linear implementation uses nodes with two pointers, left and right. The left pointer points to the left subtree and the right pointer points to the right subtree. If the left subtree is empty, the left pointer is null: if the right subtree is empty, the right pointer is null. Figure below shows a BST in which the data field of each node is a record.

class tree:
  def __init__(self):
    self.data=0
    self.left=None
    self.right=None
  def __repr__(self):
    lines, *_ = self._display_aux()
    return "\n".join(lines)
  def _display_aux(self):
    # https://stackoverflow.com/questions/34012886/print-binary-tree-level-by-level-in-python
    """Returns list of strings, width, height, and horizontal coordinate of the root."""
    # No child.
    if self.right is None and self.left is None:
      line = '%s' % self.data
      width = len(line)
      height = 1
      middle = width // 2
      return [line], width, height, middle

    # Only left child.
    if self.right is None:
      lines, n, p, x = self.left._display_aux()
      s = '%s' % self.data
      u = len(s)
      first_line = (x + 1) * ' ' + (n - x - 1) * '_' + s
      second_line = x * ' ' + '/' + (n - x - 1 + u) * ' '
      shifted_lines = [line + u * ' ' for line in lines]
      return [first_line, second_line] + shifted_lines, n + u, p + 2, n + u // 2

    # Only right child.
    if self.left is None:
      lines, n, p, x = self.right._display_aux()
      s = '%s' % self.data
      u = len(s)
      first_line = s + x * '_' + (n - x) * ' '
      second_line = (u + x) * ' ' + '\\' + (n - x - 1) * ' '
      shifted_lines = [u * ' ' + line for line in lines]
      return [first_line, second_line] + shifted_lines, n + u, p + 2, u // 2

    # Two children.
    left, n, p, x = self.left._display_aux()
    right, m, q, y = self.right._display_aux()
    s = '%s' % self.data
    u = len(s)
    first_line = (x + 1) * ' ' + (n - x - 1) * '_' + s + y * '_' + (m - y) * ' '
    second_line = x * ' ' + '/' + (n - x - 1 + u + y) * ' ' + '\\' + (m - y - 1) * ' '
    if p < q:
      left += [n * ' '] * (q - p)
    elif q < p:
      right += [m * ' '] * (p - q)
    zipped_lines = zip(left, right)
    lines = [first_line, second_line] + [a + u * ' ' + b for a, b in zipped_lines]
    return lines, n + m + u, max(p, q) + 2, n + u // 2    

def create_tree(root,val):  # create a binary tree
  newnode=tree()
  newnode.data=val
  newnode.left=None
  newnode.right=None
  if root==None:
    root=newnode
    return root
  else:
    current=root
    while current!=None:
      backup=current
      if current.data > val:
        current=current.left
      else:
        current=current.right
    if backup.data >val:
      backup.left=newnode
    else:
      backup.right=newnode
  return root
def inorder(ptr):   # Inorder
  if ptr!=None:
    inorder(ptr.left)
    print(ptr.data, end=' ')
    inorder(ptr.right)
data=[17,6,3,14,19]
ptr=None
root=None
for i in range(len(data)):
  ptr=create_tree(ptr,data[i])
ptr
  __17_ 
 /     \
 6_   19
/  \    
3 14    

A very interesting property of a BST is that if we apply the inorder traversal of a binary tree, the elements that are visited are sorted in ascending order. For example, the three BSTs in Figure above, when traversed in the order gives the list (3, 6, 17), (17, 19), and (3, 6, 14, 17, 19).

An inorder traversal of a BST creates a list that is sorted in ascending order.

E.5.3.2 Example 8: The code below is the inorder traversal of a binary search tree.

print('=======================================================')
print('Inorder:')
inorder(ptr) 
print()
=======================================================
Inorder:
3 6 14 17 19 

E.5.3.3 Exercise 4: Try to write the postorder and preorder traversal of the above tree.

def postorder(ptr):  # Postorder
  if ptr!=None:
    postorder(ptr.left)
    postorder(ptr.right)
    print(ptr.data, end=' ')

def preorder(ptr):   # Preorder
  if ptr!=None:
    print(ptr.data, end= ' ')
    preorder(ptr.left)
    preorder(ptr.right)
print('=======================================================')
print('Postorder:')
postorder(ptr)
print()
print('=======================================================')
print('Preorder:')
preorder(ptr)
print() 
=======================================================
Postorder:
3 14 6 19 17 
=======================================================
Preorder:
17 6 3 14 19 
E.5.3.3.1 Binary search tree ADTs

https://visualgo.net/en/bst

The ADT for a binary search tree is similar to the one we defined for a general linear list with the same operation. As a matter of fact, we see more BST lists than general linear lists today. The reason is that searching a BST is more efficient that searching a linear list: a general linear list uses sequential searching, but BSTs use a version of binary search.

Another feature that makes a BST interesting is that we can use a version of the binary search we used in Chapter 8 for a binary search tree. Figure below shows the UML for a BST search.

E.5.3.4 Exercise 5: Search the above tree.

def search(ptr,val):    
  i=1
  while True:
    if ptr==None:    
      return "Not Found"
    if ptr.data==val:      
      print('Total search number %d'%i)
      return "Found"
    elif ptr.data > val: 
      ptr=ptr.left
    else:
      ptr=ptr.right
    i+=1
search(ptr, 20)
'Not Found'
search(ptr, 14)
Total search number 3
'Found'

E.6 Graphs

A graph is an ADT made of a set of nodes, called vertices, and set of lines connecting the vertices, called edges or arcs. Whereas a tree defines a hierarchical structure in which a node can have only one single parent, each node in a graph can have one or more parents. Graphs may be either directed or undirected. In a directed graph, or digraph, each edge, which connects two vertices, has a direction (shown in the figure by an arrowhead) from one vertex to the other. In an undirected graph, there is no direction. Figure below shows an example of both a directed graph (a) and an undirected graph (b).

The vertices in a graph can represent objects or concepts and the edges or arcs can represent a relationship between those objects or concepts. If a graph is directed, the relations are one-way: if a graph is undirected, the relation is two-way.

For instance, a map of cities and the roads connecting the cities can be represented in a computer using an undirected graph. The cities are vertices and the undirected edges are the roads that connect them. If we want to show the distances between the cities, we can use weighted graphs, in which each edge has a weight that represent the distance between two cities connected by that edge. Another application of graphs is in computer networks (Chapter 6). The vertices can represent the nodes or hubs; the edges can represent the route. Each edge can have a weight that defines the cost of reaching from one hub to the adjacent hub. A router can use graph algorithms to find the shortest path between itself and the final destination of a packet.