Notebook slides (original) (raw)

Algorithms¶

Rubric: No raw loops

{ int r = a < b ? a : b; }

What does this line of code do?

{ // r is the minimum of a and b int r = a < b ? a : b; }

A comment helps...
Naming an operation, even a simple operation, lowers cognitive overhead

/// Returns the minimum of two integers int min(int a, int b) { return a < b ? a : b; }

A function allows us to associate a name with semantics

/// Returns the minimum of two integers int min(int a, int b);

Functions allow us to build a vocabulary and focus on what the code does

Naming a Function (Guidelines)¶

Operations with the same semantics should have the same name
- Follow existing conventions
Name should describe the postconditions and make the use clear
For properties and constant(ish) complexity operations use:
- nouns: capacity
- adjectives: empty (ambiguous but used by convention)
- copular constructions: is_blue
For higher complexity operations use:
- verbs: partition
  * a = sort(b) not a = sorted(b)
For setting properties:
- Prefix with the verb, set_, i.e. set_capacity
Clarity is of highest priority. Don't construct unnatural verb phrases
- intersection(a, b) not calculate_intersection(a, b)
- name not get_name

Specifying a Function¶

at a minimum, the specification should be a sentence fragment defining the postconditions
State any preconditions that are not clear from the types and conventions

/// Returns the truncated square root of `x`. `x` must be >= 0.
int sqrti(int x);

Argument Types¶

let: by-const-lvalue-reference or by-value for known small types and invocable types
inout: by-lvalue-reference, prefer sink arguments and result to in-out arguments
sink: by-rvalue-reference or by-value for known small types and to avoid forwarding references

spans, views, iterator pairs, and so on are a way to pass a range of objects as if they were a simple argument. The value_type of the range determines if it is in (const), or inout (not const) with input ranges (input iterators) used for sink.

Implicit Preconditions¶

Object Lifetimes¶

Caller must ensure referenced arguments are valid for duration of the call

Validity¶

An invalid object cannot be used as an argument

Law of Exclusivity¶

{ vector a{0, 0, 1, 0, 1 }; erase(a, a[0]); }

What is in a after this code?

{ vector a{0, 0, 1, 0, 1 }; erase(a, a[0]); display(a); }

This fails because a[0] is passed by a reference aliasing the vector, a. The standard is inconsistent in how it deals with aliasing with mutations. Unless aliasing is explicitly allowed, avoid it.

{ vector a{0, 0, 1, 0, 1 }; erase(a, copy(a[0])); display(a); }

The Law of Exclusivity is borrowed from Swift, and the term was coined by John McCall. To modify a variable, exclusive access to that variable is required. C++ does not enforce this rule; it must be manually enforced.

The Law of Exclusivity has far greater implications outside the scope of this talk. You might be interested in learning about the Val language.

Ranges¶

Iterator pairs denote a valid half-open interval

/// Transmogrify the elements in the range [f, l). template void transmogrify(ForwardIterator f, ForwardIterator l);

Implicit Postconditions¶

Validity of References¶

Any reference (pointer, iterator, etc.) to an object is assumed to be invalidated if the object is mutated.

A returned reference must be to one of the arguments to the function (or a part of one of the arguments) and is valid until the argument is modified or its lifetime ends.

Loops and Recursion¶

{ /// removes values equal to a in the range [f, l) /// returns the position, b, such that [f, b) contains all the values in [f, l) not equal to a /// values in the range [b, l) are unspecified auto remove = [](auto f, auto l, const auto& a) { auto b = find(f, l, a); if (b == l) return b; auto p = next(b); // invariant: all values in the range [f, b) do not equal x // decreasing: distance(p, l) while (p != l) { if (*p != a) { *b = std::move(*p); ++b; } ++p; } return b; };

vector a{ 0, 0, 1, 0, 1 };
a.erase(remove(begin(a), end(a), 0), end(a));

display(a);

}

Sequences¶

For a sequence of n elements, there are n + 1 positions
How to represent a range of elements?
Problem with closed interval [f, l]?
Problem with open interval (f, l)?
Half-open intervals have significant advantages [f, l)
- By strong convention we are open on the right
[p, p) represents an empty range, at position p
- All empty ranges are not equal
Think of the positions as the lines between the elements

Sequence 1
Sequence With Pointers To Objects

Sequence 2
Sequence With Pointers Between Objects

Limitations of half-open intervals
- The last element cannot be enclosed if the position type is the same as the element type
- i.e. You can't express the range [INT_MIN, INT_MAX) with type int
There are different common ways to represent a half-open interval
A position f in a sequence can be denoted with an index, pointer, or iterator
- The only requirement is that f be incrementable to obtain the next element in the sequence
[f, l)
[f, f + n) _n
[f, predicate()) _until
[f, is_sentinel()) NTBS
- const char*
[f, ...) unbounded (dependent on something else)
- i.e. range is required to be same or greater length than another range
For a variable a in C and C++, it is guaranteed that &a + 1 yields a valid, but not dereferenceable, pointer
- [&a, &a + 1) is a valid range

Common algorithms and their uses¶

A great resource for finding standard algorithms:
- https://en.cppreference.com/w/cpp/algorithm

Generic Algorithms¶

find(f, l, a) returns the position of the first element in the range [f, l) that is equal to a.

{ int a[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; display(*find(begin(a), end(a), 5)); }

Question: How do we know find() will find a value?

Iterator must meet the requirements of LegacyInputIterator
[f, l) must form a valid range
the value type must be EqualityComparable to the iterator value_type

Requirements and Guarantees¶

A generic algorithm is specified in terms of a set of requirements on its arguments. The requirements are a set of concepts and preconditions which, if satisfied, guarantee the algorithm performs as specified
The C++ standard contains tables of named requirements and concepts
Concepts and Preconditions are closely related and both ideas are rooted in Hoare Logic
A given type or operation guarantees it satisfies some requirements
Guarantees are provided by modeling concepts, and ensuring postconditions and class invariants
The description of each type in the standard will specify it is a (model of) a concept as well as specifying the semantics (postconditions and class invariants) of operations and object properties
By matching requirements with guarantees we create software which works

Concept¶

The term concept first appeared in the paper Fundamentals of Generic Programming:

We call the set of axioms satisfied by a data type and a set of operations on it a concept.

— Fundamentals of Generic Programming, Dehnert & Stepanov

In C++20, a language concept is a set of syntactic requirements with a set of specified, in documentation, semantic and complexity requirements.

As with spoken language, we associate meaning with words
Even this is controversial

Names should not be associated with semantics because everybody has their own hidden assumptions about what semantics are, and they clash, causing comprehension problems without knowing why. This is why it's valuable to write code to reflect what code is actually doing, rather than what code "means": it's hard to have conceptual clashes about what code actually does.

— Craig Silverstein, personal correspondence

LegacyInputIterator and EqualityComparable are concepts
Concept semantics usually are specified in terms of existential quantifiers ($\forall$, exists\existsexists)
However, they also imply a precondition that the argument is within the domain of the operation (more discussion to follow).

Model¶

We say that a concept is modeled by specific types, or that the type models the concept, if the requirements are satisfied for these types.

By stating that a type models a concept, a type is providing a guarantee that it may be used where a concept is required.

Contracts¶

Contracts, or Design by Contract, is a systematic approach to ensuring the values passed to, and returned by an operation satisfy specific assertions.

If the execution of a certain task relies on a routine call to handle one of its subtasks, it is necessary to specify the relationship between the client (the call- er) and the supplier (the called routine) as precisely as possible. The mechanisms for expressing such conditions are called assertions. Some assertions, called preconditions and postconditions, apply to individual routines. Others, the class invariants, constrain all the routines of a given class...

— Applying "Design by Contract", Bertrand Meyer

Precondition¶

A precondition is a condition that must be satisfied by the argument values to a function, or by the state of the system (a global precondition) for the operation to execute correctly.
Correct execution can include returning an error.
[f, l) must form a valid range is a precondition
Not all preconditions can be asserted in code
- i.e. f(int* p) with the precondition that p is dereferenceable
We must still validate, prove, our code by hand
- Most concepts can not be asserted in code (because existential quantifiers) and many preconditions cannot be asserted in code.
An aside:
- A secure interface is an interface where all preconditions can be, and are, validated
  * which can also be phrased as an interface without preconditions
- A secure system requires the interface is secure and all the code in the system is correct

Postcondition¶

A postcondition is a guarantee about the result of an operation
result in this context is used broadly to include modified arguments and side-effects
For example,

The result of find(f, l, v) is an iterator within the range [f, l).

Is a postcondition of find().

Class Invariant¶

A class invariant is a postcondition that applies to all operations, including construction, on a class

Concepts, Partial Functions, and Domain¶

Compare the description for the old SGI STL LessThanComparable concept:

Expression semantics

Name	Expression	Precondition	Semantics	Postcondition
Less	x < y	x and y are in the domain of <

versus the C++17 concept.

Table 28: CppLessThanComparable requirements

Expression	Return type	Requirement
a < b	convertible to bool	< is a strict weak ordering relation

In the SGI STL the requirement of strict-weak-ordering was a separate concept.

Domain is defined in the C++ standard, but in the context of iterators. This passage used to refer to the domain of operations, but that has been narrowed to the domain of ==:

The term the domain of == is used in the ordinary mathematical sense to denote the set of values over which == is (required to be) defined. This set can change over time. Each algorithm places additional requirements on the domain of == for the iterator values it uses. These requirements can be inferred from the uses that algorithm makes of == and !=.

Expression	Return type	Operational Semantics	Assertion/notepre-/post-condition
a != b	contextually convertible to bool	!(a == b)	Preconditions: (a, b) is in the domain of ==

What was part of the definition of concepts in general has been weakened to a requirement for a single operation on iterators.

I've proposed to fix this in P2345:

The term domain of the operation is used in the ordinary mathematical sense to denote the set of values over which an operation is (required to be) defined. This set can change over time. Each component may place additional requirements on the domain of an operation. These requirements can be inferred from the uses that a component makes of the operation and is generally constrained to those values accessible through the operation’s arguments.

A partial function is a function whose domain is a subset of the values for the type of the operands

Because machines are physical and finite, many operations are partial
- addition on signed integers for example
Learn to manage partial functions with care to ensure values are within the functions domain

Axes of Freedom¶

Guarantees can be strengthened without breaking existing code
i.e. std::basic_string() only guarantees the value of a moved-from string is valid
- in the future it could guarantee the value is empty without breaking any code
Requirements can be weakened without breaking existing code
i.e. std::merge() currently requires a strict-weak ordering
- in the future that could be relaxed to a partial-ordering without breaking any code
Weakening a guarantee or strengthening a requirement is a breaking change
- best handled by introducing a new name for the type or operation

OutputIterators and sink functions¶

OutputIterators are isomorphic with a sink function object
- Function objects are simpler to write with lambda expressions
std::fill(), std::iota() and std::generate() would be better expressed with output iterators and sink function forms:

namespace bcc {

template <class T, class F> constexpr void iota(T first, T last, F out) { for (; first != last; ++first) { out(first); } }

} // namespace bcc

{ vector v; bcc::iota(0, 10, [&](int n) { v.push_back(n); }); display(v); }

Every algorithm using an output iterator should have a corresponding form with a sink function

Algorithm Semantics¶

The following code attempts to remove the first odd element from a sequence

{ vector a{0, 1, 2, 3, 4, 5};

erase_if(a, [n = 0](int x) mutable {
    return (x & 1) && !n++;
});
display(a);

}

Question: What went wrong?

template <class F, class P> auto remove_if(F f, F l, P pred) { f = find_if(f, l, pred); // <-- pred is passed by value

if (f == l) return f;

for (auto p = next(f); p != l; ++p) {
    if (!pred(*p)) *f++ = move(*p);
}
return f;

}

{ vector a{0, 1, 2, 3, 4, 5};

int n = 0;
auto first_is_odd = [&n](int x) {
    return (x & 1) && !n++;
};

erase_if(a, first_is_odd);
display(a);

}

Question: Does the above code fix the issue?

The requirement is that pred is a regular, pure, function although the standard wording is obtuse:

Given a glvalue u of type (possibly const) T that designates the same object as *first, pred(u) shall be a valid expression that is equal to pred(*first).

There is no filter operation in the standard which guarantees the predicate is evaluated exactly N-times, in order. If you need it, write it.

Positional Permutations¶

rotate
reverse
swap, swap_ranges
shuffle
next_permutation
Positional permutation reorder object based on their position, not their value
The algorithms in the standard library are basic building blocks for higher level algorithms
As we saw with remove_if(), many algorithms in the standard library are implemented in terms of other algorithms in the library
When confronted with a problem that is not a direct match to a standard algorithm, try and decompose the problem into one or more standard algorithms
All permutations algorithms can be decomposed into a series of cycles
Each cycle of length n requires n+1 move operations
swap_ranges() and reverse() are the worst case, where every cycle is length 2 and requires 3 moves
- to reverse a sequence of n elements requires (n / 2) * 3 moves

Reverse Cycles

A two-element cycle is implemented with swap()

Exercise: Write an algorithm, swirl(), that implements the following single cycle permutation.

Swirl Cycles
Swirl

Swirl (Even)

namespace v0 {

template void swirl(I f, I l) { auto m = next(f, distance(f, l) / 2); if (f == m) return; m = rotate(f, m, l); reverse(m, l); ++f; reverse(f, m); }

} // namespace v0

{ using namespace v0;

int a[]{0, 1, 2, 3, 4, 5, 6, 7, 8};
swirl(begin(a), end(a));

display(a);

}

Question: What are the requirements for I, f, and l?

namespace v1 {

template void swirl(I f, I l) { reverse(f, l); auto m = next(f, distance(f, l) / 2); if (m == l) return; rotate(f, m, next(m)); }

} // namespace v1

{ using namespace v1;

int a[]{0, 1, 2, 3, 4, 5, 6, 7, 8};
swirl(begin(a), end(a));

display(a);

}

As a general rule, an algorithm should not throw away information it calculated that the caller can't easily (small constant time) reconstruct.
swirl computes the mid-point, so it is a good thing to return
(Unfortunately, std::reverse() throws it away!)

namespace v2 {

template I swirl(I f, I l) { reverse(f, l); auto m = next(f, distance(f, l) / 2); if (m == l) return m; rotate(f, m, next(m)); return m; }

} // namespace v1

Homework: Implement swirl with the minimum number of moves

Predicate Permutations¶

partition & stable_partition

Predicate permutations use a predicate function to determine ordering. These algorithms partition the set into two sets. A stable partition algorithm preserves the relative order of the elements in each set.

Comparison Permutations¶

sort, stable_sort, & partial_sort
nth_element
make_heap

Comparison permutations use a comparison function to map an ordering of the values to an ordering of their position

Comparison function is required to be a strict-weak ordering:

An ordering relation, prec\precprec, is a strict-weak ordering iff

\begin{align} (a & \nprec a). && \text{(Irreflexivity)} \\ (a & \prec b) \wedge (b \prec c) \implies a \prec c. && \text {(Transitivity)} \\ (a & \equiv b) \wedge (b \equiv c) \implies a \equiv c. && \text {(Equivalence Transitivity)}\\ \end{align}

Where aaa and bbb, are equivalent, equiv\equivequiv, iff (anprecb)wedge(bnpreca)(a \nprec b) \wedge (b \nprec a)(anprecb)wedge(bnpreca).

The default is to use operator<()
On a type, the expectation is that operator<() is a total ordering
- Which is consistent with other operations on the type

A total ordering is a strict-weak ordering where the defined equivalence is equality

operator<() is not defined on std::complex<> because there is no definition consistent with multiplication
- For example, both i>0i > 0i>0 and i<0i < 0i<0 imply that 0<i20 < i^20<i2, however, i2=−1i^2 = -1i2=−1
Despite nan values, both float and double are totally-ordered because nan is explicitly outside the value domain for floating point types
- nan is not a number
A floating point object containing nan is partially formed
Don't try and sort a sequence containing nan values with operator<()
- The result is UB and will likely crash

C++20 defines the ordering on floating-point types as a std::partial_ordering because of nan values. I find this complicates matters unnecessarily, and means the requirements of the comparison operator cannot be defined with a concept but require a precondition. See prior discussion of domain of an operation.

The sort and nth element operations establish a direct relationship between the order of elements and their position in the sequence.
make_heap orders the elements as a max heap
make_heap, and the related heap operation are included in the standard library because they are used by sort
sort in the standard library is an introsort, which is an introspective quicksort that switches to heap sort when the recursion depths exceeds a level kcdotlog(n)k \cdot log(n)kcdotlog(n)
nth_element sorts a sequence enough so that the specified element is in the correct location
A postcondition of nth_element is that the sequence is partitions such that every element before the nth element is less than or equal to it, and every element after is greater than or equal.
With nth_element and partial_sort you can sort a subrange of a sequence as if the entire range where sorted
This has the same time complexity as a full sort, ncdotlog(n)n \cdot log(n)ncdotlog(n), but can be considerably faster
This is very useful when you want to sort a "window" view into a large sequence

template // I models RandomAccessIterator void sort_subrange_0(I f, I l, I sf, I sl) { std::sort(f, l); }

template // I models RandomAccessIterator void sort_subrange(I f, I l, I sf, I sl) { if (f != sf && sf != l) { std::nth_element(f, sf, l); // partition [f, l) at sf ++sf; } std::partial_sort(sf, sl, l); }

Operations on sorted sequences¶

lower_bound, upper_bound, equal_range, binary_search
merge
includes, set_difference, set_intersection, set_symmetric_difference, set_union
lower_bound should is the most commonly used way to do a binary search for an element in a sequence

{ int a[]{ 0, 1, 2, 3, 3, 3, 4, 4, 5, 6 }; auto p = lower_bound(begin(a), end(a), 3); display(a); cout << string(distance(begin(a), p) * 3 + 1, ' ') << "^\n"; }

Projections¶

The range algorithms in C++20 contain projection arguments
Projections are very useful with the comparison permutations and operations on sorted sequences to provide symmetry

{ struct employee { string _first; string _last; };

employee a[]{{"Joe", "Zimmer"}, {"Frank", "Berse"}};

sort(begin(a), end(a), [](const employee& a, const employee& b) {
    return a._last < b._last;
});
auto p = lower_bound(begin(a), end(a), "Zimmer"s, [](const employee& a, const string& b) {
    return a._last < b;
});

display(p->_first);

}

{ struct employee { string _first; string _last; };

employee a[]{{"Joe", "Zimmer"}, {"Frank", "Berse"}};

ranges::sort(a, less<>(), &employee::_last);
auto p = ranges::lower_bound(a, "Zimmer"s, less<>(), &employee::_last);

display(p->_first);

}

"Joe"

Iterator hierarchy (and why you probably shouldn't care)¶

Composition vs. multi-pass¶

Generators vs input iterator¶

Heap operations¶

make_heap, push_heap, pop_heap, sort_heap
priority_queue

Exercise: Create a priority_queue of my_type.

Can you move out the top element?

Exercise: Create a new priority_queue adapter that supports move using the heap operations.