Notebook slides (original) (raw)
Algorithms¶
Rubric: No raw loops
{ int r = a < b ? a : b; }
- What does this line of code do?
{ // r is the minimum of a and b int r = a < b ? a : b; }
A comment helps...
Naming an operation, even a simple operation, lowers cognitive overhead
/// Returns the minimum of two integers int min(int a, int b) { return a < b ? a : b; }
- A function allows us to associate a name with semantics
/// Returns the minimum of two integers int min(int a, int b);
- Functions allow us to build a vocabulary and focus on what the code does
Naming a Function (Guidelines)¶
Operations with the same semantics should have the same name
- Follow existing conventions
Name should describe the postconditions and make the use clear
For properties and constant(ish) complexity operations use:
- nouns:
capacity - adjectives:
empty(ambiguous but used by convention) - copular constructions:
is_blue
- nouns:
For higher complexity operations use:
- verbs:
partition
*a = sort(b)nota = sorted(b)
- verbs:
For setting properties:
- Prefix with the verb,
set_, i.e.set_capacity
- Prefix with the verb,
Clarity is of highest priority. Don't construct unnatural verb phrases
intersection(a, b)notcalculate_intersection(a, b)namenotget_name
Specifying a Function¶
- at a minimum, the specification should be a sentence fragment defining the postconditions
- State any preconditions that are not clear from the types and conventions
/// Returns the truncated square root of `x`. `x` must be >= 0.
int sqrti(int x);
Argument Types¶
- let: by-const-lvalue-reference or by-value for known small types and invocable types
- inout: by-lvalue-reference, prefer sink arguments and result to in-out arguments
- sink: by-rvalue-reference or by-value for known small types and to avoid forwarding references
spans, views, iterator pairs, and so on are a way to pass a range of objects as if they were a simple argument. The value_type of the range determines if it is in (const), or inout (not const) with input ranges (input iterators) used for sink.
Implicit Preconditions¶
Object Lifetimes¶
- Caller must ensure referenced arguments are valid for duration of the call
Validity¶
- An invalid object cannot be used as an argument
Law of Exclusivity¶
{ vector a{0, 0, 1, 0, 1 }; erase(a, a[0]); }
- What is in
aafter this code?
{ vector a{0, 0, 1, 0, 1 }; erase(a, a[0]); display(a); }
This fails because a[0] is passed by a reference aliasing the vector, a. The standard is inconsistent in how it deals with aliasing with mutations. Unless aliasing is explicitly allowed, avoid it.
{ vector a{0, 0, 1, 0, 1 }; erase(a, copy(a[0])); display(a); }
The Law of Exclusivity is borrowed from Swift, and the term was coined by John McCall. To modify a variable, exclusive access to that variable is required. C++ does not enforce this rule; it must be manually enforced.
The Law of Exclusivity has far greater implications outside the scope of this talk. You might be interested in learning about the Val language.
Ranges¶
- Iterator pairs denote a valid half-open interval
/// Transmogrify the elements in the range [f, l). template void transmogrify(ForwardIterator f, ForwardIterator l);
Implicit Postconditions¶
Validity of References¶
Any reference (pointer, iterator, etc.) to an object is assumed to be invalidated if the object is mutated.
A returned reference must be to one of the arguments to the function (or a part of one of the arguments) and is valid until the argument is modified or its lifetime ends.
Loops and Recursion¶
{ /// removes values equal to a in the range [f, l) /// returns the position, b, such that [f, b) contains all the values in [f, l) not equal to a /// values in the range [b, l) are unspecified auto remove = [](auto f, auto l, const auto& a) { auto b = find(f, l, a); if (b == l) return b; auto p = next(b); // invariant: all values in the range [f, b) do not equal x // decreasing: distance(p, l) while (p != l) { if (*p != a) { *b = std::move(*p); ++b; } ++p; } return b; };
vector a{ 0, 0, 1, 0, 1 };
a.erase(remove(begin(a), end(a), 0), end(a));
display(a);}
Sequences¶
For a sequence of n elements, there are n + 1 positions
How to represent a range of elements?
Problem with closed interval
[f, l]?Problem with open interval
(f, l)?Half-open intervals have significant advantages
[f, l)- By strong convention we are open on the right
[p, p)represents an empty range, at positionp- All empty ranges are not equal
Think of the positions as the lines between the elements
Sequence With Pointers To Objects
Sequence With Pointers Between Objects
Limitations of half-open intervals
- The last element cannot be enclosed if the position type is the same as the element type
- i.e. You can't express the range
[INT_MIN, INT_MAX)with type int
There are different common ways to represent a half-open interval
A position
fin a sequence can be denoted with an index, pointer, or iterator- The only requirement is that
fbe incrementable to obtain the next element in the sequence
- The only requirement is that
[f, l)[f, f + n) _n[f, predicate()) _until[f, is_sentinel())NTBSconst char*
[f, ...)unbounded (dependent on something else)- i.e. range is required to be same or greater length than another range
For a variable
ain C and C++, it is guaranteed that&a + 1yields a valid, but not dereferenceable, pointer[&a, &a + 1)is a valid range
Common algorithms and their uses¶
- A great resource for finding standard algorithms:
Generic Algorithms¶
find(f, l, a)returns the position of the first element in the range[f, l)that is equal toa.
{ int a[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; display(*find(begin(a), end(a), 5)); }
Question: How do we know find() will find a value?
- Iterator must meet the requirements of LegacyInputIterator
[f, l)must form a valid range- the value type must be EqualityComparable to the iterator
value_type
Requirements and Guarantees¶
A generic algorithm is specified in terms of a set of requirements on its arguments. The requirements are a set of concepts and preconditions which, if satisfied, guarantee the algorithm performs as specified
The C++ standard contains tables of named requirements and concepts
Concepts and Preconditions are closely related and both ideas are rooted in Hoare Logic
A given type or operation guarantees it satisfies some requirements
Guarantees are provided by modeling concepts, and ensuring postconditions and class invariants
The description of each type in the standard will specify it is a (model of) a concept as well as specifying the semantics (postconditions and class invariants) of operations and object properties
By matching requirements with guarantees we create software which works
Concept¶
The term concept first appeared in the paper Fundamentals of Generic Programming:
We call the set of axioms satisfied by a data type and a set of operations on it a concept.
— Fundamentals of Generic Programming, Dehnert & Stepanov
In C++20, a language concept is a set of syntactic requirements with a set of specified, in documentation, semantic and complexity requirements.
- As with spoken language, we associate meaning with words
- Even this is controversial
Names should not be associated with semantics because everybody has their own hidden assumptions about what semantics are, and they clash, causing comprehension problems without knowing why. This is why it's valuable to write code to reflect what code is actually doing, rather than what code "means": it's hard to have conceptual clashes about what code actually does.
— Craig Silverstein, personal correspondence
LegacyInputIterator and EqualityComparable are concepts
Concept semantics usually are specified in terms of existential quantifiers ($\forall$, exists\existsexists)
However, they also imply a precondition that the argument is within the domain of the operation (more discussion to follow).
Model¶
We say that a concept is modeled by specific types, or that the type models the concept, if the requirements are satisfied for these types.
- By stating that a type models a concept, a type is providing a guarantee that it may be used where a concept is required.
Contracts¶
Contracts, or Design by Contract, is a systematic approach to ensuring the values passed to, and returned by an operation satisfy specific assertions.
If the execution of a certain task relies on a routine call to handle one of its subtasks, it is necessary to specify the relationship between the client (the call- er) and the supplier (the called routine) as precisely as possible. The mechanisms for expressing such conditions are called assertions. Some assertions, called preconditions and postconditions, apply to individual routines. Others, the class invariants, constrain all the routines of a given class...
— Applying "Design by Contract", Bertrand Meyer
Precondition¶
A precondition is a condition that must be satisfied by the argument values to a function, or by the state of the system (a global precondition) for the operation to execute correctly.
Correct execution can include returning an error.
[f, l)must form a valid range is a preconditionNot all preconditions can be asserted in code
- i.e.
f(int* p)with the precondition thatpis dereferenceable
- i.e.
We must still validate, prove, our code by hand
- Most concepts can not be asserted in code (because existential quantifiers) and many preconditions cannot be asserted in code.
An aside:
- A secure interface is an interface where all preconditions can be, and are, validated
* which can also be phrased as an interface without preconditions - A secure system requires the interface is secure and all the code in the system is correct
- A secure interface is an interface where all preconditions can be, and are, validated
Postcondition¶
A postcondition is a guarantee about the result of an operation
result in this context is used broadly to include modified arguments and side-effects
For example,
The result of
find(f, l, v)is an iterator within the range[f, l).
Is a postcondition of find().
Class Invariant¶
- A class invariant is a postcondition that applies to all operations, including construction, on a class
Concepts, Partial Functions, and Domain¶
Compare the description for the old SGI STL LessThanComparable concept:
Expression semantics
| Name | Expression | Precondition | Semantics | Postcondition |
|---|---|---|---|---|
| Less | x < y | x and y are in the domain of < |
versus the C++17 concept.
Table 28: CppLessThanComparable requirements
| Expression | Return type | Requirement |
|---|---|---|
| a < b | convertible to bool | < is a strict weak ordering relation |
In the SGI STL the requirement of strict-weak-ordering was a separate concept.
Domain is defined in the C++ standard, but in the context of iterators. This passage used to refer to the domain of operations, but that has been narrowed to the domain of ==:
The term the domain of == is used in the ordinary mathematical sense to denote the set of values over which == is (required to be) defined. This set can change over time. Each algorithm places additional requirements on the domain of == for the iterator values it uses. These requirements can be inferred from the uses that algorithm makes of == and !=.
| Expression | Return type | Operational Semantics | Assertion/notepre-/post-condition |
|---|---|---|---|
| a != b | contextually convertible to bool | !(a == b) | Preconditions: (a, b) is in the domain of == |
What was part of the definition of concepts in general has been weakened to a requirement for a single operation on iterators.
- I've proposed to fix this in P2345:
The term domain of the operation is used in the ordinary mathematical sense to denote the set of values over which an operation is (required to be) defined. This set can change over time. Each component may place additional requirements on the domain of an operation. These requirements can be inferred from the uses that a component makes of the operation and is generally constrained to those values accessible through the operation’s arguments.
A partial function is a function whose domain is a subset of the values for the type of the operands
Because machines are physical and finite, many operations are partial
- addition on signed integers for example
Learn to manage partial functions with care to ensure values are within the functions domain
Axes of Freedom¶
Guarantees can be strengthened without breaking existing code
i.e.
std::basic_string()only guarantees the value of a moved-from string is valid- in the future it could guarantee the value is empty without breaking any code
Requirements can be weakened without breaking existing code
i.e.
std::merge()currently requires a strict-weak ordering- in the future that could be relaxed to a partial-ordering without breaking any code
Weakening a guarantee or strengthening a requirement is a breaking change
- best handled by introducing a new name for the type or operation
OutputIterators and sink functions¶
OutputIterators are isomorphic with a sink function object
- Function objects are simpler to write with lambda expressions
std::fill(),std::iota()andstd::generate()would be better expressed with output iterators and sink function forms:
namespace bcc {
template <class T, class F> constexpr void iota(T first, T last, F out) { for (; first != last; ++first) { out(first); } }
} // namespace bcc
{ vector v; bcc::iota(0, 10, [&](int n) { v.push_back(n); }); display(v); }
- Every algorithm using an output iterator should have a corresponding form with a sink function
Algorithm Semantics¶
- The following code attempts to remove the first odd element from a sequence
{ vector a{0, 1, 2, 3, 4, 5};
erase_if(a, [n = 0](int x) mutable {
return (x & 1) && !n++;
});
display(a);}
Question: What went wrong?
template <class F, class P> auto remove_if(F f, F l, P pred) { f = find_if(f, l, pred); // <-- pred is passed by value
if (f == l) return f;
for (auto p = next(f); p != l; ++p) {
if (!pred(*p)) *f++ = move(*p);
}
return f;}
{ vector a{0, 1, 2, 3, 4, 5};
int n = 0;
auto first_is_odd = [&n](int x) {
return (x & 1) && !n++;
};
erase_if(a, first_is_odd);
display(a);}
Question: Does the above code fix the issue?
- The requirement is that
predis a regular, pure, function although the standard wording is obtuse:
Given a glvalue
uof type (possiblyconst)Tthat designates the same object as*first,pred(u)shall be a valid expression that is equal topred(*first).
- There is no filter operation in the standard which guarantees the predicate is evaluated exactly N-times, in order. If you need it, write it.
Positional Permutations¶
rotatereverseswap,swap_rangesshufflenext_permutationPositional permutation reorder object based on their position, not their value
The algorithms in the standard library are basic building blocks for higher level algorithms
As we saw with
remove_if(), many algorithms in the standard library are implemented in terms of other algorithms in the libraryWhen confronted with a problem that is not a direct match to a standard algorithm, try and decompose the problem into one or more standard algorithms
All permutations algorithms can be decomposed into a series of cycles
Each cycle of length n requires n+1 move operations
swap_ranges()andreverse()are the worst case, where every cycle is length 2 and requires 3 moves- to reverse a sequence of
nelements requires(n / 2) * 3moves
- to reverse a sequence of
Reverse Cycles
- A two-element cycle is implemented with
swap()
Exercise: Write an algorithm, swirl(), that implements the following single cycle permutation.
Swirl
Swirl (Even)
namespace v0 {
template void swirl(I f, I l) { auto m = next(f, distance(f, l) / 2); if (f == m) return; m = rotate(f, m, l); reverse(m, l); ++f; reverse(f, m); }
} // namespace v0
{ using namespace v0;
int a[]{0, 1, 2, 3, 4, 5, 6, 7, 8};
swirl(begin(a), end(a));
display(a);}
Question: What are the requirements for I, f, and l?
namespace v1 {
template void swirl(I f, I l) { reverse(f, l); auto m = next(f, distance(f, l) / 2); if (m == l) return; rotate(f, m, next(m)); }
} // namespace v1
{ using namespace v1;
int a[]{0, 1, 2, 3, 4, 5, 6, 7, 8};
swirl(begin(a), end(a));
display(a);}
As a general rule, an algorithm should not throw away information it calculated that the caller can't easily (small constant time) reconstruct.
swirlcomputes the mid-point, so it is a good thing to return(Unfortunately,
std::reverse()throws it away!)
namespace v2 {
template I swirl(I f, I l) { reverse(f, l); auto m = next(f, distance(f, l) / 2); if (m == l) return m; rotate(f, m, next(m)); return m; }
} // namespace v1
Homework: Implement swirl with the minimum number of moves
Predicate Permutations¶
partition&stable_partition
Predicate permutations use a predicate function to determine ordering. These algorithms partition the set into two sets. A stable partition algorithm preserves the relative order of the elements in each set.
Comparison Permutations¶
sort,stable_sort, &partial_sortnth_elementmake_heap
Comparison permutations use a comparison function to map an ordering of the values to an ordering of their position
- Comparison function is required to be a strict-weak ordering:
An ordering relation, prec\precprec, is a strict-weak ordering iff
\begin{align} (a & \nprec a). && \text{(Irreflexivity)} \\ (a & \prec b) \wedge (b \prec c) \implies a \prec c. && \text {(Transitivity)} \\ (a & \equiv b) \wedge (b \equiv c) \implies a \equiv c. && \text {(Equivalence Transitivity)}\\ \end{align}
Where aaa and bbb, are equivalent, equiv\equivequiv, iff (anprecb)wedge(bnpreca)(a \nprec b) \wedge (b \nprec a)(anprecb)wedge(bnpreca).
- The default is to use
operator<() - On a type, the expectation is that
operator<()is a total ordering- Which is consistent with other operations on the type
A total ordering is a strict-weak ordering where the defined equivalence is equality
operator<()is not defined onstd::complex<>because there is no definition consistent with multiplication- For example, both i>0i > 0i>0 and i<0i < 0i<0 imply that 0<i20 < i^20<i2, however, i2=−1i^2 = -1i2=−1
Despite
nanvalues, bothfloatanddoubleare totally-ordered becausenanis explicitly outside the value domain for floating point typesnanis not a number
A floating point object containing
nanis partially formedDon't try and sort a sequence containing
nanvalues withoperator<()- The result is UB and will likely crash
C++20 defines the ordering on floating-point types as a std::partial_ordering because of nan values. I find this complicates matters unnecessarily, and means the requirements of the comparison operator cannot be defined with a concept but require a precondition. See prior discussion of domain of an operation.
The
sortandnth elementoperations establish a direct relationship between the order of elements and their position in the sequence.make_heaporders the elements as a max heapmake_heap, and the related heap operation are included in the standard library because they are used bysortsortin the standard library is an introsort, which is an introspective quicksort that switches to heap sort when the recursion depths exceeds a level kcdotlog(n)k \cdot log(n)kcdotlog(n)nth_elementsorts a sequence enough so that the specified element is in the correct locationA postcondition of
nth_elementis that the sequence is partitions such that every element before the nth element is less than or equal to it, and every element after is greater than or equal.With
nth_elementandpartial_sortyou can sort a subrange of a sequence as if the entire range where sortedThis has the same time complexity as a full sort, ncdotlog(n)n \cdot log(n)ncdotlog(n), but can be considerably faster
This is very useful when you want to sort a "window" view into a large sequence
template // I models RandomAccessIterator void sort_subrange_0(I f, I l, I sf, I sl) { std::sort(f, l); }
template // I models RandomAccessIterator void sort_subrange(I f, I l, I sf, I sl) { if (f != sf && sf != l) { std::nth_element(f, sf, l); // partition [f, l) at sf ++sf; } std::partial_sort(sf, sl, l); }
Operations on sorted sequences¶
lower_bound,upper_bound,equal_range,binary_searchmergeincludes,set_difference,set_intersection,set_symmetric_difference,set_unionlower_boundshould is the most commonly used way to do a binary search for an element in a sequence
{ int a[]{ 0, 1, 2, 3, 3, 3, 4, 4, 5, 6 }; auto p = lower_bound(begin(a), end(a), 3); display(a); cout << string(distance(begin(a), p) * 3 + 1, ' ') << "^\n"; }
Projections¶
The range algorithms in C++20 contain projection arguments
Projections are very useful with the comparison permutations and operations on sorted sequences to provide symmetry
{ struct employee { string _first; string _last; };
employee a[]{{"Joe", "Zimmer"}, {"Frank", "Berse"}};
sort(begin(a), end(a), [](const employee& a, const employee& b) {
return a._last < b._last;
});
auto p = lower_bound(begin(a), end(a), "Zimmer"s, [](const employee& a, const string& b) {
return a._last < b;
});
display(p->_first);}
{ struct employee { string _first; string _last; };
employee a[]{{"Joe", "Zimmer"}, {"Frank", "Berse"}};
ranges::sort(a, less<>(), &employee::_last);
auto p = ranges::lower_bound(a, "Zimmer"s, less<>(), &employee::_last);
display(p->_first);}
"Joe"
Iterator hierarchy (and why you probably shouldn't care)¶
Composition vs. multi-pass¶
Generators vs input iterator¶
Heap operations¶
make_heap,push_heap,pop_heap,sort_heappriority_queue
Exercise: Create a priority_queue of my_type.
- Can you move out the top element?
Exercise: Create a new priority_queue adapter that supports move using the heap operations.