ref: c4e51c5678d7f359bf0d33dba8f0bd422d979725
dir: /sys/src/cmd/python/Doc/lib/libsets.tex/
\section{\module{sets} --- Unordered collections of unique elements} \declaremodule{standard}{sets} \modulesynopsis{Implementation of sets of unique elements.} \moduleauthor{Greg V. Wilson}{gvwilson@nevex.com} \moduleauthor{Alex Martelli}{aleax@aleax.it} \moduleauthor{Guido van Rossum}{guido@python.org} \sectionauthor{Raymond D. Hettinger}{python@rcn.com} \versionadded{2.3} The \module{sets} module provides classes for constructing and manipulating unordered collections of unique elements. Common uses include membership testing, removing duplicates from a sequence, and computing standard math operations on sets such as intersection, union, difference, and symmetric difference. Like other collections, sets support \code{\var{x} in \var{set}}, \code{len(\var{set})}, and \code{for \var{x} in \var{set}}. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior. Most set applications use the \class{Set} class which provides every set method except for \method{__hash__()}. For advanced applications requiring a hash method, the \class{ImmutableSet} class adds a \method{__hash__()} method but omits methods which alter the contents of the set. Both \class{Set} and \class{ImmutableSet} derive from \class{BaseSet}, an abstract class useful for determining whether something is a set: \code{isinstance(\var{obj}, BaseSet)}. The set classes are implemented using dictionaries. Accordingly, the requirements for set elements are the same as those for dictionary keys; namely, that the element defines both \method{__eq__} and \method{__hash__}. As a result, sets cannot contain mutable elements such as lists or dictionaries. However, they can contain immutable collections such as tuples or instances of \class{ImmutableSet}. For convenience in implementing sets of sets, inner sets are automatically converted to immutable form, for example, \code{Set([Set(['dog'])])} is transformed to \code{Set([ImmutableSet(['dog'])])}. \begin{classdesc}{Set}{\optional{iterable}} Constructs a new empty \class{Set} object. If the optional \var{iterable} parameter is supplied, updates the set with elements obtained from iteration. All of the elements in \var{iterable} should be immutable or be transformable to an immutable using the protocol described in section~\ref{immutable-transforms}. \end{classdesc} \begin{classdesc}{ImmutableSet}{\optional{iterable}} Constructs a new empty \class{ImmutableSet} object. If the optional \var{iterable} parameter is supplied, updates the set with elements obtained from iteration. All of the elements in \var{iterable} should be immutable or be transformable to an immutable using the protocol described in section~\ref{immutable-transforms}. Because \class{ImmutableSet} objects provide a \method{__hash__()} method, they can be used as set elements or as dictionary keys. \class{ImmutableSet} objects do not have methods for adding or removing elements, so all of the elements must be known when the constructor is called. \end{classdesc} \subsection{Set Objects \label{set-objects}} Instances of \class{Set} and \class{ImmutableSet} both provide the following operations: \begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result} \lineiii{len(\var{s})}{}{cardinality of set \var{s}} \hline \lineiii{\var{x} in \var{s}}{} {test \var{x} for membership in \var{s}} \lineiii{\var{x} not in \var{s}}{} {test \var{x} for non-membership in \var{s}} \lineiii{\var{s}.issubset(\var{t})}{\code{\var{s} <= \var{t}}} {test whether every element in \var{s} is in \var{t}} \lineiii{\var{s}.issuperset(\var{t})}{\code{\var{s} >= \var{t}}} {test whether every element in \var{t} is in \var{s}} \hline \lineiii{\var{s}.union(\var{t})}{\var{s} \textbar{} \var{t}} {new set with elements from both \var{s} and \var{t}} \lineiii{\var{s}.intersection(\var{t})}{\var{s} \&\ \var{t}} {new set with elements common to \var{s} and \var{t}} \lineiii{\var{s}.difference(\var{t})}{\var{s} - \var{t}} {new set with elements in \var{s} but not in \var{t}} \lineiii{\var{s}.symmetric_difference(\var{t})}{\var{s} \^\ \var{t}} {new set with elements in either \var{s} or \var{t} but not both} \lineiii{\var{s}.copy()}{} {new set with a shallow copy of \var{s}} \end{tableiii} Note, the non-operator versions of \method{union()}, \method{intersection()}, \method{difference()}, and \method{symmetric_difference()} will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like \code{Set('abc') \&\ 'cbs'} in favor of the more readable \code{Set('abc').intersection('cbs')}. \versionchanged[Formerly all arguments were required to be sets]{2.3.1} In addition, both \class{Set} and \class{ImmutableSet} support set to set comparisons. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). A set is less than another set if and only if the first set is a proper subset of the second set (is a subset, but is not equal). A set is greater than another set if and only if the first set is a proper superset of the second set (is a superset, but is not equal). The subset and equality comparisons do not generalize to a complete ordering function. For example, any two disjoint sets are not equal and are not subsets of each other, so \emph{all} of the following return \code{False}: \code{\var{a}<\var{b}}, \code{\var{a}==\var{b}}, or \code{\var{a}>\var{b}}. Accordingly, sets do not implement the \method{__cmp__} method. Since sets only define partial ordering (subset relationships), the output of the \method{list.sort()} method is undefined for lists of sets. The following table lists operations available in \class{ImmutableSet} but not found in \class{Set}: \begin{tableii}{c|l}{code}{Operation}{Result} \lineii{hash(\var{s})}{returns a hash value for \var{s}} \end{tableii} The following table lists operations available in \class{Set} but not found in \class{ImmutableSet}: \begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result} \lineiii{\var{s}.update(\var{t})} {\var{s} \textbar= \var{t}} {return set \var{s} with elements added from \var{t}} \lineiii{\var{s}.intersection_update(\var{t})} {\var{s} \&= \var{t}} {return set \var{s} keeping only elements also found in \var{t}} \lineiii{\var{s}.difference_update(\var{t})} {\var{s} -= \var{t}} {return set \var{s} after removing elements found in \var{t}} \lineiii{\var{s}.symmetric_difference_update(\var{t})} {\var{s} \textasciicircum= \var{t}} {return set \var{s} with elements from \var{s} or \var{t} but not both} \hline \lineiii{\var{s}.add(\var{x})}{} {add element \var{x} to set \var{s}} \lineiii{\var{s}.remove(\var{x})}{} {remove \var{x} from set \var{s}; raises \exception{KeyError} if not present} \lineiii{\var{s}.discard(\var{x})}{} {removes \var{x} from set \var{s} if present} \lineiii{\var{s}.pop()}{} {remove and return an arbitrary element from \var{s}; raises \exception{KeyError} if empty} \lineiii{\var{s}.clear()}{} {remove all elements from set \var{s}} \end{tableiii} Note, the non-operator versions of \method{update()}, \method{intersection_update()}, \method{difference_update()}, and \method{symmetric_difference_update()} will accept any iterable as an argument. \versionchanged[Formerly all arguments were required to be sets]{2.3.1} Also note, the module also includes a \method{union_update()} method which is an alias for \method{update()}. The method is included for backwards compatibility. Programmers should prefer the \method{update()} method because it is supported by the builtin \class{set()} and \class{frozenset()} types. \subsection{Example \label{set-example}} \begin{verbatim} >>> from sets import Set >>> engineers = Set(['John', 'Jane', 'Jack', 'Janice']) >>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice']) >>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack']) >>> employees = engineers | programmers | managers # union >>> engineering_management = engineers & managers # intersection >>> fulltime_management = managers - engineers - programmers # difference >>> engineers.add('Marvin') # add element >>> print engineers Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack']) >>> employees.issuperset(engineers) # superset test False >>> employees.union_update(engineers) # update from another set >>> employees.issuperset(engineers) True >>> for group in [engineers, programmers, managers, employees]: ... group.discard('Susan') # unconditionally remove element ... print group ... Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack']) Set(['Janice', 'Jack', 'Sam']) Set(['Jane', 'Zack', 'Jack']) Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack']) \end{verbatim} \subsection{Protocol for automatic conversion to immutable \label{immutable-transforms}} Sets can only contain immutable elements. For convenience, mutable \class{Set} objects are automatically copied to an \class{ImmutableSet} before being added as a set element. The mechanism is to always add a hashable element, or if it is not hashable, the element is checked to see if it has an \method{__as_immutable__()} method which returns an immutable equivalent. Since \class{Set} objects have a \method{__as_immutable__()} method returning an instance of \class{ImmutableSet}, it is possible to construct sets of sets. A similar mechanism is needed by the \method{__contains__()} and \method{remove()} methods which need to hash an element to check for membership in a set. Those methods check an element for hashability and, if not, check for a \method{__as_temporarily_immutable__()} method which returns the element wrapped by a class that provides temporary methods for \method{__hash__()}, \method{__eq__()}, and \method{__ne__()}. The alternate mechanism spares the need to build a separate copy of the original mutable object. \class{Set} objects implement the \method{__as_temporarily_immutable__()} method which returns the \class{Set} object wrapped by a new class \class{_TemporarilyImmutableSet}. The two mechanisms for adding hashability are normally invisible to the user; however, a conflict can arise in a multi-threaded environment where one thread is updating a set while another has temporarily wrapped it in \class{_TemporarilyImmutableSet}. In other words, sets of mutable sets are not thread-safe. \subsection{Comparison to the built-in \class{set} types \label{comparison-to-builtin-set}} The built-in \class{set} and \class{frozenset} types were designed based on lessons learned from the \module{sets} module. The key differences are: \begin{itemize} \item \class{Set} and \class{ImmutableSet} were renamed to \class{set} and \class{frozenset}. \item There is no equivalent to \class{BaseSet}. Instead, use \code{isinstance(x, (set, frozenset))}. \item The hash algorithm for the built-ins performs significantly better (fewer collisions) for most datasets. \item The built-in versions have more space efficient pickles. \item The built-in versions do not have a \method{union_update()} method. Instead, use the \method{update()} method which is equivalent. \item The built-in versions do not have a \method{_repr(sorted=True)} method. Instead, use the built-in \function{repr()} and \function{sorted()} functions: \code{repr(sorted(s))}. \item The built-in version does not have a protocol for automatic conversion to immutable. Many found this feature to be confusing and no one in the community reported having found real uses for it. \end{itemize}