.. _text-bytes:
.. _bytes_mode:

Bytes/text management
=====================

The LDAP protocol states that some fields (distinguished names, relative
distinguished names, attribute names, queries) be encoded in UTF-8.
In python-ldap, these are represented as text (``str`` on Python 3).

Attribute *values*, on the other hand, **MAY**
contain any type of data, including text.
To know what type of data is represented, python-ldap would need access to the
schema, which is not always available (nor always correct).
Thus, attribute values are *always* treated as ``bytes``.
Encoding/decoding to other formats – text, images, etc. – is left to the caller.


Historical note
---------------

Python 3 introduced a hard distinction between *text* (``str``) – sequences of
characters (formally, *Unicode codepoints*) – and ``bytes`` – sequences of
8-bit values used to encode *any* kind of data for storage or transmission.

Python 2 had the same distinction between ``str`` (bytes) and
``unicode`` (text).
However, values could be implicitly converted between these types as needed,
e.g. when comparing or writing to disk or the network.
The implicit encoding and decoding can be a source of subtle bugs when not
designed and tested adequately.

In python-ldap 2.x (for Python 2), bytes were used for all fields,
including those guaranteed to be text.

From version 3.0 to 3.3, python-ldap uses text where appropriate.
On Python 2, special ``bytes_mode`` and ``bytes_strictness`` settings
influenced how text was handled.

From version 3.3 on, only Python 3 is supported. The “bytes mode” settings
are deprecated and do nothing.
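

Example
-------

The split between text fields and ``bytes`` attribute values described at the
top of this page can be sketched as follows. This is only an illustration, not
part of python-ldap: the helper names ``decode_text_values`` and
``encode_text_values``, the ``TEXT_ATTRS`` set, and the sample entry are
hypothetical. The sketch assumes the caller already knows which attributes
hold UTF-8 text, since python-ldap does not consult the schema to find out.

.. code-block:: python

    # Sample data in the shape of one search result entry: the DN and the
    # attribute names are text (str); every attribute value is bytes.
    dn, attrs = (
        "cn=Zoë Müller,ou=people,dc=example,dc=org",
        {
            "cn": [b"Zo\xc3\xab M\xc3\xbcller"],   # UTF-8 encoded text
            "jpegPhoto": [b"\xff\xd8\xff\xe0"],    # binary data, not text
        },
    )

    # Assumption: the caller knows these attributes contain UTF-8 text.
    TEXT_ATTRS = {"cn"}


    def decode_text_values(attrs, text_attrs=TEXT_ATTRS):
        """Decode values of known-text attributes; leave the rest as bytes."""
        return {
            name: [value.decode("utf-8") for value in values]
            if name in text_attrs
            else list(values)
            for name, values in attrs.items()
        }


    def encode_text_values(attrs):
        """Encode any str values as UTF-8 bytes; pass bytes through unchanged."""
        return {
            name: [
                value.encode("utf-8") if isinstance(value, str) else value
                for value in values
            ]
            for name, values in attrs.items()
        }


    decoded = decode_text_values(attrs)
    encoded = encode_text_values({"cn": ["Zoë Müller"]})

Binary attributes such as ``jpegPhoto`` are deliberately left untouched by the
decoding helper; attempting to decode them as UTF-8 would typically raise
``UnicodeDecodeError``.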