Python/C API Reference Manual

7.3.2 Unicode Objects 
These are the basic Unicode object types used for the Unicode
implementation in Python:
- Py_UNICODE
  This type represents the storage type that Python uses internally
  as the basis for holding Unicode ordinals.  Python's default
  builds use a 16-bit type for Py_UNICODE and store Unicode
  values internally as UCS2. It is also possible to build a UCS4
  version of Python (most recent Linux distributions come with UCS4
  builds of Python). These builds use a 32-bit type for
  Py_UNICODE and store Unicode data internally as UCS4. On
  platforms where wchar_t is available and compatible with the
  chosen Python Unicode build variant, Py_UNICODE is a typedef
  alias for wchar_t to enhance native platform compatibility.
  On all other platforms, Py_UNICODE is a typedef alias for
  either unsigned short (UCS2) or unsigned long (UCS4).
  Note that UCS2 and UCS4 Python builds are not binary compatible.
  Keep this in mind when writing extensions or interfaces.
- PyUnicodeObject
  This subtype of PyObject represents a Python Unicode object.
- PyTypeObject PyUnicode_Type
  This instance of PyTypeObject represents the Python Unicode
  type.  It is exposed to Python code as unicode and
  types.UnicodeType.
The following APIs are implemented as C macros. Use them for fast
checks and to access internal read-only data of Unicode objects:
int PyUnicode_Check(PyObject *o)
  Return true if the object o is a Unicode object or an
  instance of a Unicode subtype.
  Changed in version 2.2: Allowed subtypes to be accepted.
int PyUnicode_CheckExact(PyObject *o)
  Return true if the object o is a Unicode object, but not an
  instance of a subtype.
  New in version 2.2.
Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
  Return the size of the object, that is, the number of Py_UNICODE
  units it holds.  o has to be a PyUnicodeObject (not checked).
Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
  Return the size of the object's internal buffer in bytes.  o
  has to be a PyUnicodeObject (not checked).
Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
  Return a pointer to the internal Py_UNICODE buffer of the
  object.  o has to be a PyUnicodeObject (not checked).
const char* PyUnicode_AS_DATA(PyObject *o)
  Return a pointer to the internal buffer of the object.
  o has to be a PyUnicodeObject (not checked).
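As a rough illustration of how PyUnicode_GET_SIZE and
PyUnicode_GET_DATA_SIZE relate, the sketch below computes a buffer size in
bytes from a length in storage units. It assumes the data size is simply the
length multiplied by the width of one Py_UNICODE unit. The names ucs2_unit,
ucs4_unit and data_size_bytes are hypothetical stand-ins and are not part of
the Python/C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two Py_UNICODE widths described above. */
typedef uint16_t ucs2_unit;   /* 16-bit storage (UCS2 build) */
typedef uint32_t ucs4_unit;   /* 32-bit storage (UCS4 build) */

/* Assumed relationship: bytes = number of units * bytes per unit. */
static size_t data_size_bytes(size_t length, size_t unit_size)
{
    /* Only the two widths named in the text are meaningful here. */
    assert(unit_size == sizeof(ucs2_unit) || unit_size == sizeof(ucs4_unit));
    return length * unit_size;
}
```

Under these assumptions a five-unit string occupies 10 bytes with 16-bit
units and 20 bytes with 32-bit units, which is one reason a buffer produced
by one build variant cannot be read by code compiled for the other.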
Unicode provides many different character properties. The most
commonly needed ones are available through these macros, which are
mapped to C functions depending on the Python configuration.
int Py_UNICODE_ISSPACE(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a whitespace
  character.
int Py_UNICODE_ISLOWER(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a lowercase character.
int Py_UNICODE_ISUPPER(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is an uppercase
  character.
int Py_UNICODE_ISTITLE(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a titlecase character.
int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a linebreak character.
int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a decimal character.
int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a digit character.
int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is a numeric character.
int Py_UNICODE_ISALPHA(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is an alphabetic
  character.
int Py_UNICODE_ISALNUM(Py_UNICODE ch)
  Return 1 or 0 depending on whether ch is an alphanumeric
  character.
These APIs can be used for fast direct character conversions:
Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
  Return the character ch converted to lower case.
Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
  Return the character ch converted to upper case.
Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
  Return the character ch converted to title case.
int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
  Return the character ch converted to a positive decimal
  integer.  Return -1 if this is not possible.  This macro
  does not raise exceptions.
int Py_UNICODE_TODIGIT(Py_UNICODE ch)
  Return the character ch converted to a single-digit integer.
  Return -1 if this is not possible.  This macro does not raise
  exceptions.
double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
  Return the character ch converted to a double.
  Return -1.0 if this is not possible.  This macro does not raise
  exceptions.
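Because the conversion macros signal failure with -1 and do not raise
exceptions, a caller can drive a simple loop that stops at the first
non-digit. The sketch below shows that pattern in plain C. Here
todecimal_ascii is a hypothetical stand-in for Py_UNICODE_TODECIMAL that only
recognizes the ASCII digits, unit_t is a hypothetical stand-in for
Py_UNICODE, and parse_decimal is an illustrative helper, not part of the API.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short unit_t;  /* hypothetical stand-in for Py_UNICODE */

/* Hypothetical stand-in for Py_UNICODE_TODECIMAL: ASCII digits only.
   Returns the digit value, or -1 if ch is not a decimal digit. */
static int todecimal_ascii(unit_t ch)
{
    return (ch >= '0' && ch <= '9') ? (int)(ch - '0') : -1;
}

/* Parse the leading decimal digits of buf[0..len). Stores the value in
   *out and returns the number of units consumed (0 if none were digits).
   Overflow is not handled in this sketch. */
static size_t parse_decimal(const unit_t *buf, size_t len, long *out)
{
    size_t i = 0;
    long value = 0;

    assert(buf != NULL && out != NULL);
    while (i < len) {
        int d = todecimal_ascii(buf[i]);
        if (d < 0)          /* -1 signals "not a decimal character" */
            break;
        value = value * 10 + d;
        i++;
    }
    *out = value;
    return i;
}
```

In an extension module the stand-in would be replaced by the real macro; the
loop structure and the check for a negative result stay the same.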
To create Unicode objects and access their basic sequence properties,
use these APIs:
PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
  Return value: New reference.
  Create a Unicode object from the Py_UNICODE buffer u of the
  given size. The buffer is copied into the new object. u may be
  NULL, in which case the contents are undefined and it is the
  caller's responsibility to fill in the needed data. If u is
  not NULL, the return value might be a shared object. Therefore,
  modification of the resulting Unicode object is only allowed when
  u is NULL.
Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
  Return a read-only pointer to the Unicode object's internal
  Py_UNICODE buffer, or NULL if unicode is not a Unicode
  object.
Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
  Return the length of the Unicode object.
PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
  Return value: New reference.
  Coerce an encoded object obj to a Unicode object and return a
  new reference.
  String and other char buffer compatible objects are decoded
  according to the given encoding and using the error handling
  defined by errors.  Both can be NULL to have the interface
  use the default values (see the next section for details).
  All other objects, including Unicode objects, cause a
  TypeError to be set.
  The API returns NULL if there was an error.  The caller is
  responsible for decref'ing the returned object.
PyObject* PyUnicode_FromObject(PyObject *obj)
  Return value: New reference.
  Shortcut for PyUnicode_FromEncodedObject(obj, NULL, "strict"),
  which is used throughout the interpreter whenever coercion to
  Unicode is needed.
If the platform supports wchar_t and provides a header file
wchar.h, Python can interface directly to this type using the
following functions. Support is optimized if Python's own
Py_UNICODE type is identical to the system's wchar_t.
PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
  Return value: New reference.
  Create a Unicode object from the wchar_t buffer w of
  the given size.  Return NULL on failure.
Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
  Copy the Unicode object contents into the wchar_t buffer
  w.  At most size wchar_t characters are copied
  (excluding a possible trailing 0-termination character).  Return
  the number of wchar_t characters copied, or -1 in case of an
  error.  Note that the resulting wchar_t string may or may
  not be 0-terminated.  It is the responsibility of the caller to make
  sure that the wchar_t string is 0-terminated in case this is
  required by the application.
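Because the copied string may not be 0-terminated, a caller can reserve one
extra slot and write the terminator itself. The helper below is a minimal
sketch of that step in plain C. The name terminate_wide is hypothetical,
copied is assumed to be the value returned by PyUnicode_AsWideChar, and long
stands in for Py_ssize_t so the sketch compiles without the Python headers.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: write a 0 terminator after `copied` characters.
   `copied` is assumed to be the return value of PyUnicode_AsWideChar
   (long stands in for Py_ssize_t) and `capacity` is the total number of
   wchar_t slots in buf.
   Returns 0 on success, -1 if the copy failed or no slot is left. */
static int terminate_wide(wchar_t *buf, long copied, size_t capacity)
{
    assert(buf != NULL);
    if (copied < 0 || (size_t)copied >= capacity)
        return -1;
    buf[copied] = L'\0';
    return 0;
}
```

A caller with a buffer of capacity slots would pass capacity - 1 as size to
PyUnicode_AsWideChar and then hand the returned count to the helper, so that
one slot always remains for the terminator.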
Release 2.5.2, documentation updated on 21st February, 2008.
 