Data Types
The data type module implements the type system that TerraLib uses to deal with data coming from different data sources. It plays an important role in TerraLib because each data source has its own set of data types for representing and storing data. This module works closely with the data access module and provides the foundation for data type extensibility.
Design
The basic idea behind this module is to provide a data type extension mechanism and an abstraction to handle data values in a more general way. This is achieved by providing an abstract class called AbstractData, from which all other data type values must derive, and a data type system where you can register new types (see the DataType and DataTypeManager classes). This module also includes the base classes for describing properties applied to sets of values.
Type, Type Management and Type Codes
Each data type is associated with a type code (an integer value). This code must be unique and must be a standardized value known by all TerraLib modules. The following table shows some data type codes (see Enums.h for a complete and up-to-date list):
macro name | data type code | description |
---|---|---|
TE_UNKNOWN_DT | 0 | when the correct data type is unknown |
TE_VOID_DT | 1 | data type for void values |
TE_BIT_DT | 2 | data type for values stored as 1 bit of data |
TE_CHAR_DT | 3 | character data type (1 byte long) |
TE_UCHAR_DT | 4 | unsigned character data type (1 byte long) |
TE_INT16_DT | 5 | integer number data type (2 bytes long) |
TE_UINT16_DT | 6 | unsigned integer number data type (2 bytes long) |
TE_INT32_DT | 7 | signed integer number data type (4 bytes long) |
TE_UINT32_DT | 8 | unsigned integer number data type (4 bytes long) |
TE_INT64_DT | 9 | signed integer number data type (8 bytes long) |
TE_UINT64_DT | 10 | unsigned integer number data type (8 bytes long) |
TE_BOOLEAN_DT | 11 | boolean type (true or false) |
TE_FLOAT_DT | 12 | float number (32 bits) data type |
TE_DOUBLE_DT | 13 | double number (64 bits) data type |
TE_NUMERIC_DT | 14 | arbitrary precision data type: Numeric(p, q) |
TE_STRING_DT | 15 | string data types, may be: fixed-length strings (blank padded if needed), variable length string with a limited size or variable unlimited length |
TE_BYTE_ARRAY_DT | 16 | binary data (BLOB) |
TE_GEOMETRY_DT | 17 | vector geometry data type |
TE_DATETIME_DT | 18 | date and time types |
TE_ARRAY_DT | 19 | multidimensional array of homogeneous elements |
TE_COMPOSITE_DT | 20 | composite type |
TE_DATASET_DT | 21 | when the type is a DataSet |
TE_RASTER_DT | 22 | when the type is a raster |
TE_CINT16_DT | 23 | complex signed integer (4 bytes long → 2 + 2) |
TE_CINT32_DT | 24 | complex signed integer (8 bytes long → 4 + 4) |
TE_CFLOAT_DT | 25 | complex float (8 bytes long → 4 + 4) |
TE_CDOUBLE_DT | 26 | complex double (16 bytes long → 8 + 8) |
TE_XML_DT | 27 | XML document data type |
You can use the set of macros listed above when working with the built-in types of TerraLib. Although each macro currently maps to a fixed code, you should rely only on the macro names, not on their numeric values, because the values may change in future releases. Note also that these type codes are used by the classes that describe properties.
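The following minimal C++ sketch shows what relying on the macros means in practice. The hypothetical helper `typeLabel` compares a type code only against macro names and never against literal numbers. The `#define` lines are stand-ins copied from the table so that the example compiles on its own. Real code should include TerraLib's Enums.h instead.

```cpp
#include <cassert>
#include <string>

// Stand-in definitions so this sketch compiles on its own. In real code,
// include TerraLib's Enums.h instead and never hard-code these numbers:
// they are copied from the table above and may change in future releases.
#define TE_UNKNOWN_DT 0
#define TE_INT32_DT 7
#define TE_DOUBLE_DT 13
#define TE_STRING_DT 15
#define TE_GEOMETRY_DT 17

// Hypothetical helper: maps a type code to a readable label,
// comparing against the macro names only.
std::string typeLabel(int typeCode)
{
  switch(typeCode)
  {
    case TE_INT32_DT:    return "32-bit signed integer";
    case TE_DOUBLE_DT:   return "64-bit floating point";
    case TE_STRING_DT:   return "string";
    case TE_GEOMETRY_DT: return "geometry";
    default:             return "unknown";
  }
}
```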
Besides the numeric codes, there are two classes that help register the available data types:
- DataType: stores descriptive information about a given data type.
- DataTypeManager: a singleton for managing all data types in the system.
There are some basic constraints on data types:
- No two data types may have the same name.
- The id of a data type will be dynamically generated by the manager (it is the same as the type code).
- Data type names must be in capital letters, although they may contain numbers and other symbols.
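The sketch below makes the constraints above concrete with a hypothetical `TypeRegistry` class. It is a simplified illustration, not TerraLib's DataTypeManager interface. It rejects duplicate names and names that contain lowercase letters, and it generates the ids itself. The starting id of 100 is an arbitrary assumption that stands in for a range reserved for built-in types.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry illustrating the DataTypeManager constraints;
// it is not TerraLib's actual DataTypeManager interface.
class TypeRegistry
{
  public:
    // Registers a type name and returns the id generated for it.
    // Throws if the name contains lowercase letters or is already taken.
    int add(const std::string& name)
    {
      for(char c : name)
        if(std::islower(static_cast<unsigned char>(c)))
          throw std::invalid_argument("data type names must be in capital letters");

      if(m_ids.count(name) != 0)
        throw std::invalid_argument("duplicate data type name: " + name);

      int id = m_nextId++;  // ids are generated here, never supplied by the caller
      m_ids[name] = id;
      return id;
    }

    // Returns the id of a registered name, or -1 if the name is unknown.
    int find(const std::string& name) const
    {
      auto it = m_ids.find(name);
      return it == m_ids.end() ? -1 : it->second;
    }

  private:
    std::map<std::string, int> m_ids;
    int m_nextId = 100;  // hypothetical: first id after a range reserved for built-ins
};
```

Because the caller never supplies an id, the registry's counter alone guarantees that ids are unique. Digits and symbols such as `_` or `-` pass the name check because only lowercase letters are rejected.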
AbstractData
This is the base class for values that can be retrieved from the data access module using the getValue method of the DataSet or DataSetItem classes. This class provides the basic extensibility mechanism for data types: by implementing this interface, you can handle values of new data types in the data access module.
As the class diagram shows, every data type supported by TerraLib, such as Geometry or Raster, is a subclass of AbstractData. It can therefore be handled in the DataSet and DataSetItem API either as a built-in type (for example, via the getGeometry method) or as a general value via the getValue method.
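Here is a minimal sketch of the idea. It assumes an AbstractData-like interface with `clone()`, `getTypeCode()` and `toString()`; TerraLib's actual signatures may differ. The `RgbColor` type, its code 1000 and the `describe` function are all hypothetical. `describe` plays the role of generic code that receives values through the base class, as getValue does.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Simplified stand-in for AbstractData (assumed interface, may differ from TerraLib).
class AbstractData
{
  public:
    virtual ~AbstractData() = default;
    virtual AbstractData* clone() const = 0;
    virtual int getTypeCode() const = 0;
    virtual std::string toString() const = 0;
};

// Hypothetical new value type handled generically through the base class.
class RgbColor : public AbstractData
{
  public:
    static constexpr int TYPE_CODE = 1000;  // hypothetical code assigned at registration

    RgbColor(int r, int g, int b) : m_r(r), m_g(g), m_b(b) {}

    AbstractData* clone() const override { return new RgbColor(*this); }
    int getTypeCode() const override { return TYPE_CODE; }

    std::string toString() const override
    {
      std::ostringstream out;
      out << m_r << ',' << m_g << ',' << m_b;
      return out.str();
    }

  private:
    int m_r, m_g, m_b;
};

// Generic code in the spirit of getValue(): it needs only the base interface.
std::string describe(const AbstractData& value)
{
  std::unique_ptr<AbstractData> copy(value.clone());  // polymorphic deep copy
  return std::to_string(copy->getTypeCode()) + ":" + copy->toString();
}
```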
For more information on how to create a new data type, see the Data Type Extensibility section below.
Byte Array
The byte array class can be used to represent binary data. Most data sources provide a BLOB (or CLOB) type that can be mapped to the byte array data type. The byte array is a copy constructible type.
A byte array object can be constructed with a newly allocated buffer or from a pre-existing one, which avoids the overhead of copying data. A byte array has an internal capacity and an internal pointer that marks how much of the buffer is in use.
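The hypothetical `ByteBuffer` class below sketches the design described above. One constructor allocates a new buffer. Another takes ownership of an existing buffer without copying it. The class tracks capacity separately from the number of bytes in use. It is an illustration under these assumptions, not TerraLib's byte array class, and copy assignment is deleted only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical byte buffer illustrating the capacity / bytes-in-use design;
// it is not TerraLib's byte array class.
class ByteBuffer
{
  public:
    // Allocates a fresh buffer with the given capacity; no bytes are in use yet.
    explicit ByteBuffer(std::size_t capacity)
      : m_data(new char[capacity]), m_capacity(capacity), m_size(0) {}

    // Takes ownership of an existing buffer instead of copying it.
    // 'data' must come from new[]; its first 'size' bytes are already in use.
    ByteBuffer(char* data, std::size_t capacity, std::size_t size)
      : m_data(data), m_capacity(capacity), m_size(size)
    {
      assert(size <= capacity);
    }

    // Copy construction allocates the same capacity and copies the bytes in use.
    ByteBuffer(const ByteBuffer& rhs)
      : m_data(new char[rhs.m_capacity]), m_capacity(rhs.m_capacity), m_size(rhs.m_size)
    {
      std::memcpy(m_data, rhs.m_data, rhs.m_size);
    }

    ByteBuffer& operator=(const ByteBuffer&) = delete;  // omitted for brevity

    ~ByteBuffer() { delete [] m_data; }

    // Appends bytes; returns false if they do not fit in the remaining capacity.
    bool append(const char* bytes, std::size_t n)
    {
      if(n > m_capacity - m_size)
        return false;
      std::memcpy(m_data + m_size, bytes, n);
      m_size += n;
      return true;
    }

    std::size_t capacity() const { return m_capacity; }
    std::size_t size() const { return m_size; }
    const char* data() const { return m_data; }

  private:
    char* m_data;
    std::size_t m_capacity;  // bytes allocated
    std::size_t m_size;      // bytes in use
};
```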
Date and Time Types
This module introduces a base abstract class named DateTime for date and time classes based on ISO 8601 and ISO 19108. Internally, these classes use the Boost.Date_Time library. The class diagram shows the following classes:
- Date: a class to represent dates based on the Gregorian Calendar. Internally, it uses boost::gregorian::date.
- DateDuration: it is a simple day count used for arithmetic with dates. Internally, it uses boost::gregorian::date_duration.
- DatePeriod: it represents a range between two dates. Internally, it uses boost::gregorian::date_period.
- TimeDuration: it represents time duration. Internally, it uses boost::posix_time::time_duration.
- TimeInstant: it is a time point composed of a Gregorian date portion and a time portion. Internally, it uses boost::posix_time::ptime.
- TimePeriod: representation for ranges between two times. Internally, it uses boost::posix_time::time_period.
- TimeInstantTZ: it is a time point with time zone information. Internally, it uses boost::local_time::local_date_time.
- TimePeriodTZ: representation for ranges between two times accounting for time zone. Internally, it uses boost::local_time::local_time_period.
Numeric Type
String Type
Array Type
SimpleData
The class template SimpleData is used for atomic data types (integers, floats, strings, booleans and numerics). Most of the atomic types are just typedefs.
Requirements on type T:
- T must be a copy constructible type.
- T must be usable with output streams via operator <<.
```cpp
typedef SimpleData<char, TE_CHAR_DT> Char;
typedef SimpleData<unsigned char, TE_UCHAR_DT> UChar;
typedef SimpleData<boost::int16_t, TE_INT16_DT> Int16;
typedef SimpleData<boost::uint16_t, TE_UINT16_DT> UInt16;
typedef SimpleData<boost::int32_t, TE_INT32_DT> Int32;
typedef SimpleData<boost::uint32_t, TE_UINT32_DT> UInt32;
typedef SimpleData<boost::int64_t, TE_INT64_DT> Int64;
typedef SimpleData<boost::uint64_t, TE_UINT64_DT> UInt64;
typedef SimpleData<bool, TE_BOOLEAN_DT> Boolean;
typedef SimpleData<float, TE_FLOAT_DT> Float;
typedef SimpleData<double, TE_DOUBLE_DT> Double;
typedef SimpleData<std::string, TE_NUMERIC_DT> Numeric;
typedef SimpleData<std::string, TE_STRING_DT> String;
```
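The following simplified sketch shows how such a template can satisfy the two requirements on T. The template is named `SimpleValue` here to mark it as a hypothetical stand-in. TerraLib's SimpleData also derives from AbstractData, and its members may differ. This sketch also uses `std::int32_t` instead of `boost::int32_t`, and the type code macros are stand-ins copied from the table.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Stand-in codes copied from the type code table; use TerraLib's Enums.h in real code.
#define TE_INT32_DT 7
#define TE_STRING_DT 15

// Hypothetical SimpleData-like template.
// Requirements on T: copy constructible and writable with operator<<.
template<class T, int typeCode>
class SimpleValue
{
  public:
    explicit SimpleValue(const T& value) : m_value(value) {}

    const T& value() const { return m_value; }
    int getTypeCode() const { return typeCode; }

    // Relies on the operator<< requirement to produce a textual form.
    std::string toString() const
    {
      std::ostringstream out;
      out << m_value;
      return out.str();
    }

  private:
    T m_value;  // stored by copy, hence the copy-constructible requirement
};

// Atomic types become one-line typedefs.
typedef SimpleValue<std::int32_t, TE_INT32_DT> Int32;
typedef SimpleValue<std::string, TE_STRING_DT> String;
```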
Composites
A composite is a data type built from primitive data types and other composite types, so the parts of a composite may themselves be composites. This type can be used, for example, to map the traditional composite types of database systems.
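The recursive structure can be sketched with the composite pattern. The `Value`, `IntValue` and `CompositeValue` classes below are hypothetical. They only illustrate how a composite owns its parts, some of which may be composites themselves.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal value interface for this sketch (hypothetical, not TerraLib's AbstractData).
class Value
{
  public:
    virtual ~Value() = default;
    virtual std::string toString() const = 0;
};

class IntValue : public Value
{
  public:
    explicit IntValue(int v) : m_v(v) {}
    std::string toString() const override { return std::to_string(m_v); }

  private:
    int m_v;
};

// A composite owns its parts; any part may itself be a composite.
class CompositeValue : public Value
{
  public:
    void add(std::unique_ptr<Value> part) { m_parts.push_back(std::move(part)); }
    std::size_t size() const { return m_parts.size(); }

    // Renders the parts recursively, e.g. "(1, (2, 3))".
    std::string toString() const override
    {
      std::string out = "(";
      for(std::size_t i = 0; i < m_parts.size(); ++i)
      {
        if(i != 0) out += ", ";
        out += m_parts[i]->toString();
      }
      return out + ")";
    }

  private:
    std::vector<std::unique_ptr<Value>> m_parts;
};
```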
Data Type Mapping
This module introduces a set of classes for dealing with data type conversions. The AbstractDataConverter is a helper class that can be used to guide applications in converting data to the right types. In the data access module, each data source can publish its list of conversion operations, which applications can then use to guide the data conversion process.
The DataConverterManager is a singleton for managing the data type converter objects available in the system. All converters must be registered in this singleton. Because data sources can maintain pointers to these converters, operations that modify this singleton must be used only by data access driver developers.
There are some built-in converters:
- Int32ToStringConverter: a converter from Int32 data values to String.
- TODO: complete this list.
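Here is a hypothetical sketch of a converter registry in the spirit of DataConverterManager. A single instance stores conversion functions keyed by (source type code, target type code). The function `int32ToString` plays the role of Int32ToStringConverter. All class and function names are assumptions, and the type code macros are stand-ins copied from the table.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

#define TE_INT32_DT 7    // stand-ins copied from the type code table;
#define TE_STRING_DT 15  // use TerraLib's Enums.h in real code

// Minimal value types for the sketch (hypothetical, not TerraLib classes).
struct Value
{
  virtual ~Value() = default;
  virtual int typeCode() const = 0;
};

struct Int32Value : Value
{
  explicit Int32Value(int x) : v(x) {}
  int typeCode() const override { return TE_INT32_DT; }
  int v;
};

struct StringValue : Value
{
  explicit StringValue(std::string str) : s(std::move(str)) {}
  int typeCode() const override { return TE_STRING_DT; }
  std::string s;
};

typedef std::function<std::unique_ptr<Value>(const Value&)> Converter;

// Hypothetical registry: one instance, converters keyed by (from, to) type codes.
class ConverterRegistry
{
  public:
    static ConverterRegistry& instance()
    {
      static ConverterRegistry registry;  // created on first use
      return registry;
    }

    void add(int from, int to, Converter c) { m_converters[std::make_pair(from, to)] = std::move(c); }

    // Returns an empty Converter when no conversion is registered for the pair.
    Converter find(int from, int to) const
    {
      auto it = m_converters.find(std::make_pair(from, to));
      return it == m_converters.end() ? Converter() : it->second;
    }

  private:
    ConverterRegistry() = default;
    std::map<std::pair<int, int>, Converter> m_converters;
};

// An Int32-to-String conversion, analogous to Int32ToStringConverter.
std::unique_ptr<Value> int32ToString(const Value& in)
{
  const Int32Value& i = static_cast<const Int32Value&>(in);  // caller guarantees the type
  return std::unique_ptr<Value>(new StringValue(std::to_string(i.v)));
}
```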
Properties
The property classes can be used to model the definition of properties or sets of values.
The base abstract class Property defines the common information about the value of a given property or set of values. It includes:
- The data type associated to the property;
- Any restrictions on the values of the property;
The SimpleProperty class represents an atomic property, such as an integer or a double. It may have a default value, and it may indicate whether the value is auto-incremented or always required.
The classes StringProperty, NumericProperty, DateTimeProperty and ArrayProperty refine the SimpleProperty class, adding more semantics to the represented properties.
The CompositeProperty class is a base class for compound properties (non-atomic properties).
The Geometry and Raster modules add new property classes (GeometryProperty and RasterProperty) to describe geometric and raster properties.
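The property hierarchy can be sketched as follows. The class names mirror the text, but their members are assumptions for illustration rather than TerraLib's actual interface. These members are the flags, a textual default value, a maximum string length and a list of child properties. The type code macros are stand-ins copied from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#define TE_INT32_DT 7       // stand-ins copied from the type code table;
#define TE_STRING_DT 15     // use TerraLib's Enums.h in real code
#define TE_COMPOSITE_DT 20

// Base: every property has a name and a data type code.
class Property
{
  public:
    Property(std::string name, int typeCode) : m_name(std::move(name)), m_type(typeCode) {}
    virtual ~Property() = default;
    const std::string& name() const { return m_name; }
    int type() const { return m_type; }

  private:
    std::string m_name;
    int m_type;  // one of the type codes from the table
};

// Atomic property with optional default value and auto-increment / required flags.
class SimpleProperty : public Property
{
  public:
    SimpleProperty(std::string name, int typeCode, bool required = false,
                   bool autoIncrement = false, std::string defaultValue = "")
      : Property(std::move(name), typeCode), m_required(required),
        m_autoIncrement(autoIncrement), m_default(std::move(defaultValue)) {}
    bool isRequired() const { return m_required; }
    bool isAutoIncrement() const { return m_autoIncrement; }
    const std::string& defaultValue() const { return m_default; }

  private:
    bool m_required;
    bool m_autoIncrement;
    std::string m_default;  // textual default, for brevity
};

// Refinement adding string-specific semantics: a maximum length (0 = unlimited).
class StringProperty : public SimpleProperty
{
  public:
    StringProperty(std::string name, std::size_t maxLength, bool required = false)
      : SimpleProperty(std::move(name), TE_STRING_DT, required), m_maxLength(maxLength) {}
    std::size_t maxLength() const { return m_maxLength; }

  private:
    std::size_t m_maxLength;
};

// Compound property: owns child properties, which may themselves be composites.
class CompositeProperty : public Property
{
  public:
    explicit CompositeProperty(std::string name) : Property(std::move(name), TE_COMPOSITE_DT) {}
    void add(std::unique_ptr<Property> p) { m_children.push_back(std::move(p)); }
    std::size_t size() const { return m_children.size(); }
    const Property& at(std::size_t i) const { return *m_children[i]; }

  private:
    std::vector<std::unique_ptr<Property>> m_children;
};
```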
Exceptions
This module introduces an exception class te::dt::Exception to help catch specific exceptions thrown by this module. The following class diagram shows the exception class in detail:
Data Type Extensibility
TO BE DONE: from this point on, the documentation is under construction.
Creation of New Data Types
You can create a new data type by registering it in the DataTypeManager. For each data type, you must supply a unique name, and the manager will then assign it a GID (global ID)…
Besides registering the data type, you must also provide an abstract data implementation for the new type…
For each data source driver in which this new data type will be used, you must register two routines that convert instances of the data type to and from the driver's internal representation. These routines allow wide flexibility for data source implementations…
For data sources, you should also register the new type in the data source driver. See the data type code table above for the codes of the basic TerraLib data types.
There is also a data type catalog called DataTypeManager, a singleton that keeps information about all available data types in the system. In this singleton you will find all supported data types. Some of the data type codes are reserved for the primitive types in TerraLib. The basic data types all have fixed codes that can be seen in the config definitions.
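The two conversion routines required per driver can be sketched as a pair of functions stored for each driver. Everything here is hypothetical: the `RgbColor` type, the `DriverRoutines` table and the text-based "CSV" driver that stores a color as space-separated numbers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical new data type registered by an application.
struct RgbColor { int r, g, b; };

// Hypothetical pair of routines a driver needs for the new type: one converts a
// value to the driver's internal representation (here a string), one converts back.
struct DriverRoutines
{
  std::function<std::string(const RgbColor&)> toDriver;
  std::function<RgbColor(const std::string&)> fromDriver;
};

// Routines for RgbColor, keyed by driver name.
std::map<std::string, DriverRoutines>& rgbRoutines()
{
  static std::map<std::string, DriverRoutines> table;
  return table;
}

// Registers the routines for a hypothetical text-based driver named "CSV".
void registerRgbForCsv()
{
  DriverRoutines routines;
  routines.toDriver = [](const RgbColor& c) {
    std::ostringstream out;
    out << c.r << ' ' << c.g << ' ' << c.b;
    return out.str();
  };
  routines.fromDriver = [](const std::string& s) {
    std::istringstream in(s);
    RgbColor c{0, 0, 0};
    in >> c.r >> c.g >> c.b;
    return c;
  };
  rgbRoutines()["CSV"] = routines;
}
```

Keeping the routines outside the value type lets each driver choose its own storage format without changing the type itself.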
Registering the Data Type in the TerraLib Query Language
From Theory to Practice
Module Summary
Language | files | blank | comment | code | scale | 3rd gen. equiv |
---|---|---|---|---|---|---|
C++ | 28 | 536 | 534 | 1133 | x 1.51 | 1710.83 |
C/C++ Header | 30 | 1233 | 2073 | 1021 | x 1.00 | 1021.00 |
SUM | 58 | 1769 | 2607 | 2154 | x 1.27 | 2731.83 |
Besides the C++ code there is also…
Final Remarks
- We still need two types: Numeric and Array.
- We need to add a serialization/deserialization signature to all data types.
- Think about overloading operators << and >> and about an I/O type (binary, text, XML, JSON, DB).
- Remove the copy and allocation overhead in the byte array toString method.
- We must consider that boost::gregorian::date is stored as a 32 bit integer type and it is specifically designed to NOT contain virtual functions because this design allows for efficient calculation and memory usage with large collections of dates.
If you want more information about the use of data types, please refer to the following classes/concepts: