TerraLib and TerraView Wiki Page


Data Access Module

The Data Access module provides the fundamental layer for applications that handle spatial data from different sources, ranging from traditional DBMSs to OGC Web Services.

This module is composed of a set of abstract base classes that must be extended to create Data Access Drivers, which implement all the details needed to access data in a specific format or system.

This module provides the foundation for an application to discover what is stored in a data source and how the data is organized in it.

Keep in mind that this organization is the low-level organization of the data. For instance, in a DBMS, you can find out which tables are stored in a database, the relationships between them, and detailed information about their columns (data type, name and constraints).

It is not the role of this module to provide higher-level metadata about the data stored in a data source. This support is provided by another TerraLib module: the Spatial Metadata module.

This section describes the Data Access module in detail.

Design

As one can see in the class diagram below, the Data Access module provides a basic framework for accessing data.

Data Access Class Diagram

It is designed for extensibility and data interoperability, so you can easily extend it with your own data access implementation.

The requirements that drove this design were:

  • extensible data formats/access: the API must be flexible enough to allow new data source driver implementations for new data formats.
  • data type extension: it must be possible to add new data types to the list of supported ones. The API must provide a way for developers to access new data types that exist only in particular implementations. The new data types can be added to all data source drivers or to only some of them. This enables the use of the extensible data types available in object-relational DBMSs.
  • query language extension: it must be feasible to add new functions to the list of supported operations in the query language of a specific driver.
  • dynamic linking and loading: new drivers can be added without explicit linking. The data access architecture must support dynamic loading of data source driver modules, so that new drivers can be installed at any time without recompiling TerraLib or the application.

The yellow classes with names in italics are abstract and must be implemented by data access drivers. Each class is discussed in detail below. See the Doxygen documentation for more details.

DataSource

The DataSource class is the fundamental class of the data access module and it represents a data repository.

It may represent, for instance, a PostgreSQL database, an Oracle database, an OGC Web Feature Service, a directory of ESRI shapefiles, a single shapefile, a directory of TIFF images, a single TIFF image or a data stream.

Each system or file format requires an implementation of this class.

A DataSource shows the data contained in it as a collection of Datasets.

The information about the data stored in a data source may be available through a DataSetType, which contains the dataset name and its structure/schema.

Besides the descriptive information about the underlying data repository, each data source also provides information about its requirements and capabilities. Applications may use this information to adapt to the abilities of the underlying data source in use.

Each data source driver must have a unique identifier. This identifier is a string (in capital letters) with the data source type name and it is available through the method getType. Examples of identifiers are: POSTGIS, OGR, GDAL, SQLITE, WFS, WCS, SHP, ACCESS.
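To illustrate how a unique upper-case identifier can be used to select a driver, here is a minimal sketch of a registry that maps identifiers to factory functions. Only the getType method name comes from the text above. DataSourceSketch, PostGISSketch and DriverRegistrySketch are hypothetical names invented for this example and are not part of the TerraLib API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the abstract DataSource class (not the TerraLib API).
class DataSourceSketch {
public:
  virtual ~DataSourceSketch() = default;
  // Returns the upper-case driver identifier, e.g. "POSTGIS".
  virtual std::string getType() const = 0;
};

// Hypothetical driver implementation used only for illustration.
class PostGISSketch : public DataSourceSketch {
public:
  std::string getType() const override { return "POSTGIS"; }
};

// Hypothetical registry: one factory per unique driver identifier.
class DriverRegistrySketch {
public:
  using Factory = std::function<std::unique_ptr<DataSourceSketch>()>;

  // Registers a factory; returns false if the identifier is already taken.
  bool add(const std::string& type, Factory factory) {
    return factories_.emplace(type, std::move(factory)).second;
  }

  // Builds a data source for the given identifier; throws if it is unknown.
  std::unique_ptr<DataSourceSketch> make(const std::string& type) const {
    auto it = factories_.find(type);
    if (it == factories_.end())
      throw std::out_of_range("unknown data source type: " + type);
    return it->second();
  }

private:
  std::map<std::string, Factory> factories_;
};
```

Because the identifier is the lookup key, a second registration under the same name is rejected, which is one reason identifiers need to be unique.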

A data source is also characterized by a set of parameters that can be used to set up an access channel to its underlying repository. This is referred to as the data source connection information. It may be provided as an associative container (a set of key-value pairs) through the method setConnectionInfo, or as a plain connection string through the method setConnectionStr.

The key-value pairs (kvp) may contain information such as the maximum number of accepted connections, the user name and password required to establish a connection, the URL of a service, or any other information the data source needs to operate. The parameters depend on the data source driver, so check the driver documentation for the supported parameters.

When using a plain string, the information is encoded as a set of key-value pairs: each key is separated from its value by an equals sign (=), pairs are separated from each other by an ampersand (&), and keys and values must be URL encoded. For instance, a PostGIS data source typically uses the following syntax:

std::string connInfo = "host=atlas.dpi.inpe.br&port=5432&dbname=mydb&user=postgres&password=mypasswd&connect_timeout=20";

For a WFS data source available at http://www.dpi.inpe.br/wfs the connection string could be:

std::string connInfo = "service=http%3A%2F%2Fwww.dpi.inpe.br%2Fwfs";
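To make the encoding concrete, the following sketch splits a connection string of this form into key-value pairs and decodes the URL escapes. The function names urlDecode and parseConnectionString are hypothetical helpers written for this example. They are not TerraLib functions, and a driver may parse its connection string differently.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Decodes %XX escapes (for example "%2F" becomes "/").
// Throws std::invalid_argument on a truncated or non-hexadecimal escape.
std::string urlDecode(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] != '%') {
      out += in[i];
      continue;
    }
    if (i + 2 >= in.size() ||
        !std::isxdigit(static_cast<unsigned char>(in[i + 1])) ||
        !std::isxdigit(static_cast<unsigned char>(in[i + 2])))
      throw std::invalid_argument("malformed escape in: " + in);
    out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
    i += 2;  // skip the two hexadecimal digits just consumed
  }
  return out;
}

// Splits "k1=v1&k2=v2" into a map, decoding each key and value.
std::map<std::string, std::string> parseConnectionString(const std::string& s) {
  std::map<std::string, std::string> kvp;
  std::istringstream pairs(s);
  std::string pair;
  while (std::getline(pairs, pair, '&')) {
    if (pair.empty()) continue;  // tolerate a stray '&'
    const std::size_t eq = pair.find('=');
    if (eq == std::string::npos)
      throw std::invalid_argument("missing '=' in pair: " + pair);
    kvp[urlDecode(pair.substr(0, eq))] = urlDecode(pair.substr(eq + 1));
  }
  return kvp;
}
```

Applied to the WFS string above, the value stored under the key service decodes to http://www.dpi.inpe.br/wfs.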

The method getConnectionInfo returns an associative container (a set of key-value pairs) with the connection information. The same information is also available as a URL-encoded string through the method getConnectionStr.
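The relationship between the two representations can be sketched in the opposite direction as well: building the URL-encoded string from an associative container. The helpers urlEncode and toConnectionString are hypothetical names for this example. The set of characters left unescaped (letters, digits, and - _ . ~) is an assumption based on common URL-encoding practice, not something stated by the TerraLib documentation.

```cpp
#include <cctype>
#include <map>
#include <string>

// Percent-encodes every byte except letters, digits and "-_.~".
std::string urlEncode(const std::string& in) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hex[c >> 4];    // high nibble
      out += hex[c & 0x0F];  // low nibble
    }
  }
  return out;
}

// Joins the pairs as "k1=v1&k2=v2", in the map's (sorted) key order.
std::string toConnectionString(const std::map<std::string, std::string>& kvp) {
  std::string out;
  for (const auto& kv : kvp) {
    if (!out.empty()) out += '&';
    out += urlEncode(kv.first) + '=' + urlEncode(kv.second);
  }
  return out;
}
```

Note that std::map iterates in key order, so the order of the pairs in the resulting string may differ from the order in which they were originally written.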

Another useful piece of information available in a data source is its known capabilities. The method getCapabilities returns an associative container with all the information about what the data source can do. Here you can find out whether the data source implementation supports primary keys and foreign keys, whether it can be used in a multi-threaded environment, and much more. There is a list of common key-value pairs that every data access driver must supply, although each implementation can provide additional information.
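An application might consult the capabilities before attempting an operation, along the lines of the sketch below. The key names (PRIMARY_KEY, FOREIGN_KEY) and the value "TRUE" are purely hypothetical placeholders. The actual keys and values are defined by the list of common key-value pairs and by each driver's documentation.

```cpp
#include <map>
#include <string>

// Assumed shape of the container returned by getCapabilities.
using Capabilities = std::map<std::string, std::string>;

// Returns true only if the key is present and its value is "TRUE".
// A missing key is treated as "not supported".
bool supports(const Capabilities& caps, const std::string& key) {
  auto it = caps.find(key);
  return it != caps.end() && it->second == "TRUE";
}
```

Treating a missing key as unsupported is a conservative choice: the application falls back to the safe path when a driver does not report a capability.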

A data source can be in one of two states: opened or closed. To open a data source and make it ready for use, you need to provide the set of parameters required to set up the underlying access channel to the repository and then call one of the open methods. These methods prepare the data source for work: if the implementation needs to open a connection to a database server, open a file or get information from a Web Service, it does so here in order to put the data source into an operational mode. You can supply the connection information either as an associative container or as a string in kvp notation.

Once opened, the data source can be closed, releasing all the resources used by its internal communication channel. The close method closes any database connections, open files or other resources used by the data source.

You can query the data source to find out whether it is opened or whether it is still valid (available for use) using the methods isOpened and isValid, respectively.
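The opened/closed life cycle described above can be summarized with a small stand-in class. The method names setConnectionInfo, open, close and isOpened come from the text, but their signatures here are assumptions, and DataSourceStateSketch is a hypothetical class that does not touch any real repository.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of the data source life cycle (not the TerraLib class).
class DataSourceStateSketch {
public:
  // Stores the parameters needed to set up the access channel.
  void setConnectionInfo(const std::map<std::string, std::string>& info) {
    info_ = info;
  }

  // A real driver would connect to a server or open a file here.
  void open() {
    if (info_.empty())
      throw std::logic_error("no connection information set");
    opened_ = true;
  }

  // A real driver would release connections and file handles here.
  void close() { opened_ = false; }

  bool isOpened() const { return opened_; }

private:
  std::map<std::string, std::string> info_;
  bool opened_ = false;  // a data source starts in the closed state
};
```

In this sketch, calling open before any connection information has been supplied is treated as an error, which mirrors the requirement to provide the parameters first.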

The data stored in a data source may be discovered using the DataSourceTransactor and DataSourceCatalogLoader. The developer may also cache the description of the datasets available in the data source in its catalog (DataSourceCatalog). The method getCatalog gives you access to the cached data description in the data source. All drivers must ensure that getCatalog returns a non-NULL data source catalog, although the catalog may be empty.

In order to interact with the data source you need a transactor. The getTransactor method returns an object that can execute transactions in the context of a data source. You can use this object to retrieve a dataset, to insert data or to modify the schema of a dataset. You don't need to cache this kind of object because each driver in TerraLib already keeps a pool. So, as soon as you finish using the transactor, destroy it. For more information see the DataSourceTransactor class.
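The advice to destroy a transactor as soon as you are done with it can be illustrated with a scope-bound object that hands a pooled resource back on destruction. This is a sketch under the assumption that the driver's pool holds connections. PoolSketch, ConnectionSketch, TransactorSketch and the free function getTransactor below are all hypothetical and do not reflect the real TerraLib signatures.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical pooled resource.
struct ConnectionSketch { int id; };

// Hypothetical pool kept by a driver.
class PoolSketch {
public:
  explicit PoolSketch(int n) {
    for (int i = 0; i < n; ++i) idle_.push_back(ConnectionSketch{i});
  }
  ConnectionSketch acquire() {
    if (idle_.empty()) throw std::runtime_error("pool exhausted");
    ConnectionSketch c = idle_.back();
    idle_.pop_back();
    return c;
  }
  void release(const ConnectionSketch& c) { idle_.push_back(c); }
  std::size_t idle() const { return idle_.size(); }

private:
  std::vector<ConnectionSketch> idle_;
};

// Hypothetical transactor: borrows a connection for its whole lifetime.
class TransactorSketch {
public:
  explicit TransactorSketch(PoolSketch& pool)
      : pool_(pool), conn_(pool.acquire()) {}
  ~TransactorSketch() { pool_.release(conn_); }  // hand the connection back
  TransactorSketch(const TransactorSketch&) = delete;
  TransactorSketch& operator=(const TransactorSketch&) = delete;

private:
  PoolSketch& pool_;
  ConnectionSketch conn_;
};

// Hypothetical counterpart of the getTransactor method.
std::unique_ptr<TransactorSketch> getTransactor(PoolSketch& pool) {
  return std::unique_ptr<TransactorSketch>(new TransactorSketch(pool));
}
```

Holding the returned pointer only inside the block that needs it means the pooled resource becomes available to other callers as soon as that block ends.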

A data source repository can be created using the create class method. All you need to do is specify the type of data source (by its type name, for example: POSTGIS, ORACLE, WFS) and the creational information (a set of key-value pairs). Note that after creation the data source is in a “closed state”; the caller decides when to open it. Not all drivers can perform this operation, so check the capabilities first.

Just as you can create a new data source, you can also drop an existing one through the drop class method. This command also removes the data source from the data source manager (if it is stored there). Not all drivers can perform this operation, so check the capabilities first.

For an in-depth explanation, see the Doxygen documentation of this class.

DataSet

For an in-depth explanation, see the Doxygen documentation of this class.