The MonetDB Python API¶
pymonetdb is the native Python client API for monetDB. It is cross-platform and does not depend on any MonetDB libraries. It supports Python 3.6+ and PyPy and is Python DBAPI 2.0 compatible.
Besides the functionality required by DBAPI 2.0, pymonetdb also provides some MonetDB-specific functionality, in particular file transfers. These are detailed in the API section.
Contents¶
Getting Started¶
Installation¶
pymonetdb is available on PyPI and can be installed with the following command:
$ pip install pymonetdb
It can also be installed from its source directory by running:
$ python setup.py install
Connecting¶
In its simplest form, the function pymonetdb.connect()
takes a single
parameter, the database name:
conn = pymonetdb.connect('demo')
Usually, you have to pass more:
conn = pymonetdb.connect(
'demo',
hostname='dbhost', port=50001,
username='yours', password='truly')
There are also some options you can set, for example autocommit=True
.
It is also possible to combine everything in a URL:
url = 'mapi:monetdb://yours:truly@dbhost:50001/demo?autocommit=true'
conn = pymonetdb.connect(url)
For more details see the documentation of pymonetdb.connect()
.
Examples¶
Here are some examples of how to use pymonetdb.
Example session¶
> # import the SQL module
> import pymonetdb
>
> # set up a connection. arguments below are the defaults
> connection = pymonetdb.connect(username="monetdb", password="monetdb",
> hostname="localhost", database="demo")
>
> # create a cursor
> cursor = connection.cursor()
>
> # increase the rows fetched to increase performance (optional)
> cursor.arraysize = 100
>
> # execute a query (return the number of rows to fetch)
> cursor.execute('SELECT * FROM tables')
26
>
> # fetch only one row
> cursor.fetchone()
[1062, 'schemas', 1061, None, 0, True, 0, 0]
>
> # fetch the remaining rows
> cursor.fetchall()
[[1067, 'types', 1061, None, 0, True, 0, 0],
[1076, 'functions', 1061, None, 0, True, 0, 0],
[1085, 'args', 1061, None, 0, True, 0, 0],
[1093, 'sequences', 1061, None, 0, True, 0, 0],
[1103, 'dependencies', 1061, None, 0, True, 0, 0],
[1107, 'connections', 1061, None, 0, True, 0, 0],
[1116, '_tables', 1061, None, 0, True, 0, 0],
...
[4141, 'user_role', 1061, None, 0, True, 0, 0],
[4144, 'auths', 1061, None, 0, True, 0, 0],
[4148, 'privileges', 1061, None, 0, True, 0, 0]]
>
> # Show the table meta data
> cursor.description
[('id', 'int', 4, 4, None, None, None),
('name', 'varchar', 12, 12, None, None, None),
('schema_id', 'int', 4, 4, None, None, None),
('query', 'varchar', 168, 168, None, None, None),
('type', 'smallint', 1, 1, None, None, None),
('system', 'boolean', 5, 5, None, None, None),
('commit_action', 'smallint', 1, 1, None, None, None),
('temporary', 'tinyint', 1, 1, None, None, None)]
MAPI Connection¶
If you would like to communicate with the database at a lower level you can use the MAPI library (but not recommended):
> from pymonetdb import mapi
> server = mapi.Connection()
> server.connect(hostname="localhost", port=50000, username="monetdb",
password="monetdb", database="demo", language="sql")
> server.cmd("sSELECT * FROM tables;")
...
CSV Upload¶
This is an example script that uploads some CSV data from the local file system:
#!/usr/bin/env python3
import os
import pymonetdb
# Create the data directory and the CSV file
try:
os.mkdir("datadir")
except FileExistsError:
pass
with open("datadir/data.csv", "w") as f:
for i in range(10):
print(f"{i},item{i + 1}", file=f)
# Connect to MonetDB and register the upload handler
conn = pymonetdb.connect('demo')
handler = pymonetdb.SafeDirectoryHandler("datadir")
conn.set_uploader(handler)
cursor = conn.cursor()
# Set up the table
cursor.execute("DROP TABLE foo")
cursor.execute("CREATE TABLE foo(i INT, t TEXT)")
# Upload the data, this will ask the handler to upload data.csv
cursor.execute("COPY INTO foo FROM 'data.csv' ON CLIENT USING DELIMITERS ','")
# Check that it has loaded
cursor.execute("SELECT t FROM foo WHERE i = 9")
row = cursor.fetchone()
assert row[0] == 'item10'
# Goodbye
conn.commit()
cursor.close()
conn.close()
File Transfers¶
MonetDB supports the non-standard COPY INTO
statement to load a CSV-like
text file into a table or to dump a table into a text file. This statement has an
optional modifier ON CLIENT
to indicate that the server should not
try to open the file on the server side but instead ask the client to open it
on its behalf.
For example:
COPY INTO mytable FROM 'data.csv' ON CLIENT
USING DELIMITERS ',', E'\n', '"';
However, by default, if pymonetdb receives a file request from the server, it will refuse it for security considerations. You do not want an unauthorised party pretending to be the server to be able to request arbitrary files on your system and even overwrite them.
To enable file transfers, create a pymonetdb.Uploader or pymonetdb.Downloader and register them with your connection:
transfer_handler = pymonetdb.SafeDirectoryHandler(datadir)
conn.set_uploader(transfer_handler)
conn.set_downloader(transfer_handler)
With this in place, the COPY INTO ... ON CLIENT
statement above will cause pymonetdb to open
the file data.csv in the given datadir and upload its contents. As its name
suggests, SafeDirectoryHandler
will only allow access to the files in
that directory.
Note that in this example, we register the same handler object as an uploader and a downloader for demonstration purposes. In the real world, it is good security practice only to register an uploader or a downloader It is also possible to use two separate handlers.
See the API documentation for details.
Make up data as you go¶
You can also write your own transfer handlers. And instead of opening a file, such handlers can make up the data on the fly, for instance, retrieve it from a remote microservice, prompt the user interactively or do whatever else you come up with:
class MyUploader(pymonetdb.Uploader):
def handle_upload(self, upload, filename, text_mode, skip_amount):
tw = upload.text_writer()
for i in range(skip_amount, 1000):
print(f'{i},number{i}', file=tw)
In this example, we call upload.text_writer() to yield a text-mode file-like object. There is also an upload.binary_writer(), which creates a binary-mode file-like object. The binary_writer() works even if the server requests a text mode object, but in that case, you have to make sure the bytes you write are valid UTF-8 and delimited with Unix line endings rather than Windows line endings.
If you want to refuse an upload or download, call upload.send_error() to send an error message before any call to text_writer() or binary_writer().
For custom downloaders, the situation is similar, except that instead of text_writer and binary_writer, the download parameter offers download.text_reader() and download.text_writer().
Skip amount¶
MonetDB’s COPY INTO
statement allows you to skip, for example, the first
line in a file using the modifier OFFSET 2
. In such a case,
the skip_amount parameter to handle_upload() will be greater than zero.
Note that the offset in the SQL statement is 1-based, whereas the skip_amount
parameter has already been converted to 0-based. The example above thus
allows us to write for i in range(skip_amount, 1000):
rather
than for i in range(1000):
.
Cancellation¶
In cases depicted by the following query, the server does not need to receive all data of the input file:
COPY 100 RECORDS INTO mytable FROM ‘data.csv’ ON CLIENT
Therefore, pymonetdb regularly asks the server if it is still interested in receiving more data. In this way, the server can cancel the uploading after it has received sufficient data to process the query. By default, pymonetdb does this after every MiB of data, but you can change this frequency using upload.set_chunk_size().
If the server answers that it is no longer interested, pymonetdb will discard any further data written to the writer. It is recommended to call upload.is_cancelled() occasionally to check for this and exit early if the upload has been cancelled.
Upload handlers also have an optional method cancel() that you can override. This method is called when pymonetdb receives the cancellation request.
Copying data from or to a file-like object¶
If you are moving large amounts of data between pymonetdb and a file-like object such as a file, Python’s copyfileobj function may come in handy:
class MyUploader(pymonetdb.Uploader):
def __init__(self, dir):
self.dir = pathlib.Path(dir)
def handle_upload(self, upload, filename, text_mode, skip_amount):
# security check
path = self.dir.joinpath(filename).resolve()
if not str(path).startswith(str(self.dir.resolve())):
return upload.send_error('Forbidden')
# open
tw = upload.text_writer()
with open(path) as f:
# skip
for i in range(skip_amount):
f.readline()
# bulk upload
shutil.copyfileobj(f, tw)
However, note that copyfileobj does not handle cancellations as described above.
Security considerations¶
If your handler accesses the file system or the network, it is critical to validate the file name you are given carefully. Otherwise, an attacker can take over the server or the connection to the server and cause great damage.
The code sample above also includes an example of validating file systems paths. Similar considerations apply to text inserted into network URLs and other resource identifiers.
Result set batch size¶
When a query produces a large result set, pymonetdb will often only retrieve part of the result set, retrieving the rest later, one batch at a time. The default behavior is to start with a reasonably small batch size but increase it rapidly. However, if necessary, the application can configure this behavior. In the table below, you can see the settings controlling the behavior of large transfers.
Setting name | Defined by | Range | Default |
---|---|---|---|
replysize | pymonetdb | positive integer or -1 [*] | 100 |
maxprefetch | pymonetdb | positive integer or -1 [*] | 2500 |
arraysize | DBAPI 2.0 | positive integer | Connection.replysize |
[*] The value -1 means unlimited.
The replysize and maxprefetch settings can be set as attributes of both Connection and Cursor. They can also be passed as parameters in the connection URL. The arraysize setting only exists for Cursor. It defaults to the replysize of the connection when the cursor was created if that is positive, or 100 otherwise.
Batching behavior¶
When MonetDB has finished executing a query, the server includes the first rows of the result set in its response to Cursor.execute(). The exact number of rows it includes can be configured using the replysize setting.
How the rest of the rows are retrieved depends on how they are accessed. Cursor.fetchone() and Cursor.fetchmany() retrieve the remaining rows in batches of increasing size. Every batch is twice as large as the previous one until the prefetch limit maxprefetch has been reached. This setting controls the maximum number of fetched rows that are not immediately used.
For instance, with replysize = 100, the first 100 fetchone() calls immediately return the next row from the cache. For the 101-st fetchone(), pymonetdb will first double the replysize and retrieve rows 101-300 before returning row 101. When Cursor.fetchmany() is used, pymonetdb also adjusts the replysize to the requested stride. For example, for fetchmany(40), the first two calls will return rows from the cache. However, for the third call, pymonetdb will first retrieve rows 101-320, i.e. double the replysize and enlarge it to reach a multiple of 40, before returning rows 81 - 120.
With Cursor.fetchall(), all rows are retrieved at once.
New result set format¶
Version Jun2023 of MonetDB introduces a new, binary result set format that is much more efficient to parse. The initial transfer of replysize rows still uses the existing text-based format; however, the subsequent batches can be transferred much more efficiently with the binary format. By default, pymonetdb will automatically use it when possible unless configured otherwise using the binary setting, e.g. pymonetdb.connect(‘demo’, binary=0) or pymonetdb.connect(‘mapi:monetdb://localhost/demo?binary=0’).
Normally, the binary result set transfer is transparent to the user applications. The result set fetching functions automatically do the necessary data conversion. However, if you want to know explicitly if the binary format has been used, you can use Cursor.used_binary_protocol(), e.g. after having called a fetch function.
We have implemented a special case to benefit from the binary protocol even when the replysize is set to -1. When pymonetdb knows that binary transfers are possible (e.g. learnt when connecting with MoentDB) while replysize is -1, it overrides the replysize. Pymonetdb will use a small size for the initial transfer and then retrieve the rest of the result set in one large binary batch.
Tweaking the behavior¶
Usually, the batching behavior does not need to be tweaked.
When deciding which function to use to fetch the result sets, Cursor.fetchmany() seems to be a few percent more efficient than Cursor.fetchall(), while Cursor.fetchone() tends to be 10-15% slower.
To reduce the amount of prefetched data, set maxprefetch to a lower value or even 0. The value 0 disables prefetch entirely, only fetching the requested rows. Setting maxprefetch to -1 has the opposite effect: it allows the prefetch size to increase without a bound.
If you expect the size of the individual rows to be huge, consider setting both replysize and maxprefetch to small values, for example, 10 and 20, respectively, or even 1 and 0. These small batch sizes limit the memory each batch consumes. As a quick rule of thumb for the memory requirements, one can assume that pymonetdb may need up to three times the size of the result set. Also, remember that if MonetDB is running on the same host, the server will also need at least that amount of memory.
Generally, one does not need to make replysize larger than the default because it will grow rapidly. Furthermore, with the newer versions of MonetDB and pymonetdb, it is better to keep the size of the initial response small to transfer more data in the binary format.
Arraysize¶
The batching behavior of pymonetdb is governed mainly by replysize and maxprefetch, but the Python DBAPI also specifies the setting arraysize. The relationship between these three is as follows:
- The replysize and maxprefetch settings are specific to pymonetdb, while arraysize comes from the Python DBAPI.
- The DBAPI only uses arraysize as the default value for fetchmany() and says that it may influence the efficiency of fetchall(). It does not mention arraysize anywhere else.
- In pymonetdb, the batching behavior is only influenced by arraysize if fetchmany() is used without an explicit size because then arraysize is used as the default size, and fetchmany() tries to round the batches to this size. It has no effect on fetchall() because that always fetches everything at once.
- The DBAPI says that the default value for the arraysize of a newly created cursor is 1. Pymonetdb deviates from that, similar to, for example, python-oracledb. Pymonetdb uses the replysize of the connection instead. If replysize is not a positive integer, the default is 100.
In general, all this means that arraysize needs no tweaking.
API¶
Basic SQL usage¶
-
pymonetdb.
connect
(*args, **kwargs) → pymonetdb.sql.connections.Connection¶ Set up a connection to a MonetDB SQL database.
- database (str)
- name of the database, or MAPI URI (see below)
- hostname (str)
- Hostname where MonetDB is running
- port (int)
- port to connect to (default: 50000)
- username (str)
- username for connection (default: “monetdb”)
- password (str)
- password for connection (default: “monetdb”)
- unix_socket (str)
- socket to connect to. used when hostname not set (default: “/tmp/.s.monetdb.50000”)
- autocommit (bool)
- enable/disable auto commit (default: false)
- connect_timeout (int)
- the socket timeout while connecting
- binary (int)
- enable binary result sets when possible if > 0 (default: 1)
- replysize(int)
- number of rows to retrieve immediately after query execution (default: 100, -1 means everything)
- maxprefetch(int)
- max. number of rows to prefetch during Cursor.fetchone() or Cursor.fetchmany()
MAPI URI Syntax:
- tcp socket
- mapi:monetdb://[<username>[:<password>]@]<host>[:<port>]/<database>
- unix domain socket
- mapi:monetdb:///[<username>[:<password>]@]path/to/socket?database=<database>
-
class
pymonetdb.sql.connections.
Connection
(database, hostname=None, port=50000, username='monetdb', password='monetdb', unix_socket=None, autocommit=False, host=None, user=None, connect_timeout=-1, binary=1, replysize=None, maxprefetch=None)¶ Bases:
object
A MonetDB SQL database connection
-
exception
DataError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised for errors that are due to problems with the processed data like division by zero, numeric value out of range, etc. It must be a subclass of DatabaseError.
-
exception
DatabaseError
¶ Bases:
pymonetdb.exceptions.Error
Exception raised for errors that are related to the database. It must be a subclass of Error.
-
exception
Error
¶ Bases:
Exception
Exception that is the base class of all other error exceptions. You can use this to catch all errors with one single ‘except’ statement. Warnings are not considered errors and thus should not use this class as base. It must be a subclass of the Python StandardError (defined in the module exceptions).
-
exception
IntegrityError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised when the relational integrity of the database is affected, e.g. a foreign key check fails. It must be a subclass of DatabaseError.
-
exception
InterfaceError
¶ Bases:
pymonetdb.exceptions.Error
Exception raised for errors that are related to the database interface rather than the database itself. It must be a subclass of Error.
-
exception
InternalError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised when the database encounters an internal error, e.g. the cursor is not valid anymore, the transaction is out of sync, etc. It must be a subclass of DatabaseError.
-
exception
NotSupportedError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised in case a method or database API was used which is not supported by the database, e.g. requesting a .rollback() on a connection that does not support transaction or has transactions turned off. It must be a subclass of DatabaseError.
-
exception
OperationalError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised for errors that are related to the database’s operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, a memory allocation error occurred during processing, etc. It must be a subclass of DatabaseError.
-
exception
ProgrammingError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised for programming errors, e.g. table not found or already exists, syntax error in the SQL statement, wrong number of parameters specified, etc. It must be a subclass of DatabaseError.
-
exception
Warning
¶ Bases:
Exception
Exception raised for important warnings like data truncations while inserting, etc. It must be a subclass of the Python StandardError (defined in the module exceptions).
-
binary
¶
-
binary_command
(command)¶ use this function to send low level mapi commands that return raw bytes
-
close
()¶ Close the connection.
The connection will be unusable from this point forward; an Error exception will be raised if any operation is attempted with the connection. The same applies to all cursor objects trying to use the connection. Note that closing a connection without committing the changes first will cause an implicit rollback to be performed.
-
command
(command)¶ use this function to send low level mapi commands
-
commit
()¶ Commit any pending transaction to the database. Note that if the database supports an auto-commit feature, this must be initially off. An interface method may be provided to turn it back on.
Database modules that do not support transactions should implement this method with void functionality.
-
cursor
()¶ Return a new Cursor Object using the connection. If the database does not provide a direct cursor concept, the module will have to emulate cursors using other means to the extent needed by this specification.
-
default_cursor
¶ alias of
pymonetdb.sql.cursors.Cursor
-
execute
(query)¶ use this for executing SQL queries
-
get_binary
() → int¶
-
get_maxprefetch
() → int¶
-
get_replysize
() → int¶
-
gettimeout
()¶ get the amount of time before a connection times out
-
maxprefetch
¶
-
replysize
¶
-
rollback
()¶ This method is optional since not all databases provide transaction support.
In case a database does provide transactions this method causes the database to roll back to the start of any pending transaction. Closing a connection without committing the changes first will cause an implicit rollback to be performed.
-
set_autocommit
(autocommit)¶ Set auto commit on or off. ‘autocommit’ must be a boolean
-
set_binary
(binary: int)¶
-
set_downloader
(downloader)¶ Register a Downloader object which will handle file download requests.
Must be an instance of class pymonetdb.Downloader or None
-
set_maxprefetch
(maxprefetch: int)¶
-
set_replysize
(replysize: int)¶
-
set_sizeheader
(sizeheader)¶ Set sizeheader on or off. When enabled monetdb will return the size a type. ‘sizeheader’ must be a boolean.
-
set_timezone
(seconds_east_of_utc)¶
-
set_uploader
(uploader)¶ Register an Uploader object which will handle file upload requests.
Must be an instance of class pymonetdb.Uploader or None.
-
settimeout
(timeout)¶ set the amount of time before a connection times out
-
exception
-
class
pymonetdb.sql.cursors.
Cursor
(connection: pymonetdb.sql.connections.Connection)¶ Bases:
object
This object represents a database cursor, which is used to manage the context of a fetch operation. Cursors created from the same connection are not isolated, i.e., any changes done to the database by a cursor are immediately visible by the other cursors
-
arraysize
= None¶ Default value for the size parameter of
fetchmany()
.
-
binary
¶
-
close
()¶ Close the cursor now (rather than whenever __del__ is called). The cursor will be unusable from this point forward; an Error (or subclass) exception will be raised if any operation is attempted with the cursor.
-
debug
(query, fname, sample=-1)¶ Locally debug a given Python UDF function in a SQL query using the PDB debugger. Optionally can run on only a sample of the input data, for faster data export.
-
execute
(operation: str, parameters: Optional[Dict[KT, VT]] = None)¶ Prepare and execute a database operation (query or command). Parameters may be provided as mapping and will be bound to variables in the operation.
-
executemany
(operation, seq_of_parameters)¶ Prepare a database operation (query or command) and then execute it against all parameter sequences or mappings found in the sequence seq_of_parameters.
It will return the number or rows affected
-
export
(query, fname, sample=-1, filespath='./')¶
-
fetchall
()¶ Fetch all remaining rows of a query result, returning them as a sequence of sequences (e.g. a list of tuples).
A
ProgrammingError
is raised if the previous call to .execute*() did not produce any result set or no call was issued yet.
-
fetchmany
(size=None)¶ Fetch the next set of rows of a query result, returning a sequence of sequences (e.g. a list of tuples). An empty sequence is returned when no more rows are available.
The number of rows to fetch per call is specified by the parameter. If it is not given, the cursor’s arraysize determines the number of rows to be fetched.
A
ProgrammingError
is raised if the previous call to .execute*() did not produce any result set or no call was issued yet.
-
fetchone
()¶ Fetch the next row of a query result set, returning a single sequence, or None when no more data is available.
-
get_binary
() → int¶
-
get_maxprefetch
() → int¶
-
get_replysize
() → int¶
-
maxprefetch
¶
-
next
()¶
-
replysize
¶
-
scroll
(value, mode='relative')¶ Scroll the cursor in the result set to a new position according to mode.
If mode is ‘relative’ (default), value is taken as offset to the current position in the result set, if set to ‘absolute’, value states an absolute target position.
An IndexError is raised in case a scroll operation would leave the result set.
-
set_binary
(level: int)¶
-
set_maxprefetch
(maxprefetch: int)¶
-
set_replysize
(replysize: int)¶
-
setinputsizes
(sizes)¶ This method would be used before the .execute*() method is invoked to reserve memory. This implementation doesn’t use this.
-
setoutputsize
(size, column=None)¶ Set a column buffer size for fetches of large columns This implementation doesn’t use this
-
used_binary_protocol
() → bool¶ Pymonetdb-specific. Return True if the last fetch{one,many,all} for the current statement made use of the binary protocol.
Primarily used for testing.
Note that the binary protocol is never used for the first few rows of a result set. Exactly when it kicks in depends on the replysize setting.
-
Type conversion¶
functions for converting python objects to monetdb SQL format. If you want to add support for a specific type you should add a function as a value to the mapping dict and the datatype as key.
-
pymonetdb.sql.monetize.
convert
(data)¶ Return the appropriate convertion function based upon the python type.
-
pymonetdb.sql.monetize.
monet_bool
(data)¶ returns “true” or “false”
-
pymonetdb.sql.monetize.
monet_bytes
(data)¶ converts bytes to string
-
pymonetdb.sql.monetize.
monet_date
(data)¶ returns a casted date
-
pymonetdb.sql.monetize.
monet_datetime
(data)¶ returns a casted timestamp
-
pymonetdb.sql.monetize.
monet_escape
(data)¶ returns an escaped string
-
pymonetdb.sql.monetize.
monet_none
(_)¶ returns a NULL string
-
pymonetdb.sql.monetize.
monet_time
(data)¶ returns a casted time
-
pymonetdb.sql.monetize.
monet_timedelta
(data)¶ returns timedelta casted to interval seconds
-
pymonetdb.sql.monetize.
monet_unicode
(data)¶
functions for converting monetdb SQL fields to Python objects
-
pymonetdb.sql.pythonize.
Binary
(data)¶ Convert to wraps binary data
-
pymonetdb.sql.pythonize.
DateFromTicks
(ticks)¶ Convert ticks to python Date
-
pymonetdb.sql.pythonize.
TimeFromTicks
(ticks)¶ Convert ticks to python Time
-
pymonetdb.sql.pythonize.
TimeTzFromTicks
(ticks)¶ Convert ticks to python Time
-
pymonetdb.sql.pythonize.
TimestampFromTicks
(ticks)¶ Convert ticks to python Timestamp
-
pymonetdb.sql.pythonize.
TimestampTzFromTicks
(ticks)¶ Convert ticks to python Timestamp
-
pymonetdb.sql.pythonize.
convert
(data, type_code)¶ Calls the appropriate convertion function based upon the python type
-
pymonetdb.sql.pythonize.
oid
(data)¶ represents an object identifier
For now we will just return the string representation just like mclient does.
-
pymonetdb.sql.pythonize.
py_bool
(data)¶ return python boolean
-
pymonetdb.sql.pythonize.
py_bytes
(data: str)¶ Returns a bytes (py3) or string (py2) object representing the input blob.
-
pymonetdb.sql.pythonize.
py_date
(data)¶ Returns a python Date
-
pymonetdb.sql.pythonize.
py_day_interval
(data: str) → int¶ Returns a python number of days where data represents a value of MonetDB’s INTERVAL DAY type which resembles a stringified decimal.
-
pymonetdb.sql.pythonize.
py_sec_interval
(data: str) → datetime.timedelta¶ Returns a python TimeDelta where data represents a value of MonetDB’s INTERVAL SECOND type which resembles a stringified decimal.
-
pymonetdb.sql.pythonize.
py_time
(data)¶ returns a python Time
-
pymonetdb.sql.pythonize.
py_timestamp
(data)¶ Returns a python Timestamp
-
pymonetdb.sql.pythonize.
py_timestamptz
(data)¶ Returns a python Timestamp where data contains a tz code
-
pymonetdb.sql.pythonize.
py_timetz
(data)¶ returns a python Time where data contains a tz code
-
pymonetdb.sql.pythonize.
strip
(data)¶ returns a python string, with chopped off quotes, and replaced escape characters
MAPI¶
This is the python implementation of the mapi protocol.
-
class
pymonetdb.mapi.
Connection
¶ Bases:
object
MAPI (low level MonetDB API) connection
-
binary_cmd
(operation: str) → memoryview¶ put a mapi command on the line, with a binary response.
returns a memoryview that can only be used until the next operation on this Connection object.
-
cmd
(operation: str)¶ put a mapi command on the line
-
connect
(database: str, username: str, password: str, language: str, hostname: Optional[str] = None, port: Optional[int] = None, unix_socket=None, connect_timeout=-1, handshake_options_callback: Callable[[bool], List[HandshakeOption]] = <function Connection.<lambda>>)¶ setup connection to MAPI server
unix_socket is used if hostname is not defined.
-
disconnect
()¶ disconnect from the monetdb server
-
set_downloader
(downloader: Downloader)¶ Register the given Downloader, or None to deregister
-
set_reply_size
(size)¶ Set the amount of rows returned by the server.
- args:
- size: The number of rows
-
set_uploader
(uploader: Uploader)¶ Register the given Uploader, or None to deregister
-
-
class
pymonetdb.mapi.
HandshakeOption
(level, name, fallback, value)¶ Bases:
object
Option that can be set during the MAPI handshake
Should be sent as <name>=<val>, where <val> is value converted to int. The level is used to determine if the server supports this option. The fallback is a function-like object that can be called with the value (not converted to an integer) as a parameter. Field sent can be used to keep track of whether the option has been sent.
-
pymonetdb.mapi.
handle_error
(error)¶ Return exception matching error code.
- args:
- error (str): error string, potentially containing mapi error code
- returns:
- tuple (Exception, formatted error): returns OperationalError if unknown
- error or no error code in string
-
pymonetdb.mapi.
mapi_url_options
(possible_mapi_url: str) → Dict[str, str]¶ Try to parse the argument as a MAPI URL and return a Dict of url options
Return empty dict if it’s not a MAPI URL.
File Uploads and Downloads¶
Classes related to file transfer requests as used by COPY INTO ON CLIENT.
-
class
pymonetdb.filetransfer.
Upload
(mapi: MapiConnection)¶ Represents a request from the server to upload data to the server. It is passed to the Uploader registered by the application, which for example might retrieve the data from a file on the client system. See pymonetdb.sql.connections.Connection.set_uploader().
Use the method send_error() to refuse the upload, binary_writer() to get a binary file object to write to, or text_writer() to get a text-mode file object to write to.
Implementations should be VERY CAREFUL to validate the file name before opening any files on the client system!
-
is_cancelled
() → bool¶ Returns true if the server has cancelled the upload.
-
has_been_used
() → bool¶ Returns true if .send_error(), .text_writer() or .binary_writer() have been called.
-
set_chunk_size
(size: int)¶ After every CHUNK_SIZE bytes, the server gets the opportunity to cancel the rest of the upload. Defaults to 1 MiB.
-
send_error
(message: str) → None¶ Tell the server the requested upload has been refused
-
binary_writer
() → io.BufferedIOBase¶ Returns a binary file-like object. All data written to it is uploaded to the server.
-
text_writer
() → io.TextIOBase¶ Returns a text-mode file-like object. All text written to it is uploaded to the server. DOS/Windows style line endings (CR LF, \r \n) are automatically rewritten to single \n’s.
-
close
()¶ End the upload succesfully
-
-
class
pymonetdb.filetransfer.
Uploader
¶ Base class for upload hooks. Instances of subclasses of this class can be registered using pymonetdb.Connection.set_uploader(). Every time an upload request is received, an Upload object is created and passed to this objects .handle_upload() method.
If the server cancels the upload halfway, the .cancel() methods is called and all further data written is ignored.
-
handle_upload
(upload: pymonetdb.filetransfer.uploads.Upload, filename: str, text_mode: bool, skip_amount: int)¶ Called when an upload request is received. Implementations should either send an error using upload.send_error(), or request a writer using upload.text_writer() or upload.binary_writer(). All data written to the writer will be sent to the server.
Parameter ‘filename’ is the file name used in the COPY INTO statement. Parameter ‘text_mode’ indicates whether the server requested a text file or a binary file. In case of a text file, ‘skip_amount’ indicates the number of lines to skip. In binary mode, ‘skip_amount’ is always 0.
SECURITY NOTE! Make sure to carefully validate the file name before opening files on the file system. Otherwise, if an adversary has taken control of the network connection or of the server, they can use file upload requests to read arbitrary files from your computer (../../)
-
cancel
()¶ Optional method called when the server cancels the upload.
-
-
class
pymonetdb.filetransfer.
Download
(mapi: pymonetdb.mapi.Connection)¶ Represents a request from the server to download data from the server. It is passed to the Downloader registered by the application, which for example might write the data to a file on the client system. See pymonetdb.Connection.set_downloader().
Use the method send_error() to refuse the download, binary_reader() to get a binary file object to read bytes from, or text_reader() to get a text-mode file object to read text from.
Implementations should be EXTREMELY CAREFUL to validate the file name before opening and writing to any files on the client system!
-
send_error
(message: str) → None¶ Tell the server the requested download is refused
-
binary_reader
()¶ Returns a binary file-like object to read the downloaded data from.
-
text_reader
()¶ Returns a text mode file-like object to read the downloaded data from.
-
close
()¶ End the download succesfully. Any unconsumed data will be discarded.
-
-
class
pymonetdb.filetransfer.
Downloader
¶ Base class for download hooks. Instances of subclasses of this class can be registered using pymonetdb.Connection.set_downloader(). Every time a download request arrives, a Download object is created and passed to this objects .handle_download() method.
SECURITY NOTE! Make sure to carefully validate the file name before opening files on the file system. Otherwise, if an adversary has taken control of the network connection or of the server, they can use download requests to OVERWRITE ARBITRARY FILES on your computer
-
handle_download
(download: pymonetdb.filetransfer.downloads.Download, filename: str, text_mode: bool)¶ Called when a download request is received. Implementations should either send an error using download.send_error(), or request a reader using download.text_reader() or download.binary_reader().
Parameter ‘filename’ is the file name used in the COPY INTO statement. Parameter ‘text_mode’ indicates whether the server requested text or binary mode.
SECURITY NOTE! Make sure to carefully validate the file name before opening files on the file system. Otherwise, if an adversary has taken control of the network connection or of the server, they can use file download requests to overwrite arbitrary files on your computer. (../../)
-
-
class
pymonetdb.filetransfer.
SafeDirectoryHandler
(dir, encoding: Optional[str] = None, newline: Optional[str] = None, compression=True)¶ File transfer handler which uploads and downloads files from a given directory, taking care not to allow access to files outside that directory. Instances of this class can be registered using the pymonetb.Connection’s set_uploader() and set_downloader() methods.
When downloading text files, the downloaded text is converted according to the encoding and newline parameters, if present. Valid values for encoding are any encoding known to Python, or None. Valid values for newline are “\n”, “\r\n” or None. None means to use the system default.
For binary up- and downloads, no conversions are applied.
When uploading text files, the encoding parameter indicates how the text is read and newline is mostly ignored: both \n and \r\n are valid line endings. The exception is that because the server expects its input to be \n-terminated UTF-8 text, if you set encoding to “utf-8” and newline to “\n”, text mode transfers are performed as binary, which improves performance. For uploads, only do this if you are absolutely, positively sure that all files in the directory are actually valid UTF-8 encoded and have Unix line endings.
If compression is set to True, which is the default, the SafeDirectoryHandler will automatically compress and decompress files with extensions .gz, .bz2, .xz and .lz4. Note that the first three algorithms are built into Python, but LZ4 only works if the lz4.frame module is available.
-
handle_upload
(upload: pymonetdb.filetransfer.uploads.Upload, filename: str, text_mode: bool, skip_amount: int)¶ Meta private:
-
handle_download
(download: pymonetdb.filetransfer.downloads.Download, filename: str, text_mode: bool)¶ Called when a download request is received. Implementations should either send an error using download.send_error(), or request a reader using download.text_reader() or download.binary_reader().
Parameter ‘filename’ is the file name used in the COPY INTO statement. Parameter ‘text_mode’ indicates whether the server requested text or binary mode.
SECURITY NOTE! Make sure to carefully validate the file name before opening files on the file system. Otherwise, if an adversary has taken control of the network connection or of the server, they can use file download requests to overwrite arbitrary files on your computer. (../../)
-
MonetDB remote control¶
-
class
pymonetdb.control.
Control
(hostname=None, port=50000, passphrase=None, unix_socket=None, connect_timeout=-1)¶ Bases:
object
Use this module to manage your MonetDB databases. You can create, start, stop, lock, unlock, destroy your databases and request status information.
-
create
(database_name)¶ Initialises a new database or multiplexfunnel in the MonetDB Server. A database created with this command makes it available for use, however in maintenance mode (see pymonetdb lock).
-
defaults
()¶
-
destroy
(database_name)¶ Removes the given database, including all its data and logfiles. Once destroy has completed, all data is lost. Be careful when using this command.
-
get
(database_name)¶ gets value for property for the given database, or retrieves all properties for the given database
-
inherit
(database_name, property_)¶ unsets property, reverting to its inherited value from the default configuration for the given database
-
kill
(database_name)¶ Kills the given database, if the MonetDB Database Server is running. Note: killing a database should only be done as last resort to stop a database. A database being killed may end up with data loss.
-
lock
(database_name)¶ Puts the given database in maintenance mode. A database under maintenance can only be connected to by the DBA. A database which is under maintenance is not started automatically. Use the “release” command to bring the database back for normal usage.
-
neighbours
()¶
-
release
(database_name)¶ Brings back a database from maintenance mode. A released database is available again for normal use. Use the “lock” command to take a database under maintenance.
-
rename
(old, new)¶
-
set
(database_name, property_, value)¶ sets property to value for the given database for a list of properties, use pymonetdb get all
-
start
(database_name)¶ Starts the given database, if the MonetDB Database Server is running.
-
status
(database_name=False)¶ Shows the state of a given glob-style database match, or all known if none given. Instead of the normal mode, a long and crash mode control what information is displayed.
-
stop
(database_name)¶ Stops the given database, if the MonetDB Database Server is running.
-
-
pymonetdb.control.
isempty
(result)¶ raises an exception if the result is not empty
-
pymonetdb.control.
parse_statusline
(line)¶ parses a sabdb format status line. Support v1 and v2.
pymonetdb Exceptions¶
MonetDB Python API specific exceptions
-
exception
pymonetdb.exceptions.
DataError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised for errors that are due to problems with the processed data like division by zero, numeric value out of range, etc. It must be a subclass of DatabaseError.
-
exception
pymonetdb.exceptions.
DatabaseError
¶ Bases:
pymonetdb.exceptions.Error
Exception raised for errors that are related to the database. It must be a subclass of Error.
-
exception
pymonetdb.exceptions.
Error
¶ Bases:
Exception
Exception that is the base class of all other error exceptions. You can use this to catch all errors with one single ‘except’ statement. Warnings are not considered errors and thus should not use this class as base. It must be a subclass of the Python StandardError (defined in the module exceptions).
-
exception
pymonetdb.exceptions.
IntegrityError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised when the relational integrity of the database is affected, e.g. a foreign key check fails. It must be a subclass of DatabaseError.
-
exception
pymonetdb.exceptions.
InterfaceError
¶ Bases:
pymonetdb.exceptions.Error
Exception raised for errors that are related to the database interface rather than the database itself. It must be a subclass of Error.
-
exception
pymonetdb.exceptions.
InternalError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised when the database encounters an internal error, e.g. the cursor is not valid anymore, the transaction is out of sync, etc. It must be a subclass of DatabaseError.
-
exception
pymonetdb.exceptions.
NotSupportedError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised in case a method or database API was used which is not supported by the database, e.g. requesting a .rollback() on a connection that does not support transaction or has transactions turned off. It must be a subclass of DatabaseError.
-
exception
pymonetdb.exceptions.
OperationalError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised for errors that are related to the database’s operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, a memory allocation error occurred during processing, etc. It must be a subclass of DatabaseError.
-
exception
pymonetdb.exceptions.
ProgrammingError
¶ Bases:
pymonetdb.exceptions.DatabaseError
Exception raised for programming errors, e.g. table not found or already exists, syntax error in the SQL statement, wrong number of parameters specified, etc. It must be a subclass of DatabaseError.
-
exception
pymonetdb.exceptions.
Warning
¶ Bases:
Exception
Exception raised for important warnings like data truncations while inserting, etc. It must be a subclass of the Python StandardError (defined in the module exceptions).
Development¶
Github¶
We maintain pymonetdb on GitHub. If you have problems with pymonetdb, please raise an issue in the issue tracker. Even better is if you have a solution to the problem! In that case, you can make our lives easier by following these steps:
- Fork our repository on GitHub
- Add tests that will fail because of the problem
- Fix the problem
- Run the test suite again
- Commit all changes to your repository
- Issue a GitHub pull request.
Also, we try to be pep8 compatible as much as possible, where possible and reasonable.
Test suite¶
pymonetdb comes with a test suite to verify that the code works and make development easier.
Prepare test databases¶
Most tests use an existing MonetDB database that you must prepare beforehand. By default they try to connect to a database named “demo” but this can be configured otherwise, see below.
Some of the tests rely on a running MonetDB daemon, to test creating and destroying new databases. This daemon also needs to be prepared beforehand, and configured to allow control connections. Alternatively, you may disable the control tests by setting the environment variable TSTCONTROL=off.
The commands below assume an environment without any running MonetDB processes.
Create a test database farm, e.g. “/tmp/pymonetdbtest”, and the “demo” database:
$ monetdbd create /tmp/pymonetdbtest
$ monetdbd start /tmp/pymonetdbtest
$ monetdb create demo
$ monetdb release demo
If you want to run the control tests (in tests/test_control.py), you need to set a passphrase and enable remote control:
$ monetdbd set control=yes /tmp/pymonetdbtest
$ monetdbd set passphrase=testdb /tmp/pymonetdbtest
$ monetdbd stop /tmp/pymonetdbtest
$ monetdbd start /tmp/pymonetdbtest
Note 1: Test databases created by test_control.py are cleaned up after the control tests have finished. However, the demo database and the MonetDB daemon itself are neither stopped nor destroyed.
Note 2: The above commands are also in the file tests/initdb.sh. Once the database farm has been created, you can use that script to do the remaining work:
$ tests/initdb.sh demo /tmp/pymonetdbtest
WARNING: initdb.sh will destroy the given database demo WITHOUT asking for confirmation!
Run tests¶
There are many ways to run the tests. Below we list several often-used commands. The commands should be run in the root directory of the pymonetdb source directory.
With Python unittest:
$ python -m unittest # to run all tests $ python -m unittest -f # to run all tests but stop after the first failure $ python -m unittest -v # to run all tests and get information about individual test $ python -m unittest -v tests.test_policy # to run all tests of the module "tests.test_policy" $ python -m unittest -v -k test_fetch # to run the sub-test set "test_fetch*"
With pytest:
$ pytest # to run all tests $ pytest -v # to run all tests and get information about individual test $ pytest -v tests/test_oid.py # to run one test file
With make:
$ make test
Note: make test creates a venv in which it installs and runs pytest. If you get the error “Could not install packages due to an OSError: [Errno 39] Directory not empty: ‘_internal’”, it is probably because your pymonetdb source is in a Vagrant shared folder. A simple workaround is to move your pymonetdb source to a local folder on your VM. See also vagrant.
With tox:
$ pip install tox; tox
Note: If it is not listed there, you must add your Python version to the envlist in the tox.ini file.
Environment variables¶
Several environment variables are defined in tests/util.py. Many of them are self-explanatory. Here we just highlight a few:
TSTDB is the name of the preexisting database used by most of the tests. TSTHOSTNAME, TSTUSERNAME, TSTPASSWORD and MAPIPORT control the other connection parameters. Note that for historical reasons it is MAPIPORT, not TSTPORT.
TSTPASSPHRASE is the Merovingian passphrase you must set to run the control test (see Prepare test databases above).
Some tests are skipped unless you set TSTFULL to true, e.g.:
$ TSTFULL=true python3 -m unittest -v tests/test_control.py
TSTCONTROL is used to control the tests in test_control.py. The default tcp,local means run the tests over TCP/IP (e.g. on port 50000) and the Unix domain socket (e.g. “/tmp/s.merovingian.50000”). When you run MonetDB in, e.g., a Docker container, you can turn off the tests over the Unix socket using TSTCONTROL=tcp. If you want to turn off all Merovingian tests, you can use TSTCONTROL=off (actually, any string other than “tcp” and “local” will do):
$ TSTFULL=true TSTCONTROL=tcp python3 -m unittest -v tests/test_control.py
TSTREPLYSIZE, TSTMAXPREFETCH and TSTBINARY control the size and format of the result set transfer (see Result set batch size). Check out the tests in test_policy.py for examples of implemented data transfer policies and how setting the variables replysize, maxprefetch and binary affects those policies.