Introducing petitdb

2018/03/26

python

What is petitdb

petitdb is a Python library that I created a few years ago. As you can guess from the name, it’s a library that provides some basic database functionality. Under the hood it just operates on shelves (Python’s shelve module), inside which dicts of dicts are created. Each dict in the shelf is presented as a table, and inside each of these dicts resides another dict that stores key/value pairs.
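To make the "dicts of dicts in a shelf" layout concrete, here is a minimal sketch using only the standard shelve module. It is not petitdb's actual code; the file name, table name and record fields are assumptions chosen for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical path, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

with shelve.open(path) as shelf:
    # Each top-level key plays the role of a "table" and holds a plain dict.
    shelf["HostsConfig"] = {"Hostname1": {"ip": "1.1.1.1", "system": "system1"}}

with shelve.open(path) as shelf:
    # Reopening the shelf reads the same structure back from disk.
    tables = sorted(shelf.keys())
    record = shelf["HostsConfig"]["Hostname1"]
```

A library like petitdb can then wrap this structure behind table-oriented methods instead of exposing the raw shelf.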

Use cases

One way I used this library was to store configurations, mainly hostnames and the other attributes that go with them. For example, take a Host class that looked like this in the main script:

class Host:
    """Define a Host object
    A host object has the following attributes:
      hostname, ip address, login user, password, system name, node name
    """
    def __init__(self, hostname, ip, passwd, system, node, login_user):
        self.hostname = hostname
        self.ip = ip
        self.login_user = login_user
        self.password = passwd
        self.system = system
        self.node = node

    def __str__(self):
        return \
            "{0:s}\t=>\t{1:<15s}\t{2:<20s}\t{3:<20s}\t{4:<7s}\t{5:<7s}".format(
                self.hostname, self.ip, self.login_user, 'password',
                self.system, self.node)

I could store the attributes in a csv file and, every time the script boots, read and parse the csv and initialize a Host object from every record. The problem I faced with this approach was that if somebody misconfigured the csv file, the error could keep the script from running properly on the next boot. Using petitdb, I can easily separate the parsing of the configuration file from the actual processing of the main script. That is, I build a function whose only purpose is to read and parse the csv file and store the objects in a shelf using petitdb; the main script itself only has to read the configurations from petitdb, not from a csv file. This way I can ensure that when the script runs, it only reads configurations that are already validated and ready to use. On the other hand, if the function that reads and parses the csv file (the configuration file) raises an error, the user knows right away that there is something wrong with the csv.
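The parse-and-validate step described above could look roughly like the sketch below. The function name parse_hosts, the column names and the validation rules are assumptions made for this example; the original script is not shown in this post.

```python
import csv
import io
import ipaddress

# Hypothetical column names, mirroring the Host constructor arguments.
REQUIRED_FIELDS = ("hostname", "ip", "passwd", "system", "node", "login_user")

def parse_hosts(csv_file):
    """Return {hostname: row dict}; raise ValueError on the first bad row."""
    hosts = {}
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Missing columns come back as None, empty cells as "".
        missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
        if missing:
            raise ValueError("line {}: missing {}".format(
                reader.line_num, ", ".join(missing)))
        try:
            ipaddress.ip_address(row["ip"])
        except ValueError:
            raise ValueError("line {}: invalid ip {!r}".format(
                reader.line_num, row["ip"]))
        hosts[row["hostname"]] = dict(row)
    return hosts

good = io.StringIO(
    "hostname,ip,passwd,system,node,login_user\n"
    "Hostname1,1.1.1.1,password1,system1,node1,root\n")
hosts = parse_hosts(good)
```

Only if this function returns without error would the records be written to the database, so the main script never sees a half-valid configuration.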

As petitdb also provides some convenient methods to update objects, it was quite useful for counting log records as well. For example, to count records by log type and datetime, it’s useful to store the counts in a dict while reading the log records one by one.
The data structure could look like this:

# To count errors in the syslog during a certain period of time.
counter['sys_log_errors'][datetime] = 1

Instead of directly manipulating dicts like this, I used petitdb, as it makes incremental operations very convenient with the db.add() method. Example:

# Using plain dicts, this is how we would increment:
counter['sys_log_errors']['datetime_string'] += 1
# But if "counter" is a shelf (opened without writeback), the change above
# would be lost: we need to retrieve the table, update it and store it back
table = counter['sys_log_errors']
table['datetime_string'] = table['datetime_string'] + 1
counter['sys_log_errors'] = table

# Using "petitdb", incrementing is as simple as:
db.add('sys_log_errors', 'datetime_string', 1)
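For readers who want to see why the extra steps are needed on a raw shelf, here is a small sketch with the standard shelve module. The increment helper is hypothetical and only illustrates the kind of work an add-style method has to do; it is not petitdb's implementation.

```python
import os
import shelve
import tempfile

def increment(shelf, table, key, amount=1):
    """Add `amount` to shelf[table][key], creating both if needed."""
    rows = shelf.get(table, {})          # a copy unpickled from the shelf
    rows[key] = rows.get(key, 0) + amount
    shelf[table] = rows                  # reassign so the shelf stores the change

# Hypothetical path, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "counters")

with shelve.open(path) as counter:
    counter["sys_log_errors"] = {"datetime_string": 1}
    # In-place mutation of the nested dict is silently lost (writeback is off).
    counter["sys_log_errors"]["datetime_string"] += 1
    lost_update = counter["sys_log_errors"]["datetime_string"]
    # Read, modify and write back the whole table instead.
    increment(counter, "sys_log_errors", "datetime_string")
    kept_update = counter["sys_log_errors"]["datetime_string"]
```

Opening the shelf with writeback=True is another way around this, at the cost of caching every accessed entry in memory until the shelf is synced or closed.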

Using `petitdb`

Installation

The tool is just one Python file that you can store next to your script and import locally.
You can download it from GitHub:

$ git clone https://github.com/ebsarr/petitdb.git

Then copy petitdb.py to a place where your script can import it.

Features

petitdb provides two classes:

SmallDB: provides an easy interface to shelves. You can store any object in its dicts and retrieve it later.
MemDB: a subclass of SmallDB with no access to shelves; that is, everything you do stays in memory.

Data manipulation on a SmallDB or MemDB instance can be done through the following methods:

Method	Description
insert	insert records
update	replace the value of an existing record
add	convenient method to update records incrementally (see the counting example above)
append	convenient method to update records
remove	remove records
create_table	create tables (dicts)
remove_table	remove tables (dicts)

And the following methods to retrieve data:

Method	Description
select	retrieve one single record
tables	retrieve all tables from the object
keys	retrieve all keys from a table

Storing data

The example below is illustrated in IPython. First declare the Host class.

In [1]: class Host:
   ...:     def __init__(self, hostname, ip, passwd, system, node, login_user):
   ...:         self.hostname = hostname
   ...:         self.ip = ip
   ...:         self.login_user = login_user
   ...:         self.password = passwd
   ...:         self.system = system
   ...:         self.node = node
   ...:
   ...:     def __str__(self):
   ...:         return \
   ...:             "{0:s}\t=>\t{1:<15s}\t{2:<20s}\t{3:<20s}\t{4:<7s}\t{5:<7s}".format(
   ...:                 self.hostname, self.ip, self.login_user, 'password',
   ...:                 self.system, self.node)
   ...:

Import SmallDB and create a db object:

In [3]: from petitdb import SmallDB
In [4]: db = SmallDB('config.db')

Now we can create a table and store a Host object in it:

In [5]: db.create_table('HostsConfig')
In [6]: db.insert('HostsConfig', 'Hostname1', Host('Hostname1', '1.1.1.1', 'password1', 'system1', 'node1', 'root'))

You can see that the object has been stored by calling db.print_db():

In [8]: db.print_db()
***************
* HostsConfig *
***************
key				data
-----------------------------------------------------------------
Hostname1				<__main__.Host instance at 0x105945248>

To persistently save the data on disk (in the shelf), we must call db.save(), and then db.close() once we are done:

In [10]: db.save()
In [11]: db.close()

After closing, trying to save data again will raise an error:

In [12]: db.save()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-0883b2e3c451> in <module>()
----> 1 db.save()
...
ValueError: invalid operation on closed shelf

I think I could improve the error handling here by watching the close status…
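One possible way to do that is sketched below with a tiny wrapper around the standard shelve module. The class name GuardedShelf, its methods and the error message are all hypothetical; this is not how petitdb currently behaves.

```python
import os
import shelve
import tempfile

class GuardedShelf:
    """Hypothetical wrapper that remembers whether close() was called."""

    def __init__(self, path):
        self._shelf = shelve.open(path)
        self._closed = False

    def save(self):
        # Fail early with an explicit message instead of the shelf's ValueError.
        if self._closed:
            raise RuntimeError(
                "save() called after close(); reopen the database first")
        self._shelf.sync()

    def close(self):
        if not self._closed:             # closing twice is harmless
            self._shelf.close()
            self._closed = True

db = GuardedShelf(os.path.join(tempfile.mkdtemp(), "config"))
db.save()
db.close()
try:
    db.save()
    message = None
except RuntimeError as exc:
    message = str(exc)
```

The flag keeps the check in one place, so every method that touches the shelf can report the same clear error.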

If you examine your filesystem, you can see that a config.db file was created:

➜  petitdb git:(master) ✗ ls -l config.db
-rw-r--r--  1 kemal  staff  16384 Mar 26 01:23 config.db
➜  petitdb git:(master) ✗ file config.db
config.db: Berkeley DB 1.85 (Hash, version 2, native byte-order)

Retrieving data

If we go back and initialize a db object the same way we did before, the data will be read from config.db. The contents can be accessed easily with methods provided by SmallDB.
You can get a list of the tables:

In [4]: db = SmallDB('config.db')
In [9]: db.tables()
Out[9]: ['HostsConfig']

And also see the keys stored inside a table:

In [10]: db.keys('HostsConfig')
Out[10]: ['Hostname1']

And get the value with db.select():

In [11]: h = db.select('HostsConfig', 'Hostname1')
In [14]: h.ip
Out[14]: '1.1.1.1'
In [16]: h.hostname
Out[16]: 'Hostname1'

I’ve mainly used this library to store complex configurations to use in some main scripts. It’s a very lightweight solution to easily store and manage a small amount of data when writing python scripts.