What is petitdb
petitdb is a Python library that I created a few years ago. As you can guess from the name, it’s a library that provides some basic database functionalities. Under the hood it just operates on shelves, inside which dicts of dicts are created. Each dict in the shelf is presented as a table, and inside each of those dicts resides another dict that stores key/value pairs.
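To make that layout concrete, here is a minimal sketch that uses only the standard-library shelve module, not petitdb itself. The file name demo_db and the record contents are illustrative assumptions; the point is only to show a shelf whose top-level keys act as tables, each holding a dict of key/value pairs.

```python
import os
import shelve
import tempfile

# Assumed, illustrative location for the shelf file.
path = os.path.join(tempfile.mkdtemp(), "demo_db")

with shelve.open(path) as shelf:
    # Each top-level key plays the role of a table ...
    shelf["HostsConfig"] = {
        # ... and each table is a plain dict of key/value pairs.
        "Hostname1": {"ip": "1.1.1.1", "system": "system1"},
    }

# Reopen the shelf to confirm the data was persisted.
with shelve.open(path) as shelf:
    tables = sorted(shelf.keys())
    record = shelf["HostsConfig"]["Hostname1"]

print(tables)
print(record["ip"])
```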
Use cases
One way I used this library was to store some configurations, mainly hostnames and the other attributes that go with them. For example, a Host object definition looked like this in the main script:
class Host:
    """Define a Host object

    A host object has the following attributes:
    hostname, ip address, login user, password, system name, node name
    """

    def __init__(self, hostname, ip, passwd, system, node, login_user):
        self.hostname = hostname
        self.ip = ip
        self.login_user = login_user
        self.password = passwd
        self.system = system
        self.node = node

    def __str__(self):
        # The literal 'password' is printed in place of the real password.
        return \
            "{0:s}\t=>\t{1:<15s}\t{2:<20s}\t{3:<20s}\t{4:<7s}\t{5:<7s}".format(
                self.hostname, self.ip, self.login_user, 'password',
                self.system, self.node)
I could store the attributes in a CSV file and, every time the script boots, read and parse the CSV and initialize a Host object from every record. The problem I faced with this approach was that if somebody misconfigured the CSV file, the error could keep the script from running properly on the next boot. Using petitdb, I can easily separate the parsing of the configuration file from the actual processing of the main script. That is, I build a function whose only purpose is to read and parse the CSV file and store the objects in a shelf using petitdb; the main script itself then only has to read the configurations from petitdb, not from a CSV file. This way I can ensure that when the script runs, it only reads configurations that are already validated and ready to use. On the other hand, if the function that reads and parses the CSV file (the configuration file) raises any error, the user knows right away that there is something wrong with the CSV.
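The separation described above could look roughly like the sketch below. It uses the standard-library csv and shelve modules as a stand-in for petitdb, and every name in it (parse_hosts, store_hosts, load_hosts, the column names, the config_demo file) is an assumption made for illustration, not part of the original script.

```python
import csv
import io
import ipaddress
import os
import shelve
import tempfile

# Assumed column names, mirroring the Host attributes.
REQUIRED = ("hostname", "ip", "login_user", "passwd", "system", "node")


def parse_hosts(csv_file):
    """Parse and validate host rows; raise ValueError on the first bad row."""
    hosts = {}
    # start=2 because line 1 of the CSV is the header row.
    for lineno, row in enumerate(csv.DictReader(csv_file), start=2):
        missing = [name for name in REQUIRED if not row.get(name)]
        if missing:
            raise ValueError("line %d: missing %s" % (lineno, ", ".join(missing)))
        try:
            ipaddress.ip_address(row["ip"])
        except ValueError:
            raise ValueError("line %d: bad ip %r" % (lineno, row["ip"]))
        hosts[row["hostname"]] = {name: row[name] for name in REQUIRED}
    return hosts


def store_hosts(hosts, db_path):
    """Configuration step: write the validated hosts into a shelf 'table'."""
    with shelve.open(db_path) as shelf:
        shelf["HostsConfig"] = hosts


def load_hosts(db_path):
    """Main-script step: read only data that already passed validation."""
    with shelve.open(db_path) as shelf:
        return shelf["HostsConfig"]


good_csv = io.StringIO(
    "hostname,ip,login_user,passwd,system,node\n"
    "Hostname1,1.1.1.1,root,password1,system1,node1\n"
)
bad_csv = io.StringIO(
    "hostname,ip,login_user,passwd,system,node\n"
    "Hostname2,not-an-ip,root,password2,system2,node2\n"
)

db_path = os.path.join(tempfile.mkdtemp(), "config_demo")
store_hosts(parse_hosts(good_csv), db_path)
hosts = load_hosts(db_path)

# A misconfigured CSV fails in the parsing step, before anything is stored.
try:
    parse_hosts(bad_csv)
    error = None
except ValueError as exc:
    error = str(exc)
```

With this split, a broken CSV is reported when the configuration is loaded, and the shelf keeps the last good configuration for the main script.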
As petitdb also provides some convenient methods to update objects, it was quite useful for counting logs as well. For example, to count records based on the log type and datetime, it’s useful to store the counts in a dict while reading the log records one by one.
The data structure could look like this:
# To count errors in the syslog during a certain period of time.
counter['sys_log_errors'][datetime] = 1
Instead of directly manipulating dicts like this, I used petitdb, as it makes incremental operations very convenient with the db.add() method. Example:
# Using dicts, this is how we would increment:
counter['sys_log_errors']['datetime_string'] += 1
# But if "counter" is a shelf, this would not work: the nested dict
# has to be retrieved, updated, and stored back
table = counter['sys_log_errors']
table['datetime_string'] = table['datetime_string'] + 1
counter['sys_log_errors'] = table
# Using petitdb, incrementing is as simple as:
db.add('sys_log_errors', 'datetime_string', 1)
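To illustrate why a plain `+= 1` does not persist on a shelf, and what an add-style helper has to do instead, here is a minimal sketch using only the standard-library shelve module. The add function below is a hypothetical stand-in written for this example; it is not petitdb's actual implementation, and the counter_demo file name is an assumption.

```python
import os
import shelve
import tempfile


def add(shelf, table, key, amount):
    """Hypothetical helper: increment shelf[table][key] and persist the result."""
    records = shelf.get(table, {})               # fetch a copy of the nested dict
    records[key] = records.get(key, 0) + amount  # missing keys start at 0
    shelf[table] = records                       # reassign so the shelf stores it


path = os.path.join(tempfile.mkdtemp(), "counter_demo")

with shelve.open(path) as counter:
    counter["sys_log_errors"] = {"datetime_string": 1}

    # The shelf was opened without writeback=True, so this in-place update
    # only touches a temporary copy of the nested dict and is lost.
    counter["sys_log_errors"]["datetime_string"] += 1
    after_in_place = counter["sys_log_errors"]["datetime_string"]

    # The helper reads, updates, and writes the table back.
    add(counter, "sys_log_errors", "datetime_string", 1)
    after_add = counter["sys_log_errors"]["datetime_string"]

print(after_in_place, after_add)
```

Opening the shelf with writeback=True is another option, at the cost of caching every accessed entry in memory until the shelf is synced or closed.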
Using petitdb
Installation
The tool is just one Python file that you can store and import locally inside your script.
You can download it from GitHub:
$ git clone https://github.com/ebsarr/petitdb.git
Then copy petitdb.py to a place where your script can import it.
Features
petitdb provides two classes:
SmallDB: provides an easy interface to shelves. You can store and retrieve any object in dicts.
MemDB: a subclass of SmallDB with no access to shelves; everything you do stays in memory.
Data manipulation on a SmallDB or MemDB instance can be done through the following methods:
Method | Description |
---|---|
insert | insert records |
update | replace the value of an existing record |
add | convenient method to update records, e.g. incrementing a counter |
append | convenient method to update records |
remove | remove records |
create_table | create tables (dicts) |
remove_table | remove tables (dicts) |
And the following methods to retrieve data:
Method | Description |
---|---|
select | retrieve a single record |
tables | retrieve all tables from the object |
keys | retrieve all keys from a table |
Storing data
The example is illustrated in IPython. First declare the Host object.
In [1]: class Host:
   ...:     def __init__(self, hostname, ip, passwd, system, node, login_user):
   ...:         self.hostname = hostname
   ...:         self.ip = ip
   ...:         self.login_user = login_user
   ...:         self.password = passwd
   ...:         self.system = system
   ...:         self.node = node
   ...:
In [2]: def __str__(self):
   ...:     return \
   ...:         "{0:s}\t=>\t{1:<15s}\t{2:<20s}\t{3:<20s}\t{4:<7s}\t{5:<7s}".format(
   ...:             self.hostname, self.ip, self.login_user, 'password',
   ...:             self.system, self.node)
   ...:
Import SmallDB and create a db object:
In [3]: from petitdb import SmallDB
In [4]: db = SmallDB('config.db')
Now we can create a table and store a Host object in it:
In [5]: db.create_table('HostsConfig')
In [6]: db.insert('HostsConfig', 'Hostname1', Host('Hostname1', '1.1.1.1', 'password1', 'system1', 'node1', 'root'))
You can see that the object has been stored by calling db.print_db():
In [8]: db.print_db()
***************
* HostsConfig *
***************
key data
-----------------------------------------------------------------
Hostname1 <__main__.Host instance at 0x105945248>
To persist the data on disk (in the shelf), we must call db.save(), and then db.close() once we are done:
In [10]: db.save()
In [11]: db.close()
After closing, trying to save data raises an error:
In [12]: db.save()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-0883b2e3c451> in <module>()
----> 1 db.save()
...
ValueError: invalid operation on closed shelf
I think I could improve the error handling here by watching the close status…
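One possible shape for that improvement is sketched below. It wraps the standard-library shelve module rather than touching petitdb's real code, and the GuardedDB class, its closed attribute, and the error message are all hypothetical names chosen for this example.

```python
import os
import shelve
import tempfile


class GuardedDB:
    """Hypothetical wrapper that remembers whether it has been closed."""

    def __init__(self, path):
        self._shelf = shelve.open(path)
        self.closed = False

    def save(self):
        # Check the close status first, so the caller gets a clear message
        # instead of the low-level "invalid operation on closed shelf".
        if self.closed:
            raise RuntimeError("cannot save: the database is already closed")
        self._shelf.sync()

    def close(self):
        if not self.closed:  # closing twice is harmless
            self._shelf.close()
            self.closed = True


db = GuardedDB(os.path.join(tempfile.mkdtemp(), "guarded_demo"))
db.save()
db.close()

try:
    db.save()
    message = None
except RuntimeError as exc:
    message = str(exc)

print(message)
```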
If you examine your filesystem, you can see that a config.db file was created:
➜ petitdb git:(master) ✗ ls -l config.db
-rw-r--r-- 1 kemal staff 16384 Mar 26 01:23 config.db
➜ petitdb git:(master) ✗ file config.db
config.db: Berkeley DB 1.85 (Hash, version 2, native byte-order)
Retrieving data
If we go back and initialize a db object the same way we did before, the data will be read from config.db. The contents can be accessed easily with the methods provided by SmallDB.
You can get a list of the tables:
In [4]: db = SmallDB('config.db')
In [9]: db.tables()
Out[9]: ['HostsConfig']
And also see the keys stored inside a table:
In [10]: db.keys('HostsConfig')
Out[10]: ['Hostname1']
And get the value with db.select():
In [11]: h = db.select('HostsConfig', 'Hostname1')
In [14]: h.ip
Out[14]: '1.1.1.1'
In [16]: h.hostname
Out[16]: 'Hostname1'
I’ve mainly used this library to store complex configurations for use in some main scripts. It’s a very lightweight solution to easily store and manage a small amount of data when writing Python scripts.