class Tabel

class tabel.Tabel(datastruct=None, columns=None, copy=True)

Tabel datastructure

Data table with rows and columns; rows are numbered and columns are named. Data is stored by column (column store): each column has a single fixed datatype, and datatypes can vary from column to column.

Parameters:
  • datastruct (object) – list, tuple, ndarray or dict of lists, tuples, ndarrays or elements; or a pandas.DataFrame. List of columns of data. See tabel.T for a convenience function to transpose a list of records.
  • columns (list of strings) – Column names; ignored when keys are part of the datastruct (dict and pandas.DataFrame). If omitted, names are generated automatically as the string of each column number.
  • copy (boolean) – Whether to make a copy of the data or to reference the current memory location (when possible). Default: True.

Notes

  1. It is possible to create an empty Tabel instance and later add data using the tabel.Tabel.append and/or tabel.Tabel.__setitem__ methods.
  2. It is possible to add or manipulate data directly through the instance attributes tabel.Tabel.columns and tabel.Tabel.data. Use tabel.Tabel.valid to check whether the manipulated structure is still valid.
  3. If one or more (but not all) of the columns contain a single element, that element is repeated to match the length of the other columns.

Examples

To initialize a Tabel, call the constructor with the data in column lists:

>>> from tabel import Tabel
>>> Tabel( [ ["John", "Joe", "Jane"],
...          [1.82, 1.65, 2.15],
...          [False, False, True] ],
...       columns = ["Name", "Height", "Married"])
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']
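The repetition of single-element columns described in note 3 can be pictured with plain Python lists. This is only a sketch of the rule as stated; `broadcast_columns` is a hypothetical helper and not part of tabel, and it assumes that anything that is not a list or tuple counts as a single element.

```python
def broadcast_columns(columns):
    """Repeat single-element columns to the common length (hypothetical helper)."""
    # Treat anything that is not a list or tuple as a one-element column.
    as_lists = [list(c) if isinstance(c, (list, tuple)) else [c] for c in columns]
    n_rows = max(len(c) for c in as_lists)
    result = []
    for col in as_lists:
        if len(col) == 1:
            col = col * n_rows  # repeat the single element along all rows
        elif len(col) != n_rows:
            raise ValueError("columns have different lengths")
        result.append(col)
    return result


columns = broadcast_columns([["John", "Joe", "Jane"], 1.80, [False]])
print(columns)
```

Here the scalar height and the one-element list are both stretched to three rows, while the name column is left as it is.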

getter

Tabel.__getitem__(key)

Indexing and slicing parts of a Tabel.

Slicing and indexing mostly follows Numpy array and Python list conventions.

Arguments:

key (r, c):
r can be a single integer, a boolean array, an integer iterable or a slice object. c can be a single integer or string, a boolean array, an integer or string iterable or a slice object.
key (int or string) :

When only a single int is supplied, it is considered to point to a whole single row.

When only a single string is supplied, it is considered to point to a whole single column.

Returns:

Depending on key, four different types can be returned.

element ():
Returned if both the row place and the column place are a single integer (or a string for the column place), addressing a single element of the Tabel. The element can be of any datatype supported by numpy.ndarray.
column (ndarray):
Returned if the column place is a single string or integer, addressing a single column, and the row place is either absent or not an integer.
row (tuple) :
Returned if the row place is a single integer, addressing a single row, and the column place is either absent or not a single integer/string.
Tabel (Tabel) :
Returned if a tuple key (r, c) is provided with anything other than an integer for the row place and anything other than a single integer/string for the column place.

Notes

Tabel objects returned from slicing reference the data of the original Tabel object, unless the rows were indexed with a boolean list/array or the returned type is not a Tabel or np.ndarray object. Changes made to such a slice are reflected in the original Tabel. Appending or joining Tabels, or adding/renaming columns, is never reflected in the original Tabel object. Use the copy function to make a full copy of the object.

Raises: KeyError – When a key references an invalid or nonexistent part of the data.

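Examples

>>> tbl = Tabel( [ ["John", "Joe", "Jane"], [1.82, 1.65, 2.15],
...              [False, False, True] ], columns = ["Name", "Height", "Married"])
>>> tbl[:, 1:3]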
   Height |   Married
----------+-----------
     1.82 |         0
     1.65 |         0
     2.15 |         1
3 rows ['<f8', '|b1']
>>> tbl[0, 0]
'John'
>>> tbl["Name"]
array(['John', 'Joe', 'Jane'], dtype='<U4')
>>> tbl[0]
('John', 1.82, False)
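The relation between the key and the returned type can be summarised as a small decision function. This is a sketch of the rules listed under Returns, not tabel's implementation; `result_kind` is a hypothetical name, and key forms that the text does not describe (for example a bare slice) are rejected here as an assumption.

```python
def result_kind(key):
    """Name the kind of object the documented rules assign to `key` (hypothetical helper)."""

    def is_single(part, allow_str):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(part, bool):
            return False
        return isinstance(part, int) or (allow_str and isinstance(part, str))

    if not isinstance(key, tuple):
        if is_single(key, allow_str=False):
            return "row"      # a single int points to a whole row
        if isinstance(key, str):
            return "column"   # a single string points to a whole column
        raise TypeError("key form not covered by this sketch")

    r, c = key
    single_row = is_single(r, allow_str=False)
    single_col = is_single(c, allow_str=True)
    if single_row and single_col:
        return "element"
    if single_col:
        return "column"
    if single_row:
        return "row"
    return "Tabel"


for key in [(slice(None), slice(1, 3)), (0, 0), "Name", 0]:
    print(key, "->", result_kind(key))
```

The four keys in the loop correspond to the four doctest examples above and yield Tabel, element, column and row respectively.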

setter

Tabel.__setitem__(key, value)

Setting a slice of a Tabel.

Setting slices, like getting them, mostly follows NumPy conventions. Specifically, the rules for the key are the same as for tabel.Tabel.__getitem__, with the same relation between the key and the expected type of the value. In addition, this method can also be used to add new columns.

Parameters:
  • key (r, c) –

    r can be a single integer, a boolean array, an integer iterable or a slice object.

    c can be a single integer or string, a boolean array, an integer or string iterable or a slice object.

    To address a single element in the Tabel object, the key should be a tuple (r, c), with r a single integer addressing the row and c a single integer or string addressing the column of the element to be changed.

  • key (int or string) –

    When only a single int is supplied, it is considered to point to a whole single row.

    When only a single string is supplied, it is considered to point to a whole single column.

  • value (object) –

    The type the value needs to have depends on the key provided.

    element:
    A single element of the same type as the destination column, or of a type convertible to it. See tabel.Tabel.dtype for the types of the columns.
    column :
    An array or list of elements, each of the same type as the destination column or of a type convertible to it. If a new column is targeted, a single element can be provided, in which case it is replicated along all rows.
    row :
    A tuple of elements, each of the same type as its destination column or of a type convertible to it. The length of the tuple should match the number of columns addressed.
    Tabel :
    Not currently implemented.
Returns:

Nothing; the Tabel is changed in place.

Notes

When changing a column, two syntaxes give approximately the same result, with one notable difference. Using a slice object “:” for the rows overwrites the elements of the existing column with the new element(s) provided, so the column keeps its datatype. If just the column name is provided, with no indication for the rows, then the whole column is replaced by the column provided, including its datatype.

>>> tbl = Tabel( [ ["John", "Joe", "Jane"], [1.82, 1.65, 2.15],
...              [False, False, True] ], columns = ["Name", "Height", "Married"])
>>> tbl[:, "Name"] = [1, 2, 3]
>>> tbl
   Name |   Height |   Married
--------+----------+-----------
      1 |     1.82 |         0
      2 |     1.65 |         0
      3 |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']
>>> tbl["Name"] = [1, 2, 3]
>>> tbl
   Name |   Height |   Married
--------+----------+-----------
      1 |     1.82 |         0
      2 |     1.65 |         0
      3 |     2.15 |         1
3 rows ['<i8', '<f8', '|b1']

Note how in the first case the type of the Name column stays “<U4”, while in the second case the type of the Name column changes to “<i8”.

delitem

Tabel.__delitem__(key)

Deleting rows or columns from a Tabel.

Deleting rows or columns can be done using the del keyword.

Parameters:

key (int, list of ints, slice or string) –

If the key is a single integer, a list of integers or a slice object, then the specified rows will be removed from the Tabel.

If the key is a single string, then the specified column will be removed from the Tabel.

Returns:

Nothing; the Tabel is changed in place.

Raises:
  • IndexError – When key is an integer or list of integers that references an invalid row.

    Note that no exception is thrown if key is a slice object that refers to one or more invalid rows.

  • ValueError – When key is a string that references an invalid column.

Notes

Because Tabel stores data by columns, this operation requires creating new numpy arrays for all columns in the Tabel.

Examples:

>>> tbl = Tabel( [ ["John", "Joe", "Jane"], [1.82, 1.65, 2.15],
...              [False, False, True] ], columns = ["Name", "Height", "Married"])
>>> del tbl["Name"]
>>> del tbl[0]
>>> tbl
   Height |   Married
----------+-----------
     1.65 |         0
     2.15 |         1
2 rows ['<f8', '|b1']
>>> del tbl[0:2]
>>> tbl
 Height   | Married
----------+-----------
0 rows ['<f8', '|b1']
>>> del tbl['Married']
>>> tbl
 Height
----------
0 rows ['<f8']
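The difference between integer and slice keys noted under Raises resembles the behaviour of plain Python lists. The sketch below is only an analogy using a list; it does not call tabel.

```python
rows = ["John", "Joe", "Jane"]
error = None

# A slice that reaches past the end is clipped silently.
del rows[1:10]
print(rows)

# An out-of-range integer index raises IndexError.
try:
    del rows[5]
except IndexError as exc:
    error = exc
    print("IndexError raised")
```

As in the Tabel example above, where `del tbl[0:2]` is accepted although only two rows remain, the slice removes whatever rows it covers without complaint.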

repr

Tabel.__repr__()

Pretty print using tabulate.

Examples

>>> tbl = Tabel( [ ["John", "Joe", "Jane"], [1.82, 1.65, 2.15],
...          [False, False, True] ], columns = ["Name", "Height", "Married"])
>>> tbl
 Name   |   Height |   Married
--------+----------+-----------
 John   |     1.82 |         0
 Joe    |     1.65 |         0
 Jane   |     2.15 |         1
3 rows ['<U4', '<f8', '|b1']

append

Tabel.append(tbl)

Append new Tabel to the current Tabel.

Append a Tabel or pandas.DataFrame to the end of this Tabel. Each column is appended to each column of the instance invoking the method.

Parameters: tbl (Tabel) – Tabel with the same columns as the current Tabel; the order of the columns does not need to match. Columns do not need to match if the current Tabel has zero length. Besides Tabel objects, pandas.DataFrame objects are also allowed.
Returns: Nothing; the Tabel is changed in place.
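The matching of columns by name rather than by position can be sketched with dicts of lists. `append_columns` is a hypothetical helper, not tabel's code; in particular, adopting the columns of the appended data when the current table has zero length is an assumption about what "do not need to match" means in practice.

```python
def append_columns(current, other):
    """Append the columns of `other` to `current` in place (hypothetical helper)."""
    n_rows = len(next(iter(current.values()))) if current else 0
    if n_rows == 0:
        # Assumption: an empty table simply takes over the new columns.
        current.clear()
        current.update({name: list(col) for name, col in other.items()})
        return
    if set(current) != set(other):
        raise ValueError("column names do not match")
    for name in current:
        current[name].extend(other[name])  # matched by name, not by position


table = {"a": [1, 2], "b": ["x", "y"]}
append_columns(table, {"b": ["z"], "a": [3]})  # different column order
print(table)

empty = {}
append_columns(empty, {"c": [1]})
print(empty)
```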

row_append

Tabel.row_append(row)

Append a row record at the end of the Tabel.

Appends a single row at the end of the Tabel.

Parameters: row (dict, list, tuple) – The row to be appended to the Tabel. If a dict is provided, the keys should match the column names of the Tabel. If a list or tuple is provided, the length and order should match the columns of the Tabel. Columns do not need to match if the current Tabel has zero length.
Returns: Nothing; the Tabel is changed in place.
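The two accepted row forms can be reduced to one ordered tuple before appending. The following is a minimal sketch of that idea; `row_to_tuple` and the exceptions it raises are illustrative assumptions, not tabel's API.

```python
def row_to_tuple(row, column_names):
    """Order a row record to match `column_names` (hypothetical helper)."""
    if isinstance(row, dict):
        if set(row) != set(column_names):
            raise KeyError("row keys do not match the column names")
        return tuple(row[name] for name in column_names)
    if len(row) != len(column_names):
        raise ValueError("row length does not match the number of columns")
    return tuple(row)


names = ["Name", "Height", "Married"]
from_dict = row_to_tuple({"Height": 1.70, "Married": True, "Name": "Jill"}, names)
from_list = row_to_tuple(["Jill", 1.70, True], names)
print(from_dict, from_list)
```

A dict can list its keys in any order, while a list or tuple has to follow the column order.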

join

Tabel.join(tbl_r, key, key_r=None, jointype='inner', suffixes=('_l', '_r'))

Database-style join of two Tabels, with the key column(s) as the keys.

Performs a database-style join of the two tables, the current instance and the provided Tabel ‘tbl_r’, on the columns listed in ‘key’.

Parameters:
  • tbl_r (Tabel) – The right tabel to be joined.
  • key (string or list) – Name of the column(s) to be used as the key(s).
  • key_r (list) – A list of column names of the right tabel matching the key columns of the left tabel. Defaults to the list provided in key.
  • jointype (str) – One of: inner, left, right, outer. If inner, returns the elements common to both tabels. If outer, returns the common elements as well as the elements of the left tabel not in the right tabel and the elements of the right tabel not in the left tabel. If left, returns the common elements and the elements of the left tabel not in the right tabel. If right, returns the common elements and the elements of the right tabel not in the left tabel.
  • suffixes (tuple) – Strings to be added to the left and right tabel column names.
Returns:

The joined Tabel.

Notes

The order and suffixes of the returned Tabel depend on the jointype. For all types, all columns except the key columns are suffixed with the left or the right suffix respectively. The left Tabel columns come first, followed by the right Tabel columns, with the key column placed first among the columns of its Tabel. For the inner and left jointypes the right key column is left out. For the right jointype the left key column is left out. For the outer jointype both key columns are present and suffixed.

Examples

Join a Tabel into the current Tabel matching on column ‘a’:

>>> tbl = Tabel({"a":list(range(4)), "b": ['a', 'b'] *2})
>>> tbl_b = Tabel({"a":list(range(4)), "c": ['d', 'e'] *2})
>>> tbl.join(tbl_b, "a")
   a | b_l   | c_r
-----+-------+-------
   0 | a     | d
   1 | b     | e
   2 | a     | d
   3 | b     | e
4 rows ['<i8', '<U1', '<U1']
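Which rows each jointype keeps can be illustrated on lists of row dicts. This is a sketch of the four jointypes as described under Parameters; `join_rows` is a hypothetical helper limited to a single key, and it does not reproduce tabel's suffixes, column order or fill values (unmatched cells are simply absent).

```python
def join_rows(left, right, key, jointype="inner"):
    """Join two lists of row dicts on `key` (hypothetical helper, single key only)."""
    if jointype not in ("inner", "left", "right", "outer"):
        raise ValueError("unknown jointype: %r" % (jointype,))
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    joined, matched = [], set()
    for lrow in left:
        partners = right_by_key.get(lrow[key], [])
        if partners:
            matched.add(lrow[key])
            for rrow in partners:
                joined.append({**rrow, **lrow})  # common elements
        elif jointype in ("left", "outer"):
            joined.append(dict(lrow))            # left rows without a partner
    if jointype in ("right", "outer"):
        for value, rows in right_by_key.items():
            if value not in matched:
                joined.extend(dict(r) for r in rows)  # right rows without a partner
    return joined


left = [{"a": 0, "b": "x"}, {"a": 1, "b": "y"}]
right = [{"a": 1, "c": "p"}, {"a": 2, "c": "q"}]
for jointype in ("inner", "left", "right", "outer"):
    print(jointype, join_rows(left, right, "a", jointype))
```

Only the row with a = 1 is common to both inputs, so it is the sole result of the inner join; the other jointypes add the unmatched rows of one or both sides.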

group_by

Tabel.group_by(key, aggregate_fie_col=None)

Groups and aggregates Tabel.

Parameters:
  • key (str or list) – name or list of names of the columns to be grouped by.
  • aggregate_fie_col (list) – list of tuples (function, column), where function is the function to be applied to aggregate and column is the string name of the column. The function should take a 1D array as input; the returned value is treated as a single element. If omitted, only the grouped columns of key are returned.
Returns:

Tabel object with requested columns

Examples

Grouping by ‘b’ and then by ‘a’, aggregating by taking the sum of the ‘a’ elements and the first ‘c’ element of each group:

>>> import numpy as np
>>> tbl = Tabel({'a':[10, 20, 30, 40]*3, 'b':["100", "200"]*6, 'c':[100, 200]*6})
>>> from tabel import first
>>> tbl.group_by(['b', 'a'], [ (np.sum, 'a'), (first, 'c')])
   b |   a |   a_sum |   c_first
-----+-----+---------+-----------
 100 |  10 |      30 |       100
 200 |  20 |      60 |       200
 100 |  30 |      90 |       100
 200 |  40 |     120 |       200
4 rows ['<U3', '<i8', '<i8', '<i8']
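The grouping and aggregation step can be mimicked with plain Python on a dict of columns. This is a sketch under assumptions: `group_columns` and this `first` are hypothetical stand-ins, and the naming of the aggregated columns (column name, underscore, function name) is inferred from the output shown above rather than from tabel's source.

```python
def first(values):
    """Return the first element of a group (stand-in for tabel's `first`)."""
    return values[0]


def group_columns(columns, key, aggregate_fie_col=None):
    """Group a dict of equal-length columns (hypothetical helper, not tabel's code)."""
    keys = [key] if isinstance(key, str) else list(key)
    n_rows = len(columns[keys[0]])
    groups = {}  # group key tuple -> row numbers, in first-seen order
    for i in range(n_rows):
        groups.setdefault(tuple(columns[k][i] for k in keys), []).append(i)

    result = {k: [group[j] for group in groups] for j, k in enumerate(keys)}
    for func, col in aggregate_fie_col or []:
        name = "%s_%s" % (col, func.__name__)
        result[name] = [func([columns[col][i] for i in rows])
                        for rows in groups.values()]
    return result


data = {"a": [10, 20, 30, 40] * 3, "b": ["100", "200"] * 6, "c": [100, 200] * 6}
grouped = group_columns(data, ["b", "a"], [(sum, "a"), (first, "c")])
print(grouped)
```

Each function receives all values of its column that belong to one group and returns a single value, which becomes one cell of the result.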

sort

Tabel.sort(columns)

Sort the Tabel.

Sorts the Tabel in place according to the columns provided. Rows always stay together; only the order of the rows is affected.

Parameters: columns (string or list) – Column name or column names to sort by, listed in order.
Returns: Nothing; the Tabel is sorted in place.

Examples

>>> tbl = Tabel({'a':['b', 'g', 'd'], 'b':list(range(3))})
>>> tbl.sort('a')
>>> tbl
 a   |   b
-----+-----
 b   |   0
 d   |   2
 g   |   1
3 rows ['<U1', '<i8']
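That rows stay together while only their order changes can be sketched by computing one row order and applying it to every column. `sort_columns` is a hypothetical helper working on a dict of lists, not tabel's implementation.

```python
def sort_columns(columns, by):
    """Sort a dict of equal-length columns in place (hypothetical helper)."""
    names = [by] if isinstance(by, str) else list(by)
    n_rows = len(columns[names[0]])
    # One shared row order, determined by the sort columns in the order given.
    order = sorted(range(n_rows), key=lambda i: tuple(columns[n][i] for n in names))
    for col in columns.values():
        col[:] = [col[i] for i in order]  # apply the same order to every column


table = {"a": ["b", "g", "d"], "b": [0, 1, 2]}
sort_columns(table, "a")
print(table)
```

The result matches the Tabel example above: the ‘b’ values travel with their ‘a’ values.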

astype

Tabel.astype(dtypes)

Returns a type-converted tabel.

Converts the tabel according to the provided list of dtypes and returns a new Tabel instance.

Parameters: dtypes (list) – List of valid numpy dtypes in the order of the columns. The list should have the same length as the number of columns present (see Tabel.shape). See Tabel.dtype for the current types of the Tabel.
Returns: Tabel object with the columns converted to the new dtypes.

Examples:
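The idea of one target type per column, with a new object returned, can be sketched without NumPy. In this sketch plain Python types stand in for NumPy dtypes and a dict of lists stands in for a Tabel; `convert_columns` is a hypothetical helper.

```python
def convert_columns(columns, types):
    """Return a new dict of columns converted element-wise (hypothetical helper)."""
    if len(types) != len(columns):
        raise ValueError("need exactly one type per column")
    return {name: [to_type(value) for value in col]
            for (name, col), to_type in zip(columns.items(), types)}


original = {"Name": [1, 2, 3], "Height": ["1.82", "1.65", "2.15"], "Married": [0, 0, 1]}
converted = convert_columns(original, [str, float, bool])
print(converted)
print(original)  # the input is left unchanged
```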

save

Tabel.save(filename, fmt='auto', header=True)

Save to file

Saves the Tabel data including a header with the column names to a file of the specified name in the current directory or the directory specified.

Parameters:
  • filename (str) – filename, should include path
  • fmt (str) –

    formatting, valid values are: ‘auto’, ‘csv’, ‘npz’, ‘gz’

    auto :
    Determine the filetype from the file extension.
    csv :
    Write to a csv file using Python's csv module.
    gz :
    Write to csv using Python's csv module and compress using the standard gzip module.
    npz :
    Write to compressed numpy native binary format.
  • header (bool) – whether to write a header line with the column names, only used for csv and gz
Returns:

Nothing.
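The ‘auto’ and ‘gz’ options can be pictured with the standard library alone. This is a sketch under assumptions: `guess_format` and `write_csv_gz` are hypothetical helpers, and how tabel itself maps extensions to formats or lays out the file is not taken from its source.

```python
import csv
import gzip
import io
import os


def guess_format(filename):
    """Map a file extension to one of the documented fmt values (hypothetical helper)."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext in ("csv", "gz", "npz"):
        return ext
    raise ValueError("cannot determine format from %r" % (filename,))


def write_csv_gz(columns, fileobj, header=True):
    """Write a dict of columns as gzip-compressed csv (hypothetical helper)."""
    with gzip.open(fileobj, "wt", newline="") as handle:
        writer = csv.writer(handle)
        if header:
            writer.writerow(columns.keys())       # header line with column names
        writer.writerows(zip(*columns.values()))  # transpose columns into rows


buffer = io.BytesIO()
write_csv_gz({"Name": ["John", "Joe"], "Height": [1.82, 1.65]}, buffer)
text = gzip.decompress(buffer.getvalue()).decode()
print(guess_format("data.csv.gz"))
print(text)
```

Because the data is held by column, the rows are produced by transposing the columns with `zip` just before writing.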

properties

dict

Tabel.dict

Dump all data as a dict of columns.

Keys are the column names and values are the column numpy.ndarray objects. Useful when transferring to a pandas DataFrame.

shape

Tabel.shape

Tabel shape.

Returns: tuple (r, c), with r the number of rows and c the number of columns.

len

Tabel.__len__()

dtype

Tabel.dtype

List of dtypes of the data columns.

valid

Tabel.valid

Check whether the current datastructure is valid.

Returns: (bool) True if the Tabel internal structure is valid.

Notes

This currently checks that all columns have the same length and that the number of columns equals the number of column names.
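The two conditions in the note above can be written out directly. `is_valid` is a hypothetical stand-in that takes a list of column names and a list of columns; it is a sketch of the described check, not tabel's code.

```python
def is_valid(column_names, data):
    """Check the two documented conditions (hypothetical helper)."""
    if len(column_names) != len(data):
        return False  # one name per column is required
    # All columns must share a single length (an empty table is fine).
    return len({len(col) for col in data}) <= 1


print(is_valid(["a", "b"], [[1, 2, 3], [4, 5, 6]]))
print(is_valid(["a", "b"], [[1, 2, 3], [4, 5]]))
print(is_valid(["a"], [[1, 2], [3, 4]]))
```

The first call passes both conditions, the second fails on column length and the third on the number of names.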

class attributes

Tabel.repr_layout = 'presto'

The layout used with tabulate in the __repr__() method.

Type: string

Tabel.max_repr_rows = 20

Maximum number of rows to show when __repr__() is invoked.

Type: int

Tabel.join_fill_value = {'float': nan, 'integer': 999999, 'string': ''}

Fill values to be used when doing outer joins.

Type: dict