Cookbook¶
Initialization¶
API documentation: tabel.Tabel
.
From a list of lists and a separate list of column names:
>>> from tabel import Tabel
>>> tbl = Tabel([ ["John", "Joe", "Jane"],
... [1.82,1.65,2.15],
... [False,False,True]], columns = ["Name", "Height", "Married"])
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
From a dictionary:
>>> data = {'Name' : ["John", "Joe", "Jane"],
... 'Height' : [1.82,1.65,2.15],
... 'Married': [False,False,True]}
>>> tbl = Tabel(data)
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
From numpy arrays:
>>> Tabel([np.array(["John", "Joe", "Jane"]),
... np.array([1.82,1.65,2.15]),
... np.array([False,False,True])])
0 | 1 | 2
------+------+-----
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Or initialize an empty Tabel:
>>> tbl = Tabel()
>>> tbl
0 rows []
>>> len(tbl), tbl.shape
(0, (0, 0))
Slicing¶
API documentation: tabel.Tabel.__getitem__
.
See the API reference for a detailed description: Tabel.__getitem__.
Some common examples:
>>> tbl = Tabel([ ["John", "Joe", "Jane"],
... [1.82,1.65,2.15],
... [False,False,True]], columns = ["Name", "Height", "Married"])
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Slicing works on both rows and columns:
>>> tbl[:,1:3]
Height | Married
----------+-----------
1.82 | 0
1.65 | 0
2.15 | 1
3 rows ['<f8', '|b1']
Indexing:
>>> tbl[[1,2],[0,2]]
Name | Married
--------+-----------
Joe | 0
Jane | 1
2 rows ['<U4', '|b1']
Using named columns:
>>> tbl[:,['Name','Married']]
Name | Married
--------+-----------
John | 0
Joe | 0
Jane | 1
3 rows ['<U4', '|b1']
the “:” can be left out, if you’re addressing columns by their names:
>>> tbl[:, ['Name','Married']]
Name | Married
--------+-----------
John | 0
Joe | 0
Jane | 1
3 rows ['<U4', '|b1']
Indexing using boolean array’s:
>>> index = ~tbl[:,'Married']
>>> tbl[index, :]
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
2 rows ['<U4', '<f8', '|b1']
(The “:” can be omitted for columns as well)
If a single index is given for the row or colum position, the returned datatype is a row (tuple) or column (array) instead of Tabel:
>>> tbl['Married']
array([False, False, True])
>>> tbl[1:2, 'Married']
array([False])
In all other cases the returned datatype is Tabel, including lists of length one:
>>> tbl[:, ['Married']]
Married
-----------
0
0
1
3 rows ['|b1']
Equally so for rows:
>>> tbl[2]
('Jane', 2.15, True)
>>> tbl[2,1:3]
(2.15, True)
Finally, single elements are obtained by individually addressing them:
>>> tbl[0,"Name"]
'John'
Setting¶
API documentation: tabel.Tabel.__setitem__
.
There is an detailed description in the API documentation:
tabel.Tabel.__setitem__
. Generally, one just provides the datatype and
shape that would have come from the equivalent get call:
Set a single element:
>>> tbl[0,"Name"] = "Jos"
>>> tbl
Name | Height | Married
--------+----------+-----------
Jos | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Set (part of) a column:
>>> tbl[0:2,1] = np.array([2,3])
>>> tbl
Name | Height | Married
--------+----------+-----------
Jos | 2 | 0
Joe | 3 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Set (part of) a single row:
>>> tbl[0] = ["John", 1.333, 0]
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.333 | 0
Joe | 3 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Referencing¶
Slices are references, Slices return new table objects, but their data always refers back to the original one as long as that remains in existence. Exceptions are:
- New initialization of Tabel objects copy the data
- Boolean or integer indexing of rows returns a Tabel object with copied data
To show, take a slice from tbl, modify its first columns, check out the original tbl:
>>> tbl = Tabel({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> tbl_b = tbl[:,[1,2]]
>>> tbl_b[:,0] = [1.3,1.4,1.5]
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.3 | 0
Joe | 1.4 | 0
Jane | 1.5 | 1
3 rows ['<U4', '<f8', '|b1']
Column arrays are references too, with the same exceptions as slices. Therefore the standard numpy arithmetic can be used:
>>> tbl = Tabel({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> tbl["Height"] *= 2
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 3.64 | 0
Joe | 3.3 | 0
Jane | 4.3 | 1
3 rows ['<U4', '<f8', '|b1']
Appending¶
Appending Tabel¶
API documentation: tabel.Tabel.append
.
Tabels can be appended with their append method:
>>> tbl = Tabel([ ["John", "Joe", "Jane"],
... [1.82,1.65,2.15],
... [False,False,True]], columns = ["Name", "Height", "Married"])
>>> tblb = Tabel([["Bas"],[2.01],[True]], columns=["Name", "Height", "Married"])
>>> tbl.append(tblb)
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
Bas | 2.01 | 1
4 rows ['<U4', '<f8', '|b1']
or using the “+=” syntax:
>>> tblb = Tabel([["Bas"],[2.01],[True]], columns=["Name", "Height", "Married"])
>>> tbl = Tabel([ ["John", "Joe", "Jane"],
... [1.82,1.65,2.15],
... [False,False,True]], columns = ["Name", "Height", "Married"])
>>> tbl += tblb
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
Bas | 2.01 | 1
4 rows ['<U4', '<f8', '|b1']
Appending row¶
API documentation: tabel.Tabel.row_append
.
You can also append a row (dict, list or tuple) at the end of the Tabel, for example:
>>> tbl.row_append({'Name':"Jack", 'Height':1.82, 'Married':1})
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
Bas | 2.01 | 1
Jack | 1.82 | 1
5 rows ['<U4', '<f8', '<i8']
Appending column¶
API documentation: tabel.Tabel.__setitem__
.
To add a new column to the Tabel, just provide a new column name:
>>> tbl = Tabel({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]}) >>> tbl["New"] = "Foo" >>> tbl["Newer"] = list(range(3)) >>> tbl Name | Height | Married | New | Newer --------+----------+-----------+-------+--------- John | 1.82 | 0 | Foo | 0 Joe | 1.65 | 0 | Foo | 1 Jane | 2.15 | 1 | Foo | 2 3 rows ['<U4', '<f8', '|b1', '<U3', '<i8']
- Notes:
When changing a column two syntaxes give approximately the same result, with, however, a noteable difference. Using a slice object “:” will change all elements of the column with the new element(s) provided. If just the colum name is provided, with no indication for row, than the whole column is replaced with the column provided.
>>> tbl = Tabel( [ ["John", "Joe", "Jane"], ... [1.82,1.65,2.15], ... [False,False,True] ], ... columns = ["Name", "Height", "Married"]) >>> tbl[:, "Name"] = [1, 2, 3] >>> tbl Name | Height | Married --------+----------+----------- 1 | 1.82 | 0 2 | 1.65 | 0 3 | 2.15 | 1 3 rows ['<U4', '<f8', '|b1'] >>> tbl["Name"] = [1, 2, 3] >>> tbl Name | Height | Married --------+----------+----------- 1 | 1.82 | 0 2 | 1.65 | 0 3 | 2.15 | 1 3 rows ['<i8', '<f8', '|b1']Note how in the first case the type of the name column stays “<U8” while seccond case the type of the Name column changes to <i8.
Changing column names¶
API documentation: tabel.Tabel.columns
.
Just manipulate the columns property directly:
>>> data = [ ["John", "Joe", "Jane"],
... [1.82,1.65,2.15],
... [False,False,True]]
>>> tbl = Tabel(data, columns=['Name','Height','Married'])
>>> tbl
Name | Height | Married
--------+----------+-----------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
>>> tbl.columns = ["First Name", "BMI", "Overweght"]
>>> tbl
First Name | BMI | Overweght
--------------+-------+-------------
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Transposing¶
API documentation: tabel.T
.
Data from database connectors often comes in list of records, a convenience function is available to make the transpose:
>>> from tabel import T
>>> data = [['John', 1.82, False], ['Joe', 1.65, False], ['Jane', 2.15, True]]
>>> tbl = Tabel(T(data))
>>> tbl
0 | 1 | 2
------+------+-----
John | 1.82 | 0
Joe | 1.65 | 0
Jane | 2.15 | 1
3 rows ['<U4', '<f8', '|b1']
Group By¶
API documentation: tabel.Tabel.group_by
.
To group by unique elements in a column or unique combinations of elements in columns provide the column(s) as a list as the first argument. The second argument is a list of tuples for the aggregate functions and their columns:
>>> from tabel import first
>>> tbl = Tabel({'a':[10,20,30, 40]*3, 'b':["100","200"]*6, 'c':[100,200]*6})
>>> tbl
a | b | c
-----+-----+-----
10 | 100 | 100
20 | 200 | 200
30 | 100 | 100
40 | 200 | 200
10 | 100 | 100
20 | 200 | 200
30 | 100 | 100
40 | 200 | 200
10 | 100 | 100
20 | 200 | 200
30 | 100 | 100
40 | 200 | 200
12 rows ['<i8', '<U3', '<i8']
>>> tbl.group_by(['b','a'], [(np.sum, 'a'), (first, 'c')])
b | a | a_sum | c_first
-----+-----+---------+-----------
100 | 10 | 30 | 100
200 | 20 | 60 | 200
100 | 30 | 90 | 100
200 | 40 | 120 | 200
4 rows ['<U3', '<i8', '<i8', '<i8']
first is a convenience function, for when aggregation should just take the first element.
Sorting¶
API documentation:
tabel.Tabel.sort
.>>> tbl = Tabel({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]}) >>> tbl.sort("Name") >>> tbl Name | Height | Married --------+----------+----------- Jane | 2.15 | 1 Joe | 1.65 | 0 John | 1.82 | 0 3 rows ['<U4', '<f8', '|b1']
Note that one can use indexing to reorder in any order:
>>> tbl[[2,1,0],[2,1,0]]
Married | Height | Name
-----------+----------+--------
0 | 1.82 | John
0 | 1.65 | Joe
1 | 2.15 | Jane
3 rows ['|b1', '<f8', '<U4']
Joining¶
API documentation: tabel.Tabel.join
.
To join another Tabel, provide the Tabe and the column or columns to use as a key for joining:
>>> tbl = Tabel({"a":list(range(4)), "b": ['a','b'] *2})
>>> tbl_b = Tabel({"a":list(range(4)), "c": ['d','e'] *2})
>>> tbl.join(tbl_b, "a")
a | b_l | c_r
-----+-------+-------
0 | a | d
1 | b | e
2 | a | d
3 | b | e
4 rows ['<i8', '<U1', '<U1']
Saving¶
API documentation: tabel.Tabel.save
.
Data can be saved to disk in various formats:
>>> tbl = Tabel({'Name' : ["John", "Joe", "Jane"], 'Height' : [1.82,1.65,2.15], 'Married': [False,False,True]})
>>> tbl.save("test.csv", fmt="csv")
I recommend the numpy native ‘npz’ format:
>>> tbl.save("test.npz", fmt="npz")
Reading¶
API documentation: tabel.read_tabel
.
Read from disk:
>>> from tabel import read_tabel
>>> t = read_tabel("test.csv", fmt="csv")
>>> t
Name | Height | Married
--------+----------+-----------
John | 1.82 | False
Joe | 1.65 | False
Jane | 2.15 | True
3 rows ['<U4', '<f8', '<U5']