go-tablib: format-agnostic tabular dataset library
Go-Tablib is a format-agnostic tabular dataset library, written in Go. This is a port of the famous Python’s tablib by Kenneth Reitz with some new features.
Export formats supported:
- JSON (Sets + Books)
- YAML (Sets + Books)
- XLSX (Sets + Books)
- XML (Sets + Books)
- TSV (Sets)
- CSV (Sets)
- ASCII + Markdown (Sets)
- MySQL (Sets)
- Postgres (Sets)
Loading formats supported:
- JSON (Sets + Books)
- YAML (Sets + Books)
- XML (Sets)
- CSV (Sets)
- TSV (Sets)
Overview
tablib.Dataset
A Dataset is a table of tabular data. It must have a header row. Datasets can be exported to JSON, YAML, CSV, TSV, and XML. They can be filtered, sorted and validated against constraint on columns.
tablib.Databook
A Databook is a set of Datasets. The most common form of a Databook is an Excel file with multiple spreadsheets. Databooks can be exported to JSON, YAML and XML.
tablib.Exportable
An exportable is a struct that holds a buffer representing the Databook or Dataset after it has been formated to any of the supported export formats.
At this point the Datbook or Dataset cannot be modified anymore, but it can be returned as a string
, a []byte
or written to a io.Writer
or a file.
Usage
Creates a dataset and populate it:
ds := NewDataset([]string{"firstName", "lastName"})
Add new rows:
ds.Append([]interface{}{"John", "Adams"})
ds.AppendValues("George", "Washington")
Add new columns:
ds.AppendColumn("age", []interface{}{90, 67})
ds.AppendColumnValues("sex", "male", "male")
Add a dynamic column, by passing a function which has access to the current row, and must return a value:
func lastNameLen(row []interface{}) interface{} {
return len(row[1].(string))
}
ds.AppendDynamicColumn("lastName length", lastNameLen)
ds.CSV()
// >>
// firstName, lastName, age, sex, lastName length
// John, Adams, 90, male, 5
// George, Washington, 67, male, 10
Delete rows:
ds.DeleteRow(1) // starts at 0
Delete columns:
ds.DeleteColumn("sex")
Get a row or multiple rows:
row, _ := ds.Row(0)
fmt.Println(row["firstName"]) // George
rows, _ := ds.Rows(0, 1)
fmt.Println(rows[0]["firstName"]) // George
fmt.Println(rows[1]["firstName"]) // Thomas
Slice a Dataset:
newDs, _ := ds.Slice(1, 5) // returns a fresh Dataset with rows [1..5[
Filtering
You can add tags to rows by using a specific Dataset
method. This allows you to filter your Dataset
later. This can be useful to separate rows of data based on arbitrary criteria (e.g. origin) that you don’t want to include in your Dataset
.
ds := NewDataset([]string{"Maker", "Model"})
ds.AppendTagged([]interface{}{"Porsche", "911"}, "fast", "luxury")
ds.AppendTagged([]interface{}{"Skoda", "Octavia"}, "family")
ds.AppendTagged([]interface{}{"Ferrari", "458"}, "fast", "luxury")
ds.AppendValues("Citroen", "Picasso")
ds.AppendValues("Bentley", "Continental")
ds.Tag(4, "luxury") // Bentley
ds.AppendValuesTagged("Aston Martin", "DB9", /* these are tags */ "fast", "luxury")
Filtering the Dataset
is possible by calling Filter(column)
:
luxuryCars, err := ds.Filter("luxury").CSV()
fmt.Println(luxuryCars)
// >>>
// Maker,Model
// Porsche,911
// Ferrari,458
// Bentley,Continental
// Aston Martin,DB9
fastCars, err := ds.Filter("fast").CSV()
fmt.Println(fastCars)
// >>>
// Maker,Model
// Porsche,911
// Ferrari,458
// Aston Martin,DB9
Tags at a specific row can be retrieved by calling Dataset.Tags(index int)
Sorting
Datasets can be sorted by a specific column.
ds := NewDataset([]string{"Maker", "Model", "Year"})
ds.AppendValues("Porsche", "991", 2012)
ds.AppendValues("Skoda", "Octavia", 2011)
ds.AppendValues("Ferrari", "458", 2009)
ds.AppendValues("Citroen", "Picasso II", 2013)
ds.AppendValues("Bentley", "Continental GT", 2003)
sorted, err := ds.Sort("Year").CSV()
fmt.Println(sorted)
// >>
// Maker, Model, Year
// Bentley, Continental GT, 2003
// Ferrari, 458, 2009
// Skoda, Octavia, 2011
// Porsche, 991, 2012
// Citroen, Picasso II, 2013
Constraining
Datasets can have columns constrained by functions and further checked if valid.
ds := NewDataset([]string{"Maker", "Model", "Year"})
ds.AppendValues("Porsche", "991", 2012)
ds.AppendValues("Skoda", "Octavia", 2011)
ds.AppendValues("Ferrari", "458", 2009)
ds.AppendValues("Citroen", "Picasso II", 2013)
ds.AppendValues("Bentley", "Continental GT", 2003)
ds.ConstrainColumn("Year", func(val interface{}) bool { return val.(int) > 2008 })
ds.ValidFailFast() // false
if !ds.Valid() { // validate the whole dataset, errors are retrieved in Dataset.ValidationErrors
ds.ValidationErrors[0] // Row: 4, Column: 2
}
A Dataset with constrained columns can be filtered to keep only the rows satisfying the constraints.
valid := ds.ValidSubset().Tabular("simple") // Cars after 2008
fmt.Println(valid)
Will output:
------------ --------------- ---------
Maker Model Year
------------ --------------- ---------
Porsche 991 2012
Skoda Octavia 2011
Ferrari 458 2009
Citroen Picasso II 2013
------------ --------------- ---------
invalid := ds.InvalidSubset().Tabular("simple") // Cars before 2008
fmt.Println(invalid)
Will output:
------------ ------------------- ---------
Maker Model Year
------------ ------------------- ---------
Bentley Continental GT 2003
------------ ------------------- ---------
Loading
JSON
ds, _ := LoadJSON([]byte(`[
{"age":90,"firstName":"John","lastName":"Adams"},
{"age":67,"firstName":"George","lastName":"Washington"},
{"age":83,"firstName":"Henry","lastName":"Ford"}
]`))
YAML
ds, _ := LoadYAML([]byte(`- age: 90
firstName: John
lastName: Adams
- age: 67
firstName: George
lastName: Washington
- age: 83
firstName: Henry
lastName: Ford`))
Exports
Exportable
Any of the following export format returns an *Exportable
which means you can use:
- Bytes()
to get the content as a byte array
- String()
to get the content as a string
- WriteTo(io.Writer)
to write the content to an io.Writer
- WriteFile(filename string, perm os.FileMode)
to write to a file
It avoids unnecessary conversion between string
and []byte
to output/write/whatever.
Thanks to @figlief for the proposition.
JSON
json, _ := ds.JSON()
fmt.Println(json)
Will output:
[{"age":90,"firstName":"John","lastName":"Adams"},{"age":67,"firstName":"George","lastName":"Washington"},{"age":83,"firstName":"Henry","lastName":"Ford"}]
XML
xml, _ := ds.XML()
fmt.Println(xml)
Will ouput:
<dataset>
<row>
<age>90</age>
<firstName>John</firstName>
<lastName>Adams</lastName>
</row> <row>
<age>67</age>
<firstName>George</firstName>
<lastName>Washington</lastName>
</row> <row>
<age>83</age>
<firstName>Henry</firstName>
<lastName>Ford</lastName>
</row>
</dataset>
CSV
csv, _ := ds.CSV()
fmt.Println(csv)
Will ouput:
firstName,lastName,age
John,Adams,90
George,Washington,67
Henry,Ford,83
TSV
tsv, _ := ds.TSV()
fmt.Println(tsv)
Will ouput:
firstName lastName age
John Adams 90
George Washington 67
Henry Ford 83
YAML
yaml, _ := ds.YAML()
fmt.Println(yaml)
Will ouput:
- age: 90
firstName: John
lastName: Adams
- age: 67
firstName: George
lastName: Washington
- age: 83
firstName: Henry
lastName: Ford
HTML
html := ds.HTML()
fmt.Println(html)
Will output:
<table class="table table-striped">
<thead>
<tr>
<th>firstName</th>
<th>lastName</th>
<th>age</th>
</tr>
</thead>
<tbody>
<tr>
<td>George</td>
<td>Washington</td>
<td>90</td>
</tr>
<tr>
<td>Henry</td>
<td>Ford</td>
<td>67</td>
</tr>
<tr>
<td>Foo</td>
<td>Bar</td>
<td>83</td>
</tr>
</tbody>
</table>
XLSX
xlsx, _ := ds.XLSX()
fmt.Println(xlsx)
// >>>
// binary content
xlsx.WriteTo(...)
ASCII
Grid format
ascii := ds.Tabular("grid" /* tablib.TabularGrid */)
fmt.Println(ascii)
Will output:
+--------------+---------------+--------+
| firstName | lastName | age |
+==============+===============+========+
| George | Washington | 90 |
+--------------+---------------+--------+
| Henry | Ford | 67 |
+--------------+---------------+--------+
| Foo | Bar | 83 |
+--------------+---------------+--------+
Simple format
ascii := ds.Tabular("simple" /* tablib.TabularSimple */)
fmt.Println(ascii)
Will output:
-------------- --------------- --------
firstName lastName age
-------------- --------------- --------
George Washington 90
Henry Ford 67
Foo Bar 83
-------------- --------------- --------
Condensed format
ascii := ds.Tabular("condensed" /* tablib.TabularCondensed */)
fmt.Println(ascii)
Similar to simple but with less line feed:
-------------- --------------- --------
firstName lastName age
-------------- --------------- --------
George Washington 90
Henry Ford 67
Foo Bar 83
-------------- --------------- --------
Markdown
Markdown tables are similar to the Tabular condensed format, except that they have pipe characters separating columns.
mkd := ds.Markdown() // or
mkd := ds.Tabular("markdown" /* tablib.TabularMarkdown */)
fmt.Println(mkd)
Will output:
| firstName | lastName | gpa |
| -------------- | --------------- | ------- |
| John | Adams | 90 |
| George | Washington | 67 |
| Thomas | Jefferson | 50 |
Which equals to the following when rendered as HTML:
firstName | lastName | gpa |
---|---|---|
John | Adams | 90 |
George | Washington | 67 |
Thomas | Jefferson | 50 |
MySQL
sql := ds.MySQL()
fmt.Println(sql)
Will output:
CREATE TABLE IF NOT EXISTS presidents
(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
firstName VARCHAR(9),
lastName VARCHAR(8),
gpa DOUBLE
);
INSERT INTO presidents VALUES(1, 'Jacques', 'Chirac', 88);
INSERT INTO presidents VALUES(2, 'Nicolas', 'Sarkozy', 98);
INSERT INTO presidents VALUES(3, 'François', 'Hollande', 34);
COMMIT;
Numeric (uint
, int
, float
, …) are stored as DOUBLE
, string
s as VARCHAR
with width set to the length of the longest string in the column, and time.Time
s are stored as TIMESTAMP
.
Postgres
sql := ds.Postgres()
fmt.Println(sql)
Will output:
CREATE TABLE IF NOT EXISTS presidents
(
id SERIAL PRIMARY KEY,
firstName TEXT,
lastName TEXT,
gpa NUMERIC
);
INSERT INTO presidents VALUES(1, 'Jacques', 'Chirac', 88);
INSERT INTO presidents VALUES(2, 'Nicolas', 'Sarkozy', 98);
INSERT INTO presidents VALUES(3, 'François', 'Hollande', 34);
COMMIT;
Numerics (uint
, int
, float
, …) are stored as NUMERIC
, string
s as TEXT
and time.Time
s are stored as TIMESTAMP
.
Databooks
This is an example of how to use Databooks.
db := NewDatabook()
// or loading a JSON content
db, err := LoadDatabookJSON([]byte(`...`))
// or a YAML content
db, err := LoadDatabookYAML([]byte(`...`))
// a dataset of presidents
presidents, _ := LoadJSON([]byte(`[
{"Age":90,"First name":"John","Last name":"Adams"},
{"Age":67,"First name":"George","Last name":"Washington"},
{"Age":83,"First name":"Henry","Last name":"Ford"}
]`))
// a dataset of cars
cars := NewDataset([]string{"Maker", "Model", "Year"})
cars.AppendValues("Porsche", "991", 2012)
cars.AppendValues("Skoda", "Octavia", 2011)
cars.AppendValues("Ferrari", "458", 2009)
cars.AppendValues("Citroen", "Picasso II", 2013)
cars.AppendValues("Bentley", "Continental GT", 2003)
// add the sheets to the Databook
db.AddSheet("Cars", cars.Sort("Year"))
db.AddSheet("Presidents", presidents.SortReverse("Age"))
fmt.Println(db.JSON())
Will output the following JSON representation of the Databook:
[
{
"title": "Cars",
"data": [
{"Maker":"Bentley","Model":"Continental GT","Year":2003},
{"Maker":"Ferrari","Model":"458","Year":2009},
{"Maker":"Skoda","Model":"Octavia","Year":2011},
{"Maker":"Porsche","Model":"991","Year":2012},
{"Maker":"Citroen","Model":"Picasso II","Year":2013}
]
},
{
"title": "Presidents",
"data": [
{"Age":90,"First name":"John","Last name":"Adams"},
{"Age":83,"First name":"Henry","Last name":"Ford"},
{"Age":67,"First name":"George","Last name":"Washington"}
]
}
]
Installation
go get github.com/agrison/go-tablib
For those wanting the v1 version where export methods returned a string
and not an Exportable
:
go get gopkg.in/agrison/go-tablib.v1
TODO
- Loading in more formats
- Support more formats: DBF, XLS, LATEX, …
Contribute
It is a work in progress, so it may exist some bugs and edge cases not covered by the test suite.
But we’re on Github and this is Open Source, pull requests are more than welcomed, come and have some fun :)
Acknowledgement
Thanks to kennethreitz for the first implementation in Python, github.com/bndr/gotabulate
, github.com/clbanning/mxj
, github.com/tealeg/xlsx
, gopkg.in/yaml.v2