Export
The export feature exports a dataset stored on disk or in memory.
You can use either the CLI or the Python API, as described below.
The export command exports a dataset.
It has two subcommands:
Usage: geist export [OPTIONS] COMMAND [ARGS]...
Export a dataset
Options:
--help Show this message and exit.
Commands:
duckdb Export a SQL dataset
rdflib Export an RDF dataset
geist export duckdb [OPTIONS]
Usage: geist export duckdb [OPTIONS]
Export a SQL dataset
Options:
-d, --dataset TEXT Name of SQL dataset to be exported (default
"kb")
-oroot, --outputroot TEXT Path of the directory to store the exported
table (default: current directory). If the
given path (i.e., --outputfile) is None or a
relative path, it will be ignored.
-ofile, --outputfile TEXT Path of the file to store the exported table
(default: None)
-oformat, --outputformat [csv|json]
Format of the exported table (default csv)
-t, --table TEXT Name of the table to be exported (default
"df")
--help Show this message and exit.
Example: export the df table in the test dataset
By default, the exported table is printed in the terminal:
geist export duckdb --dataset test --table df
geist export rdflib [OPTIONS]
Usage: geist export rdflib [OPTIONS]
Export an RDF dataset
Options:
-d, --dataset TEXT Name of RDF dataset to be exported (default
"kb")
-oroot, --outputroot TEXT Path of the directory to store these
exported triples (default: current
directory). If the given path (i.e.,
--outputfile) is None or a relative path, it
will be ignored.
-ofile, --outputfile TEXT Path of the file to store these exported
triples (default: None)
-oformat, --outputformat [json-ld|n3|nquads|nt|hext|pretty-xml|trig|trix|turtle|longturtle|xml]
Format of the exported triples (default nt)
--help Show this message and exit.
Example: export the test dataset
By default, the exported triples are printed in the terminal:
geist export rdflib --dataset test
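When the CLI is driven from a script, it can help to build the argument list from the options shown in the `--help` output above. The helper below, `build_export_argv`, is hypothetical and not part of geist. It only assembles the list and does not run anything. It passes only the options you set and leaves every default to the CLI.

```python
# Hypothetical helper (not part of geist): build the argv for
# `geist export duckdb|rdflib` from the options in the --help output above.
def build_export_argv(subcommand, dataset="kb", outputroot=None,
                      outputfile=None, outputformat=None, table=None):
    if subcommand not in ("duckdb", "rdflib"):
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    if table is not None and subcommand != "duckdb":
        # --table is only listed for the duckdb subcommand
        raise ValueError("--table is only available for `geist export duckdb`")

    argv = ["geist", "export", subcommand, "--dataset", dataset]
    # Pass only the options that were set; the CLI fills in its own defaults
    for flag, value in (("--outputroot", outputroot),
                        ("--outputfile", outputfile),
                        ("--outputformat", outputformat),
                        ("--table", table)):
        if value is not None:
            argv += [flag, value]
    return argv


print(build_export_argv("duckdb", dataset="test", table="df"))
# ['geist', 'export', 'duckdb', '--dataset', 'test', '--table', 'df']
print(build_export_argv("rdflib", dataset="test", outputformat="turtle"))
# ['geist', 'export', 'rdflib', '--dataset', 'test', '--outputformat', 'turtle']
```

To run the command, pass the list to `subprocess.run(argv, capture_output=True, text=True, check=True)`. Without `--outputfile` the export is printed, so it should end up in the result's `stdout` attribute.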
The export() function exports a dataset.
Parameters of export():
Name | Type | Description | Default |
---|---|---|---|
datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
dataset | string OR DuckPyConnection object OR GeistGraph object | Dataset to export: (1) a string naming a dataset stored on disk, OR (2) a DuckPyConnection or GeistGraph object for a dataset in memory | REQUIRED |
hasoutput | bool | True to write the exported data to a file or print it out | REQUIRED |
config | dict | A dictionary of configuration options for the backend datastore | see below |
Description for the config parameter:
datastore: duckdb
Key | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store the exported table | './' |
outputfile | string | Path of the file to store the exported table | None |
outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
table | string | Name of the table to be exported | 'df' |
Example 1: export all rows of the df table in the test dataset on disk
Suppose a file exists at .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named data and a DuckPyConnection object named conn.
import geist
# Export the df table of the test dataset
(data, conn) = geist.export(datastore='duckdb', dataset='test', hasoutput=False, config={'table': 'df'})
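The on-disk examples in this section name files such as .geistdata/duckdb/test.duckdb and, for rdflib, .geistdata/rdflib/test.pkl. The sketch below checks that a named dataset exists before it is exported. It assumes that layout holds in general, relative to the current working directory. `dataset_path` is a hypothetical helper, not a geist function.

```python
from pathlib import Path

# Assumption: on-disk datasets follow the layout seen in the examples,
# .geistdata/<datastore>/<name><extension>, relative to the working directory.
EXTENSIONS = {"duckdb": ".duckdb", "rdflib": ".pkl"}


def dataset_path(datastore, name, base=".geistdata"):
    """Return the expected file path of an on-disk dataset (hypothetical helper)."""
    if datastore not in EXTENSIONS:
        raise ValueError(f"unknown datastore: {datastore!r}")
    return Path(base) / datastore / f"{name}{EXTENSIONS[datastore]}"


path = dataset_path("duckdb", "test")
print(path.as_posix())  # .geistdata/duckdb/test.duckdb
if not path.exists():
    print(f"{path.as_posix()} not found; create the dataset before exporting it")
```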
Example 2: export all rows of the df table in the test dataset in memory
Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named data and the same DuckPyConnection object named conn.
import geist
# Export the df table of the test dataset
(data, conn) = geist.export(datastore='duckdb', dataset=conn, hasoutput=False, config={'table': 'df'})
datastore: rdflib
Key | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store these exported triples | './' |
outputfile | string | Path of the file to store these exported triples | None |
outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |
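The two config tables can be turned into a small validation step. The sketch below copies the documented defaults and allowed formats, merges caller overrides into them, and rejects keys or formats that the tables do not list. `make_config` is hypothetical and not part of geist. The examples here pass only the keys they change, which suggests that geist fills in these defaults itself. The same tables also apply to the config of the Connection class method.

```python
# Defaults copied from the duckdb and rdflib config tables
CONFIG_DEFAULTS = {
    "duckdb": {"outputroot": "./", "outputfile": None,
               "outputformat": "csv", "table": "df"},
    "rdflib": {"outputroot": "./", "outputfile": None,
               "outputformat": "nt"},
}
ALLOWED_FORMATS = {
    "duckdb": {"csv", "json"},
    "rdflib": {"json-ld", "n3", "nquads", "nt", "hext", "pretty-xml",
               "trig", "trix", "turtle", "longturtle", "xml"},
}


def make_config(datastore, **overrides):
    """Return the documented defaults updated with overrides (hypothetical helper)."""
    if datastore not in CONFIG_DEFAULTS:
        raise ValueError(f"unknown datastore: {datastore!r}")
    unknown = set(overrides) - set(CONFIG_DEFAULTS[datastore])
    if unknown:
        raise ValueError(f"unknown config keys for {datastore}: {sorted(unknown)}")
    config = {**CONFIG_DEFAULTS[datastore], **overrides}
    if config["outputformat"] not in ALLOWED_FORMATS[datastore]:
        raise ValueError(f"unsupported outputformat for {datastore}: {config['outputformat']!r}")
    return config


print(make_config("duckdb", outputformat="json"))
# {'outputroot': './', 'outputfile': None, 'outputformat': 'json', 'table': 'df'}
```

The result could then be passed as `config=make_config("rdflib", outputformat="turtle")` in the calls below.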
Example 1: export all triples of the test dataset on disk
Suppose a file exists at .geistdata/rdflib/test.pkl. The following code returns a string named data and a GeistGraph object named conn.
import geist
# Export all triples of the test dataset
(data, conn) = geist.export(datastore='rdflib', dataset='test', hasoutput=False)
Example 2: export all triples of the test dataset in memory
Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code returns a string named data and the same GeistGraph object named conn.
import geist
# Export all triples of the test dataset
(data, conn) = geist.export(datastore='rdflib', dataset=conn, hasoutput=False)
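In both rdflib examples, data holds the triples serialized in the default 'nt' (N-Triples) format. That format puts one triple on each line, and blank lines and lines starting with # are allowed. A quick sanity check on the returned string is therefore to count those lines. The sample string below is made-up illustrative data, and `count_ntriples` is a hypothetical helper. The check is only valid for 'nt' output.

```python
def count_ntriples(data):
    """Count triples in an N-Triples string: one triple per non-blank, non-comment line."""
    return sum(
        1
        for line in data.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


# Illustrative N-Triples text, standing in for the `data` string returned above
sample = (
    '<http://example.org/a> <http://example.org/knows> <http://example.org/b> .\n'
    '<http://example.org/b> <http://example.org/name> "Bob" .\n'
)
print(count_ntriples(sample))  # 2
```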
The export method of the Connection class exports a dataset. It is very similar to the export() function. The only difference is that the datastore and dataset parameters do not need to be passed, because they were already specified when the Connection class was initialized.
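The relationship described above is a simple delegation pattern: the object stores datastore and dataset once and forwards them on each call. The sketch below illustrates the idea. It is not geist's implementation. `fake_export` is a hypothetical stand-in for geist.export() that only echoes its inputs, and `SketchConnection` is a hypothetical class. As in the Connection examples below, the method returns only data, while the function returns a (data, conn) tuple.

```python
def fake_export(datastore, dataset, hasoutput, config=None):
    """Stand-in for geist.export(): echoes its inputs and returns (data, conn)."""
    data = f"{datastore}:{dataset} hasoutput={hasoutput} config={config}"
    conn = object()  # placeholder for the connection or graph object
    return data, conn


class SketchConnection:
    """Hypothetical wrapper that remembers datastore and dataset."""

    def __init__(self, datastore, dataset):
        self.datastore = datastore
        self.dataset = dataset

    def export(self, hasoutput, config=None):
        # Forward the stored arguments and keep only the exported data
        data, _ = fake_export(self.datastore, self.dataset, hasoutput, config)
        return data


connection = SketchConnection(datastore="duckdb", dataset="test")
print(connection.export(hasoutput=False, config={"table": "df"}))
# duckdb:test hasoutput=False config={'table': 'df'}
```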
Parameters of the export method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
hasoutput | bool | True to write the exported data to a file or print it out | REQUIRED |
config | dict | A dictionary of configuration options for the backend datastore | see below |
Description for the config parameter:
datastore: duckdb
Key | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store the exported table | './' |
outputfile | string | Path of the file to store the exported table | None |
outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
table | string | Name of the table to be exported | 'df' |
Example 1: export all rows of the df table in the test dataset on disk
Suppose a file exists at .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})
Example 2: export all rows of the df table in the test dataset in memory
Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})
datastore: rdflib
Key | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store these exported triples | './' |
outputfile | string | Path of the file to store these exported triples | None |
outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |
Example 1: export all triples of the test dataset on disk
Suppose a file exists at .geistdata/rdflib/test.pkl. The following code returns a string named data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Export all triples of the test dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)
Example 2: export all triples of the test dataset in memory
Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code returns a string named data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Export all triples of the test dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)