Export
The export feature exports a dataset stored on disk or in memory.
You can use the CLI or the Python API as follows:
The export command exports a dataset.
There are three subcommands for export:
Usage: geist export [OPTIONS] COMMAND [ARGS]...
Export a dataset
Options:
--help Show this message and exit.
Commands:
clingo Export an ASP dataset
duckdb Export a SQL dataset
rdflib Export an RDF dataset
geist export clingo [OPTIONS]
Usage: geist export clingo [OPTIONS]
Export an ASP dataset
Options:
-d, --dataset TEXT Name of an ASP dataset to be exported
(default "kb")
-oroot, --outputroot TEXT Path of the directory to store the exported
data (default: current directory). If the
given path (i.e., --outputfile) is None or a
relative path, it will be ignored.
-ofile, --outputfile TEXT Path of the file to store the exported data
(default: None). This file can be reused to
create a dataset by setting
inputformat=json.
-rformat, --returnformat [lp|df|dict]
Format of the returned data in memory
(default lp)
-pred, --predicate TEXT Name of the predicate to be exported
(default "predicate")
--help Show this message and exit.
Example: export the test ASP dataset
By default, the exported data will be printed in the terminal:
geist export clingo --dataset test
geist export duckdb [OPTIONS]
Usage: geist export duckdb [OPTIONS]
Export a SQL dataset
Options:
-d, --dataset TEXT Name of SQL dataset to be exported (default
"kb")
-oroot, --outputroot TEXT Path of the directory to store the exported
table (default: current directory). If the
given path (i.e., --outputfile) is None or a
relative path, it will be ignored.
-ofile, --outputfile TEXT Path of the file to store the exported table
(default: None)
-oformat, --outputformat [csv|json]
Format of the exported table (default csv)
-t, --table TEXT Name of the table to be exported (default
"df")
--help Show this message and exit.
Example: export the df table in the test dataset
By default, the exported table will be printed in the terminal:
geist export duckdb --dataset test --table df
geist export rdflib [OPTIONS]
Usage: geist export rdflib [OPTIONS]
Export an RDF dataset
Options:
-d, --dataset TEXT Name of RDF dataset to be exported (default
"kb")
-oroot, --outputroot TEXT Path of the directory to store these
exported triples (default: current
directory). If the given path (i.e.,
--outputfile) is None or a relative path, it
will be ignored.
-ofile, --outputfile TEXT Path of the file to store these exported
triples (default: None)
-oformat, --outputformat [json-ld|n3|nquads|nt|hext|pretty-xml|trig|trix|turtle|longturtle|xml]
Format of the exported triples (default nt)
--help Show this message and exit.
Example: export the test dataset
By default, the exported triples will be printed in the terminal:
geist export rdflib --dataset test
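The three subcommands share the same shape, so their command lines can be assembled programmatically. The following is a minimal sketch, not part of geist: `build_export_command` is a hypothetical helper that only builds the argument list from the long option names shown in the help text above. It does not run anything.

```python
import shlex

# Subcommands listed in the `geist export` help text.
SUBCOMMANDS = ("clingo", "duckdb", "rdflib")


def build_export_command(subcommand, **options):
    """Build an argv list for `geist export <subcommand>` (hypothetical helper).

    Keyword names are used as long option names, e.g. dataset -> --dataset.
    Options whose value is None are skipped so the CLI default applies.
    """
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    argv = ["geist", "export", subcommand]
    for name, value in options.items():
        if value is not None:
            argv += [f"--{name}", str(value)]
    return argv


cmd = build_export_command("duckdb", dataset="test", table="df")
print(shlex.join(cmd))  # geist export duckdb --dataset test --table df
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the command, assuming the `geist` executable is on the PATH.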
The export function exports a dataset.
Parameter descriptions for export():
| Name | Type | Description | Default |
|---|---|---|---|
| datastore | string | A backend datastore, i.e., 'clingo', 'duckdb', or 'rdflib' | REQUIRED |
| dataset | string OR Control object OR DuckPyConnection object OR GeistGraph object | Dataset to export: (1) a string giving the name of a dataset stored on disk, OR (2) a Control object, a DuckPyConnection object, or a GeistGraph object for a dataset in memory | REQUIRED |
| hasoutput | bool | True to export as a file or print it out | REQUIRED |
| config | dict | A dictionary of configurations for the chosen backend datastore | see below |
Description for the config parameter:
datastore: clingo
| Name | Type | Description | Default |
|---|---|---|---|
| returnformat | string | Format of the returned data, i.e., 'lp', 'df', or 'dict' | 'lp' |
| predicate | string | Name of the predicate to be exported | None |
Example 1: export all facts of the test ASP dataset on disk
Suppose a file exists at .geistdata/clingo/test.pkl. The following code returns data and a Control object named conn.
import geist
# Export all facts of the test ASP dataset
(data, conn) = geist.export(datastore='clingo', dataset='test', hasoutput=False)
Example 2: export all facts of the test ASP dataset in memory
Suppose conn is a Control object pointing to an ASP dataset in memory. The following code returns data and the same Control object named conn.
import geist
# Export all facts of the test ASP dataset
(data, conn) = geist.export(datastore='clingo', dataset=conn, hasoutput=False)
datastore: duckdb
| Name | Type | Description | Default |
|---|---|---|---|
| outputroot | string | Path of the directory to store the exported table | './' |
| outputfile | string | Path of the file to store the exported table | None |
| outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
| table | string | Name of the table to be exported | 'df' |
Example 1: export all rows of the df table in the test dataset on disk
Suppose a file exists at .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named data and a DuckPyConnection object named conn.
import geist
# Export the df table of the test dataset
(data, conn) = geist.export(datastore='duckdb', dataset='test', hasoutput=False, config={'table': 'df'})
Example 2: export all rows of the df table in the test dataset in memory
Suppose conn is a DuckPyConnection object pointing to a DuckDB dataset in memory. The following code returns a Pandas data frame named data and the same DuckPyConnection object named conn.
import geist
# Export the df table of the test dataset
(data, conn) = geist.export(datastore='duckdb', dataset=conn, hasoutput=False, config={'table': 'df'})
datastore: rdflib
| Name | Type | Description | Default |
|---|---|---|---|
| outputroot | string | Path of the directory to store these exported triples | './' |
| outputfile | string | Path of the file to store these exported triples | None |
| outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |
Example 1: export all triples of the test dataset on disk
Suppose a file exists at .geistdata/rdflib/test.pkl. The following code returns a string named data and a GeistGraph object named conn.
import geist
# Export all triples of the test dataset
(data, conn) = geist.export(datastore='rdflib', dataset='test', hasoutput=False)
Example 2: export all triples of the test dataset in memory
Suppose conn is a GeistGraph object pointing to an RDF dataset in memory. The following code returns a string named data and the same GeistGraph object named conn.
import geist
# Export all triples of the test dataset
(data, conn) = geist.export(datastore='rdflib', dataset=conn, hasoutput=False)
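The per-datastore config keys, defaults, and allowed values documented in the tables above can be collected in one place and checked before calling export(). The following is a minimal sketch under that assumption: `make_export_config`, `EXPORT_DEFAULTS`, and `EXPORT_CHOICES` are hypothetical names, not part of geist, and the values are copied from the tables rather than from the library itself.

```python
# Defaults copied from the config tables for the export() function.
EXPORT_DEFAULTS = {
    "clingo": {"returnformat": "lp", "predicate": None},
    "duckdb": {"outputroot": "./", "outputfile": None,
               "outputformat": "csv", "table": "df"},
    "rdflib": {"outputroot": "./", "outputfile": None, "outputformat": "nt"},
}

# Allowed values for the enumerated keys, also taken from the tables.
EXPORT_CHOICES = {
    "clingo": {"returnformat": {"lp", "df", "dict"}},
    "duckdb": {"outputformat": {"csv", "json"}},
    "rdflib": {"outputformat": {"json-ld", "n3", "nquads", "nt", "hext",
                                "pretty-xml", "trig", "trix", "turtle",
                                "longturtle", "xml"}},
}


def make_export_config(datastore, **overrides):
    """Return a config dict for the given datastore (hypothetical helper).

    Starts from the documented defaults, applies the overrides, and rejects
    unknown keys or values outside the documented choices.
    """
    if datastore not in EXPORT_DEFAULTS:
        raise ValueError(f"unknown datastore: {datastore!r}")
    defaults = EXPORT_DEFAULTS[datastore]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown config keys for {datastore}: {sorted(unknown)}")
    config = {**defaults, **overrides}
    for key, allowed in EXPORT_CHOICES[datastore].items():
        if config[key] not in allowed:
            raise ValueError(f"{key}={config[key]!r} is not one of {sorted(allowed)}")
    return config


config = make_export_config("rdflib", outputformat="turtle")
```

The resulting dictionary could then be passed as the config argument, for example `geist.export(datastore='rdflib', dataset='test', hasoutput=False, config=config)`.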
The export method of the Connection class exports a dataset. It is very similar to the export() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they have already been specified while initializing the Connection class.
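The relationship between the two call styles can be illustrated with a small stand-in. This is a sketch only and makes no claim about how geist implements its Connection class: `ConnectionSketch` and `fake_export` are hypothetical, and `fake_export` merely mimics the (data, conn) return shape shown in the function examples above.

```python
def fake_export(datastore, dataset, hasoutput, config=None):
    """Stand-in for an export function; returns a (data, conn) pair."""
    data = f"exported {dataset!r} from {datastore}"
    conn = object()  # placeholder for a Control, DuckPyConnection, or GeistGraph object
    return data, conn


class ConnectionSketch:
    """Remembers datastore and dataset so export() needs only hasoutput and config."""

    def __init__(self, datastore, dataset, export_fn):
        self.datastore = datastore
        self.dataset = dataset
        self.conn = None
        self._export_fn = export_fn

    def export(self, hasoutput, config=None):
        # Reuse the in-memory connection once one exists; otherwise use the name.
        target = self.conn if self.conn is not None else self.dataset
        data, self.conn = self._export_fn(
            datastore=self.datastore,
            dataset=target,
            hasoutput=hasoutput,
            config=config or {},
        )
        return data  # only the data is returned; the connection stays on the instance


connection = ConnectionSketch("duckdb", "test", fake_export)
data = connection.export(hasoutput=False, config={"table": "df"})
```

The design point is that state fixed at construction time does not have to be repeated on every call, which is why the method examples in this section take only hasoutput and config.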
Parameter descriptions for the export method of the Connection class:
| Name | Type | Description | Default |
|---|---|---|---|
| hasoutput | bool | True to export as a file or print it out | REQUIRED |
| config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
| Key | Type | Description | Default |
|---|---|---|---|
| outputroot | string | Path of the directory to store the exported table | './' |
| outputfile | string | Path of the file to store the exported table | None |
| outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
| table | string | Name of the table to be exported | 'df' |
Example 1: export all rows of the df table in the test dataset on disk
Suppose a file exists at .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})
Example 2: export all rows of the df table in the test dataset in memory
Suppose conn is a DuckPyConnection object pointing to a DuckDB dataset in memory. The following code returns a Pandas data frame named data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})
datastore: rdflib
| Key | Type | Description | Default |
|---|---|---|---|
| outputroot | string | Path of the directory to store these exported triples | './' |
| outputfile | string | Path of the file to store these exported triples | None |
| outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |
Example 1: export all triples of the test dataset on disk
Suppose a file exists at .geistdata/rdflib/test.pkl. The following code returns a string named data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Export all triples of the test dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)
Example 2: export all triples of the test dataset in memory
Suppose conn is a GeistGraph object pointing to an RDF dataset in memory. The following code returns a string named data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Export all triples of the test dataset
data = connection.export(hasoutput=False)
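With the default 'nt' output format, the returned string holds one triple per line, so it can be inspected with plain string handling. The following is a minimal sketch, assuming simple N-Triples input: `split_ntriples` is a hypothetical helper, and the sample string is made up for illustration rather than taken from a real geist export.

```python
def split_ntriples(data):
    """Split an N-Triples string into (subject, predicate, object) tuples.

    A deliberately simple line-based splitter: it skips blank lines and
    comments and does not validate or unescape terms.
    """
    triples = []
    for line in data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace; the object may.
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the terminating " ."
        triples.append((subject, predicate, obj))
    return triples


# Made-up sample in N-Triples syntax (not real exported data).
sample = (
    '<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n'
    '<http://example.org/a> <http://example.org/q> "hello world" .\n'
)
triples = split_ntriples(sample)
print(len(triples))  # 2
```

For anything beyond a quick look, a full RDF parser is the safer choice; for instance, rdflib's `Graph().parse(data=..., format='nt')` handles escaping and validation that this sketch ignores.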
datastore: clingo
| Key | Type | Description | Default |
|---|---|---|---|
| returnformat | string | Format of the returned data, i.e., 'lp', 'df', or 'dict' | 'lp' |
| predicate | string | Name of the predicate to be exported | None |
Example 1: export all facts of the test ASP dataset on disk
Suppose a file exists at .geistdata/clingo/test.pkl. The following code returns data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='clingo', dataset='test')
# Export all facts of the test ASP dataset
data = connection.export(hasoutput=False)
Example 2: export all facts of the test ASP dataset in memory
Suppose conn is a Control object pointing to an ASP dataset in memory. The following code returns data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='clingo', dataset=':memory:', conn=conn)
# Export all facts of the test ASP dataset
data = connection.export(hasoutput=False)