Export

The export feature exports a dataset stored on disk or in memory.

You can use the CLI or the Python API as follows:

The export command exports a dataset.

There are two subcommands for export:

Usage: geist export [OPTIONS] COMMAND [ARGS]...

  Export a dataset

Options:
  --help  Show this message and exit.

Commands:
  duckdb  Export a SQL dataset
  rdflib  Export an RDF dataset

geist export duckdb [OPTIONS]
Usage: geist export duckdb [OPTIONS]

Export a SQL dataset

Options:
-d, --dataset TEXT              Name of SQL dataset to be exported (default
                                "kb")
-oroot, --outputroot TEXT       Path of the directory to store the exported
                                table (default: current directory). If the
                                given path (i.e., --outputfile) is None or a
                                relative path, it will be ignored.
-ofile, --outputfile TEXT       Path of the file to store the exported table
                                (default: None)
-oformat, --outputformat [csv|json]
                                Format of the exported table (default csv)
-t, --table TEXT                Name of the table to be exported (default
                                "df")
--help                          Show this message and exit.

Example: export the df table in the test dataset

By default, the exported table is printed to the terminal:

geist export duckdb --dataset test --table df

geist export rdflib [OPTIONS]
Usage: geist export rdflib [OPTIONS]

Export an RDF dataset

Options:
-d, --dataset TEXT              Name of RDF dataset to be exported (default
                                "kb")
-oroot, --outputroot TEXT       Path of the directory to store these
                                exported triples (default: current
                                directory). If the given path (i.e.,
                                --outputfile) is None or a relative path, it
                                will be ignored.
-ofile, --outputfile TEXT       Path of the file to store these exported
                                triples (default: None)
-oformat, --outputformat [json-ld|n3|nquads|nt|hext|pretty-xml|trig|trix|turtle|longturtle|xml]
                                Format of the exported triples (default nt)
--help                          Show this message and exit.

Example: export the test dataset

By default, the exported triples are printed to the terminal:

geist export rdflib --dataset test
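
When the CLI is driven from a script, the options listed in the help output above can be assembled programmatically. The following is a minimal sketch and not part of geist: build_export_command is a hypothetical helper that only uses the flags shown above, and triples.ttl is a placeholder file name. It only builds the argument list; running it (for example with subprocess.run) is left to the caller.

```python
# Format choices copied from the help output of each subcommand.
FORMATS = {
    "duckdb": ("csv", "json"),
    "rdflib": ("json-ld", "n3", "nquads", "nt", "hext", "pretty-xml",
               "trig", "trix", "turtle", "longturtle", "xml"),
}


def build_export_command(datastore, dataset="kb", table=None,
                         outputformat=None, outputfile=None, outputroot=None):
    """Hypothetical helper: build the argv list for `geist export <datastore>`."""
    if datastore not in FORMATS:
        raise ValueError(f"unknown datastore: {datastore!r}")
    if outputformat is not None and outputformat not in FORMATS[datastore]:
        raise ValueError(f"{outputformat!r} is not a {datastore} output format")
    if table is not None and datastore != "duckdb":
        raise ValueError("--table is only listed for the duckdb subcommand")

    cmd = ["geist", "export", datastore, "--dataset", dataset]
    # Optional flags are added only when set, so the CLI defaults still apply.
    for flag, value in (("--table", table),
                        ("--outputformat", outputformat),
                        ("--outputfile", outputfile),
                        ("--outputroot", outputroot)):
        if value is not None:
            cmd += [flag, value]
    return cmd


# Example: export the test RDF dataset as Turtle into a placeholder file.
cmd = build_export_command("rdflib", dataset="test",
                           outputformat="turtle", outputfile="triples.ttl")
print(" ".join(cmd))
```

Leaving unset options out of the command keeps the documented defaults (dataset "kb", table "df", formats csv and nt) under the control of the CLI itself.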

The export function exports a dataset.

Parameter descriptions for export():

| Name | Type | Description | Default |
|------|------|-------------|---------|
| datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
| dataset | string OR DuckPyConnection object OR GeistGraph object | Dataset to export: (1) a string naming a dataset stored on disk, OR (2) a DuckPyConnection object or a GeistGraph object for a dataset in memory | REQUIRED |
| hasoutput | bool | True to export as a file or print it out | REQUIRED |
| config | dict | A dictionary with configurations for the chosen backend datastore | see below |

Description for the config parameter:

datastore: duckdb
| Name | Type | Description | Default |
|------|------|-------------|---------|
| outputroot | string | Path of the directory to store the exported table | './' |
| outputfile | string | Path of the file to store the exported table | None |
| outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
| table | string | Name of the table to be exported | 'df' |

Example 1: export all rows of the df table in the test dataset on disk

A file exists at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named data and a DuckPyConnection object named conn.

import geist

# Export the df table of the test dataset
(data, conn) = geist.export(datastore='duckdb', dataset='test', hasoutput=False, config={'table': 'df'})
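
The call above keeps the result in memory. To write the table to a file instead, the config keys described for the duckdb datastore can be combined with hasoutput=True. This is a minimal sketch under assumptions: duckdb_export_config and export_table_to_file are hypothetical helpers, df.json is a placeholder file name, and the return value of geist.export when hasoutput=True is not described here, so the wrapper passes it through unchanged.

```python
def duckdb_export_config(table="df", outputformat="csv",
                         outputfile=None, outputroot="./"):
    """Hypothetical helper: build the config dict for a duckdb export."""
    if outputformat not in ("csv", "json"):
        raise ValueError(f"unsupported output format: {outputformat!r}")
    return {
        "outputroot": outputroot,
        "outputfile": outputfile,
        "outputformat": outputformat,
        "table": table,
    }


def export_table_to_file(dataset, outputfile, **options):
    """Hypothetical wrapper around geist.export with hasoutput=True."""
    import geist  # imported lazily so the config helper works without geist

    config = duckdb_export_config(outputfile=outputfile, **options)
    return geist.export(datastore="duckdb", dataset=dataset,
                        hasoutput=True, config=config)


# Config for writing the df table as JSON. Nothing is exported until
# export_table_to_file("test", "df.json", outputformat="json") is called.
config = duckdb_export_config(outputfile="df.json", outputformat="json")
```

Validating the format before calling geist.export surfaces a typo such as 'jsn' early, rather than relying on how the backend reports it.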

Example 2: export all rows of the df table in the test dataset in memory

Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named data and the same DuckPyConnection object named conn.

import geist

# Export the df table of the test dataset
(data, conn) = geist.export(datastore='duckdb', dataset=conn, hasoutput=False, config={'table': 'df'})

datastore: rdflib

| Name | Type | Description | Default |
|------|------|-------------|---------|
| outputroot | string | Path of the directory to store the exported triples | './' |
| outputfile | string | Path of the file to store the exported triples | None |
| outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |

Example 1: export all triples of the test dataset on disk

A file exists at the path .geistdata/rdflib/test.pkl. The following code returns a string named data and a GeistGraph object named conn.

import geist

# Export all triples of the test dataset
(data, conn) = geist.export(datastore='rdflib', dataset='test', hasoutput=False)
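
Because data is a plain string, it can be inspected without further geist calls. The sketch below assumes the default nt format: N-Triples places one triple per line and treats lines starting with # as comments. count_triples is a hypothetical helper, and the sample string is made up for illustration; it stands in for the string returned by geist.export().

```python
def count_triples(nt_text):
    """Count triples in an N-Triples string (one triple per line)."""
    count = 0
    for line in nt_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Made-up sample standing in for the string returned by geist.export().
sample = (
    '<http://example.org/a> <http://example.org/p> "1" .\n'
    "# a comment\n"
    "\n"
    "<http://example.org/a> <http://example.org/q> <http://example.org/b> .\n"
)
print(count_triples(sample))  # prints 2
```

This line-based count only applies to nt output; other formats such as turtle or json-ld need a proper RDF parser.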

Example 2: export all triples of the test dataset in memory

Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code returns a string named data and the same GeistGraph object named conn.

import geist

# Export all triples of the test dataset
(data, conn) = geist.export(datastore='rdflib', dataset=conn, hasoutput=False)

The export method of the Connection class exports a dataset. It is very similar to the export() function; the only difference is that the datastore and dataset parameters do not need to be passed, because they were already specified when initializing the Connection class.

Parameter descriptions for the export method of the Connection class:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| hasoutput | bool | True to export as a file or print it out | REQUIRED |
| config | dict | A dictionary with configurations for the chosen backend datastore | see below |

Description for the config parameter:

datastore: duckdb
| Key | Type | Description | Default |
|-----|------|-------------|---------|
| outputroot | string | Path of the directory to store the exported table | './' |
| outputfile | string | Path of the file to store the exported table | None |
| outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
| table | string | Name of the table to be exported | 'df' |

Example 1: export all rows of the df table in the test dataset on disk

A file exists at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named data.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})

Example 2: export all rows of the df table in the test dataset in memory

Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named data.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})

datastore: rdflib

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| outputroot | string | Path of the directory to store the exported triples | './' |
| outputfile | string | Path of the file to store the exported triples | None |
| outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |

Example 1: export all triples of the test dataset on disk

A file exists at the path .geistdata/rdflib/test.pkl. The following code returns a string named data.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Export all triples of the test dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)

Example 2: export all triples of the test dataset in memory

Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code returns a string named data.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Export all triples of the test dataset
data = connection.export(hasoutput=False)