Load

The load feature imports data into an existing dataset, either on disk or in memory.

You can write a Geist template with the load tag. You can also use the CLI or the Python API, as described step by step below:

The load command imports data into an existing dataset.

The load command has two subcommands:

Usage: geist load [OPTIONS] COMMAND [ARGS]...

Import data into a dataset

Options:
  --help  Show this message and exit.

Commands:
  duckdb  Import data into a SQL dataset
  rdflib  Import data into a RDF dataset

geist load duckdb [OPTIONS]
Usage: geist load duckdb [OPTIONS]

Import data into a SQL dataset

Options:
-d, --dataset TEXT              Name of SQL dataset to load a file (default
                                "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as a table
                                [required]
-iformat, --inputformat [csv|json]
                                Format of the file to be loaded as a table
                                (default csv)
-t, --table TEXT                Name of the table to be created  [required]
--help                          Show this message and exit.
Example: load a file into the test dataset
geist load duckdb --dataset test --inputfile test_add.csv --inputformat csv --table df
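
The command above assumes that test_add.csv already exists. As a minimal sketch, the snippet below prepares such a file and assembles the same command from Python. The file contents (columns v1, v2, v3) are illustrative assumptions, and the command is only built and printed, not executed, because running it requires Geist to be installed and the test dataset to exist.

```python
import csv
import shlex

# Illustrative rows; the column names are assumptions made for this sketch.
rows = [["v1", "v2", "v3"], [1, 1, 1], [2, 2, 2], [3, 3, 3]]

# Write the CSV file that the CLI example loads.
with open("test_add.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Assemble the command using only the options documented above.
argv = [
    "geist", "load", "duckdb",
    "--dataset", "test",
    "--inputfile", "test_add.csv",
    "--inputformat", "csv",
    "--table", "df",
]
command = shlex.join(argv)
print(command)
```

To actually run the command from Python, the argv list could be passed to subprocess.run once Geist is available on the PATH.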
geist load rdflib [OPTIONS]

Here are the options of the load rdflib command:

Usage: geist load rdflib [OPTIONS]

Import data into a RDF dataset

Options:
-d, --dataset TEXT              Name of RDF dataset to load a file (default
                                "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as triples
                                [required]
-iformat, --inputformat [xml|n3|turtle|nt|pretty-xml|trix|trig|nquads|json-ld|hext|csv]
                                Format of the file to be loaded as triples
                                (default json-ld)
--colnames TEXT                 Column names of triples with the format of
                                [[subject1, predicate1, object1], [subject2,
                                predicate2, object2], ...] when the input
                                format is csv
--help                          Show this message and exit.

Example: load a file into the test dataset
geist load rdflib --dataset test --inputfile test_add.jsonld

The load function imports data into an existing dataset.

Parameter descriptions for load():

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
| dataset | string OR DuckPyConnection object OR GeistGraph object | Dataset to load an object: (1) a string indicates the name of the dataset stored on disk OR (2) a DuckPyConnection object OR a GeistGraph object for a dataset in memory | REQUIRED |
| inputfile | string | File to be loaded | REQUIRED |
| inputformat | string | Format of the file to be loaded | REQUIRED |
| isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
| config | dict | A dictionary with configurations for certain backend store | see below |

Description for the config parameter:

datastore: duckdb
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table | string | Name of the table to be loaded | REQUIRED |
Example: load a table into the test dataset

This example assumes that the file .geistdata/duckdb/test.duckdb exists. The csv_str will be imported into the df table. Note that the order of the table columns should be consistent with the imported data.

import geist

csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""

# Load csv_str to the df table of the test dataset
geist.load(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})
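
The isinputpath parameter decides whether inputfile is read as content or as a file path. The sketch below prepares the arguments for both variants using only the parameters described above. The temporary file location is an assumption made for illustration, and the geist.load calls are left as comments because they require Geist to be installed and the test dataset to exist.

```python
import os
import tempfile

csv_str = "v1,v2,v3\n1,1,1\n2,2,2\n3,3,3\n"

# Save the same content to a file so that it can also be passed by path.
csv_path = os.path.join(tempfile.mkdtemp(), "test_add.csv")
with open(csv_path, "w") as f:
    f.write(csv_str)

# Arguments shared by both variants.
common = {
    "datastore": "duckdb",
    "dataset": "test",
    "inputformat": "csv",
    "config": {"table": "df"},
}

# Variant 1: inputfile holds the content itself.
by_content = {**common, "inputfile": csv_str, "isinputpath": False}
# Variant 2: inputfile holds the path of a file with that content.
by_path = {**common, "inputfile": csv_path, "isinputpath": True}

# With Geist installed and the test dataset present, either call would be:
# import geist
# geist.load(**by_content)
# geist.load(**by_path)
```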
datastore: rdflib
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inmemory | bool | True if the new dataset (after loading data) is stored in memory only, otherwise it is stored on disk | False |
| colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
Example: load a triple into the test dataset

This example assumes that the file .geistdata/rdflib/test.pkl exists. The csv_str will be imported into the test RDF dataset.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""

# Load csv_str to the test dataset
geist.load(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
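
Since colnames is passed as a string rather than a list, it can be convenient to build it from a nested Python list. The helper below, make_colnames, is a hypothetical name introduced for this sketch and is not part of Geist. It only produces a string in the same form as the example above; how Geist parses that string internally is not covered here.

```python
import ast


def make_colnames(triples):
    """Return a colnames string for [subject, predicate, object] column-name triples."""
    for triple in triples:
        if len(triple) != 3:
            raise ValueError(f"expected 3 column names, got {len(triple)}: {triple!r}")
    # repr() of a nested list of strings yields the bracketed, quoted form.
    return repr([list(t) for t in triples])


colnames = make_colnames([["subject", "predicate", "object"]])
print(colnames)  # [['subject', 'predicate', 'object']]

# The string parses back into the same nested list.
assert ast.literal_eval(colnames) == [["subject", "predicate", "object"]]

# Dictionary to pass as the config parameter when inputformat='csv'.
config = {"colnames": colnames}
```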

The load method of the Connection class imports data into an existing dataset on disk or in memory. It is very similar to the load() function. The only difference is that the datastore, dataset, and inmemory parameters do not need to be passed, as they have already been specified while initializing the Connection class.

Parameter descriptions for the load method of the Connection class:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputfile | string | A file to be loaded | REQUIRED |
| inputformat | string | Format of the file to be loaded | REQUIRED |
| isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
| config | dict | A dictionary with configurations for certain backend store | see below |

Description for the config parameter:

datastore: duckdb
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table | string | Name of the table to be loaded | REQUIRED |
Example: load a table into the test dataset

This example assumes that the file .geistdata/duckdb/test.duckdb exists. The csv_str will be imported into the df table. Note that the order of the table columns should be consistent with the imported data.

import geist

csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Load csv_str to the df table of the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})
datastore: rdflib
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
Example: load a triple into the test dataset

This example assumes that the file .geistdata/rdflib/test.pkl exists. The csv_str will be imported into the test RDF dataset.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""

# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Load csv_str to the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
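
Because a Connection already carries the datastore and dataset, one instance can be reused for several load calls. The sketch below illustrates that pattern. Both load_all and RecordingConnection are hypothetical names introduced here: RecordingConnection is a stand-in that only records calls so the snippet runs without Geist, and in real use it would be replaced by the object returned from geist.Connection.connect.

```python
class RecordingConnection:
    """Stand-in for a Geist Connection; it only records load() calls."""

    def __init__(self):
        self.calls = []

    def load(self, inputfile, inputformat, isinputpath, config=None):
        self.calls.append({
            "inputfile": inputfile,
            "inputformat": inputformat,
            "isinputpath": isinputpath,
            "config": config or {},
        })


def load_all(connection, inputs):
    """Call connection.load once per (content, inputformat, config) tuple."""
    for content, inputformat, config in inputs:
        connection.load(inputfile=content, inputformat=inputformat,
                        isinputpath=False, config=config)
    return len(inputs)


colnames = "[['subject', 'predicate', 'object']]"
inputs = [
    ('subject,predicate,object\n<http://example.com/drewp>,<http://example.com/feels>,"Happy"\n',
     "csv", {"colnames": colnames}),
    ('subject,predicate,object\n<http://example.com/drewp>,<http://example.com/feels>,"Calm"\n',
     "csv", {"colnames": colnames}),
]

# In real use: connection = geist.Connection.connect(datastore='rdflib', dataset='test')
connection = RecordingConnection()
loaded = load_all(connection, inputs)
print(loaded)  # 2
```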