Load
The load feature imports data into an existing dataset, either on disk or in memory.
You can write a Geist template with the load tag. You can also use the CLI or the Python API step by step, as follows:
The load command imports data into an existing dataset.
There are two subcommands for load:
Usage: geist load [OPTIONS] COMMAND [ARGS]...
Import data into a dataset
Options:
--help Show this message and exit.
Commands:
duckdb Import data into a SQL dataset
rdflib Import data into a RDF dataset
geist load duckdb [OPTIONS]
Usage: geist load duckdb [OPTIONS]
Import data into a SQL dataset
Options:
-d, --dataset TEXT Name of SQL dataset to load a file (default
"kb")
-ifile, --inputfile FILENAME Path of the file to be loaded as a table
[required]
-iformat, --inputformat [csv|json]
Format of the file to be loaded as a table
(default csv)
-t, --table TEXT Name of the table to be created [required]
--help Show this message and exit.
Example: load a file into the test dataset
geist load duckdb --dataset test --inputfile test_add.csv --inputformat csv --table df
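When the CLI is driven from a script, it can help to assemble the command as an argument list instead of a single string. The following is a minimal sketch that uses only the flags listed in the help output above; the helper name build_load_duckdb_command is hypothetical and not part of Geist.

```python
import shlex


def build_load_duckdb_command(inputfile, table, dataset="kb", inputformat="csv"):
    """Return the argv list for `geist load duckdb` (hypothetical helper).

    The defaults mirror the ones shown in the help text: dataset "kb", format csv.
    """
    return [
        "geist", "load", "duckdb",
        "--dataset", dataset,
        "--inputfile", inputfile,
        "--inputformat", inputformat,
        "--table", table,
    ]


cmd = build_load_duckdb_command("test_add.csv", "df", dataset="test")
print(shlex.join(cmd))

# To actually run it (assumes the geist CLI is on PATH and test_add.csv exists):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when a file path contains spaces.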
geist load rdflib [OPTIONS]
Here are the options of the load rdflib subcommand:
Usage: geist load rdflib [OPTIONS]
Import data into a RDF dataset
Options:
-d, --dataset TEXT Name of RDF dataset to load a file (default
"kb")
-ifile, --inputfile FILENAME Path of the file to be loaded as triples
[required]
-iformat, --inputformat [xml|n3|turtle|nt|pretty-xml|trix|trig|nquads|json-ld|hext|csv]
Format of the file to be loaded as triples
(default json-ld)
--colnames TEXT Column names of triples with the format of
[[subject1, predicate1, object1], [subject2,
predicate2, object2], ...] when the input
format is csv
--help Show this message and exit.
Example: load a file into the test dataset
geist load rdflib --dataset test --inputfile test_add.jsonld
The load function imports data into an existing dataset.
Parameter descriptions for load():
Name | Type | Description | Default |
---|---|---|---|
datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
dataset | string OR DuckPyConnection object OR GeistGraph object | Dataset to load data into: (1) a string naming a dataset stored on disk, OR (2) a DuckPyConnection object or a GeistGraph object for a dataset in memory | REQUIRED |
inputfile | string | File to be loaded | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if inputfile is a file path; otherwise inputfile is the content itself | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
Name | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be loaded | REQUIRED |
Example: load a table into the test dataset
There exists a file at the path .geistdata/duckdb/test.duckdb. The csv_str will be imported into the df table. Note that the order of the table columns should be consistent with the imported data.
import geist
csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""
# Load csv_str to the df table of the test dataset
geist.load(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})
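Because the column order of the table should match the imported data, it may be worth checking the CSV header before calling load. The sketch below uses only the standard library. The function names are hypothetical, and the expected column list is an assumption about what the df table contains.

```python
import csv
import io


def csv_header(csv_text):
    """Return the first row of a CSV string as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    return next(reader)


def check_column_order(csv_text, expected_columns):
    """Raise ValueError if the CSV header differs from the expected column order."""
    header = csv_header(csv_text)
    if header != list(expected_columns):
        raise ValueError(
            f"CSV columns {header} do not match table columns {list(expected_columns)}"
        )
    return header


csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""

# Assumed column order of the existing df table (not stated in this document).
check_column_order(csv_str, ["v1", "v2", "v3"])
```

If the check passes, csv_str can be passed to geist.load as in the example above.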
datastore: rdflib
Name | Type | Description | Default |
---|---|---|---|
inmemory | bool | True if the new dataset (after loading data) is stored in memory only, otherwise it is stored on disk | False |
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
Example: load a triple into the test dataset
There exists a file at the path .geistdata/rdflib/test.pkl. The csv_str will be imported into the test RDF dataset.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""
# Load csv_str to the test dataset
geist.load(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
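The colnames value is a string, not a list, so writing it by hand is easy to get wrong when several triples are mapped per row. As a minimal sketch, the string can be produced from a nested Python list with str(). The helper name make_colnames is hypothetical, and how Geist parses the string internally is not described here; the sketch only reproduces the format used in the example above.

```python
import ast


def make_colnames(triples):
    """Serialize [[subject, predicate, object], ...] column names to a string."""
    for names in triples:
        if len(names) != 3:
            raise ValueError(f"expected 3 column names per triple, got {names}")
    return str([list(names) for names in triples])


colnames = make_colnames([["subject", "predicate", "object"]])
print(colnames)  # [['subject', 'predicate', 'object']]

# The string round-trips back to the same nested list.
assert ast.literal_eval(colnames) == [["subject", "predicate", "object"]]

# Ready to pass as the config argument of load when inputformat='csv'.
config = {"colnames": colnames}
```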
The load method of the Connection class imports data into an existing dataset on disk or in memory. It is very similar to the load() function. The only difference is that the datastore, dataset, and inmemory parameters do not need to be passed, as they have already been specified while initializing the Connection class.
Parameter descriptions for the load method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
inputfile | string | A file to be loaded | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
Name | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be loaded | REQUIRED |
Example: load a table into the test dataset
There exists a file at the path .geistdata/duckdb/test.duckdb. The csv_str will be imported into the df table. Note that the order of the table columns should be consistent with the imported data.
import geist
csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Load csv_str to the df table of the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})
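Since a connection already carries the datastore and dataset, one connection can serve several loads, and the isinputpath flag can be derived from the type of the input. The sketch below is one possible convention, not part of Geist: load_source is a hypothetical helper, the table name df2 is made up for illustration, and RecordingConnection is a stand-in so the sketch runs without Geist installed. With Geist available, the object returned by geist.Connection.connect would take its place.

```python
from pathlib import Path


def load_source(connection, source, inputformat, config):
    """Call connection.load, treating Path objects as file paths and str as content."""
    is_path = isinstance(source, Path)
    connection.load(
        inputfile=str(source) if is_path else source,
        inputformat=inputformat,
        isinputpath=is_path,
        config=config,
    )


class RecordingConnection:
    """Stand-in for a Geist connection: records the keyword arguments of each load."""

    def __init__(self):
        self.calls = []

    def load(self, **kwargs):
        self.calls.append(kwargs)


connection = RecordingConnection()

# CSV content passed directly, so isinputpath is False.
load_source(connection, "v1,v2,v3\n1,1,1\n", "csv", {"table": "df"})

# A file path, so isinputpath is True (df2 is a hypothetical table name).
load_source(connection, Path("test_add.csv"), "csv", {"table": "df2"})

for call in connection.calls:
    print(call["isinputpath"], call["config"])
```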
datastore: rdflib
Name | Type | Description | Default |
---|---|---|---|
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
Example: load a triple into the test dataset
There exists a file at the path .geistdata/rdflib/test.pkl. The csv_str will be imported into the test RDF dataset.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""
# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Load csv_str to the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})