Function load

load function can import data into an existing dataset.

Parameters description for query():

Name	Type	Description	Default
datastore	string	A backend datastore, i.e., `'rdflib'` or `'duckdb'`	REQUIRED
dataset	string OR `DuckPyConnection` object OR `GeistGraph` object	Dataset to load an object: (1) A string indicates the name of the dataset stored on disk OR (2) a `DuckPyConnection` object OR a `GeistGraph` object for dataset in memory	REQUIRED
inputfile	string	File to be loaded	REQUIRED
inputformat	string	Format of the file to be loaded	REQUIRED
isinputpath	bool	`True` if the inputfile is the file path, otherwise the inputfile is the content	REQUIRED
config	dict	A dictionary with configurations for certain backend store	see below

Description for the config parameter:

datastore: duckdb

Name	Type	Description	Default
table	string	Name of the table to be loaded	REQUIRED

Example: load a table into the test dataset

There exist a file with the path of .geistdata/duckdb/test.duckdb. The csv_str will be imported into the df table. Note that the order of table columns should be consistent with the imported data.

import geist

csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""

# Load csv_str to the df table of the test dataset
geist.load(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})

datastore: rdflib

Name	Type	Description	Default
inmemory	bool	`True` if the new dataset (after loading data) is stored in memory only, otherwise it is stored on disk	`False`
colnames	string	Column names of triples with the format of `[[subject1, predicate1, object1], [subject2, predicate2, object2], ...]`	REQUIRED when `inputformat='csv'`

Example: load a triple into the test dataset

There exist a file with the path of .geistdata/rdflib/test.pkl. The csv_str will be imported into the test RDF dataset.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""

# Load csv_str to the df table of the test dataset
geist.load(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})