Create

The create feature creates a new dataset on disk or in memory. For the disk option, a .duckdb or .pkl file named after the dataset is created under the .geistdata/duckdb, .geistdata/rdflib, or .geistdata/clingo folder. For the memory option, a DuckDBPyConnection object, a GeistGraph object (a dictionary with rdf_graph and infer keys), or a Clingo Control object is returned.
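The on-disk layout can be sketched as a small path helper. This is only an illustration: `expected_dataset_path` is a hypothetical name, not part of Geist, and the mapping assumes the folder and extension conventions shown in this page's examples (`.geistdata/duckdb/<name>.duckdb`, `.geistdata/rdflib/<name>.pkl`, `.geistdata/clingo/<name>.pkl`).

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from datastore name to file extension, taken from the
# examples on this page; Geist itself may organise files differently.
_EXTENSIONS = {"duckdb": ".duckdb", "rdflib": ".pkl", "clingo": ".pkl"}


def expected_dataset_path(datastore: str, dataset: str) -> Optional[Path]:
    """Return where a dataset file would be expected on disk, or None for ':memory:'."""
    if datastore not in _EXTENSIONS:
        raise ValueError(f"unknown datastore: {datastore!r}")
    if dataset == ":memory:":
        return None  # in-memory datasets have no file
    return Path(".geistdata") / datastore / f"{dataset}{_EXTENSIONS[datastore]}"


# as_posix() keeps the separator stable across operating systems
print(expected_dataset_path("duckdb", "test").as_posix())
```

A helper like this could be used to check whether a dataset file already exists before calling create.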

You can write a Geist template with the create tag. You can also use the CLI or the Python API step by step, as follows:

CLI: create command
Python API: create function
Python API: create method of the Connection class

The create command has three subcommands, all of which create a new dataset on disk. The dataset name :memory: is a reserved value for datasets that exist only in memory and is not allowed in the CLI.
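For scripts that drive the CLI, the `:memory:` restriction can be checked before a command is launched. The sketch below only assembles an argument list from the `--dataset`, `--inputfile`, and `--inputformat` options documented on this page. `build_create_argv` is a hypothetical helper, not part of Geist, and its check mirrors the rule stated above rather than Geist's own validation.

```python
from typing import List

SUBCOMMANDS = ("clingo", "duckdb", "rdflib")


def build_create_argv(subcommand: str, dataset: str, inputfile: str, inputformat: str) -> List[str]:
    """Assemble a `geist create ...` argument list (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    if dataset == ":memory:":
        # The CLI always writes to disk, so the reserved in-memory name is rejected.
        raise ValueError("':memory:' is reserved for in-memory datasets and is not allowed in the CLI")
    return [
        "geist", "create", subcommand,
        "--dataset", dataset,
        "--inputfile", inputfile,
        "--inputformat", inputformat,
    ]


argv = build_create_argv("duckdb", "test", "test.csv", "csv")
print(" ".join(argv))
```

Once Geist is installed, the resulting list could be passed to `subprocess.run(argv, check=True)`.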

Usage: geist create [OPTIONS] COMMAND [ARGS]...

Create a new dataset

Options:
--help  Show this message and exit.

Commands:
clingo  Create a new ASP dataset using Clingo
duckdb  Create a new SQL dataset using DuckDB
rdflib  Create a new RDF dataset using RDFLib

geist create clingo [OPTIONS]

Usage: geist create clingo [OPTIONS]

Create a new ASP dataset using Clingo

Options:
-d, --dataset TEXT              Name of ASP dataset to create (default "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as facts,
                                rules, and contraints  [required]
-iformat, --inputformat [lp|csv|json]
                                Format of the file to be loaded as facts,
                                rules, and constraints. Note that "csv" only
                                supports facts (default "lp"). If multiple
                                possibilities are provided (as a list), only
                                the first one will be considered.
-pred, --predicate TEXT         "isfirstcol" for using the first column as
                                the predicate name; strings other than
                                "isfirstcol" are used as the predicate name
                                directly (default: "isfirstcol")
-prog, --programname TEXT       Name of the program (default: "base")
--help                          Show this message and exit.

Example 1: create a test ASP dataset from stdin with LP format

geist create clingo --dataset test --inputformat lp << __END_INPUT__
friends(a, b).
friends(a, c).
__END_INPUT__

Example 2: create a test ASP dataset from a file with LP format

Here is the friends.lp file:

friends(a, b).
friends(a, c).

Code:

geist create clingo --dataset test --inputfile friends.lp --inputformat lp

Example 3: create a test ASP dataset from a CSV file

Here is the friends.csv file:

arg1,arg2
a,b
a,c

Code:

geist create clingo --dataset test --inputfile friends.csv --inputformat csv --predicate friends
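To make the `--predicate` option concrete, the sketch below shows one way CSV rows could be turned into ASP facts. It is an interpretation of the option's help text, not Geist's implementation: `csv_to_facts` is a hypothetical helper, it assumes the first CSV row is a header that is skipped, and the exact fact syntax Geist generates (for example, quoting of values) may differ.

```python
import csv
import io
from typing import List


def csv_to_facts(csv_text: str, predicate: str = "isfirstcol") -> List[str]:
    """Turn CSV rows into ASP facts (illustrative reading of --predicate)."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    facts = []
    for row in rows[1:]:  # assumption: the first row is a header
        if predicate == "isfirstcol":
            name, args = row[0], row[1:]  # first column names the predicate
        else:
            name, args = predicate, row  # the given string names the predicate
        facts.append(f"{name}({', '.join(args)}).")
    return facts


friends_csv = """
arg1,arg2
a,b
a,c
"""
print(csv_to_facts(friends_csv, predicate="friends"))
```

With the default `"isfirstcol"`, the same helper would instead expect the predicate name in the first column of every data row.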

geist create duckdb [OPTIONS]

Usage: geist create duckdb [OPTIONS]

Create a new SQL dataset using DuckDB

Options:
-d, --dataset TEXT              Name of SQL dataset to create (default "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as a Pandas
                                DataFrame  [required]
-iformat, --inputformat [csv|json]
                                Format of the file to be loaded as a Pandas
                                DataFrame (default csv)
-t, --table TEXT                Name of the table to be created (default
                                "df")
--help                          Show this message and exit.

Example 1: create a test SQL dataset from stdin

geist create duckdb --dataset test --inputformat csv --table df << __END_INPUT__
v1,v2,v3
1,2,3
4,5,6
7,8,9
__END_INPUT__

Example 2: create a test dataset from a file

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

geist create duckdb --dataset test --inputfile test.csv --inputformat csv --table df

geist create rdflib [OPTIONS]

Usage: geist create rdflib [OPTIONS]

Create a new RDF dataset

Options:
-d, --dataset TEXT              Name of RDF dataset to create (default "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as triples
                                [required]
-iformat, --inputformat [xml|n3|turtle|nt|pretty-xml|trix|trig|nquads|json-ld|hext|csv]
                                Format of the file to be loaded as triples
                                (default json-ld)
--colnames TEXT                 Column names of triples with the format of
                                [[subject1, predicate1, object1], [subject2,
                                predicate2, object2], ...] when the input
                                format is csv
--infer [none|rdfs|owl|rdfs_owl]
                                Inference to perform on update [none, rdfs,
                                owl, rdfs_owl] (default "none")
--help                          Show this message and exit.

Example 1: create a test RDF dataset from stdin

geist create rdflib --dataset test --inputformat nt --infer none << __END_INPUT__

<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .

__END_INPUT__

Example 2: create a test dataset from a file

Here is the test.nt file:

<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .

Code:

geist create rdflib --dataset test --inputfile test.nt --inputformat nt --infer none

The create function creates a new dataset on disk or in memory.

Parameter descriptions for create():

Name	Type	Description	Default
datastore	string	A backend datastore, i.e., `'clingo'`, `'duckdb'`, or `'rdflib'`	REQUIRED
dataset	string	Name of the dataset to be created. Note that `':memory:'` is a reserved value for datasets that exist only in memory	REQUIRED
inputfile	string	A file to be loaded	REQUIRED
inputformat	string	Format of the file to be loaded	REQUIRED
isinputpath	bool	`True` if `inputfile` is a file path; otherwise `inputfile` holds the content itself	REQUIRED
config	dict	A dictionary of configuration options for the selected backend datastore	see below
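The `isinputpath` flag in the table above decides how `inputfile` is interpreted. A minimal sketch of that rule, using a hypothetical `read_input` helper that is not part of Geist:

```python
import os
import tempfile


def read_input(inputfile: str, isinputpath: bool) -> str:
    """Return the text to load: file contents if isinputpath, else the string itself."""
    if isinputpath:
        with open(inputfile, "r", encoding="utf-8") as f:
            return f.read()
    return inputfile


csv_str = "v1,v2,v3\n1,2,3\n"

# Passing the content directly
assert read_input(csv_str, isinputpath=False) == csv_str

# Passing a path to a file that holds the same content
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_str)
    assert read_input(path, isinputpath=True) == csv_str
```

Both calls yield the same text, which is why the string-based and file-based examples on this page produce equivalent datasets.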

Description for the config parameter:

datastore: clingo

Name	Type	Description	Default
predicate	string	`'isfirstcol'` for using the first column as the predicate name; strings other than `'isfirstcol'` are used as the predicate name directly	`'isfirstcol'`
programname	string	Name of the program	`'base'`

Example 1: create a test ASP dataset on disk from a string

The .geistdata/clingo/test.pkl file is created and a Control object is returned.

import geist

lp_str = """
friends(a, b).
friends(a, c).
"""

# Create a Control object
conn = geist.create(datastore='clingo', dataset='test', inputfile=lp_str, inputformat="lp", isinputpath=False)

Example 2: create a test ASP dataset on disk from a file

The .geistdata/clingo/test.pkl file is created and a Control object is returned.

Here is the friends.lp file:

friends(a, b).
friends(a, c).

Code:

import geist

# Create a Control object
conn = geist.create(datastore='clingo', dataset='test', inputfile="friends.lp", inputformat="lp", isinputpath=True)

Example 3: create an ASP dataset in memory from a string

A Control object is returned.

import geist

lp_str = """
friends(a, b).
friends(a, c).
"""

# Create a Control object
conn = geist.create(datastore='clingo', dataset=':memory:', inputfile=lp_str, inputformat="lp", isinputpath=False)

datastore: duckdb

Name	Type	Description	Default
table	string	Name of the table to be created	`'df'`

Example 1: create a test SQL dataset on disk from a string

The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 2: create a test SQL dataset on disk from a file

The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

Example 3: create a SQL dataset in memory from a string

A DuckDBPyConnection object is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 4: create a SQL dataset in memory from a file

A DuckDBPyConnection object is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

datastore: rdflib

Name	Type	Description	Default
colnames	string	Column names of triples with the format of `[[subject1, predicate1, object1], [subject2, predicate2, object2], ...]`	REQUIRED when `inputformat='csv'`
infer	string	Inference to perform on update, i.e., `'none'`, `'rdfs'`, `'owl'`, or `'rdfs_owl'`	`'none'`
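The `colnames` value is passed as a string that looks like a Python list of `[subject, predicate, object]` column groups. The sketch below shows one way such a string could be parsed and applied to CSV rows. `rows_to_triples` is a hypothetical helper, and parsing with `ast.literal_eval` is an assumption about the format, not a description of Geist's internals.

```python
import ast
import csv
import io
from typing import List, Tuple


def rows_to_triples(csv_text: str, colnames: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, predicate, object) triples for each column group in colnames."""
    groups = ast.literal_eval(colnames)  # e.g. [['subject', 'predicate', 'object']]
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    triples = []
    for row in reader:
        for s_col, p_col, o_col in groups:
            triples.append((row[s_col], row[p_col], row[o_col]))
    return triples


csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
for triple in rows_to_triples(csv_str, "[['subject', 'predicate', 'object']]"):
    print(triple)
```

Listing more than one group, such as `[['s1', 'p1', 'o1'], ['s2', 'p2', 'o2']]`, would yield several triples per CSV row under this reading.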

Example 1: create a test RDF dataset on disk from a string

The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 2: create a test RDF dataset on disk from a file

The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 3: create an RDF dataset in memory from a string

A GeistGraph object is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 4: create an RDF dataset in memory from a file

A GeistGraph object is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile='test.csv', inputformat='csv', isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

The create method of the Connection class creates a new dataset on disk or in memory. It is very similar to the create() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they have already been specified when initializing the Connection class.

Parameter descriptions for the create method of the Connection class:

Name	Type	Description	Default
inputfile	string	A file to be loaded	REQUIRED
inputformat	string	Format of the file to be loaded	REQUIRED
isinputpath	bool	`True` if `inputfile` is a file path; otherwise `inputfile` holds the content itself	REQUIRED
config	dict	A dictionary of configuration options for the selected backend datastore	see below

Description for the config parameter:

datastore: duckdb

Key	Type	Description	Default
table	string	Name of the table to be created	`'df'`

Example 1: create a test SQL dataset from a string

The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 2: create a test SQL dataset from a file

The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

Example 3: create a SQL dataset in memory from a string

A Connection instance is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 4: create a SQL dataset in memory from a file

A Connection instance is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

datastore: rdflib

Key	Type	Description	Default
colnames	string	Column names of triples with the format of `[[subject1, predicate1, object1], [subject2, predicate2, object2], ...]`	REQUIRED when `inputformat='csv'`
infer	string	Inference to perform on update, i.e., `'none'`, `'rdfs'`, `'owl'`, or `'rdfs_owl'`	`'none'`

Example 1: create a test RDF dataset from a string

The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 2: create a test RDF dataset from a file

The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 3: create an RDF dataset in memory from a string

A Connection instance is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 4: create an RDF dataset in memory from a file

A Connection instance is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

datastore: clingo

Key	Type	Description	Default
predicate	string	`'isfirstcol'` for using the first column as the predicate name; strings other than `'isfirstcol'` are used as the predicate name directly	`'isfirstcol'`
programname	string	Name of the program	`'base'`

Example 1: create a test ASP dataset from a string

The .geistdata/clingo/test.pkl file is created and a Connection instance is returned.

import geist

lp_str = """
friends(a, b).
friends(a, c).
"""

# Create a Connection instance
connection = geist.Connection(datastore='clingo', dataset='test')
connection.create(inputfile=lp_str, inputformat="lp", isinputpath=False)

Example 2: create a test ASP dataset from a file

The .geistdata/clingo/test.pkl file is created and a Connection instance is returned.

Here is the friends.lp file:

friends(a, b).
friends(a, c).

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='clingo', dataset='test')
connection.create(inputfile="friends.lp", inputformat="lp", isinputpath=True)

Example 3: create an ASP dataset in memory from a string

A Connection instance is returned.

import geist

lp_str = """
friends(a, b).
friends(a, c).
"""

# Create a Connection instance
connection = geist.Connection(datastore='clingo', dataset=':memory:')
connection.create(inputfile=lp_str, inputformat="lp", isinputpath=False)