Create
The create feature creates a new dataset on disk or in memory. For the disk option, a .duckdb or .pkl file is created under the .geistdata/duckdb or .geistdata/rdflib folder, with the same name as the dataset. For the memory option, a DuckDBPyConnection object or a GeistGraph object (a dictionary with rdf_graph and infer keys) is returned.
You can write a Geist template with the create tag. You can also use the CLI or the Python API, as described below:
The create command has two subcommands, both of which create a new dataset on disk. The dataset name :memory: is a reserved value for datasets that exist only in memory and is not allowed in the CLI.
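To make the naming rule concrete, here is a minimal sketch of how a dataset name maps to a file on disk, based only on the folder and extension conventions described above. The helper expected_dataset_path is hypothetical and is not part of Geist; it simply mirrors the documented layout and treats :memory: as having no file.

```python
from pathlib import Path
from typing import Optional

# Folder and file extension per datastore, as described in the text above.
_LAYOUT = {
    "duckdb": (Path(".geistdata") / "duckdb", ".duckdb"),
    "rdflib": (Path(".geistdata") / "rdflib", ".pkl"),
}


def expected_dataset_path(datastore: str, dataset: str) -> Optional[Path]:
    """Hypothetical helper (not a Geist API): where a dataset would live on disk.

    Returns None for the reserved name ':memory:', which has no file.
    """
    if datastore not in _LAYOUT:
        raise ValueError(f"unknown datastore: {datastore!r}")
    if dataset == ":memory:":
        return None
    folder, extension = _LAYOUT[datastore]
    return folder / f"{dataset}{extension}"


print(expected_dataset_path("duckdb", "test"))
print(expected_dataset_path("rdflib", "test"))
print(expected_dataset_path("duckdb", ":memory:"))
```

A check like this can be handy in scripts that need to know whether a dataset file should already exist before calling create.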
Usage: geist create [OPTIONS] COMMAND [ARGS]...
Create a new dataset
Options:
--help Show this message and exit.
Commands:
duckdb Create a new SQL dataset using DuckDB
rdflib Create a new RDF dataset using RDFLib
geist create duckdb [OPTIONS]
Usage: geist create duckdb [OPTIONS]
Create a new SQL dataset using DuckDB
Options:
-d, --dataset TEXT Name of SQL dataset to create (default "kb")
-ifile, --inputfile FILENAME Path of the file to be loaded as a Pandas
DataFrame [required]
-iformat, --inputformat [csv|json]
Format of the file to be loaded as a Pandas
DataFrame (default csv)
-t, --table TEXT Name of the table to be created (default
"df")
--help Show this message and exit.
Example 1: create a test SQL dataset from stdin
geist create duckdb --dataset test --inputformat csv --table df << __END_INPUT__
v1,v2,v3
1,2,3
4,5,6
7,8,9
__END_INPUT__
Example 2: create a test SQL dataset from a file
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
geist create duckdb --dataset test --inputfile test.csv --inputformat csv --table df
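If you drive the CLI from a script, it can help to assemble the command as a list rather than a single string. The sketch below uses only the options listed in the help text above; the function name build_create_duckdb_argv is hypothetical, and the block only builds and prints the command without running it.

```python
import shlex
from typing import List, Optional


def build_create_duckdb_argv(dataset: str = "kb",
                             inputfile: Optional[str] = None,
                             inputformat: str = "csv",
                             table: str = "df") -> List[str]:
    """Hypothetical helper: assemble the argv list for `geist create duckdb`."""
    if dataset == ":memory:":
        # The text above states that this reserved name is not allowed in the CLI.
        raise ValueError("':memory:' is reserved and is not allowed in the CLI")
    argv = ["geist", "create", "duckdb",
            "--dataset", dataset,
            "--inputformat", inputformat,
            "--table", table]
    if inputfile is not None:
        argv += ["--inputfile", inputfile]
    return argv


argv = build_create_duckdb_argv(dataset="test", inputfile="test.csv")
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True). When inputfile is left out, the input would need to be supplied on stdin, as in Example 1.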
geist create rdflib [OPTIONS]
Usage: geist create rdflib [OPTIONS]
Create a new RDF dataset
Options:
-d, --dataset TEXT Name of RDF dataset to create (default "kb")
-ifile, --inputfile FILENAME Path of the file to be loaded as triples
[required]
-iformat, --inputformat [xml|n3|turtle|nt|pretty-xml|trix|trig|nquads|json-ld|hext|csv]
Format of the file to be loaded as triples
(default json-ld)
--colnames TEXT Column names of triples with the format of
[[subject1, predicate1, object1], [subject2,
predicate2, object2], ...] when the input
format is csv
--infer [none|rdfs|owl|rdfs_owl]
Inference to perform on update [none, rdfs,
owl, rdfs_owl] (default "none")
--help Show this message and exit.
Example 1: create a test RDF dataset from stdin
geist create rdflib --dataset test --inputformat nt --infer none << __END_INPUT__
<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .
__END_INPUT__
Example 2: create a test RDF dataset from a file
Here is the test.nt file:
<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .
Code:
geist create rdflib --dataset test --inputfile test.nt --inputformat nt --infer none
The create function creates a new dataset on disk or in memory.
Parameter descriptions for create():
Name | Type | Description | Default |
---|---|---|---|
datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
dataset | string | Name of the dataset to be created. Note that ':memory:' is a reserved value for datasets that exist only in memory | REQUIRED |
inputfile | string | A file to be loaded | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
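The isinputpath flag decides whether inputfile is read as a path or used directly as content. The sketch below illustrates that distinction with a hypothetical read_input helper; it is not Geist's implementation, only a way to see that both calling styles are meant to end up with the same text.

```python
import tempfile
from pathlib import Path


def read_input(inputfile: str, isinputpath: bool) -> str:
    """Hypothetical helper: resolve the input text the way the flag describes."""
    if isinputpath:
        # inputfile is a path: load the content from disk.
        return Path(inputfile).read_text(encoding="utf-8")
    # inputfile already holds the content itself.
    return inputfile


csv_str = "v1,v2,v3\n1,2,3\n4,5,6\n7,8,9\n"

# Write the same content to a temporary file to compare both modes.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "test.csv"
    csv_path.write_text(csv_str, encoding="utf-8")
    from_path = read_input(str(csv_path), isinputpath=True)

from_string = read_input(csv_str, isinputpath=False)
print(from_path == from_string)
```

Passing a path with isinputpath=False would make the path string itself the content, which is an easy mistake to make when switching between the two modes.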
Description for the config parameter:
datastore: duckdb
Name | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be created | 'df' |
Example 1: create a test SQL dataset on disk from a string
The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 2: create a test SQL dataset on disk from a file
The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
Example 3: create a SQL dataset in memory from a string
A DuckDBPyConnection object is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 4: create a SQL dataset in memory from a file
A DuckDBPyConnection object is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
datastore: rdflib
Name | Type | Description | Default |
---|---|---|---|
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
infer | string | Inference to perform on update, i.e., 'none', 'rdfs', 'owl', or 'rdfs_owl' | 'none' |
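The colnames value is a string that encodes a list of three-item lists, one per (subject, predicate, object) column group. As a rough illustration of that format, the sketch below parses such a string and reads CSV rows as plain triples. The rows_to_triples helper is hypothetical and uses only the standard library; how Geist actually converts cells into RDF terms (URIs versus literals, for example) is not shown here and may differ.

```python
import ast
import csv
import io
from typing import List, Tuple


def rows_to_triples(csv_text: str, colnames: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: read CSV rows as (subject, predicate, object) tuples."""
    # colnames is a string, so parse it into a list of 3-item lists first.
    groups = ast.literal_eval(colnames)
    for group in groups:
        if len(group) != 3:
            raise ValueError(f"expected 3 column names, got {group!r}")
    # Strip surrounding blank lines so the first line is the CSV header.
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    triples = []
    for row in reader:
        for subject_col, predicate_col, object_col in groups:
            triples.append((row[subject_col], row[predicate_col], row[object_col]))
    return triples


csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

triples = rows_to_triples(csv_str, "[['subject', 'predicate', 'object']]")
for triple in triples:
    print(triple)
```

With more than one column group in colnames, each CSV row would contribute one triple per group under this reading of the format.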
Example 1: create a test RDF dataset on disk from a string
The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 2: create a test RDF dataset on disk from a file
The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 3: create an RDF dataset in memory from a string
A GeistGraph object is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 4: create an RDF dataset in memory from a file
A GeistGraph object is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile='test.csv', inputformat='csv', isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
The create method of the Connection class creates a new dataset on disk or in memory. It is very similar to the create() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they have already been specified when initializing the Connection class.
Parameter descriptions for the create method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
inputfile | string | A file to be loaded | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
Key | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be created | 'df' |
Example 1: create a test SQL dataset from a string
The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 2: create a test SQL dataset from a file
The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
Example 3: create a SQL dataset in memory from a string
A Connection instance is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 4: create a SQL dataset in memory from a file
A Connection instance is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
datastore: rdflib
Key | Type | Description | Default |
---|---|---|---|
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
infer | string | Inference to perform on update, i.e., 'none', 'rdfs', 'owl', or 'rdfs_owl' | 'none' |
Example 1: create a test RDF dataset from a string
The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 2: create a test RDF dataset from a file
The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 3: create an RDF dataset in memory from a string
A Connection instance is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 4: create an RDF dataset in memory from a file
A Connection instance is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})