Class Connection

The Connection class interacts with a dataset through its create, close, destroy, export, graph, load, and query methods.

What is a Connection class?

The Connection class has three attributes:

Name	Type	Description	Default
datastore	string	A backend datastore, i.e., `'rdflib'` or `'duckdb'`	REQUIRED
dataset	string	Name of the dataset. Note that `':memory:'` is a reserved value for datasets that exist only in memory	REQUIRED
conn	object	A `DuckDBPyConnection` object OR a `GeistGraph` object	`None`

How to instantiate a Connection class?

If the dataset exists, the Connection class can be instantiated using its connect method:

# create a Connection object to an existing dataset named test
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
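
connect() needs a dataset that already exists. Judging by the paths used in the examples in this section, an on-disk DuckDB dataset lives at .geistdata/duckdb/<dataset>.duckdb and an rdflib dataset at .geistdata/rdflib/<dataset>.pkl, relative to the working directory (an assumption). The hypothetical helpers below, which are not part of geist, check for that file so you can choose between connect and one of the create approaches:

```python
import os

# Extensions taken from the example paths in this document (assumption, not geist's API).
EXTENSIONS = {"duckdb": ".duckdb", "rdflib": ".pkl"}


def dataset_path(datastore, dataset, root=".geistdata"):
    """Return the expected on-disk path of a dataset, or None for ':memory:'."""
    if datastore not in EXTENSIONS:
        raise ValueError(f"unsupported datastore: {datastore!r}")
    if dataset == ":memory:":
        return None  # reserved value: the dataset exists only in memory
    return os.path.join(root, datastore, dataset + EXTENSIONS[datastore])


def dataset_exists(datastore, dataset, root=".geistdata"):
    """True if the dataset file is present, i.e. connect() should be usable."""
    path = dataset_path(datastore, dataset, root)
    return path is not None and os.path.isfile(path)


print(dataset_path("duckdb", "test"))  # .geistdata/duckdb/test.duckdb (POSIX separators)
print(dataset_exists("rdflib", ":memory:"))  # False: in-memory datasets are never on disk
```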

If the dataset does not exist, there are two approaches to create and connect:

Approach 1: use the create function, then initialize the Connection class

create function

The create() function creates a new dataset on disk or in memory.

Parameter descriptions for create():

Name	Type	Description	Default
datastore	string	A backend datastore, i.e., `'rdflib'` or `'duckdb'`	REQUIRED
dataset	string	Name of the dataset to be created. Note that `':memory:'` is a reserved value for datasets that exist only in memory	REQUIRED
inputfile	string	Path of the file to be loaded, or the file content itself, depending on isinputpath	REQUIRED
inputformat	string	Format of the file to be loaded	REQUIRED
isinputpath	bool	`True` if inputfile is a file path; `False` if inputfile is the content itself	REQUIRED
config	dict	A dictionary of settings for the selected backend datastore	see below
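The isinputpath flag decides how inputfile is interpreted. The sketch below uses a hypothetical read_input helper (not geist's own code) to show that the string and file variants used in the examples that follow should yield the same content:

```python
import os
import tempfile


def read_input(inputfile, isinputpath):
    """Hypothetical helper: return the text that would be loaded.

    isinputpath=True  -> inputfile is a path, so read the file.
    isinputpath=False -> inputfile already is the content, so use it as-is.
    """
    if isinputpath:
        with open(inputfile, encoding="utf-8") as f:
            return f.read()
    return inputfile


csv_str = "v1,v2,v3\n1,2,3\n4,5,6\n7,8,9\n"

# Passing the content directly (isinputpath=False)
from_string = read_input(csv_str, isinputpath=False)

# Passing a path to a file with the same content (isinputpath=True)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_str)
    from_file = read_input(path, isinputpath=True)

print(from_string == from_file)  # True
```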

Description for the config parameter:

datastore: duckdb

Name	Type	Description	Default
table	string	Name of the table to be created	`'df'`
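Conceptually, the duckdb backend turns the CSV input into a table named by config['table'], falling back to 'df'. The sketch below illustrates that idea with Python's built-in sqlite3 module standing in for DuckDB. This substitution is an assumption made only so the example runs without extra packages; it is not how geist is implemented, and unlike DuckDB it keeps the values as text instead of inferring numeric types:

```python
import csv
import io
import sqlite3


def csv_to_table(conn, csv_text, config=None):
    """Create a table from CSV text; the table name defaults to 'df'."""
    table = {"table": "df", **(config or {})}["table"]
    reader = csv.reader(io.StringIO(csv_text.strip()))
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return table


csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

conn = sqlite3.connect(":memory:")  # sqlite3 also uses ':memory:' for in-memory databases
table = csv_to_table(conn, csv_str)
print(table, conn.execute(f'SELECT * FROM "{table}"').fetchall())
# df [('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
```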

Example 1: create a test SQL dataset on disk from a string

The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 2: create a test SQL dataset on disk from a file

The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

Example 3: create a SQL dataset in memory from a string

A DuckDBPyConnection object is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 4: create a SQL dataset in memory from a file

A DuckDBPyConnection object is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

datastore: rdflib

Name	Type	Description	Default
colnames	string	Column names of triples with the format of `[[subject1, predicate1, object1], [subject2, predicate2, object2], ...]`	REQUIRED when `inputformat='csv'`
infer	string	Inference to perform on update, i.e., `'none'`, `'rdfs'`, `'owl'`, or `'rdfs_owl'`	`'none'`
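The colnames value is a string holding a list of [subject, predicate, object] column-name groups, so a single CSV row can yield more than one triple. A minimal sketch of that interpretation, using a hypothetical csv_to_triples helper (not geist's code) that parses the string with ast.literal_eval and keeps quoted literals such as "Hello World" verbatim:

```python
import ast
import csv
import io


def csv_to_triples(csv_text, colnames):
    """Turn CSV rows into (subject, predicate, object) tuples.

    colnames is a string such as "[['subject', 'predicate', 'object']]";
    each inner list names the three columns that form one triple.
    """
    groups = ast.literal_eval(colnames)  # safely parse the string into nested lists
    # QUOTE_NONE keeps the double quotes around literals like "Hello World"
    rows = csv.DictReader(io.StringIO(csv_text.strip()), quoting=csv.QUOTE_NONE)
    return [(row[s], row[p], row[o]) for row in rows for s, p, o in groups]


csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

triples = csv_to_triples(csv_str, "[['subject', 'predicate', 'object']]")
for triple in triples:
    print(triple)
```

With quoting disabled, a literal that itself contains a comma would be split across columns, which a real loader has to handle.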

Example 1: create a test RDF dataset on disk from a string

The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 2: create a test RDF dataset on disk from a file

The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 3: create an RDF dataset in memory from a string

A GeistGraph object is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 4: create an RDF dataset in memory from a file

A GeistGraph object is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile='test.csv', inputformat='csv', isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

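To finish Approach 1, pass the object returned by create() to the Connection constructor through the conn parameter:

import geist

csv_str = """
v1,v2,v3
1,2,3
7,8,9
"""

# Create an in-memory dataset and get its DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
# Wrap the DuckDBPyConnection object in a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)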

Approach 2: use the create method of the Connection class

create method of the Connection class

The create method of the Connection class creates a new dataset on disk or in memory. It is very similar to the create() function. The only difference is that the datastore and the dataset parameters do not need to be passed, as they have already been specified while initializing the Connection class.
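The relationship between the method and the function can be pictured as a thin wrapper that remembers datastore and dataset and forwards everything else. The class below is a hypothetical illustration, not geist's source; create_fn is a stand-in for geist.create so the sketch runs on its own, and returning self mirrors the statement that a Connection instance is returned:

```python
class ConnectionSketch:
    """Hypothetical wrapper: stores datastore/dataset once, forwards the rest."""

    def __init__(self, datastore, dataset, conn=None, create_fn=None):
        self.datastore = datastore
        self.dataset = dataset
        self.conn = conn
        self._create_fn = create_fn  # stand-in for geist.create

    def create(self, inputfile, inputformat, isinputpath, config=None):
        # Same call as the create() function, minus datastore and dataset
        self.conn = self._create_fn(
            datastore=self.datastore,
            dataset=self.dataset,
            inputfile=inputfile,
            inputformat=inputformat,
            isinputpath=isinputpath,
            config=config or {},
        )
        return self


calls = []


def fake_create(**kwargs):
    """Stub that records its arguments instead of building a dataset."""
    calls.append(kwargs)
    return {"created": kwargs["dataset"]}


connection = ConnectionSketch("duckdb", ":memory:", create_fn=fake_create)
connection.create(inputfile="v1,v2\n1,2\n", inputformat="csv", isinputpath=False, config={"table": "df"})
print(calls[0]["datastore"], calls[0]["dataset"])  # duckdb :memory:
```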

Parameter descriptions for the create method of the Connection class:

Name	Type	Description	Default
inputfile	string	Path of the file to be loaded, or the file content itself, depending on isinputpath	REQUIRED
inputformat	string	Format of the file to be loaded	REQUIRED
isinputpath	bool	`True` if inputfile is a file path; `False` if inputfile is the content itself	REQUIRED
config	dict	A dictionary of settings for the selected backend datastore	see below

Description for the config parameter:

datastore: duckdb

Key	Type	Description	Default
table	string	Name of the table to be created	`'df'`

Example 1: create a test SQL dataset from a string

The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 2: create a test SQL dataset from a file

The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

Example 3: create a SQL dataset in memory from a string

A Connection instance is returned.

import geist

csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})

Example 4: create a SQL dataset in memory from a file

A Connection instance is returned.

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})

datastore: rdflib

Key	Type	Description	Default
colnames	string	Column names of triples with the format of `[[subject1, predicate1, object1], [subject2, predicate2, object2], ...]`	REQUIRED when `inputformat='csv'`
infer	string	Inference to perform on update, i.e., `'none'`, `'rdfs'`, `'owl'`, or `'rdfs_owl'`	`'none'`

Example 1: create a test RDF dataset from a string

The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 2: create a test RDF dataset from a file

The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 3: create an RDF dataset in memory from a string

A Connection instance is returned.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

Example 4: create an RDF dataset in memory from a file

A Connection instance is returned.

Here is the test.csv file:

subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})

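In short, Approach 2 instantiates the Connection class first and then calls its create method:

import geist

csv_str = """
v1,v2,v3
1,2,3
7,8,9
"""

# Create a Connection instance, then create the in-memory dataset through it
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})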

How to interact with a Connection instance?

Once the Connection class is instantiated, we can interact with the dataset using its close, destroy, export, graph, load, and query methods.

close method

The close method of the Connection class closes the dataset connection, that is, it resets all attributes to None. It takes no parameters.

Example: close the connection

Suppose connection is an instance of the Connection class.

# Close the connection
connection.close()

destroy method

The destroy method of the Connection class deletes the dataset and closes the dataset connection.

Example: delete the dataset and close the connection

Suppose connection is an instance of the Connection class for a DuckDB dataset named test stored on disk. The following code deletes the .geistdata/duckdb/test.duckdb file.

# Delete the dataset and close the connection
connection.destroy()
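The close and destroy behavior described above can be summarized in a small sketch. DatasetHandle is hypothetical, not geist's class. Only the reset-to-None step is documented; closing the underlying handle before deleting the file is an assumption (a DuckDBPyConnection has a close() method, while a dict-based GeistGraph does not):

```python
import os
import tempfile


class DatasetHandle:
    """Hypothetical stand-in for a connection object; not geist.Connection."""

    def __init__(self, datastore, dataset, path=None, conn=None):
        self.datastore = datastore
        self.dataset = dataset
        self.path = path  # None for ':memory:' datasets
        self.conn = conn

    def close(self):
        # Assumption: release the handle if it can be closed.
        if hasattr(self.conn, "close"):
            self.conn.close()
        # Documented behavior: reset all attributes to None.
        self.datastore = self.dataset = self.path = self.conn = None

    def destroy(self):
        path = self.path  # remember the file before close() clears it
        self.close()      # release the handle first so the file can be removed
        if path is not None and os.path.exists(path):
            os.remove(path)  # delete the on-disk dataset


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.duckdb")
    open(path, "w").close()  # empty placeholder file standing in for a dataset
    handle = DatasetHandle("duckdb", "test", path=path)
    handle.destroy()
    gone = not os.path.exists(path)
    print(gone, handle.dataset)  # True None
```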

export method

The export method of the Connection class exports a dataset. It is very similar to the export() function. The only difference is that the datastore and the dataset parameters do not need to be passed, as they have already been specified while initializing the Connection class.

Parameter descriptions for the export method of the Connection class:

Name	Type	Description	Default
hasoutput	bool	`True` to export as a file or print it out; `False` to return the exported data	REQUIRED
config	dict	A dictionary of settings for the selected backend datastore	see below

Description for the config parameter:

datastore: duckdb

Key	Type	Description	Default
outputroot	string	Path of the directory to store the exported table	`'./'`
outputfile	string	Path of the file to store the exported table	`None`
outputformat	string	Format of the exported table, i.e., `'csv'` or `'json'`	`'csv'`
table	string	Name of the table to be exported	`'df'`
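To make the duckdb export keys concrete, here is a sketch of how outputroot, outputfile, and outputformat could combine; the table key, which selects the table to read, is left out. export_rows is hypothetical and only covers formatting and file placement. The JSON layout (one object per row) and returning the text when outputfile is None are assumptions, not documented geist behavior:

```python
import csv
import io
import json
import os


def export_rows(columns, rows, config=None):
    """Format rows as CSV or JSON and optionally write them under outputroot."""
    cfg = {"outputroot": "./", "outputfile": None, "outputformat": "csv", **(config or {})}
    if cfg["outputformat"] == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
        text = buf.getvalue()
    elif cfg["outputformat"] == "json":
        text = json.dumps([dict(zip(columns, row)) for row in rows])
    else:
        raise ValueError(f"unsupported outputformat: {cfg['outputformat']!r}")
    if cfg["outputfile"] is None:
        return text  # nothing to write; the caller can print it
    path = os.path.join(cfg["outputroot"], cfg["outputfile"])
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return path


print(export_rows(["v1", "v2", "v3"], [(1, 2, 3), (4, 5, 6)]), end="")
# v1,v2,v3
# 1,2,3
# 4,5,6
```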

Example 1: export all rows of the df table in the test dataset on disk

Suppose the file .geistdata/duckdb/test.duckdb exists. The following code returns a pandas DataFrame named data.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})

Example 2: export all rows of the df table in a dataset in memory

Suppose conn is a DuckDBPyConnection object that points to a DuckDB dataset in memory. The following code returns a pandas DataFrame named data.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})

datastore: rdflib

Key	Type	Description	Default
outputroot	string	Path of the directory to store these exported triples	`'./'`
outputfile	string	Path of the file to store these exported triples	`None`
outputformat	string	Format of the exported triples, i.e., `'json-ld'`, `'n3'`, `'nquads'`, `'nt'`, `'hext'`, `'pretty-xml'`, `'trig'`, `'trix'`, `'turtle'`, `'longturtle'`, or `'xml'`	`'nt'`

Example 1: export all triples of the test dataset on disk

Suppose the file .geistdata/rdflib/test.pkl exists. The following code returns a string named data.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Export all triples of the test dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)

Example 2: export all triples of the test dataset in memory

Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code returns a string named data.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Export all triples of the in-memory dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)

graph method

The graph method of the Connection class visualizes a dataset as a graph. Only rdflib is supported for now. It is very similar to the graph() function. The only difference is that the datastore and the dataset parameters do not need to be passed, as they have already been specified while initializing the Connection class.

Parameter descriptions for the graph method of the Connection class:

Name	Type	Description	Default
hasoutput	bool	`True` to export as a file or print it out	REQUIRED
config	dict	A dictionary of settings for the selected backend datastore	see below

Description for the config parameter:

datastore: rdflib

Name	Type	Description	Default
rankdir	string	Direction of the graph: `'TB'` or `'BT'` or `'LR'` or `'RL'`	`'TB'`
mappings	string	Path of a JSON file of mappings used to shorten text, where each key is the original text and each value is the shorter text	`None`
on	string	Column(s) to be mapped	`None`
samecolor	bool	`True` to use the same color for same edges	`True`
outputroot	string	Path of the directory to store the graph	`'./'`
outputfile	string	Path of the file without extension to store the graph	`'res'`
outputformats	list	Format of the graph: `'none'` or `'svg'` or `'png'` or `'gv'`	`['none']`
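The mappings option is easiest to understand on a few triples. The sketch below assumes exact-match replacement of whole values and that on names the columns subject, predicate, or object; both are assumptions, and shorten is a hypothetical helper, not geist's graph code. The mappings dict is what json.load would return for a JSON file like the one described above:

```python
import json

POSITIONS = {"subject": 0, "predicate": 1, "object": 2}


def shorten(triples, mappings, on=("subject", "predicate", "object")):
    """Replace values found in mappings (original -> shorter) in the chosen columns."""
    if isinstance(on, str):
        on = [on]  # accept a single column name or a list of names
    targets = {POSITIONS[name] for name in on}
    return [
        tuple(mappings.get(value, value) if i in targets else value for i, value in enumerate(triple))
        for triple in triples
    ]


# Contents of a hypothetical mappings JSON file
mappings = json.loads("""{
    "<http://example.com/drewp>": "drewp",
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>": "rdf:type"
}""")

triples = [("<http://example.com/drewp>", "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "<http://xmlns.com/foaf/0.1/Person>")]
print(shorten(triples, mappings, on="subject"))
# [('drewp', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://xmlns.com/foaf/0.1/Person>')]
```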

Example 1: visualize the test dataset on disk

Suppose the file .geistdata/rdflib/test.pkl exists. The following code visualizes the test dataset as a graph and saves it as the res.svg file.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Visualize the test dataset
connection.graph(hasoutput=True, config={'outputformats': ['svg']})

Example 2: visualize the test dataset in memory

Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code visualizes the dataset as a graph and saves it as the res.svg file.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Visualize the test dataset
connection.graph(hasoutput=True, config={'outputformats': ['svg']})

load method

The load method of the Connection class imports data into an existing dataset on disk or in memory. It is very similar to the load() function. The only difference is that the datastore, dataset, and inmemory parameters do not need to be passed, as they have already been specified while initializing the Connection class.

Parameter descriptions for the load method of the Connection class:

Name	Type	Description	Default
inputfile	string	Path of the file to be loaded, or the file content itself, depending on isinputpath	REQUIRED
inputformat	string	Format of the file to be loaded	REQUIRED
isinputpath	bool	`True` if inputfile is a file path; `False` if inputfile is the content itself	REQUIRED
config	dict	A dictionary of settings for the selected backend datastore	see below

Description for the config parameter:

datastore: duckdb

Name	Type	Description	Default
table	string	Name of the table to be loaded	REQUIRED

Example: load a table into the test dataset

Suppose the file .geistdata/duckdb/test.duckdb exists. The rows in csv_str will be imported into the df table. Note that the column order of the table should match the column order of the imported data.

import geist

csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Load csv_str to the df table of the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})
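Because the column order of the loaded data must agree with the existing table (presumably because rows are appended by position rather than matched by name), it can help to compare the CSV header with the table's columns before calling load. check_column_order is a hypothetical helper, and the table columns are passed in by hand rather than read from DuckDB:

```python
import csv
import io


def check_column_order(csv_text, table_columns):
    """Return the CSV header if it matches table_columns exactly, in order."""
    header = next(csv.reader(io.StringIO(csv_text.strip())))
    if header != list(table_columns):
        raise ValueError(f"CSV columns {header} do not match table columns {list(table_columns)}")
    return header


csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""

print(check_column_order(csv_str, ["v1", "v2", "v3"]))  # ['v1', 'v2', 'v3']
try:
    check_column_order("v3,v2,v1\n3,2,1\n", ["v1", "v2", "v3"])
except ValueError as err:
    print(err)  # reordered columns are rejected
```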

datastore: rdflib

Name	Type	Description	Default
colnames	string	Column names of triples with the format of `[[subject1, predicate1, object1], [subject2, predicate2, object2], ...]`	REQUIRED when `inputformat='csv'`

Example: load a triple into the test dataset

Suppose the file .geistdata/rdflib/test.pkl exists. The triple in csv_str will be imported into the test RDF dataset.

import geist

csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""

# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Load csv_str to the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})

query method

The query method of the Connection class queries a dataset stored on disk or in memory. It is very similar to the query() function. The only difference is that the datastore and the dataset parameters do not need to be passed, as they have already been specified while initializing the Connection class.

Parameter descriptions for the query method of the Connection class:

Name	Type	Description	Default
inputfile	string	File containing the query	REQUIRED
isinputpath	bool	`True` if the inputfile is the file path, otherwise the inputfile is the content	REQUIRED
hasoutput	bool	`True` to store the query results as a CSV file or print them out	REQUIRED
config	dict	A dictionary with configurations when `hasoutput=True`	see below

Description for the config parameter:

Name	Type	Description	Default
outputroot	string	Path of the directory to store the query results	`'./'`
outputfile	string	Path of the file to store the query results	`None`

Example 1: query all rows of the df table in the test dataset on disk (query from a string)

Suppose the file .geistdata/duckdb/test.duckdb exists. The following code returns a pandas DataFrame named res with the query results.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Query the df table of the test dataset
res = connection.query(inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)

Example 2: query all rows of the df table in the test dataset on disk (query from a file)

Suppose the file .geistdata/duckdb/test.duckdb exists. The following code returns a pandas DataFrame named res with the query results.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Query the df table of the test dataset
res = connection.query(inputfile="query.txt", isinputpath=True, hasoutput=False)

Example 3: query all rows of the df table in a dataset in memory (query from a string)

Suppose conn is a DuckDBPyConnection object that points to a DuckDB dataset in memory. The following code returns a pandas DataFrame named res with the query results.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Query the df table of the test dataset
res = connection.query(inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)

Example 4: query all rows of the df table in a dataset in memory (query from a file)

Suppose conn is a DuckDBPyConnection object that points to a DuckDB dataset in memory. The following code returns a pandas DataFrame named res with the query results.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Query the df table of the test dataset
res = connection.query(inputfile="query.txt", isinputpath=True, hasoutput=False)
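The query examples differ only in where the SQL comes from and whether results are kept in memory or written out. The sketch below reproduces that flow with sqlite3 standing in for DuckDB and a list of tuples instead of a pandas DataFrame; both substitutions are assumptions so it runs with the standard library alone, and run_query is hypothetical, not geist's query method:

```python
import csv
import os
import sqlite3


def run_query(conn, inputfile, isinputpath, hasoutput=False, config=None):
    """Run one SQL statement given as text or as a path to a file containing it."""
    if isinputpath:
        with open(inputfile, encoding="utf-8") as f:
            sql = f.read()
    else:
        sql = inputfile
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
    if not hasoutput:
        return columns, rows  # geist returns a pandas DataFrame here instead
    cfg = {"outputroot": "./", "outputfile": None, **(config or {})}
    if cfg["outputfile"] is None:
        for row in [columns, *rows]:
            print(",".join(map(str, row)))  # print the results
        return None
    path = os.path.join(cfg["outputroot"], cfg["outputfile"])
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows([columns, *rows])  # store the results as CSV
    return path


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (v1 INTEGER, v2 INTEGER, v3 INTEGER)")
conn.executemany("INSERT INTO df VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
columns, rows = run_query(conn, "SELECT * FROM df;", isinputpath=False)
print(columns, rows)  # ['v1', 'v2', 'v3'] [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```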