Class Connection
The Connection class interacts with a dataset through its create, close, destroy, export, graph, load, and query methods.
What is a Connection class?
A Connection class has three attributes:
Name | Type | Description | Default |
---|---|---|---|
datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
dataset | string | Name of the dataset. Note that ':memory:' is a reserved value for datasets that exist only in memory | REQUIRED |
conn | object | A DuckDBPyConnection object OR a GeistGraph object | None |
How to instantiate a Connection class?
If the dataset exists, the Connection class can be instantiated using its connect method:
# create a Connection object to an existing dataset named test
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
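Before calling connect, it can help to confirm that the dataset file is actually on disk. The sketch below assumes the on-disk layout that the examples on this page show (.geistdata/duckdb/test.duckdb and .geistdata/rdflib/test.pkl). The helper names dataset_path and dataset_exists are hypothetical and are not part of geist.

```python
from pathlib import Path

# File extensions inferred from the examples on this page (an assumption,
# not a documented contract of geist).
_EXTENSIONS = {"duckdb": ".duckdb", "rdflib": ".pkl"}


def dataset_path(datastore: str, dataset: str, root: str = ".geistdata") -> Path:
    """Return the path where an on-disk dataset is expected to live."""
    if dataset == ":memory:":
        raise ValueError("':memory:' datasets are not stored on disk")
    if datastore not in _EXTENSIONS:
        raise ValueError(f"unsupported datastore: {datastore!r}")
    return Path(root) / datastore / f"{dataset}{_EXTENSIONS[datastore]}"


def dataset_exists(datastore: str, dataset: str, root: str = ".geistdata") -> bool:
    """Return True if the expected dataset file is present."""
    return dataset_path(datastore, dataset, root).is_file()
```

If dataset_exists returns False, one of the two creation approaches is the way to go instead of connect.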
If the dataset does not exist, there are two approaches to create and connect:
Approach 1: use the create function, then initialize the Connection class
create function
The create function creates a new dataset on disk or in memory.
Parameters of create():
Name | Type | Description | Default |
---|---|---|---|
datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
dataset | string | Name of the dataset to be created. Note that ':memory:' is a reserved value for datasets that exist only in memory | REQUIRED |
inputfile | string | Path of the file to be loaded, or the content itself (see isinputpath) | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
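The interplay between inputfile and isinputpath can be pictured with a small helper. This is only a sketch of the documented behavior, not geist's implementation, and read_input is a hypothetical name.

```python
from pathlib import Path


def read_input(inputfile: str, isinputpath: bool) -> str:
    """Return the text to load: the file's contents if isinputpath is True,
    otherwise the inputfile string itself."""
    if isinputpath:
        return Path(inputfile).read_text(encoding="utf-8")
    return inputfile
```

With isinputpath=False the string is used as is, which is why the examples can pass a CSV string directly.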
Description for the config parameter:
datastore: duckdb
Name | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be created | 'df' |
Example 1: create a test SQL dataset on disk from a string
The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 2: create a test SQL dataset on disk from a file
The .geistdata/duckdb/test.duckdb file is created and a DuckDBPyConnection object is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
Example 3: create a SQL dataset in memory from a string
A DuckDBPyConnection object is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 4: create a SQL dataset in memory from a file
A DuckDBPyConnection object is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a DuckDBPyConnection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
datastore: rdflib
Name | Type | Description | Default |
---|---|---|---|
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
infer | string | Inference to perform on update, i.e., 'none', 'rdfs', 'owl', or 'rdfs_owl' | 'none' |
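To make the colnames format concrete, the sketch below expands each CSV row into one triple per [subject, predicate, object] group. It assumes the colnames string can be read as a Python literal; how geist actually parses it and builds RDF terms may differ, and rows_to_triples is a hypothetical helper. Note that Python's csv module strips the quotes around "Hello World".

```python
import ast
import csv
import io


def rows_to_triples(csv_text: str, colnames: str) -> list:
    """Expand each CSV row into one (subject, predicate, object) tuple
    per column group listed in colnames."""
    # Assumption: colnames is a Python-literal list of 3-item lists.
    groups = ast.literal_eval(colnames)
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    triples = []
    for row in reader:
        for s_col, p_col, o_col in groups:
            triples.append((row[s_col], row[p_col], row[o_col]))
    return triples


csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
triples = rows_to_triples(csv_str, "[['subject', 'predicate', 'object']]")
```

A CSV with six columns and two groups in colnames would yield two triples per row under the same reading.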
Example 1: create a test RDF dataset on disk from a string
The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 2: create a test RDF dataset on disk from a file
The .geistdata/rdflib/test.pkl file is created and a GeistGraph object is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset='test', inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 3: create an RDF dataset in memory from a string
A GeistGraph object is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 4: create an RDF dataset in memory from a file
A GeistGraph object is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a GeistGraph object: a dictionary with 'rdf_graph' and 'infer' keys
conn = geist.create(datastore='rdflib', dataset=':memory:', inputfile='test.csv', inputformat='csv', isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
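Example: create a dataset in memory with the create function, then pass the returned object to the Connection class
import geist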
csv_str = """
v1,v2,v3
1,2,3
7,8,9
"""
# Create the dataset, then wrap the returned DuckDBPyConnection in a Connection object
conn = geist.create(datastore='duckdb', dataset=':memory:', inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
Approach 2: use the create method of the Connection class
create method of the Connection class
The create method of the Connection class creates a new dataset on disk or in memory. It is very similar to the create() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they were already specified when initializing the Connection class.
Parameters description for create method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
inputfile | string | A file to be loaded | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
Key | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be created | 'df' |
Example 1: create a test SQL dataset from a string
The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 2: create a test SQL dataset from a file
The .geistdata/duckdb/test.duckdb file is created and a Connection instance is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
Example 3: create a SQL dataset in memory from a string
A Connection instance is returned.
import geist
csv_str = """
v1,v2,v3
1,2,3
4,5,6
7,8,9
"""
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
Example 4: create a SQL dataset in memory from a file
A Connection instance is returned.
Here is the test.csv file:
v1,v2,v3
1,2,3
4,5,6
7,8,9
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"table": "df"})
datastore: rdflib
Key | Type | Description | Default |
---|---|---|---|
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
infer | string | Inference to perform on update, i.e., 'none', 'rdfs', 'owl', or 'rdfs_owl' | 'none' |
Example 1: create a test RDF dataset from a string
The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 2: create a test RDF dataset from a file
The .geistdata/rdflib/test.pkl file is created and a Connection instance is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset='test')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 3: create an RDF dataset in memory from a string
A Connection instance is returned.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
"""
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example 4: create an RDF dataset in memory from a file
A Connection instance is returned.
Here is the test.csv file:
subject,predicate,object
<http://example.com/drewp>,<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,<http://xmlns.com/foaf/0.1/Person>
<http://example.com/drewp>,<http://example.com/says>,"Hello World"
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:')
connection.create(inputfile="test.csv", inputformat="csv", isinputpath=True, config={"colnames": "[['subject', 'predicate', 'object']]"})
Example: initialize the Connection class for a dataset in memory, then call its create method
import geist
csv_str = """
v1,v2,v3
1,2,3
7,8,9
"""
# create a Connection object
connection = geist.Connection(datastore='duckdb', dataset=':memory:')
connection.create(inputfile=csv_str, inputformat="csv", isinputpath=False, config={"table": "df"})
How to interact with a Connection class?
Once a Connection class is instantiated, we can interact with it using the close, destroy, export, graph, load, and query methods.
close method
The close method of the Connection class closes the dataset connection, i.e., resets all attributes to None. No parameters are required.
Example: close the connection
Suppose connection is an instance of the Connection class.
# Close the connection
connection.close()
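The effect described above (all attributes reset to None) can be illustrated with a stand-in class. ConnectionSketch is hypothetical and only mirrors the three documented attributes; the real Connection class may also release resources held by the underlying conn object.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ConnectionSketch:
    """Hypothetical stand-in mirroring the three documented attributes."""
    datastore: Optional[str]
    dataset: Optional[str]
    conn: Optional[Any] = None

    def close(self) -> None:
        # Reset every attribute to None, as the description of close states.
        self.datastore = None
        self.dataset = None
        self.conn = None


sketch = ConnectionSketch(datastore="duckdb", dataset=":memory:", conn=object())
sketch.close()
```

After close, the instance no longer identifies a dataset, so it would need to be set up again before further use.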
destroy method
The destroy method of the Connection class deletes the dataset and closes the dataset connection.
Example: delete the dataset and close the connection
Suppose connection is an instance of the Connection class for a DuckDB dataset named test stored on disk. The following code deletes the .geistdata/duckdb/test.duckdb file.
# Delete the dataset and close the connection
connection.destroy()
export method
The export method of the Connection class exports a dataset. It is very similar to the export() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they were already specified when initializing the Connection class.
Parameters description for export method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
hasoutput | bool | True to export as a file or print it out | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
Key | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store the exported table | './' |
outputfile | string | Path of the file to store the exported table | None |
outputformat | string | Format of the exported table, i.e., 'csv' or 'json' | 'csv' |
table | string | Name of the table to be exported | 'df' |
Example 1: export all rows of the df table in the test dataset on disk
Suppose the file .geistdata/duckdb/test.duckdb exists. The following code returns a Pandas data frame named data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})
Example 2: export all rows of the df table in the test dataset in memory
Suppose conn is a DuckDBPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Export the df table of the test dataset
data = connection.export(hasoutput=False, config={'table': 'df'})
datastore: rdflib
Key | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store these exported triples | './' |
outputfile | string | Path of the file to store these exported triples | None |
outputformat | string | Format of the exported triples, i.e., 'json-ld', 'n3', 'nquads', 'nt', 'hext', 'pretty-xml', 'trig', 'trix', 'turtle', 'longturtle', or 'xml' | 'nt' |
Example 1: export all triples of the test dataset on disk
Suppose the file .geistdata/rdflib/test.pkl exists. The following code returns a string named data.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Export all triples of the test dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)
Example 2: export all triples of the test dataset in memory
Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code returns a string named data.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Export all triples of the in-memory dataset as a string with the 'nt' format
data = connection.export(hasoutput=False)
graph method
The graph method of the Connection class visualizes a dataset as a graph. Only rdflib is supported for now. It is very similar to the graph() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they were already specified when initializing the Connection class.
Parameters description for graph method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
hasoutput | bool | True to export as a file or print it out | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: rdflib
Name | Type | Description | Default |
---|---|---|---|
rankdir | string | Direction of the graph: 'TB', 'BT', 'LR', or 'RL' | 'TB' |
mappings | string | Path of a JSON file with mappings to shorten text, where the key is the original text and the value is the shorter text | None |
on | string | Column(s) to be mapped | None |
samecolor | bool | True to use the same color for same edges | True |
outputroot | string | Path of the directory to store the graph | './' |
outputfile | string | Path of the file without extension to store the graph | 'res' |
outputformats | list | Formats of the graph: 'none', 'svg', 'png', or 'gv' | ['none'] |
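The rankdir values are the standard Graphviz layout directions, and 'gv' is the usual extension for Graphviz source files. As a rough illustration of what rankdir controls, the sketch below builds DOT source with one labeled edge per triple. It is not how geist renders graphs, and triples_to_dot is a hypothetical helper.

```python
def triples_to_dot(triples, rankdir="TB"):
    """Build Graphviz DOT source with one labeled edge per triple."""
    if rankdir not in {"TB", "BT", "LR", "RL"}:
        raise ValueError(f"invalid rankdir: {rankdir!r}")

    def quote(text):
        # In DOT double-quoted IDs, only embedded double quotes need escaping.
        return '"' + str(text).replace('"', '\\"') + '"'

    lines = ["digraph {", f"  rankdir={rankdir};"]
    for s, p, o in triples:
        lines.append(f"  {quote(s)} -> {quote(o)} [label={quote(p)}];")
    lines.append("}")
    return "\n".join(lines)


dot = triples_to_dot([("drewp", "says", "Hello World")], rankdir="LR")
```

With rankdir='LR' the subject is laid out to the left of the object; 'TB' stacks them top to bottom.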
Example 1: visualize the test dataset on disk
Suppose the file .geistdata/rdflib/test.pkl exists. The following code visualizes the test dataset as a graph and saves it as the res.svg file.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Visualize the test dataset
connection.graph(hasoutput=True, config={'outputformats': ['svg']})
Example 2: visualize the test dataset in memory
Suppose conn is a GeistGraph object that points to an RDF dataset in memory. The following code visualizes the test dataset as a graph and saves it as the res.svg file.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='rdflib', dataset=':memory:', conn=conn)
# Visualize the test dataset
connection.graph(hasoutput=True, config={'outputformats': ['svg']})
load method
The load method of the Connection class imports data into an existing dataset on disk or in memory. It is very similar to the load() function. The only difference is that the datastore, dataset, and inmemory parameters do not need to be passed, as they were already specified when initializing the Connection class.
Parameters description for load method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
inputfile | string | A file to be loaded | REQUIRED |
inputformat | string | Format of the file to be loaded | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
config | dict | A dictionary with configurations for certain backend store | see below |
Description for the config parameter:
datastore: duckdb
Name | Type | Description | Default |
---|---|---|---|
table | string | Name of the table to be loaded | REQUIRED |
Example: load a table into the test dataset
Suppose the file .geistdata/duckdb/test.duckdb exists. The csv_str will be imported into the df table. Note that the column order of the imported data should match the column order of the table.
import geist
csv_str = """
v1,v2,v3
1,1,1
2,2,2
3,3,3
"""
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Load csv_str to the df table of the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={'table': 'df'})
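Since the column order of the imported data should match the table, a quick check of the CSV header before loading can catch mistakes early. The sketch below uses only the standard library; header_matches is a hypothetical helper, and the expected column list is something the caller supplies.

```python
import csv
import io


def header_matches(csv_text: str, table_columns: list) -> bool:
    """Return True if the CSV header lists exactly table_columns, in order."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    header = next(reader, [])
    return header == list(table_columns)
```

For the df table used on this page, header_matches(csv_str, ['v1', 'v2', 'v3']) would be the check to run before calling load.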
datastore: rdflib
Name | Type | Description | Default |
---|---|---|---|
colnames | string | Column names of triples with the format of [[subject1, predicate1, object1], [subject2, predicate2, object2], ...] | REQUIRED when inputformat='csv' |
Example: load a triple into the test dataset
Suppose the file .geistdata/rdflib/test.pkl exists. The csv_str will be imported into the test RDF dataset.
import geist
csv_str = """
subject,predicate,object
<http://example.com/drewp>,<http://example.com/feels>,"Happy"
"""
# Create a Connection instance
connection = geist.Connection.connect(datastore='rdflib', dataset='test')
# Load csv_str to the test dataset
connection.load(inputfile=csv_str, inputformat='csv', isinputpath=False, config={"colnames": "[['subject', 'predicate', 'object']]"})
query method
The query method of the Connection class queries a dataset stored on disk or in memory. It is very similar to the query() function. The only difference is that the datastore and dataset parameters do not need to be passed, as they were already specified when initializing the Connection class.
Parameters description for query method of the Connection class:
Name | Type | Description | Default |
---|---|---|---|
inputfile | string | File containing the query | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
hasoutput | bool | True to store the query results as a CSV file or print them out | REQUIRED |
config | dict | A dictionary with configurations when hasoutput=True | see below |
Description for the config parameter:
Name | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store the query results | './' |
outputfile | string | Path of the file to store the query results | None |
Example 1: all rows of the df table in the test dataset on disk (query from a string)
Suppose the file .geistdata/duckdb/test.duckdb exists. The following code returns a Pandas data frame named res with the query results.
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Query the df table of the test dataset
res = connection.query(inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 2: all rows of the df table in the test dataset on disk (query from a file)
Suppose the file .geistdata/duckdb/test.duckdb exists. The following code returns a Pandas data frame named res with the query results.
Here is the query.txt file:
SELECT * FROM df;
Code:
import geist
# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Query the df table of the test dataset
res = connection.query(inputfile="query.txt", isinputpath=True, hasoutput=False)
Example 3: all rows of the df table in the test dataset in memory (query from a string)
Suppose conn is a DuckDBPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results.
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Query the df table of the test dataset
res = connection.query(inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 4: all rows of the df table in the test dataset in memory (query from a file)
Suppose conn is a DuckDBPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results.
Here is the query.txt file:
SELECT * FROM df;
Code:
import geist
# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Query the df table of the test dataset
res = connection.query(inputfile="query.txt", isinputpath=True, hasoutput=False)