The query feature performs a query on a dataset stored on disk or in memory.

You can write a Geist template with the query tag. You can also use the CLI or the Python API, step by step, as follows:

The create command has three subcommands, all of which create a new dataset on disk. The dataset name :memory: is a reserved value for datasets that exist only in memory and is not allowed in the CLI.

Usage: geist create [OPTIONS] COMMAND [ARGS]...

Create a new dataset

Options:
--help  Show this message and exit.

Commands:
clingo  Create a new ASP dataset using Clingo
duckdb  Create a new SQL dataset using DuckDB
rdflib  Create a new RDF dataset using RDFLib
geist create clingo [OPTIONS]
Usage: geist create clingo [OPTIONS]

Create a new ASP dataset using Clingo

Options:
-d, --dataset TEXT              Name of ASP dataset to create (default "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as facts,
                                rules, and constraints  [required]
-iformat, --inputformat [lp|csv|json]
                                Format of the file to be loaded as facts,
                                rules, and constraints. Note that "csv" only
                                supports facts (default "lp"). If multiple
                                possibilities are provided (as a list), only
                                the first one will be considered.
-pred, --predicate TEXT         "isfirstcol" for using the first column as
                                the predicate name; strings other than
                                "isfirstcol" are used as the predicate name
                                directly (default: "isfirstcol")
-prog, --programname TEXT       Name of the program (default: "base")
--help                          Show this message and exit.
Example 1: create a test ASP dataset from stdin with LP format
geist create clingo --dataset test --inputformat lp << __END_INPUT__
friends(a, b).
friends(a, c).
__END_INPUT__
Example 2: create a test ASP dataset from a file with LP format

Here is the friends.lp file:

friends(a, b).
friends(a, c).

Code:

geist create clingo --dataset test --inputfile friends.lp --inputformat lp

Example 3: create a test ASP dataset from a CSV file

Here is the friends.csv file:

arg1,arg2
a,b
a,c

Code:

geist create clingo --dataset test --inputfile friends.csv --inputformat csv --predicate friends
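
To make the role of --predicate concrete, the following Python sketch shows one plausible way CSV rows could map to ASP facts. It is an illustration only, not Geist's implementation: the helper name csv_to_facts is hypothetical, and both skipping the first row as a header and emitting unquoted arguments are assumptions.

```python
import csv
import io


def csv_to_facts(text, predicate="isfirstcol"):
    """Turn CSV text into ASP facts (illustrative only, not Geist's code)."""
    rows = list(csv.reader(io.StringIO(text)))
    facts = []
    for row in rows[1:]:  # assumption: the first row is a header and is skipped
        if not row:
            continue  # ignore blank lines
        if predicate == "isfirstcol":
            name, args = row[0], row[1:]  # first column names the predicate
        else:
            name, args = predicate, row  # fixed predicate name for every row
        facts.append(f"{name}({', '.join(args)})." if args else f"{name}.")
    return facts


friends_csv = "arg1,arg2\na,b\na,c\n"
facts = csv_to_facts(friends_csv, predicate="friends")
print("\n".join(facts))
```

Under these assumptions, the printed facts match the contents of the friends.lp file from Example 2.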

geist create duckdb [OPTIONS]
Usage: geist create duckdb [OPTIONS]

Create a new SQL dataset using DuckDB

Options:
-d, --dataset TEXT              Name of SQL dataset to create (default "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as a Pandas
                                DataFrame  [required]
-iformat, --inputformat [csv|json]
                                Format of the file to be loaded as a Pandas
                                DataFrame (default csv)
-t, --table TEXT                Name of the table to be created (default
                                "df")
--help                          Show this message and exit.
Example 1: create a test SQL dataset from stdin
geist create duckdb --dataset test --inputformat csv --table df << __END_INPUT__
v1,v2,v3
1,2,3
4,5,6
7,8,9
__END_INPUT__
Example 2: create a test dataset from a file

Here is the test.csv file:

v1,v2,v3
1,2,3
4,5,6
7,8,9

Code:

geist create duckdb --dataset test --inputfile test.csv --inputformat csv --table df

geist create rdflib [OPTIONS]
Usage: geist create rdflib [OPTIONS]

Create a new RDF dataset

Options:
-d, --dataset TEXT              Name of RDF dataset to create (default "kb")
-ifile, --inputfile FILENAME    Path of the file to be loaded as triples
                                [required]
-iformat, --inputformat [xml|n3|turtle|nt|pretty-xml|trix|trig|nquads|json-ld|hext|csv]
                                Format of the file to be loaded as triples
                                (default json-ld)
--colnames TEXT                 Column names of triples with the format of
                                [[subject1, predicate1, object1], [subject2,
                                predicate2, object2], ...] when the input
                                format is csv
--infer [none|rdfs|owl|rdfs_owl]
                                Inference to perform on update [none, rdfs,
                                owl, rdfs_owl] (default "none")
--help                          Show this message and exit.
Example 1: create a test RDF dataset from stdin
geist create rdflib --dataset test --inputformat nt --infer none << __END_INPUT__

<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .

__END_INPUT__
Example 2: create a test dataset from a file

Here is the test.nt file:

<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .
Code:
geist create rdflib --dataset test --inputfile test.nt --inputformat nt --infer none
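
The same commands can also be driven from a Python script. The sketch below only assembles the argument list for geist create, using options that appear in the help text above. The helper names build_create_command and create_dataset are hypothetical, and actually running the command assumes the geist CLI is installed and on the PATH.

```python
import subprocess

DATASTORES = {"clingo", "duckdb", "rdflib"}


def build_create_command(datastore, dataset, inputfile, inputformat):
    """Assemble a `geist create` command line (hypothetical helper)."""
    if datastore not in DATASTORES:
        raise ValueError(f"unknown datastore: {datastore!r}")
    if dataset == ":memory:":
        # ':memory:' is reserved for in-memory datasets and not allowed in the CLI.
        raise ValueError("':memory:' cannot be used as a dataset name in the CLI")
    return [
        "geist", "create", datastore,
        "--dataset", dataset,
        "--inputfile", inputfile,
        "--inputformat", inputformat,
    ]


def create_dataset(datastore, dataset, inputfile, inputformat):
    """Run the command; requires the geist CLI to be installed."""
    subprocess.run(
        build_create_command(datastore, dataset, inputfile, inputformat),
        check=True,
    )


cmd = build_create_command("duckdb", "test", "test.csv", "csv")
print(" ".join(cmd))
```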

The query function performs a query on a dataset.

Parameter descriptions for query():

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| datastore | string | A backend datastore, i.e., 'clingo', 'duckdb', or 'rdflib' | REQUIRED |
| dataset | string OR Control object OR DuckPyConnection object OR GeistGraph object | (1) A string indicates the name of the dataset stored on disk OR (2) a Control object, a DuckPyConnection object, or a GeistGraph object for dataset in memory | REQUIRED |
| inputfile | string | File containing the query | REQUIRED |
| isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
| hasoutput | bool | True to store the query results as a CSV file or print them out | REQUIRED |
| config | dict | A dictionary with configurations when hasoutput=True | see below |
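
The inputfile and isinputpath parameters work as a pair: the flag decides whether inputfile is read as a path or used as the query text itself. The sketch below illustrates that behavior with a hypothetical helper, resolve_query. It is not Geist's code, only a picture of the semantics described in the table.

```python
import os
import tempfile


def resolve_query(inputfile, isinputpath):
    """Return the query text, reading from disk only when isinputpath is True."""
    if isinputpath:
        with open(inputfile, encoding="utf-8") as f:
            return f.read()
    return inputfile


# Write a query to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("SELECT * FROM df;")
    from_file = resolve_query(path, isinputpath=True)

from_string = resolve_query("SELECT * FROM df;", isinputpath=False)
print(from_file == from_string)
```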

Description for the config parameter:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| outputroot | string | Path of the directory to store the query results | './' |
| outputfile | string | Path of the file to store the query results | None |
| returnformat | string | Format of the returned data, i.e., 'lp', 'df', or 'dict'. Available for clingo data backend only. | 'lp' |
| predicate | string | Name of the predicate to be queried. Available for clingo data backend only. | None |
| programname | string | Name of the program. Available for clingo data backend only. | 'base' |

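
When hasoutput=True, a config dictionary can be assembled from the defaults listed above. The following is a minimal sketch: make_config is a hypothetical helper, and rejecting unknown keys and omitting the clingo-only keys for other backends are design choices of this sketch rather than documented Geist behavior.

```python
COMMON_DEFAULTS = {"outputroot": "./", "outputfile": None}
CLINGO_DEFAULTS = {"returnformat": "lp", "predicate": None, "programname": "base"}
RETURN_FORMATS = {"lp", "df", "dict"}


def make_config(datastore, **overrides):
    """Build a config dict from the documented defaults (hypothetical helper)."""
    defaults = dict(COMMON_DEFAULTS)
    if datastore == "clingo":
        # These keys are documented as available for the clingo backend only.
        defaults.update(CLINGO_DEFAULTS)
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unsupported config keys for {datastore}: {sorted(unknown)}")
    config = {**defaults, **overrides}
    if config.get("returnformat", "lp") not in RETURN_FORMATS:
        raise ValueError(f"returnformat must be one of {sorted(RETURN_FORMATS)}")
    return config


duckdb_config = make_config("duckdb", outputfile="df.csv")
clingo_config = make_config("clingo", returnformat="df", predicate="friends")
print(duckdb_config)
print(clingo_config)
```

The resulting dictionary would then be passed as the config argument of query() together with hasoutput=True.
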
Example 1: all rows of the df table in test dataset on disk (query from a string)

Suppose a file exists at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with the query results, and a DuckPyConnection object.

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset='test', inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
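
Because the on-disk examples depend on .geistdata/duckdb/test.duckdb being present, it can help to check for the file before querying. This is a minimal sketch: duckdb_dataset_path is a hypothetical helper, not part of Geist, and it assumes that the layout implied by the path above holds for other dataset names and is relative to the current working directory.

```python
from pathlib import Path


def duckdb_dataset_path(dataset, root=".geistdata"):
    """Guess where a DuckDB dataset lives on disk (assumed layout)."""
    return Path(root) / "duckdb" / f"{dataset}.duckdb"


path = duckdb_dataset_path("test")
if path.exists():
    print(f"found {path}")
else:
    print(f"{path} is missing; create the dataset first")
```
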
Example 2: all rows of the df table in test dataset on disk (query from a file)

Suppose a file exists at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with the query results, and a DuckPyConnection object.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset='test', inputfile="query.txt", isinputpath=True, hasoutput=False)

Example 3: all rows of the df table in test dataset in memory (query from a string)

Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results, and the same DuckPyConnection object.

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset=conn, inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 4: all rows of the df table in test dataset in memory (query from a file)

Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results, and the same DuckPyConnection object.

Here is the query.txt file:

SELECT * FROM df;

Code:

```
import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset=conn, inputfile="query.txt", isinputpath=True, hasoutput=False)
```

The query method of the Connection class queries a dataset stored on disk or in memory. It is very similar to the query() function. The only difference is that the datastore and dataset parameters do not need to be passed, because they were already specified when the Connection class was initialized.

Parameter descriptions for the query method of the Connection class:
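
The relationship between the function and the method can be pictured as binding arguments once: the connection fixes datastore and dataset, and each call supplies the rest. The sketch below uses stand-ins only. fake_query and SketchConnection are hypothetical, do not call Geist, and use config=None as an assumed default; they exist purely to illustrate the pattern.

```python
def fake_query(datastore, dataset, inputfile, isinputpath, hasoutput, config=None):
    """Stand-in for a query function; it only echoes what it was asked to do."""
    return {"datastore": datastore, "dataset": dataset, "query": inputfile}


class SketchConnection:
    """Stand-in connection that remembers datastore and dataset."""

    def __init__(self, datastore, dataset):
        self.datastore = datastore
        self.dataset = dataset

    def query(self, inputfile, isinputpath, hasoutput, config=None):
        # Delegate to the function, filling in the two bound parameters.
        return fake_query(self.datastore, self.dataset, inputfile,
                          isinputpath, hasoutput, config)


connection = SketchConnection(datastore="duckdb", dataset="test")
via_method = connection.query("SELECT * FROM df;", isinputpath=False, hasoutput=False)
via_function = fake_query("duckdb", "test", "SELECT * FROM df;",
                          isinputpath=False, hasoutput=False)
print(via_method == via_function)
```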

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputfile | string | File containing the query | REQUIRED |
| isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
| hasoutput | bool | True to store the query results as a CSV file or print them out | REQUIRED |
| config | dict | A dictionary with configurations when hasoutput=True | see below |

Description for the config parameter:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| outputroot | string | Path of the directory to store the query results | './' |
| outputfile | string | Path of the file to store the query results | None |

Example 1: all rows of the df table in test dataset on disk (query from a string)

Suppose a file exists at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with the query results.

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Query the df table of the test dataset
res = connection.query(inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 2: all rows of the df table in test dataset on disk (query from a file)

Suppose a file exists at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with the query results.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Create a Connection instance
connection = geist.Connection.connect(datastore='duckdb', dataset='test')
# Query the df table of the test dataset
res = connection.query(inputfile="query.txt", isinputpath=True, hasoutput=False)

Example 3: all rows of the df table in test dataset in memory (query from a string)

Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results.

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Query the df table of the test dataset
res = connection.query(inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 4: all rows of the df table in test dataset in memory (query from a file)

Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Create a Connection instance
connection = geist.Connection(datastore='duckdb', dataset=':memory:', conn=conn)
# Query the df table of the test dataset
res = connection.query(inputfile="query.txt", isinputpath=True, hasoutput=False)