Function query

query function can perform a query on a dataset.

Parameters description for query():

Name Type Description Default
datastore string A backend datastore, i.e., 'rdflib' or 'duckdb' REQUIRED
dataset string OR DuckPyConnection object OR GeistGraph object (1) A string indicates the name of the dataset stored on disk OR (2) a DuckPyConnection object OR a GeistGraph object for dataset in memory REQUIRED
inputfile string File containing the query REQUIRED
isinputpath bool True if the inputfile is the file path, otherwise the inputfile is the content REQUIRED
hasoutput bool True to store the query results as a CSV file or print them out REQUIRED
config dict A dictionary with configurations when hasoutput=True see below

Description for the config parameter:

Name Type Description Default
outputroot string Path of the directory to store the query results './'
outputfile string Path of the file to store the query results None
Example 1: all rows of the df table in test dataset on disk (query from a string)

There exist a file with the path of .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with query results, and a DuckPyConnection object.

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset='test', inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 2: all rows of the df table in test dataset on disk (query from a file)

There exist a file with the path of .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with query results, and a DuckPyConnection object.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset='test', inputfile="query.txt", isinputpath=True, hasoutput=False)

Example 3: all rows of the df table in test dataset in memory (query from a string)

Suppose conn is a DuckPyConnection object points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with query results, and the same DuckPyConnection object.

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset=conn, inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 4: all rows of the df table in test dataset in memory (query from a file)

Suppose conn is a DuckPyConnection object points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with query results, and the same DuckPyConnection object.

Here is the query.txt file:

SELECT * FROM df;

Code:

import geist

# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset=conn, inputfile="query.txt", isinputpath=True, hasoutput=False)