Function query
The query function runs a query on a dataset.
Parameters of query():
Name | Type | Description | Default |
---|---|---|---|
datastore | string | A backend datastore, i.e., 'rdflib' or 'duckdb' | REQUIRED |
dataset | string OR DuckPyConnection object OR GeistGraph object | (1) A string giving the name of a dataset stored on disk, OR (2) a DuckPyConnection object or a GeistGraph object for a dataset in memory | REQUIRED |
inputfile | string | File containing the query | REQUIRED |
isinputpath | bool | True if the inputfile is the file path, otherwise the inputfile is the content | REQUIRED |
hasoutput | bool | True to store the query results as a CSV file or print them out | REQUIRED |
config | dict | A dictionary with configurations when hasoutput=True | see below |
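
The inputfile and isinputpath parameters work as a pair: the same argument carries either a file path or the query text itself. The sketch below illustrates that rule only. The helper name resolve_query_text is hypothetical and is not part of geist; it simply assumes the behavior described in the table.

```python
import os
import tempfile
from pathlib import Path


def resolve_query_text(inputfile: str, isinputpath: bool) -> str:
    """Hypothetical helper (not a geist function): return the query text."""
    if isinputpath:
        # inputfile is a path on disk, so the query is the file's content
        return Path(inputfile).read_text(encoding="utf-8")
    # inputfile already holds the query itself
    return inputfile


# Query given directly as a string
from_string = resolve_query_text("SELECT * FROM df;", isinputpath=False)

# Same query stored in a temporary query.txt file
with tempfile.TemporaryDirectory() as tmp:
    query_path = os.path.join(tmp, "query.txt")
    Path(query_path).write_text("SELECT * FROM df;", encoding="utf-8")
    from_file = resolve_query_text(query_path, isinputpath=True)
```

Both calls yield the same query text, which is why the string-based and file-based examples further down return the same results.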
Description for the config parameter:
Name | Type | Description | Default |
---|---|---|---|
outputroot | string | Path of the directory to store the query results | './' |
outputfile | string | Path of the file to store the query results | None |
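
As a rough illustration of how the two config keys could combine, the sketch below merges a user-supplied config over the defaults from the table. The helper resolve_output_path is hypothetical and not part of geist. It assumes that outputfile is interpreted relative to outputroot and that a missing outputfile means the results are printed rather than saved; neither assumption is confirmed by the table.

```python
import os

# Defaults copied from the config table
DEFAULT_CONFIG = {"outputroot": "./", "outputfile": None}


def resolve_output_path(config=None):
    """Hypothetical helper (not a geist function): return the CSV path, or None."""
    merged = {**DEFAULT_CONFIG, **(config or {})}
    if merged["outputfile"] is None:
        # Assumption: no file named, so results would be printed instead
        return None
    # Assumption: outputfile is resolved relative to outputroot
    return os.path.join(merged["outputroot"], merged["outputfile"])


# Only outputfile given: outputroot falls back to its default
default_root_path = resolve_output_path({"outputfile": "res.csv"})

# Both keys given
custom_path = resolve_output_path({"outputroot": "out", "outputfile": "res.csv"})

# Empty config: nothing to save to
no_path = resolve_output_path({})
```

Under these assumptions, a dictionary such as {'outputroot': 'out', 'outputfile': 'res.csv'} would be passed as config together with hasoutput=True.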
Example 1: all rows of the df table in test dataset on disk (query from a string)
There exists a file at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with the query results, and a DuckPyConnection object.
import geist
# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset='test', inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 2: all rows of the df table in test dataset on disk (query from a file)
There exists a file at the path .geistdata/duckdb/test.duckdb. The following code returns a Pandas data frame named res with the query results, and a DuckPyConnection object.
Here is the query.txt file:
SELECT * FROM df;
Code:
import geist
# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset='test', inputfile="query.txt", isinputpath=True, hasoutput=False)
Example 3: all rows of the df table in test dataset in memory (query from a string)
Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results, and the same DuckPyConnection object.
import geist
# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset=conn, inputfile="SELECT * FROM df;", isinputpath=False, hasoutput=False)
Example 4: all rows of the df table in test dataset in memory (query from a file)
Suppose conn is a DuckPyConnection object that points to a DuckDB dataset in memory. The following code returns a Pandas data frame named res with the query results, and the same DuckPyConnection object.
Here is the query.txt file:
SELECT * FROM df;
Code:
import geist
# Query the df table of the test dataset
(res, conn) = geist.query(datastore='duckdb', dataset=conn, inputfile="query.txt", isinputpath=True, hasoutput=False)