SQL Tasks
Once you’ve received Pod access, you are ready to query datasets! If you’d rather learn how to train ML models, see [Machine Learning Tasks](docs/for-data-scientists/machine-learning-tasks/index.md).
Bitfount refers to any activity a data scientist performs on a dataset in a Pod, including running SQL queries, as a ‘task’. All task execution is tracked in a Pod’s activity history.
Before running SQL queries, it’s a good idea to determine:
- Whether any Pod policy restrictions limit the tasks you can perform against a dataset in a given Pod.
- Whether the Pod is online. An online Pod shows a green icon in its box on the “My Pods” page. If the Pod is offline, the Pod owner will need to bring it back online for you.
- The structure of the dataset you are querying.
Running SQL Queries
Currently, running SQL queries against a Pod is only supported via the Bitfount Python API. To do so, you must specify your query as a parameter to the SqlQuery algorithm. We recommend doing so in a notebook tool, such as Jupyter. A tutorial on this topic is also available.
The steps are:
- Import relevant pieces from the installed Bitfount package (see tutorial for example).
- Set up the loggers. Loggers give you feedback on the progress of your task and details on its completion or failure.
- Specify your pod_identifier(s) and query prior to running the query. The Pod identifier(s) can be found in the Bitfount Hub at the top of the Pod’s page under its display name or at the end of the Pod’s URL.
For example:
    pod_identifier = "my-data-pod"
    query = SqlQuery(
        query="""
        SELECT `occupation`, AVG(`age`)
        FROM df
        GROUP BY `occupation`
        """
    )
    query.execute(pod_identifiers=[pod_identifier])
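To get a feel for what the example query returns before running it against a Pod, you can try the same SQL locally. The sketch below does not use Bitfount at all: it assumes Python's built-in sqlite3 module as a stand-in engine, a made-up table named df, and hypothetical sample rows. The helper name preview_query and the added ORDER BY clause (used only to make the output order predictable) are illustrative assumptions, and the SQL dialect a Pod accepts may differ from SQLite's.

```python
import sqlite3

# Hypothetical sample rows standing in for a Pod's dataset; not real data.
SAMPLE_ROWS = [
    ("engineer", 30),
    ("engineer", 40),
    ("teacher", 50),
]

# Same query as in the example above, plus ORDER BY for a stable row order.
QUERY = """
SELECT `occupation`, AVG(`age`)
FROM df
GROUP BY `occupation`
ORDER BY `occupation`
"""


def preview_query(rows, query=QUERY):
    """Run the query against an in-memory SQLite table named df."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (occupation TEXT, age INTEGER)")
        conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Prints one line per occupation with its mean age.
    for occupation, mean_age in preview_query(SAMPLE_ROWS):
        print(occupation, mean_age)
```

Each returned row pairs an occupation with the mean age of the sample rows in that group, which is the shape of result to expect from the Pod query.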
Next Steps
You did it! For more detailed illustrations of the Bitfount product suite, feel free to peruse the User Guide. Have more questions? Check out the Troubleshooting & FAQs guide.