Google BigQuery Machine Learning
BigQuery ML lets you create and execute machine learning models in BigQuery using Standard SQL queries. It supports the following processes:
- Training of Linear and Logistic Regression models from a dataset
- Inspection of training information such as runtime
- Model evaluation against labeled data
- Prediction of values using trained models
An overview and sample code for each main step are below. Each of these functions is executed as a Standard SQL query on the Google BigQuery instance. Keep machine learning queries in separate charts from any queries used to display data, since they should be run independently.
Note: BigQuery ML is still a beta feature, and its documentation and functionality may change. Stay up-to-date with BigQuery ML syntax and capabilities on the BigQuery ML website.
Training a model
Before making any machine learning predictions, a “model” needs to be trained. Google BigQuery ML currently supports two types of models: Linear Regression for continuous predictions and Logistic Regression for classification. When a model is trained, it’s added as a schema to your BigQuery instance and will appear in the Periscope Schema Browser after schemas are refreshed.
Creating a model is done with a single CREATE MODEL statement that includes model settings and training data. In the example for this step, the ‘training_data’ dataset includes a column named ‘label’, which acts as the target variable for the model.
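A minimal sketch of such a statement follows. It assumes ‘training_data’ is a table inside a hypothetical dataset named my_dataset, that the model is called sample_model (also a hypothetical name), and that ‘label’ holds a binary outcome so that logistic regression applies. For a continuous target, the model type would be 'linear_reg' instead.

```sql
#standardSQL
-- Train a logistic regression model on every column of training_data.
-- my_dataset and sample_model are placeholder names, not from the original article.
CREATE OR REPLACE MODEL `my_dataset.sample_model`
OPTIONS (
  model_type = 'logistic_reg',      -- use 'linear_reg' for continuous targets
  input_label_cols = ['label']      -- column that holds the target variable
) AS
SELECT
  *
FROM
  `my_dataset.training_data`;
```

Every column other than ‘label’ is treated as a feature, so in practice the SELECT list would usually name only the columns the model should learn from.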
Inspect the training information
Metadata describing the trained model is available with a simple command. The ML.Training_Info function returns information such as the number of iterations, runtime per iteration, and loss metrics.
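As an illustration, the query below reads the training metadata for the hypothetical model my_dataset.sample_model (a placeholder name). The selected columns are those the function is expected to return for regression models; SELECT * works as well if the exact column set differs in your version.

```sql
#standardSQL
-- One row per training iteration of the model.
SELECT
  iteration,      -- iteration number within the training run
  loss,           -- loss on the training data
  eval_loss,      -- loss on the held-out evaluation data
  duration_ms     -- runtime of the iteration in milliseconds
FROM
  ML.TRAINING_INFO(MODEL `my_dataset.sample_model`)
ORDER BY
  iteration;
```

A loss that stops decreasing across iterations is a common sign that training has converged.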
Evaluate the model
Machine learning models in Google BigQuery ML can be evaluated using either the ML.Evaluate function, available for both linear and logistic regression, or the ML.ROC_Curve function, available for logistic regression only. The evaluation returns different metrics depending on the model type, such as precision and recall for logistic regression and mean squared error for linear regression. Documentation for ML.ROC_Curve and ML.Evaluate is available on Google's site.
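The two evaluation functions can be sketched as follows. Both queries assume a hypothetical logistic regression model my_dataset.sample_model and a hypothetical labeled table my_dataset.evaluation_data with the same columns as the training data, including ‘label’. They are meant to be run as two separate queries.

```sql
#standardSQL
-- Query 1: summary metrics for the model on labeled data.
-- For logistic regression this includes precision and recall;
-- for linear regression it includes mean squared error.
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `my_dataset.sample_model`,
    (SELECT * FROM `my_dataset.evaluation_data`)
  );

-- Query 2 (logistic regression only): points on the ROC curve.
SELECT
  threshold,
  recall,                 -- true positive rate at this threshold
  false_positive_rate
FROM
  ML.ROC_CURVE(
    MODEL `my_dataset.sample_model`,
    (SELECT * FROM `my_dataset.evaluation_data`)
  )
ORDER BY
  threshold;
```

Plotting recall against false_positive_rate from the second query gives the ROC curve for the model.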
Predict with the model
Perhaps the most exciting part comes after training the model and tuning it as needed: making predictions on real data. The ML.Predict function uses the trained model, along with a dataset that has matching columns, to create a prediction for each record. This query and its output are best suited for charting and display on dashboards.
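A minimal prediction query might look like the sketch below. It assumes the hypothetical model my_dataset.sample_model and a hypothetical table my_dataset.new_data whose feature columns match those used in training. Because the label column is named ‘label’, the prediction is expected in a column named predicted_label.

```sql
#standardSQL
-- Score every record in new_data with the trained model.
-- The output contains the input columns plus predicted_label.
SELECT
  *
FROM
  ML.PREDICT(
    MODEL `my_dataset.sample_model`,
    (SELECT * FROM `my_dataset.new_data`)
  );
```

For a dashboard chart, the outer SELECT can aggregate the output, for example by counting records per predicted_label.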