Create ML Models with BigQuery ML: Challenge Lab
LAB CODE:- GSP341
LAB NAME:- Create ML Models with BigQuery ML: Challenge Lab
Task 1: Create a dataset to store your machine learning models
Open Cloud Shell Terminal and run the following command:-
#############################################################################
bq mk austin
#############################################################################
Task 2: Create a forecasting BigQuery machine
learning model
Open BigQuery Console
Query Editor and run the following command:-
#############################################################################
CREATE OR REPLACE MODEL austin.location_model
OPTIONS
(model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
start_station_name AS location,
EXTRACT(HOUR FROM start_time) AS start_hour,
EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
duration_minutes,
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
JOIN
`bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
ON
trips.start_station_name = stations.name
WHERE
EXTRACT(YEAR FROM start_time) = 2018
AND duration_minutes > 0
Task 3: Create the second machine learning model
On BigQuery Console Query Editor and run the following command:-
#############################################################################
CREATE OR REPLACE MODEL austin.subscriber_model
OPTIONS
(model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
subscriber_type,
duration_minutes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
WHERE EXTRACT(YEAR FROM start_time) = 2018
#############################################################################
Task 4: Evaluate the two machine learning models
On BigQuery Console Query Editor and run the following 2 commands one by one:-
########################## QUERY-1 FOR TASK 4 #############################
SELECT
SQRT(mean_squared_error) AS rmse,
mean_absolute_error
FROM
ML.EVALUATE(MODEL austin.location_model, (
SELECT
start_station_name AS location,
EXTRACT(HOUR FROM start_time) AS start_hour,
EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
duration_minutes
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
JOIN
`bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
ON
trips.start_station_name = stations.name
WHERE EXTRACT(YEAR FROM start_time) = 2019)
)
#############################################################################
########################## QUERY-2 FOR TASK 4 #############################
SELECT
SQRT(mean_squared_error) AS rmse,
mean_absolute_error
FROM
ML.EVALUATE(MODEL austin.subscriber_model, (
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
subscriber_type,
duration_minutes
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
WHERE
EXTRACT(YEAR FROM start_time) = 2019)
)
#############################################################################
Task 5: Use the subscriber type machine learning
model to predict average trip durations
On BigQuery Console Query Editor and run the following 2 commands one by one:-
########################## QUERY-1 FOR TASK 5 ##############################
SELECT
start_station_name,
COUNT(*) AS trips
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips`
WHERE
EXTRACT(YEAR FROM start_time) = 2019
GROUP BY
start_station_name
ORDER BY
trips DESC
########################## QUERY-2 FOR TASK 5 ##############################
SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
FROM ML.predict(MODEL austin.subscriber_model, (
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
subscriber_type,
duration_minutes
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips`
WHERE
EXTRACT(YEAR FROM start_time) = 2019
AND subscriber_type = 'Single Trip'
AND start_station_name = '21st & Speedway @PCL'))


It's a great blog but would like to let you know that there is typo in task 1 - It is "bq mk austin" and other one was to query start_station_name AS location wherever it was included. This helped me in completing the lab. Thank you :)
ReplyDeleteHey! Thanks DIvya for your feedback. The typo mistake has been solved now.
Delete