Create ML Models with BigQuery ML: Challenge Lab


LAB CODE:- GSP341

LAB NAME:- Create ML Models with BigQuery ML: Challenge Lab

Task 1: Create a dataset to store your machine learning models

Open Cloud Shell Terminal and run the following command:-

#############################################################################
bq mk austin
#############################################################################

Task 2: Create a forecasting BigQuery machine learning model

Open BigQuery Console Query Editor and run the following command:-

#############################################################################
CREATE OR REPLACE MODEL austin.location_model
OPTIONS
  (model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
    start_station_name AS location,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
    duration_minutes,
FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
JOIN
    `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
ON
    trips.start_station_name = stations.name
WHERE
    EXTRACT(YEAR FROM start_time) = 2018
    AND duration_minutes > 0
#############################################################################

Task 3: Create the second machine learning model

On  BigQuery Console Query Editor and run the following command:-

#############################################################################
CREATE OR REPLACE MODEL austin.subscriber_model
OPTIONS
  (model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
WHERE EXTRACT(YEAR FROM start_time) = 2018
#############################################################################



Task 4: Evaluate the two machine learning models


On  BigQuery Console Query Editor and run the following 2 commands one by one:-

########################## QUERY-1 FOR TASK 4 #############################
SELECT
  SQRT(mean_squared_error) AS rmse,
  mean_absolute_error
FROM
  ML.EVALUATE(MODEL austin.location_model, (
  SELECT
    start_station_name AS location,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
    duration_minutes
  FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
  JOIN
   `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
  ON
    trips.start_station_name = stations.name
  WHERE EXTRACT(YEAR FROM start_time) = 2019)
)
#############################################################################

########################## QUERY-2 FOR TASK 4 #############################
SELECT
  SQRT(mean_squared_error) AS rmse,
  mean_absolute_error
FROM
  ML.EVALUATE(MODEL austin.subscriber_model, (
  SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
  FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
  WHERE
    EXTRACT(YEAR FROM start_time) = 2019)
)
#############################################################################


Task 5: Use the subscriber type machine learning model to predict average trip durations

On  BigQuery Console Query Editor and run the following 2 commands one by one:-

########################## QUERY-1 FOR TASK 5 ##############################
SELECT
  start_station_name,
  COUNT(*) AS trips
FROM
  `bigquery-public-data.austin_bikeshare.bikeshare_trips`
WHERE
  EXTRACT(YEAR FROM start_time) = 2019
GROUP BY
  start_station_name
ORDER BY
  trips DESC

########################## QUERY-2 FOR TASK 5 ##############################
SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
FROM ML.predict(MODEL austin.subscriber_model, (
SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
FROM
  `bigquery-public-data.austin_bikeshare.bikeshare_trips`
WHERE 
  EXTRACT(YEAR FROM start_time) = 2019
  AND subscriber_type = 'Single Trip'
  AND start_station_name = '21st & Speedway @PCL'))




Comments

  1. It's a great blog but would like to let you know that there is typo in task 1 - It is "bq mk austin" and other one was to query start_station_name AS location wherever it was included. This helped me in completing the lab. Thank you :)

    ReplyDelete
    Replies
    1. Hey! Thanks DIvya for your feedback. The typo mistake has been solved now.

      Delete

Post a Comment

If you have any doubt, let me know?

Popular posts from this blog

Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab