Hands on with Gemini models in BigQuery: Decoding sentiment in customer reviews

7 months ago 36
News Banner

Looking for an Interim or Fractional CTO to support your business?

Read more

Sentiment analysis is a powerful tool that uses natural language processing (NLP) to uncover the underlying emotions (positive, negative, neutral) within text such as customer reviews. This analysis can offer valuable insight into how customers perceive your products, services, and brand overall. Furthermore, by utilizing techniques like topic modeling and keyword extraction, you can identify recurring themes. These themes could highlight specific product features, aspects of customer service, or general pain points — providing a roadmap for improvement and a way to address customer needs more effectively.

In BigQuery you can use ML.GENERATE_TEXT function that lets you directly utilize powerful large language models (LLMs) from Google's Vertex AI within your SQL queries to analyze text in a BigQuery table. This means you can perform sophisticated text generation and analysis tasks on data stored in your BigQuery tables without needing to move data or write complex code outside the BigQuery environment. ML.GENERATE_TEXT function can also be used to generate text that describes visual content using a remote model based on a gemini-pro-vision multimodal model. Some key benefits are:

  1. Ease of use: BigQuery's SQL integration lets you tap into advanced language model capabilities without requiring separate machine learning pipelines or specialized coding expertise.

  2. Scalability: BigQuery's strengths in handling massive datasets pair nicely with LLMs to process customer reviews or other text sources at scale.

  3. Insight generation: ML.GENERATE_TEXT assists in tasks such as:

    • Sentiment analysis: Determine the overall emotional tone (positive, negative, neutral) in customer feedback.
    • Theme extraction: Identify common topics and trends within reviews
    • Summarization: Condense lengthy reviews into key points.
    • Text completion & generation: Get help with responses, ad copy, or creative writing based on existing reviews.
for_blog_1

ML.GENERATE_TEXT in action

Now, let’s take a look at how to use ML.GENERATE_TEXT, taking a hypothetical rideshare company as an example.

Setup instructions:

  1. Before starting, choose your GCP project, link a billing account, and enable the necessary API; full instructions here.

  2. Create a cloud resource connection and get the connection's service account; full guide here.

  3. Grant access to the service account by following the steps here

  4. Load data. To oad from public storage account, using the following command:

    1. Please replace '[PROJECT_ID.DATASET_ID]'  with your project_id, and enter a name for your dataset

    2. The command will create a table named ‘customer_review’ in your dataset

code_block <ListValue: [StructValue([('code', "CREATE SCHEMA IF NOT EXISTS `[PROJECT_ID.DATASET_ID]` OPTIONS (location='us');\r\n\r\n\r\nLOAD DATA OVERWRITE `[PROJECT_ID.DATASET_ID].customer_review`\r\nFROM FILES ( format = 'PARQUET',\r\nuris = ['gs://data-analytics-golden-demo/rideshare-lakehouse-raw-bucket/rideshare_llm_export/v1/raw_zone/customer_review/000000000000.parquet']);"), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ecc5bb0>)])]>

Sentiment analysis

Let’s walk through an example of performing sentiment analysis.

1. Create a model

Create a remote model in BigQuery that utilizes a Vertex AI foundation model.

Syntax:

code_block <ListValue: [StructValue([('code', "CREATE OR REPLACE MODEL\r\n`PROJECT_ID.DATASET_ID.MODEL_NAME`\r\nREMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`\r\nOPTIONS (ENDPOINT = 'ENDPOINT');"), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ecc5670>)])]>

Code example:

  • Please replace '[PROJECT_ID.DATASET_ID.MODEL_NAME]'  with your project_id, dataset_id and model name

  • Please replace '[PROJECT_ID.REGION.CONNECTION_ID]'  with your project_id, region and connection_id

code_block <ListValue: [StructValue([('code', "CREATE OR REPLACE MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`\r\nREMOTE WITH CONNECTION `[PROJECT_ID.REGION.CONNECTION_ID]`\r\nOPTIONS (ENDPOINT = 'gemini-pro');"), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ecc54c0>)])]>

2. Generate text

With just a few lines of SQL, you can analyze text or visual content in your BigQuery table using that model and the ML.GENERATE_TEXT function.

ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model targets. Read the documentation to understand all the parameters of the ML.GENERATE_TEXT function.

Syntax:

code_block <ListValue: [StructValue([('code', 'ML.GENERATE_TEXT(\r\nMODEL project_id.dataset.model,\r\n{ TABLE project_id.dataset.table | (query_statement) },\r\nSTRUCT(\r\n [max_output_tokens AS max_output_tokens]\r\n [, top_k AS top_k]\r\n [, top_p AS top_p]\r\n [, temperature AS temperature]\r\n [, flatten_json_output AS flatten_json_output]\r\n [, stop_sequences AS stop_sequences])\r\n)'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ecc50d0>)])]>

Code example:

  • Please replace '[PROJECT_ID.DATASET_ID]'  with your project_id and dataset_id

  • Please replace '[PROJECT_ID.REGION.CONNECTION_ID]'  with your project_id, region and connection_id

code_block <ListValue: [StructValue([('code', "CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID].review_sentiment_analysis` AS\r\nWITH PROMPT AS (\r\n SELECT CONCAT ('For the given review classify the sentiment as Positive, Neutral or Negative.',\r\n\r\n\r\n'\\n input: The driver was able to make some small talk, but he didn\\'t go overboard. I liked that he was friendly and chatty, but he also knew when to leave me alone. The trunk fit my belongings, and the car was clean and comfortable. Overall, it was a good ride.',\r\n\r\n\r\n'\\n output: \\n Positive - Trunk fit my belongings, friendly, chatty',\r\n\r\n\r\n'\\n input: I took a rideshare last night and it was an okay experience. The car was adequately clean, but it was a bit warm for my liking. The driver was able to make some small talk, but I wasn\\'t really in the mood to talk. Overall, it was a fine ride.',\r\n\r\n\r\n'\\n output: Neutral - Clean, A bit warm, fun ride',\r\n '\\n input: ', customer_review_text,\r\n'\\n output: '\r\n) AS prompt, customer_id, \r\ncustomer_review_text\r\n FROM `[PROJECT_ID.DATASET_ID].customer_review`\r\n LIMIT 20\r\n),\r\nREVIEW_RESPONSE_GENERATION AS (\r\n SELECT *\r\n FROM\r\n ML.GENERATE_TEXT(\r\n MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,\r\n (SELECT * FROM PROMPT),\r\n STRUCT(\r\n 200 AS max_output_tokens,\r\n 0.5 AS temperature,\r\n 40 AS top_k,\r\n 1.0 AS top_p,\r\n TRUE AS flatten_json_output))\r\n)\r\nSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM REVIEW_RESPONSE_GENERATION;\r\n\r\n\r\nSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM `[PROJECT_ID.DATASET_ID].review_sentiment_analysis`;"), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ecc55b0>)])]>

3. Result:

In the prompt, we provided the model with context and two examples, clearly demonstrating our desired output format. You can validate that the outputs generated are aligned to the examples we provided to the model as part of a few-shot prompting approach.

In few-shot prompting, including a few examples of reviews with their corresponding sentiment labels is crucial for guiding the model's behavior. To help ensure the model's effectiveness in various situations, it's essential to offer a sufficient number of well-structured examples covering diverse review scenarios.

Then, by performing sentiment analysis on the customer reviews, we can have insights into their preferences and pain points regarding our products. By identifying key themes in the reviews, we can effectively communicate valuable feedback to the product team, enabling them to make informed, data-driven decisions and improvements.

for_blog_2

In the table above, you can see the results of ML.GENERATE_TEXT including the input table along with the following columns:

  1. ml_generate_text_result: This is the JSON response and the generated text is in the text element.

  2. ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.

  3. ml_generate_text_rai_result: a STRING value that contains the safety attributes. This column is returned when flatten_json_output is TRUE.

  4. ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

Extracting themes

Now, let’s extract the theme from the reviews using the model we created above:

  • Please replace '[PROJECT_ID.DATASET_ID]'  with your project_id and dataset_id
code_block <ListValue: [StructValue([('code', 'CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID].extract_themes` AS\r\n\r\n\r\nWITH PROMPT AS (\r\n SELECT CONCAT(\r\n"""\r\nClassify the text as one or more of the following categories and return in the below json format.\r\n- "trunk space small"\r\n- "trunk space large"\r\n- "driving too fast"\r\n- "driving too slow"\r\n- "clean car"\r\n- "dirty car"\r\n- "car too hot"\r\n- "car too cold"\r\n- "driver likes conversation"\r\n- "driver likes no conversation"\r\n- "driver likes music"\r\n- "driver likes no music"\r\n- "distracted driver"\r\n\r\n\r\nJSON format: [ "value" ]\r\nSample JSON Response: [ "dirty car", "car too cold" ]\r\n\r\n\r\nText:\r\n""", customer_review_text) AS prompt, customer_id, customer_review_text\r\n FROM `[PROJECT_ID.DATASET_ID].customer_review`\r\n LIMIT 10\r\n),\r\nEXTRACT_THEMES AS (\r\n SELECT *\r\n FROM\r\n ML.GENERATE_TEXT(\r\n MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,\r\n (SELECT * FROM PROMPT),\r\n STRUCT(\r\n 1024 AS max_output_tokens,\r\n 0 AS temperature,\r\n 1 AS top_k,\r\n 0 AS top_p,\r\n TRUE AS flatten_json_output))\r\n)\r\nSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM EXTRACT_THEMES;\r\n\r\n\r\nSELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM `[PROJECT_ID.DATASET_ID].extract_themes`;'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ecc5b80>)])]>

Result:

Using ML.GENERATE_TEXT function and SQL from the BigQuery console, we can efficiently identify key themes within our customer reviews. This gives us deeper insight into customer perceptions and can provide actionable data to improve our products.

for_blog_3

ML.GENERATE_TEXT is tightly integrated with the Gemini model, which is designed for higher input/output scale and better result quality across a wide range of tasks, like text processing including classification and summarization, sentiment analysis, and code generation. 

Analyzing the themes

Now that we've identified the themes in our reviews, let's dive deeper with data canvas in BigQuery, an AI-centric experience to reimagine data analytics that we introduced at Next ‘24. BigQuery data canvas lets you discover, transform, query, and visualize data using natural language. It also provides a graphical interface that lets you work with data sources, queries, and visualizations in a directed acyclic graph (DAG), giving you a view of your analysis workflow that maps to your mental model.

Given that our themes are stored in the  'extract_themes' table, let's create a data canvas to analyze them further. Click the down arrow next to the '+' icon and choose 'Create Data canvas’

for_blog_4

You will be brought to a screen where you can search for the ‘extract themes’ table and get started.

for_blog_5

Once you’ve selected a table, you'll see it on a canvas where you can query it directly or join it with other tables.

To create a bar chart of themes, click on the 'Query' button and type 'bar chart for the most common theme and remove the null values and limit the result to top 10 values'. The AI understands your request and automatically generates the correct query, even though the 'themes' aren't in a dedicated column — the AI recognizes that the themes are found within the 'ml_genertae_text_llm_result' column. Finally, click 'Run' to see the query result.

for_blog_6
for_blog_7

Your theme data is ready! Click 'Visualize' to instantly see your bar chart. 

for_blog_8

You now have a bar chart of the extracted themes from your customer reviews,  plus automatically generated helpful insights based on the data and explanations of your findings. 

In short, BigQuery data canvas lets you analyze your data from start to finish with simple natural language commands: Discover relevant data, merge it with customer information, find key insights, collaborate with your team members, and create reports — all in one place. Plus, you can save these results or combine them with other data for further analysis or extract it into a notebook.

Curious to learn more? The official ML.GENERATE_TEXT documentation has all the details.

Read Entire Article