Sentiment analysis is a powerful tool that uses natural language processing (NLP) to uncover the underlying emotions (positive, negative, neutral) within text such as customer reviews. This analysis can offer valuable insight into how customers perceive your products, services, and brand overall. Furthermore, by utilizing techniques like topic modeling and keyword extraction, you can identify recurring themes. These themes could highlight specific product features, aspects of customer service, or general pain points — providing a roadmap for improvement and a way to address customer needs more effectively.
In BigQuery you can use ML.GENERATE_TEXT function that lets you directly utilize powerful large language models (LLMs) from Google's Vertex AI within your SQL queries to analyze text in a BigQuery table. This means you can perform sophisticated text generation and analysis tasks on data stored in your BigQuery tables without needing to move data or write complex code outside the BigQuery environment. ML.GENERATE_TEXT function can also be used to generate text that describes visual content using a remote model based on a gemini-pro-vision multimodal model. Some key benefits are:
-
Ease of use: BigQuery's SQL integration lets you tap into advanced language model capabilities without requiring separate machine learning pipelines or specialized coding expertise.
-
Scalability: BigQuery's strengths in handling massive datasets pair nicely with LLMs to process customer reviews or other text sources at scale.
-
Insight generation: ML.GENERATE_TEXT assists in tasks such as:
-
- Sentiment analysis: Determine the overall emotional tone (positive, negative, neutral) in customer feedback.
- Theme extraction: Identify common topics and trends within reviews
- Summarization: Condense lengthy reviews into key points.
- Text completion & generation: Get help with responses, ad copy, or creative writing based on existing reviews.
ML.GENERATE_TEXT in action
Now, let’s take a look at how to use ML.GENERATE_TEXT, taking a hypothetical rideshare company as an example.
Setup instructions:
-
Before starting, choose your GCP project, link a billing account, and enable the necessary API; full instructions here.
-
Create a cloud resource connection and get the connection's service account; full guide here.
-
Grant access to the service account by following the steps here.
-
Load data. To oad from public storage account, using the following command:
-
Please replace '[PROJECT_ID.DATASET_ID]' with your project_id, and enter a name for your dataset
-
The command will create a table named ‘customer_review’ in your dataset
Sentiment analysis
Let’s walk through an example of performing sentiment analysis.
1. Create a model
Create a remote model in BigQuery that utilizes a Vertex AI foundation model.
Syntax:
Code example:
-
Please replace '[PROJECT_ID.DATASET_ID.MODEL_NAME]' with your project_id, dataset_id and model name
-
Please replace '[PROJECT_ID.REGION.CONNECTION_ID]' with your project_id, region and connection_id
2. Generate text
With just a few lines of SQL, you can analyze text or visual content in your BigQuery table using that model and the ML.GENERATE_TEXT function.
ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model targets. Read the documentation to understand all the parameters of the ML.GENERATE_TEXT function.
Syntax:
Code example:
-
Please replace '[PROJECT_ID.DATASET_ID]' with your project_id and dataset_id
-
Please replace '[PROJECT_ID.REGION.CONNECTION_ID]' with your project_id, region and connection_id
3. Result:
In the prompt, we provided the model with context and two examples, clearly demonstrating our desired output format. You can validate that the outputs generated are aligned to the examples we provided to the model as part of a few-shot prompting approach.
In few-shot prompting, including a few examples of reviews with their corresponding sentiment labels is crucial for guiding the model's behavior. To help ensure the model's effectiveness in various situations, it's essential to offer a sufficient number of well-structured examples covering diverse review scenarios.
Then, by performing sentiment analysis on the customer reviews, we can have insights into their preferences and pain points regarding our products. By identifying key themes in the reviews, we can effectively communicate valuable feedback to the product team, enabling them to make informed, data-driven decisions and improvements.
In the table above, you can see the results of ML.GENERATE_TEXT including the input table along with the following columns:
-
ml_generate_text_result: This is the JSON response and the generated text is in the text element.
-
ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
-
ml_generate_text_rai_result: a STRING value that contains the safety attributes. This column is returned when flatten_json_output is TRUE.
-
ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Extracting themes
Now, let’s extract the theme from the reviews using the model we created above:
- Please replace '[PROJECT_ID.DATASET_ID]' with your project_id and dataset_id
Result:
Using ML.GENERATE_TEXT function and SQL from the BigQuery console, we can efficiently identify key themes within our customer reviews. This gives us deeper insight into customer perceptions and can provide actionable data to improve our products.
ML.GENERATE_TEXT is tightly integrated with the Gemini model, which is designed for higher input/output scale and better result quality across a wide range of tasks, like text processing including classification and summarization, sentiment analysis, and code generation.
Analyzing the themes
Now that we've identified the themes in our reviews, let's dive deeper with data canvas in BigQuery, an AI-centric experience to reimagine data analytics that we introduced at Next ‘24. BigQuery data canvas lets you discover, transform, query, and visualize data using natural language. It also provides a graphical interface that lets you work with data sources, queries, and visualizations in a directed acyclic graph (DAG), giving you a view of your analysis workflow that maps to your mental model.
Given that our themes are stored in the 'extract_themes' table, let's create a data canvas to analyze them further. Click the down arrow next to the '+' icon and choose 'Create Data canvas’
You will be brought to a screen where you can search for the ‘extract themes’ table and get started.
Once you’ve selected a table, you'll see it on a canvas where you can query it directly or join it with other tables.
To create a bar chart of themes, click on the 'Query' button and type 'bar chart for the most common theme and remove the null values and limit the result to top 10 values'. The AI understands your request and automatically generates the correct query, even though the 'themes' aren't in a dedicated column — the AI recognizes that the themes are found within the 'ml_genertae_text_llm_result' column. Finally, click 'Run' to see the query result.
Your theme data is ready! Click 'Visualize' to instantly see your bar chart.
You now have a bar chart of the extracted themes from your customer reviews, plus automatically generated helpful insights based on the data and explanations of your findings.
In short, BigQuery data canvas lets you analyze your data from start to finish with simple natural language commands: Discover relevant data, merge it with customer information, find key insights, collaborate with your team members, and create reports — all in one place. Plus, you can save these results or combine them with other data for further analysis or extract it into a notebook.
Curious to learn more? The official ML.GENERATE_TEXT documentation has all the details.