Snowflake Cortex LLM Functions Explained (with Examples)

Snowflake Cortex LLM Functions Explained (with Examples)

I have been exploring Snowflake Cortex LLM functions, which were recently released. These functions provide us with instant access to leading LLM models from Mistral, Meta, and Google. The best part is that these LLMs are fully hosted and managed by Snowflake, providing performance, scalability, and governance. The data resides inside Snowflake, eliminating the need for any third-party tools, and it is easily accessible via SQL and Python.

In this blog post, I am explaining Snowflake Cortex functions with examples using SQL functions. The available Cortex functions are COMPLETE SENTIMENT EXTRACT_ANSWER SUMMARIZE TRANSLATE

For demonstration purposes, I have utilized the Amazon UK shoes product reviews dataset, which I loaded into Snowflake under the table called REVIEWS in the database AIPROJECT and schema dbo.

The dataset includes PRODUCT_NAME, REVIEW_TITLE, REVIEW_TEXT, and REVIEW_DATE.

Image by Author

TRANSLATE

Translate function translate text from source language to targeted language

Syntax

SNOWFLAKE.CORTEX.TRANSLATE(
    <text>, <source_language>, <target_language>)

This below SQL first creates a common table expression (CTE) called TRANSLATE, which selects REVIEW_TITLE and REVIEW_TEXT columns from the REVIEWS table and determines the country based on the content of the REVIEW_DATE column. Then, it selects REVIEW_TEXT and performs translation for reviews from Mexico using the SNOWFLAKE.CORTEX.TRANSLATE function, translating from Spanish ’es’ to English ’en’ . Finally, it filters the results to include only reviews from Mexico (COUNTRY = ‘MEXICO’).

– Define a Common Table Expression (CTE) named TRANSLATE
WITH TRANSLATE AS (
    -- Selecting columns REVIEW_TITLE and REVIEW_TEXT, and determining the country
    SELECT
        REVIEW_TITLE,
        REVIEW_TEXT,
        -- Determine the country based on the content of REVIEW_DATE
        CASE 
            WHEN POSITION('United States' IN REVIEW_DATE) > 0 THEN 'UNITED STATES'
            WHEN POSITION('India' IN REVIEW_DATE) > 0 THEN 'INDIA'
            WHEN POSITION('France' IN REVIEW_DATE) > 0 THEN 'FRANCE'
            WHEN POSITION('Italy' IN REVIEW_DATE) > 0 THEN 'ITALY'
            WHEN POSITION('Spain' IN REVIEW_DATE) > 0 THEN 'SPAIN'
            WHEN POSITION('Germany' IN REVIEW_DATE) > 0 THEN 'GERMANY'
            WHEN POSITION('Canada' IN REVIEW_DATE) > 0 THEN 'CANADA'
            WHEN POSITION('Singapore' IN REVIEW_DATE) > 0 THEN 'SINGAPORE'
            WHEN POSITION('Mexico' IN REVIEW_DATE) > 0 THEN 'MEXICO'
            WHEN POSITION('Australia' IN REVIEW_DATE) > 0 THEN 'AUSTRALIA'
            ELSE NULL  
        END AS Country
    FROM
        REVIEWS
)

-- Selecting REVIEW_TEXT and performing translation for reviews from Mexico
SELECT
    REVIEW_TEXT, COUNTRY,
    SNOWFLAKE.CORTEX.TRANSLATE(REVIEW_TEXT, 'es', 'en') AS TRANSLATION
FROM
    TRANSLATE
WHERE
    COUNTRY = 'MEXICO';

Result

Image by Author

Another example involves converting German text to English by filtering for the country Germany.

-- Translating German to English
SELECT
    REVIEW_TEXT, COUNTRY,
    SNOWFLAKE.CORTEX.TRANSLATE(REVIEW_TEXT, 'de', 'en') AS TRANSLATION
FROM
    TRANSLATE
WHERE
    COUNTRY = 'GERMANY';

Result

Image by Author

To learn more about supported languages for translation and for additional details, please check here.

SUMMARIZE

Summarize function returns a summary of the given English text. Source text to be in English

Syntax

SNOWFLAKE.CORTEX.SUMMARIZE(<text>)

The below script first categorizes reviews based on the country mentioned in the REVIEW_DATE column. Then, it selects REVIEW_TEXT and generates a summary using SNOWFLAKE.CORTEX.SUMMARIZE function, filtering specifically for reviews from the United States and limiting the output to 8 summaries.

USE DATABASE AIPROJECT;
USE SCHEMA DBO;
USE ROLE ACCOUNTADMIN;


-- Query to summarize review texts by country in Snowflake

WITH SUMMARIZE AS (
    SELECT
        REVIEW_TITLE,
        REVIEW_TEXT,
        CASE 
            WHEN POSITION('United States' IN REVIEW_DATE) > 0 THEN 'UNITED STATES'
            WHEN POSITION('India' IN REVIEW_DATE) > 0 THEN 'INDIA'
            WHEN POSITION('France' IN REVIEW_DATE) > 0 THEN 'FRANCE'
            WHEN POSITION('Italy' IN REVIEW_DATE) > 0 THEN 'ITALY'
            WHEN POSITION('Spain' IN REVIEW_DATE) > 0 THEN 'SPAIN'
            WHEN POSITION('Germany' IN REVIEW_DATE) > 0 THEN 'GERMANY'
            WHEN POSITION('Canada' IN REVIEW_DATE) > 0 THEN 'CANADA'
            WHEN POSITION('Singapore' IN REVIEW_DATE) > 0 THEN 'SINGAPORE'
            WHEN POSITION('Mexico' IN REVIEW_DATE) > 0 THEN 'MEXICO'
            WHEN POSITION('Australia' IN REVIEW_DATE) > 0 THEN 'AUSTRALIA'
            ELSE NULL  
        END AS Country
    FROM
        REVIEWS
)

SELECT 
    REVIEW_TEXT,
    SNOWFLAKE.CORTEX.SUMMARIZE(REVIEW_TEXT) AS SUMMARY 
FROM 
    SUMMARIZE
WHERE 
    COUNTRY = 'UNITED STATES'
LIMIT 
    8;

Result

Image by Author

For additional details, please check here.

SENTIMENT

The SENTIMENT function provides a sentiment score ranging from -1 to 1 for English-language input text. A score of -1 indicates the most negative sentiment, while a score of 1 signifies the most positive sentiment. Scores around 0 suggest a neutral sentiment.

Syntax

SNOWFLAKE.CORTEX.SENTIMENT(<text>)

The below SQL first assigns a country based on the REVIEW_DATE using a Common Table Expression (CTE). Then, in the main query, it calculates the sentiment score for each review using the SNOWFLAKE.CORTEX.SENTIMENT function. The sentiment score ranges from -1 to 1, with -1 indicating the most negative sentiment and 1 indicating the most positive sentiment. Based on these scores, it categorizes the sentiment as follows:

  • ‘AVERAGE’ for scores greater than 0 and less than or equal to 0.5,
  • ‘GOOD’ for scores greater than 0.5 and less than or equal to 0.8,
  • ‘EXCEPTIONAL’ for scores greater than 0.8,
  • ‘BAD’ for scores less than 0,
  • ‘NEUTRAL’ for scores equal to 0. Finally, it filters the results to include only reviews from the United States.
-- Query to analyze sentiment of reviews from the United States in Snowflake

-- Common Table Expression (CTE) to assign country based on review date
WITH SENTIMENT AS (
    SELECT
        REVIEW_TITLE,
        REVIEW_TEXT,
        CASE 
            WHEN POSITION('United States' IN REVIEW_DATE) > 0 THEN 'UNITED STATES'
            ELSE NULL  
        END AS Country
    FROM
        REVIEWS
)

-- Main query to calculate sentiment score and categorize sentiment
SELECT 
    REVIEW_TEXT,
    SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) AS SENTIMENT_SCORE,
    CASE
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) > 0 AND SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) <= 0.5 THEN 'AVERAGE'
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) > 0.5 AND SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) <= 0.8 THEN 'GOOD'
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) > 0.8 THEN 'EXCEPTIONAL'
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) < 0 THEN 'BAD'
        ELSE 'NEUTRAL'
    END AS SENTIMENT 
FROM 
    SENTIMENT
WHERE 
    COUNTRY = 'UNITED STATES';

Result

Image by Author

For additional details, please check here.

COMPLETE

Given a prompt, COMPLETE function generates a response (completion) using our choice of supported language model. Currently, the function supports the following models. Each models might different cost and quotas.

'mistral-large' 'mixtral-8x7b' 'llama2-70b-chat' 'mistral-7b'

Syntax

SNOWFLAKE.CORTEX.COMPLETE(
    <model>, <prompt_or_history> [ , <options> ] )

Options are used to define the hyperparameters of the model. We can specify parameters such as temperature, top_p, or max_tokensTemperature controls the randomness of the model’s output. Top-p influences both the randomness and diversity of the language model. Max_tokens determines the maximum number of output tokens in the response; a smaller number will result in truncated responses.

Below SQL categorizes reviews with ‘UNITED STATES’ as the country based on the presence of ‘United States’ in the review date. It then selects the review text from the REVIEWS table and generates a complete message using the SNOWFLAKE.CORTEX.COMPLETE function by giving this prompt Draft a short message that acknowledges the problem, includes an apology, and provides a short recommendation I did this for only one record and didn’t use options in this example.

WITH COMPLETE AS (
    -- This common table expression (CTE) categorizes the country as 'UNITED STATES'
    -- if the review date contains 'United States'.
    SELECT
        REVIEW_TITLE,
        REVIEW_TEXT,
        CASE 
            WHEN POSITION('United States' IN REVIEW_DATE) > 0 THEN 'UNITED STATES'
            ELSE NULL  
        END AS Country
    FROM
        REVIEWS
)

-- This query selects review text from the REVIEWS table and limits the result to one entry.
SELECT 
    REVIEW_TEXT, 
    -- The SNOWFLAKE.CORTEX.COMPLETE function generates a complete message by using the prompt
    SNOWFLAKE.CORTEX.COMPLETE('mistral-large', CONCAT('Draft a short message that acknowledges the problem, includes an apology, and provides a short recommendation.', REVIEW_TEXT)) AS COMPLETE 
FROM 
    COMPLETE
WHERE 
    COUNTRY = 'UNITED STATES'
LIMIT 
    1;

Result

Image by Author

Below message was generated by the COMPLETE function by passing our prompt

Subject: We're Sorry for the Inconvenience with Your New Clear Shoes

Dear Customer,

Thank you for choosing our unique clear shoes! We're thrilled to hear that you love their style and that they fit you well. We understand that the plastic material might take some time to get used to, and we sincerely apologize for any discomfort you may be experiencing.

To help you adjust to your new shoes, we recommend wearing them for short periods initially and gradually increasing the duration. This will give your feet time to adapt to the new material. Additionally, wearing thin socks or using cushioned insoles might provide extra comfort.

We appreciate your patience and understanding. If you have any other concerns or if there's anything else we can assist you with, please don't hesitate to reach out.

Best Regards,
[Your Name]
[Your Position]
[Company Name]

For more details on Options, please check here.

EXTRACT_ANSWER

The EXTRACT_ANSWER function retrieves an answer to a specified question from a text document. The document can either be a plain-English document or a string representation of semi-structured (JSON) data

Syntax

SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
    <source_document>, <question>)

Below SQL first creates a Common Table Expression (CTE) named EXTRACT to extract relevant data from the REVIEWS table, identifying the country based on the content of REVIEW_DATE. Then, it executes the main query to extract and parse the answer from the review text using Snowflake Cortex’s SNOWFLAKE.CORTEX.EXTRACT_ANSWER function. I am asking this question What product this review talks about?

Finally, it filters the results to include only reviews from the United States and limits the output to one result. It provides an answer in JSON format, so I parse the extracted answer using the PARSE_JSON function, cast it as a string, and store it in a new column called PARSED_ANSWER

-- Common Table Expression (CTE) to extract relevant data
WITH EXTRACT AS (
    SELECT
        REVIEW_TITLE,
        REVIEW_TEXT,
        -- Identify the country
        CASE 
            WHEN POSITION('United States' IN REVIEW_DATE) > 0 THEN 'UNITED STATES'
            ELSE NULL  
        END AS Country
    FROM
        REVIEWS
)

-- Main query to extract answer and parse it
SELECT 
    REVIEW_TEXT, -- Original review text
    -- Extract the answer using Cortex
    SNOWFLAKE.CORTEX.EXTRACT_ANSWER(REVIEW_TEXT, 'What product this review talks about?') AS ANSWER,
    -- Parse the answer from JSON and cast it as STRING
    PARSE_JSON(
        SNOWFLAKE.CORTEX.EXTRACT_ANSWER(REVIEW_TEXT, 'What product this review talks about?')::variant
    )[0]:answer::STRING AS PARSED_ANSWER 
FROM 
    EXTRACT
WHERE 
    COUNTRY = 'UNITED STATES' -- Filter for reviews from the United States
LIMIT 1; -- Limit to one result

Result

Image by Author

For additional details, please check here.

Current Restrictions

  1. Currently, these functions are available only in selected regions, such as AWS US East (N. Virginia), AWS US West (Oregon), AWS Europe (Frankfurt), Azure East US 2 (Virginia), and Azure West Europe (Netherlands)
  2. The models used for these LLM functions have limitations in terms of tokens
  3. To maintain a performance across customers, Snowflake cortex LLM functions are subject to usage quotas.
  4. Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role to access the Cortex LLM functions. By default, this database role is granted only to the ACCOUNTADMIN role. The ACCOUNTADMIN role must then grant this role to user roles to allow users access to Cortex LLM functions

For more details on the cost and restrictions, please refer to the official documentation.

In my next blog, I will take real data and execute the above functions using Snowpark. Thanks for reading.

Feel free to connect with me on LinkedIn!