
Visualization_with_Seaborn

#visualization#seaborn#Python_for_Data_Engineering|python#EDA

Anatomy of seaborn code

import seaborn as sns

# general pattern: sns.<plot function>(data=<DataFrame>, x=<column>, y=<column>)
sns.scatterplot(data=df, x='col1', y='col2')

The main params are x, y, and data:

  • data is an optional param: the pandas DataFrame that holds the columns
  • x is the column name for the x-axis
  • y is the column name for the y-axis

The available plot options in seaborn are:

  • bar chart: sns.barplot()
  • scatter plot: sns.scatterplot()
  • line chart: sns.lineplot()
  • histogram: sns.histplot()
  • KDE plot: sns.kdeplot()
  • box plot: sns.boxplot()

making a bar plot in seaborn

import pandas as pd
df = pd.read_csv('restaurant_data.csv')
sns.barplot(data=df, x='server', y='sales_totals')

The default aggregation method is to take the mean of each group, but we can change this by setting the estimator parameter to another function. Many NumPy functions work here, such as np.median, np.max, and np.std. This code will show us the same plot, but with the median of each group instead of the mean.

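import numpy as np
sns.barplot(data=df, x='server', y='sales_totals', estimator=np.median, errorbar=None)

errorbar controls the error bars, which show a 95% confidence interval by default; setting the param to None hides them. Seaborn versions before 0.12 used ci=None for the same thing, and ci is now deprecated.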

making a scatter plot

sns.scatterplot(data=df, x='daily_customers', y='sales_totals')

Visual patterns we look for in scatter plots include:

  • Spacing: Points that are close together in a line or curve pattern show a stronger relationship. Points that are spaced out or more cloud-like show a weaker relationship.
  • Orientation: A pattern of points starting in the lower left corner and following up to the upper right corner shows a positive relationship. A negative relationship might appear as a pattern of points starting in the upper left corner and following down to the lower right corner.
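The spacing and orientation cues can be summarized as a number with Pearson's correlation coefficient. This is a swapped-in complement to the scatter plot, not something seaborn computes for you: the sign matches the orientation, and values near 1 or -1 mean the points hug a line. The data, including the wait_minutes column, is hypothetical.

```python
import pandas as pd

# Hypothetical data: a tight positive pattern and a looser negative one
df = pd.DataFrame({
    "daily_customers": [20, 35, 50, 65, 80, 95],
    "sales_totals":    [410, 690, 1020, 1290, 1610, 1900],
    "wait_minutes":    [30, 12, 25, 8, 15, 5],
})

# Pearson's r: sign ~ orientation, magnitude ~ how tight the spacing is
r_sales = df["daily_customers"].corr(df["sales_totals"])
r_wait = df["daily_customers"].corr(df["wait_minutes"])
```

Here r_sales is close to 1 (points in a tight line rising to the right), while r_wait is clearly negative but further from -1 (a falling, more cloud-like pattern).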

making a line chart

sns.lineplot(data=df, x='month', y='sales_totals')

making distribution plots

Histograms

sns.histplot(data=df, x='sales_totals')

Using y instead of x will create a histogram with horizontal bars. Seaborn sets the bins parameter to auto by default, but we can change the binning of values in a number of ways.

  • Number of bins: an integer for the number of bins to fit the data to
  • Bin breaks: a list of values for where bins should start and end
  • Reference rule: the name of a method for computing the optimal bin width, such as fd or sturges; auto (the default) uses whichever of the sturges and fd rules gives the smaller bin width

KDE plots

sns.kdeplot(data=df, x='sales_totals', fill=True)

A kernel density estimation (KDE) plot displays a smooth, continuous probability density curve estimated from the data, instead of counting values into discrete bins as a histogram does.
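Conceptually, a KDE places a small bell curve (kernel) on every observation and averages them. The sketch below does this by hand with Gaussian kernels and Scott's rule for the bandwidth (seaborn's kdeplot also defaults to bw_method="scott"). It illustrates the idea only and is not seaborn's implementation, which adds options such as bw_adjust and clipping. The function name gaussian_kde_curve and the sample values are hypothetical.

```python
import numpy as np

def gaussian_kde_curve(samples, grid, bandwidth=None):
    """Average of Gaussian bumps centred on each sample (conceptual KDE)."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Scott's rule for 1-D data: sample std * n^(-1/5)
        bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Mean over samples, rescaled so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth

samples = [420.0, 480.0, 500.0, 510.0, 530.0, 610.0]
grid = np.linspace(0, 1000, 2001)
density = gaussian_kde_curve(samples, grid)

# Riemann-sum check that the curve is a valid density (area ~ 1)
area = density.sum() * (grid[1] - grid[0])
```

The curve peaks where the samples cluster (around 500) and its total area is about 1, which is what makes it a probability density rather than a count.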

Box plots

sns.boxplot(data=df, x='sales_totals', y='day')

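The box plot summarizes each category's distribution: the box spans the interquartile range (IQR, the 25th to 75th percentile), the line inside it marks the median, the whiskers extend to the most extreme points within 1.5 times the IQR of the box, and points beyond the whiskers are drawn individually as outliers.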
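The numbers behind one box can be computed by hand. This sketch assumes seaborn's default whisker rule (whis=1.5) and linear quartile interpolation; the sales values for a single day are hypothetical.

```python
import pandas as pd

# Hypothetical sales for a single day
sales = pd.Series([310, 340, 355, 360, 372, 380, 395, 410, 640], dtype=float)

# Box edges and the median line
q1, median, q3 = sales.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers reach the most extreme points within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = sales[sales >= lower_fence].min()
whisker_high = sales[sales <= upper_fence].max()

# Anything beyond the fences is drawn as an individual outlier point
outliers = sales[(sales < lower_fence) | (sales > upper_fence)].tolist()
```

Here the box runs from 355 to 395 with the median at 372, the whiskers stop at 310 and 410, and 640 is flagged as an outlier.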

modifying chart params

error bars and uncertainty

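The error bar properties we can change include color (errcolor), width (errwidth), and cap length (capsize). For example, we could use the following code to make a bar plot of mean sales_totals for each server with thick red error bars and long caps. (In seaborn 0.13 and later, errcolor and errwidth are deprecated in favor of err_kws, e.g. err_kws={'color': 'red', 'linewidth': 5}.)

sns.barplot(data=df, x='server', y='sales_totals', errcolor='red', errwidth=5, capsize=0.5)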

The style can be changed from the default shaded error band to vertical error bars using the err_style parameter, as shown in the following code.

sns.lineplot(data=df, x='month', y='sales_totals', err_style='bars')
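What the error bars or band represent by default is a 95% confidence interval for the mean, estimated by bootstrapping: resample the data with replacement many times, take the mean of each resample, and keep the middle 95% of those means. A sketch of that idea, assuming a percentile bootstrap with 1000 resamples (seaborn's default n_boot); it reimplements the concept and is not seaborn's code. The function name bootstrap_ci and the data are hypothetical.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row is one resample (with replacement) of the original data
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the middle `level` percent of the resampled means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

sales = [120.0, 135.0, 150.0, 160.0, 170.0, 185.0, 200.0]
low, high = bootstrap_ci(sales)
```

The interval straddles the sample mean (160) and gets narrower as more observations are added, which is why groups with few rows show long error bars.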

group with hue and style

sns.scatterplot(data=df, x='daily_customers', y='sales_totals', hue='weekday')
sns.lineplot(data=df, x='month', y='sales', hue='location', style='location', linewidth=3)
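hue maps a categorical column to color (style does the same with markers or dash patterns). A sketch with hypothetical data showing that hue='weekday' gives each point a face color determined by its category, so two weekdays yield exactly two distinct colors:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with two weekday categories
df = pd.DataFrame({
    "daily_customers": [20, 35, 50, 65, 80, 95],
    "sales_totals": [410, 690, 1020, 1290, 1610, 1900],
    "weekday": ["Mon", "Tue", "Mon", "Tue", "Mon", "Tue"],
})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="daily_customers", y="sales_totals", hue="weekday", ax=ax)

# The data points are drawn as one collection with a face color per point
face_colors = ax.collections[0].get_facecolors()
n_colors = len(np.unique(face_colors, axis=0))
plt.close(fig)
```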

Prompting

AI chat systems (like ChatGPT and Gemini) are called "large language models" (LLMs) because they are trained on massive text datasets. Through this training process, LLMs "learn" patterns and relationships between words in a language. Given a starting point, like a question, an LLM can use these learned patterns to generate plausible responses. That starting point is what we call a prompt. Prompt engineering is the art of designing prompts that get the most effective output from an LLM. We'll spend most of this course on tips and tricks for engineering the most effective prompts. However, one of the most important tips is to remember that interacting with an AI assistant is iterative: we don't need to get everything right in the first prompt. In fact, it is often more efficient (and more powerful!) to converse back and forth with the AI assistant throughout the analytics process, iterating on and improving the AI output as we go. Here are some ways to structure a conversation:

  1. Explain the goal (I’m working on an analytics report about hotel cancellations)
  2. Provide context (I have three dataframes with the following names)
  3. State your request clearly (Write a query to determine cancellation rates by month)
  4. Review and refine (Can you explain this piece of the query you generated:)
  5. Continue the conversation (That's great, what other types of statistics should I look into for this report?)

Types of prompts

It can be helpful to think of two basic types of prompts: open-ended vs detailed / goal-oriented.

Open-Ended Prompts

Open-ended prompts tend to be broader and less specific and can be used for

  • Brainstorming creative, out-of-the-box ideas
  • Reducing the amount of time spent writing prompts
  • More free exploration of the data without constraints

Goal-oriented prompts

Goal-oriented prompts tend to be more detailed. They are useful for

  • More precise and constrained responses like performing specific calculations
  • Complex tasks requiring multiple steps
  • Reducing ambiguity resulting in fewer prompting iterations
  • More control over the output format, as with visualizations

Here are some general tips for engineering a goal-oriented prompt:

  • Set the overall goal
  • Establish the current code state
  • Describe the dataset and relevant columns
  • Outline what the code should achieve
  • Provide important details

If you're not satisfied with a response, try experimenting with both open-ended and detailed approaches until you find the right balance for your task. For example, if your task involves designing a visualization, it may take a couple of iterations to find the right design. Consider adding more context at each iteration until you find a solution that suits your own style as a data analyst! Through the course of the lesson, we'll go over prompting strategies for both open-ended and detailed approaches to data analytics tasks. Specifically, we'll go into more depth on how to design detailed prompts step by step for tasks like creating visualizations, restructuring a dataset, and debugging errors.
