Visualization_with_Seaborn
Anatomy of seaborn code
import seaborn as sns
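# general pattern: sns.<plot_function>(data=..., x=..., y=...); seaborn has no sns.plot function
sns.scatterplot(data=df, x='col1', y='col2')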
The main params are x, y, and data:
- data is an optional param for the pandas df to plot from
- x is the col name for the x axis
- y is the col name for the y axis
The available plot options with seaborn are:
- bar chart: sns.barplot()
- scatter plot: sns.scatterplot()
- line chart: sns.lineplot()
- histogram: sns.histplot()
- KDE plot: sns.kdeplot()
- box plot: sns.boxplot()
making a bar plot in seaborn
import pandas as pd
df = pd.read_csv('restaurant_data.csv')
sns.barplot(data=df, x='server', y='sales_totals')
The default aggregation method is to take the mean of each group, but we can change this by setting the estimator parameter to another method. Many functions from the NumPy library can be called like np.median, np.max, and np.std. This code will show us the same plot but with the median for each group instead of the mean.
import numpy as np
sns.barplot(data=df, x='server', y='sales_totals', estimator=np.median, ci = None)
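ci is the confidence interval, drawn as error bars. It's set at 95% by default, but we can hide the bars by setting the param to None. In seaborn 0.12 and later, ci is deprecated in favor of the errorbar param (errorbar=None hides the bars).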
making a scatter plot
sns.scatterplot(data=df, x='daily_customers', y='sales_totals')
Visual patterns we look for in scatter plots include:
- Spacing: Points that are close together in a line or curve pattern show a stronger relationship. Points that are spaced out or more cloud-like show a weaker relationship.
- Orientation: A pattern of points starting in the lower left corner and following up to the upper right corner shows a positive relationship. A negative relationship might appear as a pattern of points starting in the upper left corner and following down to the lower right corner.
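For roughly linear patterns, these two visual cues line up with the Pearson correlation coefficient: orientation matches its sign and spacing matches its magnitude. The sketch below uses synthetic data (the column names tight_up and loose_down are invented) to put numbers on both; plotting either column against x with sns.scatterplot would show the corresponding picture.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the numbers are repeatable
x = np.linspace(0, 10, 200)

# Synthetic data: one tight upward pattern, one loose downward pattern
df = pd.DataFrame({
    "x": x,
    "tight_up": 2 * x + rng.normal(0, 0.5, x.size),
    "loose_down": -x + rng.normal(0, 5, x.size),
})

r_tight = df["x"].corr(df["tight_up"])    # close to +1: strong positive
r_loose = df["x"].corr(df["loose_down"])  # negative, smaller in magnitude: weak negative
print(f"tight upward pattern:   r = {r_tight:.2f}")
print(f"loose downward pattern: r = {r_loose:.2f}")
```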
making a line chart
sns.lineplot(data=df, x='month', y='sales_totals')
making distribution plots
Histograms
sns.histplot(data=df, x='sales_totals')
Using y instead of x will create a histogram with horizontal bars.
Seaborn sets the bins parameter to auto by default, but we can change the binning of values in a number of ways.
- Number of bins: an integer for the number of bins to fit the data to
- Bin breaks: a list of values for where bins should start and end
- Reference rule: the name of a method to compute the optimal bin width, including auto (the larger of the sturges and fd reference rules)
KDE plots
sns.kdeplot(data=df, x='sales_totals', fill=True)
A kernel density estimation (KDE) plot displays a continuous probability density curve for the distribution.
Box plots
sns.boxplot(data=df, x='sales_totals', y='day')
The box plot communicates specific information about each category’s distribution through a pattern of lines and a box
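In a standard box plot, the box spans the first to third quartile, the line inside it marks the median, and (with seaborn's default whis=1.5) the whiskers reach the furthest points within 1.5 times the interquartile range of the box; points beyond that are drawn individually. The sketch below computes those same numbers with pandas on a small invented dataset, so the values can be compared with what sns.boxplot draws.

```python
import pandas as pd

# Invented data for illustration only
df = pd.DataFrame({
    "day": ["Mon"] * 5 + ["Sat"] * 5,
    "sales_totals": [100, 120, 130, 150, 400, 200, 220, 250, 260, 300],
})

# Quartiles per category: the edges of the box and the line inside it
summary = df.groupby("day")["sales_totals"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q1", "median", "q3"]

# Interquartile range and the upper limit for the whisker
summary["iqr"] = summary["q3"] - summary["q1"]
summary["upper_fence"] = summary["q3"] + 1.5 * summary["iqr"]
print(summary)
```

Monday's 400 lies above its upper fence (195), so a box plot would show it as a separate outlier point rather than stretching the whisker to it.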
modify chart params with
error bars and uncertainty
sns.barplot(data=df, x='server', y='sales_totals', errcolor='red', errwidth=5, capsize=0.5)
Error bar styling options include color (errcolor), width (errwidth), and cap length (capsize). For example, the code above makes a bar plot of mean sales_totals for each server with thick red error bars and long caps. In seaborn 0.13 and later, errcolor and errwidth are deprecated in favor of the err_kws dictionary.
sns.lineplot(data=df, x='month', y='sales_totals', err_style='bars')
The error style can be changed from the default shaded error band to vertical error bars using the err_style parameter, as in the code above.
group with hue and style
sns.scatterplot(data=df, x='daily_customers', y='sales_totals', hue='weekday')
sns.lineplot(data=df, x='month', y='sales', hue='location', style='location', linewidth=3)
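The hue param maps a column to color and style maps it to marker (or line) style; passing the same column to both gives each group a distinct color and shape. The sketch below uses invented data and reads the legend back to confirm that one entry is created per group.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented data for illustration only
df = pd.DataFrame({
    "daily_customers": [20, 35, 50, 25, 40, 60],
    "sales_totals": [200, 340, 520, 260, 410, 640],
    "weekday": ["Mon", "Mon", "Mon", "Sat", "Sat", "Sat"],
})

# Color and marker shape both vary by weekday
ax = sns.scatterplot(data=df, x="daily_customers", y="sales_totals",
                     hue="weekday", style="weekday")

legend_labels = {text.get_text() for text in ax.get_legend().get_texts()}
print(legend_labels)
```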
Prompting
AI chat systems (like ChatGPT and Gemini) are built on “large language models” (LLMs), which are trained on massive text datasets. Through this training process, LLMs “learn” patterns and relationships between words in a language. Given a starting point, like a question, an LLM can use these learned patterns to generate plausible responses. That starting point is what we call a prompt. Prompt engineering is the art of designing prompts that result in the most effective output from an LLM.
We’ll spend most of this course on tips and tricks for engineering the most effective prompts. However, one of the most important tips is to remember that interacting with an AI assistant is iterative: we don’t need to get everything right in the first prompt. In fact, it is often more efficient (and more powerful!) to converse back-and-forth with the AI assistant throughout the analytics process, iterating and improving upon the AI output as we go. Here are some ways to structure a conversation:
- Explain the goal (I’m working on an analytics report about hotel cancellations)
- Provide context (I have three dataframes with the following names)
- State your request clearly (Write a query to determine cancellation rates by month)
- Review and refine (Can you explain this piece of the query you generated:)
- Continue the conversation (That’s great, what other types of statistics should I look into for this report?)
Types of prompts
It can be helpful to think of two basic types of prompts: open-ended vs detailed / goal-oriented.
Open-Ended Prompts
Open-ended prompts tend to be broader and less specific and can be used for
- Brainstorming creative, out-of-the-box ideas
- Reducing the amount of time spent writing prompts
- More free exploration of the data without constraints
Goal-oriented prompts
Goal-oriented prompts tend to be more detailed. They are useful for
- More precise and constrained responses like performing specific calculations
- Complex tasks requiring multiple steps
- Reducing ambiguity resulting in fewer prompting iterations
- More control over the output format, like for visualizations
Here are some general tips to engineer a goal-oriented prompt:
- Set the overall goal
- Establish the current code state
- Describe the dataset and relevant columns
- Outline what the code should achieve
- Provide important details
If you’re not satisfied with a response, try experimenting with both open-ended and detailed approaches until you find the right balance for your task. For example, if your task involves designing a visualization, it may take a couple of iterations until you find the right design. Consider adding more context at each iteration until you find a solution that satisfies your own style as a data analyst!
Through the course of the lesson, we’ll go over prompting strategies for both open-ended and detailed approaches for data analytic tasks. Specifically, we’ll go more into depth on how to design detailed prompts step-by-step for tasks like creating visualizations, restructuring a dataset, and debugging errors.
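One way to keep the five goal-oriented tips in view is to treat them as slots in a template. The sketch below is only an illustration: build_prompt and its section labels are hypothetical names, not part of any library, and the example text is invented around the restaurant data used in the plotting notes.

```python
def build_prompt(goal, code_state, dataset, task, details):
    """Assemble a goal-oriented prompt from the five parts (hypothetical helper)."""
    parts = [
        f"Goal: {goal}",
        f"Current code: {code_state}",
        f"Dataset: {dataset}",
        f"Task: {task}",
        f"Details: {details}",
    ]
    return "\n".join(parts)


prompt = build_prompt(
    goal="I'm building a report on restaurant sales.",
    code_state="I have imported pandas and seaborn and loaded the data into df.",
    dataset="df has the columns server, sales_totals, and day.",
    task="Write seaborn code for a bar plot of median sales_totals per server.",
    details="Hide the error bars and use a descriptive title.",
)
print(prompt)
```

If the response misses the mark, the same template makes it easy to revise one slot (for example, adding more dataset detail) and try again.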
