# Understanding Data

## Data, What Are You Telling Me?

Insights are hidden in the sea of data. Data is always telling us a story about the past, present and future, but we need to *see* it.

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Load example data
tips_df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")

# Read columns into lists
total_bill = tips_df['total_bill'].tolist()
tip = tips_df['tip'].tolist()
sex = tips_df['sex'].tolist()
smoker = tips_df['smoker'].tolist()
day = tips_df['day'].tolist()
time = tips_df['time'].tolist()
size = tips_df['size'].tolist()

bill_ranges = [0, 10, 20, 30, 40, 50, 60, 70, 80]
labels = ['0-10', '10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80']

# Group the bills by bill range and sex; size() counts rows, so 'count' is the number of bills
tips_df['total_bill_range'] = pd.cut(tips_df['total_bill'], bins=bill_ranges, labels=labels, include_lowest=True)
bill_sex_group = tips_df.groupby(['total_bill_range', 'sex'], observed=False).size().reset_index(name='count')

# Plot the bar chart using seaborn
sns.barplot(x="total_bill_range", y="count", hue="sex", data=bill_sex_group)
plt.title("Number of Bills by Bill Range and Sex")
plt.xlabel("Total Bill Ranges")
plt.ylabel("Number of Bills")
plt.show()
```

This may suggest that most bills fall in the \$10 to \$20 range and that the likely payers in that range are male. How would you test a guess about this trend?
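One way to check a guess like this is to look at the numbers behind the bars. The following is a minimal sketch, not part of the original analysis: the helper name `count_bills_by_range` and the sample rows are hypothetical and chosen only to keep the example self-contained. It counts bills per range and sex with `pd.crosstab`, then reports the busiest range and the male share within it.

```python
import pandas as pd


def count_bills_by_range(df, bins, labels):
    """Return a table of bill counts with bill ranges as rows and sex as columns."""
    ranges = pd.cut(df["total_bill"], bins=bins, labels=labels, include_lowest=True)
    return pd.crosstab(ranges, df["sex"])


# Hypothetical sample rows (not taken from the tips dataset)
sample = pd.DataFrame({
    "total_bill": [8.5, 12.0, 15.5, 18.0, 19.9, 25.0, 33.0],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Male", "Female"],
})
bins = [0, 10, 20, 30, 40]
labels = ["0-10", "10-20", "20-30", "30-40"]

counts = count_bills_by_range(sample, bins, labels)

# Range with the most bills, and the fraction of those bills paid by males
busiest_range = counts.sum(axis=1).idxmax()
male_share = counts.loc[busiest_range, "Male"] / counts.loc[busiest_range].sum()

print(counts)
print(busiest_range, male_share)
```

Passing `tips_df` with the original `bill_ranges` and `labels` instead of the sample should give the counts that the bar chart draws.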

### The Likely Candidate Pool or Cluster

```python
import plotly.express as px
import pandas as pd

# Load example data
tips = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")

# Create scatter plot with regression line (trendline='ols' requires statsmodels)
fig = px.scatter(tips, x='total_bill', y='tip', trendline='ols')

# Set the title and axis labels
fig.update_layout(title='Total Bill vs. Tip Scatter Plot', xaxis_title='Total Bill', yaxis_title='Tip')
fig.show()
```

The plot shows the bill against the tip. The highest bill is \$50.81, with a tip of \$10. People below the line would be considered less generous with tipping (whether less willing or less able). Would we even ask people below the line for donations if the service supported a cause?
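To make "below the line" concrete, each tip can be compared with the tip the fitted line predicts for that bill. The sketch below is an assumption-laden illustration rather than the method behind the plot: it uses `np.polyfit` with degree 1 as a stand-in for the least-squares trendline, and the helper name `flag_below_line` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_below_line(df):
    """Fit tip = slope * total_bill + intercept and flag rows that fall under the line."""
    slope, intercept = np.polyfit(df["total_bill"].to_numpy(), df["tip"].to_numpy(), deg=1)
    out = df.copy()
    out["expected_tip"] = slope * out["total_bill"] + intercept
    # Negative residual: the actual tip is smaller than the line predicts
    out["residual"] = out["tip"] - out["expected_tip"]
    out["below_line"] = out["residual"] < 0
    return out, slope, intercept


# Hypothetical sample rows (not taken from the tips dataset)
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [2.0, 2.5, 5.0, 5.5],
})

flagged, slope, intercept = flag_below_line(sample)
print(flagged)
```

A residual only says how a tip compares with the fitted line for that bill size; it does not say why the tip was lower, so it is a starting point for a guess rather than a verdict on generosity.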

Let’s use another library (Bokeh) to take a closer look.

```python
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.transform import linear_cmap
from bokeh.palettes import Spectral6
import pandas as pd
from bokeh.io import…
```