Understanding Data

Data, What Are You Telling Me?

Mi'kail Eli'yah
7 min read · May 8


Insights lie within the sea of data. Data is always telling us a story about the past, present, and future, but we need to `see` it.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Load example data
tips_df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")

# Read columns into lists
total_bill = tips_df['total_bill'].tolist()
tip = tips_df['tip'].tolist()
sex = tips_df['sex'].tolist()
smoker = tips_df['smoker'].tolist()
day = tips_df['day'].tolist()
time = tips_df['time'].tolist()
size = tips_df['size'].tolist()

bill_ranges = [0, 10, 20, 30, 40, 50, 60, 70, 80]
labels = ['0-10', '10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80']

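# Assign each bill to a range, then count the bills per range and sex
tips_df['total_bill_range'] = pd.cut(tips_df['total_bill'], bins=bill_ranges, labels=labels, include_lowest=True)
# observed=False keeps empty ranges in the result and avoids the pandas FutureWarning for categorical groupers
bill_sex_group = tips_df.groupby(['total_bill_range', 'sex'], observed=False).size().reset_index(name='count')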

# Plot the bar chart using seaborn
sns.barplot(x="total_bill_range", y="count", hue="sex", data=bill_sex_group)

# The bars are row counts from .size(), so label them as counts rather than tip amounts
plt.title("Number of Bills by Bill Range and Sex")
plt.xlabel("Total Bill Range")
plt.ylabel("Number of Bills")
plt.show()
This may suggest that most bills fall in the $10 to $20 range, and that the person paying is more often male. How would you make guesses on this trend?
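
To check a reading like this against numbers rather than bar heights, the same grouping can be printed as a table. The following is a minimal sketch, assuming a small hand-typed sample in the same shape as the tips data (the ten rows and the shortened bin list are illustrative only, not the full dataset). It uses `pd.crosstab` to show the count per range and sex, and the share of each sex within a range.

```python
import pandas as pd

# Small hand-typed sample in the same shape as the tips data (illustrative only)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male", "Male", "Male"],
})

# Shortened bin list, chosen only to cover the sample above
bins = [0, 10, 20, 30]
labels = ["0-10", "10-20", "20-30"]
sample["total_bill_range"] = pd.cut(
    sample["total_bill"], bins=bins, labels=labels, include_lowest=True
)

# Number of bills per range and sex
counts = pd.crosstab(sample["total_bill_range"], sample["sex"])

# Share of each sex within a range (each row sums to 1)
shares = pd.crosstab(sample["total_bill_range"], sample["sex"], normalize="index")

print(counts)
print(shares.round(2))
```

Swapping `sample` for the full `tips_df` and restoring the original bins would give the numbers behind the bar chart. The shares are worth reading alongside the counts, because a range can look male-dominated simply because there are more male payers overall.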

The Likely Candidate Pool Or Cluster

import plotly.express as px
import pandas as pd

# Load example data
tips = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
# Create scatter plot with a least-squares regression line
# (trendline='ols' requires the statsmodels package to be installed)
fig = px.scatter(tips, x='total_bill', y='tip', trendline='ols')
# Set the title and axis labels
fig.update_layout(title='Total Bill vs. Tip Scatter Plot', xaxis_title='Total Bill', yaxis_title='Tip')

fig.show()
The plot shows the bill against the tip. The highest bill is $50.81, with a tip of $10. People below the trend line could be considered less generous with tipping (whether less willing or less able). Would we even approach the people below the line for donations if the service supported a cause?
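
"Below the line" can be made concrete by fitting the line ourselves and checking the sign of each residual. This is a minimal sketch, assuming a straight-line least-squares fit with `np.polyfit` as a stand-in for the plotted trend line, and a small hand-typed sample (illustrative only, not the full dataset). The names `residual` and `below_line` are introduced here for illustration.

```python
import numpy as np

# Small hand-typed sample of (total_bill, tip) pairs (illustrative only)
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12, 1.96, 3.23])

# Fit tip = slope * total_bill + intercept by ordinary least squares
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Residual = actual tip minus the tip the line predicts for that bill
predicted = slope * total_bill + intercept
residual = tip - predicted

# A negative residual means the point sits below the fitted line
below_line = residual < 0

for bill, t, r, below in zip(total_bill, tip, residual, below_line):
    side = "below" if below else "on/above"
    print(f"bill={bill:6.2f}  tip={t:5.2f}  residual={r:+.2f}  {side}")
```

Because a least-squares line with an intercept balances the residuals around zero, some points will always fall below it. Being below the line is therefore relative to the other customers in the data, not an absolute measure of generosity.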

Let’s look at the same data with another library (Bokeh).

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.transform import linear_cmap
from bokeh.palettes import Spectral6
import pandas as pd
from bokeh.io import…
