mirror of
https://github.com/jackyzha0/quartz.git
synced 2025-12-28 15:24:06 -06:00
Add data visualization examples with Matplotlib
This commit is contained in:
parent
120c57721f
commit
ea1930aa2f
@ -1,3 +1,197 @@
https://kolibril13.github.io/plywood-gallery-matplotlib-examples/

https://matplotlib.org/stable/plot_types/basic/index.html

# Introduction to Data Visualization with Matplotlib

1. **Pyplot Interface Introduction**:
```python
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()
```

2. **Adding Data to Axes**:
```python
plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
plt.ylabel('Average Temp')
plt.show()
```

3. **Combining Multiple Data Sets**:
```python
plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60], label='Seattle')
plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [30, 35, 40, 50], label='New York')
plt.ylabel('Temp')
plt.legend()
plt.show()
```

4. **Customizing Plots**:
```python
plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60], linestyle='--', color='r', marker='o')
plt.ylabel('Temp')
plt.show()
```

5. **Axes Labels and Titles**:
```python
plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
plt.xlabel('Month')
plt.ylabel('Temp')
plt.title('Monthly Temp in Seattle')
plt.show()
```

6. **Creating Subplots**:
```python
fig, ax = plt.subplots()
ax.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
plt.show()
```

7. **Subplots Customization**:
```python
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
ax2.plot(['Jan', 'Feb', 'Mar', 'Apr'], [30, 35, 40, 50])
plt.show()
```

# Time-series Data with Matplotlib

#timeseries

To create a DataFrame for `climate_change`, you can follow a structure similar to this:

```python
import pandas as pd

# Sample data structure
data = {
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", ...],
    "co2": [414.7, 415.0, 415.3, ...],
    "relative_temp": [0.25, 0.27, 0.29, ...]
}

climate_change = pd.DataFrame(data)
climate_change['date'] = pd.to_datetime(climate_change['date'])
climate_change.set_index('date', inplace=True)
```

Replace the ellipses (`...`) with your actual data. This code builds a DataFrame with date, CO2, and relative-temperature columns, converts the `date` column to datetime, and sets it as the index, which is the usual layout for time-series data.

1. **Plotting Basic Time-series Data**:
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # create the figure and axes used below
ax.plot(climate_change.index, climate_change['co2'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')
plt.show()
```

2. **Zooming into Specific Time Frames**:
- Decade Zoom:
```python
fig, ax = plt.subplots()
sixties = climate_change.loc["1960-01-01":"1969-12-31"]  # label slice, both ends inclusive
ax.plot(sixties.index, sixties['co2'])
plt.show()
```
- Year Zoom:
```python
fig, ax = plt.subplots()
sixty_nine = climate_change.loc["1969-01-01":"1969-12-31"]  # a single year
ax.plot(sixty_nine.index, sixty_nine['co2'])
plt.show()
```

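The zoom snippets above only return rows if the data actually covers the 1960s, which the 2020 sample rows do not. Below is a minimal runnable sketch of the same slicing, assuming a synthetic monthly `co2` series (the values are made up for illustration and are not real measurements). The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for climate_change: monthly values, NOT real measurements
dates = pd.date_range("1958-01-01", "2016-12-01", freq="MS")
climate_change = pd.DataFrame(
    {"co2": np.linspace(315.0, 405.0, len(dates))},
    index=dates,
)

# Partial-string indexing on a DatetimeIndex: both endpoints are inclusive
sixties = climate_change.loc["1960":"1969"]
sixty_nine = climate_change.loc["1969"]

# One panel per zoom level
fig, (ax_decade, ax_year) = plt.subplots(2, 1)
ax_decade.plot(sixties.index, sixties["co2"])
ax_decade.set_ylabel("CO2 (ppm)")
ax_year.plot(sixty_nine.index, sixty_nine["co2"])
ax_year.set_xlabel("Time")
ax_year.set_ylabel("CO2 (ppm)")
```

Call `plt.show()` (with an interactive backend) or `fig.savefig(...)` to view the result.
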
3. **Plotting Multiple Time-series Together**:
```python
fig, ax = plt.subplots()
ax.plot(climate_change.index, climate_change["co2"])  # both series share one y-axis
ax.plot(climate_change.index, climate_change["relative_temp"])
plt.show()
```

4. **Using Twin Axes for Different Variables**:
```python
fig, ax = plt.subplots()
ax.plot(climate_change.index, climate_change["co2"])
ax2 = ax.twinx()  # second y-axis that shares the same x-axis
ax2.plot(climate_change.index, climate_change["relative_temp"])
```

5. **Differentiating Variables by Color**:
```python
fig, ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(climate_change.index, climate_change["co2"], color='blue')
ax2.plot(climate_change.index, climate_change["relative_temp"], color='red')
```

6. **Customizing Annotations on Time-series Plots**:
- Basic Annotation:
```python
# ax2 is the twin axes created in step 4
ax2.annotate(">1 degree", xy=(pd.Timestamp("2015-10-06"), 1))
```
- With Text Positioning and Arrows:
```python
# xytext places the label away from the point; arrowprops={} draws a default arrow
ax2.annotate(">1 degree", xy=(pd.Timestamp('2015-10-06'), 1), xytext=(pd.Timestamp('2008-10-06'), -0.2), arrowprops={})
```

7. **Creating a Function for Time-series Plotting**:
```python
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    """Plot y against x on the given axes and color the y-label to match the line."""
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
```

Each snippet above is a starting point for visualizing and analyzing time-series data with Matplotlib in Python.

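As a usage sketch, the `plot_timeseries` helper from step 7 can be combined with the twin axes from step 4. The data below is synthetic (made-up values, not real measurements), and the label strings are illustrative choices rather than anything the notes prescribe.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    """Plot y against x on the given axes and color the y-label to match the line."""
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)


# Synthetic stand-in for climate_change: NOT real measurements
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
climate_change = pd.DataFrame(
    {
        "co2": np.linspace(370.0, 390.0, 120),
        "relative_temp": np.linspace(0.4, 0.7, 120),
    },
    index=dates,
)

fig, ax = plt.subplots()
plot_timeseries(ax, climate_change.index, climate_change["co2"],
                "blue", "Time", "CO2 (ppm)")

# Second variable on its own y-axis, sharing the x-axis
ax2 = ax.twinx()
plot_timeseries(ax2, climate_change.index, climate_change["relative_temp"],
                "red", "Time", "Relative temperature (Celsius)")
```

Coloring each y-label to match its line makes it easier to tell which scale belongs to which series.
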
# Quantitative Comparisons: Bar-Charts

1. **Olympic Medals Data**: Presents a dataset on Olympic medals and demonstrates basic bar chart plotting.
2. **Rotating Tick Labels**: Shows how to rotate axis labels for clarity.
3. **Visualizing Multiple Medal Types**: Explains stacking bars for different medal types (Gold, Silver, Bronze).
4. **Adding a Legend**: Illustrates how to add a legend to distinguish between different bars.
5. **Histograms**: Discusses creating histograms to represent the distribution of data.
6. **Error Bars in Bar Charts**: Teaches how to add error bars to bar charts for statistical representation.
7. **Boxplots**: Describes creating boxplots for data distribution analysis.
8. **Scatter Plots**: Introduces scatter plots for comparing two quantitative variables.

Example code for each key point above:

1. **Olympic Medals Data**:
```python
# medals: a pandas DataFrame indexed by country, one column per medal type
medals.plot(kind='bar')
```

2. **Rotating Tick Labels**:
```python
# rotate the x tick labels so long names do not overlap
plt.xticks(rotation=45)
```

3. **Visualizing Multiple Medal Types**:
```python
# stack the medal columns on top of each other for each country
medals.plot(kind='bar', stacked=True)
```

4. **Adding a Legend**:
```python
# uses the `label` of each plotted artist
plt.legend()
```

5. **Histograms**:
```python
# data: a 1-D sequence of numeric values
plt.hist(data)
```

6. **Error Bars in Bar Charts**:
```python
# error: one error size per bar, e.g. a standard deviation
plt.bar(x, height, yerr=error)
```

7. **Boxplots**:
```python
# data: a pandas DataFrame; draws one box per numeric column
data.boxplot()
```

8. **Scatter Plots**:
```python
# x, y: equal-length sequences of numeric values
plt.scatter(x, y)
```

These examples provide a basic structure for each type of plot, which you can adapt and expand upon based on your specific data and visualization needs.

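The snippets above assume that variables such as `medals` already exist. A minimal self-contained sketch of points 1 to 4 follows, using hypothetical medal counts and placeholder country names (not real results). It stacks the bars with Matplotlib's `bottom` argument instead of the pandas `stacked=True` shortcut, so each step is visible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical medal counts, for illustration only (not real results)
medals = pd.DataFrame(
    {"Gold": [10, 8, 5], "Silver": [7, 9, 4], "Bronze": [6, 3, 8]},
    index=["Country A", "Country B", "Country C"],
)

fig, ax = plt.subplots()

# Stack each medal type on top of the previous ones via `bottom`
ax.bar(medals.index, medals["Gold"], label="Gold")
ax.bar(medals.index, medals["Silver"], bottom=medals["Gold"], label="Silver")
ax.bar(medals.index, medals["Bronze"],
       bottom=medals["Gold"] + medals["Silver"], label="Bronze")

ax.tick_params(axis="x", labelrotation=45)  # rotate the country names
ax.set_ylabel("Number of medals")
ax.legend()  # one entry per labeled bar group
```
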
25
content/AI&DATA/Data Science/seaborn.md
Normal file
@ -0,0 +1,25 @@
#seaborn

# Difference Between Catplot and Relplot

1. **Catplot**: The `catplot` function is used for plotting categorical data. It is a figure-level interface for drawing categorical plots onto a FacetGrid, and it gives access to several axes-level functions that show the relationship between a numerical variable and one or more categorical variables. The `kind` parameter selects the representation: box plots, violin plots, bar plots, strip plots, and others.

2. **Relplot**: The `relplot` function, on the other hand, is used for plotting relational data. It is a figure-level interface for drawing relational plots onto a FacetGrid, and it is meant for visualizing the relationship between two numerical variables as a scatter plot or a line plot. Because it also uses a FacetGrid, you can create a grid of plots by mapping dataset columns to the rows and columns of the grid.

The key difference lies in the type of data they are best suited for: `catplot` is for categorical data and lets you choose among different kinds of categorical plots, whereas `relplot` is for relational (usually numerical) data and focuses on scatter and line plots. Both functions plot onto a FacetGrid, which makes it easy to compare subgroups within your data.

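A minimal sketch of the two functions side by side, assuming a small hypothetical DataFrame (the column names `group`, `site`, `x`, and `y` are made up for illustration). Both calls return a `FacetGrid`, and `col="site"` splits each figure into one panel per site.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "a", "a", "b", "b"],
    "site": ["north"] * 4 + ["south"] * 4,
    "x": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    "y": [2.1, 2.9, 3.2, 4.1, 1.8, 2.5, 3.0, 3.7],
})

# Categorical x-axis: one box per group, one panel per site
cat_grid = sns.catplot(data=df, x="group", y="y", col="site", kind="box")

# Numerical x-axis: scatter of y against x, colored by group, one panel per site
rel_grid = sns.relplot(data=df, x="x", y="y", hue="group", col="site", kind="scatter")
```

Changing `kind` (for example `"violin"` for `catplot` or `"line"` for `relplot`) switches the representation without changing the rest of the call.
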
#whiskers

In Seaborn, a Python data visualization library built on Matplotlib, "whiskers" are the lines that extend from the box of a box plot. They show how far the data spread beyond the lower and upper quartiles and help flag potential outliers.

Here is how whiskers work in Seaborn's box plots:

1. **Box**: The central box of a box plot represents the interquartile range (IQR), which is the range between the first quartile (25th percentile) and the third quartile (75th percentile). The line inside the box indicates the median of the data.

2. **Whiskers**: The whiskers extend from the box to show the range of the bulk of the data. By default, each whisker ends at the most extreme data point that lies within 1.5 * IQR of the box: at most 1.5 * IQR above the third quartile and at most 1.5 * IQR below the first quartile.

3. **Outliers**: Points beyond the ends of the whiskers are treated as outliers and are plotted individually, usually as dots or other distinct markers.

The default whisker behavior can be adjusted through the `whis` parameter of `boxplot`. A single number changes the IQR multiplier, and a pair of numbers sets the whiskers at specific percentiles of the data.

Together, the box, whiskers, and outlier markers give a concise summary of a distribution: the median, the quartiles, and potential outliers. This makes box plots useful for comparing distributions across several groups or datasets.

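To make the default rule concrete, here is a small sketch that computes the whisker ends by hand with NumPy and then draws the box plot with two `whis` settings. The sample values are hypothetical, and the quartiles use NumPy's default linear interpolation, which is an assumption about how the plotting library computes them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # hypothetical sample

# Quartiles and interquartile range
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: the farthest a whisker is allowed to reach
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier
outliers = values[(values < lower_fence) | (values > upper_fence)]

fig, (ax_default, ax_wide) = plt.subplots(1, 2)
sns.boxplot(x=values, whis=1.5, ax=ax_default)  # default multiplier
sns.boxplot(x=values, whis=3.0, ax=ax_wide)     # wider fences, fewer outliers
```

For this sample the fences are -3.5 and 14.5, so the whiskers end at 1 and 9 and the value 30 is the only outlier.
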
BIN
docs/pdfs/matplot1.pdf
Normal file
Binary file not shown.