quartz/content/AI&DATA/Data Science/Data visualization with Matplotlib.md
2023-12-10 23:56:19 +03:00

https://kolibril13.github.io/plywood-gallery-matplotlib-examples/

https://matplotlib.org/stable/plot_types/basic/index.html

Introduction to Data Visualization with Matplotlib

  1. Pyplot Interface Introduction:

    import matplotlib.pyplot as plt
    plt.plot([1, 2, 3, 4])
    plt.ylabel('some numbers')
    plt.show()
    
  2. Adding Data to Axes:

    plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
    plt.ylabel('Average Temp')
    plt.show()
    
  3. Combining Multiple Data Sets:

    plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60], label='Seattle')
    plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [30, 35, 40, 50], label='New York')
    plt.ylabel('Temp')
    plt.legend()
    plt.show()
    
  4. Customizing Plots:

    plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60], linestyle='--', color='r', marker='o')
    plt.ylabel('Temp')
    plt.show()
    
  5. Axes Labels and Titles:

    plt.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
    plt.xlabel('Month')
    plt.ylabel('Temp')
    plt.title('Monthly Temp in Seattle')
    plt.show()
    
  6. Creating Subplots:

    fig, ax = plt.subplots()
    ax.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
    plt.show()
    
  7. Subplots Customization:

    fig, (ax1, ax2) = plt.subplots(1, 2)
    ax1.plot(['Jan', 'Feb', 'Mar', 'Apr'], [40, 42, 50, 60])
    ax2.plot(['Jan', 'Feb', 'Mar', 'Apr'], [30, 35, 40, 50])
    plt.show()
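
The steps above can be combined in one figure using the object-oriented interface, where labels, titles, and legends are set on each Axes instead of through `plt`. This is a minimal sketch that reuses the temperature values from the snippets above; `sharey=True` and `figsize` are choices made here for illustration, not something the notes prescribe.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr']
seattle = [40, 42, 50, 60]    # values reused from the snippets above
new_york = [30, 35, 40, 50]

# One row, two columns; sharey=True puts both panels on the same y scale
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))

# Left panel: dashed red line with circle markers
ax1.plot(months, seattle, color='r', linestyle='--', marker='o', label='Seattle')
ax1.set_title('Seattle')
ax1.set_xlabel('Month')
ax1.set_ylabel('Temp')
ax1.legend()

# Right panel: solid blue line with triangle markers
ax2.plot(months, new_york, color='b', marker='v', label='New York')
ax2.set_title('New York')
ax2.set_xlabel('Month')
ax2.legend()

fig.tight_layout()  # keep titles and labels from overlapping
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file.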
    

Time-series Data with Matplotlib

#timeseries

To create a DataFrame for climate_change, you can follow a structure similar to this:

    import pandas as pd

    # Sample data structure
    data = {
        "date": ["2020-01-01", "2020-01-02", "2020-01-03", ...],
        "co2": [414.7, 415.0, 415.3, ...],
        "relative_temp": [0.25, 0.27, 0.29, ...]
    }

    climate_change = pd.DataFrame(data)
    climate_change['date'] = pd.to_datetime(climate_change['date'])
    climate_change.set_index('date', inplace=True)

Replace the ellipses (...) with your actual data. This code sets up a DataFrame with date, CO2, and temperature data, converting the date column to a datetime format and setting it as the index, which is typical for time-series data.
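
If no dataset is at hand, a synthetic stand-in lets the snippets in this section run end to end. The sketch below assumes NumPy is available. Every value in it is generated, not measured: the date range, the trends, and the noise levels are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly values for illustration only; these are NOT real measurements.
dates = pd.date_range("1960-01-01", "2019-12-01", freq="MS", name="date")  # month starts
n = len(dates)
rng = np.random.default_rng(0)  # fixed seed so the frame is reproducible

climate_change = pd.DataFrame(
    {
        # A linear trend plus a little noise stands in for each variable
        "co2": np.linspace(315.0, 410.0, n) + rng.normal(0, 0.5, n),
        "relative_temp": np.linspace(-0.1, 1.1, n) + rng.normal(0, 0.1, n),
    },
    index=dates,  # the DatetimeIndex enables date-based slicing
)
```

For real data stored in a CSV file, `pd.read_csv("climate_change.csv", parse_dates=["date"], index_col="date")` builds the same kind of frame in one call; the file name here is hypothetical.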

  1. Plotting Basic Time-series Data:

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()  # create the Figure and Axes used by the snippets below
    ax.plot(climate_change.index, climate_change['co2'])
    ax.set_xlabel('Time')
    ax.set_ylabel('CO2 (ppm)')
    plt.show()
    
  2. Zooming into Specific Time Frames:

    • Decade Zoom:
      sixties = climate_change.loc["1960-01-01":"1969-12-31"]
      ax.plot(sixties.index, sixties['co2'])
      
    • Year Zoom:
      sixty_nine = climate_change.loc["1969-01-01":"1969-12-31"]
      ax.plot(sixty_nine.index, sixty_nine['co2'])
      
  3. Plotting Multiple Time-series Together:

    ax.plot(climate_change.index, climate_change["co2"])
    ax.plot(climate_change.index, climate_change["relative_temp"])
    
  4. Using Twin Axes for Different Variables:

    ax2 = ax.twinx()
    ax2.plot(climate_change.index, climate_change["relative_temp"])
    
  5. Differentiating Variables by Color:

    ax.plot(climate_change.index, climate_change["co2"], color='blue')
    ax2.plot(climate_change.index, climate_change["relative_temp"], color='red')
    
  6. Customizing Annotations on Time-series Plots:

    • Basic Annotation:
      ax2.annotate(">1 degree", xy=[pd.Timestamp("2015-10-06"), 1])
      
    • With Text Positioning and Arrows:
      ax2.annotate(">1 degree", xy=(pd.Timestamp('2015-10-06'), 1), xytext=(pd.Timestamp('2008-10-06'), -0.2), arrowprops={})
      
  7. Creating a Function for Time-series Plotting:

    def plot_timeseries(axes, x, y, color, xlabel, ylabel):
        axes.plot(x, y, color=color)
        axes.set_xlabel(xlabel)
        axes.set_ylabel(ylabel, color=color)
    

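Together, these snippets cover a typical time-series workflow in Matplotlib: plot against a datetime index, zoom in by slicing that index, give a second variable its own y-axis with twinx(), color-code the two variables, and annotate points of interest.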
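
As a sketch of how the time-series pieces above fit together, the example below calls `plot_timeseries` on a pair of twin axes, adds the annotation, and plots a decade zoom on a separate figure. The data is a synthetic stand-in (not real measurements), and the `tick_params` line and the arrow styling are additions made here for readability.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT real measurements), monthly from 1960 through 2019
dates = pd.date_range("1960-01-01", "2019-12-01", freq="MS", name="date")
climate_change = pd.DataFrame(
    {
        "co2": np.linspace(315.0, 410.0, len(dates)),
        "relative_temp": np.linspace(-0.1, 1.1, len(dates)),
    },
    index=dates,
)


def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    """Plot y against x and color the line, y label, and y ticks alike."""
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params("y", colors=color)  # addition: match tick color to the line


fig, ax = plt.subplots()
plot_timeseries(ax, climate_change.index, climate_change["co2"],
                "blue", "Time", "CO2 (ppm)")

# The twin Axes shares the x-axis but has an independent y-axis on the right
ax2 = ax.twinx()
plot_timeseries(ax2, climate_change.index, climate_change["relative_temp"],
                "red", "Time", "Relative temperature")

# Point an arrow from the text position (xytext) to the data point (xy)
ax2.annotate(
    ">1 degree",
    xy=(pd.Timestamp("2015-10-06"), 1),
    xytext=(pd.Timestamp("2008-10-06"), -0.2),
    arrowprops={"arrowstyle": "->", "color": "gray"},
)

# Zoom: slice the DatetimeIndex by date strings, then plot only that slice
sixties = climate_change.loc["1960-01-01":"1969-12-31"]
fig_zoom, ax_zoom = plt.subplots()
ax_zoom.plot(sixties.index, sixties["co2"])
ax_zoom.set_xlabel("Time")
ax_zoom.set_ylabel("CO2 (ppm)")
```

Passing the Axes in as the first argument is what lets the same function draw on both `ax` and `ax2`; call `plt.show()` to display both figures.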

Quantitative Comparisons: Bar-Charts

  1. Olympic Medals Data: Presents a dataset on Olympic medals and demonstrates basic bar chart plotting.
  2. Rotating Tick Labels: Shows how to rotate axis labels for clarity.
  3. Visualizing Multiple Medal Types: Explains stacking bars for different medal types (Gold, Silver, Bronze).
  4. Adding a Legend: Illustrates how to add a legend to distinguish between different bars.
  5. Histograms: Discusses creating histograms to represent distribution of data.
  6. Error Bars in Bar Charts: Teaches how to add error bars to bar charts for statistical representation.
  7. Boxplots: Describes creating boxplots for data distribution analysis.
  8. Scatter Plots: Introduces scatter plots for comparing two quantitative variables.

Example code for each key point in "Quantitative Comparisons: Bar-Charts":

  1. Olympic Medals Data:

    medals.plot(kind='bar')
    
  2. Rotating Tick Labels:

    plt.xticks(rotation=45)
    
  3. Visualizing Multiple Medal Types:

    medals.plot(kind='bar', stacked=True)
    
  4. Adding a Legend:

    plt.legend()
    
  5. Histograms:

    plt.hist(data)
    
  6. Error Bars in Bar Charts:

    plt.bar(x, height, yerr=error)
    
  7. Boxplots:

    data.boxplot()
    
  8. Scatter Plots:

    plt.scatter(x, y)
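
The bar-chart one-liners for items 1 to 4 leave `medals` undefined, so here is a self-contained sketch of those steps with the Axes interface. The `medals` frame holds placeholder counts invented for illustration (not real Olympic results), and stacking through the `bottom` argument is one way to get a result similar to `stacked=True`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder medal counts, invented for illustration (not real Olympic results)
medals = pd.DataFrame(
    {"Gold": [10, 7, 4], "Silver": [8, 9, 3], "Bronze": [6, 5, 7]},
    index=["Country A", "Country B", "Country C"],
)

fig, ax = plt.subplots()

# Stack each medal type on top of the previous ones via `bottom`
ax.bar(medals.index, medals["Gold"], label="Gold")
ax.bar(medals.index, medals["Silver"], bottom=medals["Gold"], label="Silver")
ax.bar(medals.index, medals["Bronze"],
       bottom=medals["Gold"] + medals["Silver"], label="Bronze")

ax.tick_params(axis="x", labelrotation=45)  # rotate the country names
ax.set_ylabel("Number of medals")
ax.legend()  # uses the label= given to each bar call
```

Each `ax.bar` call needs its own `label` for the legend to tell the medal types apart.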
    

These examples provide a basic structure for each type of plot, which you can adapt and expand upon based on your specific data and visualization needs.
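
For the distribution and comparison plots (items 5 to 8), the following sketch draws all four on one 2x2 grid. The two samples are randomly generated placeholders, and names such as `group_a` and `group_b` are hypothetical; the point is only to show which arguments each plotting method expects.

```python
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated placeholder samples (not real data)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=170, scale=8, size=200)
group_b = rng.normal(loc=180, scale=10, size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Histogram: outlined bars (histtype="step") keep overlapping samples readable
axes[0, 0].hist(group_a, bins=15, histtype="step", label="A")
axes[0, 0].hist(group_b, bins=15, histtype="step", label="B")
axes[0, 0].set_title("Histogram")
axes[0, 0].legend()

# Bar chart of the means, with the standard deviations as error bars
means = [group_a.mean(), group_b.mean()]
stds = [group_a.std(), group_b.std()]
axes[0, 1].bar(["A", "B"], means, yerr=stds)
axes[0, 1].set_title("Mean with error bars")

# Boxplot: one box per sample
axes[1, 0].boxplot([group_a, group_b])
axes[1, 0].set_xticklabels(["A", "B"])
axes[1, 0].set_title("Boxplot")

# Scatter plot: one point per (group_a[i], group_b[i]) pair
axes[1, 1].scatter(group_a, group_b)
axes[1, 1].set_title("Scatter")

fig.tight_layout()
```

Since the two samples are generated independently, the scatter panel should show no clear relationship; with real paired measurements it would reveal any correlation.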