Time Series¶

Examples:

  • Stock Prices
  • Weather
  • Heart Monitor Data

Components:

  • Trend - long-run direction of the series. Identify a trend with visualization and smoothing techniques
  • Seasonality - predictable, repeating patterns. Identify using decomposition
  • Cyclicality - long-term, irregular fluctuations that are not strictly seasonal. Ex: economic cycles. Identify with statistical models
  • Noise - random variations in the data that do not follow a pattern. Ex: a water pipe break leading to extreme water usage
  • Stationarity - a time series whose statistical properties (mean, variance, autocorrelation) do not change over time. Identify using rolling statistics
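As a sketch of the rolling-statistics check mentioned above, using a simulated series in place of the water data (the series name and window size here are illustrative assumptions):

```python
import pandas as pd
import numpy as np

# Simulated daily usage series (hypothetical stand-in for the water data)
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=600, freq="D")
usage = pd.Series(2e7 + rng.normal(0, 1e6, 600), index=idx)

# Rolling mean and std over a 30-day window: if both stay roughly flat
# over time, the series looks stationary
rolling_mean = usage.rolling(window=30).mean()
rolling_std = usage.rolling(window=30).std()
print(rolling_mean.tail(3))
print(rolling_std.tail(3))
```

Plotting `usage`, `rolling_mean`, and `rolling_std` together makes any drift in mean or variance easy to spot.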
In [3]:
import pandas as pd
data = pd.read_csv('Water_Usage_Data.csv', parse_dates = ['Date'], index_col='Date')
In [ ]:
print(data.head())
print(data.info())
print(data.describe())
            total_gallons  residential_gallons  multi_family_gallons  \
Date                                                                   
2019-01-01    17240452.01          6896553.991           4936297.028   
2019-01-02    20204457.43          7262534.651           4687956.808   
2019-01-03    19367188.31          6214681.059           4649834.954   
2019-01-04    19294498.24          6032113.704           4552227.705   
2019-01-05    18073429.03          6678241.313           4914119.605   

            commercial_gallons  industrial_gallons  public_authority_gallons  
Date                                                                          
2019-01-01         2899003.223         372027.0000               2136570.769  
2019-01-02         4954364.353         538747.0000               2760854.620  
2019-01-03         5051775.413         602900.0000               2847996.884  
2019-01-04         5249283.403         613821.9999               2847051.425  
2019-01-05         3844688.871         430945.0000               2205434.239  
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 600 entries, 2019-01-01 to 2020-08-23
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   total_gallons             600 non-null    float64
 1   residential_gallons       600 non-null    float64
 2   multi_family_gallons      600 non-null    float64
 3   commercial_gallons        600 non-null    float64
 4   industrial_gallons        600 non-null    float64
 5   public_authority_gallons  600 non-null    float64
dtypes: float64(6)
memory usage: 32.8 KB
None
       total_gallons  residential_gallons  multi_family_gallons  \
count   6.000000e+02         6.000000e+02          6.000000e+02   
mean    2.540250e+07         6.709635e+06          6.665081e+06   
std     5.380030e+07         1.485217e+06          3.594590e+07   
min     4.506101e+05         1.000827e+05         -4.879963e+06   
25%     1.970888e+07         6.087324e+06          5.105664e+06   
50%     2.097685e+07         6.702186e+06          5.308363e+06   
75%     2.296736e+07         7.266935e+06          5.549265e+06   
max     9.176957e+08         2.578089e+07          8.850357e+08   

       commercial_gallons  industrial_gallons  public_authority_gallons  
count        6.000000e+02        6.000000e+02              6.000000e+02  
mean         5.956399e+06        8.955720e+05              5.175812e+06  
std          1.569350e+07        6.936311e+05              3.679105e+07  
min          8.751938e+04        2.208200e+04             -5.895248e+06  
25%          3.726428e+06        5.384380e+05              2.866413e+06  
50%          5.055808e+06        6.830705e+05              3.555144e+06  
75%          5.598446e+06        7.690850e+05              3.939894e+06  
max          3.420817e+08        4.000079e+06              8.982956e+08  
In [7]:
# Three ways to handle missing values - pick one, they are not meant to be chained:
data.dropna(inplace=True)                # drop rows with any missing value
# data.dropna(axis=1, inplace=True)      # or: drop columns with any missing value
# data.dropna(thresh=3, inplace=True)    # or: keep rows with at least 3 non-null values

Check for Stationarity¶

from statsmodels.tsa.stattools import adfuller
result = adfuller(data["your_column"])

Applying LOG¶

import numpy as np
data["your_column"] = np.log(data["your_column"])
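A small sketch of why the log transform helps: since log(a · b) = log(a) + log(b), it turns multiplicative structure into additive structure and compresses large spikes (the series below is simulated for illustration):

```python
import numpy as np

# Multiplicative series: exponential trend times a seasonal factor
t = np.arange(1, 101, dtype=float)
trend = 1.05 ** t                                # multiplicative growth
season = 1 + 0.1 * np.sin(2 * np.pi * t / 12)    # seasonal factor around 1
y = trend * season

# After the log, the components add instead of multiply
log_y = np.log(y)    # equals log(trend) + log(season)
```

This is also why a multiplicative decomposition can be fit by applying an additive decomposition to the logged series.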

Resampling - convert data to a different frequency, e.g. weekly or hourly¶

Downsampling (to a coarser frequency, aggregating values) and upsampling (to a finer frequency, filling or interpolating values)
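A minimal sketch of both directions, with `daily` as a hypothetical stand-in for a column of the water data:

```python
import pandas as pd
import numpy as np

# Simulated daily series
rng = np.random.default_rng(0)
daily = pd.Series(
    rng.normal(100, 10, 60),
    index=pd.date_range("2019-01-01", periods=60, freq="D"),
)

# Downsampling: daily -> weekly totals (aggregate with sum, mean, etc.)
weekly = daily.resample("W").sum()

# Upsampling: daily -> hourly, filling the new gaps by interpolation
hourly = daily.resample("h").interpolate()
```

Downsampling needs an aggregation function; upsampling needs a fill strategy (`interpolate`, `ffill`, `bfill`).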

Time Series Decomposition¶

Additive¶

Y_t = Trend_t + Seasonality_t + Residual_t

Multiplicative¶

Y_t = Trend_t x Seasonality_t x Residual_t

Decomposition¶

In [8]:
import pandas as pd
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Simulated time series: seasonal + trend + noise
np.random.seed(42)
period = 12  # e.g., monthly data with yearly seasonality
time = np.arange(100)
data = 0.1 * time + 2 * np.sin(2 * np.pi * time / period) + np.random.normal(0, 0.5, size=len(time))

# Create DataFrame
ts = pd.Series(data, index=pd.date_range(start="2020-01-01", periods=100, freq="ME"))

# Decompose
result = seasonal_decompose(ts, model="additive", period=period)

# Plot
result.plot()
plt.suptitle("Seasonal Decomposition", fontsize=14)
plt.tight_layout()
plt.show()
[Figure: seasonal decomposition plot with observed, trend, seasonal, and residual panels]