Functional Neural Networks
Posted on Sun 27 October 2024 in articles
After several years in the data industry, one trend has consistently stood out to me: the heavy reliance on data aggregation often leads to information being lost in the process. From A/B testing to high-level metrics and model features, data is commonly distilled into summaries that, while simplifying, can also obscure valuable details. In this article, I aim to highlight an alternative approach: Functional Data Analysis (FDA). While not applicable to every case, it is particularly relevant for scenarios where a metric is tracked over time, distance, or another continuous measurement.
FDA treats each "unit of information" as a curve, not a single number. I'm including a few good examples of functional data below. Here you have temperatures tracked over time for several days in Adelaide, Australia; each line represents a unique unit of information!
Why is functional data important? Consider comparing growth rates between boys and girls, tracking monthly cumulative spending by customers in the App Store, or analyzing how Spotify users' listening time builds over the week. Summarizing these with a single number can obscure insights, and bucketing data into broad categories, like days of the week, risks oversimplification. When plotted as curves, however, trends and behaviors become clearer, even to non-technical stakeholders. For example: when does electricity demand rise, and by how much? Are there recurring habits during particular seasons? FDA reveals these insights, making behaviors easier to explore.
Consider seasonal electricity demand in Adelaide: during winter, demand spikes in the evenings for heating, while in summer it rises during the hottest daytime hours for cooling. In the early morning (3 AM to 6 AM), however, demand is actually higher in summer. A simple example, but one that illustrates the value of FDA in uncovering meaningful patterns.
Unfortunately, most models in the industry don’t handle 'curves' well. That's why I want to show you how to build a Neural Network that can learn from curves, rather than just making use of scalar features.
Learning from Curves
Functional data is often stored in discrete form, such as temperature measurements taken every 30 minutes. To apply FDA, we convert the discrete data to its functional form using 'basis expansion'. This process estimates coefficients \(c_m\), each of which multiplies a corresponding basis function \(\phi_m(t)\); summing the terms produces a smooth curve, where \(t\) represents a variable like time or distance and \(M\) denotes the number of basis functions used.
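Written out, the basis expansion of a single observed curve \(x(t)\) takes the form:

\[
x(t) \approx \sum_{m=1}^{M} c_m \, \phi_m(t)
\]

In practice, the coefficients can be estimated in several ways. The snippet below is a minimal sketch that fits a small Fourier basis to 48 half-hourly readings with ordinary least squares and then evaluates the smooth curve on a finer grid; the basis choice, function names, and dummy data are illustrative assumptions, not necessarily what the original example uses.

```python
import numpy as np

def fourier_basis(t, n_basis=5, period=24.0):
    """Evaluate a small Fourier basis (constant, sin, cos, ...) at points t."""
    cols = [np.ones_like(t)]
    for j in range(1, (n_basis - 1) // 2 + 1):
        cols.append(np.sin(2 * np.pi * j * t / period))
        cols.append(np.cos(2 * np.pi * j * t / period))
    return np.column_stack(cols)[:, :n_basis]

# 48 half-hourly readings over one day (dummy temperature values)
t_obs = np.arange(48) * 0.5                            # hours: 0.0, 0.5, ..., 23.5
x_obs = 20 + 5 * np.sin(2 * np.pi * (t_obs - 14) / 24)

Phi = fourier_basis(t_obs, n_basis=5)                  # (48, 5) basis matrix
coeffs, *_ = np.linalg.lstsq(Phi, x_obs, rcond=None)   # estimates of the c_m

# Evaluate the smooth curve on a finer 100-point grid
t_fine = np.linspace(0, 23.5, 100)
x_smooth = fourier_basis(t_fine, n_basis=5) @ coeffs
```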
Since we are working with functional data, we replace each scalar input \(x_k\) with a function \(x_k(t)\). Additionally, because we are now dealing with functions, our weights must also be functions, leading to the following representation for a neuron in the Functional Neural Network (FNN), reconstructed here from the symbol definitions given just below:
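\[
v_n = g\left( b_n + \sum_{k=1}^{K} \int_{T} \beta_{nk}(t)\, x_k(t)\, dt \right)
\]

Here the sum runs over the functional inputs \(k = 1, \dots, K\) and the integral over their support \(T\) (e.g., the hours of the day).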
In this equation, \(v_n\) represents neuron \(n\), \(g(.)\) denotes the activation function, \(b_n\) is the neuron's bias term, and \(\beta_{nk}(t)\) is the functional weight for the functional variable \(k\). These weights are themselves built through basis expansion, as coefficients \(c_{nmk}\) in a linear combination of basis functions \(\phi_{nmk}(t)\):
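\[
\beta_{nk}(t) = \sum_{m=1}^{M} c_{nmk}\, \phi_{nmk}(t)
\]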
The neural network learns \(c_{nmk}\) to represent these functional weights. If there are also scalar features, the network can incorporate them by adding the standard \(Wx\) form:
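\[
v_n = g\left( b_n + \sum_{j} w_{nj}\, z_j + \sum_{k=1}^{K} \int_{T} \beta_{nk}(t)\, x_k(t)\, dt \right)
\]

where \(z_j\) denotes a scalar feature and \(w_{nj}\) its scalar weight (notation introduced here for illustration).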
Because the integral is differentiable with respect to the coefficients \(c_{nmk}\), the neural network can learn them through standard backpropagation. Once trained, reconstructing \(\beta_{nk}(t)\) from those coefficients lets us interpret the importance the model assigns to different values of \(t\).
Building a Functional Neural Network
Let’s dive into a practical example. We’re working with two functional features: temperature (in Celsius) and electricity consumption (in kilowatts), each measured every 30 minutes in Adelaide, Australia. The objective? To classify the season based on these features. While distinguishing winter from summer is straightforward, spring and autumn are more challenging due to their similar patterns. Don’t believe it? Just look at the average daily temperature curves per season. Even in electricity demand, summer and winter stand out more clearly, as we saw earlier.
To provide a basis for comparison, I trained a standard Multi-Layer Perceptron Classifier (MLPC), a Random Forest, and Gradient Boosted Trees on the discrete data, i.e., 48 points each for temperature and electricity (a minimal sketch of this baseline setup follows the list below). The accuracies achieved were:
- MLPC: 49%
- Random Forest: 75%
- Gradient Boosted Tree: 82%
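As a reference point, here is a minimal baseline sketch with scikit-learn. The feature matrix, labels, and split below are dummy placeholders standing in for the real dataset, so the sketch runs on its own but does not reproduce the numbers above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: 96 columns = 48 temperature + 48 electricity readings per day
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 96))
y = rng.integers(0, 4, size=400)   # four seasons

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [
    ("MLPC", MLPClassifier(max_iter=1000)),
    ("Random Forest", RandomForestClassifier()),
    ("Gradient Boosted Trees", GradientBoostingClassifier()),
]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```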
With the Functional Neural Network (FNN), we use the coefficients from the basis expansion rather than the raw data. I selected 5 basis functions per variable, creating just 10 inputs instead of 96. The coefficients also allow us to evaluate a smooth curve at 100 discrete points, providing a better integral estimate than the original 48 readings. Below is a high-level look at the FNN's architecture; in this case we are not using scalar features, hence the dotted arrow.
The confusion matrix below shows the results obtained over the test set, resulting in an accuracy of 70%. While this is lower than the Random Forest and Gradient Boosted Trees, it’s a solid starting point, especially given the reduced input size.
| Actual \ Predicted | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Spring | 48 | 19 | 16 | 6 |
| Summer | 4 | 83 | 8 | 0 |
| Autumn | 14 | 13 | 49 | 14 |
| Winter | 8 | 0 | 6 | 75 |
Curious about the code? A snippet is provided below, and the full example is available on my GitHub for those ready to dive deeper.
import math
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class FunctionalLayer(nn.Module):
    def __init__(self, B, S, support, basis_fn_cnt):
        super(FunctionalLayer, self).__init__()
        # Grid for the support T (e.g., hours)
        self.grid = torch.linspace(support[0], support[1], 100)
        # Functional Object Basis Matrix
        self.B = B.float()
        # Φ Matrix
        self.S = S.float()
        # We don't need gradients for the splines
        self.B.requires_grad_(False)
        self.S.requires_grad_(False)

    def forward(self, X):
        # Obtain x(t)
        functional_data = torch.matmul(X, self.B).unsqueeze(1)
        # Φ(t) • x(t)
        S = self.S.unsqueeze(0)
        integrand = S * functional_data
        # Approximate Integral
        integral = torch.trapz(integrand, self.grid, dim=-1)
        return integral

class FNN(nn.Module):
    def __init__(self, temp_basis_matrix, beta_mat_temp, temp_ranges,
                 elect_basis_matrix, beta_mat_elect, elect_ranges, basis_fn_cnt):
        super(FNN, self).__init__()
        # Functional sections, which estimate the integrals
        self.temp_layer = FunctionalLayer(temp_basis_matrix, beta_mat_temp, temp_ranges, basis_fn_cnt)
        self.elect_layer = FunctionalLayer(elect_basis_matrix, beta_mat_elect, elect_ranges, basis_fn_cnt)
        # Feed Forward Section & its Norm
        self.fc1 = nn.Linear(10, 48)
        self.fc_norm_1 = nn.LayerNorm(48, bias=True, elementwise_affine=True)
        # Add dropout for the Feed Forward training
        self.dropout = torch.nn.Dropout(p=0.20)
        # Output Layer
        self.fc2 = nn.Linear(48, 4)

    def forward(self, X, training=True):
        # Process the set of Functional Coefficients to obtain integrals
        temp_res = self.temp_layer(X[0].float())
        elect_res = self.elect_layer(X[1].float())
        # Concatenate the Integrals
        X = torch.cat((temp_res, elect_res), dim=1)
        # Standardize it
        mean = X.mean(dim=0, keepdim=True)
        std = X.std(dim=0, keepdim=True)
        X = (X - mean) / std
        # Pass through Feed Forward
        X = self.fc1(X)
        X = self.dropout(torch.relu(self.fc_norm_1(X)))
        # Output layer
        return self.fc2(X)
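To make the shapes concrete, here is a small usage sketch continuing from the snippet above. The basis matrices and coefficients below are random placeholders; in practice they would come from evaluating your chosen basis functions on the 100-point grid and from the basis expansion of each day's curves.

```python
n_days, n_basis = 32, 5

# Each basis matrix maps 5 coefficients to a curve evaluated on the 100-point grid
temp_basis_matrix = torch.rand(n_basis, 100)   # placeholder for the x(t) basis
beta_mat_temp = torch.rand(n_basis, 100)       # placeholder for the Φ(t) weight basis
elect_basis_matrix = torch.rand(n_basis, 100)
beta_mat_elect = torch.rand(n_basis, 100)

model = FNN(temp_basis_matrix, beta_mat_temp, (0.0, 23.5),
            elect_basis_matrix, beta_mat_elect, (0.0, 23.5), n_basis)

# Dummy basis coefficients for the temperature and electricity curves
X = [torch.rand(n_days, n_basis), torch.rand(n_days, n_basis)]
logits = model(X)   # (32, 4) class scores, one per season
```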
Improving the Model with Derivatives
One major advantage of FDA is the ability to analyze derivatives of any order after converting data into functional objects. This allows us to create new features that capture changes in our functional variables over time, enhancing our model’s learning potential. During exploratory data analysis, I discovered a significant difference in the second derivatives (acceleration) of temperature between spring and autumn. As shown below, a noticeable gap appears between 12 PM and 2 PM.
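Continuing the earlier Fourier-basis sketch (still an illustrative assumption, not necessarily the basis used in the original analysis), the second derivative of a fitted curve comes directly from differentiating the basis functions and reusing the same coefficients (`coeffs` and `t_fine` from the earlier snippet):

```python
def fourier_basis_d2(t, n_basis=5, period=24.0):
    """Second derivative of the Fourier basis evaluated at points t."""
    cols = [np.zeros_like(t)]                  # d²/dt² of the constant term
    for j in range(1, (n_basis - 1) // 2 + 1):
        w = 2 * np.pi * j / period
        cols.append(-w**2 * np.sin(w * t))     # d²/dt² of sin(wt)
        cols.append(-w**2 * np.cos(w * t))     # d²/dt² of cos(wt)
    return np.column_stack(cols)[:, :n_basis]

# Same coefficients as the smoothed curve, applied to the differentiated basis
acceleration = fourier_basis_d2(t_fine, n_basis=5) @ coeffs
```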
To improve model performance, I added the coefficients of the temperature and electricity consumption derivatives. By incorporating these new features into the FNN, accuracy rose to 73%. However, this still fell short of the Gradient Boosted Tree model's performance.
| Actual \ Predicted | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Spring | 49 | 19 | 11 | 10 |
| Summer | 2 | 83 | 4 | 0 |
| Autumn | 10 | 11 | 53 | 16 |
| Winter | 7 | 0 | 7 | 75 |
Introducing Latent Factors
I suspected there was shared information between electricity demand and temperature, so I introduced latent factors into the model. Using a shared matrix \(\Gamma\), I mapped the electricity and temperature curves (via their basis coefficients) into a lower-dimensional space, capturing their common features. Each of these transformed vectors (\(\eta^{(k)}\) below, one per functional variable \(k\)) is then multiplied by a variable-specific matrix \(\Lambda^{(k)}\), producing another set of coefficients \(\beta^{(k)}\) that are used to generate the latent functional factors.
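In symbols (a reconstruction of the description above), writing \(c^{(k)}\) for the basis coefficients of functional variable \(k\):

\[
\eta^{(k)} = \Gamma\, c^{(k)}, \qquad \beta^{(k)} = \Lambda^{(k)}\, \eta^{(k)}
\]

with \(\Gamma\) shared across variables and \(\Lambda^{(k)}\) specific to each one.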
This transformation allows us to capitalize on the latent factors that our \(K\) input curves share, enabling us to extract even more information from the data. This addition changes the FNN's architecture to the following:
This approach improved our accuracy to 90%, a solid jump from 73% and one that surpasses the Gradient Boosted Trees model. The confusion matrix is given below:
| Actual \ Predicted | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Spring | 75 | 7 | 4 | 3 |
| Summer | 2 | 82 | 5 | 0 |
| Autumn | 5 | 0 | 75 | 10 |
| Winter | 1 | 0 | 0 | 88 |
The PyTorch "forward" function is changed to:
def forward(self, X, training=True):
    # Latent Factor Transformation
    latent_temp = self.gamma(X[4].float())
    latent_elect = self.gamma(X[5].float())
    latent_temp = self.lambda_temp(latent_temp)
    latent_elect = self.lambda_elect(latent_elect)
    # Process set of Functional Coefficients to obtain integrals
    temp_res = self.temp_layer(X[0].float())
    elect_res = self.elect_layer(X[1].float())
    temp_acc_res = self.temp_acc_layer(X[2].float())
    elect_acc_res = self.elect_acc_layer(X[3].float())
    latent_temp = self.latent_temp_layer(latent_temp)
    latent_elect = self.latent_elect_layer(latent_elect)
    # Concatenate the Integrals
    X = torch.cat((temp_res, elect_res, temp_acc_res, elect_acc_res, latent_temp, latent_elect), dim=1)
    # Standardize it
    mean = X.mean(dim=0, keepdim=True)
    std = X.std(dim=0, keepdim=True)
    X = (X - mean) / std
    # Pass through Feed Forward
    X = self.fc1(X)
    X = self.dropout_1(torch.relu(self.fc_norm_1(X)))
    X = self.fc2(X)
    X = self.dropout_2(torch.relu(self.fc_norm_2(X)))
    # Output layer
    return self.fc3(X)
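The forward pass above relies on layers defined in `__init__`, which the snippet doesn't show. Here is one plausible way the latent-factor additions could look; the names mirror the forward pass, while `latent_dim` and the specific basis matrices are illustrative assumptions rather than the original implementation.

```python
# Sketch of possible additions to FNN.__init__ (illustrative, not the original code)
latent_dim = 3                                                        # assumed latent size
self.gamma = nn.Linear(basis_fn_cnt, latent_dim, bias=False)          # shared Γ
self.lambda_temp = nn.Linear(latent_dim, basis_fn_cnt, bias=False)    # Λ for temperature
self.lambda_elect = nn.Linear(latent_dim, basis_fn_cnt, bias=False)   # Λ for electricity
self.latent_temp_layer = FunctionalLayer(temp_basis_matrix, beta_mat_temp, temp_ranges, basis_fn_cnt)
self.latent_elect_layer = FunctionalLayer(elect_basis_matrix, beta_mat_elect, elect_ranges, basis_fn_cnt)
```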
The Potential of FNNs
FNNs are powerful not just for taking curves as inputs but also for predicting curves as outputs. For instance, you could predict the cumulative number of iPhones needing repair during the first six months after launch and then analyze the derivatives of the output curve for better operational planning. With generative AI, FNNs should also be able to create realistic scenarios, like simulating seasonally adjusted electricity demand across new locations or predicting customer behavior patterns, which can be invaluable for planning and decision making. However, generative AI for FNNs is something I'm still exploring; maybe a topic for a future article!
I hope this article sparks your curiosity about FDA. It's an underutilized yet highly valuable tool in the Data Science world, one that can provide richer insights and more interpretable models. If you're interested in learning more, you can find the code I used for this example on my GitHub. Lastly, I want to credit "Fun Data Science" on YouTube, whose video inspired me to write this post.