CNN vs. FIR Filter: Visualizing Neural Network Weights for Signal Processing

For decades, signal processing engineers have lived by a clear credo: define the problem, derive the optimal solution mathematically, and implement it using structures we understand—FIRs, IIRs, FFTs, and Kalman filters. We demand interpretability. We need to know why a system works, not just that it does work.

This requirement for interpretability is the primary reason many senior engineers view Deep Learning (DL) with skepticism. Neural networks, with their millions of auto-tuned parameters, often appear as unapproachable “black boxes.”

However, if we strip away the hype and look at the fundamental mathematics of a 1D Convolutional Neural Network (CNN) used in time-series analysis, we find something surprisingly familiar. We find that deep learning is not replacing classical signal processing; it is automating it.

This article aims to demystify the 1D CNN for the signal processing engineer. We will demonstrate that a single convolutional layer is mathematically identical to a bank of Finite Impulse Response (FIR) filters. Furthermore, by training a simple network to perform a classic task—matched filtering in noise—we will visualize the learned weights and prove that the network effectively “re-invents” the optimal classical solution.


The Theoretical Bridge: CNNs are Adaptive FIR Filters

The disconnect between DSP and DL often stems from differing nomenclature for identical concepts.

The Classical View (DSP)

In discrete-time signal processing, the output $y[n]$ of an FIR filter with impulse response coefficients $h[k]$ acting on an input signal $x[n]$ is defined by the convolution sum:

$$y[n] = (x * h)[n] = \sum_{k=0}^{K-1} h[k] \cdot x[n-k]$$

where $K$ is the filter length (number of taps). We design the coefficients $h[k]$ to achieve a desired frequency response—low-pass, band-pass, or matched to a specific waveform.
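To make the notation concrete, here is a minimal sketch of the convolution sum in NumPy, using a 5-tap moving average as the example $h[k]$ (the signal and filter values are arbitrary illustrations):

```python
import numpy as np

# Example FIR: a 5-tap moving average, h[k] = 1/5 for k = 0..4
h = np.ones(5) / 5

# An arbitrary input signal x[n]
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0])

# y[n] = sum_k h[k] * x[n-k]; "valid" keeps only fully overlapped outputs
y = np.convolve(x, h, mode="valid")
print(y)  # a smoothed version of x
```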

The Deep Learning View

In a 1D CNN layer, we define a “kernel” (or “filter”) of size $K$. During the forward pass, this kernel slides across the input data. Let the learnable weights of the kernel be denoted by $w$. The output at position $n$, denoted as $y_n$, is calculated as the dot product of the weights and the local input segment:

$$y_n = \sum_{k=0}^{K-1} w_k \cdot x_{n+k}$$

(Note: Deep learning libraries typically implement cross-correlation rather than strict mathematical convolution [1]; the only difference is that the kernel is not time-reversed. The fundamental operation—a sliding dot product against fixed coefficients—is identical.)
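The equivalence is easy to verify numerically. The sketch below (NumPy assumed; the signal and kernel values are arbitrary) computes the sliding dot product directly, then reproduces it with np.convolve by flipping the kernel first:

```python
import numpy as np

x = np.arange(8, dtype=float)      # input signal x[n]
w = np.array([0.5, 1.0, -0.5])     # CNN kernel weights w[k]

# Deep-learning "convolution": a sliding dot product (cross-correlation)
y_dl = np.array([w @ x[n:n + len(w)] for n in range(len(x) - len(w) + 1)])

# Classical convolution with the time-reversed kernel gives the same output
y_dsp = np.convolve(x, w[::-1], mode="valid")

assert np.allclose(y_dl, y_dsp)    # identical up to the index flip
```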

The critical realization is this: The weights $w$ in a 1D CNN kernel are exactly equivalent to the tap coefficients $h$ of an FIR filter.

The difference lies not in the structure, but in how the coefficients are obtained.

  • In DSP, we derive $h$ analytically (e.g., window method, Parks-McClellan) based on specifications.
  • In Deep Learning, we initialize $w$ randomly and use gradient descent (backpropagation) to iteratively adjust them to minimize a specific error metric (loss function).

Therefore, training a 1D CNN layer is essentially an automated, data-driven process of finding optimal FIR filter coefficients.
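For contrast, here is what the classical, specification-driven path looks like in code, a brief sketch using scipy.signal.firwin (the window method mentioned above; the tap count and cutoff are arbitrary choices):

```python
from scipy import signal

# Classical path: derive the taps analytically from a specification.
# Here, a 31-tap low-pass FIR with cutoff at 0.25 x Nyquist (Hamming window).
h = signal.firwin(numtaps=31, cutoff=0.25, window="hamming")
print(h[:5])  # the analytically derived coefficients h[k]

# The deep-learning path replaces this derivation: initialize w randomly,
# then let gradient descent shape the taps to minimize a loss over data.
```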


Case Study: Reinventing the Matched Filter

To prove this concept, let’s look at the cornerstone of optimal detection: the Matched Filter [3].

We know from classical theory that to maximize the Signal-to-Noise Ratio (SNR) when detecting a known signal $s[n]$ in Additive White Gaussian Noise (AWGN), the optimal filter impulse response $h_{opt}[n]$ is a time-reversed and conjugated version of the signal:

$$h_{opt}[n] = s^*[-n]$$

If we train a simple, linear 1D CNN to detect a pulse in noise, and the theory holds, the network should autonomously learn weights that resemble this time-reversed pulse.
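As a quick sanity check of the classical recipe before training anything, the following sketch (the pulse position and noise level are arbitrary choices) builds $h_{opt}$ by time-reversing the pulse and confirms that the filter output peaks where the pulse is buried:

```python
import numpy as np

rng = np.random.default_rng(1)

s = np.ones(10)                  # known pulse s[n]
h_opt = np.conj(s[::-1])         # matched filter: time-reversed, conjugated

x = rng.normal(size=200)         # AWGN
x[80:90] += s                    # bury the pulse at samples 80..89

y = np.convolve(x, h_opt, mode="same")
print(np.argmax(y))              # peaks near the buried pulse (~85)
```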

The Experiment Setup

  1. Signal: A simple rectangular pulse of length 10 samples.
  2. Noise: Heavy AWGN added to the signal.
  3. Task: Identify the center location of the pulse within the noisy data.
  4. Model: A single 1D convolutional layer with one filter of kernel size 10 (matching the pulse length). Crucially, we will use no activation function (linear activation). This ensures the model remains a pure linear FIR filter.

Python Implementation: Visualizing the “Black Box”

The following Python script, using scikit-learn and SciPy, demonstrates this. It trains the model, then extracts the learned weights and plots their impulse and frequency responses using standard DSP techniques.

Note on Mathematical Equivalence: For a single-layer linear CNN without an activation function, the “optimal” weights it searches for via gradient descent are exactly the least-squares coefficients found by Scikit-learn’s LinearRegression, because both minimize the same mean squared error.

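A minimal sketch of such a script is shown below. It frames the task as detection: length-10 noisy windows labeled 1.0 when the pulse is present and 0.0 otherwise. The window construction, noise level, and unit-peak normalization are illustrative choices rather than the only valid setup.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import freqz
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

K = 10                       # pulse length = kernel size = number of FIR taps
pulse = np.ones(K)           # the known rectangular pulse s[n]
noise_std = 2.0              # heavy AWGN

# Training set: length-K windows. Half contain the pulse (label 1.0),
# half are pure noise (label 0.0).
n_examples = 20_000
X = rng.normal(scale=noise_std, size=(n_examples, K))
y = np.zeros(n_examples)
hit = rng.random(n_examples) < 0.5
X[hit] += pulse
y[hit] = 1.0

# Closed-form least squares: the same minimum a linear conv layer's
# gradient descent would converge to under an MSE loss.
model = LinearRegression().fit(X, y)
w = model.coef_              # learned kernel == learned FIR taps

# The classical answer: time-reversed (conjugated) pulse
h_opt = np.conj(pulse[::-1])

# Normalize both to unit peak (a matched filter is defined up to a gain)
w_n = w / np.abs(w).max()
h_n = h_opt / np.abs(h_opt).max()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Left: impulse response (the taps themselves)
ax1.plot(w_n, "bo-", label="Learned weights")
ax1.plot(h_n, "g--", label="Ideal matched filter")
ax1.set(title="Impulse response", xlabel="Tap index k")
ax1.legend()

# Right: treat the learned weights exactly as FIR coefficients
freq, H = freqz(w_n, worN=1024)
_, H_opt = freqz(h_n, worN=1024)
ax2.plot(freq / np.pi, 20 * np.log10(np.abs(H) + 1e-12), "b", label="Learned")
ax2.plot(freq / np.pi, 20 * np.log10(np.abs(H_opt) + 1e-12), "g--", label="Ideal")
ax2.set(title="Frequency response", xlabel="Normalized frequency (x pi rad/sample)",
        ylabel="Magnitude (dB)")
ax2.legend()

plt.tight_layout()
plt.show()
```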

Analyzing the Results

Running the script produces two plots: the learned impulse response and its frequency response, analyzed below.

1. Time Domain Interpretation (Impulse Response)

The left plot shows the weights learned by the linear model (blue) compared with the theoretically ideal matched filter coefficients (green, dashed).

The results are striking. The network, initialized with random weights, converged almost perfectly to the classical matched filter solution purely by minimizing the Mean Squared Error between its output and the detection target. It “learned” that a rectangular pulse is the optimal template for detecting a rectangular pulse.

2. Frequency Domain Interpretation

The right plot uses scipy.signal.freqz to treat the learned weights exactly as if they were FIR taps [2].

Because our target signal was a rectangular pulse in time, its spectrum is a sinc function. A matched filter must share this spectral shape to maximize SNR. The plot confirms that the frequency response of the learned weights matches the main lobe and sidelobes of the ideal sinc response almost perfectly.

The network acted as an automated filter designer, shaping its passband to capture the signal energy while attenuating out-of-band noise.

Conclusion

The skepticism surrounding deep learning in signal processing is healthy, but it shouldn’t be paralyzing. As demonstrated, a 1D CNN layer is not magic; it is a highly efficient, data-driven mechanism for implementing adaptive FIR filters.

While this example used a linear model for demonstration, the true power of deep learning emerges when we stack multiple convolutional layers separated by non-linear activation functions (like ReLU). This creates a cascade of non-linear filters, capable of learning complex, hierarchical representations of signals that analytical derivations could never hope to capture.

Deep Learning [4] is a powerful addition to the DSP toolbox, one whose foundations rest firmly on the classical theory we already know.

Now that you understand how a single layer works, see how we handle more complex sequences in our guide to Viterbi Decoding of Convolutional Codes or explore the foundations of randomness in The Glories of Gaussianity.

References

[1] PyTorch Documentation – Conv1d

[2] SciPy Signal Processing – freqz

[3] DSPRelated – The Matched Filter

[4] IEEE Signal Processing Magazine
