IBM Patented Euler’s 200-Year-Old Math Technique

LeetArxiv is the successor to Papers with Code after the latter’s shutdown.
Quick summary:
IBM holds a patent on the use of derivatives to find convergence of a generalized continued fraction.
Here’s the odd thing: all they did was implement a number theory technique by Gauss, Euler and Ramanujan in PyTorch and call backward() on the computation graph.
Now IBM’s patent trolls can feed on a math technique that has existed for more than 200 years.

As always, the code is available on Google Colab and GitHub.

The 2021 paper CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions (Puri et al., 2021) investigates the use of continued fractions in neural network design.

The paper takes 13 pages to make this claim: continued fractions (like MLPs) are universal approximators.

The authors reinvent the wheel countless times:

  1. They convert continued fractions into ‘ladders’.

  2. They call the reciprocal operation the ‘1/z nonlinearity’.

  3. Finally, they take the well-defined concept of generalized continued fractions and rebrand it as CoFrNets.

The authors rename generalized continued fractions. Taken from page 2 of (Puri et al., 2021)

To be honest, the paper is full of pretentious nonsense like this:

The authors tell jokes while collecting rent on 200 years of math knowledge. Taken from page 2 of (Puri et al., 2021)

Simple continued fractions are mathematical expressions of the form:

$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$

where $P_n/Q_n$ is the $n$th convergent (Cook, 2022).
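As a quick illustration (mine, not the paper’s), the convergents follow the standard recurrence $P_n = a_n P_{n-1} + P_{n-2}$, $Q_n = a_n Q_{n-1} + Q_{n-2}$. The helper name convergents below is just for the sketch; feeding in the first few coefficients of π’s continued fraction, [3; 7, 15, 1, 292], recovers the familiar 22/7 and 355/113.

from fractions import Fraction

def convergents(coeffs):
    """Yield the convergents P_n/Q_n of a simple continued fraction [a0; a1, a2, ...]."""
    p_prev, p = 1, coeffs[0]   # P_{-1}, P_0
    q_prev, q = 0, 1           # Q_{-1}, Q_0
    yield Fraction(p, q)
    for a in coeffs[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield Fraction(p, q)

# The continued fraction of pi begins [3; 7, 15, 1, 292, ...]
for frac in convergents([3, 7, 15, 1, 292]):
    print(frac, float(frac))   # 3, 22/7, 333/106, 355/113, 103993/33102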

Continued fractions have been used by mathematicians for centuries:

  1. Estimating π (MJD, 2014),

  2. Designing gear systems (Brocot, 1861),

  3. Even Ramanujan’s mathematical tricks relied on continued fractions (Barrow, 2000).

Continued fractions are well studied and have been covered in previous LeetArxiv guides (Lehmer, 1931): the continued fraction factorization method and Stern–Brocot fractions as floating-point alternatives.

If your background is in AI, a continued fraction looks exactly like a linear layer whose bias term is replaced by (the reciprocal of) another linear layer.
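To make that analogy concrete, here is a minimal sketch (my own, not the authors’ code) of a single continued-fraction ‘rung’ in PyTorch: a linear term plus the clamped reciprocal of another linear layer. The class name Rung and the epsilon clamp are illustrative choices that mirror the safe reciprocal used later.

import torch
import torch.nn as nn

class Rung(nn.Module):
    """One continued-fraction rung: a_k . x plus the reciprocal of the next rung."""
    def __init__(self, input_dim, inner=None, eps=0.1):
        super().__init__()
        self.a = nn.Linear(input_dim, 1, bias=False)  # the linear term a_k . x
        self.inner = inner                            # the next rung, or None at the bottom
        self.eps = eps

    def forward(self, x):
        out = self.a(x)
        if self.inner is not None:
            z = self.inner(x)
            # clamped reciprocal, so the division never blows up near zero
            out = out + torch.sign(z) / torch.clamp(z.abs(), min=self.eps)
        return out

# Three nested rungs: a_0.x + 1/(a_1.x + 1/(a_2.x))
net = Rung(4, Rung(4, Rung(4)))
print(net(torch.randn(8, 4)).shape)  # torch.Size([8, 1])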

(Jones, 1980) defines a generalized continued fraction as an expression of the form:

$$x = b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \cdots}}}$$

More economically written as:

$$x = b_0 + \operatorname*{K}_{n=1}^{\infty} \frac{a_n}{b_n}$$

where the $a_n$ and $b_n$ can be integers or polynomials.
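For a concrete example (a classical identity, not taken from the paper), a generalized continued fraction can be evaluated bottom-up. The helper name eval_gcf is illustrative; the choice $a_n = (2n-1)^2$, $b_n = 6$ is a well-known expansion of π.

def eval_gcf(b0, terms):
    """Evaluate b0 + a1/(b1 + a2/(b2 + ...)) by the usual bottom-up recursion."""
    value = 0.0
    for a, b in reversed(terms):
        value = a / (b + value)
    return b0 + value

# A classical generalized continued fraction for pi: 3 + 1^2/(6 + 3^2/(6 + 5^2/(6 + ...)))
terms = [((2 * n - 1) ** 2, 6.0) for n in range(1, 200)]
print(eval_gcf(3.0, terms))   # ~3.14159..., approaching pi as more terms are added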

The authors replace the phrase ‘continued fraction’ with ‘ladder’ to hide the fact that they are reinventing the wheel.

The authors simply implement a continued fraction library in PyTorch and call the backward() function on the resulting computation graph.

That is, they stack linear neural network layers and use the reciprocal (not ReLU) as the primary nonlinearity.

Then they replace the bias term of each linear layer with the reciprocal of another linear layer. The result is a generalized continued fraction.

In PyTorch, their architecture is as follows:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

class CoFrNet(nn.Module): 
    def __init__(self, input_dim, num_ladders=10, depth=6, num_classes=3, epsilon=0.1):
        super(CoFrNet, self).__init__()
        self.depth = depth
        self.epsilon = epsilon
        self.num_classes = num_classes

        #Linear layers for each step in each ladder
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(num_ladders, input_dim)) for _ in range(depth + 1)
        ])

        #Output weights for each class
        self.output_weights = nn.Parameter(torch.randn(num_ladders, num_classes))

    # The '1/z' nonlinearity: a clamped reciprocal that keeps |x| >= epsilon
    # so the division (and its gradient) cannot blow up near zero
    def safe_reciprocal(self, x):
        return torch.sign(x) * 1.0 / torch.clamp(torch.abs(x), min=self.epsilon)

    def forward(self, x):
        batch_size = x.shape[0]
        num_ladders = self.weights[0].shape[0]

        # Start from the innermost term (a_depth dot x) for every ladder
        current = torch.einsum('nd,bd->bn', self.weights[self.depth], x)

        # Build continued fractions from bottom to top
        for k in range(self.depth - 1, -1, -1):
            a_k = torch.einsum('nd,bd->bn', self.weights[k], x)
            current = a_k + self.safe_reciprocal(current)

        # Linear combination for each class
        output = torch.einsum('bn,nc->bc', current, self.output_weights)
        return output

def test_on_waveform():
    # Load Waveform-like dataset
    X, y = make_classification(
        n_samples=5000, n_features=40, n_classes=3, n_informative=10,
        random_state=42
    )

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Standardize
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Convert to torch tensors
    X_train = torch.FloatTensor(X_train)
    X_test = torch.FloatTensor(X_test)
    y_train = torch.LongTensor(y_train)
    y_test = torch.LongTensor(y_test)

    # Model
    input_dim = 40
    num_classes = 3
    model = CoFrNet(input_dim, num_ladders=20, depth=6, num_classes=num_classes)

    # Training
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    epochs = 100
    batch_size = 64

    for epoch in range(epochs):
        model.train()
        permutation = torch.randperm(X_train.size()[0])

        for i in range(0, X_train.size()[0], batch_size):
            indices = permutation[i:i+batch_size]
            batch_x, batch_y = X_train[indices], y_train[indices]

            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

        # Validation
        if epoch % 10 == 0:
            model.eval()
            with torch.no_grad():
                train_outputs = model(X_train)
                train_preds = torch.argmax(train_outputs, dim=1)
                train_acc = (train_preds == y_train).float().mean()

                test_outputs = model(X_test)
                test_preds = torch.argmax(test_outputs, dim=1)
                test_acc = (test_preds == y_test).float().mean()

            print(f'Epoch {epoch:3d} | Loss: {loss.item():.4f} | Train Acc: {train_acc:.4f} | Test Acc: {test_acc:.4f}')

    print(f"\nFinal Test Accuracy: {test_acc:.4f}")
    return test_acc.item()

if __name__ == "__main__":
    accuracy = test_on_waveform()
    print(f"CoFrNet achieved {accuracy:.1%} accuracy on Waveform dataset")

When tested on a non-linear waveform dataset, we see these results:

CoFrNet learns a non-linear dataset

Accuracy of 61%.

Nowhere near SOTA and that’s to be expected.

Continued fractions are well studied, and any number theorist will tell you that the gradients vanish, i.e. differentiating the associated power series only gets you so far.
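Here is a rough probe of that claim (my own sketch, not from the paper): a scalar continued fraction of increasing depth, built with the same clamped reciprocal as above. The helper name cf_scalar is illustrative, and the exact numbers depend on the random coefficients, but the gradient reaching the deepest coefficient has to pass through a long chain of 1/z derivatives, which is exactly where the optimization limits show up.

import torch

def cf_scalar(a, eps=0.1):
    # a[k] plays the role of the k-th linear term; the reciprocal is clamped as above
    current = a[-1]
    for k in range(len(a) - 2, -1, -1):
        current = a[k] + torch.sign(current) / torch.clamp(current.abs(), min=eps)
    return current

torch.manual_seed(0)
for depth in (2, 4, 8, 16, 32):
    a = torch.randn(depth + 1, requires_grad=True)
    cf_scalar(a).backward()
    # gradient w.r.t. the deepest coefficient has passed through `depth` reciprocals
    print(depth, a.grad[-1].abs().item())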

The authors use the power series of continued fractions to explain their moderate success. Taken from page 6 of (Puri et al., 2021)

Even Euler’s original work (Euler, 1785) points this out: a continued fraction corresponds to an infinite series, so optimization by differentiation has its limits.
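For reference (this is the classical identity, not reproduced from the paper), Euler’s continued fraction formula shows that a sum of products and a continued fraction are two views of the same object, so truncating and differentiating one is truncating and differentiating the other:

$$a_0 + a_0 a_1 + a_0 a_1 a_2 + \cdots + a_0 a_1 \cdots a_n
= \cfrac{a_0}{1 - \cfrac{a_1}{1 + a_1 - \cfrac{a_2}{1 + a_2 - \cfrac{a_3}{\ddots - \cfrac{a_n}{1 + a_n}}}}}$$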


PyTorch’s autodiff engine simply differentiates this (truncated) series by traversing the computation graph.

The authors simply implemented a continued fraction library in PyTorch and, as expected, found that it can be optimized by gradient descent.

As reviewers noted, the idea seems new, but the results are nowhere near SOTA, and the truth is that continued fractions have been around for a long time. The authors simply replace the linear layers of a neural network with generalized continued fractions.

Here’s the strange result: the authors filed for a patent on their paper in 2022.

Their patent was published and its status is marked as Pending.

Here’s the thing:

  1. Continued fractions have existed for far longer than IBM.

  2. Differentiating continued fractions is well known.

  3. The authors did not derive anything beyond Euler’s 1785 work.

Now, if IBM feels like litigating, they can sue Sage, Mathematica, Wolfram, or even you, for coding a 249-year-old math technique.

Who exactly could get sued? Consider three kinds of people who routinely code continued fractions:

  1. Mechanical engineers, roboticists and industrialists

    Taken from page 30 of Introduction to Continued Fractions (Moore, 1964)
  2. Pure mathematicians and mathematics teachers

    I have a PhD in mathematics, and I learned about this patent while investigating continued fractions and their relation to elliptic curves (van der Poorten, 2004).

    I was trying to model an elliptic divisibility sequence in Python (using PyTorch), and that’s how I found out about IBM’s patent.

    Summary of the 2004 paper Elliptic curves and continued fractions (van der Poorten, 2004)
  3. Numerical analysts and computational scientists / Sage and Maple programmers

    Numerical analysis is the use of computer algorithms to find approximate solutions to problems in mathematics and physics (Shi, 2024).

    Continued fractions are used in error analysis when evaluating integrals, and entire books describe these algorithms (Cuyt et al., 2008). A small sketch of this kind of evaluation follows after this list.

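As promised above, here is a minimal sketch (not taken from the Cuyt et al. handbook) of how a numerical analyst evaluates a function with a continued fraction. The helper name tan_cf is illustrative; the expansion used is Lambert’s classical continued fraction for tan(x).

import math

def tan_cf(x, terms=20):
    """Evaluate Lambert's continued fraction tan(x) = x/(1 - x^2/(3 - x^2/(5 - ...)))."""
    value = 2 * terms + 1            # deepest partial denominator
    for k in range(terms, 0, -1):
        value = (2 * k - 1) - x * x / value
    return x / value

print(tan_cf(1.0), math.tan(1.0))    # both ~1.557407...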


