Title: | Inference and Clustering of Functional Data |
---|---|
Description: | Some methods for the inference and clustering of univariate and multivariate functional data, using a generalization of Mahalanobis distance, along with some functions useful for the analysis of functional data. For further details, see Martino A., Ghiglietti, A., Ieva, F. and Paganoni A. M. (2017) <arXiv:1708.00386>. |
Authors: | Andrea Martino [aut, cre], Andrea Ghiglietti [aut], Francesca Ieva [aut], Anna Maria Paganoni [aut] |
Maintainer: | Andrea Martino <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-02-16 05:01:23 UTC |
Source: | https://github.com/cran/gmfd |
S3
Class for functional datasets.
A class for univariate or multivariate functional dataset.S3
Class for functional datasets.
A class for univariate or multivariate functional dataset.
funData(grid, data)
funData(grid, data)
grid |
the grid over which the functional dataset is defined. |
data |
a vector, a matrix or a |
The function returns a S3
object of class funData
, containing
the grid
over which the functional dataset is defined and a matrix or a list
of vectors or matrices containing the functional data
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) m2 <- t * ( 1 - t )^2 rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) FD <- funData( t, list( x1, x2 ) )
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) m2 <- t * ( 1 - t )^2 rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) FD <- funData( t, list( x1, x2 ) )
This function allows you to compute the distance between two curves with the chosen metric.
funDist(FD1, FD2, metric, p = NULL, lambda = NULL, phi = NULL, k_trunc = NULL)
funDist(FD1, FD2, metric, p = NULL, lambda = NULL, phi = NULL, k_trunc = NULL)
FD1 |
a functional data object of type |
FD2 |
a functional data object of type |
metric |
the chosen distance to be used: |
p |
a positive numeric value containing the parameter of the regularizing function for the generalized Mahalanobis distance. |
lambda |
a vector containing the eigenvalues in descending order of the functional data from which the curves are extracted. |
phi |
a matrix containing the eigenfunctions of the functional data in its columns from which the curves are extracted. |
k_trunc |
a positive numeric value representing the number of components at which the truncated mahalanobis distance must be truncated |
The function returns a numeric value indicating the distance between the two curves.
Ghiglietti A., Ieva F., Paganoni A. M. (2017). Statistical inference for stochastic processes: Two-sample hypothesis tests, Journal of Statistical Planning and Inference, 180:49-68.
Ghiglietti A., Paganoni A. M. (2017). Exact tests for the means of gaussian stochastic processes. Statistics & Probability Letters, 131:102–107.
# Define parameters: n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K ) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data z <- gmfd_simulate( n, m1, rho = rho, theta = theta ) # Extract two rows of the functional data x <- funData( t, z[1, ] ) y <- funData( t, z[2, ] ) lambda <- eigen(cov(z))$values phi <- eigen(cov(z))$vectors d <- funDist( x, y, metric = "mahalanobis", p = 1, lambda = lambda, phi = phi )
# Define parameters: n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K ) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data z <- gmfd_simulate( n, m1, rho = rho, theta = theta ) # Extract two rows of the functional data x <- funData( t, z[1, ] ) y <- funData( t, z[2, ] ) lambda <- eigen(cov(z))$values phi <- eigen(cov(z))$vectors d <- funDist( x, y, metric = "mahalanobis", p = 1, lambda = lambda, phi = phi )
This function computes the dissimilarity matrix containing the distances between the curves of the functional dataset
gmfd_diss(FD, metric, p = NULL, k_trunc = NULL)
gmfd_diss(FD, metric, p = NULL, k_trunc = NULL)
FD |
a functional data object of type |
metric |
the chosen distance to be used. Choose |
p |
a positive numeric value containing the parameter of the regularizing function for the generalized Mahalanobis distance. |
k_trunc |
a positive numeric value representing the number of components at which the truncated mahalanobis distance must be truncated |
The function returns a matrix of numeric values containing the distances between the curves.
Ghiglietti A., Ieva F., Paganoni A. M. (2017). Statistical inference for stochastic processes: Two-sample hypothesis tests, Journal of Statistical Planning and Inference, 180:49-68.
Ghiglietti A., Paganoni A. M. (2017). Exact tests for the means of gaussian stochastic processes. Statistics & Probability Letters, 131:102–107.
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K ) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x <- gmfd_simulate( n, m1, rho = rho, theta = theta ) FD <- funData( t, x ) D <- gmfd_diss( FD, metric = "L2" )
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K ) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x <- gmfd_simulate( n, m1, rho = rho, theta = theta ) FD <- funData( t, x ) D <- gmfd_diss( FD, metric = "L2" )
This function performs a k-means clustering algorithm on an univariate or multivariate functional data using a generalization of Mahalanobis distance.
gmfd_kmeans(FD, n.cl = 2, metric, p = NULL, k_trunc = NULL)
gmfd_kmeans(FD, n.cl = 2, metric, p = NULL, k_trunc = NULL)
FD |
a functional data object of type |
n.cl |
an integer representing the number of clusters. |
metric |
the chosen distance to be used: |
p |
a positive numeric value containing the parameter of the regularizing function for the generalized Mahalanobis distance. |
k_trunc |
a positive numeric value representing the number of components at which the truncated mahalanobis distance must be truncated |
The function returns a list with the following components:
cluster
: a vector of integers (from 1
to n.cl
) indicating the cluster to which each curve is allocated;
centers
: a list of d
matrices (k
x T
) containing the centroids of the clusters
Martino A., Ghiglietti A., Ieva F., Paganoni A. M. (2017). A k-means procedure based on a Mahalanobis type distance for clustering multivariate functional data, MOX report 44/2017
Ghiglietti A., Ieva F., Paganoni A. M. (2017). Statistical inference for stochastic processes: Two-sample hypothesis tests, Journal of Statistical Planning and Inference, 180:49-68.
Ghiglietti A., Paganoni A. M. (2017). Exact tests for the means of gaussian stochastic processes. Statistics & Probability Letters, 131:102–107.
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } s <- 0 for (k in 4:K) { s <- s + sqrt( rho[k] ) * theta[k, ] } m2 <- m1 + s # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) # Create a single functional dataset containing the simulated datasets: FD <- funData(t, rbind( x1, x2 ) ) output <- gmfd_kmeans( FD, n.cl = 2, metric = "mahalanobis", p = 10^6 )
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } s <- 0 for (k in 4:K) { s <- s + sqrt( rho[k] ) * theta[k, ] } m2 <- m1 + s # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) # Create a single functional dataset containing the simulated datasets: FD <- funData(t, rbind( x1, x2 ) ) output <- gmfd_kmeans( FD, n.cl = 2, metric = "mahalanobis", p = 10^6 )
Simulate a univariate functional sample using a Karhunen Loeve expansion.
gmfd_simulate(size, mean, covariance = NULL, rho = NULL, theta = NULL)
gmfd_simulate(size, mean, covariance = NULL, rho = NULL, theta = NULL)
size |
a positive integer indicating the size of the functional sample to simulate. |
mean |
a vector representing the mean of the sample. |
covariance |
a matrix from which the eigenvalues and eigenfunctions must be extracted. |
rho |
a vector of the eigenvalues in descending order to be used for the simulation. |
theta |
a matrix containing the eigenfunctions in its columns to be used for the simulation. |
The function returns a functional data object of type funData
.
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation # with the Karhunen - Loève expansion m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K ) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x <- gmfd_simulate( n, m1, rho = rho, theta = theta )
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation # with the Karhunen - Loève expansion m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K ) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x <- gmfd_simulate( n, m1, rho = rho, theta = theta )
Performs a two sample hypotesis tests on two samples of functional data.
gmfd_test(FD1, FD2, conf.level = 0.95, stat_test, p = NULL, k_trunc = NULL)
gmfd_test(FD1, FD2, conf.level = 0.95, stat_test, p = NULL, k_trunc = NULL)
FD1 |
a functional data object of type |
FD2 |
a functional data object of type |
conf.level |
confidence level of the test. |
stat_test |
the chosen test statistic to be used: |
p |
a vector of positive numeric value containing the parameters of the regularizing function for the generalized Mahalanobis distance. |
k_trunc |
a positive numeric value representing the number of components at which the truncated mahalanobis distance must be truncated |
The function returns a list with the following components:
statistic
the value of the test statistic.
quantile
the value of the quantile.
p.value
the p-value for the test.
Ghiglietti A., Ieva F., Paganoni A. M. (2017). Statistical inference for stochastic processes: Two-sample hypothesis tests, Journal of Statistical Planning and Inference, 180:49-68.
Ghiglietti A., Paganoni A. M. (2017). Exact tests for the means of gaussian stochastic processes. Statics & Probability Letters, 131:102–107.
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } s <- 0 for ( k in 4:K ) { s <- s + sqrt( rho[k] ) * theta[k,] } m2 <- m1 + 0.1 * s # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) FD1 <- funData( t, x1 ) FD2 <- funData( t, x2 ) output <- gmfd_test( FD1, FD2, 0.95, "mahalanobis", p = 10^5 )
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } s <- 0 for ( k in 4:K ) { s <- s + sqrt( rho[k] ) * theta[k,] } m2 <- m1 + 0.1 * s # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) FD1 <- funData( t, x1 ) FD2 <- funData( t, x2 ) output <- gmfd_test( FD1, FD2, 0.95, "mahalanobis", p = 10^5 )
funData
objectsThis function performs the plot of a functional dataset stored in
an object of class funData
.
## S3 method for class 'funData' plot(x, ...)
## S3 method for class 'funData' plot(x, ...)
x |
the univariate functional dataset in form of |
... |
additional graphical parameters to be used in plotting functions |
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) m2 <- t * ( 1 - t )^2 rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) FD <- funData( t, list( x1, x2 ) ) plot(FD)
# Define parameters n <- 50 P <- 100 K <- 150 # Grid of the functional dataset t <- seq( 0, 1, length.out = P ) # Define the means and the parameters to use in the simulation m1 <- t^2 * ( 1 - t ) m2 <- t * ( 1 - t )^2 rho <- rep( 0, K ) theta <- matrix( 0, K, P ) for ( k in 1:K) { rho[k] <- 1 / ( k + 1 )^2 if ( k%%2 == 0 ) theta[k, ] <- sqrt( 2 ) * sin( k * pi * t ) else if ( k%%2 != 0 && k != 1 ) theta[k, ] <- sqrt( 2 ) * cos( ( k - 1 ) * pi * t ) else theta[k, ] <- rep( 1, P ) } # Simulate the functional data x1 <- gmfd_simulate( n, m1, rho = rho, theta = theta ) x2 <- gmfd_simulate( n, m2, rho = rho, theta = theta ) FD <- funData( t, list( x1, x2 ) ) plot(FD)