Introduction
The theory of dynamical systems is a mathematical discipline that intersects closely with research areas such as mechanics, chaos theory, and time series analysis. It investigates physical systems from a mathematical point of view and builds abstract structures for studying and predicting their future states \cite{Arnold1998dynmical,Benner2015Survey,Broomhead1989Time}. It is also applied to more specific problems, such as continuum dynamics in fluid mechanics \cite{Brunton2020Machine}, human activity recognition \cite{motrenko2015extracting,ignatov2016human,grabovoy2020quasi}, economic modeling \cite{gandolfo1971economic}, and others.
Traditional methods for system reconstruction are based on physical principles, conservation laws, and empirical modeling \cite{Kevrekidis2003Equation,Sugihara2012Detecting,Ye2015Equation}. An alternative is to reconstruct the equations and the dynamical system solely from a fixed amount of time series data obtained in an experiment, using previous values in time. This approach is called time delay embedding (TDE) and was first described in \cite{Packard1980}. It maps a scalar value of a time series at a given time to a vector representation in a so-called phase space. By Takens's theorem \cite{Takens1981Dynamical}, time delay embedding reconstructs the unknown dynamical system up to a diffeomorphism: the reconstructed attractor is equivalent to the original one, although its geometric shape need not match.
The delay embedding method itself has many variants, such as uniform delay embedding, the method of characteristic lengths \cite{Cellucci2003Comparative}, autocorrelation and minimum mutual information \cite{Bradley2015Chaos}, empirical non-uniform methods, reduced autoregressive models \cite{JUDD1998273}, and topological methods \cite{Tan_2023}. Combining TDE vectorization with machine learning techniques yields various approaches to the discovery of dynamical systems \cite{Crutchfield1987EquationsOM}. These include nonlinear regression \cite{Voss1999Amplitude}, artificial neural networks \cite{GONZALEZGARCIA1998S965}, normal form identification \cite{Majda2009normal}, nonlinear spectral analysis \cite{Giannakis2012Nonlinear}, modeling of emergent behavior \cite{Roberts2013Model}, and automated refinement and inference of dynamics \cite{Schmidt2011Automated,Daniels2015Automated}.
The previously mentioned methods have two disadvantages: the first is the large dimension of the initial phase space, and the second is the loss of cross information in the case of multidimensional time series. First, as the dimension of the phase space increases, the distances between the points of the trajectory tend to a constant value. This makes distances uninformative and unstable due to the curse of dimensionality \cite{PowellWarrenB2011Adps}. It is therefore assumed that a more stable and robust model can be constructed in a low-dimensional subspace. The most common method for dimensionality reduction is principal component analysis (PCA) \cite{Broomhead1986Extracting}. Second, traditional methods and graph models lose higher-order or cross-component information because each component of a multivariate time series is used separately \cite{Wolf2016Advantages}. The papers \cite{KRUPPA20175610,Chen_2019,chen2022stabilitymultilineardynamicalsystems} study tensor forecasting models for time series with multilinear algebra approaches. These methods have extensions to multivariate time series that are closely related to autoencoders \cite{Lusch_2018}.
The key idea is to combine the classical dimensionality reduction technique with the tensor approach. On the one hand, the tensor as a multilinear map reduces the dimension of the initial phase spaces. On the other hand, it combines several time series in a simple form. This map prevents the loss of higher-order information between several time series and selects time series components that would be treated as noise in the single time series case. This method also preserves the diffeomorphism between dynamical systems. Thus, the proposed method is essentially a feature engineering method.
The key contributions of the paper are the application of the previously proposed dynamic system model and its extension to time delay embedding and the multivariate time series case. The computational experiment explores walking and jogging. It is performed on data obtained from a mobile device's accelerometer \cite{Malekzadeh2019}. The proposed method is tested on the accuracy of forecasting an unused time series from the dynamic system under study. The main conclusions about the accuracy and validity of the approach agree with the conclusions in \cite{chen2022stabilitymultilineardynamicalsystems,Chen2021Multilinear}.
The paper is organized into three main sections. In section 2, multilinear dynamical systems with time delay embedding are introduced. The multilinear map method can effectively reconstruct an attractor of the dynamical system. In section 4, a review of tensor preliminaries covers notation and various tensor products. In section 8, experimental results with numerical examples are presented. Section \ref{сonclusion} draws some conclusions and outlines plans for future work.
Multilinear dynamical system
Time delay embedding
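Time delay embedding augments the scalar time series $\mathbf{s} = [s_1, \dots, s_T]^{\mathsf{T}}$ into a higher dimension through the construction of the delay vector $\mathbf{s}_t$, given as
\begin{equation}
\mathbf{s}_t = [s_{t-(L-1)\tau}, \dots, s_{t-\tau}, s_t]^{\mathsf{T}}.
\end{equation}
The embedding parameters are the delay lag $\tau$ and the embedding dimension $L$. According to Takens's theorem, a single variable with time delays suffices to reconstruct a dynamical system. A periodic time series and its reconstructed attractor are shown in Figure (1). Stacking the delay vectors built from previous measurements gives the trajectory matrix. The trajectory matrix $\mathbf{S}$ of a time series $\mathbf{s}$ is defined as
\begin{equation}\label{eq_time_delay}
\mathbf{S} =
\begin{bmatrix}
s_1 & s_2 & \dots & s_L \\
s_2 & s_3 & \dots & s_{L+1} \\
\vdots & \vdots & \ddots & \vdots \\
s_{T-L+1} & s_{T-L+2} & \dots & s_T
\end{bmatrix},
\end{equation}
where $L$ is the width of the window, $T$ is the length of the time series, and $\tau$ is equal to 1.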
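The construction of the trajectory matrix can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation used in the experiments: the function name `trajectory_matrix` and the argument names `L` (window width) and `tau` (delay lag) are assumptions introduced here, and each row holds one delay vector ordered from the oldest to the newest sample.

```python
import numpy as np

def trajectory_matrix(s, L, tau=1):
    """Stack delay vectors [s_{t-(L-1)tau}, ..., s_{t-tau}, s_t] as rows."""
    s = np.asarray(s, dtype=float)
    span = (L - 1) * tau          # distance between the oldest and newest sample
    if span >= len(s):
        raise ValueError("time series is too short for this L and tau")
    n_rows = len(s) - span
    # Row r is the delay vector ending at sample r + span (0-based indexing).
    idx = np.arange(n_rows)[:, None] + tau * np.arange(L)[None, :]
    return s[idx]

s = np.arange(1, 8)               # toy series s_1..s_7 = 1..7
S = trajectory_matrix(s, L=3)
print(S.shape)                    # (5, 3)
print(S[0], S[-1])                # [1. 2. 3.] [5. 6. 7.]
```

With `tau=1` the result is a Hankel matrix: every anti-diagonal holds a single sample of the series.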
The original phase space from time delay embedding in Eq.(\ref{eq_time_delay}) has a high dimension. Thus, principal component analysis (PCA) is often used to reduce the dimensionality of the original phase space by transforming the initial set of variables into a smaller one that spans a subspace:
\begin{equation}
\mathbf{X} = \mathbf{S}\mathbf{W},
\end{equation}
where $\mathbf{X}$ is the low-dimensional representation of the trajectory matrix $\mathbf{S}$ and $\mathbf{W} \in \mathbb{R}^{L \times d}$ is the transformation matrix of the PCA algorithm. The number of selected components is $d$, corresponding to the $d$ largest eigenvalues.
A low-dimensional representation in phase space allows the use of simpler and more robust models and applications.
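A minimal sketch of this reduction step is given below, under the assumption that PCA is computed from the singular value decomposition of the column-centered trajectory matrix. The helper name `pca_project` and the synthetic sine series are hypothetical and serve only as an illustration.

```python
import numpy as np

def pca_project(S, d):
    """Project rows of S onto the d leading principal directions."""
    S_centered = S - S.mean(axis=0)
    # Right singular vectors of the centered matrix are the eigenvectors
    # of its covariance matrix, ordered by decreasing eigenvalue.
    _, sing, Vt = np.linalg.svd(S_centered, full_matrices=False)
    W = Vt[:d].T                       # (L, d) transformation matrix
    explained = sing[:d] ** 2 / np.sum(sing ** 2)
    return S_centered @ W, W, explained

t = np.arange(500)
s = np.sin(2 * np.pi * t / 50)         # synthetic periodic series
L = 20
S = np.stack([s[i:i + L] for i in range(len(s) - L + 1)])
X, W, explained = pca_project(S, d=2)
print(X.shape, W.shape)                # (481, 2) (20, 2)
```

For a noise-free sine the trajectory matrix has rank two, so two components already capture essentially all of the variance and the projected trajectory is a closed curve.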
Tensor representation of time series
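Tensors are multidimensional generalizations of matrices. The number of dimensions is the order of a tensor. Each dimension is called a mode. A vector $\mathbf{x} \in \mathbb{R}^{I_1}$ has one mode (rows), a matrix $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$ has two modes (rows and columns), and an $N$-th order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$ has $N$ modes.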
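The $n$-mode multiplication of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$ is defined by
\begin{equation}\label{eq_n_mode_matrix}
\mathcal{Y} = \mathcal{X} \times_n \mathbf{U}, \quad \mathcal{Y} \in \mathbb{R}^{I_1 \times \dots \times I_{n-1} \times J \times I_{n+1} \times \dots \times I_N},
\end{equation}
with the elements:
\begin{equation}
y_{i_1 \dots i_{n-1} j i_{n+1} \dots i_N} = \sum_{i_n = 1}^{I_n} x_{i_1 \dots i_N} u_{j i_n}.
\end{equation}
In the case of a tensor $\mathcal{X}$ and a vector $\mathbf{v} \in \mathbb{R}^{I_n}$, the $n$-mode multiplication gives a tensor of order $N-1$:
\begin{equation}\label{eq_n_mode_vector}
(\mathcal{X} \times_n \mathbf{v})_{i_1 \dots i_{n-1} i_{n+1} \dots i_N} = \sum_{i_n = 1}^{I_n} x_{i_1 \dots i_N} v_{i_n}.
\end{equation}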
Formally, the mode products for a matrix in Eq.(\ref{eq_n_mode_matrix}) and for a vector in Eq.(\ref{eq_n_mode_vector}) are the same operation, and in this paper, for simplicity, only the notation of Eq.(\ref{eq_n_mode_matrix}) is used for both. It is implied that in the case of a matrix the second mode is contracted, and in the case of a vector its single mode is contracted.
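The mode product can be illustrated with `numpy.tensordot`. The sketch below uses one function for both the matrix and the vector case, mirroring the shared notation; the function name `mode_n_product` is an assumption, and the mode index `n` is 0-based, unlike the 1-based convention of the text.

```python
import numpy as np

def mode_n_product(X, U, n):
    """n-mode product of tensor X with matrix or vector U (n is 0-based)."""
    U = np.asarray(U)
    if U.ndim == 1:
        # Vector case: contract mode n away, the order drops by one.
        return np.tensordot(X, U, axes=([n], [0]))
    # Matrix case: contract mode n of X with the second mode of U,
    # then move the new axis (size J) back to position n.
    Y = np.tensordot(X, U, axes=([n], [1]))
    return np.moveaxis(Y, -1, n)

X = np.arange(24, dtype=float).reshape(2, 3, 4)
U = np.ones((5, 3))
v = np.ones(3)
print(mode_n_product(X, U, 1).shape)   # (2, 5, 4)
print(mode_n_product(X, v, 1).shape)   # (2, 4)
```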
Multilinear dynamical system for multivariate time series
This paper discusses the topic of a dynamic system, which is given by
where
is a multilinear map, and
is the state variable. It is assumed that the tensor
has multilinear properties in the sense of the definition of algebraic multilinearity.
The vectors of the state variables are the values of some measured quantities at time $t$. It is assumed that these quantities completely describe the state of the dynamic system. In the case of a mathematical pendulum, these quantities are velocity and acceleration. With certain restrictions, it is possible to completely reconstruct the dynamics using only these variables.
This paper proposes to construct a map into a low-dimensional subspace, i.e. dimensionality reduction, instead of reconstructing the dynamics itself as some evolution rule of a system. The evolution rule is a function that describes what future states follow from the current state of the dynamical system.
This map is used in further models for anomaly detection, classification, and signal phase extraction (in the case of periodic time series). Thus, Eq.(\ref{eq_dynamic_law}) is modified as
where
is a vector from time series
with
delays,
is a vector with
that represent system in its phase space.
In the case of Eq.(\ref{eq_dim_red_model}), only a univariate time series is used. The model extends to the case of multivariate time series.
To simplify the theory and to clarify its connection with the computational experiment, let the multivariate time series consist of three types of measurements that come from a triaxial accelerometer. Let $\mathbf{s}_x, \mathbf{s}_y, \mathbf{s}_z$ be the time series of acceleration along the x, y, and z axes. A signal from each axis separately restores the attractor of the dynamic system according to Takens's theorem using time delay embedding as in Eq.(\ref{eq_time_delay}). There are maps between each variable
where
are trajectory matrices in initial phase space,
are the transformation matrices,
is an identity matrix. Thus, the multilinear model is modified as follows:
where
is modified dynamic tensor,
are state variable vectors from each axis at time $t$. Thus, in a shorter form, the equation is transformed into Eq.(\ref{eq_dim_red_model_final})
A graphical representation of the proposed method in Penrose notation is shown in Figure (2).
The tensor not only selects the main components, as PCA does for a univariate time series, but also filters them according to multilinear dependencies with the other time series. In this way, additional information from a set of time series makes it possible to select components that would not have been distinguished from noise, or would have been of lesser significance, in an independent analysis.
Alternative view on tensor representation
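The resulting mapping can alternatively be represented in classical linear algebra notation. Let the mapping function be
\begin{equation}
f\colon V_1 \times \dots \times V_m \to W,
\end{equation}
where $V_1, \dots, V_m$ and $W$ are vector spaces, $L = \dim V_i$ is the embedding dimension, and $d = \dim W$ is the dimension of the resulting space. There is a basis $\{\mathbf{e}_{i1}, \dots, \mathbf{e}_{iL}\}$ for each $V_i$ and a basis $\{\mathbf{b}_1, \dots, \mathbf{b}_d\}$ for $W$. Thus, the tensor $\mathcal{A}$ is a collection of scalar values $A^{k}_{j_1 \dots j_m}$ such that
\begin{equation}
f(\mathbf{e}_{1 j_1}, \dots, \mathbf{e}_{m j_m}) = A^{1}_{j_1 \dots j_m}\mathbf{b}_1 + \dots + A^{d}_{j_1 \dots j_m}\mathbf{b}_d.
\end{equation}
It determines the multilinear function $f$ for $\mathbf{v}_i = \sum_{j=1}^{L} v_{ij}\mathbf{e}_{ij}$, $1 \le i \le m$, as
\begin{equation}
f(\mathbf{v}_1, \dots, \mathbf{v}_m) = \sum_{j_1=1}^{L} \dots \sum_{j_m=1}^{L} \sum_{k=1}^{d} A^{k}_{j_1 \dots j_m} v_{1 j_1} \cdots v_{m j_m} \mathbf{b}_k.
\end{equation}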
The proposed method is essentially a feature engineering technique. It combines linear dimensionality reduction methods and the tensor approach with nonlinear aggregation. The tensor itself contains the weights of the models mapping the original phase spaces from signal sources.
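How a coefficient tensor defines a multilinear function can be sketched for the case of two input vectors, for example delay vectors from two axes. In this sketch the tensor `A` is filled with random numbers rather than fitted to data, and the names `bilinear_map`, `L`, and `d` are illustrative assumptions; the paper's procedure for estimating the tensor is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 6, 3                           # assumed embedding and output dimensions
A = rng.normal(size=(d, L, L))        # random coefficients, not fitted to data

def bilinear_map(A, v1, v2):
    # f(v1, v2)_k = sum_{j1, j2} A[k, j1, j2] * v1[j1] * v2[j2]
    return np.einsum("kab,a,b->k", A, v1, v2)

u, w, v = rng.normal(size=(3, L))
left = bilinear_map(A, 2.0 * u + w, v)
right = 2.0 * bilinear_map(A, u, v) + bilinear_map(A, w, v)
print(np.allclose(left, right))       # True: linear in the first argument
```

The map is linear in each argument separately but not in the concatenated input: scaling both vectors by 2 scales the output by 4, which is the nonlinear aggregation mentioned above.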
Dimension size problem
In TDE, the reconstruction of a dynamical system is possible if the number of delays is at least $2q+1$, with $q$ the dimension of the manifold on which the dynamical system is defined. In practice, it is not clear how the dimension $q$ should be estimated.
In the case of periodic or quasi-periodic time series with a non-chaotic structure, the system returns to the same state at certain moments. Thus, at times $t + kP$, $k = 1, 2, \dots$, where $P$ is the dominant period of the system, all points correspond to the same area in the phase space. If two points of the phase trajectory that belong to significantly different phases of the period fall into the same area of the phase space, this is called a self-intersection. In other words, in the absence of self-intersections, the nearest neighbors in phase space are also the nearest neighbors in time, up to a whole number of periods. An example of a self-intersection is shown in Figure (3).
In this way, it is possible to select the minimum dimension of a dynamic system based on two criteria:
1) the appearance of self-intersections,
2) the slowdown of the growth of the target metric with an increase in the dimension of the space.
However, the problem of choosing the dimension of the phase space is beyond the scope of the current work.
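Although dimension selection is not pursued further here, the first criterion can be illustrated with a rough diagnostic. The sketch below is an assumption made for illustration, not a procedure from this paper: for every point of the embedded trajectory it finds the nearest neighbor in phase space and counts the neighbor as a self-intersection if the two points are more than a quarter of a period apart in phase. The function names, the quarter-period threshold, the lag of 12 samples, and the synthetic sine with period 50.3 samples are all hypothetical choices.

```python
import numpy as np

def delay_embed(s, L, tau):
    idx = np.arange(len(s) - (L - 1) * tau)[:, None] + tau * np.arange(L)
    return s[idx]

def intersection_rate(s, L, tau, period):
    """Share of points whose phase-space nearest neighbor is far in phase."""
    X = delay_embed(s, L, tau)
    # Pairwise squared distances between phase-space points.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)           # a point is not its own neighbor
    nn = D.argmin(axis=1)
    dt = np.abs(nn - np.arange(len(X))) % period
    phase_gap = np.minimum(dt, period - dt)
    return np.mean(phase_gap > period / 4)

period = 50.3
s = np.sin(2 * np.pi * np.arange(400) / period)
rate_1d = intersection_rate(s, L=1, tau=12, period=period)
rate_2d = intersection_rate(s, L=2, tau=12, period=period)
print(rate_1d > rate_2d)                  # True
```

In one dimension the rising and falling branches of the sine overlap, so many nearest neighbors come from a different phase; with two delayed coordinates the trajectory opens into a closed curve and such neighbors disappear.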
Experimental results and discussion
The Lorenz system
This example uses the Lorenz attractor to analyze reconstructed phase spaces. A scheme of the experiment is shown in Figure (4).
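The variables under study are defined by the system of differential equations
\begin{equation}
\dot{x} = \sigma (y - x), \quad \dot{y} = x(\rho - z) - y, \quad \dot{z} = xy - \beta z,
\end{equation}
with the parameters $\sigma$, $\rho$, and $\beta$.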
The resulting phase trajectory has the form shown in Fig. 4, which presents the reconstruction scheme and various state spaces. For comparison, the attractor obtained by the time delay embedding method is also shown.
As shown in Figure (5), additional information in the multilinear model reconstructs the shape of the phase trajectory similarly to PCA. Both methods qualitatively restore the petals, maintaining the repeating dynamics in two different modes of the original attractor. This result is obtained due to the noise-free time series and a sufficient length of history in each method.
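The reconstruction pipeline for this example can be sketched as follows. The parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$, the initial state, the step size, and the window width are assumptions (the classical chaotic regime), not necessarily the settings used for the figures; the integrator is a plain fourth-order Runge-Kutta scheme, and only the $x$ coordinate is embedded.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Assumed classical parameter values; the paper's values may differ.
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate(rhs, state0, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration."""
    traj = np.empty((n_steps, len(state0)))
    state = np.asarray(state0, dtype=float)
    for i in range(n_steps):
        traj[i] = state
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

traj = simulate(lorenz_rhs, [1.0, 1.0, 1.0], dt=0.01, n_steps=5000)
x = traj[:, 0]                         # observe a single coordinate
L = 30                                 # assumed window width
S = np.stack([x[i:i + L] for i in range(len(x) - L + 1)])
Sc = S - S.mean(axis=0)
_, _, Vt = np.linalg.svd(Sc, full_matrices=False)
X = Sc @ Vt[:3].T                      # 3-D reconstructed phase trajectory
print(X.shape)                         # (4971, 3)
```

Plotting the columns of `X` against each other gives a delay-coordinate picture of the attractor that can be compared with the original trajectory in `traj`.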
Human movement dataset
The purpose of the computational experiment is to analyze the quality of attractor reconstruction and compare it with the PCA as a basic linear approach for real data.
The experiment is performed on data obtained from the accelerometer of a mobile device \cite{Malekzadeh2019}. This dataset includes time series data generated by accelerometer and gyroscope sensors. It was collected with an iPhone 6s kept in the participant's front pocket using SensingKit. All data were collected at a 50 Hz sampling rate. A total of 24 participants of various genders, ages, weights, and heights performed six activities in the same environment and conditions: downstairs, upstairs, walking, jogging, sitting, and standing. For this experiment, only walking and jogging are chosen.
\begin{table}[!htbp]
\centering
\caption{Average coefficient of determination (R2) and root mean square deviation (rMSE) between predicted and true value of Z axis for \textbf{walking} over 24 participants}
\label{tb_r2_mse_walking}
\scriptsize
\begin{tabular}{l|p{1cm}p{1cm}|p{1cm}p{1cm}|p{1cm}p{1cm}|p{1cm}p{1cm}}
\toprule
 & \multicolumn{2}{p{2cm}}{Only X axis} & \multicolumn{2}{p{2cm}}{Only Y axis} & \multicolumn{2}{p{2cm}}{Tensor X,Y} & \multicolumn{2}{p{2cm}}{RSS of X,Y} \\
Dim & R2 & rMSE & R2 & rMSE & R2 & rMSE & R2 & rMSE \\
\midrule
3 & 0.23 & 0.91 & 0.34 & 0.88 & 0.17 & 0.90 & 0.30 & 0.90 \\
7 & 0.40 & 0.79 & 0.46 & 0.80 & 0.34 & 0.83 & 0.44 & 0.85 \\
15 & 0.59 & 0.69 & 0.60 & 0.68 & 0.60 & 0.67 & 0.57 & 0.73 \\
20 & 0.63 & 0.66 & 0.65 & 0.67 & 0.68 & 0.61 & 0.62 & 0.69 \\
25 & 0.66 & 0.60 & 0.68 & 0.66 & 0.73 & 0.54 & 0.65 & 0.68 \\
\bottomrule
\end{tabular}
\end{table}
The main idea is to restore the attractor of the system using four different methods and then to forecast a new, unseen component with a linear mapping; here, this component is the Z axis of the accelerometer.
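As a simple baseline model, PCA with a single time series is chosen. This choice allows a fair comparison that is not affected by the model architecture. In particular, a simple two-layer autoencoder with a large number of parameters also restores the attractor effectively. In our case, the baseline is PCA applied to the X and Y components separately,
\begin{equation}
\mathbf{X}_x = \mathbf{S}_x \mathbf{W}_x, \quad \mathbf{X}_y = \mathbf{S}_y \mathbf{W}_y,
\end{equation}
where $\mathbf{S}_x, \mathbf{S}_y$ are the trajectory matrices of the X and Y time series, $\mathbf{W}_x, \mathbf{W}_y$ are the transformation matrices of the PCA algorithm, and $\mathbf{X}_x, \mathbf{X}_y$ are the resulting low-dimensional representations.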
The alternative approach is to aggregate the initial X and Y time series as root sum squares (RSS) as
where
are the transformation matrices of the PCA algorithm,
are resulting low-dimensional representation,
is the element-wise power (known as the Hadamard power).
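One possible reading of the RSS baseline is sketched below. It assumes that the two series are aggregated element-wise before embedding and that PCA is then applied to the trajectory matrix of the aggregated series; the exact order of operations in the experiment may differ. The helper names and the synthetic signals are hypothetical.

```python
import numpy as np

def rss_series(sx, sy):
    """Element-wise root sum of squares of two equally long series."""
    sx, sy = np.asarray(sx, dtype=float), np.asarray(sy, dtype=float)
    return np.sqrt(sx ** 2 + sy ** 2)     # Hadamard powers 2 and 1/2

def embed_and_reduce(s, L, d):
    # Trajectory matrix with unit lag, followed by PCA via SVD.
    S = np.stack([s[i:i + L] for i in range(len(s) - L + 1)])
    Sc = S - S.mean(axis=0)
    _, _, Vt = np.linalg.svd(Sc, full_matrices=False)
    return Sc @ Vt[:d].T

t = np.arange(300)
sx = np.sin(2 * np.pi * t / 40)           # synthetic stand-ins for X and Y
sy = 0.5 * np.cos(2 * np.pi * t / 40)
X_rss = embed_and_reduce(rss_series(sx, sy), L=15, d=3)
print(X_rss.shape)                        # (286, 3)
```

Because the aggregation discards the sign of each component, this baseline cannot distinguish movements that differ only in direction along an axis.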
For a correct comparison, the Z component is excluded from the model. Thus, the tensor approach is modified as follows
where only the X and Y components of the time series are used to restore the attractor of the system.
The inverse mapping of time series into an original space is made by using the multivariate regression model as
where
is model index,
are coefficient matrices.
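The regression step and the two reported metrics can be sketched with ordinary least squares. This is a minimal sketch on synthetic data, assuming a linear model with an intercept from the low-dimensional representation to the Z values; the function names and the test data are hypothetical, and the preprocessing used in the experiment is not reproduced.

```python
import numpy as np

def fit_linear_map(X, z):
    """Least-squares coefficients (with intercept) mapping X to z."""
    X1 = np.column_stack([X, np.ones(len(X))])
    B, *_ = np.linalg.lstsq(X1, z, rcond=None)
    return B

def predict(X, B):
    return np.column_stack([X, np.ones(len(X))]) @ B

def r2_and_rmse(z_true, z_pred):
    resid = z_true - z_pred
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((z_true - z_true.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    return r2, rmse

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))             # stand-in for a reduced phase space
z = X @ np.array([1.0, -2.0, 0.5]) + 0.3  # exactly linear synthetic target
B = fit_linear_map(X, z)
r2, rmse = r2_and_rmse(z, predict(X, B))
print(r2 > 0.999, rmse < 1e-6)            # True True
```

On real accelerometer data the fit would be evaluated on samples not used for estimating the coefficients, which is not shown in this sketch.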
In this experiment, the results obtained with the four approaches are compared in terms of forecasting accuracy. All four cases have an equal number of time points and the same window length (i.e. $L$ in Eq.(\ref{eq_time_delay})). Data from all 24 participants in jogging and walking are included. Time series without any changes of activity type are selected, i.e. there are no stairs, climbs, or turns in the walking route.
\begin{table}[!htbp]
\centering
\caption{Average coefficient of determination (R2) and root mean square deviation (rMSE) between predicted and true value of Z axis for \textbf{jogging} over 24 participants}
\label{tb_r2_mse_jogging}
\scriptsize
\begin{tabular}{l|p{1cm}p{1cm}|p{1cm}p{1cm}|p{1cm}p{1cm}|p{1cm}p{1cm}}
\toprule
 & \multicolumn{2}{p{2cm}}{Only X axis} & \multicolumn{2}{p{2cm}}{Only Y axis} & \multicolumn{2}{p{2cm}}{Tensor X,Y} & \multicolumn{2}{p{2cm}}{RSS of X,Y} \\
Dim & R2 & rMSE & R2 & rMSE & R2 & rMSE & R2 & rMSE \\
\midrule
3 & 0.16 & 0.92 & 0.17 & 0.92 & 0.17 & 0.92 & 0.17 & 0.91 \\
7 & 0.28 & 0.90 & 0.27 & 0.90 & 0.28 & 0.85 & 0.29 & 0.89 \\
15 & 0.40 & 0.82 & 0.41 & 0.85 & 0.46 & 0.82 & 0.41 & 0.85 \\
20 & 0.44 & 0.81 & 0.45 & 0.82 & 0.51 & 0.78 & 0.45 & 0.83 \\
25 & 0.46 & 0.81 & 0.47 & 0.81 & 0.55 & 0.73 & 0.47 & 0.81 \\
\bottomrule
\end{tabular}
\end{table}
Figure (7) and Figure (8) show rMSE and R2 as functions of the attractor dimension for walking and jogging, respectively. Table (1) and Table (2) show the metric values averaged over all participants. The proposed method has metrics comparable to those of the classical approaches: R2 reaches 0.73 for walking and 0.55 for jogging at dimension 25. For large attractor dimensions (20 and 25), the tensor approach attains the highest R2 and the lowest rMSE among the compared methods, whereas for walking at small dimensions (3 and 7) its R2 is the lowest.
Thus, on several real time series it was shown that, in the case of a linear dependence, the proposed method yields more interpretable results and reduces the number of intersections. In the case of clearly nonlinear dependences, the result becomes complex and non-robust.
References:
Kevin Judd and Alistair Mees (1998) Embedding as a modeling problem. Physica D: Nonlinear Phenomena 120(3): 273-286 https://doi.org/10.1016/S0167-2789(98)00089-X
Bradley, Elizabeth and Kantz, Holger (2015) Nonlinear time-series analysis revisited. Chaos: An Interdisciplinary Journal of Nonlinear Science 25 https://doi.org/10.1063/1.4917289
Cellucci, C. and Albano, Alfonso and Rapp, Paul (2003) Comparative study of embedding methods. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 67: 066210 https://doi.org/10.1103/PhysRevE.67.066210
Broomhead, David and King, Gregory (1986) Extracting qualitative dynamics from experimental data. Physica D: Nonlinear Phenomena 20: 217-236 https://doi.org/10.1016/0167-2789(86)90031-X
Gandolfo, Giancarlo (1971) Economic dynamics: Methods and models. Elsevier, 16
Takens, Floris Detecting strange attractors in turbulence. In: Rand, David and Young, Lai-Sang (Eds.) Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics, 1981, 898, Springer-Verlag, 366--381, Berlin Heidelberg
Brunton, Steven and Noack, Bernd and Koumoutsakos, Petros (2020) Machine Learning for Fluid Mechanics. Annual Review of Fluid Mechanics 52 https://doi.org/10.1146/annurev-fluid-010719-060214
Tan, Eugene and Algar, Shannon and Corrêa, Débora and Small, Michael and Stemler, Thomas and Walker, David (2023) Selecting embedding delays: An overview of embedding techniques and a new method using persistent homology. Chaos: An Interdisciplinary Journal of Nonlinear Science 33(3) https://doi.org/10.1063/5.0137223
Lusch, Bethany and Kutz, J. Nathan and Brunton, Steven L. (2018) Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9(1) https://doi.org/10.1038/s41467-018-07210-0
Chen, Can and Surana, Amit and Bloch, Anthony and Rajapakse, Indika (2021) Multilinear Control Systems Theory. SIAM Journal on Control and Optimization 59: 749-776 https://doi.org/10.1137/19M1262589
Daniels, Bryan and Nemenman, Ilya (2015) Automated adaptive inference of phenomenological dynamical models. Nature Communications 6: 8133 https://doi.org/10.1038/ncomms9133
Schmidt, Michael and Vallabhajosyula, Ravishankar and Jenkins, Jerry and Hood, Jonathan and Soni, Abhishek and Wikswo, John and Lipson, Hod (2011) Automated refinement and inference of analytical models for metabolic networks. Physical Biology 8: 055011 https://doi.org/10.1088/1478-3975/8/5/055011
Roberts, A. (2023) Model Emergent Dynamics in Complex Systems. Society for Industrial and Applied Mathematics https://doi.org/10.1137/1.9781611973563
Giannakis, Dimitrios and Majda, Andrew (2012) Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proceedings of the National Academy of Sciences of the United States of America 109: 2222-7 https://doi.org/10.1073/pnas.1118984109
Majda, Andrew and Franzke, Christian and Crommelin, Daan (2009) Normal forms for reduced stochastic climate models. Proceedings of the National Academy of Sciences of the United States of America 106: 3649-53 https://doi.org/10.1073/pnas.0900173106
Ye, Hao and Beamish, R.J. and Glaser, Sarah and Grant, Sue and Hsieh, Chih-hao and Richards, Laura and Schnute, Jon and Sugihara, George (2015) Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proceedings of the National Academy of Sciences of the United States of America 112 https://doi.org/10.1073/pnas.1417063112
Sugihara, George and May, Robert and Ye, Hao and Hsieh, Chih-hao and Deyle, Ethan and Fogarty, Michael and Munch, Stephan (2012) Detecting Causality in Complex Ecosystems. Science (New York, N.Y.) 338 https://doi.org/10.1126/science.1227079
Voss, Henning and Kolodner, Paul and Abel, Markus and Kurths, Juergen (1999) Amplitude Equations from Spatiotemporal Binary-Fluid Convection Data. Physical Review Letters 83 https://doi.org/10.1103/PhysRevLett.83.3422
R. González-García and R. Rico-Martínez and I.G. Kevrekidis (1998) Identification of distributed parameter systems: A neural net based approach. Computers & Chemical Engineering 22: S965-S968 https://doi.org/10.1016/S0098-1354(98)00191-4, European Symposium on Computer Aided Process Engineering-8
Kevrekidis, Ioannis and Gear, C. and Hyman, James and Kevrekidis, Panagiotis and Runborg, Olof and Theodoropoulos, Constantinos (2003) Equation-Free, Coarse-Grained Multiscale Computation: Enabling Microscopic Simulators to Perform System-Level Analysis. Communications in Mathematical Sciences 1 https://doi.org/10.4310/CMS.2003.v1.n4.a5
James P. Crutchfield and Bruce S. McNamara (1987) Equations of Motion from a Data Series. Complex Syst. 1 https://api.semanticscholar.org/CorpusID:14493184
D. Broomhead and R. Jones (1989) Time-series analysis. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 423: 103-121 https://doi.org/10.1098/rspa.1989.0044
Benner, Peter and Gugercin, Serkan and Willcox, Karen (2015) A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems. SIAM Review 57(4): 483-531 https://doi.org/10.1137/130932715
L. Arnold (1998) Random Dynamical Systems. Springer Monographs in Mathematics, Springer Berlin, Heidelberg https://doi.org/10.1007/978-3-662-12878-7
Brunton, Steven L. and Proctor, Joshua L. and Kutz, J. Nathan (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15): 3932-3937 https://doi.org/10.1073/pnas.1517384113
Wolf, Michael M. and Klinvex, Alicia M. and Dunlavy, Daniel M. (2016) Advantages to modeling relational data using hypergraphs versus graphs. 2016 IEEE High Performance Extreme Computing Conference (HPEC): 1-7 https://doi.org/10.1109/HPEC.2016.7761624
Chen, Can (2022) On the Stability of Multilinear Dynamical Systems. arXiv:2105.01041 [math.OC] https://doi.org/10.48550/arXiv.2105.01041
Packard, N. H. and Crutchfield, J. P. and Farmer, J. D. and Shaw, R. S. (1980) Geometry from a Time Series. Phys. Rev. Lett. 45: 712-716 https://doi.org/10.1103/PhysRevLett.45.712
Kai Kruppa (2017) Comparison of Tensor Decomposition Methods for Simulation of Multilinear Time-Invariant Systems with the MTI Toolbox. IFAC-PapersOnLine 50(1): 5610-5615 https://doi.org/10.1016/j.ifacol.2017.08.1107, 20th IFAC World Congress
Chen, Can and Surana, Amit and Bloch, Anthony and Rajapakse, Indika (2019) Multilinear Time Invariant System Theory. Society for Industrial and Applied Mathematics: 118-125 https://doi.org/10.1137/1.9781611975758.18
Ignatov, Andrey D and Strijov, Vadim V (2016) Human activity recognition using quasiperiodic time series collected from a single tri-axial accelerometer. Multimedia Tools and Applications 75(12): 7257-7270 https://doi.org/10.1007/s11042-015-2643-0
Motrenko, Anastasia and Strijov, Vadim (2015) Extracting fundamental periods to segment biomedical signals. IEEE Journal of Biomedical and Health Informatics 20(6): 1466-1476 https://doi.org/10.1007/s11042-015-2643-0
Grabovoy, AV and Strijov, VV (2020) Quasi-Periodic Time Series Clustering for Human Activity Recognition. Lobachevskii Journal of Mathematics 41(3): 333-339 https://doi.org/10.1134/S1995080220030075
Usmanova, KR and Zhuravlev, Yu I and Rudakov, KV and Strijov, VV (2020) Approximation of Quasiperiodic Signal Phase Trajectory Using Directional Regression. Moscow University Computational Mathematics and Cybernetics 44(4): 196-202 https://doi.org/10.3103/S0278641920040068
Powell, Warren B (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, N.J https://doi.org/10.1002/9781118029176
Malekzadeh, Mohammad and Clegg, Richard G. and Cavallaro, Andrea and Haddadi, Hamed (2019) Mobile Sensor Data Anonymization. Proceedings of the International Conference on Internet of Things Design and Implementation (IoTDI '19), Montreal, Quebec, Canada: 49-58. ACM, New York, NY, USA https://doi.org/10.1145/3302505.3310068
Bengio, Yoshua and LeCun, Yann (2007) Scaling Learning Algorithms Towards AI. In: Large Scale Kernel Machines. MIT Press
Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation 18: 1527-1554
Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron (2016) Deep Learning. MIT Press