Fuzzy mathematics

Fuzzy mathematics

Fuzzy mathematics is a branch of mathematics that extends classical set theory and logic to model reasoning under uncertainty. Initiated by Lotfi Asker Zadeh in 1965 with the introduction of fuzzy sets, the field has since evolved to include fuzzy set theory, fuzzy logic, and various fuzzy analogues of traditional mathematic structures. Unlike classical mathematics, which usually relies on binary membership (an element either belongs to a set or it does not), fuzzy mathematics allows elements to partially belong to a set, with degrees of membership represented by values in the interval [0, 1]. This framework enables more flexible modeling of imprecise or vague concepts. Fuzzy mathematics has found applications in numerous domains, including control theory, artificial intelligence, decision theory, pattern recognition, and linguistics, where the modeling of gradations and uncertainty is essential. == Definition == A fuzzy subset A of a set X is defined by a function A: X → L, where L is typically the interval [0, 1]. This function is called the membership function of the fuzzy subset and assigns to each element x in X a degree of membership A(x) in the fuzzy set A. In classical set theory, a subset of X can be represented by an indicator function (also known as a characteristic function), which maps elements to either 0 or 1, indicating non-membership or full membership, respectively. Fuzzy subsets generalize this concept by allowing any real value between 0 and 1, thereby enabling partial membership. More generally, the codomain L of the membership function can be replaced with any complete lattice, resulting in the broader framework of L-fuzzy sets. == Fuzzification == The development of fuzzification in mathematics can be broadly divided into three historical stages: Initial, straightforward fuzzifications (1960s–1970s), Expansion of generalization techniques (1980s), Standardization, axiomatization, and L-fuzzification (1990s). Fuzzification generally involves extending classical mathematical concepts from binary (crisp) logic, where membership is determined by characteristic functions, to fuzzy logic, where membership is expressed by values in the interval [0, 1] via membership functions. Let A and B be fuzzy subsets of a set X. The fuzzy versions of set-theoretic operations are commonly defined as: ( A ∩ B ) ( x ) = min ( A ( x ) , B ( x ) ) {\displaystyle (A\cap B)(x)=\min(A(x),B(x))} ( A ∪ B ) ( x ) = max ( A ( x ) , B ( x ) ) {\displaystyle (A\cup B)(x)=\max(A(x),B(x))} for all x ∈ X {\displaystyle x\in X} . These operations can be generalized using t-norms and t-conorms, respectively. For example, the minimum operation can be replaced by multiplication: ( A ∩ B ) ( x ) = A ( x ) ⋅ B ( x ) {\displaystyle (A\cap B)(x)=A(x)\cdot B(x)} Fuzzification of algebraic structures often relies on generalizing the closure property. Let ∗ {\displaystyle } be a binary operation on X, and let A be a fuzzy subset of X. Then A is said to satisfy fuzzy closure if: A ( x ∗ y ) ≥ min ( A ( x ) , A ( y ) ) {\displaystyle A(xy)\geq \min(A(x),A(y))} for all x , y ∈ X {\displaystyle x,y\in X} . If ( G , ∗ ) {\displaystyle (G,)} is a group, then a fuzzy subset A of G is a fuzzy subgroup if: A ( x ∗ y − 1 ) ≥ min ( A ( x ) , A ( y − 1 ) ) {\displaystyle A(xy^{-1})\geq \min(A(x),A(y^{-1}))} for all x , y ∈ G {\displaystyle x,y\in G} . Similar generalizations apply to relational properties. For example, for example, for fuzzification of the transitivity property, a fuzzy relation R {\displaystyle R} on X {\displaystyle X} (i.e., a fuzzy subset of X × X {\displaystyle X\times X} ) is said to be fuzzy transitive if: R ( x , z ) ≥ min ( R ( x , y ) , R ( y , z ) ) {\displaystyle R(x,z)\geq \min(R(x,y),R(y,z))} for all x , y , z ∈ X {\displaystyle x,y,z\in X} . == Fuzzy analogues == Fuzzy subgroupoids and fuzzy subgroups were introduced in 1971 by A. Rosenfeld. Analogues of other mathematical subjects have been translated to fuzzy mathematics, such as fuzzy field theory and fuzzy Galois theory, fuzzy topology, fuzzy geometry, fuzzy orderings, and fuzzy graphs.

Automated machine learning

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready for deployment. AutoML was proposed as an artificial intelligence-based solution to the growing challenge of applying machine learning. The high degree of automation in AutoML aims to allow non-experts to make use of machine learning models and techniques without requiring them to become experts in machine learning. Automating the process of applying machine learning end-to-end additionally offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models. Common techniques used in AutoML include hyperparameter optimization, meta-learning and neural architecture search. == Comparison to the standard approach == In a typical machine learning application, practitioners have a set of input data points to be used for training. The raw data may not be in a form that all algorithms can be applied to. To make the data amenable for machine learning, an expert may have to apply appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods. After these steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their model. If deep learning is used, the architecture of the neural network must also be chosen manually by the machine learning expert. Each of these steps may be challenging, resulting in significant hurdles to using machine learning. AutoML aims to simplify these steps for non-experts, and to make it easier for them to use machine learning techniques correctly and effectively. AutoML plays an important role within the broader approach of automating data science, which also includes challenging tasks such as data engineering, data exploration and model interpretation and prediction. == Targets of automation == Automated machine learning can target various stages of the machine learning process. Steps to automate are: Data preparation and ingestion (from raw data and miscellaneous formats) Column type detection; e.g., Boolean, discrete numerical, continuous numerical, or text Column intent detection; e.g., target/label, stratification field, numerical feature, categorical text feature, or free text feature Task detection; e.g., binary classification, regression, clustering, or ranking Feature engineering Feature selection Feature extraction Meta-learning and transfer learning Detection and handling of skewed data and/or missing values Model selection - choosing which machine learning algorithm to use, often including multiple competing software implementations Ensembling - a form of consensus where using multiple models often gives better results than any single model Hyperparameter optimization of the learning algorithm and featurization Neural architecture search Pipeline selection under time, memory, and complexity constraints Selection of evaluation metrics and validation procedures Problem checking Leakage detection Misconfiguration detection Analysis of obtained results Creating user interfaces and visualizations == Challenges and Limitations == There are a number of key challenges being tackled around automated machine learning. A big issue surrounding the field is referred to as "development as a cottage industry". This phrase refers to the issue in machine learning where development relies on manual decisions and biases of experts. This is contrasted to the goal of machine learning which is to create systems that can learn and improve from their own usage and analysis of the data. Basically, it's the struggle between how much experts should get involved in the learning of the systems versus how much freedom they should be giving the machines. However, experts and developers must help create and guide these machines to prepare them for their own learning. To create this system, it requires labor intensive work with knowledge of machine learning algorithms and system design. Additionally, other challenges include meta-learning and computational resource allocation.

Structured kNN

Structured k-nearest neighbours (SkNN) is a machine learning algorithm that generalizes k-nearest neighbors (k-NN). k-NN supports binary classification, multiclass classification, and regression, whereas SkNN allows training of a classifier for general structured output. For instance, a data sample might be a natural language sentence, and the output could be an annotated parse tree. Training a classifier consists of showing many instances of ground truth sample-output pairs. After training, the SkNN model is able to predict the corresponding output for new, unseen sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree. == Training == As a training set, SkNN accepts sequences of elements with class labels. The type of element does not matter; the only requirement is a defined metric function that gives a distance between each pair of elements of a set. SkNN is based on idea of creating a graph, with each node representing a class label. There is an edge between a pair of nodes if there is a sequence of two elements in the training set with corresponding classes. The first step of SkNN training is the construction of such a graph from training sequences. There are two special nodes in the graph corresponding to sentence beginnings and ends: if a sequence starts with class C, the edge between node START and node C should be created. Like regular k-NN, the second part of SkNN training consists of storing the elements of a training sequence in a certain way. Each element of the training sequences is stored in the node related to the class of the previous element in the sequence. Every first element is stored in the START node. == Inference == Labelling input sequences by SkNN consists of finding the sequence of transitions in the graph, starting from node START. Each transition corresponds to a single element of the input sequence. As a result, the label of each element is determined as the target node label of the transition. The cost of the path is defined as the sum of all transitions, with the cost of transition from node A to node B being the distance from the current input sequence element to the nearest element of class B, stored in node A. Determining an optimal path may be performed using a modified Viterbi algorithm (where the sum of the distances is minimized, unlike the original algorithm which maximizes the product of probabilities).

Relationship square

In statistics, the relationship square is a graphical representation for use in the factorial analysis of a table individuals x variables. This representation completes classical representations provided by principal component analysis (PCA) or multiple correspondence analysis (MCA), namely those of individuals, of quantitative variables (correlation circle) and of the categories of qualitative variables (at the centroid of the individuals who possess them). It is especially important in factor analysis of mixed data (FAMD) and in multiple factor analysis (MFA). == Definition of relationship square in the MCA frame == The first interest of the relationship square is to represent the variables themselves, not their categories, which is all the more valuable as there are many variables. For this, we calculate for each qualitative variable j {\displaystyle j} and each factor F s {\displaystyle F_{s}} ( F s {\displaystyle F_{s}} , rank s {\displaystyle s} factor, is the vector of coordinates of the individuals along the axis of rank s {\displaystyle s} ; in PCA, F s {\displaystyle F_{s}} is called principal component of rank s {\displaystyle s} ), the square of the correlation ratio between the F s {\displaystyle F_{s}} and the variable j {\displaystyle j} , usually denoted : η 2 ( j , F s ) {\displaystyle \eta ^{2}(j,F_{s})} Thus, to each factorial plane, we can associate a representation of qualitative variables themselves. Their coordinates being between 0 and 1, the variables appear in the square having as vertices the points (0,0), ( 0,1), (1,0) and (1,1). == Example in MCA == Six individuals ( i 1 , … , i 6 ) {\displaystyle i_{1},\ldots ,i_{6})} are described by three variables ( q 1 , q 2 , q 3 ) {\displaystyle (q_{1},q_{2},q_{3})} having respectively 3, 2 and 3 categories. Example : the individual i 1 {\displaystyle i_{1}} possesses the category a {\displaystyle a} of q 1 {\displaystyle q_{1}} , d {\displaystyle d} of q 2 {\displaystyle q_{2}} and f {\displaystyle f} of q 3 {\displaystyle q_{3}} . Applied to these data, the MCA function included in the R Package FactoMineR provides to the classical graph in Figure 1. The relationship square (Figure 2) makes easier the reading of the classic factorial plane. It indicates that: The first factor is related to the three variables but especially q 3 {\displaystyle q_{3}} (which have a very high coordinate along the first axis) and then q 2 {\displaystyle q_{2}} . The second factor is related only to q 1 {\displaystyle q_{1}} and q 3 {\displaystyle q_{3}} (and not to q 2 {\displaystyle q_{2}} which has a coordinate along axis 2 equal to 0) and that in a strong and equal manner. All this is visible on the classic graphic but not so clearly. The role of the relationship square is first to assist in reading a conventional graphic. This is precious when the variables are numerous and possess numerous coordinates. == Extensions == This representation may be supplemented with those of quantitative variables, the coordinates of the latter being the square of correlation coefficients (and not of correlation ratios). Thus, the second advantage of the relationship square lies in the ability to represent simultaneously quantitative and qualitative variables. The relationship square can be constructed from any factorial analysis of a table individuals x variables. In particular, it is (or should be) used systematically: in multiple correspondences analysis (MCA); in principal components analysis (PCA) when there are many supplementary variables; in factor analysis of mixed data (FAMD). An extension of this graphic to groups of variables (how to represent a group of variables by a single point ?) is used in Multiple Factor Analysis (MFA) == History == The idea of representing the qualitative variables themselves by a point (and not the categories) is due to Brigitte Escofier. The graphic as it is used now has been introduced by Brigitte Escofier and Jérôme Pagès in the framework of multiple factor analysis == Conclusion == In MCA, the relationship square provides a synthetic view of the connections between mixed variables, all the more valuable as there are many variables having many categories. This representation iscan be useful in any factorial analysis when there are numerous mixed variables, active and/or supplementary.

Multifactor dimensionality reduction

Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify nonadditive interactions among discrete variables that influence a binary outcome and is considered a nonparametric and model-free alternative to traditional statistical methods such as logistic regression. The basis of the MDR method is a constructive induction or feature engineering algorithm that converts two or more variables or attributes to a single attribute. This process of constructing a new attribute changes the representation space of the data. The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data. == Illustrative example == Consider the following simple example using the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable. The table below represents a simple dataset where the relationship between the attributes (X1 and X2) and the class variable (Y) is defined by the XOR function such that Y = X1 XOR X2. Table 1 A machine learning algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2. An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling. The MDR algorithm would change the representation of the data (X1 and X2) in the following manner. MDR starts by selecting two attributes. In this simple example, X1 and X2 are selected. Each combination of values for X1 and X2 are examined and the number of times Y=1 and/or Y=0 is counted. In this simple example, Y=1 occurs zero times and Y=0 occurs once for the combination of X1=0 and X2=0. With MDR, the ratio of these counts is computed and compared to a fixed threshold. Here, the ratio of counts is 0/1 which is less than our fixed threshold of 1. Since 0/1 < 1 we encode a new attribute (Z) as a 0. When the ratio is greater than one we encode Z as a 1. This process is repeated for all unique combinations of values for X1 and X2. Table 2 illustrates our new transformation of the data. Table 2 The machine learning algorithm now has much less work to do to find a good predictive function. In fact, in this very simple example, the function Y = Z has a classification accuracy of 1. A nice feature of constructive induction methods such as MDR is the ability to use any data mining or machine learning method to analyze the new representation of the data. Decision trees, neural networks, or a naive Bayes classifier could be used in combination with measures of model quality such as balanced accuracy and mutual information. == Machine learning with MDR == As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm there is always concern about overfitting. That is, machine learning algorithms are good at finding patterns in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation. Models that describe random data typically don't generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. Permutation testing makes it possible to generate an empirical p-value for the result. Replication in independent data may also provide evidence for an MDR model but can be sensitive to difference in the data sets. These approaches have all been shown to be useful for choosing and evaluating MDR models. An important step in a machine learning exercise is interpretation. Several approaches have been used with MDR including entropy analysis and pathway analysis. Tips and approaches for using MDR to model gene-gene interactions have been reviewed. == Extensions to MDR == Numerous extensions to MDR have been introduced. These include family-based methods, fuzzy methods, covariate adjustment, odds ratios, risk scores, survival methods, robust methods, methods for quantitative traits, and many others. == Applications of MDR == MDR has mostly been applied to detecting gene-gene interactions or epistasis in genetic studies of common human diseases such as atrial fibrillation, autism, bladder cancer, breast cancer, cardiovascular disease, hypertension, obesity, pancreatic cancer, prostate cancer and tuberculosis. It has also been applied to other biomedical problems such as the genetic analysis of pharmacology outcomes. A central challenge is the scaling of MDR to big data such as that from genome-wide association studies (GWAS). Several approaches have been used. One approach is to filter the features prior to MDR analysis. This can be done using biological knowledge through tools such as BioFilter. It can also be done using computational tools such as ReliefF. Another approach is to use stochastic search algorithms such as genetic programming to explore the search space of feature combinations. Yet another approach is a brute-force search using high-performance computing. == Implementations == www.epistasis.org provides an open-source and freely-available MDR software package. An R package for MDR. An sklearn-compatible Python implementation. An R package for Model-Based MDR. MDR in Weka. Generalized MDR.

Quantum machine learning

Quantum machine learning (QML) is the study of quantum algorithms for machine learning. It often refers to quantum algorithms for machine learning tasks which analyze classical data, sometimes called quantum-enhanced machine learning. QML algorithms use qubits and quantum operations to try to improve the space and time complexity of classical machine learning algorithms. Hybrid QML methods involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device. These routines can be more complex in nature and executed faster on a quantum computer. Furthermore, quantum algorithms can be used to analyze quantum states instead of classical data. The term "quantum machine learning" is sometimes used to refer classical machine learning methods applied to data generated from quantum experiments (i.e. machine learning of quantum systems), such as learning the phase transitions of a quantum system or creating new quantum experiments. QML also extends to a branch of research that explores methodological and structural similarities between certain physical systems and learning systems, in particular neural networks. For example, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning and vice versa. Furthermore, researchers investigate more abstract notions of learning theory with respect to quantum information, sometimes referred to as "quantum learning theory". == Machine learning with quantum computers == Quantum-enhanced machine learning refers to quantum algorithms that solve tasks in machine learning, thereby improving and often expediting classical machine learning techniques. Such algorithms typically require one to encode the given classical data set into a quantum computer to make it accessible for quantum information processing. Subsequently, quantum information processing routines are applied and the result of the quantum computation is read out by measuring the quantum system. For example, the outcome of the measurement of a qubit reveals the result of a binary classification task. While many proposals of QML algorithms are still purely theoretical and require a full-scale universal quantum computer to be tested, others have been implemented on small-scale or special purpose quantum devices. === Quantum associative memories and quantum pattern recognition === Early work on quantum associative memories has been done by Dan Ventura and Tony Martinez and by Carlo A. Trugenberger in the late 1990s and early 2000s. Associative (or content-addressable) memories are able to recognize stored content on the basis of a similarity measure, while random access memories are accessed by the address of stored information and not its content. As such they must be able to retrieve both incomplete and corrupted patterns, the essential machine learning task of pattern recognition. Typical classical associative memories store p patterns in the O ( n 2 ) {\displaystyle O(n^{2})} interactions (synapses) of a real, symmetric energy matrix over a network of n artificial neurons. The encoding is such that the desired patterns are local minima of the energy functional and retrieval is done by minimizing the total energy, starting from an initial configuration. Unfortunately, classical associative memories are severely limited by the phenomenon of cross-talk. When too many patterns are stored, spurious memories appear which quickly proliferate, so that the energy landscape becomes disordered and no retrieval is anymore possible. The number of storable patterns is typically limited by a linear function of the number of neurons, p ≤ O ( n ) {\displaystyle p\leq O(n)} . Quantum associative memories (in their simplest realization) store patterns in a unitary matrix U acting on the Hilbert space of n qubits. Retrieval is realized by the unitary evolution of a fixed initial state to a quantum superposition of the desired patterns with probability distribution peaked on the most similar pattern to an input. By its very quantum nature, the retrieval process is thus probabilistic. Because quantum associative memories are free from cross-talk, however, spurious memories are never generated. Correspondingly, they have a superior capacity than classical ones. The number of parameters in the unitary matrix U is O ( p n ) {\displaystyle O(pn)} . One can thus have efficient, spurious-memory-free quantum associative memories for any polynomial number of patterns. If the matrix U is encoded as a unique operator (as opposed as to a sequence of gates as in the circuit model), e.g. by an optical interferometer, the retrieval becomes efficient even for an exponential number of patterns. === Linear algebra simulation with quantum amplitudes === A number of quantum algorithms for machine learning are based on the idea of amplitude encoding, that is, to associate the amplitudes of a quantum state with the inputs and outputs of computations. Since a state of n {\displaystyle n} qubits is described by 2 n {\displaystyle 2^{n}} complex amplitudes, this information encoding can allow for an exponentially compact representation. Intuitively, this corresponds to associating a discrete probability distribution over binary random variables with a classical vector. The goal of algorithms based on amplitude encoding is to formulate quantum algorithms whose resources grow polynomially in the number of qubits n {\displaystyle n} , which amounts to a logarithmic time complexity in the number of amplitudes and thereby the dimension of the input. Many QML algorithms in this category are based on variations of the quantum algorithm for linear systems of equations (colloquially called HHL, after the paper's authors) which, under specific conditions, performs a matrix inversion using an amount of physical resources growing only logarithmically in the dimensions of the matrix. One of these conditions is that a Hamiltonian which entry-wise corresponds to the matrix can be simulated efficiently, which is known to be possible if the matrix is sparse or low rank. For reference, any known classical algorithm for matrix inversion requires a number of operations that grows more than quadratically in the dimension of the matrix (e.g. O ( n 2.373 ) {\displaystyle O{\mathord {\left(n^{2.373}\right)}}} ), but they are not restricted to sparse matrices. Quantum matrix inversion can be applied to machine learning methods in which the training reduces to solving a linear system of equations, for example in least-squares linear regression, the least-squares version of support vector machines, and Gaussian processes. A crucial bottleneck of methods that simulate linear algebra computations with the amplitudes of quantum states is state preparation, which often requires one to initialise a quantum system in a state whose amplitudes reflect the features of the entire dataset. Although efficient methods for state preparation are known for specific cases, this step easily hides the complexity of the task. === Variational quantum algorithms (VQAs) === In a variational quantum algorithm, a classical computer optimizes the parameters used to prepare a quantum state, while a quantum computer is used to do the actual state preparation and measurement. VQAs are considered promising candidates for noisy intermediate-scale quantum computers. Variational quantum circuits (or parameterized quantum circuits) are a popular class of VQAs where the parameters are those used in a fixed quantum circuit. Researchers have studied VQCs to solve optimization problems and find the ground state energy of complex quantum systems, which were difficult to solve using a classical computer. === Quantum binary classifier === Pattern reorganization is one of the important tasks of machine learning, binary classification is one of the tools or algorithms to find patterns. Binary classification is used in supervised learning and in unsupervised learning. In QML, classical bits are converted to qubits and they are mapped to Hilbert space; complex value data are used in a quantum binary classifier to use the advantage of Hilbert space. By exploiting the quantum mechanic properties such as superposition, entanglement, interference the quantum binary classifier produces the accurate result in short period of time. === Quantum machine learning algorithms based on Grover search === Another approach to improving classical machine learning with quantum information processing uses amplitude amplification methods based on Grover's search algorithm, which has been shown to solve unstructured search problems with a quadratic speedup compared to classical algorithms. These quantum routines can be employed for learning algorithms that translate into an unstructured search task, as can be done, for instance, in the case of the k-medians and the k-nearest neighbors algorithms. Other applications include quadratic speedups in the training of perceptrons. An e

Artificial development

Artificial development, also known as artificial embryogeny or machine intelligence or computational development, is an area of computer science and engineering concerned with computational models motivated by genotype–phenotype mappings in biological systems. Artificial development is often considered a sub-field of evolutionary computation, although the principles of artificial development have also been used within stand-alone computational models. Within evolutionary computation, the need for artificial development techniques was motivated by the perceived lack of scalability and evolvability of direct solution encodings (Tufte, 2008). Artificial development entails indirect solution encoding. Rather than describing a solution directly, an indirect encoding describes (either explicitly or implicitly) the process by which a solution is constructed. Often, but not always, these indirect encodings are based upon biological principles of development such as morphogen gradients, cell division and cellular differentiation (e.g. Doursat 2008), gene regulatory networks (e.g. Guo et al., 2009), degeneracy (Whitacre et al., 2010), grammatical evolution (de Salabert et al., 2006), or analogous computational processes such as re-writing, iteration, and time. The influences of interaction with the environment, spatiality and physical constraints on differentiated multi-cellular development have been investigated more recently (e.g. Knabe et al. 2008). Artificial development approaches have been applied to a number of computational and design problems, including electronic circuit design (Miller and Banzhaf 2003), robotic controllers (e.g. Taylor 2004), and the design of physical structures (e.g. Hornby 2004).