
A KMeans model object is instantiated, fixing the number of clusters (see VanderPlas, ch. 5, on the KMeans algorithm). The model is fitted to the features data. The predictions are then generated given the fitted model. The predictions are numbers from 0 to 3, each representing one cluster, as the figure on unsupervised learning of clusters shows. Once an algorithm such as KMeans is trained, it can, for instance, predict the cluster for a new, not-yet-seen combination of feature values.
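The workflow just described can be sketched with scikit-learn; the synthetic data and the specific parameter values are illustrative assumptions, not the book's exact code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic features data with four "true" clusters (an assumption).
X, _ = make_blobs(n_samples=100, centers=4, random_state=0)

model = KMeans(n_clusters=4, n_init=10, random_state=0)  # fix the number of clusters
model.fit(X)                 # fit the model to the features data
preds = model.predict(X)     # cluster labels, numbers from 0 to 3

# A new, not-yet-seen feature combination can also be assigned a cluster.
new_point = np.array([[0.0, 0.0]])
new_cluster = model.predict(new_point)
```

The same fitted model object thus serves both to label the training data and to classify new observations.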

The coin tossing game is heavily biased, to emphasize the benefits of learning compared to an uninformed baseline algorithm. An action is randomly chosen from the action space. A state is randomly chosen from the state space. The total reward tr is increased by one if the bet is correct. The game is played for a number of epochs; each epoch consists of a fixed number of bets. The average total reward over the epochs played is then calculated. Reinforcement learning tries to learn from what is observed after an action is taken, usually based on a reward.

To keep things simple, the following learning algorithm only keeps track of the states observed in each round, insofar as they are appended to the action space list object. In this way, the algorithm learns the bias in the game, though maybe not perfectly. By randomly sampling from the updated action space, the bias is reflected, because heads will naturally be bet more often. The following section provides examples for both types of tasks.

Types of Approaches

Some more definitions might be in order before finishing this section.
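A minimal sketch of both agents, the uninformed baseline and the learning one, might look as follows; the 80/20 bias, the seed, and the epoch and bet counts are assumptions for illustration:

```python
import random
random.seed(100)

ssp = ['h'] * 8 + ['t'] * 2  # biased state space (assumption: 80% heads)
asp = ['h', 't']             # initial action space for the betting agent

def play(epochs, bets, learning):
    rewards = []
    for _ in range(epochs):
        action_space = list(asp)
        tr = 0
        for _ in range(bets):
            a = random.choice(action_space)  # the bet
            s = random.choice(ssp)           # the observed state
            if a == s:
                tr += 1                      # correct bet earns 1
            if learning:
                action_space.append(s)       # remember observed states
        rewards.append(tr)
    return sum(rewards) / len(rewards)

baseline = play(15, 100, learning=False)  # uninformed agent, around 50
learner = play(15, 100, learning=True)    # learns the bias, beats the baseline
```

Because observed states dominate the updated action space over time, the learning agent's bets converge toward the bias of the coin.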

Machine learning (ML)

ML is the discipline of learning relationships and other information about given data sets, based on an algorithm and a measure of success. A measure of success might, for example, be the mean-squared error (MSE), given the labels data (the values to be estimated) and the values predicted by the algorithm.
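As a one-line sketch, the MSE as a measure of success can be computed directly; the numbers are made up for illustration:

```python
import numpy as np

# Labels data (values to be estimated) and an algorithm's predictions
# (hypothetical numbers).
y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

mse = np.mean((y - y_pred) ** 2)  # mean-squared error, 0.025 here
```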

ML is a subset of AI. DL is a subset of ML and is therefore also a subset of AI. DL has proven useful for a number of broad problem areas. It is suited for estimation and classification tasks, as well as for RL. In many cases, DL-based approaches perform better than alternative algorithms, such as logistic regression or kernel-based methods like support vector machines.

More details appear in later chapters, particularly in Part III.

Neural Networks

The previous sections provide a broader overview of algorithms in AI. This section shows how neural networks fit in. A simple example will illustrate what characterizes neural networks in comparison to traditional statistical methods, such as ordinary least-squares (OLS) regression.

The approach taken here is a supervised learning approach, where the task is to estimate labels data based on features data. This section also illustrates the use of neural networks in the context of classification problems. Loosely speaking, a regression function transforms a series of input values x1, x2, … into output values. Assume that such input values and output values are given; they represent the sample data. The problem in statistical regression is to find a function that approximates, as well as possible, the functional relationship between the input values (also called the independent values) and the output values (also called the dependent values).

The linear regression approach does not work too well here in approximating the functional relationship. In addition to the constant and linear terms, higher-order monomials, for instance, can easily be added as basis functions. To this end, compare the regression results shown in the figure with the code that creates it. For basis functions up to and including the cubic monomial, the estimation is perfect, and the functional relationship is perfectly recovered.
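The idea of adding monomial basis functions can be sketched as follows; the cubic sample relationship is a hypothetical stand-in for the book's data:

```python
import numpy as np

# Hypothetical sample data from a cubic relationship.
x = np.linspace(-2 * np.pi, 2 * np.pi, 25)
y = x ** 3 + 2 * x

# OLS regression with monomial basis functions of increasing order.
for deg in [1, 3]:
    reg = np.polyfit(x, y, deg=deg)  # estimate the optimal coefficients
    pred = np.polyval(reg, x)        # evaluate the fitted polynomial
    mse = np.mean((y - pred) ** 2)
# After the loop, mse holds the deg=3 result; the cubic basis recovers
# the relationship (numerically) perfectly, while deg=1 does not.
```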

Estimation with Neural Networks

However, not all relationships are of this kind. Without going into the details, neural networks can approximate a wide range of functional relationships, and knowledge of the form of the relationship is generally not required. They are already quite good for the simple configuration used here.
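A hedged sketch of such an estimation, using scikit-learn's MLPRegressor rather than the book's exact code; the sine relationship and the network parameters are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# A nonlinear sample relationship (a hypothetical stand-in).
x = np.linspace(-2, 2, 200).reshape(-1, 1)
y = np.sin(3 * x).ravel()

# The network approximates the relationship without being told its form.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000,
                     random_state=0)
model.fit(x, y)
pred = model.predict(x)
mse = np.mean((y - pred) ** 2)  # in-sample approximation error
```

No basis functions are specified anywhere; the hidden layers learn the shape of the relationship from the data alone.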

This in turn means that a set of parameters, the weights within the neural network, is first initialized randomly and then adjusted gradually, given the differences between the neural network output and the sample output values. This approach lets you retrain (update) a neural network incrementally. For more background, see Goodfellow et al. The figure shows the sample data and the neural network-based estimations.

Keras

The next example uses a sequential model with the Keras deep learning package.

The procedure is repeated for five rounds. After every such round, the approximation by the neural network is updated and plotted. Figure shows how the approximation gradually improves with every round. This is also reflected in the decreasing MSE values.

A more comprehensive answer might need to come later in this book, but a somewhat different example might give some hint. Consider, instead of the previous sample data set, which was generated from a well-defined mathematical function, a random sample data set, for which both features and labels are randomly chosen. Of course, this example is for illustration and does not allow for a deep interpretation. The figure visualizes the results.

Even for the highest number of monomials in the example, the estimation results are still not too good, and the MSE value is accordingly relatively high. OLS regression in this case assumes that the approximation can be achieved through an appropriate combination of a finite number of basis functions. Since the sample data set has been generated randomly, the OLS regression does not perform well here.

The figure shows the random sample data and the OLS regression lines. What about neural networks? The application is as straightforward as before and yields estimations as shown in the figure, alongside the network architecture and the number of trainable parameters.

Classification with Neural Networks

Another benefit of neural networks is that they can easily be used for classification tasks as well. Consider the following Python code that implements a classification using a neural network based on Keras.

The binary features data and labels data are generated randomly. The major modeling adjustment is to change the activation function of the output layer from linear to sigmoid. More details on this appear in later chapters. The classification is not perfect; however, it reaches a high level of accuracy. How the accuracy, expressed as the ratio of correct results to all label values, changes with the number of training epochs is shown in the figure.
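As a lightweight stand-in for the Keras-based classification, the following sketch uses scikit-learn's MLPClassifier so it stays self-contained; the sample sizes and network parameters are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

np.random.seed(100)
# Random binary features data and labels data (an assumption: 25 samples,
# 10 binary features).
f = np.random.randint(0, 2, (25, 10))
l = np.random.randint(0, 2, 25)

# Binary classification; a large hidden configuration lets the network
# memorize the small random sample (in-sample only).
model = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=5000,
                      random_state=0)
model.fit(f, l)
acc = model.score(f, l)  # in-sample accuracy, typically close to 1.0
```

The high in-sample accuracy reflects memorization of random patterns, not predictive power, which is exactly the point of the discussion that follows.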

The figure shows the classification accuracy and loss against the number of epochs. The loss function calculates the prediction error of the neural network (or other ML algorithms). Binary cross entropy is an appropriate loss function for binary classification problems, while the mean-squared error (MSE), for example, is appropriate for estimation problems. Statistical methods, such as OLS regression, might perform well for a smaller set of problems, but not too well, or not at all, for others.

Incremental learning
The optimal weights within a neural network, given a target measure of success, are learned incrementally, based on a random initialization and incremental improvements.

Universal approximation
There are strong mathematical theorems showing that neural networks, even with only one hidden layer, can approximate almost any function. Chapter 2 discusses more good reasons.

Neural Networks

Neural networks are good at learning relationships between input and output data.

They can be applied to a number of problem types, such as estimation in the presence of complex relationships or classification, for which traditional statistical methods are not well suited. The neural network with one hidden layer reaches a high degree of accuracy on the given data set, or in-sample. However, what about the predictive power of a neural network? This hinges significantly on the volume and variety of the data available to train the neural network.

Most algorithms used in AI are about pattern recognition. Given that the labels data is also binary, the algorithm tries to learn whether a 0 or 1 is more likely given a certain pattern, say [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]. Because all numbers are randomly chosen with equal probability, there is not that much to learn beyond the fact that the labels 0 and 1 are equally likely no matter what random pattern is observed. First, not all patterns are in the sample data set.

Second, the sample size is much too small per observed pattern. What about the ability of a neural network to learn about the relationships within a given data set? The ability is pretty high, as the in-sample accuracy score shows. But what about the predictive power of a trained neural network? To this end, the given data set can be split into a training and a test data sub-set.

The model is trained on the training data sub-set only and then tested with regard to its predictive power on the test data set. As before, the accuracy of the trained neural network is pretty high in-sample, that is, on the training data set. The problems are not really relevant in the context of learning relationships in-sample; on the contrary, the smaller a data set is, the more easily in-sample relationships can be learned in general.
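The train-test split just described might be sketched as follows; random binary data, scikit-learn instead of Keras, and all parameters are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

np.random.seed(100)
# Random binary features data and labels data (an assumption).
f = np.random.randint(0, 2, (1000, 10))
l = np.random.randint(0, 2, 1000)

# Split into a training and a test data sub-set.
f_tr, f_te, l_tr, l_te = train_test_split(f, l, test_size=0.33,
                                          random_state=0)

model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=1000,
                      random_state=0)
model.fit(f_tr, l_tr)

acc_in = model.score(f_tr, l_tr)   # in-sample accuracy
acc_out = model.score(f_te, l_te)  # out-of-sample (predictive) accuracy
# With purely random labels, acc_out hovers around 0.5.
```

The gap between the two accuracy values makes the in-sample versus out-of-sample distinction concrete.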

Larger Data Set

Fortunately, there is often a clear way out of this problematic situation: a larger data set. Faced with real-world problems, this insight is equally correct in principle. From a practical point of view, though, such larger data sets are not always available, nor can they often be generated so easily. However, in the context of the example of this section, a larger data set is indeed easily created. The following Python code increases the number of samples in the initial sample data set significantly.

First, all possible patterns are now represented in the data set. In other words, the neural network sees basically all the patterns multiple times. This difference can be considered huge, given that AI practitioners and companies often fight for improvements as small as a tenth of a percentage point.

Big Data

What is the difference between a larger data set and a big data set? The term big data has been used for more than a decade now to mean a number of things. The larger data set used before is still small in practical terms, but it is large enough to accomplish the specified goal. The required volume and variety of the data set are mainly driven by the structure and characteristics of the features and labels data.

Given in-house data, the responsible data scientist designs 25 categorical features, each of which can take on 8 different values. This is due to a number of reasons. To name only a few: first, not every pattern will be relevant in practice; some patterns might simply not exist, might be impossible, and so forth.

Second, not all features might be equally important, reducing the number of relevant features and thereby the number of possible patterns. Third, a value of 4 or 5 for feature number 7, say, might not make a difference at all, further reducing the number of relevant patterns. With regard to algorithms, neural networks and deep learning approaches are at the core.
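The combinatorics behind the hypothetical 25-features example can be checked directly:

```python
# Number of possible feature patterns for 25 categorical features,
# each taking one of 8 values (the hypothetical in-house data set).
n_features = 25
n_values = 8
n_patterns = n_values ** n_features
# 8 ** 25 is on the order of 10 ** 22 -- far more patterns than any
# realistic data set could ever cover.
```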

The central theme of this book is the application of neural networks to one of the core problems in finance: the prediction of future market movements. The prediction of the future market direction (that is, whether a target level or price goes up or down) is a problem that can easily be cast into a classification setting.

Before diving deeper into the core theme itself, the next chapter first discusses selected topics related to what is called superintelligence and technological singularity. That discussion will provide useful background for the chapters that follow, which focus on finance and the application of AI to the financial domain.

In this context, the next chapter discusses the importance of hardware for AI.

References

Books and papers cited in this chapter:

Alpaydin, Ethem. Cambridge: MIT Press.
Chollet, Francois. Deep Learning with Python.
Kratsios, Anastasis.
Shanahan, Murray. The Technological Singularity. Cambridge: MIT Press.
Tegmark, Max. Life 3.0. United Kingdom: Penguin Random House.
VanderPlas, Jake. Python Data Science Handbook.

If one path turns out to be blocked, we can still progress.

Shortly after, the human era will be ended.

For the purposes of this chapter and book, technological singularity refers to a point in time at which certain machines achieve superhuman intelligence, or superintelligence; this is mostly in line with the original idea of Vinge. The idea and concept were further popularized by the widely read and cited book by Kurzweil. Barrat has a wealth of historical and anecdotal information around the topic.

The expression technological singularity itself has its origin in the concept of a singularity in physics. It refers to the center of a black hole, where mass is highly concentrated, gravitation becomes infinite, and traditional laws of physics break down. Although the general ideas and concepts of the technological singularity and of superintelligence might not have an obvious and direct relationship to AI applied to finance, a better understanding of their background, related problems, and potential consequences is beneficial.

Those insights also help guide the discussion about how AI might reshape the financial industry in the near and long term. Among others, it covers how the company DeepMind solved the problem of playing Atari games with neural networks.

It also tells the story of how the same company solved the problem of playing the game of Go at above-human-expert level. The story of chess and computer programs is also recounted in that section. Success Stories Many ideas and algorithms in AI date back a few decades already.

Over these decades, there have been longer periods of hope on the one hand and despair on the other (see Bostrom). One reason for the renewed hope is recent successes in applying AI to domains and problems that even a few years ago seemed immune to AI dominance for decades to come. The list of such success stories is long and growing rapidly.

Therefore, this section focuses on three such stories only. Gerrish provides a broader selection and more detailed accounts of the individual cases.

Atari

This subsection first tells the success story of how DeepMind mastered playing Atari games with reinforcement learning and neural networks, and then illustrates the basic approach that led to its success with a concrete code example.

The story

The first success story is about playing Atari games at a superhuman level.

DeepMind published a paper (Mnih et al.) on the approach. The algorithm is a variant of Q-learning applied to a convolutional neural network. The original project focused on seven Atari games, and for three of them (Pong, Enduro, and Breakout) the DeepMind team reported above-human-expert performance of the AI agent.

From an AI point of view, it is remarkable not only that the DeepMind team achieved such a result, but also how it achieved it. Second, no human guidance or humanly labeled data was provided, just the interactive learning experience based on visual input properly transformed into features data.

In this game, the goal is to destroy lines of bricks at the top of the screen by using a paddle at the bottom of the screen, from which a ball bounces back and moves across the screen. Whenever the ball hits a brick, the brick is destroyed and the ball bounces back. The ball also bounces back from the left, right, and top walls. The player loses a life in this game whenever the ball reaches the bottom of the screen without being hit by the paddle. The state space is represented by frames of the game screen of size 210 × 160 pixels with a 128-color palette.

With regard to the action policy, the algorithm learns which action is best to take, given a certain game state, to maximize the game score (total reward).

An example

There is not enough room in this chapter to explore in detail the approach taken by DeepMind for Breakout and the other Atari games. Instead, the simpler CartPole environment serves as an illustration; its action space is similar to the Breakout action space. The state space consists of four physical data points: cart position, cart velocity, pole angle, and pole angular velocity (see the figure). If, after having taken an action, the pole is still in balance, the agent gets a reward of 1.

If the pole is out of balance, the game ends. The figure shows a graphical representation of the CartPole environment. The following code first instantiates a CartPole environment object, then inspects the action and state spaces, takes a random action, and captures the results. The AI agent moves on to the next round as long as the done variable is False. To increase the quality of the data set, however, only data from games that reach a certain minimum total reward is collected.
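Since the original code relies on the gym package, here is a self-contained, simplified CartPole-style sketch instead; the dynamics constants follow the classic cart-pole formulation, and the episode cap of 200 steps is an assumption:

```python
import math
import random
random.seed(0)

# Simplified CartPole-style dynamics (a stand-in, not gym's exact code).
GRAV, M_CART, M_POLE, LEN, FORCE, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def step(state, action):
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    total_m = M_CART + M_POLE
    temp = (force + M_POLE * LEN * theta_dot ** 2 * sin_t) / total_m
    theta_acc = (GRAV * sin_t - cos_t * temp) / (
        LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total_m))
    x_acc = temp - M_POLE * LEN * theta_acc * cos_t / total_m
    new_state = (x + DT * x_dot, x_dot + DT * x_acc,
                 theta + DT * theta_dot, theta_dot + DT * theta_acc)
    # Episode ends if the cart leaves the track or the pole tips too far.
    done = abs(new_state[0]) > 2.4 or abs(new_state[2]) > 0.2095
    return new_state, 1.0, done  # reward of 1 while the pole stays up

# One episode with a purely random action policy.
state = tuple(random.uniform(-0.05, 0.05) for _ in range(4))
total_reward, done = 0.0, False
while not done and total_reward < 200:  # assumed episode cap of 200
    action = random.choice([0, 1])      # random action from the action space
    state, reward, done = step(state, action)
    total_reward += reward
```

A random policy typically balances the pole for only a few dozen steps, which is exactly why episodes with high total rewards are worth collecting as training data.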

The results, including the length of each game, are collected in a DataFrame object, and the average total reward of all random games included in the data set is calculated. Equipped with the data, a neural network can be trained as follows: a neural network is set up for classification, and given that the data set only includes actions that have been successful for the given state, it learns what action to take (the label) given the state (the features).

The model is trained on the previously collected data. The metrics per training step are shown for the final few steps. The trained neural network, or AI agent, can then play the CartPole game given its learned best actions for any state it is presented with.

The AI agent achieves the maximum possible total reward for each of the games played. The task of learning to play Breakout, for example, is of course more involved, if only because the state space is much larger.

Go

The board game Go is more than 2,000 years old.

For example, Lee Sedol, who was the Go world champion for many years, holds the 9th dan. In Silver et al., recounting their early successes, the team points out in the introduction: [O]ur program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away. It is remarkable that this milestone was achieved just one year after a leading AI researcher, Nick Bostrom, predicted that it might take another decade to reach that level.

Many observers remarked, however, that beating the European Go champion of that time, Fan Hui, cannot really be considered a benchmark, since the world Go elite play at a much higher level. A wealth of background information is provided on the AlphaGo Korea web page, and there is even a movie available about the event.

The story of the competition and AlphaGo Lee is well documented and has drawn attention all over the world. This landmark achievement was a decade ahead of its time. The games earned AlphaGo a 9-dan professional ranking, the highest certification, and this was the first time a computer Go player had ever received the accolade. Up until that point, AlphaGo had used, among other resources, training data sets based on millions of human expert games for its supervised learning.

Silver et al. describe the approach as follows: This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. It is remarkable that a neural network trained not too dissimilarly to the CartPole example from the previous section (that is, based on self-play) can crack a game as complex as Go, whose possible board positions outnumber the atoms in the universe. It is also remarkable that the Go wisdom collected over centuries by human players is simply not necessary to achieve this milestone.

The DeepMind team did not stop there. AlphaZero was intended to be a general game-playing AI agent, able to learn different complex board games such as Go, chess, and shogi. With regard to AlphaZero, the team summarizes in Silver et al.: In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains.

Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case. Again, a remarkable milestone was reached by DeepMind: a game-playing AI agent that, after less than 24 hours of self-play and training, achieved above-human-expert levels in three intensely studied board games with centuries-long histories.

Chess

Chess is, of course, one of the most popular board games in the world. Chess-playing computer programs have been around since the very early days of computing, and in particular, home computing. For example, an almost complete chess engine called ZX Chess, which consisted of well under a kilobyte of machine code, was introduced for the ZX Spectrum home computer.

In the years that followed, expert-level computer chess programs were still far away, even on better hardware with many fewer constraints than the basic ZX Spectrum home computer. No wonder then that leading chess players at that time felt confident when playing against computers.

In his book Deep Thinking, published 20 years after his historic loss against Deep Blue, Garry Kasparov writes: Twelve years later I was in New York City fighting for my chess life. The computer won the six-game match with 3.5 to 2.5 points. While Deep Blue lost the first game, it would win two of the remaining five, with three games ending in a draw by mutual agreement.

It has been pointed out that Deep Blue should not be considered a form of AI, since it mainly relied on a huge hardware cluster. In that sense, Deep Blue relied on brute-force techniques rather than modern AI algorithms such as neural networks. Since then, both hardware and software have seen tremendous advancements.

However, chess applications for regular computers and smartphones still rely on the collected wisdom of decades of computer chess. This is where AlphaZero comes in. The approach of AlphaZero to mastering the game of chess is exclusively based on reinforcement learning with self-play of different versions of the AI agent competing against each other.

AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network and general-purpose algorithms that know nothing about the game beyond the basic rules. Given this tabula rasa approach, AlphaZero's performance after a few hours of self-play-based training is exceptional when compared to the leading traditional chess-playing computer programs.

In a test series comprising 1,000 games, AlphaZero beat Stockfish by winning 155 games (mostly while playing white), losing just six, and drawing the rest. At the same time, AlphaZero only analyzes about 60,000 positions per second, compared to the tens of millions analyzed by Stockfish. Despite analyzing roughly 1,000 times fewer positions per second, it is nevertheless able to beat Stockfish. One might be inclined to think that AlphaZero indeed shows some form of intelligence that sheer brute force cannot compensate for.

Reinforcement learning, generally combined with neural networks for the representation of the action policy, has proven useful and superior in many different areas, as the previous section illustrates. However, without advances on the hardware side, the recent AI achievements would not have been possible. Again, the story of DeepMind and its effort to master the game of Go with reinforcement learning (RL) provides some valuable insights. The table provides an overview of the hardware usage and power consumption for the major AlphaGo versions over time.

In this context, the absolute value is not relevant; it is rather the ordering that the values induce that is of interest. Regarding utility functions, see von Neumann and Morgenstern. House numbers in streets are a good example of ordinal numbers.

Risk aversion

In finance, the concept of risk aversion is important. In gambling, risk-loving agents can probably be found as well. It is easily verified that they model risk-averse, risk-neutral, and risk-loving agents, respectively. The following code illustrates this application.
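As a sketch of these ideas, the following code models the three risk attitudes with standard textbook Bernoulli utility functions (square root, linear, quadratic — an assumption, not necessarily the book's exact functions) and then maximizes expected utility over a simple allocation problem by grid search (all payoffs are hypothetical):

```python
import numpy as np

# Standard textbook Bernoulli utility functions (assumed for illustration).
u_averse = np.sqrt                 # concave: risk-averse
u_neutral = lambda x: x            # linear: risk-neutral
u_loving = lambda x: x ** 2        # convex: risk-loving

# A gamble paying 4 or 16 with equal probability (expected payoff 10).
payoffs = np.array([4.0, 16.0])
p = np.array([0.5, 0.5])

eu_averse = p @ u_averse(payoffs)  # 3.0 < sqrt(10): dislikes the risk
eu_loving = p @ u_loving(payoffs)  # 136 > 100: enjoys the risk

# Expected utility maximization by grid search: allocate initial wealth
# between a risk-less bond and a risky asset (hypothetical payoffs).
w0 = 10.0                          # initial wealth
s = np.array([2.0, 0.4])           # risky gross returns (up/down state)
alphas = np.linspace(0, 1, 1001)   # fraction invested in the risky asset
eus = [p @ u_averse(w0 * ((1 - a) + a * s)) for a in alphas]
alpha_star = alphas[int(np.argmax(eus))]  # optimal risky allocation
```

Replacing u_averse with the linear or quadratic utility pushes the optimal allocation to a corner, illustrating how the risk attitude alone drives the result.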

The following Python code models this problem and solves it exactly; the results are mainly driven by the particular form of the Bernoulli utility function.

Mean-Variance Portfolio Theory

Mean-variance portfolio (MVP) theory is one of the first theories of investment under uncertainty that focused on statistical measures only for the construction of stock investment portfolios. MVP completely abstracts from, say, the fundamentals of a company that might drive its stock performance, or from assumptions about the future competitiveness of a company that might be important for its growth prospects.

Basically, the only input data that counts is the time series of share prices and the statistics derived therefrom, such as the historical annualized mean return and the historical annualized variance of the returns.

Assumptions and Results

The central assumption of MVP, according to Markowitz, is that investors only care about expected returns and the variance of these returns: We next consider the rule that the investor does (or should) consider expected return a desirable thing and variance of return an undesirable thing.

This rule has many sound points, both as a maxim for, and hypothesis about, investment behavior. The portfolio with maximum expected return is not necessarily the one with minimum variance. There is a rate at which the investor can gain expected return by taking on variance, or reduce variance by giving up expected return. This is something almost never observed in real financial data, as the next chapter illustrates.

Short sales, for instance, might be allowed without altering the analysis significantly. The figure shows the simulated expected portfolio volatility and return for one risky asset. Because there is only one risky asset and one risk-less asset, the opportunity set is a straight line. The next figure shows the simulated expected portfolio volatility and return for two risky assets.

Minimum volatility and maximum Sharpe ratio

Next comes the derivation of the minimum volatility (minimum variance) and maximum Sharpe ratio portfolios.
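The simulation step described above can be sketched as follows; the asset statistics (expected returns, volatilities, correlation) and the zero risk-free rate used for the Sharpe ratio are hypothetical assumptions, not the book's exact figures:

```python
import numpy as np
np.random.seed(100)

# Hypothetical annualized statistics for two risky assets.
mu = np.array([0.08, 0.12])            # expected returns
sigma = np.array([0.15, 0.25])         # volatilities
rho = 0.3                              # correlation
cov = np.array([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1] ** 2]])

# Simulate random long-only portfolio weights that sum to one.
w = np.random.random((2500, 2))
w /= w.sum(axis=1, keepdims=True)

port_ret = w @ mu                                        # expected returns
port_vol = np.sqrt(np.einsum('ij,jk,ik->i', w, cov, w))  # volatilities

# Minimum volatility and maximum Sharpe ratio portfolios
# (a risk-free rate of 0 is assumed for the Sharpe ratio).
w_minvol = w[port_vol.argmin()]
w_maxsr = w[(port_ret / port_vol).argmax()]
```

Plotting port_vol against port_ret reproduces the familiar bullet-shaped cloud of feasible portfolios, with the two special portfolios on its upper-left edge.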

The figure shows the location of the two portfolios in risk-return space. Although the risky asset T has a negative expected return, it has a significant weight in the maximum Sharpe ratio portfolio.

Efficient frontier

An efficient portfolio is one that has a maximum expected return given its expected risk (or a minimum risk given its expected return).

In the figure, all those portfolios that have a lower expected return than the minimum risk portfolio are inefficient. The following code derives the efficient portfolios in risk-return space and plots them: it generates the set of target expected returns and derives the minimum volatility portfolio for each target expected return.

The model dates back to the pioneering work of Sharpe (1964) and Lintner (1965). As Jones puts it: The specific equilibrium model of interest to many investors is known as the capital asset pricing model, typically referred to as the CAPM. The CAPM is attractive as an equilibrium model because of its simplicity and its implications. In the CAPM, agents are assumed to invest according to MVP, caring only about the risk and return statistics of risky assets over one period. Since agents are assumed to be identical in that they use MVP to form their efficient portfolios, all agents must hold the same efficient portfolio in terms of composition, since the set of tradable assets is the same for every agent.

In other words, the market portfolio (the set of tradable assets) must lie on the efficient frontier; if this were not the case, the market could not be in equilibrium. What is the mechanism that brings about a capital market equilibrium? If agents do not demand enough of a tradable asset, its price needs to decrease; if demand is higher than supply, its price needs to increase. If prices are set correctly, demand and supply are equal for every tradable asset.

While MVP takes the prices of tradable assets as given, the CAPM is a theory and model about what the equilibrium price of an asset should be, given its risk-return characteristics. The CAPM assumes the existence of at least one risk-free asset, in which every agent can invest any amount and which earns the risk-free rate r.

The figure shows the CML schematically. The remaining, non-diversifiable risk is also called market risk or systematic risk. According to the CAPM, this is the only risk for which an agent should be rewarded with a higher expected return. The two risky assets S and T are available in fixed quantities.
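To make systematic risk and the CAPM pricing formula concrete, here is a small sketch with simulated, hypothetical return data (all numbers are assumptions): beta measures a stock's co-movement with the market portfolio, and the CAPM translates beta into an expected return.

```python
import numpy as np
np.random.seed(0)

# Hypothetical return samples for the market portfolio and a stock.
r_market = np.random.normal(0.08, 0.15, 10000)
beta_true = 1.2
r_stock = 0.01 + beta_true * r_market + np.random.normal(0, 0.05, 10000)

# Beta: covariance with the market over the market variance.
beta = np.cov(r_stock, r_market)[0, 1] / np.var(r_market, ddof=1)

# CAPM: expected return = risk-free rate + beta * market risk premium.
rf = 0.02
capm_ret = rf + beta * (r_market.mean() - rf)
```

The idiosyncratic noise term averages out across a diversified portfolio; only the beta exposure to the market earns a risk premium under the CAPM.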

What portfolio combination would the agent choose on the CML? A straightforward utility maximization, implemented in Python, yields the answer. The figure shows two indifference curves in risk-return space. In the next step, the indifference curves need to be combined with the CML to find out, visually, the optimal portfolio choice of the agent.

Making use of the previous numerical optimization results, Figure shows the optimal portfolio—the point at which the indifference curve is tangent to the CML. The CAPM is part of that theory and shall be illustrated by the use of real financial time series data in the next chapter. The arbitrage model was proposed as an alternative to the mean variance capital asset pricing model, introduced by Sharpe, Lintner, and Treynor, that has become the major analytic tool for explaining phenomena observed in capital markets for risky assets.

In that sense, APT does not assume that the market portfolio is the only relevant risk factor; rather, there are multiple types of risk that together are assumed to drive the performance (expected returns) of a stock. Such risk factors might include size, volatility, value, and momentum. It does so, however, using different assumptions and procedures. Instead, APT recognizes that several types of risk may affect security returns. The factor loadings can be used to estimate an arbitrage-free price V0 for the risky asset V.
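A sketch of the replication-and-pricing argument with a hypothetical three-state market (all payoffs and prices are made-up numbers):

```python
import numpy as np

# Hypothetical three-state market with three traded assets; the payoff
# matrix has full rank, so the risky asset V can be replicated exactly
# (zero residuals) and priced by arbitrage.
M = np.array([[11.0, 25.0, 0.0],
              [11.0, 20.0, 10.0],
              [11.0, 10.0, 20.0]])     # payoffs per state (columns = assets)
P = np.array([10.0, 20.0, 10.0])       # current asset prices

V_payoff = np.array([25.0, 15.0, 10.0])  # payoff of the asset to be priced

# Least-squares replication; exact here because the matrix has full rank.
phi, res, rank, _ = np.linalg.lstsq(M, V_payoff, rcond=None)
V0 = phi @ P  # unique arbitrage-free price of V
```

Negative portfolio positions (short sales) can show up in the replication weights phi; only the resulting price V0 is pinned down by the absence of arbitrage.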

This is not too surprising, given standard results from linear algebra: the augmented market payoff matrix has full rank, the residual values are zero, and a unique arbitrage-free price for the risky asset V results. APT does not necessarily require that perfect replication is possible; its very model formulation contains residual values. Despite theories and models such as MVP and CAPM being intellectually appealing, easy to implement, and mathematically elegant, it is, for a few reasons, surprising that they are still so popular today.

First, the popular theories and models presented in this chapter have hardly any meaningful empirical support. Third, there has been continuous progress on the theoretical and modeling fronts of finance, such that alternative theories and models are available.

The next chapter analyzes some of the theories and models introduced in this chapter on the basis of real financial data. While in the early years of quantitative finance, data was a scarce resource, today even students have access to a wealth of financial data and open source tools that allow the comprehensive analysis of financial theories and models based on real-world data. However, financial theory has usually driven empirical finance to a large extent.

The new area of data-driven finance might lead to a lasting shift in the relative importance of theory as compared to data in finance.

References

Books and papers cited in this chapter:

Bender, Jennifer, et al.
Calvello, Angelo.
Financial Economics. New York: Oxford University Press.
Fishburn, Peter.
Fama, Eugene F. Wiley Finance.

Jacod, Jean, and Philip Protter. Probability Essentials. Berlin: Springer.
Johnstone, David, and Dennis Lindley.
Jones, Charles P. Investments: Analysis and Management.
Karni, Edi. In Machina and W. Kip Viscusi (eds.). Oxford: North Holland.
Lintner, John.
Markowitz, Harry.
Pratt, John W.
Ross, Stephen A. White Center for Financial Research.
Hoboken: Wiley Finance.
Sharpe, William F.
Varian, Hal R. Intermediate Microeconomics: A Modern Approach.

von Neumann, John, and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton: Princeton University Press.

For the purposes of this book, data-driven finance is understood to be a financial context (theory, model, application, and so on) that is primarily driven by and based on insights gained from data.

It involves careful observation, applying rigorous skepticism about what is observed, given that cognitive assumptions can distort how one interprets the observation. It involves formulating hypotheses, via induction, based on such observations; experimental and measurement-based testing of deductions drawn from the hypotheses; and refinement or elimination of the hypotheses based on the experimental findings.

These are principles of the scientific method, as distinguished from a definitive series of steps applicable to all scientific enterprises. Given this definition, normative finance, as discussed in Chapter 3, is in stark contrast to the scientific method. Normative financial theories mostly rely on assumptions and axioms in combination with deduction as the major analytical method to arrive at their central results.

From a historical point of view, many of these theories were rigorously tested against real-world data only long after their publication dates. The discipline at the intersection of mathematics, statistics, and finance that applies such methods to financial market data is typically called financial econometrics, the topic of the next section: it subjects real-world (financial) data to statistical trials and then compares and contrasts the results against the (financial) theory or theories being tested.

The book by Alexander is part of a series of four books called Market Risk Analysis. The book by Campbell is another comprehensive resource for financial theory and related econometric research. One of the major tools in financial econometrics is regression, in both its univariate and multivariate forms.

What is the difference between traditional mathematics and statistical learning? Although there is no general answer to this question (after all, statistics is a sub-field of mathematics), a simple example should emphasize a major difference relevant to the context of this book. Consider first the standard mathematical way: here, the data is generally given and a functional relationship is to be found. Several factors make ordinary least-squares (OLS) regression so widely used. Among them are the following:

Centuries old
The least-squares approach, particularly in combination with regression, has been used for more than 200 years.

Scalability
There is basically no limit regarding the data size to which OLS regression can be applied.

Flexibility
OLS regression can be applied to a wide range of problems and data sets.

Speed
OLS regression is fast to evaluate, even on larger data sets.

Availability
Efficient implementations in Python and many other programming languages are readily available.
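A minimal OLS sketch on synthetic data (all values are made up) shows how little code such a regression requires:

```python
import numpy as np

# Synthetic (hypothetical) data: a noisy linear relationship.
rng = np.random.default_rng(100)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)

# OLS regression of degree 1: finds the coefficients that minimize
# the squared errors between y and the fitted line.
beta = np.polyfit(x, y, deg=1)

# Predictions given the fitted coefficients, and the resulting MSE.
y_hat = np.polyval(beta, x)
mse = ((y - y_hat) ** 2).mean()
```

The slope and intercept estimates recover the assumed relationship up to the noise level.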

However, as easy and straightforward as the application of OLS regression might be in general, the method rests on a number of assumptions, most of them related to the residuals, that are not always satisfied in practice.

Linearity
The model is linear in its parameters, with regard to both the coefficients and the residuals.

Independence
Independent variables are not perfectly (or to a high degree) correlated with each other (no multicollinearity).

Zero mean
The mean value of the residuals is close to zero.

No correlation
Residuals are not strongly correlated with the independent variables.

Homoscedasticity
The standard deviation of the residuals is almost constant.

No autocorrelation
The residuals are not strongly correlated with each other.

In practice, it is in general quite simple to test for the validity of these assumptions given a specific data set.

Data Availability

Financial econometrics is driven by statistical methods, such as regression, and by the availability of financial data. For quite a while now, finance professionals have relied on data terminals from companies such as Refinitiv (see the Eikon Terminal) or Bloomberg (see the Bloomberg Terminal), to mention just two of the leading providers.

Newspapers, magazines, financial reports, and the like have long been replaced by such terminals as the primary source for financial information. The major breakthrough in data-driven finance, however, is the programmatic availability of data via application programming interfaces (APIs) that allow the use of computer code to select, retrieve, and process arbitrary data sets. The remainder of this section is devoted to illustrating such APIs, by which even academics and retail investors can retrieve a wealth of different data sets.

Before such examples are provided, Table offers an overview of categories of data that are in general relevant in a financial context, as well as typical examples. In the table, structured data refers to numerical data types that often come in tabular structures, while unstructured data refers to data in the form of standard text that often has no structure beyond headers or paragraphs, for example.

Alternative data refers to data types that are typically not considered financial data.

Table: Relevant types of financial data

Time        Structured data        Unstructured data   Alternative data
Historical  Prices, fundamentals   News, texts         Web, social media, satellites
Streaming   Prices, volumes        News, filings       Web, social media, satellites, Internet of Things

Structured Historical Data

First, structured historical data types will be retrieved programmatically. The connection to the data API is configured with, for example, a ConfigParser object holding the account credentials. If these requirements are met, historical structured data can be retrieved via a single function call.

The single function call returns, for example, end-of-day closing prices for multiple RICs, such as AAPL.O, MSFT.O, NFLX.O, and AMZN.O, as a pandas DataFrame with one float64 column per symbol. Financial time series data, in this context, is the paramount example. However, other structured data types, such as fundamental data (for instance, fields like TR.Ebitda), are available in the same way, simplifying the work of quantitative analysts, traders, portfolio managers, and the like significantly.
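Once retrieved, such data typically ends up in a pandas DataFrame and is transformed into log returns for further analysis. A sketch with made-up closing values standing in for the API-retrieved data:

```python
import numpy as np
import pandas as pd

# Hypothetical EOD closing prices, stand-ins for API-retrieved data
# for RICs such as AAPL.O and MSFT.O (all values are made up).
index = pd.date_range('2020-01-01', periods=5, freq='B')
data = pd.DataFrame({'AAPL.O': [300.35, 298.29, 296.24, 297.24, 299.80],
                     'MSFT.O': [160.62, 158.62, 159.03, 157.58, 160.09]},
                    index=index)

# Log returns, the typical starting point for statistical analyses.
rets = np.log(data / data.shift(1)).dropna()
```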

Structured Streaming Data

Many applications in finance require real-time structured data, such as algorithmic trading or market risk management. Today, financial institutions, and even retail traders and investors, are confronted with never-ending streams of real-time data. The significance of this observation becomes clear when looking at Apple Inc. stock: the following code retrieves tick data for the Apple stock price for one hour only. The example illustrates that, for a single stock during one trading hour, there might be four times as many ticks coming in as the amount of EOD data accumulated over a period of 40 years.
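In place of the API-based retrieval, the following sketch simulates one hour of irregularly spaced tick data and resamples it to one-minute bars, a typical first processing step; all values are synthetic assumptions:

```python
import numpy as np
import pandas as pd

# Simulated tick data as a stand-in for a real-time API feed
# (assumption: irregularly spaced ticks over one trading hour).
rng = np.random.default_rng(10)
n = 5000
seconds = np.sort(rng.uniform(0, 3600, n))
index = pd.Timestamp('2021-01-04 10:00:00') + pd.to_timedelta(seconds, unit='s')
price = 100 + rng.normal(0, 0.02, n).cumsum()   # random-walk tick prices
ticks = pd.Series(price, index=index)

# Resample the irregular ticks to regular one-minute bars
# (last observed price per bar, forward-filled if a bar were empty).
bars = ticks.resample('1min').last().ffill()
```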

This not only challenges actors in financial markets, but also puts into question whether existing financial theories can be applied to such an environment at all.

Unstructured Historical Data

Many important data sources in finance provide unstructured data only, such as financial news or company filings. Undoubtedly, machines are much better and faster than humans at crunching large amounts of structured, numerical data. However, recent advances in natural language processing (NLP) make machines better and faster at processing financial news too, for example.

Data service providers ingest an enormous number of news articles every day. It is clear that this vast amount of text-based data cannot be processed properly by human beings. As an example, consider a retrieved article headlined (in French in the original) "USA: Tesla chooses Texas for production": TSLA registered record production and deliveries in the fourth quarter; the Model 3 division registered production of 86,… vehicles, while 92,… vehicles were delivered.
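As a toy illustration of machine text processing, a keyword summary can be sketched by simple frequency counting after stop-word removal; this naive approach merely stands in for a real NLP pipeline, and the article text is a made-up example:

```python
import re
from collections import Counter

# A minimal stop-word list (assumption: a real pipeline would use a
# proper NLP library and a much larger list).
STOP = {'the', 'a', 'an', 'of', 'and', 'in', 'to', 'for', 'is', 'are', 'on'}

def keywords(text, n=5):
    """Return the n most frequent non-stop-word tokens in text."""
    tokens = re.findall(r'[a-z]+', text.lower())
    counts = Counter(t for t in tokens if t not in STOP and len(t) > 2)
    return [word for word, _ in counts.most_common(n)]

article = ('Tesla registered record production and deliveries in the '
           'fourth quarter. Tesla production of the Model 3 reached a record.')
```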

Retrieval proceeds in three steps: retrieve metadata for a number of news articles that fall in the parameter range; select one storyId for which to retrieve the full text; retrieve the full text for the selected article and show it.

Unstructured Streaming Data

In the same way that historical unstructured data is retrieved, programmatic APIs can be used to stream unstructured news data, for example, in real time or at least near time. Figure shows a news-streaming application based on DNA (Dow Jones). The news-streaming application has the following main features:

Full text
The full text of each article is available by clicking on the article header.

Keyword summary
A keyword summary is created and printed on the screen; details become visible through a click on the arrows.

Word cloud
A word cloud summary bitmap is created, shown as a thumbnail and visible after a click on the thumbnail (see Figure, a word cloud bitmap shown in the news-streaming application).

Alternative Data

Nowadays, financial institutions, and in particular hedge funds, systematically mine a number of alternative data sources to gain an edge in trading and investing.

The first example retrieves and processes Apple Inc. press releases; the raw HTML code is then retrieved for each press release. In a financial context, it would be of paramount importance to specify exactly what unstructured alternative data sources to tap into. The second example is about the retrieval of data from the social network Twitter, Inc. To this end, Twitter provides API access to tweets on its platform, provided one has set up a Twitter account appropriately.
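For the first example, extracting the plain text from the retrieved raw HTML can be sketched with the standard library; the HTML snippet below is a made-up stand-in for a real press release:

```python
from html.parser import HTMLParser

# A minimal text extractor (assumption: the press-release pages are
# plain HTML documents; a real pipeline might use a parsing library).
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Collect every non-whitespace text node.
        if data.strip():
            self.parts.append(data.strip())

    def text(self):
        return ' '.join(self.parts)

html = '<html><body><h1>Apple Reports Results</h1><p>Revenue grew.</p></body></html>'
parser = TextExtractor()
parser.feed(html)
```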

"Book going into production shortly. Then you will notice for sure when it is out." The way the tweets are retrieved from the Twitter API is in near time, since the most recent tweets are accessed in the examples. These and similar API-based data sources therefore provide a never-ending stream of alternative data for which, as previously pointed out, it is important to specify exactly what one is looking for.

For quite a long time, students and academics learning and studying such theories were more or less constrained to the theory itself. Today, it no longer requires large teams and comprehensive studies to test such theories against real-world data; this is what this section is about. However, before diving into data-driven finance, the following sub-section briefly discusses some famous paradoxes in the context of EUT and how corporations model and predict the behavior of individuals in practice.

This is the standard assumption in finance and the context of EUT. Uncertainty subsumes the two different decision-making situations: risk, where a probability distribution over outcomes is known, and ambiguity, where it is not. Innumerable studies and experiments have been conducted to observe and analyze how agents behave when faced with uncertainty, as compared to what theories such as EUT predict.

For centuries, paradoxa have played an important role in decision-making theory and research. One such paradox, the St. Petersburg paradox, gave rise to the invention of utility functions and EUT in the first place. Daniel Bernoulli presented the paradox, and a solution to it, in 1738. The paradox is based on the following coin tossing game G: an agent is faced with a game during which a perfect coin is tossed, potentially infinitely many times. If the first toss shows heads, the agent receives a payoff of 1 currency unit.

As long as heads is observed, the coin is tossed again. Otherwise the game ends. If heads prevails a second time, the agent receives an additional payoff of 2. If it does a third time, the additional payoff is 4. For the fourth time it is 8, and so on. The expected payoff of this game is infinite. A major reason for this is the fact that relatively large payoffs only happen with a relatively small probability.
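These two observations can be made precise. The k-th toss is reached and shows heads with probability (1/2)^k and adds 2^(k-1) currency units, so the expected payoff diverges; a logarithmic Bernoulli utility, applied here to the variant in which the agent receives 2^(k-1) if the game ends after the k-th toss, yields a finite expected utility instead:

```latex
\mathbf{E}[G] = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{k} 2^{k-1}
             = \sum_{k=1}^{\infty} \frac{1}{2} = \infty,
\qquad
\mathbf{E}[u(G)] = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{k} \ln 2^{k-1}
                 = \ln 2 \sum_{k=1}^{\infty} \frac{k-1}{2^{k}} = \ln 2 < \infty
```

The second sum uses the standard identity that the series of (k-1)/2^k sums to 1.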

Therefore, an agent would probably be willing to wager only a relatively small amount to play this game. The way out of this paradox is the introduction of a utility function with positive but decreasing marginal utility. In this context, Bernoulli utility functions and EUT resolve the St. Petersburg paradox.

A further well-known paradox is the Allais paradox, which is based on an experiment with four different games that test subjects are asked to rank. One possible explanation is that decision makers in general value certainty more highly than typical models, such as EUT, predict. Another explanation lies in the framing of decisions and the psychology of decision makers.

Another famous paradox addressing shortcomings of EUT in its subjective form, according to Savage (1954), is the Ellsberg paradox, which dates back to the seminal paper by Ellsberg (1961). It addresses the importance of ambiguity in many real-world decision situations and is typically illustrated with two urns: urn 1 is known to contain black and red balls in equal proportion, while for urn 2 it is only known that it contains black and red balls, but not in which proportion. Since the probabilities for black and red balls, respectively, are not known for urn 2, decision makers prefer a situation of risk (urn 1) over one of ambiguity (urn 2).

The two paradoxa of Allais and Ellsberg show that real test subjects quite often behave contrary to what well-established decision theories in economics predict. In other words, human beings as decision makers can in general not be compared to machines that carefully collect data and then crunch the numbers to make a decision under uncertainty, be it in the form of risk or ambiguity.

Human behavior is more complex than most, if not all, theories currently suggest. How difficult and complex it can be to explain human behavior becomes clear after reading, for example, the voluminous book Behave by Sapolsky. It covers multiple facets of this topic, ranging from biochemical processes to genetics, human evolution, tribes, language, religion, and more, in an integrative manner.

If standard economic decision paradigms such as EUT do not explain real-world decision making too well, what alternatives are available? Such experiments and their sometimes surprising and paradoxical results have indeed motivated a great number of researchers to come up with alternative theories and models that resolve the paradoxa.

The book The Experiment in the History of Economics by Fontaine and Leonard is about the historical role of experiments in economics. There is, for example, a whole strand of literature that addresses issues arising from the Ellsberg paradox. However, these alternatives are far from being mainstream in finance. What, after all, has proven to be useful in practice? Not too surprisingly, the answer lies in data and machine learning algorithms.

The internet, with its billions of users, generates a treasure trove of data describing real-world human behavior, or what is sometimes called revealed preferences. The default ML approach taken in this context is supervised learning. The algorithms themselves are in general theory- and model-free; variants of neural networks are often applied. Therefore, when companies today predict the behavior of their users or customers, more often than not a model-free ML algorithm is deployed.
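A minimal sketch of such a model-free, supervised approach: a logistic regression (a one-neuron network, if you will) trained by plain gradient descent on synthetic "revealed preference" data; all data and parameters here are illustrative assumptions:

```python
import numpy as np

# Synthetic user data: two features per user and an observed binary
# choice driven by an (unknown to the learner) linear rule.
rng = np.random.default_rng(7)
n = 500
X = rng.normal(0, 1, (n, 2))                      # features describing users
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # observed binary choices

w = np.zeros(2)
b = 0.0
for _ in range(2000):                             # plain gradient descent
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))                  # predicted probabilities
    grad_w = X.T @ (p - y) / n                    # gradient of the log loss
    grad_b = (p - y).mean()
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# In-sample accuracy of the learned, theory-free behavioral model.
accuracy = ((p > 0.5) == (y == 1.0)).mean()
```

No utility function or probability assumption about the users enters the model; the relationship is learned from the observed choices alone.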

Traditional decision theories like EUT, or one of its successors, generally do not play a role at all. This makes it somewhat surprising that such theories still, at the beginning of the 2020s, are a cornerstone of most economic and financial theories applied in practice.

And this is not even to mention the large number of financial textbooks that cover traditional decision theories in detail. If one of the most fundamental building blocks of financial theory seems to lack meaningful empirical support or practical benefits, what about the financial models that build on top of it? On the other hand, big data and model-free, supervised learning approaches prove useful and successful in practice for predicting user and customer behavior.

One should rather focus on their indirectly revealed preferences, based on features data (new information that describes the state of a financial market) and labels data (outcomes that reflect the impact of the decisions made by financial agents). Financial agents become data-processing organisms that can be much better modeled, for example, by complex neural networks than by, say, a simple utility function in combination with an assumed probability distribution.

Probably, the investor would access relevant historical price data via an API to a trading platform or a data provider. The retrieved data set contains daily closing values for several instruments, such as AAPL.O, INTC.O, GS.N, SPY, and the .SPX index, each as a float64 column of a pandas DataFrame (see Figure, normalized financial time series data). The data-driven investor wants to first set a baseline for performance, as given by an equally weighted portfolio over the whole period of the available data.

The random portfolio compositions are generated as arrays of weights that sum to one (see Figure, simulated portfolio volatilities, returns, and Sharpe ratios). The data-driven investor now wants to backtest the performance of a portfolio that was set up at the beginning of the period: the optimal portfolio composition was derived from the financial time series data available up to that point. At the beginning of the next year, the portfolio composition was adjusted given the newly available data, and so on. Of course, extreme portfolio weights can be actively avoided by setting, for example, a minimum weight for every asset considered.
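The Monte Carlo part of such an analysis can be sketched as follows, with synthetic daily returns standing in for the real data (all figures are assumptions):

```python
import numpy as np

# Hypothetical daily log returns for four assets (synthetic data).
rng = np.random.default_rng(100)
rets = rng.normal(0.0004, 0.01, (750, 4))

mu = rets.mean(axis=0) * 252                  # annualized mean returns
cov = np.cov(rets.T) * 252                    # annualized covariance matrix

# 10,000 random portfolio compositions, normalized to sum to one.
w = rng.random((10000, 4))
w = (w.T / w.sum(axis=1)).T

port_ret = w @ mu                             # expected portfolio returns
port_vol = np.sqrt(np.einsum('ij,jk,ik->i', w, cov, w))  # portfolio volatilities
sharpe = port_ret / port_vol                  # Sharpe ratios (zero short rate)

best = w[sharpe.argmax()]                     # composition with maximum Sharpe
```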

MVP theory does quite a good job in predicting the portfolio volatility, which is also supported by a relatively high correlation between the expected and realized volatility time series (see Figure, expected versus realized portfolio volatilities). However, the conclusions are the opposite when comparing the expected with the realized portfolio returns (see Figure, expected versus realized portfolio returns): MVP theory obviously fails in predicting the portfolio returns, as is confirmed by the negative correlation between the two time series.

Similar, or even worse, conclusions need to be drawn with regard to the Sharpe ratio (see Figure): the correlation between the expected and realized Sharpe ratio time series is even lower than for the returns. The predictive power with regard to portfolio return and Sharpe ratio is pretty bad in the numerical example, whereas the predictive power with regard to portfolio risk seems acceptable.

However, investors generally are interested in risk-adjusted performance measures, such as the Sharpe ratio, and this is the statistic for which MVP theory fails worst in the example. Assume now that the data-driven technology investor wants to apply the CAPM to derive expected returns for the four technology stocks from before. The following Python code first derives the beta for every stock for a given year and then calculates the expected return for the stock in the next year, given its beta and the performance of the market portfolio.
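The original code is not reproduced here in full; a minimal sketch of the two CAPM steps on synthetic return data (the true beta, the volatilities, and the zero risk-free rate are all assumptions) might look as follows:

```python
import numpy as np

# Synthetic one-year return series: a market portfolio and a stock
# constructed with a true beta of 1.2 (all parameters are made up).
rng = np.random.default_rng(0)
r_m = rng.normal(0.0004, 0.01, 252)                    # market returns
r_s = 0.0001 + 1.2 * r_m + rng.normal(0, 0.005, 252)   # stock returns

# Step 1: beta = Cov(r_s, r_m) / Var(r_m), estimated from the sample.
cov = np.cov(r_s, r_m)
beta = cov[0, 1] / cov[1, 1]

# Step 2: CAPM-expected stock return, given the market performance
# (risk-free rate assumed to be zero for simplicity).
mu_m = r_m.mean() * 252
capm_ret = beta * mu_m
```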

The code loops over the symbols, deriving the beta and the CAPM-predicted return for each stock. Figure shows the CAPM-predicted versus realized stock returns for a single stock, while a further figure compares the averages of the CAPM-predicted stock returns with the averages of the realized returns.
