AI Algorithms: Types of Data, Learning, and Problems

Chapter 3 of Artificial Intelligence in Finance opens with a quote about AlphaGo's victory over a human Go champion. That win was a big deal back in 2016. Experts thought it would take at least another decade. It didn't. And that sets the tone for this chapter: AI moves faster than experts predict.

This first section of Chapter 3, titled “Algorithms,” is basically a crash course in the building blocks of AI. Hilpisch covers what kinds of data exist, how machines learn, what problems they solve, and what tools are in the toolbox. Let’s break it down.

Two Types of Data: Features and Labels

Before any algorithm can learn anything, it needs data. And data in AI falls into two buckets.

Features are your inputs. Think of them as the information you feed into a model. In a finance setting, this could be someone’s income, savings, age, or credit history.

Labels are your outputs. They’re the answers you want the model to learn. For a loan application, the label might be “credit-worthy” or “not credit-worthy.”

Here’s how I think about it. Features are the questions on a test. Labels are the answer key. You give a model both so it can learn the pattern between them.

Not every learning method needs labels though. More on that in a sec.
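To make the features-and-labels split concrete, here's a minimal sketch using made-up loan data (the numbers and column choices are mine, not the book's):

```python
# A hypothetical loan dataset: each row of `features` is one applicant,
# and `labels` holds the known outcome for that applicant.
import numpy as np

features = np.array([
    [55_000, 10_000, 34],   # income, savings, age
    [22_000,  1_500, 21],
    [80_000, 25_000, 45],
])
labels = np.array([1, 0, 1])  # 1 = credit-worthy, 0 = not credit-worthy

print(features.shape)  # (3, 3): three applicants, three features each
print(labels.shape)    # (3,): one label per applicant
```

Features are the questions, labels are the answer key: one label per row of features.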

Three Types of Learning

This is where things get interesting. Hilpisch lays out the three main flavors of machine learning.

Supervised Learning

You give the algorithm both features and labels. It studies them together and learns the relationship. Then when you hand it new features it hasn’t seen before, it can predict the label.

Think of it like a student studying flashcards. The front of the card is the feature. The back is the label. After enough practice, the student can guess the back of a new card they haven’t seen.

Hilpisch says this is the most important type for the rest of his book. Makes sense. Most finance problems have historical data with known outcomes, which is exactly what supervised learning needs.
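The flashcard idea maps directly onto the fit-then-predict pattern. Here's a hedged sketch with scikit-learn; the data is invented and the model choice (logistic regression) is mine, not the book's:

```python
# Supervised learning as flashcards: fit on known (feature, label) pairs,
# then predict the label for a card the model hasn't seen.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features: [income in $1,000s, savings in $1,000s]; labels: 1 = credit-worthy
X = np.array([[55, 10], [22, 1.5], [80, 25], [30, 2], [95, 40], [18, 0.5]])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression()
model.fit(X, y)                       # study the flashcards

new_applicant = np.array([[60, 12]])  # a card the model hasn't seen
print(model.predict(new_applicant))   # predicted label for the new applicant
```

The key point is that both X and y go into `fit`; only features go into `predict`.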

Unsupervised Learning

Here you only give the algorithm features. No labels. No answer key. The model has to figure out patterns on its own.

The book shows a Python example using KMeans clustering. You create some scattered data points and ask the algorithm to group them into clusters. It does this without being told what the “right” groups are. It just finds natural groupings in the data.

The finance application? Imagine clustering bank customers into groups based on their behavior. The algorithm might find that certain customers naturally fall into a “high risk” cluster and others into “low risk,” without you ever defining what those categories look like. Pretty cool.

I like how the book walks through actual code here. Hilpisch generates 100 sample data points arranged in 4 clusters, runs KMeans, and shows that the algorithm identifies all 4 clusters perfectly. It’s a clean example that makes the concept click.
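Something close to the book's example can be sketched in a few lines; the exact parameter values here (random seed, cluster spread) are my reconstruction, not Hilpisch's code:

```python
# Unsupervised learning: 100 scattered points in 4 blobs, grouped by KMeans
# without ever seeing the "right" answers.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=4, random_state=500, cluster_std=1.25)

model = KMeans(n_clusters=4, n_init=10, random_state=0)
model.fit(X)                   # note: only X is passed in, no labels
clusters = model.predict(X)

print(len(set(clusters)))      # number of distinct groups the algorithm found
```

Notice that `y` (the true blob assignments) is never shown to the model; it only sees the feature coordinates.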

Reinforcement Learning

This one is different from both supervised and unsupervised. The agent learns by doing things and getting rewards or punishments. No flashcards, no clustering. Just trial and error.

The book uses a coin-toss game to explain this. The coin is rigged to land on heads 80% of the time. A “dumb” baseline algorithm that guesses randomly scores about 50 out of 100 bets. But the reinforcement learning version keeps track of what it observes. After each toss, it adds the result to its memory. Over time, it learns the bias and starts betting on heads more often. The score jumps to around 68 out of 100.

That’s a 36% improvement just from paying attention to outcomes. The algorithm doesn’t know the coin is rigged. It just notices that heads keeps coming up and adjusts.
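Here's a sketch of that coin-toss game. The structure (variable names, how the memory is seeded) is my reconstruction of the idea, not the book's exact code, but it reproduces the same behavior: the learner's bets drift toward heads as its memory fills up:

```python
# Reinforcement-style learning on a rigged coin: the learner bets by sampling
# from its memory of past outcomes, so its bets drift toward the coin's bias.
import random

random.seed(100)
ssp = ['heads'] * 4 + ['tails']  # the coin lands heads 80% of the time

def play(learning, rounds=100):
    memory = ['heads', 'tails']   # start with no bias in the memory
    score = 0
    for _ in range(rounds):
        if learning:
            bet = random.choice(memory)               # bet from observed history
        else:
            bet = random.choice(['heads', 'tails'])   # random baseline
        outcome = random.choice(ssp)
        if bet == outcome:
            score += 1
        memory.append(outcome)    # remember what actually happened
    return score

print(play(learning=False))  # random guessing: around 50 out of 100
print(play(learning=True))   # memory-based betting: around 68 out of 100
```

The expected score for the learner works out to roughly 0.8 × 0.8 + 0.2 × 0.2 = 0.68, which matches the book's reported jump to about 68.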

In real finance, reinforcement learning could work for trading. An agent takes actions (buy, sell, hold), sees the result (profit or loss), and adjusts its strategy over time. It’s the same idea as the coin game, just way more complex.

Two Types of Problems

Once you know how the learning works, you need to know what kind of problem you’re solving. Hilpisch boils it down to two.

Estimation (Regression): The output is a number on a continuous scale. Like predicting tomorrow’s stock price. The label is a floating point number, not a category. You’re estimating a value.

Classification: The output is a category. Credit-worthy or not. Fraud or not fraud. Buy, sell, or hold. The label is one of a finite set of options.

Most problems in finance fall into one of these two buckets. And the distinction matters because it determines which algorithms and which loss functions you use.
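The distinction shows up directly in the label's data type. A small illustration with made-up numbers:

```python
# Same kind of feature matrix, two kinds of labels: continuous for
# estimation/regression, categorical for classification.
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # some feature values

# Estimation (regression): labels on a continuous scale, e.g. next-day prices
y_regression = np.array([100.5, 101.2, 99.8, 102.3])

# Classification: labels from a finite set of options
y_classification = np.array(['buy', 'hold', 'sell', 'buy'])

print(y_regression.dtype)      # a floating point dtype
print(y_classification.dtype)  # a string dtype: categories, not quantities
```

Floating point labels call for regression models and losses like mean squared error; categorical labels call for classifiers and losses like cross-entropy.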

Types of Approaches: AI vs ML vs DL

The last part of this section clears up some terminology that people mix up all the time.

Artificial Intelligence (AI) is the big umbrella. It covers everything, including old-school expert systems and rule-based approaches.

Machine Learning (ML) is a subset of AI. It’s specifically about algorithms that learn patterns from data using some measure of success (like minimizing error).

Deep Learning (DL) is a subset of ML. It’s all about neural networks, especially ones with multiple hidden layers. The “deep” in deep learning refers to those extra layers.

Hilpisch is pretty direct about his preference. He says DL-based approaches often perform much better than alternatives like logistic regression or support vector machines. That’s why the rest of the book focuses mainly on deep learning, using dense neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs).
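To show what "deep" means in the narrowest sense, here's a hedged sketch: a neural network with more than one hidden layer. I'm using scikit-learn's MLPClassifier on a toy problem of my own invention as a stand-in for the Keras-style networks the book builds later:

```python
# "Deep" = more than one hidden layer. Two hidden layers of 16 units each
# trained on a simple, learnable rule.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy rule the network should learn

model = MLPClassifier(hidden_layer_sizes=(16, 16),  # two hidden layers
                      max_iter=2000, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # training accuracy on this easy problem
```

The nesting is the takeaway: this is ML (it learns from data by minimizing error) and specifically DL (the model is a multi-layer neural network), and both sit under the AI umbrella.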

My Take

This section does a solid job as a foundation chapter. It’s not trying to teach you everything about each algorithm. It’s giving you the vocabulary and the mental framework so the rest of the book makes sense.

What I appreciate is the hands-on Python examples. A lot of textbooks would just describe unsupervised learning in abstract terms. Hilpisch actually shows you 15 lines of code, runs it, and displays the results. Same with the reinforcement learning coin-toss example. You can run these yourself and see the learning happen in real time.

If I had one critique, it’s that the “Types of Approaches” section is a bit brief. The book mentions OLS regression, decision trees, SVMs, Bayesian networks, genetic algorithms, and ensemble methods in passing but doesn’t really explain them here. I get it though. This book is about deep learning in finance, and Hilpisch is upfront about that focus. The other approaches get a name-drop, and then it’s on to neural networks.

For anyone just starting out with AI concepts, this section is a decent map of the territory. You won’t be an expert after reading it, but you’ll know the landscape. And sometimes that’s exactly what you need before going deeper.


This post is part of a series reviewing “Artificial Intelligence in Finance” by Yves Hilpisch (O’Reilly, 2020, ISBN 978-1-492-05543-3).

Previous: What AI in Finance Really Means

Next: Neural Networks and Why Data Matters
