How To Combine Numerical & Text Features: 10 Ways In Machine Learning And Deep Learning

by | Jun 13, 2023 | Data Science, Machine Learning, Natural Language Processing

Why Combine Numerical Features And Text Features?

Combining numerical and text features in machine learning models has become increasingly important in various applications, particularly natural language processing (NLP) and text analytics. By integrating structured numerical data and unstructured text data, we can leverage the complementary information from both sources and enhance the overall performance of our models.

Numerical features provide structured information that can encode valuable insights, such as demographic data, ratings, or measurements. Text features, on the other hand, contain unstructured information that captures semantics, sentiment, or domain-specific knowledge. Combining these two feature types can create more comprehensive and informative representations of the underlying data.

Integrating numerical and text features enables models to capture the nuances of both data types. For example, in sentiment analysis, incorporating both a numerical rating and the corresponding text review lets the model interpret the text in the context of the rating, which gives a more nuanced picture of the user’s sentiment than either signal alone.

Figure: Combining numerical and text features, for example a numerical rating and the corresponding text review.

Furthermore, combining numerical and text features facilitates cross-domain learning. By associating numerical features with the corresponding textual context, the model can learn to make connections between different types of information. Cross-domain learning can provide deeper insights and improve the model’s generalisation ability across different domains or tasks.

While combining numerical and text features in deep neural networks or traditional machine learning approaches offers numerous advantages, there are also challenges to consider. These challenges include increased complexity, dimensionality, preprocessing requirements, and potential data sparsity. Nonetheless, with careful consideration and appropriate techniques, the benefits of combining these features often outweigh the challenges.

How can you combine numerical and text features in machine learning?

Combining numerical and text features in machine learning approaches can be done using various techniques:

1. Feature Concatenation:

  • Concatenate the numerical and text features into a single feature vector.
  • Apply appropriate preprocessing and normalization to ensure compatibility between the two types of features.
  • Pass the concatenated feature vector to your machine learning algorithm for training and prediction.

2. Feature Engineering:

  • Extract meaningful features from the text data using techniques like bag-of-words, TF-IDF, or word embeddings.
  • Combine these extracted features with the numerical features.
  • Use the combined feature set as input for your machine learning algorithm.

3. One-Hot Encoding or Binary Encoding:

  • Convert categorical text features into numerical representations using one-hot encoding or binary encoding.
  • Combine the encoded categorical features with the numerical features.
  • Feed the combined feature set to your machine learning algorithm.

4. Ensemble Methods:

  • Train separate models on the numerical and text features.
  • Combine the predictions of these models using techniques like voting, averaging, or stacking.
  • The combined predictions can be used as the final output.

5. Deep Neural Networks:

  • Use neural network architectures like multi-input models or Siamese networks.
  • Design separate branches or layers to handle numerical and text features.
  • Combine the representations learned from both branches or layers.
  • Process the combined representation further and feed it into fully connected layers for prediction.
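The ensemble approach (item 4) can be sketched as follows. This is a minimal illustration that assumes scikit-learn, which the list above does not name; the toy reviews, ratings, and labels are made up, and logistic regression simply stands in for whichever models you would train on each feature type.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration only
reviews = [
    "Great product, highly recommended!",
    "Terrible product, don't waste your money.",
    "Awesome experience with this product.",
    "Average quality, not satisfied.",
]
ratings = np.array([[5.0], [1.0], [4.0], [2.0]])  # one numerical feature per row
labels = np.array([1, 0, 1, 0])                   # 1 = positive, 0 = negative

# One model per feature type
text_model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(reviews, labels)
num_model = LogisticRegression().fit(ratings, labels)

# Soft voting: average the class probabilities of the two models
avg_proba = (text_model.predict_proba(reviews) + num_model.predict_proba(ratings)) / 2
predictions = num_model.classes_[avg_proba.argmax(axis=1)]
print(predictions)
```

Averaging probabilities is only one way to combine the models; weighted averaging or a stacked meta-model follow the same pattern.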

It’s vital to preprocess and normalize the features appropriately to ensure they are on similar scales before combining them. Additionally, consider the nature of your data, the problem you’re solving, and the available resources when deciding on the best approach for combining numerical and text features in machine learning.
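As a rough sketch of feature concatenation with preprocessing, the snippet below turns the text into TF-IDF vectors, standardizes the numerical column, and stacks the two side by side. It assumes scikit-learn and SciPy, which the text above does not prescribe, and the toy data and the choice of logistic regression are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration only
reviews = [
    "Great product, highly recommended!",
    "Terrible product, don't waste your money.",
    "Awesome experience with this product.",
    "Average quality, not satisfied.",
]
ratings = np.array([[5.0], [1.0], [4.0], [2.0]])
labels = np.array([1, 0, 1, 0])

vectorizer = TfidfVectorizer()
scaler = StandardScaler()

text_features = vectorizer.fit_transform(reviews)   # sparse, one column per term
numeric_features = scaler.fit_transform(ratings)    # dense, zero mean and unit variance

# Concatenate column-wise into a single sparse feature matrix
combined = hstack([text_features, csr_matrix(numeric_features)]).tocsr()

clf = LogisticRegression().fit(combined, labels)
print(combined.shape)
```

On real data, fit the vectorizer and the scaler on the training split only and reuse them to transform the test split.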

How can you combine numerical and text features with deep learning in neural networks?

Combining numerical and text features in deep neural networks is common in many natural language processing (NLP) and machine learning applications. It allows you to leverage both structured numerical data and unstructured text data to improve the overall performance of your model. You can take several approaches to combine these different types of features effectively. Here are a few popular techniques:

1. Parallel Model Architecture:

  • Train separate models for numerical and text data.
  • Combine the outputs of both models using concatenation or other merging techniques.
  • Feed the merged output into a final fully connected layer for prediction.

2. Feature Concatenation:

  • Concatenate the numerical and text features together into a single input representation.
  • Apply appropriate preprocessing and normalization to ensure compatibility between the two types of features.
  • Pass the concatenated features through the neural network for training and prediction.

3. Hybrid Models:

  • Use a shared representation learning approach where a common set of layers processes both numerical and text features.
  • The shared layers capture relevant information from both feature types and create a joint representation.
  • Separate branches can be added after the shared layers to process specific features individually.
  • Finally, combine the representations and feed them into fully connected layers for prediction.

4. Attention Mechanisms:

  • Use attention mechanisms to weigh the importance of different features dynamically.
  • Apply attention to both the numerical and text features.
  • Combine the attended representations using concatenation or other merging techniques.
  • Pass the merged features through the neural network for further processing.

5. Pre-trained Models:

  • Utilize pre-trained models such as word embeddings or language models (e.g., Word2Vec, GloVe, BERT).
  • Convert text inputs into their pre-trained representations.
  • Combine the pre-trained text embeddings with numerical features using concatenation or other merging techniques.
  • Fine-tune the combined model on your specific task.

It’s important to note that the choice of architecture depends on the specific problem, available data, and desired performance. Experimenting with different approaches and architectures is often necessary to find the most effective combination for your task.
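To make the attention idea (item 4) a little more concrete, here is a minimal NumPy sketch of attention-weighted fusion of two branch outputs. Everything in it is an assumption for illustration: the random arrays stand in for the outputs of a text branch and a numerical branch, and `score_vector` is a hypothetical parameter that would be learned during training rather than sampled at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

batch_size, dim = 4, 8
text_repr = rng.normal(size=(batch_size, dim))  # stand-in for the text branch output
num_repr = rng.normal(size=(batch_size, dim))   # stand-in for the numerical branch output

# Hypothetical scoring vector; random here, learned in a real model
score_vector = rng.normal(size=(dim,))

stacked = np.stack([text_repr, num_repr], axis=1)   # (batch, 2, dim)
scores = stacked @ score_vector                     # (batch, 2): one score per branch
weights = softmax(scores, axis=1)                   # attention weights over the two branches
fused = (weights[..., None] * stacked).sum(axis=1)  # (batch, dim): weighted combination

print(weights.round(3))
```

Each row of `weights` sums to one, so for every example the fused vector is a convex combination of the two branch representations.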

Advantages of combining numerical and text features in deep neural networks

  1. Enhanced Performance: By combining numerical and text features, you can leverage the strengths of both data types. Numerical features provide structured information, while text features capture unstructured and contextual information. This combination can lead to improved model performance and a better representation of the underlying data.
  2. Comprehensive Information: Text features often contain valuable insights, such as sentiment, semantics, or domain-specific knowledge. By incorporating text features into deep neural networks, you can utilize this rich information to make more informed predictions and capture subtle patterns that may not be captured by numerical features alone.
  3. Cross-Domain Learning: Combining numerical and text features allows for cross-domain learning. For example, the model can learn to associate certain demographic characteristics with specific sentiment expressions if you have a dataset with customer demographic information (numerical) and their reviews (text). Cross-domain learning can provide deeper insights and improve the model’s generalization capabilities.
  4. Contextual Understanding: Text features can provide valuable context to numerical data. By incorporating textual information, the model can understand the meaning behind numerical values more effectively. For example, in sentiment analysis, understanding the context of numerical ratings (e.g., star ratings) through associated text reviews can help determine sentiment more accurately.

Disadvantages of combining numerical and text features in deep neural networks

  1. Increased Complexity: Combining different features introduces additional complexity to the model architecture and training process. Handling numerical and text elements simultaneously requires careful preprocessing, normalization, and compatibility considerations. This complexity can make the model more challenging to develop and fine-tune.
  2. Increased Dimensionality: Combining numerical and text features often leads to higher-dimensional feature representations. This increase in dimensionality can result in more complex models, which may require more training data and computational resources to train effectively. It can also lead to overfitting if the dataset is limited or imbalanced.
  3. Preprocessing Challenges: Numerical and text features often require different preprocessing techniques. Numerical features may need scaling or normalization, while text features require tokenization, stemming, or stop-word removal. Coordinating and integrating these preprocessing steps for different feature types can be non-trivial.
  4. Data Sparsity: Textual data is inherently sparse, especially when represented with techniques like bag-of-words or TF-IDF. This sparsity can pose challenges in model training and may call for dense representations such as word embeddings, or for attention mechanisms, to capture the information in the text effectively.

It’s essential to consider these advantages and disadvantages carefully when deciding whether and how to combine numerical and text features in deep neural networks. The specific characteristics of your data, the nature of the problem, and the available resources should be considered to make an informed decision.

Example of combining numerical and text features

Consider a practical example of combining numerical and text features in a deep neural network for sentiment analysis. Suppose we have a dataset of customer reviews for a product, where each review is associated with a numerical rating (1 to 5 stars) and a corresponding text review. We aim to predict the sentiment (positive, negative, or neutral) based on the numerical rating and the text review.

Here’s an example of how we can combine the numerical and text features:

Numerical Features:

  • Normalize the numerical ratings to a standard scale (e.g., 0 to 1).
  • Represent the normalized numerical ratings as a single numerical feature.

Text Features:

  • Preprocess the text reviews by tokenizing, removing stop words, and applying stemming if necessary.
  • Use word embeddings (e.g., GloVe, Word2Vec) to represent the text reviews as dense numerical vectors.
  • The word embeddings can be averaged to create a fixed-length vector representation of each text review.
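The feature preparation steps above can be sketched with NumPy alone. The embedding table below is a hypothetical 4-dimensional toy, not real GloVe or Word2Vec vectors (which typically have 50 to 300 dimensions), and `average_embedding` is an illustrative helper name rather than a library function.

```python
import numpy as np

# Hypothetical toy embedding table; real pre-trained vectors would be loaded from file
embedding_dim = 4
embeddings = {
    "great": np.array([0.9, 0.1, 0.0, 0.3]),
    "product": np.array([0.1, 0.8, 0.2, 0.0]),
    "terrible": np.array([-0.9, 0.1, 0.1, 0.2]),
}

def average_embedding(text):
    """Average the vectors of known words; return zeros if no word is known."""
    tokens = text.lower().split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(embedding_dim)
    return np.mean(vectors, axis=0)

# Text feature: fixed-length vector for one review
review_vector = average_embedding("Great product")

# Numerical feature: scale a 5-star rating from the 1-5 range to 0-1
rating = 5
rating_feature = np.array([(rating - 1) / (5 - 1)])

# Combined input vector for a downstream model
combined = np.concatenate([review_vector, rating_feature])
print(combined)
```

Scaling with the known bounds of the rating scale (1 and 5) rather than the observed minimum and maximum keeps the transformation identical for training and test data.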

Neural Network Architecture:

  • Design a neural network with two input branches: one for the numerical feature and one for the text feature.
  • The numerical input branch consists of a single input node.
  • The text input branch consists of an embedding layer followed by pooling or recurrent layers to capture the sequential information.
  • Merge the outputs of both branches using concatenation or other merging techniques.
  • Add additional layers (e.g., fully connected layers, dropout, etc.) for further processing and non-linearity.
  • Finally, include an output layer with an appropriate activation function (e.g., softmax for multi-class sentiment classification) to obtain the final sentiment prediction.

Combining the numerical rating feature with the text feature represented by word embeddings allows the model to capture the sentiment expressed in the text reviews while also considering the associated numerical ratings. The model can then draw on both the structured numerical information and the unstructured, contextual information in the text, which can lead to more accurate sentiment predictions.

It’s worth noting that the specific architecture and preprocessing steps may vary depending on the dataset, problem, and available resources. Experimentation and fine-tuning are often necessary to find the best combination for a given task.

How To Python Tutorial Example

Here’s an example Python code snippet demonstrating how to combine numerical and text features using Keras, following the example above:

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Assuming you have a dataset with numerical ratings and corresponding text reviews
ratings = np.array([4, 5, 2, 3, 1])  # Example numerical ratings
reviews = np.array([
    "Great product, highly recommended!",
    "Awesome experience with this product.",
    "Average quality, not satisfied.",
    "Decent product, could be better.",
    "Terrible product, don't waste your money."
])  # Example text reviews
sentiments = np.array([2, 2, 0, 1, 0])  # Sentiment labels as class indices (0 = negative, 1 = neutral, 2 = positive), as required by sparse_categorical_crossentropy

# Split the dataset into train and test sets
reviews_train, reviews_test, ratings_train, ratings_test, sentiments_train, sentiments_test = train_test_split(
    reviews, ratings, sentiments, test_size=0.2, random_state=42
)

# Text preprocessing
max_words = 1000  # Maximum number of words to consider
max_sequence_length = 100  # Maximum length of each review
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(reviews_train)  # Build the vocabulary from the training reviews only
sequences_train = tokenizer.texts_to_sequences(reviews_train)
sequences_test = tokenizer.texts_to_sequences(reviews_test)
word_index = tokenizer.word_index

# Pad sequences to have the same length
X_train = pad_sequences(sequences_train, maxlen=max_sequence_length)
X_test = pad_sequences(sequences_test, maxlen=max_sequence_length)

# Numerical feature normalization
ratings_min = ratings.min()
ratings_max = ratings.max()
ratings_train_normalized = (ratings_train - ratings_min) / (ratings_max - ratings_min)
ratings_test_normalized = (ratings_test - ratings_min) / (ratings_max - ratings_min)

# Define the neural network architecture
embedding_dim = 100  # Dimensionality of the word embeddings
lstm_units = 128  # Number of units in the LSTM layer

# Text input branch
text_input = Input(shape=(max_sequence_length,))
embedding_layer = Embedding(max_words, embedding_dim)(text_input)
lstm_layer = LSTM(lstm_units)(embedding_layer)

# Numerical input branch
numerical_input = Input(shape=(1,))
numerical_dense = Dense(32, activation='relu')(numerical_input)

# Merge the branches
merged = concatenate([lstm_layer, numerical_dense])
dense_layer = Dense(32, activation='relu')(merged)
output = Dense(3, activation='softmax')(dense_layer)  # 3 classes for sentiment prediction

# Create the model
model = Model(inputs=[text_input, numerical_input], outputs=output)

# Compile and train the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit([X_train, ratings_train_normalized], sentiments_train, epochs=10, batch_size=32, verbose=1)

# Evaluate the model
loss, accuracy = model.evaluate([X_test, ratings_test_normalized], sentiments_test, verbose=0)
print(f"Test loss: {loss:.4f}")
print(f"Test accuracy: {accuracy*100:.2f}%")

In this example, we use the Keras library with the TensorFlow backend to implement the neural network model. We define two branches: one for the text input and another for the numerical input.

These branches are merged, and additional layers are added for further processing. The model is compiled with a loss function and metric suited to multi-class sentiment classification. Finally, the model is trained and evaluated on the text reviews and numerical ratings together.

Please note that this is a simplified example, and you may need to adapt the code based on your specific dataset and requirements.


Combining numerical and text features in deep neural networks provides several advantages, such as enhanced performance, comprehensive information, cross-domain learning, and contextual understanding. By leveraging the strengths of both numerical and text data, models can capture more subtle patterns and make more accurate predictions.

However, challenges are also associated with combining these features, including increased complexity, dimensionality, preprocessing requirements, and data sparsity. It’s crucial to carefully consider these factors and experiment with different approaches to find the best combination for your task.

Combining numerical and text features in deep neural networks enables more powerful and versatile models, particularly for sentiment analysis, text classification, and recommendation systems. It allows for a more holistic understanding of the data, incorporating structured information from numerical features and the rich context from text features, which can ultimately lead to improved performance and insights.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
