Top 10 Ways Of Combining Numerical And Text Features – With How To Tutorial In Python

by Neri Van Otten | Jun 13, 2023 | Data Science, Machine Learning, Natural Language Processing

Combining numerical and text features in machine learning models has become increasingly important in various applications, particularly natural language processing (NLP) and text analytics. By integrating structured numerical data and unstructured text data, we can leverage the complementary information from both sources and enhance the overall performance of our models.

Numerical features provide structured information that can encode valuable insights, such as demographic data, ratings, or measurements. Text features, on the other hand, contain unstructured information that captures semantics, sentiment, or domain-specific knowledge. Combining these two feature types can create more comprehensive and informative representations of the underlying data.

Integrating numerical and text features enables models to capture the nuances and subtleties of both data types. For example, in sentiment analysis, incorporating both a numerical rating and the corresponding text review allows the model to understand the sentiment expressed in the text within the context of the numerical rating. This combination provides a more nuanced understanding of the sentiment expressed by the user.

Furthermore, combining numerical and text features facilitates cross-domain learning. By associating numerical features with the corresponding textual context, the model can learn to make connections between different types of information. Cross-domain learning can provide deeper insights and improve the model’s generalisation ability across different domains or tasks.

While combining numerical and text features in deep neural networks or traditional machine learning approaches offers numerous advantages, there are also challenges to consider. These include increased model complexity, higher dimensionality, additional preprocessing requirements, and potential data sparsity. Nonetheless, with careful consideration and appropriate techniques, the benefits of combining these features often outweigh the challenges.

How can you combine numerical and text features in machine learning?

Combining numerical and text features in machine learning approaches can be done using various techniques:

1. Feature Concatenation:

  • Concatenate the numerical and text features into a single feature vector.
  • Apply appropriate preprocessing and normalization to ensure compatibility between the two types of features.
  • Pass the concatenated feature vector to your machine learning algorithm for training and prediction.
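As a rough illustration of the steps above, the following sketch concatenates a TF-IDF text matrix with a standardized numerical column using scikit-learn and SciPy. The toy reviews, ratings, labels, and the choice of logistic regression are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: review text, star rating, binary label (1 = positive)
reviews = [
    "great product highly recommended",
    "terrible product waste of money",
    "awesome experience would buy again",
    "poor quality not satisfied",
]
ratings = np.array([[5.0], [1.0], [4.0], [2.0]])
labels = np.array([1, 0, 1, 0])

# Text -> sparse TF-IDF matrix; numbers -> standardized column
text_features = TfidfVectorizer().fit_transform(reviews)
numeric_features = StandardScaler().fit_transform(ratings)

# Concatenate column-wise into a single sparse feature matrix
combined = hstack([text_features, csr_matrix(numeric_features)]).tocsr()

# Any estimator that accepts sparse input could be used here
clf = LogisticRegression().fit(combined, labels)
predictions = clf.predict(combined)
```

In a real project the vectorizer and scaler would be fitted on the training split only and then applied to the test split.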

2. Feature Engineering:

  • Extract meaningful features from the text data using techniques like bag-of-words, TF-IDF, or word embeddings.
  • Combine these extracted features with the numerical features.
  • Use the combined feature set as input for your machine learning algorithm.
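One way to wire these steps together, sketched here under the assumption that the data lives in a pandas DataFrame, is scikit-learn's ColumnTransformer inside a Pipeline. The column names `review`, `rating`, and `label` are hypothetical, and bag-of-words (CountVectorizer) stands in for whichever text representation you prefer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy dataset
data = pd.DataFrame({
    "review": [
        "great product highly recommended",
        "terrible product waste of money",
        "awesome experience would buy again",
        "poor quality not satisfied",
    ],
    "rating": [5, 1, 4, 2],
    "label": [1, 0, 1, 0],
})

# Extract bag-of-words features from the text column and scale the numeric column
preprocess = ColumnTransformer([
    ("text", CountVectorizer(), "review"),   # a string selects a 1-D text column
    ("num", MinMaxScaler(), ["rating"]),     # a list selects a 2-D numeric block
])

model = Pipeline([("features", preprocess), ("clf", LogisticRegression())])
X = data[["review", "rating"]]
model.fit(X, data["label"])

features = model.named_steps["features"].transform(X)
predictions = model.predict(X)
```

Keeping the preprocessing inside the pipeline means the same transformations are applied consistently at training and prediction time.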

3. One-Hot Encoding or Binary Encoding:

  • Convert categorical text features into numerical representations using one-hot encoding or binary encoding.
  • Combine the encoded categorical features with the numerical features.
  • Feed the combined feature set to your machine learning algorithm.
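A minimal sketch of one-hot encoding, assuming a hypothetical categorical `categories` column and a numeric `prices` column, could look like this:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical text feature and numerical feature
categories = np.array([["electronics"], ["books"], ["electronics"], ["toys"]])
prices = np.array([[199.0], [12.5], [89.0], [24.0]])

# One-hot encode the categories; unseen categories at prediction time become all zeros
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(categories).toarray()

# Min-max scale the prices so they share the 0-to-1 range of the one-hot columns
prices_scaled = (prices - prices.min()) / (prices.max() - prices.min())

# Combine encoded categories with the scaled numerical feature
combined = np.hstack([encoded, prices_scaled])
```

The encoder orders categories alphabetically, so the columns here are books, electronics, toys, followed by the scaled price.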

4. Ensemble Methods:

  • Train separate models on the numerical and text features.
  • Combine the predictions of these models using techniques like voting, averaging, or stacking.
  • The combined predictions can be used as the final output.
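The ensemble idea can be sketched as simple probability averaging (soft voting) between a text-only model and a numeric-only model. The toy data and the use of logistic regression for both models are assumptions for illustration; stacking would instead train a further model on these predictions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data
reviews = [
    "great product highly recommended",
    "terrible product waste of money",
    "awesome experience would buy again",
    "poor quality not satisfied",
]
ratings = np.array([[5.0], [1.0], [4.0], [2.0]])
labels = np.array([1, 0, 1, 0])

# Train one model per feature type
text_matrix = TfidfVectorizer().fit_transform(reviews)
text_model = LogisticRegression().fit(text_matrix, labels)
numeric_model = LogisticRegression().fit(ratings, labels)

# Average the class probabilities of the two models
text_proba = text_model.predict_proba(text_matrix)
numeric_proba = numeric_model.predict_proba(ratings)
avg_proba = (text_proba + numeric_proba) / 2.0

# Pick the class with the highest averaged probability
ensemble_pred = text_model.classes_[avg_proba.argmax(axis=1)]
```

Both models see the same label set, so their probability columns line up and can be averaged directly.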

5. Deep Neural Networks:

  • Use neural network architectures like multi-input models or Siamese networks.
  • Design separate branches or layers to handle numerical and text features.
  • Combine the representations learned from both branches or layers.
  • Process the combined representation further and feed it into fully connected layers for prediction.

It’s vital to preprocess and normalize the features appropriately to ensure they are on similar scales before combining them. Additionally, consider the nature of your data, the problem you’re solving, and the available resources when deciding on the best approach for combining numerical and text features in machine learning.

How can you combine numerical and text features in deep neural networks?

Combining numerical and text features in deep neural networks is common in many natural language processing (NLP) and machine learning applications. It allows you to leverage both structured numerical data and unstructured text data to improve the overall performance of your model. You can take several approaches to combine these different types of features effectively. Here are a few popular techniques:

1. Parallel Model Architecture:

  • Train separate models for numerical and text data.
  • Combine the outputs of both models using concatenation or other merging techniques.
  • Feed the merged output into a final fully connected layer for prediction.

2. Feature Concatenation:

  • Concatenate the numerical and text features together into a single input representation.
  • Apply appropriate preprocessing and normalization to ensure compatibility between the two types of features.
  • Pass the concatenated features through the neural network for training and prediction.

3. Hybrid Models:

  • Use a shared representation learning approach where a common set of layers processes both numerical and text features.
  • The shared layers capture relevant information from both feature types and create a joint representation.
  • Separate branches can be added after the shared layers to process specific features individually.
  • Finally, combine the representations and feed them into fully connected layers for prediction.

4. Attention Mechanisms:

  • Use attention mechanisms to weigh the importance of different features dynamically.
  • Apply attention to both the numerical and text features.
  • Combine the attended representations using concatenation or other merging techniques.
  • Pass the merged features through the neural network for further processing.
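To make the idea concrete, here is a deliberately simplified NumPy sketch of attention over two feature representations. It assumes the text and numerical branches have already been projected to vectors of the same size, and it uses a random `score_vector` as a hypothetical stand-in for a parameter that a real network would learn.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical branch outputs, already projected to the same dimensionality
text_repr = rng.normal(size=dim)
numeric_repr = rng.normal(size=dim)

# Hypothetical scoring vector; in a trained model this would be learned
score_vector = rng.normal(size=dim)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

# One relevance score per feature type, turned into weights that sum to 1
scores = np.array([score_vector @ text_repr, score_vector @ numeric_repr])
weights = softmax(scores)

# Weight each representation, then merge by concatenation
attended = np.concatenate([weights[0] * text_repr, weights[1] * numeric_repr])
```

In practice the weights would be computed per example and trained jointly with the rest of the network, typically using an attention layer from a deep learning framework.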

5. Pre-trained Models:

  • Utilize pre-trained models such as word embeddings or language models (e.g., Word2Vec, GloVe, BERT).
  • Convert text inputs into their pre-trained representations.
  • Combine the pre-trained text embeddings with numerical features using concatenation or other merging techniques.
  • Fine-tune the combined model on your specific task.

It’s important to note that the choice of architecture depends on the specific problem, available data, and desired performance. Experimenting with different approaches and architectures is often necessary to find the most effective combination for your task.

Advantages of combining numerical and text features in deep neural networks

  1. Enhanced Performance: By combining numerical and text features, you can leverage the strengths of both data types. Numerical features provide structured information, while text features capture unstructured and contextual information. This combination can lead to improved model performance and a better representation of the underlying data.
  2. Comprehensive Information: Text features often contain valuable insights, such as sentiment, semantics, or domain-specific knowledge. By incorporating text features into deep neural networks, you can utilize this rich information to make more informed predictions and capture subtle patterns that may not be captured by numerical features alone.
  3. Cross-Domain Learning: Combining numerical and text features allows for cross-domain learning. For example, if you have a dataset with customer demographic information (numerical) and their reviews (text), the model can learn to associate certain demographic characteristics with specific sentiment expressions. Cross-domain learning can provide deeper insights and improve the model’s generalization capabilities.
  4. Contextual Understanding: Text features can provide valuable context to numerical data. By incorporating textual information, the model can understand the meaning behind numerical values more effectively. For example, in sentiment analysis, understanding the context of numerical ratings (e.g., star ratings) through associated text reviews can help determine sentiment more accurately.

Disadvantages of combining numerical and text features in deep neural networks

  1. Increased Complexity: Combining different feature types introduces additional complexity to the model architecture and training process. Handling numerical and text features simultaneously requires careful preprocessing, normalization, and compatibility considerations. This complexity can make the model more challenging to develop and fine-tune.
  2. Increased Dimensionality: Combining numerical and text features often leads to higher-dimensional feature representations. This increase in dimensionality can result in more complex models, which may require more training data and computational resources to train effectively. It can also lead to overfitting if the dataset is limited or imbalanced.
  3. Preprocessing Challenges: Numerical and text features often require different preprocessing techniques. Numerical features may need scaling or normalization, while text features require tokenization, stemming, or stop-word removal. Coordinating and integrating these preprocessing steps for different feature types can be non-trivial.
  4. Data Sparsity: Textual data is inherently sparse, especially when represented with techniques like bag-of-words or TF-IDF. This sparsity can pose challenges in model training and may require techniques like word embeddings or attention mechanisms to capture the information in the text features effectively.
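The sparsity and dimensionality points can be checked directly on a bag-of-words matrix. The sketch below uses hypothetical toy reviews and also shows TruncatedSVD as one possible way to obtain a small dense representation; it is an alternative to the embedding-based techniques mentioned above, not something the approaches in this article depend on.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy reviews
reviews = [
    "great product highly recommended",
    "terrible product waste of money",
    "awesome experience would buy again",
    "poor quality not satisfied",
]

# Bag-of-words matrix: one column per vocabulary word
bow = CountVectorizer().fit_transform(reviews)
n_rows, n_cols = bow.shape

# Fraction of entries that are zero
sparsity = 1.0 - bow.nnz / (n_rows * n_cols)

# Project the sparse matrix onto 2 dense components
dense = TruncatedSVD(n_components=2, random_state=0).fit_transform(bow)
```

Even on four short reviews most entries are zero, and the effect grows quickly as the vocabulary gets larger.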

It’s essential to consider these advantages and disadvantages carefully when deciding whether and how to combine numerical and text features in deep neural networks. The specific characteristics of your data, the nature of the problem, and the available resources should be considered to make an informed decision.

Example of combining numerical and text features

Consider a practical example of combining numerical and text features in a deep neural network for sentiment analysis. Suppose we have a dataset of customer reviews for a product, where each review is associated with a numerical rating (1 to 5 stars) and a corresponding text review. We aim to predict the sentiment (positive, negative, or neutral) based on the numerical rating and the text review.

Here’s an example of how we can combine the numerical and text features:

Numerical Features:

  • Normalize the numerical ratings to a standard scale (e.g., 0 to 1).
  • Represent the normalized numerical ratings as a single numerical feature.
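As a small sketch of this step, min-max scaling with the known 1-to-5 star bounds maps each rating into the 0-to-1 range. The example ratings are illustrative.

```python
import numpy as np

# Example star ratings
ratings = np.array([4, 5, 2, 3, 1], dtype=float)

# Fixed bounds of the rating scale (assumed to be 1 to 5 stars)
rating_min, rating_max = 1.0, 5.0

# Min-max normalization to the range 0 to 1
ratings_normalized = (ratings - rating_min) / (rating_max - rating_min)
# values: 0.75, 1.0, 0.25, 0.5, 0.0

# Shape (n_samples, 1) so it can be used as a single numerical feature column
ratings_feature = ratings_normalized.reshape(-1, 1)
```

Using the fixed scale bounds rather than the observed minimum and maximum keeps the transformation identical for training and test data.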

Text Features:

  • Preprocess the text reviews by tokenizing, removing stop words, and applying stemming if necessary.
  • Use word embeddings (e.g., GloVe, Word2Vec) to represent the text reviews as dense numerical vectors.
  • The word embeddings can be averaged to create a fixed-length vector representation of each text review.
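The averaging step can be sketched as follows. The tiny three-dimensional `embeddings` table is a hypothetical stand-in for real pre-trained vectors such as GloVe or Word2Vec, which would normally be loaded from disk and have far more dimensions.

```python
import numpy as np

# Hypothetical embedding table standing in for pre-trained word vectors
embeddings = {
    "great": np.array([0.9, 0.1, 0.0]),
    "product": np.array([0.1, 0.5, 0.2]),
    "terrible": np.array([-0.8, 0.2, 0.1]),
}
embedding_dim = 3

def average_embedding(text):
    """Average the vectors of all known words; return zeros if none are known."""
    tokens = text.lower().split()
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return np.zeros(embedding_dim)
    return np.mean(vectors, axis=0)

# Every review maps to a vector of the same length, regardless of word count
review_vector = average_embedding("Great product")
# review_vector is [0.5, 0.3, 0.1]
```

Averaging discards word order, which is the trade-off for getting a fixed-length vector without a recurrent or pooling layer.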

Neural Network Architecture:

  • Design a neural network with two input branches: one for the numerical feature and one for the text feature.
  • The numerical input branch consists of a single input node.
  • The text input branch consists of an embedding layer followed by pooling or recurrent layers to capture the sequential information.
  • Merge the outputs of both branches using concatenation or other merging techniques.
  • Add additional layers (e.g., fully connected layers, dropout, etc.) for further processing and non-linearity.
  • Finally, include an output layer with an appropriate activation function (e.g., softmax for multi-class sentiment classification) to obtain the final sentiment prediction.

Combining the numerical rating feature with the text feature represented by word embeddings allows the model to capture the sentiment expressed in the text reviews while also considering the associated numerical ratings. The model can then draw on both the structured numerical information and the unstructured contextual information in the text, which can lead to more accurate sentiment predictions.

It’s worth noting that the specific architecture and preprocessing steps may vary depending on the dataset, problem, and available resources. Experimentation and fine-tuning are often necessary to find the best combination for a given task.

Python example

Here’s an example Python code snippet demonstrating how to combine numerical and text features using Keras as discussed in the example above:

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
# Tokenizer is a legacy utility; this import assumes a Keras 2.x-style installation
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Assuming you have a dataset with numerical ratings and corresponding text reviews
ratings = np.array([4, 5, 2, 3, 1])  # Example numerical ratings
reviews = np.array([
    "Great product, highly recommended!",
    "Awesome experience with this product.",
    "Average quality, not satisfied.",
    "Decent product, could be better.",
    "Terrible product, don't waste your money."
])  # Example text reviews
# Example sentiment labels as class indices (0 = negative, 1 = neutral, 2 = positive);
# sparse_categorical_crossentropy expects non-negative integer labels
sentiments = np.array([2, 2, 0, 1, 0])

# Split the dataset into train and test sets
reviews_train, reviews_test, ratings_train, ratings_test, sentiments_train, sentiments_test = train_test_split(
    reviews, ratings, sentiments, test_size=0.2, random_state=42
)

# Text preprocessing
max_words = 1000  # Maximum number of words to consider
max_sequence_length = 100  # Maximum length of each review
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(reviews_train)  # Build the vocabulary from the training reviews only
sequences_train = tokenizer.texts_to_sequences(reviews_train)
sequences_test = tokenizer.texts_to_sequences(reviews_test)
word_index = tokenizer.word_index

# Pad sequences to have the same length
X_train = pad_sequences(sequences_train, maxlen=max_sequence_length)
X_test = pad_sequences(sequences_test, maxlen=max_sequence_length)

# Numerical feature normalization
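ratings_min = 1.0  # Lowest possible rating on the 1-to-5 star scale
ratings_max = 5.0  # Highest possible rating; fixed bounds avoid fitting the scaling on test data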
ratings_train_normalized = (ratings_train - ratings_min) / (ratings_max - ratings_min)
ratings_test_normalized = (ratings_test - ratings_min) / (ratings_max - ratings_min)

# Define the neural network architecture
embedding_dim = 100  # Dimensionality of the word embeddings
lstm_units = 128  # Number of units in the LSTM layer

# Text input branch
text_input = Input(shape=(max_sequence_length,))
embedding_layer = Embedding(max_words, embedding_dim)(text_input)
lstm_layer = LSTM(lstm_units)(embedding_layer)

# Numerical input branch
numerical_input = Input(shape=(1,))
numerical_dense = Dense(32, activation='relu')(numerical_input)

# Merge the branches
merged = concatenate([lstm_layer, numerical_dense])
dense_layer = Dense(32, activation='relu')(merged)
output = Dense(3, activation='softmax')(dense_layer)  # 3 classes for sentiment prediction

# Create the model
model = Model(inputs=[text_input, numerical_input], outputs=output)

# Compile and train the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit([X_train, ratings_train_normalized], sentiments_train, epochs=10, batch_size=32, verbose=1)

# Evaluate the model
loss, accuracy = model.evaluate([X_test, ratings_test_normalized], sentiments_test, verbose=0)
print(f"Test loss: {loss:.4f}")
print(f"Test accuracy: {accuracy*100:.2f}%")

In this example, we use the Keras library with the TensorFlow backend to implement the neural network model. We define two branches: one for the text input and another for the numerical input.

These branches are merged, and additional layers are added for further processing. The model is compiled with a loss function and metric suited to multi-class sentiment classification. Finally, the model is trained and evaluated on the combined text reviews and numerical ratings.

Please note that this is a simplified example, and you may need to adapt the code based on your specific dataset and requirements.


Combining numerical and text features in deep neural networks provides several advantages, such as enhanced performance, comprehensive information, cross-domain learning, and contextual understanding. By leveraging the strengths of both numerical and text data, models can capture more subtle patterns and make more accurate predictions.

However, challenges are also associated with combining these features, including increased complexity, dimensionality, preprocessing requirements, and data sparsity. It’s crucial to carefully consider these factors and experiment with different approaches to find the best combination for your task.

Combining numerical and text features in deep neural networks enables more powerful and versatile models, particularly for sentiment analysis, text classification, and recommendation systems. It allows for a more holistic understanding of the data, incorporating structured information from numerical features and the rich context from text features, which can ultimately lead to improved performance and insights.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
