Hello,
I'm working on a sentiment analysis project using a large movie reviews dataset I obtained from here, and I keep running into an out-of-memory error when I execute my code. Here's the relevant portion of my code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
# Load the large movie reviews dataset
data = pd.read_csv('large_movie_reviews.csv')
# Preprocess the data
# ... (code for data preprocessing)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data['review'], data['sentiment'], test_size=0.2, random_state=42)
# Vectorize the text data using TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)
# Train the Support Vector Machine (SVM) model
model = SVC(kernel='linear')
model.fit(X_train_vectorized, y_train)
# Evaluate the model
accuracy = model.score(X_test_vectorized, y_test)
print(f"Accuracy: {accuracy}")
Unfortunately, because the dataset is so large, this code fails with an out-of-memory error. I believe TfidfVectorizer builds a large sparse matrix that consumes a substantial amount of memory. I'm looking for memory-efficient alternatives or strategies that would let me work with a dataset this size while still training an accurate sentiment analysis model.
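One direction I've been considering is streaming the CSV in chunks and training incrementally, roughly like the sketch below. I'm assuming scikit-learn's HashingVectorizer and SGDClassifier with partial_fit, and that my 'sentiment' column holds 'positive'/'negative' labels; I haven't verified that this actually resolves the memory problem for my data:
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Stateless hashing vectorizer: no in-memory vocabulary, so each chunk
# can be transformed independently into a fixed-size feature space.
vectorizer = HashingVectorizer(n_features=2**20, alternate_sign=False)

# Linear model trained incrementally; hinge loss behaves similarly to a
# linear-kernel SVM without holding the whole training set in memory.
model = SGDClassifier(loss='hinge')

# Assumed label values in the 'sentiment' column -- adjust to the real ones.
classes = ['negative', 'positive']

# Stream the CSV in chunks instead of loading the whole file at once.
for chunk in pd.read_csv('large_movie_reviews.csv', chunksize=10_000):
    X = vectorizer.transform(chunk['review'])
    y = chunk['sentiment']
    model.partial_fit(X, y, classes=classes)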
Could you suggest any memory-efficient approaches or alternatives that would let me avoid the out-of-memory error and keep working with this large movie reviews dataset?
Thank you so much!
Hi @sachinbhatt ,
The forum where you posted your message is a ThingWorx-specific forum.
Is there a specific question related to ThingWorx? If not, Stack Overflow has a far better chance of giving you the advice you need.