Speeding Up Text Splitting: Julia vs. Python

Alex Tantos


In many fields, speed is crucial when working with large datasets or processing data in real time. As someone who frequently handles text and language data, I often weigh different programming languages against each other to find the most efficient tool for such tasks. One common operation in data preprocessing is text splitting. In this article, I'll share an experiment demonstrating how Julia's speed outshines Python's when it comes to splitting text.

The First Experiment: Splitting a Short Text

Text splitting comes up constantly when preparing data for text analysis, natural language processing (NLP), or when cleaning and tokenizing text for machine learning models. To compare Julia and Python, I benchmarked the time each language takes to split strings of text.

The text used for this first experiment is the well-known lorem ipsum placeholder text:

"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
