
The BPM Effect: Success Among Spotify’s Most-Streamed Songs
Visit my GitHub for project details, code, queries, and documentation.
Business Problem and Objectives:
The goal of this project is to explore the musical attributes that influence the streaming success of songs on Spotify.
Through detailed data analysis, we investigate patterns in song characteristics, such as tempo (BPM), energy, danceability, valence, liveness, and instrumentalness, to uncover what drives high stream counts.
This analysis provides insights that can help artists, producers, and marketers optimize their song features for better engagement. Further research may explore factors like playlist placements and cross-platform performance, adding depth to the understanding of a song’s success.
Summary of Analysis:
This report explores the relationships between musical attributes, such as BPM, energy, and danceability, and the streaming success of Spotify's most-streamed songs. Using quartile analysis, scatterplots, and z-scores to identify outliers, we reveal notable patterns and trends. The Insights section below examines these attributes in detail, uncovering correlations that suggest which characteristics contribute most to high stream counts. The analysis also highlights outlier tracks that deviate from the norm, offering clues to what might make certain songs go viral or stand out.
Using data from Spotify's most-streamed songs, I've created visualizations and performed statistical analyses to identify key trends and outliers. The analysis focuses on BPM ranges, energy levels, and danceability (energy and danceability are measured on a scale of 0 to 100) to uncover how these attributes correlate with streaming success. Each section highlights key patterns, as well as outliers that deviate from these trends, offering insights into what might make certain tracks stand out and perform better than expected.
Insights:
1. BPM's Influence on Streaming Success: Faster Tempos Dominate
Key Finding: The overall correlation between BPM and streaming success is essentially zero (-0.0024), so BPM alone is not a useful predictor of success. Even so, songs with faster tempos (160+ BPM) outperform every other BPM range in average streams, reaching 643 million on average. This suggests that fast-paced songs may perform better in combination with other factors, such as genre or energy level, which could explain their higher average streams.
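A minimal Python sketch of the two calculations behind this finding follows: a Pearson correlation between BPM and streams, and average streams per BPM range. The original analysis was done in SQL and Mode Analytics, so this is an illustrative re-creation, not the project's code. This report names only the 160+ BPM range, so the other range edges in the hypothetical `BPM_RANGES` table are assumptions.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical BPM ranges; only "160+" comes from the report.
BPM_RANGES = [
    (0, 80, "<80"), (80, 100, "80-99"), (100, 120, "100-119"),
    (120, 140, "120-139"), (140, 160, "140-159"), (160, float("inf"), "160+"),
]


def bpm_range(bpm):
    """Return the label of the range containing this BPM."""
    for low, high, label in BPM_RANGES:
        if low <= bpm < high:
            return label
    raise ValueError(f"BPM out of range: {bpm}")


def average_streams_by_range(songs):
    """songs: iterable of (bpm, streams) pairs -> {range label: mean streams}."""
    totals = defaultdict(lambda: [0, 0])  # label -> [sum of streams, song count]
    for bpm, streams in songs:
        bucket = totals[bpm_range(bpm)]
        bucket[0] += streams
        bucket[1] += 1
    return {label: total / count for label, (total, count) in totals.items()}
```

A near-zero correlation and a high 160+ average can coexist. The correlation measures a linear trend across all songs, whereas a range average can be driven by a handful of very popular tracks in that range.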
Chart Insight: Songs in the 160+ BPM range dominate in average streams, as seen in the chart “Average Streams by BPM Range.” Fast-paced tracks tend to attract more streams, although, given the near-zero overall correlation, this is likely influenced by factors beyond BPM.
Strategic Consideration: Artists and producers targeting virality or broad appeal may benefit from experimenting with high-tempo songs, particularly in genres like pop or electronic music, where high energy levels are often associated with success.
2. Marginal Effects of Energy, Danceability, and BPM Across Stream Quartiles:
Key Finding: Energy and danceability, both measured on a scale from 0 to 100, show weak negative correlations with streaming success (energy at -0.02605 and danceability at -0.1055). Tempo also varies only slightly across stream quartiles. Quartiles 2 and 3 have the highest average BPM, at 124 each, while the top quartile (Quartile 1), representing the most-streamed songs, has a slightly lower average of 121.9 BPM.
This suggests that while faster BPMs can be linked to higher streams, the relationship is not straightforward, and the top-streamed songs do not necessarily have the highest BPM. Energy and danceability also differ only marginally across the quartiles, further suggesting that these attributes alone do not significantly affect streaming success.
Chart Insight: The visualization “Energy, Danceability, and BPM Across Stream Percentiles” shows that Quartiles 2 and 3 have the highest BPM averages (124 BPM). Quartile 1, despite having the most streams, does not differ much in tempo. Energy and danceability percentages show only marginal increases, indicating that factors beyond these metrics likely drive the success of top songs.
Strategic Consideration: Because average BPM differs only marginally across quartiles, tempo alone does not appear to be a key driver of streaming success. The minimal variation in energy and danceability suggests that the dataset's inherent bias may mask the effect of these metrics, since it contains only songs that are already highly streamed. This points to external factors like playlist placements. To maximize streaming success, especially in pop and electronic genres, artists and producers should focus not only on higher BPM but also on strategic playlist placements and robust marketing efforts.
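The quartile comparison can be sketched in Python as follows. The sketch assumes quartiles were formed by ranking songs on streams, with Quartile 1 holding the most-streamed songs (matching the report's labeling). It splits rows the way SQL's NTILE(4) does, giving any leftover rows to the earliest quartiles. The default names in `features` are placeholders for the dataset's actual columns.

```python
def assign_quartiles(songs):
    """Split songs into 4 groups by streams, descending, like NTILE(4) OVER (ORDER BY streams DESC)."""
    ranked = sorted(songs, key=lambda s: s["streams"], reverse=True)
    base, extra = divmod(len(ranked), 4)
    quartiles, start = {}, 0
    for q in range(1, 5):
        # NTILE gives one extra row to each of the first `extra` groups.
        size = base + (1 if q <= extra else 0)
        quartiles[q] = ranked[start:start + size]
        start += size
    return quartiles


def quartile_means(songs, features=("bpm", "energy", "danceability")):
    """Average of each feature within each stream quartile (empty quartiles are skipped)."""
    means = {}
    for q, members in assign_quartiles(songs).items():
        if members:
            means[q] = {f: sum(s[f] for s in members) / len(members) for f in features}
    return means
```

The valence, liveness, and instrumentalness comparison that follows can reuse the same function by passing those column names as `features`.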
3. Emotional Nuance: The Role of Valence, Liveness, and Instrumentalness in Streaming Trends:
Key Finding: Valence (positivity) peaks in the third quartile at 53.80, while the top quartile shows a lower average of 48.80. This shift could imply that top-streamed tracks rely less on highly positive emotional content, possibly reflecting a preference for more neutral or melancholic tones. Liveness and instrumentalness show little variation across quartiles, reinforcing their limited influence on streaming success.
Chart Insight: In the “Valence, Liveness, and Instrumentalness Across Stream Quartiles” chart, the drop in valence for top-quartile songs suggests that the most-streamed tracks lean toward a less explicitly positive tone. Because valence measures the musical positiveness a track conveys, this could reflect a trend toward more neutral or complex emotional content in widely popular songs. Instrumentalness and liveness, by contrast, remain low and relatively stable, indicating that live or instrumental elements alone aren't strong predictors of popularity.
Strategic Consideration: The slight decline in valence within the top quartile suggests that positivity may help mid-tier tracks gain traction. The most popular songs, however, appear to balance emotional tone, possibly shifting toward subtler, more nuanced themes. Artists and producers should explore a range of emotional expressions rather than focusing solely on high-valence tracks. Given the minimal impact of liveness and instrumentalness, these elements are best treated as stylistic choices rather than as levers for increasing streams.
4. The Impact of BPM Versatility on Streaming Success:
Key Finding: This analysis points to a potential sweet spot for artists experimenting with different BPM ranges. The highest average streams, 825M, were seen among artists whose songs spanned three BPM ranges. Artists spanning six ranges also performed well, at 655M. Artists spanning four ranges had the lowest average, at 395M. Because streams do not rise steadily with the number of ranges, this suggests that some BPM variety can enhance audience appeal, but only up to a point.
Chart Insight: The “BPM Bucket Count vs. Average Streams” chart shows that artists working within three distinct BPM buckets outperform the other groups in streaming success. Here, “BPM bucket count” is the number of distinct BPM ranges an artist's tracks cover. The data suggests that offering variety without overextending into too many BPM categories allows artists to engage diverse listener groups, potentially catering to both slower and faster tempo preferences.
Strategic Consideration: Artists aiming for higher streams could benefit from BPM diversity. Artists covering a wide range (six BPM buckets) also attract strong streams, but the most successful group shows only moderate experimentation, with the sweet spot at three BPM ranges. Balancing variety against overextension could boost audience appeal.
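Below is a sketch of how the `bpm_bucket_count` field described in the methodology, and the per-group averages above, could be computed in Python rather than the project's SQL. The report does not state the bucket edges, so the hypothetical `bpm_bucket` helper uses fixed 20-BPM-wide ranges. "Average streams" is also assumed to mean the mean per-song streams across all songs by artists in each group.

```python
from collections import defaultdict


def bpm_bucket(bpm, width=20):
    """Hypothetical bucketing: fixed-width BPM ranges (the report's exact edges are not stated)."""
    return int(bpm // width)


def bpm_bucket_counts(songs):
    """songs: list of dicts with 'artist' and 'bpm' -> {artist: number of distinct BPM buckets}."""
    buckets = defaultdict(set)
    for s in songs:
        buckets[s["artist"]].add(bpm_bucket(s["bpm"]))
    return {artist: len(b) for artist, b in buckets.items()}


def avg_streams_by_bucket_count(songs):
    """Mean per-song streams, grouped by each song's artist bucket count."""
    counts = bpm_bucket_counts(songs)
    sums, n = defaultdict(float), defaultdict(int)
    for s in songs:
        c = counts[s["artist"]]
        sums[c] += s["streams"]
        n[c] += 1
    return {c: sums[c] / n[c] for c in sums}
```

Reporting the number of artists behind each group alongside its average would show how much weight a figure like 825M carries. A group containing only a few artists can be swung by a single hit.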
5. Outliers and Streaming Success: A Z-Score Analysis of Deviant Tracks
Key Finding: BPM deviations exist among popular songs, but they don't appear to heavily influence streaming success. External factors such as artist popularity, emotional resonance, and media promotion likely play a larger role. Established artists, in particular, appear able to experiment with tempo while maintaining high streams.
1. Streams vs. Tempo (BPM): Most songs fall within the 60 to 160 BPM range, and there is no strong correlation between BPM and streaming success. Outliers in terms of BPM do exist, but these tracks don't have proportionally higher or lower streams, suggesting BPM alone isn't a significant factor in their success.
2. BPM vs. Streams Z-Score Comparison for Outliers: The outliers' BPM values deviate significantly (e.g., “Worldwide Steppers” by Kendrick Lamar has a BPM z-score of -1.801). Their streams, however, remain relatively stable, with only minor deviations from the mean. This suggests that BPM alone does not heavily affect streaming success for these tracks.
Artists like Kendrick Lamar, with a BPM bucket count of 6, have demonstrated an ability to maintain high streams across a wide range of tempos. This suggests that other factors, such as the artist's established influence, lyrical content, media promotion, or emotional appeal, play a much larger role in driving streams than tempo alone. Furthermore, presence across multiple BPM buckets may help an artist reach a broader and more diverse audience, making them less dependent on specific BPM ranges for streaming success.
3. Outlier Songs: BPM and Stream Z-Scores Analysis Table:
Established artists like Kendrick Lamar, Billie Eilish, and The Weeknd show high deviations in BPM, but their streams remain consistent. This reinforces the idea that their experimentation with tempo doesn't hurt streaming performance, as these artists already have a strong following and benefit from media attention.
Strategic Consideration:
The presence of well-established artists among the outliers demonstrates that BPM flexibility doesn't detract from streaming success when coupled with factors like reputation, emotional depth, and media promotion. For example, Billie Eilish's “What Was I Made For?” at 78 BPM taps into emotional appeal, possibly resonating with listeners on a deeper level. Similarly, the wide range of BPMs across Kendrick Lamar's songs reflects his artistic experimentation, which is well received thanks to his established fanbase and lyrical complexity.
Additionally, tracks like Ed Sheeran's “Curtains” (176 BPM) and Feid's “POLARIS - Remix” (170 BPM) illustrate how media promotion or playlist placement may help push songs into higher streaming brackets regardless of their tempo. This suggests that BPM variations are more of a creative choice than a strategy for driving streams.
Lastly, because the dataset focuses on the most-streamed songs, its inherent bias underscores that streaming success isn't driven solely by song attributes like BPM. Instead, it points to the power of promotion, fan engagement, and external factors in determining the success of outlier tracks. Artists and producers should prioritize factors like emotional resonance, strategic releases, and media push when aiming for higher streams.
Conclusion:
This analysis provided an in-depth look at the factors influencing the streaming success of Spotify’s most-streamed songs. Through a detailed exploration of metrics such as BPM, energy, danceability, and valence, it became evident that there is no one-size-fits-all formula for streaming success.
Although the correlation between BPM and streams was minimal, experimental artists (those with a presence in multiple BPM ranges) often demonstrated higher stream counts, with the strongest results at a moderate three ranges. This suggests that, by experimenting with different musical styles and tempos, these artists can reach a broader audience and maintain relevance across genres. Their versatility allows them to connect with diverse listener preferences, which may contribute to their overall streaming success.
Additionally, metrics like valence (positivity) showed a stronger correlation with success than energy or danceability, underscoring the importance of emotional appeal. However, the analysis also acknowledged the dataset’s inherent bias toward popular tracks, emphasizing that external factors like marketing, playlist placements, and cross-platform promotion heavily influence a song’s reach.
In conclusion, this project reinforces the idea that while musical attributes provide valuable insight into a song’s potential, the broader context of an artist’s brand, promotion strategies, and emotional appeal to audiences are crucial in achieving streaming success. Experimental artists, in particular, benefit from a wider audience reach through their versatility. These findings offer practical guidance to artists and industry professionals, encouraging a balanced approach that considers both musical composition and broader promotional strategies.
Resources
Dataset:
Spotify Most-Streamed Songs Dataset, sourced from Kaggle:
https://www.kaggle.com/datasets/abdulszz/spotify-most-streamed-songs
Tools & Platforms:
Google Cloud SQL: Used for data storage, cleaning, and query optimization.
Mode Analytics: Utilized for SQL-based analysis, data visualization, and exploratory data analysis (EDA).
Extra Documentation: Methodology
Data Cleaning & Preparation
The dataset used in this analysis provides detailed information about Spotify’s most-streamed songs, including attributes such as BPM, energy, and danceability. The focus was on identifying correlations between musical attributes and streaming success, with additional exploration of outliers using z-scores to uncover deeper insights into artists’ performance.
Steps for Data Cleaning:
Data Acquisition & Importation:
The dataset was sourced from Kaggle and imported into Google Cloud SQL for storage and analysis. Proper importation involved restructuring columns for SQL compatibility and creating indices to optimize query performance for large datasets.
Data Formatting:
The initial dataset contained inconsistencies in numeric formats and percentage values (e.g., commas in numbers and percentages stored as strings). This was resolved by writing SQL queries to standardize the data:
Numeric Formatting: Removed commas from values (e.g., 1,021) and converted percentage metrics like energy% to decimals (e.g., 0.87), making them ready for calculations and comparisons.
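The cleanup itself was written in SQL. The Python functions below are a hypothetical equivalent of the two transformations described, for readers who want to reproduce them outside a database. `parse_number` and `percent_to_decimal` are illustrative names, not part of the project.

```python
def parse_number(raw):
    """Remove thousands separators, e.g. '1,021' -> 1021."""
    text = str(raw).strip().replace(",", "")
    # Parse as int when there is no decimal point so counts stay integers.
    return float(text) if "." in text else int(text)


def percent_to_decimal(raw):
    """Convert a percentage such as '87%' or '87' to a decimal such as 0.87."""
    text = str(raw).strip().rstrip("%").strip()
    return float(text) / 100
```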
Table Creation and Structure:
A custom table was created in Google Cloud SQL to efficiently handle the dataset. Key fields included track_name, artist_name, bpm, energy_percent, danceability_percent, streams, valence_percent, liveness_percent, instrumentalness_percent, and more. Appropriate data types were assigned to each field (e.g., FLOAT, VARCHAR, INT) to ensure efficient queries, aggregations, and filtering.
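The sketch below shows such a table, using Python's built-in sqlite3 module as a stand-in for Google Cloud SQL so it runs anywhere. The column names follow the list above. The table name `songs`, the exact types and lengths, and the two indexes are assumptions. One design point: streams is declared BIGINT rather than INT. In MySQL and PostgreSQL a signed INT tops out at 2,147,483,647, and the most-streamed songs can exceed that.

```python
import sqlite3

# Hypothetical schema: column names from the report; types, lengths, and indexes assumed.
SCHEMA = """
CREATE TABLE songs (
    track_name               VARCHAR(255) NOT NULL,
    artist_name              VARCHAR(255) NOT NULL,
    bpm                      INT,
    energy_percent           FLOAT,
    danceability_percent     FLOAT,
    valence_percent          FLOAT,
    liveness_percent         FLOAT,
    instrumentalness_percent FLOAT,
    streams                  BIGINT  -- top tracks can exceed the 32-bit INT range
);
-- Indexes on columns that are frequently filtered, grouped, or sorted.
CREATE INDEX idx_songs_artist ON songs (artist_name);
CREATE INDEX idx_songs_streams ON songs (streams);
"""


def create_songs_table(conn):
    """Create the songs table and its indexes on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_songs_table(conn)
conn.execute(
    "INSERT INTO songs VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Example Track", "Example Artist", 120, 0.80, 0.70, 0.50, 0.10, 0.0, 3_000_000_000),
)
```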
Handling Missing or Erroneous Data:
The dataset contained some missing or erroneous values, particularly in musical attributes and stream counts:
Missing Values: If important attributes (e.g., streams or BPM) were missing, the records were either removed or filled in by mean imputation, replacing the missing value with the column's mean.
Erroneous Data: For data showing clear inconsistencies (e.g., streams listed as “0” for highly streamed songs), these entries were cross-referenced with other datasets and corrected where possible or removed if unverifiable.
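Here is a minimal Python sketch of these rules, assuming rows are dictionaries. The report says records were either removed or mean-imputed but does not say which columns received which treatment. The split used here is therefore an assumption: rows missing `streams` are dropped, and missing `bpm` values are imputed with the mean. Zero-stream rows are only flagged, because cross-referencing them against other datasets is a manual step.

```python
def clean_rows(rows, required=("streams",), impute=("bpm",)):
    """Return (kept_rows, zero_stream_rows).

    Rows missing any `required` field are dropped; missing values in `impute`
    columns are replaced with that column's mean over the kept rows.
    """
    # Copy rows so the caller's data is not modified.
    kept = [dict(r) for r in rows if all(r.get(f) is not None for f in required)]
    for field in impute:
        present = [r[field] for r in kept if r.get(field) is not None]
        if not present:
            continue  # nothing to average; leave the gaps
        mean = sum(present) / len(present)
        for r in kept:
            if r.get(field) is None:
                r[field] = mean
    # Zero streams on a most-streamed list are suspicious: flag for manual checks.
    zero_stream_rows = [r for r in kept if r.get("streams") == 0]
    return kept, zero_stream_rows
```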
Exploratory Data Analysis (EDA) and Validation:
Once cleaned, preliminary SQL queries were run to explore key distributions within the dataset:
Summary Statistics: Calculations such as mean, median, and standard deviation for metrics like BPM, streams, and energy helped validate data consistency.
Visual Exploration in Mode Analytics: Scatterplots, histograms, and initial visualizations provided early insight into the relationships between musical features and streaming success, while helping to detect potential anomalies.
Iterative Refinement Using Z-Scores:
Z-score Calculations: Z-scores were applied to detect outliers in both streams and BPM, revealing deviations that were crucial to our analysis:
Outliers were flagged using an absolute z-score threshold of 1.5, which guided further refinement, especially for tracks with abnormal BPM and stream values.
Outliers were examined separately to explore how deviations from the norm impacted streaming success, uncovering insights such as how certain experimental artists achieved significant success despite tempo variations.
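The z-score step can be sketched in Python as below. Each value is standardized as z = (x - mean) / standard deviation, and a track is flagged when its BPM z-score reaches the report's threshold of 1.5 in absolute value. Its streams z-score is reported alongside for comparison, mirroring the outlier table. Two choices here are assumptions: using the population standard deviation, and counting a score of exactly 1.5 as an outlier.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: z = (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # no spread, so no value deviates
    return [(v - mu) / sigma for v in values]


def bpm_outliers_with_stream_z(songs, threshold=1.5):
    """Flag tracks with |BPM z| >= threshold and report their streams z-score too."""
    bpm_z = z_scores([s["bpm"] for s in songs])
    streams_z = z_scores([s["streams"] for s in songs])
    return [
        {"track": s["track"], "bpm_z": round(bz, 3), "streams_z": round(sz, 3)}
        for s, bz, sz in zip(songs, bpm_z, streams_z)
        if abs(bz) >= threshold
    ]
```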
BPM Bucket Count Analysis:
One key aspect of the analysis was grouping artists by their presence across BPM ranges ("buckets") to understand if artists experimenting with different tempos performed better. The "bpm_bucket_count" field was created to track how many distinct BPM ranges each artist had songs in, providing valuable insight into the versatility and broader appeal of experimental artists.
Outlier Investigation & Trend Analysis:
Outliers were explored in detail by comparing the z-scores of both streams and BPM. Songs from established artists like Kendrick Lamar, The Weeknd, and Billie Eilish, though deviating in BPM, often had more stable stream z-scores, suggesting their established presence and versatility across different BPM ranges helped drive success. This insight was supported by artists' presence in multiple BPM buckets, highlighting that those with broader reach in terms of tempo often captured larger audiences.
Iterative Data Validation:
Throughout the analysis, the dataset was continuously revisited and refined. As new correlations and trends emerged (e.g., between BPM range versatility and streaming success), additional data checks were performed to ensure consistency and avoid bias. This iterative approach ensured that the final analysis was accurate, insightful, and reliable.