Win 15K When You Predict the Best March Madness Bracket Data Set

March Machine Learning Competition by Intel and Kaggle

Intel and Kaggle have teamed up to introduce the new March Machine Learning Mania competition. The winner of this NCAA tournament-based contest earns $15,000, and may that winner be a Sports Techie community blog reader. I had a terrific opportunity to learn about Kaggle’s competition-based platform and Intel’s Big Data push toward business applications that can predict and break down the right data sets for any kind of SMB at an affordable price. Kaggle, a competition-based platform for predictive modeling and analytics, partnered with Intel to create the March Machine Learning Mania competition, where some smart Sports Techie can win the $15K prize. Today is the last day to enter, so do it NOW!

We have assembled the basic elements necessary to get started with tournament prediction

Secret to Predicting March Madness Winners Hidden In Data Via iQ By Intel

Sports + Tech

By Robert Roble, iQ Contributor & Editor at SportsTechie

Big data, the process of crunching numbers and finding useful meaning, may be best known for driving business success, but increasingly it’s playing an essential role in sports for both fans and organizations, especially during March Madness.

Kaggle, a competition-based platform for predictive modeling and analytics, teamed up with Intel to create the March Machine Learning Mania competition where the winner takes home $15,000.

According to Kaggle data scientist Will Curkierski, winners of this contest will rely more on numbers than gut intuition.

“This is all about the numbers, not necessarily sports,” said Curkierski, who describes his coworkers as more chess scoring and modeling experts than sports analysts. That’s because data collection, sleuthing and whittling down is a critical first step to surface the best data for attacking a problem like picking the winning 2014 NCAA men’s basketball team.

“People are surprised to find how much data is useless to making decisions,” said Curkierski. “Much of the data is not fine-grained enough, it comes from many different places and can be messy, so our team has to scrub or clean the data first, which gives March Machine Learning Mania competitors essential data they can use to build their predictive models.”

He said the first phase of the competition was more like practice leading up to the NCAA tournament, but the data sets created by various teams underlined a trend early in the process.

“We saw interesting preliminary results,” he said. “Sure Louisville was ranked 4th or 5th, but almost everyone in our competition said that team should be ranked higher based on their data crunching. Soon, even the press began saying the same thing.”

He said this was borne out of the data, something that existed whether or not people had a gut feeling about Louisville’s official ranking.

Intel and Kaggle have teamed to introduce the March Machine Learning Mania competition

The data part is fun, allowing number lovers to go totally crazy looking at information about where basketball players typically shoot from on the court or testing dead-simple models, like choosing the top seed in every game, to see how they turn out. Every play, score and game adds more fodder for data scientists, which makes Curkierski wonder where to draw the line between number crunching and experts such as sports analysts and commentators.
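To make the "dead simple" idea concrete, here is a minimal sketch of a top-seed baseline. The function names and the four sample games are hypothetical and made up purely for illustration; they are not taken from the competition data.

```python
def pick_top_seed(seed_a: int, seed_b: int) -> bool:
    """Predict that team A wins when it holds the better (lower-numbered) seed.

    Ties in seed number are broken in favor of team A, an arbitrary choice.
    """
    return seed_a <= seed_b


def baseline_accuracy(games) -> float:
    """Fraction of games where the top-seed pick matched the actual result.

    Each game is a tuple: (seed_a, seed_b, a_won).
    """
    correct = sum(pick_top_seed(a, b) == a_won for a, b, a_won in games)
    return correct / len(games)


# Invented sample results, for illustration only.
sample_games = [
    (1, 16, True),   # favorite wins
    (5, 12, False),  # a 12-over-5 upset
    (2, 15, True),   # favorite wins
    (8, 9, False),   # near coin-flip game goes to the 9 seed
]

print(baseline_accuracy(sample_games))
```

A baseline like this gives any fancier model a number to beat: if a model cannot outperform "always pick the better seed" on past tournaments, the extra complexity is probably not earning its keep.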

“Offense is a driving factor in sports, but numbers might show something very different,” he said. “It may show more clearly the critical role of defense. On average, number crunching is far more important than domain experts.”

“One of the greatest challenges we see is the lack of creativity in using data,” said Boyd Davis, vice president of Data Center Group at Intel. “Our goal with the competition is to drive more innovative uses for data that then may be applied to other industries such as healthcare or retail.”

More About March Machine Learning Mania

From fantasy football to college basketball and odds making, data is bringing an unprecedented level of insight to the game, which translates to better players, better games, and a richer (and way more fun) fan experience.

March Madness has become big data because of sports technology. A number of key factors usually come into play when predicting winners: a super-fast computer and possibly a second or third screen to reference, hard-earned instincts, being a loyal alum, season rankings, tournament seeds, player statistics, teams on hot streaks, experts and how close a team’s tournament region is to its actual campus.

Kaggle Cross Validation Pool

Teams looking to gain the ultimate competitive advantage are now turning to data scientists for insight. Team administrators, fantasy basketball fans and sponsors who want to back the hottest teams are all welcoming the fast-emerging trend of predictive analytics.

Kaggle is helping manage the big data part of the competition, providing participants with historical data on the past two decades of college men’s basketball. Feel free to use supplementary external data. Acceptable data might range from tournament seeds to individual player stats, and from geographical factors to social media insights. The team that develops the closest prediction wins the entire cash prize once the tournament has finished.

How the Competition Works

In this first NCAA tournament predictive challenge developed for both casual and diehard sports fans, teams will go through two stages: first build and test the model against the last five tournaments, then predict the outcome by the beginning of the 2014 tournament on March 18-19. No changes are permitted once the tournament begins.

You don’t need to participate in the first stage to enter the second, but the first stage exists to incentivize model building and provide a means to score predictions. The real competition is forecasting the 2014 game results, for which you’ll predict a win probability for each possible matchup, not just a traditional bracket.
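Predicting every possible matchup is a bigger job than filling in a bracket. The sketch below enumerates every pairing in a 68-team field and attaches a placeholder probability to each. The team ids (1 through 68), the function names and the flat 0.5 forecast are assumptions for illustration; the actual submission file format is not described here.

```python
from itertools import combinations


def all_matchups(team_ids):
    """Return every unordered pair of teams, lower id first."""
    return list(combinations(sorted(team_ids), 2))


def constant_forecast(team_ids, p=0.5):
    """Map each pair (low_id, high_id) to the probability that low_id wins.

    A flat p=0.5 is the "know nothing" forecast; a real model would
    replace it with a per-matchup estimate.
    """
    return {pair: p for pair in all_matchups(team_ids)}


teams = range(1, 69)  # 68 hypothetical team ids
forecast = constant_forecast(teams)

print(len(forecast))     # 68 * 67 / 2 pairings
print(forecast[(1, 2)])  # probability that team 1 beats team 2
```

With 68 teams there are 2,278 possible pairings, far more than the games actually played, which is why a model has to produce sensible probabilities even for matchups that never happen.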

The reason you make predictions for 68 teams and not 64 is the “play-in” round, known as the first round. The first-round loser-out games will not be scored. Look for an updated solution file as the brackets unfold, which will cause the ranks on the leaderboard to change.

The competitive edge of big data analytics can give players the insight they need to make more accurate predictions. Demonstrating the power of analytics for everyone from fans to sports organizations, this online competition will give data scientists, mathematicians and sports fans the chance to show off their data iQ smarts while they compete to bring home the big award.

Will Games be Weighted?

Intel made a decision to not weight later games in order to keep scoring simple while counting all games equally. Any weights Intel adds in would be mostly arbitrary (how many first-round games is a championship game worth or does travel distance matter?). Also, weighting any game increases the role that blind luck plays in determining the winner. Perhaps the smartest decision by Intel was to structure this competition so that people can still be in the running even if there are the usual early-round upsets.
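Counting all games equally can be sketched as a plain average of a per-game penalty. The scoring formula is not spelled out above, so the example below assumes logarithmic loss, a common way to score probability forecasts; treat the metric and the function name as assumptions, not the contest's stated rule.

```python
import math


def log_loss(predictions, outcomes, eps=1e-15):
    """Average log loss over games, every game weighted the same.

    predictions: probability that team A wins each game.
    outcomes:    True if team A actually won.
    Lower is better; probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for p, a_won in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += math.log(p) if a_won else math.log(1 - p)
    return -total / len(predictions)


# A coin-flip forecast scores ln(2), about 0.693, no matter who wins.
print(round(log_loss([0.5, 0.5], [True, False]), 3))

# A confident miss costs far more than a coin flip on the same game.
print(log_loss([0.99], [False]) > log_loss([0.5], [False]))
```

Because each game contributes one equally weighted term, a single early-round upset hurts no more and no less than a missed championship pick, which is the property the paragraph above describes.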

Intel is making its big data technologies more affordable, available, and easier to use for everything from helping develop new scientific discoveries and business models to gaining the upper hand on good-natured predictions of sporting events. How well can machine learning and statistical techniques improve the forecast? Presented by Intel, this competition will test how well predictions based on data stack up against a (jump) shot in the dark.

Data Science Apps

A growing number of websites and applications are crunching data to help users take advantage of data analytics. One example is BracketOdds. Created by a computer science professor at the University of Illinois, it calculates the probability of a combination of seeds advancing in the tournament.

But it’s also about putting the power of data directly in the hands of those who depend on it. Making big data analytics technology and science more affordable, available and easier to use will better unlock game intelligence, providing sports organizations with the competitive advantage they need, as well as enriching the sports experience – from technologically advanced stadiums to better fantasy data – for fans.

As business continues to accept data-driven thinking, it’s important to remember analytics can enable far more than the enterprise. If basic stats can give the upper hand in making a bet, imagine the capabilities of a finely tuned algorithm churning through volumes of structured data.

$1 Billion Prize

If you enter the Warren Buffett Quicken Loans challenge to win the unprecedented $1 billion prize, statistics indicate the odds of creating a perfect entry are astronomical: about 1 in 9.2 quintillion if every game is a coin flip, so why bother? The smartest college hoops fan is 50 million times more likely to win a Mega Millions jackpot than to predict each winner in the 63-game tournament format.

While the odds of creating a perfect bracket are long, they improve with the growing amount of data collected throughout the season.
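The coin-flip arithmetic behind perfect-bracket odds is easy to check. Treating each of the 63 games as an independent 50/50 guess gives 2^63 possible brackets, about 9.2 quintillion; counting 64 flips instead would double that to roughly 18.4 quintillion. The independence and 50/50 assumptions are simplifications, since real games are not coin flips.

```python
GAMES = 63  # games in the 64-team bracket format

# Each game has two possible winners, so the bracket count doubles per game.
brackets = 2 ** GAMES

print(f"{brackets:,}")                        # 9,223,372,036,854,775,808
print(f"{brackets / 1e18:.1f} quintillion")   # 9.2 quintillion
```

Any information that moves a game away from 50/50, such as seeds or season statistics, shrinks the effective odds, though they stay enormous.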

Betting on Big Data

A general manager is constantly searching for the best basketball algorithm that will help identify the perfect player, the perfect play, even the perfect win. Accurate predictions regarding the outcome of each game are what drive the “juice” and profits in Las Vegas for casinos, bookies and gamblers.

Can the results of a Big Data algorithm like this be used for sports betting? Maybe, says Kaggle, but that’s not a goal of this competition. Kaggle claims no rights to the intellectual property developed by competitors. Have fun and learn. If you want to use the results to gamble, that’s between you, your bookie and your local laws.

Let the Games Begin

In this unique March Machine Learning Mania contest, consider picking a few 12th seed upset wins over the 5th seeds early on. That way the bracket becomes in essence a living, breathing software data set that can be unlocked through tech to gain unmatched insight, in turn allowing for more accurate predictions. If you dream in Hadoop, data center upgrades and about your data dream team, enter now.

May the most innovative team with the best data crunchers win!

Robert Roble founded Sports Techie, a sports technology community, blog and expert resource in 2010 after a once-in-a-lifetime role as Wetpaint’s Moderator. He moderated the New York Giants, Houston Rockets and HBO Entourage historical wiki and online communities, in addition to writing blogs for DWTS and MSN. Bob is a pioneer in sports tech, an untapped market valued at $200 billion. His career in sports and tech spans four decades, where he’s worked for Paul Allen and the Seattle Seahawks, Magic and Dartfish. The Sports Techie social media network is global and passionate about green, robots and animals. Engage with his blog, friend @SportsTechieNET on Twitter and Like the Facebook fan page; also follow on Google+, YouTube and LinkedIn. He is happy for the opportunity to focus his iQ by Intel eye on innovative sports technology related content, trends and products that involve Intel’s tech, people and happy customers.

Sports Techie, as of this writing, only 253 teams have entered to win the 15K prize by Intel, so get your entry in right now at:

March Machine Learning Mania

The innovative software designed by Kaggle and Intel rewards data hounds who can creatively predict outcomes using data sets other fans or bettors may not have thought of yet. Gathering structured data makes for the beginnings of a good March Madness bracket and business planning. Crunching data should give you a competitive edge in bracketology and business development opportunities. Thanks to Kaggle and Intel, analytical technologies have come down in price for the small- to medium-sized sports tech business owner who is searching for a way to step up to Final Four levels.

Secret to Predicting March Madness Winners Hidden In Data Via iQ by Intel.

I will see ya when I see ya, THE Sports Techie @THESportsTechie –
