A New Generation Of Data Analytics Tools For the Cloud

Cloud Data Analytics

Business Intelligence: Blog Post

Data Analysis Is Changing the Face of Political Campaigning

2016 election campaigners look to Big Data analysis to gain an edge in intelligently reaching voters

The next BriefingsDirect Voice of the Customer digital transformation case study explores how data-analysis services startup BlueLabs in Washington, DC helps presidential election campaigns better know and engage with potential voters.

We'll learn how BlueLabs relies on high-performing analytics platforms that allow a democratization of querying, opening the value of vast data resources to more of those who need to know.

Here to describe how big data is being used creatively by contemporary political organizations for two-way voter engagement, we're joined by Erek Dyskant, Co-Founder and Vice President of Impact at BlueLabs Analytics in Washington. The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Obviously, this is a busy season for the analytics people who are focused on politics and campaigns. What are some of the trends that are different in 2016 from just four years ago? It’s a fast-changing technology set, and it's also a fast-changing methodology. And of course, the trends about how voters think, react, use social, and engage are also dynamic. So what's different this cycle?

Dyskant: From a voter-engagement perspective, in 2012, we could reach most of our voters online through a relatively small set of social media channels -- Facebook, Twitter, and a little bit on the Instagram side. Moving into 2016, we see a fragmentation of the online and offline media consumption landscape and many more folks moving toward purpose-built social media platforms.

If I'm at the HPE Conference and I want my colleagues back in D.C. to see what I'm seeing, then maybe I'll use Periscope, maybe Facebook Live, but probably Periscope. If I see something that I think one of my friends will think is really funny, I'll send that to them on Snapchat.

Where political campaigns have traditionally broadcast messages out through the news-feed style social-media strategies, now we need to consider how it is that one-to-one social media is acting as a force multiplier for our events and for the ideas of our candidates, filtered through our campaign’s champions.

Gardner: So, perhaps a way to look at that is that you're no longer focused on precincts physically and you're no longer able to use broadcast through social media. It’s much more of an influence within communities and identifying those communities in a new way through these apps, perhaps more than platforms.

Social media

Dyskant: That's exactly right. Campaigns have always organized voters at the door and on the phone. Now, we think of one more way. If you want to be a champion for a candidate, you can be a champion by knocking on doors for us, by making phone calls, or by making phone calls through online platforms.

You can also use one-to-one social media channels to let your friends know why the election matters so much to you and why they should turn out and vote, or vote for the issues that really matter to you.

Gardner: So, we're talking about retail campaigning, but it's a bit more virtual. What’s interesting though is that you can get a lot more data through the interaction than you might if you were physically knocking on someone's door.

Dyskant: The data is different. We're starting to see a shift from demographic targeting. In 2000, we were targeting on precincts. A little bit later, we were targeting on combinations of demographics, on soccer moms, on single women, on single men, on rural, urban, or suburban communities separately.

Moving to 2012, we looked at everything that we knew about a person and built individual-level predictive models, so that we knew how each person's individual set of characteristics made that person more or less likely to be someone with whom our candidate would have an engaging conversation through a volunteer.

Now, what we're starting to see is behavioral characteristics trumping demographic or even consumer data. You can put whiskey drinkers in your model, you can put cat owners in your model, but isn't it a lot more interesting to put in your model the fact that this person has an online profile on our website and this is their clickstream? Isn't it much more interesting to put into a model that this person is likely to consume media via TV, is likely to be a cord-cutter, is likely to be a social media trendsetter, is likely to view multiple channels, or to use both Facebook and media on TV?

That lets us have a really broad reach or really broad set of interested voters, rather than just creating an echo chamber where we're talking to the same voters across different platforms.

Gardner: So, over time, the analytics tools have gone from semi-blunt instruments to much more precise, and you're also able to better target what you think would be the right voter for you to get the right message out to.

One of the things you mentioned that struck me is the word "predictive." I suppose I think of campaigning as looking to influence people, and that polling then tries to predict what will happen as a result. Is there somewhat less daylight between these two than I am thinking, that being predictive and campaigning are much more closely associated, and how would that work?

Predictive modeling

Dyskant: When I think of predictive modeling, what I think of is predicting something that the campaign doesn't know. That may be something that will happen in the future or it may be something that already exists today, but that we don't have an observation for.

In the case of the role of polling, what I really see about that is understanding what issues matter the most to voters and how it is that we can craft messages that resonate with those issues. When I think of predictive analytics, I think of how is it that we allocate our resources to persuade and activate voters.

Over the course of elections, what we've seen is an exponential trajectory in the amount of data that is considered by predictive models. Even more important than that is an exponential growth in the use cases of models. Today, we see every time a predictive model is used, it’s used in a million and one ways, whereas in 2012 it might have been used in 50, 20, or 100 sessions about each voter contact.

Gardner: It’s a fascinating use case to see how analytics and data can be brought to bear on the democratic process and to help you get messages out, probably in a way that's better received by the voter or the prospective voter, like in a retail or commercial environment. You don’t want to hear things that aren’t relevant to you, and when people do make an effort to provide you with information that's useful or that helps you make a decision, you benefit and you respect and even admire and enjoy it.

Dyskant: What I really want is for the voter experience to be as transparent and easy as possible, that campaigns reach out to me around the same time that I'm seeking information about who I'm going to vote for in November. I know who I'm voting for in 2016, but in some local actions, I may not have made that decision yet. So, I want a steady stream of information to be reaching voters, as they're in those key decision points, with messaging that really is relevant to their lives.

I also want to listen to what voters tell me. If a voter has a conversation with a volunteer at the door, that should inform future communications. If somebody has told me that they're definitely voting for the candidate, then the next conversation should be different from someone who says, "I work in energy. I really want to know more about the Secretary’s energy policies."

Gardner: Just as if a salesperson is engaging with process, they use customer relationship management (CRM), and that data is captured, analyzed, and shared. That becomes a much better process for both the buyer and the seller. It's the same thing in a campaign, right? The better information you have, the more likely you're going to be able to serve that user, that voter.

Dyskant: There definitely are parallels to marketing, and that’s how we at BlueLabs decided to found the company and work across industries. We work with Fortune 100 retail organizations that are interested in how, once someone buys one item, we can bring them back into the store to buy the follow-on item or maybe to buy the follow-on item through that same store’s online portal, and in how we can provide relevant messaging as users engage in complex processes online. All those things are driven from our lessons in politics.

Politics is fundamentally different from retail, though. It's a civic decision, rather than an individual-level decision. I always want to be mindful that I have a duty to voters to provide extremely relevant information to them, so that they can be engaged in the civic decision that they need to make.

Gardner: Suffice it to say that good quality comparison shopping is still good quality comparison decision-making.

Dyskant: Yes, I would agree with you.

Relevant and speedy

Gardner: Now that we've established how really relevant, important, and powerful this type of analysis can be in the context of the 2016 campaign, I'd like to learn more about how you go about getting that analysis and making it relevant and speedy across a large variety of data sets and content sets. But first, let’s hear more about BlueLabs. Tell me about your company, how it started, why you started it, maybe a bit about yourself as well.

Dyskant: Of the four of us who started BlueLabs, some of us met in the 2008 elections and some of us met during the 2010 midterms working at the Democratic National Committee (DNC). Throughout that pre-2012 experience, we had the opportunity as practitioners to try a lot of things, sometimes just once or twice, sometimes things that we operationalized within those cycles.

Jumping forward to 2012, we had the opportunity to scale all that research and development, to say that we did this one thing that was a different way of building models, and it worked in this congressional race. We decided to make this three people’s full-time jobs and scale that up.

Moving past 2012, we got to build potentially one of the fastest-growing startups, one of the most data-driven organizations, and we knew that we built a special team. We wanted to continue working together with ourselves and the folks who we worked with and who made all this possible. We also wanted to apply the same types of techniques to other areas of social impact and other areas of commerce. This individual-level approach to identifying conversations is something that we found unique in the marketplace. We wanted to expand on that.

Increasingly, what we're working on is this segmentation-of-media problem. It's this idea that some people watch only TV, and you can't ignore TV. It has lots of eyeballs. Some people watch only digital and some people consume a mix of media. How is it that you can build media plans that are aware of people's cross-channel media preferences and reach the right audience with their preferred means of communication?

Gardner: That’s fascinating. You start with the rigors of the demands of a political campaign, but then you can apply that in so many ways, answering and anticipating the types of questions that more verticals, more sectors, and charitable organizations would want to be involved with. That’s very cool.

Let’s go back to the data science. You have this vast pool of data. You have a snappy analytics platform to work with. But one of the things that I am interested in is how you get more people, whether it's in your organization or a campaign, like the Hillary Clinton campaign, or the DNC, to then be able to utilize that data to get to these inferences, get to these insights that you want.

What is it that you look for and what is it that you've been able to do in that form of getting more people able to query and utilize the data?

Dyskant: Data science happens when individuals have direct access to ask complex questions of a large, gnarly, but well-integrated data set. Say I have 30 terabytes of data across online contacts, off-line contacts, and maybe a sample of clickstream data, and I want to ask things like, "Of all the people who went to my online platform and clicked the password reset because they couldn't remember their password, then never followed up with an e-mail, how many of them showed up at a retail location within the next five days?" They tried to engage online, and it didn't work out for them. I want to know whether we're losing them or whether they're showing up in person.

That type of question maybe would make it into a business-intelligence (BI) report a few months from then, but people who are thinking about what we do every day would say, "I wonder about this," turn it into a query, and say, "I think I found something. If we give these customers phone calls, maybe we can reset their passwords over the phone and reengage them."
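As a rough illustration, the password-reset question could be turned into a query along the following lines. This is a minimal sketch under stated assumptions: the table `customer_wide`, its columns, and the sample rows are all hypothetical, and SQLite stands in for the analytics database only so that the example runs anywhere.

```python
import sqlite3

# Hypothetical schema: one wide row per customer, dates stored as ISO strings.
SCHEMA = """
CREATE TABLE customer_wide (
    customer_id       INTEGER PRIMARY KEY,
    reset_clicked_at  TEXT,  -- date the password reset was clicked, or NULL
    followup_email_at TEXT,  -- date of any follow-up e-mail, or NULL
    store_visit_at    TEXT   -- date of a retail visit, or NULL
)
"""

# The analyst's question: no joins, just filters on one table.
QUESTION = """
SELECT COUNT(*) FROM customer_wide
WHERE reset_clicked_at IS NOT NULL AND followup_email_at IS NULL
  AND julianday(store_visit_at) - julianday(reset_clicked_at) BETWEEN 0 AND ?
"""


def count_reset_then_store_visit(rows, window_days=5):
    """Count customers who clicked reset, never followed up by e-mail,
    and visited a store within `window_days` days of the reset."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO customer_wide VALUES (?, ?, ?, ?)", rows)
        (count,) = conn.execute(QUESTION, (window_days,)).fetchone()
        return count
    finally:
        conn.close()


sample_rows = [
    (1, "2016-03-01", None, "2016-03-04"),          # counted: visit 3 days later
    (2, "2016-03-01", "2016-03-02", "2016-03-03"),  # excluded: followed up by e-mail
    (3, "2016-03-01", None, "2016-03-10"),          # excluded: visit 9 days later
    (4, "2016-03-01", None, None),                  # excluded: never visited a store
    (5, "2016-03-05", None, "2016-03-09"),          # counted: visit 4 days later
]
print(count_reset_then_store_visit(sample_rows))
```

Rows with a missing store visit drop out on their own, because a comparison against NULL is never true in SQL.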

Human intensive

That's just one tiny, micro example, which is why data science is truly a human-intensive exercise. You get 50-100 people working at an enterprise solving problems like that and what you ultimately get is a positive feedback loop of self-correcting systems. Every time there's a problem, somebody is thinking about how that problem is represented in the data. How do I quantify that? If it’s significant enough, then how is it that the organization can improve in this one specific area?

All that can be done with business logic is the interesting piece. You need very granular data that's accessible via query and you need reasonably fast query time, because you can’t ask questions like that when you're going to get coffee every time you run a query.

Layering predictive modeling allows you to understand the opportunity for impact if you fix that problem. One hypothesis with those users who cannot reset their passwords is that maybe those users aren't that engaged in the first place. You fix their password but it doesn’t move the needle.

The other hypothesis is that it's people who are actively trying to engage with your server and are unsuccessful because of this one very specific barrier. If you have a model of user engagement at an individual level, you can say that these are really high-value users that are having this problem, or maybe they aren’t. So you take data science, align it with really smart individual-level business analysis, and what you get is an organization that continues to improve without having to have an executive-level decision for each one of those things.
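One way to tell the two hypotheses apart is to compare the modeled engagement of the affected users with that of everyone else. The sketch below assumes a hypothetical dictionary of per-user engagement scores between 0 and 1; the function name, the scores, and the user IDs are made up for illustration.

```python
from statistics import mean


def engagement_gap(scores, affected_ids):
    """Mean engagement score of affected users minus the mean of all other users.

    A clearly positive gap suggests the problem is hitting high-value users;
    a negative gap suggests fixing it may not move the needle.
    """
    affected_set = set(affected_ids)
    affected = [s for uid, s in scores.items() if uid in affected_set]
    others = [s for uid, s in scores.items() if uid not in affected_set]
    if not affected or not others:
        raise ValueError("need at least one affected and one unaffected user")
    return mean(affected) - mean(others)


# Hypothetical model scores keyed by user ID.
scores = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.3, "e": 0.4}
gap = engagement_gap(scores, affected_ids={"a", "b"})
print(f"engagement gap: {gap:+.2f}")
```

A difference in means is the simplest possible comparison; a real analysis would also want to check sample sizes and the spread of the scores.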

Gardner: So a great deal of inquiry, experimentation, iterative improvement, and feedback loops can all come together very powerfully. I'm all for the data scientist full-employment movement, but we need to do more than have people go through a data scientist to use, access, and develop these feedback insights. What is it about the SQL, natural language, or APIs? What is it that you like to see that allows for more people to be able to directly relate and engage with these powerful data sets?

Dyskant: One of the things is the product management of data schemas. So whenever we build an analytics database for a large-scale organization, I think a lot about an analyst who is 22, knows VLOOKUP, took some statistics classes in college, and has some personal stories about the industry that they're working in. They know, "My grandmother isn't a native English speaker, and this is how she would use this website."

So it's taking that hypothesis that’s driven from personal stories, and being able to, through a relatively simple query, translate that into a database query, and find out if that hypothesis proves true at scale.

Then, potentially take the results of that query and dump them into a statistical-analysis language, or use database analytics to answer that in a more robust way. What that means is that we favor very wide schemas, because I want someone to be able to write a three-line SQL statement, no joins, that answers a business question that I wouldn't have thought to put in a report. So that’s the first line: analyst-friendly schemas that are accessed via SQL.
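To make the wide-schema idea concrete, here is a minimal sketch in which the join is done once, when the table is built, so that the analyst's query needs none. The table names, columns, and sample data are hypothetical, and SQLite is used only so the example is runnable; it is not a description of BlueLabs' actual schemas.

```python
import sqlite3

# Analyst-facing query: three lines of SQL, no joins.
ANALYST_QUERY = """
SELECT region, SUM(amount) AS revenue
FROM transactions_wide
GROUP BY region ORDER BY region
"""


def build_demo_db():
    """Create normalized source tables, then denormalize them into one wide table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE transactions (
            txn_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
        INSERT INTO stores VALUES (1, 'urban'), (2, 'rural');
        INSERT INTO transactions VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 8.0);

        -- The join happens once here, up front, not in every analyst query.
        CREATE TABLE transactions_wide AS
            SELECT t.txn_id, t.amount, s.store_id, s.region
            FROM transactions t JOIN stores s ON s.store_id = t.store_id;
    """)
    return conn


def revenue_by_region(conn):
    """Run the analyst query against the wide table."""
    return conn.execute(ANALYST_QUERY).fetchall()


conn = build_demo_db()
for region, revenue in revenue_by_region(conn):
    print(region, revenue)
```

The trade-off is repeated store attributes on every transaction row, which is the kind of redundancy a wide schema accepts in exchange for simpler queries.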

The next line is deep key performance indicators (KPIs). Once we step out of the analytics database, consumers drop into the wider organization that’s consuming data at a different level. I always want reporting to report on opportunity for impact, to report on whether we're reaching our most valuable customers, not on how many customers we're reaching.

"Are we reaching our most valuable customers" is much more easily addressable; you just talk to different people. Whereas, when you ask, "Are we reaching enough customers," I don’t know how to find out. I can go over to the sales team and yell at them to work harder, but ultimately, I want our reporting to facilitate smarter working, which means incorporating model scores and predictive analytics into our KPIs.

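The difference between the two KPIs discussed above can be sketched in a few lines. This assumes each customer carries a hypothetical model score standing in for predicted value; the function name and the sample figures are illustrative only.

```python
def reach_kpis(customers):
    """Compare a raw reach KPI with a model-weighted one.

    `customers` is a list of (model_score, was_reached) pairs, where
    model_score is a hypothetical predicted-value score for that customer.
    """
    if not customers:
        raise ValueError("no customers to report on")
    total_value = sum(score for score, _ in customers)
    if total_value <= 0:
        raise ValueError("model scores must sum to a positive value")
    reached_count = sum(1 for _, reached in customers if reached)
    reached_value = sum(score for score, reached in customers if reached)
    return {
        # "How many customers are we reaching?"
        "share_reached": reached_count / len(customers),
        # "Are we reaching our most valuable customers?"
        "share_of_value_reached": reached_value / total_value,
    }


# Three low-value customers were reached; the two high-value ones were not.
sample = [(0.9, False), (0.8, False), (0.1, True), (0.1, True), (0.1, True)]
kpis = reach_kpis(sample)
print(f"{kpis['share_reached']:.0%} of customers reached, "
      f"{kpis['share_of_value_reached']:.0%} of predicted value reached")
```

In this made-up sample the count-based KPI looks healthy while the value-weighted one does not, which is the gap that reporting on model scores is meant to expose.
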
Getting to the core

Gardner: Let’s step back from the edge, where we engage the analysts, to the core, where we need to provide the ability for them to do what they want and which gets them those great results.

It seems to me that when you're dealing in a campaign cycle that is very spiky, you have a short period of time where there's a need for a tremendous amount of data, but that could quickly go down between cycles of an election, or in a retail environment, be very intensive leading up to a holiday season.

Do you therefore take advantage of the cloud models for your analytics that make a fit-for-purpose approach to data and analytics pay as you go? Tell us a little bit about your strategy for the data and the analytics engine.

Dyskant: All of our customers have a cyclical nature to them. I think that almost every business is cyclical, just some more than others. Horizontal scaling is incredibly important to us. It would be very difficult for us to do what we do without using a cloud model such as Amazon Web Services (AWS).

Also, one of the things that works well for us with HPE Vertica is the licensing model where we can add additional performance with only the cost of hardware or hardware provisioned through the cloud. That allows us to scale up our cost areas during the busy season. We'll sometimes even scale them back down during slower periods so that we can have those 150 analysts asking their own questions about the areas of the program that they're responsible for during busy cycles, and then during less busy cycles, scale down the footprint of the operation.

Gardner: Is there anything else about the HPE Vertica OnDemand platform that benefits your particular need for analysis? I'm thinking about the scale and the rows. You must have so many variables when it comes to a retail situation, a commercial situation, where you're trying to really understand that consumer?

Dyskant: I do everything I can to avoid aggregation. I want my analysts to be looking at the data at the interaction-by-interaction level. If it’s a website, I want them to be looking at clickstream data. If it's a retail organization, I want them to be looking at point-of-sale data. In order to do that, we build data sets that are very frequently in the billions of rows. They're also very frequently incredibly wide, because we don't just want to know every transaction with this dollar amount. We want to know things like what the variables were, and where that store was located.

Getting back to the idea that we want our queries to be dead-simple, that means that we very frequently append additional columns on to our transaction tables. We’re okay that the table is big, because in a columnar model, we can pick out just the columns that we want for that particular query.
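The reason a wide table is cheap to query in a columnar model can be shown with a toy example. The class below is an illustration of column-oriented layout in general, not of how HPE Vertica is implemented; the class, its methods, and the sample data are all hypothetical.

```python
class ColumnTable:
    """Toy column-oriented table: each column is stored as its own list."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}

    @property
    def num_rows(self):
        # All columns have the same length; an empty table has zero rows.
        return len(next(iter(self.columns.values()), []))

    def append(self, row):
        """Append one row given as a dict; missing columns are stored as None."""
        for name, values in self.columns.items():
            values.append(row.get(name))

    def add_column(self, name, values):
        """Widen the table with a new column without touching existing ones."""
        if len(values) != self.num_rows:
            raise ValueError("new column must have one value per existing row")
        self.columns[name] = list(values)

    def select(self, names):
        """Read only the requested columns; other columns are never scanned."""
        return list(zip(*(self.columns[name] for name in names)))


table = ColumnTable(["txn_id", "amount"])
table.append({"txn_id": 10, "amount": 20.0})
table.append({"txn_id": 11, "amount": 5.0})
table.add_column("region", ["urban", "rural"])  # appended later, as in the text
print(table.select(["amount", "region"]))
```

Because `select` touches only the lists it is asked for, adding more columns makes the table wider without making a two-column query more expensive.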

Then, moving into some of the in-database machine-learning algorithms allows us to perform higher-order computation within the database and have less data shipping.

Gardner: We're almost out of time, but I wanted to do some predictive analysis ourselves. Thinking about the next election cycle, midterms, only two years away, what might change between now and then? We hear so much about machine learning, bots, and advanced algorithms. How do you predict, Erek, the way that big data will come to bear on the next election cycle?

Behavioral targeting

Dyskant: I think that a big piece of the next election will be around moving even more away from demographic targeting, toward even more behavioral targeting. How is it that we reach every voter based on what they're telling us about themselves, what matters to them, and how that matters to them? That will increasingly drive our models.

To do that involves probably another 10X scale in the data, because that type of data is generally at the clickstream level, generally at the interaction-by-interaction level, incorporating things like Twitter feeds, which adds an additional level of complexity and a layer of computational necessity to the data.

Gardner: It almost sounds like you're shooting for sentiment analysis on an issue-by-issue basis, a very complex undertaking, but it could be very powerful.

Dyskant: I think that it's heading in that direction, yes.
