Azure Synapse Analytics | SQL, data warehouse, ML, and on-demand compute (Microsoft Ignite)


(upbeat electronic music) (audience applauding) – Hello, and welcome to
Microsoft Mechanics Live. Coming up, we’re joined by John MacIntyre to take a look at Azure Synapse Analytics, our next-gen Azure SQL
Data Warehouse Solution that delivers limitless
analytics, on-demand query compute at scale,
all as a single service. Now, we’re gonna show you
the end-to-end experience of building and deploying
rich analytic scenarios and how you can automatically generate predictive models and much more. And to do that I’m joined
again by Mr. John MacIntyre, from the Azure Synapse Analytics team. Welcome, everybody give
him a round of applause. (audience applauding) – [John] Thank you Jeremy,
it’s good to be back. – Thank you, so, last
time you were on the show wasn’t too many months
ago, but we looked at Azure SQL Data Warehouse generation two. Now that was super powerful,
it was a great platform in terms of analyzing massive data sets, but we recently looked at
the Modern Data Warehouse with Charles Fedderson,
and that solution pattern and how it really brought together a full stack effectively of data and analytics. But how do we evolve
that even further with Azure Synapse? – Yeah, so with our Modern Data Warehouse solution pattern, really we were focused on taking best-of-breed services and allowing customers to interconnect these to enable really entirely new analytics solutions that they couldn’t do with a traditional or enterprise data warehouse. Where you bring data in,
and we have very powerful ETL services, once that
data came into a data lake, you’d use big data processing
engines to process that data. Our
data warehouse services would allow you to serve that
data out to organizations, enabling visualization of data and BI. So, it’s a really good solution pattern for customers to go
build out their solutions, but you know, that said
I think what we realize is there’s a lot of complexity to wiring all of these pieces together. You need to right-size all of those services, you need to monitor health
and understand how services are interacting with one another. You need to be able to enforce security and security constraints. You need to be able to manage networking across all these technology pieces. So we’ve been really
focused on how do we create a platform that enables
end-to-end analytics that combines all of these
capabilities that you need for limitless analytics
into a single service. And that really is
Azure Synapse Analytics. It’s the evolution of our Azure
SQL Data Warehouse service. And so, with Azure Synapse we give you a unified service with
fully integrated capabilities: not just ETL, but
we’ve also enabled hybrid data ingestion and orchestration, all of that’s built in to
the service and platform. We give you secure self-service
enterprise analytics through our data warehouse, and we’re also providing AI and big data processing
built into Synapse. We’ve added capabilities
such as efficient compute for on-demand query processing. As well as monitoring,
management, integrated security, all through a unified experience. That’s a big piece as well,
the great experience that we offer customers.
This all leverages our Azure Data Lake Storage
service for any type of data. You can put all of your data
in that storage service. And then, certainly
we’ve done a lot of work on our open data
initiative, allowing people to bring data in from a variety of sources. Adobe is a great example as a partner on the open data initiative. – Right, so it’s really
powerful because you’re able to onboard your organization much faster. You’re removing all the integration work to kind of assemble all the
building blocks together and all those core pieces
and it’s kind of just a turnkey service now. It’s an entire analytics
platform out of the box. It’s well tuned and
it’s gonna get massive, it will meet the needs really
of any workload, right? – Now that’s right, any
data, any analytics, at limitless scale. With Azure Synapse, we’re really eliminating
that divide that exists between the data warehouse
and the data lake. And of course, this is an open platform that integrates a broad set of services including Power BI, Dynamics,
Azure Machine Learning, Azure Data Share. And then as you think about
sort of your existing landscape for data, all the applications and things that you use inside your environment, we integrate with that ecosystem as well. We talked about a lot of good partners that we’ve launched with here,
with the Azure Synapse platform. – Okay then, so how do
data teams then interact with the service? – Yeah, so that’s one of
the really unique aspects of Azure Synapse, so here you
have the Synapse workspace. And so within the Synapse
workspace, this is really where data engineers, data
scientists and data analysts can collaborate within this environment to do work with data, to do analytics. From the data hub, if I’m a data analyst, I get access to all of my data. So I have my storage
account, databases, datasets. I have my develop tab, so
if I’m a data scientist or if I’m writing scripts,
or if I’m doing analysis, I can write SQL-based analysis, I can do notebooks and,
if I’m using Spark, I can do large-scale
data processing with Spark within that environment. And if I’m a data engineer
and I wanna orchestrate a pipeline, so I wanna put
something in production, and get everything going end-to-end. I can do that from the
orchestrate hub here within the Synapse Studio. And of course, you can
monitor and manage the service. So the service doesn’t
just remove the silos between people in the organization, but also brings the data closer together. – Right, why don’t you
give us a closer look then at how all this kind of works. – The way we think about it is
Synapse bringing everything into a single service,
in fact, to get started, it’s really, we wanted to
simplify that process as well. So within Azure Portal, it’s
very, very simple to get going. All you have to do is create a workspace. You don’t have to
necessarily define servers, create databases, or do any of that setup. You just want that workspace
where you can start going and doing your work. – Right, you just kind of
need a storage account, I think, then you go through a
resource management template, like you’re seeing here. But all those kind of
sub-services, components, all those things you had to do with the solution pattern, it’s all just kind of here for you, right? – So I can do that right
here within the Azure Portal, I don’t think I can do capital, let’s see, it’s complaining on my syntax 201911, there we go. So if I bind that to a storage account, so basically Synapse will link to an Azure Data Lake storage account, or multiple Azure Data
Lake storage accounts. That’s where I can put all of my data, I just provide that, I
provide the workspace name, I give the region, and boom, I’m done. – Okay, so it’s super
easy to provision again compared to kind of putting
all the pieces together. But then how do you bring
data into the platform once you kind of got the
scaffolds and everything setup? – Yeah, because Synapse is
already ODI where any data that is pushed to the lake in ODI format, Synapse can work with. And for all other enterprise
data, as a data engineer, you would come to the pipelines hub. So within the pipelines
hub, I can build pipelines and I can ingest data,
but to start to do that, what I would do to start with is actually create a linked service. So I can link to an external data source, or a place where I wanna
bring data in from. We have over 90 connections
to various data providers, or data sources. These can include databases,
these include SaaS applications, and it also includes other clouds. And so, if there’s data in AWS, there’s data in Google,
there’s data in Salesforce, and you wanna bring that in, Synapse makes it really
easy to go do that. We’re taking ETL and data
integration to the next level. – Right, so, you’ve
linked everything up now, how do I actually bring
data into the service, once you’ve got everything linked up? – Azure Synapse provides both code-first and code-free experiences
to bring data in. So I’ll go over to a pipeline
that we have created already. This is just a really
simple ingestion pipeline, it’s copying data from one
of the linked services. I’m gonna zoom so folks can see this. Here we have a copy data activity, and then we have a data
flow, which is actually doing some transformation of
that data as it comes in. As I mentioned, you can
write code to do this data transformation or you can get a visual experience to do that. If I click on the data
flow, I can actually modify that data flow. So if you’ve done
data transformations before, you think of left-side, right-side type of transformations:
what do I wanna do with the data in between
to make it cleaner and make it more usable for people. So here we can do a
variety of transformations, we have a full visual
interface to go do this. So in this case, let’s say I
wanna create a derived column. I can create a derived
column for that data flow, and then I would just save that, and then Synapse will do
all the heavy lifting, all the hard work. It pushes this code down,
it does the code gen of all this, pushes it
down to a scalable cluster, executes that and then off you go. – Awesome, so it’s super
graphical, intuitive, really good for collaboration where maybe a distributed
team might be able to share and interact with a common set of pipelines. It’s kind of a nice thing there. And also data sets, but why
don’t we keep moving down the left navigation here, and look at what you can do under develop. ‘Cause I imagine that
this is kind of the core of what Azure Synapse is
given that it’s the evolution of Azure SQL data warehouse, right? – Right, yeah, SQL, if
you think about SQL, SQL is one of the most popular languages for enterprise analytics. This lights up where we’re
doing data visualization, and the enterprise data
warehouse traditionally has been the platform powering this. So Synapse extends the
capabilities of SQL Data Warehouse, so we’re building on the performance, the elasticity, the
security, the robustness and the scale of the service
to give your analysts a single source of truth
for enterprise analytics. – Cool, so, it’s been
optimized and extended with even more capabilities over the years,
because we’ve been running these kinds of massive data and
compute services at Microsoft. But why don’t we switch gears? How do we integrate with any
existing data that we have? – So this is where the data
lake integration comes in. You remember, Jeremy, we
linked our data lake account to our workspace, so we
can go into and navigate to our data lake here. We’ll switch over to
that screen, there we go. So here we have access to
all of our data lake data. I’ll zoom into this. These are just Parquet
files, so what I can do through the data lake integration, and through the
capabilities in Synapse, is I can just simply
select some of this data. I can click on new SQL
script, and what Synapse will do is automatically generate
a SQL statement for you and for me in this case. What it’s doing is
detecting the type of data that we’re actually analyzing, and so, I can click run, and
then what will happen is, we’re using, if you can see here, we’re using what’s called
the SQL on-demand service. So this is an on-demand query service that allows you to work
directly over the lake. – [Jeremy] Okay, why
don’t I explain kind of what’s going on with on-demand. If we flip over to my machine, you can see kind of the
way that things work. Normally, it’s kind of
an integrated data lake that you’ve got like we’re showing here. But you can also use it on
demand, and actually provision compute resources,
basically, so you can run some of these more powerful,
more power hungry queries that you’re gonna need,
when you need them. Only at the exact moment that
you’re running the query, right? – [John] Yeah, absolutely,
so what we can see here, back up on the demo device, is that the query has actually
completed at this point. And now, within Synapse, I can also
quickly visualize the data. So if I click on the chart option here, I get options
to visualize that data, I get options to chart that data, and so I can start to really do something with that data at this point. – Cool, so pretty awesome
in terms of having a query service that works
directly in this case over a data lake. – [John] Yep, and you see if
I go back to the table view, Synapse’s also automatically
detected the schema for that data, so all
the columns are there in the results pane. As I showed you, I can
also visualize that data, and I can do all of that right
from the Synapse workspace. – It’s amazing, ’cause here in this case, we’re actually going over a data lake, we’re schematizing that
even though it might be unstructured data, right? – That’s right, yeah. And so now, I can load that data directly into a SQL data warehouse,
and that becomes, when you sort of think about it,
the enterprise source of truth. Those load now into what
we call Synapse tables. And I can meet the
performance expectations for that enterprise data
warehouse by provisioning, you talked about, Jeremy,
those provisioned resources where I might have some
demanding workloads on the other side of that. I may choose not to use
the on-demand service, but really get the capabilities
of a provisioned pool. – Cool, so it’s really great
that you can quickly switch between query modes
depending on your workload. It looks like, it’s kind
of the complete unification when you think about
data lake and warehouse kind of all in one place. – Yeah, absolutely, that’s
really by enabling both a SQL query service combined
with provisioned resources, you can apply whichever
compute modality is optimal for your workloads. If you have ad hoc explorations, you may use one mode; if you’re using operational
dashboards and reporting, you may choose to use the
provisioned resources. – And the nice thing is
they can be used regardless of where the data is
actually physically stored, but how would a data
scientist then use Synapse? – It’s easy. – [Jeremy] It’s always easy isn’t it? – It’s always easy, easy button! – [Jeremy] Python is easy. – So, as I mentioned earlier,
ML and big data processing is actually built into Synapse, and
Synapse provides a Spark-based analytics runtime, and it
supports multiple languages for doing data processing
and machine learning. Those languages, you know,
if you’re a Python person, you can do Python. Certainly Java, R. The other
thing that we’ve done is, we added .NET support into Spark, so, for a large community
of .NET developers that wanna do large-scale
data processing with Spark, you can now do that. Here you can see, if I
go back to the data hub. I’ll navigate to my data,
similarly the same way that we opened a SQL script,
I can open a notebook. Synapse will detect and
populate some Python code to actually go start analyzing this data. But I’m gonna switch over
to a more involved notebook that was developed earlier. I’ll scroll down. What we’re doing here is we’re
actually using this notebook to train a machine learning model. So, here we’re calling into
Azure Machine Learning, we’re creating an experiment set. We’re pushing that data
to, that configuration, to Azure Machine Learning. We then run that experiment,
I’ll scroll down a little bit. These are the experiment
results so, we ran, I think it was 18 iterations
of this particular experiment. And then when I’m finished doing that, I can actually go register that model with Azure Machine Learning. So, you can see here at the bottom, here where we’re finding the right model. We register that model with
Azure Machine Learning, and then I can actually
make use of that model within Azure Synapse. In fact, I can use that
model and I can use models that I create directly within SQL, which is really powerful. – [Jeremy] So SQL with ML. – Yeah, yeah.
– All right, so let’s see it. – Let me show you that. So, I’m gonna flip over to a SQL statement that I wrote earlier. So the interesting thing here is, I have a SQL statement that’s
invoking a predict function. And so, that’s just SQL
that I know and love, I wrote that SQL and we’re
invoking a model here, a predictive model. – So this is a taxi cab data set, right?
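
The SQL statement itself is not captured in this transcript. As a rough illustration of what invoking a predict function from SQL can look like, here is a minimal Python sketch that assembles a T-SQL `PREDICT` query as a string. The `PREDICT(MODEL = ..., DATA = ..., RUNTIME = ONNX) WITH (...)` shape follows the T-SQL `PREDICT` syntax as documented; whether the demo used exactly this form is an assumption. The table names (`dbo.taxi_trips`, `dbo.models`), the `model` and `model_id` columns, and the helper name `build_predict_query` are all hypothetical.

```python
def build_predict_query(input_table, model_table, model_id, predicted_column):
    """Build a T-SQL statement that scores rows of input_table with PREDICT.

    Every table and column name passed in is a hypothetical placeholder;
    nothing here is taken from the statement shown in the demo.
    """
    model_id = int(model_id)  # keep the model key numeric
    return (
        f"SELECT d.*, p.{predicted_column}\n"
        f"FROM PREDICT(\n"
        f"    MODEL = (SELECT model FROM {model_table} WHERE model_id = {model_id}),\n"
        f"    DATA = {input_table} AS d,\n"
        f"    RUNTIME = ONNX\n"
        f") WITH ({predicted_column} FLOAT) AS p;"
    )


# Example: score hypothetical taxi trips and expose the prediction as total_amount.
print(build_predict_query("dbo.taxi_trips", "dbo.models", 1, "total_amount"))
```

The point of the sketch is only that the prediction comes back as an ordinary column (`total_amount` here) next to the input rows, which matches how the results are described in the conversation that follows.
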
– Right. – We’re gonna try and
figure out whether or not the taxi driver will– – It’s good to tell everyone that, because we haven’t told
everyone that at this point. (Jeremy laughing) It’s like, by the way. – Yeah, it’s good to tell people. So, we’re gonna predict whether or not the taxi driver will get a tip? – We’re actually gonna predict
what the fare is gonna be. So, if I’m driving around
and I’m gonna kinda pick, maybe I’m doing ride share,
I’m gonna pick someone up, and be like, hey, what is
actually the fare amount gonna be? We have a model here that’s
gonna predict the fare amount. – Which is more important from the business decision-making
perspective than the tipping of the driver. – [John] That’s right,
depends on who you are. (both laughing) So here, I ran that SQL query right within Synapse, here
you can see the results. At the far right, you see total amount, hopefully folks can see that. Total amount is a predicted value, ’cause actually the ride
has not happened yet. And so, we’re predicting that right within the SQL engine. So
the really cool thing here is that if you are a SQL
developer and you know SQL, you can actually now
do predictive analytics right in place, right within Synapse. – But we’ve also integrated
some visualizations as well. So, why don’t you show us
kind of what you can do in the visualization front. – We have. – [Jeremy] All right, let’s see it. – All right, so, I’m gonna show you, we’ve integrated right within Synapse a Power BI Development experience. So I’m gonna collapse
some of these notebooks to give us a little more room. At the bottom there in the develop hub, you’re gonna see
something called Power BI. It’s a service, it does BI. – [Jeremy] I think people
know Power BI, yes. We’ve actually got a show on that later. – [John] Oh, good, we’ll tune in, okay. So, when I create a Synapse workspace, I can actually link my Power BI workspace to my Synapse workspace. And so these work really well together, we get a peanut butter
chocolate type of effect going on here with BI and analytics. And right from the Synapse Studio, I can actually modify my BI report. So, I can load all that,
we’re sharing artifacts, we’re sharing security models. And so, this is really a great
simplified experience for, if I’m a data engineer and I’m
working with a set of people inside my organization
that are building reports and visualizations, we
get a common language. Where we can say, “Hey,
I updated this report. It looks a little off, the
data looks a little off. I’m working with data in Synapse, I can go look at this report, I can even make changes
if I have permission to go make changes on that report. I could add new fields, and I can do that right
within the Synapse experience.” – Looks amazing. Loving the integration
there with Power BI. So, a lot of really good stuff that we covered today, across Synapse. We’re delivering some
really amazing experiences. But where can people go
then to start using this, and to kind of get their
hands on it and learn more? – So this is really just a first look at what we’re doing with Synapse. People are starting to
use it, we’re in preview. We’re getting some great feedback. We saw our customers like
Unilever that were on stage, talking about their
experiences with Synapse. So, for people to get going, you can check out aka.ms/Synapse to learn more and sign up today. – Really good stuff, John. Thank you so much for joining us today. And of course, keep
watching Microsoft Mechanics for the latest tech updates. If you haven’t already, hit subscribe. We’ll see you next time
and goodbye for now. (audience applauding) (upbeat electronic music)
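
For readers who want a concrete picture of the on-demand query step from the demo, where a SQL statement is generated over Parquet files in the data lake, here is a minimal Python sketch that builds a similar statement. It assumes the generated script uses `OPENROWSET` with `BULK` and `FORMAT = 'PARQUET'`, which is the documented way to query files from the on-demand SQL service; the exact text Synapse generated in the demo is not in the transcript. The helper name `build_parquet_preview_query`, the `TOP 100` default, and the storage URL are hypothetical.

```python
def build_parquet_preview_query(storage_url, top=100):
    """Build a T-SQL preview query over Parquet files in a data lake.

    storage_url is a placeholder path to one file or a wildcard pattern;
    it is not a real account or container from the demo.
    """
    if not isinstance(top, int) or top <= 0:
        raise ValueError("top must be a positive integer")
    escaped_url = storage_url.replace("'", "''")  # escape quotes for the SQL literal
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Example with a placeholder storage account, container, and folder.
print(build_parquet_preview_query(
    "https://examplelake.dfs.core.windows.net/data/taxi/*.parquet"
))
```

No table or schema is declared anywhere in the statement; the column names and types come from the Parquet files themselves, which is the schema detection described in the demo.
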
