Azure Synapse Analytics – Next-gen Azure SQL Data Warehouse

(techno music)
– Coming up, we take a look at Azure Synapse Analytics, our next-gen Azure SQL Data Warehouse solution that delivers limitless analytics and on-demand query compute at scale as a single service. Now we're gonna show you the end-to-end experience of building and deploying rich analytics scenarios, how you can automatically generate predictive models, and much more. Today I'm joined by John Macintyre from the Azure Synapse Analytics team at Microsoft, no stranger to Mechanics. Welcome back to the show.
– Hi Jeremy, it's great to be back.
– So last time you were here,
we had a look at Azure SQL Data Warehouse Gen 2 and how powerful that platform is for analyzing massive data sets. And more recently, we had Charles Feddersen on the show, who went through the Modern Data Warehouse solution pattern and brought together the full data and analytics stack. But how do things evolve now with Azure Synapse Analytics?
– So with our solution pattern, we were really focused on taking best-of-breed services and connecting them so that you could deliver end-to-end analytics solutions that were not possible with traditional enterprise data warehousing solutions. You could do ETL, bring the data into your data lake, and from there take advantage of our big data processing engines and data warehouse, and also visualize your data. That said, we know there was a certain degree of complexity in wiring these services together. You need to right-size the services for your needs, and it's not trivial to monitor health, enforce security, and manage networking across all of these technology pieces. So to help with that, we've been focused on delivering an end-to-end analytics platform that combines all the capabilities you need for limitless analytics into a single service. And that is Azure Synapse Analytics, the evolution of our Azure SQL Data Warehouse service. With Azure Synapse as a single unified service, we give you a broader and fully integrated set of capabilities. Not only do we give you ETL, but we've made hybrid data ingestion and orchestration easier. We give you secure self-service enterprise analytics through our data warehouse. We provide AI and big data processing, and we've added even more capabilities, such as efficient on-demand compute for queries, as well as end-to-end monitoring, management, and security in a unified experience. All of this leverages our limitless Azure Data Lake Storage service for any type of data. And through our Open Data Initiative, you can bring data from other systems into Azure Synapse Analytics.
– So this is super powerful, because you can actually
onboard your organization a lot faster. You're removing all the integration work of assembling those core pieces that you were talking about. It's all turnkey now, with all the intelligent analytics available as a platform out of the box, and we also do all the tuning to meet the needs of any workload.
– Yeah, that's right. It really supports any data and any analytics at limitless scale. Azure Synapse Analytics eliminates the divide between the data lake and the data warehouse. And of course, this is an open platform that integrates a broad set of services, including Power BI, Microsoft Dynamics, Azure Machine Learning, Azure Data Share, and your existing applications and data platforms that you're using today.
– So how do data teams actually
interact with this service? – Yeah, that’s one of
the really unique aspects of Azure Synapse. We deliver all of these capabilities within a single workspace environment, so teams of data engineers,
data scientists, and analysts can use and collaborate on the same data. So for example, as a data analyst, I can access capabilities that allow me to explore and analyze data. I can develop SQL scripts,
and if I’m a data scientist, additionally build machine learning models and generate predictions. And as a data engineer, I can easily build and
orchestrate data pipeline. – Of course, you can monitor
and manage the entire service, so as a service that not only removes the silos kind of between
the different data sets and data sources but also
removes the silos between people, so they can work more
efficiently together. But can you show us how this works? – Yeah, absolutely. It’s really easy to get
started with Azure Synapse. You just go to the Azure portal and create a workspace, and you can see here that it only takes a few steps. I'll find my subscription, assign a resource group, and give the workspace a name. I'm gonna use johnmacws1. We'll keep our default location, select our storage account and file system, and it's just that easy. Now I can review and create that workspace. The workspace encapsulates the capabilities I showed you to build and deploy your enterprise analytics solutions.
– So it looks super easy to provision, but how would I bring data
then into the platform?
– Yeah, so because Azure Synapse is aware of the Open Data Initiative (ODI), it can work with any data that gets pushed into a data lake in ODI format. For all other enterprise data, as a data engineer, you'd come into Synapse Studio and dive into Orchestrate, which allows both hybrid data ingestion and orchestration. First, to bring data in, we'll need to create a linked connection. I can do this from the Manage tab in Azure Synapse Studio: I click New, which lets me create that new linked service. Azure Synapse offers more than 85 enterprise connectors for a variety of sources, including on-premises systems, SaaS applications, and other clouds. It's like ETL, without all the hassle.
– Okay, so now you've
linked everything up, how do I actually bring data in?
– Yeah, so Azure Synapse provides both code-free and code-first experiences for data transformation and data wrangling. In the Pipeline hub, I can modify an existing flow from Azure Synapse Studio. Here, if I select my CleanAndEnrich data flow and tab over, I can add an additional transformation to this existing data pipeline. I'm gonna go ahead and select the plus sign, which lets me add further transformations as well as schema modifiers. So here I can easily add a Derived Column to my data flow, and when I select it, Synapse does the heavy lifting: it automatically generates the transformation code and deploys it to fully managed scale-out clusters.
– So this is a really graphical and intuitive experience, where distributed team members
can share and interact with common sets of pipelines and datasets. But why don't we continue moving up the left nav and show what you can do under Develop, because I imagine this is really the core of Azure Synapse Analytics, given that it came from Azure SQL Data Warehouse.
– That's right. SQL is one of the core and most popular enterprise analytics languages. It lights up data visualization, and the enterprise data warehouse has traditionally been the platform powering it. Synapse extends the capabilities of SQL Data Warehouse, so we're building on the performance, elasticity, security, robustness, and scale of the service to give your analysts a single source of truth for enterprise analytics.
– Cool, so it's been optimized and extended with even more capabilities, and after our years of running massive data and compute services, all of that's even better now. So why don't we switch gears and talk about how you'd
integrate an existing set of data.
– Yeah, this is really where our data lake integration comes into play. You can extend your SQL analytics directly to the data lake to analyze data without moving or copying it. You can use these capabilities with your allocated compute resources, and we've also added a completely new serverless, on-demand query modality that gives you the horsepower you need at the exact moment you're running these queries. Let me show you how SQL Analytics works with Azure Data Lake in Synapse Studio. Here I'm in the data pane, where I have my storage accounts, my databases, and my datasets. I've pre-selected a set of files in my data lake. I have over 300 terabytes of data here, and I wanna start exploring it and see what's in some of these files. If I right-click on these files and select Open in SQL Script, Synapse automatically generates a SQL statement to query this data. You'll notice at the top that I'm attached to SQL on-demand, which means I haven't provisioned any pools or resources to process this query. I'm running completely serverless, so I'm only gonna pay for the one query that I run here.
– All right, so now you're
gonna run the query, and it’s pretty awesome because
now we have a query service, in this case it’s not just
looking over a database, but a data lake. – That’s right. So we’re taking the power of SQL, and we’re bringing it to the data lake. And so if we go back to
our query results here, you can see that Synapse
has finished the query, and Synapse has also
automatically detected the schema of these files. You can see all of the
column headers populated in the results pane. I can also start to
visualize some of this data if I click on Chart, I can see I have options
there with Chart type, X-axis column, Y-axis column. I can start to play with
and visualize that data all within one place. – So that makes it super easy
then to explore your data.
– Yeah, that's right. And now I can take this data and load it into a data warehouse, in which case I would use provisioned resources rather than on-demand resources, so that I can meet the performance expectations of an enterprise data warehouse.
– Makes total sense. It's really great that you can switch between the different query modes depending on which workload you're working against. So it looks like a complete unification now between the data lake and the data warehouse.
– That's right. By enabling SQL on-demand through a query service, combined with provisioned resources, you can apply whichever compute modality is optimal for your workload, whether that's ad hoc query exploration or enterprise analytics use cases such as dashboards and reporting.
– So this works regardless of where the data is stored, but how do data scientists
then use Synapse?
– Yeah, that's a great question. As I mentioned, AI and ML are built into Azure Synapse. Synapse provides a Spark-based analytics runtime that supports multiple languages commonly used for data processing and machine learning. You can use Python, Java, R, and even .NET; we've added .NET support for Spark for our broad community of .NET developers. If I navigate over to the data view, I have the same set of data that I was processing with SQL. If I right-click one of these files, I have the option to open it in a notebook. I'm gonna select Open in Notebook, and Azure Synapse automatically generates Python code for me to get started with this data. It's important to note that this is a completely serverless Spark platform.
– So in this case, there is no provisioned resource being used.
– That's right. Let me switch back to the Analyze view, where you can see a notebook that I created earlier. I'm using Python in this example to train a model against my data in the data lake. And soon with Azure Synapse, you'll be able to invoke predictive models in SQL. If we look at the SQL script, we can see how this works: I'm using a familiar SQL statement to invoke an in-database model that outputs a predicted value for each record in the result set.
– Okay, this is really great. This is probably the first time I've ever seen machine
learning combined with SQL.
– Yeah, this really brings predictive analytics to SQL, so that any SQL developer can do a richer set of analytics just by writing SQL statements. If we run this query, we'll see the prediction running right on top of this data, with the output of that prediction in our results pane.
– [Jeremy] So what's this data set here?
– [John] This data set is the New York taxi data, and we're predicting whether a driver is gonna receive a tip or not. In the first column of the results pane, you'll see the label Tipped. It's binary: yes or no, are you gonna get a tip? That is what the model is outputting, and you can see that just through SQL, we've invoked that model and we're scoring that data.
– Very cool. And this workspace can also be linked with Power BI right from Synapse Studio, so you can develop dashboards and reports from all your data, including the data that we load into the warehouse as well as the output of the predictive models that we just saw. So it's really awesome that we've integrated the experience for your data scientists, data analysts, and data engineers.
– Yeah, and this is really
just the first look at Azure Synapse. We're bringing end-to-end monitoring for your entire analytics solution with Azure Synapse, and we'll also be integrating with Azure DevOps for application lifecycle management.
– So I know a lot of people are probably using Azure SQL Data Warehouse right now. What do they have to do to bring their data into this?
– That's the great thing: if you're using Azure SQL Data Warehouse today, you're all set. All of these will be released as new capabilities on the service, and they'll start rolling out soon for existing customers.
– This is really awesome stuff, the work that you and the team have done to build this all out. Where would I go to learn more?
– All you have to do is go to to learn more and sign up today.
– Thanks, John. And of course, keep watching Microsoft Mechanics for the latest tech updates. If you haven't already, hit subscribe today. And from the Bellevue Microsoft Technology Center, thanks for watching. We'll see you next time.
(techno music)
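
A rough sketch of the pay-per-query model John describes when running on SQL on-demand ("I'm only gonna pay for the one query that I run here"). The transcript doesn't state how queries are billed; the basis used here (data processed per query, a per-TB rate of 5 USD, a 10 MB per-query minimum, decimal units) is an assumption for illustration, not a quoted price, so check current Azure pricing before relying on the numbers.

```python
import math

# Assumed billing parameters for a serverless (SQL on-demand) query.
# Illustrative placeholders only, not quoted Azure prices.
PRICE_PER_TB_USD = 5.0    # assumed price per TB of data processed
MIN_BILLED_MB = 10        # assumed per-query minimum
BYTES_PER_MB = 10**6      # assumes decimal units (1 MB = 10**6 bytes)
MB_PER_TB = 10**6         # assumes decimal units (1 TB = 10**6 MB)


def billed_megabytes(bytes_processed: int) -> int:
    """Round the data a query processed up to whole MB, then apply the minimum."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    megabytes = math.ceil(bytes_processed / BYTES_PER_MB)
    return max(megabytes, MIN_BILLED_MB)


def query_cost_usd(bytes_processed: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the charge for one serverless query under the assumptions above."""
    return billed_megabytes(bytes_processed) / MB_PER_TB * price_per_tb


if __name__ == "__main__":
    # A small exploratory query is billed at the assumed 10 MB minimum.
    print(f"2 MB scan:   ${query_cost_usd(2 * 10**6):.5f}")
    # A full scan of 300 TB, the size of the lake in the demo.
    print(f"300 TB scan: ${query_cost_usd(300 * 10**12):,.2f}")  # $1,500.00
```

Under this model the charge scales with bytes scanned, not with how long the query runs, which is why storing lake data in a columnar format such as Parquet and selecting only the columns you need is usually the main lever for keeping ad hoc exploration cheap.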

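In the demo, Synapse automatically detects the schema of the data lake files when John runs the generated query, but the transcript doesn't say how. For a headered CSV file, one simple approach is to sample rows and choose, for each column, the narrowest type that parses every non-empty value. The sketch below shows that idea in plain Python; it is not Synapse's actual algorithm, and the type names ("int", "float", "string") and the taxi-style column names in the example are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List


def _parses(value: str, cast: Callable[[str], object]) -> bool:
    """True if cast(value) succeeds."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_column_type(values: List[str]) -> str:
    """Return the narrowest type name that fits every non-empty sampled value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "string"  # nothing to go on, so fall back to text
    if all(_parses(v, int) for v in non_empty):
        return "int"
    if all(_parses(v, float) for v in non_empty):
        return "float"
    return "string"


def infer_schema(csv_text: str, sample_rows: int = 100) -> Dict[str, str]:
    """Map each header column to a type name, using at most sample_rows data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return {}
    samples: List[List[str]] = [[] for _ in header]
    for row_number, row in enumerate(reader):
        if row_number >= sample_rows:
            break
        # zip stops at the shorter of header/row, so ragged rows don't crash.
        for column_values, value in zip(samples, row):
            column_values.append(value)
    return {name: infer_column_type(vals) for name, vals in zip(header, samples)}


if __name__ == "__main__":
    # Hypothetical taxi-style sample, not the demo's actual files.
    sample = (
        "passenger_count,trip_distance,payment_type\n"
        "1,2.5,Credit\n"
        "2,0.8,Cash\n"
    )
    print(infer_schema(sample))
    # {'passenger_count': 'int', 'trip_distance': 'float', 'payment_type': 'string'}
```

Sampling keeps inference fast on very large files, at the cost of guessing wrong when a value that breaks the inferred type only appears after the sampled rows; self-describing formats such as Parquet avoid the problem because the schema is stored with the data.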

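The tip prediction at the end of the demo scores a model against every row of a query result and adds a binary Tipped column to the output. The transcript doesn't show the model or the SQL syntax used to invoke it, so the sketch below only mirrors that row-by-row scoring in Python, using a logistic-regression scorer. The feature names, weights, bias, and 0.5 threshold are all hypothetical placeholders, not a model trained on the New York taxi data.

```python
import math
from typing import Dict, List

# Hypothetical model parameters: placeholders for illustration,
# not a model trained on the New York taxi data.
WEIGHTS: Dict[str, float] = {"fare_amount": 0.08, "trip_distance": 0.05, "paid_by_card": 2.0}
BIAS = -1.5
THRESHOLD = 0.5


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tip_probability(row: Dict[str, float]) -> float:
    """Weighted sum of the row's features plus bias, passed through the logistic function."""
    z = BIAS + sum(weight * row[name] for name, weight in WEIGHTS.items())
    return sigmoid(z)


def score(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return a copy of each row with a binary Tipped column (1 = tip predicted)."""
    return [{**row, "Tipped": int(tip_probability(row) >= THRESHOLD)} for row in rows]


if __name__ == "__main__":
    trips = [
        {"fare_amount": 20.0, "trip_distance": 5.0, "paid_by_card": 1.0},
        {"fare_amount": 6.0, "trip_distance": 1.0, "paid_by_card": 0.0},
    ]
    for scored in score(trips):
        print(scored)
```

In the demo this step runs inside the database, so the model is applied where the data already lives rather than exporting rows to a separate scoring service; the sketch is only meant to make "a predicted value on each record" concrete.
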
  1. Very cool! So when will the Synapse Studio be available? Just tried to create a Workspace and this doesn't appear to be possible as yet…

  2. Awesome orchestration! Thanks for creating Azure Synapse Analytics and sharing this explanation. Handy to have an end-to-end, single workspace environment. Bringing the power of SQL to query the data lake directly and serverlessly is very interesting! Any SQL developer can get a rich set of analytics just by writing SQL statements!

  3. Hmmm, so in summary Azure Synapse is a mash of SQL DW + Azure Data Factory into one portal experience across a serverless architecture. Looks good but I have my concerns about costs given each query or action is now measured in $$$.
