Centralized Logging Solution for Google Cloud Platform (Cloud Next '18... (1)

[MUSIC PLAYING]

MARY KOES: Thank you all for coming

on one of the later speeches of the conference today--

super excited to have you all here.

I'm Mary Koes.

I am a product manager with Stackdriver Logging.

EDUARDO SILVA: Hi.

My name is Eduardo.

I'm a software engineer at Treasure Data.

MARY KOES: And we're here to talk to you today

about getting started running a centralized logging

solution on Google Cloud.

And I have to say nothing but my deep love of logs

would convince me to stand up on stage here.

So I'm really excited to get to share with you

why centralized logs can make your life easier,

how to do that in Google Cloud.

Eduardo will talk a little bit about logging in Kubernetes

specifically, and then some insights from managing

logs at scale that we've had--

we've learned, sometimes the hard way, through Google.

So let me see just a show of hands.

Has anyone-- do we have any current Google Cloud

customers in the audience who use Stackdriver Logging?

Awesome.

And another show of hands-- do we have any software engineers

from Google here who use logging in their day-to-day life

at Google?

Yeah, they don't really let us into these.

Oh, we have one.

Yay.

A couple.

So they don't usually let us into these talks,

so that's awesome.

The team that I work with in Pittsburgh

does both the logs to help all of our Cloud customers

manage their operations at scale,

but also the logs that our Google engineers use

for understanding the applications we develop

internal to Google as well.

So let's talk about why logs matter.

Fundamentally, troubleshooting-- anything,

but especially distributed microservices is really hard.

So we have, in the cloud, a bunch of interconnected pieces,

like this puzzle.

And when something goes wrong, it's

really hard to figure out how we need to put them back together.

Logs will give us visibility into our system.

So in the puzzle analogy, they're

the picture on our puzzle pieces here.

And while you could assemble things

one puzzle piece at a time, a centralized log view

will allow you to see all of the picture

at the same time, which will help

you to improve your reliability, performance, and security.

So let's talk about how we actually

deliver on a centralized logging solution running

in Google Cloud.

And I mentioned I am the product manager for Stackdriver

Logging.

That is one of three pillars that we

have within Stackdriver, so we've

got logging, monitoring, and application performance

management, or APM.

And each of these is part of a SaaS ops suite

to help all of you monitor and troubleshoot your applications,

whether they're running on GCP, AWS, or cloud

native infrastructure, and trying to keep

your apps fast and available.

So we have a bunch of different components here.

We have log producers on the one side,

and there are many different ways logs come in

that you care about.

So we have audit logs, which are super special.

We've got platform logs that are sent from individual Google

services like Cloud Load Balancer.

We have open source components that

are creating logs that you might be running

to power your infrastructure and your apps,

and then there are logs that come directly from the apps

themselves.

On the other hand, we've got consumers of logs.

So there are a number of different options

for consuming logs.

One, of course, is Stackdriver Logging.

Some users will also want to consume logs

in a tool like Google BigQuery or save them

for compliance in Google Cloud Storage

or use one of the many different partner applications

that we support, including Sumo Logic, Splunk, and the ELK Stack,

just to name a few of the more common ones.

So the log router is what actually glues these two things

together-- the producers and the consumers.

So if I look over here, I've got,

at the top, the log producers.

Everything gets centralized through the logging API.

And then we've got the log router,

which will help us to get the logs to the right endpoint.

And down at the bottom, we have the Stackdriver product

on the left-hand side, and then on the right-hand side,

we have other potential destinations,

whether it's Cloud Storage, BigQuery, or anywhere else,

which can go through Pub/Sub.

In order to get logs into Stackdriver,

there is zero setup for GCP audit

logs and logs from GCP services, so any

logs that are being created on your behalf by a GCP service.

And then if you want to instrument your code,

you can also use client libraries

to send logs to the API directly.

But the vast majority of our logs

that aren't sent from cloud services

actually go through our open source logging agent, Fluentd.

EDUARDO SILVA: Hello.

Maybe most of you already know what a logging agent is,

but if we go back some years

and think about logging,

the first thing that we can think about is maybe syslog.

Have you used syslog before?

Please raise your hand.

So everybody who raised their hand is over 30, [LAUGHS]

actually.

Yeah.

So logging mechanisms exist for a couple of reasons,

maybe because you want to troubleshoot,

you want to monitor something, or understand

what's going on in your applications or your system.

So what we're talking about here, logging is not new.

I think that logging has existed from the beginning,

since if you're doing development, you're debugging.

You have some kind of logging mechanism.

If you are deploying your application,

you have some logging mechanism for that.

It could be through syslog, rsyslog, or just writing

to a plain log file in the file system.

But we have current problems now.

In the past, we used to have, like, common messages.

The application is doing A, B, C, D, or it's crashing.

We have some exceptions.

But what happens when you have multiple applications that

generate data in a different way?

For example, if you know Apache web server,

Apache web server creates its own log entry,

saying this is an IP address, this

is the protocol version, the size of the document.

If you look at, for example, MySQL logs, they

look pretty much different.

What kind of query was running?

What's our reception?

How are the tables?

Do we have some sharding or things like that?

So if you want to do logging, it's

not because logging is cool.

I think that, honestly, logging is boring from a--

MARY KOES: It's cool.

EDUARDO SILVA: --from a play perspective,

but you have to deal with that, because you have problems

that you need to solve.

So the thing is if you have multiple sources of information

that come from different places in different formats,

the ultimate goal that you want to do is data analysis.

So in order to do data analysis, you need a logging agent,

and this logging agent must have different strategies

to deal with data formats, data sources.

One thing is to read log files from the file system.

The other is to listen on TCP or UDP.

If you're using, for example, hardware devices

like firewalls, and they are sending their logs over UDP,

you need to support that too.
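
As a rough illustration of the format problem described above, the sketch below turns one Apache access-log line into a structured record, which is the kind of normalization a logging agent has to do before any analysis. The regular expression, the function name `parse_apache_line`, and the field names are assumptions chosen for this example; this is not Fluentd's parser.

```python
import re
from datetime import datetime

# Apache Common Log Format, e.g.:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326
APACHE_COMMON = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_apache_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = APACHE_COMMON.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(parse_apache_line(sample))
```

A MySQL log would need a different pattern but could produce the same kind of record, which is what makes centralized analysis possible.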

So we are moving forward.

We can think about many logging agents,

but the experience here is that at Treasure Data, the company

that I work for, we created Fluentd around seven years ago

to solve the whole data collection problem.

And Fluentd is an open source project.

It's fully open source, and right now,

it's part of the CNCF, the Cloud Native Computing Foundation.

The goal of this project is that it allows you to consume logs

from multiple places with different formats

and send the information on to any kind of central database

like Google Stackdriver or your own Elasticsearch

database or MySQL.

In general, as a project and ecosystem,

we have more than 700 plugins.

And I think that from a company perspective,

we maintain like 10 or 20.

And all of the others are made by the community.

Do you know Fluentd?

Has anybody here used Fluentd before?

Good.

We have some users.

MARY KOES: Awesome.

Thanks.

So once we get logs into the centralized logging

API, however that happens, and many times

through our logging agent, our next goal

is to get it to the right destination.

And this is where the log router comes into place.

So we have two different kinds of flows for managing

logs through the log router.

One is called log sinks, and that is basically an inclusion

filter, so you're going to say exactly what it is you want

sent to these destinations.

And if you don't say anything, nothing

will go to these destinations.

We can send the same log to BigQuery and Pub/Sub

or to none of the above.

On the other side, we have logs being

sent to Stackdriver, the logging product,

and we manage this via exclusion filters.

So if you do nothing, all your logs

will go to Stackdriver Logging.

And then if there are certain logs that you don't want to go

there-- either because you don't want anyone in your company

to have access to them or if you don't want to pay for them--

then you can use an exclusion filter

to control what goes into Stackdriver Logging.

And that's a pretty big difference

compared to many of the other cloud

platform logging experiences out there.

So I'll call out two important ideas around our log router.

One is that logs can be exported to log sinks,

even if they are excluded from Stackdriver log storage.

So this is something that can be a little bit

tricky to get from our documentation-- we'll work on

that-- but hopefully, clear now.
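
The two flows described above can be modeled in a few lines. This is a minimal sketch of the routing behavior as stated in the talk, not the log router's implementation: the `Router` class, the predicate-style filters, and the destination name `"stackdriver"` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

LogEntry = Dict[str, str]
Predicate = Callable[[LogEntry], bool]


@dataclass
class Router:
    # Sinks are inclusion filters: an entry is exported only if a sink matches.
    sinks: Dict[str, Predicate] = field(default_factory=dict)
    # Exclusions only affect the default Stackdriver Logging destination.
    exclusions: List[Predicate] = field(default_factory=list)

    def route(self, entry: LogEntry) -> List[str]:
        destinations = [name for name, include in self.sinks.items()
                        if include(entry)]
        # Sinks are evaluated independently of exclusions, so an excluded
        # entry can still be exported.
        if not any(exclude(entry) for exclude in self.exclusions):
            destinations.append("stackdriver")
        return destinations


if __name__ == "__main__":
    router = Router(
        sinks={"bigquery": lambda e: e["severity"] == "ERROR"},
        exclusions=[lambda e: e["severity"] == "DEBUG"],
    )
    print(router.route({"severity": "ERROR"}))
    print(router.route({"severity": "DEBUG"}))
```

With no sinks and no exclusions configured, every entry goes only to the default destination, matching the "if you do nothing" case in the talk.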

And then the other thing that's a really powerful tool

is a tool called aggregated exports.

So most of our customers have an organization

with more than one project, and they

want to manage logs across their entire organization.

A special use case here is security, where oftentimes,

the security team wants to see the audit logs

across their entire organization.

And so an aggregated export can be set up either at the folder

level or at the org level.

And it will inherit all of the stuff below it,

so if audit logs are created for a new project that

didn't exist before in the organization,

we will go ahead and include those in the export as well.
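
The inheritance behavior of an aggregated export can be sketched as a walk up the resource hierarchy. This is a toy model under stated assumptions: the resource names, the `PARENTS` mapping, and the functions `ancestors` and `sinks_for` are hypothetical and are not part of any Google Cloud API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical hierarchy: child -> parent, with None marking the organization.
PARENTS: Dict[str, Optional[str]] = {
    "org": None,
    "folder-prod": "org",
    "project-web": "folder-prod",
    "project-batch": "org",
}


def ancestors(resource: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the resource followed by each parent up to the organization."""
    chain = []
    current: Optional[str] = resource
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def sinks_for(resource: str,
              parents: Dict[str, Optional[str]],
              aggregated_sinks: Dict[str, Set[str]]) -> Set[str]:
    """Collect every aggregated sink defined on the resource or above it."""
    result: Set[str] = set()
    for node in ancestors(resource, parents):
        result |= aggregated_sinks.get(node, set())
    return result


if __name__ == "__main__":
    sinks = {"org": {"security-audit"}, "folder-prod": {"prod-archive"}}
    # A project created later is covered without any new sink configuration.
    PARENTS["project-new"] = "folder-prod"
    print(sinks_for("project-new", PARENTS, sinks))
```

Because the lookup is done per resource at routing time, a newly added project picks up the organization-level and folder-level sinks automatically.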

And then the last thing I'll talk

about in terms of logging tools is how you

analyze logs with Stackdriver.

And we'll get to a demo here in just a moment,

but Stackdriver Logging supports both basic and advanced

filters.

We also have a tool called Error Reporting

as part of the logging library toolkit

here that will automatically go through your logs and group

like logs together, specifically looking for stack traces

and issues in your code.
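
To make the idea of grouping like logs concrete, the sketch below buckets Python stack traces by exception type and innermost frame. The grouping key is an assumption chosen for this example; the talk does not describe how Error Reporting actually groups errors, and the names `group_key` and `group_traces` are hypothetical.

```python
import re
from collections import defaultdict

# Matches a Python traceback frame line such as:
#   File "shop.py", line 10, in checkout
FRAME = re.compile(r'File "(?P<file>[^"]+)", line \d+, in (?P<func>\S+)')


def group_key(trace):
    """Build a key from the exception type and the innermost frame."""
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    if not lines:
        return ("", None, None)
    # The last line of a traceback looks like "ExceptionType: message".
    exc_type = lines[-1].split(":", 1)[0]
    frames = [m for m in (FRAME.match(ln) for ln in lines) if m]
    if not frames:
        return (exc_type, None, None)
    innermost = frames[-1]
    return (exc_type, innermost.group("file"), innermost.group("func"))


def group_traces(traces):
    """Group traces that share a key, ignoring line numbers and messages."""
    groups = defaultdict(list)
    for trace in traces:
        groups[group_key(trace)].append(trace)
    return dict(groups)
```

Line numbers and message text are deliberately left out of the key so that the same failure seen after a small code change still lands in the same group.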

And then we also provide the ability

to create alerts and dashboards from logs

by using a tool called Logs-Based Metrics,

and that pairs with Stackdriver Monitoring.
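
A logs-based metric is, at its core, a count of log entries that match a filter over time, which can then feed a dashboard or an alert. The sketch below shows that idea only; the entry shape, the function names `count_per_minute` and `minutes_over`, and the threshold check are assumptions and do not reflect the Stackdriver API.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(entries, matches):
    """Count entries accepted by `matches`, bucketed by minute."""
    counts = Counter()
    for entry in entries:
        if matches(entry):
            minute = entry["timestamp"].replace(second=0, microsecond=0)
            counts[minute] += 1
    return counts


def minutes_over(counts, threshold):
    """Return, in time order, the minutes whose count exceeds the threshold."""
    return sorted(minute for minute, n in counts.items() if n > threshold)


if __name__ == "__main__":
    entries = [
        {"timestamp": datetime(2018, 7, 26, 10, 0, 5), "severity": "ERROR"},
        {"timestamp": datetime(2018, 7, 26, 10, 0, 40), "severity": "ERROR"},
        {"timestamp": datetime(2018, 7, 26, 10, 1, 2), "severity": "INFO"},
    ]
    errors = count_per_minute(entries, lambda e: e["severity"] == "ERROR")
    print(minutes_over(errors, threshold=1))
```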

So that's kind of a theme that you'll

hear across the Stackdriver tools

is that while you can use logging by itself

or monitoring by itself or APM tools

by themselves, if you use multiple ones together,

you can actually get something that is more powerful than any

of them individually.

And now my favorite part is demo time.

If we could cut over here, perfect.

All right, so I have a web store running on GKE.

And I can see the logs from my web store.

At first glance, I can see, hey, there are quite a lot of logs,

looks like a lot of errors.

But I'm not really sure where to get

started if there's an issue.
