Centralized Logging Solution for Google Cloud Platform (Cloud Next '18... (2)

So my first stop, and the first stop for many customers,

is let's check out Error Reporting

to see if it's automatically identified anything for us.

And sure enough, having looked through our logs,

out of all of those errors, it's identified a specific error

in our application that someone tried

to set the quantity to less than zero,

which caused a specific error.

And I can click on that.

I can see exactly how many customers have

been impacted by this error.

I can see the stack trace so that I

can tell exactly where in my code this came from,

and I could go to Stackdriver Debugging

to investigate further, if I wanted to, from here.

I can also jump directly back to the logs to view the raw logs.

I can link to an issue, so this one here

links to a GitHub issue, for example.

And I can also track the resolution status.

So for example, if one of our developers tells me,

hey, I fixed this error.

It should be all set.

It's deployed.

I can go ahead and say, this error should be resolved,

and then we say, OK, no known errors.

That's great.

If I go back to my catalog here--

and we'll test it to see if this actually works--

and I add something to my cart, let's see

what happens if we try to set the quantity

to a negative number.

We'll update the basket, and I'll

come back over to Error Reporting,

and reload it here for just a moment.

And this usually updates in about five to 10 seconds,

and we can see that it automatically

identified that the error had been seen again and reopened

the issue.

If I wanted to, I can also turn on notifications

so that rather than me going to this dashboard,

it'll proactively push alerts for new or re-opened errors

to my inbox.

So that's a very common use case we

hear from a lot of developers who

are using Stackdriver Logging.

But we hear a number of other examples

for, like, security use cases.

So something that I hear a lot is,

I want to be alerted if anyone adds an email,

let's say from the gmail.com domain.

So in the UI, I can interact with it

in sort of a point and click mode,

but I can also interact with the logs in the Advanced Filter,

and then type in more advanced queries.

So in the case of identifying something

that comes from a gmail.com account,

I need to create a logs-based metric so that I can then

alert on that.

So I'm going to go over to my Logs-Based Metrics,

and I've created one here.

I'm going to go ahead and edit that metric.

For anything that comes from logs

that are SetIamPolicy calls that bind a member at @gmail.com,

I want to count all of these.

And that's kind of the first step.

So I'll be able to have dashboards about everyone who's

been added from an @gmail.com account,

and then I can also create alerting policies on that.
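
The matching logic behind such a logs-based metric can be sketched in Python. This is a minimal illustration, not the filter used in the demo: the field path `protoPayload.serviceData.policyDelta.bindingDeltas` is an assumption about how SetIamPolicy audit entries are laid out, and the function name and sample entry are hypothetical.

```python
from typing import Any, Dict, List


def added_members_from_domain(entry: Dict[str, Any], domain: str = "gmail.com") -> List[str]:
    """Return members from `domain` that a SetIamPolicy audit entry added.

    Assumes the entry is an audit log already parsed into a dict, with the
    policy change listed under protoPayload.serviceData.policyDelta.
    """
    payload = entry.get("protoPayload", {})
    if payload.get("methodName") != "SetIamPolicy":
        return []
    deltas = (
        payload.get("serviceData", {})
        .get("policyDelta", {})
        .get("bindingDeltas", [])
    )
    suffix = "@" + domain
    return [
        delta.get("member", "")
        for delta in deltas
        if delta.get("action") == "ADD"
        and delta.get("member", "").lower().endswith(suffix)
    ]


# Hypothetical audit entry, trimmed to the fields the function reads.
sample_entry = {
    "protoPayload": {
        "methodName": "SetIamPolicy",
        "serviceData": {
            "policyDelta": {
                "bindingDeltas": [
                    {"action": "ADD", "role": "roles/viewer",
                     "member": "user:example-user@gmail.com"},
                    {"action": "ADD", "role": "roles/viewer",
                     "member": "user:employee@example.com"},
                ]
            }
        },
    }
}

matches = added_members_from_domain(sample_entry)
```

Counting the entries for which this list is non-empty gives the same kind of number that the metric charts and alerts on.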

And I can see that earlier this morning, my Google account

did add somebody who was @gmail.com.

So let me go back to our logs-based metrics

and go ahead and show you how you would create

an alerting policy on this.

So I'm going to Create Alert from Metric.

This will pop me over into Stackdriver Monitoring, which

is where I manage all of my dashboards

and my alerting policies.

So I'm able to, here, see the logs-based metric.

Take a look here.

I don't have any recent ones, but I'm

going to go ahead and create an alerting policy if I ever

see this, because I don't ever expect to see this.

And instead of duration, I'm going to say most recent value.

So if this ever happens, I want to be alerted.

Go ahead and save the condition, and create a notification.

So I'll send it to my favorite email address,

auditlogsrock@gmail.com.

I can add some documentation in terms of who to contact.

Name this policy.

And I'll go ahead and save it.

So now if I go ahead and add a new member to an IAM policy

anywhere in this project, I will receive

a notification about this.

Another common use case we hear from customers

is that they want to send their logs someplace else.

So the beauty of the centralized logging solution

is that it all comes in centrally,

and then we can slice and dice it and send it

to many different places using exports, which

is the log router that we were talking

about just a moment ago.

So I'll start out with a very common use

case, which is I want to send all of my audit logs

to BigQuery.

So I select a filter.

In this case, I'm going to say everything

that matches the activity audit logs in the log name.

And then I simply select BigQuery and the destination

and create a sink.

And any future logs that match this will automatically

be sent to BigQuery, which is great,

but that only helps me for this project.

What if I want to do this at the organization level?

So in this case, I need to pull up my Cloud Shell here.

And I can use a gcloud command to set this

at the organizational level.

And I'll call out right here.

We have "include children," and we're setting this up

at the organization level.

Same thing, though-- the log name

matches the filter of anything that

has cloudaudit.googleapis.com.

And I'm going to go ahead and save this,

and now any audit logs from anywhere

in my organization will all go to the same BigQuery

destination.
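
As a rough sketch of the command shown in Cloud Shell, the Python below assembles the arguments for `gcloud logging sinks create` with the organization and include-children options. The sink name, organization ID, project, and dataset are all hypothetical placeholders, and the exact command typed in the demo is not reproduced here.

```python
from typing import List


def org_audit_sink_command(sink_name: str, org_id: str,
                           project_id: str, dataset: str) -> List[str]:
    """Build the argv for an organization-level sink that exports audit logs to BigQuery."""
    # BigQuery sink destinations are addressed by project and dataset.
    destination = (
        f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}"
    )
    # Match any log whose name contains the audit log service.
    log_filter = 'logName:"cloudaudit.googleapis.com"'
    return [
        "gcloud", "logging", "sinks", "create", sink_name, destination,
        f"--organization={org_id}",
        "--include-children",  # also capture logs from every child project
        f"--log-filter={log_filter}",
    ]


# Placeholder values for illustration only.
command = org_audit_sink_command(
    "audit-to-bq", "123456789012", "my-logging-project", "audit_logs"
)
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, assuming gcloud is installed and authenticated.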

Now one tricky thing just to remember

is now that I set it up at the org level,

any audit logs that come from this project

will be written twice.

So in that case, I'd probably want

to go back and get rid of the one at the project level.

Another thing that we have users do

a lot is act on logs.

So in this case, let's say every time a new VM is spun up,

I want to take a look at it, process it

with Cloud Functions, maybe, and add some labels

or apply some rules to it.

So I'm going to create a sink for any time

that I have created an instance, which is the insert command.

And I'll be able to send all of these to Pub/Sub.

I can then use a Cloud Function to pull all of the logs

from Pub/Sub, process them, take whatever action I want on them.
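
A Pub/Sub-triggered function for this could look roughly like the sketch below. It assumes the background-function style where the message arrives base64-encoded in `event["data"]`, and that the exported audit entry carries the method in `protoPayload.methodName` and the instance in `protoPayload.resourceName`. The handler name and the `make_event` helper are hypothetical.

```python
import base64
import json
from typing import Any, Dict, Optional

# Assumed suffix of the audit log method name for instance creation.
INSERT_SUFFIX = "compute.instances.insert"


def handle_log_entry(event: Dict[str, Any], context: Optional[Any] = None) -> Optional[str]:
    """Decode one exported log entry and return the new instance name, if any."""
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload = entry.get("protoPayload", {})
    if not payload.get("methodName", "").endswith(INSERT_SUFFIX):
        return None  # not an instance creation, nothing to do
    instance = payload.get("resourceName", "")
    # This is where labels or rules would be applied to the new VM.
    print(f"New VM created: {instance}")
    return instance


def make_event(entry: Dict[str, Any]) -> Dict[str, str]:
    """Wrap a log entry the way a Pub/Sub message would carry it (for local testing)."""
    data = base64.b64encode(json.dumps(entry).encode("utf-8")).decode("ascii")
    return {"data": data}
```

Calling `handle_log_entry(make_event(entry))` locally with a hand-written entry is a quick way to check the parsing before deploying.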

So this is another common use case we see.

And then last but not least, we'll

talk about log exclusions.

So we have a page dedicated to helping

you understand what your log volume is

across your various Google Cloud resources or AWS resources

here as well.

And I can see, for example, taking a look here,

a lot of my volume in this project

is coming from Kubernetes, so that's the bulk of my logs.

I can see, though, that the projected volume

through the end of the month is 43 gigabytes, which is well

within the free limit of 50 gigabytes,

so I'm not too, too worried.

But I could go ahead and say, you know what?

I'm going to send these logs maybe to my ELK Stack.

I don't want to pay for them in Stackdriver.

I could go ahead and disable the logs altogether here,

or I could create an exclusion filter based

on this and, for example, say, anything that is less

than a warning level, I want to maybe just sample those logs.

So instead of excluding 100% of them,

maybe I will set this to 99%.
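
The effect of excluding 99% of the logs below warning level can be simulated with a small Python sketch. This illustrates the idea only, not how Stackdriver implements sampling: hashing the `insertId` to get a stable bucket is an assumption made here so the decision is repeatable.

```python
import hashlib
from typing import Any, Dict

# Severity levels in increasing order of importance.
SEVERITY_RANK = {
    name: rank
    for rank, name in enumerate(
        ["DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
         "ERROR", "CRITICAL", "ALERT", "EMERGENCY"]
    )
}


def is_excluded(entry: Dict[str, Any], exclude_fraction: float = 0.99) -> bool:
    """Return True if the entry would be dropped by the sampled exclusion."""
    rank = SEVERITY_RANK.get(entry.get("severity", "DEFAULT"), 0)
    if rank >= SEVERITY_RANK["WARNING"]:
        return False  # warnings and above are always kept
    # Map the insertId to a stable value in [0, 1) and drop the lowest fraction.
    digest = hashlib.sha256(entry["insertId"].encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < exclude_fraction


info_entries = [{"severity": "INFO", "insertId": f"id-{i}"} for i in range(10000)]
kept = sum(1 for entry in info_entries if not is_excluded(entry))
print(f"kept {kept} of {len(info_entries)} INFO entries")
```

With a 99% exclusion, roughly one in a hundred low-severity entries survives, which is often enough to spot patterns without paying for the full volume.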

I can also deep dive into the logs volume in Stackdriver

Monitoring using the Metrics Explorer tool

and visualize exactly where my volume of logs is coming from.

And if we could cut back please to the presentation, awesome.

EDUARDO SILVA: So now with a solution like Stackdriver,

we can say that logging is not long and boring.

But before that, handling logs in different formats,

from many sources, and correlating data is quite hard.

And I'm sure if you are already working with a distributed system

or a cluster or with Kubernetes, there are

quite a few challenges that need to be solved.

So what I'm going to explain now is

about how logging operates at the Kubernetes level.

So if you understand the problem and how

it works behind the scenes, it means

you can optimize the engine queries

and get better insights from your data.

Do we have any Kubernetes users here?

Oh, good.

So I'm going to do a little introduction

about how logging works in Kubernetes

or when you play with Docker containers.

And basically, you have one application

that triggers a message, and that message

is meant to be a log message stating a status, an alert,

a warning, or anything related.

For example, we have, like, "Hey Next."

But it's not just about the message.

That message has some metadata that needs to be correlated.

One of them is at what time this message was generated,

and the other is what's the channel that this message was

generated from.

If we speak about containers, there

are two main channels, standard output, standard error.

And here, for example, "Hey Next" is not just text.

We have a JSON example with metadata.

That message then needs to be stored somewhere.

There are many ways to store container logs

or Kubernetes logs, for example, if we have systemd.

But here we are going to refer to how Docker operates.

Basically, your message is stored in the file system,

in a pod.

So every message becomes a new JSON map,

and every message is appended at the end of that file.
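
The append-only file of JSON maps can be sketched in a few lines of Python. The `log`, `stream`, and `time` keys follow the layout of Docker's json-file log driver as I understand it; the file path and helper names are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Dict, List


def append_log(path: str, message: str, stream: str = "stdout") -> None:
    """Append one message as a single JSON map on its own line."""
    record = {
        "log": message + "\n",
        "stream": stream,  # the channel: stdout or stderr
        "time": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_logs(path: str) -> List[Dict[str, str]]:
    """Read the file back, one JSON map per line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Write two messages to a temporary file and read them back.
log_path = os.path.join(tempfile.mkdtemp(), "container-json.log")
append_log(log_path, "Hey Next")
append_log(log_path, "something failed", stream="stderr")
records = read_logs(log_path)
```

Each record keeps the message together with its timestamp and channel, which is the metadata that later has to be correlated.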

But things will become a little bit complex later,

because if we think about how Kubernetes

works from an architectural perspective,

we can think about this.

You have your application, the most simple use case.

That application is running in a container.

A container applies limitations and restrictions

and allows you to set up certain policy rules

and also how this process that is running

can communicate with others.

And when I say the "container," you

know that container is just a concept, right?

From an operating system perspective,

it's all about namespaces and cgroups.

So your application runs in a container,

and that container runs in a concept, which is called a pod.

So things get complex, because a pod

can have multiple containers.

Multiple pods can be running on the same node.

And by a node, I'm referring to a bare metal

machine or a virtual machine.

So here is just one single machine, but in a real cluster,

you have many of them.

So imagine that you have your distributed application.

You told Kubernetes, please deploy my application.

Kubernetes, based on the scaling policies,

is going to decide where these containers will run

or will scale up.

It's going to create some replicas on which

node these replicas will run.

And likely, not all of them will run in the same node.

So this becomes complex.

If I had told you 20 years ago, please

look at my logs from my application, well,

you open the terminal.

You do some SSH and look what's going on with your application.

Just cat the file, and you get the message.

But now, if you have a huge cluster,

you cannot do an SSH on every node and try to find which is

the local JSON file that belongs to the application that I just

deployed, and maybe that application was already destroyed,

because it failed or it was scaled up.

So things become more complex, and complex
