Transcript of Analyzing Proxy and DNS Log Files - USALearning

Analyzing Proxy and DNS Log Files

Table of Contents

Analyzing Proxy and DNS Log Files
Looking for Bad Stuff
Maintain Your "White List"
Log Analyzers
Analyzing Logs Using Grep
Let's Do Some Long Tail Analysis
Researching the Outliers
What If You Find a Bad Site?
Bypassing the Proxy
Automatic Correlation
Summary
Notices

Analyzing Proxy and DNS Log Files

**013 How would we actually go through and analyze these proxy logs?

Looking for Bad Stuff

Based on intelligence information
• something.evilsite.org seen as a site known to host malware
• <IP> known to be used by malware as a C2 server

Based on analysis
• Basic grep commands
• Long Tail Analysis
• Perhaps some automation?

**014 Well, what you're basically doing is looking for bad stuff. You're looking for the sites that, out of a network of a thousand people, one person or five people went to. You're looking for the outliers here. You don't care that a thousand people went to Google.com. What you care about is that somebody went to hackme.com.42.ru, or something like that. And generally, not everybody on your network is going to go there, right? If there's a malware infection, unless it's a significant outbreak, there are going to be one or two people on the network who are affected by it.

So what you're really looking for in the proxy logs and in the DNS logs are those outliers: the one or two sites that have one or two visitors. You might be looking for evilsite.org, or you might have an IP address that you know to be bad, one you've gained from some of the subscription sites out there where you can sign up for lists of malicious URLs or malicious IP addresses. They'll let you know what those are, so you can look through your proxy logs and your DNS logs for them. How do you do this analysis? Well, it's just a text file, right? So if you wanted to do it by hand, you could use grep. You could do long-tail analysis, which is not a heuristic method by any stretch of the imagination; we'll talk about that. And of course there are ways to automate this.
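
The intelligence-driven search described above can be sketched with grep alone. This is only an illustration: the file names, the sample log layout (client address, method, URL), and the function name find_indicators are assumptions, and 203.0.113.7 is a documentation address standing in for the slide's <IP>.

```shell
#!/bin/sh
# Sketch: list proxy log lines that mention any known-bad indicator.
# File names and the log layout below are assumptions for illustration.

# find_indicators INDICATOR_FILE LOG_FILE
#   -F  treat each indicator as a literal string, not a regular expression
#   -i  ignore case (host names are case-insensitive)
#   -w  match whole words only, so 203.0.113.7 does not match 203.0.113.70
#   -f  read one indicator per line from INDICATOR_FILE
find_indicators() {
    grep -F -i -w -f "$1" "$2"
}

# Hypothetical sample data in a scratch directory, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/indicators.txt" <<'EOF'
something.evilsite.org
203.0.113.7
EOF

cat > "$tmp/proxy.log" <<'EOF'
10.0.0.5 GET http://www.google.com/search
10.0.0.9 GET http://something.evilsite.org/payload.exe
10.0.0.7 GET http://203.0.113.70/index.html
10.0.0.12 GET http://203.0.113.7/beacon
EOF

find_indicators "$tmp/indicators.txt" "$tmp/proxy.log"
```

On the sample data this prints only the evilsite.org and 203.0.113.7 lines. An empty line in the indicator file would match every log line, so the list is worth cleaning before use.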

Maintain Your "White List"

What sites do your users always use?

You can parse out known good sites in your analysis.
• Google, Microsoft, antivirus sites, partners, your cloud-based systems, etc.

You can also create a list of common known goods.
• Common advertisement sites (if necessary), etc.

Note: For your proxy, you will probably want to start an entry with a “.” and end each entry with a slash as in “.google.com/” or else you might be tricked by sites posing as mybadgoogle.com or google.com.x.com.

**015 So how do you know what sites are good? Remember, in your proxy log you're probably going to have thousands of entries, right? How do you strip out what you know to be good? Well, the first suggestion here is to maintain a white list. Not that you have to block people from going to websites or anything like that, but maintain a white list that you can scrub the proxy list against. So take your proxy list, and take out everything that's Google.com, msn.com, espn, sei.edu, or whatever the case may be. Put all of those in a known-good white list, so that when you do your analysis, you don't have to worry about them. Those are known trusted sites that you can just get rid of. So sites like Google and the antivirus sites are good things to put in there.

You also might want to put in a common list as well. Not necessarily a white list, but sites that you know people end up going to as part of just daily web browsing. So all the advertising networks: when you go to pcworld.com, there's a banner ad up there. Where does that banner ad come from? An advertising network, right? Well, that's got its own proxy log hit, and you've got a DNS hit for that as well. So not only establish a white list, but establish a common list of sites that you don't care about. They're not good; they're not bad. You just don't care about them. And you can pull those all out of your analysis.

Something interesting is that if you're doing this white list analysis, and you've got Google.com, make sure you start it with a dot, so like, dot Google.com, and put a slash at the end of it. Match on that string and that string only, because that will prevent somebody from coming in and doing Google.com.x.com. Well, what is this? Well, one, it's an invalid, you know, domain name. But let's just say it was realistic. This could be malicious, and you don't want your white list entry for Google.com to flag this and say, "This is a good site." Because it definitely is not.
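
The dot-and-slash advice can be checked with a small experiment. The sketch below assumes a log reduced to one URL per line and uses hypothetical file names; strip_whitelist is an illustrative helper, not part of any tool named in the course.

```shell
#!/bin/sh
# Sketch: remove whitelisted sites from a proxy log using anchored entries.

# strip_whitelist WHITELIST_FILE LOG_FILE
#   -F  match entries as literal strings, so "." means a real dot
#   -i  ignore case
#   -v  invert the match: print only lines that are NOT whitelisted
#   -f  read one whitelist entry per line
strip_whitelist() {
    grep -F -i -v -f "$1" "$2"
}

# Hypothetical sample data in a scratch directory, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Anchored entry: leading "." and trailing "/".
printf '%s\n' '.google.com/' > "$tmp/whitelist"

cat > "$tmp/proxy.log" <<'EOF'
http://www.google.com/search?q=logs
http://mybadgoogle.com/login
http://www.google.com.x.com/login
EOF

strip_whitelist "$tmp/whitelist" "$tmp/proxy.log"
```

Only the genuine www.google.com line is filtered out; the two look-alike hosts stay in the output for review. One consequence of the anchoring, at least in this sketch, is that a bare http://google.com/ (no leading subdomain) is not matched by ".google.com/" and would need its own entry.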

Log Analyzers

A proxy log analyzer like Calamaris can provide nice HTML.

Develop a list of common sites.

Find suspicious anomalous hits and start investigating.

**016 So what you're trying to do-- yes, sir?

Student: It's funny now. We've mentioned all these logs and all these filtering rules. 'Cause, you know, I'm thinking, worst-case scenario, no one's trying to hack you or penetrate you. Just with the traffic today, if you don't filter out anything, and you don't put certain rules on, like, your spam servers and things like that, you can just fill up your-- like for mail, for instance, if you don't put any filter in, people will just send you junk mail. Even though no one's really trying to take you down, they're just trying to send you information. Just traffic alone, if you don't put in any filtering rules, would bog down your network.

Chris Evans: Yeah, you'll get inundated quickly.

Student: And I had one quick question, too.

Chris Evans: Mm hm.

Student: DNS, we were talking about the DNS requests. Could you poison your own server to prevent clients from-- say you had people who are going to AOL, and you block the port. But AOL says, "I'm just going to roll over to find an open port." Like you said, users will find a way if they want to get to it. But could you poison your own DNS server to say, "If you type in this URL, we're going to take you to the loopback address"?

Chris Evans: Actually, poisoning a DNS server is a lot harder than you might think. It's not trivial to do that, 'cause most DNS servers are set up to not be poisoned. But you can set up something called a DNS black-hole list, where if your internal DNS server gets a request for aol.com, let's say, you can have the DNS server return 127.0.0.1. So you're not really poisoning it; you're configuring it to return something else.

Student: Something else on it, right.

Chris Evans: Yep. Now is that right? Wrong? Moral? Immoral?

Student: It's difficult to maintain. I'd rather leave these jobs on a firewall, or on a data loss prevention device, instead of your DNS server.

Chris Evans: Well, there are ways of automating it, such that you basically subscribe to a list, and every two hours they send you a new list, and it automatically goes into your DNS server. So there are easy ways of doing that. But again, what are you doing here? You're doing black-list filtering. Anytime you're doing that, you're always behind the power curve, because an attacker can always react faster than you can. So they change their domain names. It takes you two hours to figure out they've changed their domain names. Well, that's two hours of time that they've got hacking into your network or something. So here's a--

Student: One more question.

Chris Evans: Oh, yes, ma'am.

Student: So you're talking about filtering so that you block people from going to sites, and block traffic coming in that you don't want to come in. But if you're trying to send logs to a server, could you use those same techniques to filter what you send to the server, as far as data that you'd want to look at?

Chris Evans: You can. It depends on the device that's generating the logs, and whether there's an opportunity to filter before it puts it out on the wire to your central logging server. Some devices will let you do that. Some software programs will let you do that. Others just say, "I generated an alert," and send it right to the central logging server.

Student: And your proxy server's doing that based on rules you set in it? I haven't worked on...

Chris Evans: Well, not necessarily. The proxy server doesn't have to send these as messages. I think it would actually be better if it didn't, given the amount of traffic that these devices process and the number of messages. I mean, every time somebody goes to a website, you're going to get a message. That's not tractable. That's very clearly information overload. So I don't think it's a good idea to have your proxy device generate a message every time somebody goes to a site. But you can go in and do this as batch analysis. So every twelve hours, go grab the current proxy log and scrub through it. Or every four hours, automate that process and go through it, rather than having the proxy generate something to you every time it gets a hit, because that'll quickly saturate things.

So on this screen what we're trying to show is kind of a consolidation of good things, bad things, number of hits. If you look at this list here, the column on the left is various websites: doubleclick.net, symantec.com. So what we've said here is, "All right, how many people went to this 192.168.1 destination?" And there are 1,750 hits in there. "How many people went to Google.com?" Two hundred seventy-nine. "How many people went to yimage.com?" One hundred twenty-one. What you're doing here is establishing a pattern. You're saying, "A lot of people go to these sites right here. And then a few people go to these sites. And even fewer people go to these sites." And you're trying to find the outliers here, and trying to identify, "Okay, well, what in this list is really bad?"

So up at the top, Google.com, yimage.com, timeinc. I don't care about those. Those are very clearly in either my white list or my common list, and so I can filter those out. What's highlighted here in blue are things that I can just ignore. So now I only have to look at these, and determine, "Are any of these suspicious?" This is kind of a contrived example, but you'll have a much reduced list. And you can look at that list and go, "Okay, well, what stands out here?" In this case, you know, baddude.com, other than being a horrible video game twenty years ago, might stand out here as a potential site that you should go look at and investigate, and see if it's something odd. Here's another one, fbcdn.net. Off the top of my head, I couldn't tell you what that is. Maybe it's something...

Student: Facebook Content Delivery Network.

Chris Evans: So I would actually put this up there in my common list, right? 2mdn? What's that? I don't know. But once I go and look at it, I can either put it into my white list or generate a security ticket for it and go investigate: "Okay, well, who went and hit that site? What did they get when they were there? And why didn't we block it?"

Analyzing Logs Using Grep

Basic commands useful in analyzing logs

> cat [file] | grep [string]
  - Prints lines to screen containing the string

> cat [file] | grep -v [string]
  - Prints “all but” the lines containing the string

> cat [file] | grep -i -v [string]
  - Same as before but case-insensitive

> cat [file] | grep -i -v -f [file with keywords]
  - Read in multiple strings from a file of keywords

> cat [file] | grep [flags] [string/file] >> [newfile]
  - Send the output of the grep commands into a new file

**017 Analyzing logs. So again, these are just text files. You can use grep. You can use Notepad and do, you know, "Find." Doesn't matter. There are a whole bunch of things you can do on these, because they are just text files. And so if you were going to do a bunch of grep commands, if you are on a Linux box, you could use these to kind of parse through the file, looking for things that kind of stand out.

Let's Do Some Long Tail Analysis

Remove our whitelist entries
> cat ProxyLogs | grep -i -v -f whitelist >> less.txt

Next, we remove our most common known sites
> cat less.txt | grep -i -v -f common >> evenless.txt

Now we have a smaller group of sites to analyze

Can you parse it down further?

[Figure: hits per domain name. Axes: Hits, Domain Names. Regions labeled "common" and "whitelist". Annotation: "Long Tail Analysis lets us explore sites in this frequency."]

**018 And essentially, what you're doing is this concept of long-tail analysis. Remember the number of hits we had? We had sites in one column, and the next column was the number of people that requested that page. Well, this is the kind of chart that you're looking for right here. You'll have a whole bunch of sites up here that have the vast majority of the hits: Google.com, fbcdn.net, or something like that. The vast majority of the web traffic will be in that common list, and you'll be able to filter it out and just ignore it. The next thing you'll have is your white list, the sites that you know are okay, or that you just don't care about.

What you're really looking for here is this stuff. You're looking at the long tail of this graph, because that's where the individual outliers are that are going to be interesting to you from a security perspective. So if somebody goes to a malware site and gets malware, where is that going to appear? Are there a lot of people that are going to this malware site and getting infected with it? Probably not. Again, unless it's a very large outbreak. If it's a large outbreak, you're going to know by the hundred calls that come into the help desk. But if it's only one or two people, it's going to end up down here. And so this is the area that you want to concentrate your analysis on.
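
The hit counting behind the long-tail chart can be approximated with awk and sort. This is a minimal sketch that assumes the filtered log has been reduced to one URL per line; the function name long_tail and the sample URLs are illustrative only.

```shell
#!/bin/sh
# Sketch: count hits per host and list the rarest hosts first.
# Input on stdin: one URL per line (an assumed, simplified log layout).
# Output: "<hits> <host>", sorted by ascending hit count, then host name.
long_tail() {
    # Splitting "http://host/path" on "/" puts the host in field 3.
    awk -F/ 'NF >= 3 { hits[tolower($3)]++ }
             END { for (h in hits) print hits[h], h }' |
        sort -k1,1n -k2,2
}

# Hypothetical sample: three hits, two hits, and a single outlier.
printf '%s\n' \
    'http://www.google.com/search' \
    'http://www.google.com/maps' \
    'http://www.google.com/' \
    'http://fbcdn.net/img.png' \
    'http://fbcdn.net/app.js' \
    'http://baddude.com/update.bin' |
    long_tail
```

The single-hit host, baddude.com, sorts to the top, which is the end of the distribution the analysis concentrates on. Against a real file the call might look like `long_tail < evenless.txt | head -n 20`, assuming evenless.txt holds one URL per line.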

Researching the Outliers

Whois
• http://www.whois.net/
• Basic registration information

Forensics
• Is there a binary file on the other end of that URL?
• Download into a sandboxed environment and see what it does using your forensics toolkits.

Known malware DNS Lists
• http://www.malwaredomains.com/
• Lists of reported sites known to host malware servers

**019 When you actually find some of these outliers, how do you go and analyze them? How do you know whether one is good or bad? You can do whois look-ups on it. That will tell you who registered the domain name. You can look at the URL and the content type. Again, if it's text/html, it's probably okay, 'cause there's nothing dynamic in it. If it's an application executable, or a .pdf or something that can have malware embedded in it, those are the ones that I'd want to go look at, too, and see if there's something going on there. If you're going to do forensics on it yourself, you definitely want to download it into a sandboxed environment. Do not look at it on the proxy list, turn around to the computer on your desktop, plug it in, and see where it goes, 'cause you might end up getting infected with it as well. So use a sandbox and the forensics toolkit.

If you are interested in subscribing to known malicious domain names, there's malwaredomains.com; you can subscribe to their lists. There are a bunch of pay sites out there, where you pay for threat intelligence, and they will keep you up-to-date: "Here are the IP addresses that we know are malicious. Here are the domains we know are malicious. Here are the hashes of files that we know to be malicious." And so you can get threat intelligence that way to help you whittle down the list of things that are either good or bad.

What If You Find a Bad Site?

Say you find a malwarecontrol.server.ru
• Either put it in a DNS Blackhole List
• Or, do some proxy site blocking
• Or both

You can also set up a darknet, where DNS bad traffic can go to be collected in order to find malware sitting inside your network.

These alerts can launch incident response actions.

**020 What do you do if you find a bad site? Let's say you found malwarecontrol.server.ru. Again, we're not picking on .ru, it's just, you know, whatever. You can put the actual URL in a black hole. We talked about this earlier. If you wanted to redirect users instead of having them be able to hit malwarecontrol.server.ru, you can redirect that through DNS and just say, 127.0.0.1. And now when people go to that URL, they actually get redirected to themselves, and there's no connection there. Do you have a question?

Student: Is DNS black-holing a feature only available on BIND/Linux, or can you also do that with Windows?

Chris Evans: You can also do it with Windows.

Student: Okay.

Chris Evans: It's a way that you structure the zone files that are on that host. So basically, you say...

Student: Create a new domain, and pretend you own that domain, and you are the authorized--

Chris Evans: That's it exactly. You are the authoritative server for malwarecontrol.server.ru. That's the way you configure the black hole.

Student: I see.

Chris Evans: Do you have a question?

Student: Yeah, I was going to say, now can you block by-- like, I worked with a spam server before; it's called Barracuda.

Chris Evans: Mm hm.

Student: And I was able to block by IP, by content, and by domain names and things of that nature. Now, that was just preventing mail from coming in from the source, I believe, on that server. Can you do the same thing on a proxy? Block by-- like I want all .ru's to not...

Chris Evans: You can set your proxy up to do that.

Student: You can set your proxy up to do that, okay.

Chris Evans: Yep.

Student: Okay, and the BIND book you were talking about, now that's Linux-based? Does that book explain how DNS works, or is it more the commands on Linux, or is it a book--

Chris Evans: It explains this much about DNS, and this much about how to configure BIND, and use BIND and read it.

Student: Okay.

Chris Evans: So the vast majority of the book is how to run BIND. But there was a DNS primer in the front of it.

Student: Okay, in the front it's explained. Okay, thanks.

Chris Evans: So you can block at the DNS. You can block at the proxy. Or you can set up something called a darknet, where basically requests get redirected. So if you have malware that's calling out, and it's trying to look up, let's say, malwarecontrol.server.ru, it actually gets redirected to a virtual machine or a sandbox, or something that you've set up so that you can analyze it further. That's probably a more advanced forensic method, if you want to go down this road of understanding, "All right, well, how does this actually work?" Because there is some malware out there that, if it can't reach the internet anymore, deletes itself or shuts down. That's one of the anti-forensics methods the malware writers will put in there. So it's up to you how you actually want to block it and how you want to manage the incident response process.
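
One way to picture the zone-file approach is a script that turns a list of bad domains into BIND configuration. Everything below is a sketch under stated assumptions: the zone file path, the TTL and SOA values, and the function names are hypothetical, and a production setup would follow the site's own BIND conventions.

```shell
#!/bin/sh
# Sketch: make the local BIND server authoritative for each bad domain and
# answer every lookup with 127.0.0.1. ZONE_FILE is a hypothetical path.
ZONE_FILE=/etc/bind/blackhole.zone

# Print one named.conf "zone" stanza per domain read from stdin.
# Blank lines and lines starting with "#" are skipped.
blackhole_stanzas() {
    while IFS= read -r domain; do
        case "$domain" in
            ''|'#'*) continue ;;
        esac
        printf 'zone "%s" { type master; file "%s"; };\n' "$domain" "$ZONE_FILE"
    done
}

# Print a single zone file that all stanzas can share. The domain itself
# ("@") and any name beneath it ("*") both resolve to 127.0.0.1.
# The timer values are placeholders, not recommendations.
blackhole_zone() {
    cat <<'EOF'
$TTL 300
@   IN  SOA localhost. root.localhost. ( 1 3600 600 86400 300 )
@   IN  NS  localhost.
@   IN  A   127.0.0.1
*   IN  A   127.0.0.1
EOF
}

# Hypothetical feed containing the domain from the slide.
printf '%s\n' '# sample feed' 'malwarecontrol.server.ru' | blackhole_stanzas
blackhole_zone
```

Pointing the A records at a monitored internal host instead of 127.0.0.1 is the same mechanism the darknet idea relies on; which address to return is a local design decision.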

Bypassing the Proxy

Sometimes, people want to visit sites that are “off limits” and will use proxy bypass websites to pull that traffic into your network.

The bad news: You cannot see what users are actually doing.
The good news: You can possibly detect this behavior with good log analysis.

Visiting whatever.com from bypassthat.com will not show whatever.com in the logs, but bypassthat.com will be in the DNS and Proxy logs.

**021 One of the things that you need to be cautious of is people bypassing the proxy server. How do people bypass the proxy server?

Student: Change their settings.

Chris Evans: What's that?

Student: Change their settings on the machine.

Chris Evans: Change your settings on the machine. Well, hopefully it's not that simple. But it might be.

Student: It has been at times.

Chris Evans: Yep.

Student: That should probably be grayed out, right? When you go into Tools, Internet Options.

Student: It should be.

Chris Evans: You can hope. But you're right, not always. That's the first place I check. Can I change it? Yay, no more proxy! How else can you bypass it?

Student: I'm not sure.

Student: Tether yourself.

Chris Evans: Tether yourself to your cell phone? Yep, but now you're on a different network. Yep, what's that?

Student: It's dual-homed, which is the same.

Chris Evans: Yep, same thing. VPN.

Student: Huh?

Chris Evans: Anonymizing site. So like www.anonymizer.com. Oh, boy. Well, if your proxy doesn't block those, now the proxy can't see where you're going from that site out, because it's that website that maintains the proxy for you. How else can you bypass the proxy?

Student: IPv6 tunneling.

Chris Evans: Yay, IPv6 tunneling, yep. There's also one other really sneaky way, and it's usually the way most administrators get around it. They don't set one. Oops. If you don't set a proxy, what happens to your traffic? Well, there's a special administrator route out to the internet, so they can do "research." Right. I find that IT administrators are notorious for doing this. They're the ones who bypass their own proxy server.

Student: But I mean, if the proxy's physically cabled into the network, you're still going through the proxy. You're just not--

Chris Evans: Well?

Student: That was my question. How can traffic get through, if the proxy is there physically, and for you to get to my gateway, you gotta come through me?

Chris Evans: So not all devices are compatible with proxy devices. There was a case a while ago: we saw a network that was about half UNIX, half Windows. All the Windows users were going through the Blue Coat proxy with NTLMv2 authentication. Great! How does NTLMv2 authentication work on a Linux box?

Student: Not at all.

Chris Evans: It doesn't! So how could these Linux users authenticate to the proxy to get through it?

Student: They weren't.

Chris Evans: They didn't. So there was an exception for all the Linux users, and they just went around the proxy device. So if you looked and smelled like a Linux device, even if you were a Windows device, you could just go right around it. And of course, all the IT administrators were going around it anyway. Or whatever. So the point behind all this is, the bad news is you can't see what users are doing when they're actively trying to bypass the proxy. But you do have ways of detecting that. Your proxy device itself could block anonymizer.com, or something like that. And you could detect these various other methods that users try in order to get around things.
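
Detecting bypass sites in the logs can follow the same grep pattern, this time reporting which clients were involved. The sketch assumes a simplified log layout of "<client-ip> <url>" and a hand-maintained list of bypass hosts; both, along with the function name bypass_clients, are illustrative.

```shell
#!/bin/sh
# Sketch: report which clients contacted known proxy-bypass sites.
# Log layout assumed on stdin: "<client-ip> <url>" per line.

# bypass_clients BYPASS_LIST_FILE < log
#   grep keeps lines naming any listed host; awk keeps the client column;
#   sort -u collapses repeat visits into one line per client.
bypass_clients() {
    grep -F -i -f "$1" | awk '{ print $1 }' | sort -u
}

# Hypothetical sample data in a scratch directory, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'bypassthat.com/' 'anonymizer.com/' > "$tmp/bypass.txt"

cat > "$tmp/proxy.log" <<'EOF'
10.0.0.21 http://bypassthat.com/browse?u=whatever.com
10.0.0.5 http://www.google.com/
10.0.0.21 http://bypassthat.com/browse?u=another.example
10.0.0.33 http://www.anonymizer.com/
EOF

bypass_clients "$tmp/bypass.txt" < "$tmp/proxy.log"
```

The output names the clients to follow up with (10.0.0.21 and 10.0.0.33 in the sample), not the sites they ultimately reached, which matches the limitation described on the slide.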

Automatic Correlation

You can automate most of this.
• Push all logs to a centralized logging server.
  - Use of products like ArcSight, Splunk, etc.
  - You might still have to do some “by-hand” sifting.

**022 What we've really talked about so far has been all manual correlation: you looking at log files and grepping things out. There are automated ways of doing this. For automatic correlation, ArcSight will do this. Splunk will do it. You still might have to do some stuff by hand. But by all means, let the computer do the heavy lifting for you, and get the computer to do the correlation and filter all this stuff out for you. Yes, ma'am.

Summary

Proxy and DNS log file concepts

Analyzing proxy and DNS logs

**023 Student: A stupid question, but does a proxy completely make you non-attributable?

Chris Evans: Does it make you completely non-attributable?

Student: Does it make you non-attributable?

Chris Evans: No, because that proxy device is-- oh, I'm sorry, so what type of proxy? Are you talking about like a Blue Coat proxy device that's in a network, or these anonymizer sites that are proxying your-- which one are you talking about?

Student: Well, I probably don't know. That's why I'm asking the question.

Chris Evans: Okay, so the easy answer there is no, there's no way to guarantee complete anonymity. There's even something like the Tor network, the Onion Router network, that was built as, "Oh, it's completely non-attributable. They can't find out what you're doing and where you're going." There are ways to figure out who's going where and doing what on that network. So I don't think it would be fair to say that the proxies anonymize you sufficiently, or that these anonymizer sites anonymize you either. Because if you think about it, what these devices and these websites really are is man-in-the-middle devices. So who's to say that somebody's not monitoring that, or looking at that, or otherwise doing something malicious with that information? I don't know. I don't think it's fair to say, or safe to say, that you're completely anonymous.

Student: But there are proxies that will do that, is what you're saying. Different types of--

Student: Not by default.

Chris Evans: Yeah, not by default. I think for any proxy device, there's always a danger that you're not truly anonymous in it. Any other questions?

**024 Student: I do have one question. It might sound stupid again. But the way I understand proxy servers, a proxy is more of an application layer device, where it's primarily concerned with HTTP and your DNS, which sit at layer seven as your application protocols. Now, as opposed to that, with some firewalls you can be on the firewall and see traffic going in. But firewalls typically-- I mean, there's different types of firewalls as well. Like if you have a layer three firewall, it's really only inspecting the packet coming in. It's either going to let you through or not.

Chris Evans: Right, IP specific.

Student: So it really can't help you with-- proxies do some things that the firewalls cannot. Correct?

Chris Evans: Yeah. And there are different types of proxy devices. So there are web proxies, there are database proxies. I mean, you can build a proxy for anything.

Student: For anything. Gotcha.

Notices

Copyright 2013 Carnegie Mellon University

This material has been approved for public release and unlimited distribution except as restricted below. This material is distributed by the Software Engineering Institute (SEI) only to course attendees for their own individual study. Except for the U.S. government purposes described below, this material SHALL NOT be reproduced or used in any other manner without requesting formal permission from the Software Engineering Institute at [email protected].

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.

The U.S. Government's rights to use, modify, reproduce, release, perform, display, or disclose this material are restricted by the Rights in Technical Data-Noncommercial Items clauses (DFAR 252-227.7013 and DFAR 252-227.7013 Alternate I) contained in the above identified contract. Any reproduction of this material or portions thereof marked with this legend must also reproduce the disclaimers contained on this slide.

Although the rights granted by contract do not require course attendance to use this material for U.S. Government purposes, the SEI recommends attendance to ensure proper understanding.

NO WARRANTY. THE MATERIAL IS PROVIDED ON AN “AS IS” BASIS, AND CARNEGIE MELLON DISCLAIMS ANY AND ALL WARRANTIES, IMPLIED OR OTHERWISE (INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE, RESULTS OBTAINED FROM USE OF THE MATERIAL, MERCHANTABILITY, AND/OR NON-INFRINGEMENT).

CERT® is a registered mark of Carnegie Mellon University.