Dynamic Lookups
Agenda
Lookups in General
Static Lookups
Dynamic Lookups
- Retrieve fields from a web site
- Retrieve fields from a database
- Retrieve fields from a persistent cache
Enrich Your Events with Fields from External Sources
Splunk: The Engine for Machine Data
[Figure: sources of machine data]
Platforms: Linux/Unix, Windows, Networking, Databases, Applications, Virtualization & Cloud
Data types: Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Tickets, Changes
Example sources:
• Web logs; Log4J, JMS, JMX; .NET events; Code and scripts
• Configurations; syslog; SNMP; netflow
• Configurations; Audit/query logs; Tables; Schemas
• Hypervisor; Guest OS, Apps; Cloud
• Configurations; syslog; File system; ps, iostat, top
• Registry; Event logs; File system; sysinternals
Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data
Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data
Interesting Things to Lookup
• User’s Mailing Address
• Error Code Descriptions
• Product Names
• Stock Symbol (from CUSIP)
• External Host Address
• Database Query
• Web Service Call for Status
• Geo Location
Other Reasons For Lookup
• Work around a developer or vendor that does not enrich its logs
• Imaginative correlations
  - Example: a website URL with a “Like” or “Dislike” count stored in an external source
• Make your data more interesting
  - Better to see textual descriptions than arcane codes
Static vs. Dynamic Lookup
Static: External data comes from a CSV file
Dynamic: External data comes from the output of an external script, which resembles a CSV file
Static Lookup Review
• Pick the input fields that will be used to get output fields
• Create or locate a CSV file that has all the fields you need in the proper order
• Tell Splunk via the Manager about your CSV file and your lookup
• You can also define lookups manually via props.conf and transforms.conf
• If you use automatic lookups, they will run every time the source, sourcetype or associated host stanza is used in a search
• Non-automatic lookups run only when the lookup command is invoked in the search
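Conceptually, a static lookup behaves like a join between each event and the rows of the CSV file, keyed on the input field. The following is a minimal Python 3 sketch of that idea only, not of how Splunk implements lookups internally. The column names follow the http_status example in this deck; the sample rows and the names `load_lookup_table` and `enrich` are illustrative assumptions.

```python
import csv
import io

# Hypothetical contents of http_status.csv (sample rows are illustrative).
HTTP_STATUS_CSV = """status,status_description,status_type
200,OK,Successful
404,Not Found,Client Error
500,Internal Server Error,Server Error
"""


def load_lookup_table(csv_text, input_field):
    """Index the CSV rows by the value of the input field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[input_field]: row for row in reader}


def enrich(event, table, input_field, output_fields):
    """Return a copy of the event with output fields added when the key matches."""
    enriched = dict(event)
    row = table.get(event.get(input_field))
    if row is not None:
        for field in output_fields:
            enriched[field] = row[field]
    return enriched


table = load_lookup_table(HTTP_STATUS_CSV, "status")
event = {"uri": "/cart", "status": "404"}
result = enrich(event, table, "status", ["status_description", "status_type"])
```

An event whose status has no row in the table is returned unchanged, which mirrors the way a lookup simply adds no fields when there is no match.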
Example Static Lookup Conf Files
props.conf
[access_combined]
lookup_http = http_status status OUTPUT status_description, status_type

transforms.conf
[http_status]
filename = http_status.csv
Permissions
Define lookups via Splunk Manager and set permissions there
local.meta
[lookups/http_status.csv]
export = system
[transforms/http_status]
export = system
Example Automatic Static Lookup
Dynamic Lookups
• Write the script to simulate access to external source
• Test the script with one set of inputs
• Create the Splunk Version of the lookup script
• Register the script with Splunk via Manager or conf files
• Test the script explicitly before using automatic lookups
Lookups vs Custom Command
• Use dynamic lookups when returning fields given input fields
• Standard use case for users who already are familiar with lookups
• Use a custom command when doing MORE than a lookup
• Not all use cases involve just returning fields
• Decrypt event data
• Translate event data from one format to another with new fields (e.g. FIX)
Write/Test External Field Gathering Script
[Diagram: your Python script sends input fields to external data in the cloud, which returns output fields]
Example Script to Test External Lookup
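import socket

# Given a host, find the corresponding IP addresses
def mylookup(host):
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        hostname, aliaslist, ipaddrlist = socket.gethostbyname_ex(host)
        return ipaddrlist
    except socket.error:
        # Resolution failed: return no addresses
        return []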
External Field Gathering Script with Splunk
[Diagram: external data in the cloud and your Python script. Return: output fields]
Script for Splunk Simulates Reading Input CSV
hostname, ip
a.b.c.com
zorrosty.com
seemanny.com
Output of Script Returns Logically Complete CSV
hostname, ip
a.b.c.com, 1.2.3.4
zorrosty.com, 192.168.1.10
seemanny.com, 10.10.2.10
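The input and output above can be reproduced with a short script. This is a minimal sketch in Python 3, whereas the code in these slides is written for Python 2. The names `fill_ip`, `run_lookup` and `FAKE_DNS` are hypothetical, and the fake table stands in for a real resolver so that the example runs without network access.

```python
import csv
import io

# Stand-in for a real resolver; the addresses mirror the sample output above.
FAKE_DNS = {
    "a.b.c.com": "1.2.3.4",
    "zorrosty.com": "192.168.1.10",
    "seemanny.com": "10.10.2.10",
}


def fill_ip(hostname):
    """Return the IP for a hostname, or an empty string if unknown."""
    return FAKE_DNS.get(hostname, "")


def run_lookup(infile, outfile):
    """Read CSV rows with hostname/ip columns and write them back with ip filled in."""
    # skipinitialspace lets the header "hostname, ip" parse as two clean names.
    reader = csv.DictReader(infile, skipinitialspace=True)
    writer = csv.DictWriter(outfile, fieldnames=["hostname", "ip"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # A missing or empty ip column means this row still needs a lookup.
        if not row.get("ip"):
            row["ip"] = fill_ip(row["hostname"])
        writer.writerow({"hostname": row["hostname"], "ip": row["ip"]})


incoming = io.StringIO("hostname, ip\na.b.c.com\nzorrosty.com\nseemanny.com\n")
outgoing = io.StringIO()
run_lookup(incoming, outgoing)
output = outgoing.getvalue()
```

In a script registered with Splunk, `sys.stdin` and `sys.stdout` would take the place of the two in-memory buffers.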
transforms.conf for Dynamic Lookup
[NameofLookup]
external_cmd = <name>.py field1 … fieldN
external_type = python
fields_list = field1, …, fieldN
Example Dynamic Lookup conf files
transforms.conf
# Note: this is an explicit lookup
[whoisLookup]
external_cmd = whois_lookup.py ip whois
external_type = python
fields_list = ip, whois
Dynamic Lookup Python Flow
def lookup(input):
    Perform external lookup based on input.
    Return result

main()
    Check standard input for CSV headers.
    Write headers to standard output.
    For each line in standard input (input fields):
        Gather input fields into a dictionary (key-value structure)
        ret = lookup(input fields)
        If ret:
            Send to standard output input values and return values from lookup
Whois Lookup
import csv
import sys

def main():
    # Expect exactly two arguments: the ip field name and the whois field name
    if len(sys.argv) != 3:
        print("Usage: python whois_lookup.py [ip field] [whois field]")
        sys.exit(0)
    ipf = sys.argv[1]
    whoisf = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True
    …
Whois Lookup (cont.) to Read CSV Header
    # First read the CSV header and output the field names
    for line in r:
        if first:
            header = line
            if whoisf not in header or ipf not in header:
                print("IP and whois fields must exist in CSV data")
                sys.exit(0)
            csv.writer(sys.stdout).writerow(header)
            w = csv.DictWriter(sys.stdout, header)
            first = False
            continue
        …
Whois Lookup (cont.) to Populate Input Fields
        # Read the result and populate the values for the input fields
        # (ip address in our case)
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1
Whois Lookup (cont.) to Perform the Lookup
        # Perform the whois lookup if necessary
        if len(result[ipf]) and len(result[whoisf]):
            # Both fields are already populated: pass the row through unchanged
            w.writerow(result)
        # Else call the external website to get the whois field,
        # using the ip address as the key
        elif len(result[ipf]):
            result[whoisf] = lookup(result[ipf])
            if len(result[whoisf]):
                w.writerow(result)
Whois Lookup Function
import urllib

LOCATION_URL = "http://some.url.com?query="

# Given an ip, return the whois response
def lookup(ip):
    try:
        # Python 2 API; in Python 3 the equivalent is urllib.request.urlopen
        whois_ret = urllib.urlopen(LOCATION_URL + ip)
        lines = whois_ret.readlines()
        return lines
    except Exception:
        return ''
Database Lookup
• Acquire proper modules to connect to the database
• Connect and authenticate to database
• Use a connection pool if possible
• Have lookup function query the database
• Return a list([]) of results
Database Lookup vs. Database Sent To Index
• Well, it depends…
• Use a lookup when:
  - Running needle-in-the-haystack searches with a few users
  - Using form searches that return few results
• Index the database table or view when:
  - There are LOTS of users and ad hoc reporting is needed
  - It’s OK for data from a dynamic database to be “stale” (N minutes old)
Example Database Lookup using MySQL
# First connect to DB outside of the for loop
conn = MySQLdb.connect(host="localhost", user="name of user",
                       passwd="password", db="Name of DB")
cursor = conn.cursor()
Example Database Lookup (cont.) using MySQL
import MySQLdb
…
# Given a city, find its country
def lookup(city, cur):
    try:
        # Let the driver bind the parameter instead of concatenating quotes into the SQL
        selString = "SELECT country FROM city_country WHERE city = %s"
        cur.execute(selString, (city,))
        row = cur.fetchone()
        return row[0]
    except Exception:
        return []
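A runnable sketch of the same city-to-country lookup, using Python's built-in sqlite3 module as a stand-in for MySQL so that it needs no database server. The table contents are made-up sample data and `make_sample_db` is a hypothetical helper. The placeholder style differs between drivers: sqlite3 uses `?`, while MySQLdb uses `%s`.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory database with a small, made-up city_country table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city_country (city TEXT PRIMARY KEY, country TEXT)")
    conn.executemany(
        "INSERT INTO city_country (city, country) VALUES (?, ?)",
        [("Paris", "France"), ("Lima", "Peru")],
    )
    conn.commit()
    return conn


def lookup(city, cur):
    """Given a city, return its country, or an empty string if there is no match."""
    # The driver binds the parameter, so quotes in the input cannot alter the SQL.
    cur.execute("SELECT country FROM city_country WHERE city = ?", (city,))
    row = cur.fetchone()
    return row[0] if row else ""


conn = make_sample_db()  # connect once, outside the per-row loop
cursor = conn.cursor()
found = lookup("Lima", cursor)
missing = lookup("Atlantis", cursor)
injected = lookup('" OR "1"="1', cursor)
```

Binding the value as a parameter means an input containing quote characters is treated as an ordinary city name that simply matches nothing.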
Lookup Using Key Value Persistent Cache
• Download and install Redis
• Download and install the Redis Python module
• Import the Redis module in Python and populate the key-value DB
• Import the Redis module in the lookup function given to Splunk to look up a value given a key
Redis is an open source, advanced key-value store.
Redis Lookup
import sys

### CHANGE PATH according to your Redis install ###
sys.path.append("/Library/Python/2.6/…/redis-2.4.5-py.egg")
import redis
…
def main():
    …
    # Connect to Redis; change for your distribution
    pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
    redp = redis.Redis(connection_pool=pool)
Redis Lookup (cont.)
def lookup(redp, mykey):
    try:
        return redp.get(mykey)
    except Exception:
        return ""
Combine Persistent Cache with External Lookup
• For data that is “relatively static”:
  - First see if the data is in the persistent cache
  - If not, look it up in the external source, such as a database or web service
  - If results come back, add the results to the persistent cache and return them
• For data that changes often, you will need to create your own cache retention policies
Combining Redis with Whois Lookup
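def lookup(redp, ip):
    try:
        # Check the persistent cache first
        ret = redp.get(ip)
        if ret is not None and ret != '':
            return ret
        else:
            # Cache miss: query the external source
            whois_ret = urllib.urlopen(LOCATION_URL + ip)
            lines = whois_ret.readlines()
            # readlines() returns a list, so test for a non-empty list
            if lines:
                redp.set(ip, lines)
                return lines
        …
    except:
        …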
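The cache-then-source pattern can be sketched without Redis or a network connection. In this minimal Python 3 example a plain dict stands in for the persistent cache, and `fetch_whois` is a hypothetical stub for the external web service; `cached_lookup` and `calls` are likewise illustrative names.

```python
calls = []  # records which keys actually reached the external source


def fetch_whois(ip):
    """Hypothetical stand-in for the external whois call."""
    calls.append(ip)
    return "whois data for " + ip if ip.startswith("10.") else ""


def cached_lookup(cache, ip, fetch=fetch_whois):
    """Return the cached value if present; otherwise fetch it and cache non-empty results."""
    cached = cache.get(ip)
    if cached:
        return cached
    value = fetch(ip)
    if value:
        cache[ip] = value  # only cache results that came back
    return value


cache = {}
first = cached_lookup(cache, "10.1.2.3")   # miss: goes to the external source
second = cached_lookup(cache, "10.1.2.3")  # hit: served from the cache
empty = cached_lookup(cache, "192.0.2.1")  # no result: nothing is cached
```

With Redis, the dict read and write would map to the `redp.get` and `redp.set` calls used in these slides. Empty results are deliberately not cached here, so a later retry still reaches the external source.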
Where do I get the add-ons from today? Splunkbase!
Add-On | Download Location | Release
Whois | http://splunk-base.splunk.com/apps/22381/whois-add-on | 4.x
DBLookup | http://splunk-base.splunk.com/apps/22394/example-lookup-using-a-database | 4.x
Redis Lookup | http://splunk-base.splunk.com/apps/27106/redis-lookup | 4.x
Geo IP Lookup (not in these slides) | http://splunk-base.splunk.com/apps/22282/geo-location-lookup-script-powered-by-maxmind | 4.x
Conclusion
Lookups are a powerful way to enhance your search experience beyond indexing the data.
Thank You