Monday, September 24, 2012

HowTo: Scan for Internet Cache/History and URLs

This post describes how you can leverage the flexibility of the Volatility framework to locate IE history in Windows memory dumps. Such artifacts have traditionally not been a priority, because the data is in user mode (i.e. index.dat mappings) and the structure format is already well understood and documented; thus, there's not much challenge to the task. However, in the interest of helping others learn some memory analysis techniques, I'll go through a short tutorial on how to locate and parse Internet history both with and without dedicated plugins. If you need to verify a user's activity on a website or determine the source of a drive-by malware infection, this information may be useful to you.

The first thing to do is identify the Internet Explorer (iexplore.exe) process(es) in the memory dump. We'll discuss this in more detail later, but it is very important to remember that IE is not the only application that maps sections of index.dat into memory. Any tool that drives IE through a COM object, or even malware that uses WinINet APIs (such as InternetOpenUrl, InternetReadFile, HttpSendRequest, etc.), will alter the cache/history; thus, those processes may also have portions of the data in memory.

$ ./vol.py -f win7_x64.dmp --profile=Win7SP0x64 pslist | grep iexplore
Volatile Systems Volatility Framework 2.3_alpha
0x0000fa800dd11190 iexplore.exe           2580   1248     18      532      1      0 2011-04-24 04:04:42                      
0x0000fa800d0e73f0 iexplore.exe           3004   2580     77     1605      1      0 2011-04-24 04:04:42

Scanning with Yara

Now that we know the PIDs of the IE processes (2580 and 3004), we can use the existing yarascan plugin to get an initial view of where index.dat file mappings may exist. Since the file's signature includes "Client UrlCache", that's a good starting point.

$ ./vol.py -f win7_x64.dmp --profile=Win7SP0x64 yarascan -Y "Client UrlCache" -p 2580,3004
Volatile Systems Volatility Framework 2.3_alpha
Rule: r1
Owner: Process iexplore.exe Pid 2580
0x00270000  43 6c 69 65 6e 74 20 55 72 6c 43 61 63 68 65 20   Client.UrlCache.
0x00270010  4d 4d 46 20 56 65 72 20 35 2e 32 00 00 80 00 00   MMF.Ver.5.2.....
0x00270020  00 40 00 00 80 00 00 00 20 00 00 00 00 00 00 00   .@..............
0x00270030  00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
Rule: r1
Owner: Process iexplore.exe Pid 2580
0x00260000  43 6c 69 65 6e 74 20 55 72 6c 43 61 63 68 65 20   Client.UrlCache.
0x00260010  4d 4d 46 20 56 65 72 20 35 2e 32 00 00 80 00 00   MMF.Ver.5.2.....
0x00260020  00 50 00 00 80 00 00 00 54 00 00 00 00 00 00 00   .P......T.......
0x00260030  00 00 20 03 00 00 00 00 55 ff 00 00 00 00 00 00   ........U.......
[snip]
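If you dump one of these regions (for example, with the vaddump or memdump plugin) and want to confirm that a hit is a real header rather than a stray string, the fields right after the signature are easy to decode. The following Python sketch assumes the header layout from the libmsiecf format notes (file size at 0x1C, hash table offset at 0x20, then the total and allocated block counts). The function and key names are ours, not Volatility's.

```python
import struct

# The 28-byte (0x1C) signature is this prefix plus a version string such as "5.2\0".
SIGNATURE = b"Client UrlCache MMF Ver "


def parse_index_dat_header(buf: bytes) -> dict:
    """Decode the leading index.dat header fields (layout assumed from libmsiecf)."""
    if len(buf) < 0x2C or not buf.startswith(SIGNATURE):
        raise ValueError("buffer does not start with an index.dat header")
    version = buf[len(SIGNATURE):0x1C].rstrip(b"\x00").decode("ascii", errors="replace")
    # Four little-endian 32-bit values follow the signature.
    file_size, hash_offset, num_blocks, allocated = struct.unpack_from("<4I", buf, 0x1C)
    return {
        "version": version,
        "file_size": file_size,
        "hash_table_offset": hash_offset,
        "num_blocks": num_blocks,
        "allocated_blocks": allocated,
    }


# First 48 bytes of the hit at 0x00270000 in PID 2580.
header = bytes.fromhex(
    "43 6c 69 65 6e 74 20 55 72 6c 43 61 63 68 65 20 "
    "4d 4d 46 20 56 65 72 20 35 2e 32 00 00 80 00 00 "
    "00 40 00 00 80 00 00 00 20 00 00 00 00 00 00 00"
)
print(parse_index_dat_header(header))
```

For the first hit, this reports version 5.2, a 0x8000-byte file and a hash table at 0x4000. The second hit differs mainly in its hash table offset (0x5000) and allocated block count (0x54).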

Now you know you're at least barking up the right tree. However, to simply find visited URLs and things of that nature, you don't need to parse the index.dat file header at all. Instead, you can just scan for the individual cache records, which start with "URL ", "LEAK", or "REDR" (there's also a HASH tag, but it's not necessary for our goals). Feel free to combine the multiple strings into a regular expression so you only need to search once:

$ ./vol.py -f win7_x64.dmp --profile=Win7SP0x64 yarascan -Y "/(URL |REDR|LEAK)/" -p 2580,3004 
Volatile Systems Volatility Framework 2.3_alpha

Rule: r1
Owner: Process iexplore.exe Pid 3004
0x026f1600  55 52 4c 20 03 00 00 00 00 99 35 2c 82 43 ca 01   URL.......5,.C..
0x026f1610  a0 ec 34 cb 34 02 cc 01 00 00 00 00 00 00 00 00   ..4.4...........
0x026f1620  76 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00   v...............
0x026f1630  60 00 00 00 68 00 00 00 03 01 10 10 c4 00 00 00   `...h...........
0x026f1640  41 00 00 00 dc 00 00 00 7d 00 00 00 00 00 00 00   A.......}.......
0x026f1650  98 3e a3 20 01 00 00 00 00 00 00 00 98 3e a3 20   .>...........>..
0x026f1660  00 00 00 00 ef be ad de 68 74 74 70 3a 2f 2f 6d   ........http://m
0x026f1670  73 6e 62 63 6d 65 64 69 61 2e 6d 73 6e 2e 63 6f   snbcmedia.msn.co

[snip]

Rule: r1
Owner: Process iexplore.exe Pid 3004
0x026c0b00  4c 45 41 4b 06 00 00 00 00 a6 3b 01 cc 97 cb 01   LEAK......;.....
0x026c0b10  c0 71 20 14 33 02 cc 01 98 3e 39 1f 00 00 00 00   .q..3....>9.....
0x026c0b20  f8 cf 00 00 00 00 00 00 00 00 00 00 80 2a 02 00   .............*..
0x026c0b30  60 00 00 00 68 00 00 00 03 00 10 10 40 02 00 00   `...h.......@...
0x026c0b40  41 00 00 00 60 02 00 00 9e 00 00 00 00 00 00 00   A...`...........
0x026c0b50  98 3e 99 1e 01 00 00 00 00 00 00 00 98 3e 99 1e   .>...........>..
0x026c0b60  00 00 00 00 ef be ad de 68 74 74 70 3a 2f 2f 75   ........http://u
0x026c0b70  73 65 2e 74 79 70 65 6b 69 74 2e 63 6f 6d 2f 6b   se.typekit.com/k

[snip]

Rule: r1
Owner: Process iexplore.exe Pid 3004
0x026e2680  52 45 44 52 02 00 00 00 78 1b 02 00 40 af d3 51   REDR....x...@..Q
0x026e2690  68 74 74 70 3a 2f 2f 62 73 2e 73 65 72 76 69 6e   http://bs.servin
0x026e26a0  67 2d 73 79 73 2e 63 6f 6d 2f 42 75 72 73 74 69   g-sys.com/Bursti
0x026e26b0  6e 67 50 69 70 65 2f 61 64 53 65 72 76 65 72 2e   ngPipe/adServer.

[snip]
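To repeat this single-pass search offline, for example against a region carved out with vaddump, Python's re module accepts the same alternation. One extra filter may be worth adding. Every record hit above begins on a 0x80 boundary (0x026f1600, 0x026c0b00, 0x026e2680), which fits index.dat's 128-byte block allocation. The sketch below can optionally drop unaligned hits. It assumes the dumped region starts at a page-aligned address, and the function name is hypothetical.

```python
import re

# Same alternation as the yarascan rule: one pass finds all three tags.
TAG_RE = re.compile(rb"URL |REDR|LEAK")
BLOCK_SIZE = 0x80  # index.dat allocates records in 128-byte blocks


def find_record_tags(data: bytes, base: int = 0, aligned: bool = True):
    """Yield (address, tag) for each cache-record tag found in a dumped region.

    With aligned=True, hits that are not on a 0x80-byte boundary are skipped.
    This assumes `base` is the (page-aligned) start address of the region.
    """
    for m in TAG_RE.finditer(data):
        addr = base + m.start()
        if aligned and addr % BLOCK_SIZE:
            continue
        yield addr, m.group(0).decode("ascii")


# Synthetic region: two aligned records and one stray, unaligned "LEAK" string.
region = bytearray(0x200)
region[0x80:0x84] = b"URL "
region[0x95:0x99] = b"LEAK"
region[0x180:0x184] = b"REDR"
for addr, tag in find_record_tags(bytes(region), base=0x026E2600):
    print(hex(addr), repr(tag))
```

The alignment check is only a heuristic. It will not help with records copied into unaligned heap buffers, so pass aligned=False when you want everything.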

As explained in the file format documentation, at offset 0x34 from the start of a "URL " or "LEAK" tag you can find a 4-byte number (68 00 00 00 in the above examples) that specifies the offset from the beginning of the record to the visited location (i.e. the URL). For redirected URLs, the location is stored inline at offset 0x10 from the "REDR" tag. With that information, you've already started finding non-arbitrary URLs in memory (i.e. ones that were in fact related to the cache/history and not just a domain name floating around).
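These two rules translate directly into code. The minimal sketch below pulls the location string out of a raw record. It assumes little-endian fields and NUL-terminated ASCII strings. Because it is fed the truncated bytes from the yarascan hits above, the first URL comes back cut off where the dump ends.

```python
import struct


def record_location(record: bytes) -> str:
    """Return the location string of a URL/LEAK/REDR record."""
    tag = record[:4]
    if tag in (b"URL ", b"LEAK"):
        # 4-byte little-endian offset, relative to the tag, of the location string.
        (start,) = struct.unpack_from("<I", record, 0x34)
    elif tag == b"REDR":
        start = 0x10  # redirect records store the location inline
    else:
        raise ValueError(f"unexpected tag {tag!r}")
    raw = record[start:]
    end = raw.find(b"\x00")  # strings are NUL-terminated
    if end != -1:
        raw = raw[:end]      # if no NUL is present (truncated buffer), keep everything
    return raw.decode("ascii", errors="replace")


# Bytes copied from the yarascan hits at 0x026f1600 ("URL ") and 0x026e2680 ("REDR").
url_rec = bytes.fromhex(
    "55 52 4c 20 03 00 00 00 00 99 35 2c 82 43 ca 01 "
    "a0 ec 34 cb 34 02 cc 01 00 00 00 00 00 00 00 00 "
    "76 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "60 00 00 00 68 00 00 00 03 01 10 10 c4 00 00 00 "
    "41 00 00 00 dc 00 00 00 7d 00 00 00 00 00 00 00 "
    "98 3e a3 20 01 00 00 00 00 00 00 00 98 3e a3 20 "
    "00 00 00 00 ef be ad de 68 74 74 70 3a 2f 2f 6d "
    "73 6e 62 63 6d 65 64 69 61 2e 6d 73 6e 2e 63 6f"
)
redr_rec = bytes.fromhex(
    "52 45 44 52 02 00 00 00 78 1b 02 00 40 af d3 51 "
    "68 74 74 70 3a 2f 2f 62 73 2e 73 65 72 76 69 6e "
    "67 2d 73 79 73 2e 63 6f 6d 2f 42 75 72 73 74 69 "
    "6e 67 50 69 70 65 2f 61 64 53 65 72 76 65 72 2e"
)
print(record_location(url_rec))
print(record_location(redr_rec))
```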

Designing a Plugin

While you've seen a quick and dirty way of locating sites in the IE history, there may be a need for different output formatting and better automation/parsing of the results. For example, instead of a hex dump, you might want a CSV file of visited URLs, with timestamps, the HTTP response data, and various other fields. For that, you'll need a dedicated plugin, but that can all be done in about 100 lines of code. To start, first define the record structures using Volatility's vtypes language, as shown below. It's all pretty basic, except we use a little trickery to automatically determine the location of the URLs within the structures (using the lambda functions which build an absolute address based on the structure's base address plus the 4-byte offset). 

profile.vtypes.update({
    ## "URL " and "LEAK" records share this layout
    '_URL_RECORD' : [ None, {
        'Signature' : [ 0, ['String', dict(length = 4)]],
        'Length' : [ 0x4, ['unsigned int']],
        'LastModified' : [ 0x08, ['WinTimeStamp', {}]],
        'LastAccessed' : [ 0x10, ['WinTimeStamp', {}]],
        'UrlOffset' : [ 0x34, ['unsigned int']],
        'FileOffset' : [ 0x3C, ['unsigned int']],
        'DataOffset' : [ 0x44, ['unsigned int']],
        'DataSize': [ 0x48, ['unsigned int']],
        ## The *Offset fields are relative to the record, so each lambda
        ## adds the record's base address to build an absolute address
        'Url' : [ lambda x : x.obj_offset + x.UrlOffset, ['String', dict(length = 4096)]],
        'File' : [ lambda x : x.obj_offset + x.FileOffset, ['String', dict(length = 4096)]],
        'Data' : [ lambda x : x.obj_offset + x.DataOffset, ['String', dict(length = 4096)]],
    }],
    ## Redirect records store the location inline at offset 0x10
    '_REDR_RECORD' : [ None, {
        'Signature' : [ 0, ['String', dict(length = 4)]],
        'Length' : [ 0x4, ['unsigned int']],
        'Url' : [ 0x10, ['String', dict(length = 4096)]],
    }],
})
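If you want to check these offsets without a Volatility profile, here is the same _URL_RECORD layout decoded with Python's struct module. This is a sketch, not the plugin's code. It reads the URL offset as the 4-byte value described earlier and converts the two timestamps the way a WinTimeStamp is defined: as a FILETIME counting 100-nanosecond intervals since 1601-01-01 UTC. The input is the first "URL " hit from the yarascan output (at 0x026f1600).

```python
import struct
from datetime import datetime, timedelta, timezone

# A WinTimeStamp is a FILETIME: 100-ns intervals since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft: int) -> datetime:
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def parse_url_record(record: bytes) -> dict:
    """Decode the fixed _URL_RECORD fields at the offsets used in the vtypes."""
    (length,) = struct.unpack_from("<I", record, 0x04)
    modified, accessed = struct.unpack_from("<QQ", record, 0x08)
    (url_offset,) = struct.unpack_from("<I", record, 0x34)
    (file_offset,) = struct.unpack_from("<I", record, 0x3C)
    data_offset, data_size = struct.unpack_from("<II", record, 0x44)
    return {
        "signature": record[:4].decode("ascii"),
        "length": length,
        "last_modified": filetime_to_datetime(modified),
        "last_accessed": filetime_to_datetime(accessed),
        "url_offset": url_offset,
        "file_offset": file_offset,
        "data_offset": data_offset,
        "data_size": data_size,
    }


# The "URL " hit at 0x026f1600 from the yarascan output.
url_rec = bytes.fromhex(
    "55 52 4c 20 03 00 00 00 00 99 35 2c 82 43 ca 01 "
    "a0 ec 34 cb 34 02 cc 01 00 00 00 00 00 00 00 00 "
    "76 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "60 00 00 00 68 00 00 00 03 01 10 10 c4 00 00 00 "
    "41 00 00 00 dc 00 00 00 7d 00 00 00 00 00 00 00 "
    "98 3e a3 20 01 00 00 00 00 00 00 00 98 3e a3 20 "
    "00 00 00 00 ef be ad de 68 74 74 70 3a 2f 2f 6d "
    "73 6e 62 63 6d 65 64 69 61 2e 6d 73 6e 2e 63 6f"
)
for key, value in parse_url_record(url_rec).items():
    print(f"{key}: {value}")
```

On this record, it reports a last-modified time of 2009-10-02 17:02:50 UTC and a last-accessed time of 2011-04-24 04:05:05 UTC (plus a fraction of a second), along with FileOffset 0xc4, DataOffset 0xdc and DataSize 0x7d.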

Then we'll build a plugin based on the Plugin Interface wiki page in the Volatility Developer Guide. The plugin's name will be iehistory, and it will inherit from taskmods.DllList for access to existing command-line arguments like --pid and --offset (for filtering or selecting specific processes). We'll also add two extra options (--leak and --redr) so that reporting deallocated and redirected records is optional. The full plugin source code can be viewed in the 2.3-devel branch.

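## Modules used by the plugin (Volatility 2.x layout)
import volatility.obj as obj
import volatility.utils as utils
import volatility.win32.tasks as tasks
import volatility.plugins.taskmods as taskmods

class IEHistory(taskmods.DllList):
    """Reconstruct Internet Explorer cache / history"""

    def __init__(self, config, *args, **kwargs):
        taskmods.DllList.__init__(self, config, *args, **kwargs)
        ## Both extra record types are off by default
        config.add_option("LEAK", short_option = 'L', 
                        default = False, action = 'store_true',
                        help = 'Find LEAK records (deleted)')
        config.add_option("REDR", short_option = 'R', 
                        default = False, action = 'store_true',
                        help = 'Find REDR records (redirected)')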

Now for the all-important calculate function. This is where we do the majority of the work. As usual, we acquire a kernel address space (using the Idle process DTB). We build a list of tags based on the selected command-line options and associate the tags with our _URL_RECORD and _REDR_RECORD structures. Note that because "LEAK" structs are nearly identical to "URL " structs, we simply alias the two and merge them. We use the _EPROCESS.search_process_memory() API and pass it the list of strings to find. For each hit (an address in process memory), we create the correct structure, check whether it's valid with some sanity checks, and then yield the process object and the record to the render function (not shown).

def calculate(self):
    kernel_space = utils.load_as(self._config)
    
    ## Select the tags to scan for. Always find visited URLs,
    ## but make freed and redirected records optional. 
    tags = ["URL "]
    if self._config.LEAK:
        tags.append("LEAK")
    if self._config.REDR:
        tags.append("REDR")
        
    ## Define the record type based on the tag
    tag_records = {
        "URL " : "_URL_RECORD", 
        "LEAK" : "_URL_RECORD", 
        "REDR" : "_REDR_RECORD"}

    ## Enumerate processes, filtered by the --pid and --offset options
    for proc in self.filter_tasks(tasks.pslist(kernel_space)):
    
        ## Acquire a process specific AS
        ps_as = proc.get_process_address_space()
        
        for hit in proc.search_process_memory(tags):
            ## Get a preview of the data to see what tag was detected 
            tag = ps_as.read(hit, 4)
            
            ## Create the appropriate object type based on the tag 
            record = obj.Object(tag_records[tag], offset = hit, vm = ps_as)
            if record.is_valid():
                yield proc, record
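The calculate function relies on record.is_valid() but does not show what it checks. The checks below are our own guess at what "some sanity checks" could look like for a URL/LEAK candidate. They are not the plugin's actual implementation. The tag must match, and the block count (the Length field) must be non-zero and below an arbitrary cap. The URL offset must land after the fixed fields and inside the record, and the first bytes of the location must be printable.

```python
import struct

BLOCK_SIZE = 0x80   # records occupy whole 128-byte blocks
MAX_BLOCKS = 0x100  # arbitrary upper bound for illustration only
FIXED_END = 0x4C    # end of the fixed fields (DataSize ends at 0x48 + 4)


def looks_like_url_record(record: bytes) -> bool:
    """Hypothetical sanity checks for a "URL " or "LEAK" candidate."""
    if len(record) < FIXED_END or record[:4] not in (b"URL ", b"LEAK"):
        return False
    (num_blocks,) = struct.unpack_from("<I", record, 0x04)
    (url_offset,) = struct.unpack_from("<I", record, 0x34)
    if not 0 < num_blocks <= MAX_BLOCKS:
        return False
    # The location must start after the fixed fields and inside the record.
    if not FIXED_END <= url_offset < num_blocks * BLOCK_SIZE:
        return False
    # Its first few bytes should be printable ASCII.
    preview = record[url_offset:url_offset + 8]
    return len(preview) > 0 and all(0x20 <= b < 0x7F for b in preview)


# Synthetic two-block record pointing at a URL at offset 0x68.
good = bytearray(2 * BLOCK_SIZE)
good[0:4] = b"URL "
struct.pack_into("<I", good, 0x04, 2)
struct.pack_into("<I", good, 0x34, 0x68)
url = b"http://www.example.com/\x00"
good[0x68:0x68 + len(url)] = url

print(looks_like_url_record(bytes(good)))               # plausible record
print(looks_like_url_record(b"URL " + b"\xff" * 0x7C))  # garbage after the tag
```

Cheap checks like these run before any strings are read, which keeps false "URL " hits in random memory from reaching the output.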

Using the IEHistory Plugin

The plugin has two rendering options. The default "text" mode will output blocks of data - one for each cache hit. For example:

$ ./vol.py -f win7_x64.dmp --profile=Win7SP0x64 iehistory -p 2580,3004
Volatile Systems Volatility Framework 2.3_alpha
**************************************************
Process: 2580 iexplore.exe
Cache type "URL " at 0x275000
Record length: 0x100
Location: Cookie:admin@go.com/
Last modified: 2011-04-24 03:53:15 
Last accessed: 2011-04-24 03:53:15 
File Offset: 0x100, Data Offset: 0x80, Data Length: 0x0
File: admin@go[1].txt

[snip] 
**************************************************
Process: 2580 iexplore.exe
Cache type "URL " at 0x266500
Record length: 0x180
Location: https://ieonline.microsoft.com/ie/known_providers_download_v1.xml
Last modified: 2011-03-15 18:30:43 
Last accessed: 2011-04-24 03:48:02 
File Offset: 0x180, Data Offset: 0xac, Data Length: 0xd0
File: known_providers_download_v1[1].xml
Data: HTTP/1.1 200 OK
Content-Length: 49751
Content-Type: text/xml

~U:admin

As you can see, the cache information can be quite interesting. However, it can also be verbose, so you may want to try the CSV option and open the output in a spreadsheet for sorting/filtering.

$ ./vol.py -f win7_x64.dmp --profile=Win7SP0x64 iehistory -p 2580,3004 --output=csv
Volatile Systems Volatility Framework 2.3_alpha
URL ,2011-04-24 03:53:15,2011-04-24 03:53:15,Cookie:admin@go.com/
URL ,2010-03-25 09:42:43,2011-04-24 04:04:46,http://www.google.com/favicon.ico
URL ,2010-08-10 00:03:00,2011-04-24 04:05:01,http://col.stc.s-msn.com/br/gbl/lg/csl/favicon.ico
URL ,2006-12-13 01:02:33,2011-04-24 04:05:08,http://www.adobe.com/favicon.ico
URL ,2011-03-15 18:30:43,2011-04-24 03:48:02,https://ieonline.microsoft.com/ie/known_providers_download_v1.xml
URL ,2010-08-30 15:37:13,2011-04-24 04:05:10,http://www.cnn.com/favicon.ie9.ico
[snip]
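If you post-process records yourself rather than using --output=csv, let a CSV library handle quoting. Query strings can contain commas, and naive string joining would split such a URL across columns. This Python sketch mirrors the four columns visible above (tag, last modified, last accessed, location). The column order is read off the sample output, and we haven't checked how the plugin itself quotes fields.

```python
import csv
import io
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def records_to_csv(rows) -> str:
    """Render (tag, last_modified, last_accessed, location) tuples as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for tag, modified, accessed, location in rows:
        # csv.writer quotes a field only when it contains a comma, quote or newline.
        writer.writerow([tag, modified.strftime(TS_FORMAT),
                         accessed.strftime(TS_FORMAT), location])
    return out.getvalue()


print(records_to_csv([
    ("URL ", datetime(2011, 4, 24, 3, 53, 15), datetime(2011, 4, 24, 3, 53, 15),
     "Cookie:admin@go.com/"),
]), end="")
```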

Note: as explained in the libmsiecf index.dat format reference, the timestamps may be UTC or localtime depending on whether the record is found in a global, weekly, or daily history file. The caveat to scanning for individual record tags is that there's no backward link to the containing history file's header, so you can't easily determine whether UTC or localtime is correct. Currently we use UTC for everything, and we may find a better fix sometime before the 2.3 release.

Malicious Code Example 

I randomly chose one of Hogfly's public memory images (exemplar 17) to test the plugin against and was rather pleased with what I found. Remember earlier when we discussed the fact that processes other than IE will have parts of the cache/history in memory? This certainly proves the point. Take a look, and notice that we don't specify any --pid, so the plugin scans the memory of all processes. You'll see hits in both explorer.exe (Windows Explorer) and a strange process named 15103.exe with PID 1192.

$ python vol.py -f exemplar17_1.vmem iehistory
Volatile Systems Volatility Framework 2.3_alpha
**************************************************
Process: 1928 explorer.exe
Cache type "URL " at 0xf25100
Record length: 0x100
Location: Visited: foo@http://192.168.30.129/malware/40024.exe
Last modified: 2009-01-08 01:52:09 
Last accessed: 2009-01-08 01:52:09 
File Offset: 0x100, Data Offset: 0x0, Data Length: 0xa0
**************************************************
Process: 1928 explorer.exe
Cache type "URL " at 0xf25300
Record length: 0x180
Location: Visited: foo@http://www.abcjmp.com/jump1/?affiliate=mu1&subid=88037&terms=eminem&sid=Z605044303%40%40wMfNTNxkTM1EzX5QzNy81My8lM18FN4gTM2gzNzITM&a=zh5&mr=1&rc=0
Last modified: 2009-01-08 01:52:44 
Last accessed: 2009-01-08 01:52:44 
File Offset: 0x180, Data Offset: 0x0, Data Length: 0x108
**************************************************
Process: 1192 15103.exe
Cache type "URL " at 0xf56180
Record length: 0x180
Location: http://fhg-softportal.com/promo.exe
Last modified: 2009-03-23 16:14:17 
Last accessed: 2009-01-08 01:52:15 
File Offset: 0x180, Data Offset: 0x8c, Data Length: 0x9c
File: promo[1].exe
Data: HTTP/1.1 200 OK
ETag: "8554be-6200-49c7b559"
Content-Length: 25088
Content-Type: application/x-msdownload

~U:foo
**************************************************
Process: 1192 15103.exe
Cache type "URL " at 0x9d5000
Record length: 0x100
Location: :2009010720090108: foo@http://192.168.30.129/malware/40024.exe
Last modified: 2009-01-07 20:52:09 
Last accessed: 2009-01-08 01:52:09 
File Offset: 0x100, Data Offset: 0x0, Data Length: 0x0
**************************************************
Process: 1192 15103.exe
Cache type "URL " at 0x9d5100
Record length: 0x100
Location: :2009010720090108: foo@:Host: 192.168.30.129
Last modified: 2009-01-07 20:52:09 
Last accessed: 2009-01-08 01:52:09 
File Offset: 0x100, Data Offset: 0x0, Data Length: 0x0
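Several location strings above carry structure of their own: "Visited: user@url", "Cookie:user@host/", and ":YYYYMMDDYYYYMMDD: user@url". We assume the last form is the date range of a daily or weekly history file. That is exactly the case where the timestamp note above says values may be localtime. Suggestively, the two such records from 15103.exe show modified and accessed times exactly five hours apart, although we haven't verified the cause. The hypothetical helper below just splits these forms apart so they can be sorted or filtered. The prefixes are taken from this output, and other IE versions may use others.

```python
import re
from datetime import date, datetime

# ":YYYYMMDDYYYYMMDD: " prefix seen on some locations in the output above.
RANGE_RE = re.compile(r":(\d{8})(\d{8}): (.*)", re.DOTALL)


def _day(s: str) -> date:
    return datetime.strptime(s, "%Y%m%d").date()


def classify_location(loc: str) -> dict:
    """Split an index.dat location into kind, user and target (hypothetical helper)."""
    m = RANGE_RE.fullmatch(loc)
    if m:
        start, end, rest = m.groups()
        user, _, target = rest.partition("@")
        return {"kind": "history-range", "start": _day(start), "end": _day(end),
                "user": user, "target": target}
    for prefix, kind in (("Visited: ", "visited"), ("Cookie:", "cookie")):
        if loc.startswith(prefix):
            user, _, target = loc[len(prefix):].partition("@")
            return {"kind": kind, "user": user, "target": target}
    return {"kind": "url", "user": None, "target": loc}


print(classify_location(":2009010720090108: foo@http://192.168.30.129/malware/40024.exe"))
print(classify_location("Visited: foo@http://192.168.30.129/malware/40024.exe"))
```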

Brute Forcing URL Scans

Volatility can be as thorough as you want it to be. That said, there are a few situations we haven't discussed. For example, what if you're looking for URLs that simply sit in process memory (e.g. embedded in a web page but not yet visited, in JavaScript code, in a SWF string, etc.)? IE history files are also known to have "slack space", where old records with long URLs may be overwritten by new records with shorter URLs, leaving part of the original domains intact. Furthermore, what about browsers that store history in different formats, like Firefox and Chrome?

In the above cases, you can always search for URLs in a more forceful, if unstructured, manner. If you don't already have a favorite PCRE for finding domains, IPs, and URLs, try some of the ones on http://regexlib.com/Search.aspx?k=URL (some may not work with Yara). For testing purposes, I just grabbed the first one on the list:

$ ./vol.py -f win7_x64.dmp --profile=Win7SP0x64 yarascan -p 3004 -Y "/[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|biz|name|info)/"
Volatile Systems Volatility Framework 2.3_alpha

Rule: r1
Owner: Process iexplore.exe Pid 3004
0x003e90dd  77 77 77 2e 72 65 75 74 65 72 73 2e 63 6f 6d 2f   www.reuters.com/
0x003e90ed  61 72 74 69 63 6c 65 2f 32 30 31 31 2f 30 34 2f   article/2011/04/
0x003e90fd  32 34 2f 75 73 2d 73 79 72 69 61 2d 70 72 6f 74   24/us-syria-prot
0x003e910d  65 73 74 73 2d 69 64 55 53 54 52 45 37 33 4c 31   ests-idUSTRE73L1
0x003e911d  53 4a 32 30 31 31 30 34 32 34 22 20 69 64 3d 22   SJ20110424".id="
0x003e912d  4d 41 41 34 41 45 67 42 55 41 4a 67 43 47 6f 43   MAA4AEgBUAJgCGoC
0x003e913d  64 58 4d 22 3e 3c 73 70 61 6e 20 63 6c 61 73 73   dXM"><span.class
0x003e914d  3d 22 74 69 74 6c 65 74 65 78 74 22 3e 52 65 75   ="titletext">Reu

Rule: r1
Owner: Process iexplore.exe Pid 3004
0x00490fa0  77 77 77 2e 62 69 6e 67 2e 63 6f 6d 2f 73 65 61   www.bing.com/sea
0x00490fb0  72 63 68 3f 71 3d 6c 65 61 72 6e 2b 74 6f 2b 70   rch?q=learn+to+p
0x00490fc0  6c 61 79 2b 68 61 72 6d 2b 31 11 3a 87 26 00 88   lay+harm+1.:.&..
0x00490fd0  00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00   ................
0x00490fe0  d8 50 0b 09 00 00 00 00 00 00 00 00 00 00 00 00   .P..............
0x00490ff0  00 00 00 00 3e 46 69 6e 5d c7 37 4e 20 00 00 00   ....>Fin].7N....
0x00491000  40 10 49 00 00 00 00 00 00 00 00 00 00 00 00 00   @.I.............
0x00491010  01 00 00 00 63 61 3c 2f 63 00 6f 00 6e 00 74 00   ....ca</c.o.n.t.
[snip]
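Outside of Volatility, the same brute-force idea can be applied to a dumped region with Python. Two adjustments are worth noting. First, the group in the regexlib pattern is made non-capturing, so that a match returns the whole domain rather than just the TLD. Second, Windows strings are often UTF-16LE (note the "c.o.n.t." bytes in the last hit above), so the sketch also decodes the buffer as UTF-16LE at both byte alignments. This is a rough sketch: expect false positives. The TLD list is simply the one from the example pattern.

```python
import re

TLDS = "com|org|net|mil|edu|biz|name|info"
# The regexlib pattern from the yarascan example, with a non-capturing group.
ASCII_RE = re.compile(rb"[a-zA-Z0-9\-\.]+\.(?:" + TLDS.encode() + rb")")
TEXT_RE = re.compile(r"[a-zA-Z0-9\-\.]+\.(?:" + TLDS + r")")


def find_domains(data: bytes) -> set:
    """Return domain-like strings found as ASCII or UTF-16LE text in `data`."""
    found = {m.group(0).decode("ascii") for m in ASCII_RE.finditer(data)}
    for start in (0, 1):
        # Wide strings may start on an even or odd byte, so try both alignments.
        # Undecodable units (e.g. a trailing odd byte) become U+FFFD.
        text = data[start:].decode("utf-16-le", errors="replace")
        found.update(m.group(0) for m in TEXT_RE.finditer(text))
    return found


sample = (
    b"<a href=\"http://www.reuters.com/article/2011/04/24\">"
    + b"\x00" * 3
    + "www.bing.com/search?q=test".encode("utf-16-le")
)
print(sorted(find_domains(sample)))
```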

Conclusion 

Whether you're searching for complex data structures, simple strings, regular expressions, or byte patterns, Volatility can do what you want (if you learn to use it right), or it can easily be programmed for your needs. In this post we discussed how to find and parse Internet history records, a task we normally wouldn't prioritize above some of the other really exciting and innovative things going on in our labs. However, we know Volatility users will enjoy some extra tricks and information on plugin development. Learn by example, become a power user, then spread the knowledge!
