Friday 14 December 2012

Cheeky Season's Greetings


Today I thought I would do a brain-dump of some things/tips I've done/encountered since starting my internship about 6 weeks ago.
Hopefully some of it will be useful to others but at the very least it will help ensure I don't forget stuff (an insidious affliction affecting 2 out of 3 older monkeys).
But before I get to that, I'd like to thank everyone who has helped me learn more about DFIR over the past year or so. This includes the generous DFIRers that have responded to my emails/Tweets or shared their knowledge (via blog, book, presentation or those unlucky enough to have a socially awkward meeting with me). Hopefully, I have also thanked you individually in an email/Tweet at some point. Mother monkey did not raise an ungrateful bastard - so if I have forgotten to thank you I apologise.
When I started this blog, I was hoping it might help me connect with some other newbies. But what I've since realised is that as DFIR constantly changes, everyone is a potential newbie and having the newbie curiosity is a pre-requisite. Perhaps that is why DFIRers seem so helpful/approachable?
Originally, it was also more of an excuse to draw crude (in every sense of the word) cartoons. But somehow it ended up becoming more technical - hopefully the picture above helps re-balance the universe somewhat.
Anyway, enough of the touchy-feely stuff - on to the forensics!

Bing Bar Parser

Based heavily on the work of others (*cough rip off *cough), I wrote a "bing-bar-parser.pl" Perl script to parse Bing Bar history files.
The Bing Bar is Microsoft's attempt at combining search, Hotmail and other stuff into an IE toolbar. I can't say how popular it is in the age of Google but just in case, here's a tool. This script is intended to provide a Perl based method to view a user's Bing Bar search terms as stored in "searchhs.dat". By writing it in Perl I hoped to both improve my Perl programming and provide a cross-platform tool for fellow SIFT/Linux users.

Specifically, the script was based on the following sources:
- Mari DeGrazia's comprehensive blog post on interpreting the Bing Bar's "searchhs.dat" file.
- The "processBingToolbar" function from the "sep-history-viewer" C# Google code project by "ForensicGeek InTheCorner"
- The SystemTime 128-bit data structure as defined by Microsoft (see the sketch after this list).
- Mark Stosberg's blog on Percent-encoding URIs in Perl.
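
As an aside, the SYSTEMTIME structure is just eight little-endian 16-bit values (year, month, day-of-week, day, hour, minute, second, millisecond), so unpacking it in Perl is straightforward. Here's a minimal hedged sketch - the field layout is per Microsoft's documentation, but the variable names are my own and not necessarily what the script uses:

#!/usr/bin/perl -w
# Minimal sketch: decode a 16 byte (128 bit) SYSTEMTIME blob into a date string.
# Assumes the standard Microsoft field order and little-endian storage.
use strict;

sub systemtime_to_string
{
    my $raw = shift; # 16 bytes read from the file

    # "v8" = eight unsigned 16-bit little-endian values
    my ($year, $month, $dayofweek, $day, $hour, $min, $sec, $ms) = unpack("v8", $raw);

    return sprintf("%04d-%02d-%02d %02d:%02d:%02d", $year, $month, $day, $hour, $min, $sec);
}

# Fabricated example blob for 2012-12-14 (a Friday) 16:06:36
my $example = pack("v8", 2012, 12, 5, 14, 16, 6, 36, 0);
print systemtime_to_string($example), "\n";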

Note: According to the "sep-history-viewer" page, the Bing Bar's "searchhs.dat" file is typically located for XP in:
\Documents and Settings\<user>\Local Settings\Application Data\Microsoft\Search Enhancement Pack\Search Box Extension

For Vista / Win7 it is located in:
\Users\<user>\AppData\LocalLow\Microsoft\Search Enhancement Pack\Search Box Extension

Also note that for a Win7 system, an additional searchhs.dat file (with URI encoding) has been found in:
\Users\<user>\AppData\Local\Microsoft\BingBar\Apps\Search_6f21d9007fa34bc78d94309126de58f5\VersionIndependent

You can use the "bing-bar-parser.pl" script's -d option to decode the URI encoding to something more readable (eg convert %20 to a space character). For the help hints, type "bing-bar-parser.pl -h".
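
For reference, percent-decoding in Perl usually boils down to a one-liner. Here's a rough sketch (not necessarily what "bing-bar-parser.pl" does internally) using the URI::Escape module from CPAN, plus the equivalent hand-rolled regex:

#!/usr/bin/perl -w
# Rough sketch of URI percent-decoding (eg "%20" becomes a space).
use strict;
use URI::Escape;

my $encoded = "monkeys%20on%20typewriters";   # hypothetical search term
my $decoded = uri_unescape($encoded);
print "$decoded\n";                           # prints "monkeys on typewriters"

# The equivalent hand-rolled regex:
(my $decoded2 = $encoded) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
print "$decoded2\n";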

Here's a fictional example script output (fields are pipe separated):

.\bing-bar-parser.pl -f /cases/searchhs.dat -d

Running bing-bar-parser.pl v2012-11-10

File Header field check (should be "FACADE0"): FACADE0
Bing Bar File Version: 1
Record ID for next new record: 341
Number of Records in this file: 335

Record id: 1 | Search Count: 2 | Last Search Time (UTC) = 2012-12-14 Fri 16:06:36 | Term: monkeys
... [ bunch of records edited out ]
Record id: 340 | Search Count: 7 | Last Search Time (UTC) = 2012-12-15 Sat 01:26:39 | Term: monkeys on typewriters

bing-bar-parser.pl v2012-11-10 Finished!

Notice that the "Record ID for next new record" does not reflect the actual number of records in the file.

Limited testing has been done on SIFT v2.14 and on 64 bit Windows 7 with ActiveState Perl v5.16. The script seems to parse data in accordance with Mari's blog post.

For more information, please read the comments in the code (you can download it from my google code page here).
Special Thank-yous to Mari, "ForensicGeek InTheCorner" and Mark Stosberg - they saved me a bunch of time!

Viewing previous versions of Websites

 Sometimes forensicators might need to look at previous versions of a website.
The Wayback Machine takes historical snapshots of selected websites and allows users to view them in their former glory.

Google also caches pages. You can view the latest cached page using the URL:
http://webcache.googleusercontent.com/search?q=cache:http://nfl.com

where http://nfl.com is the URL in question.
Images/animations may not load so you might have to click on the "Text-only version" link provided.
Incidentally, this cached page is what is loaded when you hover over a Google search result.
The Google cache might be handy if the page has been recently taken offline and has not made it into the Wayback Machine.

Steam Profiles

One of my practice cases involved investigating a PC for communications between a particular user who had "run away" and another party.
I noticed various references to a Steam profile in the Internet History.
Steam is used to download/play online games (eg Call of Duty, Left 4 Dead) and keeps various statistics on:
- if a user is currently online
- a user's playing time (eg 5 hours in the last 2 weeks)
- a user's game achievements
- a user's friends

So I thought it might be worth looking up this user's Steam profile. It's easy enough - go to: http://steamcommunity.com/apps
and type in the user's Steam profile name. If their profile is public, it should be returned by the search.

In my case, I was able to view the profile and saw that they had recently played and had also included their current location in their profile information. Of course, the user could have typed anything in for their current location but at least there was some indication that the user was still active.

Extracting from an SQLite Database

Say you want to extract the information from an SQLite database in CSV format for re-importing into another program. Here's one way of doing it courtesy of The Definitive Guide to SQLite (2nd ed., 2010) by Allen and Owens ...

1. If required, install the latest "sqlite" command line shell from here.
Note: SIFT already has a version of "sqlite" installed.

2. Run the "sqlite" command with the input sqlite file as an argument. For example type:
"sqlite cookies.sqlite"

3. (Optional) Type ".schema" to list the tables, indexes etc. contained in the given sqlite file.

4. Type ".separator ," to change the default separator to a comma.

5. Type ".output ffcookies.csv" to specify an output file (eg "ffcookies.csv").

6. Run your query for the data that you're interested in. For example type:
"select * from moz_cookies;"

7. Type ".quit" to quit the "sqlite" command line shell.

The query results should now be CSV formatted in the nominated output file.
Alternatively, you can use the SQLite Manager Firefox plugin to open an sqlite file and export it as CSV.
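
And if you'd rather script the export (say, for lots of databases), here's a rough Perl DBI sketch that does much the same thing. The database/table/output names are just the examples from above, and the CSV quoting is deliberately naive:

#!/usr/bin/perl -w
# Rough sketch: dump a table from an SQLite database to CSV using Perl DBI.
use strict;
use DBI;

my $dbfile  = "cookies.sqlite";     # example input database (as above)
my $outfile = "ffcookies.csv";      # example output file

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", "", "", { RaiseError => 1 });
my $sth = $dbh->prepare("SELECT * FROM moz_cookies");
$sth->execute();

open(my $out, ">", $outfile) or die "Can't open $outfile : $!";
print $out join(",", @{ $sth->{NAME} }), "\n";    # header row of column names

while (my @row = $sth->fetchrow_array)
{
    # naive CSV - fields containing commas/quotes would need proper escaping (eg via Text::CSV)
    print $out join(",", map { defined($_) ? $_ : "" } @row), "\n";
}

close($out);
$dbh->disconnect;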

So that's all folks! Thanks for reading/commenting over the past year and here's wishing you a Happy Holiday Season (and a Happy New Year too if I don't post anything in the near future).

Sunday 21 October 2012

Thoughts on Intern Monkeys


I apologise for the long break between posts. I've been doing some renovation work and my well of ideas seems to have run dry. In an attempt to kickstart some creativity, I recently contacted some people to volunteer my limited testing services. Even though I didn't end up testing much, one of the parties (let's call them an "Anonymous Benefactor") offered me an unpaid remote internship. It has the potential to help both of us - I get some actual hands-on experience and they get a (hopefully timesaving) research monkey.
So this got me to thinking that a post on internship issues could prove useful to my fellow noobs and/or a prospective employer who is considering taking an intern on.

Duties
Both parties should agree on what tasks the intern is expected to do before they commence. For example, the intern will conduct supervised forensic exams on cases, or subsets of data provided by a senior examiner. This may include ongoing cases, past cases, or simulated cases. Analysis may include, but is not limited to Windows registry files, Internet History, keyword searches, timeline construction, data carving and data recovery. Other duties may include report review and writing, research and testing, and script/programming development.

Position Details / Terms and Conditions of Internship
Some other issues which could be addressed include:
Timeframe: List the Start and End date (if applicable/known).
Working Hours: Is the internship part-time/full-time? It might be helpful to list the maximum number of hours per week expected. Time zone differences should also be taken into account for remote internships.
Location: Can the duties be performed remotely (ie via the Internet) or is the intern required on site/to travel?
Scheduling: Agree on how work is assigned, what to do if a deadline is unachievable etc.
Remuneration: Spell out if it's an unpaid internship and whether or not there is a promise of future employment.
Termination: State the agreed period of notice, if both parties can terminate and what happens to any relevant data/hardware/software after termination (eg gets returned/wiped).
Liability: State who is legally responsible for the intern's work. For example, the intern's work will be verified/reviewed before being used in a report. Any liability then remains with the employer.
Travel costs: Obviously this is more of an issue with remote internships. Should the intern be required to travel / testify in court, both parties should agree beforehand on who will pay for reasonable travel costs.
Equipment: Both parties should agree on what hardware/software will be provided by the intern and what hardware/software will be supplied by the employer. They should also agree on what happens to data/software/hardware when the internship ends. One potential requirement which might surprise a new intern is that analysis computer(s) must not be connected to the Internet whilst the intern is working with client data. Separate PCs and/or use of Virtual Machines could assist with this requirement.
Software Authorship: If the intern writes a script/program during the internship, do they own it? Or does the employer?
Blogging: If the intern wishes to blog about something they learned/observed, is it OK? Employers should be given the chance to review/approve any content which could potentially disclose confidential information.
Additional work for 3rd parties: Can the intern perform tasks for other parties (eg beta testing)? The employer might want final say just in case the 3rd party is a potential competitor.

Confidentiality
Obviously, the employer is trusting the intern not to disclose sensitive data but if Johnny/Janet Law comes knocking, the intern should be aware that they are obligated to obey any lawful orders. Some orders may even prohibit the intern from notifying their employer.
As an example of a confidentiality clause - the intern is not to disclose any confidential information (eg client data, the employer's business data) unless with the employer's consent or as required by law.

Non compete
Address any restrictions on who the intern can work for after their internship ends. This could prove difficult to agree on because the intern will likely be interested in any/all potential offers. Limiting the intern's knowledge of the employer's business practices (eg client list, pricing list) could be one strategy to reduce an intern's ability to "compete" in the future. A remote internship is also less likely to result in the need for a non-compete agreement.

Applicable Labour Laws
This will vary between states/countries. I am not a lawyer so please don't rely on this monkey's ramblings - seek your own legal advice! Some things you may need to consider - term limits, start/end dates, which party is benefiting more (it should probably be the intern).
In general, I think most governments realise that unpaid internships are a good thing (especially in the current economy). As long as everyone agrees to what is expected of them, then there should be no need for lawyers. To minimise any surprises, spell out as much as you think relevant in any internship agreement. It may take a little longer to get started, but it really should be worth the effort.

Final Thoughts
As an intern, my attitude should be to learn as much as possible and to protect the employer's interests. The employer is doing me a favour, so I should treat them accordingly. By addressing the above issues before the internship starts, both parties can then focus on the forensic work at hand.
If you have any questions/thoughts, please leave a comment. Just note my "Anonymous Benefactor" is not seeking any other interns at this time. So please don't ask for their contact details!
I'd like to finish off by thanking a few forensic friends for sharing their valuable thoughts about internships - Mari DeGrazia, Syd Pleno and Carl House.
Hopefully, I will be able to post something new and more forensically interesting soon ...

Wednesday 23 May 2012

A Perl script plays Matchmaker with ExifTool and SQLite


Imagine their cute little offspring!
- Query-able metadata = The ability to sort/query by several metadata fields (not just time).
- Metadata from a well established/tested source library (ExifTool)
- SQLite cross platform compatibility
- And all available for the one low price on SIFT (free!)

Being Perl, it won't be as fast as a compiled language like C/C++.
Also, at this time only .doc, .docx, .xls, .xlsx, .ppt, .pptx, .pdf and .jpg files are supported.
As a result, I'm pretty sure you wouldn't want to blindly point it at "C:\" as it could take a while/use a lot of memory.
Instead, say you found a directory of interest (eg a suspect's "naughty" directory). You could extract the metadata to the database, take a hash of the database and then send it on to another analyst for further analysis/reporting. Or you could analyse it yourself using standard SQL queries from either the "sqlite3" client or the Firefox "SQLite Manager" plugin.

As far as I know, this capability does not currently exist (at least for open source) so I think this script could prove handy for several different types of investigations. For example: fraud, e-discovery processing/culling, processing jpgs (eg exploitation).

Having a warped sense of humour, I'm calling this extraction tool "SquirrelGripper".
In Australia, to grab a bloke by his "nuts" (accidentally or on purpose or sometimes both) is also known as "the Squirrel Grip". This has been known to happen in the heat of battle/whilst tackling an opponent during a Rugby or Aussie Rules match. Thankfully, I have never had this done to me - I'd like to think I was just too quick but maybe they just couldn't find them LOL.
The idea behind the name is that "SquirrelGripper" (aka "SG") will lead analysts to the low hanging fruit right at the crux of the matter ;)
In polite company (eg client briefing), one could say that "SquirrelGripper" finds all the little nuts/nuggets of information.
By running this script an analyst will hopefully save time. Rather than having to launch the ExifTool exe multiple times for different files/metadata, they can launch this script once and then perform SQL queries for retrieving/presenting their metadata of interest.

Whilst developing this tool, I contacted some more experienced DFIR'ers for feedback. So I'd just like to say a big "ralphabetical" Thankyou to Brad Garnett, Cindy Murphy, Corey Harrell, Gerald Combs, "Girl, Unallocated" and Ken Pryor. Sorry for clogging your inboxes with my semi-organised/semi-chaotic thoughts. Having an audience for my email updates provided both structure and motivation for my thoughts.

An extra bunch of thanks to Corey Harrell (does this guy ever rest?) who was both my testing "guinea pig" and who also made some excellent suggestions. Most, if not all of his suggestions were incorporated into this latest version of SG. I can't wait to see/read how he will use it in his investigations.

Enough already! Let's get started ...

The Design

As SquirrelGripper has grown to over 1000 lines, I have decided to provide the code via my new Google Code project ("cheeky4n6monkey") and just give a general description in this blog post. Like all my scripts, it is provided "as is" and whilst I have done my best, there may still be bugs in the code. As always, perform your own validation testing before using it live.

We'll start by describing how SquirrelGripper will store metadata.
Currently, there are 9 tables in the created database - FileIndex, XLSXFiles, XLSFiles, DOCXFiles, DOCFiles, PPTXFiles, PPTFiles, PDFFiles, JPEGFiles.

FileIndex is the main index table and contains the following fields:
-"AbsFileName" (primary key) = Absolute path and filename. Used as a common key to FileIndex and other file specific tables.
-"DateTimeMetaAdded" = Date and time SquirrelGripper extracted the metadata for a file.
-"FileType" = ExifTool field indicating file type. We can use this to determine which table should contain the remaining file metadata.
-"CaseRef" = Mandatory User specified case tag string.
-"UserTag" = Optional User specified tag string. Can be used for labelling directories/files for a particular SG launch.

The 8 other tables will be used to record file type specific metadata (eg "Slides" in PPTXFiles = Number of Slides for a .pptx file). I decided to use separate tables for each file type because there are simply too many fields for a single table (over 100 in total).

Most of the metadata fields are stored as TEXT (strings) in the database. Some obvious integer fields are stored as INTs (eg Words, Slides, FileSize). GPS co-ordinates from tagged JPEGs are stored in both TEXT and REAL (decimal) format. If in doubt (eg ExifToolVersion), I made the script store the data as TEXT.
Dates are stored as TEXT in the database and I have tested that they can be processed by SQLite functions. For example, "julianday(CreateDate)" works OK. I have also been able to retrieve/sort records by date.

In the interests of everyone's sanity (including my own) I won't detail any more of the schema. You can see them yourself in the code by searching for the "CREATE TABLE" strings.
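
To give you a flavour without reproducing the whole schema, the FileIndex creation probably looks something like the following hedged sketch (the field names are as described above, but the exact SQL in the script may differ):

#!/usr/bin/perl -w
# Hedged sketch of the FileIndex table creation - see the actual script for the real schema.
use strict;
use DBI;

# connect to (or create) the user specified database
my $dbh = DBI->connect("dbi:SQLite:dbname=nutz2u.sqlite", "", "", { RaiseError => 1 });

# FileIndex holds one row per processed file; AbsFileName is the common key
$dbh->do("CREATE TABLE IF NOT EXISTS FileIndex (
    AbsFileName TEXT PRIMARY KEY,
    DateTimeMetaAdded TEXT,
    FileType TEXT,
    CaseRef TEXT,
    UserTag TEXT)");

$dbh->disconnect;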

The code is structured quite similarly to "exif2map.pl". We are using the File::Find package to recursively search for the handled filetypes and then we call Image::ExifTool's "GetFoundTags" and "GetInfo" functions to extract the metadata. Depending on the extracted "FileType" tag, we call the appropriate handling function (eg "ProcessPPT"). These handling functions will read selected fields and if a field is defined, the function will insert/replace the metadata into the user specified SQLite database. If the metadata does not exist, a "Not Present" string will be inserted into the table instead.
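
In skeleton form (heavily simplified, and with made-up handler logic) the flow looks something like this - the real script does considerably more (argument handling, tagging, DBI inserts):

#!/usr/bin/perl -w
# Simplified skeleton of the File::Find + Image::ExifTool flow described above.
use strict;
use File::Find;
use Image::ExifTool;

my @dirs = ("/home/sansforensics/squirrel-testdata/subdir1");   # example target dir
my $exiftool = new Image::ExifTool;

find({ wanted => \&ProcessFile, no_chdir => 1 }, @dirs);

sub ProcessFile
{
    return unless (-f $File::Find::name);
    return unless ($File::Find::name =~ /\.(docx?|xlsx?|pptx?|pdf|jpe?g)$/i);

    $exiftool->ExtractInfo($File::Find::name) or return;
    my @tags = $exiftool->GetFoundTags('Group0');
    my $info = $exiftool->GetInfo(\@tags);

    # FileType decides which handler/table the metadata belongs to
    my $filetype = defined($info->{'FileType'}) ? $info->{'FileType'} : "Unknown";
    print $File::Find::name." => ".$filetype."\n";
    # ... a real handler (eg ProcessPDF) would now insert selected $info fields via DBI
}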

Selections of metadata fields were made by throwing darts ... er, by estimating which of the fields would be most likely to hold information of forensic interest.
For example, MS Office 2007+ files have a variable list of "Zip" fields which I thought was of limited value so the script doesn't store these. However, the main fields such as "FileName", "CreateDate", "FileModifyDate" etc. are all included.
Having said that, if you think SquirrelGripper should include a particular field that I've left out, please let me know.

Installing SquirrelGripper

The Image::ExifTool Perl package is already installed on SIFT. We have previously used it in our "exif2map.pl" script. However due to some internal label changes to ExifTool, we need to grab the latest version (v8.90) for our script. We can do this on SIFT by typing: "sudo cpan Image::ExifTool".

The last few posts have also detailed our travels with Perl and SQLite. We have been using the DBI Perl package to interface with SQLite databases. So if you haven't already done it, grab the latest DBI package from CPAN by typing: "sudo cpan DBI".

Next, you can download/unzip/copy "squirrelgripper.pl" (from here) to "/usr/local/bin" and make it executable (by typing "sudo chmod a+x /usr/local/bin/squirrelgripper.pl")

Now you should be ready to run SquirrelGripper on SIFT in all its glory.

To run on Windows, install ActiveState Perl and use the Perl Package Manager to download the ExifTool package (v8.90). DBI should already be installed. Next, copy the "squirrelgripper.pl" script to the directory of your choice.
You should now be able to run SG at the command prompt by typing something like:
"perl c:\squirrelgripper.pl -newdb -db nutz2u.sqlite -case caseABC -tag fraud-docs -dir c:\squirrel-testdata\subdir1"
See the next section for what each of the arguments means.


Running SquirrelGripper

For this test scenario, I have various .doc, .docx, .xls, .xlsx, .ppt, .pptx, .pdf files in the "/home/sansforensics/squirrel-testdata/subdir1" directory.
I have also copied various .jpg files to the "/home/sansforensics/squirrel-testdata/subdir2" directory.

It is assumed that a new database will be created for each case. However, the same database can be also used for multiple iterations of the script. Just FYI - you can get a help/usage message by typing "squirrelgripper.pl -h"

The script recursively searches thru sub-directories so please ensure you've pointed it at the right level before launching. It is also possible to mark different sub-directories with different case tags. eg Launch script with one directory using the case tag "2012-04-18-caseA-companyA". Then launch the script a second time pointing to another directory using the case tag "2012-04-18-caseA-companyB". SG can also handle multiple -dir arguments in case you need to extract data from more than one directory (eg "-dir naughty/pics -dir naughty/docs"). If a "-tag" argument is also specified, it will apply to files from both directories.

The first example uses "-newdb" to create the "nutz2u.sqlite" database in the current directory. It also tags all "subdir1" files with the "fraud-docs" user tag (you can see the "UserTag" value in the "FileIndex" table). Currently, the "-db", "-case" and "-dir" arguments are mandatory.
Note: the -dir directory can be an absolute path or a relative one.

squirrelgripper.pl -newdb -db nutz2u.sqlite -case caseABC -tag fraud-docs -dir /home/sansforensics/squirrel-testdata/subdir1/

The output looks like:


squirrelgripper.pl v2012.05.18
Assuming /home/sansforensics/squirrel-testdata/subdir1/ is an absolute path
Directory entry for processing = /home/sansforensics/squirrel-testdata/subdir1/

Now processing /home/sansforensics/squirrel-testdata/subdir1/excel2k7-Book1.xlsx
/home/sansforensics/squirrel-testdata/subdir1/excel2k7-Book1.xlsx inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/excel2k7-Book1.xlsx inserted into XLSXFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/doj-forensic-examination-dig-evidence-law-enforcers-guide-04-199408.pdf
/home/sansforensics/squirrel-testdata/subdir1/doj-forensic-examination-dig-evidence-law-enforcers-guide-04-199408.pdf inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/doj-forensic-examination-dig-evidence-law-enforcers-guide-04-199408.pdf inserted into PDFFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/word2k7.docx
/home/sansforensics/squirrel-testdata/subdir1/word2k7.docx inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/word2k7.docx inserted into DOCXFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/08-069208CaseReport.doc
/home/sansforensics/squirrel-testdata/subdir1/08-069208CaseReport.doc inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/08-069208CaseReport.doc inserted into DOCFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/acme_report.pdf
/home/sansforensics/squirrel-testdata/subdir1/acme_report.pdf inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/acme_report.pdf inserted into PDFFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/Windows Passwords Master 1.5 Handout - Jesper Johansson.ppt
/home/sansforensics/squirrel-testdata/subdir1/Windows Passwords Master 1.5 Handout - Jesper Johansson.ppt inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/Windows Passwords Master 1.5 Handout - Jesper Johansson.ppt inserted into PPTFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/excel2k7-Book2.xlsx
/home/sansforensics/squirrel-testdata/subdir1/excel2k7-Book2.xlsx inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/excel2k7-Book2.xlsx inserted into XLSXFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir1/Powerpoint2k7.pptx
/home/sansforensics/squirrel-testdata/subdir1/Powerpoint2k7.pptx inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir1/Powerpoint2k7.pptx inserted into PPTXFiles table
sansforensics@SIFT-Workstation:~$



The second call assumes the "nutz2u.sqlite" database already exists and tags all "subdir2" files with the "fraud-pics" tag.

squirrelgripper.pl -db nutz2u.sqlite -case caseABC -tag fraud-pics -dir /home/sansforensics/squirrel-testdata/subdir2

The output looks like:


squirrelgripper.pl v2012.05.18
Assuming /home/sansforensics/squirrel-testdata/subdir2 is an absolute path
Directory entry for processing = /home/sansforensics/squirrel-testdata/subdir2

Now processing /home/sansforensics/squirrel-testdata/subdir2/GPS2.jpg
lat = 41.888948, long = -87.624494
/home/sansforensics/squirrel-testdata/subdir2/GPS2.jpg : No GPS Altitude data present
/home/sansforensics/squirrel-testdata/subdir2/GPS2.jpg inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir2/GPS2.jpg inserted into JPEGFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir2/Cheeky4n6Monkey.jpg
/home/sansforensics/squirrel-testdata/subdir2/Cheeky4n6Monkey.jpg : No GPS Lat/Long data present
/home/sansforensics/squirrel-testdata/subdir2/Cheeky4n6Monkey.jpg : No GPS Altitude data present
/home/sansforensics/squirrel-testdata/subdir2/Cheeky4n6Monkey.jpg inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir2/Cheeky4n6Monkey.jpg inserted into JPEGFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir2/GPS_location_stamped_with_GPStamper.jpg
lat = 41.888948, long = -87.624494
/home/sansforensics/squirrel-testdata/subdir2/GPS_location_stamped_with_GPStamper.jpg : No GPS Altitude data present
/home/sansforensics/squirrel-testdata/subdir2/GPS_location_stamped_with_GPStamper.jpg inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir2/GPS_location_stamped_with_GPStamper.jpg inserted into JPEGFiles table

Now processing /home/sansforensics/squirrel-testdata/subdir2/wheres-Cheeky4n6Monkey.jpg
lat = 36.1147630001389, long = -115.172811
/home/sansforensics/squirrel-testdata/subdir2/wheres-Cheeky4n6Monkey.jpg : No GPS Altitude data present
/home/sansforensics/squirrel-testdata/subdir2/wheres-Cheeky4n6Monkey.jpg inserted into FileIndex table
/home/sansforensics/squirrel-testdata/subdir2/wheres-Cheeky4n6Monkey.jpg inserted into JPEGFiles table
sansforensics@SIFT-Workstation:~$


SquirrelGripper has also been similarly tested using ActiveState Perl v5.14.2 on Win XP by myself and Corey Harrell has also tested it using ActiveState Perl v5.12 on Windows 7 (32/64). Please note however, that the majority of my testing was done on SIFT v2.12 and Perl 5.10.0.


Once the metadata has been extracted to the SQLite database we can use SQL queries to find specific files of interest. This can be done via the "sqlite3" client and/or more easily via the Firefox "SQLite Manager" plugin.


Extracting SquirrelGripper results

So now we have our SQLite database chock full of metadata - how can we search it?
We need some basic SQLite commands which we can learn here.
W3Schools and Wikipedia also have some useful general information on SQL queries.

For our first example query we will be finding pictures from a particular camera model and ordering the results by date/time of the original.
Building upon the previous section, we have opened the "nutz2u.sqlite" database by using the Firefox "SQLite Manager" plugin on SIFT.
To do this - open Firefox on SIFT, then under the "Tools" menu select "SQLite Manager" to launch it.
Then under the "Database" menu, select "Connect Database", browse to the user specified database (eg "nutz2u.sqlite") and press the "Open" button. Click on the "FileIndex" table tree item on the left hand side and then look under the "Browse & Search" tab. You should now see something like:


nutz2u.sqlite's FileIndex Table

We can see that we've got 12 files of all the different supported file types (see "File Type" column in pic). We can also see the absolute path filename ("AbsFileName"), the Date/Time the data was extracted ("DateTimeMetaAdded"), the case tag field ("CaseRef") and the user tag field ("UserTag").
If we now click on the "JPEGFiles" table tree item and look under the "Browse & Search" tab, we can see which jpeg files had their metadata extracted:

nutz2u.sqlite's JPEGFiles Table

Note: For the file specific tables, you will probably have to use the scroll bar to see all of the fields. Not shown in the pic above are the "Model" and "DateTimeOriginal" fields. We will be using these fields in our SQL query.
To execute a query against the JPEGFiles table, we click on the "Execute SQL" tab and then enter in the following:

SELECT * FROM JPEGFiles WHERE Model='Canon PowerShot SX130 IS' ORDER BY DateTimeOriginal;


We then press the "Run SQL" button and we see that we have found two jpegs that meet our search criteria:

nutz2u.sqlite's SQL Search Results

Closer inspection of the "AbsFileName" fields shows the filenames are "/home/sansforensics/squirrel-testdata/subdir2/GPS2.jpg" and "/home/sansforensics/squirrel-testdata/subdir2/GPS_location_stamped_with_GPStamper.jpg". So given a bunch of .jpgs, we have quickly found multiple pictures that were taken by the same camera model. Pretty cool, eh?

Similarly, we can run other queries against the database such as:

Finding .docs by same author and sorting by page count:
SELECT * FROM DOCFiles WHERE Author='Joe Friday' ORDER BY PageCount;

Sorting Powerpoint files by Revision Number:
SELECT * FROM PPTFiles ORDER BY RevisionNumber;

Finding all "fraud-pics" user tagged .JPEGs:
SELECT * FROM JPEGFiles, FileIndex WHERE JPEGFiles.AbsFileName=FileIndex.AbsFileName AND FileIndex.UserTag='fraud-pics';

Finding the LastModified time and the user responsible (for .xlsx files) BEFORE a given date/time:
SELECT LastModifiedBy, ModifyDate from XLSXFiles where DATETIME(ModifyDate) < DATETIME('2012-05-01 06:36:54')
Note: We are using the SQLite DATETIME function to convert our string date into a format we can perform comparisons with.

Finding the Filename and Keywords (for .pptx files) where the Keywords field contains "dea". This will find keywords such as "death", "deal" and "idea". The % represents a wildcard:
SELECT FileName, Keywords from PPTXFiles where Keywords LIKE '%dea%'

Finding the Filename and Template names used (for .doc files) when the Template name starts with "Normal.dot":
SELECT FileName, Template FROM DOCFiles WHERE Template LIKE 'Normal.dot%'

Find Unique Company names by performing a UNION of each table's Company fields. For more information on the UNION keyword see here.
SELECT Company FROM DOCFiles UNION SELECT Company FROM DOCXFiles UNION SELECT Company FROM XLSXFiles UNION SELECT Company FROM XLSFiles UNION SELECT Company FROM PPTXFiles UNION SELECT Company FROM PPTFiles

Finding the Filename and GPS Longitude (for .jpg files) where the GPS Longitude is less than "-115.0":
SELECT FileName, GPSLongitude FROM JPEGFiles WHERE GPSLongitude < '-115.0'

There are LOTS more ways of querying the database because there's a shedload of other fields we can use. Check out the script code (search for "CREATE TABLE") or create your own example database (using SG) for more details on what metadata fields are being extracted. Where possible, I have used the same field names as the ExifTool exe prints to the command line.


Final Words

We have written a Perl script ("squirrelgripper.pl") to extract file metadata into an SQLite Database. We have also shown how to perform some basic SQL queries against that database.

In total, for the .doc, .docx, .xls, .xlsx, .ppt, .pptx, .pdf, .jpeg file types there were over 100 fields to extract. Whilst I double-checked my script, it is possible I've made some errors/overlooked something. It is also possible to process other file types (eg EXE, DLL) but I thought I would wait for further feedback/specific requests before continuing.

UPDATE: Just had this thought - If/when I do add the EXE & DLL functionality, SquirrelGripper could also be used for locating known Malware/indicators of compromise as well. I've created a monster!

I suspect there will be some commonly used SQL queries across certain types of investigations. So analysts could build up a standard set of queries to run across multiple cases of the same type. That could save analysts time / help narrow down files of interest.

If you found this script useful and/or you have any questions, please drop me a Tweet / email / leave a comment. The more I know about if/how forensicators are using SquirrelGripper - the better I can make it. Now that the basic framework is in place, it should be pretty quick to add new fields/file types so long as they are supported by the Image::ExifTool package (v8.90).

Thursday 3 May 2012

Extracting Font metadata from MS Excel 2007 and MS Word 2007 files


Ever wondered how to programmatically extract a list of fonts from MS Excel 2007 or Word 2007?
Me neither ... However, thanks to a Corey Harrell Tweet I read about a case where documents could be proved false based upon the fonts used. That is, the forgers used a font which did not yet exist at the purported creation time. Corey also helpfully provided links that described which fonts are installed with the various MS Office versions (MS Office 2010, Office 2007 & other MS Office versions).

My first stop was Exiftool (on SIFT v2.12) which extracted plenty of metadata but unfortunately not font information.
So my next stop was a Perl script. But to do this requires some background knowledge of how Excel 2007 and Word 2007 files are structured.

Background

Office 2007 uses a zipped series of XML files to construct the various file types.
These file structures are detailed in the Office Open XML Format specification.
Some helpful summaries are available here and here.
Some Word 2007 specific information is available here.

Another resource is "Digital Forensics with Open Source Tools" by Altheide & Carvey pp.201-203.

The 1 minute briefing is:
- Office Open XML files can be unzipped into a consistent file structure.
- At the top level, there is a "_rels" folder, "docProps" folder, "[ContentTypes].xml" file and an application specific folder (eg "word" or "xl"). Most sub-folders will also contain a "_rels" folder (containing various .xml.rels files) which are then used to construct relationships between the various folders/files (via an XML schema).
- Document metadata such as Subject and Creator is located in "docProps/core.xml". Additional application specific metadata (eg word count, template used) is located in "docProps/app.xml".
- "[ContentTypes].xml" lists all files (aka "document parts") which make up the ZIP archive.
- Within the application specific folder, there is a "_rels" folder and other various folders and XML files which contain the actual user content (eg headers, footers, footnotes, graphics, comments, Word document text, Excel worksheet contents).
- Any media inserted into a document has its original filename changed to a generic filename (eg "image1.png") but these files can still retain their internal metadata (eg EXIF data).

After unzipping some sample .xlsx and .docx files, I found a list of fonts in the "xl/styles.xml" and "word/fontTable.xml" files respectively. Powerpoint 2007 .pptx files seem to be different - I could not find a central non-theme related file that lists all fonts used :(
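
Incidentally, if you don't want to unzip everything to disk first, the Archive::Zip CPAN module (not something the script later in this post uses - just an aside) can pull a single document part straight out of the file. A rough sketch:

#!/usr/bin/perl -w
# Aside: read "word/fontTable.xml" straight out of a .docx without unzipping to disk.
# Uses the Archive::Zip CPAN module - install via "sudo cpan Archive::Zip" if needed.
use strict;
use Archive::Zip qw( :ERROR_CODES );

my $docx = "/cases/word2k7.docx";     # hypothetical input file

my $zip = Archive::Zip->new();
die "Can't read $docx\n" unless ($zip->read($docx) == AZ_OK);

# list the "document parts" making up the archive
print "$_\n" foreach ($zip->memberNames());

# grab the font table XML as a string (which could then be fed to XML::XPath)
my $fontxml = $zip->contents("word/fontTable.xml");
print $fontxml if (defined $fontxml);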

XML documents look similar to HTML. Basically, an "element" (or "node") is considered the smallest building block of an XML schema. An element is delineated by a beginning "tag" and an end "tag".  Elements can also contain "attributes" which are indicated by the use of "=" within a "tag".

It'll probably help if I list an (edited) "word/fontTable.xml" excerpt here:

<w:fonts xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:font w:name="Calibri"> ... </w:font></w:fonts>

So we can see that there is a "w:fonts" element which in turn contains a "w:font" element. The "w:font" element has an "attribute" called "w:name" which has the value "Calibri" (which is what we're interested in).

So now we know WHAT we want to read, we need a way of programmatically reading the XML.
I found this helpful nugget of code on the StackOverflow forum.

Enter ... CPAN's XML::XPath!

The Code

#CODE BEGINS ON NEXT LINE
#!/usr/bin/perl -w

# MS Office 2010 fonts
# http://support.microsoft.com/kb/2121313

# MS Office 2k7 fonts
# http://support.microsoft.com/kb/924623

# MS Office2k3, 2k, 97 fonts
# http://support.microsoft.com/kb/837463


use strict;

use Getopt::Long;
use XML::XPath;
#use XML::XPath::XMLParser;

my $version = "docx-font-extractor.pl v2012.04.29";
my $help = 0; # help flag
my $isdocx = 0;
my $isxlsx = 0;
my $fontfile = "";

# TODO my @OFFICE2007_FONTS = ();
# TODO my @OFFICE2010_FONTS = ();


GetOptions('help|h' => \$help,
    'd' => \$isdocx,
    'x' => \$isxlsx,
    'f=s' => \$fontfile);

if ($help || $fontfile eq "" || ($isdocx eq 0 and $isxlsx eq 0) || ($isdocx eq 1 and $isxlsx eq 1) )
{
    print("\n$version\n");
    print("Perl script to list fonts used in an MS Office .docx or .xlsx file\n");
    print("Assumes .docx or .xlsx has already been unzipped to a local directory\n\n");
    print("Example: docx-font-extractor.pl -d -f /home/sansforensics/word2k7/word/fontTable.xml\n");
    print("Example: docx-font-extractor.pl -x -f /home/sansforensics/excelbk1/xl/styles.xml\n");
    exit;
}

my $xpath = XML::XPath->new(filename => $fontfile);
my $nodeset;
my $xmlfontfield;

if ($isdocx)
{
    $nodeset = $xpath->find("/w:fonts/w:font");
    $xmlfontfield = "w:name";
}
elsif ($isxlsx)
{
    $nodeset = $xpath->find("/styleSheet/fonts/font/name");
    $xmlfontfield = "val";
}

print "Found ".$nodeset->size." results\n";

foreach my $node ($nodeset->get_nodelist)
{
    my $fontname = $node->getAttribute($xmlfontfield);
    print "Found font = $fontname\n";

    # TODO Lookup $fontname in list of stored Office fonts

    # TODO Print "The ... font is installed on MS Office ..."

}

#END CODE


Code Summary

We start off with a "GetOptions" and Help section as usual.

Then we create a new "XML::XPath" object passing in the user specified filename (ie "$fontfile") as the argument.

If the user has specified the .docx flag ("-d"), we get our XML::XPath object ($xpath) to "find" the MS Word specific font element ("/w:fonts/w:font"). Note: the "/"s are used to separate the element names. We then set the "$xmlfontfield" variable to the attribute name we're interested in (ie "w:name"). We'll use this later.

Similarly, if the user has specified the .xlsx flag ("-x"), we get our XML::XPath object ($xpath) to "find" the MS Excel specific font element ("/styleSheet/fonts/font/name") and then set "$xmlfontfield" to the attribute name we're interested in (ie "val").

Our various calls to "$xpath->find" should return results in the form of an XML::XPath::NodeSet object. We can then call XML::XPath::NodeSet's "get_nodelist" to return a list of search results. Next we can  iterate through each "node" and print out the relevant "attribute" value by calling the XML::XPath::Node::Element's "getAttribute" function with the "$xmlfontfield" variable we set previously.

Testing

After unzipping example .xlsx and .docx files to their respective "/home/sansforensics/" sub-folders (eg "unzip /cases/excel2k7-Book1.xlsx -d /home/sansforensics/excelbk1/") we now run our Perl script.

sansforensics@SIFT-Workstation:~$ ./docx-font-extractor.pl -x -f /home/sansforensics/excelbk1/xl/styles.xml
Found 4 results
Found font = Calibri
Found font = Calibri
Found font = Calibri
Found font = Arial Black
sansforensics@SIFT-Workstation:~$


For confirmation, here's the example listing of "xl/styles.xml":

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><fonts count="4"><font><sz val="11"/><color theme="1"/><name val="Calibri"/><family val="2"/><scheme val="minor"/></font><font><b/><sz val="11"/><color theme="1"/><name val="Calibri"/><family val="2"/><scheme val="minor"/></font><font><b/><u/><sz val="11"/><color theme="1"/><name val="Calibri"/><family val="2"/><scheme val="minor"/></font><font><sz val="11"/><color theme="1"/><name val="Arial Black"/><family val="2"/></font></fonts><fills count="2"><fill><patternFill patternType="none"/></fill><fill><patternFill patternType="gray125"/></fill></fills><borders count="1"><border><left/><right/><top/><bottom/><diagonal/></border></borders><cellStyleXfs count="1"><xf numFmtId="0" fontId="0" fillId="0" borderId="0"/></cellStyleXfs><cellXfs count="5"><xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/><xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/><xf numFmtId="0" fontId="1" fillId="0" borderId="0" xfId="0" applyFont="1"/><xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1"/><xf numFmtId="0" fontId="3" fillId="0" borderId="0" xfId="0" applyFont="1"/></cellXfs><cellStyles count="1"><cellStyle name="Normal" xfId="0" builtinId="0"/></cellStyles><dxfs count="0"/><tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/></styleSheet>


The reason for the 3 Calibri declarations appears to be the use of "normal", "Bold" and "Bold Underlined" versions of the Calibri font. The bold Calibri font is declared using a "<b/>" following the second "<font>" element tag. Similarly, I suspect "<b/><u/>" represents "Bold Underlined". Unfortunately, Word 2007 does not seem to implement the same XML mechanisms for denoting "Bold" and "Underline" fonts as Excel 2007.

Here's our script's results with Word 2007:

sansforensics@SIFT-Workstation:~$ ./docx-font-extractor.pl -d -f /home/sansforensics/word2k7/word/fontTable.xml
Found 3 results
Found font = Calibri
Found font = Times New Roman
Found font = Cambria
sansforensics@SIFT-Workstation:~$


For confirmation, here's the example listing of "word/fontTable.xml":

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:fonts xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:font w:name="Calibri"><w:panose1 w:val="020F0502020204030204"/><w:charset w:val="00"/><w:family w:val="swiss"/><w:pitch w:val="variable"/><w:sig w:usb0="A00002EF" w:usb1="4000207B" w:usb2="00000000" w:usb3="00000000" w:csb0="0000009F" w:csb1="00000000"/></w:font><w:font w:name="Times New Roman"><w:panose1 w:val="02020603050405020304"/><w:charset w:val="00"/><w:family w:val="roman"/><w:pitch w:val="variable"/><w:sig w:usb0="20002A87" w:usb1="00000000" w:usb2="00000000" w:usb3="00000000" w:csb0="000001FF" w:csb1="00000000"/></w:font><w:font w:name="Cambria"><w:panose1 w:val="02040503050406030204"/><w:charset w:val="00"/><w:family w:val="roman"/><w:pitch w:val="variable"/><w:sig w:usb0="A00002EF" w:usb1="4000004B" w:usb2="00000000" w:usb3="00000000" w:csb0="0000009F" w:csb1="00000000"/></w:font></w:fonts>


Note: Word uses a completely different XML schema compared to Excel's.

Conclusion

We have successfully written a "docx-font-extractor.pl" Perl script to extract the generic font names used from (unzipped) MS Office 2007 .docx and .xlsx files.

Whilst the extracted font names generally correspond to Microsoft's list of installed fonts - they do not match exactly. For example, the MS Office 2007 font table lists "Calibri (TrueType)", "Calibri Bold (TrueType)", "Calibri Italic (TrueType)", "Calibri Bold Italic (TrueType)" and our script can only report the attribute name ie "Calibri".
If we had a 1:1 mapping, then it would be simple enough to declare an array list of fonts for each MS Office release and then search them for each font found (see TODO comments in code).
Unfortunately, it looks like there's more work required to extract any further font metadata from each of the differing Excel/Word font mechanisms. If only MS had made their font attributes consistent throughout Office *sigh*. And having the full font names in the XML would have been nice too (eg "Calibri Bold (TrueType)") *double sigh*. Anyhoo, that's where I'll leave things (for now) ... I'll finish up by wishing you *Sigh O Nara* ;)

Tuesday 10 April 2012

I Thought I Saw A Twittering TweetDeck Monkey


Girl, Unallocated recently asked why I wasn't on Twitter. I'll admit that when Twitter first came out I was pretty skeptical - "Who would want to read about my over-ripe bananas?" and "140 characters isn't nearly enough to insult someone (creatively)". Not having a Twitter capable mobile device was just another nail in the coffin. However, after GU's recent questioning, I've since discovered that a lot of forensicky update notifications occur on Twitter and it's also a great way to stalk ... er, meet new people (I am obviously joking ... I don't really care about forensicky updates ;).
Anyhoo, I started looking into a couple of PC Twitter clients called TweetDeck and MetroTwit. A recent Youtube demo video showed MetroTwit using 600 MB of RAM just in normal ops. When you consider that the ancient WinXP PC I'm using has 2 GB RAM and I also like to run SIFT on VMWare, that made the decision pretty easy. Hello, TweetDeck! *I also noticed that Corey Harrell has used it - so I know at least one other person might find this post useful/semi-interesting ;)

I did a brief check after installation, and TweetDeck "only" uses ~100 MB. A bit steep for 140 character messages, but methinks my trusty old PC can live with that. Also, as TweetDeck has been bought by Twitter, you kind of hope they have their functionality and security all sorted. Except ... recently there was an issue with TweetDeck granting a user access to other people's TweetDeck/Facebook profiles. This user wasn't actively trying to subvert the system, he just opened up TweetDeck and there it all was. So TweetDeck shut down for a while and they (presumably) fixed the issue(s). Not particularly inspiring stuff eh?

Which brings me to the topic of this post - I was initially interested to see what artifacts (if any) TweetDeck leaves behind. And it was while monkeying around with TweetDeck that I discovered that the Twitter web search functions also return some interesting information.

TweetDeck v1.3 (on WinXP)

The first thing TweetDeck wants you to do after installing, is register an email address and password for the TweetDeck interface. Chances are, most people will use the same login/password as their Twitter account. Tsk, tsk.

After launching TweetDeck and adding my Twitter profile, I ran SysInternals TCPView and found that TweetDeck connects via TCP over port 443 (HTTPS) to both "twitter.com" and "amazon.com" domains. Presumably, Twitter has outsourced their storage of tweets/images to Amazon's cloud storage. These connections seemed to timeout if nothing is browsed.
I wasn't going to go hardcore and network capture the traffic because a shortish time ago, in a University not so far away, Mike Wilkinson told my Network Forensics class that intercepting live network traffic is best left to the cats with warrants (in this part of The Empire anyway).

So then I ran SysInternals ProcMon (v2.96) and tried to see if TweetDeck was reading/writing any account info to the Windows Registry ... No such luck! Either I missed the entries (not uncommon) or they just weren't there. I did however, notice that the following file got accessed quite a lot:

C:\Documents and Settings\USER\Local Settings\Application Data\twitter\TweetDeck\localStorage\qrc__0.localstorage

I have obviously changed the profile name to protect the guilty ;)

So firing up our trusty WinHex hex editor revealed the following file header snippet in "qrc__0.localstorage":

SQLite3 DB

Look familiar? Those sneaky TweetDeck Developers!
So I copied the "qrc__0.localstorage" over to SANS SIFT to play around with it a little more (using the SQLite Manager Firefox Plugin) and here's what I found (a small query sketch follows this list):
- The file contains one SQLite table called "ItemTable"
- "ItemTable" consists of a "rowid", "key" and "value" column fields
- "rowid" seems to increase with use - presumably as key/value fields are updated

Most of the "value" fields appear to be JSON encoded. More information about JSON is available here. But the 1 minute summary is:

- JSON is an acronym for "JavaScript Object Notation"
- JSON is a text based encoding system based on name/value pairs
- Possible value types are: strings (delimited by ""), numbers (base 10, can be negative/decimal fractions/base 10 exponentials), objects, arrays, true, false, null
- An object is defined using { } to enclose a name separated by a colon and then the value.
eg {object_name: object_value}
- An array is defined using [ ] to enclose a series of comma separated values.
eg [array_value1, array_value2]
- It is possible to nest values. So as we will see later, you can have an array of tweet objects each of which has an array of key/value objects. This can make it difficult to sort out all the different brackets and commas.

So what are the actual key/value fields in the "ItemTable" database?

Normally, there are 7 key/value pairs in the "ItemTable" database table:
"tweetdeck_account" = An "email" object representing the TweetDeck user's login (eg "user@email.com").
"columns" = Suspect it represents user column arrangement preferences for the GUI.
"hoard" = An object with it's name derived from the "email" object value (eg "user@email.com"). Used to store a 312 character alphanumeric string which I suspect is a hash/encrypted string (or maybe a group of such strings).
"clients" = Several objects representing account information plus the last 4 TweetDeck search terms.
"__version__" = 2.
"_session" = A 44 character alphanumeric string which probably represents a session key. It seems to change with each launch.
"feeds" = Suspect it has to do with the user defined columns each having its own data connection.

Additional testing revealed that if "Keep Me Signed In" is ticked on the TweetDeck login page and then TweetDeck is closed, these artifacts remain readable in "ItemTable" (using SQLite Manager).
If TweetDeck is closed and "Keep Me Signed In" is NOT ticked, then "ItemTable" appears empty (in SQLite Manager). However, the data still appears to be (partially) viewable with WinHex (looking at "qrc__0.localstorage").

By chance, I copied an active/open "qrc__0.localstorage" to SIFT and noticed that there were 2 extra key/value fields in "ItemTable" when TweetDeck is running. These field names are prepended with a ":" (indicating they're in shared memory) and are called:
":CACHED_HASHTAGS"  = Had no values
":CACHED_TWITTER_USERS_V2" = Appears to be a cache of Twitter user profiles that the TweetDeck user is "following". Example format follows:
{"lowercase_handle1":["123456781", "CapitalizedHandle1", "Proper Name1", "http link to profile1 pic"], "lowercase_handle2":["123456782", "CapitalizedHandle2", "Proper Name2", "http link to profile2 pic"]}

Note: The numerical fields turn out to be Twitter user ID numbers.
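
Jumping ahead a little (the JSON Perl package gets a proper introduction later in this post), here's a rough sketch of how one might pull those user ID numbers out of a decoded ":CACHED_TWITTER_USERS_V2" value. The example string below is made up but follows the format shown above:

#!/usr/bin/perl -w
# Rough sketch: decode a (fabricated) ":CACHED_TWITTER_USERS_V2" style JSON value
# and print each handle with its Twitter user ID number.
use strict;
use JSON;

my $json_string = '{"lowercase_handle1":["123456781","CapitalizedHandle1","Proper Name1","http://link.to/profile1.jpg"],' .
                  ' "lowercase_handle2":["123456782","CapitalizedHandle2","Proper Name2","http://link.to/profile2.jpg"]}';

my $decoded = JSON->new->allow_nonref->decode($json_string);   # hash ref of array refs

foreach my $handle (sort keys %{$decoded})
{
    my ($userid, $properhandle, $propername, $piclink) = @{ $decoded->{$handle} };
    print "$handle => user_id = $userid, name = $propername\n";
}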

I was only able to see these additional fields after copying an active "qrc__0.localstorage". The weird thing was, after closing/reopening/closing (staying logged in) TweetDeck, I was still able to see these fields in both SQLite Manager and WinHex. After I subsequently opened/logged out/closed TweetDeck, these fields were no longer visible in SQLite Manager but some of the data still appears viewable with WinHex.
Perhaps copying the active file meant the contents of shared memory were also auto-magically written to the new file copy. The subsequent opening(s) of this database file then showed the previously recorded ":" fields in the database.
And perhaps TweetDeck/SQLite doesn't delete/overwrite all the data in a table when "deleting" a table's contents (eg after logging out). Perhaps it just deletes the addressing mechanism? Hence, while you can't see the table in SQLite Manager, you can still see some data using WinHex? To quote Doris Day et al: "Perhaps, perhaps, perhaps".

OK, so we've taken a look at "ItemTable" - now what?

One of the more interesting things I noticed in "ItemTable" was that the "columns", "clients" and "feeds" fields each mention the same recurring multi-digit number. There were also similar looking numbers present in ":CACHED_TWITTER_USERS_V2".
Suspecting this number to be a Twitter user ID number, I went to the Twitter Developer site and found this article regarding the Twitter REST API (API = Application Programming Interface). It lists lots of searches that you can perform from the comfort of your favourite browser. These search queries are run against the Twitter database via a Twitter web interface. The number of searches may be limited per hour and may not be all-inclusive.
So to confirm that the multi-digit number I was seeing in my "ItemTable" table was a valid Twitter user ID number, I typed the following into Firefox on SIFT.

https://api.twitter.com/1/users/lookup.json?user_id=XXXXXXXXX
where XXXXXXXXX = Twitter user ID number

SIFT then auto-magically launched a save dialog for the returned JSON file. I then used gedit to display it and saw a bunch of JSON text. The first key/value field was "id":XXXXXXXXX followed later by the corresponding "name" and "screen_name" values of the TweetDeck user. The JSON also showed various other profile info which is described later in the testing section below.

So, given a viable TweetDeck "qrc__0.localstorage" database file, we can derive a TweetDeck user's Twitter user ID number. We can then use the Twitter REST API with that user ID number to determine that user's handle name. Subsequently, I also ran the lookup query using the numbers found in the ":CACHED_TWITTER_USERS_V2" user profiles. These numbers were found to correspond to their surrounding handle name.
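
If you end up doing a lot of these lookups, they are easy to script too. Here's a rough sketch using LWP::Simple and the JSON package - the user ID is obviously made up, and fetching HTTPS with LWP::Simple may require the LWP::Protocol::https package to be installed:

#!/usr/bin/perl -w
# Rough sketch: automate a Twitter REST API user lookup and print selected profile fields.
use strict;
use LWP::Simple;
use JSON;

my $userid = "123456789";   # hypothetical Twitter user ID number
my $url = "https://api.twitter.com/1/users/lookup.json?user_id=$userid";

my $response = get($url);
die "Lookup failed (rate limited? offline?)\n" unless (defined $response);

my $profiles = decode_json($response);    # lookup.json returns a JSON array of profiles

foreach my $profile (@{$profiles})
{
    print "id = ".$profile->{'id'}.", screen_name = ".$profile->{'screen_name'}.
          ", name = ".$profile->{'name'}."\n";
}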


More Twitter REST API Searches

We can also use the Twitter user ID number/handle to perform other queries with the Twitter REST API:

For retrieving a Twitter user's Profile info we can use:
https://api.twitter.com/1/users/show.json?user_id=XXXXXXXXX
OR
https://api.twitter.com/1/users/show.json?screen_name=TwitterHandle
Note: the "screen_name" argument is not case sensitive
For more info see here.

To list a Twitter user's "follower" ID numbers (eg who is following user_id=XXXXXXXXX?) we can use:
https://api.twitter.com/1/followers/ids.json?user_id=XXXXXXXXX
OR
https://api.twitter.com/1/followers/ids.json?screen_name=TwitterHandle
For more info see here.

To list a Twitter user's friend ID numbers (eg who is user_id=XXXXXXXXX following?) we can use:
https://api.twitter.com/1/friends/ids.json?user_id=XXXXXXXXX
OR
https://api.twitter.com/1/friends/ids.json?screen_name=TwitterHandle
For more info see here.

Given a list of Twitter user ID numbers (eg from a friend/follower lookup), you can use a comma separated list to retrieve each Twitter user's profile info via:
https://api.twitter.com/1/users/lookup.json?user_id=XXXXXXXXX, YYYYYYYYY, ZZZZZZZZZ
For more info see here.

You can also search Twitter by term:
http://search.twitter.com/search.json?q=TwitterHandle
This will return the last week's worth of tweets containing the given TwitterHandle (eg tweets at/from/mentioning). It might not return 100% of all possible results. For more info see here.
Alternatively, you can perform a conventional web search at:
http://twitter.com/#!/search
This will return the results within the browser. If you need to record these results, having a JSON file (via the first search method) might make documentation easier than taking a screenshot/printout of the browser results from the second method.

Additional Resources (Stuff I didn't know where else to put)

This article describes how each tweet has its own unique ID number.
This website allows you to retrieve a Twitter user's handle (eg Batman) from their Twitter user ID number or vice-versa. The website also states that the numeric user ID is like a primary key (unique number) that does not change. So if a user changes their Twitter handle, they can still be found via their unique ID number.
I also found out about these similar sites (here and here) which translate from a Twitter user's handle to a Twitter user's ID number (one way only).
From my limited testing, the results from these sites appear to be consistent with each other. Given a choice however, I would trust the Twitter REST API over these webpages. All that stuff about getting it straight from the horse's mouth etc.
Whilst doing a search for TweetDeck artifacts, I came across this since-superseded article. It describes the database schema for a previous version of TweetDeck. It's interesting that the developers have since changed to a more generic (and harder to understand) schema of key/JSON value pairs. An attempt at extra security through obscurity?

Enter the Twit-Monkey!

Manually parsing through all this JSON text was making me cross(eyed) and I vaguely remembered Harlan Carvey mentioning he had written a Perl script that read JSONs. Just FYI, I'm going for a name drop record in this post :)
Anyhoo, I thought it would take me as long to read through some of these search results as it would to write a Perl script to print out JSONs in a more readable format (how's that for cocky?). So then I *strutted* to CPAN and found the aptly named JSON Perl package. This package has functions to both encode/decode JSONs and best of all, it has a function to "pretty print" JSONs in a more human readable format.

To install it on SIFT I typed:
"sudo cpan JSON"
and then said yes to any dependency installs.
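To double-check the install worked, a quick one-liner like this should print the installed JSON package version:

perl -MJSON -e 'print $JSON::VERSION, "\n"'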

Now we should be ready to write the script. As usual, my code is pretty hack-tacular but it seems to work OK for each of the search types listed above.

The Code

#CODE BEGINS ON LINE BELOW
#!/usr/bin/perl -w

# Perl script to pretty print JSON files

use strict;

use Getopt::Long;
use JSON;

my $version = "json-printer.pl v2012-04-08";
my $help = 0;
my $filename="";
my @jsonlist;

GetOptions('help|h' => \$help,
            'f=s' => \$filename);

if ($help || $filename eq "")
{
    print("\nHelp for $version\n\n");
    print("Perl script to pretty print JSON files.\n");
    print("Example: json-printer.pl -f search-5.json\n");
    exit;
}

open(my $tfile, "<", $filename) or die "Can't Open $filename JSON File!";
@jsonlist = <$tfile>; # extract each line to a list element
chomp(@jsonlist); # get rid of newlines
close($tfile);

my $json_string = join(' ', @jsonlist); # join list elements into one big happy scalar string

my $json = JSON->new->allow_nonref; # create a new JSON object which converts non-references into their values for encoding
my $perl_scalar = $json->decode($json_string); # converts JSON string into Perl hash(es)

# at this point (if you wanted to) you can add code to iterate thru the hash(es) to extract/use values.

my $pretty_printed = $json->pretty->encode( $perl_scalar ); # re-encode the hash just so we can then pretty print it (hack-tacular!)
print $pretty_printed;

#CODE ENDS ON LINE ABOVE


Code Summary

This short script ("json-printer.pl") lets the JSON Perl package do all the heavy lifting. First, it reads the user specified JSON file into a scalar string ("$json_string") and then it calls "$json->decode" to convert the string into a Perl hash data structure. The script then calls "$json->pretty->encode" to convert that hash data structure into a more human readable string format. It's a round-about way of doing things but it seems to work.

Note: "$perl_scalar" is a reference to a hash data structure. If you wanted to, you could use it to iterate through the hash structure and retrieve any of the individual values.
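For example, here's a hypothetical fragment that could be added at that point of "json-printer.pl". It assumes the input file came from a "users/lookup.json" query, which decodes to an array of user profile hashes (the "id", "screen_name" and "name" keys are described further below):

# Hypothetical addition - walk the decoded structure instead of just pretty printing it.
# Assumes a users/lookup.json result ie an array (reference) of user profile hashes.
if (ref($perl_scalar) eq "ARRAY")
{
    foreach my $user (@{$perl_scalar})
    {
        # print selected profile values as a pipe separated line
        print $user->{"id"}." | ".$user->{"screen_name"}." | ".$user->{"name"}."\n";
    }
}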

Code Testing

You'll just have to trust me on this and/or try out the code yourself. The script above is able to read the JSON data returned by the various Twitter REST API searches listed previously.
I don't feel comfortable displaying other people's tweeting information here (eg their user id, what method they use to tweet). It's probably no more detailed than you could get by searching www.twitter.com/search but I just don't think it's my place.

Anyhoo, as some of the REST API searches mentioned previously return information about individual tweets (eg in a user profile's "status" field), I thought it would be interesting to have a look at these returned tweet fields (without divulging real data). Each tweet in a JSON appears to have the following format:

"to_user_id" = Destination Twitter user ID number (in unquoted numeric format).
"source" = Source URL which can show how tweet was posted (eg via twitter.com, tweetdeck, tweethopper etc). Some URL arguments  may also list the device (eg iPad, Android).
"profile_image_url" = HTTP URL showing the source's avatar filename.
"profile_image_url_https" = HTTPS URL showing the source's avatar filename.
"created_at" = Date and Time of tweet creation.
"text" = Tweet text including @ addressing (eg "@Superman How do you keep your tights so clean?").
"id" = Unique tweet ID number (in "" quoted string format).
"in_reply_to_status_id" = When available, shows the tweet ID number this tweet is replying to (in "" quoted string format).
"from_user_id_str" = Source's Twitter user ID number (in "" quoted string format).
"from_user" = Source's Twitter user handle (eg "Batman").
"geo" = Source's(?) location information.
"from_user_id" = Source's Twitter user ID number (in unquoted numeric format).
"id_str" = Unique tweet ID number (in "" quoted string format).
"iso_language_code" = Presumably the language used to tweet (eg "en" = English).
"to_user_id_str" = Destination's Twitter user ID number (in "" quoted string format). Can also be null.
"to_user" = Destination's Twitter user handle string (eg "Superman"). Can also be null.
"from_user_name" = Source's Twitter user personal name string (eg "Bruce Wayne")
"to_user_name" = Destination's Twitter user personal name string (eg "Clark Kent"). Can also be null.

Note: This list is not all inclusive (eg there's some optional reply fields I've left out).
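If you wanted to pull just a few of these tweet fields out programmatically (instead of eyeballing the pretty printed output), a hypothetical fragment like this could also be added to "json-printer.pl". It assumes the decoded search.json result is a hash whose tweets live in a "results" array - that's an assumption on my part, so check it against your own output first:

# Hypothetical fragment - prints selected fields for each tweet in a search.json result.
# Assumes the decoded JSON is a hash with the tweets stored in a "results" array.
if ((ref($perl_scalar) eq "HASH") and (ref($perl_scalar->{"results"}) eq "ARRAY"))
{
    foreach my $tweet (@{$perl_scalar->{"results"}})
    {
        print $tweet->{"created_at"}." | ".$tweet->{"from_user"}." | ".$tweet->{"text"}."\n";
    }
}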

Twitter help uses the term "Username" to denote the addressing mechanism name (eg @Batman). It then uses "Name" to represent your personal name ID (eg "Bruce Wayne"). Got it? Now here's where it might get a little confusing ...
In the above tweet message, the "to_user_name" and "from_user_name" contain the personal ID Names (eg "Bruce Wayne"). The "to_user" and "from_user" contain the addressing mechanism names (eg "Batman").

Thanks to Frank McClain for sending me a "Mice" tweet to analyse/help confirm my findings. Quack-tacular!

Some of the REST API searches mentioned previously may return a Twitter user's Profile information. Here are some selected Profile fields I found interesting:

"friends_count" = How many Twitter users this user is following (in unquoted numeric format).
"profile_image_url" = HTTP URL string showing user's avatar image filename.
"profile_background_image_url_https" = HTTPS URL string showing user's profile background image filename.
"url" = User's website URL string (eg "www.blog.com").
"id" = User's Twitter user ID number (in unquoted numeric format).
"screen_name" = User's handle/screen name (eg "Batman").
"location" = User's location string (eg "Gotham").
"lang" = Presumably the user's tweet language (eg "en" for English).
"followers_count" = How many Twitter users are following this user (in unquoted numeric format).
"name" = User's personal name ID (eg "Bruce Wayne").
"description" = User's witty description of themselves (eg "The Dark Knight").
"status" = Object containing the user's most recent (unprotected) sent tweet (see above tweet format for more details).
"profile_image_url_https" = HTTPS URL string showing user's avatar image filename.
"created_at" = Date and Time when the user's account was created.
"utc_offset" = Appears to be the numeric timezone offset in seconds from GMT (eg -21600).
"profile_background_image_url" = HTTP URL string showing user's profile background image filename.
"protected" = Boolean presumably representing whether the user's tweets are protected (eg true/false).
"id_str" = User's Twitter user ID number (in "" quoted string format).
"time_zone" = Appears to be string based representation of time zone (eg "Central Time (US & Canada)").
"geo_enabled" = Boolean presumably representing if the user's location is disclosed.
"profile_use_background_image" = Boolean representing if the user is using their background profile picture.

Adding to the name confusion mentioned earlier, we now see the "screen_name" and "name" keys. In this case, "screen_name" seems to refer to the addressing mechanism name (eg "Batman") and "name" refers to the personal ID name (eg "Bruce Wayne"). And while not explicitly present in these fields, there's also the commonly used "handle" term (eg "Batman"). Also note that "id" in a tweet message refers to the unique tweet ID number whereas "id" in a profile represents a Twitter user's ID number ... Consistency? Pffft! We don't need no stinkin' consistency! Everyone's a critic huh?

Summary

We've installed TweetDeck and seen that it can store/leave behind the last 4 user search terms and multiple Twitter user ID number artifacts. These user ID numbers can then be used to find the corresponding Twitter handles/addressing mechanisms via a Twitter REST API web search. The Twitter REST API can also be used to retrieve other information about the user's followers/friends/profile.
We have also coded a small Perl script ("json-printer.pl") to make it easier to read the JSON formatted results returned from the Twitter REST API.
Any comments/suggestions/requests for further name droppage (the record now stands at 5 / 6 if you count Doris) are welcome.
FYI This will probably be my last post for a few weeks (month?) as I have to swap my PC for some renovation work boots. I will still be contactable via email / Twitter though. Hopefully, I'll be back with some new ideas and all of my digits in the near future ;)

Tuesday 3 April 2012

Creating a RegRipper Plugins Maintenance Perl Script


I recently asked Corey Harrell about any unfulfilled programming ideas he might have and he told me about his idea for a RegRipper Plugins maintenance Perl script.

He was after a script that would go into the plugins directory and then tell the user which plugins were missing from the various "Plugins" text files. Note: I am using "Plugins" to refer to the "ntuser", "software", "sam", "system", "security" and "all" text files. These contain lists of the Perl (*.pl) plugins that a user can batch-run against a particular hive.
He also mentioned that it might be handy to be able to exclude certain files (eg malware plugins) from consideration, so that if a user has deliberately removed certain entries they aren't constantly being reminded: "Hey! There's a difference between what's in the "Plugins" file and what's actually available (ie the *.pl plugins)".
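For reference, the exclude file is just a plain text file listing one plugin name (without the ".pl" extension) per line. For example, the "regripper-maint-exclude" file used in the Testing section below contains the single line:

ccleaner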

UPDATE: This script was not intended to be used to blindly load every possible *.pl plugin into its relevant "Plugins" text file. It was intended as a diagnostic tool that answers: "What (*.pl) plugins are missing from my current "Plugins" file (eg "ntuser")?" Users can then look at the missing plugins list and decide whether or not to include them in their "Plugins" text file.

UPDATE: Script ("PrintMissingPlugins" function) has been updated so commented-out entries (eg #ccleaner) in "Plugins" text files are not used in comparisons for missing *.pl plugins. Thanks Francesco!

UPDATE: Script ("PrintMissingPlugins" function) has been updated so blank lines in "Plugins" text files are not used in comparisons for missing *.pl plugins. Have also updated the Code and Testing sections.

So let's see how we can implement Corey's idea ...

The Code

# CODE STARTS ON LINE BELOW
#!/usr/bin/perl
# Note: I removed the "-w" above because the eval/require/getHive section was generating numerous variable masking warnings
# This shouldn't affect us as we only call the plugin's getHive method.

# Perl script to list updates for RegRipper plugin files (ntuser, software, sam, system, security, all)
# Original Idea by Corey Harrell
# Coded by cheeky4n6monkey@gmail.com
# Note: Probably best if you do NOT run this script from the RegRipper Plugins directory
# as it could affect the *.pl plugin processing.

# Created (2012-04-02)
# Modified (2012-04-03) for handling "#" commented out lines in "Plugins" text files (Thanks Francesco!)
# Modified (2012-04-04) for handling blank lines in "Plugins" text files (Thanks Francesco!)
# Modified (2012-04-04) re-instated "No missing plugins message"

use strict;

use Getopt::Long;
use File::Spec::Functions qw/catfile/;

my $version = "regripper-maint.pl v2012-04-04";
my $help = 0;
my $plugindir = "";
my $excludefile = "";
my @pluginslist;

GetOptions('help|h' => \$help,
    'dir=s' => \$plugindir,
    'x=s' => \$excludefile,
    'p=s@' => \@pluginslist);

if ($help || @pluginslist == 0  || $plugindir eq "")
{
    print("\nHelp for $version\n\n");
    print("Perl script to list discrepancies for RegRipper Plugin files (ntuser, software, sam, system, security, all)\n");
    print("\nUsage: regripper-maint.pl [-h|help] [-dir plugins_dir] [-x exclude_files] [-p plugins_list]\n");
    print("-h|help ............ Help (print this information). Does not run anything else.\n");
    print("-dir plugins_dir ... Plugins directory.\n");
    print("-x exclude_file .... Exclude text file which lists plugins to exclude from any checks.\n");
    print("-p plugins_list .... Plugins file(s) to be checked (eg all, ntuser).\n");
    print("\nExamples: \n");
    print("regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p ntuser -p sam\n");
    print("regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -x regripper-maint-exclude -p all\n\n");
    exit;
}

print "\nRunning $version\n\n";

# List of plugins derived from the default (ntuser, software, sam, system, security, all) plugins text files
my @ntuser_plugins;
my @software_plugins;
my @sam_plugins;
my @system_plugins;
my @security_plugins;
my @all_plugins;

# Directory file listing of *.pl files in user's plugins directory
my @readinplugins;
# @readinplugins list broken up by hive
my @ntuser_actual;
my @software_actual;
my @sam_actual;
my @system_actual;
my @security_actual;
my @all_actual;

my @excludelist;

# Extract entries from user nominated exclude file
if (not $excludefile eq "")
{
    open(my $xfile, "<", $excludefile) or die "Can't Open $excludefile Exclude File!";
    @excludelist = <$xfile>; # extract each line to a list element
    chomp(@excludelist); # get rid of newlines
    close($xfile);
    foreach my $ig (@excludelist)
    {
        print "Ignoring the $ig plugin for any comparisons\n";
    }
    print "\n";
}

# Read in the entries in the default Plugins text file(s)
# Plugin files have lowercase names
foreach my $plugin (@pluginslist)
{
    open(my $pluginsfile, "<", catfile($plugindir,$plugin) ) or die "Can't Open $plugin Plugins File!";
    if ($plugin =~ /ntuser/)
    {
        print "Reading the ntuser Plugins File\n";
        @ntuser_plugins = <$pluginsfile>; # extract each line to a list element
        chomp(@ntuser_plugins); # get rid of newlines
    }
    if ($plugin =~ /software/)
    {
        print "Reading the software Plugins File\n";
        @software_plugins = <$pluginsfile>;
        chomp(@software_plugins);
    }
    if ($plugin =~ /sam/)
    {
        print "Reading the sam Plugins File\n";
        @sam_plugins = <$pluginsfile>;
        chomp(@sam_plugins);
    }
    if ($plugin =~ /system/)
    {
        print "Reading the system Plugins File\n";
        @system_plugins = <$pluginsfile>;
        chomp(@system_plugins);
    }
    if ($plugin =~ /security/)
    {
        print "Reading the security Plugins File\n";
        @security_plugins = <$pluginsfile>;
        chomp(@security_plugins);
    }
    if ($plugin =~ /all/)
    {
        print "Reading the all Plugins File\n";
        @all_plugins = <$pluginsfile>;
        chomp(@all_plugins);
    }
    close $pluginsfile;
}

# This code for determining a package's hive was cut / pasted / edited from "rip.pl" lines 42-67.
# Reads in the *.pl plugin files from the plugins directory and store plugin names in hive related "actual" lists
# Note: the "all_actual" list will later be a concatenation of all of the "actual" lists populated below
opendir(DIR, $plugindir) || die "Could Not Open $plugindir: $!\n";
@readinplugins = readdir(DIR);
closedir(DIR);

foreach my $p (@readinplugins)
{
    my $hive;
    next unless ($p =~ m/\.pl$/); # gonna skip any files which don't end in .pl
    my $pkg = (split(/\./,$p,2))[0]; # extract the package name (by removing the .pl)
    $p = catfile($plugindir, $p); # catfile is used to create absolute path filename (from File::Spec::Functions)
    eval
    {
        require $p; # "require" needs to be inside an eval in order to import package functions ?
        $hive = $pkg->getHive(); # hive name should be UPPERCASE but could be mixed *sigh*
    };
    print "Error: $@\n" if ($@);

    if ($hive =~ /NTUSER/i )
    {
        push(@ntuser_actual, $pkg);
    }
    elsif ($hive =~ /SOFTWARE/i )
    {
        push(@software_actual, $pkg);
    }
    elsif ($hive =~ /SAM/i )
    {
        push(@sam_actual, $pkg);
    }
    elsif ($hive =~ /SYSTEM/i )
    {
        push(@system_actual, $pkg);
    }
    elsif ($hive =~ /SECURITY/i )
    {
        push(@security_actual, $pkg);
    }
    elsif ($hive =~ /ALL/i )
    {
        push(@all_actual, $pkg); # some .pl plugins have "All" listed as their hive
    }
}

# Calls PrintMissingPlugins to compare a Plugins text file list with an "actual" *.pl plugins list
foreach my $plugin (@pluginslist)
{
    if ($plugin =~ /ntuser/)
    {
        PrintMissingPlugins("NTUSER", "ntuser", \@ntuser_actual, \@ntuser_plugins);
    }
    if ($plugin =~ /software/)
    {
        PrintMissingPlugins("SOFTWARE", "software", \@software_actual, \@software_plugins);
    }
    if ($plugin =~ /sam/)
    {
        PrintMissingPlugins("SAM", "sam", \@sam_actual, \@sam_plugins);
    }
    if ($plugin =~ /system/)
    {
        PrintMissingPlugins("SYSTEM", "system", \@system_actual, \@system_plugins);
    }
    if ($plugin =~ /security/)
    {
        PrintMissingPlugins("SECURITY", "security", \@security_actual, \@security_plugins);
    }
    if ($plugin =~ /all/)
    {
        PrintMissingPlugins("ALL", "all", \@all_actual, \@all_plugins);
    }
}

# End Main

sub PrintMissingPlugins
{
    my $hive = shift; # hive name
    my $name = shift; # Plugins text file name
    my $actual_plugins  = shift; # reference to list of plugins derived from *.pl files
    my $listed_plugins = shift; # reference to list of plugins derived from Plugins text file

    my @missing_plugins; # list stores *.pl files which are NOT declared in a given Plugins text file
    my @missing_pl; # list stores Plugin entries which do NOT have a corresponding .pl file

    print "\nThere are ".scalar(@$actual_plugins)." $hive plugins in $plugindir\n";
    # We say "lines" because there can be accidental multiple declarations *sigh*
    print "There are ".scalar(@$listed_plugins)." plugin lines listed in the $name Plugins file\n";
    print scalar(@excludelist)." plugins are being ignored\n";

    # Handle the "all" Plugin case discrepancy
    # There's a large mixture of different hive plugins listed in the "all" Plugins text file
    # and only a handful of plugins who actually return "All" as their hive.
    # At this point, @all_actual should only contain plugins which list "All" as their hive
    # In a fit of hacktacular-ness, we'll now also add the contents from the other "actual" arrays to @all_actual.
    # Otherwise, when we compare the list of "All" hive plugins (@$actual_plugins) with
    # the lines in the "all" Plugin (@$listed_plugins), there will be a lot of differences reported.
    if ($hive eq "ALL")
    {
        push(@$actual_plugins, @ntuser_actual);
        push(@$actual_plugins, @software_actual);
        push(@$actual_plugins, @sam_actual);
        push(@$actual_plugins, @system_actual);
        push(@$actual_plugins, @security_actual);
    }
    # From here on, @all_actual / @$actual_plugins will contain a list of every processed type of .pl plugin file

    # For each *.pl plugin file, check that it has a corresponding entry in the given Plugins text file
    foreach my $pkg (@$actual_plugins)
    {
        my $res = scalar(grep(/^($pkg)$/, @$listed_plugins)); # is this .pl listed in Plugins file ?
        my $ignore = scalar(grep(/^($pkg)$/, @excludelist)); # is this .pl being ignored ?
        if ( ($res eq 0) and ($ignore eq 0) )
        {
            push(@missing_plugins, $pkg);
        }
    }
    if (@missing_plugins)
    {
        print "\nThere are ".scalar(@missing_plugins)." plugins missing from the $name Plugins file:\n";
        foreach my $miss (@missing_plugins)
        {
            print $miss."\n";
        }
    }

    # For each line in the Plugins text file, check that it has a corresponding *.pl file
    foreach my $plug (@$listed_plugins)
    {
        # If this Plugins entry has been commented out (by a preceding "#") OR if it starts with a whitespace (newline),
        # skip to next entry so we don't get told there's no corresponding .pl file
        if ( ($plug =~ /^#/) or ( not $plug =~ /^\S/ ) )
        {
            next;
        }
        my $res = scalar (grep(/^($plug)$/, @$actual_plugins)); # does this Plugin entry have a corresponding .pl file ?
        my $ignore = scalar(grep(/^($plug)$/, @excludelist)); # is this Plugin entry being ignored ?
        if ( ($res eq 0) and ($ignore eq 0) )
        {
            push(@missing_pl, $plug);
        }
    }
    if (@missing_pl)
    {
        print "\nThere are ".scalar(@missing_pl)." plugins declared in the $name Plugins file with no corresponding .pl file:\n";
        foreach my $miss (@missing_pl)
        {
            print $miss."\n";
        }
    }

    if ( (@missing_plugins eq 0) and (@missing_pl eq 0) )
    {
        print "No missing plugins detected\n";
    }

}

# CODE ENDS HERE


Code Summary

Here's a high level overview of the code:
- Parses command line arguments using "GetOptions"
- Prints out the Help message (if required)
- Extracts entries from the user nominated exclude text file (if one is specified)
- Reads in the entries from the specified "Plugins" text file(s)
- Reads in the *.pl plugin names from the plugins directory and stores them in hive related "actual" lists
- Calls the "PrintMissingPlugins" subroutine to compare a "Plugins" text file list with the corresponding "actual" *.pl plugins list. The function also allows for commented out lines (eg "# COMMENT") and blank lines in "Plugins" files.

If you have RegRipper installed, you should already have the necessary Perl packages installed (eg File::Spec::Functions).

It's not the most concise/efficient code. If in doubt, I tried to make it more readable (at least for me!). I also made more comments in the code so I wouldn't have to write a lot in this section. I think that might prove more convenient than scrolling up/down between the summary and the code.

Finally, I should mention that I kept my "regripper-maint.pl" script (and my exclusion text file "regripper-maint-exclude") in "/home/sansforensics/". I didn't want the script to parse itself when looking for .pl plugin files. I suspect I could have just as easily called "regripper-maint.pl" from "/usr/local/src/regripper/". Meh.

Testing

We'll start testing (on SIFT v2.12 using the default RegRipper install) with the 2 examples given in the script's Help message.


sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p ntuser -p sam

Running regripper-maint.pl v2012-04-04

Reading the ntuser Plugins File
Reading the sam Plugins File

There are 98 NTUSER plugins in /usr/local/src/regripper/plugins/
There are 97 plugin lines listed in the ntuser Plugins file
0 plugins are being ignored

There are 1 plugins missing from the ntuser Plugins file:
ccleaner

There are 1 SAM plugins in /usr/local/src/regripper/plugins/
There are 1 plugin lines listed in the sam Plugins file
0 plugins are being ignored
No missing plugins detected
sansforensics@SIFT-Workstation:~$


Looks OK! I have not added my "ccleaner.pl" script to the "ntuser" Plugins file so the results make sense. We can check the number of lines in the "ntuser" Plugins file using the following:

sansforensics@SIFT-Workstation:~$ wc -l /usr/local/src/regripper/plugins/ntuser
97 /usr/local/src/regripper/plugins/ntuser
sansforensics@SIFT-Workstation:~$


Just as reported, there's 97 lines in the "ntuser" Plugins file. As for the SAM results:

sansforensics@SIFT-Workstation:~$ wc -l /usr/local/src/regripper/plugins/sam
1 /usr/local/src/regripper/plugins/sam
sansforensics@SIFT-Workstation:~$


Which seems OK as there's only one "sam" Plugins entry ("samparse").
Now let's try the second help example ...

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -x regripper-maint-exclude -p all

Running regripper-maint.pl v2012-04-04

Ignoring the ccleaner plugin for any comparisons

Reading the all Plugins File

There are 3 ALL plugins in /usr/local/src/regripper/plugins/
There are 204 plugin lines listed in the all Plugins file
1 plugins are being ignored

There are 2 plugins missing from the all Plugins file:
winlivemail
winlivemsn

There are 2 plugins declared in the all Plugins file with no corresponding .pl file:
port_dev
wlm_cu
sansforensics@SIFT-Workstation:~$


So this example shows the script referring to our "regripper-maint-exclude" exclusion file (which has one line containing "ccleaner"). Hence, it ignores any "ccleaner" plugin comparisons.
We can also see that while only 3 "ALL" *.pl plugins were found, there are 204 lines declared in the "all" Plugins file. The "all" Plugins file is a special case in that it can contain plugins which refer to more than one type of hive. See the code comments for more info on how I dealt with this (it ain't particularly pretty).
Anyhoo, we can also see that there are 2 .pl plugins which are NOT declared in the "all" Plugins file. Open the "all" file and verify it for yourself - there is no "winlivemail" or "winlivemsn" entry.
There are also 2 entries in the "all" Plugins file which don't have corresponding .pl files (ie "port_dev.pl" and "wlm_cu.pl" do not exist). That's gonna make it a bit hard to call those plugins eh?

And here is the same example test except WITHOUT using the "regripper-maint-exclude" exception file:

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/  -p all

Running regripper-maint.pl v2012-04-04

Reading the all Plugins File

There are 3 ALL plugins in /usr/local/src/regripper/plugins/
There are 204 plugin lines listed in the all Plugins file
0 plugins are being ignored

There are 3 plugins missing from the all Plugins file:
winlivemail
winlivemsn
ccleaner

There are 2 plugins declared in the all Plugins file with no corresponding .pl file:
port_dev
wlm_cu
sansforensics@SIFT-Workstation:~$


You can see that the "ccleaner" plugin is now included in the missing plugins.
Just to prove this isn't all smoke and mirrors, here are the results for all 6 hives (with no exclusions):

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p ntuser -p software -p sam -p system -p security -p all

Running regripper-maint.pl v2012-04-04

Reading the ntuser Plugins File
Reading the software Plugins File
Reading the sam Plugins File
Reading the system Plugins File
Reading the security Plugins File
Reading the all Plugins File

There are 98 NTUSER plugins in /usr/local/src/regripper/plugins/
There are 97 plugin lines listed in the ntuser Plugins file
0 plugins are being ignored

There are 1 plugins missing from the ntuser Plugins file:
ccleaner

There are 54 SOFTWARE plugins in /usr/local/src/regripper/plugins/
There are 54 plugin lines listed in the software Plugins file
0 plugins are being ignored
No missing plugins detected

There are 1 SAM plugins in /usr/local/src/regripper/plugins/
There are 1 plugin lines listed in the sam Plugins file
0 plugins are being ignored
No missing plugins detected

There are 44 SYSTEM plugins in /usr/local/src/regripper/plugins/
There are 44 plugin lines listed in the system Plugins file
0 plugins are being ignored
No missing plugins detected

There are 3 SECURITY plugins in /usr/local/src/regripper/plugins/
There are 3 plugin lines listed in the security Plugins file
0 plugins are being ignored
No missing plugins detected

There are 3 ALL plugins in /usr/local/src/regripper/plugins/
There are 204 plugin lines listed in the all Plugins file
0 plugins are being ignored

There are 3 plugins missing from the all Plugins file:
winlivemail
winlivemsn
ccleaner

There are 2 plugins declared in the all Plugins file with no corresponding .pl file:
port_dev
wlm_cu
sansforensics@SIFT-Workstation:~$


I subsequently viewed the relevant Plugins files and confirmed that the number of plugin lines declared matched what was printed above. I could also have used the "wc -l" trick mentioned previously. Meh.

Summary

Using Corey Harrell's idea, we've coded a RegRipper maintenance script which can detect discrepancies between what's declared in the Plugins text files and what .pl plugins actually exist.
While this script doesn't help you if you don't use the Plugin text files, it has still been an interesting programming exercise. I can feel my Perl-fu growing stronger by the day ... muhahaha!

And for those that don't know, Cheeky4n6Monkey is now (intermittently) on Twitter (@Cheeky4n6Monkey) ... and co-incidentally(?) getting a lot less work done! Thank you to all those who have already extended their welcome.

As always, please leave a comment if you found this post helpful / interesting / snake oil. PS I will send a copy of this script to the RegRipper folks just in case they find it useful (Thanks Brett and Francesco!).