Parsing binary files with regular expressions 1.0 Python script


    Specification

  • Version: 1.0
  • File size: 0 KB
  • File name: index_txt
  • Last update:
  • Platform: Windows / Linux / Mac OS / BSD / Solaris
  • Language: Python
  • Price: Other Free / Open Source
  • Company: Jim Ursetto

Parsing binary files with regular expressions 1.0 script description:



Parsing binary files with regular expressions 1.0 is a Python script in the File Management category, designed by Jim Ursetto. It runs on the following operating systems: Windows / Linux / Mac OS / BSD / Solaris.
The Parsing binary files with regular expressions script lets you use the regular expression engine to parse binary files.

Publisher review:
This script lets you use the regular expression engine to parse binary files, especially those for which the struct module alone is inadequate.

The typical way to parse binary data in Python is to use the unpack function of the struct module. This works well for fixed-width fields, but becomes more complicated when you need to parse variable-width fields. Perl's implementation of unpack accepts "*" as the field length, and even allows grouping with parentheses, which mitigates this problem. Python does not currently offer these features. Although you can dynamically generate a format string for unpack with a lot of slicing and calls to calcsize, the resulting code will likely be hard to read and error-prone.

Fortunately, in some cases there is a simpler way: use the regular expression engine to grab each field, and call struct.unpack on the results.

First, construct a regular expression (RE) describing the entire record structure, grouping each field you'd like to extract with parentheses, and compile it. When writing the RE, just remember that one character in the RE matches one byte in the record, so the expression ".." matches any short (2 bytes). To match a variable-width field, the RE engine has to be able to recognize where the field ends. In a null-terminated string, for example, the field ends with a zero byte, so you look for any number of characters followed by a null byte: "(.*?)\0". Note the non-greedy quantifier "?": it makes the match stop at the first null rather than the last null in the buffer.

When compiling, make sure to pass the re.DOTALL flag; otherwise "." will not match bytes that happen to equal ASCII '\n', because the engine treats them as newlines. Then call the findall method of the compiled pattern object on your buffer.
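The RE-construction step might look like the following minimal sketch. It is written for Python 3, where binary data is bytes, so the pattern must be a bytes literal. It assumes the record layout the recipe's code is described as handling (a little-endian short, a null-terminated string, and two more shorts); the function name find_records and the sample buffer are hypothetical.

```python
import re
import struct

# Assumed record layout: a little-endian short, a null-terminated string,
# and two more little-endian shorts. Each '.' matches exactly one byte,
# so '..' captures a 2-byte short; '(.*?)\x00' captures bytes up to the
# first null (non-greedy) and leaves the null itself out of the group.
# re.DOTALL lets '.' also match the byte 0x0A ('\n').
RECORD_RE = re.compile(rb"(..)(.*?)\x00(..)(..)", re.DOTALL)


def find_records(buf: bytes) -> list:
    """Return one tuple of raw field bytes per record found in buf."""
    return RECORD_RE.findall(buf)


# Hypothetical sample buffer holding two records.
sample = (
    struct.pack("<h", 1) + b"alpha\x00" + struct.pack("<hh", 10, 20)
    + struct.pack("<h", 2) + b"b\x00" + struct.pack("<hh", 30, 40)
)

for raw_fields in find_records(sample):
    print(raw_fields)
```

The value 10 in the first record is stored as the bytes 0x0A 0x00, and 0x0A is '\n'. Compiled without re.DOTALL, the same pattern cannot match that byte with '.', so findall skips the first record and returns misaligned fields instead of the two records.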
findall returns all non-overlapping matches, one per record. When the RE has more than one group, the result is a list of tuples, one for each match; each tuple contains one element for each field you grouped in the RE.

You still need to unpack the fields in the tuples before using them, since they are still raw strings rather than usable values. Generally, you'll call unpack once for each field, with a single format character. (You can also group several consecutive fixed-width fields in one set of parentheses in the RE and unpack them in one call, but that may get confusing.)

The recipe's code demonstrates how to unpack a binary file that contains an indeterminate number of variable-width records, each consisting of a little-endian short, a null-terminated string, and two more shorts. It stores the resulting values in a list and also in a dictionary.

This technique is useful when your variable-width fields are terminated with a sentinel, such as the null-terminated strings described above. If the field length is embedded in the data and you can't use the "p" (Pascal string) format character, you'll probably have to slice the buffer up manually.

The technique also applies when all your fields are fixed-width. findall operates on the entire buffer at once with a single regular expression, which saves you from dynamically building a long format string covering all your data, or from iterating over slices of the buffer.
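The unpacking step could be sketched as follows, again in Python 3 with bytes buffers. It follows the layout described for the recipe's code and fills both a list and a dictionary, but it is not the recipe's own code: the function names, field names, the choice of the string field as dictionary key, and the latin-1 decoding are all hypothetical. A second function illustrates the all-fixed-width case and the tip about grouping consecutive fixed fields into one RE group.

```python
import re
import struct

# Assumed layout: little-endian short, null-terminated string,
# then two little-endian shorts.
RECORD_RE = re.compile(rb"(..)(.*?)\x00(..)(..)", re.DOTALL)


def parse_records(buf: bytes):
    """Unpack every record into Python values.

    Returns (records, by_name): a list of (rec_id, name, x, y) tuples and a
    dict mapping each name to (rec_id, x, y). The field names, the dict key,
    and the latin-1 decoding are illustrative assumptions.
    """
    records = []
    by_name = {}
    for raw_id, raw_name, raw_x, raw_y in RECORD_RE.findall(buf):
        # One struct.unpack call per field, each with a single format char.
        (rec_id,) = struct.unpack("<h", raw_id)
        (x,) = struct.unpack("<h", raw_x)
        (y,) = struct.unpack("<h", raw_y)
        name = raw_name.decode("latin-1")
        records.append((rec_id, name, x, y))
        by_name[name] = (rec_id, x, y)
    return records, by_name


# All-fixed-width variant (assumed layout: two little-endian shorts per
# record). Both fields share one group, so one unpack call decodes them.
# With a single group, findall returns a list of bytes, not tuples.
# Trailing bytes that do not form a whole record are ignored.
PAIR_RE = re.compile(rb"(....)", re.DOTALL)


def parse_pairs(buf: bytes) -> list:
    return [struct.unpack("<hh", chunk) for chunk in PAIR_RE.findall(buf)]


# Hypothetical sample data.
sample = (
    struct.pack("<h", 1) + b"alpha\x00" + struct.pack("<hh", 10, 20)
    + struct.pack("<h", 2) + b"b\x00" + struct.pack("<hh", -3, 40)
)
records, by_name = parse_records(sample)
print(records)
print(by_name)
print(parse_pairs(struct.pack("<4h", 1, 2, 3, 4)))
```

Keeping one format character per field makes each unpack call easy to check against the RE; the grouped "<hh" form is shorter but ties the RE group width to the format string, which is the kind of coupling the recipe warns may get confusing.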
