If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.
You already know Dokkio is an AI-powered assistant to organize & manage your digital files & messages. Very soon, Dokkio will support Outlook as well as One Drive. Check it out today!

CUTE Homework

Page history last edited by Koushik Sen 15 years, 2 months ago

CUTE Homework Part 1

Introduction

In this assignment, you need to implement a test input generation tool, similar to DART and CUTE. The details follow. Please contact Prof. Koushik Sen with any questions. The slides explaining the assignment can be found at http://srl.cs.berkeley.edu/~ksen/Assignment.ppt .

Requirement

Windows or Linux or Mac OS X. Latest version of gcc, ocaml, and gmp.

On Windows machine, you need the latest version of cygwin.

Installation

The infrastructure required to implement the tool can be downloaded from http://srl.cs.berkeley.edu/~ksen/assignment.tar.gz . Invoke the following commands to install the infrastructure.

tar zxvf assignment.tar.gz

cd assignment

cd cil

./configure

make

cd ../cute

make

Description

Input: A C program, say testme.c. Inputs to the program are specified using the following macros.

CUTE_integer(x)

CUTE_unsigned_integer(x)

CUTE_character(x)

CUTE_unsigned_character(x)

CUTE_short(x)

CUTE_unsigned_short(x)

CUTE_long(x)

CUTE_unsigned_long(x)

CUTE_long_long(x)

CUTE_unsigned_long_long(x)

See the C files in the directory cute/tests/ to understand the usage of these macros. In this assignment, we will assume that inputs can only be of integral primitive types, such as int, char, long.

Given a C program, we want to generate inputs using concolic execution. The generated inputs should explore all feasible paths of the program. The infrastructure provides two scripts: cutec and cute in the directory cute/. The cutec script takes a C file as an input. The cutec script generates an instrumented executable. For example, the following commands

cd cute/tests

../cutec testme.c

generate an instrumented executable “testme.exe”. If we execute the executable, it generates a file called “trace”. Try the following commands to see the trace file.

./testme.exe

cat trace

An execution of the instrumented executable file reads a special file “input”, if present, to initialize the inputs. If the input file is absent, then all the inputs are initialized to 0 during the execution. A complete snapshot of the above 4 commands is given below.

ksen@potus:~/assignment > cd cute/tests

ksen@potus:~/assignment/cute/tests> ../cutec testme.c

gcc -D_GNUCC -E -I.. -DCIL=1 testme.c -o /tmp/cil-8Bk4vw6o.i

/home/ksen/assignment/cil/obj/x86_LINUX/cilly.asm.exe --out /tmp/cil-dTVy5yWt.cil.c --domakeCFG /tmp/cil-8Bk4vw6o.i

testme.c:15: Warning: Body of function main falls-through. Adding a return statement

gcc -D_GNUCC -E -I.. /tmp/cil-dTVy5yWt.cil.c -o /tmp/cil-P496B6jS.cil.i

gcc -D_GNUCC -c -I.. -o /tmp/cil-f3jhrZ5y.o /tmp/cil-P496B6jS.cil.i

gcc -D_GNUCC -o testme.exe -I.. /tmp/cil-f3jhrZ5y.o ../cute.o

Executables are ready

run cute yourprogram testme -i 1000

ksen@potus:~/assignment/cute/tests> ./testme.exe

x=0 y=0

I am fine here

ksen@potus:~/assignment/cute/tests> cat trace

(140733652199464,_) = (0,_)

(140733652199500,_) = (140733652199488,0)

(140733652199456,_) = (0,_)

(140733652199448,_) = (0,4202148)

(140733652199500,_) = (int)(x1,_)

(140733652199440,_) = (0,_)

(140733652199496,_) = (140733652199480,0)

(140733652199432,_) = (0,_)

(140733652199424,_) = (0,4202148)

(140733652199496,_) = (int)(x2,_)

(140733652199416,_) = (0,4202152)

(140733652199408,_) = (0,_)

(140733652199404,_) = (140733652199500,0)

(140733652199392,_) = (0,_)

(140733652199388,_) = (140733652199496,0)

(140733652199376,_) = (0,_)

(140733652199372,_) = (140733652199500,0)

(140733652199276,_) = (140733652199372,_)

(140733652199472,_) = (0,2) * (140733652199276,0)

(140733652199476,_) = (140733652199472,0)

(140733652199360,_) = (0,_)

(140733652199356,_) = (140733652199496,0)

then:1 (140733652199476,0) == (140733652199356,0)

(140733652199344,_) = (0,_)

(140733652199340,_) = (140733652199496,0)

(140733652199336,_) = (140733652199340,0) + (0,10)

(140733652199328,_) = (0,_)

(140733652199324,_) = (140733652199500,0)

then:3 (140733652199324,0) != (140733652199336,10)

(140733652199312,_) = (0,4202163)

ksen@potus:~/assignment/cute/tests>

ksen@apc /cygdrive/d/sync/work/teaching/cs294-5/assignment/cute2/tests

A trace gives a sequence of instructions executed by the program. Instructions can be of two types: assignment or conditional. We next describe these two types of instructions in details.

An assignment instruction can be of the following form:

lvalue = rvalue + rvalue

lvalue = rvalue - rvalue

lvalue = rvalue * rvalue

lvalue = rvalue

lvalue = - rvalue

where an lvalue or a rvlaue is a pair of memory address and the value stored at the memory address. Thus a lvalue or a rvalue can be of the form

(m,*m)

If the memory address m is 0, then the pair represents a constant. If the value *m is _, then the value of *m is not required for the symbolic execution and we can ignore it. The value stored at a memory address, i.e. *m, is used if the symbolic state has no mapping for m during concolic execution. A special kind of assignment statement is of the following form.

(m,_) = (type)(xn,_)

This statement assigns the symbolic input value xn to the memory location m. An example of such an instruction is

(2280636,_) = (int)(x2,_)

A conditional instruction is always of the following forms.

then:branchid true

then:branchid rvalue < rvalue

then:branchid rvalue <= rvalue

then:branchid rvalue > rvalue

then:branchid rvalue >= rvalue

then:branchid rvalue == rvalue

then:branchid rvalue != rvalue

else:branchid true

else:branchid rvalue < rvalue

else:branchid rvalue <= rvalue

else:branchid rvalue > rvalue

else:branchid rvalue >= rvalue

else:branchid rvalue == rvalue

else:branchid rvalue != rvalue

where the prefix then:branchid means that we have taken the “then” branch of some “if-then-else” statement in the execution and the prefix else:branchid means that the execution has taken the “else” branch of some “if-then-else” statement. The “branchid” statically identifies the “then” or “else” branch---each “then” or “else” branch in the program has a unique branch id. The suffix gives the actual instruction executed during the execution. A suffix “true” means that the execution of the conditional generates the trivial symbolic constraint “true”.

Problem

You need to write a program (in any language such as Java, C, C++, Perl, or Python), say mycute, which will take a trace file and generate an input file. A sample input file is shown below. An input file gives a mapping from symbolic input values to concrete values.

sat

(= x1 -10)

(= x2 -20)

where each xi is a symbolic input value encountered during the execution and each integer is the concrete value that should be used in the next execution. Once you have produced a file “input”, if you re-execute the instrumented executable, then the program will take input values from the “input” file. Note that during the first execution of the instrumented executable, the input file should not be present. In the first execution, the program will initialize all inputs with 0. In subsequent executions, the instrumented executable will use inputs from the file “input”.

Your program will thus take a “trace” file and use it to perform concolic execution and generate the file “input” that will force the program to take a different path in the next execution. Like DART, you need to only worry about linear arithmetic and you should use the provided yices theorem prover for constraint solving. The details on how to use yices can be found at http://yices.csl.sri.com/. The following guidelines must be followed to get full points.

1. The search must be DFS; so you may need to maintain extra files between executions. Specifically, your algorithm should maintain the “stack” data structure (see Figure 3, 4, 5 of the DART paper) across executions.

2. Symbolic execution of statements and predicates should be done according to Figure 5 and 6 of the CUTE paper. If a memory location maps to a constant, the mapping should be removed from the symbolic state. This helps to save memory.

3. For every statement of the form (m,_) = (type)(xi,_), you must generate a couple of symbolic constraints: (xi <= MAX_type) and (xi >= MIN_type), where MIN_type and MAX_type are minimum and maximum values that a variable of “type” can store, respectively.

4. Your program “mycute” must return 1, when it has explored all paths in the program and 0, otherwise.

5. Running cutec on a C file generates a file “branchcount”. This file contains the total number of distinct “then” and “else” branches in the program. You should use the number in this file along with all “branchid”s encountered during testing to compute the percentage branch coverage.

6. You should implement the three optimizations mentioned in the CUTE paper to get bonus points.

A sample sequence of commands to test testme.c would be the following.

cd cute/tests

../cutec testme.c

rm input

./testme.exe

../mycute

./testme.exe

../mycute

./testme.exe

../mycute

After the last call to mycute, you should notify that a complete search has been done (since the testme.c has 3 execution paths.) Note that we invoke the two commands

./testme.exe

../mycute

iteratively to generate inputs one by one. The script cute can be used to automatically perform this iteration. Therefore, if you do not want to invoke the commands manually, you can call

../cute ../mycute testme.exe -i 3

where -i 3 says that we want to generate 3 inputs. Your program mycute must return 1 when a complete exploration of the computation tree is over; otherwise, it should return 0.

The directory tests has a number of sample C programs that you need to test. A script testall is provided to perform these tests in a batch.

CUTE Homework Part 2

In part 1, you implemented a depth first search (DFS) of the path space. However, DFS is not quite efficient in quickly achieving high branch coverage. In part 2, you need to implement your own search algorithm so that you can quickly achieve high branch coverage. Specifically, in each iteration, you need to pick a branch to negate according to some different strategy. You need to run your implementation on cute/tests/replace.c and report percentage branch coverage and the number of iterations. The fewer the number of iterations and the higher the percentage branch coverage, the better is your algorithm.

Due Date

February 11, 2009.

How to submit?

You should email me a link to the entire archive before 11 am on February 11, 2009 along with instructions to run it. Ideally, I will run tests/testall script and see if your implementation generates exactly the same set of inputs as CUTE. Please come to my office on February 12, 2009 afternoon to give me a demo.

Comments (2)

In case it helps others, I could not get cil's make to work until I
downgraded to Ocaml 3.09, as ./configure suggests to do. Hopefully that
could help others.

Eric Hernandez

I think it's important to specify the operating system. I believe Eric is running Linux. I've taken the advice given by configure and downgraded to the 3.09 version of Ocaml and it didn't solve the problem. I'm running MacOS.

You don't have permission to comment on this page.