Looking for crossfold validation code

Sandy dksreddy at gmail.com
Sat Feb 20 09:44:52 EST 2010


Here is the code I use. I got it from the web, but I have forgotten the link.

def k_fold_cross_validation(X, K, randomise=False):
    """
    Generates K (training, validation) pairs from the items in X.

    Each pair is a partition of X, where validation is a list of
    about len(X)/K items (fold sizes differ by at most one when K
    does not divide len(X)) and training holds the remaining items.

    If randomise is true, a copy of X is shuffled before partitioning,
    otherwise its order is preserved in training and validation.
    """
    # Work on a copy: the caller's X is never shuffled, and a generator
    # can safely be enumerated once per fold.
    X = list(X)
    if randomise:
        from random import shuffle
        shuffle(X)
    for k in range(K):  # range works on both Python 2 and Python 3
        # Item i is validation data in fold i % K and training data otherwise.
        training = [x for i, x in enumerate(X) if i % K != k]
        validation = [x for i, x in enumerate(X) if i % K == k]
        yield training, validation
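
The function above does a single pass with an unseeded shuffle. The
original question also asks for a number of times and a random seed, so
here is a rough sketch of how that could be layered on top of the same
splitting idea. The name repeated_k_fold, its parameters, and the seeding
scheme are my own assumptions for illustration, not part of the code I
found, and it has only been thought through for Python 3.

```python
import random


def repeated_k_fold(items, k, repeats=1, seed=None):
    """Yield (repeat, fold, training, validation) for every fold of every repeat.

    Hypothetical helper: the name, arguments, and seeding scheme are
    assumptions made for this example.
    """
    items = list(items)  # copy, so the caller's data is never reordered
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of items")
    rng = random.Random(seed)  # private generator: same seed, same splits
    for repeat in range(repeats):
        shuffled = items[:]
        rng.shuffle(shuffled)  # a fresh ordering for each repeat
        for fold in range(k):
            # Same modulo split as above: item i validates in fold i % k.
            training = [x for i, x in enumerate(shuffled) if i % k != fold]
            validation = [x for i, x in enumerate(shuffled) if i % k == fold]
            yield repeat, fold, training, validation


# Example call: 5 folds, repeated twice, reproducible through the seed.
splits = list(repeated_k_fold(range(10), k=5, repeats=2, seed=42))
```

Each entry of splits holds the repeat index, the fold index, and the two
lists, so writing them out to training / testing files or passing them
straight to a classifier is left to the caller.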


Cheers,
dksr

On Feb 20, 1:15 am, Mark Livingstone <livingstonem... at gmail.com>
wrote:
> Hello,
>
> I am doing research as part of a Uni research Scholarship into using
> data compression for classification. What I am looking for is python
> code to handle the crossfold validation side of things for me - that
> will take my testing / training corpus and create the testing /
> training files after asking me for number of folds and number of times
> (or maybe allow me to enter a random seed or offset instead of times.)
> I could then either hook my classifier into the program or use it in a
> separate step.
>
> Probably not very hard to write, but why reinvent the wheel ;-)
>
> Thanks in advance,
>
> MarkL



