Extension to read pentium time stamp counter, an experiment

Bengt Richter bokr at oz.net
Mon Jan 28 01:12:26 EST 2002


Here's the result of calling readtsc various ways in an extension
I just came up with. (It should give Alex an alternative to averaging
over thousands of loops, assuming a constant running clock ;-)

 [18:12] C:\pywk>python
 Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
 Type "copyright", "credits" or "license" for more information.
 >>> import readtsc
 >>> from readtsc import readtsc as r
 >>> g = readtsc.readtsc
 >>> def foo(c=g):
 ...     global r
 ...     return c()-c(),g()-g(),r()-r()
 ...
 >>> for i in range(10): print foo()
 ...
 (-1821.0, -522.0, -494.0)
 (-589.0, -458.0, -480.0)
 (-604.0, -439.0, -500.0)
 (-585.0, -439.0, -470.0)
 (-604.0, -439.0, -470.0)
 (-606.0, -439.0, -486.0)
 (-585.0, -440.0, -470.0)
 (-591.0, -439.0, -470.0)
 (-614.0, -439.0, -470.0)
 (-615.0, -449.0, -470.0)
 >>>
--
Is there a faster way to call from foo()?
g() seems to be consistently faster than r(), and much
faster than c() above. The numbers are unscaled counts
on a 300 MHz P2, so apart from the first sample they're all
about 2 microseconds or less.
(OTOH, two calls in a row in C would be an order of magnitude
faster).
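
To put the raw counts on a time scale, divide by the clock rate. A
minimal sketch, assuming the nominal 300 MHz clock mentioned above (the
helper name counts_to_us is made up for illustration, and the real
clock rate may differ a little from nominal):

```python
# Hypothetical helper, not part of the readtsc extension.
CLOCK_HZ = 300e6  # assumed: nominal clock rate of the 300 MHz P2


def counts_to_us(counts, clock_hz=CLOCK_HZ):
    """Convert a TSC count difference to microseconds (sign is dropped)."""
    return abs(counts) / clock_hz * 1e6


# Count differences copied from the first rows of the session above.
for counts in (-1821.0, -589.0, -439.0):
    print("%8.1f counts = %.2f us" % (counts, counts_to_us(counts)))
```

On this scale one count is about 3 1/3 nanoseconds and 600 counts is
2 microseconds.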

(The first number in this kind of loop is almost always the
biggest, due to the extra time to load the CPU cache.) Hm, I wonder
if the above are interacting in that regard. Maybe I should try
them separately.
... (later) yes, I'll have to look at the byte code to see what I'm
_really_ timing, I guess... since the lsb is ~ 3 1/3 nanoseconds,
differences could be due to subtle things that differ between
test loops, e.g. how many steps the hashing takes for each
name, etc. ... too hungry now to pursue it ;-)
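
One way to see what is really being timed is to list the name-loading
instructions in foo(). This is only a sketch: it uses the dis module of
a current Python 3, not the 2.1 of the session above, so opcode names
and details may differ, and stand_in is a made-up replacement for
readtsc.readtsc so the example runs without the extension.

```python
import dis


def stand_in():
    """Made-up stand-in for readtsc.readtsc (extension assumed absent)."""
    return 0.0


g = stand_in
r = stand_in


def foo(c=g):
    global r
    return c() - c(), g() - g(), r() - r()


def name_loads(func):
    """Return (opname, name) pairs for instructions that load a name."""
    return [(ins.opname, ins.argval)
            for ins in dis.get_instructions(func)
            if ins.opname.startswith("LOAD_") and isinstance(ins.argval, str)]


for opname, name in name_loads(foo):
    print(opname, name)
```

In this sketch c is a local (it is a default argument) and is fetched
with a local-variable opcode, while g and r are fetched with a global
lookup. It only shows which lookups are involved; it does not by itself
explain the timing differences above.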


The extension is just for win32 right now, sorry. It shouldn't
be any big deal to port it to linux though. The code follows.
I don't know what the pentium detection part will do on AMD
processors, but I don't have one to test. Caveat executor, or
whatever the Latin is (since it's not emptor ;-) (NO WARRANTY)

BTW, rather than abort the import, I substituted another
interface that will generate run time errors. I don't know if
that's a good idea, but I could see it as a way of adapting
dynamically to the current platform (i.e., you could switch
to code making use of e.g. MMX or other special features if present,
though maybe that decision should be made at a higher level, by
importing alternatives, with this technique used just to warn?).
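
The substitution idea can be sketched in Python as follows. This is an
illustration only, under the assumption that a feature check has
already produced a true/false answer; select_reader and
_read_unavailable are hypothetical names, not part of the extension.

```python
def _read_unavailable():
    """Fallback that fails at call time instead of at import time."""
    raise RuntimeError("current CPU has no RDTSC instruction.")


def select_reader(has_tsc, real_reader):
    """Pick the function to expose, mirroring the choice made at module init."""
    return real_reader if has_tsc else _read_unavailable


# Made-up stand-in for the real counter read.
readtsc = select_reader(False, lambda: 0.0)
try:
    readtsc()
except RuntimeError as exc:
    print("import worked, call failed:", exc)
```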

Also, I masked the counter value to 53 bits so that the least
significant bit would not be shifted out when converting to double.
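
The reason for 53 bits is that a double has a 53-bit significand, so
larger integers can lose their low bit in the conversion. A small
sketch using Python integers in place of the 64-bit counter (MASK53
and to_double_exact are names made up for this illustration):

```python
MASK53 = (1 << 53) - 1  # low 53 bits, the width a double holds exactly


def to_double_exact(count):
    """Mask a 64-bit count to 53 bits, then convert to float without rounding."""
    return float(count & MASK53)


big = (1 << 53) + 1                    # needs 54 bits
print(float(big) == float(big - 1))    # True: the lsb was rounded away
print(int(to_double_exact(big)))       # 1: the masked value keeps its lsb
```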

--
/* readtsc.c */
#include "Python.h"

#include <basetsd.h>
typedef UINT64 u64;
typedef  INT64 i64;


/* [R]ead the [T]ime [S]tamp [C]ounter */
/* The caller must already know that the CPU has the RDTSC instruction */
/* (initreadtsc checks this before exposing the function). */
/* 64-bit value is masked to its 53 ls bits to preserve lsb in */
/* conversion to floating point double */
i64 __inline getTick53()
{
   __asm
   {
	rdtsc		; puts 64-bit counter in edx:eax (high:low)
	and  edx, 0x001fffff ; keep low 21 bits of edx: 21+32=53 bits for conv to double
   }
   /* No return statement: result is left in EDX:EAX */
}

static PyObject *
ex_readtsc(PyObject *self, PyObject *args)
{
	if (!PyArg_ParseTuple(args, ":readtsc"))
		return NULL;
	return PyFloat_FromDouble((double)getTick53());
}

static PyMethodDef readtsc_methods[] = {
    {"readtsc", ex_readtsc, METH_VARARGS, 
        "[r]ea[d]s [t]ime [s]tamp [c]ounter (rdtsc) of pentium which better be there"},
	{NULL, NULL}
};

/* alternate methods when rdtsc is not available, to allow importing */
/**/
static PyObject *
ex_readtsc_na(PyObject *self, PyObject *args)
{
	if (!PyArg_ParseTuple(args, ":readtsc"))
		return NULL;
	PyErr_SetString(PyExc_RuntimeError,
		"current CPU has no RDTSC instruction.");
	return NULL;
}

static PyMethodDef readtsc_unavail[] = {
    {"readtsc", ex_readtsc_na, METH_VARARGS, 
        "readtsc module loaded, but no RDTSC instruction available on current CPU"},
	{NULL, NULL}
};
/**/
int getPentiumFeatures()
{
   __asm
   {
	xor	eax,eax	; leaf 0
	cpuid		; eax = highest supported cpuid leaf (>= 1 on a pentium)
	xor	edx,edx	; zero features in case leaf 1 is not available
	test	eax,eax
	je	getpf_err	; highest leaf is 0: no feature flags to read
	mov	eax,1	; leaf 1: feature flags
	cpuid		; with eax==1, should put feat in edx
getpf_err:
	mov	eax,edx	; int return value in eax
   }
}

#define PENTIUM_FEATURES_TSC (1<<4)

void
initreadtsc(void)
{
	if( getPentiumFeatures() & PENTIUM_FEATURES_TSC){
		Py_InitModule("readtsc", readtsc_methods);
	} else {
		Py_InitModule("readtsc", readtsc_unavail);
	}
	 
}
--
The following two commands are single lines to compile and link.
There is probably unnecessary cruft. I extracted them from the exported
make file. Adjust paths to fit your layout. Hopefully it'll work.
You could put the two in e.g. readtsc.mak.cmd and run that.
--
cl.exe /nologo /MT /W3 /GX /O2  /I "..\..\Python21\include" /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_MBCS" /D "_USRDLL" /D "READTSC_EXPORTS" /c .\readtsc.c
link.exe kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /dll /incremental:no /machine:I386 /EXPORT:"initreadtsc" /out:".\readtsc.dll" /implib:"readtsc.lib" /libpath:"..\..\Python21\libs" readtsc.obj
--
In the same directory, you should then be able to run python and
have it import readtsc from that directory, or you can move it.

Regards,
Bengt Richter
If someone closer to intel has a safer and/or more up to date
version of the detection, please chime in. I didn't research it
very far, I'm afraid.
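
For reference, the feature test itself is just a bit check on the EDX
word returned by CPUID with EAX=1. A sketch of that check in Python,
using a made-up sample value rather than one read from a real CPU
(has_feature and sample_edx are names invented for this illustration):

```python
# Bit positions in the EDX feature word returned by CPUID with EAX=1.
FEATURE_TSC = 1 << 4    # same bit as PENTIUM_FEATURES_TSC in readtsc.c
FEATURE_MMX = 1 << 23   # MMX, the other feature mentioned in the post


def has_feature(edx, bit):
    """Return True if the given feature bit is set in the EDX word."""
    return bool(edx & bit)


sample_edx = FEATURE_TSC | FEATURE_MMX  # made-up value, not from a real CPU
print(has_feature(sample_edx, FEATURE_TSC))
print(has_feature(0, FEATURE_TSC))
```

This does nothing about the harder part, which is deciding safely
whether CPUID itself can be executed on the current processor.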



