PKS 5.0: puzzled by 32-bit vs 64-bit performance

Discussion in 'Cadence' started by anon_poster, Nov 7, 2004.

  1. anon_poster

    I got caught up in AMD's "64-bit hype" machine, so I went
    out and bought a cheap Athlon64 system, set it up with
    a whiteboxlinux.org (Redhat RHEL 3 clone) installation,
    then ran some simplistic synthesis tests.

    The results really confounded me:

    (A) = Athlon64 2800+ (1.8GHz Socket754), 1.0GB PC2700 CL2.5 DDR,
    (motherboard is an ECS 755-A2 v1.0)
    WhiteboxLinux 3.0 Respin1 x86_64 (similar to RHEL 3.0 update 2)

    (B) = dual Pentium3/S 1.26GHz, 4.0GB PC133 CL3 (reg,ECC) SDRAM
    (motherboard is a Supermicro P3TDDE)
    Redhat 8.0 linux, plus *all* released Redhat RPM updates

    (C) = Sun Blade-1000 2750 (dual USparc3 750MHz, 8MB cache), 8.0GB RAM
    Solaris 8, base installation (no updates)


    Machine  Software           RAM usage  Runtime
    -------  -----------------  ---------  -------
    (A)      x86 PKS5 32-bit    ~300 MB    29 min
    (A)      x86 PKS5 64-bit    ~400 MB    22 min
    (B)      x86 PKS5 32-bit    ~300 MB    50 min
    (C)      SunOS PKS5 32-bit  ~300 MB    70 min
    (C)      SunOS PKS5 64-bit  ~400 MB    80 min


    Software = Cadence SPR50 (October 2004 update)

    "Test-case" is a simple (<80K gates) Verilog-HDL compile, from RTL all
    the way to placed gates + clock-tree synthesis (no routing, no DFT, no
    low-power stuff).

    Here's the shocker: while the Solaris 64-bit binary ran *SLOWER* than
    its 32-bit version, the x86_64 platform did the exact opposite.
    The x86_64 binary ran *FASTER* than the x86 32-bit binary.
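    To put numbers on the shocker, here is a small Python sketch that computes
    the relative change from the 32-bit to the 64-bit binary on machines (A)
    and (C), using only the figures in the table above. The RAM column is
    approximate ("~"), so the memory percentages are rough.

```python
# Runtime (minutes) and approximate RAM usage (MB), copied from the results table.
RUNS = {
    ("A", 32): (29, 300),
    ("A", 64): (22, 400),
    ("C", 32): (70, 300),
    ("C", 64): (80, 400),
}


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


def compare_bitness(machine: str) -> dict:
    """Runtime and memory change when moving from the 32-bit to the 64-bit binary."""
    t32, m32 = RUNS[(machine, 32)]
    t64, m64 = RUNS[(machine, 64)]
    return {
        "runtime_pct": pct_change(t32, t64),
        "memory_pct": pct_change(m32, m64),
    }


if __name__ == "__main__":
    for machine in ("A", "C"):
        r = compare_bitness(machine)
        print(f"({machine}) 64-bit vs 32-bit: runtime {r['runtime_pct']:+.1f}%, "
              f"RAM {r['memory_pct']:+.1f}%")
```

    On these numbers, the same ~33% growth in memory footprint goes with a
    ~24% shorter runtime on the Athlon64 but a ~14% longer one on the
    UltraSPARC III.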

    ....

    What I want to know is ...

    (a) Did I set up something incorrectly? (Why does 64-bit run faster
    on the Linux platform, yet slower on Solaris?)

    (b) I didn't have a chance to try Intel's 64-bit IA32e Xeon.
    Does IA32e experience the same trend (i.e. 64-bit is faster)?

    (c) Why is the netlist output different across all 4 platforms?!?
    I think I used identical setup scripts for all 4 runs, but
    even on the same platform (Sun, x86), the 32-bit vs 64-bit
    QoR/area results differ. I guess this goes back to (a).

    (d) Is my testcase consistent with other people's experiences?
    For example, does Synopsys's Design Compiler follow the
    same trend?

    (e) When will all EDA vendors port *EVERYTHING* to x86_64? :)
     
    anon_poster, Nov 7, 2004
    #1
  2. I cannot speak to the Linux questions you ask, but Solaris is something
    I know a few things about. The 64bit Solaris run is using the same size
    processor cache as the 32bit run. When an application uses many
    pointers, there are fewer pointers stored in the cache in 64bit mode
    since the pointers are twice as large. This results in more cache
    misses, which cause the processor to have to go to RAM more often,
    resulting in more runtime.
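    A back-of-the-envelope Python sketch of this cache argument. It assumes a
    hypothetical netlist node holding two pointers and one 32-bit int, the
    ILP32 (32-bit) and LP64 (64-bit) data models that Solaris and Linux use,
    C-style alignment padding, the 8MB cache of machine (C), and a 64-byte
    cache line (also an assumption). It ignores associativity and everything
    else the tool keeps in memory.

```python
# Field layouts as (size_in_bytes, alignment_in_bytes), in declaration order, for a
# hypothetical C struct:  struct node { struct node *next; struct pin *driver; int id; };
ILP32_NODE = [(4, 4), (4, 4), (4, 4)]  # 32-bit: pointers and int are all 4 bytes
LP64_NODE = [(8, 8), (8, 8), (4, 4)]   # 64-bit: pointers grow to 8 bytes, int stays 4


def round_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def padded_size(fields) -> int:
    """Size of a C-style struct, including alignment and trailing padding."""
    offset = 0
    max_align = 1
    for size, align in fields:
        offset = round_up(offset, align) + size
        max_align = max(max_align, align)
    return round_up(offset, max_align)


def nodes_that_fit(cache_bytes: int, fields) -> int:
    """How many whole nodes fit in the given number of bytes (ignores associativity)."""
    return cache_bytes // padded_size(fields)


if __name__ == "__main__":
    CACHE = 8 * 1024 * 1024  # 8MB cache of machine (C)
    LINE = 64                # assumed cache-line size in bytes
    for label, layout in (("32-bit", ILP32_NODE), ("64-bit", LP64_NODE)):
        print(f"{label}: {padded_size(layout)} bytes/node, "
              f"{nodes_that_fit(LINE, layout)} per 64-byte line, "
              f"{nodes_that_fit(CACHE, layout)} per 8MB cache")
```

    In this model the node doubles from 12 to 24 bytes (the int picks up 4
    bytes of trailing padding), so half as many nodes fit in the same cache,
    and a 64-byte line holds 2 nodes instead of 5.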

    One might think the AMD and Intel 64bit processors would have the same
    issue, but processor designers do some very odd things at times. Maybe
    someone at AMD anticipated this problem and did something ingenious. If
    this is so, I hope AMD recognized that genius with a wad of cash.

    As for why AMD runs faster in 64bit mode, we can only speculate. It may
    be as simple as 32bit needing an extra step to convert addresses to
    64bit, since the hardware probably handles all pointers as 64bit
    internally. Looking at the machine code might be very illuminating.
     
    Diva Physical Verification, Nov 7, 2004
    #2
  3. AMD also improved the x86 architecture at the same time. x86 originally
    had very few registers; x86_64, for example, doubles the general-purpose
    registers from 8 to 16, and that can lead to better code generation.
    There are also other small improvements.
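    A deliberately crude Python model of why more registers can help: if a
    hot loop keeps more values live than there are usable general-purpose
    registers, the compiler must spill the excess to the stack in memory. It
    assumes 8 GPRs on 32-bit x86 and 16 on x86_64, with the stack pointer and
    frame pointer reserved in both; real register allocators are far more
    subtle than this.

```python
# Toy model: values live in a hot loop beyond the usable registers get spilled to memory.
GPRS = {"x86": 8, "x86_64": 16}  # general-purpose integer registers per mode
RESERVED = 2                     # assume stack pointer and frame pointer are unavailable


def usable_registers(mode: str) -> int:
    """Registers left for the allocator after the reserved ones."""
    return GPRS[mode] - RESERVED


def spilled_values(live_values: int, mode: str) -> int:
    """Number of live values that do not fit in registers under this toy model."""
    return max(0, live_values - usable_registers(mode))


if __name__ == "__main__":
    for live in (6, 10, 14):
        print(live, {mode: spilled_values(live, mode) for mode in GPRS})
```

    Each spilled value costs extra stack stores and reloads, which is one
    plausible contributor to the x86_64 speedup, though only profiling the
    actual binaries could confirm it.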

    --Kim
     
    Kim Enkovaara, Nov 8, 2004
    #3
  4. gennari

    Yes, in general you can't compare a 32-bit system with a 64-bit system based
    on the number of bits alone. There are many differences other than the bit
    width that affect performance: the architecture, memory access/bandwidth
    (64-bit systems may have higher bandwidth), cache size, etc. Also, if you
    compile a binary on a 32/64-bit system the compiler may or may not optimize
    for that particular bit width. If you're running a 32-bit OS or binary on a
    64-bit system, the 32-bit emulation might slow down software execution.
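    As a rough illustration of the memory-bandwidth point, here is a Python
    sketch of theoretical peak bandwidth for machines (A) and (B), computed
    from the memory ratings in the original post. It assumes one 64-bit
    (8-byte) data path in each case: the Athlon64's single-channel on-die
    controller for PC2700 (DDR333), and the Pentium III's 133MHz front-side
    bus for PC133, which both CPUs in (B) share. Sustained bandwidth in
    practice is lower than these peaks.

```python
BUS_BYTES = 8  # 64-bit data path, assumed for both machines


def peak_mb_per_s(clock_mhz: float, transfers_per_clock: int,
                  bus_bytes: int = BUS_BYTES) -> float:
    """Theoretical peak bandwidth in MB/s (10**6 bytes per second)."""
    return clock_mhz * transfers_per_clock * bus_bytes


# (A) PC2700 = DDR333: ~166.67 MHz clock, two transfers per clock
athlon64_pc2700 = peak_mb_per_s(500 / 3, 2)
# (B) PC133 SDRAM behind a ~133.33 MHz front-side bus: one transfer per clock
p3_pc133 = peak_mb_per_s(400 / 3, 1)

if __name__ == "__main__":
    print(f"(A) ~{athlon64_pc2700:.0f} MB/s, (B) ~{p3_pc133:.0f} MB/s shared by two CPUs, "
          f"ratio {athlon64_pc2700 / p3_pc133:.1f}x")
```

    On paper, (A) has about 2.5 times the peak memory bandwidth of (B), and
    (B) splits its share between two processors, which is one of the
    non-bit-width differences that shows up in the 29 min vs 50 min
    32-bit runtimes.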

    Different synthesis results might be due to several factors:
    - the precision of 32-bit vs. 64-bit data values (though the
      floating-point numbers are probably all standard IEEE 64-bit);
    - different OS/compiler math libraries handling math exceptions,
      such as divide-by-zero, differently;
    - uninitialized memory (an error in the software);
    - "non-stable" functions such as sorts (or even the ordering of
      pointer values), which may have been implemented differently in
      the libraries on the various systems.
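    A toy Python illustration of the pointer-ordering point: if a tool breaks
    ties between equal-cost candidates by comparing pointer values, the
    winner depends on where the allocator happened to place each object,
    which can differ between 32-bit and 64-bit builds. The cell names, costs
    and addresses are invented for the example; breaking ties on a stable
    key such as the instance name gives the same answer everywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    cost: float


# Two buffers tie on cost; the inverter is clearly worse.
CELLS = [Cell("u_buf1", 3.0), Cell("u_buf2", 3.0), Cell("u_inv7", 5.0)]

# Hypothetical heap addresses of the same objects in a 32-bit and a 64-bit build.
ADDR_32 = {"u_buf1": 0x0804A010, "u_buf2": 0x0804A020, "u_inv7": 0x0804A030}
ADDR_64 = {"u_buf1": 0x00602030, "u_buf2": 0x00602010, "u_inv7": 0x00602050}


def pick_by_address(cells, addresses):
    """Lowest cost wins; ties broken by pointer value (platform-dependent)."""
    return min(cells, key=lambda c: (c.cost, addresses[c.name])).name


def pick_by_name(cells):
    """Lowest cost wins; ties broken by instance name (deterministic)."""
    return min(cells, key=lambda c: (c.cost, c.name)).name


if __name__ == "__main__":
    print("32-bit, tie on address:", pick_by_address(CELLS, ADDR_32))  # u_buf1
    print("64-bit, tie on address:", pick_by_address(CELLS, ADDR_64))  # u_buf2
    print("either, tie on name:   ", pick_by_name(CELLS))               # u_buf1
```

    One early choice like this can then cascade through placement and
    clock-tree synthesis into visibly different netlists and QoR numbers.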

    Still, if the results differ significantly it's probably due to either user
    error or software error.

    Frank
     
    gennari, Nov 12, 2004
    #4