Channel: Intel® Many Integrated Core Architecture

DAPL ERR

Hi

I'm running WRF in symmetric mode. It runs successfully with one coprocessor, but I get the following error with two coprocessors. Can anyone help?

 

[21] MPI startup(): shm and dapl data transfer modes

[17] MPI startup(): DAPL provider ofa-v2-scif0

[16] MPI startup(): DAPL provider ofa-v2-scif0

[17] MPI startup(): shm and dapl data transfer modes

[16] MPI startup(): shm and dapl data transfer modes

Meteo-Xeon-Phi-mic1:SCM:2dbb:f305e500: 216177 us(216177 us):  modify_qp_state: ERR type 2 qpn 0xe gid 0x2b3cf40229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2dbb:f305e500: 216348 us(171 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2dbb:f305e500: 216391 us(43 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

Meteo-Xeon-Phi-mic1:SCM:2db8:1f47b500: 186585 us(186585 us):  modify_qp_state: ERR type 2 qpn 0x14 gid 0x2b32240229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2db8:1f47b500: 186763 us(178 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2db8:1f47b500: 186845 us(82 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

[15:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

[16:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

Meteo-Xeon-Phi-mic1:SCM:2dbe:36708500: 196925 us(196925 us):  modify_qp_state: ERR type 2 qpn 0x1a gid 0x2aae3c0229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2dbe:36708500: 197101 us(176 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2dbe:36708500: 197182 us(81 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

Meteo-Xeon-Phi-mic1:SCM:2dbc:15493500: 225066 us(225066 us):  modify_qp_state: ERR type 2 qpn 0x21 gid 0x2acb180229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2dbc:15493500: 225237 us(171 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2dbc:15493500: 225315 us(78 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

[17:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

[18:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

Meteo-Xeon-Phi-mic1:SCM:2db9:60277500: 199595 us(199595 us):  modify_qp_state: ERR type 2 qpn 0x27 gid 0x2aff640229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2db9:60277500: 199760 us(165 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2db9:60277500: 199860 us(100 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

[19:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

Meteo-Xeon-Phi-mic1:SCM:2dba:73fc9500: 231631 us(231631 us):  modify_qp_state: ERR type 2 qpn 0x2e gid 0x2b84780229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2dba:73fc9500: 231800 us(169 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2dba:73fc9500: 231904 us(104 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

[20:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

Meteo-Xeon-Phi-mic1:SCM:2dbd:56c6500: 234974 us(234974 us):  modify_qp_state: ERR type 2 qpn 0x36 gid 0x2b0d000229ec (1) lid 0x3e9 port 1 state 1 mtu 4 rd 4 rnr 12 sl 0

Meteo-Xeon-Phi-mic1:SCM:2dbd:56c6500: 235152 us(178 us):  DAPL ERR modify_qp_state Invalid argument

Meteo-Xeon-Phi-mic1:SCM:2dbd:56c6500: 235195 us(43 us):  ACCEPT_USR: QPS_RTR ERR Invalid argument -> 10.10.10.1

[21:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ofa-v2-scif0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()

Assertion failed in file ../../dapl_conn_rc.c at line 620: 0

internal ABORT - process 0

[12:10.10.10.1] unexpected DAPL event 0x4006

Assertion failed in file ../../dapl_init_rc.c at line 1402: 0

internal ABORT - process 0

[13:10.10.10.1] unexpected DAPL event 0x4006

Assertion failed in file ../../dapl_init_rc.c at line 1402: 0

internal ABORT - process 0

[8:10.10.10.1] unexpected DAPL event 0x4006

Assertion failed in file ../../dapl_init_rc.c at line 1402: 0

internal ABORT - process 0

[10:10.10.10.1] unexpected DAPL event 0x4006

Assertion failed in file ../../dapl_init_rc.c at line 1402: 0

internal ABORT - process 0

[3:10.10.10.254] unexpected disconnect completion event from [10:10.10.10.1]

Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0

internal ABORT - process 3

[7:10.10.10.254] unexpected disconnect completion event from [10:10.10.10.1]

Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0

internal ABORT - process 7

[1:10.10.10.254] unexpected disconnect completion event from [10:10.10.10.1]

Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0

internal ABORT - process 1

[5:10.10.10.254] unexpected disconnect completion event from [10:10.10.10.1]

Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0

internal ABORT - process 5
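
To make the pattern in the output above easier to see, the rank-tagged error lines (those starting with `[rank:address]`) can be counted per host address. This is only a minimal sketch: the function name `summarize_dapl_errors` and the log file name are hypothetical, and it assumes the mpiexec.hydra output was saved to a file.

```shell
#!/bin/bash
# Hypothetical helper (not part of Intel MPI): count rank-tagged error lines
# per host address in a saved MPI log. Lines of interest look like
#   [15:10.10.10.2][../../dapl_conn_rc.c:620] error(0x40000): ...
#   [12:10.10.10.1] unexpected DAPL event 0x4006
summarize_dapl_errors() {
    # Keep only the leading "[rank:address]" tag, strip it down to the
    # address, then count occurrences and print "address count".
    grep -oE '^\[[0-9]+:[0-9.]+\]' "$1" \
        | sed -E 's/^\[[0-9]+:([0-9.]+)\]$/\1/' \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}
```

It could be called as `summarize_dapl_errors mpi_run.log`, where `mpi_run.log` is an assumed file name. Startup lines such as `[21] MPI startup(): ...` carry no address after the rank and are not counted.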

My configuration is:

wrf_start.sh 

#!/bin/bash
ulimit -s unlimited
ulimit -l unlimited
export I_MPI_PIN_MODE=mpd
export I_MPI_PIN_DOMAIN=auto
export I_MPI_MIC=1
export I_MPI_DEVICE=rdssm
export I_MPI_DEBUG=5
rm rsl.*
rm wrfout*
mpiexec.hydra -host 10.10.10.254 -n 8 ./wrf_sandy.sh : -host 10.10.10.1 -n 8 ./wrf_phi.sh : -host 10.10.10.2 -n 8 ./wrf_phi.sh

phi.envvars

#!/bin/sh
source /opt/intel/impi/4.1.3.048/mic/bin/mpivars.sh
export LD_LIBRARY_PATH=/opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/mic
export OMP_NUM_THREADS=30
export KMP_LIBRARY=turnaround
export KMP_BLOCKTIME=infinite
export KMP_STACKSIZE=32M
export OMP_SCHEDULE=STATIC
export KMP_AFFINITY=balanced

sandy.envvars

#!/bin/sh
export OMP_NUM_THREADS=2
export KMP_LIBRARY=turnaround
export KMP_BLOCKTIME=infinite
export KMP_STACKSIZE=32M
export OMP_SCHEDULE=DYNAMIC

wrf_phi.sh

#!/bin/sh
ulimit -s unlimited
ulimit -l unlimited
source ./phi.envvars
./wrf.mic

wrf_sandy.sh

#!/bin/sh
ulimit -s unlimited
ulimit -l unlimited
source ./sandy.envvars
./wrf.exe

My system is a single host with two coprocessors connected through an internal bridge. The OS is SLES SP3 (kernel 3.0.76-0.11) with OFED 1.5.4.1 and MPSS 3.4.

Thanks in advance.

 

 

