Re: EX4600: IPv6 multicast traffic causing all IRB interfaces to become unresponsive

January 20, 2020, 1:31 pm

Hi mriyaz,

Thanks for your reply. The serial console stays responsive, so I tried to collect some stats from the commands.

I simulated a lighter version of the IPv6 multicast "flood", just enough to make the IRB interfaces unavailable.

CPU utilization went from under 10 percent each for user and kernel, and 0 for interrupts, to about 20 percent user, 15 percent kernel, and 4 percent interrupt:

    5 sec CPU utilization:
      User                       8 percent
      Background                 0 percent
      Kernel                     6 percent
      Interrupt                  0 percent
      Idle                      86 percent

    5 sec CPU utilization:
      User                      20 percent
      Background                 0 percent
      Kernel                    15 percent
      Interrupt                  4 percent
      Idle                      61 percent

On the subject of queues:
Without the multicast flood, the MCQ_DROP_* lines are missing from the output, and MC_PERQ_BYTE(28) is on the order of tens of thousands. With the flood, the drops can be clearly seen:

root@eight:RE:0% cprod -A fpc0 -c 'set dc bc "show c cpu"'


HW (unit 0)
IBCAST.cpu0             :             3,090,549                  +2               1/s
ING_NIV_RX_FR.cpu0      :             6,454,862                  +2               1/s
MC_PERQ_PKT(8).cpu0     :            13,845,492                 +10
MC_PERQ_PKT(14).cpu0    :                14,126                  +1
MC_PERQ_PKT(16).cpu0    :             3,939,840                  +7
MC_PERQ_PKT(19).cpu0    :            43,736,339                +427               1/s
MC_PERQ_PKT(28).cpu0    :            52,214,959             +17,153              87/s
MC_PERQ_PKT(33).cpu0    :            22,367,484                +294
MC_PERQ_PKT(34).cpu0    :            17,237,213                 +45
MC_PERQ_PKT(43).cpu0    :            12,787,910                 +52
MC_PERQ_BYTE(8).cpu0    :         1,945,899,342              +3,148             445/s
MC_PERQ_BYTE(14).cpu0   :               959,596                 +68              41/s
MC_PERQ_BYTE(16).cpu0   :           365,627,616                +786              57/s
MC_PERQ_BYTE(19).cpu0   :         9,152,981,497             +80,180           4,494/s
MC_PERQ_BYTE(28).cpu0   :        42,519,089,155         +23,662,826       3,771,190/s
MC_PERQ_BYTE(33).cpu0   :         1,768,896,420             +25,026             913/s
MC_PERQ_BYTE(34).cpu0   :         1,185,830,552              +3,702             560/s
MC_PERQ_BYTE(43).cpu0   :         1,175,344,202              +4,282             768/s
MCQ_DROP_PKT(28).cpu0   :             1,946,432              +1,522             123/s
MCQ_DROP_BYTE(28).cpu0  :         2,658,538,788          +1,832,544         323,657/s
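
To compare such counter dumps between a quiet and a flooded state, the output above can be parsed with a short script. This is only a sketch: the line layout is inferred from the dump above, the names `parse_counters` and `drop_ratio` are made up for illustration, and treating drop delta divided by per-queue delta as a drop indicator is an assumption about what these counters mean, not documented behavior.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the dump above:
#   NAME(queue).cpu0 : total  +delta  [rate/s]
LINE_RE = re.compile(
    r"^([A-Z_]+)(?:\((\d+)\))?\.cpu0\s*:\s*"
    r"([\d,]+)\s+\+([\d,]+)(?:\s+([\d,]+)/s)?\s*$"
)


class Sample(NamedTuple):
    total: int            # running total
    delta: int            # change since the previous "show c cpu"
    rate: Optional[int]   # per-second rate, if the line shows one


def _to_int(text: str) -> int:
    return int(text.replace(",", ""))


def parse_counters(text: str) -> dict:
    """Map (counter name, queue or None) to a Sample."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip prompts, headers, and blank lines
        name, queue, total, delta, rate = match.groups()
        key = (name, int(queue) if queue else None)
        counters[key] = Sample(
            _to_int(total), _to_int(delta), _to_int(rate) if rate else None
        )
    return counters


def drop_ratio(counters: dict, queue: int) -> float:
    """Drop delta relative to per-queue packet delta (rough indicator only)."""
    drops = counters.get(("MCQ_DROP_PKT", queue))
    seen = counters.get(("MC_PERQ_PKT", queue))
    if drops is None or seen is None or seen.delta == 0:
        return 0.0  # no drop line for this queue, or nothing to compare against
    return drops.delta / seen.delta


if __name__ == "__main__":
    dump = """
MC_PERQ_PKT(19).cpu0    :            43,736,339                +427               1/s
MC_PERQ_PKT(28).cpu0    :            52,214,959             +17,153              87/s
MCQ_DROP_PKT(28).cpu0   :             1,946,432              +1,522             123/s
"""
    parsed = parse_counters(dump)
    for q in (19, 28):
        print(f"queue {q}: drop ratio {drop_ratio(parsed, q):.3f}")
```

Running it against a dump taken without the flood should report a ratio of 0 for every queue, since the MCQ_DROP_* lines are absent in that case.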

So the drops are clearly happening. Since I can't find anything else in my config that could cause this, I'm probably going to contact our partner to see what they have to say about it.

Best regards,
-Pavel

↧

Re: DHCP Relay on EX4600 not working when DHCP server is not routed in EX4600 device

January 20, 2020, 4:59 pm

We had the exact same issue. We were running Junos 17.3R3-S3.3 on our stack of EX4600s and DHCP was working fine. We run a Windows-based DHCP server and use the EX4600 to relay DHCP packets. We upgraded to Junos 18.1R3-S6.1 and DHCP stopped working. JTAC pointed me to this PR, which describes the problem:

https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1396470

The issue was that Junos 18.1R3-S6.1 was the JTAC-recommended version at the time we did the upgrade (Nov. 9, 2019), even though it had this DHCP bug. We rolled back to Junos 17.3R3-S3.3 and everything was working again. On December 15, 2019, we decided to try the upgrade again and went to Junos 18.4R2-S2, which is still one of two JTAC-recommended versions for the EX4600. We once again had a DHCP issue and rolled back to 17.3R3-S3.3.

We thought about trying Junos 18.1R3-S8, which is the other recommended version, but are concerned we will have this problem again. Did you solve this issue? What version of Junos are you running?

↧

Re: DHCP Relay on EX4600 not working when DHCP server is not routed in EX4600 device

January 20, 2020, 5:52 pm

Here is how we have DHCP relay set up. As I stated in my previous post, this works fine with Junos 17.3R3-S3.3 but has not worked after upgrading to other JTAC-recommended versions:

    dhcp-relay {
        server-group {
            DHCP-Servers {
                10.121.125.100;
                10.121.125.101;
            }
        }
        active-server-group DHCP-Servers;
        group DHCP-Servers {
            interface irb.8;
            interface irb.28;
            interface irb.32;
            interface irb.36;
            interface irb.40;
            interface irb.44;
            interface irb.48;
            interface irb.52;
            interface irb.56;
            interface irb.57;
            interface irb.58;
            interface irb.59;
            interface irb.60;
            interface irb.61;
            interface irb.144;
            interface irb.148;
            interface irb.152;
            interface irb.156;
            interface irb.160;
            interface irb.164;
            interface irb.200;
            interface irb.208;
            interface irb.216;
            interface irb.224;
            interface irb.232;
            interface irb.240;
            interface irb.242;
        }
    }

↧

EX2200-C won't boot.

January 20, 2020, 6:10 pm

I have a 2200-C that will not boot. It appears to start activating the factory configuration and then nothing happens. Unfortunately, all I have is a picture of the error message. Sorry, it's not searchable. Any help here would be appreciated.

↧

Re: EX2200-C won't boot.

January 20, 2020, 8:10 pm

Hi,

Have you tried booting from USB?

Thanks

John

↧

Re: EX4300 crash while attempting to bring up 10G interface

January 20, 2020, 9:52 pm

Thanks for your recommendation. Will this be a step-wise upgrade or can I jump directly from 13.x to 18.x?

↧

Re: EX4300 crash while attempting to bring up 10G interface

January 21, 2020, 2:53 am

There are two possibilities:

1.) Step-wise upgrade. You can find the policy here:

https://www.juniper.net/documentation/en_US/junos/information-products/topic-collections/release-notes/18.4/topic-142592.html#jd0e4894

That means, in your case, if you want to upgrade to 18.4:

13.2X51 --> 15.1 --> 17.1 --> 17.3 --> 18.1 --> 18.3 --> 18.4

2.) If you have console access to the device, you can perform a USB recovery installation directly to the new image without the above steps. You can find the description here:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB11752

https://kb.juniper.net/InfoCenter/index?page=content&id=KB20643

https://kb.juniper.net/InfoCenter/index?page=content&id=KB10386
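
For planning option 1, the step-wise path quoted above can be turned into an explicit list of hops. This is a minimal sketch: the release list is copied from this post, `upgrade_hops` is a hypothetical helper name, and starting partway along the path assumes the same sequence still applies from that point, which should be checked against the release notes.

```python
# Path exactly as listed above for going from 13.2X51 to 18.4.
PATH_TO_18_4 = ["13.2X51", "15.1", "17.1", "17.3", "18.1", "18.3", "18.4"]


def upgrade_hops(path, current, target):
    """Return the (from, to) install steps between two releases on the path."""
    start = path.index(current)  # raises ValueError if the release is not listed
    end = path.index(target)
    if start > end:
        raise ValueError("target is older than the current release")
    return list(zip(path[start:end], path[start + 1:end + 1]))


if __name__ == "__main__":
    for step, (src, dst) in enumerate(
        upgrade_hops(PATH_TO_18_4, "13.2X51", "18.4"), start=1
    ):
        print(f"step {step}: {src} -> {dst}")
```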

↧

Re: EX2200-C won't boot.

January 21, 2020, 3:17 am

Hello,

Try reinstalling the device from an attached external USB device or via TFTP:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB20643

https://kb.juniper.net/InfoCenter/index?page=content&id=KB11752

↧

Re: DHCP Relay on EX4600 not working when DHCP server is not routed in EX4600 device

January 21, 2020, 6:39 am

To those of you experiencing this situation, could you please respond with case numbers, not PR numbers?

Thanks

↧

Re: EX2200-C won't boot.

January 21, 2020, 6:45 am

Just FYI, but depending on the age of this switch, you may be able to get a replacement through a free warranty RMA.

Again, just FYI.

↧

Re: EX4300 crash while attempting to bring up 10G interface

January 21, 2020, 6:51 am

Just FYI, but despite the documented multi-step upgrade requirement, you can in fact upgrade directly from any release to any other release on EX Access ELS switches. The multi-step policy is there to "cover" Juniper should something go wrong, which could also happen when going step by step; anything can happen. I do know of successful upgrades from releases more than two behind to new releases, without issue.

Again, just FYI; this may save you some extra cycles if you decide to go the direct route. TAC is very likely to say you must perform the multi-step upgrade.

Good luck.

↧

Re: EX2200-C won't boot.

January 21, 2020, 7:53 am

I hadn't when I posted but I have now and the switch is restored. Thanks for the suggestion.

↧

Re: DHCP Relay on EX4600 not working when DHCP server is not routed in EX4600 device

January 21, 2020, 8:20 am

When we upgraded from Junos 17.3R3-S3.3 to Junos 18.1R3-S6.1 and DHCP stopped working, I opened case number 2019-1109-0121.

When we upgraded from Junos 17.3R3-S3.3 to Junos 18.4R2-S2 and DHCP still had issues, I opened case number 2019-1215-0006.

Is the thread linked below applicable? Perhaps we have DHCP relay configured incorrectly. I posted our DHCP config earlier.

https://forums.juniper.net/t5/Ethernet-Switching/DHCP-Relay-EX4600-with-Routing-Instances/td-p/319919

↧

Re: DHCP Relay on EX4600 not working when DHCP server is not routed in EX4600 device

January 21, 2020, 3:15 pm

@HendersonD, my research also found only PR1396470, which should have been fixed in 18.4R2. That would include the 18.4R2-S2 you tried, unless S1 or S2 reintroduced the issue. The same fix is in 18.1R3-S8 (not S6), which may be worth trying. If that release works, then 18.4R2-S1 or S2 likely introduced the new issues you saw.

Best I can come up with. If possible, I would try 18.1R3-S8 and regardless of outcome, open a new case with TAC to report your findings.
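
The "is my release at or past the fixed release" check can be sketched in a few lines. This is a simplified illustration, not Juniper's official version-ordering logic: it only handles R-type release names of the form seen in this thread (such as 18.1R3-S6.1 or 18.4R2), it compares only within the same major.minor train, and `parse_release` and `includes_fix` are made-up names.

```python
import re

# Matches names like 17.3R3-S3.3, 18.1R3-S8, 18.4R2, 19.4R1.10
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+))?(?:\.(\d+))?$")


def parse_release(name):
    """Return (major, minor, R number, S number, spin); missing parts become 0."""
    match = RELEASE_RE.match(name)
    if not match:
        raise ValueError(f"unsupported release name: {name}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def includes_fix(release, fixed_in):
    """True/False if a fixed release exists in the same train, else None."""
    rel = parse_release(release)
    for fixed in fixed_in:
        fix = parse_release(fixed)
        if fix[:2] == rel[:2]:      # same major.minor train
            return rel >= fix
    return None                     # no statement possible from this list


if __name__ == "__main__":
    fixed_in = ["18.4R2", "18.1R3-S8"]  # per the post above
    for release in ["18.1R3-S6.1", "18.4R2-S2", "18.1R3-S8", "17.3R3-S3.3"]:
        print(release, includes_fix(release, fixed_in))
```

A result of `None` only means the list says nothing about that train; it does not mean the release is unaffected.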

Good luck

↧

Re: layer2-protocol-tunneling lacp Decapsulation does not work on EX4500

January 21, 2020, 8:54 pm

Firstly, the ethertype of the incoming packet is 0x8100, and I cannot change it. Furthermore, I can't mix different ethertypes on one trunk interface. Secondly, the traffic arrives on a trunk and there are several such VLANs, so the native-vlan-id is of no help to me.

↧

EX3400 crashes after 4 days up

January 22, 2020, 12:24 am

Hi,

I have 4x Juniper EX3400 in a virtual chassis:

1x EX3400 with POE
3x EX3400 without POE

Since mid-December 2019 I have had to reboot the chassis after 4 days of uptime: SSH and the web interface are no longer reachable. The chassis still answers SNMP requests.

The following is logged once the 4 days are reached:

Jan 21 22:37:17  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:37:32  switch last message repeated 3 times
Jan 21 22:37:37  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:37:37  switch jlaunchd: snmp (PID 17185) terminated by signal number 15!
Jan 21 22:37:37  switch jlaunchd: Registered PID 50568(snmp): exec_command
Jan 21 22:37:37  switch jlaunchd: snmp (PID 50568) started
Jan 21 22:37:37  switch jlaunchd: Registered PID 50568(snmp): new process
Jan 21 22:37:38  switch snmpd[50568]: SNMPD_TRAP_WARM_START: trap_generate_warm: SNMP trap: warm start
Jan 21 22:37:42  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:37:47  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:37:52  switch inetd[14764]: accept (for ssh): Software caused connection abort
Jan 21 22:37:52  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:37:57  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:37:57  switch kernel: rt_pfe_veto: Possible slowest client is mcsnoopd. States processed - 545192. States to be processed - 1
Jan 21 22:37:57  switch kernel: rt_pfe_veto: Possible second slowest client is l2ald. States processed - 545192. States to be processed - 1
Jan 21 22:38:02  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:38:32  switch last message repeated 6 times
Jan 21 22:38:37  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 2, veto simulation: 0
Jan 21 22:38:42  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 72, veto simulation: 0
Jan 21 22:38:47  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:38:52  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:38:52  switch kernel: rt_pfe_veto: Possible slowest client is mcsnoopd. States processed - 545257. States to be processed - 1
Jan 21 22:38:52  switch kernel: rt_pfe_veto: Possible second slowest client is l2ald. States processed - 545257. States to be processed - 1
Jan 21 22:38:57  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:38:57  switch kernel: rt_pfe_veto: Possible slowest client is mcsnoopd. States processed - 545266. States to be processed - 1
Jan 21 22:38:57  switch kernel: rt_pfe_veto: Possible second slowest client is l2ald. States processed - 545266. States to be processed - 1
Jan 21 22:39:02  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:39:12  switch last message repeated 2 times
Jan 21 22:39:13  switch sshd[50575]: sshd re-exec requires execution with an absolute path
Jan 21 22:39:17  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:39:22  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0

### tried to start sshd manually ###
Jan 21 22:39:26  switch sshd[50579]: error: Bind to port 22 on :: failed: Address already in use.
Jan 21 22:39:26  switch sshd[50579]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Jan 21 22:39:26  switch sshd[50579]: fatal: Cannot bind any address.
#############################

Jan 21 22:39:27  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:39:32  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:39:47  switch last message repeated 3 times
Jan 21 22:39:49  switch inetd[14764]: accept (for ssh): Software caused connection abort
Jan 21 22:39:52  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:40:02  switch last message repeated 2 times
Jan 21 22:40:12  switch last message repeated 2 times
Jan 21 22:40:17  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 72, veto simulation: 0
Jan 21 22:40:22  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:40:32  switch last message repeated 2 times
Jan 21 22:41:52  switch last message repeated 16 times
Jan 21 22:41:57  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 72, veto simulation: 0
Jan 21 22:42:02  switch kernel: rt_pfe_veto: Memory over consumed. Op 1 err 12, rtsm_id 0:-1, msg type 90, veto simulation: 0
Jan 21 22:42:32  switch last message repeated 6 times

#####rebooted chassis (all-members)#######

Jan 21 22:42:32  switch mgd[50581]: UI_REBOOT_EVENT: System rebooted by 'root'
Jan 21 22:42:36  switch shutdown: reboot requested by root at Tue Jan 21 22:43:36 2020
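
To see how the veto messages develop over time, a log like the one above can be summarized with a small script. This is a sketch under assumptions: the patterns are derived only from the lines shown here, "last message repeated N times" is attributed to the line directly before it, and `summarize` is a made-up helper name.

```python
import re
from collections import Counter

VETO_RE = re.compile(r"rt_pfe_veto: Memory over consumed\..*msg type (\d+)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")
SLOW_RE = re.compile(r"rt_pfe_veto: Possible (?:second )?slowest client is (\w+)\.")


def summarize(lines):
    """Count veto messages per msg type and mentions of slow clients."""
    veto_by_type = Counter()
    slow_clients = Counter()
    last_type = None  # msg type of the preceding line, if it was a veto line
    for line in lines:
        match = VETO_RE.search(line)
        if match:
            last_type = match.group(1)
            veto_by_type[last_type] += 1
            continue
        match = REPEAT_RE.search(line)
        if match:
            # Repeats refer to the preceding message; count them only
            # when that message was a veto line.
            if last_type is not None:
                veto_by_type[last_type] += int(match.group(1))
            continue
        match = SLOW_RE.search(line)
        if match:
            slow_clients[match.group(1)] += 1
        last_type = None  # any other line breaks the repeat chain
    return veto_by_type, slow_clients


if __name__ == "__main__":
    import sys

    # Usage: python veto_summary.py < messages
    vetoes, clients = summarize(sys.stdin)
    print("veto messages by msg type:", dict(vetoes))
    print("slow client mentions:", dict(clients))
```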

The chassis was freshly installed from a USB stick on January 17, 2020, but the problem still exists!

I hope somebody can help me.

If you need any other diagnostic logs or statistics, please let me know.

Thank you in advance!

↧

Re: EX3400 crashes after 4 days up

January 22, 2020, 12:46 am

Hello,

The syslog messages match a really old known bug, PR864551:

https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR864551&smlogin=true

It is fixed in the following Junos releases: 12.3R3, 12.3X50-D30, 13.1R2, 13.2R1, 13.2X50-D16, 12.3R2, 13.1R1, 13.2X50-D17, 13.1X50-D10, 15.1R1, 13.2X51-D10.

What Junos version are you running?

Thanks

Alex

↧

Re: EX3400 crashes after 4 days up

January 22, 2020, 1:04 am

Hi aarseniev,

Thank you for your reply!

I forgot to mention that. The switches are running 19.4R1.10.

This version was freshly installed from a USB stick.

Thank you!

↧

Re: EX3400 crashes after 4 days up

January 22, 2020, 1:21 am

Hello,

OK, we can exclude that particular memory leak then.

Now, a follow-up question: how many routes and how many MAC addresses are you pushing to this box?

The EX3400 cannot possibly hold a full table; see the datasheet: https://www.juniper.net/assets/us/en/local/pdf/datasheets/1000581-en.pdf

Pages 6-7:

Layer 2 Features
• Maximum MAC addresses per system: 32,000

Layer 3 Features: IPv4
• Maximum number of ARP entries: 16,000
• Maximum number of IPv4 unicast routes in hardware: 14,000 prefixes; 36,000 host routes

Layer 3 Features: IPv6
• Maximum number of Neighbor Discovery entries: 8,000
• Maximum number of IPv6 unicast routes in hardware: 3,500 prefixes; 18,000 host routes
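
A quick way to compare what the switch is actually carrying against these figures is sketched below. The limits are the datasheet numbers quoted above; the dictionary keys, the `over_limit` helper, and the observed counts are hypothetical placeholders that you would fill in from your own measurements.

```python
# Limits as quoted from the datasheet above.
EX3400_LIMITS = {
    "mac_addresses": 32_000,
    "arp_entries": 16_000,
    "ipv4_prefixes": 14_000,
    "ipv4_host_routes": 36_000,
    "ipv6_nd_entries": 8_000,
    "ipv6_prefixes": 3_500,
    "ipv6_host_routes": 18_000,
}


def over_limit(observed, limits=EX3400_LIMITS, threshold=1.0):
    """Return {table: (observed, limit)} for tables above threshold * limit."""
    return {
        table: (count, limits[table])
        for table, count in observed.items()
        if table in limits and count > limits[table] * threshold
    }


if __name__ == "__main__":
    # Placeholder numbers, not measurements from the switch in this thread.
    observed = {"mac_addresses": 2_500, "ipv4_prefixes": 15_200, "ipv6_prefixes": 900}
    for table, (count, limit) in over_limit(observed).items():
        print(f"{table}: {count} exceeds datasheet limit {limit}")
```

Passing a `threshold` such as 0.8 flags tables that are getting close to the limit rather than only those already past it.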

HTH

Thx

Alex

↧

Re: EX3400 crashes after 4 days up

January 22, 2020, 1:31 am

Hi,

Can you share the output of "show system virtual-memory no-forwarding"?

↧