Global Control recovery procedure

by Daniel Sentenac — last modified 2009-03-09 17:21

This page describes the support procedure for failures of the Virgo+ global control and DAQ optical network.

The Virgo+ global control and DAQ rely on a completely new hardware/software architecture. In case of trouble, the software on-call may be assisted by the DAQ on-call, since most problems are likely to be caused by propagation delays between the nodes of the optical network. If one of the Global Control flags turns red on the Data Quality Monitor page, the hardware components along the optical path must be checked.
Some tools are available to identify the hardware devices involved, from the senders (ADC7674A, Tolm-PMC) through the MuxDemux network to the receivers (Tolm-PCI):

1) The HWII database contains a link to the server configuration (CFGDB database) for a given machine alias name. This information is useful when a machine has to be rebooted and the processes attached to it restarted. The database also contains the physical optical cabling between the optical devices. Below is an example of the process list for the rtpc06 machine in the computing room, as obtained from the HWII Web page:

Server list example

2) hwiiGetMxDxRoute.exe, in the /virgoDev/hwiiAPI/v0r1 package, retrieves in a single query the serial addresses of the device nodes along a physical optical path.

usage : hwiiGetMxDxRoute.exe TolmType1# TolmSerial1# TolmType2# TolmSerial2#

with TolmTypes :
            - TOLM PCI : 18
            - TOLM PMC : 19
            - ADC7674A : 14
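
The type codes above can be kept in a small lookup table so that the query arguments are assembled by device name rather than by number. The following is only a minimal Python sketch: the helper name build_route_query and the dictionary keys are hypothetical, while the executable name, the argument order and the codes are taken from the usage line and the list above.

```python
# Type codes as listed above (TOLM PCI, TOLM PMC, ADC7674A).
TOLM_TYPE_CODES = {
    "TOLM-PCI": 18,
    "TOLM-PMC": 19,
    "ADC7674A": 14,
}


def build_route_query(type1, serial1, type2, serial2,
                      exe="hwiiGetMxDxRoute.exe"):
    """Return the argument list for a route query (hypothetical helper).

    Argument order follows the usage line:
    TolmType1 TolmSerial1 TolmType2 TolmSerial2
    """
    return [
        exe,
        str(TOLM_TYPE_CODES[type1]), str(serial1),
        str(TOLM_TYPE_CODES[type2]), str(serial2),
    ]


# Tolm-PCI with serial 30 to ADC7674A with serial 10.
cmd = build_route_query("TOLM-PCI", 30, "ADC7674A", 10)
print(" ".join(cmd))
```

The returned list could be passed to subprocess.run on a machine where the executable is installed; an unknown device name raises KeyError instead of producing a wrong query.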

example :

The route query between the receiver used by GcLocking in rtpc06 (TOLM PCI, type 18, serial 30), located in the computing room, and the sender ADC7674A (type 14, serial 10) in the NE building is formulated as:

hwiiGetMxDxRoute.exe 18 30 14 10

which gives the result:

OUTPUT ADDRESS ROUTE : TOLM-PCI 30:OUT 0x0 -->  MxDx 08:OUT 0x3 -->  MxDx 11:OUT 0x0 -->  ADC7674A 10:IN 0x0

From this result, a search in the HWII interface shows that MxDx 08 is in DAQ room Raq25 and MxDx 11 is in NE, as expected:

The cabling indicated by the red and blue color goes to the ADC7674A #10 on the same page.
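
When the route has to be checked hop by hop, the output line shown above can be split into its individual devices. The sketch below is a Python illustration that assumes the output always follows the format of the single example given here ("device serial:direction address" separated by "-->"); the function name parse_route is hypothetical and is not part of hwiiAPI.

```python
import re

# One hop looks like "MxDx 08:OUT 0x3" in the example output.
# Assumption: every hop follows this pattern.
HOP_RE = re.compile(
    r"^(?P<device>\S+)\s+(?P<serial>\d+):(?P<direction>IN|OUT)\s+"
    r"(?P<address>0x[0-9A-Fa-f]+)$"
)


def parse_route(line):
    """Split an 'OUTPUT ADDRESS ROUTE' line into a list of hop dictionaries."""
    _, sep, route = line.partition("ROUTE :")
    if not sep:
        raise ValueError("not a route line: %r" % line)
    hops = []
    for chunk in route.split("-->"):
        match = HOP_RE.match(chunk.strip())
        if match is None:
            raise ValueError("unrecognised hop: %r" % chunk)
        hops.append({
            "device": match.group("device"),
            "serial": match.group("serial"),      # kept as text, e.g. "08"
            "direction": match.group("direction"),
            "address": int(match.group("address"), 16),
        })
    return hops


example = ("OUTPUT ADDRESS ROUTE : TOLM-PCI 30:OUT 0x0 -->  MxDx 08:OUT 0x3 "
           "-->  MxDx 11:OUT 0x0 -->  ADC7674A 10:IN 0x0")

for hop in parse_route(example):
    print(hop["device"], hop["serial"], hop["direction"], hex(hop["address"]))
```

Each MxDx entry in the resulting list is a device whose location can then be looked up in HWII, as done above for MxDx 08 and MxDx 11.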

Using HWII and hwiiAPI, one can thus identify the physical links between a sender and a receiver. With this information, the expert can take the following actions:

1) In case of an abnormal propagation delay at the level of a MuxDemux, that MuxDemux must be reset.

2) In case of RIO (Pr/Qr) or RTPC (GcLocking/GcAlignment/FbLocking/FbAlignment/FbMoniXX servers) trouble, check the LED status on the devices. The LED must be green and blinking when packet transmission/reception is OK. If it is not, the problem may be due to a software failure at the producer level (Pr, Q, TolmProcessor), which may be solved by rebooting the associated machine.

3) In case of ADC7674A trouble, try reconfiguring the ADC7674A by launching:

/virgoDev/Tolm/TolmBase/v0r35/Linux-i686-SL4/TolmAdcConfig.exe /virgoData/Adc7674/<CONFIGURATION>.cfg

If this does not restore the situation, follow steps 1) and 4).

4) If the software/MuxDemux reset does not solve the problem, try replacing the transceivers and swapping the cables. Before doing so, the processes in the chain must be shut down so that corrupted packets are not misinterpreted in the DAQ chain.

A link to the PowerPoint presentation can be found here.
