Vacuum
Links
main Vacuum page at LALDocumentation
Info on OS9 crates
ToTu Servers Manual
Using Supervisor
Procedure for a clean VTT/VTTUI configuration from scratch
Start from "standard conditions": everything OFF
1- start the os9boot crate
2- start the stations (need the os9crate) one by one (due to os9boot server capabilities)
(or remove all Ethernet cables and reconnect them one by one, which has the same effect)
3- start the Supervisor server via the icwm shortcut on ctrl (this will rsh to olserver34 and run VTT in a DEDICATED window).
4- start the Supervisor client via the icwm shortcut on ctrl (this will rsh to olserver34 and run VTTUI in a DEDICATED window).
5- start the Tu and To VSU (start process). REMINDER : There is no VSU for the Valves.
6- check that the status of VTTPartition/ VTTAll is WAITING
7- then click on IDLE, then REQUEST_CONFIGURE: this action automatically configures every To, Tu and Va server (same result as if you open every client one by one and configure them)
8- open the User Menu FROM the VTTUI MAIN window (if not, the data is not collected by VTTVSU) and check the status of the servers.
9- check the data sent by the vacuum with Data Display and the Detector Monitoring page, which summarizes the vacuum status efficiently.
The correct sending of data from VaOS9VaLink can be checked, as the supervis user, via: bin/checkSms VaOS9VaLink
Note: In case of a problem with the Va application, always check that in the software installation area the permissions for
the directory ../Vacuum/Va/<version>/binos9 are set to: drwxrwxrwx
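The permission check in the note above can be scripted. The following is a minimal Python sketch and not part of the Vacuum software: the function name `has_open_permissions` is hypothetical, and the directory path must be supplied by the caller because the version component depends on the installation.

```python
import os
import stat


def has_open_permissions(path):
    """Return True if `path` is a directory whose mode is rwxrwxrwx."""
    st = os.stat(path)
    # S_IMODE drops the file-type bits; the 0o777 mask keeps only the
    # nine user/group/other permission bits.
    mode = stat.S_IMODE(st.st_mode) & 0o777
    return stat.S_ISDIR(st.st_mode) and mode == 0o777
```

Call it with the full path to the binos9 directory; a `False` result means the permissions differ from those described in the note.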
Troubleshooting
In these examples the ToBs channel name will be used. The full list of channels related to the Vacuum software is:
Towers
ToIb - Injection Bench
ToDb - Detection Bench
ToBs - Beam Splitter
ToSr - Signal Recycling
ToPr - Power Recycling
ToIn - North Input
ToIw - West Input
ToEn - North End
ToEw - West End
Configuring an individual tower
- In the VIRGO Partitions window of the Vacuum Supervisor, click on the relevant partition, e.g. Tov/General/TovGeneral.
- Once selected, choose Tov/Process/Tour from the Clients list on the toolbar. This opens the Tower client.
- On the Tour client window, click on the grey square of the relevant station to be configured in the Stations window, to the bottom-left of the overall client window.
- Once the grey square becomes green, click on the smaller blue square, which is to be found within the grey square.
- Enter the password when requested.
- From the Dialogues list in the toolbar, click on Db server configuration. This begins the configuration of the server and counts down the controllers that are being activated.
- When this activation process has finished, click on the grey square of one of the other towers in order that the password-protected connection to the previous tower is lost.
- If the green flags are not automatically restored to Detector Monitoring, run the Vacuum.sh script and then re-load the configuration of the FbsSt DAQ server.
General Tests
Checking Server Running and Configuration
- Is the ToBs server started AND in the cm list?
ctrl7[~]: cm names | grep OS9
If ToBs is not in the list, just wait: a Vacuum server has only two states, started or rebooting. Thanks to the hardware watchdog, it cannot be unreachable except during the boot delay, so every server should reappear after a while when the cm names command is re-typed.
- Is To** properly configured?
Go to Client "Tour" accessed from VTTUI
If not, configure it! This is needed AFTER a reboot (automatic reconfiguration is not yet implemented).
- Is the ToBs server attached to the TovPartitionGeneral supervisor (which is the standard situation for all To servers)?
ctrl7[~]: cm print ToOS9ToBs
If this command returns the following:
ToOS9ToBs
Message type CmPrint received from ToOS9ToBs
-text CmName ToOS9ToBs, Host ToBs.virgo.infn.it, port 29001, socket_id 4,
owner NULL
-text CmName ToOS9ToBs, Host ToBs, port 0, socket_id 0, owner NULL
-text CmName CmNameServer_1, Host olserver.virgo.infn.it, port 29000,
socket_id 5, owner NULL
-text CmName NULL, Host NULL, port 0, socket_id 6, owner NULL
-text CmName NULL, Host NULL, port 0, socket_id 30, owner NULL
-text CmName cm_1, Host ctrl7.virgo.infn.it, port 29009, socket_id 29,
owner jehanno
Then the situation is NOT normal and needs to be fixed.
A comparison test with another server can be run:
ctrl7[~]: cm print ToOS9ToIn
ToOS9ToIn
Message type CmPrint received from ToOS9ToIn
-text CmName ToOS9ToIn, Host ToIn.virgo.infn.it, port 29001, socket_id 4,
owner NULL
-text CmName ToOS9ToIn, Host ToIn, port 0, socket_id 0, owner NULL
-text CmName CmNameServer_1, Host olserver.virgo.infn.it, port 29000,
socket_id 5, owner NULL
-text CmName FbsSt, Host olserver10.virgo.infn.it, port 29004, socket_id
6, owner virgorun
-text CmName Su_Tov_Partition_TovGeneral, Host ctrl7.virgo.infn.it, port
29002, socket_id 7, owner supervis
-text CmName cm_1, Host ctrl7.virgo.infn.it, port 29009, socket_id 8,
owner jehanno
Which indicates that ToIn is supervised by Su_Tov_Partition_TovGeneral.
This is the RIGHT situation for a server.
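The comparison between the two listings above can be automated. The sketch below is only an illustration based on the output layout shown on this page: the function names `parse_cm_print` and `is_supervised` are hypothetical, and the regular expression assumes every entry follows the "CmName ..., Host ..., port ..., socket_id ..., owner ..." pattern seen in the listings.

```python
import re

# One connection entry, as it appears in the `cm print` listings on this page.
ENTRY = re.compile(
    r"CmName\s+(\S+),\s+Host\s+(\S+),\s+port\s+(\d+),"
    r"\s+socket_id\s+(\d+),\s+owner\s+(\S+)"
)


def parse_cm_print(output):
    """Return a list of dicts, one per 'CmName ...' entry in the output."""
    # Entries may wrap across lines, so collapse all whitespace first.
    flat = " ".join(output.split())
    keys = ("name", "host", "port", "socket_id", "owner")
    return [dict(zip(keys, m.groups())) for m in ENTRY.finditer(flat)]


def is_supervised(output, supervisor="Su_Tov_Partition_TovGeneral"):
    """True if the listing contains a connection to the given supervisor."""
    return any(e["name"] == supervisor for e in parse_cm_print(output))
```

Feeding the ToOS9ToIn listing to `is_supervised` should give `True`, while the first ToOS9ToBs listing, which has no Su_Tov_Partition_TovGeneral entry, should give `False`.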
How to fix a server that is not attached to the Supervisor
i. Launch the VTT client (VTTUI) with the LAL supervis account.
ii. Request mastership of the To partition, then request "Stop process".
iii. Select "Start Partition" (To). After a while, the circle becomes yellow and then green.
iv. When this process is complete, type the cm print command again:
ctrl7[~]: cm print ToOS9ToBs
Which should now return the following information:
ToOS9ToBs
Message type CmPrint received from ToOS9ToBs
-text CmName ToOS9ToBs, Host ToBs.virgo.infn.it, port 29001, socket_id 4,
owner NULL
-text CmName ToOS9ToBs, Host ToBs, port 0, socket_id 0, owner NULL
-text CmName CmNameServer_1, Host olserver.virgo.infn.it, port 29000,
socket_id 5, owner NULL
-text CmName NULL, Host NULL, port 0, socket_id 30, owner NULL
-text CmName Su_Tov_Partition_TovGeneral, Host ctrl7.virgo.infn.it, port
29002, socket_id 6, owner supervis
-text CmName cm_1, Host ctrl7.virgo.infn.it, port 29009, socket_id 29,
owner jehanno
Now, you should have the correct server status for the ToBs server:
a- it is started
b- it is configured
c- it is supervised by Su_Tov_Partition_TovGeneral
AT THIS POINT we have a proper standard ToBs server configuration, which
is the FIRST thing to verify.
Problem with data being sent
A regular problem encountered with the Vacuum software is that SMS data ceases to arrive for an individual tower. This manifests itself in red flags on the Detector Monitoring System, missing data in DataDisplay and missing channels on the DAQ Data Collect FbsSt server. To solve this problem, take the following steps:
- FIRST of ALL, have a look at the file FbsSt.cfg managed by Alain: vi /virgoData/Fbs/FbsSt.cfg
Which returns:
SMS ToOS9ToBs 5 To_BS ""
SMS ToOS9ToSr 5 To_SR ""
SMS ToOS9ToPr 5 To_PR ""
SMS ToOS9ToMc 5 To_MC "" #to be removed
SMS ToOS9ToEn 5 To_NE ""
SMS ToOS9ToIn 5 To_NI ""
SMS ToOS9ToDb 5 To_OB ""
SMS ToOS9ToEw 5 To_WE ""
SMS ToOS9ToIw 5 To_WI ""
SMS ToOS9ToIb 5 To_IB ""
SMS TuOS9TuN1 5 Tu_N1 "" #to be removed
#SMS TuOS9TuN2 5 Tu_N2 ""
SMS TuOS9TuN3 5 Tu_N3 ""
SMS TuOS9TuN4 5 Tu_N4 "" #to be removed
SMS TuOS9TuN5 5 Tu_N5 "" #to be removed
SMS TuOS9TuN6 5 Tu_N6 "" #to be removed
SMS TuOS9TuW1 5 Tu_W1 "" #to be removed
SMS TuOS9TuW2 5 Tu_W2 ""
SMS TuOS9TuW3 5 Tu_W3 ""
#SMS TuOS9TuW4 5 Tu_W4 "" #to be removed
SMS TuOS9TuW5 5 Tu_W5 "" #to be removed
SMS TuOS9TuW6 5 Tu_W6 "" #to be removed
SMS VaOS9VaLink 10 Va "" 1
This file shows whether any channels have been removed from the Frame Builder SMS list. Here the ToOS9ToBs line is present and uncommented, so if data from ToBs is missing, the cause is not this configuration file.
- Check that ToBs is properly sending data:
ctrl7> cm send -to ToOS9ToBs -type FbGetAllSmsData -t All -i -1 -t -1 -handler FbSmsData
Which should return:
Message type FbSmsData received from ToOS9ToBs
-int 0
-text ToBs 296 075023 ALL 50899 0 G41 3.674000e-01 Gc2 0.000000e+00 G72
1.100000e+01 Gd1 6.655000e-02 Gd3 4.374000e-07 D1-BS 3.906250e-01 D2-BS
0.000000e+00 D3-BS 4.882810e-02 D4-BS 1.464840e-01 D-VWS 1.953120e-01
D-LWBS 4.882810e-02 D-BPBS 0.000000e+00 D-V81BS 2.441410e-01 D1-CBS
1.953120e-01 D2-CBS 9.765620e-02 D-VNS 9.765620e-02 D-VSS 1.953120e-01
D-LNBS 9.765620e-02 D-V31BS 2.441410e-01 D-TBS 1.464840e-01 D-UDBS
1.953120e-01 V31 1 V41 2 V42 2 V43 2 V51 1 V52 2 V53 1 V71 2 V72 0 V74 2
V75 2 V81 1 V82 1 V92 2 V32 1 V91 2 Vc1 1 Vc2 1 Vd1 1 Vd2 1 P41_STATUS 0
P41_DEFAULT 1 P61_STATUS 0 P61_DEFAULT 1 P33_STATUS 0 P33_CURRENT
0.000000e+00 P33_VOLTAGE 0.000000e+00 P81_STATUS 0 P81_CURRENT 0.000000e+00
P81_VOLTAGE 0.000000e+00 P31_STATUS 0 P31_VOLTAGE 0.000000e+00 P31_CURRENT
0.000000e+00 P32_STATUS 0 P32_VOLTAGE 0.000000e+00 P32_CURRENT 0.000000e+00
P51_STATUS 1 P51_SPEED 6.000000e+02 P51_POWER 6.800000e+01 P71_STATUS 2
P71_SPEED 0.000000e+00 P71_POWER 0.000000e+00 EndOfData
This indicates that ToBs is correctly sending data. A check of another station, ToIn for instance, should provide a similar result:
ctrl7> cm send -to ToOS9ToIn -type FbGetAllSmsData -t All -i -1 -t -1 -handler FbSmsData
Should return:
Message type FbSmsData received from ToOS9ToIn
-int 0
-text ToIn 296 055546 ALL 4 0 Gc2 1.000000e-01 G72 8.300000e+00 Gd1
5.392000e-04 D1-NI 2.358400e+01 D2-NI 2.578120e+01 D3-NI 2.304690e+01 D4-NI
2.270510e+01 D5-NI 2.285160e+01 D-LNANI 3.564450e+00 D-BPNI 2.651370e+01
D-V81NI 3.120120e+01 D1-CNI 5.371090e+00 D2-CNI 4.541020e+00 D-VNS
2.680660e+01 D-V31NI 3.710940e+00 D-TNI 4.394530e+00 D-UDNI 4.980470e+00
D-VNANI 4.589840e+00 V31 1 V41 2 V42 0 V43 2 V51 2 V52 2 V53 1 V71 2 V72 2
V74 2 V75 2 V81 1 V82 1 V92 2 V32 1 V91 2 Vc1 1 Vc2 1 Vd1 1 Vd2 1
P41_STATUS 0 P41_DEFAULT 1 P61_STATUS 0 P61_DEFAULT 1 P31_STATUS 0
P31_VOLTAGE 0.000000e+00 P31_CURRENT 0.000000e+00 P32_STATUS 0 P32_VOLTAGE
0.000000e+00 P32_CURRENT 0.000000e+00 P51_STATUS 2 P51_SPEED 0.000000e+00
P51_POWER 0.000000e+00 P71_STATUS 2 P71_SPEED 0.000000e+00 P71_POWER
0.000000e+00 EndOfData
- Re-load the configuration of the FbsSt server on the DAQ Data Collect Client (available on both the Operators' and DAQ workstations). If the channels are still missing, then Stop and Re-start the FbsSt server. This should solve the problem.
- If data is still not arriving, then the likely cause of the problem is faulty equipment. In this case, it is the role of the Vacuum team to solve the problem.
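The two checks described in the steps above (is the server enabled in FbsSt.cfg, and does its FbSmsData reply look complete) can be sketched in Python. This is an illustration only: the function names are hypothetical, and the reply layout is inferred from the listings on this page (six header tokens, then alternating channel names and values, closed by EndOfData). The meaning of the header fields is not documented here, so they are skipped.

```python
def sms_enabled(cfg_text, server):
    """True if an uncommented 'SMS <server> ...' line exists in the text."""
    for line in cfg_text.splitlines():
        # Drop everything after '#', so commented-out lines become empty.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "SMS" and fields[1] == server:
            return True
    return False


def parse_sms_payload(text):
    """Split the '-text' payload of an FbSmsData reply into name/value pairs.

    Assumes six header tokens (station name, two numeric fields, 'ALL',
    two more numeric fields) before the channel list.
    """
    tokens = text.split()
    if tokens and tokens[0] == "-text":
        tokens = tokens[1:]
    if "EndOfData" not in tokens:
        raise ValueError("reply is truncated: no EndOfData marker")
    body = tokens[6:tokens.index("EndOfData")]
    if len(body) % 2:
        raise ValueError("odd number of tokens in channel list")
    return {name: float(value) for name, value in zip(body[0::2], body[1::2])}
```

A reply that raises `ValueError`, or a dictionary that lacks an expected channel, points to a sending problem on the station side rather than on the FbsSt side.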
Mastership is blocking, making it impossible to log-in - 'VSU rejects request'
On occasion, the VSU software provides a 'VSU rejects request' error message. To solve this problem take the following steps:
- Check all of the To stations. Send the following command to each station, where XX is the station name:
cm send -to ToOS9ToXX -type SuCreditShowState -handler SuCreditState
- If the first line of the answer is not Su_Tov_Partition_TovGeneral, but is instead blank, then the station is not mastered.
- At this point it is necessary to telnet to the ToXX and kill the server (this cannot be done from the Supervisor because of the mastership problem). Once logged on to the ToXX, type procs, which lists all processes running at that moment, and then kill the OS9 process (PID 18).
- The OS9 process will automatically reboot within two minutes so it is necessary to wait until the ToXX name reappears in the channel list. Check this with the following command:
ctrl7[~]: cm names | grep OS9
- Once the name reappears in the list, restart ToXX (Stop/Start process).
- The problem should now be solved.
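The per-station check and the wait for the station to reappear can be sketched as follows. This is a hypothetical helper, not an existing tool: it only builds the command lines given above (it does not run them), and `wait_for_name` assumes that the output of `cm names` is a whitespace-separated list of names, which should be verified against the real output. The default timeout of 180 seconds is an assumption based on the two-minute reboot time mentioned above.

```python
import time

# Tower station suffixes, taken from the channel list earlier on this page.
TOWER_STATIONS = ["Ib", "Db", "Bs", "Sr", "Pr", "In", "Iw", "En", "Ew"]


def credit_state_commands(stations=TOWER_STATIONS):
    """Build the SuCreditShowState command line for each To station."""
    return [
        f"cm send -to ToOS9To{s} -type SuCreditShowState -handler SuCreditState"
        for s in stations
    ]


def wait_for_name(get_names, name, timeout=180.0, interval=10.0,
                  sleep=time.sleep):
    """Poll until `name` appears in the text returned by get_names().

    get_names is a caller-supplied function returning the output of
    `cm names` as a string. Returns True on success, False on timeout.
    """
    waited = 0.0
    while True:
        if name in get_names().split():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

In practice `get_names` would wrap a call to the real `cm names` command; it is passed in as a function here so that the polling logic can be tried out without access to the control network.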
OS9 crate not responding
If it is not possible to ping an OS9 crate, there is the possibility that it will need to be manually re-booted. The procedure is the following:
- First of all, check that the problem is really the OS9 crate and not the relevant switch. A list of switch IP addresses is available from Samuele.
- If it is possible to successfully ping the switch, it is necessary to go directly to the OS9 crate and reboot it. To do this it is necessary to take the Vacuum team laptop. Be sure to take a spare extension lead in case the battery is flat. Also make sure that the laptop bag contains a serial lead.
- On arrival at the OS9 crate, first check that the network connection is active: it is necessary to pass beneath the ITF tube to view the switch on the far wall. If the connection is active, the problem is not related to the network and a reboot of the crate is necessary.
- Connect the laptop serial port to the first available serial port on the crate. This is normally called COM1 and will have a sticker above it called VT100 (or something similar). N.B. the crate will be at the bottom of the mini-rack.
- Once the serial connection has been made between the OS9 crate and the laptop, start the laptop.
vtt cannot rsh to re-start process
The Supervisor has been closed 'un-cleanly'. It is necessary to stop the Supervisor and Client and kill any VSU processes that are still running:
ctrl7[~]: ps -ef | grep VSU
supervis 2555 1 0 Jul11 ? 00:00:00 /bin/tcsh -f /virgo/VCS-5.0/VIRGOSW/Vacuum/ToVSU/v1r5/mgr/ToVSU.csh Tov/Partition/TovGeneral
supervis 2799 2555 0 Jul11 ? 00:00:43 /virgoApp/Vacuum/ToVSU/v1r5/Linux-i686-SL4/ToVSU.exe Tov/Partition/TovGeneral
supervis 3505 1 0 Jul11 ? 00:00:00 /bin/tcsh -f /virgo/VCS-5.0/VIRGOSW/Vacuum/TuVSU/v1r4/mgr/TuVSU.csh Tuv/Partition/TuvWest
supervis 3645 3505 0 Jul11 ? 00:00:18 /virgoApp/Vacuum/TuVSU/v1r4/Linux-i686-SL4/TuVSU.exe Tuv/Partition/TuvWest
supervis 13174 13024 0 09:27 pts/1 00:00:00 /virgo/VCS-5.0/VIRGOSW/Vacuum/VTT/v1r4/Linux-i686-SL4/VTTVSU.exe VTT/Partition/VTTAll
supervis 15388 15335 0 09:56 pts/17 00:00:00 grep VSU
ctrl7[~]: kill 2555
ctrl7[~]: kill 2799
ctrl7[~]: kill 3505
ctrl7[~]: kill 3645
ctrl7[~]: kill 13174
Once all of the VSU-related processes have been killed it will be possible to re-start the Supervisor and Client successfully in order to re-start the relevant processes.
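Picking the PIDs out of the `ps -ef | grep VSU` listing by hand is error-prone, so the selection can be sketched in Python. The function name `vsu_pids` is hypothetical; it only reads text in the standard `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and returns the PIDs, leaving the actual `kill` to the operator.

```python
def vsu_pids(ps_output):
    """Extract the PIDs of VSU-related processes from `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its arguments.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header line and malformed lines
        command = fields[7]
        if "VSU" not in command or command.split()[0] == "grep":
            continue  # ignore unrelated processes and the grep itself
        pids.append(int(fields[1]))
    return pids
```

Applied to the listing above, this should select the five supervis processes and leave out the `grep VSU` line.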
Unable to open a client within the Supervisor, while all else works correctly
This is probably connected to an xhost problem. Type:
xhost +
and re-try the client. It should now work.
Tower and Tube servers are stopping and not re-booting successfully
This problem has been seen in the days following a re-boot of the os9boot machine. Over time more and more servers fail when re-starting. The problem occurred following a re-boot of os9boot on Monday the 13th of August and was resolved on Thursday the 17th. The procedure used was the following:
1. Disconnection of the failed servers from the Vacuum switch (193.205.72.20) in the DAQ Room.
2. Re-boot of os9boot - also in the DAQ Room.
3. Re-boot of the RIO of one of the failed towers. The first re-boot is done directly on the crate, with the laptop connected via the serial port to monitor the response. If it completes successfully, the remaining steps can be followed. If not, the problem requires further investigation.
4. The server cables are reconnected to the relevant ports on the Vacuum switch in the DAQ Room.
5. The remaining stopped servers can be re-started from the Vacuum Supervisor on ctrl7.
Useful information when dealing with this problem:
DAQ Switch Configuration (17/08/2007)
ToBs - ToIw - ToIn - ToSr - ToDb - OS9
PC1 - PC2 - - VaIb - ToIb - ToPr
VTTUI Control, Tower and Tube 'Not found' message displayed
The problem occurs when the user is logged in with an account other than the supervisor account. To resolve it, simply log in as the supervisor user.