Archive for February, 2013

usdsop: exec failed during spawn

February 24, 2013

Yesterday someone from our PROD support team called to say that their FTP program was failing with the error below:

/d05/applprod/11.5/xxzz/4.0.0/bin/ZZAAR1240F: No such file or directory
usdsop: exec failed during spawn
/d05/applprod/11.5/xxzz/4.0.0/bin/ZZAAR1240F
Program exited with status 1

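I first checked whether the shell script had the permissions required for the application owner to execute it. It did: 755.

Then I opened the shell script on the server to see if anything in the code was causing the error, and there I saw junk characters at the end of the lines (^M, typed in vi as Ctrl-V followed by Ctrl-M). It looks like the file was edited on Windows and transferred to UNIX. When the first line of a script ends in such a character, the interpreter named on that line usually cannot be found, which would explain a "No such file or directory" error for a file that clearly exists.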

I removed all the junk characters in vi using the command below (type Ctrl-V and then Ctrl-M to enter the ^M), followed by the Enter key.

:%s/(ctrl-v)(ctrl-m)//g

Then I asked the support team to run the program again. It completed successfully and the file was transferred.
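
The same cleanup can be done without opening the file in vi. What follows is only a minimal sketch, not the method used above: the function names has_crlf and strip_crlf are made up for illustration, and it assumes the standard grep, tr and mktemp utilities are available on the server.

```shell
# Hypothetical helper: succeeds if the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Hypothetical helper: delete every carriage return from the file.
# The cleaned text is written back into the existing file, so the
# original permissions (755 in the case above) are left untouched.
strip_crlf() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example use (the path is the one from the error message above):
#   has_crlf /d05/applprod/11.5/xxzz/4.0.0/bin/ZZAAR1240F && \
#       strip_crlf /d05/applprod/11.5/xxzz/4.0.0/bin/ZZAAR1240F
```

Taking a backup copy of the script before running anything like this on PROD is a sensible precaution.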

-Anand

Categories: Oracle Apps

Oracle process ID is null in FND_CONCURRENT_REQUESTS table

February 11, 2013

A couple of days ago, I hit a strange issue in our PROD environment.

An end user called to say that a certain print job (a custom job) had been running for a long time and that no output was coming from the printer.

We are on 11.5.9 with database version 9.2.0.6.

Issue analysis/findings and resolution

  • What had changed on the PROD system
    • There were some code migrations. They did touch some of the objects related to the offending program, but they were very unlikely to cause this strange behaviour.
    • The application tier/concurrent tier server was bounced for a h/w change, although that change was later reverted for an unrelated reason.
    • An application patch was applied. This patch related exclusively to Oracle Workflow.
  • Initial analysis showed that the user had submitted many duplicate requests, so those long-running duplicates were cancelled.
  • Despite the cancellations, many more requests kept coming up and pegging the concurrent queue.
  • When I tried to find out what these long-running requests were doing at the backend, I couldn’t find the Oracle process ID: the ORACLE_PROCESS_ID column in FND_CONCURRENT_REQUESTS was NULL for all of them. This was strange, since this particular request had been completing successfully until roughly 10-12 hours earlier. For the same reason, the possibility of the ‘SRW Exit and Init’ trigger code being missing was ruled out.
  • Many other concurrent jobs (Oracle standard as well as custom) in this particular manager were completing successfully.
  • No errors specific to the custom job in question were seen in the concurrent manager log.
  • No errors were seen in the startup script. All the components had come up gracefully when the application was bounced.
  • Later on I decided to bounce the concurrent manager. The behaviour remained the same: when the concurrent managers came back, a huge number of requests started showing as running. The strange thing was that none of “those” requests had an Oracle process ID associated.
  • Ran the cmclean.sql script, since the behaviour was related to concurrent processing. Logged a Sev1 SR with Oracle, and Oracle also advised running cmclean.sql and then starting the concurrent manager.
  • Despite running cmclean.sql, the problem was not resolved.
  • Then it was decided to bounce the whole application along with the database. But even the database bounce didn’t help, and all “those” requests kept coming up without any associated ‘Oracle process ID’. This made things worse, as I was not able to tell whether anything was happening on the database.
  • Database was perfectly fine –
    • No unusual database waits
    • No other long running jobs
    • No database locks
    • CPU utilization was normal
    • No persistent latches
  • Later on I tried submitting a request for the same concurrent job to see if there was some issue with the schedule. But even my job remained in the queue, behind the many other requests already waiting.
  • Oracle was also not very forthcoming, despite our being on a Sev1 SR.
  • I tried one last thing before jumping into unknown troubleshooting steps.
    • Shut down all the services on the app and web tiers using adstpall
    • Checked for old defunct processes on the app and web tiers. I saw many (around 300-400) dead/defunct processes owned by applmgr. These were print queue processes, and many of them were ora_rw20_run.
    • Killed all the defunct processes owned by applmgr and ensured that no ‘applmgr’ process was left
    • Bounced the database and listener
    • Restarted the app and web services
    • The ‘concurrent job’ in question started completing successfully. The number of pending jobs also started receding.
  • This resolved the issue and PROD started working as expected, but the exact root cause was not known.
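
The check for leftover defunct processes can be scripted. The sketch below is an illustration only: defunct_pids is a made-up helper name, and the ps column names in the usage comment are the Linux procps ones, which is an assumption (the box in this post was AIX, where the ps options may differ). It only lists the PIDs; what to do with them is left to the operator.

```shell
# Hypothetical helper: read lines of "USER PID STAT COMMAND" on stdin and
# print the PID of every zombie (defunct) process owned by the given user.
# A state that starts with "Z" marks a defunct process in ps output.
defunct_pids() {
    awk -v owner="$1" '$1 == owner && $3 ~ /^Z/ { print $2 }'
}

# Example use on a live system (assumes Linux-style ps column names):
#   ps -eo user,pid,stat,comm | defunct_pids applmgr
#   ps -eo user,pid,stat,comm | defunct_pids applmgr | wc -l   # how many
```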

What could have caused it: dead/defunct processes on the application tier as a result of an unclean bounce of the server.

  • We had a server bounce for some h/w maintenance, and I remember the AIX admin had an issue while unmounting the application filesystem. It took significantly more time to bring down this AIX box than other instances.
  • Many print queue processes showed the same timestamp as the server bounce.
  • Even relinking FND, as suggested by Oracle through the SR, didn’t help to address the issue.

This issue haunted us for several days before I happened to find the root cause and solution. Since it was never clear what was causing it, the only solution that had worked every time was to bounce the application services along with the database.

The last time the issue came up, I happened to look at the output of the top command and found a VNC process taking around 6-7% CPU, which did not look normal. I killed the VNC process, and as soon as it was gone, all the requests showing as running (phase code and status code = R) went away. The concurrent manager queue, which had been pegged, started returning to normal.

All the concurrent jobs that had been running long without any database process handle were PostScript reports, and hence needed some sort of DISPLAY (VNC in our case).

I am currently working to figure out whether upgrading VNC would resolve this recurring problem. I will update the post once I find a permanent solution (one that does not involve bouncing the VNC process).

-Anand M

Categories: Oracle Apps

Installation of Oracle R12.1.1 on Linux 64 bit machine

February 10, 2013

Today I completed the Vision instance installation of Oracle R12.1.1 on a Linux machine. I had some hiccups but was finally able to resolve all the errors and get the login page. I will cover all the installation steps in this post, including the issues faced and their resolutions.

Oracle R12 Version – 12.1.1

OS – RHEL5 update 3

Single Node and Multi user Installation.

Metalink Note ID followed – 761566.1

Before starting the installation, please make sure all of the packages/utilities below are installed

  • ar
  • gcc
  • g++
  • ld
  • ksh
  • make
  • X Display server

Installation size of Oracle E-business Suite 12.1.1

  • Fresh install with Vision demo database – 243 GB (Application Tier FS – 35 GB + DB Tier FS – 208 GB)
  • For my environment
    • /d02 –> Application and Database FS  – 350 GB
  • Staging Area
    • /d03 –> Oracle provided media will be copied into the respective directory structure
    • My staging area consists of
      • /d03/R12.1.1_X86
        • startCD
        • oraAS
        • oraDB
        • oraAppsDB
        • oraApps
  • OS User and Group
    • Created a group called ‘dbaerp’ and 2 users: ‘applerp’ (application user) and ‘oraerp’ (Oracle filesystem user)
    • Created a base directory – /d02/ebs/R12VIS
    • Changed ownership and granted write permission to the dbaerp group
    • cd /d02
    • mkdir -m 775 -p ebs/R12VIS
    • chown -R oraerp:dbaerp ebs
  • Kernel parameters (related to Oracle E-Business Suite – R12) – vi /etc/sysctl.conf. After making any changes in this file, use the “sysctl -p” command or restart the system to apply the new settings.
    • kernel.shmmax = 68719476736
    • kernel.shmall = 4294967296
    • kernel.sem =  250 32000 100 128
    • fs.file-max = 6815744
    • net.ipv4.ip_local_port_range = 9000 65500
    • net.core.rmem_default = 262144
    • net.core.wmem_default = 262144
    • net.core.rmem_max = 4194304
    • net.core.wmem_max = 1048576
  • $ vi /etc/security/limits.conf
    • Added the entries below to the file
      • * hard nofile 65536
      • * soft nofile 4096
      • * hard nproc 16384
      • * soft nproc 2047
  • vi /etc/hosts – make sure hosts file is formatted as below
    • [ip_address] [node_name].[domain_name] [node_name]
  • vi /etc/sysconfig/network – please make sure it is formatted as below
    • HOSTNAME=[node_name].[domain_name]
  • /etc/sysconfig/networking/profiles/default/network – make sure this ‘network’ file does not exist. In my case, it was not there.
  • vi /etc/resolv.conf – Add or update the following entries to these minimum settings on each node
    • options attempts:5
    • options timeout:15
  • OS library patch for Oracle HTTP Server – download and apply patch 6078836 from MOS to fix an issue with the Oracle HTTP Server (missing libdb.so)
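
The kernel parameters listed above can be checked with a short script before starting the installer. This is only a sketch: check_min is a made-up helper name, and treating the listed values as minimums (rather than exact required values) is an assumption. For multi-valued settings such as kernel.sem, only the first value is compared.

```shell
# Hypothetical helper: look up a key in a sysctl-style file ("key = value")
# and check that its first numeric value is at least the given minimum.
#   $1 = file, $2 = key, $3 = minimum (assumed interpretation of the list above)
check_min() {
    awk -F'=' -v k="$2" -v min="$3" '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == k { split($2, f, " "); val = f[1]; found = 1 }
        END {
            if (!found) { print k ": not set"; exit 1 }
            if (val + 0 < min + 0) { print k ": " val " is below " min; exit 1 }
            print k ": ok (" val ")"
        }' "$1"
}

# Example use:
#   check_min /etc/sysctl.conf kernel.shmmax 68719476736
#   check_min /etc/sysctl.conf kernel.shmall 4294967296
#   check_min /etc/sysctl.conf fs.file-max   6815744
```

The helper reads the configuration file, not the running kernel, so a value that is present in the file but has not yet been loaded with “sysctl -p” will still be reported as ok.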

RPM required for Oracle E-Business Suite R12.1.1 – As per Metalink Note ID – 761566.1

  • Check Linux Kernel – uname -r
  • [Screenshots of the kernel version output omitted]
  • Hence for my RHEL5 update 3, the following RPMs are needed. In my case, many of them were already installed and the rest had to be installed.
  • Command used to query the RPM –
    • rpm -qa --queryformat "%{NAME}-%{VERSION}.%{RELEASE} (%{ARCH})\n" | grep <RPM_NAME>
    • openmotif21-2.1.30-11.EL5.i386
    • openmotif
    • xorg-x11-libs-compat-6.8.2-1.EL.33.0.1.i386
    • binutils-2.17.50.0.6-9.0.1.x86_64 – For update 3
    • gcc-4.1.2-14.el5.x86_64
    • gcc-c++-4.1.2-14.el5.x86_64
    • glibc-2.5-18.i686 (32-bit)
    • glibc-2.5-18.x86_64
    • glibc-common-2.5-18.x86_64
    • glibc-devel-2.5-18.i386 (32-bit)
    • glibc-devel-2.5-18.x86_64
    • libgcc-4.1.2-14.el5.i386
    • libgcc-4.1.2-14.el5.x86_64
    • libstdc++-devel-4.1.2-14.el5.i386
    • libstdc++-devel-4.1.2-14.el5.x86_64
    • libstdc++-4.1.2-14.el5.i386
    • libstdc++-4.1.2-14.el5.x86_64
    • libXi-1.0.1-3.1.i386
    • libXp-1.0.0-8.1.el5.i386
    • libXp-1.0.0-8.1.el5.x86_64
    • libaio-0.3.106-3.2.i386
    • libaio-0.3.106-3.2.x86_64
    • libgomp-4.1.2-14.el5.x86_64
    • make-3.81-1.1.x86_64
    • gdbm-1.8.0-26.2.1.i386
      • gdbm
    • gdbm-1.8.0-26.2.1.x86_64
    • sysstat-7.0.0-3.el5.x86_64
    • util-linux-2.13-0.45.el5.x86_64
    • compat-libstdc++-296-2.96-138.i386
    • compat-libstdc++-33-3.2.3-61.i386
  • Additionally, a few RPMs are required for the 11gR1 database (which is bundled with the 12.1.1 release) on the database tier.
    • compat-libstdc++-33-3.2.3-61.x86_64
    • elfutils-libelf-devel-0.125-3.el5.x86_64
    • elfutils-libelf-devel-static-0.125-3.el5.x86_64
    • libaio-devel-0.3.106-3.2.x86_64
    • unixODBC-2.2.11-7.1.i386
    • unixODBC-devel-2.2.11-7.1.i386
    • unixODBC-2.2.11-7.1.x86_64
    • unixODBC-devel-2.2.11-7.1.x86_64
    • kernel-headers-2.6.18-8.el5.x86_64
  • I had errors installing some of these, but they were dependency errors, e.g. the error when installing openmotif21-2.1.30-11.EL5.i386
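
Checking such a long list one grep at a time gets tedious, so the comparison can be scripted. The following is a rough sketch: missing_packages is a made-up helper name, and it matches on package name only, ignoring version and architecture, which is a simplification of the list above (several entries are needed in both i386 and x86_64 builds).

```shell
# Hypothetical helper: stdin holds the installed package names, one per line;
# the arguments are the required package names. Each required name that is
# not installed is printed, and the return status is non-zero if any is missing.
missing_packages() {
    installed=$(cat)
    status=0
    for pkg in "$@"; do
        # -F: literal match (names contain "+"), -x: whole line, -q: quiet
        if ! printf '%s\n' "$installed" | grep -Fqx -- "$pkg"; then
            echo "$pkg"
            status=1
        fi
    done
    return $status
}

# Example use:
#   rpm -qa --queryformat '%{NAME}\n' | \
#       missing_packages gcc gcc-c++ libaio libaio-devel compat-libstdc++-33
# To tell 32-bit from 64-bit builds apart, '%{NAME}.%{ARCH}\n' could be used
# instead, with required names such as glibc.i686 and glibc.x86_64.
```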

Installation Process

  • Media Check – MD5 Checksums for R12.1.1 Rapid Install Media [ID 802195.1]
    • cd /d03/R12.1.1_X86 –> Staging area
    • Download md5sum_xxxxx.txt –> Depending on OS
    • md5sum
  • Standard Installation – Login as root
  • Export Display – Start the VNC process (if not done already) and export the display
  • cd to the rapidwiz directory in the staging area – $ cd /d03/R12.1.1_X86/startCD/Disk1/rapidwiz
  • ./rapidwiz
  • [Screenshots of the Rapid Install wizard steps omitted]
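
The media check step above can be wrapped in a small function. This is a sketch under assumptions: verify_stage is a made-up name, the checksum file is assumed to be in the usual md5sum format with paths relative to the staging directory, and the --quiet option assumes GNU md5sum as shipped with RHEL.

```shell
# Hypothetical helper: verify every file listed in a checksum file from
# inside the staging directory. Only mismatches and missing files are
# reported; the return status is non-zero if any file fails.
#   $1 = staging directory, $2 = absolute path of the checksum file
verify_stage() {
    ( cd "$1" && md5sum -c --quiet "$2" )
}

# Example use (md5sum_xxxxx.txt is the OS-specific file from the note above):
#   verify_stage /d03/R12.1.1_X86 /d03/R12.1.1_X86/md5sum_xxxxx.txt
```

Running the check in a subshell keeps the caller's working directory unchanged.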

Issues and resolution

  • The post-installation step threw the following errors
    • HTTP Error
    • Login Page Error
    • JSP Error
    • Virtual Page Error
    • Help page error
  • Troubleshooting step
    • I stopped all the services and tried starting them all again using the adstrtall command, but the adopmnctl command errored out.
      • opmnctl error
    • Upon investigation, I saw some error related to APPS listener in the Autoconfig log file
    • Autoconfig error
    • Researching error over Metalink, I found a Metalink Note – 1324667.1 and performed the actions as advised
      • Clean the Data Topology Model
        • sqlplus apps/apps_password
        • Exec FND_CONC_CLONE.SETUP_CLEAN;
        • COMMIT;
        • EXIT
      • Run autoconfig on all tier – first on DB tier and then app tier.
    • AutoConfig completed successfully on both tiers. I started the services using adstrtall. All the services came up successfully except Apache.
      • autoconfig Success
      • adopmnctl success
  • Apache was still not coming up and was giving an error. The error log showed
      • adapcctl error
    • The error log was pointing to libclntsh.so.10.1. This file was very much there in 10.1.2_HOME/lib and in 10.1.3_HOME/lib as well. I googled a lot, and every forum/thread mentioned that this file should be in lib as well as lib32. I did not find a lib32 folder in either 10.1.2_HOME or 10.1.3_HOME.
      • Please remember I had already applied patch 6078836, which is an OS library patch required for a known HTTP issue.
      • Hence I thought to create a lib32 folder under both the homes and create a soft link to the actual physical location
        • created lib32 folder in $10.1.2_ORACLE_HOME & $10.1.3_ORACLE_HOME
        • created soft link for libclntsh.so.10.1
          • ln -s /d02/ebs/R12VIS/apps/tech_st/10.1.2/lib/libclntsh.so.10.1 libclntsh.so
          • ln -s /d02/ebs/R12VIS/apps/tech_st/10.1.3/lib/libclntsh.so.10.1 libclntsh.so
      • Restarted Apache and got a different error this time. The error related to libclntsh.so was gone.
      • adapcctl error1
      • This error was pointing to some SSL configuration in the httpd.conf file, so I commented out that configuration.
        • # Include the SSL definitions and Virtual host container
        • #include "/d02/ebs/R12VIS/inst/apps/R12VIS_adc-al-lnx45/ora/10.1.3/Apache/Apache/conf/ssl.conf"
      • Restarted Apache. This resolved the Apache issue and all the services came up without any problem.
      • adapcctl success
  • Launched the application URL and was able to see the login page
      • login page
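
The lib32 workaround described above can be written as a repeatable function. This is a sketch of that workaround only, not a recommended fix: link_client_lib is a made-up name, and the link name libclntsh.so simply follows the ln commands shown in the post.

```shell
# Hypothetical helper: create <home>/lib32 and put a symbolic link to the
# existing client library from <home>/lib into it.
#   $1 = an Oracle home, e.g. /d02/ebs/R12VIS/apps/tech_st/10.1.2
link_client_lib() {
    src="$1/lib/libclntsh.so.10.1"
    # Refuse to create a dangling link if the library is not where expected.
    [ -e "$src" ] || { echo "missing: $src" >&2; return 1; }
    mkdir -p "$1/lib32" &&
    ln -sf "$src" "$1/lib32/libclntsh.so"
}

# Example use, once per home:
#   link_client_lib /d02/ebs/R12VIS/apps/tech_st/10.1.2
#   link_client_lib /d02/ebs/R12VIS/apps/tech_st/10.1.3
```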

-Anand

Categories: Oracle Apps